Sizing disk storage for Weaviate

vradhik · April 23, 2025, 8:05am

Description

I am new to Weaviate and am looking for input on sizing storage. I understand we need to consider both object storage and index storage:
Object storage: size of the ingested objects + the vectors (one or more per object)
Index storage: size of the vector index + the inverted index

For object storage, I can calculate the size from the number of objects I plan to load into Weaviate plus the vector size, using the calculation in the link below. Please correct me if my understanding is incorrect. For index storage, however, are there any inputs or rules of thumb I can follow for the calculation? Thanks

I found details on calculating memory here, but not on how to calculate storage size: https://weaviate.io/developers/weaviate/concepts/resources

Server Setup Information

Weaviate Server Version: Weaviate Community Edition on AWS (yet to be set up; this sizing is for that deployment)
Deployment Method: k8s on ROSA
Multi Node? Number of Running Nodes: single
Client Language and Version:
Multitenancy?: No

Any additional Information

Mohamed_Shahin · April 23, 2025, 10:31am

Hello @vradhik,

Welcome to our community and it’s great to have you here.

Planning ahead for storage needs is definitely important. For object storage, your understanding is correct. You need to account for:

The raw data (Object size as it is)
The vector embeddings (each dimension using 4 bytes for a 32-bit float)

So in Weaviate, vector embeddings are typically stored as 32-bit floating point numbers (float32). This means:

Each dimension requires 4 bytes of storage
Total vector size = dimensions × 4 bytes

For example, with 768 or 1536 dimensions:

768-dimensional vector: 768 × 4 bytes = 3,072 bytes, about 3 KB per vector
1536-dimensional vector: 1536 × 4 bytes = 6,144 bytes, about 6 KB per vector
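
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the dimensions × 4 bytes rule; the function names and the `vectors_per_object` parameter are made up for this example and are not part of any Weaviate API.

```python
BYTES_PER_FLOAT32 = 4  # each dimension is stored as a 32-bit float


def vector_bytes(dimensions: int, vectors_per_object: int = 1) -> int:
    """Raw (uncompressed) vector bytes for a single object."""
    return dimensions * BYTES_PER_FLOAT32 * vectors_per_object


def raw_vector_storage_bytes(num_objects: int, dimensions: int,
                             vectors_per_object: int = 1) -> int:
    """Raw vector bytes for a whole collection (vectors only, no index overhead)."""
    return num_objects * vector_bytes(dimensions, vectors_per_object)


if __name__ == "__main__":
    for dims in (768, 1536):
        print(f"{dims} dimensions: {vector_bytes(dims):,} bytes per vector")
    total = raw_vector_storage_bytes(1_000_000, 768)
    print(f"1M objects at 768 dimensions: {total / 1e9:.2f} GB of raw vectors")
```

This covers the vectors only. The raw object data and the indexes come on top of it.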

In addition to vector storage and raw data, we’ll also need to take into account the searchable and filterable properties:

Searchable properties (with BlockMax WAND): the size varies a lot depending on the dataset.
- A rough approximation is about 33% of the object bucket size. The object bucket holds the raw data plus extra metadata, so it is larger than the raw data alone.
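
As a rough sketch of that rule of thumb, the helper below applies the 33% figure to a measured or estimated object bucket size. The function name and the `fraction` parameter are hypothetical, and the ratio itself is only the approximation mentioned above, so expect real datasets to deviate.

```python
SEARCHABLE_FRACTION = 0.33  # rough approximation; varies a lot by dataset


def estimate_searchable_index_bytes(object_bucket_bytes: float,
                                    fraction: float = SEARCHABLE_FRACTION) -> float:
    """Estimate searchable-property index size from the object bucket size."""
    if object_bucket_bytes < 0:
        raise ValueError("object_bucket_bytes must be non-negative")
    return object_bucket_bytes * fraction


if __name__ == "__main__":
    bucket = 3_000_000_000  # example input: a 3 GB object bucket
    estimate = estimate_searchable_index_bytes(bucket)
    print(f"Estimated searchable index: {estimate / 1e9:.2f} GB")
```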
Still, the easiest way to estimate is to load a small dataset (ideally 100k+ objects), measure the disk size, and then use linear extrapolation to estimate larger deployments.

When I mention linear extrapolation, I mean using your small dataset measurements to estimate larger deployments by simple multiplication (e.g., if 100K documents use 5GB, then 1M documents would use 50GB). In reality, Weaviate’s storage often scales sublinearly (meaning the storage growth rate slows as data size increases).
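
Here is a minimal sketch of that extrapolation, assuming you have already measured disk usage for a sample load. The function name is made up for illustration. Since storage often grows sublinearly, the result is better read as an upper-leaning estimate than as an exact figure.

```python
def extrapolate_disk_gb(sample_objects: int, sample_disk_gb: float,
                        target_objects: int) -> float:
    """Scale a measured disk size linearly to a larger object count."""
    if sample_objects <= 0:
        raise ValueError("sample_objects must be positive")
    return sample_disk_gb * (target_objects / sample_objects)


if __name__ == "__main__":
    # Example from the post: 100K documents measured at 5 GB, target 1M documents
    estimate = extrapolate_disk_gb(100_000, 5.0, 1_000_000)
    print(f"Estimated disk for 1M documents: {estimate:.1f} GB")
```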

Lastly, do not forget about the compression options that reduce storage.

Weaviate offers several compression techniques:

Scalar Quantization (SQ): Reduces each dimension from 32 bits to 8 bits, cutting storage by 75% with only about 5% loss in retrieval recall
Product Quantization (PQ): Reduces vector size by creating segments that are stored as 8-bit integers instead of 32-bit floats
Binary Quantization (BQ): Reduces each dimension to a single bit, providing a 1:32 compression rate (best for high-dimensional vectors)
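
To compare the three options per vector, here is a small sketch of the compressed sizes they imply. It makes some simplifying assumptions: it counts only the compressed codes (not codebooks, index structures, or any uncompressed copies that may also be kept), and for PQ it assumes one 8-bit code per segment with a segment count you choose. The function and parameter names are hypothetical, not Weaviate configuration keys.

```python
import math
from typing import Optional


def compressed_vector_bytes(dimensions: int, method: str,
                            pq_segments: Optional[int] = None) -> int:
    """Approximate size in bytes of one vector's compressed codes."""
    if method == "none":
        return dimensions * 4              # float32: 32 bits per dimension
    if method == "sq":
        return dimensions                  # 8 bits per dimension
    if method == "bq":
        return math.ceil(dimensions / 8)   # 1 bit per dimension
    if method == "pq":
        if pq_segments is None:
            raise ValueError("pq_segments is required for PQ")
        return pq_segments                 # assumed: one 8-bit code per segment
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    dims = 1536
    for method in ("none", "sq", "bq"):
        print(f"{method}: {compressed_vector_bytes(dims, method)} bytes per vector")
    print(f"pq (256 segments): {compressed_vector_bytes(dims, 'pq', 256)} bytes per vector")
```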

Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)

Mohamed_Shahin · April 25, 2025, 2:24pm

@vradhik

I’ve built this web app, Weaviate Disk Calculator, which should help with roughly estimating the disk size. See if it helps.

I am still working on it to improve a couple of points, but it can give a good rough estimate for now.

Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)

vradhik · May 2, 2025, 4:49am

This is very useful! Really appreciate it, thank you so much, Mohamed
