Description
I am new to Weaviate and looking for inputs on sizing the storage. I understand we need to consider the size of object storage and index storage.
Object storage: size of the ingested objects + the vector(s)
Index storage: size of the vector index + inverted index
For object storage, I can calculate the size based on the objects I plan to load into Weaviate plus the size of the vectors, using the calculation in the link below - please correct me if my understanding is wrong. However, for the index storage, can I get some inputs or a rule of thumb to follow for the calculation? Thanks
I found the details for calculating memory here, but did not find details on how to calculate storage size: https://weaviate.io/developers/weaviate/concepts/resources
Server Setup Information
- Weaviate Server Version: Weaviate Community Edition on AWS (Yet to be setup, sizing for the same)
- Deployment Method: k8s on ROSA
- Multi Node? Number of Running Nodes: single
- Client Language and Version:
- Multitenancy?: No
Any additional Information
Hello @vradhik,
Welcome to our community and it’s great to have you here.
Planning ahead for storage needs is definitely important. For object storage, your understanding is correct. You need to account for:
- The raw data (Object size as it is)
- The vector embeddings (each dimension using 4 bytes for a 32-bit float)
In Weaviate, vector embeddings are typically stored as 32-bit floating point numbers (float32). This means:
- Each dimension requires 4 bytes of storage
- Total vector size = dimensions × 4 bytes
For example, with 768 or 1536 dimensions:
- 768-dimensional vector: 768 × 4 bytes = 3,072 bytes (about 3 KB per vector)
- 1536-dimensional vector: 1536 × 4 bytes = 6,144 bytes (about 6 KB per vector)
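The dimensions × 4 bytes rule above can be sketched as a tiny helper. The 768/1536 dimension counts match the examples in this thread; the one-million-object count is just an illustrative placeholder.

```python
BYTES_PER_FLOAT32 = 4  # each dimension is a 32-bit (4-byte) float

def vector_storage_bytes(dimensions: int, num_objects: int = 1) -> int:
    """Uncompressed float32 vector storage for num_objects vectors."""
    return dimensions * BYTES_PER_FLOAT32 * num_objects

print(vector_storage_bytes(768))              # 3072 bytes per vector
print(vector_storage_bytes(1536))             # 6144 bytes per vector
print(vector_storage_bytes(768, 1_000_000))   # 3072000000 bytes (~2.9 GiB) for 1M vectors
```

Note this covers only the raw vectors, not the HNSW graph or inverted-index overhead discussed below.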
In addition to vector storage and raw data, we’ll also need to take into account the searchable and filterable properties:
- Searchable properties (with BlockMax WAND): the size varies a lot depending on the dataset.
- A rough approximation is about 33% of the object bucket size, which is the raw data plus extra metadata (so larger than the raw data alone).
- Still, the easiest way to calculate is to start with a small dataset (ideally 100k+ objects), measure the disk size, then use linear extrapolation to estimate larger deployments.
When I mention linear extrapolation, I mean using your small dataset measurements to estimate larger deployments by simple multiplication (e.g., if 100K documents use 5GB, then 1M documents would use 50GB). In reality, Weaviate’s storage often scales sublinearly (meaning the storage growth rate slows as data size increases).
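The extrapolation step is simple multiplication, using the illustrative 100K-docs / 5 GB figures from the paragraph above (not real measurements):

```python
def extrapolate_disk_gb(sample_docs: int, sample_gb: float, target_docs: int) -> float:
    """Linearly scale measured disk usage up to a larger document count."""
    return sample_gb * (target_docs / sample_docs)

# 100K docs measured at 5 GB -> linear estimate for 1M docs
print(extrapolate_disk_gb(100_000, 5.0, 1_000_000))  # 50.0
```

Treat the result as an upper bound, since storage often grows sublinearly in practice.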
Lastly, do not forget about compression options to reduce storage.
Weaviate offers several compression techniques:
- Scalar Quantization (SQ): Reduces each dimension from 32 bits to 8 bits, cutting storage by 75% with only about 5% loss in retrieval recall
- Product Quantization (PQ): Reduces vector size by creating segments that are stored as 8-bit integers instead of 32-bit floats
- Binary Quantization (BQ): Reduces each dimension to a single bit, providing a 1:32 compression rate (best for high-dimensional vectors)
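As a back-of-the-envelope comparison, here is the per-vector storage under each scheme for a 1536-dimensional vector. The PQ segment count (4 dimensions per segment) is a configurable assumption for illustration, not a fixed Weaviate default:

```python
DIMS = 1536

uncompressed = DIMS * 4    # float32: 4 bytes per dimension
sq = DIMS * 1              # SQ: 8 bits (1 byte) per dimension -> 75% smaller
pq_segments = DIMS // 4    # assumed: 4 dimensions per PQ segment
pq = pq_segments * 1       # PQ: one 8-bit code per segment
bq = DIMS // 8             # BQ: 1 bit per dimension -> 1:32 compression

print(uncompressed, sq, pq, bq)  # 6144 1536 384 192 (bytes per vector)
```

The trade-off is retrieval recall (e.g., the ~5% loss mentioned for SQ), so it is worth benchmarking compression on your own dataset before committing.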
Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)
@vradhik
I’ve built this web app, Weaviate Disk Calculator, which should help in roughly estimating the disk size - see if it helps.
I am still working on improving a couple of points in there, but it can give a good rough estimate for now.
Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)
This is very useful! Really appreciate it, thank you so much, Mohamed