How is the storage footprint reduced after inserting vectors into Weaviate?

Hello folks and @DudaNogueira

I have a parquet file with embeddings stored in it. The raw size of the parquet file is 44GB, but after inserting the data into the DB, the footprint on the persistent data path is just 40GB.

My persistent data path is set to an NFS share mounted on the Weaviate DB server, which runs as a single node. My expectation was that everything would be stored after insertion. To my surprise, the size of the directory after insertion is smaller than the file size, and I have not enabled PQ or BQ compression.

Questions:

  1. Does Weaviate split data between local server cache storage and persistent data storage, in something like a 70/30 ratio? I ask because I could see some data written to the local cache.

  2. How does Weaviate store vectors inside a collection? Does it perform any compression by default? In my configuration, PQ and BQ are disabled.

So I am wondering: does Weaviate apply some sort of quantization or data compression technique to achieve a smaller footprint?

Any pointers on how collections are stored and retrieved would be appreciated.

Hi @Adi_Sra_Ga!

Sorry for the delay here.

I am assuming that this parquet file already includes the vectors, right?

There are some possible explanations here.

For example, Weaviate uses 32-bit floats for storing vectors, so if you have 64-bit float arrays in the parquet file, that could explain things.

Also, your parquet file may contain some extra padding, whitespace, etc.
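To make the float-width point concrete, here is a minimal back-of-the-envelope sketch. The vector count and dimensionality are hypothetical values chosen only for illustration (they are not taken from your dataset), and `raw_vector_bytes` is just a helper name made up for this example. It counts the raw vector values only, not the index or the object properties.

```python
import struct


def raw_vector_bytes(num_vectors: int, dims: int, fmt: str) -> int:
    """Bytes needed for the raw vector values alone (no index, no metadata)."""
    # struct.calcsize("<d") is 8 (64-bit float), struct.calcsize("<f") is 4 (32-bit float)
    return num_vectors * dims * struct.calcsize(fmt)


# Hypothetical dataset shape, assumed for illustration only.
NUM_VECTORS = 1_000_000
DIMS = 1536

as_float64 = raw_vector_bytes(NUM_VECTORS, DIMS, "<d")
as_float32 = raw_vector_bytes(NUM_VECTORS, DIMS, "<f")

print(f"float64 vectors: {as_float64 / 1e9:.2f} GB")
print(f"float32 vectors: {as_float32 / 1e9:.2f} GB")
print(f"float32 / float64 ratio: {as_float32 / as_float64:.2f}")
```

The raw values halve when going from 64-bit to 32-bit floats. The total on disk will not shrink by exactly that ratio, since Weaviate also writes the vector index and the object data next to the vectors, so treat this only as a rough sanity check against the sizes you are seeing.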

Let me know if this helps.