No change in vector size after turning Product Quantization on

Hello! I was curious to try out how Product Quantization works. To embed data I use the gtr-t5-large model, which creates 768-dimensional vectors. My database stores around 2k vectors.

My Python code to turn PQ on is the following:

client.schema.update_config(
    "Document",
    {
        "vectorIndexConfig": {
            "pq": {
                "enabled": True, 
                "trainingLimit": 100000, 
                "segments": 96
            }
        }
    },
)

where Document is my class name.
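
To confirm the change took effect, I also read the schema back (a quick sanity check; I'm assuming the v3 client's schema.get here):

# Read the class definition back and check that PQ shows as enabled
config = client.schema.get("Document")
print(config["vectorIndexConfig"]["pq"])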

I got the following confirmation in the Docker logs:

semantic_search_analysis-weaviate-1  | {"action":"lsm_compaction","class":"Document","index":"document","level":"warning","msg":"compaction halted due to shard READONLY status","path":"data/document/BPZRlHodv3mc/lsm","shard":"BPZRlHodv3mc","time":"2024-01-31T16:27:24Z"}
semantic_search_analysis-weaviate-1  | {"action":"compress","level":"info","msg":"switching to compressed vectors","time":"2024-01-31T16:27:24Z"}
semantic_search_analysis-weaviate-1  | {"action":"compress","level":"info","msg":"vector compression complete","time":"2024-01-31T16:27:26Z"}

After turning PQ on I can see a slight difference in the returned distances, but that’s all. Docker memory usage didn’t change, and the vector size is the same as it was before compression (768).

To find out the vector size, I use this code:

top_responses = (
    client.query.get("Document", properties=["content", "source"])
    .with_near_text({"concepts": ["How to use gpu?"]})
    .with_additional(["distance", "vector"])
    .with_limit(5)
    .do()
)

vector = top_responses["data"]["Get"]["Document"][0]["_additional"]["vector"]
print(len(vector))

My questions are:

  1. Shouldn’t I see a difference in the vectors’ shape? I assumed the original vectors would be replaced with their compressed versions, but I might be missing something.
  2. What changes am I supposed to observe with Product Quantization, and where should I look for them?

I would appreciate your support!

Hi @klem! That’s an interesting question.

I will need to check internally what exactly we can expect to happen.

For now, I know that you can monitor the Prometheus metrics for segment info.
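Something like this should surface them (assuming you started Weaviate with PROMETHEUS_MONITORING_ENABLED=true, which as far as I recall exposes the metrics on port 2112):

import requests

# Fetch the raw Prometheus metrics and grep for segment-related entries
metrics = requests.get("http://localhost:2112/metrics").text
for line in metrics.splitlines():
    if "segment" in line and not line.startswith("#"):
        print(line)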

I will ask internally and get back to you.

Thanks!

Hi @klem, it looks like you enabled product quantization correctly.

With Product Quantization enabled, Weaviate still stores the original vectors on disk. The quantized vectors are kept in memory and used for the distance calculations in the HNSW graph. Once the final candidates (in your case 5) are selected, Weaviate returns the full object data (fields such as content and the original vector) from disk. Because of this, things will appear to work as before; the only real user-facing change is that recall and distances shift slightly, since the vectors used during search are now smaller / quantized.
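
If it helps to see the mechanics, here is a toy sketch of the idea (my own illustration with numpy/scikit-learn, not Weaviate’s internals): each 768-dimensional vector is split into 96 segments of 8 dimensions, each segment is quantized to the nearest of 256 learned centroids, and distances computed on the quantized form drift slightly from the exact ones.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
dims, segments, k = 768, 96, 256
seg = dims // segments                      # 8 dimensions per segment
data = rng.standard_normal((2000, dims)).astype(np.float32)

# One codebook per segment; each vector is then stored as 96 one-byte codes
codebooks = [
    KMeans(n_clusters=k, n_init=1, random_state=0).fit(data[:, i * seg:(i + 1) * seg])
    for i in range(segments)
]

def encode(v):
    # nearest centroid id per segment: 96 bytes instead of 768 floats
    return np.array(
        [cb.predict(v[i * seg:(i + 1) * seg].reshape(1, -1))[0]
         for i, cb in enumerate(codebooks)],
        dtype=np.uint8,
    )

def decode(codes):
    # swap each code back for its centroid to get an approximate vector
    return np.concatenate(
        [codebooks[i].cluster_centers_[c] for i, c in enumerate(codes)]
    )

q, v = data[0], data[1]
print(np.linalg.norm(q - v))                  # exact distance
print(np.linalg.norm(q - decode(encode(v))))  # approximate, slightly off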

As for why you do not see much memory reduction: with 2k vectors, the originals take up about 2000 * 768 * 4 bytes ≈ 6.1 MB in total. A drop that small is hard to spot with so few vectors, due to noise from garbage collection timing. With a large vector dataset you will be able to see the memory drop, particularly on the go_memstats_heap_inuse_bytes metric (exposed in Prometheus).
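
To put rough numbers on it (a back-of-the-envelope sketch, assuming float32 originals and the default of 256 centroids per segment, so each code fits in one byte):

n, dims, segments, centroids = 2000, 768, 96, 256
raw = n * dims * 4                                         # float32 originals: ~6.1 MB
codes = n * segments * 1                                   # one-byte PQ codes: ~0.19 MB
codebooks = segments * centroids * (dims // segments) * 4  # centroid tables:  ~0.79 MB
print(f"{raw / 1e6:.2f} MB raw -> {(codes + codebooks) / 1e6:.2f} MB quantized")

At 2k vectors the per-segment codebooks are also a sizeable share of the quantized footprint, which makes the relative savings even harder to spot at this scale.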


Cool, your response is very clear. Thank you!