Understanding BQ configuration parameters

Michael_Pont · October 10, 2024, 2:43pm

Hello, I wanted to understand the BQ config parameters a bit better.
What exactly does a rescoreLimit of -1 do? What are the tradeoffs of setting cache to true?
What parameters should I set if I still want to ensure high accuracy? If I understand correctly, a higher rescoreLimit increases recall (Compression | Weaviate). Should I set the rescoreLimit based on the limit parameter I normally use for my queries?

For one collection I expect to have between 1 and 200 objects per tenant when running a hybrid query.
For another collection I expect to have between 1 and 10,000 objects stored per tenant when running a hybrid query.
I normally return around 10-20 results from the hybrid query.

vectorIndexConfig: configure.vectorIndex.dynamic({
  hnsw: {
    quantizer: configure.vectorIndex.quantizer.bq({
      cache: true,
      rescoreLimit: -1,
    }),
  },
  flat: {
    quantizer: configure.vectorIndex.quantizer.bq({
      cache: true,
      rescoreLimit: -1,
    }),
  },
}),

Abdel_Rodriguez · October 21, 2024, 8:45am

Hi, thanks for formalising your question. I'll try to answer.

About rescoreLimit:
If you use -1, you ask the server to rescore only the vectors retrieved. This means recall will probably be affected, but you get maximum performance. Depending on the embeddings you are using, you might want to raise this parameter to get better quality results. It is hard to give an exact number, but if you retrieve 10 objects, I would start by rescoring 15 or so (around 1.5 x K). You need to experiment a bit to find the right balance between the quality of the results and the performance you expect. If the performance is still good, try raising it further, as it will give you better quality for sure.
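To make the effect of rescoreLimit more concrete, here is a minimal toy sketch in plain TypeScript. It is not Weaviate's implementation: the function names (binarize, searchWithRescore, suggestRescoreLimit), the sign-based binarization, and the handling of -1 as "rescore only the K retrieved candidates" are assumptions made for illustration, based on the explanation above.

```typescript
// Illustrative only: a toy brute-force search with binary quantization (BQ)
// and rescoring. All names here are hypothetical, not part of any Weaviate API.

type Candidate = { id: number; distance: number };

// Binarize: keep only the sign of each dimension (one bit per dimension).
function binarize(vector: number[]): boolean[] {
  return vector.map((x) => x > 0);
}

// Hamming distance between two binarized vectors.
function hamming(a: boolean[], b: boolean[]): number {
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) diff++;
  }
  return diff;
}

// Squared Euclidean distance on the full-precision vectors.
function squaredL2(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Hypothetical helper: the 1.5 x K starting point suggested above.
function suggestRescoreLimit(k: number, factor = 1.5): number {
  return Math.ceil(k * factor);
}

function searchWithRescore(
  query: number[],
  vectors: number[][],
  k: number,
  rescoreLimit: number,
): Candidate[] {
  const queryBits = binarize(query);
  // Stage 1: rank every vector by the cheap compressed distance.
  const coarse = vectors
    .map((v, id) => ({ id, distance: hamming(queryBits, binarize(v)) }))
    .sort((a, b) => a.distance - b.distance || a.id - b.id);
  // Stage 2: rescore max(k, rescoreLimit) candidates with the full vectors.
  // Assumption: -1 therefore means "rescore only the k retrieved candidates".
  const pool = coarse.slice(0, Math.max(k, rescoreLimit));
  return pool
    .map(({ id }) => ({ id, distance: squaredL2(query, vectors[id]) }))
    .sort((a, b) => a.distance - b.distance || a.id - b.id)
    .slice(0, k);
}

const exampleQuery = [1, 1];
const exampleVectors = [[10, 10], [1, -0.1], [5, 5], [-1, -1]];
// With a pool of 1, the true nearest neighbour (id 1) is never rescored.
console.log(searchWithRescore(exampleQuery, exampleVectors, 1, 1)[0].id); // 0
// With a pool of 3, the full-precision rescoring recovers it.
console.log(searchWithRescore(exampleQuery, exampleVectors, 1, 3)[0].id); // 1
```

In this toy data the compressed distance ranks id 1 third, so it is only found once the rescoring pool is large enough. That is the recall versus performance tradeoff in miniature: a bigger pool means more full vectors to read and compare per query.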

About setting the cache to true:
When using the flat index (used under the hood when you use the dynamic index), you always read the full vectors from disk. If you use compression, as you are doing now, the full vectors are only used for rescoring. This means that with a rescoreLimit of 15, for example, you only read 15 vectors from disk per query. The brute-force search, however, still touches all vectors in their compressed form. If you set cache to true, the compressed vectors are kept in memory and the brute-force part does not touch the disk. This means much better performance for a bit of memory. I say "a bit of memory" because the vectors are compressed using BQ, which needs only one bit per dimension. Normally, with 1,536 dimensions for example, you need four bytes per dimension, meaning 1,536*4 bytes per vector, which for the 10,000 vectors in your example comes to more than 60MB. With BQ vectors you would need less than 2MB.
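The memory arithmetic can be checked with a short calculation. This is a rough sketch that assumes float32 storage (four bytes per dimension) for the full vectors and exactly one bit per dimension for BQ, and ignores any index or per-object overhead. The function names are made up for this example.

```typescript
// Bytes needed for uncompressed float32 vectors: 4 bytes per dimension.
function fullVectorBytes(dimensions: number, count: number): number {
  return dimensions * 4 * count;
}

// Bytes needed for BQ vectors: 1 bit per dimension, rounded up to whole bytes.
function bqVectorBytes(dimensions: number, count: number): number {
  return Math.ceil(dimensions / 8) * count;
}

const exampleDims = 1536;
const exampleCount = 10000;
console.log(fullVectorBytes(exampleDims, exampleCount) / 1e6); // 61.44 (MB)
console.log(bqVectorBytes(exampleDims, exampleCount) / 1e6); // 1.92 (MB)
```

So for 10,000 vectors of 1,536 dimensions, caching the compressed vectors costs about 1.92MB instead of about 61.44MB for the full vectors, a 32x reduction.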

I hope this helps. If you have further questions, please let us know and we will try to assist you as best we can.
Cheers!
