How do I choose the optimal number of PQ segments to reduce search request latency?

Hey Weaviate community! I can’t figure out how to get the biggest response-time (latency) reduction out of Product Quantization.
There are two named vectors, with lengths 768 and 1024. I followed the PQ steps on the official site, but a few points are unclear to me:

  1. Is it possible to specify a different number of segments for each named vector when creating a collection (for example, on Standalone)? I tried to do this, but I got Weaviate log errors like “inconsistent vector length”…
  2. Why is search time with PQ worse than on a Weaviate instance without quantization, for dataset sizes from 1 to 5 million (I haven’t tested larger datasets yet)?
  3. What is the best way to configure quantization to prioritize fast service response times for storage with more than 20 million vectors? The goal is to gain a processing-time advantage over regular storage.

P.S. I mostly configure PQ at collection creation time; I don’t know which option is better, updating the config later or creating the collection with PQ. I also turned on ASYNC_INDEXING. And I tried setting segments to 0 (or just setting the training limit), but I don’t understand in which cases that helps (my test results aren’t OK). Weaviate version: 1.25.25.
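For context on the “inconsistent vector length” error above: PQ splits each vector into `segments` equal sub-vectors, so `segments` must evenly divide the vector’s dimension, and 768-dim and 1024-dim vectors do not share every divisor. A minimal pure-Python illustration (not Weaviate’s actual validation code):

```python
# PQ requires `segments` to divide the vector dimension evenly; a single
# shared `segments` value therefore has to be a common divisor of both
# named vectors' dimensions (768 and 1024 here).

def valid_segment_counts(dim: int) -> list[int]:
    """All segment counts that split a `dim`-dimensional vector evenly."""
    return [s for s in range(1, dim + 1) if dim % s == 0]

v768 = set(valid_segment_counts(768))
v1024 = set(valid_segment_counts(1024))

# Segment counts valid for BOTH vectors (divisors of gcd(768, 1024) = 256):
print(sorted(v768 & v1024))        # → [1, 2, 4, 8, 16, 32, 64, 128, 256]
print(128 in v768, 128 in v1024)   # → True True   (valid for both)
print(96 in v768, 96 in v1024)     # → True False  (768 only)
```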

hi @DevelMyCry !!

Welcome to our community :hugs:

  1. Yes. You can provide the segments, as described here. If you do not specify one, Weaviate will try to figure out the best number of segments for you.

  2. Not sure here. Can you share the code and the exact steps you took so we can replicate this? For example, it may take a while for Weaviate to index and compress all the data, so if you run queries right after importing, the response time may be affected.

  3. One of the motivations for replication is to increase QPS. Also, make sure you have fast disks and servers close to the client (in the same region, for example).
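On point 1, a hedged sketch of what per-named-vector PQ settings could look like at creation time, assuming the v4 Python client (the vector names and segment values are illustrative; `segments` must evenly divide each vector’s dimension):

```python
# Hypothetical sketch: each named vector has its own HNSW index, so each can
# carry its own PQ quantizer with a different `segments` value. "vec_one" /
# "vec_two" are placeholder names; verify signatures against your client docs.
from weaviate.classes.config import Configure, VectorDistances

vectorizer_config = [
    Configure.NamedVectors.none(
        name="vec_one",  # 768-dim vector: 96 segments -> 8 dims per segment
        vector_index_config=Configure.VectorIndex.hnsw(
            distance_metric=VectorDistances.COSINE,
            quantizer=Configure.VectorIndex.Quantizer.pq(
                segments=96, training_limit=100_000
            ),
        ),
    ),
    Configure.NamedVectors.none(
        name="vec_two",  # 1024-dim vector: 128 segments -> 8 dims per segment
        vector_index_config=Configure.VectorIndex.hnsw(
            distance_metric=VectorDistances.COSINE,
            quantizer=Configure.VectorIndex.Quantizer.pq(
                segments=128, training_limit=100_000
            ),
        ),
    ),
]
```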

Let me know if this helps!

Thanks!

Hi @DudaNogueira!
My Configuration and steps:

Server: Standalone
CPU LIMIT: 40
LENGTH OF VECTOR TYPE ONE: 768
LENGTH OF VECTOR TYPE TWO: 1024
Index: HNSW
CONSISTENCY LEVEL: ONE
docker-compose:

version: '3.4'
services:
  weaviate:
    image: semitechnologies/weaviate:1.25.25
    ports:
    - 8081:8080
    - 50052:50051
    volumes:
    - ./test_data:/data
    - ./backups:/tmp/backups
    restart: on-failure:0
    environment:
      TOMBSTONE_DELETION_CONCURRENCY: '4'

      DISABLE_LAZY_LOAD_SHARDS: 'true'
      HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE: 'true'

      STANDALONE_MODE: 'true'
      AUTOSCHEMA_ENABLED: 'false'
      QUERY_MAXIMUM_RESULTS: 10000

      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/data'
      DEFAULT_VECTORIZER_MODULE: 'none'
      CLUSTER_HOSTNAME: 'node1'
      ENABLE_MODULES: 'backup-filesystem,backup-s3'
      BACKUP_FILESYSTEM_PATH: '/tmp/backups'

      LIMIT_RESOURCES: 'true'
      ASYNC_INDEXING: 'true'

      LOG_LEVEL: 'debug'

      PROMETHEUS_MONITORING_ENABLED: 'false'
      GOMAXPROCS: 40
      GOGC: 90
      PERSISTENCE_HNSW_MAX_LOG_SIZE: 4GiB
    deploy:
      resources:
        limits:
          cpus: '40.0'

Block of code for collection creation:

client.collections.create(
    name=some_name,
    properties=[
        weaviate.classes.config.Property(
            name=SOME_CATEGORY,
            data_type=weaviate.classes.config.DataType.TEXT
        ),
        weaviate.classes.config.Property(
            name=SOME_PROP,
            data_type=weaviate.classes.config.DataType.TEXT
        )
    ],
    vectorizer_config=[
        weaviate.classes.config.Configure.NamedVectors.none(
            name=WV_VECTOR_TYPE_ONE,
            vector_index_config=weaviate.classes.config.Configure.VectorIndex.hnsw(
                # quantizer=weaviate.classes.config.Configure.VectorIndex.Quantizer.pq(segments=8, training_limit=100000),
                distance_metric=weaviate.classes.config.VectorDistances.COSINE,
                ef=320,
                ef_construction=320,
                max_connections=100
            )
        ),
        weaviate.classes.config.Configure.NamedVectors.none(
            name=WV_VECTOR_TYPE_TWO,
            vector_index_config=weaviate.classes.config.Configure.VectorIndex.hnsw(
                # quantizer=weaviate.classes.config.Configure.VectorIndex.Quantizer.pq(segments=8, training_limit=100000),
                distance_metric=weaviate.classes.config.VectorDistances.COSINE,
                ef=480,
                ef_construction=480,
                max_connections=120
            )
        )
    ],
)
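On the “update config vs. create with PQ” question from the original post: a hedged sketch of the other path, enabling PQ after the initial import via a config update (v4 Python client; `Reconfigure.NamedVectors.update` and the exact signatures should be verified against your client version, and the vector name is a placeholder):

```python
# Hypothetical sketch: enable PQ on an existing named vector after import.
# PQ training only starts once `training_limit` vectors are present, so
# enabling it after the initial bulk import is a common pattern.
from weaviate.classes.config import Reconfigure

collection = client.collections.get("SomeCollection")
collection.config.update(
    vectorizer_config=[
        Reconfigure.NamedVectors.update(
            name="vec_one",  # placeholder named-vector name
            vector_index_config=Reconfigure.VectorIndex.hnsw(
                quantizer=Reconfigure.VectorIndex.Quantizer.pq(
                    segments=96, training_limit=100_000
                ),
            ),
        )
    ]
)
```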

Block of search query:

SEARCH_THREADS = 4
QUERY_TIMEOUT_SEC = 60

collection = client.collections.get(SCHEMA)
if CONSISTENCY_LEVEL:
    collection = collection.with_consistency_level(CONSISTENCY_LEVEL)
query = collection.query.near_vector(
    near_vector=vectors[VECTOR],
    limit=LIMIT,
    filters=Filter.by_property(filter_property).equal(filter_value) if filter_property and filter_value else None,
    return_metadata=MetadataQuery(distance=True),
    return_properties=properties,
    target_vector=VECTOR,
)

Importing was done through batching (batch size = 5000). After importing, I waited until the “queue size” was equal to 0.
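The “wait until queue size is 0” step can be scripted. A hedged sketch, assuming the v4 Python client’s `client.cluster.nodes(..., output="verbose")` call exposes per-shard vector queue lengths (verify the attribute names against your client version):

```python
import time

def wait_for_indexing(client, collection_name: str, poll_sec: float = 5.0) -> None:
    """Block until every shard's async vector-indexing queue is empty.

    Querying while the queue is non-empty, or while PQ is still training and
    compressing, will inflate measured latencies.
    """
    while True:
        # Assumption: verbose node output carries per-shard queue lengths.
        nodes = client.cluster.nodes(collection=collection_name, output="verbose")
        queued = sum(shard.vector_queue_length
                     for node in nodes for shard in node.shards)
        if queued == 0:
            return
        time.sleep(poll_sec)
```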

With the same import, and after waiting for the queue to finish, the normal configuration (without compression) is an order of magnitude faster than with compression.
Search on 5 million vectors:

[screenshot: without any compression]

[screenshot: segments: 128]

I also noticed that results with segments=6 are better than with, for example, segments=128 or 256. But even with the best option (segments=6), search is about 2× slower than without quantization.
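To make PQ-vs-uncompressed comparisons like the one above more robust, a minimal timing sketch; the `run_query` callable is hypothetical and stands in for the actual `collection.query.near_vector(...)` call (obj/s averages can hide tail latency, so p50/p95 are reported instead):

```python
import statistics
import time

def measure_latency(run_query, queries, warmup: int = 10):
    """Return (p50_ms, p95_ms) latency over the given query inputs."""
    for q in queries[:warmup]:          # warm caches before timing
        run_query(q)
    samples = []
    for q in queries:
        t0 = time.perf_counter()
        run_query(q)
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    p50 = statistics.median(samples)
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    return p50, p95
```

Running the same query vectors through both the compressed and the uncompressed collection with this harness gives comparable numbers at identical ef settings.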

Hi!

Is there any specific reason to use 1.25.25? Do you get the same results with 1.28.latest?

Also, those different times (148.8 obj/s vs 37.58 obj/s) are about querying, not importing, right?

Another question: why those values for max_connections and ef/ef_construction?

I will need to grab more info here so I can escalate this with our team.

Are you able to perform this test on a publicly available dataset?

Thanks!

Hello!

  1. Using 1.25.25 is temporary, but I tried 1.27 and the results were the same.
  2. They are about querying (search).
  3. Those values are for high recall in production (maybe I should adjust them?).

I’ll be looking forward to your response.
It is important for me to learn from the team what solutions are available in this situation, what I need to pay attention to, and whether compression gives a speed advantage compared to regular HNSW (with the same ef, ef_construction, max_connections).