High query latency in Weaviate

Description

Vector dimensions: 1536; number of objects: 500k.
We are observing high latency of 490 ms (p50), 640 ms (p90), and 1.4 s (p99) at a throughput of 32 RPS. The pod has enough CPU and memory available during the test, but it is not fully utilizing them. These results are not even close to the benchmarks published by Weaviate. Any suggestions to reduce the latency?
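For anyone reproducing these numbers, here is a minimal sketch of how p50/p90/p99 can be derived from recorded per-request latencies. The nearest-rank method and the sample values are assumptions for illustration; they are not the measurements or the tooling used in the test above.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number ranks stay exact.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical per-request latencies in milliseconds.
latencies_ms = [120, 450, 480, 490, 500, 510, 600, 640, 900, 1400]

summary = {p: percentile(latencies_ms, p) for p in (50, 90, 99)}
print(summary)
```

With only a handful of samples the p99 is simply the slowest request, so a load test needs to record many requests before the tail percentiles mean much.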

Server Setup Information

  • Weaviate Server Version: 1.22.8
  • Deployment Method: k8s
  • Multi Node? Number of Running Nodes: single node
  • Client Language and Version: python v3
  • Multitenancy?: No

Any additional Information

hi @hanumanhuda !!

Curious, is there a reason to use 1.22.8?

Have you tried the latest version, 1.26? There are a lot of changes that may fix whatever is causing this.

Have you used our official Helm chart?
Are there any resource limits in place for this cluster in k8s?
Have you changed any environment variables regarding resource planning?

Thanks!

No, we haven’t used the Helm chart for the above tests; however, we can try that with the latest version, 1.26. We haven’t changed any environment variables and are running with the default configuration. Is there any specific recommendation for this use case to make it faster?

Hi @hanumanhuda,

Building on @DudaNogueira’s recommendation, I wanted to share some insights that could help improve your setup:

  1. Running multiple nodes can lead to noticeable improvements. This approach improves query performance by distributing the load across multiple nodes.
  2. With a cluster setup (minimum 3 nodes), you’ll be able to set a replication factor of 3. The default consistency level for this is Quorum, which should work well for most cases.

To boost performance, you can set the consistency level to ONE for queries. While this trades some consistency for speed, Weaviate handles background consistency for you with repairs, so you don’t need to worry.

Here are the detailed technical explanations:

Let us know if you have any other questions or if there’s anything else we can help with!

hi @hanumanhuda !

Given enough resources, 500k objects will run lightning fast.

To get better latencies, as mentioned by my friend and colleague Mohamad, it is important to use a newer version than the one you are running, as newer versions leverage gRPC.

The version you are using will only expose REST/HTTP endpoints. gRPC will help tremendously here, as it is much faster.

On top of that, there are a lot of other improvements.

Here you can find more information on gRPC:

And here you can find all the information needed to run Weaviate using our official Helm chart:

The best way to migrate here, considering you have not used our Helm chart in the first place, is to spin up a new Weaviate cluster using that chart and migrate your data over using this migration guide:

Let me know if this helps!

We tested the latest version 1.26.1 with 3 shards on a collection containing 500k objects, using the default settings for GOMEMLIMIT and GOMAXPROCS. The results were as follows:

  • Latency for 32 RPS:
    • P50: 300 ms
    • P90: 490 ms
    • P99: 740 ms
  • Latency for 2 RPS:
    • P50: 76 ms
    • P90: 87 ms
    • P99: 92 ms

We haven’t enabled replication since our primary goal is to reduce latency, not just increase availability. Although we are currently working with 500k objects, our target is to scale a single collection to 5 million objects.

One key observation during these tests was that none of the nodes fully utilized the available compute resources. As RPS increased, latency also increased, even though there was ample memory and compute available across the nodes. This raises the question: Is there a bottleneck preventing the utilization of compute resources across multiple queries?
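One rough way to frame this question is Little's law, which relates throughput and latency to the average number of requests in flight. The sketch below plugs in the P50 figures reported above as a stand-in for mean latency, which is an assumption (the true mean is not reported), and it assumes the system is in steady state. It does not identify the bottleneck; it only estimates how much concurrency the server actually sees.

```python
def average_in_flight(throughput_rps, latency_ms):
    """Little's law: average concurrency = arrival rate x average time in system."""
    return throughput_rps * latency_ms / 1000.0


# (label, requests per second, P50 latency in ms) taken from the results above.
measurements = [("32 RPS", 32, 300), ("2 RPS", 2, 76)]

in_flight = {
    label: round(average_in_flight(rps, ms), 2)
    for label, rps, ms in measurements
}
print(in_flight)
```

If the estimated concurrency is small compared with the number of cores available, low CPU utilization is expected, and the rise in latency with RPS would point toward per-query work or queuing rather than a shortage of compute.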

hi @hanumanhuda !

I believe you should try tweaking those parameters according to your deployment.

Also, consider that replication will give you more room for higher QPS: Use Cases (Motivation) | Weaviate

Other index configs you can tune for better QPS are ef, efConstruction and maxConnections. More on those options here: Vector indexes | Weaviate
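As a rough sketch of where these three settings live, the snippet below builds a collection definition in the shape of Weaviate's REST schema payload, using only the standard library. The collection name `Article` and all numeric values are hypothetical and purely illustrative; they are not recommendations for this workload.

```python
import json

# Hypothetical collection definition; names and values are illustrative only.
collection_definition = {
    "class": "Article",        # hypothetical collection name
    "vectorizer": "none",      # assumption: vectors are supplied by the client
    "vectorIndexType": "hnsw",
    "vectorIndexConfig": {
        "ef": 64,              # search-time candidate list size
        "efConstruction": 128, # build-time candidate list size
        "maxConnections": 32,  # max edges per node in the HNSW graph
    },
}

payload = json.dumps(collection_definition, indent=2)
print(payload)
```

In general, a lower `ef` trades recall for faster queries, while `efConstruction` and `maxConnections` shape the graph at build time. As far as I know the latter two cannot be changed after the collection is created, so testing different values for them means re-importing the data.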

Let me know if this helps.

Thanks!