High Query latency in Weaviate

Description

Dimensions: 1536; number of objects: 500k.
We are observing high latency: 490 ms (p50), 640 ms (p90), 1.4 s (p99) at a throughput of 32 RPS. The pod has enough CPU and memory available during the test, but it is not fully utilized. These results are not even close to the benchmarks published by Weaviate. Any suggestions to reduce the latency?

Server Setup Information

  • Weaviate Server Version: 1.22.8
  • Deployment Method: k8s
  • Multi Node? Number of Running Nodes: single node
  • Client Language and Version: python v3
  • Multitenancy?: No

Any additional Information

Hi @hanumanhuda!

Curious, is there a reason to use 1.22.8?

Have you tried the latest 1.26? There are A LOT of changes that may fix whatever is causing this.

Have you used our official helm chart?
Are there any limits in place for this cluster at k8s?
Have you changed any environment variables regarding resource planning?

Thanks!


No, we haven’t used the helm chart for the above tests, but we can try that with the latest version, 1.26. We haven’t changed any environment variables and are going with the default configuration. Is there any specific recommendation for this use case to make it faster?

Hi @hanumanhuda,

Building on @DudaNogueira’s recommendation, I wanted to share some insights that could help improve your setup:

  1. Running multiple nodes can lead to noticeable improvements, since queries are distributed across the nodes.
  2. With a cluster setup (minimum 3 nodes), you’ll be able to set a replication factor of 3. The default consistency level for this is Quorum, which should work well for most cases.

To boost performance, you can set the consistency level to ONE for queries. While this trades some consistency for speed, Weaviate handles consistency in the background via repairs, so you don’t need to worry.
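If it helps, here is a minimal sketch of what a consistency-level-ONE read looks like with the Python v4 client. The collection name "Docs", the query vector, and the local connection are illustrative assumptions, and this presumes replication is enabled on the collection:

```python
# Minimal sketch: reading with consistency level ONE (Python v4 client).
# "Docs", query_vec, and connect_to_local() are illustrative assumptions.

def quorum_size(replicas: int) -> int:
    """Replicas that must answer under the default QUORUM level: floor(n/2) + 1."""
    return replicas // 2 + 1

def query_with_one(query_vec, host: str = "localhost"):
    import weaviate
    from weaviate.classes.config import ConsistencyLevel

    client = weaviate.connect_to_local(host=host)
    try:
        docs = client.collections.get("Docs").with_consistency_level(
            consistency_level=ConsistencyLevel.ONE  # return after the first replica answers
        )
        return docs.query.near_vector(near_vector=query_vec, limit=5)
    finally:
        client.close()
```

With replication factor 3, QUORUM waits on quorum_size(3) == 2 replicas per read, while ONE returns as soon as any single replica responds.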

Here is a detailed technical explanation:

Let us know if you have any other questions or if there’s anything else we can help with!


Hi @hanumanhuda!

Given enough resources, 500k objects will run lightning fast.

In order to get better latencies, as mentioned by my friend and colleague Mohamad, moving to a newer version than the one you are using is important, as it leverages gRPC.

The version you are using only exposes REST/HTTP endpoints. gRPC will help tremendously here, as it is much faster.

On top of that, there are a lot of other improvements.

Here you can find more information on gRPC:

And here you can find all the information needed to run Weaviate using our official helm chart:

The best way to migrate, considering you did not use our helm chart in the first place, is to spin up a new Weaviate cluster with that chart and migrate your data over using this migration guide:
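As a rough illustration, the cursor-style copy that migration guides like this describe can be sketched with the v4 client as below. The collection name, batch size, and the "default" vector key are assumptions for the sketch, not a definitive implementation:

```python
# Sketch: cursor-style migration between two clusters (Python v4 client).
# Assumes the collection already exists on both sides with the same schema.

def migrate(source_client, target_client, name: str = "Docs"):
    source = source_client.collections.get(name)
    target = target_client.collections.get(name)

    with target.batch.fixed_size(batch_size=200) as batch:
        for obj in source.iterator(include_vector=True):
            batch.add_object(
                properties=obj.properties,
                vector=obj.vector["default"],  # keep vectors to avoid re-embedding
                uuid=obj.uuid,                 # keep IDs stable across clusters
            )
```

Keeping the original UUIDs and vectors means the target cluster ends up query-identical to the source without re-running any vectorizer.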

Let me know if this helps!


We tested the latest version 1.26.1 with 3 shards on a collection containing 500k objects, using the default settings for GOMEMLIMIT and GOMAXPROCS. The results were as follows:

  • Latency for 32 RPS:
    • P50: 300 ms
    • P90: 490 ms
    • P99: 740 ms
  • Latency for 2 RPS:
    • P50: 76 ms
    • P90: 87 ms
    • P99: 92 ms

We haven’t enabled replication since our primary goal is to reduce latency, not just increase availability. Although we are currently working with 500k objects, our target is to scale a single collection to 5 million objects.

One key observation during these tests was that none of the nodes fully utilized the available compute resources. As RPS increased, latency also increased, even though there was ample memory and compute available across the nodes. This raises the question: Is there a bottleneck preventing the utilization of compute resources across multiple queries?

hi @hanumanhuda !

I believe you should try tweaking those parameters (GOMEMLIMIT and GOMAXPROCS) according to your deployment.

Also, consider that replication will also give you better room for higher QPS: Use Cases (Motivation) | Weaviate

Other index settings you can tune for better QPS are ef, efConstruction, and maxConnections. More on those options here: Vector indexes | Weaviate
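For illustration, those HNSW knobs can be set at collection creation with the v4 client as sketched below. The numeric values are illustrative starting points, not recommendations; the dynamic_ef helper mirrors how Weaviate picks ef when it is left at -1 (dynamic), using the default dynamicEfMin/Max/Factor values:

```python
# Sketch: tuning HNSW parameters at collection creation (Python v4 client).
# All numeric values are illustrative, not tuned recommendations.

def dynamic_ef(limit: int, ef_min: int = 100, ef_max: int = 500, factor: int = 8) -> int:
    """Approximate Weaviate's dynamic ef: limit * factor, clamped to [ef_min, ef_max]."""
    return min(max(limit * factor, ef_min), ef_max)

def create_tuned_collection(client, name: str = "Docs"):
    from weaviate.classes.config import Configure

    return client.collections.create(
        name=name,
        vector_index_config=Configure.VectorIndex.hnsw(
            ef=128,               # search-time beam width: higher = better recall, slower queries
            ef_construction=256,  # build-time beam width: affects index quality and import time
            max_connections=32,   # graph degree: higher = better recall, more memory
        ),
    )
```

Raising ef is the usual first lever for recall-vs-latency trade-offs at query time, since it can be changed without rebuilding the index, while efConstruction and maxConnections are fixed at build time.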

Let me know if this helps.

Thanks!

We have tried replication factor 2 with consistency level ONE during reads, but the response time was considerably higher than with the replication factor 1 setup, which surprised us. Memory usage doubled while compute usage remained the same.

Any further update on this?

Did you get the same results with replication factor 3?

No, we didn’t try RF 3. If it isn’t improving with 2 replicas, it didn’t make sense for us to go to 3 replicas. We used consistency level ONE with the 2 replicas.

We are seeing interesting memory-usage behavior for 1.5M chunks (D: 1536): it uses 28 GB during queries and 15 GB otherwise. As per our calculations, it should use at most 12 GB.
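For reference, the raw float32 arithmetic behind such an estimate can be sketched as follows; the 2x multiplier for index and GC overhead is an assumption drawn from Weaviate's resource-planning guidance, not a measurement of this cluster:

```python
# Back-of-the-envelope memory estimate for raw float32 vectors.
# The 2x overhead multiplier is a rule-of-thumb assumption, not a measurement.

def raw_vector_gib(objects: int, dims: int, bytes_per_float: int = 4) -> float:
    """GiB needed just to hold the raw float32 vectors in memory."""
    return objects * dims * bytes_per_float / 2**30

raw = raw_vector_gib(1_500_000, 1536)  # ~8.6 GiB of raw vectors
with_overhead = 2 * raw                # ~17.2 GiB with HNSW graph and GC headroom
```

The gap between a raw-vector estimate and observed usage during queries is typically the HNSW graph, Go garbage-collector headroom, and per-query working memory, which is why GOMEMLIMIT matters here.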

Also, is there a guideline for how many shards to use for best performance, based on the number of chunks?

Hey,

I saw that you started with 1.22.6 + the Python v3 client. Did you update to the Python v4 client when you updated the Weaviate version?

Yes, we did try the v4 client (gRPC). Dense query latency improved by 40%, but sparse query performance worsened by 40% at P99. This was a surprise for us, since all we changed was moving from the v3 to the v4 client and enabling gRPC on the server. Is this expected with v4 clients?