Dimensions: 1536; number of objects: 500k
We are observing high latency: 490 ms (p50), 640 ms (p90), and 1.4 s (p99) at a throughput of 32 RPS. The pod has enough CPU and memory available during the test, but it is not fully utilized. These results are not even close to the benchmarks published by Weaviate. Any suggestions to reduce the latency?
Have you tried the latest 1.26? There are A LOT of changes that may fix whatever is causing this.
Have you used our official helm chart?
Are there any resource limits in place for this cluster in k8s?
Have you changed any environment variables regarding resource planning?
No, we haven’t used the helm chart for the above tests, but we can try that with the latest version, 1.26. We haven’t changed any environment variables and are going with the default configuration. Is there any specific recommendation for this use case to make it faster?
Building on @DudaNogueira’s recommendation, I wanted to share some insights that could help improve your setup:
Running multiple nodes can lead to noticeable improvements, as queries are distributed across the nodes.
With a cluster setup (minimum 3 nodes), you’ll be able to set a replication factor of 3. The default consistency level for this is Quorum, which should work well for most cases.
To boost performance, you can set the consistency level to ONE for queries. While this trades some consistency for speed, Weaviate handles background consistency for you with repairs, so you don’t need to worry.
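If it helps, here is a minimal sketch of querying with consistency level ONE using the Python v4 client; the collection name and query vector are placeholders:

```python
import weaviate
from weaviate.classes.config import ConsistencyLevel

client = weaviate.connect_to_local()

# "Document" is a placeholder collection name; with_consistency_level applies
# to all subsequent operations performed through this collection handle.
docs = client.collections.get("Document").with_consistency_level(ConsistencyLevel.ONE)

# Read with consistency level ONE: only one replica needs to answer.
result = docs.query.near_vector(
    near_vector=[0.1] * 1536,  # your query embedding
    limit=10,
)
print(len(result.objects))
client.close()
```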
Here is a detailed technical explanation:
Let us know if you have any other questions or if there’s anything else we can help with!
Given enough resources, 500k objects will run lightning fast.
In order to get better latencies, as mentioned by my friend and colleague Mohamad, it is important to use a newer version than the one you are running, as newer versions leverage gRPC.
The version you are using only exposes REST/HTTP endpoints. gRPC will help tremendously here, as it is way faster.
On top of that, there are a lot of other improvements.
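As a quick illustration, connecting with the Python v4 client so that queries go over gRPC could look like the sketch below; host names and ports are placeholders for your own deployment:

```python
import weaviate

# Host names and ports below are placeholders; point them at the HTTP and gRPC
# services exposed by your Weaviate deployment.
client = weaviate.connect_to_custom(
    http_host="weaviate.example.internal",
    http_port=80,
    http_secure=False,
    grpc_host="weaviate-grpc.example.internal",
    grpc_port=50051,
    grpc_secure=False,
)
print(client.is_ready())
client.close()
```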
Here you can find more information on gRPC:
And here you can find all the information needed to run Weaviate using our official helm chart:
The best way to migrate here, considering you have not used our helm chart so far, is to spin up a new Weaviate cluster using that chart and migrate your data over using this migration guide:
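As a rough sketch of the idea (the migration guide is the authoritative reference), copying objects together with their stored vectors from the old cluster into the new one could look something like this; the collection name, ports, and the assumption of a single unnamed vector are illustrative:

```python
import weaviate

# Ports and the "Document" collection are assumptions for illustration; create
# the target collection with the same schema before copying data.
src = weaviate.connect_to_local(port=8080, grpc_port=50051)
dst = weaviate.connect_to_local(port=8081, grpc_port=50052)

src_col = src.collections.get("Document")
dst_col = dst.collections.get("Document")

with dst_col.batch.dynamic() as batch:
    # The cursor-based iterator streams all objects; include_vector=True re-uses
    # the stored vectors so nothing has to be re-embedded.
    for obj in src_col.iterator(include_vector=True):
        batch.add_object(
            properties=obj.properties,
            vector=obj.vector["default"],
            uuid=obj.uuid,
        )

src.close()
dst.close()
```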
We tested the latest version, 1.26.1, with 3 shards on a collection containing 500k objects, using the default settings for GOMEMLIMIT and GOMAXPROCS (a sketch of such a collection definition follows the numbers below). The results were as follows:
Latency for 32 RPS:
P50: 300 ms
P90: 490 ms
P99: 740 ms
Latency for 2 RPS:
P50: 76 ms
P90: 87 ms
P99: 92 ms
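For reference, a minimal sketch of how a collection with 3 shards can be defined with the Python v4 client; the collection name, property, and vectorizer settings here are illustrative, not our exact schema:

```python
import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_local()

# Collection name, property, and vectorizer are illustrative; the relevant part
# is sharding_config, which splits the collection into 3 shards across the cluster.
client.collections.create(
    "Document",
    properties=[Property(name="body", data_type=DataType.TEXT)],
    sharding_config=Configure.sharding(desired_count=3),
    vectorizer_config=Configure.Vectorizer.none(),  # vectors supplied by the application
)
client.close()
```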
We haven’t enabled replication since our primary goal is to reduce latency, not just increase availability. Although we are currently working with 500k objects, our target is to scale a single collection to 5 million objects.
One key observation during these tests was that none of the nodes fully utilized the available compute resources. As RPS increased, latency also increased, even though there was ample memory and compute available across the nodes. This raises the question: is there a bottleneck preventing concurrent queries from fully utilizing the available compute?
We have tried replication factor 2 with consistency level ONE on reads, but the response time was considerably higher than with the replication factor 1 setup, which surprised us. It used twice the memory, while compute usage remained the same.
No, we didn’t try RF 3. If it doesn’t improve with 2 replicas, it didn’t make sense for us to go to 3 replicas. We used consistency level ONE with the 2 replicas.
We are seeing an interesting aspect of memory usage for 1.5M chunks (D: 1536): it uses 28 GB during queries and 15 GB otherwise. As per our calculations, it should use at most 12 GB.
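For context, a back-of-the-envelope sketch of the expected footprint; the HNSW link allowance and per-link size are rough assumptions, and Go GC headroom (which GOMEMLIMIT bounds) comes on top of this:

```python
# Back-of-the-envelope estimate for 1.5M vectors at 1536 dimensions (float32).
# Only the raw vectors and a rough HNSW link allowance are counted here; Go
# runtime/GC headroom and caches come on top.
objects = 1_500_000
dims = 1536
bytes_per_float = 4

raw_vectors_gib = objects * dims * bytes_per_float / 1024**3
print(f"raw vectors: {raw_vectors_gib:.1f} GiB")   # ~8.6 GiB

# Rough HNSW graph allowance, assuming ~64 links per node at ~10 bytes each.
graph_gib = objects * 64 * 10 / 1024**3
print(f"HNSW links (rough): {graph_gib:.1f} GiB")  # ~0.9 GiB
```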
Yes, we did try the v4 client (gRPC). Dense query latency improved by 40%, but sparse query latency worsened by 40% at P99. This was a surprise for us, as we only switched the client from v3 to v4 and enabled gRPC on the server. Is this expected with the v4 client?
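For reference, the two query paths we are comparing look roughly like this with the v4 client (assuming "sparse" here means BM25/keyword queries; the collection name and inputs are placeholders):

```python
import weaviate

client = weaviate.connect_to_local()
docs = client.collections.get("Document")  # placeholder collection name

# Dense (vector) search over gRPC.
dense = docs.query.near_vector(
    near_vector=[0.1] * 1536,  # your query embedding
    limit=10,
)

# Sparse (BM25 keyword) search; this hits the inverted index rather than HNSW,
# so its latency profile can differ from the dense path under load.
sparse = docs.query.bm25(
    query="example search terms",
    limit=10,
)

print(len(dense.objects), len(sparse.objects))
client.close()
```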