I noticed the documentation says query latency is not improved by sharding—only by replication. My mental model is that, when a query arrives, the coordinator contacts every shard-hosting pod, each pod searches its local data in parallel, and the coordinator then merges the partial results. Intuitively that sounds faster as we add more shards. Where does my reasoning diverge from how Weaviate actually processes a query?
Hi @Saketh !!
Your mental model is correct. However, the bottleneck will not be searching, but merging the results.
More sharding means:
- Wait for all shards to complete their searches
- Merge and sort results from all shards
- Apply global limits and consistency checks
This means query latency is bounded by the slowest shard, not improved by parallelization.
Replication, on the other hand, will help as it provides multiple copies of the same data, allowing the system to route queries to the fastest available replica rather than waiting for a specific slow shard.
Let me know if this clarifies it for you
Happy coding!
In the case of sharding wont the slowest shard be still faster than the case of not sharding, because search in each shard is only limited to a part of the data not the entire data.
Yes, but more shards will also mean more overhead on merging results. So adding more shards will make the merging more costful, and less effective if you want increased query latency.
Oh I see, thank you @DudaNogueira !