Hybrid Search Recall Inconsistency: limit parameter significantly alters Top-1 retrieval results (False Negatives at lower limits)

Description

I am running a Hybrid Search on a collection with hundreds of thousands of objects. I am trying to retrieve a specific chunk which I know exists in the database. I have performed a controlled experiment using the exact same query but varying the alpha and limit parameters.

The Experiment: I tested three alpha settings: 0, 0.5, and 1. For each alpha, I compared the results between limit: 1000 and limit: 10000.

The Observation:

  • With limit: 1000, the target chunk was NOT found in the returned list for any of the alpha settings (0, 0.5, or 1).
  • With limit: 10000, the target chunk was successfully retrieved and, notably, it was ranked #1 in the results.

The Confusion: This behavior is counter-intuitive. My understanding is that without an explicit sort order, Weaviate returns results by relevance score (descending). Therefore, I expected the Top-1000 results of a limit=1000 query to be identical to the Top-1000 subset of a limit=10000 query.

The fact that the #1 ranked item (at limit 10k) completely disappears when the limit is reduced to 1k suggests that the limit parameter is implicitly controlling the search depth (e.g., HNSW ef parameter or WAND pruning threshold) rather than just truncating the final result list.
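The expectation that a smaller limit should simply truncate the larger result list can be written down as a quick check. This is a minimal sketch in plain Python: it assumes the object IDs from each query have already been collected into ranked lists, and the helper names `rank_of` and `prefix_mismatches` are hypothetical, not part of any Weaviate client.

```python
from typing import List, Optional, Sequence, Tuple


def rank_of(target_id: str, ranked_ids: Sequence[str]) -> Optional[int]:
    """Return the 1-based rank of target_id in ranked_ids, or None if absent."""
    for position, object_id in enumerate(ranked_ids, start=1):
        if object_id == target_id:
            return position
    return None


def prefix_mismatches(
    small: Sequence[str], large: Sequence[str]
) -> List[Tuple[int, str, Optional[str]]]:
    """List every rank where the small-limit result differs from the large-limit one.

    Each entry is (1-based rank, id from the small-limit query,
    id from the large-limit query or None if that list is shorter).
    An empty list means the small result is a pure prefix of the large one.
    """
    mismatches = []
    for index, small_id in enumerate(small):
        large_id = large[index] if index < len(large) else None
        if small_id != large_id:
            mismatches.append((index + 1, small_id, large_id))
    return mismatches


# Toy data standing in for the IDs returned at two different limits.
ids_small_limit = ["b", "c"]
ids_large_limit = ["a", "b", "c"]

print(rank_of("a", ids_small_limit))  # target missing at the smaller limit
print(rank_of("a", ids_large_limit))  # target ranked first at the larger limit
print(prefix_mismatches(ids_small_limit, ids_large_limit))
```

If limit only truncated the final list, `prefix_mismatches` would come back empty for every alpha. In my experiment it would not, because the item ranked #1 at limit 10000 is absent at limit 1000.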

My Concerns & Questions:

  1. Search Scope: Does Weaviate dynamically adjust the underlying ANN search scope (ef) or the inverted index pruning aggressiveness based on the requested limit?
  2. Reliability: If the search scope is indeed tied to the limit, I am concerned about false negatives. If a target chunk is missed at limit=10000 (which is often the default hard cap), is there any way to ensure it is considered during the retrieval phase?
  3. Configuration: How can I configure the search to ensure that high-scoring candidates are not pruned early in the process, even if I use a smaller limit? (I am already using ID filters to narrow down the scope, but the candidate pool remains large).

I would appreciate any insights into the underlying mechanism and advice on how to guarantee recall for top-ranking items without having to request excessively large limits.

Server Setup Information

  • Weaviate Server Version: 1.31.2
  • Deployment Method: docker
  • Multi Node? Number of Running Nodes: 1 node
  • Client Language and Version: 4.15.2
  • Multitenancy?: No

Any additional Information

Hi @Carloszone!

Can you reproduce this on the latest version? We have had a lot of fixes since 1.31.2, and this will help us narrow it down from the get-go.

Try to always be on at least 1.31.latest, as we backport the most important fixes.

However, it is important to note:

Yes, Weaviate dynamically adjusts search scope based on the limit parameter. The limit affects both the HNSW ef parameter for vector search and the WAND pruning threshold for BM25 search, which explains why your top-ranked item appears at limit=10000 but disappears at limit=1000.
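To make the vector side of this concrete, here is a rough sketch of how the search-time ef could be derived from the limit. It assumes the dynamic ef behavior and defaults described in the vector index configuration docs (ef = -1, dynamicEfMin = 100, dynamicEfMax = 500, dynamicEfFactor = 8), and the function name `effective_ef` is made up for illustration. Treat it as an approximation and check it against the version you are running.

```python
def effective_ef(
    limit: int,
    ef: int = -1,
    dynamic_ef_min: int = 100,
    dynamic_ef_max: int = 500,
    dynamic_ef_factor: int = 8,
) -> int:
    """Approximate the ef used at query time for a given limit.

    Assumption: with ef = -1 the value is limit * factor, clamped to
    [dynamic_ef_min, dynamic_ef_max]; in every case ef is raised to the
    limit when the limit is larger, so results are not cut off early.
    """
    if ef < 1:
        # Dynamic ef: scale with the limit, then clamp to the configured bounds.
        ef = limit * dynamic_ef_factor
        ef = max(dynamic_ef_min, min(ef, dynamic_ef_max))
    # ef can never be smaller than the number of results requested.
    return max(ef, limit)


for requested_limit in (10, 100, 1000, 10000):
    print(requested_limit, effective_ef(requested_limit))
```

Under these assumptions, limit=1000 searches with ef=1000 and limit=10000 with ef=10000, so the larger limit explores a candidate list ten times wider. Setting an explicit ef higher than the limits you normally use is one knob to try if you need a wider search at a small limit.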

If you want to dive into a more detailed explanation tied to our codebase, along with suggested knobs to change in your configuration, check out this cool link :wink:

Let me know if this helps!