Different Results for the Same Query in Hybrid Search

Description

Hybrid search returns inconsistent results: running the exact same query more than once yields different results.

Index Config:

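import weaviate
from weaviate.classes.config import (
    Configure,
    ReplicationDeletionStrategy,
    VectorDistances,
    VectorFilterStrategy,
)

client = weaviate.connect_to_local()
try:
    client.collections.create(
        "Test",
        # One named vector ("default") with no vectorizer: vectors are supplied by the caller.
        vectorizer_config=[
            Configure.NamedVectors.none(
                name="default",
                vector_index_config=Configure.VectorIndex.hnsw(
                    distance_metric=VectorDistances.COSINE,
                    ef=170,               # query-time candidate list size
                    ef_construction=256,  # build-time candidate list size
                    max_connections=32,   # max edges per node in the HNSW graph
                    filter_strategy=VectorFilterStrategy.ACORN,  # strategy for filtered vector search
                ),
            )
        ],
        # properties=properties,
        # BM25 tuning: b = document-length normalization, k1 = term-frequency saturation.
        inverted_index_config=Configure.inverted_index(
            bm25_b=0.7,
            bm25_k1=1.25,
            index_null_state=True,  # allow filtering on null values
        ),
        # Three replicas with asynchronous replication;
        # deletion conflicts between replicas are resolved by timestamp.
        replication_config=Configure.replication(
            factor=3,
            async_enabled=True,
            deletion_strategy=ReplicationDeletionStrategy.TIME_BASED_RESOLUTION,
        ),
        # A single shard (replicated across the nodes).
        sharding_config=Configure.sharding(
            desired_count=1,
        ),
    )
finally:
    client.close()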
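For reference, `bm25_k1` and `bm25_b` in the config above are the k_1 and b parameters of the textbook Okapi BM25 formula, shown here with this config's values. Weaviate implements a BM25F variant, so treat this as an approximation of how keyword scores such as `original score 19.018465` below are computed, not an exact description of Weaviate's implementation.

```latex
\operatorname{score}(D, Q) = \sum_{q \in Q} \operatorname{IDF}(q)\,
  \frac{f(q, D)\,(k_1 + 1)}
       {f(q, D) + k_1\left(1 - b + b\,\frac{|D|}{\operatorname{avgdl}}\right)},
\qquad k_1 = 1.25,\quad b = 0.7
```

Here f(q, D) is the frequency of term q in document D, |D| is the document length, and avgdl is the average document length. k_1 limits how much repeated occurrences of a term can raise the score, and b sets how strongly documents longer than avgdl are penalized.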

Server Setup Information

  • Weaviate Server Version: 1.30.3
  • Deployment Method: Kubernetes (K8s), 3-node cluster
  • Multi Node? Number of Running Nodes: 3
  • Client Language and Version: Python
  • Multitenancy?: No

Any additional Information

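Below is the explainScore output for the top result of each run. The two runs return different document IDs at the top. The document whose ID starts with 29205c82 (the top result of the first run) appears 6th, with the same score, in the second run's results.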

Alpha = 0.5
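For context on what `alpha` controls: Weaviate's documented relativeScoreFusion method scales the vector and keyword result sets to [0, 1] (best = 1, worst = 0) and sums them, weighting the vector side by alpha and the keyword side by 1 - alpha, so alpha = 0.5 weights both equally. The sketch below is a simplified illustration under those assumptions, not Weaviate's code; the handling of all-equal scores and the ID tie-break are assumptions of this sketch, and it is not meant to reproduce the exact normalized scores shown below.

```python
"""Simplified sketch of relative score fusion for hybrid search.

Assumptions (not Weaviate's actual implementation): each result set is
min-max normalized to [0, 1], the vector set is weighted by alpha, the
keyword set by (1 - alpha), and the weighted values are summed per document.
"""


def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale raw scores so the best becomes 1.0 and the worst 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: if all scores are equal, treat every document as the best match.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def relative_score_fusion(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Return (doc_id, fused_score) pairs sorted from best to worst."""
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    fused: dict[str, float] = {}
    # A document missing from one result set simply gets no contribution from it.
    for doc, s in vec.items():
        fused[doc] = fused.get(doc, 0.0) + alpha * s
    for doc, s in kw.items():
        fused[doc] = fused.get(doc, 0.0) + (1 - alpha) * s
    # Sort by fused score descending; ties are broken by ID (an assumption of this sketch).
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

For example, `relative_score_fusion({"a": 0.9, "b": 0.5, "c": 0.1}, {"b": 12.0, "c": 4.0})` ranks "b" first: it is mid-range on the vector side but the best keyword match, which outweighs "a" being the best vector match alone.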

Top Result 1st Run:

  1. Hybrid (Result Set vector,hybridVector) Document 29205c82-d8d9-40e0-98ad-287e068f8324: original score 0.22071475, normalized score: 0.7

Top Result 2nd Run:

  1. Hybrid (Result Set keyword,bm25) Document d81e99c4-8917-4203-b78d-28925caee313: original score 19.018465, normalized score: 0.036954176
  2. Hybrid (Result Set vector,hybridVector) Document d81e99c4-8917-4203-b78d-28925caee313: original score 0.35757267, normalized score: 0.7
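To see exactly how far documents move between two runs of the same query (such as 29205c82 dropping from 1st to 6th), the two ranked ID lists can be diffed directly. A minimal plain-Python sketch; `rank_changes` is a hypothetical helper. In the usage block, only 29205c82's 1st/6th positions and d81e99c4's 1st place in the second run come from the report above; "p1" to "p4" and d81e99c4's first-run rank are placeholders.

```python
from typing import Optional


def rank_changes(
    run_a: list[str], run_b: list[str]
) -> dict[str, tuple[Optional[int], Optional[int]]]:
    """Map each document ID whose rank differs to its 1-based rank in each run.

    A rank of None means the document is absent from that run's results.
    Documents with the same rank in both runs are left out.
    """
    pos_a = {doc: i + 1 for i, doc in enumerate(run_a)}
    pos_b = {doc: i + 1 for i, doc in enumerate(run_b)}
    changes: dict[str, tuple[Optional[int], Optional[int]]] = {}
    # dict.fromkeys de-duplicates while keeping first-seen order, so output order is stable.
    for doc in dict.fromkeys(run_a + run_b):
        before, after = pos_a.get(doc), pos_b.get(doc)
        if before != after:
            changes[doc] = (before, after)
    return changes


if __name__ == "__main__":
    # Placeholder rankings: "p1".."p4" and d81e99c4's run-1 position are made up.
    run_1 = ["29205c82", "d81e99c4", "p1", "p2", "p3", "p4"]
    run_2 = ["d81e99c4", "p1", "p2", "p3", "p4", "29205c82"]
    for doc, (before, after) in rank_changes(run_1, run_2).items():
        print(f"{doc}: rank {before} -> {after}")
```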

Hi @apattnaik!

Do you also see this behavior with pure vector and BM25 searches?

How big is your dataset?

This may be caused by an inconsistency that occurred during indexing. Would it be possible to reindex this dataset and test again? Depending on its size, you could even migrate it within the same cluster.

Here we have a guide on how to move data around:

Let me know if this helps!

Hey @DudaNogueira

The index contains 2.1 million documents, with 1 shard and 2 replicas.
This occurs for both near-vector and hybrid searches; only BM25 search gives consistent results.
Yes, the index is recreated every day, and this is a recurring issue: if it does not happen for one query, it happens for another.
We only find out when a user points it out.
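One way to catch this before users report it is to replay a fixed set of probe queries several times per search mode and flag any query whose top-k ordering changes between runs. A minimal sketch, assuming `search` is a hypothetical wrapper around one search mode that returns document IDs in rank order; the probe queries, `runs`, and `k` values are also illustrative.

```python
from typing import Callable, Sequence


def find_unstable_queries(
    search: Callable[[str], Sequence[str]],
    queries: Sequence[str],
    runs: int = 5,
    k: int = 10,
) -> dict[str, int]:
    """Replay each query `runs` times and report the ones whose top-k changes.

    `search` is a hypothetical wrapper around one search mode (BM25,
    near-vector, or hybrid) returning document IDs in rank order.
    The result maps each unstable query to the number of distinct
    top-k orderings observed; stable queries are omitted.
    """
    unstable: dict[str, int] = {}
    for query in queries:
        # Tuples are hashable, so identical orderings collapse into one set entry.
        orderings = {tuple(search(query)[:k]) for _ in range(runs)}
        if len(orderings) > 1:
            unstable[query] = len(orderings)
    return unstable
```

With the v4 Python client, `search` could for example be `lambda q: [str(o.uuid) for o in collection.query.hybrid(query=q, alpha=0.5, limit=10).objects]`. Running the check once per mode would show whether only the vector-based modes drift, as described above.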