Issue with Weaviate Hybrid Search (Alpha = 1) Not Returning Exact Match

I am using Weaviate hybrid search with alpha=1 to perform a pure vector search in some scenarios. Below is my query implementation:

Query Code:

response = self.collection.query.hybrid(
    query=query_text.strip(),
    query_properties=[field_name],
    alpha=alpha,  # Set to 1 for pure vector search
    target_vector=[f"{field_name}_embeddings"],
    max_vector_distance=0.5,
    return_metadata=MetadataQuery(score=True, explain_score=True, distance=True),
    filters=search_filter,
    limit=top_k
)

Filtering Results Based on Score:

for obj in response.objects:
    if obj.metadata.score >= score_threshold:
        print(obj.properties)

Problem Statement

I am searching for the text “abc”, which exists exactly as “abc” in my dataset.

However, instead of retrieving the exact match, Weaviate returns some other results with higher scores.

Possible Reasons & Questions

  1. Why is the exact match “abc” not appearing in the response?
  2. Could this be due to how Weaviate computes similarity scores for vector search?
  3. Am I missing something in my query setup?

Any insights would be appreciated!

hi @Rohini_vaidya !!

When you set alpha to 1, hybrid search performs only the vector search, but it still normalizes the vector distance into a score, even though there is no actual keyword/BM25 phase.

So even if you search for a specific keyword with hybrid and alpha=1, the results are ranked by vector similarity, not by keyword match.
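
To illustrate why an exact keyword match gets no boost at alpha=1, here is a simplified sketch of a weighted score fusion. It is not Weaviate's actual implementation: the helper names `min_max_normalize` and `fuse_scores` and the sample scores are made up for illustration, and it assumes each result set is min-max normalized and then combined as alpha * vector + (1 - alpha) * keyword.

```python
def min_max_normalize(scores):
    """Scale a dict of raw scores to the 0..1 range (higher is better)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def fuse_scores(vector_scores, keyword_scores, alpha):
    """Combine normalized scores; alpha is the weight of the vector side."""
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    fused = {}
    for doc_id in set(vec) | set(kw):
        fused[doc_id] = alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
    # best score first
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores: "abc" matches the keyword exactly, but its embedding
# is not the one closest to the query embedding.
vector_scores = {"abc": 0.70, "abd": 0.90, "xyz": 0.40}
keyword_scores = {"abc": 3.2, "abd": 0.0, "xyz": 0.0}

pure_vector = fuse_scores(vector_scores, keyword_scores, alpha=1.0)
balanced = fuse_scores(vector_scores, keyword_scores, alpha=0.5)

print("alpha=1.0 top result:", pure_vector[0][0])
print("alpha=0.5 top result:", balanced[0][0])
```

With these made-up numbers, the keyword side contributes nothing at alpha=1, so the exact match only wins once alpha is lowered.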

One thing that may help you here (and we were just discussing this internally earlier this week) is to adapt your alpha based on the query length.

So for example, if your query is short, your user may have some keywords in mind, so a lower alpha may make more sense.

The opposite also holds: for a longer query, a higher alpha may make more sense.

But bear in mind this may or may not make sense for your use case :slight_smile:

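def adaptative_alpha(query, threshold=5, alpha=0.5):
    # count the unique whitespace-separated words in the query
    query_length = len(set(query.split()))
    print("QL", query_length, "thereshold", threshold, "alpha", alpha)
    # short queries keep the default alpha (keyword and vector balanced)
    if query_length < threshold:
        return alpha
    # longer queries push alpha toward 1 (more weight on the vector search)
    return 1 - 1/query_length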
    
# generate some tests
examples = [
    "one two three",
    "one two three four",
    "one two three four five",
    "one two three four five six",
    "one two three four five six seven",
    "hello world is something used to give more traction",
    "hello world is something used to give more traction and more",
    "hello world is something used to give more traction and more and more",
    "if there something we can add here that is very big, should increase",
    "if there something we can add here that is very big, should increase one two three four five six seven eight nine ten",
]   
for example in examples:
    print("###"*10)
    print(example, adaptative_alpha(example))

Output:
##############################
QL 3 thereshold 5 alpha 0.5
one two three 0.5
##############################
QL 4 thereshold 5 alpha 0.5
one two three four 0.5
##############################
QL 5 thereshold 5 alpha 0.5
one two three four five 0.8
##############################
QL 6 thereshold 5 alpha 0.5
one two three four five six 0.8333333333333334
##############################
QL 7 thereshold 5 alpha 0.5
one two three four five six seven 0.8571428571428572
##############################
QL 9 thereshold 5 alpha 0.5
hello world is something used to give more traction 0.8888888888888888
##############################
QL 10 thereshold 5 alpha 0.5
hello world is something used to give more traction and more 0.9
##############################
QL 10 thereshold 5 alpha 0.5
hello world is something used to give more traction and more and more 0.9
##############################
QL 13 thereshold 5 alpha 0.5
if there something we can add here that is very big, should increase 0.9230769230769231
##############################
QL 23 thereshold 5 alpha 0.5
if there something we can add here that is very big, should increase one two three four five six seven eight nine ten 0.9565217391304348
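
To plug this into the query from the question, something like the sketch below could work. The wrapper name `hybrid_with_adaptative_alpha` is hypothetical, `collection` is assumed to be a Weaviate v4 collection object, and the `<field>_embeddings` named vector is taken from the original query. The function repeats the rule above without the debug print so the block runs on its own.

```python
def adaptative_alpha(query, threshold=5, alpha=0.5):
    """Default alpha for short queries, closer to 1 for longer ones."""
    query_length = len(set(query.split()))
    if query_length < threshold:
        return alpha
    return 1 - 1 / query_length


def hybrid_with_adaptative_alpha(collection, query_text, field_name, top_k=10):
    """Run a hybrid query whose alpha depends on the query length.

    Assumption: `collection` exposes `query.hybrid(...)` like a Weaviate v4
    collection, and a named vector called "<field_name>_embeddings" exists.
    """
    query_text = query_text.strip()
    return collection.query.hybrid(
        query=query_text,
        query_properties=[field_name],
        alpha=adaptative_alpha(query_text),
        target_vector=[f"{field_name}_embeddings"],
        limit=top_k,
    )
```

A one-word query such as "abc" would then run with alpha=0.5, so the keyword phase still contributes to the ranking.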

Let me know if this helps!

Thanks!

Thank you @DudaNogueira for your response !!!

I will check the solution you suggested and get back to you.
