Scores for Hybrid search

Rohini_vaidya · January 2, 2025, 5:16pm

Hi, I am trying to perform a hybrid search with alpha=0.5 and am getting responses, but I would like to validate the search process. To do so, I printed the scores from the metadata. However, I observed that for every document in the results, the score is the same, i.e., 0.5.

Here is my code snippet:

from weaviate.classes.query import HybridFusion, MetadataQuery

response = collection.query.hybrid(
    query="my query",
    query_properties=["test"],
    alpha=0.5,
    fusion_type=HybridFusion.RELATIVE_SCORE,
    return_metadata=MetadataQuery(score=True, explain_score=True),
    limit=5,
)

for o in response.objects:
    print(o.properties)
    print(o.metadata.score, o.metadata.explain_score)

Example Output:

Document 1 Result:
Hybrid (Result Set keyword, bm25) Document 854678e5-af01-4658-b55f-0427c0544a32:
Original score: 15.578449, Normalized score: 0.5
Document 2 Result:
Hybrid (Result Set vector, hybridVector) Document d745302e-84a6-42d9-bee1-5b104be285e3:
Original score: 0.7183615, Normalized score: 0.5

Observations:

When alpha=0.5, the normalized scores for all documents are the same (0.5), irrespective of the query type.
When experimenting with alpha:

At alpha=0 (pure keyword search), the scores differ for each document.
At alpha=1 (pure vector search), the normalized scores are again the same for all documents.

Query: Am I missing something in my implementation or understanding of the hybrid search process? Could you provide any suggestions or clarifications on this behavior?

DudaNogueira · January 2, 2025, 5:55pm

hi @Rohini_vaidya !!

How many objects are there?

What I find odd is that there appears to be no vector distance for Document 1.

The keyword (bm25) score is only counted if the object actually contains at least one of the query tokens.

However, the vector distance should always be present.

For example:

import weaviate.classes as wvc

# `client` is assumed to be an already connected Weaviate client
client.collections.delete("Test")
collection = client.collections.create("Test", vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai())
collection.data.insert({"text": "a beautiful dog"})
collection.data.insert({"text": "a nice cat"})

r = collection.query.hybrid(query="cat", return_metadata=wvc.query.MetadataQuery(score=True, explain_score=True), alpha=0.5)
for o in r.objects:
    print("#" * 10)  # separator line between results
    print(o.properties)
    print(o.metadata.explain_score)

and this is the output:

##########
{'text': 'a nice cat'}

Hybrid (Result Set keyword,bm25) Document 2b19e024-5261-46d3-85ae-5110806d081c: original score 0.3150669, normalized score: 0.5 -
Hybrid (Result Set vector,hybridVector) Document 2b19e024-5261-46d3-85ae-5110806d081c: original score 0.4917053, normalized score: 0.5
##########
{'text': 'a beautiful dog'}

Hybrid (Result Set vector,hybridVector) Document 34a189b1-4fd5-4b4f-8e77-23e9e2ea2def: original score 0.32074285, normalized score: 0
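
The normalized values in this output can be reproduced by hand. Here is a minimal sketch, assuming relative score fusion min-max normalizes each result set and then multiplies it by its weight (alpha for the vector set, 1 - alpha for the keyword set). The function name and the handling of a set whose scores are all equal are assumptions for illustration, not Weaviate's actual code:

```python
def relative_score_fusion(keyword_scores, vector_scores, alpha):
    """Hypothetical helper: min-max normalize each result set, weight it, then sum."""

    def normalize(scores, weight):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        weighted = {}
        for doc_id, score in scores.items():
            # Assumption: a set with a single hit (or all-equal scores) normalizes to 1.0
            norm = 1.0 if hi == lo else (score - lo) / (hi - lo)
            weighted[doc_id] = norm * weight
        return weighted

    kw = normalize(keyword_scores, 1 - alpha)
    vec = normalize(vector_scores, alpha)
    fused = {d: kw.get(d, 0.0) + vec.get(d, 0.0) for d in set(kw) | set(vec)}
    return kw, vec, fused


# Original scores taken from the output above
kw, vec, fused = relative_score_fusion(
    keyword_scores={"cat": 0.3150669},
    vector_scores={"cat": 0.4917053, "dog": 0.32074285},
    alpha=0.5,
)
print(kw, vec, fused)
```

Under these assumptions the cat gets 0.5 from each set and the dog gets 0 from the vector set, which matches the explain_score lines. It also suggests why alpha=0.5 can show 0.5 so often on a tiny dataset: the top hit of each result set always contributes exactly its weight.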

Let me know if you can provide a dataset where this is reproducible.

Thanks!

Rohini_vaidya · January 3, 2025, 4:00am

Thank you, @DudaNogueira.

I have a total of three objects in my dataset, but I’m unable to share the dataset itself. When I tested the example you provided, it returned the correct results.

Now, I’m a bit confused about why this discrepancy is occurring in my case. Could you help to clarify?

DudaNogueira · January 3, 2025, 1:59pm

Feel free to reach out to me in our public slack.

We could then do a screen sharing session so I can take a closer look.

I am assuming you are using the latest version on the server, right?

Rohini_vaidya · January 4, 2025, 5:47am

Thank you, @DudaNogueira.

Could you please provide a reference document or guidance on importing multiple vectors from a dictionary into a collection?

Scenario:
I have a dictionary with key-value pairs in the format string: [vector]. I need to import these vectors into a collection. For example:

vector = {
    "a_vector": strings_map[row["a"]],
    "b_vector": strings_map[row["b"]],
    "c_vector": strings_map[row["c"]]
}

Here, strings_map is a dictionary with key-value pairs in the format string: [vector]. I am mapping specific keys (a, b, c) from a row object to the corresponding vector values in strings_map.

How can I efficiently import such vectors into a collection?

DudaNogueira · January 6, 2025, 2:00pm

hi @Rohini_vaidya !!

Here is how you can do that:

vectors = {
    "a_vector": [1,2,3],
    "b_vector": [1,2,3,4],
    "c_vector": [1,2,3,4,5]
}

client.collections.delete("Test")
collection = client.collections.create(
    "Test", 
    vectorizer_config=[
        wvc.config.Configure.NamedVectors.none(name="a_vector"),
        wvc.config.Configure.NamedVectors.none(name="b_vector"),
        wvc.config.Configure.NamedVectors.none(name="c_vector"),
    ]
)
collection.data.insert(
    properties={"text": "music for running", "brand": "Bosch"},
    vector=vectors
)

Now you can get your objects:

query = collection.query.fetch_objects(include_vector=True)
print(query.objects[0].properties)
print(query.objects[0].vector)

# outputs:
# {'text': 'music for running', 'brand': 'Bosch'}
# {'a_vector': [1.0, 2.0, 3.0], 'b_vector': [1.0, 2.0, 3.0, 4.0], 'c_vector': [1.0, 2.0, 3.0, 4.0, 5.0]}

You can also search using near_vector:

query = collection.query.near_vector(
    near_vector=[5,4,3,2,1],
    include_vector=True,
    target_vector="c_vector",
    return_metadata=wvc.query.MetadataQuery(distance=True),
)
print(query.objects[0].properties)
print(query.objects[0].vector)
print(query.objects[0].metadata.distance)

# outputs:
# {'text': 'music for running', 'brand': 'Bosch'}
# {'c_vector': [1.0, 2.0, 3.0, 4.0, 5.0], 'a_vector': [1.0, 2.0, 3.0], 'b_vector': [1.0, 2.0, 3.0, 4.0]}
# 0.3636362552642822
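
For many rows, as in your strings_map scenario, the per-object dictionaries can be built first and sent in one batch. This is a rough sketch: build_objects and the sample data are made up for illustration, and the commented-out call assumes the v4 Python client's insert_many together with wvc.data.DataObject:

```python
def build_objects(rows, strings_map):
    """Hypothetical helper: pair each row's properties with its named vectors."""
    objects = []
    for row in rows:
        objects.append({
            "properties": dict(row),
            "vector": {
                "a_vector": strings_map[row["a"]],
                "b_vector": strings_map[row["b"]],
                "c_vector": strings_map[row["c"]],
            },
        })
    return objects


# Made-up sample data, for illustration only
strings_map = {"x": [1, 2, 3], "y": [1, 2, 3, 4], "z": [1, 2, 3, 4, 5]}
rows = [{"a": "x", "b": "y", "c": "z"}]
objects = build_objects(rows, strings_map)

# With a connected client, the entries could then be inserted in one call, e.g.:
# collection.data.insert_many(
#     [wvc.data.DataObject(properties=o["properties"], vector=o["vector"]) for o in objects]
# )
```

Each named vector must keep the same length across objects, since every named vector has its own index.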

Let me know if this helps!

Thanks!

Rohini_vaidya · January 6, 2025, 4:19pm

Thank you, @DudaNogueira.
This works for me.
