Scores for Hybrid search

Hi, I am trying to perform a hybrid search with alpha=0.5 and am getting responses, but I would like to validate the search process. To do so, I printed the scores from the metadata. However, I observed that for every document in the results, the score is the same, i.e., 0.5.

Here is my code snippet:

from weaviate.classes.query import HybridFusion, MetadataQuery

response = collection.query.hybrid(
    query="my query",
    query_properties=["test"],
    alpha=0.5,
    fusion_type=HybridFusion.RELATIVE_SCORE,
    return_metadata=MetadataQuery(score=True, explain_score=True),
    limit=5,
)

for o in response.objects:
    print(o.properties)
    print(o.metadata.score, o.metadata.explain_score)

Example Output:

  • Document 1 Result:
    Hybrid (Result Set keyword, bm25) Document 854678e5-af01-4658-b55f-0427c0544a32:
    Original score: 15.578449, Normalized score: 0.5
  • Document 2 Result:
    Hybrid (Result Set vector, hybridVector) Document d745302e-84a6-42d9-bee1-5b104be285e3:
    Original score: 0.7183615, Normalized score: 0.5

Observations:

  1. When alpha=0.5, the normalized scores for all documents are the same (0.5), irrespective of the query type.
  2. When experimenting with alpha:
     • At alpha=0 (pure keyword search), the scores differ for each document.
     • At alpha=1 (pure vector search), the normalized scores are again the same for all documents.

Query: Am I missing something in my implementation or understanding of the hybrid search process? Could you provide any suggestions or clarifications on this behavior?

hi @Rohini_vaidya !!

How many objects are there?

What I find weird is that there seems to be no vector distance for Document 1.

The keyword (BM25) score is only counted if the object actually contains at least one of the query tokens.

However, the vector distance should always be present.

For example:

import weaviate.classes as wvc

# Assumes `client` is an already connected Weaviate client.
client.collections.delete("Test")
collection = client.collections.create("Test", vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai())
collection.data.insert({"text": "a beautiful dog"})
collection.data.insert({"text": "a nice cat"})

r = collection.query.hybrid(
    query="cat",
    alpha=0.5,
    return_metadata=wvc.query.MetadataQuery(score=True, explain_score=True),
)
for o in r.objects:
    print("#" * 10)  # separator line between results
    print(o.properties)
    print(o.metadata.explain_score)

and this is the output:

##########
{'text': 'a nice cat'}

Hybrid (Result Set keyword,bm25) Document 2b19e024-5261-46d3-85ae-5110806d081c: original score 0.3150669, normalized score: 0.5 -
Hybrid (Result Set vector,hybridVector) Document 2b19e024-5261-46d3-85ae-5110806d081c: original score 0.4917053, normalized score: 0.5
##########
{'text': 'a beautiful dog'}

Hybrid (Result Set vector,hybridVector) Document 34a189b1-4fd5-4b4f-8e77-23e9e2ea2def: original score 0.32074285, normalized score: 0
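
To make the numbers above easier to read, here is a rough sketch of how the normalized scores seem to be derived. It is inferred from this output and is not taken from the server source: each result set looks min-max normalized, then multiplied by its weight (alpha for vector, 1 - alpha for keyword). The function name `relative_score_fusion` and the rule that a result set with a single score gets the full weight are assumptions that happen to match the keyword line above.

```python
def relative_score_fusion(keyword_scores, vector_scores, alpha):
    """Illustrative sketch only: min-max normalize each result set,
    weight it, and sum the contributions per document."""

    def normalize(scores, weight):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        # Assumption: when all scores in a set are equal (or there is
        # only one), each document receives the full weight.
        return {
            doc: weight * ((s - lo) / span if span > 0 else 1.0)
            for doc, s in scores.items()
        }

    kw = normalize(keyword_scores, 1 - alpha)   # keyword (BM25) share
    vec = normalize(vector_scores, alpha)       # vector share
    return {doc: kw.get(doc, 0.0) + vec.get(doc, 0.0) for doc in set(kw) | set(vec)}


# Original scores copied from the output above.
keyword = {"cat": 0.3150669}
vector = {"cat": 0.4917053, "dog": 0.32074285}
fused = relative_score_fusion(keyword, vector, alpha=0.5)
print(fused)  # cat -> 1.0 (0.5 keyword + 0.5 vector), dog -> 0.0
```

Under this reading, a normalized score of 0.5 at alpha=0.5 just means the document was the top hit in that result set.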

Let me know if you can provide a dataset where this is reproducible.

Thanks!

Thank you, @DudaNogueira.

I have a total of three objects in my dataset, but I’m unable to share the dataset itself. When I tested the example you provided, it returned the correct results.

Now, I’m a bit confused about why this discrepancy is occurring in my case. Could you help to clarify?

Feel free to reach out to me in our public slack.

We could then do a screen sharing session so I can take a closer look.

I am assuming you are using the latest version on the server, right?

Thank you, @DudaNogueira.

Could you please provide a reference document or guidance on importing multiple vectors from a dictionary into a collection?

Scenario:
I have a dictionary with key-value pairs in the format string: [vector]. I need to import these vectors into a collection. For example:

vector = {
    "a_vector": strings_map[row["a"]],
    "b_vector": strings_map[row["b"]],
    "c_vector": strings_map[row["c"]]
}

Here, strings_map is a dictionary of string: [vector] pairs. For each row object, I look up the values of its keys (a, b, c) in strings_map to get the corresponding vectors.

How can I efficiently import such vectors into a collection?

hi @Rohini_vaidya !!

Here is how you can do that:

vectors = {
    "a_vector": [1,2,3],
    "b_vector": [1,2,3,4],
    "c_vector": [1,2,3,4,5]
}

client.collections.delete("Test")
collection = client.collections.create(
    "Test", 
    vectorizer_config=[
        wvc.config.Configure.NamedVectors.none(name="a_vector"),
        wvc.config.Configure.NamedVectors.none(name="b_vector"),
        wvc.config.Configure.NamedVectors.none(name="c_vector"),
    ]
)
collection.data.insert(
    properties={"text": "music for running", "brand": "Bosch"},
    vector=vectors
)

Now you can fetch your objects:

query = collection.query.fetch_objects(include_vector=True)
print(query.objects[0].properties)
print(query.objects[0].vector)

# outputs:
# {'text': 'music for running', 'brand': 'Bosch'}
# {'a_vector': [1.0, 2.0, 3.0], 'b_vector': [1.0, 2.0, 3.0, 4.0], 'c_vector': [1.0, 2.0, 3.0, 4.0, 5.0]}

You can also search using near_vector:

query = collection.query.near_vector(
    near_vector=[5,4,3,2,1],
    target_vector="c_vector",
    include_vector=True,
    return_metadata=wvc.query.MetadataQuery(distance=True),
)
print(query.objects[0].properties)
print(query.objects[0].vector)
print(query.objects[0].metadata.distance)

# outputs:
# {'text': 'music for running', 'brand': 'Bosch'}
# {'c_vector': [1.0, 2.0, 3.0, 4.0, 5.0], 'a_vector': [1.0, 2.0, 3.0], 'b_vector': [1.0, 2.0, 3.0, 4.0]}
# 0.3636362552642822
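
For many rows, the same idea can be sketched with the batch API instead of one insert per object. This assumes the v4 Python client, where `collection.batch.dynamic()` is a context manager and `batch.add_object` accepts a `vector` dict keyed by named vector. The helper names `named_vectors_for` and `import_rows`, and the column names a, b, c, are only illustrative and follow the scenario in the question.

```python
def named_vectors_for(row, strings_map, columns=("a", "b", "c")):
    """Look up each column value of `row` in `strings_map` and key the
    resulting vector as '<column>_vector'."""
    return {f"{col}_vector": strings_map[row[col]] for col in columns}


def import_rows(collection, rows, strings_map):
    """Batch-insert `rows`, attaching one named vector per column.

    `collection` is assumed to be a Weaviate v4 collection whose named
    vectors (a_vector, b_vector, c_vector) were configured at creation.
    """
    with collection.batch.dynamic() as batch:
        for row in rows:
            batch.add_object(
                properties=dict(row),
                vector=named_vectors_for(row, strings_map),
            )
```

Usage would look like `import_rows(collection, rows, strings_map)`, where `rows` is any iterable of dicts with the keys a, b and c.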

Let me know if this helps!

Thanks!

Thank you, @DudaNogueira.
This works for me.