Inconsistency Between Near Vector Score and Hybrid Explained Vector Scores

Landon_Edwards · February 3, 2024, 9:56am

TL;DR: The explained vector similarity score included as supplemental information when performing a hybrid search differs significantly from the score returned by a near_vector search for the same two objects.

In my current setup, Class A and Class B both use the same vectorizer and the cosine distance metric, which enables the certainty metric to be returned.

I have noticed the following inconsistency:

First, I pull a vector V1 from Class A and search within Class B using the near_vector operator.

Suppose that one of the results returned for this search is Object Z1 from Class B, with score S_z.

Next, I perform a hybrid search on Class B with the same vector used above, and ensure that Object Z1 is returned by using a where filter.

By enabling the explained metrics in the response, I can see that the explained (unnormalized) vector score returned for Z1 differs from the one obtained from the near vector search above.
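One way to get a reference value that does not depend on either query path is to compute the cosine distance directly from the two raw vectors. Below is a minimal sketch in plain Python, assuming both vectors have already been fetched as lists of floats. The function name is only illustrative and is not part of any Weaviate client.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance in [0, 2]: 0 = same direction, 2 = opposite direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

Calling this on V1 and the stored vector of Z1 gives a value that both the near vector result and the hybrid explained score can be compared against.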

I’ve tried to reconcile this a number of ways including:

Possibly it is using the [0, 2] cosine distance. I compared against this as well and it didn’t match.
Maybe it is returning the ranked fusion score for the vector instead. I tried reversing the calculation based on score = 1 / (rank + 60). This was not it either.
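For reference, the two conversions tried above can be written out as follows. This is only a sketch under stated assumptions: certainty relates to the [0, 2] cosine distance as certainty = 1 - distance / 2, and the ranked fusion contribution is 1 / (rank + 60) with no alpha weighting applied. The function names are illustrative, and whether the rank is 0-based or 1-based is not stated here, so that choice is left to the caller.

```python
def certainty_from_cosine_distance(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a certainty in [0, 1]."""
    return 1.0 - distance / 2.0


def cosine_distance_from_certainty(certainty: float) -> float:
    """Inverse of certainty_from_cosine_distance."""
    return 2.0 * (1.0 - certainty)


def ranked_fusion_score(rank: int, k: int = 60) -> float:
    """Unweighted ranked fusion contribution for a given rank (assumed form)."""
    return 1.0 / (rank + k)


def rank_from_ranked_fusion_score(score: float, k: int = 60) -> int:
    """Reverse the ranked fusion formula to recover the rank."""
    return round(1.0 / score - k)
```

If the explained vector score matched neither `certainty_from_cosine_distance(d)` nor `ranked_fusion_score(r)` for the known distance `d` and rank `r`, both of these hypotheses can be ruled out, which is what I observed.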

I was under the impression that the vector similarity used in the hybrid search was based on the class’s underlying vectorizer, but perhaps it is using some other fast dense vectorizer?

Any clarification on how the vector similarity component in the hybrid search is computed would help, as would an explanation of why it is so vastly different from the same metric returned by vector search for the same two objects.

DudaNogueira · February 6, 2024, 6:51pm

Hi @Landon_Edwards

I have asked about this internally, and we should have a response soon.

Thanks!

JanHoellmer · May 2, 2024, 11:57am

Hi @DudaNogueira ,
is there an update on this topic?
I’m also currently extracting the vector search score via the explain_score metadata when using hybrid search, and I don’t have a clue what the score represents (as @Landon_Edwards said, it is not the same score you receive when only using vector search).

Example from a hybrid search (only showing two documents):

‘\nHybrid (Result Set vector) Document 3a4264e1-5627-51c0-8b8c-d1d01eee586d: original score 0.79198617, normalized score: 0.5 - \nHybrid (Result Set keyword) Document 3a4264e1-5627-51c0-8b8c-d1d01eee586d: original score 6.1190777, normalized score: 0.5’
‘\nHybrid (Result Set vector) Document f400be4e-6d24-5705-bc69-79c9d685e76e: original score 0.7832961, normalized score: 0.38622352 - \nHybrid (Result Set keyword) Document f400be4e-6d24-5705-bc69-79c9d685e76e: original score 0.23710166, normalized score: 0’

It seems like a higher vector original score results in a higher normalized score, which is contrary to most of the vector distance metrics, where a lower score is better.
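To compare these values programmatically, the explain_score text can be split into its vector and keyword parts. The sketch below assumes the string format is exactly what the two examples above show. The regular expression is inferred from those examples only, and `ExplainedPart` and `parse_explain_score` are hypothetical helper names, not part of the Weaviate client.

```python
import re
from typing import Dict, NamedTuple


class ExplainedPart(NamedTuple):
    document_id: str
    original: float
    normalized: float


# Plain decimal number, optionally signed and with an exponent.
_NUMBER = r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"
# Pattern inferred from the example strings above (an assumption).
_PART = re.compile(
    r"Hybrid \(Result Set (?P<kind>\w+)\) Document (?P<doc>[0-9a-fA-F-]+): "
    rf"original score (?P<orig>{_NUMBER}), normalized score: (?P<norm>{_NUMBER})"
)


def parse_explain_score(text: str) -> Dict[str, ExplainedPart]:
    """Return the parsed parts keyed by result set name, e.g. 'vector'."""
    parts: Dict[str, ExplainedPart] = {}
    for match in _PART.finditer(text):
        parts[match.group("kind")] = ExplainedPart(
            document_id=match.group("doc"),
            original=float(match.group("orig")),
            normalized=float(match.group("norm")),
        )
    return parts


EXAMPLE = (
    "\nHybrid (Result Set vector) Document 3a4264e1-5627-51c0-8b8c-d1d01eee586d: "
    "original score 0.79198617, normalized score: 0.5 - "
    "\nHybrid (Result Set keyword) Document 3a4264e1-5627-51c0-8b8c-d1d01eee586d: "
    "original score 6.1190777, normalized score: 0.5"
)

parsed = parse_explain_score(EXAMPLE)
print(parsed["vector"].original, parsed["keyword"].original)
```

With the original vector score extracted this way, it can be checked against the distance or certainty that a pure vector search returns for the same document.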

DudaNogueira · May 6, 2024, 8:04pm

hi @JanHoellmer !! Welcome to our community

And thanks for reviving this thread. I believe I have answered it somewhere else (maybe Slack).

There is an open issue that covers this:

github.com/weaviate/weaviate

Issue with the (vector) distance using hybrid query

opened 04:54PM - 19 Mar 24 UTC

closed 12:49PM - 21 Mar 24 UTC

NohTow

bug

### How to reproduce this bug?

Call `collection.query.hybrid` function with the… parameter `return_metadata=wvc.query.MetadataQuery(distance=True)` on an existing collection.

### What is the expected behavior?

The parameter `"distance"` in the `MetadataReturn` object should contain the distance between the query vector and the one from the element stored in the db.

### What is the actual behavior?

The parameter is set to "None".

### Supporting information

I wanted to be able to get the vector distance when using hybrid querying. Besides not returning it in the "distance" element, I tried to extract it from the `"explain_score"` element, but the scores from "(Result Set vector)" seem different from the value I get when doing dense search, as illustrated in the attached screenshot (the `0.21` score is from the dense search, the trace above is from the hybrid query).

![image](https://github.com/weaviate/weaviate/assets/38869395/31688dd3-804f-4185-9fa7-b1bb0d4d3bab)

I get that "distance" might be ambiguous in the context of hybrid search, but we should be able to get vector distance somehow.

### Server Version

v4

### Code of Conduct

- [X] I have read and agree to the Weaviate's [Contributor Guide](https://weaviate.io/developers/contributor-guide) and [Code of Conduct](https://weaviate.io/service/code-of-conduct)

Thanks!
