Inconsistency Between Near Vector Score and Hybrid Explained Vector Scores

TL;DR: The explained vector similarity score included as supplemental information when performing a hybrid search differs significantly from the same metric returned by a near_vector search for the same two objects.

In my current setup, Class A and Class B both use the same vectorizer and both use the cosine distance metric, which enables the certainty metric to be returned.

I have noticed the following inconsistency:

First, I perform a near vector search by pulling a vector V1 from Class A and then searching within Class B using the near_vector operator.

Suppose that one of the results returned for this search is Object Z1 from Class B, with score S_z.

Next, I perform a hybrid search on Class B, providing the same vector V1 and using a where filter to ensure that Object Z1 is returned.

With the explain score metadata enabled in the response, I can see that the explained (unnormalized) vector score returned for Z1 differs from the one obtained from the near vector search above.

I’ve tried to reconcile this a number of ways including:

  • Possibly it is using the [0,2] cosine distance. I tried comparing against this as well, and it didn’t match.
  • Maybe it is returning the ranked fusion score for the vector instead. I tried reversing the calculation based on score = 1 / (rank + 60). This was not it either.
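
For anyone who wants to repeat these checks, here is a minimal sketch in plain Python of the quantities compared above. It assumes the raw vectors of both objects are available locally. The helper names (`cosine_similarity`, `candidate_scores`, `rank_from_rrf_score`) are made up for illustration and are not part of any client library. The constant 60 is the one from the ranked fusion formula mentioned above, and whether ranks start at 0 or 1 is left open.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def candidate_scores(a, b):
    """Return the values one might expect to see reported for a pair of vectors."""
    similarity = cosine_similarity(a, b)
    distance = 1.0 - similarity        # cosine distance, range [0, 2]
    certainty = 1.0 - distance / 2.0   # certainty derived from cosine distance, range [0, 1]
    return {"similarity": similarity, "distance": distance, "certainty": certainty}


def rank_from_rrf_score(score, k=60):
    """Invert score = 1 / (rank + k) to recover the rank a fused score would imply."""
    return 1.0 / score - k


if __name__ == "__main__":
    # Hypothetical vectors; substitute V1 and the stored vector of Z1.
    print(candidate_scores([1.0, 0.0], [0.5, 0.5]))
    print(rank_from_rrf_score(1.0 / 61.0))
```

If the explained vector score matches none of `similarity`, `distance`, or `certainty`, and the implied rank is not close to a whole number, then none of these interpretations accounts for it.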

I was under the impression that the vector similarity used in the hybrid search was based on the class’s underlying vectorizer, but perhaps it is using some other fast dense vectorizer?

Any clarification on how the vector similarity component in the hybrid search is computed would help, or on why it is so vastly different from the same metric returned by vector search for the same two objects.

Hi @Landon_Edwards

I have asked about this internally, and we should have a response soon. :slight_smile:

Thanks!

Hi @DudaNogueira ,
is there an update on this topic?
I’m also currently extracting the vector search score via the explain_score metadata when using hybrid search, and I don’t have a clue what the score represents (as @Landon_Edwards said, it is not the same score you receive when using only vector search).

Example from a hybrid search (only showing two documents):

  • ‘\nHybrid (Result Set vector) Document 3a4264e1-5627-51c0-8b8c-d1d01eee586d: original score 0.79198617, normalized score: 0.5 - \nHybrid (Result Set keyword) Document 3a4264e1-5627-51c0-8b8c-d1d01eee586d: original score 6.1190777, normalized score: 0.5’
  • ‘\nHybrid (Result Set vector) Document f400be4e-6d24-5705-bc69-79c9d685e76e: original score 0.7832961, normalized score: 0.38622352 - \nHybrid (Result Set keyword) Document f400be4e-6d24-5705-bc69-79c9d685e76e: original score 0.23710166, normalized score: 0’

It seems like a higher vector original score results in a higher normalized score, which is contrary to most of the vector distance metrics, where a lower score is better.
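
In case it helps others compare values, here is a small sketch for pulling the numbers out of an explain_score string like the ones above. The regular expression is based only on the format of these two examples, so it is an assumption that other versions produce the same layout. `parse_explain_score` is a hypothetical helper, not a client function.

```python
import re

_NUMBER = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"

# Matches one "Hybrid (Result Set <source>) Document <id>: ..." entry.
_ENTRY = re.compile(
    r"Result Set (\w+)\) Document ([0-9a-fA-F-]+): "
    r"original score (" + _NUMBER + r"), "
    r"normalized score: (" + _NUMBER + r")"
)


def parse_explain_score(text):
    """Map each result set ('vector', 'keyword') to its original and normalized score."""
    parsed = {}
    for source, doc_id, original, normalized in _ENTRY.findall(text):
        parsed[source] = {
            "id": doc_id,
            "original": float(original),
            "normalized": float(normalized),
        }
    return parsed


if __name__ == "__main__":
    example = (
        "\nHybrid (Result Set vector) Document 3a4264e1-5627-51c0-8b8c-d1d01eee586d: "
        "original score 0.79198617, normalized score: 0.5 - "
        "\nHybrid (Result Set keyword) Document 3a4264e1-5627-51c0-8b8c-d1d01eee586d: "
        "original score 6.1190777, normalized score: 0.5"
    )
    print(parse_explain_score(example))
```

This only extracts the reported values so they can be set side by side with the distance or certainty from a pure vector search. It does not explain how the original score is derived.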

hi @JanHoellmer !! Welcome to our community :hugs:

And thanks for reviving this thread. I believe I have answered it somewhere else (maybe Slack).

There is an open issue that covers this:

Thanks!