Hybrid search explanation explanation :)

I am testing my Weaviate collection with hybrid searches, using several different embedding models (via named vectors) and varying alpha values, through a cool new interface:

Can anyone help me better understand the explanation that comes back along with the results? Especially the first one seems confusing.

PS: This frontend was developed by a super cool guy whom I cannot recommend enough. If you need front-end development for your Weaviate-based project, you can contact him at cesarnml@outlook.com.

Hi, my friend! How are you?

Edit:
By default, Weaviate uses Relative Score Fusion rather than Ranked Fusion; see here.

We have a nice blog article on this, for instance:

The rankedFusion algorithm is the original hybrid fusion algorithm that has been available since the launch of hybrid search in Weaviate.

In this algorithm, each object is scored according to its position in the results for the given search, starting from the highest score for the top-ranked object and decreasing down the order. The total score is calculated by adding these rank-based scores from the vector and keyword searches.
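To make the rank-based scoring concrete, here is a minimal sketch in Python. It is not Weaviate's actual implementation: the function name `ranked_fusion`, the reciprocal-rank form `1 / (rank + k)`, and the constant `k = 60` are assumptions for illustration. `alpha` weights the vector side and `1 - alpha` the keyword side, as in Weaviate's hybrid search.

```python
from collections import defaultdict


def ranked_fusion(vector_ids, keyword_ids, alpha=0.5, k=60):
    """Combine two ranked ID lists using position-based scores.

    Assumption: each result contributes weight / (rank + k), with rank
    starting at 0 for the top hit. Only positions matter; the raw
    similarity or BM25 scores are ignored.
    """
    scores = defaultdict(float)
    for rank, obj_id in enumerate(vector_ids):
        scores[obj_id] += alpha / (rank + k)
    for rank, obj_id in enumerate(keyword_ids):
        scores[obj_id] += (1 - alpha) / (rank + k)
    # Highest combined score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    fused = ranked_fusion(["a", "b", "c"], ["b", "d"], alpha=0.5)
    for obj_id, score in fused:
        print(obj_id, round(score, 5))
```

An object that appears in both result lists (here `"b"`) collects a contribution from each side, which is why it can outrank an object that is first in only one list.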

There is also Relative Score Fusion. The source code has a nice explanation:

// FusionRelativeScore uses the relative differences in the scores from keyword and vector search to combine the
// results. This method retains more information than ranked fusion and should result in better results.
//
// The scores from each result are normalized between 0 and 1, e.g. the maximum score becomes 1 and the minimum 0 and the
// other scores are in between, keeping their relative distance to the other scores.
// Example:
//
// Input score = [1, 8, 6, 11] => [0, 0.7, 0.5, 1]
//
// The normalized scores are then combined using their respective weight and the combined scores are sorted

Also from that blog article:

In contrast to rankedFusion, however, relativeScoreFusion derives each object's score by normalizing the metrics output by the vector search and keyword search respectively. The highest value becomes 1, the lowest value becomes 0, and others end up in between according to this scale. The total score is thus calculated by a scaled sum of normalized vector similarity and normalized BM25 score.
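As a rough illustration of the normalization and weighted sum described above, here is a small Python sketch. It is not Weaviate's code: the helper names `min_max_normalize` and `relative_score_fusion` are made up for this example, and two behaviors are assumptions on my part: an object missing from one search contributes 0 from that side, and when all scores in a list are equal they all normalize to 1.

```python
def min_max_normalize(scores):
    """Scale scores to [0, 1]: the maximum becomes 1, the minimum 0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: identical scores all map to 1.0.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def relative_score_fusion(vector_scores, keyword_scores, alpha=0.5):
    """Combine {id: score} dicts from vector and keyword search.

    alpha weights the normalized vector score, (1 - alpha) the
    normalized keyword (BM25) score.
    """
    combined = {}
    for weight, results in ((alpha, vector_scores), (1 - alpha, keyword_scores)):
        if not results:
            continue
        ids = list(results)
        normalized = min_max_normalize([results[i] for i in ids])
        for obj_id, norm in zip(ids, normalized):
            combined[obj_id] = combined.get(obj_id, 0.0) + weight * norm
    # Highest combined score first.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Same input as the example in the source code comment.
    print(min_max_normalize([1, 8, 6, 11]))
    print(relative_score_fusion({"a": 0.9, "b": 0.5, "c": 0.1},
                                {"b": 4.0, "d": 2.0}, alpha=0.5))
```

Unlike the rank-based approach, the gap between scores matters here: a keyword hit that is far ahead of the runner-up keeps that lead after normalization.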

Also note that the Result Set keyword part of the explanation only appears if there is a keyword match against your objects.

Let me know if this helps :)

Thanks!

Since v1.24, the default is relative score fusion :)

Thanks a lot my friends.

Great! I have created a PR in our docs here to reflect that.

Thanks @Dirk !!

Edit: PR merged!