Possible bug with relativeScoreFusion

Description

When executing hybrid searches with relativeScoreFusion, documents that have no BM25 score can be scored higher than the last BM25 result, which receives a score of 0 after normalization.

IMHO, documents retrieved only via vector search must be assigned a BM25 score of 0 before the fusion is applied. That way the last BM25 result would rank higher, because its normalized score would no longer be zero.

Is that a bug or is it by design?

Server Setup Information

  • Weaviate Server Version: 1.24.8
  • Deployment: docker
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: API directly
  • Multitenancy?: no

relativeScoreFusion simply normalizes each set of scores linearly, mapping [worst_score, best_score] to [0, 1]. The problem is that if a document does not have a BM25 score, there is simply nothing to scale.
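
A minimal Python sketch of the behavior described above, assuming min-max normalization of each result set and a weighted sum alpha * vector + (1 - alpha) * bm25 in which a document absent from one result set contributes 0 for that part. The function names, alpha=0.5 and the scores are illustrative only, not Weaviate's API or real query output.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Map raw scores linearly from [worst_score, best_score] to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: if all scores are equal, give every document 1.0.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def relative_score_fusion(vector_scores, bm25_scores, alpha=0.5):
    """Weighted sum of normalized scores; a document missing from one result set gets 0 for that part."""
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(bm25_scores)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(vec) | set(kw)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Illustrative scores: B and D were found only by the vector search, C is the last BM25 hit.
vector_scores = {"A": 0.90, "B": 0.86, "C": 0.85, "D": 0.80}
bm25_scores = {"A": 6.0, "C": 3.0}
ranking = relative_score_fusion(vector_scores, bm25_scores)
print([doc for doc, _ in ranking])
```

This prints ['A', 'B', 'C', 'D']: C matched the keywords, but its BM25 part normalizes to 0, so the vector-only document B (about 0.30) edges out C (about 0.25).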

It is more a limitation of the current design. In principle, you could compute the missing vector/BM25 scores before fusion, but that is not trivial.
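
A rough sketch of what computing the missing BM25 scores could look like, assuming the whole corpus is at hand and using the common Okapi BM25 form with k1=1.2, b=0.75 and the non-negative IDF variant log(1 + (N - n + 0.5)/(n + 0.5)). The corpus, query and helper names are hypothetical, and Weaviate's own scorer may differ. In a real engine this means scoring documents outside the keyword search's result set, which is presumably where the difficulty lies.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of every document in `corpus` (doc id -> text) for `query`."""
    tokens = {doc_id: text.lower().split() for doc_id, text in corpus.items()}
    n_docs = len(tokens)
    avgdl = sum(len(t) for t in tokens.values()) / n_docs
    terms = query.lower().split()
    # Number of documents containing each query term.
    df = {term: sum(term in t for t in tokens.values()) for term in set(terms)}
    scores = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in terms:
            if tf[term] == 0:
                continue
            # Non-negative IDF variant (assumption; the classic BM25 IDF can go negative).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def fill_missing_bm25(bm25_hits, vector_hits, query, corpus):
    """Give every vector-only hit a real BM25 score instead of leaving it unscored."""
    all_scores = bm25_scores(query, corpus)
    filled = dict(bm25_hits)
    for doc_id in vector_hits:
        filled.setdefault(doc_id, all_scores[doc_id])
    return filled


corpus = {
    "A": "fusion of bm25 and vector scores",
    "B": "vector search with embeddings",
    "C": "bm25 ranking basics",
    "D": "an unrelated document",
}
query = "bm25 fusion"
# Pretend the keyword search ran with limit=1 and returned only A.
bm25_hits = {"A": bm25_scores(query, corpus)["A"]}
vector_hits = {"A": 0.9, "C": 0.8, "B": 0.7}
filled = fill_missing_bm25(bm25_hits, vector_hits, query, corpus)
print(filled)
```

C contains "bm25", so it now gets a positive BM25 score even though the keyword search cut it off, while B, which contains no query terms, gets 0.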

I know the algorithm expects only one worst case, but before the final normalization we could assume a value of 0 for every document missing a BM25 score. I don't know whether this is an oversight, but it could be done right after querying the vector results.
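
A sketch of the zero-imputation proposal, assuming min-max normalization and a weighted sum with alpha=0.5. The illustrative scores have B and D coming only from the vector search and C as the lowest BM25 hit. Names and numbers are hypothetical, not Weaviate's implementation.

```python
def min_max_normalize(scores):
    """Map raw scores linearly from [worst_score, best_score] to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_with_zero_imputation(vector_scores, bm25_scores, alpha=0.5):
    """Proposed variant: documents missing a BM25 score get a raw 0 before normalization."""
    docs = set(vector_scores) | set(bm25_scores)
    bm25_filled = {doc: bm25_scores.get(doc, 0.0) for doc in docs}
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(bm25_filled)
    fused = {doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw[doc] for doc in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


vector_scores = {"A": 0.90, "B": 0.86, "C": 0.85, "D": 0.80}
bm25_scores = {"A": 6.0, "C": 3.0}
ranking = fuse_with_zero_imputation(vector_scores, bm25_scores)
print([doc for doc, _ in ranking])
```

With the imputed zeros as the floor, C's BM25 part becomes 0.5 instead of 0, so C (about 0.50) now ranks above B (about 0.30). This only works while every real BM25 score is at least 0.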

BM25 scores can be negative, so this won't work: a document assigned 0 could end up with a better BM25 score than the actual keyword matches.
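
A small counterexample, assuming illustrative negative raw BM25 scores for both keyword hits. One known way scores go negative: the classic Robertson/Spärck Jones IDF, log((N - n + 0.5)/(n + 0.5)), is negative whenever a term appears in more than half of the documents. This does not claim Weaviate uses that exact IDF; the names and numbers are hypothetical.

```python
import math


def classic_idf(n_docs, doc_freq):
    # Classic BM25 IDF; negative when doc_freq > n_docs / 2.
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def min_max_normalize(scores):
    """Map raw scores linearly from [worst_score, best_score] to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


print(classic_idf(10, 8))  # a term found in 8 of 10 documents

# Illustrative raw BM25 scores, both negative.
bm25_hits = {"A": -0.5, "C": -1.5}
# Zero-imputation for B and D, which only the vector search returned.
imputed = {**bm25_hits, "B": 0.0, "D": 0.0}
normalized = min_max_normalize(imputed)
print(normalized)
```

Here B and D, which never matched the keywords, get the best normalized BM25 score (1.0), while C, a real keyword match, drops to 0.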