When executing hybrid searches with relativeScoreFusion, documents without any BM25 score can rank above the last BM25 result, which receives a score of 0 after normalization.
IMHO, documents retrieved only via vector search should be assigned a value of 0 for the BM25 part before applying the fusion. That way the last BM25 result would rank higher, because its normalized score would no longer be zero.
relativeScoreFusion just normalizes linearly from [worst_score, best_score] to [0, 1]. The problem is that if a document does not have a BM25 score, we simply cannot scale it.
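To make the effect concrete, here is a minimal Python sketch of min-max normalization followed by a weighted sum. It is not Weaviate's actual code: the function names, the handling of the all-equal case, the toy scores, and the convention that `alpha` weights the vector side are all assumptions made for illustration.

```python
def min_max_normalize(scores):
    """Linearly map raw scores from [worst, best] to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: with a single distinct score, treat every hit as best.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def relative_score_fusion(bm25, vector, alpha=0.5):
    """Hypothetical fusion: normalize each result list, then weight and sum.

    A document missing from one list contributes 0 for that side.
    """
    bm25_n = min_max_normalize(bm25)
    vec_n = min_max_normalize(vector)
    docs = set(bm25) | set(vector)
    return {
        d: (1 - alpha) * bm25_n.get(d, 0.0) + alpha * vec_n.get(d, 0.0)
        for d in docs
    }


# Toy scores (made up): "c" is the last BM25 hit, "d" has no BM25 match.
bm25_scores = {"a": 3.0, "b": 2.0, "c": 1.0}
vector_scores = {"a": 1.0, "c": 0.5, "d": 0.75, "e": 0.0}

fused = relative_score_fusion(bm25_scores, vector_scores)
for doc, score in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(doc, score)
```

In this toy data, "c" matched the keyword query but its BM25 part normalizes to 0, so it gets the same keyword contribution as "d", which did not match at all, and "d" ends up ranked above it on the vector side alone.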
Is that a bug or is it by design?
It is more a limitation of the current design. In principle, you could compute the missing vector/BM25 scores before fusion, but it is not trivial.
I know the algorithm is expected to have only one worst case, but before the final normalization we could assume a value of 0 for all the documents missing a BM25 score. I don't know if it's an oversight, but it could be done right after querying the vector results.
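A rough sketch of that suggestion, again in Python and again not Weaviate's implementation: the helper names and toy scores are hypothetical, and it assumes raw BM25 scores are non-negative so that 0 is a sensible floor.

```python
def min_max_normalize(scores):
    """Linearly map raw scores from [worst, best] to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: with a single distinct score, treat every hit as best.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_with_zero_fill(bm25, vector, alpha=0.5):
    """Hypothetical variant: give vector-only hits a raw BM25 score of 0
    *before* normalizing, so the worst real BM25 hit no longer maps to 0."""
    bm25_filled = dict(bm25)
    for doc in vector:
        bm25_filled.setdefault(doc, 0.0)  # assumed floor for missing scores
    bm25_n = min_max_normalize(bm25_filled)
    vec_n = min_max_normalize(vector)
    docs = set(bm25_filled) | set(vector)
    return {
        d: (1 - alpha) * bm25_n.get(d, 0.0) + alpha * vec_n.get(d, 0.0)
        for d in docs
    }


# Toy scores (made up): "c" is the last BM25 hit, "d" has no BM25 match.
bm25_scores = {"a": 3.0, "b": 2.0, "c": 1.0}
vector_scores = {"a": 1.0, "c": 0.5, "d": 0.75, "e": 0.0}

fused = fuse_with_zero_fill(bm25_scores, vector_scores)
for doc, score in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(doc, score)
```

With the zero fill, "c" keeps a non-zero keyword contribution (1/3 after normalization) and ranks above "d". The sketch only fills the BM25 side; documents missing a vector score still contribute 0 there, and it does not attempt to compute the real missing scores.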