I am using hybrid search with max-vector-distance to limit the vector similarity contributions in my search results. However, when I inspect the results using explain_score, I notice that the vector similarity scores still exceed the max-vector-distance threshold.
This behavior is unexpected, as I assumed setting max-vector-distance would filter out any results beyond the specified threshold. The scores seem inconsistent with my expectations for hybrid search.
Am I misunderstanding how max-vector-distance is applied in hybrid search?
Server Setup Information
Weaviate Server Version: semitechnologies/weaviate:1.26.1
Next, I ran a hybrid search and printed all of the returned metadata:
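```python
import weaviate.classes as wvc

# `collection` is an existing collection handle from the v4 Python client
for o in collection.query.hybrid(
    query="futebol",
    # max_vector_distance=0.4,  # disabled for this baseline run
    return_metadata=wvc.query.MetadataQuery(score=True, explain_score=True, distance=True),
).objects:
    print("#" * 10)
    print(o.properties)
    print(o.metadata.distance, o.metadata.score, o.metadata.explain_score)
```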
Next, I set max_vector_distance to a value chosen to probe how it works.
These are the vector scores (from explain_score) that the threshold is meant to filter against:
score 0.35348773
score 0.23937017
score 0.019589365
```python
import weaviate.classes as wvc

# Same query as before, now with a vector-distance threshold
for o in collection.query.hybrid(
    query="futebol",
    max_vector_distance=1 - 0.020,  # = 0.98
    return_metadata=wvc.query.MetadataQuery(score=True, explain_score=True, distance=True),
).objects:
    print("#" * 10)
    print(o.properties)
    print(o.metadata.distance, o.metadata.score, o.metadata.explain_score)
```
Thank you for the quick response! I haven’t seen any filtering of results in Weaviate 1.26.1, regardless of the max-vector-distance value I set. I will try this again on the upgraded version. Since we are using weaviate-vectorstore in production, I’ll need to check for potential regressions before upgrading.
I’d also like to clarify the behavior of max-vector-distance. My understanding is that documents with a vector distance greater than max-vector-distance should be excluded. However, is the intuition here that the score returned from hybrid search represents similarity rather than actual distance?
If that’s the case, then the following values are similarity scores, and setting max_vector_distance = 0.98 filters out the 0.0195 document because its implied distance (1 - 0.019589365 ≈ 0.9804) is above 0.98:
0.3534
0.2393
0.0195
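Under the assumption discussed in this thread, that the vector score shown by explain_score is 1 minus the cosine distance, the distances can be recovered from the scores and compared directly with max_vector_distance. A minimal plain-Python sketch; the helper names are hypothetical and the inclusive comparison is an assumption, not confirmed Weaviate behavior:

```python
# Sketch only. Assumption (from this thread, not from Weaviate docs): the vector
# score shown by explain_score equals 1 - cosine distance.
from typing import List


def score_to_distance(score: float) -> float:
    """Hypothetical helper: convert a cosine similarity score into a cosine distance."""
    return 1.0 - score


def filter_by_max_distance(scores: List[float], max_distance: float) -> List[float]:
    """Keep the scores whose implied distance is within the threshold (inclusive, assumed)."""
    return [s for s in scores if score_to_distance(s) <= max_distance]


vector_scores = [0.35348773, 0.23937017, 0.019589365]
threshold = 1 - 0.020  # the value passed as max_vector_distance in the query (0.98)

for s in vector_scores:
    print(f"score={s:.4f} -> distance={score_to_distance(s):.4f}")

kept = filter_by_max_distance(vector_scores, threshold)
print(kept)  # the lowest score is dropped: 1 - 0.019589365 ≈ 0.9804 > 0.98
```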
Is there a way to explicitly display the distance values in the search results to compare them directly with max-vector-distance for a more apples-to-apples comparison?
Also, the vector distance may vary for different queries and objects, so you can't define a threshold based solely on vector distance.
So when you do a hybrid search, the distance calculated for the vector part of the search is normalized so that it can be fused with the keyword score, and that normalized vector distance is what max-vector-distance filters on.
IIRC there was a bug in the first release of max-vector-distance that could cause some objects to be included even if their vector distance was larger than the threshold.
Yes, this is correct.
Regarding the earlier statement that the normalized vector distance is what max-vector-distance filters on: the filtering actually happens before normalization and fusion, so the threshold is applied to the raw vector distance.
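The order described above (filter on the raw vector distance first, then normalize and fuse) can be sketched in plain Python. This is a simplified model, not Weaviate's implementation: it assumes cosine distance with score = 1 - distance, an inclusive threshold, and min-max normalization of each result set weighted by alpha, roughly in the spirit of relativeScoreFusion. The function names and sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a single result (or all-equal scores) maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_fuse(
    vector_distances: Dict[str, float],
    keyword_scores: Dict[str, float],
    alpha: float,
    max_vector_distance: float,
) -> List[Tuple[str, float]]:
    # 1. Filter on the raw vector distance, before any normalization.
    vector_hits = {d: dist for d, dist in vector_distances.items() if dist <= max_vector_distance}
    # 2. Convert distances to similarity scores (assumes cosine: score = 1 - distance).
    vector_scores = {d: 1.0 - dist for d, dist in vector_hits.items()}
    # 3. Normalize each result set separately, then combine with alpha weighting.
    norm_vec = min_max_normalize(vector_scores)
    norm_kw = min_max_normalize(keyword_scores)
    fused = {
        d: alpha * norm_vec.get(d, 0.0) + (1 - alpha) * norm_kw.get(d, 0.0)
        for d in set(norm_vec) | set(norm_kw)
    }
    # Sort by fused score (descending), breaking ties by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


result = hybrid_fuse(
    vector_distances={"a": 0.60, "b": 0.75, "c": 0.99},
    keyword_scores={"a": 1.2, "c": 2.0},
    alpha=0.75,
    max_vector_distance=0.98,
)
print(result)
```

Because the check runs before normalization, the threshold keeps the same meaning for every query; if it ran on min-max normalized scores instead, the lowest-scoring vector hit would always normalize to 0 regardless of its actual distance. In this sketch, document c is dropped from the vector side but still returned through its keyword score; whether Weaviate treats keyword-only matches the same way is not settled by this thread.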