Description
The distance field in HybridVector.near_text does not filter results, unlike the distance parameter of the near_text search. Is this the desired behavior or a bug? This issue seems to be about implementing the same behavior as the near_text search: Improve Hybrid Search · Issue #4325 · weaviate/weaviate · GitHub
For me, objects with a distance greater than this parameter should be filtered out of the results (independently of the fusion score). In my case, the ability to filter the results by the near_text distance is useful, as the vector distance and the BM25 score are not returned.
Here are the results I got with the code in the "Any additional Information" section:
== Near text ==
no distance limit
QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('35ddc998-e530-44a2-8b6a-e65dc8cb9afb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=0.11371487379074097, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'title': 'article'}, references=None, vector={}, collection='Article')])
== Near text ==
distance limit 0.1
QueryReturn(objects=[])
== Hybrid search ==
distance limit 0.1
QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('35ddc998-e530-44a2-8b6a-e65dc8cb9afb'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'title': 'article'}, references=None, vector={}, collection='Article')])
I would like a way to get the same result as the near_text search, with the distance limit applied to the hybrid search. For example, if the scores were returned after a hybrid search, we would be able to apply post-filtering.
Server Setup Information
- Weaviate Server Version: 1.26.4
- Deployment Method: docker
- Multi Node? Number of Running Nodes: No
- Client Language and Version: python weaviate-client-4.8.1
- Multitenancy?: No
Any additional Information
# setup.py
import os

import weaviate
from dotenv import load_dotenv
from weaviate.classes.config import Configure, DataType, Property

load_dotenv()

client = weaviate.connect_to_local(
    headers={"X-Azure-Api-Key": os.getenv("AZURE_API_KEY")}
)

# Recreate the collection from scratch
client.collections.delete("Article")
client.collections.create(
    "Article",
    properties=[
        Property(name="title", data_type=DataType.TEXT),
    ],
    vectorizer_config=Configure.Vectorizer.text2vec_azure_openai(
        base_url=os.environ.get("AZURE_BASE"),
        resource_name=os.environ.get("AZURE_RESOURCE_NAME"),
        deployment_id=os.environ.get("AZURE_DEPLOYMENT_ID"),
        vectorize_collection_name=False,
    ),
)

# Insert a single test object
article = client.collections.get("Article")
article.data.insert(
    properties={
        "title": "article",
    },
)

# Query script (run after setup.py)
import os

import weaviate
from dotenv import load_dotenv
from weaviate.classes.query import HybridVector, MetadataQuery

load_dotenv()

client = weaviate.connect_to_local(
    headers={"X-Azure-Api-Key": os.getenv("AZURE_API_KEY")}
)
article = client.collections.get("Article")

# near_text without a distance limit: the object is returned
response = article.query.near_text(
    query="Article", return_metadata=MetadataQuery(distance=True)
)
print(f"== Near text ==\nno distance limit\n{response}")

# near_text with distance=0.1: the object is filtered out
response = article.query.near_text(
    query="Article", distance=0.1, return_metadata=MetadataQuery(distance=True)
)
print(f"== Near text ==\ndistance limit 0.1\n{response}")

# hybrid with the same distance limit on the vector part: the object is still returned
response = article.query.hybrid(
    query="Article", vector=HybridVector.near_text(query="article", distance=0.1)
)
print(f"== Hybrid search ==\ndistance limit 0.1\n{response}")

client.close()