Hi @cong.dao! Welcome to our community :hugs:
When performing a hybrid search, the relative score fusion will kick in.
So the score you get back is a normalized fusion score, not the distance you are looking for.
You will need to do a nearText instead of a hybrid.
Here is an example to illustrate this:
from weaviate.classes import config

# assumes `client` is an already connected Weaviate client (v4 Python client)
# let's first create our collection and import data
client.collections.delete("MyCollection")
collection = client.collections.create(
    "MyCollection",
    vectorizer_config=config.Configure.Vectorizer.text2vec_openai(),
    properties=[
        config.Property(name="text", data_type=config.DataType.TEXT),
        config.Property(name="source", data_type=config.DataType.TEXT),
    ],
)
collection.data.insert({"text": "something about cats", "source": "document1"})
collection.data.insert({"text": "something about tiger", "source": "document1"})
collection.data.insert({"text": "something about lion", "source": "document1"})
collection.data.insert({"text": "something about dogs", "source": "document2"})
collection.data.insert({"text": "something about wolf", "source": "document2"})
collection.data.insert({"text": "something about coyotes", "source": "document2"})
Now we perform a nearText:
from weaviate import classes as wvc

result = collection.query.near_text(
    limit=2,
    query="pet animals",
    return_metadata=wvc.query.MetadataQuery(distance=True, score=True),
)
# `obj` instead of `object`, to avoid shadowing the Python builtin
for obj in result.objects:
    print(obj.properties)
    print(obj.metadata.distance)
With this output:
{'text': 'something about dogs', 'source': 'document2'}
0.17943477630615234
{'text': 'something about cats', 'source': 'document1'}
0.1885947585105896
Now, a hybrid search:
from weaviate import classes as wvc

result = collection.query.hybrid(
    alpha=1,  # alpha=1 means pure vector search, no keyword (BM25) part
    query="pet animals",
    return_metadata=wvc.query.MetadataQuery(distance=True, score=True, explain_score=True),
)
# `obj` instead of `object`, to avoid shadowing the Python builtin
for obj in result.objects:
    print(obj.properties)
    print(obj.metadata.score)
    print(obj.metadata.explain_score)
And this is the output (note the data under explain_score):
{'text': 'something about dogs', 'source': 'document2'}
1.0
Hybrid (Result Set vector,hybridVector) Document e89cc799-b180-4ae1-a496-aa806a458915: original score 0.8205652, normalized score: 1
{'text': 'something about cats', 'source': 'document1'}
0.8020085096359253
Hybrid (Result Set vector,hybridVector) Document 34fd8612-06ab-4b01-b0cb-aa9b81e1d6dc: original score 0.81140524, normalized score: 0.8020085
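Note that the first result is normalized to 1.
Also, in this example (alpha=1, so vector only) the original score in explain_score is 1 minus the distance that nearText returned for the same object: 1 - 0.1794348 = 0.8205652.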
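If you still need the hybrid query for other reasons, here is a rough sketch of how the numbers above relate. It is plain Python and does not call Weaviate. Two assumptions: the explain_score text keeps the exact format shown in the output above (it may change between versions), and the fusion step is a min-max rescale, which is my understanding of relative score fusion. The helper names `original_score` and `min_max_normalize` are made up for this illustration, and so is the lowest score 0.7743.

```python
import re

# Assumption: explain_score keeps the format shown in the output above
# ("... original score <float>, normalized score: <float>").
_ORIGINAL_SCORE = re.compile(r"original score\s+([0-9]*\.?[0-9]+)")


def original_score(explain_score: str) -> float:
    """Hypothetical helper: read the pre-normalization score from explain_score."""
    match = _ORIGINAL_SCORE.search(explain_score)
    if match is None:
        raise ValueError("no 'original score' found in explain_score")
    return float(match.group(1))


def min_max_normalize(scores: list[float]) -> list[float]:
    """Rescale scores so the highest becomes 1.0 and the lowest 0.0."""
    low, high = min(scores), max(scores)
    if high == low:
        # Arbitrary choice for this sketch when all scores are equal
        return [1.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]


explain = (
    "Hybrid (Result Set vector,hybridVector) Document "
    "e89cc799-b180-4ae1-a496-aa806a458915: "
    "original score 0.8205652, normalized score: 1"
)
score = original_score(explain)
distance = 1 - score  # close to the nearText distance printed earlier
print(distance)

# 0.7743 is a made-up lowest score, used only to illustrate the rescaling
print(min_max_normalize([0.8205652, 0.81140524, 0.7743]))
```

The top result always comes out as 1.0 after this kind of rescaling, which is why the hybrid score alone cannot tell you how close the best match really is.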
Let me know if this helps.
Thanks!