.near_text vector search score is very low

I am creating a schema as below

I’ve added the data to the collection, but when I query it for the nearest results to the test term “Art Briefs” (which is an exact match in the collection), I’m getting unexpectedly low scores. Here is the query:

from weaviate.classes.query import MetadataQuery

org_collection = client.collections.get("Org_Test")
response = org_collection.query.near_text(
    query="Art Briefs",
    limit=5,
    return_metadata=MetadataQuery.full()
)
for o in response.objects:
    print(o.properties['skill_name'])
    # Certainty is (1 + cosine similarity) / 2, so invert it here
    cosine_similarity = 2 * o.metadata.certainty - 1
    print(f"Cosine similarity: {cosine_similarity}")
    print(f"Certainty: {o.metadata.certainty}")

I get
{Art Briefs
Cosine similarity: 0.8635821342468262
Certainty: 0.9317910671234131}

I’ve set vectorize_property_name=True for the skill name field only. This low score is causing issues with my API, and I need the score to be much higher. I’ve searched extensively but I’m stuck. Please provide a resolution as soon as possible.

hi @Rishi_Prakash !!

Welcome to our community :hugs:

near_text (similarity/vector search) is not a literal search. So even if you have an exact match, it is not guaranteed to get a perfect score or to be ranked first for your query.

For that, the bm25/keyword and hybrid search will be more effective.

How far down the ranking is the exact-match object for your query?
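One quick way to check that is to take the `skill_name` values in the order the query returned them and look for the query string. This is only a rough sketch: `rank_of_exact_match` is a hypothetical helper (not part of Weaviate), and the list of names below is made up for illustration.

```python
def rank_of_exact_match(names, query):
    """Return the 1-based position of the first name equal to `query`
    (ignoring case and surrounding whitespace), or None if it is absent."""
    target = query.strip().lower()
    for position, name in enumerate(names, start=1):
        if name.strip().lower() == target:
            return position
    return None

# Made-up result list for illustration only.
example_names = ["Creative Briefs", "Art Briefs", "Art Direction"]
print(rank_of_exact_match(example_names, "Art Briefs"))  # prints 2
```

With a real response you would pass something like `[o.properties['skill_name'] for o in response.objects]` as `names`.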

Hi Duda,

Thanks for the response,
Could you update the code below to use bm25/keyword and hybrid search?

org_collection = client.collections.get("Org_Test")
response = org_collection.query.near_text(
    query="Art Briefs",
    limit=5,
    return_metadata=MetadataQuery.full()
)
for o in response.objects:
    print(o.properties['skill_name'])
    cosine_similarity = 2 * o.metadata.certainty - 1
    print(f"Cosine similarity: {cosine_similarity}")
    print(f"Certainty: {o.metadata.certainty}")

I ask because I don’t see the v4 code in the documentation.

Thanks!

Hi!

it is here:

Note that you get distance with a vector search, but for hybrid and bm25/keyword, you get score.
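To relate the numbers from a vector search to each other: with the cosine distance metric, certainty, cosine similarity, and distance are linked by simple formulas (distance = 1 - cosine similarity, certainty = 1 - distance / 2). A small sketch of those conversions, using the certainty value from the question; the function names are hypothetical, and this assumes the collection uses cosine distance:

```python
def certainty_to_cosine(certainty: float) -> float:
    """Cosine similarity from certainty: cos = 2 * certainty - 1."""
    return 2.0 * certainty - 1.0

def cosine_to_distance(cosine: float) -> float:
    """Cosine distance from cosine similarity: distance = 1 - cos."""
    return 1.0 - cosine

def distance_to_certainty(distance: float) -> float:
    """Certainty from cosine distance: certainty = 1 - distance / 2."""
    return 1.0 - distance / 2.0

certainty = 0.9317910671234131  # value reported in the question
cosine = certainty_to_cosine(certainty)
distance = cosine_to_distance(cosine)
print(f"cosine={cosine:.4f} distance={distance:.4f}")
# prints: cosine=0.8636 distance=0.1364
```

The score returned by bm25 and hybrid is a different quantity, so these formulas do not apply to it.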

here is the docs for bm25:

Your code will need to change to:

response = org_collection.query.hybrid(
    query="Art Briefs",
    limit=5,
    return_metadata=MetadataQuery.full()
)
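For a pure keyword search, the same query can go through `query.bm25` instead of `query.hybrid`. Below is a minimal sketch, assuming the v4 Python client and that `skill_name` is the only property you want to match against. `keyword_search` is a hypothetical helper, not something the client provides, and it takes the metadata request as a parameter so the function itself has no import requirements.

```python
def keyword_search(collection, text, return_metadata, limit=5):
    """Run a BM25 keyword query and return (skill_name, score) pairs.

    `collection` is expected to be a weaviate-client v4 collection object,
    e.g. client.collections.get("Org_Test"). Hypothetical helper.
    """
    response = collection.query.bm25(
        query=text,
        query_properties=["skill_name"],  # assumption: search this property only
        limit=limit,
        return_metadata=return_metadata,
    )
    # Keyword results carry a score, not a distance or certainty.
    return [(o.properties["skill_name"], o.metadata.score) for o in response.objects]

# Usage against a live instance (not executed here):
# from weaviate.classes.query import MetadataQuery
# hits = keyword_search(org_collection, "Art Briefs", MetadataQuery(score=True))
# for name, score in hits:
#     print(name, score)
```

With either bm25 or hybrid, read `o.metadata.score` in the loop; the certainty-based cosine calculation from the near_text version does not carry over.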