nearText algorithm not returning expected value compared to cosine similarity

Teng_Hoo · June 18, 2024, 5:58am

Hi there,

I am trying to understand how the nearText algorithm works. My expectation is that it uses cosine similarity (the default metric) to compare two embeddings.

When I tried this, the two approaches returned different values. How can I make them match?

Current setup:
Weaviate vectorizer: text2vec-openai (so the default model is text-embedding-ada-002)

Testing with cosine similarity:

1. Embed the text with text-embedding-ada-002 (OpenAI API).
2. Compute the cosine similarity with the function below:

import numpy as np

def _cosine_similarity(vec1: np.ndarray, vec2: np.ndarray):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

What causes the discrepancy, and how can I use nearText so that its output matches the cosine similarity function?

Cheers!

DudaNogueira · June 20, 2024, 6:37pm

Hi @Teng_Hoo !!

Here is how you can check Weaviate's distance against your cosine similarity function:

import weaviate
from weaviate import classes as wvc
from weaviate.util import generate_uuid5

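# text2vec-openai needs an OpenAI API key, e.g. headers={"X-OpenAI-Api-Key": ...} or one configured on the server
client = weaviate.connect_to_local()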

client.collections.delete("Collection")
collection = client.collections.create(
    "Collection",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai()
)


collection.data.insert({"text": "Something about cat"}, uuid=generate_uuid5("cat"))
collection.data.insert({"text": "That house is beautiful"}, uuid=generate_uuid5("house"))

# now comparing text1 vs text2: the "cat" object against every object in the collection
results = collection.query.near_object(
    near_object=generate_uuid5("cat"),
    return_metadata=wvc.query.MetadataQuery(distance=True)
)
for obj in results.objects:
    print(obj.properties, obj.metadata.distance)

# output
# {'text': 'Something about cat'} 0.0
# {'text': 'That house is beautiful'} 0.1678454875946045

# now using your function
import numpy as np

results = collection.query.fetch_objects(include_vector=True)
vec1 = results.objects[0].vector.get("default")
vec2 = results.objects[1].vector.get("default")

def _cosine_similarity(vec1: np.ndarray, vec2: np.ndarray):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Weaviate's cosine distance is 1 - cosine similarity
print(1 - _cosine_similarity(vec1, vec2))
# output
# 0.16784559529191356

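A note on why the numbers differ: with the cosine metric, Weaviate reports a distance, defined as 1 - cosine similarity (range 0 to 2), rather than the similarity itself, and its certainty value is 1 - distance / 2. Also, assuming the v4 Python client defaults, text2vec-openai may include the collection name in the text it embeds (the vectorize_collection_name option), so embedding the raw text yourself through the OpenAI API will not necessarily reproduce the stored vector; comparing against the vectors Weaviate actually stored avoids that. The sketch below shows those conversions in plain NumPy, plus a hypothetical helper, near_text_with_similarity, that turns near_text distances back into cosine similarities. All helper names here are illustrative, not part of the Weaviate API.

```python
import numpy as np


def cosine_similarity(vec1: np.ndarray, vec2: np.ndarray) -> float:
    """Plain cosine similarity, same formula as the function in the question."""
    return float(np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2)))


def cosine_distance(vec1: np.ndarray, vec2: np.ndarray) -> float:
    """Cosine distance as Weaviate reports it in metadata.distance: 1 - similarity."""
    return 1.0 - cosine_similarity(vec1, vec2)


def distance_to_similarity(distance: float) -> float:
    """Turn a Weaviate cosine distance back into a cosine similarity."""
    return 1.0 - distance


def distance_to_certainty(distance: float) -> float:
    """Weaviate's certainty for the cosine metric: maps distance [0, 2] onto [1, 0]."""
    return 1.0 - distance / 2.0


def near_text_with_similarity(collection, query: str, limit: int = 5):
    """Hypothetical helper: run near_text and attach cosine similarity to each hit.

    Requires a running Weaviate instance and a collection like the one above.
    """
    # Imported here so the pure NumPy helpers work without the client installed.
    from weaviate.classes.query import MetadataQuery

    response = collection.query.near_text(
        query=query,
        limit=limit,
        return_metadata=MetadataQuery(distance=True),
    )
    return [
        (obj.properties, distance_to_similarity(obj.metadata.distance))
        for obj in response.objects
    ]


if __name__ == "__main__":
    # Same random vectors compared in float64 and float32.
    rng = np.random.default_rng(0)
    a, b = rng.normal(size=1536), rng.normal(size=1536)
    d64 = cosine_distance(a, b)
    d32 = cosine_distance(a.astype(np.float32), b.astype(np.float32))
    print(d64, d32, abs(d64 - d32))
```

Running the file compares a float64 and a float32 computation of the same distance. The tiny gap between them is the same kind of difference seen in the last digits of the Weaviate and NumPy outputs above, which is consistent with Weaviate storing and comparing vectors as float32.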
Let me know if this helps!

Thanks!

Related topics:
- .near_text results are not satisfactory (distance scores too close) [Support, neartext], 2 replies, 1007 views, June 20, 2023
- Cosine similarity differs between ScikitLearn and Weaviate for SentenceTransformer vectors [Support, bug, developer-experience, python, technical], 0 replies, 167 views, December 25, 2024
- Weaviate cosine similarity completelly different than ScikitLearn with SentenceTransformer vectorizer [Support, bug, developer-experience, python, technical], 2 replies, 277 views, August 6, 2025
- How weaviate calculates score in similarity_search_with_score? [Support], 4 replies, 633 views, July 2, 2024
- Near_text with own embeddings [Support], 1 reply, 338 views, February 12, 2024
