In a nutshell, this is what I've done:
- Used a with_near_vector query to fetch articles that are close to an input prompt, with distance = 0.3. Roughly, the query looked like this (the class name, properties, and connection are placeholders for my actual setup):
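import weaviate

client = weaviate.Client("http://localhost:8080")  # placeholder connection

result = (
    client.query
    .get("Article", ["title", "content"])  # placeholder class/properties
    .with_near_vector({
        "vector": text_0_embed,  # embedding of the input prompt
        "distance": 0.3,         # max distance from the prompt
    })
    .with_additional(["distance"])
    .do()
)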
- Saved the returned articles into a separate class.
- To validate, I took 100 random samples from the new class (included) and 100 articles that were not added to the new class (excluded). Roughly, I built the two groups like this (class names and the fetch limit are placeholders):
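import random

# Fetch ids + vectors for a class (v3 client); "Article" / "ArticleTrimmed"
# stand in for my actual class names.
def get_vectors(class_name, limit=10000):
    resp = (
        client.query
        .get(class_name, ["title"])
        .with_additional(["id", "vector"])
        .with_limit(limit)
        .do()
    )
    objs = resp["data"]["Get"][class_name]
    return {o["_additional"]["id"]: o["_additional"]["vector"] for o in objs}

included_all = get_vectors("ArticleTrimmed")  # articles copied to the new class
excluded_all = {
    k: v for k, v in get_vectors("Article").items() if k not in included_all
}

# 100 random samples from each group
dict_included_vectors = dict(random.sample(list(included_all.items()), 100))
dict_excluded_vectors = dict(random.sample(list(excluded_all.items()), 100))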
- I used sklearn's cosine similarity implementation like so:
from sklearn.metrics.pairwise import cosine_similarity

# Cosine similarity of each sampled article vector against the prompt
# embedding; each call returns an (n, 1) array, one row per article.
similarity_orig_to_prompt = cosine_similarity(list(dict_excluded_vectors.values()), [text_0_embed])      # excluded group
similarity_trimmed_to_prompt = cosine_similarity(list(dict_included_vectors.values()), [text_0_embed])   # included group
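The averages below are just the means of those arrays, along the lines of:

import numpy as np

avg_excluded = float(np.mean(similarity_orig_to_prompt))     # excluded group
avg_included = float(np.mean(similarity_trimmed_to_prompt))  # included group
print((avg_excluded, avg_included))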
- Here were the average similarity measures (excluded, included):
(0.7117798016779362, 0.7196412496249364)
Both groups had essentially the same average similarity…
I am wondering why this is the case. It would be good to understand more about what is happening under the hood of with_near_vector. Is the distance metric used there comparable with scikit-learn's cosine similarity?
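My working assumption is that with_near_vector uses cosine distance by default, i.e. distance = 1 - cosine similarity, so my distance = 0.3 cutoff should correspond to a similarity of at least 0.7. A quick sanity check of that assumed relationship:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.random.rand(1, 768)  # stand-in embeddings (dimension is arbitrary)
b = np.random.rand(1, 768)

sim = cosine_similarity(a, b)[0, 0]
distance_if_cosine = 1.0 - sim  # assumed Weaviate cosine distance
print(sim, distance_if_cosine)

If that assumption is wrong, it would explain why the numbers don't line up the way I expected.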
Thank you so much; this is becoming a blocker for me and may lead me to consider other options.