Similarity search returns chunks that all have exactly the same distance value

A_S · November 24, 2023, 11:03am

There is something not quite right with the matches I’m getting from Weaviate’s similarity search. After running the query below, the returned chunks all have the same distance value (down to the very last decimal), no matter which max_matches value I choose. The chunks themselves are quite different, though, and if I manually compute a regular dot product between the embedded query and each embedded chunk, I get a different result for every chunk.

result = self.client.query \
            .get(.....chunks with some properties.....) \
            .with_where(where_clause) \
            .with_near_vector({"vector": embedded_query}) \
            .with_limit(max_matches) \
            .with_additional(['distance', 'id']) \
            .do()

In the schema, the distance metric is set to “dot”. I’m using OpenAI ada-002 embeddings for both the chunks and the queries. However, the chunks are not vectorized by Weaviate’s text2vec-openai module; I embed them separately by calling the OpenAI embeddings endpoint. Could that be an issue?
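For comparison with the manual dot-product check, here is a minimal sketch of the “dot” distance computed locally in plain Python. It assumes, as Weaviate’s documentation describes for this metric, that the reported distance is the negative dot product, so a lower value means a closer match. The function names are illustrative and not part of any client library.

```python
# Minimal sketch: reproduce the "dot" distance locally for a query and some chunk vectors.
# Assumption: the "dot" metric is reported as the NEGATIVE dot product,
# so a lower (more negative) distance means a more similar chunk.
from typing import List, Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))


def dot_distance(query: Sequence[float], vector: Sequence[float]) -> float:
    """Distance as reported for the "dot" metric: the negated dot product."""
    return -dot_product(query, vector)


def distances(query: Sequence[float], vectors: List[Sequence[float]]) -> List[float]:
    """Distance from the query to every stored vector, in input order."""
    return [dot_distance(query, v) for v in vectors]


if __name__ == "__main__":
    query = [1.0, 2.0]
    # Three different chunk vectors give three different distances.
    print(distances(query, [[1.0, 2.0], [2.0, 1.0], [3.0, 0.0]]))  # [-5.0, -4.0, -3.0]
    # The same vector stored for every chunk gives identical distances.
    print(distances(query, [[2.0, 1.0]] * 3))  # [-4.0, -4.0, -4.0]
```

Since ada-002 embeddings are normalized to length 1, the dot product here equals cosine similarity. The second call reproduces the symptom described above: if every chunk carries the same stored vector, every distance comes out the same regardless of the chunk text.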

In general, what exactly happens under the hood of with_near_vector?
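The thread does not go into the internals, but conceptually a near-vector query with a where filter and a limit keeps only the objects that match the filter, ranks them by distance to the query vector, and returns the closest ones. The sketch below is an exact brute-force version under stated assumptions: objects are plain dicts with hypothetical "properties" and "vector" keys, the where clause is modeled as a Python predicate, and the metric is “dot”. Weaviate itself searches an approximate vector index (HNSW by default) rather than scanning every object, so treat this as an illustration of the expected result, not of its implementation.

```python
# Conceptual sketch of a filtered near-vector query with a limit.
# Assumptions: objects are dicts with hypothetical "properties" and "vector" keys,
# `where` is a Python predicate standing in for the where clause, and the metric
# is "dot" (distance = negative dot product). Not Weaviate's actual code path.
from typing import Any, Callable, Dict, List


def near_vector(
    objects: List[Dict[str, Any]],
    query: List[float],
    where: Callable[[Dict[str, Any]], bool],
    limit: int,
) -> List[Dict[str, Any]]:
    hits = []
    for obj in objects:
        # 1. The filter restricts which objects are candidates at all.
        if not where(obj["properties"]):
            continue
        # 2. Score each remaining candidate with the dot distance.
        distance = -sum(q * v for q, v in zip(query, obj["vector"]))
        hits.append({"properties": obj["properties"], "distance": distance})
    # 3. Closest first (lowest distance), then keep at most `limit` results.
    hits.sort(key=lambda h: h["distance"])
    return hits[:limit]
```

Note that the limit only controls how many ranked results come back; it cannot change the distance of any individual object, which is consistent with max_matches having no effect on the repeated distance value.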

jphwang · November 24, 2023, 2:58pm

Hi @A_S and welcome!

From what you describe, it sounds as though maybe something went awry with the vector generation when adding the data to Weaviate, and the underlying vectors might have been identical somehow.

Could you try retrieving the stored vectors (as shown here) by using .with_additional(['distance', 'id', 'vector']), and check whether the vectors underneath are identical?

If they turn out to be identical, I would suspect an issue with the vector generation script, or with the insertion script while looping through the dataset.
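The check suggested above could look roughly like this. It is a sketch assuming the v3 Python client’s response shape, result["data"]["Get"][class_name], where each object carries an "_additional" dict that includes "id" and "vector" once 'vector' is added to .with_additional(...). The class name "Chunk" used in the note below is hypothetical.

```python
# Minimal sketch: find returned chunks that share exactly the same stored vector.
# Assumption: `result` is the dict returned by `.do()` in the v3 client, with
# "vector" requested via `.with_additional([...])`.
from collections import defaultdict
from typing import Any, Dict, List


def duplicate_vector_groups(result: Dict[str, Any], class_name: str) -> List[List[str]]:
    """Return groups of object ids that share an identical vector (only groups of size > 1)."""
    ids_by_vector = defaultdict(list)
    for obj in result["data"]["Get"][class_name]:
        extra = obj["_additional"]
        # Lists are unhashable, so a tuple of the components serves as the key.
        ids_by_vector[tuple(extra["vector"])].append(extra["id"])
    return [ids for ids in ids_by_vector.values() if len(ids) > 1]
```

Calling duplicate_vector_groups(result, "Chunk") on the output of the original query returns an empty list when every returned chunk has its own vector; any non-empty group means different objects were stored with the same vector, which points at the embedding or insertion script rather than at the search itself.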

Let us know how you go, and if it’s neither of those, we can dig into it further.

Cheers!
JP

A_S · November 27, 2023, 7:31am

Thank you for your quick response! You are right: the stored vectors were identical for different text chunks, so identical distance scores make sense. Will tackle the vector script.
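For anyone hitting the same problem, one way to guard the embedding step is to check, before inserting, that there is exactly one vector per chunk and that different texts did not end up with the exact same vector (one possible cause is an embedding variable assigned once outside the loop and then reused). This is a sketch: embed is a hypothetical stand-in for the call to the OpenAI embeddings endpoint, assumed to return vectors in input order, and the "text" property name is illustrative.

```python
# Sketch: pair each chunk with its own vector and refuse to continue if the
# embedding step looks broken. `embed` is a hypothetical stand-in for the
# OpenAI embeddings call; it must return one vector per input text, in order.
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def reused_vectors(chunks: Sequence[str], vectors: Sequence[Vector]) -> List[Tuple[str, str]]:
    """Return (first_text, later_text) pairs of *different* texts that share an identical vector."""
    first_text_for: Dict[Tuple[float, ...], str] = {}
    problems = []
    for text, vector in zip(chunks, vectors):
        key = tuple(vector)  # lists are unhashable; tuples can be dict keys
        if key in first_text_for and first_text_for[key] != text:
            problems.append((first_text_for[key], text))
        first_text_for.setdefault(key, text)
    return problems


def pair_chunks_with_vectors(
    chunks: Sequence[str],
    embed: Callable[[List[str]], List[Vector]],
) -> List[Tuple[Dict[str, str], Vector]]:
    """Embed all chunks and return (properties, vector) pairs ready for insertion."""
    vectors = embed(list(chunks))
    if len(vectors) != len(chunks):
        raise ValueError(f"got {len(vectors)} vectors for {len(chunks)} chunks")
    problems = reused_vectors(chunks, vectors)
    if problems:
        raise ValueError(f"identical vectors for different chunks, e.g. {problems[0]!r}")
    return [({"text": text}, vector) for text, vector in zip(chunks, vectors)]
```

With the v3 Python client, each (properties, vector) pair could then be added via client.batch.add_data_object(properties, "Chunk", vector=vector), where "Chunk" is again a hypothetical class name. Passing the vector explicitly per object keeps the pairing between text and embedding visible in one place.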

jphwang · November 29, 2023, 10:41am

Yay, I’m glad we got to the bottom of that. Cheers!
