Similarity search returns chunks that all have exactly the same distance value

Something is not quite right with the matches I’m getting from the Weaviate similarity search. After making the following query, the resulting chunks all have the same distance value (down to the very last decimal). It doesn’t matter which max_matches value I choose; the distance for every chunk is still the same. The chunks themselves are quite different, though, and if I manually compute a regular dot product between the embedded query and the embedded chunks, I get a different dot product for every chunk.

result = self.client.query \
    .get(.....chunks with some properties.....) \
    .with_where(where_clause) \
    .with_near_vector({"vector": embedded_query}) \
    .with_limit(max_matches) \
    .with_additional(['distance', 'id']) \
    .do()

In the schema, distance is defined as “dot”. Also, I’m using OpenAI ada-002 embeddings to embed the chunks and the queries. (However, the chunks are not vectorized by Weaviate’s text2vec-openai module but separately, by calling the OpenAI embedding endpoint. Could that be an issue?)

In general, what exactly happens underneath with_near_vector?

Hi @A_S and welcome!

From what you describe, it sounds as though maybe something went awry with the vector generation when adding the data to Weaviate, and the underlying vectors might have been identical somehow.

Could you try retrieving the stored vectors (like shown here), with .with_additional(['distance', 'id', 'vector']), and see whether the vectors actually differ underneath?

If they turn out to be identical, I would suspect there was an issue with the vector generation script, or with the insertion script while looping through the dataset.
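As a rough way to run that check, here is a minimal sketch that groups returned objects by their stored vector and reports any ids that share one. It assumes the response has the usual shape from the Python client's .do() call, with each object carrying an _additional dict that holds id and vector. The function name group_identical_vectors, the rounding tolerance, and the sample data are made up for illustration.

```python
from collections import defaultdict


def group_identical_vectors(objects, ndigits=6):
    """Return lists of object ids whose stored vectors are identical
    after rounding each component to `ndigits` decimals."""
    groups = defaultdict(list)
    for obj in objects:
        extra = obj["_additional"]
        # Tuples are hashable, so the rounded vector can serve as a dict key.
        key = tuple(round(x, ndigits) for x in extra["vector"])
        groups[key].append(extra["id"])
    # Only groups with more than one id indicate duplicated vectors.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample standing in for result["data"]["Get"]["<YourClass>"].
sample = [
    {"_additional": {"id": "a", "vector": [0.1, 0.2, 0.3]}},
    {"_additional": {"id": "b", "vector": [0.1, 0.2, 0.3]}},
    {"_additional": {"id": "c", "vector": [0.3, 0.1, 0.0]}},
]

duplicates = group_identical_vectors(sample)
print(duplicates)  # [['a', 'b']]
```

If every chunk ends up in one group, the same embedding was most likely written for each object, which would explain the identical distance values.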

Let us know how you go, and if it’s not one of those issues we can dig into it further.

Cheers!
JP

Thank you for your quick response! You are right, the vectors underneath were identical for different text chunks, so the identical distance scores make sense. I will fix the vector generation script.

Yay, I’m glad we got to the bottom of that. Cheers!