Is the cosine distance bound in the docs correct?

Guillermo_Ripa · September 5, 2024, 8:56pm

Description

In the docs, there’s a note that says vectors are normalized for cosine similarity, and then we use dot product.

Wouldn’t that mean that the cosine distance ends up being -1 <= distance <= 1? Or do we do 1 - dot(a,b) in the code?

I’m asking because we found negative distances when using cosine and returning them in the metadata.

DudaNogueira · September 6, 2024, 8:31pm

Hi @Guillermo_Ripa!

That’s an interesting question.

I will need to ask internally for more context on this.

I’ll get back with more info. Thanks!

andrewisplinghoff · September 10, 2024, 9:57am

The cosine distance should never be negative; it is defined as 1 - dot(a,b) in the code:

https://github.com/weaviate/weaviate/blob/598b4db26ba52c77f7777996ac100f48cd895431/adapters/repos/db/vector/hnsw/distancer/cosine_dist.go#L44

    func NewCosineDistanceProvider() CosineDistanceProvider {
    	return CosineDistanceProvider{}
    }

    func (d CosineDistanceProvider) SingleDist(a, b []float32) (float32, error) {
    	if len(a) != len(b) {
    		return 0, errors.Wrapf(ErrVectorLength, "%d vs %d",
    			len(a), len(b))
    	}

    	// Vectors are normalized beforehand, so the dot product equals the cosine similarity.
    	prod := 1 - dotProductImplementation(a, b)

    	return prod, nil
    }

    func (d CosineDistanceProvider) Type() string {
    	return "cosine-dot"
    }

    func (d CosineDistanceProvider) New(a []float32) Distancer {
    	return &CosineDistance{a: a}
    }

The tests also show the expected distance values, e.g. opposing vectors lead to a cosine distance of 2:

https://github.com/weaviate/weaviate/blob/598b4db26ba52c77f7777996ac100f48cd895431/adapters/repos/db/vector/hnsw/distancer/cosine_dist_test.go#L66-L68

    vec1 := Normalize([]float32{0.1, 0.3, 0.7})
    vec2 := Normalize([]float32{-0.1, -0.3, -0.7})
    expectedDistance := float32(2)

Could you give us an example where a negative cosine distance was calculated?

Guillermo_Ripa · September 10, 2024, 12:40pm

Thanks @DudaNogueira and @andrewisplinghoff for the code snippets! That’s really helpful.

The distance was -1e-5, so I brushed it off as a floating-point error. But it got me looking into the docs.

The code snippet and the unit test put my doubts to rest. Thank you!
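
A value such as -1e-5 is consistent with float32 rounding: the dot product of two nearly identical unit vectors can land slightly above 1, so 1 - dot dips just below 0. If downstream code needs a non-negative distance, one option is to clamp tiny negatives on the client side. The sketch below assumes a hypothetical helper `clampDistance` and an arbitrary tolerance; neither is part of Weaviate.

```go
package main

import "fmt"

// clampDistance treats small negative distances as rounding noise and maps them to 0.
// Values more negative than -tol are returned unchanged so real problems stay visible.
// Hypothetical helper for illustration; tol is an assumption to tune per use case.
func clampDistance(d, tol float32) float32 {
	if d < 0 && d >= -tol {
		return 0
	}
	return d
}

func main() {
	const tol = 1e-4

	fmt.Println(clampDistance(-1e-5, tol)) // within tolerance, clamped to 0
	fmt.Println(clampDistance(0.25, tol))  // ordinary distance, unchanged
	fmt.Println(clampDistance(-0.5, tol))  // far outside tolerance, left as-is for investigation
}
```

Leaving clearly negative values untouched is deliberate: a distance like -0.5 would point to unnormalized vectors or a different metric rather than rounding.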

DudaNogueira · September 10, 2024, 1:15pm

Great! Thanks for jumping in, @andrewisplinghoff!

That was the code that our team pointed me to.

Thanks!!
