Vectors within a specified distance range from query

Hi everyone. Is there a way to retrieve vectors within a specified cosine distance range from the query vector without having to retrieve all the vectors that are closer? For example, retrieving vectors which have a cosine distance in the range [0.5,0.6] from the query vector.

hi @skb !!

Welcome to our community :hugs:

Sorry, missed your message :grimacing:

If I understood it correctly, this is what you are looking for:

So you can set a threshold of distance.

Also, you will want to take a look at our auto cut / auto limit feature:

Let me know if this helps!

Thanks!

Thanks @DudaNogueira !

Searching for vectors near a query vector returns results with distances ranging from 0 up to the threshold distance. I wanted to know if there is a way to limit the results to only the vectors with a distance between dist1 and dist2.

I know I can do this using two searches with threshold distances of dist1 and dist2. But if the distances are large, I end up having to fetch two large result sets and compute the difference. It would be great if there were a way to filter out results closer than dist1 while using a threshold distance of dist2 in a single query.

Hi!

So you need a minimum threshold on top of the one parameter we have?

I am curious what the use case is here :thinking:

You can always post-process the search results on the client. The only downside of this approach is that the objects your post-processing discards will have been sent unnecessarily from the database to the client.
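As a rough illustration of that post-processing step, here is a minimal sketch. It assumes the search results have already been fetched with the upper threshold (dist2) and are available as `(object_id, distance)` pairs; the function name `filter_distance_band` and the sample data are hypothetical, not part of any client library.

```python
from typing import List, Tuple

Result = Tuple[str, float]  # (object_id, cosine distance to the query)

def filter_distance_band(results: List[Result], dist1: float, dist2: float) -> List[Result]:
    """Keep only results whose distance lies in the closed range [dist1, dist2]."""
    return [(oid, d) for oid, d in results if dist1 <= d <= dist2]

# Hypothetical results of a search that used dist2 = 0.6 as its threshold.
results = [("a", 0.12), ("b", 0.48), ("c", 0.52), ("d", 0.59), ("e", 0.60)]
band = filter_distance_band(results, 0.5, 0.6)
print(band)
```

Everything closer than dist1 (here "a" and "b") still crosses the network before being dropped, which is the cost mentioned above.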

This could be a nice feature request, but it would be interesting to request it with some compelling use cases.

Let me know if this helps!

Sorry for the late reply. This is the use case (not likely to be a very common one): I’m looking to build a classifier model for a category that the query vector represents. I’m looking to select items for the training data spread over the entire range of distances from the query vector. For the higher distances, this represents a lot of items. So, it would be helpful to have a way to randomly choose a particular distance range and identify only the items falling in that range.

I have found an alternative way to do this. I use aggregate queries to get the number of items within threshold distances of dist1 and dist2. Then I randomly choose a number in between and use it as the offset to pick an item inside the distance range.
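For anyone who wants to try the count-and-offset idea, here is a minimal sketch of the logic. It uses an in-memory sorted list of distances as a stand-in for the database, so `count_within` and `fetch_at_offset` are hypothetical helpers that mimic an aggregate count and a thresholded search with an offset; they are not real client calls. It also assumes the search returns results in ascending order of distance, which is what makes the offset land inside the band.

```python
import bisect
import random
from typing import List, Optional

def count_within(sorted_distances: List[float], threshold: float) -> int:
    """Stand-in for an aggregate query: number of items with distance <= threshold."""
    return bisect.bisect_right(sorted_distances, threshold)

def fetch_at_offset(sorted_distances: List[float], threshold: float, offset: int) -> float:
    """Stand-in for a thresholded search with limit=1 and the given offset."""
    within = sorted_distances[:count_within(sorted_distances, threshold)]
    return within[offset]

def sample_in_band(sorted_distances: List[float], dist1: float, dist2: float,
                   rng: random.Random) -> Optional[float]:
    """Pick one random item with dist1 < distance <= dist2, or None if the band is empty."""
    n1 = count_within(sorted_distances, dist1)  # items at or closer than dist1
    n2 = count_within(sorted_distances, dist2)  # items at or closer than dist2
    if n2 <= n1:
        return None
    offset = rng.randrange(n1, n2)  # offsets n1 .. n2-1 fall inside the band
    return fetch_at_offset(sorted_distances, dist2, offset)

distances = [0.1, 0.3, 0.5, 0.55, 0.58, 0.7]  # made-up data, sorted ascending
rng = random.Random(0)
samples = [sample_in_band(distances, 0.5, 0.6, rng) for _ in range(20)]
print(samples)
```

Only two counts and one single-item fetch are needed per sample, so no large result set is transferred. Note that items exactly at dist1 are excluded in this sketch because they are counted in the first aggregate.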


Oh, that’s a smart solution!!

Thanks for sharing.
