Limit parameter changes results of near_vector query

Description

I’m getting an unexpected result running a near_vector search on my Weaviate database.

The database contains 570 objects.

Running a near vector query, using limit 3, I receive 3 wrong objects, the lowest distance being 0.6878337.

Running the exact same query without the limit parameter, the result is 100 objects, and the lowest distance is 0.5604408.

It seems that the query does not search the full database. Is this the expected behavior of the limit parameter? Even without it, can we expect the query to consider all the nodes, or can the internal limit of 100 also be an issue?
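With only 570 objects, one way to check whether the index is missing closer neighbors is to compute the exact top-k by brute force and compare it with what the query returns. The following is a minimal sketch in plain Python, assuming the vectors have already been exported into a dict; the names `cosine_distance`, `exact_top_k`, and the `vectors` mapping are hypothetical helpers for illustration, not part of the Weaviate client. It assumes cosine distance is defined as 1 minus cosine similarity.

```python
import math


def cosine_distance(a, b):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Return the k closest (distance, object_id) pairs by exhaustive scan.

    `vectors` is a hypothetical {object_id: vector} dict exported from the
    collection; every stored vector is compared against the query.
    """
    scored = sorted(
        (cosine_distance(query, vec), object_id)
        for object_id, vec in vectors.items()
    )
    return scored[:k]


if __name__ == "__main__":
    # Toy data standing in for the exported collection.
    toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for distance, object_id in exact_top_k([1.0, 0.0], toy_vectors, k=2):
        print(object_id, round(distance, 4))
```

If the exact top 3 differ from the 3 objects returned with limit 3, the difference comes from the approximate index search rather than from the data.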

Server Setup Information

  • Weaviate Server Version: 1.24.9 deployed; on my dev machine I can reproduce the issue on versions 1.23.16, 1.24.26, 1.25.24, 1.26.8 and 1.27.1
  • Deployment Method: docker
  • Client Language and Version: Python client 4.7.1 and direct GraphQL queries using the REST API
  • Multitenancy?: no

Hi @accorrea1-stf,

Welcome to our community! It’s great to have you here.

I wonder about the HNSW configuration in your cluster. Have you explored the ef parameter? A higher ef value makes the search explore more of the vector index graph, which improves accuracy at the cost of slower queries.

I would suggest trying a few fixed values such as 100, 250, 500, and 1000. My hope is that with one of these higher values you will get more consistent results, as the graph for the vector index will be explored more exhaustively.

Here’s some code to help you achieve this:

from weaviate.classes.config import Reconfigure

# Set a fixed search-time ef on the collection's HNSW index
collection.config.update(
    vector_index_config=Reconfigure.VectorIndex.hnsw(ef=512)
)

Best regards,
Mohamed Shahin
Weaviate Support

Thank you for your answer Mohamed!

I’m going to do some tests increasing the ef value.

Our current config is:

{'cleanupIntervalSeconds': 300,
  'distanceMetric': 'cosine',
  'dynamicEfMin': 100,
  'dynamicEfMax': 500,
  'dynamicEfFactor': 8,
  'ef': -1,
  'efConstruction': 128,
  'filterStrategy': 'sweeping',
  'flatSearchCutoff': 40000,
  'maxConnections': 64,
  'skip': False,
  'vectorCacheMaxObjects': 1000000000000}
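
With 'ef': -1, the search-time ef is not fixed but derived from the query limit. The sketch below models that calculation as it is described in the Weaviate HNSW documentation, to the best of my understanding: ef is the limit multiplied by dynamicEfFactor, clamped between dynamicEfMin and dynamicEfMax, and never lower than the limit itself. The function name `effective_ef` is a hypothetical helper for illustration, not a Weaviate API, and the exact server-side logic may differ between versions.

```python
def effective_ef(limit, factor, ef_min, ef_max):
    """Model of the dynamic ef used when 'ef' is -1 (an assumption based on
    the documented behavior, not the server's actual code)."""
    ef = limit * factor
    if ef > ef_max:
        ef = ef_max
    elif ef < ef_min:
        ef = ef_min
    # ef must be at least the number of requested results.
    return max(ef, limit)


if __name__ == "__main__":
    # Values from the config above: factor 8, min 100, max 500.
    for limit in (3, 100):
        print(limit, effective_ef(limit, factor=8, ef_min=100, ef_max=500))
```

Under these assumptions, limit 3 gives ef = 100 while the default limit of 100 gives ef = 500, so the two queries explore the graph to different depths. That would be consistent with the unlimited query finding a closer object.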

@accorrea1-stf Awesome, let me know how it goes.