Aggregate and query results mismatch

Description

I’m trying to implement pagination on a near_text search in Weaviate. I need the total number of results from aggregate, and then I page through the results with limit and offset. However, the total count returned by aggregate differs from the number of results returned by the query.

count = collection.aggregate.near_text(query=query, filters=filters, target_vector="search", distance=0.69)
AggregateReturn(properties={}, total_count=6)
res = collection.query.near_text(query=query, filters=filters, target_vector="search", distance=0.69)
len(res.objects)
2

I don’t know if I’m missing something.
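The paging arithmetic described above (a total count from aggregate, then limit and offset per page) can be sketched as follows. This is a minimal illustration, not part of the Weaviate client: `page_window` and `page_count` are hypothetical helper names, and pages are assumed to be 1-based with 10 results per page.

```python
from math import ceil


def page_count(total_count: int, page_size: int = 10) -> int:
    """Number of pages needed to show total_count results."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return ceil(total_count / page_size) if total_count > 0 else 0


def page_window(total_count: int, page: int, page_size: int = 10) -> tuple[int, int]:
    """Return (offset, expected) for a 1-based page number.

    offset   -> value to pass as offset= to the query
    expected -> how many objects that page should contain if the
                query agrees with the aggregate count
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    offset = (page - 1) * page_size
    remaining = max(total_count - offset, 0)
    return offset, min(page_size, remaining)


# Hypothetical usage with the calls from this post:
# total = collection.aggregate.near_text(...).total_count
# offset, expected = page_window(total, page=1)
# res = collection.query.near_text(..., offset=offset, limit=10)
# A mismatch between len(res.objects) and expected is the issue reported here.
```

With a total count of 6 and 10 results per page, the first page is expected to hold all 6 objects, which is what the query above fails to deliver.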

Hello @Antonio_Cesari,

Welcome to our community! It’s great to have you here.

The difference between the aggregate and query results comes from how Weaviate handles aggregation versus query searches. The aggregate method counts all potential matches within the given distance, whereas the query method also applies ranking and a limit. By default, the query returns only the top results ranked by relevance, so it can return fewer objects than the total count reported by the aggregation.

I’ve seen your ticket in our support system. Let’s continue working together there since you’re using a Cloud cluster.

Best regards!

Hello Mohamed,

Well, I was already using limit in my application code. However, after some more tests, I’ve found something weird:

res = collection.query.near_text(query=query, offset=0, limit=10, filters=filters, target_vector='search', distance=0.69)
print(len(res.objects))
2
res = collection.query.near_text(query=query, limit=10, filters=filters, target_vector='search', distance=0.69)
print(len(res.objects))
2
res = collection.query.near_text(query=query, limit=100, filters=filters, target_vector='search', distance=0.69)
print(len(res.objects))
6

res = collection.query.near_text(query=query, offset=0, limit=100, filters=filters, target_vector='search', distance=0.69)
print(len(res.objects))
6

If I limit the results to 10, I get only 2; if I raise the limit to 100, I get all 6. My pagination shows 10 results per page, so the app was returning only 2, but I expect to get all 6.

Let me know if you need more information.
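Until the cause is understood, one possible stopgap is to over-fetch with the larger limit and slice the page on the client side. The sketch below assumes the behaviour observed above (a limit of 100 returning all matches) holds for the data in question; `fetch_page`, `run_query`, and `fetch_limit` are hypothetical names for illustration, not part of the Weaviate client, and this is not an official fix.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def fetch_page(
    run_query: Callable[[int], Sequence[T]],
    page: int,
    page_size: int = 10,
    fetch_limit: int = 100,
) -> list[T]:
    """Run the query once with a large limit, then slice out one page.

    run_query is any callable that takes a limit and returns the
    matching objects in ranked order.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    offset = (page - 1) * page_size
    if offset + page_size > fetch_limit:
        # The requested page would not fit in the over-fetched window.
        raise ValueError("page falls outside the over-fetched window")
    objects = run_query(fetch_limit)
    return list(objects[offset:offset + page_size])


# Hypothetical usage with the query from this thread:
# first_page = fetch_page(
#     lambda n: collection.query.near_text(
#         query=query, limit=n, filters=filters,
#         target_vector="search", distance=0.69,
#     ).objects,
#     page=1,
# )
```

The trade-off is that every page request transfers up to `fetch_limit` objects, so this only suits small result sets such as the 6 matches seen here.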