Description
I have a collection with 323,025 documents. As I load more documents, I run an aggregate query that counts the documents in each category. When there were 306,000 documents there were 38 categories, and the query took between 20 and 30 seconds with the occasional timeout. The aggregate query now returns no results at all because it times out.
I have tried adding specific timeout values when I create the client (code below), but I still get the timeout at 30 seconds.
I saw an earlier post where reinstalling the V4 library solved this issue, so I upgraded my weaviate-client to the latest version. There was no effect.
Server Setup Information
Weaviate Cloud
- Weaviate Server Version: 1.25.20
- Deployment Method: n/a
- Multi Node? Number of Running Nodes:
- Client Language and Version: Python V4 (weaviate-client 4.12.1)
- Multitenancy?: no
Any additional Information
Client connect code:
import os

import weaviate
from weaviate.classes.init import AdditionalConfig, Auth, Timeout

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ["wv_url"],
    auth_credentials=Auth.api_key(os.environ["wv_api_key"]),
    headers={
        "X-Cohere-Api-Key": os.environ["cohere_api_key"]
    },
    additional_config=AdditionalConfig(
        timeout=Timeout(init=60, query=60, insert=120)  # Values in seconds
    )
)
Hi @michelca,
Welcome to our community – it’s great to have you here!
Could you please share the aggregation query you’re using, especially the filter you’ve applied? Also, what’s the expected output when filtering your dataset of 300k+ objects? For instance, are you expecting around 1K results, more, or less?
Additionally, have you observed any noticeable resource usage patterns — like spikes in CPU or memory — during the query?
I’ve seen a similar case at a larger scale, so improvements to aggregation with filters might be needed. Your details will really help us investigate.
Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)
Hi Mohamed - and thanks for the welcome!
Previously I was getting 38 results, indicating (306,000 / 38 =) about 8050 per category.
Now I know there are 323,000 documents, so I’m guessing that there will be (323,000 / 8050) about 40 results.
I believe the size of the response JSON would be about 6k - is this what you wanted to know?
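As a quick check of that estimate, here is a minimal Python sketch of the arithmetic. It uses the 306,000 and 38 figures from this post and the 323,025 collection size from the original post, and it assumes documents are spread evenly across categories, which is a simplification. The function names are just for illustration.

```python
# Figures quoted in this thread; the even spread across categories is an assumption.
previous_docs = 306_000
previous_categories = 38
current_docs = 323_025  # collection size given in the original post


def docs_per_category(total_docs: int, categories: int) -> float:
    """Average number of documents per category."""
    return total_docs / categories


def estimate_categories(total_docs: int, per_category: float) -> int:
    """Estimated category count if the per-category average stays the same."""
    return round(total_docs / per_category)


per_category = docs_per_category(previous_docs, previous_categories)
estimated = estimate_categories(current_docs, per_category)
print(f"{per_category:.0f} documents per category, about {estimated} categories expected")
```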
This is a Weaviate-hosted managed cluster, so I’m not sure where I would find logs?
The query I’m running is as follows (with some proprietary strings changed):
{
  Aggregate {
    Dictionary_ChL_2 (groupBy: ["category"]) {
      groupedBy {
        path
        value
      }
      meta {
        count
      }
    }
  }
}
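For reference, here is a rough Python sketch of how the per-category counts could be pulled out of the response to this query. The helper names are hypothetical, the sample response values are made up, and the response shape is an assumption based on the fields requested above (a list of groups under data.Aggregate.<collection>), so treat it as illustrative rather than definitive.

```python
def build_aggregate_query(collection: str, prop: str) -> str:
    """Build the GraphQL Aggregate query shown above for a collection and property."""
    return (
        '{ Aggregate { %s (groupBy: ["%s"]) '
        "{ groupedBy { path value } meta { count } } } }" % (collection, prop)
    )


def counts_by_group(response: dict, collection: str) -> dict:
    """Map each grouped value to its document count.

    Assumes the response nests the groups under data -> Aggregate -> <collection>.
    """
    groups = response["data"]["Aggregate"][collection]
    return {g["groupedBy"]["value"]: g["meta"]["count"] for g in groups}


# Invented sample response, for illustration only (not real data).
sample_response = {
    "data": {
        "Aggregate": {
            "Dictionary_ChL_2": [
                {"groupedBy": {"path": ["category"], "value": "alpha"}, "meta": {"count": 8100}},
                {"groupedBy": {"path": ["category"], "value": "beta"}, "meta": {"count": 7950}},
            ]
        }
    }
}

query = build_aggregate_query("Dictionary_ChL_2", "category")
counts = counts_by_group(sample_response, "Dictionary_ChL_2")
print(len(counts), "categories,", sum(counts.values()), "documents in total")
```

The number of keys in the resulting dictionary is the "number of results" I refer to above.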
Oh @michelca — if this is hosted by us, please open a ticket by emailing support@weaviate.io. That’s our dedicated support channel for customers, and it allows us to check server logs, monitoring, and provide deeper assistance on our end.
Once you open the ticket, I’ll take over from there, and we can continue the investigation. Later, I’ll summarize the findings here for future reference.
Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)