[How to recover from Weaviate cluster crash due to memory limit?]


Hi Weaviate community, I have a Weaviate cluster running on AWS EKS. It crashed earlier while importing data because it hit the allocated memory limit.
Here are some of the error logs:

“No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true”

“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”
It seems to be stuck in a deadlock: EKS keeps restarting the service and Weaviate keeps crashing. As a result, I cannot connect to Weaviate from the client side to delete some data and get back under the memory limit. Deploying a new resource config (managed through a Helm chart) to the Weaviate pod doesn't work either, because the pod keeps crashing and ArgoCD fails to sync the new config.
Is there a way to solve this issue? Any suggestions would be much appreciated, thank you!

ps: I added LIMIT_RESOURCES: true as an environment variable in the Helm chart after this crash, but again the new config change is not picked up by the Weaviate cluster through ArgoCD because the pod keeps crashing.
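For reference, the env change in my values.yaml looks roughly like this (the exact key layout may differ by chart version):

```yaml
# values.yaml (sketch; verify the env key against your chart version)
env:
  LIMIT_RESOURCES: true
```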

Server Setup Information

  • Weaviate Server Version: 1.23.7
  • Deployment Method: k8s
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: 3.21.0

Any additional Information


Depending on the amount of data you have, it may take some time for Weaviate to start up, and the liveness and readiness probes from K8s may time out, forcing a restart of the nodes.

Can you try increasing those values?
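For example, the probe timings can usually be raised through the chart values along these lines (key names are a sketch and may differ by chart version, so check your chart's values.yaml):

```yaml
# values.yaml (sketch; verify probe keys against your chart version)
livenessProbe:
  initialDelaySeconds: 900   # give WAL recovery plenty of time before the first liveness check
  periodSeconds: 10
  failureThreshold: 30       # tolerate slow startup instead of restarting the pod
readinessProbe:
  initialDelaySeconds: 3
  periodSeconds: 10
```

The idea is to stop K8s from killing the pod while Weaviate is still replaying the write-ahead log after the crash.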

Let me know if this helps, otherwise I can ask for help from our SRE team :slight_smile:

Didn’t find a solution to break the ‘deadlock’ state.
I ended up deleting the pod completely and recreating one to unblock.
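For anyone hitting the same loop, the manual unblock was roughly this (namespace and pod name are examples from a typical single-node install, adjust to yours):

```shell
# namespace/pod names are assumptions; check your own release first
kubectl -n weaviate get pods                # confirm which pod is in CrashLoopBackOff
kubectl -n weaviate delete pod weaviate-0   # delete it; the StatefulSet spawns a fresh pod
```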