Weaviate Cluster OOMs and Recovery

Lewiky · June 14, 2023, 1:29pm

I’m the admin of a Weaviate 1.13 cluster deployed on Kubernetes that is used by a few different teams in my organisation.

Several of the replicas in the cluster have recently gone down due to OOM errors, and now they seem unable to recover and are continuously crashlooping. The logs for each of the instances have nothing of note, even when set to DEBUG level. Several classes are now inaccessible due to data loss.

What’s the best way to recover from this scenario? I assume we need to delete the missing classes and re-index them? Is there some configuration I can set in Weaviate that will stop it from indexing new content when it’s close to its memory limit?

Thanks!

jphwang · June 14, 2023, 1:46pm

Hi @Lewiky. I’ll pass that on to the team internally and someone will get back to you here.

antas-marcin · June 14, 2023, 1:50pm

Your Weaviate cluster is on v1.13.x, yes?

Lewiky · June 14, 2023, 2:04pm

That’s right. I’ve been trying to get authorisation internally to upgrade to a newer version.

If the solution is to just upgrade because this problem is fixed - that’s music to my ears!

antas-marcin · June 14, 2023, 7:14pm

With later versions of Weaviate you can set the GOMEMLIMIT environment variable, which should be set 10-20% below the total memory available to Weaviate (that is, to roughly 80-90% of it). This setting greatly helps with OOM kills. Besides that, we have made numerous improvements, such as:

backup API for backing up your DB
roaring bitmaps, which greatly improve performance
PQ compression
BM25 and Hybrid filters
replication

and much, much more. I would suggest upgrading, but of course please make a backup before doing so.
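
As a rough illustration of the GOMEMLIMIT suggestion above, here is a minimal sketch that derives a value from a container memory limit. It assumes the intent is to leave 10-20% headroom below the limit; the function name `gomemlimit_for`, the 15% default, and the 8192 MiB figure are hypothetical and only for illustration. The `MiB` suffix is one of the unit suffixes the Go runtime accepts for GOMEMLIMIT.

```python
def gomemlimit_for(container_limit_mib: int, headroom_percent: int = 15) -> str:
    """Return a GOMEMLIMIT string that leaves headroom below the container limit.

    Hypothetical helper: the 10-20% range follows the suggestion in this thread.
    """
    if container_limit_mib <= 0:
        raise ValueError("container_limit_mib must be positive")
    if not 0 < headroom_percent < 100:
        raise ValueError("headroom_percent must be between 1 and 99")
    # Integer arithmetic avoids floating-point rounding surprises.
    soft_limit_mib = container_limit_mib * (100 - headroom_percent) // 100
    return f"{soft_limit_mib}MiB"


if __name__ == "__main__":
    # Example: a pod with an 8 GiB (8192 MiB) memory limit (assumed figure).
    print("GOMEMLIMIT=" + gomemlimit_for(8192))
```

The resulting string would be set as the GOMEMLIMIT environment variable on the Weaviate container, on a version built with a Go runtime that supports it.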
