RAM Not Freed After Deleting ~600K Objects – Is Restarting Weaviate the Only Option?

Description

Hi everyone,

We’re running a Weaviate cluster on Kubernetes and recently deleted ~600K stale objects from our internal accounts to free up space.

Here’s a comparison before and after the deletion:

Volume

  • Before: 66.3 GiB used (22.5%)
  • After: 59.7 GiB used (20.2%)
    ~6.6 GiB of disk space freed

Inodes

  • Before: 13.24M used (67.4%)
  • After: 13.00M used (66.2%)
    So, inode usage also dropped as expected.

Memory (RAM)

  • Before: 47.37 GiB
  • After: 46.72 GiB
    Only a ~0.65 GiB (~665 MiB) reduction, despite freeing about 6.6 GiB of disk space.

From online discussions, it seems that while deleted objects are immediately removed from disk, memory might not be reclaimed immediately due to caching or background tasks.

I’ve come across two options:

  1. Wait – memory might be freed eventually.
  2. Restart the Weaviate pods to trigger memory cleanup.

My Questions:

  • Is it expected that Weaviate holds on to RAM even after significant deletions?
  • Is restarting the cluster the only reliable way to force memory to be freed?
  • Should we see a RAM drop roughly equal to deleted data (~6.6 GiB)? Or is memory management more nuanced?

Thanks in advance for your help!

Server Setup Information

  • Weaviate Server Version: 1.22.5
  • Deployment Method: k8s
  • Client Language and Version: Python, weaviate-client 3.25.3
  • Multitenancy: Yes

hi @akhilsharma !!

1.22.5 is quite an old version, and a lot has improved since, especially in tombstone/delete management from 1.24 onward.

You can find more information on this here:

Can you check your metrics for any dangling tombstones?

Look for the vector_index_tombstones metric.
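A minimal Python sketch of that check, assuming Prometheus monitoring is enabled on the Weaviate pods (PROMETHEUS_MONITORING_ENABLED=true) and metrics are served on port 2112; adjust METRICS_URL for your cluster. It uses only the standard library to fetch the Prometheus text output and sum the vector_index_tombstones samples. The helper names are illustrative, and the labels attached to each sample may differ between Weaviate versions.

```python
import urllib.request

# Assumption: Weaviate runs with PROMETHEUS_MONITORING_ENABLED=true and serves
# Prometheus metrics on port 2112; point this at your pod or Service instead.
METRICS_URL = "http://localhost:2112/metrics"


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Download the raw Prometheus text exposition from the metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metric(metrics_text: str, metric_name: str = "vector_index_tombstones") -> dict[str, float]:
    """Return {label_string: value} for every sample of `metric_name`."""
    samples: dict[str, float] = {}
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            labels, _, tail = rest.rpartition("}")
        else:
            name, _, tail = line.partition(" ")
            labels = ""
        if name != metric_name:
            continue
        # The first token after the labels is the value; an optional timestamp may follow.
        samples[labels] = float(tail.split()[0])
    return samples


if __name__ == "__main__":
    try:
        text = fetch_metrics()
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print(f"could not reach {METRICS_URL}: {exc}")
    else:
        tombstones = parse_metric(text)
        # Largest backlogs first; a multi-tenant collection may report many per-shard samples.
        for labels, count in sorted(tombstones.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{count:>12.0f}  {labels}")
        print("total tombstones:", sum(tombstones.values()))
```

A total that stays high long after the deletes would suggest tombstone cleanup has not caught up yet.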

Let me know if this helps!

Thanks!

Deleting hundreds of thousands of stale vectors is brutal on HNSW memory overhead. We bypassed this entirely for our clinical RAG pipelines because batch deletions were crashing our clusters.

Instead of physically deleting old data, we built an intermediate temporal governance API. It intercepts the retrieval payload and applies deterministic decay scoring (exponential/power-law). The ‘stale’ data physically stays in the DB, but the middleware mathematically hard-gates it from ever reaching the LLM context window. Zero database memory spikes, zero manual deletion scripts.
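The poster's middleware is not shown in the thread, so the following is only a hypothetical Python sketch of the general idea: re-score each retrieved hit by multiplying its similarity with an exponential or power-law decay of document age, then hard-gate anything below a threshold. Hit, decay_gate, and every parameter value (rate, scale, alpha, threshold) are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hit:
    doc_id: str
    similarity: float  # similarity returned by the vector search (higher is closer)
    age_days: float    # document age at query time


def exponential_decay(age_days: float, rate: float) -> float:
    """exp(-rate * age): the weight halves every ln(2) / rate days."""
    return math.exp(-rate * age_days)


def power_law_decay(age_days: float, scale: float, alpha: float) -> float:
    """(1 + age / scale) ** -alpha: heavier tail, so for large ages it falls off slower than any exponential."""
    return (1.0 + age_days / scale) ** (-alpha)


def decay_gate(hits: list[Hit], decay: Callable[[float], float], threshold: float) -> list[tuple[Hit, float]]:
    """Re-score each hit as similarity * decay(age), drop scores below threshold, best first."""
    scored = [(h, h.similarity * decay(h.age_days)) for h in hits]
    kept = [(h, s) for h, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    hits = [Hit("policy-2026", 0.91, 30), Hit("policy-2023", 0.92, 1100)]
    exp_kept = decay_gate(hits, lambda a: exponential_decay(a, rate=0.002), threshold=0.5)
    pow_kept = decay_gate(hits, lambda a: power_law_decay(a, scale=365, alpha=1.5), threshold=0.5)
    print([h.doc_id for h, _ in exp_kept])
    print([h.doc_id for h, _ in pow_kept])
```

With these illustrative numbers, both curves keep policy-2026 and drop policy-2023, even though the older document has a slightly higher raw similarity. Exponential decay forgets at a constant rate, while the power-law tail keeps very old documents faintly visible; that difference is the main reason to pick one over the other.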

I actually just posted the architecture for this in the Showcase channel under ‘Preventing RAG Context Rot’. Happy to share the Python trace if it helps you stop those OOM crashes.

Hi!

A good tip: you can control the aggressiveness of tombstone cleanup with the TOMBSTONE* environment variables, such as TOMBSTONE_DELETION_CONCURRENCY, TOMBSTONE_DELETION_MAX_PER_CYCLE and TOMBSTONE_DELETION_MIN_PER_CYCLE.
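On Kubernetes these are plain container environment variables. A minimal Python sketch that renders them as a container env list, assuming a Weaviate release that supports all three variables. The numbers are placeholders, not defaults or recommendations, and the min <= max check is our own sanity check inferred from the variable names.

```python
import json

# Placeholder values for illustration only: NOT Weaviate defaults or recommendations.
# Assumes a Weaviate version that supports these variables; check the
# environment variables reference for your release.
TOMBSTONE_SETTINGS = {
    "TOMBSTONE_DELETION_CONCURRENCY": 2,
    "TOMBSTONE_DELETION_MIN_PER_CYCLE": 10_000,
    "TOMBSTONE_DELETION_MAX_PER_CYCLE": 100_000,
}


def to_k8s_env(settings: dict[str, int]) -> list[dict[str, str]]:
    """Convert settings to Kubernetes env entries (EnvVar values must be strings)."""
    min_c = settings.get("TOMBSTONE_DELETION_MIN_PER_CYCLE")
    max_c = settings.get("TOMBSTONE_DELETION_MAX_PER_CYCLE")
    # Our own sanity check, assuming MIN/MAX mean what their names suggest.
    if min_c is not None and max_c is not None and min_c > max_c:
        raise ValueError("TOMBSTONE_DELETION_MIN_PER_CYCLE must not exceed MAX_PER_CYCLE")
    return [{"name": k, "value": str(v)} for k, v in sorted(settings.items())]


if __name__ == "__main__":
    print(json.dumps(to_k8s_env(TOMBSTONE_SETTINGS), indent=2))
```

The printed list can go under the Weaviate container's env section; how you inject it (Helm values, Kustomize, or a raw StatefulSet) depends on your deployment.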

More info: Environment variables reference | Weaviate Documentation

@DudaNogueira Appreciate the tip! You are spot on: tuning TOMBSTONE_DELETION_CONCURRENCY and TOMBSTONE_DELETION_MAX_PER_CYCLE definitely smooths out the memory spikes during heavy batch cleanups.

The architectural trade-off we hit in clinical pipelines, though, is that throttling the garbage collection increases the “mutation lag.” If a superseded FDA policy is sitting in the index waiting for a throttled tombstone cycle, it can still be retrieved and fed to the LLM during that window.

That was the primary driver for our middleware approach. By calculating the temporal decay post-retrieval, we completely decouple inference safety from the database’s garbage collection schedule. Even if the DB is lagging on cleanup, the math hard-gates the payload before the LLM sees it.

Great to know those environment variables are there for the backend maintenance, though!

It will not. Once an object is marked with a tombstone, it goes into the deny list, so it will not surface in search results or be fed to the LLM. 🤔
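A toy Python model of the deny-list behavior described above, for intuition only: it is NOT Weaviate's implementation, and ToyIndex and its methods are hypothetical. A delete only records a tombstone, search filters tombstoned IDs right away, and the stored vector is only removed by a later cleanup pass.

```python
from dataclasses import dataclass, field


def _dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class ToyIndex:
    """Conceptual model of tombstone-based deletes; NOT Weaviate's actual implementation."""
    vectors: dict[str, list[float]] = field(default_factory=dict)
    tombstones: set[str] = field(default_factory=set)

    def delete(self, doc_id: str) -> None:
        # A delete only records the ID in the deny list; the vector is still stored.
        self.tombstones.add(doc_id)

    def search(self, query: list[float], k: int) -> list[tuple[str, float]]:
        # Tombstoned IDs are filtered out, so deleted objects never reach the results.
        candidates = [
            (doc_id, _dot(query, vec))
            for doc_id, vec in self.vectors.items()
            if doc_id not in self.tombstones
        ]
        return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:k]

    def cleanup(self) -> None:
        # A later background pass is what actually removes the stored data.
        for doc_id in self.tombstones:
            self.vectors.pop(doc_id, None)
        self.tombstones.clear()
```

After idx.delete("a"), a search no longer returns "a", yet "a" stays in idx.vectors until cleanup() runs. In this toy, results are masked immediately while the stored data lingers until cleanup.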

@DudaNogueira Ah, appreciate the correction on the deny list implementation! That makes total sense, and it’s good to know the tombstone masks it from the search payload immediately.

I used the wrong terminology there. The ‘mutation lag’ we are actually fighting with our middleware isn’t database GC; it’s application-level rot. In our clinical pipelines, developers rarely issue DELETE commands for superseded FDA policies because they need them in the index for historical audits.

The issue is that both the 2023 and 2026 policies sit in the index as active objects, and vector search retrieves both because the semantic similarity is identical. The temporal decay API kicks in post-retrieval to hard-gate that 2023 document before it hits the LLM context window, purely based on domain half-life. But I stand corrected on how Weaviate handles the memory cleanup side of things!
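To make the half-life idea concrete, a hypothetical Python sketch: each domain gets a half-life, a retrieved document's weight is 0.5 ** (age / half_life), and anything below a minimum weight is dropped before the prompt is assembled. HALF_LIFE_DAYS, hard_gate, the result dict shape, and the dates are all illustrative assumptions, not the poster's API.

```python
from datetime import date

# Hypothetical per-domain half-lives in days; illustrative numbers only.
HALF_LIFE_DAYS = {"regulatory": 365.0, "clinical_guideline": 730.0}


def half_life_weight(published: date, today: date, half_life_days: float) -> float:
    """0.5 ** (age / half_life): 1.0 when new, 0.5 after one half-life, 0.25 after two."""
    age_days = max((today - published).days, 0)  # clamp future-dated docs to age 0
    return 0.5 ** (age_days / half_life_days)


def hard_gate(results: list[dict], today: date, min_weight: float = 0.5) -> list[dict]:
    """Keep only retrieved results whose temporal weight is at least min_weight."""
    kept = []
    for r in results:
        w = half_life_weight(r["published"], today, HALF_LIFE_DAYS[r["domain"]])
        if w >= min_weight:
            kept.append({**r, "temporal_weight": w})
    return kept


if __name__ == "__main__":
    retrieved = [
        {"id": "fda-policy-2023", "domain": "regulatory", "published": date(2023, 3, 1)},
        {"id": "fda-policy-2026", "domain": "regulatory", "published": date(2026, 1, 15)},
    ]
    for r in hard_gate(retrieved, today=date(2026, 6, 1)):
        print(r["id"], round(r["temporal_weight"], 3))
```

With a one-year half-life and min_weight=0.5, anything older than one half-life is gated, so only fda-policy-2026 survives a query on 2026-06-01. This gates on age alone; it has no notion of one document superseding another.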

Oh got it!

Hopefully the upcoming boost feature will help handle that!

Helpful explanation. It sounds like tombstone cleanup settings are worth checking before assuming that restarting the Weaviate pods is the only option.