Storage Size Not Reducing After Deleting Content Chunks in Weaviate — Expected Behavior or Issue?

Description

Hi team,
We’re facing an issue where deleting large volumes of content chunks from our Weaviate cluster does not reduce the reported storage size of the tenant/collection. Even after cleanup operations and waiting for compaction, the storage footprint remains the same.
Context:
  • We programmatically deleted content chunk objects from a multi-tenant setup (rough sketch below).
  • Storage metrics in Weaviate (and underlying disk usage) show no meaningful reduction.
  • Our expectation was that deleting data would free up storage, or at least reduce the tenant’s reported usage.
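For reference, the cleanup looks roughly like the sketch below. It assumes the Python v4 client for illustration; the `ContentChunk` collection, tenant name, and `docId` property are placeholders for our real schema:

```python
import weaviate
from weaviate.classes.query import Filter

# Placeholder connection; we actually connect to our cluster.
client = weaviate.connect_to_local()

# Placeholder collection and tenant names.
chunks = client.collections.get("ContentChunk").with_tenant("tenant-123")

# Bulk-delete every chunk belonging to one source document.
result = chunks.data.delete_many(
    where=Filter.by_property("docId").equal("doc-456")
)
print(result.matches, result.successful, result.failed)

client.close()
```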
Questions:
  • Is this the expected behavior due to how Weaviate handles deletion, LSM-trees, or segment compaction?
  • Are there recommended steps to force or trigger compaction so that storage is reclaimed?
  • Is there a known issue with storage accounting or compaction in multi-tenant clusters?
  • Any best practices for managing or reducing storage usage when regularly pruning large chunks of data?
Any guidance or clarification on how Weaviate handles storage reclamation after deletes would be greatly appreciated.
Thanks!

Server Setup Information

  • Weaviate Server Version:
  • Deployment Method:
  • Multi Node? Number of Running Nodes:
  • Client Language and Version:
  • Multitenancy?:

Any additional Information

hi @EVAi_Developer-1 !!

Welcome to our community :hugs:

Yes, this is expected behavior in Weaviate’s LSM-tree storage system. Deletions create tombstones that don’t immediately free space - storage is only reclaimed after compaction runs multiple times and specific conditions are met.

Multi-tenant setups have additional complexity with hot/cold tenant metrics.
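If it helps to see that from the client side, here is a minimal sketch (Python v4 client, placeholder collection name) that lists each tenant and whether it is currently hot or cold:

```python
import weaviate

client = weaviate.connect_to_local()  # placeholder connection

# "ContentChunk" is a placeholder collection name.
tenants = client.collections.get("ContentChunk").tenants.get()

for name, tenant in tenants.items():
    # activity_status distinguishes active (hot) tenants from inactive (cold) ones.
    print(name, tenant.activity_status)

client.close()
```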

Check this code comment here:

With that said, one thing you could do to ease this process is to tweak the TOMBSTONE_* env vars so tombstone cleanup doesn’t pile up, and to keep an eye on its metrics. Note that even once all tombstones are cleared, the space will not be freed immediately.
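A quick way to watch those tombstone metrics, assuming PROMETHEUS_MONITORING_ENABLED is set to true on the server and the default metrics port 2112, is to scrape the Prometheus endpoint and filter the tombstone-related series:

```python
import urllib.request

# Assumes PROMETHEUS_MONITORING_ENABLED=true and the default metrics
# port 2112; adjust the host/port for your deployment.
METRICS_URL = "http://localhost:2112/metrics"

with urllib.request.urlopen(METRICS_URL) as resp:
    body = resp.read().decode("utf-8")

# Print only the tombstone-related metric lines.
for line in body.splitlines():
    if "tombstone" in line and not line.startswith("#"):
        print(line)
```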

Going forward, other changes you could make are lowering PERSISTENCE_LSM_MAX_SEGMENT_SIZE so there are more compaction opportunities, or turning on PERSISTENCE_LSM_SEPARATE_OBJECTS_COMPACTIONS so the objects bucket is compacted separately.
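To make that concrete, here is a sketch of the kind of server-side environment settings being described. The variable names come from Weaviate’s environment variable docs, but the values are only illustrative assumptions; tune them for your workload and double check them against the docs for your server version:

```python
# Illustrative values only; validate names/values against the env-var docs
# for your Weaviate version before applying them to your deployment
# (Docker Compose, Helm values, etc.).
weaviate_env = {
    # Let tombstone cleanup work through larger batches per cycle so it
    # doesn't pile up behind heavy deletes.
    "TOMBSTONE_DELETION_MIN_PER_CYCLE": "1000000",
    "TOMBSTONE_DELETION_MAX_PER_CYCLE": "10000000",
    # Smaller max segment size means more compaction opportunities.
    "PERSISTENCE_LSM_MAX_SEGMENT_SIZE": "2GiB",
    # Compact the objects bucket separately from the index buckets.
    "PERSISTENCE_LSM_SEPARATE_OBJECTS_COMPACTIONS": "true",
    # Needed for the metrics endpoint used in the snippet above.
    "PROMETHEUS_MONITORING_ENABLED": "true",
}

if __name__ == "__main__":
    # Print in .env / docker-compose friendly KEY=value form.
    for key, value in weaviate_env.items():
        print(f"{key}={value}")
```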

Let me know if this helps!

Thanks @DudaNogueira — this is super helpful!

To give more context on why this is important for us:

We’re building a platform where users can be created and deleted dynamically. When a user is deleted, we also delete all their data and vectors from Weaviate. However, even after these deletions, the Dimensions Stored and Object Count metrics don’t decrease, and Weaviate continues reporting usage as if those vectors still existed.
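Concretely, our per-user cleanup today looks roughly like this (Python v4 client; the one-tenant-per-user mapping, collection name, and tenant naming are illustrative placeholders):

```python
import weaviate

client = weaviate.connect_to_local()  # placeholder connection


def delete_user_data(user_id: str) -> None:
    """Remove the tenant (and therefore all chunks/vectors) for one user."""
    # Placeholder collection name and one-tenant-per-user convention.
    collection = client.collections.get("ContentChunk")
    collection.tenants.remove([f"user-{user_id}"])


delete_user_data("123")
client.close()
```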

Since Weaviate’s pricing is based on the amount of vector data stored, this puts us in a tricky position: we’re being billed for vectors that have been deleted from our application but not actually reclaimed by Weaviate due to tombstoning and compaction behavior.

Given this, we’re trying to understand:

  • Is there any reliable way to ensure storage reclamation after user/tenant deletion so that billing reflects actual active data?

  • Are there recommended compaction or configuration strategies that work well for high-churn environments where tenants (and all their vector data) frequently come and go?

  • Would forcing more aggressive compactions, smaller segment sizes, or adjusting TOMBSTONE_* settings meaningfully help in a scenario like ours?

Ultimately, we want to avoid paying for vectors that aren’t really present anymore, so any guidance on best practices or Weaviate roadmap improvements around this would be hugely appreciated.

Thanks again for the help!