Memory Pressure in Single-Instance Weaviate Under Continuous Write/Deletion Load

Description

We deployed a single-instance Weaviate in Kubernetes with memory requests and limits both set to 3 GiB. After Weaviate starts up, its memory usage quickly rises to around 2.64 GiB. This isn’t an issue under light load, but when we send continuous write/delete requests against roughly 300K objects, the Weaviate Python client starts throwing “out of memory” errors.
Below is the relevant portion of our Kubernetes configuration:

    Limits:
      cpu:     3
      memory:  3000Mi
    Requests:
      cpu:      500m
      memory:   3000Mi
    Environment:
      CLUSTER_GOSSIP_BIND_PORT:                 7000
      CLUSTER_DATA_BIND_PORT:                   7001
      GOGC:                                     100
      PROMETHEUS_MONITORING_ENABLED:            false
      GOMEMLIMIT:                               2500MiB
      QUERY_MAXIMUM_RESULTS:                    100000
      TRACK_VECTOR_DIMENSIONS:                  false
      REINDEX_VECTOR_DIMENSIONS_AT_STARTUP:     false
      STANDALONE_MODE:                          true
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED:  true
      CLUSTER_HOSTNAME:                         8b4f4cd0f4c5
      PERSISTENCE_DATA_PATH:                    /var/lib/weaviate
      BACKUP_FILESYSTEM_PATH:                   /var/lib/backup
      ENABLE_MODULES:                           backup-filesystem
      DISABLE_GRAPHQL:                          true
    Mounts:
      /var/lib/weaviate from weaviate-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sjqt6 (ro)
      /weaviate-config from weaviate-config (rw)

Error from the Python client:

Query call with protocol GRPC delete failed with message <AioRpcError of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "batch delete: cannot process batch delete object: not enough memory"
	debug_error_string = "UNKNOWN:Error received from peer  {grpc_status:2, grpc_message:"batch delete: cannot process batch delete object: not enough memory"}"
>.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/weaviate/connect/v4.py", line 1219, in grpc_batch_delete
    return await self.grpc_stub.BatchDelete(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/grpc/aio/_call.py", line 327, in __await__
    raise _create_rpc_error(
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "batch delete: cannot process batch delete object: not enough memory"

Corresponding error from the Weaviate server logs:

{"build_git_commit":"6f11fca","build_go_version":"go1.24.5","build_image_tag":"v1.30.13","build_wv_version":"1.30.13","error":"not enough memory","level":"error","msg":"memory pressure: cannot process batch delete object","time":"2025-11-02T00:21:44Z"}
{"build_git_commit":"6f11fca","build_go_version":"go1.24.5","build_image_tag":"v1.30.13","build_wv_version":"1.30.13","error":"not enough memory","level":"error","msg":"memory pressure: cannot process batch delete object","time":"2025-11-02T00:21:44Z"}

We’re trying to understand why Weaviate is hitting memory limits despite having 3 GiB allocated, especially since GOMEMLIMIT is set to 2500 MiB. Any insights would be greatly appreciated!

Server Setup Information

  • Weaviate Server Version: 1.30.13
  • Deployment Method: k8s
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: python 3.11
  • Multitenancy?: No

Any additional Information

  1. When I restarted the Weaviate pod, it raised “not enough memory” immediately:
{"action":"lsm_compaction","build_git_commit":"6f11fca","build_go_version":"go1.24.5","build_image_tag":"v1.30.13","build_wv_version":"1.30.13","class":"PtAi_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236","error":"not enough memory","event":"compaction_skipped_oom","index":"ptai_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236","level":"warning","msg":"skipping compaction due to memory pressure","path":"/var/lib/weaviate/ptai_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236/x22wtE6RzHmd/lsm/objects","shard":"x22wtE6RzHmd","time":"2025-11-02T03:18:27Z"}
  2. When I delete the class “PtAi_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236”, which has 300K objects, and restart the pod, memory usage is only 519 MiB.

I’ve read the article, and based on my use case, storing all the vectors in memory would require approximately 1.89 GB—calculated as:

(300000 * ((1536 * 4) + (64 * 10))) / 1024 / 1024 / 1024 ≈ 1.89 GB
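For reference, here is a quick sketch of that arithmetic in Python (the 64 × 10 term is the per-node HNSW link overhead from the estimate above):

# Rough estimate of in-memory size for one class's vectors plus HNSW links
objects = 300_000
dims = 1536               # vector dimensions
max_connections = 64      # HNSW links per node, ~10 bytes each

total_bytes = objects * ((dims * 4) + (max_connections * 10))
print(f"{total_bytes / 1024**3:.2f} GiB")  # ~1.9 GiB, matching the figure above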

Given that I also have several other classes in memory, it now makes sense why the total memory usage could easily exceed 3 GB. Does my understanding sound correct?

My main question now is:

According to the article, during querying it should be possible to keep only a subset of vectors in memory while storing the rest on disk. I understand that I can set a large vectorCacheMaxObjects during import for optimal performance and then reduce it after the import is complete.

My question is: how exactly do I configure this setting after import? Is it done via an environment variable, or do I need to update the collection’s schema (e.g., through a REST API call or a client code change)? What’s the correct way to apply this change post-import?

hi @Charlie_Chen !!

You are on the right track.

Do you have ASYNC_INDEXING turned on? This can help alleviate the pressure, as Weaviate will take its time with indexing.

Note that, as a rule of thumb, you should plan for roughly double the calculated memory.

So 4G should be good for your calculated usage.

Also, instead of looking at total memory usage - Weaviate will happily use almost all of the available memory - you should look at the heap memory.
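If you want a quick way to watch heap rather than container RSS, here is a minimal sketch that scrapes Weaviate’s Prometheus endpoint. It assumes PROMETHEUS_MONITORING_ENABLED is set to true (it is false in the config above) and the default metrics port 2112; adjust the host for your deployment:

# Read Go heap metrics from Weaviate's Prometheus endpoint (assumed at :2112)
import urllib.request

text = urllib.request.urlopen("http://localhost:2112/metrics").read().decode()
for line in text.splitlines():
    if line.startswith(("go_memstats_heap_inuse_bytes", "go_memstats_heap_alloc_bytes")):
        name, value = line.rsplit(" ", 1)
        print(name, f"{float(value) / 1024**2:.0f} MiB")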

Regarding vectorCacheMaxObjects, you can change it using REST or our client:

from weaviate.classes.config import Reconfigure

collection = client.collections.use("MyCollection")
collection.config.update(
    vector_config=Reconfigure.Vectors.update(
        name="default",
        vector_index_config=Reconfigure.VectorIndex.hnsw(
            vector_cache_max_objects=100000  # Set your desired value here
        ),
    )
)
  • During import, set vectorCacheMaxObjects high enough to hold all vectors in memory for best performance.

  • After import, you may experiment with lower values if your workload is mostly querying and memory is limited.

  • If the cache fills, Weaviate drops the whole cache and reloads vectors from disk as needed, which is much slower than memory access (see “Vector cache considerations” in the docs).

Let me know if this helps!

Thanks for your reply.
I tested in my local environment with about 2 GB of memory, GOMEMLIMIT=1G and ASYNC_INDEXING=true, and imported roughly 100,000+ objects.

When I tried running the code directly, I ran into an issue — it said that there was no vector_config key in the schema. So I modified the code as shown below, and it worked.

    collection = async_client.collections.use(class_name)
    await collection.config.update(
        vector_index_config=Reconfigure.VectorIndex.hnsw(
            vector_cache_max_objects=1000  # Set your desired value here
        )
    )

In the Weaviate logs, I saw the following message:

{"action":"update_shard_status","build_git_commit":"15ca21c","build_go_version":"go1.24.9","build_image_tag":"v1.32.16","build_wv_version":"1.32.16","class":"PtAi_rag_3a010d31_750d_5dad_5f3c_8665f3ec8ac0_knowledge_690c4b6c6bc1878a02011d6a","level":"warning","msg":"shard status changed","prev":"READY","reason":"UpdateVectorIndexConfig","shard":"Y92rVARSnBv5","status":"READONLY","time":"2025-11-10T00:22:06Z"}

However, I checked the heap memory usage — it was 1.6 GB before execution and still 1.6 GB afterward. There was no noticeable change.
Also, in the Weaviate logs, I still saw a lot of “not enough memory” errors like the following:

{"action":"hnsw_tombstone_cleanup","build_git_commit":"15ca21c","build_go_version":"go1.24.9","build_image_tag":"v1.32.16","build_wv_version":"1.32.16","class":"PtAi_rag_3a010d31_750d_5dad_5f3c_8665f3ec8ac0_knowledge_690c4b6c6bc1878a02011d6a","error":"not enough memory","event":"cleanup_skipped_oom","level":"warning","msg":"skipping hnsw cleanup due to memory pressure","time":"2025-11-10T00:25:34Z"}
{"action":"lsm_compaction","build_git_commit":"15ca21c","build_go_version":"go1.24.9","build_image_tag":"v1.32.16","build_wv_version":"1.32.16","class":"PtAi_rag_3a010d31_750d_5dad_5f3c_8665f3ec8ac0_knowledge_690c4b6c6bc1878a02011d6a","error":"not enough memory","event":"compaction_skipped_oom","index":"ptai_rag_3a010d31_750d_5dad_5f3c_8665f3ec8ac0_knowledge_690c4b6c6bc1878a02011d6a","level":"warning","msg":"skipping compaction due to memory pressure","path":"/var/lib/weaviate/ptai_rag_3a010d31_750d_5dad_5f3c_8665f3ec8ac0_knowledge_690c4b6c6bc1878a02011d6a/Y92rVARSnBv5/lsm/property_knowledge_id_searchable","shard":"Y92rVARSnBv5","time":"2025-11-10T00:26:57Z"}

After restarting, the change finally took effect. The memory usage has dropped to 1.22 GB.
Is this the expected behavior? I’m concerned that in a production environment, restarting may not be an option.

The suggested value for GOMEMLIMIT is 80-90% of the available memory, so 1G will not do it in your scenario. :thinking:

Hi @Charlie_Chen

Thanks for the detailed information. I checked all the info carefully. This is a classic memory pressure scenario in Weaviate, and there are several factors at play here.

Understanding the Problem

1. Memory Architecture Mismatch

You have:

  • Container Memory Limit: 3000 MiB (3 GiB)

  • GOMEMLIMIT: 2500 MiB

  • Actual Usage: ~2.64 GiB at idle

The issue is that GOMEMLIMIT only controls Go’s heap memory, not the total process memory. Weaviate also uses:

  • Off-heap memory for vector indexes (HNSW graphs)

  • Memory-mapped files for LSM storage

  • OS page cache

  • gRPC buffers and other system overhead

With 300K objects, your data structures (especially LSM compactions and vector indexes) need additional headroom that simply isn’t available.

2. LSM Compaction Pressure

Your logs show compaction being skipped due to OOM:

"msg":"skipping compaction due to memory pressure"

This is critical because:

  • LSM stores accumulate uncompacted segments

  • This leads to increased memory usage over time

  • Delete operations are particularly expensive as they create tombstones that need compaction

  • Without compaction, the database becomes progressively slower and more memory-hungry

3. The Restart Clue

When you deleted the problematic class and restarted, memory dropped to 519 MiB. This confirms the issue is data-related, not a memory leak.

Solutions

Increase Memory Allocation

Recommended config:

resources:
  limits:
    cpu: 3
    memory: 6Gi        # Doubled from 3Gi
  requests:
    cpu: 500m
    memory: 6Gi

env:
  - name: GOMEMLIMIT
    value: "5200MiB"   # ~85% of 6Gi
  - name: GOGC
    value: "50"        # More aggressive GC (consider lowering from 100)

Why this helps:

  • Provides adequate headroom for LSM compactions

  • Allows vector index operations to complete

  • Enables proper garbage collection cycles

Alternative: Optimize for Lower Memory

If increasing memory isn’t an option, try these optimizations:

1. Adjust Vector Index Settings

class_config = {
    "vectorIndexConfig": {
        "efConstruction": 64,   # Lower from default 128
        "maxConnections": 16,   # Lower from default 64
        "ef": -1,               # Use dynamic ef
    }
}

2. Enable Async Indexing

env:
  - name: ASYNC_INDEXING
    value: "true"

3. Reduce Batch Sizes

# In your Python client (v4): use fixed-size batching to cap the batch size
with client.batch.fixed_size(batch_size=50) as batch:  # reduced from the default
    ...  # your add/delete operations

4. Implement Rate Limiting

Add delays between batch operations to give compaction time to run:

import time

for batch in batches:
    process_batch(batch)
    time.sleep(0.5)  # Allow background tasks to catch up
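If you are on the v4 Python client, its built-in rate-limited batching gives a similar effect without manual sleeps. A minimal sketch, assuming an already-connected client; the collection name and objects list are placeholders:

# Let the client pace write requests instead of sleeping manually
with client.batch.rate_limit(requests_per_minute=600) as batch:
    for obj in objects:  # placeholder iterable of property dicts
        batch.add_object(collection="MyCollection", properties=obj)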

Let me know if you need help implementing any of these solutions or if you have questions about the memory breakdown!

-Chaitanya

I’ve adjusted the memory limit to 1200M and set GOMEMLIMIT=1G. After restarting Weaviate, I saw a bunch of “out of memory” errors. I then used the RESTful API to change the vector_cache_max_objects setting to 100. However, even after waiting for over 4 hours without restarting, the “out of memory” errors persisted. The errors only disappeared after I restarted the service again.

Wait, you reduced the memory from 3Gi to 1.2Gi? That’s going in the wrong direction! :sweat_smile:

With 300K objects, 1.2Gi is way too small. The recommendation was to increase to 6Gi, not decrease. Here’s why 1.2Gi won’t work:

Your memory breakdown:

  • Base Weaviate: ~500 MiB

  • 300K objects + indexes: ~1.5-2 GiB minimum

  • Compaction headroom: ~500 MiB

  • Total needed: ~3-4 GiB minimum

With only 1.2Gi total and GOMEMLIMIT=1G, you’re basically guaranteeing OOM errors.

Why restarting temporarily “fixed” it:

  • Fresh start = empty caches

  • No compaction backlog yet

  • But as data loads into memory, you hit the limit again

About vector_cache_max_objects=100: This limits how many vectors are cached in memory, but your 300K objects still need to be indexed and stored. Lowering the cache just makes queries slower; it doesn’t solve the underlying memory shortage.

What you should do:

  1. Increase memory to at least 6Gi:

resources:
  limits:
    memory: 6Gi
  requests:
    memory: 6Gi
env:
  - name: GOMEMLIMIT
    value: "5200MiB"

  2. If 6Gi is absolutely not possible, try 4Gi as a minimum:

resources:
  limits:
    memory: 4Gi
  requests:
    memory: 4Gi
env:
  - name: GOMEMLIMIT
    value: "3400MiB"
  - name: GOGC
    value: "50"

  3. Only if you must stay at low memory, you’d need to drastically reduce your data:

    • Lower vector dimensions

    • Reduce efConstruction and maxConnections way down

    • Delete objects to get below 100K

Out of curiosity, is there a hard limit on how much memory you can allocate? Because honestly, for 300K objects, anything below 4 Gi is going to struggle.

-Chaitanya

Thank you for your reply. I can increase the memory to 6GB for now, but we’ll need to vectorize significantly more data in the future—there are many tenants, and the total dataset could reach several million records, each with a 1536-dimensional vector. Continuously scaling memory isn’t sustainable long-term.

We’re fine with a few minutes of latency when inserting data into Weaviate—it’s not noticeable to the end users. However, we need to avoid significantly impacting query performance for vector similarity queries (occasional slow responses are acceptable, but not sustained degradation).

You mentioned tuning efConstruction and ef in the Vector Index Settings—how much of an impact do these have on query performance? Also, must these be configured before creating the collection? If data is already imported, can we still modify them?

Given my use case, are there other better strategies or optimizations you’d recommend?

No, I didn’t reduce memory in production—I ran a local experiment to test whether adjusting vector_cache_max_objects via the REST API could lower memory usage at runtime. The test dataset contained fewer than 100K objects.

Ah got it! Thanks for clarifying about the local test - that makes more sense now.

For your production scenario with millions of records across tenants, you’re absolutely right that infinite memory scaling isn’t the answer. Let me address your questions:

Vector Index Settings Impact

efConstruction (build time):

  • Higher = better recall, slower indexing, more memory during build

  • Lower = faster indexing, less memory, slightly lower recall

  • Default: 128, you could go down to 64 or even 32 for large datasets

  • Can only be set at collection creation - cannot change later

ef (query time):

  • Higher = better recall, slower queries

  • Lower = faster queries, lower recall

  • Default: -1 (dynamic)

  • Can be changed on an existing collection without reimporting or restarting

Example:

from weaviate.classes.config import Reconfigure

# ef is a mutable index setting: lower it for faster queries with slightly less recall
collection = client.collections.use("MyCollection")
collection.config.update(
    vector_index_config=Reconfigure.VectorIndex.hnsw(
        ef=64
    )
)

Performance impact:

  • efConstruction: 64 vs 128 → ~10-15% less memory, negligible recall difference for most use cases

  • ef: 64 vs 128 → ~2x faster queries, ~2-5% lower recall

Better Strategies for Multi-Tenant Scale

1. Multi-tenancy with Weaviate’s built-in feature

Instead of one massive collection, use Weaviate’s multi-tenancy:

from weaviate.classes.config import Configure

collection = client.collections.create(
    "YourCollection",
    multi_tenancy_config=Configure.multi_tenancy(enabled=True),
)

# Each tenant gets isolated data
collection.tenants.create(["tenant_1", "tenant_2", ...])

Benefits:

  • Tenants can be activated/deactivated (hot/cold storage; see the sketch after this list)

  • Memory only used for active tenants

  • Better isolation and performance
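As referenced above, here is a minimal sketch of switching a tenant between hot and cold with the v4 client’s tenant API; the collection and tenant names are placeholders:

from weaviate.classes.tenants import Tenant, TenantActivityStatus

collection = client.collections.use("YourCollection")

# Deactivate an idle tenant so its shard is released from memory ...
collection.tenants.update(Tenant(name="tenant_1", activity_status=TenantActivityStatus.INACTIVE))

# ... and reactivate it before serving that tenant's queries again
collection.tenants.update(Tenant(name="tenant_1", activity_status=TenantActivityStatus.ACTIVE))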

2. Optimize vector dimensions

1536 dimensions is quite high. Consider:

  • Using a dimension reduction model (384 or 768 dimensions)

  • Matryoshka embeddings (can truncate dimensions; see the sketch after this list)

  • This alone could save 50-75% memory!
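To illustrate the truncation idea, a minimal sketch; it only applies if your embedding model was trained Matryoshka-style, and the input vector here is a random placeholder:

import numpy as np

# Placeholder: stand-in for a real 1536-dim embedding from your model
embedding = np.random.rand(1536).astype(np.float32)

# Keep the first 768 dims and re-normalize so cosine distances stay meaningful;
# this halves the per-vector memory footprint
short = embedding[:768]
short = short / np.linalg.norm(short)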

3. Sharding strategy

For millions of objects:

sharding_config=Configure.sharding(
    desired_count=3,  # Distribute across shards
    actual_count=3,
    virtual_per_physical=128,
)

4. Memory-optimized HNSW config

For your scale, try:

vector_index_config=Configure.VectorIndex.hnsw(
    distance_metric=VectorDistances.COSINE,
    ef_construction=64,               # Reduced from 128
    max_connections=16,               # Reduced from 32-64
    ef=-1,                            # Dynamic
    vector_cache_max_objects=100000,  # Limit cache
)

5. Async indexing for writes

env:
  - name: ASYNC_INDEXING
    value: "true"

This spreads out indexing load over time, perfect since you mentioned write latency is acceptable.

6. PQ (Product Quantization) compression

Available in current Weaviate versions (including the 1.30.x you are running) - compresses vectors significantly with minimal recall loss.
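If you want to try it, PQ can be enabled on an existing HNSW collection via a config update. A hedged sketch with the v4 client; the collection name and training limit are placeholders, and the exact quantizer options may vary by client version:

from weaviate.classes.config import Reconfigure

collection = client.collections.use("MyCollection")
collection.config.update(
    vector_index_config=Reconfigure.VectorIndex.hnsw(
        quantizer=Reconfigure.VectorIndex.Quantizer.pq(training_limit=100_000)
    )
)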

Recommended Architecture for Your Scale

client.collections.create(
    "MultiTenantCollection",
    multi_tenancy_config=Configure.multi_tenancy(enabled=True),
    vectorizer_config=Configure.NamedVectors.none(
        name="default",
        vector_index_config=Configure.VectorIndex.hnsw(
            distance_metric=VectorDistances.COSINE,
            ef_construction=64,
            max_connections=16,
            ef=-1,
            vector_cache_max_objects=50000,  # Per tenant
        ),
    ),
    sharding_config=Configure.sharding(
        desired_count=3,
    ),
)

Memory Estimation

With optimizations:

  • 1M objects × 768 dims (reduced) × 4 bytes = ~3 GB for vectors

  • HNSW overhead: ~1.5-2x = ~6-8 GB total

  • With multi-tenancy (hot tenants only): Much lower active memory

A plan could look like this:

Short term (now):

  1. Increase to 6Gi to stabilize current workload

  2. Enable async indexing

  3. Test with reduced ef_construction on new collections

Medium term (before scaling):

  1. Implement multi-tenancy

  2. Evaluate dimension reduction (huge memory savings!)

  3. Set up tenant activation/deactivation based on usage

Long term:

  1. Consider horizontal scaling (multiple Weaviate nodes)

  2. Look into PQ compression when available

  3. Monitor and adjust cache sizes per tenant

The multi-tenancy approach is probably your biggest win here - you can keep inactive tenants “cold” and only load them into memory when needed.

Want to discuss any of these in more detail?