What are the best practices and hard limitations for deploying Weaviate so it can handle a collection of 500k+ records?
My Current Issues
Out-of-memory errors with default settings:
When creating a collection with default configuration and ingesting over 300k records (each with a 1536-dimensional vector), I consistently ran into “not enough memory” errors, rendering the collection unusable. Restarting Weaviate occasionally reduced memory usage temporarily.
Indexing stalled for hours with async indexing enabled:
I then adjusted vector_cache_max_objects to 20,000 and set the environment variables ASYNC_INDEXING=true and GOMEMLIMIT=2500MiB. After re-importing the 300k+ dataset, the shard remained stuck in INDEXING status—after 8 hours, only a few tens of thousands of vectors had been processed. All queries timed out with the error: "knn search: search layer at level 0: context canceled".
Upgrade to v1.33 didn’t resolve the bottleneck:
To address this, I upgraded Weaviate from v1.30 to v1.33 and increased vector_cache_max_objects to hold the entire dataset in cache. However, indexing progress remained extremely slow (high disk read I/O at ~130 MiB/s).
Increasing memory to 4 GB (GOMEMLIMIT=3200MiB) had no effect.
Further memory increase to 6 GB (GOMEMLIMIT=5000MiB) improved indexing:
After this change and a restart, disk I/O dropped significantly, and indexing completed quickly—shard status changed to READY, and queries stopped timing out. However, CPU usage immediately spiked to 100%.
CPU stayed high even after restarts:
Restarting Weaviate briefly lowered CPU, but it quickly climbed back to 100%.
CPU finally normalized after ~8 hours—but no logs explained why:
Roughly 8 hours later, CPU usage dropped to below 10%. Even with debug logging enabled, the logs gave no clear indication of what the system was doing during that period.
Reduced vector_cache_max_objects back to 20,000 to avoid excessive memory consumption going forward.
I’d appreciate any guidance on:
Why CPU remained high for hours after indexing completed
What’s the lowest memory/CPU/resource footprint that can support a 500k+ collection (the vectors have 1536 dimensions) without the above issues?
What trade-offs should I expect at the minimum configuration (e.g., slow indexing, high latency, instability)?
Is a single-node setup ever appropriate for 500k+ vectors, or is a multi-node (e.g., distributed Weaviate with separate DB, vector, and backup roles) deployment strongly advised? What about Kubernetes-specific guidance (e.g., pod resource limits, PVC sizing, horizontal scaling)?
For 500k+ records with 1536-dimensional vectors, you need at least 6-8GB memory, proper vector cache tuning, and should expect trade-offs between indexing speed and resource usage. Single-node is viable but requires careful configuration.
Your out-of-memory errors are expected with default settings. Each 1536-dimensional vector requires ~6KB (1536 * 4 bytes), so 500k vectors need ~3GB just for storage, plus additional memory for HNSW graph structures and cache.
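As a rough sanity check on those numbers, here is a back-of-the-envelope sketch; the 2x overhead factor for the HNSW graph and cache is an illustrative assumption, since the real overhead depends on maxConnections, ef, and cache settings:

```python
# Back-of-the-envelope memory estimate for 500k float32 vectors of 1536 dims.
# The 2x overhead multiplier is an assumption for illustration only; actual
# HNSW graph + cache overhead depends on maxConnections, ef, and cache size.
num_vectors = 500_000
dims = 1536
bytes_per_vector = dims * 4                 # float32 -> 6144 bytes (~6 KB)
raw_bytes = num_vectors * bytes_per_vector  # ~3.07 GB of raw vector data
estimated_total = raw_bytes * 2             # assumed overhead factor

print(f"raw vectors:        {raw_bytes / 1e9:.2f} GB")
print(f"with assumed index: {estimated_total / 1e9:.2f} GB")
```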
Key memory-related configurations:
Give the node at least 6 GB of RAM and set GOMEMLIMIT to roughly 80-90% of that (~5 GiB, as you discovered)
Configure vector_cache_max_objects based on available memory
Enable LIMIT_RESOURCES=true so Weaviate automatically caps its own memory and CPU usage (a sketch of these settings follows this list)
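For reference, here is a minimal sketch of the environment variables discussed above, written as a Python dict you might hand to a container runtime; the values are examples, not recommendations:

```python
# Illustrative single-node environment settings. GOMEMLIMIT should stay
# comfortably below the container/VM memory limit so the OS and page cache
# still have headroom.
weaviate_env = {
    "GOMEMLIMIT": "5000MiB",          # soft heap target for the Go runtime
    "ASYNC_INDEXING": "true",         # build the HNSW index in the background
    "LIMIT_RESOURCES": "true",        # let Weaviate cap its own memory/CPU use
    "PERSISTENCE_DATA_PATH": "/var/lib/weaviate",
}
```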
Vector Cache Tuning
The vector_cache_max_objects parameter is critical:
Default is 1e12 (effectively unlimited)
For 500k vectors, setting to 20,000 (as you did) reduces memory but increases disk I/O
Full cache (500k+) provides the best query performance but requires ~6 GB+ of memory (see the collection-creation sketch below)
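As an illustration, here is a minimal sketch of creating a collection with a bounded vector cache using the v4 Python client; the collection name and property are hypothetical, and the exact creation signature may differ between client versions:

```python
import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_local()
try:
    # "Docs" and its single property are placeholders; the relevant part is
    # vector_cache_max_objects, which bounds how many vectors stay in memory.
    client.collections.create(
        name="Docs",
        properties=[Property(name="text", data_type=DataType.TEXT)],
        vector_index_config=Configure.VectorIndex.hnsw(
            vector_cache_max_objects=20_000,  # raise this if RAM allows a full cache
        ),
    )
finally:
    client.close()
```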
Async Indexing Behavior
Async indexing (ASYNC_INDEXING=true) can stall with insufficient memory:
It processes vectors in batches via a queue system
The "context canceled" errors indicate that searches timed out while the index was still being built
Requires sufficient memory for the queue and temporary structures (a status-check sketch follows this list)
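To watch async indexing progress instead of guessing, you can poll shard status over the REST API. This sketch assumes a collection called "Docs"; the vectorQueueSize field may be absent on older versions:

```python
import requests

# Poll shard status while async indexing runs. "Docs" is a placeholder name.
resp = requests.get("http://localhost:8080/v1/schema/Docs/shards", timeout=10)
resp.raise_for_status()
for shard in resp.json():
    # status stays INDEXING until the async queue drains, then flips to READY
    print(shard.get("name"), shard.get("status"), shard.get("vectorQueueSize"))
```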
Post-Indexing CPU Spike
The 100% CPU for ~8 hours after indexing completed is most likely background maintenance that follows a large import, such as LSM store compaction and other index housekeeping; this work is largely invisible in the default logs, which matches what you observed.
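One way to see what the node is busy with during such a phase is to scrape its Prometheus metrics (requires PROMETHEUS_MONITORING_ENABLED=true, which exposes metrics on port 2112). Metric names vary between versions, so this sketch only filters for likely keywords rather than assuming specific series:

```python
import requests

# Dump metrics that hint at background work (compaction, queues, tombstones).
metrics = requests.get("http://localhost:2112/metrics", timeout=10).text
for line in metrics.splitlines():
    if line.startswith("#"):
        continue
    if any(key in line for key in ("compaction", "queue", "tombstone")):
        print(line)
```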
Thank you—this has really clarified things for me. However, I still have a few specific questions:
In a single-node deployment, at what collection size (e.g., number of objects or total vector memory footprint) should I enable sharding, even if I’m not scaling horizontally? Does sharding on one node improve indexing throughput or query responsiveness for large collections?
At what data scale does a multi-node deployment become practically necessary? For instance, is 500k vectors still feasible on a single node with proper tuning, or does something like 1M–2M+ vectors fundamentally require distribution? Are there hard thresholds (e.g., per-shard object limits, memory ceilings, or indexing bottlenecks) that indicate “you must scale out”?