What are the best practices for handling 500k+ records of data

Description

What are the best practices and hard limitations for deploying Weaviate so that it can handle a collection of 500k+ records?

My Current Issues

  1. Out-of-memory errors with default settings:
    When creating a collection with default configuration and ingesting over 300k records (each with a 1536-dimensional vector), I consistently ran into “not enough memory” errors, rendering the collection unusable. Restarting Weaviate occasionally reduced memory usage temporarily.

  2. Indexing stalled for hours with async indexing enabled:
    I then adjusted vector_cache_max_objects to 20,000 and set the environment variables ASYNC_INDEXING=true and GOMEMLIMIT=2500MiB. After re-importing the 300k+ dataset, the shard remained stuck in INDEXING status—after 8 hours, only a few tens of thousands of vectors had been processed. All queries timed out with the error:
    "knn search: search layer at level 0: context canceled".

  3. Upgrade to v1.33 didn’t resolve the bottleneck:
    To address this, I upgraded Weaviate from v1.30 to v1.33 and increased vector_cache_max_objects to hold the entire dataset in cache. However, indexing progress remained extremely slow (high disk read I/O at ~130 MiB/s).

  4. Increasing memory to 4 GB (GOMEMLIMIT=3200MiB) had no effect.

  5. Further memory increase to 6 GB (GOMEMLIMIT=5000MiB) improved indexing:
    After this change and a restart, disk I/O dropped significantly, and indexing completed quickly—shard status changed to READY, and queries stopped timing out.
    However, CPU usage immediately spiked to 100%.

  6. CPU stayed high even after restarts:
    Restarting Weaviate briefly lowered CPU, but it quickly climbed back to 100%.

  7. CPU finally normalized after ~8 hours—but no logs explained why:
    Roughly 8 hours later, CPU usage dropped to below 10%. Even with debug logging enabled, the logs gave no clear indication of what the system was doing during that period.

  8. Reduced vector_cache_max_objects back to 20,000 to avoid excessive memory consumption going forward.


I’d appreciate any guidance on:

  • Why CPU remained high for hours after indexing completed

  • What is the lowest memory/CPU/resource footprint that can support a 500k+ record collection (the vectors are 1536-dimensional) without running into the issues above?

  • What trade-offs should I expect at the minimum configuration (e.g., slow indexing, high latency, instability)?

  • Is a single-node setup ever appropriate for 500k+ vectors, or is a multi-node (e.g., distributed Weaviate with separate DB, vector, and backup roles) deployment strongly advised? What about Kubernetes-specific guidance (e.g., pod resource limits, PVC sizing, horizontal scaling)?

Server Setup Information

  • Weaviate Server Version: 1.33.9
  • Deployment Method: k8s
  • Multi Node? Number of Running Nodes: single
  • Client Language and Version: python 3.11
  • Multitenancy?: No

Any additional Information

related issues:

hi @Charlie_Chen !!

For 500k+ records with 1536-dimensional vectors, you need at least 6-8GB memory, proper vector cache tuning, and should expect trade-offs between indexing speed and resource usage. Single-node is viable but requires careful configuration.

Your out-of-memory errors are expected with default settings. Each 1536-dimensional vector requires ~6KB (1536 * 4 bytes), so 500k vectors need ~3GB just for storage, plus additional memory for HNSW graph structures and cache.
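
As a rough back-of-the-envelope check, the arithmetic works out as below (the 2x overhead multiplier for the HNSW graph and cache is an assumption for illustration, not an official sizing figure):

# Rough memory estimate: raw vectors plus an assumed HNSW/cache overhead.
num_vectors = 500_000
dimensions = 1536
bytes_per_float = 4
raw_vector_bytes = num_vectors * dimensions * bytes_per_float   # ~3.07e9 bytes, i.e. ~3 GB
estimated_total = raw_vector_bytes * 2                          # assumed 2x overhead factor
print(f"raw vectors:   {raw_vector_bytes / 1e9:.2f} GB")
print(f"with overhead: {estimated_total / 1e9:.2f} GB")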

Key memory-related configurations:

  • Set GOMEMLIMIT to at least 6GB (as you discovered)
  • Configure vector_cache_max_objects based on available memory
  • Set LIMIT_RESOURCES=true so Weaviate caps its memory and CPU usage based on what is available to the container

Vector Cache Tuning

The vector_cache_max_objects parameter is critical (see the client-side sketch after this list):

  • Default is 1e12 (effectively unlimited)
  • For 500k vectors, setting to 20,000 (as you did) reduces memory but increases disk I/O
  • Full cache (500k+) provides best performance but requires ~6GB+ memory
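
For reference, the cache size can also be set per collection at creation time instead of (or alongside) the environment-level setting. A minimal sketch with the Python v4 client; the collection name, property, endpoint, and the 20,000 value are placeholders carried over from the discussion above, not recommendations:

import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_local()  # adjust to your k8s service endpoint

client.collections.create(
    "Documents",  # hypothetical collection name
    properties=[Property(name="text", data_type=DataType.TEXT)],
    vectorizer_config=Configure.Vectorizer.none(),  # vectors are supplied at import time
    vector_index_config=Configure.VectorIndex.hnsw(
        vector_cache_max_objects=20_000,  # trades memory for disk I/O, as noted above
    ),
)
client.close()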

Async Indexing Behavior

Async indexing (ASYNC_INDEXING=true) can stall with insufficient memory (see the client-side batching sketch after this list):

  • It processes vectors in batches via a queue system
  • The “context canceled” errors indicate timeout during search while indexing
  • Requires sufficient memory for queue and temporary structures
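
On the client side, importing in controlled batches and watching for failed objects helps keep the async queue from growing faster than it can drain. A minimal sketch with the Python v4 client; the collection name, record source, and thresholds are placeholders:

import weaviate

# Placeholder records with precomputed 1536-dimensional vectors.
my_records = [
    {"text": "example document", "vector": [0.0] * 1536},
]

client = weaviate.connect_to_local()  # adjust to your k8s service endpoint
collection = client.collections.get("Documents")  # hypothetical collection name

with collection.batch.fixed_size(batch_size=200) as batch:
    for item in my_records:
        batch.add_object(properties={"text": item["text"]}, vector=item["vector"])
        if batch.number_errors > 50:  # bail out early if the server starts rejecting objects
            break

failed = collection.batch.failed_objects
if failed:
    print(f"{len(failed)} objects failed to import")
client.close()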

Post-Indexing CPU Spike

The 100% CPU for 8 hours after indexing completion is likely due to:

  • HNSW graph consolidation and optimization
  • Vector compression if enabled
  • LSM store compaction cycles

Minimum Viable Configuration

For 500k+ vectors on a single node:

environment:  
  GOMEMLIMIT: 6GiB  
  ASYNC_INDEXING: "true"  
  LIMIT_RESOURCES: "true"  
  PERSISTENCE_MEMTABLES_MAX_SIZE_MB: 128  
  PERSISTENCE_LSM_MAX_SEGMENT_SIZE: 1GB
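
To see when the shard actually moves from INDEXING to READY (instead of restarting and guessing), the shard status can be polled from the client. A sketch with the Python v4 client; the collection name is a placeholder, and the exact shape of the returned status may vary by client version, so treat this as an outline rather than a drop-in script:

import time
import weaviate

client = weaviate.connect_to_local()  # adjust to your k8s service endpoint
collection = client.collections.get("Documents")  # hypothetical collection name

while True:
    shards = collection.config.get_shards()
    statuses = {shard.name: shard.status for shard in shards}
    print(statuses)  # e.g. {'abc123': 'INDEXING'} while the async queue drains
    if all(status == "READY" for status in statuses.values()):
        break
    time.sleep(60)  # poll once a minute

client.close()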

Let me know if this helps!

Thanks!

Thank you—this has really clarified things for me. However, I still have a few specific questions:

  1. In a single-node deployment, at what collection size (e.g., number of objects or total vector memory footprint) should I enable sharding, even if I’m not scaling horizontally? Does sharding on one node improve indexing throughput or query responsiveness for large collections?

  2. At what data scale does a multi-node deployment become practically necessary? For instance, is 500k vectors still feasible on a single node with proper tuning, or does something like 1M–2M+ vectors fundamentally require distribution? Are there hard thresholds (e.g., per-shard object limits, memory ceilings, or indexing bottlenecks) that indicate “you must scale out”?