What are the best practices and hard limitations for deploying Weaviate so it can handle a collection of 500k+ records?
My Current Issues
Out-of-memory errors with default settings:
When creating a collection with default configuration and ingesting over 300k records (each with a 1536-dimensional vector), I consistently ran into “not enough memory” errors, rendering the collection unusable. Restarting Weaviate occasionally reduced memory usage temporarily.
Indexing stalled for hours with async indexing enabled:
I then adjusted vector_cache_max_objects to 20,000 and set the environment variables ASYNC_INDEXING=true and GOMEMLIMIT=2500MiB. After re-importing the 300k+ dataset, the shard remained stuck in INDEXING status—after 8 hours, only a few tens of thousands of vectors had been processed. All queries timed out with the error: "knn search: search layer at level 0: context canceled".
Upgrade to v1.33 didn’t resolve the bottleneck:
To address this, I upgraded Weaviate from v1.30 to v1.33 and increased vector_cache_max_objects to hold the entire dataset in cache. However, indexing progress remained extremely slow (high disk read I/O at ~130 MiB/s).
Increasing memory to 4 GB (GOMEMLIMIT=3200MiB) had no effect.
Further memory increase to 6 GB (GOMEMLIMIT=5000MiB) improved indexing:
After this change and a restart, disk I/O dropped significantly, and indexing completed quickly—shard status changed to READY, and queries stopped timing out. However, CPU usage immediately spiked to 100%.
CPU stayed high even after restarts:
Restarting Weaviate briefly lowered CPU, but it quickly climbed back to 100%.
CPU finally normalized after ~8 hours—but no logs explained why:
Roughly 8 hours later, CPU usage dropped to below 10%. Even with debug logging enabled, the logs gave no clear indication of what the system was doing during that period.
Reduced vector_cache_max_objects back to 20,000 to avoid excessive memory consumption going forward.
I’d appreciate any guidance on:
Why CPU remained high for hours after indexing completed
What’s the lowest memory/CPU/resource footprint that can support a 500k+ collection (the vectors have 1536 dimensions) without the above issues?
What trade-offs should I expect at the minimum configuration (e.g., slow indexing, high latency, instability)?
Is a single-node setup ever appropriate for 500k+ vectors, or is a multi-node (e.g., distributed Weaviate with separate DB, vector, and backup roles) deployment strongly advised? What about Kubernetes-specific guidance (e.g., pod resource limits, PVC sizing, horizontal scaling)?
For 500k+ records with 1536-dimensional vectors, you need at least 6-8GB memory, proper vector cache tuning, and should expect trade-offs between indexing speed and resource usage. Single-node is viable but requires careful configuration.
Your out-of-memory errors are expected with default settings. Each 1536-dimensional vector requires ~6KB (1536 * 4 bytes), so 500k vectors need ~3GB just for storage, plus additional memory for HNSW graph structures and cache.
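As a rough sanity check on those numbers, here is a back-of-the-envelope sketch; the 2x overhead factor for the HNSW graph and cache is an illustrative assumption, since the real overhead depends on maxConnections, ef, and cache settings:

```python
# Back-of-the-envelope memory estimate for 500k float32 vectors of 1536 dims.
# The 2x overhead multiplier is an assumption for illustration only; actual
# HNSW graph + cache overhead depends on maxConnections, ef, and cache size.
num_vectors = 500_000
dims = 1536
bytes_per_vector = dims * 4                 # float32 -> 6144 bytes (~6 KB)
raw_bytes = num_vectors * bytes_per_vector  # ~3.07 GB of raw vector data
estimated_total = raw_bytes * 2             # assumed overhead factor

print(f"raw vectors:        {raw_bytes / 1e9:.2f} GB")
print(f"with assumed index: {estimated_total / 1e9:.2f} GB")
```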
Key memory-related configurations:
Give the node at least 6 GB of RAM and set GOMEMLIMIT to roughly 80-90% of that (~5 GiB, as you discovered)
Configure vector_cache_max_objects based on available memory
Enable LIMIT_RESOURCES=true so Weaviate automatically caps its own memory and CPU usage (a sketch of these settings follows this list)
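For reference, here is a minimal sketch of the environment variables discussed above, written as a Python dict you might hand to a container runtime; the values are examples, not recommendations:

```python
# Illustrative single-node environment settings. GOMEMLIMIT should stay
# comfortably below the container/VM memory limit so the OS and page cache
# still have headroom.
weaviate_env = {
    "GOMEMLIMIT": "5000MiB",          # soft heap target for the Go runtime
    "ASYNC_INDEXING": "true",         # build the HNSW index in the background
    "LIMIT_RESOURCES": "true",        # let Weaviate cap its own memory/CPU use
    "PERSISTENCE_DATA_PATH": "/var/lib/weaviate",
}
```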
Vector Cache Tuning
The vector_cache_max_objects parameter is critical:
Default is 1e12 (effectively unlimited)
For 500k vectors, setting to 20,000 (as you did) reduces memory but increases disk I/O
Full cache (500k+) provides the best query performance but requires ~6 GB+ of memory (see the collection-creation sketch below)
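As an illustration, here is a minimal sketch of creating a collection with a bounded vector cache using the v4 Python client; the collection name and property are hypothetical, and the exact creation signature may differ between client versions:

```python
import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_local()
try:
    # "Docs" and its single property are placeholders; the relevant part is
    # vector_cache_max_objects, which bounds how many vectors stay in memory.
    client.collections.create(
        name="Docs",
        properties=[Property(name="text", data_type=DataType.TEXT)],
        vector_index_config=Configure.VectorIndex.hnsw(
            vector_cache_max_objects=20_000,  # raise this if RAM allows a full cache
        ),
    )
finally:
    client.close()
```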
Async Indexing Behavior
Async indexing (ASYNC_INDEXING=true) can stall with insufficient memory:
It processes vectors in batches via a queue system
The "context canceled" errors indicate that searches timed out while the index was still being built
Requires sufficient memory for the queue and temporary structures (a status-check sketch follows this list)
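To watch async indexing progress instead of guessing, you can poll shard status over the REST API. This sketch assumes a collection called "Docs"; the vectorQueueSize field may be absent on older versions:

```python
import requests

# Poll shard status while async indexing runs. "Docs" is a placeholder name.
resp = requests.get("http://localhost:8080/v1/schema/Docs/shards", timeout=10)
resp.raise_for_status()
for shard in resp.json():
    # status stays INDEXING until the async queue drains, then flips to READY
    print(shard.get("name"), shard.get("status"), shard.get("vectorQueueSize"))
```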
Post-Indexing CPU Spike
The 100% CPU for ~8 hours after indexing completed is most likely background maintenance that follows a large import, such as LSM store compaction and other index housekeeping; this work is largely invisible in the default logs, which matches what you observed.
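One way to see what the node is busy with during such a phase is to scrape its Prometheus metrics (requires PROMETHEUS_MONITORING_ENABLED=true, which exposes metrics on port 2112). Metric names vary between versions, so this sketch only filters for likely keywords rather than assuming specific series:

```python
import requests

# Dump metrics that hint at background work (compaction, queues, tombstones).
metrics = requests.get("http://localhost:2112/metrics", timeout=10).text
for line in metrics.splitlines():
    if line.startswith("#"):
        continue
    if any(key in line for key in ("compaction", "queue", "tombstone")):
        print(line)
```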
Thank you—this has really clarified things for me. However, I still have a few specific questions:
In a single-node deployment, at what collection size (e.g., number of objects or total vector memory footprint) should I enable sharding, even if I’m not scaling horizontally? Does sharding on one node improve indexing throughput or query responsiveness for large collections?
At what data scale does a multi-node deployment become practically necessary? For instance, is 500k vectors still feasible on a single node with proper tuning, or does something like 1M–2M+ vectors fundamentally require distribution? Are there hard thresholds (e.g., per-shard object limits, memory ceilings, or indexing bottlenecks) that indicate “you must scale out”?