Is there a way to predict import speed based on record count?
My test env is a 4-core, 16 GB VM.
The vector index settings are all default for HNSW. I am using random fake vector data (I know HNSW is not a good fit for random fake vectors).
The import speed is about 1.3-1.5 s per 100 records when there are about 400k records.
How would the import speed change when I have around 4M records, assuming my memory can hold all the vectors?
Hi!
Recent versions of Weaviate have some nice features that improve the import process.
For example:
On top of that, having proper infrastructure will help. For example, on Kubernetes deployments, opt for fewer large machines rather than many small ones to minimize network latency.
Also, use batch import properly and make sure the client is not a bottleneck. Having enough resources for processing the batches is a good point to watch, as is proper error handling.
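To check whether the client or the server is the bottleneck, and to see how batch time drifts as the index grows, one option is to time every batch yourself. Below is a minimal sketch in plain Python. `send_batch` is a hypothetical callable standing in for whatever call you use to send one batch (it is not a Weaviate API), and `timed_import` and `chunked` are illustrative names.

```python
import time
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


def chunked(records: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` records."""
    batch: list = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def timed_import(
    records: Iterable,
    send_batch: Callable[[Sequence], None],  # hypothetical: sends one batch
    batch_size: int = 100,
) -> Tuple[List[Tuple[int, float]], List[Tuple[list, Exception]]]:
    """Send records in batches, recording per-batch time and failures.

    Returns (timings, failed):
      timings: (records attempted so far, seconds for that batch)
      failed:  (batch, exception) for every batch that raised
    """
    timings: List[Tuple[int, float]] = []
    failed: List[Tuple[list, Exception]] = []
    attempted = 0
    for batch in chunked(records, batch_size):
        start = time.perf_counter()
        try:
            send_batch(batch)
        except Exception as exc:
            # Keep the failed batch so it can be retried or inspected later
            failed.append((batch, exc))
        elapsed = time.perf_counter() - start
        attempted += len(batch)
        timings.append((attempted, elapsed))
    return timings, failed
```

Plotting the second column of `timings` against the first shows how the per-batch time changes with the number of records imported, which is the curve the question is asking about. If the time stays flat while server CPU is idle, the client is probably the limiting factor.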
Be aware that importing is a CPU-bound process, and while importing there are several tasks that Weaviate has to do in order to properly index the data. So if you need to import a huge amount of data in a short period of time, you will need to play around with those knobs to get the best outcome.
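For a rough sanity check, the figures from the question (100 records per 1.3-1.5 s at about 400k records) can be turned into a throughput and a naive estimate. The sketch below is plain arithmetic, not a Weaviate feature. The linear estimate assumes the per-batch time never changes, which is optimistic. The log-scaled variant assumes, purely for illustration, that per-insert cost grows with the logarithm of the index size, which is the scaling usually cited for HNSW construction. Random vectors and resource limits can make real numbers worse, so measuring at several index sizes is more reliable than either estimate.

```python
import math


def records_per_second(batch_size: int, seconds_per_batch: float) -> float:
    """Throughput implied by one measured batch time."""
    return batch_size / seconds_per_batch


def naive_hours(total_records: int, batch_size: int, seconds_per_batch: float) -> float:
    """Linear extrapolation: assumes per-batch time stays constant."""
    batches = total_records / batch_size
    return batches * seconds_per_batch / 3600


def log_scaled_seconds_per_batch(
    seconds_per_batch: float, measured_at: int, target_size: int
) -> float:
    """ASSUMPTION: per-insert cost scales with log(index size)."""
    return seconds_per_batch * math.log(target_size) / math.log(measured_at)


if __name__ == "__main__":
    for secs in (1.3, 1.5):
        rate = records_per_second(100, secs)
        hours = naive_hours(4_000_000, 100, secs)
        scaled = log_scaled_seconds_per_batch(secs, 400_000, 4_000_000)
        print(f"{secs} s/batch: {rate:.0f} rec/s, "
              f"{hours:.1f} h for 4M (linear), "
              f"{scaled:.2f} s/batch at 4M (log-scaled assumption)")
```

With these inputs the linear estimate comes out at roughly 14 to 17 hours for 4M records on the same machine, so treat that as a lower bound rather than a prediction.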
Let me know if that helps