I’m working on a script to import around ~200 million records and want it to run around 6 times faster. I have 4 weaviate instances (each node with 4 threads and one weaviate instance per node). I only replicate my data once, and have 4 total shards (1 shard per instance). I’m tuning the parallelism level on my data loading script (right now have found the best performance at 12 threads). The batch size that has reached the best performance is about 5000 records per batch.
To back this, I also have 1 GPU running 1 instance of the inference model.
If I want this to run faster, what are the obvious bottlenecks in this set up? I was surprised when I bumped my number of weaviate instances (nodes) to 8 and added 8 shards, and performance worsened.
Is there a benefit to having more than 1 shard per node (as it relates to imports)?