I’m wondering about optimizing import speed. Does Weaviate natively parallelize vectorizing imported batches across multiple containers running inference models (e.g. one inference container per GPU)? Or will each batch always hit one container?
Thanks in advance!
I am not sure about this, but I believe that for model inference you can set up multiple containers and load-balance them, so this parallelization happens transparently to Weaviate.
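As a rough sketch of what that could look like with docker-compose (module name, image tags, and env vars are illustrative assumptions, not a verified setup — check the docs for your Weaviate version):

```yaml
# Hypothetical docker-compose sketch: Weaviate + a scalable inference service.
# Scaling the inference service creates multiple replicas behind one service
# name; Docker's internal DNS round-robins requests between them, so Weaviate
# just talks to http://t2v-transformers:8080 and the load spreads out.
services:
  weaviate:
    image: semitechnologies/weaviate:1.22.5   # example tag
    environment:
      ENABLE_MODULES: text2vec-transformers
      DEFAULT_VECTORIZER_MODULE: text2vec-transformers
      # Weaviate only sees one endpoint; replicas sit behind it.
      TRANSFORMERS_INFERENCE_API: http://t2v-transformers:8080
    ports:
      - "8080:8080"

  t2v-transformers:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    environment:
      ENABLE_CUDA: "1"   # assumes a GPU is available to each replica
```

You would then start it with something like `docker compose up --scale t2v-transformers=2`. Pinning each replica to its own GPU (e.g. via `NVIDIA_VISIBLE_DEVICES` or a dedicated reverse proxy per GPU) would need extra wiring beyond this sketch.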
But be aware that even if you have multiple inference containers, indexing itself takes time, as Weaviate builds the vector index while importing.
We have released a new feature that allows async indexing, so imports are no longer blocked while the vector index is built in the background.
It is experimental for now and ready to be tried.
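If I understand the feature correctly, enabling it is a matter of an environment variable on the Weaviate container — something along these lines (the exact variable name and minimum version are assumptions; verify against the release notes):

```yaml
# Hypothetical snippet: opt in to experimental async indexing so that
# batch imports return quickly and the vector index is built in the background.
services:
  weaviate:
    image: semitechnologies/weaviate:1.22.5   # example tag; feature is recent
    environment:
      ASYNC_INDEXING: "true"   # assumed flag name for the experimental feature
```

With this on, import throughput is bounded by vectorization and ingestion rather than index construction, but queries may see not-yet-indexed objects differently until indexing catches up.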
Let me know if that helps!