I just set up a new cluster (basic) and I am trying to index ~150K documents to it from Google Colab.
I am getting 502 errors while indexing, and the weaviate Python package doesn’t handle them correctly.
The 502 errors seem to be pretty common: even when I retry myself, I get another 502 after ~3K documents on average.
Hi @rafael_zilberman - just wondering, are you using a vectorizer? Or manually specifying your vectors?
The error handling aside, I’m wondering what the root cause of the errors might be.
Yes, we are using an OpenAI vectorizer with the default model (ada-002).
We had another closer look at this and found the cause. The vectorizer’s connection to OpenAI took quite long during your import (>30s), probably because OpenAI was under heavy load. This caused:
- Weaviate to take exceptionally long to import your objects
- The Python client to eventually time out
- The load balancer to assume that Weaviate was not available, resulting in the 502 code (this should only have lasted for a couple of seconds)
Unfortunately, there’s nothing we can do about the external request timing out; this is a scenario that might occur and is out of our control. However, we are already making plans to improve the communication and timeout handling around this behaviour on our side, so that you should get better insights right from the responses in the future.
Thanks for the reply.
I’d just suggest handling such timeouts using the retry_timeouts parameter and improving the error handling in general in the Python library.
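Until that lands, a client-side workaround could look roughly like the sketch below: send small batches and retry each failed batch with exponential backoff. This is only an illustration, not the Weaviate client API. `send_batch`, `TransientError`, and `import_with_retry` are hypothetical names, and `send_batch` stands in for whatever call actually uploads one batch and raises on a 502 or timeout.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as a 502 or a timeout."""


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def import_with_retry(documents, send_batch, batch_size=50,
                      max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Send documents in small batches, retrying a failed batch with backoff.

    `send_batch` is a hypothetical callable (not part of any library) that
    uploads one list of documents and raises TransientError on a 502/timeout.
    Returns the number of documents sent successfully.
    """
    sent = 0
    for batch in chunked(documents, batch_size):
        for attempt in range(max_retries + 1):
            try:
                send_batch(batch)
                sent += len(batch)
                break  # this batch succeeded, move on to the next one
            except TransientError:
                if attempt == max_retries:
                    raise  # out of retries; earlier batches are already sent
                # Wait base_delay, then 2x, 4x, ... before retrying.
                sleep(base_delay * 2 ** attempt)
    return sent
```

Smaller batches keep each request short, so a slow vectorizer call is less likely to push a single request past the client or load balancer timeout, and a failure only costs one batch rather than the whole import.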
Any updates here? I’m confused: the entire point of Weaviate is to load vectors into a schema, yet I continue to get 502 errors when I try to upload my dataset (about 642K rows). How is there not a better solution?