Batch inserts failing intermittently for Weaviate

Hi team,
I’m running Weaviate (v1.25) in Kubernetes.
When I try to bulk index documents, I randomly get this exception:

weaviate.exceptions.UnexpectedStatusCodeException: batch response! Unexpected status code: 404, with response body: None.

It doesn’t happen every time, though: when I retry indexing the same set of documents some time later, it works.
Can someone please help?
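Since re-sending the same documents later succeeds, one stopgap while the root cause is investigated is to retry a failed batch with exponential backoff. Below is a minimal sketch, not a fix: `insert_with_retry` and `send_batch` are hypothetical names, where `send_batch` stands in for whatever call performs the insert in your code, and retrying on any `Exception` by default is an assumption you would narrow to the client's own error type.

```python
import time
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def insert_with_retry(
    send_batch: Callable[[Sequence[T]], None],  # hypothetical insert callable
    batch: Sequence[T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> int:
    """Call send_batch(batch), retrying with exponential backoff.

    Returns the number of attempts needed; re-raises the last error
    once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            send_batch(batch)
            return attempt
        except retry_on:
            if attempt >= max_attempts:
                raise  # give up and surface the last error
            # Sleep base_delay, 2*base_delay, 4*base_delay, ... between tries.
            time.sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1
```

With the v3 client you might pass `retry_on=(weaviate.exceptions.UnexpectedStatusCodeException,)`, the exception shown above. Note that retries only hide intermittent routing errors; they do not explain them.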

Hi @bharath97!

Welcome to our community :hugs:

How have you exposed Weaviate’s HTTP and gRPC services in your K8s cluster?

This may be related, as this message indicates that the client got a 404 from the server. Since it doesn’t happen every time, it may be the K8s load balancer returning a 404 instead of routing the request to the corresponding service.
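One quick way to tell whether a 404 comes from Weaviate or from the load balancer in front of it is to request a path Weaviate always serves, such as its readiness endpoint `/v1/.well-known/ready`, through each route. If the direct route answers but the ingress route returns 404, the routing layer is the likely suspect. A minimal standard-library sketch; `probe_ready` is a hypothetical helper and the base URLs in the comments are placeholders.

```python
import urllib.error
import urllib.request


def probe_ready(base_url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of GET {base_url}/v1/.well-known/ready.

    A 404 on this path suggests the request did not reach Weaviate.
    Connection failures (URLError) are left to propagate.
    """
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code worth reporting.
        return err.code


# Example (placeholder URLs):
# probe_ready("http://weaviate.example.com")  # through the ingress
# probe_ready("http://localhost:8080")        # direct, e.g. via port-forward
```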

Let me know if this helps.

Thanks!

Hey @DudaNogueira , thanks for your response.
It is exposed over HTTP only (Traefik is the proxy, just FYI).

My Traefik proxy recorded a 499 status code for the batch operations. The configured default timeout is 60 seconds, and the documents are not very large either.
For records of similar size, some batch inserts got a response in under 20 ms.

Looking at the Weaviate server logs, the batch operation completed in 3.7 ms. Here is the log entry from the Weaviate server:

{
  "action": "batch_objects",
  "batch_size": 5,
  "level": "trace",
  "msg": "object batch took 3.702183ms",
  "time": "2024-09-10T06:36:42Z",
  "took": 3702183
}

The corresponding entry in my proxy server log says:

10.2.2.131 - - [10/Sep/2024:06:36:42 +0000] "POST /v1/batch/objects HTTP/1.1" 499 21 "-" "-" 1211875 "xplus-weaviate-weaviate-xplus-xplus-weaviate-prodca-phenom-local@kubernetes" "http://10.2.214.160:8080" 60059ms

If I understand this correctly, the batch insert completed almost instantly, but the response was never sent back to the client.
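Putting the two log entries side by side makes the gap explicit. A minimal sketch, assuming Traefik’s default common log format (where the last field is the request duration in milliseconds and the quoted URL before it is the backend server) and that Weaviate’s `took` field is in nanoseconds, which matches the entry above (3702183 ns = 3.702183 ms). `summarize` is a hypothetical helper name.

```python
import json
import re

# Traefik CLF: ... "METHOD PATH PROTO" STATUS SIZE "referer" "user-agent"
#              REQUEST_COUNT "router" "backend-url" DURATIONms
TRAEFIK_RE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \d+ '
    r'"[^"]*" "[^"]*" \d+ "[^"]*" "(?P<backend>[^"]*)" (?P<duration_ms>\d+)ms$'
)


def summarize(traefik_line: str, weaviate_log_json: str) -> dict:
    """Compare the proxy-side duration with Weaviate's own timing."""
    m = TRAEFIK_RE.search(traefik_line.strip())
    if m is None:
        raise ValueError("line does not look like a Traefik CLF entry")
    weaviate_entry = json.loads(weaviate_log_json)
    return {
        "path": m["path"],
        "status": int(m["status"]),
        "backend": m["backend"],
        "proxy_ms": int(m["duration_ms"]),
        "weaviate_ms": weaviate_entry["took"] / 1_000_000,  # ns -> ms
    }
```

On the two entries above this reports status 499 with 60059 ms at the proxy versus about 3.7 ms inside Weaviate, which supports the reading that the time was spent outside the batch operation itself.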

Any help is appreciated!

Ok, what version specifically are you running?

What client language and version are you using?

Can you map it directly to a port so we can rule out any Traefik interference?

Hi @DudaNogueira ,
My Weaviate server version is 1.25.0
Client language: Python (via LangChain)
Client version: 3.24.1

I tried calling the Kubernetes service port directly, bypassing Traefik, and I haven’t observed any timeouts in my testing.
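To make the Traefik-versus-direct comparison repeatable, you can send the same request a fixed number of times through each route and tally the outcomes, including timeouts. A minimal standard-library sketch; `compare_routes` is a hypothetical helper, and it uses the readiness endpoint `/v1/.well-known/ready` as a cheap stand-in for a batch request (an assumption; a faithful test would replay a real `POST /v1/batch/objects` body).

```python
import socket
import urllib.error
import urllib.request
from collections import Counter


def compare_routes(routes: dict, n: int = 100, timeout: float = 10.0) -> dict:
    """For each named base URL, send n GETs and tally the outcomes.

    Outcomes are the HTTP status code (as a string), "timeout", or "error".
    """
    results = {}
    for name, base_url in routes.items():
        url = base_url.rstrip("/") + "/v1/.well-known/ready"
        tally = Counter()
        for _ in range(n):
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    tally[str(resp.status)] += 1
            except urllib.error.HTTPError as err:
                tally[str(err.code)] += 1
            except (socket.timeout, TimeoutError):
                # Read timeouts are raised directly.
                tally["timeout"] += 1
            except urllib.error.URLError as err:
                # Connect timeouts arrive wrapped in URLError.
                is_timeout = isinstance(err.reason, (socket.timeout, TimeoutError))
                tally["timeout" if is_timeout else "error"] += 1
        results[name] = dict(tally)
    return results


# Example (placeholder URLs):
# compare_routes({"traefik": "http://weaviate.example.com",
#                 "direct": "http://localhost:8080"})
```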

I’m surprised this issue has popped up only recently, since I’ve been using the same versions of Weaviate and Traefik for over a year.

Edit: I’ve now reverted to using the R53 record and I don’t see any failures. There was one retry out of my 100 requests, and the rest worked seamlessly.


Glad to hear that!

Note that you are using the Python v3 client; the new Python v4 client brings a lot of improvements, especially for batch inserts, as it uses gRPC instead of HTTP.

Check this doc on how to migrate your code from v3 to v4:

The nice part is that the Python v4 client package also includes the v3 client, so you can migrate your codebase gradually, for example starting with batch imports to take advantage of gRPC right away.
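For reference, here is a minimal sketch of a gRPC-backed batch import with the v4 client (`weaviate-client` 4.x). Assumptions: `import_documents_v4` is a hypothetical name, and the host names, the collection name `Document`, and the ports (8080 for HTTP, 50051 for gRPC, Weaviate’s usual defaults) are placeholders for your own setup. Since the service here is currently exposed over HTTP only, the gRPC port would also need to be reachable from the client.

```python
def import_documents_v4(docs, http_host="weaviate.example.internal",
                        grpc_host="weaviate-grpc.example.internal",
                        collection_name="Document"):
    """Batch-import a list of property dicts over gRPC with the v4 client.

    Host names and collection name are placeholders; a running Weaviate
    with both HTTP and gRPC reachable is required.
    Returns the objects that failed to import.
    """
    import weaviate  # weaviate-client 4.x, imported lazily

    client = weaviate.connect_to_custom(
        http_host=http_host,
        http_port=8080,
        http_secure=False,
        grpc_host=grpc_host,
        grpc_port=50051,
        grpc_secure=False,
    )
    try:
        collection = client.collections.get(collection_name)
        # dynamic() lets the client choose batch sizes automatically.
        with collection.batch.dynamic() as batch:
            for props in docs:
                batch.add_object(properties=props)
        return list(collection.batch.failed_objects)
    finally:
        client.close()
```

Usage might look like `failed = import_documents_v4([{"text": "hello"}])`, then inspecting `failed` instead of relying on an exception to surface errors.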

Thanks!

Thank you!
Would you mind explaining the reason for these intermittent timeouts over HTTP?
Also, I assume that using HTTP with the v4 client still wouldn’t solve the issue, right?

Hi! I believe this is an issue on how your Weaviate is exposed.

Perhaps the client reaches the URL, but Traefik is not forwarding the request correctly and returns a 404 error instead.

And yes, if the services are not exposed correctly, switching to v4 will make no difference.

Hello,
I think so too. I’ll check the ingress configuration.
However, according to the logs, Traefik did forward the request to the Weaviate server, since I could see the log entry with the message: object batch took 3.702183ms
After that, though, Traefik never received the response.

Also, if you have any references on how to expose Weaviate on a Kubernetes cluster, could you please share them?