Exception: Query call with protocol GRPC batch failed with message recvmsg:Connection reset by peer

Description

So I have a custom Linux home server that I built, on which I have deployed a Weaviate instance using Docker. I am now transferring around 7M records from my MongoDB instance (also running as a Docker container in the same setup) to Weaviate using multithreading.

The thing is, after migrating around 3M records the Python script crashes with the following error:
Exception: Query call with protocol GRPC batch failed with message recvmsg:Connection reset by peer.
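For context, the migration loop looks roughly like this. This is a simplified sketch, not the actual script: the `chunked` helper, `mongo_collection`, and `weaviate_collection` names are placeholders.

```python
from itertools import islice

def chunked(cursor, size=1000):
    """Yield successive lists of up to `size` items from any iterable or cursor."""
    it = iter(cursor)
    while batch := list(islice(it, size)):
        yield batch

# Hypothetical usage: stream Mongo documents into Weaviate in bounded batches,
# rather than handing each worker thread one huge insert_many call.
# for batch in chunked(mongo_collection.find(), size=1000):
#     weaviate_collection.data.insert_many(batch)
```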

Server Setup Information

  • Weaviate Server Version:
  • Deployment Method: docker
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: Python v4
  • Multitenancy?: No

Any additional Information

Here’s the complete log:

weaviate.exceptions.WeaviateBatchError: Query call with protocol GRPC batch failed with message recvmsg:Connection reset by peer.
Traceback (most recent call last):
  File "/home/abc/test/datamigration/.venv/lib/python3.12/site-packages/weaviate/collections/batch/grpc_batch_objects.py", line 137, in __send_batch
    res, _ = self._connection.grpc_stub.BatchObjects.with_call(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/abc/test/datamigration/.venv/lib64/python3.12/site-packages/grpc/_channel.py", line 1198, in with_call
    return _end_unary_response_blocking(state, call, True, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/abc/test/datamigration/.venv/lib64/python3.12/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "recvmsg:Connection reset by peer"
	debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-07-11T18:15:30.460550531+12:00", grpc_status:14, grpc_message:"recvmsg:Connection reset by peer"}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/abc/test/datamigration/transfer_mongo2weav.py", line 194, in <module>
    upload2weaviate(
  File "/home/abc/test/datamigration/transfer_mongo2weav.py", line 39, in upload2weaviate
    uuids = weaviate_collection.data.insert_many(data)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/abc/test/datamigration/.venv/lib/python3.12/site-packages/weaviate/collections/data.py", line 410, in insert_many
    return self._batch_grpc.objects(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/abc/test/datamigration/.venv/lib/python3.12/site-packages/weaviate/collections/batch/grpc_batch_objects.py", line 97, in objects
    errors = self.__send_batch(weaviate_objs, timeout=timeout)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/abc/test/datamigration/.venv/lib/python3.12/site-packages/weaviate/collections/batch/grpc_batch_objects.py", line 151, in __send_batch
    raise WeaviateBatchError(e.details())  # pyright: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
weaviate.exceptions.WeaviateBatchError: Query call with protocol GRPC batch failed with message recvmsg:Connection reset by peer.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/abc/test/datamigration/transfer_mongo2weav.py", line 216, in <module>
    raise Exception(e)
Exception: Query call with protocol GRPC batch failed with message recvmsg:Connection reset by peer.

Hi @Adityam_Ghosh,

Welcome to our community! It’s great to have you here.

I’ve noticed this issue can occur when there’s latency in the connection.

Can you try adding the skip_init_checks=True flag to your connection call to bypass the initial connection checks? Here’s how you can do it:

import weaviate

client = weaviate.connect_to_local(
    ...
    skip_init_checks=True
)

Initial Connection Checks - If you stop seeing the error, it would likely point to latency issues during the initial port checks.

Thanks, but unfortunately this also didn’t work out. I’m wondering, is it because I am trying to upload the data to Weaviate using multithreading? Since it’s receiving a lot of requests, maybe the server isn’t able to process all of them at once?

Happy Friday @Adityam_Ghosh!

That’s a good point! Have you considered running a multi-node setup, with at least 3 nodes?

Hi @Mohamed_Shahin, thanks for the suggestion. I hadn’t thought about this. I will surely try it and post an update.


Awesome, and configure the replication factor to 3:
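A minimal sketch of what that could look like with the v4 Python client, assuming a cluster with at least 3 nodes; the collection name `"Records"` is a placeholder:

```python
import weaviate
from weaviate.classes.config import Configure

client = weaviate.connect_to_local()

# Replication requires a multi-node cluster; with a single node this will fail.
client.collections.create(
    "Records",  # hypothetical collection name
    replication_config=Configure.replication(factor=3),
)

client.close()
```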

Also, this batch import best practice may improve the code you have:
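As a rough sketch of that best practice: the client-managed batcher caps batch size and concurrency for you, which avoids overwhelming the server with raw threads. The collection name and the `records` iterable below are placeholders:

```python
import weaviate

client = weaviate.connect_to_local()
collection = client.collections.get("Records")  # hypothetical collection name

# Let the client manage batch size and request concurrency.
with collection.batch.fixed_size(batch_size=200, concurrent_requests=2) as batch:
    for doc in records:  # `records` stands in for your Mongo cursor
        batch.add_object(properties=doc)
        if batch.number_errors > 100:
            break  # stop early if the server is rejecting objects

# Inspect anything that failed so it can be retried.
failed = collection.batch.failed_objects
client.close()
```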

Let me know how it goes!

Have a good weekend!

Thanks mate, you have a great weekend too! Right now, I am trying another configuration where I have increased the memory limit from 1 GB to 4 GB to see what happens. After that, I will try the solution you have shared.


I have also seen this. In my case, it seems to have something to do with the latency of the network connection (to the free Weaviate Cloud instance).

import weaviate
from weaviate.classes.init import AdditionalConfig, Timeout

client = weaviate.connect_to_wcs(
    additional_config=AdditionalConfig(timeout=Timeout(init=30, query=60, insert=120)),
    # skip_init_checks=True,
    cluster_url=WCS_URL,
    auth_credentials=weaviate.auth.AuthApiKey(WCS_API_KEY),
)

Adding the timeout config seemed to fix the issues (mostly), and I opted not to skip the init checks.

However, it is still happening intermittently. It tends to happen around

collection.data.delete_many(…)

where I deleted a lot of objects. But I haven’t tested enough to be sure. It is definitely intermittent, since some jobs run through fine. And retrying with tenacity didn’t seem to help (more debugging needed there).
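For what it’s worth, a hand-rolled retry with exponential backoff is one way to check whether the retries simply aren’t waiting long enough between attempts. This is a generic sketch; the Weaviate call at the bottom is shown only as a hypothetical usage:

```python
import time

def retry_with_backoff(fn, retries=5, base_delay=1.0, exc=Exception):
    """Call fn(), retrying up to `retries` times with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except exc:
            if attempt == retries - 1:
                raise  # out of attempts, re-raise the last error
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical usage against the v4 client:
# retry_with_backoff(lambda: collection.data.delete_many(where=some_filter),
#                    exc=weaviate.exceptions.WeaviateBatchError)
```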

But if you have fixed your problem, please share. I will detail my setup in another thread if it proves to be very problematic.