Good morning!
A question: we are batch processing a set of data, and we have noticed that after processing the first batch we started receiving this error:
Query call with protocol GRPC search failed with message <AioRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-08-15T22:09:21.067965744-04:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
>. [level: ERROR]
Exception in thread Thread-30 (worker_thread):
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/weaviate/collections/grpc/query.py", line 762, in __call
res = await self._connection.grpc_stub.Search(
File "/usr/local/lib/python3.10/site-packages/grpc/aio/_call.py", line 318, in __await__
raise _create_rpc_error(
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-08-15T22:09:21.067965744-04:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
We have tried changing the batch size, but something similar happens: after processing a small number of batches, the errors start to appear. What we do is find, for each vector, its nearest vectors (query.near_object).
Does anybody know if it could be because of the number of vectors? Do you know how we can optimize this process?
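One pattern that often helps with intermittent DEADLINE_EXCEEDED errors is to retry each query with exponential backoff, so a single slow response doesn't kill the whole batch. A minimal, library-agnostic sketch; `DeadlineExceeded` and `flaky_query` are stand-ins for the real gRPC error and your near_object call:

```python
import time

class DeadlineExceeded(Exception):
    """Stand-in for grpc.aio.AioRpcError with StatusCode.DEADLINE_EXCEEDED."""

def with_retries(fn, attempts=4, base_delay=0.5):
    """Call fn(), retrying on DeadlineExceeded with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except DeadlineExceeded:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt))

# Simulated query that succeeds on the third attempt.
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlineExceeded()
    return "ok"

print(with_retries(flaky_query, base_delay=0.01))  # → ok after two retries
```

In the real loop you would wrap each `query.near_object` call in `with_retries`, catching the actual `grpc.aio.AioRpcError` and checking its status code.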
We suggest using something like this as a base, and then tweaking the batch size and concurrent requests according to the resources you have for your cluster:
with movies.batch.fixed_size(batch_size=20, concurrent_requests=2) as batch:
    for i, row in df.iterrows():
        obj_body = {
            c: row[c] for c in data_columns
        }
        batch.add_object(
            properties=obj_body
        )
Thank you for reviewing this case.
I would like to clarify that we use batches mainly to process our vectors, not to add data to the collection. However, we have run into a problem when searching for nearby vectors: the gRPC connection drops. I should also mention that we process each batch on a different thread, and I would like to determine whether I might be overloading the system by using the connection this way.
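If each batch gets its own thread, the number of in-flight queries is unbounded and can easily overload the server. One way to cap it is a thread pool with a fixed worker count instead of raw threads; a minimal sketch, where `process_batch` is a placeholder for the per-vector near_object queries:

```python
from concurrent.futures import ThreadPoolExecutor

def process_batch(batch):
    # Placeholder for running the nearest-vector query for each vector.
    return [v * 2 for v in batch]

batches = [[1, 2], [3, 4], [5, 6]]

# max_workers caps concurrent requests instead of one unbounded thread per batch.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(process_batch, b) for b in batches]
    results = [f.result() for f in futures]  # collected in submission order

print(results)
```

With a cap like this, adding more batches queues work instead of multiplying simultaneous gRPC calls, which is often enough to stop deadline errors from piling up.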
Indeed, we generate the vectors in a separate, independent flow; during this process we download them and upload them to a collection in order to process them and find the closest match for each vector. I have been monitoring the process and have noticed some warning messages, such as the following:
/usr/local/lib/python3.10/asyncio/selector_events.py:701: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=598 read=idle write=<idle, bufsize=0>>
_warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
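That ResourceWarning suggests connections are being dropped without an explicit close. Closing the client deterministically, for example with a context manager, usually makes it go away; the v4 Weaviate client supports `with ... as client:` for this. A minimal stand-in sketch of the pattern (the client class here is hypothetical):

```python
class FakeClient:
    """Hypothetical stand-in for a client that owns network transports."""
    def __init__(self):
        self.closed = False

    def close(self):
        # Release sockets/transports explicitly instead of relying on GC.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

with FakeClient() as client:
    pass  # run your queries here

print(client.closed)  # → True: transport released deterministically
```

If the client is shared across threads, make sure the `with` block (or an explicit `client.close()` in a `finally`) outlives all worker threads that use it.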