Batch insert logs 'Failed to send 1 objects in a batch of 1' but collection.batch.failed_objects is empty

Owie_de_la_Pena · September 3, 2024, 9:28am

Description

Hello,

I’m using the Python v4 client to do a batch import.

My code looks like this (basically just a copy of what’s in the documentation edited to work with multi-tenancy):

collection = self._client.collections.get(collection_name)
with collection.with_tenant(tenant=tenant).batch.dynamic() as batch:
   for datum in data:
      vector_uuid = self.generate_deterministic_id(datum)
      embedding = self.get_embedding(datum)
      batch.add_object(properties=self.get_properties(datum), uuid=vector_uuid, vector=embedding)

   failed_objects = collection.batch.failed_objects # empty
   failed_objects1 = collection.with_tenant(tenant=tenant).batch.failed_objects # also empty
   failed_objects2 = self._client.batch.failed_objects # also empty

Note: this is not the complete code; I simplified it for brevity. I can confirm, though, that the code runs fine and imports successfully when correct data is used.

The batch, in some cases when I’m using improper data, fails with this error:

{'message': 'Failed to send 1 objects in a batch of 1. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}

But when I try to inspect failed_objects, it’s always empty. I tried different ways to find the failed objects, but every one of them returns an empty array (see failed_objects, failed_objects1 and failed_objects2 in the code).

Server Setup Information

Weaviate Server Version: 1.25.11
Deployment Method: docker
Multi Node? Number of Running Nodes: 3
Client Language and Version: Python v4
Multitenancy?: Yes

Any additional Information

Logs:

[ERROR] 2024-09-03T09:00:15.354Z 46549357-e140-4347-b5cd-635bbdf758bb {'message': 'Failed to send 1 objects in a batch of 1. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
2024-09-03T11:00:15.374+02:00
[INFO] 2024-09-03T09:00:15.374Z 46549357-e140-4347-b5cd-635bbdf758bb collection.with_tenant(tenant=tenant).batch.failed_objects: [] len=0
2024-09-03T11:00:15.374+02:00
[INFO] 2024-09-03T09:00:15.374Z 46549357-e140-4347-b5cd-635bbdf758bb collection.batch.failed_objects: [] len=0
2024-09-03T11:00:15.374+02:00
[INFO]	2024-09-03T09:00:15.374Z	46549357-e140-4347-b5cd-635bbdf758bb	self._client.batch.failed_objects: [] len=0

Where should I actually get the failed_objects?

DudaNogueira · September 3, 2024, 12:09pm

Hi @Owie_de_la_Pena !! Welcome to our community.

The issue here is that you should get those failed objects outside of the batch context, like so:

collection = self._client.collections.get(collection_name)
with collection.with_tenant(tenant=tenant).batch.dynamic() as batch:
   for datum in data:
      vector_uuid = self.generate_deterministic_id(datum)
      embedding = self.get_embedding(datum)
      batch.add_object(properties=self.get_properties(datum), uuid=vector_uuid, vector=embedding)

failed_objects = collection.batch.failed_objects # read after the batch context has exited
print(failed_objects)

Check here the documentation on error handling:

Let me know if this helps!

Thanks!

Owie_de_la_Pena · September 3, 2024, 1:35pm

Hi @DudaNogueira ,

Thank you for looking into my query.
I tried your suggestion and made sure that the collection.batch.failed_objects line is outside the context manager block. However, I’m getting the same result.

Owie_de_la_Pena · September 3, 2024, 2:07pm

Additional info: I am able to pull the logs from the Weaviate node itself, if that’s of any help.

{"build_git_commit":"2c51c29","build_go_version":"go1.21.13","build_image_tag":"1.25.13","build_wv_version":"1.25.13","class":"MyObject","level":"error","msg":"[conflict \"Validate vector index for 48efafd2-77cc-55e1-9f3c-c4d6457993d2: new node has a vector with length 1536. Existing nodes have vectors with length 1\": \u003cnil\u003e]","op":"put.many","shard":"TSTDRV_OWIE3","time":"2024-09-03T14:03:51Z"}

This is intentional - I’m forcing my batch to fail.

DudaNogueira · September 3, 2024, 2:30pm

This is the error message.

You probably inserted one object with 1 dimension, and are now trying to ingest objects with 1536 dimensions.

Here is how to reproduce this exact error:

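import weaviate
from weaviate import classes as wvc

client = weaviate.connect_to_local()

# Start from a clean collection with no vectorizer, so vectors are supplied manually.
client.collections.delete("Test")
collection = client.collections.create(
    "Test",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
)

# The first insert fixes the vector index at 1 dimension.
collection.data.insert({"text": "1 dimensions"}, vector=[1])
# The second insert has 1536 dimensions and triggers the error below.
collection.data.insert({"text": "1536 dimensions"}, vector=list(range(1536)))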

Error:

File ~/dev/weaviate/lab/.venv/lib/python3.12/site-packages/weaviate/connect/v4.py:458, in ConnectionV4.__send(self, method, url, error_msg, status_codes, is_gql_query, weaviate_object, params)
    456     res = await self._client.send(req)
    457     if status_codes is not None and res.status_code not in status_codes.ok:
--> 458         raise UnexpectedStatusCodeError(error_msg, response=res)
    459     return cast(Response, res)
    460 except RuntimeError as e:

UnexpectedStatusCodeError: Object was not added! Unexpected status code: 500, with response body: {'error': [{'message': 'put object: import into index test: put local object: shard="D483s6MIIvw9": store object in LSM store: Validate vector index for 3e14872a-fc3d-4e08-9bb7-61debec9a85c: new node has a vector with length 1536. Existing nodes have vectors with length 1'}]}.

Let me know if this helps.

Thanks!

Owie_de_la_Pena · September 3, 2024, 4:49pm

Hi again @DudaNogueira,

Yes, that’s also how I’m doing it. Again, I was intentionally making the batch fail. After the batch fails, I want to know which records failed and mark them as failed in my own database.

Going back to the original question:

collection = self._client.collections.get(collection_name)
with collection.with_tenant(tenant=tenant).batch.dynamic() as batch:
   for datum in data:
      vector_uuid = self.generate_deterministic_id(datum)
      embedding = self.get_embedding(datum)
      batch.add_object(properties=self.get_properties(datum), uuid=vector_uuid, vector=embedding)

failed_objects = collection.batch.failed_objects

This is what I have in my code now. After the batch fails because of the incorrect vector size:

failed_objects = collection.batch.failed_objects # still empty

It still gives me an empty array.

Owie_de_la_Pena · September 3, 2024, 7:37pm

Update: I got it working by changing the batch code to:

with client.batch.dynamic() as batch:
   ...
      ...
      batch.add_object(properties=self.get_properties(datum), uuid=vector_uuid, vector=embedding, collection=collection_name, tenant=tenant)

instead of

with collection.with_tenant(tenant=tenant).batch.dynamic() as batch:
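
With failed_objects populated, marking the failed records in my own database comes down to mapping each entry back to its UUID. A minimal sketch of that step, assuming each failed entry exposes `message` and `object_.uuid` (those attribute names are my reading of the client's error entries and are not confirmed anywhere in this thread); `failures_by_uuid` is a hypothetical helper name:

```python
from typing import Any, Dict, Iterable


def failures_by_uuid(failed_objects: Iterable[Any]) -> Dict[str, str]:
    """Map the UUID of each failed object to its error message.

    Assumption: every entry has `.message` and `.object_.uuid`.
    """
    return {str(entry.object_.uuid): entry.message for entry in failed_objects}
```

It would be called after the `with` block has exited, for example as `failures_by_uuid(client.batch.failed_objects)`, and the resulting dict can then drive the status update on my side.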

DudaNogueira · September 5, 2024, 12:53pm

Nice!

Glad you figured it out and thanks for sharing.

Indeed, you can create a batch context using the client directly and then specify the collection and tenant on each object.

This is a great feature, as you can add data to multiple collections and multiple tenants in one batch context just by changing those parameters.
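
As a rough sketch of that pattern: the `rows` structure and the `import_rows` helper below are made up for illustration, and `client` is assumed to be an already connected v4 client.

```python
from typing import Any, Dict, Iterable, List


def import_rows(client: Any, rows: Iterable[Dict[str, Any]]) -> List[Any]:
    """Send rows through one client-level batch and return the failed entries.

    Each row is a hypothetical dict with the keys
    collection, tenant, uuid, properties and vector.
    """
    with client.batch.dynamic() as batch:
        for row in rows:
            # Collection and tenant are chosen per object, not per batch.
            batch.add_object(
                collection=row["collection"],
                tenant=row["tenant"],
                properties=row["properties"],
                uuid=row["uuid"],
                vector=row["vector"],
            )
    # Read the failures only after the context manager has exited.
    return list(client.batch.failed_objects)
```

Because the target is set on each `add_object` call, rows for different collections and tenants can be mixed freely in the same loop.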

Thanks!

00.lope.naughts · November 19, 2024, 6:44am

I also have a question about proper error handling in general. Specifically, I am considering:

from weaviate.classes.query import Filter

collection.data.delete_many(
    where=Filter.by_property("my_id").contains_any(my_ids)
)

I looked through the doc in the link, but I’m not sure whether collection.batch.failed_objects is still appropriate. For lack of a better idea, I am doing this for now:

try:
    collection.data.delete_many(…)
except weaviate.exceptions.UnexpectedStatusCodeError as e:
    print(e)

Dirk · November 19, 2024, 7:49am

Delete many has the following return:

@dataclass
class DeleteManyReturn(Generic[T]):
    """This class contains the results of a `delete_many` operation.."""

    failed: int
    matches: int
    objects: T
    successful: int

where you get the status for each object (objects: T) if you set verbose to true. Otherwise it is only the counts.
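
A sketch of how that return value could be inspected, assuming `delete_many` was called with `verbose=True` and that each entry in `objects` carries `uuid`, `successful` and `error` (the per-object attribute names are an assumption here; the dataclass above only shows the top-level fields). `failed_deletes` is a hypothetical helper:

```python
from typing import Any, List, Optional, Tuple


def failed_deletes(result: Any) -> List[Tuple[str, Optional[str]]]:
    """Return (uuid, error) pairs for every object that was not deleted.

    Without verbose mode there are no per-object entries, so the
    result is an empty list and only the counts are available.
    """
    if result.failed == 0 or not result.objects:
        return []
    return [(str(obj.uuid), obj.error) for obj in result.objects if not obj.successful]
```

When only the counts are needed, checking `result.failed` against `result.matches` is enough and the verbose flag can stay off.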

00.lope.naughts · November 19, 2024, 4:50pm

Thanks. I will use this to try to handle it. I have a container that’s admittedly sometimes running close to what the physical memory will allow, and recently I’ve been getting this quite often:

ERROR: Error: Query call with protocol GRPC delete failed with message <AiGrpcError that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = “Deadline Exceeded”
debug_error_string = “UNKNOWN:Error received from peer {created_time:“2024-11-15T17:05:18.271641472+00:00”, grpc_status:4, grpc_message:“Deadline Exceeded”}”

Does this kind of error (it looks pretty fatal) still produce a “nice” DeleteManyReturn object? Or is it still prudent/necessary to use try/except to detect and handle it?

Dirk · November 19, 2024, 6:38pm

That means that the request was aborted because it took too long. You could increase the timeout value as shown here: Python | Weaviate

(can’t remember which one is used by delete)
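
A sketch of what raising the timeouts at connection time might look like. Since it is unclear which timeout delete uses, this raises both the query and the insert one. The numbers are placeholders in seconds, not recommendations, and a local instance is assumed:

```python
import weaviate
from weaviate.classes.init import AdditionalConfig, Timeout

# Placeholder values in seconds; tune them to the actual workload.
client = weaviate.connect_to_local(
    additional_config=AdditionalConfig(
        timeout=Timeout(init=30, query=120, insert=240),
    )
)
try:
    ...  # run the delete_many call here
finally:
    client.close()
```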
