My code looks like this (basically just a copy of what’s in the documentation edited to work with multi-tenancy):
collection = self._client.collections.get(collection_name)
with collection.with_tenant(tenant=tenant).batch.dynamic() as batch:
    for datum in data:
        vector_uuid = self.generate_deterministic_id(datum)
        embedding = self.get_embedding(datum)
        batch.add_object(properties=self.get_properties(datum), uuid=vector_uuid, vector=embedding)
failed_objects = collection.batch.failed_objects  # empty
failed_objects1 = collection.with_tenant(tenant=tenant).batch.failed_objects  # also empty
failed_objects2 = self._client.batch.failed_objects  # also empty
Note: this is not the complete code; I simplified it for brevity. I can confirm, though, that the code runs fine and imports successfully when correct data is used.
The batch, in some cases when I’m using improper data, fails with this error:
{'message': 'Failed to send 1 objects in a batch of 1. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
But when I try to inspect failed_objects, it’s always empty. I tried different ways of locating the failed objects, but every one of them returns an empty list (see failed_objects, failed_objects1 and failed_objects2 in the code).
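One thing that may be worth ruling out here is that the import and the inspection run against different collection handles. The sketch below keeps a single tenant-scoped handle and reads failed_objects from that same handle after the context manager exits. It rests on an assumption this thread does not confirm: that each with_tenant() call returns a fresh handle with its own batch state. FakeCollection and FakeBatch are hypothetical stand-ins that only mimic that assumption so the sketch runs without a server; they are not part of the Weaviate client.

```python
from contextlib import contextmanager


def import_for_tenant(collection, tenant, rows):
    """Batch-import rows for one tenant and return that handle's failed objects."""
    # Keep ONE tenant-scoped handle and reuse it for both import and inspection.
    tenant_collection = collection.with_tenant(tenant=tenant)
    with tenant_collection.batch.dynamic() as batch:
        for row in rows:
            batch.add_object(properties=row["properties"], uuid=row["uuid"], vector=row["vector"])
    # Read only after the context manager has exited.
    return list(tenant_collection.batch.failed_objects)


# --- Hypothetical stand-ins (assumptions, not the real client) ---
class FakeBatch:
    def __init__(self):
        self.failed_objects = []

    @contextmanager
    def dynamic(self):
        yield self

    def add_object(self, properties, uuid, vector):
        if len(vector) != 1:  # pretend the index expects vectors of length 1
            self.failed_objects.append(uuid)


class FakeCollection:
    def __init__(self):
        self.batch = FakeBatch()

    def with_tenant(self, tenant):
        return FakeCollection()  # assumption: a fresh handle on every call


rows = [
    {"properties": {}, "uuid": "a", "vector": [0.1]},
    {"properties": {}, "uuid": "b", "vector": [0.1, 0.2]},
]
base = FakeCollection()
failed = import_for_tenant(base, "tenantA", rows)
```

Under that assumption, the base handle's list stays empty while the tenant-scoped handle holds the failure, which would match the symptom described above. Whether the real client behaves this way needs to be checked against the installed version.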
Server Setup Information
Weaviate Server Version: 1.25.11
Deployment Method: docker
Multi Node? Number of Running Nodes: 3
Client Language and Version: Python v4
Multitenancy?: Yes
Any additional Information
Logs:
46549357-e140-4347-b5cd-635bbdf758bb {'message': 'Failed to send 1 objects in a batch of 1. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
[ERROR] 2024-09-03T09:00:15.354Z 46549357-e140-4347-b5cd-635bbdf758bb {'message': 'Failed to send 1 objects in a batch of 1. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
2024-09-03T11:00:15.374+02:00
[INFO] 2024-09-03T09:00:15.374Z 46549357-e140-4347-b5cd-635bbdf758bb collection.with_tenant(tenant=tenant).batch.failed_objects: [] len=0
2024-09-03T11:00:15.374+02:00
[INFO] 2024-09-03T09:00:15.374Z 46549357-e140-4347-b5cd-635bbdf758bb collection.batch.failed_objects: [] len=0
2024-09-03T11:00:15.374+02:00
[INFO] 2024-09-03T09:00:15.374Z 46549357-e140-4347-b5cd-635bbdf758bb self._client.batch.failed_objects: [] len=0
Thank you for looking into my query.
I tried your suggestion and made sure that the collection.batch.failed_objects line is outside the context manager block. However, I’m getting the same result.
Additional info: I am able to pull the logs from the Weaviate node itself, if that’s of any help.
{"build_git_commit":"2c51c29","build_go_version":"go1.21.13","build_image_tag":"1.25.13","build_wv_version":"1.25.13","class":"MyObject","level":"error","msg":"[conflict \"Validate vector index for 48efafd2-77cc-55e1-9f3c-c4d6457993d2: new node has a vector with length 1536. Existing nodes have vectors with length 1\": \u003cnil\u003e]","op":"put.many","shard":"TSTDRV_OWIE3","time":"2024-09-03T14:03:51Z"}
This is intentional - I’m forcing my batch to fail.
File ~/dev/weaviate/lab/.venv/lib/python3.12/site-packages/weaviate/connect/v4.py:458, in ConnectionV4.__send(self, method, url, error_msg, status_codes, is_gql_query, weaviate_object, params)
456 res = await self._client.send(req)
457 if status_codes is not None and res.status_code not in status_codes.ok:
--> 458 raise UnexpectedStatusCodeError(error_msg, response=res)
459 return cast(Response, res)
460 except RuntimeError as e:
UnexpectedStatusCodeError: Object was not added! Unexpected status code: 500, with response body: {'error': [{'message': 'put object: import into index test: put local object: shard="D483s6MIIvw9": store object in LSM store: Validate vector index for 3e14872a-fc3d-4e08-9bb7-61debec9a85c: new node has a vector with length 1536. Existing nodes have vectors with length 1'}]}.
Yes, that’s also how I’m doing it. Again, I was intentionally making the batch fail. After the batch fails, I want to know which records failed and mark them as failed in my own database.
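For marking records as failed in a separate database, a small helper that turns the failed-object entries into a UUID-to-message mapping may be enough. This is a minimal sketch that assumes each entry exposes a message attribute and an object_ attribute carrying the uuid; those attribute names are an assumption about the client's error objects and should be verified. FakeError and FakeBatchObject are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


def failures_by_uuid(failed_objects: Iterable[Any]) -> Dict[str, str]:
    """Map each failed object's UUID (as a string) to its error message."""
    result: Dict[str, str] = {}
    for err in failed_objects:
        obj = getattr(err, "object_", None)  # assumed attribute name
        uuid = getattr(obj, "uuid", None)    # assumed attribute name
        result[str(uuid)] = str(getattr(err, "message", err))
    return result


# Hypothetical stand-ins for the client's error entries.
@dataclass
class FakeBatchObject:
    uuid: str


@dataclass
class FakeError:
    message: str
    object_: FakeBatchObject


failures = failures_by_uuid(
    [FakeError("vector length mismatch", FakeBatchObject("48efafd2-77cc-55e1-9f3c-c4d6457993d2"))]
)
```

Because the UUIDs are generated deterministically from the data, the keys of the mapping can be matched back to the source records without storing anything extra.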
Going back to the original question:
collection = self._client.collections.get(collection_name)
with collection.with_tenant(tenant=tenant).batch.dynamic() as batch:
    for datum in data:
        vector_uuid = self.generate_deterministic_id(datum)
        embedding = self.get_embedding(datum)
        batch.add_object(properties=self.get_properties(datum), uuid=vector_uuid, vector=embedding)
failed_objects = collection.batch.failed_objects
I have this in my code now. After the batch fails because of the incorrect vector size:
failed_objects = collection.batch.failed_objects # still empty
I looked through the docs at the link, but I’m not sure whether collection.batch.failed_objects is still appropriate. For lack of a better idea, I am doing this for now:
try:
    collection.data.delete_many(…)
except weaviate.exceptions.UnexpectedStatusCodeError as e:
    print(e)
@dataclass
class DeleteManyReturn(Generic[T]):
    """This class contains the results of a `delete_many` operation.."""
    failed: int
    matches: int
    objects: T
    successful: int
where you get the status for each object (objects: T) if you set verbose to true. Otherwise it is only the counts.
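As a rough sketch of how such a result could be consumed: the helper below reads the four fields shown in the class above and, when per-object entries are present, collects the ones that were not deleted. The per-object attribute names (uuid, successful, error) are an assumption, and the SimpleNamespace objects are hypothetical stand-ins for a verbose result.

```python
from types import SimpleNamespace


def summarize_delete(result):
    """Return the counts plus (uuid, error) pairs for objects that were not deleted."""
    per_object = result.objects or []  # assumed empty/None unless verbose was requested
    failures = [(o.uuid, getattr(o, "error", None)) for o in per_object if not o.successful]
    return {
        "matches": result.matches,
        "successful": result.successful,
        "failed": result.failed,
        "failures": failures,
    }


# Hypothetical verbose result: two matches, one of which failed.
fake_result = SimpleNamespace(
    failed=1,
    matches=2,
    successful=1,
    objects=[
        SimpleNamespace(uuid="a", successful=True, error=None),
        SimpleNamespace(uuid="b", successful=False, error="shard not ready"),
    ],
)
summary = summarize_delete(fake_result)

# Hypothetical non-verbose result: counts only.
counts_only = summarize_delete(SimpleNamespace(failed=0, matches=3, successful=3, objects=None))
```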
Thanks. I will use this to try to handle it. I have a container that’s admittedly sometimes running close to the limit of physical memory, and recently I’ve been getting this quite often:
ERROR: Error: Query call with protocol GRPC delete failed with message <AiGrpcError that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-11-15T17:05:18.271641472+00:00", grpc_status:4, grpc_message:"Deadline Exceeded"}"
Does this kind of error (it looks pretty fatal) still produce a “nice” DeleteManyReturn object? Or is it still prudent/necessary to use try/except to detect and handle it?
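The wording of the error ("Query call ... failed with message") suggests that a deadline timeout surfaces as a raised exception rather than as a return value, so wrapping the call is probably still necessary. Below is a minimal retry sketch under that assumption. call_with_retry and flaky_delete are hypothetical names, and TimeoutError stands in for whatever exception class the client actually raises.

```python
import time


def call_with_retry(operation, retries=3, base_delay=0.5, retry_on=(Exception,)):
    """Call operation(); on a listed exception, back off and retry, then re-raise."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: let the caller record the failure
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff


# Hypothetical flaky operation standing in for a delete that times out twice.
calls = {"count": 0}


def flaky_delete():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("Deadline Exceeded")
    return "deleted"


outcome = call_with_retry(flaky_delete, retries=3, base_delay=0, retry_on=(TimeoutError,))
```

In real use, the operation would be a zero-argument callable wrapping the delete_many call, and retry_on would list the client's own exception types. If the container is memory-bound, retrying may only postpone the timeout, so a longer client-side timeout or smaller delete filters may also be worth trying.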