Description
Hello,
I’m using the Python v4 client to do a batch import.
My code looks like this (basically just a copy of what’s in the documentation edited to work with multi-tenancy):
collection = self._client.collections.get(collection_name)
with collection.with_tenant(tenant=tenant).batch.dynamic() as batch:
    for datum in data:
        vector_uuid = self.generate_deterministic_id(datum)
        embedding = self.get_embedding(datum)
        batch.add_object(properties=self.get_properties(datum), uuid=vector_uuid, vector=embedding)
failed_objects = collection.batch.failed_objects # empty
failed_objects1 = collection.with_tenant(tenant=tenant).batch.failed_objects # results to empty
failed_objects2 = self._client.batch.failed_objects # also empty
Note: this is not the complete code; I simplified it for brevity. I can confirm, though, that the code runs fine and imports successfully when correct data is used.
The batch, in some cases when I’m using improper data, fails with this error:
{'message': 'Failed to send 1 objects in a batch of 1. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
But when I try to inspect failed_objects, it’s always empty. I tried different ways to find where the failed objects are, but every one of them returns an empty array (see failed_objects, failed_objects1 and failed_objects2 in the code).
Server Setup Information
- Weaviate Server Version: 1.25.11
- Deployment Method: docker
- Multi Node? Number of Running Nodes: 3
- Client Language and Version: Python v4
- Multitenancy?: Yes
Any additional Information
Logs:
46549357-e140-4347-b5cd-635bbdf758bb {'message': 'Failed to send 1 objects in a batch of 1. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
[ERROR] 2024-09-03T09:00:15.354Z 46549357-e140-4347-b5cd-635bbdf758bb {'message': 'Failed to send 1 objects in a batch of 1. Please inspect client.batch.failed_objects or collection.batch.failed_objects for the failed objects.'}
2024-09-03T11:00:15.374+02:00
[INFO] 2024-09-03T09:00:15.374Z 46549357-e140-4347-b5cd-635bbdf758bb collection.with_tenant(tenant=tenant).batch.failed_objects: [] len=0
2024-09-03T11:00:15.374+02:00
[INFO] 2024-09-03T09:00:15.374Z 46549357-e140-4347-b5cd-635bbdf758bb collection.batch.failed_objects: [] len=0
2024-09-03T11:00:15.374+02:00
[INFO] 2024-09-03T09:00:15.374Z 46549357-e140-4347-b5cd-635bbdf758bb self._client.batch.failed_objects: [] len=0
[INFO] 2024-09-03T09:00:15.374Z 46549357-e140-4347-b5cd-635bbdf758bb self._client.batch.failed_objects: [] len=0
Where should I actually get the failed_objects?
Hi @Owie_de_la_Pena !! Welcome to our community.
The issue here is that you should get those failed objects outside of the batch context, like so:
collection = self._client.collections.get(collection_name)
with collection.with_tenant(tenant=tenant).batch.dynamic() as batch:
    for datum in data:
        vector_uuid = self.generate_deterministic_id(datum)
        embedding = self.get_embedding(datum)
        batch.add_object(properties=self.get_properties(datum), uuid=vector_uuid, vector=embedding)
failed_objects = collection.batch.failed_objects # read after the batch context has exited
print(failed_objects)
Check here the documentation on error handling:
Let me know if this helps!
Thanks!
Hi @DudaNogueira ,
Thank you for looking into my query.
I tried your suggestion and made sure that the collection.batch.failed_objects line is outside the context manager block. However, I’m getting the same result.
Additional info: I am able to pull the logs from the Weaviate node itself, if that’s of any help.
{"build_git_commit":"2c51c29","build_go_version":"go1.21.13","build_image_tag":"1.25.13","build_wv_version":"1.25.13","class":"MyObject","level":"error","msg":"[conflict \"Validate vector index for 48efafd2-77cc-55e1-9f3c-c4d6457993d2: new node has a vector with length 1536. Existing nodes have vectors with length 1\": \u003cnil\u003e]","op":"put.many","shard":"TSTDRV_OWIE3","time":"2024-09-03T14:03:51Z"}
This is intentional - I’m forcing my batch to fail.
This is the error message.
You probably inserted one object with 1 dimension, and are now trying to ingest objects with 1536 dimensions.
Here is how to reproduce this exact error:
import weaviate
from weaviate import classes as wvc
client = weaviate.connect_to_local()
# Start from a clean collection
client.collections.delete("Test")
collection = client.collections.create(
    "Test",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
)
# The first insert fixes the vector index at 1 dimension
collection.data.insert({"text": "1 dimensions"}, vector=[1])
# A 1536-dimension vector then fails the vector index validation
collection.data.insert({"text": "1536 dimensions"}, vector=list(range(1536)))
Error:
File ~/dev/weaviate/lab/.venv/lib/python3.12/site-packages/weaviate/connect/v4.py:458, in ConnectionV4.__send(self, method, url, error_msg, status_codes, is_gql_query, weaviate_object, params)
456 res = await self._client.send(req)
457 if status_codes is not None and res.status_code not in status_codes.ok:
--> 458 raise UnexpectedStatusCodeError(error_msg, response=res)
459 return cast(Response, res)
460 except RuntimeError as e:
UnexpectedStatusCodeError: Object was not added! Unexpected status code: 500, with response body: {'error': [{'message': 'put object: import into index test: put local object: shard="D483s6MIIvw9": store object in LSM store: Validate vector index for 3e14872a-fc3d-4e08-9bb7-61debec9a85c: new node has a vector with length 1536. Existing nodes have vectors with length 1'}]}.
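If it helps, a mismatch like this can be caught on the client side before anything is sent. The following is only a minimal sketch in plain Python: find_dimension_mismatches and expected_dim are names made up for illustration and are not part of the Weaviate client, and it assumes you already know the dimension the collection expects.

```python
from typing import List, Sequence


def find_dimension_mismatches(
    vectors: Sequence[Sequence[float]], expected_dim: int
) -> List[int]:
    """Return the positions of vectors whose length differs from expected_dim."""
    return [i for i, vec in enumerate(vectors) if len(vec) != expected_dim]


# Example data: one 1-dimension vector among 1536-dimension vectors.
vectors = [[1.0], [0.0] * 1536, [0.5] * 1536]
bad_positions = find_dimension_mismatches(vectors, expected_dim=1536)
print(bad_positions)
```

The returned positions can be used to skip or log those records before calling batch.add_object.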
Let me know if this helps.
Thanks!
Hi again @DudaNogueira,
Yes, that’s also how I’m doing it. Again, I was intentionally making the batch fail. After the batch fails, I want to know which records failed and mark them as failed in my own database.
Going back to the original question:
collection = self._client.collections.get(collection_name)
with collection.with_tenant(tenant=tenant).batch.dynamic() as batch:
    for datum in data:
        vector_uuid = self.generate_deterministic_id(datum)
        embedding = self.get_embedding(datum)
        batch.add_object(properties=self.get_properties(datum), uuid=vector_uuid, vector=embedding)
failed_objects = collection.batch.failed_objects
I have this in my code now. After the batch fails because of the incorrect vector size:
failed_objects = collection.batch.failed_objects # still empty
It still gives me an empty array.
Update: I got it working by changing the batch code to:
with client.batch.dynamic() as batch:
    ...
    ...
    batch.add_object(properties=self.get_properties(datum), uuid=vector_uuid, vector=embedding, collection=collection_name, tenant=tenant)
instead of
with collection.with_tenant(tenant=tenant).batch.dynamic() as batch:
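For marking the failed records in my own database, something along these lines should work once failed_objects is populated. This is a rough sketch, not the client's own API: FakeBatchObject and FakeErrorObject are stand-ins I made up so the example runs without a server, failures_by_uuid is a hypothetical helper, and it assumes each failed entry exposes message and object_.uuid the way the client's error objects appear to.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass
class FakeBatchObject:
    """Stand-in for the object carried by a failed batch entry (assumption)."""
    uuid: str


@dataclass
class FakeErrorObject:
    """Stand-in for a failed batch entry with a message and the object (assumption)."""
    message: str
    object_: FakeBatchObject


def failures_by_uuid(failed_objects: Iterable[Any]) -> Dict[str, str]:
    """Map each failed object's UUID (as a string) to its error message."""
    return {str(err.object_.uuid): err.message for err in failed_objects}


# Hypothetical failure entry; the message text here is made up for the example.
failed = [
    FakeErrorObject(
        message="vector length mismatch",
        object_=FakeBatchObject(uuid="48efafd2-77cc-55e1-9f3c-c4d6457993d2"),
    )
]
failures = failures_by_uuid(failed)
print(failures)
```

In the real code the input would be client.batch.failed_objects, read after the with block has exited, and the resulting dict keys can be matched against the deterministic IDs generated for each datum.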
Nice!
Glad you figured it out and thanks for sharing.
Indeed, you can create a batch context directly from the client and then specify the collection and tenant on each object.
This is a great feature because you can add data to multiple collections and multiple tenants in one batch context just by changing those parameters.
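As a rough illustration of that idea, the sketch below routes records to different collections and tenants through a single batch object. The add_records helper, the record layout, the collection and tenant names, and the RecordingBatch stand-in are all made up for this example so it can run without a server; the only assumption about the real client is that batch.add_object accepts collection, tenant, properties, uuid and vector as keyword arguments, as in the snippet above.

```python
from typing import Any, Dict, Iterable, List


def add_records(batch: Any, records: Iterable[Dict[str, Any]]) -> None:
    """Add each record to the batch, routing by its own collection and tenant."""
    for record in records:
        batch.add_object(
            collection=record["collection"],
            tenant=record["tenant"],
            properties=record["properties"],
            uuid=record.get("uuid"),
            vector=record.get("vector"),
        )


class RecordingBatch:
    """Stand-in for a client-level batch that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def add_object(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)


# Hypothetical records targeting two collections and two tenants.
records = [
    {"collection": "Articles", "tenant": "tenantA", "properties": {"title": "a"}},
    {"collection": "Authors", "tenant": "tenantB", "properties": {"name": "b"}},
]
recorder = RecordingBatch()
add_records(recorder, records)
print([(call["collection"], call["tenant"]) for call in recorder.calls])
```

With a real connection, the same helper would be called inside a with client.batch.dynamic() as batch: block, passing batch instead of the recorder.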
Thanks!