Issue with Duplicate UUID Handling in Weaviate Batch Import vs. Insert Method

I’m experiencing an issue with duplicate UUID handling in Weaviate when using batch import.

In my batch import code, I generate a deterministic UUID using generate_uuid5(guid) for each object, so the same GUID always produces the same UUID. Despite this, the batch import doesn’t seem to detect duplicates.

Here’s a simplified version of my code:

with collection.batch.fixed_size(batch_size=100) as batch:
    for _, row in processed_data.iterrows():
        for guid in row["GUIDs"]:
            try:
                batch.add_object(
                    properties={
                        "GUID": guid,
                        "a": row["a"]
                    },
                    vector={
                        key + "_embeddings": embeddings_dict.get(
                            row[key], [0.0] * 1536)
                        for key in ["a"]
                    },
                    uuid=generate_uuid5(guid)
                )
                records_processed += 1
            except weaviate.exceptions.UnexpectedStatusCodeError as e:
                skipped_details.append(
                    {
                        "GUID": guid,
                        "message": "Duplicate GUID found"
                        if e.status_code == 422
                        else str(e),
                    }
                )

When I run this code, it processes records with duplicate GUIDs without raising any errors. However, when I use the insert method with the same UUID generation logic, it correctly identifies duplicates and raises a 422 Unprocessable Entity error.

Why does the batch import not detect duplicates while the insert method does? Am I missing something in the batch import process?

Any insights or suggestions would be greatly appreciated.


Good morning @Rohini_vaidya,

Your observation is right: when using batch import in Weaviate, adding objects with the same UUID does not raise a duplicate error the way the insert method does. This is expected behavior. If an object with a given UUID already exists, batch import replaces it with the new object, silently overwriting the previous one rather than raising an error.
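If you need duplicates reported rather than overwritten, one option is to filter them out before they reach the batch. The following is only a minimal sketch under some assumptions: `deterministic_uuid` and `partition_guids` are hypothetical helper names, `deterministic_uuid` uses the standard library's `uuid.uuid5` as a stand-in for `generate_uuid5` so the snippet runs without a Weaviate connection, and `already_stored` is any callable you supply that tells whether a UUID is already in the collection.

```python
import uuid
from typing import Callable, Iterable


def deterministic_uuid(guid: str) -> str:
    # Stand-in for generate_uuid5 (assumption): the same GUID always maps
    # to the same UUID. In real code, call generate_uuid5(guid) instead.
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, guid))


def partition_guids(
    guids: Iterable[str],
    already_stored: Callable[[str], bool],
):
    """Split GUIDs into those safe to import and those that are duplicates.

    A GUID counts as a duplicate if its UUID was already seen earlier in
    this run, or if already_stored(uuid) reports it exists in the collection.
    """
    seen = set()
    to_import = []
    skipped_details = []
    for guid in guids:
        object_uuid = deterministic_uuid(guid)
        if object_uuid in seen or already_stored(object_uuid):
            skipped_details.append(
                {"GUID": guid, "message": "Duplicate GUID found"}
            )
            continue
        seen.add(object_uuid)
        to_import.append((guid, object_uuid))
    return to_import, skipped_details


# Example: "a" repeats within the input, "c" is treated as already stored.
stored = {deterministic_uuid("c")}
to_import, skipped_details = partition_guids(
    ["a", "b", "a", "c"], lambda u: u in stored
)
```

Each `(guid, object_uuid)` pair in `to_import` can then be passed to `batch.add_object(..., uuid=object_uuid)` as in your loop. For `already_stored`, something like `collection.data.exists` from the v4 Python client may fit, though that costs one request per object, so for large imports it may be cheaper to fetch the existing UUIDs once and check against a set.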

Best regards,
Mohamed Shahin
Support Engineer – Weaviate
(Ireland, GMT/UTC timezone)