I’m currently working with a Weaviate collection that was initially created without using deterministic UUIDs. My dataset includes several fields (GUID, a, b, c), but my primary concern is avoiding duplicate entries based solely on the GUID field.
I’ve come across the documentation:
Will this help me?
I can create another collection with deterministic UUIDs, but can I avoid duplicates based on the GUID alone, without considering the other fields in the data?
My questions are:
- Can I rely on the deterministic UUID generation approach to prevent duplicates by using just the GUID field?
- Is this an efficient method to avoid duplicate data insertion, or should I consider another strategy?
- How can I update a collection so that it uses deterministic UUIDs?
Any insights or recommendations would be greatly appreciated. Thanks!
Hi @Rohini_vaidya!
In order to use deterministic IDs, you will need to “tie” the UUID generation to something unique to each object.
Let’s say you have, as you mentioned:
If A is a unique value for all your objects (e.g., books-987, articles-456), you can generate a UUID based on that value.
Now, whenever B or C changes, or you need to create a new object, you can run a batch ingestion that generates the UUID from the value of A on the fly.
The batch will insert the object if that UUID is new, or update the existing object if it is not.
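To make the idea concrete, here is a minimal sketch of deriving the UUID from the GUID field only. It uses Python's standard `uuid.uuid5` so it runs without a Weaviate client installed; the Weaviate Python client also ships a helper, `generate_uuid5` in `weaviate.util`, that serves the same purpose. The function name `uuid_from_guid`, the namespace choice, and the sample records are assumptions for illustration, not something from your dataset.

```python
import uuid

# Assumption: any fixed namespace works, as long as it never changes between runs.
GUID_NAMESPACE = uuid.NAMESPACE_DNS


def uuid_from_guid(guid: str) -> str:
    """Derive a stable UUID from the GUID field alone (hypothetical helper)."""
    return str(uuid.uuid5(GUID_NAMESPACE, guid))


# Two versions of the same object: same GUID, different values in b and c.
record_v1 = {"GUID": "books-987", "a": 1, "b": "old", "c": "x"}
record_v2 = {"GUID": "books-987", "a": 1, "b": "new", "c": "y"}
# A different object with its own GUID.
record_other = {"GUID": "articles-456", "a": 2, "b": "old", "c": "x"}

id_v1 = uuid_from_guid(record_v1["GUID"])
id_v2 = uuid_from_guid(record_v2["GUID"])
id_other = uuid_from_guid(record_other["GUID"])

print(id_v1 == id_v2)     # same GUID gives the same UUID, whatever b and c hold
print(id_v1 == id_other)  # different GUIDs give different UUIDs
```

Because only the GUID goes into the hash, changes to a, b, or c never change the object's UUID.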
In your case, if you have a unique value, you could migrate this dataset to a new collection or cluster (here we have a migration guide).
While migrating the data, you generate the UUIDs accordingly.
Let me know if this helps!
Thanks!
Thank you @DudaNogueira.
Could you let me know whether the following steps are correct?
- I will create a new collection in which the deterministic UUID is generated from one specific field only, which is GUID in my case.
- I will insert data, checking the UUID to avoid duplicates.
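The two steps above can be sketched roughly as follows. This is only an in-memory simulation: a plain dict keyed by UUID stands in for the new collection, and `upsert_all` is a hypothetical helper, not a Weaviate API. It is meant to show why objects that share a GUID collapse into a single entry once the UUID is derived from the GUID alone.

```python
import uuid


def uuid_from_guid(guid: str) -> str:
    """Derive a stable UUID from the GUID field alone (hypothetical helper)."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, guid))


def upsert_all(records: list[dict], store: dict[str, dict]) -> dict[str, dict]:
    """Insert each record under its deterministic UUID, overwriting on collision.

    `store` is a stand-in for the target collection in this sketch.
    """
    for record in records:
        store[uuid_from_guid(record["GUID"])] = record
    return store


incoming = [
    {"GUID": "books-987", "a": 1, "b": "old", "c": "x"},
    {"GUID": "articles-456", "a": 2, "b": "old", "c": "x"},
    {"GUID": "books-987", "a": 1, "b": "new", "c": "y"},  # same GUID as the first record
]

store = upsert_all(incoming, {})

print(len(store))                              # number of distinct GUIDs
print(store[uuid_from_guid("books-987")]["b"])  # the later record replaced the earlier one
```

With a real collection, the batch import described earlier in the thread plays the role of `upsert_all`: an object whose UUID already exists is updated rather than duplicated, so a separate existence check before each insert should not be needed.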