Avoid inserting dupes

I am inserting elements with three columns:

  • kicker (this is unique)
  • slug (this is unique)
  • vector (the computed embedding vector of the kicker)

as follows, where data is a list of JSON objects:

import os

batch_size = int(os.getenv("BATCH_SIZE"))
client.batch.configure(batch_size=batch_size)  # configure the batch size

# Batch import all objects
with client.batch as batch:
    for item in data:
        properties = {
            textcolname: item[textcolname],
            "slug": item["slug"],
        }
        batch.add_data_object(
            class_name=schema_name,
            data_object=properties,
            vector=item[vectcolname],  # my vector embeddings go here
        )

Is there a way to avoid inserting an item if the kicker (which is unique) has already been inserted? (In that case the slug and the vector would already be there as well.)

Hi @rjalex!

You want deterministic IDs: provide your own UUID derived from a property (or combination of properties), so re-importing an object with the same kicker overwrites the existing object instead of creating a duplicate.

Check here:
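
For example, with the v3 client you can derive the UUID from the unique kicker value using weaviate.util.generate_uuid5. This is a minimal sketch based on your snippet (it assumes your existing client, data, schema_name, textcolname and vectcolname); a batch object whose UUID already exists is overwritten rather than duplicated:

from weaviate.util import generate_uuid5

with client.batch as batch:
    for item in data:
        properties = {
            textcolname: item[textcolname],
            "slug": item["slug"],
        }
        batch.add_data_object(
            class_name=schema_name,
            data_object=properties,
            vector=item[vectcolname],
            # deterministic UUID derived from the unique kicker,
            # so the same kicker always maps to the same object
            uuid=generate_uuid5(item[textcolname]),
        )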

Also, we highly recommend using the Python client v4, as it significantly improves performance:
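
Here is a rough sketch of the same deterministic-UUID import with the v4 client; connect_to_local() is only an assumption for a local instance, so adjust the connection helper to your deployment:

import weaviate
from weaviate.util import generate_uuid5

client = weaviate.connect_to_local()  # assumption: local Weaviate; adjust as needed
collection = client.collections.get(schema_name)

with collection.batch.dynamic() as batch:
    for item in data:
        batch.add_object(
            properties={
                textcolname: item[textcolname],
                "slug": item["slug"],
            },
            vector=item[vectcolname],
            uuid=generate_uuid5(item[textcolname]),  # deterministic id from the kicker
        )

client.close()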

Let me know if that helps 🙂


Very clear. Thank you very much.
