How to store data in the weaviate vector DB

Hi.

I have a jewellery dataset of around 3,000 products. Each product has around 15 images plus details. I want to do hybrid search, so how should I structure the data and store it in the vector DB?

Do I have to store each image with the same description, or is there a way to group the images and the description by SKU?

Hi @moaabid!!

You could use multi2vec-clip and define a collection with the properties details, image1, image2, …

Here is how, using the new Python v4 client:

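import weaviate
import weaviate.classes as wvc

# Connect to a local Weaviate instance that has the multi2vec-clip module enabled
client = weaviate.connect_to_local()

collection = client.collections.create(
    "Jewellery",
    vectorizer_config=wvc.config.Configure.Vectorizer.multi2vec_clip(
        # Each listed field is vectorized by CLIP; the weights control
        # how much each field contributes to the object's single vector
        image_fields=[
            wvc.config.Multi2VecField(name="image1", weight=0.3),
            wvc.config.Multi2VecField(name="image2", weight=0.3),
            # ...
        ],
        text_fields=[
            wvc.config.Multi2VecField(name="details", weight=0.7),
        ],
    ),
    properties=[
        wvc.config.Property(name="details", data_type=wvc.config.DataType.TEXT),
        # BLOB properties hold base64-encoded image data
        wvc.config.Property(name="image1", data_type=wvc.config.DataType.BLOB),
        wvc.config.Property(name="image2", data_type=wvc.config.DataType.BLOB),
        # ...
    ],
)

client.close()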

What happens "under the hood" is that Weaviate vectorizes each field separately and then combines those vectors into a single vector for the object, taking into account the weights you specified.

Here is where this "magic" takes place:
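Conceptually, the combination is a weighted average of the per-field vectors. The sketch below illustrates that idea on toy 2-d vectors; it is not Weaviate's source code, and dividing by the sum of the weights is an assumption about how weights that do not add up to 1 (0.3 + 0.3 + 0.7 in the example above) are treated.

```python
def combine_weighted(vectors: list[list[float]], weights: list[float]) -> list[float]:
    """Weighted average of equal-length vectors, normalised by the weight sum.

    Illustration only: under this normalisation, weights 0.3/0.3/0.7 behave
    the same as 3/3/7.
    """
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need at least one vector and one weight per vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    total = sum(weights)
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


# Toy "embeddings" for image1, image2 and details, weighted like the schema above
image1_vec = [1.0, 0.0]
image2_vec = [0.0, 1.0]
details_vec = [1.0, 1.0]
print(combine_weighted([image1_vec, image2_vec, details_vec], [0.3, 0.3, 0.7]))
```

Because the result is a single vector per object, a query image that closely matches only one of a product's pictures is compared against this blend rather than against that picture alone.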

Let me know if this helps!

Thanks!

Regarding the SKU, you can use deterministic IDs to make sure each product is stored only once:
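The v4 Python client ships `weaviate.util.generate_uuid5` for this. Below is a minimal stdlib-only sketch of the same idea (a UUIDv5 in the DNS namespace, which is my understanding of what that helper does with its default namespace), plus a hypothetical `insert_product` wrapper around `collection.data.insert`. Since the ID is derived from the SKU, re-importing a SKU cannot produce a second object under a new ID.

```python
import uuid


def sku_to_uuid(sku: str) -> str:
    """Deterministic object ID: the same SKU always yields the same UUIDv5."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, sku))


def insert_product(collection, sku: str, properties: dict) -> str:
    """Insert one product under its SKU-derived ID (hypothetical helper).

    `collection` is a v4 collection handle, e.g. client.collections.get("Jewellery").
    """
    object_id = sku_to_uuid(sku)
    collection.data.insert(properties=properties, uuid=object_id)
    return object_id
```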

@DudaNogueira
Instead of statically listing image1, image2, is there an array-of-blob data type? Some products have 3 images and others have more, so the number will be dynamic.

Example

sku1
  • Image1
  • Image2
sku2
  • Image1
  • Image2
  • Image3
sku3
  • Image1
  • Image2
  • Image3
  • Image4

If I store the images individually and query with an input image, I get a response like:

response - [{sku1, image1}, {sku2, image1}, {sku1, image2}, {sku3, image4}]

sku2/image1 may be closer than sku1/image2, but I need the response grouped by SKU:

response - [{sku1}, {sku2}, {sku3}]

Or is there a way to store multiple images dynamically under one SKU, so that if the input image matches any of the images in the array, that SKU is returned? That way the results would be unique.

Or does it have to be handled in code, e.g. retrieving 50 results and keeping the unique ones by SKU?
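Handling it in code can work: over-fetch, then keep only the first (closest) hit per SKU. Below is a minimal sketch on plain dicts. Converting Weaviate result objects into dicts with a `sku` key is assumed, and `unique_by_sku` is a hypothetical name.

```python
def unique_by_sku(hits: list[dict], max_skus: int) -> list[dict]:
    """Keep the best-ranked hit per SKU, preserving result order.

    `hits` must already be sorted best-first (as a vector search returns them).
    If the fetched hits cover fewer than max_skus SKUs, fewer are returned,
    which is why the limit has to be set well above max_skus.
    """
    seen = set()
    unique = []
    for hit in hits:
        if len(unique) >= max_skus:
            break
        if hit["sku"] in seen:
            continue
        seen.add(hit["sku"])
        unique.append(hit)
    return unique


# The response from the question above, flattened to dicts
hits = [
    {"sku": "sku1", "image": "image1"},
    {"sku": "sku2", "image": "image1"},
    {"sku": "sku1", "image": "image2"},
    {"sku": "sku3", "image": "image4"},
]
print([h["sku"] for h in unique_by_sku(hits, max_skus=10)])  # ['sku1', 'sku2', 'sku3']
```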

I believe you can try using cross-references, so you can store a dynamic number of images per product:
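A minimal sketch of that layout with the v4 client, assuming two hypothetical collections: `JewelleryImage` (one CLIP-vectorized object per image, carrying its SKU) and `JewelleryProduct` (one object per SKU, with a `hasImages` reference to any number of images). As far as I know, cross-references are not folded into the product's vector, so image similarity search runs on `JewelleryImage` and you follow the SKU or reference back to the product. All collection, property and helper names here are illustrative.

```python
import uuid


def image_uuid(sku: str, index: int) -> str:
    """Deterministic ID for the index-th image of a SKU (hypothetical scheme)."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, f"{sku}/image/{index}"))


def create_collections(client) -> None:
    """One object per image plus one per product, linked by a cross-reference."""
    import weaviate.classes as wvc

    # Each image is its own object, vectorized by CLIP, and stores its SKU
    client.collections.create(
        "JewelleryImage",
        vectorizer_config=wvc.config.Configure.Vectorizer.multi2vec_clip(
            image_fields=["image"]
        ),
        properties=[
            wvc.config.Property(name="sku", data_type=wvc.config.DataType.TEXT),
            wvc.config.Property(name="image", data_type=wvc.config.DataType.BLOB),
        ],
    )
    # One object per product; "hasImages" can point at any number of images
    client.collections.create(
        "JewelleryProduct",
        properties=[
            wvc.config.Property(name="sku", data_type=wvc.config.DataType.TEXT),
            wvc.config.Property(name="details", data_type=wvc.config.DataType.TEXT),
        ],
        references=[
            wvc.config.ReferenceProperty(name="hasImages", target_collection="JewelleryImage")
        ],
    )


def store_product(client, sku: str, details: str, images_b64: list[str]) -> None:
    """Insert however many images a product has, then the product linking to them."""
    images = client.collections.get("JewelleryImage")
    products = client.collections.get("JewelleryProduct")
    image_ids = []
    for i, image_b64 in enumerate(images_b64, start=1):
        image_id = image_uuid(sku, i)
        images.data.insert(properties={"sku": sku, "image": image_b64}, uuid=image_id)
        image_ids.append(image_id)
    products.data.insert(
        properties={"sku": sku, "details": details},
        references={"hasImages": image_ids},
    )
```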

Thanks @DudaNogueira. Actually, I wanted to group the results by SKU. There is a group-by option in Weaviate, so now I'm storing the images individually and grouping the results by SKU.
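A minimal sketch of that approach with the v4 client, assuming a hypothetical `JewelleryImage` collection holding one object per image with a `sku` text property. `GroupBy(prop="sku", objects_per_group=1, ...)` returns each SKU once, represented by its closest image. The helper names are illustrative, and the group order is taken as returned by Weaviate.

```python
def image_objects(sku: str, images_b64: list[str]) -> list[dict]:
    """One property dict per image, each tagged with its SKU for grouping."""
    return [{"sku": sku, "image": image_b64} for image_b64 in images_b64]


def search_skus_by_image(client, query_image_b64: str, max_skus: int = 10) -> list[str]:
    """Image search over per-image objects, grouped so each SKU appears once."""
    from weaviate.classes.query import GroupBy

    images = client.collections.get("JewelleryImage")
    response = images.query.near_image(
        near_image=query_image_b64,
        group_by=GroupBy(
            prop="sku",             # one group per product
            objects_per_group=1,    # keep only the closest image of each product
            number_of_groups=max_skus,
        ),
    )
    # response.groups maps each group value (the SKU) to its group of objects
    return list(response.groups)


# Usage (hypothetical): insert each dict from image_objects(...) into JewelleryImage,
# e.g. with images.data.insert_many(image_objects("sku2", [b64_1, b64_2, b64_3])),
# then call search_skus_by_image(client, query_b64).
```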
