Hi.
I have a jewellery dataset of around 3000 products. Each product has around 15 images plus details. I want to do hybrid search, so how should I structure the data and store it in the vector DB?
Do I have to store each image with the same description, or is there a way to group the images and the description by SKU?
hi @moaabid !!
You could use multi2vec-clip and define a collection with the properties details, image1, image2, …
Here is how to do that using the new Python v4 client:
import weaviate.classes as wvc

# `client` is an already connected Weaviate v4 client
collection = client.collections.create(
    "Jewellery",
    vectorizer_config=wvc.config.Configure.Vectorizer.multi2vec_clip(
        image_fields=[
            wvc.config.Multi2VecField(name="image1", weight=0.3),
            wvc.config.Multi2VecField(name="image2", weight=0.3),
            # ...
        ],
        text_fields=[
            wvc.config.Multi2VecField(name="details", weight=0.7)
        ],
    ),
    properties=[
        wvc.config.Property(name="details", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="image1", data_type=wvc.config.DataType.BLOB),
        wvc.config.Property(name="image2", data_type=wvc.config.DataType.BLOB),
        # ...
    ],
)
What happens “under the hood” is that Weaviate combines all those vectors into one, taking into account the weights you specified.
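To make the idea concrete, here is a minimal sketch of a weighted combination in plain Python. It is not Weaviate's code: the function name is hypothetical, and it assumes the weights are normalized to sum to 1 and the result is a weighted element-wise sum of the per-field vectors.

```python
from typing import List


def combine_vectors_with_weights(
    vectors: List[List[float]], weights: List[float]
) -> List[float]:
    """Illustrative weighted sum of equally sized vectors.

    Assumption: weights are rescaled to sum to 1 before combining.
    """
    if not vectors:
        return []
    if len(vectors) != len(weights):
        raise ValueError("need exactly one weight per vector")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")

    normalized = [w / total for w in weights]
    dim = len(vectors[0])
    combined = [0.0] * dim
    for vector, weight in zip(vectors, normalized):
        if len(vector) != dim:
            raise ValueError("all vectors must have the same dimension")
        for i, value in enumerate(vector):
            combined[i] += weight * value
    return combined


# Toy example: one "text" vector and one "image" vector.
text_vector = [1.0, 0.0]
image_vector = [0.0, 1.0]
print(combine_vectors_with_weights([text_vector, image_vector], [0.7, 0.3]))
```

With this scheme a field with a higher weight pulls the combined vector closer to its own embedding, which is why the details text dominates in the configuration above.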
Here is where this “magic” takes place:
			return nil, err
		}
		vectors = append(vectors, res.TextVectors...)
		vectors = append(vectors, res.ImageVectors...)
	}
	weights, err := v.getWeights(ichek)
	if err != nil {
		return nil, err
	}
	return libvectorizer.CombineVectorsWithWeights(vectors, weights), nil
}

func (v *Vectorizer) getWeights(ichek ClassSettings) ([]float32, error) {
	weights := []float32{}
	textFieldsWeights, err := ichek.TextFieldsWeights()
	if err != nil {
		return nil, err
	}
	imageFieldsWeights, err := ichek.ImageFieldsWeights()
	if err != nil {
Let me know if this helps!
Thanks!
Regarding the SKU, you can use deterministic IDs to make sure each product is unique:
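As a rough sketch of that idea, the snippet below derives a stable UUID from the SKU with the standard library and packs one product (details plus its images) into a single object. The helper names, the `sku` property, and the 15-image cap are assumptions made for illustration; BLOB properties are sent as base64-encoded strings.

```python
import base64
import uuid
from typing import Dict, List, Tuple

MAX_IMAGES = 15  # assumption: matches the roughly 15 images per product


def sku_to_uuid(sku: str) -> str:
    """Derive a stable UUID from a SKU: the same SKU always yields the same ID."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, sku))


def build_product_object(
    sku: str, details: str, images: List[bytes]
) -> Tuple[str, Dict[str, str]]:
    """Return (object_id, properties) for one product.

    Images are stored as image1, image2, ... and base64-encoded,
    the format expected for BLOB properties.
    """
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images per product")

    # `sku` is a hypothetical extra property, not part of the schema above.
    properties = {"sku": sku, "details": details}
    for index, raw in enumerate(images, start=1):
        properties[f"image{index}"] = base64.b64encode(raw).decode("ascii")
    return sku_to_uuid(sku), properties


object_id, properties = build_product_object(
    "RING-001", "18k gold ring with a round diamond", [b"fake-image-bytes"]
)
print(object_id, sorted(properties))
```

The resulting pair could then be passed to the v4 client, for example `collection.data.insert(properties=properties, uuid=object_id)`, so that re-importing the same SKU targets the same object instead of creating a duplicate. The Python client also ships a `generate_uuid5` helper in `weaviate.util` that serves the same purpose.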