I have built a multimodal similarity search that uses a CLIP model for both text-based and image-based search. CLIP models are getting old, so I wanted to try out different models and compare the results. Is there any example code on how to integrate PaliGemma or another multimodal model instead of CLIP?
hi @moaabid !!
While I'm not sure we have an example with PaliGemma, we do have an example with multi2vec-bind here:
I did a quick search on PaliGemma, and it doesn't seem to be an embedding model.
Let me know if multi2vec-bind is what you are looking for.
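For reference, here's a minimal sketch of what a multi2vec-bind setup could look like with the v4 Python client. It's untested, the collection and property names are just placeholders, and it assumes the multi2vec-bind module is enabled on your Weaviate instance:

```python
import weaviate
from weaviate.classes.config import Configure, Property, DataType

client = weaviate.connect_to_local()

# Collection whose objects are vectorized by the multi2vec-bind module,
# combining a text field and an image (base64 blob) field.
client.collections.create(
    "Jewellery",
    vectorizer_config=Configure.Vectorizer.multi2vec_bind(
        text_fields=["description"],
        image_fields=["image"],
    ),
    properties=[
        Property(name="description", data_type=DataType.TEXT),
        Property(name="image", data_type=DataType.BLOB),
    ],
)

jewellery = client.collections.get("Jewellery")

# Text query; the module vectorizes the query string for you.
results = jewellery.query.near_text(query="gold ring with emerald stone", limit=5)
for obj in results.objects:
    print(obj.properties["description"])

client.close()
```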
Thanks!
multi2vec-bind is a good option. But from what I've read, PaliGemma is a vision-language model that can encode images into embeddings. I'm new to this, so I'm not sure whether these kinds of newer models (for example Phi-3 Vision or PaliGemma) are supported in Weaviate. Since they are multimodal, I thought they could be used with Weaviate. My use case is jewellery image and text similarity search. It's a complex use case involving shape, color, and different styles, and I want to try out different models so I can compare them.
Sorry, I don't know enough about this.
However, if the model does produce embeddings and you see that Weaviate doesn't support it yet, feel free to open a feature request:
Also, you can always use any embedding model, as long as you vectorize your data yourself and "bring your own vectors", as explained here:
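Something along these lines is the "bring your own vectors" flow with the v4 Python client. Again, this is just an untested sketch: the `embed_image` / `embed_text` helpers are placeholders for whichever model you pick (PaliGemma, Phi-3 Vision, CLIP, ...), as long as it returns a vector.

```python
import weaviate
from weaviate.classes.config import Configure, Property, DataType

client = weaviate.connect_to_local()

# Collection with no built-in vectorizer: you supply the vectors yourself.
jewellery = client.collections.create(
    "JewelleryBYO",
    vectorizer_config=Configure.Vectorizer.none(),
    properties=[Property(name="description", data_type=DataType.TEXT)],
)

# Placeholder: vectorize the image with the model of your choice.
vec = embed_image("ring_001.jpg")

jewellery.data.insert(
    properties={"description": "Gold ring with emerald stone"},
    vector=vec,
)

# Query with a vector produced by the same model.
query_vec = embed_text("gold ring with a green stone")
results = jewellery.query.near_vector(near_vector=query_vec, limit=5)
for obj in results.objects:
    print(obj.properties["description"])

client.close()
```

The main thing to keep in mind is that the objects and the queries must be embedded by the same model, otherwise the distances won't be meaningful.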
Let me know if this helps!
Thanks!