Hello Weaviate gurus. Long time no see! How boring when I'm not using Weaviate.
I started investigating the wonderful world of multimodal similarity, hoping to get some useful results out of a nice historical archive.
I played with CLIP and it is nice, but in my experience it works okay-ish for text-to-image retrieval yet somehow fails miserably when asked for image-to-image similarity (perceptual + semantic).
After some investigation, I think I will want to test the CLIPv2 model.
I'm not sure of the best way to use it aside from a 'bring your own vectors' approach.
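For context, here's a minimal sketch of what I mean by 'bring your own vectors' with the Weaviate v4 Python client: embed the images offline with whatever model wins, then insert the vectors yourself with the vectorizer disabled. The collection name, property name, and the idea of L2-normalizing before insert are my own assumptions, not anything official:

```python
# Sketch: "bring your own vectors" with precomputed image embeddings.
# Assumes a local Weaviate instance; "ArchiveImage" and "path" are
# hypothetical names, and the embedding model runs elsewhere.
import numpy as np


def normalize(vec):
    """L2-normalize an embedding so cosine similarity reduces to a dot product."""
    v = np.asarray(vec, dtype=np.float32)
    return v / np.linalg.norm(v)


def index_images(embeddings_by_path):
    """Insert precomputed image embeddings into a vectorizer-less collection."""
    import weaviate
    import weaviate.classes as wvc

    client = weaviate.connect_to_local()
    try:
        images = client.collections.create(
            name="ArchiveImage",  # hypothetical collection name
            vectorizer_config=wvc.config.Configure.Vectorizer.none(),
            properties=[
                wvc.config.Property(name="path", data_type=wvc.config.DataType.TEXT),
            ],
        )
        for path, emb in embeddings_by_path.items():
            # Supply the vector explicitly since no module vectorizes for us.
            images.data.insert(
                properties={"path": path},
                vector=normalize(emb).tolist(),
            )
        return images
    finally:
        client.close()
```

Image-to-image search then becomes a plain `near_vector` query with the query image's embedding, so swapping CLIP for another model only changes the offline embedding step, not the Weaviate side.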