Comparing self-managed embeddings

Hi fellow Weaviaters :slight_smile: I want to test around 50,000 text objects with unique IDs, each embedded with around 10 different embedding models.

So I will be looping over my text list and manually building a vector_MODELNAME for each of the models I want to evaluate.

How would you efficiently store this in Weaviate for this benchmarking case?

I can think of having one collection for each vectorizer.

I am aware of named vectors, but since I cannot use any of the modules such as text2vec-transformers, I am not sure how to define and use named vectors with application-provided vectors/embeddings.

I will then build a number of text queries each with a list of IDs of expected matching phrases as my ground truths.

Finally, I will go through my list of queries, vectorize each one with every vectorizer, retrieve the phrases via the corresponding vectors, and then match the results against the ground truth.
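The matching step could be scored with something like recall@k per model. This is only a minimal sketch: the function names, the `results_by_model` layout (model name -> query ID -> ranked list of retrieved IDs) and the choice of recall@k as the metric are assumptions for illustration, not something Weaviate provides.

```python
from typing import Dict, List


def recall_at_k(retrieved: List[str], expected: List[str], k: int) -> float:
    """Fraction of the expected IDs that appear in the top-k retrieved IDs."""
    expected_set = set(expected)
    if not expected_set:
        return 0.0
    hits = set(retrieved[:k]) & expected_set
    return len(hits) / len(expected_set)


def evaluate(
    results_by_model: Dict[str, Dict[str, List[str]]],
    ground_truth: Dict[str, List[str]],
    k: int,
) -> Dict[str, float]:
    """Mean recall@k per model, averaged over all queries in the ground truth."""
    if not ground_truth:
        return {model: 0.0 for model in results_by_model}
    scores = {}
    for model, results in results_by_model.items():
        per_query = [
            # A query with no results for this model counts as recall 0.
            recall_at_k(results.get(query_id, []), expected, k)
            for query_id, expected in ground_truth.items()
        ]
        scores[model] = sum(per_query) / len(per_query)
    return scores
```

Sorting the returned dictionary by score then gives a simple ranking of the models on the query set.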

I would love if you could suggest the best approach for this task.

Thank you

Ciao @rjalex !!

That looks like a really exciting project.

While named vectors would be interesting to use for that, I believe that having one collection per model will also work.

Regarding named vectors, do you mean bringing your own vectors to a named vector collection? That's described here:

Let me know how else I can help you with that :slight_smile:
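For a rough idea of what that can look like, here is a minimal sketch assuming the v4 Python client (`weaviate-client`) and a reachable Weaviate instance. The `ext_id` and `text` property names, the helper function names and the `embedders` mapping are made up for illustration. Newer client releases may spell the vector configuration differently, so check the docs for your version. As far as I know, named vector names need to be simple identifiers (letters, digits, underscores), so a model name containing hyphens would need to be mapped to something like `model_a`.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Vector = List[float]
Embedder = Callable[[str], Vector]


def build_named_vectors(text: str, embedders: Dict[str, Embedder]) -> Dict[str, Vector]:
    """Embed one text with every model; the keys become the named-vector names."""
    return {name: embed(text) for name, embed in embedders.items()}


def create_collection(client, name: str, model_names: List[str]):
    # Imported lazily so the pure-Python helper above works without the client installed.
    from weaviate.classes.config import Configure, DataType, Property

    return client.collections.create(
        name,
        properties=[
            Property(name="ext_id", data_type=DataType.TEXT),  # your own unique ID
            Property(name="text", data_type=DataType.TEXT),
        ],
        # One named vector per model, each without a vectorizer module attached.
        vectorizer_config=[Configure.NamedVectors.none(name=m) for m in model_names],
    )


def insert_all(collection, items: Iterable[Tuple[str, str]], embedders: Dict[str, Embedder]) -> None:
    """Batch-insert (ext_id, text) pairs, attaching all model vectors to each object."""
    with collection.batch.dynamic() as batch:
        for ext_id, text in items:
            batch.add_object(
                properties={"ext_id": ext_id, "text": text},
                vector=build_named_vectors(text, embedders),
            )


def search_ids(collection, query_vector: Vector, model_name: str, k: int = 10) -> List[str]:
    """Nearest-neighbour search against a single named vector; returns your own IDs."""
    response = collection.query.near_vector(
        near_vector=query_vector,
        target_vector=model_name,
        limit=k,
    )
    return [obj.properties["ext_id"] for obj in response.objects]
```

With this layout all models share one collection, and each query picks the model through `target_vector`. With one collection per model you would instead pass a single plain vector per object and skip `target_vector`.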
