Multi-Lingual Cosine Similarity Search

SomebodySysop · March 8, 2024, 12:29am

Description

I have added some Hebrew texts to my embeddings. I run nearText queries using the text2vec-openai vectorizer.

I am curious whether these queries will match similar contexts in the Hebrew texts.

Server Setup Information

Weaviate Server Version:
Deployment Method: WCS
Multi Node? Number of Running Nodes:
Client Language and Version: 1.23.10

Any additional Information

DudaNogueira · March 8, 2024, 2:30pm

Hi!

It will depend on the model you use to generate the embeddings.

If that model supports your language, it should work fine, as long as Weaviate is configured to use it.

Let me know if this helps!

SomebodySysop · March 8, 2024, 8:06pm

gpt-4 is the model. It supports Hebrew.

How is Weaviate made aware?

DudaNogueira · March 8, 2024, 9:34pm

When you create a collection, you can specify a vectorizer model.

As long as this vectorizer model supports multiple languages and your collection is configured to use it, it should work.

For example:

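import weaviate.classes as wvc

# Assumes `client` is an already-connected Weaviate Python client (v4).
# The vectorizer set here embeds both the stored objects and the nearText
# query text, so it should support the languages in your data.
client.collections.create(
    "Article",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_cohere(
        model="embed-multilingual-v2.0",  # Cohere's multilingual embedding model
        vectorize_collection_name=True,   # also include the collection name in the vectorized text
    ),
)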
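Once a collection is configured like this, the same vectorizer embeds both the stored objects and the nearText query text, and, with the default cosine metric, Weaviate ranks results by cosine distance (1 minus cosine similarity). The pure-Python sketch below mimics only that ranking step so the role of the model is easy to see. The three-dimensional vectors and the `rank_by_cosine_distance` helper are hypothetical stand-ins; real embeddings come from the configured model and have hundreds or thousands of dimensions. If the model does not place a Hebrew passage near an English query on the same topic, no ranking step can recover the match.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_cosine_distance(
    query_vec: list[float], objects: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Return (name, distance) pairs sorted from closest to farthest.

    Cosine distance is 1 - cosine similarity, so smaller means more similar.
    """
    scored = [
        (name, 1.0 - cosine_similarity(query_vec, vec))
        for name, vec in objects.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy embeddings standing in for a multilingual model's output.
objects = {
    "hebrew_passage_topic_a": [0.9, 0.1, 0.0],
    "english_passage_topic_b": [0.0, 0.2, 0.95],
}
# Hypothetical embedding of an English query about topic A.
query_vec = [0.85, 0.15, 0.05]

for name, distance in rank_by_cosine_distance(query_vec, objects):
    print(f"{name}: {distance:.3f}")
```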

Cohere has a good multilingual model (embed-multilingual-v2.0, used in the example above).

Let me know if this helps

SomebodySysop · March 8, 2024, 11:46pm

Yes, thanks.

Actually, I want to create a new cluster and use text-embedding-3-large as my embedding model. Do you know of a way to find out whether it is multilingual? I have asked on the OpenAI developer forum, but so far nobody seems to know.
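Rather than relying on documentation, one rough way to probe whether a model such as text-embedding-3-large is cross-lingual is to embed a few sentences alongside their Hebrew translations and check whether each translation lands closest to its own source sentence. In the sketch below, `embed` is a hypothetical hook you would back with your provider's embeddings endpoint; the lookup-table embedder in the usage lines exists only so the logic runs offline and is not a real model. A perfect score on a handful of pairs is a sanity check, not proof of multilingual quality.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cross_lingual_hit_rate(
    embed: Callable[[str], Vector], pairs: Sequence[tuple[str, str]]
) -> float:
    """Fraction of (source, translation) pairs whose translation is most
    similar to its own source among all source sentences.

    `embed` is a hypothetical hook: any function mapping text to a vector.
    """
    source_vecs = [embed(source) for source, _ in pairs]
    hits = 0
    for i, (_, translation) in enumerate(pairs):
        t_vec = embed(translation)
        sims = [cosine_similarity(t_vec, s_vec) for s_vec in source_vecs]
        # A hit means the best-matching source sentence is the true one.
        best = max(range(len(sims)), key=sims.__getitem__)
        if best == i:
            hits += 1
    return hits / len(pairs)


# Offline usage with a hypothetical lookup-table "embedder" (not a real model).
toy_vectors = {
    "Good morning": [1.0, 0.0, 0.1],
    "בוקר טוב": [0.9, 0.1, 0.1],
    "Thank you": [0.0, 1.0, 0.1],
    "תודה": [0.1, 0.9, 0.0],
}
pairs = [("Good morning", "בוקר טוב"), ("Thank you", "תודה")]
print(cross_lingual_hit_rate(toy_vectors.__getitem__, pairs))
```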

SomebodySysop · March 9, 2024, 9:24am

It looks like, whether or not the model is multilingual, if I include the English translation in the text that gets embedded, the cosine similarity search will find it. That’s good to know.
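The workaround described here can be sketched as follows, assuming you control a single text property (called `content` here, a hypothetical name) that the vectorizer embeds: store each Hebrew passage together with its English translation, so an English query has English text to match even if the model handles Hebrew poorly. The separator and helper names are illustrative choices, not a Weaviate API.

```python
def bilingual_content(original: str, translation: str) -> str:
    """Join a passage and its English translation into one text to embed.

    The blank-line separator is an arbitrary, illustrative choice.
    """
    return f"{original.strip()}\n\n{translation.strip()}"


def build_objects(pairs: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Turn (original, translation) pairs into property dicts that carry a
    single hypothetical `content` field for the vectorizer to embed."""
    return [{"content": bilingual_content(o, t)} for o, t in pairs]


objects = build_objects([("שלום עולם", "Hello world")])
print(objects[0]["content"])
```

Keeping the original and the translation in one embedded text trades some vector precision (two languages share one vector) for simplicity; storing them as separate objects is an alternative if you want to rank each language independently.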

Related topics

Topic	Category	Replies	Views	Activity
Query regarding similarity search	Support	7	694	July 18, 2023
Does Weaviate have a good support for non-English (multi-lingual) search?	General	2	406	March 20, 2024
Multilingual embedder for Weaviate	Support	10	556	July 17, 2025
Weaviate & ColBERTv2?	Support	3	609	May 16, 2024
Weaviate cosine similarity completelly different than ScikitLearn with SentenceTransformer vectorizer	Support (bug, developer-experience, python, technical)	1	152	January 14, 2025
