Description
I have added some Hebrew texts to my embeddings. I use NearText queries with the text2vec-openai vectorizer.
I am curious whether these queries will match similar contexts in the Hebrew texts.
Server Setup Information
Weaviate Server Version:
Deployment Method: WCS
Multi Node? Number of Running Nodes:
Client Language and Version: 1.23.10
Any additional Information
Hi!
It will depend on the model you use for inference.
If it supports your language, it should work fine, as long as Weaviate is aware of it.
Let me know if this helps!
gpt-4 is the model. It supports Hebrew.
How is Weaviate made aware?
When you create a collection, you can specify a vectorizer model.
As long as this vectorizer model supports multiple languages, and your class was configured to use it, it should work.
For example:
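import weaviate.classes as wvc

# Assumes `client` is an already-connected Weaviate Python client (v4 API).
client.collections.create(
    "Article",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_cohere(
        model="embed-multilingual-v2.0",  # multilingual embedding model
        vectorize_collection_name=True,
    ),
)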
Cohere has a nice multilingual model:
Cohere offers multilingual language models that map text to a semantic vector space, improving search results and enabling use cases such as multilingual semantic search, customer feedback aggregation, and cross-lingual content moderation. The model...
Let me know if this helps
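On the query side, here is a rough sketch of what a NearText search could look like with the v4 Python client. It assumes you pass in a collection handle such as the one returned by client.collections.get("Article"); the helper name search_articles is made up for illustration and is not part of the client.

```python
def search_articles(collection, query_text, limit=3):
    """Run a near-text search and return the properties of each match.

    `collection` is assumed to be a Weaviate v4 collection handle,
    e.g. client.collections.get("Article").
    """
    # near_text sends `query_text` to the collection's configured
    # vectorizer and returns the stored objects closest to that vector.
    response = collection.query.near_text(query=query_text, limit=limit)
    return [obj.properties for obj in response.objects]


# Usage (needs a connected client and an existing "Article" collection):
# articles = client.collections.get("Article")
# for props in search_articles(articles, "ancient trade routes"):
#     print(props)
```

Whether an English query like this also surfaces Hebrew objects depends entirely on the vectorizer placing both languages in the same vector space.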
Yes, thanks.
Actually, I want to create a new cluster and use text-embedding-3-large as my embedding model. Do you have a way of finding out if it is multi-lingual? I have asked on the OpenAI developer forum, but so far nobody seems to know.
It looks like regardless of whether the model is multi-lingual or not, if I add the English translation to the embedding, the cosine similarity search will find it. That’s good to know.
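One way to check this empirically, in the absence of an official answer, is to embed a few Hebrew sentences alongside their English translations and see whether true translation pairs score a higher cosine similarity than mismatched pairs. The sketch below assumes a hypothetical embed callable that maps a string to a vector; in practice it would wrap a call to your embedding model. The stub vectors here are invented purely so the example runs, and say nothing about any real model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def crosslingual_gap(embed, pairs):
    """Mean similarity of translation pairs minus mean similarity of
    mismatched pairs. A clearly positive gap suggests the model places
    both languages in a shared vector space."""
    if len(pairs) < 2:
        raise ValueError("need at least two translation pairs")
    src = [embed(s) for s, _ in pairs]
    tgt = [embed(t) for _, t in pairs]
    n = len(pairs)
    matched = [cosine(src[i], tgt[i]) for i in range(n)]
    mismatched = [cosine(src[i], tgt[j])
                  for i in range(n) for j in range(n) if i != j]
    return sum(matched) / len(matched) - sum(mismatched) / len(mismatched)


# Hypothetical stand-in for a real embedding call (made-up toy vectors).
_STUB_VECTORS = {
    "החתול ישן על הספה": [1.0, 0.1],
    "The cat is sleeping on the couch": [0.9, 0.2],
    "מחיר הנפט עלה היום": [0.1, 1.0],
    "The price of oil rose today": [0.2, 0.9],
}


def stub_embed(text):
    return _STUB_VECTORS[text]


pairs = [
    ("החתול ישן על הספה", "The cat is sleeping on the couch"),
    ("מחיר הנפט עלה היום", "The price of oil rose today"),
]
gap = crosslingual_gap(stub_embed, pairs)
print(f"cross-lingual gap: {gap:.3f}")
```

Swapping stub_embed for a function that calls the real model, and using a larger set of sentence pairs, would give a more meaningful number. A gap near zero would suggest that storing the English translation alongside the Hebrew text is the safer route.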