Ambiguity in multilingual embeddings

michelca · May 8, 2025, 10:19am

This post is less a request for tech support than a search for techniques to get around homograph confusion in multilingual embeddings. I can restrict results to my target language by filtering on a non-vectorised language field, but I haven’t been able to find a vector-based solution.

A couple of examples:
words in French that return results relating to their English homographs: “pain” (bread), “four” (oven)
words in Spanish that return results relating to their English homographs: “pan” (bread), “red” (network)
words in German that return results relating to their English homographs: “gift” (poison), “rat” (advice)
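
To make the filter workaround concrete, here is a minimal sketch of what I mean by filtering on a language field before ranking by vector similarity. It is plain Python rather than any particular vector database API; the documents, the 2-d vectors, and the `search` helper are all made up for illustration, and a real multilingual model would supply the embeddings.

```python
import math

# Toy in-memory index of (text, language, embedding).
# The 2-d vectors are invented for illustration only.
DOCS = [
    ("Le pain se cuit au four", "fr", [0.9, 0.1]),
    ("Chronic pain management", "en", [0.8, 0.2]),
    ("Four apples on the table", "en", [0.1, 0.9]),
    ("Recette de baguette", "fr", [0.7, 0.3]),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(query_vec, language, limit=3):
    """Keep only documents in the target language, then rank by similarity."""
    candidates = [doc for doc in DOCS if doc[1] == language]
    ranked = sorted(candidates, key=lambda doc: cosine(query_vec, doc[2]), reverse=True)
    return [text for text, _, _ in ranked[:limit]]


# Hypothetical embedding of the French query "pain".
results = search([0.85, 0.15], "fr")
print(results)
```

In this toy data the English "Chronic pain management" sits very close to the query vector, so it would rank near the top without the filter. The filter removes it, but only because of the metadata field, not because of anything in the vectors themselves.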

I have tried adding language-specific prompts, such as appending " (contexte français)" to the query. This helps, but it still brings in some unwanted results.
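
For clarity, the prompt approach amounts to something like the sketch below. Only the French suffix is what I actually used; the Spanish and German hints and the `with_language_hint` helper name are assumptions added for illustration.

```python
# Suffixes appended to the query text before it is embedded.
# Only the "fr" entry comes from my own tests; "es" and "de" are assumed equivalents.
LANGUAGE_HINTS = {
    "fr": "contexte français",
    "es": "contexto español",
    "de": "deutscher Kontext",
}


def with_language_hint(query, language):
    """Append a language hint to the query, or return it unchanged if none is known."""
    hint = LANGUAGE_HINTS.get(language)
    return f"{query} ({hint})" if hint else query


print(with_language_hint("pain", "fr"))
```

The hinted string is then embedded in place of the bare query, which nudges the vector toward the target language but does not guarantee it.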

Any input would be welcome.
