This post is less a request for tech support than a search for techniques to get around homograph confusion in multilingual embeddings. I can restrict results to my target language by filtering on a non-vectorised language field, but I haven’t been able to find a vector-based solution.
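For reference, the metadata-filter workaround can be sketched roughly as follows. This is only an illustration: the `search` helper, the `lang` and `vec` field names, and the three-dimensional toy vectors are all made up for the example, and a real setup would use the vector store's own filter syntax and real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, docs, lang, top_k=3):
    """Filter on the non-vectorised language field first, then rank by similarity."""
    candidates = [d for d in docs if d["lang"] == lang]
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:top_k]

# Toy data: the vectors are invented, not produced by a real embedding model.
docs = [
    {"text": "Le pain est cuit au four.", "lang": "fr", "vec": [0.9, 0.1, 0.2]},
    {"text": "Chronic pain management", "lang": "en", "vec": [0.95, 0.05, 0.1]},
    {"text": "Recette de baguette", "lang": "fr", "vec": [0.7, 0.3, 0.4]},
]

query_vec = [1.0, 0.0, 0.1]  # stand-in for the embedding of the query "pain"
results = search(query_vec, docs, lang="fr")
for d in results:
    print(d["lang"], d["text"])
```

The English document is the closest vector in this toy data, but it never reaches the ranking step because the filter removes it first. That is exactly why this works, and also why it is not a vector-based fix.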
A few examples of the confusion:
French words that return results relating to their English homographs: “pain”, “four”
Spanish words that return results relating to their English homographs: “pan”, “red”
German words that return results relating to their English homographs: “gift”, “rat”
I have tried adding language-specific hints to the query, such as appending " (contexte français)". This helps, but it still brings in some unwanted results.
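To make the hint approach concrete, here is a minimal sketch of what I mean. Only the French suffix is the one I actually used; the Spanish and German suffixes and the `with_language_hint` helper are hypothetical, written by analogy, and the hinted string would then be passed to whatever embedding model is in use.

```python
# Suffixes appended to the query before it is embedded.
# Only the "fr" entry is the one tried so far; "es" and "de" are assumed by analogy.
LANGUAGE_HINTS = {
    "fr": " (contexte français)",
    "es": " (contexto español)",
    "de": " (deutscher Kontext)",
}

def with_language_hint(query, lang):
    """Return the query with a language hint appended, or unchanged if no hint is known."""
    return query + LANGUAGE_HINTS.get(lang, "")

print(with_language_hint("pain", "fr"))
print(with_language_hint("gift", "de"))
```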
Any input would be welcome.