We have implemented a semantic search engine for our user profiles, and it works really well at the moment. However, we now have a growing user base from Spain and Italy, so we want to support multi-lingual search, since their profile information is written in their native languages (e.g., Spanish, Italian).
We are currently using text2vec-transformers with the Weaviate pre-built image:
semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1, which looks like it is not meant for multi-lingual support?
So basically, it's a 3-part question:
- Changing the ML model requires re-vectorization of existing data, right?
- What Weaviate pre-built ML images would you recommend for multi-lingual support?
- Lastly, after changing to a multi-lingual model, will a hybrid-search query in any language give me good results?
For example, suppose a user profile's basic info states “I am good at playing drums or any percussion instrument”, and we search with a query like:
- Buscando músicos que saben tocar la batería (in Spanish)
- Cerco musicisti che sappiano suonare la batteria (in Italian)
Both mean “Looking for musicians who know how to play drums” in English. Will such a query give me the intended results? My worry is that I saw this blog about the current limitations of Weaviate with non-English languages. Thanks!
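
For reference, here is a rough sketch of the kind of hybrid query I have in mind, written as a small Python helper that only builds the GraphQL string. The class name `UserProfile`, the property `basicInfo`, and the helper `build_hybrid_query` are placeholders I made up for this post, not our real schema; the `hybrid` operator with `query` and `alpha`, and the `_additional { score }` field, are what I understand Weaviate's GraphQL API to expose.

```python
import json


def build_hybrid_query(text, class_name="UserProfile", prop="basicInfo",
                       alpha=0.5, limit=5):
    """Build a Weaviate GraphQL Get query that uses the hybrid operator.

    class_name and prop are hypothetical placeholders for the real schema.
    alpha weights the two halves of hybrid search: 0 is pure keyword (BM25),
    1 is pure vector search.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # json.dumps gives a double-quoted, escaped literal that is also a valid
    # GraphQL string; ensure_ascii=False keeps accented characters readable.
    quoted = json.dumps(text, ensure_ascii=False)
    return (
        "{ Get { "
        f"{class_name}(hybrid: {{query: {quoted}, alpha: {alpha}}}, limit: {limit}) "
        f"{{ {prop} _additional {{ score }} }} "
        "} }"
    )


# The two example searches from this post.
queries = [
    "Buscando músicos que saben tocar la batería",
    "Cerco musicisti che sappiano suonare la batteria",
]
for q in queries:
    print(build_hybrid_query(q))
```

The resulting string would be sent as the `query` field of a JSON POST to the `/v1/graphql` endpoint. Since the keyword half of hybrid search matches on literal tokens, I assume a higher `alpha` would matter for cross-language queries like these, but that is exactly the part I am unsure about.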