In my application I need to apply fuzzy search on a text field to find matches even if the string was mistyped. Imagine an application that needs to filter objects on a “surname” field: especially with foreign names, an exact match may not work. Cosine similarity on embeddings would not help either in this case.
In my application I use the Python library RapidFuzz (available on PyPI), which works very well.
Is there any mechanism to apply an external function such as this to a text field in a collection, without having to iterate through the whole collection in application code?
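One common way to avoid running an expensive fuzzy scorer on every object is a two-stage search: a cheap index narrows the collection to a few candidates, and only those get the precise score. The sketch below is a self-contained, in-memory illustration of that pattern, not a feature of any particular database. It assumes a character-trigram inverted index for the first stage and uses a plain Levenshtein distance as a stand-in for the RapidFuzz scorers in the second. The class `FuzzyIndex`, the trigram padding, and the `max_distance`/`min_shared` parameters are hypothetical choices made for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Character trigrams of a lower-cased string, padded so word edges count."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


class FuzzyIndex:
    """Hypothetical in-memory index: trigram prefilter, then precise scoring."""

    def __init__(self, values: list[str]) -> None:
        self.values = list(values)
        # trigram -> positions of the values that contain it
        self.postings: dict[str, set[int]] = defaultdict(set)
        for pos, value in enumerate(self.values):
            for gram in trigrams(value):
                self.postings[gram].add(pos)

    def search(self, query: str, max_distance: int = 2,
               min_shared: int = 1) -> list[tuple[str, int]]:
        # Stage 1: count shared trigrams using only the posting lists
        # touched by the query, instead of scanning every value.
        shared: dict[int, int] = defaultdict(int)
        for gram in trigrams(query):
            for pos in self.postings.get(gram, ()):
                shared[pos] += 1
        # Stage 2: run the precise (slower) scorer on the candidates only.
        hits = []
        for pos, count in shared.items():
            if count < min_shared:
                continue
            dist = levenshtein(query.lower(), self.values[pos].lower())
            if dist <= max_distance:
                hits.append((self.values[pos], dist))
        # Closest matches first, ties broken alphabetically.
        return sorted(hits, key=lambda hit: (hit[1], hit[0]))
```

For example, `FuzzyIndex(["Schmidt", "Smith", "Schmitt"]).search("Shmidt")` returns `[("Schmidt", 1), ("Schmitt", 2)]`; "Smith" is three edits away and is dropped. The trigram prefilter trades some recall for speed, since very short strings can be within the distance limit while sharing no trigram. In a database setting, the same idea would mean letting the database's own index return a candidate set and running RapidFuzz only on those results; whether a suitable index exists depends on the database. RapidFuzz's `rapidfuzz.distance.Levenshtein.distance` could replace the pure-Python `levenshtein` here.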
But my example was actually very simplistic. The real task is much harder: searching through a long list of wordplays.
These wordplays are obtained through techniques such as splitting or fusing a legal word, making deliberate typos, etc., so the various algorithms in the RapidFuzz library give me more latitude.
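For wordplays built by splitting or fusing words and by deliberate typos, one option is to normalize spacing away before measuring the distance, and to use a distance that treats a swap of two adjacent letters as a single edit. This is a minimal sketch under those assumptions: `squash` and `wordplay_distance` are hypothetical helpers, and `osa_distance` implements the optimal string alignment distance (Levenshtein plus adjacent transpositions). RapidFuzz's `rapidfuzz.distance` module provides optimized implementations of edit distances like these, so the pure-Python version here is only for illustration.

```python
import re


def squash(text: str) -> str:
    """Lower-case and drop spaces, hyphens and apostrophes so split/fused forms align."""
    return re.sub(r"[\s\-']+", "", text.lower())


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein edits plus adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # Swapping two adjacent letters ("teh" -> "the") costs one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def wordplay_distance(query: str, candidate: str) -> int:
    """Distance after removing spacing, so 'sun flower' and 'Sunflower' match."""
    return osa_distance(squash(query), squash(candidate))
```

With this, `wordplay_distance("sun flower", "Sunflower")` is 0 and `wordplay_distance("teh cat", "the cat")` is 1, whereas plain Levenshtein would count the swapped letters as two edits.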
Sadly, I don’t have a better idea.
The mentioned vectorizer has a few options, so maybe play around with them and see if any of them works. Please let me know if something comes out of this!