Is fuzzy search supported here?
For example, if the phrase “executive director” is in the database, will this object be found if the query is “execut” or “execute”?
What languages are supported for bm25 search? Are word endings or misspelled words taken into account in the search?
If there is an object “garry.po@gmail.com” in the database
will this record be found if the search query is “garry” or “gary” or “po@gmail”?
Hi @IvankoPo
Lots of good questions. Let me see if I can tackle them one by one.
Is fuzzy search supported here?
For example, if the phrase “executive director” is in the database, will this object be found if the query is “execut” or “execute”?
Kind of. Vector (nearText) search will find similar objects, much as a “fuzzy” search would, but by a different method.
Typically, fuzzy searches use metrics like Levenshtein distance, which counts differences in individual characters. Vector search instead compares overall meaning.
So a vector search will find that “execut” is more similar to “executive director” than to, say, “book”.
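To make the contrast concrete, here is a minimal sketch of the character-level metric that classic fuzzy search relies on. It is plain Python, not part of Weaviate, and the function name `levenshtein` is just an illustrative choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("execut", "execute"))  # one missing character
print(levenshtein("gary", "garry"))      # one missing character
```

A fuzzy matcher built on this would accept a query when the distance falls under some threshold. A vector search never computes this number; it compares embeddings instead.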
What languages are supported for bm25 search? Are word endings or misspelled words taken into account in the search?
BM25F is language-agnostic because it is based on token matches, but the stopword list is English-only (or none). Stemming is not yet supported (see the feature request “Stemming on Inverted Index text props”, issue #2439 in weaviate/weaviate on GitHub).
Reference: Collection schema | Weaviate - vector database
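As a rough illustration of what "based on matches" with no stemming means, here is a small plain-Python sketch. It is not Weaviate's implementation: the `tokens` and `keyword_match` helpers and the tiny stopword set are assumptions made up for this example.

```python
import re

# Tiny illustrative subset, not the actual stopword list.
ENGLISH_STOPWORDS = {"a", "an", "the", "of", "and", "is"}


def tokens(text: str, remove_stopwords: bool = True) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if remove_stopwords:
        words = [w for w in words if w not in ENGLISH_STOPWORDS]
    return words


def keyword_match(query: str, document: str) -> bool:
    """True if at least one whole query token appears in the document.
    No stemming: 'execut' and 'executive' are different tokens."""
    return bool(set(tokens(query)) & set(tokens(document)))


for q in ("executive", "execut", "execute"):
    print(q, keyword_match(q, "executive director"))
```

Under these assumptions only “executive” matches, because whole tokens are compared and word endings are not normalized.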
If there is an object “garry.po@gmail.com” in the database
will this record be found if the search query is “garry” or “gary” or “po@gmail”?
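If the tokenization is set up appropriately, and if you use the right query (e.g. BM25F, or a Boolean filter), yes for “garry” and “po@gmail”, since those break down into tokens that are also in the stored value. The misspelled “gary” is a different token, so a keyword match would not find it; that case is where vector search is the better fit.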
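Here is a rough sketch of why tokenization matters for the email example. It approximates the `word`, `lowercase`, `whitespace`, and `field` tokenization options in plain Python. The `tokenize` and `matches` helpers are hypothetical, and the real behavior may differ in details.

```python
import re


def tokenize(text: str, method: str = "word") -> list[str]:
    """Approximate a few tokenization options (an assumption-laden sketch)."""
    if method == "word":
        # Lowercase and split on anything that is not a letter or digit.
        return re.findall(r"[^\W_]+", text.lower())
    if method == "lowercase":
        return text.lower().split()   # split on whitespace only
    if method == "whitespace":
        return text.split()           # split on whitespace, keep case
    if method == "field":
        return [text.strip()]         # the whole value is one token
    raise ValueError(f"unknown method: {method}")


def matches(query: str, stored: str, method: str) -> bool:
    """True if any query token equals a token of the stored value."""
    return bool(set(tokenize(query, method)) & set(tokenize(stored, method)))


email = "garry.po@gmail.com"
print(tokenize(email, "word"))
for q in ("garry", "gary", "po@gmail"):
    print(q, matches(q, email, "word"), matches(q, email, "field"))
```

With word-style splitting the address breaks into separate tokens, so “garry” and “po@gmail” overlap with it, while “gary” does not. With field-style tokenization only the full address matches.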
I hope that helps!