If I have a keyword like my_env_var_dont_split, I can use ‘lowercase’ or ‘whitespace’ tokenization at ingest time, and the keyword will not be split in the index (as far as I can tell). Please confirm.
But during search/filter, it appears that I cannot specify a tokenizer; instead, Weaviate seems to try all tokenization methods and combine the results.
If the individual words of the keyword each appear in many objects, this seems to reduce search effectiveness.
Is there another way to tell BM25 to use only a specific tokenizer?
Thanks to @hsm207 for the answer. You don’t have to specify tokenization during search/filtering as the search term is automatically tokenized based on the property’s setting:
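To make that concrete, here is a small pure-Python sketch that approximates how the four tokenization options (`word`, `lowercase`, `whitespace`, `field`) treat a keyword like my_env_var_dont_split. It is only an illustration, not Weaviate's actual implementation: the `tokenize` and `matches` helpers are hypothetical names, and `matches` is a simplified stand-in for a keyword lookup, not real BM25 scoring. The key assumption, taken from the answer above, is that the query is tokenized with the same method as the property.

```python
import re


def tokenize(text: str, method: str) -> list[str]:
    """Approximate the documented tokenization options (illustrative only)."""
    if method == "word":
        # Split on anything that is not a letter or digit, then lowercase.
        return [t.lower() for t in re.findall(r"[^\W_]+", text)]
    if method == "lowercase":
        # Lowercase, then split on whitespace only.
        return text.lower().split()
    if method == "whitespace":
        # Split on whitespace only; case is preserved.
        return text.split()
    if method == "field":
        # Treat the whole (trimmed) value as a single token.
        stripped = text.strip()
        return [stripped] if stripped else []
    raise ValueError(f"unknown tokenization method: {method}")


def matches(query: str, indexed_value: str, method: str) -> bool:
    """Hypothetical helper: does any query token occur in the indexed tokens?

    The query goes through the same tokenizer as the indexed value,
    mirroring the behaviour described in the answer.
    """
    indexed = set(tokenize(indexed_value, method))
    return any(tok in indexed for tok in tokenize(query, method))


if __name__ == "__main__":
    keyword = "my_env_var_dont_split"
    for method in ("word", "lowercase", "whitespace", "field"):
        print(method, tokenize(keyword, method))
    # An unrelated object that merely shares the words "env" and "var":
    other = "set env var here"
    print(matches(keyword, other, "word"))
    print(matches(keyword, other, "lowercase"))
```

Under these assumptions, a property using `lowercase` or `whitespace` keeps the keyword as one token at both index and query time, so objects that only share the individual words do not match. With `word`, the keyword is broken into its parts on both sides, which is the situation where common words dilute the results.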