We built a benchmark of 83 tests against a collection of 14K objects. Each search string is a paraphrase of the original text. We use it as-is for the inverted index search and also vectorize it for the almighty hybrid search, which we run for every value of alpha in steps of 0.1. The hybrid search is limited to 20 results. Sharing the result just to show the approach:
As soon as I can, I will try to write a short article on the benchmarking methodology, hoping it will be useful to someone. I think that, for the hybrid search, looking for one or more optimized alphas is worthwhile. Keep up the great work!
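To make the alpha sweep a bit more concrete, here is a minimal sketch of how it could be organized. It is not the code from my repo: `sweep_alpha` and the `SearchFn` signature are names made up for illustration, and the search itself is passed in as a plain callable so the loop stays independent of any client. With the Weaviate Python client v4 that callable could, I assume, wrap `collection.query.hybrid(query=..., alpha=..., limit=...)`.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical signature: (query, alpha, limit) -> ranked list of object IDs.
SearchFn = Callable[[str, float, int], List[str]]


def sweep_alpha(
    tests: Sequence[Tuple[str, str]],
    search: SearchFn,
    limit: int = 20,
    steps: int = 11,
) -> Dict[float, List[Optional[int]]]:
    """Run every (query, expected_id) test once per alpha value.

    Returns, for each alpha, the 1-based rank of the expected object
    in the results, or None when it is not among the first `limit`.
    """
    results: Dict[float, List[Optional[int]]] = {}
    for i in range(steps):
        # steps=11 gives 0.0, 0.1, ..., 1.0
        alpha = round(i / (steps - 1), 1)
        ranks: List[Optional[int]] = []
        for query, expected_id in tests:
            ids = search(query, alpha, limit)[:limit]
            ranks.append(ids.index(expected_id) + 1 if expected_id in ids else None)
        results[alpha] = ranks
    return results
```

Keeping the rank rather than a simple found/not-found flag means the same run can later be summarized in different ways without querying again.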
Do you have any suggestions for a publicly usable Weaviate collection or source material with a descriptive property/field that could serve as a benchmark target collection? My own is in Italian and contains confidential material.
By descriptive I mean a field that describes something in prose, so that I could also describe it with a paraphrase that uses little or none of the original text. Searching with the original words exercises the inverted index search, while the paraphrase would only work with a vector search.
Something like Kaggle’s Wine Reviews - description field, but easily available to everybody?
What I’m going to do is fetch a subset of Kaggle’s wine descriptions (approx. 20K English texts), then use a generative model to generate paraphrases of them. This will build my “to be searched with vectors” benchmark.
Not surprisingly, the results demonstrate that the inverted index sparse search is superior.
If anyone is interested in taking a peek at the repo, it will contain all the necessary files and a pyproject.toml file to install all prerequisites with Poetry. I will remove the very large original Kaggle list, which you would need only to run the 01 program to generate another selection from it.
Just add your Weaviate instance and have fun.
PS The .env.copy file will need to be copied/renamed to .env, and you will need to put your own OpenAI API key in it.
PPS The next experiment will randomly select only 3 keywords and search with those, which should be closer to a real keyword search.
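For anyone wanting to try the same thing, the keyword selection could look roughly like the sketch below. `pick_keywords` is a hypothetical helper, not the function in my repo, and the optional seed is only there to make a run repeatable.

```python
import random
from typing import List, Optional, Sequence


def pick_keywords(
    keywords: Sequence[str], k: int = 3, seed: Optional[int] = None
) -> List[str]:
    """Pick k distinct keywords at random from one benchmark entry."""
    rng = random.Random(seed)
    # Drop repeated keywords while keeping their original order.
    unique = list(dict.fromkeys(keywords))
    if len(unique) <= k:
        # Not enough keywords to sample from: use them all.
        return unique
    return rng.sample(unique, k)


# Made-up keywords, only to show the call.
query = " ".join(pick_keywords(["cherry", "oak", "tannic", "plum", "spice"], seed=7))
```

The resulting string can then be sent to the search exactly like the full keyword list.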
And now the last index search experiment. The original benchmark had an average of over 16 keywords per wine, and with all of those we get the super-duper recall of the previous graph.
I have now randomly selected only 3 keywords from each. Here is an example:
As you can see, even with the hybrid search completely skewed towards the inverted index (alpha=0.0), the recall is only around 75%. We get the expected object as the first retrieved result less than 40% of the time, and within the first 3 around 50% of the time.
If you start cranking up the contribution of the vector search, the performance drops dramatically, with a total meltdown at an alpha of 1.
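In case the three figures above are unclear (recall, first position, within the first 3), this is one way they could be computed from the rank of the expected object in each test. It is only a sketch: `summarize_ranks` is a name I made up here, and a rank of `None` stands for "not among the retrieved results".

```python
from typing import Dict, Optional, Sequence


def summarize_ranks(
    ranks: Sequence[Optional[int]], ks: Sequence[int] = (1, 3)
) -> Dict[str, float]:
    """Turn per-test ranks (1-based, None = not retrieved) into fractions."""
    if not ranks:
        raise ValueError("need at least one test result")
    total = len(ranks)
    found = [r for r in ranks if r is not None]
    # Recall: the expected object appears anywhere in the result list.
    metrics = {"recall": len(found) / total}
    for k in ks:
        # hit@k: the expected object appears within the first k results.
        metrics[f"hit@{k}"] = sum(1 for r in found if r <= k) / total
    return metrics


# Toy input, not my benchmark data: four tests, one of them a miss.
print(summarize_ranks([1, 2, None, 5]))
# {'recall': 0.75, 'hit@1': 0.25, 'hit@3': 0.5}
```

Running this once per alpha value gives the curves to compare.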
Now off to feed the family and tomorrow I might continue with the paraphrase benchmarking.