Starting to have fun :)

We built a benchmark of 83 tests against a collection of 14K objects. The search strings are paraphrases of the originals: each is used as-is for the inverted-index search and also vectorized for the almighty hybrid search, with every value of alpha in steps of 0.1. The hybrid search is limited to 20 results. Sharing the results just to show the approach:
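For anyone wanting to replicate the sweep, here is a minimal sketch of the idea. It does not call Weaviate (with the v4 Python client that would be a `collection.query.hybrid(query=..., alpha=..., limit=20)` call); instead it fuses precomputed sparse and dense scores the way a relative-score hybrid does, so the alpha sweep itself is visible. All data and names are invented for illustration.

```python
# Sketch of the alpha sweep: for each test query, rank objects by the
# fused score alpha * vector_score + (1 - alpha) * keyword_score and
# check whether the expected object appears in the top-k results.

def hybrid_rank(sparse, dense, alpha, limit=20):
    """Rank object ids by fused score. sparse/dense map id -> normalized score."""
    ids = set(sparse) | set(dense)
    fused = {i: alpha * dense.get(i, 0.0) + (1 - alpha) * sparse.get(i, 0.0)
             for i in ids}
    return sorted(fused, key=fused.get, reverse=True)[:limit]

def recall_at_k(tests, alpha, k=20):
    """Fraction of tests whose expected id lands in the top-k fused results."""
    hits = sum(expected in hybrid_rank(sp, de, alpha, k)
               for expected, sp, de in tests)
    return hits / len(tests)

# Toy benchmark: one paraphrase-style query where only the vector side
# scores the target highly, while the keyword side prefers a distractor.
tests = [("wine-42", {"wine-7": 0.9}, {"wine-42": 0.8, "wine-7": 0.1})]
for alpha in [round(a / 10, 1) for a in range(11)]:
    print(alpha, recall_at_k(tests, alpha, k=1))
```

With real data, plotting `recall_at_k` against alpha gives exactly the kind of graph discussed here.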

The fact that the search string is a paraphrase that does not reuse verbatim terms from the original shows in the better results around alpha 0.7 and 0.8.


Hahaha. This is awesome!

I will share this internally.

Thanks for sharing.

Kewl, thanks for sharing


As soon as I can, I will try to write a little article on the benchmarking methodology, hoping it will be useful to someone :slight_smile: I think that searching for one or more optimized alphas is genuinely useful for hybrid search. Keep up the great work!


That’s awesome! Especially if the blog includes examples where you can see a clear difference. People love that. Thanks :pray:

Do you have any suggestions for a Weaviate collection/source material with a descriptive property/field that can be publicly used as a benchmark target collection? My own is in Italian and contains confidential material.

By descriptive I mean a field that describes something in prose, in such a way that I could also describe it by paraphrasing, without using most (or any) of the original text. Using the same words would exercise the inverted-index search, while the paraphrase would only work with a vector search.

Something like Kaggle’s Wine Reviews - description field, but easily available to everybody?
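The distinction can be made concrete with a tiny illustration (example sentences are mine): a BM25-style match needs shared terms, and a good paraphrase shares almost none with the original.

```python
# Term overlap between a query and a document is what the inverted index
# can exploit; a paraphrase removes that overlap almost entirely.

def shared_terms(a, b):
    return set(a.lower().split()) & set(b.lower().split())

original   = "aromas of ripe cherry with a smooth tannic finish"
verbatim   = "ripe cherry aromas and tannic finish"
paraphrase = "smells like mature red fruit, ending soft on the palate"

print(shared_terms(original, verbatim))    # plenty of overlap: BM25 works
print(shared_terms(original, paraphrase))  # empty: only vectors can match
```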

Ooh, that’s a good question. I think you could use a generative model to create some demo content, though.

What I’m going to do is fetch a subset of Kaggle’s wine descriptions (approx. 20K English texts), then use a generative model to generate paraphrases of them. This will build my “to be searched with vectors” benchmark.
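A hypothetical sketch of that paraphrase step (the model name, prompt wording and client call are my own assumptions, not the author's exact setup):

```python
# Build the paraphrase prompt sent to the generative model for each
# wine description; the instruction discourages reusing original terms
# so the result only matches via vector search.

def paraphrase_prompt(description: str) -> str:
    return (
        "Paraphrase the following wine description. "
        "Keep the meaning but avoid reusing the original wording.\n\n"
        f"{description}"
    )

# With the official openai client, the call would look something like:
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
#   out = client.chat.completions.create(
#       model="gpt-4o-mini",
#       messages=[{"role": "user", "content": paraphrase_prompt(desc)}],
#   )
#   paraphrase = out.choices[0].message.content

print(paraphrase_prompt("Aromas of ripe cherry with a smooth tannic finish."))
```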

You can actually kill two birds with one stone.

What I’m going to do is fetch a subset of Kaggle’s wine descriptions (approx. 20K English texts), then use a generative model to generate paraphrases of them.

That’s an awesome example where a generative feedback loop can be used: Generative Feedback Loops with LLMs for Vector Databases | Weaviate - Vector Database


I have jotted down a simple Python repo where I start from the public Kaggle wine list and trim it down to 20K objects such as:

        "title": "Cafaggio 2010 Basilica del Pruneto Merlot (Toscana)",
        "keywords": [

which represent the name of a wine (title) and keywords describing it.
With this base the alpha graph is obviously very different:

which, not surprisingly, demonstrates that the inverted-index sparse search is superior here.
If anyone is interested in taking a peek at the repo, it contains all the necessary files (although I will remove the very large original Kaggle list, which you would only need if you wanted to run the 01 program to generate another selection) and a pyproject.toml file to install all prerequisites with Poetry.
Just add your Weaviate instance and have fun.
PS: The .env.copy file needs to be copied/renamed to .env, with your own OpenAI API key in it.
PPS: The next experiment will randomly select only 3 keywords per wine and search with those; this would be closer to a real keyword search.
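That selection step is simple enough to sketch here (the field names match the JSON above; the keyword values are invented for illustration):

```python
import random

random.seed(0)  # make the benchmark selection reproducible

def trim_keywords(obj: dict, n: int = 3) -> dict:
    """Return a copy of the wine object keeping only n randomly chosen keywords."""
    kept = random.sample(obj["keywords"], min(n, len(obj["keywords"])))
    return {**obj, "keywords": kept}

wine = {
    "title": "Moccagatta 2012 Basarin (Barbaresco)",
    "keywords": ["cherry", "tannic", "oak", "dried herb", "leather", "licorice"],
}
print(trim_keywords(wine)["keywords"])
```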


And now the last index search experiment. The original benchmark had an average of over 16 keywords per wine, and with all of them we get the super-duper recall of the previous graph.

I have now randomly selected only 3 keywords from each. Here is an example:

        "title": "Moccagatta 2012 Basarin  (Barbaresco)",
        "keywords": [

and here are the results:

and as you can see, even with the hybrid search completely skewed towards the inverted index (alpha=0.0), the recall is only around 75%: we get the expected object first in the retrieved results in under 40% of cases, and within the first 3 in around 50%.

If you start cranking up the contribution of the vector search, performance drops dramatically, with a total meltdown at alpha=1.
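Those three numbers (recall, first-hit rate, top-3 rate) all fall out of one quantity: the rank at which the expected object comes back. A small sketch of how I'd compute them:

```python
# ranks: for each benchmark query, the 1-based position of the expected
# object in the result list, or None when it was not retrieved at all.

def benchmark_metrics(ranks, limit=20):
    """Return (recall@limit, first-hit rate, top-3 rate) over all queries."""
    n = len(ranks)
    recall = sum(r is not None and r <= limit for r in ranks) / n
    first  = sum(r == 1 for r in ranks) / n
    top3   = sum(r is not None and r <= 3 for r in ranks) / n
    return recall, first, top3

# Toy run: 4 queries with the target at ranks 1, 3, 12 and not found.
print(benchmark_metrics([1, 3, 12, None]))  # -> (0.75, 0.25, 0.5)
```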

Now off to feed the family and tomorrow I might continue with the paraphrase benchmarking.

You all take care. Viva Weaviate !!!


Cool, thanks for sharing! :slight_smile:

An interesting way to extend this would be to test the two different hybrid fusion algorithms (Search operators | Weaviate - Vector Database) and check whether you see a difference.
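For context, the two flavours differ in what they fuse: rankedFusion combines reciprocal ranks, while relativeScoreFusion min-max normalizes the raw scores first. Here is a sketch per my reading of the docs; the constant 60 is the usual reciprocal-rank-fusion constant, assumed here rather than taken from Weaviate's source.

```python
def ranked_fusion(sparse_ranked, dense_ranked, alpha, k=60):
    """Fuse two ranked id lists via weighted reciprocal ranks."""
    fused = {}
    for rank, oid in enumerate(sparse_ranked):
        fused[oid] = fused.get(oid, 0.0) + (1 - alpha) / (k + rank + 1)
    for rank, oid in enumerate(dense_ranked):
        fused[oid] = fused.get(oid, 0.0) + alpha / (k + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)

def relative_score_fusion(sparse_scores, dense_scores, alpha):
    """Min-max normalize each score set, then blend with alpha."""
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {i: (s - lo) / span for i, s in scores.items()}
    sp, de = normalize(sparse_scores), normalize(dense_scores)
    ids = set(sp) | set(de)
    fused = {i: (1 - alpha) * sp.get(i, 0.0) + alpha * de.get(i, 0.0)
             for i in ids}
    return sorted(fused, key=fused.get, reverse=True)
```

Running the same 83-test benchmark through both fusers, with the same alpha sweep, would show whether the optimal alpha shifts between the two.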