Indexes, filters and tokenization benchmarks

rjalex · February 19, 2024, 7:44am

Are there benchmarks on the speed of the various filter operators as applied to the different tokenization strategies?

Thank you

DudaNogueira · February 19, 2024, 7:54pm

That’s an interesting subject.

I believe the main difference here will be the number of tokens each strategy indexes.

For the given example:

Keep in mind that field tokenization always produces exactly one indexed token per value, while the other strategies produce multiple tokens whenever the content contains more than one token.
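To make the token-count difference concrete, here is a rough Python approximation of two of Weaviate's documented strategies. Under field tokenization, the whole value, trimmed of surrounding whitespace, becomes one token. Under word tokenization, the value is split into alphanumeric runs, which are then lowercased. The function names are hypothetical, and this is a sketch for intuition, not Weaviate's actual tokenizer code.

```python
import re

# Hypothetical approximations of two Weaviate tokenization strategies.
# Illustrative only; these are NOT Weaviate's implementation.

def tokenize_field(value: str) -> list[str]:
    """'field': the whole value, with surrounding whitespace trimmed, is one token."""
    trimmed = value.strip()
    return [trimmed] if trimmed else []

def tokenize_word(value: str) -> list[str]:
    """'word' (approximation): split on non-alphanumeric characters, lowercase each token."""
    return [t.lower() for t in re.split(r"[\W_]+", value) if t]

def postings_count(values: list[str], tokenizer) -> int:
    """Number of (token, document) entries a toy inverted index would store."""
    return sum(len(set(tokenizer(v))) for v in values)

if __name__ == "__main__":
    docs = ["New York City", "  Los Angeles ", "York"]
    for name, tok in [("field", tokenize_field), ("word", tokenize_word)]:
        print(name, [tok(d) for d in docs], postings_count(docs, tok))
```

For these three values, field yields 3 index entries (one per value) and word yields 6 (one per distinct token per value). The more text a property holds, the more the word-style index grows, and a filter then has more postings to read.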

We have a benchmarking repo:

However, I don’t think it covers this case; it focuses on other index benchmarks.
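Without a published benchmark for this case, one way to build a first intuition is a toy experiment. The sketch below builds an in-memory inverted index under each strategy and times an exact-match lookup. Everything here is hypothetical. The tokenizers only approximate Weaviate's field and word strategies. The "equal" semantics are an assumption: a multi-token query matches documents that contain all of its tokens. The timings say nothing about Weaviate's real performance, which should be measured against an actual instance.

```python
import random
import re
import timeit
from collections import defaultdict

# Hypothetical approximations of the field/word strategies (not Weaviate's code).
def tokenize_field(value):
    v = value.strip()
    return [v] if v else []

def tokenize_word(value):
    return [t.lower() for t in re.split(r"[\W_]+", value) if t]

def build_index(docs, tokenizer):
    """Toy inverted index: token -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in tokenizer(text):
            index[token].add(doc_id)
    return index

def equal_filter(index, query, tokenizer):
    """Assumed semantics: ids of documents containing every query token."""
    tokens = tokenizer(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())  # one postings-list read per extra token
    return result

if __name__ == "__main__":
    random.seed(0)
    vocab = [f"w{i}" for i in range(1000)]
    docs = [" ".join(random.choices(vocab, k=8)) for _ in range(10_000)]
    query = docs[0]
    for name, tok in [("field", tokenize_field), ("word", tokenize_word)]:
        index = build_index(docs, tok)
        postings = sum(len(ids) for ids in index.values())
        seconds = timeit.timeit(lambda: equal_filter(index, query, tok), number=1000)
        print(f"{name}: {len(index)} tokens, {postings} postings, {seconds:.4f}s / 1000 lookups")
```

The design point is that a field-style lookup reads one postings list. A word-style lookup reads one list per query token and intersects them, so its cost depends on how many tokens the query and the stored values produce.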
