Indexes, filters and tokenization benchmarks

Are there benchmarks on the speed of the various filter operators as applied to the different tokenization strategies?

Thank you

Hi @rjalex!

That’s an interesting subject.

I believe that the difference here will be the number of indexed tokens.

For the given example:

You can consider that field tokenization will always give you exactly one indexed token per value, while the other will give you multiple tokens whenever your content contains more than one.
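To make the token-count point concrete, here is a rough sketch in plain Python. It is not the actual implementation: `tokenize_field`, `tokenize_word` and `build_inverted_index` are hypothetical helpers written only for illustration, and the word splitting rule (lowercase, split on non-alphanumeric characters) is an assumption.

```python
import re


def tokenize_field(text: str) -> list[str]:
    # Hypothetical stand-in for field tokenization:
    # the whole trimmed value becomes a single token.
    stripped = text.strip()
    return [stripped] if stripped else []


def tokenize_word(text: str) -> list[str]:
    # Hypothetical stand-in for word-style tokenization (assumed rule):
    # lowercase the value and split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^0-9a-zA-Z]+", text.lower()) if t]


def build_inverted_index(docs: dict[int, str], tokenizer) -> dict[str, set[int]]:
    # Map each token to the set of document ids that contain it.
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for token in tokenizer(text):
            index.setdefault(token, set()).add(doc_id)
    return index


docs = {1: "Hello, brave new World", 2: "hello world"}

field_index = build_inverted_index(docs, tokenize_field)
word_index = build_inverted_index(docs, tokenize_word)

# One index entry per distinct value vs. one entry per distinct word.
print(len(field_index), "entries with field-style tokenization")
print(len(word_index), "entries with word-style tokenization")
```

In this toy setup an equality filter on the field-style index is a single exact-match lookup on the full value, while the word-style index has more entries to build and store but can match individual words. How that translates into real filter speed would still need to be measured.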

We have a benchmarking repo:

But I don’t think it covers this specific case, only other index benchmarks.