Does BM25 need to reindex the whole corpus?

Since BM25 is based on TF-IDF, it weighs how often a term appears in a document against how common that term is across the whole corpus.
So each time I add a new document to the corpus, the relative term weights should be recomputed.
But that does not seem to happen when using Weaviate, and I doubt all the documents/records are re-parsed each time I add a new record.
What am I missing?
Is BM25 only taking into account the term frequency within the document, and not the whole corpus?
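For reference, this is the standard Okapi BM25 score in its common Lucene-style IDF variant (whether Weaviate uses exactly this variant is an assumption here). It makes the corpus dependence explicit: $N$ is the number of documents, $n(q_i)$ the number of documents containing query term $q_i$, $f(q_i, D)$ the frequency of $q_i$ in document $D$, $|D|$ the length of $D$, $\mathrm{avgdl}$ the average document length, and $k_1$, $b$ tuning parameters. Only $N$, $n(q_i)$ and $\mathrm{avgdl}$ depend on the rest of the corpus.

$$
\mathrm{score}(D, Q) = \sum_{q_i \in Q} \mathrm{IDF}(q_i)\,\frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
$$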

Hi!

Each time you add a new object, Weaviate indexes only the keywords of that object, using the tokenization configured for that property.

So there is no need to reindex the other objects.
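To illustrate why corpus-wide statistics don't force a reindex, here is a toy Python sketch. It assumes the common design in which the index stores only per-document term frequencies and lengths, plus running counters, and computes IDF and the average document length at query time. This is not Weaviate's implementation: the class name `TinyBM25Index`, the regex tokenizer and the `k1=1.2`, `b=0.75` defaults are illustrative assumptions, and updates and deletes are ignored (each `doc_id` is assumed to be added once).

```python
import math
import re
from collections import Counter, defaultdict


class TinyBM25Index:
    """Hypothetical toy index: adding a document never rescans other documents."""

    def __init__(self, k1: float = 1.2, b: float = 0.75):
        self.k1 = k1
        self.b = b
        # term -> {doc_id: frequency of the term in that document}
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        self.doc_lengths: dict[str, int] = {}
        self.total_length = 0  # running sum, so avgdl is cheap at query time

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Assumed lowercase "word" tokenization; real engines make this configurable.
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id: str, text: str) -> None:
        tokens = self.tokenize(text)
        # Only this document's terms and the global counters are touched.
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf
        self.doc_lengths[doc_id] = len(tokens)
        self.total_length += len(tokens)

    def search(self, query: str) -> list[tuple[str, float]]:
        n_docs = len(self.doc_lengths)
        if n_docs == 0:
            return []
        avgdl = self.total_length / n_docs
        scores: dict[str, float] = defaultdict(float)
        for term in set(self.tokenize(query)):
            docs = self.postings.get(term, {})
            # IDF is derived here, at query time, from the current corpus statistics.
            df = len(docs)
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            for doc_id, tf in docs.items():
                norm = self.k1 * (1 - self.b + self.b * self.doc_lengths[doc_id] / avgdl)
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + norm)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


index = TinyBM25Index()
index.add("a", "weaviate bm25 keyword search")
index.add("b", "vector search")
before = dict(index.search("bm25"))

index.add("c", "bm25 bm25 ranking")  # touches only document "c"
after = dict(index.search("bm25"))
```

Adding `c` lowers the score of `a`, because "bm25" now appears in more documents and its IDF drops, yet `a` was never re-tokenized or rewritten. The corpus-wide part of the formula is simply recomputed from counters whenever a query runs.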

We have a blog post here:

It goes into more detail on that.

Let me know if that helps!

Thanks!

Thanks!
That's what I assumed it does, but I wanted to be sure.
