Since BM25 is based on TF-IDF, it relies on the frequency of terms in a document relative to their frequency in the whole corpus.
So each time I add a new document to the corpus, the relative weights of the terms should be recomputed.
But that does not seem to be the case when using Weaviate, and I doubt all the documents / records are parsed each time I add a new record.
What am I missing?
Does BM25 only take into account the term frequency within the document, and not the whole corpus?
Hi!
Each time you add a new object, it will index the keywords of that object only, taking into account the tokenization of that property.
So no need to reindex all the other objects.
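To illustrate why that works in general, here is a minimal sketch of a toy inverted index. It is not Weaviate's actual implementation; the class `TinyBM25Index`, its method names, and the whitespace tokenizer are all made up for this example. The idea is that adding an object only touches the posting lists of its own terms plus a couple of running counters, while the corpus-wide part of BM25 (the IDF and the average document length) is derived from those counters at query time.

```python
import math
from collections import Counter, defaultdict


class TinyBM25Index:
    """Toy inverted index (hypothetical, not Weaviate's implementation)."""

    def __init__(self, k1=1.2, b=0.75):
        self.k1 = k1
        self.b = b
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_len = {}                  # doc_id -> number of tokens
        self.total_len = 0                 # running sum of document lengths

    def add(self, doc_id, text):
        # Only the new document is tokenized; other documents are not re-parsed.
        tokens = text.lower().split()
        self.doc_len[doc_id] = len(tokens)
        self.total_len += len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def idf(self, term):
        # Computed on demand from the current counts, so it is never stale.
        n = len(self.doc_len)
        df = len(self.postings.get(term, {}))
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def score(self, query, doc_id):
        avgdl = self.total_len / len(self.doc_len)
        total = 0.0
        for term in query.lower().split():
            tf = self.postings.get(term, {}).get(doc_id, 0)
            if tf == 0:
                continue
            norm = tf + self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / avgdl)
            total += self.idf(term) * tf * (self.k1 + 1) / norm
        return total


index = TinyBM25Index()
index.add("a", "vector search with bm25")
before = index.score("bm25", "a")

# Adding a second object does not re-process object "a" ...
index.add("b", "bm25 keyword search")
after = index.score("bm25", "a")

# ... yet the score of "a" reflects the new corpus statistics.
print(before, after)
```

In this sketch the score of object "a" changes after "b" is added, even though "a" was never reindexed, because the document count and document frequency are read at query time.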
We have a blog post here:
It goes into more detail on that.
Let me know if that helps!
Thanks!
Thanks
That's what I was assuming it does but wanted to be sure