Hi,
In my database (a few million legal documents) we have both short documents (a single paragraph) and very long ones (the equivalent of 10 pages). Each document has a title that describes its content.
I want to set up hybrid search with Weaviate. For the embeddings, I need to split the long documents into shorter chunks and add the title to each chunk.
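To make the chunking step concrete, here is a rough sketch of what I have in mind. The names (`Chunk`, `split_with_title`), the word-based window, and the sizes are just placeholders for illustration, not something we have settled on; the title is put in front of each chunk here, but it could equally go at the end.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str       # id of the parent document
    chunk_index: int  # position of the chunk inside the parent
    text: str         # title + chunk body, i.e. the string that would be embedded


def split_with_title(doc_id, title, body, max_words=200, overlap=20):
    """Split `body` into overlapping word windows and put the title in front of each.

    max_words and overlap are arbitrary illustrative values.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = body.split()
    step = max_words - overlap
    chunks = []
    start = 0
    index = 0
    while True:
        piece = " ".join(words[start:start + max_words])
        chunks.append(Chunk(doc_id, index, f"{title}\n\n{piece}"))
        if start + max_words >= len(words):
            break  # the window reached the end of the document
        start += step
        index += 1
    return chunks


# A short document stays in one chunk, a long one is split into several.
short_chunks = split_with_title("doc-1", "Short title", "one single paragraph")
long_chunks = split_with_title("doc-2", "Long title", "word " * 450)
```

Each chunk keeps the parent `doc_id`, so the hits can be mapped back to the full document later.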
For BM25, I want to keep all documents intact, for two reasons. First, the very purpose of BM25 (vs. TF-IDF) is to take document length into account in the relevance statistics.
Second, repeating the title (which we usually put in a dedicated field) many times for some documents and not for others will change the token statistics, and we expect that to make the search less relevant (we plan a real experiment with measurements; for now it is just an expectation).
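To illustrate the two concerns above, here is a small sketch of the textbook BM25 score for a single query term. The idf variant and the `k1`/`b` values are common defaults that I am assuming for the example; I have not checked that they match what Weaviate does internally, and the corpus numbers are made up.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term to one document.

    tf: occurrences of the term in the document
    doc_len / avg_doc_len: document length relative to the corpus average
    n_docs / doc_freq: corpus size and number of documents containing the term
    """
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


# Made-up corpus statistics, for illustration only.
corpus = dict(avg_doc_len=1000, n_docs=1_000_000, doc_freq=5_000)

# Same term frequency, different lengths: the short document scores higher.
short_doc = bm25_term_score(tf=2, doc_len=100, **corpus)
long_doc = bm25_term_score(tf=2, doc_len=5000, **corpus)

# Same long document, but with a title term repeated once per chunk (10 extra times):
# the term frequency is inflated and so is the score.
long_doc_repeated_title = bm25_term_score(tf=12, doc_len=5000, **corpus)
```

This is why I would rather index the intact documents for BM25: both the length normalisation and the term frequencies change if the same text is chunked and the title duplicated.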
My question: is there a way to have several vectors for each document and still perform a hybrid search? Or are we forced to have two different indexes, run two searches, and reconcile the results after the retrieval step?
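For reference, the fallback I have in mind for the "two indexes" option is something like the sketch below: collapse chunk hits from the vector index to their parent documents, then merge with the BM25 ranking using reciprocal rank fusion. This is plain Python, not Weaviate's API; the function names, the `k=60` constant, and the example hits are assumptions for illustration.

```python
from collections import defaultdict


def collapse_chunks(chunk_hits):
    """Turn ranked (doc_id, chunk_index) hits into ranked doc ids, keeping each
    document at the position of its best-ranked chunk."""
    seen = set()
    ranked_docs = []
    for doc_id, _chunk_index in chunk_hits:
        if doc_id not in seen:
            seen.add(doc_id)
            ranked_docs.append(doc_id)
    return ranked_docs


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids: each list adds 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results: chunk-level hits from the vector index,
# document-level hits from the BM25 index over intact documents.
vector_hits = [("A", 0), ("B", 2), ("A", 1), ("C", 0)]
bm25_hits = ["B", "D", "A"]

fused = reciprocal_rank_fusion([collapse_chunks(vector_hits), bm25_hits])
```

It works, but it means maintaining two indexes and doing the fusion on our side, which is what I would like to avoid if Weaviate can handle it natively.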