Storing multiple vectors per doc for hybrid search

pommedeterresautee · August 10, 2023, 10:17am

Hi,

In my database (a few million legal documents), we have short documents (a single paragraph) and very long ones (the equivalent of 10 pages). Each of them has a title that describes its content.

I want to set up hybrid search with Weaviate. For the embeddings, I need to split long documents into short chunks and append the title to each chunk.
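A minimal sketch of that chunking step, assuming a simple word-count split. A real pipeline would more likely count tokens with the embedding model's tokenizer or cut at paragraph boundaries. `chunk_with_title` and `max_words` are hypothetical names, not part of Weaviate. Here the title goes at the start of each chunk; putting it at the end is a one-line change.

```python
def chunk_with_title(title: str, body: str, max_words: int = 200) -> list[str]:
    """Split `body` into chunks of at most `max_words` words, each carrying the title.

    Hypothetical helper: word counts stand in for real tokenizer token counts.
    """
    words = body.split()
    if not words:
        # A document with no body still gets one chunk so the title is embedded.
        return [title]
    chunks = []
    for start in range(0, len(words), max_words):
        piece = " ".join(words[start:start + max_words])
        # Title first, then the chunk text, so every chunk embedding "sees" the title.
        chunks.append(f"{title}\n{piece}")
    return chunks
```

For example, `chunk_with_title("T", "a b c d e", max_words=2)` returns `["T\na b", "T\nc d", "T\ne"]`. Short single-paragraph documents simply come back as one chunk.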

For BM25, I want to keep all documents intact for two reasons. First, the very purpose of BM25 (vs. TF-IDF) is to take document length into account in the relevance statistics.
Second, repeating the title (which we usually store in a dedicated field) many times for some documents and not for others will skew the token statistics, and we expect that to make the search less relevant (we plan a real experiment with some measurements; for now it's just an expectation).
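To make both concerns concrete, here is a minimal Okapi BM25 scorer. It uses a common textbook form with a Lucene-style IDF and the frequently used defaults `k1 = 1.2`, `b = 0.75`; Weaviate's exact implementation may differ. The `b` term normalizes by `len(doc) / avgdl`, so chunking a 10-page document changes both its length and the corpus average. Because every chunk would repeat the title, title terms would also appear in many more "documents", which raises their document frequency and lowers their IDF.

```python
import math
from collections import Counter


def bm25_scores(corpus: list[list[str]], query: list[str],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every tokenized document in `corpus` against `query` with Okapi BM25."""
    n_docs = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: in how many documents each term appears at least once.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        # Length normalization: longer-than-average docs get a larger denominator.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

With `bm25_scores([["contract", "termination"], ["contract", "termination"] + ["filler"] * 50], ["termination"])`, the short document scores higher than the long one even though both contain "termination" once. That length effect is exactly what changes once long documents are split into chunks.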

The question is: is there a way to have several vectors for each document and still perform a hybrid search? Or are we forced to maintain 2 different indexes, run 2 searches, and do some reconciliation after the retrieval step?
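If you go the two-index route, one common reconciliation step is to collapse chunk-level vector hits to their parent document, so that they can be fused with BM25 results computed on full documents. A minimal sketch, assuming each chunk stores the id of its parent document. The `(chunk_id, parent_doc_id, similarity)` tuple layout and `collapse_chunks` are hypothetical. Each document is scored by its best-matching chunk (max over chunks). Summing or averaging chunk scores are alternatives, but summing tends to favor long documents with many chunks.

```python
def collapse_chunks(chunk_hits: list[tuple[str, str, float]]) -> list[tuple[str, float]]:
    """Turn (chunk_id, parent_doc_id, similarity) hits into a document ranking.

    Each parent document keeps the similarity of its best chunk (max-sim).
    """
    best: dict[str, float] = {}
    for _chunk_id, doc_id, sim in chunk_hits:
        if doc_id not in best or sim > best[doc_id]:
            best[doc_id] = sim
    # Highest-scoring documents first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

For example, hits `[("c1", "d1", 0.8), ("c2", "d2", 0.9), ("c3", "d1", 0.95)]` collapse to `[("d1", 0.95), ("d2", 0.9)]`. That list is now at the same granularity as the BM25 results and can be fused with them. One practical caveat: fetch enough chunks that several distinct documents survive the collapse.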

r4y_Y · September 15, 2023, 5:16am

My understanding is that BM25 doesn't work with vectors at all… but I'm not sure how it's implemented in Weaviate. IMO BM25 is similar to a keyword search (SELECT * FROM text WHERE …)

pommedeterresautee · September 15, 2023, 9:32am

There is a working hybrid search feature, with support for 2 fusion strategies. Still, it only works with 1 doc == 1 vector and doesn't handle more complex situations. We are testing Vespa, which supports it, FWIW…
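Weaviate documents its two fusion strategies as `rankedFusion` and `relativeScoreFusion` (chosen via the `fusionType` option), with `alpha` weighting vector vs. keyword results (1 = pure vector, 0 = pure keyword). Below is a rough sketch of how each strategy is usually described. The same idea could be applied client-side to the result lists from two separate indexes. Ranked fusion sums weighted reciprocal ranks. Relative score fusion min-max normalizes each list's scores to [0, 1] before a weighted sum. The constant `k = 60` and treating an all-equal score list as 1.0 are assumptions, not taken from Weaviate's source.

```python
def ranked_fusion(rankings: list[list[str]], weights: list[float],
                  k: int = 60) -> dict[str, float]:
    """Reciprocal-rank-style fusion: each list contributes weight / (k + rank)."""
    fused: dict[str, float] = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):  # 1-based ranks (assumption)
            fused[doc_id] = fused.get(doc_id, 0.0) + weight / (k + rank)
    return fused


def relative_score_fusion(results: list[dict[str, float]],
                          weights: list[float]) -> dict[str, float]:
    """Min-max normalize each result list to [0, 1], then take a weighted sum."""
    fused: dict[str, float] = {}
    for scores, weight in zip(results, weights):
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for doc_id, score in scores.items():
            # If every score in the list is equal, treat them all as the top score.
            norm = (score - lo) / span if span > 0 else 1.0
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * norm
    return fused
```

With `alpha = 0.5`, pass `weights=[alpha, 1 - alpha]` in the order (vector results, BM25 results). Ranked fusion ignores score magnitudes entirely. Relative score fusion keeps them, so a document that wins by a wide BM25 margin is not flattened to "rank 1".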
