Is hybrid search with multiple queries possible?

Description

For near_text search, you can provide multiple query vectors for the same target vector:

Is something like this also supported for hybrid search? The search would require an array of query texts and an array of query vectors.

My use case: I’m splitting a long document into chunks and want to run a hybrid search against the vector database for each chunk. The results should then be combined using a “join strategy”, as described here: Multiple target vectors | Weaviate
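Until something like this is supported server-side, one workaround is to run one hybrid query per chunk and merge the result lists on the client. Below is a minimal sketch in plain Python. It assumes each per-chunk query has already been reduced to a mapping of object UUID to score, where higher is better. In the v4 client these could be collected from `o.uuid` and `o.metadata.score` when querying with `return_metadata=MetadataQuery(score=True)`. `merge_results` and its `"sum"`/`"max"`/`"average"` strategies are hypothetical names, only loosely analogous to Weaviate's join strategies, and are not part of the client API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def merge_results(per_chunk: List[Dict[str, float]], strategy: str = "sum") -> List[Tuple[str, float]]:
    """Merge per-chunk {uuid: score} maps into one ranked list (hypothetical helper)."""
    if strategy not in ("sum", "max", "average"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Gather every score each object received across all chunk queries.
    collected: Dict[str, List[float]] = defaultdict(list)
    for results in per_chunk:
        for uuid, score in results.items():
            collected[uuid].append(score)

    combined: Dict[str, float] = {}
    for uuid, scores in collected.items():
        if strategy == "sum":
            combined[uuid] = sum(scores)
        elif strategy == "max":
            combined[uuid] = max(scores)
        else:
            # "average": a chunk that did not return this object contributes 0.
            combined[uuid] = sum(scores) / len(per_chunk)

    # Highest combined score first; ties broken by uuid so the order is deterministic.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

"sum" rewards objects that match many chunks, while "max" rewards the single best-matching chunk. Which one fits depends on whether a hit should resemble the whole document or just one part of it.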

Hi @RisingOrange!

Welcome to our community :hugs:

The document you linked is in fact using `collection.query.near_vector` :thinking:

I believe this is what you want:

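```python
from weaviate.classes.query import HybridVector

# `client` is an already-connected WeaviateClient; v1 and v2 are your query vectors
jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
    query="California",
    max_vector_distance=0.4,  # Maximum threshold for the vector search component
    vector=HybridVector.near_vector(
        vector=[v1, v2]  # several query vectors for the vector component
    ),
    alpha=0.75,  # weight of the vector score vs. the BM25 (keyword) score
    limit=5,
)
```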
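To show what `alpha=0.75` does to the two halves of a hybrid query, here is a simplified model of relative score fusion, the default fusion method in recent Weaviate versions. Each result list is min-max normalized, the vector side is weighted by `alpha` and the BM25 side by `1 - alpha`, and the two are summed. This is a pure-Python sketch for intuition, not Weaviate's implementation. The function names are hypothetical, the tie handling in `min_max` is an assumption, and it ignores `max_vector_distance` and `limit`.

```python
from typing import Dict


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]: the best hit becomes 1, the worst 0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: if every hit has the same score, treat them all as 1.0.
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def relative_score_fusion(vector_scores: Dict[str, float],
                          keyword_scores: Dict[str, float],
                          alpha: float) -> Dict[str, float]:
    """Fuse two {uuid: score} maps; alpha weights the vector side, 1 - alpha the BM25 side."""
    v = min_max(vector_scores)
    k = min_max(keyword_scores)
    fused = {}
    for uuid in set(v) | set(k):
        # An object missing from one result list contributes 0 from that side.
        fused[uuid] = alpha * v.get(uuid, 0.0) + (1 - alpha) * k.get(uuid, 0.0)
    # Return the fused scores ordered best first (ties broken by uuid).
    return dict(sorted(fused.items(), key=lambda kv: (-kv[1], kv[0])))
```

With `alpha=1.0` only the vector ranking matters and with `alpha=0.0` only BM25 does, so 0.75 leans towards the vector side.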

Let me know if this helps!


Thanks, that’s helpful!

I tried the code with the weaviate Python client 4.13.2 and got: “Providing lists of lists has been deprecated. Please provide a dictionary with target names as keys and lists of numbers as values.”

I modified it and now have this version, which works:

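```python
from weaviate.classes.query import HybridVector

# split_query and generate_embeddings are user-defined helpers (not part of the client):
# split the long text into chunks, then embed each chunk (one vector per chunk)
chunks = split_query(query_text)
chunk_vectors = generate_embeddings(chunks)

response = collection.query.hybrid(
    query=query_text,
    target_vector="corpus_vector",
    vector=HybridVector.near_vector(
        # Target vector name -> list of query vectors, as the deprecation message asks
        vector={"corpus_vector": chunk_vectors},
    ),
    max_vector_distance=similarity_threshold,
    alpha=0.75,
    limit=5,
)
```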
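`split_query` is not shown in the thread. One possible version, as a hypothetical sketch, splits the text into fixed-size word windows that overlap slightly, so a phrase cut at a chunk boundary still appears whole in one chunk. The window and overlap sizes are arbitrary assumptions. `generate_embeddings` would then need to return one vector (a list of floats) per chunk, in the same order, for the `{"corpus_vector": chunk_vectors}` dictionary above.

```python
from typing import List


def split_query(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into windows of `chunk_size` words; neighbours share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


print(split_query("a b c d e f g", chunk_size=3, overlap=1))  # ['a b c', 'c d e', 'e f g']
```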

However, I think the results would differ if `collection.query.hybrid` accepted multiple text queries instead of a single query string. Splitting the query text, running a BM25F query for each chunk and then combining the results would give a different result than running a single BM25F query on the whole text, right?

For the BM25 phase of the search, the query text boils down to tokens.

So

```python
query = "This query has multiple tokens"
```

will be the same as

```python
# Illustration only: hybrid() takes a single query string, not a nested list
query = [
    ["this", "query"],
    ["has", "multiple", "tokens"],
]
```
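The following sketch illustrates the token-level argument. It is a simplified, single-field BM25, not Weaviate's BM25F implementation. In this model, a document's score for the whole query equals the sum of its scores for the two token groups, because BM25 adds one independent term per query token. The tokenizer only approximates Weaviate's `word` tokenization (lowercase, split on non-alphanumerics). `K1 = 1.2` and `B = 0.75` are assumed as the usual defaults, and the three-document corpus is made up.

```python
import math
import re
from typing import List

K1, B = 1.2, 0.75  # assumed BM25 parameters (the usual defaults)


def tokenize(text: str) -> List[str]:
    """Rough stand-in for word tokenization: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def bm25(query_tokens: List[str], doc: str, corpus: List[str]) -> float:
    """Plain single-field BM25 score of `doc` for the given query tokens."""
    docs = [tokenize(d) for d in corpus]
    avgdl = sum(len(d) for d in docs) / len(docs)
    terms = tokenize(doc)
    score = 0.0
    for q in query_tokens:
        n_q = sum(1 for d in docs if q in d)  # number of documents containing q
        idf = math.log(1 + (len(docs) - n_q + 0.5) / (n_q + 0.5))
        tf = terms.count(q)
        score += idf * tf * (K1 + 1) / (tf + K1 * (1 - B + B * len(terms) / avgdl))
    return score


corpus = ["This query has multiple tokens", "multiple vectors per query", "tokens and more tokens"]
whole = bm25(tokenize("This query has multiple tokens"), corpus[0], corpus)
split = bm25(["this", "query"], corpus[0], corpus) + bm25(["has", "multiple", "tokens"], corpus[0], corpus)
print(math.isclose(whole, split))  # True
```

This equality holds per document and for queries whose tokens are all distinct. It does not model each chunk query being cut off at its own `limit`, or separate result lists being fused by rank.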

Let me know if this helps!
