How to manage the merging of a hybrid query on one property and a BM25 query on another

I have a collection whose declaration is the following (simplified):

import weaviate.classes.config as wvcc  # assumed alias, not shown in the original snippet

client.collections.create(
    name=wv_artcollname,
    description="A collection of articles data",
    vectorizer_config=vect_config_list,
    properties=[
        wvcc.Property(name="prose", data_type=wvcc.DataType.TEXT),
        wvcc.Property(
            name="entities",
            data_type=wvcc.DataType.TEXT,
            skip_vectorization=True,
        ),
    ],
)

and I am using 'prose' with a hybrid search, which gives me a ranking with normalized scores.

But I also want to perform a BM25 keyword search on the 'entities' property (a string containing the names of people, places and organizations mentioned in the article, identified by an NER preprocessing step). This query will return a second ranking with its own scores.

What is the suggested approach to make the best of these two pieces of information?

Let me give you an example:

query_string: "Global warming tropical Brazil Lula" is sent to Weaviate on the 'articles' collection as a hybrid query with alpha 0.5, targeting a specific named vector produced by embedding the 'prose' property.

Now the very same query string could also be used against the 'entities' property, right? And this will produce another ranking.

How would I handle these two rankings? Any suggestions?

I also tried the following approach:

from weaviate.classes.query import MetadataQuery  # import not shown in the original snippet

response = wv_artcoll.query.hybrid(
    query=query_string,
    query_properties=["entities^2", "prose"],  # keyword part: 'entities' boosted 2x
    vector=query_vector,
    target_vector=graphql_model_name,
    limit=request.result_limit,
    alpha=request.alpha,
    return_metadata=MetadataQuery(score=True, explain_score=True),
)

Does this make sense?

Thank you

hi @rjalex !

That's a really good question. hahaha

My first assumption is that query_properties will first run the BM25 query, taking the property weights into account, and then fuse that with the vector findings. This is what you get with the query in your example.
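To make that assumption concrete, here is a minimal sketch of one way such a fusion could work: normalize each score set to [0, 1], then blend them with alpha. This is only an illustration under stated assumptions, not Weaviate's actual implementation; the function names and the sample scores are hypothetical.

```python
def min_max(scores):
    """Scale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a top hit.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_scores(bm25_scores, vector_scores, alpha=0.5):
    """Blend normalized scores; alpha weights the vector side, 1 - alpha the keyword side."""
    bm25_n, vec_n = min_max(bm25_scores), min_max(vector_scores)
    fused = {
        doc: alpha * vec_n.get(doc, 0.0) + (1 - alpha) * bm25_n.get(doc, 0.0)
        for doc in set(bm25_n) | set(vec_n)
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scores: BM25 over 'entities^2' + 'prose', and vector similarity on 'prose'.
bm25 = {"a": 4.0, "b": 2.0, "c": 0.0}
vector = {"a": 0.2, "b": 0.9, "d": 0.5}
fused = fuse_scores(bm25, vector, alpha=0.5)
```

With these sample values, document "b" comes out on top because it is strong on both sides, while "a" wins on keywords only.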

So what you are describing is a "late BM25 reinforcement"? That is, after a hybrid search, you reinforce the score with a new BM25 query on targeted properties?
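One way such a late reinforcement could be sketched is reciprocal rank fusion (RRF) over the two result lists, done on the client side. This is only an assumption about how it might be approached, not a built-in Weaviate feature; the function name, the weights and the constant k=60 are illustrative choices. The two input lists would be the object ids, in result order, from the hybrid query on 'prose' and from a separate BM25 query on 'entities'.

```python
def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several ranked lists of ids; ids near the top of a list contribute more."""
    weights = weights or [1.0] * len(rankings)
    fused = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list adds weight / (k + rank) to the id's fused score.
            fused[doc_id] = fused.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical ids, in result order, from the two separate queries.
hybrid_ids = ["a", "b", "c"]   # hybrid search on 'prose'
entity_ids = ["c", "a", "d"]   # BM25 search on 'entities'

merged = reciprocal_rank_fusion([hybrid_ids, entity_ids])
boosted = reciprocal_rank_fusion([hybrid_ids, entity_ids], weights=[1.0, 2.0])
```

RRF only uses rank positions, so it sidesteps the problem that hybrid scores and BM25 scores live on different scales. Raising the weight of the 'entities' list plays a role similar to the `entities^2` boost in the query above.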

This is interesting. :thinking:

Let me know if I got this right!

Thanks!

Yes, that is exactly what I'm trying to better understand.
I have one set of results from a hybrid query on the 'prose' property, and another set of results from a BM25 query on the 'entities' property, which is populated only with names of places, people and organizations.

I am trying to understand which strategies are best for fusing these two approaches.

Thanks !!!