I have a collection whose declaration is the following (simplified):
    client.collections.create(
        name=wv_artcollname,
        description="A collection of articles data",
        vectorizer_config=vect_config_list,
        properties=[
            wvcc.Property(name="prose", data_type=wvcc.DataType.TEXT),
            wvcc.Property(
                name="entities",
                data_type=wvcc.DataType.TEXT,
                skip_vectorization=True,
            ),
        ],
    )
I am using the "prose" property with a hybrid search, which returns a ranking with normalized scores.
But I also want to perform a BM25 keyword search on the "entities" property (a string with the names of the people, places and organizations mentioned in the article, as identified by an NER preprocessing step). This query returns a second ranking with its own scores.
What is the suggested approach to make the best of these two pieces of information?
Let me give you an example:
query_string: "Global warming tropical Brazil Lula", sent to Weaviate on the "articles" collection as a hybrid query with alpha 0.5, targeting a specific named vector that was produced by embedding the "prose" property.
Now the very same query string could also be used against the "entities" property, right? That would produce another ranking.
How should I combine these two rankings? Any suggestions?
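To make the question concrete: one possible way to merge the two result lists on the client side would be reciprocal rank fusion (RRF), where each document earns weight / (k + rank) from every list it appears in. The sketch below is only an illustration under assumptions. The function name, the document IDs and the constant k=60 are made up for the example, and none of it is a Weaviate API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, weights=None, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    rankings: list of lists, each ordered from best to worst match.
    weights:  optional per-list weight (defaults to 1.0 for every list).
    k:        smoothing constant; 60 is a commonly used value (assumption).
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    fused = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # A document contributes more the higher it ranks in a list.
            fused[doc_id] += weight / (k + rank)
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical IDs standing in for the hybrid ("prose") and BM25 ("entities") results.
prose_ranking = ["a", "b", "c"]
entities_ranking = ["c", "a", "d"]

fused = reciprocal_rank_fusion([prose_ranking, entities_ranking])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

Because RRF uses only rank positions, it sidesteps the fact that the hybrid scores and the BM25 scores are on different scales. Passing weights such as [1.0, 2.0] would favour the "entities" ranking.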
I also tried the following approach:
    response = wv_artcoll.query.hybrid(
        query=query_string,
        query_properties=["entities^2", "prose"],
        vector=query_vector,
        target_vector=graphql_model_name,
        limit=request.result_limit,
        alpha=request.alpha,
        return_metadata=MetadataQuery(score=True, explain_score=True),
    )
Does this make sense?
Thank you