Can the scores be boosted if the documents contain a set of values in a field?
If I have an Article class with a Topic field, and Topic can contain any of the following values: “Global”, “Local”, “Politics”, etc., is there a way to boost “Local” news over “Global” news?
Possible solutions I can think of:
- Filtering on “Local” is not an option as that would completely remove global news.
- Making separate queries with different filters and merging the results afterwards. This might result in higher latency.
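The second workaround (separate queries, merged afterwards) could look roughly like the sketch below. It is plain Python with no Weaviate calls: it assumes you already ran one query filtered to Topic = “Local” and one unfiltered query, each returning a list of (id, score) pairs. `merge_with_boost`, the `boost` factor, and the sample IDs and scores are hypothetical, and the sketch assumes scores from the two queries are comparable, which may not hold in practice.

```python
from typing import Dict, List, Tuple

# Each hit is (doc_id, score); higher score = more relevant.
Hit = Tuple[str, float]


def merge_with_boost(local_hits: List[Hit], all_hits: List[Hit],
                     boost: float = 1.5, limit: int = 10) -> List[Hit]:
    """Merge a "Local"-filtered result list with an unfiltered one.

    Scores from the Local-only query are multiplied by `boost`; if a document
    appears in both lists, its highest (possibly boosted) score is kept.
    """
    merged: Dict[str, float] = {}
    for doc_id, score in all_hits:
        merged[doc_id] = max(merged.get(doc_id, float("-inf")), score)
    for doc_id, score in local_hits:
        merged[doc_id] = max(merged.get(doc_id, float("-inf")), score * boost)
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]


# Made-up results from the two hypothetical queries.
local = [("a", 0.6), ("b", 0.4)]
everything = [("c", 0.8), ("a", 0.6), ("d", 0.5)]
print([doc_id for doc_id, _ in merge_with_boost(local, everything, limit=3)])
```

Here the Local article "a" overtakes "c", and "b" displaces "d" in the top three, without removing Global articles entirely.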
I believe this question is similar to: Boost recent documents
Please let me know if there’s a better way to handle this.
You should be able to use reranking for this: Reranking | Weaviate - vector database
Just rerank based on a query on Topic; the example on that page should be pretty much what you need.
Hey @lnatspacy,
I believe reranking only re-orders the search results, using the query provided, as a post-processing step. Please correct me if I am wrong.
I was looking for a way to boost scores while fetching the initial search results, something similar to field boosting.
You’re correct, but the effect will be the same, no? Whether they are boosted in the initial results or you “boost” them afterwards doesn’t really make a big difference, right?
Hey there, you can arbitrarily “weight” fields in BM25 searches like so: BM25 search | Weaviate - vector database
This also applies to hybrid searches.
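To see why field weighting can help with the original Topic question: a query containing “local” matches the Topic value “Local”, so weighting the topic field more heavily pushes Local articles up. As far as I understand the linked page, Weaviate expresses the weight with a ^ suffix on the property name (e.g. "topic^3"); the self-contained sketch below does not call Weaviate at all. It scores each field with textbook BM25 and takes a weighted sum of the field scores, which is a simplification chosen for illustration, not Weaviate's actual scoring formula. Function names and sample documents are hypothetical.

```python
import math
from typing import Dict, List


def bm25_field(query: List[str], doc_tokens: List[str], corpus: List[List[str]],
               k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 score of one field's tokens against a tokenized query."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    score = 0.0
    for term in query:  # repeated query terms are counted each time
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        tf = doc_tokens.count(term)
        denom = tf + k1 * (1 - b + b * len(doc_tokens) / avgdl)
        score += idf * tf * (k1 + 1) / denom
    return score


def weighted_bm25(query: str, docs: List[Dict[str, str]],
                  weights: Dict[str, float]) -> List[float]:
    """Simplified field weighting: weighted sum of per-field BM25 scores."""
    q = query.lower().split()
    scores = [0.0] * len(docs)
    for field, weight in weights.items():
        corpus = [d[field].lower().split() for d in docs]
        for i, tokens in enumerate(corpus):
            scores[i] += weight * bm25_field(q, tokens, corpus)
    return scores


docs = [
    {"topic": "Global", "title": "local news local news"},
    {"topic": "Local", "title": "weather report"},
]
print(weighted_bm25("local news", docs, {"topic": 1.0, "title": 1.0}))
print(weighted_bm25("local news", docs, {"topic": 3.0, "title": 1.0}))
```

With equal weights the Global article wins on its title; tripling the topic weight lets the Local article overtake it.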
In terms of vector searches, literally adding extra terms like “local news” (rather than just “news”) will have a similar effect, as vector search works on semantic similarity.
If your query is different to your desired ranking you could use a reranker as @lnatspacy mentioned, too.
Thanks for the reply.
In BM25, I was looking for a way to give more weight to some terms or phrases in the search query than to others.
For reference (from the Elasticsearch documentation):
Use the boost operator ^ to make one term more relevant than another. For instance, if we want to find all documents about foxes, but we are especially interested in quick foxes:
quick^2 fox
Boosts can also be applied to phrases or to groups:
"john smith"^2 (foo bar)^4
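As a rough illustration of what term boosting means, the sketch below parses simple Elasticsearch-style term^boost tokens into weights and multiplies each term's contribution by its boost. It is a toy: the scorer just counts occurrences (no IDF or length normalization), phrases and groups such as the "john smith"^2 example are rejected rather than parsed, and `parse_boosts` / `boosted_score` are hypothetical helpers, not part of Elasticsearch or Weaviate.

```python
import re
from typing import Dict

# A bare term optionally followed by ^boost, e.g. "quick^2" or "fox".
# Quotes and parentheses are excluded, so phrases and groups are NOT handled.
TERM_BOOST = re.compile(r'^(?P<term>[^\s^"()]+)(?:\^(?P<boost>\d+(?:\.\d+)?))?$')


def parse_boosts(query: str) -> Dict[str, float]:
    """Map each query term to its boost (default 1.0); repeated terms add up."""
    weights: Dict[str, float] = {}
    for token in query.split():
        match = TERM_BOOST.match(token)
        if match is None:
            raise ValueError(f"unsupported token: {token!r}")
        term = match.group("term").lower()
        weights[term] = weights.get(term, 0.0) + float(match.group("boost") or 1)
    return weights


def boosted_score(weights: Dict[str, float], text: str) -> float:
    """Toy relevance: sum of boost * occurrences of each term in `text`."""
    tokens = text.lower().split()
    return sum(boost * tokens.count(term) for term, boost in weights.items())


weights = parse_boosts("quick^2 fox")
print(weights)                                         # {'quick': 2.0, 'fox': 1.0}
print(boosted_score(weights, "the quick brown fox"))  # 3.0
print(boosted_score(weights, "a lazy fox"))           # 1.0
```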
My problem with the reranker is that it only reranks the documents fetched within the specified limit. That cut-off may already exclude documents that would have made it into the initial set (before reranking) if we could boost some terms or phrases.
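The limit problem can be shown with three toy documents and a limit of 2: reranking can only reorder what was fetched, while a boost applied before the cut-off can pull a lower-scoring “Local” article into the results. The scores, the additive `bonus`, and the `top_k` helper are all made up for illustration and do not correspond to any Weaviate API.

```python
from typing import List, Tuple

# (doc_id, topic, base_score); all values are made up for illustration.
Doc = Tuple[str, str, float]


def top_k(docs: List[Doc], k: int, bonus: float = 0.0, preferred: str = "Local") -> List[str]:
    """Return the IDs of the k best docs, adding `bonus` to the preferred topic."""
    def score(doc: Doc) -> float:
        return doc[2] + (bonus if doc[1] == preferred else 0.0)
    return [doc[0] for doc in sorted(docs, key=score, reverse=True)[:k]]


docs = [("g1", "Global", 0.9), ("g2", "Global", 0.8), ("l1", "Local", 0.7)]

# Rerank approach: fetch with the limit first, then reorder only what came back.
fetched_ids = top_k(docs, k=2)
fetched = [doc for doc in docs if doc[0] in fetched_ids]
reranked = top_k(fetched, k=2, bonus=0.25)

# Boost approach: apply the bonus before the limit is enforced.
boosted = top_k(docs, k=2, bonus=0.25)

print(reranked)  # ['g1', 'g2']
print(boosted)   # ['l1', 'g1']
```

Raising the limit narrows the gap, but only a boost at retrieval time guarantees the Local article is considered at all.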
Hi @vamsi,
Would something like this work? You can weight the query by repeating words, as that will affect the score.
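for query_text in ["fresh food", "fresh food food food"]:
    response = (
        client.query.get("JeopardyQuestion", ["question", "answer"])  # client: a v3 weaviate.Client
        .with_bm25(query=query_text).with_additional(["score"]).do()
    )
    print(query_text, response)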
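If you would rather write boosts Elasticsearch-style, a small hypothetical helper could turn integer boosts into repeated words, producing query strings like the ones in the loop above. This only rewrites the query text; how much a repeated word actually changes the score depends on how Weaviate's BM25 treats duplicate query terms, which this sketch does not check. Phrases, groups, and fractional boosts are not supported.

```python
def expand_boosts(query: str) -> str:
    """Turn 'term^n' (integer n >= 1) into the term repeated n times.

    Hypothetical helper: handles single terms only, not phrases or groups.
    """
    words = []
    for token in query.split():
        term, _, boost = token.partition("^")
        repeat = int(boost) if boost else 1  # non-integer boosts raise ValueError
        if repeat < 1:
            raise ValueError(f"boost must be a positive integer: {token!r}")
        words.extend([term] * repeat)
    return " ".join(words)


print(expand_boosts("fresh food^3"))  # fresh food food food
```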
(Sorry about my earlier response, which I’ve since deleted; I was confused at the time.)