Boost fields with certain values

Hi,

Can scores be boosted when a document's field contains one of a given set of values?

Example:

If I have an Article class with a Topic field, and Topic can contain any of the following values: “Global”, “Local”, “Politics”, etc.,

Is there a way to boost “Local” news over “Global” news?

Options I have considered:

  1. Filtering on “Local”. This is not an option, as it would remove global news completely.
  2. Making separate queries with different filters and post-processing the results afterwards. This might result in higher latency.

I believe this question is similar to: Boost recent documents

Please let me know if there’s a better way to handle this.

Thanks

You should be able to use Reranking for this: Reranking | Weaviate - vector database
Just rerank based on a query on Topic; the example on that page should be pretty much what you need.

Hey @lnatspacy ,

I believe re-ranking only re-orders the search results, using the provided query, as a post-processing step. Please correct me if I am wrong.

I was looking for a way to boost scores while fetching the initial search results itself. Something similar to the field boosting, ["topic[Local]^3","topic[Global]^2","content"].

You’re correct, but the effect will be the same, no? Whether they are boosted in the initial result or you “boost” them afterwards doesn’t really make a big difference, right?

Hey there, you can arbitrarily “weight” fields in BM25 searches like so: BM25 search | Weaviate - vector database

This also applies to hybrid searches.
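For illustration, here is a minimal sketch of building such a weighted property list for the Python client. The `property^weight` string form is the one the BM25 docs describe; the helper name `weighted_properties` and the `Article` / `topic` / `content` names are assumptions taken from the question, not part of Weaviate. Note that this weights matches found in a property as a whole, not particular values of that property.

```python
def weighted_properties(weights):
    """Build a BM25 properties list such as ["topic^3", "content"].

    `weights` maps property name -> boost factor; a factor of 1 is
    emitted without a suffix. Hypothetical helper, not a Weaviate API.
    """
    props = []
    for name, weight in weights.items():
        if weight <= 0:
            raise ValueError(f"weight for {name!r} must be positive")
        props.append(name if weight == 1 else f"{name}^{weight}")
    return props


props = weighted_properties({"topic": 3, "content": 1})
print(props)  # ['topic^3', 'content']

# Assumed usage with the v3 Python client (needs a running instance):
# client.query.get("Article", ["topic", "content"]) \
#     .with_bm25(query="local election", properties=props)
```

With this, a query term that matches in `topic` counts more than the same term matching in `content`.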

In terms of vector searches, literally adding the extra terms like “local news” (rather than “news”) will have a similar effect, as they work on semantic similarity.

If your query is different to your desired ranking you could use a reranker as @lnatspacy mentioned, too.

Hey @jphwang

Thanks for the reply.

In BM25, I was looking for a way to give more weight to some terms or phrases in the search query than to others.

For reference (from the Elasticsearch documentation):

Use the boost operator ^ to make one term more relevant than another. For instance, if we want to find all documents about foxes, but we are especially interested in quick foxes:

quick^2 fox

Boosts can also be applied to phrases or to groups:

"john smith"^2 (foo bar)^4

My problem with the reranker is that it only reranks the documents fetched within the specified limit. That may already exclude documents that would have made it into the initial result set (before reranking) if some terms or phrases could be boosted.
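A client-side workaround could look like the sketch below: fetch more candidates than needed, multiply each score by a per-topic weight, then cut to the final limit. This is plain post-processing, not a Weaviate feature. The hit shape (`topic` and `score` keys), the weights, and the function name `boost_by_topic` are all hypothetical. It narrows the gap but does not close it, since documents outside the over-fetched window are still never seen.

```python
def boost_by_topic(hits, topic_weights, limit, default_weight=1.0):
    """Re-score over-fetched hits by topic and keep the top `limit`.

    `hits` is a list of dicts with "topic" and "score" keys (assumed shape).
    """
    rescored = []
    for hit in hits:
        weight = topic_weights.get(hit["topic"], default_weight)
        # float() in case the score arrives as a string.
        rescored.append({**hit, "boosted_score": float(hit["score"]) * weight})
    # list.sort() is stable, so ties keep their original retrieval order.
    rescored.sort(key=lambda h: h["boosted_score"], reverse=True)
    return rescored[:limit]


# Made-up candidates, as if fetched with a limit larger than the final one.
candidates = [
    {"title": "A", "topic": "Global", "score": 2.0},
    {"title": "B", "topic": "Local", "score": 1.5},
    {"title": "C", "topic": "Politics", "score": 1.8},
    {"title": "D", "topic": "Local", "score": 0.5},
]
top = boost_by_topic(candidates, {"Local": 3.0, "Global": 2.0}, limit=2)
print([h["title"] for h in top])  # ['B', 'A']
```

Here “B” overtakes “A” only because of the Local weight; with a plain limit of 1 and no over-fetching, “B” would never have been returned.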

Hi @vamsi -

Would something like this work? You can weight the query by repeating words as that will affect the score.

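import json

import weaviate

# Assumes a local Weaviate instance and the v3 Python client.
client = weaviate.Client("http://localhost:8080")

# The second query repeats "food" to weight it more heavily.
for query_text in ["fresh food", "fresh food food food"]:
    response = (
        client.query
        .get("JeopardyQuestion", ["question", "answer"])
        .with_bm25(
            query=query_text,
            properties=["question", "answer"],
        )
        .with_additional("score")
        .with_limit(3)
        .do()
    )

    print(json.dumps(response, indent=2))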
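If it helps, the repeated-word query could be generated rather than typed by hand. A small sketch: `repeat_terms` is a hypothetical helper, not part of the client, and only whole-number weights make sense with this trick.

```python
def repeat_terms(term_weights):
    """Build a query string in which each term appears `weight` times."""
    words = []
    for term, weight in term_weights.items():
        if not isinstance(weight, int) or weight < 1:
            raise ValueError(f"weight for {term!r} must be a positive integer")
        words.extend([term] * weight)
    return " ".join(words)


query_text = repeat_terms({"fresh": 1, "food": 3})
print(query_text)  # fresh food food food
```

The resulting string can be passed as `query` to `with_bm25` just like the hand-written one.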

(Sorry about my earlier response - which I’ve since deleted, I was confused at the time :sweat_smile:)