Different query weightings on properties inside one collection

For example, I have title, description and review properties in one book collection. When I run a near_text query, is there any way to assign a different weight to each field? For example, title is 5, description is 2 and review is 1.

I know there is rerank, but with weighting the first query would already return the CORRECT set of results. I think this is better than increasing the limit to 3 or 5 times its size and then using rerank to narrow back down to the best results.

Thanks,

Hi @Lawrence_Hope,

Yes, but only for keyword-based queries so far.

Weights for keyword-based queries

You can add weights to a query that uses keyword search – bm25, or hybrid (which combines keyword and vector search).

Here is a Python example from the docs, using hybrid search:

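# Assumes `client` is an already-connected Weaviate Python client (v4 API).
jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
    query="food",
    query_properties=["question^2", "answer"],  # "^2" boosts matches on "question"
    alpha=0.25,
    limit=3
)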

The key element is the ^2 part, which tells Weaviate to double the score for a match on that property. You can add boosts like this to several properties:

query_properties=["title^3", "genre", "author^2"],
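To make the effect of the boosts concrete, here is a rough sketch in plain Python. It only illustrates the idea of a per-property multiplier and is not how Weaviate computes scores internally; `parse_boost`, `weighted_score` and the per-property scores are all made up for the illustration.

```python
def parse_boost(spec: str) -> tuple[str, float]:
    """Split 'title^3' into ('title', 3.0); a bare name gets weight 1.0."""
    name, sep, weight = spec.partition("^")
    return name, (float(weight) if sep else 1.0)


def weighted_score(per_property: dict[str, float], query_properties: list[str]) -> float:
    """Sum hypothetical per-property keyword scores, each multiplied by its boost."""
    total = 0.0
    for spec in query_properties:
        name, weight = parse_boost(spec)
        total += weight * per_property.get(name, 0.0)
    return total


# Made-up keyword scores for two books, one entry per property.
book_a = {"title": 1.0, "author": 0.0, "genre": 0.5}
book_b = {"title": 0.0, "author": 1.0, "genre": 1.0}

props = ["title^3", "genre", "author^2"]
score_a = weighted_score(book_a, props)  # title match dominates
score_b = weighted_score(book_b, props)  # author and genre matches
```

With these made-up numbers, the book that matches on title ends up ahead of the one that matches on author and genre, because the title boost is the largest.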

No weights for vector search (yet… see below)

However, for vector search like near_text, you cannot give extra weight to matches on specific properties.
This is because each object has one vector embedding. When you run a vector search, there is no way to tell which field a match came from, as we search on the whole embedding.

NamedVectors coming soon

As an FYI, we are working to add support for multiple vector embeddings per object, so you could have a separate embedding for a title, and another one for description, and a third one that combines multiple properties.

Note that in the first release, we will only allow you to search on one named vector at a time, so you won’t be able to weight several vectors against each other just yet. But we are planning to add mixed vector search next.
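For a sense of what weighting across several embeddings could look like, here is a client-side sketch in plain Python. It is not a Weaviate API and says nothing about how mixed vector search will actually work; it assumes you already have one embedding per property, and the function names and toy 2-D vectors are invented for the example. The weights are the ones from the question (title 5, description 2, review 1).

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def weighted_similarity(query_vec, field_vecs, weights):
    """Weighted average of the query's similarity to each per-field embedding."""
    total_weight = sum(weights.values())
    weighted_sum = sum(
        w * cosine(query_vec, field_vecs[field]) for field, w in weights.items()
    )
    return weighted_sum / total_weight


weights = {"title": 5, "description": 2, "review": 1}
query = [1.0, 0.0]

# Toy per-field embeddings for two books (hypothetical values).
book_a = {"title": [1.0, 0.0], "description": [0.0, 1.0], "review": [1.0, 0.0]}
book_b = {"title": [0.0, 1.0], "description": [1.0, 0.0], "review": [1.0, 0.0]}

sim_a = weighted_similarity(query, book_a, weights)
sim_b = weighted_similarity(query, book_b, weights)
```

Here the book whose title embedding matches the query ranks above the one that only matches on description and review, which is the behaviour the question asks for.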

I hope this helps 🙂

Great to know this feature is planned. It will help Weaviate stand out from other vector databases.