How do I improve hybrid search on Weaviate? Been poking at this for too long but haven't made much headway

aritraban · April 22, 2024, 10:36pm

I’ve been working on hybrid search using Weaviate, with OpenAI’s latest embeddings model among other things. The problem is that for some queries I would like to focus on specific properties, while for others I would like to weight other properties more. I also have two axes on which I want to rank the recommendations: relevance and excellence.

Relevance would be how relevant they are to my search, and excellence would be how excellent the “document” is based on some score that I give it.
So far, the things I’ve tried are:

Cohere reranking. The v3 reranker gave marginally better results for short queries than Weaviate’s hybrid score.
Shifting the alpha value based on word count: more of a keyword search for shorter queries, more of a semantic search for longer ones.
Assigning weights for keyword search in Weaviate.
Sorting by a linear combination of my in-house eval score and relevancy (reranked score or hybrid score). This didn’t provide satisfactory results at all.
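The word-count-based alpha shift from the list above could be sketched roughly as follows. This is only an illustration: the function name, the word-count thresholds, and the alpha bounds are all assumptions, not values from this thread. It relies on Weaviate's convention that alpha = 0 is pure keyword search and alpha = 1 is pure vector search.

```python
def alpha_for_query(
    query: str,
    min_alpha: float = 0.25,   # assumed alpha for short, keyword-like queries
    max_alpha: float = 0.75,   # assumed alpha for long, descriptive queries
    short_words: int = 3,      # assumed: at or below this, treat as "short"
    long_words: int = 12,      # assumed: at or above this, treat as "long"
) -> float:
    """Map query length to a hybrid alpha (0 = keyword only, 1 = vector only)."""
    n = len(query.split())
    if n <= short_words:
        return min_alpha
    if n >= long_words:
        return max_alpha
    # Interpolate linearly between the two bounds for mid-length queries.
    frac = (n - short_words) / (long_words - short_words)
    return min_alpha + frac * (max_alpha - min_alpha)
```

A smooth ramp like this avoids a hard jump in ranking behavior when a query gains or loses one word; the thresholds would need tuning against real queries.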

Are there any suggestions based on which I could try improving the Search results? I want:

To be able to “understand” what the query is for, and focus on that property more in my vector DB schema for the search
For common queries, I want to surface more “excellent” recommendations. When a query is common, a result that meets a certain level of relevancy and is really excellent is the better choice, and looks better in search results, than one that is merely very, very relevant
For larger/more niche queries, focus on the relevancy a lot more
I think fine tuning Cohere’s reranking model might be an option here?
How do I factor in the excellence?
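One possible way to factor in excellence, matching the wish list above, is a two-stage ranking: first keep only candidates above a relevance floor, then sort by a blend whose excellence weight is high for common queries and low for niche ones. The sketch below is an assumption-laden illustration: `Candidate`, `rank`, the floor, and the weights are hypothetical, and both scores are assumed to be normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    relevance: float   # assumed normalized to [0, 1] (hybrid or reranker score)
    excellence: float  # assumed normalized to [0, 1] (in-house score)


def rank(candidates, relevance_floor=0.5, excellence_weight=0.7):
    """Drop candidates below the relevance floor, then sort by a blended score."""
    kept = [c for c in candidates if c.relevance >= relevance_floor]

    def blended(c):
        return (1 - excellence_weight) * c.relevance + excellence_weight * c.excellence

    return sorted(kept, key=blended, reverse=True)


pool = [
    Candidate("a", relevance=0.9, excellence=0.2),
    Candidate("b", relevance=0.6, excellence=0.9),
    Candidate("c", relevance=0.3, excellence=1.0),  # excellent but off-topic
]
common_query_order = [c.doc_id for c in rank(pool, excellence_weight=0.7)]
niche_query_order = [c.doc_id for c in rank(pool, excellence_weight=0.1)]
```

The difference from a plain linear combination is the floor: an excellent but barely relevant document can never outrank a relevant one, which may be why the unfiltered linear blend gave unsatisfying results.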

What are my options here, and where do I go from here? Also, I’ve been checking the distributions of scores that Weaviate returns for hybrid search. In a lot of cases, if I return, say, the top 800 people for a query, most of them (~500-700) have vector/keyword scores below 0.3, or have no keyword or vector score at all, in which case that score is just 0.

Also, will making more than one schema help? Currently, I have all properties in one schema for my objects. Would splitting them into different schemas, running parallel queries, and aggregating the scores help?
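If parallel queries are tried, the aggregation step could look something like the sketch below. It assumes each query returns a mapping from object ID to raw score, min-max normalizes each result set so that differently scaled scores become comparable, and then sums them with per-query weights. The names `min_max` and `fuse`, the query labels, and the weights are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one result set's scores to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as a full match for this query.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(
    result_sets: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores; objects missing from a set add 0."""
    fused: dict[str, float] = {}
    for name, scores in result_sets.items():
        w = weights.get(name, 1.0)
        for doc_id, s in min_max(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


results = {
    "skills": {"a": 0.8, "b": 0.4, "c": 0.2},  # hypothetical per-query scores
    "bio": {"b": 0.9, "c": 0.3},
}
ranked = fuse(results, weights={"skills": 2.0, "bio": 1.0})
```

Normalizing per result set also softens the long tail of low raw scores described above, since ranks within each set matter more than absolute values.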

DudaNogueira · April 23, 2024, 1:31pm

Hi @aritraban!

Welcome to our community!

Did you know you can set weights on specific fields?

Here is how:

Also, you can use the explainScore metadata to understand why an object was selected for your query. Here is how:
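A minimal sketch of both ideas, assuming the Weaviate Python client v4, where `collection.query.hybrid` accepts `query_properties` with a `^` boost suffix (applied to the keyword part of the search) and a `return_metadata` argument. The helper names and the property names (`skills`, `bio`) are hypothetical, and the collection is passed in so the helper itself does not depend on a running instance.

```python
def boosted_properties(weights: dict[str, int]) -> list[str]:
    """Turn {"skills": 3, "bio": 1} into ["skills^3", "bio"]."""
    return [name if w == 1 else f"{name}^{w}" for name, w in weights.items()]


def hybrid_search(collection, query, weights, alpha=0.5, limit=10,
                  return_metadata=None):
    """Run a hybrid query with per-property boosts on the given collection."""
    kwargs = dict(
        query=query,
        alpha=alpha,  # 0 = keyword only, 1 = vector only
        query_properties=boosted_properties(weights),
        limit=limit,
    )
    if return_metadata is not None:
        kwargs["return_metadata"] = return_metadata
    return collection.query.hybrid(**kwargs)


# Usage against a real v4 collection (assumed names, not run here):
#   from weaviate.classes.query import MetadataQuery
#   response = hybrid_search(
#       people, "rust developer", {"skills": 3, "bio": 1},
#       return_metadata=MetadataQuery(score=True, explain_score=True),
#   )
#   for obj in response.objects:
#       print(obj.properties, obj.metadata.score, obj.metadata.explain_score)
```

Each returned object carries its own score explanation in its metadata, so the object's properties and its explanation can be inspected side by side.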

Please, let me know if this helps or if you need further assistance.

Thanks!

aritraban · April 23, 2024, 1:49pm

Yes, I’ve played around with weights. I think Weaviate currently makes a single vector for each object, is that correct? One thing I could do is use named vectors to have multiple vector spaces for each object?

Also, how do I see the “documents” that are referenced in “explainScore”?
@DudaNogueira
