How do I improve hybrid search on Weaviate? Been poking at this for too long but haven't made much headway

I’ve been working on hybrid search using Weaviate, with OpenAI’s latest embeddings model plus a few other components. The problem is that for some queries I would like to focus on specific properties, while for others I would like to weight different properties more heavily. I also have two axes on which I want to rank the recommendations: relevance and excellence.

Relevance is how well a result matches my search, and excellence is how good the “document” is according to a score that I assign to it.
So far, the things I’ve tried are:

  1. Cohere reranking. I saw that v3 reranking gave marginally better results for short queries than Weaviate’s hybrid score
  2. For shorter queries, I lean towards keyword search, and for longer queries, towards semantic search (shifting the alpha value based on word count)
  3. Assigning weights to properties for keyword search in Weaviate
  4. Using a linear combination of my in-house excellence score and relevancy (reranked score or hybrid score) and sorting on that. This didn’t give satisfactory results at all.
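
For reference, the word-count heuristic in item 2 could be sketched roughly as below. The function name, the word-count thresholds, and the alpha bounds are illustrative assumptions, not values taken from this thread. The only Weaviate-specific fact relied on is that `alpha=0` is pure keyword (BM25) search and `alpha=1` is pure vector search.

```python
def alpha_for_query(
    query: str,
    min_alpha: float = 0.25,   # assumed alpha for very short queries (keyword-leaning)
    max_alpha: float = 0.75,   # assumed alpha for long queries (vector-leaning)
    short_words: int = 2,      # assumed cut-off for a "short" query
    long_words: int = 10,      # assumed cut-off for a "long" query
) -> float:
    """Interpolate alpha linearly between min_alpha and max_alpha by word count."""
    n_words = len(query.split())
    if n_words <= short_words:
        return min_alpha
    if n_words >= long_words:
        return max_alpha
    frac = (n_words - short_words) / (long_words - short_words)
    return min_alpha + frac * (max_alpha - min_alpha)
```

The returned value would then be passed as the `alpha` argument of the hybrid query.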

Are there any suggestions for improving the search results? I want:

  1. To be able to “understand” what the query is for, and focus on the matching property in my vector DB schema for the search
  2. For common queries, to surface more “excellent” recommendations. When a query is common, a result that meets a certain level of relevancy and is also really excellent is a better choice than the most relevant result, and it looks really good in search results
  3. For larger or more niche queries, to focus on relevancy a lot more
  4. I think fine-tuning Cohere’s reranking model might be an option here?
  5. How do I factor in the excellence?
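
One way items 2 and 3 could be expressed, as an alternative to a linear combination, is a relevance gate followed by an excellence sort. This is only a sketch under stated assumptions: the `Candidate` type, the `is_common_query` flag, and the `relevance_floor` value are hypothetical, and both scores are assumed to be normalised to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    id: str
    relevance: float   # assumed: reranker or hybrid score scaled to [0, 1]
    excellence: float  # assumed: in-house quality score scaled to [0, 1]


def rank(candidates, is_common_query, relevance_floor=0.5):
    """Common queries: excellence first among 'relevant enough' results.
    Niche queries: relevance first, excellence only breaks ties."""
    if not is_common_query:
        return sorted(candidates, key=lambda c: (c.relevance, c.excellence), reverse=True)
    passing = [c for c in candidates if c.relevance >= relevance_floor]
    failing = [c for c in candidates if c.relevance < relevance_floor]
    passing.sort(key=lambda c: (c.excellence, c.relevance), reverse=True)
    failing.sort(key=lambda c: c.relevance, reverse=True)
    # Results below the floor are kept, but always after the gated ones.
    return passing + failing
```

How to decide that a query is "common" (for example from query logs) and where to set the floor are left open here; both would need tuning against real judgments.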

What are my options here, and where do I go from here? Also, I’ve been checking the distributions of the scores that Weaviate returns for the hybrid search. In a lot of cases, if I return, say, the top 800 people for my query, most of them (~500-700) either have vector/keyword scores below 0.3 or are missing a keyword or vector score altogether, in which case that score is just 0.

Also, would making more than one schema help? Currently, I have all properties in one schema for my objects. Would creating different schemas, running parallel queries, and aggregating the scores help?

hi @aritraban !

Welcome to our community :hugs: !!

Did you know you can set weights on specific fields?

Here is how:
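
A minimal sketch of field weighting, assuming the `property^weight` boost syntax that Weaviate accepts in the list of searched properties. The helper name and the property names `title` and `bio` are hypothetical. As far as I know, these boosts apply to the keyword (BM25) part of the hybrid search.

```python
def boosted_properties(weights: dict[str, int]) -> list[str]:
    """Turn {"title": 3, "bio": 1} into ["title^3", "bio"]."""
    props = []
    for name, weight in weights.items():
        if weight < 1:
            raise ValueError(f"weight for {name!r} must be >= 1")
        # A weight of 1 needs no suffix; higher weights use the ^ boost syntax.
        props.append(name if weight == 1 else f"{name}^{weight}")
    return props


props = boosted_properties({"title": 3, "bio": 1})

# With the v4 Python client, this list would be passed along the lines of:
#   collection.query.hybrid(query="...", alpha=0.5, query_properties=props)
```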

Also, you can request some metadata to understand why an object was selected for your query, using explainScore. Here is how:
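
A rough sketch of a GraphQL hybrid query that asks for `score` and `explainScore` under `_additional`. The builder function, the class name `Person`, and the property names are hypothetical placeholders; only the `hybrid` arguments and the `_additional` fields are meant to reflect Weaviate’s GraphQL API.

```python
import json


def build_hybrid_query(class_name, query, alpha, properties, return_fields, limit=10):
    """Build a GraphQL Get query string that also requests score explanations."""
    # json.dumps gives correctly quoted and escaped strings and lists.
    hybrid_args = (
        f"query: {json.dumps(query)}, alpha: {alpha}, "
        f"properties: {json.dumps(properties)}"
    )
    fields = " ".join(return_fields)
    return (
        "{ Get { "
        f"{class_name}(hybrid: {{{hybrid_args}}}, limit: {limit}) "
        f"{{ {fields} _additional {{ score explainScore }} }} "
        "} }"
    )


gql = build_hybrid_query("Person", "ml engineer", 0.5, ["title^2", "bio"], ["name"])
print(gql)
```

In the v4 Python client, the equivalent would be to pass `return_metadata=MetadataQuery(score=True, explain_score=True)` to `collection.query.hybrid(...)` and read `obj.metadata.explain_score` on each returned object.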

Please, let me know if this helps or if you need further assistance.

Thanks!

Yes, I’ve played around with weights. I think Weaviate currently creates a single vector for each object, is that correct? One thing I could do is use named vectors to get multiple vector spaces for each object?

Also, how can I see the “documents” that are referenced in “explainScore”?
@DudaNogueira