How do I improve hybrid search on Weaviate? Been poking at this for too long but haven't made much headway

I’ve been working on hybrid search using Weaviate. I use OpenAI’s latest embeddings model, along with a few other components. The problem is that for some queries I would like to focus on specific properties, while for others I would like to weight different properties more heavily. I also have two axes on which I want to rank the recommendations: relevance and excellence.

Relevance would be how relevant they are to my search, and excellence would be how excellent the “document” is based on some score that I give it.
So far, the things I’ve tried are:

  1. Cohere reranking. I saw that v3 reranking gave marginally better results for short queries than the “Hybrid Score” for Weaviate
  2. For shorter queries, I do more of a keyword search and for longer queries, more of a semantic search (shifting the alpha value based on word count)
  3. Assigning weights for keyword search in Weaviate
  4. Tried using a linear combination of my in-house eval and relevancy (reranked score/hybrid score) and sorted based on that. This didn’t provide satisfactory results at all.
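
For reference, item 2 can be sketched roughly as below. This is only an illustration: the function name, the word-count cutoffs, and the alpha bounds are made-up values rather than my exact settings. It relies on Weaviate's convention that alpha = 0 is pure keyword search and alpha = 1 is pure vector search.

```python
def alpha_for_query(query: str, short_len: int = 3, long_len: int = 12,
                    min_alpha: float = 0.25, max_alpha: float = 0.75) -> float:
    """Map a query's word count to a hybrid alpha value.

    Short queries lean towards keyword search (low alpha), long queries
    lean towards vector search (high alpha). All thresholds here are
    hypothetical and would need tuning on real queries.
    """
    n_words = len(query.split())
    if n_words <= short_len:
        return min_alpha
    if n_words >= long_len:
        return max_alpha
    # Linear interpolation between the two bounds for mid-length queries.
    frac = (n_words - short_len) / (long_len - short_len)
    return min_alpha + frac * (max_alpha - min_alpha)
```

The returned value would then be passed as the `alpha` argument of the hybrid query.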

Are there any suggestions for improving the search results? I want:

  1. To be able to “understand” what the query is for, and focus on that property more in my vector DB schema for the search
  2. For common queries, I want to surface more “excellent” recommendations. When a query is common, a result that clears a certain level of relevancy and is really excellent is a better pick than one that is merely very relevant, and it looks really good in search results
  3. For larger/more niche queries, focus on the relevancy a lot more
  4. I think fine tuning Cohere’s reranking model might be an option here?
  5. How do I factor in the excellence?
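
To make wants 2 and 3 concrete, here is a rough sketch of the behaviour I am after: a relevance gate followed by a blend whose excellence weight depends on the query type. It is an illustration under assumptions, not something I have validated. The `Candidate` class, the `rank` function, and the threshold and weight values are all hypothetical, and both scores are assumed to be normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    relevance: float   # reranker or hybrid score, assumed normalized to [0, 1]
    excellence: float  # in-house quality score, assumed normalized to [0, 1]


def rank(candidates, min_relevance, excellence_weight):
    """Drop candidates below the relevance gate, then sort by a blended score.

    excellence_weight close to 1 favours excellent documents (common queries);
    close to 0 favours relevance (niche queries).
    """
    eligible = [c for c in candidates if c.relevance >= min_relevance]

    def blended(c):
        return (1 - excellence_weight) * c.relevance + excellence_weight * c.excellence

    return sorted(eligible, key=blended, reverse=True)


candidates = [
    Candidate("a", relevance=0.9, excellence=0.2),
    Candidate("b", relevance=0.6, excellence=0.95),
    Candidate("c", relevance=0.3, excellence=1.0),
]
# Common query: require a relevance floor, then let excellence dominate.
common = [c.doc_id for c in rank(candidates, min_relevance=0.5, excellence_weight=0.8)]
# Niche query: no floor, relevance dominates.
niche = [c.doc_id for c in rank(candidates, min_relevance=0.0, excellence_weight=0.1)]
```

The difference from the plain linear combination in my earlier attempt is the gate: a highly excellent but barely relevant document never enters the ranking for a common query.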

What are my options here, and where do I go from here? Also, I’ve been checking the distributions of scores returned by Weaviate for the hybrid search. In a lot of cases, if, say, I return the top 800 people for my query, most of them (~500-700) either have vector/keyword scores below 0.3 or have no keyword or vector score at all, in which case that score is just 0.

Also, would making more than one schema help? Currently, all properties for my objects live in one schema. Would splitting them into different schemas, running parallel queries, and aggregating the scores help?
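
By "aggregating" I mean something like the sketch below, which merges the ranked ID lists from several parallel queries with reciprocal rank fusion. This is a generic technique I am using for illustration, not Weaviate's own aggregation, and the function name and the constant `k = 60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by several queries rise to the top.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from three parallel queries over different schemas.
fused = reciprocal_rank_fusion([
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
])
```

Because it only uses ranks, this sidesteps the problem of the raw scores from different queries not being comparable.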

Did you know you can set weights on specific fields?

Here is how:
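
A minimal sketch, assuming the `property^weight` boost syntax that hybrid and BM25 queries accept in their property list. The helper name and the property names (`title`, `skills`, `bio`) are hypothetical and only illustrate the idea.

```python
def weighted_properties(weights: dict[str, float]) -> list[str]:
    """Turn {"title": 3, "bio": 1} into ["title^3", "bio"].

    A weight of 1 means no boost, so the property name is left bare.
    """
    props = []
    for name, weight in weights.items():
        if weight == 1:
            props.append(name)
        else:
            props.append(f"{name}^{weight:g}")
    return props


props = weighted_properties({"title": 3, "skills": 2, "bio": 1})

# Assumed usage with the Python client v4 (not executed here):
# collection.query.hybrid(query="data engineer", query_properties=props, alpha=0.5)
```

Note that these weights affect only the keyword (BM25) part of a hybrid search, not the vector part.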

Also, you can use the explainScore metadata to understand why an object was selected for your query. Here is how:
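
A minimal sketch that builds a GraphQL hybrid query requesting `score` and `explainScore` under `_additional`. The helper name, the `Person` collection, and the field names are hypothetical. The query string would be sent to the GraphQL endpoint of your instance.

```python
import json


def build_hybrid_query(collection, query, properties, alpha=0.5, limit=5,
                       fields=("title",)):
    """Return a GraphQL hybrid query string that asks for score metadata."""
    return (
        "{ Get { %s(hybrid: {query: %s, alpha: %s, properties: %s}, limit: %d) "
        "{ %s _additional { score explainScore } } } }"
        % (
            collection,
            json.dumps(query),             # quoted and escaped query string
            alpha,
            json.dumps(list(properties)),  # e.g. ["title^2", "bio"]
            limit,
            " ".join(fields),
        )
    )


gql = build_hybrid_query("Person", "data engineer", ["title^2", "bio"])
```

With the Python client v4, the equivalent should be passing `return_metadata=MetadataQuery(score=True, explain_score=True)` to `collection.query.hybrid(...)`.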

Please let me know if this helps or if you need further assistance.


Yes, I’ve played around with weights. I think Weaviate currently creates a single vector for each object, is that correct? One thing I could do is use named vectors to have multiple vector spaces for each object?

Also, how can I see the “documents” that are referenced in “explainScore”?