I’ve been working on hybrid search using Weaviate. I use OpenAI’s latest embeddings model, along with some other components. The problem is that for some queries I would like to focus on specific properties, while for others I would like to weight other properties more heavily. I also have two axes on which I want to rank the recommendations: relevance and excellence.
Relevance would be how relevant they are to my search, and excellence would be how excellent the “document” is based on some score that I give it.
So far, the things I’ve tried are:
- Cohere reranking. I saw that v3 reranking gave marginally better results for short queries than the “Hybrid Score” for Weaviate
- For shorter queries, I do more of a keyword search and for longer queries, more of a semantic search (shifting the alpha value based on word count)
- Assigning weights for keyword search in Weaviate
- Using a linear combination of my in-house eval and relevancy (reranked score/hybrid score) and sorting on that. This didn’t provide satisfactory results at all.
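To make the alpha shift and the linear combination above concrete, here is a minimal Python sketch. The word-count cutoffs, the alpha range, and the blend weight are illustrative assumptions, not the values actually used, and the function names are hypothetical. It relies only on the fact that in Weaviate hybrid search an alpha of 0 is pure keyword search and an alpha of 1 is pure vector search.

```python
def alpha_for_query(query: str, short_len: int = 3, long_len: int = 12,
                    min_alpha: float = 0.25, max_alpha: float = 0.75) -> float:
    """Map word count to a hybrid alpha (0 = pure keyword, 1 = pure vector).

    All four defaults are placeholder assumptions to be tuned.
    """
    n_words = len(query.split())
    if n_words <= short_len:
        return min_alpha          # short query: lean on keyword matching
    if n_words >= long_len:
        return max_alpha          # long query: lean on semantic matching
    # Interpolate linearly between the two extremes.
    frac = (n_words - short_len) / (long_len - short_len)
    return min_alpha + frac * (max_alpha - min_alpha)


def blended_score(relevance: float, excellence: float, weight: float = 0.7) -> float:
    """Linear combination of relevance and the in-house excellence score.

    Assumes both inputs are already normalised to [0, 1]; `weight` is hypothetical.
    """
    return weight * relevance + (1.0 - weight) * excellence
```

One possible reason a blend like this disappoints is that the two scores live on different scales, so normalising both before combining them may be worth checking.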
Are there any suggestions for improving the search results? I want:
- To be able to “understand” what the query is for, and focus on that property more in my vector DB schema for the search
- For common queries, I want to be able to surface more “excellent” recommendations. If a query is common, then rather than focusing only on the very most relevant results, a result that meets a certain level of relevancy and is also really excellent is the best choice and looks really good in search results
- For larger/more niche queries, focus on the relevancy a lot more
- I think fine tuning Cohere’s reranking model might be an option here?
- How do I factor in the excellence?
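The behaviour wished for in the list above (excellence first for common queries once a relevancy bar is met, relevancy first for niche queries) could be sketched as a gated sort rather than a weighted sum. This is only one possible approach under stated assumptions: the `Candidate` type, the `relevance_floor` value, and the `is_common_query` flag are all hypothetical, and how to decide that a query is "common" is left open.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    doc_id: str
    relevance: float   # assumed normalised to [0, 1]
    excellence: float  # in-house score, assumed normalised to [0, 1]


def rank(candidates: List[Candidate], is_common_query: bool,
         relevance_floor: float = 0.5) -> List[Candidate]:
    """Rank by relevance for niche queries, by gated excellence for common ones."""
    if not is_common_query:
        # Niche query: relevance alone decides the order.
        return sorted(candidates, key=lambda c: c.relevance, reverse=True)

    # Common query: anything above the floor is "relevant enough",
    # and within that group excellence decides (relevance breaks ties).
    passed = [c for c in candidates if c.relevance >= relevance_floor]
    failed = [c for c in candidates if c.relevance < relevance_floor]
    passed.sort(key=lambda c: (c.excellence, c.relevance), reverse=True)
    failed.sort(key=lambda c: c.relevance, reverse=True)
    return passed + failed
```

With this shape, a highly excellent but barely relevant document can never outrank one that clears the floor, which a plain linear combination does not guarantee.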
What are my options here, and where do I go from here? Also, I’ve been checking the distributions of scores returned from Weaviate for the hybrid search. In a lot of cases, if I return, say, the top 800 people for my query, most of them (~500-700) either have vector/keyword scores below 0.3 or are missing a keyword or vector score entirely, in which case that score is just 0.
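A rough sketch of the kind of distribution check described above follows. It assumes the per-result vector and keyword scores have already been extracted into plain dicts; the keys `vector_score` and `keyword_score` and the bucket names are hypothetical, not Weaviate field names.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def score_buckets(results: Iterable[Dict[str, Optional[float]]],
                  low_cutoff: float = 0.3) -> Counter:
    """Count results with a missing score, with both scores low, or neither."""
    counts: Counter = Counter()
    for r in results:
        vec = r.get("vector_score")
        kw = r.get("keyword_score")
        if vec is None or kw is None:
            counts["missing_score"] += 1   # only one side of the hybrid matched
        elif vec < low_cutoff and kw < low_cutoff:
            counts["both_low"] += 1
        else:
            counts["other"] += 1
    return counts
```

If most of the returned results land in the first two buckets, a smaller result limit or a minimum-score cutoff before reranking might be worth trying.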
Also, will making more than one schema help? Currently, I have all properties in one schema for my objects. Would splitting them into different schemas, running parallel queries, and aggregating the scores help?
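For the parallel-query idea, the aggregation step might look like the sketch below. It assumes each query has already returned a mapping of object ID to a normalised score, and it uses a plain weighted sum; the property names and weights in the usage example are made up, and other fusion methods (for example rank-based fusion) are equally possible.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_scores(per_property_results: Dict[str, Dict[str, float]],
                     weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Weighted sum of per-property scores, keyed by object ID.

    An object missing from one query simply contributes 0 for that property.
    """
    totals: Dict[str, float] = defaultdict(float)
    for prop, results in per_property_results.items():
        w = weights.get(prop, 0.0)  # properties without a weight are ignored
        for doc_id, score in results.items():
            totals[doc_id] += w * score
    # Highest combined score first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical usage: per-query weights could shift depending on what
# the query is judged to be "about".
ranked = aggregate_scores(
    {"skills": {"a": 0.9, "b": 0.4}, "bio": {"b": 0.8, "c": 0.5}},
    weights={"skills": 0.75, "bio": 0.25},
)
```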