Issue with Vector Search Accuracy – Struggling with Negative Expressions

Hello everyone,

I am performing vector search on structured data using Weaviate and OpenAI’s Ada model for generating embeddings. However, I am facing an issue with accuracy, particularly when handling negative expressions in queries.

Issue:

For example, the queries:

  • “Give the name of the person who likes football”
  • “Give the name of the person who doesn’t like football”

return the same results, even though they should ideally be different. It seems that the model is not properly interpreting negation in queries.

What I’ve Tried:

I referred to the Weaviate HNSW tuning guide and adjusted parameters like efConstruction, ef, and maxConnections, but the issue persists.

Question:

How can I improve the accuracy of vector search to correctly handle negative expressions in queries? Are there any specific strategies, preprocessing steps, or alternative approaches to enhance the understanding of negation?

Any suggestions or insights would be greatly appreciated!

Thanks in advance.

Hi @Rohini_vaidya,

This is indeed a challenging problem in vector search. I would suggest considering Hybrid Search as an approach to improve accuracy, especially for queries involving negation. Weaviate supports combining vector search with keyword-based search, which can help capture specific terms like “doesn’t” or “not”. You can adjust the balance between vector and keyword search using the alpha parameter.
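
To make the role of alpha concrete, here is a minimal sketch of how a weighted blend of the two score sets behaves. It is a simplified illustration, not Weaviate's actual fusion code: the function names and the toy scores are made up for the example. As in Weaviate, alpha = 1 means pure vector search and alpha = 0 means pure keyword search.

```python
def normalize(scores):
    """Min-max scale a dict of doc_id -> score into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(vector_scores, keyword_scores, alpha=0.5):
    """Blend two score sets: alpha=1 is vector only, alpha=0 is keyword only."""
    v = normalize(vector_scores)
    k = normalize(keyword_scores)
    docs = set(v) | set(k)
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0) for d in docs}


def ranked(scores):
    """Return doc ids ordered from best to worst score."""
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy scores (hypothetical): the vector side barely separates the two documents,
# while the keyword side clearly prefers the one that matches the query terms.
vector_scores = {"likes_football": 0.91, "dislikes_football": 0.90}
keyword_scores = {"likes_football": 1.2, "dislikes_football": 3.4}

for alpha in (1.0, 0.75, 0.25, 0.0):
    print(alpha, ranked(hybrid_scores(vector_scores, keyword_scores, alpha)))
```

Lower alpha values let the keyword side decide the order, which only helps if the negation term actually survives tokenization and stopword handling on the keyword side.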

Additionally, have a look at the tokenization config for the properties you are searching on. The default is WORD, or you could use FIELD if that works better for your use case.

While you’ve already tuned HNSW parameters, it’s worth noting that the ef parameter is crucial for balancing search speed and quality. A higher ef value results in a more extensive search, enhancing accuracy but potentially slowing down the query.
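
One caveat on the HNSW side: ef, efConstruction, and maxConnections only affect how closely the index approximates the true nearest neighbours. They do not change the embeddings themselves. A quick check is to embed the two queries and compare the vectors directly. The sketch below assumes you already have both vectors; the short toy vectors are invented placeholders for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy placeholders (hypothetical values). Replace them with the embeddings
# of the "likes football" and "doesn't like football" queries.
likes_vec = [0.12, 0.80, 0.35, 0.44]
not_likes_vec = [0.10, 0.79, 0.38, 0.45]

similarity = cosine_similarity(likes_vec, not_likes_vec)
print(f"cosine similarity: {similarity:.4f}")
```

If the similarity is very close to 1, the two queries land in almost the same place in vector space, and no amount of index tuning will separate their results.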

Thank you @Mohamed_Shahin
I have tried both of the solutions you suggested, but unfortunately they are not working.

I am still not able to achieve the desired accuracy with hybrid search.

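from weaviate.classes.config import Property, DataType, Tokenization

Property(
    name="ABC",
    data_type=DataType.TEXT,
    vectorize_property_name=True,    # include the property name in the vectorized text
    tokenization=Tokenization.WORD,  # tokenization used for the keyword index
    index_filterable=True,           # allow filtering on this property
    index_searchable=True,           # allow keyword (BM25) and hybrid search
)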

Despite this configuration, I am still unable to achieve the desired accuracy.

Am I missing something? Are there any alternative approaches or tweaks I should try?

Any guidance would be greatly appreciated.

Thanks in advance!

Hey @Rohini_vaidya

It’s definitely a challenging issue. I did some digging and found that some models, such as the Snowflake embedding models, are trained more heavily on “hard negatives.” While that is not exactly the same as handling negation, they might perform better than the Ada model you are using now. I would give that a try as an option.

There’s also a general issue with negations in search. Amazon published a paper on this challenge and how fine-tuning can improve performance:

Another option, which could be worth trying first, is to add a final reranker or RAG stage that adjusts the results after retrieval.
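
A rough sketch of what such a post-retrieval stage could look like: over-fetch candidates, then re-score each (query, document) pair with a model that reads both texts together. Everything here is hypothetical. `rerank` and `toy_scorer` are illustrative names, and the toy scorer is a simple keyword stand-in for a real cross-encoder or LLM call, included only so the example runs.

```python
from typing import Callable, List


def rerank(query: str, candidates: List[str],
           score_fn: Callable[[str, str], float], top_k: int = 3) -> List[str]:
    """Re-order retrieved candidates by a (query, document) relevance score."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    # The sort is stable, so ties keep their original retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


NEGATION_WORDS = {"not", "no", "never", "doesn't", "don't", "dislikes"}


def has_negation(text: str) -> bool:
    """Very rough negation check; a real reranker model replaces this."""
    words = text.lower().replace("\u2019", "'").split()
    return any(w.strip(".,!?") in NEGATION_WORDS for w in words)


def toy_scorer(query: str, doc: str) -> float:
    """Stand-in scorer: 1.0 if query and document agree on polarity, else 0.0."""
    return 1.0 if has_negation(query) == has_negation(doc) else 0.0


# Pretend these came back from the vector or hybrid search (made-up data).
candidates = [
    "Alice likes football",
    "Bob doesn't like football",
    "Carol likes football and tennis",
]

query = "Give the name of the person who doesn't like football"
top = rerank(query, candidates, toy_scorer, top_k=1)
print(top)
```

The point of the design is that retrieval only needs to get the right documents somewhere into the candidate set; the second stage, which sees the query and the document together, is responsible for getting the polarity right.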

Hope this gives you a few ideas to explore!