Replica search GET query returns different results

darpan · July 18, 2023, 12:40am

Hello,

I have Weaviate 1.20.1 running with 2 replicas and 2 shards for a particular class with ~4000000 documents. Doing a dense vector search for weaviate-0 replica skips results with a higher certainty. I am testing each replica by port-forwarding from the pod.
So for example i have two documents DocA /DocB and Query .

certainty of Query & DocA = 0.881561815738678
certainty of Query & DocB = 0.7634845674037933

When i query weaviate-1 to get the top result, it correctly returns DocA for the Query . But when i query weaviate-0 it returns DocB as the closest, even though when i examine DocA in weaviate-0 (by adding the where filter)the certainty is 0.881561815738678 .

Let me know if more details are needed.
Thank you in advanced!

jphwang · July 18, 2023, 9:32am

Hi @darpan - that’s odd. I wonder if the query is being run with ConsistencyLevel.ONE somehow.

Could you please try running the same query with different consistency levels (ALL/QUORUM/ONE)?

In Python, for example - it would be set like:

.with_consistency_level(ConsistencyLevel.ALL)
.with_consistency_level(ConsistencyLevel.QUORUM)
.with_consistency_level(ConsistencyLevel.ONE)

darpan · July 26, 2023, 1:49pm

Thanks @jphwang for your quick reply.
Apologize for the delay in response, with all 3, it returns same results from each container.

darpan · July 26, 2023, 2:22pm

To reproduce the issue run the steps below, let me know if you come across anything or have any issues in setting this up:
Related to Slack

Create a new directory
Grab the .gz from this dataset and add it to the new directory gfissore/arxiv-abstracts-2021 at main
unzip contents of “reproduce_index.zip” (Located in this google drive in the new directory
run “docker-compose up -d” to create a 2 instance weaviate
run “python create_schema.py” to create the Article Schema
run “python index.py” to index all documents. This should take a little bit. To speed up, you can bump up the replicas for t2v-transformers if you have more resources.
Run the GQL query below against both http://localhost:6001/v1/graphql and http://localhost:6002/v1/graphql to see the difference

{
    Get {
        Article(
            nearText: {concepts: ["A pilgrimage to gravity on GPUs"]
            }
            limit: 12
        ) {
            title
            body
            _additional {
                isConsistent
                certainty
            }
        }
    }
}

jphwang · July 27, 2023, 10:02am

Hi @darpan I wrote a reply but I see you’re getting assistance from Parker and he would know much better then me

darpan · July 27, 2023, 12:35pm

Thanks @jphwang

I will update here when its resolved.

parkerduckworth · September 7, 2023, 4:30pm

@darpan would you be willing to upgrade to v1.21.2? We have since included some changes that improve the resiliency of replicated search.

I did setup a cluster to try and reproduce your issue according to the steps above, but everything seemed to work as expected for me.

Maybe the upgrade will clear up your issue

darpan · October 5, 2023, 7:08pm

Thanks @parkerduckworth for taking a look at this, i know this is a difficult one.

I have had to scale down the instance to single node for our production for now to avoid complaints. When I upgrade/recreate the schema in the next few weeks, i will try to use the latest version and see if the issue is still occurring.

Will update here when i find something.

spark · March 21, 2024, 11:14am

Hi @jphwang, could you please clarify my query?
Let’s say. I have created a schema by ingesting into weaviate database. when I’m searching a for a query, everytime it provides different number of recommendations. Although they are correct but I somewhat want to fix those recommendations. Is it possible?

jphwang · March 21, 2024, 11:47am

Hi @spark and welcome!

Would you mind clarifying your query a bit further? Would you have a code example to share, and perhaps let us know in what way the results vary?

Thanks!

spark · March 21, 2024, 12:06pm

let’s say if I query using “recommend me a Louis Vuitton handbag for women”. Now I know that the data I ingested does contain these products. But, i want the number of recommendations to be fixed not stochastic. If I get two specific Loius Vuitton handbag first time while retrieving, I want those two specifically to be recommended everytime I search for the same mentioned query. Is there any way?

jphwang · March 21, 2024, 12:38pm

Hmm. The search itself in Weaviate should be deterministic.

But, are you using a generate query? If you are, those use large language models under-the-hood - which are mostly not completely deterministic.

But some model providers let you tune the sampling. So you could make them less stochastic using model parameters such as temperature or top_p.

Does that help?

References:

https://platform.openai.com/docs/api-reference/chat/create

spark · April 9, 2024, 5:51am

Thanks @jphwang , but one thing I’m really confused about that everytime the same query is retrieving different products…1-2 might be similar but rest are different everytime…this is before any openai api call, I understand that setting temperature to 0 and top_p to a much lower value and fixing a seed makes the responses deterministic…but this is occuring from weaviate hybrid search level itself…Do you have any advice for this?

jphwang · April 9, 2024, 8:36am

@spark - I think that’s unusual. Can you share examples of queries, and results?

If you can include the score metadata in the result as well, that might be helpful to us. (Hybrid search | Weaviate - Vector Database)

spark · April 16, 2024, 8:33pm

Thanks @jphwang , we finally found the reason of stochasticity of product recommendations…the hybrid search of weaviate fully determinsitic what i’ve observed…it’s totally becasue of the non-deterministicity of the generative llm which is applied on top of hybrid search

jphwang · April 16, 2024, 10:13pm

Thanks for coming back and clarifying that for us . I’m glad that you got to the bottom of the issue!

Topic		Replies	Views
[Non deterministic vector search return] Support	4	592	April 12, 2024
Messing up search results under parallel write operations! Support bug , python , technical	6	355	August 7, 2025
Different score for the same entry on different replicas General	1	278	March 20, 2024
Empty search results when one node (of 3) is not fully replicated Support technical	2	287	September 6, 2024
Simple keyword search not working Support	4	1284	September 14, 2023

Replica search GET query returns different results

Related topics