Tested with read consistency QUORUM and ALL, with async replication disabled.
1) Why does repair-on-read work for search by filter but not for vector search?
2) Setup (3-node cluster): Node A is down, objects are inserted into Node B, and then Node A is brought back up.
The search operations I performed:
Vector search
When the coordinator is the non-replica node, Node C, I sometimes get results and sometimes do not (possibly because the HNSW index search is directed to different replicas). When I did get results, I expected a repair on read, but Node A still had no objects (no repair). How does the coordinator select which replica of a shard to ask for the vector (HNSW) index search when multiple replicas are available?
When the coordinator is a replica with 0 objects, that is Node A, I got 0 objects in the response every time (local HNSW index search). Node A was not repaired either; even though it returned 0 objects, I expected the response from Node B to repair it.
When the coordinator is a replica that has the objects, that is Node B (local index first), I got the objects in the response every time, but Node A was still not repaired and had 0 objects (no repair).
Search using filter/ID
When the coordinator is the replica with 0 objects, Node A, I got 0 objects every time. After the search I expected it to have received the objects from Node B, but it had not.
When the coordinator is the replica with the objects, Node B, or the non-replica node, Node C, I got the objects every time, and the objects were replicated by repair-on-read regardless of whether the coordinator was Node B or Node C.
Node A was down during insertion, so Node B got the data and its index was updated. When I query later, Node B hosts the data and Node A does not. The coordinator uses Node B’s replica of the shard for the index search and returns hits. I then expected the fetch to go to all replicas, detect that Node A is missing data, and trigger a repair, but it didn’t.
This env variable solved the issue of the search using only the local index, but why is it not mentioned in the docs? I still have the problem with read repair in vector search, though.
I believe that repair-on-read in Weaviate is triggered by the consistency check that runs after search results are returned. The critical difference between filter/ID searches and vector searches is:
Filter/ID searches go through objectSearchByShard which calls CheckConsistency
Vector searches also call CheckConsistency, but only on the final merged results.
The behavior you’re observing is by design in the current implementation. Vector search prioritizes performance by using local indexes when available, but this means repair-on-read doesn’t work when the local index is incomplete. The CheckConsistency method only validates the objects that were returned, not objects that should have been returned but weren’t.
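To make the last point concrete, here is a minimal toy model in Python. It is not Weaviate code: the `Replica` class and the `check_consistency` and `vector_search` functions are hypothetical stand-ins, and the only assumption carried over from the explanation above is that the consistency check iterates over the ids the search returned.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy stand-in for one replica of a shard (hypothetical, not Weaviate code)."""
    name: str
    objects: dict = field(default_factory=dict)  # object id -> payload

    def local_vector_search(self):
        # A real HNSW search ranks by distance; here every locally indexed
        # object counts as a hit, which is enough for the argument.
        return list(self.objects)


def check_consistency(returned_ids, replicas):
    """Copy each *returned* object to any replica that is missing it."""
    repaired = []
    for obj_id in returned_ids:
        source = next(r for r in replicas if obj_id in r.objects)
        for replica in replicas:
            if obj_id not in replica.objects:
                replica.objects[obj_id] = source.objects[obj_id]
                repaired.append((replica.name, obj_id))
    return repaired


def vector_search(coordinator, replicas):
    # Assumption: the coordinator searches its own local index first.
    hits = coordinator.local_vector_search()
    return hits, check_consistency(hits, replicas)


node_a = Replica("A")  # was down during the insert
node_b = Replica("B", {"obj-1": {"title": "inserted while A was down"}})

hits, repaired = vector_search(node_a, [node_a, node_b])
print(hits, repaired, node_a.objects)  # prints: [] [] {}
```

With Node A as coordinator the result set is empty, so the check has no ids to compare and nothing is repaired. The sketch only illustrates that one case; it does not account for the runs where Node B returned hits and Node A still stayed empty.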
For your scenario where Node A was down during insertion, you would need to either:
Use filter-based searches to trigger repair
Manually trigger repair through other mechanisms
Wait for async replication to sync (though you mentioned it’s disabled)
Even when the local shard has returned objects from its local HNSW index, the search still has to be performed on all replicas required by the read consistency level, right? The coordinator must then have received the objects from one replica and none from another. That is a clear inconsistency, so I think repair-on-read should happen. Or is my understanding wrong?
If you want to achieve full consistency even after a node outage, why not consider enabling async replication? It proactively repairs out-of-sync shards in the background, rather than relying solely on repair-on-read.
Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00/+01:00)
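For reference, the async replication suggestion above might look roughly like the following collection definition, written here as a Python dict that serializes to the JSON schema body. This is a sketch under assumptions: the collection name `Article` is hypothetical, the `asyncEnabled` field name under `replicationConfig` should be checked against the docs for your Weaviate version, and the factor of 2 reflects the setup described in this thread (Nodes A and B hold replicas, Node C does not).

```python
import json

# Hypothetical collection name; only the replicationConfig part matters here.
collection_definition = {
    "class": "Article",
    "replicationConfig": {
        "factor": 2,           # replicas on Node A and Node B in this thread's setup
        "asyncEnabled": True,  # assumed field name for async replication
    },
}

# JSON body that would be sent when creating or updating the collection.
print(json.dumps(collection_definition, indent=2))
```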
Thank you for the suggestion regarding async replication.
While I understand that async replication could achieve full consistency, my core question is whether repair-on-read is designed to work for vector searches, or whether it is intentionally limited in that context. I was operating under the assumption that it was supposed to cover vector searches as well.
It should. When you perform a vector search, if the coordinator node detects inconsistencies in the objects returned from different replicas, it will attempt to repair the out-of-sync data using the repair-on-read mechanism.
You should perform the search with the consistency level set to ALL.
Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00/+01:00)
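As a rough illustration of the advice above, a vector search with an explicit consistency level can be expressed as a GraphQL `Get` query. The helper below only builds the request body with the standard library. The function name `build_near_vector_query`, the collection name `Article`, and the example vector are hypothetical; the `consistencyLevel` argument and the `isConsistent` field under `_additional` are assumed to be available in your Weaviate version.

```python
import json


def build_near_vector_query(collection, vector, limit=5, consistency="ALL"):
    """Build a GraphQL Get query string with an explicit consistency level."""
    vector_literal = json.dumps(vector)
    return (
        "{ Get { "
        f"{collection}(nearVector: {{vector: {vector_literal}}}, "
        f"limit: {limit}, consistencyLevel: {consistency}) "
        "{ _additional { id isConsistent } } } }"
    )


query = build_near_vector_query("Article", [0.1, 0.2, 0.3])

# Request body to POST to the /v1/graphql endpoint of the chosen coordinator node.
payload = json.dumps({"query": query})
print(payload)
```

Sending the same payload to each node in turn (A, B, then C) and checking the object counts afterwards reproduces the coordinator-by-coordinator comparison from the original question.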