Empty search results when one node (of 3) is not fully replicated

Description

We are testing a 3-node cluster (replication factor 3). All data was synchronized and… suddenly one node physically failed. A new (empty) pod was deployed on a new machine and the “read repair” procedure was activated (using batch reading), but it works too slowly, at ~50 obj/sec (how can we speed it up?).
Problem: when a search request (text/vector) is routed to the “empty” node (with either QUORUM or ALL consistency level), no valid results are available. It seems that the “empty” node first looks for the data within itself, does not find it, and the QUORUM/ALL conditions are not satisfied. As a result:

  • for a “property equal” search (getting objects by a list of IDs at once with “ContainsAny”), no results are returned;
  • for a vector search, neither the object itself nor its closest vectors are returned.
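For reference, the replica counts behind the consistency levels mentioned above can be sketched as follows. This is a minimal illustration of the usual majority rule (QUORUM = floor(n / 2) + 1); the function name `required_acks` is hypothetical and not part of any client API.

```python
def required_acks(replication_factor: int, level: str) -> int:
    """Return how many replicas must respond for a consistency level."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # Majority of replicas: floor(n / 2) + 1
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_acks(3, level))
```

With replication factor 3, QUORUM needs 2 replicas, so two healthy nodes would be enough on paper. Empty results from the “empty” node are therefore consistent with the hypothesis that the candidate set is taken from the coordinator's local data first.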

If the request is routed to any node that has the data, the search results are correct.
How can we exclude the “empty” (not yet replicated) node from operations or from the QUORUM/ALL conditions?
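One possible client-side workaround, sketched under assumptions: compare per-node object counts from the node-status endpoint (`GET /v1/nodes`, or `client.cluster.get_nodes_status()` in the Python v3 client) and send searches only to nodes that look fully populated. The payload shape used here (`name`, `status`, `stats.objectCount`) and the helper name `nodes_with_full_data` are assumptions for illustration; check them against the output of your own cluster. This does not change how the server evaluates QUORUM/ALL.

```python
from typing import Any, Dict, List


def nodes_with_full_data(nodes: List[Dict[str, Any]], min_ratio: float = 0.99) -> List[str]:
    """Return names of healthy nodes whose object count is close to the largest one."""
    counts = {}
    for node in nodes:
        if node.get("status") != "HEALTHY":
            continue  # skip nodes that do not report as healthy
        stats = node.get("stats") or {}
        counts[node["name"]] = stats.get("objectCount", 0)
    if not counts:
        return []
    top = max(counts.values())
    # Keep nodes holding at least min_ratio of the largest observed count.
    return sorted(name for name, count in counts.items() if count >= top * min_ratio)


# Hypothetical payload with made-up counts, shaped like a node-status response.
sample = [
    {"name": "weaviate-0", "status": "HEALTHY", "stats": {"objectCount": 15000000}},
    {"name": "weaviate-1", "status": "HEALTHY", "stats": {"objectCount": 15000000}},
    {"name": "weaviate-2", "status": "HEALTHY", "stats": {"objectCount": 120000}},
]
print(nodes_with_full_data(sample))
```

Routing requests to specific pods in k8s would additionally require per-pod addresses (for example through a headless service), which is outside this sketch.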

Server Setup Information

  • Weaviate Server Version: 1.25.12
  • Deployment Method: k8s
  • Multi Node? Number of Running Nodes: Yes, 3
  • Client Language and Version: Python3, Python Client v3
  • Multitenancy?: No

hi @wvuser !!

Are you aware of the new feature that landed in 1.26, async replication?

This will proactively start the repair asynchronously.

It may reduce the time between the node realizing that it is inconsistent and it becoming a consistent part of the cluster.
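To illustrate the general idea of digest-based background repair, here is a toy sketch: each replica hashes its objects into buckets, and only buckets whose digests differ need to be exchanged. This is a simplified illustration, not Weaviate's actual implementation; the function names and the bucket layout are assumptions.

```python
import hashlib
from typing import Dict, List


def bucket_digests(objects: Dict[str, int], buckets: int = 4) -> List[str]:
    """Digest each bucket of (object id, last update time) pairs."""
    groups: List[List[str]] = [[] for _ in range(buckets)]
    for object_id, updated_at in objects.items():
        # Assign the object to a bucket based on a hash of its id.
        index = int(hashlib.sha256(object_id.encode()).hexdigest(), 16) % buckets
        groups[index].append(f"{object_id}:{updated_at}")
    return [hashlib.sha256("|".join(sorted(g)).encode()).hexdigest() for g in groups]


def differing_buckets(local: Dict[str, int], remote: Dict[str, int], buckets: int = 4) -> List[int]:
    """Return indexes of buckets whose digests differ between two replicas."""
    a = bucket_digests(local, buckets)
    b = bucket_digests(remote, buckets)
    return [i for i in range(buckets) if a[i] != b[i]]


full = {"obj-1": 100, "obj-2": 100, "obj-3": 105}
print(differing_buckets(full, full))  # identical replicas: nothing to repair
print(differing_buckets({}, full))    # empty replica: every non-empty bucket differs
```

The benefit of this kind of comparison is that replicas that already agree exchange only a few digests instead of all objects.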

I will ask our team about this scenario.

Thanks!

Yes :) I tried to upgrade:

  1. Pods successfully started with the 1.26.3 image version;
  2. Manually updated the schema with “asyncEnabled”: true;
  3. Information about async replication starting and then crashing appeared in the logs:
    {"action":"async_replication","build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","class_name":"MyClass","level":"info","msg":"hashtree initialization is progress…","object_count":14842232,"shard_name":"QZOXu2zY6rKM","time":"2024-09-05T09:49:23Z"}
    {"action":"async_replication","build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","class_name":"MyClass","level":"info","msg":"hashbeater stopped","shard_name":"b4ulnsitcsN2","time":"2024-09-05T09:49:24Z"}
    {"build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","level":"error","msg":"Recovered from panic: runtime error: index out of range [1] with length 1","time":"2024-09-05T09:49:24Z"}
    goroutine 692758 [running]:
    runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:24 +0x5e
    runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x13
    github.com/weaviate/weaviate/entities/errors.GoWrapper.func1.1()
        /go/src/github.com/weaviate/weaviate/entities/errors/go_wrapper.go:32 +0x150
    panic({0x20565e0?, 0xf71f95d218?})
        /usr/local/go/src/runtime/panic.go:914 +0x21f
    github.com/weaviate/weaviate/adapters/repos/db/lsmkv.(*segment).newCursorWithSecondaryIndex(0xc002ae0870, 0x1)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/lsmkv/cursor_segment_replace.go:58 +0x290
    github.com/weaviate/weaviate/adapters/repos/db/lsmkv.(*SegmentGroup).newCursorsWithSecondaryIndex(0xc0052dc0c0, 0xf4def49c70?)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/lsmkv/cursor_segment_replace.go:99 +0xab
    github.com/weaviate/weaviate/adapters/repos/db/lsmkv.(*Bucket).CursorWithSecondaryIndex(0xc003e64000, 0x21996ef?)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/lsmkv/cursor_bucket_replace.go:82 +0x9c
    github.com/weaviate/weaviate/adapters/repos/db.(*Shard).ObjectDigestsByTokenRange(0xc0050ff180?, {0x29050a0?, 0xc029e42780?}, 0x150000000000000, 0x19dffffffffffff, 0x3e8)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/shard_read.go:138 +0x9f
    github.com/weaviate/weaviate/adapters/repos/db.(*Index).DigestObjectsInTokenRange(0x459eab?, {0x29050a0, 0xc029e42780}, {0xc005823a90, 0xc}, 0xf4def4a0d8?, 0x565ff2?, 0x0?)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/replication.go:475 +0x169
    github.com/weaviate/weaviate/adapters/repos/db.(*Shard).stepsTowardsShardConsistency(0xc005466000, {0x29050a0, 0xc029e42780}, {0xc005823a90, 0xc}, {0xc5f4305e78, 0x11}, 0xc5f434d440?, 0x19dffffffffffff, 0x186a0)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/shard_hashbeater.go:336 +0x105
    github.com/weaviate/weaviate/adapters/repos/db.(*Shard).hashBeat(0xc005466000)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/shard_hashbeater.go:288 +0x81f
    github.com/weaviate/weaviate/adapters/repos/db.(*Shard).initHashBeater.func1()
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/shard_hashbeater.go:80 +0x45b
    github.com/weaviate/weaviate/entities/errors.GoWrapper.func1()
        /go/src/github.com/weaviate/weaviate/entities/errors/go_wrapper.go:36 +0x62
    created by github.com/weaviate/weaviate/entities/errors.GoWrapper in goroutine 580955
        /go/src/github.com/weaviate/weaviate/entities/errors/go_wrapper.go:26 +0x79
    {"action":"async_replication","build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","class_name":"MyClass","level":"info","msg":"hashbeater stopped","shard_name":"PfYchyFAyseL","time":"2024-09-05T09:49:24Z"}
    {"build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","level":"error","msg":"Recovered from panic: runtime error: index out of range [1] with length 1","time":"2024-09-05T09:49:24Z"}
    goroutine 662155 [running]:
    runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:24 +0x5e
    runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x13
    github.com/weaviate/weaviate/entities/errors.GoWrapper.func1.1()
        /go/src/github.com/weaviate/weaviate/entities/errors/go_wrapper.go:32 +0x150
    panic({0x20565e0?, 0xf737c47218?})
        /usr/local/go/src/runtime/panic.go:914 +0x21f
    github.com/weaviate/weaviate/adapters/repos/db/lsmkv.(*segment).newCursorWithSecondaryIndex(0xc002b4e000, 0x1)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/lsmkv/cursor_segment_replace.go:58 +0x290
    github.com/weaviate/weaviate/adapters/repos/db/lsmkv.(*SegmentGroup).newCursorsWithSecondaryIndex(0xc00553a300, 0xf6e26e1c70?)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/lsmkv/cursor_segment_replace.go:99 +0xab
    github.com/weaviate/weaviate/adapters/repos/db/lsmkv.(*Bucket).CursorWithSecondaryIndex(0xc004034000, 0x21996ef?)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/lsmkv/cursor_bucket_replace.go:82 +0x9c
    github.com/weaviate/weaviate/adapters/repos/db.(*Shard).ObjectDigestsByTokenRange(0xc0050ff180?, {0x29050a0?, 0xc029e428c0?}, 0x2fe000000000000, 0x4b3ffffffffffff, 0x3e8)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/shard_read.go:138 +0x9f
    github.com/weaviate/weaviate/adapters/repos/db.(*Index).DigestObjectsInTokenRange(0x459eab?, {0x29050a0, 0xc029e428c0}, {0xc0058229f0, 0xc}, 0xf6e26e20d8?, 0x565ff2?, 0x0?)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/replication.go:475 +0x169
    github.com/weaviate/weaviate/adapters/repos/db.(*Shard).stepsTowardsShardConsistency(0xc005a20000, {0x29050a0, 0xc029e428c0}, {0xc0058229f0, 0xc}, {0xe825b2cb88, 0x11}, 0xf6e26b0300?, 0x4b3ffffffffffff, 0x186a0)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/shard_hashbeater.go:336 +0x105
    github.com/weaviate/weaviate/adapters/repos/db.(*Shard).hashBeat(0xc005a20000)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/shard_hashbeater.go:288 +0x81f
    github.com/weaviate/weaviate/adapters/repos/db.(*Shard).initHashBeater.func1()
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/shard_hashbeater.go:80 +0x45b
    github.com/weaviate/weaviate/entities/errors.GoWrapper.func1()
        /go/src/github.com/weaviate/weaviate/entities/errors/go_wrapper.go:36 +0x62
    created by github.com/weaviate/weaviate/entities/errors.GoWrapper in goroutine 580957
        /go/src/github.com/weaviate/weaviate/entities/errors/go_wrapper.go:26 +0x79
    {"action":"async_replication","build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","class_name":"MyClass","level":"info","msg":"hashtree initialization is progress…","object_count":15064448,"shard_name":"QZOXu2zY6rKM","time":"2024-09-05T09:49:24Z"}
    {"action":"async_replication","build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","class_name":"MyClass","level":"info","msg":"hashtree successfully initialized","shard_name":"QZOXu2zY6rKM","time":"2024-09-05T09:49:24Z"}
    {"action":"async_replication","build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","class_name":"MyClass","level":"info","msg":"hashbeater started…","shard_name":"QZOXu2zY6rKM","time":"2024-09-05T09:49:24Z"}
    {"action":"async_replication","build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","class_name":"MyClass","level":"info","msg":"hashbeater stopped","shard_name":"QZOXu2zY6rKM","time":"2024-09-05T09:49:28Z"}
    {"build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","level":"error","msg":"Recovered from panic: runtime error: index out of range [1] with length 1","time":"2024-09-05T09:49:28Z"}
    goroutine 695187 [running]:
    runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:24 +0x5e
    runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x13
    github.com/weaviate/weaviate/entities/errors.GoWrapper.func1.1()
        /go/src/github.com/weaviate/weaviate/entities/errors/go_wrapper.go:32 +0x150
    panic({0x20565e0?, 0xf653e906a8?})
        /usr/local/go/src/runtime/panic.go:914 +0x21f
    github.com/weaviate/weaviate/adapters/repos/db/lsmkv.(*segment).newCursorWithSecondaryIndex(0xc003f803c0, 0x1)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/lsmkv/cursor_segment_replace.go:58 +0x290
    github.com/weaviate/weaviate/adapters/repos/db/lsmkv.(*SegmentGroup).newCursorsWithSecondaryIndex(0xc0040bca80, 0xf4def49c70?)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/lsmkv/cursor_segment_replace.go:99 +0xab
    github.com/weaviate/weaviate/adapters/repos/db/lsmkv.(*Bucket).CursorWithSecondaryIndex(0xc005a2e240, 0x21996ef?)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/lsmkv/cursor_bucket_replace.go:82 +0x9c
    github.com/weaviate/weaviate/adapters/repos/db.(*Shard).ObjectDigestsByTokenRange(0xc0050ff180?, {0x29050a0?, 0xc029e42820?}, 0x0, 0x1ffffffffffff, 0x3e8)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/shard_read.go:138 +0x9f
    github.com/weaviate/weaviate/adapters/repos/db.(*Index).DigestObjectsInTokenRange(0x459eab?, {0x29050a0, 0xc029e42820}, {0xc005823240, 0xc}, 0xf4def4a0d8?, 0x565ff2?, 0x0?)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/replication.go:475 +0x169
    github.com/weaviate/weaviate/adapters/repos/db.(*Shard).stepsTowardsShardConsistency(0xc0004b1dc0, {0x29050a0, 0xc029e42820}, {0xc005823240, 0xc}, {0xf737c47578, 0x11}, 0xf68ad82a40?, 0x1ffffffffffff, 0x186a0)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/shard_hashbeater.go:336 +0x105
    github.com/weaviate/weaviate/adapters/repos/db.(*Shard).hashBeat(0xc0004b1dc0)
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/shard_hashbeater.go:288 +0x81f
    github.com/weaviate/weaviate/adapters/repos/db.(*Shard).initHashBeater.func1()
        /go/src/github.com/weaviate/weaviate/adapters/repos/db/shard_hashbeater.go:80 +0x45b
    github.com/weaviate/weaviate/entities/errors.GoWrapper.func1()
        /go/src/github.com/weaviate/weaviate/entities/errors/go_wrapper.go:36 +0x62
    created by github.com/weaviate/weaviate/entities/errors.GoWrapper in goroutine 580956
        /go/src/github.com/weaviate/weaviate/entities/errors/go_wrapper.go:26 +0x79
    {"build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-2 xxx.yyy.zzz.www:7000","time":"2024-09-05T09:49:31Z"}
    {"build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","level":"debug","msg":" memberlist: Stream connection from=xxx.yyy.zzz.www:53136","time":"2024-09-05T09:49:33Z"}
    {"build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","level":"debug","msg":" memberlist: Stream connection from=xxx.yyy.zzz.www:33106","time":"2024-09-05T09:49:41Z"}
  4. Async replication did not start again;
  5. Tried to restart the pods (termination took too long! After the timeout the pods were terminated forcibly);
  6. After loading, the “asyncEnabled” flag was automatically reset to false.
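When attaching logs like these to a bug report, a quick per-message count can help show how often the panic repeats. A minimal sketch, assuming the raw pod logs contain one JSON object per line (with plain straight quotes) interleaved with non-JSON stack-trace lines; the function name `summarize_log` is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_log(lines: Iterable[str]) -> Counter:
    """Count recovered panics and async replication messages in JSON log lines."""
    summary: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # stack-trace frames and other non-JSON lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed line
        msg = entry.get("msg", "")
        if entry.get("level") == "error" and msg.startswith("Recovered from panic"):
            summary["panic"] += 1
        elif entry.get("action") == "async_replication":
            summary[msg] += 1
    return summary


# Shortened lines modelled on the log above.
sample_lines = [
    '{"action":"async_replication","level":"info","msg":"hashbeater stopped","shard_name":"b4ulnsitcsN2"}',
    '{"level":"error","msg":"Recovered from panic: runtime error: index out of range [1] with length 1"}',
    "goroutine 692758 [running]:",
]
print(summarize_log(sample_lines))
```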

The problem with empty results for text search (or non-closest results for vector search) with QUORUM consistency, when requests are routed to the “empty” node, remains…
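To put the ~50 obj/sec read-repair rate from the first post in perspective, here is a back-of-the-envelope estimate. It assumes a constant rate and uses the object_count of 15,064,448 logged for a single shard as the number of objects to repair; the log mentions three shard names, so the total for the class may be higher.

```python
def repair_eta_hours(object_count: int, objects_per_second: float) -> float:
    """Estimate repair time in hours at a constant throughput."""
    if objects_per_second <= 0:
        raise ValueError("objects_per_second must be positive")
    return object_count / objects_per_second / 3600


hours = repair_eta_hours(15_064_448, 50)
print(f"{hours:.1f} hours (~{hours / 24:.1f} days)")
```

At that rate a single shard of this size would need roughly 84 hours, which is why a faster repair path matters here.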