After deleting one of the Weaviate pods in our two-replica cluster to test cluster resiliency (Weaviate version: 1.30.2), we are seeing the same error, e.g.:
hashbeat iteration failed: collecting differences: "10.129.38.229:7001": status code: 401
When sending GET v1/cluster/statistics, we get one of the following responses:
{"error":[{"message":"node: weaviate-0: unexpected status code 401 ()"}]}
{"error":[{"message":"node: weaviate-1: unexpected status code 401 ()"}]}
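To see which node is reported as failing on each call, the response can be checked with a small script. This is only a minimal sketch: the base URL `http://localhost:8080` and the helper names `failed_nodes` and `fetch_statistics` are assumptions for illustration, and the parsing relies on the error message format shown above.

```python
import json
import re
import urllib.error
import urllib.request

# Matches messages such as "node: weaviate-0: unexpected status code 401 ()".
NODE_ERROR = re.compile(r"node: (\S+): unexpected status code (\d+)")


def failed_nodes(body: str) -> list[tuple[str, int]]:
    """Extract (node name, status code) pairs from an error response body."""
    payload = json.loads(body)
    pairs = []
    for entry in payload.get("error") or []:
        match = NODE_ERROR.search(entry.get("message", ""))
        if match:
            pairs.append((match.group(1), int(match.group(2))))
    return pairs


def fetch_statistics(base_url: str = "http://localhost:8080") -> str:
    """GET /v1/cluster/statistics and return the raw body, also on HTTP errors.

    The default base_url is an assumption; adjust it to the actual service.
    """
    url = f"{base_url}/v1/cluster/statistics"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Error responses still carry a JSON body describing the failing node.
        return err.read().decode("utf-8")
```

Calling `failed_nodes(fetch_statistics())` several times in a row should show whether the reported node alternates between weaviate-0 and weaviate-1.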
Below are the log messages that appear most relevant, from shortly after we deleted the weaviate-0 pod and it was restarted automatically:
weaviate-0:
{"action":"raft","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","last-leader-addr":"","last-leader-id":"","level":"warning","msg":"heartbeat timeout reached, starting election","time":"2025-05-07T10:27:34Z"}
{"action":"raft","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"info","msg":"entering candidate state","node":{},"term":241,"time":"2025-05-07T10:27:34Z"}
{"action":"raft","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","id":"weaviate-0","level":"debug","msg":"pre-voting for self","term":241,"time":"2025-05-07T10:27:34Z"}
{"action":"raft","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"debug","msg":"calculated votes needed","needed":1,"term":241,"time":"2025-05-07T10:27:34Z"}
{"action":"raft","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","from":"weaviate-0","level":"debug","msg":"pre-vote received","tally":0,"term":241,"time":"2025-05-07T10:27:34Z"}
{"action":"raft","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","from":"weaviate-0","level":"debug","msg":"pre-vote granted","tally":1,"term":241,"time":"2025-05-07T10:27:34Z"}
{"action":"raft","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"info","msg":"pre-vote successful, starting election","refused":0,"tally":1,"term":241,"time":"2025-05-07T10:27:34Z","votesNeeded":1}
{"build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"info","msg":"attempting to join","remoteNodes":{"weaviate-0":"10.130.33.254:8300"},"time":"2025-05-07T10:27:34Z"}
{"build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"info","msg":"attempted to join and failed","remoteNode":"10.130.33.254:8300","status":8,"time":"2025-05-07T10:27:34Z"}
{"action":"raft","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","id":"weaviate-0","level":"debug","msg":"voting for self","term":241,"time":"2025-05-07T10:27:34Z"}
{"action":"raft","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","from":"weaviate-0","level":"debug","msg":"vote granted","tally":1,"term":241,"time":"2025-05-07T10:27:34Z"}
{"action":"raft","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"info","msg":"election won","tally":1,"term":241,"time":"2025-05-07T10:27:34Z"}
{"action":"raft","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","leader":{},"level":"info","msg":"entering leader state","time":"2025-05-07T10:27:34Z"}
{"action":"raft","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"info","msg":"added peer, starting replication","peer":"weaviate-1","time":"2025-05-07T10:27:34Z"}
{"action":"async_replication","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","class_name":"FeedbackGC_v3","level":"warning","msg":"hashbeat iteration failed: collecting differences: \"10.130.3.101:7001\": status code: 401, error: ","shard_name":"odmAjZzBYSNB","time":"2025-05-07T11:12:45Z"}
weaviate-1:
{"build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"debug","msg":" memberlist: Stream connection from=10.129.38.229:46354","time":"2025-05-07T10:24:47Z"}
{"build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"error","msg":" memberlist: Conflicting address for weaviate-0. Mine: 10.128.5.90:7000 Theirs: 10.129.38.229:7000 Old state: 2","time":"2025-05-07T10:24:47Z"}
{"build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"error","msg":" memberlist: Conflicting address for weaviate-0. Mine: 10.128.5.90:7000 Theirs: 10.129.38.229:7000 Old state: 2","time":"2025-05-07T10:24:47Z"}
{"build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"error","msg":" memberlist: Conflicting address for weaviate-0. Mine: 10.128.5.90:7000 Theirs: 10.129.38.229:7000 Old state: 2","time":"2025-05-07T10:24:47Z"}
{"build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"error","msg":" memberlist: Conflicting address for weaviate-0. Mine: 10.128.5.90:7000 Theirs: 10.129.38.229:7000 Old state: 2","time":"2025-05-07T10:24:47Z"}
{"build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"error","msg":" memberlist: Conflicting address for weaviate-0. Mine: 10.128.5.90:7000 Theirs: 10.129.38.229:7000 Old state: 2","time":"2025-05-07T10:24:47Z"}
{"build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"error","msg":" memberlist: Conflicting address for weaviate-0. Mine: 10.128.5.90:7000 Theirs: 10.129.38.229:7000 Old state: 2","time":"2025-05-07T10:24:47Z"}
{"build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"error","msg":" memberlist: Conflicting address for weaviate-0. Mine: 10.128.5.90:7000 Theirs: 10.129.38.229:7000 Old state: 2","time":"2025-05-07T10:24:47Z"}
{"build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"error","msg":" memberlist: Conflicting address for weaviate-0. Mine: 10.128.5.90:7000 Theirs: 10.129.38.229:7000 Old state: 2","time":"2025-05-07T10:24:47Z"}
{"build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"error","msg":" memberlist: Conflicting address for weaviate-0. Mine: 10.128.5.90:7000 Theirs: 10.129.38.229:7000 Old state: 2","time":"2025-05-07T10:24:47Z"}
{"action":"raft-net","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"debug","local-address":"10.130.3.101:8300","msg":"accepted connection","remote-address":"10.129.38.229:39512","time":"2025-05-07T10:24:49Z"}
{"action":"raft-net","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","level":"debug","local-address":"10.130.3.101:8300","msg":"accepted connection","remote-address":"10.129.38.229:39518","time":"2025-05-07T10:24:49Z"}
{"action":"async_replication","build_git_commit":"80dac5a","build_go_version":"go1.24.2","build_image_tag":"v1.30.2","build_wv_version":"1.30.2","class_name":"FeedbackGC_v3","level":"warning","msg":"hashbeat iteration failed: collecting differences: \"10.129.38.229:7001\": status code: 401, error: ","shard_name":"exiR1xBmWBBO","time":"2025-05-07T10:25:42Z"}
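Since the same messages repeat many times, it may help to condense the logs before reading them. The following is a rough sketch that counts the memberlist address conflicts and the hashbeat failures per target address. The helper name `summarize` is hypothetical, and the regular expressions are based only on the message formats shown above, so they may need adjusting for other log lines.

```python
import re
from collections import Counter

# "Conflicting address for <node>. Mine: <addr> Theirs: <addr>"
CONFLICT = re.compile(r"Conflicting address for (\S+)\. Mine: (\S+) Theirs: (\S+)")
# "collecting differences: "<ip:port>": status code: <code>" (quotes may be escaped)
HASHBEAT = re.compile(r"collecting differences: \W*([\d.]+:\d+)\W*: status code: (\d+)")


def summarize(lines):
    """Count address conflicts and hashbeat failures in raw log lines.

    Works on the raw text of each line, so it does not require valid JSON.
    """
    conflicts = Counter()  # key: (node, mine, theirs)
    failures = Counter()   # key: (target address, status code)
    for line in lines:
        match = CONFLICT.search(line)
        if match:
            conflicts[match.groups()] += 1
        match = HASHBEAT.search(line)
        if match:
            failures[(match.group(1), int(match.group(2)))] += 1
    return conflicts, failures
```

For example, `summarize(open("weaviate-1.log"))` (the file name is a placeholder) would return two counters: one showing how often each old/new address pair was reported, and one showing which addresses returned which status codes.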
After a first look, I'm wondering whether leader election is being performed properly: from the log messages, I can only see that weaviate-0 assumes leadership without hearing back from weaviate-1. There also seems to be a problem authenticating communication between the pods (the 401 errors). Any hints on how to debug or fix this?
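For context on the election question, here is the arithmetic behind a majority-based Raft quorum. This is a sketch of the general rule, not Weaviate's actual code, and `quorum_size` is a hypothetical helper. With two voters the quorum would be 2, so the `"needed":1` and `"votesNeeded":1` values in the weaviate-0 log would be consistent with that node counting only itself as a voter at election time, though this is unconfirmed.

```python
def quorum_size(voters: int) -> int:
    """Smallest number of votes that forms a strict majority of the voters."""
    return voters // 2 + 1


# Quorum for a few cluster sizes, assuming every node is a voter.
for n in (1, 2, 3):
    print(f"{n} voter(s) -> {quorum_size(n)} vote(s) needed")
```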