RAFT_JOIN without Helm

Team, we can’t use Helm in our production environment, so we created a Helm-like tool for our production deployments. In a Weaviate deployment, RAFT_JOIN is handled by Helm based on the number of nodes. How should we handle it for our 3-node cluster?
By default, each node makes itself the leader.
So I added the environment variable RAFT_JOIN=weaviate-node-1,weaviate-node-2,weaviate-node-3.
After adding this, all nodes elect weaviate-node-1 as leader. But when node-1 fails or restarts, the other nodes don’t become candidates; they only ever select weaviate-node-1 as their leader.
How to resolve this?
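Helm derives RAFT_JOIN from the number of nodes, and a Helm-like tool can do the same. A minimal sketch, assuming the pods come from a StatefulSet (so they are named <statefulset>-<ordinal>, starting at 0) and that your Weaviate version also reads RAFT_BOOTSTRAP_EXPECT, which the Weaviate environment-variable docs describe as the number of voters expected at bootstrap. Verify both variables against the docs for your version. The helper raftEnv is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// raftEnv is a hypothetical helper that derives Raft-related environment
// variables for a StatefulSet with the given name and replica count.
// Assumption: pod names follow the StatefulSet scheme <name>-<ordinal>
// and are the node names Weaviate uses.
func raftEnv(statefulSet string, replicas int) map[string]string {
	names := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		names = append(names, fmt.Sprintf("%s-%d", statefulSet, i))
	}
	return map[string]string{
		// Comma-separated list of every voter node name.
		"RAFT_JOIN": strings.Join(names, ","),
		// Assumption: the bootstrap expectation equals the number of voters.
		"RAFT_BOOTSTRAP_EXPECT": strconv.Itoa(replicas),
	}
}

func main() {
	env := raftEnv("weaviate-codeassistant", 3)
	fmt.Println("RAFT_JOIN=" + env["RAFT_JOIN"])
	fmt.Println("RAFT_BOOTSTRAP_EXPECT=" + env["RAFT_BOOTSTRAP_EXPECT"])
}
```

For weaviate-codeassistant with 3 replicas this yields RAFT_JOIN=weaviate-codeassistant-0,weaviate-codeassistant-1,weaviate-codeassistant-2 and RAFT_BOOTSTRAP_EXPECT=3. Every node gets the same values, so any of them can be elected.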

Hi @Dharanish!

That’s strange. When the leader (node-1) dies, the remaining nodes should elect a new leader.
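The reason a new election is expected: Raft only needs votes from a strict majority of the configured voters, so a 3-voter cluster keeps a quorum with one node down. The arithmetic as a small Go sketch (the function names are illustrative):

```go
package main

import "fmt"

// quorum returns the number of votes a Raft candidate needs:
// a strict majority of the configured voters.
func quorum(voters int) int {
	return voters/2 + 1
}

// tolerableFailures is how many voters can be down while the rest
// can still form a quorum and elect a leader.
func tolerableFailures(voters int) int {
	return voters - quorum(voters)
}

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		fmt.Printf("voters=%d quorum=%d tolerable failures=%d\n",
			n, quorum(n), tolerableFailures(n))
	}
}
```

For three voters this prints quorum=2 and tolerable failures=1: losing node-1 still leaves the two votes needed, provided the survivors can actually reach each other.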

Do you see anything that stands out in the Raft communication logs?

"build_git_commit":"6edf2b8","build_go_version":"go1.22.12","build_image_tag":"v1.28.11","build_wv_version":"1.28.11","level":"warning","msg":" memberlist: Was able to connect to weaviate-codeassistant-1 over TCP but UDP probes failed, network may be misconfigured","
{"build_git_commit":"6edf2b8","build_go_version":"go1.22.12","build_image_tag":"v1.28.11","build_wv_version":"1.28.11","level":"info","msg":" memberlist: Suspect weaviate-codeassistant-0 has failed, no acks received","time":"2025-03-27T12:16:53Z"}
{"build_git_commit":"6edf2b8","build_go_version":"go1.22.12","build_image_tag":"v1.28.11","build_wv_version":"1.28.11","level":"info","msg":" memberlist: Suspect weaviate-codeassistant-0 has failed, no acks received","time":"2025-03-27T12:16:57Z"}
{"build_git_commit":"6edf2b8","build_go_version":"go1.22.12","build_image_tag":"v1.28.11","build_wv_version":"1.28.11","level":"info","msg":" memberlist: Marking weaviate-codeassistant-0 as failed, suspect timeout reached (0 peer confirmations)","time":"2025-03-27T12:
{"build_git_commit":"6edf2b8","build_go_version":"go1.22.12","build_image_tag":"v1.28.11","build_wv_version":"1.28.11","level":"error","msg":" memberlist: Conflicting address for weaviate-codeassistant-0. Mine: 10.244.5.72:7000 Theirs: 10.244.5.73:7000 Old state: 2","
{"build_git_commit":"6edf2b8","build_go_version":"go1.22.12","build_image_tag":"v1.28.11","build_wv_version":"1.28.11","level":"error","msg":" memberlist: Conflicting address for weaviate-codeassistant-0. Mine: 10.244.5.72:7000 Theirs: 10.244.5.73:7000 Old state: 2","
{"build_git_commit":"6edf2b8","build_go_version":"go1.22.12","build_image_tag":"v1.28.11","build_wv_version":"1.28.11","level":"error","msg":" memberlist: Conflicting address for weaviate-codeassistant-0. Mine: 10.244.5.72:7000 Theirs: 10.244.5.73:7000 Old state: 2","
{"build_git_commit":"6edf2b8","build_go_version":"go1.22.12","build_image_tag":"v1.28.11","build_wv_version":"1.28.11","level":"error","msg":" memberlist: Conflicting address for weaviate-codeassistant-0. Mine: 10.244.5.72:7000 Theirs: 10.244.5.73:7000 Old state: 2","
{"build_git_commit":"6edf2b8","build_go_version":"go1.22.12","build_image_tag":"v1.28.11","build_wv_version":"1.28.11","level":"error","msg":" memberlist: Conflicting address for weaviate-codeassistant-0. Mine: 10.244.5.72:7000 Theirs: 10.244.5.73:7000 Old state: 2","

These are the only logs I get.

It seems memberlist is not allowing this node to join, as it’s expecting weaviate-node-1 :thinking:

No, the actual node names are weaviate-codeassistant-x, and those are what we set in RAFT_JOIN. I just used node-1 for simplicity.

Could you share those manifests?

Otherwise it’s hard to tackle.

It looks like something may be blocking internal UDP communication between the nodes.

If UDP is indeed blocked, the nodes are probably crashlooping, getting a new IP each time their pod restarts, and becoming suspects to each other, because they present themselves with the same hostname but a different IP.

Let me know if this helps :grimacing: