Description
I am running a multi-node Weaviate (v1.26.3) deployment on EKS with 4 nodes. After performing infrastructure changes, object queries no longer return data, even though schemas and tenants are still visible.
Steps performed before the issue:
- Scaled down the Weaviate StatefulSet to 0.
- Updated the EKS node group for Weaviate to use encrypted root volumes; nodes were replaced.
- Created snapshots of the existing Weaviate data volumes in AWS, then created encrypted volumes from those snapshots. Deleted the PVCs in the cluster and updated the PVs to point at the new encrypted volumes (a rough sketch of this step is included after this list).
- Scaled up the StatefulSet to 4.
- Volumes attached correctly to pods.
- GET /schema and GET /schema//tenants return correct results.
- No object query returns data (a reproduction with the Python client is sketched after this list).
- Attempting to revert to old volumes did not resolve the issue.
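For reference, the snapshot-to-encrypted-volume step looked roughly like the sketch below. This is only an illustration using boto3; the region, availability zone, snapshot ID, and tags are placeholders, not the exact values or tooling used.

```python
import boto3

# Sketch: create an encrypted EBS volume from a snapshot of an old
# Weaviate data volume. Region, AZ, snapshot ID, and tags are placeholders.
ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    SnapshotId="snap-0123456789abcdef0",  # hypothetical snapshot of the old data volume
    Encrypted=True,
    VolumeType="gp3",
    TagSpecifications=[
        {"ResourceType": "volume", "Tags": [{"Key": "app", "Value": "weaviate"}]}
    ],
)

# The returned volume ID is what the PersistentVolume was updated to reference
# (e.g. spec.csi.volumeHandle when using the EBS CSI driver) before re-creating the PVC.
print("new encrypted volume:", resp["VolumeId"])
```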
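The "schema and tenants visible, objects empty" symptom can be reproduced with the Python v4 client roughly as follows. The connection details and the collection name "MyCollection" are placeholders for the real EKS endpoints and schema; this is a sketch of the check, not the exact code used.

```python
import weaviate

# Placeholder connection details for the Weaviate HTTP and gRPC endpoints on EKS.
client = weaviate.connect_to_custom(
    http_host="weaviate.example.internal",
    http_port=80,
    http_secure=False,
    grpc_host="weaviate-grpc.example.internal",
    grpc_port=50051,
    grpc_secure=False,
)

try:
    collection = client.collections.get("MyCollection")  # placeholder collection name

    # Tenants are still listed correctly...
    tenants = collection.tenants.get()
    print("tenants:", list(tenants.keys()))

    # ...but object reads come back empty for every tenant.
    for tenant_name in tenants:
        t = collection.with_tenant(tenant_name)
        total = t.aggregate.over_all(total_count=True).total_count
        objs = t.query.fetch_objects(limit=3).objects
        print(tenant_name, "count:", total, "sample objects:", len(objs))
finally:
    client.close()
```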
Server Setup Information
- Weaviate Server Version: 1.26.3
- Deployment Method: EKS
- Multi Node? Number of Running Nodes: 4
- Client Language and Version: python, weaviate-client 4.14.4
- Multitenancy?: yes
Any additional Information
kubectl logs weaviate-0 -n multi-node-weaviate --tail=100 | grep -i "corrupt|recover|error"
Defaulted container "weaviate" out of: weaviate, configure-sysctl (init)
{"action":"raft","backoff time":10000000,"build_git_commit":"git-id","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","error":"dial tcp 10.0.2.59:8300: connect: no route to host","level":"error","msg":"raft failed to heartbeat to","peer":"peer-ip","time":"2025-12-17T16:28:50Z"}
{"action":"raft","build_git_commit":"git-id","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","error":"dial tcp 10.0.2.59:8300: connect: no route to host","level":"error","msg":"raft failed to appendEntries to","peer":{"Suffrage":0,"ID":"weaviate-1","Address":"10.0.2.59:8300"},"time":"2025-12-17T16:28:53Z"}
{"action":"raft","build_git_commit":"git-id","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","error":"dial tcp peer-ip: connect: no route to host","level":"error","msg":"raft failed to make requestVote RPC","target":{"Suffrage":0,"ID":"weaviate-1","Address":"peer-ip"},"term":475,"time":"2025-12-17T16:28:53Z"}
{"action":"raft","build_git_commit":"git-id","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","level":"error","msg":"raft peer has newer term, stopping replication","peer":{"Suffrage":0,"ID":"weaviate-1","Address":"10.0.2.53:8300"},"time":"2025-12-17T16:28:53Z"}
kubectl logs weaviate-1 -n multi-node-weaviate --tail=100 | grep -i "corrupt|recover|error"
Defaulted container "weaviate" out of: weaviate, configure-sysctl (init)
{"action":"raft","build_git_commit":"git-id","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","error":"log not found","last-index":140008,"level":"warning","msg":"raft failed to get previous log","previous-index":140014,"time":"2025-12-17T16:28:56Z"}
kubectl logs weaviate-2 -n multi-node-weaviate --tail=100 | grep -i "corrupt|recover|error"
Defaulted container "weaviate" out of: weaviate, configure-sysctl (init)
kubectl logs weaviate-3 -n multi-node-weaviate --tail=100 | grep -i "corrupt|recover|error"
Defaulted container "weaviate" out of: weaviate, configure-sysctl (init)
{"action":"raft","build_git_commit":"git-id","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","error":"log not found","last-index":140008,"level":"warning","msg":"raft failed to get previous log","previous-index":140011,"time":"2025-12-17T16:28:44Z"}
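Given the Raft errors above, one check that may help distinguish "data missing on disk" from "cluster/Raft state broken" is the nodes endpoint, which reports per-shard object counts. Below is a rough sketch with the Python client; it assumes the HTTP and gRPC endpoints have been made reachable on localhost (e.g. via kubectl port-forward), and the hostnames/ports are placeholders.

```python
import weaviate

# Sketch: inspect per-node / per-shard state. Assumes the HTTP and gRPC
# endpoints are reachable on localhost:8080 / localhost:50051 (placeholders).
client = weaviate.connect_to_local(port=8080, grpc_port=50051)

try:
    # "verbose" output includes shard details, including object counts per shard.
    nodes = client.cluster.nodes(output="verbose")
    for node in nodes:
        print(node.name, node.status)
        for shard in node.shards or []:
            print("  ", shard.collection, shard.name, "objects:", shard.object_count)
finally:
    client.close()
```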
Question / Request:
- After scaling down, replacing nodes, and re-attaching volumes, the Raft state appears broken and the object data is inaccessible.
- Schema and tenants are still visible.
- Is there a supported way to recover the object data from these existing PVs?
- Should I rebuild the cluster from scratch using backup / export?
Any guidance on safe recovery of multi-node clusters after node replacement or volume migration would be appreciated.