Description
I upgraded my Weaviate version to 1.26.1, but am observing all three of my Weaviate replicas in crash loops despite having a rolling update of 1; they all are crashing with the same error, though obviously the peer ids change depending on the node I’m looking at.
{"action":"raft","level":"info","msg":"raft entering candidate state","node":{},"term":792,"time":"2024-07-31T23:10:31Z"}
{"action":"raft-net","error":"could not resolve server id weaviate-2","fallback":"10.0.20.103:8300","id":"weaviate-2","level":"warning","msg":"raft-net unable to get address for server, using fallback address","time":"2024-07-31T23:10:31Z"}
{"action":"raft-net","error":"could not resolve server id weaviate-1","fallback":"10.0.3.69:8300","id":"weaviate-1","level":"warning","msg":"raft-net unable to get address for server, using fallback address","time":"2024-07-31T23:10:31Z"}
{"action":"raft","error":"dial tcp 10.0.20.103:8300: connect: connection refused","level":"error","msg":"raft failed to make requestVote RPC","target":{"Suffrage":0,"ID":"weaviate-2","Address":"10.0.20.103:8300"},"term":792,"time":"2024-07-31T23:10:31Z"}
{"action":"raft","error":"dial tcp 10.0.3.69:8300: connect: connection refused","level":"error","msg":"raft failed to make requestVote RPC","target":{"Suffrage":0,"ID":"weaviate-1","Address":"10.0.3.69:8300"},"term":792,"time":"2024-07-31T23:10:31Z"}
{"level":"warning","msg":"raft Election timeout reached, restarting election","time":"2024-07-31T23:10:32Z"}
update: i deleted all the pods to see if recreating all together would solve the issue, but now i’m seeing a different error in all of them. The first 2 lines are repeated thousands of times (I assume corresponding with every object I had in every collection).
{"action":"lsm_recover_from_active_wal","class":"Chunk_89043da9b84a4a8a9ef299c9f608602a","index":"chunk_89043da9b84a4a8a9ef299c9f608602a","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/var/lib/weaviate/chunk_89043da9b84a4a8a9ef299c9f608602a/J0VmCeiHB40Q/lsm/property__lastUpdateTimeUnix/segment-1722465036274445156","shard":"J0VmCeiHB40Q","time":"2024-07-31T23:26:07Z"}
{"action":"lsm_recover_from_active_wal","class":"Chunk_89043da9b84a4a8a9ef299c9f608602a","index":"chunk_89043da9b84a4a8a9ef299c9f608602a","level":"warning","msg":"empty write-ahead-log found. Did weaviate crash prior to this or the tenant on/loaded from the cloud? Nothing to recover from this file.","path":"/var/lib/weaviate/chunk_89043da9b84a4a8a9ef299c9f608602a/J0VmCeiHB40Q/lsm/property__lastUpdateTimeUnix/segment-1722468246818239897","shard":"J0VmCeiHB40Q","time":"2024-07-31T23:26:07Z"}
{"action":"hnsw_prefill_cache_async","level":"info","msg":"not waiting for vector cache prefill, running in background","time":"2024-07-31T23:26:07Z","wait_for_cache_prefill":false}
{"level":"info","msg":"Completed loading shard chunk_89043da9b84a4a8a9ef299c9f608602a_J0VmCeiHB40Q in 12.150606ms","time":"2024-07-31T23:26:07Z"}
{"checkpoint":0,"component":"index_queue","count":0,"last_stored_id":1,"level":"info","msg":"enqueued vectors from last indexed checkpoint","shard_id":"chunk_89043da9b84a4a8a9ef299c9f608602a_J0VmCeiHB40Q","target_vector":"","time":"2024-07-31T23:26:07Z","took":58961}
{"action":"hnsw_vector_cache_prefill","count":3000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-07-31T23:26:07Z","took":369480}
{"action":"bootstrap","error":"could not join a cluster from [10.0.5.66:8300 10.0.28.8:8300 10.0.27.24:8300]","level":"warning","msg":"failed to join cluster, will notify next if voter","servers":["10.0.5.66:8300","10.0.28.8:8300","10.0.27.24:8300"],"time":"2024-07-31T23:26:07Z","voter":true}
{"action":"bootstrap","error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded","level":"error","msg":"notify all peers","servers":["10.0.5.66:8300","10.0.28.8:8300","10.0.27.24:8300"],"time":"2024-07-31T23:26:07Z"}
{"action":"startup","error":"bootstrap: context deadline exceeded","level":"fatal","msg":"could not open cloud meta store","time":"2024-07-31T23:26:07Z"}
Once they hit the last line, after ~30s or so, the pods get killed and it restart:
{"action":"config_load","config_file_path":"/weaviate-config/conf.yaml","level":"info","msg":"Usage of the weaviate.conf.json file is deprecated and will be removed in the future. Please use environment variables.","time":"2024-07-31T23:33:48Z"}
{"deprecation":{"apiType":"Configuration","id":"config-files","locations":["--config-file=\"\""],"mitigation":"Configure Weaviate using environment variables.","msg":"use of deprecated command line argument --config-file","sinceTime":"2020-09-08T09:46:00.000Z","sinceVersion":"0.22.16","status":"deprecated"},"level":"warning","msg":"use of deprecated command line argument --config-file","time":"2024-07-31T23:33:48Z"}
{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-07-31T23:33:48Z"}
{"action":"startup","auto_schema_enabled":false,"level":"info","msg":"auto schema enabled setting is set to \"false\"","time":"2024-07-31T23:33:48Z"}
{"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-07-31T23:33:48Z"}
{"level":"info","msg":"async indexing enabled","time":"2024-07-31T23:33:48Z"}
{"level":"info","msg":"module offload-s3 is enabled","time":"2024-07-31T23:33:48Z"}
{"level":"info","msg":"open cluster service","servers":{"weaviate-0":8300,"weaviate-1":8300,"weaviate-2":8300},"time":"2024-07-31T23:33:50Z"}
{"address":"10.0.5.66:8301","level":"info","msg":"starting cloud rpc server ...","time":"2024-07-31T23:33:50Z"}
{"level":"info","msg":"starting raft sub-system ...","time":"2024-07-31T23:33:50Z"}
{"address":"10.0.5.66:8300","level":"info","msg":"tcp transport","tcpMaxPool":3,"tcpTimeout":10000000000,"time":"2024-07-31T23:33:50Z"}
{"level":"info","msg":"loading local db","time":"2024-07-31T23:33:50Z"}
{"level":"info","msg":"local DB successfully loaded","time":"2024-07-31T23:33:50Z"}
{"level":"info","msg":"schema manager loaded","n":0,"time":"2024-07-31T23:33:50Z"}
{"level":"info","metadata_only_voters":false,"msg":"construct a new raft node","name":"weaviate-0","time":"2024-07-31T23:33:50Z"}
{"action":"raft","index":616,"level":"info","msg":"raft initial configuration","servers":"[[{Suffrage:Voter ID:weaviate-2 Address:10.0.27.24:8300} {Suffrage:Voter ID:weaviate-1 Address:10.0.28.8:8300} {Suffrage:Voter ID:weaviate-0 Address:10.0.5.66:8300}]]","time":"2024-07-31T23:33:50Z"}
{"last_snapshot_index":0,"last_store_applied_index":0,"last_store_log_applied_index":553,"level":"info","msg":"raft node constructed","raft_applied_index":0,"raft_last_index":616,"time":"2024-07-31T23:33:50Z"}
{"action":"raft","follower":{},"leader-address":"","leader-id":"","level":"info","msg":"raft entering follower state","time":"2024-07-31T23:33:50Z"}
{"action":"raft","last-leader-addr":"","last-leader-id":"","level":"warning","msg":"raft heartbeat timeout reached, starting election","time":"2024-07-31T23:33:52Z"}
{"action":"raft","level":"info","msg":"raft entering candidate state","node":{},"term":917,"time":"2024-07-31T23:33:52Z"}
{"docker_image_tag":"1.26.1","level":"info","msg":"configured versions","server_version":"1.26.1","time":"2024-07-31T23:33:52Z"}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50051","time":"2024-07-31T23:33:52Z"}
{"action":"restapi_management","docker_image_tag":"1.26.1","level":"info","msg":"Serving weaviate at http://[::]:8080","time":"2024-07-31T23:33:52Z"}
{"level":"warning","msg":"raft Election timeout reached, restarting election","time":"2024-07-31T23:33:53Z"}
{"action":"raft","level":"info","msg":"raft entering candidate state","node":{},"term":918,"time":"2024-07-31T23:33:53Z"}
{"level":"warning","msg":"raft Election timeout reached, restarting election","time":"2024-07-31T23:33:54Z"}
{"action":"raft","level":"info","msg":"raft entering candidate state","node":{},"term":919,"time":"2024-07-31T23:33:54Z"}
{"level":"warning","msg":"raft Election timeout reached, restarting election","time":"2024-07-31T23:33:56Z"}
{"action":"raft","level":"info","msg":"raft entering candidate state","node":{},"term":920,"time":"2024-07-31T23:33:56Z"}
{"action":"raft","follower":{},"leader-address":"","leader-id":"","level":"info","msg":"raft entering follower state","time":"2024-07-31T23:33:57Z"}
{"action":"raft","error":"log not found","last-index":616,"level":"warning","msg":"raft failed to get previous log","previous-index":620,"time":"2024-07-31T23:33:57Z"}
{"address":"10.0.28.8:8300","level":"info","msg":"current Leader","time":"2024-07-31T23:33:58Z"}
Server Setup Information
- Weaviate Server Version: 1.26.1
- Deployment Method: Kubernetes
- Multi Node? Number of Running Nodes: 3
- Client Language and Version:
- Multitenancy?: No