Rolling Update Not Working

Hi, I am experiencing issues when re-deploying a multi-replica setup on K8s.
If I do a full delete of all created K8s objects and then deploy everything again, it launches as expected. However, when I attempt a rolling update over the existing pods, the first new replica to be added repeatedly hits the following error:

{"action":"startup", "error":"could not load or initialize schema: sync schema with other nodes in the cluster: read schema: open transaction: broadcast open transaction: host "xxx.xx.xx.xxx:7001": unexpected status code 401 ()", "level":"fatal", "msg":"could not initialize schema manager"}

The rolling update then just gets stuck in a crash loop while this repeatedly happens.

Any help that can be offered would be greatly appreciated.

Hi!

Just to confirm, you are using our helm chart, right?

It seems to be a connectivity issue between pods :thinking:

Yes, I'm using your Helm templates.

Can you detail the steps? I would like to reproduce this.

Thanks!

I’m using the Helm chart and observing the same issue. I use ArgoCD to manage my Weaviate deployment. After one pod gets replaced, the others report that the pod is not available but keep trying to reach it at its old IP address. Somehow the old pods never figure out what the new pod's IP address is, and I don’t know which resource is responsible for updating the IP.

This behaviour doesn’t occur, however, if one of the pods is deleted. When a pod is deleted, the rest of the pods notice that it went away, and some time after the new one is created they stop reporting the error.

@walter, @Landon_Edwards, thanks for reporting. Could you provide us the logs from the nodes, from startup until that error is reported? That would help us detect the exact problem.

Hi @MohamedBadawi,

We are experiencing the same issue. These are our logs:

{"action":"startup","level":"debug","msg":"created startup context, nothing done so far","startup_time_left":"59m59.998652288s","time":"2024-03-21T15:00:49Z"}
{"action":"config_load","config_file_path":"/weaviate-config/conf.yaml","level":"info","msg":"Usage of the weaviate.conf.json file is deprecated and will be removed in the future. Please use environment variables.","time":"2024-03-21T15:00:49Z"}
{"deprecation":{"apiType":"Configuration","id":"config-files","locations":["--config-file=\"\""],"mitigation":"Configure Weaviate using environment variables.","msg":"use of deprecated command line argument --config-file","sinceTime":"2020-09-08T09:46:00.000Z","sinceVersion":"0.22.16","status":"deprecated"},"level":"warning","msg":"use of deprecated command line argument --config-file","time":"2024-03-21T15:00:49Z"}
{"action":"startup","default_vectorizer_module":"text2vec-huggingface","level":"info","msg":"the default vectorizer modules is set to \"text2vec-huggingface\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-03-21T15:00:49Z"}
{"action":"startup","auto_schema_enabled":false,"level":"info","msg":"auto schema enabled setting is set to \"false\"","time":"2024-03-21T15:00:49Z"}
{"action":"startup","level":"debug","msg":"config loaded","startup_time_left":"59m59.997784462s","time":"2024-03-21T15:00:49Z"}
{"action":"startup","level":"debug","msg":"configured OIDC and anonymous access client","startup_time_left":"59m59.997761381s","time":"2024-03-21T15:00:49Z"}
{"action":"startup","level":"debug","msg":"initialized schema","startup_time_left":"59m59.997737462s","time":"2024-03-21T15:00:49Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with:  10.42.11.190:7000","time":"2024-03-21T15:00:49Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with:  10.42.12.82:7000","time":"2024-03-21T15:00:49Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.42.12.82:47438","time":"2024-03-21T15:00:49Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with:  10.42.12.157:7000","time":"2024-03-21T15:00:49Z"}
{"action":"startup","level":"debug","msg":"startup routine complete","time":"2024-03-21T15:00:49Z"}
{"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-03-21T15:00:49Z"}
{"action":"startup","level":"debug","msg":"start registering modules","time":"2024-03-21T15:00:49Z"}
{"action":"startup","level":"debug","module":"reranker-cohere","msg":"enabled module","time":"2024-03-21T15:00:49Z"}
{"action":"startup","level":"debug","module":"text2vec-openai","msg":"enabled module","time":"2024-03-21T15:00:49Z"}
{"action":"startup","level":"debug","module":"text2vec-huggingface","msg":"enabled module","time":"2024-03-21T15:00:49Z"}
{"action":"startup","level":"debug","module":"text2vec-cohere","msg":"enabled module","time":"2024-03-21T15:00:49Z"}
{"action":"startup","level":"debug","msg":"completed registering modules","time":"2024-03-21T15:00:49Z"}
{"level":"info","msg":"async indexing enabled","time":"2024-03-21T15:00:49Z"}
{"action":"broadcast_abort_transaction","error":"host \"10.42.12.157:7001\": unexpected status code 401: ","id":"ce132362-4a30-4383-adb2-44092fa40d11","level":"error","msg":"broadcast tx abort failed","time":"2024-03-21T15:00:49Z"}
{"action":"startup","error":"could not load or initialize schema: sync schema with other nodes in the cluster: read schema: open transaction: broadcast open transaction: host \"10.42.12.157:7001\": unexpected status code 401 ()","level":"fatal","msg":"could not initialize schema manager","time":"2024-03-21T15:00:49Z"}
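
In case it helps anyone triaging similar output: a minimal sketch that filters a log dump like the one above down to its error and fatal entries. It assumes each line is a standalone JSON object with a `level` field, as in these logs. The function name `failing_entries` and the shortened sample lines are made up for illustration and are not part of Weaviate.

```python
import json


def failing_entries(log_text, levels=("error", "fatal")):
    """Return the parsed log entries whose "level" is one of `levels`."""
    entries = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a valid JSON object
        if isinstance(entry, dict) and entry.get("level") in levels:
            entries.append(entry)
    return entries


# Shortened sample lines modelled on the logs above (illustrative only).
sample = "\n".join([
    r'{"action":"startup","level":"debug","msg":"startup routine complete"}',
    r'{"action":"broadcast_abort_transaction","error":"host \"10.42.12.157:7001\": unexpected status code 401: ","level":"error","msg":"broadcast tx abort failed"}',
    r'{"action":"startup","error":"unexpected status code 401 ()","level":"fatal","msg":"could not initialize schema manager"}',
])

for entry in failing_entries(sample):
    print(entry["level"], "|", entry["msg"], "|", entry.get("error", ""))
```

Run against the full dump, this leaves only the two 401 entries, both pointing at port 7001 on the same peer.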

Thanks @joris for reporting. As mentioned in the thread, it’s a connectivity issue, see here. Can you detail the steps that led to this situation?

Hi @MohamedBadawi,

The connectivity issue happens when you have rolling updates enabled and one of the shards has a different version than the other.
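
A rough way to confirm a version mix during a rollout is to compare the version each node reports. This is only a sketch: it assumes the node status has already been fetched and parsed into a dict with a `nodes` list whose items carry `name` and `version` (I believe the `/v1/nodes` endpoint returns this shape, but please verify against your Weaviate version). The helper name, node names and version numbers below are hypothetical.

```python
from collections import defaultdict


def versions_by_node(nodes_payload):
    """Group node names by the version string each node reports."""
    groups = defaultdict(list)
    for node in nodes_payload.get("nodes", []):
        version = node.get("version", "unknown")
        groups[version].append(node.get("name", "?"))
    return dict(groups)


# Hypothetical payload: two nodes on the old version, one already updated.
payload = {
    "nodes": [
        {"name": "weaviate-0", "version": "1.24.1"},
        {"name": "weaviate-1", "version": "1.24.1"},
        {"name": "weaviate-2", "version": "1.24.4"},
    ]
}

groups = versions_by_node(payload)
mixed = len(groups) > 1  # more than one distinct version means a mixed cluster
print("mixed versions:", mixed, groups)
```

If `mixed` is true while the new pod is crash-looping, that matches the situation described here.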

Hi @joris, were you able to fix this issue? We are also seeing the same issue.
It would be of great help if you could share the details of the fix.

Regards,
Adithya