Horizontal Scaling or Upgrade issue - Weaviate cluster

Hi Team,

We see the error below when we try to upgrade the Weaviate cluster (changing the image tag version) or scale it (changing the replicas value).

{"deprecation":{"apiType":"Configuration","id":"config-files","locations":["--config-file=\"\""],"mitigation":"Configure Weaviate using environment variables.","msg":"use of deprecated command line argument --config-file","sinceTime":"2020-09-08T09:46:00.000Z","sinceVersion":"0.22.16","status":"deprecated"},"level":"warning","msg":"use of deprecated command line argument --config-file","time":"2024-04-22T03:18:12Z"}
{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-04-22T03:18:12Z"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-04-22T03:18:12Z"}
{"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-04-22T03:18:12Z"}
{"action":"broadcast_abort_transaction","error":"host \"****:7001\": unexpected status code 401: ","id":"97fec38d-b981-40db-a038-7b70e72595f0","level":"error","msg":"broadcast tx abort failed","time":"2024-04-22T03:18:12Z"}
{"action":"startup","error":"could not load or initialize schema: sync schema with other nodes in the cluster: read schema: open transaction: broadcast open transaction: host \"****:7001\": unexpected status code 401 ()","level":"fatal","msg":"could not initialize schema manager","time":"2024-04-22T03:18:12Z"}
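
For anyone triaging similar startup logs: each line above is a single JSON object, so the error and fatal entries can be filtered out programmatically. Below is a minimal sketch, assuming the pod log has been saved as text. `failing_entries` is a hypothetical helper name, not part of Weaviate or its client, and the sample holds two lines copied from the log above.

```python
import json


def failing_entries(log_text, levels=("error", "fatal")):
    """Return (level, msg, error) tuples for JSON log lines at the given levels."""
    found = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and non-JSON output such as kubelet events
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore truncated or malformed lines
        if entry.get("level") in levels:
            found.append((entry["level"], entry.get("msg", ""), entry.get("error", "")))
    return found


# Two lines copied from the log above: one info entry, one error entry.
sample = r'''
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-04-22T03:18:12Z"}
{"action":"broadcast_abort_transaction","error":"host \"****:7001\": unexpected status code 401: ","id":"97fec38d-b981-40db-a038-7b70e72595f0","level":"error","msg":"broadcast tx abort failed","time":"2024-04-22T03:18:12Z"}
'''

for level, msg, err in failing_entries(sample):
    print(level, "|", msg, "|", err)
```

Only the entry with the 401 from port 7001 survives the filter here, which makes it easier to compare the failing host across restarts.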

A new controller revision is created automatically and tries to perform the operation in a rolling fashion.

But the pods don’t come up because of the above errors.

Note:

When I completely delete the StatefulSet, Weaviate scaling or upgrade works fine. But we need a rolling update. Let me know if any changes need to be made to values.yaml.

Regards,
Adithya

hi @adithya.ch ! I have changed the category of this thread to Support

This happens both when you upgrade and when you try to scale? Not sure I understood this part.

Can you reproduce this on a test environment?
What versions are you upgrading from and to?

I assume you are just changing the version or the replicas in the values of our helm chart, right?

Let me know this info so we can figure this out.

Thanks!

Hello @DudaNogueira

Yes, we see the same error both when we upgrade and when we try to scale.

We just changed the image tag version from 1.24.3 to 1.24.10 for the upgrade, and the replicas parameter from 3 to 5 for scaling.

I am testing all of the above actions in a test k8s cluster.

Error

Back-off restarting failed container weaviate in pod weaviate-5_vector(a4619704-4d70-4951-9afc-995601ad0045)

{"action":"broadcast_abort_transaction","error":"host \"10.36.6.100:7001\": unexpected status code 401: ","id":"6b95cb70-2d84-45c3-acd7-bfd6a17c3b55","level":"error","msg":"broadcast tx abort failed","time":"2024-04-22T13:08:48Z"}

{"action":"startup","error":"could not load or initialize schema: sync schema with other nodes in the cluster: read schema: open transaction: broadcast open transaction: host \"*****:7001\": unexpected status code 401 ()","level":"fatal","msg":"could not initialize schema manager","time":"2024-04-22T13:08:48Z"}

Regards,
Adithya

Also, the node status doesn’t show the newly added nodes.

nodes_status = client.cluster.get_nodes_status()
print(nodes_status)
[{'batchStats': {'queueLength': 0, 'ratePerSecond': 0}, 'gitHash': '86660ba', 'name': 'weaviate-0', 'shards': None, 'status': 'HEALTHY', 'version': '1.24.10'},
 {'batchStats': {'queueLength': 0, 'ratePerSecond': 0}, 'gitHash': '86660ba', 'name': 'weaviate-1', 'shards': None, 'status': 'HEALTHY', 'version': '1.24.10'},
 {'batchStats': {'queueLength': 0, 'ratePerSecond': 0}, 'gitHash': '86660ba', 'name': 'weaviate-2', 'shards': None, 'status': 'HEALTHY', 'version': '1.24.10'},
 {'batchStats': {'queueLength': 0, 'ratePerSecond': 0}, 'gitHash': '86660ba', 'name': 'weaviate-3', 'shards': None, 'status': 'HEALTHY', 'version': '1.24.10'},
 {'batchStats': {'queueLength': 0, 'ratePerSecond': 0}, 'gitHash': '86660ba', 'name': 'weaviate-4', 'shards': None, 'status': 'HEALTHY', 'version': '1.24.10'}]

Here I have changed replicas from 5 to 7.

Ideally, weaviate-5 and weaviate-6 should show up as unhealthy in the output of the above command.
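
That comparison can be sketched in a few lines. This is only an illustration, assuming the StatefulSet pods are named `weaviate-0` through `weaviate-(N-1)` as in the output above. `missing_nodes` is a hypothetical helper, and its input is the list returned by `client.cluster.get_nodes_status()` (abbreviated here to the two fields the check needs).

```python
def missing_nodes(nodes_status, expected_replicas, prefix="weaviate"):
    """Return expected pod names that are not reported as HEALTHY cluster nodes."""
    healthy = {
        node.get("name")
        for node in nodes_status
        if node.get("status") == "HEALTHY"
    }
    # Assumption: pods follow the StatefulSet naming scheme <prefix>-<ordinal>.
    expected = [f"{prefix}-{i}" for i in range(expected_replicas)]
    return [name for name in expected if name not in healthy]


# Abbreviated form of the node status above: five healthy nodes are reported.
reported = [{"name": f"weaviate-{i}", "status": "HEALTHY"} for i in range(5)]

print(missing_nodes(reported, expected_replicas=7))
# prints ['weaviate-5', 'weaviate-6']
```

With replicas set to 7, the two new pods are the ones that never joined the cluster, which matches the crash-looping weaviate-5 pod in the earlier error.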

Regards,
Adithya

Hello @DudaNogueira

Any suggestions on how to fix the issue? I see multiple old posts with the same error but don’t see a solution.

Regards,
Adithya

Hi @DudaNogueira, let me know if there is any update on the above-mentioned issue.

Thank you

hi @adithya.ch !

Does it persist?

Not sure how to fix this.

Can you provide step-by-step instructions to reproduce this? Then I can try to reach the same situation myself and explore more.

Thanks!

Hello @DudaNogueira

Yes, the issue still exists.

It’s similar to the thread “Rolling Update Not Working”.

We use ArgoCD to deploy the resources in an OpenShift cluster. We deployed using the helm chart after changing the variables in the values.yaml file for vertical/horizontal scaling.

  1. Downloaded the helm chart (templates / Chart.yaml / values.yaml)

In the gitops config:

targetRevision: develop
applicationConfig:
  path: vector-rcdn
  name: vector-rcdn
  namespace: vector
  helm:
    valueFiles:
      - values.yaml
    releaseName: weaviate-helm-rcdn

Based on this config, ArgoCD deploys the changes to the k8s cluster.

I changed the value of replicas for horizontal scaling.

I have attached the list of steps with errors.



Regards,
Adithya