Weaviate backup failing: cannot resolve hostname for "weaviate-3"

Description

We have a Weaviate cluster deployed via Helm on Azure Kubernetes Service (AKS), backed by Azure PVC. The cluster has successfully ingested a substantial number of documents.

However, when attempting to initiate a backup, the process fails with an error stating that the cluster cannot resolve the hostname for "weaviate-3".

We suspect this issue might be related to manually increasing the replica count from 3 to 4 at some point, although we’re not entirely certain. Currently, there are only 3 pods running in the cluster.
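
For reference, here is roughly how we trigger the failing operation (a minimal sketch using the Python client v4; the hostname, API key, and backup ID below are placeholders, not our actual values):

import weaviate

# Connect to the cluster; API-key auth is enabled in our deployment (all values are placeholders).
client = weaviate.connect_to_custom(
    http_host="weaviate.example.com", http_port=80, http_secure=False,
    grpc_host="weaviate.example.com", grpc_port=50051, grpc_secure=False,
    auth_credentials=weaviate.auth.AuthApiKey("ADMIN_API_KEY"),
)

# Trigger a backup to the Azure backend provided by the backup-azure module.
result = client.backup.create(
    backup_id="backup-2024-01-01",
    backend="azure",
    wait_for_completion=True,
)
print(result)

client.close()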


Server Setup Information

  • Cluster Type: AKS (Azure Kubernetes Service)

  • Deployment Method: Helm

  • Weaviate Version: 1.30.0

  • Modules Enabled: backup-azure

Environment Variables / Helm Values:

- name: AUTHENTICATION_APIKEY_ENABLED
  value: "true"
- name: AUTHENTICATION_APIKEY_USERS
  value: admin
- name: CLUSTER_DATA_BIND_PORT
  value: "7001"
- name: CLUSTER_GOSSIP_BIND_PORT
  value: "7000"
- name: GOGC
  value: "100"
- name: PROMETHEUS_MONITORING_ENABLED
  value: "false"
- name: PROMETHEUS_MONITORING_GROUP
  value: "false"
- name: QUERY_MAXIMUM_RESULTS
  value: "100000"
- name: RAFT_BOOTSTRAP_TIMEOUT
  value: "600"
- name: REINDEX_VECTOR_DIMENSIONS_AT_STARTUP
  value: "false"
- name: TRACK_VECTOR_DIMENSIONS
  value: "false"
- name: AUTHENTICATION_APIKEY_ALLOWED_KEYS
  valueFrom:
    secretKeyRef:
      name: weaviate-secret
      key: AUTHENTICATION_APIKEY_ALLOWED_KEYS
- name: RUNTIME_OVERRIDES_ENABLED
  value: "false"
- name: RUNTIME_OVERRIDES_PATH
  value: /config/overrides.yaml
- name: RUNTIME_OVERRIDES_LOAD_INTERVAL
  value: 2m
- name: CLUSTER_BASIC_AUTH_USERNAME
  valueFrom:
    secretKeyRef:
      name: weaviate-cluster-api-basic-auth
      key: username
- name: CLUSTER_BASIC_AUTH_PASSWORD
  valueFrom:
    secretKeyRef:
      name: weaviate-cluster-api-basic-auth
      key: password
- name: PERSISTENCE_DATA_PATH
  value: /var/lib/weaviate
- name: DEFAULT_VECTORIZER_MODULE
  value: none
- name: ENABLE_MODULES
  value: backup-azure
- name: RAFT_JOIN
  value: weaviate-0,weaviate-1,weaviate-2
- name: RAFT_BOOTSTRAP_EXPECT
  value: "3"
- name: BACKUP_AZURE_CONTAINER
  value: weaviate-backups
- name: AZURE_STORAGE_CONNECTION_STRING
  valueFrom:
    secretKeyRef:
      name: weaviate-secret
      key: AZURE_STORAGE_CONNECTION_STRING
- name: CLUSTER_JOIN
  value: weaviate-headless.weaviate.svc.cluster.local.


Cluster Details

  • Weaviate Version: 1.30.0

  • Number of Running Nodes: 3

  • Multitenancy: Enabled


Additional Information

  • Backups are configured using the backup-azure module

  • The suspected root cause is a mismatch between the Helm configuration (RAFT_BOOTSTRAP_EXPECT=3) and manual scaling (replicas=4).
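
One way to check this suspicion (a sketch with the Python client v4; connection details and the API key are placeholders) is to list the nodes the cluster still tracks. If a node name such as weaviate-3 shows up here while only 3 pods are running, that would explain the hostname resolution failure:

import weaviate

# Connect with the admin API key (placeholder values).
client = weaviate.connect_to_custom(
    http_host="weaviate.example.com", http_port=80, http_secure=False,
    grpc_host="weaviate.example.com", grpc_port=50051, grpc_secure=False,
    auth_credentials=weaviate.auth.AuthApiKey("ADMIN_API_KEY"),
)

# List the nodes (and their shards) that the cluster currently knows about.
for node in client.cluster.nodes(output="verbose"):
    print(node.name, node.status)

client.close()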

Hi!

Welcome to our community :hugs:

Was the backup created while the cluster had a replica factor of 4, and do you now want to restore it to a factor-3 cluster?

You should have exactly the same number of pods when backing up and when restoring.
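
For reference, the restore would then be triggered roughly like this (a minimal sketch with the Python client v4, run against a cluster with the same node count; the backup ID and connection details are placeholders):

import weaviate

# Connect to the target cluster, which must have the same number of nodes as the source.
client = weaviate.connect_to_custom(
    http_host="weaviate.example.com", http_port=80, http_secure=False,
    grpc_host="weaviate.example.com", grpc_port=50051, grpc_secure=False,
    auth_credentials=weaviate.auth.AuthApiKey("ADMIN_API_KEY"),
)

# Restore the previously created Azure backup by its ID.
result = client.backup.restore(
    backup_id="backup-2024-01-01",
    backend="azure",
    wait_for_completion=True,
)
print(result)

client.close()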

Let me know if this is the scenario.

Thanks!

Hi @DudaNogueira

Thanks for getting back to us. We identified that scaling down the cluster hosting our multi-tenancy-enabled collection caused this behavior. We were able to take the backup after bringing the cluster size back to 4. I assume this is related to Raft state information stored on the other nodes. (The 4th node was not holding any tenant shards before, though.)

Also, thank you for confirming that restoring requires the exact same number of nodes.

Hi @Tibin!

Glad it all worked out. Our team is working on a shard movement feature that will allow you to increase and tweak replication factors, drain nodes, etc.

For now, unless you are only using multi-tenant collections, the best way to grow your cluster is to migrate to a new one with the necessary resources.

Thanks!
