Description
We have a Weaviate cluster deployed via Helm on Azure Kubernetes Service (AKS), with data stored on Azure-backed persistent volume claims (PVCs). The cluster has successfully ingested a substantial number of documents.
However, when we try to start a backup, it fails with an error stating that a node “cannot resolve hostname for”.
We suspect this is related to having manually increased the replica count from 3 to 4 at some point, although we are not entirely certain. Currently, only 3 pods are running in the cluster.
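One way to test the scaling hypothesis is to compare the node names Weaviate reports through its REST endpoint GET /v1/nodes with the pods that are actually running. The sketch below is a minimal illustration, not a definitive diagnostic. It assumes the REST API is reachable at a local URL (for example through a kubectl port-forward) and that one of the configured API keys is sent as a Bearer token. The helper names fetch_nodes and stale_nodes are hypothetical. If the leftover node exists only in the Raft membership, it may not appear in this list at all.

```python
import json
import urllib.request


def fetch_nodes(base_url: str, api_key: str) -> dict:
    """Call GET /v1/nodes with API-key auth and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/nodes",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def stale_nodes(nodes_response: dict, running_pods: list[str]) -> list[str]:
    """Return node names Weaviate reports that have no matching running pod."""
    known = {node["name"] for node in nodes_response.get("nodes", [])}
    return sorted(known - set(running_pods))


if __name__ == "__main__":
    # Hypothetical response; in practice use fetch_nodes("http://localhost:8080", "<api-key>").
    # Only the "name" field of each node entry is used here.
    sample = {"nodes": [{"name": f"weaviate-{i}"} for i in range(4)]}
    print(stale_nodes(sample, ["weaviate-0", "weaviate-1", "weaviate-2"]))  # ['weaviate-3']
```

The list of running pods can be taken from `kubectl get pods` in the weaviate namespace; a non-empty result would point at a node the cluster still remembers after scaling back down.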
Server Setup Information
- Cluster Type: AKS (Azure Kubernetes Service)
- Deployment Method: Helm
- Weaviate Version: 1.30.0
- Modules Enabled: backup-azure
- Environment Variables / Helm Values:
- name: AUTHENTICATION_APIKEY_ENABLED
  value: "true"
- name: AUTHENTICATION_APIKEY_USERS
  value: admin
- name: CLUSTER_DATA_BIND_PORT
  value: "7001"
- name: CLUSTER_GOSSIP_BIND_PORT
  value: "7000"
- name: GOGC
  value: "100"
- name: PROMETHEUS_MONITORING_ENABLED
  value: "false"
- name: PROMETHEUS_MONITORING_GROUP
  value: "false"
- name: QUERY_MAXIMUM_RESULTS
  value: "100000"
- name: RAFT_BOOTSTRAP_TIMEOUT
  value: "600"
- name: REINDEX_VECTOR_DIMENSIONS_AT_STARTUP
  value: "false"
- name: TRACK_VECTOR_DIMENSIONS
  value: "false"
- name: AUTHENTICATION_APIKEY_ALLOWED_KEYS
  valueFrom:
    secretKeyRef:
      name: weaviate-secret
      key: AUTHENTICATION_APIKEY_ALLOWED_KEYS
- name: RUNTIME_OVERRIDES_ENABLED
  value: "false"
- name: RUNTIME_OVERRIDES_PATH
  value: /config/overrides.yaml
- name: RUNTIME_OVERRIDES_LOAD_INTERVAL
  value: 2m
- name: CLUSTER_BASIC_AUTH_USERNAME
  valueFrom:
    secretKeyRef:
      name: weaviate-cluster-api-basic-auth
      key: username
- name: CLUSTER_BASIC_AUTH_PASSWORD
  valueFrom:
    secretKeyRef:
      name: weaviate-cluster-api-basic-auth
      key: password
- name: PERSISTENCE_DATA_PATH
  value: /var/lib/weaviate
- name: DEFAULT_VECTORIZER_MODULE
  value: none
- name: ENABLE_MODULES
  value: backup-azure
- name: RAFT_JOIN
  value: weaviate-0,weaviate-1,weaviate-2
- name: RAFT_BOOTSTRAP_EXPECT
  value: "3"
- name: BACKUP_AZURE_CONTAINER
  value: weaviate-backups
- name: AZURE_STORAGE_CONNECTION_STRING
  valueFrom:
    secretKeyRef:
      name: weaviate-secret
      key: AZURE_STORAGE_CONNECTION_STRING
- name: CLUSTER_JOIN
  value: weaviate-headless.weaviate.svc.cluster.local.
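To reproduce the failure without a client library, a backup on the Azure backend configured above (backup-azure with BACKUP_AZURE_CONTAINER) can be started with POST /v1/backups/azure and polled with GET /v1/backups/azure/{id}. The following is a minimal sketch, assuming the REST API is reachable at http://localhost:8080 and that an API key from the configured secret is passed as a Bearer token. The helper names and the backup ID "my-backup" are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Assumption: REST API reachable here (e.g. via kubectl port-forward); adjust as needed.
BASE_URL = "http://localhost:8080"


def build_backup_request(base_url, api_key, backup_id, include=None):
    """Build a POST /v1/backups/azure request that starts a backup on the Azure backend."""
    body = {"id": backup_id}
    if include:
        # Optional: restrict the backup to specific collections.
        body["include"] = list(include)
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/backups/azure",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_status_request(base_url, api_key, backup_id):
    """Build a GET /v1/backups/azure/{id} request to poll the backup status."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/backups/azure/{backup_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(req):
    """Send a request; on a non-2xx status, return the code and raw error body instead of raising."""
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        return {"status_code": err.code, "body": err.read().decode("utf-8", "replace")}
```

For example, `send(build_backup_request(BASE_URL, "<api-key>", "my-backup"))` followed by `send(build_status_request(BASE_URL, "<api-key>", "my-backup"))` should surface the full error text, including the hostname the coordinator fails to resolve.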
Cluster Details
- Weaviate Version: 1.30.0
- Number of Running Nodes: 3
- Multitenancy: Enabled
Additional Information
- Backups are configured using the backup-azure module.
- The suspected root cause is a mismatch between the Helm configuration (RAFT_BOOTSTRAP_EXPECT=3) and the manual scaling to 4 replicas.
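As we understand them, RAFT_JOIN names the intended Raft voter nodes and RAFT_BOOTSTRAP_EXPECT is how many voters to wait for at bootstrap. The sketch below compares those two settings with a StatefulSet replica count. It assumes pods are named weaviate-0, weaviate-1, and so on, as RAFT_JOIN above suggests, and raft_config_issues is a hypothetical helper. It only checks configuration; it cannot see Raft membership already persisted on disk, which may still include weaviate-3 after scaling back to 3.

```python
def raft_config_issues(raft_join: str, bootstrap_expect: int, replicas: int,
                       prefix: str = "weaviate") -> list[str]:
    """Flag differences between RAFT_JOIN, RAFT_BOOTSTRAP_EXPECT and the replica count.

    Assumes StatefulSet pods are named f"{prefix}-{ordinal}".
    """
    voters = [name.strip() for name in raft_join.split(",") if name.strip()]
    pods = {f"{prefix}-{i}" for i in range(replicas)}
    issues = []
    if bootstrap_expect != len(voters):
        issues.append(f"RAFT_BOOTSTRAP_EXPECT={bootstrap_expect} but RAFT_JOIN lists {len(voters)} nodes")
    # Voters named in RAFT_JOIN that no pod provides at this replica count.
    if missing := sorted(set(voters) - pods):
        issues.append(f"RAFT_JOIN names nodes with no pod at replicas={replicas}: {missing}")
    # Pods that exist but are not listed in RAFT_JOIN.
    if outside := sorted(pods - set(voters)):
        issues.append(f"pods not listed in RAFT_JOIN at replicas={replicas}: {outside}")
    return issues


if __name__ == "__main__":
    join = "weaviate-0,weaviate-1,weaviate-2"  # RAFT_JOIN from the values above
    for replicas in (3, 4):
        print(replicas, raft_config_issues(join, 3, replicas))
```

With the values above, replicas=3 produces no findings, while replicas=4 reports weaviate-3 as a pod outside RAFT_JOIN, which matches the mismatch we suspect.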