Description
Hello, I followed this doc to upgrade Weaviate from 1.22.6 to 1.25.0 on a 3-replica cluster. The upgrade resulted in an incomplete Raft schema migration: 7 out of 19 classes became inaccessible and failed with “shard not found” errors. The issue only occurs on multi-replica clusters; a single-replica staging cluster upgraded successfully. There were no errors in the pod logs during the upgrade: the pods started successfully and did not crash.
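
To narrow down which classes are affected, one option is to compare the class list in the schema with the shards each node reports. The following is a minimal sketch, not a confirmed diagnostic procedure. It assumes the `GET /v1/schema` and `GET /v1/nodes` REST endpoints, that the nodes response carries a `shards` list per node (newer versions may only include it with `?output=verbose`), and API-key auth via a bearer token. The helper names `fetch_json`, `classes_without_shards`, and `report` are hypothetical.

```python
import json
import urllib.request


def fetch_json(base_url, path, api_key):
    """GET a Weaviate REST path and decode the JSON body."""
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def classes_without_shards(schema, nodes):
    """Return schema classes for which no node reports any shard."""
    declared = {c["class"] for c in schema.get("classes", [])}
    seen = set()
    for node in nodes.get("nodes", []):
        # "shards" may be missing or null for a node that holds no data
        for shard in node.get("shards") or []:
            seen.add(shard["class"])
    return sorted(declared - seen)


def report(base_url, api_key):
    """Print every class that exists in the schema but has no reported shard."""
    schema = fetch_json(base_url, "/v1/schema", api_key)
    nodes = fetch_json(base_url, "/v1/nodes?output=verbose", api_key)
    for name in classes_without_shards(schema, nodes):
        print("no shards reported for class:", name)
```

Calling `report("http://localhost:8080", "<api key>")` against a port-forwarded pod would list candidate classes. A multi-tenant class whose tenants are all inactive may also show up here without being broken, so the output is a starting point rather than proof.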
Server Setup Information
- Weaviate Server Version: 1.22.6 (attempted upgrade to 1.25.0, rolled back)
- Deployment Method: Helm chart on AWS EKS Fargate
- Multi Node? Number of Running Nodes: Yes, 3 nodes (StatefulSet with 3 replicas)
- Client Language and Version: Python client v3
- Multitenancy?: Yes - 5 out of 19 classes use multi-tenancy with 23-39 tenants each
Additional Information
Environment Details
Platform: AWS EKS Fargate
Storage: AWS EFS with persistent volumes
ReplicationFactor: 1 (no data redundancy across nodes)
image:
  registry: docker.io
  tag: 1.22.6
  repo: semitechnologies/weaviate
  pullPolicy: IfNotPresent
  pullSecrets: []
command: ["/bin/weaviate"]
args:
  - '--host'
  - '0.0.0.0'
  - '--port'
  - '8080'
  - '--scheme'
  - 'http'
  - '--config-file'
  - '/weaviate-config/conf.yaml'
  - --read-timeout=120s
  - --write-timeout=120s
initContainers:
  sysctlInitContainer:
    enabled: true
    sysctlVmMaxMapCount: 524288
    image:
      registry: docker.io
      repo: alpine
      tag: latest
      pullPolicy: IfNotPresent
  extraInitContainers: {}
# 3-replica cluster configuration
replicas: 3
# Resource configuration
resources:
  requests:
    cpu: '16000m'
    memory: '64Gi'
  limits:
    cpu: '16000m'
    memory: '80Gi'
securityContext: {}
serviceAccountName:
# Persistent storage using AWS EFS
storage:
  size: 100Gi
  storageClassName: "efs-sc"
# Service configuration (NodePort for internal ALB)
service:
  name: weaviate
  ports:
    - name: http
      protocol: TCP
      port: 80
  type: NodePort
  loadBalancerSourceRanges: []
  clusterIP:
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/cross-zone-load-balancing-enabled: "true"
    alb.ingress.kubernetes.io/scheme: internet-facing
    # Subnet IDs redacted
    service.beta.kubernetes.io/aws-load-balancer-subnets: <REDACTED>
    service.beta.kubernetes.io/aws-load-balancer-type: "ip"
# Probes configuration
startupProbe:
  enabled: false
  initialDelaySeconds: 300
  periodSeconds: 60
  failureThreshold: 50
  successThreshold: 1
  timeoutSeconds: 3
livenessProbe:
  initialDelaySeconds: 900
  periodSeconds: 10
  failureThreshold: 30
  successThreshold: 1
  timeoutSeconds: 3
readinessProbe:
  initialDelaySeconds: 120
  periodSeconds: 10
  failureThreshold: 10
  successThreshold: 1
  timeoutSeconds: 15
terminationGracePeriodSeconds: 150
# Weaviate Authentication Configuration
authentication:
  apikey:
    enabled: true
    allowed_keys:
      - '<REDACTED_API_KEY_1>'
      - '<REDACTED_API_KEY_2>'
    users:
      - admin@example.com
      - readonly@example.com
  anonymous_access:
    enabled: false
  oidc:
    enabled: false
# Authorization Configuration
authorization:
  admin_list:
    enabled: true
    users:
      - admin@example.com
    readonly_users:
      - readonly@example.com
query_defaults:
  limit: 100
debug: false
# Environment variables
env:
  CLUSTER_GOSSIP_BIND_PORT: 7000
  CLUSTER_DATA_BIND_PORT: 7001
  # Aggressive GC settings for memory management
  GOGC: 50
  LIMIT_RESOURCES: true
  # Prometheus metrics enabled
  PROMETHEUS_MONITORING_ENABLED: true
  # GOMEMLIMIT set to 60GB (64424509440 bytes)
  # Note: This is critical for preventing OOM kills
  GOMEMLIMIT: "64424509440"
  # Query limits
  QUERY_MAXIMUM_RESULTS: 15000
  # Vector dimension tracking disabled for performance
  TRACK_VECTOR_DIMENSIONS: false
  REINDEX_VECTOR_DIMENSIONS_AT_STARTUP: false
envSecrets: {}
# Backup providers (all disabled)
backups:
  filesystem:
    enabled: false
  s3:
    enabled: false
  gcs:
    enabled: false
  azure:
    enabled: false
# Modules configuration - all disabled (no vectorization modules)
modules:
  text2vec-contextionary:
    enabled: false
  text2vec-transformers:
    enabled: false
  text2vec-openai:
    enabled: false
  text2vec-huggingface:
    enabled: false
  text2vec-cohere:
    enabled: false
  text2vec-palm:
    enabled: false
  ref2vec-centroid:
    enabled: false
  multi2vec-clip:
    enabled: false
  qna-transformers:
    enabled: false
  qna-openai:
    enabled: false
  generative-openai:
    enabled: false
  generative-cohere:
    enabled: false
  generative-palm:
    enabled: false
  img2vec-neural:
    enabled: false
  reranker-cohere:
    enabled: false
  reranker-transformers:
    enabled: false
  text-spellcheck:
    enabled: false
  ner-transformers:
    enabled: false
  sum-transformers:
    enabled: false
  # No default vectorizer - using external embedding services
  default_vectorizer_module: none
custom_config_map:
  enabled: false
  name: 'custom-config'
annotations:
nodeSelector:
tolerations:
# Pod anti-affinity to spread replicas across nodes
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          topologyKey: "kubernetes.io/hostname"
          labelSelector:
            matchExpressions:
              - key: "app"
                operator: In
                values:
                  - weaviate
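
The memory settings in the values above can be cross-checked with a little arithmetic. This sketch only confirms that the `GOMEMLIMIT` value equals 60 GiB and sits below the 64Gi request and the 80Gi limit. The helper `parse_quantity` is hypothetical and handles only the binary `Ki`/`Mi`/`Gi`/`Ti` suffixes used in this file, not the full Kubernetes quantity grammar.

```python
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_quantity(value):
    """Convert a quantity such as '64Gi', or a plain byte count, to bytes."""
    for suffix, factor in UNITS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)


gomemlimit = parse_quantity("64424509440")  # env.GOMEMLIMIT
request = parse_quantity("64Gi")            # resources.requests.memory
limit = parse_quantity("80Gi")              # resources.limits.memory

# Is the soft Go heap limit exactly 60 GiB, as the comment in the values says?
print("GOMEMLIMIT is 60 GiB:", gomemlimit == 60 * 1024**3)
# Does it leave headroom below both the pod request and the pod limit?
print("GOMEMLIMIT < request < limit:", gomemlimit < request < limit)
print("headroom below request (GiB):", (request - gomemlimit) // 1024**3)
```

Since `GOMEMLIMIT` is a soft limit on the Go runtime, memory it does not account for (mmap'd files, for example) still counts against the pod limit, so the remaining headroom is worth keeping an eye on.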