Fatal error during Weaviate 1.25 startup: could not open cloud meta store

Description

Our single Weaviate pod repeatedly fails during startup with the fatal error “could not open cloud meta store”. Whether the Pod eventually starts successfully after multiple restarts seems to depend on how much data is in the database. What could be the reason for this error? There is only a single Weaviate pod in our Kubernetes cluster, so there is no need for any communication between pods. Is there a way to disable the “join cluster” logic as long as there is only one Pod? RAFT_BOOTSTRAP_EXPECT is already set to 1, so I would expect the system to know that no other pods are running.

Server Setup Information

  • Weaviate Server Version: 1.25.5
  • Deployment Method: k8s using Helm
  • Multi Node? Number of Running Nodes: 1
  • Multitenancy?: no

Any additional Information

Bootstrap related logs:

{"action":"bootstrap","error":"could not join a cluster from [10.130.37.168:8300]","level":"warning","msg":"failed to join cluster, will notify next if voter","servers":["10.130.37.168:8300"],"time":"2024-07-01T11:06:49Z","voter":true}
{"action":"bootstrap","candidates":[{"Suffrage":0,"ID":"weaviate-0","Address":"10.130.37.168:8300"}],"level":"info","msg":"starting cluster bootstrapping","time":"2024-07-01T11:06:49Z"}
{"action":"bootstrap","error":"bootstrap only works on new clusters","level":"error","msg":"could not bootstrapping cluster","time":"2024-07-01T11:06:49Z"}
{"action":"bootstrap","level":"info","msg":"notified peers this node is ready to join as voter","servers":["10.130.37.168:8300"],"time":"2024-07-01T11:06:49Z"}
{"action":"bootstrap","error":"could not join a cluster from [10.130.37.168:8300]","level":"warning","msg":"failed to join cluster, will notify next if voter","servers":["10.130.37.168:8300"],"time":"2024-07-01T11:08:18Z","voter":true}
{"action":"bootstrap","error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded","level":"error","msg":"notify all peers","servers":["10.130.37.168:8300"],"time":"2024-07-01T11:08:18Z"}
{"action":"startup","error":"bootstrap: context deadline exceeded","level":"fatal","msg":"could not open cloud meta store","time":"2024-07-01T11:08:18Z"}

Environment variables of the StatefulSet:

- name: AUTHENTICATION_APIKEY_ENABLED
  value: 'true'
- name: AUTHENTICATION_APIKEY_USERS
  value: 'api-key-user-readOnly,api-key-user-admin'
- name: AUTHORIZATION_ADMINLIST_ENABLED
  value: 'true'
- name: AUTHORIZATION_ADMINLIST_READONLY_USERS
  value: api-key-user-readOnly
- name: AUTHORIZATION_ADMINLIST_USERS
  value: api-key-user-admin
- name: AUTOSCHEMA_ENABLED
  value: 'false'
- name: CLUSTER_DATA_BIND_PORT
  value: '7001'
- name: CLUSTER_GOSSIP_BIND_PORT
  value: '7000'
- name: DISABLE_TELEMETRY
  value: 'true'
- name: GOGC
  value: '100'
- name: LIMIT_RESOURCES
  value: 'true'
- name: LOG_LEVEL
  value: debug
- name: PROMETHEUS_MONITORING_ENABLED
  value: 'true'
- name: PROMETHEUS_MONITORING_GROUP
  value: 'false'
- name: PROMETHEUS_MONITORING_PORT
  value: '9091'
- name: QUERY_MAXIMUM_RESULTS
  value: '100000'
- name: REINDEX_VECTOR_DIMENSIONS_AT_STARTUP
  value: 'false'
- name: TIKTOKEN_CACHE_DIR
  value: /weaviate-backups/tiktoken_cache
- name: TRACK_VECTOR_DIMENSIONS
  value: 'false'
- name: AUTHENTICATION_APIKEY_ALLOWED_KEYS
  valueFrom:
    secretKeyRef:
      name: weaviate-api-keys
      key: AUTHENTICATION_APIKEY_ALLOWED_KEYS
- name: CLUSTER_BASIC_AUTH_USERNAME
  valueFrom:
    secretKeyRef:
      name: weaviate-cluster-api-basic-auth
      key: username
- name: CLUSTER_BASIC_AUTH_PASSWORD
  valueFrom:
    secretKeyRef:
      name: weaviate-cluster-api-basic-auth
      key: password
- name: STANDALONE_MODE
  value: 'true'
- name: PERSISTENCE_DATA_PATH
  value: /var/lib/weaviate
- name: DEFAULT_VECTORIZER_MODULE
  value: none
- name: ENABLE_MODULES
  value: 'text2vec-openai,backup-filesystem'
- name: RAFT_JOIN
  value: weaviate-0
- name: RAFT_BOOTSTRAP_EXPECT
  value: '1'
- name: BACKUP_FILESYSTEM_PATH
  value: /weaviate-backups
- name: CLUSTER_JOIN
  value: weaviate-headless.015461-skaios-dev.svc.cluster.local.
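
As a quick sanity check on the Raft-related variables above, the number of node names in RAFT_JOIN can be compared with RAFT_BOOTSTRAP_EXPECT. This is only a sketch: it assumes RAFT_JOIN is a comma-separated list of node names, and the function name `raft_settings_consistent` is hypothetical, not part of Weaviate.

```python
def raft_settings_consistent(env: dict[str, str]) -> bool:
    # Assumption: RAFT_JOIN holds a comma-separated list of node names.
    voters = [name.strip() for name in env.get("RAFT_JOIN", "").split(",") if name.strip()]
    # Assumption: RAFT_BOOTSTRAP_EXPECT is the expected number of those nodes.
    expect = int(env.get("RAFT_BOOTSTRAP_EXPECT", "0"))
    return len(voters) == expect


# Values taken from the StatefulSet environment above.
env = {"RAFT_JOIN": "weaviate-0", "RAFT_BOOTSTRAP_EXPECT": "1"}
print(raft_settings_consistent(env))
```

For the values in this StatefulSet the two settings agree under that assumption, so a simple mismatch between them does not appear to explain the failure.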

hi @andrewisplinghoff !!

Welcome to our community!

That’s strange.

Have you upgraded this cluster, or was it a clean install at 1.25.5?

Hi @DudaNogueira,

We upgraded from 1.24.x. This error only started occurring with 1.25.x, which makes sense, I guess, because the RAFT algorithm is new in 1.25.x.

Oh, in that case, have you followed this migration guide?

Sorry for the delay :grimacing:

I believe this may be related.

Otherwise, I will need to escalate this internally. I have seen some similar log errors, but that was usually for multi node clusters.

Yes, we followed the migration instructions.

Have there been any fixes related to this in 1.25.7? The Pod came up right away with the same data it was failing with earlier (maybe we just got lucky though).

Looks like we just got lucky yesterday. Today the pod again failed on the first attempt with the “could not open cloud meta store” error when using 1.25.7, but at least it worked after one restart.

Hi, sorry! I was out of office those days.

Did the new version fix it? I'm not sure I understood.

Upgrading fixed it, but then it failed again and came back after a restart?

No, it’s not fixed. The error also happened with the latest version. After it happened, the Pod was restarted, and on the second try it came up successfully. Looks like some sort of race condition to me.

OK, can you open a GitHub issue for that?

I will escalate it with our team.

Thanks!

Thanks & done: “could not open cloud meta store” error during Weaviate Startup using one node · Issue #5362 · weaviate/weaviate (github.com)