Pods hang in "Terminating" state, continue performing async replication checks

Description

Hello!
I migrated some collections and used replication (async-enabled) for the first time. I waited some time and manually checked the shard object count to verify all my objects were migrated/replicated/synced, and after that I applied some simple environment variable changes to the Helm chart.
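
For anyone wanting to script the manual shard object count check mentioned above, here is a minimal sketch. It assumes a payload shaped like the verbose node status response (`GET /v1/nodes?output=verbose`), with each node listing its shards under `class`, `name`, and `objectCount`. The helper name `shard_count_mismatches` and the sample data are hypothetical, and the field names should be checked against your server version.

```python
from collections import defaultdict


def shard_count_mismatches(nodes_payload):
    """Return {(class, shard): {node: objectCount}} for shards whose replicas disagree.

    Assumes nodes_payload looks like {"nodes": [{"name": ..., "shards": [...]}]},
    where each shard entry has "class", "name", and "objectCount" keys.
    """
    counts = defaultdict(dict)
    for node in nodes_payload.get("nodes", []):
        # "shards" may be missing or null when verbose output is not requested
        for shard in node.get("shards") or []:
            key = (shard["class"], shard["name"])
            counts[key][node["name"]] = shard["objectCount"]
    # Keep only shards where at least two nodes report different counts
    return {k: v for k, v in counts.items() if len(set(v.values())) > 1}


# Hypothetical sample payload: two nodes, each holding a replica of two shards
sample = {
    "nodes": [
        {"name": "node-a", "shards": [
            {"class": "Article", "name": "s1", "objectCount": 100},
            {"class": "Article", "name": "s2", "objectCount": 50},
        ]},
        {"name": "node-b", "shards": [
            {"class": "Article", "name": "s1", "objectCount": 98},
            {"class": "Article", "name": "s2", "objectCount": 50},
        ]},
    ]
}

mismatches = shard_count_mismatches(sample)
print(mismatches)
```

An empty result means every replica of every shard reports the same count. Object counts can lag briefly after writes, so a mismatch is a prompt to recheck, not proof that replication failed.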

When the update rolled out, though, each pod, one at a time, became stuck in the Terminating state for several minutes. I checked the log stream and saw that the pod was still performing async replication checks while it was supposed to be terminating. The whole operation took 20+ minutes, and I wound up intervening to forcefully kill the pods.

Later on I did a k8s version update (for separate reasons), and the same issue happened as k8s attempted to relocate the pods to updated nodes.

Server Setup Information

  • Weaviate Server Version: 1.26.5
  • Deployment Method: k8s
  • Number of Running Nodes: 5

Hello @g_parki, thanks for reaching out.

To use async replication, we highly recommend upgrading to the latest release, 1.29.0. If upgrading is a problem for you, we can discuss it; otherwise, the upgrade is the way to go.

Best,
Jeronimo

Gotcha, thanks for the quick response. For the time being I’m stuck on this version due to broken support for Azure OpenAI ada-002 models (see weaviate/weaviate GitHub issue #6334, "Vector dimension erroneously being sent to ada-002 model when using Azure OpenAI").
I’ll up the priority on revectorizing everything and will give 1.29.0 a try.