Pods hang in "Terminating" state, continue performing async replication checks

g_parki · February 21, 2025, 7:32pm

Description

Hello!
I migrated some collections and used replication (with async replication enabled) for the first time. After waiting a while, I manually checked the object count of each shard to verify that all my objects had been migrated, replicated, and synced. Then I applied some simple environment variable changes through the Helm chart.
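A manual check like the one above can be scripted. The sketch below is a hypothetical helper (`find_unsynced_shards` is not part of any Weaviate client). It assumes you have already collected per-shard object counts from every node, for example from the Nodes API with verbose output, into a plain dict keyed by node name and `(collection, shard)`. It flags shards whose replicas disagree.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical input shape: node name -> {(collection, shard): object_count}
NodeCounts = Dict[str, Dict[Tuple[str, str], int]]


def find_unsynced_shards(counts: NodeCounts) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Return shards whose replicas report different object counts.

    Each key is (collection, shard); each value maps node -> count,
    so the caller can see which replica is lagging.
    """
    replicas: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(dict)
    for node, shards in counts.items():
        for shard_key, object_count in shards.items():
            replicas[shard_key][node] = object_count

    # Treat a shard as in sync only if every reporting replica has the same count.
    return {
        shard_key: per_node
        for shard_key, per_node in replicas.items()
        if len(set(per_node.values())) > 1
    }


if __name__ == "__main__":
    sample: NodeCounts = {
        "weaviate-0": {("Article", "shardA"): 1000},
        "weaviate-1": {("Article", "shardA"): 1000, ("Article", "shardB"): 500},
        "weaviate-2": {("Article", "shardB"): 480},
    }
    print(find_unsynced_shards(sample))
```

Matching counts are only a coarse signal: they do not prove that the replicas hold identical objects. A shard that is missing from a node entirely is also not flagged unless you compare against the expected replication factor.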

When the update rolled out, though, each pod (one at a time) got stuck in the Terminating state for several minutes. In the log stream I could see it was still performing async replication checks even though it was supposed to be terminating. The whole rollout took 20+ minutes, and I ended up intervening to force-kill the pods.

Later on I did a k8s version update (for separate reasons), and the same issue happened as k8s attempted to relocate the pods to updated nodes.

Server Setup Information

Weaviate Server Version: 1.26.5
Deployment Method: k8s
Number of Running Nodes: 5

jeronimo_irazabal · February 21, 2025, 7:49pm

Hello @g_parki , thanks for reaching out.

To use async replication, we highly recommend upgrading to the latest release, 1.29.0. If that is an impediment for you, we can discuss it; otherwise, upgrading is the way to go.

Best,
Jeronimo

g_parki · February 21, 2025, 8:19pm

Gotcha, thanks for the quick response. For the time being I’m stuck on this version because of broken support for Azure OpenAI ada-002 models (weaviate/weaviate GitHub issue #6334: "Vector dimension erroneously being sent to ada-002 model when using Azure OpenAI").
I’ll up the priority on revectorizing everything and will give 1.29.0 a try.

