Async_replication context deadline exceeded, unable to Activate Tenant

Description

Hi all, I am trying to run Weaviate on a mixed-architecture Kubernetes cluster (Arm64 and Amd64) with 3 nodes and 3 replicas. The nodes are geographically apart (2 nodes in one location, 1 in another). I am constantly getting these timeouts, but only for a tenant that holds almost 3GB of data.

{"action":"async_replication","build_git_commit":"64457c2","build_go_version":"go1.22.9","build_image_tag":"v1.27.2","build_wv_version":"1.27.2","class_name":"Question","hashbeat_iteration":1679,"level":"warning","msg":"hashbeat iteration failed: collecting differences: \"10.233.84.228:7001\": connect: Post \"http://10.233.84.228:7001/replicas/indices/Question/shards/tenantA/objects/hashtree/0?schema_version=0\": context deadline exceeded","shard_name":"tenantA","time":"2024-11-18T06:48:56Z"}
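For anyone triaging many of these warnings, here is a minimal sketch that pulls the relevant fields out of one such log line. It uses only the Python standard library and the log line from this post, with its quotes written as plain ASCII. The function name `summarize_hashbeat_failure` is made up for illustration and is not part of Weaviate or its client.

```python
import json
import re
from typing import Optional

# The warning from this post, written as valid JSON (plain ASCII quotes).
LOG_LINE = r'''{"action":"async_replication","build_git_commit":"64457c2","build_go_version":"go1.22.9","build_image_tag":"v1.27.2","build_wv_version":"1.27.2","class_name":"Question","hashbeat_iteration":1679,"level":"warning","msg":"hashbeat iteration failed: collecting differences: \"10.233.84.228:7001\": connect: Post \"http://10.233.84.228:7001/replicas/indices/Question/shards/tenantA/objects/hashtree/0?schema_version=0\": context deadline exceeded","shard_name":"tenantA","time":"2024-11-18T06:48:56Z"}'''


def summarize_hashbeat_failure(line: str) -> Optional[dict]:
    """Return the key fields of a failed hashbeat log entry, or None if the
    line is not an async_replication hashbeat failure."""
    entry = json.loads(line)
    msg = entry.get("msg", "")
    if entry.get("action") != "async_replication":
        return None
    if "hashbeat iteration failed" not in msg:
        return None
    # The first quoted host:port in the message is the peer that was contacted.
    peer = re.search(r'"(\d{1,3}(?:\.\d{1,3}){3}:\d+)"', msg)
    return {
        "class_name": entry.get("class_name"),
        "shard_name": entry.get("shard_name"),
        "peer": peer.group(1) if peer else None,
        "timed_out": "context deadline exceeded" in msg,
        "time": entry.get("time"),
    }


if __name__ == "__main__":
    print(summarize_hashbeat_failure(LOG_LINE))
```

Grouping the results by `peer` and `shard_name` should show whether the timeouts are limited to the remote node and the large tenant.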

Can somebody please shed some light on this? For example, what would happen with 300GB of data that has to be moved from one node to another? Even if the network is slow, it should still work and simply take longer.

Server Setup Information

  • Weaviate Server Version: 1.27
  • Deployment Method: Helm Chart
  • Multi Node? 4
  • Client Language and Version: Python, 4.9.3
  • Multitenancy?: Yes

Any additional Information

Activating the tenant sometimes works and sometimes times out. The problem usually appears after offloading to S3 and then setting the tenant active again.
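Since activation succeeds on some attempts and times out on others, one possible stopgap is to retry it with exponential backoff. The sketch below is only an illustration under assumptions: `activate_with_retry` is a made-up helper, and `activate` is any callable you supply that performs the activation. It does not address the underlying timeout.

```python
import time
from typing import Callable


def activate_with_retry(
    activate: Callable[[], None],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call activate() until it succeeds and return the number of attempts used.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between tries
    and re-raises the last error once all attempts are exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            activate()
            return attempt
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")


# Assumed wiring with the v4 Python client (not executed here):
#
#   from weaviate.classes.tenants import Tenant, TenantActivityStatus
#   questions = client.collections.get("Question")
#   activate_with_retry(lambda: questions.tenants.update(
#       tenants=[Tenant(name="tenantA",
#                       activity_status=TenantActivityStatus.ACTIVE)]))
```

The `sleep` parameter is injectable so the helper can be exercised without real waiting.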

Thanks for reaching out, @Ali_Raza.
We will review async replication connectivity over slow networks.
Would you mind opening an issue in the Weaviate GitHub repo?

@jeronimo_irazabal Sure. I am documenting my complete journey in the GitHub issue.

Async_replication context deadline exceeded, unable to Activate Tenant · Issue #6380 · weaviate/weaviate

So far, it seems that the Python client module has the potential to corrupt the entire Weaviate pod.
