Catastrophic performance drop when upscaling a cluster - while pod not in READY state (loading data/index)

wvuser · November 29, 2024, 5:32am

Description

We have a K8s cluster with 3 servers and 3 pods (1 pod per server).
Weaviate holds 20 million objects (2 named vectors per object: 1024 and 768 dimensions).
All requests (a mix of CRUD and vector-search operations) are executed with the QUORUM consistency level.
With 3 active (READY) pods, total throughput is ~380 RPS.
When we downscale the cluster to 2 pods (“weaviate-2” goes away), throughput stays about the same.
When we upscale the cluster back to 3 pods (“weaviate-2” returns), throughput drops to ~160 RPS for as long as the pod is loading data/index (not in READY state).
We can't see any CRUD/search requests in weaviate-2's logs.
When “weaviate-2” finishes loading (becomes READY), throughput returns to the normal ~380 RPS.
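For context on the QUORUM setting, here is a small arithmetic sketch of how many replicas must answer each request. It assumes QUORUM means floor(n/2) + 1 replicas for a replication factor n; the helper names are hypothetical and not part of the Weaviate client.

```python
def quorum_size(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM request (assumed floor(n/2) + 1)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be >= 1")
    return replication_factor // 2 + 1


def quorum_reachable(replication_factor: int, responsive_replicas: int) -> bool:
    """True if enough replicas can respond to satisfy QUORUM."""
    return responsive_replicas >= quorum_size(replication_factor)


if __name__ == "__main__":
    rf = 3  # replication factor from the setup above
    for up in (3, 2, 1):
        print(f"rf={rf}, responsive={up}: quorum={quorum_size(rf)}, "
              f"reachable={quorum_reachable(rf, up)}")
```

With repl.factor=3 the quorum is 2: with all three pods READY any two replies are enough, while with only two pods up both remaining replicas must answer every request.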

Server Setup Information

Weaviate Server Version: 1.25.25
Deployment Method: k8s
Multi Node? Number of Running Nodes: yes, 3 nodes, repl.factor=3
Client Language and Version: Python-3, weaviate-client-4.6.2
Multitenancy?: no

Any additional Information

DISABLE_LAZY_LOAD_SHARDS=true
HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE=true

DudaNogueira · November 29, 2024, 10:11am

hi @wvuser !!

A lot has changed since 1.25, so if possible, I would suggest upgrading to the latest version.

Do you see any logs while making those changes?

wvuser · December 19, 2024, 9:42am

Hi @DudaNogueira !
Sorry for the long absence…
What minimum LOG_LEVEL value do you recommend/require?
The logs (from all pods) will be large, because under a ‘light load’ the performance drop is not very visible.
