Weaviate cluster is very unstable (1.29.2)

Hello,

We recently migrated our single-node Weaviate to a cluster (3 nodes, replication factor 3). We have a small amount of data (<5 million docs) and a small number of requests (<20/minute).

When the cluster starts we see huge disk usage (it seems to be the vector cache filling?). We added HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE and DISABLE_LAZY_LOAD_SHARDS so that a node waits for this to finish before it is marked as ready, because when queries arrive at the same time our logs get flooded with “An I/O timeout occurs when the request takes longer than the specified server-side timeout” and we get a small amount of production downtime. We also see failure messages from memberlist. Once the cache has filled, it seems to start working again.
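
For reference, a sketch of how these two settings would look in the same env-list format as the config further down. The "true" values are an assumption, since the exact values are not quoted here:

```yaml
# Sketch only: both variables are named in the paragraph above;
# the "true" values are assumed, not copied from the real manifest.
- name: HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE
  value: "true"   # wait for the vector cache to fill before reporting ready
- name: DISABLE_LAZY_LOAD_SHARDS
  value: "true"   # load all shards at startup instead of lazily
```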

But even with these fixes, and with a very small number of queries, the disks are at almost 250 MB/s continuously. The cluster is very unstable, with frequent timeouts. We configured a consistency level of ONE, and we only do reads (no writes except at indexing time).

In a single-node setup we don’t have any issues.

Here is the rest of the config:

    - name: RAFT_BOOTSTRAP_EXPECT
      value: "3"
    - name: RAFT_METADATA_ONLY_VOTERS
      value: "false"
    - name: CLUSTER_GOSSIP_BIND_PORT
      value: "7000"
    - name: CLUSTER_DATA_BIND_PORT
      value: "7001"
    - name: REPLICATION_MINIMUM_FACTOR
      value: "3"
    - name: ASYNC_INDEXING
      value: "false"
    - name: QUERY_DEFAULTS_LIMIT
      value: "25"
    - name: QUERY_MAXIMUM_RESULTS
      value: "10000"
    - name: AUTOSCHEMA_ENABLED
      value: "false"
    - name: PROMETHEUS_MONITORING_ENABLED
      value: "true"
    - name: TRACK_VECTOR_DIMENSIONS
      value: "false"
    - name: REINDEX_VECTOR_DIMENSIONS_AT_STARTUP
      value: "false"

I suppose we are missing something; Weaviate surely runs much larger deployments.

EDIT: I tried keeping 3 nodes but with a replication factor of 1, and I don’t have the issue. So the issue only occurs with replication.

Hey @gaetansnl,

Welcome to the community — great to have you here!

The timeout you’re seeing is likely expected, especially since you’re using HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE and DISABLE_LAZY_LOAD_SHARDS. This is due to vectors being loaded into memory during startup.

I’d recommend enabling lazy shard loading instead, i.e. setting DISABLE_LAZY_LOAD_SHARDS back to false.
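
In the env-list format from the config above, that would presumably look like the following. This is a sketch that assumes lazy loading is controlled solely by DISABLE_LAZY_LOAD_SHARDS; removing the variable you added should have the same effect:

```yaml
# Sketch only: re-enables lazy shard loading by no longer disabling it.
- name: DISABLE_LAZY_LOAD_SHARDS
  value: "false"
```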

With this setting, you can start querying right away while shards continue loading in the background. Just keep in mind that if a query targets an object that hasn’t yet been loaded, it might be delayed slightly as it’s brought into memory — so some initial latency can occur.

Best regards,

Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)

Hello @Mohamed_Shahin. Thank you for your response. We tried both with and without HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE and DISABLE_LAZY_LOAD_SHARDS.

The issue seems more related to replication. We only do reads on the database (no writes). When replication is disabled everything works fine, even with multiple nodes. When replication is enabled we see very high disk usage, and health checks fail every few hours.

Thank you so much for the details!

Would you mind confirming with me if async_replication is set to true in the collection config when you see the issues?

Also, could you share the logs with me from when the replication factor is set to 3? I’d like to go through everything and check the flow.

Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)

Hello! It seems async replication is indeed enabled, because we don’t set ASYNC_REPLICATION_DISABLED. Below is the log with replication factor 3; as you can see, there are many errors.

Here are the logs

{“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“entering candidate state”,“node”:{},“term”:387,“time”:“2025-04-07T16:14:32Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“follower”:{},“leader-address”:“”,“leader-id”:“”,“level”:“info”,“msg”:“entering follower state”,“time”:“2025-04-07T16:14:32Z”}
weaviate-1743897603-2 weaviate {“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“error”,“msg”:" memberlist: Failed fallback TCP ping: timeout 1s: read tcp 192.168.53.62:52612-\u003e192.168.14.184:7000: i/o timeout",“time”:“2025-04-07T16:14:40Z”}
weaviate-1743897603-2 weaviate {“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“error”,“msg”:" memberlist: Failed fallback TCP ping: timeout 1s: read tcp 192.168.53.62:52616-\u003e192.168.14.184:7000: i/o timeout",“time”:“2025-04-07T16:14:41Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“from”:“192.168.91.100:8300”,“leader”:“192.168.14.184:8300”,“leader-id”:“weaviate-1743897603-1”,“level”:“warning”,“msg”:“rejecting pre-vote request since we have a leader”,“time”:“2025-04-07T16:14:41Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“last-leader-addr”:“192.168.14.184:8300”,“last-leader-id”:“weaviate-1743897603-1”,“level”:“warning”,“msg”:“heartbeat timeout reached, starting election”,“time”:“2025-04-07T16:14:42Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“entering candidate state”,“node”:{},“term”:388,“time”:“2025-04-07T16:14:42Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“pre-vote successful, starting election”,“refused”:0,“tally”:2,“term”:388,“time”:“2025-04-07T16:14:42Z”,“votesNeeded”:2}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“follower”:{},“leader-address”:“192.168.14.184:8300”,“leader-id”:“weaviate-1743897603-1”,“level”:“info”,“msg”:“entering follower state”,“time”:“2025-04-07T16:14:42Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“duplicate requestVote for same term”,“term”:388,“time”:“2025-04-07T16:14:42Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“last-leader-addr”:“192.168.14.184:8300”,“last-leader-id”:“weaviate-1743897603-1”,“level”:“warning”,“msg”:“heartbeat timeout reached, starting election”,“time”:“2025-04-07T16:15:12Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“entering candidate state”,“node”:{},“term”:389,“time”:“2025-04-07T16:15:12Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“pre-vote successful, starting election”,“refused”:0,“tally”:2,“term”:389,“time”:“2025-04-07T16:15:13Z”,“votesNeeded”:2}
weaviate-1743897603-2 weaviate {“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“error”,“msg”:" memberlist: Failed fallback TCP ping: timeout 1s: read tcp 192.168.53.62:48526-\u003e192.168.14.184:7000: i/o timeout",“time”:“2025-04-07T16:15:13Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“election won”,“tally”:2,“term”:389,“time”:“2025-04-07T16:15:14Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“leader”:{},“level”:“info”,“msg”:“entering leader state”,“time”:“2025-04-07T16:15:14Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“added peer, starting replication”,“peer”:“weaviate-1743897603-1”,“time”:“2025-04-07T16:15:14Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“added peer, starting replication”,“peer”:“weaviate-1743897603-0”,“time”:“2025-04-07T16:15:14Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“from”:“192.168.91.100:8300”,“leader”:“192.168.53.62:8300”,“leader-id”:“weaviate-1743897603-2”,“level”:“warning”,“msg”:“rejecting pre-vote request since we have a leader”,“time”:“2025-04-07T16:15:14Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“fields.time”:600933315,“level”:“warning”,“msg”:“failed to contact”,“server-id”:“weaviate-1743897603-1”,“time”:“2025-04-07T16:15:15Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“fields.time”:600918186,“level”:“warning”,“msg”:“failed to contact”,“server-id”:“weaviate-1743897603-0”,“time”:“2025-04-07T16:15:15Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“warning”,“msg”:“failed to contact quorum of nodes, stepping down”,“time”:“2025-04-07T16:15:15Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“follower”:{},“leader-address”:“”,“leader-id”:“”,“level”:“info”,“msg”:“entering follower state”,“time”:“2025-04-07T16:15:15Z”}
weaviate-1743897603-2 weaviate {“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“error”,“msg”:" memberlist: Failed fallback TCP ping: timeout 1s: read tcp 192.168.53.62:48534-\u003e192.168.14.184:7000: i/o timeout",“time”:“2025-04-07T16:15:15Z”}
weaviate-1743897603-2 weaviate {“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“error”,“msg”:" memberlist: Failed fallback TCP ping: timeout 1s: read tcp 192.168.53.62:45836-\u003e192.168.91.100:7000: i/o timeout",“time”:“2025-04-07T16:15:16Z”}
weaviate-1743897603-2 weaviate {“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:" memberlist: Suspect weaviate-1743897603-0 has failed, no acks received",“time”:“2025-04-07T16:15:16Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“last-leader-addr”:“”,“last-leader-id”:“”,“level”:“warning”,“msg”:“heartbeat timeout reached, starting election”,“time”:“2025-04-07T16:15:16Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“entering candidate state”,“node”:{},“term”:390,“time”:“2025-04-07T16:15:16Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“pre-vote successful, starting election”,“refused”:0,“tally”:2,“term”:390,“time”:“2025-04-07T16:15:17Z”,“votesNeeded”:2}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“candidate”:“192.168.14.184:8300”,“last-candidate-term”:388,“last-term”:389,“level”:“warning”,“msg”:“rejecting pre-vote request since our last term is greater”,“time”:“2025-04-07T16:15:17Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“election won”,“tally”:2,“term”:390,“time”:“2025-04-07T16:15:17Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“leader”:{},“level”:“info”,“msg”:“entering leader state”,“time”:“2025-04-07T16:15:17Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“added peer, starting replication”,“peer”:“weaviate-1743897603-1”,“time”:“2025-04-07T16:15:17Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“added peer, starting replication”,“peer”:“weaviate-1743897603-0”,“time”:“2025-04-07T16:15:17Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“warning”,“msg”:“appendEntries rejected, sending older logs”,“next”:263,“peer”:{“Suffrage”:0,“ID”:“weaviate-1743897603-1”,“Address”:“192.168.14.184:8300”},“time”:“2025-04-07T16:15:18Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“fields.time”:507214295,“level”:“warning”,“msg”:“failed to contact”,“server-id”:“weaviate-1743897603-0”,“time”:“2025-04-07T16:15:18Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“pipelining replication”,“peer”:{“Suffrage”:0,“ID”:“weaviate-1743897603-1”,“Address”:“192.168.14.184:8300”},“time”:“2025-04-07T16:15:18Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“warning”,“msg”:“appendEntries rejected, sending older logs”,“next”:263,“peer”:{“Suffrage”:0,“ID”:“weaviate-1743897603-0”,“Address”:“192.168.91.100:8300”},“time”:“2025-04-07T16:15:18Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“pipelining replication”,“peer”:{“Suffrage”:0,“ID”:“weaviate-1743897603-0”,“Address”:“192.168.91.100:8300”},“time”:“2025-04-07T16:15:19Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“fields.time”:500512588,“level”:“warning”,“msg”:“failed to contact”,“server-id”:“weaviate-1743897603-1”,“time”:“2025-04-07T16:15:32Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“fields.time”:511279959,“level”:“warning”,“msg”:“failed to contact”,“server-id”:“weaviate-1743897603-0”,“time”:“2025-04-07T16:16:01Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“fields.time”:500267640,“level”:“warning”,“msg”:“failed to contact”,“server-id”:“weaviate-1743897603-1”,“time”:“2025-04-07T16:16:14Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“fields.time”:506842371,“level”:“warning”,“msg”:“failed to contact”,“server-id”:“weaviate-1743897603-1”,“time”:“2025-04-07T16:16:14Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“fields.time”:535424795,“level”:“warning”,“msg”:“failed to contact”,“server-id”:“weaviate-1743897603-1”,“time”:“2025-04-07T16:16:14Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“fields.time”:505051019,“level”:“warning”,“msg”:“failed to contact”,“server-id”:“weaviate-1743897603-0”,“time”:“2025-04-07T16:16:14Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“warning”,“msg”:“failed to contact quorum of nodes, stepping down”,“time”:“2025-04-07T16:16:14Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“follower”:{},“leader-address”:“”,“leader-id”:“”,“level”:“info”,“msg”:“entering follower state”,“time”:“2025-04-07T16:16:14Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“aborting pipeline replication”,“peer”:{“Suffrage”:0,“ID”:“weaviate-1743897603-0”,“Address”:“192.168.91.100:8300”},“time”:“2025-04-07T16:16:14Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“aborting pipeline replication”,“peer”:{“Suffrage”:0,“ID”:“weaviate-1743897603-1”,“Address”:“192.168.14.184:8300”},“time”:“2025-04-07T16:16:15Z”}
weaviate-1743897603-2 weaviate {“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“error”,“msg”:" memberlist: Failed fallback TCP ping: timeout 1s: read tcp 192.168.53.62:43314-\u003e192.168.14.184:7000: i/o timeout",“time”:“2025-04-07T16:16:16Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“last-leader-addr”:“”,“last-leader-id”:“”,“level”:“warning”,“msg”:“heartbeat timeout reached, starting election”,“time”:“2025-04-07T16:16:16Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“entering candidate state”,“node”:{},“term”:391,“time”:“2025-04-07T16:16:16Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“follower”:{},“leader-address”:“”,“leader-id”:“”,“level”:“info”,“msg”:“entering follower state”,“time”:“2025-04-07T16:16:17Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“info”,“msg”:“duplicate requestVote for same term”,“term”:391,“time”:“2025-04-07T16:16:18Z”}
weaviate-1743897603-2 weaviate {“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“error”,“msg”:" memberlist: Failed fallback TCP ping: timeout 1s: read tcp 192.168.53.62:38066-\u003e192.168.91.100:7000: i/o timeout",“time”:“2025-04-07T16:16:18Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“from”:“192.168.14.184:8300”,“leader”:“192.168.91.100:8300”,“leader-id”:“weaviate-1743897603-0”,“level”:“warning”,“msg”:“rejecting pre-vote request since we have a leader”,“time”:“2025-04-07T16:16:20Z”}
weaviate-1743897603-2 weaviate {“action”:“raft”,“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“from”:“192.168.14.184:8300”,“leader”:“192.168.91.100:8300”,“leader-id”:“weaviate-1743897603-0”,“level”:“warning”,“msg”:“rejecting vote request since we have a leader”,“time”:“2025-04-07T16:16:20Z”}
weaviate-1743897603-2 weaviate {“build_git_commit”:“927897e”,“build_go_version”:“go1.22.12”,“build_image_tag”:“v1.29.2”,“build_wv_version”:“1.29.2”,“level”:“error”,“msg”:" memberlist: Failed fallback TCP ping: timeout 1s: read tcp 192.168.53.62:34984-\u003e192.168.91.100:7000: i/o timeout",“time”:“2025-04-07T16:20:13Z”}

Could you please help narrow down the root cause by disabling async_replication on the collections, then restarting the cluster and retrying to see if that stabilizes things?

I’ve got some personal scripts and snippets in my GitHub repo that you can use right away. Just open this notebook and update all your collections by setting async_replication to false:

Additionally, if you want to visualize the collection configuration and quickly see which collections have async replication set to true or false, feel free to use my local UI tool:

I’m sharing these just to make things easier, but if you’ve already got your own scripts set up, no worries—go ahead with those.

Once async_replication is disabled, let me know how the cluster behaves.

Best regards,

Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)

Hello,
Thank you for your response.

We reindexed everything on a new Weaviate 1.30 cluster and disabled async replication using the global environment variable ASYNC_REPLICATION_DISABLED=true, but we still see very high disk usage.
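
Expressed in the same env-list format as the earlier config, that setting would look roughly like this (a sketch; only the variable name and value come from the sentence above):

```yaml
# Sketch only: disables async replication globally, as described above.
- name: ASYNC_REPLICATION_DISABLED
  value: "true"
```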

And this is for a very small number of queries: CPU usage is close to 0, and memory usage is only at 50% of what we provision.

I’m currently looking into this — when running the query with a consistency level of ONE, the disk usage seems normal. However, when using the default QUORUM or ALL, the disk usage increases significantly, just as you described.

I’ll dig further and update you shortly.

Hello Mohamed,
We made progress on the issue: it seems the disk reads come from the LSM tree. In the past we had instances with 5x the RAM needed to fit the vectors; we have since reduced that. The reads seem to happen at the same time as page faults.

So we think that when we had the 80 GB RAM instance the whole dataset was in RAM, and now there are reads because we only have 24 GB… We will try to use pread instead of mmap, since it looks like it causes fewer issues (we no longer get OOMs, for example).
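
A sketch of what that switch could look like, in the same env-list format as the earlier config. It assumes the access strategy is selected through the PERSISTENCE_LSM_ACCESS_STRATEGY environment variable, which is not named in this thread, so it is worth checking against the environment-variable reference for your version:

```yaml
# Assumption: PERSISTENCE_LSM_ACCESS_STRATEGY selects how LSM segments are
# read from disk, with "mmap" as the default and "pread" as the alternative.
- name: PERSISTENCE_LSM_ACCESS_STRATEGY
  value: "pread"
```

With pread, reads go through explicit read calls rather than memory-mapped pages, which matches the page-fault pattern described above.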

Also, since we updated to 1.30.0 we have fewer disconnects between the nodes, but we still have connection issues… We will deploy pread first to monitor the behavior, then we will try to debug the raft issue.
