Raft makes bootstrapping take too long

Description

My current use case requires restarting the Weaviate pod frequently. The data is quite large: about 2687 schema entries. It looks like enabling Raft in version 1.27.1 makes bootstrapping take about 45 minutes. I could not find any instructions for disabling Raft to shorten the bootstrapping time, as in earlier versions such as 1.24.10. Can you help?

Server Setup Information

  • Weaviate Server Version: 1.27.1
  • Deployment Method: K8S via Helm Chart
  • Multi Node? Number of Running Nodes: Single node (1 pod)
  • Client Language and Version: Python 4.9.3
  • Multitenancy?: No

Any additional Information

# First 3 entries
{"build_git_commit":"05de0db","build_go_version":"go1.22.8","build_image_tag":"v1.27.1","build_wv_version":"1.27.1","level":"info","msg":"Schema catching up: applying log entry: [3/2687]","time":"2024-11-07T04:40:41Z"}
{"build_git_commit":"05de0db","build_go_version":"go1.22.8","build_image_tag":"v1.27.1","build_wv_version":"1.27.1","level":"info","msg":"Schema catching up: applying log entry: [4/2687]","time":"2024-11-07T04:40:41Z"}
{"build_git_commit":"05de0db","build_go_version":"go1.22.8","build_image_tag":"v1.27.1","build_wv_version":"1.27.1","level":"info","msg":"Schema catching up: applying log entry: [5/2687]","time":"2024-11-07T04:40:41Z"}
....
# The last entries
{"action":"hnsw_prefill_cache_async","build_git_commit":"05de0db","build_go_version":"go1.22.8","build_image_tag":"v1.27.1","build_wv_version":"1.27.1","level":"info","msg":"not waiting for vector cache prefill, running in background","time":"2024-11-07T05:20:24Z","wait_for_cache_prefill":false}
{"build_git_commit":"05de0db","build_go_version":"go1.22.8","build_image_tag":"v1.27.1","build_wv_version":"1.27.1","level":"info","msg":"Completed loading shard *** in 26.496718ms","time":"2024-11-07T05:20:24Z"}
{"action":"hnsw_vector_cache_prefill","build_git_commit":"05de0db","build_go_version":"go1.22.8","build_image_tag":"v1.27.1","build_wv_version":"1.27.1","count":3000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-11-07T05:20:24Z","took":213805}

It took 40 minutes to complete.
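
For reference, the duration can be read directly from the `time` fields of the log lines above. This is a minimal sketch, assuming the logs are JSON lines with a `time` field in the format shown; the helper names `parse_time` and `startup_seconds` are made up for illustration, and the two sample lines are shortened copies of the first and last entries above.

```python
import json
from datetime import datetime


def parse_time(line: str) -> datetime:
    """Return the timestamp of one JSON-formatted log line."""
    record = json.loads(line)
    # Timestamps in the logs above look like 2024-11-07T04:40:41Z.
    return datetime.strptime(record["time"], "%Y-%m-%dT%H:%M:%SZ")


def startup_seconds(lines) -> float:
    """Seconds between the earliest and latest timestamp in the JSON lines."""
    # Skip anything that is not a JSON object (comments, "....", blank lines).
    times = [parse_time(l) for l in lines if l.lstrip().startswith("{")]
    return (max(times) - min(times)).total_seconds()


# Shortened copies of the first and last log entries from this post.
sample = [
    '{"level":"info","msg":"Schema catching up: applying log entry: [3/2687]","time":"2024-11-07T04:40:41Z"}',
    '{"level":"info","msg":"Completed loading shard *** in 26.496718ms","time":"2024-11-07T05:20:24Z"}',
]

elapsed = startup_seconds(sample)
print(f"{elapsed / 60:.1f} minutes")  # prints "39.7 minutes"
```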

Hello quybao!

Thanks for reporting that issue. We have also noticed it in some internal deployments and we have a tentative fix in progress.

The problem is that on startup, Weaviate replays schema changes to rebuild its internal schema representation. If there are many schema changes (in your case >2000), each one can trigger a GraphQL rebuild when it is applied. This gets expensive quickly and delays the whole operation (a single GraphQL rebuild can take seconds if the schema is large).

The tentative fix delays rebuilding GraphQL until the whole schema has been rebuilt internally, and then rebuilds GraphQL just once.
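
To illustrate the idea, here is a rough sketch in Python. It is not Weaviate's actual code (which is written in Go); the class `SchemaReplayer`, the `catching_up` flag, and the `rebuild_graphql` callback are hypothetical names chosen only to show the difference between rebuilding after every entry and rebuilding once at the end.

```python
class SchemaReplayer:
    """Toy model of replaying schema log entries (hypothetical, not Weaviate's API)."""

    def __init__(self, rebuild_graphql):
        self.schema = {}
        self.rebuild_graphql = rebuild_graphql  # expensive callback
        self.catching_up = False

    def apply(self, entry):
        """Apply one schema change; rebuild GraphQL unless we are catching up."""
        name, definition = entry
        self.schema[name] = definition
        if not self.catching_up:
            self.rebuild_graphql(self.schema)

    def replay(self, entries):
        """Apply all entries with callbacks skipped, then rebuild exactly once."""
        self.catching_up = True
        try:
            for entry in entries:
                self.apply(entry)
        finally:
            self.catching_up = False
        self.rebuild_graphql(self.schema)


# 2687 made-up schema changes, matching the count in the logs above.
entries = [(f"Class{i}", {"properties": []}) for i in range(2687)]
calls = []

# Old behaviour: every applied entry triggers a rebuild.
eager = SchemaReplayer(lambda schema: calls.append(len(schema)))
for entry in entries:
    eager.apply(entry)
eager_rebuilds = len(calls)

# Behaviour with the fix: one rebuild after the whole log is replayed.
calls.clear()
deferred = SchemaReplayer(lambda schema: calls.append(len(schema)))
deferred.replay(entries)
deferred_rebuilds = len(calls)

print(eager_rebuilds, deferred_rebuilds)  # prints "2687 1"
```

If a single rebuild takes on the order of a second, the first loop accounts for most of the startup time, while the second pays that cost only once.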

The PR is here: "skip the schema callbacks when raft schema is still catching up" by reyreaud-l (Pull Request #6229, weaviate/weaviate on GitHub).
If you wish to try it, you can use the image preview-skip-the-schema-callbacks-when-raft-schema-is-still-catching-up-701ac1b.
Note that it is based on the latest 1.26. Since you upgraded to 1.27.1, this would mean a downgrade in your case.
So far it has passed all our pipelines and I’m hopeful we’ll get it merged in the next few business days and it will be included in a patch release shortly.

I’ll update here once the fix is merged and a patch release is out.
Sorry for the inconvenience in the meantime!

The PR has been merged to our mainline and will be released to 1.25+ versions shortly.
