Schema loss after scaling the Raft cluster down and back up

Team,

Due to some issues on the Kubernetes side, we had to delete and recreate the Weaviate cluster. However, the cluster did not start because Raft does not support scale-down or deletion in this manner.

To proceed, we manually removed the Raft directory from the hostPath on all Weaviate nodes and restarted the cluster.

After the restart, we are observing that some collections and tenants are missing (requests return "not found" errors), even though their corresponding data directories are still present under /var/lib/weaviate.

We suspect that collections created after the Raft migration are no longer recognized, possibly due to loss of Raft metadata.


Request:

  • Please help us recover the missing collections/tenants, if possible.
  • Please also suggest a proper approach to deleting and recreating a Weaviate cluster without data loss, for cases where this is unavoidable.

Note:
Data directories for the affected collections and tenants are still present in /var/lib/weaviate, but they are not being detected by the cluster.

Server Setup Information

  • Weaviate Server Version: 1.32.27
  • Deployment Method: Kubernetes
  • Multi Node? Number of Running Nodes: Yes, 9
  • Client Language and Version: Python 4.16.9
  • Multitenancy?: Yes

Hi @Dharanish!

The recommended way to move shards around is through replica movement.

Can you bring the cluster back to 9 nodes, i.e. its pre-scale-down state?

If so, I see two potential routes:

  1. Use replica movement to rebalance your shards and decommission nodes.
  2. Migrate the entire content to a new cluster with the desired replicas.

Here is a brief summary of what is likely happening in your case:

Your issue stems from the fact that Weaviate’s schema metadata (including collections and tenants) is managed through Raft consensus, while the actual data is stored separately. When you manually removed the Raft directory, you lost the metadata that tracks which collections exist, even though the data files remain.
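To confirm this on each node, a minimal diagnostic sketch could compare the directories under the data path with the collections and tenants the cluster currently reports. It assumes the on-disk layout is one lowercased directory per collection under /var/lib/weaviate, with one subdirectory per shard (named after the tenant for multi-tenant collections), and that `raft` is the only top-level non-collection directory to skip; the helper names are hypothetical. `schema_from_client` uses the v4 Python client (`collections.list_all`, `config.get`, `tenants.get`). It only reports the mismatch; it does not restore anything.

```python
from pathlib import Path
from typing import Optional

# Assumption: top-level directories in the data path that are not collections.
NON_COLLECTION_DIRS = {"raft"}


def dirs_on_disk(data_path: str) -> dict[str, set[str]]:
    """Map each collection directory to the names of its shard subdirectories."""
    layout: dict[str, set[str]] = {}
    for entry in Path(data_path).iterdir():
        if entry.is_dir() and entry.name not in NON_COLLECTION_DIRS:
            layout[entry.name] = {p.name for p in entry.iterdir() if p.is_dir()}
    return layout


def schema_from_client(client) -> dict[str, Optional[set[str]]]:
    """Collections the cluster knows, keyed by lowercased name.

    The value is the set of tenant names for multi-tenant collections,
    or None for collections without multi-tenancy.
    """
    known: dict[str, Optional[set[str]]] = {}
    for name in client.collections.list_all(simple=True):
        collection = client.collections.get(name)
        if collection.config.get().multi_tenancy_config.enabled:
            known[name.lower()] = set(collection.tenants.get().keys())
        else:
            known[name.lower()] = None
    return known


def find_unregistered(
    on_disk: dict[str, set[str]],
    known: dict[str, Optional[set[str]]],
) -> tuple[list[str], dict[str, list[str]]]:
    """Return (collection dirs absent from the schema, tenant dirs absent per collection)."""
    missing_collections = sorted(set(on_disk) - set(known))
    missing_tenants: dict[str, list[str]] = {}
    for name, shard_dirs in on_disk.items():
        tenants = known.get(name)
        if tenants is None:  # unknown collection, or not multi-tenant
            continue
        orphans = shard_dirs - tenants
        if orphans:
            missing_tenants[name] = sorted(orphans)
    return missing_collections, missing_tenants
```

Run it on every node, e.g. `find_unregistered(dirs_on_disk("/var/lib/weaviate"), schema_from_client(client))`, since each node only holds the shards assigned to it.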

You can find more details about your scenario in this DeepWiki analysis.

I will work on a recipe for replica movement, as this is a popular topic and a nice feature that needs some more love.

Let me know if this helps!