Issues with GCS Backup

Configuration:
Using Weaviate 1.24.9.
Self-hosted on a Kubernetes cluster with 7 replicas.
Using the provided Helm chart, with a GCS bucket configured in the Helm values.

I didn’t have an issue with earlier versions; however, now when I run:

result = client.backup.create(
    backup_id="my-very-first-backup",
    backend="gcs",
    include_classes=[
        "ENCategory",
        "FRCategory",
        "ENProduct",
        "FRProduct",
        "ENSearchQuery",
        "FRSearchQuery",
    ],
    wait_for_completion=True,
)

the backup gets stuck loading forever.

If I choose wait_for_completion=False I can at least continue to use the database, but if I try to create a backup at a later point (a week later, for example) it raises the error:

weaviate.exceptions.UnexpectedStatusCodeException: Backup creation! Unexpected status code: 422, with response body: {'error': [{'message': 'node {"l-weaviate-4" "100.82.xx.xxxx"}: cannot commit : backup my-very-first-backup already in progress'}]}.

And the only way to get around this is to delete the nodes and have them reload from the persistent disk volumes.
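For reference, when I use wait_for_completion=False I check on the backup like this instead of blocking (a minimal sketch, reusing the same client and backup ID as above); the status just stays at STARTED:

import time

# Poll the create status of the backup started above with wait_for_completion=False.
while True:
    status = client.backup.get_create_status(
        backup_id="my-very-first-backup",
        backend="gcs",
    )
    print(status)  # e.g. {'id': 'my-very-first-backup', 'status': 'STARTED', ...}
    if status.get("status") in ("SUCCESS", "FAILED"):
        break
    time.sleep(30)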

Additionally, when I check the storage bucket location for the backup, there is only one file, backup_config.json, which looks like the following. Normally each node would have its own folder alongside backup_config.json (a short listing sketch follows the JSON).

{
    "startedAt": "2024-05-06T16:59:05.076098496Z",
    "completedAt": "0001-01-01T00:00:00Z",
    "id": "my-very-first-backup",
    "nodes": {
        "weaviate-dev-0": {
            "classes": [
                "ENCategory",
                "FRCategory",
                "ENProduct",
                "FRProduct",
                "ENSearchQuery",
                "FRSearchQuery"
            ],
            "status": "",
            "error": ""
        },
        "weaviate-dev-1": {
            "classes": [
                "ENCategory",
                "FRCategory",
                "ENProduct",
                "FRProduct",
                "ENSearchQuery",
                "FRSearchQuery"
            ],
            "status": "",
            "error": ""
        },
        "weaviate-dev-2": {
            "classes": [
                "ENCategory",
                "FRCategory",
                "ENProduct",
                "FRProduct",
                "ENSearchQuery",
                "FRSearchQuery"
            ],
            "status": "",
            "error": ""
        },
        "weaviate-dev-3": {
            "classes": [
                "ENCategory",
                "FRCategory",
                "ENProduct",
                "FRProduct",
                "ENSearchQuery",
                "FRSearchQuery"
            ],
            "status": "",
            "error": ""
        },
        "weaviate-dev-4": {
            "classes": [
                "ENCategory",
                "FRCategory",
                "ENProduct",
                "FRProduct",
                "ENSearchQuery",
                "FRSearchQuery"
            ],
            "status": "",
            "error": ""
        },
        "weaviate-dev-5": {
            "classes": [
                "ENCategory",
                "FRCategory",
                "ENProduct",
                "FRProduct",
                "ENSearchQuery",
                "FRSearchQuery"
            ],
            "status": "",
            "error": ""
        },
        "weaviate-dev-6": {
            "classes": [
                "ENCategory",
                "FRCategory",
                "ENProduct",
                "FRProduct",
                "ENSearchQuery",
                "FRSearchQuery"
            ],
            "status": "",
            "error": ""
        }
    },
    "node_mapping": null,
    "status": "STARTED",
    "version": "2.0",
    "serverVersion": "1.24.9",
    "error": ""
}
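This is roughly how I list what actually landed in the bucket (a minimal sketch; the bucket name below is a placeholder for the one in my Helm values, and it requires the google-cloud-storage package plus credentials with read access):

from google.cloud import storage

BUCKET = "my-weaviate-backups"      # placeholder; substitute the configured bucket
BACKUP_ID = "my-very-first-backup"

gcs = storage.Client()
# On a healthy 7-node cluster I'd expect one folder per node plus backup_config.json;
# here the only object that shows up is backup_config.json.
for blob in gcs.list_blobs(BUCKET, prefix=f"{BACKUP_ID}/"):
    print(blob.name, blob.size)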

I'm wondering if anyone has any ideas as to why this might be happening with newer versions.

Hi @Landon_Edwards!

Do you see anything standing out in the logs of any of those nodes?