Unable to restore a filesystem based backup on another machine

EDIT: solution moved to the top. In my case the process failed because I had created the collection I wanted to restore before attempting the restore. I also find the HTTP methods for checking whether the restore went OK misleading.

Please remember that the cURL request asks the running Weaviate container to perform the restore.
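
A hedged aside on checking progress: if I read the Weaviate REST API reference correctly, the restore operation has its own status endpoint, distinct from the backup-creation one, and polling that is the reliable check (a sketch, using the backup id from my setup below):

backup_id="20240722060001_mema_wv_backup"
# Note the trailing /restore: this returns the status of the RESTORE itself
# (e.g. STARTED, TRANSFERRING, SUCCESS, FAILED), not of the backup creation.
curl "http://localhost:8080/v1/backups/filesystem/$backup_id/restore"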

For the restore to find the backup data, the Weaviate container must have been started with the host directory containing the backup (for example /backups) mounted into the container (e.g. at /tmp/backup), as per the docker compose service declaration. Also, the CLUSTER_HOSTNAME environment variable has to be the same as when the backup was created. Here is an example for a cluster named finland, with the files on the host in /backups and the backup path within the container at /tmp/backup, as docker compose service declaration elements:

environment:
  ENABLE_MODULES: 'backup-filesystem'
  BACKUP_FILESYSTEM_PATH: /tmp/backup
  CLUSTER_HOSTNAME: finland

volumes:
  - /backups:/tmp/backup
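
With this mapping, a quick host-side sanity check (a sketch, using the backup id from my setup below) is that the backup tree sits directly under the mounted directory, so Weaviate sees it under /tmp/backup:

ls /backups/20240722060001_mema_wv_backup
# should list backup_config.json and the finland/ directory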

Please note that the clustername needs to be the same on the exporting node and on the node on which you want to restore.

Also note that the collections you want to restore must NOT already exist on the node on which you want to restore.
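
If you did create the collection by mistake, it can be dropped before restoring via the schema endpoint (careful: this deletes the collection and all its objects; class name taken from my setup below):

curl -X DELETE http://localhost:8080/v1/schema/Articles_intfloat_multilingual_e5_large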

Remember that to start a new Weaviate instance from scratch with no data, you can stop its container, delete the data volume, recreate it, and restart.
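
A minimal sketch of that reset, assuming the compose service is named weaviate and the data volume weaviate_data as in the definitions below (the actual volume name is prefixed with your compose project name; check with docker volume ls):

docker compose stop weaviate
docker compose rm -f weaviate
# replace <project> with your compose project name
docker volume rm <project>_weaviate_data
docker compose up -d weaviate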

β€” here follows the original question:

I backed up the server on nodeA, which runs with the following docker compose definition:

  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.25.4
    command:
      - "--host=0.0.0.0"
      - "--port=8080"
      - "--scheme=http"
    restart: unless-stopped
    environment:
      LOG_LEVEL: info
      ENABLE_MODULES: 'backup-filesystem'
      BACKUP_FILESYSTEM_PATH: /tmp/backup
      ENABLE_CUDA: 0
      LIMIT_RESOURCES: true
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: true
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      CLUSTER_HOSTNAME: finland
      DISABLE_TELEMETRY: true
      GOMAXPROCS: 4
    networks:
      - weaviate_net
      - mema_network
    ports:
      - "8080:8080"
      - "50051:50051"
    volumes:
      - weaviate_data:/var/lib/weaviate
      - /sata/backup:/tmp/backup
    logging: *default-logging

and querying this node for collection and object counts:
curl -L 'http://localhost:8080/v1/nodes?output=verbose'
yields:

{
    "nodes": [
        {
            "batchStats": {
                "queueLength": 0,
                "ratePerSecond": 0
            },
            "gitHash": "a61909a",
            "name": "finland",
            "shards": [
                {
                    "class": "Articles_intfloat_multilingual_e5_large",
                    "compressed": false,
                    "loaded": true,
                    "name": "hjlaDyHIS4Yd",
                    "objectCount": 614044,
                    "vectorIndexingStatus": "READY",
                    "vectorQueueLength": 0
                }
            ],
            "stats": {
                "objectCount": 614044,
                "shardCount": 1
            },
            "status": "HEALTHY",
            "version": "1.25.4"
        }
    ]
}

I back this up with the following command:
curl -L 'http://localhost:8080/v1/backups/filesystem' -H 'Content-Type: application/json' -d "{\"id\": \"20240722060001_mema_wv_backup\"}"
and the resulting backup tree on the host filesystem is as follows:

mema@newisa:/sata/backup$ tree 20240722060001_mema_wv_backup
20240722060001_mema_wv_backup
β”œβ”€β”€ backup_config.json
└── finland
    β”œβ”€β”€ Articles_intfloat_multilingual_e5_large
    β”‚   └── chunk-1
    └── backup.json

with the following sizes:

mema@newisa:/sata/backup$ du -h 20240722060001_mema_wv_backup/
2.8G    20240722060001_mema_wv_backup/finland/Articles_intfloat_multilingual_e5_large
2.8G    20240722060001_mema_wv_backup/finland
2.8G    20240722060001_mema_wv_backup/
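
(As an aside: before transferring anything, the creation status of this backup can be checked with a GET on the same endpoint plus the backup id; a sketch:)

curl "http://localhost:8080/v1/backups/filesystem/20240722060001_mema_wv_backup"
# wait until "status" is SUCCESS before copying the tree elsewhere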

I transfer this file tree to the new node and place it in the /backups host directory. On this new node Weaviate is also started by docker compose, with the following service definition:

  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.25.4
    command:
      - "--host=0.0.0.0"
      - "--port=8080"
      - "--scheme=http"
    restart: unless-stopped
    environment:
      LOG_LEVEL: info
      ENABLE_MODULES: 'backup-filesystem'
      BACKUP_FILESYSTEM_PATH: /tmp/backup
      ENABLE_CUDA: 0
      LIMIT_RESOURCES: true
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: true
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      CLUSTER_HOSTNAME: finland
      DISABLE_TELEMETRY: true
      GOMAXPROCS: 24
    networks:
      - weaviate_net
      - mema_network
    ports:
      - "8080:8080"
      - "50051:50051"
    volumes:
      - weaviate_data:/var/lib/weaviate
      - /backups:/tmp/backup
    logging: *default-logging

which, as you can see, is mostly identical save for the host /backups directory being mounted at /tmp/backup in the container.

When I query the new instance, I see the collection exists and has 0 objects as expected:

curl -L 'http://localhost:8080/v1/nodes?output=verbose'
{
    "nodes": [
        {
            "batchStats": {
                "queueLength": 0,
                "ratePerSecond": 0
            },
            "gitHash": "a61909a",
            "name": "finland",
            "shards": [
                {
                    "class": "[]
                }
            ],
            "stats": {
                "objectCount": 0,
                "shardCount": 1
            },
            "status": "HEALTHY",
            "version": "1.25.4"
        }
    ]
}

so I now run the restore command:

# Execute the curl command with the provided backup_id
backup_id="20240722060001_mema_wv_backup"
curl -X POST -H "Content-Type: application/json" -d "{\"id\": \"$backup_id\"}" "http://localhost:8080/v1/backups/filesystem/$backup_id/restore"

which produces the following output:
{"backend":"filesystem","classes":["Articles_intfloat_multilingual_e5_large"],"id":"20240722060001_mema_wv_backup","path":"/tmp/backup/20240722060001_mema_wv_backup","status":"STARTED"}
I then check its outcome:

# Execute the curl command with the provided backup_id
curl "http://localhost:8080/v1/backups/filesystem/$backup_id"
{"backend":"filesystem","id":"20240722060001_mema_wv_backup","path":"/tmp/backup/20240722060001_mema_wv_backup","status":"SUCCESS"}

but please note that this β€œSUCCESS” appears only a second later, which is highly suspect :frowning:

I then check if the collection has been populated:
curl -L 'http://localhost:8080/v1/nodes?output=verbose'
but sadly the outcome is that the collection is still empty:

{
    "nodes": [
        {
            "batchStats": {
                "queueLength": 0,
                "ratePerSecond": 0
            },
            "gitHash": "a61909a",
            "name": "finland",
            "shards": [
                {
                    "class": "Articles_intfloat_multilingual_e5_large",
                    "compressed": false,
                    "loaded": true,
                    "name": "BNVEKxLMqZ1K",
                    "objectCount": 0,
                    "vectorIndexingStatus": "READY",
                    "vectorQueueLength": 0
                }
            ],
            "stats": {
                "objectCount": 0,
                "shardCount": 1
            },
            "status": "HEALTHY",
            "version": "1.25.4"
        }
    ]
}

What am I overlooking / doing wrong?

What does not help is that I am unable to see any errors that might arise from the

# Execute the curl command with the provided backup_id
curl -X POST -H "Content-Type: application/json" -d "{\"id\": \"$backup_id\"}" "http://localhost:8080/v1/backups/filesystem/$backup_id/restore"

command; such errors might help in understanding what went wrong (and the SUCCESS status of the check is misleading).

Thanks a lot

Ciao my friend!!

Do you see any outstanding logs on the server side?

If I recall correctly, the collection must not already exist, or the restore will report that the collection exists already.

I just noticed we do not have a backup recipe in our recipes repo, so I will try to both reproduce this and work on something there later today/tomorrow.

Hey @DudaNogueira hope life is treating you well my friend.

I restarted the whole process with a fresh, empty volume for Weaviate and this time did NOT create the collection beforehand. Despite the query results you will see below, I WAITED some time before checking, and much to my joy and surprise I found the collection created and the objects restored.

So all good for me, except that the queries you see below are misleading:

# Execute the curl command with the provided backup_id
curl -X POST -H "Content-Type: application/json" -d "{\"id\": \"$backup_id\"}" "http://localhost:8080/v1/backups/filesystem/$backup_id/restore"

immediately gives:
{"backend":"filesystem","classes":["Articles_intfloat_multilingual_e5_large"],"id":"20240722060001_mema_wv_backup","path":"/tmp/backup/20240722060001_mema_wv_backup","status":"STARTED"}
which is correct. But:

# Execute the curl command with the provided backup_id
curl "http://localhost:8080/v1/backups/filesystem/$backup_id"

immediately gives:
{"backend":"filesystem","id":"20240722060001_mema_wv_backup","path":"/tmp/backup/20240722060001_mema_wv_backup","status":"SUCCESS"}
which in my opinion is misleading: it leads you to believe that the restore FINISHED and was successful, which is not true.

The restore was in fact still underway, as shown by trying another:

# Execute the curl command with the provided backup_id
curl -X POST -H "Content-Type: application/json" -d "{\"id\": \"$backup_id\"}" "http://localhost:8080/v1/backups/filesystem/$backup_id/restore"

which gave a telltale result:
{"error":[{"message":"restoration 20240722060001_mema_wv_backup already in progress"}]}
which led me to wait some time, crossing my fingers; after a while I queried the Weaviate instance over HTTP and was happy to see:

{
    "nodes": [
        {
            "batchStats": {
                "queueLength": 0,
                "ratePerSecond": 0
            },
            "gitHash": "a61909a",
            "name": "finland",
            "shards": [
                {
                    "class": "Articles_intfloat_multilingual_e5_large",
                    "compressed": false,
                    "loaded": true,
                    "name": "hjlaDyHIS4Yd",
                    "objectCount": 614010,
                    "vectorIndexingStatus": "READY",
                    "vectorQueueLength": 0
                }
            ],
            "stats": {
                "objectCount": 614010,
                "shardCount": 1
            },
            "status": "HEALTHY",
            "version": "1.25.4"
        }
    ]
}
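
In hindsight, rather than waiting blindly, a small poll loop on the restore-status endpoint (assuming it behaves as described in the edit at the top) would have made the wait explicit:

# Sketch: poll the restore status until it reaches a terminal state.
while true; do
  status=$(curl -s "http://localhost:8080/v1/backups/filesystem/$backup_id/restore" | grep -o '"status":"[A-Z]*"')
  echo "$status"
  case "$status" in *SUCCESS*|*FAILED*) break ;; esac
  sleep 10
done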

So I have solved the mystery, but it was a long hunt :slight_smile:

Thank you and take care

Thanks for sharing, amigo!
