EDIT: putting the solution on top. In my case the restore failed because I had already created the collection I wanted to restore before running the restore. I also find the HTTP calls used to check whether the restore succeeded misleading.
Please remember that the cURL call sends the restore request to the running Weaviate container.
For the container to find the backup data, it must have been started with the host directory containing the backup (for example /backups) mounted into the container (e.g. /tmp/backup) in the docker compose service declaration. The CLUSTER_HOSTNAME environment variable must also have the same value it had when the backup was created. Here is an example for a cluster named finland, with the files on the host in /backups and the backup path inside the container at /tmp/backup, shown as the relevant docker compose service declaration elements:
    environment:
      ENABLE_MODULES: 'backup-filesystem'
      BACKUP_FILESYSTEM_PATH: /tmp/backup
      CLUSTER_HOSTNAME: finland
    volumes:
      - /backups:/tmp/backup
Please note that the cluster name (CLUSTER_HOSTNAME) must be the same on the exporting node and on the node where you restore.
Also note that the collections you want to restore must NOT already exist on the node where you restore.
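A minimal sketch of a pre-restore check for that last point, assuming the instance answers on http://localhost:8080 as in the port mapping of the compose file. The function names and the WEAVIATE_URL variable are my own, not part of Weaviate; the only API calls used are GET /v1/schema and DELETE /v1/schema/{class}.

```shell
# Assumption: base URL of the target Weaviate instance.
WEAVIATE_URL="${WEAVIATE_URL:-http://localhost:8080}"

# Read schema JSON on stdin; exit 0 if it defines a class with exactly this name.
schema_has_class() {
  grep -Eq "\"class\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Exit 0 if the collection already exists on the node (the restore would then fail).
collection_exists() {
  curl -s "$WEAVIATE_URL/v1/schema" | schema_has_class "$1"
}

# Delete the collection and every object in it. Destructive, use with care.
delete_collection() {
  curl -s -X DELETE "$WEAVIATE_URL/v1/schema/$1"
}

# Example use before a restore:
#   if collection_exists Articles_intfloat_multilingual_e5_large; then
#     delete_collection Articles_intfloat_multilingual_e5_large
#   fi
```

Only delete the collection on the node you are restoring to, never on the node that holds the original data.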
Remember that to start a new Weaviate instance from scratch with no data, you can stop its container, delete the data volume, recreate it, and restart the container.
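The reset described above could look roughly like this. It is a sketch under assumptions: the service is called weaviate as in the compose file in the question, and the volume name you pass in is the real one shown by `docker volume ls` (Compose normally prefixes it with the project name, so it is probably not just weaviate_data). The reset_weaviate function and the DRY_RUN switch are hypothetical helpers of mine.

```shell
# Stop the service, drop its data volume, and start it again with an empty volume.
# Set DRY_RUN=1 to print the commands instead of running them.
reset_weaviate() {
  volume=$1                 # full volume name as listed by `docker volume ls`
  run=${DRY_RUN:+echo}      # expands to "echo" only when DRY_RUN is non-empty
  $run docker compose stop weaviate &&
  $run docker compose rm -f weaviate &&    # the volume cannot be removed while a container still references it
  $run docker volume rm "$volume" &&
  $run docker compose up -d weaviate       # Compose recreates the declared volume, now empty
}

# Example (prints the four commands without executing them):
#   DRY_RUN=1
#   reset_weaviate myproject_weaviate_data
```

This wipes every collection on that instance, so run it only on the node you are restoring to.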
Here follows the original question:
I backed up the server on nodeA. NodeA runs with the following docker compose definition:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.25.4
    command:
      - "--host=0.0.0.0"
      - "--port=8080"
      - "--scheme=http"
    restart: unless-stopped
    environment:
      LOG_LEVEL: info
      ENABLE_MODULES: 'backup-filesystem'
      BACKUP_FILESYSTEM_PATH: /tmp/backup
      ENABLE_CUDA: 0
      LIMIT_RESOURCES: true
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: true
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      CLUSTER_HOSTNAME: finland
      DISABLE_TELEMETRY: true
      GOMAXPROCS: 4
    networks:
      - weaviate_net
      - mema_network
    ports:
      - "8080:8080"
      - "50051:50051"
    volumes:
      - weaviate_data:/var/lib/weaviate
      - /sata/backup:/tmp/backup
    logging: *default-logging
and querying this node for collection and object counts:
curl -L 'http://localhost:8080/v1/nodes?output=verbose'
yields:
{
"nodes": [
{
"batchStats": {
"queueLength": 0,
"ratePerSecond": 0
},
"gitHash": "a61909a",
"name": "finland",
"shards": [
{
"class": "Articles_intfloat_multilingual_e5_large",
"compressed": false,
"loaded": true,
"name": "hjlaDyHIS4Yd",
"objectCount": 614044,
"vectorIndexingStatus": "READY",
"vectorQueueLength": 0
}
],
"stats": {
"objectCount": 614044,
"shardCount": 1
},
"status": "HEALTHY",
"version": "1.25.4"
}
]
}
I back this up with the following command:
curl -L 'http://localhost:8080/v1/backups/filesystem' -H 'Content-Type: application/json' -d "{\"id\": \"20240722060001_mema_wv_backup\"}"
and the resulting backup tree on the host filesystem is as follows:
mema@newisa:/sata/backup$ tree 20240722060001_mema_wv_backup
20240722060001_mema_wv_backup
├── backup_config.json
└── finland
    ├── Articles_intfloat_multilingual_e5_large
    │   └── chunk-1
    └── backup.json
with the following sizes:
mema@newisa:/sata/backup$ du -h 20240722060001_mema_wv_backup/
2.8G 20240722060001_mema_wv_backup/finland/Articles_intfloat_multilingual_e5_large
2.8G 20240722060001_mema_wv_backup/finland
2.8G 20240722060001_mema_wv_backup/
I transfer this file tree to the new node and place it in the /backups host directory. On this new node Weaviate is also started by docker compose, with the following service definition:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.25.4
    command:
      - "--host=0.0.0.0"
      - "--port=8080"
      - "--scheme=http"
    restart: unless-stopped
    environment:
      LOG_LEVEL: info
      ENABLE_MODULES: 'backup-filesystem'
      BACKUP_FILESYSTEM_PATH: /tmp/backup
      ENABLE_CUDA: 0
      LIMIT_RESOURCES: true
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: true
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      CLUSTER_HOSTNAME: finland
      DISABLE_TELEMETRY: true
      GOMAXPROCS: 24
    networks:
      - weaviate_net
      - mema_network
    ports:
      - "8080:8080"
      - "50051:50051"
    volumes:
      - weaviate_data:/var/lib/weaviate
      - /backups:/tmp/backup
    logging: *default-logging
which, as you can see, is nearly identical: the only differences are GOMAXPROCS (24 instead of 4) and the host directory /backups being mounted on the container's /tmp/backup.
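Before restoring, it may be worth confirming on the new host that the copied tree still has the layout shown in the tree listing earlier: a backup_config.json at the top level and a directory named after the cluster hostname that contains backup.json. A small sketch; check_backup_layout is a hypothetical helper of mine, and it only looks at the host side of the mount, not at permissions inside the container.

```shell
# Exit 0 if <backup_root>/<backup_id> looks like a backup written by node <node_name>.
check_backup_layout() {
  backup_root=$1
  backup_id=$2
  node_name=$3
  dir="$backup_root/$backup_id"
  [ -f "$dir/backup_config.json" ] || { echo "missing $dir/backup_config.json" >&2; return 1; }
  [ -d "$dir/$node_name" ] || { echo "missing node directory $dir/$node_name" >&2; return 1; }
  [ -f "$dir/$node_name/backup.json" ] || { echo "missing $dir/$node_name/backup.json" >&2; return 1; }
  echo "layout ok: $dir"
}

# Example with the values from this post:
#   check_backup_layout /backups 20240722060001_mema_wv_backup finland
```

A missing node directory would suggest that CLUSTER_HOSTNAME on the restoring node does not match the node that wrote the backup.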
When I query the new instance, I see the collection exists and has 0 objects as expected:
curl -L 'http://localhost:8080/v1/nodes?output=verbose'
{
"nodes": [
{
"batchStats": {
"queueLength": 0,
"ratePerSecond": 0
},
"gitHash": "a61909a",
"name": "finland",
"shards": [
{
"class": "[]
}
],
"stats": {
"objectCount": 0,
"shardCount": 1
},
"status": "HEALTHY",
"version": "1.25.4"
}
]
}
so I now run the restore command:
# Execute the curl command with the provided backup_id
curl -X POST -H "Content-Type: application/json" -d "{\"id\": \"$backup_id\"}" "http://localhost:8080/v1/backups/filesystem/$backup_id/restore"
which produces the following output:
{"backend":"filesystem","classes":["Articles_intfloat_multilingual_e5_large"],"id":"20240722060001_mema_wv_backup","path":"/tmp/backup/20240722060001_mema_wv_backup","status":"STARTED"}
I then check its outcome:
# Execute the curl command with the provided backup_id
curl "http://localhost:8080/v1/backups/filesystem/$backup_id"
{"backend":"filesystem","id":"20240722060001_mema_wv_backup","path":"/tmp/backup/20240722060001_mema_wv_backup","status":"SUCCESS"}
but please note that this "SUCCESS" is available only a second later, so it is highly suspect.
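A possible explanation for the instant SUCCESS: as far as I understand the Weaviate REST API, the URL without a /restore suffix reports the status of the backup creation (which had already succeeded on nodeA), while the restore has its own status endpoint ending in /restore. Below is a minimal polling sketch under that assumption. The function names, the WEAVIATE_URL variable, and the retry limit are mine, not part of Weaviate.

```shell
# Assumption: base URL of the target Weaviate instance.
WEAVIATE_URL="${WEAVIATE_URL:-http://localhost:8080}"

# Build the restore-status URL for a backup id on the filesystem backend.
restore_status_url() {
  printf '%s/v1/backups/filesystem/%s/restore' "$WEAVIATE_URL" "$1"
}

# Read a status JSON document on stdin and print the value of its "status" field.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Poll until the restore reports SUCCESS or FAILED, or until max_tries is reached.
wait_for_restore() {
  backup_id=$1
  max_tries=${2:-150}    # hypothetical limit: 150 tries at 2 s each, about 5 minutes
  try=0
  while [ "$try" -lt "$max_tries" ]; do
    body=$(curl -s "$(restore_status_url "$backup_id")")
    status=$(printf '%s\n' "$body" | extract_status)
    echo "restore status: ${status:-unknown}"
    case "$status" in
      SUCCESS) return 0 ;;
      FAILED)  printf '%s\n' "$body" >&2; return 1 ;;   # print the full response for diagnosis
    esac
    try=$((try + 1))
    sleep 2
  done
  echo "gave up waiting for restore of $backup_id" >&2
  return 2
}

# Example: wait_for_restore "$backup_id"
```

On FAILED the full response body is printed; in my understanding it should carry an error message, which is where a problem such as an already existing collection would show up.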
I then check if the collection has been populated:
curl -L 'http://localhost:8080/v1/nodes?output=verbose'
but sadly the outcome is that the collection is still empty:
{
"nodes": [
{
"batchStats": {
"queueLength": 0,
"ratePerSecond": 0
},
"gitHash": "a61909a",
"name": "finland",
"shards": [
{
"class": "Articles_intfloat_multilingual_e5_large",
"compressed": false,
"loaded": true,
"name": "BNVEKxLMqZ1K",
"objectCount": 0,
"vectorIndexingStatus": "READY",
"vectorQueueLength": 0
}
],
"stats": {
"objectCount": 0,
"shardCount": 1
},
"status": "HEALTHY",
"version": "1.25.4"
}
]
}
What am I overlooking or doing wrong?
In particular, I am not able to see any errors that might arise from the
# Execute the curl command with the provided backup_id
curl -X POST -H "Content-Type: application/json" -d "{\"id\": \"$backup_id\"}" "http://localhost:8080/v1/backups/filesystem/$backup_id/restore"
command, which might help in understanding what went wrong (and the SUCCESS status of the check is misleading).
Thanks a lot