Error restoring backup and file corruption

Description

Hello,

We are running Weaviate 1.26.3 in an Azure Container App, with an Azure file share mounted over NFS for persistence. The instance contains almost 1M elements.

We get errors like the following when trying to restore various backups (around 7 GB) on the instance:

{"msg":"failed to load shard: Unable to load shard TKgQUlWgvhID: init prop \"doc_author\": null index: init disk segments: init segment segment-1730278936414135887.db: mmap file: invalid argument","shard_name":"TKgQUlWgvhID"}

As I understand it, some files in our data seem to be corrupted.

I searched the Azure logs and found errors that occurred during the day, like the one below, which seem to be linked to the backup restore error:

{"action":"lsm_memtable_flush","error":"flush: unlinkat /var/lib/weaviate/<indexName>/TKgQUlWgvhID/lsm/property_doc_author_nullState/segment-1730278936414135887.scratch.d: directory not empty","index":"<indexName>","level":"error","msg":"flush and switch failed","path":"/var/lib/weaviate/<indexName>/TKgQUlWgvhID/lsm/property_doc_author_nullState","shard":"TKgQUlWgvhID"}

We also got other “lsm_memtable_flush” errors on other properties, as well as “lsm_compaction” errors like this one:

{"action":"lsm_compaction","error":"write index: unlinkat /var/lib/weaviate/<indexName>/TKgQUlWgvhID/lsm/property_scope_value_searchable/segment-1730280309487972970.dbcompaction.scratch.d: directory not empty","index":"<indexName>","level":"error","msg":"compaction failed","path":"/var/lib/weaviate/<indexName>/TKgQUlWgvhID/lsm/property_scope_value_searchable","shard":"TKgQUlWgvhID"}

It seems to me that Weaviate tried to perform a maintenance operation, such as deleting some files, failed, and corrupted those files.
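One way to check this hypothesis on the mounted share is to look for leftovers of the failed operations. The sketch below walks a data directory and reports two things: `.scratch.d` directories that were never removed (the ones named in the `unlinkat ... directory not empty` errors) and `segment-*.db` files of zero length. On Linux, `mmap` returns "invalid argument" when asked to map a zero-length file, so empty segments are one plausible cause of the load error. Whether that is what happened here is an assumption, not something the logs confirm. The function name `find_suspect_files` is made up for this example.

```python
import os


def find_suspect_files(data_root):
    """Return (leftover_scratch_dirs, empty_segments) found under data_root.

    Hypothetical helper: it only reads directory listings and file sizes,
    and never modifies anything on disk.
    """
    scratch_dirs = []
    empty_segments = []
    for dirpath, dirnames, filenames in os.walk(data_root):
        for name in dirnames:
            # Matches both "segment-<id>.scratch.d" and
            # "segment-<id>.dbcompaction.scratch.d" from the error logs.
            if name.endswith(".scratch.d"):
                scratch_dirs.append(os.path.join(dirpath, name))
        for name in filenames:
            if name.startswith("segment-") and name.endswith(".db"):
                full_path = os.path.join(dirpath, name)
                # A zero-length file cannot be memory-mapped.
                if os.path.getsize(full_path) == 0:
                    empty_segments.append(full_path)
    return sorted(scratch_dirs), sorted(empty_segments)
```

Calling `find_suspect_files("/var/lib/weaviate")` inside the container (the path that appears in the logs) would list candidates to inspect by hand. The script deliberately deletes nothing.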

Has anyone encountered these errors before, and do you know how to resolve them?

Server Setup Information

  • Weaviate Server Version: 1.26.3
  • Deployment Method: docker
  • Multi Node?: No. Number of Running Nodes: 1
  • Client Language and Version:
  • Multitenancy?: No

hi @Guillaume !!

Are you trying to restore to the same version?

Hi @DudaNogueira,

The backups come from a Weaviate 1.26.3 instance.

We have two environments: one with Weaviate 1.26.3 and one with Weaviate 1.26.6.

The backup from October 29th restores correctly on both environments.
Backups from October 30th and later fail to restore on both environments.

To be sure, I tried again to restore the October 30th backup on the environment running Weaviate 1.26.3, and I got this error:

{"build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","error":"init shard \"<indexName>_TKgQUlWgvhID\": init shard \"<indexName>_TKgQUlWgvhID\": init prop \"other_infos\": null index: init disk segments: init segment segment-1730279432907267231.db: mmap file: invalid argument","level":"error","msg":"Unable to load shard TKgQUlWgvhID: init shard \"<indexName>_TKgQUlWgvhID\": init shard \"<indexName>_TKgQUlWgvhID\": init prop \"other_infos\": null index: init disk segments: init segment segment-1730279432907267231.db: mmap file: invalid argument","time":"2024-11-27T09:42:04Z"}

{"action":"load_shard","build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","level":"error","msg":"failed to load shard: Unable to load shard TKgQUlWgvhID: init shard \"<indexName>_TKgQUlWgvhID\": init shard \"<indexName>_TKgQUlWgvhID\": init prop \"other_infos\": null index: init disk segments: init segment segment-1730279432907267231.db: mmap file: invalid argument","shard_name":"TKgQUlWgvhID","time":"2024-11-27T09:42:04Z"}

I also found this error from the day of the backup. I don’t know if it’s relevant, but it concerns the same segment:

{"action":"lsm_memtable_flush","build_git_commit":"9a4ea6d","build_go_version":"go1.21.13","build_image_tag":"1.26.3","build_wv_version":"1.26.3","class":"<indexName>","error":"flush: unlinkat /var/lib/weaviate/<indexName>/TKgQUlWgvhID/lsm/property_ver_current_version_nullState/segment-1730279432907267231.scratch.d: directory not empty","index":"<indexName>","level":"error","msg":"flush and switch failed","path":"/var/lib/weaviate/<indexName>/TKgQUlWgvhID/lsm/property_ver_current_version_nullState","shard":"TKgQUlWgvhID","time":"2024-10-30T09:11:34Z"}
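To check systematically whether the segments that fail to load are the same ones that hit a flush or compaction error earlier, the exported logs can be correlated by segment ID. This is a minimal sketch, assuming the logs are available as one JSON object per line, as in the excerpts above. The function names and the two labels `write_failure` and `load_failure` are made up for this example. The classification relies only on strings visible in this thread (`mmap file`, `lsm_memtable_flush`, `lsm_compaction`).

```python
import json
import re
from collections import defaultdict

SEGMENT_RE = re.compile(r"segment-(\d+)")
WRITE_ACTIONS = {"lsm_memtable_flush", "lsm_compaction"}


def correlate_segments(log_lines):
    """Map each segment ID to the kinds of failure logged for it."""
    seen = defaultdict(set)
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        text = f"{entry.get('error', '')} {entry.get('msg', '')}"
        if "mmap file" in text:
            kind = "load_failure"
        elif entry.get("action") in WRITE_ACTIONS:
            kind = "write_failure"
        else:
            continue  # unrelated log entry
        for segment_id in SEGMENT_RE.findall(text):
            seen[segment_id].add(kind)
    return seen


def segments_in_both(seen):
    """Segment IDs with both a write failure and a later load failure."""
    both = {"write_failure", "load_failure"}
    return sorted(seg for seg, kinds in seen.items() if both <= kinds)
```

Feeding it the lines of an exported log file, for example `segments_in_both(correlate_segments(open("weaviate.log")))` with a hypothetical file name, returns the segment IDs worth inspecting first.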