Description
Hello,
We are running Weaviate 1.26.3 in an Azure Container App, with an Azure file share mounted over NFS for persistence. The instance contains almost 1M elements.
We get errors when trying to restore different backups (around 7GB) onto the instance, such as this one:
{"msg":"failed to load shard: Unable to load shard TKgQUlWgvhID: init prop \"doc_author\": null index: init disk segments: init segment segment-1730278936414135887.db: mmap file: invalid argument","shard_name":"TKgQUlWgvhID"}
As I understand it, some files in our data appear to be corrupted.
I searched the Azure logs and found errors that occurred during the day, such as the one below. It refers to the same segment (segment-1730278936414135887) as the backup restore error, so the two seem to be linked:
{"action":"lsm_memtable_flush","error":"flush: unlinkat /var/lib/weaviate/<indexName>/TKgQUlWgvhID/lsm/property_doc_author_nullState/segment-1730278936414135887.scratch.d: directory not empty","index":"<indexName>","level":"error","msg":"flush and switch failed","path":"/var/lib/weaviate/<indexName>/TKgQUlWgvhID/lsm/property_doc_author_nullState","shard":"TKgQUlWgvhID"}
We also got “lsm_memtable_flush” errors on other properties, as well as “lsm_compaction” errors like this one:
{"action":"lsm_compaction","error":"write index: unlinkat /var/lib/weaviate/<indexName>/TKgQUlWgvhID/lsm/property_scope_value_searchable/segment-1730280309487972970.dbcompaction.scratch.d: directory not empty","index":"<indexName>","level":"error","msg":"compaction failed","path":"/var/lib/weaviate/<indexName>/TKgQUlWgvhID/lsm/property_scope_value_searchable","shard":"TKgQUlWgvhID"}
It seems to me that Weaviate tried to perform a maintenance operation that involves deleting files (removing the scratch directories), failed, and left these segment files corrupted.
Has anyone already encountered these errors, and do you know how to resolve them?
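To see how widespread the problem is, the data directory could be scanned with something like the minimal sketch below. It walks the tree and lists leftover `.scratch.d` directories (the ones the `unlinkat` errors complain about) and zero-length `segment-*.db` files. The zero-length check is an assumption on my part: `mmap` returns "invalid argument" when asked to map a file of length 0, but I have not confirmed that this is what happened to our segment. The function name `scan_lsm_tree` is made up for this post and is not part of Weaviate; the default path is the one shown in our logs.

```python
import os
import sys


def scan_lsm_tree(root):
    """Return (scratch_dirs, empty_segments) found under `root`.

    scratch_dirs:   directories whose name ends with ".scratch.d"
    empty_segments: files named segment-*.db with a size of 0 bytes
    """
    scratch_dirs = []
    empty_segments = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            if name.endswith(".scratch.d"):
                scratch_dirs.append(os.path.join(dirpath, name))
        for name in filenames:
            if name.startswith("segment-") and name.endswith(".db"):
                path = os.path.join(dirpath, name)
                try:
                    size = os.path.getsize(path)
                except OSError:
                    # The file may disappear mid-scan (e.g. during compaction).
                    continue
                if size == 0:
                    empty_segments.append(path)
    return sorted(scratch_dirs), sorted(empty_segments)


if __name__ == "__main__":
    # Assumed default: the data path that appears in the logs above.
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/weaviate"
    scratch, empty = scan_lsm_tree(root)
    for path in scratch:
        print("leftover scratch dir:", path)
    for path in empty:
        print("zero-length segment:", path)
```

The scan is read-only and deletes nothing. It is probably best run while Weaviate is stopped, since scratch directories also exist briefly during a normal flush or compaction.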
Server Setup Information
- Weaviate Server Version: 1.26.3
- Deployment Method: docker
- Multi Node? No. Number of Running Nodes: 1
- Client Language and Version:
- Multitenancy?: No