Description
We are performing a batch import in which, as the last step, we create cross-references between the objects. While doing so, we run consistency checks to confirm that every object written in the previous step can also be retrieved. In our last run, this was not always the case. We did not receive a client-side error during the batch import, but the server logs from the time of the import contain the following error (among others):
{"action":"lsm_memtable_flush","class":"PageNode_v3","error":"flush: unlinkat /var/lib/weaviate/pagenode_v3/kUuKzkTaWVxi/lsm/objects/segment-1720459484309792404.scratch.d: directory not empty","index":"pagenode_v3","level":"error","msg":"flush and switch failed","path":"/var/lib/weaviate/pagenode_v3/kUuKzkTaWVxi/lsm/objects","shard":"kUuKzkTaWVxi","time":"2024-07-08T17:25:48Z"}
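Because the failure only showed up in the server logs and not on the client, it may help to scan the logs for these flush errors after each import. Below is a minimal sketch, assuming the logs are one JSON object per line as in the excerpt above. `find_flush_failures` is a hypothetical helper, not part of Weaviate or its client.

```python
import json
from typing import Iterable, Iterator


def find_flush_failures(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed log records for failed LSM memtable flushes.

    Assumes one JSON object per line, as in the log excerpt above.
    Lines that are empty or not JSON objects are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON at all
        if not isinstance(record, dict):
            continue  # skip JSON values that are not objects
        if record.get("action") == "lsm_memtable_flush" and record.get("level") == "error":
            yield record
```

For example, `for rec in find_flush_failures(open("weaviate.log")): print(rec["time"], rec["shard"], rec["error"])`, where `weaviate.log` is a hypothetical file containing the pod's log output.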
The following query returned no objects, even though they had been inserted earlier:
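# Weaviate Python client v3 query builder; parentheses make the method chain valid Python
result = (
    weaviate_client.query.get('PageNode_v3', ['page_id', 'node_index'])
    .with_where({
        "path": ["page_id"],
        "operator": "Equal",
        "valueText": page_id,
    })
    .with_limit(100_000)
    .do()
)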
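The consistency check described above can be expressed as a set difference between what was written and what a query returns. Below is a minimal sketch: `find_missing_nodes` is hypothetical, and `fetch` stands in for any function that runs the query above for one `page_id` and returns the list of objects (with the v3 client, that list is `result["data"]["Get"]["PageNode_v3"]`).

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical signature: takes a page_id, returns the objects found for it.
FetchFn = Callable[[str], List[Dict]]


def find_missing_nodes(
    expected: Iterable[Tuple[str, int]],
    fetch: FetchFn,
) -> Dict[str, Set[int]]:
    """Return, per page_id, the node_index values that were written but not found."""
    # Group the written (page_id, node_index) pairs by page_id.
    by_page: Dict[str, Set[int]] = {}
    for page_id, node_index in expected:
        by_page.setdefault(page_id, set()).add(node_index)

    # Query each page once and record any node_index values that are absent.
    missing: Dict[str, Set[int]] = {}
    for page_id, wanted in by_page.items():
        found = {obj["node_index"] for obj in fetch(page_id)}
        gap = wanted - found
        if gap:
            missing[page_id] = gap
    return missing
```

Keeping `fetch` as a parameter lets the check run against a fake store in tests and against the real client in the import pipeline.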
Interestingly, after a server restart (during which we upgraded to 1.25.7, though I do not think the upgrade made a difference), the objects are now retrievable. During server startup, the following messages related to this shard were logged:
{"action":"lsm_segment_init","class":"PageNode_v3","index":"pagenode_v3","level":"info","msg":"discarded (partially written) LSM segment, because an active WAL for the same segment was found. A recovery from the WAL will follow.","path":"/var/lib/weaviate/pagenode_v3/kUuKzkTaWVxi/lsm/objects/segment-1720459484309792404.db","shard":"kUuKzkTaWVxi","time":"2024-07-09T10:49:47Z","wal_path":"segment-1720459484309792404.wal"}
{"action":"lsm_recover_from_active_wal","class":"PageNode_v3","index":"pagenode_v3","level":"warning","msg":"active write-ahead-log found. Did weaviate crash prior to this? Trying to recover...","path":"/var/lib/weaviate/pagenode_v3/kUuKzkTaWVxi/lsm/objects/segment-1720459484309792404","shard":"kUuKzkTaWVxi","time":"2024-07-09T10:49:47Z"}
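The segment that failed to flush (`segment-1720459484309792404`) is the same one discarded and recovered from its WAL at startup. Below is a rough sketch for spotting that pattern across log files, assuming JSON logs one record per line and that the `segment-<digits>` part of a file name identifies the same segment across the flush error and the startup messages, as it does in the excerpts above. All names here are hypothetical.

```python
import json
import re
from typing import Iterable, Iterator, Set

# Matches the numeric part of names like "segment-1720459484309792404.wal".
SEGMENT_RE = re.compile(r"segment-(\d+)")

# Startup actions that, in the excerpt above, mention a discarded or WAL-recovered segment.
RECOVERY_ACTIONS = {"lsm_segment_init", "lsm_recover_from_active_wal"}


def parse_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield JSON log records, skipping lines that are not JSON objects."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def segment_ids(record: dict) -> Set[str]:
    """Collect every segment ID mentioned in any of a record's values."""
    text = " ".join(str(value) for value in record.values())
    return set(SEGMENT_RE.findall(text))


def recovered_after_failed_flush(lines: Iterable[str]) -> Set[str]:
    """Return segment IDs seen in a flush error that also appear in a recovery message."""
    failed: Set[str] = set()
    recovered: Set[str] = set()
    for record in parse_records(lines):
        action = record.get("action")
        if action == "lsm_memtable_flush" and record.get("level") == "error":
            failed |= segment_ids(record)
        elif action in RECOVERY_ACTIONS:
            recovered |= segment_ids(record)
    return failed & recovered
```

The check ignores ordering, so it should be fed the import-time logs together with the subsequent startup logs.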
Obviously, having to restart the server to make all objects readable is not ideal. Could you please help us understand what is happening here, and whether this can be prevented in the first place?
Server Setup Information
- Weaviate Server Version: 1.25.6
- Deployment Method: k8s using helm
- Multi Node? Number of Running Nodes: 1
- Client Language and Version: Python, 3.26.2
- Multitenancy?: no