Error When Data Is Stored in AWS EFS

I have set up Weaviate on AWS ECS with auto-scaling. Due to a load spike, the Weaviate instance restarted, and now I am getting the error below when accessing the data.

{
  "data": {
    "Get": {
      "WeaviateDemo3": null
    }
  },
  "errors": [
    {
      "locations": [
        {
          "column": 6,
          "line": 1
        }
      ],
      "message": "explorer: list class: search: object search at index weaviatedemo3: local shard object search weaviatedemo3_PXksOv6syq7H: Unable to load shard PXksOv6syq7H: init shard \"weaviatedemo3_PXksOv6syq7H\": init shard \"weaviatedemo3_PXksOv6syq7H\": shard db: create objects bucket: init disk segments: init segment segment-1719829104648996364.db: mmap file: invalid argument",
      "path": [
        "Get",
        "WeaviateDemo3"
      ]
    }
  ]
}

@DudaNogueira @jphwang

Hi!

Do you have any logs from the server side?

Note that, for Weaviate, as long as there is writable disk space, it should work properly.

I found the entry below in the logs. Please let me know if this helps.

{"action":"lsm_memtable_flush","class":"WeaviateDemo3","error":"flush: unlinkat /var/lib/weaviate/weaviatedemo3/PXksOv6syq7H/lsm/objects/segment-1719829104648996364.scratch.d: directory not empty","index":"weaviatedemo3","level":"error","msg":"flush and switch failed","path":"/var/lib/weaviate/weaviatedemo3/PXksOv6syq7H/lsm/objects","shard":"PXksOv6syq7H","time":"2024-07-01T10:18:50Z"}

@DudaNogueira

It seems the error was raised because this folder was not empty :thinking:

Please, when opening a technical support thread, select the Support category and answer the following:

  • Weaviate Server Version:
  • Deployment Method:
  • Multi Node? Number of Running Nodes:
  • Client Language and Version:
  • Multitenancy?:

Mainly: what version are you running?

For reference, this error traces back to:

Thanks for your response.


I did some further research on this as we encountered the same problem (Some objects not readable after batch import / flush and switch failed - Support - Weaviate Community Forum).

The only code path I was able to find that could lead to this exact error output (other calls to os.Remove() / os.RemoveAll() should produce more output in the log) is:
weaviate/adapters/repos/db/lsmkv/bucket.go at main · weaviate/weaviate (github.com)
weaviate/adapters/repos/db/lsmkv/memtable_flush.go at main · weaviate/weaviate (github.com)
weaviate/adapters/repos/db/lsmkv/segmentindex/indexes.go at main · weaviate/weaviate (github.com)

So in the end os.RemoveAll() is called on the scratch directory, and it fails with "directory not empty".

Searching for when that can happen, I came across
os: “os.RemoveAll” sometimes returns error “remove files: directory not empty” · Issue #23452 · golang/go (github.com), which raises the question: are there any goroutines potentially still accessing the scratch directory while it is being removed?

Hope this helps a bit in figuring out the root cause of this.


Hi @andrewisplinghoff!! Thanks for this.

I have escalated this issue with our core developers.

Hope to bring more info soon.

Thanks!


Thanks @andrewisplinghoff for looking into it.

@DudaNogueira Please let us know as soon as you get any update from the developers, as this is critical for us.
Thanks in advance.

Hi! I have asked our team to concentrate the discussion on this issue, as it seems related:

Hi @DudaNogueira
Could you please help us with this? We are facing the same issue again, and it is blocking our development.

hi @saurbhhsharrma !

Are all the backups failing? Can you reproduce this scenario on a fresh, clean install?

We will release a feature to cancel backups in 1.27 (it may also be backported to previous versions).

This can help in this scenario while we identify the root cause of those issues.

Let me know if this helps!