'No space left on device' when both RAM and disk have >30% free space

Hey all, I'm importing a large dataset and am getting the following error a little under halfway through the job:

{"action":"hnsw_tombstone_cleanup","error":"reassign neighbor edges: write /var/lib/weaviate/papermetadata/xVwwxf162NWy/main.hnsw.commitlog.d/1706657139: no space left on device","level":"error","msg":"tombstone cleanup errord","time":"2024-01-31T04:52:25Z"}

The same error shows up in other operations as well:

{"action":"lsm_compaction","class":"PaperMetadata","error":"write keys: write individual node (left key): write /var/lib/weaviate/papermetadata/xVwwxf162NWy/lsm/property_ss_sha/segment-1706656245063195111.db.tmp: no space left on device","index":"papermetadata","level":"error","msg":"compaction failed","path":"/var/lib/weaviate/papermetadata/xVwwxf162NWy/lsm/property_ss_sha","shard":"xVwwxf162NWy","time":"2024-01-31T04:27:56Z"}
{"action":"lsm_compaction","class":"PaperMetadata","error":"write keys: write individual node (right key): write /var/lib/weaviate/papermetadata/xVwwxf162NWy/lsm/property_corpusid/segment-1706656245107345726.db.tmp: no space left on device","index":"papermetadata","level":"error","msg":"compaction failed","path":"/var/lib/weaviate/papermetadata/xVwwxf162NWy/lsm/property_corpusid","shard":"xVwwxf162NWy","time":"2024-01-31T04:27:56Z"}

But when I inspect the machines, I see plenty of free space in both RAM and disk:

[ssh ~]$ df -h
Filesystem        Size  Used Avail Use% Mounted on
devtmpfs          125G     0  125G   0% /dev
tmpfs             125G     0  125G   0% /dev/shm
tmpfs             125G  1.7M  125G   1% /run
tmpfs             125G     0  125G   0% /sys/fs/cgroup
/dev/nvme0n1p1    200G  3.2G  197G   2% /
/dev/nvme0n1p128   10M  3.8M  6.2M  38% /boot/efi
shm                64M     0   64M   0% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/db751f378089a381d0540223b9b05f47374e7d74d35d8a52525e1b787512a4d1/shm
shm                64M     0   64M   0% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/c2b52b3a7c2133a426f0c3103b456793ea59a8d83aa9790bfedae16a7fcb1888/shm
shm                64M     0   64M   0% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/c58032c298ad39a5849f68598f0eaeb79c20771de406bd97ec5ac9cfcac1f5aa/shm
shm                64M     0   64M   0% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/b4fbc5452e94c92b99e35846ca0c85e2d301490c151afd667c5ffd3378e38b9d/shm
shm                64M     0   64M   0% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/a3dd76a2f202836fd526bbe620492d6ae76e9bcde8b4b7bc6c6f5711161a3b95/shm
tmpfs              25G     0   25G   0% /run/user/1000
[ssh ~]$ df -hi /data
df: ‘/data’: No such file or directory
[ssh ~]$ df -hi
Filesystem       Inodes IUsed IFree IUse% Mounted on
devtmpfs            32M   306   32M    1% /dev
tmpfs               32M     1   32M    1% /dev/shm
tmpfs               32M   854   32M    1% /run
tmpfs               32M    16   32M    1% /sys/fs/cgroup
/dev/nvme0n1p1     100M   64K  100M    1% /
/dev/nvme0n1p128      0     0     0     - /boot/efi
shm                 32M     1   32M    1% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/db751f378089a381d0540223b9b05f47374e7d74d35d8a52525e1b787512a4d1/shm
shm                 32M     1   32M    1% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/c2b52b3a7c2133a426f0c3103b456793ea59a8d83aa9790bfedae16a7fcb1888/shm
shm                 32M     1   32M    1% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/c58032c298ad39a5849f68598f0eaeb79c20771de406bd97ec5ac9cfcac1f5aa/shm
shm                 32M     1   32M    1% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/b4fbc5452e94c92b99e35846ca0c85e2d301490c151afd667c5ffd3378e38b9d/shm
shm                 32M     1   32M    1% /run/containerd/io.containerd.grpc.v1.cri/sandboxes/a3dd76a2f202836fd526bbe620492d6ae76e9bcde8b4b7bc6c6f5711161a3b95/shm
tmpfs               32M     1   32M    1% /run/user/1000

On the RAM side, I'm having no issues either.

Any idea what is causing this error?

Hi @Lakshya_Bakshi

This is indeed strange. How are you running your server? Directly from the binary?

One thing I can think of is a misconfigured DISK_USE_READONLY_PERCENTAGE, but then you would probably see other warnings in the logs as well.
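If it helps, you can check what the running pod actually has set for the disk-use variables. This is just a sketch; the pod name and namespace below are placeholders and assume a kubectl-based setup like yours:

# Check the disk-use env vars inside the running Weaviate pod
# (pod name and namespace are placeholders -- adjust to your deployment)
$ kubectl exec weaviate-0 -n weaviate -- env | grep DISK_USE

If I remember correctly, the defaults are 80 for DISK_USE_WARNING_PERCENTAGE and 90 for DISK_USE_READONLY_PERCENTAGE, but please double-check against the docs for your version.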

Also, what version are you running?

Hey @DudaNogueira

I am running it in a Docker container (deployed to my EKS cluster via the Helm chart). I previously ran it with mounted volumes that were definitely too small and have since upgraded them to 400 GB volumes, to no avail. Could the Docker container be limiting how much disk or RAM it perceives? I had removed all hardware-constraint specifications from the Weaviate containers.
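For what it's worth, this is roughly how I've been verifying the volume size and the absence of resource limits (pod and namespace names are placeholders for my actual resources):

# Confirm the PersistentVolumeClaim really is 400 GB
$ kubectl get pvc -n weaviate
# Confirm no CPU/memory limits are set on the Weaviate container
$ kubectl describe pod weaviate-0 -n weaviate | grep -A 5 -i limits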

I have been testing against both 1.23.4 and 1.22.3 (I need the latter because my backup requires that version, but once the backup completes I'm only able to add ~1 million more records before running into the issue).

@Lakshya_Bakshi

If you get a shell inside that Weaviate container, what does df report from there?
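Something along these lines should work, assuming a standard Helm install where the pod is weaviate-0 in the weaviate namespace (adjust the names to your setup) and /var/lib/weaviate from your logs is the data mount:

# Check the filesystem backing the Weaviate data path from inside the pod
$ kubectl exec -it weaviate-0 -n weaviate -- df -h /var/lib/weaviate
# And the inodes, just in case
$ kubectl exec -it weaviate-0 -n weaviate -- df -hi /var/lib/weaviate

The interesting part is whether the filesystem mounted at /var/lib/weaviate inside the container is your 400 GB volume or something much smaller.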