Hello, I started using your database. At first everything was fine, but after a long period of writing vectors (I inserted 1,730,168 vectors; the estimated size of the LSM stores on disk is 333 GB), it started producing an error in the logs:
{"action":"cyclemanager","build_git_commit":"","build_go_version":"go1.22.0","build_image_tag":"","build_wv_version":"1.28.2","callback_id":"segmentgroup/compaction//home/user/rdata/weaviate/tksad/K5j5EQ9XTNCU/lsm/objects","callbacks_id":"store/compaction/..","class":"Tksad","index":"tksad","level":"error","msg":"callback panic: runtime error: makeslice: len out of range","shard":"K5j5EQ9XTNCU","time":"2025-01-19T07:56:44Z"}
I thought that after a reboot the database would be cured and the error would disappear, but that did not happen.
Server Setup Information
I use a multi-node configuration with 2 nodes, each running on its own physical server.
Configuration of each server:
128 GB DDR4 RAM
Xeon E5-2678 v3
4 TB NVMe SSD in RAID 0
There were no such errors on the second node.
I launch it manually from the binary; here is an example of how I launch it:
Hello @ilsg, the error log seems to indicate there is a corrupted .db file at /home/user/rdata/weaviate/tksad/K5j5EQ9XTNCU/lsm/objects. If such a .db file does not have an associated .wal file, it may indicate the file got corrupted after being successfully written to disk.
There may not be a way to recover such a file in an isolated manner, but you can restore from a backup or move the .db file out of that path; in a multi-node setup the data will then be replicated back automatically as it is queried (there is a replication mechanism called read-repair).
There is already an ongoing effort to add integrity checking for .db files, so this situation can be detected automatically and handled in a better way.
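The heuristic above can be sketched as a small script. This is only an illustration built on assumptions from this thread (the `lsm/objects` directory layout and the `.db`/`.wal` extensions are taken from the log path; no official Weaviate tool is implied):

```python
import os


def db_files_without_wal(lsm_dir):
    """List .db segment files in lsm_dir that lack a same-named .wal companion.

    Mirrors the heuristic described above: a .db file without an associated
    .wal file *may* have been corrupted after being flushed to disk.
    This is only a hint, not a definitive integrity check.
    """
    entries = os.listdir(lsm_dir)
    wal_stems = {os.path.splitext(f)[0] for f in entries if f.endswith(".wal")}
    return sorted(
        f
        for f in entries
        if f.endswith(".db") and os.path.splitext(f)[0] not in wal_stems
    )
```

For example, running it against `/home/user/rdata/weaviate/tksad/K5j5EQ9XTNCU/lsm/objects` would print candidate segments to inspect first.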
And how will this file corruption affect the operation of the database itself?
Because right now the database works normally: it writes data and performs searches.
Maybe there is a way to delete this damaged file?
Unfortunately, I did not have a replica of this collection in the multi-node configuration to restore it from.
If the collection was created with a replication factor greater than one, the same data is stored on other nodes; in that case, removing the corrupted .db files may be the simplest solution. If a backup is not available and the replication factor is one, the data stored in those files won't be recoverable and re-ingestion will be required.
Currently, a corrupted file can generate the kind of issues you are seeing, probably preventing the affected operations from succeeding, so I'd recommend removing the file. The filename is not shown in that log line, but this situation will be handled in a better manner once integrity checking for this type of file is completed.
Note: if possible, it would be better to use a three-node setup, as it makes it possible to continue normal operations if a node goes down.
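Rather than deleting the corrupted segment outright, a cautious approach is to move it out of the shard directory so it can be put back if removing it turns out to be wrong. A minimal sketch, assuming the node is stopped first and that sibling files sharing the segment's stem should travel with it (an assumption about the on-disk layout, not documented behavior):

```python
import os
import shutil


def quarantine_segment(db_path, quarantine_dir):
    """Move a suspect .db segment, plus any sibling files sharing its stem,
    out of the shard directory into quarantine_dir.

    Run this only while the Weaviate node is offline. Moving (instead of
    deleting) keeps the option of restoring the files later.
    """
    os.makedirs(quarantine_dir, exist_ok=True)
    seg_dir = os.path.dirname(db_path)
    stem = os.path.splitext(os.path.basename(db_path))[0]
    moved = []
    for name in os.listdir(seg_dir):
        if os.path.splitext(name)[0] == stem:
            shutil.move(os.path.join(seg_dir, name),
                        os.path.join(quarantine_dir, name))
            moved.append(name)
    return sorted(moved)
```

After restarting the node, in a replicated setup read-repair should fill the gap back in as the data is queried; with replication factor one, the objects stored in the quarantined segment stay unavailable.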
Am I right in understanding that right now there is no way to find out which file is damaged?
Also, do I understand correctly that the newly inserted data will work correctly in the database?
Newly inserted data will work correctly. Currently you can identify the corrupted .db file from certain error log lines; the compaction log line does not include it, but if you perform a search, e.g. for a non-existent object UUID, the database may attempt to read from all the .db files, and the error may then appear with the filename included.
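One way to trigger such a lookup is Weaviate's REST endpoint `GET /v1/objects/{className}/{id}` with a freshly generated UUID, which almost certainly does not exist. A hedged sketch that only builds the curl command (the host/port `http://localhost:8080` and the class name `Tksad` are assumptions about this particular deployment):

```python
import uuid


def probe_request(base_url, class_name):
    """Build a curl command fetching a random (almost certainly
    non-existent) object UUID from a class.

    Running the command against the affected node may force reads across
    the shard's .db files; the resulting error log line may then include
    the corrupted segment's filename.
    """
    missing_id = str(uuid.uuid4())
    return f"curl -s {base_url}/v1/objects/{class_name}/{missing_id}"


cmd = probe_request("http://localhost:8080", "Tksad")
```

Run the printed command on the node showing the compaction panic, then check that node's logs for an error line mentioning a .db path.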