Panic after migration to 1.25.1

Description

Today we migrated Weaviate from 1.24.14 to 1.25.1.

We’ve got PQ turned on.

After a painful startup (PQ consumes a lot of resources on each restart, which affects pod discovery, pings, and replication overall), we noticed that weaviate-0 has problems establishing replication with weaviate-1 and weaviate-2. Only weaviate-0 panics; the logs are below.

Server Setup Information

  • Weaviate Server Version: 1.25.1
  • Deployment Method: k8s
  • Multi Node? Number of Running Nodes: 3 pods x 2 shards
  • Client Language and Version:

Any additional Information

{"action":"raft","fields.time":500564529,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-1","time":"2024-05-27T11:29:49Z"}
{"action":"raft","fields.time":500527033,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-2","time":"2024-05-27T11:30:05Z"}
{"level":"error","msg":" memberlist: Failed fallback TCP ping: timeout 1s: read tcp 10.9.20.27:60186-\u003e10.9.23.39:7000: i/o timeout","time":"2024-05-27T11:30:05Z"}
{"level":"info","msg":" memberlist: Suspect weaviate-2 has failed, no acks received","time":"2024-05-27T11:30:05Z"}
{"action":"raft","fields.time":960353269,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-2","time":"2024-05-27T11:30:05Z"}
{"action":"raft","fields.time":500028474,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-2","time":"2024-05-27T11:31:23Z"}
{"action":"raft","fields.time":999288101,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-2","time":"2024-05-27T11:31:23Z"}
{"action":"raft","fields.time":500106474,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-1","time":"2024-05-27T11:31:23Z"}
{"action":"raft","fields.time":500047267,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-1","time":"2024-05-27T11:31:41Z"}
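
To see at a glance which peers the node fails to reach, the raft warnings above can be tallied per server-id. The following is a minimal Go sketch, not part of Weaviate: the logLine type and the countRaftFailures function are hypothetical names, and it assumes one JSON object per log line with the action, msg, and server-id keys shown above.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// logLine holds only the fields used here; the JSON keys match the logs above.
type logLine struct {
	Action   string `json:"action"`
	Msg      string `json:"msg"`
	ServerID string `json:"server-id"`
}

// countRaftFailures returns, per server ID, the number of
// "raft failed to contact" warnings. Lines that are not valid JSON
// (for example stack trace lines) are skipped.
func countRaftFailures(logs string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(logs))
	// Allow long log lines (up to 1 MiB) instead of the 64 KiB default.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		var l logLine
		if err := json.Unmarshal(sc.Bytes(), &l); err != nil {
			continue
		}
		if l.Action == "raft" && l.Msg == "raft failed to contact" {
			counts[l.ServerID]++
		}
	}
	return counts
}

func main() {
	// Two sample lines copied from the logs above.
	logs := `{"action":"raft","fields.time":500564529,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-1","time":"2024-05-27T11:29:49Z"}
{"action":"raft","fields.time":500527033,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-2","time":"2024-05-27T11:30:05Z"}`
	fmt.Println(countRaftFailures(logs))
}
```

Feeding the full pod log through the same function would show whether the failures are spread evenly across both peers or concentrated on one of them.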

{"class":"AISkillV2","level":"error","msg":"[{ broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot 
reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas}]","op":"put.deletes","shard":"1efo8r43N3tC","time":"2024-05-27T11:45:02Z"}
{"level":"error","msg":"Recovered from panic: runtime error: index out of range [0] with length 0, local variables [[]], additional localVars []\n","panic":"runtime error: index out of range [0] with length 0","time":"2024-05-27T11:50:34Z"}
goroutine 966898 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x5e
runtime/debug.PrintStack()
	/usr/local/go/src/runtime/debug/stack.go:16 +0x13
github.com/weaviate/weaviate/entities/errors.(*ErrorGroupWrapper).setDeferFunc.func1({0xc002d8bd00, 0x1, 0x1})
	/go/src/github.com/weaviate/weaviate/entities/errors/error_group_wrapper.go:74 +0x145
panic({0x1b6af00?, 0xc8d0026918?})
	/usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/weaviate/weaviate/adapters/repos/db/priorityqueue.(*Queue[...]).Top(...)
	/go/src/github.com/weaviate/weaviate/adapters/repos/db/priorityqueue/queue.go:63
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*neighborFinderConnector).processRecursively(0xc005dc7c50, 0x1b02fea, 0xc002903080, {{0xcd763b6000, 0x2951c43, 0x2951c43}}, 0x0, 0xffffffffffffffef)
	/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/neighbor_connections.go:148 +0x685
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*neighborFinderConnector).doAtLevel(0xc005dc7c50, 0x0)
	/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/neighbor_connections.go:207 +0xb4d
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*neighborFinderConnector).Do(0xc005dc7c50)
	/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/neighbor_connections.go:81 +0x47
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*hnsw).reconnectNeighboursOf(0xc004503860?, 0x0?, 0x0?, {0x0?, 0x20e5258?, 0xc2d07e3e90?}, {0x20e5258?, 0xc2d07e3e90?}, 0xc39d991809?, 0x3, ...)
	/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/neighbor_connections.go:43 +0xcb
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*hnsw).reassignNeighbor(0xc002234000, 0x1fe0e19, {0x20f9c40, 0xca70b10330}, 0xe92d85?)
	/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/delete.go:463 +0x805
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*hnsw).reassignNeighborsOf.func1()
	/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/delete.go:339 +0x199
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*hnsw).reassignNeighborsOf.(*ErrorGroupWrapper).Go.func2()
	/go/src/github.com/weaviate/weaviate/entities/errors/error_group_wrapper.go:88 +0x97
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 965154
	/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:75 +0x96
{"action":"hnsw_tombstone_cleanup","error":"panic occurred: runtime error: index out of range [0] with length 0","level":"error","msg":"tombstone cleanup errord","time":"2024-05-27T11:50:34Z"}
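
The stack trace points at priorityqueue.(*Queue).Top being called on an empty queue during tombstone cleanup. The Go sketch below only illustrates that failure mode and a length guard; the queue and item types and the top, topOK, and tryTop functions are hypothetical and are not Weaviate's implementation or its fix.

```go
package main

import "fmt"

// item is a hypothetical queue entry: a node ID and its distance.
type item struct {
	id   uint64
	dist float32
}

// queue is a hypothetical slice-backed queue, not Weaviate's type.
type queue struct {
	items []item
}

// top is the unguarded pattern: on an empty queue it panics with
// "index out of range [0] with length 0".
func (q *queue) top() item {
	return q.items[0]
}

// topOK is a guarded variant that reports whether an element exists.
func (q *queue) topOK() (item, bool) {
	if len(q.items) == 0 {
		return item{}, false
	}
	return q.items[0], true
}

// tryTop calls top and converts a panic into an error, similar in spirit
// to the "Recovered from panic" handling visible in the logs.
func tryTop(q *queue) (it item, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic occurred: %v", r)
		}
	}()
	return q.top(), nil
}

func main() {
	empty := &queue{}

	// Unguarded access on an empty queue: the panic is recovered as an error.
	_, err := tryTop(empty)
	fmt.Println(err)

	// Guarded access: the caller can skip the node instead of panicking.
	if _, ok := empty.topOK(); !ok {
		fmt.Println("queue is empty, skipping")
	}
}
```

Whether the empty result set here is expected (for example, every candidate neighbor is itself tombstoned) or a sign of a damaged index is something only the Weaviate maintainers can confirm.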

Did you use the latest version of our Helm chart, v17.0.0, to perform the upgrade?

Please note that migrating to v1.25 from an older version requires extra steps. They are all described here.

Did you, for example, remove the StatefulSet prior to the upgrade to v1.25? This is a one-time operation, but it is crucial to a successful upgrade.

Did you use the latest version of our Helm chart, v17.0.0, to perform the upgrade?

Yes, and all of the replicas have the status HEALTHY, and the “synchronized” status is “true”.

Did you, for example, remove the StatefulSet prior to the upgrade to v1.25? This is a one-time operation, but it is crucial to a successful upgrade.

Yes, we removed the STS prior to the upgrade to 1.25. Moreover, the rest of our environments, which have a similar setup but much less data, passed the migration successfully without any errors.

After migrating to 1.25.2, we still see errors in the first replica (the other two are fine):

{"action":"tombstone_cleanup_begin","class":"AISkillV2","level":"info","msg":"class AISkillV2: shard 1efo8r43N3tC: starting tombstone cleanup","shard":"1efo8r43N3tC","time":"2024-06-03T14:57:36Z","tombstones_in_cycle":30211032,"tombstones_total":30211032}
{"level":"error","msg":"Recovered from panic: runtime error: index out of range [0] with length 0, local variables [[]], additional localVars []\n","panic":"runtime error: index out of range [0] with length 0","time":"2024-06-03T14:58:33Z"}
goroutine 1913883 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x5e
runtime/debug.PrintStack()
	/usr/local/go/src/runtime/debug/stack.go:16 +0x13
github.com/weaviate/weaviate/entities/errors.(*ErrorGroupWrapper).setDeferFunc.func1({0xc017eba330, 0x1, 0x1})
	/go/src/github.com/weaviate/weaviate/entities/errors/error_group_wrapper.go:74 +0x145
panic({0x1b73ac0?, 0xc7b9affbc0?})
	/usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/weaviate/weaviate/adapters/repos/db/priorityqueue.(*Queue[...]).Top(...)
	/go/src/github.com/weaviate/weaviate/adapters/repos/db/priorityqueue/queue.go:63
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*neighborFinderConnector).processRecursively(0xc10b11cb38, 0x1b02fea, 0xc0607ab160, {{0xcdb3904000, 0x2951c43, 0x2951c43}}, 0x0, 0xffffffffffffffef)
	/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/neighbor_connections.go:156 +0x6f7
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*neighborFinderConnector).doAtLevel(0xc00771db38, 0x0)
	/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/neighbor_connections.go:220 +0xb51
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*neighborFinderConnector).Do(0xc00771db38)
	/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/neighbor_connections.go:83 +0x47
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*hnsw).reconnectNeighboursOf(0x101010043472a?, 0x0?, 0x2?, {0x0?, 0xffffffffffffffff?, 0xffffffffffffffff?}, {0x20ef470?, 0xc0654d8cc0?}, 0x1?, 0x3, ...)
	/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/neighbor_connections.go:45 +0xcb
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*hnsw).reassignNeighbor(0xc01476d400, 0x1fe0e19, {0x2104ae0, 0xc19a3e8910}, 0xc040497538?)
	/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/delete.go:521 +0x47a
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*hnsw).reassignNeighborsOf.func1()
	/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/delete.go:421 +0x1a8
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*hnsw).reassignNeighborsOf.(*ErrorGroupWrapper).Go.func2()
	/go/src/github.com/weaviate/weaviate/entities/errors/error_group_wrapper.go:88 +0x97
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1913874
	/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:75 +0x96
{"action":"hnsw_tombstone_cleanup","error":"panic occurred: runtime error: index out of range [0] with length 0","level":"error","msg":"tombstone cleanup errord","time":"