Description
Today we migrated from Weaviate 1.24.14 to 1.25.1.
We’ve got PQ turned on.
After a painful startup (PQ consumes a lot of resources on every restart, which affects pod discovery, pings, and replication overall), we noticed that weaviate-0 has problems establishing replication with weaviate-1 and weaviate-2. Only weaviate-0 panics; the logs are below.
Server Setup Information
- Weaviate Server Version: 1.25.1
- Deployment Method: k8s
- Multi Node? Number of Running Nodes: 3 pods x 2 shards
- Client Language and Version:
Any additional Information
{"action":"raft","fields.time":500564529,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-1","time":"2024-05-27T11:29:49Z"}
{"action":"raft","fields.time":500527033,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-2","time":"2024-05-27T11:30:05Z"}
{"level":"error","msg":" memberlist: Failed fallback TCP ping: timeout 1s: read tcp 10.9.20.27:60186-\u003e10.9.23.39:7000: i/o timeout","time":"2024-05-27T11:30:05Z"}
{"level":"info","msg":" memberlist: Suspect weaviate-2 has failed, no acks received","time":"2024-05-27T11:30:05Z"}
{"action":"raft","fields.time":960353269,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-2","time":"2024-05-27T11:30:05Z"}
{"action":"raft","fields.time":500028474,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-2","time":"2024-05-27T11:31:23Z"}
{"action":"raft","fields.time":999288101,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-2","time":"2024-05-27T11:31:23Z"}
{"action":"raft","fields.time":500106474,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-1","time":"2024-05-27T11:31:23Z"}
{"action":"raft","fields.time":500047267,"level":"warning","msg":"raft failed to contact","server-id":"weaviate-1","time":"2024-05-27T11:31:41Z"}
{"class":"AISkillV2","level":"error","msg":"[{ broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot 
reach enough replicas} { broadcast: cannot reach enough replicas} { broadcast: cannot reach enough replicas}]","op":"put.deletes","shard":"1efo8r43N3tC","time":"2024-05-27T11:45:02Z"}
{"level":"error","msg":"Recovered from panic: runtime error: index out of range [0] with length 0, local variables [[]], additional localVars []\n","panic":"runtime error: index out of range [0] with length 0","time":"2024-05-27T11:50:34Z"}
goroutine 966898 [running]:
runtime/debug.Stack()
/usr/local/go/src/runtime/debug/stack.go:24 +0x5e
runtime/debug.PrintStack()
/usr/local/go/src/runtime/debug/stack.go:16 +0x13
github.com/weaviate/weaviate/entities/errors.(*ErrorGroupWrapper).setDeferFunc.func1({0xc002d8bd00, 0x1, 0x1})
/go/src/github.com/weaviate/weaviate/entities/errors/error_group_wrapper.go:74 +0x145
panic({0x1b6af00?, 0xc8d0026918?})
/usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/weaviate/weaviate/adapters/repos/db/priorityqueue.(*Queue[...]).Top(...)
/go/src/github.com/weaviate/weaviate/adapters/repos/db/priorityqueue/queue.go:63
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*neighborFinderConnector).processRecursively(0xc005dc7c50, 0x1b02fea, 0xc002903080, {{0xcd763b6000, 0x2951c43, 0x2951c43}}, 0x0, 0xffffffffffffffef)
/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/neighbor_connections.go:148 +0x685
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*neighborFinderConnector).doAtLevel(0xc005dc7c50, 0x0)
/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/neighbor_connections.go:207 +0xb4d
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*neighborFinderConnector).Do(0xc005dc7c50)
/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/neighbor_connections.go:81 +0x47
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*hnsw).reconnectNeighboursOf(0xc004503860?, 0x0?, 0x0?, {0x0?, 0x20e5258?, 0xc2d07e3e90?}, {0x20e5258?, 0xc2d07e3e90?}, 0xc39d991809?, 0x3, ...)
/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/neighbor_connections.go:43 +0xcb
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*hnsw).reassignNeighbor(0xc002234000, 0x1fe0e19, {0x20f9c40, 0xca70b10330}, 0xe92d85?)
/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/delete.go:463 +0x805
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*hnsw).reassignNeighborsOf.func1()
/go/src/github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw/delete.go:339 +0x199
github.com/weaviate/weaviate/adapters/repos/db/vector/hnsw.(*hnsw).reassignNeighborsOf.(*ErrorGroupWrapper).Go.func2()
/go/src/github.com/weaviate/weaviate/entities/errors/error_group_wrapper.go:88 +0x97
golang.org/x/sync/errgroup.(*Group).Go.func1()
/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 965154
/go/pkg/mod/golang.org/x/sync@v0.6.0/errgroup/errgroup.go:75 +0x96
{"action":"hnsw_tombstone_cleanup","error":"panic occurred: runtime error: index out of range [0] with length 0","level":"error","msg":"tombstone cleanup errord","time":"2024-05-27T11:50:34Z"}
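For anyone reading the trace: the panic comes from `priorityqueue.(*Queue[...]).Top`, called from `processRecursively` during tombstone cleanup, and the message says index 0 was read from something of length 0. Below is a minimal sketch of that failure mode. It is not Weaviate's code: `Item`, `Queue`, `TopOK`, and `topPanicMessage` are hypothetical names made up for illustration, and the sketch only assumes the queue is slice-backed and that `Top` reads the first element without checking the length.

```go
package main

import "fmt"

// Item is a hypothetical queue entry (not Weaviate's actual type).
type Item struct {
	ID   uint64
	Dist float32
}

// Queue is a hypothetical slice-backed queue that keeps the
// smallest Dist at index 0.
type Queue struct {
	items []Item
}

// Len returns the number of queued items.
func (q *Queue) Len() int { return len(q.items) }

// Insert appends the item and moves it left until the slice is
// sorted by ascending Dist again (simple insertion, for brevity).
func (q *Queue) Insert(it Item) {
	i := len(q.items)
	q.items = append(q.items, it)
	for i > 0 && q.items[i-1].Dist > q.items[i].Dist {
		q.items[i-1], q.items[i] = q.items[i], q.items[i-1]
		i--
	}
}

// Top reads index 0 unconditionally, so it panics on an empty queue.
func (q *Queue) Top() Item { return q.items[0] }

// TopOK is a guarded variant: it reports false instead of panicking
// when the queue is empty.
func (q *Queue) TopOK() (Item, bool) {
	if len(q.items) == 0 {
		return Item{}, false
	}
	return q.items[0], true
}

// topPanicMessage calls Top and returns the recovered panic message,
// or an empty string if Top did not panic.
func topPanicMessage(q *Queue) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	q.Top()
	return ""
}

func main() {
	empty := &Queue{}

	// Unguarded access on an empty queue panics; the message is recovered here.
	fmt.Println("unguarded:", topPanicMessage(empty))

	// Guarded access lets the caller skip the node instead of crashing.
	if _, ok := empty.TopOK(); !ok {
		fmt.Println("guarded: queue is empty, nothing to reconnect")
	}
}
```

If the real `Top` behaves like the unguarded version, the panic would suggest that the neighbor search for a tombstoned node returned no candidates on weaviate-0. That is only a guess from the trace, not a confirmed root cause.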