Queue size doesn't go down when enabling async indexing

Description

After a while, typically two to three days, our queue size stops going down. Only after we restart all the nodes does the queue size start to decrease again. We can't figure out what is causing this issue.

Server Setup Information

We deploy Weaviate using the Weaviate Helm chart, with a cluster of three nodes/shards. Weaviate runs version 1.24.8 with async indexing enabled.

We have a few classes, some with a flat index with BQ enabled and some with an HNSW index with PQ enabled.
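
For context, the index setup looks roughly like this (a minimal sketch via the REST schema API; the class names and base URL are placeholders, not our real schema):

```python
import requests

WEAVIATE_URL = "http://localhost:8080"  # assumption: point this at your cluster

# Placeholder class with a flat vector index and binary quantization (BQ) enabled.
flat_class = {
    "class": "DocumentChunksFlat",
    "vectorIndexType": "flat",
    "vectorIndexConfig": {"bq": {"enabled": True}},
}

# Placeholder class with an HNSW vector index and product quantization (PQ) enabled.
hnsw_class = {
    "class": "DocumentChunksHnsw",
    "vectorIndexType": "hnsw",
    "vectorIndexConfig": {"pq": {"enabled": True}},
}

for class_def in (flat_class, hnsw_class):
    requests.post(f"{WEAVIATE_URL}/v1/schema", json=class_def, timeout=10).raise_for_status()
```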

Any additional Information

The logs don’t tell us much, but here are some of the messages:

{"action":"lsm_init_disk_segment_build_bloom_filter_primary","class":"Bp117_47ca231a_documentChunks__v1","index":"bp117_47ca231a_documentchunks__v1","level":"debug","msg":"building bloom filter took 2.598592ms\n","path":"/var/lib/weaviate/bp117_47ca231a_documentchunks__v1/BManr2wt8USp/lsm/property_content_searchable/segment-1712816098891895866.db","shard":"BManr2wt8USp","time":"2024-04-11T06:39:18Z","took":2598592}
{"action":"lsm_memtable_flush_complete","class":"Bp117_47ca231a_documentChunks__v1","index":"bp117_47ca231a_documentchunks__v1","level":"debug","msg":"flush and switch took 23.637717ms\n","path":"/var/lib/weaviate/bp117_47ca231a_documentchunks__v1/BManr2wt8USp/lsm/property_content_searchable","shard":"BManr2wt8USp","time":"2024-04-11T06:39:18Z","took":23637717}
{"action":"lsm_precompute_disk_segment_build_bloom_filter_primary","class":"Bp117_47ca231a_documentChunks__v1","index":"bp117_47ca231a_documentchunks__v1","level":"debug","msg":"building bloom filter took 3.368174ms\n","path":"/var/lib/weaviate/bp117_47ca231a_documentchunks__v1/20Q0qcsCETGR/lsm/objects/segment-1712814132644892921_1712816100089725629.db","shard":"20Q0qcsCETGR","time":"2024-04-11T06:39:29Z","took":3368174}
{"action":"lsm_precompute_disk_segment_build_bloom_filter_secondary","class":"Bp117_47ca231a_documentChunks__v1","index":"bp117_47ca231a_documentchunks__v1","level":"debug","msg":"building bloom filter took 3.22687ms\n","path":"/var/lib/weaviate/bp117_47ca231a_documentchunks__v1/20Q0qcsCETGR/lsm/objects/segment-1712814132644892921_1712816100089725629.db","secondary_index_position":0,"shard":"20Q0qcsCETGR","time":"2024-04-11T06:39:29Z","took":3226870}
{"action":"lsm_precompute_disk_segment_build_bloom_filter_primary","class":"Bp45_2d14fbda_documentChunks__v1","index":"bp45_2d14fbda_documentchunks__v1","level":"debug","msg":"building bloom filter took 3.035818ms\n","path":"/var/lib/weaviate/bp45_2d14fbda_documentchunks__v1/vSA9hYQUdjh8/lsm/property__id/segment-1712642602769990789_1712815595595119886.db","shard":"vSA9hYQUdjh8","time":"2024-04-11T08:07:42Z","took":3035818}
{"action":"lsm_precompute_disk_segment_build_bloom_filter_primary","class":"Bp45_2d14fbda_documentChunks__v1","index":"bp45_2d14fbda_documentchunks__v1","level":"debug","msg":"building bloom filter took 4.835406ms\n","path":"/var/lib/weaviate/bp45_2d14fbda_documentchunks__v1/vSA9hYQUdjh8/lsm/property__id/segment-1712192964717392978_1712815595595119886.db","shard":"vSA9hYQUdjh8","time":"2024-04-11T08:07:45Z","took":4835406}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-2 10.42.11.111:7000","time":"2024-04-11T08:24:46Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-1 10.42.7.246:7000","time":"2024-04-11T08:24:46Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.42.11.111:49312","time":"2024-04-11T08:24:46Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.42.7.246:41560","time":"2024-04-11T08:24:49Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-0 10.42.11.110:7000","time":"2024-04-11T08:24:49Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-2 10.42.11.111:7000","time":"2024-04-11T08:25:16Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.42.11.110:58486","time":"2024-04-11T08:25:16Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-1 10.42.7.246:7000","time":"2024-04-11T08:25:16Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.42.11.111:52868","time":"2024-04-11T08:25:16Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.42.7.246:44076","time":"2024-04-11T08:25:19Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-2 10.42.11.111:7000","time":"2024-04-11T08:25:19Z"}

Does anyone else experience this issue?

Hi @joris!

I have not seen this kind of issue.

Has it persisted, or did the queue eventually clear?

Hi @DudaNogueira,

My guess is that it’s more an issue with retrieving the queue size than with the actual queue itself.

We use the RESTful nodes/[schema] API endpoint to check the queue size of all shards (vectorQueueLength). After a while, the value of this property stops changing, but when we restart Weaviate, the queue size instantly drops to zero. So it looks like the endpoint is returning a queue size that isn’t the actual size.
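
For reference, this is roughly how we poll it (a minimal sketch against the generic /v1/nodes endpoint with verbose output; the base URL, timeout, and error handling are assumptions, so adjust them to your setup):

```python
import requests

WEAVIATE_URL = "http://localhost:8080"  # assumption: point this at your cluster


def vector_queue_lengths(base_url: str = WEAVIATE_URL) -> dict[str, int]:
    """Sum the reported async-indexing queue length (vectorQueueLength) per node."""
    resp = requests.get(f"{base_url}/v1/nodes", params={"output": "verbose"}, timeout=10)
    resp.raise_for_status()
    totals: dict[str, int] = {}
    for node in resp.json().get("nodes", []):
        totals[node["name"]] = sum(
            shard.get("vectorQueueLength", 0) for shard in node.get("shards", [])
        )
    return totals


if __name__ == "__main__":
    for node, queued in vector_queue_lengths().items():
        print(f"{node}: {queued} vectors queued")
```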

Maybe it has something to do with caching?