Queue size doesn't go down when enabling async indexing

Description

After a while, typically two to three days, our queue size stops going down. Only after we restart all the nodes does the queue size start to decrease again. We can't figure out what is causing this issue.

Server Setup Information

We deploy Weaviate using the Weaviate Helm chart, with a cluster of three nodes/shards. Weaviate runs version 1.24.8 with async indexing enabled.

We have a few classes, some with a flat index with BQ enabled and some with an HNSW index with PQ enabled.
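
For context, the index setup looks roughly like this (a minimal sketch via the REST schema API; the class names and base URL are placeholders, not our real schema):

```python
import requests

WEAVIATE_URL = "http://localhost:8080"  # assumption: point this at your cluster

# Placeholder class with a flat vector index and binary quantization (BQ) enabled.
flat_class = {
    "class": "DocumentChunksFlat",
    "vectorIndexType": "flat",
    "vectorIndexConfig": {"bq": {"enabled": True}},
}

# Placeholder class with an HNSW vector index and product quantization (PQ) enabled.
hnsw_class = {
    "class": "DocumentChunksHnsw",
    "vectorIndexType": "hnsw",
    "vectorIndexConfig": {"pq": {"enabled": True}},
}

for class_def in (flat_class, hnsw_class):
    requests.post(f"{WEAVIATE_URL}/v1/schema", json=class_def, timeout=10).raise_for_status()
```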

Any additional Information

The logs don’t tell us much, but here are some of the messages:

{"action":"lsm_init_disk_segment_build_bloom_filter_primary","class":"Bp117_47ca231a_documentChunks__v1","index":"bp117_47ca231a_documentchunks__v1","level":"debug","msg":"building bloom filter took 2.598592ms\n","path":"/var/lib/weaviate/bp117_47ca231a_documentchunks__v1/BManr2wt8USp/lsm/property_content_searchable/segment-1712816098891895866.db","shard":"BManr2wt8USp","time":"2024-04-11T06:39:18Z","took":2598592}
{"action":"lsm_memtable_flush_complete","class":"Bp117_47ca231a_documentChunks__v1","index":"bp117_47ca231a_documentchunks__v1","level":"debug","msg":"flush and switch took 23.637717ms\n","path":"/var/lib/weaviate/bp117_47ca231a_documentchunks__v1/BManr2wt8USp/lsm/property_content_searchable","shard":"BManr2wt8USp","time":"2024-04-11T06:39:18Z","took":23637717}
{"action":"lsm_precompute_disk_segment_build_bloom_filter_primary","class":"Bp117_47ca231a_documentChunks__v1","index":"bp117_47ca231a_documentchunks__v1","level":"debug","msg":"building bloom filter took 3.368174ms\n","path":"/var/lib/weaviate/bp117_47ca231a_documentchunks__v1/20Q0qcsCETGR/lsm/objects/segment-1712814132644892921_1712816100089725629.db","shard":"20Q0qcsCETGR","time":"2024-04-11T06:39:29Z","took":3368174}
{"action":"lsm_precompute_disk_segment_build_bloom_filter_secondary","class":"Bp117_47ca231a_documentChunks__v1","index":"bp117_47ca231a_documentchunks__v1","level":"debug","msg":"building bloom filter took 3.22687ms\n","path":"/var/lib/weaviate/bp117_47ca231a_documentchunks__v1/20Q0qcsCETGR/lsm/objects/segment-1712814132644892921_1712816100089725629.db","secondary_index_position":0,"shard":"20Q0qcsCETGR","time":"2024-04-11T06:39:29Z","took":3226870}
{"action":"lsm_precompute_disk_segment_build_bloom_filter_primary","class":"Bp45_2d14fbda_documentChunks__v1","index":"bp45_2d14fbda_documentchunks__v1","level":"debug","msg":"building bloom filter took 3.035818ms\n","path":"/var/lib/weaviate/bp45_2d14fbda_documentchunks__v1/vSA9hYQUdjh8/lsm/property__id/segment-1712642602769990789_1712815595595119886.db","shard":"vSA9hYQUdjh8","time":"2024-04-11T08:07:42Z","took":3035818}
{"action":"lsm_precompute_disk_segment_build_bloom_filter_primary","class":"Bp45_2d14fbda_documentChunks__v1","index":"bp45_2d14fbda_documentchunks__v1","level":"debug","msg":"building bloom filter took 4.835406ms\n","path":"/var/lib/weaviate/bp45_2d14fbda_documentchunks__v1/vSA9hYQUdjh8/lsm/property__id/segment-1712192964717392978_1712815595595119886.db","shard":"vSA9hYQUdjh8","time":"2024-04-11T08:07:45Z","took":4835406}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-2 10.42.11.111:7000","time":"2024-04-11T08:24:46Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-1 10.42.7.246:7000","time":"2024-04-11T08:24:46Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.42.11.111:49312","time":"2024-04-11T08:24:46Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.42.7.246:41560","time":"2024-04-11T08:24:49Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-0 10.42.11.110:7000","time":"2024-04-11T08:24:49Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-2 10.42.11.111:7000","time":"2024-04-11T08:25:16Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.42.11.110:58486","time":"2024-04-11T08:25:16Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-1 10.42.7.246:7000","time":"2024-04-11T08:25:16Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.42.11.111:52868","time":"2024-04-11T08:25:16Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.42.7.246:44076","time":"2024-04-11T08:25:19Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-2 10.42.11.111:7000","time":"2024-04-11T08:25:19Z"}

Does anyone else experience this issue?

Hi @joris!

I have not seen this kind of issue.

Has it persisted, or did the queue eventually clear?

Hi @DudaNogueira,

My guess is that it’s more an issue with retrieving the queue size than with the actual queue itself.

We use the RESTful nodes/[schema] API endpoint to check the queue size of all shards (vectorQueueLength). After a while, the value of this property stops changing, but when we restart Weaviate, the queue size instantly drops to zero. So it looks like the endpoint is returning a queue size that isn’t the actual size.
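
For reference, this is roughly how we poll it (a minimal sketch against the generic /v1/nodes endpoint with verbose output; the base URL, timeout, and error handling are assumptions, so adjust them to your setup):

```python
import requests

WEAVIATE_URL = "http://localhost:8080"  # assumption: point this at your cluster


def vector_queue_lengths(base_url: str = WEAVIATE_URL) -> dict[str, int]:
    """Sum the reported async-indexing queue length (vectorQueueLength) per node."""
    resp = requests.get(f"{base_url}/v1/nodes", params={"output": "verbose"}, timeout=10)
    resp.raise_for_status()
    totals: dict[str, int] = {}
    for node in resp.json().get("nodes", []):
        totals[node["name"]] = sum(
            shard.get("vectorQueueLength", 0) for shard in node.get("shards", [])
        )
    return totals


if __name__ == "__main__":
    for node, queued in vector_queue_lengths().items():
        print(f"{node}: {queued} vectors queued")
```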

Maybe it has something to do with caching?