Hybrid search fails after adjusting vector_cache_max_objects and ef

Description

Why is the context canceled?

The collection contains about 300,000 records with 1536-dimensional vectors. vector_cache_max_objects is set to 20,000, the memory limit is fixed at 4 GB, ef_construction is 128, and ef is left at -1.
It took me a full day to delete the original collection, recreate it, and re-ingest all the data. Today, however, when I tried a hybrid search (passing the query vector directly in the request), it timed out with an error.
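For context, a back-of-envelope check (assuming float32 vectors and ignoring per-object overhead, so a rough lower bound) shows why a 20,000-object cache is small for this collection: only about 7% of vectors fit in the cache, so most HNSW hops during a search have to read vectors from disk.

```python
# Rough memory math for the HNSW vector cache.
# Assumptions: float32 vectors (4 bytes/dim), cache holds raw vectors only.
records = 300_000
dims = 1536
bytes_per_float = 4

full_cache_bytes = records * dims * bytes_per_float
print(f"all vectors: {full_cache_bytes / 1024**3:.2f} GiB")  # 1.72 GiB

cached = 20_000  # vectorCacheMaxObjects
coverage = cached / records
print(f"cache coverage: {coverage:.1%}")  # 6.7%
```

So caching all ~300k vectors would cost under 2 GiB of the 4 GB limit, which is why raising vectorCacheMaxObjects is a reasonable experiment here.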

Error

{"action":"hybrid","build_git_commit":"6f11fca","build_go_version":"go1.24.5","build_image_tag":"v1.30.13","build_wv_version":"1.30.13","error":"explorer: get class: vector search: object vector search at index ptai_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236: shard ptai_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236_baXDRad9J6K3: vector search: knn search: search layer at level 0: context canceled","level":"error","msg":"hybrid failed","time":"2025-12-08T05:42:07Z"}

I’ve tested it more than 10 times, and every attempt failed with the same error. I also checked resource usage: memory consumption is well under 3 GB, and CPU usage is only around 10%.

I tested the same setup on smaller collections, and everything works fine. The issue only occurs with this particular (larger) collection.

Collection Schema

{
  "class": "PtAi_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236",
  "description": "Class for PtAi_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236",
  "invertedIndexConfig": {
    "bm25": {
      "b": 0.75,
      "k1": 1.2
    },
    "cleanupIntervalSeconds": 60,
    "stopwords": {
      "additions": null,
      "preset": "en",
      "removals": null
    },
    "usingBlockMaxWAND": true
  },
  "multiTenancyConfig": {
    "autoTenantActivation": false,
    "autoTenantCreation": false,
    "enabled": false
  },
  "properties": [
    {
       ...
    }
  ],
  "replicationConfig": {
    "asyncEnabled": false,
    "deletionStrategy": "NoAutomatedResolution",
    "factor": 1
  },
  "shardingConfig": {
    "actualCount": 1,
    "actualVirtualCount": 128,
    "desiredCount": 1,
    "desiredVirtualCount": 128,
    "function": "murmur3",
    "key": "_id",
    "strategy": "hash",
    "virtualPerPhysical": 128
  },
  "vectorIndexConfig": {
    "bq": {
      "enabled": false
    },
    "cleanupIntervalSeconds": 300,
    "distance": "cosine",
    "dynamicEfFactor": 8,
    "dynamicEfMax": 500,
    "dynamicEfMin": 100,
    "ef": -1,
    "efConstruction": 128,
    "filterStrategy": "sweeping",
    "flatSearchCutoff": 40000,
    "maxConnections": 32,
    "multivector": {
      "aggregation": "maxSim",
      "enabled": false
    },
    "pq": {
      "bitCompression": false,
      "centroids": 256,
      "enabled": false,
      "encoder": {
        "distribution": "log-normal",
        "type": "kmeans"
      },
      "segments": 0,
      "trainingLimit": 100000
    },
    "skip": false,
    "sq": {
      "enabled": false,
      "rescoreLimit": 20,
      "trainingLimit": 100000
    },
    "vectorCacheMaxObjects": 20000
  },
  "vectorIndexType": "hnsw",
  "vectorizer": "none"
}
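Since ef is -1 in this schema, Weaviate derives the search-time ef dynamically from the query limit. My understanding of the documented behavior (worth verifying against the docs for your version) is that ef is limit × dynamicEfFactor, clamped between dynamicEfMin and dynamicEfMax:

```python
def dynamic_ef(limit: int, factor: int = 8,
               ef_min: int = 100, ef_max: int = 500) -> int:
    """Sketch of Weaviate's dynamic ef when ef == -1 (assumed behavior).

    Defaults mirror this schema's dynamicEfFactor/dynamicEfMin/dynamicEfMax.
    """
    return min(max(limit * factor, ef_min), ef_max)

print(dynamic_ef(10))   # 100 (10*8=80, raised to dynamicEfMin)
print(dynamic_ef(100))  # 500 (100*8=800, capped at dynamicEfMax)
```

In other words, with these settings the ef=-1 choice is unlikely to be the bottleneck by itself; it stays between 100 and 500 regardless of the query limit.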

My Question
Why is the query taking so long? Could it be related to my configuration?

Server Setup Information

  • Weaviate Server Version: 1.30.13
  • Deployment Method: k8s
  • Multi Node? Number of Running Nodes: Single
  • Client Language and Version: Python 3.11.2
  • Multitenancy?: No

Any additional Information

Good morning,

I’ve come across this error before and strongly believe it’s related to a bug, so I recommend starting with an upgrade. As of today, our stable releases are 1.32.21, 1.33.9, and 1.34.4. Please upgrade to 1.33 or 1.34; it’s best not to stay behind on older server versions, and your Python client is outdated as well.

Your records aren’t particularly large, so the operations you’re running shouldn’t take long. However, you have limited memory and CPU resources, and you’re running on a single node.

To put it simply, deleting data leaves tombstones behind and pushes memory usage up, which can create a bottleneck and cause noticeable slowdowns.

My advice is to:

  1. Upgrade your DB & Client
  2. Add a bit more resources if possible, then retry.

I expect the error will disappear after the upgrades unless there’s something corrupt in the schema or shards. Additionally, performance should improve.

Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00/+01:00)

Thanks! I just noticed the shard status is INDEXING. Could that be one of the reasons it’s so slow?
Also, I have upgraded to 1.33.9, but the shard has been stuck in INDEXING for a while, and I’m not sure how much longer it will take.

I checked the node status, and the collection’s vectorQueueLength has not changed.
{"asyncReplicationStatus":,"class":"PtAi_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236","compressed":false,"loaded":true,"name":"baXDRad9J6K3","numberOfReplicas":1,"objectCount":294368,"replicationFactor":1,"vectorIndexingStatus":"INDEXING","vectorQueueLength":139230}
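From those two counters, nearly half the objects were still waiting to be vector-indexed at that point:

```python
# Counters taken from the node status output above.
node_status = {"objectCount": 294368, "vectorQueueLength": 139230}

pending = node_status["vectorQueueLength"] / node_status["objectCount"]
print(f"{pending:.0%} of objects still await vector indexing")  # 47%
```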

Absolutely, and that’s great to note! Indeed, that is the reason: the data is not fully indexed yet. How long it takes depends on your resources and current load.

What you can do is to run GET /v1/nodes?output=verbose (or use the Python client client.cluster.nodes(output="verbose")) and watch vectorQueueLength per shard… if it’s not decreasing over time, indexing may be stalled and you may need to investigate resource constraints (CPU, disk I/O).
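A minimal sketch of such a watcher (the fetch_len callable here is a placeholder; in practice it would wrap client.cluster.nodes(output="verbose") and read vectorQueueLength from the relevant shard entry):

```python
import time
from typing import Callable


def queue_is_stalled(fetch_len: Callable[[], int],
                     polls: int = 3, interval_s: float = 0.0) -> bool:
    """Poll the vector queue length a few times; True if it never drops.

    fetch_len: any callable returning the current vectorQueueLength,
    e.g. a wrapper around the verbose nodes endpoint.
    """
    lengths = [fetch_len()]
    for _ in range(polls - 1):
        time.sleep(interval_s)
        lengths.append(fetch_len())
    # Stalled means no reading ever went below the first one.
    return all(l >= lengths[0] for l in lengths[1:])


# Simulated readings: a healthy queue drains over time, a stalled one doesn't.
healthy = iter([139230, 138000, 136500])
print(queue_is_stalled(lambda: next(healthy)))  # False

stalled = iter([125635, 125635, 125635])
print(queue_is_stalled(lambda: next(stalled)))  # True
```

In real use you would set interval_s to minutes rather than zero, since async indexing progress is slow relative to a single poll.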

Have you noticed any errors in the logs?

Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00/+01:00)

No errors appear in the logs. The vectorQueueLength has decreased to 125,635 over the past 8 hours. CPU usage remains very low (under 10%), memory usage is around 70%, and disk read I/O is reaching 130 MiB/s, while write I/O is nearly zero.
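At that pace, the queue would take days to drain. A rough extrapolation from the numbers above (assuming the rate stays constant, which it may not):

```python
# Queue went from 139,230 to 125,635 over 8 hours.
drained = 139_230 - 125_635   # objects indexed in the window
hours = 8
rate = drained / hours        # objects per hour

eta_hours = 125_635 / rate
print(f"rate ~{rate:.0f}/h, ETA ~{eta_hours:.0f} h")  # ~1699/h, ~74 h
```

That is far slower than a 300k-object collection should take to index, which points at a bottleneck (here, most likely the disk reads) rather than normal progress.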

My environment variables:
CLUSTER_GOSSIP_BIND_PORT: 7000
CLUSTER_DATA_BIND_PORT: 7001
GOGC: 100
PROMETHEUS_MONITORING_ENABLED: true
GOMEMLIMIT: 2500MiB
QUERY_MAXIMUM_RESULTS: 100000
TRACK_VECTOR_DIMENSIONS: false
REINDEX_VECTOR_DIMENSIONS_AT_STARTUP: false
STANDALONE_MODE: true
AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: true
CLUSTER_HOSTNAME: 8b4f4cd0f4c5
PERSISTENCE_DATA_PATH: /var/lib/weaviate
BACKUP_FILESYSTEM_PATH: /var/lib/backup
ENABLE_MODULES: backup-filesystem
DISABLE_GRAPHQL: true
HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE: false
ASYNC_INDEXING: true
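As a side note, GOMEMLIMIT: 2500MiB is only about 61% of the 4 GB pod limit. A common rule of thumb (not an official Weaviate recommendation) is to set GOMEMLIMIT to roughly 90% of the container limit so the Go runtime can use most of the available memory before the GC gets aggressive. A small sketch of the arithmetic:

```python
def parse_mib(value: str) -> int:
    """Parse a GOMEMLIMIT-style 'NNNMiB' string into bytes (sketch)."""
    assert value.endswith("MiB"), "only MiB handled in this sketch"
    return int(value[:-3]) * 1024**2


gomemlimit = parse_mib("2500MiB")
container_limit = 4 * 1024**3  # 4 GiB pod limit

ratio = gomemlimit / container_limit
print(f"GOMEMLIMIT is {ratio:.0%} of the pod limit")  # 61%
```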

What I have tried:

I increased GOMEMLIMIT to 3200MiB and vectorCacheMaxObjects to 300000, then restarted Weaviate, but vectorQueueLength remains 125,635 after one hour.

The logs after restarting Weaviate:
{"build_git_commit":"4aacef3","build_go_version":"go1.24.11","build_image_tag":"v1.33.9","build_wv_version":"1.33.9","level":"info","msg":"Completed loading shard ptai_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236_baXDRad9J6K3 in 5.732693178s","time":"2025-12-09T00:32:20Z"}
{"action":"tombstone_cleanup_begin","build_git_commit":"4aacef3","build_go_version":"go1.24.11","build_image_tag":"v1.33.9","build_wv_version":"1.33.9","class":"PtAi_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236","level":"info","msg":"class PtAi_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236: shard baXDRad9J6K3: starting tombstone cleanup","shard":"baXDRad9J6K3","time":"2025-12-09T00:37:16Z","tombstones_in_cycle":79344,"tombstones_total":79344}
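The tombstone counters in that log line are notable: relative to the ~294k live objects, the shard carries a large backlog of deleted entries awaiting cleanup, which fits the earlier point that deletions can slow things down. A quick calculation:

```python
# Counters from the tombstone_cleanup_begin log line and node status.
tombstones = 79_344
live_objects = 294_368

ratio = tombstones / live_objects
print(f"tombstones are {ratio:.0%} of the live object count")  # 27%
```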

Anything else can I do to speed things up or resolve this?

The load of the disk seems too high

After increasing the memory to 6 GB and setting GOMEMLIMIT to 5000 MiB, read I/O started to decrease. However, CPU usage spiked to 100%—and even though indexing has finished, the CPU remains 100%.

The CPU usage didn’t drop even after I set LIMIT_RESOURCES = true and restarted the weaviate service. (It’s been going on for several hours now, and the CPU usage still hasn’t dropped.)

I enabled debug logging (LOG_LEVEL: debug), and the logs are:
{"action":"lsm_segment_group_get_individual_segment","build_git_commit":"4aacef3","build_go_version":"go1.24.11","build_image_tag":"v1.33.9","build_wv_version":"1.33.9","class":"PtAi_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236","duration":435049767,"error":"deleted: deletion time 2025-12-07 09:59:21.142 +0000 UTC","index":"ptai_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236","level":"debug","msg":"waited over 100ms to get result from individual segment","segment_pos":1,"shard":"baXDRad9J6K3","time":"2025-12-09T05:30:37Z"}
{"action":"lsm_segment_group_getbysecondary_individual_segment","build_git_commit":"4aacef3","build_go_version":"go1.24.11","build_image_tag":"v1.33.9","build_wv_version":"1.33.9","class":"PtAi_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236","duration":434685796,"error":null,"index":"ptai_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236","level":"debug","msg":"waited over 100ms to get result from individual segment","segment_pos":0,"shard":"baXDRad9J6K3","time":"2025-12-09T05:30:37Z"}
(dozens of near-identical "waited over 100ms to get result from individual segment" entries follow, with durations between roughly 290 ms and 435 ms; some on segment_pos 1 carry "deleted: deletion time ..." errors)
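The duration field in these entries appears to be a Go time.Duration serialized as an integer nanosecond count (an assumption worth confirming against the Weaviate source), which would make these segment waits a few hundred milliseconds each, well over the 100 ms threshold the message mentions:

```python
# Convert the raw "duration" field from the debug log to milliseconds,
# assuming it is nanoseconds (Go time.Duration's integer representation).
duration_ns = 435_049_767
ms = duration_ns / 1_000_000
print(f"{ms:.0f} ms")  # 435 ms
```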

Surprisingly, CPU usage finally normalized, but I have no idea why it suddenly dropped to under 10% after roughly 8 hours. The dips in the middle of the graph were caused by me restarting Weaviate.

That said, even though things are working normally now, I’m still worried that whatever caused this could happen again. I’ll open a separate issue to track this further.

By the way, the logs don’t look much different from before:
{"action":"lsm_segment_group_getbysecondary_individual_segment","build_git_commit":"4aacef3","build_go_version":"go1.24.11","build_image_tag":"v1.33.9","build_wv_version":"1.33.9","class":"PtAi_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236","duration":944818244,"error":null,"index":"ptai_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236","level":"debug","msg":"waited over 100ms to get result from individual segment","segment_pos":0,"shard":"baXDRad9J6K3","time":"2025-12-09T10:11:49Z"}
{"action":"lsm_segment_group_get_individual_segment","build_git_commit":"4aacef3","build_go_version":"go1.24.11","build_image_tag":"v1.33.9","build_wv_version":"1.33.9","class":"PtAi_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236","duration":944901677,"error":"deleted: deletion time 2025-12-07 12:03:25.121 +0000 UTC","index":"ptai_rag_39fa06cf_4178_e8bc_8172_9869630559c1_knowledge_68d5216dcff624278e17e236","level":"debug","msg":"waited over 100ms to get result from individual segment","segment_pos":1,"shard":"baXDRad9J6K3","time":"2025-12-09T10:11:49Z"}
(several more similar entries follow, all with durations around 945 ms, on segment_pos 0 through 2)

Oh my, the same collection has gone back to INDEXING, and it is stalled again. This time memory is sufficient (I haven’t changed anything since last time), and the record count is 294,485.

Can you check throughput and IOPS as well, to see if anything looks odd there?

Also, can you increase vector_cache_max_objects and restart the cluster, then let me know?

Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00/+01:00)

The read I/O is ~128 MB/s, while write I/O is only a few KB/s, basically the same as before. That feels odd, because memory should be sufficient: the Weaviate pod has 6 GB total and GOMEMLIMIT is set to 5000 MiB.
vector_cache_max_objects is 3,000,000, which I’d expect to be more than enough. I’ve also restarted the Weaviate service several times, but no luck.

One last thing to try, please: disable lazy shard loading (ENABLE_LAZY_SHARD_LOADING=false) and set ASYNC_INDEXING=false, then restart the DB and retry?