Increase in CPU after Query, Eventually Crashing

Hi Team

Ran into a very strange issue where the Weaviate CPU kept increasing even though there was no indexing and very few queries running.

After exhaustive debugging, I found a specific query causing this. Even stranger, the query times out with a gRPC deadline exceeded error, but the CPU keeps increasing.

Any help is appreciated, as it's causing issues in our production.

Upgraded Weaviate to 1.32.15 and it's still reproducible.
9:15: New index created.
9:35: Searched with the query and CPU increased.

The same query with a single filter works, but both filters together cause the timeout.

Query: Hybrid Search

        response = collection.query.hybrid(
            query=query,
            query_properties=['product_id', 'tokens^3', 'brands^2'],
            vector=dense_vector,
            target_vector='default',
            alpha=0.5,
            filters=Filter.all_of([
                Filter.by_property("brand").contains_any(brands),
                Filter.by_property("seller").contains_any(sellers),
            ]),
            bm25_operator=BM25OperatorOr(minimum_should_match=2),
            return_properties=['productid', 'Tag', 'Category'],
            return_metadata=MetadataQuery(score=True, explain_score=True),
            max_vector_distance=0.8,
            fusion_type=HybridFusion.RANKED,
            limit=24,
        )
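
For reference, the filter and query helpers used above come from the v4 Python client's query classes (the import for BM25OperatorOr is omitted here):

from weaviate.classes.query import Filter, MetadataQuery, HybridFusion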
     

import weaviate
from weaviate.classes.config import Configure, VectorDistances, VectorFilterStrategy, ReplicationDeletionStrategy
client = weaviate.connect_to_local()
client.collections.create(
    "Test",
    vectorizer_config=[
        Configure.NamedVectors.none(
            name="default",
            vector_index_config=Configure.VectorIndex.hnsw(
                distance_metric=VectorDistances.COSINE,
                ef=170,
                ef_construction=256,
                max_connections=32,
                filter_strategy=VectorFilterStrategy.ACORN
            )
        )
    ],
    #properties=properties,
    inverted_index_config=Configure.inverted_index(
        bm25_b=0.7,
        bm25_k1=1.25,
        index_null_state=True
    ),
    replication_config=Configure.replication(
        factor=3,
        async_enabled=True,
        deletion_strategy=ReplicationDeletionStrategy.TIME_BASED_RESOLUTION,
    ),
    sharding_config=Configure.sharding(
        desired_count=1,
    )
)

Server Logs:

{"build_git_commit":"aab5e68","build_go_version":"go1.24.9","build_image_tag":"v1.32.15","build_wv_version":"1.32.15","description":"An I/O timeout occurs when the request takes longer than the specified server-side timeout.","error":"write tcp 10.100.3.17:8080-\u003e10.100.3.1:58766: i/o timeout","hint":"Either try increasing the server-side timeout using e.g. '--write-timeout=600s' as a command line flag when starting Weaviate, or try sending a computationally cheaper request, for example by reducing a batch size, reducing a limit, using less complex filters, etc. Note that this error is only thrown if client-side and server-side timeouts are not in sync, more precisely if the client-side timeout is longer than the server side timeout.","level":"error","method":"POST","msg":"i/o timeout","path":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/batch/objects","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"consistency_level=ALL","Fragment":"","RawFragment":""},"time":"2025-10-30T15:40:51Z"}
{"build_git_commit":"aab5e68","build_go_version":"go1.24.9","build_image_tag":"v1.32.15","build_wv_version":"1.32.15","description":"An I/O timeout occurs when the request takes longer than the specified server-side timeout.","error":"write tcp 10.100.3.17:8080-\u003e10.148.51.16:41215: i/o timeout","hint":"Either try increasing the server-side timeout using e.g. '--write-timeout=600s' as a command line flag when starting Weaviate, or try sending a computationally cheaper request, for example by reducing a batch size, reducing a limit, using less complex filters, etc. Note that this error is only thrown if client-side and server-side timeouts are not in sync, more precisely if the client-side timeout is longer than the server side timeout.","level":"error","method":"POST","msg":"i/o timeout","path":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/batch/objects","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"consistency_level=ALL","Fragment":"","RawFragment":""},"time":"2025-10-30T15:40:54Z"}
{"build_git_commit":"aab5e68","build_go_version":"go1.24.9","build_image_tag":"v1.32.15","build_wv_version":"1.32.15","description":"An I/O timeout occurs when the request takes longer than the specified server-side timeout.","error":"write tcp 10.100.3.17:8080-\u003e10.148.51.17:3854: i/o timeout","hint":"Either try increasing the server-side timeout using e.g. '--write-timeout=600s' as a command line flag when starting Weaviate, or try sending a computationally cheaper request, for example by reducing a batch size, reducing a limit, using less complex filters, etc. Note that this error is only thrown if client-side and server-side timeouts are not in sync, more precisely if the client-side timeout is longer than the server side timeout.","level":"error","method":"POST","msg":"i/o timeout","path":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/batch/objects","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"consistency_level=ALL","Fragment":"","RawFragment":""},"time":"2025-10-30T15:40:54Z"}
{"build_git_commit":"aab5e68","build_go_version":"go1.24.9","build_image_tag":"v1.32.15","build_wv_version":"1.32.15","description":"An I/O timeout occurs when the request takes longer than the specified server-side timeout.","error":"write tcp 10.100.3.17:8080-\u003e10.148.51.16:65086: i/o timeout","hint":"Either try increasing the server-side timeout using e.g. '--write-timeout=600s' as a command line flag when starting Weaviate, or try sending a computationally cheaper request, for example by reducing a batch size, reducing a limit, using less complex filters, etc. Note that this error is only thrown if client-side and server-side timeouts are not in sync, more precisely if the client-side timeout is longer than the server side timeout.","level":"error","method":"POST","msg":"i/o timeout","path":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/batch/objects","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"consistency_level=ALL","Fragment":"","RawFragment":""},"time":"2025-10-30T15:40:58Z"}
{"build_git_commit":"aab5e68","build_go_version":"go1.24.9","build_image_tag":"v1.32.15","build_wv_version":"1.32.15","description":"An I/O timeout occurs when the request takes longer than the specified server-side timeout.","error":"write tcp 10.100.3.17:8080-\u003e10.100.3.1:24344: i/o timeout","hint":"Either try increasing the server-side timeout using e.g. '--write-timeout=600s' as a command line flag when starting Weaviate, or try sending a computationally cheaper request, for example by reducing a batch size, reducing a limit, using less complex filters, etc. Note that this error is only thrown if client-side and server-side timeouts are not in sync, more precisely if the client-side timeout is longer than the server side timeout.","level":"error","method":"POST","msg":"i/o timeout","path":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/batch/objects","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"consistency_level=ALL","Fragment":"","RawFragment":""},"time":"2025-10-30T15:41:09Z"}
{"build_git_commit":"aab5e68","build_go_version":"go1.24.9","build_image_tag":"v1.32.15","build_wv_version":"1.32.15","description":"An I/O timeout occurs when the request takes longer than the specified server-side timeout.","error":"write tcp 10.100.3.17:8080-\u003e10.148.51.12:31288: i/o timeout","hint":"Either try increasing the server-side timeout using e.g. '--write-timeout=600s' as a command line flag when starting Weaviate, or try sending a computationally cheaper request, for example by reducing a batch size, reducing a limit, using less complex filters, etc. Note that this error is only thrown if client-side and server-side timeouts are not in sync, more precisely if the client-side timeout is longer than the server side timeout.","level":"error","method":"POST","msg":"i/o timeout","path":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/batch/objects","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"consistency_level=ALL","Fragment":"","RawFragment":""},"time":"2025-10-30T15:41:18Z"}
{"build_git_commit":"aab5e68","build_go_version":"go1.24.9","build_image_tag":"v1.32.15","build_wv_version":"1.32.15","description":"An I/O timeout occurs when the request takes longer than the specified server-side timeout.","error":"write tcp 10.100.3.17:8080-\u003e10.148.51.6:10653: i/o timeout","hint":"Either try increasing the server-side timeout using e.g. '--write-timeout=600s' as a command line flag when starting Weaviate, or try sending a computationally cheaper request, for example by reducing a batch size, reducing a limit, using less complex filters, etc. Note that this error is only thrown if client-side and server-side timeouts are not in sync, more precisely if the client-side timeout is longer than the server side timeout.","level":"error","method":"POST","msg":"i/o timeout","path":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/batch/objects","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"consistency_level=ALL","Fragment":"","RawFragment":""},"time":"2025-10-30T15:41:22Z"}
{"build_git_commit":"aab5e68","build_go_version":"go1.24.9","build_image_tag":"v1.32.15","build_wv_version":"1.32.15","description":"An I/O timeout occurs when the request takes longer than the specified server-side timeout.","error":"write tcp 10.100.3.17:8080-\u003e10.148.51.12:20092: i/o timeout","hint":"Either try increasing the server-side timeout using e.g. '--write-timeout=600s' as a command line flag when starting Weaviate, or try sending a computationally cheaper request, for example by reducing a batch size, reducing a limit, using less complex filters, etc. Note that this error is only thrown if client-side and server-side timeouts are not in sync, more precisely if the client-side timeout is longer than the server side timeout.","level":"error","method":"POST","msg":"i/o timeout","path":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/batch/objects","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"consistency_level=ALL","Fragment":"","RawFragment":""},"time":"2025-10-30T15:41:27Z"}

hi @apattnaik !

Is this reproducible on a new server with any dataset or only on this dataset?

If possible, I would love to have reproducible code from end to end so we can raise this with our core team.

Thanks!

hey @DudaNogueira
I am able to reproduce this on our current dataset. Unfortunately I cannot share this data, as it's restricted.

I can share system logs.

The issue seems to be with BM25 search and hybrid search using the minimum-should-match operator; both show the same behaviour.
We have disabled it for now, but would like to find a workaround or solution for this.

We do have a use case for minimum-should-match, and it's crashing our cluster. A single query shouldn't crash the database, which is quite concerning.


Is there any setting that can kill this process?
It's one query that has led to this CPU usage, and it hasn't come down even after nearly a day.

Hey @apattnaik

I’m currently trying to reproduce the issue with some test data, but it would help to have more information.

Can you get a goroutine dump and CPU profile from the period where the CPU is elevated by the broken BM25 OR queries?

Unfortunately, until we find the issue, the only way to make the CPU usage go down is to restart.

Is this server part of WCS or self hosted?

Thanks!

Hi @amourao
Will try to get the dump over the weekend, as the devs use the cluster during the week.
The server is self-hosted and runs on Kubernetes in GCP.

Is there any reference for taking a goroutine dump from Kubernetes?

Thanks for the availability!

To get the profiles, you'll need to port-forward port 6060 and run a few curl commands to collect the goroutine and CPU dumps.

Port forward 6060

(not sure if there are any GCP-specific instructions, but on Kubernetes you can do)

kubectl port-forward <your pod name, eg weaviate-0> 6060:6060

Most important profiles

Goroutine dump (human-readable)

curl -o goroutines.txt "http://localhost:6060/debug/pprof/goroutine?debug=2"

Goroutine dump (binary format)

curl -o goroutines.pprof "http://localhost:6060/debug/pprof/goroutine"

CPU profile (30s sample)

curl -o cpu.pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

(optional profiles, but may convey useful information)

Heap memory profile

curl -o heap.pprof "http://localhost:6060/debug/pprof/heap"

Block profile

curl -o block.pprof "http://localhost:6060/debug/pprof/block"

Mutex profile

curl -o mutex.pprof "http://localhost:6060/debug/pprof/mutex"
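
Once you have the files, you can inspect them locally with the standard Go pprof tooling (this assumes a local Go installation; the port in the -http flag is arbitrary):

Top CPU consumers in the 30s sample

go tool pprof -top cpu.pprof

Interactive flame graph in the browser

go tool pprof -http=:8081 cpu.pprof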

Let me know if you have any more questions.


Hey @amourao
Shared the files on slack.


Hi

Looking at your query, you’re combining:

  • Hybrid search (BM25 + vector)

  • Two contains_any filters (brand and seller)

  • minimum_should_match=2 on BM25

  • ACORN filter strategy

The problem is likely that contains_any with multiple values creates a huge candidate set, and when combined with hybrid search, it’s computationally expensive. The query times out on the client side, but the server keeps processing it.

Fixes you can try

1. Add proper timeouts on server side

Add these flags when starting Weaviate:

--read-timeout=30s

2. Simplify your filters

Try testing with just one filter first:

# Test with only brand filter
filters=Filter.by_property("brand").contains_any(brands)

# Then test with only seller filter
filters=Filter.by_property("seller").contains_any(sellers)

See if one of them is causing the issue.
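
If it is not obvious which filter is the expensive one, you can time the same hybrid call with each variant. This is only a sketch: it reuses the collection, query, dense_vector, brands, and sellers objects from your snippet, and the timed_hybrid helper is made up for illustration. Run the combined variant last, and not on production, since that is the one that triggered the runaway CPU.

import time

def timed_hybrid(filters, label):
    # Same hybrid call as the original query, minus the BM25 operator,
    # so only the filter cost varies between runs.
    start = time.perf_counter()
    response = collection.query.hybrid(
        query=query,
        vector=dense_vector,
        target_vector="default",
        alpha=0.5,
        filters=filters,
        limit=24,
    )
    print(f"{label}: {time.perf_counter() - start:.2f}s, {len(response.objects)} results")

timed_hybrid(Filter.by_property("brand").contains_any(brands), "brand only")
timed_hybrid(Filter.by_property("seller").contains_any(sellers), "seller only")
timed_hybrid(Filter.all_of([
    Filter.by_property("brand").contains_any(brands),
    Filter.by_property("seller").contains_any(sellers),
]), "both filters")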

3. Try a different fusion type

fusion_type=HybridFusion.RELATIVE_SCORE # instead of RANKED

4. Reduce the filter values

How many brands and sellers are you passing? Try limiting to top 3-5:

filters=Filter.all_of([
    Filter.by_property("brand").contains_any(brands[:3]),
    Filter.by_property("seller").contains_any(sellers[:3])
])

5. Switch filter strategy

In your collection config, try changing from ACORN to SWEEPING:

filter_strategy=VectorFilterStrategy.SWEEPING
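
If the collection already exists, filter_strategy is a mutable HNSW setting, so you should be able to change it in place rather than recreating the index. A rough sketch with the v4 Python client follows; I'm assuming the Reconfigure.NamedVectors.update helper here, so double-check the exact call against the client docs for your version:

from weaviate.classes.config import Reconfigure, VectorFilterStrategy

collection = client.collections.get("Test")
# Assumed API: update only the filter strategy of the existing "default" named vector.
collection.config.update(
    vectorizer_config=Reconfigure.NamedVectors.update(
        name="default",
        vector_index_config=Reconfigure.VectorIndex.hnsw(
            filter_strategy=VectorFilterStrategy.SWEEPING
        ),
    )
)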

Things you can check

  1. How many brand and seller values are you typically passing in the contains_any?

  2. What’s the total number of objects in your collection?

  3. Can you try the query with just the vector search (no BM25) and see if it still happens?

Also noticed your logs show batch import timeouts - are you indexing data while running this query? That could make things worse.

Let me know what you find and we can narrow it down further!

Best, Chaitanya