High CPU Usage

Description

I have a total of 2.5 million objects in my Weaviate collection, which I am querying per doc_id using the code below. I am facing CPU issues: the instance shuts down if I run 500 queries in parallel. What can be done to optimise query performance on this instance before scaling it out to more nodes?

query_property = "parent_text"
# wq here is weaviate.classes.query, e.g.: import weaviate.classes.query as wq
response: GenerativeSearchReturnType = await chunks_collection.query.hybrid(
    query=query,
    vector=vector,
    alpha=0.5,
    limit=10,
    query_properties=[query_property],
    return_metadata=wq.MetadataQuery(distance=True, score=True, explain_score=True),
    filters=wq.Filter.by_property("doc_id").equal(doc_id),
)

Server Setup Information

  • Weaviate Server Version: weaviate:1.28.4
  • Deployment Method: docker
  • Multi Node? No (single node)
  • Client Language and Version: Python client 4.10.2
  • Multitenancy?: No

Any additional Information

Adding my docker-compose.yaml from an EC2 r6i.2xlarge (8 vCPU and 64 GB RAM).
Total data is around 80 GB:

sudo du -sh /var/lib/docker/volumes/ubuntu_weaviate_data/_data
80G     /var/lib/docker/volumes/ubuntu_weaviate_data/_data
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.28.4
    deploy:
      resources:
        limits:
          cpus: '7.0'
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_API_BASED_MODULES: 'true'
      ENABLE_MODULES: 'text-embedding-3-small,text2vec-ollama,generative-ollama'
      CLUSTER_HOSTNAME: 'node1'
      LIMIT_RESOURCES: 'true'
      LOG_LEVEL: 'info'
      GOMAXPROCS: 7
      GOMEMLIMIT: '48GiB'
volumes:
  weaviate_data:

Hi @Rajat_m7

Your CPU overload with 500 parallel queries is expected; a single node will struggle to handle that load. You'll eventually need more nodes, as a load balancer will help distribute traffic properly.

Can you split the queries and run them sequentially in a loop, or at minimum process them in smaller batches (50-100 at a time) instead of 500 simultaneously? This prevents resource exhaustion on your single node.
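One common way to cap concurrency from async Python is an `asyncio.Semaphore`. This is just a sketch of the pattern: `run_query` below is a stand-in for your actual `chunks_collection.query.hybrid(...)` call, and the batch size of 50 is an assumption you should tune.

```python
import asyncio

MAX_CONCURRENT = 50  # assumed batch size; tune against CPU headroom

async def run_query(doc_id: str) -> str:
    # Placeholder for the real hybrid query against Weaviate.
    await asyncio.sleep(0.01)
    return f"result-{doc_id}"

async def bounded_query(sem: asyncio.Semaphore, doc_id: str) -> str:
    # At most MAX_CONCURRENT coroutines pass this point at once.
    async with sem:
        return await run_query(doc_id)

async def main(doc_ids: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    # gather preserves input order, so results line up with doc_ids.
    return await asyncio.gather(*(bounded_query(sem, d) for d in doc_ids))

results = asyncio.run(main([str(i) for i in range(500)]))
```

All 500 queries still run, but only 50 are in flight at any moment, so the node sees a bounded load instead of a spike.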

The best use of the client is to implement a singleton pattern for the client connection, reusing one Weaviate client across all queries. See this explanation: using the singleton pattern.
You can see a simple practical implementation to build on: https://github.com/Shah91n/WeaviateDB-Docs-Snippets-Python-Client/blob/main/Singleton_Pattern_Python_Weaviate_Client.ipynb
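For reference, a bare-bones sketch of the singleton idea (the linked notebook has a fuller version). The real connection call, e.g. something like `weaviate.connect_to_local()`, is an assumption here and is replaced with a placeholder so only the pattern itself is shown:

```python
import threading

class ClientSingleton:
    _instance = None
    _lock = threading.Lock()
    created = 0  # counts how many times the "connection" was built

    @classmethod
    def get(cls):
        if cls._instance is None:           # fast path, no lock needed
            with cls._lock:
                if cls._instance is None:   # double-checked locking
                    cls.created += 1
                    # In real code: open the Weaviate client connection here.
                    cls._instance = object()
        return cls._instance
```

Every caller then uses `ClientSingleton.get()` instead of opening a new connection per query, which avoids the per-connection setup cost multiplying across hundreds of parallel requests.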

Please also make sure to update your client to the latest 4.14.4 and Weaviate to 1.30.4, as both are outdated.

Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)

Hi @Mohamed_Shahin,

Thanks for the suggestions – I’ll try them out.

I also wanted to ask about the ideal storage for multiple nodes. We currently have roughly 80 GB of data and 2.5 million documents on a single EC2 instance, and this is going to grow.

Does S3 offer comparable performance to EBS for this kind of setup?

Thanks
Rajat

Hey @Rajat_m7,

Based on what I know so far, and definitely open to more insights if others have deeper expertise, I'd recommend starting with EBS gp3 volumes. Weaviate generally performs well with gp3.

Start with gp3 EBS per node, monitor your actual IOPS and throughput usage, and scale up only if needed.

Also, here are a few resources that might be helpful:

Weaviate Disk Storage Calculator

Weaviate UI Tool (Open Source)

Best regards,
Mohamed Shahin
Weaviate Support Engineer
(Ireland, GMT/UTC timezone)