Aggregate on a text property is slow with 50k+ objects

Hi, I have a collection with approximately 55K objects, using multi2vec-clip as the vectorizer. Each object contains a property named “collection”, which refers to the image collection the object belongs to. Now I want to get the total count for each collection. When I count all objects, I get a response instantly, but when I count per collection, it takes a very long time. To reproduce, you can use the following GraphQL query:

{
  Aggregate {
    Schema_name {
      collection {
        count
        type
        topOccurrences {
          value
          occurs
        }
      }
    }
  }
}
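
To put numbers on "instantly" versus "a very long time", the two queries can be timed side by side. The following is only a rough sketch using the Python standard library against Weaviate's `/v1/graphql` endpoint. The URL `http://localhost:8080` and the helper names (`total_count_query`, `top_occurrences_query`, `timed_post`, `compare_timings`) are assumptions made for illustration, and `Schema_name` is the same placeholder class name used in the query above.

```python
import json
import time
import urllib.request

# Assumption: a local Docker deployment exposing the default HTTP port.
GRAPHQL_URL = "http://localhost:8080/v1/graphql"


def total_count_query(class_name: str) -> str:
    """Aggregate query that only asks for the overall object count."""
    return f"{{ Aggregate {{ {class_name} {{ meta {{ count }} }} }} }}"


def top_occurrences_query(class_name: str, prop: str) -> str:
    """Aggregate query asking for count, type and topOccurrences of one property."""
    return (
        f"{{ Aggregate {{ {class_name} {{ {prop} "
        f"{{ count type topOccurrences {{ value occurs }} }} }} }} }}"
    )


def timed_post(query: str, url: str = GRAPHQL_URL) -> tuple[float, dict]:
    """POST a GraphQL query and return (elapsed seconds, decoded JSON response)."""
    body = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return time.perf_counter() - start, payload


def compare_timings(class_name: str = "Schema_name", prop: str = "collection") -> None:
    """Print how long each query takes; requires a running Weaviate instance."""
    queries = {
        "total count": total_count_query(class_name),
        "per-property aggregation": top_occurrences_query(class_name, prop),
    }
    for label, query in queries.items():
        elapsed, _ = timed_post(query)
        print(f"{label}: {elapsed:.2f}s")
```

Calling `compare_timings()` with the real class name should show how far apart the two queries are on a given machine.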

This is my schema

 schemaConfig = {
        'class': schema,  # class name for schema config in Weaviate (change it with a custom name for your images)
        'vectorizer': 'multi2vec-clip',
        'vectorIndexType': 'hnsw',
        "moduleConfig": {
            "multi2vec-clip": {
                "imageFields": [
                    "image"
                ],
                "textFields": [
                    "metadata",
                    "metadata_string",
                    "title",
                    "url",
                    "collection"
                ],

            },
            "generative-openai": {
                "model": "gpt-3.5-turbo"
            },
        },
        'properties': [
            {
                'name': 'image_id',
                'dataType': ['text']
            },
            {
                'name': 'image',
                'dataType': ['blob']
            },
            {
                'name': 'metadata',
                'dataType': ['text[]']
            },
            {
                'name': 'metadata_string',
                'dataType': ['text']
            },
            {
                'name': 'title',
                'dataType': ['text']
            },
            {
                'name': 'url',
                'dataType': ['text']
            },
            {
                'name': 'handle',
                'dataType': ['text']
            },
            {
                'name': 'collection',
                'dataType': ['text']
            }
        ]
    }

Can someone shed some light on why the performance is slow when trying to aggregate?

Thank you very much

Hi @ublrama !!

Welcome to our community :hugs:

Aggregate, AFAIK, is a CPU- and disk-bound operation.

Do you have any metrics on this cluster?

Also, please fill in the requested info so we can better understand your scenario:

  • Weaviate Server Version:
  • Deployment Method:
  • Multi Node? Number of Running Nodes:
  • Client Language and Version:
  • Multitenancy?:

Thanks!

Thanks for your reply. Here is the additional information:

  • Weaviate Server Version: 1.25.3
  • Deployment Method: Docker
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: Python 3.11 with Weaviate Python client 4.6.3
  • Multitenancy?: No

Ok, do you only have those 50K objects?

Do you have any readings on memory/CPU usage?

Do you think we could come up with some dataset to replicate this?
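
One possible starting point for such a dataset is sketched below. It only generates synthetic objects with a `collection` property, plus the counts one would expect an aggregation to return. The function name `make_synthetic_objects`, the number of distinct collections (20), and the decision to leave out the `image` blob are all assumptions made for illustration. Whether text-only objects are enough to reproduce the slowdown would still need to be confirmed.

```python
import random
from collections import Counter


def make_synthetic_objects(n_objects=55_000, n_collections=20, seed=0):
    """Build objects shaped like the schema above, minus the image blob.

    n_collections is a hypothetical value; the real number of distinct
    collections is not stated in the thread.
    """
    rng = random.Random(seed)  # fixed seed so every run yields the same data
    names = [f"collection_{i:02d}" for i in range(n_collections)]
    return [
        {
            "image_id": f"img-{i:06d}",
            "title": f"Synthetic image {i}",
            "collection": rng.choice(names),
        }
        for i in range(n_objects)
    ]


objects = make_synthetic_objects()

# Ground truth to compare against the per-collection counts Weaviate reports.
expected_counts = Counter(obj["collection"] for obj in objects)
```

After importing `objects` with the client of choice, `expected_counts` gives a reference for checking the aggregation result while timing the query.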