Sorting by CreationTime extremely slow

Description

The call below takes so long that it times out, even with long timeouts. I think this should be extremely fast, since there is not even a hybrid search involved: it is just trying to get the last 10 items.

It should be near instant, no? Perhaps I am using this wrong? Do I need to specify or add a creationTime index? Also, I do have 100 million 384-dimensional objects, so it is a lot, but I assume I must still be missing something.

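// Fetch one page of Episode objects, sorted by creation time.
let builder = this.client.graphql
    .get()
    .withClassName("Episode")
    .withFields("title, description, _additional { id, creationTimeUnix }")
    .withLimit(options.limit)
    .withOffset(options.page * options.limit);

// Sort on the object's creation timestamp (no explicit order is given).
builder = builder.withSort([{ path: ['_creationTimeUnix'] }]);

builder.do().then((res) => {
    console.log(res);
}).catch((e) => {
    // reject presumably comes from an enclosing Promise that is not shown here.
    reject(e);
});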

Server Setup Information

  • Weaviate Server Version: 1.24.12
  • Deployment Method: Docker
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: Python/Go/Nodejs (Query in Nodejs)

Any additional Information

weaviate-weaviate-1  | {"action":"restapi_request","level":"debug","method":"POST","msg":"received HTTP request","time":"2024-05-17T15:11:29Z","url":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/graphql","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""}}
weaviate-weaviate-1  | {"action":"request_cacher_dedup_joblist_start","jobs":1,"level":"debug","msg":"starting job list deduplication","time":"2024-05-17T15:11:29Z"}
weaviate-weaviate-1  | {"action":"request_cacher_dedup_joblist_complete","jobs":1,"level":"debug","msg":"completed job list deduplication","removedJobs":0,"time":"2024-05-17T15:11:29Z"}
weaviate-weaviate-1  | {"action":"request_cacher_dedup_joblist_start","jobs":1,"level":"debug","msg":"starting job list deduplication","time":"2024-05-17T15:11:29Z"}
weaviate-weaviate-1  | {"action":"request_cacher_dedup_joblist_complete","jobs":1,"level":"debug","msg":"completed job list deduplication","removedJobs":0,"time":"2024-05-17T15:11:29Z"}

hi @msj242 !

I believe that, with that number of objects, sharding your data across multiple nodes can help in this scenario.

I will ask internally about this as this is an interesting case.

Thanks!

@DudaNogueira

Thanks! If so, I will look into sharding. As a patch, in the meantime I'll keep a simple SQLite database of recent uploads to speed things up for myself.

@msj242 one thing to consider:

Weaviate does not use any sorting-specific data structures on disk. When objects are sorted, Weaviate identifies each object and extracts the relevant properties. This works reasonably well at small scales (hundreds of thousands or millions of objects). It is expensive if you sort large lists of objects (hundreds of millions or billions). In the future, Weaviate may add a column-oriented storage mechanism to overcome this performance limitation.

So I believe that, when possible, filtering down to a small set of objects first and then sorting only that set can improve performance too.
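
As a rough sketch of that "filter first, then sort" idea, the helper below builds a where filter that keeps only objects created inside a recent time window, plus a descending sort on the creation time. The helper name `buildRecentQueryArgs` and the 24-hour window are made up for illustration. It also assumes that timestamp indexing is enabled on the collection so that `_creationTimeUnix` can be filtered on, and that the filter accepts the cutoff as a millisecond string in `valueText`; both points are assumptions worth checking against the Weaviate filter documentation for your version.

```javascript
// Sketch only: builds the arguments for a "filter first, then sort" query.
// Assumptions (not from the thread): the helper name, the 24-hour window,
// and that _creationTimeUnix is filterable on this collection.

const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Returns a where filter and a sort spec for "objects created within the
 * last windowMs milliseconds, newest first".
 */
function buildRecentQueryArgs(nowMs, windowMs = DAY_MS) {
  const cutoffMs = nowMs - windowMs;
  return {
    where: {
      path: ["_creationTimeUnix"],
      operator: "GreaterThan",
      // Assumed format: Unix time in milliseconds, passed as a string.
      valueText: String(cutoffMs),
    },
    sort: [{ path: ["_creationTimeUnix"], order: "desc" }],
  };
}

// Hypothetical usage with the builder from the original post
// (needs a live client, so it is left commented out):
//
// const { where, sort } = buildRecentQueryArgs(Date.now());
// client.graphql.get()
//   .withClassName("Episode")
//   .withFields("title description _additional { id creationTimeUnix }")
//   .withWhere(where)
//   .withSort(sort)
//   .withLimit(10)
//   .do();

const exampleArgs = buildRecentQueryArgs(1700000000000);
console.log(JSON.stringify(exampleArgs, null, 2));
```

If the window turns out to contain fewer objects than the page size, the window can be widened and the query retried; the point is only that the sort then runs over the filtered subset rather than all 100 million objects.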

Let me know if this helps!

Thanks!

@DudaNogueira Thanks! I will keep this in mind in the future.