Sorting by CreationTime extremely slow

msj242 · May 17, 2024, 3:19pm

Description

The call below takes so long that it times out, even with long timeouts. I think this should be extremely fast, since there isn't even a hybrid search involved: it is just trying to get the last 10 items.

It should be near instant, no? Perhaps I am using this wrong? Do I need to specify or add a creationTime index? I do have 100 million 384-dimensional objects, so it is a lot, but I assume I must still be missing something.

let builder = this.client.graphql
    .get()
    .withClassName("Episode")
    .withFields("title, description, _additional { id, creationTimeUnix }")
    .withLimit(options.limit)
    .withOffset(options.page * options.limit);
builder = builder.withSort([{ path: ['_creationTimeUnix'] }]);

builder.do().then((res) => {
    console.log(res);
}).catch((e) => {
    reject(e);
});

Server Setup Information

Weaviate Server Version: 1.24.12
Deployment Method: Docker
Multi Node? Number of Running Nodes: 1
Client Language and Version: Python/Go/Nodejs (Query in Nodejs)

Any additional Information

weaviate-weaviate-1  | {"action":"restapi_request","level":"debug","method":"POST","msg":"received HTTP request","time":"2024-05-17T15:11:29Z","url":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/graphql","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""}}
weaviate-weaviate-1  | {"action":"request_cacher_dedup_joblist_start","jobs":1,"level":"debug","msg":"starting job list deduplication","time":"2024-05-17T15:11:29Z"}
weaviate-weaviate-1  | {"action":"request_cacher_dedup_joblist_complete","jobs":1,"level":"debug","msg":"completed job list deduplication","removedJobs":0,"time":"2024-05-17T15:11:29Z"}
weaviate-weaviate-1  | {"action":"request_cacher_dedup_joblist_start","jobs":1,"level":"debug","msg":"starting job list deduplication","time":"2024-05-17T15:11:29Z"}
weaviate-weaviate-1  | {"action":"request_cacher_dedup_joblist_complete","jobs":1,"level":"debug","msg":"completed job list deduplication","removedJobs":0,"time":"2024-05-17T15:11:29Z"}

DudaNogueira · May 17, 2024, 7:20pm

Hi @msj242!

I believe that with that many objects, sharding your data across multiple nodes can help in this scenario.

I will ask internally about this as this is an interesting case.

Thanks!

msj242 · May 17, 2024, 9:59pm

@DudaNogueira

Thanks! If so, I will look into sharding. As a patch in the meantime, I'll keep a simple SQLite table of recent uploads to speed things up for myself.

DudaNogueira · May 20, 2024, 2:28pm

@msj242 one thing to consider:

Weaviate does not use any sorting-specific data structures on disk. When objects are sorted, Weaviate identifies the object and extracts the relevant properties. This works reasonably well at small scales (hundreds of thousands or millions of objects). It is expensive if you sort large lists of objects (hundreds of millions, billions). In the future, Weaviate may add a column-oriented storage mechanism to overcome this performance limitation.

So I believe that, when possible, first filtering down to a small set of objects and then sorting only those can improve performance too.
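
As a rough sketch of that idea (not a confirmed fix for this case), the query could restrict itself to a recent time window before sorting. The helper name recentWindowFilter and the one-day window are hypothetical. The sketch also assumes that the class was created with indexTimestamps enabled in its invertedIndexConfig, which Weaviate needs in order to filter on _creationTimeUnix, and that the filter accepts an RFC 3339 date through valueDate.

```javascript
// Hypothetical helper: builds a `where` filter that keeps only objects
// created within the last `windowMs` milliseconds before `nowMs`.
// Assumption: the class has `indexTimestamps: true` in its
// invertedIndexConfig, otherwise filtering on _creationTimeUnix fails.
function recentWindowFilter(nowMs, windowMs) {
  return {
    path: ['_creationTimeUnix'],
    operator: 'GreaterThan',
    // RFC 3339 timestamp of the window start
    valueDate: new Date(nowMs - windowMs).toISOString(),
  };
}

// Sort descriptor for "newest first"; without `order`, sorting is ascending.
const newestFirst = [{ path: ['_creationTimeUnix'], order: 'desc' }];

// Intended use with the builder from the original post (not executed here):
//
//   const oneDayMs = 24 * 60 * 60 * 1000;
//   builder
//     .withWhere(recentWindowFilter(Date.now(), oneDayMs))
//     .withSort(newestFirst)
//     .withLimit(10);
```

The window has to be wide enough to contain at least the number of items requested; if it returns too few, the caller can widen it and retry.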

Let me know if this helps!

Thanks!

msj242 · June 4, 2024, 1:22am

@DudaNogueira Thanks! I will keep this in mind in the future.
