Sorting by CreationTime extremely slow

Description

The call below takes so long that it times out, even with long timeouts. I expected it to be extremely fast, since there is not even a hybrid search involved: it is just trying to get the last 10 items.

It should be near instant, no? Perhaps I am using this wrong? Do I need to specify or add a creationTime index? Also I do have 100 million 384d objects. So it is a lot - but I assume I still must be missing something.

let builder = this.client.graphql
    .get()
    .withClassName("Episode")
    .withFields("title, description, _additional{ id, creationTimeUnix }")
    .withLimit(options.limit)
    .withOffset(options.page * options.limit);
builder = builder.withSort([{ path: ['_creationTimeUnix'] }]);

builder.do().then((res) => {
    console.log(res);
}).catch((e) => {
    reject(e);
});

Server Setup Information

  • Weaviate Server Version: 1.24.12
  • Deployment Method: Docker
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: Python/Go/Nodejs (Query in Nodejs)

Any additional Information

weaviate-weaviate-1  | {"action":"restapi_request","level":"debug","method":"POST","msg":"received HTTP request","time":"2024-05-17T15:11:29Z","url":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/graphql","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""}}
weaviate-weaviate-1  | {"action":"request_cacher_dedup_joblist_start","jobs":1,"level":"debug","msg":"starting job list deduplication","time":"2024-05-17T15:11:29Z"}
weaviate-weaviate-1  | {"action":"request_cacher_dedup_joblist_complete","jobs":1,"level":"debug","msg":"completed job list deduplication","removedJobs":0,"time":"2024-05-17T15:11:29Z"}
weaviate-weaviate-1  | {"action":"request_cacher_dedup_joblist_start","jobs":1,"level":"debug","msg":"starting job list deduplication","time":"2024-05-17T15:11:29Z"}
weaviate-weaviate-1  | {"action":"request_cacher_dedup_joblist_complete","jobs":1,"level":"debug","msg":"completed job list deduplication","removedJobs":0,"time":"2024-05-17T15:11:29Z"}

hi @msj242 !

I believe that with that many objects, sharding your data across multiple nodes can help in this scenario.

I will ask internally about this as this is an interesting case.

Thanks!

@DudaNogueira

Thanks! If so, I will look into sharding. As a patch, in the meantime I'll keep a simple SQLite table of recent uploads to speed things up for myself.

@msj242 one thing to consider:

Weaviate does not use any sorting-specific data structures on disk. When objects are sorted, Weaviate identifies the object and extracts the relevant properties. This works reasonably well at small scales (hundreds of thousands or millions of objects). It is expensive if you sort large lists of objects (hundreds of millions, billions). In the future, Weaviate may add a column-oriented storage mechanism to overcome this performance limitation.

So I believe that, when possible, filtering down to a small set of objects first and then sorting only that set can improve performance too.
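As a rough sketch of the filter-then-sort idea for the query in this thread: restrict the query to a recent time window with a where filter on _creationTimeUnix, then sort only what is left. This assumes the class was created with indexTimestamps enabled in its invertedIndexConfig (to my knowledge that is required before timestamps can be filtered), and the helper names recentWindow and latestEpisodes are made up for illustration. Whether it is fast enough at 100 million objects would need to be measured.

```javascript
// Sketch: narrow to a recent time window first, then sort only that subset.
// Assumption: the class has invertedIndexConfig.indexTimestamps = true, which
// Weaviate needs before _creationTimeUnix can be used in a where filter.

// Build the where filter and sort spec for "objects created in the last windowMs".
// (Hypothetical helper, not part of the Weaviate client.)
function recentWindow(nowMs, windowMs) {
  return {
    where: {
      path: ['_creationTimeUnix'],
      operator: 'GreaterThan',
      // The timestamp is passed as a string of epoch milliseconds.
      valueText: String(nowMs - windowMs),
    },
    // Newest first; without an explicit order the sort is ascending.
    sort: [{ path: ['_creationTimeUnix'], order: 'desc' }],
  };
}

// Apply the window to a graphql.get() builder from the JavaScript client.
// (Hypothetical helper; "Episode" and the fields come from the original query.)
function latestEpisodes(client, limit, windowMs, nowMs = Date.now()) {
  const { where, sort } = recentWindow(nowMs, windowMs);
  return client.graphql
    .get()
    .withClassName('Episode')
    .withFields('title description _additional { id creationTimeUnix }')
    .withWhere(where)
    .withSort(sort)
    .withLimit(limit)
    .do();
}
```

For example, latestEpisodes(client, 10, 60 * 60 * 1000) would ask for the 10 newest objects created in the last hour. If the window turns out to hold fewer than 10 objects, it has to be widened and the query retried.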

Let me know if this helps!

Thanks!

@DudaNogueira Thanks! I will keep this in mind in the future.

Dynamic date sorting and filtering at query time can easily choke the database under load. We hit the same timeout issues when trying to temporally sort high-velocity domains.

To fix the latency, we moved the temporal math out of the vector DB entirely. Weaviate handles the raw semantic retrieval, and we pass the output through a dedicated temporal decay middleware (cached with Redis). It calculates the exact days_until_stale and applies an exponential decay curve in ~40ms, drastically reducing the load on the DB while ensuring the LLM only gets fresh context. Let me know if you want to see the architecture trace for this.
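For readers wondering what applying an exponential decay curve outside the database might look like, here is a minimal sketch. It is not the middleware described above: the function names, the half-life parameter, and the shape of the hit objects are assumptions for illustration, and the Redis caching and days_until_stale calculation are left out.

```javascript
// Hypothetical re-ranking step applied to search hits after retrieval.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Halve a hit's score for every halfLifeDays of age.
function decayedScore(score, createdMs, nowMs, halfLifeDays) {
  // Clamp at zero so objects stamped slightly in the future are not boosted.
  const ageDays = Math.max(0, (nowMs - createdMs) / MS_PER_DAY);
  return score * Math.pow(0.5, ageDays / halfLifeDays);
}

// Re-rank hits by decayed score, highest first.
// Assumed hit shape: { score: number, creationTimeUnix: string | number (epoch ms) }.
function rerankByFreshness(hits, nowMs, halfLifeDays) {
  return hits
    .map((h) => ({
      ...h,
      decayed: decayedScore(h.score, Number(h.creationTimeUnix), nowMs, halfLifeDays),
    }))
    .sort((a, b) => b.decayed - a.decayed);
}
```

With a 7-day half-life, a hit that is 14 days old keeps a quarter of its original score, so a slightly less relevant but fresh hit can outrank it.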