Upload without indexing

Is there a way to upload a large amount of data and then ask Weaviate to build its index only once all embeddings and data are uploaded? I'm hoping this would speed up the upload dramatically: on my setup, uploads become about 5x slower as the database grows to 1 million 384-dimensional vectors.

Hi!

We have been cooking up some new features to address this. The idea is to build the index asynchronously. This, on top of gRPC, should deliver improved import times.

Check it out:

Let me know if this helps :slight_smile:

Ah yes, this is what I am looking for. It would be extremely useful for massive data imports. Even a simple on/off flag for index building would help, so the initial setup is quick even if later additions are slow. But the suggested solution (i.e., async indexing) works too, and is more sophisticated.

Any clue on the timeline for this feature? Like 1 month, or 1-2 years?

This feature should land soon.

Check the PR here: "async: Introduce async indexing" by asdine (Pull Request #3424, weaviate/weaviate on GitHub)

so you can keep track of it :slight_smile:

Thanks! For anyone searching for this: it looks like it has been merged and will hopefully be part of the next release.

It states that you need to set ASYNC_INDEXING=true. Out of curiosity, where would this flag go? Is it in the Docker container config file?

That’s right!

You must set it as an environment variable and then restart Weaviate.

More info: Release v1.22.0 - Async Indexing, Nested Object Support, gRPC API Support, Schema Repair, OIDC Group Auth, and many other improvements & fixes · weaviate/weaviate · GitHub
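If you start the container from a script rather than a compose file, here is a minimal sketch of passing the variable with `docker run -e`. The only thing taken from this thread is `ASYNC_INDEXING=true`. The helper name `weaviate_docker_cmd`, the image tag, and the port mapping are assumptions for illustration, and your real setup will likely need further settings (persistence path, auth, modules) alongside it.

```python
import shlex


def weaviate_docker_cmd(image="semitechnologies/weaviate:1.22.0", extra_env=None):
    """Build a `docker run` command that enables async indexing.

    Hypothetical helper: image tag and port mapping are assumptions,
    adjust them to match your own deployment.
    """
    env = {"ASYNC_INDEXING": "true"}  # the flag discussed in this thread
    env.update(extra_env or {})       # any other settings you already use

    cmd = ["docker", "run", "-p", "8080:8080"]
    for key, value in env.items():
        cmd += ["-e", f"{key}={value}"]  # each variable is passed with -e
    cmd.append(image)
    return cmd


# Print the command so it can be reviewed before running it.
print(shlex.join(weaviate_docker_cmd()))
```

In a docker-compose file the same key/value pair would go under the service's `environment:` section. Either way, the container has to be restarted for the change to take effect.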

Hi Duda,
Can you let me know when async indexing actually starts to index, then? I was able to import the data.
Thanks!

Hi Samuel!

AFAIK, it balances the ingest operations against the index operations. The idea is to avoid running the ingest load and the indexing load at the same time, with neither performing well. So I believe it controls the server load and lets the index queue grow while ingestion is running at a higher rate.
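To make the queue idea concrete, here is a toy sketch in plain Python. It is not Weaviate's implementation, and every name in it (`ToyAsyncIndex`, `add`, `wait_until_indexed`) is made up for illustration. It only shows the general pattern: the import path stores the object and enqueues it, while a background worker drains the queue and does the indexing.

```python
import queue
import threading


class ToyAsyncIndex:
    """Toy model: objects are stored immediately and indexed later from a queue."""

    def __init__(self):
        self.stored = []              # objects persisted at import time
        self.indexed = []             # objects the "vector index" knows about
        self._queue = queue.Queue()   # pending index work
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def add(self, vector):
        # Import path: persist and enqueue, without waiting for indexing.
        self.stored.append(vector)
        self._queue.put(vector)

    def _drain(self):
        # Background path: index queued vectors one at a time, in order.
        while True:
            vector = self._queue.get()
            self.indexed.append(vector)  # stand-in for a real index insert
            self._queue.task_done()

    def wait_until_indexed(self):
        # Block until every queued vector has been processed.
        self._queue.join()


index = ToyAsyncIndex()
for i in range(1000):
    index.add([float(i)] * 4)   # returns quickly, the queue may grow here
index.wait_until_indexed()      # after this, the index has caught up
```

In this toy version the queue simply grows whenever `add` is called faster than the worker can drain it, which matches the behaviour described above: imports stay fast, and indexing catches up in the background.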

There is more info here too, as well as in the aforementioned PR: