Upload without indexing

Is there a way to upload a large amount of data and then ask Weaviate to build its index only once all embeddings and data are uploaded? I am hoping this would increase upload speed dramatically (in my setup, uploads become about 5x slower as the database grows to 1 million 384-dimensional vectors).

Hi!

We have been cooking up some new features to address this. The idea is to build the index asynchronously. This, on top of gRPC, will deliver improved import times.

Check it out:

Let me know if this helps.


Ah yes, this is what I am looking for. For massive data entry it would be extremely useful. Even a simple on/off flag for index building would help, so that the initial setup is quick even if later additions are slow. But the suggested solution (async indexing) works too, and is more sophisticated.

Any idea of the timeline for this feature? Something like 1 month, or 1-2 years?

This feature should land soon.

You can keep track of it in the PR "async: Introduce async indexing" by asdine (Pull Request #3424 in weaviate/weaviate on GitHub).

Thanks. For anyone searching for this: it looks like the PR has been merged, so it will hopefully be part of the next release.

It states that you need to use ASYNC_INDEXING=true. Out of curiosity, where would this flag go? Is it in the Docker container config file?

That’s right!

You must set it as an environment variable and then restart Weaviate.
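
As a rough illustration of where the flag goes when Weaviate runs in Docker, here is a minimal Python sketch that only builds (and prints) a `docker run` command with `ASYNC_INDEXING=true` passed as an environment variable. The helper name `build_weaviate_run_command`, the image tag, and the port mappings are assumptions for the example, and your other Weaviate settings (persistence path, authentication, modules) are left out, so treat it as a starting point rather than a complete setup.

```python
import shlex


def build_weaviate_run_command(
    image: str = "semitechnologies/weaviate:1.22.0",  # assumed image tag
    async_indexing: bool = True,
) -> list[str]:
    """Return a `docker run` argument list that passes ASYNC_INDEXING to the container.

    Hypothetical helper for illustration; it is not part of Weaviate or its clients.
    """
    env = {
        # The flag discussed in this thread, passed as a container environment variable.
        "ASYNC_INDEXING": "true" if async_indexing else "false",
        # Other settings your deployment needs are intentionally omitted here.
    }
    # Port mappings are assumptions: 8080 for HTTP, 50051 for gRPC.
    cmd = ["docker", "run", "--rm", "-p", "8080:8080", "-p", "50051:50051"]
    for key, value in env.items():
        cmd += ["-e", f"{key}={value}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is started by accident.
    print(shlex.join(build_weaviate_run_command()))
```

If you use Docker Compose instead, the same key and value would go under the `environment:` section of the Weaviate service, followed by a restart of the container.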

More info: Release v1.22.0 (Async Indexing, Nested Object Support, gRPC API Support, Schema Repair, OIDC Group Auth, and many other improvements & fixes) in weaviate/weaviate on GitHub.
