Data upload speed is slow

Hi, I am trying to upload 600k records to Weaviate and it is taking 12-13 hours to complete. I tried the latest version, 1.22, and used asynchronous mode too, but to no avail. Here is my schema creation code:

schema = {
    "classes": [
        {
            "class": "xxxxxxxxxxxxxx",
            "description": "Images of different dogs",
            "vectorIndexType": "hnsw",
            "vectorIndexConfig": {
                "distance": "cosine",
                "skip": False,
                "ef": 2500,
                "efConstruction": 64,
                "maxConnections": 16,
                "vectorCacheMaxObjects": 150000000000,
            },
            "vectorizer": "text2vec-transformers",
            "properties": [

Here is my vectorizer:

def _vectorize(self, text: str, config: VectorInputConfig):
    # lightEmbeddings = True
    with torch.no_grad():
        # print('text=>', text)
        DEFAULT_VECTORIZER = "LE"
        if DEFAULT_VECTORIZER == "LE":
            tokens = self.tokenize_LE(text)
            tokens = list(tokens)
            batch_sum_vectors = np.array(tokens, dtype=np.float32)
            return batch_sum_vectors[0]

Can someone suggest a workaround to speed up the data upload?

Hi!

As you are producing the vectors yourself, I suggest you generate all of them first, into a JSON file for example, and then import them into Weaviate.

Also, are you using batches to import that content? What is the batch size?
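
Here is a minimal sketch of what I mean, assuming the v3 Python client and that each record has a "text" field; the embed function and the class name "YourClass" are hypothetical stand-ins for your own model and schema. Pre-compute each vector once, store it in the JSON, and pass it to the batch explicitly so the server-side vectorizer module never runs at import time:

import json
import weaviate

def embed(text: str) -> list:
    # Hypothetical stand-in: replace with your own model call.
    return [0.0] * 384

# Step 1: vectorize everything once, offline, and persist the result.
with open("records.json") as f:
    records = json.load(f)
for r in records:
    r["vector"] = embed(r["text"])
with open("records_with_vectors.json", "w") as f:
    json.dump(records, f)

# Step 2: import with explicit vectors, so no vectorization happens server-side.
client = weaviate.Client("http://localhost:8080")
with client.batch(batch_size=100, num_workers=4) as batch:
    for r in records:
        batch.add_data_object(
            data_object={"text": r["text"]},
            class_name="YourClass",
            vector=r["vector"],  # explicit vector skips the vectorizer module
        )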

Yes, we are using JSON. Here is my sample code for the record upload:

import pandas as pd

df_training = pd.read_json("somefile.json", dtype=object)

# `client` is the weaviate.Client instance created earlier
with client.batch(batch_size=100, num_workers=4, dynamic=True) as batch:
    total_seconds_prev = 0
    for i, d in df_training.iterrows():
        properties = {
            "xxx": d["xxx"],
            "yyy": d["yyy"],
            "zzz": d["zzz"],
        }
        result = batch.add_data_object(properties, "classname")

Hi!

Were you able to fix this? This could be a lack of resources on the server side.

Do you have some metrics on that?

Thanks!

Hi! For better data upload performance:

1 - Use deterministic IDs (see the sketch after this list)
2 - Start with a smaller batch size and fewer workers, then increase them gradually
3 - Size your deployment accordingly
4 - Use the latest versions of the server and client, which now use gRPC and have features like async indexing
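
A minimal sketch of points 1 and 2, assuming the v3 Python client and the same hypothetical class name as above: generate_uuid5 derives the UUID deterministically from the object itself, so re-running the import updates objects instead of duplicating them, and the small batch settings are a safe starting point to scale up from.

import pandas as pd
import weaviate
from weaviate.util import generate_uuid5  # deterministic UUID from object content

client = weaviate.Client("http://localhost:8080")
df_training = pd.read_json("somefile.json", dtype=object)

# Start conservatively (point 2) and raise batch_size/num_workers once stable.
with client.batch(batch_size=50, num_workers=1) as batch:
    for i, d in df_training.iterrows():
        properties = {"xxx": d["xxx"], "yyy": d["yyy"], "zzz": d["zzz"]}
        batch.add_data_object(
            properties,
            "classname",
            uuid=generate_uuid5(properties),  # same object -> same UUID on every run
        )

For point 4, note that in server 1.22 async indexing is opt-in: it has to be enabled on the server side with the ASYNC_INDEXING environment variable before the client can take advantage of it.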

I will mark this as solved as we have not heard more from you.

Thanks!