Hi, I am trying to upload 600k records to Weaviate and it is taking 12-13 hours to complete. I tried the latest version (1.22) and used asynchronous mode too, but it did not help. Here is my schema creation code:
schema = {
    "classes": [
        {
            "class": "xxxxxxxxxxxxxx",
            "description": "Images of different dogs",
            "vectorIndexType": "hnsw",
            "vectorIndexConfig": {
                "distance": "cosine",
                "skip": False,
                "ef": 2500,
                "efConstruction": 64,
                "maxConnections": 16,
                "vectorCacheMaxObjects": 150000000000,
            },
            "vectorizer": "text2vec-transformers",
            "properties": [
Here is my vectorizer:
def _vectorize(self, text: str, config: VectorInputConfig):
    # lightEmbeddings = True
    with torch.no_grad():
        # print('text=>', text)
        DEFAULT_VECTORIZER = "LE"
        if DEFAULT_VECTORIZER == "LE":
            tokens = self.tokenize_LE(text)
            tokens = list(tokens)
            batch_sum_vectors = np.array(tokens, dtype=np.float32)
            return batch_sum_vectors[0]
Can someone suggest a workaround to speed up the data upload?
1 - Use deterministic IDs.
2 - Start with a smaller batch size and fewer workers, then increase them gradually.
3 - Size your deployment accordingly.
4 - Use the latest versions of the server and client, which now use gRPC and offer features like async indexing.
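To make points 1 and 2 a bit more concrete, here is a minimal sketch using only the Python standard library. The namespace string, the field names (`source`, `row`) and the helper names `deterministic_id` and `batched` are made up for illustration; they are not part of Weaviate. The idea is that the same record always maps to the same UUID, so re-running a failed import should overwrite existing objects rather than create duplicates, and the batch size is a single parameter you can start small and raise. If I recall correctly, the Weaviate Python client also provides a `generate_uuid5` helper in `weaviate.util` for the same purpose.

```python
import uuid
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical namespace for this dataset. Any fixed UUID works,
# as long as it never changes between import runs.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid/my-dataset")


def deterministic_id(record: dict, key_fields: Tuple[str, ...]) -> str:
    """Derive a stable UUIDv5 from the fields that identify a record."""
    key = "|".join(str(record[field]) for field in key_fields)
    return str(uuid.uuid5(NAMESPACE, key))


def batched(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Assumed record layout, for illustration only.
    records = [{"source": "demo", "row": i, "text": f"doc {i}"} for i in range(250)]
    for batch in batched(records, batch_size=100):
        ids = [deterministic_id(r, ("source", "row")) for r in batch]
        # Hand `batch` and `ids` to your Weaviate client's batch import here.
        print(len(batch), ids[0])
```

Start with a small `batch_size`, watch throughput and server load, and raise it step by step until the import stops getting faster.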
I will mark this as solved, as we have not heard more from you.