I’m importing some data into a schema, and am getting the following error:
ReadTimeout: The 'objects' creation was cancelled because it took longer than the configured timeout of 60s. Try reducing the batch size (currently 1) to a lower value. Aim to on average complete batch request within less than 10s
I have currently set my batch size to 200 using with client.batch as batch: and batch.batch_size = 200, but am getting this error saying my batch size is 1. I add my objects using: client.batch.add_data_object(properties, class_name, vector)
Not sure what’s going wrong. The first ~1500 data points imported successfully, but after those I always get this error and am unable to move forward. I would really appreciate some help with this. Thanks.
Hi @Siddhi_Bansal (I’ve moved this to the support category)
Would you be able to reproduce your code here? Maybe then we can take a look and help.
Thanks!
Hi @jphwang , thanks.
I’m calling a function called add_to_weaviate, which has the following code:
def add_to_weaviate(client, url, url_for_bucket_obj, text, max_tokens):
    if ("/category/" not in url) and ("first-gen" not in url):  # this is just some filtering
        with client.batch as batch:
            batch.batch_size = 200
            chunks = split_text_into_chunks(text, max_tokens)
            count = 1
            for chunk in chunks:
                chunk_vector = text_embedding(chunk)
                properties = {
                    "text": chunk,
                    "url": url
                }
                client.batch.add_data_object(properties, "Harness_Docs_Data_1500_Tokens", vector=chunk_vector)
@jphwang one thing I noticed: this error started occurring only once my class had 1200 vectors - everything was working fine before that (maybe some kind of limit? If so, how can we remove it?). Thought this might help with the issue.
Right, thanks.
So, I think this message:
Try reducing the batch size (currently 1) to a lower value
refers to the size of the current batch being tried, rather than the maximum size of the batch.
Based on that, it looks like the error happens with just one object in the batch, which seems odd. I’ll ask around to see if anyone has an idea.
We’ve noticed a couple of things:
The function passes one text per call - so each time the function is called, it will re-initialize a new batch process. Could you try refactoring the code like this?
def add_to_weaviate(batch, url, url_for_bucket_obj, text, max_tokens):
    if ("/category/" not in url) and ("first-gen" not in url):  # this is just some filtering
        chunks = split_text_into_chunks(text, max_tokens)
        count = 1
        for chunk in chunks:
            chunk_vector = text_embedding(chunk)
            properties = {
                "text": chunk,
                "url": url
            }
            batch.add_data_object(properties, "Harness_Docs_Data_1500_Tokens", vector=chunk_vector)

with client.batch as batch:
    batch.batch_size = 200
    ...code that creates the texts....
    add_to_weaviate(batch, ...)
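For completeness, a minimal sketch of how the calling side could look (the docs iterable of (url, text) pairs, the None for url_for_bucket_obj, and the max_tokens value are placeholder assumptions, not from your code). Recent versions of the Python client can also set the batch size up front via client.batch.configure():

# Sketch only: `docs` is a hypothetical iterable of (url, text) pairs.
client.batch.configure(batch_size=200)  # alternative to setting batch.batch_size inside the context manager
with client.batch as batch:
    for url, text in docs:
        # url_for_bucket_obj and max_tokens are assumed values here
        add_to_weaviate(batch, url, None, text, max_tokens=1500)

This way the batch context is entered once, and objects from every call accumulate into the same batch until it flushes at the configured size.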
But still, that shouldn’t necessarily create timeouts.
- Does the error always happen at the exact same point?
- How are you running Weaviate (what version, where is it running, etc.), and are you able to take a look at the logs?
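If you’re not sure which version the server is running, the Python client can report it; for example (assuming client is your connected weaviate.Client):

# get_meta() returns the server's metadata; the "version" key holds the running Weaviate version.
print(client.get_meta()["version"])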
Note to others for future reference - the user had downgraded from 1.20 to 1.19 in this instance.
Downgrading is not always supported - such as in this case, from 1.20 to 1.19. Please check the documentation, for example our release blogs, for such information.