I’m importing some data into a schema, and am getting the following error:
ReadTimeout: The 'objects' creation was cancelled because it took longer than the configured timeout of 60s. Try reducing the batch size (currently 1) to a lower value. Aim to on average complete batch request within less than 10s
I have currently set my batch size to 200 using with client.batch as batch: and batch.batch_size = 200, but am getting this error saying my batch size is 1. I add my objects using: client.batch.add_data_object(properties, class_name, vector)
Not sure what’s going wrong. The first ~1500 data points imported successfully, but after those I always get this error and am unable to move forward. I would really appreciate some help with this. Thanks.
Hi @Siddhi_Bansal (I’ve moved this to the support category)
Would you be able to reproduce your code here? Maybe then we can take a look and help.
Thanks!
Hi @jphwang , thanks.
I’m calling a function called add_to_weaviate, which has the following code:
def add_to_weaviate(client, url, url_for_bucket_obj, text, max_tokens):
    if ("/category/" not in url) and ("first-gen" not in url):  # this is just some filtering
        with client.batch as batch:
            batch.batch_size = 200
            chunks = split_text_into_chunks(text, max_tokens)
            count = 1
            for chunk in chunks:
                chunk_vector = text_embedding(chunk)
                properties = {
                    "text": chunk,
                    "url": url
                }
                client.batch.add_data_object(properties, "Harness_Docs_Data_1500_Tokens", vector=chunk_vector)
@jphwang one thing I noticed: this error started occurring only once my class had 1200 vectors - everything was working fine before that (maybe some kind of limit? If so, how can we remove it?). Thought this might help with the issue.
Right, thanks.
So, I think this message:
Try reducing the batch size (currently 1) to a lower value
refers to the size of the current batch being tried, rather than the maximum size of the batch.
Based on that, it looks like the error happens with just one object in the batch, which seems odd. I’ll ask around to see if anyone has an idea.
We’ve noticed a couple of things:
The function passes one text per call - so each time the function is called, it will re-initialize a new batch process. Could you try refactoring the code like this?
def add_to_weaviate(batch, url, url_for_bucket_obj, text, max_tokens):
    if ("/category/" not in url) and ("first-gen" not in url):  # this is just some filtering
        chunks = split_text_into_chunks(text, max_tokens)
        count = 1
        for chunk in chunks:
            chunk_vector = text_embedding(chunk)
            properties = {
                "text": chunk,
                "url": url
            }
            batch.add_data_object(properties, "Harness_Docs_Data_1500_Tokens", vector=chunk_vector)

with client.batch as batch:
    batch.batch_size = 200
    ...code that creates the texts....
    add_to_weaviate(batch, ...)
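For completeness, a minimal sketch of how the calling side could look (the docs iterable of (url, text) pairs, the None for url_for_bucket_obj, and the max_tokens value are placeholder assumptions, not from your code). Recent versions of the Python client can also set the batch size up front via client.batch.configure():

# Sketch only: `docs` is a hypothetical iterable of (url, text) pairs.
client.batch.configure(batch_size=200)  # alternative to setting batch.batch_size inside the context manager
with client.batch as batch:
    for url, text in docs:
        # url_for_bucket_obj and max_tokens are assumed values here
        add_to_weaviate(batch, url, None, text, max_tokens=1500)

This way the batch context is entered once, and objects from every call accumulate into the same batch until it flushes at the configured size.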
But still, that shouldn’t necessarily create timeouts.
- Does the error always happen at the exact same point?
- How are you running Weaviate (what version, where is it running, etc.), and are you able to take a look at the logs?
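If you’re not sure which version the server is running, the Python client can report it; for example (assuming client is your connected weaviate.Client):

# get_meta() returns the server's metadata; the "version" key holds the running Weaviate version.
print(client.get_meta()["version"])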
Note to others for future reference - the user had downgraded from 1.20 to 1.19 in this instance.
Downgrading is not always supported - such as in this case, from 1.20 to 1.19. Please check the documentation, for example our release blogs, for such information.