Batch thread died unexpectedly

Environment:

  • weaviate 1.25.1
  • weaviate client 4.6.4
  • fastapi 0.111.0

When using import batching like so:

import uuid
from typing import Sequence

from weaviate import WeaviateClient


async def batch_save(facts: Sequence[CreateWithVector],
                     weaviate_class_name: str,
                     weaviate_session: WeaviateClient):
    # CreateWithVector is our own model holding `content` and `vector`.
    collection = weaviate_session.collections.get(weaviate_class_name)
    weaviate_uuid_list: list[str] = []  # IDs of the imported objects

    with collection.batch.dynamic() as batch:
        for fact in facts:
            fact_id = uuid.uuid4()
            batch.add_object(
                properties={
                    "fact": fact.content,
                },
                vector=fact.vector,
                uuid=fact_id,
            )
            weaviate_uuid_list.append(str(fact_id))

there is an exception: "Batch thread died unexpectedly".

Any help is appreciated! thanks!

Hi @aleks,

Welcome to our community! It’s great to have you here :hugs:

Have you considered capturing more information about this error by implementing error handling and logging during the batch process?

In the Weaviate Python client, the following exceptions are raised for various error conditions:

  • weaviate.exceptions.WeaviateConnectionError for failed connections.
  • weaviate.exceptions.WeaviateQueryError for failed queries.
  • weaviate.exceptions.WeaviateBatchError for failed batch operations.
  • weaviate.exceptions.WeaviateClosedClientError for operations on a closed client.

As an example, to help catch more of what happens during batch import:

import weaviate

# `client` is an already-connected WeaviateClient
try:
    collection = client.collections.get("NonExistentCollection")
    collection.query.fetch_objects(limit=2)
except weaviate.exceptions.WeaviateBaseError as e:
    print(f"Caught a Weaviate error: {e.message}")

You can review this module which defines the exceptions that can be raised by the client library.

Additionally, I would recommend upgrading your Weaviate to the latest v1.25.7; there have been improvements since v1.25.1.

Exception handling →

Error Handling →

Batching does not support being called within async functions. That is probably why it fails.


Thanks @Mohamed_Shahin for your help and suggestion, but unfortunately the exception thrown is a plain Exception and not one of the ones you indicated.

As mentioned by @Dirk, batching is not supported in an asynchronous, multithreaded environment.

Are there any plans to support it? Is there any workaround?

Any hint would be appreciated.

Thanks so much in advance.

Batching is using async behind the scenes already. Adding objects is non-blocking, and it automatically sends multiple concurrent requests. We do not think it makes sense to call this async.

If you want to do async batching yourself you can use data.insert_many(). The latest development version (4.7.0-rc.2) also contains an async client that you could try out.
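
For example, a minimal sketch of collecting the objects up front and inserting them in one insert_many() call (assuming the v4 DataObject helper and the fields from the original snippet):

import uuid

from weaviate.classes.data import DataObject

objects = [
    DataObject(
        properties={"fact": fact.content},
        vector=fact.vector,
        uuid=uuid.uuid4(),
    )
    for fact in facts
]

# insert_many() sends everything in one request and reports per-object errors.
result = collection.data.insert_many(objects)
if result.has_errors:
    print(result.errors)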

@Dirk thanks for your reply and details.
One aspect that is not clear to me: you said that batching does not support being called from async functions, but in the previous post you commented that batching already uses async behind the scenes. I am confused because I thought batching is not supported in async environments.

Maybe I am missing something :slight_smile:

Thanks for your support and kind regards.

Our implementation of batch uses async code to send batches.
So basically, if you call batch.add_object() you add objects to a queue, and background threads observe that queue and send batches. The internal send_batch function is async to allow for multiple concurrent requests without threading.
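
To illustrate the idea (a simplified sketch of the pattern described above, not the client's actual code): add_object() only puts objects on a queue, a background thread drains the queue, and groups of batches are sent concurrently on a private event loop.

import asyncio
import queue
import threading

object_queue: queue.Queue = queue.Queue()  # add_object() just enqueues, so callers never block

async def send_batch(batch: list) -> None:
    # Stand-in for the async HTTP request that ships one batch to Weaviate.
    await asyncio.sleep(0.1)

async def send_all(batches: list) -> None:
    # Several batches go out concurrently without extra threads.
    await asyncio.gather(*(send_batch(b) for b in batches))

def background_worker(batch_size: int = 100) -> None:
    # Background thread: wait for objects, drain whatever has queued up,
    # split it into batches, and send them on its own event loop.
    while True:
        item = object_queue.get()   # blocks until at least one object arrives
        if item is None:            # sentinel: stop the worker
            return
        pending = [item]
        while not object_queue.empty():
            pending.append(object_queue.get())
        batches = [pending[i:i + batch_size] for i in range(0, len(pending), batch_size)]
        asyncio.run(send_all(batches))

threading.Thread(target=background_worker, daemon=True).start()
object_queue.put({"fact": "example object"})  # the caller side is non-blocking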

Ok, thanks for the details @Dirk. But then, why does batching not work when called from async methods, using FastAPI as the web framework?

Thanks again for your support, patience and time. Really appreciated.

Kind regards

But then, why does batching not work when called from async methods, using FastAPI as the web framework?

I think (but I am not 100% sure) it is because we also create event loops inside of the batching.

using FastAPI as the web framework

I am not a FastAPI expert, but our batching algorithm is mainly aimed at long-running tasks, e.g. 1000+ objects, and is not thread-safe.

I would recommend either:

  • use data.insert()/insert_many() and build your own async wrapper around them (see the sketch after this list)
  • install weaviate-client==4.7.0-rc.2 and test our new async client (please not directly in production :slight_smile:). You can then do:
async with weaviate.use_async_with_local() as async_client:
    collection = async_client.collections.get(name)
    await collection.data.insert(...)  # or await collection.data.insert_many(...)
  • give us feedback if anything does not work as expected :slight_smile:
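
For the first option, a minimal sketch of such an async wrapper (assuming a list of DataObject items like in the insert_many() sketch earlier in the thread, and Python 3.9+ for asyncio.to_thread):

import asyncio

async def batch_save_async(collection, objects):
    # Run the blocking insert_many() in a worker thread so the FastAPI
    # event loop is never blocked while the import is in flight.
    return await asyncio.to_thread(collection.data.insert_many, objects)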

Hi @Dirk,
To my surprise, batching is working as expected with FastAPI and async methods. The problem was that I was running PyCharm in debug mode, and in that mode batching, async, and event loops are not good friends :slight_smile:

Thanks for your support.

I will keep you posted if anything comes up.

Kind regards,

To my surprise, batching is working as expected with FastAPI and async methods. The problem was that I was running PyCharm in debug mode, and in that mode batching, async, and event loops are not good friends :slight_smile:

I would be careful here - I think it often comes down to timing, and the debug mode might just have the “wrong” timing. Anything that changes the timing (a different query, unusual user input) could bring the error back.

@tsmith023 - you are more experienced with async, what do you think?

Yes, of course @Dirk, we will keep watching whether it works as expected. Anyway, I think there is a plan to release a Weaviate async client. Do you know when the release will be?

And let's see what @tsmith023 thinks :slight_smile:

Thanks a lot for your help!