Parallel Batch Operations and Consistency Level

Hello,

I am currently facing an issue while attempting to execute two batch operations in parallel, each with a batch size of 256. I’ve set the consistency level to “ONE” in the hopes of achieving parallel insertion. However, the response time for these parallel operations is identical to that of executing two consecutive batch operations.

Additionally, I tried explicitly setting the replicas using the following while creating the class:

"replicationConfig": {
  "factor": 2
}
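For reference, here is a minimal sketch of where these two settings live on the client side. It assumes the class is created through the REST schema endpoint and that the write consistency level is passed as the `consistency_level` query parameter on `/v1/batch/objects`. The class name `Article` and the local `BASE_URL` are hypothetical placeholders, not taken from the setup described above.

```python
from urllib.parse import urlencode

# Assumption: a local Weaviate instance; replace with the real address.
BASE_URL = "http://localhost:8080"


def class_definition(name, factor):
    """Build a class payload that carries the replication factor."""
    return {"class": name, "replicationConfig": {"factor": factor}}


def batch_objects_url(base_url, consistency_level):
    """Build the batch endpoint URL with the requested write consistency."""
    query = urlencode({"consistency_level": consistency_level})
    return f"{base_url}/v1/batch/objects?{query}"
```

The replication factor belongs to the class definition, while the consistency level is chosen per request, so the two are set in different places.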

As per the documentation:

If the write is not set to ALL (possible from v1.18), writing data is asynchronous from the user’s perspective.

I’m uncertain if I am overlooking something in my approach. Any guidance or suggestions you can provide would be greatly appreciated.

Thank you.

Hi @moaazzaki! Welcome to our community :hugs:

What version of Weaviate are you running?

Are you running a single batch with 2 workers or running two batches separately?

Increasing the replication factor will not improve import times.

To improve import performance, you can:

  1. Use the new Python v4 client, as it leverages a gRPC connection (best used with the latest Weaviate server version).
  2. Implement error handling, so you know which objects failed to import.
  3. Opt for fewer large machines rather than more small ones to minimize network latency.
  4. Experiment with ASYNC INDEXING (experimental).
  5. When importing in batches, increase the batch size and the number of workers incrementally, and monitor the CPU usage of your client.
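As a rough illustration of points 2 and 5, here is a minimal sketch for a client that talks to the REST batch endpoint directly. It assumes the batch response is a list with one entry per object, and that a failed object carries its messages under `result.errors.error[].message`; please check that shape against the response from your server version. The helper names `chunked` and `collect_failures` are made up for this example.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def collect_failures(batch_response):
    """Return (object id, error messages) for every object that failed.

    Assumed shape per entry: {"id": ..., "result": {"errors": {"error": [...]}}}.
    Entries without an "errors" field are treated as successful.
    """
    failures = []
    for item in batch_response:
        errors = (item.get("result") or {}).get("errors")
        if errors:
            messages = [e.get("message", "") for e in errors.get("error") or []]
            failures.append((item.get("id"), messages))
    return failures
```

With `chunked` you can rerun the same import at, say, 64, 128, 256 and 512 objects per request and compare timings, while `collect_failures` tells you which objects to retry after each request.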

Let me know if this helps!


Hi @DudaNogueira, thanks for the welcome!

I tried the above on two versions of Weaviate:

More details about your questions:

  1. My implementation is based on async aiohttp calls, as the Python client is currently synchronous AFAIK, based on the discussion here.

  2. I send two batches (two API requests) in parallel separately.

  3. I tried going incrementally from batch size 64 to 1024 and didn’t notice much gain or loss.
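One way to check whether two requests really overlap on the client side is to time the same coroutine both sequentially and concurrently. The sketch below is only a measurement harness: `fake_send_batch` is a hypothetical stand-in that sleeps instead of calling the server, and it would need to be swapped for the real aiohttp request.

```python
import asyncio
import time


async def fake_send_batch(objects, latency_s=0.05):
    """Hypothetical stand-in for one batch request; it only sleeps."""
    await asyncio.sleep(latency_s)
    return len(objects)


async def time_sequential(batches, send):
    """Send the batches one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for batch in batches:
        await send(batch)
    return time.perf_counter() - start


async def time_concurrent(batches, send):
    """Send all batches at once with asyncio.gather and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(send(batch) for batch in batches))
    return time.perf_counter() - start


async def compare(batches, send=fake_send_batch):
    """Return (sequential_seconds, concurrent_seconds) for the same batches."""
    sequential = await time_sequential(batches, send)
    concurrent = await time_concurrent(batches, send)
    return sequential, concurrent
```

With the stand-in, the concurrent run takes roughly half as long as the sequential one. If the real request shows no such gap, the requests overlap on the client but the time is probably being spent on the server side rather than in the client loop.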

I haven’t tried the async indexing option yet. It looks promising, so I’ll give it a try and hopefully it solves the issue. Thanks for your help!
