Parallel Batch Operations and Consistency Level

moaazzaki · January 25, 2024, 5:57pm

Hello,

I am currently facing an issue while attempting to execute two batch operations in parallel, each with a batch size of 256. I’ve set the consistency level to “ONE” in the hopes of achieving parallel insertion. However, the response time for these parallel operations is identical to that of executing two consecutive batch operations.
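
For reference, the comparison described above can be reproduced from the client side with a small timing harness. This is only a minimal sketch: `send_batch` is a hypothetical stand-in that sleeps to simulate one batch request and is not a Weaviate client call, so it would need to be replaced with the real insert to measure anything meaningful.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send_batch(objects):
    """Hypothetical stand-in for one batch insert request.

    Assumption: replace the sleep with the real client call.
    """
    time.sleep(0.05)  # simulated server round trip
    return len(objects)


def timed(fn):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def run_sequential(batches):
    # One request after the other.
    return [send_batch(b) for b in batches]


def run_parallel(batches):
    # One thread per batch, so the requests can overlap.
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        return list(pool.map(send_batch, batches))


# Two batches of 256 objects each, as in the question.
batches = [[{"id": i} for i in range(256)] for _ in range(2)]
seq_result, seq_time = timed(lambda: run_sequential(batches))
par_result, par_time = timed(lambda: run_parallel(batches))
print(f"sequential: {seq_time:.3f}s, parallel: {par_time:.3f}s")
```

If the parallel time stays close to the sequential time with the real call in place, the requests are probably being serialized somewhere, either in the client or on the server.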

Additionally, I tried explicitly setting the replicas using the following while creating the class:

"replicationConfig": {
  "factor": 2
}

As per the documentation:

If the write is not set to ALL (possible from v1.18), writing data is asynchronous from the user’s perspective.

I’m uncertain if I am overlooking something in my approach. Any guidance or suggestions you can provide would be greatly appreciated.

Thank you.

DudaNogueira · January 30, 2024, 12:47pm

Hi @moaazzaki! Welcome to our community.

What version of Weaviate are you running?

Are you running a single batch with 2 workers or running two batches separately?

Increasing the replication factor will not improve import times.

What you can do to improve import performance:

Use the new Python v4 client, as it uses a gRPC connection (best used with the latest Weaviate server version).
Implement error handling, so you know which objects failed to import.
Opt for fewer large machines rather than more small ones to minimize network latency.
Experiment with ASYNC INDEXING (experimental).
When importing with batches, increase the batch size and the number of workers incrementally, and monitor the CPU usage of your client.
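
The last two suggestions (error handling and incremental tuning) can be sketched as a small sweep over batch sizes and worker counts. This is a rough illustration, not the client's API: `send_batch`, `chunked`, `import_all`, and the `bad` flag are all hypothetical names, and `send_batch` only simulates a request that returns the objects it failed to import.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def send_batch(objects):
    """Hypothetical stand-in for one batch request.

    Assumption: returns the objects that failed to import.
    """
    time.sleep(0.001)  # simulated round trip
    return [o for o in objects if o.get("bad")]


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def import_all(objects, batch_size, workers):
    """Send all objects in batches; return (elapsed_seconds, failed_objects)."""
    start = time.perf_counter()
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in submission order.
        for result in pool.map(send_batch, chunked(objects, batch_size)):
            failed.extend(result)
    return time.perf_counter() - start, failed


# Sample data: every 500th object is marked as failing.
objects = [{"id": i, "bad": i % 500 == 0} for i in range(2000)]

# Step through the settings and record time and failures for each one.
for batch_size, workers in product([64, 128, 256], [1, 2, 4]):
    elapsed, failed = import_all(objects, batch_size, workers)
    print(f"batch_size={batch_size} workers={workers} "
          f"elapsed={elapsed:.3f}s failed={len(failed)}")
```

With a real import in place of the stub, the point where larger batches or more workers stop reducing the elapsed time is a reasonable place to stop increasing them.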

Let me know if this helps!

moaazzaki · January 30, 2024, 4:30pm

Hi @DudaNogueira, thanks for the welcome!

I tried the above on two versions of Weaviate:

Docker image: semitechnologies/weaviate:1.23.2
AWS Marketplace Weaviate cluster
