Weaviate Spark Connector & Async Indexing

Description

We have been indexing into Weaviate using batch writes from the Spark connector 1.3.2. Recently, we came across Weaviate's async indexing capability. We turned it on with the environment variable ASYNC_INDEXING=true, but during batch writes the number of indexed vectors in the collection still increased linearly. We also observed that the queue size kept increasing.

Can someone please explain how Weaviate's async indexing behaves with Spark batch writes? Or is it better to fall back to the Python client's batch writes with dynamic server load handling?

Good morning @Arindom_Bora,

With async indexing enabled, Weaviate puts incoming objects in a queue and indexes them in the background, so there is a delay before objects become available for vector search. The queue size reflects the objects still waiting to be indexed, and it is normal for it to grow during heavy imports.
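
If it helps to watch the queue while an import runs, here is a minimal sketch that sums the pending queue length per collection from the verbose node status. It assumes the `/v1/nodes?output=verbose` endpoint reports `class` and `vectorQueueLength` for each shard, so please check the field names against your Weaviate version. The helper names and `base_url` are placeholders, not part of any client.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def queue_length_by_collection(nodes_payload):
    """Sum the pending vector-queue length per collection across all nodes and shards."""
    totals = defaultdict(int)
    for node in nodes_payload.get("nodes", []):
        # "shards" may be missing or null when verbose output is not requested.
        for shard in node.get("shards") or []:
            # Assumed field names: "class" and "vectorQueueLength".
            name = shard.get("class", "unknown")
            totals[name] += int(shard.get("vectorQueueLength", 0))
    return dict(totals)


def fetch_nodes(base_url="http://localhost:8080"):
    """Fetch the verbose node status; base_url is a placeholder for your instance."""
    with urlopen(f"{base_url}/v1/nodes?output=verbose") as resp:
        return json.load(resp)
```

Calling `queue_length_by_collection(fetch_nodes())` every few seconds during the Spark job should show the queue growing while writes outpace indexing, then draining once the writes stop.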

Regarding the Spark connector, I have not come across any evidence, good or bad, of how it behaves with ASYNC_INDEXING. However, async indexing is a server-side feature and should work regardless of the client, as long as objects are being written to Weaviate.

The main difference with the Python client is that it supports additional batching features, such as server-side automatic batching, which dynamically adjusts batch sizes based on server feedback for optimal throughput and stability.

If you need more control or feedback-driven batching, consider the Python client.
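
As a rough sketch of what that could look like, assuming the v4 Python client and its `collection.batch.dynamic()` context manager: the function below takes an already opened collection handle and returns whatever the client recorded as failed. `import_rows`, `run_example`, the `Article` collection, and the row layout are hypothetical names for illustration only.

```python
def import_rows(collection, rows):
    """Write rows through the client's dynamic batcher and return any failed objects."""
    # The context manager flushes remaining objects when the block exits.
    with collection.batch.dynamic() as batch:
        for row in rows:
            # "vector" is optional; omit it if a vectorizer module is configured.
            batch.add_object(properties=row["properties"], vector=row.get("vector"))
    # Failures are collected by the client rather than raised per object.
    return collection.batch.failed_objects


def run_example():
    """Hypothetical usage against a local instance; not called automatically."""
    import weaviate  # requires the weaviate-client package (v4)

    client = weaviate.connect_to_local()
    try:
        rows = [{"properties": {"title": "hello"}, "vector": [0.1, 0.2, 0.3]}]
        failed = import_rows(client.collections.get("Article"), rows)
        print(f"{len(failed)} objects failed")
    finally:
        client.close()
```

Checking the returned failures after each run is worth doing, since batch errors are reported there instead of stopping the import.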

Best regards,

Mohamed Shahin
Weaviate Support Engineer
(Ireland, UTC±00:00/+01:00)


Thank you @Mohamed_Shahin. Let me try the Python client as well.
