Spark Connector has errored json/gson after certain batch size

Lakshya_Bakshi · November 14, 2023, 7:08am

Hi all,

I’m using the Spark connector to import nearly 200M records. I’d like to use bigger batches and take advantage of the asynchronous importing available from Weaviate version 1.22, but the Spark connector seems to have trouble handling batch sizes above 200. Specifically, when going beyond 200, I often see errors like the following:

reason=ExceptionFailure(io.weaviate.spark.WeaviateResultError,error getting result and no more retries left. Error from Weaviate: [WeaviateErrorMessage(message=java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $, throwable=com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $), WeaviateErrorMessage(message=Failed ids: 42946687-9c7b-5a99-b5a5-60f2216e894d,...

Any help would be appreciated!

antas-marcin · November 16, 2023, 11:33am

Hello, are you importing data into a WCS sandbox, or is it your own Weaviate setup?

Your exception message

Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $

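suggests that there was a timeout at the load balancer (LB). The client expected a JSON object in the response but received plain text instead, which is typically what happens when the LB answers in place of Weaviate. It looks like you’re overwhelming your WCS instance, and that’s why you get this error from the LB.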
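To illustrate the failure mode described above, here is a minimal Python sketch of what happens when a client expects a JSON object but the body is plain text. It is an analogy, not the connector's actual code (the connector parses with Gson on the JVM). The function name `parse_response` and both sample bodies are hypothetical.

```python
import json


def parse_response(body: str) -> dict:
    """Parse a response body that is expected to be a JSON object."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError as exc:
        # Plain text (for example a proxy's timeout message) is not JSON at all.
        raise ValueError(f"non-JSON response body: {body[:80]!r}") from exc
    if not isinstance(parsed, dict):
        # Valid JSON, but a string/array/number where an object was expected.
        raise ValueError(f"expected a JSON object, got {type(parsed).__name__}")
    return parsed


# Hypothetical bodies: one well-formed JSON object, one plain-text timeout
# message of the kind a load balancer might return.
ok_body = '{"result": {"status": "SUCCESS"}}'
timeout_body = "upstream request timeout"

print(parse_response(ok_body))
try:
    parse_response(timeout_body)
except ValueError as err:
    print("parse failed:", err)
```

If you can capture the raw response body (or the load balancer's access log) for a failed batch, checking whether it is JSON at all is a quick way to tell a proxy-level timeout apart from an error returned by Weaviate itself.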

Lakshya_Bakshi · November 16, 2023, 6:49pm

Hi Antas,

Thank you for the reply. I’m running self-hosted Weaviate, with two transformer-inference containers, each running on a GPU, and 4 nodes, each running one Weaviate container as a shard. My load balancer timeout is set to 10 min, and I’m surprised that I’m overloading my instance with what feels like a marginal change (200 vs 250). I will explore resource planning options and see where the error could be coming from.

antas-marcin · November 23, 2023, 8:32am

Please do! I think that you are overwhelming your current Weaviate infrastructure, and that’s why you are getting these kinds of errors when using the Spark connector. I think that the transformer-inference service might be the bottleneck here.
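For anyone reproducing the setup discussed in this thread, here is a hedged PySpark-style sketch of keeping the batch size explicit and easy to tune while investigating the bottleneck. The option names `scheme`, `host`, `className` and `batchSize` are taken from the connector's README as I recall it, so please verify them against your connector version. The helper names, the host `localhost:8080` and the class name `Article` are hypothetical.

```python
def weaviate_write_options(host, class_name, batch_size=200, scheme="http", extra=None):
    """Build the option map for the Weaviate Spark connector writer."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    options = {
        "scheme": scheme,
        "host": host,
        "className": class_name,
        "batchSize": str(batch_size),
    }
    # Any retry or timeout options documented for your connector version
    # can be passed through `extra`; none are assumed here.
    options.update(extra or {})
    return options


def write_to_weaviate(df, options):
    """Append a DataFrame to Weaviate using the given writer options."""
    writer = df.write.format("io.weaviate.spark.Weaviate")
    for key, value in options.items():
        writer = writer.option(key, value)
    writer.mode("append").save()


# Hypothetical values: start at the batch size that is known to work (200)
# and raise it in small steps while watching the inference containers.
opts = weaviate_write_options("localhost:8080", "Article", batch_size=200)
print(opts)
```

Calling `write_to_weaviate(df, opts)` with a real DataFrame performs the write. Keeping the options in one place makes it easier to compare runs at 200 and 250 while monitoring the transformer-inference service.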
