I’m trying to do a batch import of ~300k objects to my WCS vector DB. The cluster has asynchronous indexing enabled, and I’m using the v4 client.
try:
    # with client.batch.fixed_size(batch_size=300, concurrent_requests=10, consistency_level=ConsistencyLevel.QUORUM) as batch:
    with client.batch.dynamic(consistency_level=ConsistencyLevel.QUORUM) as batch:
        batch_start_time = time.time()
        for data in final_users:
            if len(str(data)) > 25000:
                print(f"Data too long!, userId: {data['userId']}, name: {data['name']}")
                continue
            batch.add_object(
                collection="CollectionName",
                properties={
                    ... # properties here as a dict, like:
                    # "name": data['name']
                    # ....
                }
            )
            if batch.number_errors > 100:
                print(f"Batch failed with {batch.number_errors} errors!")
                break
            else:
                counter += 1
                if counter % interval == 0:
                    batch_end_time = time.time()
                    print(f"Batch {counter}/{len(final_users)} done in {batch_end_time - batch_start_time:.2f} seconds, with {batch.number_errors} errors!")
                    if counter % 8000 == 0:
                        print("Sleeping for 100 seconds...")
                        time.sleep(100)
                    batch_start_time = time.time()  # Reset the start time for the next batch
finally:
    client.close()
I always get the following errors, which is really annoying:
ErrorObject(message='update vector: connection to: OpenAI API failed with status: 503 error: Service Unavailable.',...
ErrorObject(message="WeaviateBatchError('Query call with protocol GRPC batch failed with message Received http2 header with status: 502.')...
WeaviateBatchError: Query call with protocol GRPC batch failed with message Deadline Exceeded.
Here is how I am connecting to my cluster:
client = weaviate.connect_to_wcs(
    cluster_url="71b8fuq1res4bsprkp4gjq.c0-1.us-east1.gcp.weaviate.cloud",
    auth_credentials=weaviate.auth.AuthApiKey(WEAVIATE_AUTH_KEY),
    headers={
        'X-OpenAI-Api-key': OPENAI_KEY,
        'X-Cohere-Api-key': COHERE_KEY
    },
    additional_config=AdditionalConfig(
        connection=ConnectionConfig(
            session_pool_connections=30,
            session_pool_maxsize=200,
            session_pool_max_retries=3,
        ),
        timeout=(60, 180),
    )
)
The Weaviate docs have very little on error handling for batch imports. Is there a way I could fix these errors in particular?
Weaviate Client Version: 4.5.1
Weaviate Server version: 1.24.12
I’m using OpenAI’s text-embedding-3-large model for vectorization.
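Since the 503, 502 and Deadline Exceeded errors all look transient, one workaround I'm considering is to collect the failed objects and retry only those with exponential backoff. Below is a minimal sketch of that idea, not tied to the Weaviate client: `insert_fn`, `retry_with_backoff`, `is_transient` and the marker strings are hypothetical names I made up for illustration, and `insert_fn` is assumed to take a list of items and return `(item, error_message)` pairs for the ones that failed.

```python
import random
import time

# Substrings taken from the error messages above that suggest a transient failure.
TRANSIENT_MARKERS = ("503", "502", "Deadline Exceeded")


def is_transient(message: str) -> bool:
    """Return True if the error message looks like a temporary failure."""
    return any(marker in message for marker in TRANSIENT_MARKERS)


def retry_with_backoff(insert_fn, items, max_attempts=5, base_delay=1.0,
                       sleep=time.sleep):
    """Retry transient failures with exponential backoff plus jitter.

    insert_fn is a hypothetical callable: it receives a list of items and
    returns a list of (item, error_message) tuples for the failed ones.
    Returns (still_pending, permanent_failures).
    """
    pending = list(items)
    permanent = []
    for attempt in range(max_attempts):
        if not pending:
            break
        failures = insert_fn(pending)
        # Non-transient errors are not worth retrying; set them aside.
        permanent.extend((item, msg) for item, msg in failures
                         if not is_transient(msg))
        pending = [item for item, msg in failures if is_transient(msg)]
        if pending and attempt < max_attempts - 1:
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            sleep(delay)
    return pending, permanent
```

In my case `insert_fn` would wrap the batch import and build its return value from whatever the client reports as failed after the batch context exits (as far as I can tell that is `client.batch.failed_objects`, but I have not verified how it behaves in 4.5.1).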