Unable to Run Backup Process w/ Python Client

Description

I’m using the Weaviate Python client to run a backup job that dumps files into S3. The collection uses a replicationConfig with a factor of 2 on a 2-node cluster, and I get the following error message:

weaviate.exceptions.UnexpectedStatusCodeError: Backup creation! Unexpected status code: 422, with response body: {'error': [{'message': 'node {"node1" "XX.XX.XX.XX:XXXX"}: cannot commit : class MyClass doesn\'t exist'}]}.

I can run inserts and query against the data, but I am unable to run a backup because it doesn’t think the data exists on the second node.

Any ideas what could be wrong with this setup? Possible connection issue?
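For context, the client’s backup call ultimately issues a POST against the `/v1/backups/{backend}` REST endpoint, so a minimal sketch of the request I’m effectively making looks like this. The backup id and class name below are placeholders, not my real values:

```python
import json

# Build the payload equivalent to client.backup.create(...) so the same
# request can be retried directly with curl. "weekly-backup" and "MyClass"
# are placeholder names for illustration only.
def build_backup_request(backup_id: str, include: list[str]) -> tuple[str, str]:
    path = "/v1/backups/s3"  # the backend ("s3") goes in the URL path
    body = json.dumps({"id": backup_id, "include": include})
    return path, body

path, body = build_backup_request("weekly-backup", ["MyClass"])
print(path)   # /v1/backups/s3
print(body)   # {"id": "weekly-backup", "include": ["MyClass"]}
```

Hitting this endpoint with curl reproduces the same 422 as the client call, so it doesn’t look client-specific.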

Server Setup Information

  • Weaviate Server Version: 1.26.1
  • Deployment Method: Docker on AWS ECS
  • Multi Node? Number of Running Nodes: 2
  • Client Language and Version: 4.6.5
  • Multitenancy?: Yes

Any additional Information

Only other problem I found was this log:

{"action":"create_backup","level":"error","msg":"coordinator aborted operation","time":"2024-07-23T21:18:29Z"}

I also found these extra logs from inserts, but nothing is erroring out there.

{"got":0,"level":"debug","msg":"wait for update version","time":"2024-07-23T21:14:51Z","want":81}

Hi @dhanshew72 !!

That’s strange.

In your http://localhost:8080/v1/nodes?output=verbose output, can you see your 2 nodes correctly?

Here’s what I got

{
  "nodes": [
    {
      "batchStats": {"queueLength": 0, "ratePerSecond": 171},
      "gitHash": "6fd2432",
      "name": "node0",
      "shards": [
        {"class": "ClassName", "compressed": false, "loaded": true, "name": "shard_0", "objectCount": 1080, "vectorIndexingStatus": "READY", "vectorQueueLength": 0},
        {"class": "ClassName", "compressed": false, "loaded": true, "name": "shard_1", "objectCount": 88, "vectorIndexingStatus": "READY", "vectorQueueLength": 0}
      ],
      "stats": {"objectCount": 1168, "shardCount": 2},
      "status": "HEALTHY",
      "version": "1.26.1"
    },
    {
      "batchStats": {"queueLength": 0, "ratePerSecond": 0},
      "gitHash": "6fd2432",
      "name": "node1",
      "shards": null,
      "stats": {"objectCount": 0, "shardCount": 0},
      "status": "HEALTHY",
      "version": "1.26.1"
    }
  ]
}
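A quick way to sanity-check that response: with a replication factor of 2 on 2 nodes, every node should report shards for the class. Summarizing the payload per node (a trimmed copy of the response is inlined here for illustration) shows node1 holding nothing, which matches the coordinator’s “class doesn’t exist” error:

```python
import json

# Trimmed copy of the /v1/nodes?output=verbose response for illustration.
raw = '''{"nodes":[
  {"name":"node0","shards":[{"class":"ClassName","name":"shard_0","objectCount":1080},
                            {"class":"ClassName","name":"shard_1","objectCount":88}],
   "stats":{"objectCount":1168,"shardCount":2},"status":"HEALTHY"},
  {"name":"node1","shards":null,
   "stats":{"objectCount":0,"shardCount":0},"status":"HEALTHY"}]}'''

for node in json.loads(raw)["nodes"]:
    shards = node["shards"] or []  # "shards" is null on node1
    print(f"{node['name']}: {len(shards)} shards, {node['stats']['objectCount']} objects")
# node0: 2 shards, 1168 objects
# node1: 0 shards, 0 objects
```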

Seems like it’s just not writing to the other node, which is shocking because all the other logs show the connection is working.

Hmm, I downgraded to version 1.24, and curling that endpoint again showed the correct output.