AWS ROLLBACK_FAILED

Hello everyone, I subscribed to Weaviate on AWS and followed the Weaviate tutorial, but I got the following error:
Status
ROLLBACK_FAILED
Status reason
The following resource(s) failed to delete: [weaviatebasamanifestweaviatebasaServiceAccountResource30854DF1].
Please help with that

Hi @hussam1030. Would you be able to send us as much detail as you can to support@weaviate.io?

Would you please include the full error and the step within the stack where it happened, so we can investigate what exactly times out and increase the timeout or reorder things? Any screenshots will be very useful too.

Hi, I have been having the same error too! Using a root account to create the stack following the tutorial.

Here is the error message: Received response status [FAILED] from custom resource. Message returned: TimeoutError: {"state":"TIMEOUT","reason":"Waiter has timed out"} at checkExceptions (/var/runtime/node_modules/@aws-sdk/node_modules/@smithy/util-waiter/dist-cjs/index.js:59:26) at waitUntilFunctionActiveV2 (/var/runtime/node_modules/@aws-sdk/client-lambda/dist-cjs/index.js:5933:49) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) at async defaultInvokeFunction (/var/task/outbound.js:1:875) at async invokeUserFunction (/var/task/framework.js:1:2192) at async onEvent (/var/task/framework.js:1:369) at async Runtime.handler (/var/task/cfn-response.js:1:1573) (RequestId: 9323eefe-9915-49a5-8db3-1f2b82ae9b52)

Thanks @Qiwen_Li - I will pass this on to the team. Sorry for the inconvenience caused.

Hi @Qiwen_Li - would you be able to help us out by providing some more details?

CloudFormation stacks can fail for various reasons, such as missing permissions or timeouts. To streamline the support process, we need the following information:

Receiving timeouts

If you receive a timeout on creation, please give it a second try. From time to time, complex stacks may fail because processes take longer than usual during creation.

AWSSupport-TroubleshootCFNCustomResource

If your problem persists and you have checked your quotas, the next step is to use AWSSupport-TroubleshootCFNCustomResource.

The AWSSupport-TroubleshootCFNCustomResource runbook helps diagnose why an AWS CloudFormation stack failed in creating, updating, or deleting a custom resource. The runbook checks the service token used for the custom resource and the error message that was returned. After reviewing the details for the custom resource, the runbook output provides an explanation of the stack behavior and troubleshooting steps for the custom resource.
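If you prefer to start the runbook from a script rather than the console, here is a minimal sketch using boto3's `start_automation_execution`. The parameter names `StackName` and `AutomationAssumeRole` are my assumption of what the runbook expects, so please check them against the runbook's parameter list in the Systems Manager console before running. The stack name in the usage note is a placeholder.

```python
from typing import Dict, List, Optional


def build_runbook_request(stack_name: str,
                          assume_role_arn: Optional[str] = None) -> Dict[str, object]:
    """Build keyword arguments for ssm.start_automation_execution."""
    # Automation parameters are passed as a mapping of name -> list of strings.
    # "StackName" is an assumed parameter name; verify it in the runbook details.
    parameters: Dict[str, List[str]] = {"StackName": [stack_name]}
    if assume_role_arn:
        # "AutomationAssumeRole" is likewise an assumed, optional parameter name.
        parameters["AutomationAssumeRole"] = [assume_role_arn]
    return {
        "DocumentName": "AWSSupport-TroubleshootCFNCustomResource",
        "Parameters": parameters,
    }


def start_runbook(stack_name: str, region: str = "us-east-1") -> str:
    """Start the runbook and return the automation execution ID."""
    import boto3  # imported lazily so the helper above works without boto3

    ssm = boto3.client("ssm", region_name=region)
    response = ssm.start_automation_execution(**build_runbook_request(stack_name))
    return response["AutomationExecutionId"]
```

Calling `start_runbook("my-weaviate-stack")` needs AWS credentials that are allowed to run Systems Manager Automation. The returned execution ID can then be looked up in the Automation console to read the runbook output.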

Please follow the steps listed in Troubleshooting Cloudformation stack failures | AWS re:Post for first insights.

Gather more insights via CloudTrail

Check the AWS CloudTrail logs
If the resource doesn’t show any errors in its corresponding console, then use AWS CloudTrail logs to troubleshoot the issue. For information on viewing CloudTrail logs, see Viewing events with CloudTrail Event history.

  1. Open the CloudFormation console.
  2. In the navigation pane, choose Stacks, and then select the stack that’s in a stuck state.
  3. Choose the Resources tab.
  4. In the Resources section, refer to the Status column. Find any resources that are stuck in the create, update, or delete process. Note: These resources might be in the state CREATE_IN_PROGRESS, UPDATE_IN_PROGRESS, or DELETE_IN_PROGRESS.
  5. Choose the Events tab, and then note the timestamp when CloudFormation initialized the creation of that stuck resource.
  6. Open the CloudTrail console.
  7. In the navigation pane, choose Event history.
  8. For Time range, enter the date and time for the timestamp that you noted in step 5 for the starting time (From). For the ending time (To), enter a date and time that’s five minutes past the starting time. Note: For example, suppose that CloudFormation initialized the creation of your stuck resource at 9:00 AM on 2024-01-01. In this case, enter 09:00 AM on 2024-01-01 as your starting time and 9:05 AM on 2024-01-01 as your ending time.
  9. Choose Apply.
  10. In the returned list of events, find the API calls that are related to the create or update API call of your resource.
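The console steps above can also be approximated in a script. This is only a sketch, assuming boto3 is installed and credentials are configured. The helper names (`first_in_progress`, `five_minute_window`, `lookup_cloudtrail`) are made up for illustration, and you pass in the logical ID of the stuck resource that you found on the Resources tab.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

IN_PROGRESS = {"CREATE_IN_PROGRESS", "UPDATE_IN_PROGRESS", "DELETE_IN_PROGRESS"}


def first_in_progress(stack_events: Iterable[dict], logical_id: str) -> Optional[datetime]:
    """Earliest time the given resource entered an *_IN_PROGRESS state (step 5)."""
    times = [
        event["Timestamp"]
        for event in stack_events
        if event["LogicalResourceId"] == logical_id
        and event["ResourceStatus"] in IN_PROGRESS
    ]
    return min(times) if times else None


def five_minute_window(start: datetime) -> Tuple[datetime, datetime]:
    """From/To range for CloudTrail Event history (step 8)."""
    return start, start + timedelta(minutes=5)


def lookup_cloudtrail(stack_name: str, logical_id: str,
                      region: str = "us-east-1") -> List[dict]:
    """Fetch CloudTrail events from the five minutes after the resource started."""
    import boto3  # imported lazily so the helpers above work without boto3

    cfn = boto3.client("cloudformation", region_name=region)
    stack_events: List[dict] = []
    for page in cfn.get_paginator("describe_stack_events").paginate(StackName=stack_name):
        stack_events.extend(page["StackEvents"])

    start = first_in_progress(stack_events, logical_id)
    if start is None:
        return []

    begin, end = five_minute_window(start)
    trail = boto3.client("cloudtrail", region_name=region)
    found: List[dict] = []
    for page in trail.get_paginator("lookup_events").paginate(StartTime=begin, EndTime=end):
        found.extend(page["Events"])
    return found
```

Each returned event has an `EventName` and an `EventTime`, which is usually enough to spot the create or update call for the resource (step 10). The time window is not filtered by resource, so expect unrelated API calls in the list as well.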

Provide Weaviate information for support

To streamline the support process, we need the following:

  1. The AWS Region you tried to deploy to, as well as all settings you used.
  2. A screenshot of the stack error in the CloudFormation console.
  3. The full error message, if possible.
  4. The full CloudFormation deployment logs. How to find the logs is explained in the following link: View CloudFormation Logs in the Console
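If copying events out of the console is tedious, a short script can pull the failed events and their full status reasons as text. This is a rough sketch, assuming boto3 and configured credentials. `failed_event_lines` and `collect_failures` are hypothetical helper names, not part of any Weaviate tooling.

```python
from datetime import datetime
from typing import Iterable, List


def _timestamp(event: dict) -> datetime:
    """Sort key: the event time reported by CloudFormation."""
    return event["Timestamp"]


def failed_event_lines(stack_events: Iterable[dict]) -> List[str]:
    """Format every *_FAILED stack event as one line, oldest first."""
    lines = []
    for event in sorted(stack_events, key=_timestamp):
        if event["ResourceStatus"].endswith("_FAILED"):
            reason = event.get("ResourceStatusReason", "(no reason given)")
            lines.append(
                f'{event["Timestamp"].isoformat()} {event["LogicalResourceId"]} '
                f'{event["ResourceStatus"]}: {reason}'
            )
    return lines


def collect_failures(stack_name: str, region: str = "us-east-1") -> List[str]:
    """Download all stack events and return the formatted failures."""
    import boto3  # imported lazily so the formatter above works without boto3

    cfn = boto3.client("cloudformation", region_name=region)
    stack_events: List[dict] = []
    for page in cfn.get_paginator("describe_stack_events").paginate(StackName=stack_name):
        stack_events.extend(page["StackEvents"])
    return failed_event_lines(stack_events)
```

Printing the result of `collect_failures("my-weaviate-stack")` (the stack name is a placeholder) gives text that can be pasted into a support email alongside the region and screenshots.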

Hi, I can only put one screenshot per post so I am gonna split this up.
This is the Resources tab screenshot:

This is the stack and event screenshot:

This is the AWSSupport-TroubleshootCFNCustomResource screenshot:

AWS region: us-east-1
Error message: Received response status [FAILED] from custom resource. Message returned: TimeoutError: {"state":"TIMEOUT","reason":"Waiter has timed out"} at checkExceptions (/var/runtime/node_modules/@aws-sdk/node_modules/@smithy/util-waiter/dist-cjs/index.js:59:26) at waitUntilFunctionActiveV2 (/var/runtime/node_modules/@aws-sdk/client-lambda/dist-cjs/index.js:5933:49) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) at async defaultInvokeFunction (/var/task/outbound.js:1:875) at async invokeUserFunction (/var/task/framework.js:1:2192) at async onEvent (/var/task/framework.js:1:369) at async Runtime.handler (/var/task/cfn-response.js:1:1573) (RequestId: 9323eefe-9915-49a5-8db3-1f2b82ae9b52)

Any update? I am getting the same error and it’s quite frustrating. “TimeoutError” and “Rollback failed”. Tried to redeploy many times but didn’t help. Everything works well with Weaviate outside of AWS but this containerized service on AWS doesn’t seem to work…

Going to +1 on this. I'm getting the same error trying to run Weaviate on AWS. Here is the screenshot:

Hi @Qiwen_Li, I hope you're having a good week!

I am still investigating this issue. In the meantime, are you able to share the .py scripts as well, if they are not confidential?