"attempt to join and failed" when using PERSISTENCE_DATA_PATH env with efs storage

Description

Hi guys,

I am trying to set up Weaviate as a single node in AWS ECS Fargate. I want to use EFS to store the Weaviate data so that it survives ECS task restarts.
However, when I set "PERSISTENCE_DATA_PATH" = "/var/lib/weaviate" as an environment variable in the task definition, I repeatedly get the error "attempted to join and failed" after task startup until the task ultimately fails.

It seems to work again after I delete all contents from the EFS, but only until the task gets restarted; then the error comes up again.

This drives me crazy, and I would be really happy if you could help me. Below you will find my configuration.

Server Setup Information

  • Weaviate Server Version: 1.29
  • Deployment Method:
  • Multi Node? No
  • Client Language and Version:
  • Multitenancy?: No

Any additional Information

This is my ECS Fargate task definition for the ECR container (which contains an unmodified image of Weaviate 1.29):

{
  "family": "weaviate-task",
  "containerDefinitions": [
    {
      "name": "weaviate",
      "image": "xxx.dkr.ecr.eu-west-1.amazonaws.com/weaviate:latest",
      "cpu": 0,
      "memoryReservation": 2048,
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 8080,
          "protocol": "tcp"
        },
        {
          "containerPort": 50051,
          "hostPort": 50051,
          "protocol": "tcp"
        },
        {
          "containerPort": 8300,
          "hostPort": 8300,
          "protocol": "tcp"
        }
      ],
      "essential": true,
      "environment": [
        {
          "name": "AZURE_APIKEY",
          "value": "xxx"
        },
        {
          "name": "http_proxy",
          "value": "xxx:8080"
        },
        {
          "name": "no_proxy",
          "value": "xxx,localhost,127.0.0.1,xxx"
        },
        {
          "name": "ENABLE_MODULES",
          "value": "text2vec-azure-openai"
        },
        {
          "name": "https_proxy",
          "value": "xxx"
        },
        {
          "name": "PERSISTENCE_DATA_PATH",
          "value": "/var/lib/weaviate"
        },
        {
          "name": "DEPLOYMENT_ID",
          "value": "xxx"
        },
        {
          "name": "RESOURCE_NAME",
          "value": "xxx"
        }
      ],
      "mountPoints": [
        {
          "sourceVolume": "weaviate-efs-volume",
          "containerPath": "/var/lib/weaviate",
          "readOnly": false
        }
      ],
      "volumesFrom": [],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/weaviate-task",
          "mode": "non-blocking",
          "awslogs-create-group": "true",
          "max-buffer-size": "25m",
          "awslogs-region": "eu-west-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "systemControls": []
    }
  ],
  "executionRoleArn": "arn:aws:iam::xxx:role/ecsTaskExecutionRole",
  "networkMode": "awsvpc",
  "volumes": [
    {
      "name": "weaviate-efs-volume",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-xxx",
        "rootDirectory": "/"
      }
    }
  ],
  "placementConstraints": [],
  "requiresCompatibilities": [
    "FARGATE"
  ],
  "cpu": "1024",
  "memory": "3072",
  "runtimePlatform": {
    "cpuArchitecture": "X86_64",
    "operatingSystemFamily": "LINUX"
  },
  "enableFaultInjection": false
}

Here are the errors from CloudWatch that come up repeatedly until the container shuts down:

2025-03-03T21:35:02.599Z
{"build_git_commit":"35d800d","build_go_version":"go1.22.12","build_image_tag":"v1.29.0","build_wv_version":"1.29.0","level":"info","msg":"attempting to join","remoteNodes":["10.22.122.166:8300"],"time":"2025-03-03T21:35:02Z"}
2025-03-03T21:35:02.600Z
{"build_git_commit":"35d800d","build_go_version":"go1.22.12","build_image_tag":"v1.29.0","build_wv_version":"1.29.0","level":"info","msg":"attempted to join and failed","remoteNode":"10.22.122.166:8300","status":14,"time":"2025-03-03T21:35:02Z"}
2025-03-03T21:35:03.600Z
{"build_git_commit":"35d800d","build_go_version":"go1.22.12","build_image_tag":"v1.29.0","build_wv_version":"1.29.0","level":"info","msg":"attempting to join","remoteNodes":["10.22.122.166:8300"],"time":"2025-03-03T21:35:03Z"}
2025-03-03T21:35:03.601Z
{"build_git_commit":"35d800d","build_go_version":"go1.22.12","build_image_tag":"v1.29.0","build_wv_version":"1.29.0","level":"info","msg":"attempted to join and failed","remoteNode":"10.22.122.166:8300","status":14,"time":"2025-03-03T21:35:03Z"}
2025-03-03T21:35:04.601Z
{"build_git_commit":"35d800d","build_go_version":"go1.22.12","build_image_tag":"v1.29.0","build_wv_version":"1.29.0","level":"info","msg":"attempting to join","remoteNodes":["10.22.122.166:8300"],"time":"2025-03-03T21:35:04Z"}
2025-03-03T21:35:04.602Z
{"build_git_commit":"35d800d","build_go_version":"go1.22.12","build_image_tag":"v1.29.0","build_wv_version":"1.29.0","level":"info","msg":"attempted to join and failed","remoteNode":"10.22.122.166:8300","status":14,"time":"2025-03-03T21:35:04Z"}
2025-03-03T21:35:05.603Z
{"build_git_commit":"35d800d","build_go_version":"go1.22.12","build_image_tag":"v1.29.0","build_wv_version":"1.29.0","level":"info","msg":"attempting to join","remoteNodes":["10.22.122.166:8300"],"time":"2025-03-03T21:35:05Z"}
2025-03-03T21:35:05.603Z
{"build_git_commit":"35d800d","build_go_version":"go1.22.12","build_image_tag":"v1.29.0","build_wv_version":"1.29.0","level":"info","msg":"attempted to join and failed","remoteNode":"10.22.122.166:8300","status":14,"time":"2025-03-03T21:35:05Z"}
2025-03-03T21:35:06.603Z
{"build_git_commit":"35d800d","build_go_version":"go1.22.12","build_image_tag":"v1.29.0","build_wv_version":"1.29.0","level":"info","msg":"attempting to join","remoteNodes":["10.22.122.166:8300"],"time":"2025-03-03T21:35:06Z"}
2025-03-03T21:35:06.604Z
{"build_git_commit":"35d800d","build_go_version":"go1.22.12","build_image_tag":"v1.29.0","build_wv_version":"1.29.0","level":"info","msg":"attempted to join and failed","remoteNode":"10.22.122.166:8300","status":14,"time":"2025-03-03T21:35:06Z"}

Hi!

Are you aware of this in our docs?

Let me know if this helps!

Note to myself and others who are facing that issue:

The error was caused by my corporate proxy (which is required for internet access out of the VPC): Weaviate was sending the connection to its own node through the proxy. The solution was to add the IP range of the load balancer/ECS containers to the no_proxy environment variable.
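
For anyone landing here later, this is roughly what the changed entry in the "environment" list of the task definition could look like. It is only a sketch: 10.22.122.0/24 is a placeholder range that I guessed from the node IP in the logs above (10.22.122.166), so replace it with the actual CIDR of your ECS subnets. As far as I know, Go's standard proxy-from-environment handling accepts CIDR entries in no_proxy, but I have not checked how every code path in Weaviate reads it, so treat that as an assumption and verify it in your setup.

```json
{
  "name": "no_proxy",
  "value": "xxx,localhost,127.0.0.1,xxx,10.22.122.0/24"
}
```

After redeploying the task with this entry, the "attempting to join" request to the node's own address on port 8300 should no longer be routed through the proxy.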


hi @franz_hals !!

This is the kind of issue that is almost impossible to catch without this context!

Thanks for sharing! We really appreciate it!