Multi-node docker-compose deployment on multiple machines

Description

I’m having trouble setting up a multi-node deployment using docker-compose across different machines. I have two servers, server_1 and server_2, and I’m trying to run 2 instances on server_1 and 1 on server_2. No matter how I set it up, the nodes on server_1 and server_2 cannot join a cluster together. All traffic between the servers is allowed.

server_1 docker-compose

services:
  weaviate-1:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '9090'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.26.1
    ports:
    - 9090:9090
    - 7100:7100/tcp
    - 7100:7100/udp
    - 7101:7101/tcp
    - 7101:7101/udp
    - 50051:50051/tcp
    - 50051:50051/udp
    - 8300:8300/tcp
    - 8300:8300/udp
    - 8301:8301/tcp
    - 8301:8301/udp
    environment:
      CLUSTER_HOSTNAME: 'server_1_1'
      CLUSTER_GOSSIP_BIND_PORT: '7100'
      CLUSTER_DATA_BIND_PORT: '7101'
      RAFT_JOIN: 'HOSTNAME_1:8300,HOSTNAME_1:8302,HOSTNAME_2:8300'
      RAFT_BOOTSTRAP_EXPECT: 3
  weaviate-2:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '9090'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.26.1
    ports:
    - 9091:9090
    - 7102:7102/tcp
    - 7102:7102/udp
    - 7103:7103/tcp
    - 7103:7103/udp
    - 50052:50051/tcp
    - 50052:50051/udp
    - 8302:8302/tcp
    - 8302:8302/udp
    - 8303:8303/tcp
    - 8303:8303/udp
    environment:
      CLUSTER_HOSTNAME: 'server_1_2'
      CLUSTER_JOIN: 'HOSTNAME_1:7100'
      CLUSTER_GOSSIP_BIND_PORT: '7102'
      CLUSTER_DATA_BIND_PORT: '7103'
      RAFT_PORT: '8302'
      RAFT_INTERNAL_RPC_PORT: '8303'
      RAFT_JOIN: 'HOSTNAME_1:8300,HOSTNAME_1:8302,HOSTNAME_2:8300'
      RAFT_BOOTSTRAP_EXPECT: 3

and for server_2

services:
  weaviate-1:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '9090'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.26.1
    ports:
    - 9090:9090
    - 7100:7100/tcp
    - 7100:7100/udp
    - 7101:7101/tcp
    - 7101:7101/udp
    - 50051:50051/tcp
    - 50051:50051/udp
    - 8300:8300/tcp
    - 8300:8300/udp
    - 8301:8301/tcp
    - 8301:8301/udp
    restart: on-failure:0
    environment:
      CLUSTER_HOSTNAME: 'server_2'
      CLUSTER_JOIN: 'HOSTNAME_1:7100'
      CLUSTER_GOSSIP_BIND_PORT: '7100'
      CLUSTER_DATA_BIND_PORT: '7101'
      RAFT_JOIN: 'HOSTNAME_1:8300,HOSTNAME_1:8302,HOSTNAME_2:8300'
      RAFT_BOOTSTRAP_EXPECT: 3

I’ve tried numerous RAFT_JOIN configs, using either the CLUSTER_HOSTNAME values or the hostnames of server_1 and server_2 with the ports. In the first case, the server_1 services don’t join server_2’s cluster, and server_2 doesn’t join server_1’s:
{"level":"info","msg":" memberlist: Suspect server_2 has failed, no acks received","time":"2024-08-13T23:29:46Z"}
In the second case, there are connection problems between the two services on server_1. Any insight into how to set this up, or examples?

The documentation for this doesn’t include port 8300 / Raft port info at all, and seems to lack a good example of a multi-node configuration with docker-compose on different machines.

Server Setup Information

  • Weaviate Server Version: 1.26.1
  • Deployment Method: docker-compose
  • Multi Node? Number of Running Nodes: 3 on 2 servers
  • Client Language and Version: Python
  • Multitenancy?: Not atm

Any additional Information

Hi!

This is an unusual way of deploying, although I have seen some users with a similar deployment requirement.

Can you elaborate on the reasoning for this?

K8s is a better way to manage multiple nodes, and is far more battle-tested.

Alternatively, it would be easier to deploy in Docker Swarm.

Other than that, I would try keeping the service name, hostname, and node name identical for each node, and using extra hosts to make sure each host/node name resolves to the corresponding IP.
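A minimal sketch of the extra-hosts idea for one node on server_1, under some assumptions: the service is renamed to match the CLUSTER_HOSTNAME from the question, and the IP address is a placeholder that must be replaced with server_2's real address. Whether this alone is enough for the nodes to join has not been confirmed in this thread.

```yaml
# Sketch only, not a tested configuration.
# 192.0.2.20 is a placeholder (documentation address range); use server_2's real IP.
services:
  server_1_1:                        # service name kept identical to the node name
    image: cr.weaviate.io/semitechnologies/weaviate:1.26.1
    environment:
      CLUSTER_HOSTNAME: 'server_1_1' # node name, same as the service name above
    extra_hosts:
      # Each "name:IP" entry is added to the container's /etc/hosts,
      # so the remote node's name resolves to the other machine.
      - "server_2:192.0.2.20"
```

The compose file on server_2 would carry the mirror-image entries, mapping the server_1 node names to server_1's address.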

But this is far from the optimal way to deploy a production cluster :thinking:

Hello, I’m currently facing the same issue. Have you figured out how to solve the error?