Failed to set up a multi-node cluster using Docker Swarm on 2 VMs

I'm not able to set up a Weaviate cluster with Docker Swarm across 2 VMs.

Here is the Docker Compose file I'm using:

version: '3.8'

networks:
  cluster_network:
    driver: overlay

services:
  weaviate-node-1:
    init: true
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: semitechnologies/weaviate:1.23.6
    ports:
      - 8080:8080
      - 6060:6060
      - 50051:50051
      - 7100:7100
      - 7101:7101
    restart: on-failure:0
    volumes:
      - ./data-node-1:/var/lib/weaviate
    environment:
      LOG_LEVEL: 'debug'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_MODULES: 'text2vec-openai,text2vec-cohere,text2vec-huggingface'
      DEFAULT_VECTORIZER_MODULE: 'none'
      CLUSTER_HOSTNAME: 'node1'
      CLUSTER_GOSSIP_BIND_PORT: '7100'
      CLUSTER_DATA_BIND_PORT: '7101'
    networks:
      - cluster_network

  weaviate-node-2:    
    init: true
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: semitechnologies/weaviate:1.23.6
    ports:
      - 8081:8080
      - 6061:6060
      - 50052:50051
      - 7102:7102
      - 7103:7103
    restart: on-failure:0
    volumes:
      - ./data-node-2:/var/lib/weaviate
    environment:
      LOG_LEVEL: 'debug'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_MODULES: 'text2vec-openai,text2vec-cohere,text2vec-huggingface'
      DEFAULT_VECTORIZER_MODULE: 'none'
      CLUSTER_HOSTNAME: 'node2'
      CLUSTER_GOSSIP_BIND_PORT: '7102'
      CLUSTER_DATA_BIND_PORT: '7103'
      CLUSTER_JOIN: '10.2.0.4:7100'
    deploy:      
      placement:
        constraints: 
        - node.labels.node == node2
    networks:
      - cluster_network

I started the services using:

docker stack deploy --compose-file docker-compose.yml stackdemo

Service status:

 docker stack services stackdemo                                                                                                           master-node: Sat Feb  3 09:17:43 2024

ID             NAME                        MODE         REPLICAS   IMAGE                              PORTS
nmpy53htqsn3   stackdemo_weaviate-node-1   replicated   1/1        semitechnologies/weaviate:1.23.6   *:6060->6060/tcp, *:7100-7101->7100-7101/tcp, *:8080->8080/tcp, *:50051->50051/tcp
061zi7aflznv   stackdemo_weaviate-node-2   replicated   1/1        semitechnologies/weaviate:1.23.6   *:6061->6060/tcp, *:7102-7103->7102-7103/tcp, *:8081->8080/tcp, *:50052->50051/tcp


Logs on node 1:

{"action":"startup","level":"debug","msg":"finished initializing modules","time":"2024-02-03T09:00:36Z"}
{"action":"graphql_rebuild","level":"debug","msg":"rebuilding the graphql schema","schema":{"Objects":{"classes":[]}},"time":"2024-02-03T09:00:36Z"}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50051","time":"2024-02-03T09:00:36Z"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://[::]:8080","time":"2024-02-03T09:00:36Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.0.0.2:39936","time":"2024-02-03T09:00:41Z"}
{"level":"debug","msg":" memberlist: Failed UDP ping: node2 (timeout reached)","time":"2024-02-03T09:00:43Z"}
{"level":"info","msg":" memberlist: Suspect node2 has failed, no acks received","time":"2024-02-03T09:00:44Z"}
{"level":"debug","msg":" memberlist: Failed UDP ping: node2 (timeout reached)","time":"2024-02-03T09:00:44Z"}
{"level":"info","msg":" memberlist: Suspect node2 has failed, no acks received","time":"2024-02-03T09:00:46Z"}
{"level":"debug","msg":" memberlist: Failed UDP ping: node2 (timeout reached)","time":"2024-02-03T09:00:47Z"}
{"level":"info","msg":" memberlist: Marking node2 as failed, suspect timeout reached (0 peer confirmations)","time":"2024-02-03T09:00:48Z"}
{"level":"info","msg":" memberlist: Suspect node2 has failed, no acks received","time":"2024-02-03T09:00:50Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.0.0.2:49374","time":"2024-02-03T09:01:17Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.0.0.2:41916","time":"2024-02-03T09:01:54Z"}
{"level":"debug","msg":" memberlist: Failed UDP ping: node2 (timeout reached)","time":"2024-02-03T09:01:55Z"}
{"level":"info","msg":" memberlist: Suspect node2 has failed, no acks received","time":"2024-02-03T09:01:59Z"}
{"level":"debug","msg":" memberlist: Failed UDP ping: node2 (timeout reached)","time":"2024-02-03T09:02:00Z"}
{"level":"info","msg":" memberlist: Marking node2 as failed, suspect timeout reached (0 peer confirmations)","time":"2024-02-03T09:02:03Z"}
{"level":"info","msg":" memberlist: Suspect node2 has failed, no acks received","time":"2024-02-03T09:02:05Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.0.0.2:42368","time":"2024-02-03T09:02:30Z"}
{"level":"debug","msg":" memberlist: Stream connection from=10.0.0.2:34052","time":"2024-02-03T09:03:08Z"}

Logs on node 2:

{"action":"startup","level":"debug","msg":"created startup context, nothing done so far","startup_time_left":"59m59.998706476s","time":"2024-02-03T09:01:17Z"}
{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-02-03T09:01:17Z"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-02-03T09:01:17Z"}
{"action":"startup","level":"debug","msg":"config loaded","startup_time_left":"59m59.99841207s","time":"2024-02-03T09:01:17Z"}
{"action":"startup","level":"debug","msg":"configured OIDC and anonymous access client","startup_time_left":"59m59.99838947s","time":"2024-02-03T09:01:17Z"}
{"action":"startup","level":"debug","msg":"initialized schema","startup_time_left":"59m59.998361669s","time":"2024-02-03T09:01:17Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with:  10.2.0.4:7100","time":"2024-02-03T09:01:17Z"}
{"level":"warning","msg":" memberlist: Refuting a suspect message (from: node2)","time":"2024-02-03T09:01:17Z"}
{"action":"startup","level":"debug","msg":"startup routine complete","time":"2024-02-03T09:01:17Z"}
{"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-02-03T09:01:17Z"}
{"action":"startup","level":"debug","msg":"start registering modules","time":"2024-02-03T09:01:17Z"}
{"action":"startup","level":"debug","module":"text2vec-openai","msg":"enabled module","time":"2024-02-03T09:01:17Z"}
{"action":"startup","level":"debug","module":"text2vec-huggingface","msg":"enabled module","time":"2024-02-03T09:01:17Z"}
{"action":"startup","level":"debug","module":"text2vec-cohere","msg":"enabled module","time":"2024-02-03T09:01:17Z"}
{"action":"startup","level":"debug","msg":"completed registering modules","time":"2024-02-03T09:01:17Z"}
{"level":"debug","msg":" memberlist: Failed UDP ping: node1 (timeout reached)","time":"2024-02-03T09:01:19Z"}
{"level":"info","msg":" memberlist: Suspect node1 has failed, no acks received","time":"2024-02-03T09:01:20Z"}
{"level":"debug","msg":" memberlist: Failed UDP ping: node1 (timeout reached)","time":"2024-02-03T09:01:22Z"}
{"level":"info","msg":" memberlist: Suspect node1 has failed, no acks received","time":"2024-02-03T09:01:24Z"}
{"level":"info","msg":" memberlist: Marking node1 as failed, suspect timeout reached (0 peer confirmations)","time":"2024-02-03T09:01:24Z"}
{"action":"startup","error":"could not load or initialize schema: sync schema with other nodes in the cluster: read schema: open transaction: broadcast open transaction: host \"172.25.0.3:7101\": send http request: Post \"http://172.25.0.3:7101/schema/transactions/\": dial tcp 172.25.0.3:7101: i/o timeout","level":"fatal","msg":"could not initialize schema manager","time":"2024-02-03T09:01:47Z"}

Hi @kamal !

Welcome to our community :hugs:

While I have played with Weaviate + gRPC + Traefik + Let's Encrypt, I have not yet used Docker Swarm.

For CLUSTER_JOIN, have you tried using the service name instead of the VM's IP address, something like weaviate-node-1?
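
In case it helps, here is a rough sketch of what I mean for node 2. It assumes both services stay attached to the same overlay network (cluster_network in your file), so that the service name weaviate-node-1 resolves through Swarm's internal DNS. The port 7100 is the CLUSTER_GOSSIP_BIND_PORT you set on node 1. I have not tested this on Swarm, so please treat it as a starting point rather than a confirmed fix.

```yaml
# Fragment of the weaviate-node-2 service only; everything else stays as in your file.
services:
  weaviate-node-2:
    environment:
      CLUSTER_HOSTNAME: 'node2'
      CLUSTER_GOSSIP_BIND_PORT: '7102'
      CLUSTER_DATA_BIND_PORT: '7103'
      # Assumption: join via the service name on the overlay network
      # instead of the VM address 10.2.0.4 and its published port.
      CLUSTER_JOIN: 'weaviate-node-1:7100'
    networks:
      - cluster_network
```

The idea is to keep the cluster traffic inside the overlay network. Memberlist gossips over both TCP and UDP, and your logs show the TCP stream connections arriving while the UDP pings time out, so the path through the published ports looks like a likely suspect.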

I was able to run two nodes using docker compose here:

Let me know if this helps… or if it gives us a different error :slight_smile:

Thanks!