Description
Hi,
I'm new to Weaviate, and I'm trying to deploy a multi-node setup with docker-compose for testing:
services:
  weaviate-node-11:
    init: true
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.25.4
    ports:
    - 8080:8080
    - 50051:50051
    - 6060:6060
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: 'text2vec-openai,text2vec-cohere,text2vec-huggingface'
      CLUSTER_HOSTNAME: 'node1'
      CLUSTER_GOSSIP_BIND_PORT: '7100'
      CLUSTER_DATA_BIND_PORT: '7101'
      HTTP_PROXY: ''
      http_proxy: ''
      LOG_LEVEL: 'debug'
  weaviate-node-12:
    init: true
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.25.4
    ports:
    - 8081:8080
    - 50052:50051
    - 6061:6060
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: 'text2vec-openai,text2vec-cohere,text2vec-huggingface'
      CLUSTER_HOSTNAME: 'node2'
      CLUSTER_GOSSIP_BIND_PORT: '7102'
      CLUSTER_DATA_BIND_PORT: '7103'
      CLUSTER_JOIN: 'weaviate-node-11:7100'
      HTTP_PROXY: ''
      http_proxy: ''
      LOG_LEVEL: 'debug'
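For comparison: starting with Weaviate 1.25, the cluster uses Raft for schema consensus, and the official multi-node docker-compose examples also set Raft-related variables on every node. I'm not certain their absence is the cause here, but my file does not set anything like the following (values are my guess for a two-node setup):

```yaml
# Hypothetical addition on both nodes, following the 1.25 multi-node examples:
    environment:
      # ...existing variables...
      RAFT_JOIN: 'node1,node2'      # the CLUSTER_HOSTNAME values of the voters
      RAFT_BOOTSTRAP_EXPECT: '2'    # number of voters to wait for at bootstrap
```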
After starting the containers, the /v1/nodes endpoint returns results normally:
{
	"nodes": [{
		"batchStats": {
			"queueLength": 0,
			"ratePerSecond": 0
		},
		"gitHash": "a61909a",
		"name": "node1",
		"shards": null,
		"status": "HEALTHY",
		"version": "1.25.4"
	}, {
		"batchStats": {
			"queueLength": 0,
			"ratePerSecond": 0
		},
		"gitHash": "a61909a",
		"name": "node2",
		"shards": null,
		"status": "HEALTHY",
		"version": "1.25.4"
	}]
}
Next, I used the Python client to create a collection.
import weaviate
import weaviate.classes as wvc
import os
client = weaviate.connect_to_custom(
    http_host="localhost",
    http_port=8080,
    http_secure=False,
    grpc_host="localhost",
    grpc_port=50051,
    grpc_secure=False,
    # headers={
    #     "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your inference API key
    # }
)
try:
    questions = client.collections.create(
        name="Question",
        sharding_config=wvc.config.Configure.sharding(
            desired_count=3
        ),
        replication_config=wvc.config.Configure.replication(
            factor=2
        ),
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),    # Set the vectorizer to "text2vec-openai" to use the OpenAI API for vector-related operations
        generative_config=wvc.config.Configure.Generative.cohere(),             # Set the generative module to "generative-cohere" to use the Cohere API for RAG
        properties=[
            wvc.config.Property(
                name="question",
                data_type=wvc.config.DataType.TEXT,
            ),
            wvc.config.Property(
                name="answer",
                data_type=wvc.config.DataType.TEXT,
            ),
        ],
        # Configure the vector index
        vector_index_config=wvc.config.Configure.VectorIndex.hnsw(  # Or `flat` or `dynamic`
            distance_metric=wvc.config.VectorDistances.COSINE,
            quantizer=wvc.config.Configure.VectorIndex.Quantizer.bq(),
        ),
        # Configure the inverted index
        inverted_index_config=wvc.config.Configure.inverted_index(
            index_null_state=True,
            index_property_length=True,
            index_timestamps=True,
        ),
    )
finally:
    client.close()
After creation, I found that all the shards were concentrated on the node I made the call to, and the other node did not have the collection at all.
http://localhost:8080/v1/schema/Question/shards shows:
[{
	"name": "pklJTouifT37",
	"status": "READY",
	"vectorQueueSize": 0
}, {
	"name": "1gTDzdO9guT0",
	"status": "READY",
	"vectorQueueSize": 0
}, {
	"name": "IOgYO9o0RmDG",
	"status": "READY",
	"vectorQueueSize": 0
}]
while the same endpoint on the other node, http://localhost:8081/v1/schema/Question/shards, returns:
{
	"error": [{
		"message": "cannot get shards status for a non-existing index for Question"
	}]
}
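To compare the two nodes' answers programmatically, I use a small helper that diffs the shard lists (just a sketch; it takes the already-fetched JSON bodies shown above, and treats an error body as "no shards present"):

```python
def diff_shards(node_a, node_b) -> dict:
    """Given the parsed /v1/schema/<Class>/shards responses of two nodes,
    return which shard names are present on one node but not the other.
    An error body (a dict with an 'error' key) counts as no shards."""
    def names(body):
        if isinstance(body, dict) and "error" in body:
            return set()
        return {s["name"] for s in body}
    a, b = names(node_a), names(node_b)
    return {"only_on_a": sorted(a - b), "only_on_b": sorted(b - a)}
```

With the two responses above, this reports all three shards as present only on node1.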
Then, I tried to import data:
import weaviate
import json
client = weaviate.Client(
    url="http://localhost:8080/",  # Replace with your Weaviate endpoint
    additional_headers={
        "X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY"  # Or "X-Cohere-Api-Key" or "X-HuggingFace-Api-Key"
    }
)
# ===== import data =====
# Load data
import requests
url = 'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json'
resp = requests.get(url)
data = json.loads(resp.text)
# Prepare a batch process
client.batch.configure(batch_size=100)  # Configure batch
with client.batch as batch:
    # Batch import all Questions
    for i, d in enumerate(data):
        # print(f"importing question: {i+1}")  # To see imports
        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }
        batch.add_data_object(properties, "Question")
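For clarity, the per-record mapping inside that loop can be factored into a small helper (a sketch; the capitalized key names match the jeopardy_tiny.json records):

```python
def to_properties(record: dict) -> dict:
    """Map a raw jeopardy_tiny.json record (capitalized keys) to the
    lowercase property names used in the 'Question' collection.
    Note: 'category' is not declared in the schema created above, so
    Weaviate's auto-schema would have to add it on import."""
    return {
        "answer": record["Answer"],
        "question": record["Question"],
        "category": record["Category"],
    }
```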
I encountered the following errors; it seems that Weaviate cannot find the class on the other node:
2024-06-19 15:31:46 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context deadline exceeded","op":"get","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:31:46Z","uuid":"7bc9bf37-0d63-40c7-9c63-70d10aea7683"}
2024-06-19 15:31:56 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context deadline exceeded","op":"get","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:31:56Z","uuid":"b5565857-9e61-40a1-88b8-7b0bbb82c139"}
2024-06-19 15:32:06 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context deadline exceeded","op":"get","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:32:06Z","uuid":"d4f7e703-2e07-47ad-8615-a4cc0da979ea"}
2024-06-19 15:32:16 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context deadline exceeded","op":"get","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:32:16Z","uuid":"dd5e7784-129f-42dc-ba3a-2e2665812359"}
2024-06-19 15:32:26 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context deadline exceeded","op":"get","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:32:26Z","uuid":"609a7b08-a4df-4664-ae5e-d5a0952c9e4c"}
2024-06-19 15:32:35 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context canceled","op":"get","replica":"172.23.0.2:7103","shard":"1gTDzdO9guT0","time":"2024-06-19T07:32:35Z","uuid":"6f1b6157-1eb9-4a71-81dc-fe0ac4550910"}
2024-06-19 15:32:35 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"connect: Get \"http://172.23.0.2:7103/replicas/indices/Question/shards/1gTDzdO9guT0/objects/_digest?schema_version=0\": context canceled","op":"get","replica":"172.23.0.2:7103","shard":"1gTDzdO9guT0","time":"2024-06-19T07:32:35Z","uuid":"ba872283-31a9-46fa-bcae-83a24bfd750c"}
2024-06-19 15:32:35 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"connect: Get \"http://172.23.0.2:7103/replicas/indices/Question/shards/IOgYO9o0RmDG/objects/_digest?schema_version=0\": context canceled","op":"get","replica":"172.23.0.2:7103","shard":"IOgYO9o0RmDG","time":"2024-06-19T07:32:35Z","uuid":"0473a789-e36b-4b1e-b97f-d3a7379a54a8"}
2024-06-19 15:32:35 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"connect: Get \"http://172.23.0.2:7103/replicas/indices/Question/shards/pklJTouifT37/objects/_digest?schema_version=0\": context canceled","op":"get","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:32:35Z","uuid":"a94ed56b-625d-4f92-b214-974bee6d4545"}
2024-06-19 15:32:35 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"connect: Get \"http://172.23.0.2:7103/replicas/indices/Question/shards/IOgYO9o0RmDG/objects/_digest?schema_version=0\": context canceled","op":"get","replica":"172.23.0.2:7103","shard":"IOgYO9o0RmDG","time":"2024-06-19T07:32:35Z","uuid":"95e0ce69-c5be-46fa-bdd0-30c2d68916c5"}
2024-06-19 15:32:35 weaviate-weaviate-node-11-1  | {"description":"An I/O timeout occurs when the request takes longer than the specified server-side timeout.","error":"write tcp 172.23.0.3:8080-\u003e172.23.0.1:57940: i/o timeout","hint":"Either try increasing the server-side timeout using e.g. '--write-timeout=600s' as a command line flag when starting Weaviate, or try sending a computationally cheaper request, for example by reducing a batch size, reducing a limit, using less complex filters, etc. Note that this error is only thrown if client-side and server-side timeouts are not in sync, more precisely if the client-side timeout is longer than the server side timeout.","level":"error","method":"POST","msg":"i/o timeout","path":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/batch/objects","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},"time":"2024-06-19T07:32:35Z"}
2024-06-19 15:32:47 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context deadline exceeded","op":"exists","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:32:47Z","uuid":"7bc9bf37-0d63-40c7-9c63-70d10aea7683"}
2024-06-19 15:32:47 weaviate-weaviate-node-11-1  | {"action":"requests_total","api":"rest","class_name":"Question","error":"msg:repo.exists code:500 err:cannot achieve consistency level \"QUORUM\": read error","level":"error","msg":"unexpected error","query_type":"objects","time":"2024-06-19T07:32:47Z"}
2024-06-19 15:32:57 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context deadline exceeded","op":"get","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:32:57Z","uuid":"7bc9bf37-0d63-40c7-9c63-70d10aea7683"}
2024-06-19 15:32:57 weaviate-weaviate-node-11-1  | {"action":"requests_total","api":"rest","class_name":"","error":"repo: object by id: search index question: cannot achieve consistency level \"QUORUM\": read error","level":"error","msg":"unexpected error","query_type":"objects","time":"2024-06-19T07:32:57Z"}
I also noticed that /v1/cluster/statistics reports both nodes as leaders and synchronized as false:
{
	"statistics": [{
		"bootstrapped": true,
		"candidates": {},
		"dbLoaded": true,
		"isVoter": true,
		"leaderAddress": "172.23.0.3:8300",
		"leaderId": "node1",
		"name": "node1",
		"open": true,
		"raft": {
			"appliedIndex": "7",
			"commitIndex": "7",
			"fsmPending": "0",
			"lastContact": "0",
			"lastLogIndex": "7",
			"lastLogTerm": "2",
			"lastSnapshotIndex": "0",
			"lastSnapshotTerm": "0",
			"latestConfiguration": [{
				"address": "172.23.0.3:8300",
				"id": "node1",
				"suffrage": 0
			}],
			"latestConfigurationIndex": "0",
			"numPeers": "0",
			"protocolVersion": "3",
			"protocolVersionMax": "3",
			"protocolVersionMin": "0",
			"snapshotVersionMax": "1",
			"snapshotVersionMin": "0",
			"state": "Leader",
			"term": "2"
		},
		"ready": true,
		"status": "HEALTHY"
	}, {
		"bootstrapped": true,
		"candidates": {},
		"dbLoaded": true,
		"isVoter": true,
		"leaderAddress": "172.23.0.2:8300",
		"leaderId": "node2",
		"name": "node2",
		"open": true,
		"raft": {
			"appliedIndex": "2",
			"commitIndex": "2",
			"fsmPending": "0",
			"lastContact": "0",
			"lastLogIndex": "2",
			"lastLogTerm": "2",
			"lastSnapshotIndex": "0",
			"lastSnapshotTerm": "0",
			"latestConfiguration": [{
				"address": "172.23.0.2:8300",
				"id": "node2",
				"suffrage": 0
			}],
			"latestConfigurationIndex": "0",
			"numPeers": "0",
			"protocolVersion": "3",
			"protocolVersionMax": "3",
			"protocolVersionMin": "0",
			"snapshotVersionMax": "1",
			"snapshotVersionMin": "0",
			"state": "Leader", <-- both are leaders
			"term": "2"
		},
		"ready": true,
		"status": "HEALTHY"
	}],
	"synchronized": false <-- why is it not synchronized?
}
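Each node's raft object lists only itself in latestConfiguration, with numPeers 0, which looks to me like two independent single-node clusters rather than one two-node cluster. A small check over the statistics payload makes this explicit (a sketch over the JSON shown above):

```python
def cluster_summary(stats: dict) -> dict:
    """Summarize a /v1/cluster/statistics payload: distinct leader IDs,
    per-node Raft peer counts, and the synchronized flag. More than one
    leader ID suggests each node bootstrapped its own Raft cluster."""
    nodes = stats["statistics"]
    leaders = sorted({n["leaderId"] for n in nodes})
    return {
        "leaders": leaders,
        "split_brain": len(leaders) > 1,
        "num_peers": [int(n["raft"]["numPeers"]) for n in nodes],
        "synchronized": stats.get("synchronized", False),
    }
```

Run against the payload above, this reports two leaders (node1, node2), zero peers on each node, and synchronized false.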
What could be the reason that the collection shards are not synchronizing between nodes?
Server Setup Information
- Weaviate Server Version: 1.25.4
- Deployment Method: docker/binary
- Multi Node? Number of Running Nodes: 2
- Client Language and Version: python 3.8
- Multitenancy?: No
Any additional Information
env: macOS ARM64