Description
Hi,
I'm new to Weaviate, and I'm trying to deploy a multi-node setup with docker-compose for testing:
services:
  weaviate-node-11:
    init: true
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.25.4
    ports:
    - 8080:8080
    - 50051:50051
    - 6060:6060
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: 'text2vec-openai,text2vec-cohere,text2vec-huggingface'
      CLUSTER_HOSTNAME: 'node1'
      CLUSTER_GOSSIP_BIND_PORT: '7100'
      CLUSTER_DATA_BIND_PORT: '7101'
      HTTP_PROXY: ''
      http_proxy: ''
      LOG_LEVEL: 'debug'
  weaviate-node-12:
    init: true
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.25.4
    ports:
    - 8081:8080
    - 50052:50051
    - 6061:6060
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: 'text2vec-openai,text2vec-cohere,text2vec-huggingface'
      CLUSTER_HOSTNAME: 'node2'
      CLUSTER_GOSSIP_BIND_PORT: '7102'
      CLUSTER_DATA_BIND_PORT: '7103'
      CLUSTER_JOIN: 'weaviate-node-11:7100'
      HTTP_PROXY: ''
      http_proxy: ''
      LOG_LEVEL: 'debug'
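For comparison: starting with Weaviate 1.25, the cluster uses Raft for schema consensus, and the official multi-node docker-compose examples also set Raft-related variables on every node. I'm not certain their absence is the cause here, but my file does not set anything like the following (values are my guess for a two-node setup):

```yaml
# Hypothetical addition on both nodes, following the 1.25 multi-node examples:
    environment:
      # ...existing variables...
      RAFT_JOIN: 'node1,node2'      # the CLUSTER_HOSTNAME values of the voters
      RAFT_BOOTSTRAP_EXPECT: '2'    # number of voters to wait for at bootstrap
```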
After starting the containers, the /v1/nodes endpoint returns results normally:
{
	"nodes": [{
		"batchStats": {
			"queueLength": 0,
			"ratePerSecond": 0
		},
		"gitHash": "a61909a",
		"name": "node1",
		"shards": null,
		"status": "HEALTHY",
		"version": "1.25.4"
	}, {
		"batchStats": {
			"queueLength": 0,
			"ratePerSecond": 0
		},
		"gitHash": "a61909a",
		"name": "node2",
		"shards": null,
		"status": "HEALTHY",
		"version": "1.25.4"
	}]
}
Next, I used the Python client to create a collection.
import weaviate
import weaviate.classes as wvc
import os
client = weaviate.connect_to_custom(
    http_host="localhost",
    http_port=8080,
    http_secure=False,
    grpc_host="localhost",
    grpc_port=50051,
    grpc_secure=False,
    # headers={
    #     "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your inference API key
    # }
)
try:
    questions = client.collections.create(
        name="Question",
        sharding_config=wvc.config.Configure.sharding(
            desired_count=3
        ),
        replication_config=wvc.config.Configure.replication(
            factor=2
        ),
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),    # Set the vectorizer to "text2vec-openai" to use the OpenAI API for vector-related operations
        generative_config=wvc.config.Configure.Generative.cohere(),             # Set the generative module to "generative-cohere" to use the Cohere API for RAG
        properties=[
            wvc.config.Property(
                name="question",
                data_type=wvc.config.DataType.TEXT,
            ),
            wvc.config.Property(
                name="answer",
                data_type=wvc.config.DataType.TEXT,
            ),
        ],
        # Configure the vector index
        vector_index_config=wvc.config.Configure.VectorIndex.hnsw(  # Or `flat` or `dynamic`
            distance_metric=wvc.config.VectorDistances.COSINE,
            quantizer=wvc.config.Configure.VectorIndex.Quantizer.bq(),
        ),
        # Configure the inverted index
        inverted_index_config=wvc.config.Configure.inverted_index(
            index_null_state=True,
            index_property_length=True,
            index_timestamps=True,
        ),
    )
finally:
    client.close()
After creation, I found that all the shards were concentrated on the node I made the call to, and the other node did not have the collection at all.
http://localhost:8080/v1/schema/Question/shards shows:
[{
	"name": "pklJTouifT37",
	"status": "READY",
	"vectorQueueSize": 0
}, {
	"name": "1gTDzdO9guT0",
	"status": "READY",
	"vectorQueueSize": 0
}, {
	"name": "IOgYO9o0RmDG",
	"status": "READY",
	"vectorQueueSize": 0
}]
while the same endpoint on the other node, http://localhost:8081/v1/schema/Question/shards, returns:
{
	"error": [{
		"message": "cannot get shards status for a non-existing index for Question"
	}]
}
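To compare the two nodes' answers programmatically, I use a small helper that diffs the shard lists (just a sketch; it takes the already-fetched JSON bodies shown above, and treats an error body as "no shards present"):

```python
def diff_shards(node_a, node_b) -> dict:
    """Given the parsed /v1/schema/<Class>/shards responses of two nodes,
    return which shard names are present on one node but not the other.
    An error body (a dict with an 'error' key) counts as no shards."""
    def names(body):
        if isinstance(body, dict) and "error" in body:
            return set()
        return {s["name"] for s in body}
    a, b = names(node_a), names(node_b)
    return {"only_on_a": sorted(a - b), "only_on_b": sorted(b - a)}
```

With the two responses above, this reports all three shards as present only on node1.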
Then, I tried to import data:
import weaviate
import json
client = weaviate.Client(
    url="http://localhost:8080/",  # Replace with your Weaviate endpoint
    additional_headers={
        "X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY"  # Or "X-Cohere-Api-Key" or "X-HuggingFace-Api-Key"
    }
)
# ===== import data =====
# Load data
import requests
url = 'https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json'
resp = requests.get(url)
data = json.loads(resp.text)
# Prepare a batch process
client.batch.configure(batch_size=100)  # Configure batch
with client.batch as batch:
    # Batch import all Questions
    for i, d in enumerate(data):
        # print(f"importing question: {i+1}")  # To see imports
        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }
        batch.add_data_object(properties, "Question")
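For clarity, the per-record mapping inside that loop can be factored into a small helper (a sketch; the capitalized key names match the jeopardy_tiny.json records):

```python
def to_properties(record: dict) -> dict:
    """Map a raw jeopardy_tiny.json record (capitalized keys) to the
    lowercase property names used in the 'Question' collection.
    Note: 'category' is not declared in the schema created above, so
    Weaviate's auto-schema would have to add it on import."""
    return {
        "answer": record["Answer"],
        "question": record["Question"],
        "category": record["Category"],
    }
```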
I encountered the following errors; it seems that Weaviate cannot find the class on the other node:
2024-06-19 15:31:46 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context deadline exceeded","op":"get","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:31:46Z","uuid":"7bc9bf37-0d63-40c7-9c63-70d10aea7683"}
2024-06-19 15:31:56 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context deadline exceeded","op":"get","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:31:56Z","uuid":"b5565857-9e61-40a1-88b8-7b0bbb82c139"}
2024-06-19 15:32:06 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context deadline exceeded","op":"get","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:32:06Z","uuid":"d4f7e703-2e07-47ad-8615-a4cc0da979ea"}
2024-06-19 15:32:16 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context deadline exceeded","op":"get","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:32:16Z","uuid":"dd5e7784-129f-42dc-ba3a-2e2665812359"}
2024-06-19 15:32:26 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context deadline exceeded","op":"get","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:32:26Z","uuid":"609a7b08-a4df-4664-ae5e-d5a0952c9e4c"}
2024-06-19 15:32:35 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context canceled","op":"get","replica":"172.23.0.2:7103","shard":"1gTDzdO9guT0","time":"2024-06-19T07:32:35Z","uuid":"6f1b6157-1eb9-4a71-81dc-fe0ac4550910"}
2024-06-19 15:32:35 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"connect: Get \"http://172.23.0.2:7103/replicas/indices/Question/shards/1gTDzdO9guT0/objects/_digest?schema_version=0\": context canceled","op":"get","replica":"172.23.0.2:7103","shard":"1gTDzdO9guT0","time":"2024-06-19T07:32:35Z","uuid":"ba872283-31a9-46fa-bcae-83a24bfd750c"}
2024-06-19 15:32:35 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"connect: Get \"http://172.23.0.2:7103/replicas/indices/Question/shards/IOgYO9o0RmDG/objects/_digest?schema_version=0\": context canceled","op":"get","replica":"172.23.0.2:7103","shard":"IOgYO9o0RmDG","time":"2024-06-19T07:32:35Z","uuid":"0473a789-e36b-4b1e-b97f-d3a7379a54a8"}
2024-06-19 15:32:35 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"connect: Get \"http://172.23.0.2:7103/replicas/indices/Question/shards/pklJTouifT37/objects/_digest?schema_version=0\": context canceled","op":"get","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:32:35Z","uuid":"a94ed56b-625d-4f92-b214-974bee6d4545"}
2024-06-19 15:32:35 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"connect: Get \"http://172.23.0.2:7103/replicas/indices/Question/shards/IOgYO9o0RmDG/objects/_digest?schema_version=0\": context canceled","op":"get","replica":"172.23.0.2:7103","shard":"IOgYO9o0RmDG","time":"2024-06-19T07:32:35Z","uuid":"95e0ce69-c5be-46fa-bdd0-30c2d68916c5"}
2024-06-19 15:32:35 weaviate-weaviate-node-11-1  | {"description":"An I/O timeout occurs when the request takes longer than the specified server-side timeout.","error":"write tcp 172.23.0.3:8080-\u003e172.23.0.1:57940: i/o timeout","hint":"Either try increasing the server-side timeout using e.g. '--write-timeout=600s' as a command line flag when starting Weaviate, or try sending a computationally cheaper request, for example by reducing a batch size, reducing a limit, using less complex filters, etc. Note that this error is only thrown if client-side and server-side timeouts are not in sync, more precisely if the client-side timeout is longer than the server side timeout.","level":"error","method":"POST","msg":"i/o timeout","path":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/v1/batch/objects","RawPath":"","OmitHost":false,"ForceQuery":false,"RawQuery":"","Fragment":"","RawFragment":""},"time":"2024-06-19T07:32:35Z"}
2024-06-19 15:32:47 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context deadline exceeded","op":"exists","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:32:47Z","uuid":"7bc9bf37-0d63-40c7-9c63-70d10aea7683"}
2024-06-19 15:32:47 weaviate-weaviate-node-11-1  | {"action":"requests_total","api":"rest","class_name":"Question","error":"msg:repo.exists code:500 err:cannot achieve consistency level \"QUORUM\": read error","level":"error","msg":"unexpected error","query_type":"objects","time":"2024-06-19T07:32:47Z"}
2024-06-19 15:32:57 weaviate-weaviate-node-11-1  | {"class":"Question","level":"error","msg":"status code: 500, error: digest objects: local index \"Question\" not found\n: context deadline exceeded","op":"get","replica":"172.23.0.2:7103","shard":"pklJTouifT37","time":"2024-06-19T07:32:57Z","uuid":"7bc9bf37-0d63-40c7-9c63-70d10aea7683"}
2024-06-19 15:32:57 weaviate-weaviate-node-11-1  | {"action":"requests_total","api":"rest","class_name":"","error":"repo: object by id: search index question: cannot achieve consistency level \"QUORUM\": read error","level":"error","msg":"unexpected error","query_type":"objects","time":"2024-06-19T07:32:57Z"}
I also noticed that /v1/cluster/statistics reports both nodes as leaders and synchronized as false:
{
	"statistics": [{
		"bootstrapped": true,
		"candidates": {},
		"dbLoaded": true,
		"isVoter": true,
		"leaderAddress": "172.23.0.3:8300",
		"leaderId": "node1",
		"name": "node1",
		"open": true,
		"raft": {
			"appliedIndex": "7",
			"commitIndex": "7",
			"fsmPending": "0",
			"lastContact": "0",
			"lastLogIndex": "7",
			"lastLogTerm": "2",
			"lastSnapshotIndex": "0",
			"lastSnapshotTerm": "0",
			"latestConfiguration": [{
				"address": "172.23.0.3:8300",
				"id": "node1",
				"suffrage": 0
			}],
			"latestConfigurationIndex": "0",
			"numPeers": "0",
			"protocolVersion": "3",
			"protocolVersionMax": "3",
			"protocolVersionMin": "0",
			"snapshotVersionMax": "1",
			"snapshotVersionMin": "0",
			"state": "Leader",
			"term": "2"
		},
		"ready": true,
		"status": "HEALTHY"
	}, {
		"bootstrapped": true,
		"candidates": {},
		"dbLoaded": true,
		"isVoter": true,
		"leaderAddress": "172.23.0.2:8300",
		"leaderId": "node2",
		"name": "node2",
		"open": true,
		"raft": {
			"appliedIndex": "2",
			"commitIndex": "2",
			"fsmPending": "0",
			"lastContact": "0",
			"lastLogIndex": "2",
			"lastLogTerm": "2",
			"lastSnapshotIndex": "0",
			"lastSnapshotTerm": "0",
			"latestConfiguration": [{
				"address": "172.23.0.2:8300",
				"id": "node2",
				"suffrage": 0
			}],
			"latestConfigurationIndex": "0",
			"numPeers": "0",
			"protocolVersion": "3",
			"protocolVersionMax": "3",
			"protocolVersionMin": "0",
			"snapshotVersionMax": "1",
			"snapshotVersionMin": "0",
			"state": "Leader", <-- both are leaders
			"term": "2"
		},
		"ready": true,
		"status": "HEALTHY"
	}],
	"synchronized": false <-- why is it not synchronized?
}
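Each node's raft object lists only itself in latestConfiguration, with numPeers 0, which looks to me like two independent single-node clusters rather than one two-node cluster. A small check over the statistics payload makes this explicit (a sketch over the JSON shown above):

```python
def cluster_summary(stats: dict) -> dict:
    """Summarize a /v1/cluster/statistics payload: distinct leader IDs,
    per-node Raft peer counts, and the synchronized flag. More than one
    leader ID suggests each node bootstrapped its own Raft cluster."""
    nodes = stats["statistics"]
    leaders = sorted({n["leaderId"] for n in nodes})
    return {
        "leaders": leaders,
        "split_brain": len(leaders) > 1,
        "num_peers": [int(n["raft"]["numPeers"]) for n in nodes],
        "synchronized": stats.get("synchronized", False),
    }
```

Run against the payload above, this reports two leaders (node1, node2), zero peers on each node, and synchronized false.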
What could be the reason that the collection shards are not synchronizing between nodes?
Server Setup Information
- Weaviate Server Version: 1.25.4
- Deployment Method: docker/binary
- Multi Node? Number of Running Nodes: 2
- Client Language and Version: python 3.8
- Multitenancy?: No
Any additional Information
env: macOS ARM64