Weaviate self-hosted pods are not stable in production: "local index not found" and shard errors

Hello Weaviate,

I have a self-hosted Weaviate cluster with 7 nodes. The Docker Compose file is attached below.

Docker-compose file

version: '3.8'

services:
  weaviate-node-1:
    init: true
    command: [--host, 0.0.0.0, --port, '8080', --scheme, http]
    image: cr.weaviate.io/semitechnologies/weaviate:1.30.3
    ports:
      - 8080:8080
      - 6050:6060
      - 50051:50051
    restart: on-failure:0
    volumes:
      - weaviate-node-1-data:/var/lib/weaviate
    environment:
      ENABLE_BATCHING: "true"
      BATCH_MAX_SIZE: "100"
      BATCH_MAX_DELAY: "50ms"
      LOG_LEVEL: 'info'
      ASYNC_INDEXING: 'true'
      BATCH_ENABLED: 'true'
      BATCH_QUEUE_TIMEOUT_REPLACE_EXISTING_MS: 2000
      BATCH_QUEUE_TIMEOUT_DELETE_MS: 2000
      BATCH_QUEUE_TIMEOUT_OBJECT_MS: 2000
      BATCH_TOKENS_PER_SECOND: 40000
      BATCH_MAX_PARALLEL_BATCHES: 32
      ASYNC_INDEXING_QUEUE_SIZE: 100000
      QUERY_DEFAULTS_LIMIT: 1000
      QUERY_MAXIMUM_RESULTS: 100000
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_API_BASED_MODULES: 'false'
      ENABLE_MODULES: ''
      DISABLE_MODULES: 'text2vec-contextionary,text2vec-transformers,text2vec-openai,text2vec-huggingface'
      CLUSTER_HOSTNAME: 'node1'
      CLUSTER_GOSSIP_BIND_PORT: '7100'
      CLUSTER_DATA_BIND_PORT: '7101'
      RAFT_JOIN: 'node1,node2,node3,node4,node5,node6,node7'
      RAFT_BOOTSTRAP_EXPECT: 7
    deploy:
      resources:
        limits:
          cpus: '3.0'
          memory: '6G'
    networks:
      - weaviate-net

  weaviate-node-2:
    init: true
    command: [--host, 0.0.0.0, --port, '8080', --scheme, http]
    image: cr.weaviate.io/semitechnologies/weaviate:1.30.3
    ports:
      - 8081:8080
      - 6051:6060
      - 50052:50051
    restart: on-failure:0
    volumes:
      - weaviate-node-2-data:/var/lib/weaviate
    environment:
      ENABLE_BATCHING: "true"
      BATCH_MAX_SIZE: "100"
      BATCH_MAX_DELAY: "50ms"
      LOG_LEVEL: 'info'
      ASYNC_INDEXING: 'true'
      BATCH_ENABLED: 'true'
      BATCH_QUEUE_TIMEOUT_REPLACE_EXISTING_MS: 2000
      BATCH_QUEUE_TIMEOUT_DELETE_MS: 2000
      BATCH_QUEUE_TIMEOUT_OBJECT_MS: 2000
      BATCH_TOKENS_PER_SECOND: 40000
      BATCH_MAX_PARALLEL_BATCHES: 32
      ASYNC_INDEXING_QUEUE_SIZE: 100000
      QUERY_DEFAULTS_LIMIT: 1000
      QUERY_MAXIMUM_RESULTS: 100000
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_API_BASED_MODULES: 'false'
      ENABLE_MODULES: ''
      DISABLE_MODULES: 'text2vec-contextionary,text2vec-transformers,text2vec-openai,text2vec-huggingface'
      CLUSTER_HOSTNAME: 'node2'
      CLUSTER_GOSSIP_BIND_PORT: '7102'
      CLUSTER_DATA_BIND_PORT: '7103'
      CLUSTER_JOIN: 'weaviate-node-1:7100'
      RAFT_JOIN: 'node1,node2,node3,node4,node5,node6,node7'
      RAFT_BOOTSTRAP_EXPECT: 7
    deploy:
      resources:
        limits:
          cpus: '3.0'
          memory: '6G'
    networks:
      - weaviate-net

  weaviate-node-3:
    init: true
    command: [--host, 0.0.0.0, --port, '8080', --scheme, http]
    image: cr.weaviate.io/semitechnologies/weaviate:1.30.3
    ports:
      - 8082:8080
      - 6052:6060
      - 50053:50051
    restart: on-failure:0
    volumes:
      - weaviate-node-3-data:/var/lib/weaviate
    environment:
      LOG_LEVEL: 'info'
      ENABLE_BATCHING: "true"
      BATCH_MAX_SIZE: "100"
      BATCH_MAX_DELAY: "50ms"
      ASYNC_INDEXING: 'true'
      BATCH_ENABLED: 'true'
      BATCH_QUEUE_TIMEOUT_REPLACE_EXISTING_MS: 2000
      BATCH_QUEUE_TIMEOUT_DELETE_MS: 2000
      BATCH_QUEUE_TIMEOUT_OBJECT_MS: 2000
      BATCH_TOKENS_PER_SECOND: 40000
      BATCH_MAX_PARALLEL_BATCHES: 32
      ASYNC_INDEXING_QUEUE_SIZE: 100000
      QUERY_DEFAULTS_LIMIT: 1000
      QUERY_MAXIMUM_RESULTS: 100000
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_API_BASED_MODULES: 'false'
      ENABLE_MODULES: ''
      DISABLE_MODULES: 'text2vec-contextionary,text2vec-transformers,text2vec-openai,text2vec-huggingface'
      CLUSTER_HOSTNAME: 'node3'
      CLUSTER_GOSSIP_BIND_PORT: '7104'
      CLUSTER_DATA_BIND_PORT: '7105'
      CLUSTER_JOIN: 'weaviate-node-1:7100'
      RAFT_JOIN: 'node1,node2,node3,node4,node5,node6,node7'
      RAFT_BOOTSTRAP_EXPECT: 7
    deploy:
      resources:
        limits:
          cpus: '3.0'
          memory: '6G'
    networks:
      - weaviate-net

  weaviate-node-4:
    init: true
    command: [--host, 0.0.0.0, --port, '8080', --scheme, http]
    image: cr.weaviate.io/semitechnologies/weaviate:1.30.3
    ports:
      - 8083:8080
      - 6053:6060
      - 50054:50051
    restart: on-failure:0
    volumes:
      - weaviate-node-4-data:/var/lib/weaviate
    environment:
      LOG_LEVEL: 'info'
      ENABLE_BATCHING: "true"
      BATCH_MAX_SIZE: "100"
      BATCH_MAX_DELAY: "50ms"
      ASYNC_INDEXING: 'true'
      BATCH_ENABLED: 'true'
      BATCH_QUEUE_TIMEOUT_REPLACE_EXISTING_MS: 2000
      BATCH_QUEUE_TIMEOUT_DELETE_MS: 2000
      BATCH_QUEUE_TIMEOUT_OBJECT_MS: 2000
      BATCH_TOKENS_PER_SECOND: 40000
      BATCH_MAX_PARALLEL_BATCHES: 32
      ASYNC_INDEXING_QUEUE_SIZE: 100000
      QUERY_DEFAULTS_LIMIT: 1000
      QUERY_MAXIMUM_RESULTS: 100000
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_API_BASED_MODULES: 'false'
      ENABLE_MODULES: ''
      DISABLE_MODULES: 'text2vec-contextionary,text2vec-transformers,text2vec-openai,text2vec-huggingface'
      CLUSTER_HOSTNAME: 'node4'
      CLUSTER_GOSSIP_BIND_PORT: '7106'
      CLUSTER_DATA_BIND_PORT: '7107'
      CLUSTER_JOIN: 'weaviate-node-1:7100'
      RAFT_JOIN: 'node1,node2,node3,node4,node5,node6,node7'
      RAFT_BOOTSTRAP_EXPECT: 7
    deploy:
      resources:
        limits:
          cpus: '3.0'
          memory: '6G'
    networks:
      - weaviate-net

  weaviate-node-5:
    init: true
    command: [--host, 0.0.0.0, --port, '8080', --scheme, http]
    image: cr.weaviate.io/semitechnologies/weaviate:1.30.3
    ports:
      - 8084:8080
      - 6054:6060
      - 50055:50051
    restart: on-failure:0
    volumes:
      - weaviate-node-5-data:/var/lib/weaviate
    environment:
      LOG_LEVEL: 'info'
      ENABLE_BATCHING: "true"
      BATCH_MAX_SIZE: "100"
      BATCH_MAX_DELAY: "50ms"
      ASYNC_INDEXING: 'true'
      BATCH_ENABLED: 'true'
      BATCH_QUEUE_TIMEOUT_REPLACE_EXISTING_MS: 2000
      BATCH_QUEUE_TIMEOUT_DELETE_MS: 2000
      BATCH_QUEUE_TIMEOUT_OBJECT_MS: 2000
      BATCH_TOKENS_PER_SECOND: 40000
      BATCH_MAX_PARALLEL_BATCHES: 32
      ASYNC_INDEXING_QUEUE_SIZE: 100000
      QUERY_DEFAULTS_LIMIT: 1000
      QUERY_MAXIMUM_RESULTS: 100000
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_API_BASED_MODULES: 'false'
      ENABLE_MODULES: ''
      DISABLE_MODULES: 'text2vec-contextionary,text2vec-transformers,text2vec-openai,text2vec-huggingface'
      CLUSTER_HOSTNAME: 'node5'
      CLUSTER_GOSSIP_BIND_PORT: '7108'
      CLUSTER_DATA_BIND_PORT: '7109'
      CLUSTER_JOIN: 'weaviate-node-1:7100'
      RAFT_JOIN: 'node1,node2,node3,node4,node5,node6,node7'
      RAFT_BOOTSTRAP_EXPECT: 7
    deploy:
      resources:
        limits:
          cpus: '3.0'
          memory: '6G'
    networks:
      - weaviate-net

  weaviate-node-6:
    init: true
    command: [--host, 0.0.0.0, --port, '8080', --scheme, http]
    image: cr.weaviate.io/semitechnologies/weaviate:1.30.3
    ports:
      - 8085:8080
      - 6055:6060
      - 50056:50051
    restart: on-failure:0
    volumes:
      - weaviate-node-6-data:/var/lib/weaviate
    environment:
      LOG_LEVEL: 'info'
      ENABLE_BATCHING: "true"
      BATCH_MAX_SIZE: "100"
      BATCH_MAX_DELAY: "50ms"
      ASYNC_INDEXING: 'true'
      BATCH_ENABLED: 'true'
      BATCH_QUEUE_TIMEOUT_REPLACE_EXISTING_MS: 2000
      BATCH_QUEUE_TIMEOUT_DELETE_MS: 2000
      BATCH_QUEUE_TIMEOUT_OBJECT_MS: 2000
      BATCH_TOKENS_PER_SECOND: 40000
      BATCH_MAX_PARALLEL_BATCHES: 32
      ASYNC_INDEXING_QUEUE_SIZE: 100000
      QUERY_DEFAULTS_LIMIT: 1000
      QUERY_MAXIMUM_RESULTS: 100000
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_API_BASED_MODULES: 'false'
      ENABLE_MODULES: ''
      DISABLE_MODULES: 'text2vec-contextionary,text2vec-transformers,text2vec-openai,text2vec-huggingface'
      CLUSTER_HOSTNAME: 'node6'
      CLUSTER_GOSSIP_BIND_PORT: '7110'
      CLUSTER_DATA_BIND_PORT: '7111'
      CLUSTER_JOIN: 'weaviate-node-1:7100'
      RAFT_JOIN: 'node1,node2,node3,node4,node5,node6,node7'
      RAFT_BOOTSTRAP_EXPECT: 7
    deploy:
      resources:
        limits:
          cpus: '3.0'
          memory: '6G'
    networks:
      - weaviate-net

  weaviate-node-7:
    init: true
    command: [--host, 0.0.0.0, --port, '8080', --scheme, http]
    image: cr.weaviate.io/semitechnologies/weaviate:1.30.3
    ports:
      - 8086:8080
      - 6056:6060
      - 50057:50051
    restart: on-failure:0
    volumes:
      - weaviate-node-7-data:/var/lib/weaviate
    environment:
      LOG_LEVEL: 'info'
      ENABLE_BATCHING: "true"
      BATCH_MAX_SIZE: "100"
      BATCH_MAX_DELAY: "50ms"
      ASYNC_INDEXING: 'true'
      BATCH_ENABLED: 'true'
      BATCH_QUEUE_TIMEOUT_REPLACE_EXISTING_MS: 2000
      BATCH_QUEUE_TIMEOUT_DELETE_MS: 2000
      BATCH_QUEUE_TIMEOUT_OBJECT_MS: 2000
      BATCH_TOKENS_PER_SECOND: 40000
      BATCH_MAX_PARALLEL_BATCHES: 32
      ASYNC_INDEXING_QUEUE_SIZE: 100000
      QUERY_DEFAULTS_LIMIT: 1000
      QUERY_MAXIMUM_RESULTS: 100000
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_API_BASED_MODULES: 'false'
      ENABLE_MODULES: ''
      DISABLE_MODULES: 'text2vec-contextionary,text2vec-transformers,text2vec-openai,text2vec-huggingface'
      CLUSTER_HOSTNAME: 'node7'
      CLUSTER_GOSSIP_BIND_PORT: '7112'
      CLUSTER_DATA_BIND_PORT: '7113'
      CLUSTER_JOIN: 'weaviate-node-1:7100'
      RAFT_JOIN: 'node1,node2,node3,node4,node5,node6,node7'
      RAFT_BOOTSTRAP_EXPECT: 7
    deploy:
      resources:
        limits:
          cpus: '3.0'
          memory: '6G'
    networks:
      - weaviate-net

volumes:
  weaviate-node-1-data:
    driver: local
    driver_opts:
      type: none
      device: /mnt/weaviate/pod1
      o: bind
  weaviate-node-2-data:
    driver: local
    driver_opts:
      type: none
      device: /mnt/weaviate/pod2
      o: bind
  weaviate-node-3-data:
    driver: local
    driver_opts:
      type: none
      device: /mnt/weaviate/pod3
      o: bind
  weaviate-node-4-data:
    driver: local
    driver_opts:
      type: none
      device: /mnt/weaviate/pod4
      o: bind
  weaviate-node-5-data:
    driver: local
    driver_opts:
      type: none
      device: /mnt/weaviate/pod5
      o: bind
  weaviate-node-6-data:
    driver: local
    driver_opts:
      type: none
      device: /mnt/weaviate/pod6
      o: bind
  weaviate-node-7-data:
    driver: local
    driver_opts:
      type: none
      device: /mnt/weaviate/pod7
      o: bind

networks:
  weaviate-net:
    driver: bridge
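
The seven service definitions differ only in their host name, published ports, volume name, and cluster bind ports. As a rough sketch (not part of the original setup), those per-node values can be derived from the node index, which makes copy-paste drift between services easier to spot. The function name `node_settings` and the dictionary layout are hypothetical; the port arithmetic simply mirrors the values in the file above.

```python
def node_settings(index: int) -> dict:
    """Return the per-node values used in the compose file for node `index` (1-7)."""
    if not 1 <= index <= 7:
        raise ValueError("the compose file defines nodes 1 through 7")
    offset = index - 1
    settings = {
        "service": f"weaviate-node-{index}",
        # host:container port mappings for REST, the 6060 port, and gRPC
        "ports": [
            f"{8080 + offset}:8080",
            f"{6050 + offset}:6060",
            f"{50051 + offset}:50051",
        ],
        "volume": f"weaviate-node-{index}-data:/var/lib/weaviate",
        "CLUSTER_HOSTNAME": f"node{index}",
        # each node takes two consecutive ports starting at 7100
        "CLUSTER_GOSSIP_BIND_PORT": str(7100 + 2 * offset),
        "CLUSTER_DATA_BIND_PORT": str(7101 + 2 * offset),
    }
    if index > 1:
        # every node except the first joins via node 1's gossip port
        settings["CLUSTER_JOIN"] = "weaviate-node-1:7100"
    return settings
```

Comparing `node_settings(i)` against each service block is one way to confirm that no port or host name was mistyped while duplicating the definitions.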

The errors we face are:

  1. curl -X POST http://localhost:8080/v1/graphql -H "Content-Type: application/json" -d '{"query":"{ Aggregate { Sitemanager_c_kafka_crm_deal { meta { count } } } }"}'

{"data":{"Aggregate":{"Sitemanager_c_kafka_crm_deal":null}},"errors":[{"locations":[{"column":15,"line":1}],"message":"shard j42oexiqtoCg: status code: 422, error: local index \"Sitemanager_c_kafka_crm_deal\" not found\n","path":["Aggregate","Sitemanager_c_kafka_crm_deal"]}]}

  2. The cluster only starts working again after a "docker-compose down" followed by a "docker-compose up". These errors are frequent, so we have to run docker-compose down far too often.
  3. Attached is a snapshot of another error pattern we frequently see.
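
To narrow down which nodes return the "local index not found" error, the same Aggregate query from the curl command can be sent to each node's published REST port (8080 through 8086 in the compose file). The following is a minimal sketch using only the Python standard library. The helper names `build_payload`, `shard_errors`, and `check_nodes` are hypothetical, and it assumes every node is reachable on localhost.

```python
import json
import urllib.request

COLLECTION = "Sitemanager_c_kafka_crm_deal"


def build_payload(collection: str) -> bytes:
    """Build the GraphQL Aggregate request body used in the curl command."""
    query = "{ Aggregate { %s { meta { count } } } }" % collection
    return json.dumps({"query": query}).encode("utf-8")


def shard_errors(response: dict) -> list:
    """Return the error messages from a GraphQL response, or [] if there are none."""
    return [err.get("message", "") for err in response.get("errors") or []]


def check_nodes(ports=range(8080, 8087), collection: str = COLLECTION) -> dict:
    """Send the Aggregate query to each port and collect error messages per port."""
    results = {}
    for port in ports:
        request = urllib.request.Request(
            f"http://localhost:{port}/v1/graphql",
            data=build_payload(collection),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(request, timeout=10) as reply:
                results[port] = shard_errors(json.load(reply))
        except OSError as exc:  # connection refused, timeout, or HTTP error status
            results[port] = [f"request failed: {exc}"]
    return results
```

Calling `check_nodes()` returns a dictionary keyed by port; an empty list means that node answered the query without errors, so the output shows whether the failure is tied to particular nodes.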

Hi! For multi-node deployments, we suggest Kubernetes.

I will close this thread in favor of: Shard assignment to Nodes not happening

Thanks!