Using helm chart to deploy 3x nodes to self hosted KS
Tried NodeJs weaviate-ts-client 1.4 and 2.0, same behaviour
While ingesting data and running NN queries, node randomly (sometimes 2h, sometimes 10 minutes between occurrences) disappears and starts again.
KS doesn’t restart the node - restart count is 0.
Node is unavailable to client for about 1min at which point (after retrying) it becomes available and ingestion/NN continues as normal
Frequency of restarts increases over time
Weaviate version: 1.22.3
Transformer: None (client supplies vectors)
Shards: 3, each with 8GB of which <3GB used
Replication: None
Number of elements in each shard: <1.5M
Number of clients: 16
Logs below.
Questions:
Is this normal? What’s going on?
Is any data lost during the ‘restart’?
Thanks for your help
Greg
- Logs of weaviate-0 (during weaviate-1 restart)
{"level":"debug","msg":" memberlist: Failed UDP ping: weaviate-1 (timeout reached)","time":"2023-11-28T17:07:37Z"}
{"level":"info","msg":" memberlist: Suspect weaviate-1 has failed, no acks received","time":"2023-11-28T17:07:37Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-0 172.19.246.32:7000","time":"2023-11-28T17:07:39Z"}
...
{"level":"debug","msg":" memberlist: Failed UDP ping: weaviate-1 (timeout reached)","time":"2023-11-28T17:07:40Z"}
{"level":"info","msg":" memberlist: Marking weaviate-1 as failed, suspect timeout reached (0 peer confirmations)","time":"2023-11-28T17:07:40Z"}
{"level":"info","msg":" memberlist: Suspect weaviate-1 has failed, no acks received","time":"2023-11-28T17:07:40Z"}
{"level":"debug","msg":" memberlist: Stream connection from=172.19.246.32:49008","time":"2023-11-28T17:07:50Z"}
{"level":"debug","msg":" memberlist: Stream connection from=172.19.246.239:34340","time":"2023-11-28T17:08:08Z"}
{"level":"error","msg":" memberlist: Conflicting address for weaviate-1. Mine: 172.19.246.140:7000 Theirs: 172.19.246.239:7000 Old state: 2","time":"2023-11-28T17:08:08Z"}
{"level":"error","msg":" memberlist: Conflicting address for weaviate-1. Mine: 172.19.246.140:7000 Theirs: 172.19.246.239:7000 Old state: 2","time":"2023-11-28T17:08:08Z"}
{"level":"error","msg":" memberlist: Conflicting address for weaviate-1. Mine: 172.19.246.140:7000 Theirs: 172.19.246.239:7000 Old state: 2","time":"2023-11-28T17:08:08Z"}
{"level":"error","msg":" memberlist: Conflicting address for weaviate-1. Mine: 172.19.246.140:7000 Theirs: 172.19.246.239:7000 Old state: 2","time":"2023-11-28T17:08:09Z"}
{"level":"error","msg":" memberlist: Conflicting address for weaviate-1. Mine: 172.19.246.140:7000 Theirs: 172.19.246.239:7000 Old state: 2","time":"2023-11-28T17:08:09Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: weaviate-0 172.19.246.32:7000","time":"2023-11-28T17:08:09Z"}
- Logs of weaviate-1 (after restart)
{"action":"startup","level":"debug","msg":"created startup context, nothing done so far","startup_time_left":"59m59.998481609s","time":"2023-11-28T17:08:08Z"}
{"action":"config_load","config_file_path":"/weaviate-config/conf.yaml","level":"info","msg":"Usage of the weaviate.conf.json file is deprecated and will be removed in the future. Please use environment variables.","time":"2023-11-28T17:08:08Z"}
{"deprecation":{"apiType":"Configuration","id":"config-files","locations":["--config-file=\"\""],"mitigation":"Configure Weaviate using environment variables.","msg":"use of deprecated command line argument --config-file","sinceTime":"2020-09-08T09:46:00.000Z","
sinceVersion":"0.22.16","status":"deprecated"},"level":"warning","msg":"use of deprecated command line argument --config-file","time":"2023-11-28T17:08:08Z"}
{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2023-11-28T17:08:08Z"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2023-11-28T17:08:08Z"}
{"action":"startup","level":"debug","msg":"config loaded","startup_time_left":"59m59.997738065s","time":"2023-11-28T17:08:08Z"}
{"action":"startup","level":"debug","msg":"configured OIDC and anonymous access client","startup_time_left":"59m59.997723864s","time":"2023-11-28T17:08:08Z"}
{"action":"startup","level":"debug","msg":"initialized schema","startup_time_left":"59m59.997716464s","time":"2023-11-28T17:08:08Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: 172.19.246.32:7000","time":"2023-11-28T17:08:08Z"}
{"level":"warning","msg":" memberlist: Refuting a suspect message (from: weaviate-1)","time":"2023-11-28T17:08:08Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: 172.19.246.239:7000","time":"2023-11-28T17:08:08Z"}
{"level":"debug","msg":" memberlist: Stream connection from=172.19.246.239:44148","time":"2023-11-28T17:08:08Z"}
{"level":"debug","msg":" memberlist: Initiating push/pull sync with: 172.19.246.108:7000","time":"2023-11-28T17:08:08Z"}
{"action":"startup","level":"debug","msg":"startup routine complete","time":"2023-11-28T17:08:08Z"}
{"action":"startup","level":"debug","msg":"start registering modules","time":"2023-11-28T17:08:08Z"}
{"action":"startup","level":"debug","msg":"completed registering modules","time":"2023-11-28T17:08:08Z"}
{"action":"cluster_api_startup","level":"debug","msg":"serving cluster api on port 7001","port":7001,"time":"2023-11-28T17:08:08Z"}
{"action":"hnsw_grow_index","level":"debug","msg":"index grown from 1000 to 3000, took 48.103µs\n","new_size":3000,"previous_size":1000,"time":"2023-11-28T17:08:08Z","took":48103}
{"action":"hnsw_grow_index","level":"debug","msg":"index grown from 3000 to 5000, took 19.401µs\n","new_size":5000,"previous_size":3000,"time":"2023-11-28T17:08:08Z","took":19401}
...
{"action":"hnsw_grow_index","level":"debug","msg":"index grown from 975616 to 1219520, took 9.448062ms\n","new_size":1219520,"previous_size":975616,"time":"2023-11-28T17:08:11Z","took":9448062}
{"action":"hnsw_grow_index","level":"debug","msg":"index grown from 1219520 to 1524400, took 14.119639ms\n","new_size":1524400,"previous_size":1219520,"time":"2023-11-28T17:08:13Z","took":14119639}
{"action":"lsm_init_disk_segment_build_bloom_filter_secondary","class":"MyClass","index":"myclass","level":"debug","msg":"building bloom filter took 8.865927ms\n","path":"/var/lib/weaviate/myclass_dRVK6rneWdGH_lsm/objects/segment-1701191194984974204.db","secondary_index_position":0,"shard":"dRVK6rneWdGH","time":"2023-11-28T17:08:15Z","took":8865927}
...
{"action":"lsm_init_disk_segment_build_bloom_filter_primary","class":"MyClass","index":"myclass","level":"debug","msg":"building bloom filter took 24.338447ms\n","path":"/var/lib/weaviate/myclass_dRVK6rneWdGH_lsm/property_commonclusterid/segment-1701191195173181719.db","shard":"dRVK6rneWdGH","time":"2023-11-28T17:08:16Z","took":24338447}
{"action":"startup","level":"debug","msg":"shard=dRVK6rneWdGH is ready","time":"2023-11-28T17:08:16Z"}
{"action":"startup","level":"debug","msg":"shard=dRVK6rneWdGH is ready","time":"2023-11-28T17:08:16Z"}