Error resolving node name to host

I have a Weaviate server running locally in a Docker container. I previously started this server and added around 400,000 objects to it. However, after stopping the Docker container and restarting it, I can no longer add or query objects in the Weaviate database. When I run the container, the weaviate and t2v-transformers services start up fine, as shown in this output:

> docker-compose up
[+] Running 5/5
 ✔ weaviate 4 layers [⣿⣿⣿⣿]      0B/0B      Pulled                                                                 1.8s
   ✔ 7264a8db6415 Already exists                                                                                   0.0s
   ✔ f493950e9c5a Pull complete                                                                                    0.8s
   ✔ f01cb8c5845d Pull complete                                                                                    0.9s
   ✔ 551af5a2b9f8 Pull complete                                                                                    0.9s
[+] Running 3/3
 ✔ Network vid_creation_tool_default               Created                                                         0.0s
 ✔ Container vid_creation_tool-t2v-transformers-1  Created                                                         0.1s
 ✔ Container vid_creation_tool-weaviate-1          Created                                                         0.1s
Attaching to vid_creation_tool-t2v-transformers-1, vid_creation_tool-weaviate-1
vid_creation_tool-weaviate-1          | {"action":"startup","default_vectorizer_module":"text2vec-transformers","level":"info","msg":"the default vectorizer modules is set to \"text2vec-transformers\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2023-08-23T23:26:09Z"}
vid_creation_tool-weaviate-1          | {"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2023-08-23T23:26:09Z"}
vid_creation_tool-weaviate-1          | {"action":"transformer_remote_wait_for_startup","error":"send check ready request: Get \"http://t2v-transformers:8080/.well-known/ready\": dial tcp 172.23.0.2:8080: connect: connection refused","level":"warning","msg":"transformer remote inference service not ready","time":"2023-08-23T23:26:10Z"}
vid_creation_tool-t2v-transformers-1  | INFO:     Started server process [7]
vid_creation_tool-t2v-transformers-1  | INFO:     Waiting for application startup.
vid_creation_tool-t2v-transformers-1  | INFO:     CUDA_PER_PROCESS_MEMORY_FRACTION set to 1.0
vid_creation_tool-t2v-transformers-1  | INFO:     CUDA_CORE set to cuda:0
vid_creation_tool-weaviate-1          | {"action":"transformer_remote_wait_for_startup","error":"send check ready request: Get \"http://t2v-transformers:8080/.well-known/ready\": dial tcp 172.23.0.2:8080: connect: connection refused","level":"warning","msg":"transformer remote inference service not ready","time":"2023-08-23T23:26:11Z"}
vid_creation_tool-t2v-transformers-1  | INFO:     Application startup complete.
vid_creation_tool-t2v-transformers-1  | INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
vid_creation_tool-t2v-transformers-1  | INFO:     172.23.0.3:40492 - "GET /.well-known/ready HTTP/1.1" 204 No Content
vid_creation_tool-weaviate-1          | {"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50051","time":"2023-08-23T23:26:12Z"}
vid_creation_tool-weaviate-1          | {"action":"restapi_management","level":"info","msg":"Serving weaviate at http://[::]:8080","time":"2023-08-23T23:26:12Z"}
vid_creation_tool-weaviate-1          | {"action":"requests_total","api":"rest","class_name":"","error":"list objects: search index word: remote shard object search HRafiqzDyTSd: resolve node name \"391d06c00e39\" to host","level":"error","msg":"unexpected error","query_type":"objects","time":"2023-08-23T23:26:17Z"}

I can connect to the Weaviate instance with the Python weaviate client, and client.is_ready() returns True. But when I query all the words in the database, for example with client.query.aggregate("Word").with_meta_count().do(), the request fails with the last error shown above: Weaviate cannot resolve the node name "391d06c00e39" to a host.
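
For reference, the two client calls above correspond roughly to plain HTTP requests against Weaviate. The sketch below uses only the Python standard library and rests on a few assumptions: the instance is reachable at http://localhost:8080 (matching the ports mapping in the docker-compose.yaml), readiness is checked with GET /v1/.well-known/ready, and queries go to the GraphQL endpoint POST /v1/graphql. The function names are hypothetical. Issuing the query without the client can help separate a client-side problem from a server-side one: if meta_count raises with the same "resolve node name" message, the failure is presumably inside Weaviate.

```python
import json
import urllib.request

# Assumption: Weaviate is exposed on localhost:8080, as in the compose file.
BASE_URL = "http://localhost:8080"


def is_ready(base_url: str = BASE_URL) -> bool:
    """Roughly what client.is_ready() checks: GET /v1/.well-known/ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/.well-known/ready", timeout=5) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers URLError, HTTPError and refused connections
        return False


def build_meta_count_query(class_name: str) -> str:
    """GraphQL roughly equivalent to client.query.aggregate(class_name).with_meta_count()."""
    return "{ Aggregate { %s { meta { count } } } }" % class_name


def meta_count(class_name: str, base_url: str = BASE_URL) -> int:
    """POST the aggregate query and return the object count for class_name."""
    payload = json.dumps({"query": build_meta_count_query(class_name)}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/graphql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    # GraphQL reports query failures in an "errors" list instead of an HTTP error.
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    # Aggregate returns a list of groups per class; without groupBy there is one.
    return body["data"]["Aggregate"][class_name][0]["meta"]["count"]
```

Usage, with the container running: print(is_ready(), meta_count("Word")).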

This also happens when I try to add any object or to query an object by its UUID. I run the container locally with docker-compose up, using the following docker-compose.yaml file:

---
version: '3.4'
services:
  weaviate:
    volumes:
      - ./data:/var/lib/weaviate
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: semitechnologies/weaviate:1.21.1
    ports:
    - 8080:8080
    restart: on-failure:0
    environment:
      TRANSFORMERS_INFERENCE_API: 'http://t2v-transformers:8080'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-transformers'
      ENABLE_MODULES: 'text2vec-transformers'
      CLUSTER_HOSTNAME: 'node1'
  t2v-transformers:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: 'all'
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'
...

I tried restarting the container, deleting and recreating it, and using a new docker-compose.yaml file. I found that the issue only occurs when I mount my ./data directory as a volume at /var/lib/weaviate, which was not a problem before. I also searched for the node name “391d06c00e39” but could not find any reference to it in my container or logs, and I could not find similar issues elsewhere. Has anyone encountered something like this? Is it possible that my persisted data has been corrupted?

Hi @Shreyas_Agarwal!

Welcome to our community 🤗

I assume you did some migration, right?

Check out this guide; it appears to have the fix for this error message:

TL;DR

If you see the error message "shard Knuw6a360eCY: resolve node name \"5b6030dbf9ea\" to host", you can make Weaviate usable again by setting 5b6030dbf9ea as the host name: CLUSTER_HOSTNAME=5b6030dbf9ea.
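
The TL;DR comes down to two steps: read the stale node name out of the error, then start Weaviate under that name. A minimal stdlib-only sketch of both steps follows. stale_node_name and set_cluster_hostname are hypothetical helpers; the regex assumes the error keeps the wording shown in this thread, and the compose change is a plain-text substitution (chosen over a YAML library so the file's comments and layout survive).

```python
import json
import re
from typing import Optional

# Matches the name in: resolve node name "391d06c00e39" to host
# (with or without the backslash escapes seen in raw JSON log lines).
_NODE_RE = re.compile(r'resolve node name \\?"([^"\\]+)\\?" to host')

# Matches an existing "CLUSTER_HOSTNAME: <value>" line in docker-compose.yaml.
_HOSTNAME_RE = re.compile(r"^([ \t]*CLUSTER_HOSTNAME:[ \t]*).*$", re.MULTILINE)


def stale_node_name(log_line: str) -> Optional[str]:
    """Return the node name Weaviate failed to resolve, or None."""
    text = log_line
    # docker-compose prefixes each line with "<container> | " before the JSON.
    start = log_line.find("{")
    if start != -1:
        try:
            text = str(json.loads(log_line[start:]).get("error", log_line))
        except json.JSONDecodeError:
            pass  # not JSON; search the raw line instead
    match = _NODE_RE.search(text)
    return match.group(1) if match else None


def set_cluster_hostname(compose_text: str, node_name: str) -> str:
    """Replace the CLUSTER_HOSTNAME value in the compose file text."""
    new_text, count = _HOSTNAME_RE.subn(
        lambda m: f"{m.group(1)}'{node_name}'", compose_text
    )
    if count == 0:
        raise ValueError("no CLUSTER_HOSTNAME entry found")
    return new_text
```

Applied to the error line from the original post, stale_node_name returns 391d06c00e39, the value that resolved the issue in the reply below; passing it to set_cluster_hostname turns CLUSTER_HOSTNAME: 'node1' into CLUSTER_HOSTNAME: '391d06c00e39'.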

Let me know if that helps!

Thanks!

Thanks @DudaNogueira, that resolved my issue. I never did a migration; I have always used a Weaviate version later than v1.8.0. It’s a bit strange, since I had CLUSTER_HOSTNAME set to node1 before I first ran the container. Perhaps something changed while I was testing settings. It’s working now that I’ve set CLUSTER_HOSTNAME to 391d06c00e39.
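
One possible explanation for where 391d06c00e39 came from, offered as a hypothesis rather than a confirmed cause: it has the shape of a Docker short container ID (12 lowercase hex characters), as does 5b6030dbf9ea in the guide's example, and Docker uses that short ID as a container's default hostname. If Weaviate was at some point started without CLUSTER_HOSTNAME and fell back to the container's hostname (an assumption about Weaviate's default behavior), the shards in ./data would record that name, and a recreated container would get a different hostname. The hypothetical helper below only checks a node name against that pattern; socket.gethostname() run inside the container shows the current hostname for comparison.

```python
import re
import socket

# Docker short container ID: the first 12 hex characters of the full ID.
_DOCKER_SHORT_ID = re.compile(r"[0-9a-f]{12}")


def looks_like_docker_short_id(node_name: str) -> bool:
    """True if node_name has the shape of a Docker short container ID."""
    return _DOCKER_SHORT_ID.fullmatch(node_name) is not None


if __name__ == "__main__":
    for name in ("391d06c00e39", "node1"):
        print(name, looks_like_docker_short_id(name))
    # Inside a container this is usually the short container ID,
    # unless a `hostname:` key is set for the service in docker-compose.yaml.
    print("current hostname:", socket.gethostname())
```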
