Unable to connect to weaviate running behind reverse proxy

Description

I have Weaviate running in Docker on my local machine, with port 8080 proxied to one URL and port 50051 proxied to another. I’m connecting to them with the following code:

import weaviate
from weaviate.classes.init import AdditionalConfig, Timeout
from weaviate.config import ConnectionConfig

client = weaviate.connect_to_custom(
    http_host='host-a',
    grpc_host='host-b',
    http_port=443,
    http_secure=True,
    grpc_port=443,
    grpc_secure=True,
    skip_init_checks=True,
    additional_config=AdditionalConfig(
        connection=ConnectionConfig(
            session_pool_connections=30,
            session_pool_maxsize=200,
            session_pool_max_retries=3,
        ),
        # timeout=(60, 180),
        timeout=Timeout(init=60, query=60, insert=180),  # Values in seconds
    ),
)

I get the following error when I call self.vectorstore.add_documents(documents=document):

2024-Aug-01 11:19 AM - langchain_weaviate.vectorstores - ERROR - Failed to add object: None
Reason: WeaviateBatchError('Query call with protocol GRPC batch failed with message <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = "Received http2 header with status: 403"\n\tdebug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-08-01T11:19:42.004222-05:00", grpc_status:7, grpc_message:"Received http2 header with status: 403"}"\n>.')

Hi @gdrajhasekarun!

Welcome to our community :hugs:

This error indicates a problem with how your Weaviate instance is exposed: the 403 is an HTTP response coming back on the gRPC connection.
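To make the reasoning concrete: "Received http2 header with status: 403" means the peer answered with a plain HTTP status instead of a gRPC status, and the client translated it. The sketch below reproduces the translation table from the gRPC project's HTTP-to-gRPC status mapping document. The function name is hypothetical, and the thread does not tell us which hop (Traefik, Cloudflare, or something else) produced the 403.

```python
# How a gRPC client translates a bare HTTP status (no grpc-status header)
# into a gRPC status code, per the gRPC HTTP-to-gRPC status mapping document.
HTTP_TO_GRPC_STATUS = {
    400: "INTERNAL",
    401: "UNAUTHENTICATED",
    403: "PERMISSION_DENIED",
    404: "UNIMPLEMENTED",
    429: "UNAVAILABLE",
    502: "UNAVAILABLE",
    503: "UNAVAILABLE",
    504: "UNAVAILABLE",
}


def grpc_status_for_http(http_status: int) -> str:
    """Hypothetical helper: name the gRPC status a client reports for an HTTP status."""
    return HTTP_TO_GRPC_STATUS.get(http_status, "UNKNOWN")


print(grpc_status_for_http(403))  # PERMISSION_DENIED
```

So PERMISSION_DENIED here most likely reflects an HTTP-level rejection somewhere in front of Weaviate rather than a permission check inside Weaviate itself.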

Can you share more info on how you have exposed it?

Also, check this forum thread on this topic:

This is my docker compose file.

version: '3.4'
networks:
  frontend:
    external: true
  backend:
    external: true
    
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.25.9
    # ports:
    # - 8080:8080
    # - 50051:50051
    volumes:
      - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:

      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: 'text2vec-cohere,text2vec-huggingface,text2vec-palm,text2vec-openai,generative-openai,generative-cohere,generative-palm,ref2vec-centroid,reranker-cohere,qna-openai'
      CLUSTER_HOSTNAME: 'node1'
    networks:
      - frontend
      - backend
    labels:
      - 'traefik.enable=true'
      - 'traefik.docker.network=frontend'
      - 'traefik.http.routers.weavite.entrypoints=websecure'
      - 'traefik.http.routers.weavite.tls=true'
      - 'traefik.http.routers.weavite.tls.certresolver=production'
      - 'traefik.http.routers.weavite.service=weavite_1'
      - 'traefik.http.services.weavite_1.loadbalancer.server.port=8080'
      - 'traefik.http.routers.weavite.rule=Host(`weavite.<domain>`)'

      - 'traefik.http.routers.grc_weavite.entrypoints=websecure'
      - 'traefik.http.routers.grc_weavite.tls=true'
      - 'traefik.http.routers.grc_weavite.tls.certresolver=production'
      - 'traefik.http.routers.grc_weavite.service=grc_weavite_1'
      - 'traefik.http.services.grc_weavite_1.loadbalancer.server.port=50051'
      - 'traefik.http.routers.grc_weavite.rule=Host(`grc-weavite.<domain>`)'
volumes:
  weaviate_data:

I’m getting the following error:
2024-Aug-01 11:19 AM - langchain_weaviate.vectorstores - ERROR - Failed to add object: None
Reason: WeaviateBatchError('Query call with protocol GRPC batch failed with message <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = "Received http2 header with status: 403"\n\tdebug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-08-01T11:19:42.004222-05:00", grpc_status:7, grpc_message:"Received http2 header with status: 403"}"\n>.')

Check this gist:

These are the Traefik labels that worked for me to expose both the HTTP and gRPC endpoints correctly:

    labels:
      - "traefik.enable=true"
      # http
      - "traefik.http.services.weaviate_http_service.loadbalancer.server.port=8080"
      - "traefik.http.routers.weaviate_http_router.rule=Host(`weaviate.yourdomain.com`)"
      - "traefik.http.routers.weaviate_http_router.entrypoints=websecure"
      - "traefik.http.routers.weaviate_http_router.service=weaviate_http_service"
      - "traefik.http.routers.weaviate_http_router.tls.certresolver=myresolver"
      # grpc
      - "traefik.http.services.weaviate_grpc_service.loadbalancer.server.scheme=h2c"
      - "traefik.http.services.weaviate_grpc_service.loadbalancer.server.port=50051"
      - "traefik.http.routers.weaviate_grpc_router.rule=Host(`grpc.weaviate.yourdomain.com`)"
      - "traefik.http.routers.weaviate_grpc_router.entrypoints=grpc"
      - "traefik.http.routers.weaviate_grpc_router.service=weaviate_grpc_service"
      - "traefik.http.routers.weaviate_grpc_router.tls.certresolver=myresolver"
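One visible difference between the two label sets in this thread is that the working one sets `loadbalancer.server.scheme=h2c` on the gRPC service, which tells Traefik to speak cleartext HTTP/2 to the backend, while the labels in the question do not. The sketch below is a small, hypothetical lint (the function name and the shortened label lists are illustrative, not part of Traefik or Weaviate) that flags services pointing at the gRPC port without that scheme. Whether this is the only cause of the 403 is an assumption; the gRPC entrypoint and any proxy in front of Traefik can matter as well.

```python
import re

# Patterns for the two Traefik service labels this check cares about.
PORT_RE = re.compile(
    r"^traefik\.http\.services\.([^.]+)\.loadbalancer\.server\.port=(\d+)$"
)
SCHEME_RE = re.compile(
    r"^traefik\.http\.services\.([^.]+)\.loadbalancer\.server\.scheme=(\w+)$"
)


def grpc_services_missing_h2c(labels, grpc_port=50051):
    """Return names of services that target grpc_port without scheme=h2c."""
    ports, schemes = {}, {}
    for label in labels:
        port_match = PORT_RE.match(label)
        if port_match:
            ports[port_match.group(1)] = int(port_match.group(2))
        scheme_match = SCHEME_RE.match(label)
        if scheme_match:
            schemes[scheme_match.group(1)] = scheme_match.group(2)
    return [
        name
        for name, port in ports.items()
        if port == grpc_port and schemes.get(name) != "h2c"
    ]


# Service labels taken from the compose file in the question.
question_labels = [
    "traefik.http.services.weavite_1.loadbalancer.server.port=8080",
    "traefik.http.services.grc_weavite_1.loadbalancer.server.port=50051",
]

# Service labels taken from the working example above.
working_labels = [
    "traefik.http.services.weaviate_http_service.loadbalancer.server.port=8080",
    "traefik.http.services.weaviate_grpc_service.loadbalancer.server.scheme=h2c",
    "traefik.http.services.weaviate_grpc_service.loadbalancer.server.port=50051",
]

print(grpc_services_missing_h2c(question_labels))  # ['grc_weavite_1']
print(grpc_services_missing_h2c(working_labels))   # []
```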

Thank you.

My issue is still not resolved. Could you share the Traefik config for gRPC?

Sure. It is on that thread I pasted above.

Here is the gist:

On that same thread there are also some ways to test the gRPC endpoint.

Note that this should work with any method of exposing gRPC through a reverse proxy.

Thanks!

I followed the steps mentioned in the thread.

  1. Opened GRPC in Cloudflare (No gateway config)
  2. Followed the steps for Traefik and Weaviate.

I was not able to connect to gRPC using this command:

grpcurl -d '{"service": "Weaviate"}' -proto health.proto grpc.weaviate.mydomain.com:50051 grpc.health.v1.Health/Check

Oh, that indicates your gRPC service may not be properly exposed.

What is the error message?

Failed to dial target host ":50051": context deadline exceeded

I also tried the following: I published ports 8080 and 50051 directly from the Docker container. Now localhost:8080 responds as follows.
Command: curl http://localhost:8080/v1/nodes
Response:
{"nodes":[{"batchStats":{"queueLength":0,"ratePerSecond":0},"gitHash":"6aeae65","name":"node1","shards":null,"stats":{"objectCount":0,"shardCount":1},"status":"HEALTHY","version":"1.23.5"}]}

For gRPC, I ran the following.
Command: grpcurl -d '{"service": "Weaviate"}' -proto health.proto localhost:50051 grpc.health.v1.Health/Check
Response: Failed to dial target host "localhost:50051": tls: first record does not look like a TLS handshake

Oh, wait. If you don’t have SSL on that port, you need to pass the -plaintext flag, like so:

grpcurl --plaintext -d '{"service": "Weaviate"}' -proto health.proto localhost:50051 grpc.health.v1.Health/Check
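The two grpcurl failures in this thread point at different layers: "context deadline exceeded" usually means the dial never completed (port not reachable, or a handshake that hangs), while "first record does not look like a TLS handshake" means something answered but not with TLS. As a rough, standard-library-only way to tell these apart before reaching for grpcurl, here is a sketch. The function name and the three result labels are made up for illustration, and it checks only the transport, not the gRPC health service itself.

```python
import socket
import ssl


def probe_endpoint(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP endpoint as 'unreachable', 'tls', or 'no-tls'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, timed out, or DNS failure: nothing is listening for us there.
        return "unreachable"

    # Certificate validation is disabled on purpose: we only want to know
    # whether the peer speaks TLS at all, not whether its certificate is valid.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with context.wrap_socket(sock, server_hostname=host):
            return "tls"
    except OSError:
        # ssl.SSLError and handshake timeouts are both OSError subclasses.
        return "no-tls"
    finally:
        sock.close()


if __name__ == "__main__":
    # Assumed target: the locally published gRPC port from this thread.
    print(probe_endpoint("localhost", 50051))
```

A result of "no-tls" matches the case where grpcurl needs -plaintext, and "tls" matches the case where it should be called without that flag.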

Thanks!

I was able to get it working on localhost using the command you gave me. Let me try at the Traefik and Cloudflare level.

On Cloudflare, I enabled gRPC at the domain level. I was using a Zero Trust tunnel.

Oh! Is it working now?

Yes, at the localhost level, but not yet through Traefik and Cloudflare. I’m researching that.

Cloudflare requires all gRPC services to be exposed over HTTP/2, so I’m routing the hostname “weaviate.grpc.” in Traefik on port 443. Traefik then connects to the container at “http://172.19.0.14:50051”. Do you see any issues here?

I’m mostly a developer with little knowledge on networking.