Hybrid query with rerank fails with "context deadline exceeded" error

Description

When I run a hybrid query with rerank, I get a "context deadline exceeded" error. If I run a near_text or bm25 query with rerank, it works fine. My code is as follows:

from weaviate.classes.query import Filter, HybridFusion, MetadataQuery, Rerank

collection = weaviate_client.collections.get(index_name)
response = collection.query.hybrid(
    query=query,
    limit=10,
    filters=Filter.by_property("key").not_equal(to_int(key)),
    rerank=Rerank(prop="text", query=query),
    return_metadata=MetadataQuery(score=True, explain_score=True),
    alpha=0.75,
    fusion_type=HybridFusion.RELATIVE_SCORE,
)

And the exception info is:

Query call with protocol GRPC search failed with message <AioRpcError of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "explorer: get class: extend: extend rerank: error ranking with cohere: send POST request: Post "http://reranker-transformers:8080/rerank": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
	debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-08-07T23:37:00.889730113+08:00", grpc_status:2, grpc_message:"explorer: get class: extend: extend rerank: error ranking with cohere: send POST request: Post \"http://reranker-transformers:8080/rerank\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}"
>.

If I set rerank to None, the hybrid query works fine. Using rerank with near_text or bm25 also works fine.

The reranker is built from reranker-transformers v1.1.1, and the model is BAAI/bge-reranker-large.

Server Setup Information

  • Weaviate Server Version: 1.25.10 (docker-compose file below)
version: "3.4"
services:
  weaviate:
    image: semitechnologies/weaviate:1.25.10
    ports:
      - "8088:8080"
      - "50051:50051"
    volumes:
      - ./data:/var/lib/weaviate
    restart: on-failure:0
    networks:
      - weaviate_default
    environment:
      TRANSFORMERS_INFERENCE_API: 'http://t2v-transformers:8080'
      RERANKER_INFERENCE_API: 'http://reranker-transformers:8080'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-transformers'
      ENABLE_MODULES: 'text2vec-transformers,reranker-transformers'
      CLUSTER_HOSTNAME: 'node1'
  t2v-transformers:
    image: semitechnologies/transformers-inference:baai-bge-m3-onnx
    networks:
      - weaviate_default
    environment:
      ENABLE_CUDA: 0 # set to 1 to enable
  reranker-transformers:
    build:
      context: reranker-transformers-1.1.1
      dockerfile: Dockerfile
      args:
        HF_ENDPOINT: "https://hf-mirror.com"
        MODEL_NAME: "BAAI/bge-reranker-large"
    image: weaviate-reranker-transformers:latest
    networks:
      - weaviate_default
    environment:
      ENABLE_CUDA: '0'
networks:
  weaviate_default:
    driver: bridge
  • Deployment Method: docker
  • Client Language and Version: Python, with weaviate-client==4.7.1

hi @vk_Cheung !!

Can you inspect the object that you get from both queries?

You are probably getting bigger objects back from the hybrid query, which takes its toll on the reranker service and likely causes the timeout.
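One way to compare the two result sets is to measure how much text each query would hand to the reranker. The helper below is only a sketch: `text_size_summary` is a hypothetical name, and it assumes each result object exposes a `properties` dict, as the objects in `response.objects` do in the v4 Python client. The stand-in objects are there so the snippet runs without a Weaviate instance.

```python
from types import SimpleNamespace


def text_size_summary(objects, prop="text"):
    """Return (object count, total characters, longest value) for one property."""
    # Missing or None values count as empty strings.
    lengths = [len(str(obj.properties.get(prop) or "")) for obj in objects]
    return len(lengths), sum(lengths), max(lengths, default=0)


# Stand-in objects shaped like the entries of response.objects (assumption).
fake_objects = [
    SimpleNamespace(properties={"text": "short passage"}),
    SimpleNamespace(properties={"text": "x" * 5000}),
]
print(text_size_summary(fake_objects))
```

Running the same helper on `response.objects` from the hybrid query (without rerank) and from the bm25 or near_text query should show whether the hybrid results carry noticeably longer `text` values.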

Is this query taking a long time? If this is a timeout issue, the environment variable that controls the module client timeout is MODULES_CLIENT_TIMEOUT. Its default is 50s.

You can try increasing the timeout.
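As a rough sketch, the variable would go in the environment section of the weaviate service in your docker-compose file. The value 120s is only an illustrative assumption, not a recommended setting, and the other existing entries are omitted here.

```yaml
services:
  weaviate:
    environment:
      # Illustrative value only (assumption); the default is 50s.
      MODULES_CLIENT_TIMEOUT: '120s'
```

The container needs to be recreated (for example with docker compose up -d) for the new value to take effect.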

However, whenever you find yourself tweaking Weaviate's default timeout values, you should first look at the resources available to your Weaviate cluster.

Let me know if this helps.

Thanks!