Using a local reranker-transformers reduces performance by 100x

Description

Using Weaviate running from docker-compose
Image: cr.weaviate.io/semitechnologies/weaviate:1.25.1
Reranker: cr.weaviate.io/semitechnologies/reranker-transformers:cross-encoder-ms-marco-MiniLM-L-6-v2

Server Setup Information

  • Weaviate Server Version: 1.25.1
  • Deployment Method: docker-compose
  • Multi Node? No
  • Client Language and Version: Python 3.12.3

Any additional Information

When running

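from weaviate.classes.query import MetadataQuery, Rerank

# clientv4 is an already-connected v4 client; collection_name and user_input are defined elsewhere
collection = clientv4.collections.get(collection_name)
hybrid_documentsv4 = collection.query.hybrid(
    query=user_input,
    limit=4,
    query_properties=["text", "key"],
    rerank=Rerank(prop="text", query=user_input),  # rerank the hits on the "text" property
    return_metadata=MetadataQuery(score=True)
)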

responses take 50000+ ms

When I disable reranking:

collection = clientv4.collections.get(collection_name)
hybrid_documentsv4 = collection.query.hybrid(
    query=user_input,
    limit=4,
    query_properties=["text", "key"],
    return_metadata=MetadataQuery(score=True)
)

responses take 500 ms (100x faster)
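
For anyone who wants to reproduce the comparison, here is a minimal timing sketch. The helper names `time_call_ms` and `summarize` are hypothetical and not part of the Weaviate client; the idea is to wrap each of the two queries above in a zero-argument function and time them the same way.

```python
import statistics
import time
from typing import Callable, List


def time_call_ms(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call fn() `repeats` times and return each wall-clock duration in ms."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def summarize(label: str, durations: List[float]) -> str:
    """Format the median and worst-case latency for one set of timings."""
    return (
        f"{label}: median {statistics.median(durations):.0f} ms, "
        f"max {max(durations):.0f} ms"
    )


# Example usage (assumes `collection` and `user_input` exist as in the post):
# with_rerank = time_call_ms(lambda: collection.query.hybrid(
#     query=user_input, limit=4, rerank=Rerank(prop="text", query=user_input)))
# print(summarize("hybrid + rerank", with_rerank))
```

Taking the median over a few repeats keeps a one-off cold start (for example, the model loading on the first request) from dominating the result.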

My reranker-transformers Docker container peaks at about 109% CPU (roughly 1 of the 10 available cores) and 1.6 GB of RAM during the 50 seconds it takes to execute. Running parallel Python threads does not speed up reranking, and I cannot get the container to use more of the host's CPU or RAM.
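
A rough sketch of what the parallel-threads test could look like, for reference. `run_parallel` is a hypothetical helper, and `search` in the usage comment stands in for a function that issues one reranked hybrid query; neither is part of the Weaviate client.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_parallel(
    fn: Callable[[T], R], inputs: Sequence[T], workers: int = 4
) -> Tuple[List[R], float]:
    """Run fn over inputs on a thread pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order in the returned results
        results = list(pool.map(fn, inputs))
    return results, time.perf_counter() - start


# Example usage (hypothetical `search` wrapping the reranked hybrid query):
# results, elapsed = run_parallel(search, ["query one", "query two"], workers=4)
# print(f"{len(results)} queries in {elapsed:.1f} s")
```

If the total time grows roughly linearly with the number of queries, the requests are effectively being processed one at a time somewhere downstream, which would match the single busy core described above.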

I have confirmed that my Docker machine can access up to 10 cores and 14 GB of RAM, and there is no resource contention during the hybrid search. This seems to be purely an issue with the container or with how the Weaviate client does the reranking. I am unsure how to improve performance. 50 seconds is dreadfully slow!

Hi!

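I believe that to improve performance you will need CUDA-enabled hardware, and you will need to make sure the reranker container actually runs on it (for the reranker-transformers image, CUDA is switched on with the ENABLE_CUDA environment variable).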

Let me know if that helps.

Thanks!