Using a local reranker-transformers slows hybrid queries by 100x

Description

Using Weaviate running from docker-compose
Image: cr.weaviate.io/semitechnologies/weaviate:1.25.1
Reranker: cr.weaviate.io/semitechnologies/reranker-transformers:cross-encoder-ms-marco-MiniLM-L-6-v2

Server Setup Information

  • Weaviate Server Version: 1.25.1
  • Deployment Method: docker-compose
  • Multi Node? No
  • Client Language and Version: Python 3.12.3

Any additional Information

When running

from weaviate.classes.query import MetadataQuery, Rerank

# Hybrid search with reranking on the "text" property
collection = clientv4.collections.get(collection_name)
hybrid_documentsv4 = collection.query.hybrid(
    query=user_input,
    limit=4,
    query_properties=["text", "key"],
    rerank=Rerank(prop="text", query=user_input),
    return_metadata=MetadataQuery(score=True),
)

responses take 50000+ ms

When I disable reranking:

from weaviate.classes.query import MetadataQuery

# Same hybrid search, but without the rerank step
collection = clientv4.collections.get(collection_name)
hybrid_documentsv4 = collection.query.hybrid(
    query=user_input,
    limit=4,
    query_properties=["text", "key"],
    return_metadata=MetadataQuery(score=True),
)

responses take 500 ms (100x faster)
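For completeness, the comparison above was essentially just timing the two calls back to back. A rough sketch of that (the time_hybrid helper is only illustrative, not my actual code):

import time

from weaviate.classes.query import MetadataQuery, Rerank

def time_hybrid(collection, user_input, with_rerank: bool) -> float:
    # Run the same hybrid query with or without rerank and return wall time in ms
    start = time.perf_counter()
    collection.query.hybrid(
        query=user_input,
        limit=4,
        query_properties=["text", "key"],
        rerank=Rerank(prop="text", query=user_input) if with_rerank else None,
        return_metadata=MetadataQuery(score=True),
    )
    return (time.perf_counter() - start) * 1000

collection = clientv4.collections.get(collection_name)
print(f"with rerank:    {time_hybrid(collection, user_input, True):.0f} ms")
print(f"without rerank: {time_hybrid(collection, user_input, False):.0f} ms")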

My reranker-transformers Docker container peaks at roughly 109% of one of my 10 CPU cores and about 1.6 GB of RAM during the ~50-second call. Even when I run parallel Python threads, reranking gets no faster, and I cannot get the container to use more CPU or RAM from the host.
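The parallel attempt was roughly along these lines (a sketch; run_rerank_query is just a wrapper around the hybrid call shown above):

from concurrent.futures import ThreadPoolExecutor

from weaviate.classes.query import MetadataQuery, Rerank

def run_rerank_query(query_text: str):
    # Same hybrid + rerank call as in the first snippet
    collection = clientv4.collections.get(collection_name)
    return collection.query.hybrid(
        query=query_text,
        limit=4,
        query_properties=["text", "key"],
        rerank=Rerank(prop="text", query=query_text),
        return_metadata=MetadataQuery(score=True),
    )

# Several queries in parallel threads -- total wall time does not improve,
# and the reranker container never uses more than about one CPU core.
queries = [user_input] * 4
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_rerank_query, queries))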

I have confirmed that my Docker machine has access to up to 10 cores and 14 GB of RAM and that there is no resource contention while the hybrid search runs. This seems to be purely an issue with the container, or with how the Weaviate client does the reranking. I am unsure how to improve performance; 50 seconds is dreadfully slow!

Hi!

I believe that to improve performance you will need CUDA-enabled hardware and to make sure the reranker container actually runs with it.
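If you do have an NVIDIA GPU available, the reranker container can be pointed at it from docker-compose. This is only a rough sketch (the service name, port, and Weaviate environment variables are the usual defaults, so adjust them to your own compose file):

# Sketch of a reranker-transformers service with CUDA enabled.
# Requires an NVIDIA GPU, drivers, and the NVIDIA Container Toolkit on the host.
services:
  reranker-transformers:
    image: cr.weaviate.io/semitechnologies/reranker-transformers:cross-encoder-ms-marco-MiniLM-L-6-v2
    environment:
      ENABLE_CUDA: '1'   # '0' falls back to CPU, which is what you are seeing now
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  weaviate:
    # ... your existing weaviate service ...
    environment:
      ENABLE_MODULES: 'reranker-transformers'
      RERANKER_INFERENCE_API: 'http://reranker-transformers:8080'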

Let me know if that helps.

Thanks!

I had a similar issue. Same docker container. In my case, the memory usage of the container climbed a bit with each rerank call. Eventually the host machine needed to use swap, which reduced performance. Then the host machine crashed. Took some time to diagnose as I couldn’t even ssh to the machine when out of memory!
So there must be a memory leak in the reranker container somewhere. I note from the source on GitHub that the CrossEncoder class uses the thread pool, and there are some issues filed against sentence-transformers relating to memory leaks and the thread pool.
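For reproduction, the pattern on our side was just repeated rerank queries in a loop while watching docker stats on the host; roughly like this sketch (reusing the collection and query from the snippets above):

import time

from weaviate.classes.query import MetadataQuery, Rerank

# Repeated hybrid + rerank calls; the container's memory climbs a little on
# each call (observe with `docker stats` alongside this loop).
collection = clientv4.collections.get(collection_name)
for _ in range(200):
    collection.query.hybrid(
        query=user_input,
        limit=4,
        query_properties=["text", "key"],
        rerank=Rerank(prop="text", query=user_input),
        return_metadata=MetadataQuery(score=True),
    )
    time.sleep(1)  # optional pacing between calls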

hi @Nicholas_Miller !!

Welcome to our community!!

You mean this code, right?

So, based on your findings, this could be something coming from upstream?

Could you open an issue so we can keep track of that, and mention this thread in it?

Thanks for helping us on that!