How to handle concurrent requests in the rerank feature in Weaviate?

Currently, rerank does not support concurrent requests. Are there plans to support concurrent rerank requests in the future? If not, how can we reduce the time rerank takes across multiple collections? Thanks

Hi @ChengCheng !! Welcome to our community :slight_smile:

Not sure I follow. AFAIK, the rerank API requires all documents to be sent at the same time, as with the Cohere API, for example.

Can you elaborate on that?

Thanks!

Hi @DudaNogueira, thanks for your reply. We are using reranker-transformers. There are multiple classes (collections) in Weaviate, and reranker-transformers does not process them in parallel.
E.g., when we query 9 collections in parallel, Weaviate returns chunks from all 9 collections at the same time, but the Weaviate reranker then reranks the collections one by one (sequentially).
How can we speed up retrieval when we want to query multiple classes with the rerank feature in parallel?

Just a shot in the dark here… perhaps you can run multiple instances of Weaviate, each with its own reranker-transformers instance? I’m having the same problem and am interested in the solution. I’ll let you know how my experiment turns out.

Hi!

Welcome to our community @Benjamin_Lush ! :hugs:

I believe you will need to run multiple instances of the inference/reranker service for the same model, and run them behind a load balancer. :thinking:

With that you could spread the load?
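
To make the idea concrete, here is a minimal Python sketch of spreading rerank jobs for several collections across multiple reranker replicas in round-robin order. It is only an illustration: the replica URLs, `RoundRobinBalancer`, `rerank_on_replica`, and `rerank_all` are hypothetical names, not part of Weaviate or reranker-transformers, and the HTTP call is replaced by a stub. In a real deployment an external load balancer would normally sit in front of the replicas and do this for you.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

# Hypothetical addresses of identical reranker inference replicas.
RERANKER_REPLICAS = [
    "http://reranker-0:8080",
    "http://reranker-1:8080",
    "http://reranker-2:8080",
]


class RoundRobinBalancer:
    """Hands out replica URLs in round-robin order (thread-safe)."""

    def __init__(self, replicas):
        self._cycle = cycle(replicas)
        self._lock = threading.Lock()

    def next_replica(self):
        with self._lock:
            return next(self._cycle)


def rerank_on_replica(replica_url, collection, query):
    # Stub standing in for the real request to a reranker replica.
    # It only records which replica would handle which collection.
    return {"replica": replica_url, "collection": collection, "query": query}


def rerank_all(collection_names, query, balancer):
    # Assign a replica to each collection up front, then run the jobs
    # concurrently so no single replica processes all of them in sequence.
    assignments = [(balancer.next_replica(), name) for name in collection_names]
    workers = max(1, len(assignments))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(rerank_on_replica, url, name, query)
            for url, name in assignments
        ]
        # Results come back in the same order as collection_names.
        return [future.result() for future in futures]


if __name__ == "__main__":
    balancer = RoundRobinBalancer(RERANKER_REPLICAS)
    names = [f"Collection{i}" for i in range(9)]
    for job in rerank_all(names, "example query", balancer):
        print(job["collection"], "->", job["replica"])
```

With 9 collections and 3 replicas, each replica receives 3 jobs, so the sequential work per replica drops to roughly a third, assuming the replicas have comparable capacity.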