Currently, rerank does not support concurrent requests. Will concurrent rerank requests be supported in the future? If not, how can we reduce the time rerank takes across multiple collections? Thanks
Hi @ChengCheng !! Welcome to our community
Not sure I follow. AFAIK, a rerank API requires all documents to be sent in the same request, for example, the Cohere API:
Can you elaborate on that?
Thanks!
Hi @DudaNogueira, thanks for your reply. We are using reranker-transformers. There are multiple classes (collections) in Weaviate, and reranker-transformers cannot process them in parallel.
E.g. we query 9 collections in parallel and Weaviate returns the chunks from all 9 collections at the same time, but the Weaviate reranker then reranks the collections one by one (sequentially).
How can we speed up retrieval when we want to query multiple classes in parallel with the rerank feature enabled?
Just a shot in the dark here… perhaps you can run multiple instances of Weaviate, each with its own reranker-transformers instance? I’m having the same problem and am interested in the solution. I’ll let you know how my experiment turns out.
Hi!
Welcome to our community @Benjamin_Lush !
I believe you will need to run multiple instances of the inference/reranker container, all serving the same model, and put them behind a load balancer.
With that you could spread the load?
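To make the idea concrete, here is a minimal sketch of spreading rerank work for several collections across several reranker instances in round-robin fashion. It is only an illustration of the load-spreading pattern, not Weaviate's or reranker-transformers' actual API: the URLs, the `rerank` stand-in (which sleeps and scores by word overlap instead of calling a model), and `rerank_all` are all hypothetical names. In a real deployment the load balancer in front of the inference containers would do this distribution for you.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoints, one per reranker inference container.
RERANKER_URLS = [
    "http://reranker-1:8080",
    "http://reranker-2:8080",
    "http://reranker-3:8080",
]


def rerank(url, query, documents):
    """Stand-in for a request to one reranker instance (assumption).

    A real call would send the query and documents over HTTP. Here we
    sleep to mimic inference time and score by naive word overlap so
    the sketch runs offline.
    """
    time.sleep(0.05)
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(d.lower().split())), d) for d in documents]
    ranked = [d for _, d in sorted(scored, key=lambda s: s[0], reverse=True)]
    return {"served_by": url, "ranked": ranked}


def rerank_all(query, chunks_by_collection, urls=RERANKER_URLS):
    """Assign collections to instances round-robin and rerank them in parallel."""
    backends = itertools.cycle(urls)
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        futures = {
            name: pool.submit(rerank, next(backends), query, docs)
            for name, docs in chunks_by_collection.items()
        }
        # Collect results once every instance has finished its share.
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    chunks = {
        f"Collection{i}": ["rerank speed tips", "unrelated text", "how to speed up rerank"]
        for i in range(9)
    }
    for name, result in rerank_all("speed up rerank", chunks).items():
        print(name, result["served_by"], result["ranked"][0])
```

With three instances, the nine collections are reranked three at a time instead of one by one, so the total wait drops roughly in proportion to the number of instances, assuming each instance has its own CPU/GPU resources.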