Description
Using Weaviate running from docker-compose
Image: cr.weaviate.io/semitechnologies/weaviate:1.25.1
Reranker: cr.weaviate.io/semitechnologies/reranker-transformers:cross-encoder-ms-marco-MiniLM-L-6-v2
Server Setup Information
- Weaviate Server Version: 1.25.1
- Deployment Method docker-compose
- Multi Node? No
- Client Language and Version: Python 3.12.3
Any additional Information
When running
collection = clientv4.collections.get(collection_name)
hybrid_documentsv4 = collection.query.hybrid(
query=user_input,
limit=4,
query_properties=["text", "key"],
rerank=Rerank(prop="text", query=user_input),
return_metadata=MetadataQuery(score=True)
)
responses take 50000+ ms
When I disable reranking:
collection = clientv4.collections.get(collection_name)
hybrid_documentsv4 = collection.query.hybrid(
query=user_input,
limit=4,
query_properties=["text", "key"],
return_metadata=MetadataQuery(score=True)
)
responses take 500 ms (100x faster)
My reranker-transformers docker container uses a max of about 109% of 1 of 10 CPU core and 1.6GB RAM when executing for the 50 seconds. Even if I run parallel python threads there is no improvement in speed of reranking. And I cannot get the CPU and RAM usage to take more from my host.
I have confirmed my Docker machine can access up to 10 cores and 14 GB of RAM and there is no resoruce contention while doing the hybrid-search. This is purely and issue with the container or how weaviate client does the reranking it seems. I am unsure of how to improve my performance. 50 seconds is dreadfully slow!