Description
Problem with Docker Configuration and CUDA for Reranker-Transformers
I followed the tutorial and used the following Docker Compose configuration:
```yaml
reranker-transformers:
  # Set the name of the inference container
  image: semitechnologies/reranker-transformers:baai-bge-reranker-v2-m3
  container_name: reranker-transformers-container
  network_mode: "bridge"
  volumes:
    - .:/usr/src/app
  ports:
    - 50051:8080
  restart: always
  environment:
    ENABLE_CUDA: 0 # set to 1 to enable
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['0']
            capabilities: [gpu]
```
The container starts successfully, but when I set `ENABLE_CUDA` to `1`, calling the service results in the following error:
```
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
Batches: 0%| | 0/1 [00:00<?, ?it/s]
ERROR: Something went wrong while scoring the input.
Traceback (most recent call last):
File "/app/app.py", line 55, in read_item
return await cross_encoder.do(item)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/crossencoder.py", line 56, in do
return await asyncio.wrap_future(self.executor.submit(self._rerank, item))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/crossencoder.py", line 53, in _rerank
return self._perform_rerank(item)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/crossencoder.py", line 45, in _perform_rerank
return self._batch_rerank(item)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/crossencoder.py", line 37, in _batch_rerank
scores = self.model.predict(sentences)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 375, in predict
self.model.to(self._target_device)
File "/usr/local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2883, in to
return super().to(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1174, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
module._apply(fn)
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
module._apply(fn)
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
module._apply(fn)
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 805, in _apply
param_applied = fn(param)
^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1160, in convert
return t.to(
^^^^^
File "/usr/local/lib/python3.11/site-packages/torch/cuda/__init__.py", line 305, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```
However, if I set `ENABLE_CUDA` to `0`, there is no issue.
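
For reference, a quick way to check which PyTorch build ships inside the reranker image (a diagnostic sketch; `reranker-transformers-container` is the `container_name` from the config above):

```bash
# Print the torch version, the CUDA version it was built against, and
# whether CUDA is usable; torch.version.cuda is None for CPU-only wheels.
docker exec reranker-transformers-container \
  python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

# Separately, check that the GPU is visible inside the container at all
# (assumes the NVIDIA Container Toolkit makes nvidia-smi available there).
docker exec reranker-transformers-container nvidia-smi
```

If `torch.version.cuda` prints `None`, the image ships a CPU-only PyTorch wheel, which would produce this `AssertionError` no matter how the GPU is configured in the compose file.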
Why does this issue only happen with the reranker? I ask because I used the `transformers-inference` image with `ENABLE_CUDA: 1` without any problems:
```yaml
t2v-transformers:
  image: semitechnologies/transformers-inference:baai-bge-m3-onnx
  container_name: ccpg-t2v-transformers-container # Separate from JiaoDa naming
  network_mode: "bridge"
  volumes:
    - .:/usr/src/app
  ports:
    - 50049:8080
  restart: always
  environment:
    ENABLE_CUDA: 1 # set to 1 to enable
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['0']
            capabilities: [gpu]
```
Additionally, I would like to know how to configure these services to load models that have already been downloaded to the local machine, instead of fetching them at startup.
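
To illustrate what I mean, here is a sketch of the kind of setup I am imagining. It assumes the image resolves models through the standard Hugging Face cache variables (`HF_HOME` / `TRANSFORMERS_CACHE`); I have not confirmed that these images do:

```yaml
reranker-transformers:
  image: semitechnologies/reranker-transformers:baai-bge-reranker-v2-m3
  volumes:
    # Pre-downloaded models on the host, mounted where the Hugging Face
    # libraries keep their cache (assumption, not confirmed for this image).
    - ./models:/root/.cache/huggingface
  environment:
    ENABLE_CUDA: 1
    HF_HOME: /root/.cache/huggingface # standard Hugging Face cache variable
```

Is mounting the cache like this the right approach for these images, or is there a dedicated mechanism?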
Server Setup Information
- GPU hardware: GH200 (aarch64)