How to use 'trust_remote_code=True' with a locally downloaded gated model

Description

How do I use trust_remote_code=True with a gated model downloaded locally from Hugging Face, to be used with Weaviate?

Server Setup Information

  • Weaviate Server Version: 1.25.4
  • Deployment Method: docker
  • Multi Node? Number of Running Nodes: Single Node
  • Client Language and Version: Python

I’m using the Dockerfile below to build the container image for a locally downloaded Hugging Face model:

$ cat Nvidia-NV-Embed.Dockerfile 
FROM semitechnologies/transformers-inference:custom

# Copy the locally downloaded model to the Docker image
COPY local_NV-Embed-v1 /app/models/model

# Set the environment variable to trust remote code
ENV TRUST_REMOTE_CODE=True

and the following docker-compose file:

$ cat docker-compose.yml 
---
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.25.4
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - /path/to/weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      TRANSFORMERS_INFERENCE_API: 'http://t2v-transformers:8080'
      QNA_INFERENCE_API: 'http://qna-transformers:8080'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-transformers'
      ENABLE_MODULES: 'text2vec-transformers,qna-transformers'
      CLUSTER_HOSTNAME: 'node'
  t2v-transformers:
    image: nvidia-nv-embed-inference
    environment:
      ENABLE_CUDA: '0'
      TRUST_REMOTE_CODE: 'true' 
  qna-transformers:
    image: roberta_large_squad2_inference
    environment:
      ENABLE_CUDA: '0'
...

I’m setting TRUST_REMOTE_CODE in both places, but I’m still getting this error:

$ sudo docker compose logs -f t2v-transformers
weaviate_docker_2-t2v-transformers-1  | INFO:     Started server process [7]
weaviate_docker_2-t2v-transformers-1  | INFO:     Waiting for application startup.
weaviate_docker_2-t2v-transformers-1  | INFO:     CUDA_PER_PROCESS_MEMORY_FRACTION set to 1.0
weaviate_docker_2-t2v-transformers-1  | INFO:     Running on CPU
weaviate_docker_2-t2v-transformers-1  | The repository for ./models/model contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/./models/model.
weaviate_docker_2-t2v-transformers-1  | You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
weaviate_docker_2-t2v-transformers-1  | 
weaviate_docker_2-t2v-transformers-1  | ERROR:    Traceback (most recent call last):
weaviate_docker_2-t2v-transformers-1  |   File "/usr/local/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 599, in resolve_trust_remote_code
weaviate_docker_2-t2v-transformers-1  |     answer = input(
weaviate_docker_2-t2v-transformers-1  |              ^^^^^^
weaviate_docker_2-t2v-transformers-1  | EOFError: EOF when reading a line
weaviate_docker_2-t2v-transformers-1  | 
weaviate_docker_2-t2v-transformers-1  | During handling of the above exception, another exception occurred:
weaviate_docker_2-t2v-transformers-1  | 
weaviate_docker_2-t2v-transformers-1  | Traceback (most recent call last):
weaviate_docker_2-t2v-transformers-1  |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 734, in lifespan
weaviate_docker_2-t2v-transformers-1  |     async with self.lifespan_context(app) as maybe_state:
weaviate_docker_2-t2v-transformers-1  |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 610, in __aenter__
weaviate_docker_2-t2v-transformers-1  |     await self._router.startup()
weaviate_docker_2-t2v-transformers-1  |   File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 713, in startup
weaviate_docker_2-t2v-transformers-1  |     handler()
weaviate_docker_2-t2v-transformers-1  |   File "/app/app.py", line 74, in startup_event
weaviate_docker_2-t2v-transformers-1  |     meta_config = Meta(model_dir, model_name, use_sentence_transformer_vectorizer)
weaviate_docker_2-t2v-transformers-1  |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
weaviate_docker_2-t2v-transformers-1  |   File "/app/meta.py", line 11, in __init__
weaviate_docker_2-t2v-transformers-1  |     self.config = AutoConfig.from_pretrained(model_path).to_dict()
weaviate_docker_2-t2v-transformers-1  |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
weaviate_docker_2-t2v-transformers-1  |   File "/usr/local/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1141, in from_pretrained
weaviate_docker_2-t2v-transformers-1  |     trust_remote_code = resolve_trust_remote_code(
weaviate_docker_2-t2v-transformers-1  |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
weaviate_docker_2-t2v-transformers-1  |   File "/usr/local/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 612, in resolve_trust_remote_code
weaviate_docker_2-t2v-transformers-1  |     raise ValueError(
weaviate_docker_2-t2v-transformers-1  | ValueError: The repository for ./models/model contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/./models/model.
weaviate_docker_2-t2v-transformers-1  | Please pass the argument `trust_remote_code=True` to allow custom code to be run.
weaviate_docker_2-t2v-transformers-1  | 
weaviate_docker_2-t2v-transformers-1  | ERROR:    Application startup failed. Exiting.
weaviate_docker_2-t2v-transformers-1 exited with code 3
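
From the traceback, the failing call is in the app’s meta.py, which loads the model config without forwarding the flag. A minimal sketch of my understanding (the path and the call are taken from the traceback; the fix shown is my assumption of what the loader would need):

from transformers import AutoConfig

model_path = "./models/model"

# What the traceback shows: the config is loaded without
# trust_remote_code, so transformers falls back to an interactive
# prompt, which raises EOFError inside a non-interactive container.
# config = AutoConfig.from_pretrained(model_path).to_dict()

# What a custom-code model like NV-Embed-v1 needs instead:
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True).to_dict()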

Please help: how and where should I pass the argument trust_remote_code=True for a locally downloaded gated Hugging Face model used with Weaviate?

I am not able to use the locally downloaded nvidia/NV-Embed-v1 model with Weaviate because of the TRUST_REMOTE_CODE error described above.

Any help resolving this TRUST_REMOTE_CODE error would be highly appreciated.

Thanks.

hi @curious !!

I’ll have to ask internally, as I don’t have much expertise on this subject :frowning:

Thanks!

Hi @DudaNogueira,

Thank you for looking into this. I appreciate your effort in seeking internal expertise; I’ll wait for further updates.

Meanwhile, for convenience, here are the steps to reproduce the issue:

Step 1: Request access to the model
Since nvidia/NV-Embed-v1 is a gated model, we have to accept its conditions on Hugging Face to access its files and content: https://huggingface.co/nvidia/NV-Embed-v1

Step 2: Download the model locally
mkdir weaviate_docker && cd weaviate_docker
git lfs install

#Note: When prompted for a password, use an access token with write permissions.
#Generate one from your settings: https://huggingface.co/settings/tokens
git clone https://huggingface.co/nvidia/NV-Embed-v1
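
(Optional sanity check, assuming transformers is installed locally: the fresh clone should load once the flag is passed.)

from transformers import AutoConfig

# Loading the config from the cloned folder requires trust_remote_code,
# because the repo ships custom modelling code.
config = AutoConfig.from_pretrained("NV-Embed-v1", trust_remote_code=True)
print(config.model_type)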

Step 3: Build the Docker image
Create a file Nvidia-NV-Embed.Dockerfile with the following contents:

FROM semitechnologies/transformers-inference:custom  
COPY NV-Embed-v1 /app/models/model

Build the image:
sudo docker build -f Nvidia-NV-Embed.Dockerfile -t nvidia-nv-embed-inference .

Step 4: Create docker-compose.yml
Use the docker-compose.yml content shown in the original post above.

Step 5: Start the containers
sudo docker compose up

Step 6: Verify the outcome
Expected outcome:
All containers, including t2v-transformers, should be up as usual.

What actually happened:
As can be seen in the logs:
sudo docker compose logs -f weaviate
sudo docker compose logs -f t2v-transformers

The text2vec-transformers container fails to start because of the TRUST_REMOTE_CODE issue.



If I use the NV-Embed-v1 model directly with the code below, it works perfectly. But I’m unable to pass the trust_remote_code=True setting in the Weaviate Docker deployment.

from sentence_transformers import SentenceTransformer

# Load the NV-Embed-v1 model using sentence-transformers with trust_remote_code
model_path = "/path/to/model/weaviate_docker/NV-Embed-v1"
model = SentenceTransformer(model_path, device='cpu', trust_remote_code=True)

# Example inputs (defined here so the snippet is self-contained)
stored_data = ["Weaviate is a vector database.", "NV-Embed-v1 is an embedding model."]
queries = ["What is Weaviate?"]

# Generate embeddings for stored data and queries
stored_embeddings = model.encode(stored_data, batch_size=2, normalize_embeddings=True)
query_embeddings = model.encode(queries, batch_size=2, normalize_embeddings=True)
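
Since the embeddings are normalized, a dot product gives cosine similarity directly (continuing from the snippet above):

# Dot products of L2-normalized vectors are cosine similarities
scores = query_embeddings @ stored_embeddings.T
print(scores)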

hi @curious !!

Good news: Add `TRUST_REMOTE_CODE` env var by cdpierse · Pull Request #85 · weaviate/t2v-transformers-models · GitHub

:slight_smile:

It is worth mentioning that you should be cautious with this flag :grimacing:

Let me know if this helps!

Wow!! Thanks a lot, @DudaNogueira, to you and your team for updating the code and committing the changes to main.

I downloaded the latest repo, updated the Dockerfile, re-built the container image, brought the containers up… and voilà… no trust_remote_code issue this time.

However, for some strange reason, the code is trying to access the model on Hugging Face instead of using the local copy, and is throwing this error:

$ sudo docker compose logs -f t2v-transformers 
weaviate_docker_2-t2v-transformers-1  | INFO:     Started server process [7]
weaviate_docker_2-t2v-transformers-1  | INFO:     Waiting for application startup.
weaviate_docker_2-t2v-transformers-1  | INFO:     CUDA_PER_PROCESS_MEMORY_FRACTION set to 1.0
weaviate_docker_2-t2v-transformers-1  | INFO:     Running on CPU
weaviate_docker_2-t2v-transformers-1  | ERROR:    Traceback (most recent call last):
weaviate_docker_2-t2v-transformers-1  |   File "/usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
weaviate_docker_2-t2v-transformers-1  |     response.raise_for_status()
weaviate_docker_2-t2v-transformers-1  |   File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
weaviate_docker_2-t2v-transformers-1  |     raise HTTPError(http_error_msg, response=self)
weaviate_docker_2-t2v-transformers-1  | requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/nvidia/NV-Embed-v1/resolve/main/config.json
weaviate_docker_2-t2v-transformers-1  | 
.
.
.
weaviate_docker_2-t2v-transformers-1  | Cannot access gated repo for url https://huggingface.co/nvidia/NV-Embed-v1/resolve/main/config.json.
weaviate_docker_2-t2v-transformers-1  | Access to model nvidia/NV-Embed-v1 is restricted. You must be authenticated to access it.
weaviate_docker_2-t2v-transformers-1  | 
weaviate_docker_2-t2v-transformers-1  | ERROR:    Application startup failed. Exiting.
weaviate_docker_2-t2v-transformers-1 exited with code 3

I tried adding logging to app.py to see why and from where it is calling the Hugging Face repo, but unfortunately could not find anything useful.
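
One thing I still want to check (just a guess on my part, not a confirmed cause): whether the local copy’s config.json has auto_map entries prefixed with the hub repo id, since transformers then fetches the custom code from the Hub even when the weights are local:

import json

# Inspect the local model's config.json inside the container. If the
# auto_map values carry a "<repo-id>--" prefix (e.g. "nvidia/NV-Embed-v1--..."),
# transformers will reach out to the Hub for the custom code.
with open("/app/models/model/config.json") as f:
    config = json.load(f)

print(config.get("auto_map"))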

I am also using a locally downloaded Hugging Face model (the roberta_large_squad2_inference image) for the qna module (although not a gated model), and that comes up and runs perfectly without any issues.

What could be going wrong with the t2v-transformers module? Any clue/help please?

[PS: Sure, I’ll be cautious about the flag, thanks for sharing the link. :slight_smile: ]

This message is strange. Shouldn’t it be using the GPU?

Maybe it wasn’t able to map the GPU into Docker? I saw a similar case with updated NVIDIA drivers where downgrading the driver helped.

@DudaNogueira Yes, the transformer is running on CPU because we set ENABLE_CUDA: '0' in the docker-compose file.

The reason is that we are in the process of getting a GPU-based server; until then we want to try Weaviate on a CPU machine (it is still a 64-core Xeon Gold :slight_smile: ). Yes, we are aware that inference will be slow on CPU, but we are managing for the time being, as this vectorizer gives better accuracy in our case.

Do you feel the error we are facing, where the code tries to access the model on Hugging Face instead of using the local folder, has something to do with the GPU?

weaviate_docker_2-t2v-transformers-1 | requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/nvidia/NV-Embed-v1/resolve/main/config.json

Also, as mentioned earlier, we were able to run the locally downloaded model directly using separate Python code. We only get the error in the Weaviate Docker setup (t2v-transformers module).

I’m still getting errors for this as well when trying to use the built-in Weaviate methods of handling models. After peeking at the files, I don’t see how any of this is supposed to work in the first place.

t2v-transformers-models/download.py at main · weaviate/t2v-transformers-models (github.com)

Take this file, for example. It reads the flag via

trust_remote_code = os.getenv("TRUST_REMOTE_CODE", False)

but outside of the Python script I will always be setting a shell variable, which I believe getenv will always return as a string. Then when that gets passed to the HF function, it just ignores it if it is a string, and the only time it is a real Python boolean, it is always False. This is a pretty common problem when working with shells and Python scripts; see:

We’ve all been there: on boolean environment variables. - Ru (rusingh.com)
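
A minimal sketch of the pitfall (assuming the variable is set from a shell, e.g. TRUST_REMOTE_CODE=false in docker-compose):

import os

# getenv returns a string whenever the variable is set, so any
# non-empty value -- including "false" and "0" -- is truthy, and the
# only time the value is a real Python bool is the default False.
trust_remote_code = os.getenv("TRUST_REMOTE_CODE", False)
print(type(trust_remote_code))  # <class 'str'> when the variable is set

# One common fix is to parse the string explicitly (env_flag is a
# hypothetical helper, not something from the repo):
def env_flag(name: str, default: bool = False) -> bool:
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "t", "yes", "on")

print(env_flag("TRUST_REMOTE_CODE"))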