Description
The custom vectorizer container gets killed, probably by the Linux OOM daemon. How do I manage the resources of Weaviate plus a custom vectorizer model on a development machine?
This is how the two containers are defined in my docker-compose file:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.24.6
    command:
      - "--host=0.0.0.0"
      - "--port=8080"
      - "--scheme=http"
    ports:
      - "8080:8080"
      - "50051:50051"
    volumes:
      - weaviate_data:/var/lib/weaviate
    restart: unless-stopped
    environment:
      LOG_LEVEL: debug
      ENABLE_CUDA: 0
      LIMIT_RESOURCES: true
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: true
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      CLUSTER_HOSTNAME: finland
      ENABLE_MODULES: text2vec-transformers
      DEFAULT_VECTORIZER_MODULE: text2vec-transformers
      TRANSFORMERS_INFERENCE_API: http://t2v-e5-mistral:8080
    depends_on:
      t2v-e5-mistral:
        condition: service_healthy

  t2v-e5-mistral:
    build:
      context: /home/mema/llms/e5-mistral-7b-instruct
      dockerfile: Dockerfile
    image: e5-mistral-7b-instruct
    environment:
      ENABLE_CUDA: '0'
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:8080/docs"]
      interval: 30s
      timeout: 10s
      retries: 2
      start_period: 10s
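Note that I have not set any explicit resource limits in this file. The only compose-level knob I am aware of is a per-service cap along these lines (just a sketch of what I mean; the 48g and 8-CPU figures are placeholders for a 64 GB / 16-core host, not values I know to be right):

  t2v-e5-mistral:
    # ...build, image, environment and healthcheck as above...
    deploy:
      resources:
        limits:
          memory: 48g    # placeholder cap, not a tested value
          cpus: "8"      # placeholder: half of the 16 cores

but I am not sure whether a hard cap like this actually solves anything or just makes Docker kill the container instead of the host.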
The custom t2v-e5-mistral container is built from the following Dockerfile:
FROM semitechnologies/transformers-inference:custom
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
RUN MODEL_NAME=intfloat/e5-mistral-7b-instruct ./download.py
I then spin up the two containers with "docker compose up -d weaviate". Because the weaviate service depends on t2v-e5-mistral, that container starts first, and once it is healthy Weaviate starts without any problems in its logs. After some variable amount of time, however, the custom model container (t2v-e5-mistral) gets killed. Below is the output of "docker compose logs -f t2v-e5-mistral" covering three starts and the subsequent kills; as you can see, the last two are killed almost immediately, while still loading the checkpoint shards.
t2v-e5-mistral-1 | INFO: Started server process [7]
t2v-e5-mistral-1 | INFO: Waiting for application startup.
t2v-e5-mistral-1 | INFO: CUDA_PER_PROCESS_MEMORY_FRACTION set to 1.0
t2v-e5-mistral-1 | INFO: Running on CPU
Loading checkpoint shards: 100%|██████████| 6/6 [00:06<00:00, 1.13s/it]
t2v-e5-mistral-1 | INFO: Application startup complete.
t2v-e5-mistral-1 | INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
t2v-e5-mistral-1 | INFO: 127.0.0.1:59906 - "GET /docs HTTP/1.1" 200 OK
t2v-e5-mistral-1 | INFO: 192.168.160.4:52866 - "GET /meta HTTP/1.1" 200 OK
t2v-e5-mistral-1 | INFO: 127.0.0.1:47926 - "GET /docs HTTP/1.1" 200 OK
t2v-e5-mistral-1 | INFO: 127.0.0.1:56692 - "GET /docs HTTP/1.1" 200 OK
... skipped a dozen identical messages...
t2v-e5-mistral-1 | INFO: 127.0.0.1:54584 - "GET /docs HTTP/1.1" 200 OK
t2v-e5-mistral-1 | INFO: 127.0.0.1:44640 - "GET /docs HTTP/1.1" 200 OK
t2v-e5-mistral-1 | Killed
t2v-e5-mistral-1 | INFO: Started server process [7]
t2v-e5-mistral-1 | INFO: Waiting for application startup.
t2v-e5-mistral-1 | INFO: CUDA_PER_PROCESS_MEMORY_FRACTION set to 1.0
t2v-e5-mistral-1 | INFO: Running on CPU
Loading checkpoint shards: 83%|████████▎ | 5/6 [00:05<00:01, 1.22s/it]Killed
t2v-e5-mistral-1 | INFO: Started server process [7]
t2v-e5-mistral-1 | INFO: Waiting for application startup.
t2v-e5-mistral-1 | INFO: CUDA_PER_PROCESS_MEMORY_FRACTION set to 1.0
t2v-e5-mistral-1 | INFO: Running on CPU
Loading checkpoint shards: 83%|████████▎ | 5/6 [00:05<00:01, 1.15s/it]Killed
The model I am using for this container is intfloat/e5-mistral-7b-instruct, available on Hugging Face.
The development machine is a Linux server with 64 GB of RAM and 16 cores; its normal usage is shown in the first attached nmon snapshot. As soon as I launch the t2v-e5-mistral container, I see free memory drop rapidly to zero, and it climbs back up only after the container is killed. The second snapshot, taken while I launched the container twice in an attempt to capture the moment of lowest memory, shows the two corresponding CPU spikes and the memory almost completely depleted.
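On the Weaviate side, the only memory-related settings I am aware of are the LIMIT_RESOURCES flag I already set and possibly GOMEMLIMIT for the Go runtime; if that is the right direction, I imagine something like the following (the 16GiB value is a guess on my part):

  weaviate:
    environment:
      LIMIT_RESOURCES: true   # already set above; as I understand it, caps Weaviate at ~80% of host memory
      GOMEMLIMIT: 16GiB       # guessed soft memory limit for the Go runtime

but as far as I understand these only constrain the Weaviate process itself, not the t2v-e5-mistral container that is actually being killed.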
I would be very grateful if you could suggest environment variables or other means to keep the Weaviate and custom text2vec containers within the limits of my dev machine. Thanks in advance!