Locally hosted transformers inference container

Description

I have a local Docker image of a specific transformers embedding model, and I am spinning up a Docker container from it. I specify the necessary configuration in a docker-compose.yml file. I am following the official tutorials:

https://docs.weaviate.io/weaviate/model-providers/transformers/embeddings

https://docs.weaviate.io/weaviate/model-providers/transformers/embeddings-custom-image#build-a-custom-transformers-model-image

Here is my docker-compose.yml file:

---
services:
  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.32.9
    ports:
      - 8080:8080
      - 50051:50051
    volumes:
      - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: text2vec-transformers
      ENABLE_MODULES: text2vec-transformers
      TRANSFORMERS_INFERENCE_API: http://text2vec-transformers:8080
      CLUSTER_HOSTNAME: 'node1'
  text2vec-transformers:
    image: medembed-inference
    ports:
      - 8000:8080
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
    environment:
      ENABLE_CUDA: 1
      NVIDIA_VISIBLE_DEVICES: all
      NVIDIA_DRIVER_CAPABILITIES: all
volumes:
  weaviate_data:

After I run docker-compose up -d on the above docker-compose.yml file, I get two running Docker containers: one runs the Weaviate server, and the other runs my local embedding model.

I can test the Weaviate server by running this command:

curl localhost:8080

and the embedding model by running this command:

curl localhost:8000/vectors -H 'Content-Type: application/json' -d '{"text": "foo bar"}'

I do get a vector representation of the text through this command.

However, my question is:

When I create a collection and insert objects into it, the vectors I get back are different from the ones I get by running the curl command against localhost:8000, where my embedder is running. This makes me wonder whether I am actually using my local embedder. That is, am I using the correct value for the TRANSFORMERS_INFERENCE_API environment variable? Since I am hosting the embedder locally, should I use localhost:8000 or host.docker.internal:8000? I have tried both, and I get the error "Connection reset by peer" and the Weaviate server does not start. I also tried to override TRANSFORMERS_INFERENCE_API by setting the inference_url parameter in vector_config to localhost:8000 and host.docker.internal:8000, but then I get a "connection refused" error.

Any help would be greatly appreciated. Thanks.

Server Setup Information

  • Weaviate Server Version: 1.32.9
  • Deployment Method: Docker
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: Python and weaviate client version is 4.16.10
  • Multitenancy?: No

Any additional Information

hi @curiousmind !!

Welcome to our community :hugs:

You are probably not considering that Weaviate will, by default, also vectorize the Collection name. :wink:

Check this code, for example:

import weaviate
import weaviate.classes as wvc

client = weaviate.connect_to_local()

client.collections.delete("Test")
simple = client.collections.create(
    "Test",
    vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_transformers(
        inference_url="https://webhook.site/CREATE-YOUR-ENDPOINT-ON-WEBHOOK",
    ),
)
simple.data.insert({"text": "foo bar"})
simple.query.fetch_objects(include_vector=True).objects[0].vector

This is the payload that gets sent to the webhook.site endpoint:

{
  "text": "Test foo bar",
  "dims": 0,
  "vector": null,
  "error": "",
  "config": {
    "pooling_strategy": "masked_mean",
    "task_type": "passage"
  }
}

This means you should compare the vectors of your object with this one:

curl localhost:8000/vectors -H 'Content-Type: application/json' -d '{"text": "Test foo bar"}'
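When comparing the two vectors, cosine similarity is less error-prone than eyeballing long lists of floats. A minimal, self-contained sketch in plain Python (the example vectors below are placeholders, not real embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors score 1.0; unrelated ones score near 0.
v_weaviate = [0.1, 0.2, 0.3]  # placeholder for the vector stored by Weaviate
v_curl = [0.1, 0.2, 0.3]      # placeholder for the vector from the curl call
print(round(cosine_similarity(v_weaviate, v_curl), 6))  # 1.0
```

If the similarity is exactly 1.0 (up to float noise), Weaviate is using your local embedder and the difference you saw came from the input text, not the model.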

Regarding whether to use localhost:8000 or host.docker.internal:8000:

If you are running both Weaviate and the model on the same Docker network, you can use the service name, for instance text2vec-transformers. When both services are defined in the same docker-compose file, they are on the same network by default.

If you set http://localhost:8000 as TRANSFORMERS_INFERENCE_API, Weaviate will try to connect to itself, because localhost inside the Weaviate container refers to that container. It will not work!

If you instead set TRANSFORMERS_INFERENCE_API to http://host.docker.internal:8000, Weaviate, which runs inside a Docker container, will try to connect to the Docker host, i.e. the computer or server running Docker.

This will usually work; if it does not, the fix depends on the host OS (Windows, macOS, Linux, etc.).
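On Linux in particular, host.docker.internal does not resolve inside containers by default. A minimal sketch of one way to make it available (assuming Docker Engine 20.10 or newer, which supports the host-gateway keyword) is to add an extra_hosts entry to the Weaviate service in the compose file:

```yaml
services:
  weaviate:
    # Map host.docker.internal to the Docker host's gateway (Linux only;
    # Docker Desktop on Windows/macOS provides this name automatically).
    extra_hosts:
      - "host.docker.internal:host-gateway"
```

That said, when both services live in the same compose file, the service name (http://text2vec-transformers:8080) remains the simplest option.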

Let me know if this helps :slight_smile:

Happy coding!

Thanks, DudaNogueira, for the prompt response and the great explanation. Yes, I am running both services from the same docker-compose.yml file. I am now using the service name, that is, http://text2vec-transformers:8080, in TRANSFORMERS_INFERENCE_API.

However, I have also tried explicitly setting vectorize_collection_name = False in vector_config, and the vectors are still different. Upon further investigation, I noticed a bigger issue: no matter what text I feed as data_object into the collection, the returned vectors are identical. This problem persisted even when I changed the vectorizer module from my local one to text-embedding-ada-002. Here is the relevant section of my vector_config:

vector_config = [
    wc.Configure.Vectors.text2vec_azure_openai(
        name="note_vector",
        resource_name="********",
        deployment_id="text-embedding-ada-002",
        source_properties=["FULL_TEXT"],
        vectorize_collection_name=False,
    )
]

Here is how I am retrieving the vectors back:

vectorized_collection = client.collections.use(collection_name)

response = vectorized_collection.query.fetch_objects(
    include_vector=True,
    limit=10,
)

for item in response.objects:
    print(item.vector)

All the returned vectors look identical (at least as far as I can tell). What kind of glitch could there be?
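To confirm whether the fetched vectors really are numerically identical (rather than merely looking alike when printed), they can be compared pairwise. A small sketch in plain Python; the named-vector key "note_vector" below is taken from the vector_config shown above:

```python
def all_identical(vectors, tol=1e-9):
    """Return True if every vector equals the first one, element-wise within tol."""
    first = vectors[0]
    return all(
        len(v) == len(first) and all(abs(x - y) <= tol for x, y in zip(v, first))
        for v in vectors[1:]
    )

# With named vectors, each fetched object exposes its vectors as a dict,
# keyed by the name given in vector_config (here assumed to be "note_vector"):
#   vectors = [item.vector["note_vector"] for item in response.objects]
#   print(all_identical(vectors))

# Self-contained demonstration with placeholder vectors:
print(all_identical([[0.1, 0.2], [0.1, 0.2]]))  # True
print(all_identical([[0.1, 0.2], [0.1, 0.3]]))  # False
```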

Also, even when I try to explicitly set skip_vectorization = True and vectorize_property_name = True, these two properties are always set to False for all the properties. I really appreciate any help you can provide. Thanks.
