I am using Weaviate 1.31.0 locally via Docker with a text2vec-transformers (multilingual-e5-large-inference) image as the backend embedder (see below for details).
From this embedder I read:
* The E5 models (including `multilingual-e5-large`) produce embeddings that are **meant to be L2-normalized** (i.e., turned into unit vectors).
* The **contrastive loss** used during training assumes cosine similarity — which mathematically requires **unit-length vectors** for correct behavior.
Asking ChatGPT what happens if you skip normalization (which I have done until now), it says:
### Consequences of Skipping Normalization:
* Storing **raw, unnormalized vectors** in Weaviate (or any vector DB) means you're not really using **cosine similarity**, even if the DB claims to.
* This can lead to **suboptimal retrieval results**, particularly poor semantic matching and lower recall/precision.
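If the container does return raw vectors, a client-side fallback is to L2-normalize them yourself before storing or comparing. A minimal sketch in Python (pure stdlib, not tied to any Weaviate client; the function names are my own):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 length; cosine similarity on unit
    vectors reduces to a plain dot product."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]

def is_unit_length(vec, tol=1e-3):
    """Heuristic check: embeddings already normalized by the inference
    container should come back with L2 norm very close to 1.0."""
    norm = math.sqrt(sum(x * x for x in vec))
    return abs(norm - 1.0) < tol

# Example: a raw (unnormalized) vector fails the check; its
# normalized version passes.
raw = [3.0, 4.0]
unit = l2_normalize(raw)  # [0.6, 0.8], L2 norm exactly 1.0
```

The same `is_unit_length` check can be pointed at whatever the embedder returns to answer the question empirically.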
The embedder's /vectors endpoint does not allow me to do anything other than send it a string, so the question is: does this container wrongly return the raw E5 vector, or does it normalize it under the covers?
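One way to settle this empirically is to hit the inference container's /vectors endpoint directly and measure the norm of what comes back. A sketch, assuming the container is published on localhost:8092 (the `TRANSFORMER_PORT` default in the compose file below) and that the response carries the embedding under a `"vector"` key, as the text2vec-transformers inference API does:

```shell
# POST a sample string to the inference container and compute the L2 norm
# of the returned vector: a value very close to 1.0 means the container
# normalizes; anything else means it returns raw E5 output.
curl -s http://localhost:8092/vectors \
  -H 'Content-Type: application/json' \
  -d '{"text": "passage: a quick normalization test"}' \
| python3 -c 'import sys, json, math; v = json.load(sys.stdin)["vector"]; print(math.sqrt(sum(x*x for x in v)))'
```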
Thank you very much.
The Docker Compose declaration follows:
```yaml
services:
  weaviate131:
    networks:
      - mema_docker_compose_weaviate_net
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.31.0
    ports:
      - "${WEAVIATE_HTTP_PORT:-8099}:8080"
      - "${WEAVIATE_GRPC_PORT:-50099}:50051"
    volumes:
      - weaviate_131_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-transformers'
      ENABLE_MODULES: 'text2vec-transformers'
      TRANSFORMERS_INFERENCE_API: 'http://multilingual-e5-transformers:8080'
      CLUSTER_HOSTNAME: 'node1'
    depends_on:
      - multilingual-e5-transformers
  multilingual-e5-transformers:
    build:
      context: .
      dockerfile: multilingual-e5-large.Dockerfile
    image: multilingual-e5-large-inference
    networks:
      - mema_docker_compose_weaviate_net
    ports:
      - "${TRANSFORMER_PORT:-8092}:8080"
    environment:
      ENABLE_CUDA: '0'
    restart: on-failure:0

volumes:
  weaviate_131_data:

networks:
  mema_docker_compose_weaviate_net:
    external: true
```