Vectorization failed 404 http://host.docker.internal:11434/api/embed

Description

I'm running Windows Subsystem for Linux (WSL2) with Docker Desktop handling containerization from Windows. I have Ollama started with a model, and it works just fine when testing it with ollama run llama3.1.

I spin up the Ollama container with:

docker run -d --gpus=all --name ollama --restart always -v ollama:/root/.ollama --add-host=host.docker.internal:host-gateway -p 11434:11434 ollama/ollama:0.3.10

My docker compose file picks up these env vars from my .env file:

OLLAMA_URL=http://host.docker.internal:11434/
OLLAMA_MODEL=llama3.1:latest
OLLAMA_EMBED_MODEL=llama3.1

This works as expected: I can start up Verba on port 8000 and select the docker deployment in the UI. The "Chat" tab shows "0 documents embedded by llama3.1:latest", so it's definitely connecting and reading the right model; otherwise this would show a connection error.

But going to the "Import Data" tab and trying to add and import a simple txt file containing "Why is the sky blue" throws up:

✘ No documents imported 0 of 1 succesful tasks
ℹ FileStatus.ERROR | why_oh_why.txt | Import for why_oh_why.txt failed:
Import for why_oh_why.txt failed: Batch vectorization failed: Vectorization
failed for some batches: 404, message='Not Found',
url=URL('http://host.docker.internal:11434/api/embed') | 0

I even tried adding ollama to the same network as docker compose (docker network connect verba_default ollama) and got to the same point, but with "http://ollama:11434/api/embed" failing in the same way.

I jumped into the code to start debugging the OllamaEmbedder:

    async def vectorize(self, config: dict, content: list[str]) -> list[float]:

        model = config.get("Model").value

        data = {"model": model, "input": content}

        # Debug trace: log the method/URL/headers of each request once it completes.
        async def on_request_end(session, trace_config_ctx, params):
            print(f"Ending request:\n   method: {params.method}\n   url: {params.url}\n   headers: {params.headers}")

        trace_config = aiohttp.TraceConfig()
        trace_config.on_request_end.append(on_request_end)

        async with aiohttp.ClientSession(trace_configs=[trace_config]) as session:
            # self.url is the configured OLLAMA_URL; the API path is appended by plain concatenation.
            async with session.post(self.url + "/api/embed", json=data) as response:
                response.raise_for_status()
                data = await response.json()
                embeddings = data.get("embeddings", [])
                return embeddings

And I was thoroughly confused by the printout showing the method changing to GET:

Ending request:
   method: GET
   url: http://host.docker.internal:11434/api/embed
   headers: <CIMultiDict()>

But maybe that's down to my poor understanding of Python and these async libraries / middleware changing things along the way?

Either way, when I use curl from the verba-verba-1 container I'm able to get the embeddings just fine:

curl http://host.docker.internal:11434/api/embed -d '{"model": "llama3.1","input": "Why is the sky blue?"}'

So now I'm at a loss as to what else to try. Any ideas?

Server Setup Information

  • Verba commit: 59a46d06e382dc88cc90d9d217e7c5a2a8f950dc
  • Deployment Method: local docker compose
  • OS: Windows + WSL2

hi @Kieran_Sears !!

Welcome to our community :hugs:

I was just playing around with Verba + Ollama all in docker :slight_smile:

I am not sure exactly how WSL2 plays with Windows + Docker, but can you try running everything in Docker?

One thing to note: Whenever you start Verba, your ollama must have the models available, otherwise they will not be listed in Verba. Verba will connect to Ollama at startup and read all available models.

Here is how I am doing it:

First, create a docker-compose.yaml file like this:

---

services:
  verba:
    image: semitechnologies/verba
    ports:
      - 8000:8000
    environment:
      - WEAVIATE_URL_VERBA=http://weaviate:8080
      - OLLAMA_URL=http://ollama:11434
      - OLLAMA_MODEL=llama3.2
      - OLLAMA_EMBED_MODEL=llama3.2

    volumes:
      - ./data:/data/
    depends_on:
      weaviate:
        condition: service_healthy
    healthcheck:
      test: wget --no-verbose --tries=3 --spider http://localhost:8000 || exit 1
      interval: 5s
      timeout: 10s
      retries: 5
      start_period: 10s

  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: semitechnologies/weaviate:1.25.10
    ports:
      - 8080:8080
      - 3000:8080
    volumes:
      - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    healthcheck:
      test: wget --no-verbose --tries=3 --spider http://localhost:8080/v1/.well-known/ready || exit 1
      interval: 5s
      timeout: 10s
      retries: 5
      start_period: 10s
    environment:
      OPENAI_APIKEY: $OPENAI_API_KEY
      COHERE_APIKEY: $COHERE_API_KEY
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_MODULES: 'e'
      CLUSTER_HOSTNAME: 'node1'

  ollama:
    image: ollama/ollama:0.3.14
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - 11434:11434
      
volumes:
  weaviate_data: {}
  ollama_data: {}
...

Now, let's make sure we have the model we selected (in this case, llama3.2) available:

docker compose exec -ti ollama ollama pull llama3.2

You can check if the model is listed here:
http://localhost:11434/api/tags
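
Or, from a script, a small sketch like this hits the same endpoint and prints the model names (adjust the host and port if your Ollama is reachable elsewhere):

import json
import urllib.request

# List the models this Ollama instance currently serves; Verba reads this
# list at startup, so a model must appear here to be selectable in Verba.
OLLAMA_URL = "http://localhost:11434"  # adjust if Ollama runs elsewhere

with urllib.request.urlopen(OLLAMA_URL + "/api/tags") as response:
    tags = json.load(response)

for model in tags.get("models", []):
    print(model["name"])  # e.g. "llama3.2:latest"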

OK, now we can start everything up:

docker compose up -d

Now proceed to import a document in Verba, which should be running at:
http://localhost:8000/

A little after the import starts, you should see ollama eating up resources.

Obs: llama was quite slow to vectorize :thinking: and with large documents it was crashing :grimacing:

Let me know if this helps!

Thanks!


So now, if you want to add a new model, for example nomic-embed-text, you should:

docker compose exec -ti ollama ollama pull nomic-embed-text
docker compose restart verba

You should now see both models listed in Verba.

Ps: While vectorizing large documents, I have faced an error which I believe may be due to some docker env variable that needs to be set.

Hey @DudaNogueira, thanks for the warm welcome!

I found my issue, and it's painfully straightforward.

I was running it all within docker: I just had verba and weaviate in my docker compose, with the ollama image running independently in its own container rather than in the same docker compose file. As I say, this can be done by ensuring the ollama container is put onto the same network as verba. My ollama does have a model in it (llama3.1), which I've yet to benchmark, but it seems to run slick considering I set it up with GPU acceleration (see the docker image for details on how); I'll test it with larger files now I've got it working. Lord knows if the embedding works as expected, but considering it's producing content I can't see why it wouldn't.

The solution

But the issue was a trailing forward slash at the end of my OLLAMA_URL environment variable. I did think it was strange that my curl command could hit the endpoint but the service couldn't. So, after removing it:

- OLLAMA_URL=http://host.docker.internal:11434/
+ OLLAMA_URL=http://host.docker.internal:11434

The issue disappeared, and the logs even showed the expected POST method:

Ending request:
   method: POST
   url: http://host.docker.internal:11434/api/embed
   headers: <CIMultiDict()>

To me it's still quite infuriating that the method changed just because the resource wasn't found. If someone can point to the part of the standard where it says this should happen, please add it as a reply here!
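
For what it's worth, here is a minimal, self-contained sketch of what I suspect actually happened. It assumes the doubled slash made the server answer with a 301/302 redirect before the eventual 404; the port and paths below are made up for the demo, and this is neither Verba nor Ollama code. aiohttp, like browsers and many other clients, rewrites a redirected POST into a GET on 301/302 responses, which RFC 7231 permits "for historic reasons":

import asyncio
from aiohttp import ClientSession, web


async def redirecting_path(request):
    # Stand-in for a server that answers the badly joined URL with a redirect
    # instead of serving it directly.
    raise web.HTTPMovedPermanently("/api/embed")


async def embed(request):
    # Report which HTTP method actually arrived after the redirect was followed.
    return web.json_response({"arrived_as": request.method})


async def main():
    app = web.Application()
    app.router.add_route("*", "/redirected/api/embed", redirecting_path)
    app.router.add_route("*", "/api/embed", embed)

    runner = web.AppRunner(app)
    await runner.setup()
    await web.TCPSite(runner, "127.0.0.1", 8089).start()  # arbitrary local port

    async with ClientSession() as session:
        # POST to the redirecting path: aiohttp follows the 301 with a GET,
        # dropping the JSON body, so the far end never sees a POST at all.
        async with session.post(
            "http://127.0.0.1:8089/redirected/api/embed",
            json={"model": "llama3.1", "input": ["Why is the sky blue?"]},
        ) as response:
            print(await response.json())  # {'arrived_as': 'GET'}

    await runner.cleanup()


asyncio.run(main())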

Thank you kindly for your prompt reply, by the way! If you wanted to have more of a play around, you could look into setting up the ollama image in docker compose with GPU acceleration; that would absolutely speed things up, as it's practically instantaneous on my machine with an NVIDIA GeForce RTX 3070.
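
For reference, the compose-file equivalent of the --gpus=all flag from my docker run command is a deploy section on the ollama service, roughly like the fragment below. This assumes the NVIDIA Container Toolkit is installed on the host, and it is a sketch rather than a tested config:

  ollama:
    image: ollama/ollama
    # ... volumes and ports as in your compose file ...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]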

I'm quite new to contributing to open source, but I'd like to prevent anyone else from making such a rookie error, which was difficult to debug. Should I open a PR that:

  1. updates the readme with a note to avoid trailing slashes.
  2. updates the config to validate env var URLs and improves error handling so the logs show when something is amiss.
  3. updates the config to just strip any trailing slashes automagically.

I'd like to know your thoughts.

Kind regards!

Ohhhhh. :grimacing:

I believe we can improve this in Verba by properly joining the URL paths here:

and here

Please feel free to open an issue so we can tackle this. We are always open to contributions, especially those that improve DX.

Something like this could prevent the issue:

from urllib.parse import urljoin

base_url = "https://example.com/"
relative_path = "/api/v1/users"
joined_url = urljoin(base_url, relative_path)
print(joined_url)  # https://example.com/api/v1/users, no doubled slash
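
For the OLLAMA_URL case specifically, something along these lines would make both forms of the env var (with and without the trailing slash) produce the same request URL; it is just an illustration, not Verba's actual code:

import os
from urllib.parse import urljoin

# Illustration only: strip any trailing slash from the configured base URL,
# then join the API path, so "http://host:11434" and "http://host:11434/"
# both yield the same result.
base_url = os.environ.get("OLLAMA_URL", "http://localhost:11434").rstrip("/")
embed_url = urljoin(base_url + "/", "api/embed")
print(embed_url)  # http://localhost:11434/api/embed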

There are other models that can also suffer from this issue.

As I use a Mac, I only run models on CPU. :frowning:
I will eventually get my hands on a proper GPU to play around with some load :slight_smile:

If you are opening this issue, make sure to link to this thread for context!

Thanks!