Locally running RAG pipeline with Verba and Llama3 with Ollama

Description

I tried following the blog post Locally running RAG pipeline with Verba and Llama3 with Ollama (https://weaviate.io/blog/local-llm-with-verba-for-rag) to build everything locally, but it won’t import the PDF. The document is less than 300 KB.
Error message:

✘ No documents imported 0 of 1 succesful tasks
ℹ FileStatus.ERROR | the-heros-journey-joseph-campbell.pdf | Import for
the-heros-journey-joseph-campbell.pdf failed: Import for
the-heros-journey-joseph-campbell.pdf failed: Batch vectorization failed:
Vectorization failed for some batches: 500, message='Internal Server Error',
url=URL('http://localhost:11434/api/embed') | 0

Server Setup Information

I followed the embedded (Weaviate Embedded) path of the blog post, Locally running RAG pipeline with Verba and Llama3 with Ollama, on a MacBook Pro. I can get Ollama to work locally on its own.

  • Weaviate Server Version:
  • Deployment Method: embed
  • Multi Node? Number of Running Nodes:
  • Client Language and Version: python
  • Multitenancy?:

Any additional Information

Hi @bam !!

Welcome to our community :slight_smile:

I have just built and run Verba using Ollama to test the newer version.

I noticed that while we have recently published a public docker image, it doesn’t have ARM platform images yet.

So first I cloned the repo, built the image, and used the docker compose below to spin that same image up.

Considering the error message you pasted, it looks like the OLLAMA_URL is pointing to localhost. I tried to reproduce this error, but got a different one:

verba-1     | ✘ No documents imported 0 of 1 succesful tasks
verba-1     | ℹ FileStatus.ERROR | netherlands-wikipedia-article-text.pdf | Import
verba-1     | for netherlands-wikipedia-article-text.pdf failed: Import for
verba-1     | netherlands-wikipedia-article-text.pdf failed: Batch vectorization failed:
verba-1     | Vectorization failed for some batches: Cannot connect to host localhost:11434
verba-1     | ssl:default [Connection refused] | 0

Are you running ollama locally? If that’s the case, you should set the OLLAMA_URL accordingly:

---

services:
  verba:
    image: verba-verba
    ports:
      - 8000:8000
    environment:
      - WEAVIATE_URL_VERBA=http://weaviate:8080
      - OPENAI_API_KEY=$OPENAI_API_KEY
      - COHERE_API_KEY=$COHERE_API_KEY
      - OLLAMA_URL=http://host.docker.internal:11434  # reaches the Ollama instance running on the host from inside the container
      - OLLAMA_MODEL=llama3.1
      - OLLAMA_EMBED_MODEL=mxbai-embed-large
      - UNSTRUCTURED_API_KEY=$UNSTRUCTURED_API_KEY
      - UNSTRUCTURED_API_URL=https://api.unstructured.io/general/v0/general
      #- GITHUB_TOKEN=$GITHUB_TOKEN

    volumes:
      - ./data:/data/
    depends_on:
      weaviate:
        condition: service_healthy
    healthcheck:
      test: wget --no-verbose --tries=3 --spider http://localhost:8000 || exit 1
      interval: 5s
      timeout: 10s
      retries: 5
      start_period: 10s

  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: semitechnologies/weaviate:1.25.10
    ports:
      - 8080:8080
      - 3000:8080
    volumes:
      - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    healthcheck:
      test: wget --no-verbose --tries=3 --spider http://localhost:8080/v1/.well-known/ready || exit 1
      interval: 5s
      timeout: 10s
      retries: 5
      start_period: 10s
    environment:
      OPENAI_APIKEY: $OPENAI_API_KEY
      COHERE_APIKEY: $COHERE_API_KEY
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_MODULES: 'e'
      CLUSTER_HOSTNAME: 'node1'

volumes:
  weaviate_data: {}
...
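
You can also double-check from the host that Ollama is up and that the embedding model is actually pulled. A quick sanity check, assuming the default port and the mxbai-embed-large model from the compose above:

curl http://localhost:11434/api/tags
curl http://localhost:11434/api/embed -d '{"model": "mxbai-embed-large", "input": "hello"}'

If the second call returns a 500 like the one in your log, try ollama pull mxbai-embed-large first.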

Let me know if that helps!

Thanks!

Thank you for the quick response.


What does your file structure look like?
I didn’t use Docker to launch Weaviate. I used the embedded path because the blog post said it would be the easiest. That’s my directory structure, and the only file I added was the .env copied from the blog post.
I ran docker pull semitechnologies/verba in my terminal.
Then I tried running docker compose up using the file/commands you shared and got a different error.

✘ weaviate Error context canceled                                                                 1.3s
 ✘ verba Error    pull access denied for verba-verba, repository does not exist or...              1.3s
Error response from daemon: pull access denied for verba-verba, repository does not exist or may require 'docker login'

I confirmed that I’m logged in. I checked Stack Overflow for this error and it cites misspellings as a cause, though I didn’t change anything in the commands.
Thank you. I’ll keep tinkering with it tonight.

I think defining the schema will solve my problem.
Error message:
localhost:8080/v1/schema
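
For reference, that path is Weaviate’s schema endpoint; you can query it directly to see which collections exist. A quick check, assuming anonymous access is enabled as in the compose above:

curl http://localhost:8080/v1/schema

An empty "classes" list in the response just means nothing has been imported yet.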

I believe docker is the easiest way :grimacing:

Once you know how to play around with it, it gets really easy to run apps. Also, you get a more production-ready deployment, considering that embedded is still marked as experimental.

The verba-verba image I have used is the one I built myself, on my Mac. You can build it by cloning Verba’s repo and running

docker compose build
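
In full, something like this (assuming the repo still lives under weaviate/Verba on GitHub):

git clone https://github.com/weaviate/Verba.git
cd Verba
docker compose build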

I have sent a PR to enable Verba to have linux/arm64 support, so it will work on Mac.

For now, you can try using the image I have built, so your docker-compose can be:

---

services:
  verba:
    image: dudanogueira/verba
    ports:
      - 8000:8000
    environment:
      - WEAVIATE_URL_VERBA=http://weaviate:8080
      - OPENAI_API_KEY=$OPENAI_API_KEY
      - COHERE_API_KEY=$COHERE_API_KEY
      - OLLAMA_URL=http://host.docker.internal:11434
      - OLLAMA_MODEL=llama3.1
      - OLLAMA_EMBED_MODEL=mxbai-embed-large
      - UNSTRUCTURED_API_KEY=$UNSTRUCTURED_API_KEY
      - UNSTRUCTURED_API_URL=https://api.unstructured.io/general/v0/general
      #- GITHUB_TOKEN=$GITHUB_TOKEN

    volumes:
      - ./data:/data/
    depends_on:
      weaviate:
        condition: service_healthy
    healthcheck:
      test: wget --no-verbose --tries=3 --spider http://localhost:8000 || exit 1
      interval: 5s
      timeout: 10s
      retries: 5
      start_period: 10s

  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: semitechnologies/weaviate:1.25.10
    ports:
      - 8080:8080
      - 3000:8080
    volumes:
      - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    healthcheck:
      test: wget --no-verbose --tries=3 --spider http://localhost:8080/v1/.well-known/ready || exit 1
      interval: 5s
      timeout: 10s
      retries: 5
      start_period: 10s
    environment:
      OPENAI_APIKEY: $OPENAI_API_KEY
      COHERE_APIKEY: $COHERE_API_KEY
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_MODULES: 'e'
      CLUSTER_HOSTNAME: 'node1'

volumes:
  weaviate_data: {}
...

Let me know if you are okay running it with docker (don’t worry, I can help you on this) or if you want to keep running with Weaviate Embedded.

How would I set this up without Cohere? I read their terms of service and they require giving them permission to use your data. My goal is to use this locally/offline and not expose my data to third parties.
I need help creating a schema and importing my data. I got the docker container running locally.


I have Verba, Weaviate, and Ollama running in the same docker compose stack. Ollama keeps restarting. Verba says it’s connected to Ollama on port 11434 only when Ollama is started outside of Docker.
I’m getting this error in docker using a refactored docker compose file now:

INFO:     Started reloader process [1] using WatchFiles
2024-11-03 16:31:31 verba-1     | ℹ Couldn't connect to Ollama http://host.docker.internal:11434
2024-11-03 16:31:31 verba-1     | ℹ Couldn't connect to Ollama http://host.docker.internal:11434
2024-11-03 16:31:31 verba-1     | ℹ Couldn't connect to Groq (https://api.groq.com/openai/v1/)

I made a few changes to the docker compose file:

services:
  ollama:
    image: ollama/ollama
    command: ollama run llama3.2
    ports:
      - 11434:11434
    restart: on-failure

  verba:
    image: dudanogueira/verba
    ports:
      - 8000:8000
    environment:
      - WEAVIATE_URL_VERBA=http://weaviate:8080
      - OLLAMA_URL=http://host.docker.internal:11434
      - OLLAMA_MODEL=llama3.2
      - OLLAMA_EMBED_MODEL=mxbai-embed-large
      #- GITHUB_TOKEN=$GITHUB_TOKEN

    volumes:
      - ./data:/data/
    depends_on:
      weaviate:
        condition: service_healthy
    healthcheck:
      test: wget --no-verbose --tries=3 --spider http://localhost:8000 || exit 1
      interval: 5s
      timeout: 10s
      retries: 5
      start_period: 10s

  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: semitechnologies/weaviate:1.25.10
    ports:
      - 8080:8080
      - 3000:8080
    volumes:
      - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    healthcheck:
      test: wget --no-verbose --tries=3 --spider http://localhost:8080/v1/.well-known/ready || exit 1
      interval: 5s
      timeout: 10s
      retries: 5
      start_period: 10s
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_MODULES: 'e'
      CLUSTER_HOSTNAME: 'node1'

volumes:
  weaviate_data: {}

What are the logs for ollama? Try docker logs ollama to find out why it’s restarting.
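
For example:

docker compose logs ollama

One guess (an assumption, since the logs aren’t posted here): the ollama/ollama image already uses ollama as its entrypoint, with serve as its default command, so command: ollama run llama3.2 ends up running ollama ollama run llama3.2 and the server never starts. Dropping the command: line and pulling the models once the container is up may help:

docker compose exec ollama ollama pull llama3.2
docker compose exec ollama ollama pull mxbai-embed-large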

You’ll probably want to link a volume to ollama as well; otherwise you go through the hassle of needing to manually pull a model each time you start it up. I’ve got an example here which also has a few other bells and whistles that work for my machine (see the ollama Docker Hub page for GPU details):

volumes:
  ollama_data:
    driver: local

services:
  ollama:
    container_name: ollama
    hostname: ollama
    image: ollama/ollama:0.3.9
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            capabilities: ["gpu"]
            count: all
    volumes:
      - ollama_data:/root/.ollama
    restart: always
    ports:
      - 11434:11434
    healthcheck:
      test: ollama list || exit 1
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 10s

Have you tried OLLAMA_URL=http://ollama:11434 for your verba service’s environment variables? I’ve found that I don’t need host.docker.internal if I’m running ollama as part of docker compose, only if it’s fired up externally with something like docker run ollama.
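
A minimal sketch of that change (assuming the service is named ollama, as in your compose file):

  verba:
    environment:
      - OLLAMA_URL=http://ollama:11434  # the compose service name resolves on the default network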


Thank you. I tried to use your docker compose file and got the following error:

Gracefully stopping... (press Ctrl+C again to force)
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]


I think the issue is loading the schema. I don’t have a schema yet.

Just checking, did you follow the instructions in the link posted “ollama dockerhub for GPU details”? Maybe you don’t have an Nvidia card or didn’t install the Nvidia Container Toolkit?
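
If the machine has no NVIDIA GPU (e.g. a Mac), one option is a CPU-only variant of that ollama service. Just a sketch, with the GPU reservation dropped:

  ollama:
    image: ollama/ollama:0.3.9
    volumes:
      - ollama_data:/root/.ollama  # keeps pulled models across restarts
    ports:
      - 11434:11434
    restart: always
    # no deploy.resources.reservations block, so it runs on CPU only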

I was able to get it to work, not the way I wanted, but it works.
I run the Weaviate DB in a Docker container, run Ollama locally, and run Verba locally using pip install goldenverba and then verba start.
Here’s my docker-compose file:


networks:
  local-net:
    external: true
    name: local-net  # This is the Docker network that allows access to your local machine

services:
  weaviate:
    image: semitechnologies/weaviate:latest
    environment:
      - QUERY_DEFAULTS_LIMIT=20
      - ENABLE_MODULES=text2vec-verba
      - VERBA_API_URL=http://host.docker.internal:8000  # Access Verba on local port 8000
    ports:
      - "8080:8080"  # Expose Weaviate on port 8080
    networks:
      - local-net
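
One note for anyone copying this: because local-net is declared external: true, it has to exist before docker compose up, e.g.:

docker network create local-net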

Oh, nice!

Thanks for sharing!

I have also noticed ollama performing better when running directly on the host instead of in docker.

For one dataset it was importing on the host, but not in docker.

I run a Mac without a GPU, so this may also affect things.
