Vectorization failed for some batches: 500, message='Internal Server Error'

Description

An 11.37 MB PDF file, split into 647 chunks.
UI Error Response:
Import for NASM's essentials of corrective exercise training ( PDFDrive ).pdf failed: Import for NASM's essentials of corrective exercise training ( PDFDrive ).pdf failed: Batch vectorization failed: Vectorization failed for some batches: 500, message='Internal Server Error', url=URL('http://localhost:11434/api/embed'), 500, message='Internal Server Error', url=URL('http://localhost:11434/api/embed'), 500, message='Internal Server Error', url=URL('http://localhost:11434/api/embed'), 500, message='Internal Server Error', url=URL('http://localhost:11434/api/embed'), 500, message='Internal Server Error', url=URL('http://localhost:11434/api/embed')
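
To isolate whether the failure sits in Verba or in Ollama itself, the embed endpoint can be exercised directly, outside the import pipeline. A minimal sketch in Python (assuming the requests package, the default OLLAMA_URL, and the snowflake-arctic-embed:latest model from the config below):

import requests

# Send one small batch straight to Ollama's embed endpoint.
resp = requests.post(
    "http://localhost:11434/api/embed",
    json={
        "model": "snowflake-arctic-embed:latest",
        "input": ["a short test sentence", "another short test sentence"],
    },
    timeout=300,
)
print(resp.status_code)
if resp.ok:
    print(len(resp.json()["embeddings"]), "embeddings returned")
else:
    print(resp.text)

If this small request succeeds while the 647-chunk import fails, the problem is more likely batch size or specific chunk contents than the endpoint itself.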

Any additional Information

Debugger Error Response:

Debugging File Configuration
{
  "fileID": "NASM's essentials of corrective exercise training ( PDFDrive ).pdf",
  "filename": "NASM's essentials of corrective exercise training ( PDFDrive ).pdf",
  "extension": "pdf",
  "status_report": {
    "STARTING": {
      "fileID": "NASM's essentials of corrective exercise training ( PDFDrive ).pdf",
      "status": "STARTING",
      "message": "Starting Import",
      "took": 0
    },
    "LOADING": {
      "fileID": "NASM's essentials of corrective exercise training ( PDFDrive ).pdf",
      "status": "LOADING",
      "message": "Loaded NASM's essentials of corrective exercise training ( PDFDrive ).pdf",
      "took": 14.4
    },
    "CHUNKING": {
      "fileID": "NASM's essentials of corrective exercise training ( PDFDrive ).pdf",
      "status": "CHUNKING",
      "message": "Split NASM's essentials of corrective exercise training ( PDFDrive ).pdf into 647 chunks",
      "took": 0.31
    },
    "EMBEDDING": {
      "fileID": "NASM's essentials of corrective exercise training ( PDFDrive ).pdf",
      "status": "EMBEDDING",
      "message": "",
      "took": 0
    },
    "ERROR": {
      "fileID": "NASM's essentials of corrective exercise training ( PDFDrive ).pdf",
      "status": "ERROR",
      "message": "Import for NASM's essentials of corrective exercise training ( PDFDrive ).pdf failed: Import for NASM's essentials of corrective exercise training ( PDFDrive ).pdf failed: Batch vectorization failed: Vectorization failed for some batches: 500, message='Internal Server Error', url=URL('http://localhost:11434/api/embed'), 500, message='Internal Server Error', url=URL('http://localhost:11434/api/embed'), 500, message='Internal Server Error', url=URL('http://localhost:11434/api/embed'), 500, message='Internal Server Error', url=URL('http://localhost:11434/api/embed'), 500, message='Internal Server Error', url=URL('http://localhost:11434/api/embed')",
      "took": 0
    }
  },
  "source": "",
  "isURL": false,
  "metadata": "",
  "overwrite": false,
  "content": "File Content",
  "labels": [
    "Document"
  ],
  "rag_config": {
    "Reader": {
      "selected": "Default",
      "components": {
        "Default": {
          "name": "Default",
          "variables": [],
          "library": [
            "pypdf",
            "docx",
            "spacy"
          ],
          "description": "Ingests text, code, PDF, and DOCX files",
          "config": {},
          "type": "FILE",
          "available": true
        },
        "HTML": {
          "name": "HTML",
          "variables": [],
          "library": [
            "markdownify",
            "beautifulsoup4"
          ],
          "description": "Downloads and ingests HTML from a URL, with optional recursive fetching.",
          "config": {
            "URLs": {
              "type": "multi",
              "value": "",
              "description": "Add URLs to retrieve data from",
              "values": []
            },
            "Convert To Markdown": {
              "type": "bool",
              "value": 0,
              "description": "Should the HTML be converted into markdown?",
              "values": []
            },
            "Recursive": {
              "type": "bool",
              "value": 0,
              "description": "Fetch linked pages recursively",
              "values": []
            },
            "Max Depth": {
              "type": "number",
              "value": 3,
              "description": "Maximum depth for recursive fetching",
              "values": []
            }
          },
          "type": "URL",
          "available": false
        },
        "Git": {
          "name": "Git",
          "variables": [],
          "library": [],
          "description": "Downloads and ingests all files from a GitHub or GitLab Repo.",
          "config": {
            "Platform": {
              "type": "dropdown",
              "value": "GitHub",
              "description": "Select the Git platform",
              "values": [
                "GitHub",
                "GitLab"
              ]
            },
            "Owner": {
              "type": "text",
              "value": "",
              "description": "Enter the repo owner (GitHub) or group/user (GitLab)",
              "values": []
            },
            "Name": {
              "type": "text",
              "value": "",
              "description": "Enter the repo name",
              "values": []
            },
            "Branch": {
              "type": "text",
              "value": "main",
              "description": "Enter the branch name",
              "values": []
            },
            "Path": {
              "type": "text",
              "value": "",
              "description": "Enter the path or leave it empty to import all",
              "values": []
            },
            "Git Token": {
              "type": "password",
              "value": "",
              "description": "You can set your GitHub/GitLab Token here if you haven't set it up as environment variable `GITHUB_TOKEN` or `GITLAB_TOKEN`",
              "values": []
            }
          },
          "type": "URL",
          "available": true
        },
        "Unstructured IO": {
          "name": "Unstructured IO",
          "variables": [
            "UNSTRUCTURED_API_KEY"
          ],
          "library": [],
          "description": "Uses the Unstructured API to import multiple file types such as plain text and documents",
          "config": {
            "Strategy": {
              "type": "dropdown",
              "value": "auto",
              "description": "Set the extraction strategy",
              "values": [
                "auto",
                "hi_res",
                "ocr_only",
                "fast"
              ]
            },
            "API Key": {
              "type": "password",
              "value": "",
              "description": "Set your Unstructured API Key here or set it as an environment variable `UNSTRUCTURED_API_KEY`",
              "values": []
            },
            "API URL": {
              "type": "text",
              "value": "https://api.unstructured.io/general/v0/general",
              "description": "Set the base URL to the Unstructured API or set it as an environment variable `UNSTRUCTURED_API_URL`",
              "values": []
            }
          },
          "type": "FILE",
          "available": false
        },
        "Firecrawl": {
          "name": "Firecrawl",
          "variables": [],
          "library": [],
          "description": "Use Firecrawl to scrape websites and ingest them into Verba",
          "config": {
            "Mode": {
              "type": "dropdown",
              "value": "Scrape",
              "description": "Switch between scraping and crawling. Note that crawling can take some time.",
              "values": [
                "Crawl",
                "Scrape"
              ]
            },
            "URLs": {
              "type": "multi",
              "value": "",
              "description": "Add URLs to retrieve data from",
              "values": []
            },
            "Firecrawl API Key": {
              "type": "password",
              "value": "",
              "description": "You can set your Firecrawl API Key or set it as environment variable `FIRECRAWL_API_KEY`",
              "values": []
            }
          },
          "type": "URL",
          "available": true
        }
      }
    },
    "Chunker": {
      "selected": "Token",
      "components": {
        "Token": {
          "name": "Token",
          "variables": [],
          "library": [],
          "description": "Splits documents based on word tokens",
          "config": {
            "Tokens": {
              "type": "number",
              "value": 250,
              "description": "Choose how many Token per chunks",
              "values": []
            },
            "Overlap": {
              "type": "number",
              "value": 50,
              "description": "Choose how many Tokens should overlap between chunks",
              "values": []
            }
          },
          "type": "",
          "available": true
        },
        "Sentence": {
          "name": "Sentence",
          "variables": [],
          "library": [],
          "description": "Splits documents based on word tokens",
          "config": {
            "Sentences": {
              "type": "number",
              "value": 5,
              "description": "Choose how many Sentences per chunks",
              "values": []
            },
            "Overlap": {
              "type": "number",
              "value": 1,
              "description": "Choose how many Sentences should overlap between chunks",
              "values": []
            }
          },
          "type": "",
          "available": true
        },
        "Recursive": {
          "name": "Recursive",
          "variables": [],
          "library": [
            "langchain_text_splitters "
          ],
          "description": "Recursively split documents based on predefined characters using LangChain",
          "config": {
            "Chunk Size": {
              "type": "number",
              "value": 500,
              "description": "Choose how many characters per chunks",
              "values": []
            },
            "Overlap": {
              "type": "number",
              "value": 100,
              "description": "Choose how many characters per chunks",
              "values": []
            },
            "Seperators": {
              "type": "multi",
              "value": "",
              "description": "Select seperators to split the text",
              "values": [
                "\n\n",
                "\n",
                " ",
                ".",
                ",",
                "​",
                ",",
                "、",
                ".",
                "。",
                ""
              ]
            }
          },
          "type": "",
          "available": false
        },
        "Semantic": {
          "name": "Semantic",
          "variables": [],
          "library": [
            "sklearn"
          ],
          "description": "Split documents based on semantic similarity or max sentences",
          "config": {
            "Breakpoint Percentile Threshold": {
              "type": "number",
              "value": 80,
              "description": "Percentile Threshold to split and create a chunk, the lower the more chunks you get",
              "values": []
            },
            "Max Sentences Per Chunk": {
              "type": "number",
              "value": 20,
              "description": "Maximum number of sentences per chunk",
              "values": []
            }
          },
          "type": "",
          "available": true
        },
        "HTML": {
          "name": "HTML",
          "variables": [],
          "library": [
            "langchain_text_splitters "
          ],
          "description": "Split documents based on HTML tags using LangChain",
          "config": {},
          "type": "",
          "available": false
        },
        "Markdown": {
          "name": "Markdown",
          "variables": [],
          "library": [
            "langchain_text_splitters"
          ],
          "description": "Split documents based on markdown formatting using LangChain",
          "config": {},
          "type": "",
          "available": true
        },
        "Code": {
          "name": "Code",
          "variables": [],
          "library": [
            "langchain_text_splitters "
          ],
          "description": "Split code based on programming language using LangChain",
          "config": {
            "Language": {
              "type": "dropdown",
              "value": "python",
              "description": "Select programming language",
              "values": [
                "cpp",
                "go",
                "java",
                "kotlin",
                "js",
                "ts",
                "php",
                "proto",
                "python",
                "rst",
                "ruby",
                "rust",
                "scala",
                "swift",
                "markdown",
                "latex",
                "html",
                "sol",
                "csharp",
                "cobol",
                "c",
                "lua",
                "perl",
                "haskell",
                "elixir"
              ]
            },
            "Chunk Size": {
              "type": "number",
              "value": 500,
              "description": "Choose how many characters per chunk",
              "values": []
            },
            "Chunk Overlap": {
              "type": "number",
              "value": 50,
              "description": "Choose how many characters overlap between chunks",
              "values": []
            }
          },
          "type": "",
          "available": false
        },
        "JSON": {
          "name": "JSON",
          "variables": [],
          "library": [
            "langchain_text_splitters "
          ],
          "description": "Split json files using LangChain",
          "config": {
            "Chunk Size": {
              "type": "number",
              "value": 500,
              "description": "Choose how many characters per chunks",
              "values": []
            }
          },
          "type": "",
          "available": false
        }
      }
    },
    "Embedder": {
      "selected": "Ollama",
      "components": {
        "Ollama": {
          "name": "Ollama",
          "variables": [],
          "library": [],
          "description": "Vectorizes documents and queries using Ollama. If your Ollama instance is not running on http://localhost:11434, you can change the URL by setting the OLLAMA_URL environment variable.",
          "config": {
            "Model": {
              "type": "dropdown",
              "value": "snowflake-arctic-embed:latest",
              "description": "Select a installed Ollama model from http://localhost:11434. You can change the URL by setting the OLLAMA_URL environment variable. ",
              "values": [
                "snowflake-arctic-embed:latest",
                "llama3:latest",
                "codegemma:latest",
                "llama3.2:latest"
              ]
            }
          },
          "type": "",
          "available": true
        },
        "SentenceTransformers": {
          "name": "SentenceTransformers",
          "variables": [],
          "library": [
            "sentence_transformers"
          ],
          "description": "Embeds and retrieves objects using SentenceTransformer",
          "config": {
            "Model": {
              "type": "dropdown",
              "value": "all-MiniLM-L6-v2",
              "description": "Select an HuggingFace Embedding Model",
              "values": [
                "all-MiniLM-L6-v2",
                "mixedbread-ai/mxbai-embed-large-v1",
                "all-mpnet-base-v2",
                "BAAI/bge-m3",
                "all-MiniLM-L12-v2",
                "paraphrase-MiniLM-L6-v2"
              ]
            }
          },
          "type": "",
          "available": false
        },
        "Weaviate": {
          "name": "Weaviate",
          "variables": [],
          "library": [],
          "description": "Vectorizes documents and queries using Weaviate's In-House Embedding Service.",
          "config": {
            "Model": {
              "type": "dropdown",
              "value": "Embedding Service",
              "description": "Select a Weaviate Embedding Service Model",
              "values": [
                "Embedding Service"
              ]
            },
            "API Key": {
              "type": "password",
              "value": "",
              "description": "Weaviate Embedding Service Key (or set EMBEDDING_SERVICE_KEY env var)",
              "values": []
            },
            "URL": {
              "type": "text",
              "value": "",
              "description": "Weaviate Embedding Service URL (if different from default)",
              "values": []
            }
          },
          "type": "",
          "available": true
        },
        "VoyageAI": {
          "name": "VoyageAI",
          "variables": [],
          "library": [],
          "description": "Vectorizes documents and queries using VoyageAI",
          "config": {
            "Model": {
              "type": "dropdown",
              "value": "voyage-2",
              "description": "Select a VoyageAI Embedding Model",
              "values": [
                "voyage-2",
                "voyage-large-2",
                "voyage-finance-2",
                "voyage-multilingual-2",
                "voyage-law-2",
                "voyage-code-2"
              ]
            },
            "API Key": {
              "type": "password",
              "value": "",
              "description": "OpenAI API Key (or set OPENAI_API_KEY env var)",
              "values": []
            },
            "URL": {
              "type": "text",
              "value": "https://api.voyageai.com/v1",
              "description": "OpenAI API Base URL (if different from default)",
              "values": []
            }
          },
          "type": "",
          "available": true
        },
        "Cohere": {
          "name": "Cohere",
          "variables": [],
          "library": [],
          "description": "Vectorizes documents and queries using Cohere",
          "config": {
            "Model": {
              "type": "dropdown",
              "value": "embed-english-v3.0",
              "description": "Select a Cohere Embedding Model",
              "values": [
                "embed-english-v3.0",
                "embed-multilingual-v3.0",
                "embed-english-light-v3.0",
                "embed-multilingual-light-v3.0"
              ]
            },
            "API Key": {
              "type": "password",
              "value": "",
              "description": "You can set your Cohere API Key here or set it as environment variable `COHERE_API_KEY`",
              "values": []
            }
          },
          "type": "",
          "available": true
        },
        "OpenAI": {
          "name": "OpenAI",
          "variables": [],
          "library": [],
          "description": "Vectorizes documents and queries using OpenAI",
          "config": {
            "Model": {
              "type": "dropdown",
              "value": "text-embedding-3-small",
              "description": "Select an OpenAI Embedding Model",
              "values": [
                "text-embedding-ada-002",
                "text-embedding-3-small",
                "text-embedding-3-large"
              ]
            },
            "API Key": {
              "type": "password",
              "value": "",
              "description": "OpenAI API Key (or set OPENAI_API_KEY env var)",
              "values": []
            },
            "URL": {
              "type": "text",
              "value": "https://api.openai.com/v1",
              "description": "OpenAI API Base URL (if different from default)",
              "values": []
            }
          },
          "type": "",
          "available": true
        }
      }
    },
    "Retriever": {
      "selected": "Advanced",
      "components": {
        "Advanced": {
          "name": "Advanced",
          "variables": [],
          "library": [],
          "description": "Retrieve relevant chunks from Weaviate",
          "config": {
            "Suggestion": {
              "type": "bool",
              "value": 1,
              "description": "Enable Autocomplete Suggestions",
              "values": []
            },
            "Search Mode": {
              "type": "dropdown",
              "value": "Hybrid Search",
              "description": "Switch between search types.",
              "values": [
                "Hybrid Search"
              ]
            },
            "Limit Mode": {
              "type": "dropdown",
              "value": "Autocut",
              "description": "Method for limiting the results. Autocut decides automatically how many chunks to retrieve, while fixed sets a fixed limit.",
              "values": [
                "Autocut",
                "Fixed"
              ]
            },
            "Limit/Sensitivity": {
              "type": "number",
              "value": 1,
              "description": "Value for limiting the results. Value controls Autocut sensitivity and Fixed Size",
              "values": []
            },
            "Chunk Window": {
              "type": "number",
              "value": 1,
              "description": "Number of surrounding chunks of retrieved chunks to add to context",
              "values": []
            },
            "Threshold": {
              "type": "number",
              "value": 80,
              "description": "Threshold of chunk score to apply window technique (1-100)",
              "values": []
            }
          },
          "type": "",
          "available": true
        }
      }
    },
    "Generator": {
      "selected": "Ollama",
      "components": {
        "Ollama": {
          "name": "Ollama",
          "variables": [],
          "library": [],
          "description": "Generate answers using Ollama. If your Ollama instance is not running on http://localhost:11434, you can change the URL by setting the OLLAMA_URL environment variable.",
          "config": {
            "System Message": {
              "type": "text",
              "value": "You are Fit T Centenarian, a chatbot for Retrieval Augmented Generation (RAG). You will receive a user query and context pieces that have a semantic similarity to that query. Please answer these user queries only with the provided context. Mention documents you used from the context if you use them to reduce hallucination. If the provided documentation does not provide enough information, say so. If the user asks questions about you as a chatbot specifially, answer them naturally. If the answer requires code examples encapsulate them with ```programming-language-name ```. Don't do pseudo-code.",
              "description": "System Message",
              "values": []
            },
            "Model": {
              "type": "dropdown",
              "value": "llama3.2:latest",
              "description": "Select an installed Ollama model from http://localhost:11434.",
              "values": [
                "snowflake-arctic-embed:latest",
                "llama3:latest",
                "codegemma:latest",
                "llama3.2:latest"
              ]
            }
          },
          "type": "",
          "available": true
        },
        "OpenAI": {
          "name": "OpenAI",
          "variables": [],
          "library": [],
          "description": "Using OpenAI LLM models to generate answers to queries",
          "config": {
            "System Message": {
              "type": "text",
              "value": "You are Verba, a chatbot for Retrieval Augmented Generation (RAG). You will receive a user query and context pieces that have a semantic similarity to that query. Please answer these user queries only with the provided context. Mention documents you used from the context if you use them to reduce hallucination. If the provided documentation does not provide enough information, say so. If the user asks questions about you as a chatbot specifially, answer them naturally. If the answer requires code examples encapsulate them with ```programming-language-name ```. Don't do pseudo-code.",
              "description": "System Message",
              "values": []
            },
            "Model": {
              "type": "dropdown",
              "value": "gpt-4o",
              "description": "Select an OpenAI Embedding Model",
              "values": [
                "gpt-4o",
                "gpt-3.5-turbo"
              ]
            },
            "API Key": {
              "type": "password",
              "value": "",
              "description": "You can set your OpenAI API Key here or set it as environment variable `OPENAI_API_KEY`",
              "values": []
            },
            "URL": {
              "type": "text",
              "value": "https://api.openai.com/v1",
              "description": "You can change the Base URL here if needed",
              "values": []
            }
          },
          "type": "",
          "available": true
        },
        "Anthropic": {
          "name": "Anthropic",
          "variables": [],
          "library": [],
          "description": "Using Anthropic LLM models to generate answers to queries",
          "config": {
            "System Message": {
              "type": "text",
              "value": "You are Verba, a chatbot for Retrieval Augmented Generation (RAG). You will receive a user query and context pieces that have a semantic similarity to that query. Please answer these user queries only with the provided context. Mention documents you used from the context if you use them to reduce hallucination. If the provided documentation does not provide enough information, say so. If the user asks questions about you as a chatbot specifially, answer them naturally. If the answer requires code examples encapsulate them with ```programming-language-name ```. Don't do pseudo-code.",
              "description": "System Message",
              "values": []
            },
            "Model": {
              "type": "dropdown",
              "value": "claude-3-5-sonnet-20240620",
              "description": "Select an Anthropic Model",
              "values": [
                "claude-3-5-sonnet-20240620"
              ]
            },
            "API Key": {
              "type": "password",
              "value": "",
              "description": "You can set your Anthropic API Key here or set it as environment variable `ANTHROPIC_API_KEY`",
              "values": []
            }
          },
          "type": "",
          "available": true
        },
        "Cohere": {
          "name": "Cohere",
          "variables": [],
          "library": [],
          "description": "Generator using Cohere's command-r-plus model",
          "config": {
            "System Message": {
              "type": "text",
              "value": "You are Verba, a chatbot for Retrieval Augmented Generation (RAG). You will receive a user query and context pieces that have a semantic similarity to that query. Please answer these user queries only with the provided context. Mention documents you used from the context if you use them to reduce hallucination. If the provided documentation does not provide enough information, say so. If the user asks questions about you as a chatbot specifially, answer them naturally. If the answer requires code examples encapsulate them with ```programming-language-name ```. Don't do pseudo-code.",
              "description": "System Message",
              "values": []
            },
            "Model": {
              "type": "dropdown",
              "value": "embed-english-v3.0",
              "description": "Select a Cohere Embedding Model",
              "values": [
                "embed-english-v3.0",
                "embed-multilingual-v3.0",
                "embed-english-light-v3.0",
                "embed-multilingual-light-v3.0"
              ]
            },
            "API Key": {
              "type": "password",
              "value": "",
              "description": "You can set your Cohere API Key here or set it as environment variable `COHERE_API_KEY`",
              "values": []
            }
          },
          "type": "",
          "available": true
        }
      }
    }
  },
  "file_size": 11918470,
  "status": "ERROR"
}

Hi @bam!

This error seems to be coming from Ollama.

Can you get anything from the Ollama logs?

I don't know where to find the logs. I googled it, and it says to look in the system tray, but the only option I have there is to quit Ollama. I'm still searching for the logs.

On macOS (and probably Linux too) it is here:

tail -n 100 ~/.ollama/logs/server.log
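
If Ollama runs as a systemd service on Linux instead of the desktop app, the logs go to the journal; a command along these lines should show them:

journalctl -e -u ollama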

I increased the Docker memory allocation to 10 GB and am trying again.

Thank you.
Here are the logs from that file:

time=2024-11-14T15:39:42.076-05:00 level=INFO source=server.go:601 msg="llama runner started in 8.17 seconds"
llama_model_loader: loaded meta data with 30 key-value pairs and 255 tensors from /Users/bam/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 3B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
llama_model_loader: - kv   5:                         general.size_label str              = 3B
llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   8:                          llama.block_count u32              = 28
llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
llama_model_loader: - kv  10:                     llama.embedding_length u32              = 3072
llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 8192
llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 24
llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  17:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  18:                          general.file_type u32              = 15
llama_model_loader: - kv  19:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  20:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  29:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   58 tensors
llama_model_loader: - type q4_K:  168 tensors
llama_model_loader: - type q6_K:   29 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: vocab_only       = 1
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = all F32
llm_load_print_meta: model params     = 3.21 B
llm_load_print_meta: model size       = 1.87 GiB (5.01 BPW) 
llm_load_print_meta: general.name     = Llama 3.2 3B Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token        = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token        = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token        = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2024-11-14T15:39:43.778-05:00 level=WARN source=runner.go:126 msg="truncating input prompt" limit=2048 prompt=6327 numKeep=5
[GIN] 2024/11/14 - 15:43:05 | 200 |         3m31s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/11/14 - 15:44:14 | 200 | 22.751794079s |       127.0.0.1 | POST     "/api/embed"
ggml.c:13343: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
ggml.c:13343: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
time=2024-11-14T15:46:43.252-05:00 level=WARN source=server.go:482 msg="llama runner process no longer running" sys=6 string="signal: abort trap"
er running" sys=6 string="signal: abort trap"
time=2024-11-14T15:46:43.252-05:00 level=WARN source=server.go:482 msg="llama runner process no longer running" sys=6 string="signal: abort trap"
time=2024-11-14T15:46:43.252-05:00 level=WARN source=server.go:482 msg="llama runner process no longer running" sys=6 string="signal: abort trap"
time=2024-11-14T15:46:43.252-05:00 level=ERROR source=routes.go:453 msg="embedding generation failed" error="do embedding request: Post \"http://127.0.0.1:62404/embedding\": EOF"
[GIN] 2024/11/14 - 15:46:43 | 500 |         2m51s |       127.0.0.1 | POST     "/api/embed"
time=2024-11-14T15:46:43.257-05:00 level=ERROR source=routes.go:453 msg="embedding generation failed" error="llama runner process no longer running: -1 GGML_ASSERT(i01 >= 0 && i01 < ne01) failed\nggml.c:13343: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed"
time=2024-11-14T15:46:43.258-05:00 level=WARN source=server.go:482 msg="llama runner process no longer running" sys=6 string="signal: abort trap"
[GIN] 2024/11/14 - 15:46:43 | 500 |         2m52s |       127.0.0.1 | POST     "/api/embed"
time=2024-11-14T15:46:43.289-05:00 level=ERROR source=routes.go:453 msg="embedding generation failed" error="llama runner process no longer running: -1 GGML_ASSERT(i01 >= 0 && i01 < ne01) failed\nggml.c:13343: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed"
time=2024-11-14T15:46:43.289-05:00 level=WARN source=server.go:482 msg="llama runner process no longer running" sys=6 string="signal: abort trap"
time=2024-11-14T15:46:43.289-05:00 level=ERROR source=routes.go:453 msg="embedding generation failed" error="llama runner process no longer running: -1 GGML_ASSERT(i01 >= 0 && i01 < ne01) failed\nggml.c:13343: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed"
[GIN] 2024/11/14 - 15:46:43 | 500 |         2m52s |       127.0.0.1 | POST     "/api/embed"
[GIN] 2024/11/14 - 15:46:43 | 500 |         2m52s |       127.0.0.1 | POST     "/api/embed"
time=2024-11-14T15:46:43.295-05:00 level=ERROR source=routes.go:453 msg="embedding generation failed" error="llama runner process no longer running: -1 GGML_ASSERT(i01 >= 0 && i01 < ne01) failed\nggml.c:13343: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed"
[GIN] 2024/11/14 - 15:46:43 | 500 |         2m52s |       127.0.0.1 | POST     "/api/embed"
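
The GGML_ASSERT failures above show the llama runner itself aborting while serving /api/embed; the 500s that Verba reports are just the fallout. One way to narrow this down is to replay the chunks against Ollama one at a time, so the first input that kills the runner can be identified. A rough Python sketch, under the assumption that the chunk texts have been exported into a list (the chunks placeholder below is hypothetical, not something Verba provides):

import requests

OLLAMA_EMBED_URL = "http://localhost:11434/api/embed"
MODEL = "snowflake-arctic-embed:latest"

def embed_one(text: str) -> bool:
    """Embed a single chunk; True means Ollama returned 200."""
    resp = requests.post(
        OLLAMA_EMBED_URL,
        json={"model": MODEL, "input": text},
        timeout=300,
    )
    return resp.ok

chunks = ["..."]  # fill in with the exported chunk texts

for i, chunk in enumerate(chunks):
    if not embed_one(chunk):
        print(f"chunk {i} failed ({len(chunk)} chars): {chunk[:80]!r}")
        break

If one specific chunk reliably crashes the runner, that chunk's length or contents (e.g. unusual characters from PDF extraction) is the thing to look at.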

I increased the memory allocation to 12 GB and swap to 2 GB, and the issue persists. It happens with both large (>80 MB) and small (<1 MB) PDF files.
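
Since raising memory doesn't help and sub-1 MB files fail too, the crash may be input-related rather than memory-related. A quick check is the embedding model's context length; ollama show prints the model details (the exact fields vary by Ollama version):

ollama show snowflake-arctic-embed:latest

The snowflake-arctic-embed models are typically trained with a 512-token context, so if any chunk tokenizes past what the runner can handle, lowering the Token chunker's size or truncating chunk text before embedding may avoid the crash. This is an assumption to verify against the output above, not a confirmed diagnosis.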

Interesting.

It seems Ollama is failing to vectorize those files.

If you can share those files with me, I can try running the import on my machine, a Mac M2 Pro.