Keep getting the error 'vectorizer: no module with name "text2vec-openai" present' when trying to use Weaviate deployed in Kubernetes

I have Weaviate installed and running in a local Kubernetes cluster:

>kubectl port-forward svc/weaviate -n weaviate 80
Forwarding from 127.0.0.1:80 -> 8080
Forwarding from [::1]:80 -> 8080

I have the following script, backend.py, which reads some PDF documents, creates a vector store index, and stores it in my local Kubernetes cluster:

###########
backend.py 
###########

from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.weaviate import Weaviate
import weaviate, os
from langchain.embeddings import OpenAIEmbeddings
from dotenv import load_dotenv

load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

client = weaviate.Client('http://localhost:80')

# Load the documents
doc_loader = DirectoryLoader(
    r'C:\Users\username\Documents\Docs',
    glob='**/*.pdf',
    show_progress=True
)
docs = doc_loader.load()

# Split the docs into chunks
splitter = CharacterTextSplitter(
    chunk_size=1000, 
    chunk_overlap=300
)
splitted_docs_list = splitter.split_documents(docs)

# Create schema
if client.schema.exists('classname'):
    client.schema.delete_class('classname')

# Note: Weaviate capitalizes class names, so "classname" is stored as "Classname"
class_obj = {
    "class": "classname",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {
            "vectorizeClassName": True
        }
    }
}

client.schema.create_class(class_obj)

embeddings = OpenAIEmbeddings()

vectorstore = Weaviate(client, "classname", "text", embedding=embeddings)

# add text chunks' embeddings to the Weaviate vector database
texts = [d.page_content for d in splitted_docs_list]
metadatas = [d.metadata for d in splitted_docs_list]
vectorstore.add_texts(texts, metadatas=metadatas)  # embedding model was already set in the constructor

When running python backend.py, I get the following error:

>python backend.py
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 15/15 [00:45<00:00,  3.03s/it]
Traceback (most recent call last):
  File "C:\Users\username\Documents\App\backend.py", line 75, in <module>
    client.schema.create_class(class_obj)
  File "C:\Users\username\AppData\Local\Programs\Python\Python311\Lib\site-packages\weaviate\schema\crud_schema.py", line 250, in create_class
    self._create_class_with_primitives(loaded_schema_class)
  File "C:\Users\username\AppData\Local\Programs\Python\Python311\Lib\site-packages\weaviate\schema\crud_schema.py", line 814, in _create_class_with_primitives
    raise UnexpectedStatusCodeException("Create class", response)
weaviate.exceptions.UnexpectedStatusCodeException: Create class! Unexpected status code: 422, with response body: {'error': [{'message': 'vectorizer: no module with name "text2vec-openai" present'}]}.

Why is it failing to create the schema? Can someone help me understand and fix it?
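For what it's worth, one way to confirm which modules the server actually has enabled is the /v1/meta endpoint, which the v3 Python client exposes as get_meta(). A small sketch — the helper and the sample payload are illustrative; against a live server the dict would come from the client call shown in the comment:

```python
def module_enabled(meta: dict, name: str) -> bool:
    # The /v1/meta response contains a "modules" mapping keyed by module name.
    return name in meta.get("modules", {})

# Against a live server (e.g. via the port-forward above) the payload comes from:
#   meta = weaviate.Client("http://localhost:80").get_meta()
# A server with no vectorizer modules reports an empty "modules" mapping:
sample_meta = {"version": "1.21.2", "modules": {}}
print(module_enabled(sample_meta, "text2vec-openai"))  # False
```

If text2vec-openai is missing from that mapping, any schema that names it as the vectorizer will be rejected with the 422 above.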


Interestingly, in case it's relevant: when I was pointing the Weaviate client at WCS, I got no errors and had a working app, even though the schema I had defined was simpler:

class_obj = {
    "class": "classname",
    "vectorizer": "text2vec-openai",
}
client.schema.create_class(class_obj)

But since it stopped working after I pointed the client at the Kubernetes cluster, and I was getting the same error as above, I tried other schemas that explicitly specify the vectorizer. That isn't working either.
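A side note on a possible workaround (not a fix for the missing module itself): since the script computes embeddings client-side with LangChain's OpenAIEmbeddings and uploads the vectors, the class arguably doesn't need a server-side vectorizer at all, so setting it to "none" should sidestep the module requirement. A sketch, untested against this deployment:

```python
# Workaround sketch: no server-side vectorizer module required, because
# LangChain supplies the vectors itself via OpenAIEmbeddings.
class_obj = {
    "class": "classname",
    "vectorizer": "none",
}
# client.schema.create_class(class_obj)  # requires a running Weaviate server
print(class_obj["vectorizer"])  # none
```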

Hey @Kristada673 ,

It looks like your deployment is not configured to include text2vec-openai.
Can you share your compose file?

Here is an example docker-compose file I use with Docker deployments.
These tend to be similar to Kubernetes configs.

Note: the key is the ENABLE_MODULES setting, where I have the text2vec-openai module listed (amongst many other modules I use).

---
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: semitechnologies/weaviate:1.21.2
    ports:
    - 8080:8080
    volumes:
    - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: 'text2vec-cohere,text2vec-huggingface,text2vec-palm,text2vec-openai,generative-openai,generative-cohere,generative-palm,ref2vec-centroid,reranker-cohere,qna-openai'
      CLUSTER_HOSTNAME: 'node1'
volumes:
  weaviate_data:
...

My guess is that for Kubernetes you need to add a configuration like the one here.

I think it should be enough to add/update:

  text2vec-openai:
    # enable if you want to use OpenAI module
    enabled: true

Yeah, I figured that out eventually myself. The deployment was actually done by our DevOps engineer, so I asked him to check whether anything in the settings said that vectorizers can't be specified, or that only the default vectorizer (which was null) could be used. He confirmed that was the case, and then reconfigured it to use text2vec-openai by default.

But this kind of thing is not clear from the documentation and can only be found out after a lot of digging online. Especially because with a WCS deployment text2vec-openai is the default vectorizer, so one wouldn't expect that not to be the case in a Kubernetes deployment, where the default is null.

Yes, I agree that the journey around the Docker configuration is not perfect (…yet :wink: ), although the default config we share in our docs does include text2vec-openai :wink:

We are constantly working to improve our docs, so thank you for your feedback, which helps us pick our battles :pray:

Planned improvements

FYI, we are working on a UX improvement around models hosted by integrated providers (like AWS, Cohere, HuggingFace, Google, OctoAI, OpenAI, VoyageAI, etc.), where we will include them all out of the box. This way you won't need to list them in the Docker/Kubernetes configuration, as they will already be there :wink: