I have Weaviate installed and running in a local Kubernetes cluster:
>kubectl port-forward svc/weaviate -n weaviate 80
Forwarding from 127.0.0.1:80 -> 8080
Forwarding from [::1]:80 -> 8080
I have the following script, backend.py, in which I want to read some PDF documents, create a vector store index, and store it in my local Kubernetes cluster:
###########
backend.py
###########
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.weaviate import Weaviate
import weaviate, os
from langchain.embeddings import OpenAIEmbeddings
from dotenv import load_dotenv
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
client = weaviate.Client('http://localhost:80')
# Load the documents
doc_loader = DirectoryLoader(
    r'C:\Users\username\Documents\Docs',
    glob='**/*.pdf',
    show_progress=True
)
docs = doc_loader.load()
# Split the docs into chunks
splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=300
)
splitted_docs_list = splitter.split_documents(docs)
# Create schema
if client.schema.exists('classname'):
    client.schema.delete_class('classname')
class_obj = {
    "class": "classname",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {
        "text2vec-openai": {
            "vectorizeClassName": True
        }
    }
}
client.schema.create_class(class_obj)
embeddings = OpenAIEmbeddings()
vectorstore = Weaviate(client, "classname", "text", embedding=embeddings)
# add text chunks' embeddings to the Weaviate vector database
texts = [d.page_content for d in splitted_docs_list]
metadatas = [d.metadata for d in splitted_docs_list]
vectorstore.add_texts(texts, metadatas=metadatas, embedding=embeddings)
When running python backend.py, I get the following error:
>python backend.py
100%|██████████████████████████████████████████████████████████████████████████████████| 15/15 [00:45<00:00, 3.03s/it]
Traceback (most recent call last):
  File "C:\Users\username\Documents\App\backend.py", line 75, in <module>
    client.schema.create_class(class_obj)
  File "C:\Users\username\AppData\Local\Programs\Python\Python311\Lib\site-packages\weaviate\schema\crud_schema.py", line 250, in create_class
    self._create_class_with_primitives(loaded_schema_class)
  File "C:\Users\username\AppData\Local\Programs\Python\Python311\Lib\site-packages\weaviate\schema\crud_schema.py", line 814, in _create_class_with_primitives
    raise UnexpectedStatusCodeException("Create class", response)
weaviate.exceptions.UnexpectedStatusCodeException: Create class! Unexpected status code: 422, with response body: {'error': [{'message': 'vectorizer: no module with name "text2vec-openai" present'}]}.
Why is it failing to create the schema? Can someone help me understand it and fix it?
Interestingly, in case it's relevant: when I was running the Weaviate client against WCS (Weaviate Cloud Services), I got no error and had a working app, and the schema I had defined was much simpler too:
class_obj = {
    "class": "classname",
    "vectorizer": "text2vec-openai",
}
client.schema.create_class(class_obj)
But since that stopped working after I pointed the Weaviate client at the Kubernetes cluster, with the same error as above, I decided to try other schemas where I explicitly configure the vectorizer. That is not working either.
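
Since the 422 response says that no module named "text2vec-openai" is present, one thing that may be worth checking is which modules the local instance actually reports. Below is a minimal sketch of such a check, assuming the v3 Python client used above, whose client.get_meta() returns the instance metadata including a "modules" mapping keyed by module name. The helper names missing_modules and check_local_instance are hypothetical, made up for illustration, and the URL is the port-forwarded address from the top of the post.

```python
def missing_modules(meta: dict, required: list) -> list:
    """Return the names in `required` that are absent from the instance metadata.

    `meta` is assumed to have the shape returned by client.get_meta(),
    i.e. a dict whose "modules" entry maps module names to their details.
    """
    enabled = meta.get("modules") or {}
    return [name for name in required if name not in enabled]


def check_local_instance(url: str = "http://localhost:80") -> list:
    """Query a running Weaviate instance and report missing vectorizer modules."""
    # Imported here so missing_modules() can be used without the client installed.
    import weaviate

    client = weaviate.Client(url)
    return missing_modules(client.get_meta(), ["text2vec-openai"])
```

If check_local_instance() returns ["text2vec-openai"] for the local cluster, that would be consistent with the error: the module is simply not enabled in this deployment, whereas the WCS instance presumably had it enabled.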