weaviate-client==4.7.1
langchain-weaviate==0.0.2
langchain==0.2.11
I am able to create a simple example that builds a ‘db’ and then uses that db to do inference in one flow:
import weaviate
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_weaviate.vectorstores import WeaviateVectorStore
from bge import bge_m3_embedding
print(f'Read in text ...')
loader = TextLoader('state_of_the_union.txt')
documents = loader.load()
print('Split text ...')
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
print('Load embedding model ...')
embedding_model = bge_m3_embedding
print('Embed docs ...')
weaviate_client = weaviate.connect_to_local()
db = WeaviateVectorStore.from_documents(docs, embedding_model, client=weaviate_client, index_name='test')
#db = WeaviateVectorStore.from_documents([], embedding_model, client=weaviate_client, index_name='test')
# print('Perform search ...')
query = 'What did the president say about Ketanji Brown Jackson'
results = db.similarity_search_with_score(query, alpha=1)
for i, (doc, score) in enumerate(results):
    print(f'{i}--->{score:.3f}')
print(results[0])
#
weaviate_client.close()
This all works fine: the db is created and similar docs are retrieved. However, if I now want to reuse this existing ‘db’ to run the same query in a separate flow, I get an IndexError:
import weaviate
from langchain_weaviate.vectorstores import WeaviateVectorStore
from bge import bge_m3_embedding

print('Load embedding model ...')
embedding_model = bge_m3_embedding
print('Load embedded docs ...')
weaviate_client = weaviate.connect_to_local()
db = WeaviateVectorStore.from_documents([], embedding_model, client=weaviate_client, index_name='test')
# print('Perform search ...')
query = 'What did the president say about Ketanji Brown Jackson'
results = db.similarity_search_with_score(query, alpha=1)
for i, (doc, score) in enumerate(results):
    print(f'{i}--->{score:.3f}')
print(results[0])
And the error message is below:
Traceback (most recent call last):
  File "/Users/I747411/ai/lc_weaviate.py", line 22, in <module>
    db = WeaviateVectorStore.from_documents([], embedding_model, client=weaviate_client, index_name='test')
  File "/Users/I747411/ai/venv/lib/python3.10/site-packages/langchain_core/vectorstores/base.py", line 1058, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "/Users/I747411/ai/venv/lib/python3.10/site-packages/langchain_weaviate/vectorstores.py", line 487, in from_texts
    weaviate_vector_store.add_texts(texts, metadatas, tenant=tenant, **kwargs)
  File "/Users/I747411/ai/venv/lib/python3.10/site-packages/langchain_weaviate/vectorstores.py", line 165, in add_texts
    embeddings = self._embedding.embed_documents(list(texts))
  File "/Users/I747411/ai/venv/lib/python3.10/site-packages/langchain_community/embeddings/huggingface.py", line 331, in embed_documents
    embeddings = self.client.encode(
  File "/Users/I747411/ai/venv/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 565, in encode
    if all_embeddings[0].dtype == torch.bfloat16:
IndexError: list index out of range
/Users/I747411/ai/venv/lib/python3.10/site-packages/weaviate/warnings.py:303: ResourceWarning: Con004: The connection to Weaviate was not closed properly. This can lead to memory leaks.
Please make sure to close the connection using `client.close()`.
Please see the error message: “IndexError: list index out of range”.
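That error looks consistent with `from_documents([])` handing the embedding model an empty list. A toy reproduction of the failure mode (this is a stand-in sketch, not the real sentence-transformers code):

```python
# Stand-in for the failing call chain: from_documents([]) -> embed_documents([])
# -> SentenceTransformer.encode([]). With no input texts the list of embeddings
# is empty, so the library's dtype check on all_embeddings[0] raises IndexError.
def encode(texts):
    all_embeddings = [[0.0, 0.0, 0.0] for _ in texts]  # one vector per text
    _ = all_embeddings[0]  # mirrors `all_embeddings[0].dtype == torch.bfloat16`
    return all_embeddings

try:
    encode([])
    outcome = 'ok'
except IndexError:
    outcome = 'IndexError'
print(outcome)  # -> IndexError
```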
What’s the proper way to use an existing vector db to do inference? Please help!
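In case it helps frame an answer, here is the guard logic I suspect is needed, sketched with a generic `store_cls` so it runs without a Weaviate server. `text_key='text'` is my assumption about the property LangChain stores chunks in, and whether `WeaviateVectorStore`'s constructor accepts exactly these keyword arguments is part of what I am asking:

```python
def open_store(store_cls, docs, embedding, client, index_name):
    """Create the collection on the first run, attach to it on later runs.

    from_documents() always embeds its input, so passing [] feeds an empty
    list to the embedding model and crashes. When the collection already
    exists, construct the vector store directly instead of re-embedding.
    """
    if docs:  # first run: embed the chunks and build the index
        return store_cls.from_documents(docs, embedding,
                                        client=client, index_name=index_name)
    # later runs: attach to the existing index without embedding anything
    return store_cls(client=client, index_name=index_name,
                     text_key='text', embedding=embedding)
```

Here `store_cls` would be `WeaviateVectorStore`, with `embedding` still supplied so that queries can be embedded at search time.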