Hello, Weaviate noob here. I created a vectorizer using LangChain that works fine, and it does persist to the cloud after creation. But suppose at some later date I want to get the vectorizer without rebuilding it? I believe the line would be:
I am far from being a LangChain expert. On top of that, I am also a Weaviate noob myself, as I recently joined Weaviate, hehehe.
What I could find is that, while you need to specify text_key when instantiating the main class, passing it to from_documents() has no effect. It is hardcoded to text here:
text_key will be the property where text will be stored:
I am not sure if hardcoding text in from_texts() is a good thing, because it ties every from_documents() and from_texts() import to that property name, leaving no other option when importing content.
So now, it will depend on how you imported your data (if you used from_documents, your text_key will be text).
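If you are not sure which property holds the text in an existing class, one way to check is to look at the schema. This is only a minimal sketch, assuming the v3 Python client, where client.schema.get() returns a dict with a "classes" list. The helper name list_property_names and the example schema are made up for illustration:

```python
def list_property_names(schema: dict, class_name: str) -> list:
    """Return the property names of class_name from a schema dict.

    schema is expected to look like the result of client.schema.get()
    in the v3 Python client: {"classes": [{"class": ..., "properties": [...]}]}.
    """
    for cls in schema.get("classes", []):
        if cls.get("class") == class_name:
            return [prop["name"] for prop in cls.get("properties", [])]
    return []  # class not found


# Hand-written example schema (an assumption, not real output).
example_schema = {
    "classes": [
        {
            "class": "MyIndex",
            "properties": [
                {"name": "text", "dataType": ["text"]},
                {"name": "source", "dataType": ["text"]},
            ],
        }
    ]
}

print(list_property_names(example_schema, "MyIndex"))
```

With a live instance you would pass client.schema.get() instead of example_schema, and use whichever property holds your text as text_key.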
So here is something that has worked for me:
from langchain.vectorstores import Weaviate
import weaviate
# considering you have docs, embeddings, dependencies, etc
WEAVIATE_URL = "http://localhost:8080"
db = Weaviate.from_documents(docs, embeddings, weaviate_url=WEAVIATE_URL, by_text=False, index_name="MyIndex")
# now, you can:
client = weaviate.Client(WEAVIATE_URL)
db = Weaviate(client=client, index_name="MyIndex", text_key="text")
db.similarity_search_by_text(query="health")
Thanks, it works, even with a BGE Hugging Face embedding, but I see that LangChain creates 2 classes, i.e. "MyIndex" and "LangChain". But it does NOT work for the "WeaviateHybridSearchRetriever" if you define:
retriever = WeaviateHybridSearchRetriever(
client=client,
index_name="MyIndex",
text_key="text",
attributes=,
embedding=BGEembedding,
create_schema_if_missing=True
)
and response = retriever.get_relevant_documents(query="some question?") gives an error:
ValueError: Error during query: [{'locations': [{'column': 6, 'line': 1}], 'message': 'get vector input from modules provider: VectorFromInput was called without vectorizer', 'path': ['Get', 'MyIndex']}]
By default, the "LangChain" class uses the OpenAI ada embedding.
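The error message suggests the hybrid query reached a class that has no vectorizer module, so Weaviate had nothing to embed the query text with. One possible workaround, sketched here with the raw v3 Python client rather than the LangChain retriever, is to compute the query vector yourself and pass it to with_hybrid. The helper name hybrid_search_with_vector and the embed_query callable (for example BGEembedding.embed_query) are assumptions for illustration, and this has not been tested against the setup above:

```python
def hybrid_search_with_vector(client, embed_query, class_name, text_key, query,
                              alpha=0.5, limit=4):
    """Run a hybrid query, supplying the query vector explicitly.

    client      -- a weaviate.Client (v3 Python client assumed)
    embed_query -- callable mapping a string to a list of floats
    """
    vector = embed_query(query)  # embed on the client side
    result = (
        client.query.get(class_name, [text_key])
        .with_hybrid(query=query, alpha=alpha, vector=vector)
        .with_limit(limit)
        .do()
    )
    # The v3 client returns GraphQL errors inside the response dict.
    if "errors" in result:
        raise ValueError(f"Error during query: {result['errors']}")
    return result["data"]["Get"][class_name]
```

Usage would be something like hybrid_search_with_vector(client, BGEembedding.embed_query, "MyIndex", "text", "some question?"), which returns a list of dicts keyed by the requested property.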
Hello. If I have two parameters that need to be passed to text_key, how should I handle it? The actual problem I am facing is that I want to return content and source. Thanks.
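As far as I can tell, text_key takes a single property name, and any other properties come back as metadata. The LangChain Weaviate class accepts an attributes list for that, e.g. attributes=["source"] next to text_key="text", assuming your objects were stored with a source property. The pure-Python sketch below only illustrates the split and does not call LangChain. The helper to_content_and_metadata and the example object are made up for illustration:

```python
def to_content_and_metadata(obj: dict, text_key: str):
    """Split one returned object into (content, metadata).

    The property named by text_key becomes the content; every other
    requested property stays in the metadata dict.
    """
    remaining = dict(obj)  # copy so the caller's dict is not mutated
    content = remaining.pop(text_key)
    return content, remaining


# Hand-written example object (an assumption, not real query output).
hit = {"text": "Eat more vegetables.", "source": "health.pdf"}
content, metadata = to_content_and_metadata(hit, text_key="text")
print(content)
print(metadata["source"])
```

With the real store, the same two pieces would be read roughly as doc.page_content and doc.metadata["source"] on each returned document.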