I am currently building a Q&A interface with Streamlit and LangChain. Our initial vector database was Pinecone. We have documents about the same topics but for different industries, and pure embedding search is not optimal because it matches the same concepts across industries. So we built a simple selector where users pick their industry and then ask their question. In Pinecone, each industry had its own namespace, and we simply filtered on that:
from langchain_pinecone import PineconeVectorStore

vectorstore = PineconeVectorStore(index_name=index_name, embedding=embeddings, namespace=namespace)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})
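For context, the industry selector in the Streamlit app looks roughly like this (a sketch; the INDUSTRIES list and widget label are placeholders):

import streamlit as st

INDUSTRIES = ["healthcare", "finance", "manufacturing"]  # placeholder industry values
namespace = st.selectbox("Select your industry", INDUSTRIES)  # one namespace per industry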
Hybrid search with Pinecone is not as convenient as with Weaviate, and since we noticed better performance with hybrid search, we are switching to Weaviate. The downside is that filtering is not as clear for the Weaviate retriever:
from langchain_community.retrievers import WeaviateHybridSearchRetriever

retriever = WeaviateHybridSearchRetriever(
    client=client,
    index_name=WEAVIATE_INDEX_NAME,
    text_key="page_content",
    k=5,
    alpha=0.75,  # 1.0 = pure vector search, 0.0 = pure keyword (BM25) search
    attributes=["file_name", "industry"],
    create_schema_if_missing=False,
)
Our LangChain chain looks similar to the hybrid-search-weaviate template (langchain/templates/hybrid-search-weaviate/hybrid_search_weaviate/chain.py at master · langchain-ai/langchain · GitHub):
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai import ChatOpenAI

# RAG prompt
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# RAG chain
model = ChatOpenAI()
chain = (
    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
    | prompt
    | model
    | StrOutputParser()
)
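The chain is then invoked with the user's question from the Streamlit app, roughly like this (widget label is illustrative):

question = st.text_input("Ask your question")
if question:
    st.write(chain.invoke(question))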
The docs do show that invoke accepts a where_filter:
retriever.invoke(
    "AI integration in society",
    where_filter={
        "path": ["author"],
        "operator": "Equal",
        "valueString": "Prof. Jonathan K. Sterling",
    },
)
Does anyone know how or where to add the where_filter parameter for Weaviate hybrid search inside the chain?
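One idea I had is to wrap the retriever call in a RunnableLambda so that where_filter can be passed explicitly on each request (an untested sketch; the industry value is illustrative, and I don't know whether this is the intended pattern):

from langchain_core.runnables import RunnableLambda

def retrieve_with_filter(question: str):
    # Forward the industry filter on every retrieval call
    return retriever.invoke(
        question,
        where_filter={
            "path": ["industry"],
            "operator": "Equal",
            "valueString": "healthcare",  # illustrative industry value
        },
    )

chain = (
    RunnableParallel(
        {"context": RunnableLambda(retrieve_with_filter), "question": RunnablePassthrough()}
    )
    | prompt
    | model
    | StrOutputParser()
)

This feels clunky, though, so I'd prefer a more idiomatic way if one exists.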