Langchain WeaviateHybridSearchRetriever with filters?

Just_Guide7361 · August 7, 2024, 7:54pm

I am currently building a Q&A interface with Streamlit and Langchain. Our initial vector database was in Pinecone. We have documents about the same topic but for different industries. Pure embedding search is not optimal, as it will match the same concepts across industries. So we built a simple selector where users pick their industry and then ask their question. In Pinecone, each industry had its own namespace, so we simply filtered on that:

vectorstore = PineconeVectorStore(index_name=index_name, embedding=embeddings, namespace=namespace)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})

Hybrid search with Pinecone is not as convenient as with Weaviate, and since we noticed better performance with hybrid search, we are switching to Weaviate. The downside is that it is not clear how to apply filters with the Weaviate retriever.

retriever = WeaviateHybridSearchRetriever(
        client=client,
        index_name=WEAVIATE_INDEX_NAME,
        text_key="page_content",
        k=5,
        alpha=0.75,
        attributes=["file_name", "industry"],
        create_schema_if_missing=False,
    )

Our Langchain Chain looks similar to this ( langchain/templates/hybrid-search-weaviate/hybrid_search_weaviate/chain.py at master · langchain-ai/langchain · GitHub ):

# RAG prompt
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# RAG
model = ChatOpenAI()
chain = (
    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
    | prompt
    | model
    | StrOutputParser()
)

The docs do show this:

retriever.invoke(
    "AI integration in society",
    where_filter={
        "path": ["author"],
        "operator": "Equal",
        "valueString": "Prof. Jonathan K. Sterling",
    },
)

Does anyone know how/where to add the where_filter parameter for Weaviate hybrid search in the Chain?
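
One possible way to get the where_filter into the chain, sketched under a few assumptions: since the docs snippet passes where_filter as a keyword argument to retriever.invoke, a small wrapper function can fix the filter for the selected industry and stand in for the retriever in the chain. The helper names (industry_filter, with_where_filter) and the FakeRetriever stand-in are hypothetical and exist only so the sketch runs without a Weaviate instance. The value key follows the docs snippet and may need to match your property's data type.

```python
from typing import Any, Callable, Dict, List, Tuple


def industry_filter(industry: str, value_key: str = "valueString") -> Dict[str, Any]:
    """Build a dict-style where filter matching a single industry.

    value_key defaults to the key used in the docs snippet (assumption:
    it fits the data type of the "industry" property in your schema).
    """
    return {"path": ["industry"], "operator": "Equal", value_key: industry}


def with_where_filter(
    retriever: Any, where_filter: Dict[str, Any]
) -> Callable[[str], List[Any]]:
    """Return a function that runs the retriever with a fixed where_filter."""

    def retrieve(question: str) -> List[Any]:
        # Same call shape as the docs snippet, with the filter filled in.
        return retriever.invoke(question, where_filter=where_filter)

    return retrieve


class FakeRetriever:
    """Hypothetical stand-in for WeaviateHybridSearchRetriever (offline demo)."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def invoke(self, query: str, **kwargs: Any) -> List[str]:
        # Record what was passed so the forwarded filter can be inspected.
        self.calls.append((query, kwargs))
        return [f"doc for {query!r}"]


fake = FakeRetriever()
filtered_retrieve = with_where_filter(fake, industry_filter("healthcare"))
docs = filtered_retrieve("How do we handle audits?")
print(docs)
print(fake.calls[0][1]["where_filter"])
```

With the real retriever, the returned function could take the retriever's place in the chain's input map, for example {"context": with_where_filter(retriever, industry_filter(selected_industry)), "question": RunnablePassthrough()}, because LCEL wraps plain functions as runnables. Here selected_industry is a placeholder for the value coming from the Streamlit selector.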

DudaNogueira · August 7, 2024, 7:57pm

hi @Just_Guide7361 !!

Welcome to our community

Sorry, your topic was stuck on some anti spam check

We have a recipe that you will probably benefit from:

For instance, this is how you can use langchain and filters:

from weaviate import classes as wvc
# change below to get chunks per different files / countries
source_file = "brazil-wikipedia-article-text.pdf"
#source_file = "netherlands-wikipedia-article-text.pdf"
where_filter = wvc.query.Filter.by_property("source").equal(source_file)
docs = db.similarity_search("traditional food", filters=where_filter)
print(docs)

Let me know if this helps!

Thanks!

DudaNogueira · August 7, 2024, 9:12pm

Hi!

I have just updated that langchain recipe as it had some deprecations.

Here is the part you are interested in:

import os

from langchain.chains import RetrievalQA
from langchain_openai import OpenAI
from weaviate import classes as wvc

# Let's answer some questions
#source_file = "brazil-wikipedia-article-text.pdf"
source_file = "netherlands-wikipedia-article-text.pdf"
where_filter = wvc.query.Filter.by_property("source").equal(source_file)

# we want our retriever to filter the results
# (db is assumed to be the vector store created earlier in the recipe)
retriever = db.as_retriever(search_kwargs={"filters": where_filter})

# chain_type_kwargs is not defined in this excerpt; it is assumed to come from earlier in the recipe
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(openai_api_key=os.environ.get("OPENAI_API_KEY")),
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs=chain_type_kwargs,
    return_source_documents=True,
)

# invoke() replaces the deprecated direct call qa({...})
answer = qa.invoke({"query": "What is the traditional food of this country?"})
print(answer)

While this example uses a single-operand filter, you can easily add more logic, for example multiple operands or nested filters.

Hope this helps!

Thanks!

Just_Guide7361 · August 9, 2024, 10:41am

@DudaNogueira thank you for the quick reply. However, using RetrievalQA is not ideal, as it is deprecated in newer versions (langchain.chains.retrieval_qa.base.RetrievalQA in the LangChain 0.2.12 docs).

They recommend using create_retrieval_chain (langchain.chains.retrieval.create_retrieval_chain in the LangChain 0.2.12 docs), which follows the LCEL principles.

Are there any plans to update the recipe/examples with this?

DudaNogueira · August 9, 2024, 1:03pm

Hi @Just_Guide7361 !!

Thanks for pointing it out!!

I will take the opportunity and also write a recipe using the multi tenancy feature with langchain.

Here is working code using create_retrieval_chain (I will update the recipe later today):

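# ...
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_weaviate.vectorstores import WeaviateVectorStore
from weaviate.classes.query import Filter

# client = weaviate.connect_to_weaviate_cloud(...)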

embeddings = OpenAIEmbeddings()
db = WeaviateVectorStore.from_documents([], embeddings, client=client, index_name="WikipediaLangChain")

source_file = "brazil-wikipedia-article-text.pdf"
#source_file = "netherlands-wikipedia-article-text.pdf"
where_filter = Filter.by_property("source").equal(source_file)

# we want our retriever to filter the results
retriever = db.as_retriever(search_kwargs={"filters": where_filter})

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

llm = ChatOpenAI(model="gpt-4o-mini")
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

response = rag_chain.invoke({"input": "What is the traditional food of this country?"})
print(response["answer"])

By the way, we host a lot of online and in-person webinars and workshops. Check it out: Online Workshops & Events | Weaviate - Vector Database

Thanks and hope you are enjoying your “Weaviate journey”!!

Just_Guide7361 · August 11, 2024, 12:52pm

Never mind, fixed it.

Hey, thank you for the quick replies. I have tried your example but sadly it does not work. When initialising the db, I get a “list index out of range” error.

Here is my code:

from langchain_cohere import CohereEmbeddings
from langchain_weaviate.vectorstores import WeaviateVectorStore
import weaviate
from weaviate.classes.init import Auth
from weaviate.classes.query import Filter

embeddings = CohereEmbeddings(model=EMBEDDINGS_MODEL, cohere_api_key=COHERE_API_KEY)

headers = {
    "X-Cohere-Api-Key": COHERE_API_KEY,
}

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=WEAVIATE_URL,  
    auth_credentials=Auth.api_key(WEAVIATE_API_KEY), 
    headers=headers,
)

db = WeaviateVectorStore.from_documents([], embeddings, client=client, index_name=index_name)

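should become:

# Open the existing collection directly instead of calling from_documents([]).
# The constructor argument is `embedding` (singular); text_key is assumed to be "text",
# the from_documents default, and must match the key used when the documents were inserted.
db = WeaviateVectorStore(client=client, index_name=index_name, text_key="text", embedding=embeddings)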

where_filter = Filter.by_property(property_to_filter).equal(selected_property_by_user)
retriever = db.as_retriever(search_kwargs={"filters": where_filter, "alpha": 0.8})
retrieved_files = retriever.invoke(user_query)

I’ve inserted my documents as follows:

embeddings = CohereEmbeddings(
    model=EMBEDDINGS_MODEL,
    cohere_api_key=COHERE_API_KEY,
)

db = WeaviateVectorStore.from_documents(langchain_document, embeddings, client=client, index_name=index_name)

Using the Weaviate client I am able to retrieve documents. When I initialise the db with the langchain_document I am also able to retrieve, but when I initialise it with an empty list it does not work.

Ideally, of course, I would not have to pass the langchain_document to the db each time I want to use the Weaviate db.

Can you point out where I am going wrong?

Thanks!

DudaNogueira · August 13, 2024, 12:30pm

Hi @Just_Guide7361 !

I understand you were able to make it work, right?

Let me know if there is any other blocker we can help you with.

We are here to help you on this journey

Thanks!

DudaNogueira · August 13, 2024, 12:43pm

Hi again @Just_Guide7361 !!

I believe this thread is related to the issue you had:

Thanks for sharing the solution!
