Weaviate custom Retriever

yuri_Golfeto · June 3, 2024, 2:41pm

Description

Hello, here is a short description of what I want to do.
In my program I created a collection for any kind of data or files, with these properties:
"file_name": wc.DataType.TEXT,
"file_type": wc.DataType.TEXT,
"file_version": wc.DataType.TEXT,
"splitter_method": wc.DataType.TEXT,
"splitter_args": wc.DataType.TEXT,
"type": wc.DataType.TEXT,
"url": wc.DataType.TEXT,
"uuid": wc.DataType.UUID,
"version": wc.DataType.TEXT,
"page_content": wc.DataType.TEXT_ARRAY,
"metadata": wc.DataType.TEXT_ARRAY,

I also use t2v_transformers from cr.weaviate.io.
I want to create a retriever over "page_content" (vectorized) and "metadata" (not vectorized) without using another embedding system and without creating a new collection through the Weaviate abstractions.
Can I do it?

Server Setup Information

Weaviate Server Version: 1.25.1
Deployment Method: docker - embedded
Client Language and Version: python
Multitenancy?: idk

Any additional Information

I can already insert the data into a Weaviate DB; I just want to make a retriever from the TEXT_ARRAY in this DB, in an object with a specific UUID5.
I would also like to ask whether there is any Weaviate tutorial on how to build a vector store and retriever without the LangChain abstractions.
If you have any other questions, just ask me.

DudaNogueira · June 3, 2024, 10:50pm

Hi @yuri_Golfeto !! Welcome to our community

Not sure I understood your first issue

For the second, using Weaviate directly AND using langchain, I have written a nice recipe here:

github.com

weaviate/recipes/blob/cdead57111cfd384ed8218f58fb7f6fd72e05afd/integrations/llm-frameworks/langchain/loading-data/langchain-simple-pdf.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multilanguage RAG filtering by multiple PDFs with Langchain and OpenAi"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# lets install our super tools\n",
    "%pip install -Uqq langchain-weaviate\n",
    "%pip install openai tiktoken langchain"
   ]
  },

This file has been truncated.

If you use this approach, you can probably retrieve your data the way you want (maybe solving the first issue?)

If you could, please, elaborate on your first issue?

Let me know if this helps

Thanks!

yuri_Golfeto · June 4, 2024, 5:23pm

This isn't exactly what I want, but it works well; I just changed the class.
Let me explain it better.
My idea is to have a "master" collection called Collection_ingestor.
Why?
Reason: I want to create more than one vector store, with many types of documents and different types of text splitting.
With the properties I sent before, I just inserted all of the document splits into page_content, of type TEXT_ARRAY.
This didn't work because, at vectorization, Weaviate vectorizes the object as a whole; it doesn't vectorize each part separately.
Now, can you send me the Weaviate hybrid search recipe? This recipe is 100% easier to understand.
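
One way around the whole-object vectorization described above is to store one object per split, with page_content as plain TEXT, instead of one TEXT_ARRAY per document. A minimal sketch of preparing such objects with deterministic UUID5 IDs, using only the Python standard library. The namespace string, the helper name chunks_to_objects, and the chunk_index property are assumptions made for illustration; the insertion into Weaviate itself is left out.

```python
import uuid

# Hypothetical fixed namespace for this sketch; any UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "collection_ingestor")


def chunks_to_objects(file_name, splitter_method, chunks):
    """Turn one split document into one {uuid, properties} dict per chunk."""
    objects = []
    for index, chunk in enumerate(chunks):
        # Deterministic ID: the same file, splitter, and chunk index
        # always produce the same UUID5, so re-ingesting does not duplicate.
        object_id = uuid.uuid5(NAMESPACE, f"{file_name}:{splitter_method}:{index}")
        objects.append(
            {
                "uuid": str(object_id),
                "properties": {
                    "file_name": file_name,
                    "splitter_method": splitter_method,
                    "chunk_index": index,   # hypothetical extra property
                    "page_content": chunk,  # one TEXT value per object
                },
            }
        )
    return objects


objects = chunks_to_objects("report.pdf", "recursive", ["first chunk", "second chunk"])
for obj in objects:
    print(obj["uuid"], obj["properties"]["page_content"])
```

Because each split becomes its own object, the vectorizer would produce one vector per split, and the file_name and splitter_method properties remain available for filtering.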

yuri_Golfeto · June 4, 2024, 7:48pm

One more question: why can't I create two WeaviateVectorStore.from_documents
on the same Weaviate connection? Or how do I do it?

DudaNogueira · June 10, 2024, 6:58pm

Hi!

There are some recipes on hybrid search directly in Weaviate here:

As you are using LangChain, you can use it through that integration.

You can create as many vector stores as you want.

The index_name you pass corresponds to a collection in Weaviate.

So, for example, creating two vector stores:

db_collection_a = WeaviateVectorStore.from_documents(docs, embeddings, client=client, index_name="CollectionA")

db_collection_b = WeaviateVectorStore.from_documents(docs, embeddings, client=client, index_name="CollectionB")

You can pass the same client; just change the index_name.

Also, if you don't want to ingest data, simply pass an empty docs list, like so:

db_collection_a = WeaviateVectorStore.from_documents([], embeddings, client=client, index_name="CollectionA")

Let me know if this helps.

yuri_Golfeto · June 12, 2024, 2:31pm

Hello Duda, that helped me a lot.
Let me show what I want to do.

Is there any way to use two vector stores as a single retriever, like MultiRetrievalQAChain, but from the Weaviate side?
The reason I ask for the Weaviate side: the Weaviate repository from LangChain is easier to understand, more efficient, and makes things better and more trustworthy.

DudaNogueira · June 12, 2024, 5:56pm

You can only query one collection at a time in Weaviate.

If you want to query over multiple contents, you will need to add them to the same collection and select the contents you want using filters.

Not sure if that would be possible with LangChain (meaning LangChain would query the different collections and then merge the results).
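
If the merge has to happen on the client side, one option is to query each collection separately and then fuse the ranked lists. A minimal sketch using reciprocal rank fusion, which is a swapped-in technique for illustration and not something Weaviate or LangChain is confirmed to do here. The function name, the constant k=60, and the document IDs are assumptions; the input lists stand in for the ordered results of two separate queries.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60, limit=5):
    """Merge several ranked lists of document IDs into a single ranking."""
    scores = defaultdict(float)
    for results in result_lists:
        # Each list contributes 1 / (k + rank) to every document it contains.
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID to keep the output stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:limit]]


# Hypothetical IDs returned by querying CollectionA and CollectionB separately.
hits_a = ["a1", "a2", "shared"]
hits_b = ["shared", "b1"]

merged = reciprocal_rank_fusion([hits_a, hits_b])
print(merged)
```

Rank-based fusion avoids comparing raw scores across collections, which may not be on the same scale.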

Let me know if this helps!

Thanks!

yuri_Golfeto · June 21, 2024, 7:57am

For now, I think that's enough help.
