How to hide the client v3 deprecation warning?

How can I hide this warning?

DeprecationWarning: Dep016: You are using the Weaviate v3 client, which is deprecated.
            Consider upgrading to the new and improved v4 client instead!
            See here for usage: https://weaviate.io/developers/weaviate/client-libraries/python

  warnings.warn(

I can’t use the v4 client because LangChain’s WeaviateHybridSearchRetriever only works with v3 at the moment; I already opened a ticket about it.

For now, I just added these lines at the top of my code:

import warnings

# suppress warnings raised while weaviate is being imported
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    import weaviate

Hi! We have updated the LangChain integration to use the Python v4 client:

Do you think that can help you migrate to the v4 client?

If not, let me know what else you need on that integration.

Thanks!


@DudaNogueira Hello, I’m interested in doing hybrid search like this.

I don’t want to split the text and generate embeddings myself. How can I use WeaviateHybridSearchRetriever with the v4 client?

The example you shared doesn’t use WeaviateHybridSearchRetriever.

hi @elie !

With the new LangChain integration, similarity_search will, under the hood, perform a hybrid search:

So with that, you can simply call the similarity_search method:

docs = db.similarity_search(query, alpha=0)
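With alpha=0 that call runs a pure keyword (BM25) search, while alpha=1 would be pure vector search. As a rough sketch of the blending idea (conceptually similar to relative-score fusion; this is not Weaviate's exact fusion implementation):

```python
def blend_scores(vector_score: float, keyword_score: float, alpha: float) -> float:
    """Conceptual hybrid blend: alpha weights the vector score,
    (1 - alpha) weights the keyword (BM25) score."""
    return alpha * vector_score + (1 - alpha) * keyword_score

# alpha=0: only the keyword score matters (pure BM25)
print(blend_scores(0.9, 0.4, alpha=0.0))  # 0.4
# alpha=1: only the vector score matters (pure vector search)
print(blend_scores(0.9, 0.4, alpha=1.0))  # 0.9
# alpha=0.5: an even blend of the two
print(round(blend_scores(0.9, 0.4, alpha=0.5), 2))  # 0.65
```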

you can instantiate your db, like so:

db = WeaviateVectorStore.from_documents([], embeddings, client=weaviate_client)

If you want an end-to-end example, I have recently updated our LangChain integration recipe here:

Let me know if this helps.

Thanks!

Fine, but I don’t want to do the chunking and embedding myself; I want to avoid code like this:

from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

because this way you’d have to waste so much time finding the correct chunk size, and I can’t guarantee I’d get the correct answers. I want an answer with a reference (source); if I do the chunking, some chunks would be messed up and I might get the same bad quality as Chroma or FAISS.

the old way of loading data is much better (Weaviate Hybrid Search | 🦜️🔗 LangChain):

docs = [
    Document(
        metadata={
            "title": "Embracing The Future: AI Unveiled",
            "author": "Dr. Rebecca Simmons",
        },
        page_content="A comprehensive analysis of the evolution of artificial intelligence, from its inception to its future prospects. Dr. Simmons covers ethical considerations, potentials, and threats posed by AI.",
    ),
    Document(
        metadata={
            "title": "Symbiosis: Harmonizing Humans and AI",
            "author": "Prof. Jonathan K. Sterling",
        },
        page_content="Prof. Sterling explores the potential for harmonious coexistence between humans and artificial intelligence. The book discusses how AI can be integrated into society in a beneficial and non-disruptive manner.",
    ),
    Document(
        metadata={"title": "AI: The Ethical Quandary", "author": "Dr. Rebecca Simmons"},
        page_content="In her second book, Dr. Simmons delves deeper into the ethical considerations surrounding AI development and deployment. It is an eye-opening examination of the dilemmas faced by developers, policymakers, and society at large.",
    ),
    Document(
        metadata={
            "title": "Conscious Constructs: The Search for AI Sentience",
            "author": "Dr. Samuel Cortez",
        },
        page_content="Dr. Cortez takes readers on a journey exploring the controversial topic of AI consciousness. The book provides compelling arguments for and against the possibility of true AI sentience.",
    ),
    Document(
        metadata={
            "title": "Invisible Routines: Hidden AI in Everyday Life",
            "author": "Prof. Jonathan K. Sterling",
        },
        page_content="In his follow-up to 'Symbiosis', Prof. Sterling takes a look at the subtle, unnoticed presence and influence of AI in our everyday lives. It reveals how AI has become woven into our routines, often without our explicit realization.",
    ),
]
retriever.add_documents(docs)

I get to enforce the structure I want to use. Is there any way of doing this in the new client? I just don’t want to waste my time testing different chunk sizes.

@elie if you already have the texts in the form you want, you can use the from_texts method to ingest your data into Weaviate. Here is an end-to-end code snippet showing how to use it:
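For structured records, one way the from_texts ingestion can look is sketched below. The record fields are invented for illustration, and the from_texts call itself is shown commented out because it needs a live client and an embeddings model:

```python
# Hypothetical structured records (contents taken from the example dataset above)
records = [
    {"title": "Embracing The Future: AI Unveiled",
     "author": "Dr. Rebecca Simmons",
     "text": "A comprehensive analysis of the evolution of artificial intelligence."},
    {"title": "Conscious Constructs: The Search for AI Sentience",
     "author": "Dr. Samuel Cortez",
     "text": "A journey exploring the controversial topic of AI consciousness."},
]

# from_texts takes parallel lists: one string per object, one metadata dict per object
texts = [r["text"] for r in records]
metadatas = [{"title": r["title"], "author": r["author"]} for r in records]

# With a live client and embeddings, ingestion would then be:
# db = WeaviateVectorStore.from_texts(
#     texts, embeddings, metadatas=metadatas, client=client, index_name="EliePoc"
# )
print(len(texts), len(metadatas))  # 2 2
```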

@elie You can disable that specific warning with this (the message argument is a regular expression matched against the start of the warning text, so the Dep016 prefix is enough):

import warnings

warnings.filterwarnings(
    "ignore",
    message="Dep016",
    category=DeprecationWarning,
)
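A quick, self-contained way to sanity-check that a Dep016-prefix ignore filter catches the warning (no Weaviate needed; we emit a fake Dep016-prefixed DeprecationWarning and record what gets through):

```python
import warnings

def survives_dep016_filter(message: str) -> bool:
    """Emit `message` as a DeprecationWarning under an 'ignore Dep016*' filter
    and report whether it got through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warnings.filterwarnings("ignore", message="Dep016", category=DeprecationWarning)
        warnings.warn(message, DeprecationWarning)
        return len(caught) > 0

print(survives_dep016_filter("Dep016: You are using the Weaviate v3 client"))  # False
print(survives_dep016_filter("Some unrelated deprecation"))                    # True
```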

My data is a JSON array, not text, so from_texts does not work in this case. And if I stringify the JSON, Weaviate gets stuck; I waited 30 minutes for an answer and got no output, nothing.

Hi @elie !

Considering the dataset you provided, this is how you can accomplish it:

import weaviate
from langchain.docstore.document import Document
from langchain.embeddings import OpenAIEmbeddings
from langchain_weaviate.vectorstores import WeaviateVectorStore

embeddings = OpenAIEmbeddings()
client = weaviate.connect_to_local()
docs = [
    Document(
        metadata={
            "title": "Embracing The Future: AI Unveiled",
            "author": "Dr. Rebecca Simmons",
        },
        page_content="A comprehensive analysis of the evolution of artificial intelligence, from its inception to its future prospects. Dr. Simmons covers ethical considerations, potentials, and threats posed by AI.",
    ),
    Document(
        metadata={
            "title": "Symbiosis: Harmonizing Humans and AI",
            "author": "Prof. Jonathan K. Sterling",
        },
        page_content="Prof. Sterling explores the potential for harmonious coexistence between humans and artificial intelligence. The book discusses how AI can be integrated into society in a beneficial and non-disruptive manner.",
    ),
    Document(
        metadata={"title": "AI: The Ethical Quandary", "author": "Dr. Rebecca Simmons"},
        page_content="In her second book, Dr. Simmons delves deeper into the ethical considerations surrounding AI development and deployment. It is an eye-opening examination of the dilemmas faced by developers, policymakers, and society at large.",
    ),
    Document(
        metadata={
            "title": "Conscious Constructs: The Search for AI Sentience",
            "author": "Dr. Samuel Cortez",
        },
        page_content="Dr. Cortez takes readers on a journey exploring the controversial topic of AI consciousness. The book provides compelling arguments for and against the possibility of true AI sentience.",
    ),
    Document(
        metadata={
            "title": "Invisible Routines: Hidden AI in Everyday Life",
            "author": "Prof. Jonathan K. Sterling",
        },
        page_content="In his follow-up to 'Symbiosis', Prof. Sterling takes a look at the subtle, unnoticed presence and influence of AI in our everyday lives. It reveals how AI has become woven into our routines, often without our explicit realization.",
    ),
]
db = WeaviateVectorStore.from_documents(docs, embeddings, client=client, index_name="EliePoc")
query = db.similarity_search("polemic topic")

this is what I got inside query:

[Document(page_content='Dr. Cortez takes readers on a journey exploring the controversial topic of AI consciousness. The book provides compelling arguments for and against the possibility of true AI sentience.', metadata={'title': 'Conscious Constructs: The Search for AI Sentience', 'author': 'Dr. Samuel Cortez'}),
 Document(page_content='In her second book, Dr. Simmons delves deeper into the ethical considerations surrounding AI development and deployment. It is an eye-opening examination of the dilemmas faced by developers, policymakers, and society at large.', metadata={'title': 'AI: The Ethical Quandary', 'author': 'Dr. Rebecca Simmons'}),
 Document(page_content='A comprehensive analysis of the evolution of artificial intelligence, from its inception to its future prospects. Dr. Simmons covers ethical considerations, potentials, and threats posed by AI.', metadata={'title': 'Embracing The Future: AI Unveiled', 'author': 'Dr. Rebecca Simmons'}),
 Document(page_content='Prof. Sterling explores the potential for harmonious coexistence between humans and artificial intelligence. The book discusses how AI can be integrated into society in a beneficial and non-disruptive manner.', metadata={'title': 'Symbiosis: Harmonizing Humans and AI', 'author': 'Prof. Jonathan K. Sterling'})]

You can also filter by your metadata, like so:

from weaviate.classes.query import Filter

author_filter = Filter.by_property("author").equal("Dr. Samuel Cortez")
query = db.similarity_search("polemic topic", filters=author_filter)
print(query)

this will yield, as expected, only the one object from your dataset:

[Document(page_content='Dr. Cortez takes readers on a journey exploring the controversial topic of AI consciousness. The book provides compelling arguments for and against the possibility of true AI sentience.', metadata={'title': 'Conscious Constructs: The Search for AI Sentience', 'author': 'Dr. Samuel Cortez'})]

Let me know if this helps.

Thanks!

Ok, that works, thanks. But now I have 3 issues with the v4 client specifically; the v3 client was fine:

  1. is there a way to get db without having to add documents to the database? I’m looking for something like db = client.get_db(). You add data to the database once but need the db all the time; it doesn’t make sense to have to add data every time you need to use the db. I need some kind of getter.

  2. how do I delete data and see the schema? These no longer work:

schema = client.schema.get()
client.schema.delete_all()

the client doesn’t have a schema attribute anymore

  3. if I’m doing multiple queries, say I have a loop where I call db.similarity_search multiple times, I get this error:

sys:1: ResourceWarning: unclosed <socket.socket fd=716, 
family=AddressFamily.AF_INET6, type=SocketKind.SOCK_STREAM, proto=0, 
laddr=('::1', 59001, 0, 0), raddr=('::1', 8080, 0, 0)>

Nice!

  1. This is how you can instantiate the db without passing documents:
import weaviate
from langchain.embeddings import OpenAIEmbeddings
from langchain_weaviate.vectorstores import WeaviateVectorStore

embeddings = OpenAIEmbeddings()
client = weaviate.connect_to_local()

db = WeaviateVectorStore.from_documents([], embeddings, client=client, index_name="EliePoc")

  2. This is described in our docs here:
    Manage collections | Weaviate - Vector Database

for instance:

client.collections.delete("EliePoc")

  3. This seems to be a connectivity issue between the client and the server.

How big is this for loop? What does your deployment look like?

I ran a simple test (client running locally, and Weaviate also running locally in Docker):

results = []
for i in range(100):
    query = db.similarity_search("polemic topic")
    results.append(query)

and got the expected result: a list with 100 query results, and no error messages.
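One common cause of that unclosed-socket ResourceWarning is a client that is never closed. The v4 client has a close() method and can be used as a context manager; here is a sketch of both patterns (FakeClient is a stand-in so the snippet runs without a server, and the real calls are shown in the comments):

```python
class FakeClient:
    """Stand-in for a weaviate v4 client, just to illustrate the close pattern."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self.close()

# Pattern 1: close explicitly once you are done querying
client = FakeClient()  # real code: client = weaviate.connect_to_local()
try:
    pass  # ... your loop of db.similarity_search calls ...
finally:
    client.close()
print(client.closed)  # True

# Pattern 2: a context manager closes the client automatically
with FakeClient() as client:  # real code: with weaviate.connect_to_local() as client:
    pass  # ... queries ...
print(client.closed)  # True
```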