How to hide the client v3 deprecation warning?

How can I hide this warning?

DeprecationWarning: Dep016: You are using the Weaviate v3 client, which is deprecated.
            Consider upgrading to the new and improved v4 client instead!
            See here for usage: https://weaviate.io/developers/weaviate/client-libraries/python

  warnings.warn(

I can’t use the v4 client because LangChain’s WeaviateHybridSearchRetriever only works with v3 at the moment; I already opened a ticket about it.

For now, I just added these lines at the top of my code:

import warnings

# suppress warnings raised while weaviate is being imported
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    import weaviate

Hi! We have updated the LangChain integration to use the Python v4 client:

Do you think that can help you migrate to the v4 client?

If not, let me know what else you need on that integration.

Thanks!


@DudaNogueira Hello, I’m interested in doing hybrid search like this.

I don’t want to split the text and generate embeddings myself. How can I use WeaviateHybridSearchRetriever with the v4 client?

The example you shared doesn’t use WeaviateHybridSearchRetriever.

hi @elie !

With the new LangChain integration, similarity_search will, under the hood, perform a hybrid search:

So with that, you can simply call the similarity_search method:

docs = db.similarity_search(query, alpha=0)
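With alpha=0 that call runs a pure keyword (BM25) search, while alpha=1 would be pure vector search. As a rough sketch of the blending idea (conceptually similar to relative-score fusion; this is not Weaviate's exact fusion implementation):

```python
def blend_scores(vector_score: float, keyword_score: float, alpha: float) -> float:
    """Conceptual hybrid blend: alpha weights the vector score,
    (1 - alpha) weights the keyword (BM25) score."""
    return alpha * vector_score + (1 - alpha) * keyword_score

# alpha=0: only the keyword score matters (pure BM25)
print(blend_scores(0.9, 0.4, alpha=0.0))  # 0.4
# alpha=1: only the vector score matters (pure vector search)
print(blend_scores(0.9, 0.4, alpha=1.0))  # 0.9
# alpha=0.5: an even blend of the two
print(round(blend_scores(0.9, 0.4, alpha=0.5), 2))  # 0.65
```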

you can instantiate your db, like so:

db = WeaviateVectorStore.from_documents([], embeddings, client=weaviate_client)

If you want an end-to-end example, I have recently updated our LangChain integration recipe here:

Let me know if this helps.

Thanks!

Fine, but I don’t want to do the chunking and embedding myself; I want to avoid code like this:

from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter

loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

because this way you’d have to waste so much time finding the correct chunk size, and I can’t guarantee I’d get the correct answers. I want an answer with a reference (source); if I do the chunking, some chunks would be messed up and I might get the same bad quality as Chroma or FAISS.

the old way of loading data is much better (Weaviate Hybrid Search | 🦜️🔗 LangChain):

docs = [
    Document(
        metadata={
            "title": "Embracing The Future: AI Unveiled",
            "author": "Dr. Rebecca Simmons",
        },
        page_content="A comprehensive analysis of the evolution of artificial intelligence, from its inception to its future prospects. Dr. Simmons covers ethical considerations, potentials, and threats posed by AI.",
    ),
    Document(
        metadata={
            "title": "Symbiosis: Harmonizing Humans and AI",
            "author": "Prof. Jonathan K. Sterling",
        },
        page_content="Prof. Sterling explores the potential for harmonious coexistence between humans and artificial intelligence. The book discusses how AI can be integrated into society in a beneficial and non-disruptive manner.",
    ),
    Document(
        metadata={"title": "AI: The Ethical Quandary", "author": "Dr. Rebecca Simmons"},
        page_content="In her second book, Dr. Simmons delves deeper into the ethical considerations surrounding AI development and deployment. It is an eye-opening examination of the dilemmas faced by developers, policymakers, and society at large.",
    ),
    Document(
        metadata={
            "title": "Conscious Constructs: The Search for AI Sentience",
            "author": "Dr. Samuel Cortez",
        },
        page_content="Dr. Cortez takes readers on a journey exploring the controversial topic of AI consciousness. The book provides compelling arguments for and against the possibility of true AI sentience.",
    ),
    Document(
        metadata={
            "title": "Invisible Routines: Hidden AI in Everyday Life",
            "author": "Prof. Jonathan K. Sterling",
        },
        page_content="In his follow-up to 'Symbiosis', Prof. Sterling takes a look at the subtle, unnoticed presence and influence of AI in our everyday lives. It reveals how AI has become woven into our routines, often without our explicit realization.",
    ),
]
retriever.add_documents(docs)

I get to enforce the structure I want to use. Is there any way of doing this in the new client? I just don’t want to waste my time testing different chunk sizes.

@elie if you already have the texts in the form you want, you can use the from_texts method to ingest your data into Weaviate. Here is an end-to-end code snippet showing how to use it:
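For structured records, one way the from_texts ingestion can look is sketched below. The record fields are invented for illustration, and the from_texts call itself is shown commented out because it needs a live client and an embeddings model:

```python
# Hypothetical structured records (contents taken from the example dataset above)
records = [
    {"title": "Embracing The Future: AI Unveiled",
     "author": "Dr. Rebecca Simmons",
     "text": "A comprehensive analysis of the evolution of artificial intelligence."},
    {"title": "Conscious Constructs: The Search for AI Sentience",
     "author": "Dr. Samuel Cortez",
     "text": "A journey exploring the controversial topic of AI consciousness."},
]

# from_texts takes parallel lists: one string per object, one metadata dict per object
texts = [r["text"] for r in records]
metadatas = [{"title": r["title"], "author": r["author"]} for r in records]

# With a live client and embeddings, ingestion would then be:
# db = WeaviateVectorStore.from_texts(
#     texts, embeddings, metadatas=metadatas, client=client, index_name="EliePoc"
# )
print(len(texts), len(metadatas))  # 2 2
```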

@elie You can disable that specific warning with this (the message argument is a regular expression matched against the start of the warning text, so the Dep016 prefix is enough):

import warnings

warnings.filterwarnings(
    "ignore",
    message="Dep016",
    category=DeprecationWarning,
)
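A quick, self-contained way to sanity-check that a Dep016-prefix ignore filter catches the warning (no Weaviate needed; we emit a fake Dep016-prefixed DeprecationWarning and record what gets through):

```python
import warnings

def survives_dep016_filter(message: str) -> bool:
    """Emit `message` as a DeprecationWarning under an 'ignore Dep016*' filter
    and report whether it got through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warnings.filterwarnings("ignore", message="Dep016", category=DeprecationWarning)
        warnings.warn(message, DeprecationWarning)
        return len(caught) > 0

print(survives_dep016_filter("Dep016: You are using the Weaviate v3 client"))  # False
print(survives_dep016_filter("Some unrelated deprecation"))                    # True
```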

My data is a JSON array, not text, so from_texts does not work in this case. And if I stringify the JSON, Weaviate gets stuck; I waited 30 minutes for an answer and got no output, nothing.

Hi @elie !

Considering the dataset you provided, this is how you can accomplish it:

import weaviate
from langchain.docstore.document import Document
from langchain.embeddings import OpenAIEmbeddings
from langchain_weaviate.vectorstores import WeaviateVectorStore

embeddings = OpenAIEmbeddings()
client = weaviate.connect_to_local()
docs = [
    Document(
        metadata={
            "title": "Embracing The Future: AI Unveiled",
            "author": "Dr. Rebecca Simmons",
        },
        page_content="A comprehensive analysis of the evolution of artificial intelligence, from its inception to its future prospects. Dr. Simmons covers ethical considerations, potentials, and threats posed by AI.",
    ),
    Document(
        metadata={
            "title": "Symbiosis: Harmonizing Humans and AI",
            "author": "Prof. Jonathan K. Sterling",
        },
        page_content="Prof. Sterling explores the potential for harmonious coexistence between humans and artificial intelligence. The book discusses how AI can be integrated into society in a beneficial and non-disruptive manner.",
    ),
    Document(
        metadata={"title": "AI: The Ethical Quandary", "author": "Dr. Rebecca Simmons"},
        page_content="In her second book, Dr. Simmons delves deeper into the ethical considerations surrounding AI development and deployment. It is an eye-opening examination of the dilemmas faced by developers, policymakers, and society at large.",
    ),
    Document(
        metadata={
            "title": "Conscious Constructs: The Search for AI Sentience",
            "author": "Dr. Samuel Cortez",
        },
        page_content="Dr. Cortez takes readers on a journey exploring the controversial topic of AI consciousness. The book provides compelling arguments for and against the possibility of true AI sentience.",
    ),
    Document(
        metadata={
            "title": "Invisible Routines: Hidden AI in Everyday Life",
            "author": "Prof. Jonathan K. Sterling",
        },
        page_content="In his follow-up to 'Symbiosis', Prof. Sterling takes a look at the subtle, unnoticed presence and influence of AI in our everyday lives. It reveals how AI has become woven into our routines, often without our explicit realization.",
    ),
]
db = WeaviateVectorStore.from_documents(docs, embeddings, client=client, index_name="EliePoc")
query = db.similarity_search("polemic topic")

this is what I got inside query:

[Document(page_content='Dr. Cortez takes readers on a journey exploring the controversial topic of AI consciousness. The book provides compelling arguments for and against the possibility of true AI sentience.', metadata={'title': 'Conscious Constructs: The Search for AI Sentience', 'author': 'Dr. Samuel Cortez'}),
 Document(page_content='In her second book, Dr. Simmons delves deeper into the ethical considerations surrounding AI development and deployment. It is an eye-opening examination of the dilemmas faced by developers, policymakers, and society at large.', metadata={'title': 'AI: The Ethical Quandary', 'author': 'Dr. Rebecca Simmons'}),
 Document(page_content='A comprehensive analysis of the evolution of artificial intelligence, from its inception to its future prospects. Dr. Simmons covers ethical considerations, potentials, and threats posed by AI.', metadata={'title': 'Embracing The Future: AI Unveiled', 'author': 'Dr. Rebecca Simmons'}),
 Document(page_content='Prof. Sterling explores the potential for harmonious coexistence between humans and artificial intelligence. The book discusses how AI can be integrated into society in a beneficial and non-disruptive manner.', metadata={'title': 'Symbiosis: Harmonizing Humans and AI', 'author': 'Prof. Jonathan K. Sterling'})]

You can also filter by your metadata, like so:

from weaviate.classes.query import Filter

author_filter = Filter.by_property("author").equal("Dr. Samuel Cortez")
query = db.similarity_search("polemic topic", filters=author_filter)
print(query)

this will yield, as expected, only the one object from your dataset:

[Document(page_content='Dr. Cortez takes readers on a journey exploring the controversial topic of AI consciousness. The book provides compelling arguments for and against the possibility of true AI sentience.', metadata={'title': 'Conscious Constructs: The Search for AI Sentience', 'author': 'Dr. Samuel Cortez'})]

Let me know if this helps.

Thanks!

Ok, that works, thanks. But now I have 3 issues with the v4 client specifically; the v3 client was fine:

  1. is there a way to get db without having to add documents to the database? I’m looking for something like db = client.get_db(). You add data to the database once but need the db all the time; it doesn’t make sense to have to add data every time you need to use the db. I need some kind of getter.

  2. how do I delete data and see the schema? These no longer work:

schema = client.schema.get()
client.schema.delete_all()

the client doesn’t have a schema attribute anymore

  3. if I’m doing multiple queries, say I have a loop where I call db.similarity_search multiple times, I get this error:

sys:1: ResourceWarning: unclosed <socket.socket fd=716, 
family=AddressFamily.AF_INET6, type=SocketKind.SOCK_STREAM, proto=0, 
laddr=('::1', 59001, 0, 0), raddr=('::1', 8080, 0, 0)>

Nice!

  1. This is how you can instantiate the db without passing documents:
import weaviate
from langchain.embeddings import OpenAIEmbeddings
from langchain_weaviate.vectorstores import WeaviateVectorStore

embeddings = OpenAIEmbeddings()
client = weaviate.connect_to_local()

db = WeaviateVectorStore.from_documents([], embeddings, client=client, index_name="EliePoc")

  2. This is described in our docs here:
    Manage collections | Weaviate - Vector Database

for instance:

client.collections.delete("EliePoc")

  3. This seems to be a connectivity issue between the client and the server.

How big is this for loop? What does your deployment look like?

I ran a simple test (client running locally, and Weaviate also running locally in Docker):

results = []
for i in range(100):
    query = db.similarity_search("polemic topic")
    results.append(query)

and got the expected result: a list with 100 query results, and no error messages.
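One common cause of that unclosed-socket ResourceWarning is a client that is never closed. The v4 client has a close() method and can be used as a context manager; here is a sketch of both patterns (FakeClient is a stand-in so the snippet runs without a server, and the real calls are shown in the comments):

```python
class FakeClient:
    """Stand-in for a weaviate v4 client, just to illustrate the close pattern."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self.close()

# Pattern 1: close explicitly once you are done querying
client = FakeClient()  # real code: client = weaviate.connect_to_local()
try:
    pass  # ... your loop of db.similarity_search calls ...
finally:
    client.close()
print(client.closed)  # True

# Pattern 2: a context manager closes the client automatically
with FakeClient() as client:  # real code: with weaviate.connect_to_local() as client:
    pass  # ... queries ...
print(client.closed)  # True
```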