[Question] How to retrieve all documents in Weaviate?

Description

I want to retrieve all documents in Weaviate, but the query always returns only part of the results.
For example,

import weaviate

client = weaviate.Client(weaviate_url)
query = (
    client.query.get(class_name, ["content", "source", "idx"])
    .with_where({
        "path": ["source"],
        "operator": "Equal",
        "valueString": doc_name
    })
)
response = query.do()
len(response['data']['Get'][class_name])  # 100

But there are actually 450 matching documents. If I add with_limit(2000), I get the correct result:

import weaviate

client = weaviate.Client(weaviate_url)
query = (
    client.query.get(class_name, ["content", "source", "idx"])
    .with_where({
        "path": ["source"],  
        "operator": "Equal",  
        "valueString": doc_name
    })
)
response = query.with_limit(2000).do()
len(response['data']['Get'][class_name])  # 450

But if the real number of documents is larger than 2000, the result will be wrong again.
Could somebody tell me how to deal with this?
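A sketch of one way to avoid guessing a limit while keeping the where filter, assuming the v3 client's `with_offset`: request fixed-size pages and stop when a short page comes back. `fetch_filtered` and `page_size` are hypothetical names. As far as I know, `offset + limit` is still capped by the server's `QUERY_MAXIMUM_RESULTS` setting (10,000 by default), and without a sort the page order is not guaranteed to stay stable if data changes between requests.

```python
def fetch_filtered(client, class_name, properties, where, page_size=100):
    """Collect every object matching `where` by paging with limit/offset (v3 client)."""
    results = []
    offset = 0
    while True:
        response = (
            client.query.get(class_name, properties)
            .with_where(where)
            .with_limit(page_size)
            .with_offset(offset)
            .do()
        )
        # Raise on GraphQL errors instead of failing on a missing "Get" key.
        if "errors" in response:
            raise RuntimeError(response["errors"])
        page = response["data"]["Get"][class_name]
        results.extend(page)
        # A page shorter than page_size means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return results
```

With the filter from the question this would be called as `fetch_filtered(client, class_name, ["content", "source", "idx"], {"path": ["source"], "operator": "Equal", "valueString": doc_name})`.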

Thanks!

Server Setup Information

  • Weaviate Server Version: 3.24.1

Any additional Information

hi @cyc00518 !! Welcome to our community :hugs: !!

You can use our cursor API, as described here:
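A minimal sketch of cursor pagination with the v3 client, assuming `GetBuilder.with_after`: each request asks for the objects after the last UUID of the previous batch, so no total limit has to be guessed. `iterate_all_objects` and `batch_size` are hypothetical names. As far as I know, the cursor (`after`) cannot be combined with a `where` filter, so for a filtered query you would either filter client-side while iterating or page with `limit`/`offset`.

```python
def iterate_all_objects(client, class_name, properties, batch_size=100):
    """Yield every object of `class_name` using cursor pagination (v3 client)."""
    cursor = None
    while True:
        query = (
            client.query.get(class_name, properties)
            .with_additional(["id"])  # the UUID is needed as the next cursor
            .with_limit(batch_size)
        )
        if cursor is not None:
            query = query.with_after(cursor)
        response = query.do()
        if "errors" in response:
            raise RuntimeError(response["errors"])
        batch = response["data"]["Get"][class_name]
        if not batch:
            return  # an empty batch means the whole class has been read
        yield from batch
        cursor = batch[-1]["_additional"]["id"]
```

For example, `sum(1 for _ in iterate_all_objects(client, class_name, ["source"]))` counts every object in the class.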

Also, I noticed you are using our Python client version 3.24.1.

We strongly suggest using the new Python v4 client: it brings significant performance improvements for read and write operations, as it uses gRPC instead of REST/HTTP.
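For comparison, a sketch of reading every object with the v4 client, assuming `Collection.iterator()` and its `return_properties` argument; `collect_property` is a hypothetical helper. As far as I know the iterator pages with the cursor API internally and does not take a filter.

```python
def collect_property(collection, prop):
    """Return the value of `prop` for every object in a v4 collection."""
    # iterator() fetches the collection batch by batch; only `prop` is requested.
    return [
        obj.properties.get(prop)
        for obj in collection.iterator(return_properties=[prop])
    ]

# Assumed usage (weaviate-client v4, local instance):
# import weaviate
# client = weaviate.connect_to_local()
# try:
#     sources = collect_property(client.collections.get("TEST"), "source")
# finally:
#     client.close()
```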

Let me know if this helps or if there is any other help I can provide!

Thanks!

Hi, @DudaNogueira!
Thanks for your quick reply!

Sorry, I think I must have missed that document in your guide.

Honestly, I did notice that you released the v4 client. But I built a knowledge platform at my company on the v3 client early last year, so migrating to v4 might take quite a lot of time… :melting_face:

I also have one more question; I hope you don't mind me asking it here.

For the knowledge platform mentioned above, I need to build a delete function so that users can delete what they uploaded earlier. I use with_where to filter that data in Weaviate, but I found that the Equal operator doesn't return exact matches.

For example, when I filter on source == '請假.txt' (請假 means "leave request"), the query returns both source == '請假.txt' and source == '請假_53.txt', so I have to add my own filtering step.

import weaviate

class_name = 'TEST'
client = weaviate.Client(weaviate_url)
doc_name = '請假.txt'
response = (
    client.query.get(class_name, properties=["_additional {id}", "source"])
    .with_where({
        "path": ["source"],
        "operator": "Equal",
        "valueString": doc_name
    })
    .with_limit(20000)
    .do()
)

source_set = {item['source'] for item in response['data']['Get'][class_name]}
print(source_set)
# {'請假.txt', '請假_53.txt'}  -> I expected only '請假.txt'
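The "add my own filtering step" workaround can be sketched as follows: keep only objects whose `source` is exactly the requested name, then delete those by UUID, so that '請假_53.txt' is never removed by accident. `exact_match_ids` and `delete_exact_source` are hypothetical helpers; the deletion assumes the v3 client's `client.data_object.delete(uuid=..., class_name=...)`, and `objects` is the list `response['data']['Get'][class_name]` returned by the query above (which already requests `_additional {id}`).

```python
def exact_match_ids(objects, doc_name):
    """Return the UUIDs of objects whose `source` equals `doc_name` exactly."""
    return [
        obj["_additional"]["id"]
        for obj in objects
        if obj.get("source") == doc_name
    ]


def delete_exact_source(client, class_name, objects, doc_name):
    """Delete only the exact matches, one object at a time (v3 client)."""
    ids = exact_match_ids(objects, doc_name)
    for uuid in ids:
        client.data_object.delete(uuid=uuid, class_name=class_name)
    return len(ids)  # number of objects deleted
```

Deleting one object per request is simple but slow for large uploads; it trades speed for never relying on the broader Equal match.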

Is this the expected behavior for the Equal operator, or is it a bug with Chinese file names?
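A likely explanation (an assumption, not confirmed in this thread) is tokenization rather than the Chinese characters: if `source` uses `word` tokenization, the default for `text` properties, the value is split on non-alphanumeric characters, so '請假.txt' becomes the tokens 請假 and txt, while '請假_53.txt' becomes 請假, 53 and txt. An Equal filter on a tokenized property matches objects containing all of the filter's tokens, so both files match; ASCII names such as 'leave.txt' and 'leave_53.txt' would behave the same way. With `field` tokenization the whole trimmed value is one token. As far as I know tokenization cannot be changed on an existing property, so this sketch defines a new class; `class_with_exact_source` and "TEST_V2" are hypothetical, and the other properties (content, idx) would need to be added with their own types.

```python
def class_with_exact_source(class_name):
    """Class definition whose `source` property is matched as a whole value."""
    return {
        "class": class_name,
        "properties": [
            {
                "name": "source",
                "dataType": ["text"],
                # "field" indexes the whole (trimmed) value as a single token,
                # so Equal compares complete file names instead of word tokens.
                "tokenization": "field",
            },
        ],
    }

# Assumed v3 usage, followed by re-importing the data into the new class:
# client.schema.create_class(class_with_exact_source("TEST_V2"))
```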

Thanks!!