Performance Issue when Extracting Documents with Field Filter in Weaviate

Hello Weaviate Forum Team,

I hope this message finds you well. I am facing a performance issue when extracting documents from a class (“Class”) in my local Weaviate instance. Specifically, I want to retrieve all documents where the field “X” equals a specific value.

Following the official documentation, I have implemented a “with_where” filter in my code as shown below:

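import weaviate

# Connect to the local Weaviate instance (v3 Python client; default local URL assumed)
client = weaviate.Client("http://localhost:8080")

# Match objects whose text property "X" equals "value"
where_filter = {
    "path": ["X"],
    "operator": "Equal",
    "valueText": "value",
}

# Fetch the "content" property of every matching object
result = (
    client.query
    .get("Class", ["content"])
    .with_where(where_filter)
    .do()
)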

When no documents match, the response time is negligible (only a few nanoseconds). However, when 70 documents match in a dataset of approximately 6.6 billion documents, the response time rises to around 14 seconds.

Additionally, when the number of matches reaches approximately 100,000, the response time grows to roughly an hour. These response times are a real problem for my application.

I am currently using Weaviate version 1.17.5 and the Weaviate Python client library version 3.15.4.

I would appreciate your help in understanding whether this behavior is expected, or whether there are optimizations I can apply to reduce the query execution time.

Thank you for your attention and support. I look forward to your guidance.

Best regards,
Harshit

Hi @Iammsd07

I suspect you’ll benefit greatly from upgrading Weaviate to the latest version (and the Python client too).

The reason is that filtering performance for very strict filters was greatly improved in 1.18 with the introduction of Roaring bitmaps.

Please try that and see if it helps, especially for the strict filters.
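For some intuition about what a bitmap-based filter index does, here is a toy sketch. It is not Weaviate's implementation: a plain Python `int` stands in for an uncompressed bitset (Roaring bitmaps additionally split IDs into compressed containers), and the helper names and documents are made up for illustration. Each property value maps to a bitmap of matching document IDs, so evaluating `X == value` (or an AND of several such filters) becomes a lookup plus bitwise operations rather than a scan over objects.

```python
from collections import defaultdict
from functools import reduce
from operator import and_

# Toy illustration only: Python ints as uncompressed bitsets, not Roaring bitmaps.


def build_bitmap_index(docs, prop):
    """Map each value of `prop` to a bitset whose bit i is set if doc i has that value."""
    index = defaultdict(int)
    for doc_id, doc in enumerate(docs):
        index[doc[prop]] |= 1 << doc_id
    return index


def and_filter(*bitmaps):
    """Combine several filter bitmaps with a bitwise AND (all filters must match)."""
    return reduce(and_, bitmaps)


def matching_ids(bitmap):
    """Decode a bitset back into a sorted list of document IDs."""
    ids = []
    doc_id = 0
    while bitmap:
        if bitmap & 1:
            ids.append(doc_id)
        bitmap >>= 1
        doc_id += 1
    return ids
```

For example, with `docs = [{"X": "a"}, {"X": "value"}, {"X": "b"}, {"X": "value"}]`, `matching_ids(build_bitmap_index(docs, "X")["value"])` yields the IDs of the second and fourth documents.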

For the filters with high number of matches - do you need to fetch all 100k documents with a Get request at once?

If not, you could, for instance, use Aggregate to see how many objects match, or alternatively use pagination to show subsets of the results.
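A rough sketch of both suggestions with the v3 Python client from the original post: `count_matches` runs an Aggregate query with `with_meta_count()` so only the number of matches comes back, and `iter_pages` walks through the results in fixed-size pages using `with_limit`/`with_offset`. The helper names, the page size of 100 and the response-parsing paths are assumptions for illustration, not an official recipe. Also note that, as far as I know, offset-based paging is capped by the server's `QUERY_MAXIMUM_RESULTS` setting (10,000 by default), so reaching all ~100k matches this way would require raising that limit.

```python
def meta_count(response, class_name):
    """Extract the count from an Aggregate { <Class> { meta { count } } } response."""
    return response["data"]["Aggregate"][class_name][0]["meta"]["count"]


def count_matches(client, class_name, where_filter):
    """Ask Weaviate how many objects match the filter, without fetching them."""
    response = (
        client.query
        .aggregate(class_name)
        .with_where(where_filter)
        .with_meta_count()
        .do()
    )
    return meta_count(response, class_name)


def iter_pages(fetch_page, page_size=100):
    """Yield pages from fetch_page(offset, limit) until a short or empty page arrives."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if page:
            yield page
        if len(page) < page_size:
            return  # last page reached
        offset += page_size


def weaviate_page_fetcher(client, class_name, properties, where_filter):
    """Build a fetch_page(offset, limit) callable around a filtered Get query."""
    def fetch_page(offset, limit):
        response = (
            client.query
            .get(class_name, properties)
            .with_where(where_filter)
            .with_limit(limit)
            .with_offset(offset)
            .do()
        )
        return response["data"]["Get"][class_name]
    return fetch_page
```

With the client and `where_filter` from the original post, `count_matches(client, "Class", where_filter)` should return the match count, and `for page in iter_pages(weaviate_page_fetcher(client, "Class", ["content"], where_filter)):` processes the results one page at a time. `iter_pages` takes a plain callable so the paging loop can be tested without a running instance.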

Cheers
JP

Thanks @jphwang

Upgrading Weaviate worked: the response time for 70 matches dropped from 14 seconds to 1.3 seconds.

Cool! Glad to hear it :slight_smile: