Dynamic any filter with Python v4 client

I'm struggling to dynamically build a filter with a variable number of conditions. Warning: long post.

I have a FastAPI python application with a route receiving a request body that can be:

{
    "query_text": "Orban Meloni elezioni avversari",
    "lang_model_name": "intfloat/multilingual-e5-large",
    "result_limit": 30,
    "alpha": 0.2
}

In this simple case I can just perform a Weaviate hybrid query using query_text (which I vectorize with the model “intfloat/multilingual-e5-large”) and the alpha value.

But the user could elect to also fill another 4 fields in the frontend and in this case I wish to add filtering to the hybrid query. So for example a full request body could be:

{
    "query_text": "Cina",
    "lang_model_name": "intfloat/multilingual-e5-large",
    "result_limit": 10,
    "alpha": 0,
    "author": "Gariazzo",
    "fromIsoDate": "2024-05-11",
    "toIsoDate": "2024-05-29",
    "category": "Alias"
}

and in this case I can build a corresponding composite filter object as follows:

the_filter = (
    Filter.by_property("isoEditionDate").greater_or_equal(
        request.fromIsoDate.isoformat()
    )
    & Filter.by_property("isoEditionDate").less_or_equal(
        request.toIsoDate.isoformat()
    )
    & Filter.by_property("author").like(request.author)
    & Filter.by_property("category").like(request.category)
)

and the hybrid search would be formulated as follows:

response = wv_artcoll.query.hybrid(
    query=query_string,
    query_properties=[f"{WV_ENTITIES_PROPERTY}^2", WV_VECTOR_PROPERTY],
    vector=query_vector,
    target_vector=graphql_model_name,
    limit=request.result_limit,
    alpha=request.alpha,
    return_metadata=MetadataQuery(score=True, explain_score=True),
    filters=the_filter,
)

So far so good, but here’s the catch: any of the 4 filtering values can be missing, so a valid request could look like this:

{
    "query_text": "Cina",
    "lang_model_name": "intfloat/multilingual-e5-large",
    "result_limit": 10,
    "alpha": 0,
    "fromIsoDate": "2024-05-11",
    "toIsoDate": "2024-05-29",
    "category": "Alias"
}

This is very similar to the previous one, but “author” is missing, so request.author is None and the_filter is no longer valid. Understandably, you get an error such as the following:
ERROR - Failed to perform hybrid query: Query call with protocol GRPC search failed with message unknown value type <nil>.
So I tried to build the filter dynamically as follows:

my_filter = None
if request.fromIsoDate is not None:
    from_date_filter = Filter.by_property("isoEditionDate").greater_or_equal(
        request.fromIsoDate.isoformat()
    )
    my_filter = (
        from_date_filter if my_filter is None else my_filter & from_date_filter
    )
if request.toIsoDate is not None:
    to_date_filter = Filter.by_property("isoEditionDate").less_or_equal(
        request.toIsoDate
    )
    my_filter = (
        to_date_filter if my_filter is None else my_filter & to_date_filter
    )
if request.author is not None:
    author_filter = Filter.by_property("author").like(request.author)
    my_filter = (
        author_filter if my_filter is None else my_filter & author_filter
    )
if request.category is not None:
    category_filter = Filter.by_property("category").equal(request.category)
    my_filter = (
        category_filter if my_filter is None else my_filter & category_filter
    )

where as you can see for every request property that is present (not None) I am building a corresponding filter object and adding it to the filter.

The hybrid filtered request would be identical to the previous one but using ‘my_filter’ instead of ‘the_filter’ with the same values in the request object.

Problem is that this search never returns anything so I’m probably not dynamically building the filter correctly.

Even though exploring a filter object is a bit cumbersome and probably the “equality” method has not been implemented, the my_filter and the_filter do not appear to be the same.

Any ideas on how to solve this use case which looks pretty common (filtering on a variable number of properties/operators/values) with the python v4 library?

Is there maybe a problem in the implementation of the operator overloading?

Python client is 4.5.6

Thanks in advance.

This is a good point

Can you create an issue in the python client repo? I think we can improve this usecase

Hi @rjalex,

I’ve moved your question to a new post, as this is a new question :wink:

For your use case, you can use the Filter.all_of filter, which takes a list of filters and combines them with AND. (FYI, there is also Filter.any_of for OR operations.)

Here is how you should be able to use it:

from weaviate.classes.query import Filter

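# Collect one condition per optional field that was actually provided
dynamic_filter = []

if request.fromIsoDate is not None:
    dynamic_filter.append(
        Filter.by_property("isoEditionDate").greater_or_equal(request.fromIsoDate)
    )
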
if request.toIsoDate is not None:
    dynamic_filter.append(
        Filter.by_property("isoEditionDate").less_or_equal(request.toIsoDate)
    )

if request.author is not None:
    dynamic_filter.append(
        Filter.by_property("author").like(request.author)
    )

if request.category is not None:
    dynamic_filter.append(
        Filter.by_property("category").equal(request.category)
    )

my_collection.query.fetch_objects(
    # ... the query stuff
    filters=Filter.all_of(dynamic_filter)
)

Thank you so much @sebawita (and @Dirk ) :slight_smile:

Just a couple of clarifications please:

a) What would the behaviour be if all optional values are None, so that dynamic_filter is an empty list? Should I pass the filter only if len(dynamic_filter) > 0, and None otherwise?

b) Dirk while helping immensely has confused me a little bit on one issue. If isoEditionDate is a TEXT (str in python) and being in the ISO8601 format is alphabetically sortable, would the less_or_equal and greater_or_equal work or not? In other words do the >,<,>=,<= work as expected comparing strings?

Take care and have a great day.

PS In your example does it matter if the request.[from|to]IsoDate are TEXT or DATE ?

I am not 100% sure what you mean.

This works fine – all objects get returned.

res = my_collection.query.fetch_objects(
    filters=None
)

This doesn’t work - any_of needs at least one value.

res = col.query.fetch_objects(
    filters=Filter.any_of([])
)

So, yes, you should be able to do:

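the_final_filter = None
if len(dynamic_filter) > 0:
    # Combine the collected conditions; passing the bare list would not be a valid filter
    the_final_filter = Filter.all_of(dynamic_filter)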

res = col.query.fetch_objects(
    filters=the_final_filter
)
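
To tie the snippets above together, here is a minimal sketch of a helper that skips missing values and falls back to None when nothing is left. The name build_filter and the (property, operator, value) tuple layout are made up for this illustration and are not part of the Weaviate client; the only client calls it relies on are Filter.by_property, the operator methods already used in this thread, and Filter.all_of. The Filter class is passed in as an argument purely so the helper can be tried out without a running Weaviate instance.

```python
from typing import Any, Iterable, Optional, Tuple

# Hypothetical helper, not part of the Weaviate client.
# Each spec is (property name, operator method name, value or None).
FilterSpec = Tuple[str, str, Optional[Any]]


def build_filter(filter_cls: Any, specs: Iterable[FilterSpec]) -> Optional[Any]:
    """AND together one condition per spec whose value is not None."""
    conditions = [
        # e.g. Filter.by_property("author").like("Gariazzo")
        getattr(filter_cls.by_property(prop), operator)(value)
        for prop, operator, value in specs
        if value is not None
    ]
    # An empty list cannot be combined, so signal "no filter" with None.
    return filter_cls.all_of(conditions) if conditions else None


# Assumed usage with the v4 client, field names taken from this thread:
#
#   from weaviate.classes.query import Filter
#
#   the_final_filter = build_filter(Filter, [
#       ("isoEditionDate", "greater_or_equal", request.fromIsoDate),
#       ("isoEditionDate", "less_or_equal", request.toIsoDate),
#       ("author", "like", request.author),
#       ("category", "equal", request.category),
#   ])
#   res = col.query.fetch_objects(filters=the_final_filter)
```

Keeping the specs in one list means a new optional field only needs one more tuple rather than another if block.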

In theory, it should work, but when I ran some tests I got some weird behaviour with less_than and greater_than filters, even with the tokenization set to FIELD (which should use the whole value for comparison).

^^^ actually, this statement is incorrect. I’ve updated my answer in the next post.

Recommendation

My recommendation is to use a Date datatype in Weaviate.

Property(name="airDate", data_type=DataType.DATE),

Then you need to insert the date as either
A) Date object

from datetime import datetime, timezone
date_to_insert = datetime(2000, 1, 1).replace(tzinfo=timezone.utc)

B) ISO compatible string

date_to_insert = "2020-01-21T00:00:00+00:00"

Then query like this:

from datetime import datetime, timezone
from weaviate.classes.query import Filter

questions = client.collections.get("Questions")

response = questions.query.fetch_objects(
    limit=5,
    filters=Filter.by_property("airDate").greater_or_equal(datetime(2000, 1, 1).replace(tzinfo=timezone.utc))
    # filters=Filter.by_property("airDate").greater_or_equal("2000-01-01T00:00:00+00:00")
)
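
Since the request bodies earlier in this thread carry plain dates such as "2024-05-11", they would need a time and a timezone before being compared against a DATE property. The following is only a sketch using the standard library: the helper names are invented for this example, and treating every date as UTC is an assumption that may not match how the data was stored.

```python
from datetime import date, datetime, time, timedelta, timezone


def day_start_utc(day: date) -> datetime:
    """Midnight at the start of `day`, assumed to be in UTC."""
    return datetime.combine(day, time.min, tzinfo=timezone.utc)


def next_day_start_utc(day: date) -> datetime:
    """Midnight at the start of the day after `day`, assumed to be in UTC."""
    return day_start_utc(day + timedelta(days=1))


# Date strings copied from the example request body in this thread.
start = day_start_utc(date.fromisoformat("2024-05-11"))
end_exclusive = next_day_start_utc(date.fromisoformat("2024-05-29"))

print(start.isoformat())          # 2024-05-11T00:00:00+00:00
print(end_exclusive.isoformat())  # 2024-05-30T00:00:00+00:00
```

One possible design choice is to pair greater_or_equal(start) with less_than(end_exclusive), so that objects dated at any time on the last day are still included.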

Actually, I was wrong. You can use greater/less/equal on TEXT properties.
You just need to set the Tokenization on the date property to FIELD.

Here is a full example:

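# Assumes `client` is an already connected Weaviate client
from weaviate.classes.config import Property, DataType, Tokenization
from weaviate.classes.query import Filter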

client.collections.delete("FilterData")
client.collections.create(
    "FilterData",
    properties=[
        Property(name="text_data", data_type=DataType.TEXT),
        Property(name="isoEditionDate", data_type=DataType.TEXT, tokenization=Tokenization.FIELD)
    ]
)

col = client.collections.get("FilterData")
                             
col.data.insert_many([
    { "text_data": "First Object", "isoEditionDate": "2023-01-11" },
    { "text_data": "Second Object", "isoEditionDate": "2024-02-11" },
    { "text_data": "Third Object", "isoEditionDate": "2024-02-13" },
    { "text_data": "Fourth Object", "isoEditionDate": "2025-05-25" },
])

res = col.query.fetch_objects(
    filters=Filter.by_property("isoEditionDate").greater_than("2023-05-21")
)

for item in res.objects:
    print(item.properties)
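
As a sanity check on why comparing ISO dates as text can work at all, here is a small pure-Python sketch (no Weaviate involved) using the same four dates and threshold as the example above. It assumes the filter compares whole field values as strings, and it only holds for zero-padded dates that all share the same YYYY-MM-DD layout.

```python
from datetime import date

iso_dates = ["2023-01-11", "2024-02-11", "2024-02-13", "2025-05-25"]
threshold = "2023-05-21"

# Plain string comparison (lexicographic order).
by_string = [s for s in iso_dates if s > threshold]

# The same comparison done on real date objects.
by_date = [
    s for s in iso_dates
    if date.fromisoformat(s) > date.fromisoformat(threshold)
]

print(by_string == by_date)  # True
print(by_string)             # ['2024-02-11', '2024-02-13', '2025-05-25']

# Without zero padding the two orderings disagree: May sorts after October.
print("2024-5-3" > "2024-10-01")  # True
```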

Yay !!! You guys rock! :slight_smile:

I went down the DATE road, as explained in this other thread: “Valid ISO8601 and RFC3339 date string discarded as invalid” (#9 by rjalex).

Still, it is great to know that the TEXT/FIELD combo was an option; it would probably have been simpler in my specific case.

Thank you very much.