How to filter by date field

Hi, how do I filter searches by date using the python client?

I can see in the docs how to filter by internal datetime objects: link

Which mentions you must set indexTimestamps to true to track these meta timestamps: Link

But I don’t see anything on how to filter by a user defined timestamp.

I have a field published_at defined as such:

...
wc.Property(
        name="published_at",
        data_type=wc.DataType.DATE,
        index_searchable=False,
        skip_vectorization=True,
    ),
...

Which is returned in the graphql client as a datetime: "published_at": "2024-04-22T08:16:47Z",

In my index I may have some published_at times that are in the future that I don’t want to retrieve. I would like to filter less than or equal to today’s date (or less than tomorrow’s date).

How is this done using the python client?

Thanks

Solved it using regular property filter and a datetime. For anyone that may find it useful:

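from datetime import datetime, timedelta

import weaviate.classes.query as wq  # assumed alias; the original post does not show this import

def get_tomorrow_midnight() -> datetime:
    # Midnight at the start of tomorrow, as a naive local-time datetime.
    now = datetime.now()
    tomorrow = now + timedelta(days=1)
    midnight = datetime(tomorrow.year, tomorrow.month, tomorrow.day)
    return midnight

response = collection.query.hybrid(
    ...,
    filters=wq.Filter.by_property("published_at").less_than(
        get_tomorrow_midnight()
    ),
)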

May be worth adding something to the docs, as currently the only datetime filtering in the docs is displaying how to filter on weaviate’s meta timestamps, not user-defined ones (unless I missed it).
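
A timezone-aware variant of the helper may be safer, since `datetime.now()` returns a naive local time while stored values such as `2024-04-22T08:16:47Z` are in UTC. This is a minimal sketch using only the standard library; the name `tomorrow_midnight_utc` is hypothetical, and using UTC as the reference zone is an assumption about what "today" should mean for your data.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def tomorrow_midnight_utc(now: Optional[datetime] = None) -> datetime:
    """Return 00:00 UTC at the start of the day after `now`."""
    # Default to the current moment in UTC; `now` can be injected for testing.
    current = now if now is not None else datetime.now(timezone.utc)
    current = current.astimezone(timezone.utc)
    tomorrow = current.date() + timedelta(days=1)
    return datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=timezone.utc)


cutoff = tomorrow_midnight_utc(datetime(2024, 4, 22, 8, 16, 47, tzinfo=timezone.utc))
print(cutoff.isoformat())  # 2024-04-23T00:00:00+00:00
```

The returned value could be passed to `Filter.by_property("published_at").less_than(...)` in place of the naive datetime.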

Cheers


hi @justin.godden !

Thanks for sharing and pointing it out.

At the end of the day, the timestamps enabled by indexTimestamps are just date fields, the same as published_at :slight_smile:

So all date filters will also apply to both meta properties and the ones you define.
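
To make that concrete, here is a hedged sketch that builds one filter on a user-defined date property and one on the creation-time meta property. It assumes the v4 Python client, a DATE property named `published_at` as in the question, and that `indexTimestamps` is enabled for the meta filter to work; `build_date_filters` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def build_date_filters(cutoff: datetime):
    """Build a user-defined date filter and a meta-timestamp filter."""
    # Imported inside the function so this module loads even where the
    # weaviate client is not installed.
    from weaviate.classes.query import Filter

    # Filter on a property you defined yourself (data type DATE).
    by_user_date = Filter.by_property("published_at").less_or_equal(cutoff)
    # Filter on Weaviate's own creation timestamp (needs indexTimestamps).
    by_meta_date = Filter.by_creation_time().greater_or_equal(cutoff)
    return by_user_date, by_meta_date


# Example cutoff, timezone-aware so there is no ambiguity about the zone.
year2k = datetime(2000, 1, 1, tzinfo=timezone.utc)
```

Either returned filter can be passed as the `filters=` argument of a query such as `collection.query.hybrid(...)`.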

Thanks!

Hi Duda,

The only point I was making was the only example of datetime filtering in the docs is using a built-in method specifically for the meta properties - by_creation_time - from the first link: filters=wvc.query.Filter.by_creation_time().greater_or_equal(year2k)

Unless I missed it, I didn’t see anywhere explaining filtering user defined datetimes. Perhaps it’s implied since you can do value comparison with datetimes already in python (just like using less than for strings works also).

Just my experience that I didn’t find it clear. Up to you if you want to act on that.

Cheers


we will add some examples :slight_smile:

Oh! Got it.

It indeed makes sense to have an explicit example there as well as the reference for the meta properties.

Thanks!!

I am having issues trying to filter results by date here.
I want to filter out documents dated earlier than 2024-07-11, but the filter seems to have no effect: when I print the response, dates earlier than 2024-07-11 still appear.

The retrieved documents should only be dated 2024-07-11 or later.

Please let me know if there is a straightforward way to do this:

import weaviate
from datetime import datetime, timezone, timedelta
from weaviate.classes.query import MetadataQuery, Filter

client = weaviate.connect_to_local()

def format_to_RFC3339_date(date_str):
  date_obj = datetime.strptime(date_str, "%Y-%m-%d")
  offset = timezone(timedelta(hours=8))
  rfc3339_date_with_offset = date_obj.replace(tzinfo=offset).isoformat()
  return rfc3339_date_with_offset

try:
  date_str = "2024-07-11"
  formatted_date = format_to_RFC3339_date(date_str)

  collection = client.collections.get("LlamaIndex")
  response = collection.query.hybrid(
    ...,
    filters=Filter.by_property('creation_date').greater_or_equal(formatted_date),
    return_metadata=MetadataQuery(
      distance=True,
      certainty=True,
      score=True,
      explain_score=True,
    )
  )

  for obj in response.objects:
    print(obj.properties['creation_date'])
finally:
  client.close()

Hi! You should use timestamps.

Here is a simple reproducible example:

import weaviate
import os
from weaviate import classes as wvc

client = weaviate.connect_to_local()

client.collections.delete("Collection")
collection = client.collections.create(
    "Collection",
    properties=[
        wvc.config.Property(name="some_date", data_type=wvc.config.DataType.DATE),
        wvc.config.Property(name="some_text", data_type=wvc.config.DataType.TEXT)
    ]
)
collection.data.insert(
    {
        "some_text": "2024-05-05", 
        "some_date": "2024-05-05T23:20:50.52Z"
    }
)
collection.data.insert({
        "some_text": "2024-06-06", 
        "some_date": "2024-06-06T23:20:50.52Z"
    }
)

from weaviate.classes.query import Filter
query = collection.query.fetch_objects(
    filters=Filter.by_property("some_date").greater_or_equal("2024-05-10T23:20:50.52Z")
)
for obj in query.objects:
    print(obj)
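
If the cutoff starts out as a plain `YYYY-MM-DD` string, it can be turned into a timezone-aware value before filtering. The sketch below uses only the standard library; `to_aware_datetime` and its default UTC offset are illustrative assumptions, not part of the Weaviate client.

```python
from datetime import datetime, timedelta, timezone


def to_aware_datetime(date_str: str, utc_offset_hours: int = 0) -> datetime:
    """Parse 'YYYY-MM-DD' into a timezone-aware datetime at midnight."""
    naive = datetime.strptime(date_str, "%Y-%m-%d")
    # Attach a fixed UTC offset so the value is unambiguous.
    return naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))


cutoff = to_aware_datetime("2024-05-10")
print(cutoff.isoformat())  # 2024-05-10T00:00:00+00:00
```

As the examples in this thread do, either a datetime or an RFC 3339 string such as `cutoff.isoformat()` can then be handed to `Filter.by_property("some_date").greater_or_equal(...)`.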

Let me know if this helps.
Thanks!

Hi @DudaNogueira,

I’m currently inserting data using LlamaIndex and I assumed I could use node.metadata to filter by dates.

for obj in response.objects:
     print(obj.properties.keys()) 
# dict_keys(['_node_type', 'content', 'page_label', 'last_modified_date', ..., 'creation_date', ... ])

Is there really no way to use this metadata for filtering? My users have inserted a significant number of documents, and I’d prefer not to delete the collection and ingest the data again.

Thank you!

Edit:
I have found a solution, but it requires modifying how LlamaIndex queries Weaviate.

Although this resolved my integration issue, it still does not tell me how to query Weaviate directly using node.metadata['last_modified_date'] | obj.properties['last_modified_date'].
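
As a stopgap that avoids re-ingesting, results can also be narrowed on the client after the query returns. This is only a sketch under assumptions: it presumes `creation_date` is stored as a `YYYY-MM-DD` string, which is a guess about how LlamaIndex writes its metadata, and `filter_on_or_after` is a hypothetical helper. It does not replace a server-side filter, since Weaviate still returns the unfiltered objects.

```python
from datetime import date


def filter_on_or_after(records, key, cutoff):
    """Keep records whose ISO date string under `key` is on or after cutoff."""
    kept = []
    for record in records:
        raw = record.get(key)
        if raw is None:
            continue  # skip records without the date field
        # Only the first 10 characters (YYYY-MM-DD) are parsed.
        if date.fromisoformat(raw[:10]) >= cutoff:
            kept.append(record)
    return kept


sample = [
    {"creation_date": "2024-07-10"},
    {"creation_date": "2024-07-11"},
    {"creation_date": "2024-08-01"},
]
recent = filter_on_or_after(sample, "creation_date", date(2024, 7, 11))
print([r["creation_date"] for r in recent])  # ['2024-07-11', '2024-08-01']
```

With a real response, the input could be `[obj.properties for obj in response.objects]`.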

Hi! In order to filter by creation or modification date, you need to explicitly configure the collection to do so.

Unfortunately, when you use LlamaIndex or LangChain, it will create the collection for you.

What you can do is to create the collection beforehand, setting all the options you want, and then using llamaindex/langchain.
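
A rough sketch of creating the collection beforehand might look like the following. It assumes the v4 Python client; the collection name `Documents` and the property names `content` and `creation_date` are hypothetical, and whether LlamaIndex or LangChain then writes values compatible with a DATE property is not something this thread confirms.

```python
def create_dated_collection(client, name: str = "Documents"):
    """Create a collection with a DATE property and indexed timestamps."""
    # Lazy import so the definition loads without the client installed.
    from weaviate.classes.config import Configure, DataType, Property

    return client.collections.create(
        name,
        properties=[
            Property(name="content", data_type=DataType.TEXT),
            Property(name="creation_date", data_type=DataType.DATE),
        ],
        # Enables filtering on the creation/update meta timestamps.
        inverted_index_config=Configure.inverted_index(index_timestamps=True),
    )
```

It would be called once, for example `create_dated_collection(weaviate.connect_to_local())`, before pointing the framework at the same collection name.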

Also, on that same issue, when creating the collection “outside” of llamaindex/langchain, it is a good idea to set the vectorizer too.

That way you can both use those llm frameworks or query Weaviate directly.

I have put together a recipe that illustrates this (using LangChain):

https://github.com/weaviate/recipes/tree/main/integrations/langchain/loading-data

Let me know if that helps.

Thanks!

