Filter near_text search based on empty/non-empty text array

Hi! I just started using the Weaviate Python client (version 4.10.4). I have a collection movies that includes the following properties: movie_description (text, used for vector-based search) and movie_tags (text array). movie_tags can hold zero, one, or more tags, such as ‘blockbuster’, ‘high IMDb rating’, ‘crowd favorite’, etc.

I would like to build a filter with weaviate.classes.query.Filter and pass it to near_text during a semantic search, to do one of the following:

  1. filter to search from only objects with EMPTY movie_tags array
  2. filter to search from only objects with NON-EMPTY movie_tags array

Could you provide instructions on correctly building the filter? I’m hoping there’s a better way than using Filter.by_property("movie_tags").contains_any(all_unique_movie_tags)

Pointers to the python client documentation on filtering text arrays based on array size will also be greatly appreciated. Thank you very much!

hi @violin1443 !!

Welcome to our community :hugs:

If you want to filter by property length or null state, the collection has to be created with those options enabled in inverted_index_config, like so:

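import weaviate
from weaviate import classes as wvc

# Assumption: a local Weaviate instance; adjust the connection (and pass your
# OpenAI API key in headers) for your own setup.
client = weaviate.connect_to_local()

# Start from a clean slate so the example can be re-run.
client.collections.delete("Test")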
collection = client.collections.create(
    name="Test",
    vectorizer_config=[
        wvc.config.Configure.NamedVectors.text2vec_openai(name="default"),
    ],
    inverted_index_config=wvc.config.Configure.inverted_index(
        index_null_state=True,
        index_property_length=True
    ),
    properties=[
        wvc.config.Property(name="movie_description", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="movie_tags", data_type=wvc.config.DataType.TEXT_ARRAY),
    ]
)
collection = client.collections.get("Test")
collection.data.insert_many([
    {"movie_description": "Movie desc 1. No tag"},
    {"movie_description": "Movie desc 2. One Tag", "movie_tags": ["tag1"]},
    {"movie_description": "Movie desc 3. Two Tags", "movie_tags": ["tag1", "tag2"]},
    {"movie_description": "Movie desc 4. Overlap tags", "movie_tags": ["tag2", "tag3"]},
])

Now you can perform different searches and filters:

# Each assignment below is an alternative; keep only the one you need.
# movies with no tags
filters = wvc.query.Filter.by_property("movie_tags").is_none(True)  # change to False if you want movies with tags
# movies with any of the given tags
filters = wvc.query.Filter.by_property("movie_tags").contains_any(["tag3", "tag2"])
# movies with all of the given tags
filters = wvc.query.Filter.by_property("movie_tags").contains_all(["tag3", "tag2"])
# movies with 2 or more tags
# https://weaviate.io/developers/weaviate/search/filters#by-object-property-length
filters = wvc.query.Filter.by_property("movie_tags", length=True).greater_or_equal(2)

query = collection.query.near_text(
    query="some movie",
    filters=filters
)
for o in query.objects:
    print(o.properties)
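
For the two cases in the original question (only movies without tags, only movies with tags), the is_none filter above can be wrapped in a small helper. This is just a sketch: search_by_tag_presence is a made-up name, and it assumes the collection was created with index_null_state=True as shown earlier.

```python
def search_by_tag_presence(collection, query, has_tags, limit=5):
    """Run near_text restricted to objects with (or without) movie_tags.

    Hypothetical helper; assumes index_null_state=True on the collection.
    """
    # Imported here so the helper can be defined before the client is set up.
    from weaviate.classes.query import Filter

    # is_none(True) keeps objects with no tags, is_none(False) keeps tagged ones.
    tag_filter = Filter.by_property("movie_tags").is_none(not has_tags)

    response = collection.query.near_text(
        query=query,
        filters=tag_filter,
        limit=limit,
    )
    return [obj.properties for obj in response.objects]


# Example usage (needs a connected client and the "Test" collection above):
# collection = client.collections.get("Test")
# untagged = search_by_tag_presence(collection, "some movie", has_tags=False)
# tagged = search_by_tag_presence(collection, "some movie", has_tags=True)
```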

Let me know if this helps!

Thanks!

This is very helpful! Thank you so much :slight_smile:

The current collection I am working on doesn’t have inverted_index_config specified. Do I have to delete and rebuild the index? Or is there any way to add these index settings to the existing collection, just to avoid having to vectorize all the texts again?

Thanks a lot!

Hi!

Yes, you will need to create a new collection and copy over the data.
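
Before rebuilding, it may be worth checking what the existing collection actually has enabled. A minimal sketch, assuming a v4 collection handle; missing_index_options is a hypothetical helper name, not part of the client:

```python
def missing_index_options(collection):
    """Return the inverted-index options needed for null/length filters
    that are currently disabled on `collection` (hypothetical helper)."""
    inverted = collection.config.get().inverted_index_config

    missing = []
    if not inverted.index_null_state:
        missing.append("index_null_state")
    if not inverted.index_property_length:
        missing.append("index_property_length")
    return missing


# Example usage (needs a connected client):
# print(missing_index_options(client.collections.get("movies")))
```

An empty list would mean the filters from the earlier reply should already work on that collection.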

Not all collection configuration is mutable. Here is a list of the ones you can change:

And here is a fairly simple guide on how to migrate your data over:
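
In rough outline, a migration like that reads every object from the old collection together with its stored vector and writes it into the new one, so the texts should not need to be vectorized again. A minimal sketch, assuming the new collection already exists with the inverted_index_config options enabled and the same named vectors as the old one; copy_with_vectors and the collection name "movies_v2" are made up for illustration:

```python
def copy_with_vectors(source, target, batch_size=100):
    """Copy all objects from `source` to `target`, reusing stored vectors.

    Hypothetical helper; `source` and `target` are v4 collection handles.
    Returns the number of objects queued for insertion.
    """
    copied = 0
    with target.batch.fixed_size(batch_size=batch_size) as batch:
        # include_vector=True returns the stored vectors with each object.
        for obj in source.iterator(include_vector=True):
            batch.add_object(
                properties=obj.properties,
                vector=obj.vector,  # dict of named vectors, passed through as-is
                uuid=obj.uuid,      # keep the original IDs
            )
            copied += 1
    return copied


# Example usage (needs a connected client; "movies_v2" is an assumed name):
# old = client.collections.get("movies")
# new = client.collections.get("movies_v2")
# print(copy_with_vectors(old, new))
# print(new.batch.failed_objects)  # inspect anything that failed to import
```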

Thanks!
