How does the filter ContainsAny and ContainsAll work?

A_S · October 1, 2024, 12:52pm

If you have a list of phrases, e.g. [“Dorthraki language”, “Ktulhu Monster lives in”], and you search text chunks with the ContainsAny operator, does a chunk match only if it contains a full phrase, or also if it contains a partial match for one of the strings in the list? (For example, if the chunk doesn’t contain “Dorthraki language” but does contain the word “Dorthraki”, is it supposed to match?)

Thanks!

DudaNogueira · October 1, 2024, 4:00pm

hi @A_S !! Welcome back

The behavior depends on the tokenization configured for that specific property.

By default, the tokenization is word. This means that the query you are running should match against the word tokens dorthraki, language, ktulhu, monster, lives (in is a stop word).
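To make the token list concrete, here is a minimal pure-Python sketch of how word and field tokenization roughly behave. It is an approximation for illustration, not Weaviate's actual implementation: the helper names `tokenize_word` and `tokenize_field` and the tiny `STOPWORDS` set are hypothetical, and the real English stop word preset is longer.

```python
import re

# Hypothetical, deliberately tiny stop word set (assumption for this sketch).
STOPWORDS = {"in", "the", "a", "is"}

def tokenize_word(text):
    """Approximate `word` tokenization: lowercase, keep alphanumeric runs."""
    return re.findall(r"[^\W_]+", text.lower())

def tokenize_field(text):
    """Approximate `field` tokenization: the whole trimmed value is one token."""
    return [text.strip()]

for phrase in ["Dorthraki language", "Ktulhu Monster lives in"]:
    word_tokens = tokenize_word(phrase)
    # Drop stop words, mirroring the note that "in" is a stop word.
    searchable = [t for t in word_tokens if t not in STOPWORDS]
    print(phrase, "-> word:", searchable, "| field:", tokenize_field(phrase))
```

With word tokenization each phrase breaks into lowercase words, while field tokenization keeps the phrase as a single case-sensitive token.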

With that said, consider this code:

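import weaviate
import weaviate.classes as wvc

# Assumes a local Weaviate instance; adjust the connection to your setup.
client = weaviate.connect_to_local()

# Start from a clean state (removes any existing "Test" collection).
client.collections.delete("Test")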
collection = client.collections.create(
    "Test",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
    properties=[
        wvc.config.Property(
            name="text_word", data_type=wvc.config.DataType.TEXT, tokenization=wvc.config.Tokenization.WORD,
        ),
        wvc.config.Property(
            name="text_field", data_type=wvc.config.DataType.TEXT, tokenization=wvc.config.Tokenization.FIELD
        )
    ]
)
collection.data.insert({"text_word": "Dorthraki language here", "text_field": "Dorthraki language"})
collection.data.insert({"text_word": "Ktulhu language Dorthraki", "text_field": "Ktulhu language Dorthraki"})

Now, when I run contains_any on text_field, which uses field tokenization, I find only one result, because field tokenization indexes the whole value as a single token and only an exact match counts:

collection.aggregate.over_all(
    filters=wvc.query.Filter.by_property("text_field").contains_any(["Dorthraki language"])
)

AggregateReturn(properties={}, total_count=1)

If I run the same query on the text_word property instead, I find both objects, since each contains the word tokens dorthraki and language:

collection.aggregate.over_all(
    filters=wvc.query.Filter.by_property("text_word").contains_any(["Dorthraki language"])
)

AggregateReturn(properties={}, total_count=2)
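The two counts above can be reproduced with a rough pure-Python model, which also covers the partial match from the original question. This is only a sketch under stated assumptions: `tokenize`, `contains_any`, and `count` are hypothetical helpers, stop words are ignored, and a match is assumed whenever the object shares at least one token with the query values. It does not call Weaviate.

```python
import re

def tokenize(text, mode):
    """Approximate tokenization: 'word' lowercases and splits; 'field' keeps the value whole."""
    if mode == "word":
        return set(re.findall(r"[^\W_]+", text.lower()))
    return {text.strip()}

def contains_any(value, queries, mode):
    """Assumed semantics: match if any query token appears among the object's tokens."""
    query_tokens = set()
    for q in queries:
        query_tokens |= tokenize(q, mode)
    return bool(tokenize(value, mode) & query_tokens)

# Same two objects as in the example above.
objects = [
    {"text_word": "Dorthraki language here", "text_field": "Dorthraki language"},
    {"text_word": "Ktulhu language Dorthraki", "text_field": "Ktulhu language Dorthraki"},
]

def count(prop, mode, queries):
    return sum(contains_any(o[prop], queries, mode) for o in objects)

print(count("text_field", "field", ["Dorthraki language"]))  # 1 in this model
print(count("text_word", "word", ["Dorthraki language"]))    # 2 in this model
print(count("text_word", "word", ["Dorthraki"]))             # 2 in this model
print(count("text_field", "field", ["Dorthraki"]))           # 0 in this model
```

In this model, searching for just “Dorthraki” matches on the word-tokenized property but not on the field-tokenized one, which is the distinction the question was asking about.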

We have some extensive material on tokenization that is worth checking out.

Let me know if this helps

A_S · October 2, 2024, 9:10am

Thank you! That explained a lot!
