Hi @A_S! Welcome back!
The behavior depends on which tokenization is configured for that specific property.
By default, the tokenization is `word`. This means that the query you are running is matched against the word tokens `dorthraki`, `language`, `ktulhu` and `monster` (`in` is a stop word).
With that said, consider this code:
import weaviate
import weaviate.classes as wvc

# Assumption: a local Weaviate instance; adjust the connection to your setup
client = weaviate.connect_to_local()

# Start from a clean collection
client.collections.delete("Test")
collection = client.collections.create(
    "Test",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
    properties=[
        # Same kind of text, indexed with word tokenization
        wvc.config.Property(
            name="text_word",
            data_type=wvc.config.DataType.TEXT,
            tokenization=wvc.config.Tokenization.WORD,
        ),
        # Same kind of text, indexed with field tokenization
        wvc.config.Property(
            name="text_field",
            data_type=wvc.config.DataType.TEXT,
            tokenization=wvc.config.Tokenization.FIELD,
        ),
    ],
)
collection.data.insert({"text_word": "Dorthraki language here", "text_field": "Dorthraki language"})
collection.data.insert({"text_word": "Ktulhu language Dorthraki", "text_field": "Ktulhu language Dorthraki"})
Now, when I run a `contains_any` filter on `text_field`, which uses `field` tokenization, I find only one result:
collection.aggregate.over_all(
    filters=wvc.query.Filter.by_property("text_field").contains_any(["Dorthraki language"])
)
AggregateReturn(properties={}, total_count=1)
If I run the same query on the `text_word` property instead, I find both objects:
collection.aggregate.over_all(
    filters=wvc.query.Filter.by_property("text_word").contains_any(["Dorthraki language"])
)
AggregateReturn(properties={}, total_count=2)
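If it helps to see why the two counts differ, here is a rough pure-Python sketch of the matching logic. Note that `tokenize_word`, `tokenize_field` and `contains_any` below are hypothetical helpers I made up for illustration; they only approximate what Weaviate does internally (ASCII only, no stop word handling) and are not its actual implementation.

```python
import re

# Hypothetical helpers: a simplified model, NOT Weaviate's real tokenizer.

def tokenize_word(text: str) -> set[str]:
    """Approximate `word` tokenization: lowercase, split on non-alphanumerics."""
    return {tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok}

def tokenize_field(text: str) -> set[str]:
    """Approximate `field` tokenization: the whole trimmed value is one token."""
    return {text.strip()}

def contains_any(stored: str, query_values: list[str], tokenize) -> bool:
    """True if the stored value shares at least one token with the query values."""
    query_tokens = set()
    for value in query_values:
        query_tokens |= tokenize(value)
    return bool(tokenize(stored) & query_tokens)

objects = [
    {"text_word": "Dorthraki language here", "text_field": "Dorthraki language"},
    {"text_word": "Ktulhu language Dorthraki", "text_field": "Ktulhu language Dorthraki"},
]
query = ["Dorthraki language"]

# field: the query must equal the entire stored value, so only the first object matches
field_matches = sum(contains_any(o["text_field"], query, tokenize_field) for o in objects)
# word: the query becomes {"dorthraki", "language"}, and both objects contain one of them
word_matches = sum(contains_any(o["text_word"], query, tokenize_word) for o in objects)

print(field_matches, word_matches)  # 1 2
```

So with `field` the filter value is compared as a whole, while with `word` each word in the filter value can match on its own.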
We have some extensive material on tokenization. Check this out:
and here:
Let me know if this helps