Not Equal Filter with Word Tokenization with non-alphanumeric characters

dhanshew72 · October 14, 2024, 10:42pm

Description

Is it possible to use the not equal filter on properties with word tokenization even if they don’t have alphanumeric characters? Unfortunately I have data like test.com/2 with word tokenization when it wasn’t intended.

For example, I’d want to query test.com/2 using a not equal filter like below:

Filter.by_property("my_property").not_equal("test.com/2")

Does any workaround exist, such as querying against “test com 2”, updating the tokenization, or searching with a different tokenization? My main issue is that adding a new field, or replacing the existing one with the proper tokenization, is a lengthy process in a production system.

Server Setup Information

Weaviate Server Version: 1.24.0
Deployment Method: Docker
Multi Node? Number of Running Nodes: No.
Client Language and Version: 4.6.5
Multitenancy?: Yes

DudaNogueira · October 15, 2024, 9:30pm

hi @dhanshew72 !

Because you had tokenization set to word, the property value test.com/2 is tokenized as test com 2.

This will prove our point:

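import weaviate
import weaviate.classes as wvc

# Assumes a local Weaviate instance; adjust the connection to your setup.
client = weaviate.connect_to_local()

# Start from a clean collection. Text properties default to word tokenization.
client.collections.delete("Test")
collection = client.collections.create(
    name="Test",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
)

collection.data.insert_many([
    {"text": "test.com/2"},
    {"text": "test.com/3"},
    {"text": "test.com/4"},
])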

now we query:

results = collection.query.fetch_objects(
    filters=(
        wvc.query.Filter.by_property("text").equal("test") & 
        wvc.query.Filter.by_property("text").equal("com") & 
        wvc.query.Filter.by_property("text").equal("2")
    )
)
for i in results.objects:
    print("###")
    print(i.properties)

results:

{'text': 'test.com/2'}
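To picture why those three separate tokens match, here is a rough approximation of word tokenization in plain Python. It is only a sketch: `word_tokenize` is a hypothetical helper, not part of the Weaviate client, and it assumes ASCII input (the real tokenizer runs server-side and also handles non-ASCII letters).

```python
import re


def word_tokenize(value: str) -> list[str]:
    """Approximate word tokenization: lowercase the value, then keep
    only the runs of alphanumeric characters as separate tokens."""
    return re.findall(r"[a-z0-9]+", value.lower())


print(word_tokenize("test.com/2"))  # ['test', 'com', '2']
print(word_tokenize("test.com/3"))  # ['test', 'com', '3']
```

The "." and "/" are dropped, so the index only knows about test, com and 2, never the full string test.com/2.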

Since you want to exclude the objects matched this way, not_equal on a word-tokenized property will not help you.

So you can try adding a new property with field tokenization, and then filling it with the same content, so you can filter on the exact value.

Let me know if this helps

dhanshew72 · October 16, 2024, 4:09pm

Interesting, I’ll make note of that. Thank you.
