Description
Is it possible to use the not equal filter on properties with word tokenization even if the values contain non-alphanumeric characters? Unfortunately I have data like test.com/2 with word tokenization when it wasn’t intended.
For example, I’d want to query test.com/2 using a not equal filter like below:
Filter.by_property("my_property").not_equal("test.com/2")
Does any workaround exist, such as querying against “test com 2”, updating the tokenization, or searching with a different tokenization? My main issue is that adding a new field, or replacing the existing one with a field that has the proper tokenization, is a lengthy process in a production system.
Server Setup Information
- Weaviate Server Version: 1.24.0
- Deployment Method: Docker
- Multi Node? Number of Running Nodes: No.
- Client Language and Version: Python client 4.6.5
- Multitenancy?: Yes
hi @dhanshew72 !
Because you had tokenization set to word, the property value test.com/2 will be tokenized as test, com and 2.
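To make the split concrete, here is a rough pure-Python approximation of the two tokenization modes discussed in this thread. It is only a sketch, not Weaviate's actual implementation: the helper names word_tokens and field_tokens are made up for illustration, and the regular expression only handles ASCII letters and digits.

```python
import re


def word_tokens(value: str) -> list[str]:
    """Approximate `word` tokenization: lowercase, then split on anything
    that is not an ASCII letter or digit (assumption, simplified)."""
    return re.findall(r"[a-z0-9]+", value.lower())


def field_tokens(value: str) -> list[str]:
    """Approximate `field` tokenization: the whole trimmed value is one token."""
    return [value.strip()]


print(word_tokens("test.com/2"))   # ['test', 'com', '2']
print(field_tokens("test.com/2"))  # ['test.com/2']
```

With word tokenization the dot and the slash act as separators, so the index never sees test.com/2 as a single value.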
This will prove the point:
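import weaviate
import weaviate.classes as wvc

# Connect to a local instance (adjust for your deployment)
client = weaviate.connect_to_local()

# Start from a clean collection; text properties default to word tokenization
client.collections.delete("Test")
collection = client.collections.create(
    name="Test",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
)
collection.data.insert_many([
    {"text": "test.com/2"},
    {"text": "test.com/3"},
    {"text": "test.com/4"},
])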
Now we query:
results = collection.query.fetch_objects(
    filters=(
        wvc.query.Filter.by_property("text").equal("test") &
        wvc.query.Filter.by_property("text").equal("com") &
        wvc.query.Filter.by_property("text").equal("2")
    )
)
# Print the properties of every matching object
for i in results.objects:
    print("###")
    print(i.properties)
Results:
{'text': 'test.com/2'}
Since you want to exclude exactly those objects, a not equal filter on a word-tokenized property will not help you, because the value is indexed as separate tokens rather than as the whole string.
So you can try adding a new property with field tokenization, and then filling it with the same content so you can use it to filter those objects out.
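A minimal sketch of that workaround with the Python v4 client might look like the following. The property name text_exact and the three helper functions are hypothetical, chosen only for illustration; the sketch assumes `collection` is an existing collection handle (with multi-tenancy enabled, one obtained per tenant via collection.with_tenant(...)), and it has not been tuned for large production datasets.

```python
def add_exact_match_property(collection, new_name="text_exact"):
    """Add a text property that uses field tokenization (hypothetical name)."""
    # Imported lazily so the copy helper below does not require the client.
    from weaviate.classes.config import DataType, Property, Tokenization

    collection.config.add_property(
        Property(
            name=new_name,
            data_type=DataType.TEXT,
            tokenization=Tokenization.FIELD,
        )
    )


def copy_property(collection, source="text", target="text_exact"):
    """Copy each object's source value into the target property.

    Returns the number of objects that were updated.
    """
    updated = 0
    for obj in collection.iterator():
        value = obj.properties.get(source)
        if value is not None:
            # update() merges the given properties into the existing object
            collection.data.update(uuid=obj.uuid, properties={target: value})
            updated += 1
    return updated


def fetch_all_except(collection, value, prop="text_exact"):
    """Fetch objects whose field-tokenized property differs from value."""
    from weaviate.classes.query import Filter

    return collection.query.fetch_objects(
        filters=Filter.by_property(prop).not_equal(value)
    )
```

Calling add_exact_match_property(collection), then copy_property(collection), and finally fetch_all_except(collection, "test.com/2") would be the intended order. Updating objects one at a time is the simplest approach but can be slow on a large collection.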
Let me know if this helps!
Interesting, I’ll make note of that. Thank you.