Description
Is it possible to use the not equal filter on properties with word tokenization when the values contain non-alphanumeric characters? Unfortunately, I have data like test.com/2 stored with word tokenization, which wasn’t intended.
For example, I’d want to query test.com/2 using a not equal filter like below:
Filter.by_property("my_property").not_equal("test.com/2")
Is there a workaround, such as querying against “test com 2”, updating the tokenization, or searching with a different tokenization? My main issue is that adding a new field, or replacing the existing one with the proper tokenization, is a lengthy process in a production system.
Server Setup Information
- Weaviate Server Version: 1.24.0
- Deployment Method: Docker
- Multi Node? Number of Running Nodes: No.
- Client Language and Version: Python, 4.6.5
- Multitenancy?: Yes
Hi @dhanshew72!
Because the property uses word tokenization, the value test.com/2 is split on the non-alphanumeric characters and indexed as three separate tokens: test, com, and 2.
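To make the splitting concrete, here is a rough local approximation of word tokenization, assuming it lowercases the text and splits on every non-alphanumeric character. The helper name tokenize_word is hypothetical and this is not Weaviate's actual implementation, only a sketch for predicting which tokens a value produces.

```python
def tokenize_word(text):
    """Approximate word tokenization: lowercase, split on non-alphanumerics."""
    tokens = []
    current = []
    for ch in text:
        if ch.isalnum():
            # Alphanumeric characters extend the current token.
            current.append(ch.lower())
        elif current:
            # Any other character ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize_word("test.com/2"))  # ['test', 'com', '2']
```

Under this approximation, the dot and the slash never reach the index, which is why a filter on the literal string test.com/2 cannot match it as a single value.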
The following example demonstrates this:
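import weaviate
import weaviate.classes as wvc

# Assumes a local instance; adjust the connection to your deployment.
client = weaviate.connect_to_local()

client.collections.delete("Test")
# No properties are declared, so auto-schema creates "text"
# with the default word tokenization.
collection = client.collections.create(
    name="Test",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
)
collection.data.insert_many([
    {"text": "test.com/2"},
    {"text": "test.com/3"},
    {"text": "test.com/4"},
])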
Now we query for objects that contain all three tokens:
results = collection.query.fetch_objects(
    filters=(
        wvc.query.Filter.by_property("text").equal("test") &
        wvc.query.Filter.by_property("text").equal("com") &
        wvc.query.Filter.by_property("text").equal("2")
    )
)
for i in results.objects:
    print("###")
    print(i.properties)
Results:
{'text': 'test.com/2'}
Since you want to exclude those objects, not equal on a word-tokenized property will not help you, because the value is indexed as separate tokens rather than as one string.
Instead, you can try adding a new property with field tokenization, copying the existing values into it, and then applying the not equal filter to that new property.
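A minimal sketch of that workaround with the Python v4 client follows. The helper names add_field_property and backfill, the target property name text_exact, and the tenant name tenantA are all hypothetical. The weaviate import is done inside the first function only so the copy loop can be read and tried on its own. On a production system you would likely want batching and error handling around the updates, so treat this as an outline rather than a finished migration.

```python
def add_field_property(collection, name):
    """Add a TEXT property that keeps the whole value as a single token."""
    # Imported here so the backfill helper below has no hard dependency
    # on the client being installed.
    from weaviate.classes.config import DataType, Property, Tokenization

    collection.config.add_property(
        Property(
            name=name,
            data_type=DataType.TEXT,
            tokenization=Tokenization.FIELD,
        )
    )


def backfill(collection, source, target):
    """Copy each object's `source` value into `target`; return the update count."""
    updated = 0
    for obj in collection.iterator():
        value = obj.properties.get(source)
        if value is not None:
            collection.data.update(uuid=obj.uuid, properties={target: value})
            updated += 1
    return updated


# Usage sketch (needs a running instance; with multi-tenancy, repeat per tenant):
#   add_field_property(collection, "text_exact")
#   tenant_collection = collection.with_tenant("tenantA")  # hypothetical tenant
#   backfill(tenant_collection, "text", "text_exact")
#   from weaviate.classes.query import Filter
#   tenant_collection.query.fetch_objects(
#       filters=Filter.by_property("text_exact").not_equal("test.com/2")
#   )
```

Because the new property uses field tokenization, the not equal filter compares against the full string test.com/2 instead of its individual tokens.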
Let me know if this helps!
Interesting, I’ll make note of that. Thank you.