[Question] Bug in not_equal filter

When I use a not_equal filter, the query does not return the objects I expect.

The minimal reproducible code below creates an object with a lastUpdateDeviceId, then runs a query with not_equal against a different lastUpdateDeviceId value.

I expect the inserted object to be returned, but no objects come back. If I remove the not_equal filter (the second filter), the query returns the last inserted objects.

import uuid
import weaviate
import weaviate.classes as wvc
from weaviate.classes.query import Filter, MetadataQuery
from weaviate.classes.config import Configure, VectorDistances, Property, DataType
from weaviate.classes.tenants import Tenant
from datetime import datetime, timedelta
import pytz

try:
	# TODO: use different credentials for production
	# Best practice: store your credentials in environment variables
	wcd_url = "" #TODO: insert url
	wcd_api_key = "" #TODO: insert key

	client = weaviate.connect_to_weaviate_cloud(
		cluster_url=wcd_url,                                    # Replace with your Weaviate Cloud URL
		auth_credentials=wvc.init.Auth.api_key(wcd_api_key),    # Replace with your Weaviate Cloud key
	)
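except Exception as e:
	# Nothing below can run without a client, so stop here instead of hiding the error
	raise SystemExit(f"Could not connect to Weaviate: {e}")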

def parse_weaviate_results(results):
	"""
	Converts results into a list[JSON] for easy
	parsing for the frontends
	"""
	parsed_objs = []

	for result in results:
		parsed_objs.append(parse_weaviate_result(result))	

	return parsed_objs

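def parse_weaviate_result(result):
	"""
	Converts a single result into a JSON-like dict and
	prints its last update time in US/Pacific
	"""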
	pst = pytz.timezone('US/Pacific')
	last_update_time_utc = result.metadata.last_update_time
	last_update_time_pst = last_update_time_utc.astimezone(pst)
	print(f"Last update time in PST: {last_update_time_pst} for {result.properties['title']} with device id: {result.properties['lastUpdateDeviceId']}")
	return {
		**result.properties,
		# adding this after unpack to prevent override
		"uniqueid": str(result.uuid),
		"score": result.metadata.distance if hasattr(result.metadata, 'distance') else -1,
		"vector": result.vector,
		"last_updated_utc": result.metadata.last_update_time
	}

nodes_collection = None

# TODO: after production launch, delete this as it's dangerous
# client.collections.delete('Nodes')

try:
	# For all objects
	nodes_collection = client.collections.create(
		name="Nodes",
		vectorizer_config=wvc.config.Configure.Vectorizer.none(),
		vector_index_config=Configure.VectorIndex.hnsw(
			distance_metric=VectorDistances.COSINE
		),
		# Multi tenancy to separate each user's data
		multi_tenancy_config=Configure.multi_tenancy(enabled=True, auto_tenant_creation=True, auto_tenant_activation=True),
		inverted_index_config=Configure.inverted_index( 
			index_null_state=True,
        	index_property_length=True,
			index_timestamps=True
		)
		# Specify some properties beforehand to set right data type (i.e. obj[] instead of string[])
		# properties=[
		# 	Property(name="tags", data_type=DataType.OBJECT_ARRAY),
		# ]
	)
except Exception:
	# Creation fails if the collection already exists, so reuse the existing one
	nodes_collection = client.collections.get("Nodes")

try:
	# Create tenants on the Nodes collection
	nodes_collection.tenants.create(
		tenants=[
			Tenant(name="tenantA"),
			Tenant(name="tenantB"),
		]
	)
except:
	pass

nodes_collection.with_tenant('tenantA').data.insert(
	vector=[0.0] * 384,
	properties={'lastUpdateDeviceId': 'device-78C24351-F40A-4E37-8953-F003FA474877'},
	uuid=str(uuid.uuid4())
)


# Get the objects updated in the last 10 minutes (timezone-aware, so the cutoff is not misread as UTC)
last_sync_datetime = datetime.now(pytz.utc) - timedelta(minutes=10)

query_result = nodes_collection.with_tenant('tenantA').query.fetch_objects(
	filters=Filter.by_update_time().greater_than(last_sync_datetime) &
		Filter.by_property("lastUpdateDeviceId").not_equal('device-d06e69fb200a1b8fdb8a96d8aff91e9e7839f35d9ac0ad69780067174e26fda1'), 
	# If I comment out the not_equal filter above, the query returns objects
	include_vector=True,
	return_metadata=MetadataQuery(last_update_time=True),
)

print('Objects returned: ')
print(parse_weaviate_results(query_result.objects))

# Expected: the inserted object is returned
# Actual: []


client.close()


I am not sure whether specifying tokenization will fix it, as suggested in the thread "Not_equal filter seems not work".

But the main problem is that when I specify properties along with multi-tenancy, I run into this bug here:

(which I just updated with a minimal code example too, sorry for the delay on that one)

Hi!

I wrote the code below with the tokenization of lastUpdateDeviceId set to field, and now it works as expected:

from weaviate import classes as wvc
client.collections.delete("Test")
collection = client.collections.create(
    name="Test",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),
    vector_index_config=wvc.config.Configure.VectorIndex.hnsw(
        distance_metric=wvc.config.VectorDistances.COSINE
    ),
    # Multi tenancy to separate each user's data
    multi_tenancy_config=wvc.config.Configure.multi_tenancy(
        enabled=True, auto_tenant_creation=True, auto_tenant_activation=True
    ),
    properties=[
        wvc.config.Property(
            name="lastUpdateDeviceId",
            data_type=wvc.config.DataType.TEXT,
            tokenization=wvc.config.Tokenization.FIELD
        )
    ],
    inverted_index_config=wvc.config.Configure.inverted_index(
        index_null_state=True,
        index_property_length=True,
        index_timestamps=True
    )
    # Specify some properties beforehand to set right data type (i.e. obj[] instead of string[])
    # properties=[
    # 	Property(name="tags", data_type=DataType.OBJECT_ARRAY),
    # ]
)

collection.tenants.create(
    tenants=[
        wvc.tenants.Tenant(name="tenantA"),
        wvc.tenants.Tenant(name="tenantB"),
    ]
)

from weaviate.util import generate_uuid5
collection.with_tenant('tenantA').data.insert(
	vector=[0.0] * 384,
	properties={'lastUpdateDeviceId': 'device-78C24351-F40A-4E37-8953-F003FA474877'},
	uuid=generate_uuid5("object1")
)

from datetime import datetime, timedelta, timezone
last_sync_datetime = datetime.now(timezone.utc).astimezone() - timedelta(minutes=10)

collection.with_tenant('tenantA').query.fetch_objects(
	filters=wvc.query.Filter.by_update_time().greater_than(last_sync_datetime) &
		wvc.query.Filter.by_property("lastUpdateDeviceId").not_equal('device-d06e69fb200a1b8fdb8a96d8aff91e9e7839f35d9ac0ad69780067174e26fda1'), 
	include_vector=True,
	return_metadata=wvc.query.MetadataQuery(last_update_time=True),
)

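Note that if I set the filter value to d06e69fb200a1b8fdb8a96d8aff91e9e7839f35d9ac0ad69780067174e26fda1 instead of device-d06..., it also works.

This is because, with the default word tokenization, device-d06... becomes two keywords: device and d06.... The stored value also contains the keyword device, which appears to be why not_equal excluded it.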
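To make the overlap concrete, here is a minimal pure-Python sketch. It only approximates the two tokenizations (word: lowercase and split on non-alphanumeric characters; field: the whole trimmed value as one token) and is not Weaviate's actual implementation. The helper names word_tokens and field_tokens are made up for illustration.

```python
import re


def word_tokens(text: str) -> set[str]:
    """Approximation of `word` tokenization: lowercase, split on non-alphanumerics."""
    return {token for token in re.split(r"[^0-9a-z]+", text.lower()) if token}


def field_tokens(text: str) -> set[str]:
    """Approximation of `field` tokenization: the whole trimmed value is one token."""
    return {text.strip()}


stored = "device-78C24351-F40A-4E37-8953-F003FA474877"
query = "device-d06e69fb200a1b8fdb8a96d8aff91e9e7839f35d9ac0ad69780067174e26fda1"

# Tokens that the stored value and the filter value have in common
shared_word = word_tokens(stored) & word_tokens(query)
shared_field = field_tokens(stored) & field_tokens(query)
shared_no_prefix = word_tokens(stored) & word_tokens(query.removeprefix("device-"))

print(shared_word)       # {'device'}
print(shared_field)      # set()
print(shared_no_prefix)  # set()
```

With word tokenization the two values share the token device; with field tokenization, or with the device- prefix removed from the filter value, they share nothing.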

Let me know if this helps!


Thanks Duda!

Our nodes collection is already in production, though, so the alternative you mentioned is to just remove ‘device-’ from the front, right?

I guess to update the property, I can do something like:

articles.config.add_property(
    Property(
        name="onHomepage",
        data_type=DataType.BOOL
    )
)

Hi!

You will need to set the tokenization of that property to FIELD.

Check this part, where we define the lastUpdateDeviceId property.

Weaviate will tokenize your content according to the property's tokenization setting.

If you have a property with the value device-123456 and the tokenization is word (the default), you will end up with two keywords: device and 123456.

Now, when the tokenization is set to field, you end up with only one keyword: device-123456.

So the best solution is to define this new property.
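As a rough sketch of what that could look like on an existing collection, assuming the v4 Python client: the helper name and the default property name lastUpdateDeviceIdExact are hypothetical, and I am assuming a new property name is needed because the tokenization of the existing property cannot be changed in place.

```python
def add_field_tokenized_property(collection, name="lastUpdateDeviceIdExact"):
    """Add a TEXT property whose whole value is indexed as one token (FIELD).

    `collection` is assumed to be a v4 collection object, for example the
    result of client.collections.get("Nodes"). `name` is a hypothetical
    property name chosen for this sketch.
    """
    # Imported inside the function so the helper can be defined
    # before the Weaviate client library is loaded.
    from weaviate.classes.config import DataType, Property, Tokenization

    collection.config.add_property(
        Property(
            name=name,
            data_type=DataType.TEXT,
            tokenization=Tokenization.FIELD,
        )
    )
    return name
```

Something like add_field_tokenized_property(client.collections.get("Nodes")) would add the property. Existing objects would presumably also need the new property written to them, and the not_equal filter would then target the new property name.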

Let me know if this helps!
