Description
I’m trying to remove records with a delete_many call like the one below:
```python
from weaviate.classes.query import Filter

db_filter = Filter.by_property("file_id").equal("the_file_id")
collection.data.delete_many(where=db_filter)  # `collection` is an existing collection handle
```
The delete succeeds, and if I then run fetch_objects with the appropriate filter, the record no longer appears. However, the record occasionally pops back up in hybrid searches and partially re-enters the database. I’ve been able to reproduce this a few times and haven’t found a solution. I’m deleting at most 1000 objects per call.
What I’ve tried
- Removing by UUID after looking up all records
- Upgrading Weaviate to the latest version (1.33). I saw this issue mentioned for 1.24, but based on release dates my version seems fine: Can not delete all objects
- Verifying the delete by calling fetch_objects after a sleep; the record still came up in subsequent queries
- Changing consistency_level to ALL on every query
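For reference, the delete-by-UUID approach from the first bullet could look roughly like the sketch below. It assumes `collection` is a v4 Python-client collection handle (for a multi-tenant collection, one obtained with `collection.with_tenant(...)`); the helper names `chunked` and `delete_by_uuid` are hypothetical. It splits the UUIDs into batches of at most 1000 and deletes each batch with `Filter.by_id().contains_any(...)`.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def delete_by_uuid(collection, uuids: Sequence[str], batch_size: int = 1000) -> int:
    """Delete objects by UUID in batches; return how many the server reports as deleted.

    Assumes `collection` is a weaviate v4 collection handle (call
    `collection.with_tenant(...)` first for a multi-tenant collection).
    """
    # Imported lazily so `chunked` can be used without the client installed.
    from weaviate.classes.query import Filter

    deleted = 0
    for batch in chunked(uuids, batch_size):
        result = collection.data.delete_many(where=Filter.by_id().contains_any(batch))
        deleted += result.successful
    return deleted
```

The client import sits inside `delete_by_uuid` only so that the chunking helper stays usable and testable on its own.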
Any suggestions/ideas would be appreciated.
Server Setup Information
- Weaviate Server Version: 1.33.0 (happened on 1.25.0 & 1.24.26 as well)
- Deployment Method: ECS
- Multi Node? Number of Running Nodes: 5
- Client Language and Version: Python, 4.10.4
- Multitenancy?: Yes
Any additional Information
I first upgraded from 1.24 to 1.25 but still saw the issue. I then jumped to 1.33 (not on Kubernetes); the upgrade passed some smoke tests, but I still see this issue on deletes.
Hi @dhanshew72,
Good Day!
Welcome to Weaviate Community!
What you’re describing matches known issues with deletion consistency in Weaviate, especially in replicated or high-concurrency environments. Setting the deletion resolution strategy to either DeleteOnConflict or TimeBasedResolution should help in your case.
This is how you can update your deletionStrategy:
```python
from weaviate.classes.config import Reconfigure, ReplicationDeletionStrategy

articles = client.collections.use("Article")  # Use the collection you want to update
# Update the deletion strategy
articles.config.update(replication_config=Reconfigure.replication(
    deletion_strategy=ReplicationDeletionStrategy.TIME_BASED_RESOLUTION  # or DELETE_ON_CONFLICT
))
```
Hope this helps.
I’m curious how this would resolve the issue. The data seems to come back on lookup after a deletion that I can verify did work. I only have one replica, so this looks more like the cache and the file system being out of sync. Is uploading identical data a problem in earlier versions of Weaviate?
Updating the deletionStrategy setting should help, because the default value is NoAutomatedResolution. With this setting, deletion conflicts are not treated as a special case: if an object is deleted on one replica but still exists on another, it could be restored. You can find more details on this behavior here.
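To make the difference between the strategies concrete, here is a deliberately simplified toy model in plain Python. It is not Weaviate's implementation: it only considers two replicas that disagree about one object (one has deleted it, the other still holds a copy), and it models NoAutomatedResolution as the surviving copy winning, whereas in Weaviate the object may be restored. The `Strategy`, `ReplicaView`, and `resolve` names are hypothetical. With a replication factor of 1 there is no second copy to disagree with, so none of these branches comes into play.

```python
from enum import Enum
from typing import NamedTuple


class Strategy(Enum):
    NO_AUTOMATED_RESOLUTION = "NoAutomatedResolution"
    DELETE_ON_CONFLICT = "DeleteOnConflict"
    TIME_BASED_RESOLUTION = "TimeBasedResolution"


class ReplicaView(NamedTuple):
    deleted: bool      # True if this replica has deleted the object
    timestamp: float   # time of the last write (update or delete) seen by this replica


def resolve(a: ReplicaView, b: ReplicaView, strategy: Strategy) -> str:
    """Toy model: decide whether an object survives when two replicas are compared."""
    if a.deleted == b.deleted:
        return "deleted" if a.deleted else "present"
    if strategy is Strategy.DELETE_ON_CONFLICT:
        # The deletion always wins a conflict.
        return "deleted"
    if strategy is Strategy.TIME_BASED_RESOLUTION:
        # The most recent write wins, whether it was a delete or an update.
        latest = a if a.timestamp >= b.timestamp else b
        return "deleted" if latest.deleted else "present"
    # NoAutomatedResolution: the deletion gets no special treatment, so in this
    # toy model the copy that still exists is written back (the object "comes back").
    return "present"
```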
In addition to the deletionStrategy, it’s considered best practice in a multi-node setup to ensure that replication settings, such as asynchronous replication, are properly configured. More on that can be found here.
Lastly, the latest version of the Weaviate Python client is v4.17.0 (you are on 4.10.4). It’s also recommended to keep both the client and server versions up to date to benefit from the latest fixes and improvements. Release notes here.
Let me know if you’d like help updating the configuration or client version.
To clarify, I have a replication factor of 1. Does that affect it?
I see, thanks for clarifying. Yes, if the replication factor of your collections is 1, no replication is happening. Could you check how long the sleep is configured for? You may try increasing the value.
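Rather than relying on a single fixed sleep, the check could poll until the deleted objects stop showing up, up to a timeout. The sketch below is a minimal, client-agnostic version: `wait_until_gone` and its parameters are hypothetical, and `count_matches` is assumed to be any callable that returns how many matching objects are still visible.

```python
import time
from typing import Callable


def wait_until_gone(count_matches: Callable[[], int], timeout_s: float = 90.0,
                    interval_s: float = 1.0) -> bool:
    """Poll `count_matches` until it reports 0 or `timeout_s` elapses.

    Returns True if the objects disappeared within the timeout, otherwise False.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if count_matches() == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With the v4 Python client, `count_matches` could be something like `lambda: len(collection.query.fetch_objects(filters=db_filter, limit=1).objects)`, assuming `collection` and `db_filter` from the original question. This only confirms that the delete took effect; it would not catch objects that are re-inserted later.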
If you are targeting a multi-node setup (5 nodes), it would be best to check the replication settings as recommended for data consistency. Also note that the replication factor cannot be updated since v1.32, as replica movement has been implemented. You can read about the replication factor here: Cluster Architecture | Weaviate Documentation
The sleep currently defaults to 90 seconds. I don’t think it’s related to that: the deletes work, but the data comes up again in subsequent searches after the delete has happened.
I suspect a caching or file-system issue where data isn’t fully deleted or isn’t properly marked as deleted.
I tested this out. Unfortunately, it didn’t solve my issue. I think I found a pattern: if the file is a duplicate, i.e. you delete it and then re-upload it, it pops back up. I’m not sure that’s the exact cause. The only workaround I have, which had some success, is updating the record to empty out all of its fields.
Oh my, I figured it out. I had a weird condition in my code that would re-upload objects that could’ve been deleted. That’s on me, thanks for the help.
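A cheap guard against this kind of bug is to remember which `file_id`s were deleted and drop them from any later upload batch unless the re-upload is intentional. The sketch below is hypothetical (`IngestState` and its methods are illustrative names, not part of the Weaviate client) and keeps the tombstones in memory; a real pipeline would likely need to persist them somewhere shared by every worker.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class IngestState:
    """Hypothetical in-memory bookkeeping for a pipeline that uploads and deletes by file_id."""

    deleted_file_ids: Set[str] = field(default_factory=set)

    def mark_deleted(self, file_id: str) -> None:
        # Call this right after delete_many succeeds for `file_id`.
        self.deleted_file_ids.add(file_id)

    def filter_uploads(self, records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
        # Drop records whose file_id was deleted, so a retry or sync loop cannot resurrect them.
        return [r for r in records if r["file_id"] not in self.deleted_file_ids]
```

If a file is deliberately re-uploaded after deletion, remove its `file_id` from `deleted_file_ids` first so the upload goes through.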