Upload to and Deletion from Custom Weaviate Instance Always Times Out

Description

Hey, I have Weaviate version 1.25.2 running in a Docker container deployed on AWS EC2. I'm connecting to it via the Weaviate Python client v4 (gRPC connection).
Every day or two, data object uploads and deletions start timing out with no response from the server. Restarting the Docker container sometimes fixes it for a while, but it eventually goes back to a state where every upload or deletion times out. How do I fix this?
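For reference, here is a minimal sketch of how the connection is set up (host, ports, and the explicit timeout values are placeholders rather than my exact configuration):

```python
import weaviate
from weaviate.classes.init import AdditionalConfig, Timeout

# Placeholder host/ports for the EC2-hosted Docker deployment.
client = weaviate.connect_to_custom(
    http_host="my-ec2-host.example.com",  # hypothetical hostname
    http_port=8080,
    http_secure=False,
    grpc_host="my-ec2-host.example.com",
    grpc_port=50051,
    grpc_secure=False,
    # Explicit client-side timeouts (seconds), just to rule out overly tight defaults.
    additional_config=AdditionalConfig(timeout=Timeout(init=30, query=60, insert=120)),
)

try:
    print(client.is_ready())
finally:
    client.close()
```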

Server Setup Information

  • Weaviate Server Version: 1.25.2
  • Deployment Method: Docker
  • Multi Node? Number of Running Nodes: 1
  • Client Language and Version: Python 3.10, v4 client
  • Multitenancy: Using collections only

Any additional Information

These are some of the logs from the Docker container:
{“action”:“lsm_recover_from_active_wal”,“class”:“66879dbd3c40e9cdf”,“index”:“_66879dbd3c40e9cdf”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/66879dbd3c40e9cd1/3rPXMbHcJoMl/lsm/property__id/segment-1720443038073474385”,“shard”:“3rPXMbHcJoMl”,“time”:“2024-07-08T16:11:49Z”}

{“level”:“info”,“msg”:“Completed loading shard 66879dbd3c40e9cdf_3rPXMbHcJoMl in 7.394234ms”,“time”:“2024-07-08T16:11:49Z”}

{“index”:“6672b5b6cd7159bdb”,“level”:“info”,“msg”:“restore local index”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“hnsw_vector_cache_prefill”,“count”:5000,“index_id”:“main”,“level”:“info”,“limit”:1000000000000,“msg”:“prefilled vector cache”,“time”:“2024-07-08T16:11:49Z”,“took”:26883467}

{“action”:“telemetry_push”,“level”:“info”,“msg”:“telemetry started”,“payload”:“\u0026{MachineID:f6835ccc-31c8-4fe8-8442-90f075d8d2b3 Type:INIT Version:1.25.2 NumObjects:0 OS:linux Arch:arm64 UsedModules:}”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“6672b5b6cd7159”,“index”:“6672b5b6cd7159bdb”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/6672b5b6cd7159bd5/7EtZRTDHZyWY/lsm/objects/segment-1720454667233976221”,“shard”:“7EtZRTDHZyWY”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“6672b5b6cd7159bd”,“index”:“6672b5b6cd7159b”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_6672b5b6cd7159bdb2f6b575/7EtZRTDHZyWY/lsm/property_dataType/segment-1720454667243683422”,“shard”:“7EtZRTDHZyWY”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_6672b5b6cd7159bdb2f6b575”,“index”:“_6672b5b6cd7159bdb2f6b575”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_6672b5b6cd7159bdb2f6b575/7EtZRTDHZyWY/lsm/property_dataType_searchable/segment-1720454667250774444”,“shard”:“7EtZRTDHZyWY”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_6672b5b6cd7159bdb2f6b575”,“index”:“_6672b5b6cd7159bdb2f6b575”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_6672b5b6cd7159bdb2f6b575/7EtZRTDHZyWY/lsm/property_docName/segment-1720454300546997303”,“shard”:“7EtZRTDHZyWY”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_6672b5b6cd7159bdb2f6b575”,“index”:“_6672b5b6cd7159bdb2f6b575”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_6672b5b6cd7159bdb2f6b575/7EtZRTDHZyWY/lsm/property_docName_searchable/segment-1720454300550333412”,“shard”:“7EtZRTDHZyWY”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_6672b5b6cd7159bdb2f6b575”,“index”:“_6672b5b6cd7159bdb2f6b575”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_6672b5b6cd7159bdb2f6b575/7EtZRTDHZyWY/lsm/property_key/segment-1720454667258914344”,“shard”:“7EtZRTDHZyWY”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_6672b5b6cd7159bdb2f6b575”,“index”:“_6672b5b6cd7159bdb2f6b575”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_6672b5b6cd7159bdb2f6b575/7EtZRTDHZyWY/lsm/property_text/segment-1720454667265382584”,“shard”:“7EtZRTDHZyWY”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_6672b5b6cd7159bdb2f6b575”,“index”:“_6672b5b6cd7159bdb2f6b575”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_6672b5b6cd7159bdb2f6b575/7EtZRTDHZyWY/lsm/property_text_searchable/segment-1720454667272903507”,“shard”:“7EtZRTDHZyWY”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_6672b5b6cd7159bdb2f6b575”,“index”:“_6672b5b6cd7159bdb2f6b575”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_6672b5b6cd7159bdb2f6b575/7EtZRTDHZyWY/lsm/property__id/segment-1720454667280362040”,“shard”:“7EtZRTDHZyWY”,“time”:“2024-07-08T16:11:49Z”}

{“level”:“info”,“msg”:“Completed loading shard _6672b5b6cd7159bdb2f6b575_7EtZRTDHZyWY in 3.9361ms”,“time”:“2024-07-08T16:11:49Z”}

{“index”:“_66879e0b3c40e9cdf374f8ea”,“level”:“info”,“msg”:“restore local index”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“hnsw_vector_cache_prefill”,“count”:3000,“index_id”:“main”,“level”:“info”,“limit”:1000000000000,“msg”:“prefilled vector cache”,“time”:“2024-07-08T16:11:49Z”,“took”:91757}

{“action”:“lsm_recover_from_active_wal”,“class”:“_66879e0b3c40e9cdf374f8ea”,“index”:“_66879e0b3c40e9cdf374f8ea”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_66879e0b3c40e9cdf374f8ea/jSUejNL3pdFG/lsm/objects/segment-1720443037283495175”,“shard”:“jSUejNL3pdFG”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_66879e0b3c40e9cdf374f8ea”,“index”:“_66879e0b3c40e9cdf374f8ea”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_66879e0b3c40e9cdf374f8ea/jSUejNL3pdFG/lsm/property_docName/segment-1720443037284478051”,“shard”:“jSUejNL3pdFG”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_66879e0b3c40e9cdf374f8ea”,“index”:“_66879e0b3c40e9cdf374f8ea”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_66879e0b3c40e9cdf374f8ea/jSUejNL3pdFG/lsm/property_docName_searchable/segment-1720443037284928456”,“shard”:“jSUejNL3pdFG”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_66879e0b3c40e9cdf374f8ea”,“index”:“_66879e0b3c40e9cdf374f8ea”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_66879e0b3c40e9cdf374f8ea/jSUejNL3pdFG/lsm/property_key/segment-1720443037285297770”,“shard”:“jSUejNL3pdFG”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_66879e0b3c40e9cdf374f8ea”,“index”:“_66879e0b3c40e9cdf374f8ea”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_66879e0b3c40e9cdf374f8ea/jSUejNL3pdFG/lsm/property_text/segment-1720443037285733078”,“shard”:“jSUejNL3pdFG”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_66879e0b3c40e9cdf374f8ea”,“index”:“_66879e0b3c40e9cdf374f8ea”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_66879e0b3c40e9cdf374f8ea/jSUejNL3pdFG/lsm/property_text_searchable/segment-1720443037286143541”,“shard”:“jSUejNL3pdFG”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_66879e0b3c40e9cdf374f8ea”,“index”:“_66879e0b3c40e9cdf374f8ea”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_66879e0b3c40e9cdf374f8ea/jSUejNL3pdFG/lsm/property_dataType/segment-1720443037286500975”,“shard”:“jSUejNL3pdFG”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_66879e0b3c40e9cdf374f8ea”,“index”:“_66879e0b3c40e9cdf374f8ea”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_66879e0b3c40e9cdf374f8ea/jSUejNL3pdFG/lsm/property_dataType_searchable/segment-1720443037286748076”,“shard”:“jSUejNL3pdFG”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“lsm_recover_from_active_wal”,“class”:“_66879e0b3c40e9cdf374f8ea”,“index”:“_66879e0b3c40e9cdf374f8ea”,“level”:“warning”,“msg”:“active write-ahead-log found. Did weaviate crash prior to this? Trying to recover…”,“path”:“/var/lib/weaviate/_66879e0b3c40e9cdf374f8ea/jSUejNL3pdFG/lsm/property__id/segment-1720443037287251378”,“shard”:“jSUejNL3pdFG”,“time”:“2024-07-08T16:11:49Z”}

{“level”:“info”,“msg”:“Completed loading shard _66879e0b3c40e9cdf374f8ea_jSUejNL3pdFG in 5.402865ms”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“hnsw_vector_cache_prefill”,“count”:3000,“index_id”:“main”,“level”:“info”,“limit”:1000000000000,“msg”:“prefilled vector cache”,“time”:“2024-07-08T16:11:49Z”,“took”:4717552}

{“action”:“bootstrap”,“leader”:“172.25.0.2:8300”,“level”:“info”,“msg”:“successfully joined cluster”,“time”:“2024-07-08T16:11:49Z”}

{“action”:“attach_tombstone_to_deleted_node”,“level”:“info”,“msg”:“found a deleted node (21) without a tombstone, tombstone was added”,“node_id”:21,“time”:“2024-07-08T16:15:06Z”}

{“action”:“requests_total”,“api”:“rest”,“class_name”:“_6672b5b6cd7159bdb2f6b575”,“error”:“put object: import into index _6672b5b6cd7159bdb2f6b575: put local object: shard="7EtZRTDHZyWY": update vector index: insert doc id 23 to vector index: find and connect neighbors: at level 0: pick entrypoint at level beginning: context deadline exceeded”,“level”:“error”,“msg”:"unexpected

{“action”:“hybrid”,“error”:“explorer: get class: vector search: object vector search at index _6672b5b6cd7159bdb2f6b575: shard _6672b5b6cd7159bdb2f6b575_7EtZRTDHZyWY: vector search: entrypoint was deleted in the object store, it has been flagged for cleanup and should be fixed in the next cleanup cycle”,“level”:“error”,“msg”:“denseSearch failed”,“time”:“2024-07-08T16:17:39Z”}

{“action”:“hybrid”,“error”:“explorer: get class: vector search: object vector search at index _6672b5b6cd7159bdb2f6b575: shard _6672b5b6cd7159bdb2f6b575_7EtZRTDHZyWY: vector search: entrypoint was deleted in the object store, it has been flagged for cleanup and should be fixed in the next cleanup cycle”,“level”:“error”,“msg”:“denseSearch failed”,“time”:“2024-07-08T16:22:40Z”}

time=“2024-07-08T16:23:53Z” level=error msg=“unregistering callback ‘shard/_6672b5b6cd7159bdb2f6b575/7EtZRTDHZyWY/vector/tombstone_cleanup’ of ‘index/_6672b5b6cd7159bdb2f6b575/vector/tombstone_cleanup’ failed: context deadline exceeded, unregistering callback ‘shard/_6672b5b6cd7159bdb2f6b575/7EtZRTDHZyWY/geoProps/tombstone_cleanup’ of ‘index/_6672b5b6cd7159bdb2f6b575/geo_props/tombstone_cleanup’ failed: context deadline exceeded, unregistering callback ‘shard/_6672b5b6cd7159bdb2f6b575/7EtZRTDHZyWY/geo_props/commit_logger’ of ‘index/_6672b5b6cd7159bdb2f6b575/geo_props/commit_logger’ failed: context deadline exceeded” action=drop_shard class=_6672b5b6cd7159bdb2f6b575 id=_6672b5b6cd7159bdb2f6b575_7EtZRTDHZyWY

{“action”:“delete_index”,“class”:“_6672b5b6cd7159bdb2f6b575”,“level”:“error”,“msg”:“drop: stop vector tombstone cleanup cycle: context deadline exceeded”,“time”:“2024-07-08T16:24:53Z”}

{“level”:“warning”,“msg”:“prop len tracker file /var/lib/weaviate/_6672b5b6cd7159bdb2f6b575/ZW2YQ2tGWrQ1/proplengths does not exist, creating new tracker”,“time”:“2024-07-08T16:32:48Z”}

{“level”:“info”,“msg”:“Created shard _6672b5b6cd7159bdb2f6b575_ZW2YQ2tGWrQ1 in 2.132659ms”,“time”:“2024-07-08T16:32:48Z”}

{“action”:“hnsw_vector_cache_prefill”,“count”:1000,“index_id”:“main”,“level”:“info”,“limit”:1000000000000,“msg”:“prefilled vector cache”,“time”:“2024-07-08T16:32:48Z”,“took”:76692}

{“level”:“warning”,“msg”:“prop len tracker file /var/lib/weaviate/_6676cbe571e518ab887dd0ae/hGkeYTzf7eMF/proplengths does not exist, creating new tracker”,“time”:“2024-07-08T16:34:26Z”}

{“level”:“info”,“msg”:“Created shard _6676cbe571e518ab887dd0ae_hGkeYTzf7eMF in 761.399µs”,“time”:“2024-07-08T16:34:26Z”}

{“action”:“hnsw_vector_cache_prefill”,“count”:1000,“index_id”:“main”,“level”:“info”,“limit”:1000000000000,“msg”:“prefilled vector cache”,“time”:“2024-07-08T16:34:26Z”,“took”:79843}

{“action”:“tombstone_cleanup_begin”,“class”:“_6672b5b6cd7159bdb2f6b575”,“level”:“info”,“msg”:“class _6672b5b6cd7159bdb2f6b575: shard ZW2YQ2tGWrQ1: starting tombstone cleanup”,“shard”:“ZW2YQ2tGWrQ1”,“time”:“2024-07-08T16:37:48Z”,“tombstones_in_cycle”:1,“tombstones_total”:1}

{“action”:“tombstone_cleanup_begin”,“class”:“_6676cbe571e518ab887dd0ae”,“level”:“info”,“msg”:“class _6676cbe571e518ab887dd0ae: shard hGkeYTzf7eMF: starting tombstone cleanup”,“shard”:“hGkeYTzf7eMF”,“time”:“2024-07-08T17:14:26Z”,“tombstones_in_cycle”:1,“tombstones_total”:1}


Hey, could someone please help answer this?
@DudaNogueira

Please let me know if any additional information is required.

hi @AbhiP !!

Do you have any resource usage readings?

Also, have you tried the latest version? I believe this issue was recently fixed.

Hey @DudaNogueira , thanks for responding to my query!

I'm running two Docker containers on an EC2 instance (one of which is the Weaviate server). The EC2 instance has 1 vCPU and 8 GB of RAM.

With the Weaviate Python v3 client, if I use a batch upload operation, CPU usage rises to 100% and a batch upload error is thrown, but on automatic retry the operation completes.
I moved from the v4 Python client to the v3 client because of the issues mentioned when I started this thread.
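For context, here is a rough sketch of what the v3 batch import looks like (class name, properties, and batch settings are placeholders, not my exact code):

```python
import weaviate  # weaviate-client v3.x

client = weaviate.Client("http://my-ec2-host.example.com:8080")  # placeholder URL

# Dynamic batching with retries so a timed-out batch is re-sent automatically.
client.batch.configure(
    batch_size=50,
    dynamic=True,
    timeout_retries=3,
    connection_error_retries=3,
)

objects = [{"text": "example chunk", "docName": "doc-1"}]  # placeholder objects

with client.batch as batch:
    for obj in objects:
        batch.add_data_object(
            data_object=obj,
            class_name="MyCollection",  # placeholder collection name
        )
```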

Now, with the Weaviate v3 client, I get this error when attempting retrieval:
“explorer: get class: vector search: object vector search at index superdash_6672b5b6cd7159bdb2f6b575: shard superdash_6672b5b6cd7159bdb2f6b575_ZW2YQ2tGWrQ1: vector search: entrypoint was deleted in the object store, it has been flagged for cleanup and should be fixed in the next cleanup cycle”

Under regular usage (only retrieval operations with the Weaviate v4 Python client), server CPU usage doesn't rise above 5%.

In response to your other question, no, I haven’t tried the latest version.

Questions:

  1. Are the server specs mentioned above insufficient for running Weaviate in a Docker container?
  2. Is it possible to upgrade the Weaviate server version running in the Docker container without requiring a complete data migration?

Hey @DudaNogueira, I can confirm that with the Weaviate Python v4 client as well, CPU usage rises to 100% on object upload.
Object deletion sometimes completes fine and sometimes times out; it is not consistent.
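For reference, here is a rough sketch of the v4 import path (collection name, properties, and batch settings are placeholders; the single concurrent request is just an attempt to keep load on the single vCPU predictable):

```python
import weaviate

client = weaviate.connect_to_custom(
    http_host="my-ec2-host.example.com",  # placeholder
    http_port=8080,
    http_secure=False,
    grpc_host="my-ec2-host.example.com",
    grpc_port=50051,
    grpc_secure=False,
)

try:
    collection = client.collections.get("MyCollection")  # placeholder name

    # Fixed-size batches with a single concurrent request, to avoid
    # saturating the 1 vCPU on the EC2 instance during imports.
    with collection.batch.fixed_size(batch_size=50, concurrent_requests=1) as batch:
        for obj in [{"text": "example chunk", "docName": "doc-1"}]:  # placeholder objects
            batch.add_object(properties=obj)

    # Objects that still failed after client-side retries end up here.
    print(collection.batch.failed_objects)
finally:
    client.close()
```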

Will increasing the number of CPU cores on the EC2 instance help mitigate this problem?

@DudaNogueira We are facing this issue. What is one way to resolve it?

Hey @DudaNogueira, any support here?
We would be happy even if you could point us to some documentation we may have missed that could help us solve the issue.

Hi @ThomDB!! Welcome to our community. What server version are you using?

@AbhiP Have you tried this on the latest version?

And how many objects do you have in place?

Thanks!

Hey @DudaNogueira, as I have already mentioned, I have not yet tried this on the latest version because this is a production DB. I have already asked you some specific questions related to this; let me repeat them:

  1. Are the server specs mentioned above insufficient for running Weaviate in a Docker container?
  2. Is it possible to upgrade the Weaviate server version running in the Docker container without requiring a complete data migration?

The number of individual objects is on the order of ~10,000.

hi!

You can take a backup and then change the image tag in your Docker Compose file.

Then, if something goes wrong, just revert to the original Docker image version and restore the data.

You can also take the backup, spin up a new Weaviate server, restore the data there, and run some tests first.
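As a minimal sketch with the v3 Python client (the backup id is just an example, and this assumes the backup-filesystem module is enabled on the server with BACKUP_FILESYSTEM_PATH set):

```python
import weaviate  # weaviate-client v3.x

client = weaviate.Client("http://my-ec2-host.example.com:8080")  # placeholder URL

# 1. Take a backup before changing the image tag in docker-compose.yml.
client.backup.create(
    backup_id="pre-upgrade-2024-07-08",  # example id
    backend="filesystem",                # requires the backup-filesystem module
    wait_for_completion=True,
)

# 2. Change the Weaviate image tag in docker-compose.yml and restart the
#    container. The data in the mounted volume is reused as-is; the backup
#    is only a safety net for rolling back.

# 3. If something goes wrong, revert the image tag, start the old version,
#    and restore:
# client.backup.restore(
#     backup_id="pre-upgrade-2024-07-08",
#     backend="filesystem",
#     wait_for_completion=True,
# )
```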

Let me know if this helps!

Thanks!