Increasing Backup Size and Duration for Weaviate Index

Hello Weaviate Community,

I’m seeking help with an issue we’re encountering with our Weaviate index backups. In our project, we back up the Weaviate index daily after updating some or all of the documents. Although the total number of documents stays roughly the same, the backup files have grown significantly over time: our backups were initially around 400 MB and are now about 1 GB. The time required to complete each backup has grown along with the size.

Details:
Daily Backup Routine: We update some or all documents every day, then run a backup.
Total Document Count: The overall number of documents has not changed significantly.
Initial Backup Size: Approximately 400 MB.
Current Backup Size: Approximately 1 GB.
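To confirm the trend and see which files account for the growth, it can help to measure each backup on disk after every run. Below is a minimal sketch, assuming backups are written with the filesystem backend (`backup-filesystem` module) to a local directory; the function names are hypothetical helpers, not part of the Weaviate client.

```python
import heapq
import os


def backup_size_bytes(backup_dir: str) -> int:
    """Return the total size in bytes of all regular files under backup_dir."""
    total = 0
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def largest_files(backup_dir: str, n: int = 5) -> list:
    """Return the n largest files as (size_bytes, relative_path), largest first."""
    entries = []
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                entries.append((os.path.getsize(path), os.path.relpath(path, backup_dir)))
    return heapq.nlargest(n, entries)
```

Logging `backup_size_bytes()` and `largest_files()` for each day's backup directory shows whether the growth is spread evenly or concentrated in a few classes or shards.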

Issues:

  1. Increasing Backup Size: The backup size has increased even though the document count has stayed constant.
  2. Increased Backup Time: The backup process also takes longer to complete.

Questions:

  1. What could be causing the backup size to increase even though the document count remains the same?
  2. Are there any best practices or configurations we should consider to prevent this from happening?
  3. How can we optimize the backup process to reduce the time it takes?

Server Setup Information

  • Weaviate Server Version: 1.22.0
  • Deployment Method: Docker
  • Multi Node? Number of Running Nodes: single node
  • Client Language and Version: Python and v3.25.3

Thanks

Hi @Iammsd07!

1.22.0 is a fairly old version, and a lot has improved since then. We strongly suggest upgrading to at least the latest 1.24.x release.

I recall an issue in an earlier version where tombstones were not being properly deleted, so that may be the cause here.

Even though the object count stays the same, do you update those documents in place, or delete and re-insert them?
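To illustrate why the object count and the on-disk size can diverge, here is a toy model of an append-only store. It is a simplification for illustration only, not Weaviate's actual storage engine: every update appends a new record, every delete appends a tombstone, and space is reclaimed only when a compaction step drops superseded records and tombstones. If that cleanup does not happen (as with the tombstone issue mentioned above), the data that gets backed up keeps growing while the live count stays flat. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAppendOnlyStore:
    """Toy append-only store (illustration only, not Weaviate's storage engine)."""

    # Each record is (key, value); a value of None marks a tombstone.
    log: list = field(default_factory=list)

    def put(self, key: str, value: str) -> None:
        # Updates never overwrite in place; they append a newer version.
        self.log.append((key, value))

    def delete(self, key: str) -> None:
        # Deletes append a tombstone instead of removing data immediately.
        self.log.append((key, None))

    def _latest(self) -> dict:
        # Later records win, so this yields the current state of each key.
        latest = {}
        for key, value in self.log:
            latest[key] = value
        return latest

    def live_count(self) -> int:
        return sum(1 for v in self._latest().values() if v is not None)

    def stored_records(self) -> int:
        # Stands in for bytes on disk (and therefore in a backup).
        return len(self.log)

    def compact(self) -> None:
        # Keep only the newest live version of each key; drop tombstones.
        self.log = [(k, v) for k, v in self._latest().items() if v is not None]


store = ToyAppendOnlyStore()
for i in range(100):
    store.put(f"doc-{i}", "v0")

# Five "days" of deleting and re-inserting every document.
for day in range(1, 6):
    for i in range(100):
        store.delete(f"doc-{i}")
        store.put(f"doc-{i}", f"v{day}")

print(store.live_count(), store.stored_records())  # 100 1100
store.compact()
print(store.live_count(), store.stored_records())  # 100 100
```

After five full delete-and-reinsert cycles the live count is still 100, but the stored records have grown elevenfold until `compact()` runs.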

Let me know if this helps.

Thanks!