Weaviate Cloud Hosting Backups

Description

Let’s say something bad happens and I need to fall back on a backup of my cluster hosted on Weaviate Cloud.

What are the capabilities here? I saw in another post that Weaviate has this covered, but it’s stressful not to know exactly how we are covered.

Couldn’t find any documentation on this.

In particular, my questions are:

  • In MongoDB, I can restore to a specific point in time, since regular backups are taken at intervals. Is this possible here?
  • Is there a self-serve way to do a backup?
  • Can I export my Weaviate instance and store it myself as well?

Server Setup Information

  • Weaviate Server Version: Weaviate Cloud Hosted
  • Deployment Method:
  • Multi Node? Number of Running Nodes:
  • Client Language and Version:
  • Multitenancy?: Yes

Any additional Information

hi @Tejas_Sharma !!

All hosted clusters in our cloud have a backup bucket already configured.

This means that, at any time, you can create or restore from your own backups, on top of the automated backups we do from our side.

Those “ad-hoc” backups can be stored for up to 1 month.

Here is some example code for that using the Python v4 client, run against a real serverless cluster in our cloud:

client.collections.delete("Collection1")
collection = client.collections.create("Collection1")
collection.data.insert({"name": "John"})
collection.data.insert({"name": "Mary"})
print("Collection1 exists?", client.collections.exists("Collection1"))
print("Collection1 total count", collection.aggregate.over_all())
backup_task = client.backup.create(backup_id="super-cool-backup-id", backend="gcs", include_collections=["Collection1"], wait_for_completion=True)
print("Backup task", backup_task)
client.collections.delete("Collection1")
print("Collection1 deleted!")
print("Collection1 exists?", client.collections.exists("Collection1"))
restore_task = client.backup.restore(backup_id="super-cool-backup-id", backend="gcs", include_collections="Collection1", wait_for_completion=True)
print("Collection1 restore task", restore_task)
print("Collection1 exists?", client.collections.exists("Collection1"))
print("Collection1 total count", collection.aggregate.over_all())

I got this as the output:

Collection1 exists? True
Collection1 total count AggregateReturn(properties={}, total_count=2)
Backup task error=None status=<BackupStatus.SUCCESS: 'SUCCESS'> path='gs://weaviate-wcs-prod-cust-us-west3-workloads-backups/69e14018-8f2c-4361-bf79-0953902372b3/super-cool-backup-id' backup_id='super-cool-backup-id' collections=['Collection1']
Collection1 deleted!
Collection1 exists? False
Collection1 restore task error=None status=<BackupStatus.SUCCESS: 'SUCCESS'> path='gs://weaviate-wcs-prod-cust-us-west3-workloads-backups/69e14018-8f2c-4361-bf79-0953902372b3/super-cool-backup-id' backup_id='super-cool-backup-id' collections=['Collection1']
Collection1 exists? True
Collection1 total count AggregateReturn(properties={}, total_count=2)

We do not provide a feature in our console to export your backup. :frowning:

However, if you ever need a copy of your backups for testing or in case you want to take your vectors elsewhere - hope that’s not the case!! :slight_smile: - you can always reach out to our super friendly support line at:

Let me know if this answers your questions!

Thanks!

Thanks Duda, so from Python code I can do a backup at any time. Is there a docs link?

Oh, and no migration planned at all. We’re super happy with Weaviate, but this is just something in the back of our minds as a team in case an emergency happens. Planning beforehand!


Glad to hear that!

Here are the docs about backup:

If you need direct access to the backup bucket, then the Bring Your Own Cloud offering is the way to go.

Let me know if that helps!

Thanks!


Hi @DudaNogueira

I’m experiencing significant challenges creating backups on my paid Weaviate Cloud serverless cluster using Azure storage. Despite my best efforts following the available documentation, I haven’t been able to create a working implementation.

Specifically, I’m unable to determine how to properly pass the Azure Storage Account connection string and container name parameters to the Weaviate Cloud backup API. The documentation doesn’t seem to provide clear examples for this use case.

Here’s the code I’ve attempted to use:

import weaviate
import os
import datetime
import argparse
from weaviate.backup import BackupStorage


wcd_url = "https://123abc.c1.europe-west3.gcp.weaviate.cloud"
wcd_api_key = "key..123"
azure_connection_string="DefaultEndpointsProtocol=https;AccountName=..."
azure_container="weaviate-backups"


# Generate backup ID with timestamp
timestamp = datetime.datetime.now().strftime('%Y%m%d%H%M%S')
backup_id = f"backup-{timestamp}"

# Connect to Weaviate
print(f"Connecting to Weaviate at {wcd_url}")
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=wcd_url,
    auth_credentials=weaviate.auth.Auth.api_key(wcd_api_key)
)

try:
    print(f"Starting backup with ID: {backup_id}")
    
    # The most basic create call with just the required parameters
    backup_config = {
        "backend": BackupStorage.AZURE,
        "azure": {
            "container": azure_container,
            "connection_string": azure_connection_string
        }
    }
    
    # Start backup process
    backup = client.backup.create(
        backup_id=backup_id,
        **backup_config
    )

    print(f"Backup initiated: {backup}")
    
    # Get initial status
    status = client.backup.get_status(backup_id)
    print(f"Initial status: {status}")
    
except Exception as e:
    print(f"Error: {str(e)}")
finally:
    client.close()
    print("Connection closed")

hi @filip.s !!

All clusters hosted in our cloud already have the backup buckets configured, and the backups are triggered daily by our platform.

You can check which backup module is used for your cluster with:

client.get_meta()
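
As a rough sketch of what to look for in that response: the metadata normally contains a `modules` mapping, and backup modules are named `backup-<backend>` (for example `backup-gcs`). The helper name `backup_backends` and the sample dictionary below are illustrative assumptions, not actual output from a hosted cluster.

```python
def backup_backends(meta: dict) -> list[str]:
    """Return the backup backends enabled on a cluster, e.g. ["gcs"].

    Assumes `meta` is the dict returned by client.get_meta() and that
    backup modules appear under "modules" as "backup-<backend>".
    """
    modules = meta.get("modules") or {}
    prefix = "backup-"
    return sorted(name[len(prefix):] for name in modules if name.startswith(prefix))


# Illustrative sample only; a real cluster returns more fields.
sample_meta = {"version": "x.y.z", "modules": {"backup-gcs": {}, "text2vec-openai": {}}}
print(backup_backends(sample_meta))  # prints ['gcs']
```

The value returned here is what you would pass as `backend` when creating or restoring a backup.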

On top of the automatic backups we provide, you can also trigger a new backup at any time by passing the backend used in your cluster.

Note: We do not expose those backups in our console.

For any issues with a hosted cluster, the best place to ask for help is Weaviate Cloud

Thanks!

Thanks for your answer.

It is great that you’re doing backups, but I feel I don’t have enough control over them to be confident that I am covered when something happens.

Take a SQL database in Azure as an example. In the Azure portal, I can see a list of my regular backups and trigger a restore in a few clicks. I feel that I am in control.
Correct me if I’m wrong, but I don’t see such an option in Weaviate.

What is the process when there is some kind of data loss (caused by whatever: a hacking incident, an error in our client app, etc.) and I want to see which backups exist in Weaviate and trigger a restore?

Thanks for your help

hi @filip.s !

You are right. We do not expose the list of backups available for your cluster hosted in our cloud.

However, you can always reach out to our support team, and we’ll send you the IDs of the backups we currently have, whether created by us or by you.

With those IDs, you can restore your data at any time using client.backup.restore.

Let me know if that helps!

Thanks!

Thanks @DudaNogueira , it is clearer for me now.

In case of incidents, reaching out to Weaviate support may end up taking too long (I assume you guys don’t work 24/7).
Is there any reason why you decided not to expose cloud backups to your customers?

I can also do backups manually, using the functionality in your Python library that saves backups to an Azure Storage Account. However, I wasn’t able to make it work based on your documentation, nor did I find any working examples online. Would you mind sharing a simple script that works?

Thanks for your help.

hi @filip.s !

Our support team covers all timezones.

Here you can find more information on response time for our support levels according to the severity level:

I don’t believe there is any specific reason for the backups not being exposed other than it was not yet implemented.

I have asked internally about this feature.

I can also do backups manually, using the functionality in your Python library that saves backups to an Azure Storage Account. However, I wasn’t able to make it work based on your documentation, nor did I find any working examples online. Would you mind sharing a simple script that works?

You can’t do that, actually. The backup module is configured at the cluster level, so you can only back up to the backend that is already configured for your cluster.

You can migrate your data over to a new cluster, or store it locally, using this migration guide:

Note that we also have a BYOC (Bring Your Own Cloud) offering, where you can customize some of the configuration while still relying on our services.
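
For the "store it locally" option, one possible approach (an illustration, not the migration guide's own code) is to iterate over a collection with vectors included and write each object to a JSON Lines file. The sketch assumes a v4 collection handle whose `iterator(include_vector=True)` yields objects with `uuid`, `properties`, and `vector`; `export_collection` and the file layout are hypothetical. With multi-tenancy enabled, you would likely need to run it once per tenant-scoped handle.

```python
import json


def export_collection(collection, path: str) -> int:
    """Write every object of `collection` to `path` as JSON Lines; return the count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for obj in collection.iterator(include_vector=True):
            record = {
                "uuid": str(obj.uuid),
                "properties": obj.properties,
                "vector": obj.vector,
            }
            # default=str covers values such as datetimes that json cannot encode natively.
            f.write(json.dumps(record, default=str) + "\n")
            count += 1
    return count


# Usage, with a connected client:
# export_collection(client.collections.get("Collection1"), "Collection1.jsonl")
```

A file like this is a plain data export rather than a Weaviate backup: it cannot be passed to `client.backup.restore`, and the collection would have to be recreated and the objects re-inserted to load it again.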

Let me know if that helps!

Thanks!
