Cluster instance stopped and stuck at "Provisioning certificate"


We have one managed Weaviate cluster instance in the GCP Tokyo region that became stuck at “Provisioning certificate…” after running for a few days. It has now been stuck for about 12 hours.

Right before the issue, the instance started returning 502s to queries.

Is there anything we could do to remove or restore this instance?

Hi @naclbit -

We can look into it. Could you DM me or @Glockenbeat your WCS username email please? That will help us to identify the cluster.

Hi jphwang,

Thanks, but it looks like my account on this Discourse forum doesn’t have permission to send DMs (for what it’s worth, we registered with Weaviate using the same e-mail as this account).

Oh I didn’t know that. I found your email address - we’ll be in touch soon :slight_smile:

Hi, the instance in question is still stuck in the same state.

Another instance we created in the meantime had a similar issue an hour ago, but it came back up by itself. However, while search is working on that instance, batch POST requests are now timing out 100% of the time (most of these requests are small: 768-dim 32-bit float vectors, averaging 3-4 objects per batch, and we are vectorizing externally).
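For reference, a rough back-of-the-envelope sketch of the raw vector data per batch, using the sizes mentioned above (this assumes uncompressed 32-bit floats; the actual wire size will be larger with JSON encoding, but the point stands that these payloads are tiny):

```python
# Rough estimate of raw vector data per batch request.
# Figures from above: 768-dim vectors, 32-bit (4-byte) floats,
# roughly 3-4 objects per batch.
DIMS = 768
BYTES_PER_FLOAT = 4       # 32-bit float
OBJECTS_PER_BATCH = 4     # upper end of the 3-4 average

bytes_per_vector = DIMS * BYTES_PER_FLOAT             # 3072 bytes
bytes_per_batch = bytes_per_vector * OBJECTS_PER_BATCH

print(f"{bytes_per_vector} bytes per vector, "
      f"~{bytes_per_batch / 1024:.0f} KiB of vector data per batch")
# → 3072 bytes per vector, ~12 KiB of vector data per batch
```

So each batch carries on the order of tens of kilobytes at most, which is why the consistent timeouts point at the cluster rather than the request size.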

Is there anything we could do to fix the issue? The load is not high right now (a lot lower than at peak). We have no way to inspect the managed cluster’s inner workings, but it feels like the instance failed to recreate/scale itself after a brief outage.

Hi - any help on this issue?

Basically, we currently have one stuck instance (unchanged), and we are having to remove and recreate the working cluster every day once it gets into a 502/timeout crash loop.


You actually hit a very uncommon bug that was present in Weaviate and has been fixed in 1.19.8. We upgraded your cluster to this version yesterday and were monitoring it; unfortunately, it got deleted in the meantime.

However, with the newest version this issue should not occur anymore. If it does, please let us know and we’ll prioritise looking into it.

Hi Glockenbeat -

We had a different issue with other clusters, and have deleted the cluster for the time being. Thanks for the heads-up.