Cluster instance stopped and stuck at "Provisioning certificate"


We have one managed Weaviate cluster instance in the GCP Tokyo region that became stuck at “Provisioning certificate…” after running for a few days. It has now been stuck for about 12 hours.

Right before the issue, the instance started returning 502s to queries.

Is there anything we could do to remove or restore this instance?

Hi @naclbit -

We can look into it. Could you DM me or @Glockenbeat your WCS username email please? That will help us to identify the cluster.

Hi jphwang,

Thanks, but it looks like my account on this Discourse forum doesn’t have permission to send DMs (for what it’s worth, we registered with Weaviate using the same e-mail as this account).

Oh I didn’t know that. I found your email address - we’ll be in touch soon :slight_smile:

Hi, the instance in question is still stuck in the same state.

Another instance we created in the meantime had a similar issue an hour ago, but it came back up by itself. However, while search is working on that instance, batch POST requests are now timing out 100% of the time (most of these requests are small: 768-dim 32-bit float vectors, averaging 3-4 objects per batch, and we are vectorizing externally).
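For reference, a rough back-of-the-envelope sketch of the raw vector data per batch, using the sizes mentioned above (this assumes uncompressed 32-bit floats; the actual wire size will be larger with JSON encoding, but the point stands that these payloads are tiny):

```python
# Rough estimate of raw vector data per batch request.
# Figures from above: 768-dim vectors, 32-bit (4-byte) floats,
# roughly 3-4 objects per batch.
DIMS = 768
BYTES_PER_FLOAT = 4       # 32-bit float
OBJECTS_PER_BATCH = 4     # upper end of the 3-4 average

bytes_per_vector = DIMS * BYTES_PER_FLOAT             # 3072 bytes
bytes_per_batch = bytes_per_vector * OBJECTS_PER_BATCH

print(f"{bytes_per_vector} bytes per vector, "
      f"~{bytes_per_batch / 1024:.0f} KiB of vector data per batch")
# → 3072 bytes per vector, ~12 KiB of vector data per batch
```

So each batch carries on the order of tens of kilobytes at most, which is why the consistent timeouts point at the cluster rather than the request size.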

Is there anything we could do to fix the issue? The load is not high right now (a lot lower than at peak). We have no way to inspect the managed cluster’s inner workings, but it feels like the instance failed to recreate/scale itself after a brief outage.

Hi - any help on this issue?

Basically, we currently have one stuck instance (unchanged), and we are having to remove and recreate the working cluster every day once it gets into a 502/timeout crash loop.


You actually hit a very uncommon bug that was present in Weaviate and has been fixed in 1.19.8. We upgraded your cluster to this version yesterday and were monitoring it; unfortunately, it got deleted in the meantime.

However, with the newest version this issue should not occur anymore. If it does, please let us know and we’ll prioritise looking into it.

Hi Glockenbeat -

We had a different issue with other clusters, and have deleted the cluster for the time being. Thanks for the heads-up.