Downtime
Incident Report for Calidity
Postmortem

At the start of the incident, Janar discovered that the Kubernetes cluster has no nodes servicing it. This means all workloads on the Kubernetes cluster were stopped.

Google Cloud Console showed all the nodes that were supposed to service the cluster as online, leading us to believe that they had somehow disconnected / deauthorized from the cluster.

We spun down all the disconnected nodes and spun up new servers to start servicing the workloads.

The cluster immediately started self-healing to restore all workloads to operational condition.

Posted Nov 19, 2021 - 09:09 UTC

Resolved
Cluster has self-recovered, all operations working.
Posted Nov 19, 2021 - 09:07 UTC
Monitoring
The cluster has self-recovered.
Posted Nov 19, 2021 - 09:01 UTC
Update
We are continuing to work on a fix for this issue.
Posted Nov 19, 2021 - 09:00 UTC
Update
We are continuing to work on a fix for this issue.
Posted Nov 19, 2021 - 08:59 UTC
Identified
The Google Cloud Kubernetes cluster has unexpectedly disconnected all servers from the cluster.
Posted Nov 19, 2021 - 08:52 UTC
Investigating
Cluster is offline, backend is unavailable
Posted Nov 19, 2021 - 08:47 UTC
This incident affected: Cloud services (API, Calidity Web App) and Data collectors (LoRa data collector, Cloud Gateway data collector, Sigfox data collector, Estfeed data collector).