Elevated connection and request failures

Incident Report for CVaaS

Postmortem

Incident Report: CVaaS Service Disruption - September 8, 2025
On September 8, 2025, CloudVision as-a-Service experienced a service disruption whose duration varied by region, from 8 minutes in our UK region (cv-prod-uk-1) to 25 minutes in a US region (us-central1-a). During this time, users were unable to establish new connections, and users with existing connections saw transient connection errors.

We do not expect any data loss, as device state streaming was coalesced and resumed after the disruption. This incident was not caused by a security breach, intrusion, or other malicious activity.

Root Cause

An internal certificate within our trust chain expired, causing the disruption. The certificate was not properly tracked by our certificate management system, which sends expiry notifications and renews certificates automatically, so it was allowed to expire.
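For illustration only, the kind of expiry check described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of the CVaaS monitoring system: the inventory names, the dates, the 30-day threshold, and the helper functions `days_until_expiry` and `expiring_soon` are all hypothetical. The only library call used is the standard `ssl.cert_time_to_seconds`, which parses a certificate's `notAfter` string.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days remaining before a certificate's notAfter time.

    `not_after` uses the textual form found in certificates,
    for example "Mar 10 12:00:00 2030 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def expiring_soon(inventory: dict, now: datetime, warn_days: float = 30):
    """List (name, days_left) for certificates at or below the threshold.

    Results are sorted so the most urgent certificate comes first.
    """
    flagged = []
    for name, not_after in inventory.items():
        remaining = days_until_expiry(not_after, now)
        if remaining <= warn_days:
            flagged.append((name, remaining))
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical inventory: names and dates are made up for this example.
inventory = {
    "internal-intermediate-ca": "Mar 10 12:00:00 2030 GMT",
    "edge-gateway-leaf": "Jan  1 00:00:00 2032 GMT",
}
now = datetime(2030, 3, 3, tzinfo=timezone.utc)

for name, days_left in expiring_soon(inventory, now):
    print(f"{name} expires in {days_left:.1f} days")
```

A check like this is only as good as its inventory: a certificate that is missing from the list is never flagged, so every certificate in the trust chain, including internal intermediates, has to be enumerated.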

Our Response

Our team was alerted to similar service disruptions across multiple global regions. Because the issue was not transient and was unrelated to our service provider, we posted a status page incident. The disruption was triaged, the root cause identified, and mitigations implemented within 20 minutes. Corrective actions were then carried out simultaneously over approximately the next 35 minutes. The incident was closed after services were restored and an assessment of all regions was completed.

We understand that an outage of this nature is unacceptable, and we apologize to all CloudVision as-a-Service users affected by this incident. We are committed to implementing the changes identified during our internal investigation to prevent this type of failure from happening again.

Thank you for using CloudVision as-a-Service.

Posted Sep 16, 2025 - 17:55 UTC

Resolved

This incident is now fully resolved.

The postmortem for this incident is still pending. Our team is actively reviewing the incident and will post a postmortem update.
Posted Sep 08, 2025 - 15:35 UTC

Update

us-central1-c region is fully operational.
Posted Sep 08, 2025 - 15:31 UTC

Update

us-central1-a region is fully operational.
Posted Sep 08, 2025 - 15:07 UTC

Update

us-central1-b region is fully operational.
Posted Sep 08, 2025 - 14:56 UTC

Update

ausoutheast-1 and apnortheast-1 regions are fully operational.
Posted Sep 08, 2025 - 14:42 UTC

Update

euwest-2 region is fully operational.
Posted Sep 08, 2025 - 14:26 UTC

Update

uk-1, india-1 and na-northeast1-b regions are fully operational.
Posted Sep 08, 2025 - 14:22 UTC

Monitoring

Services are coming back up, but with degraded performance. We are monitoring the situation.
Posted Sep 08, 2025 - 14:15 UTC

Update

A fix is rolling out on all service regions.
Posted Sep 08, 2025 - 14:01 UTC

Identified

We have identified the root cause of the issue. The team is working on a fix.
Posted Sep 08, 2025 - 13:18 UTC

Investigating

We are currently investigating alerts about connection failures affecting all service regions.
Posted Sep 08, 2025 - 12:56 UTC
This incident affected: euwest-2 (Core Platform), apnortheast-1 (Core Platform), us-central1-a (Core Platform), us-central1-c (Core Platform), ausoutheast-1 (Core Platform), na-northeast1-b (Core Platform), uk-1 (Core Platform), us-central1-b (Core Platform), and india-1 (Core Platform).