Flow data processing outage

Incident Report for CVaaS

Postmortem

Incident Report: CVaaS Service Disruption - September 1, 2025

On September 1, 2025, CloudVision as-a-Service region cv-prod-euwest-2 experienced a disruption that lasted 11 hours and 48 minutes. During this time, the service was unable to ingest flow data successfully. As a result, a gap in data might be seen when viewing traffic flow dashboards, flows in Topology views, event-specific details related to traffic flows, or UNO-related product reporting during this period. This incident was not due to a security breach, intrusion, or otherwise malicious activity.

Root cause

This outage was caused by a subset of our data becoming unreadable. The particular corner case was a result of a race condition found in a specific part of the system that controls data lifecycle and enforces access control through checking permissions. Successful authz is required for any CRUD operations on any part of our data to be successful. When we reach an inconsistent state as such, our system’s default behavior is to fail close. This design, while protective, led to this incident where we could not successfully ingest and write flow data, and the system remained secure by design.

Our Response

Our team worked non-stop to resolve the issue. The extended duration of the outage was due to an initial misdiagnosis of the root cause. Once the correct cause was identified, our team was able to resolve the issue and restore full functionality within approximately 30 minutes.

We understand this broad outage and the impact on your experience is unacceptable. We sincerely apologize to all affected CloudVision as-a-Service users in the affected region. We are committed to implementing changes identified during our internal investigation to prevent this type of failure from happening again.

Thank you for using CloudVision as-a-Service.

Posted Sep 08, 2025 - 18:10 UTC

Resolved

Ingestion of flow data into cv-prod-euwest-2 was unavailable for approximately 12 hours. Consequently, no flow data was ingested or is currently visible for your devices during this period.

Postmortem for this incident is still pending. Our team is currently actively working on reviewing this incident, and postmortem update.

Posted Sep 02, 2025 - 05:27 UTC

Monitoring

We have identified, recovered from the main issue and now monitoring. All flow ingest should resume processing again.

Posted Sep 02, 2025 - 05:06 UTC