High streaming latency and devices shown as inactive
Incident Report for CVaaS
Postmortem

A network upgrade of the cluster caused some backends to become unreachable. The resulting network degradation delayed the processing of streamed data, causing customers' devices to be shown as inactive and various APIs to stop responding in time. Once the misbehaving servers were removed from the cluster, the system recovered.

This incident is not related to yesterday's performance degradation, but the visible effect for customers is the same.

Posted Jul 31, 2024 - 15:43 UTC

Resolved
This incident has been resolved.
Posted Jul 31, 2024 - 15:33 UTC
Update
The streaming latency is back to normal. We are keeping the incident open for a few more minutes to validate that we have fully recovered.
Posted Jul 31, 2024 - 15:20 UTC
Update
We are still monitoring the situation. The cluster has mostly recovered, but higher latency is still observed for a small fraction of the ingested data.
Posted Jul 31, 2024 - 15:03 UTC
Update
We are continuing to monitor the situation. The backlog of streamed data has been processed. Higher streaming latency is still being observed at the moment.
Posted Jul 31, 2024 - 14:36 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jul 31, 2024 - 14:30 UTC
Identified
The issue has been identified and we are applying a mitigation. At the moment, data is being queued but not processed.
Posted Jul 31, 2024 - 14:18 UTC
Investigating
We are currently investigating this issue.
Posted Jul 31, 2024 - 14:07 UTC
This incident affected: us-central1-a (Core Platform).