Event Summary:
On September 28th, 2022, between approximately 16:41 UTC and 21:08 UTC, we experienced widespread service interruptions in the Americas (AMER) region of the WatchGuard Cloud (WGC) and partial service interruptions in the European (EMEA) and Asia-Pacific (APAC) regions. The event is now resolved, and all WatchGuard Cloud services are operating normally for all users in all regions.
Customers registered to the AMER WGC region were unable to log in, administer accounts or operators, configure products, view dashboards, or generate reports for products through the WGC platform. The event also impacted the operation of many cloud-based services, such as AuthPoint authentications and cloud management for Fireboxes and Access Points. Device log ingestion and WatchGuard endpoint security clients were not impacted.
Customers registered to the EMEA or APAC WGC regions were unable to log in to the WGC Web UI until 18:07 UTC. Customers who were already logged in were not impacted, and all other services in those regions continued to operate normally.
Event Findings:
At approximately 16:20 UTC on September 28th, 2022, error rates began slowly increasing on multiple components in the Americas (AMER) region of the WatchGuard Cloud (WGC). By 16:41 UTC, the error rates had climbed high enough to cause service interruptions and trigger our alarms, and they ultimately resulted in widespread issues throughout the WGC platform and other cloud-based products. Because logins to the European (EMEA) and Asia-Pacific (APAC) WGC regions involve an initial redirection from the AMER WGC region before authentication, customer logins in those regions were also impacted.
Our on-call engineers were alerted immediately, and at 16:52 UTC we determined that the high error rates originated within our third-party infrastructure provider. We began working to mitigate the impact of this large-scale infrastructure failure, and by 18:07 UTC we had deployed changes that restored WGC Web UI login in our EMEA and APAC regions.
Both our infrastructure provider and our teams continued working to mitigate the impact in our AMER region. By 20:37 UTC, error rates on our components began decreasing, which allowed partial recovery of the impacted products. At 21:08 UTC, all products returned to normal operation in all WGC regions.
We sincerely apologize for the impact on our affected customers. We know the stability of the WatchGuard Cloud is important to you and your business.