2022-09-28 - Widespread WGC Issues
Incident Report for WatchGuard Technologies
Postmortem


Event Summary:
On September 28th, 2022, between approximately 16:41 UTC and 21:08 UTC, we experienced widespread service interruptions in the Americas (AMER) region of the WatchGuard Cloud (WGC), as well as partial service interruptions in the European (EMEA) and Asia-Pacific (APAC) regions. The event is now resolved, and all WatchGuard Cloud services are operating normally for all users in all regions.

Customers registered to the AMER WGC region were unable to log in, administer accounts or operators, configure products, view dashboards, or generate reports for products through the WGC platform. The event also impacted the operation of many cloud-based services, such as AuthPoint Authentications and cloud management for Fireboxes and Access Points. Device log ingestion and WatchGuard endpoint security clients were not impacted.

Customers registered to the EMEA or APAC WGC regions were unable to log in to the WGC Web UI until 18:07 UTC. However, customers who were already logged in were not impacted, and all services continued to operate normally.

Event Findings:
At approximately 16:20 UTC on September 28th, 2022, error rates began slowly increasing on multiple components in the Americas (AMER) region of the WatchGuard Cloud (WGC). By 16:41 UTC, they had climbed high enough to trigger our alarms and cause potential service interruptions, ultimately resulting in widespread issues throughout the WGC platform and other cloud-based products. Because logins to the European (EMEA) and Asia-Pacific (APAC) WGC regions are initially redirected through the AMER WGC region before authentication, customer logins in those regions were also impacted. Our on-call engineers were alerted immediately, and at 16:52 UTC we found that the high error rates originated within our 3rd-party infrastructure provider. We began working to mitigate the impact of this large-scale infrastructure failure, and by 18:07 UTC we had deployed changes that restored WGC Web UI login in our EMEA and APAC regions. Both our infrastructure provider and our teams continued working to mitigate the impact in our AMER region. By 20:37 UTC, error rates on our components began decreasing, allowing partial recovery of the impacted products. At 21:08 UTC, all products returned to normal operation in all WGC regions.

We sincerely apologize for the impact on our affected customers. We know the stability of the WatchGuard Cloud is important to you and your business.

Posted Sep 29, 2022 - 21:38 UTC

Resolved
We are no longer experiencing issues in the AMER region of WatchGuard Cloud. This event is now resolved, and we'll continue to monitor for any potential recurrence. All services are operating normally at this time. We'll post a summary with the details of this event within 24 hours. We apologize for any impact this may have had on you or your customers.
Posted Sep 28, 2022 - 21:43 UTC
Monitoring
Our systems show the AMER region of WatchGuard Cloud is returning to normal. Issues within our 3rd-party infrastructure provider have been resolved. We're continuing to monitor to ensure system stability. We'll post our next update in 30 minutes, if not sooner.
Posted Sep 28, 2022 - 21:16 UTC
Update
We're seeing recovery across the AMER region of WatchGuard Cloud, but some services and customers remain impacted by this issue. We'll post our next update in 30 minutes, if not sooner. Thank you for your continued patience.
Posted Sep 28, 2022 - 20:45 UTC
Update
We're still seeing partial recovery in some services, such as AuthPoint Authentications, but many customers remain impacted by this issue. We continue to work on finding potential mitigations in our AMER region and remain in close communication with our 3rd-party infrastructure provider. We'll post our next update in 30 minutes, if not sooner. Thank you for your continued patience.
Posted Sep 28, 2022 - 20:15 UTC
Update
We're still seeing partial recovery in some services, such as AuthPoint Authentications, but many customers remain impacted by this issue. We continue to work on potential mitigations in our AMER region and remain in close communication with our 3rd-party infrastructure provider, who is working to resolve their issues as well. We'll post our next update in 30 minutes, if not sooner. Thank you for your patience.
Posted Sep 28, 2022 - 19:44 UTC
Update
We're starting to see partial recovery in some services, such as AuthPoint Authentications, but many customers remain impacted by this issue. We are still working hard to mitigate impacts in our AMER region and remain in close communication with our 3rd-party infrastructure provider, who is working to resolve their issues as well. We'll post our next update in 30 minutes, if not sooner. Thank you for your patience.
Posted Sep 28, 2022 - 19:11 UTC
Update
We are still working hard to mitigate impacts in our AMER region. We remain in close communication with our 3rd-party infrastructure provider, and they are actively working to resolve the issue. We'll post our next update in 30 minutes, if not sooner. Thank you for your patience.
Posted Sep 28, 2022 - 18:39 UTC
Identified
A mitigation is in place to restore Web UI Login to WatchGuard Cloud in the EMEA and APAC regions. We are still working hard to mitigate impacts in our AMER region. We'll post our next update in 30 minutes, if not sooner. Thank you for your patience.
Posted Sep 28, 2022 - 18:07 UTC
Update
We've identified issues with our 3rd-party infrastructure provider. We're working to mitigate the impact to the WatchGuard Cloud. We'll post our next update in 30 minutes, if not sooner.
Posted Sep 28, 2022 - 17:39 UTC
Update
We are continuing to investigate this issue.
Posted Sep 28, 2022 - 17:14 UTC
Investigating
We're investigating reports of issues in the WatchGuard Cloud. We're working to determine the impact and scale of these potential issues, and we'll post our next update in 30 minutes, if not sooner.
Posted Sep 28, 2022 - 17:04 UTC
This incident affected: AuthPoint Authentication:::AMER (RADIUS:::AMER, SAML:::AMER, Logon Agents:::AMER, ADFS:::AMER, RDWeb:::AMER, Firebox:::AMER, API:::AMER), AuthPoint Administration:::AMER (Account Administration:::AMER, User and Token Administration:::AMER, Reporting:::AMER), WatchGuard Cloud Platform:::AMER (Web UI Login:::AMER, Account Administration:::AMER, Operator Administration:::AMER, Inventory Administration:::AMER), WatchGuard Cloud Visibility:::AMER (Log Search and Log Manager:::AMER, Dashboards and Reports:::AMER), WatchGuard Cloud Platform:::EMEA (Web UI Login:::EMEA), WatchGuard Cloud Platform:::APAC (Web UI Login:::APAC), Firebox Management in WatchGuard Cloud:::AMER (Firebox Configuration:::AMER), and Access Point Management in WatchGuard Cloud:::AMER (Access Point Configuration:::AMER).