Across the US this past Friday, Amazon Alexa’s usual blue ring turned white, indicating that the service was not available. Stemming from a large storm that rattled the Northeast, Amazon’s latest outage affected Alexa and other businesses including Atlassian, MongoDB, and Zillow.
In Friday’s case, AWS experienced a power outage that affected its Direct Connect and network connectivity service offerings. Direct Connect lets a company link its data center networks directly to AWS infrastructure, which is often a faster and more cost-effective way to run a hybrid cloud.
This outage isn’t new or unique. Every public cloud has experienced outages of some degree across its offerings. So with 93% of organizations using cloud services in one form or another, how do organizations minimize the impact of network outages?
1. Do not treat the cloud like a data center
This outage provides a moment to reflect: the cloud is not just another data center. It is a large toolbox that can be immensely useful when utilized appropriately. The toolbox is not a full solution on its own; you must build your solution with the tools it provides. For the cloud to provide continuity for your business during disaster scenarios, a strategy must be carefully thought out and implemented by experienced teams.
2. Plan for redundancy
An outage at one cloud service provider (CSP) typically has minimal effect on another, so Dev9 often recommends running critical systems on a second CSP. In this outage, Microsoft Azure saw only a very minor service degradation in a logging service, while Google Cloud did not experience any outage.
Best practice is to identify the critical systems and components within a solution and to provide a mitigation plan for each. These critical systems can be deployed redundantly across multiple regions of a single CSP or on a separate cloud provider.
Redundancy and high availability are among the primary drivers for adopting a public cloud. Utilize these benefits. In some cases, you will need experienced cloud specialists to help you follow the CSP’s best practices.
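To make the redundancy idea above concrete, here is a minimal sketch of failover selection across regions and providers. Everything in it is an assumption for illustration: the Deployment record, the priority ordering, the inventory, and the simulated health check are hypothetical and not part of any CSP's API. A real setup would usually rely on DNS failover or a load balancer with real health probes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Deployment:
    """One copy of a critical system (hypothetical record for this sketch)."""
    provider: str   # e.g. "aws", "azure"
    region: str
    priority: int   # lower number = more preferred


def pick_deployment(
    deployments: Iterable[Deployment],
    is_healthy: Callable[[Deployment], bool],
) -> Optional[Deployment]:
    """Return the most preferred healthy deployment, or None if all are down."""
    for candidate in sorted(deployments, key=lambda d: d.priority):
        if is_healthy(candidate):
            return candidate
    return None


# Hypothetical inventory: a primary, a second region, and a second provider.
DEPLOYMENTS = [
    Deployment("aws", "us-east-1", priority=0),
    Deployment("aws", "us-west-2", priority=1),
    Deployment("azure", "eastus", priority=2),
]


def provider_is_down(down_provider: str) -> Callable[[Deployment], bool]:
    """Build a fake health check that fails every deployment on one provider."""
    return lambda d: d.provider != down_provider


if __name__ == "__main__":
    # Simulate a provider-wide outage and see where traffic would go.
    print(pick_deployment(DEPLOYMENTS, provider_is_down("aws")))
```

With a region-only outage the sketch falls back to the second region of the same provider; only a provider-wide failure pushes traffic to the second CSP, which is why the standby on a separate provider matters for the most critical systems.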
3. Make and practice disaster recovery plans
Multi-regional infrastructure solutions are becoming more common among businesses, but the plan typically covers only data and compute. Mitigating wide-scale outages involves evaluating every aspect of your cloud infrastructure for single points of failure, such as Direct Connect, VPNs, and stateful applications.
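As a rough starting point for that review, the sketch below flags any component that exists in only one location. The inventory, the component names, and the "fewer than two locations" rule are assumptions made for illustration; a real audit would pull this data from your infrastructure tooling and apply stricter criteria.

```python
from collections import defaultdict

# Hypothetical inventory: (component role, provider, region).
INVENTORY = [
    ("compute", "aws", "us-east-1"),
    ("compute", "aws", "us-west-2"),
    ("database", "aws", "us-east-1"),
    ("database", "aws", "us-west-2"),
    ("direct-connect", "aws", "us-east-1"),
    ("vpn", "aws", "us-east-1"),
]


def single_points_of_failure(inventory):
    """Return, sorted, the roles deployed in fewer than two distinct locations."""
    locations = defaultdict(set)
    for role, provider, region in inventory:
        locations[role].add((provider, region))
    return sorted(role for role, locs in locations.items() if len(locs) < 2)


if __name__ == "__main__":
    for role in single_points_of_failure(INVENTORY):
        print(f"single point of failure: {role}")
```

In this made-up inventory, compute and database are multi-regional while the network links are not, which mirrors the gap described above.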
Remember to make time to practice disaster recovery plans to ensure business continuity.
4. Assess your cloud maturity
Take some time to conduct a Cloud Readiness assessment. It helps you identify weak areas within your infrastructure and formulate an effective plan to keep your business operational when outages occur.