After Sydney suffered from extreme weather conditions on June 5, Amazon Web Services partners worked through the night with clients to resolve issues that arose after the Cloud provider endured a power-related outage.
At 4pm Sydney time, the company reported the power issue at its Sydney region datacentres delivering its EC2 and S3 services.
The company reported that connectivity issues for EC2, Elastic Load Balancing, ElastiCache, Redshift, RDS and Elastic Beanstalk remained at 9pm.
According to Comunet chief executive, Mark Ogden, 100 of the company’s customers were affected and all but one had their issues resolved within three hours.
“Our engineers identified the issues soon after the outage happened and that allowed them to quickly investigate and advise their clients. We had one client that took longer than some of the others, but we were in constant interaction with them and were able to resolve business-critical challenges they experienced,” Ogden said.
Ogden said AWS worked collaboratively with Comunet to mitigate the issues.
“We are seeing this as a good learning opportunity with AWS and will make the necessary adjustments where required to avoid similar events in the future,” he said.
According to RXP Services technology strategist for Cloud, identity and security, Alessandro Cardoso, multiple clients suffered as a result of not having a ‘backup plan’.
“Some of our customers have more than one Cloud provider and some have hybrid, so this helps a lot. Many customers, when they think about Cloud, they think of cost-savings and not enough about proper planning and design, and that is what happened yesterday. Those customers didn’t have a second plan,” Cardoso said.
In recognising that this circumstance could happen to any major public Cloud provider, Cardoso said he advised his customers to understand that a Cloud provider does not have high availability by default.
Strut Digital chief executive, Zack Levy, told ARN that his engineers were restoring services until 3:30am.
However, he said he still trusts AWS to host his own and his customers’ solutions.
“Purely because of the sheer volume of customers the Cloud provider has, you can imagine if they have an interruption, the pressure is so great to get that result, so the smallest customer becomes just as important as their biggest customer,” Levy said.
“Could me and my team have slept some more hours last night? Absolutely. But everything was online so we got the updates regularly as they progressed. At least they’re transparent enough for their customers and partners to go onto the website and see that they experienced problems in real time.”
In line with Cardoso, Levy added that the best piece of advice Strut Digital gives its customers, especially from now on, is to always build solutions assuming something will fail, whether it’s a single server or a complete Amazon availability zone.
“If you make that investment, in incidents like yesterday, you are way less likely to be impacted,” he said.
Bulletproof director of sales and marketing, Mark Randall, said the outage highlighted the importance of a highly available solution.
“Our technical support team were immediately alerted to the issue that occurred overnight and we were able to mitigate the impact to our customers, minimising disruption and returning them to full operation as quickly as possible. Customers appreciated the transparency and communication provided throughout,” he added.