Software error complicates Amazon's data center recovery
- 09 August, 2011 19:07
Amazon Web Services' efforts to restore service following a power outage at its Dublin data center were further complicated on Monday by an error in the EBS (Elastic Block Store) software, the company said.
As a result of the software error, the EBS snapshot management system incorrectly thought some of the blocks were no longer being used and deleted them, Amazon wrote on its Service Health Dashboard at 3:11 PM PDT.
The company has addressed the error to prevent it from recurring and also disabled all of the snapshots that contain these missing blocks. Amazon will send e-mails to affected customers as soon as it has a new copy of their snapshots available, which can then be used to recover the data, it said.
At 10:01 PM PDT, Amazon said that it had recovered all of the EBS volumes and EC2 (Elastic Compute Cloud) instances that it was able to verify were fully consistent at the time of the power outage.
However, the company was unable to verify whether any in-flight writes had failed to be saved consistently to some EBS volumes. To remedy that, it has started creating recovery snapshots for the affected volumes. As they become available, the snapshots will be added to users' accounts, according to Amazon. The process will be time-consuming and may take up to 24 hours to complete, it said.
The availability problems started after lightning struck a transformer, sparking an explosion and fire that caused the power outage at 10:41 AM PDT on Sunday.
The last few days have been difficult for Amazon Web Services. Besides the problems in Dublin, Amazon also suffered a brief outage in the U.S. caused by network connectivity issues.