The ANAO also recommended that the ATO determine the level of availability of services associated with IT systems to include in service standards and subsequently report performance against those standards.
For its part, the ATO has agreed to all three recommendations, with the agency’s CIO, Ramez Katf, saying that a dedicated program of work to enhance the resilience, performance and stability of the ATO’s IT systems was already underway.
“We will focus on improving our IT design and governance, further strengthening our cyber security posture and improving the technology used by ATO staff to ensure they have the right tools to do their job,” he said.
SAN migration project
The ongoing program comes after the ATO approved a new storage strategy in February 2017 that was proposed by DXC Technology.
According to the ANAO report, the strategy proposed by DXC involved the migration of all data off the failed storage array, and the replacement of the storage network devices with new XP7 storage components at both the agency’s Sydney Data Centre and the Western Sydney Data Centre.
The work to replace the previous 3PAR SAN was carried out in the ATO’s Rebuild Program under the so-called SAN migration project, and included replacing the damaged disk drives, replacing all optical cables, updating the firmware and independent testing.
The ANAO said that DXC Technology decommissioned the failed 3PAR SAN supporting the production environment by July 2017. The SAN migration was carried out in phases once the new XP7 SAN was installed. A final report and certification were issued in June 2017.
As previously reported, the defective 3PAR SAN was then sent to HPE laboratories in the United States for forensic analysis into the root cause of the failed storage drives.
A report from DXC Technology is expected in early 2018.
“The SAN Migration project was to ‘stand up’ dual XP7 SANs in the Sydney and the Western Sydney Data Centres,” the report stated.
“The storage environment includes replicated storage arrays across data centres, a feature absent from the ‘original’ 3PAR SAN-supported environment. The updated storage configuration will be monitored for capacity and performance,” it said.
The ANAO also said that, according to DXC Technology, the dual XP7 storage configuration should provide better performance than the prior infrastructure.
Specifically, if a storage drive appears to be failing, the array is designed not to lock the troubled drive straight away, but to rebuild that drive’s contents on one of the spare storage drives.
“The storage array will continue to write to the disk until it is completely rebuilt on the spare drive and then lock out the failing drive,” the ANAO said.
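The behaviour the ANAO describes can be illustrated with a toy model. The Python sketch below is not DXC or XP7 code; the `Drive` and `Array` classes, their method names and the block-by-block copy are all assumptions made purely for illustration. It shows only the ordering described above: the suspect drive keeps accepting writes while its contents are copied to a spare, and it is locked out only once the spare holds a complete copy.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    name: str
    blocks: dict = field(default_factory=dict)  # block number -> data
    locked: bool = False


class Array:
    """Toy model with one active drive and one spare (hypothetical names)."""

    def __init__(self, active: Drive, spare: Drive):
        self.active = active
        self.spare = spare
        self.rebuilding = False
        self._pending = []

    def write(self, block: int, data: str) -> None:
        # Writes keep going to the active drive, even while it is suspect.
        self.active.blocks[block] = data
        if self.rebuilding:
            # Mirror new writes so the spare does not fall behind.
            self.spare.blocks[block] = data

    def start_rebuild(self) -> None:
        # The suspect drive is NOT locked here; it stays in service.
        self.rebuilding = True
        self._pending = sorted(self.active.blocks)

    def rebuild_step(self) -> bool:
        """Copy one block to the spare; return True once the rebuild is done."""
        if not self.rebuilding:
            raise RuntimeError("no rebuild in progress")
        if self._pending:
            block = self._pending.pop(0)
            self.spare.blocks[block] = self.active.blocks[block]
        if self._pending:
            return False
        # Only now, with a complete copy on the spare, lock out the old drive.
        self.active.locked = True
        self.active, self.spare = self.spare, None
        self.rebuilding = False
        return True


def demo():
    suspect = Drive("disk-0", {0: "a", 1: "b"})
    array = Array(suspect, Drive("spare-0"))
    array.start_rebuild()
    array.rebuild_step()   # first block copied, one still pending
    array.write(2, "c")    # a write arrives mid-rebuild
    while not array.rebuild_step():
        pass
    return suspect, array.active
```

In this sketch, calling `demo()` returns the original drive, now locked, and the former spare, which has taken over as the active drive, including the block written while the rebuild was in progress.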
"This storage configuration provides the ATO with the capability to run its [IT] systems even if two storage disks fail simultaneously."
This feature is understandably seen as an important factor given that the system recovery tools used to restore IT services – data management, system monitoring and backup/restore – were in the same data centre on the affected SAN that went down in December 2016.
The system failure meant that those tools were unavailable, and that there were no backup or redundant system recovery tools available on other IT systems to detect and analyse the incident, or to support efforts to recover and restore services.
The ATO said it has since focused its IT service continuity management on IT technical architecture and design, and on operational risk management, to strengthen the identification and treatment of risks to critical IT infrastructure that could lead to system failures.