How Vodafone standardised its monitoring around Splunk
- 02 July, 2018 13:00
Vodafone is rapidly centralising its global IT monitoring and event management around Splunk tools. The move is helping its IT operations teams get better uptime from mission-critical applications and start using machine learning to head off incidents.
Speaking at Splunk Live in London, Luke Bradley, senior manager of engineering and operations for Technology Shared Services at Vodafone, said: "IT is increasingly becoming more and more the mechanism by which we differentiate ourselves within the market.
"Being a telco is being a telco, the services you can offer over the top of that is what makes the difference.
"We are very much on our digital transformation journey, this introduces an additional set of requirements to really put analytics and data at the centre of what we do."
The Technology Shared Services group is an internal division of Vodafone Group that provides IT services, such as service desk, infrastructure management and application operations, across 26 geographies. The division alone employs 8,500 people.
Over the past four years Vodafone has been standardising its infrastructure and application performance management (APM) monitoring around Splunk tools, and consolidating all event management onto a single IT service management (ITSM) platform.
"So we have been trying to create a single operational analytics platform that sits on top of all of this stuff," Bradley said, regardless of user group, geography or use case. "We are really trying to get to the point where we have a single store of operational data."
The re-architected monitoring programme now covers 40,000 servers and 3,500 applications with 430TB of data capacity. It has also led Vodafone to reverse the way it approaches monitoring.
"We have traditionally taken a bottom up approach to monitoring, as most organisations have - the operations team has traditionally struggled to map issues to something of real significance - so as part of that monitoring transformation we have turned things on their head," Bradley added.
"We are taking a top-down view of business services for all markets, we are standardising the concept of a business service across those countries, so the act of visiting a mobile phone shop in Dusseldorf is logically the same as in Dublin."
ITSI and machine learning
That single ITSM platform is now monitored using Splunk's IT Service Intelligence (ITSI), giving the operations team full visibility of the platform.
Talking in more detail about the ITSM platform, Stefan Ciobanu, product owner for analytics and big data solutions at Vodafone, said: "We are now running one of the largest IT service management platforms."
Vodafone's ITSM platform has around 13,000 daily users generating 2,000 tickets per day.
"So this platform can’t go down," Ciobanu said. "Having this level of tickets on a global scale you have to have a solution to monitor it perfectly and ensure uptime is 99.99 per cent minimum.
"This empowered operations to predict downtime and make sure the platform is running at higher capacity. By doing so we gained our objective of a higher availability platform, so giving operations teams the correct vision inside all of our services.
"By using the predictive solutions in ITSI we managed to do preventative maintenance and helped avoid incidents."
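To put the 99.99 per cent minimum uptime target in perspective, the short Python sketch below converts an availability percentage into a downtime budget. It is illustrative arithmetic only, not part of Vodafone's or Splunk's tooling. The function name and the 365-day year are assumptions made for the example.

```python
# Illustrative only: a hypothetical helper, not a Vodafone or Splunk API.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_pct: float,
                            period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the maximum downtime, in minutes, allowed over a period
    for a given availability target expressed as a percentage."""
    return period_minutes * (1 - availability_pct / 100)


# Yearly and 30-day budgets for a 99.99 per cent availability target
print(round(downtime_budget_minutes(99.99), 1))
print(round(downtime_budget_minutes(99.99, 30 * 24 * 60), 2))
```

At 99.99 per cent, that leaves roughly 52.6 minutes of downtime a year, or about 4.3 minutes in a 30-day month, for a platform handling 2,000 tickets a day.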
Vodafone is now looking to roll ITSI out to other mission-critical services.
Now that it has this single monitoring capability, Vodafone plans to make greater use of machine learning for smarter, more predictive alerting and issue resolution.
"With the machine learning toolkit on Splunk we are building a community of data analysts to deploy more and more predictive maintenance on our tools and applications," Ciobanu said.
(By Scott Carey, Computerworld UK)