Customers now believe in the new enterprise - that is, the enterprise that depends on LANs, WANs, intranets, large-scale systems, and even desktops. But believing in the new enterprise means that organisations trust their priceless corporate data to internal IT departments, which in turn trust their integrator partners. As a result, IT managers want service guarantees from integrators. Increasingly, these guarantees are delivered in the form of service-level agreements (SLAs), in which both customers and integrators can find refuge. Sergio Perez gives you the lowdown on what an SLA can do for you and your customers.
SLAs were initially introduced by telecommunications carriers as a means of establishing a contractual agreement between themselves and their customers. These service providers wanted to establish guidelines for delivering specified levels of availability and/or performance with measurable goals.
Lately, SLAs have had a wide variety of uses, in both internal and external IT functions. Internal SLAs are developed between MIS departments and business units within the same organisation. For example, the finance department can set up an SLA with an MIS department to ensure a specified application response time and system availability.
External SLAs are set up on behalf of an entire organisation. MIS departments will negotiate and establish the SLA with an outside service provider for the greater good of the entire organisation. A good example of an external SLA is an agreement between a telecommunications data-circuit provider and a customer for specified levels of data-circuit availability. The agreement is for the benefit of the organisation as a whole and should be developed between the MIS leaders, the organisation's legal staff, and the appropriate counterparts at the service provider. Conversely, an internal SLA should be created with team members representing the business unit, MIS departments, and the legal department.
Today it's the solutions integrator's turn. SLAs are being implemented to guarantee availability and performance for workstations, network servers (hardware), network operating systems, business applications, network communication devices (hubs, switches, and routers), and data and voice circuits. Think about it - you can add a variable of measurement to just about any operational function in the IT world. Help desk calls, onsite service calls, packet-delivery percentage, and even equipment procurement can all be measured by response time or time-to-resolution.
The idea is to use easily established performance metrics, a thorough scope of services, goals that are achievable, cost structures that are fair, and monitoring software to keep the provider in check.
The SLA is a fairly new animal for many solutions integrators. So how do you go about establishing one? First let's deal with the politics of SLAs.
Successful change managers realise that the most important ingredient for a successful project is executive management buy-in. Sell the top brass on why, how, and where the tools of an SLA can benefit the company. For example, correlate greater uptime with the effect it can have on increased revenues, which in turn should increase profitability.
Learn to quantify and qualify user expectations. Reasonable and obtainable user expectations are probably the most difficult ingredients to quantify. Don't just guarantee 99 per cent uptime or sub-second response time to user enquiries. Instead, be specific. For example, state that the effective throughput for your frame-relay network should be 95 per cent of your CIR (committed information rate). Remember that goals that are set too high can cost service providers their customers and might end up costing someone their job.
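To see why a specific goal is more useful than a bare percentage, it helps to put numbers on both. The sketch below is illustrative only: the function names and the 256kbps CIR are assumptions, not figures from any real agreement. It shows that 99 per cent uptime over a 30-day month still allows roughly 7.2 hours of downtime, and it checks a measured throughput against the 95-per-cent-of-CIR goal mentioned above.

```python
def allowed_downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Hours of downtime a given availability target still permits in a period."""
    return period_hours * (100 - availability_pct) / 100


def meets_throughput_goal(measured_kbps: float, cir_kbps: float,
                          goal_pct: float = 95.0) -> bool:
    """True if measured throughput reaches goal_pct per cent of the CIR."""
    return measured_kbps >= cir_kbps * goal_pct / 100


if __name__ == "__main__":
    # A 30-day month has 720 hours; 99 per cent uptime leaves about 7.2 of them.
    print(allowed_downtime_hours(99.0, 30 * 24))
    # Hypothetical 256kbps CIR: the 95 per cent goal is about 243kbps.
    print(meets_throughput_goal(250.0, 256.0))
    print(meets_throughput_goal(200.0, 256.0))
```

Running the same arithmetic with the customer's real figures before the negotiation shows quickly whether a proposed goal is achievable.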
While you shouldn't make unreasonable guarantees, don't set expectations too low, either. Learn a customer's business and set service-level goals with a financial correlation. For example, specifically state that you will provide a call back within one hour and that, if needed, an engineer will be on site within four hours of the initial problem call.
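Commitments like these are easy to check once they are written down as time limits. The following is a minimal sketch using the one-hour call-back and four-hour on-site figures from the example above; the function and constant names are hypothetical, and a real SLA would also need to say whether the clock runs outside business hours.

```python
from datetime import datetime, timedelta

# Limits taken from the example commitment in the text.
CALL_BACK_LIMIT = timedelta(hours=1)
ON_SITE_LIMIT = timedelta(hours=4)


def response_deadlines(problem_call: datetime) -> dict:
    """Latest acceptable call-back and on-site times for a problem call."""
    return {
        "call_back_by": problem_call + CALL_BACK_LIMIT,
        "on_site_by": problem_call + ON_SITE_LIMIT,
    }


def commitment_met(problem_call: datetime, responded_at: datetime,
                   limit: timedelta) -> bool:
    """True if the response arrived within the agreed limit."""
    return responded_at - problem_call <= limit


if __name__ == "__main__":
    call = datetime(2024, 1, 8, 9, 0)
    print(response_deadlines(call))
    print(commitment_met(call, datetime(2024, 1, 8, 9, 45), CALL_BACK_LIMIT))
```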
Goals that are set too low are essentially meaningless. It is critical to set goals that give users satisfactory or, at least, improved performance. For instance, express that you will escalate a trouble ticket within a certain time period. Ideally, in such a case, you should establish codes to identify certain levels of concerns. If a concern arises at a certain level, escalate it immediately. Establishing a well-defined scope reduces the overall complexity of the SLA.
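One way to picture such escalation codes is as a small table mapping each level of concern to the time a ticket may stay open before it is escalated. The codes and time periods below are invented for illustration; the real ones would be negotiated and written into the SLA.

```python
from datetime import datetime, timedelta

# Hypothetical severity codes and escalation windows (assumptions, not from any SLA).
ESCALATION_WINDOWS = {
    1: timedelta(0),        # critical: escalate immediately
    2: timedelta(hours=1),  # major
    3: timedelta(hours=4),  # minor
}


def must_escalate(severity: int, opened_at: datetime, now: datetime) -> bool:
    """True once an unresolved ticket has been open for its severity's window."""
    return now - opened_at >= ESCALATION_WINDOWS[severity]


if __name__ == "__main__":
    opened = datetime(2024, 1, 8, 9, 0)
    print(must_escalate(1, opened, opened))                          # critical ticket
    print(must_escalate(2, opened, opened + timedelta(minutes=30)))  # still in window
```

Keeping the table this small is deliberate: a handful of clearly defined levels is easier for both parties to apply than a long list of special cases.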
If carriers are part of the package, make sure the SLA treats your customer's business hours differently from off-peak hours. State that peak-hour throughput must be at a certain level, and ensure that if your customer exceeds that level, the carrier will not simply discard your customer's packets. Some service providers require the customer to notify them within a certain time frame any time a defined environment variable changes. This requirement leads us to the ever-pressing and desirable task of documenting the environment, its systems, and its procedures. SLAs should include the points of contact and their responsibilities, plus the minimum and maximum time allowed for repairs. If you think that service providers won't hold you accountable for following specified procedures, think again.
The key from the end-user standpoint is to hold service providers liable for easily measurable variables such as uptime, efficiency, performance, and response times. Additionally, end-user responsibilities must be spelled out explicitly. Make customers tell you what your end of the bargain is. Of course we all know that they're interested in the dollars of the deal. However, in addition to defining monetary responsibilities, make them explicitly state what your responsibilities are and what is expected from your organisation.
How do you determine a fair cost for the SLA? How can you establish the cost of response time when this measurement currently does not exist? Begin by taking a look at how much business can be lost if response times are inadequate. For instance, what if a reservation agent doesn't receive adequate response time from an online reservation system? What if the agency's call capacity is diminished? Each reservation agent must be able to handle a certain number of calls per specified time period. The cost for maintaining this service will then correlate to the average number of calls handled per reservation agent. Your customer's business unit managers should be able to calculate the cost of downtime rather quickly. You should correlate service cost to response time, performance, or availability - or all three.
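The reservation-agent reasoning can be turned into a rough estimate. The sketch below assumes lost revenue scales with the number of calls agents could not handle; the function name, parameters, and every figure in the usage example are hypothetical, and a business unit manager would substitute real numbers.

```python
def downtime_cost(agents: int, calls_per_agent_hour: float,
                  revenue_per_call: float, outage_hours: float,
                  capacity_lost: float = 1.0) -> float:
    """Estimated revenue lost while agents cannot handle their normal call volume.

    capacity_lost is the fraction of call capacity lost: 1.0 for a full outage,
    less for a slowdown that only reduces the calls each agent can handle.
    """
    lost_calls = agents * calls_per_agent_hour * outage_hours * capacity_lost
    return lost_calls * revenue_per_call


if __name__ == "__main__":
    # Hypothetical agency: 50 agents, 12 calls an hour each, 40 dollars per call.
    print(downtime_cost(50, 12, 40.0, outage_hours=2))                      # full outage
    print(downtime_cost(50, 12, 40.0, outage_hours=2, capacity_lost=0.25))  # slow responses
```

An estimate like this gives both sides a starting point for pricing: the service cost can be set in proportion to the loss it is meant to prevent.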
How does the customer measure your performance or the performance of a circuit provider? Today you and your customer have many tools available, ranging from circuit-management software to application-monitoring tools. While the market for SLA-type measuring tools is immature, you can expect that to change quickly.
Several software tools that measure network availability are available from companies such as Hewlett-Packard, Network Associates, Concord Communications, MicroMuse, and GenevaSoft. Network Associates' SLA Service Manager allows a company to measure the availability of any selected device. Concord Communications' Network Health Monitor allows a user to measure uptime of routers and carrier links. MicroMuse's Netcool products measure SLA compliance in a number of ways. Tools from Empirical Software and Nextpoint Networks can measure database or application performance.
SLA software is expanding to the point where even desktop performance and reliability can be easily measured. Fairly soon we'll be able to monitor and measure all environment activities from a centralised, Web-based platform.
Once the components are defined, it's fairly simple to establish the SLA document. The SLA should be viewed as a living document - it should be periodically reviewed and, if applicable, changed. Remember that this is a "reasonable" document, and it is reasonable to expect change.
Having an SLA is not a panacea. After signing the document, you might be tempted to forget about the rest of your customer's IT needs because the SLA seems to be all-inclusive. SLAs do not replace, for example, fault-tolerance planning or disaster recovery. Fortunately, technology failures rarely occur these days. Many problems can be traced to a lack of procedure or not following procedure. (Ask the large frame-relay carrier with a three-letter name that recently had a severe network outage due to not properly testing the effects of a software upgrade.)
Solutions integrators must be the conduit through which business needs meet technology solutions. Your customers need you to help them innovate business techniques through the use of computers and networks, as well as the capability to manage both. By setting goals and agreements based on monetary risk and reward, you can focus on how technology can increase a company's earnings per share.
Do not allow an SLA to stifle your ingenuity or overcome your innovative thoughts with financial, quantitative, or qualitative measurements. All businesses are in flux in today's world. SLAs should be used as tools to enable and measure continuous performance and planning.
What to include in an SLA
Determine systems or applications priorities (what gets priority when multiple incidents occur).
Define mandatory, core, desirable, and support functions (what will occur when).
Establish user responsibilities, including: what software and hardware will and won't be supported; whom to contact when an incident occurs; and the establishment of an SLA point of contact.
Determine the roles of each party in providing applications development; support; database, network, and operations management; and vendor services.
Establish certification procedures for the implementation of new devices and systems.
Define standards and guidelines for all components.
Determine escalation procedures.
Determine re-evaluation or reassessment procedures.
Establish time frames for response, as well as the minimum and maximum time in which to repair problems.
Determine expected levels of both planned and unplanned outages.
Define expected performance and other characteristics during normal operations and during failure conditions.
Determine resolutions for inadequate performance, specifying various alternatives.
Specify time frames in which to produce expected reports.