
Computer Audit Update July 1994

CAN SERVICE LEVEL MANAGEMENT REDUCE THE STAGGERING COST OF IT FAILURE?

Ken Thompson

A major study, sponsored by the DTI a few years back, estimated that organizations in the UK were wasting around 20% of their annual revenues in 'IT failure costs'. Thus the typical Times Top 1000 blue-chip organization in the UK could be burning the staggering amount of over £200 million per annum due to inadequacies in its IT/IS provision.

IT failure costs can be split into two types: those which result in additional costs to the internal IS/IT function, such as a higher level of maintenance spend than planned, and those which directly cost the business money. The latter are, of course, much harder to quantify than the internal IS/IT losses, but also dwarf the former in their sheer scale. For example, if a new application, cost-justified on the basis of saving the business £1 million per annum, has its implementation delayed six months due to failures in the software development service, then the business has lost half a million pounds. If poor system response times on the same organization's Order Processing system, caused by inadequate capacity planning, delay by a year the introduction of a new telesales service aimed at generating a margin of £2 million per annum, then the business has lost £2 million. It is not difficult to see how these consequential business losses can soon approach 20% of an organization's revenue.
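The arithmetic behind these two examples can be sketched in a few lines. This is only an illustration: it assumes the planned benefit accrues evenly over the year, and the function name `lost_benefit` is hypothetical rather than anything taken from the DTI study.

```python
def lost_benefit(annual_benefit_gbp: float, delay_months: float) -> float:
    """Benefit forgone during a delay, assuming it accrues evenly over the year."""
    return annual_benefit_gbp * delay_months / 12.0


# New application: saves £1 million per annum, implemented six months late.
application_loss = lost_benefit(1_000_000, 6)

# Telesales service: margin of £2 million per annum, introduced a year late.
telesales_loss = lost_benefit(2_000_000, 12)

print(application_loss)  # 500000.0
print(telesales_loss)    # 2000000.0
```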

Statistics like this have pushed many organizations towards high-profile, high-capital, high-cost disciplines such as CASE. A number of other organizations have turned to a discipline with a lower profile and cost base: Service Level Management.

What is Service Level Management?

Service Level Management is the discipline of the service provider managing the critical elements of the service it provides to its customer. This involves four main aspects.

• Defining the nature of the service and its users.

• Setting targets for various indicators covering the most critical aspects of the service.

• Monitoring the service against these targets and taking agreed actions when these targets are not met.

• Planning for how the level of service may be maintained and improved in the face of growth in its usage and any other factors (e.g. other services) which would impact its performance.

Figure 1: Various SLA scenarios (Scenario 1, Scenario 2 and Scenario 3).

©1994 Elsevier Science Ltd

These aspects are normally formalized in a document known as a Service Level Agreement or SLA. Figure 1 shows three typical situations where SLAs might be employed.

Scenario 1 shows a simple situation where a business user has a single SLA with its internal IT department covering all aspects of the development, maintenance and operation of its IS service; the SLA in this case is non-contractual. Scenario 2 is where the IS/IT provision has been contracted out to an external provider; in this case the SLA should be closely tied in with the legal contract and should include penalties on the provider if the service provision falls below a certain defined level.

Scenario 3 is a more complex, but entirely reasonable, situation which could happen when the development and operations functions within an organization are structurally independent. In this case the business user has agreed one SLA with its operations department for systems operations (availability, reliability, ...) and a smaller one with its development department for development and maintenance (delivery dates, fault turnaround, ...). This situation clearly needs to be carefully planned and will inevitably require some form of 'SLA' between the development department and the operations department. For example, if the development department has agreed with its users a one-week response on urgent bug fixes, it would need an agreement with the operations help desk on the response time (say, one day) within which such faults are reported to it. This third scenario could be further complicated by the fact that the operations department might be an external Facilities Management organization from whom the business user is buying 'operations time' and the internal development department is buying 'development time'.

The agreement aspect of the SLA is very important: it is not only the service provider which makes promises! The service provider promises to provide the service to the agreed levels only if the user agrees to restrict their usage of the service to the agreed limits of demand. At a very simple level the service provider might only commit to a one second maximum response time on a given application transaction for a maximum of three concurrent users.
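The two-sided nature of such a commitment can be sketched as below. This is a minimal illustration, not a real monitoring tool: the class `ResponseTimeCommitment`, its field names and the three outcome labels are all assumptions made for the example. Only the one second limit and the three concurrent users come from the text.

```python
from dataclasses import dataclass


@dataclass
class ResponseTimeCommitment:
    max_response_seconds: float  # the provider's promise
    max_concurrent_users: int    # the user's promise (agreed limit of demand)

    def assess(self, observed_seconds: float, concurrent_users: int) -> str:
        """Judge one measurement, holding the provider to its target only within agreed demand."""
        if concurrent_users > self.max_concurrent_users:
            return "outside agreed demand"
        if observed_seconds <= self.max_response_seconds:
            return "target met"
        return "target missed"


commitment = ResponseTimeCommitment(max_response_seconds=1.0, max_concurrent_users=3)

print(commitment.assess(0.8, 3))  # target met
print(commitment.assess(1.4, 2))  # target missed
print(commitment.assess(1.4, 5))  # outside agreed demand
```

The third case is the point of the sketch: a slow response recorded while demand exceeds the agreed limit does not count against the provider.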

So what are the benefits of Service Level Management?

CCTA, in its publication on Service Level Management (part of the CCTA IT Infrastructure Library published by HMSO), gives some guidance on the types and the extent of the benefits achievable using Service Level Management. CCTA predicts that most organizations will be able to calculate a net financial advantage after the benefits are estimated and any extra costs, essentially staff and tool related, are deducted.

Some of the main types of benefits identified revolve around stabilizing the relationship between the service providing organization and the user and include:

• Making a senior manager in the service organization accountable for provision of a well-defined level of service attainment which can be easily measured.

• Enabling the customer (user) to balance the level of service with the cost of service, thus focusing on their real needs and not just their 'wants'.

• Facilitating sustainable service improvement programmes, which leads to enhanced user productivity.

• Allowing a better basis for provider-user dispute resolution due to objective service measures and well-defined escalation procedures.

• Providing the service organization with a clearer, more predictable profile of future user demand levels to facilitate a consistent service provision through better planning.

A further significant benefit, identified by the CCTA, is that SLAs provide an 'arm's length' relationship between the users and the IS/IT service providers. Such a relationship will obviously ease the path to future Facilities Management arrangements should these be required.


1 - Service Definition

2 - Service Levels
    2.1 Service Level Targets
    2.2 Constraints & Exceptions
    2.3 User Limits & Charging
    2.4 Growth Assumptions

3 - Supporting Procedures

4 - SLA Review Arrangements

Figure 2: Main parts of a Service Level Agreement (SLA).

What does an SLA typically contain?

Figure 2 is a simplified diagram of the four main parts of an SLA (the CCTA publication includes a much more detailed skeleton which can be used as a generic start-point for SLA definition).

Part 1 of the SLA should define the 'Service' or Services offered by the Service Provider to the Customer or Customers. This could be a specific service, such as a particular business system used by only one set of customers, or a general service, such as E-mail, offered to all customers, or some combination of the two.

Part 2 of the SLA is mostly concerned with the critical aspects of the service(s) defined in part 1 and the targets which need to be achieved for these. However, as well as defining the usage and growth limits that have been assumed in estimating these targets, it is also necessary to specify any constraints and exceptions to these targets due to things like regular housekeeping and known, planned service changes.

Part 3 of the SLA defines the supporting procedures that will need to be put in place to monitor, report and address service failures. For example, there needs to be an agreed procedure for classifying the most important service failures and ensuring they are quickly addressed and if necessary escalated to higher levels of management if not resolved within a certain time-frame.
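An escalation rule of this kind might be sketched as follows. The severity classes and time limits are hypothetical values chosen for illustration, as is the function name `needs_escalation`. An actual SLA would state whatever classes and time-frames the two parties agree.

```python
from datetime import datetime, timedelta

# Hypothetical severity classes and escalation limits, for illustration only.
ESCALATION_LIMITS = {
    "critical": timedelta(hours=4),
    "major": timedelta(days=1),
    "minor": timedelta(days=5),
}


def needs_escalation(severity: str, reported_at: datetime, now: datetime, resolved: bool) -> bool:
    """True if an unresolved failure has been open longer than the limit for its class."""
    if resolved:
        return False
    return now - reported_at > ESCALATION_LIMITS[severity]


reported = datetime(1994, 7, 4, 9, 0)
checked = datetime(1994, 7, 4, 14, 0)  # five hours after the report

print(needs_escalation("critical", reported, checked, resolved=False))  # True
print(needs_escalation("major", reported, checked, resolved=False))     # False
print(needs_escalation("critical", reported, checked, resolved=True))   # False
```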

Finally, part 4 of the SLA should state the period which the SLA covers and should maintain a log of any changes to the SLA during this period. It should also define the procedure and timescales for periodic (and exceptional) reviews of the entire SLA. (An exceptional review of the whole SLA might be necessary if an unprecedented level of service failures occurred within a specified period, indicating fundamental problems in sustaining the agreed service levels.)

So how do you introduce Service Level Management?

The key message would seem to be... gradually!

Organizations seeking to implement service level management hit the same problem that any organization faces when they try to implement a programme based on quantitative measures. In a nutshell, "You can't specify targets until you know how you are doing and you can't know how you are doing until you have had a measurement programme running for a number of months".

One of the first things an organization must do after it defines its first-cut SLA is to put procedures in place to measure current performance against its chosen key service indicators. This is where the whole programme is at its most vulnerable, because measuring these indicators will probably require some form of investment in new tools, procedures and staff time for monitoring. The CCTA publication indicates that, although some monitoring tool support does exist, it is not always at the right level to link specific service indicator performance with specific services, rather than with the overall service across the user base as a whole.

As a result of this three- to six-month monitoring exercise, the organization usually finds that its initial targets were too high and the SLA needs to be revised accordingly. One reason for this is that the SLA may not have taken into account the constraints imposed by the 'under-pinning services'. This is similar to the example given earlier when 'fault fixing' performance by the development department depended on the 'fault reporting' procedures under the control of the operations department.
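One way to compare a first-cut target with the measured baseline is sketched below. The response-time samples are invented for illustration. The nearest-rank method and the function names `attainment` and `achievable_target` are assumptions of this sketch, not a procedure described in the CCTA publication.

```python
def attainment(samples, target_seconds):
    """Fraction of measured response times at or under the target."""
    return sum(1 for s in samples if s <= target_seconds) / len(samples)


def achievable_target(samples, required_percent):
    """Smallest measured value met by the required percentage of samples (nearest rank)."""
    ordered = sorted(samples)
    # Integer ceiling of required_percent * n / 100, kept to at least 1.
    rank = max(1, -(-required_percent * len(ordered) // 100))
    return ordered[rank - 1]


# Invented baseline measurements (seconds) from the monitoring period.
baseline = [0.6, 0.8, 0.9, 1.1, 1.2, 1.3, 1.5, 1.8, 2.0, 2.4]

print(attainment(baseline, 1.0))        # 0.3
print(achievable_target(baseline, 90))  # 2.0
```

With these invented figures, a first-cut target of one second would be met only 30% of the time, whereas a two second target would be met 90% of the time. This is the kind of evidence that prompts a revision of the SLA.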


Thus, it is critical that for each service indicator any under-pinning service dependencies (e.g. communications network bandwidth) are identified and, if found to be constraining, appropriate action is taken. This could involve reducing the service level targets or investing in the under-pinning service to support the required service level. In the latter case, costs will be incurred and decisions need to be made on (a) whether they are justified (i.e. does the user need this service level?) and (b) how the costs should be distributed if the under-pinning service is used by a number of different groups of users. Charging (actual or nominal) is a very important issue in service level management and a properly constructed SLA will use charging to assist in managing user demand by, for example, charging less for off-peak service usage.
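The off-peak example can be sketched as a simple tariff. The rates, the peak window and the notion of a 'usage unit' are hypothetical values chosen for illustration. Amounts are held in pence to keep the arithmetic exact.

```python
# Hypothetical tariff, for illustration only.
PEAK_RATE_PENCE = 100      # charge per usage unit during peak hours
OFF_PEAK_RATE_PENCE = 40   # charge per usage unit at other times
PEAK_HOURS = range(9, 17)  # 09:00 up to, but not including, 17:00


def usage_charge_pence(units_by_hour):
    """Total charge for usage given as a mapping of hour of day (0-23) to units used."""
    total = 0
    for hour, units in units_by_hour.items():
        rate = PEAK_RATE_PENCE if hour in PEAK_HOURS else OFF_PEAK_RATE_PENCE
        total += units * rate
    return total


all_peak = usage_charge_pence({10: 100, 14: 100})  # all work done in peak hours
shifted = usage_charge_pence({10: 100, 20: 100})   # half the work moved to the evening

print(all_peak)  # 20000
print(shifted)   # 14000
```

Under this assumed tariff a user who moves half of their workload to the evening pays 30% less, which is the demand-management incentive the charging arrangement is meant to create.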

For Service Level Management to work a number of organizational procedures and roles are required. A number of these have already been identified in Figure 2 under part 3 of the SLA. Obviously, the head of a computer services function has overall responsibility for the management of all service levels in his organization. In some organizations it is usual for a full-time or part-time 'Service Level Manager' to be appointed reporting directly to the head of the IT Service function. Large computer service organizations may also require dedicated Availability Managers, Capacity Managers and Change Managers.

Users' experience of Service Level Management

I asked a member of the CCTA IT Infrastructure Library development team how extensive Service Level Management is. Their view is that there has been considerable interest in Service Level Management in the public sector for at least the last five years. The pressures of market testing and compulsory competitive tendering have added more recent impetus. Interestingly, the CCTA IT Infrastructure Library sells equally in both the public and private sectors, showing that considerable interest exists commercially as well. CCTA stress, however, that Service Level Management is no 'quick fix' and needs strong management support to enable the necessary investment in the supporting infrastructure and processes. "It is important for organizations to spend time determining the correct structure of IT services and corresponding SLAs. The level of service must also be realistic. It is very easy for the Service Management function to lose credibility, especially if the levels are pitched too low. Once lost this credibility is very difficult to regain."

The Problem Manager from a major UK public sector user of SLAs (I can't name the organization or the member of staff!), working in an ISO 9001 environment, echoes the CCTA warning. "Often senior management do not understand the level of supporting processes, such as formal change control, needed underneath SLAs to make them work. If we did it again we would put the supporting processes in first and the formal SLAs second. We did it the other way round!" However, on the benefits side, our public sector user was very positive. "We have now covered the bulk of our operational services with SLAs. This definitely gives the services a much higher profile and our users have the confidence that somebody is continually monitoring their service levels and will follow formalized procedures when things fall short".

Conclusions

It seems clear then that Service Level Management can bring substantial tangible and intangible benefits to a customer-service provider relationship provided it is planned and introduced thoroughly. A major Critical Success Factor for making it work will be the extent to which early and active user and management participation in the process is achieved. The users need to understand their responsibilities in terms of usage/demand, they need to be able to differentiate between what they need as opposed to what they want, and finally they must be realistic and patient in their initial performance/improvement expectations.
