Wednesday, November 9, 2011

Problem Management

Effective Problem Management plays an important role in reducing the number and duration of service disruptions in an organization. However, IT Organization (ITO) leaders often confuse Incident Management with Problem Management, which leads to more service disruptions that last longer.

Incident Management aims to restore service quickly, reducing the duration of disruptions. Problem Management, on the other hand, aims to prevent service disruptions by discovering the causes (or potential causes) of disruptions and creating workarounds and permanent resolutions for them. If you do not have both, then you most likely have more user downtime, lower customer satisfaction, and higher costs than you should, all of which reduce ITO Return on Investment (ROI).
Problem Management starts with Incident Management, and while they are not the same thing, the two are quite interdependent.
The Dependency:

Incident Management is often a “one shot” process that starts in response to a report of service disruption and ends with the eventual restoration of the service. Its goal is restoring service and capturing information that Problem Management can then use. Problem Management, on the other hand, is an “always on” process in which you continuously examine information from any source that has initiated, or could initiate, an Incident Management cycle. Its goal is prevention.
Without Problem Management, ITO staff may be faced with “fixing” the same issues repeatedly. Without Incident Management, ITO staff have limited data for analysis and cannot focus on Root Cause Analysis (RCA) and other prevention activities. One without the other usually results in more user downtime and steals valuable ITO resources away from efforts to add business value.
The efficiency gained from Problem Management frees valuable resources for other business-aligned projects (innovation), making a tangible contribution to the success of your firm.
Following discrete Incident and Problem Management processes decreases call volume and reduces outage duration. These improvements can shift the balance between innovation and “Keeping the Lights On” (KTLO) enough to produce visible improvements in the ITO's business contribution, while freeing resources to focus on adding value beyond basic operations.
Properly focusing on the objective of each process (restoration vs. prevention) and making those objectives a priority helps ensure that neither process is neglected under the day-to-day pressures of supporting users.

The following are some of the steps an ITO can take to improve Problem Management:
1.) Reassess your understanding of Incident and Problem Management.
Consider how you currently operate with respect to restoring and preventing. Understand the objectives of Incident and Problem Management and what they mean to your firm and ITO.
2.) Assume that currently you combine restoration and prevention activities with little formal management over either.
In most ITOs, prevention efforts receive much lower priority than restoration efforts – even though prevention can reduce restoration efforts significantly. It is also common to fail to collect accurate information from each Incident Management cycle.
3.) Investigate current activities to validate your assumptions.
Attempt to document the average duration and number of outages and, if possible, user downtime. Do you have management objectives for restoration and for prevention? What percentage of effort do you expend on each? Is your team capturing the information required to reduce their workload? Are they actively working to reduce the duration and number of outages?
4.) Assess the capabilities of your staff.
Determine if there are 1) methods in place to record all Incident Management details, 2) any formal method for developing workarounds to speed resolutions, and 3) preventative activities such as trend analysis using Pareto Analysis (a.k.a. the “80/20” rule). More mature ITOs also use formal RCA techniques such as Component Failure Impact Analysis (CFIA), Ishikawa diagrams, Kepner-Tregoe, and others.
5.) Start a service improvement program that formalizes Incident and Problem Management as distinct processes.
Ensure that Incident Management tracks and gathers the information Problem Management will use for trend analysis; software support tools are critical for this. Ensure that you allocate enough time for trend analysis, and that there is a process for gathering industry information to identify potential disruptions.
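
As a rough illustration of the measurement in step 3 and the Pareto trend analysis in step 4, here is a minimal Python sketch. It assumes each closed incident has been recorded with a category and an outage duration; the Incident record, its field names, the two function names, and the sample data are all hypothetical and invented for this example, not taken from any particular support tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    """One closed incident record (hypothetical fields, for illustration only)."""
    category: str          # suspected cause or affected component
    outage_minutes: float  # time from report to service restoration


def average_outage_minutes(incidents):
    """Mean outage duration across incidents; 0.0 when there are none."""
    if not incidents:
        return 0.0
    return sum(i.outage_minutes for i in incidents) / len(incidents)


def pareto_categories(incidents, threshold=0.8):
    """Return the most frequent categories that together account for at
    least `threshold` of all incidents.

    Each entry is (category, count, cumulative_share).
    """
    counts = Counter(i.category for i in incidents)
    total = sum(counts.values())
    result = []
    cumulative = 0
    for category, count in counts.most_common():
        cumulative += count
        result.append((category, count, cumulative / total))
        if cumulative / total >= threshold:
            break
    return result


# Invented sample data: ten incidents across four categories.
sample = (
    [Incident("password reset", 10)] * 5
    + [Incident("disk full", 60)] * 3
    + [Incident("network", 120)]
    + [Incident("printer", 30)]
)

print("Number of outages:", len(sample))
print("Average outage (minutes):", average_outage_minutes(sample))
for category, count, share in pareto_categories(sample):
    print(f"{category}: {count} incidents, cumulative share {share:.0%}")
```

With this sample, two of the four categories account for 80% of the incidents, which is where a Problem Management effort would most likely look first. The 0.8 threshold is only the conventional “80/20” starting point; an ITO may prefer to rank by total outage minutes rather than by incident count.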

Organization for Problem Management:
Organizational structure plays a significant role in the success or failure of Problem Management. The correct organizational structure is critical to success; however, firms probably should not reorganize to achieve it.
ITOs often confuse process and performance. Traditional ITO structures make little distinction between task descriptions (process) and task execution (performance). Research shows that firms often attempt to form two groups: Incident Management (triage) and Problem Management (root-cause analysis). The drawbacks of this approach can include increased outage duration, loss of organizational knowledge, damage to camaraderie, loss of management visibility, human resource issues, higher costs, and reduced communication.
A federated approach is often better than dedicated departments. It leverages existing staff and capabilities as shared resources to form virtual organizations for resolving problems. This approach, however, requires defined roles and responsibilities and common oversight.

In summary, Problem Management is much more than simply resolving an outage. Its reactive and proactive tasks span multiple departments and even incorporate suppliers and vendors. Focusing first on Incident Management and then on Problem Management can lead to higher customer satisfaction through improved service quality, while also improving working conditions within your ITO.

Reference: Mastering Problem Management by Hank Marquis, Global Knowledge