Use Incidents and Incident Management
Learn about what "incidents" are and how incident management can help you. An incident is a single event, disturbance or query that affects the quality of a service to a customer. When a customer is affected by an incident which is then resolved, the service for that customer is restored to normal levels. Incidents can be logged by technicians, and administrators can allow customers to submit incidents as well using the Customer Portal. Because incident management focuses on getting the customer back on track as quickly as possible, fixes for incidents are often "band-aid" fixes and do not always allow the underlying root cause to be further explored and resolved.
Objective: To track and manage the development of bug fixes created to resolve specific customer issues
ITIL formally defines the primary goal of the incident management process to be restoring normal service operation as quickly as possible after a disruption, and minimizing the impact on business operations, thus ensuring that the best possible levels of service quality and availability are maintained. In the simplest terms, incident management is all about dealing with things that went wrong and getting your customers up and running (and keeping them there) by whatever means necessary. Naturally, incident management is where most organizations tend to start out with their service management. Managing incidents means assigning the right priority and deadline to all incidents, enabling open communication with the right people, and understanding each incident's context and what and who it affects, as well as accurately measuring service-level agreement compliance and understanding incident hotspots (i.e., problems).
Learn how to create incidents (either one at a time or by importing multiple simultaneously), as well as resolve and close incidents. Administrators can configure incident settings to suit the organization's needs. The settings for incidents can be managed on a per-service basis, as detailed in the following related articles:
- Set Priorities and SLAs (Incidents)
- Set Types
- Set Categories
- Set Statuses
- Set Default Field Selections for New Incidents
- Add Custom Fields
- Set Up Triggers
- Solicit Feedback (Incidents)
Managing incidents entails logging issues and tracking them from initial submission through final resolution. By tracking incidents, Service Desk allows you to accurately measure your Service Level Agreement (SLA) compliance and understand your incident hotspots and their wider effects. Using incident management includes ensuring the following things for each incident:
- The right priority level
- An appropriate deadline for resolving incidents
- Open communication with the right people
- Understanding each incident's context, as well as what and who it affects
When first setting up Service Desk, it's best to set up a request process that all members can follow. Try managing requests by implementing the following phases one at a time for each and every request: record, respond and report.
1. Record – Get staff to consistently record each and every request.
First, provide a single point of contact (either an individual or a group of individuals who act as a service desk) that has multiple regulated channels of access (e.g., email, web). Discourage users from contacting specific people when they encounter a problem, and instead steer them toward using the channels of access for the service desk so that each and every request is properly recorded. In addition, try only recognizing support work that is recorded through Service Desk so that support individuals also encourage users to go through the standardized channels of access.
When done correctly, this ensures that records of all requests are logged as incidents in Service Desk. Record all interactions with your users – whether by phone, email, Twitter or a chance conversation in the hallway – as comments against the related incident record in Service Desk. All responses and work should be logged and tracked, and incidents should be properly closed (or documented if abandoned). The more consistent all agents are with this process, the better; it is up to management to ensure they stay on track.
2. Respond – Get organized at managing the response to the requests.
Once you have good records of the requests you are dealing with, start getting smart about how you handle them. Try employing the following best practices:
- Make sure each request has an assignee who owns it, ensuring that no requests get lost in the backlog.
- Match requests to others to see what works and to recognize patterns in issues (i.e., problems).
- Build up recorded information on the services and systems. Give responding staff access to information and training (i.e., knowledge articles).
- Use external information: use search engines, get training, get involved in communities and consult agents.
- Develop and provide models or scripts for how to deal with commonly submitted requests.
- Use Service Desk rather than email to pass incident ownership to others, because issues are easily lost in crowded inboxes. If you need to pass an incident to someone outside your organization, try including them in your Service Desk account so that you can continue to manage and record what they are doing for you and everybody can see it in one place.
- Regularly monitor how long requests are taking and follow up on the slow ones.
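Two of the practices above (every request has an owner, and slow requests get followed up) are easy to check mechanically. The following is a minimal sketch only: the `Request` record, its field names, and the three-day threshold are hypothetical illustrations, not Service Desk's actual data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Request:
    """Hypothetical request record (an assumption, not Service Desk's schema)."""
    id: int
    opened_at: datetime
    assignee: Optional[str] = None
    closed_at: Optional[datetime] = None


def needs_follow_up(requests, now, max_age=timedelta(days=3)):
    """Return open requests that have no owner or have been open longer than max_age."""
    return [
        r for r in requests
        if r.closed_at is None
        and (r.assignee is None or now - r.opened_at > max_age)
    ]
```

Running a check like this on a regular schedule is one way to make sure nothing sits unowned in the backlog; the threshold would normally follow your own priorities and SLAs.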
3. Report – Use the data to improve.
All those incident records create a system of safeguards that prevent you and your organization from dropping the ball on any request. While keeping your requests organized and prioritized is the main goal, another big payoff is the ability to analyze the data to look for trends. What areas generate the most questions (training may be required)? Which users complain the most? Which services have been flakiest? Which staff members are extra-efficient (learn from them) or not so efficient (help them improve)? Using reports to analyze trends allows you to answer those questions and more, which provides you with the ability to be proactive about issues and improve your service.
While all three of the steps above are crucial to maintaining a well-managed service, the second step – respond – is typically the most time-consuming and complex part. Below, see some tips on breaking down the system of response even further.
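Trend questions such as "which areas generate the most questions?" come down to counting incident records by some field. As a rough sketch, assuming the incidents have been exported as a list of dictionaries (the `category` and `requester` keys and the sample data are hypothetical):

```python
from collections import Counter


def top_counts(incidents, field, n=3):
    """Return the n most common values of `field` with their counts."""
    return Counter(incident[field] for incident in incidents).most_common(n)


# Made-up sample data for illustration only.
incidents = [
    {"category": "Email", "requester": "dana"},
    {"category": "Email", "requester": "lee"},
    {"category": "VPN", "requester": "dana"},
    {"category": "Email", "requester": "dana"},
]

print(top_counts(incidents, "category"))   # [('Email', 3), ('VPN', 1)]
print(top_counts(incidents, "requester"))  # [('dana', 3), ('lee', 1)]
```

The same function answers each of the questions by changing the field name, for example a service or assignee field if your export includes one.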
It always helps to categorize all incidents, whether they are general requests or issues to be fixed. But when an incident requires something to be fixed, categorization is particularly important in getting a general idea of what type of incident it is so agents can determine how serious it is and how wide and severe the impact is. This helps ensure that incidents are assigned to the right person the first time and prevents time from being wasted while issues float around from agent to agent before being addressed.
This is where keeping records of past fixes and building up the knowledge base can really pay off. Searching incident, problem and knowledge article records for fixes to similar incidents in the past can cut down on valuable support time. If the issue has been fixed before, it can either be fixed in the same way or passed to the agent who did so. This is called level 1 support.
If there isn't a match in past records or the issue can't be fixed, it can then be passed to level 2 support: those who have the technical skills to do specialist diagnosis and resolution. If they can’t fix things, they refer the incident to level 3 support: the individuals who built or supplied the features that are not working, which might be a third-party supplier.
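The three support levels form a simple one-way chain. A minimal sketch of that chain, using hypothetical names (`SupportLevel`, `escalate`) that are not part of Service Desk:

```python
from enum import IntEnum


class SupportLevel(IntEnum):
    LEVEL_1 = 1  # reuse a fix found in past incident, problem or knowledge records
    LEVEL_2 = 2  # specialist diagnosis and resolution
    LEVEL_3 = 3  # whoever built or supplied the feature, possibly a third party


def escalate(level: SupportLevel) -> SupportLevel:
    """Pass an unresolved incident to the next level; level 3 is the last stop."""
    return SupportLevel(min(level + 1, SupportLevel.LEVEL_3))
```

For example, `escalate(SupportLevel.LEVEL_1)` returns `SupportLevel.LEVEL_2`, while escalating from level 3 stays at level 3, since there is no further technical group to pass the incident to.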
All this passing around amongst support groups is referred to as Functional Escalation. However, most people think of Hierarchical Escalation (i.e. telling somebody more senior) when discussing escalation. Hierarchical Escalation can be necessary in the following situations:
- The incident impact is serious enough that higher-ups should know about it.
- A fix can’t be found for the incident.
- Someone is not responding fast or well enough in relation to the severity of the incident.
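The three situations above can be expressed as a small check that also records why an incident is being raised with someone more senior. This is an illustrative sketch only; the function and parameter names are hypothetical, and deciding whether each condition holds remains a human judgment.

```python
def hierarchical_escalation_reasons(serious_impact, no_fix_available, slow_response):
    """Return the reasons (possibly none) for telling somebody more senior."""
    reasons = []
    if serious_impact:
        reasons.append("impact is serious enough that higher-ups should know")
    if no_fix_available:
        reasons.append("a fix cannot be found")
    if slow_response:
        reasons.append("response is too slow or too poor for the severity")
    return reasons
```

An empty list means no hierarchical escalation is called for; any non-empty list gives the senior contact the context for the escalation.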
That senior individual might make the call that this is a major incident. In that case, the normal process described here is often dropped and a crisis-response process might be adopted instead.
When somebody is not getting the service they expect, the incident process must focus on restoring that service. Sometimes that is not the same thing as fixing the underlying problem. If the underlying problem needs to be fixed in order to return the service to normal then that is what must be done; however, often there is a workaround that can be employed that will return normal behavior for users without fixing anything. For example, with some software simply logging off and on again or rebooting a server may get a user around an issue and working again.
Eventually a problem may cause so many incidents that the user must wait while an issue is properly diagnosed and solved once and for all. It is management's call whether the inconvenience is outweighed by the ongoing cost of recurring incidents. But in general, the incident process takes whatever workarounds or temporary fixes it can to get service restored to the user as quickly as possible.
This applies to all incidents. Before a ticket is closed, ensure that the following is done:
- Inform the reporter that the fix has been completed.
- Ensure that the reporter agrees that it is fixed and is satisfied with the outcome.
- Properly categorize the incident so that reporting data is useful.
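The closing steps above work well as an explicit checklist. The sketch below shows one way to model it; the `ClosureChecklist` class and its field names are hypothetical and do not correspond to Service Desk fields.

```python
from dataclasses import dataclass


@dataclass
class ClosureChecklist:
    """Hypothetical pre-closure checklist for a single incident."""
    reporter_informed: bool = False
    reporter_satisfied: bool = False
    categorized: bool = False


def outstanding_steps(checklist):
    """List the closing steps that have not been completed yet."""
    steps = {
        "inform the reporter that the fix has been completed": checklist.reporter_informed,
        "confirm the reporter agrees it is fixed and is satisfied": checklist.reporter_satisfied,
        "categorize the incident properly": checklist.categorized,
    }
    return [step for step, done in steps.items() if not done]


def can_close(checklist):
    """An incident is ready to close only when no step is outstanding."""
    return not outstanding_steps(checklist)
```

For instance, a checklist with only `reporter_informed=True` still has two outstanding steps, so `can_close` returns `False`.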
The incident has a record of everything that happened and what workaround or fix you used. In the future, you or one of your colleagues may be grateful you wrote it down.