
ITIL Problem vs Incident Management Exam

Master the ITIL 4 incident vs problem management distinction for your exam: definitions, lifecycle, known errors, workarounds, and how both practices interact.

What is the difference between incident and problem management in ITIL 4?

In ITIL 4, Incident Management focuses on restoring normal service operation as quickly as possible following an unplanned interruption, minimizing the adverse impact on the business. Problem Management focuses on identifying the root causes of incidents and eliminating them to prevent future occurrences or reduce their impact. A workaround restores service for an incident without fixing the underlying problem; a known error is a problem that has been analyzed but not yet resolved, typically with a documented workaround. The ITIL 4 Foundation exam tests this distinction in approximately 10-15% of questions, and both practices have dedicated Practitioner-level exams with 30 questions, 60-minute format, and 70% pass mark.


No distinction in ITIL 4 generates more exam failures than the boundary between incident management and problem management. The two practices are closely related, interact constantly in real IT environments, and use terminology that candidates frequently confuse. The examiners know this and deliberately construct distractors that exploit the confusion.

Mastering this distinction means more than memorizing definitions. It means understanding why each practice exists, what activities belong exclusively to each, where they interact, and how to categorize real-world IT scenarios correctly under exam conditions. This guide covers every testable aspect of both practices across the Foundation exam, the Create, Deliver and Support (CDS) module, and the Incident Management and Problem Management practice module exams.


The Core Distinction: Speed vs. Understanding

The most powerful conceptual anchor for this distinction is the difference in primary objective:

  • Incident Management is optimized for speed. Its success metric is how quickly normal service is restored. Using a workaround to restore service faster -- even if it does not fix the underlying issue -- is entirely correct incident management behavior.
  • Problem Management is optimized for understanding. Its success metric is whether root causes are identified and eliminated (or their impact reduced). It operates on a slower timeline because investigation requires thoroughness.

This speed vs. understanding distinction explains why the same event can trigger activities in both practices simultaneously. An incident occurs; incident management responds to restore service. While incident management works on restoration, problem management begins (or continues) investigating why the incident occurred.

"The service desk's job when an incident arrives is not to understand it -- it is to resolve it. Understanding is problem management's job. Conflating the two is how organizations end up with analysts who spend an hour investigating a root cause when the correct action is to apply the known workaround and restore service in two minutes." -- Stuart Rance, ITIL 4 author and consultant


ITIL 4 Definitions: Precision Matters

Incident

An incident is an unplanned interruption to a service or reduction in the quality of a service.

Key elements of this definition:

  • Unplanned -- a scheduled maintenance window that takes a service offline is not an incident (it is a planned outage)
  • Interruption -- the service is unavailable or degraded
  • OR reduction in quality -- the service is available but performing below agreed levels (for example, slower than normal)

Problem

A problem is a cause, or potential cause, of one or more incidents.

Key elements:

  • Cause -- problems are root causes, not symptoms
  • Potential cause -- problems can be identified proactively, before incidents occur (proactive problem management)
  • One or more incidents -- a single problem can generate multiple incidents; tracking this linkage is part of Problem Management's value

Known Error

A known error is a problem that has been analyzed and has not been resolved.

Key elements:

  • A known error is a type of problem record -- not a separate object
  • It has been analyzed -- the root cause is understood (or partially understood)
  • It has not been resolved -- the underlying cause remains present
  • Known errors typically have documented workarounds that Incident Management can use

Workaround

A workaround is a solution that reduces or eliminates the impact of an incident or problem for which a full resolution is not yet available.

Key exam application: A workaround can close an incident (the user's service is restored) without closing the underlying problem. A workaround does not resolve a problem -- it manages its impact. This is the most tested concept in the problem vs. incident distinction.
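The relationship between these four definitions can be sketched as a small data model. This is an illustrative sketch only: the class names, fields, and the `apply_workaround` helper are assumptions made for this example and are not taken from ITIL 4 or from any ITSM tool. It shows the point above: applying a workaround changes the state of the incident and leaves the problem open.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    OPEN = "open"
    RESOLVED = "resolved"


@dataclass
class Problem:
    """A cause, or potential cause, of one or more incidents."""
    summary: str
    workaround: Optional[str] = None
    status: Status = Status.OPEN


@dataclass
class Incident:
    """An unplanned interruption to a service or reduction in its quality."""
    summary: str
    problem: Optional[Problem] = None
    status: Status = Status.OPEN
    resolution_note: str = ""


def apply_workaround(incident: Incident) -> None:
    """Restore service using the linked problem's documented workaround.

    Only the incident changes state; the problem is deliberately left untouched.
    """
    if incident.problem is None or incident.problem.workaround is None:
        raise ValueError("no documented workaround for this incident")
    incident.status = Status.RESOLVED
    incident.resolution_note = "Workaround applied: " + incident.problem.workaround


# Hypothetical example data.
problem = Problem("Reporting app leaks memory", workaround="Restart the application")
incident = Incident("Reporting app frozen for one user", problem=problem)
apply_workaround(incident)

print(incident.status.name)  # RESOLVED
print(problem.status.name)   # OPEN
```

The incident can be closed once the user confirms service is restored, while the problem record stays open until the cause itself is dealt with.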


Practice Comparison Table

| Dimension | Incident Management | Problem Management |
| --- | --- | --- |
| Purpose | Restore service quickly | Eliminate root causes |
| Primary metric | Mean Time to Restore (MTTR) | Problem resolution rate; reduction in incident recurrence |
| Time horizon | Immediate / short-term | Medium to long-term |
| Success criteria | Service restored to agreed levels | Root cause identified and eliminated or workaround documented |
| Workaround usage | Uses workarounds to restore service | Creates and validates workarounds |
| Relationship to incidents | Responds to incidents | Analyzes patterns across incidents |
| Who triggers it | Service desk, users, monitoring | Incident patterns, trend analysis, proactive review |
| Output artifacts | Incident records, resolution notes | Problem records, known error records, workarounds |

The Incident Management Practice in Detail

The Incident Lifecycle

Every incident follows a consistent lifecycle that Incident Management governs:

  1. Detection -- incident identified via user report, monitoring alert, or service desk contact
  2. Logging -- incident recorded with full details: time, user, service affected, symptoms
  3. Classification -- category assigned (what type of incident?) and priority set (how urgent/impactful?)
  4. Initial diagnosis -- attempt to identify a quick resolution or known workaround
  5. Escalation (if needed) -- functional escalation to specialist team; hierarchical escalation to management if impact warrants
  6. Investigation and diagnosis -- deeper analysis if initial diagnosis fails, aimed at finding a way to restore service rather than establishing root cause
  7. Resolution and recovery -- apply fix or workaround; verify service restored
  8. Closure -- confirm with user; record lessons learned; close record

Incident Priority

Priority determines the order in which incidents are handled. ITIL 4 sets priority based on two factors:

Urgency -- how quickly the incident needs to be resolved based on business impact over time. A broken payroll system on payday has extremely high urgency. The same system broken on a non-payday has lower urgency.

Impact -- the extent to which the incident affects the business. An outage affecting 1000 users has high impact. An outage affecting 1 user in a non-critical role has low impact.

Priority = f(urgency, impact)

Most organizations implement a priority matrix that assigns priority levels (P1-P4 or Critical/High/Medium/Low) based on combinations of urgency and impact scores.
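A priority matrix of this kind can be sketched as a simple lookup table. The three-level scale and the specific cell values below are hypothetical, chosen only to illustrate Priority = f(urgency, impact); each organization defines its own matrix, and ITIL 4 does not prescribe these values.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical matrix: keys are (urgency, impact), values are priority labels.
PRIORITY_MATRIX = {
    (Level.HIGH, Level.HIGH): "P1",
    (Level.HIGH, Level.MEDIUM): "P2",
    (Level.MEDIUM, Level.HIGH): "P2",
    (Level.HIGH, Level.LOW): "P3",
    (Level.MEDIUM, Level.MEDIUM): "P3",
    (Level.LOW, Level.HIGH): "P3",
    (Level.MEDIUM, Level.LOW): "P4",
    (Level.LOW, Level.MEDIUM): "P4",
    (Level.LOW, Level.LOW): "P4",
}


def priority(urgency: Level, impact: Level) -> str:
    """Look up the priority label for an urgency/impact combination."""
    return PRIORITY_MATRIX[(urgency, impact)]


# Payroll system down on payday: high urgency, high impact.
print(priority(Level.HIGH, Level.HIGH))  # P1
# One user in a non-critical role affected, no time pressure.
print(priority(Level.LOW, Level.LOW))    # P4
```

Keeping the matrix as data rather than nested conditionals makes it easy to review with the business and to change without touching the lookup logic.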

Major Incidents

A major incident is a high-impact incident that requires a coordinated response. ITIL 4 recommends that organizations have a separate major incident management procedure that:

  • Assigns a dedicated major incident manager (often separate from normal incident managers)
  • Convenes a war room or bridge call with relevant technical and business stakeholders
  • Provides regular status communications to affected business users
  • Maintains an incident timeline for post-incident review

Post-incident reviews (sometimes called post-mortems in DevOps contexts) generate inputs to Problem Management. The major incident review identifies what happened, why it happened, how it was resolved, and what should change to prevent recurrence.


The Problem Management Practice in Detail

Reactive vs. Proactive Problem Management

Problem Management operates in two modes that the exam tests separately:

Reactive problem management responds to incidents that have already occurred. It analyzes incident records to identify root causes, documents known errors, and develops permanent solutions.

Proactive problem management identifies potential problems before incidents occur. It analyzes infrastructure, services, and processes to find weaknesses and address them before they generate incidents.

"Proactive problem management is where the real value lies, but it is also where most organizations under-invest. The reactive work is urgent and visible; the proactive work is important but invisible until something goes wrong. That tension is exactly what the DPI and HVIT exams explore at the strategic level." -- Kaimar Karu, former Head of ITIL at Axelos

Problem Records and Investigation

The Problem Management practice creates and manages problem records. A problem record documents:

  • The symptoms that triggered problem identification
  • The incidents linked to this problem
  • The current investigation status
  • Root cause analysis findings (as investigation progresses)
  • Workarounds available for affected incidents
  • Known error status (once the problem is analyzed)
  • Proposed permanent solution (once identified)
  • Resolution status

The transition from "problem record" to "known error record" occurs when the root cause has been identified sufficiently to document a workaround, even if the permanent solution is not yet available. This is a testable transition point.
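As a rough sketch, the record contents and the known error transition described above might be modeled as follows. The field names and the `is_known_error` rule are assumptions for illustration (analysis findings and a workaround recorded, problem not yet resolved); real ITSM tools structure these records in their own ways.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProblemRecord:
    symptoms: str
    linked_incident_ids: List[str] = field(default_factory=list)
    rca_findings: Optional[str] = None
    workaround: Optional[str] = None
    permanent_solution: Optional[str] = None
    resolved: bool = False

    @property
    def is_known_error(self) -> bool:
        """Analyzed (findings and a workaround documented) but not yet resolved."""
        return (
            self.rca_findings is not None
            and self.workaround is not None
            and not self.resolved
        )


# Hypothetical record and incident identifiers.
record = ProblemRecord("App crashes on export", ["INC-101", "INC-107"])
print(record.is_known_error)  # False: nothing analyzed yet

record.rca_findings = "Export module exhausts memory on large reports"
record.workaround = "Restart the application"
print(record.is_known_error)  # True: analyzed, workaround documented, unresolved

record.permanent_solution = "Patch export module"
record.resolved = True
print(record.is_known_error)  # False: the problem has been resolved
```

Modeling known error status as a derived property reflects the idea that a known error is a stage of a problem, with the permanent solution still pending.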

Root Cause Analysis Techniques

ITIL 4 does not prescribe specific root cause analysis (RCA) techniques but references several that are widely used:

  • 5 Whys -- repeatedly asking "why?" to drill down from symptom to root cause
  • Fishbone / Ishikawa diagram -- visual mapping of potential causes across categories (people, process, tools, environment)
  • Fault tree analysis -- logical diagramming of failure paths from outcome back to causes
  • Timeline analysis -- chronological review of events leading to an incident

The exam does not test the mechanics of these techniques but does test whether candidates know that Problem Management is responsible for RCA and that RCA findings belong in the problem record.


How the Practices Interact

The relationship between Incident Management and Problem Management is tested through multi-practice scenarios. Key interaction patterns:

Pattern 1: Recurring Incidents Trigger Problem Management

A service desk receives five incidents in two weeks reporting that a specific application crashes for a specific type of user action. Incident Management resolves each individually using a workaround (restarting the application). The pattern of recurrence triggers problem management investigation.

Exam signal: When a scenario describes multiple similar incidents with the same workaround being applied repeatedly, the correct next action involves Problem Management -- either raising a problem record or escalating to the Problem Management team.
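The recurrence trigger in this pattern can be approximated with a simple count over a time window. The sketch below is illustrative only: the tuple layout, the threshold, and the 14-day window are assumptions for this example, and real trend analysis would normally draw on richer incident data.

```python
from collections import defaultdict
from datetime import date, timedelta


def recurring_patterns(incidents, threshold=3, window=timedelta(days=14)):
    """Return (service, symptom) keys seen at least `threshold` times within `window`.

    `incidents` is an iterable of (service, symptom, opened_date) tuples.
    """
    by_key = defaultdict(list)
    for service, symptom, opened in incidents:
        by_key[(service, symptom)].append(opened)

    flagged = []
    for key, dates in by_key.items():
        dates.sort()
        # Check every run of `threshold` consecutive dates against the window.
        for i in range(len(dates) - threshold + 1):
            if dates[i + threshold - 1] - dates[i] <= window:
                flagged.append(key)
                break
    return flagged


# Hypothetical incident log: five similar crashes in under two weeks, one unrelated report.
log = [
    ("CRM", "crash on export", date(2024, 3, 1)),
    ("CRM", "crash on export", date(2024, 3, 3)),
    ("Email", "slow delivery", date(2024, 3, 3)),
    ("CRM", "crash on export", date(2024, 3, 6)),
    ("CRM", "crash on export", date(2024, 3, 9)),
    ("CRM", "crash on export", date(2024, 3, 12)),
]

print(recurring_patterns(log, threshold=5))  # [('CRM', 'crash on export')]
```

Each flagged key is a candidate for a problem record; the decision to raise one still belongs to Problem Management.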

Pattern 2: Known Errors Support Incident Resolution

Problem Management documents a known error with a workaround. Incident Management uses this workaround to resolve future incidents faster. The first contact resolution rate for incidents related to this problem improves.

Exam signal: When a scenario asks what information from Problem Management most benefits Incident Management, the answer is known error records and documented workarounds.

Pattern 3: Major Incident Review Inputs Problem Management

A major incident is resolved. The post-incident review identifies that the root cause has not been found. The review creates a problem record for ongoing investigation by Problem Management.

Exam signal: Post-incident reviews are an Incident Management activity; the creation of the problem record for subsequent investigation is where Incident Management hands off to Problem Management.


Exam Tips: Avoiding the Most Common Mistakes

Mistake 1: Having Incident Management Investigate Root Cause

If an exam scenario shows a service desk analyst investigating the root cause of an incident, this is a Problem Management activity being performed incorrectly by Incident Management. The correct answer will involve raising a problem record or engaging the Problem Management team.

Mistake 2: Having Problem Management Restore Service

Problem Management investigates; it does not restore service. If members of a problem investigation team are pulled in to fix a live outage, restoring the service is still Incident Management's responsibility. Problem Management works on preventing recurrence.

Mistake 3: Treating a Workaround as a Resolution

Applying a workaround to an incident resolves the incident. It does not resolve the underlying problem. The problem record remains open until either a permanent solution is implemented or the problem is accepted as a known error with a permanent workaround.

Mistake 4: Confusing Functional and Hierarchical Escalation

Functional escalation transfers an incident to a team with more technical expertise (second-line, third-line support). Hierarchical escalation informs management when impact or urgency warrants executive awareness. These are different mechanisms triggered by different conditions -- exam questions test the distinction.
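One way to keep the two mechanisms apart is to treat them as independent decisions. The following sketch is hypothetical: the function name, its parameters, and the rule that P1 and P2 incidents trigger hierarchical escalation are assumptions for illustration, not ITIL 4 requirements.

```python
from typing import List


def required_escalations(needs_specialist_skills: bool, priority: str) -> List[str]:
    """Return which escalation types apply; the two are decided independently."""
    result = []
    if needs_specialist_skills:
        # Functional: hand the incident to second- or third-line support.
        result.append("functional")
    if priority in {"P1", "P2"}:
        # Hierarchical: inform management (hypothetical threshold).
        result.append("hierarchical")
    return result


print(required_escalations(True, "P4"))   # ['functional']
print(required_escalations(False, "P1"))  # ['hierarchical']
print(required_escalations(True, "P1"))   # ['functional', 'hierarchical']
```

An incident can need either escalation, both, or neither, which is why exam scenarios ask which condition is actually present.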


Frequently Asked Questions

Can the same person manage both incidents and problems?

In small organizations, the same person may perform both Incident Management and Problem Management activities, but ITIL 4 recognizes that they have different objectives and time horizons, so performing them simultaneously on the same issue creates conflict. A person cannot optimally restore service as fast as possible (incident) while simultaneously conducting a thorough root cause investigation (problem). Best practice is to separate these activities by person, team, or time. Many organizations have a combined "Incident and Problem Management" team but assign specific roles to specific activities.

What happens to incident records when a problem is identified?

When a problem record is created, related incident records are linked to it. This linkage provides the problem investigation team with the full dataset of affected incidents, helping identify patterns, scope of impact, and chronology. Linking incidents to problems is a key activity for both practices -- it benefits Problem Management (richer data) and Incident Management (when the problem is resolved, all linked incidents can be updated).

What is the difference between a problem record and a known error record?

Both are records within Problem Management, but they represent different stages of investigation. A problem record is created when a problem is identified but before root cause analysis is complete. A known error record represents a problem that has been analyzed -- the root cause is understood (at least partially) and a workaround has been documented. The transition from problem to known error occurs when sufficient analysis has been completed to document a reliable workaround, even if the permanent solution has not been implemented.


References

  1. Axelos. (2021). ITIL 4 Practice Guide: Incident Management. TSO.
  2. Axelos. (2021). ITIL 4 Practice Guide: Problem Management. TSO.
  3. Axelos. (2019). ITIL 4 Foundation: IT Service Management. TSO.
  4. Rance, S. (2020). Incident vs. Problem Management in ITIL 4. Stuart Rance Consulting Blog.
  5. Karu, K. (2020). Proactive Problem Management in ITIL 4. Axelos Publications.
  6. Kim, G., Humble, J., Debois, P., & Willis, J. (2016). The DevOps Handbook. IT Revolution Press.
  7. Agutter, C. (2023). ITIL 4 Practice Module Study Guides: Incident and Problem. ITSM Zone Ltd.
  8. PeopleCert. (2024). ITIL 4 Practitioner: Incident Management Sample Papers. PeopleCert Ltd.
  9. PeopleCert. (2024). ITIL 4 Practitioner: Problem Management Sample Papers. PeopleCert Ltd.
  10. HDI. (2022). The ROI of Problem Management. UBM Technology Group Research Report.