Teams track MTTR religiously. Dashboards display it prominently. Leadership asks about it in every review. But many teams misunderstand what MTTR actually measures and miss the metrics that reveal where their incident response breaks down.
MTTR, MTTD, and MTTA capture different phases of incident response. Knowing which metric to focus on depends on where your bottlenecks exist.
The Incident Response Timeline
Every incident follows a predictable sequence of events. Understanding this timeline clarifies what each metric captures:
- Issue occurs - Something breaks in your system
- Detection - Monitoring identifies the problem (MTTD measures this)
- Alert fires - Notification sent to on-call engineers
- Acknowledgment - Someone confirms they are responding (MTTA measures this)
- Investigation - Team diagnoses root cause
- Resolution - Service restored (MTTR measures detection to resolution)
Each phase presents different challenges. Fast detection means nothing if alerts go unacknowledged. Quick acknowledgment does not help if investigation takes hours.
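To make the phases concrete, here is a minimal sketch of an incident record holding the four timestamps these metrics are derived from. The field names are illustrative only, not taken from any particular tool; the later examples in this article reuse this record.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Incident:
    """One incident's timeline. Field names are illustrative only."""
    occurred_at: datetime      # issue begins
    detected_at: datetime      # monitoring fires an alert
    acknowledged_at: datetime  # a responder confirms they are investigating
    resolved_at: datetime      # service restored and verified

    @property
    def detection_time(self) -> timedelta:
        # The window MTTD averages over.
        return self.detected_at - self.occurred_at

    @property
    def acknowledgment_time(self) -> timedelta:
        # The window MTTA averages over.
        return self.acknowledged_at - self.detected_at

    @property
    def resolution_time(self) -> timedelta:
        # The window MTTR averages over, measured from detection.
        return self.resolved_at - self.detected_at
```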
Mean Time to Detect (MTTD)
What it measures: The delay between when an issue begins and when your monitoring detects it.
MTTD reveals how quickly your monitoring identifies problems. Low MTTD means comprehensive monitoring catches issues before users notice. High MTTD means users report problems that your systems miss.
How to calculate:
MTTD = (Sum of detection times) / (Number of incidents)

Detection time starts when the issue actually occurs and ends when monitoring triggers an alert. If your API starts failing at 2:00 PM but your first alert fires at 2:12 PM, MTTD for that incident is 12 minutes.
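As a sketch, assuming the hypothetical Incident record introduced earlier, the calculation is a plain average:

```python
from datetime import timedelta

def mttd(incidents: list) -> timedelta:
    """Mean time to detect: average of (detected_at - occurred_at)."""
    total = sum((i.detected_at - i.occurred_at for i in incidents), timedelta())
    return total / len(incidents)

# For the API example above (failure at 2:00 PM, first alert at 2:12 PM),
# that incident contributes a 12-minute detection time to the average.
```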
Real-world example:
Your database becomes overloaded at 9:00 AM. Query performance degrades immediately, but your monitoring checks database connections every 5 minutes. The check at 9:04 AM succeeds because connections are still being accepted. The check at 9:09 AM detects slow queries and fires an alert. MTTD: 9 minutes.
What good looks like:
- Critical services: Under 5 minutes
- Important services: Under 15 minutes
- Non-critical services: Under 30 minutes
Common detection delays:
Monitoring checks run too infrequently. With a 5-minute check interval, detection can lag the failure by up to 5 minutes, and averages about 2.5 minutes of delay from check timing alone.
Thresholds set too high. Alerting only when error rates exceed 50 percent means significant degradation happens before detection. Lower thresholds catch problems earlier.
Missing monitoring coverage. If you monitor server CPU but not database query performance, database issues go undetected until servers become overloaded.
Mean Time to Acknowledge (MTTA)
What it measures: The gap between alert notification and human acknowledgment.
MTTA tracks how quickly someone responds after monitoring detects an issue. Low MTTA indicates engineers receive and act on alerts promptly. High MTTA suggests alert fatigue, notification problems, or on-call coverage gaps.
How to calculate:
MTTA = (Sum of acknowledgment times) / (Number of incidents)

Acknowledgment time starts when the alert fires and ends when a responder acknowledges they are investigating. If an alert triggers at 3:00 AM but the on-call engineer acknowledges at 3:07 AM, MTTA is 7 minutes.
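The calculation follows the same pattern, again assuming the hypothetical Incident record sketched earlier:

```python
from datetime import timedelta

def mtta(incidents: list) -> timedelta:
    """Mean time to acknowledge: average of (acknowledged_at - detected_at)."""
    total = sum((i.acknowledged_at - i.detected_at for i in incidents), timedelta())
    return total / len(incidents)

# For the 3:00 AM example above, that incident contributes 7 minutes.
```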
Real-world example:
Your monitoring detects elevated error rates at 11:30 PM on a Friday. The alert pages the on-call engineer. They are asleep, and their phone is on silent. The escalation policy triggers after 10 minutes, paging the secondary on-call. They acknowledge at 11:43 PM. MTTA: 13 minutes.
What good looks like:
- Critical incidents: Under 5 minutes
- High-priority incidents: Under 10 minutes
- Medium-priority incidents: Under 20 minutes
Common acknowledgment delays:
Alert fatigue from false positives. When most alerts are noise, engineers ignore notifications. They check alerts on their own schedule rather than responding immediately.
Notification routing failures. Alerts sent to distribution lists where nobody feels ownership. Alerts sent to Slack channels during off-hours when engineers are offline. Paging systems misconfigured with incorrect contact information.
Unclear on-call schedules. Engineers unsure if they are on-call. Handoff times misaligned across teams. No escalation when primary responders are unavailable.
Mean Time to Resolve (MTTR)
What it measures: The complete duration from issue detection to full restoration.
MTTR captures your total incident response effectiveness. It includes detection, acknowledgment, investigation, implementation, and verification. Low MTTR means quick end-to-end response. High MTTR indicates bottlenecks anywhere in the process.
How to calculate:
MTTR = (Sum of resolution times) / (Number of incidents)

Resolution time starts when monitoring detects the issue and ends when service is fully restored and verified. If detection happens at 10:00 AM and service restoration completes at 10:52 AM, MTTR is 52 minutes.
Note that MTTR typically starts from detection, not when the issue actually began. Some teams calculate from issue occurrence, making MTTR equal to MTTD plus resolution time.
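Both conventions can be made explicit in code. This sketch again assumes the hypothetical Incident record from earlier; the from_occurrence flag is simply an illustrative way to switch between them:

```python
from datetime import timedelta

def mttr(incidents: list, from_occurrence: bool = False) -> timedelta:
    """Mean time to resolve.

    By default, resolution time is measured from detection, matching the
    definition above. Pass from_occurrence=True to measure from when the
    issue actually began, which folds detection delay (MTTD) into MTTR.
    """
    def resolution_time(i):
        start = i.occurred_at if from_occurrence else i.detected_at
        return i.resolved_at - start

    total = sum((resolution_time(i) for i in incidents), timedelta())
    return total / len(incidents)
```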
Real-world example:
Your payment processing service starts failing at 2:15 PM. Monitoring detects the issue at 2:18 PM. An engineer acknowledges at 2:23 PM. Investigation reveals database connection pool exhaustion. The team increases the pool size and restarts affected services. Service is fully restored at 3:05 PM. MTTR: 47 minutes (from detection to resolution).
What good looks like:
MTTR varies dramatically by incident type, team size, and system complexity. Rather than comparing to external benchmarks, track your own baseline and measure improvement:
- Month 1: Establish baseline (example: 85 minutes average)
- Month 3: Target 20 percent improvement (68 minutes)
- Month 6: Target 40 percent improvement (51 minutes)
Breaking down MTTR by severity:
Different severity levels have different resolution expectations. Critical incidents affecting all users demand faster response than minor issues impacting limited functionality:
- Severity 1 (Critical outage): Under 30 minutes
- Severity 2 (Major degradation): Under 60 minutes
- Severity 3 (Moderate impact): Under 120 minutes
- Severity 4 (Minor issues): Under 240 minutes
How These Metrics Connect
MTTD, MTTA, and MTTR are not independent metrics. They reveal different bottlenecks in your response process.
The complete timeline:
Issue Start → [MTTD] → Detection → [MTTA] → Acknowledgment → [Investigation + Fix] → Resolution
When calculated from issue occurrence, MTTR includes both MTTD and MTTA plus investigation and resolution time; when calculated from detection, as defined above, it excludes MTTD. Either way, the breakdown shows where the bottleneck sits: if MTTR is high but MTTD and MTTA are low, your bottleneck is investigation or implementation. If MTTD is high but MTTA and resolution are fast, improve monitoring.
Example breakdown:
Consider an incident with a 45-minute MTTR, measured from issue occurrence:
- MTTD: 8 minutes (issue starts to alert fires)
- MTTA: 5 minutes (alert fires to engineer acknowledges)
- Investigation: 20 minutes (diagnosis and solution identification)
- Implementation: 12 minutes (applying fix and verifying restoration)
This breakdown shows most time spent on investigation. Runbooks addressing common failure modes could reduce MTTR significantly.
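A breakdown like this can be read straight off the timestamps. The sketch below reuses the hypothetical Incident record from earlier; investigation and implementation are lumped together because that record carries no separate fix-started timestamp:

```python
from datetime import timedelta

def phase_breakdown(incident) -> dict:
    """Split one incident's duration into the phases discussed above."""
    return {
        "detection (MTTD)": incident.detected_at - incident.occurred_at,
        "acknowledgment (MTTA)": incident.acknowledged_at - incident.detected_at,
        "investigation + fix": incident.resolved_at - incident.acknowledged_at,
    }

def biggest_bottleneck(incident) -> str:
    """Name the phase that consumed the most time."""
    phases = phase_breakdown(incident)
    return max(phases, key=phases.get)
```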
Which Metric Should You Focus On?
The answer depends on where your response process struggles.
Focus on MTTD when:
- Users report issues before monitoring alerts
- Significant degradation occurs before detection
- Check intervals are infrequent
- Monitoring covers infrastructure but not user experience
- Alert thresholds allow problems to grow before triggering
Focus on MTTA when:
- Alerts fire but sit unacknowledged for extended periods
- False positive rates are high
- On-call schedules have gaps or unclear ownership
- Notification routing sends alerts to wrong people
- Engineers miss alerts during off-hours
Focus on MTTR when:
- Detection and acknowledgment are fast but total resolution is slow
- Investigation takes longer than fixing
- Responders lack troubleshooting guidance
- Similar incidents recur without systematic prevention
- No clear runbooks exist for common failures
Most teams should track all three metrics. The breakdowns reveal where to invest improvement effort.
Tracking These Metrics in Practice
Manual metric tracking fails during 3 AM incidents. Engineers resolving production outages will not remember to log detection timestamps, acknowledgment times, and resolution durations.
Modern incident management platforms automate metric collection. When monitoring detects an issue and creates an incident, the detection time is recorded. When an engineer acknowledges, the acknowledgment timestamp is captured. When the status changes to resolved, the resolution time is logged.
Platforms like Upstat track these metrics automatically throughout incident lifecycles. The system records when incidents are created (detection), when responders acknowledge (acknowledgment), and when incidents close (resolution). This provides accurate duration tracking with breakdowns by severity level, time period, and incident type without manual data entry.
Built-in analytics show MTTR trends over time, acknowledgment patterns by team, and detection effectiveness by service. Teams identify improvement opportunities through actual data rather than assumptions.
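A per-severity breakdown could be produced along these lines. This is a generic sketch, not Upstat's actual implementation; the severity attribute is assumed to exist on each incident record:

```python
from collections import defaultdict
from datetime import timedelta

def mttr_by_severity(incidents: list) -> dict:
    """Average resolution time (from detection) grouped by severity level."""
    buckets = defaultdict(list)
    for i in incidents:
        buckets[i.severity].append(i.resolved_at - i.detected_at)
    return {
        severity: sum(times, timedelta()) / len(times)
        for severity, times in buckets.items()
    }
```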
Moving From Numbers to Improvement
Tracking metrics accomplishes nothing without action. Use these measurements to drive specific improvements:
When MTTD is high:
- Add monitoring for user-facing functionality, not just infrastructure
- Reduce check intervals from 5 minutes to 30 seconds for critical services
- Lower alert thresholds to catch degradation before complete failure
- Implement synthetic monitoring that simulates user actions
- Add real user monitoring to detect actual customer impact
When MTTA is high:
- Review false positive rate and improve alert accuracy
- Verify notification routing sends alerts to on-call engineers
- Test paging systems to confirm delivery
- Clarify on-call schedules with automatic handoffs
- Implement escalation policies for unacknowledged alerts
- Use incident simulation to validate notification channels
When MTTR is high despite good MTTD and MTTA:
- Create runbooks for common incident types
- Document troubleshooting procedures with decision trees
- Build automation for repetitive resolution tasks
- Conduct post-incident reviews to identify preventable recurrences
- Track which incidents consume the most resolution time
- Train additional team members on incident response
The Metrics That Actually Matter
Stop tracking vanity metrics. Stop creating dashboards nobody uses. Start measuring the three numbers that reveal incident response effectiveness: MTTD shows monitoring quality, MTTA reveals alerting health, and MTTR indicates overall response capability.
Track them consistently. Break them down by severity and incident type. Review trends monthly. Use them to identify specific bottlenecks.
Understanding what these metrics measure and how they connect transforms incident response from reactive firefighting into systematic improvement. You cannot improve what you do not measure, but measuring the wrong things leads nowhere.
Measure detection, acknowledgment, and resolution. Then fix the phase that creates the biggest bottleneck.
Explore In Upstat
Track incident metrics automatically with built-in duration tracking, acknowledgment timestamps, and analytics showing MTTR trends by severity and time period.
