Alert Performance Report

The Alert Performance report helps you optimize your monitoring by analyzing alert effectiveness. Identify noisy alerts, improve detection times, and ensure critical issues are caught.

Report Sections

Overall Metrics

Key alert statistics for the period:

  • Total Alerts - All alerts generated
  • Average Detection Time - Average time from the start of an issue to its alert
  • Alert Rate - Average number of alerts per day
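
As a rough illustration of how these three figures relate, the sketch below computes them from a list of alert records. The `Alert` class, its field names, and the `overall_metrics` function are hypothetical and are not part of the product; the report may calculate its numbers differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    # Hypothetical record shape, used only for this illustration.
    monitor: str
    issue_started: datetime  # when the underlying issue began
    alerted_at: datetime     # when the alert fired


def overall_metrics(alerts, period_days):
    """Return total alerts, average detection time, and alerts per day."""
    total = len(alerts)
    if total == 0:
        return {"total_alerts": 0, "avg_detection_seconds": None, "alerts_per_day": 0.0}
    # Detection time is the gap between issue start and the alert firing.
    detection = [(a.alerted_at - a.issue_started).total_seconds() for a in alerts]
    return {
        "total_alerts": total,
        "avg_detection_seconds": sum(detection) / total,
        "alerts_per_day": total / period_days,
    }


start = datetime(2024, 1, 1, 12, 0, 0)
example = [
    Alert("api", start, start + timedelta(seconds=60)),
    Alert("db", start, start + timedelta(seconds=180)),
]
print(overall_metrics(example, period_days=30))
```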

A line chart shows daily alert counts over the last 30 days. Use it to:

  • Identify alert storms
  • Track improvements from tuning
  • Spot patterns in alert frequency
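
One way to flag possible alert storms in a series of daily counts is to compare each day against the median. The sketch below assumes that approach; the function name, the `factor` multiplier, and the `min_alerts` floor are illustrative choices, not settings from the report.

```python
from statistics import median


def find_alert_storms(daily_counts, factor=3.0, min_alerts=10):
    """Return the indexes of days whose alert count far exceeds the median.

    factor and min_alerts are assumed values; tune them to your own data.
    """
    if not daily_counts:
        return []
    baseline = median(daily_counts)
    # The floor keeps quiet periods from flagging every small bump.
    threshold = max(baseline * factor, min_alerts)
    return [day for day, count in enumerate(daily_counts) if count > threshold]


print(find_alert_storms([4, 5, 6, 5, 40, 5, 4]))
```

The median is used instead of the mean so that a single storm day does not raise the baseline it is compared against.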

Monitor Performance

A table shows metrics for each monitor:

  • Monitor Name - The monitor generating alerts
  • Alert Count - Total alerts from this monitor
  • Failure Rate - Percentage of checks that failed
  • Avg Detection Time - How quickly issues are detected

Sort by any column to find:

  • The noisiest monitors
  • Monitors with high failure rates
  • Monitors with slow detection times
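
The table columns and the sorting described above can be sketched as follows. Failure rate is taken here as failed checks divided by total checks, matching the column description; the input dictionary keys and both function names are assumptions made for the example.

```python
def monitor_rows(stats):
    """Build one table row per monitor from raw per-monitor counters."""
    rows = []
    for name, s in stats.items():
        checks = s["checks"]
        # Failure rate: percentage of checks that failed.
        rate = 100.0 * s["failed_checks"] / checks if checks else 0.0
        rows.append({
            "monitor": name,
            "alert_count": s["alerts"],
            "failure_rate": rate,
            "avg_detection_seconds": s["avg_detection_seconds"],
        })
    return rows


def sort_rows(rows, column, descending=True):
    """Sort rows by any column, highest value first by default."""
    return sorted(rows, key=lambda r: r[column], reverse=descending)


# Illustrative numbers only.
stats = {
    "api": {"checks": 1000, "failed_checks": 50, "alerts": 12, "avg_detection_seconds": 90},
    "db": {"checks": 200, "failed_checks": 2, "alerts": 30, "avg_detection_seconds": 400},
}
rows = monitor_rows(stats)
print([r["monitor"] for r in sort_rows(rows, "alert_count")])
```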

Alert Distribution

A visual breakdown of alerts by:

  • Monitor type
  • Time of day
  • Severity level

Using the Report

Identifying Noisy Alerts

Look for monitors with:

  • High alert counts but low incident correlation
  • Frequent flapping (up/down cycles)
  • Alerts during known maintenance

These are candidates for tuning or removal.
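
Flapping can be measured by counting how often a monitor changes state within a window of recent checks. The following is a minimal sketch under that assumption; the function names and the `max_transitions` limit are hypothetical.

```python
def count_transitions(states):
    """Count state changes between consecutive check results."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


def is_flapping(states, max_transitions=4):
    """Treat a monitor as flapping when it changes state too often.

    max_transitions is an assumed limit; choose one that fits your window size.
    """
    return count_transitions(states) > max_transitions


recent = ["up", "down", "up", "down", "up", "down", "up"]
print(count_transitions(recent), is_flapping(recent))
```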

Improving Detection Time

Fast detection minimizes the impact of an issue. To improve it:

  • Review monitors with slow detection times
  • Consider more frequent check intervals
  • Adjust thresholds for earlier warning

Exporting Data

Click Export to download a CSV with:

  • Per-monitor statistics
  • Daily alert volumes
  • Detection time details
  • Complete metrics for analysis
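
Once exported, the CSV can be analyzed with a short script. The sketch below lists monitors whose average detection time exceeds five minutes. The column names `monitor_name` and `avg_detection_seconds` and the sample rows are made up for illustration; check the header row of your own export and adjust accordingly.

```python
import csv
import io

# Made-up sample standing in for an exported file; the real export's
# column names and layout may differ.
SAMPLE = """monitor_name,alert_count,avg_detection_seconds
api-health,12,95
db-replica,30,410
"""


def slow_monitors(csv_file, max_seconds=300):
    """Return names of monitors whose average detection time exceeds max_seconds."""
    reader = csv.DictReader(csv_file)
    return [
        row["monitor_name"]
        for row in reader
        if float(row["avg_detection_seconds"]) > max_seconds
    ]


print(slow_monitors(io.StringIO(SAMPLE)))
```

To run it against a real file, pass an open file object instead, for example `open("export.csv", newline="")`, where the file name is a placeholder.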

Alert Optimization

Reducing Alert Fatigue

Common causes and solutions:

  • Flapping services - Add retry logic or increase thresholds
  • Known issues - Use maintenance windows
  • Low-priority alerts - Lower the severity or disable them
  • Duplicate alerts - Consolidate similar monitors
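
Retry logic for a flapping service usually means alerting only after several consecutive failed checks. A minimal sketch of that idea follows; the function name and the default of three retries are assumptions, not product defaults.

```python
def should_alert(results, retries=3):
    """Alert only when the last `retries` checks all failed.

    results is a list of booleans, oldest first, where True means the check passed.
    """
    if len(results) < retries:
        return False
    # Any pass within the window resets the failure streak.
    return not any(results[-retries:])


print(should_alert([True, False, False, False]))  # three failures in a row
print(should_alert([False, True, False, False]))  # streak interrupted by a pass
```

A higher retry count suppresses more one-off failures but delays detection by the same number of check intervals.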

Threshold Tuning

Use the data to:

  • Set appropriate failure thresholds
  • Adjust check frequencies
  • Configure proper retry counts
  • Balance sensitivity against noise
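
One way to weigh sensitivity against noise is to replay historical measurements against a candidate threshold and count false positives and missed incidents. The sketch below assumes you have past values labeled with whether a real incident was in progress; the function name, data shape, and numbers are illustrative.

```python
def evaluate_threshold(samples, threshold):
    """Replay labeled history against one candidate threshold.

    samples is a list of (value, was_real_incident) pairs.
    """
    fired = [(value, incident) for value, incident in samples if value > threshold]
    true_positives = sum(1 for _, incident in fired if incident)
    missed = sum(1 for value, incident in samples if incident and value <= threshold)
    return {
        "alerts": len(fired),
        "false_positives": len(fired) - true_positives,
        "missed_incidents": missed,
    }


# Illustrative response times in milliseconds, labeled by hand.
history = [(120, False), (450, False), (900, True), (1500, True), (300, False)]
for candidate in (400, 1000):
    print(candidate, evaluate_threshold(history, candidate))
```

In this made-up data, the lower threshold catches both incidents at the cost of one false positive, while the higher one is silent on noise but misses an incident.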

Best Practices

Regular Review

  • Weekly review of top alerting monitors
  • Monthly trend analysis
  • Quarterly threshold adjustments
  • Document changes and impacts

Team Collaboration

  • Share findings with service owners
  • Get feedback on alert usefulness
  • Coordinate tuning efforts
  • Track improvement metrics

Interpreting Results

Good Performance Indicators

  • Stable or decreasing alert volume
  • Quick detection times (under 5 minutes)
  • Low false positive rate
  • Alerts correlate with real incidents

Areas for Improvement

  • Rising alert volume without corresponding incidents
  • Slow detection times for critical services
  • High volume from specific monitors
  • Alerts that the team routinely ignores

Taking Action

Immediate Steps

  1. Disable or tune the noisiest monitors
  2. Adjust thresholds on flapping services
  3. Add maintenance windows for known issues
  4. Review and update alert routing

Long-term Improvements

  • Implement better monitoring strategies
  • Use composite alerts for complex scenarios
  • Add business hours logic where appropriate
  • Schedule regular alert effectiveness reviews
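
Business hours logic can be as simple as paging for non-critical alerts only during working hours and queuing them otherwise. The sketch below assumes a Monday to Friday, 09:00 to 17:00 schedule in the timestamp's own time zone; the function names, severity labels, and routing targets are hypothetical.

```python
from datetime import datetime


def in_business_hours(ts, start_hour=9, end_hour=17):
    """True on weekdays between start_hour (inclusive) and end_hour (exclusive)."""
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour


def route(severity, ts):
    """Page immediately for critical alerts; queue others outside business hours."""
    if severity == "critical" or in_business_hours(ts):
        return "page"
    return "queue"


saturday_morning = datetime(2024, 1, 6, 10, 0)
print(route("low", saturday_morning), route("critical", saturday_morning))
```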