Alert Performance Report
The Alert Performance report helps you optimize your monitoring by analyzing alert effectiveness. Identify noisy alerts, improve detection times, and ensure critical issues are caught.
Report Sections
Overall Metrics
Key alert statistics for the period:
- Total Alerts - All alerts generated
- Average Detection Time - Time from issue start to alert
- Alert Rate - Average number of alerts per day
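If you want to sanity-check these figures against your own data, they are straightforward to reproduce. The sketch below assumes a plain list of alert records with hypothetical `started_at` and `detected_at` timestamps; the field names are illustrative, not the actual export schema.

```python
from datetime import datetime

# Hypothetical alert records; field names are illustrative only.
alerts = [
    {"started_at": datetime(2024, 5, 1, 9, 0), "detected_at": datetime(2024, 5, 1, 9, 3)},
    {"started_at": datetime(2024, 5, 2, 14, 0), "detected_at": datetime(2024, 5, 2, 14, 6)},
    {"started_at": datetime(2024, 5, 4, 22, 0), "detected_at": datetime(2024, 5, 4, 22, 2)},
]
period_days = 30  # length of the reporting period

total_alerts = len(alerts)

# Average detection time: mean gap between issue start and alert, in minutes.
detection_minutes = [
    (a["detected_at"] - a["started_at"]).total_seconds() / 60 for a in alerts
]
avg_detection = sum(detection_minutes) / len(detection_minutes)

# Alert rate: alerts per day over the reporting period.
alert_rate = total_alerts / period_days

print(f"Total alerts: {total_alerts}")
print(f"Average detection time: {avg_detection:.1f} min")
print(f"Alert rate: {alert_rate:.2f} alerts/day")
```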
Alert Volume Trends
A line chart showing daily alert counts over the last 30 days. Use this to:
- Identify alert storms
- Track improvements from tuning
- Spot patterns in alert frequency
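One way to spot alert storms programmatically is to compare each day's count against the typical daily volume. The sketch below assumes a hypothetical dict of daily counts (for example, built from the CSV export) and flags days well above the median; the 3x cut-off is arbitrary.

```python
from statistics import median

# Hypothetical daily alert counts keyed by date string.
daily_counts = {
    "2024-05-01": 4, "2024-05-02": 3, "2024-05-03": 41,
    "2024-05-04": 5, "2024-05-05": 2, "2024-05-06": 6,
}

baseline = median(daily_counts.values())

# Flag days well above the typical volume as candidate alert storms.
storm_days = {
    day: count for day, count in daily_counts.items() if count > 3 * baseline
}
print(f"Typical daily volume: {baseline}, possible alert storms: {storm_days}")
```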
Monitor Performance
Table showing metrics for each monitor:
- Monitor Name - The monitor generating alerts
- Alert Count - Total alerts from this monitor
- Failure Rate - Percentage of checks that failed
- Avg Detection Time - How quickly issues are detected
Sort by any column to find:
- The noisiest monitors
- Monitors with high failure rates
- Monitors with slow detection times
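The same sorting is easy to do on exported data. The rows and field names below are hypothetical stand-ins for the table columns above.

```python
# Hypothetical per-monitor rows, mirroring the table columns above.
monitors = [
    {"name": "api-health", "alert_count": 120, "failure_rate": 0.08, "avg_detection_min": 2.5},
    {"name": "db-replica", "alert_count": 14, "failure_rate": 0.22, "avg_detection_min": 9.0},
    {"name": "cdn-edge", "alert_count": 47, "failure_rate": 0.03, "avg_detection_min": 4.1},
]

# Noisiest monitors: highest alert counts first.
noisiest = sorted(monitors, key=lambda m: m["alert_count"], reverse=True)

# Slowest detection: longest average detection time first.
slowest = sorted(monitors, key=lambda m: m["avg_detection_min"], reverse=True)

print("Noisiest:", [m["name"] for m in noisiest])
print("Slowest detection:", [m["name"] for m in slowest])
```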
Alert Distribution
Visual breakdown showing:
- Alerts by monitor type
- Time of day distribution
- Severity levels
Using the Report
Identifying Noisy Alerts
Look for monitors with:
- High alert counts but low incident correlation
- Frequent flapping (up/down cycles)
- Alerts during known maintenance
These are candidates for tuning or removal.
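A rough way to surface flapping monitors from raw check history is to count up/down transitions. The sketch below assumes you have a recent sequence of check results per monitor; the cut-off of six transitions is illustrative, not a recommended value.

```python
def transition_count(check_results):
    """Count up/down state changes in a sequence of check results (True = up)."""
    return sum(1 for prev, cur in zip(check_results, check_results[1:]) if prev != cur)

# Hypothetical recent check results for two monitors.
history = {
    "api-health": [True, False, True, False, True, False, True, True, False, True] * 2,
    "db-replica": [True] * 18 + [False, False],
}

for name, results in history.items():
    flips = transition_count(results)
    if flips >= 6:  # illustrative cut-off
        print(f"{name}: {flips} state changes - likely flapping, consider tuning")
```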
Improving Detection Time
Fast detection is critical for minimizing impact:
- Review monitors with slow detection times
- Consider more frequent check intervals (see the timing sketch after this list)
- Adjust thresholds for earlier warning
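As a rule of thumb, worst-case detection time is roughly the check interval multiplied by the number of consecutive failures required before an alert fires; the exact timing depends on your monitoring configuration. A minimal sketch of that arithmetic:

```python
def worst_case_detection_minutes(check_interval_min, failures_before_alert):
    """Worst case: the issue begins just after a passing check, so the alert
    fires roughly interval * required_failures later."""
    return check_interval_min * failures_before_alert

# Tightening either knob shortens the worst-case detection time.
for interval, failures in [(5, 3), (1, 3), (1, 1)]:
    print(f"{interval} min checks, {failures} failures -> "
          f"<= {worst_case_detection_minutes(interval, failures)} min to alert")
```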
Exporting Data
Click Export to download a CSV with:
- Per-monitor statistics
- Daily alert volumes
- Detection time details
- Complete metrics for analysis
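A minimal sketch for summarizing the export, assuming hypothetical column names such as `monitor_name` and `alert_count` (adjust them to whatever the actual CSV contains):

```python
import csv
from collections import defaultdict

# Column names below are hypothetical; adjust them to match the actual export.
alerts_per_monitor = defaultdict(int)
with open("alert_performance_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        alerts_per_monitor[row["monitor_name"]] += int(row["alert_count"])

for name, count in sorted(alerts_per_monitor.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {count} alerts")
```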
Alert Optimization
Reducing Alert Fatigue
Common causes and solutions:
- Flapping services - Add retry logic or increase thresholds (see the retry sketch after this list)
- Known issues - Use maintenance windows
- Low-priority alerts - Adjust severity or disable
- Duplicate alerts - Consolidate similar monitors
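Retry logic usually means requiring several consecutive failed checks before alerting. A minimal sketch of that idea, not tied to any particular monitoring product:

```python
class ConsecutiveFailureGate:
    """Suppress flapping: only raise an alert after N consecutive failed checks."""

    def __init__(self, failures_before_alert=3):
        self.failures_before_alert = failures_before_alert
        self.streak = 0

    def record(self, check_passed):
        self.streak = 0 if check_passed else self.streak + 1
        return self.streak >= self.failures_before_alert  # True -> raise an alert

gate = ConsecutiveFailureGate(failures_before_alert=3)
for passed in [True, False, True, False, False, False]:
    if gate.record(passed):
        print("alert")  # fires only on the third consecutive failure
```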
Threshold Tuning
Use the data to:
- Set appropriate failure thresholds
- Adjust check frequencies
- Configure proper retry counts
- Balance sensitivity vs noise
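A back-of-envelope way to reason about the sensitivity-vs-noise trade-off: if each check has a small probability of a spurious failure and checks were independent, the expected number of false alerts per day drops sharply as the consecutive-failure threshold rises. Real checks are rarely independent, so treat this only as a way to compare settings, not a prediction.

```python
# Assumes independent checks with probability p_spurious of a spurious failure.
def expected_false_alerts_per_day(checks_per_day, p_spurious, failures_before_alert):
    return checks_per_day * (p_spurious ** failures_before_alert)

checks_per_day = 24 * 60  # one check per minute
for threshold in (1, 2, 3):
    est = expected_false_alerts_per_day(checks_per_day, 0.01, threshold)
    print(f"threshold={threshold}: ~{est:.3f} false alerts/day")
```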
Best Practices
Regular Review
- Weekly review of top alerting monitors
- Monthly trend analysis
- Quarterly threshold adjustments
- Document changes and impacts
Team Collaboration
- Share findings with service owners
- Get feedback on alert usefulness
- Coordinate tuning efforts
- Track improvement metrics
Interpreting Results
Good Performance Indicators
- Stable or decreasing alert volume
- Quick detection times (under 5 minutes)
- Low false positive rate
- Alerts correlate with real incidents
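Incident correlation is easiest to measure if you can join alert times against a list of confirmed incident windows. The sketch below uses hypothetical toy data and a 15-minute matching window; both are assumptions, not report fields.

```python
from datetime import datetime, timedelta

# Hypothetical data: alert times and confirmed incident windows.
alert_times = [
    datetime(2024, 5, 1, 9, 3),
    datetime(2024, 5, 2, 14, 6),
    datetime(2024, 5, 3, 8, 0),
]
incidents = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 10, 0)),
    (datetime(2024, 5, 2, 14, 0), datetime(2024, 5, 2, 15, 0)),
]

def matches_incident(alert_time, window=timedelta(minutes=15)):
    return any(start - window <= alert_time <= end + window for start, end in incidents)

true_positives = sum(matches_incident(t) for t in alert_times)
false_positive_rate = 1 - true_positives / len(alert_times)
print(f"False positive rate: {false_positive_rate:.0%}")  # 33% in this toy data
```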
Areas for Improvement
- Increasing alert trends without incidents
- Slow detection times for critical services
- High volume from specific monitors
- Alerts routinely ignored by the team
Taking Action
Immediate Steps
- Disable or tune the noisiest monitors
- Adjust thresholds on flapping services
- Add maintenance windows for known issues
- Review and update alert routing
Long-term Improvements
- Rework the overall monitoring strategy, not just individual thresholds
- Use composite alerts for complex scenarios (see the sketch after this list)
- Add business hours logic where appropriate
- Regular alert effectiveness reviews
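For illustration, a composite alert can be as simple as requiring two signals to breach at the same time instead of alerting on either alone. The signal names and thresholds below are hypothetical:

```python
# Illustrative composite condition: alert only when error rate and latency
# breach together, rather than on either signal alone.
def composite_alert(error_rate, p95_latency_ms,
                    error_threshold=0.05, latency_threshold_ms=800):
    return error_rate > error_threshold and p95_latency_ms > latency_threshold_ms

print(composite_alert(0.08, 950))  # True: both signals breach
print(composite_alert(0.08, 300))  # False: errors alone do not fire
```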