Automation Best Practices
Guidelines for building effective and maintainable automations.
Naming
Use descriptive names
Names should explain what the automation does.
| Good | Less Clear |
|---|---|
| Create P1 for Production API Down | Automation 1 |
| Notify On-Call for Heartbeat Failure | New automation |
| Escalate Unacked Incidents | Alert handler |
Include key details
Consider including:
- The trigger type
- Any key conditions
- The primary action
Starting Simple
Begin with basic automations
Start with straightforward rules before adding complexity:
- Single trigger
- No conditions (or one simple condition)
- One action
Example progression:
- Week 1: Monitor down → Create incident
- Week 2: Add condition for production monitors only
- Week 3: Add delayed notification action
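
The progression above can be sketched as three versions of one automation definition. This is an illustration only: the dictionary layout and the field names (`trigger`, `conditions`, `actions`, `operator`) are assumptions made for the example, not the product's actual configuration schema, and the 5-minute delay is a placeholder value.

```python
# Hypothetical automation definitions; the field names are illustrative only.

# Week 1: single trigger, no conditions, one action.
week1 = {
    "name": "Create Incident for Monitor Down",
    "trigger": "monitor_down",
    "conditions": [],
    "actions": [{"type": "create_incident"}],
}

# Week 2: same automation, narrowed to production monitors.
week2 = {
    **week1,
    "name": "Create Incident for Production Monitor Down",
    "conditions": [
        {"field": "name", "operator": "contains", "value": "Production"},
    ],
}

# Week 3: append a delay followed by a notification.
week3 = {
    **week2,
    "actions": week2["actions"] + [
        {"type": "delay", "minutes": 5},  # placeholder duration
        {"type": "send_notification"},
    ],
}

for label, automation in (("Week 1", week1), ("Week 2", week2), ("Week 3", week3)):
    print(label, len(automation["conditions"]), "condition(s),",
          len(automation["actions"]), "action(s)")
```

Each step changes one thing, so an unexpected result can be traced to the most recent change.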
Conditions
Be specific
Use conditions to prevent false triggers and unwanted noise.
| Approach | Result |
|---|---|
| No conditions | Every monitor triggers the automation |
| name contains “Production” | Only production monitors |
| monitorType == HTTP | Only HTTP monitors |
Combine thoughtfully
When using multiple conditions, consider whether you need AND (all must match) or OR (any can match).
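
The difference between AND and OR is easiest to see on a concrete event. The sketch below is a minimal model, not the product's condition engine: the event is assumed to be a plain dictionary with `name` and `monitorType` keys, and the helper names (`name_contains`, `monitor_type_is`, `match_all`, `match_any`) are hypothetical.

```python
from typing import Callable

# Assumed event shape: a dict with "name" and "monitorType" keys.
Condition = Callable[[dict], bool]


def name_contains(text: str) -> Condition:
    """Condition: the monitor name contains the given text (case-sensitive)."""
    return lambda event: text in event.get("name", "")


def monitor_type_is(expected: str) -> Condition:
    """Condition: the monitor type equals the given value."""
    return lambda event: event.get("monitorType") == expected


def match_all(*conditions: Condition) -> Condition:
    """AND: every condition must match."""
    return lambda event: all(c(event) for c in conditions)


def match_any(*conditions: Condition) -> Condition:
    """OR: at least one condition must match."""
    return lambda event: any(c(event) for c in conditions)


prod_and_http = match_all(name_contains("Production"), monitor_type_is("HTTP"))
prod_or_http = match_any(name_contains("Production"), monitor_type_is("HTTP"))

staging_http = {"name": "Staging API", "monitorType": "HTTP"}
print(prod_and_http(staging_http))  # False: the name does not contain "Production"
print(prod_or_http(staging_http))   # True: the type is HTTP
```

With OR, the staging monitor still triggers because one condition matches. If the intent is "production HTTP monitors only", AND is the right combination.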
Actions
Order matters
Actions execute sequentially. Put the most critical action first.
Example order:
1. Create incident (document immediately)
2. Set delay (allow auto-recovery time)
3. Send notification (escalate if still down)
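
The example order can be modelled as a short sequential run. This is a sketch under stated assumptions: `run_actions`, `monitor_is_down`, and the injected `sleep` parameter are hypothetical names, and the check that skips the notification after recovery is one reading of "escalate if still down", not confirmed product behavior.

```python
import time
from typing import Callable, List


def run_actions(
    monitor_is_down: Callable[[], bool],
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run the three actions in order and return the names of those executed."""
    executed = []

    # 1. Most critical action first: the incident exists even if later steps fail.
    executed.append("create_incident")

    # 2. Delay: give the monitor time to recover on its own.
    sleep(delay_seconds)
    executed.append("delay")

    # 3. Assumption: notify only if the monitor is still down after the delay.
    if monitor_is_down():
        executed.append("send_notification")

    return executed


# Usage: pass a no-op sleep so the example finishes instantly.
still_down = run_actions(lambda: True, delay_seconds=300, sleep=lambda s: None)
recovered = run_actions(lambda: False, delay_seconds=300, sleep=lambda s: None)
print(still_down)  # ['create_incident', 'delay', 'send_notification']
print(recovered)   # ['create_incident', 'delay']
```

Because the incident is created before the delay, the outage is documented even when the monitor recovers and no notification goes out.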
Use delays wisely
Delays help prevent:
- Alert fatigue from brief outages
- Premature escalation
- Notification storms
Testing
Use draft status
Keep an automation in draft status while you develop it. Publish it only after you have verified its conditions and actions.
Verify conditions
Ensure conditions filter correctly:
- Test with events that should trigger
- Test with events that should not trigger
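
Both kinds of test can be written down as two lists of sample events and checked together. The sketch below assumes events are plain dictionaries and that the condition under test is "name contains Production AND monitorType is HTTP"; the function names and sample events are hypothetical.

```python
def production_http_only(event: dict) -> bool:
    """Condition under test: production HTTP monitors only."""
    return "Production" in event.get("name", "") and event.get("monitorType") == "HTTP"


# Events that should trigger the automation.
SHOULD_TRIGGER = [
    {"name": "Production API", "monitorType": "HTTP"},
]

# Events that should not trigger it, including an event with a missing name.
SHOULD_NOT_TRIGGER = [
    {"name": "Staging API", "monitorType": "HTTP"},
    {"name": "Production DB", "monitorType": "TCP"},
    {"monitorType": "HTTP"},
]


def verify_condition(condition, should_trigger, should_not_trigger):
    """Return a list of (problem, event) pairs; an empty list means all checks passed."""
    failures = []
    for event in should_trigger:
        if not condition(event):
            failures.append(("missed", event))
    for event in should_not_trigger:
        if condition(event):
            failures.append(("false trigger", event))
    return failures


failures = verify_condition(production_http_only, SHOULD_TRIGGER, SHOULD_NOT_TRIGGER)
print(failures)  # []
```

The negative cases matter as much as the positive ones: a condition that is too broad passes every "should trigger" test and only fails on the "should not trigger" list.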
Check actions
Verify action configuration:
- Correct recipients for notifications
- Appropriate severity for incidents
- Reasonable delay durations
Maintenance
Review regularly
Periodically review automations to ensure they:
- Still match your operational needs
- Use current team members as recipients
- Reference active monitors
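
Part of this review can be done mechanically if you keep a list of your automations alongside the current team roster and active monitors. The sketch below is a minimal example, assuming each automation is a dictionary with hypothetical `recipients` and `monitors` fields; it does not call any real API.

```python
def audit_automations(automations, current_members, active_monitors):
    """Return {automation name: [problems]} for automations that need attention."""
    findings = {}
    for automation in automations:
        problems = []
        for recipient in automation.get("recipients", []):
            if recipient not in current_members:
                problems.append(f"recipient no longer on team: {recipient}")
        for monitor in automation.get("monitors", []):
            if monitor not in active_monitors:
                problems.append(f"monitor not active: {monitor}")
        if problems:
            findings[automation["name"]] = problems
    return findings


# Hypothetical inventory used only for illustration.
automations = [
    {
        "name": "Notify On-Call for Heartbeat Failure",
        "recipients": ["alice", "bob"],
        "monitors": ["heartbeat-worker"],
    },
    {
        "name": "Create P1 for Production API Down",
        "recipients": ["alice"],
        "monitors": ["prod-api"],
    },
]

findings = audit_automations(
    automations,
    current_members={"alice"},
    active_monitors={"prod-api"},
)
for name, problems in findings.items():
    print(name, problems)
```

Whether an automation still matches your operational needs remains a judgment call, so a check like this supplements a manual review rather than replacing it.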
Update when things change
Update automations when:
- Team structure changes
- New monitors are added
- Escalation policies change
Common Patterns
Tiered Response
Multiple automations with different conditions:
| Automation | Condition | Action |
|---|---|---|
| Critical Response | name contains “Production” | P1 incident + immediate notification |
| Standard Response | name contains “Staging” | P3 incident |
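
The table can be read as two independent automations, each evaluated against the same event. The following sketch models that reading; the `AUTOMATIONS` list, its field names, and the action labels are assumptions made for the example.

```python
# Hypothetical definitions mirroring the table above.
AUTOMATIONS = [
    {
        "name": "Critical Response",
        "name_contains": "Production",
        "actions": ["create_incident:P1", "send_notification"],
    },
    {
        "name": "Standard Response",
        "name_contains": "Staging",
        "actions": ["create_incident:P3"],
    },
]


def matching_automations(monitor_name: str) -> list:
    """Return the names of every automation whose condition matches the monitor."""
    return [a["name"] for a in AUTOMATIONS if a["name_contains"] in monitor_name]


print(matching_automations("Production API"))  # ['Critical Response']
print(matching_automations("Staging DB"))      # ['Standard Response']
print(matching_automations("Dev Sandbox"))     # []
```

Because each automation is evaluated on its own, a monitor whose name contains both words would match both tiers, and a monitor that matches neither gets no response. Both cases are worth checking when you choose naming conventions.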
Delayed Escalation
Single automation with delay:
1. Create incident
2. Wait 5 minutes
3. Send escalation notification
What to Avoid
Over-automation
Don’t automate everything immediately. Start with high-value, frequently occurring scenarios.
Complex chains
Keep action chains short. A chain of more than three or four actions may indicate that the work should be split into separate automations.
Broad triggers without conditions
A trigger with no conditions fires for every monitor. Add conditions to limit scope and reduce noise.
Related
- Automations Overview - Understanding automations
- Triggers & Conditions - Configure triggers
- Actions & Delays - Configure actions
- Examples - Sample automations