Automation Best Practices

Guidelines for building effective and maintainable automations.

Naming

Use descriptive names

Names should explain what the automation does.

Good                                  | Less Clear
Create P1 for Production API Down     | Automation 1
Notify On-Call for Heartbeat Failure  | New automation
Escalate Unacked Incidents            | Alert handler

Include key details

Consider including:

  • The trigger type
  • Any key conditions
  • The primary action
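
One way to keep names consistent is to compose them from the parts listed above. The following is a minimal sketch, assuming a hypothetical helper `automation_name` that is not part of any product API; the "<action> for <condition> <trigger>" pattern is simply inferred from the good examples in the table.

```python
from typing import Optional


def automation_name(action: str, trigger: str, condition: Optional[str] = None) -> str:
    """Compose a name as '<action> for [<condition>] <trigger>'.

    Hypothetical helper for illustration only.
    """
    scope = f"{condition} {trigger}" if condition else trigger
    return f"{action} for {scope}"


print(automation_name("Create P1", "API Down", condition="Production"))
print(automation_name("Notify On-Call", "Heartbeat Failure"))
```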

Starting Simple

Begin with basic automations

Start with straightforward rules before adding complexity:

  1. Single trigger
  2. No conditions (or one simple condition)
  3. One action

Example progression:

  1. Week 1: Monitor down → Create incident
  2. Week 2: Add condition for production monitors only
  3. Week 3: Add delayed notification action
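
The progression above can be pictured as one definition that grows a field at a time. This is only a sketch: the `Automation` class, its field names, and the string values such as `monitor_down` and `delay:5m` are assumptions made for illustration, not the product's actual schema.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class Automation:
    # Hypothetical schema; field names are assumptions.
    name: str
    trigger: str
    conditions: List[str] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)


# Week 1: single trigger, no conditions, one action.
week1 = Automation(
    name="Create Incident for Monitor Down",
    trigger="monitor_down",
    actions=["create_incident"],
)

# Week 2: narrow the scope to production monitors.
week2 = replace(week1, conditions=['name contains "Production"'])

# Week 3: add a delayed notification after the incident is created.
week3 = replace(week2, actions=week2.actions + ["delay:5m", "send_notification"])

for label, automation in [("Week 1", week1), ("Week 2", week2), ("Week 3", week3)]:
    print(label, automation.conditions, automation.actions)
```

Each step changes one thing, so any new noise or missed event can be traced to the most recent change.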

Conditions

Be specific

Use conditions to prevent false triggers and unwanted noise.

Approach                    | Result
No conditions               | Every monitor triggers
name contains “Production”  | Only production monitors
monitorType == HTTP         | Only HTTP monitors

Combine thoughtfully

When using multiple conditions, consider whether you need AND (all must match) or OR (any can match).
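
The difference between AND and OR is easiest to see on a single event. The sketch below assumes events are plain dictionaries with `name` and `monitorType` keys; the helper names and the `TCP` value are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]
Condition = Callable[[Event], bool]


def name_contains(text: str) -> Condition:
    return lambda event: text in event.get("name", "")


def monitor_type_is(kind: str) -> Condition:
    return lambda event: event.get("monitorType") == kind


def match_all(conditions: List[Condition], event: Event) -> bool:
    """AND: every condition must match."""
    return all(cond(event) for cond in conditions)


def match_any(conditions: List[Condition], event: Event) -> bool:
    """OR: at least one condition must match."""
    return any(cond(event) for cond in conditions)


conditions = [name_contains("Production"), monitor_type_is("HTTP")]
event = {"name": "Production DB", "monitorType": "TCP"}

print(match_all(conditions, event))  # False: the type is not HTTP
print(match_any(conditions, event))  # True: the name matches
```

In this sketch an empty condition list matches every event under AND, which mirrors the "No conditions" row in the table above.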

Actions

Order matters

Actions execute sequentially. Put the most critical action first.

Example order:

  1. Create incident (document immediately)
  2. Set delay (allow auto-recovery time)
  3. Send notification (escalate if still down)
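
The example order above can be sketched as a small runner. This is an illustration under assumptions: `run_actions`, `is_still_down`, and the injected `sleep` function are hypothetical names, and whether the real product rechecks monitor status after a delay should be confirmed in its documentation.

```python
from typing import Callable, List


def run_actions(
    is_still_down: Callable[[], bool],
    delay_seconds: float,
    sleep: Callable[[float], None],
) -> List[str]:
    """Run a three-step chain in order and return a log of what ran."""
    log: List[str] = []

    # 1. Most critical action first: the incident is recorded even if
    #    a later step is slow or fails.
    log.append("create_incident")

    # 2. Give the monitor time to recover on its own.
    sleep(delay_seconds)
    log.append(f"delay:{delay_seconds}s")

    # 3. Escalate only if the problem persists after the delay.
    if is_still_down():
        log.append("send_notification")
    return log


# A no-op sleep keeps the example instant; pass time.sleep for a real wait.
print(run_actions(is_still_down=lambda: True, delay_seconds=300, sleep=lambda s: None))
print(run_actions(is_still_down=lambda: False, delay_seconds=300, sleep=lambda s: None))
```

The second call shows the benefit of the delay: the monitor recovered within the window, so no notification is sent, yet the incident is still on record.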

Use delays wisely

Delays help prevent:

  • Alert fatigue from brief outages
  • Premature escalation
  • Notification storms

Testing

Use draft status

Keep automations in draft while developing. Only publish when ready.

Verify conditions

Ensure conditions filter correctly:

  • Test with events that should trigger
  • Test with events that should not trigger
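
Both kinds of test can be kept together as a small table of sample events and expected outcomes. The sketch below assumes a hypothetical `should_trigger` function standing in for a "name contains Production" condition. It is case-sensitive by construction; how the real condition treats case is not stated here and is worth checking.

```python
def should_trigger(event: dict) -> bool:
    """Hypothetical condition under test: name contains "Production"."""
    return "Production" in event.get("name", "")


# Each case pairs a sample event with the expected outcome.
cases = [
    ({"name": "Production API"}, True),    # should trigger
    ({"name": "Staging API"}, False),      # should not trigger
    ({"name": "production api"}, False),   # case differs; this sketch is case-sensitive
    ({}, False),                           # event with no name
]

failures = [(event, expected) for event, expected in cases if should_trigger(event) != expected]
print("all cases pass" if not failures else f"failing cases: {failures}")
```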

Check actions

Verify action configuration:

  • Correct recipients for notifications
  • Appropriate severity for incidents
  • Reasonable delay durations

Maintenance

Review regularly

Periodically review automations to ensure they:

  • Still match your operational needs
  • Use current team members as recipients
  • Reference active monitors
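
Part of this review can be scripted if automation definitions can be exported. The sketch below assumes each automation is a dictionary with `name`, `recipients`, and `monitors` keys; that shape, the `audit` function, and the sample data are all hypothetical.

```python
from typing import Dict, List, Set


def audit(automations: List[Dict], current_members: Set[str], active_monitors: Set[str]) -> List[str]:
    """Return findings for recipients or monitors that are no longer valid."""
    findings = []
    for automation in automations:
        for recipient in automation.get("recipients", []):
            if recipient not in current_members:
                findings.append(f'{automation["name"]}: recipient "{recipient}" is not a current team member')
        for monitor in automation.get("monitors", []):
            if monitor not in active_monitors:
                findings.append(f'{automation["name"]}: monitor "{monitor}" is not active')
    return findings


automations = [
    {
        "name": "Notify On-Call for Heartbeat Failure",
        "recipients": ["alice", "bob"],
        "monitors": ["heartbeat-1"],
    }
]

for finding in audit(automations, current_members={"alice"}, active_monitors={"heartbeat-1"}):
    print(finding)
```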

Update when things change

Update automations when:

  • Team structure changes
  • New monitors are added
  • Escalation policies change

Common Patterns

Tiered Response

Multiple automations with different conditions:

Automation         | Condition                   | Action
Critical Response  | name contains “Production”  | P1 incident + immediate notification
Standard Response  | name contains “Staging”     | P3 incident
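
The table above can be read as two independent rules evaluated against the same monitor name. The following sketch uses a hypothetical `TIERS` list and `plan_response` function; the action strings are placeholders, not real action identifiers.

```python
from typing import Dict, List

# Hypothetical tier definitions mirroring the table above.
TIERS = [
    {
        "automation": "Critical Response",
        "name_contains": "Production",
        "actions": ["create_incident:P1", "send_notification"],
    },
    {
        "automation": "Standard Response",
        "name_contains": "Staging",
        "actions": ["create_incident:P3"],
    },
]


def plan_response(monitor_name: str) -> Dict[str, List[str]]:
    """Return the actions of every automation whose condition matches."""
    return {
        tier["automation"]: tier["actions"]
        for tier in TIERS
        if tier["name_contains"] in monitor_name
    }


print(plan_response("Production API"))
print(plan_response("Staging API"))
print(plan_response("Dev sandbox"))
```

Because each rule is evaluated on its own in this sketch, a monitor whose name matched both conditions would receive both responses, and a monitor matching neither receives none. Consistent monitor naming keeps the tiers from overlapping.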

Delayed Escalation

Single automation with delay:

  1. Create incident
  2. Wait 5 minutes
  3. Send escalation notification

What to Avoid

Over-automation

Don’t automate everything immediately. Start with high-value, frequently occurring scenarios.

Complex chains

Keep action chains short. A chain of more than three or four actions often means the work should be split into separate automations.

Broad triggers without conditions

A trigger with no conditions fires for every monitor. Add conditions to limit scope and reduce noise.