Runbooks

Step-by-step procedures your team can follow during incidents or routine operations.

Features

Runbooks provide structured guidance through:

  • Wizard interface - Shows one step at a time with next/previous navigation
  • Decision branching - Route to a different step based on what you find, e.g. “If the service is down, go to step 10”
  • Rich instructions - Format text, add code blocks, include commands
  • Status management - Keep drafts private, publish when ready
  • Execution tracking - See who ran the runbook and when
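Execution tracking records who ran a runbook and when. For illustration only, the Python sketch below shows one way such a record could be modeled, including the path of steps taken through any branches. `ExecutionRecord`, `visit` and `finish` are hypothetical names, not this product's API or storage format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class ExecutionRecord:
    """Hypothetical record of one runbook run (not the product's API)."""

    runbook_title: str
    run_by: str  # who ran it
    started_at: datetime = field(default_factory=_now)  # when it started
    steps_visited: List[int] = field(default_factory=list)
    finished_at: Optional[datetime] = None

    def visit(self, step_number: int) -> None:
        # Log each step in the order the operator reached it,
        # so branches taken during the run are visible afterwards.
        self.steps_visited.append(step_number)

    def finish(self) -> None:
        self.finished_at = _now()


run = ExecutionRecord(runbook_title="Restart API service", run_by="alice")
run.visit(1)
run.visit(10)  # e.g. jumped here because the service was down
run.finish()
print(run.run_by, run.steps_visited)  # alice [1, 10]
```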

![Placeholder: Runbook Execution Wizard Showing Step Navigation]

Common Uses

Document any repeatable procedure:

  • Service restarts - Graceful shutdown, wait, restart, verify
  • Rollbacks - Find previous version, deploy, test
  • Maintenance - Database vacuum, index rebuilds, cleanup
  • Renewals - SSL certificates, API keys, licenses
  • Troubleshooting - Common issues and their solutions

Benefits

  • Consistency - Everyone follows the same steps
  • Knowledge capture - Write down procedures that would otherwise live only in someone's head
  • Reduce errors - No forgotten steps during incidents
  • Training - Help new team members learn how your systems are operated

Structure

Each runbook contains:

  • Metadata - A clear title and a description of the runbook's purpose
  • Steps - Numbered sequence with detailed instructions
  • Decisions - Optional “go to step X” branching
  • Status - Draft for testing, Published for use
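The structure above maps naturally onto a small data model. The Python sketch below is a hypothetical illustration, not this product's API or storage format: `Runbook`, `Step`, `Status`, `branches` and `next_step` are invented names. It shows how numbered steps, an optional “go to step X” decision and a Draft/Published status could fit together, and how a wizard might pick the next step.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    DRAFT = "draft"          # private while you test it
    PUBLISHED = "published"  # available to the team


@dataclass
class Step:
    number: int
    instructions: str
    # Optional decision (hypothetical field): maps an observed answer
    # to the step number to jump to.
    branches: Dict[str, int] = field(default_factory=dict)


@dataclass
class Runbook:
    title: str        # metadata
    description: str  # metadata
    steps: List[Step]
    status: Status = Status.DRAFT

    def publish(self) -> None:
        self.status = Status.PUBLISHED

    def next_step(self, current: int, answer: Optional[str] = None) -> Optional[int]:
        """Return the next step number, or None when the runbook is finished."""
        step = next(s for s in self.steps if s.number == current)
        if answer is not None and answer in step.branches:
            return step.branches[answer]  # "go to step X"
        # No matching decision: continue with the next numbered step.
        numbers = sorted(s.number for s in self.steps)
        idx = numbers.index(current)
        return numbers[idx + 1] if idx + 1 < len(numbers) else None


restart = Runbook(
    title="Restart API service",
    description="Graceful restart with health verification",
    steps=[
        Step(1, "Check service status", branches={"down": 10}),
        Step(2, "Drain traffic and stop the service"),
        Step(10, "Start the service and verify it is healthy"),
    ],
)
restart.publish()
print(restart.next_step(1, answer="down"))  # 10
print(restart.next_step(1, answer="up"))    # 2
print(restart.next_step(10))                # None
```

Keeping decisions as a map from answer to step number lets each step stay focused on a single check, while the wizard handles the routing.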

Tips

Create better runbooks:

  • Start with common tasks - Document what you do weekly
  • Keep steps focused - “Check status” not “Check status and restart if needed”
  • Be specific - Include exact commands and expected output
  • Maintain them - Update when your infrastructure changes

Getting Started

  1. Go to Runbooks
  2. Click Create
  3. Add steps
  4. Publish when ready

![Placeholder: Runbooks List Page Showing Published and Draft Runbooks]

Linking to Incidents

Runbooks can be attached to incidents for quick access:

  1. Open an incident
  2. Click Link Runbook
  3. Select the appropriate runbook
  4. Execute when needed

This helps ensure teams use the right procedures during incident response.

