What is Upstat?

Upstat is an incident management and monitoring platform built around a core principle: during incidents, finding the right person to fix the problem is often harder than detecting the problem itself.

The platform integrates monitoring, incident tracking, on-call scheduling, and status communication so that teams spend less time coordinating and more time resolving.


Platform Pillars

Upstat is organized around four operational areas.

Observation

Detect issues before customers report them and understand what’s affected.

  • Monitoring – HTTP/HTTPS endpoint checks, SSL certificate tracking, and heartbeat monitors for background jobs. Checks run from multiple regions, which helps confirm that a failure is not a localized network problem.
  • Service Catalog – Model your infrastructure as entities (services, databases, customers) with dependency relationships. When something fails, instantly see upstream causes and downstream impact.
  • Status Pages – Communicate operational status externally. Pages derive status from your catalog entities, so updates reflect actual system health.
  • Reports – Track MTTR, incident volume, alert performance, and availability trends over time.
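
To make the Service Catalog idea concrete, the sketch below models entities and their dependencies as a small graph and walks it in both directions. It is an illustration only, not Upstat's data model or API: the entity names, the DEPENDS_ON mapping, and the helper functions are all hypothetical.

```python
from collections import deque

# Hypothetical catalog: each entity maps to the entities it depends on.
DEPENDS_ON = {
    "checkout": ["payments-api", "orders-db"],
    "payments-api": ["orders-db"],
    "status-page": ["checkout"],
    "orders-db": [],
}


def reachable(graph, start):
    """Return every node reachable from start, excluding start itself."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen and neighbor != start:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def invert(graph):
    """Flip edge direction: map each entity to the entities that depend on it."""
    inverted = {node: [] for node in graph}
    for node, deps in graph.items():
        for dep in deps:
            inverted.setdefault(dep, []).append(node)
    return inverted


def upstream_causes(entity):
    """Entities whose failure could explain a failure of `entity`."""
    return reachable(DEPENDS_ON, entity)


def downstream_impact(entity):
    """Entities that may be affected if `entity` fails."""
    return reachable(invert(DEPENDS_ON), entity)
```

In this toy graph, a failure of "orders-db" touches every other entity, while a failure of "checkout" has only "payments-api" and "orders-db" as candidate upstream causes.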

Response

Manage incidents from detection through resolution with structured workflows.

  • Incidents – Create manually or automatically from monitor failures. Track severity, status, assignments, and timeline. Link incidents to affected services for context.
  • Runbooks – Document step-by-step procedures with decision trees. Execute them through a wizard interface during incidents.
  • Automations – Define trigger-based workflows. When a monitor fails or incident status changes, automatically notify channels, update statuses, or escalate.

Scheduling

Ensure the right people are available when issues occur.

  • On-Call – Build rosters with rotation rules (sequential, weekly, fair distribution). Handle timezone differences, holidays, and temporary substitutions.
  • Maintenance – Schedule windows that automatically suppress alerts. Recurring patterns handle regular maintenance without manual setup each time.
  • Calendar – Unified view of on-call shifts, maintenance windows, and incidents. Syncs with Google Calendar for external visibility.
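
As a rough illustration of a sequential rotation with temporary substitutions, the sketch below computes who is on call for a given day. It assumes fixed-length shifts and a simple per-day override table; the function name, parameters, and roster are hypothetical and do not describe how Upstat implements rosters.

```python
from datetime import date


def on_call_for(roster, rotation_start, day, shift_days=7, overrides=None):
    """Return the person on call on `day` for a sequential rotation.

    roster:         ordered list of people taking turns (hypothetical names)
    rotation_start: first day of the first person's shift
    shift_days:     shift length in days (7 gives a weekly rotation)
    overrides:      optional {date: person} map for temporary substitutions
    """
    if not roster:
        raise ValueError("roster must not be empty")
    if day < rotation_start:
        raise ValueError("day is before the rotation starts")
    # A substitution for this specific day takes precedence over the rotation.
    if overrides and day in overrides:
        return overrides[day]
    shifts_elapsed = (day - rotation_start).days // shift_days
    return roster[shifts_elapsed % len(roster)]
```

For example, with roster ["ana", "ben", "chen"] starting on 2024-01-01, the second week belongs to "ben", and an override for a single day replaces him only on that day.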

Communications

Route alerts to the right people through the right channels.

  • Notifications – Multi-channel delivery via Slack, email, and SMS. Route based on severity, time of day, or on-call status.
  • Anti-Fatigue – Suppress duplicate alerts, control renotification frequency, and filter noise during maintenance windows.
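
The anti-fatigue behavior described above can be sketched as a small gate that decides whether an alert should be delivered. This is a minimal sketch under stated assumptions: alerts are identified by a string key, time is passed in as seconds, and the AlertSuppressor class with its parameters is hypothetical rather than part of Upstat.

```python
class AlertSuppressor:
    """Decide whether to deliver an alert, suppressing duplicates and noise."""

    def __init__(self, renotify_after_seconds=1800):
        # Minimum gap between two notifications for the same alert key.
        self.renotify_after = renotify_after_seconds
        self._last_sent = {}  # alert key -> time of last delivered notification

    def should_notify(self, alert_key, now, in_maintenance=False):
        # Alerts raised during a maintenance window are filtered out entirely.
        if in_maintenance:
            return False
        last = self._last_sent.get(alert_key)
        # A repeat inside the renotification interval counts as a duplicate.
        if last is not None and now - last < self.renotify_after:
            return False
        self._last_sent[alert_key] = now
        return True
```

With a 600-second interval, a repeat of the same alert after 300 seconds is suppressed, while a repeat at 600 seconds is delivered again as a renotification.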

Human Coordination Focus

Traditional incident tools excel at detecting “what broke.” Upstat additionally addresses “who can fix it.”

The service catalog tracks not just dependencies, but ownership. Teams have responsibility tags covering their areas of expertise. On-call schedules show who’s currently available. During an incident, you can quickly identify which team owns the affected service and who on that team is on-call.

This coordination layer—responsibility mapping, real-time availability, and ownership visibility—reduces the time spent figuring out who should be working on what.
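
The lookup this coordination layer enables can be pictured as joining two tables: service ownership and current on-call assignments. The sketch below assumes both are available as plain mappings; the names SERVICE_OWNERS, ON_CALL_NOW, and who_can_fix are hypothetical and are not Upstat identifiers.

```python
# Hypothetical ownership data: service -> owning team.
SERVICE_OWNERS = {
    "payments-api": "payments-team",
    "orders-db": "data-platform",
}

# Hypothetical availability data: team -> person currently on call.
ON_CALL_NOW = {
    "payments-team": "ana",
    "data-platform": "ben",
}


def who_can_fix(service):
    """Return the owning team and its current on-call person for a service.

    Returns None when the service has no recorded owner; "on_call" is None
    when the owning team has nobody on call right now.
    """
    team = SERVICE_OWNERS.get(service)
    if team is None:
        return None
    return {"team": team, "on_call": ON_CALL_NOW.get(team)}
```

Keeping the "no owner" and "no one on call" cases distinct matters in practice, since each points to a different gap to close before the next incident.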


Getting Started

New to Upstat? Begin here: