What should be included in an on-call handover?

Include active incidents and their status, recent incidents resolved during your shift, current system health observations, recent deployments or changes, temporary fixes requiring follow-up, and any upcoming scheduled events that might affect operations.

How long should an on-call handover take?

Most handovers take 15-30 minutes depending on shift complexity. Quiet shifts may need only 10 minutes. Shifts with multiple incidents or complex system states may require 45 minutes to ensure complete context transfer.

Should on-call handovers happen synchronously or asynchronously?

Both work best together. Written documentation provides detailed reference the incoming engineer can review. Synchronous conversation allows clarification questions and ensures nothing critical is missed. Combine written handoff notes with a brief live discussion.

What if the incoming engineer is unavailable for a live handover?

Document the shift exhaustively in writing, consider recording a brief video walkthrough of the current state, leave communication channels open for questions, and check back asynchronously within an hour to resolve any confusion that arises.

SRE On-Call Handover Checklist for Shift Transitions

Why Handover Checklists Matter

On-call handover checklists ensure that critical operational context transfers completely between engineers during shift transitions. Without a structured checklist, important information gets lost, ongoing incidents fall through cracks, and incoming engineers waste time rediscovering what their predecessors already knew.

The difference between smooth operations and chaotic response often comes down to handover quality. A checklist transforms handovers from informal conversations into systematic knowledge transfer, ensuring every transition covers essential categories regardless of time pressure or fatigue.

This checklist provides a scannable reference you can use during every shift change. Print it, bookmark it, or adapt it to your team’s specific needs.

Active Incidents Checklist

Before handing off responsibility, verify you have covered every active incident:

For each ongoing incident, document:

Current severity level and customer impact
Investigation steps already completed
Current working theory about root cause
Next troubleshooting actions planned
Blocking issues preventing resolution
Subject matter experts already consulted
Stakeholders who have been notified
Commitments made about resolution timing

Verification questions:

Are there any incidents you investigated but did not formally declare?
Are there any alerts currently suppressed or silenced that need attention?
Did you escalate anything that has not yet been resolved?

Even low-severity incidents need documentation. What seems straightforward to you might confuse the incoming engineer unfamiliar with that specific failure mode.
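One way to make sure no item is skipped is to capture each active incident in a structured record and check it for gaps before handing off. The sketch below is only an illustration: the ActiveIncidentNote class, its field names, and the choice of which fields may be left empty are assumptions, not part of any particular incident tool.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ActiveIncidentNote:
    """One active incident as described in a handover (hypothetical schema)."""
    title: str
    severity: str = ""
    customer_impact: str = ""
    steps_completed: list = field(default_factory=list)
    working_theory: str = ""
    next_actions: list = field(default_factory=list)
    blockers: list = field(default_factory=list)
    experts_consulted: list = field(default_factory=list)
    stakeholders_notified: list = field(default_factory=list)
    commitments: list = field(default_factory=list)


# Assumption: these may legitimately be empty (for example, nothing is blocking).
OPTIONAL_FIELDS = {"blockers", "experts_consulted", "commitments"}


def missing_fields(note):
    """Return the names of required fields that are still empty."""
    return [
        f.name
        for f in fields(note)
        if f.name not in OPTIONAL_FIELDS and not getattr(note, f.name)
    ]


note = ActiveIncidentNote(
    title="Checkout latency",
    severity="SEV3",
    customer_impact="Slow checkout for some users",
)
gaps = missing_fields(note)
```

Here gaps lists the items the outgoing engineer still needs to write down before the incident can be considered handed over.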

Recently Resolved Incidents Checklist

Incidents resolved during your shift can resurface. Transfer this context:

For each recently resolved incident:

Brief description of what happened
Resolution actions taken
Verification steps confirming the fix
Potential recurrence indicators to watch
Follow-up work required (if any)

Time window: Cover incidents resolved within the last 4-6 hours. Recent problems can return, and incoming engineers need context if similar symptoms reappear.

Why this matters: When incoming engineers encounter returning symptoms, understanding recent resolutions helps them distinguish recurrence from new problems, enabling faster pattern recognition.
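If incident records are available as data, the time window can be applied mechanically. The following is a minimal sketch, assuming each incident is a dictionary with a hypothetical resolved_at timestamp (None while the incident is still active); the 6-hour default mirrors the upper end of the window suggested above.

```python
from datetime import datetime, timedelta, timezone


def recently_resolved(incidents, now, window_hours=6):
    """Return incidents resolved within the last `window_hours` hours."""
    cutoff = now - timedelta(hours=window_hours)
    return [
        incident
        for incident in incidents
        if incident.get("resolved_at") is not None
        and cutoff <= incident["resolved_at"] <= now
    ]


now = datetime(2024, 5, 1, 18, 0, tzinfo=timezone.utc)
incidents = [
    {"title": "Queue backlog", "resolved_at": now - timedelta(hours=2)},
    {"title": "Cert expiry warning", "resolved_at": now - timedelta(hours=9)},
    {"title": "Elevated 5xx on API", "resolved_at": None},  # still active
]
titles = [i["title"] for i in recently_resolved(incidents, now)]
```

With the sample data, only the incident resolved two hours ago falls inside the window; the still-active incident belongs in the active incidents section instead.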

System Health Checklist

Beyond specific incidents, transfer broader operational context:

Current state observations:

Overall system health assessment (normal, degraded, attention needed)
Error rates compared to normal baselines
Resource utilization patterns (CPU, memory, network trends)
Services that are degraded but stable
Known anomalies being tracked but not yet actionable
Traffic patterns or unusual load characteristics

Dashboard review:

Key metrics showing any concerning trends
Alerts in warning state that have not yet fired
Services approaching capacity limits

This broader context helps incoming engineers quickly distinguish abnormal behavior from expected patterns. For comprehensive guidance on building on-call practices around these observations, see the Complete Guide to On-Call Management.

Recent Changes Checklist

Recent deployments and configuration changes are among the most common sources of new problems:

Document all changes during your shift:

Services or features deployed
Configuration changes applied
Infrastructure modifications
Feature flags enabled or disabled
Database migrations executed
Scaling actions taken

For each change, note:

Current deployment status (stable, monitoring, partial rollout)
Rollback readiness if problems emerge
Expected behavior changes or known side effects

Critical detail: If 25 percent of traffic routes to a new service version, the incoming engineer needs to know this before investigating performance differences across request segments.

Temporary Fixes Checklist

Engineers under pressure implement workarounds that need follow-up:

Document every temporary measure:

Services restarted to clear memory issues
Caches manually cleared
Features temporarily disabled
Manual processes covering automated system failures
Configuration changes applied as tactical fixes
Workarounds masking underlying problems

For each temporary fix:

What underlying issue requires proper resolution
How long the temporary fix is expected to hold
What monitoring indicates if the fix is failing
Who owns the permanent solution

Why this matters: Without explicit documentation, incoming engineers do not know these workarounds exist. Services appear healthy while running on stopgap fixes that could fail unexpectedly.
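Recording each workaround with an expected hold time and an owner makes it possible to flag the ones that deserve attention during the next shift. This is a sketch under stated assumptions: the TemporaryFix class, its fields, and the two flagging rules are illustrative choices rather than an established convention.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class TemporaryFix:
    """A stopgap measure recorded at handover (hypothetical schema)."""
    description: str
    underlying_issue: str
    applied_at: datetime
    expected_to_hold: timedelta
    failure_signal: str
    permanent_fix_owner: Optional[str] = None

    def expires_at(self):
        return self.applied_at + self.expected_to_hold


def needs_attention(fixes, shift_end):
    """Flag fixes with no owner or that may stop holding before shift_end."""
    flagged = []
    for fix in fixes:
        reasons = []
        if fix.permanent_fix_owner is None:
            reasons.append("no owner for permanent fix")
        if fix.expires_at() <= shift_end:
            reasons.append("expected to lapse before shift ends")
        if reasons:
            flagged.append((fix.description, reasons))
    return flagged


applied = datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)
fixes = [
    TemporaryFix("Restarted worker pool", "suspected memory leak", applied,
                 timedelta(hours=6), "memory usage climbing again"),
    TemporaryFix("Disabled recommendations", "slow downstream service", applied,
                 timedelta(days=3), "p99 latency regression", "search-team"),
]
shift_end = datetime(2024, 5, 1, 22, 0, tzinfo=timezone.utc)
flagged = needs_attention(fixes, shift_end)
```

In this example only the restarted worker pool is flagged, because it has no owner and its six-hour hold time ends before the incoming shift does.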

Upcoming Events Checklist

Alert incoming engineers to planned activities:

Scheduled events:

Maintenance windows during their shift
Planned deployments by other teams
Known traffic spikes (marketing campaigns, product launches)
External dependencies with announced maintenance
Holiday or weekend traffic pattern changes

For each event:

Expected timing
Anticipated impact on systems
Contacts responsible for the event
Rollback or cancellation procedures if needed

Advance awareness prevents confusion when expected changes occur and helps incoming engineers prepare mentally for anticipated load or disruption.
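Where scheduled events are kept in a calendar or change log, a small overlap check can pick out the ones that fall inside the incoming shift. The sketch below assumes each event is a dictionary with hypothetical start and end timestamps; an event is included if any part of it overlaps the shift.

```python
from datetime import datetime, timezone


def events_during_shift(events, shift_start, shift_end):
    """Return events overlapping the shift, ordered by start time."""
    overlapping = [
        event
        for event in events
        if event["start"] < shift_end and event["end"] > shift_start
    ]
    return sorted(overlapping, key=lambda event: event["start"])


def utc(day, hour):
    return datetime(2024, 5, day, hour, 0, tzinfo=timezone.utc)


events = [
    {"name": "Marketing campaign launch", "start": utc(2, 6), "end": utc(2, 9)},
    {"name": "Database maintenance window", "start": utc(1, 23), "end": utc(2, 1)},
    {"name": "Provider maintenance", "start": utc(3, 2), "end": utc(3, 4)},
]
upcoming = events_during_shift(events, shift_start=utc(1, 22), shift_end=utc(2, 8))
names = [event["name"] for event in upcoming]
```

The campaign launch is included even though it runs past the end of the shift, since the incoming engineer will still see its start.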

Access Verification Checklist

Before the outgoing engineer departs, verify the incoming engineer can access:

Essential systems:

Monitoring dashboards and alert systems
Production infrastructure (VPN, SSH, cloud consoles)
Incident management platform
Communication channels (team chat, paging systems)
Password vaults and credential management
Runbook and documentation repositories

Test verification:

Send a test alert to confirm paging works
Check that the roster configuration shows the correct assignment
Confirm the backup escalation path works if the primary fails

Why verify now: Access problems discovered mid-incident create dangerous delays. Five minutes of verification during handover prevents hours of scrambling later.
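Access verification can be scripted as a list of named checks that are run together during handover. The sketch below only shows the shape of such a runner: the check names and the placeholder functions are hypothetical, and real checks (reaching the VPN, loading a dashboard, opening the vault) would be specific to your environment.

```python
def run_access_checks(checks):
    """Run each named check and return the names of those that failed."""
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # an error while checking is treated as no access
        if not ok:
            failed.append(name)
    return failed


def vpn_unreachable():
    # Placeholder standing in for a real connectivity test.
    raise TimeoutError("vpn gateway timed out")


checks = {
    "monitoring dashboards": lambda: True,
    "production VPN": vpn_unreachable,
    "password vault": lambda: False,
}
failed = run_access_checks(checks)
```

Treating an exception as a failed check keeps one broken system from hiding the results for the others.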

Communication Transfer Checklist

Ensure communication continuity:

Stakeholder status:

Who has been notified about ongoing situations
What information they received
Questions they asked
Commitments made about updates or resolution
Who expects the next communication and when

Channel status:

Active threads requiring follow-up
Questions awaiting answers
External parties expecting callbacks

This prevents incoming engineers from accidentally contradicting previous communications or missing stakeholders who require updates.

Handover Verification Checklist

Complete handover with explicit verification:

Outgoing engineer confirms:

All checklist categories have been covered
Written documentation is complete and accessible
No critical information remains undocumented

Incoming engineer confirms:

Understands current operational state
Knows immediate priorities and next actions
Has all necessary access verified
Accepts responsibility for on-call duty

Formal acknowledgment: Use explicit language like “I have reviewed the handover, understand current status, and am taking ownership of on-call responsibility. You are clear to hand off.”

This formal acknowledgment prevents ambiguity about who currently owns response responsibility, which is critical if issues emerge during the transition.
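The verification step can be expressed as a simple gate: ownership transfers only when every category has been covered and both engineers have confirmed. The following is a minimal sketch; the category names follow the sections of this checklist, while the handover_status function and its arguments are illustrative assumptions.

```python
CATEGORIES = (
    "active incidents",
    "recently resolved incidents",
    "system health",
    "recent changes",
    "temporary fixes",
    "upcoming events",
    "access verification",
    "communication transfer",
)


def handover_status(covered, outgoing_confirmed, incoming_acknowledged):
    """Return (complete, missing_categories) for a handover in progress."""
    missing = [category for category in CATEGORIES if category not in covered]
    complete = bool(not missing and outgoing_confirmed and incoming_acknowledged)
    return complete, missing


covered = set(CATEGORIES) - {"temporary fixes"}
complete, missing = handover_status(
    covered, outgoing_confirmed=True, incoming_acknowledged=True
)
```

In this example the handover is not complete, because temporary fixes were never discussed, even though both engineers have confirmed.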

Post-Handover Checklist

After formal transfer:

Outgoing engineer:

Remain available via chat for 30-60 minutes
Answer follow-up questions as the incoming engineer encounters documented situations
Provide quick clarification while context remains fresh

Incoming engineer:

Review written documentation thoroughly
Walk through key dashboards independently
Identify any remaining questions
Confirm you can respond if alerts fire

Adapting This Checklist

Every team operates differently. Adapt this checklist to your context:

Add items for:

Team-specific systems or services
Compliance requirements (audit logging, change documentation)
Regional handoffs with timezone considerations
Customer-specific arrangements requiring awareness

Remove items if:

Your team does not use certain categories
Automation handles specific transfer tasks
Other documentation captures the information

Review quarterly: Update the checklist as systems evolve, new services launch, or team feedback identifies gaps.

Using Tools to Support Handovers

Checklists work best when supported by appropriate tooling:

What helps:

Centralized incident documentation visible to all team members
Roster visibility showing exactly when shifts change
Structured templates prompting comprehensive information capture
Real-time status tracking that persists across shifts

Platforms like Upstat provide incident activity timelines, participant tracking, and roster management that support handover workflows. When incoming engineers can see complete incident history and current roster state in one place, handover conversations focus on context rather than hunting for basic information.

What to avoid:

Scattered documentation across chat, wikis, and personal notes
Reliance on verbal-only handovers without written backup
Assumptions that information will be obvious or remembered

Final Verification

Before completing any handover, ask yourself:

Could someone unfamiliar with my shift understand current state from my documentation?
Have I covered everything I would want to know if I were starting this shift?
Is there anything I am assuming the incoming engineer already knows?

If you answer “no” to either of the first two questions, or “yes” to the third, add the missing context before signing off.

Quality handovers are not administrative overhead. They are operational insurance that maintains reliability when responsibility transfers between people. Use this checklist consistently, and your team will experience smoother transitions, faster incident response, and reduced context loss across every shift change.

