Incident Response Runbook Template

An editable runbook template aligned to NIST SP 800-61. Four scenario playbooks (ransomware, business email compromise, insider threat, supply-chain compromise) with declared severity triggers, 30-minute, 4-hour, and 24-hour action lists, evidence preservation steps, communication templates, and post-incident review checklists. Adapt it. Print it. Put it in the binder your on-call team actually opens.

By Daniel Agrici, Chief Security Officer, EFROS
Reviewed by Stefan Efros, CEO & Founder, EFROS

Why every organization needs a runbook

The first hour of an incident is where most of the damage happens, and it is also when decision quality is at its worst. Executives are not yet looped in. On-call engineers are triaging from fragmented alerts. Legal has not been engaged. The insurance carrier has not been notified. No one has opened the chain-of-custody log because no one has decided whether this is a real incident yet. A written runbook replaces that scramble with a sequence of actions that the team has rehearsed.

A good runbook is not a policy document. It is a field manual. It names the role that makes the decision, the artifact that captures it, and the next action that follows. It assumes the person reading it is tired, in the middle of the night, and trying to figure out what to do next. Every sentence earns its place by answering a question that actually arises under those conditions.

The goal is not to cover every possible incident. The goal is to cover the four or five scenarios most likely to affect the organization, make decisions in advance about how to respond, and practice them until the team can execute without the document in front of them.

See our incident response pillar for the program-level context this runbook fits into.

The NIST SP 800-61 lifecycle as the framework

The template uses the four-phase lifecycle from NIST SP 800-61 Rev. 2: preparation; detection and analysis; containment, eradication, and recovery; and post-incident activity. The first phase lives in your program (training, tooling, relationships, tabletop exercises). The scenario playbooks in this template cover the second and third phases. The fourth phase is the post-incident review that feeds back into the runbook itself.

Additional authoritative guidance comes from CISA incident response guidance and, for financial fraud cases, the FBI IC3 reporting portal. Align your procedures to NIST SP 800-61 as the backbone and layer in the sector-specific guidance that applies to your data and contracts.

Preparation is the phase where most of the return sits. Declared roles, tested backups, rehearsed escalation paths, and preexisting relationships with legal, insurance, and IR retainers are all work that has to happen before the incident. If you open the runbook for the first time during the event, you will spend the first hour finding contact information that should have been one page away.

Roles and responsibilities

Declared roles are the single most important piece of preparation. The five roles below should be named in advance with primary and alternate coverage. Anyone pulled in during the incident needs to know which role they are playing and who is above them.

  • Incident Commander (IC). Owns the response. Makes the call to declare, escalate, engage outside help, and close. Does not do hands-on technical work during the incident. Reassigns if they need to step out.
  • Communications Lead. Drafts and routes all internal and external messaging. Works with legal and executive leadership on tone and scope.
  • Forensics Lead. Owns evidence collection and chain of custody. Names the person or firm doing the analysis. Records hash values, times, and handlers.
  • Legal Liaison. Coordinates with outside counsel, regulators, insurance, and law enforcement. Decides when to invoke attorney-client privilege on the investigation workstream.
  • Executive Liaison. The executive to whom the IC escalates major decisions. Has authority to approve customer notification, ransom-payment positions, and external communications.

When the response is supported by a managed detection and response retainer, the retainer provides incident response support but does not replace the IC. See managed detection and response for how MDR and internal IR coordinate.

Scenario playbook — Ransomware

Ransomware remains the highest-impact scenario in the commercial incident catalog. The playbook below assumes encryption has started on at least one system and that isolation is the immediate priority. For a deeper operational walkthrough, see the ransomware response playbook for the first 24 hours.

Declared severity triggers

  • Any production system shows ransom notes or encrypted file extensions.
  • Endpoint protection alerts on mass file modification or known ransomware family.
  • Backup systems show unexpected deletion or modification activity.
  • Users report inability to access shared drives, file servers, or SaaS file stores.

First 30 minutes

  1. Incident Commander assumes the call. Opens the war-room channel and pages the core response team.
  2. Isolate identified endpoints from the network (EDR network containment or physical unplug).
  3. Disable the compromised user account in the identity provider and revoke active sessions.
  4. Snapshot affected virtual machines and cloud instances where possible.
  5. Freeze automated deploys and scheduled jobs that touch production.
  6. Verify backup systems are isolated and immutable copies are intact.
  7. Engage legal counsel and cyber insurance carrier per the engagement procedure.
  8. Start the incident log. Every action goes in with timestamp and actor.
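The incident log in step 8 needs no special tooling; an append-only file that captures timestamp and actor for every action is enough. A minimal sketch in Python, assuming a local JSONL file — the file name, actor names, and field names are illustrative, not a prescribed format:

```python
import json
from datetime import datetime, timezone

LOG_PATH = "incident-log.jsonl"  # illustrative file name

def log_action(actor: str, action: str, detail: str = "") -> dict:
    """Append one timestamped entry to the append-only incident log."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "detail": detail,
    }
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

# Hypothetical entries from the first minutes of a ransomware response.
log_action("j.doe (IC)", "declared incident", "ransom notes observed on FS-02")
log_action("a.smith", "isolated endpoint", "EDR network containment on FS-02")
```

Append-only is the point: entries are never edited or deleted, which is what makes the log usable as a record later.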

First 4 hours

  1. Expand containment to additional endpoints showing the same indicators.
  2. Scope the blast radius: identify accounts used, systems accessed, data touched.
  3. Preserve memory and disk artifacts from at least two affected endpoints for forensics.
  4. Confirm backup integrity by test-restoring a non-critical file to a clean environment.
  5. Draft initial notification to executive leadership with facts known and unknowns.
  6. Coordinate with law enforcement if the threat actor is known to be sanctioned or if contract terms require notification.
  7. Begin timeline construction from EDR, SIEM, and authentication logs.
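Timeline construction in step 7 amounts to merging per-source, time-sorted event exports (EDR, SIEM, identity provider) into one ordered stream. A sketch with invented sample events; the source names and messages are illustrative:

```python
import heapq
from datetime import datetime

# Illustrative event feeds; real input would be EDR, SIEM, and IdP exports,
# each already sorted by timestamp.
edr = [("2024-05-01T02:14:00", "EDR", "mass file modification on FS-02")]
siem = [("2024-05-01T01:58:00", "SIEM", "admin login from new IP"),
        ("2024-05-01T02:20:00", "SIEM", "backup job deleted")]
idp = [("2024-05-01T01:45:00", "IdP", "repeated MFA prompts for svc-backup")]

def build_timeline(*sources):
    """Merge time-sorted event lists into one chronologically ordered timeline."""
    return list(heapq.merge(*sources, key=lambda e: datetime.fromisoformat(e[0])))

for ts, src, msg in build_timeline(edr, siem, idp):
    print(ts, src, msg)
```

Interleaving the sources this way is often what surfaces the initial-access event: here the identity-provider anomaly precedes the first EDR detection by almost half an hour.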

First 24 hours

  1. Publish internal all-hands communication with approved facts.
  2. Make ransom decision in writing, with counsel and insurance involvement. Default position is no payment.
  3. Rebuild compromised identities with fresh credentials. Rotate secrets in scope.
  4. Restore from the earliest clean backup, verified against known good baselines.
  5. Notify customers whose data may have been affected per regulatory and contractual obligations.
  6. Notify regulators where required (state AGs, FTC, HHS, DoD, others depending on data).
  7. Engage outside incident response firm if the incident exceeds internal capacity.

Evidence preservation

  • Memory dumps and disk images from at least two representative endpoints.
  • SIEM and EDR exports covering 90 days prior to detection.
  • Identity provider authentication logs and session records.
  • Firewall and DNS logs for the same window.
  • Ticketing system and communication logs covering the response.

Communication templates outline

  • Internal all-hands (approved by legal, executive, and comms lead).
  • Customer notification (tiered by data sensitivity and contractual obligations).
  • Regulator notifications on required timelines (some as short as 72 hours).
  • Public statement or press response only if the incident is public.
  • Insurance carrier status updates per policy terms.

Post-incident review checklist

  • Incident timeline from initial access to eradication.
  • Root cause with the specific control gap or human action that allowed entry.
  • Damage accounting (systems, data, downtime, direct cost).
  • Remediation items tied to owners and due dates.
  • Control improvements entering the control library.
  • Lessons for the runbook itself (what the plan missed).

Scenario playbook — Business Email Compromise

BEC is the most common incident by volume in most commercial environments. Losses are often financial rather than data-centric, which makes the first-hour actions around wire holds and bank notification especially important. Many BEC cases start as account takeovers via phishing or legacy auth gaps, so the containment steps focus on revoking sessions, neutralizing forwarding rules, and removing OAuth grants.

Declared severity triggers

  • Executive or finance user reports forwarding rules they did not create.
  • Unusual sign-in from impossible location on an executive or finance account.
  • Wire transfer or payment change request from an executive that HR or AP cannot verify out-of-band.
  • DMARC or DKIM failures on inbound mail from your own domain (spoofing pattern).

First 30 minutes

  1. Incident Commander opens the call. Pages the identity owner and finance lead.
  2. Disable the compromised account and revoke all active sessions.
  3. Pull mailbox forwarding rules, inbox rules, delegated access, and linked app grants.
  4. Freeze any pending wire transfers or payment modifications initiated via the affected account.
  5. Disable linked OAuth applications authorized by the account.
  6. Alert the finance team and AP team to treat all pending requests from the account as untrusted.
  7. Start the incident log.

First 4 hours

  1. Pivot on the tenant: identify other accounts with similar sign-in anomalies in the past 30 days.
  2. Review admin audit logs for config changes (tenant policy, transport rules, Conditional Access).
  3. Identify whether any funds have moved. If yes, engage the bank immediately for recall.
  4. Collect mailbox export of the compromised account for the past 90 days.
  5. Enable additional alerting on payment-change workflows tenant-wide.
  6. Notify legal counsel and cyber insurance per engagement procedure.
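The "impossible location" trigger above, and the tenant-wide pivot in step 1, can be approximated with a simple speed check between consecutive sign-ins. A sketch using a haversine distance and a 900 km/h threshold — the coordinates, threshold, and input shape are assumptions for illustration, not any product's detection logic:

```python
from math import radians, sin, cos, asin, sqrt
from datetime import datetime

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def impossible_travel(signins, max_kmh=900):
    """Flag consecutive sign-ins implying travel faster than max_kmh."""
    flagged = []
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(signins, signins[1:]):
        hours = (t2 - t1).total_seconds() / 3600
        if hours <= 0:
            continue
        speed = haversine_km(lat1, lon1, lat2, lon2) / hours
        if speed > max_kmh:
            flagged.append((t1, t2, round(speed)))
    return flagged

# Hypothetical sign-ins: New York, then London one hour later.
signins = [
    (datetime(2024, 5, 1, 9, 0), 40.71, -74.00),
    (datetime(2024, 5, 1, 10, 0), 51.51, -0.13),
]
print(impossible_travel(signins))
```

Most identity providers expose an equivalent risk signal natively; the sketch is useful when you need to re-run the check across exported sign-in logs during the pivot.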

First 24 hours

  1. Force password reset with MFA re-enrollment for the affected user and anyone with delegated access.
  2. Review Conditional Access and sign-in risk policies. Tighten to require phishing-resistant MFA.
  3. Issue finance-team-wide reminder on wire and payment-change verification procedures.
  4. Engage FBI Internet Crime Complaint Center (IC3) if funds moved.
  5. Notify counterparties who may have received messages from the compromised account.
  6. Draft customer or partner notification if contract or regulation requires it.

Evidence preservation

  • Complete mailbox export in PST or eDiscovery format.
  • Sign-in logs for the affected account and peer accounts.
  • Inbox rules and forwarding configuration snapshots before and after.
  • OAuth app grants with timestamps.
  • Bank communications and recall attempts.

Communication templates outline

  • Executive brief (within hours if funds moved).
  • Finance and AP team-wide alert.
  • External counterparty notification if messages went out impersonating them.
  • FBI IC3 report where funds moved.
  • Insurance carrier status updates.

Post-incident review checklist

  • Timeline from initial phishing or credential theft to account takeover.
  • Root cause (phishing click, credential reuse, legacy auth gap, MFA bypass).
  • Financial impact (if any).
  • Control improvements: phishing-resistant MFA, Conditional Access tightening, payment-workflow out-of-band verification.
  • Updated user-facing guidance on reporting suspicious forwarding or rule changes.

Scenario playbook — Insider Threat

Insider threat cases differ from external incidents in one important way: the first action is coordination with legal and HR, not containment. Acting on an insider case without HR alignment exposes the organization to employment claims and evidentiary challenges. The playbook below reflects that sequencing.


Declared severity triggers

  • DLP alert on bulk export of sensitive data by an internal user.
  • User accessing systems or data outside their normal role profile.
  • Departing employee accessing customer lists, source code, or unreleased product data in volume.
  • Manager-submitted report of concerning behavior (threats, expressed intent to take data).

First 30 minutes

  1. Incident Commander coordinates with HR and legal before any user-facing action.
  2. Preserve all logs touching the subject user: identity, endpoint, DLP, email, ticketing, file access.
  3. Do not disable the account immediately unless safety or active exfiltration requires it (coordinate with HR first).
  4. Snapshot the user's endpoint remotely via EDR.
  5. Capture a forensic image of any corporate devices under the user's control.
  6. Freeze access to systems where continued activity would deepen the loss (source repos, customer data).
  7. Document the specific triggering signal and the time of detection.

First 4 hours

  1. Build the activity timeline from available logs.
  2. Determine whether exfiltration has occurred and through what channels (email, cloud storage, USB, personal device).
  3. Quantify the data scope (record counts, file counts, data sensitivity).
  4. Engage outside counsel if the activity may constitute theft, fraud, or breach of employment terms.
  5. Coordinate with HR on the conversation plan and timing.

First 24 hours

  1. Conduct coordinated HR-legal-security interaction with the subject user.
  2. Recover corporate devices and cloud artifacts per legal guidance.
  3. Disable access at termination or suspension, including federated identities and third-party SaaS.
  4. Engage law enforcement where the conduct crosses into criminal territory.
  5. Notify affected customers or partners if their data was exfiltrated.

Evidence preservation

  • EDR snapshots, USB connection logs, DLP alert detail.
  • Cloud storage audit logs (Google Drive, OneDrive, Dropbox).
  • Source repository clone and download records.
  • Email and outbound transfer records.
  • HR file showing timeline of notices, performance concerns, and prior communications.

Communication templates outline

  • HR-legal-security aligned internal protocol.
  • Narrow manager communication on a need-to-know basis.
  • External customer or partner notification if their data is in scope.
  • Law enforcement referral where applicable.

Post-incident review checklist

  • Timeline from first concerning signal to resolution.
  • Root cause analysis including policy gaps, access scope, and detection failure.
  • Recovery accounting (what data returned, what did not).
  • Program improvements: DLP tuning, data classification, offboarding automation, role-based access review cadence.
  • Lessons for manager-facing training on concerning behavior reporting.

Scenario playbook — Supply-Chain Compromise

Supply-chain cases arrive through trusted channels: signed updates, vendor consoles, RMM agents, dependency repositories. The first hour is about scoping reach because the compromised component may sit in many environments at once. Unlike endpoint-initiated incidents, there is usually no single patient zero. See also our financial services SOC 2 audit case study for how supply-chain evidence gets integrated into broader compliance documentation.

Declared severity triggers

  • Vendor notification that their build system, update channel, or product was compromised.
  • Public disclosure of compromise in a dependency used in production.
  • Unexpected binaries or behavior from a trusted third-party agent (EDR agent, RMM tool, backup agent).
  • Signed package with altered contents detected by integrity checks.

First 30 minutes

  1. Incident Commander coordinates with vendor management and engineering leads.
  2. Identify every instance of the compromised product or dependency in the environment.
  3. Isolate affected hosts from the network where the compromised product has execution.
  4. Block outbound traffic to known command-and-control indicators published by the vendor or intelligence feed.
  5. Freeze any in-flight deploys that would extend the footprint.
  6. Preserve memory, disk, and network telemetry from at least two affected hosts.
  7. Open the incident log and notify executive leadership.
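Step 2's "identify every instance" reduces to joining your inventory against the vendor's published list of bad versions. A sketch assuming a hypothetical host-to-package inventory; the package names, versions, and data shape are illustrative — real input would come from an SBOM or asset inventory export:

```python
# Hypothetical inventory: {host: {package: version}}.
inventory = {
    "web-01": {"left-pad": "1.3.0", "acme-agent": "4.2.1"},
    "build-03": {"acme-agent": "4.2.0"},
    "laptop-17": {"acme-agent": "4.2.1"},
}

# Vendor-published compromised (package, version) pairs.
compromised = {("acme-agent", "4.2.1")}

def affected_hosts(inventory, compromised):
    """Return hosts running any (package, version) pair on the compromised list."""
    return sorted(
        host
        for host, pkgs in inventory.items()
        if any((pkg, ver) in compromised for pkg, ver in pkgs.items())
    )

print(affected_hosts(inventory, compromised))  # → ['laptop-17', 'web-01']
```

The join is only as good as the inventory, which is why the post-incident checklist below calls out SBOM enforcement as a control to add.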

First 4 hours

  1. Confirm scope across all environments (production, staging, dev, employee endpoints).
  2. Coordinate with the affected vendor for indicators of compromise, remediation steps, and patched versions.
  3. Scope user and data access the compromised product had (what could it read, write, exfiltrate).
  4. Review logs for evidence of actor behavior beyond the vendor product itself.
  5. Prepare customer communication if the product is integrated into your service.

First 24 hours

  1. Apply vendor patch or replace the compromised dependency.
  2. Rotate any secrets, API keys, or certificates that the compromised product had access to.
  3. Rebuild endpoints that cannot be cleanly verified as uncompromised.
  4. Update SBOMs and dependency inventories to reflect the remediation.
  5. Notify customers, regulators, and partners per contractual and regulatory requirements.
  6. Engage outside incident response if the scope or complexity exceeds internal capacity.

Evidence preservation

  • Memory and disk images from representative affected hosts.
  • Network telemetry covering the period before and during detection.
  • SBOM or dependency inventory snapshots.
  • Vendor communications and published IOCs.
  • Secret rotation records.

Communication templates outline

  • Internal executive and engineering briefings.
  • Customer notification where your service integrates the compromised component.
  • Regulator notifications where data may have been accessed.
  • Coordination with the vendor on joint communications and shared IOCs.

Post-incident review checklist

  • Timeline from first IOC to eradication and recovery.
  • Root cause analysis including the vendor path and internal detection gaps.
  • Scope of access and data the compromised component had.
  • Supply-chain controls to add: SBOM enforcement, signed-artifact verification, vendor risk tiering, dependency pinning.
  • Vendor risk program updates based on the incident.

Evidence preservation and chain of custody

Every action during an incident produces or destroys evidence. The forensics lead owns both the collection and the record of who touched what. Chain of custody is not exotic: it is a log with timestamp, actor, artifact, hash, and transfer. Missing it means evidence cannot be used in a legal or regulatory matter even if the technical work was correct. See ISO 27037 guidance on identification, collection, and preservation of digital evidence for the formal framework.

Practical discipline during an incident: acquire disk and memory images from affected hosts before rebuilding, retain all logs for the affected window (90 days preceding detection at minimum, longer if investigation requires), hash every artifact at collection, log every transfer between handlers, and store artifacts in an access-controlled evidence vault. When a third-party IR firm joins, the same standards apply to their handoffs.
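The hash-and-log discipline above can be sketched in a few lines: hash each artifact at collection, then record timestamp, handler, artifact, hash, and action in a custody entry. The artifact path and handler name below are illustrative:

```python
import hashlib
import json
from datetime import datetime, timezone

def sha256_file(path: str, chunk: int = 65536) -> str:
    """Hash an artifact in chunks so large disk images don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def custody_entry(path: str, handler: str, action: str) -> dict:
    """One chain-of-custody record: timestamp, actor, artifact, hash, transfer."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "handler": handler,
        "artifact": path,
        "sha256": sha256_file(path),
        "action": action,  # e.g. "collected", "transferred to IR firm"
    }

# Example: record collection of a disk image (contents stand in for a real image).
with open("fs02-disk.img", "wb") as f:
    f.write(b"example image bytes")
print(json.dumps(custody_entry("fs02-disk.img", "forensics-lead", "collected")))
```

A matching entry gets written at every transfer between handlers, including handoffs to a third-party IR firm, so the hash chain shows the artifact was unchanged end to end.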

Anything that cannot be reconstructed after the fact (live memory, network flows, ephemeral cloud metadata) is worth capturing during the first 30 minutes even at the cost of some containment delay. Artifacts you can replay from cold storage (disk images, long-retention logs) can wait a few hours.

Post-incident review and runbook improvement

Every incident ends with a written review. The review reconstructs the timeline, identifies the root cause, names the control gap, and produces a prioritized list of remediation items with owners and dates. Without the review, the incident has no institutional value. With it, the control library and the runbook both improve.

The review is blameless in process but not in substance. Missed detections, procedural failures, and unclear roles are called out by name so they can be corrected. What is blameless is the approach: individuals are not the target of the review. The system is.

The runbook itself is an artifact with a maintainer. After every incident (and every tabletop) the maintainer opens the runbook and updates the sections that the exercise revealed were incomplete, misleading, or out of date. A runbook that is never updated is a runbook that will fail the next time it is opened.