Incident Response for AI: Who’s on the Hook and What to Document in the First 24 Hours

Incident Response for AI: Who’s on the Hook and What to Document in the First 24 Hours
Photo by Collin / Unsplash

Your AI assistant recommends a refund, calls an internal tool, and—because a product page hid a prompt—emails a customer’s PII to a third‑party inbox. Is that a model bug, a supply‑chain issue, or a data breach? For AI, incidents often straddle all three. The first 24 hours decide whether you contain the blast radius or let it metastasize.

Somewhat unrelated, but a good watch

What counts as an AI incident?

An AI incident is more than just 'downtime'. Use a definition you can defend to counsel and regulators. The OECD’s 2024 paper distinguishes AI “incidents” (actual harm) from “hazards” (potential harm), and lists concrete harm dimensions (privacy, safety, financial, rights). Adopt that vocabulary in your playbooks to avoid hand‑waving in tense rooms. (OECD)

For example, indirect prompt injection embedded in images can steer a multimodal model to leak secrets or follow attacker instructions—documented by Bagdasaryan et al. (July 19, 2023). If your assistant “helpfully” obeys an image, that’s not user error; it’s a security event. (arXiv)

Who’s on the hook

Under the EU AI Act, two roles anchor accountability:

  • Provider. The party that develops an AI system and places it on the market under its name.
  • Deployer. The party that uses the AI system in production (e.g., your company wiring a vendor model into a workflow).

Even if you “only” consume APIs, as a deployer you own how the system is used, monitored, and controlled in your environment. Treat this as a duty of care, not a checkbox. (Artificial Intelligence Act)

Security agencies are explicit about shared responsibility. The April 15, 2024 joint guidance from CISA and international partners calls for mitigations across the deployment lifecycle of externally developed AI systems—protect, detect, and respond as if they were part of your own stack. If your SOC can’t see model tool‑calls, you’re not meeting that bar. (CISA)

For governance scaffolding, align your IR workflow to NIST AI RMF’s four functions (Govern, Map, Measure, Manage). It’s been public since January 2023 and is increasingly referenced in audits. (NIST Publications, NIST)

If this sounds like a lot, have no fear: ABV can be configured to log agent behavior, prompts, model versions, tool calls, costs, and guardrail hits under OpenTelemetry—with RBAC and audit logs. Those artifacts are what counsel and examiners will ask for. (abv.dev)

The first 24 hours: a runbook you can actually execute

Hours 0–2: Stabilize and preserve

  • Declare an AI‑labeled incident (e.g., “AI‑SEC‑2 Prompt Injection Exfiltration”) and name an incident commander.
  • Freeze the failing AI configuration.
  • Isolate the capability that’s causing harm. For agentic systems, kill specific tools (email, web, code‑exec) before you kill the model.
  • Preserve the exact conversation and tool‑call chain that led to harm, including the external content that influenced the model. Do not sanitize; copy before you redact.
  • Use ABVs export tools to export the incident trace with agent graph, prompts, tool invocations, and guardrail events; snapshot governance dashboard metrics that show scope. (abv.dev)

Hours 2–8: Scope and contain

  • Use ABVs evaluations tool to enumerate blast radius with data you can show: number of affected sessions, unique users, data types touched (e.g., emails, order IDs), and external API endpoints invoked.
  • Correlate similar incidents via ABV’s search across traces: look for the same tool‑call sequence or same referring domain.
  • Run a minimal red‑team replay to confirm containment. MITRE ATLAS provides tactics/techniques to structure those tests so you’re not guessing at attack classes. If you can’t reproduce the exploit path, you can’t prove you closed it. (MITRE ATLAS)

Note: It's important to be diligent in your red teaming and subsequent remediation. For example, some teams may patch the prompt while leaving the dangerous tool enabled, then declaring victory. If the agent keeps untrusted browsing turned on, a slightly different injection restores the exploit. Capture both the patched prompt and the updated tool policy.

Hours 8–24: Decide, document, and notify

  • Decide on service posture: full disable, tool‑only disable, or allow with guardrails. Document the risk trade‑off in one sentence.
  • Record all artifacts needed for reporting: timestamps, model/version IDs, full system prompt, user prompts and responses, tool call arguments and results, relevant environment variables, and the URLs or files that seeded the model’s behavior. ABVs long data retention and complete observability logs make this especially simple, turning your “we think” into “here’s the record.”
  • Map your reporting triggers. Example: for high‑risk systems in the EU, the AI Act requires providers to report serious incidents “immediately” upon establishing a link and no later than 15 days; interim notifications are allowed. Keep a dated note evidencing when you first established that link. (IAPP, Artificial Intelligence Act)

Trade‑offs you can expect to face

  • Preserve vs. patch. Freezing a bad configuration preserves evidence but may extend exposure. Solve by disabling specific tools, not the entire assistant, and snapshotting traces before any change.
  • Privacy vs. context. You need unredacted logs to scope harm, yet privacy law may limit who sees them. Restrict by role, not by data deletion; use ABV’s RBAC and audit logs to enforce who looked at what.
  • Vendor opacity vs. speed. Foundation model providers often treat safety layers as proprietary. Push for concrete artifacts (model ID, safety setting names) in your MSA and capture them during incidents.
  • Over‑reporting vs. late reporting. Premature regulator notices create noise; late notices create fines. Anchor the “awareness” date to when you established causal link, and keep that timestamp defensible. (IAPP)

If you’re using ABV

  • Turn on agent tracing and OpenTelemetry now, not later.
  • Configure guardrails with named policies and keep version history enabled.
  • Give legal, privacy, and support read access to governance dashboards so they aren’t asking engineering for screenshots during an incident.