AI Agent Kill Switch Audit
Most organizations have a stop button. They have never tested what stops when they press it. At machine speed, the gap between "clicked PAUSE" and "all agent actions stopped" is measured in seconds. The agent executes 40 more actions in that window. This audit maps where that gap lives.
Who this is for

Operations managers, risk officers, trading desk supervisors, and system owners deploying autonomous agents that execute actions at machine speed — trading, payments, provisioning, customer actions — who need confidence they can halt those agents when things go wrong.
Five critical gaps in kill switch controls:
- Circuit breaker gap: Do you have automatic halts based on conditions, or only manual stopping?
- Mechanics gap: What actually stops when you hit the button, and what continues executing?
- Safe-state gap: Does the agent fail gracefully, or just stop mid-action leaving inconsistent state?
- Time-bound gap: Does autonomy expire automatically, or run indefinitely until manually stopped?
- Abuse gap: Can you halt malicious agent use or cascading failures before damage compounds?
This is an audit, not an implementation guide. It reveals what your kill switch doesn't do; it does not provide circuit breaker architectures, safe-state designs, or interrupt mechanisms. For implementation methodology with control patterns and testing protocols, see the AI Agent Kill Switch Implementation Playbook.
I was reviewing an AI agent deployment for automated foreign exchange hedging at a mid-sized investment firm. The agent monitored FX exposure across multiple portfolios and executed hedging transactions autonomously within pre-defined risk limits. The treasury desk was confident. "It's transformed our hedging efficiency. And we have full control — we can stop it anytime."
I asked what should have been a basic operational question: "Walk me through your procedure for stopping the agent. Say it's 10 AM during London market hours, high volatility, and the agent starts executing trades you don't want. How do you halt it?"
The lead developer pointed at the screen. "We flip the toggle. Agent goes to 'PAUSED' state." I asked him to demonstrate. He clicked. Status changed from ACTIVE to PAUSED.
"What about transactions that are already in flight?" He paused. "They... probably complete. The API calls were already sent." I kept going. "The agent can queue 5–10 concurrent calls. So in the time between you noticing something wrong and clicking PAUSE, you've got 10 FX trades already submitted that execute regardless?" Yes. "At $50 million notional per trade, that's $500 million exposure that completes after you 'stopped' the agent."
"What is your measured response time across the full chain — trader perceives anomaly, escalates, supervisor decides to halt, clicks PAUSE, all agent actions actually stop? Have you ever timed that?" They looked at each other. "No. We haven't run that scenario."
Average execution rate was 4 trades per second during active hedging. Estimated halt time: 10–15 seconds. In that window: 40–60 trades, $2–3 billion notional exposure, executing after they believed they had stopped the agent.
They had a PAUSE button. They had confident belief they could stop it. They had audit logs showing when PAUSE was clicked. They didn't have circuit breakers, interrupt mechanisms, measured halt latency, or tested procedures. They had the illusion of control. Not the mechanics of it.
How to use this audit
- Read through all five sections first without answering. This builds the mental model of kill switch requirements.
- Select one agent system to audit. Pick an autonomous agent that executes actions with consequences: financial, operational, or customer-facing.
- Answer each question honestly. If you are uncertain, that is a Partial or Gap — not a reason to skip. Uncertainty about a control is itself a gap.
- Review your gap score. The results panel appears after question 10 with prioritized gaps and next steps.
- Prioritize remediation. Circuit breaker and halt mechanics gaps are the entry point — manual-only stopping cannot outpace machine-speed execution.
Circuit breakers trigger automatic halt based on: execution rate exceeding threshold, total exposure crossing a boundary, error rate spiking, agent confidence dropping below minimum, behavioral anomaly versus baseline, or external conditions such as market circuit breakers or system alerts. The critical distinction: automatic means the agent halts without human perception, decision, or intervention. Most organizations rely entirely on humans to notice a problem and click STOP. At 4 trades per second, a 13-second human response window is 52 more trades.
- "Operators watch the dashboard" — human perception is the only detection layer
- "The agent has limits so circuit breakers aren't needed" — limits don't self-enforce at machine speed
- "We've never had a runaway scenario" — first occurrence tests whether controls exist
- No defined conditions that trigger automatic halt without human decision
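The automatic-halt distinction above can be sketched as a condition check evaluated by the control plane on every tick, rather than by a human watching a dashboard. This is a minimal sketch; the threshold values are illustrative placeholders, not recommended limits:

```python
from dataclasses import dataclass

@dataclass
class BreakerLimits:
    # Illustrative thresholds -- real values come from your risk policy.
    max_actions_per_sec: float = 2.0
    max_total_exposure: float = 100_000_000.0
    max_error_rate: float = 0.05

def should_halt(actions_per_sec: float, total_exposure: float,
                error_rate: float, limits: BreakerLimits) -> list[str]:
    """Return the list of tripped conditions; non-empty means halt now,
    without waiting for a human to notice and click STOP."""
    tripped = []
    if actions_per_sec > limits.max_actions_per_sec:
        tripped.append("execution_rate")
    if total_exposure > limits.max_total_exposure:
        tripped.append("exposure")
    if error_rate > limits.max_error_rate:
        tripped.append("error_rate")
    return tripped
```

At 4 trades per second against a 2-per-second limit, this check trips on the first evaluation cycle, not 13 seconds later.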
Testing means: deliberately trigger circuit breaker conditions in a test environment, verify the agent halts automatically, measure halt latency from trigger to full stop, confirm no actions execute after trigger, and validate that alerts and notifications actually fire. The common failure: circuit breaker is configured, logs an alert when conditions trigger, but the halt action is not wired to the trigger. The breaker fires on paper, the agent continues executing. First validation happens during an actual incident.
- "We configured circuit breakers but haven't tested them" — configured is not validated
- "Testing would be disruptive" — first test should not be a production incident
- Circuit breakers appear in documentation but no test results exist
- Alert fires on trigger but halt action not confirmed separately
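The wiring failure (alert fires, halt doesn't) is easy to reproduce in a simulation harness. This sketch assumes nothing about any real trading system; it only shows why a valid test must count actions executed after the trigger, not merely confirm the alert:

```python
def run_simulation(n_planned: int, trip_at: int, wired: bool) -> dict:
    """Simulate an agent loop with a breaker that trips at action `trip_at`.
    If the halt action is not wired to the trigger (wired=False), the alert
    fires but every planned action still executes -- the common failure."""
    executed = 0
    alerts = 0
    for i in range(n_planned):
        if i == trip_at:
            alerts += 1          # the alert always fires on trigger
            if wired:
                break            # the halt only happens if it is wired
        executed += 1
    return {"executed": executed, "alerts": alerts,
            "actions_after_trigger": max(0, executed - trip_at)}
```

Both runs show `alerts == 1`; only the action count after the trigger distinguishes a working breaker from a paper one.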
Implementation methodology for closing circuit breaker and halt mechanics gaps is covered in the Kill Switch Implementation Playbook — condition threshold design, halt action wiring, halt latency testing protocols, and safe-state patterns that survive audit review.
Kill switch scope has five layers: new top-level tasks (easiest to stop), current mid-reasoning chains (rarely stopped — cascade completes), queued API calls already submitted (depend on API cancellation support), in-flight transactions already processing (depend on whether past cancellation window), tool access immediately revoked (varies by architecture). Most organizations build layer one and call it a kill switch. When the FX hedging team clicked PAUSE, 8 queued trades, 3 mid-settlement, and the active reasoning chain all continued. The stop was real. The scope was not what they assumed.
- "STOP button stops the agent" — stops what specifically?
- "In-flight transactions probably complete" — probably is not a known scope
- Assumed everything stops without testing what continues
- No answer to "what is still executing 5 seconds after I click STOP?"
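One way to make scope explicit is to have the kill switch return a per-layer report instead of a bare "stopped" status. This is a hypothetical interface, not a standard API; the layer names follow the five layers above:

```python
def kill_switch_scope(queued: int, in_flight: int,
                      api_supports_cancel: bool) -> dict:
    """Answer 'what is still executing 5 seconds after I click STOP?'
    Counts are illustrative; real values come from the execution queue."""
    return {
        "new_tasks": "stopped",                    # layer 1: easiest to stop
        "reasoning_chain": "interrupted",          # layer 2: needs explicit interrupt
        "queued_calls_cancelled": queued if api_supports_cancel else 0,
        "queued_calls_executing": 0 if api_supports_cancel else queued,
        "in_flight_completing": in_flight,         # layer 4: past cancellation window
        "tool_access": "revoked",                  # layer 5: varies by architecture
    }
```

Run with the FX team's numbers (8 queued, 3 mid-settlement, no API cancellation), the report makes the gap visible instead of assumed.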
Measured halt latency requires: deliberately triggering the kill switch during a realistic activity simulation, timing from trigger to last agent-initiated action completing, repeating under different load levels, and documenting best case, worst case, and average. Without this number, incident response planning has no basis. The FX hedging team's answer: "Probably 10–15 seconds." At 4 trades per second, that's 40–60 trades. At $50M notional per trade, that's $2–3B executing in the "stopped" window. Without measurement, you don't know what happens between clicking STOP and everything actually stopping.
- "It stops pretty quickly" — how quickly, measured?
- "We haven't timed it" — no basis for incident response planning
- "Assumed to be instant" — instant is not a measurement
- "We'd measure it during an incident" — worst time to gather baseline data
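A minimal timing harness for halt latency might look like the following. `trigger_halt` stands in for your system's halt call and is assumed to block until the last agent-initiated action has completed; that blocking behavior is itself something the harness forces you to build:

```python
import statistics
import time

def measure_halt_latency(trigger_halt, runs: int = 5) -> dict:
    """Repeatedly trigger the kill switch and time trigger -> full stop.
    Repeat under different load levels and document best, worst, average."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        trigger_halt()  # must block until the last action completes
        samples.append(time.perf_counter() - t0)
    return {"best": min(samples), "worst": max(samples),
            "avg": statistics.mean(samples)}
```

The output is the number incident response planning needs: not "probably 10–15 seconds" but a measured best, worst, and average under realistic load.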
Safe-state logic covers three scenarios. Uncertainty: agent confidence drops below threshold, stop and alert rather than guess. Error: tool call fails, enter safe mode rather than cascade errors to downstream actions. Halt: kill switch triggered, complete current action to a clean boundary or roll back to the last consistent state, then log exactly where execution stopped. Without this, a three-step hedge halted mid-sequence leaves: step 1 executed, step 2 status unknown, step 3 skipped, hedge register out of sync. Manual cleanup requires determining what executed, what didn't, and whether corrective trades are needed — often 15–30 minutes of confusion that compounds the original problem.
- "Agent just stops when killed" — wherever it was, regardless of partial state
- "Manual troubleshooting required after halt" — no defined recovery procedure
- "We don't have defined failure modes" — safe state requires defined design, not improvisation
- No cleanup logic for partial transaction state
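The halt-to-clean-boundary behavior can be sketched as follows, assuming each step is an atomic callable and the halt flag is checked only at step boundaries (both assumptions, not a prescription):

```python
def execute_with_safe_state(steps, halt_requested) -> dict:
    """Run a multi-step transaction; on halt, finish the current step to a
    clean boundary, skip the rest, and log exactly where execution stopped.
    `steps` is a list of (name, callable); `halt_requested` is a predicate."""
    completed = []
    for name, action in steps:
        action()                 # complete the current step to a clean boundary
        completed.append(name)
        if halt_requested():     # check only at a consistent boundary
            return {"state": "safe_halt", "completed": completed,
                    "stopped_after": name}
    return {"state": "done", "completed": completed, "stopped_after": None}
```

The returned record answers the cleanup questions directly: what executed, what didn't, and where the boundary was, with no 15–30 minutes of forensic reconstruction.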
Dual control means: agent generates recommendation, human reviews and approves, agent executes only after approval. For high-value actions: two independent approvals required. The distinction matters because kill switches operate after the fact — they stop what hasn't happened yet. Dual control prevents the action from happening at all without authorization. A $45M FX trade executed autonomously on a misinterpretation produces a realized loss when unwound. The same trade routed through dual control is caught by the second reviewer, who recognizes the signal as ambiguous, waits 15 minutes, and confirms the trade would have been wrong — no loss.
- "Agent is faster without human approval" — speed at the cost of irreversibility
- "Single approval is sufficient" — single point of failure for high-consequence actions
- "Dual control adds too much latency" — for irreversible actions at institutional scale, this trade-off needs a documented decision
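Stated precisely, the dual-control check reduces to requiring distinct approvers above a value threshold. A minimal sketch; the $10M threshold is illustrative, and deduplicating approvers matters because the same person approving twice is still a single point of failure:

```python
def can_execute(notional: float, approvers: list[str],
                dual_threshold: float = 10_000_000.0) -> bool:
    """Agent proposes; it executes only after human approval.
    Above `dual_threshold`, two independent approvers are required."""
    unique = set(approvers)          # the same approver twice counts once
    if notional > dual_threshold:
        return len(unique) >= 2
    return len(unique) >= 1
```

The set deduplication is the part most implementations miss: two approval clicks from one account satisfies a naive count but not dual control.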
Time-bound autonomy means: agent active only during defined periods, automatically deactivates at window end, requires explicit activation per session, cannot execute outside authorized windows. The risk of continuous autonomy: trading desk closes Friday at 6 PM, assumes someone will stop the agent. Nobody clicks STOP. Agent runs over the weekend. Saturday at 3 AM a market event triggers activity in thin weekend liquidity. No humans monitoring. Monday morning: review weekend execution, positions need assessment, potential unwinding. Root cause is not a technical failure — it's an assumption that manual stopping would occur. Time-bound autonomy removes the assumption.
- "Agent runs continuously" — indefinite unless manually stopped
- "We stop it when we're done" — relies on human memory rather than automatic boundary
- "24/7 operation is more efficient" — efficiency argument for removing automatic safety boundary
- No mechanism to prevent execution outside intended operating hours
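A time-bound execution guard can be sketched as a two-part check, assuming a single daily window and explicit per-session activation (both hypothetical parameters; real schedules may span time zones and market calendars):

```python
from datetime import datetime, time

def may_execute(now: datetime, window_start: time, window_end: time,
                session_activated: bool) -> bool:
    """Agent acts only inside the authorized window AND only if this
    session was explicitly activated. Outside the window it is inert
    regardless of whether anyone remembered to click STOP."""
    in_window = window_start <= now.time() <= window_end
    return session_activated and in_window
```

In the weekend scenario above, the Saturday 3 AM check fails on the window condition alone: the agent is inert even though nobody clicked STOP on Friday.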
Periodic re-authorization creates a forcing function: autonomy expires after a defined period, an authorized person must actively decide to continue, and no one can forget the agent is running. Without this: Month 1, agent activated for a specific purpose. Month 3, original reason no longer applies. Month 6, market conditions changed, configuration is no longer appropriate. Month 9, audit asks why the agent is still running. Answer: "We forgot it was active." Indefinite authorization drifts into unmonitored operation. Re-authorization periods should reflect risk level: daily for financial execution agents, weekly for operational agents, monthly for informational ones.
- "Agent runs until we stop it" — no expiry, no forced review
- "We review it when we think about it" — review is discretionary, not scheduled
- "Periodic review would be burdensome" — for high-consequence agents, this burden is the control
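The expiry rule can be expressed as a date comparison per risk tier; the periods below mirror the daily/weekly/monthly suggestion above and are assumptions to be tuned, not fixed values:

```python
from datetime import date, timedelta

# Illustrative re-authorization periods by risk tier.
REAUTH_PERIOD = {
    "financial": timedelta(days=1),       # financial execution agents: daily
    "operational": timedelta(days=7),     # operational agents: weekly
    "informational": timedelta(days=30),  # informational agents: monthly
}

def authorization_valid(tier: str, last_authorized: date, today: date) -> bool:
    """Autonomy expires unless someone actively re-authorized it within the
    tier's period. No decision means the agent stops, not continues."""
    return today - last_authorized <= REAUTH_PERIOD[tier]
```

The direction of the default is the control: an expired authorization halts the agent, so "we forgot it was active" becomes impossible rather than discoverable at month nine.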
Insider misuse pattern: junior trader is authorized to operate the FX hedging agent. The agent is authorized to execute up to $100M per trade. The junior trader is personally authorized only up to $10M. The junior trader uses the agent to execute an $80M trade — within the agent's authorization, above the trader's personal limit. Without separation of duties, this executes without detection. No check verified that operator authority matched the action scope. The audit discovers it later: "Who authorized the $80M trade?" "The agent executed it." "Who triggered the agent?" "The junior trader." Investigation: agent became the privilege escalation path.
- "We trust our operators" — trust without separation of duties is not a control
- "Agent authority equals operator authority" — they are separate authorization surfaces
- "No monitoring for misuse" — insider abuse discovered after the fact if at all
- "Insider abuse isn't our threat model" — it is the most common source of high-consequence agent incidents
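The separation-of-duties check is one line once stated precisely: the effective limit is the minimum of the agent's authority and the operator's authority. A sketch, with the limits from the scenario above used only as test values:

```python
def trade_permitted(notional: float, agent_limit: float,
                    operator_limit: float) -> bool:
    """The effective limit is the MINIMUM of agent and operator authority.
    Checking only the agent's limit lets the agent become a privilege
    escalation path for an under-authorized operator."""
    return notional <= min(agent_limit, operator_limit)
```

The $80M trade passes the agent-only check ($100M limit) and fails the combined check (operator's $10M limit), which is exactly the detection the scenario lacked.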
Cascade pattern: payment processing agent retrieves transaction, validates, executes payment, logs confirmation. Malformed confirmation field: agent interprets as "payment failed, retry." Executes duplicate. Retrieves confirmation. Interprets as "payment failed, retry." 380 duplicate payments over 4 minutes before a human noticed and manually halted. $19M in duplicate payments. Without cascade detection: discovered after hundreds of repetitions. With circuit breaker set at 5 identical actions in 60 seconds: halts after the 4th duplicate, 4 payments to reverse instead of 380, malformed confirmation bug immediately identified. The detection pattern: same action type repeated beyond threshold triggers automatic halt regardless of the agent's reasoning about why it's doing it.
- "Agent will stop when it completes the task" — cascades present as completing a task, not looping
- "No limits on action chains" — reasoning depth unbounded
- "We'd notice if something was looping" — 380 duplicates in 4 minutes: humans noticed at 380
- "Cascades haven't happened yet" — first occurrence tests whether detection exists
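The "5 identical actions in 60 seconds" breaker is a sliding-window count over recent actions, keyed by action type. A minimal sketch, with the action key format (`"pay:<tx_id>"` below) as an assumption for illustration:

```python
from collections import deque

class CascadeBreaker:
    """Halt when the same action repeats beyond `max_repeats` within
    `window_secs`, regardless of the agent's reasoning for repeating."""
    def __init__(self, max_repeats: int = 5, window_secs: float = 60.0):
        self.max_repeats = max_repeats
        self.window_secs = window_secs
        self.history: deque = deque()   # (timestamp, action_key) pairs

    def record(self, now: float, action_key: str) -> bool:
        """Record an action; return True if the breaker trips (halt now)."""
        self.history.append((now, action_key))
        # Drop entries that have aged out of the sliding window.
        while self.history and now - self.history[0][0] > self.window_secs:
            self.history.popleft()
        identical = sum(1 for _, key in self.history if key == action_key)
        return identical >= self.max_repeats
```

Keying on the action, not the agent's stated intent, is the point: the payment cascade above presented as "retrying a failed payment" every single time, and only the repetition count exposed it.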
Close these gaps. The Kill Switch Implementation Playbook covers step-by-step closure methodology for each control surface this audit maps: circuit breaker condition design, halt action wiring, latency testing protocols, safe-state pattern templates, time-bound authorization frameworks, and cascade detection thresholds that survive operational review.
Join the waitlist for implementation access →