Research question
Security dashboards tend to compress complex systems into a single rate. For an AI execution boundary, the tempting number is deny rate: how often did the boundary reject proposed actions?
This paper argues that deny rate is at best a weak diagnostic and at worst a misleading safety metric. A low deny rate can indicate safe agents, permissive policy, missing coverage, or broken instrumentation. A high deny rate can indicate strong protection, poor user experience, malformed tools, or a prompt-injection campaign. The metric is uninterpretable without context.
Method
We derive metrics from three operational requirements:
- The boundary must prevent unauthorized side effects.
- Operators must be able to explain decisions after the fact.
- Auditors must be able to replay a representative sample without trusting the live service.
We then compare those requirements against deny rate and propose a replacement metric set.
Why deny rate fails
Deny rate is a ratio:
denied_actions / evaluated_actions
The numerator and denominator are both unstable. A product change may cause agents to propose fewer risky actions. A policy change may classify the same action differently. A connector outage may convert valid requests into denials. A new guard may catch prompt-injection attempts that were previously invisible. A model upgrade may reduce malformed tool calls.
All of those events move deny rate. Only some indicate improved safety.
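A worked example makes the instability concrete. The numbers below are illustrative, not drawn from any deployment: the same denial count yields very different deny rates as the evaluated set shrinks, with no change in policy or risk.

    # Illustrative numbers only. The same 40 denials produce different
    # deny rates as the evaluated set shifts; safety is unchanged.
    denied_actions = 40
    evaluated_before = 2000   # baseline traffic
    evaluated_after = 800     # a product change removes many benign proposals
    print(denied_actions / evaluated_before)  # 0.02
    print(denied_actions / evaluated_after)   # 0.05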
Failure mode 1: low deny rate hides weak policy
The most dangerous low-deny-rate system is one that never sees the risky action class. If policy does not cover a connector, if the boundary receives an incomplete intent, or if the framework bypasses the boundary for some calls, deny rate can be low because the evaluated set is too small.
This is the same reason access-control systems do not use “few denied requests” as proof of least privilege. Zero-trust architecture asks for continuous, explicit evaluation. The important question is not how often the system says no. It is whether every protected resource and action class was evaluated under the right policy.
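A minimal sketch of that question, assuming hypothetical logs that record every tool call the framework issued and the subset the boundary evaluated (all names and fields are illustrative):

    # Hypothetical inputs: every tool call the framework issued, and the
    # subset the boundary actually evaluated. Field names are illustrative.
    issued_calls = [
        {"id": 1, "action_class": "email.send"},
        {"id": 2, "action_class": "file.read"},
        {"id": 3, "action_class": "payment.create"},  # never reached the boundary
    ]
    evaluated_ids = {1, 2}

    bypassed = [c for c in issued_calls if c["id"] not in evaluated_ids]
    coverage = 1 - len(bypassed) / len(issued_calls)
    # The bypassed high-risk class is the finding; the low deny rate is noise.
    print(coverage, sorted({c["action_class"] for c in bypassed}))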
Failure mode 2: high deny rate hides poor affordances
A high deny rate can be healthy during an attack. It can also mean the agent is badly configured. If users ask for permitted workflows but the framework repeatedly proposes malformed or overbroad tool calls, the boundary will deny correctly while the product fails operationally.
In that situation, reducing deny rate by weakening policy is the wrong repair. The repair is better tool schemas, clearer agent affordances, improved policy hints, or a narrower action vocabulary.
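As a hypothetical illustration of narrowing the action vocabulary, a tool schema can replace a free-form parameter with an explicit enumeration, so overbroad calls are never proposed rather than correctly denied. The schemas below are invented for illustration:

    # Hypothetical tool schemas (JSON-Schema style). The narrow version
    # removes the free-form field that was driving correct denials.
    overbroad_schema = {
        "name": "run_query",
        "parameters": {"sql": {"type": "string"}},  # arbitrary SQL
    }
    narrow_schema = {
        "name": "run_report",
        "parameters": {
            "report": {"type": "string",
                       "enum": ["daily_sales", "refund_summary"]},
        },
    }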
Failure mode 3: rate metrics miss false allows
The event that matters most is a false allow: an action that should have been denied or escalated but was allowed. Deny rate does not measure false allows. A boundary can deny many low-risk malformed calls while allowing one high-risk exfiltration event.
False-allow detection requires adversarial tests, replay, incident review, and policy coverage analysis. It cannot be inferred from aggregate deny rate.
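One way to surface false allows after the fact is to replay previously allowed receipts against the current, corrected policy offline. A minimal sketch, with a hypothetical evaluate function standing in for the real policy engine:

    def evaluate(receipt, policy):
        # Hypothetical stand-in for the offline policy engine.
        return policy.get(receipt["action_class"], "deny")

    def find_false_allows(allowed_receipts, current_policy):
        # Reclassify past allows under current policy; disagreements
        # are candidate false allows for human review.
        return [r for r in allowed_receipts
                if evaluate(r, current_policy) != "allow"]

    receipts = [{"action_class": "file.read"}, {"action_class": "data.export"}]
    policy = {"file.read": "allow", "data.export": "escalate"}
    print(find_false_allows(receipts, policy))  # data.export was a false allow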
Replacement metric set
An execution boundary should report a balanced set:
- Evaluation coverage. Percentage of consequential tool calls that passed through the boundary.
- Policy coverage. Percentage of action classes with explicit allow, deny, and escalation rules.
- Replay pass rate. Percentage of sampled receipts that verify and replay to the same verdict offline.
- False-allow discoveries. Count and severity of allows later reclassified as deny or escalate.
- False-deny discoveries. Count and severity of denials later reclassified as allow.
- Escalation latency. Time from machine escalation to human decision for approval-gated actions.
- Evidence completeness. Percentage of receipts with all required policy, schema, and approval artifacts available.
- Attack-pressure indicators. Denials grouped by reason code, source, connector, and prompt-injection signature.
This set turns the boundary into an operations surface rather than a single vanity metric.
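A minimal reporting structure for the set might look like the sketch below; field names follow the list above, and all thresholds and reporting windows would be deployment-specific assumptions:

    from dataclasses import dataclass

    @dataclass
    class BoundaryReport:
        # Balanced metric set; one record per reporting window.
        evaluation_coverage: float       # consequential calls through boundary
        policy_coverage: float           # action classes with explicit rules
        replay_pass_rate: float          # sampled receipts verified offline
        false_allows: int                # allows later reclassified
        false_denies: int                # denials later reclassified
        escalation_latency_p50_s: float  # machine escalation to human decision
        evidence_completeness: float     # receipts with required artifacts
        denials_by_reason: dict          # attack-pressure breakdown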
Decision-quality score
A useful composite score can be built from replay pass rate, false-allow severity, evidence completeness, and evaluation coverage. Deny rate may remain as a supporting diagnostic, but it should not dominate.
One simple model:
decision_quality =
    coverage_weight * evaluation_coverage +
    replay_weight * replay_pass_rate +
    evidence_weight * evidence_completeness -
    false_allow_penalty * false_allow_severity
The exact weights are deployment-specific. The important property is that a system cannot score well by merely denying more.
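A sketch of the composite under illustrative weights. Denials enter only through confirmed false-allow severity, so the score cannot be improved by denying more:

    def decision_quality(evaluation_coverage, replay_pass_rate,
                         evidence_completeness, false_allow_severity,
                         weights=(0.3, 0.3, 0.2, 2.0)):
        # Illustrative weights; clamp to keep the score in [0, 1].
        cov_w, replay_w, evid_w, fa_penalty = weights
        score = (cov_w * evaluation_coverage
                 + replay_w * replay_pass_rate
                 + evid_w * evidence_completeness
                 - fa_penalty * false_allow_severity)
        return max(0.0, min(1.0, score))

    # Example: strong coverage and replay, one moderate false allow.
    print(decision_quality(0.98, 0.95, 0.90, 0.05))  # ~0.66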
Operational use
The metric set supports different teams:
- security teams inspect false allows, attack pressure, and coverage gaps;
- operations teams inspect escalation latency and false denials;
- compliance teams inspect replay pass rate and evidence completeness;
- product teams inspect malformed-call causes and agent affordance failures.
That separation is important. A single deny-rate target encourages local optimization. A balanced score makes tradeoffs visible.
Limitations
The proposed metrics depend on sampling discipline and ground-truth review. False allows are difficult to discover, and some will only surface through incident analysis. Evaluation coverage also requires accurate classification of “consequential” actions.
The model also assumes receipts exist for deny and escalation events. If a system records only allowed actions, it cannot produce the metric set.
Conclusion
Deny rate is easy to chart and hard to interpret. The useful question is whether the organization can explain every allow, deny, and escalation after the fact. Execution boundaries should be measured by replayability, coverage, evidence completeness, escalation behavior, and false-allow discovery. Deny rate belongs in the appendix.