Policy freshness is a runtime safety property for AI agents. This paper defines stale-policy failure modes, freshness windows, receipt evidence, and fail-closed controls for execution boundaries that evaluate tool calls before side effects.

Research question

An AI execution boundary evaluates a proposed action against policy. That policy is not static. It changes when connectors drift, risk appetite changes, approvals expire, incidents occur, employees leave, or data-classification rules are updated.

This paper asks how an execution boundary should behave when its active policy snapshot may be stale.

Method

We model policy freshness as a control-plane property rather than an agent behavior problem. The analysis draws from zero-trust architecture, policy-based access control, signed metadata systems, and provenance practices. The goal is a practical control model in which a boundary can decide whether its policy snapshot is fresh enough to evaluate a specific action.

Stale policy is not one failure

“Stale policy” is often treated as a single condition. It is at least six:

Age staleness. The policy snapshot is older than the configured freshness window.
Revocation staleness. A principal, connector, approval, or key has been revoked after the snapshot was issued.
Schema staleness. A tool or connector schema changed, so policy no longer describes the real action surface.
Threat-intel staleness. A detector or rule pack is older than the deployment requires for a risk class.
Distribution staleness. The boundary has not received the latest signed policy release.
Replay staleness. The deployment cannot retrieve the policy snapshot needed to verify old receipts.

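Each condition needs its own reason code because the operational response differs. For example, an age-stale snapshot is resolved by fetching a newer bundle, whereas a schema-stale one requires policy to be updated against the changed connector contract.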
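The following Python sketch shows one way to encode the six conditions as distinct reason codes. The enum name `StaleReason`, the code strings, and the `RESPONSE` mapping are hypothetical. The responses are illustrative placeholders that show each code routing to a different first action; they are not a prescribed runbook.

```python
from enum import Enum


class StaleReason(str, Enum):
    """Hypothetical reason codes, one per stale-policy condition."""
    AGE = "STALE_AGE"
    REVOCATION = "STALE_REVOCATION"
    SCHEMA = "STALE_SCHEMA"
    THREAT_INTEL = "STALE_THREAT_INTEL"
    DISTRIBUTION = "STALE_DISTRIBUTION"
    REPLAY = "STALE_REPLAY"


# Illustrative only: each reason code maps to a different first response.
RESPONSE = {
    StaleReason.AGE: "fetch a newer policy bundle",
    StaleReason.REVOCATION: "sync the latest revocation epoch before evaluating",
    StaleReason.SCHEMA: "update policy against the changed connector contract",
    StaleReason.THREAT_INTEL: "refresh the detector or rule pack for the risk class",
    StaleReason.DISTRIBUTION: "investigate the policy distribution channel",
    StaleReason.REPLAY: "restore the archived snapshot needed for receipt verification",
}
```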

Freshness window

A freshness window defines how old a policy snapshot may be for a class of action. Low-risk read-only actions may tolerate a longer window. High-risk write, payment, deletion, credential, or external-message actions should tolerate a shorter window or require a live freshness check.

A useful policy model includes:

issued_at: when the policy bundle was produced;
valid_from: earliest time the bundle may be used;
expires_at: latest time the bundle may be used without refresh;
max_age_by_action_class: class-specific freshness windows;
revocation_epoch: monotonic counter for identity, connector, or approval revocations;
schema_digest: digest of the connector contract covered by policy.
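The fields above can be combined into a per-action freshness check. This Python sketch makes two assumptions. First, the boundary learns the latest revocation epoch and the live connector schema digest through a separate channel, which is not specified here. Second, every action class must have an explicit entry in `max_age_by_action_class`, and a missing entry fails closed. `PolicyBundle`, `check_freshness`, the example values, and the reason strings are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PolicyBundle:
    issued_at: datetime
    valid_from: datetime
    expires_at: datetime
    max_age_by_action_class: dict  # action class -> timedelta
    revocation_epoch: int
    schema_digest: str


def check_freshness(bundle, action_class, now, current_revocation_epoch, live_schema_digest):
    """Return (is_fresh, reason_code) for one proposed action."""
    if now < bundle.valid_from:
        return False, "NOT_YET_VALID"
    if now >= bundle.expires_at:
        return False, "STALE_AGE"
    max_age = bundle.max_age_by_action_class.get(action_class)
    if max_age is None:
        # No freshness rule for this class: fail closed rather than guess.
        return False, "NO_FRESHNESS_RULE"
    if now - bundle.issued_at > max_age:
        return False, "STALE_AGE"
    if bundle.revocation_epoch < current_revocation_epoch:
        # A revocation happened after this bundle was issued.
        return False, "STALE_REVOCATION"
    if bundle.schema_digest != live_schema_digest:
        # Policy no longer describes the real connector contract.
        return False, "STALE_SCHEMA"
    return True, "FRESH"


# Hypothetical example: long window for reads, five minutes for payments.
T0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
EXAMPLE = PolicyBundle(
    issued_at=T0,
    valid_from=T0,
    expires_at=T0 + timedelta(days=1),
    max_age_by_action_class={"read_only": timedelta(hours=12), "payment": timedelta(minutes=5)},
    revocation_epoch=7,
    schema_digest="sha256:abc",
)
```

With the example bundle, a read-only action one hour after issuance is fresh. A payment at the same moment is age-stale because its window is five minutes.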

The boundary should bind the receipt to the actual policy digest and freshness decision, not only to a human-readable version string.
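A digest should change whenever policy content changes, even when the version string does not. One way to get that property is to hash a canonical serialization of the bundle. This Python sketch uses sorted-key, whitespace-free JSON as the canonical form. That is a simplifying assumption, and a deployment may prefer a stricter scheme such as the JSON Canonicalization Scheme (RFC 8785). The `policy_digest` function and the `sha256:` prefix are illustrative.

```python
import hashlib
import json


def policy_digest(bundle: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the policy bundle."""
    # sort_keys makes the encoding independent of dict insertion order;
    # compact separators remove whitespace variation.
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()
```

Two bundles that both claim version `2025.01` but differ in a single rule produce different digests. A receipt that records the digest therefore identifies the rules that were actually evaluated.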

Fail-closed behavior

When freshness cannot be established, the boundary should deny or escalate before side effects. The exact verdict depends on policy:

deny when the action is high-risk and no fresh policy is available;
escalate when human review can safely substitute for a missing freshness signal;
allow only when the action class explicitly permits degraded-mode evaluation.

The degraded mode must itself be policy-defined. Otherwise, “temporary outage” becomes an unbounded fail-open path.
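The following Python sketch shows the three-way verdict. It assumes degraded-mode behavior is a per-class table supplied by policy; `DEGRADED_MODE` and its entries are hypothetical. Two further assumptions are built in:

- A class with no degraded-mode entry is denied, so an outage cannot widen into fail-open.
- Degraded mode can only tighten the verdict the stale snapshot produced, never loosen it.

Escalation falls back to deny when no reviewer is available.

```python
from enum import Enum


class Verdict(str, Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"
    DENY = "deny"


# Higher number means stricter.
STRICTNESS = {Verdict.ALLOW: 0, Verdict.ESCALATE: 1, Verdict.DENY: 2}

# Hypothetical policy-defined degraded mode, per action class.
DEGRADED_MODE = {
    "read_only": Verdict.ALLOW,
    "external_message": Verdict.ESCALATE,
    "payment": Verdict.DENY,
}


def boundary_verdict(fresh, normal_verdict, action_class, degraded_mode, review_available):
    """Return the verdict to enforce before any side effect."""
    if fresh:
        return normal_verdict
    # Unknown classes get DENY: degraded mode must be explicitly defined.
    degraded = degraded_mode.get(action_class, Verdict.DENY)
    # Take the stricter of the stale evaluation and the degraded-mode rule.
    verdict = max(normal_verdict, degraded, key=STRICTNESS.__getitem__)
    if verdict is Verdict.ESCALATE and not review_available:
        # Escalation only substitutes for freshness if a human can review.
        return Verdict.DENY
    return verdict
```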

Receipt evidence

A freshness decision should appear in the receipt. Minimum fields:

policy digest;
policy issued time and expiry;
boundary evaluation time;
freshness rule identifier;
freshness verdict;
reason code for stale or degraded decisions;
evidence reference for the policy bundle and connector schema.

This turns freshness from an internal cache state into audit evidence.
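The minimum fields above could be captured in a structured record such as the following Python sketch. These are all assumptions:

- the class name and field names;
- ISO 8601 string timestamps;
- the verdict vocabulary (`fresh`, `stale`, `degraded`).

The one enforced rule is that a non-fresh decision must carry a reason code.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class FreshnessReceipt:
    policy_digest: str
    policy_issued_at: str      # ISO 8601
    policy_expires_at: str     # ISO 8601
    evaluated_at: str          # boundary evaluation time, ISO 8601
    freshness_rule_id: str
    freshness_verdict: str     # "fresh", "stale", or "degraded" (assumed vocabulary)
    reason_code: Optional[str]
    policy_bundle_ref: str     # evidence reference for the policy bundle
    connector_schema_ref: str  # evidence reference for the connector schema

    def __post_init__(self):
        # Stale or degraded decisions are unauditable without a reason code.
        if self.freshness_verdict != "fresh" and not self.reason_code:
            raise ValueError("stale or degraded decisions require a reason code")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```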

Distribution controls

Signed metadata systems such as The Update Framework provide a useful mental model: policy releases need versioning, expiry, signatures, and rollback protection. An execution boundary does not need to become a software-update system, but it does need several of the same controls:

reject unsigned policy bundles;
reject policy rollback unless an emergency policy explicitly allows it;
reject expired bundles for protected action classes;
preserve older bundles needed for receipt replay;
distinguish normal key rotation from compromise-driven revocation.
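The controls in this list can be sketched as an acceptance check on incoming bundles. To stay within the Python standard library, this sketch uses an HMAC over the payload as a stand-in for a real signature. A production distributor would more likely use asymmetric signatures, as TUF does. The sketch:

- tracks a high-water version mark for rollback protection;
- treats the `emergency_rollback` flag as the explicit emergency-policy override;
- keeps every accepted payload in an archive for replay.

`PolicyStore`, its method, and the result strings are hypothetical, and key rotation versus compromise-driven revocation is not modeled.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


@dataclass
class PolicyStore:
    key: bytes                 # stand-in verification key (HMAC, not a real signature scheme)
    high_water: int = 0        # highest version ever accepted
    active_version: int = 0    # version currently used for evaluation
    archive: dict = field(default_factory=dict)  # version -> payload, kept for receipt replay

    def accept(self, version, payload, signature, expires_at, now, protected,
               emergency_rollback=False):
        """Apply distribution controls to one incoming bundle."""
        expected = hmac.new(self.key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, signature):
            return "REJECT_UNSIGNED"
        # Anything not newer than the high-water mark is a rollback or replay.
        if version <= self.high_water and not emergency_rollback:
            return "REJECT_ROLLBACK"
        # Expired bundles are refused when protected action classes depend on them.
        if protected and now >= expires_at:
            return "REJECT_EXPIRED"
        self.archive[version] = payload
        self.active_version = version
        # An emergency rollback never lowers the high-water mark.
        self.high_water = max(self.high_water, version)
        return "ACCEPTED"
```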

The policy distributor is part of the trusted control plane. Its behavior should be described as clearly as the boundary behavior.

Operational runbook

For a stale-policy event, operators need to answer:

Which action classes were affected?
Which receipts were produced under stale, degraded, or denied status?
Was any protected action allowed during the stale window?
Was the stale condition related to age, revocation, schema, threat intel, distribution, or replay?
Which policy bundle and connector schema restore normal operation?

The boundary should produce enough structured evidence that those questions can be answered without reconstructing state from raw logs.
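With structured receipts, each runbook question becomes a query. This Python sketch assumes the stale window is known and that receipts are dictionaries with these hypothetical keys:

- `receipt_id`, `evaluated_at`, `action_class`;
- `verdict`, `freshness_verdict`, `reason_code`;
- `policy_digest`, `schema_digest`.

It reports protected allows regardless of the recorded freshness verdict, because a receipt marked fresh inside a known stale window is itself a finding.

```python
def stale_event_report(receipts, window_start, window_end, protected_classes):
    """Answer the stale-policy runbook questions from structured receipts."""
    in_window = [r for r in receipts if window_start <= r["evaluated_at"] < window_end]
    # First fresh evaluation after the window shows which bundle and schema restored service.
    after = sorted(
        (r for r in receipts if r["evaluated_at"] >= window_end and r["freshness_verdict"] == "fresh"),
        key=lambda r: r["evaluated_at"],
    )
    return {
        "affected_classes": sorted(
            {r["action_class"] for r in in_window if r["freshness_verdict"] != "fresh"}
        ),
        "flagged_receipts": [
            r["receipt_id"] for r in in_window
            if r["freshness_verdict"] in ("stale", "degraded") or r["verdict"] == "deny"
        ],
        "protected_allows": [
            r["receipt_id"] for r in in_window
            if r["action_class"] in protected_classes and r["verdict"] == "allow"
        ],
        "reason_codes": sorted({r["reason_code"] for r in in_window if r["reason_code"]}),
        "restored_by": (after[0]["policy_digest"], after[0]["schema_digest"]) if after else None,
    }
```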

Limitations

Freshness controls do not guarantee good policy. A fresh but permissive policy can still allow bad actions. Freshness also does not eliminate the need for connector testing, data-classification accuracy, or human review for ambiguous high-risk events.

Freshness windows can introduce availability costs. If the policy distributor is down, high-risk actions may halt. That cost is the deliberate trade-off of fail-closed governance and should be stated explicitly in service-level design.

Conclusion

Policy freshness is a runtime safety property. A boundary that cannot prove its policy is current enough for the proposed action should not silently proceed. It should deny or escalate, record the freshness decision, and preserve the evidence needed for replay.

Freshness controls for AI execution policies