Agent Trust Control Plane
A file-backed governance layer for AI agents that classifies proposed actions as
allow, approval_required,
clarification_required, or blocked.
Human oversight where it matters. No friction where it does not.
"Tool-using agents can send messages, edit files, delete data, push code, and spend money. The question is not whether they should act. It is whether anyone is checking before they do."
01 The Problem
As AI agents gain tool access to file systems, APIs, cloud resources, and communication channels, they can execute high-stakes actions without meaningful human review. Most agent frameworks treat safety as a prompt concern: "be careful" baked into system messages that agents can interpret loosely or ignore under accumulated context pressure.
The failure mode is not a rogue agent acting maliciously. It is an ambiguous destructive request that escalates unchecked because no structural gate exists between "the agent thinks this is fine" and "the action is irreversible." Without a governance layer, the only safeguard is the model's own judgment. That is exactly what governance should not depend on.
02 What I Built
A local, deterministic control plane that sits between agent intent and action execution. Every proposed action is evaluated against a YAML policy registry, classified by risk, and either approved, gated for human review, routed for clarification, or blocked outright.
Policy Registry
YAML-based rules defining what agents can do, what needs approval, and what's blocked
Risk Engine
Deterministic classification matching actions against policy rules and budget constraints
Audit Trail
JSONL logging of every decision with rationale, matched rules, and approval IDs
Approval Inbox
Markdown-rendered queue of pending human decisions with context and one-click resolution
The CLI accepts structured proposals: agent name, action type, target, tool, destructiveness flag, and stated intent. It returns a decision with full rationale. A static HTML dashboard provides oversight visibility across all decisions and pending approvals.
03 Key Design Decisions
Deterministic over probabilistic. The risk engine uses rule-matching against explicit policies, not model-based judgment. Safety classification should not depend on the same type of reasoning it's meant to govern. A YAML policy that says "external sends require approval" always fires. It does not get overridden by a persuasive prompt.
Clarification as a first-class decision.
Most governance systems have binary gates: allow or block. But real agent failures often
start with ambiguity: a request that could be safe or destructive depending on intent
the agent hasn't confirmed. Adding clarification_required as
a distinct decision type catches the class of errors where agents escalate ambiguous
requests into irreversible actions.
Local and file-backed. No cloud dependency, no database. Policies are YAML files, audit trails are JSONL, the inbox is Markdown, and the dashboard is static HTML. Everything is inspectable, version-controllable, and works offline. This makes the control plane easy to audit and impossible to silently bypass.
04 Challenges
Policy granularity was a constant tradeoff. Too coarse and everything gets flagged, creating approval fatigue that defeats the purpose. Too fine and the policy file becomes unwieldy and brittle. The right balance came from starting with broad rules and adding specificity only where real false positives or false negatives showed up.
The clarification workflow needed careful design. When the control plane asks for clarification, the agent needs a structured way to re-submit with more context — not just retry the same proposal. This meant designing the CLI interaction as a conversation, not a single pass.
05 What I Learned
Safety needs structure, not just instructions
Telling an agent to "be careful" is not a governance model. Deterministic policy enforcement is.
Ambiguity is the real risk
Most dangerous agent actions are not obviously wrong. They are ambiguous requests that escalate without clarification.
Audit trails change behavior
When every decision is logged with rationale and matched rules, the system becomes self-documenting and easy to debug.
Simple formats win
YAML policies, JSONL logs, Markdown inbox. No database, no cloud. Inspectable, portable, version-controllable.
06 Why This Matters
As AI agents move from demos to production, executing real actions on real systems, governance becomes a hard requirement. This project demonstrates how to build that governance layer: deterministic policy enforcement, human approval gates, clarification workflows for ambiguous cases, and full audit trails.
Structured policy registries, risk classification, approval workflows, and file-backed audit infrastructure are the building blocks any organization needs once agents start taking actions that matter. The difference between trusting an agent because you hope it is safe and trusting it because you can prove it.