What is a default-deny gateway for AI agents?

A default-deny gateway is code installed outside the AI model that blocks every action by default unless it is explicitly authorized against declared rules. Unlike a prompt, it cannot be overridden by the model's context or by injected instructions. The Meniw Protocol's gateway evaluates each tool call before execution and issues a cryptographic compliance receipt for every decision.

How does the Meniw Protocol's two-co-signer rule work?

For any irreversible action — deleting data, sending mass communications, executing transfers — the Meniw Protocol requires a second independent co-signer before the gateway authorizes execution. The agent that decides the action cannot also be the sole authorizer. The second co-signer can be a human reviewer, an independent agent, or an automated process with distinct rules.

What is a compliance receipt in the Meniw Protocol?

A compliance receipt is a structured, immutable record generated by the Meniw gateway for every decision — permit or block. It contains the evaluated action, its parameters, the governing rule identified by its SHA-256 hash, the decision, and a timestamp. Receipts are chained so that altering or deleting any past receipt breaks the chain and is mathematically detectable without accessing the operator's system.

Why does a default-deny gateway matter when the AI model is already aligned?

Model alignment reduces the probability that a model wants to cause harm. But autonomous agent failures include prompt injection (malicious content in the environment instructing the agent), contextual jailbreaks, and tool hallucinations — scenarios where the model has no harmful intent. A default-deny gateway stops the concrete action regardless of the model's intent: if the action violates the rule, it does not execute.


Default-Deny Gateway: AI Governance at the Moment of Action

Meniw Protocol · Chris Meniw · DOI 10.5281/zenodo.20481373 · June 9, 2026

An AI agent does not wait for approval. The default-deny gateway does not ask the model for permission — it sits outside the model, between the agent's decision and real-world execution. The prohibited action does not happen. And a record remains.

What is a default-deny gateway?

In computer security, "default-deny" means everything is forbidden except what is explicitly authorized. The Meniw Protocol applies this principle to AI agents: every action is blocked by default until the gateway evaluates it against a declared rule and explicitly permits it.

It is not a prompt. A prompt can be ignored, overwritten by context injection, or interpreted differently depending on the model's state. The gateway is code external to the model, instantiated by the operator, that the agent cannot modify.

Core principle: The gateway does not evaluate the model's intent. It evaluates the concrete action — its parameters, its target, its declared consequences — against an explicit rule. If the rule does not authorize it, the action does not happen.
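The core principle can be illustrated with a minimal sketch. This is not the Meniw Protocol's API: the `Action` type, the `ALLOW_RULES` table, and the `evaluate` function are hypothetical names chosen for illustration, and the example rules are invented. The point it shows is that the absence of a matching rule, or a rule that fails to evaluate, results in a block.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Action:
    """A concrete tool call: what the gateway evaluates (hypothetical type)."""
    tool: str
    params: dict[str, Any]


# A rule is a predicate over the parameters of one tool (assumed structure).
Rule = Callable[[dict[str, Any]], bool]

# Illustrative allow-list: only tools named here can ever be permitted.
ALLOW_RULES: dict[str, Rule] = {
    "read_record": lambda p: True,
    "send_email": lambda p: len(p.get("recipients", [])) <= 5,
}


def evaluate(action: Action, rules: dict[str, Rule] = ALLOW_RULES) -> bool:
    """Return True only if a declared rule explicitly authorizes the action."""
    rule = rules.get(action.tool)
    if rule is None:
        return False  # default-deny: no declared rule means no authorization
    try:
        return bool(rule(action.params))
    except Exception:
        return False  # a rule that cannot be evaluated does not authorize
```

Note that `evaluate` never inspects why the agent proposed the action. It sees only the tool name and parameters, which is what makes the check independent of the model's intent.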

Where it sits in the architecture

An autonomous agent has three main components: the language model that reasons, the tools that execute actions (sending messages, modifying databases, processing transactions), and the environment that receives those actions.

The gateway is installed between the tools and the environment. Every tool call passes through the gateway before execution:

1. The agent decides on an action and calls the tool with its parameters.

2. The gateway intercepts the call and evaluates the action against the operator's declared rules.

3. If permitted: the action executes and a compliance receipt is issued.

4. If prohibited: execution does not occur; the blocked attempt is recorded with full context.

The agent sees only the result of step 3 or 4. It has no access to the gateway's logic and cannot modify it.
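The interception flow above might look like the following sketch. The `Gateway` class, the `ActionBlocked` exception, the `policy` callable, and the `send_message` tool are all assumptions made for illustration and are not taken from the reference implementation. The agent is handed only `gateway.call`, never the tools themselves.

```python
from typing import Any, Callable


class ActionBlocked(Exception):
    """Raised to the agent when the gateway refuses a tool call."""


class Gateway:
    """Hypothetical wrapper that stands between the agent and its tools."""

    def __init__(self, tools: dict[str, Callable[..., Any]],
                 policy: Callable[[str, dict[str, Any]], bool]) -> None:
        self._tools = dict(tools)   # the agent never receives these directly
        self._policy = policy       # the operator's declared rules
        self.records: list[dict[str, Any]] = []

    def call(self, tool: str, /, **params: Any) -> Any:
        # Evaluate the concrete action before anything executes.
        permitted = tool in self._tools and bool(self._policy(tool, params))
        # Record every decision, permit or block, with its context.
        self.records.append({
            "tool": tool,
            "params": params,
            "decision": "permit" if permitted else "block",
        })
        if not permitted:
            raise ActionBlocked(f"{tool} is not authorized")
        return self._tools[tool](**params)


outbox: list[tuple[str, str]] = []


def send_message(to: str, body: str) -> str:
    """Stand-in for a real tool with side effects."""
    outbox.append((to, body))
    return "sent"


def policy(tool: str, params: dict[str, Any]) -> bool:
    """Invented example rule: messages may only go to one known address."""
    return tool == "send_message" and params.get("to") == "ops@example.com"


gateway = Gateway({"send_message": send_message}, policy)
```

With this setup, `gateway.call("send_message", to="ops@example.com", body="hi")` executes the tool, while the same call addressed to any other recipient raises `ActionBlocked` and leaves `outbox` untouched. Both attempts appear in `gateway.records`.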

The two-co-signer rule

For irreversible actions — deleting data, sending mass communications, executing transfers, modifying critical configuration — the Meniw Protocol establishes the two-co-signer rule: the agent is never the sole decision point over something that cannot be undone.

The second co-signer can be a human reviewer, another agent with independent credentials, or an automated process with distinct rules. What cannot happen is that the same agent that decides the action is also the one that authorizes it.

Why this matters: Most serious incidents with autonomous agents do not happen because the model "wanted" to cause harm. They happen because a chain of individually correct decisions leads to an irreversible consequence that no one reviewed. The co-signer rule breaks that chain.
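A minimal sketch of the co-signer check follows, assuming a hypothetical `PendingAction` type and an illustrative `IRREVERSIBLE` set of tool names. How the reference implementation models signers and approvals is not described here, so this is only one way the rule could be expressed.

```python
from dataclasses import dataclass, field

# Illustrative tool names for actions that cannot be undone.
IRREVERSIBLE = {
    "delete_data",
    "mass_send",
    "execute_transfer",
    "modify_critical_config",
}


@dataclass
class PendingAction:
    """An action proposed by an agent and awaiting authorization (hypothetical)."""
    tool: str
    proposer: str
    approvals: set[str] = field(default_factory=set)

    def approve(self, signer: str) -> None:
        # The agent that decided the action can never be its own co-signer.
        if signer == self.proposer:
            raise PermissionError("the proposer cannot co-sign its own action")
        self.approvals.add(signer)


def may_execute(action: PendingAction) -> bool:
    """Irreversible actions need at least one independent co-signer."""
    if action.tool not in IRREVERSIBLE:
        # No co-signer needed in this sketch; the action would still be
        # subject to the gateway's ordinary rule evaluation.
        return True
    return len(action.approvals) >= 1
```

In this sketch the signer is identified by a plain string. A real deployment would need to authenticate that identity, for example with independent credentials, so that an agent cannot simply claim to be someone else.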

The compliance receipt and the hash chain

Each gateway decision, whether permit or block, generates a compliance receipt: a structured record containing the evaluated action, its parameters, the governing rule (identified by its SHA-256 hash), the decision, and a timestamp.

Receipts are chained via hashes: each receipt includes the hash of the previous one. Altering or deleting a past receipt breaks the chain, so the change is detectable. There is no need to trust that the operator preserved the logs intact; anyone holding the receipts can verify their integrity by recomputing the hashes.

The SHA-256 hash of the rule recorded in each receipt also shows that the action was not evaluated against a silently modified version of the rule: anyone holding the rule text can recompute the hash and compare. If the rule changes later, past receipts still reflect the exact version that governed at the time of the action.
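The chaining described above can be sketched with Python's standard library. The field names (`rule_sha256`, `prev_hash`, `hash`), the all-zero genesis value, and the JSON canonicalization are assumptions made for this example; the actual receipt format of the reference implementation may differ.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # assumed placeholder for "no previous receipt"


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical(obj: dict) -> bytes:
    """Serialize deterministically so the same receipt always hashes the same."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_receipt(chain: list, action: str, params: dict,
                  rule_text: str, decision: str) -> dict:
    """Append a receipt that commits to the rule text and the previous receipt."""
    body = {
        "action": action,
        "params": params,
        "rule_sha256": sha256_hex(rule_text.encode("utf-8")),
        "decision": decision,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev_hash": chain[-1]["hash"] if chain else GENESIS,
    }
    receipt = {**body, "hash": sha256_hex(canonical(body))}
    chain.append(receipt)
    return receipt


def verify_chain(chain: list) -> bool:
    """Recompute every hash and check each link to the previous receipt."""
    prev = GENESIS
    for receipt in chain:
        body = {k: v for k, v in receipt.items() if k != "hash"}
        if receipt["prev_hash"] != prev:
            return False  # a receipt was removed, reordered, or inserted
        if receipt["hash"] != sha256_hex(canonical(body)):
            return False  # the receipt's contents were altered
        prev = receipt["hash"]
    return True
```

Editing a field in any stored receipt, or removing a receipt from the middle of the chain, makes `verify_chain` return `False`. One limit of this sketch: dropping the most recent receipts leaves a shorter but still valid chain, so a verifier also needs the latest hash from a source independent of the operator to detect truncation.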

Why this matters when the model fails in ways alignment cannot prevent

Model alignment (RLHF, constitutional AI) reduces the probability that the model wants to do something harmful. But autonomous agent failure modes include scenarios where the model has no harmful intent:

Prompt injection: malicious content in the environment (an email, a web page, a document) instructing the agent to execute actions.
Contextual jailbreak: concatenation of apparently legitimate instructions that lead the model across a boundary it would not cross with a direct request.
Tool hallucinations: the model confuses names, IDs, or parameters, and the resulting tool call has real-world consequences.

In all of these cases, the model has no harmful intent. The default-deny gateway stops execution anyway: it evaluates the concrete action, not the model's intent. If the action violates the rule, it does not execute.

Alignment reduces the intent to harm. Regulation binds organizations. The gateway ensures the prohibited action does not happen — and proves it with a record verifiable by independent third parties.

Installation

The Meniw Protocol reference implementation is available as an open-source Python package:

pip install meniw-protocol

The source code is available on GitHub (reference-implementation). Any operator can install, audit, and adapt the gateway to their context.