The Validation Gate
The gate is the most important piece in Mnemix. It's the barrier between an agent's proposed action and the world. Every high-stakes action passes through it, and it returns a verdict that is deterministic, evidence-cited, and rerunnable.
AI proposes. Mnemix governs. The gate is the "governs."
Status: in active development. The gate runs inside the main Mnemix Worker — Stage 1 is a deterministic rule engine, Stage 2 is an isolated LLM grader for qualitative rules. The gate contract is spec-frozen but not yet publicly callable; payloads below show the design shape so you can plan against it.
The flow
PROPOSED AGENT ACTION (Bland, Vapi, MCP, a finance agent, anything)
                       │
                       ▼
┌─────────────────────────────────────────────┐
│ GATE (Stage 1 rules + Stage 2 grader)       │
│ inputs:  proposed_action, policy_bundle_id, │
│          mode = standard | high_stakes      │
│ runs:    invariant / assertion / forbidden  │
│          rules (deterministic, no LLM)      │
│ calls:   the GRADER for qualitative rules   │
│ returns: { verdict, evidence_refs[],        │
│            rule_results[], rerun_hash }     │
└─────────────────────────────────────────────┘
                       │
           ┌───────────┼────────────────┐
           ▼           ▼                ▼
        allowed     denied         needs_human
       (execute) (block + log) (queue for review)
Inputs
{
  "proposed_action": { "action": "...", "value": {} },
  "policy_bundle_id": "finance.recon.v0",
  "mode": "high_stakes"
}
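As a planning aid, the input shape above can be checked on the caller side before a request is sent. The following is a minimal sketch, not the Mnemix client: `GateRequest` and `parse_gate_request` are hypothetical names, and the only constraints enforced are the ones visible in the payload (three fields, `mode` restricted to `standard` or `high_stakes`).

```python
from dataclasses import dataclass
from typing import Any

# The two modes named in the gate contract.
VALID_MODES = ("standard", "high_stakes")
REQUIRED_FIELDS = ("proposed_action", "policy_bundle_id", "mode")


@dataclass(frozen=True)
class GateRequest:
    """Hypothetical caller-side mirror of the documented input payload."""
    proposed_action: dict[str, Any]
    policy_bundle_id: str
    mode: str


def parse_gate_request(payload: dict[str, Any]) -> GateRequest:
    """Validate a raw payload against the three documented input fields."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if not isinstance(payload["proposed_action"], dict):
        raise ValueError("proposed_action must be an object")
    if payload["mode"] not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {payload['mode']!r}")
    return GateRequest(
        proposed_action=payload["proposed_action"],
        policy_bundle_id=payload["policy_bundle_id"],
        mode=payload["mode"],
    )
```

Rejecting an unknown `mode` on the caller side keeps a typo from silently downgrading a high-stakes call to advisory behavior.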
Outputs
{
  "verdict": "denied",
  "rule_results": [{ "rule": "locked_fact_honored", "passed": false }],
  "evidence_refs": ["bank-stmt-2025-08"],
  "rerun_hash": "sha256:…"
}
| Verdict | Meaning |
| :--- | :--- |
| allowed | All rules passed. The action proceeds. |
| denied | A rule failed. Blocked and logged. |
| needs_human | Inconclusive (e.g. an unresolved reference or in-band grader confidence). Queued for review. |
(Two vocabularies exist by design, as locked in the gate-engine decision record. The external verdict is allowed / denied / needs_human. The Stage 2 grader's internal bands are PASS / CHATTER / REJECT: PASS maps to allowed, REJECT maps to denied, and CHATTER triggers one backtrack retry and, if still unresolved, maps to needs_human. Raw grader bands never cross the isolation boundary.)
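The verdict table and the band mapping can be read as two small pure functions. This is a sketch under stated assumptions rather than the gate's actual code: `aggregate_verdict` and `to_external_verdict` are hypothetical names, an inconclusive rule is represented here as `passed = None` (the documented payload only shows booleans), and a failed rule is assumed to take precedence over an inconclusive one.

```python
from typing import Optional, TypedDict


class RuleResult(TypedDict):
    rule: str
    passed: Optional[bool]  # None marks an inconclusive rule (assumption)


def aggregate_verdict(rule_results: list[RuleResult]) -> str:
    """Collapse per-rule results into one external verdict."""
    # Any failed rule blocks the action (assumed to outrank inconclusive).
    if any(r["passed"] is False for r in rule_results):
        return "denied"
    # No failures, but at least one rule could not be resolved.
    if any(r["passed"] is None for r in rule_results):
        return "needs_human"
    return "allowed"


# Final grader band (after any backtrack retry) to external verdict.
_BAND_TO_VERDICT = {"PASS": "allowed", "REJECT": "denied", "CHATTER": "needs_human"}


def to_external_verdict(final_band: str) -> str:
    """Translate a grader band so raw bands never leave the gate."""
    try:
        return _BAND_TO_VERDICT[final_band]
    except KeyError:
        raise ValueError(f"unknown grader band: {final_band!r}") from None
```

Keeping the translation in one table means callers only ever see the three external verdicts.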
Mode toggle — bifurcated execution
Every gate call declares a mode. The caller declares it, not the gate — the gate doesn't know whether an action is high-stakes; the caller does.
| Mode | Behavior | Used for |
| :--- | :--- | :--- |
| standard | Advisory. Logs the verdict, returns 200 regardless. | Chat workflows, observation pipelines — anywhere blocking would hurt UX more than letting output through. |
| high_stakes | Hard barrier. denied returns 422 to the caller. | Finance reconciliation, billing changes, anything irreversible. |
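The mode table reduces to a small status-code decision. The sketch below uses a hypothetical helper, `http_status`, and encodes only what the table states: `standard` always returns 200, and `high_stakes` returns 422 on `denied`. The status for `needs_human` in `high_stakes` mode is not specified above, so returning 200 there is an assumption.

```python
def http_status(mode: str, verdict: str) -> int:
    """Map a gate verdict to the HTTP status the caller sees."""
    if mode not in ("standard", "high_stakes"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Hard barrier: only a denial in high_stakes mode surfaces as an error.
    if mode == "high_stakes" and verdict == "denied":
        return 422
    # Advisory mode logs the verdict and returns 200 regardless.
    # Assumption: non-denied high_stakes verdicts also return 200.
    return 200
```

A caller on a high-stakes path would treat 422 as "do not execute" and read the verdict body for the failing rules.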
The grader (qualitative rules)
For any rule of type qualitative, the gate calls the grader — a hysteretic state machine, not a simple confidence threshold:
Autonomous ──(confidence < 0.70)──▶ Governed
Governed ──(confidence ≥ 0.85)──▶ Autonomous
The asymmetric gap (0.70 collapse / 0.85 recovery) prevents the state from chattering at the decision boundary. When confidence lands in the band (0.70 ≤ C < 0.85), the grader appends the prior output plus a natural-language description of the violation, retries once, and if still in-band returns needs_human. Max one backtrack per call — no recursive loops.
Grader scores. Gate decides. The grader never ships a decision; it ships a band the gate consumes.
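The hysteresis and the single backtrack can be sketched as follows. This is an illustration, not the grader's implementation: the function names are hypothetical, `score` stands in for the isolated LLM grader, and mapping confidence ≥ 0.85 to PASS and confidence < 0.70 to REJECT is inferred from the thresholds above rather than stated by the spec.

```python
from typing import Callable

COLLAPSE = 0.70  # Autonomous drops to Governed below this
RECOVER = 0.85   # Governed returns to Autonomous at or above this


def next_state(state: str, confidence: float) -> str:
    """Hysteretic transition: the two thresholds differ, so the state
    does not flip back and forth for confidences between them."""
    if state == "autonomous" and confidence < COLLAPSE:
        return "governed"
    if state == "governed" and confidence >= RECOVER:
        return "autonomous"
    return state


def classify(confidence: float) -> str:
    """Band a confidence score (PASS/REJECT cut-offs are assumptions)."""
    if confidence >= RECOVER:
        return "PASS"
    if confidence < COLLAPSE:
        return "REJECT"
    return "CHATTER"  # 0.70 <= confidence < 0.85


def grade_with_backtrack(score: Callable[[str], float], output: str, violation: str) -> str:
    """Score once; if in-band, retry exactly once with the prior output
    plus a description of the violation. Never loops further."""
    band = classify(score(output))
    if band != "CHATTER":
        return band
    retry_input = f"{output}\n\nViolation: {violation}"
    return classify(score(retry_input))
```

A returned `CHATTER` after the retry is what the gate would turn into needs_human; the grader itself only reports the band.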
Rerunnability — the audit guarantee
Every verdict is designed to be rerunnable by id: reload the original proposed action plus the exact bundle version, re-run the checks, and assert an identical verdict (or return a diff). This is how Mnemix wins enterprise procurement — auditors can re-execute any past decision against the same rules and facts that were active at that millisecond. General-purpose memory layers cannot; they have no policy bundles to replay.
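One way to picture the rerun check is a deterministic hash over the stored inputs plus a re-evaluation that reports a diff on mismatch. Everything here is an assumption made for illustration: the document does not say what `rerun_hash` covers, and `rerun_hash`, `rerun`, `bundle_version`, and the record layout are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable


def rerun_hash(proposed_action: dict[str, Any], bundle_id: str,
               bundle_version: str, verdict: str) -> str:
    """Hash a canonical JSON form so key order cannot change the digest."""
    canonical = json.dumps(
        {
            "proposed_action": proposed_action,
            "bundle_id": bundle_id,
            "bundle_version": bundle_version,
            "verdict": verdict,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def rerun(record: dict[str, Any],
          evaluate: Callable[[dict[str, Any], str, str], str]) -> dict[str, Any]:
    """Re-evaluate a stored decision against the same bundle version and
    report either an identical verdict or a diff."""
    new_verdict = evaluate(
        record["proposed_action"], record["bundle_id"], record["bundle_version"]
    )
    if new_verdict == record["verdict"]:
        return {"identical": True}
    return {
        "identical": False,
        "diff": {"stored": record["verdict"], "rerun": new_verdict},
    }
```

The check only holds if `evaluate` is itself deterministic for a given action and bundle version, which is why the bundle version is pinned in the record.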
Performance contract
These are internal engineering budgets (P99 targets from the locked spec), not published marketing latency claims.
| Path | P99 budget |
| :--- | :--- |
| Stage 1: deterministic evaluator (no LLM) | 50 ms |
| Stage 2 grader, cached (same tenant/bundle/input) | 4 ms |
| Stage 2 grader, uncached (default model: Haiku 4.5) | 80 ms |
| Gate + grader on the voice hot path | inside the voice budget |
Swapping the grader model requires proving ≤ 200 ms P99 against the golden set before the swap ships. If a budget is exceeded, the gate fails open in standard mode and fails closed in high_stakes mode: on a high-stakes path, a slow gate denies the action rather than waving it through.
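The fail-open versus fail-closed rule can be sketched as a wrapper around an evaluation call. This is a simplified illustration with hypothetical names: it measures elapsed time after the call returns rather than cancelling a slow call, and it reads "fails open" as returning allowed.

```python
import time
from typing import Callable


def run_with_budget(
    evaluate: Callable[[], str],
    mode: str,
    budget_ms: float,
    clock: Callable[[], float] = time.perf_counter,
) -> str:
    """Run an evaluation and apply the mode's failure policy on overrun."""
    start = clock()
    verdict = evaluate()
    elapsed_ms = (clock() - start) * 1000.0
    if elapsed_ms <= budget_ms:
        return verdict
    # Budget exceeded: high_stakes fails closed, standard fails open.
    return "denied" if mode == "high_stakes" else "allowed"
```

A production gate would more likely enforce the budget with a timeout so that a stalled grader cannot hold the request; the injected `clock` here exists only to make the policy easy to test.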
See also
- Policy Bundles: what the gate evaluates.
- Locked Facts: what invariant rules check against.
- The Feedback Loop: how outcomes flow back.