The Validation Gate

The gate is the most important piece in Mnemix. It's the barrier between an agent's proposed action and the world. Every high-stakes action passes through it, and it returns a verdict that is deterministic, evidence-cited, and rerunnable.

AI proposes. Mnemix governs. The gate is the "governs."

Status: in active development. The gate runs inside the main Mnemix Worker — Stage 1 is a deterministic rule engine, Stage 2 is an isolated LLM grader for qualitative rules. The gate contract is spec-frozen but not yet publicly callable; payloads below show the design shape so you can plan against it.

The flow

        PROPOSED AGENT ACTION  (Bland, Vapi, MCP, a finance agent, anything)
                    │
                    ▼
        ┌─────────────────────────────────────────────┐
        │  GATE  (Stage 1 rules + Stage 2 grader)      │
        │  inputs: proposed_action, policy_bundle_id,  │
        │          mode = standard | high_stakes       │
        │  runs:   invariant / assertion / forbidden   │
        │          rules (deterministic, no LLM)       │
        │  calls:  the GRADER for qualitative rules    │
        │  returns:{ verdict, evidence_refs[],         │
        │            rule_results[], rerun_hash }      │
        └─────────────────────────────────────────────┘
                    │
        ┌───────────┼────────────────┐
        ▼           ▼                ▼
    allowed       denied        needs_human
   (execute)  (block + log)  (queue for review)

Inputs

{
  "proposed_action": { "action": "...", "value": {} },
  "policy_bundle_id": "finance.recon.v0",
  "mode": "high_stakes"
}

Outputs

{
  "verdict": "denied",
  "rule_results": [{ "rule": "locked_fact_honored", "passed": false }],
  "evidence_refs": ["bank-stmt-2025-08"],
  "rerun_hash": "sha256:…"
}

| Verdict | Meaning |
| :--- | :--- |
| allowed | All rules passed. The action proceeds. |
| denied | A rule failed. Blocked and logged. |
| needs_human | Inconclusive (e.g. an unresolved reference or in-band grader confidence). Queued for review. |
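A caller planning against this shape might parse the response into typed values so an unexpected verdict fails loudly. The sketch below is illustrative only: the class and function names (`Verdict`, `RuleResult`, `GateResult`, `parse_gate_result`) are hypothetical and not part of any published Mnemix client, and the field names are taken from the example payload above.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Verdict(str, Enum):
    """The three external verdicts listed in the table above."""
    ALLOWED = "allowed"
    DENIED = "denied"
    NEEDS_HUMAN = "needs_human"


@dataclass(frozen=True)
class RuleResult:
    rule: str
    passed: bool


@dataclass(frozen=True)
class GateResult:
    verdict: Verdict
    rule_results: tuple[RuleResult, ...]
    evidence_refs: tuple[str, ...]
    rerun_hash: str


def parse_gate_result(raw: str) -> GateResult:
    """Parse a gate response body (hypothetical helper).

    Raises ValueError if the verdict is not one of the three documented values.
    """
    data = json.loads(raw)
    return GateResult(
        verdict=Verdict(data["verdict"]),
        rule_results=tuple(
            RuleResult(r["rule"], bool(r["passed"])) for r in data["rule_results"]
        ),
        evidence_refs=tuple(data["evidence_refs"]),
        rerun_hash=data["rerun_hash"],
    )
```

Rejecting unknown verdict strings at the parsing boundary keeps downstream code from silently treating a new or misspelled value as allowed.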

(Two vocabularies exist by design, as locked in the gate-engine decision record. The external verdict is allowed / denied / needs_human; the Stage 2 grader's internal bands are PASS / CHATTER / REJECT. PASS maps to allowed and REJECT maps to denied. CHATTER triggers one backtrack retry and, if still unresolved, maps to needs_human. Raw grader bands never cross the isolation boundary.)
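The band-to-verdict translation can be sketched as a small function. This is a minimal illustration, not the gate's actual code: `grade` is a hypothetical callable standing in for one grader invocation, and `Band` and `verdict_from_grader` are names invented for this example.

```python
from enum import Enum
from typing import Callable


class Band(Enum):
    """Internal grader bands; these never leave the isolation boundary."""
    PASS = "PASS"
    CHATTER = "CHATTER"
    REJECT = "REJECT"


def verdict_from_grader(grade: Callable[[bool], Band]) -> str:
    """Translate grader bands into an external verdict.

    `grade(is_retry)` is a hypothetical callable that runs the grader once.
    It is called at most twice: the first attempt plus one backtrack retry.
    """
    band = grade(False)
    if band is Band.CHATTER:
        band = grade(True)  # the single permitted backtrack retry
    if band is Band.PASS:
        return "allowed"
    if band is Band.REJECT:
        return "denied"
    return "needs_human"  # still CHATTER after the retry
```

Only the returned string reaches the caller, so the internal band names stay on the grader's side of the boundary.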

Mode toggle — bifurcated execution

Every gate call declares a mode. The caller declares it, not the gate — the gate doesn't know whether an action is high-stakes; the caller does.

| Mode | Behavior | Used for |
| :--- | :--- | :--- |
| standard | Advisory. Logs the verdict, returns 200 regardless. | Chat workflows, observation pipelines: anywhere blocking would hurt UX more than letting output through. |
| high_stakes | Hard barrier. denied returns 422 to the caller. | Finance reconciliation, billing changes, anything irreversible. |
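The mode table reduces to a small status-code decision. The sketch below covers only the two cases the table states. How a needs_human verdict is surfaced in high_stakes mode is not specified here, so returning 200 for it is an assumption of this example, and `http_status` is a hypothetical name.

```python
VALID_MODES = ("standard", "high_stakes")


def http_status(mode: str, verdict: str) -> int:
    """Pick the HTTP status for a gate response (illustrative only)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "high_stakes" and verdict == "denied":
        return 422  # hard barrier: the caller sees the denial as an error
    # standard mode is advisory and returns 200 regardless of verdict.
    # ASSUMPTION: high_stakes + allowed/needs_human also returns 200,
    # with the verdict carried in the response body.
    return 200
```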

The grader (qualitative rules)

For any rule of type qualitative, the gate calls the grader — a hysteretic state machine, not a simple confidence threshold:

Autonomous ──(confidence < 0.70)──▶ Governed
Governed   ──(confidence ≥ 0.85)──▶ Autonomous

The asymmetric gap (0.70 collapse / 0.85 recovery) prevents the state from chattering at the decision boundary. When confidence lands in the band (0.70 ≤ C < 0.85), the grader appends the prior output plus a natural-language description of the violation, retries once, and if still in-band returns needs_human. Max one backtrack per call — no recursive loops.

Grader scores. Gate decides. The grader never ships a decision; it ships a band the gate consumes.
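The hysteresis described above can be sketched as a two-state machine with separate collapse and recovery thresholds. This is a minimal illustration using the 0.70 and 0.85 values from the text; `GraderState`, `observe`, and `in_band` are hypothetical names, not the grader's real interface.

```python
from dataclasses import dataclass

COLLAPSE = 0.70  # below this, Autonomous drops to Governed
RECOVER = 0.85   # at or above this, Governed returns to Autonomous


def in_band(confidence: float) -> bool:
    """True when confidence sits in the retry band 0.70 <= C < 0.85."""
    return COLLAPSE <= confidence < RECOVER


@dataclass
class GraderState:
    state: str = "autonomous"

    def observe(self, confidence: float) -> str:
        """Update the state for one confidence reading and return it."""
        if self.state == "autonomous" and confidence < COLLAPSE:
            self.state = "governed"
        elif self.state == "governed" and confidence >= RECOVER:
            self.state = "autonomous"
        # Readings between the thresholds leave the state unchanged,
        # which is what stops it flapping at the boundary.
        return self.state
```

A reading of 0.80 therefore keeps an Autonomous grader Autonomous and a Governed grader Governed; only crossing the far threshold flips the state.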

Rerunnability — the audit guarantee

Every verdict is designed to be rerunnable by id: reload the original proposed action plus the exact bundle version, re-run the checks, and assert an identical verdict (or return a diff). This is how Mnemix wins enterprise procurement — auditors can re-execute any past decision against the same rules and facts that were active at that millisecond. General-purpose memory layers cannot; they have no policy bundles to replay.
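One way to picture a rerun is a replay function that re-evaluates a stored record and reports either a match or a diff. The sketch below rests on assumptions: how `rerun_hash` is actually derived is not documented here, so hashing a canonical JSON encoding of the inputs is only one plausible scheme, and `evaluate`, `replay`, and the record layout are hypothetical.

```python
import hashlib
import json
from typing import Callable, Optional


def rerun_hash(proposed_action: dict, policy_bundle_id: str, mode: str) -> str:
    """ASSUMED scheme: SHA-256 over a canonical JSON encoding of the inputs."""
    canonical = json.dumps(
        {
            "proposed_action": proposed_action,
            "policy_bundle_id": policy_bundle_id,
            "mode": mode,
        },
        sort_keys=True,            # key order must not change the hash
        separators=(",", ":"),     # no whitespace variation
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def replay(
    record: dict, evaluate: Callable[[dict, str, str], str]
) -> Optional[dict]:
    """Re-run a stored decision. Returns None if identical, else a diff.

    `evaluate` is a hypothetical stand-in for running the gate's checks
    against the exact bundle version named in the record.
    """
    verdict = evaluate(
        record["proposed_action"], record["policy_bundle_id"], record["mode"]
    )
    if verdict == record["verdict"]:
        return None
    return {"field": "verdict", "recorded": record["verdict"], "replayed": verdict}
```

Canonical encoding matters because two semantically identical payloads with different key order would otherwise produce different hashes.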

Performance contract

These are internal engineering budgets (P99 targets from the locked spec), not published marketing latency claims.

| Path | P99 budget |
| :--- | :--- |
| Stage 1: deterministic evaluator (no LLM) | 50 ms |
| Stage 2 grader, cached (same tenant/bundle/input) | 4 ms |
| Stage 2 grader, uncached (default model: Haiku 4.5) | 80 ms |
| Gate + grader on the voice hot path | inside the voice budget |

Swapping the grader model requires proving ≤ 200 ms P99 against the golden set before the swap ships. The gate fails open in standard mode if a budget is exceeded, and fails closed in high_stakes mode: in a high-stakes path, a slow gate denies rather than waving an action through.
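The fail-open and fail-closed behaviour can be sketched as a pure function over the measured latency. This is an illustration under stated assumptions: `apply_budget` is a hypothetical name, and treating "fails open" as returning allowed is this example's reading of the text, not a confirmed detail of the gate.

```python
def apply_budget(mode: str, verdict: str, elapsed_ms: float, budget_ms: float) -> str:
    """Return the verdict to act on once the latency budget is considered."""
    if mode not in ("standard", "high_stakes"):
        raise ValueError(f"unknown mode: {mode!r}")
    if elapsed_ms <= budget_ms:
        return verdict  # within budget: the computed verdict stands
    if mode == "high_stakes":
        return "denied"  # fail closed: a slow gate blocks the action
    return "allowed"  # ASSUMPTION: fail open means the action proceeds
```

For example, a high_stakes call that takes 120 ms against an 80 ms budget is denied even if the rules would eventually have passed.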

See also