FAP — Faithful Autonomous Progress

Why a new metric

Every AI tool ships with a number. Most of those numbers are fabricated: a model quality score, a "judgment score," an accuracy percentage measured on a synthetic eval set. They say nothing about what actually happened in your codebase.

FAP is different. It falls straight out of the loop's own state. You can verify every component yourself: count the steps in your plan, check which ones have verified test runs, count how many times the loop paged you. The math is trivial. The fabrication surface is zero.

This replaces every fabricated "judgment score" we previously published. The only number we report is the one the loop actually produced.

The five axes

Distance

plan steps completed ÷ total steps

Answers: how far through the plan did it get?

Fidelity

steps that passed verification ÷ steps done

Answers: how faithfully? (no drift/breakage)

Autonomy

blockers self-resolved ÷ blockers faced

Answers: did it need you?

Faithful Streak

longest unbroken run of verified steps with no ask

Answers: longest hands-off span

FAP (headline)

verified steps reached with zero human help ÷ total steps

Range 0–1. The product's job is to push this upward as the learned push-vs-ask judgment improves.
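The Faithful Streak defined above is the only axis without a simple ratio, so a small sketch may help. It assumes each step can be reduced to two flags, whether it passed verification and whether the loop asked a human during it. The Step class and the faithful_streak function are hypothetical names used for illustration, not part of loop_driver.

```python
from dataclasses import dataclass


@dataclass
class Step:
    verified: bool  # the step passed its verification gate
    asked: bool     # the loop paged a human during this step


def faithful_streak(steps: list[Step]) -> int:
    """Length of the longest consecutive run of verified steps with no ask."""
    best = current = 0
    for step in steps:
        if step.verified and not step.asked:
            current += 1
            best = max(best, current)
        else:
            # A failed verification or an ask breaks the streak.
            current = 0
    return best


steps = [
    Step(True, False), Step(True, False),
    Step(True, True),                      # verified, but needed an ask
    Step(True, False), Step(True, False), Step(True, False),
    Step(False, False),                    # failed verification
]
print(faithful_streak(steps))  # 3
```

Under this reading, an ask resets the streak even when the step itself verifies, which matches "no ask" in the definition.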

What high FAP looks like vs what it doesn't

Scenario	Distance	Fidelity	FAP	Verdict
12/15 steps, all verified, 0 asks	80%	100%	80%	Good — high distance + perfect fidelity
12/15 steps, all verified, 1 ask	80%	100%	73%	One ask cost 1 step of FAP credit
15/15 steps "done", 7 fail verification	100%	53%	53%	High distance, low fidelity — not faithful
15/15, all verified, 0 asks	100%	100%	100%	Dark factory — what we're building toward
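The second and third rows of the table can be reproduced from per-step flags. This is a sketch under one assumption taken from the table's own wording: an ask is attached to a single step and removes only that step from the FAP numerator. The function name fap_from_steps is hypothetical.

```python
def fap_from_steps(total_steps, steps):
    """steps: one (verified, asked) pair for each step marked done."""
    done = len(steps)
    verified = sum(1 for v, _ in steps if v)
    # Only verified steps that needed no human help count toward FAP.
    unaided = sum(1 for v, a in steps if v and not a)
    return {
        "distance": done / total_steps,
        "fidelity": verified / done,
        "fap": unaided / total_steps,
    }


# Row 2: 12 of 15 steps done, all verified, one step needed an ask.
row2 = fap_from_steps(15, [(True, False)] * 11 + [(True, True)])
# Row 3: 15 of 15 steps "done", 7 fail verification, no asks.
row3 = fap_from_steps(15, [(True, False)] * 8 + [(False, False)] * 7)

for name, row in (("row 2", row2), ("row 3", row3)):
    print(name, {k: f"{v:.0%}" for k, v in row.items()})
```

Row 2 comes out at 11/15, which rounds to 73%, and row 3 at 8/15, which rounds to 53% for both fidelity and FAP.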

The real receipt

The receipt below is real. It was produced by running loop_driver receipt after the loop wrote a pytest suite for itself: a real claude -p run, a real test subprocess, and a real verification gate. All 19 tests pass on an independent re-run.

SENTIGENT LOOP RECEIPT — Faithful Autonomous Progress across runs
────────────────────────────────────────────────────────────
loop           FAP   dist  fid   auto  asks  goal
loop_83cc8641  100%  100%  100%  100%  0     Write a pytest suite for loop_driver
────────────────────────────────────────────────────────────
1 loop · 1 completed · mean FAP 100% · paged you 0×

Honest scope

What this receipt proves:

✓ Real cross-session resume: the loop state was persisted atomically and picked up after session end.
✓ Per-step verify gate: a step was only marked done after the real test subprocess passed.
✓ Zero human asks: the loop self-resolved every blocker it encountered.
✓ Independent re-verification: 19 tests pass on a clean re-run after the session ended.

What we haven't proven yet: FAP compounding upward across many diverse runs as the learned push-vs-ask judgment improves. That's the frontier we're actively building. We will publish the data when we have it.

How FAP is computed

The receipt is generated by python -m sentigent.operator.loop_driver receipt. It reads the loop's own state file — the same file that drives resume — and aggregates across all recorded runs. Nothing is inferred from model outputs; every data point is a logged event (step started, step verified, blocker raised, human paged).
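To make "every data point is a logged event" concrete, here is a minimal sketch of tallying such events. The state file format is not shown in this document, so the JSON-lines layout, the event names, and the tally function are all assumptions made for illustration only.

```python
import json
from collections import Counter

# Hypothetical JSON-lines event log; the real state file format may differ.
LOG = """\
{"event": "step_started", "step": 1}
{"event": "step_verified", "step": 1}
{"event": "step_started", "step": 2}
{"event": "blocker_raised", "step": 2}
{"event": "human_paged", "step": 2}
{"event": "step_verified", "step": 2}
{"event": "step_started", "step": 3}
"""


def tally(log_text):
    """Count events by type and find verified steps that never paged a human."""
    counts = Counter()
    verified_steps = set()
    paged_steps = set()
    for line in log_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        counts[record["event"]] += 1
        if record["event"] == "step_verified":
            verified_steps.add(record["step"])
        elif record["event"] == "human_paged":
            paged_steps.add(record["step"])
    return counts, verified_steps - paged_steps


counts, unaided = tally(LOG)
print(dict(counts))
print(sorted(unaided))  # [1]
```

Nothing here reads model output. The counts come only from which events were logged, which is the property the receipt relies on.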

The formulas

distance = steps_done / total_steps

fidelity = steps_verified / steps_done

autonomy = blockers_resolved / blockers_faced

FAP = steps_verified_with_zero_asks / total_steps
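The four ratios above translate directly into code. The sketch below uses the same variable names as the formulas. The LoopCounts container, the ratio helper, and the choice to return None when a denominator is zero (for example, a run that faced no blockers) are assumptions. The actual receipt may report such cases differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopCounts:
    total_steps: int
    steps_done: int
    steps_verified: int
    steps_verified_with_zero_asks: int
    blockers_faced: int
    blockers_resolved: int


def ratio(num: int, den: int) -> Optional[float]:
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def metrics(c: LoopCounts) -> dict:
    return {
        "distance": ratio(c.steps_done, c.total_steps),
        "fidelity": ratio(c.steps_verified, c.steps_done),
        "autonomy": ratio(c.blockers_resolved, c.blockers_faced),
        "FAP": ratio(c.steps_verified_with_zero_asks, c.total_steps),
    }


# 12 of 15 steps done and verified, 11 of them unaided, 2 of 3 blockers self-resolved.
m = metrics(LoopCounts(15, 12, 12, 11, 3, 2))
print(m)

# A run that never started has no defined fidelity or autonomy.
empty = metrics(LoopCounts(15, 0, 0, 0, 0, 0))
print(empty)
```

Returning None instead of 0 or 1 keeps an undefined ratio from being mistaken for a measured one.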

The product goal

The product's job is to push FAP and the faithful streak upward over time. As the loop runs more plans, the CloneResolver's learned push-vs-ask thresholds should improve, which means fewer unnecessary asks, fewer cliff-drives, and longer unbroken verified streaks. FAP measures whether that is actually happening. No synthetic eval set. No held-out benchmark. Just: did the loop do the work, verify it, and not need you?