Loop Engineering

"My job is to write loops. The model is a subroutine; I'm the loop architect."
— Boris Cherny (Anthropic), who reported 100% of his 259 PRs in 30 days were written by Claude Code loops, Dec 2025.

0. The Output Metric

The dark factory's one number: given a VISION doc, how long and how faithfully can Sentigent push the plan before it needs a human? That's FAP (Faithful Autonomous Progress). Everything in the loop architecture serves this metric. It is measured per run and is hard to fabricate, because it is computed directly from the loop's own persisted state.

1. What the field has converged on

The Ralph loop — the seed

Geoffrey Huntley, 2025: `while :; do cat PROMPT.md | claude; done`. The insight that started it: progress accumulates in files + git + tests, NOT in the context window. Each lap gets a fresh context over a re-derived plan. Failures pipe back as a "contextual pressure cooker" that forces the model to fix its own mistakes. Huntley built a whole programming language this way for ~$297.
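A minimal Python sketch of the same idea, assuming the `claude` CLI reads its prompt from stdin as in the shell one-liner above. The `run_agent` hook and the `max_laps` ceiling are additions for illustration (the original loop runs forever); they are not part of Huntley's version.

```python
import subprocess
from pathlib import Path
from typing import Callable


def run_claude(prompt: str) -> int:
    """Pipe the prompt to the `claude` CLI on stdin, like `cat PROMPT.md | claude`."""
    return subprocess.run(["claude"], input=prompt, text=True).returncode


def ralph_loop(
    prompt_path: Path,
    run_agent: Callable[[str], int] = run_claude,  # hypothetical hook, swappable in tests
    max_laps: int = 3,                             # assumption: the original has no ceiling
) -> int:
    """Run up to max_laps laps and return how many laps ran."""
    laps = 0
    while laps < max_laps:
        # Re-read the prompt every lap: nothing is carried over in memory.
        prompt = prompt_path.read_text(encoding="utf-8")
        # Progress is expected to land in files, git, and tests, not in a return value.
        run_agent(prompt)
        laps += 1
    return laps
```

Because each lap starts from the file on disk, editing PROMPT.md between laps changes what the next lap sees without restarting the loop.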

The five-part loop contract

Every production loop has exactly five parts:

Part	Meaning
TRIGGER	Timer (`every 15m`) or event (CI fail, PR comment)
SCOPE	Which repos / files / PRs the loop may touch
ACTION	What the agent does each lap (ideally a named, tested skill)
BUDGET	Max laps, token/$ cap, max sub-agents
STOP	Done-criteria, iteration ceiling, spend limit, no-progress halt
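One way to make the contract explicit is a small immutable record, sketched below. The class names, field names, and the `in_scope` helper are hypothetical and chosen for illustration; they are not taken from the Sentigent codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    max_laps: int
    max_usd: float
    max_sub_agents: int = 0


@dataclass(frozen=True)
class LoopContract:
    trigger: str            # e.g. "every 15m" or "ci_fail"
    scope: tuple[str, ...]  # path prefixes the loop may touch
    action: str             # name of the skill run each lap
    budget: Budget
    stop: tuple[str, ...] = ("done_criteria", "max_laps", "spend_limit", "no_progress")

    def in_scope(self, path: str) -> bool:
        """True if path equals a scope root or lives underneath one."""
        return any(
            path == root or path.startswith(root.rstrip("/") + "/")
            for root in self.scope
        )


contract = LoopContract(
    trigger="every 15m",
    scope=("src/", "tests"),
    action="fix_ci",
    budget=Budget(max_laps=20, max_usd=50.0),
)
```

Freezing the dataclass means a running lap cannot widen its own scope or raise its own budget.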

Open vs closed loops

Open loop: the agent writes until it declares itself done, with no external verification. Suitable for demos only.

Closed loop: runs tests/lint/typecheck each lap; failures feed back into the next lap's prompt → production-grade. "A loop with nothing to push back is the agent agreeing with itself on repeat."
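The feedback step can be sketched as two small functions: one runs the checks, the other folds any failures into the next lap's prompt. The function names, the 2000-character truncation, and the wording of the injected text are assumptions for illustration.

```python
import subprocess
from typing import Sequence


def verify(commands: Sequence[Sequence[str]]) -> list[str]:
    """Run each check command; return one failure report per failing command."""
    failures = []
    for cmd in commands:
        result = subprocess.run(list(cmd), capture_output=True, text=True)
        if result.returncode != 0:
            output = (result.stdout + result.stderr).strip()
            # Keep only the tail of the output so the next prompt stays small.
            failures.append(f"$ {' '.join(cmd)}\n{output[-2000:]}")
    return failures


def next_prompt(base_prompt: str, failures: list[str]) -> str:
    """Return the prompt for the next lap, with failures appended if there are any."""
    if not failures:
        return base_prompt
    report = "\n\n".join(failures)
    return (
        f"{base_prompt}\n\n"
        f"The previous lap failed verification. Fix these first:\n\n{report}"
    )
```

A typical call would pass commands such as `["pytest", "-q"]` or a lint and typecheck command; which checks to run is a project decision, not something this sketch prescribes.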

Durable state across sessions

Context windows are finite; every reset/compaction loses something, and agents that sense low context do a "rushed finish." The fix: state-persistence files that let a new session resume unambiguously:

A progress log (what's done)
Verification records (what passed/failed)
Next actions (the stored next step)

Plus anchor files re-injected every lap: VISION.md (goal + success criteria), CLAUDE.md/AGENTS.md (rules/guardrails), PROMPT.md (the injected tick). Context reset + structured handoff beat compaction for long runs.
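A minimal sketch of such a state file, assuming a single JSON document holding the three parts listed above. The `LoopState` shape and the file layout are hypothetical; the write-to-temp-then-`os.replace` pattern is a standard way to avoid leaving a half-written file after a crash.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class LoopState:
    progress: list[str] = field(default_factory=list)        # what's done
    verifications: list[dict] = field(default_factory=list)  # what passed/failed
    next_step: str = ""                                      # the stored next step


def save_state(state: LoopState, path: Path) -> None:
    """Write to a temp file in the same directory, then atomically swap it in."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(asdict(state), fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # readers see the old file or the new one, never a partial one
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_state(path: Path) -> LoopState:
    """Load the stored state, or start empty if no state file exists yet."""
    if not path.exists():
        return LoopState()
    return LoopState(**json.loads(path.read_text(encoding="utf-8")))
```

A new session calls `load_state`, reads `next_step`, and continues from there without needing anything from the previous context window.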

Three-agent shape (Anthropic)

Planner (spec) → Generator (implements in sprints) → Evaluator (tests like a user via Playwright, grades on hard thresholds), communicating through files. "Sprint contracts" = the generator proposes the work + its own success criteria, evaluator approves, then it builds. Keep it as simple as the model allows.
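The sprint-contract handshake can be sketched as a file both agents read and write. Everything here is an illustrative assumption: the `contract.json` name, its fields, and the `approve` callback standing in for the evaluator's judgment. It is not Anthropic's implementation.

```python
import json
from pathlib import Path
from typing import Callable


def propose(sprint_dir: Path, work: str, criteria: list[str]) -> Path:
    """Generator side: write the proposed work and its own success criteria."""
    path = sprint_dir / "contract.json"
    contract = {"work": work, "criteria": criteria, "approved": False}
    path.write_text(json.dumps(contract, indent=2), encoding="utf-8")
    return path


def review(contract_path: Path, approve: Callable[[str], bool]) -> bool:
    """Evaluator side: approve only if there are criteria and every one is acceptable."""
    contract = json.loads(contract_path.read_text(encoding="utf-8"))
    criteria = contract["criteria"]
    contract["approved"] = bool(criteria) and all(approve(c) for c in criteria)
    contract_path.write_text(json.dumps(contract, indent=2), encoding="utf-8")
    return contract["approved"]


def may_build(contract_path: Path) -> bool:
    """Generator side: building starts only after the evaluator has approved."""
    return json.loads(contract_path.read_text(encoding="utf-8"))["approved"]
```

Keeping the handshake in a file means either agent can be restarted with a fresh context and still see where the sprint stands.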

Cost is the new constraint

Uber capped engineers at $1,500/mo after burning the annual budget in 4 months. Controls that matter: hard max-iterations, no-progress detection (halt if the same error repeats N×), and a pre-set $/token ceiling.
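The three controls fit in one small guard that is updated after every lap, sketched below. `StopGuard` and its field names are hypothetical. Matching errors by exact string is a simplification; a real detector would likely normalize line numbers and timestamps first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopGuard:
    max_laps: int
    max_usd: float
    repeat_limit: int  # N: halt when the same error is seen N laps in a row
    laps: int = 0
    spent_usd: float = 0.0
    _last_error: Optional[str] = None
    _repeats: int = 0

    def record(self, cost_usd: float, error: Optional[str]) -> Optional[str]:
        """Record one finished lap; return a halt reason, or None to keep going."""
        self.laps += 1
        self.spent_usd += cost_usd
        if error is not None and error == self._last_error:
            self._repeats += 1
        else:
            self._repeats = 1 if error is not None else 0
        self._last_error = error

        if self._repeats >= self.repeat_limit:
            return "no_progress"
        if self.spent_usd >= self.max_usd:
            return "budget"
        if self.laps >= self.max_laps:
            return "max_laps"
        return None
```

The no-progress check runs first so that a stuck loop is reported as stuck rather than as merely out of budget.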

2. The gap nobody has solved — Sentigent's wedge

Every loop halts or runs off a cliff at two hard moments:

A blocker — naive loops stop and page a human (kills autonomy), or barrel ahead and do the wrong thing.
No progress — naive loops either spin forever (burns budget) or stop too early.

The decision "push through this myself vs. stop and ask" is a judgment call, and it's the one thing a generic loop can't do well. Sentigent has the parts to learn that decision from your history, and that is what differentiates its loop:

Ralph gives autonomy but no judgment (it just re-runs).
A bare harness gives structure but static rules.
Sentigent = a durable loop harness whose push-vs-ask decision is learned and whose per-lap safety is org-enforced.
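To make "learned" concrete, here is one simple way a push-vs-ask threshold could be derived from past decisions. This is an illustrative sketch only, not Sentigent's actual calibration. It assumes a history of (confidence, push_was_right) pairs, and `learn_push_threshold`, `decide`, and `target_precision` are hypothetical names.

```python
from typing import Sequence


def learn_push_threshold(
    history: Sequence[tuple[float, bool]],
    target_precision: float = 0.9,
) -> float:
    """Lowest confidence cutoff at which past pushes were right often enough.

    history holds (confidence, push_was_right) pairs from earlier blockers.
    Returns 1.01, meaning "never push", when no cutoff meets the target.
    """
    best = 1.01
    for cutoff in sorted({conf for conf, _ in history}, reverse=True):
        kept = [ok for conf, ok in history if conf >= cutoff]
        if sum(kept) / len(kept) >= target_precision:
            best = cutoff
    return best


def decide(confidence: float, threshold: float) -> str:
    """Push through the blocker if confident enough, otherwise stop and ask."""
    return "push" if confidence >= threshold else "ask"


history = [(0.95, True), (0.9, True), (0.8, True), (0.6, False), (0.5, True), (0.3, False)]
threshold = learn_push_threshold(history)
```

With an empty history the threshold stays above 1.0, so a loop with no track record always asks. That conservative default is a design choice of this sketch.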

3. The architecture

        VISION.md (goal + Done-criteria)         org guardrail packs
                  │                                      │  (per-lap safety)
                  ▼                                      ▼
   ┌──────────────── LOOP DRIVER (durable, cross-session) ────────────────┐
   │  state file: progress log · verification records · NEXT STEP          │
   │  (atomic, crash-safe)                                                  │
   │                                                                        │
   │  each lap:                                                             │
   │   1. read next step + anchor files (fresh context — Ralph discipline)  │
   │   2. run a FRESH `claude -p` over just that step                       │
   │   3. CLOSED-LOOP VERIFY (tests/typecheck/lint);                        │
   │      failure pipes into next lap's prompt                              │
   │   4. on blocker → CloneResolver decides push-or-ask                    │
   │      using LEARNED thresholds                                          │
   │   5. STOP checks: DoD satisfied? · no-progress (same fail N×)?         │
   │      · max laps? · budget? · kill?                                     │
   │   6. atomically persist → next step durably queued                     │
   └────────────────────────────────────────────────────────────────────────┘
                  │ every lap logged → real receipt (laps/verifies/resolves/$)
                  ▼
        resume(loop_id) picks up at stored next step after ANY session/crash
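The receipt at the bottom of the diagram can be derived purely from the per-lap log. The sketch below assumes a hypothetical `LapRecord` shape and is not the actual output format of `loop_driver receipt`. It reports the raw counts named in the diagram plus the number of laps completed before the first human hand-off; it does not define a FAP formula.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LapRecord:
    verified: bool          # did closed-loop verification pass on this lap?
    resolved_blocker: bool  # did the resolver push through a blocker on this lap?
    asked_human: bool       # did this lap end by stopping to ask?
    cost_usd: float


def receipt(laps: Sequence[LapRecord]) -> dict:
    """Summarize a run from its lap log alone."""
    autonomous = 0
    for lap in laps:
        if lap.asked_human:
            break
        autonomous += 1
    return {
        "laps": len(laps),
        "verifies_passed": sum(lap.verified for lap in laps),
        "resolves": sum(lap.resolved_blocker for lap in laps),
        "cost_usd": round(sum(lap.cost_usd for lap in laps), 2),
        "laps_before_first_ask": autonomous,
    }
```

Because every number is an aggregate over logged laps, the receipt can be recomputed and audited after the fact from the same state the driver uses to resume.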

Implementation status

Component	File	Status
Fresh-context laps (Ralph)	`operator/loop.py`	✓ exists
Cross-session durable driver	`operator/loop_driver.py`	✓ shipped
Push-vs-ask on blockers	`operator/resolver.py` CloneResolver	✓ exists
Learned push-vs-ask thresholds	`CloneResolver.thresholds_from_calibration`	✓ the wedge
Done-criteria STOP	`operator/goal_dod.py` GoalDoD	✓ exists
Budget / kill STOP	BudgetGovernor / KillSwitch	✓ exists
Org guardrail packs	`guardrails/*.yaml` + `operator/guardrails.py`	✓ shipped
Closed-loop verify gate	`verifier.py`	✓ wired
No-progress detection	loop_driver same-fail-N× check	✓ added
FAP receipt	`loop_driver receipt`	✓ shipped

P1–P5 are feature-complete. Next is real-world hardening: run it with `--execute` on live visions to gather actual FAP numbers, and wire the driver into the MCP `operator_*` tools so the loop is callable directly from Claude Code.

4. Positioning (honest)

Sentigent is a loop harness with learned judgment: it keeps pushing your plan across session boundaries (Ralph's autonomy + durable resume), but it knows — from your decision history — when to push through a blocker vs. stop and ask, and it enforces org guardrails on every lap. Ralph is the engine; Sentigent is the engine that doesn't need babysitting and won't drive off a cliff.