Loop Engineering
The 2026 meta for AI coding: stop prompting the agent; design the system that prompts it. A loop that keeps pushing a plan to completion, lap after lap, surviving every session boundary.
"My job is to write loops. The model is a subroutine; I'm the loop architect."
— Boris Cherny (Anthropic), who reported 100% of his 259 PRs in 30 days were written by Claude Code loops, Dec 2025.
0. The Output Metric
The dark factory's one number: given a VISION doc, how long and how faithfully can Sentigent push the plan before it needs a human? That's FAP: Faithful Autonomous Progress. Everything in the loop architecture serves this metric. It is measured per run and computed directly from the loop's own recorded state rather than self-reported, which makes it hard to fabricate.
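As a rough illustration of a metric that "falls out of the loop's own state", the sketch below summarizes a run from its lap log. The source does not define how FAP is computed, so the `LapRecord` schema and the two numbers reported here (autonomous laps before the first human ask, and the share of those laps that passed verification) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class LapRecord:
    """One lap as the loop might log it (hypothetical schema)."""
    verified: bool      # did this lap pass its verification gate?
    asked_human: bool   # did the loop stop and ask a human on this lap?


def fap_receipt(laps: list[LapRecord]) -> dict:
    """Count laps completed before the first human ask and the share
    of those laps that passed verification."""
    autonomous = []
    for lap in laps:
        if lap.asked_human:
            break
        autonomous.append(lap)
    verified = sum(1 for lap in autonomous if lap.verified)
    share = verified / len(autonomous) if autonomous else 0.0
    return {"autonomous_laps": len(autonomous), "verified_share": share}
```

Because the receipt is derived from records the loop writes anyway, there is no separate number for the agent to report about itself.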
1. What the field has converged on
The Ralph loop — the seed
Geoffrey Huntley, 2025. `while :; do cat PROMPT.md | claude; done`
The insight that started it: progress accumulates in files + git + tests, NOT in the context window.
Each lap gets a fresh context over a re-derived plan. Failures pipe back as a "contextual pressure cooker"
that forces the model to fix its own mistakes. Huntley built a whole programming language this way for ~$297.
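The same idea can be sketched in Python. This is a minimal sketch, not Huntley's script: it assumes a `claude` binary on PATH that reads the prompt from stdin (as in the shell one-liner), and it adds a `max_laps` bound and an injectable `agent` callable, neither of which the original loop has.

```python
import subprocess
from pathlib import Path
from typing import Callable


def run_agent(prompt: str) -> int:
    """Pipe the prompt to the agent CLI on stdin and return its exit code.
    Assumes a `claude` binary on PATH, mirroring `cat PROMPT.md | claude`."""
    return subprocess.run(["claude"], input=prompt, text=True).returncode


def ralph_loop(prompt_file: Path, max_laps: int,
               agent: Callable[[str], int] = run_agent) -> int:
    """Re-read the prompt file and start a fresh agent call each lap.
    Nothing is carried in memory between laps; progress lives on disk."""
    laps = 0
    while laps < max_laps:          # the shell original loops forever
        prompt = prompt_file.read_text()   # re-read every lap
        agent(prompt)                      # fresh context: a new process
        laps += 1
    return laps
```

The loop itself holds no conversation history. Anything the agent wants the next lap to know has to be written to files, git, or test results.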
The five-part loop contract
Every production loop has exactly five parts:
| Part | Meaning |
|---|---|
| TRIGGER | Timer (every 15m) or event (CI fail, PR comment) |
| SCOPE | Which repos / files / PRs the loop may touch |
| ACTION | What the agent does each lap (ideally a named, tested skill) |
| BUDGET | Max laps, token/$ cap, max sub-agents |
| STOP | Done-criteria, iteration ceiling, spend limit, no-progress halt |
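The contract in the table could be written down as plain data so a driver can check it mechanically. The class and field names below are hypothetical, chosen only to mirror the five rows; they are not taken from any existing harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    max_laps: int
    max_usd: float
    max_sub_agents: int


@dataclass(frozen=True)
class LoopContract:
    trigger: str                  # e.g. "timer:15m" or "event:ci_fail"
    scope: tuple[str, ...]        # path prefixes the loop may touch
    action: str                   # name of the skill run each lap
    budget: Budget
    stop_after_same_failure: int  # no-progress halt threshold

    def in_scope(self, path: str) -> bool:
        """True if the path falls under one of the allowed prefixes."""
        return any(path.startswith(prefix) for prefix in self.scope)

    def within_budget(self, laps: int, spent_usd: float) -> bool:
        """True while both the lap cap and the spend cap still hold."""
        return laps < self.budget.max_laps and spent_usd < self.budget.max_usd
```

Making the contract immutable (`frozen=True`) keeps a running loop from loosening its own limits.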
Open vs closed loops
Open loop: the agent writes until it says it is done. Nothing external verifies the result, so this is demo-grade only.
Closed loop: runs tests/lint/typecheck each lap; failures feed back into the next lap's prompt → production-grade. "A loop with nothing to push back is the agent agreeing with itself on repeat."
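A closed loop needs two pieces: something that runs the checks, and something that turns failures into the next lap's prompt. The sketch below shows one way to do that; the function names and the wording appended to the prompt are assumptions, and the check commands are whatever the project already uses.

```python
import subprocess


def run_checks(commands: list[list[str]]) -> list[str]:
    """Run each check command; return one report per failing command."""
    failures = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            output = (result.stdout + result.stderr).strip()
            failures.append(f"$ {' '.join(cmd)}\n{output}")
    return failures


def next_prompt(base_prompt: str, failures: list[str]) -> str:
    """Append the previous lap's failures so the next lap has to address them."""
    if not failures:
        return base_prompt
    header = "\n\nThe previous lap failed these checks. Fix them first:\n"
    return base_prompt + header + "\n\n".join(failures)
```

An open loop is the same code with `run_checks` removed: the prompt never changes in response to what the agent actually produced.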
Durable state across sessions
Context windows are finite; every reset/compaction loses something, and agents that sense low context do a "rushed finish." The fix: state-persistence files that let a new session resume unambiguously:
- A progress log (what's done)
- Verification records (what passed/failed)
- Next actions (the stored next step)
Plus anchor files re-injected every lap: VISION.md (goal + success criteria),
CLAUDE.md/AGENTS.md (rules/guardrails),
PROMPT.md (the injected tick).
Context reset + structured handoff beat compaction for long runs.
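For a new session to resume unambiguously, the state file must never be left half-written. A common way to get that is write-to-temp then rename, sketched below. The JSON layout (`progress`, `verifications`, `next_step`) is an assumption that mirrors the three bullets above, not a documented schema.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file in the same directory, fsync, then rename.
    os.replace is atomic on one filesystem, so a crash leaves either
    the old file or the new one, never a partial write."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)   # do not leave stray temp files behind
        raise


def load_state(path: Path) -> dict:
    """Return the stored state, or an empty starting state on first run."""
    if not path.exists():
        return {"progress": [], "verifications": [], "next_step": None}
    return json.loads(path.read_text())
```

A fresh session then only needs `load_state(...)["next_step"]` plus the anchor files to know where to pick up.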
Three-agent shape (Anthropic)
Planner (spec) → Generator (implements in sprints) → Evaluator (tests like a user via Playwright, grades on hard thresholds), communicating through files. "Sprint contracts" = the generator proposes the work + its own success criteria, evaluator approves, then it builds. Keep it as simple as the model allows.
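The sprint-contract handshake can be pictured as a small file exchange. This is an illustrative sketch only: the JSON layout, the function names, and the numeric scores are assumptions, not Anthropic's format.

```python
import json
from pathlib import Path


def propose_contract(path: Path, work: str, thresholds: dict[str, float]) -> None:
    """Generator side: write the proposed work and its own success criteria."""
    contract = {"work": work, "thresholds": thresholds, "approved": False}
    path.write_text(json.dumps(contract))


def approve_contract(path: Path) -> None:
    """Evaluator side: mark the contract approved so building may start."""
    contract = json.loads(path.read_text())
    contract["approved"] = True
    path.write_text(json.dumps(contract))


def grade(path: Path, scores: dict[str, float]) -> bool:
    """Hard thresholds: every criterion must meet its bar; a missing score fails."""
    contract = json.loads(path.read_text())
    if not contract["approved"]:
        raise RuntimeError("contract was never approved")
    return all(scores.get(name, float("-inf")) >= bar
               for name, bar in contract["thresholds"].items())
```

Because the criteria are fixed in a file before any code is written, the generator cannot quietly redefine success after the fact.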
Cost is the new constraint
Uber capped engineers at $1,500/mo after burning the annual budget in 4 months. Controls that matter: hard max-iterations, no-progress detection (halt if the same error repeats N×), and a pre-set $/token ceiling.
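The three controls named above fit in one small guard that the loop consults after every lap. This is a sketch under assumed names (`StopGuard`, `record`); how cost per lap is measured is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopGuard:
    max_laps: int
    max_usd: float
    same_error_limit: int          # halt when one error repeats this many times
    laps: int = 0
    spent_usd: float = 0.0
    _last_error: Optional[str] = None
    _repeat: int = 0

    def record(self, cost_usd: float, error: Optional[str]) -> Optional[str]:
        """Record one lap; return a halt reason, or None to keep going."""
        self.laps += 1
        self.spent_usd += cost_usd
        if error is not None and error == self._last_error:
            self._repeat += 1                      # same failure again
        else:
            self._repeat = 1 if error is not None else 0
        self._last_error = error
        if self._repeat >= self.same_error_limit:
            return "no-progress"
        if self.spent_usd >= self.max_usd:
            return "budget"
        if self.laps >= self.max_laps:
            return "max-laps"
        return None
```

Comparing raw error strings is the crudest possible no-progress signal; a real loop would likely normalize line numbers and timestamps before comparing.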
2. The gap nobody has solved — Sentigent's wedge
Every loop halts or runs off a cliff at two hard moments:
- A blocker — naive loops stop and page a human (kills autonomy), or barrel ahead and do the wrong thing.
- No progress — naive loops either spin forever (burns budget) or stop too early.
The decision to push through alone or stop and ask is a judgment call, and it is the one thing a generic loop can't do well. Sentigent has the parts to learn that decision from your history, and that is the differentiated loop:
- Ralph gives autonomy but no judgment (it just re-runs).
- A bare harness gives structure but static rules.
- Sentigent = a durable loop harness whose push-vs-ask decision is learned + whose per-lap safety is org-enforced.
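To make "learned push-vs-ask" concrete, here is one simple way a threshold could be calibrated from past decisions. It is a hypothetical sketch, not Sentigent's resolver: the history format (a confidence score plus whether pushing worked out), the target success rate, and both function names are assumptions.

```python
def learn_threshold(history: list[tuple[float, bool]], target: float = 0.9) -> float:
    """history holds (confidence, push_succeeded) pairs from past blockers.
    Return the lowest confidence cutoff at which past pushes at or above it
    succeeded at least `target` of the time; 1.01 (never push) if none does."""
    best = 1.01
    for cutoff in sorted({conf for conf, _ in history}, reverse=True):
        outcomes = [ok for conf, ok in history if conf >= cutoff]
        if sum(outcomes) / len(outcomes) >= target:
            best = cutoff   # a lower qualifying cutoff replaces a higher one
    return best


def decide(confidence: float, threshold: float) -> str:
    """Push through the blocker only when confidence clears the learned bar."""
    return "push" if confidence >= threshold else "ask"
```

With no history the threshold sits above any possible confidence, so the loop always asks. Autonomy is earned as the record of successful pushes grows.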
3. The architecture
    VISION.md (goal + Done-criteria)            org guardrail packs
            │                                       │ (per-lap safety)
            ▼                                       ▼
    ┌──────────────── LOOP DRIVER (durable, cross-session) ────────────────┐
    │  state file: progress log · verification records · NEXT STEP         │
    │              (atomic, crash-safe)                                    │
    │                                                                      │
    │  each lap:                                                           │
    │  1. read next step + anchor files (fresh context, Ralph discipline)  │
    │  2. run a FRESH `claude -p` over just that step                      │
    │  3. CLOSED-LOOP VERIFY (tests/typecheck/lint);                       │
    │     failure pipes into next lap's prompt                             │
    │  4. on blocker → CloneResolver decides push-or-ask                   │
    │     using LEARNED thresholds                                         │
    │  5. STOP checks: DoD satisfied? · no-progress (same fail N×)?        │
    │     · max laps? · budget? · kill?                                    │
    │  6. atomically persist → next step durably queued                    │
    └──────────────────────────────────────────────────────────────────────┘
            │  every lap logged → real receipt (laps/verifies/resolves/$)
            ▼
    resume(loop_id) picks up at stored next step after ANY session/crash
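The lap cycle in the diagram can be condensed into a short driver sketch. It is illustrative only and is not the code in `operator/loop_driver.py`: the `drive` function, its state layout, and the injected `run_step` and `verify` callables are assumptions, and guardrails, blocker resolution, budget, and kill checks are left out to keep it short.

```python
import json
import os
from pathlib import Path
from typing import Callable, Optional


def drive(state_path: Path,
          steps: list[str],
          run_step: Callable[[str, str], None],
          verify: Callable[[], Optional[str]],
          max_laps: int = 20,
          same_fail_limit: int = 3) -> str:
    """Run laps until the plan is done or a STOP check fires.
    If state_path already exists, resume from the stored next step."""
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"next": 0, "laps": 0, "last_fail": None, "repeats": 0}
    while True:
        # STOP checks run before each lap.
        if state["next"] >= len(steps):
            return "done"
        if state["laps"] >= max_laps:
            return "max-laps"
        if state["repeats"] >= same_fail_limit:
            return "no-progress"
        feedback = state["last_fail"] or ""
        run_step(steps[state["next"]], feedback)   # one step, fresh call
        failure = verify()                          # None means checks passed
        state["laps"] += 1
        if failure is None:
            state.update(next=state["next"] + 1, last_fail=None, repeats=0)
        else:
            same = failure == state["last_fail"]
            state["repeats"] = state["repeats"] + 1 if same else 1
            state["last_fail"] = failure            # piped into the next lap
        tmp = state_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        os.replace(tmp, state_path)                 # atomic persist
```

Calling `drive` again with the same `state_path` after a crash or a new session continues from the persisted `next` index, which is the resume behavior the diagram describes.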
Implementation status
| Component | File | Status |
|---|---|---|
| Fresh-context laps (Ralph) | operator/loop.py | ✓ exists |
| Cross-session durable driver | operator/loop_driver.py | ✓ shipped |
| Push-vs-ask on blockers | operator/resolver.py CloneResolver | ✓ exists |
| Learned push-vs-ask thresholds | CloneResolver.thresholds_from_calibration | ✓ the wedge |
| Done-criteria STOP | operator/goal_dod.py GoalDoD | ✓ exists |
| Budget / kill STOP | BudgetGovernor / KillSwitch | ✓ exists |
| Org guardrail packs | guardrails/*.yaml + operator/guardrails.py | ✓ shipped |
| Closed-loop verify gate | verifier.py | ✓ wired |
| No-progress detection | loop_driver same-fail-N× check | ✓ added |
| FAP receipt | loop_driver receipt | ✓ shipped |
P1–P5 are feature-complete. Next: real-world hardening. Run it with `--execute`
on live visions to gather actual FAP, and wire the driver into the MCP `operator_*`
tools so the loop is callable from Claude Code directly.
4. Positioning (honest)
Sentigent is a loop harness with learned judgment: it keeps pushing your plan across session boundaries (Ralph's autonomy + durable resume), but it knows — from your decision history — when to push through a blocker vs. stop and ask, and it enforces org guardrails on every lap. Ralph is the engine; Sentigent is the engine that doesn't need babysitting and won't drive off a cliff.