An open-source loop that drives your plan to done across Claude Code sessions: it resumes from where it stopped, decides whether to keep going or ask you the way you would, and checks its own work before moving on.
uv pip install sentigent
The problem
The session ends mid-task — the next one starts from zero.
It hits a blocker and either pings you about everything, or barrels ahead and breaks things.
It says "done" when it isn't.
How it works
You give it a goal and steps. It runs one step in a fresh context, checks the step actually passed, and saves the next step to disk. The session can die right here — the next one picks up exactly where it left off. Repeat until done.
Progress lives in files, git, and test results — not in a context window that vanishes. Every lap starts fresh from what's saved on disk.
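The resume behavior described above can be sketched in a few lines. This is a minimal illustration, not Sentigent's actual code: the JSON state layout and the names `save_state`, `load_state`, and `run_lap` are assumptions made for the example. The point it shows is that the cursor advances on disk only after a step verifies, so a killed session loses at most the step in flight.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write state via a temp file in the same directory, then rename over the target."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # readers see either the old file or the new one


def load_state(path: Path, steps: list[str]) -> dict:
    """Resume from disk if a state file exists, otherwise start at step 0."""
    if path.exists():
        return json.loads(path.read_text())
    return {"steps": steps, "next": 0}


def run_lap(path: Path, steps: list[str], execute, verify) -> bool:
    """Run one step; advance the saved cursor only if verify passes.

    Returns True while there is more work to do.
    """
    state = load_state(path, steps)
    i = state["next"]
    if i >= len(state["steps"]):
        return False
    step = state["steps"][i]
    execute(step)
    if verify(step):
        state["next"] = i + 1
        save_state(path, state)
    return state["next"] < len(state["steps"])
```

Calling `run_lap` again after a crash reads the saved cursor and continues from the first unverified step; the `steps` argument is only used when no state file exists yet.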
Smarter than re-run-and-hope
Most loops are dumb: they re-run and hope. Sentigent learns your calls from your own history.
Every risky move — bash commands, writes to sensitive files, deploys — gets a verdict: proceed, slow down, get more context, or ask you. Learned from what actually worked, not rules you write.
At a blocker, it answers as you and keeps moving. It only pings you when it's genuinely unsure — not about everything, not nothing.
clone_adopt: teach it a new call
clone_status: see what it knows
Block force-push, gate deploys, protect .env files: your org's lines in the sand are evaluated before learned judgment gets a vote.
A block rule cannot be overridden by any confidence score.
The safety floor has its own test suite.
No silent drift
Every step has a finish line you set — tests pass, a file exists, a command succeeds. Passes means next step. Fails means retry with the error fed back in. Can't tell means it stops. So a long run can't quietly drift off course.
Without a real verify gate, "100% done" just means the model said so. With it, 100% means 19 tests passed on a clean, independent re-run.
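A three-way verify gate of the kind described here might look like the sketch below. It is an illustration under assumptions, not the project's implementation: the `Verdict` enum, the `verify` and `gate` functions, the retry count, and the choice to treat a timeout or an unlaunchable command as "can't tell" are all hypothetical.

```python
import subprocess
from enum import Enum


class Verdict(Enum):
    PASS = "pass"        # finish line reached: move to the next step
    FAIL = "fail"        # retry with the error fed back in
    UNKNOWN = "unknown"  # can't tell: stop rather than guess


def verify(cmd: list[str], timeout: float = 60.0) -> tuple[Verdict, str]:
    """Run the check in a real subprocess and classify the result."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The check itself could not run to completion (assumed to mean "can't tell").
        return Verdict.UNKNOWN, str(exc)
    if proc.returncode == 0:
        return Verdict.PASS, ""
    # Keep only the tail of the output as feedback for the retry.
    return Verdict.FAIL, (proc.stdout + proc.stderr)[-2000:]


def gate(run_step, cmd: list[str], max_retries: int = 2) -> Verdict:
    """Run a step, verify it, and retry on failure with the error as feedback."""
    feedback = ""
    for _ in range(max_retries + 1):
        run_step(feedback)
        verdict, feedback = verify(cmd)
        if verdict is not Verdict.FAIL:
            return verdict
    return Verdict.FAIL
```

For a test-backed step, `cmd` could be something like `[sys.executable, "-m", "pytest", "-q"]`; the caller advances only on `Verdict.PASS`.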
It compounds
Every decision is recorded. Every outcome feeds back. Memory deepens as you widen the scope.
Remembers every decision and outcome. Patterns mined automatically. Nothing leaves your laptop — local SQLite, under 50ms, no network needed. Starts working on session one.
Share what works. Org-wide policies and patterns synced with each org's data walled off at the database level — no cross-tenant leakage.
Anonymized patterns, only if you choose to join. The tables are built; no org has opted in yet. Not live.
Agent Intuition needs nothing but your machine. Organizational Wisdom and Collective Intelligence are opt-in.
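To make the local layer concrete, here is a minimal sketch of recording decisions and outcomes in SQLite using only the standard library. The `episodes` table, its columns, and the helper names are hypothetical; the real schema is not described here.

```python
import sqlite3


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local store; no network is involved."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS episodes (
               id INTEGER PRIMARY KEY,
               action TEXT NOT NULL,
               verdict TEXT NOT NULL,
               outcome TEXT NOT NULL
           )"""
    )
    return conn


def record(conn: sqlite3.Connection, action: str, verdict: str, outcome: str) -> None:
    """Store one decision together with how it turned out."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO episodes (action, verdict, outcome) VALUES (?, ?, ?)",
            (action, verdict, outcome),
        )


def success_rate(conn: sqlite3.Connection, action: str):
    """Share of past episodes for this action that ended 'ok' (None if no history)."""
    row = conn.execute(
        "SELECT AVG(outcome = 'ok') FROM episodes WHERE action = ?", (action,)
    ).fetchone()
    return row[0]
```

A query like `success_rate` is one simple way past outcomes could feed back into the next decision; pattern mining in practice is likely richer than a single average.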
The one honest number
We don't ship a made-up accuracy score. We show how many steps it finished, verified, with zero help from you — we call it FAP. A real run: the loop wrote its own test suite, all steps verified, you weren't needed once.
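Read literally, the number is a share of steps that were finished, verified, and needed no human help. The sketch below computes it that way. The exact formula is not given above, so treat this as one plausible reading; `StepRecord` and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StepRecord:
    finished: bool
    verified: bool
    asked_human: bool


def fap(steps: list[StepRecord]) -> float:
    """Fraction of steps finished and verified without asking a human."""
    if not steps:
        return 0.0
    clean = sum(
        1 for s in steps if s.finished and s.verified and not s.asked_human
    )
    return clean / len(steps)
```

Under this reading, a run where every step verifies and nobody is asked scores 1.0, and a single escalation in a four-step run lowers it to 0.75.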
What each number means
loop_83cc8641 100% 100% 100% 100% 0 Write a pytest suite for loop_driver
What this proves — and what it doesn't yet
What it proves: the loop ran a real claude -p session, wrote a real test file, verified it with a real subprocess, and resumed across a session boundary without losing a step. 19 tests pass on an independent re-run. Zero times did it ask for help.
What we haven't shown yet: that FAP reliably goes up over many different runs as the judgment improves. We need more runs to make that claim. When we have them, we'll publish the numbers. Until then, we only show what the system actually produced.
Get started
# install
uv pip install sentigent
# seed a plan (durable across sessions)
python -m sentigent.operator.loop_driver start \
--goal "Ship feature X with passing tests"
# drive it — fresh claude -p laps, closed-loop verify each step
python -m sentigent.operator.loop_driver drive <loop_id> --execute
# see the honest scoreboard
python -m sentigent.operator.loop_driver receipt
Also callable from inside Claude Code over MCP:
operator_start / operator_status / operator_resume.
Your plans, decisions, and history stay on your machine.
For teams
Three interlocking mechanisms so autonomous loops don't drive off a cliff.
Plain-YAML rules — block · approve · warn — checked on every step before it runs. Write your safety lines once; every loop respects them. Versioned in git.
Your hard-won workflow, written down and kept current automatically. Switch models tomorrow; this stays.
Your judgment lives in a file, not in any model's weights — so it's yours regardless of which model you're running next year.
Org rules fire before any individual judgment call.
A block cannot be talked past by any confidence score.
# org guardrail pack — versioned in git, enforced per lap
rules:
  - id: irreversible-recursive-delete
    match: "rm -rf|rm -fr"
    action: block
    severity: critical
  - id: production-deploy
    match: "kubectl apply|deploy --prod"
    action: approve # human sign-off required
    severity: high
  - id: secrets-write
    match: "\.env|credentials|\.pem"
    action: warn
    severity: high
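The ordering promised above, org rules first and learned judgment second, can be illustrated with a short sketch. It mirrors the three rules from the pack as a Python list to stay dependency-free. The `decide` function, the 0.8 confidence threshold, the mapping of `approve` to "ask", and letting `warn` fall through to learned judgment are all assumptions for the example, not Sentigent's actual evaluation logic.

```python
import re

# The three rules from the guardrail pack, as plain data.
RULES = [
    {"id": "irreversible-recursive-delete", "match": r"rm -rf|rm -fr", "action": "block"},
    {"id": "production-deploy", "match": r"kubectl apply|deploy --prod", "action": "approve"},
    {"id": "secrets-write", "match": r"\.env|credentials|\.pem", "action": "warn"},
]

# When several rules match, the strictest action wins.
STRICTNESS = {"block": 3, "approve": 2, "warn": 1}


def policy_action(command: str):
    """Return the strictest action among matching rules, or None if none match."""
    hits = [r["action"] for r in RULES if re.search(r["match"], command)]
    return max(hits, key=STRICTNESS.get) if hits else None


def decide(command: str, learned_confidence: float) -> str:
    """Policy fires first; confidence is consulted only if policy allows it."""
    action = policy_action(command)
    if action == "block":
        return "block"  # no confidence score can change this
    if action == "approve":
        return "ask"    # human sign-off required
    # Only now does learned judgment get a vote (threshold is hypothetical).
    return "proceed" if learned_confidence >= 0.8 else "ask"
```

Because the block check returns before `learned_confidence` is ever read, a confidence of 1.0 on `rm -rf` still yields "block".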
Candor
| Capability | Evidence / note | Status |
|---|---|---|
| Cross-session resume | Atomic state file survives kills, reboots, session ends. loop_83cc8641 re-resumed independently. | Proven |
| Per-step verify gates | Step not marked done until real test subprocess passes. 19 tests, clean re-run. | Proven |
| Judgment layer + hooks | PreToolUse hook fires in <200ms and proceeds, slows down, or escalates in production. | Proven |
| Local Layer-1 learning | Episodes + procedural rules in local SQLite, patterns mined each run. | Proven |
| Policy wall (block / escalate) | Fires before individual judgment; has its own test suite; cannot be overridden. | Proven |
| Clone live calibration | Logic and precedent retrieval real; calibration across diverse blocker types is still maturing. | Built |
| Org guardrail packs (YAML) | Format shipped and enforced per lap. Opt-in; not yet bundled into default install wizard. | Built |
| Layer-2 org policies (Supabase) | Schema and RLS shipped; requires your Supabase project to activate. | Built |
| FAP compounding across many runs | 1 run at 100%. We need ≥4 diverse runs to show a trend. Insufficient data to claim this yet. | Roadmap |
| Collective Intelligence (cross-org) | Tables shipped, schema defined. No org has opted in. Not live. | Roadmap |
We report only what the system actually produced.
Comparison
| Approach | What you get | What's missing |
|---|---|---|
| Raw while-loop | Autonomy, no judgment. It re-runs and hopes. Tied to one model. | No verify gate. Re-runs blind. No cross-session memory. |
| Bare harness | Structure, but rigid rules. You babysit every hard moment. | Static rules you wrote. No learning. No clone. |
| Sentigent | Resumes across sessions, decides push-or-ask from your history, enforces your guardrails, and shows one honest number. | Still early. FAP proven on one run. Clone calibration maturing. We say so. |
Install once. Give it a goal. Check back when it's done — or when it actually needs you.
Get Sentigent on GitHub