Open source  ·  runs on your machine  ·  MIT

Your AI agent forgets everything when the session ends.
Sentigent keeps going.

An open-source loop that drives your plan to done across Claude Code sessions: it resumes from where it stopped, decides whether to keep going or ask you the way you would, and checks its own work before moving on.

$ uv pip install sentigent

The problem

Coding agents quit at the worst moments.

The session ends mid-task — the next one starts from zero.

It hits a blocker and either pings you about everything, or barrels ahead and breaks things.

It says "done" when it isn't.

How it works

How it keeps going

You give it a goal and steps. It runs one step in a fresh context, checks the step actually passed, and saves the next step to disk. The session can die right here — the next one picks up exactly where it left off. Repeat until done.

VISION.md: goal + done-criteria

Loop driver (durable, cross-session):
  1. Fresh context + anchor files
  2. claude -p over one step
  3. Closed-loop verify gate
  4. Blocker? Clone decides
  5. Atomically persist next step

resume() after any crash or session end
Repeat until done-criteria met

Progress lives in files, git, and test results — not in a context window that vanishes. Every lap starts fresh from what's saved on disk.
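A minimal sketch of the persist-and-resume idea described above, not Sentigent's actual implementation. The state-file layout, the `drive` signature, and the `run_step` callback are all assumptions made for illustration; the only technique shown is "write to a temp file, then rename atomically, then resume from whatever is on disk".

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write state to a temp file, then swap it in with a single rename."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure bytes hit the disk first
        os.replace(tmp_name, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_state(path: Path, steps: list[str]) -> dict:
    """Resume from disk if a state file exists, else start at step 0."""
    if path.exists():
        return json.loads(path.read_text())
    return {"steps": steps, "next_step": 0}


def drive(path: Path, steps: list[str], run_step, max_laps=None) -> dict:
    """Run steps in order, persisting the cursor after each verified step.

    run_step is a hypothetical callback that returns True only when the
    step both ran and passed its check.
    """
    state = load_state(path, steps)
    laps = 0
    while state["next_step"] < len(state["steps"]):
        if max_laps is not None and laps >= max_laps:
            break  # stands in for a session ending mid-plan
        step = state["steps"][state["next_step"]]
        if not run_step(step):
            break  # not verified: leave the cursor where it is
        state["next_step"] += 1
        save_state(path, state)
        laps += 1
    return state
```

Because a reader only ever sees the old file or the new one, a kill between steps loses at most the step in flight; calling `drive` again with the same path picks up at the saved cursor.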

Smarter than re-run-and-hope

It decides like you would

Most loops are dumb: they re-run and hope. Sentigent learns your calls from your own history.

Judgment layer

live

Every risky move — bash commands, writes to sensitive files, deploys — gets a verdict: proceed, slow down, get more context, or ask you. Learned from what actually worked, not rules you write.

  • Runs in a <200ms hook before each action
  • Fails open — never silently blocks when uncertain
  • Five signals: caution · doubt · urgency · confidence · frustration
$ sentigent_evaluate("rm -rf dist/")
proceed — seen 47×, correct 100%
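One way to picture how a "seen 47×, correct 100%" track record could turn into a verdict. This is a sketch under stated assumptions: the `Precedent` type, the thresholds, and the mapping from track record to verdict are all hypothetical, and the real judgment layer also weighs the five signals listed above, which are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Precedent:
    seen: int      # how many times this action was taken before
    correct: int   # how many of those turned out fine


def verdict(action, history, min_seen=5, min_accuracy=0.95):
    """Map an action's track record to one of the four verdicts.

    Thresholds are illustrative assumptions, not Sentigent's values.
    """
    try:
        past = history.get(action)
        if past is None or past.seen == 0:
            return "get more context"   # never seen this before
        if past.seen < min_seen:
            return "slow down"          # too little evidence either way
        if past.correct / past.seen >= min_accuracy:
            return "proceed"
        return "ask you"                # seen often, but often wrong
    except Exception:
        # Fail open: a broken lookup must not silently block the action.
        return "proceed"
```

For example, `verdict("rm -rf dist/", {"rm -rf dist/": Precedent(seen=47, correct=47)})` returns `"proceed"` under these assumed thresholds.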

Your clone

learning

At a blocker, it answers as you and keeps moving. It only pings you when it's genuinely unsure — not about everything, not nothing.

  • Draws on your past decisions and your profile
  • clone_adopt — teach it a new call
  • clone_status — see what it knows
  • Runs under a hard budget; falls back to asking you if unsure
Still maturing: the logic and precedent retrieval work. Getting calibration right across many different kinds of blockers is the hardest and least-finished piece. We say so because that's true.
Org rules fire before any individual call

live

Block force-push, gate deploys, protect .env files — your org's lines in the sand evaluate before learned judgment gets a vote. A block rule cannot be overridden by any confidence score. The safety floor has its own test suite.

No silent drift

It checks its own work

Every step has a finish line you set — tests pass, a file exists, a command succeeds. Passes means next step. Fails means retry with the error fed back in. Can't tell means it stops. So a long run can't quietly drift off course.

live
Passes
Step marked done → next step queued and saved to disk atomically
Fails
Error piped back into the next attempt's prompt — retry with context, not from scratch
Can't tell
Stops and asks you — never silently marks a step done

Without a real verify gate, "100% done" just means the model said so. With it, 100% means 19 tests passed on a clean, independent re-run.
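The three outcomes above can be sketched as a small function around a subprocess. This is an illustration, not the shipped gate: the function name, the outcome labels, and the choice to treat a timeout or an unlaunchable command as "can't tell" are assumptions.

```python
import subprocess
import sys


def verify(command, timeout_s=60.0):
    """Run a check command and classify the result.

    Returns (outcome, detail) where outcome is one of
    "passes", "fails", or "cant_tell".
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The check itself could not run to completion, so there is
        # no evidence either way: stop rather than guess.
        return "cant_tell", str(exc)
    if result.returncode == 0:
        return "passes", ""
    # Keep the error text so a retry can be prompted with it.
    return "fails", (result.stderr or result.stdout).strip()
```

A caller would queue the next step on `"passes"`, feed `detail` into the retry prompt on `"fails"`, and stop to ask on `"cant_tell"`. For instance, `verify([sys.executable, "-m", "pytest", "-q"])` would gate a step on a test run, assuming pytest is installed in that interpreter.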

It compounds

It gets sharper the more you use it

Every decision is recorded. Every outcome feeds back. Memory deepens as you widen the scope.

Agent Intuition · private

On your machine

live

Remembers every decision and outcome. Patterns mined automatically. Nothing leaves your laptop — local SQLite, under 50ms, no network needed. Starts working on session one.

episodes (SQLite)  ·  procedural_rules  ·  baselines  ·  <50ms latency

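To make "local SQLite, no network needed" concrete, here is a sketch of recording decisions and outcomes with Python's built-in `sqlite3` module. Only the table name `episodes` comes from the text above; the columns and helper functions are hypothetical and the real schema may differ.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a local episode store. Columns are assumed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS episodes (
               id INTEGER PRIMARY KEY,
               action TEXT NOT NULL,
               verdict TEXT NOT NULL,
               outcome_ok INTEGER NOT NULL
           )"""
    )
    return conn


def record(conn, action, verdict, outcome_ok):
    """Append one decision and how it turned out."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO episodes (action, verdict, outcome_ok) "
            "VALUES (?, ?, ?)",
            (action, verdict, int(outcome_ok)),
        )


def track_record(conn, action):
    """Return (times seen, times correct) for an action."""
    return conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(outcome_ok), 0) "
        "FROM episodes WHERE action = ?",
        (action,),
    ).fetchone()
```

Everything here is a file on disk (or in memory for tests), so a lookup is a single indexed-free query with no network round trip.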
Organizational Wisdom · opt-in

Across your team

opt-in

Share what works. Org-wide policies and patterns synced with each org's data walled off at the database level — no cross-tenant leakage.

Honest note: pattern recall is keyword-based today. Semantic search is on the roadmap — it will make this layer much more useful.
Collective Intelligence · opt-in

Across everyone

coming

Anonymized patterns, only if you choose to join. The tables are built; no org has opted in yet. Not live.

Agent Intuition needs nothing but your machine. Organizational Wisdom and Collective Intelligence are opt-in.

The one honest number

One honest number: how far it got on its own

We don't ship a made-up accuracy score. We show the share of steps it finished, verified, with zero help from you. We call it FAP (Faithful Autonomous Progress). A real run: the loop wrote its own test suite, all steps verified, you weren't needed once.

What each number means

Distance: steps done ÷ total (how long it ran)
Fidelity: verified ÷ done (how faithfully, with no drift)
Autonomy: self-resolved ÷ blockers faced (did it need you?)
FAP: verified-with-zero-help ÷ total (the headline, 0–1)
Streak: longest unbroken run of verified steps with no ask

SENTIGENT LOOP RECEIPT — Faithful Autonomous Progress across runs
────────────────────────────────────────────────────────────
loop             FAP  dist   fid  auto  asks  goal
loop_83cc8641   100%  100%  100%  100%     0  Write a pytest suite for loop_driver
────────────────────────────────────────────────────────────
1 loop  ·  1 completed  ·  mean FAP 100%  ·  paged you 0×
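The five definitions above are plain ratios, so they can be sketched in a few lines. The `Step` record and its field names are hypothetical, and so is the convention that a run with no blockers scores full autonomy (which would match the receipt above showing auto 100% with 0 asks, but is an assumption).

```python
from dataclasses import dataclass


@dataclass
class Step:
    done: bool = False
    verified: bool = False
    blocker: bool = False        # did this step hit a blocker?
    self_resolved: bool = False  # blocker handled without a human
    asked: bool = False          # the human was paged


def _ratio(num, den, if_empty):
    return num / den if den else if_empty


def receipt(steps):
    """Compute Distance, Fidelity, Autonomy, FAP, and Streak."""
    total = len(steps)
    done = sum(s.done for s in steps)
    verified = sum(s.done and s.verified for s in steps)
    blockers = sum(s.blocker for s in steps)
    resolved = sum(s.blocker and s.self_resolved for s in steps)
    clean = [s.done and s.verified and not s.asked for s in steps]

    streak = best = 0
    for ok in clean:
        streak = streak + 1 if ok else 0
        best = max(best, streak)

    return {
        "distance": _ratio(done, total, 0.0),
        "fidelity": _ratio(verified, done, 0.0),
        # Assumed convention: no blockers faced counts as 1.0.
        "autonomy": _ratio(resolved, blockers, 1.0),
        "fap": _ratio(sum(clean), total, 0.0),
        "streak": best,
    }
```

Note that FAP divides by total steps, not steps done, so an unfinished plan cannot score 100% even if everything it did finish was verified.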

What this proves — and what it doesn't yet

What it proves: the loop ran a real claude -p session, wrote a real test file, verified it with a real subprocess, and resumed across a session boundary without losing a step. 19 tests pass on an independent re-run. Zero times did it ask for help.

What we haven't shown yet: that FAP reliably goes up over many different runs as the judgment improves. We need more runs to make that claim. When we have them, we'll publish the numbers. Until then, we only show what the system actually produced.

Get started

Install, then drive a plan in three commands

terminal
# install
uv pip install sentigent

# seed a plan (durable across sessions)
python -m sentigent.operator.loop_driver start \
  --goal "Ship feature X with passing tests"

# drive it — fresh claude -p laps, closed-loop verify each step
python -m sentigent.operator.loop_driver drive <loop_id> --execute

# see the honest scoreboard
python -m sentigent.operator.loop_driver receipt

Also callable from inside Claude Code over MCP: operator_start / operator_status / operator_resume. Your plans, decisions, and history stay on your machine.

For teams

Guardrails your whole org can't cross

Three interlocking mechanisms so autonomous loops don't drive off a cliff.

opt-in

Guardrail packs

Plain-YAML rules — block · approve · warn — checked on every step before it runs. Write your safety lines once; every loop respects them. Versioned in git.

- id: rm-rf
  action: block
- id: prod-deploy
  action: approve
opt-in

AGENTS.md

Your hard-won workflow, written down and kept current automatically. Switch models tomorrow; this stays.

Your judgment lives in a file, not in any model's weights — so it's yours regardless of which model you're running next year.

live

Policy wall

Org rules fire before any individual judgment call. A block cannot be talked past by any confidence score.

block force-push
escalate deploy
protect .env files
guardrails/default.yaml
# org guardrail pack — versioned in git, enforced per lap
rules:
  - id: irreversible-recursive-delete
    match: "rm -rf|rm -fr"
    action: block
    severity: critical

  - id: production-deploy
    match: "kubectl apply|deploy --prod"
    action: approve        # human sign-off required
    severity: high

  - id: secrets-write
    match: '\.env|credentials|\.pem'   # single quotes: "\." is not a valid escape in a double-quoted YAML string
    action: warn
    severity: high
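A sketch of how a pack like the one above might be evaluated ahead of learned judgment. It assumes each `match` value is a regular expression searched anywhere in the command, and that the strictest matching action wins; neither detail is confirmed by the text. The rules are inlined as Python dicts to keep the example dependency-free, and `check` and `decide` are hypothetical names.

```python
import re

# Mirrors the YAML pack above; inlined to avoid a YAML dependency.
RULES = [
    {"id": "irreversible-recursive-delete",
     "match": r"rm -rf|rm -fr", "action": "block"},
    {"id": "production-deploy",
     "match": r"kubectl apply|deploy --prod", "action": "approve"},
    {"id": "secrets-write",
     "match": r"\.env|credentials|\.pem", "action": "warn"},
]

# Assumed ordering: when several rules match, the strictest wins.
STRICTNESS = {"warn": 1, "approve": 2, "block": 3}


def check(command, rules=RULES):
    """Return (action, rule_id) for the strictest matching rule."""
    hits = [r for r in rules if re.search(r["match"], command)]
    if not hits:
        return "allow", None
    worst = max(hits, key=lambda r: STRICTNESS[r["action"]])
    return worst["action"], worst["id"]


def decide(command, learned_verdict, confidence):
    """Apply org rules first; learned judgment only votes afterwards."""
    action, _rule_id = check(command)
    if action == "block":
        return "block"       # confidence is deliberately never consulted
    if action == "approve":
        return "ask you"     # human sign-off required
    return learned_verdict   # "allow" and "warn" defer to judgment
```

The point of the shape is that `confidence` has no code path that can reach past a block: `decide("rm -rf build/", "proceed", 1.0)` still returns `"block"`.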

Candor

What works today (and what we're still proving)

Capability | Evidence / note | Status
Cross-session resume | Atomic state file survives kills, reboots, session ends. loop_83cc8641 re-resumed independently. | Proven
Per-step verify gates | Step not marked done until real test subprocess passes. 19 tests, clean re-run. | Proven
Judgment layer + hooks | PreToolUse hook fires in <200ms; proceeds, slows down, or escalates in production. | Proven
Local Layer-1 learning | Episodes + procedural rules in local SQLite, patterns mined each run. | Proven
Policy wall (block / escalate) | Fires before individual judgment; has its own test suite; cannot be overridden. | Proven
Clone live calibration | Logic and precedent retrieval are real; calibration across diverse blocker types is still maturing. | Built
Org guardrail packs (YAML) | Format shipped and enforced per lap. Opt-in; not yet bundled into default install wizard. | Built
Layer-2 org policies (Supabase) | Schema and RLS shipped; requires your Supabase project to activate. | Built
FAP compounding across many runs | 1 run at 100%. We need ≥4 diverse runs to show a trend. Insufficient data to claim this yet. | Roadmap
Collective Intelligence (cross-org) | Tables shipped, schema defined. No org has opted in. Not live. | Roadmap

We report only what the system actually produced.

Comparison

Why not just run an agent in a while-loop?

Approach | What you get | What's missing
Raw while-loop | Autonomy, no judgment. It re-runs and hopes. Tied to one model. | No verify gate. Re-runs blind. No cross-session memory.
Bare harness | Structure, but rigid rules. You babysit every hard moment. | Static rules you wrote. No learning. No clone.
Sentigent | Resumes across sessions, decides push-or-ask from your history, enforces your guardrails, and shows one honest number. | Still early. FAP proven on one run. Clone calibration maturing. We say so.

Stop babysitting your agent.

Install once. Give it a goal. Check back when it's done — or when it actually needs you.

Get Sentigent on GitHub