Autonomous Agents · AI & Automation

Goals in. Work out.
Agents that finish.

An autonomous agent takes a goal, decides what to do, calls real tools, reads what came back, and keeps going until the job is done — not a chatbot that answers, a worker that acts. Built on the Claude tool-use loop, gated on anything irreversible, and shipped to production.

Start a project → See it run →

/01Drag the complexity · watch the loop work harder

Live · agent loop, running

Success 94%

⌖ "Triage this ticket, refund if eligible, reply."

Tools available0

Steps taken0

Success rate0%

Cost / run$0

Task complexity · steps allowed 4 steps

Tasks finished autonomously94%

Manual handling cut−68%

Tool calls / run11×

Median time-to-done38s

02 — Outcomes

Agents that did the work.

A ledger of named agents where the line that moved was a task finished end-to-end — triaged, resolved, shipped — not a draft a human still had to do. 6 of 30 shown · ledger updates as agents scale.

Northwind Ops

Ops agent · Tickets + refunds

Reads the ticket, checks eligibility against policy, issues the refund through the billing API behind an approval gate, then closes the loop with a reply

−68%Manual handling

Cobalt Research

Research agent · Market briefs

Plans a search tree, pulls sources, cross-checks claims, and writes a cited brief — running the tool-use loop across web, database, and code until the question is actually answered

9.4×Briefs / analyst-day

Vera Triage

Triage agent · Inbound queue

Classifies, routes, and resolves the simple cases itself; escalates the rest with a structured summary — pauses at the approval gate before anything that touches a patient record

−74%Time-to-first-action

Lumen DevEx

Coding agent · Bug fixes

Reproduces the issue, edits the code, runs the test suite as a tool, and opens a PR — looping on failures until the build is green, with a human gate before merge

+58%Issues auto-resolved

Drift Finance

Reconciliation agent · Ledgers

Matches transactions across systems, flags the breaks, drafts the journal entries, and waits for sign-off — every side-effectful write gated and logged for audit

−81%Reconciliation hours

Forge Sales

SDR agent · Enrich + reply

Enriches the lead, drafts a grounded reply from the CRM, books the meeting on the calendar tool, and updates the record — running unattended within a clear permission boundary

3.1×Replies / rep-hour

03 — The loop, live

It doesn't just answer.
It acts, then checks.

Give it a goal and watch the agent run: think about what's needed, call a tool, read the result, decide the next move — looping until it's done. That's the Claude tool-use loop: the model returns tool_use, your code runs the tool, feeds the result back, and the model continues. Switch on the approval gate and it pauses before anything irreversible.

Scratchpad · Goal: triage & resolve ticket

Think → Act → Observe

⌖ Triage this ticket, refund if the account is eligible, then reply.

Max steps · loop budget6

Run status · Claude tool-use loop

Press run — the agent will think, call tools, and loop until the goal is done.

Steps—

Tools used—

Status—

The model decides which tool to call and when it's done — your code just runs the tools and feeds results back. Gate the irreversible ones (refunds, emails, writes) so a human approves before the agent acts.

04 — Anatomy of a run

A run, traced,
not a black box.

Every autonomous run leaves a trace: a plan, the tool calls it made, what it observed, where it reflected, and how it finished. This is the room we work in — each stage instrumented, each tool chosen for a reason, every side-effect logged.

Agent run · Northwind Ops · Ticket #4821

Steps 7 · Tools 4 · Status done

StageToolWhat happensSignal

PlanClaude Opus 4.8Read the goal, sketch a short plan, decide the first action to takeplanned

Tool callstool_use · MCPModel returns a tool_use block; runtime executes the tool, feeds the result backtool_use

Observationstool_resultTool output returns to the model as context for the next decisionobserved

ReflectSelf-check · scratchpadDecide: goal met, try another tool, or escalate — loop or finishloop

GateHuman-in-the-loopPause before irreversible actions (refund, email, write) for approvalgated

ActBilling API · idempotentExecute the approved side-effect once, with an idempotency key+1 write

Finalstop_reason · end_turnGoal met — agent returns the result and stops the loop cleanlydone

TraceSentry · OpenTelemetryEvery step, tool, and token logged for replay, audit, and debugginglogged

green ran & logged

live the stage running in the demo above

amber gated · waiting on human approval

05 — Ship to production

Scope the goal.

Before a tool is wired, we pin down what "done" means, where the agent is allowed to act, and what must never happen without a human. An autonomous run is only as safe as the boundary you draw around it.

/ Week 00 · Scope & guardrails

GoalWhat "done" looks like, in measurable terms — not a vibe, a verdict

PermissionsWhich systems the agent may read, write, and act on — scoped tight

GatesThe irreversible actions that always pause for human approval

BudgetMax steps, max spend, and a hard stop so it can never run away

Wire the tools.

An agent is its tools. We define each one — name, schema, what it does — so the model can call it cleanly, and expose them over MCP so the same tools work across runtimes. Garbage tools, garbage actions; this is where most agents quietly fail.

/ Week 01 · Tools & MCP

DefineTyped tool schemas · clear names · tight inputs

ExposeMCP servers · reusable across agents and runtimes

Side-effectsIdempotency keys · retries · every write logged

PermissionsPer-tool scopes · least privilege by default

SandboxCode & browser tools run isolated, never on prod

Close the loop.

The model returns a tool_use block, your runtime executes the tool, the result goes back as context, and the model decides again. Think → Act → Observe, repeating until stop_reason says done — with a hard step budget so it always terminates.

/ Week 02 · Loop & control

Tool-use loop · run until stop_reason end_turn

Max-steps budget · hard stop, no runaway

tool_result fed back as model context

Parallel tool calls where it's safe to fan out

Managed Agents for a hosted, durable loop

Gate the irreversible.

Reading is free; acting is not. Refunds, emails, deletes, writes — anything you can't undo pauses for human approval, or runs only inside a tight permission boundary. The agent proposes; a person (or a policy) disposes.

/ Week 03 · Human-in-the-loop

Evaluate honestly.

Run a frozen set of real tasks every change and score it — did it finish, was it correct, did it stay in bounds, what did it cost. No "looked great in the demo." A success rate that moves, or the change doesn't ship.

/ Week 04 · Evaluate

Task success94% — goals finished without a human stepping in

In-bounds100% — never acted outside its permission scope

Avg steps5.8 — efficient loops, no aimless wandering

Escalation rate6% — handed off cleanly when unsure

Ship & watch.

Live with durable execution for long runs, tracing on every step, and alerts when an agent loops too long or drifts out of bounds. We watch success as the world changes underneath it, and keep the agent finishing the job as your systems evolve.

/ Ongoing · Ship & watch

Durable runs · Temporal

Step-level tracing

Runaway-loop alerts

Permission boundaries

Approval gates on writes

Idempotent side-effects

Weekly task eval

Human-in-the-loop review

Goals in. Work out.
Agents that finish.

Agents that did the work.

It doesn't just answer.
It acts, then checks.

A run, traced,
not a black box.

Scope the goal.

Wire the tools.

Close the loop.

Gate the irreversible.

Evaluate honestly.

Ship & watch.

A tended agent gets sharper.

The kit, shown.

Stop drafting.
Start finishing. Goals in.

Goals in. Work out. Agents that finish.

Agents that did the work.

It doesn't just answer.It acts, then checks.

A run, traced,not a black box.

Scope the goal.

Wire the tools.

Close the loop.

Gate the irreversible.

Evaluate honestly.

Ship & watch.

A tended agent gets sharper.

The kit, shown.

Stop drafting.Start finishing. Goals in.

Goals in. Work out.
Agents that finish.

It doesn't just answer.
It acts, then checks.

A run, traced,
not a black box.

Stop drafting.
Start finishing. Goals in.