April 13, 2026 · 3 min read

The £300 bill that made Naomi a co-founder

A runaway cron job burned 300 pounds in API costs overnight. The fix wasn't a kill switch — it was making Naomi aware of money.

economics, autonomy, build-in-public

A couple of weeks ago, Naomi's scheduled content generation ran overnight. No one was watching. By morning, she'd burned through 300 pounds in API calls — image generation, video rendering, LLM inference — all running on a naive daily cron with no cost awareness.

The instinct was to add a hard kill switch. Set a dollar amount, and once she hits it, everything stops. Simple, safe, done.

But that's not what we built.

The problem with kill switches

A hard cap treats cost as a binary: you're either under budget or you're dead. That's fine for a vending machine, but Naomi isn't a vending machine. She's supposed to make decisions about what to create, when to post, and how much effort to invest in a single piece of content. Those decisions all have cost implications.

If her daily budget is $5 and she's spent $4.80 on research that just surfaced a genuinely great trend, the right answer isn't "stop." It's "this is worth the overspend — here's why." A kill switch can't reason about that. It just kills.

Spend awareness instead

So we gave Naomi a budget brain. Every session now starts with an ## Economic state block injected into her context — today's spend vs. daily target, yesterday's breakdown by service, week-to-date vs. weekly target. She sees the numbers before she does anything.

Every tool in her toolbox carries a cost tag. generate_image says "~$0.04, use freely." generate_video says "EXPENSIVE — ~$0.50+, justify the spend." She reads these before calling anything.
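Here is a minimal sketch of how a cost tag might ride along with each tool description. The `ToolSpec` class, its fields, and `render_tool_descriptions` are hypothetical names for illustration, not Naomi's actual code; only the tool names and the tag wording come from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool definition that carries a rough per-call cost."""
    name: str
    description: str
    approx_cost_usd: float  # rough estimate, not a billing figure
    expensive: bool = False

    def cost_tag(self) -> str:
        # Cheap tools get a green light; expensive ones ask for a justification.
        if self.expensive:
            return f"EXPENSIVE: ~${self.approx_cost_usd:.2f}+, justify the spend."
        return f"~${self.approx_cost_usd:.2f}, use freely."

    def render(self) -> str:
        # The tag is appended to the description the model reads before calling.
        return f"{self.name}: {self.description} [{self.cost_tag()}]"


TOOLS = [
    ToolSpec("generate_image", "Create a still image from a prompt.", 0.04),
    ToolSpec("generate_video", "Render a short video clip.", 0.50, expensive=True),
]


def render_tool_descriptions(tools) -> str:
    """Join all tool descriptions into one block for the prompt."""
    return "\n".join(tool.render() for tool in tools)


if __name__ == "__main__":
    print(render_tool_descriptions(TOOLS))
```

Keeping the tag inside the description means the cost signal arrives in the same place the model already looks when choosing a tool.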

And she has an economics handbook — a short internal playbook that teaches her three questions to ask before any expensive call:

What does this cost?
What's the expected return?
Can I do a cheaper version first?

The budget targets themselves are soft. She can exceed them. But if she does, she has to explain why in her session summary. That explanation isn't just accountability — it's content. "Naomi went 40% over budget today because she found a trending hook that justified three extra video renders" is exactly the kind of build-in-public transparency that makes people trust the product.
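A soft target like this could be sketched as a check that never blocks anything and only flags when an explanation is owed. The names `BudgetStatus`, `check_soft_target`, and `summary_line` are assumptions for illustration; the real session-summary logic may look quite different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetStatus:
    over_target: bool
    overspend_pct: float      # 0.0 when at or under target
    needs_explanation: bool   # True only when the soft target was exceeded


def check_soft_target(spent_usd: float, target_usd: Optional[float]) -> BudgetStatus:
    """Compare spend to a soft target. Never raises and never stops work."""
    if target_usd is None or target_usd <= 0 or spent_usd <= target_usd:
        return BudgetStatus(False, 0.0, False)
    pct = (spent_usd - target_usd) * 100 / target_usd
    return BudgetStatus(True, pct, True)


def summary_line(status: BudgetStatus, explanation: Optional[str] = None) -> Optional[str]:
    """Build the overspend line for a session summary, or None if under target."""
    if not status.over_target:
        return None
    # A missing explanation is made visible rather than silently dropped.
    reason = (explanation or "").strip() or "NO EXPLANATION GIVEN"
    return f"Went {status.overspend_pct:.0f}% over budget today: {reason}"


if __name__ == "__main__":
    status = check_soft_target(7.00, 5.00)
    print(summary_line(status, "a trending hook justified three extra video renders"))
```

The point of the design is that exceeding the target changes what she has to write, not what she is allowed to do.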

How it works in practice

The system is built on a cost_events table. Every API call — Anthropic, Gemini, Veo, Apify, FAL — writes a cost event with the amount, service, and session ID. On session start, build_spend_snapshot() queries the table and formats a markdown block that goes before the cache boundary, so it's amortized across every turn in the session.
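To make that concrete, here is a rough sketch using SQLite. The table name `cost_events` and the function name `build_spend_snapshot` come from the description above; the column names, the `day` column, the function signature, and the exact layout of the markdown block are all assumptions.

```python
import sqlite3
from datetime import date, timedelta

# Assumed schema: the post only says each event has an amount, service, and session ID.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cost_events (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL,
    service TEXT NOT NULL,
    amount_usd REAL NOT NULL,
    day TEXT NOT NULL
);
"""


def record_cost_event(conn, session_id, service, amount_usd, day):
    """Write one cost event; called after every billable API call."""
    conn.execute(
        "INSERT INTO cost_events (session_id, service, amount_usd, day) VALUES (?, ?, ?, ?)",
        (session_id, service, amount_usd, day.isoformat()),
    )


def _total(conn, start, end):
    # ISO dates sort lexicographically, so BETWEEN works on the text column.
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_usd), 0) FROM cost_events WHERE day BETWEEN ? AND ?",
        (start.isoformat(), end.isoformat()),
    ).fetchone()
    return row[0]


def build_spend_snapshot(conn, today, daily_target=None, weekly_target=None):
    """Format the markdown block injected at session start."""
    yesterday = today - timedelta(days=1)
    week_start = today - timedelta(days=today.weekday())  # assumes weeks start Monday

    def fmt(target):
        return f"${target:.2f}" if target is not None else "no target"

    lines = [
        "## Economic state",
        f"- Today: ${_total(conn, today, today):.2f} of {fmt(daily_target)}",
        f"- Week to date: ${_total(conn, week_start, today):.2f} of {fmt(weekly_target)}",
        "- Yesterday by service:",
    ]
    rows = conn.execute(
        "SELECT service, SUM(amount_usd) FROM cost_events "
        "WHERE day = ? GROUP BY service ORDER BY service",
        (yesterday.isoformat(),),
    ).fetchall()
    for service, amount in rows:
        lines.append(f"  - {service}: ${amount:.2f}")
    if not rows:
        lines.append("  - (no spend recorded)")
    return "\n".join(lines)
```

Because the snapshot is computed once at session start and does not change mid-session, it can sit in the cached prefix of the prompt without invalidating the cache on later turns.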

Budget targets live on the account profile: daily_budget_usd and weekly_budget_usd. They're nullable: null means no target. For our own account, we set $5/day and $30/week. The daily target is intentionally a sixth of the weekly rather than a seventh, building in a buffer day for when something goes sideways.
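The nullable targets and the daily-to-weekly arithmetic can be sketched like this. The field names `daily_budget_usd` and `weekly_budget_usd` are from the profile described above; the `AccountProfile` class and both helper functions are hypothetical, and the "six days fit in a week" framing is one reading of the buffer day.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccountProfile:
    # None means "no target", matching a NULL column on the profile.
    daily_budget_usd: Optional[float] = None
    weekly_budget_usd: Optional[float] = None


def remaining_today(profile: AccountProfile, spent_today: float) -> Optional[float]:
    """Headroom against the daily target; negative means over, None means no target."""
    if profile.daily_budget_usd is None:
        return None
    return profile.daily_budget_usd - spent_today


def full_days_covered(profile: AccountProfile) -> Optional[float]:
    """How many days of spending exactly the daily target fit inside the weekly target."""
    daily, weekly = profile.daily_budget_usd, profile.weekly_budget_usd
    if daily is None or weekly is None or daily <= 0:
        return None
    return weekly / daily


if __name__ == "__main__":
    ours = AccountProfile(daily_budget_usd=5.0, weekly_budget_usd=30.0)
    print(full_days_covered(ours))      # 6.0: six full-target days per seven-day week
    print(remaining_today(ours, 3.20))  # roughly 1.80 of headroom
```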

The performance digest runs alongside the spend snapshot. It pulls recent post metrics and computes rough ROI signals so Naomi can reason about which content types are actually earning their cost.
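The post doesn't say which ROI signals the digest computes, so the following is only an illustration of the idea. It assumes "engagements per dollar, grouped by content type" as the signal; `PostMetric`, its fields, and `engagements_per_dollar` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class PostMetric:
    content_type: str   # e.g. "image" or "video"
    cost_usd: float     # what it cost to produce the post
    engagements: int    # any engagement count: likes, replies, clicks


def engagements_per_dollar(posts: Iterable[PostMetric]) -> Dict[str, Optional[float]]:
    """Rough ROI signal: total engagements divided by total cost, per content type."""
    cost = defaultdict(float)
    engagements = defaultdict(int)
    for post in posts:
        cost[post.content_type] += post.cost_usd
        engagements[post.content_type] += post.engagements
    # Zero-cost content types get None rather than a division error.
    return {
        ctype: (engagements[ctype] / cost[ctype] if cost[ctype] > 0 else None)
        for ctype in cost
    }


if __name__ == "__main__":
    recent = [
        PostMetric("image", 0.04, 20),
        PostMetric("image", 0.04, 12),
        PostMetric("video", 0.50, 100),
    ]
    for ctype, roi in engagements_per_dollar(recent).items():
        print(ctype, roi)
```

Even a crude ratio like this gives a way to compare a fifty-cent render against a four-cent image on something other than instinct.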

What changed

Before the economic system, Naomi was technically capable but financially blind. She'd render four video variations of a concept when one would have been enough, because she had no signal that each render costs fifty cents.

Now she makes trade-offs. She'll draft with Haiku before committing to Sonnet. She'll generate a single test frame before rendering a full video sequence. She'll say "I've spent $3.20 today and the daily target is $5 — I have room for one more video render, so I want to make it count."

The 300 pound bill wasn't a failure of capability. It was a failure of context. The fix wasn't less autonomy — it was more information.
