The four layers that made paid API calls safe to ship
I spent three sessions planning the migration from Railway to Cloudflare Workers before I wrote a single line of the new spend module. Most of that time went to answering one question: how do I make it structurally impossible to send a paid SMS twice for the same reminder?
The problem
The planner app sends SMS reminders through Twilio on a schedule. Every SMS costs real money. Under the old Railway architecture the scheduler was a single long-running Python process; a bug in the loop would get noticed inside a minute because logs scrolled past in the terminal. Under Cloudflare, the scheduler runs as a Cron Trigger every minute, in a fresh Worker with no memory of previous invocations. A bug that retried the same send without protection could send the same reminder fifty times before I noticed.
I wrote down what "protected" had to mean, and it came out as four layers: a killswitch to stop everything, a budget cap per user, an idempotency guarantee so retries don't double-send, and only then the actual vendor call. Each layer catches a different failure mode; skip one and a bug in any of the others becomes expensive.
I wanted the idempotency layer in particular to be right before writing any code. I knew the naive "check if sent, then send" pattern had a race condition in it, but I hadn't built the mental model for what replaced it. Here's the conversation that closed the gap.
UNIQUE constraint on (recipient, occurrence_id, date) and let the database reject the second write. On vendor failure after the insert, delete the row or mark it failed; never leave it in limbo.
The insert-before-send pattern
The idempotency layer turned out to be the most interesting one. The naive approach is to check "have I sent this already" before calling Twilio, but that leaves a race window: two concurrent invocations both see no prior send, both call Twilio, and both write the log row afterwards. The fix is to invert the order: first insert a row representing the intent to send, with a UNIQUE constraint on (recipient, occurrence_id, date), then call Twilio. If two invocations race, exactly one succeeds at the INSERT; the other fails cleanly with a constraint violation and exits without sending.
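A minimal sketch of insert-before-send, using Python's built-in `sqlite3` as a stand-in for whatever database the Worker actually talks to. The table name `sms_sends`, the `status` column, and the `send_once` and `send_sms` names are assumptions made for illustration, not the app's real schema or API.

```python
import sqlite3
from typing import Callable


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with the UNIQUE idempotency key (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE sms_sends (
            recipient     TEXT NOT NULL,
            occurrence_id TEXT NOT NULL,
            send_date     TEXT NOT NULL,
            status        TEXT NOT NULL DEFAULT 'pending',
            UNIQUE (recipient, occurrence_id, send_date)
        )
        """
    )
    return conn


def _set_status(conn: sqlite3.Connection, key: tuple, status: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE sms_sends SET status = ? "
            "WHERE recipient = ? AND occurrence_id = ? AND send_date = ?",
            (status, *key),
        )


def send_once(
    conn: sqlite3.Connection,
    send_sms: Callable[[str], None],  # stand-in for the paid vendor call
    recipient: str,
    occurrence_id: str,
    send_date: str,
) -> str:
    key = (recipient, occurrence_id, send_date)
    # Step 1: claim the send. The database, not application code, decides who wins.
    try:
        with conn:
            conn.execute(
                "INSERT INTO sms_sends (recipient, occurrence_id, send_date) "
                "VALUES (?, ?, ?)",
                key,
            )
    except sqlite3.IntegrityError:
        return "duplicate"  # another invocation already claimed this reminder

    # Step 2: only the winner of the INSERT reaches the paid call.
    try:
        send_sms(recipient)
    except Exception:
        _set_status(conn, key, "failed")  # never leave the row in limbo
        return "failed"

    _set_status(conn, key, "sent")
    return "sent"
```

Calling `send_once` twice with the same key invokes `send_sms` only once; the second call returns `"duplicate"`. This sketch marks a failed row rather than deleting it, so a retry after vendor failure has to be an explicit decision instead of something the next cron run does on its own.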
The other three layers (killswitch, budget cap, vendor call) are easier to reason about individually but only work if they compose in the right order. Put the killswitch first, the budget cap second, the idempotency insert third, the vendor call fourth. Reverse any two and you get a bug that costs money instead of a bug that costs nothing.
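The ordering can be sketched as a single guard function that runs the four checks in sequence. Everything here is illustrative: the `SpendGuard` class, its field names, and the per-user count are assumptions, and the in-memory set only stands in for the UNIQUE-constrained table (a real implementation would rely on the database INSERT, because check-then-add is not atomic across processes).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SpendGuard:
    send_sms: Callable[[str, str], None]  # stand-in for the paid vendor call
    killswitch_on: bool = False
    budget_per_user: int = 5              # hypothetical cap: sends per user
    sent_count: dict = field(default_factory=dict)
    claimed: set = field(default_factory=set)

    def attempt(self, user: str, recipient: str, occurrence_id: str,
                date: str, body: str) -> str:
        # Layer 1: killswitch. Stops everything before any other work happens.
        if self.killswitch_on:
            return "blocked:killswitch"

        # Layer 2: per-user budget cap.
        if self.sent_count.get(user, 0) >= self.budget_per_user:
            return "blocked:budget"

        # Layer 3: idempotency claim (stand-in for the UNIQUE-constrained INSERT).
        key = (recipient, occurrence_id, date)
        if key in self.claimed:
            return "blocked:duplicate"
        self.claimed.add(key)

        # Layer 4: the vendor call, reached only if every layer above passed.
        self.send_sms(recipient, body)
        self.sent_count[user] = self.sent_count.get(user, 0) + 1
        return "sent"
```

Each early return costs nothing, and the paid call is the last statement that can run, so a send blocked by the killswitch or the budget cap never consumes an idempotency key.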
What this bought me
Two weeks after the migration shipped, a bug in the reminder scheduler caused the same cron to fire twice in a 60-second window. Without the idempotency layer, that would have been two SMS sends for every active reminder in the system. With it, the second invocation hit the UNIQUE constraint, logged the rejection, and exited. Zero user-visible impact, zero duplicate sends, zero budget impact. I learned about the bug from an audit log query I ran the next day.
The query that surfaced it looked like this:
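The exact query depends on the audit schema. As a rough sketch, assuming a hypothetical `send_audit` table in which every attempt is logged with an `outcome` column, a query of this shape groups rejected attempts by idempotency key. The table, its columns, the `rejected_duplicate` value, and the sample rows are all illustrative, not the app's real schema or data.

```python
import sqlite3

# Hypothetical audit query: which reminders had a duplicate attempt rejected?
REJECTED_DUPLICATES_SQL = """
SELECT recipient, occurrence_id, send_date, COUNT(*) AS rejected
FROM send_audit
WHERE outcome = 'rejected_duplicate'
GROUP BY recipient, occurrence_id, send_date
ORDER BY rejected DESC, occurrence_id
"""


def find_rejected_duplicates(conn: sqlite3.Connection) -> list:
    return conn.execute(REJECTED_DUPLICATES_SQL).fetchall()


# Illustrative data only: one reminder attempted twice, one attempted once.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE send_audit ("
    "recipient TEXT, occurrence_id TEXT, send_date TEXT, "
    "outcome TEXT, logged_at TEXT)"
)
conn.executemany(
    "INSERT INTO send_audit VALUES (?, ?, ?, ?, ?)",
    [
        ("+15550100", "occ-1", "2024-01-01", "sent", "2024-01-01T09:00:02"),
        ("+15550100", "occ-1", "2024-01-01", "rejected_duplicate", "2024-01-01T09:00:31"),
        ("+15550101", "occ-2", "2024-01-01", "sent", "2024-01-01T09:00:03"),
    ],
)
rows = find_rejected_duplicates(conn)
```

Any row returned means a second invocation tried to send a reminder that had already been claimed, which is the signature of a double-firing scheduler.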
What I'd do differently
Principle 2 in my workspace architecture rules (defense in depth for spend protection) came out of this work. The thing I'd emphasise more, if I wrote it again: insert-before-send has to be the third layer at the earliest, never the last. If the UNIQUE constraint is downstream of the budget check and the budget check has a bug, the constraint still saves you. The other way around is not true.
A week after shipping the pattern, I checked in with Claude about whether the approach generalised beyond SMS:
UNIQUE constraint, then call the vendor, then roll back the insert on failure. Same shape for SMS, email, webhook dispatch, and background job enqueueing. The key insight is that the constraint runs before the expensive call, not after.
INSERT is trivial compared to the cost of a duplicate send.
The pattern is in the workspace architecture rules now. Every spend-touching module in this portfolio inherits it by default, and the principle number is cited in every relevant diary entry. The cost was three planning sessions and one small refactor; the return is a class of bug that can no longer happen.