Why We Moved 100+ Workflows to Temporal, One Reversible PR at a Time

Most teams migrate a live system one of two ways.
They freeze feature work and move everything at once, betting the company on a big-bang cutover. Or they let the old system and the new one run side by side for months, until nobody is sure which system runs which job anymore.
We did neither. We moved 100+ live workflows from Inngest to Temporal one reversible PR at a time. No migration freeze. No rewrite. No downtime.
Monk's agents move money, so the workflows have to hold
Monk's agents collect cash, apply payments, reconcile invoices, and work the edge cases finance teams dread. Those workflows run for days. They call systems that fail in creative ways, like banks, ERPs, and email. And they cannot lose state halfway through. A retry that double-applies a payment is not a bug ticket. It's a customer's books.
Temporal gives us durable execution. A workflow's state survives crashes, restarts, and deploys. When a step fails, the workflow resumes from where it stopped instead of starting over. The guarantees we used to hand-roll in every workflow, like retries, timeouts, idempotency, and recovery, are now properties of the platform. We don't have to get them right a hundred separate times.
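To make "resumes from where it stopped" concrete, here is a minimal sketch of the idea in plain Python. It is not Temporal's API and not our production code. The in-memory journal, the step names, and the fake bank call are all hypothetical stand-ins for illustration. The point is only that a step whose result is already recorded is skipped on the next attempt, so the payment is applied once.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory journal. A durable-execution platform would persist
# this history so it survives crashes, restarts, and deploys.
Journal = Dict[str, object]


def run_workflow(steps: List[Tuple[str, Callable[[], object]]], journal: Journal) -> None:
    """Run steps in order, skipping any whose result is already journaled."""
    for name, step in steps:
        if name in journal:
            continue  # completed in an earlier attempt, so do not re-execute
        journal[name] = step()  # recorded only after the step succeeds


applied: List[str] = []       # stand-in for a customer's ledger
bank_calls = {"count": 0}     # lets the fake bank fail exactly once


def apply_payment() -> str:
    applied.append("invoice-1")
    return "applied"


def notify_bank() -> str:
    bank_calls["count"] += 1
    if bank_calls["count"] == 1:
        raise ConnectionError("bank timed out")
    return "notified"


steps = [("apply_payment", apply_payment), ("notify_bank", notify_bank)]
journal: Journal = {}

try:
    run_workflow(steps, journal)  # first attempt fails at notify_bank
except ConnectionError:
    pass

run_workflow(steps, journal)      # second attempt resumes at notify_bank
```

After both attempts the ledger holds a single entry for the invoice, because the retry never re-ran the payment step. The sketch glosses over one real gap: a crash after the side effect but before the journal write would still repeat the step, which is why steps that touch external systems should also be idempotent.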
How we ran the migration
We had a hard constraint. The system is live for paying customers, so we could not freeze it and we could not rewrite it.
So every workflow moved in a single PR that could be reverted on its own. At no point could the system be in a state we couldn't undo in one step. We migrated 100+ workflows that way. We never froze, and we never lost a day of shipping.
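One way to picture a per-workflow, revertible cutover is a routing table with one entry per workflow. This is a hypothetical sketch, not a description of our actual mechanism, which is covered in the full write-up. The workflow names, the routing table, and the starter functions are assumptions made up for illustration.

```python
from typing import Callable, Dict

# Hypothetical routing table: one entry per workflow, naming the engine that
# runs it. In this sketch, a migration PR changes exactly one entry.
ENGINE_FOR_WORKFLOW: Dict[str, str] = {
    "apply_payment": "temporal",     # migrated in its own PR
    "reconcile_invoice": "inngest",  # not migrated yet
}


def start_on_inngest(name: str, payload: dict) -> str:
    # Placeholder for handing the job to the old engine.
    return f"inngest:{name}"


def start_on_temporal(name: str, payload: dict) -> str:
    # Placeholder for handing the job to the new engine.
    return f"temporal:{name}"


STARTERS: Dict[str, Callable[[str, dict], str]] = {
    "inngest": start_on_inngest,
    "temporal": start_on_temporal,
}


def start_workflow(name: str, payload: dict, routing: Dict[str, str] = ENGINE_FOR_WORKFLOW) -> str:
    """Dispatch a workflow to whichever engine the routing table names."""
    engine = routing.get(name, "inngest")  # unknown workflows stay on the old engine
    return STARTERS[engine](name, payload)


# Reverting one workflow's PR is modeled as flipping its single entry back.
reverted = {**ENGINE_FOR_WORKFLOW, "apply_payment": "inngest"}
```

In a layout like this, each workflow's cutover is a one-line diff, so reverting it does not touch any other workflow.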
The payoff showed up in onboarding
Before, every workflow carried its own handling for retries, state, and failure. Learning the system meant learning a dozen one-off patterns.
On Temporal, every workflow has the same shape. The same primitives for state. The same model for retries and recovery. Once you understand one, you understand the rest. New engineers ship their first workflow on day two.
We're growing fast, and every bit of complexity in the codebase is time a new engineer spends learning instead of building.
Boring infrastructure, ambitious product
We write a lot about building boring agents, systems that are predictable because the stakes are high. Durable execution is the same idea one layer down. We would rather spend our originality on the AR problems no one has solved than on reinventing retry logic. Pick infrastructure with hard guarantees, give every workflow the same shape, and save the invention for the product.
The full technical write-up
Frank wrote up the complete migration on the Temporal blog, including the architecture, the parity checks, and the rollout.
We're hiring
If moving a live system one reversible PR at a time sounds like your kind of problem, we're building a team of engineers who want to do hard things at the application layer. See open roles.


