Why did Monk migrate from Inngest to Temporal?

Inngest was great for moving fast on day one, but past a hundred async jobs it started to hurt: finding a misbehaving run was slow, idempotency checks were scattered across the codebase, and nothing isolated high-volume traffic from core flows. Monk's agents move money and run for days, so they need durable execution.

How did Monk migrate without downtime?

Monk moved its 100+ workflows one small, independently reversible PR at a time, with both systems live the whole way. Each workflow was characterized with tests, scaffolded dark, and cut over behind a tenant-scoped feature flag, and then its Inngest receiver was removed. The system was never left in a state that couldn't be rolled back in one step.

What is durable execution?

Durable execution means a workflow's state is persisted as it runs. If a process crashes or is redeployed, the workflow resumes from where it stopped instead of starting over, which removes the need to hand-roll retries and recovery in application code.

Did Monk rewrite its system to adopt Temporal?

No. Monk avoided a big-bang rewrite. Each workflow was moved incrementally in small, reversible changes, with both runtimes wrapping the same domain service so a tenant on Temporal and a tenant still on Inngest behaved identically.

How did Monk handle per-tenant concurrency in Temporal?

Temporal has no native per-tenant concurrency limit, so Monk built it two ways. Coarse isolation comes from separate namespaces and worker pools. Fine-grained limits come from a coordinator workflow that fans out child workflows behind a sliding concurrency window.

What are the main gotchas migrating from Inngest to Temporal?

Retries count differently: Inngest's retries: N counts only the attempts after the first failure, while Temporal's maximumAttempts includes the first attempt, so the correct mapping is maximumAttempts: N + 1. And an Inngest step is not a Temporal activity: Temporal replays the workflow function from history, so nondeterministic code such as timestamps, random values, or ad hoc database reads must move into an activity.

What changed for engineers after moving to Temporal?

Every workflow now shares the same shape, with the same primitives for state, retries, and recovery. New engineers ship their first workflow on day two because the patterns are consistent across the codebase.

Why We Moved 100+ Workflows to Temporal, One Reversible PR at a Time

How many workflows did Monk migrate?

Monk migrated more than 100 live production workflows from Inngest to Temporal, one reversible PR at a time, without freezing feature development.

June 18, 2026



Most teams migrate a live system one of two ways.

They freeze feature work and move everything at once, betting the company on a big-bang cutover. Or they move the easy jobs, lose momentum, and let the half-migrated state set like concrete, so every engineer has to remember which jobs live where, forever.

We did neither. We moved 100+ live workflows from Inngest to Temporal one reversible PR at a time. Both systems ran the whole way. No migration freeze. No rewrite. No downtime.

Why we moved off Inngest

Monk is an AI platform for accounts receivable. Our agents send invoices, run Intelligent Collections to chase outstanding ones, apply bank transactions to invoices, and sync everything into accounting systems and ERPs. Most of the product is autonomous and async. By the time we started looking at Temporal, that async surface was north of a hundred jobs.

Inngest was great for going fast on day one. At a hundred-plus jobs, it started to hurt. Finding the one run that misbehaved was slow. The "did we already do this?" idempotency checks were scattered across the codebase. And because nothing isolated one workload from another, a burst of high-volume webhook traffic could put pressure on the workflows running billing and customer-facing flows.

The decision to move was easy. The hard part was the how. You do not migrate a hundred running workflows in a weekend, and you should not try.

Monk's agents move money, so the workflows have to hold

These workflows run for days. They call systems that fail in creative ways, like banks, ERPs, and email. And they cannot lose state halfway through. A retry that double-applies a payment lands on a customer's books.

Temporal gives us durable execution. A workflow's state survives crashes, restarts, and deploys. If a worker dies mid-run, the workflow resumes from where it stopped instead of starting over, and a failed step is retried on its own rather than rerunning everything before it. The guarantees we used to hand-roll in every workflow, like retries, timeouts, idempotency, and recovery, are now properties of the platform. We don't have to get them right a hundred separate times.

How we ran the migration: one reversible PR at a time

We never let a single workflow sit half-moved. Each one followed the same four steps, and each step was independently reversible and shipped small:

Characterize. Before touching anything, write tests that lock in what the Inngest job actually does: inputs, outputs, side effects, retry behavior, the branch logic nobody remembers. This is the safety net for every step after it.
Scaffold. Add the Temporal workflow and its activities alongside the Inngest job, with no traffic pointed at them. The new code ships dark.
Cut over. Flip the dispatch behind a feature flag scoped to a tenant. Route one canary tenant first, watch it, then roll forward. Roll back in seconds by flipping the flag.
Remove. Once the cutover has been stable for a sprint, delete the Inngest receiver. High line count, lowest risk, because that code no longer receives traffic.

The reason this is safe: both paths wrap the same logic. The Temporal activity is a thin shell over the same domain service the Inngest job already called. The business logic keeps one home the whole way through, and the flag only chooses which runtime wraps it. The characterization tests hold that service still while the wrapper changes underneath it.
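A minimal sketch of that shape, assuming a TypeScript codebase. Every name here is hypothetical: applyPaymentService, the two wrappers, and the tenantsOnTemporal set standing in for a real feature-flag service. The wrappers are plain functions rather than real Inngest or Temporal registrations, and the dispatcher calls them directly instead of sending an event or starting a workflow. What it shows is the structure: one domain service, two thin shells, and a per-tenant flag that only picks the shell.

```typescript
// All names are hypothetical. Plain functions stand in for the real
// Inngest function and Temporal activity registrations.

interface ApplyPaymentInput {
  tenantId: string;
  invoiceId: string;
  amountCents: number;
}

interface ApplyPaymentResult {
  invoiceId: string;
  appliedCents: number;
}

// The single home for the business logic. Characterization tests pin this down.
function applyPaymentService(input: ApplyPaymentInput): ApplyPaymentResult {
  return { invoiceId: input.invoiceId, appliedCents: input.amountCents };
}

// Thin shell 1: what the Inngest step body would call.
function inngestApplyPayment(input: ApplyPaymentInput): ApplyPaymentResult {
  return applyPaymentService(input);
}

// Thin shell 2: what the Temporal activity would call.
function temporalApplyPaymentActivity(input: ApplyPaymentInput): ApplyPaymentResult {
  return applyPaymentService(input);
}

// Tenant-scoped flag. A Set stands in for a feature-flag service;
// rolling a tenant back means deleting it from the set.
const tenantsOnTemporal = new Set<string>(["canary-tenant"]);

type Runtime = "temporal" | "inngest";

// The flag chooses which runtime wraps the logic, never what the logic does.
function dispatchApplyPayment(
  input: ApplyPaymentInput,
): { runtime: Runtime; result: ApplyPaymentResult } {
  if (tenantsOnTemporal.has(input.tenantId)) {
    return { runtime: "temporal", result: temporalApplyPaymentActivity(input) };
  }
  return { runtime: "inngest", result: inngestApplyPayment(input) };
}
```

Because both shells delegate to the same service, a canary tenant and a not-yet-migrated tenant get identical results, and rollback is a flag change rather than a deploy.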

Once the shape was muscle memory, the per-workflow cycle was a day or two. Every workflow we moved has the same four commit titles, in the same order.

Two speed bumps in the migration

Two of the Inngest-to-Temporal mappings caught us out before we learned to watch for them.

Retries count differently. Inngest's retries: N means N attempts after the first failure. Temporal's maximumAttempts: N is the total, first attempt included. Port retries: 2 straight across to maximumAttempts: 2 and you have silently dropped an attempt. The correct mapping is maximumAttempts: N + 1.
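A small sketch of the mapping as a function, so it is written down once instead of ported by hand a hundred times. TemporalRetryPolicySubset is a local stand-in for the single field of Temporal's retry policy involved here, and inngestRetriesToTemporal is a hypothetical helper name. One more reason to centralize it: in Temporal, maximumAttempts: 0 means "no limit", so porting retries: 0 straight across would turn "never retry" into "retry forever".

```typescript
// Local stand-in for the one retry-policy field this mapping touches.
// (Temporal's RetryPolicy also has initialInterval, backoffCoefficient, etc.)
interface TemporalRetryPolicySubset {
  maximumAttempts: number;
}

// Inngest `retries: N` allows N retries after the first attempt: N + 1 in total.
// Temporal `maximumAttempts` counts every attempt, the first one included.
function inngestRetriesToTemporal(retries: number): TemporalRetryPolicySubset {
  if (!Number.isInteger(retries) || retries < 0) {
    throw new RangeError(`retries must be a non-negative integer, got ${retries}`);
  }
  // Mapping N straight across drops an attempt. Mapping 0 straight across is
  // worse: Temporal reads maximumAttempts: 0 as unlimited, not "no retries".
  return { maximumAttempts: retries + 1 };
}
```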

A step and an activity are not the same primitive. Inngest re-invokes your function over HTTP once per step and memoizes each step's result by its name. Temporal replays the whole workflow function from event history and hands back the recorded result of each activity. The consequence shows up the moment you port a function body: anything that sat between step.run calls in Inngest (a Date.now(), a random pick, a quick read off the database) was harmless there because each step ran in a fresh invocation. The same line in a Temporal workflow body runs again on every replay and breaks determinism. It has to move into an activity. Catch this, or you have copied the body across without migrating it.
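A toy model of history replay makes the rule concrete. ToyHistory is hypothetical and is not how Temporal is implemented: it only records each activity's result the first time and hands it back on replay, which is the property the paragraph above relies on. A database read in the workflow body sees whatever the database holds at replay time, while the same read inside an activity is answered from history. SDKs differ in the details (the TypeScript SDK's workflow sandbox, for instance, substitutes replay-safe versions of Date and Math.random), but I/O such as a database read has to be an activity regardless.

```typescript
// Toy model only: records activity results in order and replays them.
class ToyHistory {
  private readonly events: unknown[] = [];
  private cursor = 0;

  // Begin a replay: the workflow function will run again from the top.
  rewind(): void {
    this.cursor = 0;
  }

  // Run `fn` the first time; on replay, return the recorded result instead.
  activity<T>(fn: () => T): T {
    if (this.cursor < this.events.length) {
      return this.events[this.cursor++] as T;
    }
    const result = fn();
    this.events.push(result);
    this.cursor++;
    return result;
  }
}

// Hypothetical stand-in for a database another process can update between replays.
const fakeDb = { openBalanceCents: 5_000 };

// Ported body-first: the read sits in the workflow body, so every replay re-reads it.
function naiveCollectionsWorkflow(_history: ToyHistory): "chase" | "close" {
  const balance = fakeDb.openBalanceCents;
  return balance > 0 ? "chase" : "close";
}

// Migrated: the read is an activity, so replays see the value recorded the first time.
function migratedCollectionsWorkflow(history: ToyHistory): "chase" | "close" {
  const balance = history.activity(() => fakeDb.openBalanceCents);
  return balance > 0 ? "chase" : "close";
}
```

If the invoice is paid between the original run and a replay, the naive version takes a different branch on replay than it did the first time, which in Temporal would surface as a workflow whose replayed decisions no longer match its history. The migrated version replays the same branch.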

The thing you will miss most: flow control

In Inngest, capping concurrency per tenant is two lines of config. Inngest keeps a separate virtual queue per tenant and never runs more than your limit at a time. Throttling, debouncing, rate limiting, priority, and event batching work the same way: declarative config, no code.

Temporal is not built the same way. A worker can cap how many activities or workflow tasks a single process runs at once, but there is no native "run at most N workflows of this type" or "at most N per tenant." It has been an open, heavily upvoted request on the Temporal repo for years. So you build it. Two levers got us back what we gave up.

Coarse: separate namespaces. Our highest-volume integrations can produce more events in an hour than most jobs produce in a day. They get their own namespace, worker pool, and ECS service. A flood there cannot starve the workers running billing and customer-facing flows. One Docker image, two entry points, picked per service.

Fine: a coordinator workflow. Inside a pool, when you need a real per-tenant limit, one workflow lists the work and fans out children behind a sliding window, parking until a slot frees. The bug everyone writes first is forgetting to free the slot on the failure path, not just on success. For very large fan-outs, continue-as-new periodically instead of looping forever in one run.
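A minimal sketch of the sliding window, written as plain async TypeScript so it runs outside Temporal. In a real coordinator this loop would live in the workflow body and startChild would start a child workflow; runWithSlidingWindow and startChild are hypothetical names. The line that matters most is the slot release on the rejection path.

```typescript
// Fan out one child per item, never more than `limit` at once.
// Outside Temporal, `startChild` is any async function; in a coordinator
// workflow it would start a child workflow and resolve when the child completes.
async function runWithSlidingWindow<T>(
  items: readonly T[],
  limit: number,
  startChild: (item: T) => Promise<void>,
): Promise<{ succeeded: number; failed: number }> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const inFlight = new Set<Promise<void>>();
  let succeeded = 0;
  let failed = 0;

  for (const item of items) {
    // Park until a slot frees up.
    while (inFlight.size >= limit) {
      await Promise.race(inFlight);
    }
    const slot: Promise<void> = startChild(item).then(
      () => {
        succeeded++;
        inFlight.delete(slot);
      },
      () => {
        failed++;
        // The bug everyone writes first: omit this and a failed child holds its slot forever.
        inFlight.delete(slot);
      },
    );
    inFlight.add(slot);
  }

  // Wait for the tail of the window to drain.
  await Promise.all(inFlight);
  return { succeeded, failed };
}
```

For very large fan-outs, the same loop would process a bounded batch and then continue-as-new with a cursor into the remaining work, so no single run's history grows without limit.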

How we deployed it

Workers run on ECS Fargate. Two services share one Docker image with two entry points. They autoscale on CPU, mostly on Fargate Spot with one always-on task as a floor. Spot is cheap and safe here, because when a worker is reclaimed mid-run, the activity it was running times out and Temporal retries it on another worker.
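A minimal sketch of how one image can serve two entry points. The role names, namespaces, task queues, and concurrency numbers are hypothetical; maxConcurrentActivityTaskExecutions is a real Temporal worker option, shown here only as a plain field. Each ECS service would start the same image with a different role (for example, read from an environment variable), and the selected settings would be passed to the worker at startup.

```typescript
// Hypothetical roles: each ECS service runs the same image with one of these.
type WorkerRole = "core" | "integrations";

interface WorkerSettings {
  namespace: string;
  taskQueue: string;
  // Per-process cap on concurrently running activities (a real Temporal worker option).
  maxConcurrentActivityTaskExecutions: number;
}

// Hypothetical values. Separate namespaces keep a flood of integration events
// from competing with billing and customer-facing workflows.
const SETTINGS: Record<WorkerRole, WorkerSettings> = {
  core: { namespace: "core", taskQueue: "core-workflows", maxConcurrentActivityTaskExecutions: 50 },
  integrations: {
    namespace: "integrations",
    taskQueue: "integration-ingest",
    maxConcurrentActivityTaskExecutions: 200,
  },
};

// Entry-point selection: fail fast on a misconfigured service instead of
// silently joining the wrong pool.
function settingsForRole(role: string | undefined): WorkerSettings {
  if (role === "core" || role === "integrations") {
    return SETTINGS[role];
  }
  throw new Error(`unknown worker role: ${String(role)}`);
}
```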

What changed

The hard wins:

Per-workflow visibility. Every run has searchable attributes, full history, and a status. Finding the one that misbehaved is quick.
Idempotency by construction. Deterministic workflow IDs replaced an entire class of "did we already do this" checks.
Workload isolation by namespace. High-volume webhook ingestion is structurally separated from core workflows. A burst on one cannot pressure the other.
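A minimal sketch of idempotency by construction. The ID format and ToyWorkflowStarter are hypothetical. In Temporal itself, starting a workflow whose ID is already running fails (the TypeScript client raises WorkflowExecutionAlreadyStartedError), and whether an ID from a closed run may be reused is governed by the workflow ID reuse policy; the toy models the strictest, reject-duplicates behavior. A webhook delivered twice produces the same ID, so the second start is refused instead of applying the transaction again.

```typescript
// Hypothetical ID scheme: the workflow ID is derived from the business key, so the
// same bank transaction for the same tenant always maps to the same workflow ID.
function applyBankTransactionWorkflowId(tenantId: string, bankTransactionId: string): string {
  return `apply-bank-txn:${tenantId}:${bankTransactionId}`;
}

type StartOutcome = "started" | "duplicate-rejected";

// In-memory stand-in for the server-side uniqueness check on workflow IDs.
// It models a reject-duplicates policy; it is not the Temporal client API.
class ToyWorkflowStarter {
  private readonly seenIds = new Set<string>();

  start(workflowId: string): StartOutcome {
    if (this.seenIds.has(workflowId)) {
      return "duplicate-rejected";
    }
    this.seenIds.add(workflowId);
    return "started";
  }
}
```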

The soft wins:

New engineers ship their first workflow on day two. Every workflow has the same shape, so the only real learning curve is the determinism constraint.
The pattern stuck. The four-PR loop is muscle memory across the team now, and the same shape applies to the next migration off any framework.

Boring on purpose

We write a lot about building boring agents, systems that stay predictable because the stakes are high. Durable execution is the same idea one layer down. We would rather spend our time on the AR problems no one has solved than reinvent retry logic. Pick infrastructure with hard guarantees, give every workflow the same shape, and put the creativity into the product.

The full technical write-up

Frank wrote the complete, code-level walkthrough on the Temporal blog, including a single workflow mapped piece by piece, the full Inngest-to-Temporal mapping table, and the coordinator-workflow code.

We're hiring

If moving a live system one reversible PR at a time sounds like your kind of problem, we're building a team of engineers who want to do hard things at the application layer. Browse our customer stories, or see open roles.
