Why I Keep Reaching for Durable Functions (And Where People Get Lost)
The thing that pushed me toward Durable Functions wasn’t some architectural epiphany — it was a cron job graveyard. I had a payment retry workflow split across three Azure Functions, a storage table acting as a makeshift state machine, and a timer trigger that woke up every five minutes to check “did the thing happen yet?” It worked until it didn’t. Race conditions, missed retries, and zero visibility into what state any given workflow was actually in. The real problem isn’t compute — serverless handles that fine. The problem is that you need your logic to remember things across invocations, and bolting that onto stateless functions with blob storage and timers is a slow-motion disaster.
Durable Functions solves this cleanly, but the documentation does something that trips up almost everyone I’ve seen onboard to it: it introduces all three forms — orchestrators, activities, and entities — in the same breath. Most tutorials then spend 90% of their time on the orchestrator/activity pair (the fan-out/fan-in example is everywhere), slap a one-paragraph mention of entities at the end, and call it done. So you come away thinking entities are some niche edge case. They’re not. I’ve shipped production code where entities were the primary form, not an afterthought. The confusion compounds because entities have a fundamentally different mental model — they’re not about sequencing work, they’re about owning state — and that distinction never gets explained clearly enough.
Here’s exactly what this article covers, so you can decide if it’s worth your time. First, a concrete working example for each of the three forms: the orchestrator (long-running sequential logic), the activity (the actual unit of work), and the entity (the stateful actor). Second, the gotchas I actually hit in production: not the “remember functions must be deterministic” warning you’ve already read, but the specific ways things break that aren’t in the README. Third, an honest take on when not to reach for each one, because Durable Functions has real overhead and there are workflows where a simple queue-triggered function is the right answer.
The Three Durable Function Forms: What Each One Actually Does and When to Reach for It
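One thing worth calibrating before we go further: Durable Functions runs on top of the Azure Functions runtime, and the behavior I’m describing applies to the Node.js SDK v3+ and the .NET isolated worker model. The in-process .NET model has some subtle differences in how replay works that have bitten me before. Two caveats about the snippets themselves: the C# ones are written against the in-process types (FunctionName, IDurableOrchestrationContext), and some of the JavaScript ones use the older function.json registration style (module.exports = df.orchestrator(...)) rather than df.app.*. The concepts carry over, but the registration code differs between models. If you’re on Python, the async support improved significantly in the v1.2.x SDK but the entity support is still lagging behind the other SDKs, something the docs don’t surface prominently enough.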
Quick Setup Before We Dive In
The thing that trips up most people first time round isn’t the code — it’s that Durable Functions silently refuses to work without a local storage emulator running. You run func start, the output looks totally normal, and then your orchestrator just never fires. No red text, no crash. The fix is running Azurite before you start the function host, every single time.
# Install the Azure Functions Core Tools v4 globally
# The --unsafe-perm true flag is required on Linux/Mac to avoid EACCES errors
npm install -g azure-functions-core-tools@4 --unsafe-perm true
# Confirm you got v4, not an old v3 install that was already on your PATH
func --version
# Expected: 4.x.x
# Start Azurite in the background before anything else
npx azurite --silent &
# Then scaffold and install
func init my-durable-app --worker-runtime node --language typescript
cd my-durable-app
npm install durable-functions
If you skip the Azurite step, you’ll hit a No connection string set for one of the Hub storage providers error buried in the log — or worse, the host starts fine but tasks queue silently and never execute. The default local.settings.json ships with "AzureWebJobsStorage": "UseDevelopmentStorage=true", which only works when something is actually listening on 127.0.0.1:10000-10002. That’s Azurite’s job.
// local.settings.json — this is what you need, verbatim
{
"IsEncrypted": false,
"Values": {
"AzureWebJobsStorage": "UseDevelopmentStorage=true",
"FUNCTIONS_WORKER_RUNTIME": "node"
}
}
Node 18+ is the minimum I’d recommend: the durable-functions npm package (v3.x) dropped support for anything below it, and you’ll get peer dependency warnings that eventually become hard errors. On .NET, the package depends on the hosting model. Microsoft.Azure.WebJobs.Extensions.DurableTask (2.9+) is the in-process package, and it is what the C# snippets in this article are written against; the isolated worker model (which is where .NET 7+ lives) uses Microsoft.Azure.Functions.Worker.Extensions.DurableTask instead. The Core Tools v4 is non-negotiable if you’re on Node; v3 tools won’t load the durable extension bundle correctly and you’ll waste an afternoon on version mismatch errors.
Here’s what a healthy startup looks like versus a broken one:
# GOOD — durable extension loaded correctly
# You'll see this in the output of `func start`:
[2024-01-15T10:23:01.456Z] Durable Functions extension is loaded.
[2024-01-15T10:23:01.891Z] Host lock lease acquired by instance ID '...'
[2024-01-15T10:23:02.100Z] Functions:
[2024-01-15T10:23:02.100Z] HttpStart: [GET,POST] http://localhost:7071/api/orchestrators/{functionName}
# BAD — extension silently not loaded, orchestrators won't appear in the list
# Functions output only shows your HTTP triggers, zero durable functions listed
# No error — just omission. This usually means host.json is missing the extension bundle.
Check your host.json if you’re in the silent-fail situation. The extension bundle declaration is what actually pulls in the durable runtime — without it, the host starts fine but ignores everything durable-related.
// host.json — the extensionBundle block is mandatory for Node/Python workers
{
"version": "2.0",
"extensionBundle": {
"id": "Microsoft.Azure.Functions.ExtensionBundle",
"version": "[3.*, 4.0.0)"
}
}
Form 1: The Orchestrator Function
The thing that surprises most developers the first time they build with Durable Functions is that the orchestrator never actually runs your business logic. It’s a pure coordinator. Its only job is to describe the sequence and structure of work — call this activity, wait for these three things, branch on the result. The Azure runtime replays this function from scratch every time it needs to resume, which means your orchestrator code gets executed multiple times against the same inputs. That’s not a bug. That’s the entire design.
The rule that breaks people silently: no side effects inside an orchestrator. No Date.now(), no Math.random(), no HTTP calls, no database reads, no writing to files. If you call Date.now() on the first execution, you get one timestamp. On replay, you get a different one. The orchestrator’s output diverges from its history, the runtime gets confused, and you end up with subtle workflow corruption that doesn’t throw an obvious error — it just silently does the wrong thing. Use context.df.currentUtcDateTime for timestamps and context.df.newGuid() for random IDs. These are replay-safe because the runtime replays them from the execution history, not by re-evaluating them.
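To make the replay rule concrete, here is a toy replay driver. It is a simulation under simplifying assumptions, not the Durable Functions runtime: the history is a plain array, tasks are identified by name only, and replay, orderFlow, and makeClockFlow are hypothetical names invented for this sketch.

```javascript
// Toy replay driver: a simulation, NOT the real Durable Functions runtime.
// An orchestrator is a generator that yields the name of the task it wants to run.
// The driver answers each yield from recorded history until the history runs out.
function replay(orchestrator, history) {
  const gen = orchestrator();
  let step = gen.next();
  let index = 0;
  while (!step.done && index < history.length) {
    const recorded = history[index];
    // The replayed code must ask for the same task the original run asked for.
    if (step.value !== recorded.task) {
      throw new Error(
        `Non-deterministic replay: history has "${recorded.task}", code scheduled "${step.value}"`
      );
    }
    step = gen.next(recorded.result); // feed the recorded result back in
    index += 1;
  }
  return step.done
    ? { status: "completed", output: step.value }
    : { status: "waiting", next: step.value };
}

// Deterministic: the same code path on every replay.
function* orderFlow() {
  const stock = yield "ReserveInventory";
  const receipt = yield "ChargeCreditCard";
  return { stock, receipt };
}

// Clock-dependent: the scheduled task depends on when the code happens to run.
// The clock is injected here so the failure is reproducible.
function makeClockFlow(now) {
  return function* () {
    const task = now() % 2 === 0 ? "ChargeCreditCard" : "SendReminder";
    return yield task;
  };
}

const history = [
  { task: "ReserveInventory", result: "reserved" },
  { task: "ChargeCreditCard", result: "receipt-1" },
];

console.log(replay(orderFlow, history.slice(0, 1))); // waiting on ChargeCreditCard
console.log(replay(orderFlow, history));             // completed

try {
  // History was recorded when the clock was even; the replay sees an odd clock.
  replay(makeClockFlow(() => 1), [{ task: "ChargeCreditCard", result: "receipt-1" }]);
} catch (err) {
  console.log(err.message);
}
```

The deterministic flow can be resumed from any prefix of its history. The clock-dependent flow schedules a different task than the one on record, which is the kind of divergence the real runtime cannot reconcile.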
Here’s a real order processing orchestrator that fans out to three activity functions in parallel and waits on all of them:
// orderOrchestrator/index.js
const df = require("durable-functions");

module.exports = df.orchestrator(function* (context) {
  const order = context.df.getInput();

  // Fan out: all three fire concurrently, NOT sequentially
  const parallelTasks = [
    context.df.callActivity("ReserveInventory", order),
    context.df.callActivity("ChargeCreditCard", order),
    context.df.callActivity("NotifyWarehouse", order),
  ];

  // Task.all suspends the orchestrator until every task resolves
  const [inventoryResult, chargeResult, warehouseResult] =
    yield context.df.Task.all(parallelTasks);

  // Only runs after all three complete
  yield context.df.callActivity("SendConfirmationEmail", {
    order,
    inventoryResult,
    chargeResult,
    warehouseResult,
  });

  return { orderId: order.id, status: "complete" };
});
// function.json — this binding is what makes it an orchestrator
{
"bindings": [
{
"name": "context",
"type": "orchestrationTrigger",
"direction": "in"
}
]
}
The Durable Functions Monitor VS Code extension (or the Azure Portal’s “Durable Functions” tab) shows you the execution history as a table of events: OrchestratorStarted, TaskScheduled, TaskCompleted, OrchestratorCompleted. When you’re debugging a stuck workflow, this is the first place to look. You’ll see exactly which activity fired, what it returned, and where the orchestrator is currently suspended. The replay model makes total sense once you see this table — each replay just re-executes the orchestrator up to the last recorded event, then waits for new ones. Understanding this also explains why you can’t have non-deterministic code: the replayed run must produce identical decisions to match the recorded history.
The gotcha that nobody warns you about until you’re already in production: orchestration history rows in Azure Table Storage (or SQL backend in newer setups) accumulate forever. Every activity call, every timer, every external event — each one appends rows. I’ve seen long-running subscription billing workflows hit the 50,000-event mark where query performance starts visibly degrading and cold starts on the orchestrator get noticeably slow. The fix is context.df.continueAsNew(), which truncates the history by restarting the orchestrator with a clean slate and a new input payload. The pattern is simple: at the end of each “cycle” of work, instead of looping internally, you call continueAsNew with whatever state you need to carry forward. If your orchestrator runs for more than a few hours or has any kind of internal loop, build this in from the start — retrofitting it later is painful.
// Eternal orchestrator pattern: use continueAsNew to avoid history bloat
module.exports = df.orchestrator(function* (context) {
  const state = context.df.getInput() || { processedCount: 0 };

  yield context.df.callActivity("ProcessNextBatch", state);

  // Wait 5 minutes before next run, replay-safe timer
  const nextRun = new Date(context.df.currentUtcDateTime);
  nextRun.setMinutes(nextRun.getMinutes() + 5);
  yield context.df.createTimer(nextRun);

  // Restart with clean history, carry forward only what you need
  context.df.continueAsNew({ processedCount: state.processedCount + 1 });
});
Form 2: The Activity Function
Most of the confusion I see with Durable Functions comes from people putting database calls and API requests directly inside the orchestrator. That will silently corrupt your workflow: orchestrators replay from history, so any I/O you put in there executes again on every replay, possibly with different results. The activity function is the escape hatch. It’s the only place where real I/O belongs: hitting Stripe, writing to Postgres, sending an email, calling a third-party API. The orchestrator schedules it, the activity runs and returns a value, and that value is recorded in history so replays read the recorded result instead of running the activity again. One caveat: delivery is at-least-once, so an activity can run more than once if a worker dies before its result is recorded. Make anything that charges money idempotent.
The binding is straightforward. Your function.json declares activityTrigger as the trigger, and the function receives whatever the orchestrator passed via callActivity. One thing I learned the hard way: always serialize your return value explicitly. If you return a plain class instance, the Durable Functions runtime serializes it to JSON and back during replay, and anything that isn’t a plain object — class methods, undefined properties, Date objects — will silently disappear or mutate. Return a typed plain object, always.
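// ChargePayment/index.ts
// Assumes the function.json programming model (AzureFunction, context.bindings)
import { AzureFunction, Context } from "@azure/functions"
import Stripe from "stripe"

// Plain-data shapes for what crosses the orchestrator/activity boundary
interface ChargePaymentInput {
  orderId: string
  customerId: string
  amountCents: number
  paymentMethodId: string
}
interface ChargePaymentOutput {
  receiptId: string
  status: string
  amountCharged: number
  chargedAt: string
}

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!, { apiVersion: "2023-10-16" })

const chargePayment: AzureFunction = async function (context: Context) {
  // "name" is the activityTrigger binding name declared in function.json
  const { orderId, customerId, amountCents, paymentMethodId } =
    context.bindings.name as ChargePaymentInput

  const intent = await stripe.paymentIntents.create({
    amount: amountCents,
    currency: "usd",
    customer: customerId,
    payment_method: paymentMethodId,
    confirm: true,
    metadata: { orderId },
  })

  // Explicitly construct the return shape; don't return the Stripe object directly.
  // The SDK object has non-serializable internals that will confuse replay.
  return {
    receiptId: intent.id,
    status: intent.status,
    amountCharged: intent.amount,
    chargedAt: new Date().toISOString(), // ISO string, not Date, so it survives the JSON round trip
  } satisfies ChargePaymentOutput
}

export default chargePayment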
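To see why a plain object matters, here is what a JSON round trip does to a richer value. This sketch assumes the hop between activity and orchestrator behaves like JSON.stringify followed by JSON.parse, which is a simplification; the Receipt class and the pi_example id are made up for the demonstration.

```javascript
// Hypothetical class, used only to show what does not survive JSON.
class Receipt {
  constructor(id, chargedAt) {
    this.id = id;
    this.chargedAt = chargedAt; // a Date instance
    this.note = undefined;      // an explicitly undefined property
  }
  isSettled() {
    return true;
  }
}

// Stand-in for the serialize/deserialize hop between activity and orchestrator.
const roundTrip = (value) => JSON.parse(JSON.stringify(value));

const original = new Receipt("pi_example", new Date("2024-01-15T10:23:01.456Z"));
const restored = roundTrip(original);

console.log(typeof restored.chargedAt);   // "string": the Date became an ISO string
console.log("note" in restored);          // false: undefined properties are dropped
console.log(typeof restored.isSettled);   // "undefined": methods live on the prototype
console.log(restored instanceof Receipt); // false: it is a bare object now
```

Nothing here throws, which is the problem: the orchestrator simply receives a different shape than the activity returned.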
Back in your orchestrator, you don’t call callActivity naked if there’s any chance Stripe has a hiccup. Use callActivityWithRetry with a proper RetryOptions config. Here’s what I use for payment calls — conservative on retries, exponential backoff, but a hard cap so you don’t sit retrying for 30 minutes while a customer waits:
// Inside the orchestrator
const retryOptions = new df.RetryOptions(
  5000, // first retry after 5 seconds
  3     // max 3 attempts total: the first call plus 2 retries
)
retryOptions.backoffCoefficient = 2.0 // waits of 5s, then 10s
retryOptions.maxRetryIntervalInMilliseconds = 30_000 // cap on any single wait (not reached with 3 attempts)
retryOptions.retryTimeoutInMilliseconds = 90_000 // give up entirely after 90s

const receipt = yield context.df.callActivityWithRetry(
  "ChargePayment",
  retryOptions,
  { orderId, customerId, amountCents, paymentMethodId }
)
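A quick way to sanity-check a retry config is to compute the waits it implies. The helper below is a hypothetical sketch (retryDelays is not part of the durable-functions package). It assumes the wait before retry n is firstRetryInterval × backoffCoefficient^(n−1), capped at maxRetryInterval, and that maxNumberOfAttempts counts the first call. It ignores the overall retry timeout.

```javascript
// Hypothetical helper: lists the waits between attempts implied by a retry config.
function retryDelays({
  firstRetryIntervalMs,
  maxNumberOfAttempts,
  backoffCoefficient = 1,
  maxRetryIntervalMs = Infinity,
}) {
  const delays = [];
  // N attempts means at most N - 1 waits between them.
  for (let retry = 0; retry < maxNumberOfAttempts - 1; retry++) {
    const raw = firstRetryIntervalMs * Math.pow(backoffCoefficient, retry);
    delays.push(Math.min(raw, maxRetryIntervalMs));
  }
  return delays;
}

// The payment config: 3 attempts, 5s first interval, coefficient 2, 30s cap.
console.log(retryDelays({
  firstRetryIntervalMs: 5000,
  maxNumberOfAttempts: 3,
  backoffCoefficient: 2,
  maxRetryIntervalMs: 30000,
})); // [ 5000, 10000 ]

// Same config with 8 attempts: the cap starts to matter from the fourth wait.
console.log(retryDelays({
  firstRetryIntervalMs: 5000,
  maxNumberOfAttempts: 8,
  backoffCoefficient: 2,
  maxRetryIntervalMs: 30000,
})); // [ 5000, 10000, 20000, 30000, 30000, 30000, 30000 ]
```

Summing the array gives a lower bound on how long a fully failing call keeps the orchestrator waiting, which is a useful number to compare against the overall retry timeout.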
The timeout gotcha that nobody mentions in the docs: there are actually two distinct things to bound, and most people only set one. retryTimeoutInMilliseconds controls the total retry window, but it doesn’t protect you if your activity starts and then hangs, say because Stripe’s API accepts the connection but never responds. The activity just sits there holding an activity slot on the worker until the host’s function timeout kills it. There are two guards worth setting. On the host side, functionTimeout in host.json caps how long any single function execution, activities included, may run. On the orchestrator side, you can race the activity against a durable timer with context.df.Task.any and stop waiting when the timer wins. The host-side cap goes in host.json:
// host.json
{
  "version": "2.0",
  "functionTimeout": "00:05:00"
}
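One orchestrator-side way to stop waiting on a hung activity is to race it against a durable timer. The sketch below writes the orchestrator body as a plain generator so it can be read in isolation. The name chargeWithDeadline and the 60-second budget are illustrative choices, and registering it (df.orchestrator(...) or df.app.orchestration(...), depending on your programming model) is left out.

```javascript
// Orchestrator body as a plain generator; registration is intentionally omitted.
function* chargeWithDeadline(context) {
  const input = context.df.getInput();

  // Replay-safe deadline: derived from the orchestration clock, not Date.now().
  const deadline = new Date(context.df.currentUtcDateTime.getTime() + 60 * 1000);

  const chargeTask = context.df.callActivity("ChargePayment", input);
  const timeoutTask = context.df.createTimer(deadline);

  // Resume as soon as either the activity or the timer finishes.
  const winner = yield context.df.Task.any([chargeTask, timeoutTask]);

  if (winner === chargeTask) {
    timeoutTask.cancel(); // a pending timer would keep the instance alive
    return { status: "charged", receipt: chargeTask.result };
  }
  return { status: "timed_out" };
}
```

Losing the race does not cancel the activity. The orchestrator only stops waiting for it, so the host-side function timeout still matters for reclaiming the worker.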
Know the ceiling of what activity functions can handle. An activity runs, does work, and returns — that’s the entire contract. If the thing you need to do involves waiting on something outside your control (a human approves an order, a webhook comes in from a third party, a batch job completes), you’re looking at a different pattern entirely. Stuffing a polling loop inside an activity is the wrong move — you’re holding a task alive and paying for idle compute the whole time. That scenario belongs to Form 3 (the entity function) or the external event pattern with context.df.waitForExternalEvent(). Activity functions are for bounded, execute-and-return work only.
Form 3: The Entity Function (The One Nobody Explains Well)
Most tutorials spend 80% of their time on orchestrators and treat entities as an afterthought with a counter example. That’s a mistake, because entities solve a completely different problem — one that orchestrators are actually bad at. The mental model that clicked for me: an entity is a tiny object that lives in the cloud. It has state, it has methods, and calling context.df.signalEntity is basically calling a method on that object. The object persists between calls, survives function restarts, and can be talked to from anywhere — another entity, an orchestrator, or even a plain HTTP-triggered function.
The core difference from an orchestrator is that entities are not driven by a parent. An orchestrator has a beginning and an end. An entity just… exists, until you delete it. You can signal it from ten different orchestrators simultaneously, from a timer trigger, from a webhook — it doesn’t care. Each operation still runs one at a time (more on that below), but the entity itself has no concept of “completion.” This makes it the right tool for things like per-user rate limiters, approval state machines, or anything that gets poked from multiple independent sources.
Here’s a ShoppingCart entity that handles three operations. The JavaScript dispatch pattern is more explicit than the .NET class syntax, so it’s actually easier to reason about:
// JavaScript: dispatch pattern
const df = require("durable-functions");

module.exports = df.entity(function(context) {
  // context.df.getState() returns undefined on first call, so always provide a default
  const cart = context.df.getState(() => ({ items: [], checkedOut: false }));

  switch (context.df.operationName) {
    case "addItem": {
      const item = context.df.getInput();
      cart.items.push(item);
      break;
    }
    case "removeItem": {
      const sku = context.df.getInput();
      cart.items = cart.items.filter(i => i.sku !== sku);
      break;
    }
    case "checkout":
      if (cart.items.length === 0) throw new Error("Cart is empty");
      cart.checkedOut = true;
      context.df.return(cart); // only callEntity callers receive this
      break;
  }

  context.df.setState(cart);
});
// C# class-based — cleaner if you're already in .NET
[JsonObject(MemberSerialization.OptIn)]
public class ShoppingCart : IShoppingCart
{
[JsonProperty("items")]
public List<CartItem> Items { get; set; } = new();
[JsonProperty("checkedOut")]
public bool CheckedOut { get; set; }
public void AddItem(CartItem item) => Items.Add(item);
public void RemoveItem(string sku) => Items.RemoveAll(i => i.Sku == sku);
public List<CartItem> Checkout()
{
if (!Items.Any()) throw new InvalidOperationException("Cart is empty");
CheckedOut = true;
return Items;
}
[FunctionName(nameof(ShoppingCart))]
public static Task Run([EntityTrigger] IDurableEntityContext ctx)
=> ctx.DispatchAsync<ShoppingCart>();
}
The callEntity vs signalEntity distinction trips people up. signalEntity is fire-and-forget — the orchestrator sends the operation and immediately moves on, no waiting. callEntity suspends the orchestrator until the entity processes the operation and returns a value. Use signalEntity for state mutations where you don’t need confirmation (addItem, removeItem). Use callEntity when you need the result — like calling checkout and getting the final cart back to process a payment. Mixing these up causes either unnecessary blocking or lost return values that silently do nothing.
// Inside an orchestrator — JavaScript
const entityId = new df.EntityId("ShoppingCart", context.df.getInput().userId);
// Fire-and-forget — orchestrator does not pause
context.df.signalEntity(entityId, "addItem", { sku: "ABC123", qty: 2, price: 29.99 });
// Blocking call — orchestrator suspends until checkout() returns
const finalCart = yield context.df.callEntity(entityId, "checkout");
// finalCart now has the cart contents for payment processing
The thing the docs bury in a footnote: entity state is stored in Azure Table Storage, serialized as JSON, keyed by entity name + entity key. Every operation on a given entity runs serially — the runtime guarantees no two operations on the same entity ID execute at the same time. This is not a performance limitation you need to work around; it’s the entire point. It means you get linearizable access to that state without any locking code on your end. The practical ceiling is throughput per entity instance, not overall throughput — so if you’re building a rate limiter, key by user ID, not globally, or one entity becomes a bottleneck.
The situation where I reach for entities over a database row: when multiple concurrent processes need to update the same logical unit and you want the update logic to live in code rather than SQL transactions. A database row works fine for reads. The moment you have ten orchestrators all trying to do conditional updates on the same row — “add to counter only if below limit”, “set approved only if both signatories have signed” — you’re writing optimistic concurrency retry loops. Entities handle that automatically. The serial execution guarantee means the entity’s operation code can read state, make a decision, and write back without any compare-and-swap gymnastics. Rate limiters, two-party approval workflows, and inventory reservation are the three places I’ve replaced database rows with entities and never looked back.
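The serial-execution guarantee is easier to trust once you see how little it takes to model. The following is a toy in-memory model, not how the runtime is implemented: ToyEntityHost, signal, drain, and the tryAcquire operation are names invented for this sketch. Each entity key gets its own mailbox, and operations for a key are applied one at a time in arrival order.

```javascript
// Toy model of per-key serial execution (NOT the Durable Functions runtime).
class ToyEntityHost {
  constructor(handler, initialState) {
    this.handler = handler;           // (state, operation, input) => new state
    this.initialState = initialState; // () => fresh state for a new key
    this.states = new Map();
    this.mailboxes = new Map();
  }

  // Fire-and-forget: only enqueues, nothing runs yet.
  signal(key, operation, input) {
    if (!this.mailboxes.has(key)) this.mailboxes.set(key, []);
    this.mailboxes.get(key).push({ operation, input });
  }

  // Applies queued operations for one key, strictly one after another.
  drain(key) {
    const mailbox = this.mailboxes.get(key) || [];
    while (mailbox.length > 0) {
      const { operation, input } = mailbox.shift();
      const current = this.states.has(key) ? this.states.get(key) : this.initialState();
      this.states.set(key, this.handler(current, operation, input));
    }
    return this.states.get(key);
  }
}

// "Add to counter only if below limit": a read-decide-write with no locking code.
function rateLimiter(state, operation, input) {
  if (operation !== "tryAcquire") return state;
  return state.used < input.limit
    ? { ...state, used: state.used + 1 }
    : { ...state, rejected: state.rejected + 1 };
}

const host = new ToyEntityHost(rateLimiter, () => ({ used: 0, rejected: 0 }));

// Five callers hit the same user; one caller hits another user.
for (let i = 0; i < 5; i++) host.signal("user-a", "tryAcquire", { limit: 3 });
host.signal("user-b", "tryAcquire", { limit: 3 });

console.log(host.drain("user-a")); // { used: 3, rejected: 2 }
console.log(host.drain("user-b")); // { used: 1, rejected: 0 }
```

Because each key has its own mailbox, user-a’s burst never delays user-b. That is the same reason to key a rate limiter by user ID rather than globally.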
Putting All Three Together: A Real Workflow Example
The thing that surprised me most when I first wired all three function forms together was how clean the separation of concerns became. You stop asking “where do I put this logic?” because each form has an obvious job. Let me walk through a document approval system that uses all three — not a toy example, but one close enough to production that you could adapt it.
The Scenario
A user uploads a contract PDF. That triggers an orchestrator that manages the entire approval lifecycle. Activities do the actual work — generating a stamped PDF, sending notification emails via SendGrid, writing to blob storage. An entity per document tracks mutable state: who approved it, when, and what the current status is. The orchestrator doesn’t hold that state itself because it needs to be replayable; the entity holds it because it’s designed for exactly that kind of mutation.
// Orchestrator: the conductor, not the worker
const df = require("durable-functions");

df.app.orchestration("documentApprovalOrchestrator", function*(context) {
  const input = context.df.getInput();
  const docId = input.documentId;

  // Step 1: Generate the PDF for review (activity)
  const pdfUrl = yield context.df.callActivity("generateReviewPdf", { docId });

  // Step 2: Notify approver (activity)
  yield context.df.callActivity("sendApprovalEmail", {
    approverEmail: input.approverEmail,
    pdfUrl,
    instanceId: context.df.instanceId // embedded in the approval link
  });

  // Step 3: Wait for the decision or a 72hr timeout; could be hours, could be days.
  // The deadline comes from the replay-safe clock, never Date.now()
  const deadline = new Date(context.df.currentUtcDateTime.getTime() + 72 * 60 * 60 * 1000);
  const approvalTask = context.df.waitForExternalEvent("ApprovalDecision");
  const timeoutTask = context.df.createTimer(deadline);
  const winner = yield context.df.Task.any([approvalTask, timeoutTask]);

  if (winner === timeoutTask) {
    // Timer fired, no human responded
    yield context.df.callActivity("markDocumentExpired", { docId });
    return { status: "expired" };
  }
  timeoutTask.cancel(); // cancel the pending timer now that the decision arrived
  const approvalEvent = approvalTask.result;

  // Step 4: Record the decision in the entity
  const entityId = new df.EntityId("DocumentApprovalEntity", docId);
  yield context.df.callEntity(entityId, "recordDecision", approvalEvent);

  // Step 5: If approved, finalize
  if (approvalEvent.decision === "approved") {
    const signedUrl = yield context.df.callActivity("generateFinalPdf", { docId });
    yield context.df.callActivity("notifyRequester", { docId, signedUrl });
  }

  return { status: approvalEvent.decision };
});
The Entity: Durable State Without Database Gymnastics
The entity is where approval history actually lives. Each recordDecision operation appends to the decisions array, and the runtime persists the updated state after the operation, so that array is your audit trail. I’ve seen teams reach for a SQL table here and then fight with race conditions when two approvers click simultaneously. The entity serializes those calls automatically.
df.app.entity("DocumentApprovalEntity", function(context) {
  const state = context.df.getState(() => ({
    status: "pending",
    decisions: [],
    approvedBy: null,
    approvedAt: null
  }));

  switch (context.df.operationName) {
    case "recordDecision": {
      const decision = context.df.getInput();
      // Idempotency guard: the HTTP trigger and the orchestrator both send
      // recordDecision for the same click, so skip an identical repeat
      const last = state.decisions[state.decisions.length - 1];
      if (last && last.user === decision.userId && last.action === decision.decision) break;

      const now = new Date().toISOString(); // one timestamp for both fields
      state.decisions.push({
        user: decision.userId,
        action: decision.decision,
        timestamp: now,
        comment: decision.comment || null
      });
      if (decision.decision === "approved") {
        state.status = "approved";
        state.approvedBy = decision.userId;
        state.approvedAt = now;
      } else {
        state.status = "rejected";
      }
      break;
    }
    case "getState":
      context.df.return(state);
      break;
  }

  context.df.setState(state);
});
The Signal Flow End to End
This is where it clicks. An approver clicks a link in their email which hits an HTTP trigger. That trigger does two things: signals the entity directly (so the audit log is written immediately, regardless of orchestrator state), and then raises the external event on the orchestrator instance. The orchestrator was parked at waitForExternalEvent() — possibly for 40 hours — and now it wakes up.
// HTTP trigger: the entry point for a human decision
const { app } = require("@azure/functions");

app.http("approvalWebhook", {
  route: "approve/{instanceId}",
  extraInputs: [df.input.durableClient()], // required for df.getClient(context) to work
  handler: async (request, context) => {
    const client = df.getClient(context);
    const instanceId = request.params.instanceId;
    const body = await request.json();
    // body = { documentId: "...", decision: "approved", userId: "<approver email>", comment: "LGTM" }
    const docId = body.documentId;

    // Signal the entity first so the audit entry is queued before the orchestrator wakes
    const entityId = new df.EntityId("DocumentApprovalEntity", docId);
    await client.signalEntity(entityId, "recordDecision", body);

    // Wake the orchestrator
    await client.raiseEvent(instanceId, "ApprovalDecision", body);

    return { status: 200, body: JSON.stringify({ message: "Decision recorded" }) };
  }
});
One gotcha here: signalEntity and raiseEvent are both fire-and-forget from the HTTP trigger’s perspective. The trigger returns 200 before either the entity or the orchestrator has actually processed anything. That’s by design — Durable Functions queues these operations in Azure Storage. But it means your frontend can’t assume the orchestrator has advanced just because it got a 200. If you need to confirm the state, poll the entity with readEntityState() a second later.
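If the frontend does need confirmation, a small polling helper keeps that logic out of the handler. The sketch below is written against an injected read function so it runs without Azure. In a real client function you would pass something like () => client.readEntityState(entityId); the assumption that the response exposes entityExists and entityState should be checked against your SDK version. waitForEntityState is a hypothetical helper name.

```javascript
// Polls until the entity state satisfies isReady, or gives up after `attempts` reads.
async function waitForEntityState(readState, isReady, { attempts = 5, delayMs = 1000 } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    const response = await readState();
    if (response.entityExists && isReady(response.entityState)) {
      return response.entityState;
    }
    if (attempt < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return null; // caller decides how to report "not confirmed yet"
}

// Stand-in for client.readEntityState: the state flips to "approved" on the third read.
function makeFakeReader() {
  let reads = 0;
  return async () => {
    reads += 1;
    return reads < 3
      ? { entityExists: true, entityState: { status: "pending" } }
      : { entityExists: true, entityState: { status: "approved" } };
  };
}

waitForEntityState(makeFakeReader(), (s) => s.status !== "pending", { delayMs: 10 })
  .then((state) => console.log(state)); // { status: 'approved' }
```

Returning null instead of throwing leaves the "still processing" case to the caller, which usually maps to a 202 rather than an error.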
What the Portal Shows You
Open the Durable Functions Monitor in the Azure Portal (under your Function App → Functions → Durable Functions Monitor) and you’ll see each orchestrator instance with its own row: instance ID, current status (Running, Pending, Completed, Failed), created time, and last updated time. Click into a running instance and the history timeline shows every step — TaskScheduled, TaskCompleted, EventRaised — with timestamps. You can see exactly how long the orchestrator sat at waitForExternalEvent() before the human clicked approve. That latency number is genuinely useful for product conversations. The entity state is readable too: hit the entity query endpoint directly or use readEntityState() from a separate HTTP trigger to pull the full JSON of who approved and when. During an incident, this combination — orchestrator history plus entity state — tells you everything without touching a database.
Picking the Right Form for Your Situation
The choice that trips up most teams isn’t “should I use Durable Functions?” — it’s using the right form for the wrong job and then wondering why things break in production. Here’s how I think about it after hitting each of these mistakes personally.
Use an Orchestrator when you need to sequence or fan-out async work and you need visibility into that flow. The key word is sequence. If you’re calling three external APIs where step 2 depends on step 1’s output, and you want to see in the Azure Portal exactly which step failed and why — that’s the orchestrator’s job. It also shines for fan-out: spawn 50 parallel activity calls with Task.WhenAll, wait for all of them, aggregate. Without an orchestrator you’re duct-taping that logic together with queues and a tracking table.
// Orchestrator coordinating sequential + parallel work
[FunctionName("ProcessOrderOrchestrator")]
public static async Task RunOrchestrator(
[OrchestrationTrigger] IDurableOrchestrationContext context)
{
var orderId = context.GetInput<string>();
// Sequential: validate before charging
var validated = await context.CallActivityAsync<bool>("ValidateOrder", orderId);
if (!validated) return;
// Fan-out: notify warehouse + charge card simultaneously
var tasks = new[]
{
context.CallActivityAsync("ChargePayment", orderId),
context.CallActivityAsync("NotifyWarehouse", orderId)
};
await Task.WhenAll(tasks);
}
Use an Activity for any single unit of real work (an API call, a DB write, a file parse) where independent retry behavior matters. Activities are the only place in the Durable model where you should talk to the outside world. They get their own retry policy, they run exactly once per successful completion (from the orchestrator’s perspective), and they don’t participate in replay. That last part is critical. The retry config below backs off and makes up to 5 attempts (the first call plus 4 retries) without you writing a single try/catch loop:
var retryOptions = new RetryOptions(
    firstRetryInterval: TimeSpan.FromSeconds(5),
    maxNumberOfAttempts: 5)
{
    BackoffCoefficient = 2.0, // exponential waits between the 5 attempts: 5s, 10s, 20s, 40s
    MaxRetryInterval = TimeSpan.FromMinutes(2)
};
await context.CallActivityWithRetryAsync("WriteToDatabase", retryOptions, payload);
Use an Entity when you have per-instance state that multiple callers need to update or read concurrently. The classic example: a shopping cart that a mobile app, a web app, and a background sync process all touch simultaneously. Entities handle the concurrency for you via a virtual actor model — operations are queued and executed serially against the instance, so you never get a dirty read or a lost update. I’ve replaced Redis-backed session state with entities in a few projects and the operational overhead dropped significantly because you’re not managing cache invalidation separately.
The anti-pattern I see constantly: developers putting non-deterministic logic directly inside orchestrators. Deterministic computation (string manipulation, arithmetic, branching on activity results) is safe there, because it produces the same answer on every replay. The trouble starts when that logic reads the wall clock, generates a random value, or touches real data. This causes replay bugs that are genuinely hard to trace. The orchestrator replays its entire history every time it wakes up, and if your “pure” logic depends on DateTime.Now or any non-deterministic input, the replayed result won’t match the original and you’ll get silent state corruption or exceptions that only appear under load. The fix is mechanical: anything that touches real data or produces non-deterministic output belongs in an activity.
// WRONG — DateTime.Now in an orchestrator causes replay bugs
var isExpired = DateTime.Now > someDeadline; // don't do this
// RIGHT — use the context's deterministic time source
var isExpired = context.CurrentUtcDateTime > someDeadline;
Knowing when to skip Durable Functions entirely is just as important. If your background job has no branching logic — just “pull message, call API, done” — a plain queue-triggered Azure Function is simpler, cheaper, and easier for the next developer to understand. Anything that reliably completes in under 10 seconds doesn’t need the orchestration overhead or the storage account that Durable Functions requires for its history table. And honestly, if your team hasn’t internalized the replay model yet, shipping a Durable Function into production is risky — a junior dev adding a Guid.NewGuid() call inside an orchestrator will break things in ways that only surface under specific replay conditions. The learning curve is real. Match the tool to your team’s current level, not the most impressive architecture you can draw on a whiteboard.
FAQ
These are the questions I actually get asked most often about Durable Functions, the ones that don’t have obvious answers in the docs.
Can I run multiple orchestrators in parallel, or does everything queue behind one instance?
They run fully independently. Each orchestrator instance has its own history table partition in Azure Storage (or Netherite/MSSQL if you’ve swapped the backend). Spinning up 500 orchestrator instances simultaneously is fine — the bottleneck you’ll actually hit is storage transaction throughput, not some global lock. Where people get tripped up is calling context.df.callActivity in a loop without Task.all(). That serializes your activity calls needlessly. Use yield context.df.Task.all([...]) and let the runtime fan them out.
// Wrong — sequential, each waits for the last
for (const item of items) {
yield context.df.callActivity("ProcessItem", item);
}
// Right — parallel fan-out, all fire at once
const tasks = items.map(item => context.df.callActivity("ProcessItem", item));
const results = yield context.df.Task.all(tasks);
Why does my orchestrator function keep re-executing from the top?
That’s the replay mechanism working correctly, not a bug. The orchestrator replays its entire history every time it wakes up from a yield/await. The golden rule: never put non-deterministic code directly in the orchestrator body. No Math.random(), no Date.now(), no direct HTTP calls. Use context.df.currentUtcDateTime for timestamps and push side effects into activity functions. The moment you violate this, you get silent logic bugs where two replays produce different branching decisions.
What’s the actual difference between the three function forms?
The short version: orchestrators are the workflow coordinator — they call things but do no real work themselves. Activities are where your actual code lives: database writes, HTTP calls, file processing. Entity functions are stateful objects that persist between calls, useful for things like counters, shopping carts, or any state that multiple orchestrators need to share and mutate. Most tutorials start you with orchestrators and activities, then entities feel like a mystery third thing. Think of entities as tiny actors that queue incoming operations and process them one at a time.
Does the free tier cover real workloads?
Azure Functions consumption plan gives you 1 million free executions per month and 400,000 GB-seconds of resource consumption. For Durable Functions specifically, the hidden cost is Azure Storage transactions, not function invocations. Every orchestrator step reads and writes to queue/table storage. A workflow with 50 activity steps generates a lot more storage transactions than you’d expect. I’ve seen bills where the function cost was $0.12 but storage was $18. If you’re running high-throughput workflows, price out the Netherite storage backend, which batches storage ops aggressively and cuts transaction counts dramatically.
Can an entity function be called from outside an orchestrator?
Yes, and this is one of the underused patterns. You can signal entities, and read their state, directly from a regular client function: an HTTP trigger, a queue trigger, whatever. The difference is signalEntity (fire-and-forget, no return value) vs callEntity (waits for the result, only usable inside an orchestrator). If you just need to increment a counter or append to a log from an HTTP endpoint without spinning up a full orchestrator, signal directly:
// In an HTTP-triggered client function
const client = df.getClient(context);
const entityId = new df.EntityId("Counter", "myCounter");
// Fire and forget: await only confirms the signal was enqueued, not that the entity processed it
await client.signalEntity(entityId, "add", 1);
return { status: 202 };