Durable Functions are regular Lambda functions with “durable execution” enabled. This feature extends Lambda’s programming model with two new primitives — “steps” and “waits” — enabling automatic checkpointing, recovery after failures, and pausing execution for up to a year with no compute charges during idle periods. You don’t need to manage additional infrastructure or write custom state management and error handling code.
The underlying technical mechanism is checkpoint and replay: when a failure occurs, Lambda re-runs the function from the beginning but skips already completed steps (which have checkpoints) and reuses saved results, ensuring no progress is lost.
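The checkpoint-and-replay idea can be pictured with a small self-contained simulation. This is only a sketch of the concept, not the Durable Execution SDK: `ToyContext`, the `store` dictionary, and the `fail_on_second_step` flag are hypothetical names made up for illustration, and the real service persists checkpoints durably rather than in an in-memory dict.

```python
class ToyContext:
    """Hypothetical stand-in for a durable context; NOT the real SDK."""

    def __init__(self, checkpoints):
        self.checkpoints = checkpoints  # shared store that survives across attempts
        self.executed = []              # steps that actually ran in this attempt

    def step(self, name, fn):
        # Replay path: a checkpoint exists, so reuse the saved result.
        if name in self.checkpoints:
            return self.checkpoints[name]
        # First-time path: run the step, then checkpoint its result.
        result = fn()
        self.checkpoints[name] = result
        self.executed.append(name)
        return result


def workflow(ctx, fail_on_second_step):
    a = ctx.step("fetch", lambda: 21)
    if fail_on_second_step:
        raise RuntimeError("simulated crash before the second step")
    return ctx.step("double", lambda: a * 2)


store = {}  # stands in for durable checkpoint storage (assumption)

# Attempt 1: "fetch" completes and is checkpointed, then the function crashes.
first = ToyContext(store)
try:
    workflow(first, fail_on_second_step=True)
except RuntimeError:
    pass

# Attempt 2: the function re-runs from the top; "fetch" is skipped on replay.
second = ToyContext(store)
result = workflow(second, fail_on_second_step=False)
```

In this toy run the second attempt starts from the first line of `workflow`, yet only the "double" step executes, because "fetch" already has a saved result.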
With standard Lambda, your code runs from start to finish in a single invocation and is completely stateless. This creates three major limitations when building complex workflows:
Durable Functions solves these problems with two core primitives in the open-source Durable Execution SDK:
context.step(): Wraps a block of business logic for automatic checkpointing and retry. Once a step completes, it is skipped in subsequent replays.
context.wait() / callback: Pauses execution for a period of time or until an external signal is received (e.g., human approval). During the wait, the function is suspended with no compute charges, then automatically resumes.
The architecture diagram below illustrates an AI workflow using Durable Functions. The client calls a Lambda function (with durable execution enabled) through API Gateway. The function uses step() to call Bedrock (LLM) and to save results to DynamoDB, and each step is automatically checkpointed. It uses wait()/callback to pause for human approval and then resume. Execution state is emitted to EventBridge and CloudWatch for monitoring.
After enabling durable execution and adding the SDK, use DurableContext to wrap each business step. The sketch below uses context.step() for checkpointed work and context.wait_for_callback() for the pause:
def handler(event, context):
    # Step 1: call LLM - automatically checkpointed
    summary = context.step("summarize", lambda: call_llm(event))
    # Step 2: wait for human approval (free compute while waiting)
    decision = context.wait_for_callback("approval")
    # Step 3: only runs when approved - skips completed steps on replay
    if decision == "approved":
        context.step("persist", lambda: save_result(summary))
Note: although the total workflow can last up to a year, each individual Lambda invocation still runs for at most 15 minutes between checkpoint/wait points.
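To see how a handler of this shape behaves across a pause, it can be run against a toy context locally. The following is a minimal sketch under stated assumptions: `ToyContext`, `Suspended`, `run`, and the stub versions of `call_llm` and `save_result` are all hypothetical, and modelling the wait as "end the attempt, then replay once the signal arrives" is an illustration of the checkpoint-and-replay idea, not a description of the service's internals.

```python
class Suspended(Exception):
    """Signals that the current attempt should stop until a callback arrives."""


class ToyContext:
    """Hypothetical stand-in for a durable context; not the real SDK."""

    def __init__(self, checkpoints, callbacks):
        self.checkpoints = checkpoints  # saved step results, kept across attempts
        self.callbacks = callbacks      # external signals delivered so far

    def step(self, name, fn):
        # Run the step only if no checkpoint exists for it yet.
        if name not in self.checkpoints:
            self.checkpoints[name] = fn()
        return self.checkpoints[name]

    def wait_for_callback(self, name):
        # No signal yet: end this attempt instead of blocking.
        if name not in self.callbacks:
            raise Suspended(name)
        return self.callbacks[name]


calls = []  # records which business functions actually ran


def call_llm(event):
    # Stub for the LLM call (assumption: returns a summary string).
    calls.append("call_llm")
    return f"summary of {event['text']}"


def save_result(summary):
    # Stub for the persistence call.
    calls.append("save_result")
    return True


def handler(event, context):
    summary = context.step("summarize", lambda: call_llm(event))
    decision = context.wait_for_callback("approval")
    if decision == "approved":
        context.step("persist", lambda: save_result(summary))
    return decision


def run(event, checkpoints, callbacks):
    # One attempt: either the handler finishes or it suspends at a wait.
    try:
        return "done", handler(event, ToyContext(checkpoints, callbacks))
    except Suspended as waiting:
        return "suspended", str(waiting)


checkpoints, callbacks = {}, {}
event = {"text": "quarterly report"}

first_attempt = run(event, checkpoints, callbacks)   # stops at the wait
callbacks["approval"] = "approved"                    # the human approves
second_attempt = run(event, checkpoints, callbacks)  # replays and finishes
```

In this toy run the handler body executes twice, but `call_llm` runs only once: the second attempt reuses the "summarize" checkpoint, receives the approval, and goes straight on to "persist".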
Lambda Durable Functions was announced on December 2, 2025 (re:Invent 2025). It was initially generally available in US East (Ohio) with the Python (3.13, 3.14) and Node.js (22, 24) runtimes, and has since expanded to more regions. It can be enabled via the Console, AWS CLI, SAM, CloudFormation, CDK, or SDK. The Java SDK launched in Developer Preview on February 26, 2026.
AWS Lambda Durable Functions narrows the gap between Lambda's simplicity and the capabilities of workflow orchestration systems. Instead of piecing together Step Functions, SQS, and DynamoDB, you write sequential logic in a single function and let the SDK handle checkpointing, retries, and pauses. This makes it a good fit for multi-step AI workflows, order processing, and human-in-the-loop approval processes.