Handling Errors and Cleanup in Temporal Workflows (TypeScript)
A guide to error handling and cleanup in Temporal workflows, covering the Saga pattern, retry policies, and cleanup triggers on failure.

https://github.com/robinbraemer
Developing robust Temporal workflows involves anticipating failures and ensuring that any necessary cleanup or compensating actions occur regardless of how a workflow or activity fails. This includes handling activity exceptions, workflow errors, timeouts, and cancellations. Below, we outline best practices for error handling in Temporal (TypeScript), including the Saga pattern for compensating transactions, use of retry policies, and how to trigger cleanup activities on failure.
Common Failure Scenarios in Temporal
Temporal workflows can fail in several ways, each requiring proper handling:
Activity Failures: If an Activity (the code executed outside the Workflow) throws an exception or times out, the Workflow receives an ActivityFailure exception. The original error is wrapped inside it and is accessible via error.cause in the Workflow (Temporal Error Handling In Practice). By default, Temporal retries failed activities according to a retry policy, unless the error is marked as non-retryable (Temporal Error Handling In Practice). Once retries are exhausted (or immediately, if the error is non-retryable), the Workflow must handle the failure, e.g., via try/catch.
Workflow Failures: If the Workflow code throws an uncaught exception, none of its later steps run. In the TypeScript SDK, throwing a Temporal failure (an ApplicationFailure, or rethrowing a caught ActivityFailure or CancelledFailure) fails the Workflow Execution, while other errors such as a TypeError by default fail only the Workflow Task, which is then retried. Either way, catch exceptions in the Workflow so you can perform cleanup or compensation before it fails.
Cancellation: A Workflow cancellation (e.g., via handle.cancel()) causes a CancelledFailure to be thrown inside the Workflow, and cancellation is requested for any running activities (How to define cleanup for workflow cancellations (typescript-sdk) - Community Support - Temporal). Activities detect cancellation through heartbeats: Temporal delivers the cancellation on the activity's next heartbeat. Without special handling, the CancelledFailure propagates and ends the Workflow immediately, skipping any following steps.
Timeouts: Temporal supports activity timeouts (Start-To-Close, etc.) and Workflow Execution timeouts. An activity timeout is treated as an activity failure (an exception you can catch in the Workflow). A Workflow Execution Timeout, however, is effectively a hard termination: it stops execution without giving the Workflow a chance to clean up (Recommended way for running cleanup activity on workflow timeout - Community Support - Temporal). For business deadlines, avoid hard Workflow timeouts and termination, and use timers or cancellation logic inside the Workflow instead (Recommended way for running cleanup activity on workflow timeout - Community Support - Temporal). That way the Workflow can observe the deadline as a cancellation or error and perform cleanup.
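Because failures reach the Workflow wrapped (for example, an ActivityFailure whose cause is the error the activity threw), it can help to unwrap the cause chain before deciding how to react. The helper below is a hypothetical sketch, not an SDK API; it assumes only that each error object may carry a `cause` property, which is how the wrapped error is exposed.

```typescript
// Hypothetical helper (not part of the Temporal SDK): follows the `cause`
// chain of a wrapped error and returns the innermost error object.
export function findRootCause(err: unknown): unknown {
  const seen = new Set<unknown>(); // guards against accidental cause cycles
  let current: unknown = err;
  while (typeof current === 'object' && current !== null && !seen.has(current)) {
    seen.add(current);
    const next = (current as { cause?: unknown }).cause;
    if (next === undefined || next === null) break; // reached the innermost error
    current = next;
  }
  return current;
}
```

In a workflow's catch block, the result would typically be the ApplicationFailure thrown by the activity, whose `type` and `message` you can inspect to choose between retrying later, compensating, or failing fast.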
Understanding these scenarios allows us to design workflows that catch failures and ensure a cleanup Activity or compensating transaction runs in all cases.
Compensating Transactions and the Saga Pattern
For workflows that perform multiple steps (especially across external systems) and need all-or-nothing semantics, use the Saga pattern (compensating transactions). In Temporal, you implement Sagas by pairing each forward operation with a corresponding compensation operation that can undo it (Saga Compensating Transactions | Temporal). If any step fails, previously completed steps are rolled back by invoking their compensations in reverse order, leaving the overall system in a consistent state (as if the workflow's side effects never happened).
Temporal's TypeScript SDK does not have a built-in Saga helper (unlike the Java SDK’s Saga class), but it's straightforward to implement:
Perform each activity and record its compensation: After each successful activity, record a compensation function (e.g., an Activity that reverses that step) in a list. For example, if one activity creates a record, record a compensating activity that deletes it.
On failure, run compensations in reverse order: In the Workflow’s catch block, iterate through the recorded compensation functions and call them in reverse order of the original operations. Design each compensation to safely handle the case where the original operation did not fully complete (idempotent or conditional undo logic).
Optionally, handle compensation failures: If a compensation action itself fails, log or handle it appropriately (Temporal retries activities by default, so a failed compensation activity is retried as well). The workflow should still attempt all remaining compensations even if one of them fails.
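The three steps above can be packaged into a small reusable helper. This is a hypothetical sketch (the TypeScript SDK has no Saga class); it uses only plain Promises, so it stays deterministic when called from workflow code, where each compensation would be a proxied activity call.

```typescript
// Hypothetical Saga helper: not an SDK API, just plain Promise-based code.
export type Compensation = () => Promise<void>;

export class Saga {
  private readonly compensations: Compensation[] = [];

  // Record the undo step for an operation that has completed.
  addCompensation(fn: Compensation): void {
    this.compensations.push(fn);
  }

  // Run compensations newest-first (LIFO). A failing compensation does not
  // stop the others; its error is collected and returned to the caller.
  // Each compensation runs at most once, because it is popped before running.
  async compensate(): Promise<unknown[]> {
    const errors: unknown[] = [];
    while (this.compensations.length > 0) {
      const fn = this.compensations.pop()!;
      try {
        await fn();
      } catch (err) {
        errors.push(err);
      }
    }
    return errors;
  }
}
```

In a workflow you would call `await saga.compensate()` from the catch block inside `CancellationScope.nonCancellable(...)`, log the returned errors, and then rethrow the original failure.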
TypeScript Workflow example using Saga pattern:
```typescript
import { CancellationScope, log, proxyActivities } from '@temporalio/workflow';
// Type-only import: assumes '../activities' exports the six activities used below.
import type * as activities from '../activities';

const {
  createOrder, deleteOrder,
  reserveInventory, releaseInventory,
  chargePayment, refundPayment,
} = proxyActivities<typeof activities>({ startToCloseTimeout: '1 minute' });

type Compensation = () => Promise<void>;

export async function OrderWorkflow(): Promise<void> {
  // unshift() puts the newest compensation first, so iterating runs them in reverse order
  const compensations: Compensation[] = [];
  try {
    // Step 1: perform the operation, then record its compensation
    await createOrder();
    compensations.unshift(async () => { await deleteOrder(); });
    // Step 2: perform the next operation
    await reserveInventory();
    compensations.unshift(async () => { await releaseInventory(); });
    // Step 3: perform another operation
    await chargePayment();
    compensations.unshift(async () => { await refundPayment(); });
    // ... more steps as needed ...
  } catch (err) {
    // If any step fails (or the workflow is cancelled), undo the completed steps
    await CancellationScope.nonCancellable(async () => {
      for (const compensate of compensations) {
        try {
          await compensate();
        } catch (compErr) {
          // Log and continue with the next compensation even if one fails
          log.error('Compensation failed', { error: String(compErr) });
        }
      }
    });
    throw err; // rethrow to mark the workflow as failed after compensation
  }
}
```
In the above example, each successful step adds its compensating function to the front of the list with unshift, so the list behaves like a stack and the most recent step is undone first. If a failure occurs, we execute all collected compensations. We wrap the compensation loop in a non-cancellable scope so that it runs to completion even if the workflow was cancelled (more on this below). This pattern ensures that resources created or actions taken in earlier steps are undone when a later step fails (Saga Compensating Transactions | Temporal). Temporal’s documentation provides a similar example of collecting compensation callbacks in TypeScript (Saga Compensating Transactions | Temporal).
Best practices for Saga compensations:
Order and Idempotency: Invoke compensations in the reverse order of the original actions (LIFO), since the latest action should be undone first. Make each compensation idempotent and safe to run even if the original step partially failed or was never performed. For example, a “delete resource” compensation should succeed (or do nothing) if the resource doesn’t exist, as functions like putBowlAwayIfPresent in Temporal's saga example do (Saga Compensating Transactions | Temporal).
Marking Non-Retryable Errors: If a failure at a certain step is not transient (e.g., a business validation failure), consider throwing it from the activity as a non-retryable error using ApplicationFailure.nonRetryable (Temporal Error Handling In Practice). This prevents Temporal from retrying the activity endlessly (the default retry policy has no attempt limit) and instead fails fast, triggering the compensation logic.
Handling Compensation Failures: Give compensating activities their own retry policies; you generally want them retried on failure too, since cleanup is crucial. If a compensation ultimately fails, the Workflow should catch that error (as in the example above), log it for visibility, and continue with the remaining compensations. After compensating, the Workflow can still be marked failed by rethrowing the original error, or you can swallow that error if you treat a completed rollback as a “graceful” outcome.
Consistency Consideration: In rare cases, an original activity might complete after its compensation has run because of timing, for example when a delayed activity attempt finishes after the Workflow has already treated it as failed (Support for Saga compensating transactions in Typescript - Community Support - Temporal). To guard against this, design activities to have no effect once they are cancelled or once a compensating action has been performed. This may require application-level checks, e.g., the activity writes data with a version or token that the compensation invalidates (Support for Saga compensating transactions in Typescript - Community Support - Temporal). Such races are uncommon, but accounting for them is part of building fully reliable transactions on Temporal.
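One way to implement the application-level check described above is a tombstone: the compensation records that the operation was undone, and the forward operation refuses to apply once that record exists. The in-memory store below is a hypothetical stand-in for a database table; in a real system the check-and-write in `reserve` would need to be atomic in the data store itself (for example, a conditional write).

```typescript
// Hypothetical in-memory reservation store illustrating a tombstone guard.
type ReservationState = 'reserved' | 'released';

export class ReservationStore {
  private readonly state = new Map<string, ReservationState>();

  // Forward operation: applies only if no compensation has run for this order.
  // Returns false for a late or duplicate attempt after release().
  reserve(orderId: string): boolean {
    if (this.state.get(orderId) === 'released') return false;
    this.state.set(orderId, 'reserved');
    return true;
  }

  // Compensation: idempotent and safe even if reserve() never ran.
  // It leaves a tombstone so a delayed reserve() cannot resurrect the row.
  release(orderId: string): void {
    this.state.set(orderId, 'released');
  }

  isReserved(orderId: string): boolean {
    return this.state.get(orderId) === 'reserved';
  }
}
```

The design choice is that "released" is a terminal state rather than a deleted row: deleting would erase the evidence that the compensation ran, letting a straggling forward attempt succeed.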
Triggering Cleanup Activities on Failure
Not all failure scenarios require a full saga with multiple compensating steps. Often, you just need to run a single cleanup activity at the end of a workflow to release resources (delete temporary files, send a compensating notification, etc.) if the workflow fails. Temporal workflows can use standard try/catch/finally logic to ensure cleanup runs:
Try/Catch in the Workflow: Wrap your workflow logic in try { ... } catch (err) { ... } and call a cleanup activity in the catch block. The catch runs for any exception thrown in the try block, whether it is an activity failure or an error thrown by workflow code. For example:

```typescript
import { CancellationScope, isCancellation, proxyActivities } from '@temporalio/workflow';
// Type-only import: activities must be called through proxyActivities,
// never imported and invoked directly from workflow code.
import type * as activities from '../activities';

// Assumes '../activities' exports processFile, otherStep and cleanupTempFiles.
const { processFile, otherStep, cleanupTempFiles } = proxyActivities<typeof activities>({
  startToCloseTimeout: '5 minutes',
});

export async function FileProcessingWorkflow(input: string): Promise<void> {
  try {
    await processFile(input); // main activity that might fail
    await otherStep(input);
    // ... normal workflow logic ...
  } catch (err) {
    // Determine the failure type and perform cleanup
    if (isCancellation(err)) {
      // Workflow was cancelled: run cleanup in a non-cancellable scope,
      // otherwise the cleanup activity would be cancelled as well
      await CancellationScope.nonCancellable(() => cleanupTempFiles(input));
    } else {
      // Non-cancellation failure (activity failure or other error)
      await cleanupTempFiles(input);
    }
    throw err; // rethrow to fail the workflow after cleanup
  }
}
```

In this example, if any activity throws an error or the workflow is cancelled, the catch block triggers the cleanupTempFiles activity. When the error is a cancellation, we use CancellationScope.nonCancellable to shield the cleanup step from being cancelled (Interrupt a Workflow - TypeScript SDK | Temporal Platform Documentation). This matters because a workflow cancellation request cancels the workflow's root cancellation scope, which would otherwise cancel any activity scheduled afterwards. Running the cleanup in a non-cancellable scope ensures the cleanup activity is started and completed even if the workflow was cancelled mid-execution (How to define cleanup for workflow cancellations (typescript-sdk) - Community Support - Temporal). Temporal's documentation shows this pattern: detect a cancellation with isCancellation(error), then run the cleanup logic in a non-cancellable scope (Interrupt a Workflow - TypeScript SDK | Temporal Platform Documentation).
Using Finally: You can also use a finally block for cleanup logic that should run regardless of success or failure. If the workflow is cancelled, you still need the non-cancellable scope. Often the catch-and-rethrow approach above is sufficient, since you typically only want to clean up on failure; if something must happen on both success and failure, put it in finally and still handle cancellation as shown.
Cleanup Activity Implementation: The cleanup itself is just another activity. It can delete files, roll back a database change, send a compensating event, etc. Make the cleanup activity idempotent (safe to run multiple times), since Temporal retries it if it fails, and a retried or restarted workflow may attempt the cleanup again. For example, deleting a file that is already gone should not raise an error. This makes the cleanup robust.
Workflow Cancellation vs Termination: Always prefer cancellation over termination when you want workflows to do cleanup. Cancellation raises the cancellation exception inside the workflow, which you can catch as shown above to run compensations or cleanup (How to define cleanup for workflow cancellations (typescript-sdk) - Community Support - Temporal). In contrast, a termination (or an expiring execution timeout) stops the workflow immediately without giving it a chance to handle the event (Recommended way for running cleanup activity on workflow timeout - Community Support - Temporal). Thus, when you might stop a workflow externally and still need cleanup, use WorkflowHandle.cancel() rather than WorkflowHandle.terminate(). The Temporal team specifically advises using workflow cancellation (which is graceful) or internal timers for timeouts, instead of hard timeouts that the workflow cannot intercept (Recommended way for running cleanup activity on workflow timeout - Community Support - Temporal).
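For the finally variant described above, the cleanup logic can be factored into a helper that runs cleanup on both success and failure. This is a hypothetical sketch: `runWithCleanup` and `Shield` are illustrative names, and the `shield` parameter is where a workflow would pass `(fn) => CancellationScope.nonCancellable(fn)` (wrapped in an arrow function rather than passed as a bare static method).

```typescript
// Hypothetical helper: runs `work`, then always runs `cleanup`.
// `shield` wraps the cleanup call; the default simply invokes it.
export type Shield = <T>(fn: () => Promise<T>) => Promise<T>;

export async function runWithCleanup<T>(
  work: () => Promise<T>,
  cleanup: () => Promise<void>,
  shield: Shield = (fn) => fn(),
): Promise<T> {
  let result: T;
  try {
    result = await work();
  } catch (workErr) {
    // Clean up, but surface the original failure rather than a cleanup error
    // (a real workflow should log the ignored cleanup error).
    await shield(cleanup).catch(() => undefined);
    throw workErr;
  }
  // On success, a cleanup failure is reported to the caller.
  await shield(cleanup);
  return result;
}
```

Keeping the original error on the failure path is deliberate: a failed cleanup should not hide the reason the workflow failed in the first place.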
Retry Policies and Timeout Handling
Temporal’s built-in retry policies and timeout mechanisms play a role in how you handle failures and trigger cleanups:
Activity Retry Policy: By default, Temporal retries an activity that fails or times out, using exponential backoff and no limit on the number of attempts. You can configure the retry policy per activity (maximum attempts, intervals, non-retryable error types, etc.) or disable retries. Best practice is to allow retries for transient errors so the workflow can succeed without human intervention, and to bypass them only for errors that are definitively unrecoverable (e.g., validation errors). In those cases, throw ApplicationFailure.nonRetryable() from the activity (Temporal Error Handling In Practice) so the workflow catches it immediately and can perform compensation. For example, if the chargePayment activity in a saga gets a decline response (a business-rule failure), you might throw a non-retryable failure to trigger an immediate rollback instead of retrying the charge. Conversely, if an activity times out because of a network issue, letting Temporal retry it a few times is wise; only after the final failure does the workflow enter its compensation logic.
Workflow Retry Policy: You can also configure retries for the workflow as a whole; workflow retries are disabled by default. With a workflow retry policy, if a run fails, Temporal Server starts a new run of the workflow from the beginning. If you implement compensation inside the workflow, you typically do not want that, because a fresh run could duplicate work. In most cases, leave workflow retries disabled when using manual compensation logic, or handle idempotency carefully so that a retried run does not repeat side effects inconsistently. (This is an advanced scenario; it is usually simpler not to rely on workflow retries for saga-style workflows and to handle all cleanup within one run.)
Heartbeats for long-running Activities: If an activity performs a lengthy operation, call Context.current().heartbeat() (from @temporalio/activity) periodically and set a heartbeatTimeout when scheduling the activity. This tells the Temporal server that the activity is still alive and lets cancellation reach it quickly. If you cancel a workflow while an activity is running, the activity receives the cancellation only on its next heartbeat (How to define cleanup for workflow cancellations (typescript-sdk) - Community Support - Temporal). A well-behaved activity heartbeats regularly and also handles cancellation itself, e.g., by awaiting Context.current().cancelled (a promise that rejects with a CancelledFailure) or checking Context.current().cancellationSignal (an AbortSignal). Proper heartbeating ensures timely cancellation and therefore timely execution of your cleanup logic in the workflow.
Handling Workflow Timeouts Gracefully: As noted, a Workflow Execution Timeout is a kill switch with no chance to run cleanup code, so do not rely solely on it. Instead, enforce the deadline with a timer inside the workflow, e.g., Temporal’s sleep function, or CancellationScope.withTimeout, which cancels the work wrapped in it when the timeout elapses. When the deadline passes, throw a controlled exception or cancel the relevant scope from within the workflow, which triggers the cancellation flow you already handle. Another pattern is to have a parent workflow start a child workflow with a shorter timeout: if the child times out and fails, the parent catches that failure and can run a cleanup activity or compensation in its own context (Recommended way for running cleanup activity on workflow timeout - Community Support - Temporal). This parent-child approach turns an external timeout into a catchable error in the parent workflow.
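To make the backoff behaviour concrete, the sketch below computes the delay before each retry, assuming the interval formula from Temporal's retry policy documentation: initialInterval × backoffCoefficient^(n−1), capped at maximumInterval (default 100 × initialInterval), where maximumAttempts counts the first attempt and 0 means unlimited. `RetryPolicySketch` and `retryDelays` are hypothetical names; the SDK's own RetryPolicy takes durations such as '1 second' rather than plain milliseconds.

```typescript
// Hypothetical shape mirroring the documented RetryPolicy fields (milliseconds).
export interface RetryPolicySketch {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs?: number; // defaults to 100x the initial interval
  maximumAttempts?: number;   // includes the first attempt; 0/undefined = unlimited
}

// Delay before each retry, i.e. after failed attempt 1, 2, ... up to `failedAttempts`.
export function retryDelays(policy: RetryPolicySketch, failedAttempts: number): number[] {
  const maxInterval = policy.maximumIntervalMs ?? policy.initialIntervalMs * 100;
  // With maximumAttempts = N there are at most N - 1 retries.
  const limit =
    policy.maximumAttempts && policy.maximumAttempts > 0
      ? Math.min(failedAttempts, policy.maximumAttempts - 1)
      : failedAttempts;
  const delays: number[] = [];
  for (let n = 0; n < limit; n++) {
    delays.push(Math.min(policy.initialIntervalMs * policy.backoffCoefficient ** n, maxInterval));
  }
  return delays;
}
```

With a 1-second initial interval and coefficient 2, the first five retries wait 1, 2, 4, 8 and 16 seconds, which is why a transient outage usually resolves before the workflow ever reaches its compensation logic.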
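The timer-based deadline described above can be sketched as a race between the work and a timer that turns expiry into a catchable error. This is a hypothetical helper (`withDeadline` and `DeadlineExceededError` are illustrative names); `sleep` is injected so the sketch runs anywhere, and in workflow code you would pass `sleep` from '@temporalio/workflow', which is backed by a durable timer. Note that the race only stops waiting: the losing work keeps running, so to actually cancel an in-flight activity you would wrap it in `CancellationScope.withTimeout` instead.

```typescript
// Hypothetical deadline helper that turns an expired timer into a catchable error.
export class DeadlineExceededError extends Error {
  constructor(ms: number) {
    super(`deadline of ${ms} ms exceeded`);
    this.name = 'DeadlineExceededError';
    Object.setPrototypeOf(this, DeadlineExceededError.prototype); // keeps instanceof working on ES5 targets
  }
}

// Private sentinel: `work` can never resolve to this value.
const TIMED_OUT = Symbol('timed-out');

export async function withDeadline<T>(
  work: Promise<T>,
  ms: number,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  // Resolving with a sentinel (instead of rejecting) avoids an unhandled
  // rejection when the work finishes first and the timer fires later.
  const timer = sleep(ms).then(() => TIMED_OUT);
  const winner = await Promise.race([work, timer]);
  if (winner === TIMED_OUT) {
    // Controlled error: the caller's catch block can run cleanup or compensation.
    throw new DeadlineExceededError(ms);
  }
  return winner as T;
}
```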
Putting It All Together: Best Practices
To ensure a cleanup activity runs in all failure cases, combine the above techniques:
Always catch errors in the Workflow: Surround your critical workflow steps with try/catch and invoke cleanup or compensations in the catch block. This covers activity exceptions, activity timeouts (which surface as exceptions), and cancellation (which throws a CancelledFailure). Use isCancellation(error) to detect cancellations and run cleanup in a non-cancellable scope if needed (Interrupt a Workflow - TypeScript SDK | Temporal Platform Documentation). This way, even if the workflow is cancelled midway, your cleanup activity or compensating actions run before the workflow completes. It does not help with termination or a Workflow Execution Timeout, which stop the workflow outright.
Use CancellationScope.nonCancellable for cleanup steps: This Temporal API prevents a cancellation from aborting the cleanup. Activities and workflow code inside a non-cancellable scope ignore cancellation requests from parent scopes (Interrupt a Workflow - TypeScript SDK | Temporal Platform Documentation). In practice, wrap your cleanup activity call or compensation loop in CancellationScope.nonCancellable(...) whenever you call it from a catch or finally block. Temporal samples demonstrate this pattern for running cleanup after a cancellation (Interrupt a Workflow - TypeScript SDK | Temporal Platform Documentation).
Leverage the Saga pattern for multi-step workflows: If your workflow performs several distinct operations that need rollback, structure your code to collect a compensation for each step. The example above and Temporal’s saga tutorial code show how to do this in TypeScript (Saga Compensating Transactions | Temporal). This ensures that a failure at any point triggers cleanup of all prior successful steps.
Prefer workflow cancellation over termination/timeouts: Design your system to request cancellations when you need to stop a workflow early. This gives the workflow a chance to perform its cleanup. Avoid terminating workflows except in truly unrecoverable situations, as it will not run any workflow code thereafter (Recommended way for running cleanup activity on workflow timeout - Community Support - Temporal).
Configure retries thoughtfully: Let Temporal handle transient failures with retries, but mark permanent errors as non-retryable to fall out to your compensation logic. Also consider the retry policy on your cleanup activity – you may want it to retry on failure (so the cleanup itself is robust).
Consult Temporal documentation: Temporal’s docs and community resources discuss failure handling extensively. For example, the Temporal docs on cancellation and scopes provide guidance on using CancellationScope and checking for CancelledFailure (Interrupt a Workflow - TypeScript SDK | Temporal Platform Documentation), and the Temporal blog covers the Saga pattern with examples in multiple languages (Saga Compensating Transactions | Temporal). These provide additional context and examples.
By following these practices, you can ensure that whether an activity crashes, a third-party service call times out, or a cancellation request comes in, your Temporal workflow reliably executes the necessary cleanup or rollback logic before completing (termination and hard execution timeouts remain the exception, as noted above). This leads to more resilient, correct applications that handle errors gracefully in complex long-running processes.
Sources:
Temporal Community Forum – Handling workflow cancellation and ensuring cleanup (How to define cleanup for workflow cancellations (typescript-sdk) - Community Support - Temporal)
Temporal Documentation – Workflow Cancellation and Scopes (TypeScript) (Interrupt a Workflow - TypeScript SDK | Temporal Platform Documentation)
Temporal Blog – Saga Pattern (Compensating Transactions) in TypeScript (Saga Compensating Transactions | Temporal)
Temporal Community Forum – Saga compensations and activity completion considerations (Support for Saga compensating transactions in Typescript - Community Support - Temporal)
Flightcontrol Blog – Temporal Error Handling (error wrapping and non-retryable errors) (Temporal Error Handling In Practice)
Temporal Community Forum – Workflow timeouts vs. cleanup best practices (Recommended way for running cleanup activity on workflow timeout - Community Support - Temporal)


