Once incoming emails have been processed by BotMailRoom, we need to send a webhook to the user’s application. Since we’re sending data to a system we don’t control, if something is temporarily wrong with that system, we basically have two options:
- Give up immediately
- Retry until delivery succeeds or we reach a limit and give up
Different systems have different retry policies, but generally there is some fixed schedule (e.g. first retry 10 seconds later, then 1 minute, then 1 hour, etc.) with some maximum number of retries before giving up. There’s an almost infinite number of ways to implement these policies in practice (e.g. the tenacity package for Python), but Temporal provides a nice way to do this as part of our workflow with the activity retry policy:
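from datetime import timedelta

from temporalio import common, workflow

# Each retry waits 4x longer than the previous one, starting at 30 seconds.
WEBHOOK_RETRY_POLICY = common.RetryPolicy(
    backoff_coefficient=4,
    initial_interval=timedelta(seconds=30),
    maximum_interval=timedelta(hours=12),
    maximum_attempts=7,
)

# Inside the workflow's run method:
await workflow.execute_activity(
    send_webhook,
    SendWebhookInput(
        ...
    ),
    start_to_close_timeout=timedelta(seconds=30),
    retry_policy=WEBHOOK_RETRY_POLICY,
)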
This policy will retry the activity up to 6 times (1 initial attempt + 6 retries = 7 attempts maximum), waiting the following intervals between attempts:
- 30 seconds
- 2 minutes
- 8 minutes
- 32 minutes
- 2 hours 8 minutes
- 8 hours 32 minutes
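The 12-hour maximum_interval never comes into play with this policy: a seventh retry would wait 12 hours (capped from 34 hours 8 minutes), but maximum_attempts=7 stops after the sixth retry.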
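To double-check the schedule, here is a small sketch in plain Python that reproduces the arithmetic. The `retry_intervals` helper is a hypothetical name for illustration and not part of Temporal; it assumes each wait is `initial_interval * backoff_coefficient ** n`, capped at `maximum_interval`, with one fewer wait than attempts.

```python
from datetime import timedelta


def retry_intervals(initial_interval, backoff_coefficient, maximum_interval, maximum_attempts):
    """Return the wait before each retry (hypothetical helper, not a Temporal API)."""
    intervals = []
    # The first attempt runs immediately, so there are maximum_attempts - 1 waits.
    for retry in range(maximum_attempts - 1):
        wait = initial_interval * (backoff_coefficient ** retry)
        intervals.append(min(wait, maximum_interval))
    return intervals


schedule = retry_intervals(
    initial_interval=timedelta(seconds=30),
    backoff_coefficient=4,
    maximum_interval=timedelta(hours=12),
    maximum_attempts=7,
)

for wait in schedule:
    print(wait)

# Total time spent waiting between the first attempt and the last one.
print("total wait:", sum(schedule, timedelta()))
```

With the values from the policy above, this gives six waits that add up to 11 hours 22 minutes 30 seconds. The real elapsed time will be a little longer, since each attempt can itself take up to the 30-second start_to_close_timeout.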
We don’t need to worry about issues with our own system that might impact the retries (e.g. incorrect retry logic, a node failure where we aren’t tracking the number of times the webhook has been sent, etc.), because Temporal handles that for us. If the webhook still fails after every attempt, we can notify the user from within the same Temporal workflow.