On 2024-05-20 at 12:40 GMT-7, Courier experienced a sudden spike in outbound message volume. All messages were sent normally. However, the queue used to process message update events became overwhelmed and could not process events as quickly as they were produced. As the queue backed up, message status updates were delayed.
Although the queue would eventually have recovered on its own, the engineering team chose to increase queue capacity to resolve the issue more quickly. The increase was implemented at 14:43, and all enqueued message update events had been processed by 14:50.
Messages processed between 12:40 and 14:43 experienced a delay in status updates of up to 400 seconds, with a typical delay of about 100 seconds. There was no delay in message processing or delivery; all message update events were eventually processed. Outbound webhooks, which depend on the impacted queue, were similarly delayed, as were message statuses shown in the Logs UI and reported by the API.