On February 2, 2026, a deployment introduced a configuration change that referenced a file not present in our production build artifacts, causing backend services to become unavailable. API endpoints, automations, and tracking functionality were impacted for 2 hours and 17 minutes in the US region and 2 hours and 42 minutes in the Ireland region. Recovery was prolonged by a concurrent outage at GitHub Actions, our CI/CD provider, which prevented our standard automated rollback. Our team identified the root cause, executed a manual rollback independent of the affected provider, and restored full service across all regions. We have defined targeted action items to prevent recurrence.
All times in PST.
| Time | Event |
|---|---|
| 10:59 | Deployment of latest release initiated through standard CI/CD pipeline |
| 11:44 | Deployment completed and went live; services immediately began experiencing errors due to a missing configuration dependency |
| 11:49 | GitHub Actions, our CI/CD provider, experienced a complete outage, preventing standard rollback procedures |
| 11:52 | Monitoring alerts triggered; engineering team engaged |
| 12:00 | Incident declared; rollback initiated; engineering team assembled |
| 12:07 | Status page updated — issue identified and rollback in progress. Rollback ends up being blocked by GitHub actions outage. |
| 12:39 | Team pivoted to an alternative manual rollback approach. This required testing and validating the new approach. |
| 13:50 | Team executed the alternative manual rollback independent of GitHub Actions after they were fully satisfied that the new approach was safe. |
| 14:01 | Manual rollback completed in US region; services confirmed operational |
| 14:26 | Ireland region deployment completed; services confirmed operational |
| 15:13 | All services verified stable across all regions; incident resolved |
The disruption was traced to a configuration change included in the latest release. The change introduced a startup dependency on a utility file that was intended to be bundled with the deployment package. However, the file was not included in the production build artifacts. When backend services attempted to initialize, they were unable to locate the required file and could not start, resulting in all incoming requests being rejected.
This discrepancy was not caught prior to production deployment because the file was present and functioning correctly in the development environment. The difference in how build artifacts are assembled between development and production environments meant the issue only manifested in production.
| # | Action Item | Owner | Priority | Status |
|---|---|---|---|---|
| 1 | Require successful staging deployment and smoke tests before any production deployment | Engineering | P1 | In Progress |
| 2 | Improve the reliability of automated smoke tests | Engineering | P1 | In Progress |
| 3 | Add build-time validation to confirm all referenced startup dependencies are present in deployment packages | Engineering | P2 | Open |
| 4 | Adopt an expedited rollback process as the standard emergency procedure, independent of CI/CD provider availability, reducing recovery time by approximately 35 minutes | Engineering | P2 | In Progress |