Atuin Incident Report: 2026-02-11

On Feb 11, 2026, Atuin was down for approximately 2 hours.

This is our longest outage in roughly 5 years, and I am very sorry for any impact it may have caused.

Cause

About two months ago, we moved from our own self-hosted Hetzner servers to Railway.

Earlier today, Railway had an outage lasting several hours that affected a subset of their users. Their full incident report is available here:

https://blog.railway.com/p/incident-report-february-11-2026

In short, a change to their abuse detection caused valid workflows to be sent SIGTERM. This matches our logs: our container had been shut down, did not come back up, and requests to our service were then returning 502.

Impact

Clients could not sync, and new users could not register. The client is designed so that the remote going offline does not affect regular operation: shell history was still stored and searched locally as usual. However, it would not sync with our service or with other devices.

Detection

Our paging was misconfigured and did not wake me overnight. Upon waking, I saw messages from users alerting us to the downtime.

Remediation

Within about 15 minutes, we migrated our API back to Hetzner, and we will be investing in our own infrastructure there once again.

Prevention

  1. We should have been paged sooner.

  2. We need more control over, and visibility into, our infrastructure. Vendor choice is our responsibility, as are our uptime and our commitment to users, and we will be taking steps to ensure we don’t face issues like this in the future. While I empathize that building a cloud provider is hard, and I admire what Railway is doing, several hours of downtime without clear and regular communication is not what we’re looking for in our infrastructure.
