Post-Mortem: Backend Incident
Earlier, an issue with our mailing system caused instability in parts of our backend.
We are sorry for the disruption and for any confusion this may have caused.
What happened
The mailing system started failing. Part of our backend still depended directly on it, so those failures spread into the backend, which then restarted more often than it should have.
User impact
- Some email notifications were delayed or not sent
- Some backend features, including license validation, may have been briefly unstable
- Copying operations for existing instances were not affected
Root cause
The issue originated in the mailing system, not in the trade-copying engine. The underlying problem was that the mailing system was still too tightly coupled to other backend processes, so a failure in mailing could destabilize them.
What we changed
- We fixed the failure path that was causing the backend instability
- We decoupled the mailing system from the rest of the platform
- We added safeguards so that if the mailing system fails again, the failure stays isolated and does not affect other parts of the platform
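For readers interested in what "decoupling" and "isolation" can mean in practice, here is a simplified, hypothetical sketch of one common pattern. It is not our production code, and it assumes a Python service in which a send function (here `send_fn`) talks to the mail server. `IsolatedMailer`, `notify`, and `send_fn` are illustrative names. Callers such as license validation only place an email on a bounded in-memory queue and return immediately. A background worker does the actual sending, and any error it hits is logged and counted rather than passed back to the caller.

```python
import logging
import queue
import threading
from typing import Callable, Optional, Tuple

logger = logging.getLogger("mail_isolation_sketch")


class IsolatedMailer:
    """Hypothetical wrapper: callers enqueue emails, and a worker thread sends them.

    Send failures are logged and counted. They are never raised to the caller.
    """

    def __init__(self, send_fn: Callable[[str, str], None], max_pending: int = 1000) -> None:
        self._send_fn = send_fn  # hypothetical transport, e.g. a call into an SMTP client
        # Bounded queue so that a mail outage cannot grow memory without limit.
        self._queue: "queue.Queue[Optional[Tuple[str, str]]]" = queue.Queue(maxsize=max_pending)
        self.failed = 0   # sends that raised an exception (updated by the worker only)
        self.dropped = 0  # emails rejected because the queue was full
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def notify(self, to: str, body: str) -> bool:
        """Non-blocking. Returns False and drops the email if the queue is full."""
        try:
            self._queue.put_nowait((to, body))
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            try:
                if item is None:  # shutdown sentinel
                    return
                to, body = item
                try:
                    self._send_fn(to, body)
                except Exception:
                    # The failure stays inside the worker and is only logged.
                    self.failed += 1
                    logger.exception("mail send failed for %s", to)
            finally:
                self._queue.task_done()

    def close(self) -> None:
        """Process everything already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()


if __name__ == "__main__":
    def failing_send(to: str, body: str) -> None:
        raise ConnectionError("mail server unavailable")

    mailer = IsolatedMailer(failing_send)
    mailer.notify("user@example.com", "Your license was renewed")
    # The caller keeps running normally even though every send fails.
    mailer.close()
    print(mailer.failed)  # prints 1
```

The main design choice is that the caller never waits on the mail server. When mail is down, the cost is delayed or dropped emails, which matches the user impact described above, and other processes are not dragged down with it.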
Current status
The fix has been applied, and we are monitoring the system closely.
Bottom line
This incident originated in the mailing side of the platform. Existing copy operations continued running normally, and the mailing system is now isolated, so a similar failure should not cause wider disruption in the future.