When the standard release process cannot be followed, and how to document the change afterwards.
The standard path to changing production is the release PR; see Change management. Emergency change covers the rare cases where that path cannot be used and a person must operate against production directly.
The default response to most emergencies is to wait it out. The standard release process is fast enough that emergency change is rarely the better answer. Use it only when there is no realistic alternative.
When emergency change applies
A few real scenarios:
- GitHub is unavailable, so the release pipeline cannot run, and a change genuinely must ship before GitHub recovers.
- The platform is in a state that the standard process cannot recover it from, and recovery requires manual intervention that the standard process cannot express.
- Infrastructure that is deliberately not automated is being changed (for example, the locked-down security AWS account).
If the situation does not match one of these, the answer is the release PR.
How a manual change is made
No one has production access by default. An engineer who needs to make a manual change requests elevated AWS privileges; the CTO approves the request and grants access by raising the engineer's AWS SSO permissions for the duration of the work. The mechanics are the same as those described in the Authorised actors section of Change management.
Recording the change
The record of what was done depends on whether the change is part of an active incident:
- During an active incident: the incident in incident.io is the record. Notes added to the incident describe what was done and why.
- Outside an incident: raise a Jira ticket capturing what is being changed and why. The ticket must be signed off by the CTO, or by another person if the CTO is the one making the change. A comment on the ticket from the person signing off is sufficient.
The preference is to raise the ticket or open the incident before the change is made. If the situation is too urgent for that, the record must be in place within 24 hours of the change.
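The recording rules amount to a short checklist: a record must exist, it must be in place within 24 hours of the change, and outside an incident the Jira ticket needs sign-off from the CTO (or from someone else when the CTO made the change). As an illustration only, the sketch below encodes those rules in Python. `EmergencyChange`, `record_problems` and their fields are hypothetical names, not an existing tool, and the sketch assumes all timestamps are `datetime` values in the same timezone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical: the record must exist within 24 hours of the change.
RECORD_DEADLINE = timedelta(hours=24)


@dataclass
class EmergencyChange:
    made_by: str
    made_at: datetime
    during_incident: bool
    # When the incident.io notes or Jira ticket were in place (None if never).
    record_created_at: Optional[datetime]
    # Who signed off the Jira ticket; not required during an active incident.
    signed_off_by: Optional[str] = None


def record_problems(change: EmergencyChange, cto: str) -> list[str]:
    """Return the ways a change's record falls short of the rules (empty if none)."""
    problems = []
    if change.record_created_at is None:
        problems.append("no record")
    elif change.record_created_at - change.made_at > RECORD_DEADLINE:
        problems.append("record created more than 24 hours after the change")

    # Sign-off applies only to the Jira route, i.e. outside an incident.
    if not change.during_incident:
        if change.signed_off_by is None:
            problems.append("Jira ticket not signed off")
        elif change.signed_off_by == change.made_by:
            problems.append("signed off by the person who made the change")
        elif change.made_by != cto and change.signed_off_by != cto:
            problems.append("ticket must be signed off by the CTO")
    return problems
```

A record raised before the change yields a negative time difference and so always passes the deadline check, which matches the stated preference for raising the record first.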
After the change
Every emergency change is followed by a post-incident review (PIR). The PIR covers what happened, why the standard process could not be used, and what would prevent the same situation from requiring emergency change in future.