
Incident response

How Changineers detects, responds to, and contains security incidents, and how it notifies affected parties about breaches.

This page is the engineering procedure for security incidents and confirmed or suspected breaches. It’s deliberately separate from routine production-outage response: security incidents have different communication defaults (no automatic status-page post, disclosure gated by the CEO and legal counsel, timing constrained by regulatory or investigative needs).

For routine production alarms and rollbacks, see Change management § Recovery. For customer-facing platform availability, the status page is the default channel.

The Incident Response Plan policy is published in the Changineers trust portal.

A security event is anything observable that touches the confidentiality, integrity, availability, or privacy of Changineers data, systems, or networks. A security incident is a security event that has caused, or is likely to cause, loss or damage.

Triage decides which is which.

If you discover or suspect a security incident, raise it immediately. Channels:

  • security@changineers.com.au for reports from outside Changineers (customer reports, researcher disclosures, vendor advisories).
  • incident.io for production-platform issues. It pages on-call and opens a Slack channel for coordination.
  • #security in Slack for anything that doesn’t fit either of the above. Tag @channel to surface it.

Record what you observed, when, and how. Mark speculation about cause as speculation.

External researchers report through Responsible disclosure.

Whoever triages the report assigns severity, and revises it as investigation continues.

  • P0 / Critical: Active exploitation in progress, or confirmed compromise of customer data, production secrets, or the AWS root account. Any risk of harm to an individual arising from the incident.
  • P1 / High: Strong indication of compromise without confirmed exploitation, or a vulnerability with direct exploitation risk. Lost or stolen unencrypted device. Suspected adversary persistence (backdoors, malware). Unauthorised access to business data.
  • P2 or P3 / Medium and Low: Suspicions or odd behaviours that need investigation but show no clear sign of risk. Suspicious emails. Unusual but unverified activity.

The on-call schedule lives in incident.io. Look there for who’s currently paged for security incidents and how to escalate.

When you declare an incident, incident.io assigns the paged on-call engineer as the Incident Manager by default. The Incident Manager runs the response and can hand off the role to someone else when scope or expertise demands.

The Incident Manager has authority to take technical containment, eradication, and recovery actions without prior approval from legal or executive staff. Customer-facing or business remediation (issuing credits, identity-protection services, public mitigation statements) requires CEO and legal-counsel approval.

For P0 incidents and any suspected breach, page the CTO as well. The CTO may take over as Incident Manager or appoint someone else depending on the situation. P1 incidents notify the CTO via Slack. P2 and P3 issues open a ticket and route to the appropriate team.

If the CTO is the suspected actor or otherwise conflicted, escalate to the CEO. The CEO engages external legal counsel from there.

For P0 and P1 incidents the active response runs through three steps, aligned with the Detect, Respond, and Recover Functions of NIST SP 800-61 r3. The pre-incident Functions (Govern, Identify, Protect) live in the rest of this handbook.

  1. Detect. Confirm severity, identify scope, and decide whether to declare formally in incident.io. Capture indicators of compromise as they surface.
  2. Respond. Preserve evidence before destructive remediation: export relevant CloudTrail events, capture AMIs and EBS snapshots of affected instances, and hold any affected laptop intact (a snapshot sketch follows this list). Limit the impact: revoke compromised credentials, isolate affected hosts, block malicious traffic at the WAF or Security Group level. Then remove the cause: rotate keys, patch vulnerable code, remove backdoors. Run breach determination (see § Breach determination and notification) in parallel with the technical response.
  3. Recover. Restore systems and verify integrity. Watch for reinfection or persistence. Return to normal operations once the cause is removed and the fix is verified.
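
Step 2's evidence-preservation actions map fairly directly onto the EC2 API. The sketch below, assuming boto3 credentials with EC2 permissions, snapshots the volumes of one affected instance and captures an AMI without a reboot; the instance ID, incident tag, and region are placeholders, not real values.

    # Sketch: preserve evidence for a suspected-compromised EC2 instance.
    # The instance ID, incident tag, and region below are placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="ap-southeast-2")    # assumed region
    instance_id = "i-0123456789abcdef0"                        # hypothetical affected instance
    incident_tag = [{"Key": "incident", "Value": "INC-0000"}]  # placeholder incident reference

    # Snapshot every EBS volume attached to the instance.
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    for reservation in reservations:
        for instance in reservation["Instances"]:
            for mapping in instance.get("BlockDeviceMappings", []):
                volume_id = mapping["Ebs"]["VolumeId"]
                ec2.create_snapshot(
                    VolumeId=volume_id,
                    Description=f"IR evidence: {volume_id} from {instance_id}",
                    TagSpecifications=[{"ResourceType": "snapshot", "Tags": incident_tag}],
                )

    # Capture an AMI without rebooting, so the running host is left untouched.
    ec2.create_image(
        InstanceId=instance_id,
        Name=f"ir-evidence-{instance_id}",
        NoReboot=True,
        TagSpecifications=[{"ResourceType": "image", "Tags": incident_tag}],
    )

Isolation of the host (security-group changes, termination) comes after these calls return, so the evidence exists before anything destructive happens.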

Improvement runs continuously through the response. Apply detection or preventive changes as you identify them. Open follow-up tickets immediately for changes that can’t be made mid-incident. The post-mortem captures the timeline and decisions once the incident is closed; most preventive improvements should already be in flight by then.

Use Slack huddles in the incident channel for real-time discussion. Post decisions and major findings back to the channel so the timeline stays complete.

incident.io is the system of record for declared incidents. The incident timeline, the Slack channel transcript, and the post-mortem attach to the incident record.

Write a post-mortem for every P0; produce one by default for P1. The CTO decides whether a wider post-mortem meeting is called.

Keep incident records for at least seven years.

If Changineers’ Slack, Google Workspace, or AWS environment is suspected to be part of the compromise, the Incident Manager moves the response to an out-of-band channel. SMS or personal phone numbers are the fallback; retrieve them from the Google Workspace directory. The Incident Manager announces the switch and propagates the new channel to all responders.

Keep all incident discussion on the new channel after the switch.

If the suspected actor is an employee, contractor, or vendor, treat the incident as sensitive. Escalate the Incident Manager role to the CTO immediately; the CTO contacts the CEO directly. Keep discussion inside the response team.

The AWS root account is the highest-privilege identity in Changineers’ AWS Organization. Compromise of root means the attacker can disable CloudTrail, create IAM users, exfiltrate data from any account, and lock Changineers out. Treat any indicator of root use as P0 until proven otherwise.

  • Activity recorded against the root principal in CloudTrail (see the query sketch after this list).
  • Root MFA device added, replaced, or removed.
  • Root password reset.
  • Any new IAM user. Changineers uses SSO; IAM users are emergency break-glass only.
  • CloudTrail or AWS Config disabled.
  • AWS account contact details (email, phone, billing address) changed without a corresponding change ticket.
  • GuardDuty findings indicating credential exfiltration, anomalous API calls from the root principal, or unexpected EC2 or Lambda activity in unused regions.
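
The first indicator can be checked directly from CloudTrail's event history. A minimal sketch, assuming boto3 credentials with cloudtrail:LookupEvents in the region being checked; LookupEvents only covers roughly the last 90 days of management events in that region, so longer or cross-region windows need the CloudTrail archive in S3.

    # Sketch: list recent management events made by the AWS root principal.
    # The region and look-back window are placeholders; run per region of interest.
    import json
    from datetime import datetime, timedelta, timezone

    import boto3

    cloudtrail = boto3.client("cloudtrail", region_name="ap-southeast-2")
    start = datetime.now(timezone.utc) - timedelta(days=7)  # suspected window (placeholder)

    paginator = cloudtrail.get_paginator("lookup_events")
    for page in paginator.paginate(StartTime=start):
        for event in page["Events"]:
            detail = json.loads(event["CloudTrailEvent"])
            if detail.get("userIdentity", {}).get("type") == "Root":
                print(event["EventTime"], event["EventName"],
                      detail.get("sourceIPAddress"), detail.get("awsRegion"))

The same query, widened to the full suspected window, also serves step 4 of the response below.
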
If an indicator fires and can’t be ruled out quickly, work through the following steps:

  1. Report the compromise to AWS through the “report a compromised AWS account” path on AWS Support, and open a parallel high-severity AWS Support case for the account. The compromise report routes to AWS Trust & Safety, who can lock or assist on the account; the support case keeps an AWS support engineer engaged.
  2. Regain control of the root credentials and MFA, working with AWS if the attacker has already replaced them. Once back in control, rotate the root password and the root MFA device.
  3. Rotate every IAM access key, SSO session, and CI/CD credential that touches AWS (a key-deactivation sketch follows this list).
  4. Review CloudTrail for actions taken by the root principal during the suspected window. Open child runbooks for each action found.
  5. Review IAM for cross-account roles, new IAM users, or trust-policy changes the attacker may have left as persistence.
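
For step 3, the IAM portion can be scripted as a first pass. The sketch below, assuming break-glass credentials allowed to list users and update access keys, deactivates rather than deletes every active key so the keys stay visible to the investigation; SSO sessions and CI/CD credentials still have to be revoked in their own systems.

    # Sketch: deactivate all IAM user access keys after a suspected root compromise.
    # Keys are set Inactive, not deleted, so they remain available as evidence.
    import boto3

    iam = boto3.client("iam")

    for page in iam.get_paginator("list_users").paginate():
        for user in page["Users"]:
            metadata = iam.list_access_keys(UserName=user["UserName"])["AccessKeyMetadata"]
            for key in metadata:
                if key["Status"] == "Active":
                    iam.update_access_key(
                        UserName=user["UserName"],
                        AccessKeyId=key["AccessKeyId"],
                        Status="Inactive",
                    )
                    print(f"Deactivated {key['AccessKeyId']} ({user['UserName']})")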

Re-enable CloudTrail and AWS Config if they were disabled. Audit S3 for buckets created or modified during the window. Audit Lambda, EC2, and ECS for resources created during the window.
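
A small sketch of the S3 part of that audit, assuming boto3 credentials with s3:ListAllMyBuckets and placeholder window dates; it only surfaces newly created buckets, so modifications to existing buckets still come from the CloudTrail review in step 4.

    # Sketch: flag S3 buckets created during the suspected compromise window.
    # The window boundaries are placeholders; set them from the investigation.
    from datetime import datetime, timezone

    import boto3

    window_start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # placeholder
    window_end = datetime(2024, 1, 8, tzinfo=timezone.utc)    # placeholder

    s3 = boto3.client("s3")
    for bucket in s3.list_buckets()["Buckets"]:
        if window_start <= bucket["CreationDate"] <= window_end:
            print("Created during window:", bucket["Name"], bucket["CreationDate"])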

Once contained and recovered, run the full post-mortem. Update this runbook with anything that didn’t work as expected.

A breach is unauthorised access to, or disclosure of, unencrypted customer or personal data. Not every incident is a breach. The CEO, in consultation with external legal counsel, makes the breach determination based on the investigation output.

Once a breach is confirmed, Changineers notifies affected customers without undue delay. The internal target is 24 to 48 hours from confirmation. MSA contractual terms may set tighter limits; check the affected customer’s agreement first. Notification goes via the customer’s primary contact (email and phone where available).

Customers are responsible for notifying affected end users.

The notification includes:

  • A brief description of what happened, the date of the breach, and the date of discovery.
  • The categories of data involved.
  • Steps customers can take to protect themselves and their users.
  • What Changineers is doing to investigate, contain, and prevent recurrence.
  • Contact details for follow-up questions (security@changineers.com.au).

Australia’s Notifiable Data Breaches scheme under the Privacy Act applies. For breaches that meet the eligible data breach threshold (likely to result in serious harm), notification goes to:

  • The Office of the Australian Information Commissioner (OAIC).
  • Individuals at likely risk of serious harm, normally via the affected customers as described above.

The CEO and legal counsel decide whether the threshold is met.

If a law-enforcement agency requests a delay in writing, citing either an active investigation or national-security grounds, defer notification for the period the request specifies. Document the request and its scope.

Verbal requests must be confirmed in writing by the requesting agency within 30 days; without that written confirmation, proceed with notification.

The Incident Manager keeps a breach log covering: dates of breach and discovery, data categories involved, number of records and customers affected, notifications sent, and remediation steps.
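
Keeping log entries to a fixed shape makes them easier to compare across incidents. The structure below is only an illustration of one possible shape; the field names mirror the list above and are not a prescribed format.

    # Sketch: one possible record shape for a breach-log entry. Illustrative only;
    # the field names are assumptions drawn from the list above.
    from dataclasses import dataclass, field
    from datetime import date


    @dataclass
    class BreachLogEntry:
        breach_date: date             # when the breach occurred
        discovery_date: date          # when Changineers discovered it
        data_categories: list[str]    # categories of data involved
        records_affected: int         # number of records affected
        customers_affected: int       # number of customers affected
        notifications_sent: list[str] = field(default_factory=list)  # who was notified and when
        remediation_steps: list[str] = field(default_factory=list)   # remediation taken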

Only the CEO and legal counsel authorise external statements about a security incident or breach. Do not post to the status page or any other public channel without that approval. Disclosure may be delayed by regulatory windows, ongoing investigation, or to avoid tipping off the attacker.

Exercise the plan at least annually through a tabletop walkthrough or a live drill in non-production accounts. Document findings and track action items to closure.