2026.1
The Changineers Contingency Plan establishes procedures to recover Changineers operations following a disaster-related disruption. This Disaster Recovery Policy is maintained by the Changineers Security Officer and Privacy Officer.
Policy Statements
Changineers policy requires that:
(a) A plan and process for business continuity and disaster recovery (BCDR), including the backup and recovery of systems and data, must be defined and documented.
(b) BCDR shall be simulated and tested at least once a year. Metrics shall be measured, and identified recovery enhancements shall be documented and implemented to improve the BCDR process.
(c) Security controls and requirements must be maintained during all BCDR activities.
Controls and Procedures
BCDR Objectives and Roles
Objectives
The following objectives have been established for this plan:
- Maximize the effectiveness of contingency operations through an established plan that consists of the following phases:
  - Notification/Activation phase to detect and assess damage and to activate the plan;
  - Recovery phase to restore temporary IT operations and recover damage done to the original system;
  - Reconstitution phase to restore IT system processing capabilities to normal operations.
- Identify the activities, resources, and procedures needed to carry out Changineers processing requirements during prolonged interruptions to normal operations.
- Identify and define the impact of interruptions to Changineers systems.
- Assign responsibilities to designated personnel and provide guidance for recovering Changineers during prolonged periods of interruption to normal operations.
- Ensure coordination with other Changineers staff who will participate in the contingency planning strategies.
- Ensure coordination with external points of contact and vendors who will participate in the contingency planning strategies.
Examples of the types of disasters that would initiate this plan include natural disasters, political disturbances, man-made disasters, external human threats, and internal malicious activities.
Changineers defines two categories of systems from a disaster recovery perspective:
- Critical Systems. These systems host production application servers/services and database servers/services or are required for functioning of systems that host production applications and data. These systems, if unavailable, affect the integrity of data and must be restored, or have a process begun to restore them, immediately upon becoming unavailable.
- Non-critical Systems. These are all systems not considered critical by the definition above. These systems, while they may affect the performance and overall security of critical systems, do not prevent critical systems from functioning and being accessed appropriately. These systems are restored at a lower priority than critical systems.
Line of Succession
Decision-making authority during a contingency event sits with the Chief Operating Officer, who is responsible for workforce safety and for executing this Contingency Plan. The Chief Technology Officer is responsible for recovery of Changineers’s technical environments. If either is unavailable or chooses to delegate, the Chief Executive Officer takes over or appoints a delegate.
- Sonya Corcoran, COO: sonya@changineers.com.au
- James Gregory, CTO: james@changineers.com.au
- Renee Lim, CEO: renee@changineers.com.au
Response Roles and Responsibilities
Changineers operates fully remotely. BCDR response is organised around three roles:
- Engineering is responsible for recovery of the platform and its supporting cloud infrastructure, testing redeployments, and assessing the impact to AWS services and data. Led by the CTO.
- Security is responsible for assessing and responding to cybersecurity incidents according to Changineers’s Incident Response policy, and assists Engineering and Operations as needed during non-cybersecurity events. Led by the Security Officer.
- Operations is responsible for workforce coordination, vendor communications, and any customer-facing communications required during a contingency. Led by the COO.
Role leads maintain local copies of this policy and of contact details for the succession list, so that a contingency can be coordinated even if internet access is degraded.
All members of Changineers’s leadership are informed of every contingency event.
General Disaster Recovery Procedures
Notification and Activation Phase
This phase addresses the initial actions taken to detect and assess damage inflicted by a disruption to Changineers. Based on the assessment of the event, conducted in accordance with the Changineers Incident Response Policy where applicable, the Contingency Plan may be activated by either the COO or Head of Engineering. The Contingency Plan may also be activated by the Security Officer in the event of a cyber disaster.
The notification sequence is listed below:
- The first responder is to notify the COO. All known information must be relayed to the COO.
- The COO is to contact the Response Teams and inform them of the event. The COO or delegate is responsible for beginning assessment procedures.
- The COO is to notify team members and direct them to complete the assessment procedures outlined below to determine the extent of damage and estimated recovery time. If damage assessment cannot be performed locally because of unsafe conditions, the COO is to follow the alternate assessment procedures.
  - Damage Assessment Procedures: the COO is to logically assess damage, gain insight into whether the infrastructure is salvageable, and begin to formulate a plan for recovery.
  - Alternate Assessment Procedures: upon notification, the COO is to follow the procedures for damage assessment together with the Response Teams.
- The Changineers Contingency Plan is to be activated if one or more of the following criteria are met:
  - Changineers will be unavailable for more than 48 hours.
  - The on-premises hosting facility or cloud infrastructure service is damaged and will be unavailable for more than 24 hours.
  - Other criteria, as appropriate and as defined by Changineers.
- If the plan is to be activated, the COO is to notify team members of the details of the event and whether relocation is required.
- Upon notification from the COO, group leaders and managers are to notify their respective teams. Team members are to be informed of all applicable information and prepared to respond and relocate if necessary.
- The COO is to notify the hosting facility partners that a contingency event has been declared and to ship the necessary materials (as determined by damage assessment) to the alternate site.
- The COO is to notify remaining personnel and executive leadership of the general status of the incident.
- Notification can be by message, email, or phone.
Recovery Phase
This section provides procedures for recovering Changineers infrastructure and operations at an alternate site, while other efforts are directed at repairing damage to the original system and capabilities.
Procedures are outlined for each team required. Each procedure should be executed in the sequence presented to maintain efficient operations.
Recovery Goal: rebuild Changineers infrastructure to a production state.
The tasks outlined below are not strictly sequential; some can be run in parallel.
1. Contact Partners and Customers affected to begin initial communication - DevOps
2. Assess damage to the environment - DevOps
3. Create a new production environment using new environment bootstrap automation - DevOps
4. Ensure secure access to the new environment - Security
5. Begin code deployment and data replication using pre-established automation - DevOps
6. Test new environment and applications using pre-written tests - DevOps
7. Test logging, security, and alerting functionality - DevOps and Security
8. Assure systems and applications are appropriately patched and up to date - DevOps
9. Update DNS and other necessary records to point to the new environment (see the sketch after this list) - DevOps
10. Update Partners and Customers affected through established channels - DevOps
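As an illustration of step 9, repointing DNS at the rebuilt environment typically reduces to a single Route 53 record change. The following is a minimal sketch using boto3; the hosted zone ID, record name, and CloudFront target are hypothetical placeholders, not values from this plan:

```python
import boto3

route53 = boto3.client("route53")

# Repoint the application's public DNS record at the recovered environment.
route53.change_resource_record_sets(
    HostedZoneId="Z0000000EXAMPLE",  # hypothetical hosted zone
    ChangeBatch={
        "Comment": "BCDR: point app at recovered environment",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.changineers.com.au.",  # hypothetical record
                    "Type": "CNAME",
                    "TTL": 60,  # short TTL so the cutover propagates quickly
                    "ResourceRecords": [
                        {"Value": "d1234abcd.cloudfront.net"}  # hypothetical target
                    ],
                },
            }
        ],
    },
)
```

Keeping a short TTL on production records ahead of time narrows the cutover window during a contingency.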
Reconstitution Phase
This section discusses activities necessary for restoring full Changineers operations at the original or new site. The goal is to restore full operations within 24 hours of a disaster or outage. If necessary, once the hosted data center at the original or new site has been restored, Changineers operations at the alternate site may be transitioned back. The aim is a seamless transition of operations from the alternate site back to the original or new site.
- Original or New Site Restoration
  - Repeat steps 5-9 in the Recovery Phase at the original or new site/environment.
  - Restoration of the original site is unnecessary for cloud environments, except when required for forensic purposes.
- Plan Deactivation
  - If the Changineers environment is moved back to the original site from the alternate site, all hardware used at the alternate site should be handled and disposed of according to the Changineers Media Disposal Policy.
Testing and Maintenance
The COO and/or Head of Engineering shall establish criteria for validation/testing of the Contingency Plan and an annual test schedule, and shall ensure implementation of the tests. This process also serves as training for personnel involved in the plan’s execution. At a minimum, the Contingency Plan shall be tested annually (within 365 days). The types of validation/testing exercises include tabletop and technical testing. Contingency Plans for all application systems must be tested at a minimum using the tabletop testing process. However, if an application system’s Contingency Plan is included in the technical testing of its respective support systems, that technical test will satisfy the annual requirement.
Tabletop Testing
Tabletop Testing is conducted in accordance with CMS’s RMH Chapter 6 Supplemental Contingency Planning Exercise Procedures. The primary objective of the tabletop test is to ensure designated personnel are knowledgeable and capable of performing the notification/activation requirements and procedures outlined in the Contingency Plan in a timely manner. The exercises include, but are not limited to:
- Testing to validate the ability to respond to a crisis in a coordinated, timely, and effective manner, by simulating the occurrence of a specific crisis.
Simulation and/or Technical Testing
The primary objective of the technical test is to ensure the communication processes and data storage and recovery processes can function at an alternate site to perform the functions and capabilities of the system within the designated requirements. Technical testing shall include, but is not limited to:
- Processing from the backup system at the alternate site;
- Restore system using backups; and
- Switch compute and storage resources to alternate processing site.
Work Site Recovery
Changineers’s software development organization is a distributed team that works from multiple locations with Internet access and does not require an office. Whilst individuals may be affected by disasters, the team is resilient because no two individuals are in the same place at the same time.
Application Observability
All applications developed by Changineers are fully observable in all environments. Using AWS-native tooling, we provide active tracing, log aggregation, and real-time metrics for all our systems continuously.
At all times Changineers’s systems have:
- Traces sampled using AWS X-Ray, which provides deep inspection capabilities for every transaction that flows through the system. Real-time distributed tracing exposes request latency through each layer of the stack, as well as surfacing errors from individual components. Traces are tagged with tenant and user identifiers to allow pinpoint support for issues (see the sketch below).
- Metrics gathered for all important aspects of the system using Amazon CloudWatch. Request latency is captured at p50, p90, and p99 thresholds for all HTTP servers and databases. Success and failure metrics are captured for all API calls. Storage amounts, free space, memory consumption, and invocation durations are all monitored.
- Logs generated for all transactions and stored in Amazon CloudWatch Logs, which contain detailed information about the activity of the system. No sensitive data is ever stored in log files.
The Changineers Development team has access to various Amazon CloudWatch Dashboards that provide visibility into the health of the system at all times.
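The tenant and user tagging described above can be illustrated with the X-Ray SDK for Python. This is a minimal sketch, assuming a Lambda handler instrumented with the aws_xray_sdk package; the event field names are hypothetical:

```python
from aws_xray_sdk.core import xray_recorder

def handler(event, context):
    # In Lambda the root segment is managed by the runtime, so annotations
    # go on a subsegment opened for this unit of work.
    subsegment = xray_recorder.begin_subsegment("handle_request")
    try:
        # Tag the trace so support staff can filter by tenant and user.
        subsegment.put_annotation("tenant_id", event["tenantId"])  # hypothetical field
        subsegment.put_annotation("user_id", event["userId"])      # hypothetical field
        # ... application logic ...
        return {"statusCode": 200}
    finally:
        xray_recorder.end_subsegment()
```

Because X-Ray indexes annotations, a support engineer can isolate one tenant’s traces with a filter expression such as `annotation.tenant_id = "..."`.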
Alerting
When any of the aforementioned traces, metrics, or logs indicates that the system may be performing in a suboptimal manner (errors, performance degradation, suspicious or malicious activity), a CloudWatch Alarm will trigger and notify any subscribers (see below).
Examples of Alarms:
- A Lambda function invokes, but fails before completing
- A DynamoDB query takes longer to complete than an acceptable threshold
- Our CloudFront distribution receives an unexpected spike in activity
Depending on the severity of the Alarm, different actions are taken. High severity issues result in the Head of Engineering and the Development team being contacted immediately via an interruptive medium (e.g. phone call), whilst lower severity issues may trigger e-mails or other non-interruptive methods. The severity of an issue varies based on its impact to customers.
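The first alarm example above can be expressed as a CloudWatch metric alarm on the standard AWS/Lambda Errors metric. A minimal sketch, assuming boto3 and an existing SNS topic wired to an on-call paging tool; the function name and topic ARN are hypothetical:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Notify subscribers whenever any invocation of the function fails.
cloudwatch.put_metric_alarm(
    AlarmName="platform-api-lambda-errors",  # hypothetical name
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "platform-api"}],  # hypothetical
    Statistic="Sum",
    Period=60,               # evaluate error counts minute by minute
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",  # no invocations is not a failure
    AlarmActions=["arn:aws:sns:ap-southeast-2:123456789012:oncall-page"],  # hypothetical
)
```

Routing separate SNS topics to a phone call versus email is one way to realise the high/low severity distinction described above.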
Outages
Due to the architecture of the Changineers Platform, large-scale outages are very unlikely. The use of Amazon API Gateway and AWS Lambda with zero-downtime deployments leaves few opportunities for service availability to be affected.
In the case of any outages, the Changineers Development team will be immediately notified via our CloudWatch Alarms, and affected customers will be notified.
Application Service Event Recovery
Changineers will develop a status page to provide real-time updates and inform our customers of the status of each service. The status page is updated with details about any event that may cause service interruption or downtime.
A follow-up root-cause analysis (RCA) will be available to customers upon request after the event, detailing the cause and the remediation plan for the future.
Event Service Level
Short (hours)
- Experience a short delay in service.
- Changineers will monitor the event and determine course of action. Escalation may be required.
Moderate (days)
- Experience a modest delay in service where processes in flight may need to be restarted.
- Changineers will monitor the event and determine course of action. Escalation may be required.
- Changineers will notify customers of delay in service and provide updates on Changineers’s status page.
Long (a week or more)
- Experience a delay in service and processes in flight may need to be restarted.
- Changineers will monitor the event and determine course of action. Escalation may be required.
- Changineers will notify customers of delay in service and provide updates on Changineers’s status page.
Production Environments and Data Recovery
Production data broadly takes two forms in the Changineers Platform:
- User uploaded files such as pictures, videos, and documentation they produce.
- System generated data through the use of the Platform, such as user activity data, textual responses to questions, and timestamped video annotations.
User uploaded files
For user uploaded files, data is stored in Amazon S3 in the tenant’s preferred region. This type of data is never deleted by the application (prevented by IAM policies), and can only be deleted by privileged administrator users, with MFA Delete explicitly required. All modifications to files are versioned, and over time files are archived to Infrequent Access storage and eventually to Amazon S3 Glacier for long-term storage.
Data in S3 buckets is distributed across multiple data centers within a single region, so the failure of a single data center does not affect the availability of data.
In the event that archived data needs to be retrieved, a restore request can be submitted to Glacier (see the sketch below).
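A Glacier retrieval is initiated with an S3 restore request. A minimal sketch, assuming boto3; the bucket and key below are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Stage an archived object back into retrievable storage for 7 days.
s3.restore_object(
    Bucket="changineers-tenant-uploads",     # hypothetical bucket
    Key="tenant-123/videos/session-42.mp4",  # hypothetical key
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Standard"},  # typically a 3-5 hour retrieval
    },
)
```

The object’s storage class is unchanged; S3 keeps a temporary retrievable copy for the requested number of days.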
System generated data
For any data created by the system on behalf of a user, for example data typically stored in a database, the Changineers Platform stores the data in Amazon DynamoDB. Data in DynamoDB is automatically replicated across multiple data centers within a tenant’s region, providing resilience against data center failures.
Additionally, our DynamoDB tables are configured with continuous backups (point-in-time recovery), providing to-the-second restore points. Periodic backups are also taken, and can be copied to secondary regions if the tenant’s data sovereignty requirements permit.
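Both halves of this arrangement can be sketched with boto3; the table names and restore timestamp below are hypothetical:

```python
from datetime import datetime, timezone

import boto3

dynamodb = boto3.client("dynamodb")

# Ensure continuous backups (point-in-time recovery) are enabled on the table.
dynamodb.update_continuous_backups(
    TableName="platform-data",  # hypothetical table
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# During recovery, restore to a known-good second as a new table, then
# repoint the application at the restored table once it is verified.
dynamodb.restore_table_to_point_in_time(
    SourceTableName="platform-data",
    TargetTableName="platform-data-restored",
    RestoreDateTime=datetime(2026, 4, 24, 3, 0, tzinfo=timezone.utc),
)
```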
Recovery
Recovery of production environments and data should follow the procedures listed above and in Data Management - Backup and Recovery.
Revision History
| Date | Summary | Approved by |
|---|---|---|
| 2020-01 | Initial revision. | James Gregory |
| 2026-04-24 | Updated BCDR roles for remote-first operation. | James Gregory |