
Configuration and Change Management

2020.1

Changineers standardizes and automates configuration management through the use of automation scripts as well as documentation of all changes to production systems and networks. Automation tools such as Terraform automatically configure all Changineers systems according to established and tested policies, and are used as part of our Disaster Recovery plan and process.

Policy Statements

Changineers policy requires that:

(a) All production changes, including but not limited to software deployment, feature toggle enablement, network infrastructure changes, and access control authorization updates, must be invoked through the approved change management process.

(b) Each production change must maintain complete traceability that fully documents the request, including the requestor, date/time of change, actions taken, and results.

(c) Each production change must be fully tested prior to implementation.

(d) Each production change must include a rollback plan to back out the change in the event of failure.

(e) Each production change must include proper approval.

  • The approvers are determined based on the type of change.
  • Approvers must be someone other than the author/executor of the change.
  • Approvals may be automatically granted if certain criteria are met. The auto-approval criteria must be pre-approved by the Security Officer and fully documented and validated for each request.
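As an illustration, auto-approval can be implemented as a documented predicate that a change request must satisfy in full. The sketch below is hypothetical; the specific criteria shown are examples only and are not Changineers's actual pre-approved rules.

```python
# Hypothetical sketch of auto-approval evaluation. The change types and
# criteria below are illustrative; real criteria must be pre-approved by
# the Security Officer and documented and validated for each request.

AUTO_APPROVABLE_TYPES = {"feature-toggle", "config-reload"}  # assumed examples

def is_auto_approvable(change: dict) -> bool:
    """Return True only when every pre-approved criterion is met."""
    return (
        change.get("type") in AUTO_APPROVABLE_TYPES
        and change.get("tests_passed") is True
        and bool(change.get("rollback_plan"))
    )
```

Any change that fails a single criterion falls back to the standard human-approval path.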

Controls and Procedures

Configuration Management Processes

  1. Configuration management is automated using industry-recognized tools like Terraform to enforce secure configuration standards.

  2. All changes to production systems, network devices, and firewalls are reviewed and approved by the Security team before they are implemented, to ensure they comply with business and security requirements.

  3. All changes to production systems are tested before they are implemented in production.

  4. Implementation of approved changes is performed only by authorized personnel.

  5. Tooling is used to generate an up-to-date system inventory.

    • All systems are categorized and labeled by their corresponding environment, such as dev, test, and prod.
    • All systems are classified and labeled based on the data they store or process, according to Changineers data classification model.
    • The Security team maintains automation that monitors all changes to IT assets and generates inventory lists, using automated IT asset discovery and the services provided by each cloud provider.
    • The IT asset database is used to generate the diagrams and asset lists required by the Risk Assessment phase of Changineers’s Risk Management procedures.
    • The Changineers Change Management process ensures that all asset inventory created by automation is reconciled against real changes to production systems. This process includes periodic manual audits and approvals.
    • During each change implementation, the change is reviewed and verified by the target asset owner as needed.
  6. All IT assets at Changineers are time-synchronized to a single authoritative source.

    • All AWS instances point to the same set of ntp.org servers.
  7. All frontend functionality (e.g. user dashboards and portals) is separated from backend (e.g. database and app servers) systems by being deployed on separate infrastructure.

  8. All software and systems are required to complete full-scale testing before being promoted to production; testing is automated wherever possible.

  9. All code changes are reviewed while in development to assure software code quality and to proactively detect potential security issues, using pull requests and static code analysis tools. More details can be found in the Software Release / Code Promotion section.
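The environment and data-classification labeling described in step 5 above might be sketched as follows. The environment names and classification levels used here are assumptions for illustration, not the actual Changineers data classification model.

```python
# Illustrative sketch: label discovered assets by environment and data
# classification. Names and levels are hypothetical examples.

ENVIRONMENTS = {"dev", "test", "prod"}
CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}

def label_asset(asset: dict) -> dict:
    """Attach env/classification labels, failing closed on unknown values."""
    env = asset.get("environment")
    data_class = asset.get("data_class")
    if env not in ENVIRONMENTS:
        env = "unknown"            # flag for manual reconciliation
    if data_class not in CLASSIFICATIONS:
        data_class = "restricted"  # fail closed: treat unclassified data as most sensitive
    return {**asset, "labels": {"env": env, "classification": data_class}}
```

Failing closed on unknown classifications means an unlabeled asset is treated as the most sensitive class until the periodic manual audit resolves it.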

Configuration Monitoring and Auditing

All infrastructure and system configurations, including all software-defined sources, are centrally aggregated in AWS Config, which serves as the configuration management database.

Configuration auditing rules are created according to established baselines, approved configuration standards, and control policies. Deviations, misconfigurations, and configuration drift are detected by these rules and alerted to the security team.

AWS Config is configured to actively monitor runtime environments to detect drift, and ScoutSuite is used to analyze changes to configuration and infrastructure as they are deployed to lower environments. Failure to rectify an issue raised by ScoutSuite results in a failing build and an inability to continue the release.
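At its core, rule-based drift detection compares observed configuration against an approved baseline and reports every deviation. A minimal sketch, with hypothetical setting names:

```python
# Minimal sketch of baseline-vs-observed drift detection. The setting
# keys and values are hypothetical examples, not real rule definitions.

def detect_drift(baseline: dict, observed: dict) -> list:
    """Return (setting, expected, actual) for every deviation from baseline."""
    return [
        (key, expected, observed.get(key))
        for key, expected in baseline.items()
        if observed.get(key) != expected
    ]
```

In practice the baseline is derived from the approved configuration standards, and a non-empty result triggers an alert to the security team.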

Production Systems Provisioning

  1. Before provisioning any systems, a request must be created and approved in the GitHub Production Change Management (PRODCM) project.

  2. The security team must approve the provisioning request before any new system can be provisioned, unless a pre-approved automation process is followed.

  3. Once provisioning has been approved, the implementer must configure the new system according to the standard baseline chosen for the system’s role.

  4. If the system will be used to store sensitive information (including ePHI), the implementer must ensure the volume containing this sensitive data is encrypted.

  5. Sensitive data in motion must always be encrypted.

  6. A security analysis is conducted once the system has been provisioned. This can be achieved either via automated configuration/vulnerability scans or manual inspection by the security team. Verifications include, but are not limited to:

    • Removal of default users used during provisioning.
    • Network configuration for system.
    • Data volume encryption settings.
    • Intrusion detection and virus scanning software installed.
  7. The new system is fully promoted into production upon successful verification against corresponding Changineers standards and change request approvals.
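The verification checklist in step 6 could be encoded for the automated-scan path roughly as follows. The field names are assumptions for illustration, not the schema of any actual scanning tool.

```python
# Hypothetical post-provisioning verification mirroring the checklist
# above. Field names are illustrative assumptions.

def verification_failures(system: dict) -> list:
    """Return the names of checks a newly provisioned system fails."""
    checks = {
        "default provisioning users removed": not system.get("default_users"),
        "data volumes encrypted": system.get("volumes_encrypted") is True,
        "intrusion detection installed": system.get("ids_installed") is True,
        "virus scanning installed": system.get("av_installed") is True,
    }
    return [name for name, passed in checks.items() if not passed]
```

An empty result permits promotion to production (step 7); any failure blocks promotion until remediated and re-verified.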

User Endpoint Security Controls and Configuration

  1. Employee laptops, including Windows, Mac, and Linux systems, are configured manually by the device owner.

  2. The following security controls are applied at the minimum:

    • Disk encryption
    • Unique user accounts and strong passwords
    • Auto-update of security patches
  3. The security configurations on all end-user systems are inspected by Security through either a manual periodic review or an automated compliance auditing tool.

Server Hardening Guidelines and Processes

The Changineers Platform uses AWS Lambda to respond to user requests, routed via a managed API Gateway, with data stored in Amazon DynamoDB and Amazon S3. None of these services require any special hardening by Changineers workers.

Configuration and Provisioning of Management Systems

  1. Provisioning management systems such as configuration management servers, remote access infrastructure, directory services, or monitoring systems follows the same procedure as provisioning a production system.

  2. Critical infrastructure roles applied to new systems must be clearly documented by the implementer in the change request.

Configuration and Management of Network Controls

All network devices and controls on a sensitive network are configured such that:

  • Network controls are implemented using Virtual Private Clouds (VPCs) where appropriate, and make thorough use of Security Groups. All infrastructure is managed as code and stored in approved repositories. All changes to the configuration follow the defined code review, change management and production deployment approval process.

  • Vendor provided default configurations are modified securely, including

    • default encryption keys,
    • default SNMP community strings, if applicable,
    • default passwords/passphrases, and
    • other security-related vendor defaults, if applicable.
  • Encryption keys and passwords are changed anytime anyone with knowledge of the keys or passwords leaves the company or changes positions.

  • Traffic filtering (e.g. firewall rules) and inspection (e.g. Network IDS/IPS or AWS VPC flow logs) are enabled.

  • An up-to-date network diagram is maintained.
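One concrete check that traffic-filtering rules can be audited against is whether any rule exposes a sensitive port to the entire internet. The sketch below is illustrative; the rule shape and port list are assumptions, not Changineers's actual Security Group schema.

```python
# Illustrative audit: flag firewall/security-group-style rules that
# allow 0.0.0.0/0 on a sensitive port. Port list is a hypothetical example.

SENSITIVE_PORTS = {22, 3389, 5432}  # e.g. SSH, RDP, PostgreSQL

def open_to_world(rules: list) -> list:
    """Return ingress rules that expose a sensitive port to the internet."""
    return [
        r for r in rules
        if r.get("cidr") == "0.0.0.0/0" and r.get("port") in SENSITIVE_PORTS
    ]
```

Because all infrastructure is managed as code, a check like this can run in code review, before a rule change ever reaches a sensitive network.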

Provisioning AWS Accounts

AWS Account Structure / Organization

Changineers maintains a single Organization in AWS, maintained in a top-level AWS account (master). Connected sub-accounts each host separate workloads and resources in their own sandboxed environments. The master account itself handles aggregated billing for all connected sub-accounts but does not host any workload, service, or resource, with the exception of DNS records for the Changineers root domain, using the AWS Route53 service. DNS records for subdomains are maintained in the corresponding sub-accounts.

Access to each account is managed centrally, following the HR on-boarding and exit processes.

The account and network structure looks like the following:

Changineers-master
│   └── billing and root DNS records only
│
├── Changineers-beta
│   ├── API Gateway
│   ├── DynamoDB
│   ├── Lambdas
│   ├── S3
│   └── VPC
│       └── Subnets
│           └── Security-Groups
│
└── Changineers-prod
    ├── API Gateway
    ├── DynamoDB
    ├── Lambdas
    ├── S3
    └── VPC
        └── Subnets
            └── Security-Groups

Infrastructure-as-Code

Changineers AWS environments and infrastructure are managed as code. Provisioning is accomplished using a set of automation scripts and Terraform code. Each new environment is created as a sub-account connected to Changineers-master. The creation and provisioning of a new account follows the instructions documented in the Bootstrap a new AWS environment page of the development wiki.

Automated change management for deploys to AWS

The Changineers Continuous Delivery Pipeline automates creation and validation of change requests. This is done in a 3-step process:

  1. Request a deployment

    GitHub Actions is used for continuous delivery (build and deploy), and we employ Slack automation such that:

    • Whenever deployment to a controlled environment (e.g. production) is desired, the developer requests a deployment by messaging the Changineers Build Bot with a /deploy command including the desired environment and git commit details.
    • The bot requests approval to deploy from an appropriate party.
  2. Obtain Approval

    • The Changineers bot will message an appropriate person for approval.
    • The required approvers will review the details of the change and approve/decline accordingly.

    Important

    1. Note that the above flow does not catch weaknesses in design, and therefore does not replace the need for threat modeling and security review in the design phase.
    2. Additional requirements may be added later as the process continues to mature.
  3. Deploy

    When the Build Bot receives approval, it contacts GitHub Actions and initiates a job to deploy.

    • The deployment runs Terraform, copies any immutable artifacts from earlier environments, and completes the deployment.

    • Terraform performs a zero-downtime deploy of any affected services by releasing new versions of the software first while maintaining the previous versions.

    • Once the deployment is complete, synthetic transactions are executed on the system’s “sandbox” tenant which allow detection of runtime issues before users are affected.

    • If any issues are detected, the release is halted and changes are rolled back.

    • If no issues are found, user traffic is drained from previous versions and directed towards the new versions. Once this is complete, the previous versions are decommissioned.

    • Once a deploy is completed, the Build Bot reports back to Slack that the deployment is complete.
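The request → approval → deploy → verify flow above can be sketched as a small state machine. The state names and transition rules here are illustrative assumptions, not the actual Build Bot implementation.

```python
# Hypothetical state machine for the deploy flow described above.
# States and transitions are illustrative, not the real Build Bot's.

VALID_TRANSITIONS = {
    "requested":        {"pending_approval"},
    "pending_approval": {"approved", "declined"},
    "approved":         {"deploying"},
    "deploying":        {"verifying"},            # synthetic transactions run here
    "verifying":        {"complete", "rolled_back"},
}

def advance(state: str, next_state: str) -> str:
    """Move to next_state, rejecting any transition the flow does not allow."""
    if next_state not in VALID_TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {next_state}")
    return next_state
```

Encoding the flow this way makes it impossible to, for example, reach `deploying` without passing through `approved` first, which is the property the change management process is meant to guarantee.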

Patch Management Procedures

Local Systems

Changineers requires auto-update for security patches to be enabled for all user endpoints, including laptops and workstations.

  • The auto-update configuration and update status on all end-user systems are inspected by Security through either manual periodic audits or automated compliance auditing agents installed on the endpoints.

Cloud Resources

Changineers’s Platform has a “Serverless” architecture that ensures there are no long-lived services, and relies heavily on AWS managed services.

  • AWS Lambda is used to dynamically provision compute resources on-demand per-request, and terminate them at the end of a user request.

  • Versions of software dependencies are audited continuously using the same tooling we use for vulnerability scanning (refer to Vulnerability Management).

There are no long-lived servers or compute resources that require patching.

Production Deploy / Code Promotion Processes

In order to promote changes into Production, a valid and approved Change Request (CR) is required. CRs are created in the Change Management System/Portal, which implements the Changineers Change Management workflow using the Production Change Management (PRODCM) GitHub project to manage changes and approvals.

  • At least two approvals are required for each PRODCM ticket. By default, the approvers are

    • Security Lead and
    • Engineering Lead.
  • Each PRODCM ticket requires the following information at a minimum:

    • Summary of the change
    • Component(s) impacted
    • Justification
    • Rollback plan
  • Additional details are required for a code deploy, including:

    • Git commit
    • Deploy branch (e.g. master)
    • Target environment
    • Links to pull requests and/or GitHub issues
    • Security scan status and results
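The minimum-field requirements above lend themselves to an automated completeness check before a ticket enters review. The field keys below are hypothetical; the real PRODCM project may name them differently.

```python
# Illustrative validation of the minimum PRODCM ticket fields listed
# above. Field keys are hypothetical assumptions.

BASE_FIELDS = {"summary", "components", "justification", "rollback_plan"}
CODE_DEPLOY_FIELDS = BASE_FIELDS | {
    "git_commit", "deploy_branch", "target_environment",
    "pull_requests", "scan_results",
}

def missing_fields(ticket: dict, code_deploy: bool = False) -> set:
    """Return required fields that are absent or empty on the ticket."""
    required = CODE_DEPLOY_FIELDS if code_deploy else BASE_FIELDS
    return {f for f in required if not ticket.get(f)}
```

A ticket with a non-empty result would be bounced back to the requestor before the two required approvers ever see it.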

Emergency Change

In the event of an emergency, the person or team on call is notified. This may include a combination of Development, IT, and Security.

If an emergency change must be made, such as patching a zero-day security vulnerability or recovering from system downtime, and the standard change management process cannot be followed due to time constraints, personnel availability, or other unforeseen issues, the change can be made by:

  • Notification: The Engineering Lead, Security Lead, and/or IT Lead must be notified by email, Slack, or phone call prior to the change. Depending on the nature of the emergency, the leads may choose to inform members of the executive team.

  • Access and Execution: Manually access the production system or manually deploy software, using one of the following access mechanisms as defined in the Access Control policy and procedures:

    1. Support/Troubleshooting access
    2. Root account or root user access
    3. Local system access (for on-premise environment)
  • Post-emergency Documentation: A PRODCM ticket should be created within 24 hours following the emergency change. The ticket should contain all details related to the change, including:

    • Reason for emergency change
    • Method of emergency access used
    • Steps and details of the change that was made
    • Sign-off/approvals must be obtained per the type of change as defined by the standard CM process
  • Prevention and Improvement: The change must be fully reviewed by Security and Engineering together with the person/team responsible for the change. Any process improvement and/or preventative measures should be documented and an implementation plan should be developed.