Security Architecture and Operating Model¶
2020.1
In the digital age, cyber attacks are inevitable. At Changineers, we are taking a “zero trust”, “minimal infrastructure” approach to managing risk and information security.
This document describes our guiding principles and aspirations in managing risk and the building blocks of our security model.
Policy Statements¶
Changineers policy requires that:
(a) Changineers’s security program and operations should be designed and implemented with the following objectives and best practices:
- data-centric, cloud-first
- use best-of-breed open-source and commercial off-the-shelf products
- automate everything possible
- assume compromise therefore never trust, always verify
- apply controls using least-privilege and defense-in-depth principles
- avoid single point of compromise
- promote self-management and reward good behaviors
(b) Security shall remain a top priority in all aspects of Changineers’s business operations and product development.
Controls and Procedures¶
Changineers Security Principles¶
(1) Data-centric model; zero-trust architecture¶
“Zero Trust” is a data-centric security design that puts micro-perimeters around specific data or assets so that more granular rules can be enforced. It remedies the deficiencies with perimeter-centric strategies and the legacy devices and technologies used to implement them. It does this by promoting “never trust, always verify” as its guiding principle. This differs substantially from conventional security models which operate on the basis of “trust but verify.”
In particular, with Zero Trust there is no default trust for any entity (including users, devices, applications, and packets), regardless of what it is and where it sits on or relative to the corporate network. In addition, verifying that authorized entities are always doing only what they’re allowed to do is no longer optional; it’s now mandatory.
Summary
- No internal network. 100% cloud.
- Fully segregated with granular policy enforcements.
- Individually secured devices. No production access by default.
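The “never trust, always verify” principle can be illustrated with a default-deny access check. The following is a minimal sketch, not a description of Changineers’s actual implementation: the `Rule` and `Request` types, their fields, and the `is_allowed` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """An explicit allowance (hypothetical shape, for illustration only)."""
    principal: str
    action: str
    resource: str


@dataclass(frozen=True)
class Request:
    """An access request together with the verification results attached to it."""
    principal: str
    action: str
    resource: str
    device_verified: bool
    mfa_passed: bool


def is_allowed(request: Request, rules: frozenset) -> bool:
    # Always verify: a request from an unverified device or session is refused,
    # no matter who the principal is or where the request comes from.
    if not (request.device_verified and request.mfa_passed):
        return False
    # Never trust by default: access exists only if a rule explicitly grants it.
    return Rule(request.principal, request.action, request.resource) in rules
```

With a single rule allowing `alice` to `read` a `reports` resource, the same request is refused if the MFA check failed, and a `write` request is refused because no rule grants it.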
(2) Serverless infrastructure¶
We extend the zero-trust security model with a “Minimal Infrastructure” approach. For all of our systems we prefer to:
- Use tried-and-tested Software-as-a-Service products instead of building additional software ourselves that isn’t part of our value proposition.
- Use Amazon’s Cloud services that require minimal additional development for the infrastructure we do provision ourselves.
- Use a “Serverless” architecture based on AWS Lambda for anything we develop in-house, which ensures there are no long-lived services available as attack vectors. When the system is not in use, there are no active components.
Following these principles allows us to contain and control access at a much more granular level than operating on-premises infrastructure would. With access to the extensive APIs provided by the cloud services, we can more easily integrate and automate security operations. Additionally, minimizing infrastructure significantly reduces always-on attack surfaces. Services that are not in use are turned off instead of sitting idle and exposed to attack. Together with Zero Trust, this security model and architecture enables a high degree of flexibility for end-user computing while maintaining a high level of security assurance.
Summary
- Buy instead of build when possible
- AWS managed services
- Serverless architecture
- Minimal persistent attack surface, leaving far fewer always-on targets for an attacker.
(3) Least-privilege temporary access¶
Cyber attacks are inevitable. When preparing for potential attacks, Changineers security operations assume that a compromise can happen at any time, to any device, with few or no indicators. This is also an extension of the “zero trust” model. When building security operations, we carefully perform risk analysis and threat modeling to identify potential single points of compromise and to avoid having any “keys to the kingdom”.
Compromise of any single system, user, or credential must not lead to a broad or full compromise of the entire infrastructure or operations. For example, if an attacker gains access to an admin credential, it cannot directly lead to the compromise of all systems and data in the environment.
Summary
- Need-based access control for both employees and computing services.
- Access to critical systems and resources is closed by default and granted on demand.
- Protected by strong multi-factor authentication.
- No “keys to the kingdom”; no single points of compromise.
- “Secrets” must remain secret at all times.
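As an illustration of “closed by default, granted on demand”, the sketch below models access grants that expire on their own. It is a simplified example under stated assumptions: the `AccessGrant` type, the one-hour default lifetime, and the function names are hypothetical and are not taken from Changineers’s tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable


@dataclass(frozen=True)
class AccessGrant:
    """A time-boxed grant for one principal on one resource (hypothetical)."""
    principal: str
    resource: str
    expires_at: datetime


def grant_access(principal: str, resource: str, now: datetime,
                 ttl: timedelta = timedelta(hours=1)) -> AccessGrant:
    # Every grant carries an expiry, so access lapses without manual revocation.
    return AccessGrant(principal, resource, now + ttl)


def has_access(grants: Iterable[AccessGrant], principal: str,
               resource: str, now: datetime) -> bool:
    # Closed by default: with no matching, unexpired grant the answer is False.
    return any(
        g.principal == principal and g.resource == resource and now < g.expires_at
        for g in grants
    )


# Usage: a grant issued at 09:00 UTC is valid at 09:30 and expired by 10:30.
start = datetime(2020, 1, 1, 9, 0, tzinfo=timezone.utc)
grants = [grant_access("alice", "prod-db", start)]
```

Because each grant is scoped to one principal and one resource, a leaked grant exposes only that resource and only until it expires.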
(4) Immutable builds and zero-downtime deploys¶
The Changineers Platform is composed of small independently-deployable components that each have their own development and deployment lifecycles. Before any component is deployed to our production environments, it is thoroughly tested and validated in our lower environments which are completely isolated from production. This allows us to test upcoming changes while ensuring there is no impact to our customers.
A particular build of a component is considered immutable, meaning as it progresses through our environments, it is always the same artifact that is deployed every time. When a component is promoted from a lower environment to our production environment, we guarantee it is the exact same version through every step of the process. Once a component is deployed to our production environment the change will be available to Changineers customers and end-users.
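One common way to check that the same artifact is promoted through every environment is to compare content digests. The sketch below assumes SHA-256 digests recorded at build time; the function names are hypothetical, and the source does not state which mechanism Changineers actually uses.

```python
import hashlib


def artifact_digest(artifact: bytes) -> str:
    """Return the SHA-256 hex digest that identifies a build artifact."""
    return hashlib.sha256(artifact).hexdigest()


def verify_promotion(recorded_digest: str, candidate: bytes) -> None:
    """Refuse promotion if the candidate differs from the artifact that was tested."""
    actual = artifact_digest(candidate)
    if actual != recorded_digest:
        raise ValueError(
            f"artifact changed between environments: "
            f"expected {recorded_digest}, got {actual}"
        )
```

Any change to the artifact's bytes, however small, produces a different digest, so a rebuilt or modified package cannot be promoted in place of the tested one.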
Changes to our infrastructure (database schema changes, storage buckets, load balancers, DNS entries, etc…) are also described in our source code and deployed to our environments with exactly the same approach as application code. This architectural approach to managing infrastructure is referred to as infrastructure as code and is key to our fully automated deployments.
When a deployment occurs in any environment, a zero-downtime deployment strategy is used to ensure there is no disruption to Customers or users. The strategy varies product-by-product depending on which technologies are in use, but the pattern we broadly follow is:
- New version of a component is prepared and tested
- New version is deployed into the target environment alongside the existing version of the component.
- A set of synthetic transactions is run through it using our “sandbox” tenant, which is available in all environments.
- User traffic is directed to the new version and monitored.
- Once the new version is stable and receiving user traffic, the previous version is slowly “drained” of user traffic and removed from service.
- The rolling release is now complete.
If at any point any of the checks or tests fail for the new version the release is aborted and rolled back. Only when all checks pass is the previous version decommissioned.
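The abort-and-roll-back behaviour described above can be sketched as a sequence of gates. This is a simplified illustration rather than the actual deployment tooling: `run_release`, the gate names, and the callback parameters are hypothetical.

```python
from typing import Callable, List, Tuple

# A gate is a name plus a callable that returns True when the check passes.
Gate = Tuple[str, Callable[[], bool]]


def run_release(gates: List[Gate],
                decommission_previous: Callable[[], None],
                rollback: Callable[[], None]) -> Tuple[bool, str]:
    """Run each gate in order; return (succeeded, name of the failed gate)."""
    for name, check in gates:
        if not check():
            # Any failing check aborts the release; the previous version
            # is still deployed, so traffic can be returned to it.
            rollback()
            return False, name
    # Only when all checks pass is the previous version removed.
    decommission_previous()
    return True, ""
```

A caller might pass gates such as `("synthetic transactions", ...)` and `("user traffic healthy", ...)`; the previous version is only decommissioned after the last gate passes.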
Summary
- Infrastructure as code with active protection.
- Automated security scans and full traceability from code commit to production.
- “Human-free” deployment ensures each build is free from human error or malicious contamination.
(5) End-to-end data protection and privacy¶
It is of the utmost importance that Changineers provides for the confidentiality (privacy), integrity, and availability of its customers’ data. Your data is protected with end-to-end encryption, combined with strong access control and key management. We also prohibit our internal employees from accessing customer data directly in production, so your data remains safe and private at all times. We will never use or share your data without your prior consent.
Summary
- Data is safe both at rest and in transit, using strong encryption, access control and key management.
- No internal user access is allowed to customer data in production.
(6) Strong yet flexible user access¶
Access control is critical and we must get it right. That’s why we leverage tried-and-true technologies such as Amazon Cognito, which provides federated identity support, OAuth, multi-factor authentication, and fine-grained authorization to provide strong yet intuitive access options for our customers to access Changineers’s Platform and services.
Summary
- Amazon Cognito.
- Multi-factor authentication.
- Fine-grained attribute-based or role-based authorization.
(7) Watch everything, even the watchers¶
You can’t protect what you can’t see. This applies to the infrastructure, environments, operations, users, systems, resources, and most importantly, data. It is important to inventory all assets, document all operations, identify all weaknesses, and visualize/understand all events.
This includes conducting risk analyses, threat modeling, vulnerability assessments, application scanning, and penetration testing. Beyond that, security operations must keep an eye on everything, and someone should also “watch the watchers”.
At first, this would require significant manual effort and may seem impossible to keep up-to-date. Our goal is to automate security operations, so that this can be achieved programmatically as our operations evolve to become more complex.
Additionally, the Changineers security team will actively monitor threat intelligence in the community, using feeds and information-sharing platforms such as NH-ISAC, to stay abreast of attacker activities and methodologies.
Summary
- All environments are monitored; all events are logged; all alerts are analyzed; all assets are tracked.
- No privileged access without prior approval or full auditing.
- We deploy monitoring redundancy to “watch the watchers”.
(8) Centralized and automated operations¶
As much as possible, Changineers security will translate policy and compliance requirements into code for easy implementation and maintenance. This allows us to enforce policy and compliance in a fast and scalable way, rather than relying solely on written policies and intermittent manual audits. For example, Access Control policies for production environments are translated into AWS IAM policies and implemented via Terraform code, whilst network security rules are continuously evaluated with AWS Config.
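To show what checking a policy as code can look like, the sketch below scans an IAM policy document (in the standard IAM JSON layout of `Statement`, `Effect`, `Action`, and `Resource`) for `Allow` statements that use a bare `*` wildcard. The check itself and the example policy are hypothetical illustrations; they are not Changineers’s actual rules and do not replace Terraform or AWS Config.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    # IAM allows a single string or object where a list is also accepted.
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return Allow statements whose Action or Resource is the bare '*' wildcard."""
    flagged = []
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(statement)
    return flagged


# Hypothetical policy: the first statement is scoped, the second is not.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
    ],
}
```

A check like this can run on every commit, so a violation of least privilege is caught before the policy is deployed rather than during a later manual audit.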
Automation makes it truly possible to centralize security operations, including not only event aggregation and correlation, but also the orchestration and management of previously siloed security controls and remediation efforts.
Summary
- Automate everything
- Continuous monitoring with AWS Config
Security Architecture¶
Changineers developed a security architecture built on AWS Cloud and DevSecOps practices.
Cloud Native Architecture¶
The Changineers Platform is designed following the AWS Well-Architected Framework.
Serverless¶
The platform is designed for the cloud, using the latest Serverless technologies to create an architecture that is both secure and massively scalable. By designing with the Serverless Application Lens in mind, Changineers has created a system that is capable of scaling across multiple data centers in multiple regions (considering a Customer’s data sovereignty requirements). There are no persistent servers, which significantly reduces the attack surface, and all security patching and operating system maintenance are covered by Amazon’s Cloud Security.
Multi-tenanted¶
Using Amazon’s SaaS Storage Strategies for Multi-tenant Storage as a baseline, Changineers Platform is a multi-tenant solution that has strong isolation guarantees for customer data whilst enabling economies of scale through reuse of infrastructure across multiple tenants.
The isolation model of AWS Lambda, described in Amazon’s Security Overview of AWS Lambda, explains how Changineers is able to leverage Lambda’s design to create components that are highly reusable between tenants with no security implications.
Authentication and Authorization Systems¶
In the Changineers Platform, authentication is managed by Amazon Cognito, a managed identity and authentication provider.
Amazon Cognito manages all storage of user credentials and encrypts passwords at rest. It is HIPAA eligible and PCI DSS, SOC, and ISO/IEC 27001, ISO/IEC 27017, ISO/IEC 27018, and ISO 9001 compliant.
Advanced security features for Amazon Cognito help protect our users from unauthorized access to their accounts using compromised credentials. When Amazon Cognito detects users have entered credentials that have been compromised elsewhere, it prompts them to change their password.
If Amazon Cognito detects unusual sign-in activity, such as sign-in attempts from new locations and devices, it assigns a risk score to the activity and, depending on how it is configured, either prompts users for additional verification or blocks the sign-in request. Users can verify their identities using SMS or a Time-based One-time Password (TOTP) generator, such as Google Authenticator.
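For readers unfamiliar with TOTP, the sketch below shows how a code is derived from a shared secret and the current time, following the public RFC 6238 algorithm with HMAC-SHA1. It is for illustration only and says nothing about Amazon Cognito’s internals; the function name and parameter defaults are assumptions.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password (RFC 6238, HMAC-SHA1)."""
    # The moving factor is the number of `step`-second intervals since the epoch.
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks an offset.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    # Keep the requested number of decimal digits, left-padded with zeros.
    return str(code % 10 ** digits).zfill(digits)
```

Using the RFC 6238 test secret `b"12345678901234567890"` at time 59 with `digits=8`, this returns `"94287082"`, which matches the published test vector. Both the authenticator app and the server compute the same value independently, so the secret never travels with the login request.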
Amazon Cognito uses common identity management standards including OpenID Connect, OAuth 2.0, and SAML 2.0, and can be configured to federate identity from other identity providers such as AzureAD.
All login attempts to the Changineers Platform are logged for audit purposes.
Environments¶
The Changineers Platform has two default environments:
- Production, where the latest version of the application resides for use by users
- Beta, a pre-release environment for validating changes before they’re rolled out to users
For integration with Customer systems or on an ad-hoc basis, new environments can be provisioned following the Software Delivery Life Cycle practices.
Architecture Diagrams¶
Detailed architecture diagrams of the in-scope networks, endpoints, applications as well as the security operations are developed and maintained by Changineers. Diagrams are created using the C4 model for visualising software architecture and focus on various levels of detail.
Platform high-level context:
Within the Changineers Platform container:
Then deeper within a single service:
A broad network-level diagram:
Metrics, Measurements and Continuous Monitoring¶
A set of metrics / KPIs has been defined to assist in measuring, reporting on, and optimizing the security program and the controls in place.
Changineers follows Site Reliability Engineering principles and defines various Service Level Indicators for monitoring the behaviour of our systems, along with Service Level Objectives that the teams strive to achieve. Where Customers request it, Changineers can include Service Level Agreements in contracts.
Service Level Indicators¶
Changineers captures a variety of performance metrics (in addition to other metrics) that are used to continuously monitor the behaviour of the system. Below is a sample of some of the metrics we capture:
For each AWS Lambda function:
- Invocation counts with success/failure dimensions
- Concurrent invocations
- Execution duration (p50, p95, and p99)
- Version changes
For Amazon DynamoDB tables:
- Get, Query, and Put counts with success/failure dimensions
- Query result record counts
- Get, Query, and Put durations (p50, p95, and p99)
- Hot key/partition metrics
For Amazon S3 buckets:
- Put and Get object counts with success/failure dimensions
- Request latency (p50, p95, and p99)
For Amazon CloudFront distributions:
- Request counts with success/failure dimensions
- Request latency (p50, p95, and p99)
- Origin cache hit/miss ratios
For Amazon Cognito User Pools:
- Login attempt counts with success/failure dimensions
- Risk classification of login attempts (low, medium, high)
- Multi-factor authentication usage
- Rate limiting and throttling behaviour
The above metrics are collected and aggregated in Amazon CloudWatch and presented to Changineers in Dashboards that can be used to visualise the current behaviour of the system.
If the system is behaving unexpectedly, such as poor performance or an increase in malicious activity, the Changineers development team will be alerted following the Application Observability processes.
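The p50, p95, and p99 figures listed above are percentiles of the raw samples. As a rough illustration of what they mean, the sketch below computes them with the nearest-rank method. Amazon CloudWatch calculates percentile statistics itself, and its exact method may differ; the function names here are hypothetical.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1..100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarise request durations the way the indicators above are reported."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

Percentiles are preferred to averages for latency because a small number of very slow requests shows up clearly in p95 and p99 while barely moving the mean.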
Service Level Objectives¶
For each of Changineers’s services there is an SLO defined that the team strives to achieve. As a baseline we aim to be as available as the underlying services that we depend on.
Our objectives are:
- 99.9% availability or greater for all our services, which corresponds to around 9 hours of downtime a year.
- 1000 active users per day, with use being evenly distributed throughout the day.
- 200 concurrent active users, with use being unevenly distributed or clustered around particular times of the day (e.g. the beginning of the work day, or start of a class/workshop).
- 200ms response times on average, with p95 response times being within 1000ms.
Wherever possible, we aim to exceed these objectives and are continually improving our system’s performance based on user feedback and testing.
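The downtime figure quoted for the availability objective follows from simple arithmetic, shown here as a short sketch. The function name is hypothetical, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted in a period at a given availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_hours * (1 - availability_percent / 100)
```

At 99.9% availability this gives roughly 8.76 hours a year, which is the “around 9 hours” stated above. Tightening the target to 99.99% would cut the budget to under one hour a year, which is why higher targets are discussed case by case.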
Service Level Agreements¶
For Customers who have high availability requirements, Changineers can include SLAs in any contracts. Higher availability requirements than our SLOs will need to be discussed on a case-by-case basis.
Quality of Service¶
Changineers strives to provide a high quality of service to all of its customers. This is accomplished through a security architecture that encompasses all of Changineers’s operations and provides high data confidentiality, integrity, and availability.
An overview of Changineers’s architecture can be found in Security Architecture. Changineers uses a highly scalable cloud architecture to provide system quality at all times.
All systems are monitored and measured in real time as described in Application Observability.
Changineers uses DevOps methodology as described in Software Development Process to ensure a smooth delivery process of all systems and applications.