Observability

Where to look at metrics, logs, traces, and alarms for the Changineers platform.

This page covers where to look when you want to see what the platform is doing, and how alarms get from a metric to incident.io.

Metrics and dashboards

All metrics, dashboards, and alarms live in Amazon CloudWatch. Default AWS metrics are augmented with custom metrics emitted from application code where there’s something specific worth tracking.
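As an illustration of a custom metric emitted from application code, here is a minimal sketch that builds a log line in CloudWatch Embedded Metric Format (EMF), which CloudWatch can turn into a metric when the line arrives in CloudWatch Logs. This page does not say which emission mechanism the platform uses, so EMF is an assumption here, and the namespace, metric name, and dimension are hypothetical.

```python
import json
import time


def emf_metric_line(namespace, name, value, unit="Count", dimensions=None):
    """Return one JSON log line in CloudWatch Embedded Metric Format."""
    dimensions = dimensions or {}
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set, made of every dimension key supplied.
                    "Dimensions": [list(dimensions.keys())],
                    "Metrics": [{"Name": name, "Unit": unit}],
                }
            ],
        },
        # The metric value sits at the top level, keyed by the metric name.
        name: value,
    }
    # Dimension values also sit at the top level, keyed by dimension name.
    record.update(dimensions)
    return json.dumps(record)


# Hypothetical namespace, metric, and dimension names, for illustration only.
line = emf_metric_line(
    "Changineers/Api", "InvoicesGenerated", 3, dimensions={"Service": "api"}
)
print(line)
```

Writing the metric as a log line keeps the emitting code free of extra API calls. If the service already uses a different mechanism, follow what is there.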

SLOs are codified in Terraform. If you want to know what counts as “production health degraded” for a service, look at its alarms in infra/providers/aws/03-environment/03-application/.

Logs

Application logs land in CloudWatch Logs. Day-to-day querying is through CloudWatch Logs Insights. Logs are JSON-structured; the infrastructure annotates each line with the service name, correlation ID, and request ID automatically.
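Because every line carries a correlation ID, a common query is "show me everything for this one request". The sketch below builds a Logs Insights query string for that. The JSON field name correlationId is an assumption (check a real log line for the actual key), and the helper itself is hypothetical.

```python
import re

# Only accept IDs made of characters that cannot break out of the quoted literal.
SAFE_ID_RE = re.compile(r"[A-Za-z0-9_-]+")


def insights_query_for_correlation_id(correlation_id, field="correlationId", limit=100):
    """Build a Logs Insights query that lists one request's log lines in time order."""
    if not SAFE_ID_RE.fullmatch(correlation_id):
        raise ValueError(f"unexpected characters in correlation ID: {correlation_id!r}")
    return (
        "fields @timestamp, @logStream, @message\n"
        f'| filter {field} = "{correlation_id}"\n'
        "| sort @timestamp asc\n"
        f"| limit {limit}"
    )


query = insights_query_for_correlation_id("3f2a9c1e")
print(query)
```

Paste the printed query into the Logs Insights console and select the log groups of the services the request passed through.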

What not to log

Don’t log personal data: no email addresses, no full names, no phone numbers, no addresses, no customer-uploaded content. Use identifiers (user ID, tenant ID, resource ID) instead. The same applies to credentials: no passwords, no API keys, no OAuth or session tokens, no AWS credentials.

If you spot any of these in logs, treat it as a leaked-credential incident under Vulnerability management. Where a credential is involved, rotate it first.
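The rules above are best enforced at the call site by logging identifiers only. As a safety net, a logging filter can mask obvious mistakes before a line is written. The sketch below is one possible shape, not the platform's actual implementation: the key list and the email pattern are assumptions and will not catch everything (names, addresses, and uploaded content have no reliable pattern).

```python
import logging
import re

# Assumed patterns and key names; extend to match what the service handles.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_KEYS = {"password", "api_key", "token", "authorization", "secret"}
MASK = "[REDACTED]"


def scrub(value):
    """Recursively mask sensitive keys and email-shaped substrings."""
    if isinstance(value, dict):
        return {
            k: MASK if str(k).lower() in SENSITIVE_KEYS else scrub(v)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [scrub(v) for v in value]
    if isinstance(value, str):
        return EMAIL_RE.sub(MASK, value)
    return value


class ScrubFilter(logging.Filter):
    """Logging filter that scrubs the message and its arguments in place."""

    def filter(self, record):
        record.msg = scrub(record.msg)
        if isinstance(record.args, dict):
            record.args = scrub(record.args)
        elif record.args:
            record.args = tuple(scrub(a) for a in record.args)
        return True  # never drop the record, only clean it


event = {"user_id": "u_123", "password": "hunter2", "note": "contact jane@example.com"}
cleaned = scrub(event)
print(cleaned)
```

Attach the filter with logger.addFilter(ScrubFilter()). It reduces the blast radius of a mistake; it does not replace the rule.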

Tracing

AWS X-Ray is enabled for distributed tracing. When you’re tracking a request across services, X-Ray is the place to start.
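To find a request in X-Ray you need its trace ID, which AWS propagates between services in the X-Amzn-Trace-Id HTTP header. The sketch below pulls the ID out of a header value. The header value shown is an illustrative example, and whether your service logs or exposes this header is an assumption to verify.

```python
def parse_trace_header(header):
    """Split an X-Amzn-Trace-Id header value into its key=value fields."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip fragments that have no '='
            fields[key] = value
    return fields


# Illustrative header value in the documented Root/Parent/Sampled layout.
header = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
fields = parse_trace_header(header)
trace_id = fields["Root"]
print(trace_id)
```

The Root value is the trace ID to search for in the X-Ray console.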

Alarms

Alarms are defined in Terraform and routed through SNS topics into incident.io. The pattern:

Each service has its own SNS topic subscription with an AlarmName prefix filter, so an alarm called ApiGateway5xxErrors routes to the API service’s incident.io feed automatically.
Alarm names follow <Service><Description> (e.g. ApiGateway5xxErrors, DatabaseConnectionFailures).

When you add a new alarm, follow the naming convention and reuse the existing service-prefix filter in the Terraform. The alarm then routes to the right incident.io feed without further configuration.
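The naming convention and prefix routing described above can be modelled in a few lines. This sketch assumes the subscriptions use an SNS filter policy with a prefix match on AlarmName, applied to the message body of the CloudWatch alarm notification. The prefixes and feed names are hypothetical; the real ones are in the Terraform.

```python
import re

# Hypothetical service prefixes and feed names, for illustration only.
SERVICE_FEEDS = {
    "ApiGateway": "api-service-feed",
    "Database": "database-service-feed",
}

# Loose check for the <Service><Description> shape: PascalCase, no separators.
ALARM_NAME_RE = re.compile(r"[A-Z][A-Za-z0-9]+")


def follows_convention(alarm_name):
    """Return True if the name is PascalCase-shaped and starts with a known prefix."""
    return bool(ALARM_NAME_RE.fullmatch(alarm_name)) and any(
        alarm_name.startswith(prefix) for prefix in SERVICE_FEEDS
    )


def filter_policy(prefix):
    """SNS filter policy that matches alarms whose AlarmName starts with prefix."""
    return {"AlarmName": [{"prefix": prefix}]}


def matching_feeds(alarm_name):
    """Return every feed whose prefix matches, as SNS delivers to all matching subscriptions."""
    return [
        feed
        for prefix, feed in SERVICE_FEEDS.items()
        if alarm_name.startswith(prefix)
    ]


print(matching_feeds("ApiGateway5xxErrors"))
print(matching_feeds("QueueDepthHigh"))
```

An alarm whose name matches no prefix is delivered to no feed, which is why the naming convention matters: a misnamed alarm can fire without anyone being notified.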

Synthetic monitoring

Sentry runs uptime pings against the production public endpoints.
