To monitor the running system, we use CloudWatch alarms. When an alarm is triggered, it automatically sends an email to TBD. For immediate action, we are supported by the Amazon team located in Berlin: when an alarm is triggered, an on-call person is paged immediately and looks into the case as soon as possible.

We currently have the following alarms in place:

  • AuthorizationFailuresAlarm: Detect any attempts to access unauthorized resources
  • LimitExceededFailuresAlarm: Detect if any account limits are reached
  • IAMPolicyChangesAlarm: Detect IAM policy changes
  • HighCpuUsageAlarm: Detect high CPU usage
  • HighNetworkUsageAlarm: Detect high network usage
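
As an illustration of what one of these alarms might look like in a template, here is a minimal sketch of a CPU alarm. It is not taken from alarms-stack.yml: the threshold, period, the InstanceId parameter and the AlarmTopic resource are all assumptions chosen for the example.

```yaml
# Hypothetical sketch only; values and logical IDs other than
# HighCpuUsageAlarm are assumptions, not the actual template contents.
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  InstanceId:
    # Assumed parameter: the EC2 instance to watch
    Type: AWS::EC2::Instance::Id
Resources:
  AlarmTopic:
    # Assumed SNS topic that email or paging subscriptions would attach to
    Type: AWS::SNS::Topic
  HighCpuUsageAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Detect high CPU usage
      Namespace: AWS/EC2
      MetricName: CPUUtilization
      Dimensions:
        - Name: InstanceId
          Value: !Ref InstanceId
      Statistic: Average
      Period: 300              # seconds per evaluation period (assumed)
      EvaluationPeriods: 2     # consecutive breaching periods before alarming (assumed)
      Threshold: 90            # percent CPU (assumed)
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlarmTopic      # Ref on an SNS topic returns its ARN
```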


Deployment is done using CloudFormation. Before you continue, make sure you have an AWS CLI profile named 'mxnetci' (prod) or 'mxnetcidev' (test) configured, depending on the stage you deploy to.

To deploy, simply run ./ in the cloudformation_metrics directory. The output looks as follows:

8c85902e42be:cloudformation_metrics mabreu$ ./
Deployment stage (test, prod)
Deploying to test
upload: cloudformation_templates/alarms-stack.yml to s3://cloudformation-metrics-dev-templates/alarms-stack.yml

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - metrics-stack


To separate concerns, we use nested CloudFormation stacks. These allow us to keep mostly independent templates that can be managed without influencing each other. They are all managed in the master-metrics.yml file. Dependencies are marked using the DependsOn attribute. Further documentation is available at
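
The nested-stack layout described above could be sketched roughly as follows. This is an assumed shape for a parent template such as master-metrics.yml, not its actual contents: the TemplateBucket parameter, the logical IDs and the second template name (dashboard-stack.yml) are hypothetical; only alarms-stack.yml appears in the deployment output.

```yaml
# Hypothetical sketch of a parent template with two nested stacks.
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  TemplateBucket:
    # Assumed parameter: S3 bucket that holds the uploaded child templates
    Type: String
Resources:
  AlarmsStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: !Sub 'https://${TemplateBucket}.s3.amazonaws.com/alarms-stack.yml'
  DashboardStack:
    # Hypothetical second child stack, shown only to illustrate DependsOn
    Type: AWS::CloudFormation::Stack
    DependsOn: AlarmsStack   # created only after AlarmsStack completes
    Properties:
      TemplateURL: !Sub 'https://${TemplateBucket}.s3.amazonaws.com/dashboard-stack.yml'
```

Each child template stays self-contained, and the parent only records which templates exist and in which order they must be created.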
