This document describes the security design of the MXNet CI.

Components

The following components are part of MXNet’s CI setup.

EC2

 

Ubuntu 16.04

Windows Server 2016

Jenkins

Jenkins acts as the CI software package that validates the integrity of MXNet’s code and pull requests. Jenkins consists of two parts: the Master acts as scheduler and offers a web interface, while the actual code compilation and execution happens on slaves. In both cases, the process runs as a restricted user created specifically for Jenkins (TODO: Windows?!).

Master

OUTDATED: replaced with GitHub SSO. TODO: Update

Jenkins contains a plugin called Matrix Authentication, which supports granular permission management for anonymous access as well as for users and groups. In this setup, anonymous access only permits viewing builds (result status and execution log). Whether to also allow downloading the generated artifacts, for debugging purposes only, is currently under discussion.

The entire configuration of the Jenkins Master can be defined using the web interface. In this setup, only the admin account is able to change any of the configuration. Additionally, the execution configuration is stored in a so-called Jenkinsfile. This file contains the various stages as well as code that may be executed during or in between these stages. On the Master, this code is executed within the Groovy Sandbox. The Jenkinsfile is dynamically retrieved from the official Apache MXNet (incubating) GitHub repository and is specific to each branch and PR.

Initially, only an admin account will exist; no other user accounts will be created. The admin password will be stored in KMS and retrieved during the startup of the instance. Due to the way Jenkins is written, this password is stored by Jenkins using insecure encryption.

Jenkins supports multiple ways to connect Slaves. Two widely used methods are:

  1. SSH/RDP: The master uses SSH/RDP to connect to the slaves and execute all commands. This requires some type of credential management on the Master and is thus not preferred.
  2. JNLP: The master launches a JNLP service on port 48592. All slaves connect to this port using anonymous access. An alternative would be to use secrets, but that would also require credential management and is not necessary because the port is not accessible from outside the VPC (see “Firewall”).

In this setup, option 2 has been chosen. It provides separation of concerns, requires no credential management, and keeps application-level communication between instances to a minimum.
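The reliance on network isolation instead of JNLP secrets can be illustrated with a small sketch. It is not the actual security group definition: the VPC CIDR range below is a hypothetical placeholder, and only the port number comes from the text above.

```python
import ipaddress

# Hypothetical VPC range for illustration; the real range is not stated here.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
JNLP_PORT = 48592  # port named in the description of option 2


def jnlp_connection_allowed(source_ip):
    """Return True if a connection to the JNLP port from source_ip
    originates inside the VPC, which is the only case the setup permits."""
    return ipaddress.ip_address(source_ip) in VPC_CIDR


if __name__ == "__main__":
    for ip in ("10.0.3.17", "203.0.113.5"):
        verdict = "allowed" if jnlp_connection_allowed(ip) else "blocked"
        print("{} -> port {}: {}".format(ip, JNLP_PORT, verdict))
```

Because anonymous JNLP access is only as safe as this rule, any change to the security group for the JNLP port effectively changes who can register as a slave.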

 

Slave

Slaves connect to the master using JNLP and wait for Jenkins jobs to execute. Due to the nature of CI and the fact that all branches as well as Pull Requests are compiled and tested, all slaves have to execute completely arbitrary code. This may pose a security risk, as any type of malware can be installed on a slave by submitting a malicious Pull Request. By design, this cannot be avoided without placing a heavy burden on Committers by requiring that all builds be triggered manually.

To reduce the possible impact, slaves store no credentials, generate no end-user artifacts (no Continuous Deployment!) and are completely disposable. Pull Requests always have to be reviewed by an MXNet Committer before they are merged into the master branch, so persistent malware residing in MXNet is very unlikely. Because this is an Open Source project, PRs are usually reviewed by various people at arbitrary points in time; it is thus very unlikely that a malicious PR stays undetected for a long time.

Additionally, temporary malware introduced by a Pull Request may be disposed of automatically after a few days due to the auto scaling features. A freshly started slave always uses a manually created AMI and read-only resources during the initial start-up, so malware cannot persist across instance generations.
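The disposability of slaves could be enforced with an age-based recycling rule along the following lines. This is only a sketch of one possible policy: the actual auto scaling rule is not specified in this document, and the three-day limit, the function name and the instance ids are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed limit; the text only says "a few days".
MAX_AGE = timedelta(days=3)


def instances_to_recycle(launch_times, now, max_age=MAX_AGE):
    """Return the sorted ids of instances older than max_age.

    launch_times maps an instance id to its timezone-aware launch time.
    """
    return sorted(
        instance_id
        for instance_id, launched in launch_times.items()
        if now - launched > max_age
    )


if __name__ == "__main__":
    now = datetime(2018, 1, 10, tzinfo=timezone.utc)
    fleet = {
        "i-old": datetime(2018, 1, 1, tzinfo=timezone.utc),  # hypothetical id
        "i-new": datetime(2018, 1, 9, tzinfo=timezone.utc),  # hypothetical id
    }
    print(instances_to_recycle(fleet, now))
```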

Docker

Docker and Nvidia-Docker are used to provide a deterministic environment on slaves and are not part of the security measures. This is because all Dockerfiles as well as the Jenkinsfile reside in the GitHub repository and can thus be modified by anybody.

GitHub

The monitored repository is located at https://github.com/apache/incubator-mxnet. All branches and Pull Requests are retrieved on a regular basis, triggered by a Web Hook. This requires running the Web Hook service on the Jenkins web interface port (80/443). To ensure that no anonymous requests are accepted, a shared secret is used. This secret is stored in KMS and retrieved during start-up of the master instance. Secret rotation is not preferred due to heavy resource constraints on the Apache Infra team: they manage the GitHub repository, and every rotation would require a ticket.
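The shared-secret check works by comparing an HMAC of the request body against a signature header sent by GitHub. In this setup the validation is expected to be performed by Jenkins itself; the following minimal sketch only illustrates the mechanism. It assumes the SHA-256 variant of the header (X-Hub-Signature-256), and the function name is hypothetical.

```python
import hashlib
import hmac


def signature_is_valid(secret, payload, signature_header):
    """Check a GitHub-style webhook signature.

    secret and payload are bytes; signature_header is the header value,
    expected in the form "sha256=<hex digest>".
    """
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time to avoid leaking how many
    # characters of the signature matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


if __name__ == "__main__":
    secret = b"example-secret"          # placeholder, not a real secret
    body = b'{"action": "opened"}'      # placeholder payload
    header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    print(signature_is_valid(secret, body, header))
```

A request whose signature does not match is rejected without further processing, so an attacker who does not know the secret cannot trigger builds through the Web Hook endpoint.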

To commit the build status back to GitHub, the credentials of an authorized GitHub account have to be used. These are stored in KMS and retrieved during start-up of the Jenkins Master.
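Reporting a build status corresponds to a POST against the GitHub commit status endpoint, authenticated with the account's token. In this setup the call is made by Jenkins; the sketch below only builds such a request, without sending it, to show where the credential is used. The helper name, the context label and the token value are assumptions.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"
VALID_STATES = {"error", "failure", "pending", "success"}


def build_status_request(repo, sha, state, token, context="ci/jenkins"):
    """Build (but do not send) a commit status request.

    repo is "owner/name"; context is a hypothetical label for this CI.
    """
    if state not in VALID_STATES:
        raise ValueError("unsupported state: {}".format(state))
    body = json.dumps({"state": state, "context": context}).encode("utf-8")
    return urllib.request.Request(
        url="{}/repos/{}/statuses/{}".format(API_ROOT, repo, sha),
        data=body,
        method="POST",
        headers={
            "Authorization": "token {}".format(token),
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_status_request(
        "apache/incubator-mxnet", "abc123", "pending", "dummy-token"
    )
    print(request.get_method(), request.full_url)
```

Since the token grants write access to commit statuses, it has to stay on the Master and must never be exposed to slaves, which execute untrusted code.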

Serverless

Serverless is an infrastructure-as-code tool used to create, wire and deploy AWS Lambdas using CloudFormation. In this setup, it defines all IAM roles, VPCs, security groups and S3 buckets used by Lambda.

Terraform

Terraform is an infrastructure-as-code tool used to define the environment for EC2 instances. In this setup, it defines Route 53, S3, IAM users, IAM roles, IAM policies, VPCs, security groups, EFS volumes and EC2 instances.

Terraform is used to deploy the Jenkins Master. This process consists of the following steps:

  1. Acquire instance
  2. Upload jenkins-config to S3
  3. Store start-up script and data in CloudInit-files
  4. Set Route53 records

CloudWatch

This service is used to store the metrics necessary for auto scaling. All stored information is also publicly available through the public Jenkins REST API (TODO: Link) and is thus not classified. To detect harmful behaviour caused by malfunctioning auto scaling or malicious actions, various alarms are in place.

Lambda

In order to retrieve data for CloudWatch, Lambda executes Python 3.4 scripts that read from the public Jenkins REST API using JenkinsAPI. The deployment of all scripts is triggered manually using Gordon. After the information has been aggregated, it is pushed as metrics to CloudWatch using aws-cli. No other services or servers are accessed.
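The aggregation step can be sketched as a pure function that turns a parsed response of the Jenkins /computer/api/json endpoint into metric entries in the shape CloudWatch accepts. This is not the deployed script: the metric names are hypothetical, and the sample input is made up for illustration. The code avoids newer syntax so that it stays compatible with Python 3.4.

```python
import json


def executor_metrics(computer_api_json):
    """Aggregate node states into CloudWatch-style metric entries.

    computer_api_json is the parsed JSON of Jenkins' /computer/api/json,
    whose "computer" list carries "offline" and "idle" flags per node.
    """
    nodes = computer_api_json.get("computer", [])
    online = [n for n in nodes if not n.get("offline", False)]
    busy = [n for n in online if not n.get("idle", True)]
    # Metric names below are assumptions, not the names used in production.
    return [
        {"MetricName": "OnlineNodes", "Value": len(online), "Unit": "Count"},
        {"MetricName": "BusyNodes", "Value": len(busy), "Unit": "Count"},
        {"MetricName": "OfflineNodes", "Value": len(nodes) - len(online),
         "Unit": "Count"},
    ]


if __name__ == "__main__":
    sample = {"computer": [
        {"displayName": "master", "offline": False, "idle": True},
        {"displayName": "slave-a", "offline": False, "idle": False},
        {"displayName": "slave-b", "offline": True, "idle": True},
    ]}
    # The JSON output could be handed to the metric upload step.
    print(json.dumps(executor_metrics(sample), indent=2))
```

Keeping the aggregation free of network calls means the Lambda only needs read access to the public REST API and write access to CloudWatch metrics.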

CloudFormation

CloudFormation is used to define the environment of Lambda scripts, controlled by Gordon.

S3

S3 is used to store configuration files only. The following S3-buckets are being used:

  • mxnet-ci-master: Jenkins-Master configuration files
  • mxnet-ci-logging: Logging directory
  • gordon-lambda-mxnet-ci: CloudFormation templates, Lambda-libraries and Lambda scripts

Firewall

Security groups

VPC

AWS permissions

Security measures
