The information here will be of interest to people looking to understand, improve and replicate Apache CloudStack test infrastructure across sites.
This document describes the continuously evolving test infrastructure used to set up, deploy, configure, and test Apache CloudStack. The information here is useful for anyone involved in build, test, and continuous integration.
The following diagram shows a simplified representation of the deployed test environment. Over the course of the tests, the environment is switched through the various network models supported by CloudStack. One can also control the underlying hypervisor providing compute power to the cloud. This is a general proof-of-concept and does not necessarily represent the true state of the system at any point; the infrastructure keeps evolving and adapting to CloudStack's features and bug fixes.
The central jenkins (master) instance at jenkins.bacd.org connects over SSH to a jenkins slave within the data center. Traditionally a non-standard port with key-pair-only access is preferred to keep the deployment secure. This slave is a VM with the packages, scripts, and tools needed to drive the rest of the dedicated test infrastructure through the various configurations required for testing. All the necessary configurations are controlled by a matrix of combinations, where each combination is simply a job on the jenkins master.
The jenkins slave instance(s) are placed on your managed infrastructure. This managed infrastructure could itself be an internal CloudStack deployment.
cluster1, cluster2, ... are racks/pods of hypervisor blades outside your regular managed infrastructure that form the testbed for CloudStack setup via jenkins jobs. For this setup we manage all power features using an IPMI network (not shown here).
The storage is usually shared by the entire test-bed infrastructure, but you are free to deploy it the way you have modelled your production installation of CloudStack.
The packages and tools that compose this jenkins slave are described below, along with the purpose of each tool and any special configuration it requires.
At the center of the workflow is the "jenkins slave" appliance that manages the infrastructure. This is a CentOS 6.2 VM running on XenServer. The slave VM is responsible for triggering the entire test run when it is time; this schedule is indicated by the jenkins master through the test jobs.
The slave appliance is composed of the following parts:
cobbler is a provisioning PXE server (and much more) useful for rapid setup of Linux machines. It can do DNS, DHCP, power management and package configuration via puppet. It is capable of managing network installation of both physical and virtual infrastructure. Cobbler comes with an expressive CLI as well as web-ui frontends for management.
Cobbler manages installations through profiles and systems:
The profile and system lists can be examined as follows on our sample setup:
root@infra ~# cobbler profile list
root@infra ~# cobbler system list
For example, if acs-qa-h11 is mapped to the rhel63-KVM profile, cobbler will refresh the machine with a RHEL base OS and install the virtualization packages necessary for KVM.
When a new image needs to be added, we create a 'distro' in cobbler and associate it with a profile's kickstart. Any new systems to be serviced by the profile can then be added easily from the command line.
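As a sketch, the distro/profile/system wiring might be done like this (all names, paths, MACs, and IPs below are hypothetical):

```shell
# import the install tree as a distro (kernel/initrd paths are hypothetical)
cobbler distro add --name=rhel63-x86_64 \
    --kernel=/var/www/cobbler/ks_mirror/rhel63/images/pxeboot/vmlinuz \
    --initrd=/var/www/cobbler/ks_mirror/rhel63/images/pxeboot/initrd.img

# tie the distro to a kickstart via a profile
cobbler profile add --name=rhel63-KVM --distro=rhel63-x86_64 \
    --kickstart=/var/lib/cobbler/kickstarts/kvm.ks

# hook a system up to the profile so it is re-imaged on its next PXE boot
cobbler system add --name=acs-qa-h11 --profile=rhel63-KVM \
    --mac=00:25:90:aa:bb:cc --ip-address=10.1.1.11
cobbler sync
```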
Cobbler reimages machines on demand, but it is up to Puppet recipes to do configuration management within them. Configuration management is required for the KVM hypervisors (e.g. installing the KVM agent) and for the CloudStack management server, which needs MySQL, CloudStack packages, etc. The puppetmasterd daemon on the slave VM is responsible for 'kicking' nodes so that they initiate configuration management on themselves when they come alive.
The slave VM is therefore also the repository of all the Puppet recipes for the various modules that need to be configured for the test infrastructure to work. The modules are placed in /etc/puppet and bear the same structure as our github repo. When we need to effect a configuration change on any of our systems, we only change the github repo and the systems pick up the change on their next run.
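The layout under /etc/puppet follows the usual Puppet module structure; a rough sketch (the module names here are hypothetical) looks like:

```
/etc/puppet/
├── manifests/
│   └── site.pp           # maps nodes to the modules they should apply
└── modules/
    ├── kvm-agent/        # prepares KVM hypervisors
    │   └── manifests/init.pp
    └── mgmt-server/      # mysql + cloudstack management server
        └── manifests/init.pp
```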
DNS is controlled by cobbler, but its host configuration is set within dnsmasq.d/hosts. This is a simple 1-1 mapping of hostnames to IPs. For the most part, this should be the single place one needs to alter when replicating the test setup; everywhere else only DNS names are (or should be) used.
DHCP is also handled by dnsmasq, with all configuration in /etc/dnsmasq.conf. Static MAC-IP-name mappings are given for hypervisors, while virtual instances get dynamic IPs.
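A minimal sketch of what the relevant dnsmasq configuration might contain (all MACs, IPs, and names are hypothetical):

```
# /etc/dnsmasq.conf -- static reservations for hypervisor hosts
dhcp-host=00:25:90:aa:bb:01,acs-qa-h11,10.1.1.11
dhcp-host=00:25:90:aa:bb:02,acs-qa-h12,10.1.1.12

# dynamic pool for virtual instances
dhcp-range=10.1.1.100,10.1.1.200,12h

# PXE-boot new installs off the cobbler server
dhcp-boot=pxelinux.0,infra,10.1.1.2

# /etc/dnsmasq.d/hosts -- the 1-1 hostname/IP mapping used for DNS
10.1.1.2   infra
10.1.1.5   mgmt-server
10.1.1.11  acs-qa-h11
```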
IPMI for power management is set up on all the test servers, and ipmitool provides a convenient CLI for network-booting the machines into PXE.
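For example, forcing a host to reinstall on its next boot might look like this (the host name and credentials are hypothetical):

```shell
# point the next boot at PXE, then power-cycle the blade over the IPMI network
ipmitool -I lanplus -H acs-qa-h11-ipmi -U admin -P secret chassis bootdev pxe
ipmitool -I lanplus -H acs-qa-h11-ipmi -U admin -P secret chassis power cycle
```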
Once CloudStack has been installed and the hypervisors prepared, we are ready to use marvin to stitch together zones, pods, clusters, compute, and storage into a 'cloud'. Once configured, we perform a cursory health check to verify that all systemVMs are running in all zones and that the built-in templates have downloaded in all zones. Subsequently we are able to launch tests on this environment.
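As a rough sketch, marvin consumes a JSON description of the cloud to be stitched together; a heavily trimmed example (all names, IPs, and credentials are hypothetical) might look like:

```
{
    "zones": [{
        "name": "test-zone-1",
        "networktype": "Advanced",
        "pods": [{
            "name": "test-pod-1",
            "clusters": [{
                "clustername": "cluster1",
                "hypervisor": "KVM",
                "hosts": [{ "url": "http://acs-qa-h11",
                            "username": "root", "password": "password" }]
            }]
        }]
    }],
    "mgtSvr": [{ "mgtSvrIp": "10.1.1.5", "port": 8096 }]
}
```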
Only the latest tests from git are run on the setup. This allows us to test in a pseudo-continuous fashion with a nightly build deployed on the environment. Each test run takes a few hours to finish.
There are two github repositories controlling the test infrastructure:
a. The Puppet recipes at gh:acs-infra-test
b. The gh:cloud-autodeploy repo that has the scripts to orchestrate the overall workflow
When jenkins triggers the test job, the following sequence of actions occurs on the test infrastructure:
$ python configure.py -v $hypervisor -d $distro -p $profile -l $LOG_LVL
The $distro argument chooses the host OS of the management server; this can be ubuntu or rhel. LOG_LVL can be set to INFO/DEBUG/WARN for troubleshooting and more verbose log output.
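For instance, a run that prepares a KVM testbed with a RHEL management server and verbose logging might be launched as (the profile name is hypothetical):

```shell
python configure.py -v kvm -d rhel -p rhel63-KVM -l DEBUG
```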
The jenkins pipeline (sequence of jobs) for the above workflow of tests is described below in flowchart format.
View the configuration of test-matrix/test-matrix-extended jobs to understand further.
Any of the several steps in the workflow can fail. If you find a failure, please raise the issue on the dev@ mailing list.