
"My company has a xyz feature for CloudStack. Can someone help me test it?"

This page proposes a unified design for QA testing and automation. Treat it as a blueprint to follow for a QA set-up (think of a "QACloud", analogous to DevCloud).

Any comments / suggestions will be much appreciated.

Goals

  1. Enable the community to run QA tests in an automated way
  2. Make it easy to replicate a set-up
  3. Make it possible to run BVTs (build verification tests) continuously
  4. Overall, work together as a community towards keeping the master branch stable

Keeping the set-up simple and easy to replicate is one of the main goals of this exercise, so it may not cater to every possible use case.
The code is based on Prasanna's earlier effort along similar lines (big thanks to Prasanna!), and in future it should enable the creation of test-beds for unified testing.

To keep the master branch stable nearly all the time, the proposal is to create a "staging" branch that receives all check-ins. Once the BVTs pass against the staging branch, the commits may be merged into the master branch. Ideally, this process should be automated.

Test Run Requirements

  1. Every test run should be isolated from the previous test run
  2. Clean up the hypervisors between runs
  3. Spawn a new management server for each run
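
One way to meet these requirements is to re-provision every baremetal host before a run. The sketch below only builds the commands and does not execute them. It assumes the hosts are registered in Cobbler under names such as xen1 and are reachable over IPMI; the function name, the addresses, the credentials and the choice of the "lanplus" interface are illustrative assumptions, not part of the existing code.

```python
import shlex


def reset_commands(host, ipmi_addr, ipmi_user, ipmi_password, interface="lanplus"):
    """Return the commands that would send one host back through a PXE install.

    Assumption: 'host' is the Cobbler system name and 'ipmi_addr' is the
    address of its management controller on the IPMI network.
    """
    ipmi = ["ipmitool", "-I", interface, "-H", ipmi_addr,
            "-U", ipmi_user, "-P", ipmi_password]
    return [
        # Ask Cobbler to serve an installer to this host on its next PXE boot.
        ["cobbler", "system", "edit", "--name=" + host, "--netboot-enabled=true"],
        # Boot from the network next time, then power-cycle the machine.
        ipmi + ["chassis", "bootdev", "pxe"],
        ipmi + ["chassis", "power", "cycle"],
    ]


if __name__ == "__main__":
    # Hypothetical host and credentials, printed for inspection only.
    for cmd in reset_commands("xen1", "10.0.0.11", "root", "<IPMI password>"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Returning the commands as lists keeps the sketch easy to inspect; a real job would hand each list to subprocess and stop on the first failure.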

Infrastructure

Every set-up would contain :

  1. A XenServer host to run the VM for Cobbler etc. and the CloudStack management server
  2. An isolated private network that is RFC 1918 compliant (the 172.16.0.0/16 subnet is recommended so that the configuration is easier to follow). This allows CloudStack to run in isolation
  3. Three baremetal hosts to serve as the "Cloud"
  4. A dedicated IPMI network for baremetal provisioning
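
As a quick sanity check of an address plan against the recommended subnet, something like the following could be used. It is a minimal sketch built on the standard ipaddress module; the host names and addresses are made up for illustration.

```python
import ipaddress

# Subnet recommended above for the isolated network.
PRIVATE_NET = ipaddress.ip_network("172.16.0.0/16")


def hosts_outside(hosts, network=PRIVATE_NET):
    """Return the names of hosts whose address is not inside 'network'."""
    if not network.is_private:
        raise ValueError("%s is not a private (RFC 1918) network" % network)
    return [name for name, addr in hosts.items()
            if ipaddress.ip_address(addr) not in network]


if __name__ == "__main__":
    # Hypothetical address plan for the three baremetal hosts.
    plan = {"xen1": "172.16.0.11", "xen2": "172.16.0.12", "xen3": "192.168.1.13"}
    print("outside the private network:", hosts_outside(plan))
```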

At the heart of the set-up is a VM that serves as a Jenkins slave and contains a set of tools that allow one to run QA tests on a clean environment. These tools, hosted on the VM (let's call it "QACloud"), are :

| Tool / Software | Use | Comments |
|-----------------|-----|----------|
| Cobbler | Provision baremetal hosts with hypervisor; provision management server OS | Cobbler integrates well with Puppet and provides useful helper scripts |
| Puppet | Configure management server with mysql etc | |
| DNSMASQ | DHCP + DNS | Used to provision hosts |
| Squid | HTTP Proxy | Testing infrastructure uses isolated network. Some tasks, like yum install, need internet access |
| NFS Server | As NFS secondary storage | Packaged systemVM and builtin templates for quicker download |
| ipmitool | Interact with baremetal machines through IPMI network | |
| Marvin | Deployment and testing framework for CloudStack | |

All of the code is Python based, making it easier to package as needed in future.

A typical deployment would have the QACloud VM also act as a Jenkins slave for builds and for launching the test suite.

Understanding Process Flow

There are two steps in the QA process :

  1. Set up Cloud
  2. Launch and report test results

Typically, setting up the Cloud is a separate Jenkins job; when it succeeds, the test suite is launched in a different job. Note that the code contains the basic elements needed to set up a single Jenkins job for both steps, though that may be very slow (since BVTs are run sequentially in this case to avoid Marvin issues).
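
The two-step flow, with the suites run one at a time, can be sketched as below. This is only an illustration of the control flow; run_pipeline and the callables passed to it are hypothetical and do not correspond to functions in the actual code.

```python
def run_pipeline(setup_cloud, suites):
    """Run the cloud set-up, then each test suite sequentially.

    setup_cloud: callable returning True when the Cloud is ready.
    suites: list of (name, callable) pairs; each callable returns True on pass.
    """
    if not setup_cloud():
        # No point launching tests against a half-built Cloud.
        return {"setup": False, "results": {}}
    results = {}
    for name, suite in suites:
        # Sequential on purpose: one suite at a time.
        results[name] = bool(suite())
    return {"setup": True, "results": results}


if __name__ == "__main__":
    # Stand-in callables; a real job would deploy the Cloud and call Marvin.
    report = run_pipeline(lambda: True,
                          [("smoke", lambda: True), ("network", lambda: False)])
    print(report)
```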

Setting up from scratch

Follow the instructions here to set-up the infrastructure from scratch. This is a more involved task and needs familiarity with the CI tools.

It lists the commands / scripts used to set up Cobbler, the Puppet manifests etc. A sample configuration for DNSMASQ is provided and can easily be changed to suit your needs.

Easy Set-Up

Setting up the infrastructure

  1. Budget at least 4 machines for your set-up. Install XenServer (6+) on one machine to host the QACloud VM and the management server.
  2. Set up and configure the IPMI network and enable the hosts for IPMI. This link may be useful : http://stuff.mit.edu/afs/athena/dept/cron/documentation/dell-server-admin/en/DRAC_5/racugc1j.htm
  3. Set up a private switch for the hosts on eth0. Reference config is available here
  4. Import the QACloud template by downloading it from here
  5. Change the MAC addresses in Cobbler (cobbler system edit --mac-address=<MAC> --name=xen1/2/3 ), in DNSMASQ (/etc/dnsmasq.conf) and in the system.properties file
  6. Set the IPMI password in the configure script
  7. Set up public internet access for the QACloud VM so that it can proxy requests as needed. This may involve setting up the correct IP routes. (Be careful here, as incorrect route / DNS configs will manifest as weird network boot issues)
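
For step 5, the Cobbler commands and the matching DNSMASQ entries can be generated from one table of hosts so that the two stay in sync. This is a minimal sketch: the cobbler command follows the form given in step 5, the dhcp-host line uses the standard dnsmasq "MAC,name,IP" form, and the MAC and IP values are placeholders. The actual layout of /etc/dnsmasq.conf on the QACloud VM may differ, and system.properties is not covered here.

```python
import re

MAC_PATTERN = re.compile(r"^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$")


def mac_updates(hosts):
    """Build Cobbler commands and dnsmasq lines from {name: (mac, ip)}."""
    cobbler_cmds, dnsmasq_lines = [], []
    for name, (mac, ip) in hosts.items():
        if not MAC_PATTERN.match(mac):
            raise ValueError("bad MAC address for %s: %r" % (name, mac))
        # Same form as the command quoted in step 5.
        cobbler_cmds.append(
            "cobbler system edit --mac-address=%s --name=%s" % (mac, name))
        # Static DHCP lease so the host always gets the same address.
        dnsmasq_lines.append("dhcp-host=%s,%s,%s" % (mac, name, ip))
    return cobbler_cmds, dnsmasq_lines


if __name__ == "__main__":
    # Placeholder values; substitute the MAC addresses of your own hosts.
    cmds, lines = mac_updates({
        "xen1": ("00:11:22:33:44:01", "172.16.0.11"),
        "xen2": ("00:11:22:33:44:02", "172.16.0.12"),
        "xen3": ("00:11:22:33:44:03", "172.16.0.13"),
    })
    print("\n".join(cmds + lines))
```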

Advanced Options

  1. If the switch config does not mimic the provided reference, change the IP addresses in DNSMASQ and in the system.properties file
  2. Set the correct routes for the IP configuration in QACloud and on its host XenServer

Once everything is set up correctly, the configuration should look something like the below. After that, proceed to adding the Jenkins jobs as described in the next section.

Jenkins Jobs

Set up a Jenkins job for infrastructure provisioning. This is a simple Jenkins job that packages the CloudStack RPMs and launches the tester.py Python script. The following configuration may be used :

export M2_HOME=/usr/local/apache-maven-3.1.0
export M2=/usr/local/apache-maven-3.1.0/bin
export PATH=/usr/local/apache-maven-3.1.0/bin:$PATH
PACKAGE_NAME="CloudStack_Auto-rhel6.3"
cd $WORKSPACE
rm -rf dist/rpmbuild/*
cd packaging/centos63
./package.sh
cd ../..
tempdir=`mktemp -d`
mkdir -p "$tempdir"
cp dist/rpmbuild/RPMS/x86_64/*.rpm $tempdir/
createrepo $tempdir/
mv $tempdir $PACKAGE_NAME
tar -cvzf $PACKAGE_NAME.tar.gz $PACKAGE_NAME
cd /root/cloud-autodeploy
python2.7 /root/cloud-autodeploy/tester.py $WORKSPACE xen

Set up a Jenkins job for launching the BVT suite (TODO : Insert details from QA here)

Set up a Jenkins post-build task to launch the continuous build and provide Git cherry-pick info by adding the lines below to the post-build shell script task :

python2.7 /root/cloud-autodeploy/cherryPickCommits.py <JenkinsUrl> <JenkinsUser> <JenkinsPassword>
python2.7 /root/cloud-autodeploy/kickContinuousBuild.py <JenkinsUrl> <JenkinsUser> <JenkinsPassword> <Jenkins Job Name>
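
For orientation, triggering a Jenkins job remotely amounts to an authenticated POST to the job's build URL. The sketch below builds such a request without sending it. It is not the content of kickContinuousBuild.py; the function name, URL, user and job name are assumptions, and depending on the Jenkins version and security settings an API token or CSRF crumb may also be needed.

```python
import base64
import urllib.parse
import urllib.request


def build_trigger_request(jenkins_url, user, password, job_name):
    """Return a POST request for <jenkins_url>/job/<job_name>/build."""
    url = "%s/job/%s/build" % (jenkins_url.rstrip("/"),
                               urllib.parse.quote(job_name, safe=""))
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    request = urllib.request.Request(url, data=b"", method="POST")
    # HTTP basic authentication header.
    request.add_header("Authorization",
                       "Basic " + base64.b64encode(credentials).decode("ascii"))
    return request


if __name__ == "__main__":
    # Hypothetical server and job; nothing is sent over the network here.
    req = build_trigger_request("http://jenkins.example:8080/", "qa", "secret",
                                "bvt staging")
    print(req.get_method(), req.full_url)
    # To actually trigger the job: urllib.request.urlopen(req)
```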

Merging commits from staging to master

This part relates to the overall goal of keeping the master branch relatively stable, and it makes cherry-picking commits an easier process than having someone monitor a mailing list.

Thus, all BVTs run against a "staging" branch. Note that "master" here is not necessarily the repository's master branch; the term refers to any branch you want to keep stable in order to create an RC from it.

The flow here is :

  1. Kick off a Python script after the BVT run against "staging"
  2. If the BVT run is successful, get the commit SHAs from the last successful BVT run up to the current run and merge those commits to "master"
    1. In case of a merge / git rebase failure, notify the developers.
  3. If the BVT run is unsuccessful, get the commit SHAs from the last successful BVT run up to the current run, and mail the developers to look at the results and see whether their commit broke the BVT

Thus, the flow ensures that only clean commits make their way to the master branch. This should enable us to go one step further in making master almost always stable.
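
The decision logic of this flow can be sketched as follows. It is an illustration only, not the cherryPickCommits.py script: handle_bvt_result, the merge and notify callables, and the use of git rev-list to list the commits between two runs are assumptions made for the example.

```python
def commit_range_command(last_good_sha, current_sha):
    """git command listing commits after last_good_sha up to current_sha, oldest first."""
    return ["git", "rev-list", "--reverse",
            "%s..%s" % (last_good_sha, current_sha)]


def handle_bvt_result(bvt_passed, commits, merge, notify):
    """Apply the staging-to-master flow and return the SHAs that were merged.

    commits: SHAs since the last successful BVT run, oldest first.
    merge(sha): returns True if the commit applied cleanly to "master".
    notify(subject, shas): tells the developers about a problem.
    """
    if not bvt_passed:
        notify("BVT failed: please check whether your commit broke it", commits)
        return []
    merged = []
    for sha in commits:
        if not merge(sha):
            # Stop at the first conflict so master is left in a known state.
            notify("Merge to master failed", [sha])
            break
        merged.append(sha)
    return merged


if __name__ == "__main__":
    print(" ".join(commit_range_command("<last good SHA>", "<current SHA>")))
    # Stand-in callables: the second commit is made to conflict.
    done = handle_bvt_result(True, ["a1", "b2", "c3"],
                             merge=lambda sha: sha != "b2",
                             notify=lambda subject, shas: print(subject, shas))
    print("merged:", done)
```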

The obvious question to ask is : how do we ensure commits pass successfully across all hypervisor configuration types?

This goes back to providing a unified design for setting up the infrastructure, and tying all configurations together using a Jenkins job. At the end of that Jenkins job, the above flow is kicked off.

Troubleshooting

  • Cobbler / DNSMASQ issues : Check the IP routes first. Problems here manifest as PXE boot errors, or as DHCP sending a hostname instead of an IP address in the PXE response. This has been a source of a lot of frustration.
  • Puppet errors : Check the certificates. Remove any .pem file in /var/lib/puppet/ssl that is empty
  • Marvin issues : The Python "requests" package may mask errors that cause Marvin to fail even though everything looks right in the Management Server logs. Check for errors from this package when things appear to fail randomly.
  • Management Server issues : Very rarely, the management server fails to come up even though service cloudstack-management restart returns successfully. There is currently no known resolution other than waiting until the next deployment is done
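
For the Puppet certificate check, empty .pem files can be located with a few lines of Python. This is a minimal sketch that only lists the files; the directory is the one named above, and deleting the files is left to the operator.

```python
import os


def empty_pem_files(ssl_dir):
    """Return the sorted paths of zero-byte .pem files under ssl_dir."""
    found = []
    for root, _dirs, files in os.walk(ssl_dir):
        for name in files:
            path = os.path.join(root, name)
            if name.endswith(".pem") and os.path.getsize(path) == 0:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    for path in empty_pem_files("/var/lib/puppet/ssl"):
        # Review the list first, then remove with os.remove(path) if appropriate.
        print("empty certificate file:", path)
```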

Enhancements

1. Integrate across multiple hypervisor runs

2. Better scheduling using a pool of hosts

3. Providing hooks so one can customize actions to be taken in case of failures.

Open Questions

1. The VM is pretty big. Where do we host it? People can set up using the provided instructions too

2. Have a Hackathon to improve this / set up from scratch?

3. To keep template size manageable, the VM has ~50GB of space for NFS. This may run out quickly. Add instructions to add a new volume to VM to serve as NFS?
