Bug Reference

The Jira issue associated with this design spec

Branch

NA

Introduction

Purpose

* Provide fast build or rebuild of environments for testing.
* Enable multiple independent concurrent builds.
* Be available on demand, through automation or individual request.
* Be capable of fully utilising all available hardware.
* Be flexible enough to build highly realistic development environments.

We intend to contribute and maintain our work within the Apache repos.


We envision that Trillian would cater for a number of use cases:
1. CloudStack community integration testing of master against multiple deployment scenarios (using ASF infra)
2. CloudStack community integration testing of PRs against multiple deployment scenarios (using ASF infra)
3. Organisations/individuals running the full suites of tests available in Marvin against any physical environment they have.
4. Organisations/individuals deploying and running the full suites of tests available in Marvin against virtualised infrastructures which can be deployed by Marvin.

As we intend Trillian to test multiple environments concurrently, we use nested virtualisation on ESXi hosts (our testing has shown that this is the only hypervisor which can support the nested virtualisation of all other hypervisors with reasonable performance). We use Ansible to deploy and configure all aspects of the build, as this will greatly lower the barrier to entry for independent testers.

We use CloudStack to provision the management server and virtualised (nested) hosts on the physical hosts. We are creating Ansible playbooks and roles which can:

1. Create guest instances using Rene's Ansible 2.0 CloudStack modules: a Marvin VM, a management server (CentOS or Ubuntu) and any number of compute hosts (KVM, vSphere or XenServer; Hyper-V later).
2. Configure hosts (including installing the relevant CloudStack agent where required).
3. Install the required ACS packages on the management server.
4. Configure a zone (including adding the compute hosts) via Marvin.
5. Run the required Marvin tests.
6. Return the results.
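
The six steps above run strictly in order, and a failed step makes the later ones pointless. The following is a minimal sketch of that control flow only, not Trillian's implementation (which is driven by Ansible playbooks and roles). The function `run_pipeline`, the stage names and the shape of the result list are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A stage is a (name, action) pair; the action receives a shared
# environment dict. Both the type alias and this shape are assumptions.
Stage = Tuple[str, Callable[[Dict], None]]


def run_pipeline(stages: List[Stage], env: Dict) -> List[Tuple[str, str]]:
    """Run stages in order and stop at the first failure.

    Returns one (stage name, status) pair per stage that was attempted.
    """
    results: List[Tuple[str, str]] = []
    for name, action in stages:
        try:
            action(env)
            results.append((name, "ok"))
        except Exception as exc:
            # Record the failure and skip the remaining stages.
            results.append((name, f"failed: {exc}"))
            break
    return results
```

In a real run, each action would invoke the corresponding playbook or Marvin command; here they are plain callables so that the ordering and early-exit behaviour are easy to see.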

We may need to propose enhancements to Marvin in order to sync the configuration of hosts with the configuration used by Marvin.

Using virtualised test environments, we can run multiple test scenarios concurrently. To do this, we have found it necessary to create pools (ranges) of VLANs and IP addresses and allocate them to environments. For any given physical environment which will be used for testing, we take the total range(s) of IPs and VLANs available and carve them into non-overlapping chunks suitable for concurrent use as mgmt, public and guest networks. These chunks are stored in a MariaDB database.

When a range is being used in a testing environment, it is marked as 'inuse' in the database. When creating a test environment, Trillian looks in the database for the next available VLAN range, the next available public IP range and so on. The returned values are used to populate a Marvin cfg file, which in turn is used both to build the environment and to run the Marvin tests. When the virtualised infra is cleaned up, the database is updated to show that the used ranges are available again.
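
The carve, allocate and release cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Trillian's actual code: it uses Python's built-in sqlite3 module as a stand-in for MariaDB, and the table name `ranges`, its columns, and the helpers `carve`, `init_pool`, `allocate` and `release` are all hypothetical.

```python
import sqlite3


def carve(start, end, size):
    """Split the inclusive range [start, end] into non-overlapping chunks."""
    return [(lo, min(lo + size - 1, end)) for lo in range(start, end + 1, size)]


def init_pool(conn, kind, chunks):
    """Store the chunks for one kind of resource (e.g. 'guest_vlan')."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ranges ("
        " id INTEGER PRIMARY KEY,"
        " kind TEXT NOT NULL,"
        " range_start INTEGER NOT NULL,"
        " range_end INTEGER NOT NULL,"
        " inuse INTEGER NOT NULL DEFAULT 0,"
        " env TEXT)"
    )
    conn.executemany(
        "INSERT INTO ranges (kind, range_start, range_end) VALUES (?, ?, ?)",
        [(kind, lo, hi) for lo, hi in chunks],
    )
    conn.commit()


def allocate(conn, kind, env):
    """Mark the next free chunk of this kind as in use and return it."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id, range_start, range_end FROM ranges"
            " WHERE kind = ? AND inuse = 0"
            " ORDER BY range_start LIMIT 1",
            (kind,),
        ).fetchone()
        if row is None:
            raise RuntimeError(f"no free {kind} range left")
        conn.execute(
            "UPDATE ranges SET inuse = 1, env = ? WHERE id = ?", (env, row[0])
        )
    return row[1], row[2]


def release(conn, env):
    """Return every chunk held by an environment to the pool."""
    with conn:
        conn.execute(
            "UPDATE ranges SET inuse = 0, env = NULL WHERE env = ?", (env,)
        )
```

For example, after carving VLANs 100 to 199 into chunks of 25, the first `allocate(conn, "guest_vlan", "env1")` on a fresh pool hands out 100 to 124, and `release(conn, "env1")` makes that chunk available again. With concurrent builds against a real MariaDB server, the select and the update would need to be made atomic, for instance with row locking; this sketch does not attempt that.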

This initiative has only recently been started, and we are still working out the requirements (and quirks) of the individual pieces and looking for the most suitable wrapper to glue it all together.

I have also found that Marvin needs some work to make its output more meaningful and readable (especially in the case of errors and exceptions) and to make it more intelligent about which tests it can and cannot run given the chosen infrastructure components. In addition, some ISO and template paths hardcoded into Marvin or individual tests are unreachable or very slow.

We plan to enhance tests to address these issues and also reduce run times where possible.

 

References

  • relevant links

Document History

Glossary

Feature Specifications

  • put a summary or a brief description of the feature in question 
  • list what is deliberately not supported or what the feature will not offer - to clear any prospective ambiguities
  • list all open items or unresolved issues the developer is unable to decide about without further discussion
  • quality risks (test guidelines)
    • functional
    • non functional: performance, scalability, stability, overload scenarios, etc
    • corner cases and boundary conditions
    • negative usage scenarios
  • specify supportability characteristics:
    • what new logging (or at least the important one) is introduced
    • how to debug and troubleshoot
    • what are the audit events 
    • list JMX interfaces
    • graceful failure and recovery scenarios
    • possible fallback or workaround route if the feature does not work as expected, if such workarounds exist
    • if the feature depends on other run-time environment related requirements, provide a sanity checklist for support people to run
  • explain configuration characteristics:
    • configuration parameters or files introduced/changed
    • branding parameters or files introduced/changed
    • highlight parameters for performance tweaking
    • highlight how installation/upgrade scenarios change
  • deployment requirements (fresh install vs. upgrade) if any
  • system requirements: memory, CPU, disk space, etc
  • interoperability and compatibility requirements:
    • OS
    • XenServer, hypervisors
    • storage, networks, other
  • list localization and internationalization specifications 
  • explain the impact and possible upgrade/migration solution introduced by the feature 
  • explain performance & scalability implications when feature is used from small scale to large scale
  • explain security specifications
    • list your evaluation of possible security attacks against the feature and the answers in your design
  • explain marketing specifications
  • explain levels or types of users communities of this feature (e.g. admin, user, etc)

Use cases

put the relevant use case/stories to explain how the feature is going to be used/work

Architecture and Design description

  • discussion of alternatives amongst design ideas, their resources/time tradeoffs and limitations. Explain why a certain design idea is chosen over others
  • highlight architectural patterns being used (queues, async/sync, state machines, etc)
  • talk about main algorithms used
  • explain what components are being changed and what the dependent components are
  • regarding database: talk about tables being added/modified
  • performance implications: what are the improvements or risks introduced to capacity, response time, resources usage and other relevant KPIs
  • preferably show class diagrams, sequence diagrams and state diagrams
  • if possible, publish signatures of all methods that classes and interfaces implement, and explain the object information of the different classes

Web Services APIs

list changes to existing web services APIs and new APIs introduced, with signatures and thorough documentation

UI flow

  • either demonstrate it visually here or link to relevant mockups

IP Clearance

  • what dependencies will you be adding to the project?
  • are you expecting to include any code developed outside the Apache CloudStack project?

Usage Impact

  • Are there any entities being created that require usage reporting for billing purposes? 

  • Does this change any existing entities for which usage is being tracked already?

Appendix

Appendix A:
