Part of this AIP is being implemented in AIP-47 New design of Airflow System Tests
In the Apache Airflow project, contributors need to run system tests against external systems (for example Google Cloud Platform) automatically, specifically before merging any pending changes to the main repository.
We already have a community-shared way to run unit tests automatically for Apache Airflow. The approach for contributing to Airflow (as described in the CONTRIBUTING documentation) is to create your own fork with your own copy of the Travis CI project running unit tests automatically.
There are CI scripts and an environment in Airflow that allow Travis CI to run unit tests automatically, but there is no execution of system tests, nor of any other tests that require communication with a real external system such as a Google Cloud Platform project. There is currently no Community-shared way to run such System Tests automatically.
Examples of such System Test DAGs are those developed for the Google Cloud Platform operators (currently in the CLOUD_BUILD branch, which will hopefully soon be merged to master):
- Google Compute Engine operator examples - including Instance Group Management
- Google Cloud Function operator examples
- Google Cloud Spanner operator examples
- Google Cloud SQL operator examples - including Google Cloud SQL Query operator
- Google Cloud Storage ACL operator examples
- Google Cloud Bigtable operator examples
Those DAGs are used for two purposes:
- they are used as example documentation sources. For example, the documentation of the Google Compute Engine operators is generated using the examples.
- they are actually runnable examples, provided that the environment variables are configured properly and authentication works.
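The pattern these example DAGs follow can be sketched as below. The variable names here are illustrative assumptions, not the exact ones used in the repository: every resource identifier is read from an environment variable with a harmless default, so the same file serves as a documentation source and becomes a runnable test when the variables point at a real GCP project.

```python
import os

# Hypothetical variable names for illustration; the real example DAGs
# define their own. Each resource identifier falls back to a default so
# the file always imports cleanly for documentation builds.
GCP_PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "example-project")
GCE_INSTANCE = os.environ.get("GCE_INSTANCE", "testinstance")
GCE_ZONE = os.environ.get("GCE_ZONE", "europe-west1-b")

# The operators in the example DAG are then parameterised with these
# values, so pointing the variables at a real project makes the DAG
# runnable end to end.
```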
The tests can be run through airflow and should succeed by performing the full lifecycle of the service in question (Compute Instance, Cloud Function etc.). Running those examples has been wrapped in unit-test-like system test classes that are ignored by default but can be run automatically when the proper variables are set. They also have helpers that set up and tear down the costly environment for such service tests automatically:
- Compute System Test and Compute System Test Helper
- Cloud Function System Test
- Spanner System Test and Spanner System Test Helper
- Cloud SQL System Test and Cloud SQL System Test Helper
- Cloud SQL Query System Test and Cloud SQL Query System Test Helper
- Cloud Storage ACL System Test
- Bigtable System Test and Bigtable System Test Helper
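A minimal sketch of how such a system test class can stay ignored by default, assuming a hypothetical GCP_PROJECT_ID variable (the real classes and helpers in the repository use their own names and a richer setup/teardown):

```python
import os
import unittest

# Assumed variable name, for illustration only.
GCP_PROJECT_ENV = "GCP_PROJECT_ID"


class ExampleGcpSystemTest(unittest.TestCase):
    """Hypothetical system test: skipped unless credentials are configured."""

    @unittest.skipUnless(
        os.environ.get(GCP_PROJECT_ENV),
        "set %s to run GCP system tests" % GCP_PROJECT_ENV,
    )
    def test_run_example_dag(self):
        # In the real system tests this would create the costly resources
        # via a helper, run the example DAG through airflow, assert that
        # the full service lifecycle succeeded, and tear everything down.
        pass
```

With the variable unset the test is simply reported as skipped, so ordinary unit test runs are unaffected; setting it (plus working authentication) turns the same class into a real system test.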
As part of the Google Cloud operators implementation, a Cloud Build configuration was also implemented that allows running all the System Tests automatically, using a privately owned/billed Google Cloud Platform project. Such a build also requires an integration with the Airflow Breeze development environment, which was developed for this specific purpose: to help with faster development of Google Cloud related operators. The design of the Breeze environment is here; it covers two usages of the environment: support for Cloud Build, but also support for a local development workflow which might become the base for, or be merged with, the AIP-7 Simplified development workflow work.
It would be a great improvement in quality if we could have such system tests executed automatically before any merge to the main project.
Running System Tests for Google Cloud Platform requires a Google Cloud Platform project with billing enabled and appropriate service accounts with the necessary permissions to perform those operations. This can be either a private account of the developer/team developing the operators, or eventually the Apache Airflow community could have a shared GCP project to run such tests automatically on approved pull requests before merge.
A similar approach could be reused for other cloud/external service operators, not only for Google Cloud Platform.
For now we can focus only on Google Cloud Platform operators and later reuse the learnings for other clouds/external services.
There are several services (and more coming) for Google Cloud Platform. Sharing this project (and the associated service account(s)) is potentially dangerous if anyone can obtain the credentials and use the service accounts. This means that forked/private repositories should use their own GCP projects and service accounts, and set up Travis CI to use those for test executions. This should be configurable but easy to share within the team working on the same fork.
Eventually, a shared GCP project/service account might be used to run tests for the main repository before the merge to master happens. That would be a sanity check verifying that there is no special/forgotten setup in the personal GCP projects that prevents those tests from running for others.
The tests in the main GCP projects should only be run after at least a code review, and possibly some kind of automated "vulnerability" inspection that could prevent attempts to abuse the GCP environment. Adversarial attacks on open-source infrastructure have recently become a recognised and powerful vector of attack, as they are traditionally difficult to prevent: community/open-source projects are often rather relaxed about security, but they are used in sometimes millions of commercial installations. Attacking open-source infrastructure is usually much simpler than attacking a commercial installation directly. The threat is real and is actively exploited. A high-profile example is the recent Gentoo repo hack. There is a nice short write-up about the looming dangers in open-source infrastructure.
System tests tend to run much slower than unit tests. There should be very few of those tests, but even the few will take several minutes rather than the seconds usual for unit tests.
Proposed changes to the workflow/infrastructure
This is possible even now, without a special shared Google Cloud project or big changes to the workflow:
System Tests automation as implemented with Airflow Breeze can be run by anyone who has a billable GCP project
Cloud Build integration with GCP is an optional step, needed only if the team working on their fork has its own GCP project and sets up the Cloud Build integration
System Tests execution is already conditional and disabled by default, unless credentials are properly set up for Cloud Build
There is already a bootstrapping process that creates appropriate service accounts and sets up the GCP project to be able to run the tests automatically
System tests should only be run as prerequisites for pull requests to become mergeable, because it takes a lot of time and resources to run them
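The conditional behaviour described above can be sketched like this; the variable and path names are assumptions for illustration, not the actual Breeze/CI configuration:

```python
import os

# Assumed names; the real Breeze configuration defines its own variables.
REQUIRED_CREDENTIAL_VARS = ("GCP_PROJECT_ID", "GCP_SERVICE_ACCOUNT_KEY")


def system_tests_enabled(environ):
    """System tests are enabled only when every credential variable is set."""
    return all(environ.get(var) for var in REQUIRED_CREDENTIAL_VARS)


def select_test_targets(environ=None):
    """Unit tests always run; system tests are appended only when enabled."""
    environ = os.environ if environ is None else environ
    targets = ["tests/unit"]
    if system_tests_enabled(environ):
        targets.append("tests/system")
    return targets
```

A fork with its own GCP project sets the credentials as secrets in its CI configuration; everyone else silently gets only the unit test run.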
Requires common GCP project/service account and workflow adaptation
System tests using shared credentials in the main Airflow repository should be run only after code from forks has been reviewed and approved, but before the merge happens, to verify that the tests will be runnable by everyone.
The main blocker to being able to run integration tests against a live GCP in any kind of automated fashion is cost: we'd have to get Google (or someone else) to sponsor us with GCP credits.
I think we could simply provide support first for anyone who has their own Travis CI copy (and their own project in GCP). I guess all contributors work in some companies and can use some GCP projects of their own, or create a new one; the cost will be cents rather than dollars even if they run the tests with every push. We can make it super easy to set up: add credentials as secrets in Travis CI under certain environment variables, and when those variables are set, we will automatically run all integration tests in the CI script.
Then, when we see the usefulness of it, we can start a discussion (with Google, for example) about credits for a shared project/main repo. I think it might be super useful, and people might actually start using it for other operators if it is that easy to set up.
This is also related to AIP-10 Multi-layered and multi-stage official Airflow CI image and AIP-7 Simplified development workflow. I ported the GCP-independent parts of Airflow Breeze into Apache Airflow as part of AIP-10, addressing most of the AIP-7 needs. The idea is that we could then refactor the GCP-based solution to use the official image (the same one used in Travis CI builds) and build on top of that. I will resume working on this once AIP-10 is (hopefully) implemented.