Developing with the Python SDK
Gradle can build and test Python, and it is used by the Jenkins jobs, so it needs to be maintained.
You can also use the Python toolchain directly instead of having Gradle orchestrate it. This may be faster for you, but the choice is yours. If you do want to use Python tools directly, we recommend setting up a virtual environment before testing your code.
If you update any of the cythonized files in the Python SDK, you must install the
Cython package before running the following command to properly test your code.
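A quick way to fail fast when Cython is missing is a check like the sketch below, run in the environment you are about to test in. It assumes only that the Cython package installs a top-level module named `Cython`; the helper names `cython_available` and `require_cython` are hypothetical and not part of Beam.

```python
import importlib.util


def cython_available():
    """Return True if the Cython package is importable in this interpreter."""
    # find_spec returns None when no top-level module named "Cython" exists.
    return importlib.util.find_spec("Cython") is not None


def require_cython():
    """Raise a readable error instead of letting the build fail later on."""
    if not cython_available():
        raise RuntimeError(
            "Cython is not installed in this environment; "
            "run 'pip install cython' and try again.")
```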
The commands below assume that you're in the
Virtual Environment Setup
Setting up a virtualenv is required for running tests directly, such as via pytest or an IDE like PyCharm.
This installs the Python SDK from source, including the test and gcp dependencies.
You can deactivate the virtualenv when done.
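For reference, the same setup can be scripted with the standard-library `venv` module. This is a minimal sketch, not Beam tooling: the helper names are hypothetical, and the `.[gcp,test]` extras spec assumes the extras are named `gcp` and `test`, as the description above suggests, and that you run it from the directory containing the SDK's `setup.py`.

```python
import os
import venv


def create_dev_env(env_dir, with_pip=True):
    """Create a virtualenv in env_dir and return the path of its interpreter."""
    venv.create(env_dir, with_pip=with_pip)
    # POSIX virtualenvs keep executables in bin/, Windows ones in Scripts\.
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def editable_install_cmd(python, extras=("gcp", "test")):
    """Build the pip command that installs the SDK from source with extras."""
    # Assumption: '.' is the SDK root, i.e. the directory holding setup.py.
    return [python, "-m", "pip", "install", "-e",
            ".[{}]".format(",".join(extras))]


# Example (the install step needs network access):
#   python = create_dev_env("env")
#   subprocess.run(editable_install_cmd(python), check=True)
```

Using `python -m pip` with the virtualenv's own interpreter guarantees the packages land in that environment even if it has not been activated.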
Virtual Environments with pyenv
pyenv, a more advanced option, lets you download, build, and locally install any version of Python, regardless of which versions your distribution supports.
pyenv also has a virtualenv plugin, which manages the creation and activation of virtualenvs.
The caveat is that you'll have to take care of any build dependencies, and those are probably still constrained by your distribution.
These instructions were made on a Linux/Debian system.
Setup pyenv (with virtualenv plugin)
- Install prerequisites for your distribution.
- curl | bash
- Add the required lines to ~/.bashrc (as returned by the script) and open a new shell.
Example: Running unit tests with PyCharm using Python 3.5.2 in a virtualenv
Install Python 3.5.2 and create a virtualenv
- Optional (install older library development files required by 3.5.2):
apt install libssl1.0-dev
- pyenv install 3.5.2
- Optional (restore newest library development files):
apt install libssl-dev
- pyenv virtualenv 3.5.2 ENV_NAME
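After `pyenv activate ENV_NAME`, it is worth confirming that `python` really resolves to the pyenv-built interpreter rather than the system one. The helper below is a hypothetical sketch that compares the running interpreter against an expected version string such as `3.5.2`.

```python
import sys


def parse_version(text):
    """Turn a version string such as '3.5.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def matches_interpreter(expected, version_info=None):
    """Check the running (or given) interpreter against an expected version.

    Only as many components as `expected` has are compared, so '3.5'
    matches any 3.5.x release while '3.5.2' requires exactly 3.5.2.
    """
    info = sys.version_info if version_info is None else version_info
    wanted = parse_version(expected)
    return tuple(info[:len(wanted)]) == wanted
```

Running `python -c "import sys; print(sys.version)"` inside the activated env gives the same information interactively.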
Upgrade packages (optional)
- pyenv activate ENV_NAME
- pip install --upgrade pip
- pip install --upgrade setuptools [+ any other packages in "pip list"...]
- pyenv deactivate
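If you want to upgrade everything pip reports as outdated rather than typing package names, a sketch like this one uses pip's own JSON listing (`pip list --outdated --format=json`). The helper names are hypothetical. Upgrading every package can pull in versions Beam does not expect, so treat the blanket upgrade as optional, just like the step above.

```python
import json
import subprocess
import sys


def outdated_names(pip_json):
    """Extract package names from 'pip list --outdated --format=json' output."""
    return sorted(entry["name"] for entry in json.loads(pip_json))


def upgrade_cmd(names, python=None):
    """Build one 'pip install --upgrade' command covering all given packages."""
    return [python or sys.executable, "-m", "pip", "install",
            "--upgrade"] + list(names)


def upgrade_outdated(python=None):
    """Ask pip which packages are outdated in the env and upgrade them."""
    python = python or sys.executable
    listing = subprocess.run(
        [python, "-m", "pip", "list", "--outdated", "--format=json"],
        check=True, stdout=subprocess.PIPE, universal_newlines=True).stdout
    names = outdated_names(listing)
    if names:
        subprocess.run(upgrade_cmd(names, python), check=True)
    return names
```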
Set up PyCharm
- Start by adding a new project interpreter (from the bottom right or in Settings).
- Select "existing environment" and select the interpreter, which should be under ~/.pyenv/versions/3.5.2/envs/ENV_NAME/bin/python or ~/.pyenv/versions/ENV_NAME/bin/python.
- Switch interpreters at the bottom right.
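If you are unsure which of the two interpreter paths above exists on your machine, a hypothetical helper like this one checks both. It honors `PYENV_ROOT`, the environment variable pyenv uses to relocate its root, and otherwise assumes the default `~/.pyenv`.

```python
import os


def pyenv_interpreter_candidates(version, env_name, pyenv_root=None):
    """List the two places pyenv-virtualenv usually exposes an env's python."""
    root = (pyenv_root or os.environ.get("PYENV_ROOT")
            or os.path.expanduser("~/.pyenv"))
    return [
        os.path.join(root, "versions", version, "envs", env_name, "bin", "python"),
        os.path.join(root, "versions", env_name, "bin", "python"),
    ]


def find_pyenv_interpreter(version, env_name, pyenv_root=None):
    """Return the first candidate path that exists, or None if neither does."""
    for path in pyenv_interpreter_candidates(version, env_name, pyenv_root):
        if os.path.exists(path):
            return path
    return None
```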
Running Tests using pytest
If you've set up a virtualenv above, you can now run tests directly using pytest.
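As an illustration, the sketch below builds a pytest invocation for one test file and optionally narrows it with `-k`, pytest's keyword-expression filter. The helper names are hypothetical; the file path `apache_beam/transforms/util_test.py` is taken from later on this page, and any keyword you pass is up to you.

```python
import subprocess
import sys


def pytest_cmd(target, keyword=None, python=None):
    """Build a pytest command for a file, directory, or file.py::Class::test id."""
    cmd = [python or sys.executable, "-m", "pytest", "-v", target]
    if keyword:
        # -k keeps only tests whose names match the given expression.
        cmd += ["-k", keyword]
    return cmd


def run_pytest(target, keyword=None):
    """Run pytest with the active virtualenv's interpreter; return its exit code."""
    return subprocess.run(pytest_cmd(target, keyword)).returncode
```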
Running Tests using tox
Tox does not require a virtualenv with Beam and its dependencies installed; it creates its own.
It also runs tests faster by using multiple processes (via pytest-xdist).
For a list of environments, run tox -l.
tox also passes any arguments after a double dash (--) through to pytest.
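The two tox features above can be combined in a small script: list the environments, then run one with extra pytest arguments after `--`. This is a sketch with hypothetical helper names; it assumes the classic output of `tox -l` (one environment name per line), and environment names such as `py38` in the usage below are placeholders, so pick real ones from your own `tox -l` output.

```python
import subprocess


def tox_envs(listing):
    """Parse 'tox -l' output, assuming one environment name per line."""
    return [line.strip() for line in listing.splitlines() if line.strip()]


def tox_cmd(env, pytest_args=()):
    """Build 'tox -e ENV -- ARGS'; arguments after '--' are forwarded to pytest."""
    cmd = ["tox", "-e", env]
    if pytest_args:
        cmd += ["--"] + list(pytest_args)
    return cmd


def available_envs():
    """Run 'tox -l' in the current directory and return the environment names."""
    out = subprocess.run(["tox", "-l"], check=True, stdout=subprocess.PIPE,
                         universal_newlines=True).stdout
    return tox_envs(out)


# Example: subprocess.run(tox_cmd("py38", ["apache_beam/transforms/util_test.py"]))
```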
To check only for Python lint errors, run the following command.
The tox commands to run the lint tasks are:
Apache Beam uses the yapf formatter (https://github.com/google/yapf) to ensure that all code conforms to the same style.
Use the following tox command to format every Python file under sdks/python/apache_beam:
It may be faster to format just a single directory or subset of files. This can be done with:
To format files with uncommitted changes, run:
To format files that were changed in your branch, run:
You can check if your code has been YAPF-formatted by using the following command:
If you need to exclude one particular file or pattern from formatting, just add it to the .yapfignore file (sdks/python/.yapfignore).
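The idea behind formatting only changed files can be sketched as follows: ask git for changed paths, keep Python files under `apache_beam/` that no `.yapfignore` glob excludes, and hand them to `yapf --in-place`. This is an illustrative sketch, not the tox environment Beam uses. It assumes you run it from sdks/python, and it approximates yapf's ignore handling with `fnmatch`; yapf also reads `.yapfignore` on its own. Untracked new files do not appear in `git diff`.

```python
import fnmatch
import subprocess


def read_yapfignore(text):
    """Parse .yapfignore content: one glob per line; '#' lines are comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def files_to_format(paths, ignore_patterns, prefix="apache_beam/"):
    """Keep .py files under prefix that match none of the ignore globs."""
    return [
        path for path in paths
        if path.endswith(".py") and path.startswith(prefix)
        and not any(fnmatch.fnmatch(path, pat) for pat in ignore_patterns)
    ]


def changed_files(base=None):
    """List changed paths relative to the current directory.

    With no base, compares the working tree to HEAD (uncommitted changes);
    with base="master", lists what changed on this branch since it forked.
    --diff-filter=d skips deleted files, which yapf could not open.
    """
    cmd = ["git", "diff", "--name-only", "--relative", "--diff-filter=d",
           base + "...HEAD" if base else "HEAD"]
    out = subprocess.run(cmd, check=True, stdout=subprocess.PIPE,
                         universal_newlines=True).stdout
    return [line for line in out.splitlines() if line]


def yapf_cmd(paths):
    """Build a yapf command that reformats the given files in place."""
    return ["yapf", "--in-place"] + list(paths)
```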
This step is only required for testing SDK code changes remotely (that is, not with the DirectRunner). To do this, you must build the Beam tarball. From the root of the git repository, run:
Then pass the --sdk_location flag so that your pipeline uses the newly built version.
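When you rebuild often, picking up the latest tarball by hand gets tedious. The sketch below, with hypothetical helper names, finds the newest `*.tar.gz` in a directory and appends `--sdk_location` to a list of pipeline arguments. It defaults to `dist/`, where `python setup.py sdist` writes its output; pass your Gradle output directory instead if that is where your tarball lives.

```python
import glob
import os


def newest_tarball(dist_dir="dist"):
    """Return the most recently modified *.tar.gz in dist_dir, or None."""
    tarballs = glob.glob(os.path.join(dist_dir, "*.tar.gz"))
    return max(tarballs, key=os.path.getmtime) if tarballs else None


def with_sdk_location(pipeline_args, tarball):
    """Return a copy of pipeline_args that points workers at the local SDK."""
    return list(pipeline_args) + ["--sdk_location=" + os.path.abspath(tarball)]
```

Using an absolute path avoids surprises when the pipeline is launched from a different working directory.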
Run hello world against modified SDK Harness
Run hello world against modified Dataflow Fn API Runner Harness and SDK Harness
Run integration test
Use the --sdk_location flag if you need a tarball built with
python setup.py sdist; otherwise, the tarball in the default location (the target directory of the Gradle build) is used.
Run a ValidatesRunner test
This runs all tests that have the ValidatesRunner attribute in apache_beam/transforms/util_test.py in streaming mode. You can manually edit the attributes in the test file to limit which tests run.
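To see why editing attributes limits the selection, here is the mechanism in plain Python. The `attr` decorator below is a hypothetical stand-in that mimics nose-style attributes by setting a boolean on the test function, and `select` plays the role of the test runner's attribute filter; neither is Beam or nose code.

```python
import inspect
import unittest


def attr(*names):
    """Hypothetical stand-in for a nose-style attr(): tag a test function."""
    def decorate(func):
        for name in names:
            setattr(func, name, True)
        return func
    return decorate


def select(test_case_cls, attribute):
    """Return the names of test methods carrying the given attribute."""
    return sorted(
        name for name, member in inspect.getmembers(test_case_cls, inspect.isfunction)
        if name.startswith("test") and getattr(member, attribute, False))


class ExampleTest(unittest.TestCase):

    @attr("ValidatesRunner")
    def test_included(self):
        self.assertEqual(1 + 1, 2)

    # Without the attribute, this test is left out of a ValidatesRunner run.
    def test_excluded(self):
        self.assertEqual(2 * 2, 4)
```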
Run integration test from IDE
To run an integration test from an IDE in debug mode, you can create a Nosetests configuration. For example, to run a ValidatesRunner (VR) test on the Dataflow runner from IntelliJ/PyCharm, adjust the configuration as follows:
- Set Target to Module and point it to the test file.
- Set Additional arguments (sample, adjust as needed):
- Set Working directory to
Run a screen diff integration test for Interactive Beam
For Interactive Beam/Notebooks, we need to verify that the visual output of executing a notebook is stable.
A screen diff integration test, which executes a test notebook and compares the result with a golden screenshot, does the trick.
Some preparation work:
To run the tests:
For now, golden screenshots are taken and stored per system platform. The currently supported platforms are Darwin (macOS) and Linux.
Each test generates a stable, unique hexadecimal ID, and its golden screenshots are named after that ID.
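The per-platform storage and stable naming can be illustrated with a short sketch. Everything in it is hypothetical: Beam's actual ID scheme and directory layout are not described on this page, so the SHA-256 prefix, the `<root>/<platform>/<id>.png` layout, and the helper names are assumptions chosen only to show why a deterministic ID lets each run find its golden image.

```python
import hashlib
import os
import platform

SUPPORTED_PLATFORMS = ("Darwin", "Linux")


def stable_test_id(test_name):
    """Hypothetical: derive a stable hex ID from a test's qualified name."""
    return hashlib.sha256(test_name.encode("utf-8")).hexdigest()[:16]


def golden_screenshot_path(golden_root, test_name, system=None):
    """Return where a test's golden screenshot would live on this platform."""
    system = system or platform.system()
    if system not in SUPPORTED_PLATFORMS:
        raise ValueError("screen diff tests support only Darwin and Linux, not "
                         + system)
    return os.path.join(golden_root, system, stable_test_id(test_name) + ".png")
```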
To add new tests, the simplest way is to put a new test notebook file (.ipynb) in the apache_beam/runners/interactive/testing/integration/test_notebooks directory, then add a single test to the apache_beam/runners/interactive/testing/integration/tests directory.
A test is as simple as:
How to install an unreleased Python SDK without building it
The SDK source zip archive and wheels are built continuously after commits are merged to https://github.com/apache/beam.
- Click on a recent `Build python source distribution and wheels job` that ran successfully on the github.com/apache/beam master branch from this list.
- Click on “List files on Google Cloud Storage Bucket” on the right side panel.
- Expand “List files on Google Cloud Storage Bucket” in the main panel.
- Locate and download the .zip file (e.g. apache-beam-2.25.0.dev0.zip) from GCS.
- It’s simplest to download the file with your browser by replacing the prefix “gs://” with “https://storage.googleapis.com/”, e.g. https://storage.googleapis.com/beam-wheels-staging/master/02bf081d0e86f16395af415cebee2812620aff4b-207975627/apache-beam-2.25.0.dev0.zip
- Or follow these instructions to download using the gsutil command line tool.
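The prefix swap described above is easy to script. This sketch, with hypothetical helper names, rewrites a `gs://` URI into its `https://storage.googleapis.com/` form and downloads it with the standard library. It only works when the object is publicly readable; otherwise use gsutil as described above.

```python
import os
import urllib.request

GCS_PREFIX = "gs://"
HTTPS_PREFIX = "https://storage.googleapis.com/"


def gcs_to_https(gcs_uri):
    """Rewrite gs://bucket/object as https://storage.googleapis.com/bucket/object."""
    if not gcs_uri.startswith(GCS_PREFIX):
        raise ValueError("expected a gs:// URI, got " + gcs_uri)
    return HTTPS_PREFIX + gcs_uri[len(GCS_PREFIX):]


def download(gcs_uri, dest_dir="."):
    """Download a public GCS object over HTTPS and return the local path."""
    url = gcs_to_https(gcs_uri)
    dest = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    urllib.request.urlretrieve(url, dest)
    return dest
```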
Install the downloaded zip file with pip.
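A scripted version of this step might look like the sketch below. The helper names are hypothetical. It installs with `python -m pip` so the archive lands in the interpreter you intend, and it reads the expected version from the file name so you can compare it with `apache_beam.__version__` after installing.

```python
import os
import re
import subprocess
import sys


def archive_version(path):
    """Read the version out of a file name such as apache-beam-2.25.0.dev0.zip."""
    match = re.match(r"apache-beam-(.+)\.zip$", os.path.basename(path))
    if not match:
        raise ValueError("not an apache-beam source zip: " + path)
    return match.group(1)


def install_sdk_archive(path, python=None, dry_run=False):
    """pip-install a downloaded SDK archive into the given (or current) interpreter."""
    cmd = [python or sys.executable, "-m", "pip", "install", path]
    if not dry_run:
        subprocess.run(cmd, check=True)
    # Afterwards, apache_beam.__version__ should equal archive_version(path).
    return cmd
```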