
Document under development

Information within it is subject to change at short notice.

Introduction

This is a guide to the new plotting API in Open Climate Workbench.

Prerequisites

  • Basic knowledge of Python 2.7.x
  • Understanding how the numpy array data structure works is essential since all results passed to the plotting functions are 3D numpy arrays!
  • Working knowledge of matplotlib is not required; however, it is helpful if you want to know how the functions actually work, or if you need to implement your own.
  • Now what are you waiting for? Get all of this installed if you don’t have it already!

Code Location

The plotting code can be found in the svn repository trunk here: https://svn.apache.org/repos/asf/incubator/climate/trunk/ocw/plotter.py

Differences between new and old API

Old API

  • Plots were rendered via PyNGL (a Python wrapper for the plotting tools in the NCAR Command Language, NCL)
    • PyNGL API is not very OOP friendly
    • Most of the work is done by setting many attributes of one object, making the code difficult to read
    • Source code and documentation are not regularly updated
    • Difficult to install from source
  • Only two types of plots were supported (line plots, filled contour maps)
  • Only single panel plots (one subplot per figure) supported

New API

  • Plots are rendered by matplotlib [1]
    • Open Source (BSD License)
    • Very active development community
    • Two API Types
      • Object Oriented (Every component of the figure is an object)
      • Can also support MATLAB style API
        plot(*args, **kwargs) # MATLAB Style
        ax.plot(*args, **kwargs) # OOP Style
        
    • Includes additional toolkits for convenience
      • Basemap (requires GEOS install which can also be a pain)
      • Axes Grid (Easy Subplots and figure wide colorbars)
  • Supported Plots (as seen in Kim et al. 2013, J. Climate)
    • Contour Map (draw_contour_map)
    • Time Series (draw_time_series)
    • Taylor Diagram (draw_taylor_diagram)
    • Portrait Diagram (draw_portrait_diagram)
    • Subregions (draw_subregions)
  • Provides seamless support for multiple panels of equally scaled subplots on one figure

How it works

The general algorithm for generating an arbitrary number of panels of subplots (as described above) can be summarized in these steps:

  1. A matplotlib figure instance is created to hold all of the graphics
  2. The dimensions of the input data are checked so that individual matplotlib axes instances are placed along the desired grid (rows and columns) within the figure via Axes Grid.
  3. Plots are iteratively drawn on each axes rectangle in the figure for each array of data contained in the input.
  4. Figure wide labels and / or colorbars are drawn using a master axes rectangle bounded by the entire grid of subplots.
  5. Finally the figure gets saved to the disk in the user specified format, at 300 dpi.
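
The steps above can be sketched with matplotlib and its Axes Grid toolkit as follows. This is a minimal illustration under stated assumptions, not the code in plotter.py: the random input data, the figure size, the use of imshow, and the output path are all made up for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display is needed
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1 import ImageGrid

# Hypothetical input: 4 datasets, each on a 20 x 30 grid.
results = np.random.rand(4, 20, 30)
gridshape = (2, 2)

# 1. Create the figure that holds all of the graphics.
fig = plt.figure(figsize=(8, 6))

# 2. Place one axes per dataset on a rows x cols grid with one shared colorbar.
grid = ImageGrid(fig, 111, nrows_ncols=gridshape, axes_pad=0.3,
                 share_all=True, cbar_mode="single")

# 3. Draw each dataset on its own axes, using the same color scale everywhere.
for ax, data in zip(grid.axes_all, results):
    im = ax.imshow(data, vmin=0.0, vmax=1.0, cmap="coolwarm")

# 4. Draw the figure-wide colorbar and title.
grid.cbar_axes[0].colorbar(im)
fig.suptitle("Example figure")

# 5. Save the figure to disk at 300 dpi in the requested format.
fname = os.path.join(tempfile.mkdtemp(), "example.png")
fig.savefig(fname, dpi=300, format="png")
plt.close(fig)
```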

Why 300 DPI?

This might seem like a needlessly high number, since it often produces figures that are larger than most computer screens. In matplotlib the default DPI for figure instances is 80. This results in lower resolution figures, but more importantly some plots will not render correctly if the resolution is insufficient. Because each data point on the plot is represented by a pixel, it is also possible to overwhelm the plotting functions with more data than can be drawn; when this happens, matplotlib raises an exception with the message 'Agg rendering complexity exceeded'. Finally, many scientific journals require figures to be at least 300 DPI before they are considered publication ready.
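
To make the difference concrete, the pixel dimensions of a saved figure are simply its size in inches multiplied by the DPI. The short sketch below assumes an 8 x 6 inch figure purely for illustration.

```python
def pixel_size(width_in, height_in, dpi):
    """Return the (width, height) of a saved figure in pixels."""
    return int(round(width_in * dpi)), int(round(height_in * dpi))


figsize = (8.0, 6.0)  # inches; an assumed size for illustration
low = pixel_size(figsize[0], figsize[1], 80)
high = pixel_size(figsize[0], figsize[1], 300)
print(low)   # (640, 480)
print(high)  # (2400, 1800)
```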

Caveats

Perhaps the most troublesome design issue with each of the plotting functions is the considerable number of arguments that need to get passed in.  Unfortunately this is a necessary evil without creating additional container classes, which can be cumbersome since not all arguments are consistent between functions. Just look at the matplotlib source code if you think this is bad! Python of course provides a mechanism to alleviate the pain of this when writing function wrappers (via *args, **kwargs) but you will still need to know what each argument does. Time to learn!
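
As a rough sketch of the **kwargs approach mentioned above, a wrapper can fix the arguments you always use and forward everything else. The function draw_plot below is a hypothetical stand-in rather than one of the real plotting functions, so that the example runs without any plotting libraries installed.

```python
def draw_plot(results, fname, fmt='png', gridshape=(1, 1), ptitle='',
              clabel='', nlevs=10):
    """Hypothetical stand-in for a plotting function with many arguments.

    It only returns the settings it received, so the example runs anywhere.
    """
    return {'fname': fname, 'fmt': fmt, 'gridshape': gridshape,
            'ptitle': ptitle, 'clabel': clabel, 'nlevs': nlevs}


def draw_my_plot(results, fname, **kwargs):
    """Wrapper that sets project-wide defaults and forwards the rest."""
    options = {'gridshape': (2, 2), 'clabel': 'Bias (mm/day)'}
    options.update(kwargs)  # caller-supplied values take priority
    return draw_plot(results, fname, **options)


call = draw_my_plot([[0.0]], 'bias', ptitle='Annual mean bias')
```

The caller only has to name the arguments that differ from the wrapper's defaults, but still needs to know what each forwarded argument does.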

Common Parameters Between plotting functions

Legend: L (Time Series), M (Contour Map), P (Portrait Diagram), T (Taylor Diagram), S (Subregions)

Parameter | Type | Description | Default Value | Functions
results | 3D array of float | Input data to plot | n/a | LMPT
fname | str | Output file location | n/a | LMPTS
fmt | str | Output file format | 'png' | LMPTS
gridshape | tuple of int | Number of rows / cols of subplots | (1, 1) | LMPT
clabel | str | Colorbar label | '' | MP
ptitle | str | Figure title | '' | LMPTS
xlabel | str | x-axis title | '' | LP
ylabel | str | y-axis title | '' | LP
subtitles | 1D array or list of str | Titles for each subplot | None | LMPT
clevs | 1D array of float | List of color levels | None | MP
nlevs | int | Target number of levels when clevs is None | 10 | MP
cmap | str or LinearSegmentedColormap | Sets color scaling of plot data | None | MP
extend | str | Toggle arrows at ends of colorbar | 'neither' | MP
aspect | float | Aspect ratio of subplot axes rectangle | None | LP

results

This is what is actually being visualized in your plot!  Each dimension can have a different meaning depending on the type of plot, but generally one of the dimensions (usually the first) corresponds to the number of datasets being plotted (subplots). It can also be a 2D array if you only plan to plot one data set, but 3D works just fine if there is a singleton dimension.
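
For instance, a single 2D dataset can be given a singleton first dimension, and several datasets on the same grid can be stacked along a new first axis. The array sizes below are arbitrary and chosen only for illustration.

```python
import numpy as np

# One hypothetical dataset on a 20 x 30 grid.
single = np.random.rand(20, 30)

# Add a singleton first dimension: shape becomes (1, 20, 30).
as_3d = single[np.newaxis, ...]

# Stack several datasets along a new first axis: shape becomes (3, 20, 30).
several = np.stack([single, single * 2.0, single + 1.0])
```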

fname

Can be the file name relative to the current working directory or an absolute path.

fmt

The file format is set to png by default though matplotlib supports many others. However only png has been tested so far (since it is the only one we care about for our purposes).

gridshape

Provides the desired number of rows and columns in the subplot grid. Normally excess whitespace would be created if the user were to enter a gridshape that has too many rows or columns. For convenience, a helper function _best_grid_shape is used to adjust the gridshape automatically to remove this excess whitespace. An exception is raised in the opposite case (too few rows / columns). Not setting it will give you normal plotting functionality for one dataset.

This is the single most powerful feature of the new API, as it allows the user to make plots for single or multiple datasets with the same function!
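
The behaviour described for gridshape can be illustrated with a small stand-alone function. This is a hypothetical sketch of one possible trimming strategy, not the actual _best_grid_shape implementation, which may choose a different shape.

```python
def adjust_grid_shape(nplots, gridshape):
    """Trim unused rows and columns from a requested subplot grid.

    Hypothetical analogue of a grid-shape helper; raises ValueError when
    the requested grid is too small to hold every plot.
    """
    nrows, ncols = gridshape
    if nrows * ncols < nplots:
        raise ValueError('gridshape %s is too small for %d plots'
                         % (gridshape, nplots))
    # Remove whole rows, then whole columns, while every plot still fits.
    while nrows > 1 and (nrows - 1) * ncols >= nplots:
        nrows -= 1
    while ncols > 1 and nrows * (ncols - 1) >= nplots:
        ncols -= 1
    return nrows, ncols


print(adjust_grid_shape(4, (3, 3)))  # (2, 2)
print(adjust_grid_shape(3, (4, 4)))  # (1, 3)
```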

clabel, ptitle, xlabel, ylabel

Fairly self explanatory, these will be blank if not set.

clevs

Denotes the values that mark each interval boundary on the colorbar (and the contour levels for contour maps); the corresponding colors depend on the colormap that is set. If set to None, the levels are chosen automatically by the helper function _nice_levels to fit a “nice” set of colorbar levels based on how the values in the datasets are distributed. Currently, values below the 5th percentile and above the 95th percentile are cut off before setting the levels so that outliers don’t skew the color scale too far in one direction.
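
The idea of clipping the data range before choosing rounded levels can be sketched in pure Python. The function below is a hypothetical illustration of the approach, not the helper used in plotter.py; the percentile indexing and the choice of 1, 2, 5 or 10 times a power of ten as step sizes are assumptions.

```python
import math


def nice_levels(values, nlevs=10):
    """Return evenly spaced, rounded levels covering the clipped data range."""
    data = sorted(values)
    n = len(data)
    # Ignore outliers: use the values at the 5th and 95th percentiles.
    lo = data[(5 * (n - 1)) // 100]
    hi = data[(95 * (n - 1) + 99) // 100]
    if hi == lo:
        return [float(lo)]
    # Round the raw step up to 1, 2, 5 or 10 times a power of ten.
    raw = (hi - lo) / float(nlevs)
    power = 10.0 ** math.floor(math.log10(raw))
    step = next((m for m in (1, 2, 5, 10) if m * power >= raw), 10) * power
    # Start at a multiple of the step at or below the lower bound.
    start = math.floor(lo / step) * step
    nsteps = int(math.ceil((hi - start) / step))
    return [start + i * step for i in range(nsteps + 1)]


levels = nice_levels(list(range(101)) + [10000])
print(levels[0], levels[-1], len(levels))
```

Note that the single outlier (10000) does not stretch the levels, and that the number of levels returned is only close to the target, which is why nlevs is described as a target.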

nlevs

The target number of levels to be used for automatically generating clevs via _nice_levels. The word “target” is used here because sometimes it is not possible to get the desired number of “nice” colorbar levels.

cmap

Maps the data to a color on a scale for a colorbar. If set, it must be either the name of a colormap (str) or an instance of matplotlib.colors.LinearSegmentedColormap. Colorbars are currently drawn figure wide, but support for individual colorbars may be added later with an additional keyword argument.

Usually, colormaps are instantiated in this way:

import matplotlib.pyplot as plt
cmap1 = plt.cm.jet
cmap2 = plt.cm.rainbow
cmap3 = plt.cm.rainbow_r # Same as rainbow but reversed
# etc...

Passing in a matplotlib.colors.LinearSegmentedColormap in this way allows the user to easily incorporate custom colormaps into their plots if they are so inclined. However, users who do not need this functionality may find it more convenient to just pass in the name of the colormap as a string. For instance, the following would be equivalent to the code above:

cmap1 = 'jet'
cmap2 = 'rainbow'
cmap3 = 'rainbow_r' # Same as rainbow but reversed
# etc...

matplotlib has default colormaps stored by name in the cm module; the matplotlib colormap reference lists all of the colormaps it ships with.


Currently, the default colormap set by RCMET is coolwarm. You can change it using the set_cmap helper function. For example:

plotter.set_cmap('jet')

This changes the default colormap to jet.

extend

Set whether to extend the colorbar edges with arrows. ‘neither’ does not extend, ‘both’ extends both sides, ‘min’ extends the side with the lowest level, and ‘max’ extends the side with the highest level. When clevs is None, it is automatically set to ‘both’.

aspect

Controls the ratio of the width to height of each subplot axes rectangle. This mechanism is necessary to ensure that every axes rectangle looks the same regardless of the number of subplots needed, and the method for achieving this is actually more complicated than one might think at first glance. To do this, either the relative size of the figure or axes rectangle must be adjusted. We chose the former and left it up to the user to resize their figures appropriately afterwards.

Figure size is dynamically determined by the _fig_size helper function, which adjusts the figure size from both the aspect ratio and gridshape.
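
As a rough illustration of how a figure size can follow from the aspect ratio and gridshape, consider the sketch below. It is hypothetical and is not the _fig_size implementation: the 8.5 x 11 inch limits are assumed, and padding between subplots is ignored.

```python
def figure_size(gridshape, aspect, max_width=8.5, max_height=11.0):
    """Return a (width, height) in inches that preserves the subplot aspect.

    aspect is the width / height ratio of a single subplot.
    """
    nrows, ncols = gridshape
    # Width / height ratio of the whole grid of subplots.
    grid_aspect = aspect * ncols / float(nrows)
    width, height = max_width, max_width / grid_aspect
    if height > max_height:
        # Too tall: fix the height and shrink the width instead.
        width, height = max_height * grid_aspect, max_height
    return width, height


print(figure_size((2, 2), 2.0))  # (8.5, 4.25)
print(figure_size((4, 1), 1.0))  # (2.75, 11.0)
```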

Plotting Functions Overview

Now that you have a better idea of how everything works, let's discuss the plotting functions in more detail!

Contour Map (draw_contour_map)

plotter.draw_contour_map(results, lats, lons, fname, fmt='png', gridshape=(1, 1), clabel="", ptitle="", subtitles=None, clevs=None, nlevs=10, cmap=None, extend='neither', meridians=None, parallels=None)

Draws filled contours on a map. It is primarily used to visualize 2D geospatial data over a map projection. The hard work of converting between projections and drawing the map boundaries is done by basemap, but for simplicity we use cylindrical projections for each map. This results in distortion at higher latitudes but also allows the greatest flexibility. Note that the aspect keyword argument is not used here since basemap sets the aspect ratio automatically.

Parameter | Type | Description | Default Value
results | 3D array of float | Input data to plot (ndatasets, nlats, nlons) | n/a
lats | 1D or 2D array of float | Domain Latitudes | n/a
lons | 1D or 2D array of float | Domain Longitudes | n/a
fname | str | Output file location | n/a
fmt | str | Output file format | 'png'
gridshape | tuple of int | Number of rows / cols of subplots | (1, 1)
clabel | str | Colorbar label | ''
ptitle | str | Figure title | ''
subtitles | 1D array or list of str | Titles for each subplot | None
clevs | 1D array of float | List of color levels | None
nlevs | int | Target number of levels when clevs is None | 10
cmap | str or LinearSegmentedColormap | Sets color scaling of plot data | None
extend | str | Toggle arrows at ends of colorbar | 'neither'
meridians | 1D array of float | Longitude locations to draw meridians | None
parallels | 1D array of float | Latitude locations to draw parallels | None

results

First dimension corresponds to the dataset number, second and third are the number of latitudes and longitudes respectively.

lats, lons

Arrays giving the location of each dataset gridcell in spherical (latitude, longitude) coordinates.

parallels, meridians

Arrays corresponding to the location to draw grid lines. By default they are drawn every 1, 5, 10, or 20 degrees depending on domain size. You can set them both to empty lists ([]) if you don’t want any grid lines to be drawn. Note that they are only labeled if just one subplot is drawn.

Examples
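
A minimal sketch of preparing inputs for a four-panel contour map is shown below. The domain, the random data, the labels, and the output location are made up for illustration, and the import path is an assumption based on the location of ocw/plotter.py. The drawing step is skipped when OCW (or Basemap) is not installed.

```python
import os
import tempfile

import numpy as np

# Hypothetical inputs: 4 datasets on a 1-degree grid over a small domain.
lats = np.arange(30.0, 50.0, 1.0)     # 20 latitudes
lons = np.arange(-120.0, -90.0, 1.0)  # 30 longitudes
results = np.random.rand(4, lats.size, lons.size)  # (ndatasets, nlats, nlons)

options = {
    'gridshape': (2, 2),
    'ptitle': 'Example contour maps',
    'clabel': 'Arbitrary units',
    'subtitles': ['Dataset %d' % (i + 1) for i in range(results.shape[0])],
    'cmap': 'coolwarm',
}
fname = os.path.join(tempfile.gettempdir(), 'contour_example')

try:
    # Assumed import path, based on the location of ocw/plotter.py.
    from ocw import plotter
except ImportError:
    plotter = None  # OCW or Basemap is not installed; skip the drawing step

if plotter is not None:
    plotter.draw_contour_map(results, lats, lons, fname, **options)
```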

Time Series (draw_time_series)

Draw lines depicting time series data. The x-axis always corresponds to the time. In the future, this may be made more generic and support arbitrary types of line plots.

Parameter | Type | Description | Default Value
results | 3D array of float | Input data to plot (X, ndatasets, ntimes) | n/a
times | 1D array or list of datetimes | Time information | n/a
labels | 1D array or list of str | Label for each line in the legend | n/a
fname | str | Output file location | n/a
fmt | str | Output file format | 'png'
gridshape | tuple of int | Number of rows / cols of subplots | (1, 1)
ptitle | str | Figure title | ''
xlabel | str | x-axis label | ''
ylabel | str | y-axis label | ''
subtitles | 1D array or list of str | Titles for each subplot | None
aspect | float | Aspect ratio of subplot axes rectangle | None
label_month | bool | Toggle month labels on x-axis | False
yscale | str | Sets the scale of the y-axis | 'linear'

results

The second and third dimensions must correspond to the number of datasets and the number of times, respectively.

times

Used internally by matplotlib for axis formatting. This may change in the future so that a list of int corresponding to the time units is used instead, since generating a list of datetimes is not always intuitive. However, if you know the time units and calendar info beforehand, you can just use netCDF4.num2date.
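
When the data are monthly and no time units are at hand, the list of datetimes can also be built with the standard library alone. The helper name monthly_times below is made up for this sketch.

```python
import datetime


def monthly_times(start_year, start_month, nmonths):
    """Return one datetime per month, dated on the first day of the month."""
    times = []
    for i in range(nmonths):
        months = (start_month - 1) + i  # months elapsed since January
        year = start_year + months // 12
        month = months % 12 + 1
        times.append(datetime.datetime(year, month, 1))
    return times


# Nov 2000, Dec 2000, Jan 2001, Feb 2001
times = monthly_times(2000, 11, 4)
```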

labels

Label each line in the plot, usually this corresponds to the name of the dataset.

label_month

Toggle whether to label month names or years. Only set this to true when your times correspond to months. This will probably get deprecated since this is easy to detect from within the function anyways.

yscale

Set the scale of the y-axis, either to ‘linear’ (default) or ‘log’ (base 10 logarithm). The latter is often used when plotting with respect to vertical pressure levels, for example.

Examples

Taylor Diagram (draw_taylor_diagram)

Compare the relative performance of datasets against a reference, i.e. models against an observation dataset. Note that while multiple subplots are supported, using such functionality is not recommended at this time. The plot is made by a custom class (TaylorDiagram) which uses a special polar axes instance. This makes it difficult to control the padding and scaling of the subplots when used in conjunction with axes_grid1.ImageGrid, which implicitly assumes a standard rectilinear axes in each grid cell. We suggest using portrait diagrams as an alternative, though they can be more difficult to interpret visually.

Parameter | Type | Description | Default Value
results | 3D array of float | Input data to plot (X, ndatasets, 2) | n/a
names | 1D array or list of str | List of dataset names | n/a
refname | str | Name of reference dataset | n/a
fname | str | Output file location | n/a
fmt | str | Output file format | 'png'
gridshape | tuple of int | Number of rows / cols of subplots | (1, 1)
subtitles | 1D array or list of str | Titles for each subplot | None
pos | str or tuple | Legend location | 'center right'
frameon | bool | Toggle drawing of the legend border | False
radmax | float | Extent of the radial axis | 1.5

results

Second dimension corresponds to the number of datasets, third dimension is always 2.
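
A Taylor diagram is typically built from two statistics per dataset: the standard deviation normalized by that of the reference, and the correlation with the reference. The sketch below computes such a pair in pure Python. That the two values are stored in this order along the third dimension is an assumption made here, so check it against plotter.py before relying on it.

```python
import math


def taylor_stats(model, reference):
    """Return (normalized standard deviation, correlation coefficient)."""
    n = float(len(reference))
    mean_m = sum(model) / n
    mean_r = sum(reference) / n
    var_m = sum((m - mean_m) ** 2 for m in model) / n
    var_r = sum((r - mean_r) ** 2 for r in reference) / n
    cov = sum((m - mean_m) * (r - mean_r)
              for m, r in zip(model, reference)) / n
    std_ratio = math.sqrt(var_m / var_r)
    corr = cov / math.sqrt(var_m * var_r)
    return std_ratio, corr


reference = [1.0, 2.0, 3.0, 4.0]
model = [2.0, 4.0, 6.0, 8.0]  # twice the reference: ratio 2, correlation 1
std_ratio, corr = taylor_stats(model, reference)
```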

names, refname, frameon

Self explanatory

pos

Refer to matplotlib legend documentation for legend position codes

radmax

The radial axis is in terms of reference standard deviation.

Examples

Portrait Diagram (draw_portrait_diagram)

Plot information from multiple datasets in a color coded table. It is meant to show metrics from datasets (models and obs) with respect to some other field (usually subregions).

Parameter | Type | Description | Default Value
results | 3D array of float | Input data to plot (X, Y, ndatasets) | n/a
rowlabels | list of str | Labels for each row | n/a
collabels | list of str | Labels for each column | n/a
fname | str | Output file location | n/a
fmt | str | Output file format | 'png'
gridshape | tuple of int | Number of rows / cols of subplots | (1, 1)
clabel | str | Colorbar label | ''
ptitle | str | Figure title | ''
xlabel | str | x-axis title | ''
ylabel | str | y-axis title | ''
subtitles | 1D array or list of str | Titles for each subplot | None
clevs | 1D array of float | List of color levels | None
nlevs | int | Target number of levels when clevs is None | 10
cmap | str or LinearSegmentedColormap | Sets color scaling of plot data | None
extend | str | Toggle arrows at ends of colorbar | 'neither'
aspect | float | Aspect ratio of subplot axes rectangle | None

results

The third dimension must be equal to the number of datasets. The first two can be arbitrary but usually correspond to the number of metrics and/or subregions. Yes, portrait diagrams can fit a lot of information on just one subplot!

rowlabels, collabels

The number of elements in each of these lists is explicitly checked to make sure it is consistent with the shape of the input data.

Examples

Subregions (draw_subregions)

Draw subregion rectangles over a map. These consist of overlays of the rectangles given by each subregion domain over a map. In the future, we plan to add an option to overlay the topography, which is very doable with basemap.

Parameter | Type | Description | Default Value
subregions | list of subRegion objects | List of subregion info | n/a
lats | 1D or 2D array of float | Domain Latitudes | n/a
lons | 1D or 2D array of float | Domain Longitudes | n/a
fname | str | Output file location | n/a
ptitle | str | Figure title | ''
meridians | 1D array of float | Longitude locations to draw meridians | None
parallels | 1D array of float | Latitude locations to draw parallels | None
subregion_masks | dictionary of 2D array of bool | Masks for each subregion | None

subregions

These are just instances of rcmes.classes.subRegion.

subregion_masks

Special masks of 2D bool arrays can be used to plot slightly more complex (albeit still jagged) shaped subregions. The corners of each array correspond to the bounding lats and lons in the subRegion object, and each mask is stored in the dictionary under the subregion's name. This feature has not been fully tested and may not work as intended. It is not really needed for the time being, though some method of accounting for non-rectangular subregions will need to be implemented at some point.

Examples

Additional Questions?

For any questions about the plotting API not covered here, please join the Apache dev mailing list and ask by email:

  • dev@climate.incubator.apache.org

References

[1] Hunter, John D. "Matplotlib: A 2D graphics environment." Computing in Science & Engineering (2007): 90-95.
