MADlib® uses Doxygen for documentation.  Doxygen is a standard tool for generating documentation from annotated C++ sources, but it also supports other popular programming languages such as C and Python.

MADlib Doxygen Sites

Documenting SQL

  • SQL documentation is supported by a Doxygen filter that translates CREATE FUNCTION / CREATE AGGREGATE statements to (empty) C++ function definitions. The source code for the SQL2C++ filter consists of the flex and bison files sql.ll and sql.yy at https://github.com/apache/madlib/tree/master/doc/src.
     
  • Current features:

    • Translate CREATE FUNCTION and CREATE AGGREGATE statements into empty C++ function definitions
    • Both inline (C-style) comments of the form /** ... */ and end-of-line comments of the form --! ...\n are recognized as Doxygen comments
    • Since PostgreSQL and Greenplum disallow labeling the arguments of aggregate functions, the filter automatically uncomments C-style comments that start with /*+ (currently only in places where this makes sense). The same mechanism can be used for default arguments, which is useful when function overloading is used to mimic default arguments (these are not supported by Greenplum or PostgreSQL <= 8.2). Example:

      CREATE AGGREGATE fancyAggregate(/*+ "identifierA" */ INTEGER) ( ... )
      CREATE FUNCTION amazingFn(val DOUBLE PRECISION /*+ DEFAULT .01 */) RETURNS INTEGER ...

      will be translated into:

      <inferredReturnType> fancyAggregate(integer identifierA) { };
      integer amazingFn(float8 val = .01) { };
      
      
    • For aggregates, the return type will be automatically inferred from the transition state / final function
    • Capitalization of identifiers will be preserved if put in quotes "iDeNtiFiEr"
    • Line numbers are preserved
       
  • Still to be implemented:
    • Support for all PostgreSQL types
    • Automatically generate documentation for type definitions
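
To illustrate the two comment forms listed above, here is a minimal sketch. The functions square_example and cube_example are hypothetical and are not part of MADlib; @brief, @param and @return are standard Doxygen commands. The second function assumes that consecutive --! lines are merged into one comment block, as Doxygen does for //! lines in C++; this has not been checked against the filter.

```sql
/**
 * @brief Return the square of a number (hypothetical example, not a MADlib function).
 *
 * @param x Value to square
 * @return x * x
 */
CREATE FUNCTION square_example(x DOUBLE PRECISION)
RETURNS DOUBLE PRECISION
AS $$ SELECT $1 * $1 $$
LANGUAGE sql IMMUTABLE;

--! @brief Return the cube of a number (hypothetical example, not a MADlib function).
--! @param x Value to cube
--! @return x * x * x
CREATE FUNCTION cube_example(x DOUBLE PRECISION)
RETURNS DOUBLE PRECISION
AS $$ SELECT $1 * $1 * $1 $$
LANGUAGE sql IMMUTABLE;
```

In both cases the comment sits directly before the CREATE FUNCTION statement, so it should attach to the empty C++ function definition that the filter emits for that statement.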

General

  • All module documentation should be moved to .sql_in files. See bayes.sql_in and regression.sql_in as examples.
  • All uninstallation SQL files should end in "_drop.sql_in". Otherwise, they show up in Doxygen as visual clutter in the file list.
  • All files containing a "/sql/" in their path are excluded. These files are assumed to belong to regression tests and should not clutter the file list, either.
  • When in doubt, stick to the best practices of the language you are using. E.g., Python gives the following advice for its docstrings: http://www.python.org/dev/peps/pep-0257

Math symbols in user docs

Math symbols in the compiled documentation can be rendered in one of two ways:

  1. Using MathJax (default): MathJax is a JavaScript display engine that renders mathematics directly in the browser. By default, the MathJax CDN is used to load the JavaScript files. To use a local installation of MathJax instead, set the environment variable MATHJAX_DIR to the folder containing the MathJax.js file.
  2. Using LaTeX, dvips and gs: Doxygen can compile an image for each formula in the documentation using LaTeX and other tools. To use this option, ensure that all prerequisite libraries are installed (as described here). Set the CMake variable -DDOXYGEN_USE_MATHJAX=NO to disable compiling with MathJax.

Section Guide

  1. Create a new group for your module (in methods/mainpage.dox) and use @addtogroup your_module.
  2. Write an @about section to describe your algorithm.
  3. Write a @prereq section, for example: "Requires SVEC MADlib module." Do not list PostgreSQL or Greenplum here.
  4. Write a @usage section to describe the API. (In the future we may need to split this into an OS-level side and an in-database side.)
  5. Write an @examp section. We say 'examp' instead of 'example' because we don't want this section to appear on the Doxygen Examples tab.
  6. Use @literature to list your references.

See the linear regression documentation for an example; it is generated from the source file linear.sql_in.
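
The steps above can be sketched as a comment-block skeleton for a .sql_in file. This is an illustrative outline only: your_module is the placeholder from step 1, your_function and input_table are hypothetical names, and the layout (each tag on its own line, followed by free text) is an assumption about how the MADlib-specific tags are used. @verbatim and @endverbatim are standard Doxygen commands.

```sql
/**
 * @addtogroup your_module
 *
 * @about
 * One or two paragraphs describing what the algorithm does.
 *
 * @prereq
 * Requires SVEC MADlib module.
 *
 * @usage
 * Describe the API: the functions to call, their arguments,
 * and the format of the input and output tables.
 *
 * @examp
 * @verbatim
 * SELECT your_function('input_table');  -- hypothetical call
 * @endverbatim
 *
 * @literature
 * [1] Reference for the algorithm described above.
 */
```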
