This page records processes and procedures related to documentation, especially tips for developers who would like to contribute to the documentation.
The entire rendered Impala documentation set is available, in HTML and PDF, on the Documentation tab of the Apache Impala web site. It covers all of the currently available Impala documentation: the release notes, the SQL reference, and the installation, administration, and development guides.
The source of the main Impala documentation (the SQL Reference and so on) is XML in the DITA format, and can be built with an open source toolchain.
Version control is through git. The doc source files live underneath the docs/ subdirectory, in the same repository as the Impala code.
XML Filename Conventions
Most Impala doc content is in files that live under the docs/topics/ subdirectory. (The exception is the set of reusable fragments that live in the docs/shared/impala_common.xml file.)
All the files under topics/ have a filename prefix of impala_ and a suffix of .xml.
Each SQL statement, query option, aggregate function, and class of built-in scalar functions, and each major area such as "security" or "performance", has an associated XML file. The filename starts with impala_, followed by the name of the item with any spaces replaced by underscores. For example: impala_create_table.xml, impala_max.xml, impala_mem_limit.xml.
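The naming rule can be sketched as a small helper. This is only an illustration: the function name topic_filename is hypothetical, and the lowercasing step is an assumption inferred from the example filenames rather than a stated rule.

```python
def topic_filename(name: str) -> str:
    """Return the expected topic filename for a statement, option, or function name."""
    # Replace each run of whitespace with a single underscore, per the convention.
    stem = "_".join(name.split())
    # Assumption: filenames are lowercase, as in the examples on this page.
    return f"impala_{stem.lower()}.xml"


print(topic_filename("CREATE TABLE"))  # impala_create_table.xml
print(topic_filename("MAX"))           # impala_max.xml
print(topic_filename("MEM_LIMIT"))     # impala_mem_limit.xml
```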
For instructions to set up the doc build environment and produce HTML and PDF, see the docs/README.md file.
XML Source Conventions
Frequently Used Tags
The essential tags for documenting programming-type information are similar to the tags found in HTML. For example:
<p>A paragraph with some text.</p>
<ul>
<li>Unordered list item.</li>
</ul>
<ol>
<li>Ordered list item.</li>
</ol>
<codeblock>Sample code and/or output.
Spacing is preserved in output, so don't add extra indentation.
Left-justify the codeblock start and end tags.
</codeblock>
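The left-justification convention for codeblock tags lends itself to a quick automated check. The following is a minimal sketch, not part of the Impala doc tooling: the function name and the sample text are hypothetical, and it only flags lines where a codeblock start or end tag is the first thing on an indented line.

```python
import re

# Matches a line that begins with spaces or tabs followed directly by a
# codeblock start tag or end tag.
INDENTED_CODEBLOCK_TAG = re.compile(r"^[ \t]+</?codeblock\b")


def indented_codeblock_lines(xml_text: str) -> list[int]:
    """Return 1-based line numbers where a codeblock tag is indented."""
    return [
        number
        for number, line in enumerate(xml_text.splitlines(), start=1)
        if INDENTED_CODEBLOCK_TAG.match(line)
    ]


# Hypothetical sample: the first block is indented, the second is not.
sample = """<p>Example:</p>
  <codeblock>select 1;
  </codeblock>
<codeblock>select 2;
</codeblock>
"""
print(indented_codeblock_lines(sample))  # [2, 3]
```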
Internal and External Links
Reuse via conref= Attribute
Before checking in doc updates, make sure the XML markup is valid. To perform a basic check for mismatched tags, missing or extra or unescaped delimiters, and other mistakes that are straightforward to detect, run the command:
xmllint --noout <filename>.xml
You can run the command with multiple filenames, for example all the modified files that are about to be committed. (On a modern laptop, it only takes about 0.1 seconds to run xmllint --noout on all the XML files in the topics/ subdirectory, so you can be generous about validating frequently or for more files than strictly necessary.)
If everything is OK, the command produces no output. If there is any output, it will show the approximate line(s) where the XML error occurs.
By default, the xmllint command checks only well-formedness, the rules that apply to any XML document, for example that every start <tag> is matched by a closing </tag>. It does not check that the tag and attribute names are ones recognized by the DITA XML dialect. That extra level of checking requires additional setup. (Instructions pending.)
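To illustrate the difference between the two levels of checking, here is a minimal sketch that uses the Python standard library parser as a stand-in for xmllint. It is not how xmllint is implemented, and the function name and sample strings are hypothetical. Like the default xmllint check, it reports mismatched tags but happily accepts tag names that DITA does not define.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text: str):
    """Return None if xml_text is well-formed, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return (line, column, str(err))
    return None


# Hypothetical fragments for illustration only.
good = "<concept id='x'><title>T</title><conbody><p>Text.</p></conbody></concept>"
bad = "<concept id='x'><title>T</title><conbody><p>Text.</conbody></concept>"
not_dita = "<notadita><bogus/></notadita>"

print(well_formedness_error(good))             # None
print(well_formedness_error(bad) is not None)  # True: <p> is never closed
print(well_formedness_error(not_dita))         # None: well-formed, but not DITA
```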
By convention, to allow for filtering in JIRA reports and email notifications, commits that are purely for documentation updates include the eyecatcher string [DOCS], that is, the word DOCS inside square brackets.
Often, an IMPALA- JIRA for a new feature or improvement has a subtask (with its own JIRA number) for documenting the new feature or the behavior changes. When including a JIRA number in the subject line of the commit message, prefer to use the JIRA number for the parent issue rather than the one for the doc-specific subtask.
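The commit message conventions above could be checked with a short script. This is a sketch under stated assumptions: the function name, the sample subject line, its layout, and the JIRA number IMPALA-1234 are all hypothetical. The script can only report which IMPALA- numbers appear in a subject line; deciding whether a number belongs to the parent issue or to a doc subtask still requires looking at JIRA.

```python
import re

DOCS_TAG = "[DOCS]"
# JIRA numbers of the form IMPALA-<digits>.
JIRA_ID = re.compile(r"\bIMPALA-\d+\b")


def describe_subject(subject: str) -> dict:
    """Report whether a commit subject carries the DOCS tag and which JIRA numbers it cites."""
    return {
        "is_docs_commit": DOCS_TAG in subject,
        "jira_ids": JIRA_ID.findall(subject),
    }


# Hypothetical subject line; the number and wording are made up.
print(describe_subject("IMPALA-1234: [DOCS] Document the new query option"))
```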