Proposal: Checksums, CRC, Parity - Layering Enhancements

(This page is a work in progress.)

This page describes an enhancement to the Daffodil DFDL Layering extension. It enables DFDL schemas that use this extension to describe data formats containing algorithmically computed CRCs, checksums, parity bits, and similar values. These values must be checked when parsing, to ensure data validity and/or well-formedness. They must also be computed and incorporated into the infoset when unparsing.

Principles of Operation

If Daffodil's layers are able to read and write DFDL variables, then these can be used to provide parameters to layer computations, and to receive results back from layer computations.
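
As a minimal sketch of that parameter/result flow, the Python below models a layer as a function over its bytes, with a dictionary standing in for DFDL variables. The variable names crcPolynomial, crcInitial, and crcResult and the function crc16_layer are hypothetical, and nothing here is Daffodil's actual layer API. The CRC is the common MSB-first CRC-16 with no reflection and no final XOR. Passing a different initial value through a variable changes the result without changing the layer itself.

```python
def crc16_layer(data: bytes, variables: dict) -> bytes:
    """Hypothetical CRC layer: passes its bytes through unchanged,
    reading parameters from, and writing its result to, variables."""
    for name in ("crcPolynomial", "crcInitial"):
        if name not in variables:
            # Stand-in for reading a DFDL variable that has no value.
            raise KeyError(f"variable ${name} has no value")
    poly = variables["crcPolynomial"]  # parameter in
    crc = variables["crcInitial"]      # parameter in
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            # MSB-first shift; XOR in the polynomial when the top bit is set.
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    variables["crcResult"] = crc  # result out, for the schema to use
    return data


params = {"crcPolynomial": 0x1021, "crcInitial": 0xFFFF}
crc16_layer(b"123456789", params)
print(hex(params["crcResult"]))  # 0x29b1
```

With polynomial 0x1021 and initial value 0xFFFF this is CRC-16/CCITT-FALSE, whose standard check value over "123456789" is 0x29B1. With an initial value of 0x0000 it becomes CRC-16/XMODEM, whose check value is 0x31C3.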

At parse time, a layer computation can set a variable to the calculated value of a checksum, CRC, or parity bit. Ordinary DFDL assert/discriminator statements, or validation constraints (such as Schematron rules), can then compare this value against the corresponding checksum/CRC/parity element actually carried in the data.

A failing DFDL assert would deem data whose checksum/CRC/parity value does not match to be malformed. A failing Schematron rule would instead deem such data well-formed but invalid. Which behavior is preferable is up to the DFDL schema author.
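
A minimal Python sketch of this parse-time flow follows. It assumes a record whose layered payload is followed by a one-byte checksum element. All names (checksum_layer_parse, computedChecksum, parse_record, ParseError) are hypothetical. The sum-modulo-256 checksum is only a placeholder algorithm, and the two branches merely imitate the assert and validation outcomes described above. They are not Daffodil behavior.

```python
class ParseError(Exception):
    """Stand-in for a DFDL parse error: the data is malformed."""


def simple_checksum(data: bytes) -> int:
    # Placeholder algorithm only: sum of the bytes modulo 256.
    return sum(data) % 256


def checksum_layer_parse(layer_bytes: bytes, variables: dict) -> bytes:
    # The layer passes its bytes through and, as a side effect,
    # sets a variable to the checksum it computed over them.
    variables["computedChecksum"] = simple_checksum(layer_bytes)
    return layer_bytes


def parse_record(data: bytes, use_assert: bool):
    """Assumed layout: payload (the layered sequence), then 1 checksum byte."""
    variables = {}
    payload = checksum_layer_parse(data[:-1], variables)
    infoset = {"payload": payload, "checksum": data[-1]}
    diagnostics = []
    computed = variables["computedChecksum"]
    if infoset["checksum"] != computed:
        msg = f"checksum mismatch: carried {infoset['checksum']}, computed {computed}"
        if use_assert:
            # Like a failing dfdl:assert: parsing fails, the data is malformed.
            raise ParseError(msg)
        # Like a failing Schematron rule: the infoset is kept but is invalid.
        diagnostics.append(msg)
    return infoset, diagnostics
```

With the bytes 1, 2, 3 followed by 6, both modes return the infoset with no diagnostics. If the last byte is 7 instead, `use_assert=True` raises `ParseError`, while `use_assert=False` returns the infoset together with one diagnostic.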

At unparse time, a layer computation can likewise calculate a checksum/CRC/parity value over the layer data, in the same manner as when parsing, and set a variable to the result. The infoset element that carries this value would have a dfdl:outputValueCalc property referencing the variable. The newly computed value is thereby inserted into the infoset and then unparsed by normal means.
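
The following is a minimal sketch of the unparse side, assuming a record whose layered payload is followed by a one-byte checksum element. The layer sets the variable while writing, and the checksum element's value is taken from that variable rather than from the incoming infoset. The function and variable names and the sum-modulo-256 checksum are hypothetical placeholders, not Daffodil APIs.

```python
def simple_checksum(data: bytes) -> int:
    # Placeholder algorithm only: sum of the bytes modulo 256.
    return sum(data) % 256


def checksum_layer_unparse(payload: bytes, variables: dict) -> bytes:
    # The layer computes the checksum over the bytes it writes,
    # in the same way as when parsing, and sets the variable.
    variables["computedChecksum"] = simple_checksum(payload)
    return payload


def unparse_record(infoset: dict) -> bytes:
    """Assumed layout: payload (layered sequence), then 1 checksum byte."""
    variables = {}
    out = bytearray(checksum_layer_unparse(infoset["payload"], variables))
    # Plays the role of dfdl:outputValueCalc="{ $computedChecksum }" on the
    # checksum element: whatever value it had is replaced by the variable.
    infoset["checksum"] = variables["computedChecksum"]
    out.append(infoset["checksum"])
    return bytes(out)


print(unparse_record({"payload": bytes([1, 2, 3]), "checksum": 0}).hex())  # 01020306
```

Because the checksum element comes after the layered payload here, the variable is already set when the element is unparsed, so nothing has to wait.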

All the normal DFDL forward-referencing behaviors apply if the infoset element carrying the checksum/CRC/parity value appears before the layered sequence itself.

Some details:

- Layer transformations must be able to read and write DFDL variables. This means they need access to the parse or unparse state of the computation, not just the I/O layer state.
- When parsing, a DFDL variable read by a layer transformation must be defined; otherwise it is a runtime schema definition error (SDE).
- When unparsing, a DFDL variable read by a layer transformation must either already have a value, or the read must suspend the layer transformation until the variable is defined. The layer transformation must then be resumed once the variable gets its value.
- When unparsing, dfdl:outputValueCalc, dfdl:length, and other runtime-valued expressions can reference DFDL variables that will be set by a layer transformation. Such computations must suspend if the layer transformation has not yet set the variable, and resume once it is set.
- The restriction that a layered sequence may not carry DFDL statements such as dfdl:setVariable or dfdl:newVariableInstance is lifted. Layered sequences are likely to be where one or more dfdl:newVariableInstance statements appear, creating the variable instances used by the layer transformation.
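
The unparse-time suspend/resume requirements can be pictured with a toy model in Python. The VariableStore class and the unparse_forward_ref function are hypothetical illustrations, not Daffodil internals, and the sum-modulo-256 checksum is a placeholder. A read of an unset variable queues a continuation instead of failing, and setting the variable runs every queued continuation. The checksum element is placed before the layered payload (a forward reference), so its dfdl:outputValueCalc must wait. It leaves a one-byte hole that is filled once the layer sets the variable.

```python
class VariableStore:
    """Toy stand-in for DFDL variable instances during unparsing."""

    def __init__(self):
        self._values = {}
        self._waiting = {}  # variable name -> suspended continuations

    def read(self, name, resume):
        # Continue immediately if the variable has a value; otherwise
        # suspend by queuing the continuation until the variable is set.
        if name in self._values:
            resume(self._values[name])
        else:
            self._waiting.setdefault(name, []).append(resume)

    def set(self, name, value):
        # Treated as single-assignment in this sketch.
        if name in self._values:
            raise RuntimeError(f"variable ${name} set twice")
        self._values[name] = value
        # Setting the variable resumes every computation waiting on it.
        for resume in self._waiting.pop(name, []):
            resume(value)

    def unresolved(self):
        return sorted(self._waiting)


def unparse_forward_ref(payload: bytes) -> bytes:
    store = VariableStore()
    out = bytearray()
    # 1. Checksum element first: its outputValueCalc reads an unset
    #    variable, so it suspends and leaves a one-byte hole.
    hole = len(out)
    out.append(0)

    def fill(value):
        out[hole] = value

    store.read("computedChecksum", fill)
    # 2. The layered payload is written; the layer sets the variable,
    #    which resumes the suspended outputValueCalc.
    out += payload
    store.set("computedChecksum", sum(payload) % 256)
    # 3. Anything still suspended at the end is an error.
    if store.unresolved():
        raise RuntimeError(f"unresolved suspensions: {store.unresolved()}")
    return bytes(out)


print(unparse_forward_ref(bytes([1, 2, 3])).hex())  # 06010203
```

The same mechanism would apply with the roles reversed, that is, to a layer transformation that reads a variable whose value is computed later in the unparse.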

Examples

Worked examples of layering transformations that implement checksums, parity, and check digits:

- IPv4 packets: see the ethernetIP DFDL schema on the DFDLSchemas GitHub site.
  - TBD: currently a pull request: https://github.com/DFDLSchemas/ethernetIP/pull/1
- GPS data with parity: see the GPS-SPS DFDL schema on the DFDLSchemas GitHub site.
  - TBD: currently a pull request: https://github.com/DFDLSchemas/gps-sps/pull/1
- Check-digit computation: see the checkDigit DFDL schema on the OpenDFDL GitHub site.
  - TBD: currently a pull request: https://github.com/OpenDFDL/checkDigit/pull/1
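
In the IPv4 case, the value such a layer computes is the standard Internet checksum over the IPv4 header (RFC 791, computed as described in RFC 1071). It is the ones'-complement of the ones'-complement sum of the header's 16-bit words, with the checksum field treated as zero. The Python below is an independent sketch of that algorithm, not code taken from the ethernetIP schema, and the function name is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement of the ones'-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold carries out of the top 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# IPv4 header (192.168.0.1 to 192.168.0.199) with its checksum field,
# bytes 10-11, set to zero, as when unparsing.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))  # 0xb861

# When parsing, recomputing over the whole header, checksum included,
# gives 0 for an undamaged header.
received = header[:10] + (0xB861).to_bytes(2, "big") + header[12:]
print(internet_checksum(received))  # 0
```

The zero result on a received header gives a simple condition for a dfdl:assert or a validation rule, while the unparse side inserts the computed value into the checksum field.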
