(As-is-built 2021-10-06)

This page describes the Daffodil DFDL Layering extension, an enhancement to the DFDL language that lets DFDL schemas describe data formats containing algorithmically computed values such as CRCs, checksums, and parity bits. These values are checked when parsing, to ensure data validity and/or well-formedness, and are recomputed and incorporated into the data infoset when unparsing.

Principles of Operation

Daffodil's layers are able to read and write DFDL variables. Variables are used to provide parameters to layer computations, and to receive results back from layer computations. 
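
The parameter-in, result-out use of variables can be pictured with a minimal Python sketch. It is not Daffodil's layer API: `VariableMap`, `sum_checksum_layer`, and the variable names `checksumBits` and `computedChecksum` are hypothetical, and the simple additive checksum stands in for whatever a real layer computes. In this sketch the layer reads a parameter variable, computes over the bytes it sees, writes a result variable, and passes the data through unchanged.

```python
class VariableMap:
    """Hypothetical stand-in for DFDL variable state (not Daffodil's API)."""

    def __init__(self, **initial):
        self._values = dict(initial)

    def read(self, name):
        if name not in self._values:
            raise KeyError(f"variable {name!r} has no value")
        return self._values[name]

    def write(self, name, value):
        self._values[name] = value


def sum_checksum_layer(layer_bytes: bytes, variables: VariableMap) -> bytes:
    """Hypothetical layer: reads a parameter, writes a result, passes data through."""
    width_bits = variables.read("checksumBits")          # parameter in
    total = sum(layer_bytes) % (1 << width_bits)          # additive checksum
    variables.write("computedChecksum", total)            # result out
    return layer_bytes                                    # data is not modified


vars_ = VariableMap(checksumBits=8)
sum_checksum_layer(bytes([0x10, 0x20, 0xF0]), vars_)
print(vars_.read("computedChecksum"))  # 32, since 0x120 mod 256 == 0x20
```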

At parse time, a layer computation can set a variable to the calculated value of a checksum, CRC, or parity bit. Ordinary DFDL assert/discriminator statements, or validation constraints (such as Schematron rules), can then compare this variable against the infoset value of the checksum/CRC/parity element actually carried in the data.

If the carried checksum/CRC/parity value does not match the computed one, a failing DFDL assert deems the data malformed, whereas a failing Schematron rule deems the data well-formed but invalid. Which behavior is preferred is up to the DFDL schema author.
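
The two ways of reacting to a mismatch can be sketched in Python, assuming the layer has already produced the computed value. `check_checksum`, `ParseError`, `ParseResult`, and the `"checksum"` infoset field are hypothetical names. Raising `ParseError` models a failing DFDL assert (malformed data), while recording a message models a failing Schematron rule (well-formed but invalid).

```python
from dataclasses import dataclass, field


class ParseError(Exception):
    """Hypothetical: models a failed DFDL assert, i.e. the data is malformed."""


@dataclass
class ParseResult:
    infoset: dict
    validation_errors: list = field(default_factory=list)


def check_checksum(infoset: dict, computed: int, policy: str) -> ParseResult:
    carried = infoset["checksum"]  # value actually present in the data
    if carried == computed:
        return ParseResult(infoset)
    message = f"checksum mismatch: data has {carried}, layer computed {computed}"
    if policy == "assert":
        raise ParseError(message)                # malformed: the parse fails
    if policy == "validate":
        return ParseResult(infoset, [message])   # well-formed but invalid
    raise ValueError(f"unknown policy {policy!r}")


print(check_checksum({"checksum": 5}, 7, "validate").validation_errors)
# ['checksum mismatch: data has 5, layer computed 7']
```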

At unparse time, the layer computation likewise calculates the checksum/CRC/parity value across the layer data and sets a variable to it, in the same manner as when parsing. The element that carries this value in the infoset can have a dfdl:outputValueCalc property referencing this variable, so that the new checksum/CRC/parity value is inserted into the infoset and then unparsed by normal means.
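
An unparse-time sketch in Python, assuming a made-up record format of payload bytes followed by a one-byte additive checksum. `layer_sum8`, `unparse_record`, the variable name `computedChecksum`, and the `ex` namespace prefix in the comment are hypothetical; the assignment from the variable into the infoset plays the role of dfdl:outputValueCalc.

```python
def layer_sum8(layer_bytes: bytes) -> int:
    """Hypothetical layer computation: 8-bit additive checksum."""
    return sum(layer_bytes) % 256


def unparse_record(infoset: dict) -> bytes:
    variables = {}
    payload = bytes(infoset["payload"])
    # The layer sees the layered data as it is written and sets the variable.
    variables["computedChecksum"] = layer_sum8(payload)
    # Analogue of dfdl:outputValueCalc="{ $ex:computedChecksum }":
    # any stale value in the infoset is replaced by the freshly computed one.
    infoset["checksum"] = variables["computedChecksum"]
    # In this made-up format the checksum byte follows the payload.
    return payload + bytes([infoset["checksum"]])


out = unparse_record({"payload": [1, 2, 3], "checksum": 0})
print(out.hex())  # 01020306
```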

All the normal DFDL forward-referencing behaviors apply if the infoset element carrying the checksum/CRC/parity value appears before the layered sequence itself.

Some details:

  • Layer transformations are able to read and write DFDL variables, which means they need access to the parse or unparse state of the computation.
  • When parsing, the DFDL variables read or written by a layer transformation must be defined, or it is a Schema Definition Error (SDE).
  • When unparsing, a DFDL variable read by a layer transformation must either have a value, or the read suspends the layer transformation until the variable is defined. The layer transformation resumes once the variable gets its value.
  • When unparsing, dfdl:outputValueCalc, dfdl:length, or other runtime-valued expressions can reference DFDL variables that are (to be) set by a layer transformation. Such computations suspend if the layer transformation has not yet set the variable, and resume when the variable is set.
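
The suspend-and-resume ordering can be modeled with a toy Python class. `SuspendingVariables` and its methods are hypothetical and far simpler than Daffodil's actual unparser; the sketch only shows the ordering: a computation that reads an unset variable is parked, and it runs when the variable is written.

```python
from typing import Callable, Dict, List, Tuple


class SuspendingVariables:
    """Hypothetical toy model of unparse-time suspension on unset variables."""

    def __init__(self) -> None:
        self.values: Dict[str, object] = {}
        self.waiting: List[Tuple[str, Callable[[object], None]]] = []

    def read_or_suspend(self, name: str, resume: Callable[[object], None]) -> None:
        if name in self.values:
            resume(self.values[name])            # value available: run now
        else:
            self.waiting.append((name, resume))  # suspend until it is set

    def write(self, name: str, value: object) -> None:
        self.values[name] = value
        ready = [r for n, r in self.waiting if n == name]
        self.waiting = [(n, r) for n, r in self.waiting if n != name]
        for resume in ready:
            resume(value)                        # resume suspended computations


vars_ = SuspendingVariables()
infoset = {"checksum": None}
# outputValueCalc-like computation evaluated before the layer has run
vars_.read_or_suspend("computedChecksum", lambda v: infoset.__setitem__("checksum", v))
assert infoset["checksum"] is None     # suspended
vars_.write("computedChecksum", 0x2A)  # layer finishes and sets the variable
print(infoset["checksum"])  # 42
```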

Examples

Worked examples of layering transformations that implement checksums, parity, and check digits are:

  • IPv4 packets - See the IPv4.tdml test in daffodil-test at src/test/resources/org/apache/daffodil/layers/IPv4.tdml and the related IPv4.dfdl.xsd schema. The format is not real IPv4, but it closely resembles it.
    • There is also a PR to update the ethernetIP DFDL schema project on GitHub to compute checksums: https://github.com/DFDLSchemas/ethernetIP/pull/1. Because there are checksums in the IPv4Header, UDP, TCP, ICMP, and other protocols, the work involved is extensive.
    • This work should likely be done only after DAFFODIL-1927 is complete, so that the pluggable layers that compute these checksums can be created as plugins.
  • GPS data with parity - See the GPS-SPS DFDL schema on the DFDLSchemas GitHub site.
  • Check-digit computation - See the daffodil-test src/test/resources/org/apache/daffodil/layers/TestCheckDigit.tdml test file and related checkDigit.dfdl.xsd schema. 
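
For IPv4-style headers the checksum is the standard Internet checksum (RFC 791, RFC 1071): the ones' complement of the ones' complement sum of the header's 16-bit words, computed with the checksum field set to zero. The Python sketch below is that generic algorithm, not the code of the layer used by IPv4.tdml; `internet_checksum` is a hypothetical name and the sample header bytes are illustrative. Recomputing over a header that already contains a correct checksum yields 0, which is one way a parse-time check can verify it.

```python
def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


# 20-byte IPv4 header with the checksum field (bytes 10-11) zeroed.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
csum = internet_checksum(header)              # unparse: compute the value
print(hex(csum))  # 0xb861
filled = header[:10] + csum.to_bytes(2, "big") + header[12:]
print(internet_checksum(filled))  # 0, so the header verifies when parsed
```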

