Extended handling of "optional" fields

Proposal:

Make the expression of an "optional" field optional.

When parsing an optional field the parser always saves the current position before starting to parse the field.

If no "AssertionException" is caught while parsing the field, the saved position is simply discarded. If such an exception is caught, however, the parser resets to the saved position, and the next field continues parsing from there.
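The save-and-rewind behaviour described above could look roughly like the following Java sketch. All names here (`OptionalFieldSketch`, `PositionedBuffer`, `readOptional` and the nested `AssertionException`) are made up for illustration and are not the actual PLC4X classes.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class OptionalFieldSketch {
    // Hypothetical stand-in for the parser's assertion failure.
    static class AssertionException extends RuntimeException {
        AssertionException(String message) { super(message); }
    }

    // Minimal cursor over a byte array; only the position handling matters here.
    static class PositionedBuffer {
        private final byte[] data;
        private int pos;

        PositionedBuffer(byte[] data) { this.data = data; }

        int getPos() { return pos; }
        void reset(int newPos) { pos = newPos; }
        int readUnsignedByte() { return data[pos++] & 0xFF; }
    }

    // Saves the position, tries to parse, and rewinds if an assertion fails.
    static <T> Optional<T> readOptional(PositionedBuffer buffer, Supplier<T> fieldParser) {
        int savedPos = buffer.getPos();
        try {
            return Optional.of(fieldParser.get());
        } catch (AssertionException e) {
            // The field is treated as absent; the next field starts at the saved position.
            buffer.reset(savedPos);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        PositionedBuffer buffer = new PositionedBuffer(new byte[] {0x2A, 0x07});
        Optional<Integer> field = readOptional(buffer, () -> {
            int tag = buffer.readUnsignedByte();
            if (tag != 0x01) {
                throw new AssertionException("unexpected tag " + tag);
            }
            return tag;
        });
        System.out.println("present=" + field.isPresent() + " position=" + buffer.getPos());
    }
}
```

In this sketch the field parser consumes one byte before failing, and the rewind puts the position back to where it was before the optional field started.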

In addition, we add a new field type: "assert". Assert fields are broadly similar to "const" fields, but in contrast to "const" fields their values need to be saved as properties in the model. An assert field contains an expression. If the parsed value matches the value of this expression, it is saved in the property; if it doesn't match, an "AssertionException" is thrown.
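As a rough Java sketch of what the generated code for such a field might do (the names `AssertFieldSketch` and `readAssertField` are assumptions for illustration, and the value is reduced to a plain int):

```java
final class AssertFieldSketch {
    // Hypothetical stand-in for the parser's assertion failure.
    static class AssertionException extends RuntimeException {
        AssertionException(String message) { super(message); }
    }

    // Compares the parsed value with the value of the expression.
    // On a match the value is returned so the caller can store it as a property.
    static int readAssertField(String fieldName, int parsedValue, int expectedValue) {
        if (parsedValue != expectedValue) {
            throw new AssertionException(
                "Field '" + fieldName + "': expected " + expectedValue + " but got " + parsedValue);
        }
        return parsedValue;
    }

    public static void main(String[] args) {
        int tagNumber = readAssertField("tagNumber", 2, 2);
        System.out.println("tagNumber=" + tagNumber);
        try {
            readAssertField("tagNumber", 3, 2);
        } catch (AssertionException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Unlike a "const" field, the parsed value is handed back to the caller, which is what allows it to end up as a property in the model.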

These changes are intended especially for protocols like BACnet and PROFINET, where the presence of optional fields depends on their content.

Example:

['0x07' BACnetUnconfirmedServiceRequestWhoHas
    [optional BACnetComplexTagUnsignedInteger 'deviceInstanceRangeLowLimit'                                        ['0', 'BACnetDataType.UNSIGNED_INTEGER' ]]
    [optional BACnetComplexTagUnsignedInteger 'deviceInstanceRangeHighLimit' 'deviceInstanceRangeLowLimit != null' ['1', 'BACnetDataType.UNSIGNED_INTEGER' ]]
    [optional BACnetComplexTagOctetString     'objectIdentifier'                                                   ['2', 'BACnetDataType.OCTET_STRING'     ]]
    [optional BACnetComplexTagOctetString     'objectName'                   'objectIdentifier == null'            ['3', 'BACnetDataType.OCTET_STRING'     ]]
]

[discriminatedType 'BACnetComplexTag' [uint 4 'tagNumberArgument', BACnetDataType 'dataType']
    [assert        uint 4           'tagNumber'                 'tagNumberArgument'                                           ]
    [const         TagClass         'tagClass'                  'TagClass.CONTEXT_SPECIFIC_TAGS'                              ]
    [simple        uint 3           'lengthValueType'                                                                         ]
    [optional      uint 8           'extTagNumber'              'tagNumber == 15'                                             ]
    [virtual       uint 8           'actualTagNumber'           'tagNumber < 15 ? tagNumber : extTagNumber'                   ]
    [virtual       bit              'isPrimitiveAndNotBoolean'  '!(lengthValueType == 6) && tagNumber != 1'                   ]
    [optional      uint 8           'extLength'        'isPrimitiveAndNotBoolean && lengthValueType == 5'                     ]
    [optional      uint 16          'extExtLength'     'isPrimitiveAndNotBoolean && lengthValueType == 5 && extLength == 254' ]
    [optional      uint 32          'extExtExtLength'  'isPrimitiveAndNotBoolean && lengthValueType == 5 && extLength == 255' ]
    [virtual       uint 32          'actualLength'     'lengthValueType == 5 && extLength == 255 ? extExtExtLength : (lengthValueType == 5 && extLength == 254 ? extExtLength : (lengthValueType == 5 ? extLength : (isPrimitiveAndNotBoolean ? lengthValueType : 0)))']
    [typeSwitch 'dataType'
        ['NULL' BACnetComplexTagNull
        ]
        ['BOOLEAN' BACnetComplexTagBoolean
        ]
        ['UNSIGNED_INTEGER' BACnetComplexTagUnsignedInteger [uint 3 'lengthValueType', uint 8 'extLength']
            [array int 8 'data' length '(lengthValueType == 5) ? extLength : lengthValueType']
        ]
        ['SIGNED_INTEGER' BACnetComplexTagSignedInteger [uint 3 'lengthValueType', uint 8 'extLength']
            [array int 8 'data' length '(lengthValueType == 5) ? extLength : lengthValueType']
        ]
        ['REAL' BACnetComplexTagReal [uint 3 'lengthValueType', uint 8 'extLength']
            [simple float 8.23 'value']
        ]
        ['DOUBLE' BACnetComplexTagDouble [uint 3 'lengthValueType', uint 8 'extLength']
            [simple float 11.52 'value']
        ]
        ['OCTET_STRING' BACnetComplexTagOctetString [uint 32 'actualLength']
            // TODO: The reader expects int but uint32 gets mapped to long, so even uint32 would easily overflow...
            [virtual    uint    16                           'actualLengthInBit' 'actualLength * 8']
            [simple     string 'actualLengthInBit' 'ASCII'   'theString']
        ]
        ['CHARACTER_STRING' BACnetComplexTagCharacterString
        ]
        ['BIT_STRING' BACnetComplexTagBitString [uint 3 'lengthValueType', uint 8 'extLength']
            [simple uint 8 'unusedBits']
            [array int 8 'data' length '(lengthValueType == 5) ? (extLength - 1) : (lengthValueType - 1)']
        ]
        ['ENUMERATED' BACnetComplexTagEnumerated [uint 3 'lengthValueType', uint 8 'extLength']
            [array int 8 'data' length '(lengthValueType == 5) ? extLength : lengthValueType']
        ]
        ['DATE' BACnetComplexTagDate
        ]
        ['TIME' BACnetComplexTagTime
        ]
        ['BACNET_OBJECT_IDENTIFIER' BACnetComplexTagObjectIdentifier
        ]
    ]
]
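The nested expression for 'actualLength' in the example is hard to read on one line. The following Java sketch spells out the same decision chain. The class and method names are made up, and it assumes that optional fields which were not present are passed in as 0.

```java
final class BacnetLengthSketch {
    // Mirrors the 'actualLength' virtual field of BACnetComplexTag from the example.
    static long actualLength(int tagNumber, int lengthValueType,
                             int extLength, int extExtLength, long extExtExtLength) {
        // Same condition as the 'isPrimitiveAndNotBoolean' virtual field.
        boolean isPrimitiveAndNotBoolean = lengthValueType != 6 && tagNumber != 1;
        if (lengthValueType == 5) {
            if (extLength == 255) {
                return extExtExtLength; // length carried in the 32 bit field
            }
            if (extLength == 254) {
                return extExtLength;    // length carried in the 16 bit field
            }
            return extLength;           // length carried in the 8 bit field
        }
        // Short form: the 3 bit lengthValueType is the length itself.
        return isPrimitiveAndNotBoolean ? lengthValueType : 0;
    }

    public static void main(String[] args) {
        System.out.println(actualLength(2, 3, 0, 0, 0L));
        System.out.println(actualLength(2, 5, 100, 0, 0L));
        System.out.println(actualLength(2, 5, 254, 300, 0L));
        System.out.println(actualLength(2, 5, 255, 0, 70000L));
    }
}
```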

Controlling/Changing the endianness

Especially in parts of the PROFINET protocol, the endianness needs to change throughout the protocol stack, so our current approach with one fixed endianness doesn't work well in this case. I've tried using manual fields, but the result was anything but ideal.

In general I've encountered multiple situations:

1. One layer of the protocol stack is fixed to a given endianness (UDP is Big Endian, for example; PROFINET IO CM Blocks are Big Endian too).
2. Endianness of fields is dependent on some parsed values (in DCE/RPC there is a 4-bit field that controls whether the following fields are Big Endian or Little Endian).
3. Endianness of all fields of a complex type is dependent on the endianness of the parent (in the case of PROFINET IO CM, the PROFINET IO CM packet has the same endianness as the DCE/RPC packet that contains it).

In order to address this we discussed 3 options:

1. Adding a new block type "endianessSwitch", which sets the endianness of the parser to a given endianness, reads all the fields it contains using this endianness, and then returns to its original endianness when leaving the block.
2. Adding a flag to fields that allows setting the endianness for parsing this field (and, if it's a complex one, all of its children).
3. Adding a flag to the type definition.

All options have their advantages and disadvantages.

Adding a flag to the field would simply require us to come up with a sensible notation, and so far I haven't found one I would feel comfortable with.

Adding a flag to a type seems simple, but how would we control this from the usage?

Chris' thoughts on this

In all cases we would simply make the "endianness" setting in the ReadBuffer and WriteBuffer writable, so we can read the endianness setting a buffer currently has and change it. This would not require us to copy and duplicate the data structures and waste memory. Reading BE or LE has absolutely no impact on how the read/write position is updated; it simply controls whether the values read are returned in an inverted byte order. So it should not have any memory or performance impact, even if we kept toggling the endianness after every field we read.
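To illustrate the point about the position, here is a small Java sketch that uses `java.nio.ByteBuffer` as a stand-in for our ReadBuffer (it is not the PLC4X buffer API; the class and method names are made up). Toggling the byte order between two reads only changes how the bytes are interpreted, while the position advances the same way in both cases.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderToggleSketch {
    // Sets the byte order on the existing buffer (no copy) and reads an unsigned 16 bit value.
    static int readUnsignedShort(ByteBuffer buffer, ByteOrder order) {
        buffer.order(order);
        return buffer.getShort() & 0xFFFF;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[] {0x12, 0x34, 0x12, 0x34});
        int be = readUnsignedShort(buffer, ByteOrder.BIG_ENDIAN);    // 0x1234
        int le = readUnsignedShort(buffer, ByteOrder.LITTLE_ENDIAN); // 0x3412
        System.out.printf("BE=0x%04X LE=0x%04X position=%d%n", be, le, buffer.position());
    }
}
```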

I would propose implementing two of the above options:

1. We define a new block type "endianessSwitch". It can take either a constant "LE" or "BE" (or the long form, which might be more explicit), or an expression which evaluates to one of the two.
2. We add the BE/LE flag to type definitions (but in a reduced form, which explicitly doesn't support dynamic endianness).

Now to my reasoning:

Usually it will not be just one field that needs to be switched, so adding a flag to the fields would require duplicating a lot of the flagging code in the mspec documents. Using a block of fields with equal endianness simplifies things. In addition, it explicitly focuses on the detail of setting the endianness, which should simplify implementing this in the code generation, as an endianessSwitch simply gets converted into an almost constant try-finally block.
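A minimal Java sketch of what such a generated try-finally block could look like, again with `java.nio.ByteBuffer` standing in for our ReadBuffer. The helper name `withByteOrder` is an assumption and not something the code generation produces today.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.function.Supplier;

final class ByteOrderSwitchSketch {
    // Runs the block with the given byte order and always restores the previous one.
    static <T> T withByteOrder(ByteBuffer buffer, ByteOrder order, Supplier<T> block) {
        ByteOrder previous = buffer.order();
        buffer.order(order);
        try {
            return block.get();
        } finally {
            // Restored even if parsing one of the contained fields throws.
            buffer.order(previous);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[] {0x12, 0x34, 0x12, 0x34});
        Integer inside = withByteOrder(buffer, ByteOrder.LITTLE_ENDIAN,
            () -> buffer.getShort() & 0xFFFF);
        int outside = buffer.getShort() & 0xFFFF;
        System.out.printf("inside=0x%04X outside=0x%04X%n", inside, outside);
    }
}
```

Because the previous order is restored in the finally block, nested switches and types without their own flag would simply see whatever byte order the enclosing scope has set.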

Not allowing dynamic switching in types and simply forcing this to be constant solves the problem of how the system tells the parser which endianness to use.

Taking the 3 scenarios above into account, they would be solved the following way:

"One layer of the protocol stack is fixed to a given endianness"

Set the UDP or the PROFINET IO CM Block type definitions to "BE".

"Endianness of fields is dependent on some parsed values"

Use an endianessSwitch.

"Endianness of all fields of a complex type is dependent on the endianness of the parent"

If the type definition doesn't have any BE/LE flags, it simply uses the endianness of the parent.

Sebastian's thoughts on this

We can support all three cases:

1. For the endianness I would implement this as a generic `batchSetAttribute`, e.g.
    [batchSetAttribute endianess='integerEncoding == IntegerEncoding.BIG_ENDIAN' [simple sometype field1...] [simple sometype field2...] [simple sometype field3...] [simple sometype field4...] ]
2. For the field it would have the same attribute as in 1:
    [simple sometype field1... endianess='integerEncoding == IntegerEncoding.BIG_ENDIAN']
3. This would be like any other type arg to me.

Defined this way, it would be future-proof, robust and reusable for other attributes. Additionally I would (as discussed ages ago) define a top-level element for endianness in the mspec to make it part of the definition.

Naming "endianess" "byte order"

It seems that we should name the attribute "byte order" instead of "endianess".

Cleaning up the string types

Right now all simple types (except bit) have a fixed length. The string type was changed to support expressions.

This makes things a bit inconsistent, because a fixed-length string field requires providing the constant length as an expression.

It would be good if a fixed-length string used the usual length notation and a dynamic-length string used either a different type name such as "vstring" (just a thought), or if the string type could accept both notations.

Also, could we reuse the attribute concept Sebastian proposed for the endianness for the encoding as well? The default would be UTF-8, but it could be changed with "encoding='UTF-16'".

Cleaning up the way we handle Encodings

Right now we only use "encoding" settings for string fields; however, this is not 100% correct. In other places we just define them implicitly. For example, for float we have "float 8.23" and "float 11.52", which actually define two different encodings. So we'll change this to "float 32" and "float 64", which default to IEEE floating point encoding.
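The old notation appears to give the exponent and mantissa widths, which together with the sign bit add up to 1 + 8 + 23 = 32 and 1 + 11 + 52 = 64 bits, i.e. the IEEE 754 single and double precision layouts. A small Java sketch of decoding both from raw bits (the class and method names are made up for illustration):

```java
final class IeeeFloatSketch {
    // "float 32": 1 sign bit, 8 exponent bits, 23 mantissa bits.
    static float decodeFloat32(int rawBits) {
        return Float.intBitsToFloat(rawBits);
    }

    // "float 64": 1 sign bit, 11 exponent bits, 52 mantissa bits.
    static double decodeFloat64(long rawBits) {
        return Double.longBitsToDouble(rawBits);
    }

    public static void main(String[] args) {
        System.out.println(decodeFloat32(0x3FC00000));          // 1.5
        System.out.println(decodeFloat64(0x3FF8000000000000L)); // 1.5
    }
}
```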

Cleaning up the code-generation

Especially for Java, the templates have been becoming more and more complex. Adding all of the "byte order setting" logic on top would make them quite unmaintainable.

The idea was to create a set of static functions, which can be imported with static imports, that handle the logic for the different types of fields we have.

Every field method would have the individual read/write operation as first argument, followed by all the mandatory pieces of information.

Optional attributes would be passed with var-arg parameters at the end.

    // simple type case
    lalala = readOptionalField("lalala",  dataReaderUnsignedInt( ....), something == true)

    // Enum type case // TODO: magic trick is to use the function pointer of the enum parse static method
    lalala = readOptionalField("lalala", enumReader(TypeOfHurz::enumForValue, dataReaderUnsignedIntCreatorDingels( ... readBuffer ....)),  something == true, withAdditonasdl(), withEncoding(), withFickiciy())

    // Complex type case // TODO: magic trick is to use a supplier lambda for the staticParse
    lalala = readOptionalField("lalala", complexTypeReader(()->TypeOfHurz.staticParseIwas(readBuffer, asdasdasdasd, asdasd), readBuffer),  something == true)
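To make the idea a bit more concrete, here is a self-contained Java sketch of one such static helper with var-arg options. Everything in it (`FieldReaderSketch`, `DataReader`, `WithOption`, `stringReader` and the shape of `withEncoding`) is an assumption for illustration only and not the final API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class FieldReaderSketch {
    // Hypothetical marker type for optional attributes passed as var-args.
    interface WithOption {}

    static final class WithEncoding implements WithOption {
        final Charset charset;
        WithEncoding(Charset charset) { this.charset = charset; }
    }

    static WithOption withEncoding(String charsetName) {
        return new WithEncoding(Charset.forName(charsetName));
    }

    // The individual read operation; it receives the optional attributes.
    interface DataReader<T> {
        T read(WithOption... options);
    }

    // Reads the field only if the condition holds, otherwise the field is absent (null).
    static <T> T readOptionalField(String logicalName, DataReader<T> reader,
                                   boolean condition, WithOption... options) {
        if (!condition) {
            return null;
        }
        return reader.read(options);
    }

    // Example reader: decodes the bytes as a string, UTF-8 unless an encoding option is given.
    static DataReader<String> stringReader(byte[] bytes) {
        return options -> {
            Charset charset = StandardCharsets.UTF_8;
            for (WithOption option : options) {
                if (option instanceof WithEncoding) {
                    charset = ((WithEncoding) option).charset;
                }
            }
            return new String(bytes, charset);
        };
    }

    public static void main(String[] args) {
        byte[] utf16 = "hi".getBytes(StandardCharsets.UTF_16BE);
        String name = readOptionalField("name", stringReader(utf16), true, withEncoding("UTF-16BE"));
        System.out.println(name);
    }
}
```

The templates would then only need to emit one call per field, and new optional attributes could be added as further `WithOption` implementations without changing the method signatures.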

Cleaning up the way params are passed to types

In general, we decided to replace the square brackets around the parameters passed from one type to another with round brackets. This makes the syntax a bit more similar to normal programming languages.

We also moved the place where params are passed from the end to directly after the complex type name.
