Finite State Machines introduction

A Finite State Machine is a model of behavior composed of states, transitions and actions. The state is the place we currently are in the system. We have been previously in another state, and we will transit in another state, depending on a transition. A transition is a way to go from one state to another state, under some specific conditions. For a given condition, there is only one path from state S1 to state S2. Further more, we will not accept two transitions from a state S1 to a state S2 (we don't need it, even if it is seen as a restriction).

The following schema shows how we can transit from one state to two different states :
 
We will use this notation to describe this little FSM :
Sn -[DIRxSBOX:a]-> Sn+1
Sn -[DIRxSBOX:b]-> Sn+k
where the left part is the starting state, the right part is the ending state and -[DIRxSBOX:X]-> is a transition in respect with the condition X.
The following schema shows how we can reach a state from two different states :
 
Here is the FSM :
Sn -[DIRxSBOX:a]-> Sk
Sm -[DIRxSBOX:b]-> Sk
The following schema shows a bad transition between two states :
 

States

There are three different kinds of states :

  • Starting state
  • Ending state
  • Normal state
    The starting state is the state which is the first one in the graph. We will have only one starting state for a FSM.
    The ending state is the last state of the FSM. We won't have any transition from this state. We will have only one ending state for a FSM (but we may represent more than one, for the clarity of the schema). All other states are normal states, with one or more incoming transitions, and with one or more outgoing transitions. We can also have reflexive transitions ( Sn -[DIRxSBOX:X]-> Sn ).

States just are logical places where we can say we store some information representing the current state.

Transitions

Transitions are just a way to go from a starting state to an ending state, in respect with a condition.

We can read the following notation :
Sn -[DIRxSBOX:C]-> Sm
like :
"transit from state Sn to state Sm if and only if the condition C is true"

Actions

When we have a transition from a state to another, we may want to trigger an action. This action will be executed just before the new state is reached; it can be seen as a part of the transition, but also as a part of the state.

Let's introduce a specific notation:
Sn -[DIRxSBOX:k]-> Sm(a)

Here, state Sn transits to state Sm if condition K is met, and the action a is executed before state Sm could be considered as active.

Example

At this point, a real-life sample could help a lot. We will use a piece of Ldap ASN.1 grammar for that purpose.

As this grammar is pretty complexe, we will just take a subset of it: the UnbindRequest.

Bar.java
LDAPMessage ::= SEQUENCE {
                messageID       MessageID,
                UnbindRequest ::= [DIRxSBOX:APPLICATION 2] NULL
                controls       [0] Controls OPTIONAL }

Controls ::= SEQUENCE OF Control

Control ::= SEQUENCE {
                controlType             LDAPOID,
                criticality             BOOLEAN DEFAULT FALSE,
                controlValue            OCTET STRING OPTIONAL }

The FSM which represent this grammar is the following :
 
The transitions are using hexadecimal values, which are ASN.1 BER codes for different kind of objects :

 Code

Type 

Description 

 0x01

 BOOLEAN

 It describes a boolean value

 0x02

 INTEGER

 It describes an Integer value

 0x04

 OCTET STRING

 It describes an octet string value

 0x30

 SEQUENCE / OF

 The following elements are included in a sequence

 0x42

 APPLICATION [DIRxSBOX:2]

 The tag is an epplication specific one, and it's number is 2

 0x80

 CONTEXTUAL[DIRxSBOX:0]

 The tag is a contextual tag, and its number is 0

 #

none

This value is used as a substitute for an invalid value

Of course, a deep knowledge of ASN.1 BER encoding is necessary to understand where those values came from. At this point, it's not really important. What we should have in mind is that those values are like keys. We will have them, and they will drive the transitions.

Now, we also have to know that, in ASN.1 BER encoding, each element is encoded as a TLV (Type/Length/Value). For instance, an OctetString as a type (0x04), a length, which is the number of bytes necessary to store the value, and a value.

The string 'sample' is encoded as : 0x04 0x06 0x73 0x61 0x6D 0x70 0x6C 0x65

  • 0x04 is the type
  • 0x06 is the length ( 6 bytes)
  • 0x73 0x61 0x6D 0x70 0x6C 0x65 are the bytes for 'sample' (the ASCII code for each char)
    So we can have a transition like:
controlType --[DIRxSBOX:0x04]--> controlValue

if the controlvalue is 'sample' and if we have reached the state controlType

Let's add an action :

controlType \-\-[DIRxSBOX:0x04]\-\-> controlValue (storeControlValue)

storeControlValue( ) <=
  check that the value is not null otherwise throw an exception
  store the controlValue in the current control object

We can now represent the FSM for this little grammar as a table:

 

 0x01

0x02

0x04 

0x30 

0x42 

0x80

#

 start

  error

  error

  error

 LdapMessage

  error

  error

  error

 LdapMessage

  error

 messageId

  error

  error

  error

  error

  error

 messageId

  error

  error

  error

  error

 unbindRequest

  error

  error

 unbindRequest

  error

  error

  error

  error

  error

 controls

(tick)end 

 controls

  error

  error

  error

 control

  error

  error

  error

 control

  error

 controlType

  error

  error

  error

  error

  error

 controlType

 criticallity

  error

 controlValue

 control

  error

  error

(tick)end

 criticallity

  error

  error

 controlValue

 control

  error

  error

(tick)end 

 controlValue

  error

  error

  error

 control

  error

  error

(tick)end 

This table is the key.

Decoding an UnbindRequest

 The decoder will use it to parse the PDU. Let's play with it :

PDU : 0x30 0x05 0x02 0x01 0x23 0x42 0x00 0x30

This PDU has been received. This is a new message, so we will set the FSM state to start

step 1 : from start to LdapMessage state 

The table tells us that if we have a 0x30 tag, and if we are in state start, then we must switch to state LdapMessage. There is no associated action:

TLV

state

next state

action

0x30 0x05

start

LdapMessage

store the expected V length : 5

Here, we just keep a track of the value expected value size, as the T is a constructed PDU (this is a SEQUENCE OF)

step 2: from LdapMessage state to messageId state

Now, we have read the first two bytes, 0x30 0x05. The next TLV is terminal, as it's an INTEGER (0x02 0x02 0x23).

TLV

state

next state

action

0x30 0x05

start

LdapMessage

store the expected V length : 5

0x02 0x01 0x23

LdapMessage

messageId

- decrement the expected size by 3
- Store the MessageId : 35 (0x23)

We have already decoded two elements of the grammar, and read 5 bytes. The expected number of remaining bytes to read isequal to 2.

step 2: from messageId state to UnbindRequest state

The next bytes are 0x42 0x00. We have a valid transitition for the 0x42 byte : UnbindRequest

TLV

state

next state

action

0x30 0x05

start

LdapMessage

store the expected V length : 5

0x02 0x01 0x23 

LdapMessage

messageId

- decrement the expected size by 3
- Store the MessageId : 35 (0x23)

0x42 0x00

messageId

unbindRequest

- create an UnbindRequest object
- store the messageId into it
- decrement the expected size by 2

Now, we have an expected size which is 0. This is the end of this PDU, we have decoded a full message. What about the last byte (0x30)? Well, this is the start of a new PDU, but at the current one has been fully decoded, we can return the created message to the caller. The remainding byates are not lost, anyway. Another call to the decoder will use those bytes and will try to decode them.

Step 2 reloaded : from messageId state to ???

Let's now suppose we have this PDU:

PDU: 0x30 0x05 0x02 0x01 0x23 0x80 0x00 0x30

The 0x80 value drive to an error, when we are in state messageId. We will just throw an error. This mechnaism is really interesting, because it's a fail fast one. We won't do any attempt to recover from this error: the PDU *must* be correct.

Conclusion  

FSM are perfect to decode ASN.1 encoded data. It's a fast, reliable, fail-fast and simple to implement technic. However, there are some drawbacks:

  • Implementing the mechanism is not by itself a serious concern, but in our case, we have a two levels FSM, as we have to deal with T, L and V in another FSM.
  • The Length control is not very easy. We must implement a stack which keep a track of each level length, to be sure that the PDU is correct
  • A FSM needs a table for each element of the grammar we want to parse. If we don't have a compiler to build these tables, it's a quite error prone work to build them by hand
  • Debugging a FSM is not very simple. We could add some debug trace, but it's clearly more complicated than a recusive descent.
    However, thos drawbacks are really minor regarding the advantages of using FSM to decode ASN.1 BER encoded data.
  • No labels