DUE TO SPAM, SIGN-UP IS DISABLED. Goto Selfserve wiki signup and request an account.
Finite State Machines introduction
A Finite State Machine is a model of behavior composed of states, transitions and actions. The state is the place we currently are in the system. We have been previously in another state, and we will transit in another state, depending on a transition. A transition is a way to go from one state to another state, under some specific conditions. For a given condition, there is only one path from state S1 to state S2. Further more, we will not accept two transitions from a state S1 to a state S2 (we don't need it, even if it is seen as a restriction).
The following schema shows how we can transit from one state to two different states :
We will use this notation to describe this little FSM :
Sn -[DIRxSBOX:a]-> Sn+1
Sn -[DIRxSBOX:b]-> Sn+k
where the left part is the starting state, the right part is the ending state and -[DIRxSBOX:X]-> is a transition in respect with the condition X.
The following schema shows how we can reach a state from two different states :
Here is the FSM :
Sn -[DIRxSBOX:a]-> Sk
Sm -[DIRxSBOX:b]-> Sk
The following schema shows a bad transition between two states :
States
There are three different kinds of states :
- Starting state
- Ending state
- Normal state
The starting state is the state which is the first one in the graph. We will have only one starting state for a FSM.
The ending state is the last state of the FSM. We won't have any transition from this state. We will have only one ending state for a FSM (but we may represent more than one, for the clarity of the schema). All other states are normal states, with one or more incoming transitions, and with one or more outgoing transitions. We can also have reflexive transitions ( Sn -[DIRxSBOX:X]-> Sn ).
States just are logical places where we can say we store some information representing the current state.
Transitions
Transitions are just a way to go from a starting state to an ending state, in respect with a condition.
We can read the following notation :
Sn -[DIRxSBOX:C]-> Sm
like :
"transit from state Sn to state Sm if and only if the condition C is true"
Actions
When we have a transition from a state to another, we may want to trigger an action. This action will be executed just before the new state is reached; it can be seen as a part of the transition, but also as a part of the state.
Let's introduce a specific notation:
Sn -[DIRxSBOX:k]-> Sm(a)
Here, state Sn transits to state Sm if condition K is met, and the action a is executed before state Sm could be considered as active.
Example
At this point, a real-life sample could help a lot. We will use a piece of Ldap ASN.1 grammar for that purpose.
As this grammar is pretty complexe, we will just take a subset of it: the UnbindRequest.
LDAPMessage ::= SEQUENCE {
messageID MessageID,
UnbindRequest ::= [DIRxSBOX:APPLICATION 2] NULL
controls [0] Controls OPTIONAL }
Controls ::= SEQUENCE OF Control
Control ::= SEQUENCE {
controlType LDAPOID,
criticality BOOLEAN DEFAULT FALSE,
controlValue OCTET STRING OPTIONAL }
The FSM which represent this grammar is the following :
The transitions are using hexadecimal values, which are ASN.1 BER codes for different kind of objects :
Code |
Type |
Description |
|---|---|---|
0x01 |
BOOLEAN |
It describes a boolean value |
0x02 |
INTEGER |
It describes an Integer value |
0x04 |
OCTET STRING |
It describes an octet string value |
0x30 |
SEQUENCE / OF |
The following elements are included in a sequence |
0x42 |
APPLICATION [DIRxSBOX:2] |
The tag is an epplication specific one, and it's number is 2 |
0x80 |
CONTEXTUAL[DIRxSBOX:0] |
The tag is a contextual tag, and its number is 0 |
# |
none |
This value is used as a substitute for an invalid value |
Of course, a deep knowledge of ASN.1 BER encoding is necessary to understand where those values came from. At this point, it's not really important. What we should have in mind is that those values are like keys. We will have them, and they will drive the transitions.
Now, we also have to know that, in ASN.1 BER encoding, each element is encoded as a TLV (Type/Length/Value). For instance, an OctetString as a type (0x04), a length, which is the number of bytes necessary to store the value, and a value.
The string 'sample' is encoded as : 0x04 0x06 0x73 0x61 0x6D 0x70 0x6C 0x65
- 0x04 is the type
- 0x06 is the length ( 6 bytes)
- 0x73 0x61 0x6D 0x70 0x6C 0x65 are the bytes for 'sample' (the ASCII code for each char)
So we can have a transition like:
controlType --[DIRxSBOX:0x04]--> controlValue
if the controlvalue is 'sample' and if we have reached the state controlType.
Let's add an action :
controlType \-\-[DIRxSBOX:0x04]\-\-> controlValue (storeControlValue) storeControlValue( ) <= check that the value is not null otherwise throw an exception store the controlValue in the current control object
We can now represent the FSM for this little grammar as a table:
|
0x01 |
0x02 |
0x04 |
0x30 |
0x42 |
0x80 |
# |
|---|---|---|---|---|---|---|---|
start |
error |
error |
error |
LdapMessage |
error |
error |
error |
LdapMessage |
error |
messageId |
error |
error |
error |
error |
error |
messageId |
error |
error |
error |
error |
unbindRequest |
error |
error |
unbindRequest |
error |
error |
error |
error |
error |
controls |
|
controls |
error |
error |
error |
control |
error |
error |
error |
control |
error |
controlType |
error |
error |
error |
error |
error |
controlType |
criticallity |
error |
controlValue |
control |
error |
error |
|
criticallity |
error |
error |
controlValue |
control |
error |
error |
|
controlValue |
error |
error |
error |
control |
error |
error |
|
This table is the key.
Decoding an UnbindRequest
The decoder will use it to parse the PDU. Let's play with it :
PDU : 0x30 0x05 0x02 0x01 0x23 0x42 0x00 0x30
This PDU has been received. This is a new message, so we will set the FSM state to start.
step 1 : from start to LdapMessage state
The table tells us that if we have a 0x30 tag, and if we are in state start, then we must switch to state LdapMessage. There is no associated action:
TLV |
state |
next state |
action |
|---|---|---|---|
0x30 0x05 |
start |
LdapMessage |
store the expected V length : 5 |
Here, we just keep a track of the value expected value size, as the T is a constructed PDU (this is a SEQUENCE OF)
step 2: from LdapMessage state to messageId state
Now, we have read the first two bytes, 0x30 0x05. The next TLV is terminal, as it's an INTEGER (0x02 0x02 0x23).
TLV |
state |
next state |
action |
|---|---|---|---|
0x30 0x05 |
start |
LdapMessage |
store the expected V length : 5 |
0x02 0x01 0x23 |
LdapMessage |
messageId |
- decrement the expected size by 3 |
We have already decoded two elements of the grammar, and read 5 bytes. The expected number of remaining bytes to read isequal to 2.
step 2: from messageId state to UnbindRequest state
The next bytes are 0x42 0x00. We have a valid transitition for the 0x42 byte : UnbindRequest
TLV |
state |
next state |
action |
|---|---|---|---|
0x30 0x05 |
start |
LdapMessage |
store the expected V length : 5 |
0x02 0x01 0x23 |
LdapMessage |
messageId |
- decrement the expected size by 3 |
0x42 0x00 |
messageId |
unbindRequest |
- create an UnbindRequest object |
Now, we have an expected size which is 0. This is the end of this PDU, we have decoded a full message. What about the last byte (0x30)? Well, this is the start of a new PDU, but at the current one has been fully decoded, we can return the created message to the caller. The remainding byates are not lost, anyway. Another call to the decoder will use those bytes and will try to decode them.
Step 2 reloaded : from messageId state to ???
Let's now suppose we have this PDU:
PDU: 0x30 0x05 0x02 0x01 0x23 0x80 0x00 0x30
The 0x80 value drive to an error, when we are in state messageId. We will just throw an error. This mechnaism is really interesting, because it's a fail fast one. We won't do any attempt to recover from this error: the PDU *must* be correct.
Conclusion
FSM are perfect to decode ASN.1 encoded data. It's a fast, reliable, fail-fast and simple to implement technic. However, there are some drawbacks:
- Implementing the mechanism is not by itself a serious concern, but in our case, we have a two levels FSM, as we have to deal with T, L and V in another FSM.
- The Length control is not very easy. We must implement a stack which keep a track of each level length, to be sure that the PDU is correct
- A FSM needs a table for each element of the grammar we want to parse. If we don't have a compiler to build these tables, it's a quite error prone work to build them by hand
- Debugging a FSM is not very simple. We could add some debug trace, but it's clearly more complicated than a recusive descent.
However, thos drawbacks are really minor regarding the advantages of using FSM to decode ASN.1 BER encoded data.



