The format is designed to act as a universal serialization format. It can serialize arbitrary object graphs (including reference loops between objects). It is cross-platform in the sense that Ignite clients written in different languages understand the format.
It is also worth noting that there are two matters:
By default, Ignite uses binary serialization for storing objects in caches. Consider the following example:
Here an object of a user class is converted to the binary format to be stored in a cache. The reverse conversion takes place when the object is read.
The binary object container format basically allows storing a set of named fields with their values. A simple example of such a structure is a plain Java class:
Binary containers are context-dependent: a serialized byte array does not carry the full class name and field names. To restore a Java object from bytes, the type and the schema must be registered on the receiving side. This works quite naturally for the earlier example of storing data in a cache. Still, some information about the type and fields must be present in the binary form. The binary object container format stores a type id and a schema id for that purpose. Given them and a proper context, it is possible to restore, for example, a Java object from bytes.
Schematically, such a context can be represented as follows:
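In code, such a context can be thought of as two lookup tables: one from type id to class name, and one from a (type id, schema id) pair to the list of field names. The sketch below is only an illustration; the class `BinaryContextSketch` and its methods are hypothetical and are not part of the Ignite API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical receiving-side context; not an Ignite class. */
final class BinaryContextSketch {
    /** typeId -> class name. */
    private final Map<Integer, String> classNames = new HashMap<>();

    /** (typeId, schemaId) -> ordered field names. */
    private final Map<Long, List<String>> schemas = new HashMap<>();

    /** Packs both 32-bit ids into one 64-bit map key. */
    private static long key(int typeId, int schemaId) {
        return ((long) typeId << 32) | (schemaId & 0xFFFFFFFFL);
    }

    void registerType(int typeId, String className) {
        classNames.put(typeId, className);
    }

    void registerSchema(int typeId, int schemaId, List<String> fieldNames) {
        schemas.put(key(typeId, schemaId), new ArrayList<>(fieldNames));
    }

    /** Returns the class name, or null if the type is not registered. */
    String className(int typeId) {
        return classNames.get(typeId);
    }

    /** Returns the field names, or null if the schema is not registered. */
    List<String> fieldNames(int typeId, int schemaId) {
        return schemas.get(key(typeId, schemaId));
    }
}
```

A reader that finds a type id and schema id in an object header would consult both tables; a `null` result corresponds to the case where the metadata is not yet known locally.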
At the top level, a binary serialized object consists of the following parts: Header, Data and Footer.
The Header contains various meta information; its structure is described below. The Data part contains the values of the binary object's fields. The Footer contains field offsets. Each field offset points to the position of a field.
Sizes are in bytes.
|Field||Size||Description|
|Value type||1||Used for nesting; the value is 0x67 because this is a binary object|
|Format version||1||Allows new format versions to be added in the future|
|Flags||2||Various flags, see below|
|Type id||4||Hash function of type name|
|Hash code||4||Hash code for a cache key (not very useful for values)|
|Total length||4||Object length in bytes including Header, Data and Footer|
|Schema id||4||Hash function of field ids|
|Footer position||4||Offset in bytes from object beginning|
NOTE: at the time of writing, the constants with offsets were located in the GridBinaryMarshaller class.
NOTE: the constants were located in the BinaryUtils class.
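To make the header layout concrete, here is a minimal sketch that reads the eight header fields in the order and with the sizes listed in the table (24 bytes in total). The class `BinaryHeaderSketch` is hypothetical, and little-endian byte order is an assumption that the text above does not state.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical header reader; field order and sizes follow the table above. */
final class BinaryHeaderSketch {
    /** 1 + 1 + 2 + 5 * 4 bytes. */
    static final int HEADER_LEN = 24;

    /** Value type byte of a binary object. */
    static final byte BINARY_OBJ = 0x67;

    final byte valueType;
    final byte formatVersion;
    final short flags;
    final int typeId;
    final int hash;
    final int totalLength;
    final int schemaId;
    final int footerPosition;

    private BinaryHeaderSketch(ByteBuffer buf) {
        valueType = buf.get();
        formatVersion = buf.get();
        flags = buf.getShort();
        typeId = buf.getInt();
        hash = buf.getInt();
        totalLength = buf.getInt();
        schemaId = buf.getInt();
        footerPosition = buf.getInt();
    }

    /** Reads a header starting at {@code start}; byte order is an assumption. */
    static BinaryHeaderSketch read(byte[] bytes, int start) {
        ByteBuffer buf = ByteBuffer.wrap(bytes, start, HEADER_LEN)
            .order(ByteOrder.LITTLE_ENDIAN);

        BinaryHeaderSketch h = new BinaryHeaderSketch(buf);

        if (h.valueType != BINARY_OBJ)
            throw new IllegalArgumentException(
                "Not a binary object: 0x" + Integer.toHexString(h.valueType & 0xFF));

        return h;
    }
}
```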
Offsets of object fields are stored in the Footer part. With stored offsets, it takes constant time to find where a particular field resides. The header specifies the footer position. Each offset can take 1, 2 or 4 bytes (this is specified by offset flags in the header). The footer can be compact or not (more details later).
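The constant-time lookup can be illustrated with a short sketch. It assumes a footer that contains only offsets, little-endian byte order, and unsigned 1- and 2-byte offsets; the offset width is passed in as a parameter because the concrete flag values are not listed here. The class `FooterOffsetSketch` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: reads the i-th offset from an offsets-only footer. */
final class FooterOffsetSketch {
    /**
     * @param obj Serialized object bytes.
     * @param footerPos Footer position taken from the header.
     * @param fieldIdx Zero-based index of the field in the schema.
     * @param offsetSize Width of one offset in bytes: 1, 2 or 4.
     * @return Offset of the field value.
     */
    static int fieldOffset(byte[] obj, int footerPos, int fieldIdx, int offsetSize) {
        ByteBuffer buf = ByteBuffer.wrap(obj).order(ByteOrder.LITTLE_ENDIAN);

        // Entries have a fixed width, so the position is computed directly.
        int pos = footerPos + fieldIdx * offsetSize;

        switch (offsetSize) {
            case 1:
                return buf.get(pos) & 0xFF;
            case 2:
                return buf.getShort(pos) & 0xFFFF;
            case 4:
                return buf.getInt(pos);
            default:
                throw new IllegalArgumentException("Unsupported offset size: " + offsetSize);
        }
    }
}
```

Because every entry has the same width, no scan over the footer is needed: one multiplication and one read locate any field.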
Let's consider an example of an object with two fields:
And here it is after serialization:
The binary format supports multiple types as first-class citizens. As described above, each binary container starts with the byte 0x67, which indicates the binary container type. Each field value inside a binary container (except nulls) starts with one byte specifying its type. In the example provided earlier there were the type bytes 0x03 and 0x09 for int and String respectively. The first (and only) byte of a null value is 0x65. Maps and lists are among the other supported types. If a type is not supported directly, it can be represented as a nested binary container (type 0x67).
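A reader for field values therefore dispatches on the leading type byte. The sketch below handles only the type bytes named above. The payload layouts are assumptions that this text does not spell out: an int is taken to be 4 bytes, and a String is taken to be a 4-byte length followed by UTF-8 bytes. The class `FieldValueSketch` is hypothetical, and the caller is expected to set the buffer's byte order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical reader that dispatches on the leading type byte of a value. */
final class FieldValueSketch {
    static final byte INT = 0x03;
    static final byte STRING = 0x09;
    static final byte NULL = 0x65;
    static final byte BINARY_OBJ = 0x67;

    /** Reads one value at the buffer's current position. */
    static Object readValue(ByteBuffer buf) {
        byte type = buf.get();

        switch (type) {
            case NULL:
                // A null consists of the type byte only.
                return null;

            case INT:
                return buf.getInt();

            case STRING: {
                // Assumed layout: 4-byte length, then UTF-8 bytes.
                int len = buf.getInt();
                byte[] data = new byte[len];
                buf.get(data);
                return new String(data, StandardCharsets.UTF_8);
            }

            default:
                // Nested containers (0x67) and other types are left out of this sketch.
                throw new UnsupportedOperationException(
                    "Type 0x" + Integer.toHexString(type & 0xFF) + " is not handled here");
        }
    }
}
```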
The binary format supports two modes for writing the object Footer. The mode is controlled by BinaryConfiguration.setCompactFooter. Initially there was no compact footer mode in the format. Instead, a fieldId was stored in the footer for each field in addition to its offset. For the previous serialization example, the verbose footer looks as follows:
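A lookup by field id in a verbose footer can be sketched as a scan over (fieldId, offset) pairs. For brevity the sketch assumes that each entry is a 4-byte fieldId followed by a 4-byte offset, in little-endian order; the entry order and the fixed offset width are assumptions, and the class `VerboseFooterSketch` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical lookup in a verbose (non-compact) footer. */
final class VerboseFooterSketch {
    /** Assumed entry layout: 4-byte fieldId + 4-byte offset. */
    static final int ENTRY_SIZE = 8;

    /**
     * @param obj Serialized object bytes.
     * @param footerPos Position of the first footer entry.
     * @param footerEnd Position just past the last footer entry.
     * @param fieldId Field id to look for.
     * @return Offset of the field, or -1 if the field is not present.
     */
    static int findOffset(byte[] obj, int footerPos, int footerEnd, int fieldId) {
        ByteBuffer buf = ByteBuffer.wrap(obj).order(ByteOrder.LITTLE_ENDIAN);

        for (int pos = footerPos; pos + ENTRY_SIZE <= footerEnd; pos += ENTRY_SIZE) {
            if (buf.getInt(pos) == fieldId)
                return buf.getInt(pos + 4);
        }

        return -1;
    }
}
```

The verbose form is self-describing, so no schema is needed to find a field, but every object pays for the extra field ids. The compact form drops them and relies on the schema id from the header.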
The calculated hash code is stored in the object header. The hash code is used when the object acts as a key in a cache; otherwise it is perhaps redundant. By default, the hash code is calculated from the bytes of the Data part.
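As an illustration of hashing over the Data part only, the sketch below applies the conventional 31-based polynomial hash to the bytes between the end of the header and the start of the footer. The concrete hash function is an assumption made for the example, and the class `DataHashSketch` is hypothetical.

```java
/** Hypothetical hash over the Data part of a serialized object. */
final class DataHashSketch {
    /**
     * @param obj Serialized object bytes.
     * @param dataStart Position of the first data byte (just past the header).
     * @param dataEnd Position just past the last data byte (the footer position).
     * @return Hash of the data bytes only.
     */
    static int hash(byte[] obj, int dataStart, int dataEnd) {
        int h = 1;

        // Header and footer bytes do not take part in the hash.
        for (int i = dataStart; i < dataEnd; i++)
            h = 31 * h + obj[i];

        return h;
    }
}
```

With such a scheme, two objects with identical field data get the same hash code even if their headers or footers differ, for example when one uses a compact footer and the other does not.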
As already mentioned, each (non-raw) field value in the binary format starts with 1 byte indicating its type. If the class of a serialized object has no special serialization in the binary format (as simple types such as int, long and String have), it can be serialized as a nested binary object. In that case its value starts with the 0x67 byte.
Things become even more interesting when there is a need to store an object whose links form a graph with cycles. For example, the following tree representation produces such an object graph:
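A tree node that links both to its children and back to its parent is enough to create cycles. The sketch below is a hypothetical class of that shape; the name `TreeNode` and the helper `attach` are illustrative, while the field names `parent`, `left` and `right` match the links used in the walkthrough.

```java
/** Hypothetical tree node whose parent links create reference cycles. */
final class TreeNode {
    TreeNode parent;
    TreeNode left;
    TreeNode right;

    /** Attaches both children and points them back at this node. */
    void attach(TreeNode l, TreeNode r) {
        left = l;
        right = r;

        if (l != null)
            l.parent = this;

        if (r != null)
            r.parent = this;
    }
}
```

After `root.attach(a, b)` the references root -> a -> root and root -> b -> root each form a loop, so a serializer that simply followed every reference would never terminate.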
Let's examine the serialization of a tree with just 3 nodes: a root and 2 children, named root, a and b.
root.parent = null, root.left = a, root.right = b
a.parent = root, a.left = a.right = null
b.parent = root, b.left = b.right = null
Here links to a parent object, which comes before its children in the binary stream, are marked with the type byte 0x66, which is the Handle type (link, reference). The value is a 4-byte integer holding the back offset from this handle field to the original object. It is a back offset because the original object is located before the handle. As a reminder, null is encoded as the single byte 0x65. Note that each object (root, a, b) has the same type id and schema id here.
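Resolving a handle then amounts to one subtraction. The sketch below assumes that the back offset is measured from the position of the handle's type byte and that integers are little-endian; neither detail is stated above. The class `HandleSketch` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical resolver for handle (0x66) values. */
final class HandleSketch {
    static final byte HANDLE = 0x66;

    /**
     * @param stream Serialized bytes containing both the object and the handle.
     * @param handlePos Position of the handle's type byte.
     * @return Position of the object that the handle refers to.
     */
    static int resolve(byte[] stream, int handlePos) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);

        if (buf.get(handlePos) != HANDLE)
            throw new IllegalArgumentException("No handle at position " + handlePos);

        // The 4-byte back offset follows the type byte.
        int backOffset = buf.getInt(handlePos + 1);

        // The original object lies before the handle, hence the subtraction.
        return handlePos - backOffset;
    }
}
```

A deserializer would typically keep a map from stream positions to already restored objects, so that a resolved position yields the same object instance instead of a copy.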
Let's consider a read/write example with a cache:
As we know, inside the cache storage Pair will be stored in binary format. But how are the forward and backward conversions performed?
For a forward conversion it is roughly as follows:
Backward conversion requires a BinarySchema and a class name accessible for the given typeId and schemaId. Let's assume that we have them. Here is how it goes:
In fact, it is possible that the BinarySchema and the class name are not accessible for a particular binary object, e.g. when your first operation is reading a value from a cache. There is special machinery for such cases: BinaryMetadata registration. Generally, it allows the needed metadata for a typeId/schemaId, including class names and schemas, to be requested remotely. A detailed description is outside the scope of this document.
In addition to storing a number of field values, the binary object format is capable of storing raw bytes intended for custom interpretation. One common example is the serialization of classes implementing Binarylizable. A Binarylizable object has user-defined serialization/deserialization methods. After binary serialization such an object will contain no fields (and no schema).
Consider an example:
And here are bytes after serialization:
Internally, the format also allows raw data to be stored in addition to fields serialized using the ordinary binary serialization technique. In that case the serialized object structure will be different. Details are not described here, as the case seems unusual.