A communications protocol necessarily deals with channels of communication and with serialization. While the old client/server protocol in Geode used an undocumented proprietary on-wire format, we are considering using a third-party library either for serialization alone or for serialization and communications combined.
Any library we choose to use must support backward-compatibility so that clients running old versions of Geode can talk to servers running a newer version.
- A client/server communications library must fit into the server-side architecture or it will only offer partial functionality.
- In a Geode server a client is identified with a ClientProxyMembershipID that uniquely identifies the client and contains information characterizing the client. We need to be able to create an equivalent identifier with a third-party library.
- A Geode server can identify a client when a message is received. That is, we know the client's aforementioned ID. This must be possible with a third-party library.
- A Geode server can send subscription events to a client. These are messages pushed from the server to a client via a client-initiated communication channel (e.g., TCP/IP socket).
- The protocol should not require persistent storage.
In addition, we want to have a message representation that facilitates modularization of Geode code and client code. A “put” message handler shouldn’t have to deal with transactions or authorization checks, as is currently done in Geode client/server message handlers.
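One way to picture this separation is a handler chain in which authorization wraps the operation handler. The following is only a minimal sketch of that idea. Every name in it (`PutRequest`, `Handler`, `PutHandler`, `AuthorizingHandler`) is hypothetical and does not correspond to an existing Geode class, and the in-memory map merely stands in for a region.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// All types below are hypothetical illustrations, not Geode APIs.
final class PutRequest {
    final String clientId;
    final String key;
    final byte[] value;

    PutRequest(String clientId, String key, byte[] value) {
        this.clientId = clientId;
        this.key = key;
        this.value = value;
    }
}

interface Handler {
    String handle(PutRequest request);
}

// Knows only how to store a value; contains no security or transaction logic.
final class PutHandler implements Handler {
    final Map<String, byte[]> region = new HashMap<>(); // stand-in for a region

    @Override
    public String handle(PutRequest request) {
        region.put(request.key, request.value);
        return "OK";
    }
}

// A cross-cutting concern that can be layered around any handler.
final class AuthorizingHandler implements Handler {
    private final Set<String> allowedClients;
    private final Handler next;

    AuthorizingHandler(Set<String> allowedClients, Handler next) {
        this.allowedClients = allowedClients;
        this.next = next;
    }

    @Override
    public String handle(PutRequest request) {
        if (!allowedClients.contains(request.clientId)) {
            return "NOT_AUTHORIZED"; // the wrapped handler is never invoked
        }
        return next.handle(request);
    }
}
```

With this shape, a transaction wrapper could be added as another `Handler` around `PutHandler` without changing the put logic itself.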
Geode also handles messages from old clients, so a serialization protocol for client/server messages must provide backward compatibility features such as those found in Protobuf.
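To illustrate the kind of compatibility feature meant here, the sketch below tags every field with a number and a length so that a reader built against an older schema can skip fields it does not recognize. This is a simplified tag-length-value encoding written for illustration only. It is not Protobuf's actual wire format, and the class name, tag numbers and the "callback argument" field are all assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of tagged fields; not a real library's wire format.
final class TaggedFields {
    static final int TAG_KEY = 1;          // known to old and new code
    static final int TAG_CALLBACK_ARG = 2; // assumed to be added in a newer version

    // Writes each field as: 4-byte tag, 4-byte length, UTF-8 payload.
    static byte[] encode(int[] tags, String[] values) {
        byte[][] payloads = new byte[values.length][];
        int size = 0;
        for (int i = 0; i < values.length; i++) {
            payloads[i] = values[i].getBytes(StandardCharsets.UTF_8);
            size += 8 + payloads[i].length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(size);
        for (int i = 0; i < values.length; i++) {
            buffer.putInt(tags[i]).putInt(payloads[i].length).put(payloads[i]);
        }
        return buffer.array();
    }

    // An "old" reader that understands only TAG_KEY and skips everything else.
    static String decodeKeyWithOldReader(byte[] message) {
        ByteBuffer buffer = ByteBuffer.wrap(message);
        String key = null;
        while (buffer.hasRemaining()) {
            int tag = buffer.getInt();
            int length = buffer.getInt();
            if (tag == TAG_KEY) {
                byte[] payload = new byte[length];
                buffer.get(payload);
                key = new String(payload, StandardCharsets.UTF_8);
            } else {
                // Unknown field: the length prefix lets us step over it safely.
                buffer.position(buffer.position() + length);
            }
        }
        return key;
    }
}
```

Because the old reader relies only on the tag and length, a newer peer can add fields without breaking it, which is the property we would want from any candidate library.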
If we choose a library such as Protobuf, Thrift, Avro, etc., it should be a mature product with an active developer base and support the widest range of clients possible. We must also ensure that it fits with our requirements, especially if it implements messaging as well as serialization. We must also run benchmarks and collect the same metrics from each library for comparison.
Metrics should include:
- Serialization time
- Deserialization time
- Serialized form size in bytes
- CPU time in client
- CPU time in server
- Garbage generated in client
- Garbage generated in server
If the library also implements messaging, we also need to measure:
- Message/response latency
A library benchmark should use the Java Microbenchmark Harness (JMH) to simulate a “get” operation that returns a byte array (cache keys and values will possibly be serialized using a different mechanism). For libraries implementing messaging we should gather metrics for varying numbers of threads: 1, 10 and 100.
All final benchmark figures should be collected on the same machine using the same version of Java.
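As a rough picture of what each benchmark would record, the sketch below times serialization and deserialization of a byte-array "get" response and reports the serialized size and the CPU time of the measuring thread. It uses only the JDK so that it runs without dependencies. It is not a substitute for JMH, which should produce the final figures because it handles warm-up and JIT effects. The length-prefixed codec and the class and field names are assumptions standing in for a candidate library.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical measurement sketch; the codec is a placeholder, not a real library.
final class GetResponseMetrics {
    static final class Result {
        long serializeNanosPerOp;
        long deserializeNanosPerOp;
        long serializedBytes;
        long cpuNanos; // -1 if the JVM does not support thread CPU timing
    }

    // Placeholder codec: 4-byte length prefix followed by the value bytes.
    static byte[] serialize(byte[] value) {
        return ByteBuffer.allocate(4 + value.length).putInt(value.length).put(value).array();
    }

    static byte[] deserialize(byte[] wire) {
        ByteBuffer buffer = ByteBuffer.wrap(wire);
        byte[] value = new byte[buffer.getInt()];
        buffer.get(value);
        return value;
    }

    static Result measure(byte[] value, int iterations) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = threads.isCurrentThreadCpuTimeSupported();
        long cpuStart = cpuSupported ? threads.getCurrentThreadCpuTime() : -1;

        byte[] wire = serialize(value);
        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            wire = serialize(value);
        }
        long t1 = System.nanoTime();
        byte[] decoded = deserialize(wire);
        for (int i = 0; i < iterations; i++) {
            decoded = deserialize(wire);
        }
        long t2 = System.nanoTime();

        if (!Arrays.equals(value, decoded)) {
            throw new IllegalStateException("round trip changed the value");
        }

        Result result = new Result();
        result.serializeNanosPerOp = (t1 - t0) / iterations;
        result.deserializeNanosPerOp = (t2 - t1) / iterations;
        result.serializedBytes = wire.length;
        result.cpuNanos = cpuSupported ? threads.getCurrentThreadCpuTime() - cpuStart : -1;
        return result;
    }
}
```

Calling `GetResponseMetrics.measure(new byte[1024], 100_000)` would yield one row of the comparison for this placeholder codec. Garbage generation and multi-threaded message latency are not covered by this sketch and would need JMH profilers or a separate harness.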
Libraries to consider
This Wikipedia page has some qualitative comparisons: https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats.
If you look at existing comparisons and benchmarks, a few libraries consistently sit at the top for performance:
For serialization: Protobuf, Thrift, MessagePack
For serialization & messaging: gRPC, Thrift
I think Apache Avro should also be considered for serialization and messaging, since it is an Apache project and its serialization is somewhat similar to Geode’s PDX.