When using multiple WAN sites with geode users are required to set the distributed-system-id property (DSID). Each site must have its own unique DSID, and ids are limited to being between 1 and 255. This limits the number of WAN sites to 255. It also requires that external coordination before setting up a WAN site to make it that it gets a DSID. Once data is generated within a site, it is not possible to change the DSID without exporting and reimporting the data.
Goals:
Not in scope:
The distributed-system-id is used for three things: WAN receiver discovery, WAN conflict detection and PDX type ID generation.
When creating a gateway-sender users specify the remote-distributed-system-id property. That property which indicates which WAN site the sender should send to. Users configure the remote-locators property on their locator to point to locators in other WAN sites. Geode then will discover the DSID and addresses of remote gateway-receivers and connect the sender to those receivers.
Because WAN is asynchronous it's possible for two WAN sites to modify the same entry at the same time. For this case geode uses an eventual consistency model where all sites should end up choosing the same final value.
In order to detect that there is a conflict between updates from two different WAN sites geode compares the DSID of the two modifications. If the DSIDS are different geode then performs WAN specific conflict resolution, which involves comparing timestamps or invoking a user conflict resolver.
Geode passes the distributed-system-id along with events when doing a put over the WAN. See VersionTag.toData and GatewaySenderGFEBatchOpImpl for a couple of places where this is sent over the wire.
When serializing an object with PDX geode generates a PDXType which describes how to deserialize the object. Geode generates a unique ID for that type (PDXID) and embeds the ID in the serialized data. The PdxType is stored in the PDX type registry which is available everywhere, so that all members can find the PDXType using the PDXID.
In order to generate unique PDXIDs each WAN site embeds the DSID in PDXIDs generated in that site. That ensures two WAN sites never generate the same PDXID. PDXIDs are 4 bytes. The high byte is the DSID and the remaining 3 bytes are generated under a lock within a given WAN site.
Most of the public APIs treat DSID as an integer. So we could just increase the size of the DSID to be an integer. This would add an extra 3 bytes of overhead to every update message and to every entry in the system, since DSIDs are passed around and stored for WAN Conflict Detection. It would also add an extra 3 bytes for every serialized object. Because values stored in geode are often object graphs and not single objects, that means that a single serialized value may have a much larger overhead increase than 3 bytes.
The PDXID currently allocates 1 byte for the DSID and 3 bytes for the unique ID of the type. We could give the DSID an extra byte without changing the size of the PDXID. This would allow 64K DSIDs and 64K types.
Since the DSID is larger update messages and entries will need an extra byte of overhead, but serialized values would stay the same size.
Unfortunately there is a lot of existing data out there that has a 4 byte PDXID that was generated with the old format. So this proposal would still require all of the rolling upgrade work to translate between formats that the other options require.
This is the "if space is cheap" option. If we are willing to take on a bunch of extra overhead we could get some usability benefits by getting rid of the integer dsid entirely. Here's one way that might look:
For receiver discovery and conflict detection, we have to change the size of the DSID that is passed around in various messages - In the VersionTag, in GatewaySenderGFEBatchOpImpl. We also have to change the size of the DSID written to disk. When sending messages to old members or reading from old disk files we would need to translate the ID for all of these messages.
If a newer member is trying to send a DSID larger than a byte to an older member, it should throw an exception.
This means that a user would be able to upgrade all of the WAN sites/clients, etc. while still using the small DSIDs. Once that is done they could start introducing sites with larger DSIDs.
There are special concerns with changing the format of anything that is stored as a region value, because we pass around and persist values as opaque byte arrays. We don't keep track of the version of the member that generated the byte array.
One option for upgrading the type ID would be introduce a new DataSerializer code for PDX values with larger ids. PDX is really a subformat of DataSerializable. When a user stores a PDX object it is serialized using DataSerializable.writeObject. The actual bytes generated look something like this
93 | PDXID | pdx bytes |
The leading 93 is the DataSerializable constant that indicates that the following bytes are PDX (See DSCODE.PDX). So we could introduce a new code like DSCODE.PDX_2 that indicates PDX bytes with a different ID format. New members could use the old code if the PDXID is less than 255, and the new code if it is larger. In that way a user would be able to upgrade all of their sites, clients, etc. before introducing members with larger DSIDs.
If an older member was passed a value that was serialized using DSCODE.PDX_2, it would just fail to deserialize the value.