DSF ID Loader
To be Reviewed By: October 8, 2021
Authors: Jens Deppe
Status: Draft | Discussion | Active | Dropped | Superseded
Superseded by: N/A
Related: N/A
Problem
To facilitate backwards compatibility, Geode has a concept known as Data Serializable Fixed ID (DSFID) which allows serializable objects to evolve over time using an API that allows newer versions of the classes to still be deserialized by older versions of Geode. This is a core capability of Geode that is primarily intended for internal classes but is also utilized by various modules, notably WAN, Lucene and Redis.
To use DSFIDs, a class needs to register itself so that it is known by the serialization framework. Many core classes register in DSFIDFactory
. The modules, mentioned before, register during their service loader discovery and initialization phase. Under some situations, however, Geode will require knowledge of the DSFID very early on during member startup. If a member receives a message with an unknown DSFID it will throw a DSFIDNotFoundException
. Without introducing module circular dependencies, there is no way for a module to register its required DSFIDs early enough to avoid this issue.
Anti-Goals
None identified.
Solution
The proposed solution has 2 parts:
Currently, the DSFIDSerializer
interface exposes a registerDSFID(int dsfid, Class<?> dsfidClass)
method. This method will be moved into a new interface ( DataSerializerFixedIdRegistrant
) and DSFIDSerialzer
will extend this interface:
interface DataSerializeableFixedIdRegistry { register(int id, Class<? extends DataSerializableFixedId> clazz); }
This will reduce the API surface area required by the following new service provider interface which would perform the actual Fixed Id registration:
interface DataSerializeableFixedIdRegistrant { register(DataSerializableFixedIdRegistry registry); }
This SPI will be initialized as part of the static class initialization already existing in InternalDataSerializer
.
In order to use this, a module will need to:
- Include a provider configuration resource file:
resources/META-INF/services/org.apache.geode.internal.serialization.DataSerializabledFixedIdRegistrant
- This file will contain one or more fully qualified class names which implement the above interface
The result will be that DSFIDs will be registered and available when the core serialization framework is instantiated.
Error Conditions
Currently, an IllegalArgumentException
is thrown, during registration, if the given class does not have a zero-argument constructor. This should be expanded to also produce an IllegalArgumentException
if a registration is attempted for a previously registered ID.
Changes and Additions to Public Interfaces
No changes to public interfaces.
Performance Impact
No anticipated performance impacts.
Backwards Compatibility and Upgrade Path
No backwards compatibility impact.
Prior Art
N/A
FAQ
None yet.
Errata
None yet.
15 Comments
Anthony Baker
Nice! This has always been a sticky point for extending the known set of core data types.
Would it make sense to include a section on discarded alternatives? In particular I'm wondering about providing a stronger module lifecycle interface where these kinds of registrations could be handled. I know we will get there eventually but it could be too big of a step until we have completed the modules support that Udo/Patrick are tackling.
Darrel Schneider
I think DSFIDSerializer is too wide an interface for what DSFIDLoader needs. Could you create a new interface that only has " void registerDSFID(int dsfid, Class<?> dsfidClass);" on it? I think that is all DSFIDLoader needs.
Jacob Barrett
I agree. SPIs should try to limit the surface area exposed outward into the implementation.
Kirk Lund
I really want to see us steer away from using acronyms and abbreviations. Let's call this DataSerializableFixedIdLoader. Our coding standard has always supposed to be "Id" instead of "ID", but examples of "ID" slipped into the code over the years. Let's stick to "Id" everywhere we can. We also don't really need to have "DSFID" on everything, so I'd recommend simplifying the callback to "register". Some day (not necessarily part of this change), I really want to rename DSFIDSerializer as well to get rid of "DSFID".
public
interface DataSerializableFixedIdLoader
{
void
register(DSFIDSerializer serializer);
}
Jacob Barrett
Given that this interface isn't actually "loading" anything I feel like the "loader" part of the name is misleading. Something like
DfsidRegistrant
might be more applicable.Kirk Lund
Or maybe "DataSerializableFixedIdRegistration"?
Jacob Barrett
Perhaps we should avoid the abbreviation of data serializable fixed ID entirely. What about
DataSerializeableFixedIdResgistrant
?We can read the fixed ID from a
DataSerializableFixedId
implementation.For that matter we could do away with
DataSerializeableFixedIdResgistrant
completely and useDataSerializableFixedId
as the loader interface and just invoke thegetDsfid()
method.Jens Deppe
I don't think that will be possible since we're registering classes and not actual instances of those classes (which would allow for calling
getDsfid()
).Jacob Barrett
You are correct.
I will revise back to my original then:
Jacob Barrett
Spitballing... We could eliminate the the potential disconnect between the
DataSerializableFixedId
classes ID and the ID theDataSerializableFixedIdRegistrant
knows about by using annotations. An annotation on theDataSerialiazableFixedId
implementation could provide the ID without the need to instantiate it.Jacob Barrett
We should also define rules around how duplicate registration attempts should behave. I think they should throw an appropriate exception unique to duplication. Should such an event be fatal?
We should also define what should happen in the classes is not
DataSerializableFixedId
.Patrick Johnson
This idea seems pretty reasonable, and shouldn't be an issue with the work Udo and I are doing around modularity. We'll just have to treat geode-serialization similarly to how we treat modules that contain implementations of CacheService.
Jens Deppe
I've updated the RFC with the list of suggestions made up to this point.
Udo Kohlmeyer
On the high-level the RFC looks ok. But I have some questions about the implementation:
CacheService
initialization step or is this another step that needs to happen in addition to the initialization of theCacheService
?CacheService
initialization, what happens if either of these fail?CacheService
initialization, would it make sense to have the DataSerializable?DataSerializableFixedID
interface? The core system should not know about non-core extensions (like Lucene, Redis, Memcache) fixed IDs. These should ONLY be specified within the extension itself and provided to the registry service.register(Class<? extends DataSerializableFixedId> clazz)
where the cluster assigns and allocates the fixed IDs. One could still have "core" operations have fixed IDs, but for non-core extensions, the allocation should be dynamic. For example, how to handle a situation where the Memcache and Redis integrations happen to choose to use the same ID ranges.I know that #3 is a stretch goal, but we HAVE to resolve #1 and #2 for this RFC to be viable at all.
Jens Deppe
Thanks for your comments Udo. In response:
CacheService
initialization but will happen statically as part of the JVM startup - tied into the static initialization of {{InternalDataSerializer}}. Currently, any registration failures will cause the cache not to start up.