Discussed in AVRO-248

Arguments in Favor

  • Anonymous unions make reuse difficult (AVRO-266)
  • Other serialization systems support names for unions and their branches, as well as for arrays

Proposal

{ "type": "union", "name": "Foo", "branches": ["string", "Bar", ... ] }

Language APIs

For Java, when code is generated for a union, a class could be generated that includes an enum indicating which branch of the union is taken. For example, a union of string and int named Foo might produce a Java class like

      public class Foo {
        public static enum Type {STRING, INT};
        private Type type;
        private Object datum;
        public Type getType() { return type; }
        public String getString() { if (type == Type.STRING) return (String) datum; else throw ... }
        public void setString(String s) { type = Type.STRING; datum = s; }
        ....
      }

Then Java applications can easily use a switch statement to process union values rather than using instanceof.
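A minimal sketch of what that switch-based processing might look like. This is not output from any actual Avro code generator: the int accessors, the choice of IllegalStateException for a wrong-branch read, and the NamedUnionDemo class are all assumptions made for illustration.

```java
// Hypothetical generated class for a union of string and int named Foo.
// The exception type and the int accessors are assumptions, not Avro API.
final class Foo {
  enum Type { STRING, INT }

  private Type type;      // which branch is currently set
  private Object datum;   // the value held by that branch

  Type getType() { return type; }

  String getString() {
    if (type != Type.STRING) throw new IllegalStateException("branch is " + type);
    return (String) datum;
  }

  void setString(String s) { type = Type.STRING; datum = s; }

  int getInt() {
    if (type != Type.INT) throw new IllegalStateException("branch is " + type);
    return (Integer) datum;
  }

  void setInt(int i) { type = Type.INT; datum = i; }
}

// Hypothetical application code: dispatch on the branch enum instead of instanceof.
final class NamedUnionDemo {
  static String describe(Foo foo) {
    switch (foo.getType()) {
      case STRING: return "string: " + foo.getString();
      case INT:    return "int: " + foo.getInt();
      default:     throw new IllegalStateException("unhandled branch");
    }
  }

  public static void main(String[] args) {
    Foo foo = new Foo();
    foo.setString("hi");
    System.out.println(describe(foo));
    foo.setInt(7);
    System.out.println(describe(foo));
  }
}
```

Because the branch is an enum, adding a branch to the union gives the compiler and IDE a chance to flag switch statements that do not handle it, which instanceof chains cannot offer.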

  • When using reflection, an abstract class with a set of concrete implementations can be represented as a union (AVRO-241). However, to create an array one must know the name of the base class, which is not represented in the Avro schema. One approach would be to add an annotation to the reflected array schema (AVRO-242) noting the base class. But if the union itself were named, its name could identify the base class. This would also make reflected protocol interfaces more concise, since the base class name could be used in parameters, return types, and fields.
  • Generalizing the above: Avro lacks class inheritance. Unions are a way to model inheritance, and this model is more useful if the union is named.
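The array problem in the reflection case can be sketched in plain Java. The example assumes a named union whose name is the fully qualified name of the base class; the Shape hierarchy and the NamedUnionArrays helper are hypothetical and are not part of Avro's reflect API.

```java
// Hypothetical class hierarchy that reflection would map to a union of Circle and Square.
abstract class Shape {
  abstract double area();
}

final class Circle extends Shape {
  private final double radius;
  Circle(double radius) { this.radius = radius; }
  @Override double area() { return Math.PI * radius * radius; }
}

final class Square extends Shape {
  private final double side;
  Square(double side) { this.side = side; }
  @Override double area() { return side * side; }
}

// Hypothetical helper: given the union's name, build an array of the base class.
// Without a name on the union, a reader only sees the branches (Circle, Square)
// and has no way to know the component type should be Shape.
final class NamedUnionArrays {
  static Object[] newArray(String unionName, int length) {
    try {
      Class<?> base = Class.forName(unionName);
      return (Object[]) java.lang.reflect.Array.newInstance(base, length);
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("no class for union " + unionName, e);
    }
  }
}
```

For instance, NamedUnionArrays.newArray(Shape.class.getName(), 2) yields a Shape[] that can hold both Circle and Square instances, which is the array type a reflected method signature declared in terms of the base class would expect.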