

Integration Test Support Matrix

Logical Type | Java | C++ | JavaScript | Notes
Null | No | No | No | C++ has NullType, NullArray type
Boolean | Yes | Yes | Yes |
Signed Integers | Yes | Yes | Yes | TODO: Use strings for 64-bit integers
Unsigned Integers | No | Yes | Yes (?) | TODO: Use strings for 64-bit integers; unsigned integers are being generated in the integration test suite (code), but Java's interpretation of them may be incorrect
Half Precision FP | No | No | No | C++ supports HalfFloat container type / NumPy interop
Single Precision FP | Yes | Yes | Yes |
Double Precision FP | Yes | Yes | Yes |
Variable Binary | Yes | Yes | Yes |
Variable String (UTF8) | Yes | Yes | Yes |
Fixed Size Binary | Yes | Yes | Yes |
Variable List | Yes | Yes | Yes |
Fixed Size List | No | No | No | Java and JS have an FSL container type, C++ does not yet; integration tests are not being generated yet
Decimal 128-bit | Yes | Yes | Yes (?) | Q: What is JavaScript's level of support for this?
Timestamp | Yes | Yes | Yes (?) |
Date (32/64-bit varieties) | Yes | Yes | Yes (?) |
Time | Yes | Yes | Yes (?) |
Interval YEAR_MONTH | No | No | No | See discussion below
Interval DAY_TIME (Timedelta) | No | No | No | See discussion below
Dictionary-encoded Types | Yes | Yes | Yes |
Struct | Yes | Yes | Yes |
Dense Union | No | No | No | See discussion below
Sparse Union | No | No | No | See discussion below
Map | No | No | No |


Incomplete Logical Types

The following data types have one or more areas of work required:

  • Not implemented in one or more reference implementations
  • Implemented in different ways, or only supporting a subset of the desired specification
  • Not integration tested, so binary compatibility between implementations is not validated

Interval / Timedelta

Presently, the Interval metadata type is as follows:

enum IntervalUnit: short { YEAR_MONTH, DAY_TIME}

table Interval {
  unit: IntervalUnit;
}

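The DAY_TIME variety is a 64-bit compound type consisting of 4 bytes for an int32 number of days and 4 bytes for an int32 number of milliseconds. In principle, we want to be able to represent data that arises from the difference of timestamps, which support units from SECOND to NANOSECOND; a fixed millisecond field cannot represent such differences exactly at MICROSECOND or NANOSECOND resolution.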
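To make the current DAY_TIME layout concrete, here is a minimal Python sketch of packing and unpacking one 8-byte value. It assumes little-endian byte order and that the days field precedes the milliseconds field; the helper names (encode_day_time, decode_day_time, from_timedelta) are hypothetical and not part of any Arrow library.

```python
import struct
from datetime import timedelta

# Assumed layout: little-endian int32 days followed by int32 milliseconds
DAY_TIME = struct.Struct("<ii")


def encode_day_time(days: int, millis: int) -> bytes:
    """Pack one DAY_TIME value into its 8-byte slot (hypothetical helper)."""
    return DAY_TIME.pack(days, millis)


def decode_day_time(buf: bytes) -> tuple:
    """Unpack an 8-byte slot back into (days, milliseconds)."""
    return DAY_TIME.unpack(buf)


def from_timedelta(delta: timedelta) -> bytes:
    """Split a timestamp difference into whole days plus milliseconds.

    Anything finer than a millisecond is truncated, which is why this
    layout cannot hold MICROSECOND or NANOSECOND differences exactly.
    """
    millis = delta.seconds * 1_000 + delta.microseconds // 1_000
    return encode_day_time(delta.days, millis)
```

For example, a difference of one day, two hours and 1500 microseconds decodes as (1, 7200001): the trailing 500 microseconds are lost.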

In PR https://github.com/apache/arrow/pull/920 for ARROW-352, Wes proposed augmenting the type to:

table Interval {
  /// The kind of interval, YEAR_MONTH or DAY_TIME
  ///
  /// TODO(wesm): Should this be renamed to kind and change resolution to be
  /// "unit" for consistency with the other temporal types?
  unit: IntervalUnit;

  /// The unit of time resolution for DAY_TIME. If null, assumed to be
  /// milliseconds
  resolution: TimeUnit = MILLISECOND;
}

Additionally, the memory representation for the DAY_TIME Interval would change to a single 64-bit integer giving the length of the interval in units of the configured resolution.
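The proposed single-integer representation can be sketched as follows. This is an illustration under stated assumptions, not an Arrow API: the TimeUnit enum is a hypothetical Python stand-in for the Flatbuffer TimeUnit, values are assumed to be stored as little-endian int64, and converting to a coarser unit floors (rounds toward negative infinity).

```python
import struct
from datetime import timedelta
from enum import Enum


class TimeUnit(Enum):
    """Hypothetical stand-in for the Flatbuffer TimeUnit; value = units per second."""
    SECOND = 1
    MILLISECOND = 1_000
    MICROSECOND = 1_000_000
    NANOSECOND = 1_000_000_000


INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def to_interval_int64(delta: timedelta, resolution: TimeUnit = TimeUnit.MILLISECOND) -> int:
    """Express a timestamp difference as a count of `resolution` units."""
    # timedelta is exact to the microsecond, so work in integer microseconds
    total_us = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    # Scale to the target unit; // floors when converting to a coarser unit
    count = total_us * resolution.value // 1_000_000
    if not INT64_MIN <= count <= INT64_MAX:
        raise OverflowError("interval does not fit in 64 bits at this resolution")
    return count


def encode_interval(count: int) -> bytes:
    """Store the count in its 8-byte slot (little-endian int64 assumed)."""
    return struct.pack("<q", count)
```

For example, a difference of one day and one millisecond becomes 86400001 at the default MILLISECOND resolution and 86400 at SECOND resolution. The overflow check matters at NANOSECOND resolution, where an int64 spans only about 292 years.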

Java is currently the only implementation with any interval types, so its users would be impacted by this change.

In a March 2018 mailing list discussion, Jacques proposed:

- For interval, I'd like to propose moving to a single value
representation instead of a composite. I think that it is unlikely that
anyone needs a composite representation. If they do, they can compose their
own with the other primitives available. I believe this would look like:
- Interval Day to Seconds: 8 bytes representing number of
milliseconds.
- Interval Year to Months: 4 bytes representing number of months.

Wes agreed, while also requesting a time unit parameter, which would default to milliseconds as above.
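Under Jacques's proposal, a Year to Months interval collapses to a single month count. A minimal Python sketch, assuming a little-endian int32 and hypothetical helper names (encode_year_month, decode_year_month):

```python
import struct

# Assumed layout: one little-endian int32 holding the total number of months
YEAR_MONTH = struct.Struct("<i")


def encode_year_month(years: int, months: int) -> bytes:
    """Fold years and months into a single month count (hypothetical helper)."""
    # struct.error is raised if the total does not fit in an int32
    return YEAR_MONTH.pack(years * 12 + months)


def decode_year_month(buf: bytes) -> tuple:
    """Split the stored month count back into (years, months)."""
    (total,) = YEAR_MONTH.unpack(buf)
    return divmod(total, 12)
```

Because only the total is stored, (1 year, 14 months) and (2 years, 2 months) encode identically. decode_year_month uses divmod, so negative intervals come back with a non-negative month component: -14 months decodes as (-2, 10).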

Related Mailing List Discussions

Related PRs / JIRAs

Map Type

Current Status

Map has been added as a logical type and defined in the Flatbuffer schema format with a single field, "keysSorted", which indicates whether the child keys vector has been presorted. A Map is a nested type represented as List<entry: Struct<key: K, value: V>>.

  • Need to agree on metadata representation.
Proposed Metadata Representation
  • Same memory layout as List<entry: Struct<key: K, value: V>>. This is so implementations lacking Map can alias as repeated struct values.
  • The `Struct` (entry) and `K` (key) fields are constrained to be non-nullable; other fields can be nullable
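Because a Map shares the memory layout of List<entry: Struct<key: K, value: V>>, a column of maps reduces to list offsets plus flat key and value children. The following Python sketch illustrates that flattening and what keysSorted would assert. The function names are hypothetical, maps are given as lists of (key, value) pairs so entry order is preserved, and "sorted" is assumed to mean ascending order within each map.

```python
def flatten_maps(maps):
    """Lay out a column of maps as List<entry: Struct<key, value>> children.

    `maps` is a sequence whose items are either None (a null map) or a list
    of (key, value) pairs. Returns (offsets, validity, keys, values).
    """
    offsets = [0]   # entries of map i are keys[offsets[i]:offsets[i + 1]]
    validity = []   # one flag per map slot
    keys, values = [], []
    for entries in maps:
        validity.append(entries is not None)
        for key, value in entries or []:
            if key is None:
                raise ValueError("map keys must be non-null")
            keys.append(key)
            values.append(value)  # values may be None (nullable)
        offsets.append(len(keys))
    return offsets, validity, keys, values


def keys_sorted(offsets, keys):
    """True if keys are in ascending order within every map slot."""
    return all(
        keys[j] <= keys[j + 1]
        for start, end in zip(offsets, offsets[1:])
        for j in range(start, end - 1)
    )
```

An implementation without a Map type could read the same offsets, keys and values as an ordinary list of structs, which is the point of sharing the layout.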

Sample JSON Metadata:


{
  "name" : "MapName",
  "nullable" : true|false,
  "type" : {
    "name" : "map",
    "keysSorted" : true|false
  },
  "children" : [{
    "name" : "entry",
    "nullable" : false,
    "type" : {
      "name" : "struct"
    },
    "children" : [{
      "name" : "key",
      "nullable" : false,
      "type" : {
        "name" : K
      },
      "children" : []
    },{
      "name" : "value",
      "nullable" : true|false,
      "type" : {
        "name" : V
      },
      "children" : []
    }]
  }]
}
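As a sketch of how this template might be instantiated, the following Python builds the metadata for concrete key and value types and serializes it with the standard json module. map_field is a hypothetical helper, and the type objects passed in ({"name": "utf8"} and {"name": "int", "bitWidth": 32, "isSigned": true}) are assumed to follow the integration-test JSON conventions.

```python
import json


def map_field(name, key_type, value_type, nullable=True,
              value_nullable=True, keys_sorted=False):
    """Build the proposed JSON metadata for a Map field (hypothetical helper).

    key_type and value_type are JSON type objects. The entry struct and the
    key are always non-nullable, per the proposal.
    """
    return {
        "name": name,
        "nullable": nullable,
        "type": {"name": "map", "keysSorted": keys_sorted},
        "children": [{
            "name": "entry",
            "nullable": False,
            "type": {"name": "struct"},
            "children": [
                {"name": "key", "nullable": False,
                 "type": key_type, "children": []},
                {"name": "value", "nullable": value_nullable,
                 "type": value_type, "children": []},
            ],
        }],
    }


field = map_field("MapName", {"name": "utf8"},
                  {"name": "int", "bitWidth": 32, "isSigned": True})
print(json.dumps(field, indent=2))
```

Taking the nullability constraints out of the caller's hands means the entry and key fields can never be emitted as nullable by mistake.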


Union Types

Related PRs / JIRAs
