Comment: Removed outdated information on format implementation and integration testing


Integration Test Support Matrix

...

  • C++ has NullType, NullArray type

...

Yes

...

Yes

...

  • TODO: Use strings for 64-bit integers

...

  • TODO: Use strings for 64-bit integers
  • Unsigned integers are being generated in the integration test suite (code), but Java's interpretation of them may be incorrect

...

  • C++ supports HalfFloat container type / NumPy interop

...

Yes

...

  • Java and JS have FSL container type, C++ does not yet
  • Integration tests not being generated yet

...

  • Q: What is JavaScript's level of support for this?

...

The following data types have one or more areas of work required:

  • Not implemented in one or more reference implementations
  • Implemented in different ways, or only supporting a subset of the desired specification
  • Not being integration tested; binary compatibility between implementations not being validated

Interval / Timedelta

UPDATE (2019-05-16): This was resolved in ARROW-835 https://github.com/apache/arrow/commit/6f80ea4928f0d26ca175002f2e9f511962c8b012

Presently, the Interval metadata type is as follows:

Code Block
enum IntervalUnit: short { YEAR_MONTH, DAY_TIME }

table Interval {
  unit: IntervalUnit;
}

The DAY_TIME variety is a 64-bit compound type consisting of 4 bytes for int32 days and 4 bytes for int32 milliseconds. In principle, we want to be able to represent data that arises from the difference of timestamps, which support units from SECOND to NANOSECOND.
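To make the compound layout concrete, here is a minimal sketch that packs one DAY_TIME value into 8 bytes. The function names are hypothetical, and the byte order (little-endian) and field order (days first, then milliseconds) are assumptions made for illustration, not something this page specifies.

```python
import struct

# Assumption: little-endian, int32 days followed by int32 milliseconds.
DAY_TIME = struct.Struct("<ii")  # two int32 values, 8 bytes in total


def pack_day_time(days, millis):
    """Encode one DAY_TIME interval as 4 bytes of days + 4 bytes of milliseconds."""
    return DAY_TIME.pack(days, millis)


def unpack_day_time(raw):
    """Decode an 8-byte DAY_TIME value back into a (days, milliseconds) tuple."""
    return DAY_TIME.unpack(raw)


raw = pack_day_time(2, 1500)
assert len(raw) == 8
assert unpack_day_time(raw) == (2, 1500)
```

Because the finest component is a millisecond, a timestamp difference of, say, one microsecond cannot be stored in this layout without loss.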

In PR https://github.com/apache/arrow/pull/920 for ARROW-352, Wes proposed to augment the type to:

Code Block
table Interval {
  /// The kind of interval, YEAR_MONTH or DAY_TIME
  ///
  /// TODO(wesm): Should this be renamed to kind and change resolution to be
  /// "unit" for consistency with the other temporal types?
  unit: IntervalUnit;

  /// The unit of time resolution for DAY_TIME. If null, assumed to be
  /// milliseconds
  resolution: TimeUnit = MILLISECOND;
}

Additionally, the memory representation for the DAY_TIME Interval would change to a single 64-bit integer representing the size of the interval in units of the set resolution.
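As a rough sketch of what this change would mean for existing data, the snippet below converts a composite (days, milliseconds) pair into a single 64-bit count at a chosen resolution. The `TimeUnit` enum here is a stand-in whose values are "units per second", chosen for convenience; they are not the values used in the Flatbuffers schema. Treating a day as exactly 86,400 seconds is also an assumption.

```python
from enum import Enum


class TimeUnit(Enum):
    # Hypothetical values: number of units per second (not the schema's enum values).
    SECOND = 1
    MILLISECOND = 1_000
    MICROSECOND = 1_000_000
    NANOSECOND = 1_000_000_000


SECONDS_PER_DAY = 86_400  # assumption: fixed-length days, no DST or leap seconds


def day_time_to_int64(days, millis, resolution=TimeUnit.MILLISECOND):
    """Collapse a (days, milliseconds) pair into one count of `resolution` units."""
    units_per_second = resolution.value
    total = days * SECONDS_PER_DAY * units_per_second
    # Floor division: sub-unit remainders are dropped when the resolution
    # is coarser than milliseconds.
    total += millis * units_per_second // 1_000
    if not -(2**63) <= total < 2**63:
        raise OverflowError("interval does not fit in a signed 64-bit integer")
    return total


assert day_time_to_int64(1, 500) == 86_400_500
```

With the default MILLISECOND resolution the conversion is lossless; a finer unit such as NANOSECOND trades range for precision.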

Java is the only implementation with any interval types so far, so its users would be impacted by this change.

In a March 2018 mailing list discussion, Jacques proposed:

Code Block
- For interval, I'd like to propose moving to a single value
representation instead of a composite. I think that it is unlikely that
anyone needs a composite representation. If they do, they can compose their
own with the other primitives available. I believe this would look like:
- Interval Day to Seconds: 8 bytes representing number of
milliseconds.
- Interval Year to Months: 4 bytes representing number of months.

Wes agreed, and also raised the request for a time unit, which would default to milliseconds as described above.
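The two single-value representations from that proposal could look roughly like the following. This is only a sketch: the helper names are hypothetical, little-endian byte order is assumed, and milliseconds are used as the default unit discussed above.

```python
import struct


def pack_year_to_months(years, months):
    """Interval Year to Months: 4 bytes holding the total number of months."""
    return struct.pack("<i", years * 12 + months)


def pack_day_to_seconds(days, seconds, millis=0):
    """Interval Day to Seconds: 8 bytes holding the total number of milliseconds."""
    total_ms = (days * 86_400 + seconds) * 1_000 + millis
    return struct.pack("<q", total_ms)


assert len(pack_year_to_months(1, 2)) == 4
assert len(pack_day_to_seconds(1, 0, 500)) == 8
```

A consumer that needs the composite form back can recover it with integer division, which is the "compose their own" path mentioned in the quoted message.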

Related Mailing List Discussions

Related PRs / JIRAs

  • ARROW-270 Request for more generic interval type
  • ARROW-352 https://github.com/apache/arrow/pull/920
  • ARROW-835 Supporting Timedelta64 from pandas

Map Type

Current Status

Map has been added as a logical type and is defined in the Flatbuffer schema format with one field, "keysSorted", which indicates whether the child keys vector has been presorted. A Map is a nested type that is represented as List<entry: Struct<key: K, value: V>>.

  • Need to agree on metadata representation.

Proposed Metadata Representation

  • Same memory layout as List<entry: Struct<key: K, value: V>>. This is so implementations lacking Map can alias as repeated struct values.
  • `Struct` and `K` fields are constrained to be non-nullable; other fields can be nullable
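The List<entry: Struct<key: K, value: V>> layout described above can be sketched with plain Python lists standing in for buffers. The function names and the dictionary-of-lists container are hypothetical; only the shape (list offsets over a struct of a key column and a value column, with non-null keys) follows the points above.

```python
def build_map_layout(maps):
    """Flatten a list of dicts (or None for a null map) into list-of-struct form."""
    validity = []      # one flag per map slot (the map itself may be null)
    offsets = [0]      # list offsets into the flattened entries
    keys, values = [], []
    for m in maps:
        if m is None:
            validity.append(False)
        else:
            validity.append(True)
            for k, v in m.items():
                if k is None:
                    raise ValueError("map keys must be non-null")
                keys.append(k)
                values.append(v)  # values may be None (nullable)
        offsets.append(len(keys))
    return {"validity": validity, "offsets": offsets, "keys": keys, "values": values}


def map_entries(layout, i):
    """Read slot i back as repeated (key, value) structs, or None if the slot is null."""
    if not layout["validity"][i]:
        return None
    start, end = layout["offsets"][i], layout["offsets"][i + 1]
    return list(zip(layout["keys"][start:end], layout["values"][start:end]))


layout = build_map_layout([{"a": 1, "b": None}, None, {}])
assert layout["offsets"] == [0, 2, 2, 2]
assert map_entries(layout, 0) == [("a", 1), ("b", None)]
```

`map_entries` never consults any Map-specific metadata, which is the sense in which an implementation lacking Map could treat the same data as repeated struct values.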

Sample JSON Metadata:

Code Block
{

"name" : "MapName",
"nullable" : true|false,
"type" : {
    "name" : "map",
    "keysSorted" : true|false
},
"children" : [{
    "name" : "entry",
    "nullable" : false,
    "type" : {
        "name" : "struct"
    },
    "children" : [{
        "name" : "key",
        "nullable" : false,
        "type" : {
            "name" : K
        },
        "children" : []
    },{
        "name" : "value",
        "nullable" : true|false,
        "type" : {
            "name" : V
        },
        "children" : []
    }]
}]
}
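For generating test metadata, the template above can be filled in programmatically. The helper below is a hypothetical sketch: it mirrors the sample field by field, takes the key and value type names as plain strings, and ignores any extra parameters that a real type entry might need.

```python
import json


def map_field(name, key_type, value_type, nullable=True,
              value_nullable=True, keys_sorted=False):
    """Build the JSON metadata for a map field, following the sample above."""
    return {
        "name": name,
        "nullable": nullable,
        "type": {"name": "map", "keysSorted": keys_sorted},
        "children": [{
            "name": "entry",
            "nullable": False,          # the entry struct is never nullable
            "type": {"name": "struct"},
            "children": [
                {"name": "key", "nullable": False,   # keys are never nullable
                 "type": {"name": key_type}, "children": []},
                {"name": "value", "nullable": value_nullable,
                 "type": {"name": value_type}, "children": []},
            ],
        }],
    }


field = map_field("MapName", "utf8", "utf8", keys_sorted=True)
print(json.dumps(field, indent=2))
```

The "utf8" type name is used here only as an example value for K and V.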

Union Types

Related PRs / JIRAs

...

See https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit#gid=782909347 for the current status of integration testing and implementation across languages.