Integration Test Support Matrix
...
- C++ has NullType, NullArray type
...
Yes
...
Yes
...
- TODO: Use strings for 64-bit integers
...
- TODO: Use strings for 64-bit integers
- Unsigned integers are being generated in the integration test suite, but Java's interpretation of them may be incorrect
...
- C++ supports HalfFloat container type / NumPy interop
...
Yes
...
- Java and JS have a FixedSizeList (FSL) container type; C++ does not yet
- Integration tests not being generated yet
...
- Q: What is JavaScript's level of support for this?
...
The following data types require work in one or more of these areas:
- Not implemented in one or more reference implementations
- Implemented in different ways, or only supporting a subset of the desired specification
- Not being integration tested; binary compatibility between implementations not being validated
Interval / Timedelta
UPDATE (2019-05-16): This was resolved in ARROW-835 https://github.com/apache/arrow/commit/6f80ea4928f0d26ca175002f2e9f511962c8b012
Presently, the Interval metadata type is as follows:
```
enum IntervalUnit: short { YEAR_MONTH, DAY_TIME }
table Interval {
  unit: IntervalUnit;
}
```
The DAY_TIME variety is a 64-bit compound type consisting of 4 bytes for an int32 day count and 4 bytes for an int32 millisecond count. In principle, we want to be able to represent data that arises from the difference of two timestamps, and timestamps support units from SECOND to NANOSECOND, so a fixed millisecond field cannot represent such differences at MICROSECOND or NANOSECOND resolution.
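To make the compound layout concrete, here is a minimal Python sketch that packs a `datetime.timedelta` into the 8-byte DAY_TIME form described above. It assumes little-endian byte order and that the day count precedes the millisecond count; the helper names (`pack_day_time`, `unpack_day_time`, `from_timedelta`) are hypothetical and not part of any Arrow API. Note how the sub-millisecond part of the input is silently dropped.

```python
import struct
from datetime import timedelta
from typing import Tuple

# Assumed layout: little-endian int32 days followed by int32 milliseconds.
DAY_TIME = struct.Struct("<ii")


def pack_day_time(days: int, millis: int) -> bytes:
    """Pack one DAY_TIME value into its 8-byte compound form."""
    return DAY_TIME.pack(days, millis)


def unpack_day_time(buf: bytes) -> Tuple[int, int]:
    """Return the (days, milliseconds) pair stored in an 8-byte value."""
    return DAY_TIME.unpack(buf)


def from_timedelta(td: timedelta) -> bytes:
    """Encode a timedelta as DAY_TIME, truncating anything below 1 ms."""
    # timedelta normalizes to 0 <= seconds < 86400 and 0 <= microseconds < 10**6,
    # so the millisecond part always fits in an int32.
    millis = td.seconds * 1000 + td.microseconds // 1000
    return pack_day_time(td.days, millis)


if __name__ == "__main__":
    td = timedelta(days=2, hours=3, microseconds=1500)
    buf = from_timedelta(td)
    # prints: 8 (2, 10800001)  (the trailing 500 microseconds are lost)
    print(len(buf), unpack_day_time(buf))
```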
In PR https://github.com/apache/arrow/pull/920, the Interval metadata was proposed to change to:
```
table Interval {
  /// The kind of interval, YEAR_MONTH or DAY_TIME
  ///
  /// TODO(wesm): Should this be renamed to kind and change resolution to be
  /// "unit" for consistency with the other temporal types?
  unit: IntervalUnit;

  /// The unit of time resolution for DAY_TIME. If null, assumed to be
  /// milliseconds
  resolution: TimeUnit = MILLISECOND;
}
```
Additionally, the memory representation of the DAY_TIME Interval would change to a single 64-bit integer holding the length of the interval, counted in units of the configured resolution.
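The proposed single-integer representation can be sketched as follows. This is a minimal Python illustration, assuming the stored value is the interval length counted in units of `resolution` and that fractional units are floored; the `TimeUnit` enum here only mirrors the names of Arrow's time units and is a hypothetical stand-in, not Arrow's own API. Because Python's `timedelta` has microsecond precision, the NANOSECOND case simply scales the microsecond count.

```python
from datetime import timedelta
from enum import Enum

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


class TimeUnit(Enum):
    """Hypothetical stand-in for Arrow's TimeUnit; values are units per second."""
    SECOND = 1
    MILLISECOND = 1_000
    MICROSECOND = 1_000_000
    NANOSECOND = 1_000_000_000


def to_interval_count(td: timedelta, resolution: TimeUnit = TimeUnit.MILLISECOND) -> int:
    """Express td as a single int64 count of `resolution` units (floored)."""
    # Exact integer number of microseconds in the timedelta.
    total_us = (td.days * 86_400 + td.seconds) * 1_000_000 + td.microseconds
    # Rescale to the target unit; // rounds toward negative infinity.
    count = total_us * resolution.value // 1_000_000
    if not INT64_MIN <= count <= INT64_MAX:
        raise OverflowError(f"{td!r} does not fit in int64 at {resolution.name}")
    return count


if __name__ == "__main__":
    td = timedelta(days=2, hours=3, microseconds=1500)
    for unit in TimeUnit:
        print(unit.name, to_interval_count(td, unit))
```

Unlike the compound DAY_TIME form, a single counter keeps full precision at whichever resolution is chosen; the trade-off is range, since an int64 count of nanoseconds covers only about 292 years in either direction.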
At the time, Java was the only implementation with any interval types, so its users would be affected by this change.
In a March 2018 mailing list discussion, Jacques proposed:
```
- For interval, I'd like to propose moving to a single value
representation instead of a composite. I think that it is unlikely that
anyone needs a composite representation. If they do, they can compose their
own with the other primitives available. I believe this would look like:
  - Interval Day to Seconds: 8 bytes representing number of
    milliseconds.
  - Interval Year to Months: 4 bytes representing number of months.
```
Wes agreed, while also requesting a configurable time unit that would default to milliseconds, as in the proposal above.
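Under Jacques's proposal, a year-to-month interval collapses to a single count of months. A minimal Python sketch of that encoding follows, assuming a little-endian int32 to match the 4-byte description; the function names are hypothetical. A day-to-second value would likewise be a single integer: an 8-byte count of milliseconds, or of whatever time unit is configured.

```python
import struct
from typing import Tuple

# Assumed layout: one little-endian int32 holding the total number of months.
YEAR_MONTH = struct.Struct("<i")


def encode_year_month(years: int, months: int) -> bytes:
    """Collapse (years, months) into a 4-byte month count."""
    # struct.error is raised if the total does not fit in an int32.
    return YEAR_MONTH.pack(years * 12 + months)


def decode_year_month(buf: bytes) -> Tuple[int, int]:
    """Split a stored month count back into (years, months)."""
    (total,) = YEAR_MONTH.unpack(buf)
    # divmod floors, so -1 month decodes as (-1, 11): minus one year plus 11 months.
    return divmod(total, 12)


if __name__ == "__main__":
    buf = encode_year_month(1, 14)  # 1 year 14 months == 26 months
    print(YEAR_MONTH.unpack(buf)[0], decode_year_month(buf))
```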
Related Mailing List Discussions
- June 2016 Timestamps with different precision / Timedeltas
- October 2017 [DISCUSS] Expanding Arrow interval type metadata, changing Java memory representation
- March 2018 [DISCUSS] Arrow 1.0 Compatibility Issues: Union and Interval
Related PRs / JIRAs
- ARROW-270: Request for more generic interval type
- ARROW-352: https://github.com/apache/arrow/pull/920
- ARROW-835: Supporting Timedelta64 from pandas
Map Type
Current Status
Map has been added as a logical type and defined in the Flatbuffers schema format with one field, `keysSorted`, which indicates whether the child keys vector has been presorted. A Map is a nested type that is represented as List<entry: Struct<key: K, value: V>>.
- Need to agree on metadata representation.
Proposed Metadata Representation
- Same memory layout as List<entry: Struct<key: K, value: V>>. This is so implementations lacking Map can alias as repeated struct values.
- The `Struct` (entry) and `K` (key) fields are constrained to be non-nullable; the other fields can be nullable
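Since the proposed layout is the same as List<entry: Struct<key: K, value: V>>, a map column reduces to a list offsets buffer plus key and value child columns. The following Python sketch builds those pieces from plain dictionaries and shows how a presorted keys vector (keysSorted = true) permits binary search within each entry. It models buffers as Python lists rather than Arrow memory, and all names (`build_map_layout`, `lookup`) are hypothetical.

```python
from bisect import bisect_left
from typing import Any, Dict, List, Optional


def build_map_layout(maps: List[Optional[Dict[Any, Any]]],
                     keys_sorted: bool = False) -> Dict[str, Any]:
    """Flatten a list of (possibly null) dicts into list-of-struct pieces."""
    validity: List[bool] = []  # one flag per map slot (the map itself may be null)
    offsets: List[int] = [0]   # entries for row i are keys[offsets[i]:offsets[i+1]]
    keys: List[Any] = []       # the non-nullable `key` child
    values: List[Any] = []     # the nullable `value` child
    for m in maps:
        validity.append(m is not None)
        if m is not None:
            if any(k is None for k in m):
                raise ValueError("map keys must be non-null")
            items = sorted(m.items(), key=lambda kv: kv[0]) if keys_sorted else list(m.items())
            for k, v in items:
                keys.append(k)
                values.append(v)
        offsets.append(len(keys))  # a null map contributes an empty range
    return {"validity": validity, "offsets": offsets, "keys": keys,
            "values": values, "keys_sorted": keys_sorted}


def lookup(layout: Dict[str, Any], row: int, key: Any) -> Any:
    """Return the value for `key` in map `row`, or None if the map is null.

    Raises KeyError if the map is non-null but does not contain `key`.
    """
    if not layout["validity"][row]:
        return None
    start, end = layout["offsets"][row], layout["offsets"][row + 1]
    keys = layout["keys"]
    if layout["keys_sorted"]:
        i = bisect_left(keys, key, start, end)  # binary search inside this row only
        found = i < end and keys[i] == key
    else:
        i = next((j for j in range(start, end) if keys[j] == key), end)
        found = i < end
    if not found:
        raise KeyError(key)
    return layout["values"][i]
```

For example, `build_map_layout([{"b": 2, "a": 1}, None, {}, {"c": None}], keys_sorted=True)` yields offsets `[0, 2, 2, 2, 3]` and keys `["a", "b", "c"]`; an implementation without a Map type could read exactly these buffers as a plain list of structs.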
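Sample JSON metadata (here `K` and `V` stand for the key and value type names, and `true|false` marks a setting that may take either value):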
```
{
  "name" : "MapName",
  "nullable" : true|false,
  "type" : {
    "name" : "map",
    "keysSorted" : true|false
  },
  "children" : [{
    "name" : "entry",
    "nullable" : false,
    "type" : {
      "name" : "struct"
    },
    "children" : [{
      "name" : "key",
      "nullable" : false,
      "type" : {
        "name" : K
      },
      "children" : []
    },{
      "name" : "value",
      "nullable" : true|false,
      "type" : {
        "name" : V
      },
      "children" : []
    }]
  }]
}
```
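The nullability constraints on Map fields can be checked mechanically against this JSON form. Below is a minimal Python sketch that validates one concrete instance of the sample; the choice of `utf8` keys and 32-bit `int` values, and the `check_map_field` helper, are illustrative assumptions rather than part of the proposal.

```python
import json

# One concrete instance of the sample metadata (K = utf8, V = int32, chosen for illustration).
SAMPLE = """
{
  "name": "MapName",
  "nullable": true,
  "type": {"name": "map", "keysSorted": false},
  "children": [{
    "name": "entry",
    "nullable": false,
    "type": {"name": "struct"},
    "children": [
      {"name": "key", "nullable": false, "type": {"name": "utf8"}, "children": []},
      {"name": "value", "nullable": true,
       "type": {"name": "int", "bitWidth": 32, "isSigned": true}, "children": []}
    ]
  }]
}
"""


def check_map_field(field: dict) -> None:
    """Raise ValueError if a map field violates the proposed layout rules."""
    if field["type"]["name"] != "map":
        raise ValueError("not a map field")
    if len(field["children"]) != 1:
        raise ValueError("map must have exactly one child (the entry struct)")
    entry = field["children"][0]
    if entry["type"]["name"] != "struct":
        raise ValueError("map child must be a struct")
    if entry["nullable"]:
        raise ValueError("entry struct must be non-nullable")
    names = [child["name"] for child in entry["children"]]
    if names != ["key", "value"]:
        raise ValueError(f"entry children must be key and value, got {names}")
    if entry["children"][0]["nullable"]:
        raise ValueError("key field must be non-nullable")


if __name__ == "__main__":
    field = json.loads(SAMPLE)
    check_map_field(field)  # passes silently
    field["children"][0]["children"][0]["nullable"] = True
    try:
        check_map_field(field)
    except ValueError as err:
        print("rejected:", err)
```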
Union Types
Related PRs / JIRAs
...
See https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit#gid=782909347 for the current status of integration testing and implementation across languages.