...
- Value: a value (e.g. int, string, array). All values in Impala are nullable internally.
- Fixed-length data: the part of a value that is fixed in size (e.g. a 32-bit integer, the 32-bit length + 64-bit pointer representing a string)
- Variable-length data: parts of a value that vary in length, e.g. string data, maps, arrays
- Slot: an area of memory that holds the fixed-length part of a value (e.g. INT, STRING) if not null
- Null indicators: a fixed-length bitstring that indicates whether slots are NULL
- Tuple: an array of slots, plus null indicators
- Row/TupleRow: a logical row made up of a number of values. A row consists of multiple tuples and is represented as a fixed-length array of pointers to tuples. The TupleRow of a ScanNode has just one tuple. The TupleRow of a JoinNode has more tuples, joined from its child nodes.
- RowBatch: a batch of rows, plus information about memory resources referenced by the rows.
- Operator/ExecNode: a physical query operator, e.g. aggregation, join, scan
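
The Slot, Null indicators, Tuple and Row/TupleRow terms above can be sketched in C++ as follows. This is a minimal illustration, not Impala's actual classes: the names `StringSlot`, `SlotDesc` and the `Tuple` methods are hypothetical, the null-indicator position is chosen by the caller, and the string slot here is an ordinary padded struct rather than the 32-bit length + 64-bit pointer layout described above.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed-length part of a string value: pointer + 32-bit length.
// The bytes it points to are the variable-length data and live elsewhere.
struct StringSlot {
  const char* ptr;
  int32_t len;
};

// Hypothetical descriptor saying where one slot lives inside a tuple.
struct SlotDesc {
  int offset;         // byte offset of the slot within the tuple
  int null_byte;      // byte offset of the null-indicator byte
  uint8_t null_mask;  // bit within that byte that marks this slot as NULL
};

// A tuple: a fixed-size block of bytes holding null indicators and slots.
class Tuple {
 public:
  explicit Tuple(int byte_size) : data_(byte_size, 0) {}

  bool IsNull(const SlotDesc& d) const {
    return (data_[d.null_byte] & d.null_mask) != 0;
  }

  void SetNull(const SlotDesc& d) { data_[d.null_byte] |= d.null_mask; }

  // Writes the fixed-length part of a value and clears its null bit.
  template <typename T>
  void Set(const SlotDesc& d, const T& v) {
    data_[d.null_byte] &= static_cast<uint8_t>(~d.null_mask);
    std::memcpy(&data_[d.offset], &v, sizeof(T));
  }

  // Reads the fixed-length part; only meaningful if the slot is not NULL.
  template <typename T>
  T Get(const SlotDesc& d) const {
    T v;
    std::memcpy(&v, &data_[d.offset], sizeof(T));
    return v;
  }

 private:
  std::vector<uint8_t> data_;
};

// A row: a fixed-length array of pointers to tuples (one per tuple in the row).
using TupleRow = std::vector<Tuple*>;
```

In this sketch, a scan would produce `TupleRow`s with a single entry, while a join would produce rows whose entries point at tuples coming from each of its children.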
...
Here is an example memory layout for an (INT, STRING, BIGINT, STRING) row that consists of two tuples. The data is:
int_col | string_col1 | bigint_col | string_col2 |
---|---|---|---|
999 | "hello" | NULL | NULL |
NULL | "hell" | 12345 | "world" |
NULL | "hell" | 12345 | "world" |
NULL | NULL | NULL | NULL |
The memory layout purposefully uses many features to illustrate how data can be shared between rows and tuples. Most batches have simpler layouts.
[Gliffy diagram: example memory layout]
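
One way the example data could share memory between rows and tuples is sketched below. It is an assumption-laden illustration rather than the exact layout the example depicts: the split into an (INT, STRING) tuple and a (BIGINT, STRING) tuple, the struct names, the null-bit assignment, and the choice of which tuples and string bytes are shared are all hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical fixed-length string representation: pointer + length.
struct StringRef { const char* ptr; int32_t len; };

// Assumed null-indicator bits: bit 0 for the first slot, bit 1 for the second.
constexpr uint8_t kSlot0Null = 1 << 0;
constexpr uint8_t kSlot1Null = 1 << 1;

// Assumed split of the row into two tuples.
struct Tuple0 { uint8_t null_bits; int32_t int_col;    StringRef string_col1; };
struct Tuple1 { uint8_t null_bits; int64_t bigint_col; StringRef string_col2; };

// A row is a fixed-length array of tuple pointers (two here).
struct Row { const Tuple0* t0; const Tuple1* t1; };

// Stand-in for a batch: rows plus the memory they reference.
// Copying this struct would leave the pointers aimed at the original.
struct ExampleBatch {
  std::vector<char> var_len;  // variable-length string bytes
  std::vector<Tuple0> t0s;
  std::vector<Tuple1> t1s;
  std::vector<Row> rows;
};

void BuildExample(ExampleBatch* b) {
  // "hello" and "hell" share the same bytes; only the length differs.
  const char bytes[] = "helloworld";
  b->var_len.assign(bytes, bytes + 10);
  const char* hello = b->var_len.data();
  const char* world = b->var_len.data() + 5;

  // Three distinct (INT, STRING) tuples.
  b->t0s.push_back(Tuple0{0, 999, StringRef{hello, 5}});
  b->t0s.push_back(Tuple0{kSlot0Null, 0, StringRef{hello, 4}});
  b->t0s.push_back(Tuple0{kSlot0Null | kSlot1Null, 0, StringRef{nullptr, 0}});

  // Two distinct (BIGINT, STRING) tuples.
  b->t1s.push_back(Tuple1{kSlot0Null | kSlot1Null, 0, StringRef{nullptr, 0}});
  b->t1s.push_back(Tuple1{0, 12345, StringRef{world, 5}});

  // Take tuple pointers only after both vectors have stopped growing.
  b->rows.push_back(Row{&b->t0s[0], &b->t1s[0]});  // 999, "hello", NULL, NULL
  b->rows.push_back(Row{&b->t0s[1], &b->t1s[1]});  // NULL, "hell", 12345, "world"
  b->rows.push_back(Row{&b->t0s[1], &b->t1s[1]});  // same tuples as the row above
  b->rows.push_back(Row{&b->t0s[2], &b->t1s[0]});  // all NULL, reuses the all-NULL tuple
}
```

Under these assumptions, four rows need only three plus two tuples and ten bytes of string data, because the two identical rows point at the same tuples and "hell" is a prefix of the bytes already stored for "hello".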
Constructing a RowBatch in a Scan
...