...

  • Value: a value (e.g. int, string, array, etc). All values in Impala are nullable internally.
  • Fixed-length data: the part of a value that is fixed in size (e.g. a 32-bit integer, the 32-bit length + 64-bit pointer representing a string)
  • Variable-length data: parts of a value that vary in length, e.g. string data, maps, arrays
  • Slot: an area of memory that holds the fixed-length part of a value (e.g. INT, STRING) when the value is not NULL
  • Null indicators: a fixed-length bitstring that indicates whether slots are NULL
  • Tuple: an array of slots, plus null indicators
  • TupleRow: a logical row composed of a number of values. A row consists of one or more tuples and is represented as a fixed-length array of pointers to tuples. The TupleRow of a ScanNode has just one tuple; the TupleRow of a JoinNode has more, combining the tuples joined from its child nodes.
  • RowBatch: a batch of rows, plus information about memory resources referenced by the rows.
  • Operator/ExecNode: a physical query operator, e.g. aggregation, join, scan
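The terms above can be made concrete with a small C++ sketch. This is a simplified illustration, not Impala's actual classes: the names StringSlot, ExampleTuple and ExampleRow are hypothetical, the tuple is hard-coded to two slots (INT, STRING), and the null indicators are assumed to fit in one byte with bit i meaning "slot i is NULL".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Fixed-length part of a STRING value: a pointer plus a 32-bit length.
// The bytes it points to are the variable-length data and live elsewhere.
struct StringSlot {
  const char* ptr;
  int32_t len;
};

// Hypothetical tuple with two slots (INT, STRING) plus null indicators.
// Assumption for this sketch: bit i of null_bits set means slot i is NULL.
struct ExampleTuple {
  uint8_t null_bits;
  int32_t int_slot;        // slot 0
  StringSlot string_slot;  // slot 1

  bool IsNull(int slot_idx) const {
    assert(slot_idx >= 0 && slot_idx < 2);
    return ((null_bits >> slot_idx) & 1) != 0;
  }

  void SetNull(int slot_idx) {
    assert(slot_idx >= 0 && slot_idx < 2);
    null_bits = static_cast<uint8_t>(null_bits | (1u << slot_idx));
  }
};

// A row is a fixed-length array of pointers to tuples. A scan would produce
// rows with one tuple pointer; a join would produce rows with several.
struct ExampleRow {
  std::vector<ExampleTuple*> tuples;
};
```

Because a row only holds pointers, several rows can refer to the same tuple, and several string slots can point into the same variable-length bytes.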

...

Here is an example memory layout for an (INT, STRING, BIGINT, STRING) row that is comprised of two tuples. The data is:

int_col   string_col1   bigint_col   string_col2
999       "hello"       NULL         NULL
NULL      "hell"        12345        "world"
NULL      "hell"        12345        "world"
NULL      NULL          NULL         NULL

The memory layout purposefully uses many features to illustrate how data can be shared between rows and tuples. Most batches have simpler layouts.

[Gliffy diagram: Impala RowBatch]
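As a rough illustration of the kind of sharing described above, the example data can be built in a few lines of C++. This sketch does not reproduce the diagram and is not Impala's RowBatch API. It rests on several assumptions: the first tuple holds (int_col, string_col1) and the second holds (bigint_col, string_col2); "hell" reuses the first four bytes of "hello"; the two identical rows point at the same tuples; and an all-NULL tuple is represented by a null tuple pointer. All type and function names (TupleA, TupleB, Row, Batch, BuildExampleBatch) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct StringSlot { const char* ptr; int32_t len; };

// Assumed split of the four columns into two tuples.
// Bit i of null_bits set means slot i of that tuple is NULL.
struct TupleA { uint8_t null_bits; int32_t int_col; StringSlot string_col1; };
struct TupleB { uint8_t null_bits; int64_t bigint_col; StringSlot string_col2; };

// A row is a fixed-length array of tuple pointers (here: two).
// Assumption: a nullptr tuple means every slot of that tuple is NULL.
struct Row { const TupleA* a; const TupleB* b; };

// The batch owns the memory that its rows reference.
struct Batch {
  std::vector<char> var_len_data;  // string bytes
  std::vector<TupleA> tuples_a;
  std::vector<TupleB> tuples_b;
  std::vector<Row> rows;
};

bool IsNull(uint8_t null_bits, int slot_idx) {
  return ((null_bits >> slot_idx) & 1) != 0;
}

std::string ToString(const StringSlot& s) {
  return std::string(s.ptr, static_cast<size_t>(s.len));
}

void BuildExampleBatch(Batch* batch) {
  // Variable-length data: "hello" and "world" stored back to back.
  const std::string bytes = "helloworld";
  batch->var_len_data.assign(bytes.begin(), bytes.end());
  const char* base = batch->var_len_data.data();

  // The tuple vectors are filled once and never resized afterwards,
  // so the tuple pointers stored in the rows stay valid.
  batch->tuples_a = {
      TupleA{0x0, 999, {base, 5}},  // 999, "hello"
      TupleA{0x1, 0, {base, 4}},    // NULL, "hell" (shares bytes with "hello")
  };
  batch->tuples_b = {
      TupleB{0x0, 12345, {base + 5, 5}},  // 12345, "world"
  };

  const TupleA* a0 = &batch->tuples_a[0];
  const TupleA* a1 = &batch->tuples_a[1];
  const TupleB* b0 = &batch->tuples_b[0];
  batch->rows = {
      Row{a0, nullptr},       // 999, "hello", NULL, NULL
      Row{a1, b0},            // NULL, "hell", 12345, "world"
      Row{a1, b0},            // same tuples as the previous row
      Row{nullptr, nullptr},  // NULL, NULL, NULL, NULL
  };
}
```

In this sketch the four rows need only three tuples and ten bytes of string data. The batch must outlive the rows, and copying a Batch would leave the copied pointers referring to the original's memory, which is why the batch has to carry information about the memory its rows reference.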

Constructing a RowBatch in a Scan

...