Apache Kylin : Analytical Data Warehouse for Big Data

...

Parquet file schema:
    1:           OPTIONAL INT64 R:0 D:1
    2:           REQUIRED DOUBLE R:0 D:0
    3:           OPTIONAL INT64 R:0 D:1
    110000:      OPTIONAL INT64 R:0 D:1
    110001:      OPTIONAL INT64 R:0 D:1
  • "REQUIRED" and "OPTIONAL" correspond to non-nullable and nullable columns in a database system.
  • Parquet's physical data types are BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, and BYTE_ARRAY. A column with string type in Hive is stored as BYTE_ARRAY in Parquet. (ShaofengShi: Why is column "2" "REQUIRED" while the others are "OPTIONAL"? And why are their data types all "INT64"? Please explain.)

  • How to deal with the order of dimension and measure

    • In a Parquet file, the columns are always ordered with dimensions first and measures last
    • There is no defined order among the dimensions themselves, nor among the measures
  • Parquet file split

    • parquet.block.size (the row group size) defaults to 128 MB
    • (ShaofengShi: How many row groups in a parquet file?)

...

Type             Spark type      Parquet type
Numeric types    ByteType        INT32
Numeric types    ShortType       INT32
Numeric types    IntegerType     INT32
Numeric types    LongType        INT64
Numeric types    FloatType       FLOAT
Numeric types    DoubleType      DOUBLE
Numeric types    DecimalType     INT32, INT64, BINARY, or FIXED_LEN_BYTE_ARRAY (depending on precision)
String type      StringType      BYTE_ARRAY
Binary type      BinaryType      BYTE_ARRAY
Boolean type     BooleanType     BOOLEAN
Datetime type    TimestampType   INT96
Datetime type    DateType        INT32
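For reference, the mapping above can be captured as a plain lookup table. This dict is purely illustrative, not a Kylin or Spark API:

```python
# Illustrative Spark -> Parquet physical type mapping (mirrors the table above).
SPARK_TO_PARQUET = {
    "ByteType": "INT32",
    "ShortType": "INT32",
    "IntegerType": "INT32",
    "LongType": "INT64",
    "FloatType": "FLOAT",
    "DoubleType": "DOUBLE",
    # DecimalType's physical type depends on the decimal's precision.
    "DecimalType": ("INT32", "INT64", "BINARY", "FIXED_LEN_BYTE_ARRAY"),
    "StringType": "BYTE_ARRAY",
    "BinaryType": "BYTE_ARRAY",
    "BooleanType": "BOOLEAN",
    "TimestampType": "INT96",
    "DateType": "INT32",
}

print(SPARK_TO_PARQUET["StringType"])  # BYTE_ARRAY
```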
  • How computed columns are stored
    • Bitmap: Spark BinaryType, stored as BYTE_ARRAY
    • TopN: Spark BinaryType, stored as BYTE_ARRAY
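That is, these measures are serialized into a single binary cell per row. A hedged stand-in sketch: Kylin actually serializes RoaringBitmap structures, not this simple struct packing, but the idea of "complex measure -> one BYTE_ARRAY value" is the same:

```python
# Sketch: serialize a set of row ids to bytes, as a stand-in for the
# RoaringBitmap blob Kylin stores in a BYTE_ARRAY measure column.
import struct

def serialize_bitmap(ids):
    """Pack a count header plus sorted 64-bit ids into one binary blob."""
    sorted_ids = sorted(ids)
    return struct.pack(f"<I{len(sorted_ids)}Q", len(sorted_ids), *sorted_ids)

def deserialize_bitmap(blob):
    """Recover the id set from the blob."""
    (n,) = struct.unpack_from("<I", blob)
    return set(struct.unpack_from(f"<{n}Q", blob, 4))

blob = serialize_bitmap({3, 1, 7})
print(len(blob))                 # 28 (4-byte count + 3 * 8-byte ids)
print(deserialize_bitmap(blob))  # {1, 3, 7}
```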

5. How to build Cube into Parquet

...

  • What are the optimizations of Kylin reading parquet data?
    • Segment Pruning
    • Shard by
    • Parquet page index
    • Project Pushdown
    • Predicate Pushdown
        

7. Performance

...

Build

  • TPC-H is used as the dataset for the benchmark test (ShaofengShi: What's the cluster configuration? What's the model/cube design?)
  • The detailed data is as follows:
    (image removed)

      Kyligence provides dataset tools for SSB and TPC-H that contain test SQL cases; the repositories are as follows:


  • Environment
    • 4-node Hadoop cluster
    • YARN queue with 400 GB memory and 128 CPU cores
  • Build (over SSB)
    (benchmark results attached as images)
  • Query (over SSB and TPC-H)
    (benchmark results attached as images)

Query 

...

8. Next step