...
Code Block |
---|
// Hive version 0.11 through 0.14: |
hive --orcfiledump <location-of-orc-file> |
// Hive version 1.1.0 and later: |
hive --orcfiledump [-d] [--rowindex <col_ids>] <location-of-orc-file> |
// Hive version 1.2.0 and later: |
hive --orcfiledump [-d] [-t] [--rowindex <col_ids>] <location-of-orc-file> |
// Hive version 1.3.0 and later: |
hive --orcfiledump [-j] [-p] [-d] [-t] [--rowindex <col_ids>] [--recover] [--skip-dump] [--backup-path <new-path>] <location-of-orc-file-or-directory> |
Specifying -d in the command will cause it to dump the ORC file data rather than the metadata (Hive 1.1.0 and later).
Specifying --rowindex with a comma-separated list of column ids will cause it to print row indexes for the specified columns, where 0 is the top-level struct containing all of the columns and 1 is the first column id (Hive 1.1.0 and later).
Specifying -t in the command will print the timezone id of the writer.
Specifying -j in the command will print the ORC file metadata in JSON format. To pretty print the JSON metadata, add -p to the command.
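As a rough sketch, the inspection options described so far might be invoked as follows. The file URI and the run helper (which only prints each command unless DRY_RUN is set to 0) are illustrative assumptions and not part of Hive; substitute the location of a real ORC file.

```shell
#!/bin/sh
# Hypothetical ORC file location; replace with a real URI.
ORC_FILE="hdfs:///tmp/example/part-00000.orc"

# Assumption for illustration: dry-run by default, so the commands are
# printed rather than executed. Set DRY_RUN=0 to actually call hive.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Dump the file metadata (Hive 0.11 and later).
run hive --orcfiledump "$ORC_FILE"

# Dump the data instead of the metadata (Hive 1.1.0 and later).
run hive --orcfiledump -d "$ORC_FILE"

# Print row indexes for the top-level struct (0) and the first column (1).
run hive --orcfiledump --rowindex 0,1 "$ORC_FILE"

# Print the timezone id of the writer (Hive 1.2.0 and later).
run hive --orcfiledump -t "$ORC_FILE"

# Print the metadata as pretty-printed JSON (Hive 1.3.0 and later).
run hive --orcfiledump -j -p "$ORC_FILE"
```

With the default dry-run setting, the script only echoes the five command lines, which makes it easy to review the flags before running them against a cluster.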
Specifying --recover in the command will recover a corrupted ORC file generated by Hive streaming.
Specifying --skip-dump along with --recover will perform recovery without dumping metadata.
Specifying --backup-path with a new-path will let the recovery tool move the corrupted files to the specified backup path (default: /tmp).
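The recovery options can be combined along these lines (Hive 1.3.0 and later). This is a minimal sketch: both URIs and the run dry-run helper are hypothetical placeholders, not part of Hive.

```shell
#!/bin/sh
# Hypothetical locations; replace with real URIs.
ORC_DIR="hdfs:///tmp/example/streaming_delta"
BACKUP_DIR="hdfs:///tmp/example/orc_backup"

# Assumption for illustration: dry-run by default, so the commands are
# printed rather than executed. Set DRY_RUN=0 to actually call hive.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Recover corrupted ORC files; corrupted files are moved to the
# default backup path (/tmp).
run hive --orcfiledump --recover "$ORC_DIR"

# Recover without dumping metadata, moving corrupted files to a
# chosen backup path instead of the default.
run hive --orcfiledump --recover --skip-dump --backup-path "$BACKUP_DIR" "$ORC_DIR"
```

Passing --backup-path explicitly keeps the corrupted originals somewhere predictable rather than in /tmp.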
<location-of-orc-file> is the URI of the ORC file.
...