...
Code Block |
---|
// Hive version 0.11 through 0.14: |
hive --orcfiledump <location-of-orc-file> |
// Hive version 1.1.0 and later: |
hive --orcfiledump [-d] [--rowindex <col_ids>] <location-of-orc-file> |
// Hive version 1.2.0 and later: |
hive --orcfiledump [-d] [-t] [--rowindex <col_ids>] <location-of-orc-file> |
// Hive version 1.3.0 and later: |
hive --orcfiledump [-j] [-p] [-d] [-t] [--rowindex <col_ids>] [--recover] [--skip-dump] [--backup-path <new-path>] <location-of-orc-file-or-directory> |
Specifying -d in the command causes it to dump the data in the ORC file rather than the metadata (Hive 1.1.0 and later).
Specifying --rowindex with a comma-separated list of column ids causes it to print row indexes for the specified columns, where 0 is the top-level struct containing all of the columns and 1 is the first column id (Hive 1.1.0 and later).
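As an illustration of the two options, the sketch below assembles one command for each and prints it for review rather than running it. The file path and the column ids are placeholders chosen for this example, not values taken from this page.

```shell
#!/bin/sh
# Placeholder URI of an ORC file (hypothetical; substitute a real one).
ORC_FILE="/user/hive/warehouse/example_table/000000_0"

# Dump the rows stored in the file instead of its metadata.
DUMP_DATA_CMD="hive --orcfiledump -d $ORC_FILE"

# Print row indexes for column ids 1 and 2 (0 would be the top-level struct).
ROW_INDEX_CMD="hive --orcfiledump --rowindex 1,2 $ORC_FILE"

# Only print the commands here; run them on a host with Hive installed.
echo "$DUMP_DATA_CMD"
echo "$ROW_INDEX_CMD"
```

Both forms require Hive 1.1.0 or later.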
Specifying -t in the command prints the timezone id of the writer.
Specifying -j in the command prints the ORC file metadata in JSON format. To pretty-print the JSON metadata, add -p to the command.
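A minimal sketch of a JSON dump follows. It only builds and prints the command line. The file path and the output file name are assumptions made for this example, and redirecting the output to a file is a convenience rather than something the tool requires.

```shell
#!/bin/sh
# Placeholder URI of an ORC file (hypothetical; substitute a real one).
ORC_FILE="/user/hive/warehouse/example_table/000000_0"

# Hypothetical local file that would receive the JSON metadata.
JSON_OUT="orc_metadata.json"

# -j selects JSON output and -p pretty-prints it (Hive 1.3.0 and later).
JSON_DUMP_CMD="hive --orcfiledump -j -p $ORC_FILE"

# Print the full invocation, including the redirect, for review.
echo "$JSON_DUMP_CMD > $JSON_OUT"
```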
Specifying --recover in the command recovers a corrupted ORC file generated by Hive streaming.
Specifying --skip-dump along with --recover performs recovery without dumping metadata.
Specifying --backup-path with a new path makes the recovery tool move the corrupted files to the specified backup path (default: /tmp).
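The three recovery options can be combined in a single invocation, sketched below. The command is assembled and printed, not executed. The streaming directory and the backup directory are hypothetical paths chosen for this example.

```shell
#!/bin/sh
# Hypothetical directory holding ORC files written by Hive streaming.
STREAMING_DIR="/user/hive/warehouse/example_table/streaming_delta"

# Hypothetical backup location; when --backup-path is omitted the default is /tmp.
BACKUP_DIR="/tmp/orc_backup"

# Recover corrupted files, skip the metadata dump, and move the corrupted
# originals to BACKUP_DIR (Hive 1.3.0 and later).
RECOVER_CMD="hive --orcfiledump --recover --skip-dump --backup-path $BACKUP_DIR $STREAMING_DIR"

# Print the command for review before running it against real data.
echo "$RECOVER_CMD"
```

Printing the command first is a cautious choice here, since recovery moves files on the file system.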
<location-of-orc-file> is the URI of the ORC file.
<location-of-orc-file-or-directory> is the URI of the ORC file or directory. From Hive 1.3.0 onward, this URI can be a directory containing ORC files.
...