...

// Hive version 0.11 through 0.14:
hive --orcfiledump <location-of-orc-file>
 
// Hive version 1.1.0 and later:
hive --orcfiledump [-d] [--rowindex <col_ids>] <location-of-orc-file>
 
// Hive version 1.2.0 and later:
hive --orcfiledump [-d] [-t] [--rowindex <col_ids>] <location-of-orc-file>
 
// Hive version 1.3.0 and later:
hive --orcfiledump [-j] [-p] [-d] [-t] [--rowindex <col_ids>] [--recover] [--skip-dump] [--backup-path <new-path>] <location-of-orc-file-or-directory>

Specifying -d in the command will cause it to dump the data in the ORC file rather than the metadata (Hive 1.1.0 and later).

Specifying --rowindex with a comma-separated list of column ids will cause it to print row indexes for the specified columns, where 0 is the top-level struct containing all of the columns and 1 is the first column id (Hive 1.1.0 and later).
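As a minimal sketch of how the column-id list is passed, the helper below joins ids with commas and prints the resulting command instead of running it. The function name rowindex_cmd and the example path are hypothetical and used only for illustration.

```shell
# Hypothetical helper: prints (does not run) an orcfiledump command that
# requests row indexes for the given column ids.
# Usage: rowindex_cmd <orc-file-uri> <col_id> [<col_id> ...]
rowindex_cmd() {
    file=$1
    shift
    # Join the remaining arguments with commas, e.g. "0 1 3" -> "0,1,3".
    # IFS is changed only inside the command substitution's subshell.
    cols=$(IFS=,; printf '%s' "$*")
    printf 'hive --orcfiledump --rowindex %s %s\n' "$cols" "$file"
}
```

For example, `rowindex_cmd /tmp/example.orc 0 1 3` prints a command asking for the row indexes of the top-level struct (0) and of column ids 1 and 3.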

Specifying -t in the command will print the timezone id of the writer.

Specifying -j in the command will print the ORC file metadata in JSON format. To pretty print the JSON metadata, add -p to the command.
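The JSON flags combine as sketched below. The helper only prints the command line; the name json_meta_cmd, its "pretty" argument, and the example path are assumptions made for this illustration and are not part of Hive.

```shell
# Hypothetical helper: prints (does not run) an orcfiledump command that
# emits metadata as JSON. Pass "pretty" as the second argument to add -p.
# Usage: json_meta_cmd <orc-file-uri> [pretty]
json_meta_cmd() {
    flags='-j'
    if [ "${2:-}" = pretty ]; then
        flags="$flags -p"
    fi
    printf 'hive --orcfiledump %s %s\n' "$flags" "$1"
}
```

Compact JSON (without -p) is likely the better fit when another program consumes the output, while the pretty-printed form is easier to read by eye.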

Specifying --recover in the command will recover a corrupted ORC file generated by Hive streaming.

Specifying --skip-dump along with --recover will perform recovery without dumping metadata.

Specifying --backup-path with a new path will let the recovery tool move the corrupted files to the specified backup path (default: /tmp).
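The three recovery options can be combined as in the following sketch, which prints a recovery command rather than executing it. The function name recover_cmd and the paths shown are hypothetical. When no backup path is given, the flag is omitted so the tool falls back to its default (/tmp, as noted above).

```shell
# Hypothetical helper: prints (does not run) a recovery command that skips
# the metadata dump and optionally sets a backup path for corrupted files.
# Usage: recover_cmd <orc-file-or-directory-uri> [<backup-path>]
recover_cmd() {
    target=$1
    backup=${2:-}
    cmd='hive --orcfiledump --recover --skip-dump'
    if [ -n "$backup" ]; then
        # Only pass --backup-path when the caller supplied one.
        cmd="$cmd --backup-path $backup"
    fi
    printf '%s %s\n' "$cmd" "$target"
}
```

For example, `recover_cmd /data/streaming/example_dir /backup/orc` prints a command that would move corrupted files to /backup/orc during recovery.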

<location-of-orc-file> is the URI of the ORC file.

<location-of-orc-file-or-directory> is the URI of the ORC file or directory. From Hive 1.3.0 onward, this URI can be a directory containing ORC files.

...