Version
The EXPORT and IMPORT commands were added in Hive 0.8.0 (see HIVE-1918).
Replication extensions to the EXPORT and IMPORT commands were added in Hive 1.2.0 (see HIVE-7973 and Hive Replication Development).
The EXPORT command exports the data of a table or partition, along with the metadata, into a specified output location. This output location can then be moved over to a different Hadoop or Hive instance and imported from there with the IMPORT command.
When exporting a partitioned table, the original data may be located in different HDFS locations. The ability to export/import a subset of the partitions is also supported.
Exported metadata is stored in the target directory, and data files are stored in subdirectories.
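As a rough illustration only (the exact file and directory names are not specified here and can vary by Hive version), an export of a partitioned table might produce a layout along these lines, with a metadata file at the top of the target directory and data files under per-partition subdirectories:
export_target_path/_metadata
export_target_path/emp_country=in/emp_state=ka/<data files>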
The EXPORT and IMPORT commands work independently of the source and target metastore DBMS used; for example, they can be used between Derby and MySQL databases.
IMPORT will create the target table/partition if it does not exist. All the table properties/parameters will be those of the table that was used in EXPORT to generate the archive. If the target exists, checks are performed that it has the appropriate schema, Input/OutputFormat, etc. If the target table exists and is not partitioned, it must be empty. If the target table exists and is partitioned, the partitions being imported must not already exist in the table.
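As a quick sketch of those rules (the table names below are illustrative, not part of the original examples):
-- target table does not exist yet: IMPORT creates it using the exported metadata
import table new_employee from 'hdfs_exports_location/employee';
-- target table exists and is partitioned: the imported partition must not already exist in it
import table existing_employee partition (emp_country="us", emp_state="ca") from 'hdfs_exports_location/employee';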
EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
  TO 'export_target_path' [ FOR replication('eventid') ]
IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]]
  FROM 'source_path'
  [LOCATION 'import_target_path']
The EXPORT and IMPORT commands behave slightly differently when used in the context of replication, and are intended to be used by tools that perform replication between Hive warehouses. In most cases, end users will not need to use this additional tag, except when doing a manual bootstrap of a replication-destination warehouse so that an incremental replication tool can take over from that point.
They make use of a special property called "repl.last.id" on a table or partition object (depending on what object is being replicated) to make sure that a replication export/import will only update an object if the update is newer than the object it affects. On the export end, the replication export dump is tagged with an id that is monotonically increasing on the source warehouse (incrementing each time there is a source warehouse metastore modification). In addition, an export that is tagged as being for replication will not result in an error if it attempts to export an object which does not currently exist. (This is because in the general flow of replication, it is quite possible that by the time an event is acted upon by an external replication tool, the object has been removed, and that should not halt the replication pipeline.)
On the import side, there is no syntax change, but when an import is run on an export dump that was generated with the FOR replication tag, it checks whether the object it is replicating into exists. If that object already exists, it checks the repl.last.id property of that object to determine whether what is being imported is newer than the current state of the object in the destination warehouse. If the update is newer, it replaces the object with the newer information. If the update is older than the object already in place, the update is ignored and causes no error.
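The replication state that such an import maintains can be inspected on the destination warehouse with a standard SHOW TBLPROPERTIES query (the table name below is only an example):
-- inspect the replication id recorded on a replicated table (illustrative table name)
show tblproperties imported_dept('repl.last.id');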
For those using EXPORT for the first-time manual bootstrapping use case, users are recommended to use a "FOR replication('bootstrapping')" tag. (Advanced users note: The choice of "bootstrapping" here is somewhat arbitrary and could just as well have been "foo". The real goal is to have a value such that all further incremental replication ids will be greater than this original id. Thus, the integral value of this initial id should be 0, and so any string which does not contain numbers is acceptable. Having an initial tag be "123456", however, would be bad, as it could cause further updates which have a repl.last.id < 123456 to not be applied.)
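A sketch of that bootstrap flow, assuming illustrative table and path names: the manual bootstrap export is tagged on the source warehouse, and the resulting dump is imported once on the replication-destination warehouse before the incremental replication tool takes over.
-- manual bootstrap export on the source warehouse; the tag has an integral value of 0
export table department to 'hdfs_exports_location/department' for replication('bootstrapping');
-- one-time import on the replication-destination warehouse
import table department from 'hdfs_exports_location/department';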
Simple export and import:
export table department to 'hdfs_exports_location/department';
import from 'hdfs_exports_location/department';
Rename table on import:
export table department to 'hdfs_exports_location/department';
import table imported_dept from 'hdfs_exports_location/department';
Export partition and import:
export table employee partition (emp_country="in", emp_state="ka") to 'hdfs_exports_location/employee';
import from 'hdfs_exports_location/employee';
Export table and import partition:
export table employee to 'hdfs_exports_location/employee';
import table employee partition (emp_country="us", emp_state="tn") from 'hdfs_exports_location/employee';
Specify the import location:
export table department to 'hdfs_exports_location/department';
import table department from 'hdfs_exports_location/department'
  location 'import_target_location/department';
Import as an external table:
export table department to 'hdfs_exports_location/department';
import external table department from 'hdfs_exports_location/department';