Version
The EXPORT and IMPORT commands were added in Hive 0.8.0 (see HIVE-1918).
Replication extensions to the EXPORT and IMPORT commands were added in Hive 1.2.0 (see HIVE-7973 and Hive Replication Development).
The EXPORT command exports the data of a table or partition, along with the metadata, into a specified output location. This output location can then be moved over to a different Hadoop or Hive instance and imported from there with the IMPORT command.
When exporting a partitioned table, the original data may be located in different HDFS locations. The ability to export/import a subset of the partitions is also supported.
Exported metadata is stored in the target directory, and data files are stored in subdirectories.
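As a rough illustration only (the exact file and directory names are not specified here and can vary by Hive version), an export of a partitioned table might produce a layout along these lines, with a metadata file at the top of the target directory and data files under per-partition subdirectories:
export_target_path/_metadata
export_target_path/emp_country=in/emp_state=ka/<data files>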
The EXPORT and IMPORT commands work independently of the source and target metastore DBMS used; for example, they can be used between Derby and MySQL databases.
IMPORT will create the target table/partition if it does not exist. All the table properties/parameters will be those of the table that was used in EXPORT to generate the archive. If the target exists, checks are performed that it has the appropriate schema, Input/OutputFormat, etc. If the target table exists and is not partitioned, it must be empty. If the target table exists and is partitioned, the partitions being imported must not already exist in the table.
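As a quick sketch of those rules (the table names below are illustrative, not part of the original examples):
-- target table does not exist yet: IMPORT creates it using the exported metadata
import table new_employee from 'hdfs_exports_location/employee';
-- target table exists and is partitioned: the imported partition must not already exist in it
import table existing_employee partition (emp_country="us", emp_state="ca") from 'hdfs_exports_location/employee';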
EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
  TO 'export_target_path' [ FOR replication('eventid') ]
IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]]
  FROM 'source_path'
  [LOCATION 'import_target_path']
The EXPORT and IMPORT commands behave slightly differently when used in the context of replication, and are intended to be used by tools that perform replication between Hive warehouses. In most cases, end users will not need to use this additional tag, except when doing a manual bootstrap of a replication-destination warehouse so that an incremental replication tool can take over from that point.
They make use of a special property called "repl.last.id" on a table or partition object (depending on what object is being replicated) to make sure that a replication export/import will only update an object if the update is newer than the object it affects. On the export end, the replication export dump is tagged with an id that is monotonically increasing on the source warehouse (incrementing each time there is a source warehouse metastore modification). In addition, an export that is tagged as being for replication will not result in an error if it attempts to export an object which does not currently exist. (This is because in the general flow of replication, it is quite possible that by the time an event is acted upon by an external replication tool, the object has been removed, and that should not halt the replication pipeline.)
On the import side, there is no syntax change, but when an import is run on an export dump that was generated with the FOR replication tag, it checks whether the object it is replicating into exists. If that object already exists, it checks the repl.last.id property of that object to determine whether what is being imported is newer than the current state of the object in the destination warehouse. If the update is newer, it replaces the object with the newer information. If the update is older than the object already in place, the update is ignored and causes no error.
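The replication state that such an import maintains can be inspected on the destination warehouse with a standard SHOW TBLPROPERTIES query (the table name below is only an example):
-- inspect the replication id recorded on a replicated table (illustrative table name)
show tblproperties imported_dept('repl.last.id');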
For those using EXPORT for the first-time manual bootstrapping use case, users are recommended to use a "FOR replication('bootstrapping')" tag. (Advanced users note: The choice of "bootstrapping" here is somewhat arbitrary and could just as well have been "foo". The real goal is to have a value such that all further incremental replication ids will be greater than this original id. Thus, the integral value of this initial id should be 0, and so any string which does not contain numbers is acceptable. Having an initial tag be "123456", however, would be bad, as it could cause further updates which have a repl.last.id < 123456 to not be applied.)
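A sketch of that bootstrap flow, assuming illustrative table and path names: the manual bootstrap export is tagged on the source warehouse, and the resulting dump is imported once on the replication-destination warehouse before the incremental replication tool takes over.
-- manual bootstrap export on the source warehouse; the tag has an integral value of 0
export table department to 'hdfs_exports_location/department' for replication('bootstrapping');
-- one-time import on the replication-destination warehouse
import table department from 'hdfs_exports_location/department';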
Simple export and import:
export table department to 'hdfs_exports_location/department';
import from 'hdfs_exports_location/department';
Rename table on import:
export table department to 'hdfs_exports_location/department';
import table imported_dept from 'hdfs_exports_location/department';
Export partition and import:
export table employee partition (emp_country="in", emp_state="ka") to 'hdfs_exports_location/employee';
import from 'hdfs_exports_location/employee';
Export table and import partition:
export table employee to 'hdfs_exports_location/employee';
import table employee partition (emp_country="us", emp_state="tn") from 'hdfs_exports_location/employee';
Specify the import location:
export table department to 'hdfs_exports_location/department';
import table department from 'hdfs_exports_location/department'
  location 'import_target_location/department';
Import as an external table:
export table department to 'hdfs_exports_location/department';
import external table department from 'hdfs_exports_location/department';