Trafodion offline backup and restore operations are performed using the HBase snapshot feature. Snapshots are a lightweight way to keep the current state of a table without copying the data. After changes are made to a table, or when recovery is needed, restoring the snapshot returns the table to its previous state. More on snapshots can be found at: http://blog.cloudera.com/blog/2013/03/introduction-to-apache-hbase-snapshots/.
Snapshots are taken for all the Trafodion tables, including the Trafodion metadata tables, and then exported to an HDFS location. To restore the backed-up files from the HDFS location, the snapshots are imported to the target system and then restored. The process is as follows:
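As a rough illustration of the snapshot mechanism, the sketch below prints the HBase shell commands that take a snapshot of one table and later roll the table back to it. The function names, table name, and snapshot name are hypothetical; only the HBase shell commands snapshot, disable, restore_snapshot, and enable are standard. This is not how the Trafodion scripts are implemented, just a way to see the underlying operations.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print HBase shell commands instead of running them,
# so they can be reviewed before being fed to the HBase shell.

# Print the command that takes a snapshot of a table.
print_snapshot_cmd() {
  local table="$1" snap="$2"
  printf "snapshot '%s', '%s'\n" "$table" "$snap"
}

# Print the commands that roll a table back to a snapshot.
# The table has to be disabled before restore_snapshot is applied.
print_restore_cmds() {
  local table="$1" snap="$2"
  printf "disable '%s'\n" "$table"
  printf "restore_snapshot '%s'\n" "$snap"
  printf "enable '%s'\n" "$table"
}

# Sample names only; TRAFODION.SCH.T1 and T1_SNAP are made up.
print_snapshot_cmd 'TRAFODION.SCH.T1' 'T1_SNAP'
print_restore_cmds 'TRAFODION.SCH.T1' 'T1_SNAP'
```

The printed lines could then be piped to hbase shell once they look right.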
Trafodion tables are created in HBase with names of the form ‘TRAFODION.<schema-name>.<table-name>’. To back up all Trafodion tables, the backup tool selects and backs up all tables whose names match the pattern TRAFODION.<any-schema-name>.<any-table-name>. Besides user tables, metadata tables (TRAFODION._MD_.<table-name>), repository tables (TRAFODION._REPOS_.<table-name>), and transaction-related tables (TRAFODION._MD_.<table-name>) are also backed up.
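The name-based selection can be sketched as a simple filter over a list of HBase table names. The function name and the sample table names are hypothetical, and the regular expression assumes that a schema name contains no dot; the backup tool's actual matching logic may differ.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read one HBase table name per line from stdin and keep
# only names of the form TRAFODION.<schema-name>.<table-name>.
select_trafodion_tables() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -E '^TRAFODION\.[^.]+\..+$' || true
}

# Sample input; the last two names are non-Trafodion tables and are dropped.
printf '%s\n' \
  'TRAFODION._MD_.OBJECTS' \
  'TRAFODION.SALES.ORDERS' \
  'hbase:meta' \
  'OTHERAPP.T1' | select_trafodion_tables
```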
The backup and restore tools can be used in both development and clustered environments with either Cloudera or Hortonworks distributions. To run the backup and restore scripts in a clustered environment, the user running the scripts needs sudo access to run commands as the HBase user (the user under which the HBase server runs), the HDFS user, and the Trafodion user without being prompted for a password. Sudo access is not needed when the scripts are run in the development environment. In a clustered environment, one possibility is to run backup and restore as root or another administrative ID. An alternative is to give the trafodion user permission to run any command under the HBase or HDFS user ID without a password prompt. If this is acceptable from a security perspective, it can be achieved by adding this line to /etc/sudoers:
trafodion ALL=(hbase) NOPASSWD: ALL, (hdfs) NOPASSWD: ALL, (trafodion) NOPASSWD: ALL
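Before running the scripts on a cluster, it may help to confirm that passwordless sudo actually works for each target user. The sketch below is one way to check, using sudo -n, which fails instead of prompting when a password would be required. The function name and the SUDO_CMD override (included only so the check can be exercised without real sudo rights) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: report whether the current user can run a command as
# each given user without a password prompt.
# SUDO_CMD is an assumed override for testing; it defaults to the real sudo.
check_sudo_users() {
  local sudo_cmd="${SUDO_CMD:-sudo}"
  local user failed=0
  for user in "$@"; do
    # -n: never prompt; -u: run as the given user; "true" is a harmless command.
    if "$sudo_cmd" -n -u "$user" true 2>/dev/null; then
      echo "ok: $user"
    else
      echo "password required or not permitted: $user"
      failed=1
    fi
  done
  return "$failed"
}

# Example call (not executed here):
#   check_sudo_users hbase hdfs trafodion
```

The function returns a non-zero status if any of the users fails the check, so it can gate a wrapper script.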
Both backup and restore use the TrafExportSnapshot which requires read and write access to these locations:
The ./run_full_trafodion_backup.sh script performs a full offline backup of all Trafodion tables and copies the backup files to an HDFS location.
The command to use the script is as follows:
./run_full_trafodion_backup.sh -b backup_folder -u trafodion_user -h HBase_user -d hdfs_user -m mappers -l 10 -n -o
Where:
-b backup_folder
(Optional) HDFS path where all the Trafodion objects are exported and saved.
The HDFS path needs to have a format like hdfs://<host>:<port>/<folder>/...
If the path is not provided, the script generates a path with a format like
hdfs://<host>:<port>/trafodion-backlups/backup_<timestamp> and, unless -n is
specified, the user is asked to confirm the use of the generated path.
-u trafodion user
(Optional) The user under which the Trafodion server runs. If not provided and
if the -n option is not specified, the user is asked whether the default
trafodion user 'trafodion' can be used. If the answer is yes, the default
trafodion user is used; otherwise the script exits.
-h hbase user
(Optional) The user under which the HBase server runs. If not provided, the
script tries to compute it, and if it does not succeed it considers using the
default HBase user 'hbase'. Unless the -n option is specified, the user is
asked to confirm the selection afterwards.
-d hdfs user
(Optional) The user under which the HDFS server runs. If not provided, the
script tries to compute it, and if it does not succeed it considers using the
default HDFS user 'hdfs'. Unless the -n option is specified, the user is
asked to confirm the selection afterwards.
-m mappers
(Optional) Number of mappers. If unspecified or 0, each snapshot will use a number suitable for its size.
-n
(Optional) Non-interactive mode. With this option the script does not prompt
the user to confirm the use of computed or default values when a parameter
like trafodion user, hbase user, hdfs user or backup path is not provided.
-o
(Optional) Offline. With this option Trafodion will not be restarted after
snapshots are taken.
-l
(Optional) Snapshot size limit in MB above which MapReduce is used for the copy. Snapshots with a size below this value
will be copied using HDFS FileUtil.copy. The default value is 100 MB. FileUtil.copy is invoked through a class provided
by Trafodion. Use 0 for this option to use HBase's ExportSnapshot class instead.
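The effect of the -l option described above can be pictured as a small decision function. This is a sketch of the documented behavior only, not the script's actual code; the function name is hypothetical, and treating a snapshot whose size exactly equals the limit as "below" is an assumption, since the description does not say which way the boundary goes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the -l decision.
# Args: snapshot size in MB, size limit in MB (defaults to the documented 100).
choose_copy_method() {
  local size_mb="$1" limit_mb="${2:-100}"
  if [ "$limit_mb" -eq 0 ]; then
    echo "ExportSnapshot"     # -l 0: use HBase's ExportSnapshot class
  elif [ "$size_mb" -gt "$limit_mb" ]; then
    echo "MapReduce"          # above the limit: MapReduce copy
  else
    echo "FileUtil.copy"      # at or below the limit (boundary is an assumption)
  fi
}

choose_copy_method 40 100     # FileUtil.copy
choose_copy_method 2048 100   # MapReduce
choose_copy_method 40 0       # ExportSnapshot
```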
Example: ./run_full_trafodion_backup.sh -b hdfs://<host>:<port>/<hdfs-path> -n
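Because the -b value has to follow the hdfs://<host>:<port>/<folder>/... format, a wrapper could check the path before invoking the backup. The sketch below is one possible check; the function name and the exact regular expression are assumptions, and the script's own validation may be stricter or looser.

```shell
#!/usr/bin/env bash
# Hypothetical validation of a backup path of the form
# hdfs://<host>:<port>/<folder>/...
is_valid_backup_path() {
  printf '%s\n' "$1" | grep -Eq '^hdfs://[^/:]+:[0-9]+/.+'
}

# Sample paths; the second one lacks the scheme, host, and port.
for p in 'hdfs://localhost:28400/bulkload/backup' '/bulkload/backup'; do
  if is_valid_backup_path "$p"; then
    echo "valid: $p"
  else
    echo "invalid: $p"
  fi
done
```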
The backup script performs the following checks before starting the actual backup:
The ./run_full_trafodion_restore.sh script performs a full offline restore of all Trafodion tables from an HDFS location.
The command to use the script is as follows:
./run_full_trafodion_restore.sh -b backup_folder -u trafodion_user -h hbase_user -d hdfs_user -m mappers -l 10 -n
Where:
-b backup_folder
(Required) HDFS path where all the Trafodion objects are exported and saved.
The HDFS path needs to have a format like hdfs://<host>:<port>/<folder>/...
-u trafodion user
(Optional) The user under which the Trafodion server runs. If not provided and
if the -n option is not specified, the user is asked whether the default
trafodion user 'trafodion' can be used. If the answer is yes, the default
trafodion user is used; otherwise the script exits.
-h hbase user
(Optional) The user under which the HBase server runs. If not provided, the
script tries to compute it, and if it does not succeed it considers using the
default HBase user 'hbase'. Unless the -n option is specified, the user is
asked to confirm the selection afterwards.
-d hdfs user
(Optional) The user under which the HDFS server runs. If not provided, the
script tries to compute it, and if it does not succeed it considers using the
default HDFS user 'hdfs'. Unless the -n option is specified, the user is
asked to confirm the selection afterwards.
-m mappers
(Optional) Number of mappers. If unspecified or 0, each snapshot will use a number suitable for its size.
-l
(Optional) Snapshot size limit in MB above which MapReduce is used for the copy. Snapshots with a size below this value
will be copied using HDFS FileUtil.copy. The default value is 100 MB. FileUtil.copy is invoked through a class provided
by Trafodion. Use 0 for this option to use HBase's ExportSnapshot class instead.
-n
(Optional) Non-interactive mode. With this option the script does not prompt
the user to confirm the use of computed or default values when a parameter
like trafodion user, HBase user or hdfs user is not provided.
Example: ./run_full_trafodion_restore.sh -b hdfs://<host>:<port>/<hdfs-path> -n
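To make the option set above concrete, here is a minimal getopts sketch that accepts the same restore options and enforces that -b is present. It is modeled on the documented interface only; the function and variable names are hypothetical, and the real script's parsing and defaults may differ.

```shell
#!/usr/bin/env bash
# Hypothetical option parsing modeled on the documented restore options.
# Results are left in global variables for the caller to use.
parse_restore_args() {
  local opt
  OPTIND=1   # reset so the function can be called more than once
  backup_folder="" trafodion_user="" hbase_user="" hdfs_user=""
  mappers=0 size_limit_mb=100 interactive=1
  while getopts "b:u:h:d:m:l:n" opt; do
    case "$opt" in
      b) backup_folder="$OPTARG" ;;
      u) trafodion_user="$OPTARG" ;;
      h) hbase_user="$OPTARG" ;;
      d) hdfs_user="$OPTARG" ;;
      m) mappers="$OPTARG" ;;
      l) size_limit_mb="$OPTARG" ;;
      n) interactive=0 ;;
      *) return 2 ;;   # unknown option or missing argument
    esac
  done
  # -b is the only required option for restore.
  if [ -z "$backup_folder" ]; then
    echo "error: -b backup_folder is required" >&2
    return 1
  fi
}

parse_restore_args -b 'hdfs://localhost:28400/bulkload/backup' -n &&
  echo "backup_folder=$backup_folder interactive=$interactive"
```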
The restore script performs the following checks before starting the actual restore:
The backup and restore scripts use a set of shared scripts and functions that are described below:
To run the backup on a cluster with Cloudera or Hortonworks, we can run:
./run_full_trafodion_backup.sh -b hdfs://<host>:8020/bulkload/backup -n
In this example, hdfs://<host>:8020/bulkload/backup is the HDFS location where the backed-up snapshots will reside.
Restore was tested with Cloudera and Hortonworks distributions. We can use something like:
./run_full_trafodion_restore.sh -b hdfs://<host>:8020/bulkload/backup -n
For TrafExportSnapshot to work in the development environment, we need to copy a few JAR files to the $MY_SQROOT/sql/local_hadoop/hbase/lib directory. Otherwise we get exceptions when running the ExportSnapshot MapReduce job. These workarounds were figured out while testing on workstations, and there could be other ways to avoid the exceptions.
In a shell window, after sourcing sqenv.sh, we need to run the commands below once prior to running backup and/or restore.
cd $MY_SQROOT/sql/local_hadoop
cp ./hadoop-2.4.0/share/hadoop/yarn/*.jar ./hbase/lib
cp ./hadoop-2.4.0/share/hadoop/mapreduce/hadoop-mapreduce-client*.jar ./hbase/lib/
To perform a backup in the development environment, we need to determine the HDFS port first (workstations don't always use the standard port). We can get the port number from the core-site.xml configuration file located under $MY_SQROOT/sql/local_hadoop/hadoop/etc/hadoop/. In this example, it's 28400:
bash-4.1$ cat $MY_SQROOT/sql/local_hadoop/hadoop/etc/hadoop/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:28400</value>
  </property>
To run the backup using the port number from above, the command is:
./run_full_trafodion_backup.sh -b hdfs://localhost:<port-number>/bulkload/backup -n
To run the restore:
./run_full_trafodion_restore.sh -b hdfs://localhost:<port-number>/bulkload/backup -n
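The port lookup described above can also be scripted rather than read by eye. The sketch below pulls the port out of the first hdfs://host:port value it finds in a core-site.xml file. The function name is hypothetical, and the approach assumes that the first such value is the default file system entry and that the <value> element sits on a single line; it is plain text matching, not a real XML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the port of the first hdfs://host:port value
# found in the given core-site.xml file.
hdfs_port_from_core_site() {
  sed -n 's#.*<value>hdfs://[^:<]*:\([0-9][0-9]*\)[^<]*</value>.*#\1#p' "$1" |
    head -n 1
}

# Only attempt the lookup when the development environment file is present.
conf="${MY_SQROOT:-}/sql/local_hadoop/hadoop/etc/hadoop/core-site.xml"
if [ -f "$conf" ]; then
  port="$(hdfs_port_from_core_site "$conf")"
  # Print the backup command to run with the detected port.
  echo "./run_full_trafodion_backup.sh -b hdfs://localhost:${port}/bulkload/backup -n"
fi
```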
The test cases that were performed so far include:
All Trafodion tables created in HBase 0.98 were backed up and exported to an HDFS location. Then all the Trafodion tables were dropped, and the backup files were restored from the backup location to the same HBase 0.98 instance. This type of test was done on:
The steps involved in these tests are:
This type of test was done:
The main issues we faced while working on these tests are summarized below: