...

The class HiveEndPoint describes a Hive endpoint to connect to: the database, table and partition names. Invoking the newConnection method on it establishes a connection to the Hive MetaStore for streaming purposes and returns a StreamingConnection object. Multiple connections can be established on the same endpoint. A StreamingConnection can then be used to initiate new transactions for performing I/O.

...

For each transaction in the TxnBatch, the application calls beginNextTransaction, write, and then commit or abort as appropriate. See the Javadoc for details. All records in a single Transaction Batch will go to the same bucket. A Transaction cannot include data from more than one partition.
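The call order described above can be sketched as follows. This is a minimal illustration, not the Hive classes themselves: the Batch interface and InMemoryBatch are hypothetical stand-ins that expose only the method names mentioned in the text (plus an assumed remainingTransactions counter), so that the example runs without a metastore. The real methods may declare checked exceptions, which this sketch omits; consult the Javadoc for the actual signatures.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class TxnBatchLoop {
    /** Hypothetical stand-in: only the calls named in the text, not the Hive API. */
    interface Batch {
        int remainingTransactions();
        void beginNextTransaction();
        void write(byte[] record);
        void commit();
        void abort();
    }

    /**
     * Writes each group of records in its own transaction.
     * Stops when the batch has no transactions left; returns the number committed.
     */
    static int writeAll(Batch batch, List<List<String>> groups) {
        int committed = 0;
        for (List<String> group : groups) {
            if (batch.remainingTransactions() == 0) {
                break; // batch exhausted; a caller would fetch a new batch here
            }
            batch.beginNextTransaction();
            try {
                for (String record : group) {
                    batch.write(record.getBytes(StandardCharsets.UTF_8));
                }
                batch.commit();
                committed++;
            } catch (RuntimeException e) {
                batch.abort(); // discard the partial transaction, then report the failure
                throw e;
            }
        }
        return committed;
    }

    /** In-memory fake used only to make the sketch runnable. */
    static final class InMemoryBatch implements Batch {
        private int remaining;
        private final List<String> pending = new ArrayList<>();
        final List<String> committedRecords = new ArrayList<>();

        InMemoryBatch(int size) { this.remaining = size; }

        public int remainingTransactions() { return remaining; }

        public void beginNextTransaction() {
            if (remaining == 0) throw new IllegalStateException("batch exhausted");
            remaining--;
            pending.clear();
        }

        public void write(byte[] record) {
            pending.add(new String(record, StandardCharsets.UTF_8));
        }

        public void commit() {
            committedRecords.addAll(pending);
            pending.clear();
        }

        public void abort() { pending.clear(); }
    }

    public static void main(String[] args) {
        InMemoryBatch batch = new InMemoryBatch(2);
        int n = writeAll(batch, List.of(List.of("a", "b"), List.of("c")));
        System.out.println(n + " transactions committed: " + batch.committedRecords);
    }
}
```

Aborting in the catch block keeps a failed write from leaving a transaction open until the metastore expires it.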

Transactions in a TransactionBatch are eventually expired by the Metastore if not committed or aborted within hive.txn.timeout seconds. The TransactionBatch class provides a heartbeat() method to prolong the lifetime of unused transactions in the batch. A good rule of thumb is to call heartbeat() at intervals of (hive.txn.timeout/2) after creating a TransactionBatch. This is sufficient to keep an inactive transaction alive without loading the metastore unnecessarily.
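One way to apply the half-timeout rule of thumb is a periodic task, sketched below using only the JDK scheduler. The HeartbeatScheduler class and its methods are hypothetical names introduced for illustration, the heartbeat itself is passed in as a plain Runnable (in an application it would wrap the batch's heartbeat() call), and the timeout value of 300 seconds is just an example input.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class HeartbeatScheduler {
    /** Half of the transaction timeout, in seconds, but never less than one second. */
    static long intervalSeconds(long txnTimeoutSeconds) {
        if (txnTimeoutSeconds <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        return Math.max(1L, txnTimeoutSeconds / 2);
    }

    /** Runs the given heartbeat action repeatedly at timeout/2 intervals. */
    static ScheduledFuture<?> schedule(ScheduledExecutorService executor,
                                       Runnable heartbeat,
                                       long txnTimeoutSeconds) {
        long interval = intervalSeconds(txnTimeoutSeconds);
        return executor.scheduleAtFixedRate(heartbeat, interval, interval, TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        long timeout = 300; // example value for hive.txn.timeout, in seconds
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // Placeholder action; an application would call heartbeat() on its batch here.
        ScheduledFuture<?> task = schedule(executor,
                () -> System.out.println("heartbeat"), timeout);
        System.out.println("heartbeat interval (s): " + intervalSeconds(timeout));
        task.cancel(false);   // stop heartbeating once the batch is closed
        executor.shutdown();
    }
}
```

If the batch object is not safe for concurrent use, the heartbeat action must be synchronized with the thread that writes and commits, or issued from that same thread between transactions.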

Generally, the more events included in each transaction, the higher the achievable throughput.

Note: From Hive 1.3.0 onwards, invoking TxnBatch.close() will cause all unused transactions in the current TxnBatch to be aborted.

...