Thrift API
This page discusses the Thrift client API for HBase. Thrift is both cross-platform and more lightweight than REST for many operations.
The latest version of the HBase Thrift API is described by Hbase.thrift.
Using the API
Generating a Thrift client package
Once Thrift is installed, use:
thrift --gen [lang] [hbase-root]/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift
lang should be one of java, cpp, rb, py, perl, or another language listed in Hbase.thrift.
This produces a directory named after the language (gen-py, gen-rb, etc.) containing the generated client code.
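For Python, the generated directory has to be importable before the hbase package inside it can be used. A minimal sketch, assuming the generator was run in the current working directory so that gen-py sits next to the script; the helper name add_generated_package is hypothetical:

```python
import os
import sys


def add_generated_package(gen_dir="gen-py"):
    """Put the Thrift-generated directory on sys.path (hypothetical helper).

    gen_dir is assumed to be the directory produced by `thrift --gen py`.
    Returns the absolute path that was added.
    """
    path = os.path.abspath(gen_dir)
    if path not in sys.path:
        # Insert at the front so the generated package wins over same-named modules.
        sys.path.insert(0, path)
    return path
```

After calling add_generated_package(), an import such as from hbase import Hbase should resolve against the generated code, provided the thrift Python library is also installed.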
Starting the Thrift server
The Thrift server can be started with:
[hbase-root]/bin/hbase thrift start
Using with Python
See Yann's tutorial (July 2008).
To acquire a Thrift client instance:
from thrift.transport.TSocket import TSocket
from thrift.transport.TTransport import TBufferedTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

transport = TBufferedTransport(TSocket(host, port))
transport.open()
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
Use help(client) to view the Python API.
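The snippet above opens the transport but never closes it. One way to make sure the connection is released is a small context manager around the transport; this is a sketch only, and the name opened is hypothetical rather than part of Thrift or HBase:

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open a Thrift transport and close it again when the block exits."""
    transport.open()
    try:
        yield transport
    finally:
        # Runs on normal exit and on exceptions, so the socket is not leaked.
        transport.close()


# Intended use with the objects from the snippet above (not run here):
#   with opened(TBufferedTransport(TSocket(host, port))) as transport:
#       client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
#       print(client.getTableNames())
```

The helper relies only on the open() and close() methods of the transport, so it works with any object that provides both.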
Data Type Spec
See Hbase.thrift for an up-to-date API and a complete definition of all methods and data types.
This section contains some definitions of Thrift data types needed for communication.
ColumnDescriptor
Used by getColumnDescriptors.
struct ColumnDescriptor {
  1:Text name,
  2:i32 maxVersions = 3,
  3:string compression = "NONE",
  4:bool inMemory = 0,
  5:string bloomFilterType = "NONE",
  6:i32 bloomFilterVectorSize = 0,
  7:i32 bloomFilterNbHashes = 0,
  8:bool blockCacheEnabled = 0,
  9:i32 timeToLive = -1
}
TRegionInfo
Used by getTableRegions.
struct TRegionInfo {
  1:Text startKey,
  2:Text endKey,
  3:i64 id,
  4:Text name,
  5:byte version,
  6:Text serverName,
  7:i32 port
}
Mutation
Used when performing batch update operations. Set isDelete to true to delete the cell instead of writing a value.
struct Mutation {
  1:bool isDelete = 0,
  2:Text column,
  3:Text value,
  4:bool writeToWAL = 1
}
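To illustrate how the fields combine, the sketch below builds a put and a delete. The Mutation class here is a stand-in that mirrors the struct so the example runs on its own; real code would use the generated class instead (assumed to be importable from hbase.ttypes). The helper names put and delete, and the "family:qualifier" column values, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mutation:
    """Stand-in mirroring the Thrift struct and its defaults (not the generated class)."""
    isDelete: bool = False
    column: Optional[bytes] = None
    value: Optional[bytes] = None
    writeToWAL: bool = True


def put(column, value):
    """Mutation that writes value into column (hypothetical helper)."""
    return Mutation(column=column, value=value)


def delete(column):
    """Mutation that deletes the cell in column; no value is needed (hypothetical helper)."""
    return Mutation(isDelete=True, column=column)


# One batch can mix both kinds of mutation.
mutations = [put(b"info:name", b"alice"), delete(b"info:obsolete")]
```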
TRowResult
Returned by getRow and the other row-retrieval methods.
struct TRowResult {
  1:Text row,
  2:map<Text, TCell> columns
}
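In Python, a TRowResult is often easier to handle once flattened into a plain dictionary. The sketch below assumes each TCell exposes its payload as a value attribute (as defined in Hbase.thrift); row_to_dict is a hypothetical helper name.

```python
def row_to_dict(row_result):
    """Flatten one TRowResult into {column: value}, dropping cell timestamps."""
    return {column: cell.value for column, cell in row_result.columns.items()}
```

The row key itself stays available as row_result.row.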
Method Spec
This section contains some definitions of the methods exposed to clients. If you have a method to propose, file a JIRA issue with a comment explaining why the method is needed.
Meta-info methods
Get Table Names
Returns a list of table names.
list<Text> getTableNames()
Get Column Descriptors
Returns the column families of a given table as a map from family name to ColumnDescriptor.
map<Text,ColumnDescriptor> getColumnDescriptors(Text tableName)
Get Table Regions
Returns a list of the regions that make up a table.
list<TRegionInfo> getTableRegions(Text tableName)
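The meta-info calls can be combined into a quick summary of one table. This is a sketch, assuming client is a connected Hbase.Client; describe_table is a hypothetical helper name.

```python
def describe_table(client, table_name):
    """Collect column family names and region boundaries for one table."""
    # getColumnDescriptors returns a map keyed by column family name.
    families = sorted(client.getColumnDescriptors(table_name))
    # Each TRegionInfo carries the start and end keys of one region.
    regions = [(r.startKey, r.endKey) for r in client.getTableRegions(table_name)]
    return {"families": families, "regions": regions}
```

Calling it for every name returned by client.getTableNames() gives an overview of all tables.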
Row methods
Get Row
Retrieves a list<TRowResult> for a given row. Hbase.thrift defines many other methods that get rows with different parameters, covering the usual options (timestamp, selected columns).
list<TRowResult> getRow(Text tableName, Text row, map<Text, Text> attributes)
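Because getRow returns a list even though it addresses a single row, callers usually unwrap the result. The sketch below assumes that a row which does not exist comes back as an empty list; get_single_row is a hypothetical helper name.

```python
def get_single_row(client, table_name, row, attributes=None):
    """Return the TRowResult for row, or None if the list comes back empty."""
    results = client.getRow(table_name, row, attributes or {})
    return results[0] if results else None
```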
Mutate Row (Put)
Send a series of mutation commands (put, delete) to the table.
void mutateRow(Text tableName, Text row, list<Mutation> mutations, map<Text, Text> attributes)
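A single mutateRow call can carry puts and deletes for the same row. The sketch below takes the Mutation class as a parameter so it does not depend on the generated package being importable; write_row and its parameter names are hypothetical, and empty attributes are passed as an assumption.

```python
def write_row(client, mutation_cls, table_name, row, puts, deletes=()):
    """Apply puts and deletes to one row with a single mutateRow call.

    mutation_cls: the generated Mutation class (or a compatible stand-in).
    puts:         dict mapping column to the value to write.
    deletes:      iterable of columns whose cells should be removed.
    Returns the number of mutations sent.
    """
    mutations = [mutation_cls(column=c, value=v) for c, v in puts.items()]
    mutations += [mutation_cls(column=c, isDelete=True) for c in deletes]
    client.mutateRow(table_name, row, mutations, {})
    return len(mutations)
```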
Delete Row
Delete an entire row.
void deleteAllRow(Text tableName, Text row, map<Text, Text> attributes)
Scanner methods
Scanner methods use a scanner ID that is maintained on the Thrift server. ScannerID is defined as:
typedef i32 ScannerID
In Thrift, an i32 is a signed 32-bit integer. The ScannerID value identifies the scanner object on the Thrift server.
Open Scanner
Create a scanner for a table with some options.
ScannerID scannerOpen(Text tableName, Text startRow, list<Text> columns, map<Text, Text> attributes)
Get Scanner Results
Retrieve one or more records from the scanner at once.
list<TRowResult> scannerGetList(ScannerID id, i32 numberOfRows)
Close Scanner
Close a scanner.
void scannerClose(ScannerID id)
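The three scanner calls are normally used together: open, fetch in batches until nothing comes back, then close. The generator below is a sketch of that loop, assuming an exhausted scanner returns an empty list from scannerGetList; scan_rows and its default batch size are hypothetical.

```python
def scan_rows(client, table_name, start_row=b"", columns=None, batch_size=100):
    """Yield TRowResult objects from a table, fetching batch_size rows per call."""
    scanner_id = client.scannerOpen(table_name, start_row, columns or [], {})
    try:
        while True:
            batch = client.scannerGetList(scanner_id, batch_size)
            if not batch:
                # Assumed end-of-scan signal: an empty list.
                break
            yield from batch
    finally:
        # Scanner state lives on the Thrift server, so always release it,
        # even if the caller stops iterating early or an error occurs.
        client.scannerClose(scanner_id)
```

A caller can then write for row in scan_rows(client, table_name): ... without handling scanner IDs directly.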