...
No Format |
---|
CREATE EXTERNAL TABLE countries(key string, name string, country string, country_id int) STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler' WITH SERDEPROPERTIES ("accumulo.columns.mapping" = ":rowID,info:name,info:country,info:country_id"); |
Create an indexed table
To take advantage of indexing, Hive uses a second Accumulo table to hold a lexicographically sorted search-term index for each field, which allows very efficient exact-match and bounded-range searches.
No Format |
---|
CREATE TABLE company_stats (
rowid string,
active_entry boolean,
num_offices tinyint,
num_personel smallint,
total_manhours int,
num_shareholders bigint,
eff_rating float,
err_rating double,
yearly_production decimal,
start_date date,
address varchar(100),
phone char(13),
last_update timestamp )
ROW FORMAT SERDE 'org.apache.hadoop.hive.accumulo.serde.AccumuloSerDe'
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES (
"accumulo.columns.mapping" = ":rowID,a:act,a:off,a:per,a:mhs,a:shs,a:eff,a:err,a:yp,a:sd,a:addr,a:ph,a:lu",
"accumulo.table.name"="company_stats",
"accumulo.indextable.name"="company_stats_idx"
); |
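As a hedged illustration of the kinds of predicates the index is meant to serve, the queries below apply an exact-match filter and a bounded-range filter to the `company_stats` table defined above. The literal values are made up for the example, and whether a given predicate is actually answered from `company_stats_idx` depends on the storage handler version and configuration, so treat this as a sketch rather than a guarantee.

```sql
-- Exact match on a single field (the value 5 is illustrative).
SELECT rowid, num_offices, address
FROM company_stats
WHERE num_offices = 5;

-- Bounded range on a single field (the bounds are illustrative).
SELECT rowid, total_manhours
FROM company_stats
WHERE total_manhours >= 1000 AND total_manhours <= 5000;
```

Running `EXPLAIN` on either query is one way to check how Hive plans to evaluate the filter in a given deployment.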
Acknowledgements
I would be remiss not to mention the efforts of Brian Femiano, whose initial prototype for Accumulo-Hive integration was the basis for this storage handler.