Using HCatalog

Version information

HCatalog graduated from the Apache incubator and merged with the Hive project on March 26, 2013.
Hive version 0.11.0 is the first release that includes HCatalog.

...

Joe in data acquisition uses distcp to get data onto the grid.

No Format

hadoop distcp file:///file.dat hdfs://data/rawevents/20100819/data

hcat -e "alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data'"
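
Joe's two steps repeat for every day of data, so they could be wrapped in a small helper that takes the partition value as an argument. The following is only a sketch: the function name `publish_commands` is hypothetical, the source file and HDFS location are copied from the example above, and the command string is passed to `hcat` with its `-e` option. The helper prints the commands instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print the two commands Joe runs for a given day.
# It only echoes them (a dry run); pipe the output to `sh` to execute.
publish_commands() {
    ds="$1"                                    # partition value, e.g. 20100819
    target="hdfs://data/rawevents/${ds}/data"  # location pattern from the example above
    printf '%s\n' "hadoop distcp file:///file.dat ${target}"
    printf '%s\n' "hcat -e \"alter table rawevents add partition (ds='${ds}') location '${target}'\""
}
```

Calling `publish_commands 20100819` prints the distcp command followed by the `hcat` command for that day.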

...

Without HCatalog, Joe must tell Sally manually when the data is available, or Sally must poll HDFS.

No Format

A = load '/data/rawevents/20100819/data' as (alpha:int, beta:chararray, ...);
B = filter A by bot_finder(zeta) == 0;
...
store Z into 'data/processedevents/20100819/data';

With HCatalog, a JMS message is sent when the data is available. The Pig job can then be started.

No Format

A = load 'rawevents' using org.apache.hive.hcatalog.pig.HCatLoader();
B = filter A by date == '20100819' and bot_finder(zeta) == 0;
...
store Z into 'processedevents' using org.apache.hive.hcatalog.pig.HCatStorer('date=20100819');
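
Once the availability message arrives, something has to start the Pig job for the right day. One possible way to do that from a shell is sketched below. It assumes the Pig script is saved under the invented name `process_events.pig` and refers to the day as `$DATE` (for example `HCatStorer('date=$DATE')`) instead of hard-coding it. The function name `pig_command` is also hypothetical. The helper only prints the command line.

```shell
#!/bin/sh
# Hypothetical launcher: print the Pig invocation for one day's partition.
# Assumes the Pig script (process_events.pig, an invented name) refers to
# the day as $DATE, e.g. HCatStorer('date=$DATE').
pig_command() {
    day="$1"
    # Accept only an 8-digit yyyymmdd value, matching the partition format above.
    case "$day" in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "usage: pig_command yyyymmdd" >&2; return 1 ;;
    esac
    # -useHCatalog puts the HCatalog jars on Pig's classpath;
    # -param substitutes $DATE inside the script.
    printf '%s\n' "pig -useHCatalog -param DATE=${day} process_events.pig"
}
```

For example, `pig_command 20100819` prints the command for the 20100819 partition, while a malformed value such as `2010-08-19` is rejected with a usage message.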

...

Without HCatalog, Robert must alter the table to add the required partition.

No Format

alter table processedevents add partition (date='20100819') location 'hdfs://data/processedevents/20100819/data';

select advertiser_id, count(clicks)
from processedevents
where date = '20100819'
group by advertiser_id;

With HCatalog, Robert does not need to modify the table structure.

No Format

select advertiser_id, count(clicks)
from processedevents
where date = '20100819'
group by advertiser_id;

...

WebHCat is a REST API for HCatalog. (REST stands for "representational state transfer", a style of API based on HTTP verbs.) WebHCat was originally named Templeton. For more information, see the WebHCat manual.
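
As a rough illustration of the REST style, the partitions of a table can be listed with an HTTP GET on a WebHCat DDL resource. The sketch below only builds the URL. The function name `webhcat_partitions_url` is hypothetical, and the host, database, and user values are placeholders. Port 50111 is WebHCat's default and may differ on a given installation.

```shell
#!/bin/sh
# Hypothetical helper: build the WebHCat URL that lists a table's partitions.
# Arguments: host, database, table, user name (all supplied by the caller).
webhcat_partitions_url() {
    host="$1"; db="$2"; table="$3"; user="$4"
    # 50111 is the default WebHCat port; the API still lives under /templeton/v1.
    printf '%s\n' "http://${host}:50111/templeton/v1/ddl/database/${db}/table/${table}/partition?user.name=${user}"
}
```

The URL can then be fetched with any HTTP client, for example `curl -s "$(webhcat_partitions_url localhost default rawevents joe)"`, where `localhost`, `default`, and `joe` are assumed values.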

 

Navigation Links

Next: HCatalog Installation

General: HCatalog Manual, WebHCat Manual, Hive Wiki Home, Hive Project Site

Old version of this document (HCatalog 0.5.0): Overview