...
Sum of a column. avg, min, max can also be used. Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1)
in place of COUNT(*)
.
GROUP BY
Code Block |
---|
hive> FROM invites a INSERT OVERWRITE TABLE events SELECT a.bar, count(*) WHERE a.foo > 0 GROUP BY a.bar; hive> INSERT OVERWRITE TABLE events SELECT a.bar, count(*) FROM invites a WHERE a.foo > 0 GROUP BY a.bar; |
Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1)
in place of COUNT(*)
.
JOIN
Code Block |
---|
hive> FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo; |
...
Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).
Now we can do some complex data analysis on the table u_data
:
...
Note that if you're using Hive 0.5.0 or earlier you will need to use COUNT(1)
in place of COUNT
(*)
.
Apache Weblog Data
The format of Apache weblog is customizable, while most webmasters uses the default.
For default Apache weblog, we can create a table with the following command.
...