Virtual Columns
Hive 0.8.0 provides support for two virtual columns:
One is INPUT__FILE__NAME
, which is the input file's name for a mapper task.
the other is BLOCK__OFFSET__INSIDE__FILE
, which is the current global file position.
For block compressed file, it is the current block's file offset, which is the current block's first byte's file offset.
Since Hive 0.8.0 the following virtual columns have been added:
ROW__OFFSET__INSIDE__BLOCK
RAW__DATA__SIZE
ROW__ID
GROUPING__ID
It is important to note, that all of the virtual columns listed here cannot be used for any other purpose (i.e. table creation with columns having a virtual column will fail with "SemanticException Error 10328: Invalid column name..")
Simple Examples
select INPUT__FILE__NAME
, key, BLOCK__OFFSET__INSIDE__FILE
from src;
select key, count(INPUT__FILE__NAME
) from src group by key order by key;
select * from src where BLOCK__OFFSET__INSIDE__FILE
> 12000 order by key;