Hive 0.8.0 provides support for two virtual columns:
INPUT__FILE__NAME, which is the input file's name for a mapper task.
the other is
BLOCK__OFFSET__INSIDE__FILE, which is the current global file position.
For block compressed file, it is the current block's file offset, which is the current block's first byte's file offset.
BLOCK__OFFSET__INSIDE__FILE from src;
select key, count(
INPUT__FILE__NAME) from src group by key order by key;
select * from src where
BLOCK__OFFSET__INSIDE__FILE > 12000 order by key;