Apache Pig is a platform for analyzing large data sets. Pig's language, Pig Latin, lets you specify a sequence of data transformations such as merging data sets, filtering them, and applying functions to records or groups of records. Pig comes with many built-in functions, but you can also create your own user-defined functions (UDFs) to do special-purpose processing.

Pig Latin programs run in a distributed fashion on a cluster: programs are compiled into Map/Reduce jobs and executed using Hadoop. For quick prototyping, Pig Latin programs can also run in "local mode" without a cluster, in which case all processing takes place in a single local JVM.
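
As a rough illustration of the kind of transformation sequence described above, here is a minimal Pig Latin sketch that merges two data sets, filters records, and applies built-in functions to groups. The file names (visits.tsv, users.tsv) and their fields are hypothetical and exist only for this example.

```pig
-- Hypothetical inputs: tab-separated files invented for this sketch.
visits = LOAD 'visits.tsv' USING PigStorage('\t')
         AS (user:chararray, url:chararray, seconds:int);
users  = LOAD 'users.tsv' USING PigStorage('\t')
         AS (name:chararray, country:chararray);

-- Filter: keep only visits longer than 30 seconds.
long_visits = FILTER visits BY seconds > 30;

-- Merge: join each remaining visit with the matching user record.
joined = JOIN long_visits BY user, users BY name;

-- Group the joined records and apply built-in functions to each group.
by_country = GROUP joined BY country;
stats = FOREACH by_country GENERATE
            group AS country,
            COUNT(joined) AS visit_count,
            AVG(joined.seconds) AS avg_seconds;

-- Print the result to the console (handy while prototyping).
DUMP stats;
```

Assuming the script is saved as example.pig (again, a hypothetical name), it can be tried without a cluster by running `pig -x local example.pig`; omitting `-x local` submits it to Hadoop instead.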

Do you Pig? At Yahoo! 40% of all Hadoop jobs are run with Pig. Come join us!

General Information

User Documentation

  • User Documentation
  • PiggyBank - User-defined functions (UDFs) contributed by Pig users!
  • PigTools - Tools Pig users have built around and on top of Pig.
  • PigInteroperability - How to make Pig work with other platforms you may be using, such as HBase and Cassandra.
  • Penny - A distributed debugging framework for Pig.

Developer Documentation

Related Resources
