Index

Apache Pig is a platform for analyzing large data sets. Pig's language, Pig Latin, lets you specify a sequence of data transformations such as merging data sets, filtering them, and applying functions to records or groups of records. Pig comes with many built-in functions, but you can also write your own user-defined functions (UDFs) for special-purpose processing.

Pig Latin programs run in a distributed fashion on a cluster: programs are compiled into Map/Reduce jobs and executed using Hadoop. For quick prototyping, Pig Latin programs can also run in "local mode" without a cluster, in which case all processing takes place in a single local JVM.
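As a rough illustration of such a sequence of transformations, here is a minimal Pig Latin sketch. The input path, field names, output path, and script file name are all hypothetical and chosen only for this example; the script loads tab-separated records, filters them, groups them, and applies a built-in function to each group.

```pig
-- Hypothetical input: tab-separated lines of (user, url, time).
visits = LOAD 'input/visits.tsv' USING PigStorage('\t')
         AS (user:chararray, url:chararray, time:long);

-- Filter: keep only records that actually have a URL.
valid = FILTER visits BY url IS NOT NULL;

-- Group the remaining records by user.
by_user = GROUP valid BY user;

-- Apply the built-in COUNT function to each group of records.
counts = FOREACH by_user GENERATE group AS user, COUNT(valid) AS num_visits;

-- Write the result to a (hypothetical) output directory.
STORE counts INTO 'output/visit_counts';
```

Assuming the script is saved as visit_counts.pig, it could be tried in local mode with `pig -x local visit_counts.pig`, and submitted to a Hadoop cluster with `pig -x mapreduce visit_counts.pig`.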

Do you Pig? At Yahoo! 40% of all Hadoop jobs are run with Pig. Come join us!

General Information

Why Pig Latin instead of SQL? Pig Latin: A Not-So-Foreign Language ...
Official Apache Pig Website
PigTalksPapers - Pig talks, papers, interviews
PoweredBy - a (partial) list of companies using Pig

User Documentation

User Documentation
PiggyBank - User-defined functions (UDFs) contributed by Pig users!
PigTools - Tools Pig users have built around and on top of Pig.
PigInteroperability - How to make Pig work with other platforms you may be using, such as HBase and Cassandra.
Penny - A distributed debugging framework for Pig.

Developer Documentation

How tos
Road map
Specification Proposals
- PigTestProposal
