Apache Pig is a platform for analyzing large data sets. Pig's language, Pig Latin, lets you specify a sequence of data transformations such as merging data sets, filtering them, and applying functions to records or groups of records. Pig comes with many built-in functions, but you can also write your own user-defined functions (UDFs) for special-purpose processing.
Pig Latin programs run in a distributed fashion on a cluster: they are compiled into MapReduce jobs and executed using Hadoop. For quick prototyping, Pig Latin programs can also run in "local mode" without a cluster, in which case all processing takes place in a single local JVM.
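To give a feel for what a "sequence of data transformations" looks like, here is a minimal sketch of a Pig Latin script. The input file `users.csv`, its comma-separated layout, and the field names `name` and `age` are assumptions made up for this illustration; they do not come from the Pig distribution.

```
-- Hypothetical input: users.csv, one "name,age" record per line.
users   = LOAD 'users.csv' USING PigStorage(',') AS (name:chararray, age:int);

-- Filter: keep only records with age 18 or above.
adults  = FILTER users BY age >= 18;

-- Group the remaining records by age.
by_age  = GROUP adults BY age;

-- Apply a built-in function (COUNT) to each group of records.
counts  = FOREACH by_age GENERATE group AS age, COUNT(adults) AS n;

-- Print the result to the console.
DUMP counts;
```

Assuming the script is saved as `adults.pig` (a hypothetical file name), it can be tried without a cluster by running `pig -x local adults.pig`. Running it without the `-x local` option submits it to the Hadoop cluster Pig is configured for.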
Do you Pig? At Yahoo! 40% of all Hadoop jobs are run with Pig. Come join us!
General Information
- Why Pig Latin instead of SQL? Pig Latin: A Not-So-Foreign Language ...
- Official Apache Pig Website
- PigTalksPapers - Pig talks, papers, interviews
- PoweredBy - a (partial) list of companies using Pig
- Pig book: Programming Pig
User Documentation
- User Documentation
- PiggyBank - User-defined functions (UDFs) contributed by Pig users!
- PigTools - Tools Pig users have built around and on top of Pig.
- PigInteroperability - How to make Pig work with other platforms you may be using, such as HBase and Cassandra.
- Penny - A distributed debugging framework for Pig.
- Pig Tutorial
- FAQ
Developer Documentation
- How tos
- Road map
- Specification Proposals
- Design proposals
- Guide for new contributors
Related Resources
Thanks
YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit's leading software products: YourKit Java Profiler and YourKit .NET Profiler.