Post on 23-Aug-2020

High level data flow language for exploring very large datasets.

PIG

Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMSs. Pig Latin can be extended using User Defined Functions (UDFs), which the user can write in Java, Python, JavaScript, Ruby or Groovy[2] and then call directly from the language.
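To make the UDF idea concrete, here is a minimal sketch of what a Python UDF module might look like. The module name `my_udfs.py`, the function `extract_domain`, and the schema string are all hypothetical and chosen only for illustration. The `pig_util` import is what Pig's streaming Python support is expected to provide; the fallback decorator is an assumption added so the file can also be run outside Pig.

```python
# Hypothetical UDF module, e.g. saved as my_udfs.py (the name is an assumption).
try:
    # Expected to be importable when Pig runs this file as a streaming Python UDF.
    from pig_util import outputSchema
except ImportError:
    # Fallback (assumption) so the module also loads and tests outside Pig:
    # it simply records the declared schema on the function object.
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("domain:chararray")
def extract_domain(url):
    """Return the lowercased host part of a URL, or None for empty input."""
    if not url:
        return None
    # Drop the scheme (e.g. "http://"), then keep everything before the first "/".
    without_scheme = url.split("://", 1)[-1]
    host = without_scheme.split("/", 1)[0]
    return host.lower()
```

In a Pig Latin script, such a module would typically be registered with a statement along the lines of `REGISTER 'my_udfs.py' USING streaming_python AS my_udfs;` and then called as `my_udfs.extract_domain(url)` inside a `FOREACH ... GENERATE`. The exact registration keyword depends on the Pig version and the Python runtime in use.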

Apache Pig was originally developed at Yahoo Research around 2006 for researchers to have an ad-hoc way of creating and executing MapReduce jobs on very large data sets. In 2007, it was moved into the Apache Software Foundation.

Why use Pig

Pig is a powerful tool for querying data in a Hadoop cluster. It's so powerful that Yahoo! estimates that between 40% and 60% of its Hadoop workloads are generated from Pig Latin scripts. With 100,000 CPUs at Yahoo! and roughly 50% running Hadoop, that's a lot of pigs running around.

Pig Users

But Yahoo! isn't the only organization taking advantage of Pig. You'll find Pig at Twitter (processing logs, mining tweet data); at AOL and MapQuest (for analytics and batch data processing); and at LinkedIn, where Pig is used to discover people you might know.

eBay is reportedly using Pig for search optimization, and adyard uses Pig in about half of its recommender system.

Where not to Use Pig

Apache Pig - Architecture
