Apache Pig introduction
Jackson Oliveira (@cyber_jso), Software Engineer
APACHE PIG
A High-Level Analysis Platform
which can be plugged into Hadoop
How does it work?
What is the point of using Pig?!
MapReduce (MR) is not difficult in theory...
But the reality can be different...
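To make the gap between theory and reality concrete, here is a minimal in-memory Python sketch (not Hadoop code) of just one step of a typical pipeline, joining user records to page visits, written in MapReduce style. Mappers tag each record with its source, a shuffle groups everything by the join key, and a reducer pairs the two sides back up. The function names and sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample inputs: (name, age) and (user, url) records.
USERS = [("alice", 20), ("bob", 30)]
PAGES = [("alice", "a.com"), ("alice", "b.com"), ("bob", "a.com")]


def map_users(record):
    name, age = record
    yield name, ("U", age)  # tag the value so the reducer knows its source


def map_pages(record):
    user, url = record
    yield user, ("P", url)


def shuffle(pairs):
    # Group all mapper output by key, as the framework's shuffle phase would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    # Split the tagged values by source again and emit their cross product.
    ages = [v for tag, v in values if tag == "U"]
    urls = [v for tag, v in values if tag == "P"]
    for age in ages:
        for url in urls:
            yield key, age, url


def run_join(users, pages):
    mapped = [kv for r in users for kv in map_users(r)]
    mapped += [kv for r in pages for kv in map_pages(r)]
    return [row for key, vals in shuffle(mapped).items() for row in reduce_join(key, vals)]


print(run_join(USERS, PAGES))
# [('alice', 20, 'a.com'), ('alice', 20, 'b.com'), ('bob', 30, 'a.com')]
```

And this covers only the join. Filtering, grouping and counting, and sorting would each need their own map and reduce logic, and on a real cluster these steps are typically separate jobs that have to be configured and chained by hand.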
We want it to be easy to understand
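Users = LOAD 'users' USING PigStorage('\t') AS (name:chararray, age:int);
Filtered = FILTER Users BY age >= 18 AND age <= 25;    -- keep users aged 18 to 25
Pages = LOAD 'pages' USING PigStorage('\t') AS (user:chararray, url:chararray);
Joined = JOIN Filtered BY name, Pages BY user;          -- match each user to the pages they visited
Grouped = GROUP Joined BY url;
Summed = FOREACH Grouped GENERATE group AS url, COUNT(Joined) AS clicks;  -- clicks per URL
Sorted = ORDER Summed BY clicks DESC;
STORE Sorted INTO 'clicks_per_url';                     -- hypothetical output path; nothing runs without a STORE or DUMP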
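To check what the script computes, here is a minimal Python sketch of the same dataflow on small in-memory lists. It only models the semantics; Pig itself compiles the script into jobs that run on Hadoop. The function name and sample data are hypothetical.

```python
from collections import Counter


def clicks_per_url(users, pages):
    """In-memory equivalent of the example script: FILTER, JOIN, GROUP/COUNT, ORDER."""
    # FILTER Users BY age >= 18 AND age <= 25 (count rows per name to keep join multiplicities)
    matching_users = Counter(name for name, age in users if 18 <= age <= 25)
    # JOIN Filtered BY name, Pages BY user, then GROUP Joined BY url and COUNT each group
    clicks = Counter()
    for user, url in pages:
        if matching_users[user]:
            clicks[url] += matching_users[user]
    # ORDER Summed BY clicks DESC (Python's sort is stable; Pig gives no tie-order guarantee)
    return sorted(clicks.items(), key=lambda item: item[1], reverse=True)


users = [("alice", 20), ("bob", 30), ("carol", 22)]
pages = [("alice", "a.com"), ("carol", "a.com"), ("alice", "b.com"), ("bob", "c.com")]
print(clicks_per_url(users, pages))  # [('a.com', 2), ('b.com', 1)]
```

Seven short Pig statements express what would otherwise be several hand-chained MapReduce jobs.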
Also easy to extend with user-defined functions (UDFs)...
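Pig UDFs can be written in Java or, among other languages, Python. Below is a minimal sketch of a Python UDF, assuming Pig 0.12 or later with `streaming_python` support, where `pig_util.outputSchema` declares the Pig type of the return value. The `age_group` function and its buckets are hypothetical; the `ImportError` fallback is only there so the file can also be tested with plain Python.

```python
try:
    # Available when the file is registered in Pig with streaming_python (assumption: Pig 0.12+).
    from pig_util import outputSchema
except ImportError:
    # No-op stand-in so the function can be imported and tested outside Pig.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("age_group:chararray")
def age_group(age):
    """Hypothetical UDF: bucket an integer age into a coarse label."""
    if age is None:
        return None  # a null input field arrives as None; return null as well
    if age < 18:
        return "minor"
    if age <= 25:
        return "young_adult"
    return "adult"
```

In a script it would then be registered and called along the lines of `REGISTER 'udfs.py' USING streaming_python AS myudfs;` followed by `Labeled = FOREACH Users GENERATE name, myudfs.age_group(age);`.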
It takes care of the execution plan for you
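Pig's real planner builds logical, physical and MapReduce plans and optimizes them; `EXPLAIN Sorted;` prints them for the example script. The toy Python model below only illustrates the core idea: from the alias dependencies alone, derive an order in which the operators can run, and flag the operators that need data to be redistributed (shuffled). The `PLAN` dictionary is transcribed by hand from the example, and treating exactly JOIN, GROUP and ORDER as shuffle points is a simplifying assumption.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Alias -> (operator, input aliases), transcribed from the example script.
PLAN = {
    "Users": ("LOAD", []),
    "Filtered": ("FILTER", ["Users"]),
    "Pages": ("LOAD", []),
    "Joined": ("JOIN", ["Filtered", "Pages"]),
    "Grouped": ("GROUP", ["Joined"]),
    "Summed": ("FOREACH", ["Grouped"]),
    "Sorted": ("ORDER", ["Summed"]),
}

# Simplifying assumption: these operators require a shuffle across the cluster.
BLOCKING = {"JOIN", "GROUP", "ORDER"}


def execution_order(plan):
    """Return aliases so that every input is computed before the alias that uses it."""
    graph = {alias: set(inputs) for alias, (_, inputs) in plan.items()}
    return list(TopologicalSorter(graph).static_order())


def shuffle_points(plan):
    """Aliases whose operator forces a shuffle under the simplified model above."""
    return [alias for alias in execution_order(plan) if plan[alias][0] in BLOCKING]


print(execution_order(PLAN))
print(shuffle_points(PLAN))  # ['Joined', 'Grouped', 'Sorted']
```

In practice Pig may split the work differently from this toy (ORDER BY, for instance, usually adds a sampling job), which is exactly the kind of detail it handles for you.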
When to use Apache Pig?
If you want to get things done faster (with far less code than raw MapReduce)
An active community
You might need to rethink complicated things