MongoDB & Hadoop, Sittin' in a Tree
![Page 1: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/1.jpg)
K Young - CEO, Mortar
MongoDB + Hadoop, sittin’ in a tree
![Page 2: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/2.jpg)
OF THIS SESSION
Overview
• Super-fast intro to Hadoop, Pig
• Why MongoDB + Pig?
• Demo: Move data MongoDB <=> Pig
• Demo: processing data with Pig
![Page 3: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/3.jpg)
SUPER-FAST INTRO
Hadoop
• From Google research
• Built for massive parallelization
• Batch (for now)
• Widely applicable
![Page 4: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/4.jpg)
SUPER-FAST INTRO
Hadoop
![Page 5: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/5.jpg)
Social Graph
![Page 6: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/6.jpg)
Predict
![Page 7: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/7.jpg)
Detect
![Page 8: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/8.jpg)
Genetics
![Page 9: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/9.jpg)
SUPER-FAST INTRO
Hadoop
![Page 10: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/10.jpg)
ON HADOOP
Pig
• Less code
• Expressive code
• Compiles to MR
• Insulates from API
• Popular (LinkedIn, Twitter, Salesforce, Yahoo, Stanford University...)
![Page 11: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/11.jpg)
BRIEF, EXPRESSIVE
LIKE PROCEDURAL SQL
Pig
(thanks: Twitter's Hadoop World presentation)
![Page 12: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/12.jpg)
FOR SERIOUS
The Same Script, In MapReduce
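To give a rough sense of what hand-written MapReduce involves, here is a minimal in-memory sketch in Python. It is not the script from the slide and does not use the Hadoop API: the sample records, `map_phase`, `reduce_phase`, and `run_mapreduce` are all hypothetical names chosen for illustration. It only shows the three steps (map, shuffle, reduce) that a Pig script gets compiled into.

```python
from collections import defaultdict

# Hypothetical sample records; not data from the talk.
PAGE_VIEWS = [
    {"user": "ann", "url": "/home"},
    {"user": "bob", "url": "/home"},
    {"user": "ann", "url": "/pricing"},
]


def map_phase(record):
    """Emit one (key, value) pair per input record."""
    yield record["url"], 1


def reduce_phase(key, values):
    """Combine all values that share a key into a single result."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Run mapper and reducer in one process, grouping by key in between."""
    # Shuffle: collect every mapper output under its key, which is the
    # step a real framework performs between the map and reduce phases.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


visits_per_url = run_mapreduce(PAGE_VIEWS, map_phase, reduce_phase)
print(visits_per_url)
```

Even this toy version needs explicit key/value plumbing for a single count per URL, which is the kind of boilerplate a higher-level language such as Pig hides.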
![Page 13: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/13.jpg)
Alternatives to Hadoop
MONGODB NATIVE MAPREDUCE
Write MapReduce in JavaScript
• JavaScript is not fast
• Has limited data types
• Hard to use complex analytic libs
Adds load to data store
![Page 14: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/14.jpg)
Alternatives to Hadoop
MONGODB AGGREGATION FRAMEWORK
Great when:
• Doing SQL-style aggregation
• You do not require external data libs
• Extra load is OK
![Page 15: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/15.jpg)
MOTIVATIONS
MongoDB + Pig
Data storage and data processing are often separate concerns
Hadoop is built for scalable processing of large datasets
![Page 16: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/16.jpg)
SIMILAR PHILOSOPHY
MongoDB, Pig
Poly-structured data
• MongoDB: stores data, regardless of structure
• Pig: reads data, regardless of structure (got its name because pigs are omnivorous)
![Page 17: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/17.jpg)
Mortar
FAST INTRO
Open-source code-based dev framework for data, built on Hadoop and Pig
Inspired by Rails
Self-contained, organized, executable projects
![Page 18: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/18.jpg)
> gem install mortar
> git clone https://github.com/mortardata/mongo-pig-examples.git
![Page 19: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/19.jpg)
LOAD
MONGO => PIG
Mongo-Hadoop connector
LOAD 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>' USING com.mongodb.hadoop.pig.MongoLoader();
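One practical detail when filling in this template: MongoDB connection strings require reserved characters in the username and password (such as `@`, `:` and `/`) to be percent-encoded. A small Python sketch of building the URI, assuming a hypothetical helper name `mongo_uri` and made-up credentials and host:

```python
from urllib.parse import quote_plus


def mongo_uri(username, password, host, port, database, collection):
    """Fill in the mongodb:// template, percent-encoding the credentials."""
    return (
        f"mongodb://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}.{collection}"
    )


# Hypothetical values for illustration only.
uri = mongo_uri("analyst", "p@ss:word/1", "db.example.com", 27017, "app", "events")
print(uri)
```

The resulting string is what would go between the quotes of the `LOAD` statement.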
![Page 20: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/20.jpg)
STORE
PIG => MONGO
STORE result INTO 'mongodb://<username>:<password>@<host>:<port>/<database>.<collection>'
    USING com.mongodb.hadoop.pig.MongoStorage(
        'update [key1, key2, key3]',
        '{key1: 1, key2: 1, key3: 1}, {unique:false, dropDups: false}');
![Page 21: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/21.jpg)
What’s my schema?
GENERATE IT
Pig is schema-optional.
No schema: document#'user'#'name'
With schema: user.name
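As a loose analogy in Python (not Pig itself), the difference between the two access styles looks roughly like this. The sample document and the `User` and `Record` types are hypothetical, introduced only for illustration.

```python
from typing import NamedTuple

# Hypothetical document, shaped like something read from a collection.
document = {"user": {"name": "Ann", "age": 34}}

# No schema: every level is a map lookup by string key,
# comparable to document#'user'#'name' in Pig.
name_without_schema = document["user"]["name"]


# With schema: fields are declared up front and read by name,
# comparable to user.name in Pig.
class User(NamedTuple):
    name: str
    age: int


class Record(NamedTuple):
    user: User


record = Record(user=User(**document["user"]))
name_with_schema = record.user.name

print(name_without_schema, name_with_schema)
```

Both styles reach the same value; the schema version trades some up-front declaration for shorter field references.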
![Page 22: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/22.jpg)
What’s in the collection?
CHARACTERIZE IT
Hadoop-based utility describes your collection
• Field name
• Unique value count
• Example value
• Data type
• Example value count
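The kind of report listed above can be sketched in plain Python over a small in-memory list of documents. This is not the Hadoop-based utility itself: the function name `characterize`, the sample documents, and the choice to use the most frequent value as the "example value" are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def characterize(documents):
    """Summarize each top-level field found in a list of dict documents."""
    values_by_field = defaultdict(Counter)
    types_by_field = defaultdict(Counter)
    for doc in documents:
        for field, value in doc.items():
            # repr() lets unhashable values (lists, nested dicts) be counted.
            values_by_field[field][repr(value)] += 1
            types_by_field[field][type(value).__name__] += 1

    report = []
    for field in sorted(values_by_field):
        counts = values_by_field[field]
        # Assumption: report the most frequent value as the example.
        example, example_count = counts.most_common(1)[0]
        report.append({
            "field": field,
            "unique_values": len(counts),
            "example_value": example,
            "example_value_count": example_count,
            "data_type": types_by_field[field].most_common(1)[0][0],
        })
    return report


# Hypothetical sample documents.
docs = [
    {"name": "Ann", "plan": "free"},
    {"name": "Bob", "plan": "free"},
    {"name": "Cy", "plan": "pro", "age": 41},
]
report = characterize(docs)
for row in report:
    print(row)
```

A single pass like this works for small samples; for a large collection the same per-field counting is the sort of job that would be spread across a cluster.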
![Page 23: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/23.jpg)
Appendix
LINKS
Reference:
http://help.mortardata.com/reference/loading_and_storing_data/MongoDB
Mongo-Hadoop connector
https://github.com/mortardata/mongo-hadoop
![Page 24: MongoDB & Hadoop, Sittin' in a Tree](https://reader034.vdocument.in/reader034/viewer/2022042614/55515a80b4c905a8768b4be8/html5/thumbnails/24.jpg)
@kky @mortardata
help.mortardata.com