Dan Bassett, Jonathan Canfield, December 13, 2011
![Page 1: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/1.jpg)
Dan Bassett, Jonathan Canfield
December 13, 2011
![Page 2: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/2.jpg)
What is Hadoop?
• Allows for the distributed processing of large data sets across clusters of computers
• Open-source project written in Java
• Actively supported
• Inspired by a project that Google started
![Page 3: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/3.jpg)
What’s the big deal?
• Changes the economics and dynamics of large scale computing
• Scalable
• Cost-effective
• Flexible
• Fault-tolerant
![Page 4: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/4.jpg)
Commercially supported
• InfoSphere BigInsights
• Silicon Graphics CloudRack
• EMC Greenplum
• Google App Engine
• Oracle Big Data Appliance
• Cloudera CDH, Professional Services
• Microsoft Windows Server, SQL Server
![Page 5: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/5.jpg)
Who Uses Hadoop?
![Page 6: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/6.jpg)
Prominent Users
• Facebook - claims to have the largest Hadoop cluster in the world at 30PB.
• Yahoo! - claims to have the world’s largest Hadoop production application.
• eBay - 5.3 PB on a 532-node cluster
• New York Times - processed 4 TB of image data into 11 million PDFs at a cost of about $240
![Page 7: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/7.jpg)
HOW DOES IT WORK?
![Page 8: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/8.jpg)
Architecture
• Hadoop Common
• Hadoop Distributed File System (HDFS)
• MapReduce Engine
![Page 9: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/9.jpg)
File System (HDFS)
• One big file system from many nodes
• Fault-tolerant
• Runs on low-cost commodity hardware
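To make "one big file system from many nodes" and "fault-tolerant" concrete, here is a minimal Python sketch of how a file might be cut into blocks and each block copied to several nodes. It is an illustration only, not HDFS code: `place_blocks` and `readable_after_failure` are hypothetical helpers, the 64 MB block size and replication factor of 3 are the commonly cited defaults of that era, and the round-robin placement is a simplifying assumption (real HDFS placement is rack-aware).

```python
import math

BLOCK_SIZE_MB = 64   # assumed block size (commonly cited default at the time)
REPLICATION = 3      # assumed number of copies kept of each block


def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB,
                 replication=REPLICATION):
    """Split a file into blocks and assign each block to `replication` nodes.

    Placement is simple round-robin, an assumption made for illustration.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_nodes):
    """Return True if every block still has at least one live replica."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )
```

With four nodes and a 200 MB file, this yields four blocks, each stored on three different nodes, so losing any two nodes still leaves every block readable. That redundancy is what lets the cluster run on inexpensive hardware that is expected to fail occasionally.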
![Page 10: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/10.jpg)
MapReduce Engine
• Splits input data
• Assigns work to nodes
• Processes the work in parallel
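The three bullets above can be sketched as a word count, the usual introductory MapReduce example. This is a single-machine simulation in plain Python, not the Hadoop API: the function names are hypothetical, worker threads stand in for cluster nodes, and the grouping step between map and reduce (often called the shuffle) is shown explicitly.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_input(lines, num_splits):
    """Divide the input lines into roughly equal chunks (the "splits")."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_words(chunk):
    """Map step: emit a (word, 1) pair for every word in one split."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group the emitted values by key so each word is reduced once."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each word."""
    return {word: sum(values) for word, values in grouped.items()}


def word_count(lines, num_workers=3):
    """Run split -> map (in parallel) -> shuffle -> reduce."""
    splits = split_input(lines, num_workers)
    # Threads stand in for cluster nodes; each one maps a single split.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(map_words, splits))
    return reduce_counts(shuffle(mapped))
```

Calling `word_count(["the quick brown fox", "the lazy dog", "The end"])` counts "the" three times, because each split is mapped independently and the per-word values are only combined in the reduce step. In a real cluster the same structure applies, with the splits read from HDFS and the map and reduce tasks scheduled on different machines.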
![Page 11: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/11.jpg)
MapReduce Illustration
![Page 12: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/12.jpg)
MapReduce Step 1
![Page 13: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/13.jpg)
MapReduce Step 2
![Page 14: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/14.jpg)
MapReduce Step 3
![Page 15: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/15.jpg)
MapReduce Step 4
![Page 16: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/16.jpg)
MapReduce Step 4
![Page 17: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/17.jpg)
MapReduce Step 5
![Page 18: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/18.jpg)
MapReduce Step 5
![Page 19: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/19.jpg)
MapReduce Step 6
![Page 20: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/20.jpg)
MapReduce Illustration
![Page 21: Dan Bassett, Jonathan Canfield December 13, 2011](https://reader036.vdocument.in/reader036/viewer/2022070420/56815e99550346895dcd29dc/html5/thumbnails/21.jpg)
Resources
• Project Home: http://hadoop.apache.org/
• Wikipedia: http://en.wikipedia.org/wiki/Apache_Hadoop
• IBM: http://www-01.ibm.com/software/data/infosphere/hadoop/