Introduction to Hadoop
Lynx Consultants training about Hadoop
What’s behind Big Data
Marc Cluet – Lynx Consultants
What we’ll cover
- Understand the Hadoop components
- Understand the different technologies involved
- Embrace Big Data!
Lynx Consultants © 2013
What is Big Data?
- SQL has a limited ability to process changing data
  - The SQL schema is the truth, and the data has to fit it
- Big Data is the solution!
  - Data can be truly dynamic
  - Designed to handle terabytes of data
  - Designed for fault tolerance and securing data
  - Designed to exploit hardware to the fullest
  - Designed around Map/Reduce
Who runs Big Data?
- A few small companies
What is Hadoop?
- Hadoop is one of the big players in Big Data
  - Developed as an open-source implementation of the ideas in Google's GFS and MapReduce papers (Google BigTable is the model for HBase)
  - Mainly developed at Yahoo!
  - Companies behind it (as of 2013): Hortonworks and Cloudera
What are the features of Hadoop?
- HDFS – Hadoop Distributed File System
  - A filesystem distributed across many nodes
  - Keeps several copies of your data (default: 3)
  - If a node goes down, the copies it held are re-created on the remaining nodes
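The replication behaviour of HDFS can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HDFS code: the `ToyCluster` class, its methods and the node and block names are all made up for illustration, and real HDFS placement also considers racks and disk usage.

```python
import random

REPLICATION = 3  # the default number of copies mentioned on the slide


class ToyCluster:
    """Toy model of block replication; not the real HDFS implementation."""

    def __init__(self, nodes, seed=0):
        self.nodes = set(nodes)
        self.blocks = {}  # block id -> set of nodes holding a copy
        self.rng = random.Random(seed)

    def write_block(self, block_id):
        # Place copies on REPLICATION distinct nodes (fewer on a tiny cluster).
        count = min(REPLICATION, len(self.nodes))
        self.blocks[block_id] = set(self.rng.sample(sorted(self.nodes), count))

    def fail_node(self, node):
        # Drop the node, then re-create the missing copies on other nodes.
        self.nodes.discard(node)
        for holders in self.blocks.values():
            holders.discard(node)
            candidates = sorted(self.nodes - holders)
            missing = min(REPLICATION - len(holders), len(candidates))
            if missing > 0:
                holders.update(self.rng.sample(candidates, missing))


cluster = ToyCluster(["node1", "node2", "node3", "node4", "node5"])
for block in ("blk_1", "blk_2", "blk_3"):
    cluster.write_block(block)

cluster.fail_node("node2")
for block, holders in sorted(cluster.blocks.items()):
    print(block, sorted(holders))
```

After `node2` fails, every block is back to three copies, none of them on the failed node.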
- HBase – Hadoop NoSQL database
  - Schemaless key-value storage
  - All data exportable in JSON
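To make "schemaless key-value storage" concrete, here is a minimal sketch that uses a plain Python dict as a stand-in for a table. It does not use the HBase client API; the `put` helper, the row keys and the `family:qualifier` column names are hypothetical.

```python
import json

# Toy stand-in for a table: row key -> {"family:qualifier": value}.
# Hypothetical data; this is a plain dict, not the HBase client API.
table = {}


def put(row_key, column, value):
    """Store one cell; rows do not need to share the same columns."""
    table.setdefault(row_key, {})[column] = value


put("user1", "info:name", "Alice")
put("user1", "info:email", "alice@example.com")
put("user2", "info:name", "Bob")
put("user2", "stats:logins", 42)  # a column that user1 does not have

# Export the whole toy table as JSON.
exported = json.dumps(table, sort_keys=True, indent=2)
print(exported)
```

Each row carries only the columns that were written to it, which is the property the slide calls schemaless.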
- Map/Reduce – The key to it all
  - The programming model was invented by Google
  - Given a dataset, the Map step turns every record that matches our criteria into key/value pairs
  - The Reduce step then combines the values for each key into a result
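The Map and Reduce steps can be illustrated with the classic word count. This is a single-process sketch of the idea only; the function names are made up, and a real Hadoop job would run the same phases in parallel across the data nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


lines = ["big data is big", "hadoop handles big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

Because each `map_phase` call looks at one record only and each `reduce_phase` call looks at one key only, both phases can be spread over many machines.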
- Hive – SQL for NoSQL
  - Hive provides a SQL-like language called HiveQL
  - Provides a good entry point for SQL users :)
- Pig – Map/Reduce made easy
  - Produces results from scripts written in a small data-flow language (Pig Latin)
  - Reinvents SQL, in a way
- Flume – Fault-tolerant transport
  - Divided into sources, channels and sinks
  - You can run several of each, which makes it fault tolerant
  - Many sources: Avro, Exec, JMS, Syslog, HTTP, NetCat, your own (Java)
  - Many channels: Memory, File, your own (Java)
  - Many sinks: Avro, HDFS, Logger, IRC, File, HBase, ElasticSearch, S3, community sinks, your own (Java)
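The source, channel and sink split can be sketched as a toy pipeline. Nothing here is Flume code or Flume configuration: `MemoryChannel`, `list_source`, `failover_sink` and the two sink functions are hypothetical names, and the sketch only shows why having more than one sink helps when one of them is down.

```python
from collections import deque


class MemoryChannel:
    """Toy in-memory buffer between a source and a sink."""

    def __init__(self):
        self.events = deque()

    def put(self, event):
        self.events.append(event)

    def take(self):
        return self.events.popleft() if self.events else None


def list_source(lines, channel):
    """Source: turns each input line into an event and queues it."""
    for line in lines:
        channel.put({"body": line})


def failover_sink(channel, sinks):
    """Sink side: deliver each event to the first sink that accepts it."""
    while (event := channel.take()) is not None:
        for sink in sinks:
            try:
                sink(event)
                break
            except ConnectionError:
                continue  # try the next sink
        # Note: if every sink fails, this toy drops the event;
        # a real agent would keep it in the channel instead.


delivered = []


def broken_sink(event):
    raise ConnectionError("primary sink is down")


def backup_sink(event):
    delivered.append(event["body"])


channel = MemoryChannel()
list_source(["login ok", "disk full"], channel)
failover_sink(channel, [broken_sink, backup_sink])
print(delivered)
```

The channel decouples the two sides: the source only ever talks to the channel, so a failing sink never blocks ingestion directly.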
What Hadoop looks like in a DC
- Components
  - Primary NameNode
    - Controls the whole cluster and knows where the data resides
    - Runs the JobTracker to keep track of Map/Reduce jobs
    - Biggest point of failure; shadowing it is a potential option
  - Secondary NameNode
    - Performs secondary cleanup operations (it periodically checkpoints the NameNode metadata; it is not a hot standby)
  - DataNode
    - Stores all the information
    - Runs the Map/Reduce tasks
Questions?