Big Data Processing Architecture: Hadoop (Part 1) - debussy.im.nuu...
-
Big Data Technologies and Applications
Introduction to Hadoop
-
What is Hadoop?
Hadoop is an open-source project of the Apache Software Foundation.
Hadoop is written in Java.
The core of Hadoop is HDFS (Hadoop Distributed File System) and MapReduce.
-
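History of Hadoop
Hadoop grew out of Nutch, a project started in 2002 by Doug Cutting and Mike Cafarella.
In 2004, Nutch implemented NDFS (Nutch Distributed File System), modeled on GFS. NDFS is the predecessor of HDFS.
In 2004, the MapReduce paper was published.
In 2005, Nutch implemented MapReduce.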
-
In February 2006, NDFS and MapReduce were moved out of Nutch into a Lucene subproject, which Doug Cutting named Hadoop.
In January 2008, Hadoop became an Apache top-level project.
-
In April 2008, Hadoop sorted 1 TB of data in 209 seconds on a cluster of 910 nodes.
In May 2009, Hadoop cut the time to sort 1 TB to 62 seconds.
-
Features of Hadoop
Handles data at PB scale
Runs on clusters of inexpensive PCs
Runs on Linux
Grows by scaling out (Scale Out)
-
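Scale Out vs. Scale Up
Scale Up (vertical scaling): upgrade a single machine, for example with a faster CPU.
Scale Out (horizontal scaling): add more machines to the cluster.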
-
Hadoop architecture
Hadoop uses a Master/Slave architecture.
Hadoop HDFS is Master/Slave:
  master: NameNode, Secondary NameNode
  slave: DataNode
Hadoop MapReduce is Master/Slave:
  master: JobTracker
  slave: TaskTracker
-
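HDFS
NameNode
  The Master Node. It manages the HDFS Metadata.
  The Metadata records which Blocks make up each file and which DataNodes hold each Block.
  If the NameNode fails, the whole HDFS is unavailable: it is a SPOF (Single Point of Failure).
DataNode
  A Slave Node that stores the data Blocks. Each DataNode reports the Blocks it holds to the NameNode. When the NameNode finds that a Block has too few copies, it tells a DataNode that holds the Block to copy it to another DataNode.
Secondary NameNode
  Keeps a checkpoint of the NameNode's Metadata. It does not take over when the NameNode fails.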
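The NameNode bookkeeping described on this slide can be illustrated with a small sketch. It is not HDFS code: the 128 MB block size, the round-robin placement, and the names `split_into_blocks`, `place_blocks`, and `under_replicated` are all assumptions made for illustration (real HDFS placement is rack-aware).

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes, for illustration only


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_blocks(path, file_size, datanodes, replication=3, block_size=BLOCK_SIZE):
    """Build NameNode-style metadata: block id -> DataNodes holding a replica."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    n = len(datanodes)
    metadata = {}
    for i, _size in enumerate(split_into_blocks(file_size, block_size)):
        # Hypothetical round-robin policy: replica r of block i goes to node (i + r) mod n.
        metadata[f"{path}#blk_{i}"] = [datanodes[(i + r) % n] for r in range(replication)]
    return metadata


def under_replicated(metadata, live_nodes, replication=3):
    """Blocks whose number of replicas on live DataNodes is below the target."""
    return sorted(
        block
        for block, nodes in metadata.items()
        if sum(node in live_nodes for node in nodes) < replication
    )


meta = place_blocks("/logs/a.txt", 300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
lost = under_replicated(meta, live_nodes={"dn1", "dn2", "dn3"})
```

Here a 300 MB file becomes three blocks, and taking `dn4` offline leaves two of them with fewer than three live replicas, which is the situation in which the NameNode would order a re-copy.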
-
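MapReduce
JobTracker
  Runs on the Master Node.
  Accepts jobs from the Client.
  Splits each (MapReduce) job into tasks and assigns them to TaskTrackers.
  It is a Single Point of Failure.
TaskTracker
  Runs on the Worker Nodes.
  Executes the MapReduce tasks it is given.
  Each TaskTracker runs Mapper and Reducer tasks:
    The Input is processed by the Mappers, and Hadoop groups the Mapper output by key.
    The Reducers combine the grouped output into the final result.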
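The Mapper and Reducer roles can be shown with a word count that runs in a single Python process. This is only a sketch of the programming model, not the Hadoop API: `mapper`, `shuffle`, `reducer`, and `run_job` are names chosen here, and the grouping step stands in for the work Hadoop does between the two phases.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values that share a key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: collapse all values of one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(key, values) for key, values in sorted(shuffle(pairs).items()))


counts = run_job(["Hadoop stores data", "Hadoop processes data"])
```

On a real cluster the mapper calls would run in parallel on the TaskTrackers that hold the input blocks, and each reducer would receive only a subset of the keys.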
-
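How a Hadoop job runs
The Client submits a job to the JobTracker.
The JobTracker asks the NameNode where the data is stored.
The NameNode looks up the locations in its Metadata.
The JobTracker assigns MapReduce tasks to TaskTrackers.
Each TaskTracker reads the data from its DataNode and runs the MapReduce task.
When a MapReduce task finishes, the TaskTracker reports back to the JobTracker.
The JobTracker reports the result of the job.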
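The step in which the JobTracker hands tasks to TaskTrackers close to the data can be sketched as a greedy assignment. This is an illustrative assumption about the policy, not the JobTracker's actual scheduler; `assign_tasks`, the slot counts, and the node names are hypothetical.

```python
def assign_tasks(block_locations, free_slots):
    """Assign one task per block, preferring a node that already holds the block.

    block_locations: block id -> list of nodes with a replica (from the NameNode).
    free_slots: node -> number of free task slots (from TaskTracker reports).
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    assignment = {}
    for block, holders in block_locations.items():
        local = [node for node in holders if slots.get(node, 0) > 0]
        if local:
            # Data-local: pick the replica holder with the most free slots.
            node = max(local, key=lambda n: slots[n])
        else:
            # No replica holder is free: fall back to any node with a free slot.
            candidates = [n for n in sorted(slots) if slots[n] > 0]
            if not candidates:
                raise RuntimeError("no free task slots")
            node = candidates[0]
        slots[node] -= 1
        assignment[block] = node
    return assignment


plan = assign_tasks(
    {"blk_0": ["dn1", "dn2"], "blk_1": ["dn1"], "blk_2": ["dn3"]},
    {"dn1": 2, "dn2": 1, "dn3": 0, "dn4": 1},
)
```

In this run `blk_0` and `blk_1` are processed on `dn1`, where the data already is, while `blk_2` has to be read remotely by `dn2` because `dn3` has no free slot.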
-
Hadoop in use
In 2007, Yahoo's M45 cluster in Sunnyvale ran Hadoop on 4,000 processors with 1.5 PB of storage.
Facebook also runs Hadoop.
-
Apache Hadoop
2017/12/13
(2018) Apache Hadoop
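Hadoop versions fall into two generations: the first generation is called Hadoop 1.0 and the second Hadoop 2.0.
First-generation Hadoop covers 0.20.x, 0.21.x, and 0.22.x; 0.20.x later became 1.0.x.
Second-generation Hadoop covers 0.23.x and 2.x.
Both differ completely from Hadoop 1.0: they use a new architecture that includes HDFS Federation and YARN. Compared with 0.23.x, 2.x adds NameNode HA and Wire-compatibility.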
-
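Hadoop 1.0 problems and Hadoop 2.0 improvements
HDFS, single Name Node: HDFS HA (High Availability) adds a standby Name Node.
HDFS, single Name Space: HDFS Federation allows multiple Name Spaces.
MapReduce, resource management: handled by YARN.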
-
The Hadoop Ecosystem
-
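HDFS: distributed file system
MapReduce: distributed parallel programming model
YARN: resource manager and scheduler
Giraph: graph processing framework that runs on YARN
Hive: data warehouse on Hadoop
HBase: non-relational distributed database on Hadoop
Pig: large-scale data analysis platform on Hadoop, with the SQL-like query language Pig Latin
Sqoop: moves data between Hadoop and relational databases
Zookeeper: distributed coordination service
Storm: stream computing framework
Flume: log collection and aggregation
Ambari: deployment tool for Hadoop that provisions, manages, and monitors Apache Hadoop clusters
Kafka: distributed publish-subscribe messaging system
Spark: general-purpose parallel framework similar to Hadoop MapReduce that computes in memory (in Memory)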
-
Hadoop distributions
Apache Hadoop
Hortonworks
Cloudera (CDH: Cloudera Distribution Hadoop)
MapR
-
Apache Hadoop: 2, 2, 2; 2, 2
Cloudera (CDH): 5, 5, 5; Impala, Navigator; 4.5, 4.5
Hortonworks (HDP): 4.5, 5, 5; Tez; 4.5, 4.5
MapR: 4.5, 5, 5; 5, 3.5
-
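Commercial distributions compared with Apache Hadoop (CDH, HDP, MapR)
They bundle tested versions of Hive, Mahout, Sqoop, Flume, Spark, Oozie, and other components.
They ship patches for Apache Hadoop bugs and features.
They provide support for Apache Hadoop (Troubleshooting).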
-
Hadoop
Hadoop
Hadoop
-02Hadoop
-02Hadoop
http://debussy.im.nuu.edu.tw/sjchen/BigData_final.html
http://web.nuu.edu.tw/~sjchen
-
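Hadoop installation modes
Standalone Mode: the default Hadoop mode.
  Hadoop uses the local OS file system, not HDFS.
  Hadoop does not start the NameNode, DataNode, or TaskTracker daemons; everything runs in a single Java process.
Pseudo-Distributed Mode: Hadoop runs on one machine.
  Hadoop uses HDFS.
  The NameNode, DataNode, TaskTracker, and JobTracker daemons all run on the same Hadoop machine.
Fully-Distributed Mode: Hadoop runs on a cluster.
  Hadoop uses HDFS.
  The NameNode, DataNode, TaskTracker, and JobTracker daemons run on different machines of the Hadoop cluster.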
-
Hadoop installation (standalone / pseudo-distributed)
Hadoop
UbuntuHadoop ()
Install SSH
Install Java
Configure standalone / pseudo-distributed mode
-
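Create the hadoop user
Press ctrl+alt+t to open a terminal.
Create the user:
$ sudo useradd -m hadoop -s /bin/bash
This creates a user named hadoop with /bin/bash as its shell.
Set a password for the hadoop user:
$ sudo passwd hadoop
Give the hadoop user administrator rights:
$ sudo adduser hadoop sudo
Log out of Ubuntu and log back in as the hadoop user.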
-
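Install SSH
Why SSH?
The Hadoop NameNode starts and stops the daemons on the DataNodes over SSH, so the NameNode must be able to log in to each DataNode (without a password).
What is SSH?
SSH stands for Secure Shell.
SSH has a client and a server. Ubuntu installs the SSH client by default, so only the SSH Server has to be installed.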
-
Install Java
The Java environment can be either Oracle JDK or OpenJDK.
On Ubuntu, install OpenJDK 8.
-
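Install Hadoop into /usr/local/
$ sudo tar -zxf /media/sf_VirtualboxShare/hadoop-2.7.4.tar.gz -C /usr/local    # extract into /usr/local
$ cd /usr/local/
$ sudo mv ./hadoop-2.7.4/ ./hadoop    # rename the directory to hadoop
$ sudo chown -R hadoop:hadoop ./hadoop    # change the file ownership
Hadoop is ready to use once it is extracted. Check that Hadoop works:
$ cd /usr/local/hadoop
$ ./bin/hadoop version
On success, the Hadoop version information is printed.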
-
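Pseudo-distributed configuration
Hadoop can run in pseudo-distributed mode on a single node, with each Hadoop daemon in its own Java process.
The node acts as both NameNode and DataNode and reads its files from HDFS.
The Hadoop configuration files are in /usr/local/hadoop/etc/hadoop/.
Pseudo-distributed mode requires changes to 2 of them: core-site.xml and hdfs-site.xml.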
-
Pseudo-distributed configuration (2)
Edit core-site.xml and hdfs-site.xml.
Format the NameNode: hadoop namenode -format
Start Hadoop: start-dfs.sh (YARN)
Check Hadoop through the web interface.
-
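core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
fs.defaultFS: the address and port of hdfs
hadoop.tmp.dir: the base directory for temporary files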
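To make the two properties concrete, the sketch below parses a core-site.xml document with the Python standard library and splits fs.defaultFS into scheme, host, and port. The XML string repeats the values from this slide; the helper name `read_properties` is an assumption, and Hadoop itself reads the file with its own Java configuration classes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Same property names and values as on the slide, embedded so the sketch is self-contained.
CORE_SITE = """<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return a dict of property name -> value from a Hadoop-style XML config."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


props = read_properties(CORE_SITE)
default_fs = urlparse(props["fs.defaultFS"])  # scheme "hdfs", host "localhost", port 9000
```

The parsed URI shows what clients connect to: the NameNode on `localhost`, port `9000`, over the `hdfs` scheme.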
-
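hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/data</value>
  </property>
</configuration>
dfs.replication: the number of copies kept of each block, here 1
dfs.namenode.name.dir: where the NameNode stores its Metadata
dfs.datanode.data.dir: where the DataNode stores the HDFS blocks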
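The effect of dfs.replication is simple arithmetic, sketched here under the assumption that every block is stored exactly dfs.replication times. The function names are hypothetical.

```python
def raw_storage_bytes(logical_bytes, replication):
    """Disk space consumed across the cluster when each block is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return logical_bytes * replication


def tolerated_datanode_failures(replication):
    """Number of replica-holding DataNodes that can be lost before a block is gone."""
    return replication - 1


GIB = 2 ** 30
single_node = raw_storage_bytes(10 * GIB, replication=1)   # the setting used on this slide
three_copies = raw_storage_bytes(10 * GIB, replication=3)
```

With the value 1 used for the pseudo-distributed setup, 10 GiB of files occupy 10 GiB of disk but survive no DataNode loss; with three copies the same files occupy 30 GiB and survive the loss of two replica holders.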
-
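Hadoop Shell / HDFS Shell commands
There are three forms: hadoop fs, hadoop dfs, and hdfs dfs.
hadoop fs: works with any supported file system, such as the local file system and HDFS
hadoop dfs: works with HDFS only (deprecated)
hdfs dfs: does the same as hadoop dfs and works with HDFS only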
-
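Summary
Hadoop is an open-source distributed computing platform.
Hadoop is widely used in industry, for example at Facebook.
The Hadoop ecosystem includes Zookeeper, HDFS, MapReduce, HBase, Hive, and Pig; HDFS and MapReduce are the two core components of Hadoop.
Hadoop can be installed and run on Linux.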