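Big Data Processing Architecture: Hadoop (Part 1)

Big Data Technologies and Applications (巨量資料技術與應用). National United University (國立聯合大學), Department of Information Management. Instructor: 陳士杰. Big Data Processing Architecture: Hadoop (Introduction to Hadoop)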


  • Big Data Technologies and Applications

    Hadoop Introduction to Hadoop

  • (())

    2

    Hadoop

    Hadoop

  • (())

    3

    Hadoop

    Hadoop

    Hadoop

    Hadoop

    Hadoop

  • (())

    4


    Hadoop

    !

  • (())

    5

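    Hadoop is an open-source distributed computing platform under the Apache Software Foundation.

    Hadoop is developed in Java.

    The core of Hadoop is HDFS (Hadoop Distributed File System) and MapReduce.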

    Hadoop

    Hadoop

    Hadoop

  • (())

    6

    Hadoop

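    Hadoop grew out of the Nutch project, started in 2002 by Doug Cutting and Mike Cafarella.

    2004: modeled on Google's GFS, Nutch implemented NDFS (Nutch Distributed File System), the predecessor of HDFS.

    2004: Google published its paper on MapReduce.

    2005: Nutch implemented MapReduce.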

  • (())

    7

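    February 2006: NDFS and MapReduce were split out of Nutch into a subproject of Lucene, the project founded by Doug Cutting, and the subproject was named Hadoop.

    The name Hadoop was chosen by Doug Cutting.

    January 2008: Hadoop became an Apache top-level project.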

  • (())

    8

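    April 2008: Hadoop set a record as the fastest system to sort 1 TB of data, using a cluster of 910 nodes and finishing in 209 seconds.

    May 2009: Hadoop cut the time to sort 1 TB of data to 62 seconds.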

    Hadoop

  • (())

    9

    Hadoop

    Hadoop

    Handles PB-scale data

    Runs on clusters of ordinary PCs

    Runs on Linux

    Scales horizontally (Scale Out)

  • (())

    10

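    Scale Out vs. Scale Up

    Scale Up (vertical scaling): increase the capacity of a single machine, for example by upgrading its CPU.

    Scale Out (horizontal scaling): increase capacity by adding more machines.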

  • (())

    11

    Hadoop

    Hadoop

    Hadoop uses a Master/Slave architecture.

    HDFS in Hadoop uses a Master/Slave architecture:

    master: NameNode, Secondary NameNode

    slave: DataNode

    MapReduce in Hadoop uses a Master/Slave architecture:

    master: JobTracker

    slave: TaskTracker

  • (())

    12

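    HDFS nodes

    NameNode: the Master Node, which manages the HDFS Metadata.

    The Metadata records which Blocks make up each file and which DataNodes hold them.

    If the NameNode fails, HDFS becomes unavailable, so the NameNode is a Single Point of Failure (SPOF).

    DataNode: a Slave Node that stores the data Blocks. Each DataNode reports the Blocks it holds to the NameNode, and the NameNode tells clients which DataNode holds a requested Block. If a DataNode fails, the NameNode has its Blocks re-replicated on other DataNodes.

    SecondaryNameNode: a helper that periodically checkpoints the NameNode's metadata; it is not a standby NameNode.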

  • (())

    13

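    MapReduce nodes

    JobTracker

    Master Node

    Accepts jobs submitted by the Client

    Splits each (MapReduce) job into tasks and assigns them to TaskTrackers

    Single Point of Failure

    TaskTracker

    Worker Nodes

    Executes MapReduce tasks

    Each TaskTracker runs Mapper and Reducer tasks:

    Mapper: processes the portion of the Input that Hadoop hands to it

    Reducer: aggregates the output of the Mappers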
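The division of labour between Mapper and Reducer can be imitated with ordinary Unix tools. The following is only an analogy, not Hadoop's API: `wordcount` is a hypothetical helper name, `tr` plays the Mapper (emit one word per line), `sort` plays the shuffle (bring identical keys together), and `uniq -c` plays the Reducer (count each group).

```shell
# Hypothetical helper: a word count in the map -> shuffle -> reduce shape.
wordcount() {
  tr -s '[:space:]' '\n' |      # map: split the input into one word per line
    sed '/^$/d' |               # drop empty lines left by leading whitespace
    sort |                      # shuffle: group identical words together
    uniq -c |                   # reduce: count the size of each group
    awk '{print $2 "\t" $1}'    # print as word<TAB>count
}

# Example input: two short lines of text.
printf 'big data\nbig hadoop\n' | wordcount
```

In Hadoop the same three stages run on many TaskTrackers at once, each working on its own part of the input.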

  • (())

    14

    Hadoop

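    The Client submits a job to the JobTracker.

    The JobTracker asks the NameNode where the data is stored.

    The NameNode answers from its Metadata.

    The JobTracker assigns MapReduce tasks to the TaskTrackers.

    Each TaskTracker runs its MapReduce task against the data on its DataNode.

    When a MapReduce task finishes, the TaskTracker reports back to the JobTracker.

    The JobTracker returns the result of the job.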

  • (())

    15

    Hadoop

    Hadoop

    In 2007, Yahoo built M45 at its Sunnyvale headquarters: a Hadoop cluster with 4,000 processors and 1.5 PB of capacity.

    FacebookHadoop

    FacebookHadoop

  • (())

  • (())

    17

    Apache Hadoop

    2017/12/13

    (2018) Apache Hadoop

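    Apache Hadoop releases fall into two generations: the first generation is called Hadoop 1.0 and the second generation Hadoop 2.0.

    First-generation Hadoop has three major version lines: 0.20.x, 0.21.x, and 0.22.x. Of these, 0.20.x eventually evolved into 1.0.x.

    Second-generation Hadoop has two version lines, 0.23.x and 2.x, built on a new architecture that differs from Hadoop 1.0.

    Both 0.23.x and 2.x include HDFS Federation and YARN; compared with 0.23.x, 2.x adds NameNode HA and Wire-compatibility.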

  • (())

    18

    Hadoop

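    Problems in Hadoop 1.0 and the improvements in Hadoop 2.0

    HDFS, single Name Node: addressed by HDFS HA (High Availability), which provides a standby for the Name Node

    HDFS, single Name Space: addressed by HDFS Federation, which manages multiple Name Spaces

    MapReduce, resource management: addressed by YARN, a new resource management framework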

  • (())

    19

    Hadoop(Hadoop Ecosystem)

  • (())

    20

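    HDFS: distributed file system

    MapReduce: distributed parallel programming model

    YARN: resource manager and scheduler

    Giraph: graph processing framework that runs on YARN

    Hive: data warehouse on Hadoop

    HBase: non-relational distributed database on Hadoop

    Pig: large-scale data analysis platform on Hadoop, with an SQL-like query language called Pig Latin

    Sqoop: transfers data between Hadoop and relational databases

    Zookeeper: distributed coordination service

    Storm: stream computing framework

    Flume: distributed system for collecting, aggregating, and moving log data

    Ambari: tool for provisioning, managing, and monitoring Apache Hadoop clusters

    Kafka: high-throughput distributed publish-subscribe messaging system

    Spark: general-purpose parallel framework similar to Hadoop MapReduce that computes in memory (in Memory)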

  • (())

    21

    Apache Hadoop

    Hortonworks

    Cloudera (CDH: Cloudera Distribution Hadoop)

    MapR

    Hadoop

    Hadoop

  • (())

    22

    () () ()

    Apache

    Hadoop

    Apache

    2

    2

    2

    Apache

    Hadoop

    Apache

    2 2

    Cloudera

    CDH

    Apache

    5

    5

    5

    ImpalaNavigator

    4.5 4.5

    Hortonworks

    HDP

    Apache

    4.5

    5

    5

    Apache Hadoop

    Tez 4.5 4.5

    MapR

    Apache Hadoop

    4.5

    5

    5

    Apache

    5 3.5

  • (())

    23

    Apache Hadoop (CDH, HDP, MapR)

    Hive, Mahout, Sqoop, Flume, Spark, Oozie

    Apache Hadoop

    Apache HadoopBugFeaturepatch

    ()

    Apache Hadoop (

    Troubleshooting)

  • (())

    24

    Hadoop

    Hadoop

    Hadoop

    -02Hadoop

    -02Hadoop

    http://debussy.im.nuu.edu.tw/sjchen/BigData_final.html

    http://web.nuu.edu.tw/~sjchen

  • (())

    25

    Hadoop

    Hadoop

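    Standalone Mode: the default mode of Hadoop

    Hadoop uses the file system of the local OS

    HDFS is not used and no Hadoop daemons are started

    There are no NameNode, DataNode, or TaskTracker processes; everything runs in a single Java process

    Pseudo-Distributed Mode: Hadoop runs on a single machine

    HDFS is used, as on a real Hadoop cluster

    NameNode, DataNode, TaskTracker, and JobTracker all run on that one machine as separate Hadoop processes

    Fully-Distributed Mode: Hadoop runs on a cluster of several machines

    HDFS is used, spread across the Hadoop cluster

    NameNode, DataNode, TaskTracker, and JobTracker run on different machines of the Hadoop cluster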

  • (())

    26

    Hadoop

    (/)

    Hadoop

    Create a hadoop user on Ubuntu

    Install SSH

    Install the Java environment

    /

  • (())

    27

    Hadoop

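    Press ctrl+alt+t to open a terminal window.

    Enter the following command to create a new user:

    $ sudo useradd -m hadoop -s /bin/bash

    This creates a user named hadoop that uses /bin/bash as its shell.

    Set a password for the hadoop user:

    $ sudo passwd hadoop

    Give the hadoop user administrator privileges:

    $ sudo adduser hadoop sudo

    Then log out of Ubuntu and log back in as the hadoop user.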

  • (())

    28

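    SSH

    Why SSH must be configured

    The Hadoop NameNode has to start the Hadoop daemons on every DataNode in the cluster, and it does so by logging in over SSH.

    The NameNode therefore needs to be able to log in to each DataNode without a password (passwordless login).

    What SSH is

    SSH stands for Secure Shell.

    SSH consists of an SSH client and an SSH server. Ubuntu installs the SSH client by default, so the SSH Server still has to be installed.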
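A minimal sketch of the passwordless login setup for a single machine follows. It assumes the SSH server is already installed (on Ubuntu, typically `sudo apt-get install openssh-server`); the `SSH_DIR` variable is introduced here only for illustration and defaults to the usual `~/.ssh`.

```shell
# Sketch: allow the current user to "ssh localhost" without a password.
# SSH_DIR is an illustrative variable; by default it is the user's ~/.ssh.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"

# Generate an RSA key pair with an empty passphrase, unless one already exists.
if [ ! -f "$SSH_DIR/id_rsa" ]; then
  ssh-keygen -q -t rsa -N "" -f "$SSH_DIR/id_rsa"
fi

# Authorize the public key once (skip if it is already listed).
touch "$SSH_DIR/authorized_keys"
if ! grep -qxF "$(cat "$SSH_DIR/id_rsa.pub")" "$SSH_DIR/authorized_keys"; then
  cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
fi
chmod 600 "$SSH_DIR/authorized_keys"
```

Afterwards `ssh localhost` should log in without prompting for a password. On a real cluster the public key would also have to be appended to `authorized_keys` on each DataNode.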

  • (())

    29

    Java

    The Java environment can be either Oracle's JDK or OpenJDK.

    On Ubuntu, OpenJDK 8 can be installed directly.
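Hadoop's scripts usually need `JAVA_HOME` to point at the Java installation. The sketch below is one way to derive it after installing OpenJDK 8; the package names in the comment are the usual Ubuntu ones, and `java_home_from_bin` is a hypothetical helper written for this example, not part of Java or Hadoop.

```shell
# Installing OpenJDK 8 on Ubuntu needs root, so it is shown as a comment:
#   sudo apt-get install openjdk-8-jre openjdk-8-jdk

# Hypothetical helper: derive a JAVA_HOME candidate from a java executable
# by resolving symlinks and stripping the trailing /bin/java (and /jre).
java_home_from_bin() {
  resolved=$(readlink -f "$1") || return 1
  resolved=${resolved%/bin/java}
  printf '%s\n' "${resolved%/jre}"
}

# Apply it only when java is actually on the PATH.
if command -v java >/dev/null 2>&1; then
  JAVA_HOME=$(java_home_from_bin "$(command -v java)")
  export JAVA_HOME
  echo "JAVA_HOME=$JAVA_HOME"
fi
```

To keep the setting across logins, the resulting `export JAVA_HOME=...` line can be placed in the hadoop user's `~/.bashrc`.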

  • (())

    30

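    Install Hadoop under /usr/local/:

    $ sudo tar -zxf /media/sf_VirtualboxShare/hadoop-2.7.4.tar.gz -C /usr/local   # extract into /usr/local
    $ cd /usr/local/
    $ sudo mv ./hadoop-2.7.4/ ./hadoop   # rename the directory to hadoop
    $ sudo chown -R hadoop:hadoop ./hadoop   # make the hadoop user the owner

    Hadoop is usable as soon as it is unpacked. Check that it works with:

    $ cd /usr/local/hadoop
    $ ./bin/hadoop version

    If it works, the Hadoop version information is displayed.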

  • (())

    31

    Hadoop

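    In pseudo-distributed mode the Hadoop processes run as separate Java processes.

    The single node acts as both NameNode and DataNode, and files are read from HDFS.

    The Hadoop configuration files are in /usr/local/hadoop/etc/hadoop/.

    Pseudo-distributed mode requires editing 2 configuration files: core-site.xml and hdfs-site.xml.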

  • (())

    32

    (2)

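    Edit the configuration files core-site.xml and hdfs-site.xml.

    Format the NameNode: hadoop namenode -format

    Start Hadoop: start-dfs.sh (YARN)

    After a successful start, check Hadoop through its web interface.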
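The steps on this slide can be sketched as a small script. It assumes Hadoop 2.x is unpacked at /usr/local/hadoop and that both configuration files are already edited; `HADOOP_HOME` and the function name `start_pseudo_hdfs` are illustrative choices. `hdfs namenode -format` is the Hadoop 2.x spelling of the `hadoop namenode -format` command shown above.

```shell
# Sketch of the start-up sequence for pseudo-distributed mode.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

start_pseudo_hdfs() {
  # Format the NameNode. Do this only once: formatting again discards
  # the existing HDFS metadata.
  "$HADOOP_HOME/bin/hdfs" namenode -format || return 1
  # Start the NameNode, DataNode and SecondaryNameNode daemons.
  "$HADOOP_HOME/sbin/start-dfs.sh" || return 1
  # List the running Java processes to confirm the daemons are up.
  jps
}

# Run the sequence only if Hadoop is actually installed at HADOOP_HOME.
if [ -x "$HADOOP_HOME/bin/hdfs" ]; then
  start_pseudo_hdfs
else
  echo "Hadoop not found under $HADOOP_HOME; nothing started" >&2
fi
```

With Hadoop 2.x, the NameNode web interface is normally reachable at http://localhost:50070 once the daemons are running.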

  • (())

    33

    core-site.xml

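    hadoop.tmp.dir = file:/usr/local/hadoop/tmp (description: A base for other temporary directories.)

    fs.defaultFS = hdfs://localhost:9000

    fs.defaultFS: the hdfs address and port of the default file system

    hadoop.tmp.dir: the base directory for Hadoop's temporary files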
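Written out as a file, these two properties could look like the sketch below. The values are the ones on the slide; `CONF_DIR` is an assumption made for this example and defaults to a scratch directory so that the script can be tried without touching a real installation. On the installation described here it would be /usr/local/hadoop/etc/hadoop.

```shell
# Sketch: write a core-site.xml containing the two properties from the slide.
# CONF_DIR is illustrative; point it at /usr/local/hadoop/etc/hadoop to apply.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```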

  • (())

    34

    hdfs-site.xml

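    dfs.replication = 1

    dfs.namenode.name.dir = file:/usr/local/hadoop/tmp/dfs/name

    dfs.datanode.data.dir = file:/usr/local/hadoop/tmp/dfs/data

    dfs.replication: the number of replicas kept for each block; 1 here because there is only one node

    dfs.namenode.name.dir: where the NameNode stores its Metadata

    dfs.datanode.data.dir: where the DataNode stores HDFS blocks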
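The three properties above could be written to hdfs-site.xml as in the following sketch. As before, `CONF_DIR` is an illustrative variable that defaults to a scratch directory rather than the real /usr/local/hadoop/etc/hadoop.

```shell
# Sketch: write an hdfs-site.xml containing the three properties from the slide.
# CONF_DIR is illustrative; point it at /usr/local/hadoop/etc/hadoop to apply.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/data</value>
  </property>
</configuration>
EOF
```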

  • (())

    35

    Hadoop Shell/HDFS Shell

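    There are three forms of the shell command: hadoop fs, hadoop dfs, and hdfs dfs.

    hadoop fs: works with any supported file system, such as the local file system and HDFS

    hadoop dfs: works only with HDFS

    hdfs dfs: does the same as hadoop dfs and also works only with HDFS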
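A short sketch of typical `hdfs dfs` usage follows. It assumes the HDFS daemons are already running and that Hadoop is installed at /usr/local/hadoop; the HDFS path /user/hadoop/input and the function name `hdfs_demo` are examples chosen here, not something the slides prescribe.

```shell
# Sketch: create a directory in HDFS, upload some files, and list them.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

hdfs_demo() {
  # Create the target directory (and any missing parents) in HDFS.
  "$HADOOP_HOME/bin/hdfs" dfs -mkdir -p /user/hadoop/input || return 1
  # Copy the local XML configuration files into HDFS; -f overwrites
  # files left over from an earlier run.
  "$HADOOP_HOME/bin/hdfs" dfs -put -f "$HADOOP_HOME"/etc/hadoop/*.xml /user/hadoop/input || return 1
  # List what is now stored in the HDFS directory.
  "$HADOOP_HOME/bin/hdfs" dfs -ls /user/hadoop/input
}

# Run the demo only if Hadoop is actually installed at HADOOP_HOME.
if [ -x "$HADOOP_HOME/bin/hdfs" ]; then
  hdfs_demo
else
  echo "Hadoop not found under $HADOOP_HOME; demo skipped" >&2
fi
```

The same subcommands (`-mkdir`, `-put`, `-ls`) can be given to `hadoop fs`, which additionally accepts paths on other file systems.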

  • (())

    36

    Hadoop

    HadoopHadoop

    Hadoop

    FacebookHadoop

    Hadoop

    The Hadoop ecosystem includes Zookeeper, HDFS, MapReduce, HBase, Hive, and Pig; HDFS and MapReduce are the two core components of Hadoop.

    LinuxHadoop

  • (())
