Meet Hadoop Family: part 1


Uploaded by: caizerx

Posted on: 11-Jan-2017



HDFS



• What is it? A distributed file system designed to store very large files with streaming data access patterns.

• Why is it needed? Very large files; streaming data access; commodity hardware.

• Traditional design limits: RAC and MPP systems bring the data to the computation, so the network becomes the bottleneck.

• Trade-offs: high-latency data access; not good for lots of small files; write once, multiple writers are not supported.
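The small-files trade-off can be made concrete with a rough estimate of NameNode memory. The sketch below assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per file, directory, or block object; that figure is an assumption, not something stated on the slides, and the function name is hypothetical.

```python
# Rough NameNode memory estimate for many small files.
# ASSUMPTION: ~150 bytes of heap per namespace object (file or block).
# This is a widely quoted rule of thumb, not a value from the slides.
BYTES_PER_OBJECT = 150


def namenode_memory_bytes(num_files, blocks_per_file=1,
                          bytes_per_object=BYTES_PER_OBJECT):
    """Estimate heap used by file objects plus their block objects."""
    objects = num_files + num_files * blocks_per_file
    return objects * bytes_per_object


if __name__ == "__main__":
    estimate = namenode_memory_bytes(1_000_000)
    print(f"{estimate / 1_000_000:.0f} MB for one million single-block files")
```

Under that assumption the cost depends on the number of objects, not on how many bytes the files hold, which is why many tiny files are more expensive to track than a few large ones.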


A Client Reading Data From HDFS


A Client Writing Data to HDFS


Network Distances in Hadoop

• distance(/d1/r1/n1, /d1/r1/n1) = 0 (processes on the same node)

• distance(/d1/r1/n1, /d1/r1/n2) = 2 (different nodes on the same rack)

• distance(/d1/r1/n1, /d1/r2/n3) = 4 (nodes on different racks in the same data center)

• distance(/d1/r1/n1, /d2/r3/n4) = 6 (nodes in different data centers)
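The four distances above follow from treating the network as a tree and counting the hops from each node up to the closest common ancestor. A minimal sketch, assuming locations are written as /datacenter/rack/node paths as on the slide; the function name is hypothetical and this is not Hadoop's own implementation.

```python
def network_distance(a: str, b: str) -> int:
    """Sum of hops from each location up to their closest common ancestor."""
    parts_a = a.strip("/").split("/")
    parts_b = b.strip("/").split("/")

    # Count how many leading path components the two locations share.
    common = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        common += 1

    return (len(parts_a) - common) + (len(parts_b) - common)


if __name__ == "__main__":
    for other in ("/d1/r1/n1", "/d1/r1/n2", "/d1/r2/n3", "/d2/r3/n4"):
        print(other, network_distance("/d1/r1/n1", other))
```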


• HDFS blocks: default size 128 MB (large for a reason: it keeps seek time small relative to transfer time), default replication factor 3

• NameNode: stores the metadata for all blocks in the cluster; its storage location is set by dfs.namenode.name.dir, default /dfs/xx

• DataNodes: store the data blocks, and also keep metadata about their local blocks

• POSIX-like (almost) permissions: rw(x), users, groups, mode
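To show what the default block size and replication factor mean for a single file, here is a small arithmetic sketch. The 128 MB and 3x values come from the slide; the function name is hypothetical, and the raw-storage figure assumes the last block of a file occupies only as much disk as the data it holds.

```python
BLOCK_SIZE = 128 * 1024 * 1024   # default block size from the slide
REPLICATION = 3                  # default replication factor from the slide


def block_layout(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (blocks, block replicas, raw bytes stored) for one file."""
    blocks = -(-file_size // block_size)   # ceiling division on integers
    replicas = blocks * replication
    # ASSUMPTION: a partially filled last block uses only its actual length.
    raw_bytes = file_size * replication
    return blocks, replicas, raw_bytes


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(block_layout(one_gib))
```

A 1 GiB file therefore splits into 8 blocks, stored as 24 block replicas across the DataNodes.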


• HDFS logs and web interface: port 50070 (NameNode), port 50075 (DataNode)

• WebHDFS / HttpFS REST interface, for example:
  http://sabtu:50070/webhdfs/v1/tmp?user.name=hdfs&op=GETFILESTATUS
  {"FileStatus":{"accessTime":0,"blockSize":0,"childrenNum":4,"fileId":16386,"group":"supergroup","length":0,"modificationTime":1467099643710,"owner":"hdfs","pathSuffix":"","permission":"1777","replication":0,"type":"DIRECTORY"}}
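The request and response above can be handled with the Python standard library alone. The sketch below rebuilds the same URL and decodes the sample response without contacting a cluster; the helper names are hypothetical, and the host "sabtu" is simply the one shown on the slide.

```python
import json
import stat
from urllib.parse import urlencode

# Sample response copied from the slide (shortened to the fields used here).
SAMPLE = ('{"FileStatus":{"childrenNum":4,"group":"supergroup",'
          '"owner":"hdfs","permission":"1777","type":"DIRECTORY"}}')


def webhdfs_url(host, path, op, user, port=50070):
    """Build a WebHDFS URL of the form shown on the slide."""
    query = urlencode({"user.name": user, "op": op})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


def describe(status_json):
    """Return (owner, group, ls-style mode string) from a FileStatus reply."""
    status = json.loads(status_json)["FileStatus"]
    mode = int(status["permission"], 8)          # permission is octal text
    if status["type"] == "DIRECTORY":
        mode |= stat.S_IFDIR
    else:
        mode |= stat.S_IFREG
    return status["owner"], status["group"], stat.filemode(mode)


if __name__ == "__main__":
    print(webhdfs_url("sabtu", "/tmp", "GETFILESTATUS", "hdfs"))
    print(describe(SAMPLE))
```

Decoding the permission string shows that "1777" on /tmp is a world-writable directory with the sticky bit set, as on a local POSIX file system.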


Some Features

• High Availability mode

• HDFS federation, a concept similar to namespace / database sharding

• HDFS balancer

• Safe mode

• Distributed copy (distcp)


HDFS Federation
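Since federation is compared to namespace sharding, a toy model may help: each NameNode owns a slice of the directory tree, and a path is routed to whichever NameNode owns its longest matching prefix. This is only an illustration, loosely in the spirit of a client-side mount table; the table contents, host names, and function name are all hypothetical.

```python
# HYPOTHETICAL mount table: path prefix -> NameNode that owns that subtree.
MOUNT_TABLE = {
    "/user": "nn1.example:8020",
    "/tmp": "nn2.example:8020",
    "/data": "nn3.example:8020",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Pick the NameNode whose prefix is the longest match for the path."""
    best = None
    for prefix in mount_table:
        # Match whole path components only, so /userdata does not hit /user.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no NameNode owns {path}")
    return mount_table[best]


if __name__ == "__main__":
    print(resolve_namenode("/user/alice/report.csv"))
```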


Common Commands

• Start the cluster:
  $HADOOP_PREFIX_HOME/bin/start-dfs.sh

• Stop the cluster:
  $HADOOP_PREFIX_HOME/bin/stop-dfs.sh

• File operations:
  hdfs dfs -cp x y
  hdfs dfs -ls x
  hdfs dfs -cat x
  hdfs dfs -put x y
  hdfs dfs -get x y
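The file operations listed on this slide can also be driven from a script. A minimal sketch, assuming the hdfs binary is on PATH when the command is actually run; the wrapper names are hypothetical, and only the five operations from the slide are accepted.

```python
import shlex
import subprocess

# Only the operations listed on the slide are allowed in this sketch.
ALLOWED = {"cp", "ls", "cat", "put", "get"}


def hdfs_dfs_command(operation, *paths):
    """Return the argument list for `hdfs dfs -<operation> <paths...>`."""
    if operation not in ALLOWED:
        raise ValueError(f"unsupported operation: {operation}")
    return ["hdfs", "dfs", f"-{operation}", *paths]


def run_hdfs_dfs(operation, *paths):
    """Run the command and return its standard output.

    ASSUMPTION: requires a configured Hadoop client with `hdfs` on PATH.
    """
    result = subprocess.run(hdfs_dfs_command(operation, *paths),
                            check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Print the command line instead of running it, so no cluster is needed.
    print(shlex.join(hdfs_dfs_command("put", "local.txt", "/tmp/local.txt")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.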


Questions?
https://www.meetup.com/Jakarta-Hadoop-Big-Data/