distributed file systems

File Systems: GFS vs. HDFS Ciprian Lucaci Daniel Straub Seminar: Internet-scale Distributed Systems


Post on 28-Jan-2016

HDFS vs. GFS: a comparison

Page 1: Distributed File Systems

File Systems: GFS vs. HDFS

Ciprian Lucaci

Daniel Straub

Seminar: Internet-scale Distributed Systems

Page 2: Distributed File Systems

Distributed File Systems - Motivation

File Systems: GFS vs. Hadoop, Ciprian Lucaci and Daniel Straub

The Requirements

• Store very large data sets reliably

• High bandwidth streaming

• Distribute storage

• Distribute computation

• Analysis and transformation of data

The problem: ≥ TB/PB of data

Page 3: Distributed File Systems

Distributed File Systems - Motivation


Google File System & Hadoop Distributed File System

• Moving computation where data resides

• Low costs – commodity machines

• Vertical and horizontal scalability

The solution

Page 4: Distributed File Systems

HDFS – Hadoop Ecosystem


Hadoop

MapReduce

HDFS

Pig: data flow

Hive: data summarization

…

Page 5: Distributed File Systems

HDFS - Architecture

• Blocks

[Figure: a file split into blocks B1, B2, B3, …, Bn]

• 64 MB / block (default), 128 MB, 256 MB
• Split large files into blocks
• Large file: hundreds of MB up to GB
• Data in any format

Each block consists of:
• Data
• Metadata: checksum + generation stamp
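As a rough illustration of the block arithmetic above, the following Java sketch computes how many blocks a file occupies and how long a given block is. The class and method names are hypothetical and not part of the Hadoop API; the 64 MB block size is the default quoted on the slide.

```java
/** Hypothetical helper, not part of the Hadoop API: block arithmetic for one file. */
final class BlockSplit {
    static final long MB = 1024L * 1024L;

    /** Number of blocks needed for fileSize bytes (ceiling division). */
    static long blockCount(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        return (fileSize + blockSize - 1) / blockSize;
    }

    /** Length of block `index` (0-based); only the last block may be shorter. */
    static long blockLength(long fileSize, long blockSize, long index) {
        long start = index * blockSize;
        if (index < 0 || start >= fileSize) {
            throw new IndexOutOfBoundsException("no such block: " + index);
        }
        return Math.min(blockSize, fileSize - start);
    }

    public static void main(String[] args) {
        long fileSize = 200 * MB;
        long blockSize = 64 * MB; // default quoted on the slide
        System.out.println(blockCount(fileSize, blockSize));          // prints 4
        System.out.println(blockLength(fileSize, blockSize, 3) / MB); // prints 8
    }
}
```

A 200 MB file therefore occupies three full 64 MB blocks plus one 8 MB block.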

Page 6: Distributed File Systems

HDFS - Architecture

• Blocks

• Data Nodes

[Figure: DataNodes (labeled DN1), each storing blocks B1, B2, B3, …, Bn]

DataNode1: B1, B2, B3, …, B10
DataNode2: B1, B10, B11, …, B2
DataNode3: B3, B10, B11, …, B1
…
Replication factor: 3 (default)

Racks: Rack 1, Rack 2, Rack 3, …, Rack n

Thousands of DataNodes

• Heartbeat
• Report blocks

Page 7: Distributed File Systems


HDFS - Architecture

• Blocks

• Data Nodes

• Name Node

[Figure: one NameNode and the DataNodes, spread over Rack 1, Rack 2, Rack 3, …, Rack n]

Page 8: Distributed File Systems

HDFS - Architecture

• Blocks

• Data Nodes

• Name Node

Master-worker pattern

[Figure: the NameNode as master and the DataNodes as workers, spread over Rack 1, Rack 2, Rack 3, …, Rack n]

Page 9: Distributed File Systems

HDFS - Architecture

• Blocks

• Data Nodes

• Name Node

Metadata
• Edit log
• System image
• Block checksum
• Block location
• Namespace tree

[Figure: NameNode and DataNodes across Rack 1, Rack 2, Rack 3, …, Rack n]

Namespace
Filename: block-ids (node#block#)
/user/dir1/file1: n1b1, n1b2, n3b1, n4b3
/user/dir2/file2: n3b7, n4b8, n1b6, n2b5
…

Inode
• Permissions
• Modification and access times

Where is the namespace located?
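According to the Shvachko et al. paper listed in the references, the NameNode keeps the namespace and the block-to-DataNode mapping in RAM. The Java sketch below models that idea with two in-memory maps. It is a simplified illustration under that assumption, not the NameNode implementation; the class name, method names, and block ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory model of NameNode metadata; names are illustrative only. */
final class NamespaceSketch {
    // File path -> ordered list of block ids (the namespace part).
    private final Map<String, List<String>> fileToBlocks = new HashMap<>();
    // Block id -> DataNodes currently holding a replica (learned from block reports).
    private final Map<String, Set<String>> blockToNodes = new HashMap<>();

    void addBlock(String path, String blockId) {
        fileToBlocks.computeIfAbsent(path, p -> new ArrayList<>()).add(blockId);
    }

    void reportReplica(String blockId, String dataNode) {
        blockToNodes.computeIfAbsent(blockId, b -> new TreeSet<>()).add(dataNode);
    }

    /** Blocks of a file in order; empty if the path is unknown. */
    List<String> blocksOf(String path) {
        return fileToBlocks.getOrDefault(path, Collections.emptyList());
    }

    /** DataNodes known to hold the block; empty if none has reported it yet. */
    Set<String> locationsOf(String blockId) {
        return blockToNodes.getOrDefault(blockId, Collections.emptySet());
    }

    public static void main(String[] args) {
        NamespaceSketch ns = new NamespaceSketch();
        ns.addBlock("/user/dir1/file1", "b1");
        ns.addBlock("/user/dir1/file1", "b2");
        ns.reportReplica("b1", "n1");
        ns.reportReplica("b1", "n3");
        for (String block : ns.blocksOf("/user/dir1/file1")) {
            System.out.println(block + " -> " + ns.locationsOf(block));
        }
    }
}
```

Keeping the two maps separate mirrors the split between namespace data, which is recorded in the image and edit log, and block locations, which DataNodes report at runtime.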

Page 10: Distributed File Systems

HDFS – Workflow: Startup

[Figure: a DataNode holding blocks B1, B2, B3, …, Bn exchanges four messages with the NameNode: 1. Handshake, 2. Register, 3. Send block report, 4. Heartbeat]

1. Handshake
• Namespace id
• Software version

2. Register
• Storage id

3. Block report
• Block id
• Generation stamp
• Block length
• 1 / hour (default)

4. Heartbeat
• Every 3 seconds
• 10 min timeout
• Storage capacity
• Storage usage
• # current transfers
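The heartbeat rule above (one heartbeat every 3 seconds, a node considered lost after 10 minutes of silence) can be sketched as follows. This is a minimal illustration, not the NameNode's actual bookkeeping; the class and method names are hypothetical, and time is passed in explicitly so the logic is easy to follow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical heartbeat tracker using the intervals quoted on the slide. */
final class HeartbeatMonitor {
    static final long HEARTBEAT_INTERVAL_MS = 3_000L;  // every 3 seconds
    static final long TIMEOUT_MS = 10L * 60L * 1_000L; // 10 min timeout

    // Storage id -> time of the most recent heartbeat, in milliseconds.
    private final Map<String, Long> lastSeenMs = new HashMap<>();

    void onHeartbeat(String storageId, long nowMs) {
        lastSeenMs.put(storageId, nowMs);
    }

    /** Storage ids that have been silent for longer than the timeout, sorted. */
    List<String> deadNodes(long nowMs) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeenMs.entrySet()) {
            if (nowMs - entry.getValue() > TIMEOUT_MS) {
                dead.add(entry.getKey());
            }
        }
        Collections.sort(dead);
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor();
        monitor.onHeartbeat("dn1", 0L); // dn1 falls silent after its first heartbeat
        long end = TIMEOUT_MS + HEARTBEAT_INTERVAL_MS;
        for (long t = 0L; t <= end; t += HEARTBEAT_INTERVAL_MS) {
            monitor.onHeartbeat("dn2", t); // dn2 keeps reporting
        }
        System.out.println(monitor.deadNodes(end)); // prints [dn1]
    }
}
```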

Page 11: Distributed File Systems

HDFS - Architecture

• Blocks

• Data Nodes

• Name Node

• Client

[Figure: Client, NameNode, and DataNodes across Rack 1, Rack 2, Rack 3, …, Rack n]

Client
• read, write, create, delete
• files, directories
• Code library (Java)
• Exposes HDFS interface

Page 12: Distributed File Systems

HDFS - Workflow

• Write

• Read

[Figure: Client, NameNode, and DataNodes across Rack 1, Rack 2, Rack 3, …, Rack n]

1. Request block in namespace for file

2. Send block to a Datanode

3. Create First Replica

4. Create Second Replica

5. Get list of datanodes hosting file blocks

6. Get blocks in order

Single-writer, multiple reader model

B1: Dn1, Dn2, Dn3
B2: Dn2, Dn10, Dn30
B3: Dn3, Dn40, Dn2
…

Page 13: Distributed File Systems

HDFS - Workflow


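// Imports needed by both snippets
import java.io.*;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Write
Configuration configuration = new Configuration();
FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:54310"), configuration);
Path newFilePath = new Path("hdfs://localhost:54310/s2013/batch/table.html");

// create() returns an output stream; createNewFile() only returns a boolean
OutputStream os = hdfs.create(newFilePath);
BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8));
bw.write("Hello Hadoop!");
bw.close();

Read
Configuration configuration = new Configuration();
FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:54310"), configuration);
Path newFilePath = new Path("hdfs://localhost:54310/s2013/batch/table.html");

InputStreamReader isr = new InputStreamReader(hdfs.open(newFilePath), StandardCharsets.UTF_8);
BufferedReader br = new BufferedReader(isr);
String line = br.readLine();
br.close();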

Page 14: Distributed File Systems

HDFS - Architecture

• Blocks

• Data Nodes

• Name Node

• Client Node

• Secondary Node

[Figure: NameNode with a secondary node, DataNodes across Rack 1, Rack 2, Rack 3, …, Rack n, and a Client]

Secondary node
• Edit log
• System image

Is there any weakness in the architecture?

• Checkpoint Node
• Backup Node

Page 15: Distributed File Systems

HDFS - Features

• Block placement policy
• Replica management (@NameNode)
  • Optimize disk usage
  • Replication factor
  • Rack awareness
• Balancer (@NameNode)
  • Disk space usage
  • Moves replicas
• Block scanner (@DataNode)

How would you place the replicas?

[Figure: DataNode DN1 holding blocks B1, B2, B3, …, Bn; racks Rack 1, Rack 2, Rack 3, …, Rack n]

1. No DataNode contains more than one replica of any block.
2. No rack contains more than two replicas of the same block (if there are sufficient racks in the cluster).
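The two placement rules above can be checked mechanically. The Java sketch below validates a proposed set of replica locations for one block. It is an illustration only: the class names are hypothetical, and reading "sufficient racks" as "at least half as many racks as replicas" is an assumption made here, not something the slide states.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical value type: where one replica of a block lives. */
final class ReplicaLocation {
    final String rack;
    final String node;

    ReplicaLocation(String rack, String node) {
        this.rack = rack;
        this.node = node;
    }
}

/** Hypothetical checker for the two placement rules on the slide. */
final class PlacementCheck {
    static boolean isValid(List<ReplicaLocation> replicas, int racksInCluster) {
        Set<String> nodes = new HashSet<>();
        Map<String, Integer> perRack = new HashMap<>();
        for (ReplicaLocation r : replicas) {
            // Rule 1: no DataNode holds more than one replica of the block.
            if (!nodes.add(r.rack + "/" + r.node)) {
                return false;
            }
            perRack.merge(r.rack, 1, Integer::sum);
        }
        // Assumption: rule 2 only applies when two replicas per rack could hold them all.
        boolean sufficientRacks = 2 * racksInCluster >= replicas.size();
        if (!sufficientRacks) {
            return true;
        }
        // Rule 2: no rack holds more than two replicas of the block.
        for (int count : perRack.values()) {
            if (count > 2) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<ReplicaLocation> spread = Arrays.asList(
                new ReplicaLocation("rack1", "dn1"),
                new ReplicaLocation("rack2", "dn2"),
                new ReplicaLocation("rack2", "dn3"));
        List<ReplicaLocation> crowded = Arrays.asList(
                new ReplicaLocation("rack1", "dn1"),
                new ReplicaLocation("rack1", "dn2"),
                new ReplicaLocation("rack1", "dn3"));
        System.out.println(isValid(spread, 3));  // prints true
        System.out.println(isValid(crowded, 3)); // prints false
    }
}
```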

Page 16: Distributed File Systems

HDFS - Purpose

• Optimized
  • Large files
  • Commodity hardware
  • Enable streaming
  • Batch processing
  • Multiple reads

• Not optimized
  • Large numbers of small files
  • Concurrent modification
  • Arbitrary modification (appends only)
  • General-purpose applications

• Cross-platform: Java, Thrift, REST
• Web access, console …
• Open source

Page 17: Distributed File Systems

GFS - Purpose

• Large files (100 MB and more)
• Commodity hardware (failures are the norm)
• (Concurrently) appending files
• Sequentially reading files
• High data throughput

Page 18: Distributed File Systems

GFS - Architecture

[Figure: GFS architecture with a client, the master, and chunkservers; arrows are labeled as control flow or data flow]

Master
File1: chunk a, chunk c, chunk d, …
File2: chunk b, chunk x
…

Chunkserver 1: chunk a, chunk d, chunk k, …
Chunkserver 2: chunk b, chunk d, chunk e, …

Chunk: 64 MB of data

Page 19: Distributed File Systems

GFS – Workflow (Read)

Read
1. Client to master: File: File1, chunk index: 3
2. Master to client: Chunk d, location: CS 1, CS 2
3. Client to chunkserver: Chunk d, byte range: 1-1000
4. Chunkserver to client: Data: byte 1-1000

Master
File1: chunk a, chunk c, chunk d, …
File2: chunk b, chunk x
…

CS 1: chunk a, chunk d, chunk k, …
CS 2: chunk b, chunk d, chunk e, …
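Step 1 of the read relies on the client turning a byte offset into a chunk index, which is possible because the chunk size is fixed. A minimal Java sketch of that translation follows. The class and method names are hypothetical, and the index here is 0-based, whereas the slide's example appears to count chunks from 1.

```java
/** Hypothetical helper: translate a file byte offset into chunk coordinates. */
final class ChunkAddress {
    static final long CHUNK_SIZE = 64L * 1024L * 1024L; // 64 MB, as on the slide

    /** 0-based index of the chunk that contains the given byte offset. */
    static long chunkIndex(long byteOffset) {
        return byteOffset / CHUNK_SIZE;
    }

    /** Position of the byte inside its chunk. */
    static long offsetInChunk(long byteOffset) {
        return byteOffset % CHUNK_SIZE;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long offset = 150 * mb;
        System.out.println(chunkIndex(offset));         // prints 2
        System.out.println(offsetInChunk(offset) / mb); // prints 22
    }
}
```

The client sends the file name and this index to the master, then uses the offset within the chunk when it asks a chunkserver for the byte range.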

Page 20: Distributed File Systems

GFS – Leases


• CS may have leases for chunks

• Only CS with the lease for a chunk can modify it

• The master grants a chunk lease to one of the chunk's replicas

Does this affect CAP? And if so, how?

Page 21: Distributed File Systems

GFS – Workflow (Write)

[Figure: the master and chunkservers CS 1, CS 2, CS 3, each holding a replica of chunk c]

Master
File1: chunk a, chunk c, …
Chunk c: CS1, CS2, CS3

1. Client: Where is chunk c, and who has the lease?

2. Master: CS2, CS1, CS3

Page 22: Distributed File Systems

GFS – Workflow (Write)

3. Client: Pushes the data to the CSs, where it is stored in a buffer. The CSs acknowledge receiving the data.

4. Client: Sends the write request to the primary. The primary applies the write to itself.

Page 23: Distributed File Systems

GFS – Workflow (Write)

5. Primary: Forwards the write request to the secondaries. The secondaries apply the changes in the same order.

6. Secondaries: Indicate completion of the write.

Page 24: Distributed File Systems

GFS – Workflow (Write)

7. Primary: Returns the result of the operation to the client.

If there is an error, the client retries the write from step 3.

If the error persists, the client starts again from step 1.

Page 25: Distributed File Systems

GFS – Workflow (Atomic Record Append)

• Differences to Write:
3. Client pushes the data to the last chunk of the file
4. Client requests: record append
5. Primary checks whether the data fits in the chunk

Example
Chunk contents: 0100 0100 0110 1101
Trying to append: 0010 0001 1000
Data does not fit, so the chunk is padded: 0100 0100 0110 1101 0000 0000
Client tries again on the next chunk and succeeds: 0010 0001 1000
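The fit check in step 5 and the padding behaviour from the example can be sketched in Java as follows. This is a toy model of a single chunk on the primary, not GFS code; the class name, the method names, and the use of -1 to signal "padded, retry on the next chunk" are hypothetical. The sizes in `main` mirror the slide's example, counted in bits.

```java
/** Hypothetical model of one chunk that accepts record appends. */
final class AppendChunk {
    private final int capacity;
    private int used;

    AppendChunk(int capacity) {
        this.capacity = capacity;
    }

    /**
     * Appends a record if it fits and returns the offset it was written at.
     * If it does not fit, pads the chunk to its full size and returns -1,
     * which tells the client to retry on the next chunk.
     */
    long tryAppend(int recordLength) {
        if (recordLength > capacity - used) {
            used = capacity; // padding fills the rest of the chunk
            return -1L;
        }
        long offset = used;
        used += recordLength;
        return offset;
    }

    int used() {
        return used;
    }

    public static void main(String[] args) {
        AppendChunk last = new AppendChunk(24);
        last.tryAppend(16);                      // existing data: 16 of 24 units used
        System.out.println(last.tryAppend(12));  // prints -1: does not fit, chunk padded
        System.out.println(last.used());         // prints 24
        AppendChunk next = new AppendChunk(24);
        System.out.println(next.tryAppend(12));  // prints 0: retry succeeds at offset 0
    }
}
```

Because the primary picks the offset, concurrent appenders never need to coordinate with each other; the cost is the padded space at the end of some chunks.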

Page 26: Distributed File Systems

GFS – (In)consistency

• Inconsistency can be the result of any write request.

• "GFS does not guarantee that all replicas are bytewise identical"

• Inconsistency is handled by checksums

            Write         Record Append
Serial      Defined       Defined and partly inconsistent
Concurrent  Consistent    Defined and partly inconsistent
Failure     Inconsistent  Inconsistent

Page 27: Distributed File Systems

GFS – Garbage Collection

• Deleting files/chunks does not immediately free storage space:

[Figure: timeline with two events, Delete and GC]
Delete: on deletion, the file is hidden and renamed
GC: when the GC sees such a file, it is finally deleted

• GC also removes unreachable chunks

What are the advantages of GC?
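The two-phase deletion described above can be sketched in Java as follows. It is a toy model under stated assumptions: the class name, the `.deleted.` prefix, and the grace period passed to the constructor are hypothetical, and time is supplied by the caller. The `undelete` method reflects the GFS paper's remark that a hidden file can still be restored by renaming it back.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of lazy deletion followed by garbage collection. */
final class LazyDeleteSketch {
    static final String HIDDEN_PREFIX = ".deleted."; // hypothetical naming scheme

    private final Set<String> liveFiles = new HashSet<>();
    private final Map<String, Long> hiddenSinceMs = new HashMap<>();
    private final long gracePeriodMs;

    LazyDeleteSketch(long gracePeriodMs) {
        this.gracePeriodMs = gracePeriodMs;
    }

    void create(String name) {
        liveFiles.add(name);
    }

    boolean exists(String name) {
        return liveFiles.contains(name);
    }

    /** Deletion only hides and renames the file; nothing is reclaimed yet. */
    void delete(String name, long nowMs) {
        if (liveFiles.remove(name)) {
            hiddenSinceMs.put(HIDDEN_PREFIX + name, nowMs);
        }
    }

    /** Until the GC has run, a hidden file can be restored by renaming it back. */
    boolean undelete(String name) {
        if (hiddenSinceMs.remove(HIDDEN_PREFIX + name) != null) {
            liveFiles.add(name);
            return true;
        }
        return false;
    }

    /** One GC pass: drops hidden files older than the grace period, returns how many. */
    int collect(long nowMs) {
        int before = hiddenSinceMs.size();
        hiddenSinceMs.values().removeIf(hiddenAt -> nowMs - hiddenAt >= gracePeriodMs);
        return before - hiddenSinceMs.size();
    }

    public static void main(String[] args) {
        LazyDeleteSketch fs = new LazyDeleteSketch(1_000L);
        fs.create("a");
        fs.delete("a", 0L);
        System.out.println(fs.collect(999L));   // prints 0: still within the grace period
        System.out.println(fs.collect(1_000L)); // prints 1: now reclaimed
    }
}
```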

Page 28: Distributed File Systems

GFS – Replicas


Creation

• Even disk space usage on CS

• Limited # of recent creations per CS

Re-replication

• If # of replicas < threshold, master creates new replicas

Rebalancing

• Improve disk space and load balancing
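The re-replication rule above can be sketched as a small selection routine: find every chunk whose live replica count is below the target and handle the most endangered ones first. Ordering by fewest replicas is one of the priority factors the GFS paper mentions; other factors are ignored here. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical selection of chunks that need new replicas. */
final class ReReplication {
    /** Ids of chunks with fewer live replicas than target, fewest replicas first. */
    static List<String> chunksToReplicate(Map<String, Integer> liveReplicas, int target) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : liveReplicas.entrySet()) {
            if (entry.getValue() < target) {
                result.add(entry.getKey());
            }
        }
        // Fewest replicas first; ties are broken by chunk id for a stable order.
        result.sort((a, b) -> {
            int byCount = Integer.compare(liveReplicas.get(a), liveReplicas.get(b));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> live = new LinkedHashMap<>();
        live.put("chunk a", 3);
        live.put("chunk b", 1);
        live.put("chunk c", 2);
        System.out.println(chunksToReplicate(live, 3)); // prints [chunk b, chunk c]
    }
}
```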

Page 29: Distributed File Systems

HDFS vs. GFS

Similarities

• Large files

• Write once, read multiple times

• Appending of files

• Streaming of files

• Focus on AP (of CAP)

Differences

• Block id vs. Chunk index

• Concurrently appending to files

• Open source vs. Closed source

• Namenode vs. Master fail

• Permissions


Page 30: Distributed File Systems

References

• HDFS
  • K. Shvachko, H. Kuang, S. Radia, R. Chansler. The Hadoop Distributed File System. 2010
  • Thomas Kiencke. Hadoop Distributed File System (HDFS)
  • http://www.sas.com/en_us/insights/big-data/hadoop.html

• GFS
  • S. Ghemawat, H. Gobioff, S. Leung. The Google File System. 2003

• Comparison
  • R. Vijayakumari, R. Kirankumar, K. Gangadhara Rao. Comparative analysis of Google File System and Hadoop Distributed File System. 2014