sangmi lee pallickara - colorado state universitycs435/slides/week13-a-2.pdf · 2019-11-25 ·...
Post on 27-Jun-2020
CS435 Introduction to Big Data, Fall 2019, Colorado State University
11/18/2019 Week 13-A, Sangmi Lee Pallickara
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.0
CS435 Introduction to Big Data
PART 2. LARGE SCALE DATA STORAGE SYSTEMS
DISTRIBUTED FILE SYSTEMS
Sangmi Lee Pallickara
Computer Science, Colorado State University
http://www.cs.colostate.edu/~cs435
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.1
FAQs
• Term project presentations
  • 12/9 (team 1-6), 12/11 (team 6-12), 12/13 (team 13-16)
  • Please attend at least 2 presentation sessions and ask questions or provide comments
    • Participation score (attendance + question)
  • 12 minutes (including transition time) per team
  • Submit your slides (no PDF!) on Canvas
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.2
Today’s topics
• Google File System
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.3
Inconsistent Regions
[Figure: three replicas of a chunk each hold Data 1 and Data 2. The append of Data 3 reaches two replicas and fails on the third, which is left with an empty region. The user will re-try to store Data 3, and the retried Data 3 is then appended to all three replicas.]
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.4
What if record append fails at one of the replicas?
• Client must retry the operation
• Replicas of the same chunk may contain
  • Different data
  • Duplicates of the same record, in whole or in part
• Replicas of chunks are not bit-wise identical!
  • In most systems, replicas are identical
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.5
GFS only guarantees that the data will be written at least once as an atomic unit
• For an operation to return success
  • Data must be written at the same offset on all the replicas
• After the write, all replicas are at least as long as the end of the record
  • Any future record will be assigned a higher offset or a different chunk
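The at-least-once behavior can be illustrated with a small simulation. This is a minimal sketch, not GFS code: the `Replica` class, the `try_append` and `record_append` helpers, and the zero-byte padding are all assumptions made for illustration. It replays the scenario from the "Inconsistent Regions" figure, where the first attempt to append the third record fails at one replica and the client retries.

```python
class Replica:
    """One replica of a chunk, modeled as a growable byte buffer (hypothetical)."""

    def __init__(self):
        self.data = bytearray()

    def write_at(self, offset, record):
        # Pad with zero bytes if this replica is shorter than the chosen offset.
        if len(self.data) < offset:
            self.data.extend(b"\x00" * (offset - len(self.data)))
        self.data[offset:offset + len(record)] = record


def try_append(replicas, record, failing=()):
    """One append attempt. Returns the offset on success, None on failure."""
    # The offset is the end of the longest replica, so it is the same everywhere.
    offset = max(len(r.data) for r in replicas)
    ok = True
    for i, replica in enumerate(replicas):
        if i in failing:
            ok = False  # this replica misses the write
            continue
        replica.write_at(offset, record)
    return offset if ok else None


def record_append(replicas, record, failures_per_attempt=()):
    """Client-side retry loop: repeat until every replica holds the record."""
    attempt = 0
    while True:
        failing = failures_per_attempt[attempt] if attempt < len(failures_per_attempt) else ()
        offset = try_append(replicas, record, failing)
        if offset is not None:
            return offset
        attempt += 1


replicas = [Replica() for _ in range(3)]
record_append(replicas, b"D1")
record_append(replicas, b"D2")
# The first attempt to append D3 fails at replica 2; the retry succeeds everywhere.
offset = record_append(replicas, b"D3", failures_per_attempt=[(2,)])
```

After the retry, the record sits at the same offset on every replica, but two replicas also carry a duplicate from the failed attempt and the third carries padding, so the replicas are not bit-wise identical.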
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.6
GFS client code implements the file system API
• Communications with master and chunk servers are done transparently
  • On behalf of apps that read or write data
• Interact with master for metadata
• Data-bearing communications go directly to chunk servers
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.7
Handling failed “write” in Hadoop HDFS [1/2]
• Different from GFS
1. Pipeline is closed
  • Any packets in the ack queue are added to the front of the data queue
  • Datanodes that are downstream from the failed node will not miss any packets
2. The current block on the good datanodes is given a new identity
  • Reports to the namenode
  • To detect and delete the partial block on the failed datanode later on
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.8
Handling failed “write” in Hadoop HDFS [2/2]
3. Remainder of the block’s data is written to the other good datanodes in the pipeline
4. Namenode notices the block is under-replicated
  • It arranges for a further replica to be created on another node
• Write quorum
  • dfs.replication.min (defaults to 1)
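The queue handling in steps 1 and 3 can be sketched as follows. This is an illustrative model only, not the HDFS client implementation: `recover_pipeline`, `block_state`, the packet names, the datanode names, and the target replication of 3 are assumptions introduced for the example.

```python
from collections import deque


def recover_pipeline(data_queue, ack_queue, pipeline, failed_node):
    """Move unacknowledged packets back and drop the failed datanode."""
    # Step 1: packets still waiting for acks go to the FRONT of the data queue,
    # keeping their original order, so no downstream datanode misses a packet.
    while ack_queue:
        data_queue.appendleft(ack_queue.pop())
    # Step 3: the remainder of the block goes to the remaining good datanodes.
    return [node for node in pipeline if node != failed_node]


def block_state(live_replicas, min_replication=1, target_replication=3):
    """Write succeeds at min_replication; fewer than the target is under-replicated."""
    return {
        "write_ok": live_replicas >= min_replication,
        "under_replicated": live_replicas < target_replication,
    }


data_queue = deque(["p4", "p5"])   # not yet sent
ack_queue = deque(["p2", "p3"])    # sent, but not yet acknowledged
pipeline = recover_pipeline(data_queue, ack_queue, ["dn1", "dn2", "dn3"], "dn2")
state = block_state(len(pipeline))
```

With two good datanodes left, the write still counts as successful under a minimum replication of 1, and the block is flagged as under-replicated so that a further replica can be created later.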
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.9
Part 2. Large scale data storage system
Distributed File System
Google File System (GFS): Creating Snapshot
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.10
Snapshots
• Copying file or directory tree almost instantaneously
• Minimizing any interruptions of ongoing mutations
• Providing checkpoint mechanism
  • Users can commit later
  • Rollback
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.11
Snapshots allow you to make a copy of a file very fast
1. Master revokes outstanding leases for any chunks of the file (source) to be snapshotted
2. Log the operation to disk
3. Update in-memory state
  • Duplicate metadata of the source file
4. Newly created file points to the “same chunks” as the source
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.12
When a client wants to write to a chunk C after the snapshot operation
• Master sees “the reference count to C” > 1
  • Pick new chunk-handle C’
• Ask each chunk server with a current replica of C
  • Create new chunk C’
  • Data is copied locally, not over the network
• From this point, chunk handling of C’ is no different
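The snapshot and the copy-on-write step can be sketched with reference counts. This is a minimal sketch under simplifying assumptions: the `Master` class below is hypothetical, it keeps chunk contents in a dictionary as a stand-in for chunk servers, and it ignores leases and logging.

```python
import itertools


class Master:
    """Toy master that tracks files, chunk handles, and reference counts."""

    def __init__(self):
        self.files = {}        # pathname -> list of chunk handles
        self.refcount = {}     # chunk handle -> number of files pointing at it
        self.chunk_data = {}   # chunk handle -> bytes (stand-in for chunk servers)
        self._handles = itertools.count(1)

    def create(self, path, chunks):
        handles = []
        for data in chunks:
            handle = next(self._handles)
            self.chunk_data[handle] = data
            self.refcount[handle] = 1
            handles.append(handle)
        self.files[path] = handles

    def snapshot(self, src, dst):
        # Only metadata is duplicated; the new file points to the same chunks.
        self.files[dst] = list(self.files[src])
        for handle in self.files[dst]:
            self.refcount[handle] += 1

    def write(self, path, index, data):
        handle = self.files[path][index]
        if self.refcount[handle] > 1:
            # Copy-on-write: pick a new handle C' and copy C before mutating.
            new_handle = next(self._handles)
            self.chunk_data[new_handle] = self.chunk_data[handle]
            self.refcount[handle] -= 1
            self.refcount[new_handle] = 1
            self.files[path][index] = new_handle
            handle = new_handle
        self.chunk_data[handle] = data


master = Master()
master.create("/a", [b"x", b"y"])
master.snapshot("/a", "/snap")
master.write("/a", 0, b"z")   # first write after the snapshot triggers the copy
```

Only the chunk that is written gets copied; the untouched second chunk is still shared by both files.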
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.13
GFS does not have a per-directory structure that lists files in the directory
• Namespace represented as a lookup table
  • Maps full pathnames to metadata
• Each node has an associated read/write lock
• File creation does not require a lock on the directory structure
  • No inode needs to be protected from modification
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.14
Each master operation acquires a set of locks before it runs
• Read lock prevents a directory/file from being deleted, renamed, or snapshotted
  • Others can still read it
  • Others CANNOT mutate (write/append/delete) this file
• Write lock on directory/file names serializes attempts to create a file with the same name twice
  • Others CANNOT mutate (write/append/delete) this file
  • Others CANNOT read this file
• If operation involves /d1/d2/…/dn/leaf
  • Acquire read locks on directory names
    • /d1, /d1/d2, …, /d1/d2/…/dn
  • Read or write lock on full pathname
    • /d1/d2/…/dn/leaf
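The locking rule can be sketched as a function that computes the lock set for a pathname, plus a check for whether two lock sets can be held at the same time. The helper names `lock_set` and `conflicts` are hypothetical, and the sketch models only which locks are requested, not how the master waits for them.

```python
def lock_set(path, leaf_mode):
    """Locks for an operation on `path`: read locks ("r") on every ancestor
    directory name, plus a read or write ("w") lock on the full pathname."""
    parts = path.strip("/").split("/")
    locks = {}
    for i in range(1, len(parts)):
        locks["/" + "/".join(parts[:i])] = "r"
    locks["/" + "/".join(parts)] = leaf_mode
    return locks


def conflicts(locks_a, locks_b):
    """Names on which the two lock sets cannot be held simultaneously."""
    return sorted(
        name
        for name in locks_a.keys() & locks_b.keys()
        if "w" in (locks_a[name], locks_b[name])
    )


read_file = lock_set("/mybooks/les_miserable", "r")
create_other = lock_set("/mybooks/pride_and_prejudice", "w")
create_same = lock_set("/mybooks/les_miserable", "w")
delete_dir = lock_set("/mybooks", "w")

no_conflict = conflicts(read_file, create_other)
same_name = conflicts(create_same, create_same)
dir_delete = conflicts(create_same, delete_dir)
```

Two read locks on the same name never conflict, so a read and a file creation in the same directory can proceed concurrently; any shared name with at least one write lock forces the operations to be serialized.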
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.15
Each master operation acquires a set of locks before it runs: Example
• Person A is reading a file mybooks/les_miserable
• Person B is trying to read a file mybooks/les_miserable
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.16
Each master operation acquires a set of locks before it runs: Example
• Person A is reading a file mybooks/les_miserable
  • mybooks/: read lock
  • mybooks/les_miserable: read lock
• Person B is trying to read a file mybooks/les_miserable
  • mybooks/: read lock
  • mybooks/les_miserable: read lock
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.17
Each master operation acquires a set of locks before it runs: Example
• Person A is reading a file mybooks/les_miserable
• Person B is trying to create a file mybooks/pride_and_prejudice
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.18
Each master operation acquires a set of locks before it runs: Example
• Person A is reading a file mybooks/les_miserable
  • mybooks/: read lock
  • mybooks/les_miserable: read lock
• Person B is trying to create a file mybooks/pride_and_prejudice
  • mybooks/: read lock
  • mybooks/pride_and_prejudice: write lock
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.19
Each master operation acquires a set of locks before it runs: Example
• Person A is trying to create a file mybooks/les_miserable
• Person B is trying to create a file mybooks/les_miserable
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.20
Each master operation acquires a set of locks before it runs: Example
• Person A is trying to create a file mybooks/les_miserable
  • mybooks/: read lock
  • mybooks/les_miserable: write lock
• Person B is trying to create a file mybooks/les_miserable
  • mybooks/: read lock
  • mybooks/les_miserable: write lock
• Write locks will serialize the actions
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.21
Each master operation acquires a set of locks before it runs: Example
• Person A is trying to create a file mybooks/les_miserable
• Person B is trying to delete a directory mybooks/
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.22
Each master operation acquires a set of locks before it runs: Example
• Person A is trying to create a file mybooks/les_miserable
  • mybooks/: read lock
  • mybooks/les_miserable: write lock
• Person B is trying to delete a directory mybooks/
  • mybooks/: write lock
• Write locks will serialize the actions
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.23
Locks are used to prevent operations during snapshots
Q: How do we prevent creating a file /home/user/foo while a directory /home/user/ is being snapshotted to /save/user/?
• /home/user is being snapshotted to /save/user
  • Read locks on /home and /save
  • Write locks on /home/user and /save/user
• To create file /home/user/foo
  • Read locks on /home and /home/user
  • Write lock on /home/user/foo
• The two operations will be serialized
  • because they try to obtain conflicting locks on /home/user
• File creation does not require a write lock on the parent directory … there is no “directory”
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.24
Part 2. Large scale data storage system
Distributed File System
Google File System (GFS): Deletion of files and garbage collection
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.25
Garbage collection in GFS
• After a file is deleted, GFS does not reclaim space immediately
• Done lazily during garbage collection at the file and chunk levels
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.26
Master logs a file’s deletion immediately
• File is renamed to a hidden name
  • Includes deletion timestamp
• Master scans the file system namespace
  • Delete if hidden file has existed for more than 3 days
• When file is removed from namespace
  • In-memory metadata is also removed
  • Severs links to all its chunks!
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.27
Garbage collection: When Master scans its chunk namespace
• Identifies orphaned chunks
  • Not reachable from any file
• Erase metadata for these chunks
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.28
The role of heart-beats in garbage collection
• Chunk server reports a subset of the chunks it currently has
• Master replies with the identity of chunks no longer present in its metadata
  • Chunk server is free to delete its replicas of such chunks
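The three stages of lazy deletion (hidden rename, namespace scan, orphaned-chunk scan) and the heartbeat exchange can be sketched together. This is a toy model, not GFS code: the `Master` class, the hidden-name format, and the use of integer timestamps in seconds are assumptions made for illustration.

```python
GRACE_SECONDS = 3 * 24 * 3600  # the 3-day interval mentioned on the slides


class Master:
    """Toy master holding the file namespace and the chunk namespace."""

    def __init__(self):
        self.files = {}      # pathname -> list of chunk handles
        self.hidden = {}     # hidden name -> (deletion time, chunk handles)
        self.chunks = set()  # chunk handles the master has metadata for

    def create(self, path, handles):
        self.files[path] = list(handles)
        self.chunks.update(handles)

    def delete(self, path, now):
        # Rename to a hidden name that includes the deletion timestamp.
        self.hidden[f"{path}.deleted.{now}"] = (now, self.files.pop(path))

    def scan_namespace(self, now):
        # Drop hidden files that have existed for more than the grace period.
        expired = [name for name, (deleted_at, _) in self.hidden.items()
                   if now - deleted_at > GRACE_SECONDS]
        for name in expired:
            del self.hidden[name]

    def scan_chunks(self):
        # Orphaned chunks are not reachable from any file; erase their metadata.
        reachable = {h for handles in self.files.values() for h in handles}
        reachable |= {h for _, handles in self.hidden.values() for h in handles}
        self.chunks &= reachable

    def heartbeat(self, reported):
        # Reply with reported chunks that are no longer in the master's metadata.
        return set(reported) - self.chunks


master = Master()
master.create("/a", [1, 2])
master.create("/b", [3])
master.delete("/a", now=0)

master.scan_namespace(now=3600)
master.scan_chunks()
early_reply = master.heartbeat({1, 2, 3})   # still within the grace period

master.scan_namespace(now=4 * 24 * 3600)
master.scan_chunks()
late_reply = master.heartbeat({1, 2, 3})    # chunks 1 and 2 may now be deleted
```

Until the grace period expires the hidden file still references its chunks, so nothing is reclaimed and the deletion could be undone by renaming the file back.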
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.29
Stale chunks and issues
• If a chunk server fails
  • AND misses mutations to the chunk
  • The chunk replica becomes stale
• Working with a stale replica causes problems with:
  • Correctness
  • Consistency
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.30
Aiding the detection of stale chunks
• Master maintains a chunk version number for each chunk
  • Distinguish between stale and up-to-date chunks
• When master grants a new lease on chunk
  • Increase version number
  • Inform replicas
  • Record new version persistently
  • Occurs BEFORE any client can write to the chunk
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.31
If a replica is unavailable its version number will not be advanced
• When a chunk server restarts, it reports to the Master with the following:
  • Set of chunks
  • Corresponding version numbers
• Used to detect stale replicas
• Remove stale replicas in regular garbage collection
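Version-based stale detection can be sketched as follows. The `Master` class and the modeling of each chunk server as a plain dictionary from chunk handle to version are assumptions made for this example; persistence and the lease itself are left out.

```python
class Master:
    """Toy master that tracks the latest version number of each chunk."""

    def __init__(self):
        self.version = {}  # chunk handle -> latest version number

    def grant_lease(self, handle, live_replicas):
        # The version is bumped before any client can write; only replicas
        # that are reachable right now learn the new number.
        self.version[handle] = self.version.get(handle, 0) + 1
        for replica in live_replicas:
            replica[handle] = self.version[handle]

    def stale_chunks(self, report):
        """`report` maps chunk handle -> version held by a restarting server."""
        return sorted(h for h, v in report.items() if v < self.version.get(h, 0))


master = Master()
server1, server2, server3 = {}, {}, {}

master.grant_lease("c1", [server1, server2, server3])  # version 1 everywhere
master.grant_lease("c1", [server1, server2])           # server3 is down: version 2

stale_on_restart = master.stale_chunks(server3)
up_to_date = master.stale_chunks(server1)
```

Because the unavailable server never had its version advanced, its report on restart carries the older number and the replica can be removed in regular garbage collection.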
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.32
Additional safeguards against stale replicas
• Include chunk version number
  • When client requests chunk information
  • Client/chunk server verify version to make sure things are up-to-date
• During cloning operations
  • Clone the most up-to-date chunk
• Clients and chunk servers are expected to verify versioning information
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.33
Part 2. Large scale data storage system
Distributed File System
Google File System (GFS): Data Integrity
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.34
Data Integrity
• Impractical to detect chunk corruption by comparing replicas
  • Replicas are not bytewise identical in any case!
• Detection of corruption should be self-contained
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.35
Data Integrity
• Break chunks into 64 KB data blocks
• Compute 32-bit checksum for each block
  • Keep in chunk server memory
  • Store persistently, separate from the data
• Verify checksums of data blocks that overlap read range
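A chunk server's per-block verification on reads can be sketched as below. The slides state only that the checksum is 32 bits; using CRC-32 via `zlib.crc32` is an assumption, as are the helper names `block_checksums` and `verified_read`.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB blocks, as on the slide


def block_checksums(chunk):
    """One 32-bit checksum per 64 KB block of the chunk."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk), BLOCK_SIZE)]


def verified_read(chunk, checksums, offset, length):
    """Verify every block that overlaps the read range, then return the bytes."""
    if length <= 0:
        return b""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for block_index in range(first, last + 1):
        start = block_index * BLOCK_SIZE
        block = chunk[start:start + BLOCK_SIZE]
        if zlib.crc32(block) != checksums[block_index]:
            raise IOError(f"checksum mismatch in block {block_index}")
    return chunk[offset:offset + length]


chunk = bytes(range(256)) * 1024          # 256 KB of sample data: 4 blocks
sums = block_checksums(chunk)
good = verified_read(chunk, sums, 65_530, 10)   # spans blocks 0 and 1

corrupted = bytearray(chunk)
corrupted[70_000] ^= 0xFF                 # flip bits inside block 1
try:
    verified_read(bytes(corrupted), sums, 65_530, 10)
    detected = False
except IOError:
    detected = True
```

A read that touches only block 0 of the corrupted chunk still succeeds, since only the blocks overlapping the read range are checked.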
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.36
Part 2. Large scale data storage system
Distributed File System
Google File System (GFS): Inefficiencies
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.37
The master server is a single point of failure
• Master server restart takes several seconds
• Shadow servers exist
  • Can handle reads of files in place of the master
  • But not writes
• Requires a massive main memory
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.38
The system is optimized for large files
• But not for a very large number of very small files
• Primary operation on files
  • Long, sequential reads/writes
  • Large number of random overwrites will clog things up quite a bit
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.39
Consistency Issues: GFS expects clients to resolve inconsistencies
• File chunks may have gaps or duplicates of some records
  • The client has to be able to deal with this
• Imagine doing this for a scientific application
  • Portions of a massive array are corrupted
  • Clients would have to detect this
• HDFS does NOT have this problem
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.40
Security model
• None
  • Operation is expected to be in a trusted environment
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.41
Part 2. Large scale data storage system
Distributed File System
Google File System II: Colossus
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.42
Storage Software: Colossus (GFS2)
• Next-generation cluster-level file system
• Automatically sharded metadata layer
• Distributed Masters (64 MB block size → 1 MB)
• Data typically written using Reed-Solomon (1.5x)
• Client-driven replication, encoding and replication
• Metadata space has enabled availability
• Why Reed-Solomon?
  • Cost
    • Especially with cross cluster replication
  • More flexible cost vs. availability choices
• Google File System II: Dawn of the Multiplying Master Nodes, http://www.theregister.co.uk/2009/08/12/google_file_system_part_deux/?page=1
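The cost argument comes down to simple arithmetic on storage overhead. The sketch below is illustrative: the function name `storage_overhead` is hypothetical, the 4+2 layout is just one layout that yields 1.5x, and 3-way replication is assumed as the comparison point.

```python
def storage_overhead(data_pieces, parity_pieces):
    """Raw bytes stored per byte of user data for a data+parity layout."""
    return (data_pieces + parity_pieces) / data_pieces


# 4 data pieces + 2 parity pieces: survives the loss of any 2 pieces.
rs_overhead = storage_overhead(4, 2)

# 3-way replication viewed as 1 original + 2 extra copies: also survives 2 losses.
replication_overhead = storage_overhead(1, 2)
```

Under these assumptions both layouts tolerate two lost pieces, but the Reed-Solomon layout stores 1.5 bytes per user byte while replication stores 3.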
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.43
Reed-Solomon Codes
• Block-based error correcting codes
• Digital communication and storage
  • Storage devices (including tape, CD, DVD, barcodes, etc.)
  • Wireless or mobile communications
  • Satellite communications
  • Digital TV
  • High-speed modems
SOURCE: https://en.wikiversity.org/wiki/Reed–Solomon_codes_for_coders
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.44
What does the R-S code do?
• Takes a block of digital data
• Adds extra “redundant” bits
• If an error happens, the R-S decoder processes each block and recovers original data
[Figure: Data source → Reed-Solomon Encoder → Communication channel or storage devices (Noise, Errors) → Reed-Solomon Decoder → Data sink]
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.45
A Quick Example of the R-S encoding
• 4+2 coding
• Original files are broken into 4 pieces
• 2 parity pieces are added
• First piece of data “ABCD”, second piece of data “EFGH”…
Original Data:
$$
\begin{bmatrix} A & B & C & D \\ E & F & G & H \\ I & J & K & L \\ M & N & O & P \end{bmatrix}
$$
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.46
A Quick Example of the R-S encoding
• Applying coding matrix
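$$
\begin{bmatrix}
\mathtt{01} & \mathtt{00} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{01} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{00} & \mathtt{01} & \mathtt{00} \\
\mathtt{00} & \mathtt{00} & \mathtt{00} & \mathtt{01} \\
\mathtt{1b} & \mathtt{1c} & \mathtt{12} & \mathtt{14} \\
\mathtt{1c} & \mathtt{1b} & \mathtt{14} & \mathtt{12}
\end{bmatrix}
\times
\begin{bmatrix} A & B & C & D \\ E & F & G & H \\ I & J & K & L \\ M & N & O & P \end{bmatrix}
=
\begin{bmatrix}
A & B & C & D \\
E & F & G & H \\
I & J & K & L \\
M & N & O & P \\
\mathtt{51} & \mathtt{52} & \mathtt{53} & \mathtt{49} \\
\mathtt{55} & \mathtt{56} & \mathtt{57} & \mathtt{25}
\end{bmatrix}
$$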
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.47
A Quick Example of the R-S encoding
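• Data loss
  • 2 of 6 rows are lost (here, the rows I J K L and M N O P)
$$
\begin{bmatrix}
\mathtt{01} & \mathtt{00} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{01} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{00} & \mathtt{01} & \mathtt{00} \\
\mathtt{00} & \mathtt{00} & \mathtt{00} & \mathtt{01} \\
\mathtt{1b} & \mathtt{1c} & \mathtt{12} & \mathtt{14} \\
\mathtt{1c} & \mathtt{1b} & \mathtt{14} & \mathtt{12}
\end{bmatrix}
\times
\begin{bmatrix} A & B & C & D \\ E & F & G & H \\ I & J & K & L \\ M & N & O & P \end{bmatrix}
=
\begin{bmatrix}
A & B & C & D \\
E & F & G & H \\
I & J & K & L \\
M & N & O & P \\
\mathtt{51} & \mathtt{52} & \mathtt{53} & \mathtt{49} \\
\mathtt{55} & \mathtt{56} & \mathtt{57} & \mathtt{25}
\end{bmatrix}
$$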
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.48
A Quick Example of the R-S encoding
• Without 2 rows
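$$
\begin{bmatrix}
\mathtt{01} & \mathtt{00} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{01} & \mathtt{00} & \mathtt{00} \\
\mathtt{1b} & \mathtt{1c} & \mathtt{12} & \mathtt{14} \\
\mathtt{1c} & \mathtt{1b} & \mathtt{14} & \mathtt{12}
\end{bmatrix}
\times
\begin{bmatrix} A & B & C & D \\ E & F & G & H \\ I & J & K & L \\ M & N & O & P \end{bmatrix}
=
\begin{bmatrix}
A & B & C & D \\
E & F & G & H \\
\mathtt{51} & \mathtt{52} & \mathtt{53} & \mathtt{49} \\
\mathtt{55} & \mathtt{56} & \mathtt{57} & \mathtt{25}
\end{bmatrix}
$$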
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.49
A Quick Example of the R-S encoding
• Multiplying each side with the inverted matrix
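$$
\begin{bmatrix}
\mathtt{01} & \mathtt{00} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{01} & \mathtt{00} & \mathtt{00} \\
\mathtt{8d} & \mathtt{f6} & \mathtt{7b} & \mathtt{01} \\
\mathtt{f6} & \mathtt{8d} & \mathtt{01} & \mathtt{7b}
\end{bmatrix}
\times
\begin{bmatrix}
\mathtt{01} & \mathtt{00} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{01} & \mathtt{00} & \mathtt{00} \\
\mathtt{1b} & \mathtt{1c} & \mathtt{12} & \mathtt{14} \\
\mathtt{1c} & \mathtt{1b} & \mathtt{14} & \mathtt{12}
\end{bmatrix}
\times
\begin{bmatrix} A & B & C & D \\ E & F & G & H \\ I & J & K & L \\ M & N & O & P \end{bmatrix}
=
\begin{bmatrix}
\mathtt{01} & \mathtt{00} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{01} & \mathtt{00} & \mathtt{00} \\
\mathtt{8d} & \mathtt{f6} & \mathtt{7b} & \mathtt{01} \\
\mathtt{f6} & \mathtt{8d} & \mathtt{01} & \mathtt{7b}
\end{bmatrix}
\times
\begin{bmatrix}
A & B & C & D \\
E & F & G & H \\
\mathtt{51} & \mathtt{52} & \mathtt{53} & \mathtt{49} \\
\mathtt{55} & \mathtt{56} & \mathtt{57} & \mathtt{25}
\end{bmatrix}
$$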
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.50
A Quick Example of the R-S encoding
• The Inverse Matrix and the Coding Matrix Cancel Out
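$$
\underbrace{
\begin{bmatrix}
\mathtt{01} & \mathtt{00} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{01} & \mathtt{00} & \mathtt{00} \\
\mathtt{8d} & \mathtt{f6} & \mathtt{7b} & \mathtt{01} \\
\mathtt{f6} & \mathtt{8d} & \mathtt{01} & \mathtt{7b}
\end{bmatrix}
\times
\begin{bmatrix}
\mathtt{01} & \mathtt{00} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{01} & \mathtt{00} & \mathtt{00} \\
\mathtt{1b} & \mathtt{1c} & \mathtt{12} & \mathtt{14} \\
\mathtt{1c} & \mathtt{1b} & \mathtt{14} & \mathtt{12}
\end{bmatrix}
}_{\text{cancel out (identity)}}
\times
\begin{bmatrix} A & B & C & D \\ E & F & G & H \\ I & J & K & L \\ M & N & O & P \end{bmatrix}
=
\begin{bmatrix}
\mathtt{01} & \mathtt{00} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{01} & \mathtt{00} & \mathtt{00} \\
\mathtt{8d} & \mathtt{f6} & \mathtt{7b} & \mathtt{01} \\
\mathtt{f6} & \mathtt{8d} & \mathtt{01} & \mathtt{7b}
\end{bmatrix}
\times
\begin{bmatrix}
A & B & C & D \\
E & F & G & H \\
\mathtt{51} & \mathtt{52} & \mathtt{53} & \mathtt{49} \\
\mathtt{55} & \mathtt{56} & \mathtt{57} & \mathtt{25}
\end{bmatrix}
$$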
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.51
A Quick Example of the R-S encoding
• Reconstructing the Original Data
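$$
\begin{bmatrix} A & B & C & D \\ E & F & G & H \\ I & J & K & L \\ M & N & O & P \end{bmatrix}
=
\begin{bmatrix}
\mathtt{01} & \mathtt{00} & \mathtt{00} & \mathtt{00} \\
\mathtt{00} & \mathtt{01} & \mathtt{00} & \mathtt{00} \\
\mathtt{8d} & \mathtt{f6} & \mathtt{7b} & \mathtt{01} \\
\mathtt{f6} & \mathtt{8d} & \mathtt{01} & \mathtt{7b}
\end{bmatrix}
\times
\begin{bmatrix}
A & B & C & D \\
E & F & G & H \\
\mathtt{51} & \mathtt{52} & \mathtt{53} & \mathtt{49} \\
\mathtt{55} & \mathtt{56} & \mathtt{57} & \mathtt{25}
\end{bmatrix}
$$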
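The encode, lose-two-rows, invert, and reconstruct steps can be reproduced end to end in a few lines. This is a sketch under stated assumptions, not the encoder used in the slides: arithmetic is in GF(2^8) with the reduction polynomial 0x11d (an assumption), and the two parity rows come from a Cauchy construction chosen for this example because any four rows of the resulting coding matrix are invertible. The parity values it produces are therefore not expected to match the hex values shown above.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse as a^254, since a^255 == 1 for nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def mat_mul(A, B):
    """Matrix product over GF(2^8); addition is XOR."""
    out = [[0] * len(B[0]) for _ in A]
    for i, row in enumerate(A):
        for j in range(len(B[0])):
            acc = 0
            for k, value in enumerate(row):
                acc ^= gf_mul(value, B[k][j])
            out[i][j] = acc
    return out


def mat_inv(M):
    """Gauss-Jordan inversion over GF(2^8)."""
    n = len(M)
    aug = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = gf_inv(aug[col][col])
        aug[col] = [gf_mul(scale, v) for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [v ^ gf_mul(factor, p) for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


K, PARITY = 4, 2
identity = [[int(i == j) for j in range(K)] for i in range(K)]
# Assumed Cauchy parity rows: entry (i, j) is 1 / ((K + i) XOR j).
parity_rows = [[gf_inv((K + i) ^ j) for j in range(K)] for i in range(PARITY)]
coding = identity + parity_rows

data = [list(b"ABCD"), list(b"EFGH"), list(b"IJKL"), list(b"MNOP")]
encoded = mat_mul(coding, data)            # 6 rows: 4 data + 2 parity

surviving = [0, 1, 4, 5]                   # rows 2 and 3 (IJKL, MNOP) are lost
sub_matrix = [coding[r] for r in surviving]
recovered = mat_mul(mat_inv(sub_matrix), [encoded[r] for r in surviving])
```

The first four encoded rows equal the original data because the top of the coding matrix is the identity, and multiplying the surviving rows by the inverse of the matching coding rows returns all four original rows.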
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.52
Properties of Reed-Solomon codes
• RS(n, k) with s-bit symbols
  • Encoder takes k data symbols (blocks) of s bits each
  • Adds parity symbols to make an n-symbol code word
  • There are n-k parity symbols of s bits each
• A Reed-Solomon decoder can correct up to t symbols that contain errors in a code word, where 2t = n-k
  • t = (n-k)/2 for n-k even
  • t = (n-k-1)/2 for n-k odd
[Figure: an n-symbol code word made of k data symbols followed by 2t parity symbols]
11/18/2019 CS435 Introduction to Big Data – Fall 2019 W13.A.53
Example
• RS(255, 223) with 8-bit symbols
  • Each code word contains 255 code word bytes
  • 223 bytes are data and 32 bytes are parity
  • n = 255, k = 223, s = 8, 2t = 32, t = 16
• The decoder can correct any 16 symbol errors in the code word
  • Errors in up to 16 bytes anywhere in the code word can be automatically corrected
[Figure: an n-symbol code word made of k data symbols followed by 2t parity symbols]
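The parameter relationships can be checked with a few lines of arithmetic. The function name `rs_parameters` and the returned field names are hypothetical; integer division covers both the even and odd cases for n-k.

```python
def rs_parameters(n, k, s):
    """Derived quantities for an RS(n, k) code with s-bit symbols."""
    parity = n - k
    return {
        "parity_symbols": parity,
        # (n-k)/2 when n-k is even, (n-k-1)/2 when it is odd.
        "correctable_symbol_errors": parity // 2,
        "codeword_bits": n * s,
    }


example = rs_parameters(255, 223, 8)
odd_case = rs_parameters(10, 7, 8)   # n-k = 3, so t = 1
```

For RS(255, 223) this gives 32 parity symbols and t = 16, matching the values on the slide.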