Large-Scale Data and Computation
Yifu Huang
School of Computer Science, Fudan University
COMP620003 Advanced Computer Networks Report, 2013
Yifu Huang (FDU CS) COMP620003 Report 2013/12/5 1 / 27
Outline
1 Motivation
2 GFS
3 MapReduce
4 Bigtable
5 Discussion
Motivation
Why did we choose these papers?
- Big data
- Google
Why do we need GFS, MapReduce, and Bigtable?
- GFS: a scalable distributed file system for large distributed data-intensive applications
- MapReduce: a programming model for processing and generating large datasets that is amenable to a broad variety of real-world tasks
- Bigtable: a distributed storage system for managing structured data that is designed to scale to a very large size
What can we learn from these papers?
- The design and implementation ideas behind GFS, MapReduce, and Bigtable
- Preparation for using the open-source equivalents: HDFS, Hadoop MapReduce (running on YARN), and HBase
Ecosystem
GFS
The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
Google, Inc.
Design Overview
The system is built from many inexpensive commodity components that often fail
The system stores a modest number of large files
The workloads primarily consist of two kinds of reads: large streaming reads and small random reads
The workloads also have many large, sequential writes that append data to files
The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file
High sustained bandwidth is more important than low latency
Design Overview (cont.)
Architecture
Design Overview (cont.)
Chunk Size: 64 MB
Metadata
- The file and chunk namespaces
- The mapping from files to chunks
- The locations of each chunk’s replicas
Chunk Locations
Operation Log
Consistency Model
- Guarantees by GFS
- Implications for Applications
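As a rough illustration of how a fixed 64 MB chunk size relates to the file-to-chunk mapping, the sketch below translates a byte offset within a file into a chunk index and an offset inside that chunk. The helper name `locate` and its return shape are hypothetical and chosen for illustration; this is not the GFS client API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size stated on the slide


def locate(byte_offset: int) -> tuple[int, int]:
    """Return (chunk_index, offset_within_chunk) for a byte offset in a file.

    Hypothetical helper for illustration only.
    """
    if byte_offset < 0:
        raise ValueError("byte_offset must be non-negative")
    return divmod(byte_offset, CHUNK_SIZE)


# A read at byte 200,000,000 falls in the third chunk (index 2).
print(locate(200_000_000))
```

With an index in hand, a client would only need to ask the master which replicas hold that chunk, which is why the per-file metadata can stay small.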
System Interactions
Leases and Mutation Order
System Interactions (cont.)
Data Flow
- We decouple the flow of data from the flow of control to use the network efficiently
- Our goals are to fully utilize each machine’s network bandwidth, avoid network bottlenecks and high-latency links, and minimize the latency to push through all the data
Atomic Record Appends
- GFS provides an atomic append operation called record append
Snapshot
- The snapshot operation makes a copy of a file or a directory tree almost instantaneously, while minimizing any interruptions of ongoing mutations
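For the data-flow goals above, the GFS paper gives a back-of-the-envelope model: without congestion, pushing B bytes through a pipelined chain of R replicas ideally takes about B/T + R·L, where T is the network throughput and L is the latency between two machines. A minimal sketch of that arithmetic follows; the function name and the example figures (100 Mbps links, 1 ms per hop taken as a pessimistic bound) are assumptions for illustration.

```python
def pipelined_transfer_seconds(num_bytes: int, replicas: int,
                               throughput_bps: float,
                               hop_latency_s: float) -> float:
    """Ideal time to push num_bytes through a chain of replicas: B/T + R*L."""
    return (num_bytes * 8) / throughput_bps + replicas * hop_latency_s


# 1 MB to 3 replicas over 100 Mbps links, assuming 1 ms latency per hop.
t = pipelined_transfer_seconds(1_000_000, 3, 100e6, 0.001)
print(f"{t * 1000:.0f} ms")
```

The transfer term dominates the latency term here, which is why forwarding data along a chain (rather than fanning out from the client) keeps each machine's outbound link fully used.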
Fault Tolerance And Diagnosis
High Availability
- Fast Recovery
- Chunk Replication
- Master Replication
Data Integrity
Diagnostic Tools
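On data integrity, the GFS paper describes chunkservers dividing each chunk into 64 KB blocks and keeping a 32-bit checksum per block, which is verified when data is read. The sketch below illustrates that idea with CRC32 from Python's `zlib` as a stand-in; the choice of CRC32 and the helper names are assumptions, not details taken from the slides.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB checksum blocks, as described in the GFS paper


def block_checksums(data: bytes) -> list[int]:
    """Compute one 32-bit checksum per 64 KB block (CRC32 as a stand-in)."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


def corrupted_blocks(data: bytes, expected: list[int]) -> list[int]:
    """Return the indices of blocks whose checksum no longer matches."""
    actual = block_checksums(data)
    return [i for i, (got, want) in enumerate(zip(actual, expected))
            if got != want]


chunk = bytes(200 * 1024)      # 200 KB of zeros, i.e. 4 blocks
sums = block_checksums(chunk)
damaged = bytearray(chunk)
damaged[70 * 1024] ^= 0xFF     # flip bits inside block 1
print(corrupted_blocks(bytes(damaged), sums))
```

Checking per block rather than per chunk means a read only has to verify the blocks it touches, and a mismatch pinpoints which part of the replica must be re-fetched from another replica.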
MapReduce
MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
Google, Inc.
Model
Map
- Takes an input pair and produces a set of intermediate key/value pairs
- The MapReduce library groups together all intermediate values associated with the same intermediate key and passes them to the reduce function
Reduce
- Accepts an intermediate key and a set of values for that key and merges these values together to form a possibly smaller set of values
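The model above can be made concrete with word counting, the example used in the MapReduce paper. The sketch below is a single-process stand-in written for illustration: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the grouping step that the real library performs across many machines is simulated here with a dictionary.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(doc_name: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in the input."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """Reduce: merge all values for one key into a single total."""
    return sum(counts)


def run_mapreduce(inputs: dict[str, str]) -> dict[str, int]:
    """Single-process stand-in for the library: map, group by key, reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for name, text in inputs.items():
        for key, value in map_fn(name, text):
            groups[key].append(value)
    return {key: reduce_fn(key, values)
            for key, values in sorted(groups.items())}


docs = {"a.txt": "the quick fox", "b.txt": "The lazy dog and the fox"}
print(run_mapreduce(docs))
```

Because `map_fn` and `reduce_fn` depend only on their arguments, the same two functions could be run on many workers in parallel without change, which is the point of the model.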
Implementation
User Program
Master
Worker (Map/Reduce)
Fault tolerance
Locality
Task granularity
Backup tasks
Performance
Grep
Performance (cont.)
Sort
Bigtable
Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
Google, Inc.
Model
Rows
- Atomic
- Tablets
Column Families
- Family : qualifier
- Access control
Timestamps
- Decreasing order
- Garbage collection
Cell
- (row : string, column : string, time : int64) -> string
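The cell mapping above can be sketched as a small in-memory structure that keeps several timestamped versions per cell, newest first, and discards old versions on request. `TinyTable` and its methods are hypothetical names for illustration and do not reflect the Bigtable or HBase client APIs; the row and column values follow the web-table example in the Bigtable paper.

```python
from collections import defaultdict
from typing import Optional


class TinyTable:
    """Toy in-memory map: (row, column, timestamp) -> value."""

    def __init__(self) -> None:
        # cells[row][column] is a list of (timestamp, value), newest first.
        self.cells = defaultdict(lambda: defaultdict(list))

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        versions = self.cells[row][column]
        versions.append((timestamp, value))
        # Keep versions in decreasing timestamp order.
        versions.sort(key=lambda v: v[0], reverse=True)

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the most recent version of a cell, or None if absent."""
        versions = self.cells.get(row, {}).get(column, [])
        return versions[0][1] if versions else None

    def gc(self, row: str, column: str, keep: int) -> None:
        """Garbage-collect all but the newest `keep` versions of a cell."""
        del self.cells[row][column][keep:]


table = TinyTable()
table.put("com.cnn.www", "contents:", 3, "<html>v1")
table.put("com.cnn.www", "contents:", 6, "<html>v3")
table.put("com.cnn.www", "contents:", 5, "<html>v2")
print(table.get("com.cnn.www", "contents:"))
table.gc("com.cnn.www", "contents:", keep=2)
```

Storing versions newest-first means the common case, reading the latest value, touches only the head of the list.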
Model (cont.)
Storage
Region
- Store
  - MemStore
  - StoreFile
File
- Blocks
Log
- Write-ahead log
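The names on this slide are HBase terms; they correspond roughly to the tablet, memtable, SSTable, and commit log of the Bigtable paper. A minimal sketch of the write path they imply follows: each write is logged first, buffered in memory, and flushed to an immutable sorted file once the buffer fills. The `Store` class, its `flush_threshold` parameter, and the in-memory lists standing in for files are all assumptions for illustration.

```python
from typing import Optional


class Store:
    """Toy write path: log the write, buffer it in memory, flush sorted runs."""

    def __init__(self, flush_threshold: int = 2) -> None:
        self.wal: list[tuple[str, str]] = []   # write-ahead log (a list here)
        self.memstore: dict[str, str] = {}     # recent writes held in memory
        # Immutable sorted runs standing in for store files, oldest first.
        self.store_files: list[list[tuple[str, str]]] = []
        self.flush_threshold = flush_threshold

    def put(self, key: str, value: str) -> None:
        self.wal.append((key, value))          # 1. record the write in the log
        self.memstore[key] = value             # 2. apply it to the memstore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the memstore out as a new sorted run and start a fresh one."""
        # Simplification: a real system would also trim the log after a flush.
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, key: str) -> Optional[str]:
        """Check the memstore first, then store files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for run in reversed(self.store_files):
            for k, v in run:
                if k == key:
                    return v
        return None


s = Store(flush_threshold=2)
s.put("a", "1")
s.put("b", "2")   # reaching the threshold triggers a flush
s.put("a", "3")   # the newer value lives in the memstore
print(s.get("a"), s.get("b"))
```

Reads consult the newest data first, so a value in the memstore shadows older copies of the same key in flushed files.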
Implementation
Tablet Location
Implementation (cont.)
Tablet Serving
Evaluation
Benchmarks
Evaluation (cont.)
Scaling
Evaluation (cont.)
Applications
Discussion
Contributions of GFS, MapReduce, and Bigtable
- Support large-scale data processing workloads on commodity hardware
- Easy to use: hide the details of parallelization, fault tolerance, locality optimization, and load balancing
- Scale well in terms of both data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving)
Drawbacks of GFS, MapReduce, and Bigtable
- The single master may still be a potential bottleneck
- Disk I/O is time-consuming
- Many database operations are not supported (multi-row transactions, secondary indexes, ...)