Post on 05-Jan-2020

BIGTABLE: A DISTRIBUTED STORAGE SYSTEM FOR STRUCTURED DATA

Presenter: Qiping Wei

Introduction

Bigtable is a distributed storage system for managing structured data.

– Designed to scale to petabytes of data

– Many projects at Google store data in it

• Web indexing, Google Earth, Google Finance, ...

• Different data sizes: URLs, web pages, satellite imagery, ...

• Different latency requirements: from backend bulk processing to real-time data serving

– Provides a flexible, high-performance solution

– Relies on the Chubby lock service

– Uses the Google File System (GFS) to store data items

Data model: each value is indexed by a row key, a column key, and a timestamp.

Property: reads of short row ranges are efficient.
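
A minimal sketch of this data model, assuming a plain in-memory dictionary stands in for the real storage. The class name `TinyTable` and its methods are hypothetical and only illustrate the (row key, column key, timestamp) indexing and the sorted row-range read.

```python
class TinyTable:
    """Toy model of a map from (row key, column key, timestamp) to a value."""

    def __init__(self):
        self._cells = {}  # (row, column, timestamp) -> value

    def put(self, row, column, timestamp, value):
        self._cells[(row, column, timestamp)] = value

    def get(self, row, column):
        """Return the value with the newest timestamp, or None if absent."""
        versions = [(ts, value) for (r, c, ts), value in self._cells.items()
                    if r == row and c == column]
        return max(versions)[1] if versions else None

    def scan_rows(self, start_row, end_row):
        """Yield cells with start_row <= row key < end_row, in sorted key order."""
        for key in sorted(self._cells):
            if start_row <= key[0] < end_row:
                yield key, self._cells[key]


table = TinyTable()
table.put("com.example", "contents", 1, "v1")
table.put("com.example", "contents", 2, "v2")
latest = table.get("com.example", "contents")  # newest version wins
```

A real implementation would keep the keys physically sorted instead of sorting on every scan; the sketch only shows the indexing scheme.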

Row key … Value

AA … Value

AZ … Value

BA … Value

BU … Value

CB … Value

CM … Value

… … …

Tablet: a contiguous row range; the unit of distribution.
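
Because each tablet covers a contiguous range of sorted row keys, finding the tablet for a key can be done with a binary search over the tablets' start keys. The sketch below assumes three hypothetical tablets split at "BA" and "CB"; the split points and tablet names are illustrative only.

```python
from bisect import bisect_right

# Hypothetical split points: tablet i covers [tablet_starts[i], tablet_starts[i + 1]).
tablet_starts = ["", "BA", "CB"]
tablet_names = ["tablet-0", "tablet-1", "tablet-2"]


def tablet_for(row_key):
    """Return the name of the tablet whose row range contains row_key."""
    return tablet_names[bisect_right(tablet_starts, row_key) - 1]


owner = tablet_for("BU")  # falls in the range starting at "BA"
```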

Good locality: obtained by placing data that is accessed together close together.

Answer: choose row keys so that related keys are grouped together in sort order.

E.g., two row keys: apple, banana

Revised row keys: fruit.apple, fruit.banana
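
The effect of the revised keys can be sketched with a sorted list and a prefix scan. The extra keys (azure, bus, car and their prefixes) are made up for illustration, and `prefix_scan` is a hypothetical helper, not a Bigtable API.

```python
from bisect import bisect_left


def prefix_scan(sorted_keys, prefix):
    """Return the contiguous slice of keys that start with prefix."""
    lo = bisect_left(sorted_keys, prefix)
    hi = bisect_left(sorted_keys, prefix + "\uffff")  # just past the last match
    return sorted_keys[lo:hi]


plain = sorted(["apple", "banana", "car", "azure", "bus"])
revised = sorted(["fruit.apple", "fruit.banana", "vehicle.car",
                  "cloud.azure", "vehicle.bus"])

# With plain keys, "azure" sorts between "apple" and "banana".
# With revised keys, one short range read returns both fruit rows.
fruit_rows = prefix_scan(revised, "fruit.")
```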

• Get a lot of useful data with a single read request.

• Reduce the number of read requests.

• Achieve faster read operations.

Architecture

• Bigtable has three major components:

– One master server

• Assigns tablets to tablet servers

• Detects addition and expiration of tablet servers

• Balances the load of tablet servers

• Collects garbage in GFS

• Handles schema changes

– Many tablet servers

• Manage a set of tablets

• Handle read/write requests from clients

• Store data in GFS

– A library linked into every client

• Communicates with tablet servers directly for reads and writes
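
As a rough illustration of the master's assignment and load-balancing role, the sketch below hands each tablet to the currently least-loaded tablet server. The slides do not describe the actual policy, so the least-loaded rule, the function name `assign_tablets`, and the server names are all assumptions.

```python
import heapq


def assign_tablets(tablets, servers):
    """Assign each tablet to the server that currently holds the fewest tablets."""
    heap = [(0, server) for server in servers]  # (tablet count, server name)
    heapq.heapify(heap)
    assignment = {}
    for tablet in tablets:
        load, server = heapq.heappop(heap)      # least-loaded server
        assignment[tablet] = server
        heapq.heappush(heap, (load + 1, server))
    return assignment


assignment = assign_tablets(["t0", "t1", "t2", "t3"], ["ts-a", "ts-b"])
```

Once tablets are assigned, the client library talks to the owning tablet server directly, so the master stays off the read/write path.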

[Figure: the Bigtable client library sends read/write requests directly to the tablet servers. The one master assigns tablets, detects addition and expiration of tablet servers, balances load, handles schema changes, and collects garbage. Each tablet server manages a set of tablets, handles read/write requests, and stores data in GFS.]

Data items: stored either in log files or in SSTables.

GFS: provides data reliability by keeping multiple replicas of the data.

[Figure: the memtable is held in memory; SSTables and log files are stored in GFS.]

SSTables are immutable: their data cannot be modified.

This feature has many benefits:

• Simplifies various parts of Bigtable, e.g. cache maintenance is easy and concurrency control can be implemented efficiently

• Tablets can be split quickly

• Restore data

Assume that the key-value (KV) item exists. Search the memtable first, then the SSTables from low level to high level, until the item is found.

Here are the steps:

• Check the in-memory index first.

• Find the appropriate block.

• Check the Bloom filter to see whether the KV item may be there.

• If yes, read the block from disk and get the value.

• If no, repeat the steps above until the block is found and the value is read.
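
The read path above can be sketched as follows. This is a simplified model, not Bigtable's implementation: it omits the per-block index, uses a dictionary in place of on-disk blocks, and assumes the SSTable list is ordered newest first. The `BloomFilter` and `SSTable` classes and their parameters are hypothetical.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    def __init__(self, items):
        self.data = dict(items)   # stands in for blocks on disk
        self.bloom = BloomFilter()
        for key in self.data:
            self.bloom.add(key)
        self.disk_reads = 0

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None           # definitely absent: skip the disk read
        self.disk_reads += 1      # simulated block read
        return self.data.get(key)


def read(key, memtable, sstables):
    """Search the memtable, then each SSTable (newest first)."""
    if key in memtable:
        return memtable[key]
    for table in sstables:
        value = table.get(key)
        if value is not None:
            return value
    return None
```

The Bloom filter only saves work for keys an SSTable does not hold; a positive answer still requires reading the block to confirm.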

Benefits resulting from sorting the keys:

• Supports range search

• Reduces index size
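
One plausible reading of the smaller index is that sorted keys allow a sparse index: one entry per block instead of one per key. The sketch below assumes fixed-size blocks and hypothetical helper names `build_blocks` and `lookup`.

```python
from bisect import bisect_right


def build_blocks(sorted_items, block_size):
    """Split sorted (key, value) pairs into blocks and index each block's first key."""
    blocks = [sorted_items[i:i + block_size]
              for i in range(0, len(sorted_items), block_size)]
    index = [block[0][0] for block in blocks]  # sparse index: one key per block
    return blocks, index


def lookup(key, blocks, index):
    """Binary-search the sparse index, then scan only the one candidate block."""
    pos = bisect_right(index, key) - 1
    if pos < 0:
        return None
    for k, value in blocks[pos]:
        if k == key:
            return value
    return None


items = [("AA", 1), ("AZ", 2), ("BA", 3), ("BU", 4), ("CB", 5), ("CM", 6)]
blocks, index = build_blocks(items, block_size=2)
```

Here six keys need only three index entries, and a range search reads consecutive blocks starting from the one the index points to.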

• These updates: updates that have been committed to the commit log.

• How is an update done?

• It is based on the write operation.

• It depends on how keys are searched: from top to bottom.

• The read chooses the latest version of the key.
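
A minimal sketch of an update under these rules, assuming a Python list stands in for the commit log and dictionaries stand in for the memtable and SSTables. The class `Tablet` and its fields are hypothetical. The update is just a new write; the older value stays in its SSTable, and the top-to-bottom search returns the newest one first.

```python
class Tablet:
    def __init__(self):
        self.commit_log = []  # stand-in for the log file in GFS
        self.memtable = {}    # newest data, in memory
        self.sstables = []    # immutable tables, newest first

    def write(self, key, value):
        self.commit_log.append((key, value))  # commit to the log first
        self.memtable[key] = value            # then apply to the memtable

    def read(self, key):
        """Search from top (memtable) to bottom (oldest SSTable)."""
        if key in self.memtable:
            return self.memtable[key]
        for table in self.sstables:
            if key in table:
                return table[key]
        return None


tablet = Tablet()
tablet.sstables = [{"k": "old"}]
tablet.write("k", "new")  # the SSTable is not modified
```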

• Why exist?

– Limited memtable size

– Immutable SSTables

– Multiple versions allowed

• Why exist in multiple SSTables?

– SSTables from different levels can have overlapping ranges

• Minor compaction: converts the memtable into a new SSTable.

• Major compaction:

– Merges multiple SSTables into one new, large SSTable.

– The result contains no deletion information or deleted data.
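
The two compactions can be sketched with dictionaries standing in for SSTables. The `TOMBSTONE` sentinel, the function names, and the newest-first list order are assumptions made for illustration.

```python
TOMBSTONE = object()  # hypothetical marker for a deletion record


def minor_compaction(memtable, sstables):
    """Freeze the memtable into a new SSTable; return a fresh, empty memtable."""
    sstables.insert(0, dict(sorted(memtable.items())))  # newest first
    return {}


def major_compaction(sstables):
    """Merge all SSTables into one, dropping deletion records and deleted data."""
    merged = {}
    for table in reversed(sstables):  # oldest first, so newer values overwrite
        merged.update(table)
    live = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
    return [live]


sstables = [{"a": 1, "b": 2}]
memtable = {"b": TOMBSTONE, "c": 3}
memtable = minor_compaction(memtable, sstables)  # now two SSTables
sstables = major_compaction(sstables)            # back to one SSTable
```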

[Figure: a minor compaction writes the in-memory memtable to a new SSTable in GFS; a major compaction merges the SSTables in GFS.]

Contributions of major compaction:

• Bounds the number of SSTables

• Reclaims resources used by deleted data

• Removes overlapping ranges to support range search

[Figure: a read operation consults the memtable in memory and the SSTables in GFS.]

Delete operation (figure: commit log, memtable in memory, SSTables in GFS):

1. Write a special deletion record to the commit log.

2. Insert the record into the memtable.

3. Minor compaction.

4. Major compaction.
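
The first two steps of a delete can be sketched as below: the deletion record shadows the older value on reads, while the immutable SSTable is left untouched until a later major compaction drops both. The `TOMBSTONE` sentinel and the function names are hypothetical.

```python
TOMBSTONE = object()  # hypothetical marker for the special deletion record


def delete(key, commit_log, memtable):
    commit_log.append(("delete", key))  # step 1: log the deletion record
    memtable[key] = TOMBSTONE           # step 2: insert it into the memtable


def read(key, memtable, sstables):
    """Search newest to oldest; a deletion record hides any older value."""
    for table in [memtable, *sstables]:
        if key in table:
            value = table[key]
            return None if value is TOMBSTONE else value
    return None


commit_log, memtable, sstables = [], {}, [{"k": "v"}]
before = read("k", memtable, sstables)
delete("k", commit_log, memtable)
after = read("k", memtable, sstables)
```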

Conclusion

• Bigtable is a distributed, multi-dimensional sorted map indexed by a row key, a column key, and a timestamp.

• The sorted feature has many benefits:

– Supports range search

– Reduces index size

– Supports read/write operations

– Allows row keys to be manipulated to get good locality

• Bigtable provides high data reliability through GFS.

• The immutability and compaction of SSTables simplify Bigtable and improve its performance.

Reference

• F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows, T. Chandra, A. Fikes, R.E. Gruber. Bigtable: A Distributed Storage System for Structured Data. OSDI, 2006

• Lecture: the Google Bigtable. https://www.slideshare.net/romain_jacotin/undestand-google-bigtable-is-as-easy-as-playing-lego-bricks-lecture-by-romain-jacotinhe. October 2014
