bigtable: a distributed storage system for structured data...

242
. . . Databases: Generalities . . . . . . . . . . . . . . . . . . . . . . . Bigtable . . . . . . . . . . . Other NoSQL Thoughts . . . Conclusion Bigtable: A Distributed Storage System for Structured Data Course: CS655 Rabiet Louis Colorado State University Thursday 17 October 2013 1 / 51

Upload: others

Post on 05-Jan-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Bigtable: A Distributed Storage System forStructured Data

Course: CS655

Rabiet Louis

Colorado State University

Thursday 17 October 2013

1 / 51

Page 2: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

1 Databases: GeneralitiesRelational databaseOther SQL database modelsWhy is this not enough ?

2 BigtableDescriptionGoogle FS: underlying FSChubby: Failure resiliencePaxosSome optimizations

3 Other NoSQLAn old problemExtensible Record StoresDocument storesKey-value storesRAM databases

4 Thoughts

5 Conclusion

2 / 51

Page 3: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Plan

1 Databases: GeneralitiesRelational databaseOther SQL database modelsWhy is this not enough ?

2 BigtableDescriptionGoogle FS: underlying FSChubby: Failure resiliencePaxosSome optimizations

3 Other NoSQLAn old problemExtensible Record StoresDocument storesKey-value storesRAM databases

4 Thoughts

5 Conclusion

3 / 51

Page 4: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Relational database

Reminders on relational database model

SQL family

First order predicate

Based on two values: true and false.

Use tuples and relation

Tuples: (Louis, CS Student, French) - (Louis, GRA)Relation = Set of attributes and sets of tuples.Attr. = (Id, Major, Nationality).

A query = a formula∀students,Nationality = French && Position = GRA

4 / 51

Page 5: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Relational database

Reminders on relational database model

SQL family

First order predicate

Based on two values: true and false.

Use tuples and relation

Tuples: (Louis, CS Student, French) - (Louis, GRA)Relation = Set of attributes and sets of tuples.Attr. = (Id, Major, Nationality).

A query = a formula∀students,Nationality = French && Position = GRA

4 / 51

Page 6: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Relational database

Reminders on relational database model

SQL family

First order predicate

Based on two values: true and false.

Use tuples and relation

Tuples: (Louis, CS Student, French) - (Louis, GRA)Relation = Set of attributes and sets of tuples.Attr. = (Id, Major, Nationality).

A query = a formula∀students,Nationality = French && Position = GRA

4 / 51

Page 7: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Relational database

Reminders on relational database model

SQL family

First order predicate

Based on two values: true and false.

Use tuples and relation

Tuples: (Louis, CS Student, French) - (Louis, GRA)Relation = Set of attributes and sets of tuples.Attr. = (Id, Major, Nationality).

A query = a formula∀students,Nationality = French && Position = GRA

4 / 51

Page 8: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Relational database

Reminders on relational database model

SQL family

First order predicate

Based on two values: true and false.

Use tuples and relation

Tuples: (Louis, CS Student, French) - (Louis, GRA)

Relation = Set of attributes and sets of tuples.Attr. = (Id, Major, Nationality).

A query = a formula∀students,Nationality = French && Position = GRA

4 / 51

Page 9: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Relational database

Reminders on relational database model

SQL family

First order predicate

Based on two values: true and false.

Use tuples and relation

Tuples: (Louis, CS Student, French) - (Louis, GRA)Relation = Set of attributes and sets of tuples.

Attr. = (Id, Major, Nationality).

A query = a formula∀students,Nationality = French && Position = GRA

4 / 51

Page 10: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Relational database

Reminders on relational database model

SQL family

First order predicate

Based on two values: true and false.

Use tuples and relation

Tuples: (Louis, CS Student, French) - (Louis, GRA)Relation = Set of attributes and sets of tuples.Attr. = (Id, Major, Nationality).

A query = a formula∀students,Nationality = French && Position = GRA

4 / 51

Page 11: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Relational database

Reminders on relational database model

SQL family

First order predicate

Based on two values: true and false.

Use tuples and relation

Tuples: (Louis, CS Student, French) - (Louis, GRA)Relation = Set of attributes and sets of tuples.Attr. = (Id, Major, Nationality).

A query = a formula

∀students,Nationality = French && Position = GRA

4 / 51

Page 12: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Relational database

Reminders on relational database model

SQL family

First order predicate

Based on two values: true and false.

Use tuples and relation

Tuples: (Louis, CS Student, French) - (Louis, GRA)Relation = Set of attributes and sets of tuples.Attr. = (Id, Major, Nationality).

A query = a formula∀students,Nationality = French && Position = GRA

4 / 51

Page 13: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Other SQL database models

Hierarchical Databases

7 Tree structure.

3 Efficient implementations.

Object Databases

3 Close to programming langages.

7 Overkill if the data has simple relations.

5 / 51

Page 14: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Other SQL database models

Hierarchical Databases

7 Tree structure.

3 Efficient implementations.

Object Databases

3 Close to programming langages.

7 Overkill if the data has simple relations.

5 / 51

Page 15: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Other SQL database models

Hierarchical Databases

7 Tree structure.

3 Efficient implementations.

Object Databases

3 Close to programming langages.

7 Overkill if the data has simple relations.

5 / 51

Page 16: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Other SQL database models

Hierarchical Databases

7 Tree structure.

3 Efficient implementations.

Object Databases

3 Close to programming langages.

7 Overkill if the data has simple relations.

5 / 51

Page 17: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Other SQL database models

Hierarchical Databases

7 Tree structure.

3 Efficient implementations.

Object Databases

3 Close to programming langages.

7 Overkill if the data has simple relations.

5 / 51

Page 18: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Other SQL database models

Hierarchical Databases

7 Tree structure.

3 Efficient implementations.

Object Databases

3 Close to programming langages.

7 Overkill if the data has simple relations.

5 / 51

Page 19: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Why is this not enough ?

.How to express query on a text ?

You have to reconstruct some kind of relation more or lessmanually even if some solutions exists [JDG08].

.How do you do if a single server is not enough ?

Partioning → write expensivesmall scheme showing why it requires two-phase commit

Replications → write expensive

6 / 51

Page 20: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Why is this not enough ?

.How to express query on a text ?

You have to reconstruct some kind of relation more or lessmanually even if some solutions exists [JDG08].

.How do you do if a single server is not enough ?

Partioning → write expensivesmall scheme showing why it requires two-phase commit

Replications → write expensive

6 / 51

Page 21: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Why is this not enough ?

.How to express query on a text ?

You have to reconstruct some kind of relation more or lessmanually even if some solutions exists [JDG08].

.How do you do if a single server is not enough ?

Partioning → write expensivesmall scheme showing why it requires two-phase commit

Replications → write expensive

6 / 51

Page 22: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Why is this not enough ?

.How to express query on a text ?

You have to reconstruct some kind of relation more or lessmanually even if some solutions exists [JDG08].

.How do you do if a single server is not enough ?

Partioning → write expensive

small scheme showing why it requires two-phase commit

Replications → write expensive

6 / 51

Page 23: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Why is this not enough ?

.How to express query on a text ?

You have to reconstruct some kind of relation more or lessmanually even if some solutions exists [JDG08].

.How do you do if a single server is not enough ?

Partioning → write expensivesmall scheme showing why it requires two-phase commit

Replications → write expensive

6 / 51

Page 24: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Why is this not enough ?

.How to express query on a text ?

You have to reconstruct some kind of relation more or lessmanually even if some solutions exists [JDG08].

.How do you do if a single server is not enough ?

Partioning → write expensivesmall scheme showing why it requires two-phase commit

Replications → write expensive

6 / 51

Page 25: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Plan

1 Databases: GeneralitiesRelational databaseOther SQL database modelsWhy is this not enough ?

2 BigtableDescriptionGoogle FS: underlying FSChubby: Failure resiliencePaxosSome optimizations

3 Other NoSQLAn old problemExtensible Record StoresDocument storesKey-value storesRAM databases

4 Thoughts

5 Conclusion

7 / 51

Page 26: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

BigData

Usage of the architecture

URLs

Locations

Data Personalized: settings, search

Some ideas

Goal is to let users handle data storage structure

Locality is important

A data = an uninterpreted string

Goes nicely with Map Reduce.

8 / 51

Page 27: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

BigData

Usage of the architecture

URLs

Locations

Data Personalized: settings, search

Some ideas

Goal is to let users handle data storage structure

Locality is important

A data = an uninterpreted string

Goes nicely with Map Reduce.

8 / 51

Page 28: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

BigData

Usage of the architecture

URLs

Locations

Data Personalized: settings, search

Some ideas

Goal is to let users handle data storage structure

Locality is important

A data = an uninterpreted string

Goes nicely with Map Reduce.

8 / 51

Page 29: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

BigData

Usage of the architecture

URLs

Locations

Data Personalized: settings, search

Some ideas

Goal is to let users handle data storage structure

Locality is important

A data = an uninterpreted string

Goes nicely with Map Reduce.

8 / 51

Page 30: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

BigData

Usage of the architecture

URLs

Locations

Data Personalized: settings, search

Some ideas

Goal is to let users handle data storage structure

Locality is important

A data = an uninterpreted string

Goes nicely with Map Reduce.

8 / 51

Page 31: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

BigData

Usage of the architecture

URLs

Locations

Data Personalized: settings, search

Some ideas

Goal is to let users handle data storage structure

Locality is important

A data = an uninterpreted string

Goes nicely with Map Reduce.

8 / 51

Page 32: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

BigData

Usage of the architecture

URLs

Locations

Data Personalized: settings, search

Some ideas

Goal is to let users handle data storage structure

Locality is important

A data = an uninterpreted string

Goes nicely with Map Reduce.

8 / 51

Page 33: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

BigData

Usage of the architecture

URLs

Locations

Data Personalized: settings, search

Some ideas

Goal is to let users handle data storage structure

Locality is important

A data = an uninterpreted string

Goes nicely with Map Reduce.

8 / 51

Page 34: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Challenges

Unstructured Data.

Scale.

Continuous Update (crawling).

Read should be allowed at any time.

Very high rates of accesses → load-balancing.

Failure resilient (adding and removing servers at any time).

9 / 51

Page 35: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Challenges

Unstructured Data.

Scale.

Continuous Update (crawling).

Read should be allowed at any time.

Very high rates of accesses → load-balancing.

Failure resilient (adding and removing servers at any time).

9 / 51

Page 36: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Challenges

Unstructured Data.

Scale.

Continuous Update (crawling).

Read should be allowed at any time.

Very high rates of accesses → load-balancing.

Failure resilient (adding and removing servers at any time).

9 / 51

Page 37: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Challenges

Unstructured Data.

Scale.

Continuous Update (crawling).

Read should be allowed at any time.

Very high rates of accesses → load-balancing.

Failure resilient (adding and removing servers at any time).

9 / 51

Page 38: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Challenges

Unstructured Data.

Scale.

Continuous Update (crawling).

Read should be allowed at any time.

Very high rates of accesses → load-balancing.

Failure resilient (adding and removing servers at any time).

9 / 51

Page 39: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Challenges

Unstructured Data.

Scale.

Continuous Update (crawling).

Read should be allowed at any time.

Very high rates of accesses → load-balancing.

Failure resilient (adding and removing servers at any time).

9 / 51

Page 40: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Data Model

10 / 51

Page 41: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Column-oriented database

Properties

Row writes are atomic.

Row ranges (tablet) are grouped together dynamically

Sorted using lexicographic order

close number → physically close.

Column family is other group.

Several versions: usage of timestamp.

last n versions.fresh enough (age limit).

11 / 51

Page 42: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Column-oriented database

Properties

Row writes are atomic.

Row ranges (tablet) are grouped together dynamically

Sorted using lexicographic order

close number → physically close.

Column family is other group.

Several versions: usage of timestamp.

last n versions.fresh enough (age limit).

11 / 51

Page 43: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Column-oriented database

Properties

Row writes are atomic.

Row ranges (tablet) are grouped together dynamically

Sorted using lexicographic order

close number → physically close.

Column family is other group.

Several versions: usage of timestamp.

last n versions.fresh enough (age limit).

11 / 51

Page 44: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Column-oriented database

Properties

Row writes are atomic.

Row ranges (tablet) are grouped together dynamically

Sorted using lexicographic order

close number → physically close.

Column family is other group.

Several versions: usage of timestamp.

last n versions.fresh enough (age limit).

11 / 51

Page 45: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Column-oriented database

Properties

Row writes are atomic.

Row ranges (tablet) are grouped together dynamically

Sorted using lexicographic order

close number → physically close.

Column family is other group.

Several versions: usage of timestamp.

last n versions.fresh enough (age limit).

11 / 51

Page 46: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Column-oriented database

Properties

Row writes are atomic.

Row ranges (tablet) are grouped together dynamically

Sorted using lexicographic order

close number → physically close.

Column family is other group.

Several versions: usage of timestamp.

last n versions.fresh enough (age limit).

11 / 51

Page 47: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Column-oriented database

Properties

Row writes are atomic.

Row ranges (tablet) are grouped together dynamically

Sorted using lexicographic order

close number → physically close.

Column family is other group.

Several versions: usage of timestamp.

last n versions.fresh enough (age limit).

11 / 51

Page 48: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Column-oriented database

Properties

Row writes are atomic.

Row ranges (tablet) are grouped together dynamically

Sorted using lexicographic order

close number → physically close.

Column family is other group.

Several versions: usage of timestamp.

last n versions.

fresh enough (age limit).

11 / 51

Page 49: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Column-oriented database

Properties

Row writes are atomic.

Row ranges (tablet) are grouped together dynamically

Sorted using lexicographic order

close number → physically close.

Column family is other group.

Several versions: usage of timestamp.

last n versions.fresh enough (age limit).

11 / 51

Page 50: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Architecture

.

.

..

Chubby

.

Master

.

M1

.

M2

.

..

GFS Master.

..

Scheduler

. ..TS1

.

..

TS2

. Client

12 / 51

Page 51: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Architecture

.

.

..

Chubby

.

Master

.

M1

.

M2

.

..

GFS Master.

..

Scheduler

. ..TS1

.

..

TS2

. Client

12 / 51

Page 52: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Architecture

.

.

..

Chubby

.

Master

.

M1

.

M2

.

..

GFS Master.

..

Scheduler

. ..TS1

.

..

TS2

. Client

12 / 51

Page 53: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Architecture

.

.

..

Chubby

.

Master

.

M1

.

M2

.

..

GFS Master.

..

Scheduler

. ..TS1

.

..

TS2

. Client

12 / 51

Page 54: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Architecture

.

.

..

Chubby

.

Master

.

M1

.

M2

.

..

GFS Master.

..

Scheduler

. ..TS1

.

..

TS2

. Client

12 / 51

Page 55: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Architecture

.

.

..

Chubby

.

Master

.

M1

.

M2

.

..

GFS Master.

..

Scheduler

. ..TS1

.

..

TS2

. Client

12 / 51

Page 56: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Architecture

.

.

..

Chubby

.

Master

.

M1

.

M2

.

..

GFS Master.

..

Scheduler

. ..TS1

.

..

TS2

. Client

12 / 51

Page 57: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Architecture

.

.

..

Chubby

.

Master

.

M1

.

M2

.

..

GFS Master.

..

Scheduler

. ..TS1

.

..

TS2

. Client

12 / 51

Page 58: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Architecture

.

.

..

Chubby

.

Master

.

M1

.

M2

.

..

GFS Master.

..

Scheduler

. ..TS1

.

..

TS2

. Client

12 / 51

Page 59: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Architecture

.

.

..

Chubby

.

Master

.

M1

.

M2

.

..

GFS Master.

..

Scheduler

. ..TS1

.

..

TS2

. Client

12 / 51

Page 60: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Internal Storage

..

Key

.

Value

.

Key

.

Value

.

Key

.

Value

.

Index

Use a key:value storage format (SStable: Sorted String Table).

SStable are immutables.

Every time a write is done: a new table is created.3 Concurrency is easier: No synchronisation, No need to control

the concurrent accesses.3 Can help for serialization, for snapshot.7 Increase the number of SSTables.7 Persistence can be achieved by differential instead of using a

full copy.

13 / 51

Page 61: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Internal Storage

..

Key

.

Value

.

Key

.

Value

.

Key

.

Value

.

Index

Use a key:value storage format (SStable: Sorted String Table).

SStable are immutables.

Every time a write is done: a new table is created.3 Concurrency is easier: No synchronisation, No need to control

the concurrent accesses.3 Can help for serialization, for snapshot.7 Increase the number of SSTables.7 Persistence can be achieved by differential instead of using a

full copy.

13 / 51

Page 62: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Internal Storage

..

Key

.

Value

.

Key

.

Value

.

Key

.

Value

.

Index

Use a key:value storage format (SStable: Sorted String Table).

SStable are immutables.

Every time a write is done: a new table is created.3 Concurrency is easier: No synchronisation, No need to control

the concurrent accesses.3 Can help for serialization, for snapshot.7 Increase the number of SSTables.7 Persistence can be achieved by differential instead of using a

full copy.

13 / 51

Page 63: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Internal Storage

..

Key

.

Value

.

Key

.

Value

.

Key

.

Value

.

Index

Use a key:value storage format (SStable: Sorted String Table).

SStable are immutables.

Every time a write is done: a new table is created.

3 Concurrency is easier: No synchronisation, No need to controlthe concurrent accesses.

3 Can help for serialization, for snapshot.7 Increase the number of SSTables.7 Persistence can be achieved by differential instead of using a

full copy.

13 / 51

Page 64: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Internal Storage

..

Key

.

Value

.

Key

.

Value

.

Key

.

Value

.

Index

Use a key:value storage format (SStable: Sorted String Table).

SStable are immutables.

Every time a write is done: a new table is created.3 Concurrency is easier: No synchronisation, No need to control

the concurrent accesses.

3 Can help for serialization, for snapshot.7 Increase the number of SSTables.7 Persistence can be achieved by differential instead of using a

full copy.

13 / 51

Page 65: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Internal Storage

..

Key

.

Value

.

Key

.

Value

.

Key

.

Value

.

Index

Use a key:value storage format (SStable: Sorted String Table).

SStable are immutables.

Every time a write is done: a new table is created.3 Concurrency is easier: No synchronisation, No need to control

the concurrent accesses.3 Can help for serialization, for snapshot.

7 Increase the number of SSTables.7 Persistence can be achieved by differential instead of using a

full copy.

13 / 51

Page 66: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Internal Storage

..

Key

.

Value

.

Key

.

Value

.

Key

.

Value

.

Index

Use a key:value storage format (SStable: Sorted String Table).

SStable are immutables.

Every time a write is done: a new table is created.3 Concurrency is easier: No synchronisation, No need to control

the concurrent accesses.3 Can help for serialization, for snapshot.7 Increase the number of SSTables.

7 Persistence can be achieved by differential instead of using afull copy.

13 / 51

Page 67: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Internal Storage

..

Key

.

Value

.

Key

.

Value

.

Key

.

Value

.

Index

Use a key:value storage format (SStable: Sorted String Table).

SStable are immutables.

Every time a write is done: a new table is created.3 Concurrency is easier: No synchronisation, No need to control

the concurrent accesses.3 Can help for serialization, for snapshot.7 Increase the number of SSTables.7 Persistence can be achieved by differential instead of using a

full copy.13 / 51

Page 68: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Tablets Location

B-Tree+ to store tablet location.

14 / 51

Page 69: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Tablets Assignment and ServingNo duplication of tablets.

Master keeps track of assignements to reallocate tablets ifneeded.A TS acquires a lock in Chubby per tablet.Authorization is given by the writers list in Chubby.

15 / 51

Page 70: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Tablets Assignment and ServingNo duplication of tablets.Master keeps track of assignements to reallocate tablets ifneeded.

A TS acquires a lock in Chubby per tablet.Authorization is given by the writers list in Chubby.

15 / 51

Page 71: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Tablets Assignment and ServingNo duplication of tablets.Master keeps track of assignements to reallocate tablets ifneeded.A TS acquires a lock in Chubby per tablet.

Authorization is given by the writers list in Chubby.

15 / 51

Page 72: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Tablets Assignment and ServingNo duplication of tablets.Master keeps track of assignements to reallocate tablets ifneeded.A TS acquires a lock in Chubby per tablet.Authorization is given by the writers list in Chubby.

15 / 51

Page 73: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Description

Tablets Assignment and ServingNo duplication of tablets.Master keeps track of assignements to reallocate tablets ifneeded.A TS acquires a lock in Chubby per tablet.Authorization is given by the writers list in Chubby.

15 / 51

Page 74: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Google FS: underlying FS

Reminder on GFS

Moderate number of Huge Files

A lot of failures

Appending ̸= random write

Concurrent write must be supported

16 / 51

Page 75: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Google FS: underlying FS

Reminder on GFS

Moderate number of Huge Files

A lot of failures

Appending ̸= random write

Concurrent write must be supported

16 / 51

Page 76: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Google FS: underlying FS

Reminder on GFS

Moderate number of Huge Files

A lot of failures

Appending ̸= random write

Concurrent write must be supported

16 / 51

Page 77: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Google FS: underlying FS

Reminder on GFS

Moderate number of Huge Files

A lot of failures

Appending ̸= random write

Concurrent write must be supported

16 / 51

Page 78: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Google FS: underlying FS

Reminder on GFS

Moderate number of Huge Files

A lot of failures

Appending ̸= random write

Concurrent write must be supported

16 / 51

Page 79: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Chubby in BigTable

Used for:

Electing a unique master.

Discover tablets.

Relocate tablets if needed.

ACL

Column family lists.

17 / 51

Page 80: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Chubby in BigTable

Used for:

Electing a unique master.

Discover tablets.

Relocate tablets if needed.

ACL

Column family lists.

17 / 51

Page 81: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Chubby in BigTable

Used for:

Electing a unique master.

Discover tablets.

Relocate tablets if needed.

ACL

Column family lists.

17 / 51

Page 82: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Chubby in BigTable

Used for:

Electing a unique master.

Discover tablets.

Relocate tablets if needed.

ACL

Column family lists.

17 / 51

Page 83: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Chubby in BigTable

Used for:

Electing a unique master.

Discover tablets.

Relocate tablets if needed.

ACL

Column family lists.

17 / 51

Page 84: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Chubby in BigTable

Used for:

Electing a unique master.

Discover tablets.

Relocate tablets if needed.

ACL

Column family lists.

17 / 51

Page 85: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Some details

5 replicas; one will be the Chubby master.

Give Locks to clients.

Based on the Paxos algorithm.

18 / 51

Page 86: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Some details

5 replicas; one will be the Chubby master.

Give Locks to clients.

Based on the Paxos algorithm.

18 / 51

Page 87: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Some details

5 replicas; one will be the Chubby master.

Give Locks to clients.

Based on the Paxos algorithm.

18 / 51

Page 88: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Some details

5 replicas; one will be the Chubby master.

Give Locks to clients.

Based on the Paxos algorithm.

18 / 51

Page 89: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Some details

5 replicas; one will be the Chubby master.

Give Locks to clients.

Based on the Paxos algorithm.

18 / 51

Page 90: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Some details

SessionA client will need to maintain a session with the Chubby service.

KeepAlive packets are sent by the client to the master.RPC are blocked by the master until the lease is closed to expire.

7 slow failure detection by the client.

3 few redondant packets

7 why clients should fail ?

19 / 51

Page 91: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Some details

SessionA client will need to maintain a session with the Chubby service.KeepAlive packets are sent by the client to the master.

RPC are blocked by the master until the lease is closed to expire.

7 slow failure detection by the client.

3 few redondant packets

7 why clients should fail ?

19 / 51

Page 92: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Some details

SessionA client will need to maintain a session with the Chubby service.KeepAlive packets are sent by the client to the master.RPC are blocked by the master until the lease is closed to expire.

7 slow failure detection by the client.

3 few redondant packets

7 why clients should fail ?

19 / 51

Page 93: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Some details

SessionA client will need to maintain a session with the Chubby service.KeepAlive packets are sent by the client to the master.RPC are blocked by the master until the lease is closed to expire.

7 slow failure detection by the client.

3 few redondant packets

7 why clients should fail ?

19 / 51

Page 94: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Some details

SessionA client will need to maintain a session with the Chubby service.KeepAlive packets are sent by the client to the master.RPC are blocked by the master until the lease is closed to expire.

7 slow failure detection by the client.

3 few redondant packets

7 why clients should fail ?

19 / 51

Page 95: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Some details

SessionA client will need to maintain a session with the Chubby service.KeepAlive packets are sent by the client to the master.RPC are blocked by the master until the lease is closed to expire.

7 slow failure detection by the client.

3 few redondant packets

7 why clients should fail ?

19 / 51

Page 96: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Some details

FailoversIf a master fails a new master need to take over the system.

7 Complicated.

20 / 51

Page 97: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Some details

FailoversIf a master fails a new master need to take over the system.

7 Complicated.

20 / 51

Page 98: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Chubby: Failure resilience

Some details

FailoversIf a master fails a new master need to take over the system.

7 Complicated.

20 / 51

Page 99: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Paxos algorithm

Originaly for parliament decree.

Properties wanted

At most one decree decided (safety)

At least one decree (liveness)

21 / 51

Page 100: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Paxos algorithm

Originaly for parliament decree.

Properties wanted

At most one decree decided (safety)

At least one decree (liveness)

21 / 51

Page 101: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Paxos algorithm

Originaly for parliament decree.

Properties wanted

At most one decree decided (safety)

At least one decree (liveness)

21 / 51

Page 102: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Paxos algorithm

Originaly for parliament decree.

Properties wanted

At most one decree decided (safety)

At least one decree (liveness)

21 / 51

Page 103: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties on ballots

Make liveness possible and ensure consistency (safety property)

Each ballot has a unique number.

Every quorum has at least one priest in common.

For every ballot B, if a priest in the quorum has voted in aearlier ballot then the decree in B is equal to the lastest ballotwhere the priest voted.

22 / 51

Page 104: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties on ballots

Make liveness possible and ensure consistency (safety property)

Each ballot has a unique number.

Every quorum has at least one priest in common.

For every ballot B, if a priest in the quorum has voted in aearlier ballot then the decree in B is equal to the lastest ballotwhere the priest voted.

22 / 51

Page 105: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties on ballots

Make liveness possible and ensure consistency (safety property)

Each ballot has a unique number.

Every quorum has at least one priest in common.

For every ballot B, if a priest in the quorum has voted in aearlier ballot then the decree in B is equal to the lastest ballotwhere the priest voted.

22 / 51

Page 106: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties on ballots

Make liveness possible and ensure consistency (safety property)

Each ballot has a unique number.

Every quorum has at least one priest in common.

For every ballot B, if a priest in the quorum has voted in aearlier ballot then the decree in B is equal to the lastest ballotwhere the priest voted.

22 / 51

Page 107: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Back to Paxos in Chubby

Consensus

Several coordinators will propose a value.It works because:

Each coordinator will generate an unique number with theirpropositions. (ex: s mod n ≡ id)Promise (ack from replicas that will ignore older coordinator)will include the latest value proposed.

Since we want to have a consensus on several values, Paxoswill be repeated: Need a catch-up mechanism for slowmachines.

23 / 51

Page 108: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Back to Paxos in Chubby

Consensus

Several coordinators will propose a value.

It works because:

Each coordinator will generate an unique number with theirpropositions. (ex: s mod n ≡ id)Promise (ack from replicas that will ignore older coordinator)will include the latest value proposed.

Since we want to have a consensus on several values, Paxoswill be repeated: Need a catch-up mechanism for slowmachines.

23 / 51

Page 109: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Back to Paxos in Chubby

Consensus

Several coordinators will propose a value.It works because:

Each coordinator will generate an unique number with theirpropositions. (ex: s mod n ≡ id)Promise (ack from replicas that will ignore older coordinator)will include the latest value proposed.

Since we want to have a consensus on several values, Paxoswill be repeated: Need a catch-up mechanism for slowmachines.

23 / 51

Page 110: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Back to Paxos in Chubby

Consensus

Several coordinators will propose a value.It works because:

Each coordinator will generate an unique number with theirpropositions. (ex: s mod n ≡ id)

Promise (ack from replicas that will ignore older coordinator)will include the latest value proposed.

Since we want to have a consensus on several values, Paxoswill be repeated: Need a catch-up mechanism for slowmachines.

23 / 51

Page 111: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Back to Paxos in Chubby

Consensus

Several coordinators will propose a value.It works because:

Each coordinator will generate an unique number with theirpropositions. (ex: s mod n ≡ id)Promise (ack from replicas that will ignore older coordinator)will include the latest value proposed.

Since we want to have a consensus on several values, Paxoswill be repeated: Need a catch-up mechanism for slowmachines.

23 / 51

Page 112: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Back to Paxos in Chubby

Consensus

Several coordinators will propose a value.It works because:

Each coordinator will generate an unique number with theirpropositions. (ex: s mod n ≡ id)Promise (ack from replicas that will ignore older coordinator)will include the latest value proposed.

Since we want to have a consensus on several values, Paxoswill be repeated: Need a catch-up mechanism for slowmachines.

23 / 51

Page 113: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Back to Paxos in Chubby

Consensus

Several coordinators will propose a value.It works because:

Each coordinator will generate an unique number with theirpropositions. (ex: s mod n ≡ id)Promise (ack from replicas that will ignore older coordinator)will include the latest value proposed.

Since we want to have a consensus on several values, Paxoswill be repeated: Need a catch-up mechanism for slowmachines.

23 / 51

Page 114: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption by being a non voting member for awhole cycle.

7 Logs can become largeSnapshots can helpStill problematic

7 Implementation difficult

7 Read = a Paxos instance → expensivemaster lease can help.But problem of unstability if network is unstable.

24 / 51

Page 115: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption

by being a non voting member for awhole cycle.

7 Logs can become largeSnapshots can helpStill problematic

7 Implementation difficult

7 Read = a Paxos instance → expensivemaster lease can help.But problem of unstability if network is unstable.

24 / 51

Page 116: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption

by being a non voting member for awhole cycle.

7 Logs can become largeSnapshots can helpStill problematic

7 Implementation difficult

7 Read = a Paxos instance → expensivemaster lease can help.But problem of unstability if network is unstable.

24 / 51

Page 117: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption by being a non voting member for awhole cycle.

7 Logs can become largeSnapshots can helpStill problematic

7 Implementation difficult

7 Read = a Paxos instance → expensivemaster lease can help.But problem of unstability if network is unstable.

24 / 51

Page 118: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption by being a non voting member for awhole cycle.

7 Logs can become large

Snapshots can helpStill problematic

7 Implementation difficult

7 Read = a Paxos instance → expensivemaster lease can help.But problem of unstability if network is unstable.

24 / 51

Page 119: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption by being a non voting member for awhole cycle.

7 Logs can become large

Snapshots can helpStill problematic

7 Implementation difficult

7 Read = a Paxos instance → expensivemaster lease can help.But problem of unstability if network is unstable.

24 / 51

Page 120: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption by being a non voting member for awhole cycle.

7 Logs can become largeSnapshots can help

Still problematic

7 Implementation difficult

7 Read = a Paxos instance → expensivemaster lease can help.But problem of unstability if network is unstable.

24 / 51

Page 121: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption by being a non voting member for awhole cycle.

7 Logs can become largeSnapshots can helpStill problematic

7 Implementation difficult

7 Read = a Paxos instance → expensivemaster lease can help.But problem of unstability if network is unstable.

24 / 51

Page 122: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption by being a non voting member for awhole cycle.

7 Logs can become largeSnapshots can helpStill problematic

7 Implementation difficult

7 Read = a Paxos instance

→ expensivemaster lease can help.But problem of unstability if network is unstable.

24 / 51

Page 123: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption by being a non voting member for awhole cycle.

7 Logs can become largeSnapshots can helpStill problematic

7 Implementation difficult

7 Read = a Paxos instance

→ expensivemaster lease can help.But problem of unstability if network is unstable.

24 / 51

Page 124: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption by being a non voting member for awhole cycle.

7 Logs can become largeSnapshots can helpStill problematic

7 Implementation difficult

7 Read = a Paxos instance

→ expensivemaster lease can help.But problem of unstability if network is unstable.

24 / 51

Page 125: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption by being a non voting member for awhole cycle.

7 Logs can become largeSnapshots can helpStill problematic

7 Implementation difficult

7 Read = a Paxos instance → expensive

master lease can help.But problem of unstability if network is unstable.

24 / 51

Page 126: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption by being a non voting member for awhole cycle.

7 Logs can become largeSnapshots can helpStill problematic

7 Implementation difficult

7 Read = a Paxos instance → expensivemaster lease can help.

But problem of unstability if network is unstable.

24 / 51

Page 127: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption by being a non voting member for awhole cycle.

7 Logs can become largeSnapshots can helpStill problematic

7 Implementation difficult

7 Read = a Paxos instance → expensivemaster lease can help.But problem of unstability if network is unstable.

24 / 51

Page 128: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Paxos

Properties of Paxos/Chubby

3 Handle disk corruption by being a non voting member for awhole cycle.

7 Logs can become largeSnapshots can helpStill problematic

7 Implementation difficult

7 Read = a Paxos instance → expensivemaster lease can help.But problem of unstability if network is unstable.

24 / 51

Page 129: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Compaction or Garbage Collection

Concept

The Memtable (logs recent changes) will become big.

The goal of compaction is to reduce the size of this table.

1 minor compaction: after reaching a threshold: freeze andcreate a new memtable. the old table is transform into aSSTable.

3 reduce memory usage.3 increase recovery speed.3 read and write can be done concurrently.7 number of SSTables is still increasing.

2 merging compaction: reduce the number of SSTables.

3 major compaction: merging compaction using all theSSTables.

25 / 51

Page 130: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Compaction or Garbage Collection

Concept

The Memtable (logs recent changes) will become big.The goal of compaction is to reduce the size of this table.

1 minor compaction: after reaching a threshold: freeze andcreate a new memtable. the old table is transform into aSSTable.

3 reduce memory usage.3 increase recovery speed.3 read and write can be done concurrently.7 number of SSTables is still increasing.

2 merging compaction: reduce the number of SSTables.

3 major compaction: merging compaction using all theSSTables.

25 / 51

Page 131: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Compaction or Garbage Collection

Concept

The Memtable (logs recent changes) will become big.The goal of compaction is to reduce the size of this table.

1 minor compaction: after reaching a threshold: freeze andcreate a new memtable. the old table is transform into aSSTable.

3 reduce memory usage.3 increase recovery speed.3 read and write can be done concurrently.7 number of SSTables is still increasing.

2 merging compaction: reduce the number of SSTables.

3 major compaction: merging compaction using all theSSTables.

25 / 51

Page 132: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Compaction or Garbage Collection

Concept

The Memtable (logs recent changes) will become big.The goal of compaction is to reduce the size of this table.

1 minor compaction: after reaching a threshold: freeze andcreate a new memtable. the old table is transform into aSSTable.

3 reduce memory usage.

3 increase recovery speed.3 read and write can be done concurrently.7 number of SSTables is still increasing.

2 merging compaction: reduce the number of SSTables.

3 major compaction: merging compaction using all theSSTables.

25 / 51

Page 133: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Compaction or Garbage Collection

Concept

The Memtable (logs recent changes) will become big.The goal of compaction is to reduce the size of this table.

1 minor compaction: after reaching a threshold: freeze andcreate a new memtable. the old table is transform into aSSTable.

3 reduce memory usage.3 increase recovery speed.

3 read and write can be done concurrently.7 number of SSTables is still increasing.

2 merging compaction: reduce the number of SSTables.

3 major compaction: merging compaction using all theSSTables.

25 / 51

Page 134: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Compaction or Garbage Collection

Concept

The Memtable (logs recent changes) will become big.The goal of compaction is to reduce the size of this table.

1 minor compaction: after reaching a threshold: freeze andcreate a new memtable. the old table is transform into aSSTable.

3 reduce memory usage.3 increase recovery speed.3 read and write can be done concurrently.

7 number of SSTables is still increasing.

2 merging compaction: reduce the number of SSTables.

3 major compaction: merging compaction using all theSSTables.

25 / 51

Page 135: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Compaction or Garbage Collection

Concept

The Memtable (logs recent changes) will become big.The goal of compaction is to reduce the size of this table.

1 minor compaction: after reaching a threshold: freeze andcreate a new memtable. the old table is transform into aSSTable.

3 reduce memory usage.3 increase recovery speed.3 read and write can be done concurrently.7 number of SSTables is still increasing.

2 merging compaction: reduce the number of SSTables.

3 major compaction: merging compaction using all theSSTables.

25 / 51

Page 136: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Compaction or Garbage Collection

Concept

The Memtable (logs recent changes) will become big.The goal of compaction is to reduce the size of this table.

1 minor compaction: after reaching a threshold: freeze andcreate a new memtable. the old table is transform into aSSTable.

3 reduce memory usage.3 increase recovery speed.3 read and write can be done concurrently.7 number of SSTables is still increasing.

2 merging compaction: reduce the number of SSTables.

3 major compaction: merging compaction using all theSSTables.

25 / 51

Page 137: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Compaction or Garbage Collection

Concept

The Memtable (logs recent changes) will become big.The goal of compaction is to reduce the size of this table.

1 minor compaction: after reaching a threshold: freeze andcreate a new memtable. the old table is transform into aSSTable.

3 reduce memory usage.3 increase recovery speed.3 read and write can be done concurrently.7 number of SSTables is still increasing.

2 merging compaction: reduce the number of SSTables.

3 major compaction: merging compaction using all theSSTables.

25 / 51

Page 138: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Locality

Grouping column families into a locality group.

Each locality group will have an SSTable.

3 Will provide performance improvement if columns that are notaccessed together are on seperate SSTables.

7 Actually difficult to know how to do it dynamically.

26 / 51

Page 139: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Locality

Grouping column families into a locality group.Each locality group will have an SSTable.

3 Will provide performance improvement if columns that are notaccessed together are on seperate SSTables.

7 Actually difficult to know how to do it dynamically.

26 / 51

Page 140: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Locality

Grouping column families into a locality group.Each locality group will have an SSTable.

3 Will provide performance improvement if columns that are notaccessed together are on seperate SSTables.

7 Actually difficult to know how to do it dynamically.

26 / 51

Page 141: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Locality

Grouping column families into a locality group.Each locality group will have an SSTable.

3 Will provide performance improvement if columns that are notaccessed together are on seperate SSTables.

7 Actually difficult to know how to do it dynamically.

26 / 51

Page 142: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Compression and caching

Clients can decide what compresion scheme to use forSSTables (portion of it, different per SSTable).

Typical compression:

First alogrithm will look for similarities over a large window.Second algorithm will look for common string in a smallwindow 16KB.

Tablet server will use cache to improve latency.

27 / 51

Page 143: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Compression and caching

Clients can decide what compresion scheme to use forSSTables (portion of it, different per SSTable).Typical compression:

First alogrithm will look for similarities over a large window.Second algorithm will look for common string in a smallwindow 16KB.

Tablet server will use cache to improve latency.

27 / 51

Page 144: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Compression and caching

Clients can decide what compresion scheme to use forSSTables (portion of it, different per SSTable).Typical compression:

First alogrithm will look for similarities over a large window.

Second algorithm will look for common string in a smallwindow 16KB.

Tablet server will use cache to improve latency.

27 / 51

Page 145: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Compression and caching

Clients can decide what compresion scheme to use forSSTables (portion of it, different per SSTable).Typical compression:

First alogrithm will look for similarities over a large window.Second algorithm will look for common string in a smallwindow 16KB.

Tablet server will use cache to improve latency.

27 / 51

Page 146: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Compression and caching

Clients can decide what compresion scheme to use forSSTables (portion of it, different per SSTable).Typical compression:

First alogrithm will look for similarities over a large window.Second algorithm will look for common string in a smallwindow 16KB.

Tablet server will use cache to improve latency.

27 / 51

Page 147: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Logging

.How would you store the logs about the tablets ?

A seperate log for each tablet ? GFS is used for accessing amoderate number of files...

One huge log ? What about failure recovery time ?

Two logs are used (only one is active) and they are sortedusing table id, row name and sequence number

28 / 51

Page 148: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Logging

.How would you store the logs about the tablets ?

A seperate log for each tablet ?

GFS is used for accessing amoderate number of files...

One huge log ? What about failure recovery time ?

Two logs are used (only one is active) and they are sortedusing table id, row name and sequence number

28 / 51

Page 149: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Logging

.How would you store the logs about the tablets ?

A seperate log for each tablet ?

GFS is used for accessing amoderate number of files...

One huge log ? What about failure recovery time ?

Two logs are used (only one is active) and they are sortedusing table id, row name and sequence number

28 / 51

Page 150: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Logging

.How would you store the logs about the tablets ?

A seperate log for each tablet ? GFS is used for accessing amoderate number of files...

One huge log ?

What about failure recovery time ?

Two logs are used (only one is active) and they are sortedusing table id, row name and sequence number

28 / 51

Page 151: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Logging

.How would you store the logs about the tablets ?

A seperate log for each tablet ? GFS is used for accessing amoderate number of files...

One huge log ?

What about failure recovery time ?

Two logs are used (only one is active) and they are sortedusing table id, row name and sequence number

28 / 51

Page 152: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Logging

.How would you store the logs about the tablets ?

A seperate log for each tablet ? GFS is used for accessing amoderate number of files...

One huge log ? What about failure recovery time ?

Two logs are used (only one is active) and they are sortedusing table id, row name and sequence number

28 / 51

Page 153: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Logging

.How would you store the logs about the tablets ?

A seperate log for each tablet ? GFS is used for accessing amoderate number of files...

One huge log ? What about failure recovery time ?

Two logs are used (only one is active) and they are sortedusing table id, row name and sequence number

28 / 51

Page 154: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Bloom filters

ProblemRead requires acessing all SSTables inside a tablet.

Stressful for the disk.

A blooming filter can be used to help locate the data whenknowing the column and the row.

A blooming filter is an improved hashing method.

29 / 51

Page 155: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Bloom filters

ProblemRead requires acessing all SSTables inside a tablet.Stressful for the disk.

A blooming filter can be used to help locate the data whenknowing the column and the row.

A blooming filter is an improved hashing method.

29 / 51

Page 156: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Bloom filters

ProblemRead requires acessing all SSTables inside a tablet.Stressful for the disk.

A blooming filter can be used to help locate the data whenknowing the column and the row.

A blooming filter is an improved hashing method.

29 / 51

Page 157: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Bloom filters

ProblemRead requires acessing all SSTables inside a tablet.Stressful for the disk.

A blooming filter can be used to help locate the data whenknowing the column and the row.

A blooming filter is an improved hashing method.

29 / 51

Page 158: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Some optimizations

Bloom filters

30 / 51

Page 159: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Plan

1 Databases: GeneralitiesRelational databaseOther SQL database modelsWhy is this not enough ?

2 BigtableDescriptionGoogle FS: underlying FSChubby: Failure resiliencePaxosSome optimizations

3 Other NoSQLAn old problemExtensible Record StoresDocument storesKey-value storesRAM databases

4 Thoughts

5 Conclusion

31 / 51

Page 160: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Lore[MAG+97] and [Abi97]

Semistructured database is an old problem.

Data model: Object Exchange Model

32 / 51

Page 161: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Lore[MAG+97] and [Abi97]

Semistructured database is an old problem.Data model: Object Exchange Model

32 / 51

Page 162: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Lore[MAG+97] and [Abi97]

Semistructured database is an old problem.Data model: Object Exchange Model

32 / 51

Page 163: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Lore

Properties about Lore

Very general: graph with label.

(˜tree)

Hide irregularities in the structure when doing queries.

Pattern match possible.

Merging new data.

33 / 51

Page 164: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Lore

Properties about Lore

Very general: graph with label.

(˜tree)

Hide irregularities in the structure when doing queries.

Pattern match possible.

Merging new data.

33 / 51

Page 165: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Lore

Properties about Lore

Very general: graph with label. (˜tree)

Hide irregularities in the structure when doing queries.

Pattern match possible.

Merging new data.

33 / 51

Page 166: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Lore

Properties about Lore

Very general: graph with label. (˜tree)

Hide irregularities in the structure when doing queries.

Pattern match possible.

Merging new data.

33 / 51

Page 167: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Lore

Properties about Lore

Very general: graph with label. (˜tree)

Hide irregularities in the structure when doing queries.

Pattern match possible.

Merging new data.

33 / 51

Page 168: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Some Comparison

3 Query will be optimized. (˜compilation)

3 More general than Column-key model.

3 Join support.

3 Virtualization of data placement At the time of the paper

3 Dataguides: visualization of database.

3 Code length (60 000 lines of C++ vs 550 000 for MongoDB)

7 Performance Issue. (˜traversal of graphs)

34 / 51

Page 169: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Some Comparison

3 Query will be optimized. (˜compilation)

3 More general than Column-key model.

3 Join support.

3 Virtualization of data placement

At the time of the paper

3 Dataguides: visualization of database.

3 Code length (60 000 lines of C++ vs 550 000 for MongoDB)

7 Performance Issue. (˜traversal of graphs)

34 / 51

Page 170: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Some Comparison

3 Query will be optimized. (˜compilation)

3 More general than Column-key model.

3 Join support.

3 Virtualization of data placement

At the time of the paper

3 Dataguides: visualization of database.

3 Code length (60 000 lines of C++ vs 550 000 for MongoDB)

7 Performance Issue. (˜traversal of graphs)

34 / 51

Page 171: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Some Comparison

3 Query will be optimized. (˜compilation)

3 More general than Column-key model.

3 Join support.

3 Virtualization of data placement

At the time of the paper

3 Dataguides: visualization of database.

3 Code length (60 000 lines of C++ vs 550 000 for MongoDB)

7 Performance Issue. (˜traversal of graphs)

34 / 51

Page 172: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Some Comparison

3 Query will be optimized. (˜compilation)

3 More general than Column-key model.

3 Join support.

3 Virtualization of data placement

At the time of the paper

3 Dataguides: visualization of database.

3 Code length (60 000 lines of C++ vs 550 000 for MongoDB)

7 Performance Issue. (˜traversal of graphs)

34 / 51

Page 173: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Some Comparison

3 Query will be optimized. (˜compilation)

3 More general than Column-key model.

3 Join support.

3 Virtualization of data placement

At the time of the paper

3 Dataguides: visualization of database.

3 Code length (60 000 lines of C++ vs 550 000 for MongoDB)

7 Performance Issue. (˜traversal of graphs)

34 / 51

Page 174: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Some Comparison

3 Query will be optimized. (˜compilation)

3 More general than Column-key model.

3 Join support.

3 Virtualization of data placement At the time of the paper

3 Dataguides: visualization of database.

3 Code length (60 000 lines of C++ vs 550 000 for MongoDB)

7 Performance Issue. (˜traversal of graphs)

34 / 51

Page 175: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Some Comparison

3 Query will be optimized. (˜compilation)

3 More general than Column-key model.

3 Join support.

3 Virtualization of data placement At the time of the paper

3 Dataguides: visualization of database.

3 Code length (60 000 lines of C++ vs 550 000 for MongoDB)

7 Performance Issue. (˜traversal of graphs)

34 / 51

Page 176: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

An old problem

Some Comparison

3 Query will be optimized. (˜compilation)

3 More general than Column-key model.

3 Join support.

3 Virtualization of data placement At the time of the paper

3 Dataguides: visualization of database.

3 Code length (60 000 lines of C++ vs 550 000 for MongoDB)

7 Performance Issue. (˜traversal of graphs)

34 / 51

Page 177: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Cassandra [LM10]

Very similar to Bigtable (column-oriented).

Some features

Dynamic partionning of data.

Consistent hashing for distributing the data.

Replication is done by data replication on N nodes.

Global Knowledge of the network (hashing)

Failure detection.

Efficient anti-entropy gossiping protocol

35 / 51

Page 178: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Cassandra [LM10]

Very similar to Bigtable (column-oriented).

Some features

Dynamic partionning of data.

Consistent hashing for distributing the data.

Replication is done by data replication on N nodes.

Global Knowledge of the network (hashing)

Failure detection.

Efficient anti-entropy gossiping protocol

35 / 51

Page 179: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Cassandra [LM10]

Very similar to Bigtable (column-oriented).

Some features

Dynamic partionning of data.

Consistent hashing for distributing the data.

Replication is done by data replication on N nodes.

Global Knowledge of the network (hashing)

Failure detection.

Efficient anti-entropy gossiping protocol

35 / 51

Page 180: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Cassandra [LM10]

Very similar to Bigtable (column-oriented).

Some features

Dynamic partionning of data.

Consistent hashing for distributing the data.

Replication is done by data replication on N nodes.

Global Knowledge of the network (hashing)

Failure detection.

Efficient anti-entropy gossiping protocol

35 / 51

Page 181: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Cassandra [LM10]

Very similar to Bigtable (column-oriented).

Some features

Dynamic partionning of data.

Consistent hashing for distributing the data.

Replication is done by data replication on N nodes.

Global Knowledge of the network (hashing)

Failure detection.

Efficient anti-entropy gossiping protocol

35 / 51

Page 182: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Cassandra [LM10]

Very similar to Bigtable (column-oriented).

Some features

Dynamic partionning of data.

Consistent hashing for distributing the data.

Replication is done by data replication on N nodes.

Global Knowledge of the network (hashing)

Failure detection.

Efficient anti-entropy gossiping protocol

35 / 51

Page 183: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Cassandra [LM10]

Very similar to Bigtable (column-oriented).

Some features

Dynamic partionning of data.

Consistent hashing for distributing the data.

Replication is done by data replication on N nodes.

Global Knowledge of the network (hashing)

Failure detection.

Efficient anti-entropy gossiping protocol

35 / 51

Page 184: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Consistency

Strong Consistency

#Writers + #Readers > NbReplication

Consistency level

Read and write can have differents level of consistency (1 node torespond, majority, all).

36 / 51

Page 185: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Consistency

Strong Consistency

#Writers + #Readers > NbReplication

Consistency level

Read and write can have differents level of consistency (1 node torespond, majority, all).

36 / 51

Page 186: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Some Comparison

3 P2P structure

3 Super Family.

3 Load balancing (move lightly loaded nodes in the “ring”)

3 Some locality knowledge in replication (rack aware, datacenteraware)

3 Consistent hashing (reduce cost if changed)

7 (Eventual) Consistency

37 / 51

Page 187: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Some Comparison

3 P2P structure

3 Super Family.

3 Load balancing (move lightly loaded nodes in the “ring”)

3 Some locality knowledge in replication (rack aware, datacenteraware)

3 Consistent hashing (reduce cost if changed)

7 (Eventual) Consistency

37 / 51

Page 188: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Some Comparison

3 P2P structure

3 Super Family.

3 Load balancing (move lightly loaded nodes in the “ring”)

3 Some locality knowledge in replication (rack aware, datacenteraware)

3 Consistent hashing (reduce cost if changed)

7 (Eventual) Consistency

37 / 51

Page 189: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Some Comparison

3 P2P structure

3 Super Family.

3 Load balancing (move lightly loaded nodes in the “ring”)

3 Some locality knowledge in replication (rack aware, datacenteraware)

3 Consistent hashing (reduce cost if changed)

7 (Eventual) Consistency

37 / 51

Page 190: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Some Comparison

3 P2P structure

3 Super Family.

3 Load balancing (move lightly loaded nodes in the “ring”)

3 Some locality knowledge in replication (rack aware, datacenteraware)

3 Consistent hashing (reduce cost if changed)

7 (Eventual) Consistency

37 / 51

Page 191: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Some Comparison

3 P2P structure

3 Super Family.

3 Load balancing (move lightly loaded nodes in the “ring”)

3 Some locality knowledge in replication (rack aware, datacenteraware)

3 Consistent hashing (reduce cost if changed)

7 (Eventual) Consistency

37 / 51

Page 192: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Extensible Record Stores

Some Comparison

3 P2P structure

3 Super Family.

3 Load balancing (move lightly loaded nodes in the “ring”)

3 Some locality knowledge in replication (rack aware, datacenteraware)

3 Consistent hashing (reduce cost if changed)

7 (Eventual) Consistency

37 / 51

Page 193: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Document stores

MongoDB (No precise article)

Scability by sharding

Document oriented

38 / 51

Page 194: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Document stores

MongoDB (No precise article)

Scability by sharding

Document oriented

38 / 51

Page 195: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Key-value stores

Amazon Dynamo[DHJ+07]

Weak Consistency.

Consistent hashing.

Object versionning

Decentralized

Replication using quorum

Failure detection

Merkle tree for Eventual Consensus (Used in Cassandra)

39 / 51

Page 196: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Key-value stores

Amazon Dynamo[DHJ+07]

Weak Consistency.

Consistent hashing.

Object versionning

Decentralized

Replication using quorum

Failure detection

Merkle tree for Eventual Consensus (Used in Cassandra)

39 / 51

Page 197: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Key-value stores

Amazon Dynamo[DHJ+07]

Weak Consistency.

Consistent hashing.

Object versionning

Decentralized

Replication using quorum

Failure detection

Merkle tree for Eventual Consensus (Used in Cassandra)

39 / 51

Page 198: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Key-value stores

Amazon Dynamo[DHJ+07]

Weak Consistency.

Consistent hashing.

Object versionning

Decentralized

Replication using quorum

Failure detection

Merkle tree for Eventual Consensus (Used in Cassandra)

39 / 51

Page 199: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Key-value stores

Amazon Dynamo[DHJ+07]

Weak Consistency.

Consistent hashing.

Object versionning

Decentralized

Replication using quorum

Failure detection

Merkle tree for Eventual Consensus (Used in Cassandra)

39 / 51

Page 200: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Key-value stores

Amazon Dynamo[DHJ+07]

Weak Consistency.

Consistent hashing.

Object versionning

Decentralized

Replication using quorum

Failure detection

Merkle tree for Eventual Consensus (Used in Cassandra)

39 / 51

Page 201: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Key-value stores

Amazon Dynamo[DHJ+07]

Weak Consistency.

Consistent hashing.

Object versionning

Decentralized

Replication using quorum

Failure detection

Merkle tree for Eventual Consensus (Used in Cassandra)

39 / 51

Page 202: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Key-value stores

Some Comparison

7 Caching ?

7 Snapshot ?

7 (original article) key/value schema will affect speed if valueare huge (to write into the data you need to read it).

40 / 51

Page 203: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Key-value stores

Some Comparison

7 Caching ?

7 Snapshot ?

7 (original article) key/value schema will affect speed if valueare huge (to write into the data you need to read it).

40 / 51

Page 204: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Key-value stores

Some Comparison

7 Caching ?

7 Snapshot ?

7 (original article) key/value schema will affect speed if valueare huge (to write into the data you need to read it).

40 / 51

Page 205: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Key-value stores

Some Comparison

7 Caching ?

7 Snapshot ?

7 (original article) key/value schema will affect speed if valueare huge (to write into the data you need to read it).

40 / 51

Page 206: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

RAM databases

VoltDB http://voltdb.com

Based on H-Store[SMA+07]

Idea is to use main memory as storage.

41 / 51

Page 207: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

RAM databases

VoltDB http://voltdb.com

Based on H-Store[SMA+07]

Idea is to use main memory as storage.

41 / 51

Page 208: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

RAM databases

7 Cost ?

7 Size ?

7 RAM is not persistent.

42 / 51

Page 209: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

RAM databases

7 Cost ?

7 Size ?

7 RAM is not persistent.

42 / 51

Page 210: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

RAM databases

7 Cost ?

7 Size ?

7 RAM is not persistent.

42 / 51

Page 211: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

RAM databases

7 Cost ?

7 Size ?

7 RAM is not persistent.

42 / 51

Page 212: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Plan

1 Databases: GeneralitiesRelational databaseOther SQL database modelsWhy is this not enough ?

2 BigtableDescriptionGoogle FS: underlying FSChubby: Failure resiliencePaxosSome optimizations

3 Other NoSQLAn old problemExtensible Record StoresDocument storesKey-value storesRAM databases

4 Thoughts

5 Conclusion

43 / 51

Page 213: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Answers

Mapreduce

Using HBase.

One map-reduce job per tablets.

44 / 51

Page 214: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Answers

Mapreduce

Using HBase.One map-reduce job per tablets.

44 / 51

Page 215: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Answers

Mapreduce

Using HBase.One map-reduce job per tablets.

44 / 51

Page 216: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Answers

Compaction

Size-TieredCompaction is done if the number of sstables hits a threshold.

7 Need a lot of space to do the copy (and the size of sstablesincreases).

7 Columns of a row could be spread across the different sstables

→ Irregular performance.

Leveled compaction

SSTable are smaller and grouped by levels.Inside a level, sstables will not overlap.

7 More I/O

How to choose from this two policies ?

45 / 51

Page 217: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Answers

Compaction

Size-TieredCompaction is done if the number of sstables hits a threshold.

7 Need a lot of space to do the copy (and the size of sstablesincreases).

7 Columns of a row could be spread across the different sstables

→ Irregular performance.

Leveled compaction

SSTable are smaller and grouped by levels.Inside a level, sstables will not overlap.

7 More I/O

How to choose from this two policies ?

45 / 51

Page 218: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Answers

Compaction

Size-TieredCompaction is done if the number of sstables hits a threshold.

7 Need a lot of space to do the copy (and the size of sstablesincreases).

7 Columns of a row could be spread across the different sstables

→ Irregular performance.

Leveled compaction

SSTable are smaller and grouped by levels.Inside a level, sstables will not overlap.

7 More I/O

How to choose from this two policies ?

45 / 51

Page 219: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Answers

Compaction

Size-TieredCompaction is done if the number of sstables hits a threshold.

7 Need a lot of space to do the copy (and the size of sstablesincreases).

7 Columns of a row could be spread across the different sstables→ Irregular performance.

Leveled compaction

SSTable are smaller and grouped by levels.

Inside a level, sstables will not overlap.

7 More I/O

How to choose from this two policies ?

45 / 51

Page 220: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Answers

Compaction

Size-TieredCompaction is done if the number of sstables hits a threshold.

7 Need a lot of space to do the copy (and the size of sstablesincreases).

7 Columns of a row could be spread across the different sstables→ Irregular performance.

Leveled compaction

SSTable are smaller and grouped by levels.Inside a level, sstables will not overlap.

7 More I/O

How to choose from this two policies ?

45 / 51

Page 221: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Answers

Compaction

Size-TieredCompaction is done if the number of sstables hits a threshold.

7 Need a lot of space to do the copy (and the size of sstablesincreases).

7 Columns of a row could be spread across the different sstables→ Irregular performance.

Leveled compaction

SSTable are smaller and grouped by levels.Inside a level, sstables will not overlap.

7 More I/O

How to choose from this two policies ?

45 / 51

Page 222: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Answers

Compaction

Size-TieredCompaction is done if the number of sstables hits a threshold.

7 Need a lot of space to do the copy (and the size of sstablesincreases).

7 Columns of a row could be spread across the different sstables→ Irregular performance.

Leveled compaction

SSTable are smaller and grouped by levels.Inside a level, sstables will not overlap.

7 More I/O

How to choose from this two policies ?

45 / 51

Page 223: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Answers

Compaction

Size-TieredCompaction is done if the number of sstables hits a threshold.

7 Need a lot of space to do the copy (and the size of sstablesincreases).

7 Columns of a row could be spread across the different sstables→ Irregular performance.

Leveled compaction

SSTable are smaller and grouped by levels.Inside a level, sstables will not overlap.

7 More I/O

How to choose from this two policies ?45 / 51

Page 224: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Answers

SSTable assignment

The master is capable of maintining the list:Tl [TSid ] = all tablets handle by TSid .

Creations, Merges and Deletions are handle by the master.

Split are initiated by the tablet server but it notifies themaster.

46 / 51

Page 225: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Answers

SSTable assignment

The master is capable of maintining the list:Tl [TSid ] = all tablets handle by TSid .

Creations, Merges and Deletions are handle by the master.

Split are initiated by the tablet server but it notifies themaster.

46 / 51

Page 226: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Answers

SSTable assignment

The master is capable of maintining the list:Tl [TSid ] = all tablets handle by TSid .

Creations, Merges and Deletions are handle by the master.

Split are initiated by the tablet server but it notifies themaster.

46 / 51

Page 227: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Limitations

7 No Join

7 ACID (Atomicity, Consistency, Isolation, Durability): eventualconsistency

Problem that arise with semi structured data (texts):

no schema possible ? (relationnal data. use null value)dynamic schema

3 Nice for Map reduce.

Load balancing: an way of scaling SQL server.

7 Paxos

7 Why use a tree for locating tables ?

7 No real solution for the number of tablets.

47 / 51

Page 228: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Limitations

7 No Join

7 ACID (Atomicity, Consistency, Isolation, Durability): eventualconsistency

Problem that arise with semi structured data (texts):

no schema possible ? (relationnal data. use null value)dynamic schema

3 Nice for Map reduce.

Load balancing: an way of scaling SQL server.

7 Paxos

7 Why use a tree for locating tables ?

7 No real solution for the number of tablets.

47 / 51

Page 229: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Limitations

7 No Join

7 ACID (Atomicity, Consistency, Isolation, Durability): eventualconsistency

Problem that arise with semi structured data (texts):

no schema possible ? (relationnal data. use null value)dynamic schema

3 Nice for Map reduce.

Load balancing: an way of scaling SQL server.

7 Paxos

7 Why use a tree for locating tables ?

7 No real solution for the number of tablets.

47 / 51

Page 230: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Limitations

7 No Join

7 ACID (Atomicity, Consistency, Isolation, Durability): eventualconsistency

Problem that arise with semi structured data (texts):

no schema possible ? (relationnal data. use null value)dynamic schema

3 Nice for Map reduce.

Load balancing: an way of scaling SQL server.

7 Paxos

7 Why use a tree for locating tables ?

7 No real solution for the number of tablets.

47 / 51

Page 231: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Limitations

7 No Join

7 ACID (Atomicity, Consistency, Isolation, Durability): eventualconsistency

Problem that arise with semi structured data (texts):no schema possible ? (relationnal data. use null value)

dynamic schema

3 Nice for Map reduce.

Load balancing: an way of scaling SQL server.

7 Paxos

7 Why use a tree for locating tables ?

7 No real solution for the number of tablets.

47 / 51

Page 232: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Limitations

7 No Join

7 ACID (Atomicity, Consistency, Isolation, Durability): eventualconsistency

Problem that arise with semi structured data (texts):no schema possible ? (relationnal data. use null value)dynamic schema

3 Nice for Map reduce.

Load balancing: an way of scaling SQL server.

7 Paxos

7 Why use a tree for locating tables ?

7 No real solution for the number of tablets.

47 / 51

Page 233: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Limitations

7 No Join

7 ACID (Atomicity, Consistency, Isolation, Durability): eventualconsistency

Problem that arise with semi structured data (texts):no schema possible ? (relationnal data. use null value)dynamic schema

3 Nice for Map reduce.

Load balancing: an way of scaling SQL server.

7 Paxos

7 Why use a tree for locating tables ?

7 No real solution for the number of tablets.

47 / 51

Page 234: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Limitations

7 No Join

7 ACID (Atomicity, Consistency, Isolation, Durability): eventualconsistency

Problem that arise with semi structured data (texts):no schema possible ? (relationnal data. use null value)dynamic schema

3 Nice for Map reduce.

Load balancing: an way of scaling SQL server.

7 Paxos

7 Why use a tree for locating tables ?

7 No real solution for the number of tablets.

47 / 51

Page 235: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Limitations

7 No Join

7 ACID (Atomicity, Consistency, Isolation, Durability): eventualconsistency

Problem that arise with semi structured data (texts):no schema possible ? (relationnal data. use null value)dynamic schema

3 Nice for Map reduce.

Load balancing: an way of scaling SQL server.

7 Paxos

7 Why use a tree for locating tables ?

7 No real solution for the number of tablets.

47 / 51

Page 236: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Limitations

7 No Join

7 ACID (Atomicity, Consistency, Isolation, Durability): eventualconsistency

Problem that arise with semi structured data (texts):no schema possible ? (relationnal data. use null value)dynamic schema

3 Nice for Map reduce.

Load balancing: an way of scaling SQL server.

7 Paxos

7 Why use a tree for locating tables ?

7 No real solution for the number of tablets.

47 / 51

Page 237: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Plan

1 Databases: GeneralitiesRelational databaseOther SQL database modelsWhy is this not enough ?

2 BigtableDescriptionGoogle FS: underlying FSChubby: Failure resiliencePaxosSome optimizations

3 Other NoSQLAn old problemExtensible Record StoresDocument storesKey-value storesRAM databases

4 Thoughts

5 Conclusion

48 / 51

Page 238: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Conclusion

NoSQL ̸= SQL.

Old problem but new data pattern.

49 / 51

Page 239: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Conclusion

NoSQL ̸= SQL.

Old problem but new data pattern.

49 / 51

Page 240: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Conclusion

NoSQL ̸= SQL.

Old problem but new data pattern.

49 / 51

Page 241: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

.

.

.

Databases: Generalities. . . . . . . ... . . .. . . .. . . . . .

Bigtable. . .. . ... .. .

Other NoSQL Thoughts. . .Conclusion

Conclusion

Thank You for your attention !

50 / 51

Page 242: Bigtable: A Distributed Storage System for Structured Data ...cs655/lectures/CS655-Louis_Rabiet_BigTable.pdf · Bigtable. . .. . ... .. . Other NoSQL Thoughts. . . Conclusion Bigtable:

Bibliography

Serge Abiteboul.

Querying semi-structured data.In Foto N. Afrati and Phokion G. Kolaitis, editors, Database Theory - ICDT ’97, 6th InternationalConference, Delphi, Greece, January 8-10, 1997, Proceedings, volume 1186 of Lecture Notes in ComputerScience, pages 1–18. Springer, 1997.

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex

Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels.Dynamo: amazon’s highly available key-value store.SIGOPS Oper. Syst. Rev., 41(6):205–220, October 2007.

Alpa Jain, AnHai Doan, and Luis Gravano.

Optimizing sql queries over text databases.In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE ’08, pages636–645, Washington, DC, USA, 2008. IEEE Computer Society.

Avinash Lakshman and Prashant Malik.

Cassandra: a decentralized structured storage system.SIGOPS Oper. Syst. Rev., 44(2):35–40, April 2010.

Jason McHugh, Serge Abiteboul, Roy Goldman, Dallan Quass, and Jennifer Widom.

Lore: A database management system for semistructured data.SIGMOD Record, 26:54–66, 1997.

Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat

Helland.The end of an architectural era: (it’s time for a complete rewrite).In Proceedings of the 33rd international conference on Very large data bases, VLDB ’07, pages 1150–1160.VLDB Endowment, 2007.