The Hive Think Tank: Rocking the Database World with RocksDB

MySQL + RocksDB
Better Storage Efficiency Than InnoDB
Feb 3, 2016
Siying Dong, Software Engineer
Database Engineering Team @ Facebook

Upload: the-hive

Post on 08-Jan-2017

TRANSCRIPT

Page 1: The Hive Think Tank: Rocking the Database World with RocksDB

MySQL + RocksDB
Better Storage Efficiency Than InnoDB

Feb 3, 2016
Siying Dong, Software Engineer
Database Engineering Team @ Facebook

Page 2: The Hive Think Tank: Rocking the Database World with RocksDB

Why Another MySQL Storage Engine?

Page 3: The Hive Think Tank: Rocking the Database World with RocksDB

Facebook Website Architecture

MySQL Databases

Web servers

Data center

Caches

Page 4: The Hive Think Tank: Rocking the Database World with RocksDB

Facebook Website Architecture

MySQL Databases

Web servers

Data center

Caches

SSDs

Page 5: The Hive Think Tank: Rocking the Database World with RocksDB

Facebook Website Architecture

▪ What do the system metrics look like?

MySQL Databases

Web servers

Data center

Caches

SSDs

Page 6: The Hive Think Tank: Rocking the Database World with RocksDB

Measurements of MySQL Hosts’ Actual Resource Usage
▪ Read IOPS: < 5%
▪ Write IOPS: < 5%
▪ Peak Write Bandwidth: < 15%
▪ CPU: < 20%
▪ Write Endurance: lasts more than 3 years
▪ Space is the bottleneck

Page 7: The Hive Think Tank: Rocking the Database World with RocksDB

RocksDB Storage Engine in MySQL (MyRocks)

▪ https://github.com/MySQLOnRocksDB/mysql-5.6

Page 8: The Hive Think Tank: Rocking the Database World with RocksDB

RocksDB vs. InnoDB

Page 9: The Hive Think Tank: Rocking the Database World with RocksDB

DB Size Comparison

[Bar chart: DB Size (Relative), InnoDB vs. RocksDB; axis from 0 to 1.2]

Page 10: The Hive Think Tank: Rocking the Database World with RocksDB

Write Amplification Comparison

[Bar chart: Bytes Written (Relative), InnoDB vs. RocksDB; axis from 0 to 1.2]

Page 11: The Hive Think Tank: Rocking the Database World with RocksDB

Why is RocksDB better?

Page 12: The Hive Think Tank: Rocking the Database World with RocksDB

Lower Space Amplification

InnoDB
▪ Uncompressed 16KB page (rows plus wasted space)
▪ Compressed to 5KB
▪ Using 8KB space on storage (4KB + 4KB); the remainder is waste

RocksDB
▪ Level 0
▪ Level 1: Target 1GB
▪ Level 2: Target 10 GB
▪ Level 3: Target 100 GB
▪ Level 4: Target 1000 GB
▪ Some entries in the levels are stale
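
The numbers on this slide can be turned into a rough back-of-the-envelope calculation. The sketch below is not from the talk: it assumes the "4KB 4KB" labels mean storage is allocated in 4KB units, and it models the RocksDB worst case by assuming the last level holds the whole live data set while every smaller level is full of updates to keys that already exist in the last level. The function names are hypothetical.

```python
import math


def stored_kb(compressed_kb, unit_kb):
    """Space occupied when storage is handed out in fixed-size units."""
    return math.ceil(compressed_kb / unit_kb) * unit_kb


def leveled_space_amp(level_targets_gb):
    """Total size of all levels divided by the size of the last level.

    Assumption: the last level is the live data set and all smaller
    levels only duplicate keys that are already stored there.
    """
    return sum(level_targets_gb) / level_targets_gb[-1]


# InnoDB side, using the slide's numbers: a page compressed to 5KB
# occupies two 4KB units, i.e. 8KB.
innodb_on_disk_kb = stored_kb(5, 4)
innodb_amp = innodb_on_disk_kb / 5

# RocksDB side, using the slide's level targets (1, 10, 100, 1000 GB).
rocksdb_amp = leveled_space_amp([1, 10, 100, 1000])

print(f"InnoDB: {innodb_on_disk_kb}KB stored for 5KB of data ({innodb_amp:.2f}x)")
print(f"RocksDB (worst case under the stated assumption): {rocksdb_amp:.3f}x")
```

Under these assumptions the compressed InnoDB page carries 60% overhead, while the leveled layout carries about 11%, because each level is ten times larger than the one above it.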

Page 13: The Hive Think Tank: Rocking the Database World with RocksDB

Lower Write Amplification (Worst Case)

InnoDB
▪ Updating one row: Read, Modify, Write the whole page
▪ Write Amp = Page size / row size

RocksDB
▪ Flush to Level 0: Write Amp 1
▪ Merge into Level 1 (Target 1GB): Write Amp 10
▪ Merge into Level 2 (Target 10 GB): Write Amp 10
▪ Merge into Level 3 (Target 100 GB): Write Amp 10
▪ Merge into Level 4 (Target 1000 GB): Write Amp 10
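
The worst-case figures on this slide can be added up as follows. This is an illustrative sketch, not the speaker's calculation: the 16KB page size is taken from the previous slide, the 100-byte row is a hypothetical value, and the function names are made up for the example.

```python
def innodb_worst_case_write_amp(page_bytes, row_bytes):
    """Slide formula: the whole page is rewritten to change one row."""
    return page_bytes / row_bytes


def lsm_worst_case_write_amp(flush_amp, merge_amp, merged_levels):
    """One flush to Level 0 plus one merge into each deeper level."""
    return flush_amp + merge_amp * merged_levels


# 16KB page (from the space amplification slide), hypothetical 100-byte row.
innodb_amp = innodb_worst_case_write_amp(16 * 1024, 100)

# Slide values: flush costs 1, each of the four merges costs 10.
rocksdb_amp = lsm_worst_case_write_amp(flush_amp=1, merge_amp=10, merged_levels=4)

print(f"InnoDB worst case:  {innodb_amp:.2f}")
print(f"RocksDB worst case: {rocksdb_amp}")
```

With these inputs the InnoDB worst case depends entirely on how small the row is relative to the page, whereas the RocksDB worst case is bounded by the number of levels and the size ratio between them.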

Page 14: The Hive Think Tank: Rocking the Database World with RocksDB

How about query performance?

Page 15: The Hive Think Tank: Rocking the Database World with RocksDB

Queries Per Second (LinkBench)

Page 16: The Hive Think Tank: Rocking the Database World with RocksDB

Reads Per Query (LinkBench)

Page 17: The Hive Think Tank: Rocking the Database World with RocksDB

New RocksDB Features to Support MySQL

Page 18: The Hive Think Tank: Rocking the Database World with RocksDB

Transactions
▪ Optimistic and Pessimistic
▪ MyRocks Uses Pessimistic
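
The difference between the two transaction styles can be illustrated with a toy in-memory store. This is a single-threaded sketch written for this transcript, not the RocksDB transaction API: all class and method names are hypothetical, and a real pessimistic implementation would typically wait or time out on a held lock rather than fail immediately.

```python
class ConflictError(Exception):
    """Raised when two transactions try to write the same key."""


class ToyStore:
    def __init__(self):
        self.data = {}     # key -> committed value
        self.version = {}  # key -> number of committed writes
        self.locks = {}    # key -> transaction holding the write lock


class PessimisticTxn:
    """Detects conflicts at write time by taking a per-key lock."""

    def __init__(self, store):
        self.store = store
        self.writes = {}

    def put(self, key, value):
        owner = self.store.locks.setdefault(key, self)
        if owner is not self:
            raise ConflictError(f"{key!r} is locked by another transaction")
        self.writes[key] = value

    def commit(self):
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1
            del self.store.locks[key]  # release the lock
        self.writes.clear()


class OptimisticTxn:
    """Takes no locks; validates at commit that nobody else wrote first."""

    def __init__(self, store):
        self.store = store
        self.writes = {}
        self.seen = {}  # key -> version observed at first write

    def put(self, key, value):
        self.seen.setdefault(key, self.store.version.get(key, 0))
        self.writes[key] = value

    def commit(self):
        for key, seen_version in self.seen.items():
            if self.store.version.get(key, 0) != seen_version:
                raise ConflictError(f"{key!r} changed since it was written")
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1
        self.writes.clear()
```

In the pessimistic style the second writer learns about the conflict as soon as it touches the key; in the optimistic style both writers proceed and the loser only finds out at commit.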

Page 19: The Hive Think Tank: Rocking the Database World with RocksDB

“Single Delete”
▪ Secondary index is insert/delete only, never update
▪ Updating a column in secondary key generates one delete + one insert
▪ We can drop “single delete” tombstone as soon as it meets a value
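
The tombstone rule above can be sketched with a toy compaction over the history of a single key. This is a simplified model written for this transcript, not RocksDB's compaction code: it ignores snapshots, and the operation names and the `compact_key` function are hypothetical.

```python
def compact_key(entries, bottommost=False):
    """Compact the operations recorded for one key.

    entries: list of tuples, newest first, each one of
             ("put", value), ("delete",) or ("single_delete",).
    bottommost: True when no older data for this key can exist below.
    Returns the entries that survive the compaction.
    """
    survivors = []
    i = 0
    while i < len(entries):
        op = entries[i]
        if op[0] == "single_delete":
            older = entries[i + 1] if i + 1 < len(entries) else None
            if older is not None and older[0] == "put":
                # The tombstone met its value: both can be dropped now,
                # because the key was written only once.
                i += 2
                continue
            if not bottommost:
                survivors.append(op)  # value may still live in a lower level
            i += 1
        elif op[0] == "delete":
            # A regular tombstone hides everything older in this input, but
            # must itself be kept until the bottommost level is reached.
            if not bottommost:
                survivors.append(op)
            break
        else:
            # Newest put wins; older versions in this input are shadowed.
            survivors.append(op)
            break
    return survivors
```

The contrast with a regular delete is the point: `compact_key([("delete",), ("put", "a")])` keeps the tombstone, while the same input with `("single_delete",)` leaves nothing behind.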

Page 20: The Hive Think Tank: Rocking the Database World with RocksDB

Delete Files in Range
▪ Drop Table/Index needs to reclaim space fast

Page 21: The Hive Think Tank: Rocking the Database World with RocksDB

Delete Files in Range
▪ Drop Table/Index needs to reclaim space fast

L0

L1

L2

L3

Table A Table B Table C

Page 22: The Hive Think Tank: Rocking the Database World with RocksDB

Delete Files in Range

L0

L1

L2

L3

Table A Table B Table C

▪ Drop Table/Index needs to reclaim space fast

Page 23: The Hive Think Tank: Rocking the Database World with RocksDB

Delete Files in Range

L0

L1

L2

L3

Table A Table B Table C

▪ Drop Table/Index needs to reclaim space fast

Page 24: The Hive Think Tank: Rocking the Database World with RocksDB

Delete Files in Range

L0

L1

L2

L3

Table A Table C

▪ Drop Table/Index needs to reclaim space fast
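
The idea in this slide sequence, where the files belonging to Table B disappear from every level at once, can be sketched as a filter over file key ranges. This is a toy model, not the RocksDB API: the `SstFile` record, the key layout (one-letter table prefix) and the file sizes are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SstFile:
    level: int
    smallest: bytes  # smallest key stored in the file
    largest: bytes   # largest key stored in the file
    size_mb: int


def delete_files_in_range(files, begin, end):
    """Drop every file whose keys all fall inside [begin, end].

    Files that only partly overlap the range are kept; their keys would
    still have to be removed by ordinary deletes and compaction.
    Returns (kept_files, reclaimed_mb).
    """
    kept, reclaimed_mb = [], 0
    for f in files:
        if begin <= f.smallest and f.largest <= end:
            reclaimed_mb += f.size_mb
        else:
            kept.append(f)
    return kept, reclaimed_mb


# Hypothetical layout: keys are prefixed with the table name A, B or C.
files = [
    SstFile(1, b"A0001", b"A9999", 64),
    SstFile(1, b"B0001", b"B9999", 64),
    SstFile(2, b"A5000", b"B2000", 64),   # straddles tables A and B
    SstFile(3, b"B0001", b"B9999", 256),
    SstFile(3, b"C0001", b"C9999", 256),
]

# Dropping Table B: remove whole files inside the B key range.
kept, reclaimed_mb = delete_files_in_range(files, b"B", b"B\xff")
print(f"kept {len(kept)} files, reclaimed {reclaimed_mb} MB")
```

No data is rewritten in this step, which is why the space comes back quickly; only the straddling file still holds leftover Table B keys.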

Page 25: The Hive Think Tank: Rocking the Database World with RocksDB

What Did We Work Around?

Page 26: The Hive Think Tank: Rocking the Database World with RocksDB

Reverse Column order
▪ RocksDB’s Prev() is much slower than Next()
▪ Reverse column order to serve common query better
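
One way to picture a reversed column order is to encode the column so that ascending byte order corresponds to descending values, which lets a descending query use forward iteration. The sketch below shows that idea for an unsigned 32-bit column by inverting the bytes; it is an illustration only, and the encoding MyRocks actually uses may differ. The function names are hypothetical.

```python
import struct


def encode_u32(value, reverse=False):
    """Big-endian encoding sorts bytewise in ascending value order.

    With reverse=True every byte is inverted, so bytewise ascending
    order corresponds to descending value order.
    """
    raw = struct.pack(">I", value)
    if reverse:
        raw = bytes(0xFF - b for b in raw)
    return raw


def decode_u32(raw, reverse=False):
    if reverse:
        raw = bytes(0xFF - b for b in raw)
    return struct.unpack(">I", raw)[0]


values = [5, 1, 300, 42]

# A forward scan over reverse-encoded keys visits the largest value first.
forward_scan = sorted(encode_u32(v, reverse=True) for v in values)
descending = [decode_u32(k, reverse=True) for k in forward_scan]
print(descending)
```

The cost of this choice is that the opposite order now needs the slower backward iteration, so it only pays off when one direction dominates the workload.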

Page 27: The Hive Think Tank: Rocking the Database World with RocksDB

Optimizer Stats
▪ Query plans need index statistics
▪ MyRocks stores index statistics in the data dictionary (similar to InnoDB)
▪ When creating data files (SST files), statistics are also added to the SST files, and the data dictionary is also updated

Page 28: The Hive Think Tank: Rocking the Database World with RocksDB

Limitations?

Page 29: The Hive Think Tank: Rocking the Database World with RocksDB

MyRocks Limitation

Limitation | Plan to Address

Page 30: The Hive Think Tank: Rocking the Database World with RocksDB

MyRocks Limitation

Limitation | Plan to Address
Online DDL, Foreign Key, Spatial Index, and Fulltext Index are not yet supported | Support them

Page 31: The Hive Think Tank: Rocking the Database World with RocksDB

MyRocks Limitation

Limitation | Plan to Address
Online DDL, Foreign Key, Spatial Index, and Fulltext Index are not yet supported | Support them
No next key locking support

Page 32: The Hive Think Tank: Rocking the Database World with RocksDB

MyRocks Limitation

Limitation | Plan to Address
Online DDL, Foreign Key, Spatial Index, and Fulltext Index are not yet supported | Support them
No next key locking support
Replication is supported only with row-based binary logging (durability with XA not supported) | RocksDB to support Two-Phase-Commit. MyRocks to use it to support replication using statement-based binary logging.

Page 33: The Hive Think Tank: Rocking the Database World with RocksDB

MyRocks Limitation

Limitation | Plan to Address
Online DDL, Foreign Key, Spatial Index, and Fulltext Index are not yet supported | Support them
No next key locking support
Replication is supported only with row-based binary logging (durability with XA not supported) | RocksDB to support Two-Phase-Commit. MyRocks to use it to support replication using statement-based binary logging.
Either ORDER BY DESC or ASC is slower | Improve RocksDB Prev() performance to narrow the performance gap between ORDER BY DESC and ASC.

Page 34: The Hive Think Tank: Rocking the Database World with RocksDB

Conclusion

Page 35: The Hive Think Tank: Rocking the Database World with RocksDB

Conclusion
▪ MyRocks vs. InnoDB
  ▪ Better Space and Write Amplification
  ▪ Comparable performance
▪ Features in RocksDB to Support MyRocks:
  ▪ Transactions
  ▪ “Single Delete”
  ▪ Delete Files in Range
▪ MyRocks has some limitations and they are being addressed.

Page 36: The Hive Think Tank: Rocking the Database World with RocksDB

(c) 2009 Facebook, Inc. or its licensors.  "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0