oscon2007: landscape of trx engines

Landscape of Open Source Transactional Storage Engines Peter Zaitsev Vadim Tkachenko http://MySQLPerformanceBlog.com

Upload: oleksiy-kovyrin

Post on 31-May-2018


8/14/2019 OSCON2007: Landscape of trx engines

http://slidepdf.com/reader/full/oscon2007-landscape-of-trx-engines 1/48

Landscape of Open Source Transactional Storage Engines

Peter Zaitsev

Vadim Tkachenko

http://MySQLPerformanceBlog.com

About us

- Founders of Percona Ltd
- MySQL Performance Focused Consulting
- Authors of http://www.MySQLPerformanceBlog.com
- Worked for MySQL AB for years
- Peter: lead of the “High Performance Group”; Vadim: his right hand
- Long-time MySQL users on a number of projects we were personally involved in

MySQL pluggable architecture

MySQL Transactional Engines

- BDB - Legacy storage engine, removed in 5.1, not tested
- InnoDB - “Most popular” (the only commonly used) storage engine, by Innobase Oy
- SolidDB - Storage engine from Solid Information Technology
- PBXT - Storage engine by SNAP Innovation (Paul McCullagh)
- Falcon - New storage engine by MySQL AB; project led by Jim Starkey
- NDB - MySQL Cluster is a whole other beast and is not covered

InnoDB

- http://www.innodb.com/
- Mature storage engine; development was started by Heikki Tuuri over 10 years ago
- Heikki was looking for ways to improve the performance of traditional databases
- Acquired by Oracle at the end of 2005
- The only transactional storage engine available in the official MySQL 5.0 release

solidDB

- http://www.solidtech.com/solidDBforMySQL/
- Open-sourced in 2006
- Existing storage engine technology “integrated” with MySQL
- Focused on reliability and multiprocessor scalability
- Currently shipped as production ready

PrimeBase XT (PBXT)

- http://www.primebase.com/xt/
- Written mainly by Paul McCullagh since 2005
- Not a port of an existing storage engine to MySQL but a new implementation
- Uses a number of unusual design decisions
- Only 50% transactional
- Focused on efficient BLOB storage
- http://www.blobstreaming.org/

Falcon

- http://dev.mysql.com/doc/falcon/en/index.html
- Based on the “Netfrastructure” engine by Jim Starkey
- Purchased by MySQL AB in early 2006
- “Lightweight Design”
- Focused on the transactional needs of Web applications and efficient use of large amounts of memory

Design and Behavior

InnoDB design

- MVCC and very efficient row-level locks
- Clustering by primary key, writes to the same pages
- Non-compressed secondary indexes with transaction info
- Single tablespace or tablespace per table
- Pessimistic locking
- Instant deadlock detection
- Fuzzy checkpointing
- “DoubleWrite” for partial page write protection

InnoDB

- DEADLOCK detection
  Session 1: BEGIN;
  Session 2: BEGIN;
  Session 1: UPDATE test SET name='random1-1' WHERE id=1;
  Session 2: UPDATE test SET name='random2-1' WHERE id=2;
  Session 1: UPDATE test SET name='random1-2' WHERE id=2;
  Session 2: UPDATE test SET name='random2-2' WHERE id=1;
- InnoDB detects the deadlock (error 1213) instantly in the second session
- Pessimistic locking: when the same row is updated in two concurrent transactions, the second transaction waits for COMMIT/ROLLBACK in the first
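The deadlock scenario on this slide can be illustrated with a toy wait-for graph. This is only a sketch of the general idea of checking for a cycle at the moment a session starts waiting on a row lock; it is not InnoDB's actual implementation, and the names `find_deadlock_cycle`, `lock_owner` and `waits_for` are made up for the illustration.

```python
def find_deadlock_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle if we come back to it."""
    path = [start]
    current = start
    while current in waits_for:
        current = waits_for[current]
        if current == start:
            return path
        if current in path:
            return None  # a cycle exists, but it does not involve `start`
        path.append(current)
    return None


# Row locks held after the first two UPDATEs: session 1 owns id=1, session 2 owns id=2.
lock_owner = {1: "session1", 2: "session2"}
waits_for = {}

# Session 1 requests row id=2, held by session 2: it waits, no cycle yet.
waits_for["session1"] = lock_owner[2]
no_cycle = find_deadlock_cycle(waits_for, "session1")

# Session 2 requests row id=1, held by session 1: the cycle closes.
waits_for["session2"] = lock_owner[1]
cycle = find_deadlock_cycle(waits_for, "session2")

print(no_cycle)
print(cycle)
```

Because the check runs when the second session begins to wait, the cycle is found at that moment, which matches the "instantly in the second session" behavior described above.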

InnoDB Strengths

- Powerful MVCC
- Good performance on a wide range of workloads
- Great stability
- Great data protection
- Primary key clustering allows a lot of optimizations
- Transaction info in secondary indexes allows fast index-only scans
- Adaptive hash indexes and other advanced techniques

InnoDB Weaknesses

- Slow development pace in recent years
- Still has scalability issues with multiple CPUs
- Unscalable auto-increment and broken group commit are taking very long to fix
- Large footprint, especially for secondary indexes
  - It turns out to be not so large in our comparison
- Still messy integration with MySQL
  - How do you see how much space is free in the InnoDB tablespace?

SolidDB Design

- MVCC and row-level locking
- Clustering by primary key
- New data stored in new pages
- “Bonsai Tree” used for multi-versioning
- OPTIMISTIC and PESSIMISTIC locking, specified at table level
- Online backup (not usable for slave creation)
- Highly available synchronous replication promised soon

solidDB - PESSIMISTIC

- DEADLOCK - detected in the first session after 20 sec of waiting
- Timeout-based deadlock detection
- UPDATE two rows: second session waits on the first

solidDB - OPTIMISTIC

- DEADLOCK - detected in the second session immediately, but with error 1205 (Lock wait timeout exceeded)
- UPDATE two concurrent rows:
  SESSION 1: BEGIN;
  SESSION 2: BEGIN;
  SESSION 1: UPDATE test SET name = 'rnd' WHERE id=2;
  SESSION 2: UPDATE test SET name = 'rnd' WHERE id=2;
- In Session 2 we got:
  ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
- This is OK for OPTIMISTIC engines, but may cause trouble in Web applications.
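One common way for an application to cope with an error that asks it to "try restarting transaction" is to re-run the whole transaction a bounded number of times. The sketch below only shows that pattern; `LockWaitTimeout`, `run_with_retry` and `update_row` are hypothetical names and do not belong to any MySQL client library.

```python
class LockWaitTimeout(Exception):
    """Hypothetical stand-in for MySQL error 1205 as surfaced by a client library."""


def run_with_retry(transaction, max_attempts=3):
    """Run `transaction`; re-run it from the start if it fails with LockWaitTimeout."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction(), attempt
        except LockWaitTimeout:
            if attempt == max_attempts:
                raise  # give up and let the caller handle the error


# Simulated transaction that hits the conflict twice and then succeeds.
failures_left = [2]


def update_row():
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise LockWaitTimeout("Lock wait timeout exceeded; try restarting transaction")
    return "committed"


result, attempts = run_with_retry(update_row)
print(result, attempts)
```

Web applications that do not wrap their transactions this way would show the error to the user instead, which is the kind of trouble the slide hints at.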

SolidDB Strengths and Weaknesses

- Production usage is too limited to really tell
- Of the storage engines reviewed, the most similar in design to InnoDB
- Choice of optimistic vs pessimistic locking is nice for some applications
- No instant deadlock detection
- So far available as a special download only (not even a plugin)

PBXT Design

- MVCC with row-level locking
- “Per Database” transactions
- No real durability yet, weak crash recovery
- OPTIMISTIC locking
- Write once, write sequentially to log
- Never update in place
- Data cache + key cache
- Efficient BLOB handling

PBXT

- DEADLOCK detected in second session, error 1213
- UPDATE two concurrent rows: optimistic, second session gets:
  ERROR 1020 (HY000): Record has changed since last read in table 'test2'
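The "Record has changed since last read" error is typical of optimistic concurrency control: a writer does not wait for a lock but is rejected if the row was modified after it was read. A minimal sketch of that idea follows, using a per-row version counter. This is an assumed illustration and not PBXT's actual mechanism; `Table` and `RecordChanged` are made-up names.

```python
class RecordChanged(Exception):
    """Raised when a row was modified after the transaction read it."""


class Table:
    def __init__(self, rows):
        # row id -> (version, value); every successful update bumps the version
        self.rows = {rid: (0, value) for rid, value in rows.items()}

    def read(self, rid):
        return self.rows[rid]

    def update(self, rid, seen_version, new_value):
        current_version, _ = self.rows[rid]
        if current_version != seen_version:
            raise RecordChanged(f"row {rid} has changed since last read")
        self.rows[rid] = (current_version + 1, new_value)


table = Table({2: "initial"})

# Both sessions read row id=2 before either one writes.
version_seen_by_1, _ = table.read(2)
version_seen_by_2, _ = table.read(2)

table.update(2, version_seen_by_1, "rnd-1")  # first writer succeeds

try:
    table.update(2, version_seen_by_2, "rnd-2")  # second writer is rejected
    outcome = "updated"
except RecordChanged:
    outcome = "conflict"

print(outcome)
```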

PBXT Strengths and Weaknesses

- Not yet commonly used in production (we tried but hit too many bugs)
- Very good performance for some workloads
- Efficient storage, close to MyISAM
- Focused on efficient BLOB handling, with extra features like BLOB streaming
- Still mainly a one-man project
- Large to-do list; a lot needs to be done, including recovery
- Potentially large purging overhead

Falcon Design

- MVCC, row-level locking (in practice, not in theory)
- PESSIMISTIC locking
- Not clustered by primary key
- Row cache (caches only the rows you need)
- “Optimal” index traversal
- “Data compression” - NULLs, empty strings
- Always needs to read row data (because of index structure)

Falcon

- DEADLOCK: in Session 2:
  ERROR 1020 (HY000): Record has changed since last read in table 'test2'
- Ann Harrison says Falcon checks for cycles in the lock graph periodically rather than instantly on row lock wait
- UPDATE: second session waits

Falcon Strengths and Weaknesses

- Still alpha with many bugs; too early to judge
- Very active support from MySQL AB
- Fast development pace: bugs are being fixed quickly, major performance improvements during the last 3 months
- Good integration with MySQL, e.g. tables for performance data
- No primary key clustering or covering index support
- Different design decisions can complicate migration from InnoDB (though logical behavior has become closer)

There are lies, big lies, and there are Benchmarks

Benchmarks – things to note

- Benchmarks may not be relevant to the performance of your application
- We tried early versions of Falcon and PBXT; their performance properties may change before production
- There is not much experience out there in tuning Falcon, PBXT and SolidDB with MySQL, as they are barely used in production
- We ran fewer benchmarks than we wanted: we spent a lot of time fighting and reporting bugs and checking fixes

Benchmarks

- Read-only queries on a typical table for a Web application
- DBT2 – TPC-C emulation
- Dell DVD Store – emulation of an e-commerce site
- Sysbench – OLTP transactions
- Sqlbench - small data set, single user, typical query patterns

Box

- Dell PowerEdge 2950
- CentOS release 4.5
- 4 CPU
  model name : Intel(R) Xeon(R) CPU 5148 @ 2.33GHz
  stepping : 6
  cpu MHz : 2327.529
  cache size : 4096 KB
- 16 GB of RAM
- RAID 10 (6 10K RPM 3.5” SAS hard drives)

MySQL Versions

- Yes, this means the MySQL version, not only the storage engine, affects performance, but we could not get all storage engines working with the same MySQL version.
- InnoDB and PBXT: 5.1.19
- Falcon: 6.0.1-alpha, bk tree from 10-Jul
- SolidDB: 5.0.41-0073

Engine parameters

- 12 GB of RAM for buffers
- InnoDB
  --innodb_buffer_pool_size=12G
  --innodb_flush_method=O_DIRECT
  --innodb-log-file-size=100M
- SolidDB
  --soliddb-cache-size=12G
- Falcon
  --falcon_min_record_memory=2G
  --falcon_max_record_memory=4G
  --falcon_page_cache_size=8G
- PBXT
  pbxt_index_cache_size=8G
  pbxt_record_cache_size=4G

DBT2 Configuration Details

- DBT2
- http://osdldbt.sourceforge.net/
- 10 concurrent users (about 2 for each CPU core and disk)
- “Zero delay” to fully load the MySQL server
- In the 400W configuration, available memory was reduced to 4 GB by locking 12 GB of memory, to make the test I/O bound.
- Buffer sizes were reduced to 2 GB

DBT2 – 10 warehouses

- 10 warehouses, 10 clients (data size ~700 MB)
- Result in New Order Transactions Per Minute (NOTPM), more is better
- PBXT crashed
- Old version of Falcon had ~1100 NOTPM
- Great improvement!

[Bar chart: NOTPM, axis 0 to 18000. Bar labels: 17744, 6097, 8209. Legend: InnoDB, SolidDB, Falcon, PBXT.]

DBT2 – 400 warehouses

- Data size ~29 GB
- SolidDB crashed after 336 mins
- Did not disable logs on SolidDB, to keep things comparable.

[Bar chart: load time in minutes, axis 0 to 140. Bar labels: 63, 40, 136. Legend: InnoDB, PBXT, Falcon.]

DBT2, 400W, Data size

[Bar chart: size of loaded data in MB, axis 0 to 45000. Bar labels: 38266, 42191, 41770, 30726. Legend: InnoDB, SolidDB, PBXT, Falcon.]

- Surprisingly large size for PBXT
- SolidDB – tables were loaded into MyISAM and then converted to SolidDB
  - It was crashing otherwise

DBT2, 400W, Results

[Bar chart: NOTPM, axis 0 to 1200. Bar labels: 1105, 495, 178. Legend: InnoDB, SolidDB, Falcon.]

- PBXT crashed
- Result in New Order Transactions Per Minute, more is better

Dell DVD Store

- Data size: Medium, 1 GB
  2,000,000 customers
  100,000 products
- Falcon – crashed
- PBXT – a lot of errors
- Result in new orders per minute, more is better

[Bar chart: orders per minute, axis 0 to 18000. Bar labels: 17589, 7594. Legend: InnoDB, SolidDB.]

sysbench

- Older Falcon used in this test. The new one crashes :(
- A couple of READ-ONLY queries against a typical table for Web applications (user account info):
  CREATE TABLE IF NOT EXISTS sbtest (
    id int(10) unsigned NOT NULL auto_increment,
    name varchar(64) NOT NULL default '',
    email varchar(64) NOT NULL default '',
    password varchar(64) NOT NULL default '',
    dob date default NULL,
    address varchar(128) NOT NULL default '',
    city varchar(64) NOT NULL default '',
    state_id tinyint(3) unsigned NOT NULL default '0',
    zip varchar(8) NOT NULL default '',
    country_id smallint(5) unsigned NOT NULL default '0',
    PRIMARY KEY (id),
    KEY `country_id` (country_id,state_id,city)
  )

sysbench, read by primary key

[Line chart: queries/sec (axis 0 to 65000) vs. clients (1, 4, 16, 64, 128, 256). Series: InnoDB, Falcon, SolidDB, PBXT.]

• SELECT name FROM sbtest WHERE id=?
• InnoDB and SolidDB have a sweet spot, being clustered by PK

sysbench, read by index

[Line chart: queries/sec (axis 0 to 200) vs. clients (1, 4, 16, 64, 128, 256). Series: InnoDB, Falcon, SolidDB, PBXT.]

● SELECT name FROM sbtest WHERE country_id=?
● PBXT excels
● Falcon comes next

sysbench, read by covered index

[Line chart: queries/sec (axis 0 to 250) vs. clients (1, 4, 16, 64, 128, 256). Series: InnoDB, Falcon, SolidDB, PBXT.]

● SELECT state_id FROM sbtest WHERE country_id=?
● PBXT is still best
● Falcon can't use a covered index

sysbench, read by index, LIMIT 20

[Line chart: queries/sec (axis 0 to 50000) vs. clients (1, 4, 16, 64, 128, 256). Series: InnoDB, Falcon, SolidDB, PBXT.]

● SELECT name FROM sbtest WHERE country_id=? LIMIT 20
● Falcon does not optimize LIMIT
● InnoDB scales poorly

Sysbench OLTP

- Data size: 100,000,000 rows, ~25 GB
- Uniform distribution
- I/O-bound load
- Read/write transactions
- Reduced available memory by locking 12 GB out of 16 GB

Sysbench OLTP, time to load data

- Using multi-value INSERTs rather than LOAD DATA INFILE
- SolidDB and Falcon are even slower than InnoDB, which is known to be slow compared to MyISAM for data load.

[Bar chart: load time in seconds, axis 0 to 3500. Bar labels: 1930, 3364, 1237, 2880. Legend: InnoDB, SolidDB, PBXT, Falcon.]

Sysbench OLTP, Datasize

[Bar chart "Datasize, varchar vs char": size in GB, axis 0 to 27.5. Categories: InnoDB, SolidDB, PBXT, Falcon. Bar labels, char: 22.51, 26.44, 23.03, 8.71; varchar: 19.6, 14.8, 23, 8.71.]

- Comparison of storage size for char and varchar columns in the table
- Falcon uses dynamic-length rows anyway
- PBXT surprisingly has the same huge size in both cases
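To make the char vs varchar comparison concrete, the relative space saved by varchar can be computed from the bar labels on the chart. This is a small sketch; the assignment of labels to engines is an assumption based on the order in which they appear (InnoDB, SolidDB, PBXT, Falcon; char series first, then varchar).

```python
# (char GB, varchar GB) per engine, as read from the chart labels.
# Assumption: labels are listed in the same order as the engine names on the axis.
sizes_gb = {
    "InnoDB": (22.51, 19.6),
    "SolidDB": (26.44, 14.8),
    "PBXT": (23.03, 23.0),
    "Falcon": (8.71, 8.71),
}

# Percentage of space saved when the columns are varchar instead of char.
savings = {
    engine: (char_gb - varchar_gb) / char_gb * 100
    for engine, (char_gb, varchar_gb) in sizes_gb.items()
}

for engine, percent in savings.items():
    print(f"{engine:8s} {percent:5.1f}% smaller with varchar")
```

Under that assumption SolidDB gains the most from varchar, while PBXT and Falcon show essentially no difference, which agrees with the two remarks on the slide.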

Sysbench OLTP, results

[Bar chart "I/O bound": transactions/sec (axis 0 to 50) at 1, 4 and 64 clients. Bar labels: 12.77, 30.14, 46.24; 10.62, 22.33, 26.11; 3.87, 10.3, 19.06; 4.86, 5.8, 5.71. Legend: InnoDB, SolidDB, PBXT, Falcon.]

- Memory limited to 4 GB, 2 GB for buffers
- InnoDB and SolidDB benefit from clustering by primary key
- All but Falcon scale well for an I/O-bound workload with this number of hard drives.
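The scaling claim can be checked by dividing throughput at 64 clients by throughput at 1 client. The sketch below uses the bar labels from the I/O-bound chart; the mapping of labels to engines is an assumption (legend order InnoDB, SolidDB, PBXT, Falcon, three values each for 1, 4 and 64 clients).

```python
# Transactions/sec at 1, 4 and 64 clients, as read from the chart labels.
# Assumption: each group of three labels belongs to one engine, in legend order.
tps = {
    "InnoDB": (12.77, 30.14, 46.24),
    "SolidDB": (10.62, 22.33, 26.11),
    "PBXT": (3.87, 10.3, 19.06),
    "Falcon": (4.86, 5.8, 5.71),
}

# Speedup from 1 client to 64 clients.
scaling = {engine: values[-1] / values[0] for engine, values in tps.items()}

for engine, factor in sorted(scaling.items(), key=lambda item: item[1], reverse=True):
    print(f"{engine:8s} {factor:.2f}x")
```

With this mapping Falcon is the only engine whose throughput stays nearly flat as clients are added, consistent with the last bullet.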

Selected sqlbench results

- Single operation repeated N times, total time in secs, less is better

Operation                            |       1 |       2 |       3 |
                                     | innodb_ | pbxt_fa | soliddb |
alter_table_add (100)                |    8.00 |    3.00 |   32.00 |
count (100)                          |   12.00 |    8.00 |   28.00 |
count_distinct (1000)                |    6.00 |    8.00 |   74.00 |
count_distinct_2 (1000)              |   11.00 |   11.00 |   16.00 |
count_group_on_key_parts (1000)      |    7.00 |   10.00 |   83.00 |
count_on_key (50100)                 |   70.00 |   94.00 |  210.00 |
delete_all_many_keys (1)             |   17.00 |    2.00 |   28.00 |
insert (350768)                      |    6.00 |    5.00 |   21.00 |
outer_join (10)                      |   14.00 |    7.00 |   61.00 |
select_key2_return_prim (200000)     |   30.00 |   29.00 |   25.00 |
select_many_fields (2000)            |    8.00 |    6.00 |    5.00 |
update_big (10)                      |   18.00 |   56.00 |  727.00 |
update_of_key_big (501)              |   19.00 |    6.00 |  165.00 |
update_of_primary_key_many_keys (256 |   44.00 |   17.00 |   55.00 |
update_with_key_prefix (100000)      |   19.00 |    8.00 |   10.00 |

Conclusion

- All reviewed storage engines except InnoDB are currently too unstable for production use. SolidDB comes closest.
- InnoDB is still the winner in the majority of tests
- Falcon has severe issues with LIMIT optimization and I/O-bound scalability
- PBXT and Falcon win in certain tests
- SolidDB is currently an outsider in terms of performance
- Need to revisit when production versions of all storage engines are ready.

The End

- Thanks for coming!
- Slides will be published at http://www.mysqlperformanceblog.com/
- Feel free to approach us with your questions
- MySQL performance optimization consulting available
- http://www.mysqlperformanceblog.com/mysql-consulting/

Sysbench OLTP, results, char

- Data size comparable with memory size

[Bar chart "CPU bound": transactions/sec (axis 0 to 37.5) at 1, 4 and 64 clients. Bar labels: 18.75, 36.71, 29.36; 13.81, 25.11, 34.77; 8.87, 17.51, 29.1; 15.15, 20.4, 17.27. Legend: InnoDB, SolidDB, PBXT, Falcon.]