®
NoVA MySQL October Meetup
TokuDB® and Fractal Tree® Indexes
Tim Callaghan, VP/Engineering, Tokutek
2012.10.23
Tuesday, October 23, 12
About me, :)
“Mark Callaghan’s lesser-known but nonetheless smart brother.”
[C. Monash, May 2010]
http://www.dbms2.com/2010/05/25/voltdb-finally-launches
About me, seriously.
• Internal development (Financial Services, Oracle), 89-99
• SaaS development (Hospitality Software, Oracle), 99-09
• Field engineering (VoltDB), 09-11
• VP Engineering (Tokutek), 11-now
• I’ve always been most interested in databases but consider myself an “IT Toolbox”
  • development, administration, management, infrastructure, testing, benchmarking, product management, support, whatever
What is TokuDB?
• MySQL storage engine
• ACID/MVCC (like InnoDB)
• MySQL 5.1/5.5/[5.6], MariaDB 5.2/5.5
• Big data (> RAM) performance advantages
• Uniquely provides schema agility and compression for MySQL
Fractal Tree Indexes
B-trees
Internal Nodes - Path to data
Leaf Nodes - Actual Data
Pointers
Pivots
B-tree example
[Figure: B-tree with root pivot 22, second-level pivots 10 and 99, and leaf nodes (2, 3, 4), (22, 25), (99)]
* Pivot Rule is >=
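The pivot rule can be sketched as a point lookup. This is a minimal illustration, not any engine's actual code: the `Node` class and `search` function are hypothetical names, and the leaf holding 10 and 20 is made up to fill the part of the tree the slide does not show.

```python
from bisect import bisect_right


class Node:
    """Internal nodes hold pivots and children; leaf nodes hold keys."""

    def __init__(self, pivots=None, children=None, keys=None):
        self.pivots = pivots or []      # internal node: sorted pivot keys
        self.children = children or []  # internal node: len(pivots) + 1 children
        self.keys = keys or []          # leaf node: sorted keys

    @property
    def is_leaf(self):
        return not self.children


def search(node, key):
    """Return True if key is stored in the tree rooted at node."""
    while not node.is_leaf:
        # Pivot rule is >=: a key equal to a pivot goes to the child on the right.
        # bisect_right counts the pivots that are <= key, which is that child's index.
        node = node.children[bisect_right(node.pivots, key)]
    return key in node.keys


# Tree shaped like the slide's example; the leaf [10, 20] is an assumption.
root = Node(pivots=[22], children=[
    Node(pivots=[10], children=[Node(keys=[2, 3, 4]), Node(keys=[10, 20])]),
    Node(pivots=[99], children=[Node(keys=[22, 25]), Node(keys=[99])]),
])
```

Searching for 22 follows the root's right pointer because 22 >= 22, then the left pointer under pivot 99 because 22 < 99.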
InnoDB and B-trees
– InnoDB uses B-trees for the primary key index and secondary indexes
– The primary key index is sorted on the primary key; the “value” in the index is the rest of the row (up to a certain number of bytes)
– Secondary indexes are sorted on the key; the “value” in the index is the primary key value for the particular row
Fractal Tree Index Cheat Sheet
Similar to InnoDB
– store data in leaf nodes
– use PK for ordering
Different than InnoDB
– message buffer in all internal nodes
– much larger nodes (4MB vs. 16KB)
[Figure: tree diagram; all internal nodes have message buffers]
TokuDB Features
Performance
Compression
Agility
Management
TokuDB Performance
Please select two of the following:
(1) Insertion Performance
(2) Query Performance
Performance: Replication
• TokuDB reduces slave lag
– Single-threaded performance
– IO utilization (compressed reads and aggregated writes)
Performance: Indexed Insertion
Performance: Sysbench Benchmark
[Chart: Sysbench, 16 tables, 50 million rows/table. Transactions per second (0 to 250) vs. client connections (1 to 1024), InnoDB vs. TokuDB]
Performance: Fractal Tree Indexes
• TokuDB uses Tokutek’s Fractal Tree Indexes
– Internal nodes are similar to B-trees (keys and pointers) but also contain message buffers
  - IO is amortized across many operations
– Large block size (4MB) enables high compression and excellent range query performance
– Basement nodes support point queries
– Optimal IO utilization
  - Reads = highly compressed data
  - Writes = aggregation of many operations plus high compression
  - SSD friendly = far fewer and much larger writes
Performance: Multiple clustered keys
• MySQL’s hidden join when using secondary keys
• InnoDB
– Row data is stored by primary key (clustered)
– Secondary keys store primary key
– Lookups by secondary key actually require a “join” to retrieve the rest of the row
• TokuDB allows secondary keys to be clustered
– All columns of the table are immediately available via secondary keys
– Compression saves space, indexes are small
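The difference between the two kinds of secondary key can be modeled with plain dictionaries. This is a conceptual sketch only; the sample rows, the `city` column, and the function names are invented for illustration.

```python
# Sample rows (made up for this sketch).
rows = [
    {"id": 1, "email": "a@example.com", "city": "Reston"},
    {"id": 2, "email": "b@example.com", "city": "Vienna"},
    {"id": 3, "email": "c@example.com", "city": "Reston"},
]

primary = {row["id"]: row for row in rows}  # primary key -> full row
secondary = {}                              # city -> primary keys only
clustering = {}                             # city -> full copies of the rows
for row in rows:
    secondary.setdefault(row["city"], []).append(row["id"])
    clustering.setdefault(row["city"], []).append(dict(row))


def by_city_secondary(city):
    # Step 1 reads the secondary index; step 2 is the hidden "join":
    # one extra primary key lookup per matching row.
    return [primary[pk] for pk in secondary.get(city, [])]


def by_city_clustering(city):
    # The clustering index already holds every column, so one lookup is enough.
    return clustering.get(city, [])
```

Both functions return the same rows; the clustering version trades extra storage (a second copy of each row) for skipping the primary key lookups.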
Performance: Bulk loader
• Traditional MySQL loader is single threaded and loads data via MySQL.
• TokuDB loader creates Fractal Tree Indexes directly and uses all available cores
• Index creation uses same technology
Performance: Data loader
– Load time for 5000 warehouse TPC-C data
TokuDB Compression
Performant storage is expensive, IOs aren’t free, and replication multiplies these costs.
Compression
• TokuDB has a larger block size than InnoDB which often leads to higher compression ratios
• 5x to 25x can be achieved
• TokuDB compression is always enabled
– Definable at the table level
– Choose quicklz (fast) or lzma (small)
– Can be modified without downtime
• InnoDB requires a fixed on-disk block size which reduces the effectiveness of compression and causes split/recompress operations that affect performance
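The effect of block size on compression ratio can be tried with Python's standard `lzma` module. This is a rough experiment, not a model of either storage engine: the log-like sample data is made up, `preset=1` is chosen only to keep the run fast, and the helper name `compressed_size` is hypothetical.

```python
import lzma


def compressed_size(data, block_size):
    """Total bytes after compressing data in independent blocks of block_size."""
    total = 0
    for start in range(0, len(data), block_size):
        total += len(lzma.compress(data[start:start + block_size], preset=1))
    return total


# Deterministic, log-like sample data (invented for this sketch), cut to 4 MiB.
lines = [
    f"2012-10-23 12:{i % 60:02d}:00 host{i % 17} GET /page/{i % 101} 200\n"
    for i in range(100_000)
]
data = "".join(lines).encode("ascii")[: 4 * 1024 * 1024]

small = compressed_size(data, 16 * 1024)        # 16KB blocks
large = compressed_size(data, 4 * 1024 * 1024)  # one 4MB block
print(f"16KB blocks: {small} bytes, 4MB block: {large} bytes")
```

Each block is compressed on its own, so small blocks pay the per-block overhead many times and never see repetition that lies outside the block.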
Compression Performance
• iiBench performance benchmark
Compression Disk Savings
• Log data disk space comparison
TokuDB Agility
Maintenance windows are a privilege, not a right.
Agility: The Solution?
Important Notice : Upcoming System Maintenance
Please be aware that our service will be unavailable on Saturday June 16 from 1:00am to 4:00am for maintenance.
We apologize for the inconvenience.
Agility: Schema Flexibility
Add a column
– alter table t1 add column c4 bigint;
InnoDB
– Lock existing table
– Create new table via select statement
– Rebuild all indexes
– Can take hours or days on large tables
TokuDB
– Create addcolumn() message and return immediately
– Over time, the column is physically added to the actual rows
* TokuDB supports adding, dropping, and expanding columns (varchar, char, varbinary, integer)
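The "record a message now, change the rows later" idea can be sketched as follows. This is a simplified model, not TokuDB's mechanism: here a row is upgraded when it is next read, whereas the slide only says the column is added over time. The `LazyTable` class and its methods are hypothetical.

```python
class LazyTable:
    """Toy table where adding a column records a message instead of rewriting rows."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = {}       # primary key -> (messages already applied, values)
        self.messages = []   # pending schema changes: (column name, default)

    def insert(self, pk, values):
        # A new row is written against the current schema.
        self.rows[pk] = (len(self.messages), dict(values))

    def add_column(self, name, default=None):
        # Return immediately: record the change and touch no rows.
        self.messages.append((name, default))
        self.columns.append(name)

    def read(self, pk):
        applied, values = self.rows[pk]
        # Apply the messages this row has not seen yet, then store it back
        # so the work is done only once per row.
        for name, default in self.messages[applied:]:
            values[name] = default
        self.rows[pk] = (len(self.messages), values)
        return values


t1 = LazyTable(["c1", "c2", "c3"])
t1.insert(1, {"c1": 10, "c2": 20, "c3": 30})
t1.add_column("c4")   # cost does not depend on the number of rows
print(t1.read(1))     # the row gains c4 when it is read
```

The cost of `add_column` is constant, and the per-row work is spread across later operations instead of blocking the table.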
Agility: Adding Indexes
Add an index
– create index key_c4 on t1(c4);
InnoDB
– Lock the table
– Creates the index
– Can take hours or days on large tables
TokuDB
– Creates index in background
– Parallelized loader
– Index available to MySQL when finished
– Accurate progress via “show processlist;”
TokuDB Management
Your DBA* has better things to do.
*if you even have one
Management: OLTP and OLAP
• “Hybrid” data implementations are the unfortunate compromise everyone’s been forced to use
– 1 database for OLTP, another for OLAP
• Usually because performance degrades as the database gets large
• Running a single TokuDB database means more timely analytical information and fewer moving parts
Management: Fragmentation
• Over time, B-trees with small block sizes suffer from fragmentation
• Best practices are dump/reload or “optimize table”
• Requires maintenance windows during which table is unavailable
• Fractal Tree® Indexes do not fragment
– 4MB blocks vs. 16KB blocks in InnoDB
– Far less random IO for range queries
Management: Checkpoints and recovery
• TokuDB performs a checkpoint every 60 seconds
– Cost of checkpoint work is reduced
– Also enables fast recovery time (less than 1 minute)
– 60 second time period is user definable
• Frequent checkpointing means less work
– Almost no impact on workload
• InnoDB’s fuzzy checkpointing can lead to long recovery times or periods of zero transactional throughput (forced flushing).
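Why a short checkpoint period bounds recovery time can be shown with a little arithmetic. This is a back-of-the-envelope model, assuming checkpoints complete instantly at exact multiples of the period and that recovery replays only the log entries written after the last checkpoint; the function name and the one-write-per-second workload are made up.

```python
def entries_to_replay(log_times, checkpoint_period, crash_time):
    """Count log entries written after the last checkpoint before crash_time."""
    last_checkpoint = (crash_time // checkpoint_period) * checkpoint_period
    return sum(1 for t in log_times if last_checkpoint < t <= crash_time)


# One logged write per second for an hour, then a crash at second 3599.
log_times = range(1, 3601)
frequent = entries_to_replay(log_times, checkpoint_period=60, crash_time=3599)
rare = entries_to_replay(log_times, checkpoint_period=1800, crash_time=3599)
print(frequent, "entries to replay with 60 s checkpoints")
print(rare, "entries to replay with 1800 s checkpoints")
```

With a 60 second period the replay work never exceeds one minute of log, regardless of how long the server has been running.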
Management: Fewer “Knobs”
• TokuDB has far fewer configuration options than other storage engines
– TokuDB = 21, InnoDB = 59
• All variables have sensible defaults
• We run our benchmarks with two or fewer overrides
Management: Progress tracking
• “show processlist”
• long running processes show accurate % complete
– Data loader
– Index creation
> show processlist;
+------+------+-----------+--------+---------+------+----------------------------------+-----------------------------------------+
| Id   | User | Host      | db     | Command | Time | State                            | Info                                    |
+------+------+-----------+--------+---------+------+----------------------------------+-----------------------------------------+
| 1086 | root | localhost | sbtest | Query   |  169 | Loading of data about 16.0% done | create clustering index k on sbtest1(k) |
+------+------+-----------+--------+---------+------+----------------------------------+-----------------------------------------+
2 rows in set (0.00 sec)
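A monitoring script could pull the percentage out of the State column. This is a small sketch that assumes the State text always follows the "N% done" pattern seen in the single example above; the function name `percent_done` is hypothetical.

```python
import re


def percent_done(state):
    """Extract the percentage from a processlist State string, or None."""
    match = re.search(r"(\d+(?:\.\d+)?)% done", state)
    return float(match.group(1)) if match else None


print(percent_done("Loading of data about 16.0% done"))
```

States without a percentage return None, so the caller can skip threads that are not running a load or index build.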
Deployment and Other Details
• XA Support
• 64-bit Linux only
– Developed and tested on CentOS 5 and Ubuntu
– Other Linux distributions in production
• Physical hardware or virtualized
– Well suited for Amazon EC2/EBS
• Mac development via native binary
• Windows development via Virtual Machines
• MySQL 5.1, MySQL 5.5, MariaDB 5.2, MariaDB 5.5
• Commercial usage free for data < 50GB
• Also free for academic, research, evaluation
TokuDB Use-Cases
Use-Case : Profile Technology
• Profile Technology Ltd. creates social media applications (Facebook)
• Issues
– InnoDB insert speed limited crawlers (6 months/crawl)
– Crash recovery took service offline for over 30 hours
• TokuDB solution
– 80x improvement on crawler’s insertion rate (graph)
– Crash recovery in minutes
– Compression greatly reduced storage requirements with more/richer indexes
Use-Case : Profile Technology
Use-Case : Evidenzia
• Evidenzia tracks copyright infringements in peer-to-peer networks
• Issues
– IO utilization near 100%
– System maintenance windows for schema changes
– Poor query performance
• TokuDB solution
– Significant reduction in IO (graph)
– Hot schema changes eliminated downtime
– Clustered secondary indexes improved queries
Use-Case : Evidenzia
IO Utilization (TokuDB installed at week 46)