Making MySQL Great for Business Intelligence

DESCRIPTION

This presentation describes how to make MySQL a great database for business intelligence, with a special focus on column databases and InfiniDB from Calpont.

TRANSCRIPT

Page 1: Making MySQL Great For Business Intelligence

2010 Calpont Corporation – Confidential & Proprietary

Making MySQL Great for Business Intelligence

Robin Schumacher, VP Products

Calpont

Page 2: Making MySQL Great For Business Intelligence

Agenda

• Quick overview of BI

• Looking at the right technology foundation

• General physical MySQL design decisions that impact success

• A look at row vs. column MySQL databases

• Conclusions

Page 3: Making MySQL Great For Business Intelligence

A Quick Overview of Business Intelligence

Page 4: Making MySQL Great For Business Intelligence

What is Business Intelligence?

Business Intelligence (BI) refers to skills, processes, technologies, applications and practices used to support decision making.

BI technologies provide historical, current, and predictive views of business operations. Common functions of Business Intelligence technologies are reporting, online analytical processing, analytics, data mining, business performance management, benchmarking, text mining, and predictive analytics.

Page 5: Making MySQL Great For Business Intelligence

Why Business Intelligence?

• All companies now recognize the need for BI

• Information is a weapon that both large and small companies use to better understand their customers, competitors, and marketplace

• Making poorly informed decisions can be disastrous

Page 6: Making MySQL Great For Business Intelligence

Overview of Most BI Frameworks

[Diagram labels, in reading order: Operational Source Data (OLTP, Files/XML, Log Files); Staging or ODS; ETL; Staging Area; Final ETL; Data Warehouse; Reporting, BI, Notification Layer (Ad-Hoc, Dashboards, Reports, Notifications); Users; Purge/Archive; Warehouse Archive; Data Warehouse and Metadata Management]

Page 7: Making MySQL Great For Business Intelligence

Simple Reporting Databases

[Diagram labels: End Users; Application Servers; OLTP Database; Read Shard One; Reporting Database; links between databases: ETL, Data Archiving Link, Replication]

Page 8: Making MySQL Great For Business Intelligence

Building the Right Technical Foundation

Page 9: Making MySQL Great For Business Intelligence

What is the Key Component for Success?

In other words, what you do with your MySQL Server – in terms of physical design, schema design, and performance design – will be the biggest factor in whether a BI system hits the mark…

* Philip Russom, “Next Generation Data Warehouse Platforms”, TDWI, 2009.

Page 10: Making MySQL Great For Business Intelligence

What Technology Decisions are Being Made?

* Philip Russom, “Next Generation Data Warehouse Platforms”, TDWI, 2009.

Page 11: Making MySQL Great For Business Intelligence

What General MySQL Design Decisions Help Success?

Page 12: Making MySQL Great For Business Intelligence

First – Get/Use a Modeling Tool

Page 13: Making MySQL Great For Business Intelligence

Horizontal Partitioning Model

Page 14: Making MySQL Great For Business Intelligence

Read Sharding / Horizontal Partitioning

Page 15: Making MySQL Great For Business Intelligence

Vertical Partitioning Model

Page 16: Making MySQL Great For Business Intelligence

General List of Top BI Design Decisions

• Storage Engine Selection

• Physical Table/Index Partitioning

• Indexing Creation and Placement

• Set proper amounts for memory caches, etc.

• Row vs. Column Engine / Database

Page 17: Making MySQL Great For Business Intelligence

Core BI Features for MySQL

• No practical storage limits (1 tablespace=110TB)

• Automatic storage management

• ANSI-SQL support for all datatypes (including BLOB and XML)

• Data/Index partitioning (range, hash, key, list, composite)

• Built-in Replication

• Main memory tables (for dimension tables)

• Variety of indexes (b-tree, fulltext, clustered, hash, GIS)

• Multiple-configurable data/index caches

• Pre-loading of index data into index caches

• Unique query cache (caches result set + query; not just data)

• Parallel data load (5.1 and higher – multiple files)

• Multi-insert DML

• Data compression (depends on engine)

• Read-only tables

• Fast connection pooling

• Cost-based optimizer

• Wide platform support

Page 18: Making MySQL Great For Business Intelligence

Storage Engines Internal to MySQL

MyISAM

• High-speed query/insert engine

• Non-transactional, table locking

• Good for data marts, small warehouses

Archive

• Compresses data by up to 80%

• Fastest for data loads

• Only allows inserts/selects

• Good for seldom accessed data

Memory

• Main memory tables

• Good for small dimension tables

• B-tree and hash indexes

CSV

• Comma separated values

• Allows both flat file access and editing as well as SQL query/DML

• Allows instantaneous data loads

Also: Merge for pre-5.1 partitioning

Page 19: Making MySQL Great For Business Intelligence

Partitioning and Performance (5.1+)

mysql> CREATE TABLE part_tab
    -> ( c1 int, c2 varchar(30), c3 date )
    -> PARTITION BY RANGE (year(c3)) (
    ->   PARTITION p0 VALUES LESS THAN (1995),
    ->   PARTITION p1 VALUES LESS THAN (1996),
    ->   PARTITION p2 VALUES LESS THAN (1997),
    ->   PARTITION p3 VALUES LESS THAN (1998),
    ->   PARTITION p4 VALUES LESS THAN (1999),
    ->   PARTITION p5 VALUES LESS THAN (2000),
    ->   PARTITION p6 VALUES LESS THAN (2001),
    ->   PARTITION p7 VALUES LESS THAN (2002),
    ->   PARTITION p8 VALUES LESS THAN (2003),
    ->   PARTITION p9 VALUES LESS THAN (2004),
    ->   PARTITION p10 VALUES LESS THAN (2010),
    ->   PARTITION p11 VALUES LESS THAN MAXVALUE );

mysql> create table no_part_tab (c1 int, c2 varchar(30), c3 date);

*** Load 8 million rows of data into each table ***

mysql> select count(*) from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
|   795181 |
+----------+
1 row in set (38.30 sec)

mysql> select count(*) from part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
|   795181 |
+----------+
1 row in set (3.88 sec)

90% Response Time Reduction
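The speedup on this slide comes from the partitioned table only having to read the partition that can hold 1995 dates. The Python sketch below is a simplified illustration of that idea, not MySQL's actual implementation: it buckets synthetic rows using the same year boundaries as part_tab, then compares how many rows a full scan and a pruned scan have to examine. The row data, function names, and table size are assumptions made for the example; only the boundaries and the two timings come from the slide.

```python
from collections import defaultdict
from datetime import date, timedelta

# Upper bounds copied from part_tab: LESS THAN (1995) ... (2004), then (2010).
BOUNDARIES = list(range(1995, 2005)) + [2010]


def partition_for(year):
    """Return the index of the first partition whose bound exceeds year."""
    for index, bound in enumerate(BOUNDARIES):
        if year < bound:
            return index
    return len(BOUNDARIES)  # stands in for the MAXVALUE partition


def count_full_scan(rows, low, high):
    """Count matches by examining every row, as on the unpartitioned table."""
    matches = sum(1 for row in rows if low < row[2] < high)
    return matches, len(rows)


def count_pruned(partitions, low, high):
    """Count matches while reading only partitions that can hold the range."""
    needed = {partition_for(y) for y in range(low.year, high.year + 1)}
    matches = scanned = 0
    for index in needed:
        for row in partitions[index]:
            scanned += 1
            if low < row[2] < high:
                matches += 1
    return matches, scanned


# Synthetic data (hypothetical): one row per day starting 1994-01-01.
rows = [(i, f"row{i}", date(1994, 1, 1) + timedelta(days=i)) for i in range(4000)]
partitions = defaultdict(list)
for row in rows:
    partitions[partition_for(row[2].year)].append(row)

low, high = date(1995, 1, 1), date(1995, 12, 31)
full_matches, full_scanned = count_full_scan(rows, low, high)
pruned_matches, pruned_scanned = count_pruned(partitions, low, high)
print(f"full scan:   {full_matches} matches, {full_scanned} rows examined")
print(f"pruned scan: {pruned_matches} matches, {pruned_scanned} rows examined")

# The slide's headline figure, recomputed from its two timings.
reduction = (38.30 - 3.88) / 38.30
print(f"response time reduction: {reduction:.1%}")
```

Both scans return the same count, but the pruned scan examines only the rows in the 1995 partition. The last lines show that the slide's "90%" is the rounded value of (38.30 - 3.88) / 38.30.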

Page 20: Making MySQL Great For Business Intelligence

Index Creation and Placement

• If query patterns are known and predictable, and data is relatively static, then indexing isn’t that difficult

• If the environment is very ad-hoc, indexing becomes more difficult. You must analyze SQL traffic and index as best you can

• Over-indexing a table that is frequently loaded / refreshed / updated can severely impact load and DML performance. Test dropping and re-creating indexes vs. doing in-place loads and DML. Realize, though, that any queries relying on the dropped indexes will slow down until the indexes are re-created

• Index maintenance (rebuilds, etc.) can cause issues in MySQL (locking, etc.)

• Remember some storage engines don’t support normal indexes (Archive, CSV)
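One way to run the "drop and re-create vs. in-place load" test suggested above is a small timing harness. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server; it is not MySQL, and the relative timings will differ by engine, data volume, and index count. The table name fact, the index name idx_c3, and the row count are hypothetical.

```python
import sqlite3
import time


def make_db():
    """Create an in-memory table with one secondary index (stand-in schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact (c1 INTEGER, c2 TEXT, c3 TEXT)")
    conn.execute("CREATE INDEX idx_c3 ON fact (c3)")
    return conn


def load_rows(conn, rows, drop_index_first):
    """Bulk load rows, optionally dropping the index and rebuilding it after."""
    cur = conn.cursor()
    start = time.perf_counter()
    if drop_index_first:
        cur.execute("DROP INDEX IF EXISTS idx_c3")
    cur.executemany("INSERT INTO fact (c1, c2, c3) VALUES (?, ?, ?)", rows)
    if drop_index_first:
        # Queries that need idx_c3 cannot use it until this statement finishes.
        cur.execute("CREATE INDEX idx_c3 ON fact (c3)")
    conn.commit()
    return time.perf_counter() - start


rows = [(i, f"row{i}", f"{1995 + i % 10}-01-01") for i in range(50_000)]

in_place = make_db()
rebuilt = make_db()
t_in_place = load_rows(in_place, rows, drop_index_first=False)
t_rebuilt = load_rows(rebuilt, rows, drop_index_first=True)

print(f"in-place load with index:   {t_in_place:.3f} s")
print(f"drop, load, rebuild index:  {t_rebuilt:.3f} s")
```

Against a real warehouse the same comparison should be repeated on the target storage engine with production-sized loads, since the break-even point depends on how many indexes the table carries.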

Page 21: Making MySQL Great For Business Intelligence

Row vs. Column Engines / Databases

Page 22: Making MySQL Great For Business Intelligence

Column vs. Row Orientation

A column-oriented architecture looks the same on the surface, but stores data differently than legacy/row-based databases…

Page 23: Making MySQL Great For Business Intelligence

Why a Column Database?

• Column databases only read the columns needed to satisfy a query vs. full rows

• If you are only selecting a subset of columns from a table and / or are using very wide tables, column DBs are a great choice for BI

• Column databases (most of them…) remove the need for indexing because the column is the index

• Column databases automatically eliminate unnecessary I/O both logically and physically, so they also do away with the need for partitioning, materialized views, etc.

• As a rule of thumb, column databases provide 5-10x (or more) the query performance of legacy RDBMSs
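The first bullet, reading only the columns a query needs, can be illustrated with a toy model. The sketch below stores the same synthetic table twice, once as a list of rows and once as one list per column, and counts how many values each layout touches to answer a single-column aggregate. It is a deliberately simplified picture, not how any particular engine lays out data on disk; the column names and data are hypothetical.

```python
COLUMNS = ("order_id", "customer", "region", "amount", "notes")

# Row layout: each record keeps all of its column values together.
row_store = [
    (i, f"cust{i % 100}", ("east", "west")[i % 2], float(i % 50), "x" * 20)
    for i in range(1000)
]

# Column layout: each column's values are kept together.
column_store = {
    name: [row[j] for row in row_store] for j, name in enumerate(COLUMNS)
}


def sum_amount_row_store(table):
    """SUM(amount) over rows: every value of every row is fetched."""
    idx = COLUMNS.index("amount")
    total = 0.0
    values_read = 0
    for row in table:
        values_read += len(row)  # the whole row comes along with the one field
        total += row[idx]
    return total, values_read


def sum_amount_column_store(table):
    """SUM(amount) over columns: only the amount column is fetched."""
    column = table["amount"]
    return sum(column), len(column)


row_total, row_values_read = sum_amount_row_store(row_store)
col_total, col_values_read = sum_amount_column_store(column_store)
print(f"row store:    total={row_total}, values read={row_values_read}")
print(f"column store: total={col_total}, values read={col_values_read}")
```

With five columns the row layout touches five times as many values for the same answer, and the gap widens as tables get wider, which is the case the second bullet describes.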

Page 24: Making MySQL Great For Business Intelligence

Why a Column Database?

"If you're bringing back all the columns, a column-store database isn't going to perform any better than a row-store DBMS, but analytic applications are typically looking at all rows and only a few columns. When you put that type of application on a column-store DBMS, it outperforms anything that doesn't take a column-store approach."

- Donald Feinberg, Gartner Group

Page 25: Making MySQL Great For Business Intelligence

Why Not a Column Database?

• If you routinely have SELECT * queries or queries that request the majority of columns in a table

• If you are constantly doing lots of singleton inserts and deletes. As these are row-based operations, they will normally run somewhat slower on a column DB than on a row-oriented DB (more block touches are needed). Updates tend to run OK as they are a column operation

• If you want to do pure OLTP work. Some column DBs are transactional (so data integrity is ensured), but they are not suited for straight OLTP work

• If you have a small database: at that size, the advantage column databases offer over row DBs largely disappears
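The point about singleton inserts needing more block touches can be made concrete with a small counting model. The sketch below is an assumption-laden simplification: it treats a row-store insert as one write and a column-store insert as one write per column, and ignores buffering, compression, and batching that real engines use to soften the cost. The class and column names are hypothetical.

```python
class RowStore:
    """Toy row layout: one append stores the entire record."""

    def __init__(self):
        self.rows = []
        self.touches = 0

    def insert(self, row):
        self.rows.append(tuple(row))
        self.touches += 1  # one location receives the whole row


class ColumnStore:
    """Toy column layout: every column has its own storage."""

    def __init__(self, columns):
        self.columns = {name: [] for name in columns}
        self.touches = 0

    def insert(self, row):
        for name, value in zip(self.columns, row):
            self.columns[name].append(value)
            self.touches += 1  # each column is written separately

    def update(self, index, column, value):
        self.columns[column][index] = value
        self.touches += 1  # only the affected column is written


columns = ("order_id", "customer", "region", "amount", "notes")
record = (1, "cust1", "east", 19.99, "first order")

row_db = RowStore()
col_db = ColumnStore(columns)
row_db.insert(record)
col_db.insert(record)
insert_touches = (row_db.touches, col_db.touches)

col_db.update(0, "amount", 24.99)
update_touches = col_db.touches - insert_touches[1]

print(f"insert touches (row, column): {insert_touches}")
print(f"update touches (column store): {update_touches}")
```

In this model a single-row insert costs one touch per column in the column layout, while a single-column update costs one, which matches the slide's remark that updates tend to run acceptably.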

Page 26: Making MySQL Great For Business Intelligence

What is Calpont’s InfiniDB?

InfiniDB is an open source, column-oriented database architected to handle data warehouses, data marts, analytic/BI systems, and other read-intensive applications. It delivers true scale up (more CPUs/cores, RAM) and massively parallel processing (MPP) scale out capabilities for MySQL users. Linear performance gains are achieved when adding either more capabilities to one box or using commodity machines in a scale out configuration.

Scale up Scale Out
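The scale-out idea can be pictured as scatter-gather aggregation: split the data across workers, let each compute a partial result, then combine the partials. The sketch below is a generic illustration and makes no claim about how InfiniDB distributes or executes work. The workers run one after another here, and the "rows per worker" figure only shows the ideal case in which each node's share shrinks in proportion to the number of nodes.

```python
def scatter(values, workers):
    """Deal values out round-robin so each worker gets a near-equal share."""
    return [values[i::workers] for i in range(workers)]


def partial_sum_count(chunk):
    """Work done by one node: a partial sum and a row count for its chunk."""
    return sum(chunk), len(chunk)


def distributed_avg(values, workers):
    """Combine per-node partials into a global average.

    Returns the average and the largest number of rows any node scanned.
    """
    partials = [partial_sum_count(chunk) for chunk in scatter(values, workers)]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count, max(n for _, n in partials)


values = list(range(1, 1201))  # hypothetical fact column

avg_one, rows_per_node_one = distributed_avg(values, workers=1)
avg_four, rows_per_node_four = distributed_avg(values, workers=4)

print(f"1 node:  avg={avg_one}, rows scanned per node={rows_per_node_one}")
print(f"4 nodes: avg={avg_four}, rows scanned per node={rows_per_node_four}")
```

The answer is identical in both runs while each node's share of the scan drops to a quarter. Real systems add coordination and network costs that this model leaves out.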

Page 27: Making MySQL Great For Business Intelligence

InfiniDB vs. a Leading Row RDBMS

2 TB of raw data; 16 CPUs; 16 GB RAM; 14 SAS 15K RPM, RAID-0; 512 MB cache

Page 28: Making MySQL Great For Business Intelligence

Percona’s Test of Column Databases

610 GB of raw data; 8-core machine

http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/

Page 29: Making MySQL Great For Business Intelligence

Calpont Solutions

Calpont Analytic Database Server Editions

InfiniDB Community Server

• Column-Oriented

• Multi-threaded

• Terabyte Capable

• Single Server

InfiniDB Enterprise Server

• Scale out / Parallel Processing

• Automatic Failover

Calpont Analytic Database Solutions

InfiniDB Enterprise Solution

• Monitoring

• 24x7 Support

• Auto Patch Management

• Alerts & SNMP Notifications

• Hot Fix Builds

• Consultative Help

Page 30: Making MySQL Great For Business Intelligence

InfiniDB Community & Enterprise Server Comparison

Core Database Server Features                                             InfiniDB Community   InfiniDB Enterprise
MySQL front end                                                           Yes                  Yes
Column-oriented                                                           Yes                  Yes
Logical data compression                                                  Yes                  Yes
High-Speed bulk loader w/ no blocking queries while loading               Yes                  Yes
Crash-recovery                                                            Yes                  Yes
Transaction support (ACID compliant)                                      Yes                  Yes
INSERT/UPDATE/DELETE (DML) support                                        Yes                  Yes
Multi-threaded engine (queries/writes will use all CPUs/cores on box)     Yes                  Yes
No indexing necessary                                                     Yes                  Yes
Automatic vertical (column) and logical horizontal partitioning of data   Yes                  Yes
MVCC support – snapshot read (readers don’t block writers)                Yes                  Yes
Alter Table with online add column capability                             Yes                  Yes
High concurrency supported                                                Yes                  Yes
Terabyte database capable                                                 Yes                  Yes
Multi-Node, MPP scale out capable w/ failover                             No                   Yes
Support                                                                   Forums Only          Formal Production Support

Page 31: Making MySQL Great For Business Intelligence

For More Information

• Download InfiniDB Community Edition

• Download InfiniDB documentation

• Read InfiniDB technical white papers

• Read InfiniDB intro articles on MySQL dev zone

• Visit InfiniDB online forums

• Trial the InfiniDB Enterprise Edition: http://www.calpont.com

www.infinidb.org

www.calpont.com