explain that explain

Explain that “Explain”

The Road to Understanding

“You Should Not Fight with the Database, the Database is your Friend”

put together by Fabrizio Parrella

PARTS ARE QUOTED OR COPIED OR REF FROM:http://devzone.zend.com/1436/the-zendcon-sessions-episode-17-sql-query-tuning-the-legend-of-drunken-query-master/http://www.slideshare.net/phpcodemonkey/mysql-explain-explained

http://devzone.zend.com/1436/the-zendcon-sessions-episode-17-sql-query-tuning-the-legend-of-drunken-query-master/

get to know your friend

➲ Recognize the strengths and also the weaknesses of your database

➲ No database is perfect -- deal with it, you're not perfect either

➲ Think of both big things and small thingsBIG: Architecture, surrounding servers, cachingSMALL: SQL coding, join rewrites, server config

becoming friends

➲ Understand storage engine abilities and weaknesses

➲ Understand how the query cache and importantbuffers works

➲ Understand optimizer's limitations

➲ Understand what should and should not be done at the application level

➲ If you understand the above, you'll start to see the database as a friend and not an enemy

the schema

➲ Basic foundation of performance

➲ Everything else depends on it

➲ Choose your data types wisely

➲ “Divide et Impera” the schema through partitioning

A divide and conquer (D&C) algorithm works by recursively break down a problem into two or more sub-problems of the same (or related) type, until there become simple enough to be solved directly. The solution to the sub-problems is then combined to give a solution to the original problem.

http://en.wikipedia.org/wiki/Divide_and_conquer_algorithm

size does matter !!smaller, smaller, SMALLER

The more records you can fit into a single page ofmemory/disk, the faster your seeks and scans will be

➲ Do you really need that BIGINT?

➲ Use INT UNSIGNED for IPv4 addresses

➲ Use VARCHAR carefullyConverted to CHAR when used in a temporary table

➲ Use TEXT sparinglyConsider separate tables

➲ Use BLOBs very sparinglyUse the filesystem for what it was intended

real life example... handling IPv4 addresses

CREATE TABLE Sessions ( session_id INT UNSIGNED NOT NULL AUTO_INCREMENT, ip_address INT UNSIGNED NOT NULL, // Compare to CHAR(15)... session_data TEXT NOT NULL, PRIMARY KEY (session_id), INDEX (ip_address) ) ENGINE=InnoDB;

// Insert a new dummy record INSERT INTO Sessions VALUES (NULL, INET_ATON('192.168.0.2'), 'some session data');

SELECT session_id, ip_address as ip_raw, INET_NTOA(ip_address) as ip, session_data FROM Sessions WHERE ip_address BETWEEN INET_ATON('192.168.0.1') AND INET_ATON('192.168.0.255'); +------------+------------+-------------+-------------------+ | session_id | ip_raw | ip | session_data | +------------+------------+-------------+-------------------+ | 1 | 3232235522 | 192.168.0.2 | some session data | +------------+------------+-------------+-------------------+

SETs and ENUMs

➲ Often sign of poor schema design

➲ Changing the definition will most likely require a full rebuild of the table

➲ Search functions like FIND_IN_SET() are inefficient compared to index operation on a join

normalization, taking it too far

DateDate ?

http://thedailywtf.com/forums/thread/75982.aspx

vertical partitioning

➲ Never mix frequently and infrequentlyaccessed fields in a single table

➲ Splitting tables allows main records to consume the buffer pages without the extra data taking up space in memory

➲ Do you need FULLTEXT on your text columns (PRE 5.6.4)?

CREATE TABLE Users ( user_id INT NOT NULL AUTO_INCREMENT, email VARCHAR(80) NOT NULL, display_name VARCHAR(50) NOT NULL, password CHAR(41) NOT NULL, first_name VARCHAR(25) NOT NULL, last_name VARCHAR(25) NOT NULL, address VARCHAR(80) NOT NULL, city VARCHAR(30) NOT NULL, province CHAR(2) NOT NULL, postcode CHAR(7) NOT NULL, interests TEXT NULL, bio TEXT NULL, signature TEXT NULL, skills TEXT NULL, PRIMARY KEY (user_id), UNIQUE INDEX (email) ) ENGINE=InnoDB;

CREATE TABLE Users ( user_id INT NOT NULL AUTO_INCREMENT, email VARCHAR(80) NOT NULL, display_name VARCHAR(50) NOT NULL, password CHAR(41) NOT NULL, PRIMARY KEY (user_id), UNIQUE INDEX (email) ) ENGINE=InnoDB;

CREATE TABLE UserExtra ( user_id INT NOT NULL first_name VARCHAR(25) NOT NULL last_name VARCHAR(25) NOT NULL address VARCHAR(80) NOT NULL city VARCHAR(30) NOT NULL province CHAR(2) NOT NULL postcode CHAR(7) NOT NULL interests TEXT NULL bio TEXT NULL signature TEXT NULL skills TEXT NULL PRIMARY KEY (user_id) FULLTEXT KEY (interests, skills) ) ENGINE=MyISAM;

understand MySQL query cache

➲ You must understand your application's read/write patterns

➲ Internal query cache design is a compromise between CPU usage and read performance

➲ Stores the MYSQL_RESULT of a SELECT along with a hash of the SELECT SQL statement

➲ Any modification to any table involved in the SELECT invalidates the stored result

➲ Write applications to be aware of the query cacheUse SELECT SQL_NO_CACHE

coding like a master

➲ Be consistent (for crying out loud)

➲ Use ANSI SQL coding style (vs. Theta)

➲ Stop thinking in terms of iterators, for loops, while loops, etc

➲ Instead, think in terms of sets

➲ Break complex SQL statements (or business requests) into smaller, manageable chunks

Consistency, consistency, CONSISTENCY !!

➲ Tabs and Spacing

➲ Upper and Lower Case

➲ Keywords, function names

Nothing pisses offthe query master likeinconsistent SQL code!

SELECT a.first_name, a.last_name, COUNT(*) as num_rentals FROM actor a INNER JOIN film f ON a.actor_id = f.actor_id GROUP BY a.actor_id ORDER BY num_rentals DESC, a.last_name, a.first_name LIMIT 10;

vs.

select first_name, a.last_name, count(*) AS num_rentals FROM actor a join film on a.actor_id = film.actor_id group by a.actor_id order by num_rentals DESC, a.last_name, a.first_name LIMIT 10;

➲ Aliases

➲ Consider yourteammates

➲ Like your code, SQL is meant to be read, not written

guidelines

➲ Beware of join hints“force index” can get “out of date”

➲ Just because it can be done in a single SQL statement doesn't meat it should

➲ ALWAYS test and benchmark your solution

ANSI vs. THETA

SELECT a.first_name, a.last_name, COUNT(*) as num_rentals FROM actor a INNER JOIN film_actor fa ON a.actor_id = fa.actor_id INNER JOIN film f ON fa.film_id = f.film_id INNER JOIN inventory I ON f.film_id = i.film_id INNER JOIN rental r ON r.inventory_id = i.inventory_id GROUP BY a.actor_id ORDER BY num_rentals DESC, a.last_name, a.first_name LIMIT 10; SELECT

a.first_name, a.last_name, COUNT(*) as num_rentals FROM actor a, film f, film_actor fa, inventory i, rental r WHERE a.actor_id = fa.actor_id AND fa.film_id = f.film_id AND f.film_id = i.film_id AND r.inventory_id = i.inventory_id GROUP BY a.actor_id ORDER BY num_rentals DESC, a.last_name, a.first_name LIMIT 10;

ANSI STYLE

Explicitly declare JOIN conditionsusing the ON clause

THETA STYLE

Implicitly declare JOIN conditionsin the WHERE clause

why ANSI style kicks THETA style's A55

➲ MySQL THETA style only supports INNER and CROSS joinBut MySQL ANSI style supports INNER, CROSS, LEFT,

RIGHT, and NATURAL joinsMixing and matching both styles can lead to hard-to-

read SQL code

➲ It is extremely easy to miss a join condition with THETA styleEspecially when joining many tablesForgetting a Join will produce a cartesian product

(NOT GOOD !!!)

WITHOUT THE STRENGHT OF

EXPLAIN

YOU WILL GET LOST IN THE FIELDS

OF MISUNDERSTANDING

how to test our SQL

EXPLAIN the basics

➲ Provides the execution plan chosen by the MySQLoptimizer

➲ Simply prepend the word EXPLAIN in front of your SELECT statement

➲ Each row represent a set of information for each table used in the SELECT

EXPLAIN the columns

➲ select_type - type of “set” the data in this row contains (SIMPLE, DERIVATE, SUBQUERY, etc..)

➲ table - alias (or full table name if no alias) of the table or derived table from which the data in this set comes

➲ type - “access strategy” used to grab the data in this set (ALL, CONST, REF, etc...)

➲ possible_keys - keys available to optimizer for query➲ keys - keys chosen by the optimizer➲ key_len – number of bytes used from the keys➲ ref - shows the column used in join relations➲ rows - estimate of the number of rows in this set➲ Extra - information the optimizer chooses to give you

EXPLAIN the output EXPLAIN SELECT a.first_name, a.last_name, COUNT(*) as num_rentals FROM film f INNER JOIN film_category fc ON f.film_id = fc.film_id INNER JOIN category c ON fc.category_id = c.category_id WHERE f.title LIKE 'T%'\G *************************** 1. row *************************** select_type: SIMPLE table: c type: ALL possible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 16 Extra: *************************** 2. row *************************** select_type: SIMPLE table: fc type: ref possible_keys: PRIMARY, fk_film_category_category key: fk_film_category_category key_len: 1 ref: c.category_id rows: 1 Extra: using index *************************** 2. row *************************** select_type: SIMPLE table: f type: eq_ref possible_keys: PRIMARY, idx_title key: PRIMARY key_len: 2 ref: fc.film_id rows: 1 Extra: using where

estimate row count

available indexes and the chosen one

a covering indexwas used

EXPLAIN a real world example

CREATE TABLE àttendees` ( àttendee_id` int(11) NOT NULL, `lastname` varchar(50) NOT NULL, `conference_id` int(11) NOT NULL, `registration_status` tinyint(4) NOT NULL, PRIMARY KEY (àttendee_id`) ) ENGINE=InnoDB;

EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 AND registration_status > 0

//Let's only show the important parts for now

*************************** 1. row *************************** table: attendees possible_keys: NULL key: NULL rows: 14052

CREATE TABLE `conferences` ( `conference_id` int(11) NOT NULL, `location_id` int(11) NOT NULL, `topic_id` int(11) NOT NULL, `date` date NOT NULL, PRIMARY KEY (`conference_id`) ) ENGINE=InnoDB;

➲ The three most important columns returned by EXPLAINpossible_keys

All possible indexes which MYSQL could have usedBased on a series of very quick lookups and

calculationskey: chosen keyrows: estimate of the scanned rows


➲ Interpreting the result:No suitable indexes for this query

MySQL has to do a full scan of the tableFull table scans are almost always the slowestFull table scans are usually an indication that an index is

needed







➲ MySQL has two indexes to choose from➲ “reg” is not “sufficently unique”

the spread of the values can also be a factor (e.g. when 99% of rows contain the same value)

➲ Index “uniqueness” is called cardinality➲ There is space for performance increase




*************************** 1. row *************************** table: attendees possible_keys: conf, reg key: conf rows: 331


ALTER TABLE attendees ADD INDEX conf (conference_id), ADD INDEX reg (registration_status);


➲ “reg_conf_index” is a much better choice

➲ Other keys are still available, just not as effective




*************************** 1. row *************************** table: attendees possible_keys: reg, conf, reg_conf_index key: reg_conf_index rows: 204


ALTER TABLE attendees ADD INDEX reg_conf_index (registration_status, conference_id);


➲ Seems like that also without the “reg” index everything isworking just as expected


EXPLAIN SELECT * FROM attendees WHERE registration_status = 2


*************************** 1. row *************************** table: attendees possible_keys: reg_conf_index key: reg_conf_index rows: 372


ALTER TABLE attendees DELETE INDEX reg, DELETE INDEX conf;


➲ Without the “conf” index we are at square one➲ The orders in which the fields are defined in a composite

index affects whether is available in a query➲ Potential workaround

SELECT * FROM attendees WHERE conference_id = 123 AND registration_id > 0;


EXPLAIN SELECT * FROM attendees WHERE conference_id = 123




ALTER TABLE attendees DELETE INDEX reg, DELETE INDEX conf;


➲ Great, MySQL it is using the index on “lastname”, which is good


EXPLAIN SELECT * FROM attendees WHERE lastname LIKE “parr%”


*************************** 1. row *************************** table: attendees possible_keys: lastname key: lastname rows: 234


ALTER TABLE attendees ADD INDEX lastname (lastname);


➲ MySQL doesn't even try to use an index !


EXPLAIN SELECT * FROM attendees WHERE lastname LIKE “%arr%”




ALTER TABLE attendees ADD INDEX lastname (lastname);

EXPLAIN a real world example (pre MySQL 5.1)

➲ MySQL doesn't use an index because of the OR➲ MySQL perform a full table scan➲ Workaround, use “UNION”➲ Workaround, add a composite INDEX

ALTER TABLE conferenceADD INDEX location_topic (location_id, topic_id);


EXPLAIN SELECT * FROM conferences WHERE location_id = 2 OR topic_id IN (4,6,1)


*************************** 1. row *************************** table: conferences possible_keys: location_id, topic_id key: NULL rows: 5043


ALTER TABLE conferences ADD INDEX location_id (location_id) ADD INDEX topic_id (topic_id);


➲ Looks like we need an index on “conference_id” on attendees

➲ How many total ROWS are estimate ?


EXPLAIN SELECT * FROM conferences c INNER JOIN attendees a USING (conference_id) WHERE c.location_id = 2 AND c.topic_id IN (4,6,1) AND a.registration_status > 1


*************************** 1. row *************************** table: c possible_keys: conference_topic key: conference_topic rows: 15 *************************** 1. row *************************** table: a possible_keys: NULL key: NULL rows: 14502


15 x 14502

EXPLAIN the “type”

EXPLAIN the type

➲ CONST: SELECT * FROM table WHERE field = “value”; The field needs to be indexed with a unique non-nullable key If non-unique or nullable the type will be “ref” It refers to when a table with a single row is referenced in the

SELECT Can be propagate across multiple joined columns:

EXPLAIN SELECT r.* FROM rental r INNER JOIN customer c ON r.customer_id = c.customer_id WHERE r.rental_id = 13\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: r type: const possible_keys: PRIMARY,idx_fk_customer_id key: PRIMARY key_len: 4 ref: const rows: 1 Extra: *************************** 2. row *************************** id: 1 select_type: SIMPLE table: c type: const possible_keys: PRIMARY key: PRIMARY key_len: 2 ref: const /* Here is where the propagation occurs...*/ rows: 1 Extra: 2 rows in set (0.00 sec)

EXPLAIN the type

➲ RANGE: SELECT * FROM table WHERE field BETWEEN “value” AND “value”;

The field needs to be indexed It too many records are estimated, it won't be used

EPLAIN SELECT * FROM rental WHERE rental_date BETWEEN '2005-06-14' AND '2005-06-16'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: rental type: range possible_keys: rental_date key: rental_date key_len: 8 ref: NULL rows: 364 Extra: Using where 1 row in set (0.00 sec)

EXPLAIN the type

➲ ALL: SELECT * FROM table WHERE field BETWEEN “value” AND “far away from starting value”;

No WHERE condition (duh) No index on the field in the WHERE condition Poor selectivity on the indexed field Too many records meet the WHERE condition

SEEK: jumps into random places to fetch the data and repeat for each piece of data needed

SCAN: jump to the start and sequentially read the data For large amount of data, SCAN operations tends to be more efficient

than multiple SEEK operations Using SELECT * FROM

EPLAIN SELECT * FROM rental WHERE rental_date BETWEEN '2001-01-14' AND '2012-12-31'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: rental type: ALL possible_keys: rental_date /* large range force full scan */ key: NULL key_len: NULL ref: NULL rows: 16298 Extra: Using where 1 row in set (0.00 sec)

EXPLAIN the type

➲ INDEX_MERGE: SELECT * FROM table WHERE field = “value” AND field1 = “value”;

Introduced with the optimizer on MySQL 5.0 Allows the optimizer to use more than one index to satisfy a join condition Prior to MySQL 5.0, only one index In case of OR conditions, MySQL < 5.0 would use a full table scan

EXPLAIN SELECT * FROM rental WHERE rental_id IN (10,11,12) OR rental_date = '2006-02-01' \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: rental type: index_merge possible_keys: PRIMARY,rental_date key: rental_date,PRIMARY key_len: 8,4 ref: NULL rows: 4 Extra: Using sort_union(rental_date,PRIMARY); Using where 1 row in set (0.02 sec)

EXPLAIN the “Extra”

EXPLAIN the Extra➲ “Extra” shows additional operations invoked to get your result set

➲ Some common values are (more are discussed in the MySQL manual):

Using where Using temporary table Using filesort Using index

EXPLAIN SELECT * FROM rental WHERE rental_id IN (10,11,12) OR rental_date = '2006-02-01' \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: rental type: index_merge possible_keys: PRIMARY,rental_date key: rental_date,PRIMARY key_len: 8,4 ref: NULL rows: 4 Extra: Using sort_union(rental_date,PRIMARY); Using where 1 row in set (0.02 sec)

EXPLAIN the Extra

➲ Using filesort: AVOID➲ Avoid because

Doesn't Use Index Involves a full scan Uses a generic algorithm (one fits all) Uses filesystem (BAD !!) Gets slower with more data

➲ It's not all that bad Sometime unavoidable - ORDER BY RAND() Acceptable provided you get to your result as quickly as

possible, and keep it predictably small

EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 ORDER BY lastname *************************** 1. row *************************** table: attendees possible_keys: conference_id key: conference_id rows: 331 Extra: Using filesort

EXPLAIN the Extra

➲ Using index: GOOD➲ Celebrate because

MySQL got your results just by consulting the index MySQL didn't need to look at the table to get the results (open

table is expensive) Fastest way to get your data

➲ Particularly useful... When you are interested in a single data or id When you are interested in COUNT(), SUM(), AVG(), etc. of a field

EXPLAIN SELECT AVG(age) FROM attendees WHERE conference_id = 123 *************************** 1. row *************************** table: attendees possible_keys: conference_id key: conference_id rows: 331 Extra:

ALTER TABLE attendees ADD INDEX conf_age (conference_id, age); EXPLAIN SELECT AVG(age) FROM attendees WHERE conference_id = 123 *************************** 1. row *************************** table: attendees possible_keys: conference_id, conf_surname key: conf_surname rows: 331 Extra: Using index

Nothing is actually wrong with this query, it just could be quicker

Outside from caching, this is the fastest way to get your data

INDEXES... your schema's phone book

➲ Speed up SELECTs, but slow down modifications

➲ Make sure you have indexes on columns used in WHERE, ON, and GROUP BY clauses

➲ Always ensure that JOIN conditions are indexed AND have identical data types

➲ Good keys:Selectivity:

% of distinct values= distinct values / number rowsunique or primary always 1

Low selectivity:Maybe you can put it in a multi-column indexPrefix ? Suffix ? It depends on your application

indexed columns and functions don't mix

A full table scan is used because a function (LEFT) is operating on the lastname column.

Let's Fix this...

EXPLAIN SELECT * FROM attendees WHERE LEFT(lastname.2) = “Pa”

*************************** 1. row *************************** id: 1 select_type: SIMPLE table: film type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 951 Extra: Using where

EXPLAIN SELECT * FROM attendees WHERE lastname LIKE “Pa%”

*************************** 1. row *************************** id: 1 select_type: SIMPLE table: film type: range possible_keys: idx_title key: idx_title key_len: 767 ref: NULL rows: 15 Extra: Using where

let's fix multiple issues with a SELECT query

First, we are operating on an index column (order_created) with a function – let's fix that:

SELECT * FROM orders WHERE TO_DAYS(CURRENT_DATE()) - TO_DAYS(order_created) <= 7;

Even if we removed the function in the WHERE expression, we still have a non-deterministic function in the statement which eliminates this query from being places in the query cache – let's fix that:

SELECT * FROM orders WHERE order_created >= CURRENT_DATE() - INTERVAL 7 DAYS;

We replaced the function with a constant, however we are specifying a SELECT * instead than the actual fields that we need.

What is there is a TEXT field in the table that we don't seen to see ? Having it included in the result means a larger result set which may not fit in the query cache and may force a disk-based temporary table – let's fix that:

SELECT * FROM orders WHERE order_created >= '2013-01-13' - INTERVAL 7 DAYS;

SELECT order_id, customer_id, order_total, date_created FROM orders WHERE order_created >= '2013-01-13' - INTERVAL 7 DAYS;

good indexes vs. bad indexes

Don't forget that MySQL string indexes allow only 1000 characters (333 using UTF-8).

Let's say you have 11,000,000 records in a table called “USERS” with the following fields:

➲ user, firstname, lastname, gender, email, age, country_idOur application perform searched on the following fields:➲ user➲ firstname, lastname, gender➲ emailIt is obvious to create indexes on user and email, especially if

they are unique, but what about the other fields?➲ “gender” can be M or F, selectivity is very low 2/11,000,000

= 0. Best would be to remove the index on gender if you have it

➲ “firstname”/”lastname” depend on the uniqueness of the values stores.SELECT DISTINCT to calculate the selectivity

if it is above 15% keep itbelow 15% you might want to create a composite INDEX

removing crappy or redundant indexes SELECT t.TABLE_SCHEMA AS `db`, t.TABLE_NAME AS `table`, s.INDEX_NAME AS `index name`, s.COLUMN_NAME AS `field name`, s.SEQ_IN_INDEX `seq in index`, s2.max_columns AS `# cols, s.CARDINALITY AS `card`, t.TABLE_ROWS AS `est rows`, ROUND(((s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) * 100), 2) AS `sel %` FROM INFORMATION_SCHEMA.STATISTICS s INNER JOIN INFORMATION_SCHEMA.TABLES t ON s.TABLE_SCHEMA = t.TABLE_SCHEMA AND s.TABLE_NAME = t.TABLE_NAME INNER JOIN ( SELECT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME, MAX(SEQ_IN_INDEX) AS max_columns FROM INFORMATION_SCHEMA.STATISTICS WHERE TABLE_SCHEMA != 'mysql' GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME ) AS s2 ON s.TABLE_SCHEMA = s2.TABLE_SCHEMA AND s.TABLE_NAME = s2.TABLE_NAME AND s.INDEX_NAME = s2.INDEX_NAME WHERE t.TABLE_SCHEMA != 'mysql' /* Filter out the mysql system DB */ AND t.TABLE_ROWS > 10 /* Only tables with some rows */ AND s.CARDINALITY IS NOT NULL /* Need at least one non-NULL value in the field */ AND (s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) < 1.00 /* unique indexes are perfect anyway */ ORDER BY `sel %`, /* DESC for best non-unique indexes */ s.TABLE_SCHEMA, s.TABLE_NAME LIMIT 100

explain that explain

Technology