INDEXING
Davood Pour Yousefian Barfeh
INDEXING
What is an index?
Why is it needed?
When should it be used?
Types of indexes
What is an index?
- a data structure
- a way of sorting
- holds the field value and a pointer to the record it relates to
Why is an index needed? (Advantage)
speeds up retrieval of data
without index: Linear Search (N = number of records)
- key (unique value): N/2
- non-key: N
using index: Binary Search
- log2(N)
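The comparison counts above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table contents, the field names, and the helper names linear_search, build_index and index_search are hypothetical and not part of the slides. The index is modeled as a sorted list of (field value, pointer) pairs, where the pointer is simply the record's position in the table.

```python
def linear_search(records, field, target):
    """Scan records in storage order; return (position, comparisons made)."""
    comparisons = 0
    for pos, rec in enumerate(records):
        comparisons += 1
        if rec[field] == target:
            return pos, comparisons
    return None, comparisons


def build_index(records, field):
    """Index = sorted (field value, pointer) pairs; the pointer is the record position."""
    return sorted((rec[field], pos) for pos, rec in enumerate(records))


def index_search(index, target):
    """Binary search over the sorted index; return (pointer, comparisons made)."""
    lo, hi = 0, len(index) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        value, pointer = index[mid]
        if value == target:
            return pointer, comparisons
        if value < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, comparisons


# Hypothetical table: 1,024 records whose names are stored in no particular order.
records = [{"id": i, "name": f"user{(i * 37) % 1024:04d}"} for i in range(1024)]
name_index = build_index(records, "name")

pos_linear, n_linear = linear_search(records, "name", "user1023")
pos_index, n_index = index_search(name_index, "user1023")
print(f"linear search: {n_linear} comparisons, index search: {n_index} comparisons")
```

With 1,024 records the binary search needs at most 11 comparisons, while the linear scan may need up to 1,024.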
Indexing (Disadvantage)
- Additional space on the disk
- Slows down inserting, deleting, and updating data
Ex. Without Indexing
Field name        Data type      Size on disk
id (Primary key)  Unsigned INT   4 bytes
firstName         Char(50)       50 bytes
lastName          Char(50)       50 bytes
emailAddress      Char(100)      100 bytes
*char was used in place of varchar to allow for an accurate size-on-disk value
*database contains five million rows and is unindexed
r = 5,000,000 records, record length R = 204 bytes, block size B = 1,024 bytes
bfr = floor(B/R) = floor(1024/204) = 5 records per disk block
total number of blocks required: N = r/bfr = 5,000,000 / 5 = 1,000,000 blocks
Linear search for a key field: N/2 = 500,000 blocks on average; because the id field is sorted, a binary search can be used instead: log2(N) = 19.93 → 20 blocks
Linear search for a non-key field: N = 1,000,000 blocks
Ex. Using Indexing
Field name        Data type   Size on disk
firstName         Char(50)    50 bytes
(record pointer)  Special     4 bytes
*Pointers in MySQL are 2, 3, 4 or 5 bytes in length depending on the size of the table
r = 5,000,000 records, index record length R = 54 bytes, block size B = 1,024 bytes
bfr = floor(B/R) = floor(1024/54) = 18 records per disk block
The total number of blocks required to hold the index is: N = r/bfr = 5,000,000 / 18 → 277,778 blocks
Binary Search: log2(N) = log2(277,778) = 18.08 → 19 blocks (plus one more block access to read the actual record)
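The block arithmetic of the two examples can be reproduced with a short Python sketch. The record sizes (204 and 54 bytes), the block size and the row count come from the slides; the function name blocks_needed and the variable names are illustrative assumptions.

```python
import math


def blocks_needed(num_records, record_bytes, block_bytes):
    """Blocks required when only whole records fit into a block."""
    bfr = block_bytes // record_bytes  # blocking factor: records per block
    return math.ceil(num_records / bfr)


r = 5_000_000  # number of records
B = 1024       # block size in bytes

data_blocks = blocks_needed(r, 204, B)   # full table records (204 bytes each)
index_blocks = blocks_needed(r, 54, B)   # index records: firstName + pointer (54 bytes each)

linear_key = data_blocks // 2                             # average linear scan on a key field
linear_non_key = data_blocks                              # linear scan on a non-key field
binary_sorted_data = math.ceil(math.log2(data_blocks))    # binary search on the sorted key
binary_on_index = math.ceil(math.log2(index_blocks))      # binary search on the index

print(data_blocks, index_blocks)
print(linear_key, linear_non_key, binary_sorted_data, binary_on_index)
```

Changing r, B or the record sizes shows how quickly the gap between a linear scan and an index lookup grows with table size.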
When should (can) indexing be used?
General Rule: Index anything that limits the number of results you are trying to find.
- speeding up finding data
- cardinality
- a table that references another table
When should indexing be used?
speeds up finding data, but slows down inserting, deleting or updating data
- not only the table must be updated, but the index as well
an index on bank account number is better than one on balance
Cardinality: The number of distinct values for a column
[Figure: Binary Search vs. Linear Search]
When should indexing be used?
Cardinality
Index Selectivity = Number of distinct values / Number of records
Ex. good selectivity: a table has 100,000 records and one of its indexed columns has 88,000 distinct values, so the selectivity of this index is 88,000 / 100,000 = 88%
Ex. bad selectivity: a table of 100,000 records has only 200 distinct values in the indexed column, so the index's selectivity is 200 / 100,000 = 0.2%
Number of records in each group = 100,000 / 200 = 500; a full table scan is more efficient than using such an index, because much more I/O is needed to repeatedly scan the index and the table
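The selectivity ratio is easy to compute for any column. The following Python sketch rebuilds the two cases from the slides with synthetic data; the function name index_selectivity and the generated column values are assumptions made for illustration.

```python
def index_selectivity(values):
    """Selectivity = number of distinct values / number of records."""
    values = list(values)
    if not values:
        raise ValueError("selectivity is undefined for an empty column")
    return len(set(values)) / len(values)


# Synthetic columns with 100,000 records each.
good_column = [i % 88_000 for i in range(100_000)]  # 88,000 distinct values
bad_column = [i % 200 for i in range(100_000)]      # only 200 distinct values

good = index_selectivity(good_column)
bad = index_selectivity(bad_column)
records_per_value = len(bad_column) // len(set(bad_column))

print(f"good: {good:.1%}, bad: {bad:.1%}, records per value (bad): {records_per_value}")
```

A value close to 100% means almost every lookup narrows the search to a handful of rows; a value near 0% means each index entry still leads to hundreds of rows.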
When should indexing be used?
table that references other table - join
Ex.
CREATE TABLE newsitem ( newsid INT PRIMARY KEY, newstitle VARCHAR(255), newscontent TEXT, authorid INT, newsdate TIMESTAMP);
CREATE TABLE authors ( authorid INT PRIMARY KEY, username VARCHAR(255), firstname VARCHAR(255), lastname VARCHAR(255));
SELECT newstitle, firstname, lastname FROM newsitem n, authors a WHERE n.authorid=a.authorid;
CREATE INDEX newsitem_authorid ON newsitem(authorid);
General Rule: Any fields involved in a table join should be indexed (primary keys are already indexed automatically)
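The join example can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: it runs on SQLite rather than MySQL, so storage details and optimizer behavior differ, and the sample rows are invented for illustration. The table definitions, the index and the query follow the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE newsitem (newsid INT PRIMARY KEY, newstitle VARCHAR(255),
                       newscontent TEXT, authorid INT, newsdate TIMESTAMP);
CREATE TABLE authors (authorid INT PRIMARY KEY, username VARCHAR(255),
                      firstname VARCHAR(255), lastname VARCHAR(255));
CREATE INDEX newsitem_authorid ON newsitem(authorid);
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO authors VALUES (?, ?, ?, ?)",
    [(1, "ada", "Ada", "Lovelace"), (2, "alan", "Alan", "Turing")],
)
conn.executemany(
    "INSERT INTO newsitem (newsid, newstitle, authorid) VALUES (?, ?, ?)",
    [(10, "First post", 1), (11, "Second post", 2), (12, "Third post", 1)],
)

# The join from the slides, with an ORDER BY added to make the output stable.
rows = conn.execute(
    "SELECT newstitle, firstname, lastname "
    "FROM newsitem n, authors a "
    "WHERE n.authorid = a.authorid ORDER BY n.newsid"
).fetchall()

# List the indexes SQLite keeps on newsitem (column 1 of each row is the index name).
index_names = [row[1] for row in conn.execute("PRAGMA index_list('newsitem')")]

print(rows)
print(index_names)
```

The index list contains newsitem_authorid next to the index SQLite creates automatically for the primary key.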
When should indexing be used?
Ex.
CREATE TABLE newsitem ( newsid INT PRIMARY KEY, newstitle VARCHAR(255), newscontent TEXT, authorid INT, newsdate TIMESTAMP);
CREATE TABLE newsitem_categories ( newsid INT, categoryid INT);
CREATE TABLE categories ( categoryid INT PRIMARY KEY, categoryname VARCHAR(255));
SELECT n.newstitle, c.categoryname FROM categories c, newsitem_categories nc, newsitem n WHERE c.categoryid=nc.categoryid AND nc.newsid=n.newsid;
These fields should be indexed (the two primary keys are already indexed automatically):
- newsitem.newsid
- newsitem_categories.newsid
- newsitem_categories.categoryid
- categories.categoryid
CREATE INDEX newscat_news ON newsitem_categories(newsid);
CREATE INDEX newscat_cats ON newsitem_categories(categoryid);
Combination Indexing
Instead of two separate indexes:
CREATE INDEX newscat_news ON newsitem_categories(newsid);
CREATE INDEX newscat_cats ON newsitem_categories(categoryid);
can we create one combined index?
CREATE INDEX news_cats ON newsitem_categories(newsid, categoryid);
Yes, but with limitations.
Conjunctions in Combination Indexing
CREATE TABLE example ( a int, b int, c int );
CREATE INDEX example_index ON example(a,b,c);
• It will be used when you check against ‘a’.
• It will be used when you check against ‘a’ and ‘b’.
• It will be used when you check against ‘a’, ‘b’ and ‘c’.
• It will not be used if you check against ‘b’ and ‘c’, or if you check only ‘b’ or only ‘c’.
• It will be used when you check against ‘a’ and ‘c’ but only for the ‘a’ column – it will not be used to check the ‘c’ column as well.
A query against ‘a’ OR ‘b’, like this:
SELECT a,b,c FROM example WHERE a=1 OR b=2;
• Will at best be able to use the index to check the ‘a’ column; it will not be able to use it to check the ‘b’ column, so rows that match only b=2 still require a scan.
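These rules can be explored with a query planner. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in; this is an assumption made for convenience, since MySQL's optimizer and its EXPLAIN output differ and the exact plan text depends on the SQLite version. The helper name plan is hypothetical. In SQLite's output, SEARCH means the index is used to jump to matching entries and SCAN means every entry is visited.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (a INT, b INT, c INT)")
conn.execute("CREATE INDEX example_index ON example(a, b, c)")


def plan(where_clause):
    """Return SQLite's query plan text for a SELECT with the given WHERE clause."""
    sql = "EXPLAIN QUERY PLAN SELECT a, b, c FROM example WHERE " + where_clause
    # The fourth column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in conn.execute(sql))


clauses = [
    "a = 1",
    "a = 1 AND b = 2",
    "a = 1 AND b = 2 AND c = 3",
    "b = 2 AND c = 3",
    "a = 1 AND c = 3",
    "a = 1 OR b = 2",
]
for clause in clauses:
    print(f"{clause:28} -> {plan(clause)}")
```

Clauses that constrain the leading column ‘a’ are expected to show a SEARCH on example_index, while clauses that skip ‘a’ fall back to a SCAN.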
Types of indexes (1)
Clustered and Non-clustered Indexes
Clustered: indexes in which the order of the rows in the data pages corresponds to the order of the rows in the index
• Only one per table (typically the primary key)
• Faster to read than non-clustered, as data is physically stored in index order
Non-clustered: the order of rows is not important
• Can be used many times per table
• Quicker for insert, delete, and update operations than a clustered index
Types of indexes (2)
Unique and Non-unique Indexes
Unique: help maintain data integrity by ensuring that no two rows of data in a table have identical key values
• uniqueness is enforced
Non-unique: improve query performance by maintaining a sorted order of data values that are used frequently
Types of indexes (3)
Bitmap index - stores the bulk of its data as bit arrays
- suited to columns whose values repeat very frequently
Dense index - an index record appears for every search key value in the file; the record contains the search key value and a pointer to the actual record
Sparse index - index records are created only for some of the records
- primary key
Reverse index - reverses the key value before entering it in the index
- useful for sequence numbers, where new key values increase monotonically
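The difference between a dense and a sparse index can be sketched in Python. Everything here is an illustrative assumption: the sorted file is modeled as a list of small blocks, a pointer is a (block number, slot) pair, and the names dense_lookup and sparse_lookup are hypothetical. The sparse index keeps only the first key of each block, which works because the file is sorted on that key.

```python
from bisect import bisect_right

BLOCK_SIZE = 3
keys = [2, 5, 8, 11, 14, 17, 20, 23]  # file sorted on the search key
blocks = [keys[i:i + BLOCK_SIZE] for i in range(0, len(keys), BLOCK_SIZE)]

# Dense index: one entry per search key value, pointing at (block, slot).
dense_index = {
    key: (b, s)
    for b, block in enumerate(blocks)
    for s, key in enumerate(block)
}

# Sparse index: one entry per block, holding only the block's first key.
sparse_keys = [block[0] for block in blocks]


def dense_lookup(key):
    """Follow the pointer stored for this exact key, if any."""
    return dense_index.get(key)


def sparse_lookup(key):
    """Find the last block whose first key is <= key, then scan inside that block."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for s, candidate in enumerate(blocks[b]):
        if candidate == key:
            return (b, s)
    return None


print(len(dense_index), len(sparse_keys))
print(dense_lookup(14), sparse_lookup(14))
```

The sparse index is much smaller (one entry per block instead of one per key) at the price of a short scan inside the block.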
Types of indexes (4)
Fulltext - the search engine examines all of the words in every stored document as it tries to match search words supplied by the user
supports many other types of search:
- Two words near each other
- Any word derived from a particular root (for example run, ran, or running)
- Multiple words with distinct weightings
- A word or phrase close to the search word or phrase
Spatial - allows users to treat data within a data store as existing within a two-dimensional context
- an extended index that allows you to index a spatial column; a spatial column is a table column that contains data of a spatial data type, such as geometry or geography
Syntax of Index (1)
Creation:
CREATE [UNIQUE|FULLTEXT|SPATIAL] INDEX index_name [index_type]
    ON tbl_name (index_col_name,...) [index_type]

index_col_name:
    col_name [(length)] [ASC | DESC]

index_type:
    USING {BTREE | HASH}
Access Method
BTree:
- Keys have some locality of reference
- They can be sorted well
- Neighborhood: expect that a query for a given key will likely be followed by a query for one of its neighbors
Hash:
- Dataset is extremely large
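The practical difference between the two access methods can be sketched in Python. This is not how a database implements them: a sorted list with bisect stands in for the ordered leaves of a B-tree, a dict stands in for a hash index, and the sample keys and names are invented for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical key -> record mapping.
rows = {101: "Ana", 102: "Ben", 105: "Cy", 109: "Dee", 110: "Eli"}

ordered_keys = sorted(rows)  # stand-in for B-tree leaf order
hash_index = dict(rows)      # stand-in for a hash index


def range_query(lo, hi):
    """Ordered access: neighboring keys sit next to each other, so ranges are cheap."""
    start = bisect_left(ordered_keys, lo)
    end = bisect_right(ordered_keys, hi)
    return ordered_keys[start:end]


def hash_lookup(key):
    """Hash access: direct lookup of one exact key; neighbors are not adjacent."""
    return hash_index.get(key)


print(range_query(102, 109))
print(hash_lookup(105))
```

An ordered structure answers both exact-match and neighborhood queries, whereas a hash structure answers only exact matches but does so without any ordering overhead.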
Syntax of Index (2)
Displaying Index Information:
SHOW INDEX FROM table_name
Deletion:
DROP INDEX index_name ON table_name
Summary
What is an index? - a data structure that sorts a number of records
Why is it needed? - advantages & disadvantages
When should it be used? - finding data
Types of indexes - clustered & non-clustered, unique & non-unique
Syntax - creation, display, deletion