Version 1.0
SQL Server Indexes Management
Overview
A h m a d A d e l G a d
a h m a d . a d e l @ j e m m a i l . c o m
2 1 / 0 3 / 2 0 1 5
Ahmad Adel Gad
Outlines the SQL Server Indexes types and management.
Illustrates the evolution of the databases in forms of the
RDBMS concepts using the Binary Tree and the B-Tree
algorithms.
Providing the T-SQL statements to shrink and defrag
databases.
SQL Server Indexes Management Overview
1
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
SQL Server Indexes Management Overview
Author ........................................................................................................................................................... 3
Introduction .................................................................................................................................................. 4
Tree Search Algorithms ................................................................................................................................. 5
Binary Tree .................................................................................................................................... 5
B-Tree ............................................................................................................................................ 6
B+Tree ........................................................................................................................................... 6
B+Trees in SQL Server ................................................................................................................... 7
UB-Tree ......................................................................................................................................... 8
SQL Language ................................................................................................................................................ 9
History ........................................................................................................................................... 9
Language Elements ..................................................................................................................... 10
T-SQL Languages ......................................................................................................................... 10
Microsoft SQL Server Overview .................................................................................................................. 11
Microsoft & Sybase ..................................................................................................................... 11
SQL Server Release History ......................................................................................................... 12
DBMS ........................................................................................................................................................... 13
RDBMS ........................................................................................................................................................ 13
Data Access Protocols ................................................................................................................................. 14
ODBC ........................................................................................................................................... 14
OLEDB .......................................................................................................................................... 14
OLEDB vs ODBC ........................................................................................................................... 15
JDBC ............................................................................................................................................ 15
ADO.Net Data Provider ............................................................................................................... 15
Indexes ........................................................................................................................................................ 17
Index Architecture & Types ......................................................................................................... 17
Clustered Index ........................................................................................................................... 18
Non-Clustered Index ................................................................................................................... 18
Fill Factor ..................................................................................................................................... 20
Pad Index ..................................................................................................................................... 21
SQL Server Indexes Management Overview
2
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
XML Index.................................................................................................................................... 21
Spatial Index ................................................................................................................................ 22
Database Shrinking ..................................................................................................................................... 23
Shrinking Log Files ....................................................................................................................... 23
Shrinking Data Files ..................................................................................................................... 23
Check Fragmentations ................................................................................................................ 23
Defragmentation ......................................................................................................................... 23
References .................................................................................................................................................. 26
SQL Server Indexes Management Overview
3
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
Author
Ahmad Adel Gad was born in Egypt-Cairo on June, 1980.
He was graduated from faculty of engineering (Civil Engineering Department) on 2003.
He found himself in Software Developing rather than the civil engineering since the first year in the
college as they were teaching them how to write a code since the first year in order to implement
applications to support their work.
After he was graduated from college by two years he decided to shift to the Software Engineering.
On the early 2007, he attended a scholarship for Software Engineering in C#.
On the early of 2008, he had certified as Microsoft Solutions Developer (MCSD) for C# 2003.
On the middle of 2008 he had certified on (Microsoft Certified Trainer) MCT where he got an official
license from Microsoft to deliver Microsoft official courses.
On 2004 he started to develop Desktop Applications for Windows with his partner the CEO Wael
Ibrahim Arafa in his company Arafasoft and his role was the executive manager for the company.
Currently, Ahmad is working as fulltime in the IT field and as freelancer in the following fields as training,
consultation and developing with the two the languages VB.Net and C#.Net:
1. SharePoint Developing/Administration
2. ASP.Net Applications
3. Windows Applications
4. SQL Server Developing/Administration
5. AJAX Implementation (JavaScript/jQuery + XML Web Service)
6. Cryptography/Digital Signature Consultation
Currently, Ahmad is officially certified as:
ITIL Foundation
MCT
MCAD
MCSD
MCPD Enterprise
MCTS - SQL Server 2005
MCPD – SharePoint 2010
MCITP – SharePoint 2010
MCTS - Web Application 3.5
MCTS - Windows Application 3.5
MCTS - ADO.Net 3.5
MCPD - Web Developer 4
MCTS - .NET Framework 4, Windows Applications
MCTS - Programming in HTML5 with JavaScript and CSS3
MCITP - Database Administrator 2008
MCTS - Microsoft SQL Server 2008, Database Development
MCTS - Core Solutions of Microsoft SharePoint Server 2013
MCTS - Advanced Solutions of Microsoft SharePoint Server 2013
SQL Server Indexes Management Overview
4
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
Introduction
This article will handle the SQL database shrinking concept in consideration of the fragmentations after
the shrinking and how rebuild the index to reduce those fragmentations.
But before going in the technical details, this article will take the readers in backward to illustrate the
evolution of the databases and the RDBMS (Relational Database Management System) using the Binary
Tree and B-Tree algorithms which were evaluated to create the modern RDBMS databases with the
required indexes.
Before inventing this magnificent algorithm the computers’ users or the developers used to store their
data in plain text files (flat files) with different formats, and the most popular one is the Comma
Separated Values (CSV).
This method was very poor including the following shortfalls:
1. Not secured and can be accessible by anyone and many simple tools.
2. Unicode characters not allow.
3. Performance as the only way to get some certain data is by reading the whole file sequentially.
4. Relational Model not supported, and to overcome this scenario, they were creating file for each
table and maintaining the consistency was very complicated and kind of impossible.
Therefore, the Binary Tree was a huge revolution of the database concepts while it solved almost most
of the plain text’s short falls. And by developing this algorithm as B-Tree and later B+Tree, the RDBMS
became the common concept with any new database storage system.
SQL Server Indexes Management Overview
5
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
Tree Search Algorithms
The tree methodologies were invented to store and search the data to ease and speeding of the data
query and manipulation (inserting, updating and deleting). It called a tree or a reversed tree, because
the root contains the keys while the tree leafs contain the data.
Binary Tree
The Binary Tree is a tree data structure into two nodes, which each node has at most two
children, which are referred to as the left child and the right child. The following figures are
giving an overview how the data is sorted using the Binary Tree.
The Binary Tree was a smart invention to sort and query the data but it contains a remarkable
disadvantage which is represented by its name (Binary). Each node can be branched only into
two sub nodes and each sub node can be branched only into two nodes until the end of the tree.
SQL Server Indexes Management Overview
6
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
In most cases some nodes can grow more than others and we can get a tree with very short
branches and very long branches. In this case the search could take longer to query some data
rather than others.
B-Tree
In 1972, Rudolf Bayer and Ed McCreight, both working at Boeing at the time, were looking for a
way to overcome some of the shortfalls of the binary tree. Their solution was the B-Tree.
The B-Tree is the evolution of the Binary Tree algorithm which overcomes its disadvantage.
In the B-Tree, each node can be branched into many nodes as required to keep the tree nodes
balanced and symmetric as possible as could be.
Unlike the binary search trees, the B-Tree is optimized for systems that read and write large
blocks of data. It is commonly used in databases and file systems.
The “B” character confused many about the name, some of them where referring to Binary Tree
wrongly, but the most common naming is “Balanced”.
For a long time it was unclear what the "B" in the name represented. Candidates discussed in
public where "Boeing", "Bayer", "Balanced", "Bushy" and others. In 2013, the B-Tree had just
turned 40, Ed McCreight revealed in an interview, that they intentionally never published an
answer to this question. They were thinking about many of these options themselves at the time
and decided to just leave it an open question. The CPU processing of the search for Binary Tree
is the same as B-Tree, but for the memory consuming (RAM) B-Tree consumes less because the
node would be much shorter than the binary one.
B+Tree
B-Trees are a lot more efficient than binary trees when it comes to updates, but some
operations can still turn out expensive, depending on where the node that will hold the new or
updated data lives in the tree. Therefore, another optimization was made to B-Trees to help
with this problem. Instead of treating all nodes equal, the new structure has two types of nodes.
The lowest level nodes, also called leaf nodes, hold the actual data. All other nodes including the
root node only hold the key values and pointers to the next nodes. This type of tree is called a
B+Tree as seen the following figure:
SQL Server Indexes Management Overview
7
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
There are no limitations on the number of key-pointer-pairs or data rows within a node. The
only limitation which is a significant advantage in the same time is that, all leaf nodes have the
same distance from the root node. That means that the work to seek for a particular data point
is always the same, no matter what the key value is. It also keeps updates very localized in the
tree. I might have to move a few rows to a new node during an update but then I probably only
need to change a single parent node to integrate that new node. It is however possible for a
single change to affect every level of the tree, but those changes are rare.
B+Trees in SQL Server
SQL Server stores its indexes in B+Tree format. There are a few exceptions - for example
temporary hash indexes, created during a hash join operation, or column store indexes, which
are not really indexes at all. However, all key-based clustered and non-clustered persisted SQL
Server indexes are organized and stored as B+Trees.
Each node in such a tree is a page in SQL Server terms and you will find two page types in each
index. The first type are the data pages (pages of type 1). Each leaf level node in a SQL Server
B+Tree index is a single data page. The second type are intermediate index pages (pages of type
2). Each node in an index B+Tree that is not a leaf level node is a single page of type 2. Those
pages contain rows just like the data pages. But in addition they contain a pointer for each row
that identifies the next child page. That child page can be either of type 1 or of type 2,
depending on the location in the B+Tree.
SQL Server Indexes Management Overview
8
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
UB-Tree
The UB-Tree as proposed by Rudolf Bayer and Volker Markl is a balanced tree for storing and
efficiently retrieving multidimensional data. It is basically a B+ tree with records stored according
to Z-order, also called Morton order. Z-order is simply calculated by bitwise interlacing the keys.
Insertion, deletion, and point query are done as with ordinary B+ trees. To perform range
searches in multidimensional point data, however, an algorithm must be provided for
calculating, from a point encountered in the data base, the next Z-value which is in the
multidimensional search range.
SQL Server Indexes Management Overview
9
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
SQL Language
History
The first version of SQL (Structured Query Language) was developed at IBM by Donald D.
Chamberlin and Raymond F. Boyce in the early 1970s. This version, initially called SEQUEL
(Structured English Query Language).
SQL was designed to manipulate and retrieve data stored in IBM's original relational database
product, System R. The SQL language was later formally standardized by the American National
Standards Institute (ANSI) in 1986. Subsequent versions of the SQL standard have been released
as International Organization for Standardization (ISO) standards.
The SQL language was standardized and developed with many vendors as illustrated in the
following table:
Source Common Name Full name
ANSI/ISO Standard SQL/PSM SQL/Persistent Stored Modules
Interbase / Firebird PSQL Procedural SQL
IBM DB2 SQL PL SQL Procedural Language (implements SQL/PSM)
IBM Informix SPL Stored Procedural Language
IBM Netezza NZPLSQL (based on Postgres PL/pgSQL)
Microsoft / Sybase T-SQL Transact-SQL
Mimer SQL SQL/PSM SQL/Persistent Stored Module (implements SQL/PSM)
MySQL SQL/PSM SQL/Persistent Stored Module (implements SQL/PSM)
MonetDB SQL/PSM SQL/Persistent Stored Module (implements SQL/PSM)
NuoDB SSP Starkey Stored Procedures
Oracle PL/SQL Procedural Language/SQL (based on Ada)
PostgreSQL PL/pgSQL Procedural Language/PostgreSQL (based on Oracle PL/SQL)
PostgreSQL PL/PSM
Procedural Language/Persistent Stored Modules (implements SQL/PSM)
Sybase Watcom-SQL SQL Anywhere Watcom-SQL Dialect
Teradata SPL Stored Procedural Language
SAP SAP HANA SQL Script
SQL Server Indexes Management Overview
10
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
Language Elements
The SQL language is subdivided into several language elements, including:
Queries, which retrieve the data based on specific criteria. This is an important element
of SQL.
Clauses, which are constituent components of statements and queries. (In some cases,
these are optional.) (Where clauses)
Expressions, which can produce either scalar values, or tables consisting of columns and
rows of data. (Select expression From ….;)
Predicates, which specify conditions that can be evaluated to SQL three-valued logic
(3VL) (true/false/unknown) or Boolean truth values and are used to limit the effects of
statements and queries, or to change program flow.
Statements, which may have a persistent effect on schemata and data, or may control
transactions, program flow, connections, sessions, or diagnostics. (DDL/DCL)
T-SQL Languages
The following languages are popular in any RDBMS SQL language, but we are going to expand
the commands which language in prospective of the Microsoft T-SQL.
DQL (Data Query Language) – Select
DML (Data Manipulating Language) – Insert, Update and Delete
DDL (Data Definition Language) – Create, Alter and Drop
DCL (Data Control Language) – Grant, Deny and Revoke
TCL (Transaction Control Language) – Begin Tran, Commit Tran and Rollback Tran
SQL Server Indexes Management Overview
11
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
Microsoft SQL Server Overview
Microsoft & Sybase
Prior to version 7.0 the code base for MS SQL Server was sold by Sybase SQL Server to Microsoft,
and was Microsoft's entry to the enterprise-level database market, competing against Oracle,
IBM, and, later, Sybase. Microsoft, Sybase and Ashton-Tate originally worked together to create
and market the first version named SQL Server 1.0 for OS/2 (about 1989) which was essentially
the same as Sybase SQL Server 3.0 on Unix, VMS, etc. Microsoft SQL Server 4.2 was shipped
around 1992 (available bundled with IBM OS/2 version 1.3). Later Microsoft SQL Server 4.21 for
Windows NT was released at the same time as Windows NT 3.1. Microsoft SQL Server v6.0 was
the first version designed for NT, and did not include any direction from Sybase.
About the time Windows NT was released in July 1993, Sybase and Microsoft parted ways and
each pursued its own design and marketing schemes. Microsoft negotiated exclusive rights to all
versions of SQL Server written for Microsoft operating systems. (In 1996 Sybase changed the
name of its product to Adaptive Server Enterprise to avoid confusion with Microsoft SQL Server.)
Until 1994, Microsoft's SQL Server carried three Sybase copyright notices as an indication of its
origin.
SQL Server 7.0 and SQL Server 2000 included modifications and extensions to the Sybase code
base, adding support for the IA-64 architecture. By SQL Server 2005 the legacy Sybase code had
been completely rewritten.
Since the release of SQL Server 2000, advances have been made in performance, the client IDE
tools, and several complementary systems that are packaged with SQL Server 2005. These
include:
An extract-transform-load (ETL) tool (SQL Server Integration Services or SSIS)
A Reporting Server
An OLAP and data mining server (Analysis Services)
Several messaging technologies, specifically Service Broker and Notification Services
SQL Server Indexes Management Overview
12
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
SQL Server Release History
Version Year Release Name Code Name Internal Version
1.0 (OS/2) 1989 SQL Server 1.0 (16 bit) Ashton-Tate / Microsoft SQL Server
-
1.1 (OS/2) 1991 SQL Server 1.1 (16 bit)
- -
4.21 (WinNT) 1993 SQL Server 4.21 SQLNT -
6 1995 SQL Server 6.0 SQL95 -
6.5 1996 SQL Server 6.5 Hydra -
7 1998 SQL Server 7.0 Sphinx 515
- 1999 SQL Server 7.0
(OLAP Tools) Palato mania -
8 2000 SQL Server 2000 Shiloh 539
8 2003 SQL Server 2000 Liberty 539
64-bit Edition
9 2005 SQL Server 2005 Yukon 611/612
10 2008 SQL Server 2008 Katmai 661
10.25 2010 Azure SQL DB Cloud Database or CloudDB
-
10.5 2010 SQL Server 2008 R2 Kilimanjaro (aka KJ) 665
11 2012 SQL Server 2012 Denali 706
12 2014 SQL Server 2014 SQL14 782
SQL Server Indexes Management Overview
13
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
DBMS Database Management Systems (DBMS) are computer software applications that interact with the user,
other applications, and the database itself to capture and analyze data. A general-purpose DBMS is
designed to allow the definition, creation, querying, update, and administration of databases. Well-
known DBMSs include MySQL, PostgreSQL, Microsoft SQL Server, Oracle, Sybase and IBM DB2. A
database is not generally portable across different DBMSs, but different DBMS can interoperate by using
standards such as SQL and ODBC or JDBC to allow a single application to work with more than one
DBMS. Database management systems are often classified according to the database model that they
support; the most popular database systems since the 1980s have all supported the relational model as
represented by the SQL language.
RDBMS A Relational Database Management System (RDBMS) is a database management system (DBMS) that is
based on the relational model as invented by E. F. Codd, of IBM's San Jose Research Laboratory.
Many popular databases currently in use are based on the relational database model.
RDBMSs are a common choice for the storage of information in new databases used for financial
records, manufacturing and logistical information, personnel data, and other applications since the
1980s. Relational databases have often replaced legacy hierarchical databases and network databases
because they are easier to understand and use. However, relational databases have been challenged by
object databases, which were introduced in an attempt to address the object-relational impedance
mismatch in relational databases, and XML databases.
In 1974, IBM began developing System R, a research project to develop a prototype RDBMS. Its first
commercial product was SQL/DS, released in 1981.However, the first commercially available RDBMS was
Oracle, released.
In 1979 by Relational Software, now Oracle Corporation.[6] Other examples of an RDBMS include DB2,
SAP Sybase ASE, and Informix.
SQL Server Indexes Management Overview
14
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
Data Access Protocols
Any DBMS requires a connectivity driver/provider to be installed on the client’s machine.
This is very close to the PC hardware drivers when it is required to install them into the PC in order to be
able to connect the driver. Ex, the printer was coming in the past with Parallel Port connection and later
with USB port connection, and in order to prepare a PC to connect it, the proper driver get to be
installed compatible with the connected port.
The same with the DBMS, the prober driver/provider get to be installed on the client device to connect
to the target DBMS.
We will list the common standard known providers to the developers:
ODBC
In computing, ODBC (Open Database Connectivity) is a standard programming language
middleware API for accessing database management systems (DBMS). The designers of ODBC
aimed to make it independent of database systems and operating systems. An application
written using ODBC can be ported to other platforms, both on the client and server side, with
few changes to the data access code.
ODBC accomplishes DBMS independence by using an ODBC driver as a translation layer between
the application and the DBMS. The application uses ODBC functions through an ODBC driver
manager with which it is linked, and the driver passes the query to the DBMS. An ODBC driver
can be thought of as analogous to a printer or other driver, providing a standard set of functions
for the application to use, and implementing DBMS-specific functionality. An application that can
use ODBC is referred to as "ODBC-compliant". Any ODBC-compliant application can access any
DBMS for which a driver is installed. Drivers exist for all major DBMSs, many other data sources
like address book systems and Microsoft Excel, and even for text or CSV files.
ODBC was originally developed by Microsoft during the early 1990s, and became the basis for
the Call Level Interface (CLI) standardized by SQL Access Group in the Unix and mainframe world.
OLEDB
OLE DB (Object Linking and Embedding Database), sometimes written as OLEDB or OLE-DB.
It is an API designed by Microsoft, allows accessing data from a variety of sources in a uniform
manner. The API provides a set of interfaces implemented using the Component Object Model
(COM); it is otherwise unrelated to OLE. Microsoft originally intended OLE DB as a higher-level
replacement for, and successor to, ODBC, extending its feature set to support a wider variety of
non-relational databases, such as object databases and spreadsheets that do not necessarily
implement SQL.
SQL Server Indexes Management Overview
15
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
OLEDB vs ODBC
The differences between these two data access protocols are technical, but in the most general
terms, OLE DB is newer, more advanced, and compatible with a wider selection of database
technologies. In addition, OLE DB is more general, in that it includes the ODBC functionality.
Technically speaking, ODBC (Open Database Connectivity) is designed to provide access primarily
to SQL data in a multi-platform environment. OLE DB (Object Linking and Embedding Database)
is designed to provide access to all types of data in an OLE Component Object Model (COM)
environment. OLE DB includes the SQL functionality defined in ODBC but also defines interfaces
suitable for gaining access to data other than SQL data.
A query generated via ODBC is functionally similar to a query generated by OLE-DB. However, be
aware that if you generate a query (via STATISTICA Query) using an ODBC connection,
STATISTICA uses OLE DB to connect to the ODBC connection. Because there is an intermediate
connection, ODBC queries may be slower than OLE DB queries.
JDBC
JDBC is a Java Database Connectivity technology (Java Standard Edition platform). This
technology is an API for the Java programming language that defines how a client may access a
database. It provides methods for querying and updating data in a database. JDBC is oriented
towards relational databases. A JDBC-to-ODBC bridge enables connections to any ODBC-
accessible data source in the JVM host environment.
Sun Microsystems released JDBC as part of JDK 1.1 on February 19, 1997. Since then it has been
part of the Java Standard Edition.
The JDBC classes are contained in the Java package java.sql and javax.sql.
ADO.Net Data Provider
An ADO.NET data provider is a software component that interacts with a data source comes with
the Microsoft .Net Framework. ADO.NET data providers are analogous to ODBC drivers, JDBC
drivers, and OLE DB providers.
ADO.NET providers can be created to access such simple data stores as a text file and
spreadsheet, through to such complex databases as Oracle Database, Microsoft SQL Server,
MySQL, PostgreSQL, SQLite, IBM DB2, Sybase ASE, and many others. They can also provide
access to hierarchical data stores such as email systems.
However, because different data store technologies can have different capabilities, every
ADO.NET provider cannot implement every possible interface available in the ADO.NET
standard.
SQL Server Indexes Management Overview
16
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
Starting from Microsoft .Net Framework v1.1, the developers don’t need to install any extra
API’s on the clients’ machines to connect to both SQL Server and Oracle Database Servers.
Just one important remark, the Microsoft ADO.Net Provider can connect to Microsoft SQL
Server v7.0 and higher. If the developers want to connect to Microsoft SQL Server 6.5 or lower,
they get to use the ODBC or OLEDB connection.
SQL Server Indexes Management Overview
17
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
Indexes
A database index is a data structure that improves the speed of data retrieval operations on a database
table at the cost of additional writes and storage space to maintain the index data structure.
Indexes are used to quickly locate data without having to search every row in a database table every
time a database table is accessed.
Indexes can be created using one or more columns of a database table, providing the basis for both
rapid random lookups and efficient access of ordered records.
An index is a copy of select columns of data from a table that can be searched very efficiently that also
includes a low-level disk block address or direct link to the complete row of data it was copied from.
Index Architecture & Types
The data generally stored into records and each record cannot exceed the size of 8 KB (except
the very large data such as Binary, VarChar(Max), nVarChar(Max), Image, etc which are stored in
different pages with different structure). The records in turn are stored in pages and each page
cannot exceed the size 8 KB. The pages are grouped in extents while each extent can contain up
to 8 pages. That means each extent cannot exceed 64 KB.
The data in a table without any index created, are located in a logical partition calls Heap, while
with the index structure, that data is stored in the Leaf Level (LL), and the index definition keys
are stored in the Intermediate Level (IL) which is calls also as Non-Leaf Level.
If the table doesn’t contain any index (Clustered or Non-Clustered), the insert of the data would
be very fast because all the new records are added sequentially into the heap without taking
time to reorder them based on a sorting criteria such as in the case of the indexes. But in the
other hand, the search, update or delete would be very slow in comparing with the index
because it needs to scan the whole pages in the heap to retrieve the required data, update them
or delete them.
In SQL Server databases, each table each table can only contain one Clustered Index and up to
249 or 999 of Non-Clustered Indexes based on the SQL version.
Additional two types of indexes (XML and Spatial) were added initiated by SQL Server 2005 and
SQL Server 2008 respectively. The following points are representing the four types of indexes
(Clustered, Non-Clustered, XML and Spatial).
SQL Server Indexes Management Overview
18
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
Clustered Index
The cluster index alters the data block into a certain distinct order to match the index, resulting
in the row data being stored physically in order for a specific column or combined with multiple
column which calls composite index. Therefore, only one clustered index can be created on a
given database table. Clustered indices can greatly increase overall speed of retrieval. Usually it
comes with the primary key index but not a necessary.
If a database table contains a clustered index, there won’t be an existing for heap because all the
data would be restructured on the Leaf Level with the extension of the intermediate level (Non-
Leaf Level).
The following figure illustrates the process of finding rows in a Clustered Index:
Syntax:
Create Clustered Index Index_Name On Table_Name(Col1, Col2, …);
Create Unique Clustered Index Index_Name On Table_Name(Col1, Col2, …);
Non-Clustered Index
The data is present in arbitrary order, but the logical ordering is specified by the index. The data
rows may be spread throughout the table regardless of the value of the indexed column or
expression. The non-clustered index tree contains the index keys in sorted order, with the leaf
level of the index containing the pointer to the record (page and the row number in the data
page in page-organized engines; row offset in file-organized engines).
SQL Server Indexes Management Overview
19
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
The database table can contain one or more Non-Cluster index plus Clustered Index or without.
In case of a Clustered Index plus Non-Clustered Index, the data exists in the leaf level of the
Clustered Index while the Non-Clustered Index is creating a tree which is pointing into the data
in the Leaf Level of the clustered index as represented the next figure:
In case of a Non-Clustered Index without a Clustered Index, the data will be still lying in the
heap while the Non-Clustered Index’s is pointing to the data into the heap.
The next figure illustrates the process of finding rows in a Heap with a Non-Clustered Index:
SQL Server Indexes Management Overview
20
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
Syntax:
Create NonClustered Index Index_Name On Table_Name(Col1, Col2, …);
Create Unique NonClustered Index Index_Name On Table_Name(Col1, Col2, …);
Note: If the index type (Clustered/NonClustered) not mentioned whether it is unique or not in
the syntax, the index is created as “NonClustered”.
Fill Factor
The Fill Factor is the defined ratio of filling the data into each page in the leaf level with each
index created.
If no Fill Factor is defined within an index, the page is filled with 100% of its capacity (8 KB).
Note: If the Fill Factor is defined as 0%, it would be the same respect of 100% which means all
the pages is filled with data.
It is very significant in performance to specify a fill factor to keep reserved empty blank space in
each page for updating data or deleting data. If the page is already 100% full and new item
should be inserted into this page, the page would be split in order to hold the new record based
on its order, while if it was having a predefined fill factor, the record would have a place to be
stored in. Defining the fill factor ratio should be calculated based on the amount of data
manipulation (insert, update or delete).
The disadvantage of the Fill factor, that is preserve extra disk space and it increases with the
increase of the Fill factor ratio, therefore, it must be revised wisely.
In SharePoint Content Databases, the fill factor is 80%.
SQL Server Indexes Management Overview
21
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
It is healthy to rebuild the index from time to other to organize the data into the pages based on
the defined fill factor which could increase the data base size on the disk but in return, it
maintains the performance.
Syntax:
Create Clustered /NonClustered Index Index_Name On Table_Name(Col1, Col2, …) With
FillFactor = Fill_Ratio_As_Integer;
Create Unique Clustered /NonClustered Index Index_Name On Table_Name(Col1, Col2, …) With FillFactor = Fill_Ratio_As_Integer;
Pad Index
The Pad Index is optional keyword defined with the index creation statement to preserve the
same defined Fill Factor value to the Intermediate Level pages (Non-Leaf Level) as well;
otherwise, it will be 100% by default. (The intermediate Level fill factor cannot be different than
the leaf level fill factor).
Syntax:
Create Clustered /NonClustered Index Index_Name On Table_Name(Col1, Col2, …) With
Pad_Index, FillFactor = Fill_Ratio_As_Integer;
Create Unique Clustered /NonClustered Index Index_Name On Table_Name(Col1, Col2, …) With Pad_Index, FillFactor = Fill_Ratio_As_Integer;
XML Index
New index type was initiated by SQL Server 2005. XML indexes can be created on xml data type
columns. They index all tags, values and paths over the XML instances in the column and benefit
query performance. Your application may benefit from an XML index in the following situations:
Queries on XML columns are common in your workload. XML index maintenance cost
during data modification must be considered.
Your XML values are relatively large and the retrieved parts are relatively small. Building
the index avoids parsing the whole data at run time and benefits index lookups for
efficient query processing.
XML indexes fall into the following categories:
Primary XML index
Secondary XML index
The first index on the xml type column must be the primary XML index. Using the primary XML
index, the following types of secondary indexes are supported: PATH, VALUE, and PROPERTY.
Depending on the type of queries, these secondary indexes might help improve query
performance.
SQL Server Indexes Management Overview
22
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
Spatial Index
Starting from SQL Server 2008 and higher are supporting spatial data and spatial indexes. A
spatial index is a type of extended index that allows you to index a spatial column. A spatial
column is a table column that contains data of a spatial data type, such as geometry or
geography.
The following table is representing the maximum indexes count per table:
Version Columns/
Table Columns/Index Key
Clustered Indexes
Non-Clustered Indexes
XML Indexes
SQL Server 2005 1024 16 1 249 249
SQL Server 2008 1024* 16 1 999 249
SQL Server 2008 R2 1024* 16 1 999 249
SQL Server 2012 1024* 16 1 999 249
SQL Server 2014 1024* 16 1 999 249
* In wide tables which is initiated by SQL Server 2008, it could take up to 30,000 columns per
table and also up to 1,000 indexes but with maximum tow size of 8KB such as the non-wide
table.
SQL Server Indexes Management Overview
23
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
Database Shrinking
When the database is consuming much space on the hard-disk with much unallocated data, we can think
seriously on database shrinking which is trimming the unallocated data on the disk.
The unallocated data is caused by the much manipulation operations such as the data updating and
deleting; because if the data is moved from a page to another due to an update or even delete, the
reserved space on the disk is not freed up, furthermore, it may increase the index fragmentation.
Shrinking Log Files
The log can truncated with the Transaction Log backups. Otherwise, to shrink a log file, the
recovery model of database mush be set to “Simple” first and then run the “DBCC SHRINKFILE”
command.
Syntax:
USE DatabaseName;
GO
DBCC SHRINKFILE (LogFileName);
or
DBCC SHRINKFILE (LogFileName, NewFileSize_in_MB);
Shrinking Data Files
The data files can be shrunk with any recovery model using the “DBCC SHRINKFILE”:
Syntax:
USE DatabaseName;
GO
DBCC SHRINKFILE (DataFileName);
or
DBCC SHRINKFILE (DataFileName, NewFileSize_in_MB);
To avoid the data file size expansion after the defragmentation (index rebuilding) which would
be performed after the shrinking, always specify the new file size which is equal or greater than
the current allocated size with extra few gigabytes.
Check Fragmentations
You can always check the database fragmentation using the following T-SQL query:
Defragmentation
Use Database_Name;
Go
SELECT (Sch.Name + '.[' + T.Name + ']') As TableName, I.Name AS IndexName,
INS.index_type_desc As IndexType, Fill_Factor, Page_Count,
INS.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) As INS
INNER JOIN sys.indexes I ON I.object_id = INS.object_id AND I.index_id = INS.index_id
Inner Join sys.Tables As T On T.object_id = INS.object_id
Inner Join sys.Schemas As Sch On T.Schema_ID = Sch.Schema_ID
WHERE I.Name IS Not Null And INS.avg_fragmentation_in_percent > 0
ORDER BY INS.avg_fragmentation_in_percent DESC;
SQL Server Indexes Management Overview
24
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
The defragmentation is done basically by rebuilding the index or reorganizing it using the Alter
Index command which would replace the usage of the two commands “DBCC INDEXDEFRAG”
and “DBCC DBREINDEX” in a future release of Microsoft SQL.
Defragmentation can be performed by rebuilding (if fragmentation is above 30%) and
reorganizing (if fragmentation is equal to or less than 30%) all the indexes in the database using
one of the following attached T-SQL scripts:
1. “Defrag_Database.sql” can be executed this script direct using the parameters listed on
the underneath table.
2. “JSP_Defrag_Database - Stored Procedure.sql” can be executed once per a time to create
the stored procedure in the memory until the current query session is closed. This
Stored Procedure can be used to defrag multiple databases in the same opened query
session using the following syntax:
JSP_Defrag_Database - Stored Procedure.sql
Defrag_Database.sql
Parameter Name Required/Optional
Data Type Default Value
Values Range
Description
@DataBase Optional nVarChar(1000) Null N/A The target database to be defragged. If not set, it would defrag the current connected database.
@FillFactor Optional SmallInt -1 0 - 100 The fill factor ratio; between 0 to 100
while 0 is the same respects as 100.
@Online Optional VarChar(3) On On | Off
If set to “On”, the database would be kept in the “Online” mode, if set to “Off” the database would be set to the “Offline” mode.
@XMLOnlineMode Optional VarChar(10) Reorg Off | Reorg
Both XML and Spatial indexes are not supported in rebuilding with “Online”
mode (@Online = ‘On’), therefore, it must be to set wither to “Off” to be rebuilt with “Offline” mode, or “Reorg” to be reorganized instead of the rebuilding.
@SpatialOnlineMode Optional VarChar(10) Reorg Off | Reorg
Execute ##JSP_Defrag_Database @DataBase = 'AdventureWorks2008R2';
SQL Server Indexes Management Overview
25
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
Important Remarks:
1. The “Online = On” option will not work with the following cases:
Non Enterprise Edition of SQL.
The index is an XML index.
The index is a Spatial index.
The index is on a local temp table.
The index is clustered and the table contains a LOB database column(s).
The index is not clustered and the index itself contains a LOB database
column(s).
2. Do not execute the above script on Microsoft SQL Server 2012 or Microsoft SQL Server 2014
with “Online” option.
If using SQL Server 2012 or SQL Server 2014, mind there is a bug where the online index rebuild
can cause index corruption or data loss when it is used together with concurrent queries that
modify many rows.
Apply a CU that includes hotfix http://support.microsoft.com/kb/2969896.
For SQL Server 2012 SP1, apply at least CU11.
For SQL Server 2012 SP2, apply at least CU1.
For SQL Server 2014, apply at least CU2.
The following T-SQL script is useful to get all the info about the current installed SQL:
SELECT
SERVERPROPERTY('ServerName') AS ServerName,
SERVERPROPERTY('MachineName') AS MachineName,
SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS CurrentActiveClusterMachine,
SERVERPROPERTY('InstanceName') AS InstanceName,
SERVERPROPERTY('ProductVersion') AS ProductVersion,
SERVERPROPERTY('ProductLevel') AS ProductLevel,
SERVERPROPERTY('Edition') AS Edition,
SERVERPROPERTY('EngineEdition') AS EngineEdition,
SERVERPROPERTY('LicenseType') AS LicenseType,
SERVERPROPERTY('Collation') AS Collation,
SERVERPROPERTY('BuildClrVersion') AS BuildClrVersion;
SQL Server Indexes Management Overview
26
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
References
http://sqlity.net/en/2445/b-plus-tree/
https://technet.microsoft.com/en-us/magazine/2009.07.sqlbackup.aspx
https://msdn.microsoft.com/en-us/library/ms188388.aspx
https://msdn.microsoft.com/en-us/library/ms190981.aspx
https://msdn.microsoft.com/en-us/library/ms181671.aspx
https://msdn.microsoft.com/en-us/library/ms177571.aspx
https://technet.microsoft.com/en-us/magazine/2009.07.sqlbackup.aspx
http://en.wikipedia.org/wiki/Database_index
http://en.wikipedia.org/wiki/Transact-SQL
http://en.wikipedia.org/wiki/SQL
http://sqlity.net/en/2462/intermediate-index-pages/
http://sqlity.net/en/563/index-misconceptions-tsql-tuesday-026-second-chances/
https://www.simple-talk.com/sql/learn-sql-server/sql-server-index-basics/
http://use-the-index-luke.com/sql/anatomy/the-tree
http://en.wikipedia.org/wiki/Tree_%28data_structure%29
http://en.wikipedia.org/wiki/Binary_tree
http://en.wikipedia.org/wiki/B-tree
http://en.wikipedia.org/wiki/Sybase
http://en.wikipedia.org/wiki/Microsoft_SQL_Server
http://en.wikipedia.org/wiki/Sybase
http://en.wikipedia.org/wiki/OLE_DB
http://en.wikipedia.org/wiki/Open_Database_Connectivity
http://en.wikipedia.org/wiki/Java_Database_Connectivity
http://en.wikipedia.org/wiki/Database
http://en.wikipedia.org/wiki/Relational_database_management_system
http://documentation.statsoft.com/STATISTICAHelp.aspx?path=Query/StatQuery/FAQ/WhatIstheDifferencebe
tweenODBCandOLEDB
http://sqlmag.com/sql-server/ole-db-or-odbc
http://en.wikipedia.org/wiki/ADO.NET_data_provider
https://msdn.microsoft.com/en-us/library/system.data.oracleclient%28v=vs.71%29.aspx
https://msdn.microsoft.com/en-us/library/system.data.sqlclient%28v=vs.110%29.aspx
http://blogs.msdn.com/b/blogdoezequiel/archive/2011/07/03/adaptive-index-defrag.aspx#.VRLtM3vMhWw
http://support.microsoft.com/kb/2969896
https://msdn.microsoft.com/en-us/library/ms143432%28v=sql.90%29.aspx
https://msdn.microsoft.com/en-us/library/ms143432%28v=sql.100%29.aspx
https://msdn.microsoft.com/en-us/library/ms143432%28v=sql.105%29.aspx
https://msdn.microsoft.com/en-us/library/ms143432%28v=sql.105%29.aspx
https://msdn.microsoft.com/en-us/library/ms143432.aspx
https://msdn.microsoft.com/en-us/library/ms191497%28v=sql.105%29.aspx
https://technet.microsoft.com/en-us/library/bb964712%28v=sql.105%29.aspx
SQL Server Indexes Management Overview
27
Ahmad Adel Gad ([email protected])
Mobile: +20-10-02501021 | +353-83-8489055 | Skype ID: eng.ahmad.adel | JemPress Blog WordPress Blog LinkedIn Twitter Google+
https://msdn.microsoft.com/en-us/library/bb934196.aspx
https://msdn.microsoft.com/en-us/library/bb895265.aspx
https://technet.microsoft.com/en-us/library/ms186986%28v=sql.100%29.aspx
https://msdn.microsoft.com/en-us/library/ms173760.aspx
http://en.wikipedia.org/wiki/UB-tree
http://www.scholarpedia.org/article/B-tree_and_UB-tree
http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CCwQFjAC&url=http%3A%2F%
2Fwww.vldb.org%2Fconf%2F2000%2FP263.pdf&ei=DusVVePmDMTiapCtgmg&usg=AFQjCNETDk6oAlHSAY36Xj
W9v_ylryPorg&sig2=OX_VuUsxUW4RKI32a7Ic9w&bvm=bv.89381419,d.d24&cad=rja
https://www.simple-talk.com/sql/t-sql-programming/sql-server-spatial-indexes/