Learn about SQL Server and its Principles – SQL Tutorial Part 5

B-tree

In SQL Server 2005, all indexes are structured in the form of B-Trees. A B-Tree structure can be visualized as an inverted tree with roots right at the top splitting into branches and then into leaves right at the bottom.

In a B-Tree structure, there is a single root node at the top. This node then branches out into the next level, known as the first intermediate level. The nodes at the first intermediate level can branch out further. This branching can continue into multiple intermediate levels and then finally the leaf level. The nodes at the leaf level are known as the leaf nodes.

Index B-tree Structure

In the B-Tree structure of an index, the root node consists of an index page. This index page contains pointers that point to the index pages present in the first intermediate level. These index pages in turn point to the index pages present in the next intermediate level. There can be multiple intermediate levels in an index B-Tree. The leaf nodes of the index B-Tree have either data pages containing data rows or index pages containing index rows that point to data rows.

Root Node

Contains an index page with pointers pointing to index pages at the first intermediate level.

Intermediate Nodes

Contain index pages with pointers pointing either to index pages at the next intermediate level or to index or data pages at the leaf level.

Leaf Nodes

Contain either data pages or index pages that point to data pages.

A data page containing index entries is called an index page.

Index Architecture

In SQL Server 2005, data in the database can be stored either in a sorted manner or at random. If data is stored in a sorted manner, the data is said to be present in a clustered structure. If it is stored at random, it is said to be present in a heap structure.

Heap Structures

In a heap structure, the data pages and records are not arranged in sorted order. The only connection between the data pages is the information recorded in the Index Allocation Map (IAM) pages.

In SQL Server 2005, IAM pages are used to scan through a heap structure. IAM pages map extents that are used by an allocation unit in a part of a database file.

You can read a heap by scanning the IAM pages to look for the extents that contain the pages for that heap.

If an allocation unit contains extents from more than one file, there will be multiple IAM pages linked together in an IAM chain to map these extents.

Partitioning of Heap Structures

A table can be logically divided into smaller groups of rows. This division is referred to as partitioning. Tables are partitioned in order to carry out maintenance operations more efficiently. By default, a table has a single partition.

When partitions are created in a table with a heap structure, each partition will contain data in an individual heap structure. For example, if a heap has three partitions, then there are three heap structures present, one in each partition.

Clustered Indexes

A clustered index causes records to be physically stored in a sorted or sequential order. Thus, a clustered index determines the actual order in which data is stored in the database. Hence, you can create only one clustered index in a table.

Uniqueness of a value in a clustered index is maintained explicitly using the UNIQUE keyword or implicitly using an internal unique identifier.

Before you create a clustered index, you need to make sure the free space in your system is at least 1.2 times the amount of data in the table.

Clustered Index as a B-tree

In SQL Server 2005, a clustered index is organized in the form of a B-tree. The root node of the B-tree contains an index page with pointers that point to the index pages at the first intermediate level. These index pages in turn have pointers that point to the index pages at the next intermediate level. The index pages at the last intermediate level have pointers that point to actual data rows in the data pages present at the leaf level of the index.

The size of a clustered index is about five percent of the table size. However, it varies depending on the size of the indexed column.

Accessing Data with a Clustered Index

A clustered index can be created on a table using a column without duplicate values. This index reorganizes the records in the sequential order of the values in the index column.

Clustered indexes are used to locate a single row or a range of rows. Starting from the first page of the index, the search value is checked against each key value on the page. When the matching key value is found, the database engine moves to the page indicated by that value. The desired row or range of rows is then accessed.

Clustered indexes are useful for columns that are searched frequently for key values or are accessed in sorted order.

Guidelines

A clustered index is automatically created on a table when a primary key is defined on the table. In a table without a primary key column, a clustered index should ideally be defined on:

Key columns that are searched on extensively.
Columns used in queries that return large result sets.
Columns having unique data.
Columns used in table joins.

Two or more tables can be logically joined through columns that are common to the tables. Data can then be retrieved from these tables as if they were a single table.

Nonclustered Indexes

A nonclustered index is defined on a table that has data either in a clustered structure or a heap. Nonclustered index will be the default type if an index is not defined on a table. Each index row in the nonclustered index contains a nonclustered key value and a row locator. This row locator points to the data row corresponding to the key value in the table.

Nonclustered indexes have a similar B-tree structure as clustered indexes but with the following differences:

The data rows of the table are not physically stored in the order defined by their nonclustered keys.
In a nonclustered index structure, the leaf level contains index rows.

Guidelines

Nonclustered indexes are useful when you require multiple ways to search data. There are some facts and guidelines to be considered before creating a nonclustered index.

When a clustered index is re-created or the DROPJEXISTING option is used, SQL Server rebuilds the existing nonclustered indexes.
A table can have up to 249 nonclustered indexes.
Create clustered index before creating a nonclustered index.

Differences

Clustered and nonclustered indexes are different in terms of their architecture and their usefulness in query executions. The table highlights the differences between clustered and nonclustered indexes:

XML Indexes

SQL Server 2005 has introduced a new data type namely, xml. This data type is used to store XML documents and fragments. An XML fragment is an XML instance that has a single top-level element missing.

Due to the large size of XML columns, queries that search within these columns can be slow. You can speed up these queries by creating an XML index on each column. An XML index can be a clustered or a nonclustered index.

The size of all the data stored in an xml type column cannot exceed 2 Gigabytes (GB).

Types of XML Indexes

XML indexes can be created on a table only if there is a clustered index based on the primary key of the table. This primary key cannot exceed 15 columns.

There are two types of XML indexes:

Primary XML Indexes
Secondary XML Indexes

Primary XML Indexes

The process of carrying out queries within an XML column can sometimes be slow. A primary XML index is created on each XML column to speed up these queries. It is a special index that shreds the XML data to store information.

Secondary XML Indexes

Secondary XML indexes are specialized XML indexes that help with specific XML queries. These include:

Searching for values anywhere in the XML document.
Retrieving particular object properties from within an XML document.

Secondary XML indexes can only be created on columns that already have a primary XML index.

Test Your Knowledge

True or False?

(A) A clustered index reorganizes data in the table.

(B) A nonclustered index stores data rows at the leaf level.

(D) A clustered index provides faster retrieval of queries that return large result sets.

(E) A clustered index stores index pages at the leaf level of the index.

Answer

(A) True

(B) False

(D) True

(E) False

Read Learn about SQL Server and its Principles Part 4

Learn about SQL Server and its Principles – SQL Tutorial Part 5

B-tree

Index B-tree Structure

Root Node

Intermediate Nodes

Leaf Nodes

Index Architecture

Heap Structures

Partitioning of Heap Structures

Clustered Indexes

Clustered Index as a B-tree

Accessing Data with a Clustered Index

Guidelines

Nonclustered Indexes

Guidelines

Differences

XML Indexes

Types of XML Indexes

Primary XML Indexes

Secondary XML Indexes

Test Your Knowledge

Answer

Leave a Reply Cancel reply