SQL Tutorial – Learn about SQL Server and its Principles – Part 4
SQL Tutorial Part 4
Welcome to the module Introduction to Indexes. Indexes are created for faster retrieval of data. Sorted data is more easily accessible than unsorted data. Hence, queries on indexed tables are executed faster and with lesser strain on system resources.
In this module, you will learn about:
- Introduction to Indexes
- Index Architecture
- Retrieving Stored Data
- Finding Rows
Towards the end of the module, there are demonstrations and/or simulations for reinforcing the theoretical concepts.
A book contains pages, which contain paragraphs made up of sentences. Similarly, SQL Server 2005 stores data in storage units known as data pages. These pages contain data in the form of rows.
In SQL Server 2005, the size of each data page is S kilobytes (KB). Thus, SQL Server databases have 12S data pages per megabyte (MB) of storage space.
A page begins with a 96-byte header, which stores system information about the page. This information includes:
- Page number
- Page type
- Amount of free space on the page
- Allocation unit ID of the object to which the page is allocated
A data page is the smallest unit of data storage. An allocation unit is a collection of data pages grouped together based on the page type, This grouping is done for efficient management of data.
All input and output operations in the database are performed at the page level. This means that the database engine reads or writes data pages. A set of eight contiguous data pages is referred to as an extent.
SQL Server 2005 stores data pages in files known as data files. The space allotted to a data file is divided into sequentially numbered data pages. The numbering starts from zero.
There are three types of data files in SQL Server 2005. These are:
- Primary Data Files
- Secondary Data Files
- Log Files
Primary Data Files
A primary data file is automatically created at the time of creation of the database. This file has references to all other files in the database. The recommended file extension for primary data files is .mdf.
Secondary Data Files
Secondary data files are optional in a database and can be created to segregate database objects such as tables, views, procedures etc. The recommended file extension for secondary data files is .ndf.
Log files contain information about modifications carried out in the database. This information is useful in recovery of data in contingencies such as sudden power failure or the need to shift the database to a different server. There is at least one log file for each database. The recommended file extension for log files is .ldf.
Need for Indexes
To search a topic in a book, you normally locate the topic in the index and straightaway turn to the given page number. Thus, the required information is quickly found without having to read the entire book.
Similarly, to facilitate quick retrieval of data from a database, SQL Server 2005 provides the indexing feature. Just like in a book, an index in SQL Server 2005 database contains information that allows you to find specific data without scanning through the entire table.
In a table, records are stored in the order in which they are entered. Their storage in the database is unsorted. When data is to be retrieved from such tables, the entire table needs to be scanned. This slows down the query retrieval process. To speed up query retrieval, indexes are created.
When an index is created on a table, the index creates an order for the data rows or records in the table. This assists in faster location and retrieval of data during searches.
Guidelines about Indexes
Indexes point to the location of a row on a data page instead of searching through the table. Consider the following facts and guidelines about indexes:
- Indexes increase the speed of queries that join tables or perform sorting operations.
- Indexes implement the uniqueness of rows if defined when you create an index.
- Indexes are created and maintained in ascending or descending order.
In a telephone directory, where a large amount of data is stored and is frequently accessed, the storage of data is done in an alphabetical order, If such data was unsorted, it would be nearly impossible to search for a specific telephone number.
Similarly, in a database table having a large number of records that are frequently accessed, records need to be sorted for fast retrieval. When an index is created on the table, the index either physically or logically sorts the records. Thus, searching for a specific record becomes faster and there is less strain on system resources.
Accessing Data Group-wise
Indexes are useful when data needs to be accessed group-wise. For example, you want to make modifications to the conveyance allowance for all employees based on the department they work in. Here you wish to make the changes for all employees in one department before moving on to employees in another department. In this case, an index can be created on the department column before accessing the records.
This index will create logical chunks of data rows based on the department. This again will limit the amount of data actually scanned during query retrieval. Hence, retrieval will be faster and there will be less strain on system resources.
When not to create an index
Indexes are useful to retrieve data at a faster rate, but they consume disk space and require maintenance. There are certain conditions where indexes are not essential.
- Indexes should not be created if they are not used frequently since, maintaining them requires time and resources.
Indexes should not be created on columns having duplicate data.