How to Use Covering Index to reduce RID lookup

How to Use Covering Index to reduce RID lookup in SQL Server

In this article, I am going to discuss how to use Covering Index to reduce RID lookup in SQL Server which will improve the SQL Query performance in detail. Please read our previous article where we discussed when to choose Table Scan and when to choose Seek-Scan in SQL Server to get better performance.

What is the Row Identifier (RID) lookup?

The Row Identifier (RID) lookup in SQL Server is basically a physical operator and this physical operator only comes into picture when there is a non-clustered index created on the table. So, if there is no non-clustered index in the table, then this Row Identifier (RID) lookup physical operator will not come into the picture.

When RID Lookup comes into the picture, then there is a performance degrade in the SQL Query. So, as part of this article, we are going to discuss the following pointers.

  1. What are non-clustered indexes in SQL Server?
  2. How RID lookup degrade the performance of the SQL Query?
  3. How Covering Index helps us to reduce the RID Lookup performance issues?
Non-Clustered Index in SQL Server:

Both clustered and non-clustered index follows the same B-Tree structure. That is both having the same root node, non-leaf node, and the leaf node. The only difference is how the leaf node works. The clustered index leaf node actually pointing to the actual data on the hard disk. On the other hand, the non-clustered index leaf node has a Row ID (RID) and depending upon the situations, this Row ID (RID) pointing to different things. Please have a look at the following to understand this better.

Non-Clustered Index in SQL Server

Situation1:

If your table does not have a clustered index, the Row ID (RID) of the non-clustered index pointing to the heap table. The heap table means a table without indexes. In the heap table, it will search row by row until it finds the data.

Situation2:

If your table has a clustered index, then the Row ID (RID) of the non-clustered index will point towards the clustered index key and that indexed key is used to search the data.

Understand Row Identifier (RID) in the non-clustered index:

Let us understand the Row Identifier (RID) in SQL Server with an example. We are going to use the following Customer table to understand this concept.

How to Use Covering Index to reduce RID lookup in SQL Server

Please use the below SQL Script to create and populate the Customer table with data.

CREATE TABLE Customer(
 [CustomerID] INT NOT NULL,
 [CustomerCode] VARCHAR(20) NOT NULL,
 [CustomerName] VARCHAR(50) NOT NULL
)
GO

INSERT INTO Customer VALUES (1, 'CUS_101', 'David')
INSERT INTO Customer VALUES (4, 'CUS_104', 'Taylor')
INSERT INTO Customer VALUES (6, 'CUS_106', 'Smith')
INSERT INTO Customer VALUES (3, 'CUS_103', 'Sara')
INSERT INTO Customer VALUES (5, 'CUS_105', 'Pam')
INSERT INTO Customer VALUES (2, 'CUS_102', 'John')

At this moment, the Customer table does not have any keys or indexes in any of the columns. The Customer Code field is going to hold unique values for each customer. So, let us add a unique non-clustered index on the Customer Code column by executing the following query.

CREATE UNIQUE NONCLUSTERED INDEX IX_Customer_Code ON Customer(CustomerCode ASC);

Once you created the non-clustered index, let us fire the following select query and see the estimated query execution plan.

SELECT * FROM Customer WHERE CustomerCode = ‘CUS_101’;

Once you execute the above select query, now let us see the Display Estimated Execution Plan either by clicking on the Display Estimated Execution Plan button or simply by pressing (Ctrl + L) keys which should display the execution query plan window as shown below.

Non-clustered Index Leaf Node points to RID lookup of Heap Table

In the above image, you can see something called RID Lookup (Heap). So, whenever your table does not have any clustered index, then the leaf node of the non-clustered index will point to this RID Lookup in the Heap table. That’s what we already discussed.

Now, let us see what will happen when we create a clustered index on the CustomerID column and execute the same SQL query.

Creating a Clustered Index:

Create a clustered index on the CustomerID column by executing the following SQL Script. As the CustomerID is the ID column of the Customer Table and as it is going to contain unique values for each customer, so here, we are creating a unique clustered index.

CREATE UNIQUE CLUSTERED INDEX IX_Customer_ID ON Customer(CustomerID ASC);

Once you created the above unique clustered index, let us fire the same SQL Query and see the estimated query execution plan.

SELECT * FROM Customer WHERE CustomerCode = ‘CUS_101’;

Once you execute the above query, now let us see the Display Estimated Execution Plan either by clicking on the Display Estimated Execution Plan button or simply by pressing (Ctrl + L) keys which should display the execution query plan window as shown below.

Non-Clustered Index Leaf Node Pointing towards Clustered Index Key

Now you can see, it is pointing towards the Clustered Index Key and this is because of the clustered index in the table. So, we have seen how the query plan changes when we have a non-clustered with and without a clustered index in SQL Server.

Now, the question that should come to your mind is where is the problem?

The problem is in the lookup jump. The non-clustered index leaf node contains the Row ID (RID) and this RID needs to jump to the heap table or to the clustered index key. This jump requires additional effort and hence it decreases the performance of your query.

How to overcome the RID Lookup or Key Lookup in SQL Server?

The Lookups can be eliminated or reduced by using a Covering Index. In SQL Server, we can create the Covering Index in two ways. They are as follows:

  1. Composite Keys
  2. Include Keyword

But Include is the best way to implement the Covering Index and this is because

  1. You cannot create composite indexes on data types like XML, VARCHAR(MAX), etc.
  2. The size of composite indexes increases the key size which further impacts the performance.
Creating Covering Query using Include:

Let us create a covering query using the Include option. So, what we will do here is, first, we will delete the non-clustered index and then we will create the non-clustered index using the Include option.

Delete non-clustered Index: DROP INDEX Customer.IX_Customer_Code;

Create a non-clustered index:

The columns you want in the select clause must be specified in the Include operator except for the column on which you are creating the index.

CREATE UNIQUE NONCLUSTERED INDEX IX_Customer_Code ON Customer(CustomerCode ASC) INCLUDE (CustomerID, CustomerName);

Once you create the above index using the Include option, now let us fire the same SQL as shown below query and see the estimated query execution plan.

SELECT * FROM Customer WHERE CustomerCode = ‘CUS_101’;

Once you execute the above select statement, now let us see the Display Estimated Execution Plan either by clicking on the Display Estimated Execution Plan button or simply by pressing (Ctrl + L) keys which should display the execution query plan window as shown below.

how to use a Covering Index to reduce RID lookup in SQL Server

Now you can see in the above image, the extra lookup from the non-clustered index leaf node to the heap or to the clustered index key has been eliminated which ultimately improves the performance of your SQL query.

That’s it for today. In the next article, I am going to discuss how to create proper Indexes in a table to Improve Performance in SQL Server. Here, in this article, I try to explain how to use Covering Index to reduce RID lookup in SQL Server to improve the SQL Query performance in detail.