Why Lookups are bad for performance?

Tags

Why Lookups are bad for performance?

Well today we will talk about why lookups are bad for query performance. So honestly lookups are not bad every time. It depends where optimizer have used it. So we will go step by step to understand this concept in detail. First we will understand what lookups, their types are and when they are bad for performances and what are the remedies we have.

Agenda
1. What are lookups?
2. What are the types of lookups?
3. Examples and lookup operators in execution plan
4. Scenarios where they are bad for performances?
5. Remedies for lookups? How to avoid lookups?

What are lookups?

Please note that lookups happens only in case of Non Clustered Indexes only. This is because in clustered indexes we will have data for all columns at leaf level.

Now lookup case your query is able to use Non Clustered to satisfy the search argument from where clause (Predicate) but you have at least a column in the select list which is not available (Not included) with this index. So effectively from somewhere we have to pull out the remaining columns as optimizer has to satisfy the user’s query. So what it does, it uses RID or clustered key (Clustered Index Key) to fetch the remaining columns from either heap or clustered Index.

So effectively we have two types of lookups

1. RID Lookup (Heap) – If the table referenced does not have a clustered index then the query optimizer has to go back to the heap to get the additional information. This is called RID lookup (Heap).
2. Key Lookup (Clustered) – So if the table referenced has a clustered index then the query optimizer has to go back to the clustered index using Clustered key it got from the Non Clustered index. This is called Clustered Key Lookup.

Example of RID Lookup (Heap)

--

/* Create a simple customer table */

CREATE TABLE Customers
(
	 ID INT
	,Name VARCHAR(10)
	,Addr VARCHAR(10)
)
GO

/* Insert some data */

INSERT INTO Customers
SELECT DISTINCT number --, LEFT(CAST(NEWID() AS VARCHAR(255)),8),LEFT(CAST(NEWID() AS VARCHAR(255)),8)
FROM MASTER..SPT_VALUES WHERE number > 0 and number < 1000
GO

INSERT INTO Customers
SELECT DISTINCT number, '02BBD08D', 'Sometext'
FROM MASTER..SPT_VALUES WHERE number > 1158 and number < 1258

/* Create Non Clustered index on Name column */

CREATE NONCLUSTERED INDEX Ix_Name ON Customers(Name)

/*Execute the below query and examine the execution plan*/
SELECT Name, Addr FROM Customers
WITH (INDEX(Ix_Name))
WHERE Name = '02BBD08D'

--

Now let’s examine the execution plan for the above query.

Here we have RID Lookup (Heap) since we don’t have Addr column in non-clustered index. Query optimizer uses RID lookup to fetch the additional information. Now let’s check whether it is costly for us not.

Now check out the number of executions for this iterator. It is 101 in our case. Here the number is on the higher side. If this number of executions are 1, 2, 3 or some smaller then it is fine but if you have got lots of rows the cost of this operator becomes astronomical.

In this case we are reading 104 pages from disk. Check out the output of StatisticsIO ON

Lookups are very easy to fix. Now let’s modify our index to include the Addr column in the include section. Now why I am adding addr column in the include section is because we are not using this column in the where that’s why. If we would have used in the where then we could have used it in the key column section.

--

/* Drop the existing index and create Non Clustered index on Name column with Addr Column in the include section */

DROP INDEX Ix_Name ON Customers
GO

CREATE NONCLUSTERED INDEX Ix_Name ON Customers(Name) INCLUDE(Addr)
GO

/*Execute the below query and examine the execution plan*/
SELECT Name, Addr FROM Customers
WHERE Name = '02BBD08D'

--

Now let’s examine the execution plan and how many pages we are reading.

--

(101 row(s) affected)
Table 'Customers'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

--

In this case we are reading only 2 pages. So we are doing great!

In this case we are doing great since we removed RID Lookup(Heap) and Nested Loop(Inner Join) iterator. Enjoy 

Example of Key Lookup (Clustered)

--

USE AdventureWorks2014
GO

SELECT p.Name , p.Color
FROM [Production].[Product] p
WHERE p.Name LIKE 'k%'

--

Now let’s examine the execution plan.

Now let’s include color column also in the index.

--

/* Drop the existing index and create Non Clustered index on Name column with color column in the include section */

DROP INDEX AK_Product_Name ON [Production].[Product]
GO

CREATE NONCLUSTERED INDEX AK_Product_Name ON [Production].[Product](Name) INCLUDE(Color)
GO

--

Now let’s check the new execution plan

Please understand that Lookups are very easy to fix but you also need to consider other things like how many total indexes you have on your table as indexes comes with a cost. Also whenever you do some DML operation (Insert, Update and Delete) you indexes will be restructured.

Now let’s talk about scenarios where lookups are bad and there remedies.

1. If the number of executions are on the higher side then only should concern about this. If the number of executions in the lookup section is small then we can ignore this.
2. If your query is doing a RID lookup, which means you don’t have Clustered Index on your table. In this scenario you should create a clustered index on your table to get better statistics and eventually an optimal execution plan rather than.
3. Now check whether you really need that extra column(s) or not, most of the time people tend to write * and don’t understand whether they really need that or not. Select only columns that you actually required.
4. If you query id doing a Key lookup(Clustered) then you need to consider things like how many existing indexes you have on your table and whether other queries are badly affected or not due to inclusion of this column. My rule of thumb is that each table should have less than five indexes. We can go ahead and add that column in the include section or the key section depending of the requirement of the query.

I hope you have enjoyed the article. Thanks for reading !

-Pawan Khowal

MSBISkills.com

1 thought on “Why Lookups are bad for performance?”

Sanjay Kumar said:

April 30, 2018 at 9:44 AM

Hello,

11/10!! Your blog is such a complete read. I like your approach with Why Lookups are bad for performance?. Clearly, you wrote it to make learning a cake walk for me.

how to find first non-space character from end of the string

ex: ‘abc def g ‘

I should get 4 in this case. (the first nonspace character from backwards in the above example is g and the position is 4 from backwards.

Could you help me in getting this?

I am so grateful for your blog. Really looking forward to read more.

Best Regards,
Irene Hynes

LikeLike

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Improving my SQL BI Skills

Why Lookups are bad for performance?

1 thought on “Why Lookups are bad for performance?”

Leave a comment Cancel reply

Share this

Related

1 thought on “Why Lookups are bad for performance?”

Leave a comment Cancel reply