
How GUIDs can cause fragmentation in clustered indexes

Reference – http://www.sqlskills.com/blogs/paul/can-guid-cluster-keys-cause-non-clustered-index-fragmentation/

A GUID key causes fragmentation (meaning the data is stored non-contiguously on disk) because its values are random and the key itself is wide. A GUID is 16 bytes, four times the size of a 4-byte integer. Every time you insert a row into the index, SQL Server has to find the insertion point in the B-tree, and because the key value is random, the insertion point is random too.

This means that if an index page is full, a random insert that happens to have to go onto that page will cause a page split to make room for the new record. A page-split is where a new page is allocated and (as near as possible to) half the rows from the splitting page are moved to the new page. The new row is then inserted into one of the two pages, determined by the key value. Usually the newly allocated page is not physically contiguous to the splitting page, and so fragmentation has been caused.
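The page-split behaviour described above can be illustrated with a small toy model. This is a rough Python sketch, not SQL Server's actual storage engine: `PAGE_CAPACITY`, `insert_key` and `run` are hypothetical names, a "page" is just a sorted list, and an insert past the end of the last page is assumed to allocate a fresh page without moving rows.

```python
import bisect
import random
import uuid

PAGE_CAPACITY = 20  # hypothetical rows per page; real capacity depends on row size


def insert_key(pages, key):
    """Insert key into a list of sorted pages; return True if a page split occurred."""
    if not pages:
        pages.append([key])
        return False
    # Target page: the last page whose first key is <= key (or the first page).
    firsts = [p[0] for p in pages]
    idx = max(bisect.bisect_right(firsts, key) - 1, 0)
    page = pages[idx]
    if len(page) < PAGE_CAPACITY:
        bisect.insort(page, key)
        return False
    if idx == len(pages) - 1 and key > page[-1]:
        # Insert past the end of the index: new page, no rows are moved.
        pages.append([key])
        return False
    # Full page elsewhere: move about half the rows to a newly allocated page.
    mid = len(page) // 2
    new_page = page[mid:]
    del page[mid:]
    pages.insert(idx + 1, new_page)
    bisect.insort(new_page if key >= new_page[0] else page, key)
    return True


def run(keys):
    """Insert all keys; return (number of page splits, average page fullness 0..1)."""
    pages, splits = [], 0
    for key in keys:
        splits += insert_key(pages, key)
    density = sum(len(p) for p in pages) / (len(pages) * PAGE_CAPACITY)
    return splits, density


rng = random.Random(42)  # fixed seed so repeated runs behave the same
guid_keys = [uuid.UUID(int=rng.getrandbits(128)) for _ in range(10000)]
int_keys = list(range(10000))

guid_splits, guid_density = run(guid_keys)
int_splits, int_density = run(int_keys)
print(f"random GUID keys: {guid_splits} splits, page density {guid_density:.0%}")
print(f"ascending ints:   {int_splits} splits, page density {int_density:.0%}")
```

In this model the ascending keys never trigger a split and every page ends up full, while the random keys split pages throughout the run and leave them partly empty.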

In this case, two kinds of fragmentation have been caused:

1. Logical fragmentation – the next logical page, as determined by the index order, is not the next physical page in the data file.

2. Internal fragmentation (low page density) – space is wasted on index pages because they are only partly full.

Both can hurt query performance, on top of the cost of performing the page split in the first place.
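To make the two measures concrete, here is a minimal Python sketch of how they could be computed for a chain of leaf pages. It is a simplification, not the exact formula SQL Server uses: the function names and sample page numbers are hypothetical, and 8096 is the number of bytes available for rows on an 8 KB page after the 96-byte header.

```python
def logical_fragmentation(page_ids):
    """Percent of pages whose logical successor is not the next physical page.

    page_ids lists the physical page numbers in logical (index key) order.
    """
    if len(page_ids) < 2:
        return 0.0
    out_of_order = sum(
        1 for cur, nxt in zip(page_ids, page_ids[1:]) if nxt != cur + 1
    )
    return 100.0 * out_of_order / len(page_ids)


def page_density(bytes_used_per_page, page_bytes=8096):
    """Average percentage of each page that actually holds row data."""
    return 100.0 * sum(bytes_used_per_page) / (page_bytes * len(bytes_used_per_page))


contiguous = [100, 101, 102, 103]
after_split = [100, 250, 101, 102, 103]  # page 250 came from a split of page 100
half_full = [4048, 4048, 8096, 8096, 8096]  # the two split pages are half empty

print(logical_fragmentation(contiguous))
print(logical_fragmentation(after_split))
print(page_density(half_full))
```

A single split in the middle of the chain is enough to make two page-to-page hops non-contiguous and to leave two pages half empty.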

Check out the example below. I’ll create a clustered index on a GUID column; let’s see what happens when we insert 10,000 rows:


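-- The column list was lost from this listing; the definition below is an assumed
-- reconstruction. ID gets a random GUID from NEWID(); Filler is a hypothetical padding column.
CREATE TABLE TestClusteredKeyFragmentation (
   ID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID (),
   Filler CHAR (400) NOT NULL DEFAULT 'a'
);

CREATE CLUSTERED INDEX Ix_ID ON TestClusteredKeyFragmentation (ID);
GO

-- GO 10000 repeats the preceding batch (the single INSERT) 10,000 times.
INSERT INTO TestClusteredKeyFragmentation DEFAULT VALUES;
GO 10000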

-- Execute the query below and check the details.
-- 'InMemory' is the database name used in this example; replace it with your own.
SELECT
   OBJECT_NAME (ips.[object_id]) AS 'Object Name',
   si.name AS 'Index Name',
   ROUND (ips.avg_fragmentation_in_percent, 2) AS 'Fragmentation',
   ips.page_count AS 'Pages',
   ROUND (ips.avg_page_space_used_in_percent, 2) AS 'Page Density'
FROM sys.dm_db_index_physical_stats (DB_ID('InMemory'), NULL, NULL, NULL, 'DETAILED') ips
CROSS APPLY sys.indexes si
WHERE
   si.object_id = ips.object_id
   AND si.index_id = ips.index_id
   AND ips.index_level = 0
   AND OBJECT_NAME (ips.[object_id]) = 'TestClusteredKeyFragmentation'

The TestClusteredKeyFragmentation clustered index is 98.41% fragmented, with around 35% of the space on each page wasted. Hence we should never use GUIDs as keys. A GUID takes 16 bytes, four times the size of a normal integer column, and that extra space is wasted everywhere the key is stored. GUID keys are also more expensive in joins and take more time in lookups.
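To put the 16-byte versus 4-byte comparison into numbers, here is a back-of-the-envelope Python sketch. It relies on the fact that SQL Server stores the clustering key in every nonclustered index row; the row count and index count are hypothetical, and row overhead and compression are ignored.

```python
def key_bytes(rows, key_size, nonclustered_indexes):
    """Bytes spent on key values alone: one copy in the clustered index
    plus one copy in each nonclustered index."""
    return rows * key_size * (1 + nonclustered_indexes)


ROWS = 10_000_000  # hypothetical table size
NC_INDEXES = 3     # hypothetical number of nonclustered indexes

guid_total = key_bytes(ROWS, 16, NC_INDEXES)  # 16-byte UNIQUEIDENTIFIER key
int_total = key_bytes(ROWS, 4, NC_INDEXES)    # 4-byte INT key

print(f"GUID keys: {guid_total / 1e6:.0f} MB")
print(f"INT keys:  {int_total / 1e6:.0f} MB")
```

Under these assumptions the key values alone take four times as much space with a GUID, before counting any extra pages left behind by page splits.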

We should always use integer columns for the primary key if possible. If you want to learn more on this topic, read Paul’s blog post (Paul is the best in SQL Server) – http://www.sqlskills.com/blogs/paul/can-guid-cluster-keys-cause-non-clustered-index-fragmentation/

Keep learning. We all need to learn.