

SQL SERVER Data Compression & Its Types

Recently I was asked a question in an interview: we want to keep one year of data in our system, which is around 2 TB, but the physical drive is only 500 GB. What would you do in this kind of scenario?

One answer to this question is data compression. I recently implemented this feature for one of my clients, where a table of around 1.5 TB was compressed to 300 GB. The data compression feature in SQL Server compresses the data inside a database, and it can reduce the size of the database considerably.

Data compression provides the following benefits:

a. It saves space, because compressed data is stored in fewer pages.
b. It can improve I/O performance, because queries need to read fewer pages from disk.

Disadvantage:

Extra CPU resources are required on the database server to compress data when it is written and to decompress it when it is read, for example when data is exchanged with the application.

In SQL Server 2016, rowstore tables and indexes support two types of compression:

a. Row Compression
b. Page Compression
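To see which compression type each table, index and partition currently uses, you can query the `sys.partitions` catalog view; its `data_compression_desc` column reports NONE, ROW, PAGE (or COLUMNSTORE / COLUMNSTORE_ARCHIVE for columnstore indexes). This is a minimal sketch that lists every user table in the current database, not a complete inventory script.

```sql
-- Current compression setting per table, index and partition
SELECT  s.name                  AS schema_name,
        t.name                  AS table_name,
        i.name                  AS index_name,      -- NULL for a heap
        p.partition_number,
        p.data_compression_desc,                    -- NONE, ROW, PAGE, ...
        p.rows
FROM sys.partitions AS p
JOIN sys.tables     AS t ON t.object_id = p.object_id
JOIN sys.schemas    AS s ON s.schema_id = t.schema_id
JOIN sys.indexes    AS i ON i.object_id = p.object_id
                        AND i.index_id  = p.index_id
ORDER BY s.name, t.name, p.index_id, p.partition_number;
```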

The following database objects can be compressed:

a. A whole table that is stored as a heap.
b. A whole table that is stored as a clustered index.
c. A whole nonclustered index.
d. A whole indexed view.
e. For partitioned tables and indexes, you can configure the compression option for each partition, and the various partitions of an object do not have to have the same compression setting.
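Each object type in the list above is compressed by rebuilding it with the `DATA_COMPRESSION` option. The sketch below uses hypothetical names (`dbo.Orders`, `IX_Orders_CustomerID`, an indexed view `dbo.vOrderTotals` with clustered index `CIX_vOrderTotals`, and a `dbo.Sales` table assumed to have four partitions); substitute your own objects.

```sql
-- a/b. Heap or clustered index: rebuild the table itself.
--      Nonclustered indexes keep their own compression setting.
ALTER TABLE dbo.Orders
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- c. A nonclustered index is compressed separately from its table
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH (DATA_COMPRESSION = ROW);

-- d. Indexed view: rebuild the index that materializes the view
ALTER INDEX CIX_vOrderTotals ON dbo.vOrderTotals
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- e. Partitioned table: a different setting per partition
--    (older partitions 1-3 PAGE, current partition 4 ROW)
ALTER TABLE dbo.Sales
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE ON PARTITIONS (1 TO 3),
      DATA_COMPRESSION = ROW  ON PARTITIONS (4));
```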

Note – SQL Server 2016 supports row and page compression for rowstore tables and indexes, and supports columnstore and columnstore archival compression for columnstore tables and indexes.
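For columnstore indexes, the same `DATA_COMPRESSION` option takes COLUMNSTORE or COLUMNSTORE_ARCHIVE instead of ROW or PAGE. A minimal sketch, assuming a hypothetical table `dbo.SalesHistory` with a clustered columnstore index named `CCI_SalesHistory`:

```sql
-- Archival compression: smaller on disk, more CPU needed to read it
ALTER INDEX CCI_SalesHistory ON dbo.SalesHistory
REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

-- Switch back to standard columnstore compression
ALTER INDEX CCI_SalesHistory ON dbo.SalesHistory
REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE);
```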

Row Compression

a. It reorganizes data at the record (row) level.
b. It stores data more efficiently in a row by storing fixed-length data types in a variable-length storage format. Examples:

  • “Pawan Kumar” in a CHAR(100) column is stored in 11 bytes instead of 100, saving 89 bytes
  • 15 in an INT column is stored in 1 byte instead of 4, saving 3 bytes
  • 2015/01/01 in a DATETIME column is stored in 4 bytes instead of 8, saving 4 bytes

c. A compressed row uses 4 bits per compressed column to store the length of the data in that column.
d. NULL and 0 values of any data type take no space beyond these 4 bits.
e. Row compression applies to these data types: INT, BIGINT, NUMERIC, DECIMAL, SMALLINT, BIT, SMALLMONEY, MONEY, FLOAT, REAL, DATETIME, DATETIME2, DATETIMEOFFSET, CHAR, BINARY, TIMESTAMP, NCHAR, NVARCHAR, Spatial
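To observe the effect of row compression on record size, you can compare `avg_record_size_in_bytes` from `sys.dm_db_index_physical_stats` before and after a rebuild. The table `dbo.RowCompressDemo` below is hypothetical; the exact numbers depend on your server, so only the direction of the change is described. `LIMITED` mode leaves `avg_record_size_in_bytes` NULL, which is why `DETAILED` is used.

```sql
-- Hypothetical demo table: CHAR(100) and INT are fixed-length types
CREATE TABLE dbo.RowCompressDemo
(
      Name CHAR(100)
    , Qty  INT
);
GO

-- 1,000 identical rows generated from a cross join of catalog rows
INSERT INTO dbo.RowCompressDemo (Name, Qty)
SELECT TOP (1000) 'Pawan Kumar', 15
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;
GO

-- Average record size of the heap (index_id 0) without compression
SELECT avg_record_size_in_bytes, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.RowCompressDemo'), 0, NULL, 'DETAILED');
GO

ALTER TABLE dbo.RowCompressDemo
REBUILD WITH (DATA_COMPRESSION = ROW);
GO

-- After row compression the trailing blanks of the CHAR(100) value and the
-- unused bytes of the INT are no longer stored, so the size should drop
SELECT avg_record_size_in_bytes, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.RowCompressDemo'), 0, NULL, 'DETAILED');
GO
```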

Note – When an index is page compressed, its non-leaf levels are always compressed with row compression only.

Page Compression

It works somewhat like WinZip or 7-Zip: it looks for repeated patterns and stores them only once. Note that a page receives full page compression only if this saves at least 20% of the space on the page; if the saving is smaller, the page is just row compressed. Page compression works at the page level and is data type agnostic: it operates on the binary representation of the values.
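Because a page keeps page compression only when it pays off, a page-compressed object can contain pages that are merely row compressed. The `compressed_page_count` column of `sys.dm_db_index_physical_stats` shows how many pages actually received page compression. This sketch assumes the CompressData table from the Examples section (in the `dbo` schema) has already been rebuilt with `DATA_COMPRESSION = PAGE`; for an index, rows with `index_level` greater than 0 are the non-leaf levels, which get row compression only.

```sql
-- Total pages vs. pages that actually received page compression
SELECT  index_id,
        index_level,
        page_count,
        compressed_page_count       -- pages that passed the savings check
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.CompressData'), NULL, NULL, 'DETAILED');
```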

Page compression is applied in the following steps:

  • Row compression
  • Column prefix compression –
    In this step SQL Server creates a Compression Information (CI) structure just below the page header. It contains 2 sections: the Anchor Section (the anchor record) and the Dictionary Section. These sections exist once per page, not once per row.

For each column, SQL Server picks one value (in the first example below, the longest one) as the anchor value, stores it in the Anchor Section and puts Null in its place in the row. In every other row of that column, the leading characters that match the anchor value are replaced by a reference recording how many characters match. This happens for all the columns.

e.g. Let’s say you have the following values in a column before this operation.
Row Values – Pawan1, Paw201517, Pawa15, Paw12

After the operation we have the following values in the table and in the Anchor Section.

Anchor Section – Paw201517
Row Values – [3]an1, null, [3]a15, [3]12

Here [3] means “the first 3 characters of the anchor value” (Paw), followed by the rest of the original value.

We may also have a case where there is no matching prefix. The chosen value is still replaced by null, and every other row stores [0] followed by its full value, so in this case we add a little overhead to the system.

e.g. Before compression
Row Values – 40.90, 35.76, 75.84, 56

Anchor Section – 35.76
Row Values – [0]40.90, Null, [0]75.84, [0]56

  • Page Dictionary compression

In this step SQL Server searches the whole page for repeated values and stores them in the Dictionary Section. A dictionary value can be referenced by any column of any row on the page. For example, if you have the byte pattern 0x1144344 in col1 of row 1 and in col2 of row 2, both can refer to the same dictionary value. Recall that column prefix compression looks for a common prefix in the same column across all rows on the same page.

A dictionary entry is created only if the value repeats in its entirety in two or more columns. What happens if the repeating value is in the same column across multiple rows on the page? Don’t we create a column prefix for that? The answer is most likely ‘yes’, unless another column-prefix value provides more space savings. The rule is that column prefix compression is applied to the page first and page dictionary compression second.
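The two steps above can be mimicked on sample strings to make the bookkeeping concrete. This is an illustrative model only: SQL Server performs these steps internally on the bytes of each page, not with T-SQL, and the `[n]` and `{n}` notations, the sample page and the string '0x1144344' (standing in for a byte pattern) are hypothetical. The first query returns [3]an1, NULL, [3]a15 and [3]12 for the first example (setting `@anchor` to '35.76' and using the numeric values reproduces the second example); the second turns '0x1144344' and 'abc' into dictionary entries {0} and {1} and leaves 'xyz' and 'def' unchanged.

```sql
-- Step 1: column prefix. Anchor value taken from the first example.
DECLARE @anchor VARCHAR(20) = 'Paw201517';

WITH Vals AS (
    SELECT v FROM (VALUES ('Pawan1'), ('Paw201517'), ('Pawa15'), ('Paw12')) AS t(v)
),
Numbers AS (                                  -- character positions 1..21
    SELECT TOP (21) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects
)
SELECT  v.v AS original_value,
        CASE WHEN v.v = @anchor THEN NULL     -- the anchor value itself
             ELSE '[' + CAST(p.prefix_len AS VARCHAR(10)) + ']'
                  + SUBSTRING(v.v, p.prefix_len + 1, 20)
        END AS stored_value
FROM Vals AS v
CROSS APPLY (
    -- first position where the value and the anchor differ, minus 1
    SELECT MIN(n.n) - 1 AS prefix_len
    FROM Numbers AS n
    WHERE n.n > LEN(v.v)
       OR n.n > LEN(@anchor)
       OR SUBSTRING(v.v, n.n, 1) <> SUBSTRING(@anchor, n.n, 1)
) AS p;

-- Step 2: page dictionary on a hypothetical two-column page.
WITH PageRows AS (
    SELECT row_id, col1, col2
    FROM (VALUES (1, '0x1144344', 'abc'),
                 (2, 'xyz',       '0x1144344'),
                 (3, 'abc',       'def')) AS t(row_id, col1, col2)
),
Cells AS (                                    -- one row per (row, column) value
    SELECT row_id, 'col1' AS col_name, col1 AS val FROM PageRows
    UNION ALL
    SELECT row_id, 'col2', col2 FROM PageRows
),
Dictionary AS (
    SELECT val, ROW_NUMBER() OVER (ORDER BY val) - 1 AS dict_index
    FROM Cells
    GROUP BY val
    HAVING COUNT(*) >= 2                      -- whole value repeats on the page
)
SELECT  c.row_id, c.col_name, c.val AS original_value,
        CASE WHEN d.dict_index IS NOT NULL
             THEN '{' + CAST(d.dict_index AS VARCHAR(10)) + '}'
             ELSE c.val
        END AS stored_value
FROM Cells AS c
LEFT JOIN Dictionary AS d ON d.val = c.val
ORDER BY c.row_id, c.col_name;
```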

Page compression is a logged operation.

For a heap, pages are fully page compressed only during a rebuild operation. If you insert data into a page-compressed heap, the new pages get only row compression; they are page compressed when you rebuild the heap.
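The heap behaviour described above can be checked by comparing `page_count` with `compressed_page_count` after ordinary inserts and again after a rebuild. A minimal sketch, assuming `dbo.CompressData` (the heap from the Examples section) has already been rebuilt with `DATA_COMPRESSION = PAGE`; the actual counts depend on your data.

```sql
SET NOCOUNT ON;

-- Ordinary (non-bulk) inserts allocate new heap pages
INSERT INTO dbo.CompressData (Name)
SELECT TOP (10000) 'Pawan Kumar Khowal'
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- New heap pages are not page compressed, so compressed_page_count
-- is expected to lag behind page_count here
SELECT page_count, compressed_page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.CompressData'), 0, NULL, 'DETAILED');

-- A rebuild applies page compression to the heap's pages again
-- (each page still subject to the savings check)
ALTER TABLE dbo.CompressData
REBUILD WITH (DATA_COMPRESSION = PAGE);

SELECT page_count, compressed_page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.CompressData'), 0, NULL, 'DETAILED');
```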

For a clustered index, the page compression calculation is triggered when a page fills up and would otherwise split. Until then, new rows on the page are only row compressed.

More Notes on Compression

  • If you rebuild a compressed index with a fill factor of 70%, SQL Server maintains the 70% fill factor.
  • Should I enable PAGE compression? Use the stored procedure sp_estimate_data_compression_savings to estimate the space savings; if there are no savings or they are insignificant, do not enable it.
  • For workloads with heavy inserts or updates, use no compression or row compression; for read-heavy workloads (heavy SELECTs), use page compression. Keep in mind that index rebuilds will take more time.
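The three notes above translate into T-SQL roughly as follows. This is a sketch under assumptions: it reuses the CompressData table from the Examples section, `dbo.Orders` / `IX_Orders_CustomerID` are hypothetical names, and the workload query is a simplified read/write profile, not an official sizing rule. The counters in `sys.dm_db_index_operational_stats` are reset when the server restarts (or the metadata is evicted from cache), so judge them over a representative period.

```sql
-- 1. Estimate PAGE savings first (all indexes and partitions of the table)
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'CompressData',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = N'PAGE';

-- 2. Compress an index and keep a 70% fill factor (hypothetical names)
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH (DATA_COMPRESSION = PAGE, FILLFACTOR = 70);

-- 3. Rough workload profile per index: a high update share favours
--    NONE/ROW, a high scan share favours PAGE
SELECT  OBJECT_NAME(os.object_id)   AS table_name,
        i.name                      AS index_name,
        os.partition_number,
        os.leaf_update_count * 100.0 / NULLIF(
            os.range_scan_count + os.singleton_lookup_count
          + os.leaf_insert_count + os.leaf_update_count
          + os.leaf_delete_count, 0) AS percent_update,
        os.range_scan_count  * 100.0 / NULLIF(
            os.range_scan_count + os.singleton_lookup_count
          + os.leaf_insert_count + os.leaf_update_count
          + os.leaf_delete_count, 0) AS percent_scan
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS os
JOIN sys.indexes AS i
     ON i.object_id = os.object_id AND i.index_id = os.index_id
WHERE OBJECTPROPERTY(os.object_id, 'IsUserTable') = 1
ORDER BY percent_update DESC;
```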

Examples

--


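-- Demo heap: CHAR(50) is fixed-length, so every row reserves 50 bytes
-- even though the stored name is only 18 characters long.
CREATE TABLE dbo.CompressData
(
	  ID   INT IDENTITY(1,1)
	, Name CHAR(50)
)
GO

-- GO 10000 is an SSMS/sqlcmd feature: it runs the batch 10,000 times.
SET NOCOUNT ON;
INSERT INTO dbo.CompressData (Name) VALUES ('Pawan Kumar Khowal')
GO 10000

-- Baseline size without compression
EXEC sp_spaceused N'dbo.CompressData'
GO

-- Row compression: CHAR(50) values are stored without trailing blanks
ALTER TABLE dbo.CompressData
REBUILD WITH (DATA_COMPRESSION = ROW);
GO

EXEC sp_spaceused N'dbo.CompressData'
GO

-- Page compression: row compression plus column prefix and page dictionary
ALTER TABLE dbo.CompressData
REBUILD WITH (DATA_COMPRESSION = PAGE);
GO

EXEC sp_spaceused N'dbo.CompressData'
GO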


--

I hope you enjoyed the article. Cheers, and thanks for reading!

-Pawan Khowal

MSBISkills.com

