Cardinality

Last week one of my friend called me and asked me what is Cardinality. He told me that in one of the interviews the interviewer asked this question. Before this I have never heard that this question was asked in a MSBI Interview.

https://en.wikipedia.org/wiki/Cardinality_(SQL_statements)

As per Wikipedia article on Cardinality (SQL statements), defines it as follows:

Cardinality refers to the uniqueness of data values contained in a particular column (attribute) of a database table. The lower the cardinality, the more duplicated elements in a column

There are 3 types of cardinality:

High-cardinality, Normal-cardinality, and Low-cardinality

High-cardinality refers to columns with values that are very uncommon or unique. High-cardinality column values are typically identification numbers, email addresses, or user names. An example of a data table column with high-cardinality would be a Emps table with a column named EMPID. Since the values held in the EMPID column are unique, this column’s cardinality type would be referred to as high-cardinality.

Normal-cardinality refers to columns with values that are somewhat uncommon. Normal-cardinality column values are typically names, street addresses, or vehicle types. An example of a data table column with normal-cardinality would be a CUSTOMER table with a column named LAST_NAME, containing the last names of customers. While some people have common last names, such as Smith, others have uncommon last names. Therefore, an examination of all of the values held in the LAST_NAME column would show “clumps” of names in some places (e.g.: a lot of Smith’s ) surrounded on both sides by a long series of unique values. Since there is a variety of possible values held in this column, its cardinality type would be referred to as normal-cardinality.

Low-cardinality refers to columns with few unique values. Low-cardinality column values are typically status flags, Boolean values, or major classifications such as gender. An example of a data table column with low-cardinality would be a CUSTOMER table with a column named NEW_CUSTOMER. This column would contain only 2 distinct values: Y or N, denoting whether the customer was new or not. Since there are only 2 possible values held in this column, its cardinality type would be referred to as low-cardinality.

Keep learning. We all need to learn.

MSBISkills.com