Getting Started with Hashing in SQL Server

Introduction

In my most recent articles, Ive talked about encryption in detail and demonstrated its usage at the entire database level with Transparent Data Encryption and at the column level with granularcell level encryption. In this article, I am going to discuss hashing in SQL Server and how it is different from encryption.

Encryption brings data into a state which cannot be interpreted by anyone who does not have access to the decryption key, password, or certificates. Though encryption does not restrict the access to the data, it ensures if data loss happens, then in that case data is useless for the person who does not have access to the decryption keypasswordcertificates. On the other hand, Hashing brings a string of characters of arbitrary size into a usually shorter fixed-length value or key that represents the original string and acts as a shortened reference to the original data. A slight change in the input string of characters produces a completely different hashed output.

To meet the demands of regulatory compliance and corporate data security standards, SQL Server allows you to enable encryption at the columncell level or on the entire database level whereas hashing can be used for several purposes for example:

Encryption is bidirectional, which means data encrypted can be decrypted back to the original string if you have access to the correct decryption key, whereas hashing is unidirectional, which means hashed data cannot be reversed back to the original string.

SQL Server has the HASHBYTES inbuilt function to hash the string of characters using different hashing algorithms. The supported algorithms are MD2, MD4, MD5, SHA, SHA1, or SHA2. The hashed data conforms to the algorithm standard in terms of storage size i.e. 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512. SHA2_256 and SHA2_512 algorithms are available in SQL Server 2012 and later versions.

The stronger hash function you use, the more storage space it takes, and performance is slower but it provides a stronger hash value with minimal chance of hash collision (generating the same hashed output for two different input string of characters). Hence, its recommended to use hashing algorithms depending on your workload and data to hash by making it an appropriate trade-off.

The example below, demonstrates the use of the HASHBYTES function to do hashing using MD5 algorithm. As mentioned before, a slight change in the input string of characters produces a completely different hashed output and this is what you could see in the second column. The only difference between input for the first column and input for the second column is an extra space at the end of the input string in the second input string:

No matter how many times you do the hashing, the hashed output will remain same for the same set of input strings and same hashing algorithm:

View post:
Getting Started with Hashing in SQL Server

Related Posts
This entry was posted in $1$s. Bookmark the permalink.