Dynamic Data Masking Basics In SQL Server
In this post, we’ll look at an example of data masking in SQL Server using three data masking functions with email. I use emails across the board in this post because we can compare the results side-by-side. As a quick note, while I use three different data masking functions, we have more functions that we can use if we need.
First, let’s look at some terminology when it comes to SQL Server and securing data in general so that we can know where data masking fits into the picture.
Questions
- Question 1: you want to limit data exposure on the column level by hiding a portion or all of the values to some users within your organization. What technology in SQL Server would you use that is available in 2016 and beyond?
- Question 2: You are using a partial data mask in T-SQL with
partial(1,XX,1)
for names in your database. When the user who will not be able to see the unmasked names queries the database, how will the name Sarah Johnson appear? - Question 3: What is one drawback to dynamic data masking that allows for an attacker to get around it?
Terms:
- Data masking: data security technique that hides sensitive data through a variety of techniques, such as randomizing the actual data values, obfuscating a fraction of the data values, or other techniques.
- Row level data security (also called row level security): complex data security technique that acts like a row filter on data (ie:
WHERE Country = 'Serbia'
). - Transparent data encryption: data security technique that encrypts data at rest through securing the data and log files on disk.
- Always encrypted: a data security technique that occurs on a level higher than the data (application layer), as the database will not have the encryption keys. This means that if the database is compromised, it’s meaningless because even a high level database administrator wouldn’t have access to the data.
In this post, we’re looking at the first one — data masking. For our example in this post, we’ll be looking at using three data masking functions — default, partial and email.
Data Masking Basics
In the below code, I create a table with four columns, all of which will store email addresses and add four values. I check if table exists first only because in the entire script, I want to be able to run the whole SQL script without opening 10 windows in SQL Server. You do not have to write or test your code this way, but it does make testing simpler.
IF OBJECT_ID('DataMaskExample') IS NOT NULL
BEGIN
DROP TABLE DataMaskExample
END
CREATE TABLE DataMaskExample (
StandardValue VARCHAR(50),
DataMaskDefault VARCHAR(50) MASKED WITH (FUNCTION = 'default()'),
DataMaskPartial VARCHAR(50) MASKED WITH (FUNCTION = 'partial(2,"xxxxx",2)'),
DataMaskEmail VARCHAR(50) MASKED WITH (FUNCTION = 'email()')
)
INSERT INTO DataMaskExample
VALUES ('jane.doe@ourdomain.com','jane.doe@ourdomain.com','jane.doe@ourdomain.com','jane.doe@ourdomain.com')
, ('john.doe@ourdomain.com','john.doe@ourdomain.com','john.doe@ourdomain.com','john.doe@ourdomain.com')
, ('james.johnson@ourdomain.com','james.johnson@ourdomain.com','james.johnson@ourdomain.com','james.johnson@ourdomain.com')
, ('sarah.johnson@ourdomain.com','sarah.johnson@ourdomain.com','sarah.johnson@ourdomain.com','sarah.johnson@ourdomain.com')
SELECT TOP 1 * FROM DataMaskExample
Let’s break down the three columns that we’re masking in this example (for the current user, these columns are not masked — we’ll show a masked example later with a user who doesn’t have permissions to see the data):
- In the case of default masking, the entire result will be hidden. We will not be able to even see part of the column, so the result will be
xxxx
. - In the case of partial masking, we’ll see the first 2 characters of the column and the last 2 characters of the column with the rest of the column having
xxxxx
. So in the case ofjane.doe@ourdomain.com
, we would get the result ofjaxxxxxom
. - In the case of email masking, this functionality comes from SQL Server and will hide most of the email values. In our example of
jane.doe@ourdomain.com
, we’ll get the result ofjXXX@XXXX.com
. As a note, the top level domain is not hidden as we’ll see, so be aware that how we’ve used this functionality would expose the top level domain.
Testing Data Masking
Next, we want to review the results of our masked values. Unfortunately, the user that we’re using to set the data masking will have full permissions to the data and see everything. As you see when you run the above code, you see all the values!
In the below code, we check if a TestUser
exist and if it does, we drop the test user and then create it. Be careful when running this! If you already have a user with this name, you may not want to run this code because we’ll drop the user. As I wrote earlier in the post, I do this because I want to be able to run all the code in 1 query window, not 10. However, I do not use users with this name, so this makes sense. From here, select everything from our data masked table as the user TestUser
.
IF EXISTS (SELECT 1 FROM sys.database_principals WHERE name = 'TestUser')
BEGIN
DROP USER TestUser
END
CREATE USER TestUser WITHOUT LOGIN
ALTER ROLE db_datareader ADD MEMBER TestUser
---- Review results:
BEGIN TRAN
SELECT SUSER_NAME() SuserN, USER_NAME() UserN --- 1: gets current user
EXECUTE AS USER = 'TestUser'
SELECT SUSER_NAME() SuserN, USER_NAME() UserN --- 2: we are now TestUser
SELECT * FROM DataMaskExample
REVERT
SELECT SUSER_NAME() SuserN, USER_NAME() UserN --- returns to the user in shown in 1
ROLLBACK TRAN
Let’s dig further into the ----Review results:
section:
- Any transaction (
SELECT
or otherwise) that you do not want to commit, always practice aBEGIN TRAN
/ROLLBACK TRAN
. This is just a good habit, even when testing. - We check for the current user that we’re running, then run
EXECUTE AS
our test user and check the current user again to confirm that we’ve switched. This is not necessary, but is like the mathematical concept of showing your work. - We select everything from our example table and see the output that we expected to see.
- We’ll notice the
REVERT
command here and see what happens with our user — it switches back. If you remove the revert command, you will still be running queries as that test user (which we don’t want to run). This is why we run revert. Per Microsoft, this applies in situations where we are running commands as users.
Full Data Masking Query Run
If we want to execute everything in one window (as we’ve been building) the below code run is as follows along with an image shown below this:
IF OBJECT_ID('DataMaskExample') IS NOT NULL
BEGIN
DROP TABLE DataMaskExample
END
CREATE TABLE DataMaskExample (
StandardValue VARCHAR(50),
DataMaskDefault VARCHAR(50) MASKED WITH (FUNCTION = 'default()'),
DataMaskPartial VARCHAR(50) MASKED WITH (FUNCTION = 'partial(2,"xxxxx",2)'),
DataMaskEmail VARCHAR(50) MASKED WITH (FUNCTION = 'email()')
)
INSERT INTO DataMaskExample
VALUES ('jane.doe@ourdomain.com','jane.doe@ourdomain.com','jane.doe@ourdomain.com','jane.doe@ourdomain.com')
, ('john.doe@ourdomain.com','john.doe@ourdomain.com','john.doe@ourdomain.com','john.doe@ourdomain.com')
, ('james.johnson@ourdomain.com','james.johnson@ourdomain.com','james.johnson@ourdomain.com','james.johnson@ourdomain.com')
, ('sarah.johnson@ourdomain.com','sarah.johnson@ourdomain.com','sarah.johnson@ourdomain.com','sarah.johnson@ourdomain.com')
SELECT TOP 1 * FROM DataMaskExample
IF EXISTS (SELECT 1 FROM sys.database_principals WHERE name = 'TestUser')
BEGIN
DROP USER TestUser
END
CREATE USER TestUser WITHOUT LOGIN
ALTER ROLE db_datareader ADD MEMBER TestUser
---- Review results:
BEGIN TRAN
--SELECT SUSER_NAME() SuserN, USER_NAME() UserN --- 1: gets current user
EXECUTE AS USER = 'TestUser'
--SELECT SUSER_NAME() SuserN, USER_NAME() UserN --- 2: we are now TestUser
SELECT * FROM DataMaskExample
REVERT
--SELECT SUSER_NAME() SuserN, USER_NAME() UserN --- returns to the user in shown in 1
ROLLBACK TRAN
DROP TABLE DataMaskExample
DROP USER TestUser
In the below image, you’ll see the entire query result of the full query being run without any of the SELECT SUSER_NAME() SuserN, USER_NAME() UserN
queries (for security reasons, this result is not shown).
Disadvantages of Dynamic Data Masking
Dynamic data masking only applies to versions of SQL Server 2016 and later. If you are running older versions of SQL Server, then this functionality will not be available to you. You can use data masking in Azure SQL, Azure SQL Managed Instances and Azure Synapse as this is a supported feature in all of these.
Before implementing this as a solution, you will want to plan access levels for users. In addition, to avoid confusion I recommend automation of permissions to avoid situations where a user may have elevated access to data where they should not. The same applies to situations where you may want some teams to be able to view all data (ie: the marketing team being able to see email addresses while the finance team cannot). Finally, consider this with updates to data; if a user is not able to see certain values in the database, you may want to avoid this user having access to write data.
Dynamic data masking is less secure that other encryption and security methods because:
- Sensitive data are still being stored in the database
- Hackers will always try to escalate their privileges
- Unauthorized users can still query underlying data to determine values through intelligent guessing.
- Dynamic data masking cannot apply to some columns such as FILESTREAM, COLUMN_SET and computed columns.
Let’s look at points 2 and 3 in more details.
Hackers Can Get Around Dynamic Data Masking
One reminder I discuss with people is that hackers have the incentives to compromise your system plus it depends on who is hacking your system or who is after your data. Before you consider using dynamic data masking as a solution, I highly recommend the video What Everyone Show Know About Hackers — Who and Why.
Hackers will always try to escalate privilege once they have unauthorized access to a system. Because dynamic data masking still stores data in the database, this means that they can eventually get access to the sensitive data once they escalate their privileges.
In addition, you have people at the company with access so hackers know that they can attempt social engineering techniques, such as spear phishing, to possibly compromise your data through them. As I write later, this is one of the big reasons that I dislike dynamic data masking — some employees see data that they shouldn’t and thus become easy targets to spear phishing.
I once was giving a presentation about security and data and I wanted to show the audience how easy it was to compromise a company through spearphishing. I simply queried a popular social media site for a company and showed the audience that there were people working at the company posting details about the company and their customers. This is how easy it was to find details about companies — and if I was in HR, everyone of those employees would be fired, as they are compromising their company’s security. However, hackers don’t share my goal of helping companies secure their data better — a hacker will use that information against the company.
This leads us to our next point — even when we mask customer data, what happens if an employee can query the data to uncover the values?
Uncovering Data Masked Value
In the case of the third point about querying the data to uncover the underlying values, consider our example that have above this where we’ve masked email addresses. Let’s add a new email address to our example, then run the same query to check for values but this time let’s see if we can take a stab at what these values are — and here you’ll start to see big problems with dynamic data masking:
--- We add a new value that we'll be querying for
INSERT INTO DataMaskExample
VALUES ('read@read.com','read@read.com','read@read.com','read@read.com')
---- Our results in data being returned
BEGIN TRAN
EXECUTE AS USER = 'TestUser'
SELECT * FROM DataMaskExample WHERE LOWER(DataMaskDefault) LIKE '%read%'
SELECT * FROM DataMaskExample WHERE LOWER(DataMaskPartial) LIKE '%read%'
SELECT * FROM DataMaskExample WHERE LOWER(DataMaskEmail) LIKE '%read%'
REVERT
ROLLBACK TRAN
We get results from all three queries and it’s accurate! This is dangerous, especially if we’ve masked columns involved in numerical ranges. Keep in mind that we could query this data with even more detail and start to make out what these email addresses are.
Dynamic data masking does little to stop a hacker and unfortunately may give us a perspective that we’ve secured our data when in reality, we have done very little to secure our data.
Answers
- Answer 1: Dynamic data masking
- Answer 2: SXXN
- Answer 3: you could answer with a few possibilities from this post — an attacker could still compromise someone with full access to the data through spear phishing or social engineering. Another possible attack is to query the data with ranges/detailed queries until you narrow down what the values are.
Conclusion
Keep in mind that data masking as a security solution is not as robust as some of the other security techniques. For an example, one company once interviewed me about a high level database development and administrative position that would have had significant access to user data. In the interview, one person remarked that they had an idea about “mining” people’s private messages to other users on their platform. I pointed out in the interview that they were violating their users’ privacy by even looking at those data. Needless to write, the interview became very uncomfortable at that point and I had no interest in working for the company because it literally was violating their users’ privacy.
However, data masking would not help in this situation — these developers and database administrators had significant access, plus would have been the ones to set up data masking! As one who’s administered many databases over his career, I do not like any access to user data, so I prefer encryption methods where even I have no idea what’s being stored. Using the wrong data security technique (or the wrong combination, as it’s often best to combine security techniques) could come with significant costs.
Note: all images in the post are either created through actual code runs or from Pixabay. The written content of this post is copyright; all rights reserved. None of the written content in this post may be used in any artificial intelligence.