Post on 22-May-2020
Discover, identify, and classify personal data in Microsoft Azure
Personal data discovery, identification, and classification are essential to a successful security,
governance, compliance, and personal data privacy strategy. Azure customers who collect data from
their users must be able to identify personal data and understand where it’s located in order to keep it
secure.
Azure provides a rich diversity of data storage possibilities and multiple tools that can help customers
identify, classify, and search for personal data in their Azure environments, hosted applications, and
external sources.
This article provides guidance on how to discover, identify, and classify personal data in several Azure
tools and services, including using Azure Data Catalog, Azure Active Directory, SQL Database, Power
Query for Hadoop clusters in Azure HDInsight, Azure Information Protection, Azure Search, and SQL
queries for Azure Cosmos DB.
Scenario
A U.S.-based sports company collects a variety of personal and other data from their customers and
employees, maintains it in multiple databases, and stores it in several different locations in their Azure
environment. In addition to selling sports equipment, they also host and manage registration for elite
athletic events around the world, including in the EU.
Since the company hosts many international bicycling tours every year and has contingent staff in
locations around the globe, a couple of the data sets are quite large. The company also has developer-
built applications that are used by both customers and employees.
Problem statement
The company wants to address the following issues:
- Customer and employee personal data must be classified/distinguished from the other data the
  company collects in order to ensure proper access and security.
- The data admin needs to easily discover the location of customer personal data across various
  areas of the Azure environment.
- Customer and employee personal data that appears in shared documents and email
  communications must be classified and labelled to help ensure that it’s kept secure.
- The company’s app developers need a way to easily search for customer and employee personal
  data in their web and mobile apps.
- Developers also need to query their document database for personal data.
Company goals
- Data sources and assets that include personal data must be registered so they can be
  tagged/annotated and searched in Azure Data Catalog.
- All customer and employee personal data must be tagged/annotated in Azure Data Catalog so it
  can be found easily. Ideally customer and employee personal data are tagged/annotated
  separately.
- Personal data from customer and employee user profiles and work information residing in Azure
  Active Directory must be easily located.
- Personal data residing in multiple SQL databases must be easily queried.
- Some of the company’s large data sets are managed through Azure HDInsight and stored in
  Hadoop. They must be imported into Excel so they can be queried for personal data.
- Personal data shared in documents and email communications must be classified, labelled, and
  kept secure with Azure Information Protection.
- The company’s app developers must be able to discover customer and employee personal data
  in the apps they’ve built, which they can do with Azure Search.
- Developers must be able to find personal data in their document database.
Solutions
The following Azure tools can help you with personal data identification, classification, and discovery.
Azure Data Catalog: data classification, annotation, and discovery
Azure Data Catalog is a metadata catalog that helps enterprise organizations manage and track data
sources/assets. The first step is to register them. The next step is to classify all personal data and tag or
annotate it so it’s easier to find. Finally, you can discover personal data through searching and filtering.
Once you’ve located your data, you can use its location to connect to it with the application or tool of
your choice, such as Excel or SQL Server Management Studio.
In order to use the catalog, you must be the owner or co-owner of an Azure subscription and you must
be signed in with an Azure Active Directory user account.
Note: You can only have one data catalog per organization/Azure Active Directory domain.
Data can be classified, annotated and discovered in Azure Data Catalog either manually or through a
REST API.
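As a rough illustration of the REST route, the sketch below only builds the URL for a catalog search request and makes no network call. The host name, path layout, query parameter names, and API version are assumptions recalled from the public Data Catalog REST reference and should be checked against the current documentation. The catalog name "DefaultCatalog" and the helper `build_search_url` are hypothetical. A real request would also need an Azure Active Directory bearer token in the Authorization header.

```python
from urllib.parse import quote, urlencode

# Assumed values: confirm the host and API version in the Data Catalog
# REST reference before relying on them.
API_HOST = "https://api.azuredatacatalog.com"
API_VERSION = "2016-03-30"


def build_search_url(catalog_name, search_terms, count=10, start_page=1):
    """Build the URL for a catalog search request (hypothetical helper).

    Nothing is sent over the network; the caller is expected to issue a GET
    with an Azure Active Directory bearer token.
    """
    query = urlencode({
        "searchTerms": search_terms,
        "count": count,
        "startPage": start_page,
        "api-version": API_VERSION,
    })
    catalog = quote(catalog_name, safe="")
    return f"{API_HOST}/catalogs/{catalog}/search/search?{query}"


# Example: look for assets annotated with a personal-data tag.
url = build_search_url("DefaultCatalog", "customer personal data")
print(url)
```

Keeping URL construction separate from the HTTP call makes it easy to review exactly what will be requested before any credentials are involved.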
How do I manually register, tag/annotate, and discover/search personal data sources, assets, and objects?
The following steps are an overview of how to register, annotate, and discover/search for data in Azure
Data Catalog. The links in these steps take you to an Azure Data Catalog tutorial with exercises that
provide more specific guidance. The exercises focus on a fictional company called AdventureWorks.
Instructions earlier in the tutorial show you how to load the actual AdventureWorks database and
provide detailed background information.
You can do the exercises or just use the information as a guideline for working with your own data.
Azure Data Catalog: https://azure.microsoft.com/services/data-catalog/
Azure free account: https://azure.microsoft.com/free/
1. Register data sources/assets
In order to search for and identify personal data with Azure Data Catalog, you need to register
your data source/assets first. Once you sign in, you’ll launch the registration tool, choose a data
source to register and register specific data objects. You can also add tags to help enable search.
Once registered, the data source or asset remains in its existing location, but a copy of the
metadata is added to Azure Data Catalog, which allows the user to more easily discover personal
data.
You can categorize data assets that contain personal information during registration with a tag
that distinguishes them as such. You can tag customer and employee personal data separately,
too. For example, tag “name,” “Social Security number,” “ID number,” and any others as
“customer personal data,” “employee personal information,” or “sensitive customer data.”
Then they’ll be discoverable with a Data Catalog search. Tags are not preset. You can use any tag
name you want.
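Before registering a large source, it can help to draft a tagging plan offline. The following is a minimal sketch, assuming column names give a reasonable hint about content. The hint list and the `suggest_tags` helper are hypothetical and are not part of Azure Data Catalog. The tag text follows the free-form examples above.

```python
# Hypothetical substrings that suggest a column holds personal data.
# Adjust this list to match your own schemas.
PERSONAL_FIELD_HINTS = ("name", "social security", "ssn", "id number", "email")


def suggest_tags(columns, audience):
    """Return {column: tag} for columns whose names look like personal data.

    `audience` is a free-form label such as "customer" or "employee", so the
    two groups can be tagged separately.
    """
    tag = f"{audience} personal data"
    plan = {}
    for column in columns:
        # Normalize so "Social_Security_Number" matches "social security".
        normalized = column.lower().replace("_", " ")
        if any(hint in normalized for hint in PERSONAL_FIELD_HINTS):
            plan[column] = tag
    return plan


customer_plan = suggest_tags(
    ["Name", "Social_Security_Number", "OrderTotal"], "customer"
)
print(customer_plan)
```

Substring matching on names is deliberately crude and will produce false positives and misses, so a plan like this is a starting point for manual review rather than a classification result.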
To learn how to register your data assets, follow the instructions in the Register data assets
section of the tutorial.
For more information about registering, discovering, annotating, and searching data in Azure
Data Catalog, see the how-to page Register data sources in Azure Data Catalog. It is part of
the service's larger documentation site, where the full tutorial can be found under Get
Started with Azure Data Catalog.
Register data assets (tutorial section): https://docs.microsoft.com/azure/data-catalog/data-catalog-get-started#register-data-assets
Register data sources in Azure Data Catalog: https://docs.microsoft.com/azure/data-catalog/data-catalog-how-to-register
Get Started with Azure Data Catalog: https://docs.microsoft.com/azure/data-catalog/data-catalog-get-started
Once you’ve registered your data sources/assets/objects, you can further tag (annotate) them
and discover/search for them.
2. Annotate data sources/assets
When registering your data source/assets in step 1, you have a chance to add tags to help
categorize and identify data objects. The annotate data steps show you how to do this after
your data source/assets are registered.
The tutorial shows you how to tag data assets, but doesn’t specifically discuss personal data.
You can use a data tag like “customer personal data,” “employee personal information,” or
“sensitive customer data” to identify all fields that contain personal data, such as “name,”
“Social Security number,” “ID number,” and others. You can also add tags for experts, users, or
glossary items, or add tags or descriptions at the column level.
In addition, you can add information that shows users how to request access to the data
source/asset and documentation for your assets.
To learn how to annotate/tag your data assets, follow the instructions in the Annotate data
assets section of the tutorial.
For more information, visit How to annotate data sources.
3. Discover/search for data sources/assets
Personal data assets can be discovered in Azure Data Catalog through searching and filtering.
Basic search will match terms and annotations (tags), and filtering allows you to choose tags,
source type, and other specific identifiers to complement the basic search.
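The interplay of basic search and filtering can be sketched locally as follows. This is an in-memory illustration of the idea only, not how the Data Catalog portal or service is implemented. The `Asset` record, the `search` function, and the sample assets are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical stand-in for a registered asset's metadata."""
    name: str
    source_type: str
    tags: set = field(default_factory=set)


def search(assets, term, required_tag=None, source_type=None):
    """Match `term` against names and tags, then narrow with optional filters."""
    term = term.lower()
    results = []
    for asset in assets:
        searchable = [asset.name.lower()] + [t.lower() for t in asset.tags]
        if not any(term in text for text in searchable):
            continue  # basic search: the term must appear in a name or tag
        if required_tag is not None and required_tag not in asset.tags:
            continue  # filter: exact tag
        if source_type is not None and asset.source_type != source_type:
            continue  # filter: source type
        results.append(asset)
    return results


sample_assets = [
    Asset("Customers", "SQL Server Table", {"customer personal data"}),
    Asset("Employees", "SQL Server Table", {"employee personal information"}),
    Asset("Products", "SQL Server Table"),
    Asset("CustomerExport", "Azure Blob", {"customer personal data"}),
]

hits = search(sample_assets, "personal", required_tag="customer personal data")
print([asset.name for asset in hits])
```

The broad term finds every asset annotated as personal data, and the tag filter then separates customer assets from employee assets, which is why tagging the two groups separately at registration pays off.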
Annotate data assets (tutorial section): https://docs.microsoft.com/azure/data-catalog/data-catalog-get-started#annotate-data-assets
How to annotate data sources: https://docs.microsoft.com/azure/data-catalog/data-catalog-how-to-annotate
To learn how to discover data, follow the instructions in the Discover data assets section of the
tutorial. You can find personal data by doing a search f