

Post on 22-May-2020






    Discover, identify, and classify personal data in Microsoft Azure

    Personal data discovery, identification, and classification are essential to a successful security,

    governance, compliance, and personal data privacy strategy. Azure customers who collect data from

    their users must be able to identify personal data and understand where it’s located in order to keep it


    Azure provides a rich diversity of data storage possibilities and multiple tools that can help customers

    identify, classify, and search for personal data in their Azure environments, hosted applications, and

    external sources.

    This article provides guidance on how to discover, identify, and classify personal data in several Azure

    tools and services, including using Azure Data Catalog, Azure Active Directory, SQL Database, Power

    Query for Hadoop clusters in Azure HDInsight, Azure Information Protection, Azure Search, and SQL

    queries for Azure Cosmos DB.

    Scenario

    A U.S.-based sports company collects a variety of personal and other data from their customers and

    employees, maintains it in multiple databases, and stores it in several different locations in their Azure

    environment. In addition to selling sports equipment, they also host and manage registration for elite

    athletic events around the world, including in the EU.

    Since the company hosts many international bicycling tours every year and has contingent staff in

    locations around the globe, a couple of the data sets are quite large. The company also has developer-

    built applications that are used by both customers and employees.

    Problem statement

    The company wants to address the following issues:

    • Customer and employee personal data must be classified/distinguished from the other data the company collects in order to ensure proper access and security.

    • The data admin needs to easily discover the location of customer personal data across various areas of the Azure environment.

    • Customer and employee personal data that appears in shared documents and email communications must be classified and labelled to help ensure that it’s kept secure.

    • The company’s app developers need a way to easily search for customer and employee personal data in their web and mobile apps.

    • Developers also need to query their document database for personal data.

    Company goals

    • Data sources and assets that include personal data must be registered so they can be tagged/annotated and searched in Azure Data Catalog.

    • All customer and employee personal data must be tagged/annotated in Azure Data Catalog so it can be found easily. Ideally customer and employee personal data are tagged/annotated


    • Personal data from customer and employee user profiles and work information residing in Azure Active Directory must be easily located.

    • Personal data residing in multiple SQL databases must be easily queried.

    • Some of the company’s large data sets are managed through Azure HDInsight and stored in Hadoop. They must be imported into Excel so they can be queried for personal data.

    • Personal data shared in documents and email communications must be classified, labelled, and kept secure with Azure Information Protection.

    • The company’s app developers must be able to discover customer and employee personal data in the apps they’ve built, which they can do with Azure Search.

    • Developers must be able to find personal data in their document database.

    Solutions

    The following Azure tools can help you with personal data identification, classification, and discovery.

    Azure Data Catalog: data classification, annotation, and discovery

    Azure Data Catalog is a metadata catalog that helps enterprise organizations manage and track data

    sources/assets. The first step is to register them. The next step is to classify all personal data and tag or

    annotate it so it’s easier to find. Finally, you can discover personal data through searching and filtering.

    Once you’ve located your data, you can use its location to connect to it with the application or tool of

    your choice, such as Excel or SQL Server Management Studio.

    In order to use the catalog, you must be the owner or co-owner of an Azure subscription and you must

    be signed in with an Azure Active Directory user account.

    Note: You can only have one data catalog per organization/Azure Active Directory domain.

    Data can be classified, annotated and discovered in Azure Data Catalog either manually or through a


    How do I manually register, tag/annotate, and discover/search personal data sources, assets,

    and objects?

    The following steps are an overview of how to register, annotate, and discover/search for data in Azure

    Data Catalog. The links in these steps take you to an Azure Data Catalog tutorial with exercises that

    provide more specific guidance. The exercises focus on a fictional company called AdventureWorks.

    Instructions earlier in the tutorial show you how to load the actual AdventureWorks database and

    provide detailed background information.

    You can do the exercises or just use the information as a guideline for working with your own data.

    1. Register data sources/assets

    To search for and identify personal data with Azure Data Catalog, you first need to register

    your data sources/assets. After you sign in, launch the registration tool, choose a data

    source to register, and register specific data objects. You can also add tags to help enable search.

    Once registered, the data source or asset remains in its existing location, but a copy of the

    metadata is added to Azure Data Catalog, which allows the user to more easily discover personal


    You can categorize data assets that contain personal information during registration with a tag

    that distinguishes them as such. You can tag customer and employee personal data separately,

    too. For example, tag “name,” “Social Security number,” “ID number,” and any others as

    “customer personal data,” “employee personal information,” or “sensitive customer data.”

    Then they’ll be discoverable with a Data Catalog search. Tags are not preset. You can use any tag

    name you want.
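
    The tagging idea above can be sketched in a few lines of Python. This is only an illustration of
    deciding which columns deserve a personal-data tag before you apply the tags in the catalog. It does
    not call Azure Data Catalog, and the keyword list and the function names (suggest_tag, suggest_tags)
    are assumptions made for this example. Substring matching is a rough heuristic, so review the
    suggestions before using them.

```python
# Hypothetical keyword list; adjust it to the column names in your own data.
PERSONAL_DATA_KEYWORDS = ("name", "social security", "ssn", "id number", "email", "phone")


def suggest_tag(column_name, audience):
    """Return a free-form tag for a column, or None if it does not look personal.

    audience is a label such as "customer" or "employee".
    """
    normalized = column_name.lower().replace("_", " ")
    if any(keyword in normalized for keyword in PERSONAL_DATA_KEYWORDS):
        return f"{audience} personal data"
    return None


def suggest_tags(column_names, audience):
    """Map each column that looks like personal data to its suggested tag."""
    tags = {}
    for column in column_names:
        tag = suggest_tag(column, audience)
        if tag is not None:
            tags[column] = tag
    return tags


if __name__ == "__main__":
    columns = ["FirstName", "Social_Security_Number", "OrderTotal", "ID Number"]
    for column, tag in suggest_tags(columns, "customer").items():
        print(f"{column}: {tag}")
```

    Because tags are free-form, the same helper can produce “employee personal data” by passing a
    different audience label.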

    To learn how to register your data assets, follow the instructions in the Register data assets

    section of the tutorial.

    There is also a how-to page that provides more information about registering, discovering,

    annotating and searching data in Azure Data Catalog. For more information, visit Register data

    sources in Azure Data Catalog, which is part of a larger documentation site for the service (the

    full tutorial can be found under Get Started with Azure Data Catalog on this same site).

    Once you’ve registered your data sources/assets/objects, you can further tag (annotate) them

    and discover/search for them.

    2. Annotate data sources/assets

    When registering your data source/assets in step 1, you have a chance to add tags to help

    categorize and identify data objects. The annotate data steps show you how to do this after

    your data source/assets are registered.

    The tutorial shows you how to tag data assets, but doesn’t specifically discuss personal data.

    You can use a data tag like “customer personal data,” “employee personal information,” or

    “sensitive customer data” to identify all fields that contain personal data, such as “name,”

    “Social Security number,” “ID number,” and others. You can also add tags for experts, users, or

    glossary items, or add tags or descriptions at the column level.

    In addition, you can add information that shows users how to request access to the data

    source/asset and documentation for your assets.
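
    To make the kinds of annotation described in this step concrete, the following Python sketch models
    them as a small in-memory structure: asset-level tags, experts, request-access information, and
    column-level tags and descriptions. It is not the Azure Data Catalog data model or API. The class
    and field names are assumptions chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnAnnotation:
    """Tags and a description attached to a single column."""
    tags: set = field(default_factory=set)
    description: str = ""


@dataclass
class AssetAnnotation:
    """In-memory stand-in for the annotations attached to one registered asset."""
    asset_name: str
    tags: set = field(default_factory=set)
    experts: set = field(default_factory=set)
    request_access_info: str = ""
    columns: dict = field(default_factory=dict)

    def tag_column(self, column_name, tag, description=""):
        """Add a tag (and optionally a description) at the column level."""
        annotation = self.columns.setdefault(column_name, ColumnAnnotation())
        annotation.tags.add(tag)
        if description:
            annotation.description = description

    def columns_with_tag(self, tag):
        """Return the sorted names of columns that carry the given tag."""
        return sorted(name for name, ann in self.columns.items() if tag in ann.tags)


if __name__ == "__main__":
    customers = AssetAnnotation(asset_name="Customers")
    customers.tags.add("customer personal data")
    customers.request_access_info = "Contact the data admin to request access."
    customers.tag_column("Name", "customer personal data")
    customers.tag_column("ID number", "sensitive customer data", "Internal customer identifier")
    print(customers.columns_with_tag("customer personal data"))
```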

    To learn how to annotate/tag your data assets, follow the instructions in the Annotate data

    assets section of the tutorial.

    For more information, visit How to annotate data sources.

    3. Discover/search for data sources/assets

    Personal data assets can be discovered in Azure Data Catalog through searching and filtering.

    Basic search will match terms and annotations (tags), and filtering allows you to choose tags,

    source type, and other specific identifiers to complement the basic search.
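
    The difference between basic search and filtering can be illustrated with a small Python sketch over
    an in-memory list of asset metadata. It only imitates the behavior described above and does not
    query Azure Data Catalog. The search_assets function, the dictionary keys, and the sample asset
    names and source types are all assumptions made for this example.

```python
def search_assets(assets, term="", tags=None, source_type=None):
    """Return names of assets that match a basic search term and optional filters.

    The term is matched (case-insensitively) against asset names and tags.
    The tags and source_type arguments then narrow the results, like filters.
    """
    term = term.lower()
    required_tags = set(tags or [])
    results = []
    for asset in assets:
        searchable = [asset["name"], *asset["tags"]]
        if term and not any(term in text.lower() for text in searchable):
            continue  # basic search did not match the name or any tag
        if not required_tags.issubset(asset["tags"]):
            continue  # a required tag is missing
        if source_type is not None and asset["source_type"] != source_type:
            continue  # wrong source type
        results.append(asset["name"])
    return results


# Hypothetical metadata, standing in for registered assets.
CATALOG = [
    {"name": "Customers", "source_type": "SQL Server Table",
     "tags": ["customer personal data"]},
    {"name": "Employees", "source_type": "SQL Server Table",
     "tags": ["employee personal information"]},
    {"name": "EventResults", "source_type": "Azure Blob",
     "tags": ["race results"]},
]

if __name__ == "__main__":
    print(search_assets(CATALOG, term="personal"))
    print(search_assets(CATALOG, term="personal", tags=["customer personal data"]))
```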

    To learn how to discover data, follow the instructions in the Discover data assets section of the

    tutorial. You can find personal data by doing a search f
