
Brian Reid, MVP, MCM, MCSM, VTSP, etc.
C7 Solutions and NB Consulting UK

Extending Data Loss Prevention For Your Business

EDC401

DLP Intro

DLP Components

Data classifications: built-in and uploaded XML templates that define the type of data to look for in messages.

DLP policy templates and collections: groups of transport rules that can be enabled or disabled together and that collectively provide the checks and conditions for DLP.

Transport rules: the rules that control mail flow and set the restrictions should a message contain matching DLP content.

Policy tips: messages in Outlook 2013 or OWA/OWA for Devices that inform the user that the message should have limited distribution.

Text extraction engine: a component of Exchange Server that scans every message, looking for data that matches the data classifications referenced in the transport rules.

Reporting and enforcing: charting and reporting on the number of hits a DLP rule has found, and incident reporting, which copies the message and/or properties of the message for audit purposes. Charting is only available in Office 365.

Creating DLP solutions for your business

Using out-of-the-box templates: mainly for detecting financial and personally identifiable information (PII).

Writing your own:
1. Create a data classification (an XML template that describes your private data).
2. Upload that template to Exchange Server 2013 or Exchange Online.
3. Create DLP policies that use your new data classifications.

Purchasing from a third-party vendor.

Data Classifications

Process to create a data classification

1. Collect a suitable set of documents to classify. The documents should contain a known set of patterns and evidence of the classification.
2. Determine how confident you are that each document describes the classification.
3. Create the classification and rules as required.
4. Test your document set against your classification and rules to see if it works, and adjust as required. Documents should pass the classification test as expected by getting a score greater than the confidence threshold.

Determine the confidence level for a pattern or evidence

Confidence Level = True Positives / (True Positives + False Positives)

Therefore, if a classification rule has 5 test sets, of which 4 are expected to match and do match, and 1 is not expected to match but does match, then a confidence level of 4/(4+1) = 80% should be set in the classification.
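The same arithmetic can be scripted when the test set grows. This is a minimal Python sketch, assuming each test set is recorded as a hypothetical (expected_to_match, did_match) pair; the function names are illustrative and not part of Exchange.

```python
def confidence_level(true_positives: int, false_positives: int) -> float:
    """Confidence Level = True Positives / (True Positives + False Positives)."""
    matched = true_positives + false_positives
    if matched == 0:
        raise ValueError("nothing matched, so the confidence level is undefined")
    return true_positives / matched


def confidence_from_tests(results: list[tuple[bool, bool]]) -> float:
    """results holds one hypothetical (expected_to_match, did_match) pair per test set."""
    tp = sum(1 for expected, matched in results if expected and matched)
    fp = sum(1 for expected, matched in results if not expected and matched)
    return confidence_level(tp, fp)


# Worked example from the slide: 4 test sets expected to match and do,
# 1 test set not expected to match but does.
tests = [(True, True)] * 4 + [(False, True)]
print(f"{confidence_from_tests(tests):.0%}")  # prints 80%
```

Test sets that were expected to match but did not (false negatives) do not change this figure, so it measures how trustworthy a match is rather than how many sensitive documents are caught.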

Example documents

Authoring data classifications

Prepare to create a data classification XML file:
1. Collect content that represents restricted data and data that should not be restricted.
2. Determine the rules that identify the data to be classified and the level of confidence of the match.
3. Check the document set against the rules created later to prove that the rules work.

Determine the rule type:
Entity rules are based on pattern matching and a count of the pattern within the content, typically for well-defined content (credit cards, social security numbers, etc.).
Affinity rules are based on the probability that the content contains some evidence of the data classification. Evidence is an aggregation of required matches within a certain proximity.

Creating a data classification XML file: the steps to build the file follow. Top tip: use a proper XML editor, not Notepad.

Authoring data classifications

Creating a data classification XML file. You will need the following:
1. GUIDs for the IDs you will create.
2. Text strings for the classification name and description, entity name, affinity name, and localized versions if required.
3. IDs in existing data classifications to associate this classification with.
4. Keywords or regexes to add as possible evidence of a sensitive information detection.
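Any GUID generator will do for the IDs. As a hedged sketch, Python's standard uuid module produces random version-4 GUIDs; the ID labels below are hypothetical placeholders for the IDs the rule package needs.

```python
import uuid

# Hypothetical labels for the IDs a basic rule package needs; each gets its own GUID.
ID_NAMES = ("RulePack", "Publisher", "Entity", "Affinity")


def new_ids(names=ID_NAMES) -> dict[str, str]:
    """Return a fresh, random (version 4) GUID string for each name."""
    return {name: str(uuid.uuid4()) for name in names}


for name, guid in new_ids().items():
    print(f"{name}: {guid}")
```

In the Exchange Management Shell, `[guid]::NewGuid()` produces the same kind of value.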

Example data classification (basic layout)

<RulePackage>
  <RulePack>
    <Details> ... </Details>
  </RulePack>
  <Rules>
    Entity / Affinity elements
    Keywords / Regex elements
    LocalizedStrings element
  </Rules>
</RulePackage>

Example data classification (Entity)

<Rules>
  <Entity id="guid" patternsProximity="..." recommendedConfidence="...">
    <Pattern confidenceLevel="...">
      <IdMatch idRef="..." />
      <Any minMatches="...">
        <Match idRef="..." />
      </Any>
    </Pattern>
    <Pattern> ... </Pattern>
  </Entity>
</Rules>

id: the GUID used to identify this Entity, unique amongst all DLP objects. The same GUID is referenced from LocalizedStrings to tie the Entity to its name and description.

patternsProximity: the number of characters either side of the pattern that are scanned for additional corroborative evidence.

confidenceLevel and recommendedConfidence: DLP rules have a confidence value. The confidenceLevel of the matching Pattern(s) must match or exceed the Entity's recommendedConfidence for this Entity to be detected.

IdMatch: its idRef points to the part of the file that defines what the pattern actually looks like (a keyword or regex). There must be one IdMatch per Pattern, and the count of Pattern matches is used when DLP rules are evaluated.

Match: an Entity can have one or more Match elements. They describe corroborative evidence that indicates a good hit for the IdMatch.

Pattern: there can be multiple Pattern elements per Entity, each with a different confidence level and possibly a minMatches requirement on an Any group of matches.

Entity pattern matching examples

…X X N I N O A B 2 3 4 5 6 7 A X X …

The keyword match (NINO) is within the proximity window (patternsProximity=300) around the IdMatch (AB234567A).

A second example, with four NINO matches and Address, Name and Date evidence nearby:
NINO1: Address is within the NINO1 proximity window.
NINO2: no evidence in the NINO2 proximity window.
NINO3: Name is not completely within the NINO3 proximity window.
NINO4: Name and Date are in the NINO4 proximity window.
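The NINO examples can be approximated in a few lines of Python. This is a hedged sketch, not the Exchange text extraction engine: the NINO regex and the keyword list are hypothetical simplifications, and the window is simply patternsProximity characters either side of each IdMatch.

```python
import re

# Hypothetical, simplified NINO pattern (two letters, six digits, suffix A-D);
# a real classification would use a stricter regex.
NINO = re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b")
# Hypothetical corroborative keywords, compared case-insensitively.
KEYWORDS = ("nino", "national insurance")


def nino_hits(text: str, proximity: int = 300) -> list[tuple[str, list[str]]]:
    """For each IdMatch, list the keywords found within `proximity` characters
    either side of it (the patternsProximity window)."""
    hits = []
    for m in NINO.finditer(text):
        window = text[max(0, m.start() - proximity): m.end() + proximity].lower()
        evidence = [kw for kw in KEYWORDS if kw in window]
        hits.append((m.group(), evidence))
    return hits


sample = "NINO AB234567A " + "x" * 400 + " CD345678B"
for value, evidence in nino_hits(sample):
    print(value, evidence or "no evidence in window")
```

Because the window is a plain character slice, a keyword that straddles the window edge is not counted, which mirrors the Name evidence that is only partly inside the NINO3 window.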

Entity confidence level

$$\text{Confidence Level}(\text{Entity}) = 1 - \prod_{i=1}^{k} \bigl(1 - \text{Confidence Level}(\text{Pattern}_i)\bigr)$$
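The formula can be checked numerically. This Python sketch assumes confidence levels are written as fractions (0.75 for confidenceLevel=75), that only the patterns which matched are combined, and that the result is compared with recommendedConfidence as described for the Entity element; the example values are hypothetical.

```python
from math import prod


def entity_confidence(pattern_levels: list[float]) -> float:
    """1 - product over matched patterns of (1 - pattern confidence level).
    Levels are fractions, e.g. 0.75 for confidenceLevel=75."""
    return 1 - prod(1 - level for level in pattern_levels)


def entity_detected(pattern_levels: list[float], recommended_confidence: float) -> bool:
    """Detected when the combined level matches or exceeds recommendedConfidence."""
    return entity_confidence(pattern_levels) >= recommended_confidence


# Hypothetical values: two matched patterns at 75 and 60, recommendedConfidence=85.
levels = [0.75, 0.60]
print(f"{entity_confidence(levels):.0%}", entity_detected(levels, 0.85))  # prints: 90% True
```

Each extra matching pattern can only raise the combined level, and a document with no matching pattern scores 0.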

Example data classification (Affinity)

<Rules>
  <Affinity id="guid" evidencesProximity="..." ...>
    <Evidence confidenceLevel="...">
      <Any minMatches="2" maxMatches="2">
        <Match idRef="..." />
        ...
      </Any>
    </Evidence>
    ...
  </Affinity>
</Rules>

Affinity rules are targeted towards content without well-defined identifiers (e.g. Sarbanes-Oxley).

Affinity rules look for a collection of evidence. No count is returned, only a confidence level.

Affinity content is a collection of Evidence elements within an evidencesProximity window, with a minimum confidence level.

Note the use of minMatches and maxMatches set to the same value: two, and only two, matches are allowed.
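The effect of setting minMatches and maxMatches to the same value can be modelled with a small Python function. This is a sketch of the counting rule only, with a hypothetical input (one flag per child Match element); the default of "at least one, no upper limit" when the attributes are omitted is an assumption.

```python
from typing import Optional


def any_group_satisfied(found: list[bool], min_matches: int = 1,
                        max_matches: Optional[int] = None) -> bool:
    """Model an <Any minMatches=... maxMatches=...> group.

    found: one flag per child <Match> element, True if that evidence was found.
    Assumption: with no maxMatches given, any number above the minimum is accepted.
    """
    count = sum(found)
    if count < min_matches:
        return False
    return max_matches is None or count <= max_matches


# minMatches=2 and maxMatches=2: two, and only two, of the children may match.
for found in ([True, False, False], [True, True, False], [True, True, True]):
    print(found, any_group_satisfied(found, min_matches=2, max_matches=2))
```

With both attributes set to 2, the three-match case fails just like the one-match case.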

Affinity proximity window

…S E C 5 0 D R A F T Q 1 F Y 1 3 L O S S …

The proximity window (evidencesProximity=600) covers the evidence of an SEC filing (confidenceLevel=80) and the evidences of an upcoming quarterly report.

In this example the proximity window contains three matches, so each is considered in the affinity confidence formula:

1 - [(1 - 0.80) × (1 - 0.40) × (1 - 0.40)] = 92.8%

Affinity proximity window

…S E C 5 0 D R A F T Q 1 F Y 1 3 L O S S …

Sliding the window along does not change the window size (evidencesProximity=600), but here it reduces the confidence value, because the SEC filing evidence (0.80) is no longer inside the window and three 0.40 evidences are:

1 - [(1 - 0.40) × (1 - 0.40) × (1 - 0.40)] = 78.4%

Affinity confidence level
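The two worked examples can be reproduced with a small sliding-window sketch in Python. The character offsets of the evidences are hypothetical (the slides do not give them), and the windowing is a simplified model rather than the text extraction engine's actual algorithm.

```python
from math import prod


def combined_confidence(levels: list[float]) -> float:
    """1 - product of (1 - level), as in the two worked examples."""
    return 1 - prod(1 - level for level in levels)


def window_confidence(hits: list[tuple[int, float]], start: int, size: int) -> float:
    """Combine the evidence hits whose offset falls in [start, start + size)."""
    return combined_confidence([level for offset, level in hits if start <= offset < start + size])


def best_window(hits: list[tuple[int, float]], size: int = 600) -> float:
    """Best confidence of any window; starting a window at each hit is enough,
    because shifting a window right until its left edge meets a hit never drops a hit."""
    return max((window_confidence(hits, start, size) for start, _ in hits), default=0.0)


# Hypothetical offsets: SEC filing evidence (0.80), then three quarterly-report evidences (0.40).
hits = [(0, 0.80), (200, 0.40), (400, 0.40), (700, 0.40)]
print(round(window_confidence(hits, 0, 600), 3))    # 0.928 (92.8%)
print(round(window_confidence(hits, 200, 600), 3))  # 0.784 (78.4%)
```

The window size stays fixed at 600; only the set of evidences inside it changes as it slides.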

Data classification patterns (keywords)

<Rules>
  <Entity [or Affinity]>
    <IdMatch idRef="KwCreditCard" />
    ...
  </Entity>
  <Keyword id="KwCreditCard">
    <Group> ... </Group>
  </Keyword>
  ...
</Rules>

word = look for words in a sentence; string = case-sensitive, sub-string matches.

Term = terms defined in the data classification.

Dictionary = a reference to an external file.
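The difference between the two match styles is easiest to see side by side. This Python sketch is a hypothetical model, not Exchange's matcher: "string" follows the slide (case-sensitive sub-string), while treating "word" as a case-insensitive whole-word match is an assumption.

```python
import re


def term_matches(text: str, term: str, match_style: str = "word") -> bool:
    """Hypothetical model of the two keyword match styles on the slide."""
    if match_style == "word":
        # Assumption: a whole-word match, ignoring case.
        return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None
    if match_style == "string":
        # Per the slide: a case-sensitive sub-string match.
        return term in text
    raise ValueError(f"unknown match style: {match_style!r}")


print(term_matches("Your Card number is below", "card"))   # True
print(term_matches("Paid by Mastercard", "card"))           # False
print(term_matches("Paid by Mastercard", "card", "string"))  # True
print(term_matches("PAID BY MASTERCARD", "card", "string"))  # False
```

A sub-string match catches terms embedded in longer words, which widens detection but can also raise false positives.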

Data classification patterns (Regex)

<Rules>
  <Entity> ... </Entity>
  <Regex id="RxRest">[A-HKM-NPR-TW-Z]{3}\d{4}D\d{4}[EGK]</Regex>
  <Regex id="RxPriv">[A-HKM-NPR-TW-Z]{3}\d{4}D\d{4}[P]</Regex>
  ...
</Rules>
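The two Regex elements can be tried against sample text before uploading. This sketch uses Python's re module with the patterns copied unchanged; the sample identifiers are made up, and Python's regex dialect may differ in edge cases from the engine Exchange uses.

```python
import re

# Patterns copied from the <Regex> elements above.
RX_REST = re.compile(r"[A-HKM-NPR-TW-Z]{3}\d{4}D\d{4}[EGK]")
RX_PRIV = re.compile(r"[A-HKM-NPR-TW-Z]{3}\d{4}D\d{4}[P]")


def classify(text: str) -> dict[str, list[str]]:
    """Return every match of each pattern, keyed by the Regex id."""
    return {
        "RxRest": RX_REST.findall(text),
        "RxPriv": RX_PRIV.findall(text),
    }


# Made-up sample values.
sample = "Ref ABC1234D5678E and XYZ0001D0002P; ABI1234D5678E is not valid."
print(classify(sample))  # {'RxRest': ['ABC1234D5678E'], 'RxPriv': ['XYZ0001D0002P']}
```

The leading character class excludes I, J, L, O, Q, U and V, which is why ABI1234D5678E is not matched.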

Data classification patterns (localized)

<Rules>
  ...
  <LocalizedStrings>
    <Resource idRef="guid">
      <Name langcode="en-gb" default="true">entity name en-gb</Name>
      <Description langcode="en-gb" default="true">description</Description>
    </Resource>
  </LocalizedStrings>
</Rules>

The idRef value is the GUID from the Entity or Affinity, repeated here to tie the name and description to the correct resource.

Name and Description are repeated for each language group/locale you need. Each must have one default value; the rest are optional.
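A quick check of the "one default value" rule can be scripted with Python's xml.etree.ElementTree. This is a sketch under stated assumptions: the snippet omits the XML namespace a real rule package declares, the en-us entry is a hypothetical extra locale, and falling back to the default Name for an unlisted locale is assumed behaviour.

```python
import xml.etree.ElementTree as ET

# Simplified snippet based on the slide (no XML namespace); en-us is a hypothetical extra locale.
SNIPPET = """
<LocalizedStrings>
  <Resource idRef="guid">
    <Name langcode="en-gb" default="true">entity name en-gb</Name>
    <Name langcode="en-us">entity name en-us</Name>
    <Description langcode="en-gb" default="true">description</Description>
  </Resource>
</LocalizedStrings>
""".strip()


def resource_name(resource: ET.Element, langcode: str) -> str:
    """Pick the Name for langcode, falling back to the single default Name (assumed behaviour)."""
    names = resource.findall("Name")
    defaults = [n for n in names if n.get("default") == "true"]
    if len(defaults) != 1:
        raise ValueError("each Resource needs exactly one default Name")
    for name in names:
        if name.get("langcode", "").lower() == langcode.lower():
            return name.text or ""
    return defaults[0].text or ""


resource = ET.fromstring(SNIPPET).find("Resource")
print(resource.get("idRef"), resource_name(resource, "en-us"), resource_name(resource, "fr-fr"), sep=" | ")
```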

Example data classification (Rule Package)

<RulePackage>
  <RulePack id="guid1">
    <Version major="1" minor="0" build="0" revision="0" />
    <Publisher id="guid2" />
    <Details defaultLangCode="en-gb">
      <LocalizedDetails> ... </LocalizedDetails>
    </Details>
  </RulePack>
  <Rules> ... </Rules>
</RulePackage>

Example XML file

See the notes of this slide or download from http://bit.ly/mecdlp

Importing data classifications

Import the data classification:
New-ClassificationRuleCollection -FileData ([Byte[]]$(Get-Content -Path "C:\temp\DLP\ContosoPharma.xml" -Encoding Byte -ReadCount 0))

Confirm that the import succeeded:
Get-DataClassification [-Identity <DataClassificationIdParameter>]

DLP Policies

So what are DLP policies?

A collection of transport rules, created via DLP policies in the EAC or by using the -DlpPolicy parameter in the EMS.

Different policy rules are needed for different conditions. For example, the first rule needs to allow for overrides, or for groups of users for whom the rule will not fire: block UK National Insurance Numbers (UK PII) from being emailed externally, unless the email contains exactly one hit and the sender is a member of Human Resources.

1. If the sender is a member of Human Resources, and the recipient is located outside the organization, and the message contains UK National Insurance Number (NINO) (minimum count=1; maximum count=1), then set the message header X-Ms-Exchange-Organization-Dlp-SenderOverrideJustification to the value "TransportRule override" and stop processing further rules.

2. If the recipient is located outside the organization and the message contains UK National Insurance Number (NINO) (minimum count=any; maximum count=any), then notify the sender with a policy tip that blocks the message but allows the sender to override and send.
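The order of the two rules matters: rule 1 has to run first and stop processing, otherwise rule 2 would also fire for the Human Resources exception. A minimal Python model of that decision logic follows; the Message fields and return strings are hypothetical, and no Exchange API is involved.

```python
from dataclasses import dataclass, field

OVERRIDE_HEADER = "X-Ms-Exchange-Organization-Dlp-SenderOverrideJustification"


@dataclass
class Message:
    """Hypothetical, simplified view of a message as the two rules see it."""
    sender_in_hr: bool
    recipient_outside_org: bool
    nino_count: int
    headers: dict[str, str] = field(default_factory=dict)


def apply_nino_rules(msg: Message) -> str:
    """Evaluate rule 1, then rule 2, in order; rule 1 stops further processing."""
    # Rule 1: HR sender, external recipient, exactly one NINO.
    if msg.sender_in_hr and msg.recipient_outside_org and msg.nino_count == 1:
        msg.headers[OVERRIDE_HEADER] = "TransportRule override"
        return "override header set; stop processing"
    # Rule 2: external recipient and any number of NINOs.
    if msg.recipient_outside_org and msg.nino_count >= 1:
        return "policy tip: block, sender may override"
    return "no DLP action"


hr_single = Message(sender_in_hr=True, recipient_outside_org=True, nino_count=1)
print(apply_nino_rules(hr_single), hr_single.headers)
print(apply_nino_rules(Message(sender_in_hr=True, recipient_outside_org=True, nino_count=3)))
```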

Creating DLP policies

Uploading policies, rules and classifications: as well as the data classification discussed above, the DLP policy rules can be imported too*:
Import-DlpPolicyCollection -FileData ([Byte[]]$(Get-Content -Path "C:\temp\DLP\policycollection\myPolicy.xml" -Encoding Byte -ReadCount 0))
Or use EAC > compliance management > data loss prevention > + Import DLP Policy.

Using the Exchange Management Shell:
Import-DataClassification
New-DlpPolicy
New-TransportRule -DlpPolicy <GroupNameForPolicy>

* The TechNet documentation on this is currently incorrect! See http://bit.ly/mecdlp or the notes in this deck for a working example.

DLP Rules

Create via compliance > data loss prevention: this creates DLP rules grouped by policy and only allows the creation of DLP-type rules. Rule defaults are set in the product or by a DLP policy template that you have imported.

Create via mail flow > rules: create any rule, including those that look for messages that contain sensitive information and notify the sender via a Policy Tip (the DLP rules).

Customise rules: use any supported method of modifying the rule to add other conditions, actions and exceptions as required. Managing DLP rules via compliance > data loss prevention makes matching changes across all the rules in the DLP policy collection.

Ensure IFilters are installed for any additional attachment types that you want to scan.

TransportRuleAttachmentTextScanLimit (a TransportConfig setting) sets the amount of text after which scanning stops. By default, only the first 150K of text in an email is scanned.

Backing up your policies and classifications

Backing up your DLP policies: use Export-DlpPolicyCollection, and Import-DlpPolicyCollection to restore the data (note that this removes all existing policies).

Backing up your data classification: keep a copy of the XML file you used to import the classification. The classification is imported into Active Directory, so the copy is only needed to transfer classifications from lab to production forests or in multi-forest Exchange deployments.

Extending DLP for your private data

Other sessions to listen to:
EDC.204 Data Loss Prevention (DLP) in Exchange, Outlook, and OWA
EDC.302 Advanced Data Loss Prevention (DLP) in Exchange


© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
