the data rich and information poor
TRANSCRIPT
The Data Rich and Information Poor: Retention, Technology, Metrics, & IG
Presenter: John Cofrancesco
Date: 5/1/2019
Private & Confidential
Copyright ©2015 Active Navigation
Information Governance vs Records Management
➢ The nature of our business has changed in the last 24 months
• Has anyone asked you about cybersecurity?
• Have you been dealing more in completed records or your S-drive?
• Does your IT staff include you in data planning?
In records management we will always be an afterthought with limited budgets and resources.
The progression of our information economy makes IG the unavoidable future and the people and
organizations that put themselves at the center of it will reap the rewards.
Big numbers are scary -- but do they mean anything?
1. Data volume is exploding: more data has been created in the past two years than in the entire history of the human race.
2. Our accumulated digital universe of data will grow from 4.4 zettabytes today to around 44 zettabytes, or 44 trillion gigabytes next year.
3. We are seeing a massive growth in video and photo data, where every minute up to 300 hours of video are uploaded to YouTube alone.
OR
1. Our data volume is exploding and the cost of our file shares is growing by Some Number a Year
2. Our accumulated digital universe will increase our potential e-discovery costs by Some Number a Year
3. We are seeing massive growth in the number of systems we use, and the cost of those new systems, in addition to legacy systems, is Some Number a Year
People care about the things that they can use to connect to their world. Empower your IG program by
focusing on what concerns your leadership.
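One way to turn "Some Number a Year" into a figure leadership can react to is a simple compound-growth projection. The sketch below is only an illustration: the function name `project_annual_cost` and every input (starting volume, growth rate, cost per terabyte) are hypothetical placeholders, not figures from this presentation, and should be replaced with your organization's own numbers.

```python
def project_annual_cost(current_tb, growth_rate, cost_per_tb_year, years):
    """Project file-share volume and yearly cost under compound growth.

    All inputs are assumptions supplied by the caller:
      current_tb        - volume stored today, in terabytes
      growth_rate       - yearly growth as a fraction (0.5 means 50%)
      cost_per_tb_year  - fully loaded yearly cost of one terabyte
      years             - number of years to project
    Returns a list of (year, volume_tb, annual_cost) tuples.
    """
    rows = []
    volume = current_tb
    for year in range(1, years + 1):
        volume *= 1 + growth_rate  # compound the volume once per year
        rows.append((year, round(volume, 2), round(volume * cost_per_tb_year, 2)))
    return rows


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    for year, tb, cost in project_annual_cost(100, 0.5, 1000, 3):
        print(f"Year {year}: {tb} TB, {cost} per year")
```

The same shape of calculation can be reused for e-discovery or per-system costs by swapping in a different unit cost.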
Failure Number 1
• HP TRIM
• Train the world
• They will “do records”
• Software Stinks
• It is not like riding a bike
• They hate “doing records”
Failure Number 2
• Better “open source” software
• Build it into their process
• They won’t know they are “doing records”
• Software still stinks
• Not my process
• They found out they are “doing records”
Traditional Solutions Fall Short
• No expertise; focused on adjacent use cases (migration, eDiscovery, identity and access management)
• Heavyweight architecture optimized for different problem and does not scale
• Agents hard to deploy and maintain
• Full text indexes take too much to deploy, maintain and query
• Entire solution too costly to implement
• Insufficient decision support for confident actions; nothing gets disposed of
• Full feature set unavailable for management in place
• Data loss prevention works for data in motion
• eDiscovery requires a collection process and offline processing
• Inflexible classification engines (to support policies)
• Do not readily support customer nuances
• Cannot operate at file level
Proprietary Information of Active Navigation
Information Challenge
• Lots of data
• Legislation and regulation
• Customers’ ethical expectations
• Internal compliance
• Don’t move the data
• Cost pressure
• Malicious operators
• It should be easy
File vs Data Analytics & What is ‘Big Data?’
[Figure: grids of sample numeric data]
Data Analytics
• Compare known data
• Allows you to make averages
• Gives value to structured data
• Take action outside the data
What is ‘Big Data?’
[Figure: grids of sample numeric data]
Shirts  Brand    A-Cost
12      The Gap  $15
2       JCrew    $50
Big Data Analytics
• Compares unrelated data
• Allows you to guess at reasoning
• Gives value to huge data sets
• Take action outside the data
File Analytics
[Figure: grids of sample numeric data]
• Found all the birthdates
• Cut across the documents
• Reports their location
• Lets you action them
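The File Analytics bullets above (find the birthdates, cut across documents, report their location) can be sketched in a few lines. This is a minimal illustration, not how any particular file analysis product works. It assumes plain-text files, two common date formats, and a short keyword list; `find_birthdates` and `scan_folder` are hypothetical names chosen for this example.

```python
import re
from pathlib import Path

# Assumption: birthdates are written as M/D/YYYY or YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\b(?:\d{1,2}/\d{1,2}/\d{4}|\d{4}-\d{2}-\d{2})\b")
# Assumption: a date only counts as a birthdate if the line mentions one of these terms.
KEYWORDS = re.compile(r"\b(?:birth\s*date|date\s+of\s+birth|dob|born)\b", re.IGNORECASE)


def find_birthdates(text):
    """Return date strings found on lines that also mention a birthdate keyword."""
    hits = []
    for line in text.splitlines():
        if KEYWORDS.search(line):
            hits.extend(DATE_PATTERN.findall(line))
    return hits


def scan_folder(root):
    """Cut across every .txt file under root and report where birthdates live."""
    report = {}
    for path in Path(root).rglob("*.txt"):
        hits = find_birthdates(path.read_text(errors="ignore"))
        if hits:
            report[str(path)] = hits  # location of the file plus what was found
    return report
```

The report maps each file location to the values found in it, which is what lets a reviewer take action on the files themselves rather than on a summary statistic.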
Data breach investment gap
[Figure: investment over time]
• Time to gain entry: 3 to 6 days*
• Time to exit: 256 to 388 days*
• 250 days*
• Free play time!
* Ponemon Institute, 2015 Cost of Data Breach Study: Global Analysis
Solution Comparison vs Best Approach
Best Approach Needs: Low deployment footprint for supportability and appropriate global investment
• FA Approach: 2-3% of content footprint
• eDiscovery: >20% of content footprint
• Identity and Access Mgt: Does not scale globally
Best Approach Needs: Manage in place; migration or replication is not an option
• FA Approach: Designed for all actions in place
• eDiscovery: Take a copy for offline processing
• Identity and Access Mgt: Does not scale globally
Best Approach Needs: Taking action is HARD; experience and solution designed for the job
• FA Approach: Charting and review against policies
• eDiscovery: Review on single matter only
• Identity and Access Mgt: Poor decision support for action
Best Approach Needs: Flexible credentials for complex permissions environment
• FA Approach: Fully customizable credentials mgt
• eDiscovery: Take a copy for offline processing
• Identity and Access Mgt: Elevated service accounts required
Best Approach Needs: Adaptable engine to meet specific and regional peculiarities
• FA Approach: Fully customizable across all locales
• eDiscovery: Review on single matter only
• Identity and Access Mgt: Applicable only for sensitive data cases
Best Approach Needs: Decision support and review environment which connects SMEs, visually, to their data
• FA Approach: Charting and review against policies
• eDiscovery: Not available for SME review
• Identity and Access Mgt: Poor decision support for action
Best Approach Needs: Ability to roll up and project progress across entire deployment
• FA Approach: Mgt reporting aggregates all data
• eDiscovery: No meaningful global reporting
• Identity and Access Mgt: Does not scale globally
File Analysis & File Analytics
“File analysis enables data architects, legal and security professionals,
storage managers, and business analysts to understand and manage
unstructured data stores, reduce risk and costs, and make better
information management decisions for unstructured data” – Gartner 2017
“Technology Can Do The Heavy Lifting When Unmanaged
Documents Need To Be Cleaned Up, Migrated, And
Mined For Insights” - Forrester 2017
The future belongs to those who own the data
➢ Because of the move to cloud computing, it is unlikely that ECMs will be able to function as standalone systems
➢ Google, Amazon, and Microsoft will be the only main players left in the mainstream
➢ One or perhaps two ECMs will persist for local deployment and specialty
1. Did moving the box in which your data lived really improve your organization?
2. Do you feel like you are doing something to someone rather than for someone?
3. Why not build a system around the habits of your users rather than trying to change their habits?
The box matters less than the process: building a system around process will save on costs,
improve the program, and provide opportunities to add value.
Start with a Pilot Project to Establish a Repeatable Process
Tasks (scheduled across Weeks 1 to 8):
0.0 Hold Project Kickoff Session
1.0 Modify Index Configuration and Index Content
1.1 Hold review session on metadata and rules; revise metadata in DC
1.2 Set up permissions and index pilot site content
1.3 Preview indexed content via remote session; prepare for workshops
2.0 Conduct ROT and Sensitive Data Workshops with Business Unit
2.1 Schedule workshops, communicate objectives
2.2 Conduct workshops, provide reports for reviewers to markup
2.3 Review and markup reports, return for upload
2.4 Take actions based on markups
2.5 Compile reports, prepare and deliver management presentation
2.6 Refine project process as necessary
3.0 Prepare for Migration (Optional)
4.0 Migrate Cleansed and Restructured Files (Optional)
Roll Out Repeatable Process Across BUs, Enterprise
[Gantt chart: Weeks 1 to 22]
Business Unit 1 Pilot
Business Unit 2
Business Unit 3
Business Unit 4
Business Unit 5
Business Units 6-
Who Do You Need to Kill Your File Share?
[Diagram: Information Governance Policy Body, Enterprise Core Team, Business Unit Teams]
Organizations Involved: Information Governance Committee, Enterprise Core Team, Business Unit Teams
Information Governance Committee
Membership
• Legal
• Compliance/Audit
• Risk Management
• IT / IT Security
• Chief Data Office
• Chief Admin Office
• Operations
Role
• Set charter and goals
• Set policies
• Provide resources
• Monitor progress
• Insist on results
Enterprise Core Team
Membership
• IT / IT Security
• Risk Management
• Chief Data Office
• Compliance
• Records Management
• Legal
Role
• Determine requirements
• Recommend policies
• Establish model process
• Determine architecture
• Provide infrastructure
• Manage enterprise operations
• Provide training and consultation
• Compile enterprise reports
Business Unit Teams
Membership
• IT
• Risk Management
• Records Manager
• Records Coordinators
• Operations
Role
• Implement model process
• Index content
• Prepare reports
• Conduct workshops
• Apply policies to cleanse files
• Tailor metadata as necessary
• Organize, tag, migrate content
• Monitor policy compliance
Approach recommended by Gartner and Forrester
Best Practice Work Flow
Data Discovery
Inventory content in target repositories:
• File shares
• SharePoint
• ECM
• Cloud
• Exchange
• Google Drive
• Etc.
Data Cleansing
• Identify ROT and cleanse or quarantine
• Identify duplicates and cleanse
• Identify sensitive data and cleanse or secure
Data Modeling
• Develop rules for categorizing files into records schedules, knowledge sharing taxonomies, etc.
• Apply rules to auto-categorize content
• Review results and refine taxonomies and rules
Metadata Tagging
• Develop metadata fields based on taxonomies and rules
• Auto-tag content for RM and KM
Migration
• Configure structure and metadata in destination repository (e.g., SharePoint, ECM)
• Map structure and metadata between Discovery Center and destination
• Migrate files and metadata
• Monitor policy compliance
Approach recommended by Gartner, Forrester and supported by Discovery Center
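The Data Modeling step in this work flow (develop rules, apply them to auto-categorize content, then review and refine) can be illustrated with a small rule table. This is a hedged sketch only: the patterns, the `categorize` function, and the category names are hypothetical examples, and they do not represent the rule engine of Discovery Center or any other product.

```python
import re

# Hypothetical rules, checked in order; the first matching rule wins.
# Each rule is (category, pattern matched against the full file path).
RULES = [
    ("Employment Data",
     re.compile(r"(?<![a-z0-9])(w-?[24]|1099)(?![a-z0-9])|background[ _-]?check", re.I)),
    ("NDA",
     re.compile(r"(?<![a-z])(nda|non-?disclosure)(?![a-z])", re.I)),
    ("ROT",
     re.compile(r"\.(tmp|bak)$|~\$", re.I)),
]


def categorize(path, rules=RULES, default="Uncategorized"):
    """Return the category of the first rule whose pattern matches the path."""
    for category, pattern in rules:
        if pattern.search(path):
            return category
    return default


if __name__ == "__main__":
    for p in ["S:/HR/2016/W-2_jsmith.pdf", "S:/Legal/NDA_Acme.docx",
              "S:/Shared/agenda.docx", "S:/Shared/~$budget.xlsx"]:
        print(p, "->", categorize(p))
```

Anything that lands in "Uncategorized" is the input to the review step: reviewers mark up the results and the rule table is refined for the next pass.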
Example Data
Total: 1,074,258 files, 965.71 GB
941 GB Shared Drives, 24.71 GB SharePoint
Redundant, Obsolete, Trivial
64% of files were remediation candidates, examples:
• Temporary files
• Email archives
• User identified and aging backups
• Created >10 yrs ago
Sensitive Files: 7563
Employment Data\1099s 19
Employment Data\Background Check 5
Employment Data\W-2 or W-4 8
Financial PII\Credit Data 7
Intellectual Property\Architectural Diagrams & Documents 186
Intellectual Property\M&A Documents 11
Intellectual Property\Models and Analytics 1488
Intellectual Property\Network Diagrams & Configurations 76
Intellectual Property\Pre-Product Launch Plans 10
Intellectual Property\Software and Application Code 4746
IT Related\Password Files 49
Medical Data\Health Records 13
NDA 719
PII\Canadian SIN 178
PII\Passports 48
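As an illustration of how one of the categories above might be detected, the sketch below looks for Canadian SIN candidates. It relies on two facts about the SIN: it has nine digits, and it carries a Luhn check digit. Everything else is an assumption made for the example, including the accepted separators and the names `luhn_valid` and `find_sins`. A Luhn pass only narrows the candidates, so hits still need human review.

```python
import re

# Assumption: SINs appear as 9 digits, optionally grouped 3-3-3 with spaces or hyphens.
SIN_CANDIDATE = re.compile(r"\b(\d{3})[ -]?(\d{3})[ -]?(\d{3})\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9      # same as adding the two digits of the product
        total += d
    return total % 10 == 0


def find_sins(text):
    """Return the 9-digit candidates in text that pass the Luhn check."""
    hits = []
    for match in SIN_CANDIDATE.finditer(text):
        digits = "".join(match.groups())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Running the checksum after the pattern match removes many random nine-digit numbers (invoice numbers, partial phone numbers) that a pattern alone would flag.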
>26% were surplus file duplicates
Last accessed >5 years: 54%
Created >5 years: 73%
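Figures like the duplicate and aging percentages above can be approximated for your own file share with a short script. The sketch below is a rough illustration under stated assumptions: duplicates are files with identical SHA-256 content hashes, and "stale" uses the last-modified time as a stand-in because last-access times are often not updated by the file system. The names `file_digest` and `summarize` are hypothetical.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Hash a file's content in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def summarize(root, stale_years=5, now=None):
    """Count total files, surplus duplicates, and stale files under root."""
    now = time.time() if now is None else now
    cutoff = now - stale_years * 365 * 24 * 3600
    by_digest = defaultdict(list)
    total = stale = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        total += 1
        by_digest[file_digest(path)].append(path)
        if path.stat().st_mtime < cutoff:  # assumption: modified time approximates age
            stale += 1
    # Surplus duplicates: every copy beyond the first in each group of identical files.
    surplus = sum(len(paths) - 1 for paths in by_digest.values())
    return {"total": total, "surplus_duplicates": surplus, "stale": stale}
```

Dividing `surplus_duplicates` and `stale` by `total` gives percentages comparable in spirit to the ones on this slide, though a production tool would also handle permissions, unreadable files, and very large shares.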
Dealing with risk gives you a seat at the table, but finding value makes you a leader
➢ Odds are your organization already collects the data it needs to be more successful, but being “data rich” means nothing if you cannot rationalize that data into information.
The example to follow:
• Rio Tinto Group, a mining company, had over 100 years of collected geological data from potential mines all around the world
• Rio began mining (puns are funny!) their data for areas where old technology could not achieve valuable outcomes but new technology would allow for mines to be profitable
• Rio finds one new mine from mining its old data every year
I’ll bet you know who has a well-supported IG program!
Organization Assessment Questionnaire
1. Is my organization’s data organized to meet our business needs?
2. Do we have and follow our process for managing our data?
3. Do we monitor our data to ensure it meets our changing requirements?
4. Does management understand both the risk and the value of our data?
5. Is our data used as a source for creating value?