bigdata and privacy webinar at brighttalk
DESCRIPTION
BigData and Privacy webinar at BrighttalkTRANSCRIPT
Bridging the Gap Between Privacy and Big Data
Ulf Mattsson, CTO
Protegrity
ulf.mattsson AT protegrity.com
2
20 years with IBM
• Research & Development & Global Services
Inventor
• Encryption, Tokenization & Intrusion Prevention
Member of
• PCI Security Standards Council (PCI SSC)
• American National Standards Institute (ANSI) X9
• Encryption & Tokenization
• International Federation for Information Processing
• IFIP WG 11.3 Data and Application Security
Ulf Mattsson, CTO Protegrity
Agenda
Big Data adoption rate
Security holes and threats to data
• Privacy regulations
New data protection techniques
Best practices for protecting big data
3
4
Bridging the Gap Between Privacy and Big Data
BIG DATA ADOPTION AND HOLES
5
6
Source: Gartner
Has Your Organization Already Invested in Big Data?
In 2012, Gartner predicted Big Data to be majority adopted in 2 to 5 years.
In 2013, Gartner updated this prediction to 5 to 10 years…
Why has adoption been slower than expected?
Big Data Adoption Statistics
7
Are they investing?
Source: Gartner
Holes in Big Data…
8
Source: Gartner
9
DataMonetization
CustomerSupport
CustomerProfiles
Sales &Marketing
SocialMedia
BusinessImprovement
Big Data
Regulations& Breaches Increased
profits
Increased profits
Increased profits
Increased profits
Increased profits
Balancing Security and Data Insight
Many ways to hack Big Data
Many regulations on sensitive data
Privileged User Threats
Encryption• Data Size
• Data type
• Data format
• Performance (SLA)
Analytics
Inter-node data movement
What’s the Problem with Securing Big Data?
10
Many Ways to Hack Big Data
Source: http://nosql.mypopescu.com/post/1473423255/apache-hadoop-and-hbase
11
HDFS(Hadoop Distributed File System)
MapReduce (Job Scheduling/Execution System)
Hbase (Column DB)
Pig (Data Flow) Hive (SQL) Sqoop
ETL Tools BI Reporting RDBMS
Avr
o (S
eria
lizat
ion)
Zoo
keep
er (
Coo
rdin
atio
n)
Hackers
PrivilegedUsers
UnvettedApplications
OrAd Hoc
Processes
Big Data and The Insider Threat
12
BALANCING SECURITY AND DATA INSIGHT
13
Balancing Security and Data Insight
Tug of war between security and data insight
Big Data is designed for access
Privacy regulations require de-identification
Granular data-level protection
Traditional security don’t allow for seamless data use
14
Reduction of Pain with New Protection Techniques
15
1970 2000 2005 2010
High
Low
Pain& TCO
Strong EncryptionAES, 3DES
Format Preserving EncryptionDTP, FPE
Vault-based Tokenization
Vaultless Tokenization
Input Value: 3872 3789 1620 3675
!@#$%a^.,mhu7///&*B()_+!@
8278 2789 2990 2789
8278 2789 2990 2789
Format Preserving
Greatly reduced Key Management
No Vault
8278 2789 2990 2789
10 000 000 -
1 000 000 -
100 000 -
10 000 -
1 000 -
100 -
Transactions per second*
I
Format
Preserving
Encryption
Speed of Different Protection Methods
I
Vaultless
Data
Tokenization
I
AES CBC
Encryption
Standard
I
Vault-based
Data
Tokenization
*: Speed will depend on the configuration
16
HOW IS TOKENIZATION BRIDGING THE GAP?
17
De-Identified Sensitive Data Field Real Data Tokenized / Pseudonymized
Name Joe Smith csu wusoj
Address 100 Main Street, Pleasantville, CA 476 srta coetse, cysieondusbak, CA
Date of Birth 12/25/1966 01/02/1966
Telephone 760-278-3389 760-389-2289
E-Mail Address [email protected] [email protected]
SSN 076-39-2778 076-28-3390
CC Number 3678 2289 3907 3378 3846 2290 3371 3378
Business URL www.surferdude.com www.sheyinctao.com
Fingerprint Encrypted
Photo Encrypted
X-Ray Encrypted
Healthcare Data – Primary Care Data
Dr. visits, prescriptions, hospital stays and discharges, clinical, billing, etc.
Protection methods can be equally applied to the actual healthcare data, but not needed with de-identification
18
Research Brief
“Tokenization Gets Traction”
Aberdeen has seen a steady increase in enterprise use of tokenization
Alternative to encryption
47% using tokenization for something other than cardholder data (PII and PHI)
Tokenization users had 50% fewer security-related incidents
19 Author: Derek Brink, VP and Research Fellow, IT Security and IT GRC
Protection Granularity: Encryption of Fields
20
Production SystemsEncryption •Reversible•Policy Control (authorized / Unauthorized Access)•Lacks Integration Transparency•Complex Key Management•Example !@#$%a^.,mhu7///&*B()_+!@
Protection Granularity: Encryption vs. Masking
21
Production Systems
Non-Production Systems
Encryption of fields•Reversible•Policy Control (authorized / Unauthorized Access)•Lacks Integration Transparency•Complex Key Management•Example
Masking of fields•Not reversible•No Policy, Everyone can access the data•Integrates Transparently•No Complex Key Management•Example 0389 3778 3652 0038
!@#$%a^.,mhu7///&*B()_+!@
Protection Granularity: Tokenization of Fields
22
Production Systems
Non-Production Systems
Vaultless Tokenization / Pseudonymization
•No Complex Key Management•Business Intelligence
•Reversible •Policy Control (Authorized / Unauthorized Access)
•Not Reversible•Integrates Transparently
Credit Card: 0389 3778 3652 0038
CLASSIFY DATA
23
Financial Services
Healthcare and Pharmaceuticals
Infrastructure and Energy
Federal Government
Select US Regulations for Security and Privacy
24
Privacy Laws
54 International Privacy Laws
30 United States Privacy Laws
25
1. Classify: Examples of Sensitive Data
26
Sensitive Information Compliance Regulation / Laws
Credit Card Numbers PCI DSS
Names HIPAA, State Privacy Laws
Address HIPAA, State Privacy Laws
Dates HIPAA, State Privacy Laws
Phone Numbers HIPAA, State Privacy Laws
Personal ID Numbers HIPAA, State Privacy Laws
Personally owned property numbers HIPAA, State Privacy Laws
Personal Characteristics HIPAA, State Privacy Laws
Asset Information HIPAA, State Privacy Laws
PCI DSS Build and maintain a secure network.
1. Install and maintain a firewall configuration to protect data
2. Do not use vendor-supplied defaults for system passwords and other security parameters
Protect cardholder data. 3. Protect stored data4. Encrypt transmission of cardholder data and
sensitive information across public networks
Maintain a vulnerability management program.
5. Use and regularly update anti-virus software6. Develop and maintain secure systems and
applications
Implement strong access control measures.
7. Restrict access to data by business need-to-know
8. Assign a unique ID to each person with computer access
9. Restrict physical access to cardholder data
Regularly monitor and test networks.
10. Track and monitor all access to network resources and cardholder data
11. Regularly test security systems and processes
Maintain an information security policy.
12. Maintain a policy that addresses information security
27
Protection of cardholder data in memory
Clarification of key management dual control and split knowledge
Recommendations on making PCI DSS business-as-usual and best practices
Security policy and operational procedures added
Increased password strength
New requirements for point-of-sale terminal security
More robust requirements for penetration testing
PCI DSS 3.0
28
HIPAA PHI: List of 18 Identifiers
1. Names
2. All geographical subdivisions smaller than a State
3. All elements of dates (except year) related to individual
4. Phone numbers
5. Fax numbers
6. Electronic mail addresses
7. Social Security numbers
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
29
11. Certificate/license numbers
12. Vehicle identifiers and serial numbers
13. Device identifiers and serial numbers
14. Web Universal Resource Locators (URLs)
15. Internet Protocol (IP) address numbers
16. Biometric identifiers, including finger prints
17. Full face photographic images
18. Any other unique identifying number
Make business associates of covered entities directly liable for compliance
Strengthen the limitations on the use and disclosure of protected health information for marketing purposes
Prohibit the sale of protected health information without individual authorization.
Expand individuals’ rights to receive electronic copies of their health information
HIPAA Omnibus Final Rule – Some Examples
30
PROTECT DATA
31
Protection Beyond Kerberos
32
Volume Encryption
Field level data protection for existing and new data.
API enabled Field level data protection
API enabled Field level data protection
HDFS
(Hadoop Distributed File System)
MapReduce (Job Scheduling/Execution System)
Hbase (Column DB)
Pig (Data Flow) Hive (SQL) Sqoop
ETL Tools BI Reporting RDBMS
Volume Encryption
33
HDFS
MapReduce
Protected with Volume Encryption
Entire file is in the clear when analyzed
File Encryption – Authorized User
34
HDFS
MapReduce
Protected with File Encryption
Entire file is in the clear when analyzed
File Encryption – Non Authorized User
35
HDFS
MapReduce
Protected with File Encryption
Entire file is in unreadable when analyzed
Volume Encryption + Gateway Field Protection
36
HDFS
MapReduce
Kerberos AccessControl
Granular Field Level Protection
Protected with Volume Encryption
Data Protection File Gateway
Volume Encryption + Internal MapReduce Field Protection
37
HDFS
MapReduce
Kerberos Access Control
Granular Field Level Protection
Protected with Volume Encryption
Hadoop Staging
MapReduce
Analytics
ENFORCE DATA PROTECTION
38
A Data Security Policy
39
What is the sensitive data that needs to be protected. Data Element.
How you want to protect and present sensitive data. There are several methods for protecting sensitive data. Encryption, tokenization, monitoring, etc.
Who should have access to sensitive data and who should not. Security access control. Roles & Members.
When should sensitive data access be granted to those who have access. Day of week, time of day.
Where is the sensitive data stored? This will be where the policy is enforced. At the protector.
Audit authorized or un-authorized access to sensitive data. Optional audit of protect/unprotect.
What
Who
When
Where
How
Audit
Volume Encryption + Field Protection + Policy Enforcement
40
HDFSProtected with
Volume Encryption
MapReduce
Data Protection Policy
Volume Encryption + Field Protection + Policy Enforcement
41
HDFSProtected with
Volume Encryption
MapReduce
Data Protection Policy
4. Authorized User Example
42
Presentation to requestorName: Joe SmithAddress: 100 Main Street, Pleasantville, CA
Selected data displayed (least privilege)
Protection at restName: csu wusojAddress: 476 srta coetse, cysieondusbak, CA
Data Scientist, Business Analyst
PolicyEnforcement
Does the requestor have the authority to access the protected data?
Authorized
Request
Response
4. Un-Authorized User Example
43
Request
Response
Presentation to requestorName: csu wusojAddress: 476 srta coetse, cysieondusbak, CA
Protection at restName: csu wusojAddress: 476 srta coetse, cysieondusbak, CA
Privileged Used, DBA, System Administrators, Bad Guy
PolicyEnforcement
Does the requestor have the authority to access the protected data?
Not Authorized
MONITOR DATA PROTECTION
44
Best Practices for Protecting Big Data
Start early
Granular protection
Select the optimal protection
Enterprise coverage
Protection against insider threat
Protect highly sensitive data in a way that is mostly transparent to the analysis process
Policy based protection
Record data access events
45
Proven enterprise data security software and innovation leader
• Sole focus on the protection of data
Growth driven by risk management and compliance
• PCI (Payment Card Industry)
• PII (Personally Identifiable Information)
• PHI (Protected Health Information) – HIPAA
• State and Foreign Privacy Laws, Breach Notification Laws
Successful across many key industries
Introduction to Protegrity
46
How Protegrity Can Help
47
1
2
3
4
5
We can help you Classify the sensitive data
We can help you Discover where the sensitive data sits
We can help you Protect your sensitive data in a flexible way
We can help you Enforce policies that will enable business functions and preventing sensitive data from the wrong hands.
We can help you Monitor sensitive data to gain insights on abnormal behaviors.
Please contact us for more information
www.protegrity.com