
#AnalyticsX Copyright © 2016, SAS Institute Inc. All rights reserved.

Securely Analyze Data With SAS® and Cloudera

Scott Armstrong
Director, Business Development, Cloudera

[email protected]


© Cloudera, Inc. All rights reserved.

Agenda

• SAS & Cloudera

• How we work together

• Security in Hadoop

• Q&A


SAS & Cloudera: Leaders Coming Together

CEO commitment from both companies

Formal Alliance forged in January 2013

Master Reciprocal Services Agreement in place to provide service flexibility

A Data Science course leveraging joint content and instructors

Cloudera is the leading commercial Hadoop distribution for SAS product testing & internal use

A dedicated Cloudera resource works on site with SAS R&D to ensure tight technical and roadmap alignment

A joint QuickStart service bundle, featuring SAS Visual Analytics / Visual Statistics, SAS Data Loader for Hadoop and the Cloudera Enterprise Data Hub starter service package

SAS & Cloudera enable organizations to achieve competitive advantage by gaining value from all their data, through a proven combination of enterprise-ready storage, processing, analytics, and data management.

Expected Benefits from an Integrated SAS & Cloudera Platform

Improved Business Outcomes

• Better decisions by analyzing more data

• Solve hard problems with interactive and iterative analytics

• Unlimited variables for analysis, i.e., no column restrictions

Accelerated Time-to-Value

• In-memory data and analytics processing for faster performance

• A joint ‘Starter Service’ bundle is available and can offer a fast start

• SAS simplifies working with Hadoop; Cloudera Manager simplifies system administration

Reduced Costs & Risk

• SAS & Cloudera integration minimizes data movement and improves governance

• Cloudera & SAS are stable market leaders aligned across R&D (a dedicated Cloudera engineer), product management, services, education, and technical support

More Innovation

• Hadoop’s cost-effective scalability allows more analytic exploration of data that was previously too costly to store or troublesome to format

• Cloudera & SAS integrated technologies make ‘Big Data Analytics’ approachable and can support innovative use cases

What to Expect from SAS & Cloudera

Cloudera is the preferred Hadoop vendor for SAS Solutions on Demand, including business-specific solutions such as:

• Anti-Money Laundering

• Tax Fraud

• Drug Development

• Clinical Trial Data Transparency

• Intelligent Advertising for Publishers

• Claims Fraud

• Customer Experience Analytics

• Customer Experience Targeting

• Customer Experience Personalization

• Marketing Operations Management

• Suspect Claims Detection

…and many more

How Cloudera & SAS work together


SAS & Cloudera

SAS & Cloudera intersect in three ways:

• SAS can pull data FROM Cloudera when that is most convenient;

• SAS can work WITH Cloudera, lifting data into a purpose-built, in-memory advanced analytics environment;

• SAS can work directly IN Cloudera, leveraging the distributed processing capabilities of Hadoop.
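The following toy Python sketch makes the difference between the FROM and IN patterns concrete. It assumes a "cluster" modeled as a plain list of partitions and uses no SAS or Hadoop APIs. It counts how many values cross the wire in two cases: when rows are pulled to the client, and when each partition aggregates locally and returns one partial result. The WITH pattern sits in between (data is lifted once into memory and then reused); this sketch does not model it.

```python
# Toy model: a "cluster" is a list of partitions, each a list of numbers.
# Hypothetical illustration only; no SAS or Hadoop APIs are used.
from typing import List, Tuple

Partition = List[float]


def pull_then_sum(partitions: List[Partition]) -> Tuple[float, int]:
    """FROM pattern: ship every row to the client, then aggregate there.
    Returns (result, number of values moved)."""
    shipped = [row for part in partitions for row in part]  # every row moves
    return sum(shipped), len(shipped)


def sum_in_cluster(partitions: List[Partition]) -> Tuple[float, int]:
    """IN pattern: each partition computes a partial sum where the data lives;
    only one partial result per partition moves to the client."""
    partials = [sum(part) for part in partitions]  # computed "on" each node
    return sum(partials), len(partials)


cluster = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
print(pull_then_sum(cluster))   # (45.0, 9)
print(sum_in_cluster(cluster))  # (45.0, 3)
```

Both patterns give the same answer. The in-cluster version moves one value per partition instead of one per row, which is the data-movement saving the deck attributes to working IN Hadoop.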


SAS Analytics Hadoop Deployment Patterns

[Figure: Traditional SAS, SAS In-Database, and SAS In-Memory patterns, showing where SAS processing runs relative to the data; the in-memory pattern is drawn as either a co-located deployment or an asymmetric deployment.]

• These approaches are complementary and can be combined for maximum effect

• The SAS In-Memory environment can be deployed as part of the Hadoop cluster or on a separate footprint

Security @ Cloudera


The Benefits of Hadoop...

One place for unlimited data

• All types

• More sources

• Faster, larger ingestion

Unified, multi-framework data access

• More users

• More tools

• Faster changes


…Can Create Information Security Challenges

Business Manager

• Run high value workloads in cluster

• Quickly adopt new innovations

Information Security

• Follow established policies and procedures

• Maintain compliance

IT/Operations

• Integrate with existing IT investments

• Minimize end-user support

• Automate configuration

Secure without Compromise: Security and Compliance Are Not “Opt-In” Activities

• Enterprise Encryption: protects everything transparently

• Access Policy Enforcement: full-stack row/column-based RBAC and dynamic masking

• Automated Data Management: full-stack audit, lineage, discovery, and lifecycle

• Secure Operations: separation of duties, log data redaction

[Figure: platform diagram]

• Operations: Cloudera Manager, Cloudera Director

• Data management: Cloudera Navigator, Encrypt and KeyTrustee, Optimizer

• Integrate: Sqoop (structured); Kafka, Flume (unstructured)

• Process, analyze, serve: Spark, Hive, Pig, MapReduce (batch); Spark (stream); Impala (SQL); Solr (search); Kite (other)

• Unified services: YARN (resource management); Sentry, RecordService (security)

• Store: HDFS (filesystem); Kudu (relational); HBase (NoSQL); Object Store (other)

Comprehensive, Compliance-Ready Security

Authentication, Authorization, Audit, and Compliance

• Perimeter: guarding access to the cluster itself. Technical concepts: authentication, network isolation. Tooling: Cloudera Manager.

• Access: defining what users and applications can do with data. Technical concepts: permissions, authorization. Tooling: Apache Sentry & RecordService.

• Visibility: reporting on where data came from and how it’s being used. Technical concepts: auditing, lineage. Tooling: Cloudera Navigator.

• Data: protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, tokenization, data masking. Tooling: Navigator Encrypt & Key Trustee | Partners.


Perimeter Security – Isolation, Authentication

• Preserve user choice of the right Hadoop service (e.g., Impala, Spark)

• Conform to centrally managed authentication policies

• Implement with existing standard systems: Active Directory (LDAP) and Kerberos

Perimeter: guarding access to the cluster itself. Technical concepts: authentication, network isolation. Tooling: Cloudera Manager.


Active Directory and Kerberos

Active Directory

• Manages users, groups, and services

• Provides username/password authentication

• Group membership determines service access

Kerberos

• Trusted, standard third-party authentication

• Authenticated users receive “tickets”

• “Tickets” grant access to services

Flow: a user (e.g., ssmith) authenticates to AD with a password; the authenticated user gets a Kerberos ticket; the ticket grants access to services such as Impala.
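The "password once, then tickets" flow above can be sketched as a toy Python model. This is not the Kerberos protocol. It has no KDC, no split between ticket-granting and service tickets, and no encryption; an HMAC signature from the standard library stands in for the issuer's trust. DIRECTORY, SERVICE_GROUPS, KDC_KEY, and the 8-hour lifetime are all hypothetical.

```python
# Toy model of: authenticate to a directory -> receive an expiring ticket ->
# present the ticket to a service. NOT the Kerberos protocol; all names
# (DIRECTORY, SERVICE_GROUPS, KDC_KEY) are hypothetical.
import hashlib
import hmac
import time
from typing import Dict, Optional, Set

KDC_KEY = b"demo-shared-secret"  # hypothetical key shared by issuer and services
TICKET_LIFETIME_S = 8 * 3600     # hypothetical 8-hour ticket lifetime


def _hash(password: str) -> str:
    # Plain SHA-256 keeps the toy short; real systems use salted, slow hashes.
    return hashlib.sha256(password.encode()).hexdigest()


# Hypothetical directory: user -> password hash and group memberships.
DIRECTORY: Dict[str, Dict] = {
    "ssmith": {"pw": _hash("correct horse"), "groups": {"fraud_analysts"}},
}

# Hypothetical service policy: which groups may use which service.
SERVICE_GROUPS: Dict[str, Set[str]] = {"impala": {"fraud_analysts"}}


def _sign(payload: str) -> str:
    return hmac.new(KDC_KEY, payload.encode(), hashlib.sha256).hexdigest()


def authenticate(user: str, password: str, now: float) -> Optional[str]:
    """Check the password against the directory and issue a signed ticket."""
    entry = DIRECTORY.get(user)
    if entry is None or not hmac.compare_digest(entry["pw"], _hash(password)):
        return None
    payload = f"{user}|{int(now) + TICKET_LIFETIME_S}"
    return f"{payload}|{_sign(payload)}"


def access_service(ticket: str, service: str, now: float) -> bool:
    """The service validates the ticket; it never sees the password."""
    user, expires, sig = ticket.rsplit("|", 2)
    if not hmac.compare_digest(sig, _sign(f"{user}|{expires}")):
        return False  # forged or tampered ticket
    if now > int(expires):
        return False  # expired ticket
    return bool(DIRECTORY[user]["groups"] & SERVICE_GROUPS.get(service, set()))


t0 = time.time()
ticket = authenticate("ssmith", "correct horse", t0)
print(ticket is not None)                         # True
print(access_service(ticket, "impala", t0 + 60))  # True
```

The design point is that, after the initial login, only a signed, time-limited credential travels to services. Group membership still decides which services accept it.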

Automated Authentication with Cloudera Manager

Direct-to-AD Kerberos Integration

• Users authenticate directly against AD

• Hadoop services are defined directly in AD Kerberos

• User access to Hadoop services is controlled via AD groups

Kerberos Configuration Wizard

• Automates Kerberos configuration for existing Hadoop clusters, simplifying a tedious and error-prone process

Added Tuning and Monitoring

• Tunes interrelated configuration for dual KDCs

• Monitors services through Cloudera Manager when Kerberos is enabled


Access Security Requirements

• Provide users access to the data they need to do their job

• Centrally manage access policies

• Leverage a role-based access control model built on AD

Access: defining what users and applications can do with data. InfoSec concept: authorization. Tooling: Apache Sentry & RecordService.


RBAC and Centralized Authorization

Manage data access by role, instead of by individual user

• Customer Support Rep has read access to US Customers

• Broker Analyst has read access to US Transactions

• Relationships between users and roles are established via groups

An RBAC policy is then uniformly enforced for all Hadoop services

• Provides unified authorization controls

• As opposed to separate tools for managing numerous service-specific policies
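A minimal Python sketch of the role-based model above: users map to groups, groups to roles, and roles to privileges, so a single check answers every access question. The data structures and names are hypothetical and mirror the slide's examples; this is not a Sentry or Active Directory API.

```python
# Minimal RBAC sketch: users -> groups -> roles -> privileges.
# Names and structures are hypothetical, mirroring the slide's examples.
from typing import Dict, Set, Tuple

Privilege = Tuple[str, str]  # (action, resource)

# Group membership would come from the directory (e.g., AD).
USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"support_reps"},
    "bob": {"broker_analysts"},
}
# Groups are granted roles; individual users never are.
GROUP_ROLES: Dict[str, Set[str]] = {
    "support_reps": {"customer_support_rep"},
    "broker_analysts": {"broker_analyst"},
}
# Roles carry the privileges.
ROLE_PRIVILEGES: Dict[str, Set[Privilege]] = {
    "customer_support_rep": {("read", "us_customers")},
    "broker_analyst": {("read", "us_transactions")},
}


def is_allowed(user: str, action: str, resource: str) -> bool:
    """One policy check that every service would call the same way."""
    for group in USER_GROUPS.get(user, set()):
        for role in GROUP_ROLES.get(group, set()):
            if (action, resource) in ROLE_PRIVILEGES.get(role, set()):
                return True
    return False


print(is_allowed("alice", "read", "us_customers"))     # True
print(is_allowed("alice", "read", "us_transactions"))  # False
```

Onboarding a new support rep only means adding them to the `support_reps` group; no policy changes.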


Unified Authorization with Apache Sentry

Sentry provides unified authorization via:

• Fine-grained RBAC for Impala, Hive, and Search

• Impala/Hive permissions synced into HDFS for all other components (Spark, MapReduce, etc.)

Goal: Unified authorization for all Hadoop services and applications
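The next toy sketch illustrates the idea behind syncing Impala/Hive permissions into HDFS. A table-level grant becomes an HDFS ACL entry on the table's directory, so engines that read files directly (Spark, MapReduce) see the same permission. The warehouse path and the privilege-to-permission mapping are simplified assumptions, not Sentry's exact behavior. The `group:<name>:<perms>` string follows the HDFS ACL entry format.

```python
# Toy sketch: translate table grants into HDFS ACL entries on table directories.
# WAREHOUSE and PRIV_TO_PERMS are simplified, assumed values.
from typing import Dict, List, Tuple

WAREHOUSE = "/user/hive/warehouse"  # assumed warehouse root (default database)

# Assumed mapping from SQL privilege to HDFS permission bits.
PRIV_TO_PERMS = {"SELECT": "r-x", "ALL": "rwx"}


def grants_to_acls(
    role_grants: Dict[str, List[Tuple[str, str]]],
    role_groups: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Return {table_dir: ["group:<name>:<perms>", ...]}."""
    acls: Dict[str, List[str]] = {}
    for role, grants in role_grants.items():
        for privilege, table in grants:
            path = f"{WAREHOUSE}/{table}"
            for group in role_groups.get(role, []):
                acls.setdefault(path, []).append(
                    f"group:{group}:{PRIV_TO_PERMS[privilege]}"
                )
    return acls


print(grants_to_acls(
    {"fraud_analyst": [("SELECT", "transactions")]},
    {"fraud_analyst": ["fraud_analysts"]},
))
```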

Example: the Sentry permission “Read access to ALL transaction data” is granted to the Sentry role “Fraud Analyst Role”, which is granted to the group “Fraud Analysts”, of which Sam Smith is a member.
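The example chain above can be expressed with the SQL statements Hive and Impala use for Sentry roles: `CREATE ROLE`, `GRANT ... TO ROLE`, and `GRANT ROLE ... TO GROUP`. The Python helper below simply generates them. The identifiers `fraud_analyst`, `transactions`, and `fraud_analysts` are hypothetical stand-ins for the slide's role, transaction data, and group.

```python
# Generate the Sentry role setup for the slide's example.
# Identifiers passed in are hypothetical stand-ins for the slide's names.
from typing import List


def sentry_setup(role: str, table: str, group: str) -> List[str]:
    return [
        f"CREATE ROLE {role};",
        f"GRANT SELECT ON TABLE {table} TO ROLE {role};",  # read access to the data
        f"GRANT ROLE {role} TO GROUP {group};",           # link the role to an AD group
    ]


for stmt in sentry_setup("fraud_analyst", "transactions", "fraud_analysts"):
    print(stmt)
```

No statement names Sam Smith. He gets read access because the directory lists him in the group, which is the point of managing access by role.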


The Need for Fine-Grained Access Control Across All Access Paths

Columns: visibility of sensitive columns varies, e.g., credit card numbers

• Managers: 1234 5678 1234 5678

• Call Center: XXXX XXXX XXXX 5678

• Analysts: XXXX XXXX XXXX XXXX

• Others: do not see the credit card column

Rows: Different groups of users need access to different records

• European privacy laws

• Government security clearance

• Financial information restrictions
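Here is a Python sketch of the column rules above, with one card number rendered differently per audience. The role names and masking rule are illustrative assumptions, not a RecordService or Sentry API. Returning `None` stands for the column being dropped from the result.

```python
# Role-dependent visibility of a card number, following the slide's four audiences.
# Role names ("manager", "call_center", "analyst") are hypothetical.
from typing import Optional


def mask_card(card: str, role: str) -> Optional[str]:
    """Return the card number as `role` should see it, or None if hidden."""
    groups = card.split(" ")
    if role == "manager":
        return card  # full value
    if role == "call_center":
        # Mask every group except the last four digits.
        return " ".join(["XXXX"] * (len(groups) - 1) + [groups[-1]])
    if role == "analyst":
        return " ".join(["XXXX"] * len(groups))  # fully masked
    return None  # everyone else: column not visible


for role in ("manager", "call_center", "analyst", "other"):
    print(role, mask_card("1234 5678 1234 5678", role))
```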


Permission Enforcement Today with Sentry

[Figure: Admins specify permissions as rules in the Sentry service, e.g., “Allow fraud analysts read access to the transaction table”. Sentry enforcement points in HiveServer2, Impala, HDFS (MR, Pig, Spark, ...), and Search (Solr) apply these rules to requests, including those from SAS products. Enforcement is coarse-grained (table level).]


RecordService: Unified Access Control Enforcement

• New high-performance security layer that centrally enforces access control policies across Hadoop

• Complements Apache Sentry’s unified policy definition

• Row- and column-based security

• Dynamic data masking

• Apache-licensed open source

• Beta now available


Fine-Grained HDFS Access without RecordService

Date/time             Accnt #     SSN          Asset  Trade  Country
09:33:11 16-Feb-2015  0234837823  238-23-9876  AAPL   Sell   US
11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    EU
14:12:34 16-Feb-2015  4848367383  123-56-2345  IBM    Sell   UK
09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    US
11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    US
10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    UK
13:45:24 16-Feb-2015  3456789012  412-22-8765  AMZN   Sell   EU
09:03:44 16-Feb-2015  4857389329  123-44-5678  TMV    Buy    US

Date/time             Accnt #     SSN          Asset  Trade  Country
14:12:34 16-Feb-2015  4848367383  123-56-2345  IBM    Sell   UK
10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    UK
15:55:55 16-Feb-2015  4756983234  234-76-9274  MA     Buy    UK

Date/time             Accnt #     SSN          Asset  Trade  Country
11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    EU
13:45:24 16-Feb-2015  3456789012  412-22-8765  AMZN   Sell   EU

Date/time             Accnt #     SSN          Asset  Trade  Country
09:33:11 16-Feb-2015  0234837823  238-23-9876  AAPL   Sell   US
09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    US
11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    US
09:03:44 16-Feb-2015  4857389329  123-44-5678  TMV    Buy    US

Split the original file; use HDFS permissions to limit access
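The "split the file" workaround can be sketched in Python as follows. The sketch produces one copy of the data per country; each copy would then be protected by its own HDFS owner/group permissions. Records come from the first (original) table, with Accnt # and SSN omitted for brevity. The `trades_<country>` file naming is hypothetical.

```python
# Sketch of the split-file workaround: one copy of the data per country.
# File naming ("trades_<country>") is hypothetical.
from collections import defaultdict
from typing import Dict, List, Tuple

Trade = Tuple[str, str, str, str]  # (date/time, asset, trade, country)

MASTER: List[Trade] = [
    ("09:33:11 16-Feb-2015", "AAPL", "Sell", "US"),
    ("11:33:01 16-Feb-2015", "TBT", "Buy", "EU"),
    ("14:12:34 16-Feb-2015", "IBM", "Sell", "UK"),
    ("09:22:03 16-Feb-2015", "INTC", "Buy", "US"),
    ("11:55:33 16-Feb-2015", "F", "Buy", "US"),
    ("10:22:55 16-Feb-2015", "UA", "Buy", "UK"),
    ("13:45:24 16-Feb-2015", "AMZN", "Sell", "EU"),
    ("09:03:44 16-Feb-2015", "TMV", "Buy", "US"),
]


def split_by_country(rows: List[Trade]) -> Dict[str, List[Trade]]:
    """Group rows into per-country "files"; each would become a separate
    HDFS file whose permissions restrict who can read it."""
    files: Dict[str, List[Trade]] = defaultdict(list)
    for row in rows:
        files[f"trades_{row[3]}"].append(row)
    return dict(files)


for name, rows in sorted(split_by_country(MASTER).items()):
    print(name, len(rows))
```

Each additional access rule (a new region, or a column that some users must not see) needs another split and another copy to keep in sync. That maintenance cost is what applying controls to a single master file avoids.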


Fine-Grained HDFS Access Control with RecordService

• Apply controls to the master data file

• Row, column, and sub-column (masking) controls

• Enforce these across all access paths

Date/time             Accnt #     SSN          Asset  Trade  Country
09:33:11 16-Feb-2015  0234837823  238-23-9876  AAPL   Sell   US
11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    EU
14:12:34 16-Feb-2015  4848367383  123-56-2345  IBM    Sell   EU
09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    US
11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    US
10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    EU

[Figure: What U.S. Brokers See. Column-level and row-level controls are applied to the table above: row-level controls restrict the non-US rows (labeled group2 and group3 in the figure), and sub-column masking replaces the leading SSN digits with XXX-XX.]
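The "What U.S. Brokers See" view can be sketched in Python. Row-level and sub-column controls are applied to the single master table at read time, and the master data itself is never modified. This imitates the idea of RecordService-style enforcement; it is not the RecordService API, and the policy format is a hypothetical assumption.

```python
# Read-time row filtering plus SSN masking over one master table.
# Not the RecordService API; the POLICIES format is hypothetical.
from typing import Dict, List

Row = Dict[str, str]

MASTER: List[Row] = [
    dict(ts="09:33:11 16-Feb-2015", acct="0234837823", ssn="238-23-9876",
         asset="AAPL", trade="Sell", country="US"),
    dict(ts="11:33:01 16-Feb-2015", acct="3947848494", ssn="329-44-9847",
         asset="TBT", trade="Buy", country="EU"),
    dict(ts="14:12:34 16-Feb-2015", acct="4848367383", ssn="123-56-2345",
         asset="IBM", trade="Sell", country="EU"),
    dict(ts="09:22:03 16-Feb-2015", acct="3485739384", ssn="585-11-2345",
         asset="INTC", trade="Buy", country="US"),
    dict(ts="11:55:33 16-Feb-2015", acct="3847598390", ssn="234-11-8765",
         asset="F", trade="Buy", country="US"),
    dict(ts="10:22:55 16-Feb-2015", acct="8765432176", ssn="344-22-9876",
         asset="UA", trade="Buy", country="EU"),
]


def mask_ssn(ssn: str) -> str:
    """Sub-column masking: hide all but the last four digits."""
    return "XXX-XX-" + ssn[-4:]


# Hypothetical policy: role -> visible countries and per-column maskers.
POLICIES: Dict[str, Dict] = {
    "us_broker": {"countries": {"US"}, "mask": {"ssn": mask_ssn}},
}


def read_as(role: str, rows: List[Row]) -> List[Row]:
    """Return the rows `role` may see, with masked columns applied."""
    policy = POLICIES[role]
    view: List[Row] = []
    for row in rows:
        if row["country"] not in policy["countries"]:
            continue  # row-level control
        out = dict(row)  # copy, so the master data stays intact
        for col, masker in policy["mask"].items():
            out[col] = masker(out[col])  # column/sub-column control
        view.append(out)
    return view


for r in read_as("us_broker", MASTER):
    print(r["ts"], r["ssn"], r["asset"], r["country"])
```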

PROBLEM

Customer data was spread across sources and channels, limiting loyalty marketing

• Existing targeting segments were not generating enough return

• Limited ability to analyze multi-structured data

• Accelerated processing was needed to act on data, but the existing system was running at capacity

SOLUTION

Implemented a new system to maximize marketing ROI while meeting compliance requirements

• Improved segmentation with reduced processing time (6 hrs to 45 min)

• Analyzing 3M records per hour, including mobile, sentiment, and non-gaming spend

• EDW optimization saved millions

• Achieved PCI compliance and met governance needs
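Expressed as rates, the case-study figures work out as follows. This is simple arithmetic on the numbers above: processing time fell by a factor of 8, and the throughput is roughly 833 records per second.

```latex
\frac{6\,\text{h}}{45\,\text{min}} = \frac{360\,\text{min}}{45\,\text{min}} = 8,
\qquad
\frac{3\,000\,000\ \text{records}}{3600\ \text{s}} \approx 833\ \text{records/s}
```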

Thank You
