getting started with sas and hadoop · #analyticsx c o p y r ig ht © 201 6, sas in stitute in c....

50
#analyticsx Copyright © 2016, SAS Institute Inc. All rights reserved. Getting Started with SAS and Hadoop Jeff Bailey

Upload: vandiep

Post on 07-Sep-2018

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Getting Started with SAS and Hadoop

Jeff Bailey

Page 2: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

Why Hadoop?

Page 3: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

HOW MUCH DOES THIS DRIVE COST?

3 TB

Page 4: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

HOW MUCH DOES THIS DRIVE COST?

3 TB

Silly, you couldn’t get a 3TB drive in 1980!

1980 $1,312,500,000

Page 5: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

HOW MUCH DOES THIS DRIVE COST?

3 TB

That’s $0.03 per GB! TODAY $69

2010 $270

2005 $3,720

2000 $33,000

1995 $3,360,000

1990 $33,600,000

1985 $315,000,000

1980 $1,312,500,000

Page 6: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

HOW MUCH DOES THIS DRIVE COST?That’s $0.03 per GB!

3 TB

TODAY $92

2010 $270

2005 $3,720

2000 $33,000

1995 $3,360,000

1990 $33,600,000

1985 $315,000,000

1980 $1,312,500,000

TODAY $69

2010 $270

2005 $3,720

2000 $33,000

1995 $3,360,000

1990 $33,600,000

1985 $315,000,000

1980 $1,312,500,000

Insight: Disk Space is FREE!

Page 7: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

IT’S NOT JUST ABOUT COST!

3 TB

How long does it take to read 3 TB of data?

Page 8: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

IT’S NOT JUST ABOUT COST!

3 TB4.17 Hours

How long does it take to read 3 TB of data?

Page 9: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

3 TB

How long does it take to read 3 TB?

4.17 Hours3 TB

4.17 HoursWhat happens if you add more disks?

IT’S NOT JUST ABOUT COST!

Page 10: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

HOW LONG DOES IT TAKE TO READ A 3 TB FILE?

4.17 hr

2.5 min

15 sec

1 disk

100 disks

1000 disks

Page 11: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

HOW LONG DOES IT TAKE TO READ A 3 TB FILE?

4.17 hr

2.5 min

15 sec

1 disk

100 disks

1000 disks

Insight: More Disks are FASTER!

Page 12: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

What is Hadoop?

Page 13: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

• Distributed Storage Performs Great

• Data is Replicated

• Reasonable Cost

• Sits on the OS File System

Had

oo

p

HDFS

Hadoop is a Storage Platform

Page 14: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

• MapReduce/YARN

• Distributed Processing

• Data Locality

• Usually Java

Had

oo

p

YARN / MapReduce

HDFS

Hadoop is a Processing Platform

Page 15: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

• Scripting Language

• Higher level than programming Java MapReduce

• Pig Latin scripts are converted to MapReduce jobs

• Great for joining data

• Great for transforming data

Had

oo

p

YARN / MapReduce

HDFS

Pig

Apache Pig

Page 16: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

• Distributed Processing

• Data Locality

• Map Phase

• Reduce Phase

Clo

ud

era

YARN / MapReduce

HDFS

Had

oo

p

people = LOAD '/user/training/customers' AS (cust_id, name); orders = LOAD '/user/training/orders' AS (ord_id, cust_id, cost); groups = GROUP orders BY cust_id; totals = FOREACH groups GENERATE group, SUM(orders.cost) AS t; result = JOIN totals BY group, people BY cust_id;DUMP result;

Apache Pig: Example Program

Page 17: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

• SQL on Hadoop

• Similar to traditional SQL

• Reduces development time

• Enables BI on Hadoop

• Schema-on-Read

• You choose underlying file format

Had

oo

p

YARN / MapReduce

HDFS

Pig Hive2

Apache Hive

Page 18: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

• SQL on Hadoop

• Similar to traditional SQL

• Reduces development time

• Enables BI on Hadoop

• Schema-on-Read

• You choose underlying file format

Clo

ud

era

YARN / MapReduce

HDFS

Pig Hive2

Had

oo

p

SELECT zipcode, SUM(cost) AS total

FROM customers

JOIN orders

ON (customers.cust_id = orders.cust_id)

WHERE zipcode LIKE '63%'

GROUP BY zipcode

ORDER BY total DESC;

Apache Hive

Page 19: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

• High-performance SQL engine

• Handles concurrency well

• Does not rely on MapReduce

• Supports a dialect of SQL very similar to Hive’s

• 100% open source

• Apache License

Clo

ud

era

YARN / MapReduce

HDFS

Pig Hive2 Impala

Apache Impala is a SQL Engine

Page 20: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

How can SAS Interact with Hadoop?

Page 21: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Using Base SAS 9.4 with Hadoop

FILEREF HDFSData Files

Data Files#1

Page 22: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

SAS FILENAME Statement for Hadoop

Had

oo

p

YARN / MapReduce

HDFS

Pig Hive2 Impala

SAS

Page 23: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

SAS FILENAME Statement for Hadoop

Clo

ud

era

YARN / MapReduce

HDFS

Pig Hive2 Impala

SAS

options set=SAS_HADOOP_CONFIG_PATH="\\sashq\cdh45p1";options set=SAS_HADOOP_JAR_PATH="\\sashq\cdh45";

FILENAME hdp1 hadoop 'test.txt';

/* Write file to HDFS */data _null_;

file hdp1;put ' Test Test Test';

run;

/* Read file from HDFS */data test;

infile hdp1;input textline $15.;

run;

Page 24: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Using Base SAS 9.4 with Hadoop

FILEREF HDFSData Files

Data Files#1

PROC

HadoopHadoop

MapReduce +HDFS commands#2

Page 25: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Hadoop ProcedureH

ad

oo

p

YARN / MapReduce

HDFS

Pig Hive2 Impala

SAS

• Submit HDFS commands

• Submit MapReduce Jobs

• Submit Pig Latin programs

Page 26: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

How Do I Submit HDFS Commands?

Clo

ud

era

YARN / MapReduce

HDFS

Pig Hive2 Impala

SASfilename cfg 'C:\Hadoop_cfg\cdh57.xml';

/* Copy war_and_peace.txt to HDFS. *//* Copy moby_dick.txt to HDFS. */proc hadoop options=cfg username="sasxjb" verbose;

HDFS mkdir='/user/sasxjb/Books';HDFS COPYFROMLOCAL="C:\Hadoop_data\moby_dick.txt"

OUT='/user/sasxjb/Books/moby_dick.txt';HDFS COPYFROMLOCAL="C:\Hadoop_data\war_and_peace.txt"

OUT='/user/sasxjb/Books/war_and_peace.txt';run;

Page 27: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

How Do I Submit MapReduce Jobs?

Clo

ud

era

YARN / MapReduce

HDFS

Pig Hive2 Impala

SASfilename cfg 'C:\Hadoop_cfg\cdh57.xml';

proc hadoop options=cfg user="sasxjb" verbose;mapreduce input='/user/sasxjb/Books/moby_dick.txt' output='/user/sasxjb/outBook' jar='C:\Hadoop_examples\hadoop-examples-1.2.0.1.3-96.jar' outputkey="org.apache.hadoop.io.Text" outputvalue="org.apache.hadoop.io.IntWritable" reduce="org.apache.hadoop.examples.WordCount$IntSumReducer" combine="org.apache.hadoop.examples.WordCount$IntSumReducer" map="org.apache.hadoop.examples.WordCount$TokenizerMapper";

run;

Page 28: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

How Do I Submit Pig Latin Programs?

Clo

ud

era

YARN / MapReduce

HDFS

Pig Hive2 Impala

SAS

filename cfg 'C:\Hadoop_cfg\cdh57.xml';

proc hadoop options=cfg username="sasxjb“ verbose; pig code=pigcode ;

run;

Page 29: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Using Base SAS 9.4 with Hadoop

FILEREF HDFSData Files

Data Files#1

PROC

HadoopHadoop

MapReduce +HDFS commands#2

SAS/ACCESS HiveServer2HiveQL

Result sets#3

Page 30: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

SAS/ACCESS Interface to Hadoop

• Generates HiveQL

• Connects via JDBC

• Makes Hive tables look like SAS data sets

• Bulk loads directly to HDFS

• Can read directly from HDFS

Had

oo

p

YARN / MapReduce

HDFS

Pig Hive2 Impala

SAS

Page 31: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

How Does SAS/ACCESS Talk to Hadoop?proc sql;

select count(*) from mycdh.customer_dimwhere loyalty_program='Chocolate Club';

run;

?

Page 32: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

How Does SAS/ACCESS Talk to Hadoop?proc sql;

select count(*) from mycdh.customer_dimwhere loyalty_program='Chocolate Club';

run;

select COUNT(*) from `CUSTOMER_DIM` TXT_1

WHERE TXT_1.`loyalty_program` = 'Chocolate Club'

SAS Generated This SQL

Page 33: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

How Does SAS/ACCESS Talk to Hadoop?proc sql;

select count(*) from mycdh.customer_dimwhere loyalty_program='Chocolate Club';

run;

OPTIONS SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX;

select COUNT(*) from `CUSTOMER_DIM` TXT_1

WHERE TXT_1.`loyalty_program` = 'Chocolate Club'

SAS Generated This SQL

Page 34: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

SAS/ACCESS Interface to Hadoop

• Generates HiveQL• Connects via JDBC

• Makes Hive tables look like SAS data sets

• Bulk loads directly to HDFS

• Can read directly from HDFS

Had

oo

p

YARN / MapReduce

HDFS

Pig Hive2 Impala

SAS

Page 35: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

We Can Write Our Own HiveQL!proc sql;

connect to hadoop (server=quickstart

user=cloudera);

execute (create table store_cnt

row format delimited

fields terminated by '\001‘

stored as parquet

as

select customer_rk, count(*) as tot

from order_fact

group by customer_rk) by hadoop;

quit;

Explicit Pass-Through

Page 36: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Clo

ud

era

YARN / MapReduce

HDFS

Pig Hive2 Impala

What about Apache Impala?

SAS/ACCESS Interface to Impala:

• Connects via ODBC

• Makes Hive tables look like SAS data sets

• Bulk loads directly to HDFS

SAS/ACCESS

to ImpalaImpala

HiveQL

Result sets#4

Page 37: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

In-Database: Code Accelerator

Page 38: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

proc ds2 indb=yes;

thread tpgm / overwrite=yes;

method run();

set hdplib.intable;

output;

end;

endthread;

run;

data hdplib.outdata

(overwrite=yes);

dcl thread tpgm hdpdata;

method run();

set from hdpdata;

end;

enddata;

run;

quit;

What is SAS In-Database Code Accelerator?

SAS In-Database Code Accelerators let

you run SAS code inside Hadoop. With this

you get:

• DS2 processing (modern DATA Step)

• More Data Types

• Code Packages

• More Programming Structures

• Parallel Database Operations

• Thread Programs Run Inside Database

Page 39: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

In-Database Code Accelerator Runs in Hadoop

Had

oo

p

YARN / MapReduce

HDFS

Pig Hive2 SAS EP

proc ds2 indb=yes;

thread tpgm / overwrite=yes;

method run();

set hdplib.intable;

output;

end;

endthread;

run;

data hdplib.outdata

(overwrite=yes);

dcl thread tpgm hdpdata;

method run();

set from hdpdata;

end;

enddata;

run;

quit;

Page 40: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

In-Database: Scoring Accelerator

Page 41: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

What is SAS In-Database Scoring Accelerator?H

ad

oo

p

YARN / MapReduce

HDFS

Pig Hive2 SAS EP

SAS In-Database Scoring

Accelerator lets you score models

inside the cluster. With this you get:

• Uses the SAS Embedded

Process

• Faster Scoring

• Less data movement – score

data where it lives

• Uses fewer resources

Page 42: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

What does the Scoring Process Look like?

Page 43: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

Data Loader for Hadoop

Page 44: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

Data Loader for Hadoop – Self Service Big Data

• Easy to use UI

• Query Data

• Manage Data

• Transform Data

• Run Custom Code

• Move Data

Page 45: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

SAS Grid Manager for Hadoop

Page 46: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

What is SAS Grid Manager for Hadoop?

Page 47: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

Servers

Page 48: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

SAS Viya + Hadoop Architecture

Microservices In-Memory Engine

Cloud Analytic Services (CAS)

SAS Data Connector to Hadoop

SAS Data Connect Acceleratorfor Hadoop

Hadoop

HDFS as infrastructure

Page 49: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

[email protected]

http://www.linkedin.com/in/jeffreydbailey

https://github.com/Jeff-

Bailey/SAS13341_SAS_Hadoop

Feel Free to Contact Me!

Page 50: Getting Started with SAS and Hadoop · #analyticsx C o p y r ig ht © 201 6, SAS In stitute In c. All r ig hts r ese rve d. Getting Started with SAS and Hadoop Jeff Bailey

C o p y r ig ht © 201 6, SAS In st i tute In c. A l l r ig hts r ese rve d.

#analyticsx