
Copyright © 2014, SAS Institute Inc. All rights reserved.

make connections • share ideas • be inspired

Use SAS to process and analyze your data in Hadoop

Mikael Turvall


Architecture

[Architecture diagram. Labels recovered from the slide; the grouping is approximate:

• Web-based clients: SAS® Visual Analytics and SAS® Visual Statistics (VA/VS), SAS® Studio
• Mid-tier, metadata server, workspace server (one server is marked "Optional")
• In-memory store: SAS® LASR™ Analytic Server, SAS® In-Memory Statistics for Hadoop
• MPP data store / blade environment: Hadoop (Cloudera, Hortonworks), Teradata, Pivotal, Oracle, with the SAS Embedded Process
• Other sources: RDBMS, nonrelational, click stream, PC files]


Why?

• Hadoop as a platform for data
• Hadoop as the core of the next-generation analytics platform

[Diagram: the analytics life cycle. Identify / formulate problem, data preparation, data exploration, transform & select, build model, validate model, deploy model, evaluate / monitor results.]


From data to decisions

MANAGE DATA

• SAS/ACCESS
• SAS Data Management
• SAS Federation Server
• SAS Data Loader for Hadoop

EXPLORE DATA

• SAS Visual Analytics
• SAS In-Memory Statistics for Hadoop

DEVELOP MODELS

• SAS HPA Products
• SAS Visual Statistics
• SAS In-Memory Statistics for Hadoop
• SAS Enterprise Miner

DEPLOY & MONITOR

• SAS Scoring Accelerator for Hadoop
• SAS Code Accelerator for Hadoop


Get started quickly

Capabilities

• Transparent access to Hadoop tables through ordinary SAS libraries
• You can program in SAS SQL and the SAS DATA step as usual
• You can manage Hadoop from SAS:
   • Native HDFS commands
   • MapReduce, Pig, and HiveQL

Benefits

• You do not need to be an expert in Hadoop-specific syntax
• Switching to Hadoop is as simple as changing a libname
• Existing SAS programs, reports, etc. can be reused
• Several ways of accessing the data give IT different options for using the capacity

YOU CAN START TODAY
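As a rough illustration of managing Hadoop from SAS with native HDFS commands, the sketch below uses PROC HADOOP. The configuration file location, the credentials and every path are assumptions made up for the example, not values from the presentation.

```sas
/* Sketch only: all paths and credentials below are hypothetical. */
filename cfg "C:\Users\hadoop_config.xml";   /* Hadoop configuration file (assumed location) */

proc hadoop options=cfg username="hadoop" password="hadoop" verbose;
   /* Create a directory in HDFS */
   hdfs mkdir="/user/hadoop/demo";

   /* Copy a local file into that directory */
   hdfs copyfromlocal="C:\Users\demo\sales.txt"
        out="/user/hadoop/demo/sales.txt";

   /* Remove an HDFS file that is no longer needed */
   hdfs delete="/user/hadoop/demo/old_sales.txt";
run;
```

The same procedure also has MAPREDUCE and PIG statements for submitting those job types. HiveQL is normally sent through SAS/ACCESS instead.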


Where do I get Hadoop?


SAS/ACCESS to Hadoop

Move parts of the job into Hadoop

[Diagram: the SAS server sends HiveQL to Hadoop]
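One way to push work into Hadoop is explicit SQL pass-through, where the inner query is written in HiveQL and executed by Hive. The sketch below assumes a Hive table named sales with columns region and amount. That table is hypothetical, and the connection values simply mirror the libname example in this deck.

```sas
proc sql;
   /* Connect to Hive; server, port and credentials are assumptions */
   connect to hadoop (server="sascldserv02" port=10000
                      user=hadoop password="hadoop");

   /* The inner query is HiveQL and runs inside Hadoop.
      Only the aggregated rows are returned to SAS.     */
   create table work.sales_by_region as
   select * from connection to hadoop
      ( select region, sum(amount) as total_amount
        from sales
        group by region );

   disconnect from hadoop;
quit;
```

Aggregating in Hive before the result crosses the network is the point of the pattern: the large table stays in the cluster.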


Getting started with Hadoop

libname elefant hadoop PORT=10000 SERVER=sascldserv02
        USER=hadoop PASSWORD="hadoop";
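Once the libref exists, Hive tables can be referenced like any other SAS table. The following is a minimal sketch that assumes the elefant libref from this slide and a hypothetical Hive table sales with columns customer_id and amount.

```sas
/* Optional: write the SQL that SAS/ACCESS generates to the SAS log */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* Ordinary PROC SQL against a Hive table (table and columns are hypothetical) */
proc sql;
   create table work.big_orders as
   select customer_id, amount
   from elefant.sales
   where amount > 1000;   /* SAS/ACCESS can pass this filter on to Hive */
quit;

/* Ordinary procedures work on the same libref */
proc means data=elefant.sales;
   var amount;
run;
```

Checking the log output from SASTRACE is a practical way to see which parts of a step were actually executed in Hadoop.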


Hadoop Filename Statement

Note: Do not move ALL of the data into a SAS table.

Define a fileref:

FILENAME hdpfile1 hadoop "/user/hadoop/gutenberg/pg20417.txt"
         cfg="C:\Users\hadoop_config.xml" user='hadoop';

Use it as usual:

DATA my_analysis_data;
   INFILE hdpfile1;
   INPUT …;
RUN;
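To illustrate the advice about not copying everything, the sketch below reads the file through the hdpfile1 fileref and keeps only the lines that match a filter. The record layout (free text, one line per record, at most 200 characters kept) and the search word are assumptions for the example.

```sas
/* Sketch: keep only the lines of interest instead of the whole file. */
data work.matching_lines;
   infile hdpfile1 truncover lrecl=32767;
   input line $char200.;                  /* read each record as raw text */
   if find(line, 'Hadoop', 'i') > 0;      /* subsetting IF: case-insensitive match */
run;
```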


Hadoop File Reader

SAS 9.4 can read "non-Hive" files as tables

File formats

• Delimited
• CSV
• XML
• JSON (experimental)
• Binary files

Multiple files in a directory


Hadoop File Reader

Define a libname:

libname HDP hadoop user=hadoop pw=Hadoop
        config = '/home/sasinst/hadoop_config.xml'
        hdfs_tempdir = '/user/hadoop/tmp'
        hdfs_metadir = '/user/hadoop/metadata'
        hdfs_permdir = '/user/hadoop/dataload';

Specify the file format:

proc hdmd name=hdp.pipedata_dept
          format=delimited sep='|'
          DATA_FILE='pipedata_dept.txt';
   COLUMN col1 int;
   COLUMN col2 char(15);
run;

Use it as usual:

proc print data=hdp.pipedata_dept;
run;


DI Studio

• Access data in Hadoop
• Transform data inside Hadoop using HiveQL
• Creating new data in Hadoop


SPDE

libname spdlib spde '/path';

proc print data=spdlib.mytab;
run;

[Diagram: on a traditional file system, SPDE opens, reads and closes each component file of the table: mytab.mdf, mytab.dpf1 and mytab.dpf2.]


SPDE - Hadoop HDFS

libname spdlib spde '/path' hdfshost=default;

proc print data=spdlib.mytab;
run;

[Diagram: SPDE opens, reads and closes mytab.mdf, mytab.dpf1 and mytab.dpf2 through the HDFS client. The client gets the data block locations from the Namenode and then gets the data blocks (M1, D1, D2) from the Datanodes.]
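Writing through the same engine works the same way as on a local file system. The sketch below assumes the SAS session already knows where the Hadoop configuration files and JAR files are (typically through the SAS_HADOOP_CONFIG_PATH and SAS_HADOOP_JAR_PATH environment variables) and keeps the placeholder path from the slide.

```sas
/* '/path' is the placeholder from the slide, not a real HDFS directory */
libname spdlib spde '/path' hdfshost=default;

/* Write a copy of a SAS sample table into the SPD Engine library on HDFS */
data spdlib.class;
   set sashelp.class;
run;

/* Inspect the table that was just created */
proc contents data=spdlib.class;
run;
```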


Next step - SAS jobs in Hadoop

[Diagram: the SAS server sends SAS DATA step and DS2 code to Hadoop]

• SAS® Data Loader for Hadoop
• SAS® Code Accelerator for Hadoop
• SAS® Scoring Accelerator for Hadoop
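A DS2 program written as a thread plus a data program is the usual shape of code that the Code Accelerator can run inside the cluster. The sketch below assumes that the SAS Embedded Process and SAS Code Accelerator for Hadoop are installed, that the elefant libref from earlier in the deck exists, and that a Hive table sales with a numeric column amount is available. The table, the column and the factor 10 are invented for the example.

```sas
proc ds2 ds2accel=yes;            /* ask for in-database execution where possible */

   /* The thread program holds the row-level logic */
   thread work_thread / overwrite=yes;
      dcl double amount_x10;
      method run();
         set elefant.sales;               /* hypothetical Hive table */
         amount_x10 = amount * 10;        /* arbitrary illustrative transformation */
      end;
   endthread;

   /* The data program collects the thread output into a new table */
   data elefant.sales_x10 (overwrite=yes);
      dcl thread work_thread t;
      method run();
         set from t;
      end;
   enddata;
run;
quit;
```

Input and output are both on the Hadoop libref, so the rows do not have to travel to the SAS server when the program is accelerated.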


SAS® Data Loader for Hadoop

What directive do you want to perform?

• Copy Data for Visualization: Copy data from Hadoop and load it into LASR for visualization. Existing data in the target table will be replaced.
• Join Tables in Hadoop: Create a table in Hadoop from multiple tables.
• Schedule a Directive to Run: Schedule a directive to run at specified dates and times.
• Copy Data to Hadoop: Copy data from a source and load it into Hadoop. Existing data in the target file will be replaced.
• Pivot a Table in Hadoop: Transpose the columns of a table in Hadoop.
• Saved Directives: Open a previously created directive to run, view, or edit.
• Chain Directives Together: Run a number of directives in a specific order.
• Profile Data: Create a report profiling the data in a table.
• Transform Data in Hadoop: Transform the data in a Hadoop data file.
• Verify Mailing Address: Check the validity of the mailing address data in a table.
• Generate Business Rules: Analyze data in a table and generate business rules.
• Send Data for Remediation: Select data to send to the remediation queue for further action.


From data to decisions

MANAGE DATA

• SAS/ACCESS
• SAS Data Management
• SAS Federation Server
• SAS Data Loader for Hadoop

EXPLORE DATA

• SAS Visual Analytics
• SAS In-Memory Statistics for Hadoop

DEVELOP MODELS

• SAS HPA Products
• SAS Visual Statistics
• SAS In-Memory Statistics for Hadoop
• SAS Enterprise Miner

DEPLOY & MONITOR

• SAS Scoring Accelerator for Hadoop
• SAS Code Accelerator for Hadoop
