Copyright © 2014, SAS Institute Inc. All rights reserved.
make connections • share ideas • be inspired
Use SAS to Process and Analyze Your Data in Hadoop
Mikael Turvall

Architecture
[Architecture diagram. Labels recoverable from the slide:]
• Web-based clients: SAS® Visual Analytics and SAS® Visual Statistics (VA/VS), SAS® Studio
• Mid-tier, workspace server, metadata server (optional)
• In-memory store: SAS® LASR™ Analytic Server, SAS® In-Memory Statistics for Hadoop
• MPP datastore / blade environment: Hadoop, Teradata, Pivotal, Oracle, with the SAS Embedded Process
• Hadoop distributions: Cloudera, Hortonworks
• Other sources: other RDBMS, nonrelational, click stream, PC files

Why?
[Diagram: analytics lifecycle. Identify / formulate problem, data preparation, data exploration, transform & select, build model, validate model, deploy model, evaluate / monitor results.]
• Hadoop as a platform for data
• Hadoop as the core of the next-generation analytics platform

From data to decisions
Manage data
• SAS/ACCESS
• SAS Data Management
• SAS Federation Server
• SAS Data Loader for Hadoop
Explore data
• SAS Visual Analytics
• SAS In-Memory Statistics for Hadoop
Develop models
• SAS HPA Products
• SAS Visual Statistics
• SAS In-Memory Statistics for Hadoop
• SAS Enterprise Miner
Deploy & monitor
• SAS Scoring Accelerator for Hadoop
• SAS Code Accelerator for Hadoop

Get started quickly
Capabilities
• Transparent access to Hadoop tables in ordinary SAS libraries
• You can program in SAS SQL and the SAS DATA step as usual
• You can manage Hadoop from SAS:
  • Native HDFS commands
  • MapReduce, Pig, and HiveQL
Benefits
• You do not need to be an expert in Hadoop-specific syntax
• Switching to Hadoop is as easy as changing a libname
• Existing SAS programs, reports, etc. can be reused
• The many ways of accessing data give IT different options for utilizing capacity
YOU CAN START TODAY
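The "native HDFS commands" item can be illustrated with PROC HADOOP, the Base SAS procedure for submitting HDFS commands, MapReduce jobs, and Pig code. This is a minimal sketch that is not taken from the slides: the configuration file, the credentials, and every local and HDFS path are assumptions to be replaced with values for your own cluster.

```sas
/* Fileref to a Hadoop configuration file (hypothetical path) */
filename cfg 'C:\Users\hadoop_config.xml';

proc hadoop options=cfg username='hadoop' password='hadoop' verbose;
   /* Create a directory in HDFS (hypothetical path) */
   hdfs mkdir='/user/hadoop/demo';
   /* Copy a local file into that directory (hypothetical file) */
   hdfs copyfromlocal='C:\data\sample.txt'
        out='/user/hadoop/demo/sample.txt';
run;
```

The same procedure also has MAPREDUCE and PIG statements, so one SAS program can mix HDFS housekeeping with jobs submitted to the cluster.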

Where can I get Hadoop?

SAS/ACCESS to Hadoop
Move parts of the job into Hadoop
[Diagram: the SAS server sends HiveQL to Hadoop.]

Getting started with Hadoop
libname elefant hadoop PORT=10000 SERVER=sascldserv02
   USER=hadoop PASSWORD="hadoop";
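Once a libref such as elefant is assigned with the LIBNAME statement on this slide, Hive tables can be referenced like any other SAS table. The sketch below illustrates that idea; the table orders and its columns region and amount are hypothetical and do not come from the slides.

```sas
/* Assumes the libref elefant is already assigned with the HADOOP engine.
   The table orders and its columns are hypothetical. */

/* Ordinary PROC SQL; SAS/ACCESS translates what it can into HiveQL */
proc sql;
   select region, count(*) as n_orders
   from elefant.orders
   group by region;
quit;

/* Ordinary DATA step reading a subset of the Hive table into WORK */
data work.big_orders;
   set elefant.orders;
   where amount > 1000;
run;
```

Pointing the same program at another data source should only require changing the LIBNAME statement, which is the point the deck makes about switching to Hadoop.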

Hadoop FILENAME statement
Note: do not move ALL of the data over into a SAS table
/* Define a fileref */
FILENAME hdpfile1 hadoop "/user/hadoop/gutenberg/pg20417.txt"
   cfg="C:\Users\hadoop_config.xml" user='hadoop';
/* Use it as usual */
DATA my_analysis_data;
   INFILE hdpfile1;
   INPUT …;
RUN;

Hadoop File Reader
SAS 9.4 can read "non-Hive" files as tables
File formats
• Delimited
• CSV
• XML
• JSON (experimental)
• Binary files
Multiple files in one directory

Hadoop File Reader
/* Define a libname */
libname HDP hadoop user=hadoop pw=Hadoop
   config = '/home/sasinst/hadoop_config.xml'
   hdfs_tempdir = '/user/hadoop/tmp'
   hdfs_metadir = '/user/hadoop/metadata'
   hdfs_permdir = '/user/hadoop/dataload';
/* Specify the file format */
proc hdmd name=hdp.pipedata_dept
   format=delimited sep='|'
   data_file='pipedata_dept.txt';
   column col1 int;
   column col2 char(15);
run;
/* Use it as usual */
proc print data=hdp.pipedata_dept; run;

DI Studio
• Creating new data in Hadoop
• Transform data inside Hadoop using HiveQL
• Access data in Hadoop
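Transforming data inside Hadoop with HiveQL can also be done by hand with explicit SQL pass-through, where SAS hands the statement to Hive unchanged. This is a hedged sketch and not something DI Studio is known to generate: the connection values mirror the LIBNAME example earlier in the deck, and the tables orders and orders_by_region and their columns are hypothetical.

```sas
proc sql;
   /* Connection values are assumptions; adjust them for your site */
   connect to hadoop (server=sascldserv02 port=10000
                      user=hadoop password="hadoop");

   /* Everything inside the parentheses is HiveQL and runs in Hadoop,
      so the new table is created without moving rows to SAS */
   execute (
      create table orders_by_region as
      select region, count(*) as n_orders
      from orders
      group by region
   ) by hadoop;

   disconnect from hadoop;
quit;
```

Because the statement runs in Hive, only the completion status comes back to the SAS session.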

SPDE
libname spdlib spde '/path';
proc print data=spdlib.mytab;
run;
[Diagram: traditional file system. SPDE opens, reads, and closes the metadata file mytab.mdf and the data partition files mytab.dpf1 and mytab.dpf2.]

SPDE - Hadoop HDFS
libname spdlib spde '/path' hdfshost=default;
proc print data=spdlib.mytab;
run;
[Diagram: SPDE opens, reads, and closes mytab.mdf, mytab.dpf1, and mytab.dpf2 through the HDFS client. The client gets the data block locations from the NameNode and then gets the data from the DataNodes, which hold blocks labeled M1, D1, and D2.]

Next step - SAS jobs in Hadoop
[Diagram: the SAS server sends SAS DATA step and DS2 code to Hadoop.]
• SAS® Data Loader for Hadoop
• SAS® Code Accelerator for Hadoop
• SAS® Scoring Accelerator for Hadoop
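Running DS2 code in Hadoop can be sketched with a DS2 thread program and the DS2ACCEL= option of PROC DS2. This is an illustration under assumptions, not code from the presentation: hdp is assumed to be a libref assigned with the HADOOP engine, the tables orders and orders_scored and the column amount are hypothetical, and in-cluster execution should only happen where the SAS Code Accelerator and the SAS Embedded Process are installed.

```sas
/* hdp is assumed to be a HADOOP-engine libref; table and
   column names are hypothetical. */
proc ds2 ds2accel=yes;
   /* Thread program: the part intended to run in parallel in Hadoop */
   thread score_th / overwrite=yes;
      dcl double amount_adj;
      method run();
         set hdp.orders;
         amount_adj = amount * 1.25;   /* illustrative calculation only */
      end;
   endthread;

   /* Data program: collects the rows produced by the thread program */
   data hdp.orders_scored (overwrite=yes);
      dcl thread score_th t;
      method run();
         set from t;
      end;
   enddata;
run;
quit;
```

Keeping both the input and the output table in the Hadoop library is what lets the work stay in the cluster instead of pulling rows back to the SAS server.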
SAS® Data Loader for Hadoop
[Screenshot: SAS® Data Director, "What directive do you want to perform?" Other labels visible: User Name; Show: All Directives; 1 Click.]
• Copy Data for Visualization: Copy data from Hadoop and load it into LASR for visualization. Existing data in the target table will be replaced.
• Join Tables in Hadoop: Create a table in Hadoop from multiple tables.
• Schedule a Directive to Run: Schedule a directive to run at specified dates and times.
• Copy Data to Hadoop: Copy data from a source and load it into Hadoop. Existing data in the target file will be replaced.
• Pivot a Table in Hadoop: Transpose the columns of a table in Hadoop.
• Saved Directives: Open a previously created directive to run, view, or edit.
• Chain Directives Together: Run a number of directives in a specific order.
• Profile Data: Create a report profiling the data in a table.
• Transform Data in Hadoop: Transform the data in a Hadoop data file.
• Verify Mailing Address: Check the validity of the mailing address data in a table.
• Generate Business Rules: Analyze data in a table and generate business rules.
• Send Data for Remediation: Select data to send to the remediation queue for further action.