Adding Statistical Functionality to the DATA Step with PROC FCMP

Stacey Christian and Jacques Rioux, SAS Institute Inc., Cary, NC

Paper 326-2010

Copyright © 2010, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Upload: jariou

Post on 15-Jan-2015

DESCRIPTION

Extend and reuse SAS's own procedures within DATA step code. Using PROC FCMP, we show how you can create reusable code in the DATA step that pulls together the power of many procedures and yields a much cleaner programming model.

TRANSCRIPT

Page 1: Adding Statistical Functionality to the DATA Step with PROC FCMP

Adding Statistical Functionality to the DATA Step with PROC FCMP

Stacey Christian and Jacques Rioux SAS Institute Inc., Cary, NC

Paper 326-2010

Page 2: Adding Statistical Functionality to the DATA Step with PROC FCMP

Introduction/Motivation

Ever want to call a SAS procedure from the DATA step?

Ever want to encapsulate a complicated analytical algorithm in a reusable function?

This talk will demonstrate how to add statistical functionality to the DATA step through the definition of FCMP function wrappers.

Page 3: Adding Statistical Functionality to the DATA Step with PROC FCMP

Overview

RUN_MACRO function in FCMP

Recursive Technique

Iterative Technique/The Simulation

Meta Programming with FCMP

Page 4: Adding Statistical Functionality to the DATA Step with PROC FCMP

RUN_MACRO Function in FCMP

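Executes a predefined SAS macro from within an FCMP function or subroutine. Before the macro runs, each listed variable is copied to a macro variable of the same name; when the macro finishes, the macro variable values are copied back.

Syntax:

rc = run_macro('macro_name', var_1, var_2, ...);

• rc: return code (0 if the macro was submitted successfully)

• macro_name: name of the SAS macro to run

• var_N: variables to pass to and from the macro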

Page 5: Adding Statistical Functionality to the DATA Step with PROC FCMP

See Macro Run

/* Create a macro called subtract_macro */
%macro subtract_macro;
   %let difference = %sysevalf(&a - &b);
%mend subtract_macro;

/* Use subtract_macro within a function */
proc fcmp outlib = sasuser.ds.functions;
   function subtract(a, b);
      rc = run_macro('subtract_macro', a, b, difference);
      if rc eq 0 then return(difference);
      else return(.);
   endsub;

   /* test the call */
   a = 5.3;
   b = 0.7;
   diff = subtract(a, b);
   put diff=;
run;

diff=4.6

Page 6: Adding Statistical Functionality to the DATA Step with PROC FCMP

See Macro Run in DATA Step

options cmplib = (sasuser.ds);

data _null_;
   a = 5.3;
   b = 0.7;
   diff = subtract(a, b);
   put diff=;
run;

diff=4.6

Page 7: Adding Statistical Functionality to the DATA Step with PROC FCMP

Recursive Technique: Segmenting Time Series Data

"Segmenting Time Series: A Survey and Novel Approach," Keogh, Eamonn, et al.

reduce extremely large time series data sets

piecewise linear approximations

top-down recursive algorithm

Page 8: Adding Statistical Functionality to the DATA Step with PROC FCMP

Top Down Algorithm

SegmentTopDown ( currentSegment ) {
   error = run_linear_approximation( currentSegment );

   /* split currentSegment at its midpoint into leftSegment and rightSegment */
   leftError = run_linear_approximation( leftSegment );
   rightError = run_linear_approximation( rightSegment );
   combinedError = leftError + rightError;

   if (combinedError < error) then {
      call SegmentTopDown( leftSegment );
      call SegmentTopDown( rightSegment );
   } else {
      keep_segment( currentSegment );
   }
}

Page 9: Adding Statistical Functionality to the DATA Step with PROC FCMP

Top Down Subroutine

subroutine segment_topdown(data $, segdata $, var $, start, end, threshold);
   /* error of a single line fitted to the whole segment */
   error = linear_approximation(data, var, start, end);

   /* split at the midpoint and fit each half separately */
   mid = start + floor((end - start) / 2);
   left_error = linear_approximation(data, var, start, mid);
   right_error = linear_approximation(data, var, mid + 1, end);

   /* relative reduction in error gained by splitting */
   improvement = (error - (left_error + right_error)) / error;
   if (improvement > threshold) then do;
      call segment_topdown(data, segdata, var, start, mid, threshold);
      call segment_topdown(data, segdata, var, mid + 1, end, threshold);
   end;
   else do;
      call append_segment(segdata, start, end, error);
   end;
endsub;
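The slides do not show the append_segment helper that the subroutine calls. One way it might be written, following the same RUN_MACRO wrapper pattern, is sketched below. The macro name append_segment_macro, the work data set _SEG_, the column names first_obs, last_obs and sse, and the sasuser.ds.functions library are all assumptions made for illustration, not the authors' code.

```sas
/* Hypothetical macro: appends one row describing a kept segment */
%macro append_segment_macro;
   /* character values arrive inside quotation marks; strip them */
   %let segdata = %sysfunc(dequote(&segdata));

   /* build a one-row data set from the values passed by RUN_MACRO */
   data _SEG_;
      first_obs = &first_obs;
      last_obs  = &last_obs;
      sse       = &sse;
   run;

   /* PROC APPEND creates the base data set if it does not exist yet */
   proc append base=&segdata data=_SEG_;
   run;
%mend append_segment_macro;

/* Hypothetical FCMP wrapper with the signature used by segment_topdown */
proc fcmp outlib = sasuser.ds.functions;
   subroutine append_segment(segdata $, first_obs, last_obs, sse);
      rc = run_macro('append_segment_macro', segdata, first_obs, last_obs, sse);
   endsub;
run;
```

The argument names differ from start and end on purpose, so that the generated DATA step does not assign to a variable called end.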

Page 10: Adding Statistical Functionality to the DATA Step with PROC FCMP

Linear Approximation Function

function linear_approximation(ds_in $, var $, first_obs, last_obs);
   rc = run_macro('linear_approximation_macro', ds_in, first_obs, last_obs, var, error);
   return(error);
endsub;

Page 11: Adding Statistical Functionality to the DATA Step with PROC FCMP

Linear Approximation Macro

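%macro linear_approximation_macro;
   /* RUN_MACRO passes character values inside quotation marks; strip them */
   %let ds_in = %sysfunc(dequote(&ds_in));
   %let var = %sysfunc(dequote(&var));

   /* number the observations of the segment 1, 2, 3, ... */
   data _TEMP_;
      set &ds_in(firstobs=&first_obs obs=&last_obs);
      retain _TREND_ 0;
      _TREND_ = _TREND_ + 1;
   run;

   /* fit a straight line and save the sum of squared errors */
   proc reg data=_TEMP_ outest=_EST_ noprint;
      model &var = _TREND_ / sse;
   run;
   quit;

   /* return the SSE to the caller through the macro variable ERROR */
   proc sql noprint;
      select _SSE_ into :ERROR from _EST_;
   quit;
%mend linear_approximation_macro;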
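Once the macro and the function are defined, the function can be tried on its own before it is used in the recursion. The call below is a minimal sketch: it assumes the function was stored in sasuser.ds.functions (the slides do not say where) and uses the SASHELP.AIR sample data set with its AIR variable purely as a stand-in series.

```sas
/* Assumption: linear_approximation lives in the sasuser.ds library */
options cmplib = (sasuser.ds);

data _null_;
   /* SSE of a straight line fitted to observations 1 through 24 */
   sse = linear_approximation("sashelp.air", "air", 1, 24);
   put sse=;
run;
```

Testing each wrapper in isolation like this makes it easier to tell a macro problem from a problem in the recursive subroutine.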

Page 12: Adding Statistical Functionality to the DATA Step with PROC FCMP

Recursive Technique: Results

data _NULL_;
   call segment_topdown("sasuser.snp", "work.segds_20", "close", 1, 15116, 0.2);
   call segment_topdown("sasuser.snp", "work.segds_15", "close", 1, 15116, 0.15);
run;

Page 13: Adding Statistical Functionality to the DATA Step with PROC FCMP

Recursive Technique: Graphic Results

42 Piecewise Linear Segments

Page 14: Adding Statistical Functionality to the DATA Step with PROC FCMP

Recursive Technique: Graphic Results

113 Piecewise Linear Segments

Page 15: Adding Statistical Functionality to the DATA Step with PROC FCMP

Iterative Technique

• "Minimum Quadratic Distance Estimation for the Proportional Hazards Regression Model with Grouped Data," Jacques Rioux and Andrew Luong

• Survival models/proportional hazards model

• PROC PHREG (maximum likelihood) versus minimum distance methods

• Iteratively reweighted least squares algorithm

Page 16: Adding Statistical Functionality to the DATA Step with PROC FCMP

Iteratively Reweighted Least Squares Algorithm

initialize_weights( weights );
params1 = run_regression( weights );
maxRelativeDifference = 1;

while (maxRelativeDifference > criteria)
{
   update_weights( weights );
   params2 = run_regression( weights );
   maxRelativeDifference = max( abs( (params2 - params1) / params1 ) );
   params1 = params2;
}

Page 17: Adding Statistical Functionality to the DATA Step with PROC FCMP

Iterative Technique: DATA Step Code

subroutine fit_ph_model(indata $, parmData $, depVars $, weightVars $, indepVars $);
   array params1[3];
   array params2[3];

   call prepare_phdata(indata, "_prepdata_");
   call run_regression("_prepdata_", depVars, indepVars, weightVars, parmData, params1);

   maxRelativeDifference = 1;
   do while (maxRelativeDifference > 0.0001);
      call update_weights("_prepdata_", weightVars, parmData);
      call run_regression("_prepdata_", depVars, indepVars, weightVars, parmData, params2);
      maxRelativeDifference = calc_max_relative_diff(params1, params2);

      /* carry the latest estimates into the next iteration */
      do i = 1 to dim(params1);
         params1[i] = params2[i];
      end;
   end;
endsub;
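The convergence check relies on calc_max_relative_diff, which the slides do not show. A minimal sketch of how such a function could look is given below; the body, the argument names p_old and p_new, the fallback to the absolute change when an old estimate is zero, and the sasuser.ds.functions library are assumptions for illustration only.

```sas
proc fcmp outlib = sasuser.ds.functions;
   /* Hypothetical helper: largest relative change between two parameter vectors */
   function calc_max_relative_diff(p_old[*], p_new[*]);
      maxdiff = 0;
      do i = 1 to dim(p_old);
         /* fall back to the absolute change when the old value is zero */
         if p_old[i] ne 0 then change = abs((p_new[i] - p_old[i]) / p_old[i]);
         else change = abs(p_new[i] - p_old[i]);
         if change > maxdiff then maxdiff = change;
      end;
      return(maxdiff);
   endsub;
run;
```

Returning a single number keeps the loop condition in fit_ph_model simple: iteration stops once no coefficient has moved by more than the tolerance.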

Page 18: Adding Statistical Functionality to the DATA Step with PROC FCMP

Run_Regression Subroutine

subroutine run_regression(data $, dependent $, independent $, weight $, parmData $, parmArray[*]);
   outargs parmArray;
   array tmpArray[1] _temporary_;

   /* fit the weighted regression; estimates are written to the parmData data set */
   rc = RUN_MACRO('run_regression_macro', data, parmData, dependent, independent, weight);

   /* read the estimates back and copy them into the output array */
   rc = read_array(parmData, tmpArray);
   do i = 1 to dim(parmArray);
      parmArray[i] = tmpArray[1, i];
   end;
endsub;

Page 19: Adding Statistical Functionality to the DATA Step with PROC FCMP

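Run_Regression Macro

%macro run_regression_macro;
   /* RUN_MACRO passes character values inside quotation marks; strip them */
   %let data = %sysfunc(dequote(&data));
   %let parmData = %sysfunc(dequote(&parmData));
   %let dependent = %sysfunc(dequote(&dependent));
   %let independent = %sysfunc(dequote(&independent));
   %let weight = %sysfunc(dequote(&weight));

   proc reg data=&data outest=&parmData noprint;
      model &dependent = &independent / noint;
      weight &weight;
   run;
   quit;

   /* keep only the coefficient columns */
   data &parmData;
      set &parmData;
      keep &independent;
   run;
%mend run_regression_macro;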

Page 20: Adding Statistical Functionality to the DATA Step with PROC FCMP

The True Glory of Reusable Functions: The Simulation

• We now have a "fitting routine" for the proportional hazards model (fit_ph_model)

• Create a function to generate PH data (called simulate_ph_data)

• Create a function to append fits to the results data set (called append_data)

Page 21: Adding Statistical Functionality to the DATA Step with PROC FCMP

The Simulation Study

proc fcmp;
   do i = 1 to 1000;
      call simulate_ph_data("work.simdata");
      call fit_ph_model("work.simdata", "work.params", "log_log_Pij", "Weight", "x1 x2 x3");
      call append_data("work.simresults", "work.params");
   end;
run;

Page 22: Adding Statistical Functionality to the DATA Step with PROC FCMP

Simulation Results

Coefficient   Real Value   Mean       StDev
X1            0.1          0.102454   0.036917
X2            0.3          0.307029   0.050375
X3            0.2          0.205464   0.017793
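A summary like the table above could be produced from the accumulated fits with a single PROC MEANS step. This is a sketch under the assumption that work.simresults holds one row per simulated fit with the coefficient columns x1, x2 and x3, as the simulation loop suggests; the slides do not show how the table was actually computed.

```sas
/* Assumption: work.simresults has one row per fit and columns x1, x2, x3 */
proc means data=work.simresults mean std;
   var x1 x2 x3;
run;
```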

Page 23: Adding Statistical Functionality to the DATA Step with PROC FCMP

Simulation Graphs

Page 24: Adding Statistical Functionality to the DATA Step with PROC FCMP

Meta Programming

Create your own scoring function dynamically from a fitted model

subroutine create_score(data $, dependent $, independent $, scoreFunc $, library $);
   paramds = "work.params";
   rc = RUN_MACRO('run_regression_macro', data, paramds, dependent, independent);
   rc = RUN_MACRO('create_score_func_macro', paramds, independent, scoreFunc, library);
endsub;

Page 25: Adding Statistical Functionality to the DATA Step with PROC FCMP

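Score Function Macro

%macro create_score_func_macro;
   /* RUN_MACRO passes character values inside quotation marks; strip them */
   %let paramds = %sysfunc(dequote(&paramds));
   %let independent = %sysfunc(dequote(&independent));

   /* one row per coefficient: _NAME_ holds the variable name, COL1 the estimate */
   proc transpose data=&paramds out=&paramds._t;
      var &independent;
   run;

   proc sql noprint;
      /* build "var1 * coef1 + var2 * coef2 + ..." */
      select trim(_NAME_) || " * " || strip(put(col1, BEST12.))
         into :theScore separated by " + "
         from &paramds._t;

      /* build the argument list "var1 , var2 , ..." */
      select trim(_NAME_)
         into :theArgs separated by " , "
         from &paramds._t;
   quit;

   data _NULL_;
      set &paramds;
      call symputX("Intercept", intercept);
   run;

<continued>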

Page 26: Adding Statistical Functionality to the DATA Step with PROC FCMP

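Score Function Macro - continued

   /* RUN_MACRO passes character values inside quotation marks; strip them */
   %let scoreFunc = %sysfunc(dequote(&scoreFunc));
   %let library = %sysfunc(dequote(&library));

   /* generate the scoring function from the fitted coefficients */
   proc fcmp outlib=&library..score;
      function &scoreFunc(&theArgs);
         return(&Intercept + &theScore);
      endsub;
   quit;
%mend create_score_func_macro;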
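To make the code generation concrete, the sketch below shows roughly what the PROC FCMP step would look like after macro resolution for a model with the predictors educ, exper and age, a function name of PredLWage_NoKids, and a library value of sasuser.score. The numeric coefficients are made-up placeholders, not fitted values from the slides.

```sas
/* Illustration only: the coefficients below are placeholders, not fitted values */
proc fcmp outlib=sasuser.score.score;
   function PredLWage_NoKids(educ , exper , age);
      /* &Intercept + &theScore after macro resolution */
      return(0.5 + educ * 0.1 + exper * 0.02 + age * -0.005);
   endsub;
quit;
```

Because the coefficients are baked into the function body as literals, the generated function can score new observations without access to the parameter data set.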

Page 27: Adding Statistical Functionality to the DATA Step with PROC FCMP

Run Create Score Function

data _NULL_;
   call create_score("work.mroz", "lwage", "educ exper age kidslt6 kidsge6", "PredLWage_Full", "sasuser.score");
   call create_score("work.mroz", "lwage", "educ exper age", "PredLWage_NoKids", "sasuser.score");
run;

data _NULL_;
   educ = 15; exper = 5; age = 30; kidslt6 = 2; kidsge6 = 1;

   PredWage_Full = exp(PredLWage_Full(educ, exper, age, kidslt6, kidsge6));
   put PredWage_Full=;

   PredWage_NoKids = exp(PredLWage_NoKids(educ, exper, age));
   put PredWage_NoKids=;
run;

PredWage_Full=3.4199679212
PredWage_NoKids=3.787216653

Page 28: Adding Statistical Functionality to the DATA Step with PROC FCMP

Conclusions

Users can encapsulate preexisting analytical procedures as building blocks for larger, more complex statistical analysis methods.

PROC FCMP provides the vehicle to write reusable, independent program units (functions and subroutines).

These units can be written and tested independently.

Page 29: Adding Statistical Functionality to the DATA Step with PROC FCMP

Where to find more information

http://support.sas.com/saspresents

Paper in PDF form

Zip file containing all source code

Page 30: Adding Statistical Functionality to the DATA Step with PROC FCMP

Adding Statistical Functionality to the DATA Step with PROC FCMP

Paper 326-2010