introductionwrite functions!example: calculate...

129
Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Orien Outline 1 Introduction 2 Write Functions! 3 Example: Calculate Entropy 4 Arguments and Returns 5 R Style 6 Writing Better R Code 7 Object Oriented Programming Writing Functions I 1 / 129 University of Kansas

Upload: voque

Post on 28-Jul-2018

220 views

Category:

Documents


0 download

TRANSCRIPT

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Outline

1 Introduction

2 Write Functions!

3 Example: Calculate Entropy

4 Arguments and Returns

5 R Style

6 Writing Better R Code

7 Object Oriented Programming

Writing Functions I 1 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Writing Functions In RNecessity Really is a Mother

Paul E. Johnson12

1University of Kansas, Department of Political Science 2Center for ResearchMethods and Data Analysis

2013

Writing Functions I 2 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Outline

1 Introduction

2 Write Functions!

3 Example: Calculate Entropy

4 Arguments and Returns

5 R Style

6 Writing Better R Code

7 Object Oriented Programming

Writing Functions I 3 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Outline

1 Introduction

2 Write Functions!

3 Example: Calculate Entropy

4 Arguments and Returns

5 R Style

6 Writing Better R Code

7 Object Oriented Programming

Writing Functions I 4 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Did You Ever Write a Program?

If Yes: R’s different than that.

If No: Its not like other things you’ve done.

In either case, don’t worry about it :)

Writing Functions I 5 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

R is a little bit like an elephant

Its a tree trunk! Its a snake! Its a brush!

Writing Functions I 6 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

The R Language is like S, of course

The S Language– John Chambers, etal. at Bell Labs, mid 1970s.See Richard Becker’s “Brief History ofS” about the AT&T years

There have been 4 generations of theS language.

Many packages now were written inS3, but S4 has been recommended fornew packages for at least 5 years.

A new framework known as “referenceclasses” is now being developed (wasjokingly referred to as “R5” at onetime)

S3: The New S Language 1988

Writing Functions I 7 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Is R a Branch from S?

R can be seen as a competingimplementation of S.Ross Ihaka and Robert Gentleman.1996. “R: A language for data analysisand graphics.” Journal ofComputational and GraphicalStatistics, 5(3):299-314.

Open Source, Open Community, openrepository CRAN

S pioneers now work to advance R.

S4: John Chambers,Software forData Analysis: Programming with R,Springer, 2008

Writing Functions I 8 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Outline

1 Introduction

2 Write Functions!

3 Example: Calculate Entropy

4 Arguments and Returns

5 R Style

6 Writing Better R Code

7 Object Oriented Programming

Writing Functions I 9 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Overview: 3 reasons to write functions

Preserve your sanity: isolate specific work.

Side benefit: preserve sanity of others who have to read yourcode

Facilitate re-use of your ideas

Co-operate with R functions like lapply() and boot() whichREQUIRE functions from us.

Writing Functions I 10 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Functions: Separate Calculations Meaningfully

New programmers tempted to craft a giant sequence of 1000commands

Just Don’t!

Problem No other human can comprehend that messSolution Write functions to calculate separate parts

I don’t feel comfortable with any function until I have a small“working example” to explore it (many availablehttp://pj.freefaculty.org/R)

Writing Functions I 11 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Re-use your work

If you write a function, you can put it to use in many differentcontexts

If you write a gigantic stream of 1000 commands, you can’t.

Writing Functions I 12 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Avoid for loops with lots of meat inside

Instead of this:

f o r ( i i n 1 :1000}{1000 s o f l i n e s he r e f u l l o f x [ i ] , z [ i ] , and so f o r t h}

We want:

fn1 <− f u n c t i o n ( arguments ) { . . . }fn2 <− f u n c t i o n ( arguments ) { . . . }f o r ( i i n 1 : 1000) {

y <− fn1 ( x , . . . )z <− fn2 ( y , . . . )}

This is easier to read, understand, and more re-usable!

Writing Functions I 13 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Personal confession

My first attack at any problem is often a long string ofcommands that are not separable, not readable

The revision process usually causes me to segregate code intoseparate pieces

One hint that you need a function: constant cutting andpasting of code scraps from one place to another

Writing Functions I 14 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

When finished, I Wish Your R program would look like this

myfn1 <− f u n c t i o n ( argument1 , argument2 , . . . ) {## lines here using argument1 , argument2

}myfn2 <− f u n c t i o n ( argument3 , argument4 ) {

## lines here

}## Try to perfect the above. Then use them

##

a <− 7b <− c (4 , 4 , 4 , 4 , 2)g r e a t 1 <− myfn1 ( a , b , parm3 = TRUE)g r e a t 2 <− myfn2 (b , g r e a t 1 )

Writing Functions I 15 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

How to Create a Function

R allows us to create functions “on the fly”. This is theessential difference between a compiled language like C and aninterpreted language like R. While an R session is running, wecan add new capabilities to it.

The artist Escher would like this one:The word function is a function that createsfunctions!

A new function somethingGood() is defined by a stanza thatbegins like so:

somethingGood <− f u n c t i o n ( arguments ) {

somethingGood is a name we choose

Writing Functions I 16 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

How to Create a Function ...

The terms arguments and parameters are interchangeable. Ioften say inputs. In R, do not say “options”, that confusespeople because R has a function called options() thatgoverns the working session.

arguments may be specified with default values, as in

somethingGood <− f u n c t i o n ( x1 = 0 , x2 = NULL) {

After the squiggly brace, any valid R code can be used. Wecan even define functions inside the function!

What happens in the function stays in the function. Thingsyou create inside there do not escape the closure unless youreally want them to.

Return results: When when the function’s work is finished, asingle object’s name is included on the last line.

Writing Functions I 17 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

How to Create a Function ...

somethingGood <− f u n c t i o n ( x1 = 0 , x2 = NULL) {## suppose really interesting calculations create

res , a result

r e s}

There are some little wrinkles about returns that will bediscussed later. But, for now, I plant some seeds in your mind.

The return includes one objectThe result will ordinarily print in the R terminal when thefunction runs, but we can prevent that by using the last line as

somethingGood <− f u n c t i o n ( x1 = 0 , x2 = NULL) {## suppose really interesting calculations create \

texttt{res}, a result

i n v i s i b l e ( r e s )}

Writing Functions I 18 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

How to Create a Function ...

It is possible to exit sooner, to short-circuit the calculationsbefore they have all run their course. That is what thereturn() function is for.

somethingGood <− f u n c t i o n ( x1 = 0 , x2 = NULL) {## suppose you created res

i f ( s omeLog i c a lCond i t i o n ) r e t u r n ( i n v i s i b l e ( r e s ) )## otherwise , go on and revise res further.

i n v i s i b l e ( r e s )}

Writing Functions I 19 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

R Functions pass information “by value”

The basic premise is that users should organize theirinformation “here”, in the current environment, and it isimportant that the function should not accidentally damage it.

Thus, we send info “over there” to a function

We get back a new something.

The function DOES NOT change variables we give to thefunction

The super assignment << − allows an exception to this, butR Core recommends we avoid it. Only experts should use it.

Writing Functions I 20 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

A simple example of a new function: doubleMe

doubleMe <− f u n c t i o n ( i n pu t = 0) {newva l <− 2 * i n pu t

}

The function’s name is “doubleMe”I choose a name for the incoming variable “input”.Other names would do (must start with a letter, etc.):

doubleMe <− f u n c t i o n ( x ) {out <− 2 * x

}

The last named thing is the one that comes back from the function.Note, explicit use of a return function is NOT REQUIRED. Thelast named thing comes back to us.

Writing Functions I 21 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Key Elements of doubleMe

doubleMe <− f u n c t i o n ( i n pu t = 0) {newva l <− 2 * i n pu t

}

doubleMe a name with which toaccess this function.Because of mybackground in“Objective C”, I like thisstyle of name. Don’tput periods in functionnames unless you knowabout “classes” and areusing them.

input a name usedINTERNALLY whilemaking calculations

= 0 An optional defaultvalue.

newval Last calculation isreturned.

Writing Functions I 22 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

How to Call doubleMe

What is 2 * 7?

( doubleMe (7) )

[ 1 ] 14

The caller may name the arguments explicitly:

( doubleMe ( i npu t = 8) )

[ 1 ] 16

Wonder why I use parentheses around everything? Its just apresentational trick. The default action is “print”, and that’swhat happens when you put something in parentheses withouta function name.

The alternatives are:

p r i n t ( doubleMe ( i npu t = 3) )

Writing Functions I 23 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

How to Call doubleMe ...

[ 1 ] 6

or

x <− doubleMe ( i npu t = 2)x

[ 1 ] 4

Writing Functions I 24 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Generally, I Prefer Clarity in the Call

The “call” is the code statement that puts a function to use.The call includes the function’s name and all arguments.

This works

doubleMe (10)

But wouldn’t you rather be clear?

doubleMe ( i npu t = 10)

When there are many arguments, naming them often helpsprevent accidental matching of input to arguments (R’spositional matching can be fooled).

Writing Functions I 25 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Function Calls

But if you name the argument wrong, it breaks

> doubleMe ( myInput = 7)E r r o r i n doubleMe ( myInput = 7) :unused argument ( s ) ( myInput = 7)

What if you feed it something unsuitable?

> doubleMe ( lm ( rnorm (100) ∼ rnorm (100) ) )E r r o r i n 2 * i n pu t : non−numeric argument to b i n a r y

op e r a t o rI n a d d i t i o n : Warning messages :1 : I n mo d e l .m a t r i x . d e f a u l t (mt , mf , c o n t r a s t s ) :

the r e s pon s e appeared on the r ight−hand s i d e and wasdropped

2 : I n mo d e l .m a t r i x . d e f a u l t (mt , mf , c o n t r a s t s ) :problem with term 1 i n mode l .ma t r i x : no columns a r e

a s s i g n e d

Writing Functions I 26 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Vectorization

This is not always true, but OFTEN:

We get “free”“vectorization”

( doubleMe ( c ( 1 , 2 , 3 , 4 , 5 ) ) )

[ 1 ] 2 4 6 8 10

But it won’t allow you to specify too many inputs:

> doubleMe (1 , 2 , 3 , 4 , 5 )E r r o r i n doubleMe (1 , 2 , 3 , 4 , 5) :

unused argument ( s ) (2 , 3 , 4 , 5)

Vectorization: vector in =⇒ vector out

Oops. I forgot the input

doubleMe ( )

Gives the default value.

Writing Functions I 27 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

print.function magic

Oops. I forgot the parentheses

doubleMe

f u n c t i o n ( i n pu t = 0) {newva l <− 2 * i n pu t

}

Similarly, type “lm” and hit return. Or “predict.glm”. Don’tadd parentheses.

When you type a function’s name, R thinks you want to printthat thing, and it invokes a “print method” for you, calledprint.function(). (That’s print as applied to an object ofclass function.)

Generally, print.function() will display the R internalfunctions in a “tidied up” format. Your functions–the ones youhave created in your session–are generally not tidied up. Thatis discussed in the Rstyle vignette distributed with rockchalk.

Writing Functions I 28 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

predict.glm, for example

p r e d i c t . g lm

f u n c t i o n ( ob j e c t , newdata = NULL , type = c ( ” l i n k ” , ”r e s pon s e” ,”terms ”) , s e . f i t = FALSE , d i s p e r s i o n = NULL , terms =

NULL ,n a . a c t i o n = na .pa s s , . . . )

{t ype <− match .a rg ( type )n a . a c t <− o b j e c t $ n a . a c t i o no b j e c t $ n a . a c t i o n <− NULLi f ( ! s e . f i t ) {

i f ( m i s s i n g ( newdata ) ) {pred <− sw i t c h ( type , l i n k = ob j e c t $

l i n e a r . p r e d i c t o r s ,r e s pon s e = ob j e c t $ f i t t e d . v a l u e s , te rms =

p r e d i c t . l m ( ob j e c t ,s e . f i t = s e . f i t , s c a l e = 1 , type = ”terms ”

,te rms = terms ) )

i f ( ! i s . n u l l ( n a . a c t ) )

Writing Functions I 29 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

predict.glm, for example ...pred <− n a p r e d i c t ( na . ac t , p red )

}e l s e {

pred <− p r e d i c t . l m ( ob j e c t , newdata , s e . f i t ,s c a l e = 1 ,type = i f e l s e ( type == ” l i n k ” , ”r e s pon s e ” ,

type ) ,te rms = terms , n a . a c t i o n = n a . a c t i o n )

sw i t c h ( type , r e s pon s e = {pred <− f am i l y ( o b j e c t ) $ l i n k i n v ( pred )

} , l i n k = , terms = )}

}e l s e {

i f ( i n h e r i t s ( ob j e c t , ”s u r v r e g ”) )d i s p e r s i o n <− 1

i f ( i s . n u l l ( d i s p e r s i o n ) | | d i s p e r s i o n == 0)d i s p e r s i o n <− summary ( ob j e c t , d i s p e r s i o n =

d i s p e r s i o n ) $ d i s p e r s i o nr e s i d u a l . s c a l e <− a s . v e c t o r ( s q r t ( d i s p e r s i o n ) )pred <− p r e d i c t . l m ( ob j e c t , newdata , s e . f i t , s c a l e =

r e s i d u a l . s c a l e ,

Writing Functions I 30 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

predict.glm, for example ...

t ype = i f e l s e ( type == ” l i n k ” , ”r e s pon s e ” , type ) ,te rms = terms , n a . a c t i o n = na . a c t i o n )

f i t <− pred $ f i ts e . f i t <− pred $ s e . f i tsw i t c h ( type , r e s pon s e = {

s e . f i t <− s e . f i t * abs ( f am i l y ( o b j e c t ) $mu.eta ( f i t) )

f i t <− f am i l y ( o b j e c t ) $ l i n k i n v ( f i t )} , l i n k = , terms = )i f ( m i s s i n g ( newdata ) && ! i s . n u l l ( n a . a c t ) ) {

f i t <− n a p r e d i c t ( na . ac t , f i t )s e . f i t <− n a p r e d i c t ( na . ac t , s e . f i t )

}pred <− l i s t ( f i t = f i t , s e . f i t = s e . f i t ,

r e s i d u a l . s c a l e = r e s i d u a l . s c a l e )}pred

}<bytecode : 0 x2bc3fb8><environment : namespace : s t a t s>

Writing Functions I 31 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Function Calls: Local versus Global

The variables we create in “our” session are generally in theGlobal Environment.

Local variables in functions.

Function arguments are local variablesVariables created inside are local variables

Local variables cease to exist once R returns from the function

After playing with doubleMe(), we note that the variableinput does not exist in the current environment.

> l s ( )[ 1 ] ”doubleMe ”> i n pu tE r r o r : o b j e c t ' i n pu t ' not found

Writing Functions I 32 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Check Point: write your own function

Write a function “myGreatFunction” that takes a vector andreturns the arithmetic average.

Generate your own input data, x1, like so

s e t . s e e d (234234)x1 <− rnorm (10000 , m = 7 , sd = 19)

In myGreatFunction(), you can use any R functions youwant, it is not necessary to re-create the mean() function(unless you really want to :))

After you’ve written myGreatFunction(), use it:

x1mean <− myGreatFunct ion ( x1 )x1mean

Now stress test your function by changing x1

x1 [ c (13 , 44 , 99 , 343 , 555) ] <− NAmyGreatFunct ion ( x1 )

Writing Functions I 33 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Outline

1 Introduction

2 Write Functions!

3 Example: Calculate Entropy

4 Arguments and Returns

5 R Style

6 Writing Better R Code

7 Object Oriented Programming

Writing Functions I 34 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Entropy can summarize diversity for a categorical variable

Entropy in Physics means disorganization

Sometimes called Shannon’s Information Index

Basic idea. List the possible outcomes and their probabilities

The amount of diversity in a collection of observationsdepends on the equality of the proportions of cases observedwithin each type.

Writing Functions I 35 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

A Reasonable Person Would Agree . . .

This distribution is “less diverse”outcome name t1 t2 t3 t4 t5

prob(outcome) 0.1 0.3 0.05 0.55 0.0

than this distribution:outcome name t1 t2 t3 t4 t5

prob(outcome) 0.2 0.2 0.2 0.2 0.2

Writing Functions I 36 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

The Information Index

For each type, calculate the following information (or can Isay “diversity”?) value

−pt ∗ log2(pt) (1)

Note that if pt = 0, the diversity value is 0

If pt = 1, then diversity is also 0

Sum those values across the m categories

m∑t=1

−pt ∗ log2(pt) (2)

Diversity is at a maximum when pt are all equal

Writing Functions I 37 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Calculate Diversity for One Type

d i v r <− f u n c t i o n ( p = 0) {i f e l s e ( p > 0 & p < 1 , −p * l o g2 ( p ) , 0)

}

Writing Functions I 38 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Let’s plot that

pseq <− seq (0 .001 , 0 .999 , l e n g t h =999)p s e q . d i v r <− d i v r ( pseq )p l o t ( p s e q . d i v r ∼ pseq , x l a b = ”p ” , y l a b = ”D i v e r s i t y

Con t r i b u t i o n o f One Obse r va t i on ” , main = e x p r e s s i o n (pa s t e ( ”D i v e r s i t y : ” , −p %*% log [ 2 ] ( p ) ) ) , t ype = ” l ”)

Writing Functions I 39 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.1

0.2

0.3

0.4

0.5

Diversity: − p × log2(p)

p

Div

ersi

ty C

ontr

ibut

ion

of O

ne O

bser

vatio

n

Writing Functions I 40 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Diversity Function

Define an Entropy function that sums those values

en t ropy <− f u n c t i o n ( p ) {sum( d i v r ( p ) )

}

Calculate some test cases

en t ropy ( c (1 / 5 , 1/ 5 , 1/ 5 , 1/ 5 , 1/ 5) )

[ 1 ] 2 .321928

en t ropy ( c (3 / 5 , 1/ 5 , 1/ 5 , 0/ 5 , 0/ 5) )

[ 1 ] 1 .370951

Writing Functions I 41 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

There’s a Little Problem With This Approach

Diversity is sensitive to the number of categories8 equally likely outcomes (rep(x,y): repeats x y times.)

en t ropy ( rep (1 / 8 , 8) )

[ 1 ] 3

14 equally likely outcomes

en t ropy ( rep (1 / 14 , 14) )

[ 1 ] 3 .807355

Write it out for a 3 category case

−1

3log2(

1

3)− 1

3log2(

1

3)− 1

3log2(

1

3) = −log2(

1

3) (3)

The highest possible diversity with 3 types is −log2(13)

The highest possible diversity for N types is −log2( 1N )

Writing Functions I 42 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

We Might As Well Plot That

maximumEntropy <− f u n c t i o n (N) − l o g2 (1 /N)Nmax <− 15M <− 2 :Nmaxp l o t (M, maximumEntropy (M) , x l a b = ”N o f P o s s i b l e Types ” ,

y l a b = ”Maximum Po s s i b l e D i v e r s i t y ” , main = ”MaximumPo s s i b l e Entropy For N Ca t e g o r i e s ” , t ype = ”h ” , axe s =FALSE)

a x i s (1 )a x i s (2 )p o i n t s (M, maximumEntropy (M) , pch=19)

Writing Functions I 43 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Maximum Entropy as a Function of the Number of Types

Maximum Possible Entropy For N Categories

N of Possible Types

Max

imum

Pos

sibl

e D

iver

sity

2 4 6 8 10 12 14

1.0

1.5

2.0

2.5

3.0

3.5

4.0

Writing Functions I 44 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Final Result: Normed Entropy as a Diversity Summary

normedEntropy <− f u n c t i o n ( x ) en t ropy ( x ) / maximumEntropy (l e n g t h ( x ) )

Compare some cases with 4 possible outcomes

normedEntropy ( c (1 / 4 ,1 / 4 ,1 / 4 ,1 / 4) )

[ 1 ] 1

normedEntropy ( c (1 / 2 , 1/ 2 , 0 , 0) )

[ 1 ] 0 . 5

normedEntropy ( c (1 , 0 , 0 , 0) )

[ 1 ] 0

Writing Functions I 45 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

How about 7 types of outcomes:

normedEntropy ( rep (1 / 7 , 7) )

[ 1 ] 1

normedEntropy ( ( 1 : 7 ) / ( sum ( 1 : 7 ) ) )

[ 1 ] 0 .9297027

normedEntropy ( c (2 / 7 , 2/ 7 , 3/ 7 , 0 , 0 , 0 , 0) )

[ 1 ] 0 .5544923

normedEntropy ( c (5 / 7 , 2/ 7 , 0 , 0 , 0 , 0 , 0) )

[ 1 ] 0 .3074497

Writing Functions I 46 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Compare 3 test cases

1 2 3 4 5 6 7 8 9 10

0.0

0.1

0.2

0.3

0.4

0.5

Normed Entropy= 0.82

0.1 0.1 0.1 0.1

0.2 0.2 0.2

0 0 0

1 2 3 4 5

0.0

0.1

0.2

0.3

0.4

0.5

Normed Entropy= 0.66

0.2

0.4 0.4

0 0

1 2 3 4 5 6 7

0.0

0.1

0.2

0.3

0.4

0.5

Normed Entropy= 0.93

0.04

0.07

0.11

0.14

0.18

0.21

0.25

Subjectively, I wrestle with the question of whether comparisonacross variables with different M is meaningful.

Writing Functions I 47 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Entropy is reported in summarize() andsummarizeFactors() in the rockchalk package

Manufacture a variable to re-produce testcase3

round ( t e s t c a s e 3 , 2 )

[ 1 ] 0 . 04 0 .07 0 .11 0 .14 0 .18 0 .21 0 .25

l i b r a r y ( r o c k ch a l k )t e s t c a s e 3 v <− f a c t o r ( c ( 1 , 2 , 2 , 3 , 3 , 3 , 4 , 4 , 4 , 4 , 5 , 5 , 5 , 5 , 5 ,

6 , 6 , 6 , 6 , 6 , 6 , 7 , 7 , 7 , 7 , 7 , 7 , 7 ) )round ( ( t a b l e ( t e s t c a s e 3 v ) / l e n g t h ( t e s t c a s e 3 v ) ) , 2)

t e s t c a s e 3 v1 2 3 4 5 6 7

0 .04 0 .07 0 .11 0 .14 0 .18 0 .21 0 .25

dat <− da t a . f r ame ( t e s t c a s e 3 v )summar i zeFacto r s ( dat )

Writing Functions I 48 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Entropy is reported in summarize() andsummarizeFactors() in the rockchalk package ...

t e s t c a s e 3 v7 : 7 .00006 : 6 .00005 : 5 .00004 : 4 .0000( A l l Others ) : 6 .0000NA ' s : 0 .0000en t ropy : 2 .6100normedEntropy : 0 .9297N :28 .0000

Writing Functions I 49 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Outline

1 Introduction

2 Write Functions!

3 Example: Calculate Entropy

4 Arguments and Returns

5 R Style

6 Writing Better R Code

7 Object Oriented Programming

Writing Functions I 50 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Function Anatomy

Arguments: The Arguments of a function

someWork <− f u n c t i o n ( what1 , what2 , what3 )

what1, what2, and what3 become “local variables” inside thefunction.

named arguments must begin with letters, may includenumbers, or “ ” or “.”, but no “” or “,” or “*” or ”-”

Writing Functions I 51 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Arguments: named and unnamed R variables

someWork <− f u n c t i o n ( what1 , what2 , what3 , . . . )

In R, everything is an “object”. what1, what2, what3 can beANYTHING!

That is a blessing and a curse

Blessing: Flexibility! Let an argument be a vector, matrix, list,whatever. The function declaration does not care.Curse: Difficulty managing the infinite array of possible waysusers might break your functions.

It is your choice, whether your function should receive

2 integers (x, y), orone vector of 2 integers (x)For examples of this, look at the arguments of plot.default,arrows, segments, lines, and matplot.

Writing Functions I 52 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Peculiar spellings like . . .

Functions may have an argument “. . .”

It seems funny, but “...” is actually a word made up of threelegal characters

If the caller includes any named arguments that are notwhat1, what2, or what3, then the R runtime will put them ina list and give them to “...”

The “...” argument requires special effort.

Writing Functions I 53 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Function Arguments: R is VERY Lenient

R is lenient. Perhaps too lenient!

Arguments are not type-defined. In

myF <− f u n c t i o n ( x1 , x2 ) {

x1 could be an integer vector, a character string, a regressionmodel, or anythingFunction writer may declare default values, but need not.These are acceptable definitions

someWork <− f u n c t i o n ( what1 , what2 , what3 , what4 ,what5 )

someWork <− f u n c t i o n ( what1 = 0 , what2 = NULL , what3= c (1 , 2 , 3 ) , what4 = 3 * what1 , what5 )

someWork <− f u n c t i o n ( what1 = 0 , what2 , what3 , what4= 3 * what1 , what5 )

Writing Functions I 54 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

R is Lenient toward Users as Well

R is lenient on format of function calls

someWork (1 )someWork ( what=1, someObject )someWork ( what5 = f r ed , what4 = jim , what3 = j o e )

Not all parameters must be providedPartial argument matching

Writing Functions I 55 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

R is Lenient About Undefined Variables

This is a convenience and a curse

Variables inside functions might not be resolved the way youexpect.

If a variable is used, but not defined inside the function, R will“look outward” for it

This is “lexical scope” in action. The runtime engine searchesthrough a sequence of “frames”, ending up at the user’sworkspace (which is the Global Environment).

Writing Functions I 56 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Example of the Undefined Variable Problem

Suppose “dat” exists in your workspace.

Here is a function that will find “dat”, even if that’s not whatyou intend.

myFn <− f u n c t i o n ( x , y , z ) {dat1 <− 2 * x + 3 * yr e s <− s q r t ( dat )

}

Note my typographical error (dat where dat1 should be). Thefunction should crash, but it will not.

R will find the wrong “dat” and the result we get will beunhelpful

In my experience, this is the single most important cause ofhard-to-find flaws in user code.

Writing Functions I 57 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Inside the function, Check Arguments

Check and Re-format

“argument checking”: diagnose what arguments the userprovidedFigure out if they are “correct” for what the function requires.

You choose how sympathetic you want to be toward users. Doyou want to accept incomplete input and re-format it? (Inrockchalk/R/genCorrelatedData.R, find may “re-formatter”functions like makeVec(), vech2mat()).

Writing Functions I 58 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Arguments: When to Use Defaults?

I don’t know, but . . .

As an R beginner, I took a very conservative strategy ofputting defaults on everything!

f u n c t i o n ( what1 = NULL , what2 = NULL , what3 = 3 , what4 =``a ' ' )

That way, if a user forgets to provide“what3”, then the systemwill not go looking for it.

If the defaults usually work, this is concise, easy to read.

Most functions in R base code don’t set defaults for all, oreven most, variables.

Instead, we manage arguments with more delicacy. Insistingon NULL defaults is “throwing the baby out with the bathwater.”

Writing Functions I 59 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Functions that Help while Checking Arguments

missing()

Inside a function, missing() is a way to ask if the usersupplied a value for an argument.

doSomething <− f u n c t i o n ( what1 , what2 , what3 , what4 ) {i f ( m i s s i n g ( what1 ) ) s top (`` you f o r g o t to s p e c i f y

what1 ' ' )}

I’ve found this conservative approach to be an error-preventer.If x is not provided, or if the user gave us a NULL, we betteradapt!

i f ( m i s s i n g ( x ) | | i s . n u l l ( x ) ) ## do something

Writing Functions I 60 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Functions that Help while Checking Arguments ...

Type Checks

There are many “is.” functions. Ask if x is a certain type of thing:

is.vector() TRUE or FALSE: are you a vector?

is.list() TRUE or FALSE: are you a list?

is.numeric() TRUE or FALSE: are you numeric?

You get the general idea, I hope

Writing Functions I 61 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Functions that Help while Checking Arguments ...

Look at all of these is. functions!

i s . a r r a y i s . l o a d e d i s . o b j e c ti s . a t om i c i s . l o g i c a l i s . o r d e r e di s . c a l l i s . m a t r i x i s . p a c k a g e v e r s i o ni s . c h a r a c t e r i s .m t s i s . p a i r l i s ti s . c omp l e x i s . n a i s . p r i m i t i v ei s . d a t a . f r am e i s . n a<− i s . q ri s . d o u b l e i s . n a . d a t a . f r am e i s . Ri s . e l em e n t i s . n a<−. d e f a u l t i s . r a s t e ri s . emp t y .mode l i s . n a<−. f a c t o r i s . r a wi s . e n v i r o nmen t i s . name i s . r e c u r s i v ei s . e x p r e s s i o n i s . n a n i s . r e l i s t a b l ei s . f a c t o r i s . n a . n um e r i c v e r s i o n i s . s i n g l ei s . f i n i t e i s . n a . POS IX l t i s . s t e p f u ni s . f u n c t i o n i s . n u l l i s . s ymb o li s . i n f i n i t e i s . n um e r i c i s . t a b l ei s . i n t e g e r i s . n ume r i c .D a t e i s . t si s . l a n g u a g e i s . n um e r i c . d i f f t i m e i s . t s k e r n e li s . l e a f i s . nume r i c .POS IX t i s . u n s o r t e di s . l i s t i s . n um e r i c v e r s i o n i s . v e c t o r

Writing Functions I 62 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Functions that Help while Checking Arguments ...

Size Checks

length()

nrow(), ncol() number of rows (columns) from a matrix

NROW(), NCOL() Works if input is matrix, but will treat a vectoras a one column matrix.

dim() Returns 2 dimensional vector, works with matricesand arrays

Stop, Warn

stop(), stopifnot() Ways to abend the function

warning Will return, but throws a message onto the R warninglog. User can run warnings() to see messages.

Writing Functions I 63 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

The Return Has To Be Singular

When you use a function, it is necessary to “catch” the outputwith a single object name, as in

newth ing <− doubleMe (32)newth ing

[ 1 ] 64

i s . n um e r i c ( newth ing )

[ 1 ] TRUE

i s . v e c t o r ( newth ing )

[ 1 ] TRUE

We expect “doubleMe(32)” should return 64, and it does.

Writing Functions I 64 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

The Return Has To Be Singular ...

The“vectorization for free”gives us a hint of what is to follow.

newth ing <− doubleMe ( c (1 , 2 , 3 , 4 , 5 ) )newth ing

[ 1 ] 2 4 6 8 10

i s . n um e r i c ( newth ing )

[ 1 ] TRUE

i s . v e c t o r ( newth ing )

[ 1 ] TRUE

R allows us to return one “thing”, but “thing” can be a rich,informative thing!

Writing Functions I 65 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Generalization. Return a list

A list may include numbers, characters, vectors ,etc

or data frames or other lists

Read code for function “lm”

Almost ALL interesting R functions return a list of elements.

Writing Functions I 66 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Check Point: revise your function

Create “myGreatFunction2” by revising “myGreatFunction”.Make it return a vector of 3 values: the maximum, theminimum, and the median.

Generate your own input data, x1, like so

x1 <− rnorm (10000 , m=7, sd=19)

After you’ve written myGreatFunction, use it:

myGreatFunct ion ( x1 )

Now stress test your function by changing x1

x1 [ c (13 , 44 , 99 , 343 , 555) ] <− NAmyGreatFunct ion ( x1 )

Writing Functions I 67 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Check Point: Run this example function

Admittedly, this is a dumb example, but . . .

This function returns a regression object

aRegMod <− f u n c t i o n ( x in1 , x in2 , yout ) { lm ( yout ∼ x i n1+ x i n2 ) }

Generate some data, run aRegMod()

dat <− r o c k c h a l k : : g enCo r r e l a t edData (N = 100 , rho = 0 .3, beta = c (1 , 1 .0 , −1.1 , 0 . 0 ) )

m1 <− with ( dat , aRegMod( x1 , x2 , y ) )

m1 is a “single object”

Run “class(m1)”, “attributes(m1)”, “summary(m1)”, ‘str(m1)”,

Writing Functions I 68 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Outline

1 Introduction

2 Write Functions!

3 Example: Calculate Entropy

4 Arguments and Returns

5 R Style

6 Writing Better R Code

7 Object Oriented Programming

Writing Functions I 69 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

The Unofficial Official R Style Guide

This is discussed in rockchalk vignette Rstyle

There is not much “official style guidance” from the R CoreTeam

Don’t mistake that as permission to write however you want.

There ARE very widely accepted standards for the way thatcode should look

Writing Functions I 70 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Inductive Style Guide

To see that R really does have an implicitly stated Style,inspect the R source code.

The code follows a uniform pattern of indentation, the use ofwhite space, and so forth.

Recall that print.function() will be called if you type thename of a function without parentheses. That is a tidied upview of R’s internal structural represenation of a function.

. This time, lets look at “lm” inside R:

lm

Writing Functions I 71 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Inductive Style Guide ...

f u n c t i o n ( formula , data , subse t , we ights , n a . a c t i o n , method= ”qr ” ,model = TRUE, x = FALSE , y = FALSE , qr = TRUE,

s i n g u l a r . o k = TRUE,c o n t r a s t s = NULL , o f f s e t , . . . )

{r e t . x <− xr e t . y <− yc l <− ma t c h . c a l l ( )mf <− ma t c h . c a l l ( e xpand .do t s = FALSE)m <− match ( c ( ”fo rmu la ” , ”data ” , ”s ub s e t ” , ”we i gh t s ” , ”

n a . a c t i o n ” ,” o f f s e t ”) , names (mf ) , 0L )

mf <− mf [ c (1L , m) ]mf$ d r o p . u n u s e d . l e v e l s <− TRUEmf [ [ 1 L ] ] <− as.name ( ”mode l . f r ame ”)mf <− e v a l (mf , p a r e n t . f r ame ( ) )i f ( method == ”mode l . f r ame ”)

r e t u r n (mf )e l s e i f ( method != ”qr ”)

Writing Functions I 72 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Inductive Style Guide ...

warn ing ( g e t t e x t f ( ”method = '%s ' i s not s u ppo r t e d .Us ing ' qr ' ” ,method ) , domain = NA)

mt <− a t t r (mf , ”terms ”)y <− mode l . r e s pon s e (mf , ”numer ic ”)w <− a s . v e c t o r ( mode l .we i gh t s (mf ) )i f ( ! i s . n u l l (w) && ! i s . n um e r i c (w) )

s top ( ” ' weights ' must be a numer ic v e c t o r ”)o f f s e t <− a s . v e c t o r ( mo d e l . o f f s e t (mf ) )i f ( ! i s . n u l l ( o f f s e t ) ) {

i f ( l e n g t h ( o f f s e t ) != NROW( y ) )s top ( g e t t e x t f ( ”number o f o f f s e t s i s %d , shou ld

equa l %d ( number o f o b s e r v a t i o n s ) ” ,l e n g t h ( o f f s e t ) , NROW( y ) ) , domain = NA)

}i f ( i s . emp t y .mode l (mt) ) {

x <− NULLz <− l i s t ( c o e f f i c i e n t s = i f ( i s . m a t r i x ( y ) ) mat r i x ( ,

0 ,3) e l s e numer ic ( ) , r e s i d u a l s = y , f i t t e d . v a l u e s

= 0 *

Writing Functions I 73 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Inductive Style Guide ...y , we i gh t s = w, rank = 0L , d f . r e s i d u a l = i f ( !

i s . n u l l (w) ) sum(w !=0) e l s e i f ( i s . m a t r i x ( y ) ) nrow ( y ) e l s e l e n g t h ( y )

)i f ( ! i s . n u l l ( o f f s e t ) ) {

z$ f i t t e d . v a l u e s <− o f f s e tz$ r e s i d u a l s <− y − o f f s e t

}}e l s e {

x <− mode l .ma t r i x (mt , mf , c o n t r a s t s )z <− i f ( i s . n u l l (w) )

l m . f i t ( x , y , o f f s e t = o f f s e t , s i n g u l a r . o k =s i n g u l a r . o k ,. . . )

e l s e lm .w f i t ( x , y , w, o f f s e t = o f f s e t , s i n g u l a r . o k =s i n g u l a r . o k ,

. . . )}c l a s s ( z ) <− c ( i f ( i s . m a t r i x ( y ) ) ”mlm” , ”lm ”)z$ n a . a c t i o n <− a t t r (mf , ”n a . a c t i o n ”)z$ o f f s e t <− o f f s e t

Writing Functions I 74 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Inductive Style Guide ...

z$ c o n t r a s t s <− a t t r ( x , ”c o n t r a s t s ”)z$ x l e v e l s <− . g e t X l e v e l s (mt , mf )z$ c a l l <− c lz$ terms <− mti f ( model )

z$model <− mfi f ( r e t . x )

z$x <− xi f ( r e t . y )

z$y <− yi f ( ! qr )

z$ qr <− NULLz

}<bytecode : 0 x2a512f8><environment : namespace : s t a t s>

Writing Functions I 75 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Why Neatness Counts

You can write messy code, but you can’t make anybody readit.

Finding bugs in a giant undifferentiated mass of commands isdifficult

Chance of error rises as clarity decreases

If you want people to help you, or use your code, you shouldwrite neatly!

Writing Functions I 76 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Style Fundamentals

WHITE SPACE:

indentation. R Core Team recommends 4 spacesone space around operators like <- = *

“< −” should be used for assignments. “=” was used bymistake so often by novices that the R interpreter wasre-written to allow =. However, it may still fail in some cases.

Use helpful variable names

Separate calculations into functions. Sage advice from one ofmy programming mentors:

Don’t allow a calculation to grow longer than thespace on one screen. Break it down into smaller,well defined pieces.

Writing Functions I 77 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Be Careful about line endings

Unlike C (or other languages), R does not require an “end ofline character” like “;”.

That’s convenient, but sometimes code can “fool” R intobelieving that a command is finished.

From the help page for “if”

Note that it is a common mistake to forget to putbraces (‘{ .. }’) around your statements, e.g., after‘if(..)’ or ‘for(....)’. In particular, you should nothave a newline between ‘}’ and ‘else’ to avoid asyntax error in entering a ‘if ... else’ construct at thekeyboard or via ‘source’. For that reason, one(somewhat extreme) attitude of defensiveprogramming is to always use braces, e.g., for ‘if’clauses.

Writing Functions I 78 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Be Careful about line endings ...

The else “ Policy. I strongly recommend this format:

i f ( a− l o g i c a l− c ond i t i o n ) {## if TRUE , do this

} e l s e {## if FALSE , the other thing

}

to close the previous if and begin else on same line.

“if-else” is very troublesome. If R thinks the “if” is finished, itmay not notice the else.

i f ( x > 7) y <− 3e l s e y <− 2

Causes “Error: unexpected ’else’ in “else”

When running code line-by-line, the “naked else” always causesan error.

Writing Functions I 79 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Google Doc on R Coding is Just “somebody’s” OpinionThe Easily Googled Google R Standards

Writing Functions I 80 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

How To Name Functions

Don’t use names for functions that are already in widespreaduse, like lm, seq, rep, etc.

I like Objective C style variable and function names thatsmash words together, as in myRegression myCode

R uses periods in function names to represent “objectorientation” or “subclassing”, thus I avoid periods for simplepunctuation.Ex: doSomething() is better than do.something

Underscores are now allowed in function names.Ex: do something() would be OK

Writing Functions I 81 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

How To Name Variables

Use clear names always

Use short names where possible

Never use names of common functions for variables

Never name a variable T or F (doing so tramples R symbols)

“Name by suffix” strategy I’m using now:

m1 <− lm ( y ∼ x , data=dat )m1sum <− summary (m1)m1v i f <− v i f (m1)m1inf <− i n f l u e n c e .m e a s u r e s (m1)

Writing Functions I 82 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Expect Some Variations in My Code

I don’t mind adding squiggly braces, even if not requiredif (whatever == TRUE) {x <- y}

Sometimes I will use 3 lines where one would do

if (whatever == TRUE){

x <- y

}

When in doubt, I like to explicitly name arguments, but notalways.

I sometimes forget to write TRUE and FALSE when T and Fwill suffice

Here’s why that’s a big deal. Suppose a some user mistakenlyredefines T or F:

T <− 7F <− myTe r r i f i c F un c t i o n

then my functions that use “T” and “F” will die.

Writing Functions I 83 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Outline

1 Introduction

2 Write Functions!

3 Example: Calculate Entropy

4 Arguments and Returns

5 R Style

6 Writing Better R Code

7 Object Oriented Programming

Writing Functions I 84 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

What is this Section?

Tips from the school of hard knocks

Criticisms of R code I’ve found or written

Writing Functions I 85 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Functions replace “cut and paste” editing

If you find yourself using “copy and paste” to repeat stanzas withslight variations, you are almost certainly doing the wrongthing.

Re-conceptualize, write a function that does the right thing

Use the function over and over

Why is this important: AVOIDING mistakes due to editing mistakes

Writing Functions I 86 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

We Learn By Criticizing

I keep a folder of R code that troubles me.This example is a more-or-less literal translation from SAS into Rthat does not use R’s special features.

# ---------------SPECIFICATIONS------------------------ #

i t e r =1000 #how many iterations per

condition

s e t . s e e d (7913025) # set random seed

# --------------END SPECIFICATIONS--------------------- #

#n per cluster sample size

f o r ( p e r c l u s t i n c (100) ) {#number of clusters - later for MLM application

f o r ( n c l u s t i n c (1 ) ) {#common correlation

f o r ( s e t c o r r i n c ( 1 : 8 ) ) {i f ( s e t c o r r ==1) {

c o r r . 1 0 <−1c o r r . 2 0 <−0c o r r . 3 0 <−0c o r r . 4 0 <−0

Writing Functions I 87 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

We Learn By Criticizing ...c o r r . 5 0 <−0c o r r . 6 0 <−0c o r r . 7 0 <−0c o r r . 8 0 <−0}

i f ( s e t c o r r ==2) {c o r r . 1 0 <−1c o r r . 2 0 <−1c o r r . 3 0 <−0c o r r . 4 0 <−0c o r r . 5 0 <−0c o r r . 6 0 <−0c o r r . 7 0 <−0c o r r . 8 0 <−0}

i f ( s e t c o r r ==3) {c o r r . 1 0 <−1c o r r . 2 0 <−1c o r r . 3 0 <−1c o r r . 4 0 <−0c o r r . 5 0 <−0c o r r . 6 0 <−0

Writing Functions I 88 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

We Learn By Criticizing ...c o r r . 7 0 <−0c o r r . 8 0 <−0}

i f ( s e t c o r r ==4) {c o r r . 1 0 <−1c o r r . 2 0 <−1c o r r . 3 0 <−1c o r r . 4 0 <−1c o r r . 5 0 <−0c o r r . 6 0 <−0c o r r . 7 0 <−0c o r r . 8 0 <−0}

i f ( s e t c o r r ==5) {c o r r . 1 0 <−1c o r r . 2 0 <−1c o r r . 3 0 <−1c o r r . 4 0 <−1c o r r . 5 0 <−1c o r r . 6 0 <−0c o r r . 7 0 <−0c o r r . 8 0 <−0

Writing Functions I 89 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

We Learn By Criticizing ...}

i f ( s e t c o r r ==6) {c o r r . 1 0 <−1c o r r . 2 0 <−1c o r r . 3 0 <−1c o r r . 4 0 <−1c o r r . 5 0 <−1c o r r . 6 0 <−1c o r r . 7 0 <−0c o r r . 8 0 <−0}

i f ( s e t c o r r ==7) {c o r r . 1 0 <−1c o r r . 2 0 <−1c o r r . 3 0 <−1c o r r . 4 0 <−1c o r r . 5 0 <−1c o r r . 6 0 <−1c o r r . 7 0 <−1c o r r . 8 0 <−0}

i f ( s e t c o r r ==8) {

Writing Functions I 90 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

We Learn By Criticizing ...

c o r r . 1 0 <−1c o r r . 2 0 <−1c o r r . 3 0 <−1c o r r . 4 0 <−1c o r r . 5 0 <−1c o r r . 6 0 <−1c o r r . 7 0 <−1c o r r . 8 0 <−1

}

That project defined several variables in that way, consuming 100sof lines.

Writing Functions I 91 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Quick Exercise

How would you convert that into a vector in R, without writing300 lines.Hint. You want to end up with a vector setcorr with 0 elements,all either 0 or 1.Requirement: You need some easy, flexible way to adjust thenumber of 0’s and 1’s

Writing Functions I 92 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Another “noisy code” example

Note in the following

1 the author does not declare functionsRather, treats comments ahead of blocks of code as if theywere function declarations.

2 repeated use of cat() with same file argument is an exampleof cut-and-paste coding.

# #########################################

# WRITE GENERATION CODE #

# #########################################

pathGEN <− pa s t e ( d i r r o o t , p e r c l u s t , n c l u s t , s e t c o r r , sep=”\\ ”)gen <− pa s t e ( pathGEN , ”g e n e r a t e c o r r d a t a . i n p ” , sep=”\\ ”)t e s t d a t a 1 <− pa s t e ( pathGEN , ”da t a1 . d a t ” , sep=”\\ ”)

ca t ( 'MONTECARLO: \n ' , f i l e=gen )ca t ( 'NAMES ARE x y q1−q8 ; \n ' , f i l e=gen , append=T)ca t ( 'NOBSERVATIONS = 1000 ; \n ' , f i l e=gen , append=T)

Writing Functions I 93 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Another “noisy code” example ...ca t ( ' NREPS = ' , i t e r , ' ; \n ' , f i l e=gen , append=T)ca t ( ' SEED = ' , round ( r u n i f ( 1 ) * 10000000) , ' ; \n ' , f i l e=gen ,

append=T)ca t ( ' REPSAVE=ALL ; \n ' , f i l e=gen , append=T)ca t ( ' SAVE=\n ' , pathGEN , ' \\ data * . d a t ; \n ' , f i l e=gen , append=

T, sep=””)ca t ( 'MODEL POPULATION: \n ' , f i l e=gen , append=T)ca t ( ' [ q1−q8*0 x*0 y* 0 ] ; \n ' , f i l e=gen , append=T)ca t ( 'q1−q8* 1 ; x* 1 ; y* 1 ; \n ' , f i l e=gen , append=T)ca t ( 'q1−q8 wi th q1−q8* . 50 ; \n ' , f i l e=gen , append=T)ca t ( ' x wi th y* . 50 ; \n ' , f i l e=gen , append=T)ca t ( ' x wi th q1−q8* . 50 ; \n ' , f i l e=gen , append=T)ca t ( ' y wi th q1−q8* ' , . 0+( c o r r . 1 0 * ( . 10+c o r r . 2 0 * . 10+c o r r . 3 0 *

. 10+c o r r . 4 0 *+co r r . 4 0 * . 10+c o r r . 5 0 * . 10+c o r r . 6 0 * . 10+c o r r . 7 0* . 10+c o r r . 8 0 * . 10 ) ) , ' ; \n ' , f i l e=gen , append=T, sep=””)

i f ( f i l e . e x i s t s ( t e s t d a t a 1 ) ) {} e l s e {s h e l l ( pa s t e ( ”mp lu s . e x e ” , gen , pa s t e ( pathGEN , ”s a v e 1 . o u t ” , sep

=”\\ ”) , sep=” ”) )}

Writing Functions I 94 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Another “noisy code” example ...#

---------------------------------------------------------------------

#

# #########################################

# GENERATE SAS CODE #

# #########################################

pathSAS <− pa s t e ( d i r r o o t , p e r c l u s t , n c l u s t , s e t c o r r , s e t p a t t e r n ,p e r c en tm i s s , aux , auxnumber , sep=”\\ ”)

pathSASdata <− pa s t e ( d i r r o o t , p e r c l u s t , n c l u s t , s e t c o r r ,s e t p a t t e r n , p e r c en tm i s s , sep=”\\ ”)

smcarmiss <− pa s t e ( pathSAS , ”mod i f y d a t a . s a s ” , sep=”\\ ”)t e s t d a t a 2 <− pa s t e ( pathSASdata , ”da t a1 . d a t ” , sep=”\\ ”)

# ------------------------------------------------------- #

# ------------- import SIM data into SAS ---------------- #

# ------------------------------------------------------- #

i f ( f i l e . e x i s t s ( t e s t d a t a 2 ) ) {} e l s e {

ca t ( ' proc p r i n t t o \n ' , f i l e=smcarmiss , append=T)

Writing Functions I 95 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Another “noisy code” example ...ca t ( ' l o g=”R:\\ u s e r s \\ username \\ data \\simLOG2\\LOGLOG.log ” \n

' , f i l e=smcarmiss , append=T)ca t ( ' p r i n t =”R:\\ u s e r s \\ username \\ data \\simLOG2\\ LSTLST. lst ”

\n ' , f i l e=smcarmiss , append=T)ca t ( 'new ; \n ' , f i l e=smcarmiss , append=T)ca t ( ' run ; \n ' , f i l e=smcarmiss , append=T)ca t ( ' \n ' , f i l e=smcarmiss , append=T)

ca t ( '%macro importMPLUS ; \n ' , f i l e=smcarmiss , append=T)ca t ( '%do i=1 %to ' , f i l e=smcarmiss , append=T)ca t ( pa s t e ( i t e r ) , f i l e=smcarmiss , append=T)ca t ( ' ; \n ' , f i l e=smcarmiss , append=T)ca t ( ' data work .da ta&i ; \n ' , f i l e=smcarmiss , append=T)ca t ( ' i n f i l e ' , f i l e=smcarmiss , append=T)ca t ( ' ” ' , f i l e=smcarmiss , append=T)ca t ( pa s t e ( pathGEN , ”data& i . . d a t ” , sep=”\\ ”) , f i l e=smcarmiss ,

append=T)ca t ( ' ” ; \n ' , f i l e=smcarmiss , append=T)ca t ( ' INPUT x y q1 q2 q3 q4 q5 q6 q7 q8 ; /*<−−− insert

v a r i a b l e s he r e */ \n ' , f i l e=smcarmiss , append=T)ca t ( 'RUN; \n ' , f i l e=smcarmiss , append=T)ca t ( '%end ; \n ' , f i l e=smcarmiss , append=T)

Writing Functions I 96 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Another “noisy code” example ...ca t ( '%mend ; \n ' , f i l e=smcarmiss , append=T)ca t ( '%importMPLUS \n ' , f i l e=smcarmiss , append=T)ca t ( ' \n ' , f i l e=smcarmiss , append=T)

# #########################################

# Draw random sample of size N #

# #########################################

ca t ( '%macro s amp l e s i z e ; \n ' , f i l e=smcarmiss , append=T)ca t ( '%do i=1 %to ' , f i l e=smcarmiss , append=T)ca t ( pa s t e ( i t e r ) , f i l e=smcarmiss , append=T)ca t ( ' ; \n ' , f i l e=smcarmiss , append=T)ca t ( ' Proc s u r v e y s e l e c t data=work .da ta&i out=work . samp leda ta&

i method=SRS \n ' , f i l e=smcarmiss , append=T)i f ( p e r c l u s t ==50) {ca t ( ' samps i ze=50 \n ' , f i l e=smcarmiss , append=T)}e l s e i f ( p e r c l u s t ==75) {ca t ( ' samps i ze=75 \n ' , f i l e=smcarmiss , append=T)}e l s e i f ( p e r c l u s t ==100) {ca t ( ' samps i ze=100 \n ' , f i l e=smcarmiss , append=T)}

Writing Functions I 97 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Another “noisy code” example ...

e l s e i f ( p e r c l u s t ==200) {ca t ( ' samps i ze=200 \n ' , f i l e=smcarmiss , append=T)}e l s e i f ( p e r c l u s t ==400) {ca t ( ' samps i ze=400 \n ' , f i l e=smcarmiss , append=T)}e l s e i f ( p e r c l u s t ==800) {ca t ( ' samps i ze=800 \n ' , f i l e=smcarmiss , append=T)}e l s e i f ( p e r c l u s t ==1000) {ca t ( ' samps i ze=1000 \n ' , f i l e=smcarmiss , append=T)}ca t ( 'SEED = ' , round ( r u n i f ( 1 ) * 10000000) , ' ; \n ' , f i l e=

smcarmiss , append=T)ca t ( 'RUN; \n ' , f i l e=smcarmiss , append=T)ca t ( '%end ; \n ' , f i l e=smcarmiss , append=T)ca t ( '%mend ; \n ' , f i l e=smcarmiss , append=T)ca t ( '%samp l e s i z e \n ' , f i l e=smcarmiss , append=T)ca t ( ' \n ' , f i l e=smcarmiss , append=T)

Writing Functions I 98 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

This could be much better

Weird indentation

Use ”/”, not ”backslash”, even on Windows

Use vectors

Prolific copying and pasting of ”cat” lines.

Avoid system except where truly necessary. R has OS neutralfunctions to create directorys and such.

Writing Functions I 99 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

I found the prototype for that code in a previous project

gen <− pa s t e ( path , ”g e n e r a t e . i n p ” , sep=”\\ ”)

ca t ( 'MONTECARLO: \n ' , f i l e=gen )ca t ( 'NAMES ARE y1−y6 ; \n ' , f i l e=gen , append=T)ca t ( ' NOBSERVATIONS = ' , p e r c l u s t * nc l u s t , ' ; \n ' , f i l e=gen ,

append=T)i f ( p e r c l u s t != 7 . 5 ) {ca t ( ' NCSIZES = 1 ; \n ' , f i l e=gen , append=T)ca t ( ' CSIZES = ' , n c l u s t , ' ( ' , p e r c l u s t , ' ) ; \n ' , f i l e=gen ,

append=T)}i f ( p e r c l u s t ==7 . 5 ) {ca t ( ' NCSIZES = 2 ; \n ' , f i l e=gen , append=T)ca t ( ' CSIZES = ' , n c l u s t / 2 , ' (7 ) ' , n c l u s t / 2 , ' (8 ) ; \n ' ,

f i l e=gen , append=T)}

# user-specified iterations

ca t ( ' NREPS = ' , i t e r , ' ; \n ' , f i l e=gen , append=T)ca t ( ' SEED = 791305; \n ' , f i l e=gen , append=T)

Writing Functions I 100 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

I found the prototype for that code in a previous project ...ca t ( ' REPSAVE=ALL ; \n ' , f i l e=gen , append=T)ca t ( ' SAVE=\n ' , path , ' \\ data * . d a t ; \n ' , f i l e=gen , append=T,

sep=””)ca t ( ' ANALYSIS : TYPE = TWOLEVEL; \n ' , f i l e=gen , append=T)

ca t ( 'MODEL POPULATION: \n ' , f i l e=gen , append=T)

ca t ( '%WITHIN% \n ' , f i l e=gen , append=T)# baseline loadings are all .3 , add .4 if 'within ' = 1, but

change based on mod/strongca t ( 'FW BY y1−y2* ' , . 3+(w i t h i n * ( . 4+s t r ong * . 1 ) ) , ' \n ' , f i l e=

gen , append=T, sep=””)ca t ( ' y3−y4* ' , . 3+(w i t h i n * ( .4−mod* . 3 ) ) , ' \n ' , f i l e=gen ,

append=T, sep=””)ca t ( ' y5−y6* ' , . 3+(w i t h i n * ( .4− s t rong * . 1 ) ) , ' ; \n ' , f i l e=gen ,

append=T, sep=””)ca t ( 'FW@1; \n ' , f i l e=gen , append=T)# residuals are just 1-loading∧2

ca t ( ' y1−y2* ' , 1−( . 3+(w i t h i n * ( . 4+s t r ong * . 1 ) ) )∧2 , ' ; \n ' , f i l e=gen , append=T, sep=””)

ca t ( ' y3−y4* ' , 1−( . 3+(w i t h i n * ( .4−mod* . 3 ) ) )∧2 , ' ; \n ' , f i l e=gen , append=T, sep=””)

Writing Functions I 101 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

I found the prototype for that code in a previous project ...

ca t ( ' y5−y6* ' , 1−( . 3+(w i t h i n * ( .4− s t rong * . 1 ) ) )∧2 , ' ; \n ' , f i l e=gen , append=T, sep=””)

ca t ( ' \n ' , f i l e=gen , append=T, sep=””)ca t ( '%BETWEEN% \n ' , f i l e=gen , append=T)

# the ICC bit multiplies by .053 when ICC is low , by 1 when

ICC is high

ca t ( 'FB BY y1−y2* ' , s q r t ( ( . 3+(between* ( . 4+s t r ong * . 1 ) ) )∧2*(1+( i c c * ( .053−1 ) ) ) ) , ' \n ' , f i l e=gen , append=T, sep=””)

ca t ( ' y3−y4* ' , s q r t ( ( . 3+(between* ( .4−mod* . 3 ) ) )∧2 * (1+( i c c * (.053−1 ) ) ) ) , ' \n ' , f i l e=gen , append=T, sep=””)

ca t ( ' y5−y6* ' , s q r t ( ( . 3+(between* ( .4− s t rong * . 1 ) ) )∧2 * (1+( i c c *

( .053−1 ) ) ) ) , ' ; \n ' , f i l e=gen , append=T, sep=””)ca t ( 'FB@1 ; \n ' , f i l e=gen , append=T)

#residuals are total variance (1 or .053 , depending on ICC)

loading∧2

ca t ( ' y1−y2* ' , 1+( i c c * ( .053−1 ) ) −sqrt ( ( . 3+(between* ( . 4+s t r ong *

. 1 ) ) )∧2* (1+( i c c * ( .053−1 ) ) ) )∧2 , ' ; \n ' , f i l e=gen , append=T, sep=””)

Writing Functions I 102 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

I found the prototype for that code in a previous project ...

ca t ( ' y3−y4* ' , 1+( i c c * ( .053−1 ) ) −sqrt ( ( . 3+(between* ( .4−mod* . 3 )) )∧2 * (1+( i c c * ( .053−1 ) ) ) )∧2 , ' ; \n ' , f i l e=gen , append=T,sep=””)

ca t ( ' y5−y6* ' , 1+( i c c * ( .053−1 ) ) −sqrt ( ( . 3+(between* ( .4− s t rong *

. 1 ) ) )∧2 * (1+( i c c * ( .053−1 ) ) ) )∧2 , ' ; \n ' , f i l e=gen , append=T, sep=””)

ca t ( ' \n ' , f i l e=gen , append=T, sep=””)

#run the above syntax using Mplus

#shell (paste ("cd", mplus , sep =" "))

s h e l l ( pa s t e ( ”mp lu s . e x e ” , gen , pa s t e ( path , ”s a v e . o u t ” , sep=”\\”) , sep=” ”) )

Writing Functions I 103 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Here was my Suggestion

Create separate functions to do separate parts of the work.Avoid so much “cut and paste” coding. The cat function caninclude MANY separate quoted strings or values, there is no reasonto write separate cat statements for each line.Here was my suggestion for part of the revision

##Create one MPlus Input file corresponding to following

parameters.

c r e a t e I n p F i l e <− f u n c t i o n ( path=”apath ” , gen=”a f i l e n am e . i n p ”, p e r c l u s t =2, n c l u s t =100 , i t e r =1000 , mod=1, s t r ong =1,between=1, w i t h i n =1){ca t ( ”MONTECARLO:

NAMES ARE y1−y6 ;NOBSERVATIONS = ” , p e r c l u s t * nc l u s t , ” ; \n ” ,

i f e l s e ( p e r c l u s t != 7 . 5 ,pa s t e ( ”NCSIZES = 1 ; \n CSIZES =” , n c l u s t ,

”( ” , p e r c l u s t , ”) ;\ n ”) ,pa s t e ( ” NCSIZES = 2 ; \n CSIZES = ” ,

n c l u s t / 2 , ” (7 ) ” , n c l u s t /2 , ” (8 ) ; \n ”) ) , f i l e=gen , append=T,

Writing Functions I 104 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Here was my Suggestion ...sep=””)

## user-specified iterations

ca t ( ”NREPS = ” , i t e r , ” ;SEED = 791305;REPSAVE=ALL ;SAVE=” , path , ”\\ data * . d a t ;ANALYSIS : TYPE = TWOLEVEL;MODEL POPULATION:%WITHIN% \n ” , f i l e=gen , append=T, sep=””)

## baseline loadings are all .3 , add .4 if "within" =

1, but change based on mod/strong

ca t ( ”FW BY y1−y2* ” , . 3+(w i t h i n * ( . 4+s t r ong * . 1 ) ) ,”y3−y4* ” , . 3+(w i t h i n * ( .4−mod* . 3 ) ) ,”y5−y6* ” , . 3+(w i t h i n * ( .4− s t rong * . 1 ) ) ,”FW@1; \n ” ,”y1−y2* ” , 1−( . 3+(w i t h i n * ( . 4+s t r ong * . 1 ) ) )∧2 , ” ; \n ” ,”y3−y4* ” , 1−( . 3+(w i t h i n * ( .4−mod* . 3 ) ) )∧2 , ” ; \n ” ,”y5−y6* ” , 1−( . 3+(w i t h i n * ( .4− s t rong * . 1 ) ) )∧2 , ” ; \n ” ,” %BETWEEN% ” ,

Writing Functions I 105 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Here was my Suggestion ...

”FB BY y1−y2* ” , s q r t ( ( . 3+(between* ( . 4+s t r ong * . 1 ) ) )∧

2* (1+( i c c * ( .053−1 ) ) ) ) ,”y3−y4* ” , s q r t ( ( . 3+(between* ( .4−mod* . 3 ) ) )∧2 * (1+(

i c c * ( .053−1 ) ) ) ) ,”y5−y6* ” , s q r t ( ( . 3+(between* ( .4− s t rong * . 1 ) ) )∧2 *

(1+( i c c * ( .053−1 ) ) ) ) , ” ; FB@1 ; \n ” ,”y1−y2* ” , 1+( i c c * ( .053−1 ) ) −sqrt ( ( . 3+(between* ( . 4+

s t r ong * . 1 ) ) )∧2* (1+( i c c * ( .053−1 ) ) ) )∧2 , ” ; ” ,”y3−y4* ” , 1+( i c c * ( .053−1 ) ) −sqrt ( ( . 3+(between* (

.4−mod* . 3 ) ) )∧2 * (1+( i c c * ( .053−1 ) ) ) )∧2 , ” ; ” ,”y5−y6* ” , 1+( i c c * ( .053−1 ) ) −sqrt ( ( . 3+(between* (

.4− s t rong * . 1 ) ) )∧2 * (1+( i c c * ( .053−1 ) ) ) )∧2 , ” ; ” ,”\n ” , f i l e=gen , append=T, sep=””)

}

Writing Functions I 106 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Critique that!

Benefit of re-write is isolation of code writing into a separatefunction

We need to work on cleaning up use of “cat” to write files.

Writing Functions I 107 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Outline

1 Introduction

2 Write Functions!

3 Example: Calculate Entropy

4 Arguments and Returns

5 R Style

6 Writing Better R Code

7 Object Oriented Programming

Writing Functions I 108 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Object Oriented Programming

A re-conceptualization of programming to reduce programmererror

OO includes a broad set of ideas, only a few of which aredirectly applicable to R programming

The “rise” to pre-eminance of OO indicated by the

introduction of object frameworks in existing languages (C++,Objective-C)growth of wholly new object-oriented languages (Java)

Writing Functions I 109 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Decipher R OO by Intuition

Run the command

methods ( p r i n t )

What do you see? I see 170 lines, like so:

[ 1 ] p r i n t . a c f *[ 2 ] p r i n t . a n o v a[ 3 ] p r i n t . a o v *

[ 4 ] p r i n t . a o v l i s t *. . .[ 1 6 7 ] p r i n t . w a r n i n g s[ 1 6 8 ] p r i n t . x g e t t e x t *[ 1 6 9 ] p r i n t . x n g e t t e x t *[ 1 7 0 ] p r i n t . x t a b s *

Non−v i s i b l e f u n c t i o n s a r e a s t e r i s k e d

Writing Functions I 110 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

170 print.??? Methods.

Yes: there are really 170 print “methods”

No: the R team does not expect or want you to know all ofthem. Users just run

p r i n t ( x )

Try not to worry about “how” the printing is achieved.

Yes: R team wants package writers to create specialized printmethods to control presentation of their output for the objectthey create.

Writing Functions I 111 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

The R Runtime System Handles the Details

The user runs

p r i n t ( x )

The R runtime system1 notices that x is of a certain type, say “classOfx”2 and then the runtime system uses print.classOfX to handle the

user’s request

Writing Functions I 112 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

print is a “Generic Function”

Definition of “Generic Function”: the function that users call whichcauses an object-specific function to be called for the work to getdone. Examples:“print”, “plot”, “summary”, “anova”

Generic Function is terminology unique to R(AFAIK)

In the standard case, a generic function does not do any work.It sends the work to the appropriate “implementation” in amethod.

“A standard generic function does nocomputation other than dispatching a method, but Rgeneric functions can do other coumputations as wellbefore and/or after method dispatch”(Chambers,Software for Data Analysis, p. 398)

Writing Functions I 113 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

print is a “Generic Function” ...

UseMethod() is the function that declares a function asgeneric: The R runtime system is warned to “be alert” tousage of the function.

Example: the print generic function from R source (basepackage).

p r i n t <− f u n c t i o n ( x , . . . ) UseMethod ( ”p r i n t ”)

Example: the plot generic function from R source (graphicspackage).

p l o t <− f u n c t i o n ( x , y , . . . ) UseMethod ( ”p l o t ”)

Writing Functions I 114 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Here’s Where R Gains its Analytical Power

The generic is just a place holder. User runs print(x), then Rknows it is supposed to ask x for its class and then theappropriate thing is supposed to happen. No Big Deal.

But the statisticians in the S & R projects saw enormoussimplifying potential in developing a battery is standardgeneric accessor functions

summary()anova()predict()plot()aic()

Writing Functions I 115 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Object

Object: self-contained “thing”. A container for data.

Operationally, in R: just about anything on the left hand sidein an assignment “<-”

Each “thing” in R carries with it enough information so thatgeneric functions “know what to do.”.

If there is no function specific to an object, the work is sent toa default method (see print.default).

Writing Functions I 116 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Class

Definition: As far as R with S3 is concerned, class is acharacteristic label assigned to an object (an object can have avector of classes, such as c(“lm”, “glm”)).

The class information is used by R do decide which methodshould be used to fulfill the request.

Run class(x), ask x what class it inherits from.

In R, the principal importance of the “class” of an object isthat it is used to decide which function should be used tocarry out the required work when a generic function is used.

Classes called “numeric”, “integer”, “character” are all vectorclasses

Writing Functions I 117 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Class ...

> y <− c (1 , 10 , 23)> c l a s s ( y )[ 1 ] ”numer ic ”> x <− c ( ”a ” , ”b ” , ”c ”)> x[ 1 ] ”a ” ”b ” ”c ”> c l a s s ( x )[ 1 ] ”c h a r a c t e r ”> x <− f a c t o r ( x )> c l a s s ( x )[ 1 ] ” f a c t o r ”> m1 <− lm ( y ∼ x )> c l a s s (m1)[ 1 ] ”lm ”> m2 <− glm ( y ∼ x , f am i l y=Gamma)> c l a s s (m2)[ 1 ] ”glm ” ”lm ”

Writing Functions I 118 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Method, a.k.a, “Method Function”

Definition: The “implementation”: the function that does the workfor an object of a particular type.

When the user runs print(m1), and m1 is from class “lm”, thework is sent to a method print.lm()

Methods are always named in the format “generic.class”, suchas “print.default”, “print.lm”, etc.

Note: Most methods do not “double-check” whether theobject they are given is from the proper class. They count inR’s runtime system to check and then call print.whatever forobejcts of type whatever

That’s why many methods are“hidden” (can only access via :::notation)

Accessing methods directly

Writing Functions I 119 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Method, a.k.a, “Method Function” ...

If a method is “exported”, can be called directly via“package::method.class()” format

If a package is “attached” to the search path, then“method.class()” will suffice, but is not as clear

If a method is NOT exported, then user can reach into thepackage and grab it by running “package:::method.class()”

Writing Functions I 120 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Detour: attributes() Function and Confusing Output

The class is stored as an attribute in many object types. Runattributes()

> a t t r i b u t e s ( x )$ l e v e l s[ 1 ] ”a ” ”b ” ”c ”

$ c l a s s[ 1 ] ” f a c t o r ”

> a t t r i b u t e s (m1)$names[ 1 ] ” c o e f f i c i e n t s ” ” r e s i d u a l s ” ” e f f e c t s ” ”rank ”[ 5 ] ” f i t t e d . v a l u e s ” ”a s s i g n ” ”qr ” ”

d f . r e s i d u a l ”[ 9 ] ”c o n t r a s t s ” ” x l e v e l s ” ” c a l l ” ”terms ”

[ 1 3 ] ”model ”

$ c l a s s[ 1 ] ”lm ”

> a t t r i b u t e s ( y )

Writing Functions I 121 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Detour: attributes() Function and Confusing Output ...

NULL> i s . o b j e c t ( y )[ 1 ] FALSE

puzzle: why has y no attribute? Why is it not an object?

Honestly, I’m baffled, I thought ”everything in R is an object.”(And I still do.)

If the object does not have a class attribute, it has animplicit class, “matrix”, “array” or the result of “mode(x)”(except that integer vectors have the implicit class“integer”). (from ?class in R-2.15.1)

Writing Functions I 122 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

How Objects get “into” Classes

In older S3 terminology, user is allowed to simply claim that xis from one or more classes

c l a s s ( x ) <− c (`` lm ' ' , ``glm ' ' , c l a s s ( x ) )

That would say x’s class includes “lm” and “glm” as newclasses, and also would keep x’s old classes as well.

The class is an attribute, can be set thusly

a t t r ( x , `` c l a s s ' ' ) <− c (`` lm ' ' , ``whateve r ISay ' ' )

When a generic method “run” is called with x, the R runtimewill first try to use run.lm. If run.lm is not found, thenrun.whateverISay will be tried, and if that fails, it falls back torun.default.

Writing Functions I 123 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

How Objects get “into” Classes: S4

S4 has more structure, makes classes & methods work morelike truly object oriented programs.

S4 classes are defined with a list of variables BEFORE objectsare created.

Variables are typed!

Example imitates Matloff, p. 223

s e t C l a s s ( ”p j f r i e n d ” , r e p r e s e n t a t i o n (name=”c h a r a c t e r ” ,gender=”f a c t o r ” ,food=”f a c t o r ” ,age=” i n t e g e r ”) )

Create an instance of class pjfriend (Note: to declare aninteger, add letter “L” to end of number).

Writing Functions I 124 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

How Objects get “into” Classes: S4 ...

w i l l i am <− new( ”p j f r i e n d ” , name = ”w i l l i am ” , gender =f a c t o r ( ”male ”) , food=f a c t o r ( ”p i z z a ”) , age=33L)

w i l l i am

An ob j e c t o f c l a s s ”p j f r i e n d ”S l o t ”name ”:[ 1 ] ”w i l l i am ”

S l o t ”gender ” :[ 1 ] maleL e v e l s : male

S l o t ”food ” :[ 1 ] p i z z aL e v e l s : p i z z a

S l o t ”age ” :[ 1 ] 33

Writing Functions I 125 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

How Objects get “into” Classes: S4 ...j a n e <− new( ”p j f r i e n d ” , name=”pumpkin ” , gender =

f a c t o r ( ”f ema l e ”) , food=f a c t o r ( ”hamburger ”) , age=21L)

j an e

An ob j e c t o f c l a s s ”p j f r i e n d ”S l o t ”name ”:[ 1 ] ”pumpkin ”

S l o t ”gender ” :[ 1 ] f ema l eL e v e l s : f ema l e

S l o t ”food ” :[ 1 ] hamburgerL e v e l s : hamburger

S l o t ”age ” :[ 1 ] 21

jane and william are instances of class “pjfriend”

Writing Functions I 126 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

How Objects get “into” Classes: S4 ...

The variables inside an S4 object are called “slots” in R

“slot” would be called “instance variable” in mostOO-languages)

values in slots can be retrieved with symbol @, not $

Writing Functions I 127 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Implement an S4 method

Step 1. Write a function that can receive a function of type“pjfriend” and do something with it.

Step 2. Use setMethod to tell the R system that the functionimplements the method that is called for.

setMethod “wraps” a function.

setMethod ( ”some−generic− function−name ” , ”p j f r i e n d ” ,f u n c t i o n ( x ) {

#do something with x

})

Writing Functions I 128 / 129 University of Kansas

Introduction Write Functions! Example: Calculate Entropy Arguments and Returns R Style Writing Better R Code Object Oriented Programming

Difficult to Account For Changes between S3 and S4

I think it is difficult to explain some of the notational andterminological changes between S3 and S4.

If you type an S4 object’s name on the command line

> x

the R runtime looks for a method “show.class” (where class isthe class of x).

Why change from “print” to “show”? (IDK)

Why change the “accessor” symbol from $ to @ ?

Why call things accessed with @ “slots” rather than instancevariables?

Writing Functions I 129 / 129 University of Kansas