reproducible research using stata › meeting › 4nasug › schumm_nasug...managing statistical...

34
Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Reproducible Research Using Stata L. Philip Schumm Ronald A. Thisted Department of Health Studies University of Chicago July 11, 2005

Upload: others

Post on 08-Jun-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Reproducible Research Using Stata

L. Philip Schumm Ronald A. Thisted

Department of Health StudiesUniversity of Chicago

July 11, 2005

Page 2: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Common practice

do-file paper/report

data analysis writing

cut & pastere-enter by hand

I Inefficient and time-consuming

I Can lead to non-reproducible results

Page 3: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Common practice

do-file paper/report

data analysis writing

cut & pastere-enter by hand

I Inefficient and time-consuming

I Can lead to non-reproducible results

Page 4: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Common practice

do-file paper/report

data analysis writing

cut & pastere-enter by hand

I Inefficient and time-consuming

I Can lead to non-reproducible results

Page 5: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

A big improvement: Intermediary files

do-file paper/report

data analysis writing

graphs & tables

individualresults

I Not all results automatically transferred

I Can be difficult to manage

I Data analysis and writing still asynchronous

Page 6: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

A big improvement: Intermediary files

do-file paper/report

data analysis writing

graphs & tables

individualresults

I Not all results automatically transferred

I Can be difficult to manage

I Data analysis and writing still asynchronous

Page 7: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

A big improvement: Intermediary files

do-file paper/report

data analysis writing

graphs & tables

individualresults

I Not all results automatically transferred

I Can be difficult to manage

I Data analysis and writing still asynchronous

Page 8: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

A big improvement: Intermediary files

do-file paper/report

data analysis writing

graphs & tables

individualresults

I Not all results automatically transferred

I Can be difficult to manage

I Data analysis and writing still asynchronous

Page 9: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Reproducible research

What is reproducible research?

I Emerging literature (e.g., Buckheit and Donoho, 1995;Gentleman and Lang, 2003)

I Dynamic document composed of code chunks and text chunksI Literate programming (Knuth, 1992)

I tanglingI weaving

I R package called Sweave (Leisch, 2002)

Page 10: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Reproducible research

What is reproducible research?

I Emerging literature (e.g., Buckheit and Donoho, 1995;Gentleman and Lang, 2003)

I Dynamic document composed of code chunks and text chunks

I Literate programming (Knuth, 1992)I tanglingI weaving

I R package called Sweave (Leisch, 2002)

Page 11: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Reproducible research

What is reproducible research?

I Emerging literature (e.g., Buckheit and Donoho, 1995;Gentleman and Lang, 2003)

I Dynamic document composed of code chunks and text chunksI Literate programming (Knuth, 1992)

I tanglingI weaving

I R package called Sweave (Leisch, 2002)

Page 12: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Reproducible research

What is reproducible research?

I Emerging literature (e.g., Buckheit and Donoho, 1995;Gentleman and Lang, 2003)

I Dynamic document composed of code chunks and text chunksI Literate programming (Knuth, 1992)

I tanglingI weaving

I R package called Sweave (Leisch, 2002)

Page 13: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Dynamic do-files

A “dynamic” do-file

do-file

data analysis

stata2doc.py

writing

Page 14: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Dynamic do-files

Comments, commands, and docstrings

// Here is an example dynamic do-file.

* here is the docstring for the first commandsysuse auto

* weightsq equals weight squaredgen weightsq=weight^2

reg mpg weight weightsq foreign

/* As you can see, commands don’t have to havedocstrings. */

Page 15: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

-stata2doc- and -s2d-

Two stata commands: stata2doc and s2d

do-file

data analysis

stata2doc.py

writing

stata2doc.ado

log filegraphsscalars

Page 16: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

-stata2doc- and -s2d-

Syntax

stata2doc using do-file, [dirname(dirname) linesize(#) as(type)replace override options]

s2d [exp list, nodisplay table noisily warn name(name)] :

Page 17: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

-stata2doc- and -s2d-

Examples of -s2d- usage

. s2d w2coef=_b[weightsq] rsq=e(r2): reg mpg weight weightsq foreign<output omitted>

. scalar lis2d_rsq = .69129599

s2d_w2coef = 1.591e-06

. s2d two = (1 + 1), noi

s2d_two = 2

Page 18: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Final output

Putting it all together

do-file

data analysis

stata2rst.pyreST

document

writing

stata2doc.ado

log filegraphsscalars

Page 19: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

What is reStructuredText?

I A plaintext markup syntax and parser system

I Intuitive, readable, and easy-to-use

I Powerful and extensible

I via Docutils may be translated into a variety of formats (e.g.,HTML, LATEX, PDF, Open Office)

(see http://docutils.sourceforge.net for more information)

Page 20: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

What is reStructuredText?

I A plaintext markup syntax and parser system

I Intuitive, readable, and easy-to-use

I Powerful and extensible

I via Docutils may be translated into a variety of formats (e.g.,HTML, LATEX, PDF, Open Office)

(see http://docutils.sourceforge.net for more information)

Page 21: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

What is reStructuredText?

I A plaintext markup syntax and parser system

I Intuitive, readable, and easy-to-use

I Powerful and extensible

I via Docutils may be translated into a variety of formats (e.g.,HTML, LATEX, PDF, Open Office)

(see http://docutils.sourceforge.net for more information)

Page 22: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

What is reStructuredText?

I A plaintext markup syntax and parser system

I Intuitive, readable, and easy-to-use

I Powerful and extensible

I via Docutils may be translated into a variety of formats (e.g.,HTML, LATEX, PDF, Open Office)

(see http://docutils.sourceforge.net for more information)

Page 23: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Simple command

Simple command: do-file

/*----------------A Simple Example----------------

This is a *very* simple example in which I shall demonstrate the following:

1) a simple command2) graphs3) substitution4) tables

The Venerable Auto Data-----------------------

Let’s start by reading them in:*/

sysuse auto

Page 24: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Simple command

Simple command: reStructuredText

----------------A Simple Example----------------

This is a *very* simple example in which I shall demonstrate the following:

1) a simple command2) graphs3) substitution4) tables

The Venerable Auto Data-----------------------

Let’s start by reading them in:

::

. sysuse auto(1978 Automobile Data)

Page 25: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Simple command

Simple command: PDF via LATEX

A Simple Example

This is a very simple example in which I shall demonstrate the following:

1) a simple command

2) graphs

3) substitution

4) tables

The Venerable Auto Data

Let’s start by reading them in:

. sysuse auto(1978 Automobile Data)

1

Page 26: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Graphs

Graph: do-file

* Now lets look at a boxplot comparing mpg between* domestic and foreign.

* Boxplot comparing domestic and foreign.gr box mpg, over(foreign) name(fig1)

Page 27: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Graphs

Graph: reStructuredText

Now lets look at a boxplot comparing mpg betweendomestic and foreign.

.. gr box mpg, over(foreign) name(fig1)

.. figure:: fig1.pdf:scale: 33

Boxplot comparing domestic and foreign.

Page 28: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Graphs

Graph: PDF via LATEX

Now lets look at a boxplot comparing mpg between domestic and foreign.

10

20

30

40

Mile

age (

mpg)

Domestic Foreign

Figure 1: Boxplot comparing domestic and foreign.

1

Page 29: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Substitutions

Substitution: do-file

* Using a t-test to compare mpg between foreign and domestic* cars yields a p-value of |s2d_ttp|.

s2d ttp=(string(r(p),"%05.4f")): ttest mpg, by(foreign)

Page 30: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Substitutions

Substitution: reStructuredText

Using a t-test to compare mpg between foreign and domesticcars yields a p-value of |s2d_ttp|.

.. s2d ttp=(string(r(p),"%05.4f")): ttest mpg, by(foreign)

.. |s2d_ttp| replace:: 0.0005

Page 31: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Substitutions

Substitution: PDF via LATEX

Using a t-test to compare mpg between foreign and domestic cars yields a p-value of 0.0005.

1

Page 32: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Tables

Table: do-file

* Finally, we’ll try regressing ‘‘mpg‘‘ on ‘‘weight‘‘, ‘‘weightsq‘‘,* and ‘‘foreign‘‘.

* Regression of mpg on several covariates.s2d, t: reg mpg weight weightsq foreign

Page 33: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Tables

Table: reStructuredText

Finally, we’ll try regressing ‘‘mpg‘‘ on ‘‘weight‘‘, ‘‘weightsq‘‘,and ‘‘foreign‘‘.

.. s2d, t: reg mpg weight weightsq foreign

.. stata-table:: Regression of mpg on several covariates.

------------------------------------------------------------------------------mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------weight | -.0165729 .0039692 -4.18 0.000 -.0244892 -.0086567

weightsq | 1.59e-06 6.25e-07 2.55 0.013 3.45e-07 2.84e-06foreign | -2.2035 1.059246 -2.08 0.041 -4.3161 -.0909002

_cons | 56.53884 6.197383 9.12 0.000 44.17855 68.89913------------------------------------------------------------------------------

Page 34: Reproducible Research Using Stata › meeting › 4nasug › Schumm_NASUG...Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples Common practice

Managing Statistical Output Reproducible Research Using Stata reStructuredText Examples

Tables

Table: PDF via LATEX

Finally, we’ll try regressing mpg on weight, weightsq, and foreign.

Table 1: Regression of mpg on several covariates.

mpg Coef. Std. Err. t P>|t| [95% Conf. Interval]

weight -.0165729 .0039692 -4.18 0.000 -.0244892 -.0086567weightsq 1.59e-06 6.25e-07 2.55 0.013 3.45e-07 2.84e-06foreign -2.2035 1.059246 -2.08 0.041 -4.3161 -.0909002cons 56.53884 6.197383 9.12 0.000 44.17855 68.89913

1