Page 1: HRP 223 - 2008

HRP223 2008

Copyright © 1999-2008 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to maximum extent possible under the law.

HRP 223 - 2008

Topic 4 – Making and Looking at Data

Page 2: HRP 223 - 2008

Toy Data

While it is of little use in real life, SAS lets you manually enter data.

First make a library so the data will be permanently stored.
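In code, a library is created with a LIBNAME statement. The sketch below is only an illustration: the libref toy and the folder path are assumptions, and the folder must already exist.

```sas
/* Hypothetical libref and path. Point the path at an existing folder. */
libname toy "C:\hrp223\toy";

/* Any dataset written as toy.something is stored permanently in that folder. */
data toy.example;
  x = 1;
run;
```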

Page 3: HRP 223 - 2008

Toy Data

Tell it to make the dataset:

Page 4: HRP 223 - 2008

Page 5: HRP 223 - 2008

If you type a number as the first value of a character variable, EG converts the column to numeric. Right click on the column headings to change them back if this inadvertently happens.

Page 6: HRP 223 - 2008

Professional programmers code “no” as 0 and “yes” as 1, then create a format to make reports pretty.
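A minimal sketch of the idea in code, assuming a format named yn and a made-up variable called smoker (neither name comes from the slides):

```sas
/* Numeric format: the stored values stay 0 and 1, only the display changes. */
proc format;
  value yn 0 = "No"
           1 = "Yes";
run;

/* Made-up toy data to show the format in use. */
data example;
  input smoker;
  format smoker yn.;
datalines;
0
1
1
;
run;

proc print data=example;
run;
```

The printed report shows No and Yes, while the data itself still holds 0 and 1.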

Page 7: HRP 223 - 2008

Open the data, then make it editable by unchecking the read-only option.

Page 8: HRP 223 - 2008

When you come back…

If you return to the project it will have forgotten about the formats you applied.

Add a one line program to tell it what libraries (folders) have formats stored in them.

This little program shows the details on formats in a library.
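The slides show these programs as screenshots, so the code below is a guess at their likely form rather than a copy. It assumes the formats were saved in a library with the hypothetical libref toy, and it uses the FMTSEARCH= system option and the FMTLIB option of PROC FORMAT.

```sas
/* Hypothetical libref and path, as an assumption for this sketch. */
libname toy "C:\hrp223\toy";

/* The one-line program: also look in toy when resolving format names. */
options fmtsearch=(toy);

/* Show the details of every format stored in that library. */
proc format library=toy fmtlib;
run;
```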

Page 9: HRP 223 - 2008

These 4 records really represent 300 people. So if you were to do a frequency count on the cancer name variable, you would get the wrong count.

Notice that it uses the labels.
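In code, the usual way to get the right count from summarized records is a WEIGHT statement in PROC FREQ. The sketch below uses made-up cancer names and counts that total 300; the dataset and variable names are assumptions.

```sas
/* Made-up data: four records standing for 300 people in total. */
data cancer;
  input cancerName $ count;
datalines;
Lung 120
Breast 90
Colon 60
Skin 30
;
run;

proc freq data=cancer;
  tables cancerName;
  weight count;   /* each record counts as that many people */
run;
```

Without the WEIGHT statement every cancer name would be reported with a frequency of 1.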

Page 10: HRP 223 - 2008

If you find the label “The FREQ Procedure” annoying, turn it off in the options Tasks > Tasks General pane.

This is the same as the code: ods noproctitle;

You can also set or remove default titles and footnotes here.

Page 11: HRP 223 - 2008

Fix the title also:

Page 12: HRP 223 - 2008

Setting the Order

There are options to set the order in which the results print. If the options don’t work, make a format.

Page 13: HRP 223 - 2008

Ordering the Information

When data is sorted in format order, the first “letter” of the alphabet is blank. So put a leading space in the format for the things you want listed first.

I added a leading blank before the Y
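A sketch of the leading-blank trick in code, assuming a character variable coded Y and N. The format name $ynOrder, the dataset, and the counts are all made up for illustration.

```sas
proc format;
  /* The label " Yes" starts with a blank, so it sorts ahead of "No". */
  value $ynOrder "Y" = " Yes"
                 "N" = "No";
run;

/* Made-up summarized data. */
data answers;
  input answer $ count;
datalines;
N 40
Y 60
;
run;

/* ORDER=FORMATTED sorts the rows by the formatted labels. */
proc freq data=answers order=formatted;
  tables answer;
  weight count;
  format answer $ynOrder.;
run;
```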

Page 14: HRP 223 - 2008

One format is numeric.

The other format is character.

Page 15: HRP 223 - 2008

Two Categorical Variables

You can do similar voodoo with two categorical variables:

… no idea why Frequency count shows first on the task roles.

Page 16: HRP 223 - 2008

Specify What is a Row vs. a Column

Drag your outcome variable over first. Drag the exposure variable over second.
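In PROC FREQ code the layout is set by the TABLES statement: the variable before the asterisk defines the rows and the one after it defines the columns. The sketch below uses made-up counts, and putting exposure in the rows is an assumption about the layout, not something the slides confirm.

```sas
/* Made-up 2x2 counts; all names are illustrative. */
data twoByTwo;
  input exposure $ outcome $ count;
datalines;
Yes Yes 30
Yes No  70
No  Yes 10
No  No  90
;
run;

proc freq data=twoByTwo;
  tables exposure*outcome / chisq;  /* first variable = rows, second = columns */
  weight count;
run;
```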

Page 17: HRP 223 - 2008

The character variable lists No before Yes.

Page 18: HRP 223 - 2008

Notice the leading space before the Y.

You could go back and manually change the format by clicking on the column heading in the data set, but I recommend just applying it in the analysis.

This will replace values in a character variable, so this is a character format.

Page 19: HRP 223 - 2008

Be aware that all the common statistics are here so you do not need to learn the code.

Use the Preview code button to see if you have the right options set.

Page 20: HRP 223 - 2008

Summarizing Numeric Data

Begin with a graphic.
– Remember that you want to show both central tendency and variability.
– You have already briefly seen the Summary Statistics and Distribution Analysis menu options (aka proc means and proc univariate).

I want you to know how to summarize large and small datasets.

Page 21: HRP 223 - 2008

Numeric Data

Say somebody tells you to simulate rolling dice. The formula to do this says:

– generate a random number between 0 and 1
– multiply it by 6
– round up to the closest integer

data die;
  * the 22 says which list of numbers between 0 & 1;
  aNumber = ranuni(22);
  die = ceil(6*aNumber);
  * Generate a random integer between 1 and 6.;
  dieDie = ceil(6*ranuni(78687632));
  output; * write to the new dataset;
  return; * go to the top and try to read in data;
run;

Page 22: HRP 223 - 2008

Doing Stuff Repeatedly

How to roll two dice:

data dice;
  do x = 1 to 2 by 1;
    roll = ceil(6*ranuni(78687632));
    output;
  end;
  return; * go to the top and try to read in data;
run;

Page 23: HRP 223 - 2008

Craps…

In the dice game “craps” you throw two dice and the number you roll determines if you win or lose. How do you simulate rolling 10 pairs of dice?

data craps;
  do trial = 1 to 10;
    do dieNumber = 1 to 2;
      roll = ceil(6*ranuni(78687632));
      output;
    end;
  end;
  return;
run;

Page 24: HRP 223 - 2008

The Total

Calculate the sum across the rolls using Summary Statistics on the Describe menu.
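Since Summary Statistics corresponds to proc means, the same total can be sketched in code. The output dataset name totals and the variable name total are assumptions; the slides do this step through the menus.

```sas
/* Same simulation as the craps example: 10 trials of two dice. */
data craps;
  do trial = 1 to 10;
    do dieNumber = 1 to 2;
      roll = ceil(6*ranuni(78687632));
      output;
    end;
  end;
run;

/* One sum per trial, printed and also saved to a dataset. */
proc means data=craps sum nway;
  class trial;
  var roll;
  output out=totals sum=total;
run;
```

The totals dataset has one row per trial, which is the summary data a histogram can then be drawn from.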

Page 25: HRP 223 - 2008

Total on a Trial

Page 26: HRP 223 - 2008

Do the histogram on the summary data.

Page 27: HRP 223 - 2008

Crank up the number of simulations. Turn off the histograms for each trial. Generate a histogram based on the 1000 trials.

I want to fix the way the histogram is binned.
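One way to control the binning in code is the MIDPOINTS= option of the HISTOGRAM statement in proc univariate. The sketch below is an assumption about how the edited code might look, and it takes a shortcut by adding the two dice inside the DATA step instead of summarizing afterwards.

```sas
/* 1000 trials; the two dice are summed directly (a shortcut for this sketch). */
data totals;
  do trial = 1 to 1000;
    total = ceil(6*ranuni(78687632)) + ceil(6*ranuni(78687632));
    output;
  end;
run;

/* One bin for each possible total from 2 to 12. */
proc univariate data=totals noprint;
  histogram total / midpoints = 2 to 12 by 1;
run;
```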

When the code is open, push any key and it will make a copy of the code which you can edit.

Page 28: HRP 223 - 2008

Page 29: HRP 223 - 2008

Page 30: HRP 223 - 2008

Page 31: HRP 223 - 2008

Page 32: HRP 223 - 2008

Do Loops

Loops are used whenever you need to do something repeatedly. Say you wanted to read in 24 lines of data, where the first 6 records are from one treatment, the next 6 are from a second, and so on.

Page 33: HRP 223 - 2008

More Condensed

The group could be a counter that goes from 1 to 4.
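A sketch of that counter, assuming made-up response values and hypothetical variable names. To keep the example short, the 24 values sit six to a line and are read with a trailing @@; with one value per line a plain input statement would do.

```sas
data treatments;
  do group = 1 to 4;        /* treatment counter */
    do record = 1 to 6;     /* six records per treatment */
      input response @@;    /* hold the line so the next value can be read */
      output;
    end;
  end;
datalines;
12 15 11 14 13 16
22 25 21 24 23 26
32 35 31 34 33 36
42 45 41 44 43 46
;
run;

proc print data=treatments;
run;
```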

Page 34: HRP 223 - 2008

How to Summarize

You can get a boxplot or a histogram with only 6 values but they will not be very informative.

Page 35: HRP 223 - 2008

Page 36: HRP 223 - 2008

Page 37: HRP 223 - 2008

Only a Few

If you only have a few data points, you should consider a mean and dot plot. SAS doesn’t have one built in so I made a macro to do it.

Macros are self-contained blocks of code that do complex things.
– A good macro is like a function. You pass it a few arguments and it returns an answer. You don’t need to look at how its guts work.

Page 38: HRP 223 - 2008

The plotit Macro

You paste in the macro, beginning with the %macro line and ending with the %mend line.

Then you invoke the macro using the name following the %macro statement:
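The plotit macro itself is not reproduced in this transcript, so the sketch below uses a tiny hypothetical macro, showMeans, only to illustrate the define-then-invoke pattern. It runs against sashelp.class, a sample dataset shipped with SAS.

```sas
/* Definition: everything from the %macro line through the %mend line. */
%macro showMeans(dataset, variable);
  proc means data=&dataset mean std;
    var &variable;
  run;
%mend showMeans;

/* Invocation: a percent sign plus the name that follows %macro. */
%showMeans(sashelp.class, height);
```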

Page 39: HRP 223 - 2008

Page 40: HRP 223 - 2008

Macro Stuff

Macros can do simple formulas like calculating an age.

Or really ugly stuff like validating dates.
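As an illustration of the simple case, here is one common way to write an age macro. It is a sketch, not necessarily the macro shown on the slide; the macro name and its parameters are assumptions.

```sas
/* Expands to an expression: whole months elapsed, less one month if this
   month's day of birth has not been reached yet, converted to whole years. */
%macro age(birth, asOf);
  floor((intck('month', &birth, &asOf) - (day(&asOf) < day(&birth))) / 12)
%mend age;

data _null_;
  birth = '15MAR2000'd;
  asOf  = '14MAR2008'd;       /* one day before the 8th birthday */
  age   = %age(birth, asOf);
  put age=;
run;
```

Because the macro expands to an expression rather than a full statement, it can be dropped into any assignment inside a DATA step.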

Page 41: HRP 223 - 2008

Page 42: HRP 223 - 2008

Function Help

The books for the class have lists of frequently used functions but you probably want to bookmark the function help in EG as well as using onlineDoc.

Page 43: HRP 223 - 2008

Add it to the favorite page.

Highlight a word in the right windowpane and then type control-f to find words.

Page 44: HRP 223 - 2008

Dummy records in the HW

Recall there was a dummy record at the beginning of the Homework datasets. Why?

– Columns of data in Excel are allowed to take arbitrary widths. So if you have a “last-name” column it will import into a database as having the width of the longest name.

– If you import a second dataset and it has a different length and you try to append them together a database will choke.

– You can use a dummy record to make sure the columns have the same length.
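The length problem can be reproduced with two tiny made-up datasets. The sketch below does not use a dummy record; it shows a different fix, a LENGTH statement placed before SET, so that the combined dataset has one common width. All names and values are hypothetical.

```sas
/* Two datasets whose lastName columns have different widths. */
data first;
  length lastName $ 5;
  lastName = "Smith";
run;

data second;
  length lastName $ 11;
  lastName = "Featherston";
run;

/* Declaring the width before SET keeps the longer name from being cut off. */
data both;
  length lastName $ 20;
  set first second;
run;

proc print data=both;
run;
```

Without the LENGTH statement, the combined column takes its width from the first dataset and the longer name is truncated to five characters.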

Page 45: HRP 223 - 2008

Combine two datasets

Page 46: HRP 223 - 2008

Page 47: HRP 223 - 2008
