jmp session lab 1 - gornbein.bol.ucla.edugornbein.bol.ucla.edu/jmp session labs 1,2,3.docx  · web...


Upload: trannga

Post on 30-Jan-2018



Fall 2012

JMP Session Lab 1 (assn 2)

Objectives:
- Open a data set
- Assign the appropriate data types (continuous, ordinal, categorical) to a column of data
- Get descriptive statistics (number of measurements/subjects (n), mean, standard deviation (SD))
- Analyze a histogram
- Compare matched or unmatched data
- Stratify data by group
- Save the graphs and analysis to Word

1. Download the dataset fat.xls from the website (http://gornbein.bol.ucla.edu/cedarassign.htm). In this data set, “TFAT” is total dietary fat in grams and “PFAT” is the percent of body weight that is composed of fat. There are two time points: time 0 (baseline) and the 36-week follow-up. There are two treatment groups: Group 1 received dietary education and an intervention, and Group 2 is a control group.

2. Although JMP data tables (i.e., spreadsheets) have the file extension .jmp, JMP can also open several other file types, including .xls. Use "Open Data Table", File>Open, or Ctrl+O to open fat.xls.

3. The Analyze menu contains four main analysis categories:
- Distribution: creates a histogram
- Fit Y by X: plots one variable against another (a scatter plot when both are continuous)
- Matched Pairs: compares the same group measured twice (for example, before and after)
- Fit Model: performs a multivariable analysis of the effect of multiple variables on an outcome

Distribution

4. Select Analyze>Distribution and enter TFAT0 as Y to create a histogram of TFAT0. To look at groups 1 and 2 separately (this is called stratification), add Group to the box labeled By.

5. You should see two histograms: one shows the distribution of TFAT0 for people in group 1, and the other for people in group 2. The sample size, mean, and SD of the data (i.e., descriptive statistics) appear to the right of each histogram. Here, I rotated the display by clicking the red triangle hotspot next to TFAT0 and selecting Display Options>Horizontal Layout.
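You can reproduce the descriptive statistics in step 5 outside JMP. Below is a minimal Python sketch that stratifies (group, value) pairs and reports n, mean, and SD for each group. The data values are hypothetical, not taken from fat.xls, and the function name `describe_by_group` is made up for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def describe_by_group(rows):
    """Return {group: (n, mean, SD)} for a list of (group, value) pairs.

    stdev() is the sample standard deviation (n - 1 denominator).
    """
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)  # stratify: one list of values per group
    return {g: (len(v), mean(v), stdev(v)) for g, v in sorted(by_group.items())}


# Hypothetical (group, TFAT0) values for illustration only
rows = [(1, 60.0), (1, 70.0), (1, 80.0), (2, 50.0), (2, 54.0), (2, 58.0)]
for group, (n, m, sd) in describe_by_group(rows).items():
    print(f"Group {group}: n={n}, mean={m:.1f}, SD={sd:.1f}")
```

The SD uses the n - 1 denominator. That is the usual sample standard deviation in summary statistics.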


Fit Y by X: for comparing two unmatched groups

6. Go back to the data table and select Fit Y by X from the Analyze menu. Enter TFAT0 and TFAT36 as Y. Enter Group as X.

7. Observe that JMP has treated Group as a continuous variable: the Group axis is labeled 0, 0.1, 0.2, etc., as if fractional groups were possible. This is more than a cosmetic annoyance. JMP chooses the analysis from the variable types, so an incorrectly assigned type changes the analysis itself. With a continuous X, JMP produces a Bivariate (regression) report instead of a Oneway comparison of groups. If you click the hotspot (the red triangle next to the blue triangle), you will see that the t test is not available.


8. For the next graph, we want to jitter (offset) overlapping data points so they are easier to see. Here’s how: select File>Preferences>Platforms>Oneway and check the Points Jittered box near the end of the list. Click Apply, then OK.

9. Go back to the original data table and look at the list under Columns on the left. Most of the columns have blue triangle icons, which mark continuous data. In JMP, every column has one of three modeling types: continuous, ordinal, or nominal.

Continuous data can conceptually take on any value in an interval (e.g., age, weight, height).

Ordinal data values are categorical but can be ranked in some way (e.g., grade level). However, the numbers assigned do not imply any arithmetic relationship (three first graders do not add up to one third grader).


Nominal (categorical) data represent unordered categories (e.g., race, gender, hair color).

10. Group should therefore be a nominal variable. Right-click the blue triangle icon next to Group in the Columns list and change it to Nominal.

11. Now we can see which analyses are available for different combinations of variable types. When you select Fit Y by X, the launch window shows the following guide:

Bivariate (continuous Y by continuous X) shows Y as a function of X; used for regression and curve fitting.

Logistic (nominal or ordinal Y by continuous X) shows how well the value predicts the category; used for odds ratios and receiver operating characteristic (ROC) curves.

Oneway (continuous Y by nominal or ordinal X) compares means (e.g., weight by group); used for t tests and ANOVA.

Contingency (nominal or ordinal Y by nominal or ordinal X) shows how well one category (test positive) predicts another category (disease present); used for sensitivity and specificity, and chi-square tests.
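The launch-window guide amounts to a lookup on the modeling types of Y and X. The Python sketch below encodes that lookup as described in step 11. `fit_y_by_x_platform` and its string arguments are hypothetical names for illustration, not part of JMP.

```python
def fit_y_by_x_platform(y_type: str, x_type: str) -> str:
    """Return the Fit Y by X platform for the given modeling types.

    Types are 'continuous', 'ordinal' or 'nominal'. Ordinal and nominal
    are both treated as categorical, as in the launch-window guide.
    """
    valid = {"continuous", "ordinal", "nominal"}
    if y_type not in valid or x_type not in valid:
        raise ValueError("modeling type must be continuous, ordinal or nominal")
    y_cont = y_type == "continuous"
    x_cont = x_type == "continuous"
    if y_cont and x_cont:
        return "Bivariate"    # regression, curve fitting
    if y_cont:
        return "Oneway"       # t test, ANOVA
    if x_cont:
        return "Logistic"     # odds ratios, ROC curves
    return "Contingency"      # sensitivity/specificity, chi-square


print(fit_y_by_x_platform("continuous", "continuous"))  # Group mis-typed as continuous
print(fit_y_by_x_platform("continuous", "nominal"))     # Group set to Nominal
```

This is why changing Group to Nominal in step 10 switches the report from Bivariate to Oneway.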

12. Select Fit Y by X from the Analyze menu again. Enter TFAT0 and TFAT36 as Y and Group as X. JMP now uses the correct analysis, Oneway. Click the red triangle ("hotspot") and note that the t test is now available. Prob > |t| gives the two-sided p value, which shows a significant difference between the groups at 36 weeks.
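Here is what the Oneway t test computes: a minimal Python sketch of Student's two-sample t statistic with a pooled variance. It may not match the exact option you choose in JMP. JMP can report both a pooled and an unequal-variance (Welch) t test, and the two can differ slightly. The data are hypothetical. The sketch returns only t and its degrees of freedom; the p value (Prob > |t|) comes from the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, df). The two-sided p value is then read from a
    t distribution with df degrees of freedom.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference in means
    return (mean(a) - mean(b)) / se, df


# Hypothetical TFAT36 values for illustration only
group1 = [40.0, 45.0, 50.0]
group2 = [55.0, 60.0, 65.0]
t, df = pooled_t(group1, group2)
print(f"t = {t:.3f}, df = {df}")
```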


13. The last skill for this lab is saving the graphs. In the report window, right-click the title bar that says “Fit Y by X Group” and choose Edit>Copy Picture. Paste the picture into a blank Word document, save it, and email it to your TA. It should look something like this:

Homework Hint: Today we performed unmatched analysis, but the homework will ask for both matched and unmatched analyses. What option would you choose from the Analyze menu to perform a matched analysis?

[Figure: example of the saved picture, a “Fit Y by X Group” report with two panels, “Oneway Analysis of TFAT0 By GROUP” and “Oneway Analysis of TFAT36 By GROUP”, each plotting the values against GROUP (1, 2).]


JMP Session Lab 2 (assn 3)

Objectives:
- Compare distributions
- Perform log transformations
- Perform goodness-of-fit tests
- Use equations for math or logic
- Determine sensitivity, specificity, and accuracy
- Perform ROC analysis

1. Download and open PSA.xls from the website: http://gornbein.bol.ucla.edu/cedarassign.htm

2. Analyze the distribution of PSA stratified by group (Analyze>Distribution; enter PSA as Y and Group in the By box). Use the red triangle hotspot to access additional commands. Fit a normal curve by selecting Continuous Fit>Normal, then try Continuous Fit>LogNormal.

3. To the right of the plot there are two reports titled Fitted Normal and Fitted LogNormal. Use the hotspot on each to select "Goodness of Fit."

Goodness-of-fit tests test the null hypothesis that the data come from a normal (or lognormal) distribution.

A small p value (e.g., below 0.05) rejects that hypothesis, meaning the data are not consistent with a normal (or lognormal) distribution. Which distribution provides the better fit (has the larger p value)?

4. Here’s another way to check: use the hotspot next to PSA and select Normal Quantile Plot. The normal quantile plot (Q-Q plot) compares the quantiles of the data with the quantiles expected under a normal distribution. If the data are normally distributed, the black dots fall approximately along a straight line.
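You can compute the points in a normal quantile plot directly. This Python sketch pairs each sorted value with the standard normal quantile expected at its rank, using the plotting position i/(n + 1). That choice is an assumption; packages (JMP included) may use a slightly different formula. The values are hypothetical.

```python
from statistics import NormalDist


def normal_quantile_points(values):
    """Pair each sorted value with its expected standard normal quantile.

    The i-th smallest of n values is placed at probability i / (n + 1).
    If the data are normal, the (quantile, value) pairs lie close to a
    straight line.
    """
    data = sorted(values)
    n = len(data)
    z = NormalDist()  # standard normal: mean 0, SD 1
    return [(z.inv_cdf(i / (n + 1)), x) for i, x in enumerate(data, start=1)]


for q, x in normal_quantile_points([3.1, 9.8, 4.2]):  # hypothetical PSA values
    print(f"{q:+.3f}  {x}")
```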

5. To add a formula, create a new column using Cols>New Column, or right-click a blank column and choose New Column. Name it logPSA. Under Column Properties, select Formula. Choose Transcendental>Log10, then click PSA to enter the base-10 log of PSA.
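The logPSA formula column applies a base-10 log to each value. Here is a minimal Python sketch with hypothetical PSA values. Returning None for zero or negative values, where the log is undefined, is this sketch's own choice.

```python
import math


def log10_column(values):
    """Return the base-10 log of each value, like a logPSA formula column.

    log10 is undefined for zero or negative values, so those rows get
    None (a missing value) instead of raising an error.
    """
    return [math.log10(v) if v > 0 else None for v in values]


psa = [0.5, 4.0, 10.0, 100.0]  # hypothetical PSA values
log_psa = log10_column(psa)
print(log_psa)
```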

6. For logPSA, plot the distribution, fit a normal curve, and test the goodness of fit. If the p value is larger than the p value for the original PSA, the log transformation has improved the fit. Add the normal quantile plot. Use Edit>Copy Picture to copy the Q-Q plot and the goodness-of-fit analysis into a Word document, and save it as part of the report.

Part 2: Sensitivity, specificity, and accuracy

7. Suppose we want to find a cutoff for PSA or log PSA that predicts whether a person is in the disease group or the normal group. Receiver operating characteristic (ROC) analysis helps select a threshold with the best balance of sensitivity and specificity.

8. Recall what we learned in Lab 1 about Logistic analysis: Logistic (nominal or ordinal Y by continuous X) shows how well the value predicts the category; used for odds ratios and receiver operating characteristic (ROC) curves.


9. Use Analyze>Fit Y by X and enter Group as Y and PSA as X. (Did you change Group to Nominal?)

10. Use the hotspot to add the ROC curve. Select 2 (control group) as the positive level. The yellow tangent line touches the curve at the cutoff that maximizes Sens-(1-Spec), i.e., the best trade-off between true positives and false positives. Record the sensitivity and specificity at that point.

Sensitivity______ Specificity ______

11. Click the triangle by ROC Table to expand it. Locate your combination of Sensitivity and Specificity in the table. The ideal threshold value is marked by an asterisk to the right of the Sens-(1-Spec) column. Find the corresponding value in column X.
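The asterisked row in the ROC table is the cutoff with the largest Sens-(1-Spec). The Python sketch below reproduces that search on hypothetical data. It assumes a subject is called positive when the value is at or above the cutoff, and it uses each observed value as a candidate cutoff. JMP's table may place its cutoffs differently. The names `roc_table` and `best_cutoff` are illustrative.

```python
def roc_table(values, is_positive):
    """Sensitivity and specificity at each candidate cutoff.

    A subject is called test-positive when value >= cutoff (an assumption
    of this sketch). Returns a list of (cutoff, sensitivity, specificity).
    """
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative subject")
    rows = []
    for cutoff in sorted(set(values)):
        # True positives: positives at or above the cutoff
        tp = sum(1 for v, p in zip(values, is_positive) if p and v >= cutoff)
        # True negatives: negatives below the cutoff
        tn = sum(1 for v, p in zip(values, is_positive) if not p and v < cutoff)
        rows.append((cutoff, tp / n_pos, tn / n_neg))
    return rows


def best_cutoff(values, is_positive):
    """Row of roc_table with the largest Sens-(1-Spec) (Youden's index)."""
    return max(roc_table(values, is_positive), key=lambda row: row[1] - (1 - row[2]))


psa = [1.0, 2.0, 4.0, 5.0, 6.0, 7.0, 8.0]                 # hypothetical values
positive = [False, False, True, False, True, True, True]  # hypothetical truth
print(best_cutoff(psa, positive))  # (6.0, 0.75, 1.0)
```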

X _____ TP_____ TN_____ FP_____ FN_____

12. Here’s a way to test our new threshold. Recall what we learned about contingency tables (also called cross-tabulations) in Lab 1: Contingency (nominal or ordinal Y by nominal or ordinal X) shows how well one category (test positive) predicts another category (disease present); used for sensitivity and specificity, and chi-square tests.

13. We already know whether each subject was really in group 1 or 2 (the true values). We need a second column of categorical values that uses our threshold to classify each subject as probably in group 1 or 2 (the test values) based on their PSA value. Add a column named PredictGroup. Under Column Properties, select Formula. Using the Conditional and Comparison functions, enter the statement: If PSA ≥ threshold, then 2, else 1.
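The PredictGroup formula is a single if/else. Here is the same rule as a small Python function. The threshold and values are hypothetical placeholders for the cutoff you found in step 11.

```python
def predict_group(value, threshold):
    """Mirror the PredictGroup formula: If value >= threshold then 2, else 1."""
    return 2 if value >= threshold else 1


threshold = 6.0                  # hypothetical cutoff from the ROC table
psa = [1.0, 5.9, 6.0, 12.5]      # hypothetical PSA values
print([predict_group(v, threshold) for v in psa])  # [1, 1, 2, 2]
```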

14. Make a contingency plot of Group (Y) and PredictGroup (X). JMP calls this a “Table of Confusion”; the medical literature calls it a classification table.

Classification table (rows: Predicted Group; columns: Actual Group)

                      Actual normal      Actual disease
Predicted normal      True Negatives     False Negatives
Predicted disease     False Positives    True Positives
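Filling in the classification table means counting each combination of actual and predicted group. Here is a minimal Python sketch with hypothetical labels. `positive` names the level treated as disease, and the function name is made up for illustration.

```python
from collections import Counter


def classification_counts(actual, predicted, positive):
    """Tally TP, FN, FP, TN from paired actual/predicted labels.

    `positive` is the label treated as disease (test-positive); every
    other label counts as normal.
    """
    cells = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            cells["TP" if p == positive else "FN"] += 1
        else:
            cells["FP" if p == positive else "TN"] += 1
    return {k: cells[k] for k in ("TP", "FN", "FP", "TN")}


actual = ["disease", "disease", "normal", "normal", "normal"]      # hypothetical
predicted = ["disease", "normal", "disease", "normal", "normal"]   # hypothetical
print(classification_counts(actual, predicted, positive="disease"))
```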

15. Click the hotspot and check only Count. Count gives the number of cases in each cell (TP, FP, TN, FN). Use these counts in the equations for sensitivity and specificity: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP).
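Here are the step 15 equations as a small Python function, plus accuracy (the proportion of all subjects classified correctly). The counts in the usage line are hypothetical.

```python
def diagnostic_summary(tp, fn, fp, tn):
    """Sensitivity, specificity and accuracy from classification-table counts."""
    sensitivity = tp / (tp + fn)                 # true positives among actual positives
    specificity = tn / (tn + fp)                 # true negatives among actual negatives
    accuracy = (tp + tn) / (tp + fn + fp + tn)   # correct calls among all subjects
    return sensitivity, specificity, accuracy


sens, spec, acc = diagnostic_summary(tp=18, fn=2, fp=5, tn=15)  # hypothetical counts
print(f"Sensitivity {sens:.2f}, specificity {spec:.2f}, accuracy {acc:.3f}")
```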

16. Go back to the hotspot and check Col %. Because each column is an actual group, the TP cell now shows sensitivity and the TN cell shows specificity.

17. One last thing. What if, instead of having all of the individual PSA or log PSA values for the disease and normal groups, you were only told the mean and standard deviation of each group and that each group's values are normally distributed? Do you have enough information to calculate sensitivity and specificity for a given threshold?

18. The Z score expresses the distance between a value and the population mean in units of the standard deviation. Z is negative when the value is below the mean and positive when it is above. (Image used with permission, Creative Commons.)

19. Using the threshold t = 39.3 for both groups, determine the Z score for each group (two Z scores in all): Z = (t - μ)/σ, where μ and σ are that group's mean and standard deviation. How can you get the mean (μ) and standard deviation (σ) for each group?
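Here is step 19's Z score as a Python function. Only the threshold 39.3 comes from the lab; the group means and SDs are hypothetical stand-ins for the values you look up.

```python
def z_score(t, mu, sigma):
    """Distance of value t from the mean mu, in units of the SD sigma."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (t - mu) / sigma


# Threshold 39.3 from step 19; the means and SDs below are hypothetical
print(z_score(39.3, mu=50.0, sigma=10.0))  # threshold below this mean: negative Z
print(z_score(39.3, mu=20.0, sigma=5.0))   # threshold above this mean: positive Z
```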

20. Now create a new column called zscoreG1, and under Column Properties select Formula. Select Probability>Normal Distribution, and in the brackets enter the Z for the disease group: Normal Distribution(Z). Make another column for zscoreG2 using the Z for the normal group. The value in each column is the cumulative probability (percentile) that corresponds to the Z score you entered, i.e., the proportion of that group below the threshold. The value under zscoreG1 is sensitivity, and the value under zscoreG2 is 1-specificity.
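Steps 19 and 20 can be combined in one Python sketch using `statistics.NormalDist`. Its `cdf` gives the proportion of a normal distribution below a value, the same quantity as JMP's Normal Distribution(Z). By default the sketch follows step 20: the proportion of the disease group below the threshold is read as sensitivity, and the proportion of the normal group below it as 1-specificity. Set the hypothetical `disease_below=False` flag if disease is instead called at or above the threshold. The group means and SDs are hypothetical.

```python
from statistics import NormalDist


def sens_spec_from_normal(t, disease_mu, disease_sd, normal_mu, normal_sd,
                          disease_below=True):
    """Sensitivity and specificity when each group's values are normal.

    disease_below=True means a value below the threshold t is classified
    as disease (as in step 20); False means disease is called at or above t.
    """
    p_disease_below = NormalDist(disease_mu, disease_sd).cdf(t)  # like zscoreG1
    p_normal_below = NormalDist(normal_mu, normal_sd).cdf(t)     # like zscoreG2
    if disease_below:
        return p_disease_below, 1 - p_normal_below
    return 1 - p_disease_below, p_normal_below


# Only the threshold 39.3 comes from the lab; the summaries are hypothetical
sens, spec = sens_spec_from_normal(39.3, disease_mu=30.0, disease_sd=8.0,
                                   normal_mu=50.0, normal_sd=10.0)
print(f"sensitivity {sens:.3f}, specificity {spec:.3f}")
```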


JMP Session Lab 3 (assn 4)

Objectives:
- Comparing means using summary statistics
- t tests for matched data
- Graph association and calculate correlation

Use the “fat” dataset as before, where “TFAT” is total dietary fat in grams and “PFAT” is the percent of body weight that is composed of fat. As before, there are two time periods: time “0” baseline (TFAT0, PFAT0) and the 36-week follow-up (TFAT36, PFAT36). Assume that the time 0 and time 36 values come from the same person (i.e., each person has a baseline and a 36-week follow-up value).

1. Carry out and report t tests (use Oneway Analysis) comparing TFAT0, PFAT0, TFAT36, and PFAT36 between the two groups, along with the appropriate summary statistics (mean, SD, SEM).

TFAT0

Group 1: mean: _______ SD:_________ SEM:_________

Group 2: mean: _______ SD:_________ SEM:_________

t test p value for group 1 versus group 2: _____________

TFAT36

Group 1: mean: _______ SD:_________ SEM:_________

Group 2: mean: _______ SD:_________ SEM:_________

t test p value for group 1 versus group 2: _____________

PFAT0

Group 1: mean: _______ SD:_________ SEM:_________

Group 2: mean: _______ SD:_________ SEM:_________

t test p value for group 1 versus group 2: _____________

PFAT36

Group 1: mean: _______ SD:_________ SEM:_________


Group 2: mean: _______ SD:_________ SEM:_________

t test p value for group 1 versus group 2: _____________
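The summary statistics in step 1 follow directly from the data: SEM is the SD divided by the square root of n. Here is a minimal Python sketch with hypothetical values; `mean_sd_sem` is an illustrative name.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sd_sem(values):
    """Mean, SD (n - 1 denominator) and SEM = SD / sqrt(n)."""
    n = len(values)
    sd = stdev(values)
    return mean(values), sd, sd / sqrt(n)


tfat0_group1 = [62.0, 70.0, 78.0, 70.0]  # hypothetical values
m, sd, sem = mean_sd_sem(tfat0_group1)
print(f"mean {m:.2f}, SD {sd:.2f}, SEM {sem:.2f}")
```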

2. Carry out and report t tests for comparing the change from time 0 to 36 weeks in TFAT and PFAT between the two groups.

TFAT36-TFAT0

Group 1 p value: ________

Group 2 p value: ________

PFAT36-PFAT0

Group 1 p value: ________

Group 2 p value: ________
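Each person has both a baseline and a 36-week value, so the change within a group is a matched (paired) comparison: a one-sample t test on the differences. Here is a minimal Python sketch with hypothetical values for one group. It returns the mean change, t, and degrees of freedom, which should correspond to what JMP's Matched Pairs report shows. The p value would come from a t distribution with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic for the within-subject change after - before.

    Returns (mean change, t, df) with df = n - 1.
    """
    if len(before) != len(after):
        raise ValueError("each subject needs both a baseline and a follow-up value")
    diffs = [a - b for b, a in zip(before, after)]  # e.g. TFAT36 - TFAT0
    n = len(diffs)
    d_bar = mean(diffs)
    t = d_bar / (stdev(diffs) / sqrt(n))  # mean change / its standard error
    return d_bar, t, n - 1


tfat0 = [70.0, 65.0, 80.0, 75.0]   # hypothetical baseline values, one group
tfat36 = [60.0, 58.0, 66.0, 69.0]  # hypothetical 36-week values, same subjects
print(paired_t(tfat0, tfat36))
```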

3. Create and report a scatter plot of the change in TFAT versus the change in PFAT, and report the correlation coefficient and linear regression equation for each group.

Group 1:

Correlation coefficient: ___________

Linear regression equation:______________

Group 2:

Correlation coefficient: ___________

Linear regression equation:______________

Are the mean levels of TFAT change and PFAT change the same between the two groups? Is the relationship between TFAT change and PFAT change the same in each group?
