Hard versus Soft Science: Studies in Biometrics and Psychometrics
Peter H. Westfall, Horn Professor of Statistics
Dept. of ISQS
Goals of this Talk
• Characterize “hard” and “soft” science
  – Biometrics
  – Psychometrics
  – Medicine
• Differences concern
  – Measurement
  – Models
  – Action orientation
• Describe pitfalls
• Recommendations
Hard and Soft Measurements
• Hard(er) endpoints
  – Patient genotype
  – Patient bilirubin level
• Soft(er) endpoints
  – Patient-reported pain level
  – Patient-reported quality of life
Characterizations
• Hard endpoints
  – Meaningful units (e.g., g/L)
  – Reliable
  – Accurate
• Soft endpoints
  – Units not as meaningful (e.g., 1-5 Likert scale)
  – Less reliable
  – Accurate?
Measurement Scales
Hard Science
  Measurement: 23.2 grams
Soft Science
  “What do you think?” (1 = Disapprove, 5 = Approve): 1 2 3 4 5
  Measurement: “I dunno, … , uh, 4?”
A “Hard Science” Model
[Figure: arrows from Genotype to Phenotype 1, Phenotype 2, and Phenotype 3]
Data for “Hard” Science Model
Gene1  Gene2  Eye Color  Metabolism  Schizophrenia  Diabetes
AA     AA     Brn        High        Yes            No
AA     AB     Blue       High        Yes            No
AB     AB     Blue       Med         No             No
AA     BB     Brn        High        No             Yes
AC     AA     Blue       Med         No             Yes
CC     AA     Green      Low         No             No
AA     BB     Brn        High        Yes            No
BB     AB     Hzl        High        Yes            Yes
AA     AB     Blue       High        No             Yes
AB     AB     Brn        Low         No             Yes
…      …      …          …           …              …
(Genotype: Gene1, Gene2. Phenotypes: the remaining columns.)
A “Soft Science” Model
[Figure: arrows from the latent “Intelligence” to Test 1, Test 2, and Test 3]
Data for “Soft” Science Model
Math  Verbal  Test1  Test2  Test3  Test4
?     ?       79     75     73     79
?     ?       79     69     73     86
?     ?       76     82     83     86
?     ?       80     82     84     74
?     ?       81     80     82     76
?     ?       78     83     84     75
?     ?       85     86     83     76
?     ?       84     80     76     78
?     ?       84     78     81     77
?     ?       88     84     81     87
…     …       …      …      …      …
(Latent constructs: Math, Verbal. Test scores: Test1 to Test4.)
What is “Intelligence”?
• “An intelligent person is one who scores high on tests.”
  – Circular: intelligence is defined in terms of test scores, and yet it is also used to predict test scores.
  – The usual psychometric model simply assumes that a number, “intelligence,” exists in each individual person (like a genotype).
  – It assumes all people in the universe are perfectly ordered by their “intelligence.”
  – This is hogwash.
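The Python/NumPy sketch below simulates a one-factor model to make the criticized assumption concrete. The loadings, intercepts, and noise level are hypothetical values and are not estimates from any data in this talk. The point is only that the model posits one number per person, and that number generates all of that person's test scores.

```python
import numpy as np


def simulate_one_factor(n_people, loadings, intercepts, noise_sd, seed=0):
    """Simulate test scores from a one-factor measurement model.

    Each person i is assigned one latent number xi[i] ~ N(0, 1), the assumed
    "intelligence". Test j is then generated as
        X[i, j] = intercepts[j] + loadings[j] * xi[i] + delta[i, j],
    with independent errors delta[i, j] ~ N(0, noise_sd**2).
    """
    rng = np.random.default_rng(seed)
    loadings = np.asarray(loadings, dtype=float)
    intercepts = np.asarray(intercepts, dtype=float)
    xi = rng.standard_normal(n_people)                   # one latent number per person
    delta = rng.normal(0.0, noise_sd, size=(n_people, loadings.size))
    scores = intercepts + np.outer(xi, loadings) + delta  # n_people x n_tests
    return xi, scores


# Hypothetical parameters, chosen only for illustration.
xi, X = simulate_one_factor(
    n_people=1000, loadings=[4, 3, 3, 4], intercepts=[80, 80, 80, 80], noise_sd=3
)
# Every pair of tests is positively correlated because all four share xi.
print(np.round(np.corrcoef(X, rowvar=False), 2))
# The model places everyone on a single line: sorting by xi ranks all people.
ranking = np.argsort(-xi)
```

Simulated data of this kind "look like" real test scores. The realism is produced by the assumed latent number, not evidence that such a number exists.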
Assumed Psychometric Data
Math   Verbal  Test1  Test2  Test3  Test4
 0.27   0.51   79     75     73     79
 0.18  -1.53   79     69     73     86
-1.19  -0.97   76     82     83     86
-0.15   0.39   80     82     84     74
 0.00  -0.53   81     80     82     76
-1.72  -0.40   78     83     84     75
-0.06   0.21   85     86     83     76
-0.21   1.49   84     80     76     78
-0.29  -0.37   84     78     81     77
 2.76  -0.48   88     84     81     87
 …      …      …      …      …      …
(Latent constructs: Math, Verbal. Test scores: Test1 to Test4.)
These numbers are assumed to exist! People are perfectly ordered by them. This is hogwash!
SEM (Structural Equations Model)
Measurement model: $Y = \Lambda_y \eta + \varepsilon$, $X = \Lambda_x \xi + \delta$
Structural model: $\eta = B\eta + \Gamma\xi + \zeta$
Assumptions:
1. Existence of latent variables $\eta$ and $\xi$
2. Structural form (linearity, constraints)
3. Independence
4. Homogeneity of subjects
5. Normality (not as crucial as all the others)
The Utility of Better Models
To bring the data into sharper focus:
Clearer focus with SEM model:
When is a Model Good?
• Property 1: A good model is one whose predictions (what comes out of the black box) match reality well.
• Property 2: A good model is one whose constructs (what is inside the black box) match reality well.
• Property 3: A good model is one that has prescriptive utility.
Property 1: Outputs
• Both models predict data that “looks like” the data we see:
– SEM model predicts generally high test scores for a person with “high intelligence.”
– Genotype/phenotype model predicts certain physical characteristics for people sharing a common genotype.
Property 2: Model Constructs
• The latent constructs are not real, so the model fails on this count.
• The genotype/phenotype constructs are real, and the directional arrows have clear biological justification (genes code for proteins that perform biological functions).
Property 3: Prescription
• Prescriptive use of the SEM model:
  – Since latent factors do not exist, we cannot use the model prescriptively.
  – But the model is often used for scoring, and scores might be used prescriptively.
• Prescriptive uses of the genotype/phenotype model:
  – Counseling
  – Saving lives
Is Psychometric Score Construction Helpful?
Many variables (multiple variables X1, X2, …, X20)
→ Psychometric score construction (Cronbach’s alpha, SEM, discriminant and convergent validity; e.g., S = X1 + X3 + X17)
→ Use score in future analysis (classification, prediction)
Example 1: Arthritis Pain Measurement
• Ask subjects to rate pain in their feet, knees, shoulders, and hands, each in the morning, at midday, and at night.
• Psychometric score: “Advancement of Arthritic Condition” (essentially a sum of all the measures).
• If used to evaluate a knee therapy, this score will waste the company’s money and delay the progress of science: the knee ratings are only a fraction of the sum, so a knee-specific treatment effect is diluted by the unaffected ratings.
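A back-of-the-envelope calculation makes the dilution concrete. This Python sketch rests on hypothetical assumptions:

- The 12 pain ratings (4 body sites by 3 times of day) are independent and each has unit variance.
- A knee therapy shifts only the 3 knee ratings, each by the same amount.

Real ratings are correlated, so the numbers are illustrative only.

```python
import math


def standardized_effect(n_items_affected, n_items_in_score, shift_per_item):
    """Standardized mean difference of a sum score.

    Hypothetical assumptions: all items are independent with unit variance,
    and the treatment shifts only `n_items_affected` of the summed items,
    each by `shift_per_item`.
    """
    mean_shift = n_items_affected * shift_per_item
    sd = math.sqrt(n_items_in_score)  # Var(sum of m independent unit-variance items) = m
    return mean_shift / sd


d = 0.5                                   # hypothetical per-item effect on knee pain
knee_only = standardized_effect(3, 3, d)  # score built from the 3 knee ratings only
total = standardized_effect(3, 12, d)     # summed score over all 12 ratings
# Required sample size for fixed power scales roughly with 1 / effect**2.
sample_size_ratio = (knee_only / total) ** 2
print(round(knee_only, 3), round(total, 3), round(sample_size_ratio, 2))
```

Under these assumptions, the summed score has half the standardized effect of the knee-only score. That means roughly four times as many subjects are needed to reach the same power.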
Example 2: The essence of Turtle
Measurements: Log(Length), Log(Width), Log(Height)
Reliability of T = Log(Length) + Log(Width) + Log(Height) as a measure of the “essence of turtle”:
Males: Cronbach’s Alpha = 0.97
Females: Cronbach’s Alpha = 0.98
Exceptional! Alpha > 0.70 is often considered “acceptable”.
T is the score we should use in further analysis!
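For reference, here is the usual formula for Cronbach's alpha as a short Python/NumPy sketch. The turtle measurements are not reproduced in this talk, so the usage lines use hypothetical data in which three log-dimensions are all driven by one "size" variable. Data like that yield a very high alpha. A high alpha shows that the items move together, not that their sum is the right score for a given scientific question.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_subjects, k_items) array of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of the total)
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # sample variance of the summed score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)


# Hypothetical data: three log-measurements driven mostly by overall size.
rng = np.random.default_rng(1)
size = rng.normal(0.0, 1.0, 200)
logs = size[:, None] + rng.normal(0.0, 0.1, (200, 3))
print(round(cronbach_alpha(logs), 3))
```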
Example 2 Continued:
Despite its high reliability, T is improper for characterizing Female vs. Male turtles.
The best classifier is
W = -2.42Log(Length) -0.48Log(Width) + 3.74Log(Height).
(Female turtles are shaped differently from male turtles.)
The psychometric scale impedes science.
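The coefficients alone show why T cannot separate the sexes. T weights all three log-dimensions equally, so it responds only to overall size, while W contrasts height against length. The Python sketch below uses the W coefficients from the slide with two hypothetical shells (not real turtle data). The talk does not give a classification threshold for W, so the sketch only compares scores and does not assign a sex.

```python
import math


def t_score(length, width, height):
    """The summed score T = Log(Length) + Log(Width) + Log(Height)."""
    return math.log(length) + math.log(width) + math.log(height)


def w_score(length, width, height):
    """The classifier W, using the coefficients reported in the talk."""
    return (-2.42 * math.log(length)
            - 0.48 * math.log(width)
            + 3.74 * math.log(height))


# Two hypothetical shells with the same Length * Width * Height, hence the same T.
# Shell b has its length divided by 1.2 and its height multiplied by 1.2.
a = (120.0, 90.0, 45.0)
b = (100.0, 90.0, 54.0)

print(t_score(*a) - t_score(*b))  # essentially 0: T cannot see the shape change
print(w_score(*b) - w_score(*a))  # 6.16 * log(1.2), about 1.12
```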
Example 3: Patient Condition
• Measurements (Likert scale): Xi = condition at week i after the start of treatment, i = 1, 2, 3, 4.
• Psychometric scale: “Condition” = X1 + X2 + X3 + X4.
• But this scale is inappropriate; a trend score is better:
  “Improvement” = -1.5X1 - 0.5X2 + 0.5X3 + 1.5X4.
  (“Condition” measures the average level over the four weeks; “Improvement” measures change over time.)
• The psychometric scale will cost the drug company more, delay approval, and possibly result in lives lost.
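The coefficients -1.5, -0.5, 0.5, 1.5 are the week numbers 1 to 4 centered at their mean of 2.5. “Improvement” is therefore proportional to the least-squares slope of condition over time: the squared coefficients sum to 5, so Improvement = 5 × slope. The Python sketch below uses two hypothetical patients. It shows that “Condition” can be identical for a stable patient and an improving one, while “Improvement” separates them.

```python
CONDITION_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
IMPROVEMENT_WEIGHTS = (-1.5, -0.5, 0.5, 1.5)  # centered week numbers (linear trend)


def weighted_score(weights, weekly):
    """Weighted sum of the weekly ratings X1..X4."""
    return sum(w * x for w, x in zip(weights, weekly))


def ols_slope(weekly):
    """Least-squares slope of the rating on week number (1, 2, 3, 4)."""
    weeks = range(1, len(weekly) + 1)
    t_bar = sum(weeks) / len(weekly)
    y_bar = sum(weekly) / len(weekly)
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(weeks, weekly))
    den = sum((t - t_bar) ** 2 for t in weeks)
    return num / den


# Two hypothetical patients with identical "Condition" scores.
stable = (3, 3, 3, 3)
improving = (2, 2, 4, 4)

for name, x in [("stable", stable), ("improving", improving)]:
    print(name,
          weighted_score(CONDITION_WEIGHTS, x),    # 12.0 for both
          weighted_score(IMPROVEMENT_WEIGHTS, x),  # 0.0 vs 4.0
          ols_slope(x))                            # 0.0 vs 0.8
```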
Revised Score Construction Model
Many variables (multiple variables X1, X2, …, X20)
→ Pilot study or training sample (construct the score using scientific relevance and statistical predictive ability; e.g., S = (X2 + X5) - (X7 + X9))
→ Use score in future analysis (classification, prediction)
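Here is one way to build the score from a pilot sample, sketched in Python/NumPy:

1. Estimate weights by least squares on the pilot data.
2. Freeze the weights.
3. Check predictive ability on a separate validation sample.

This covers only the "statistical predictive ability" half of the slide; scientific relevance still has to come from subject-matter knowledge. Least squares is just one of several reasonable fitting choices, and a discriminant analysis would suit a classification target. The data are simulated, with the outcome built from the slide's example S = (X2 + X5) - (X7 + X9). The function names are hypothetical.

```python
import numpy as np


def fit_score_weights(X_pilot, y_pilot):
    """Estimate score weights on a pilot (training) sample by least squares."""
    weights, *_ = np.linalg.lstsq(X_pilot, y_pilot, rcond=None)
    return weights


def apply_score(X_new, weights):
    """Score new subjects with weights frozen from the pilot sample."""
    return X_new @ weights


# Hypothetical data: 20 variables, outcome driven by S = (X2 + X5) - (X7 + X9).
rng = np.random.default_rng(2)
true_w = np.zeros(20)
true_w[[1, 4]] = 1.0   # X2, X5 (0-based columns 1 and 4)
true_w[[6, 8]] = -1.0  # X7, X9 (0-based columns 6 and 8)

X_pilot = rng.standard_normal((200, 20))
y_pilot = X_pilot @ true_w + rng.normal(0.0, 0.5, 200)
w_hat = fit_score_weights(X_pilot, y_pilot)

# External validation: evaluate the frozen score on a fresh sample.
X_valid = rng.standard_normal((200, 20))
y_valid = X_valid @ true_w + rng.normal(0.0, 0.5, 200)
r = np.corrcoef(apply_score(X_valid, w_hat), y_valid)[0, 1]
print(np.round(w_hat, 1), round(r, 2))
```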
Follow the Money
• Money talks: “hard science” approaches receive the money:
  – Data mining in business
    • Expensive customer scoring data
    • Analyze money spent, not intention to spend
  – Pharmaceutical companies
    • Exploration: genes, chemistry
    • Experimentation: hundreds of millions of dollars change hands on a single clinical trial
Then why do we do so much soft science?
• Inertia, inbreeding
  – Journals
  – Universities, “research methods”
• Money:
  – Drug trials: $10,000 per subject
  – Undergraduate students: $0 per subject
Inbreeding: The Exponential BS (bogus science) Theory
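[Figure: bogus papers published over time (time axis 0 to 4). One BS0 paper published at time 0 is followed by 2 BS1 papers at time 1, 4 BS2 papers at time 2, and 8 BS3 papers at time 3.]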
Comparison
• Hard Science: Spend a winter collecting and analyzing fungus from caves in Northern Alaska
• Soft Science: Ask students to pretend they are fungus in caves in Northern Alaska
Survey data on undergraduate students
Survey data on undergraduate students analyzed via complex statistical model
Conclusions
Let’s move towards harder science:
  – Work harder to get relevant data
  – Use more real measures, fewer fictional ones
  – Use more models that
    • Predict reality
    • Have real constructs
    • Are prescriptive
    • Are falsifiable
  – Use more external validation