
Hard versus Soft Science: Studies in Biometrics and Psychometrics

Peter H. Westfall
Horn Professor of Statistics

Dept. of ISQS

Goals of this Talk

• Characterize “hard” and “soft” science
– Biometrics
– Psychometrics
– Medicine

• Differences concern
– Measurement
– Models
– Action orientation

• Describe pitfalls
• Recommendations

Hard and Soft Measurements

• Hard(er) endpoints
– Patient genotype
– Patient bilirubin level

• Soft(er) endpoints
– Patient-reported pain level
– Patient-reported quality of life

Characterizations

• Hard endpoints
– Meaningful units (e.g., g/L)
– Reliable
– Accurate

• Soft endpoints
– Units not as meaningful (e.g., 1-5 Likert scale)
– Less reliable
– Accurate?

Measurement Scales

Hard Science

Measurement: 23.2 grams

Soft Science

What do you think?

Disapprove 1 2 3 4 5 Approve

Measurement: “I dunno, … , uh, 4?”

A “Hard Science” Model

Genotype → Phenotype 1, Phenotype 2, Phenotype 3

Data for “Hard” Science Model

Genotype        Phenotypes
Gene1  Gene2    Eye Color  Metabolism  Schizophrenia  Diabetes
AA     AA       Brn        High        Yes            No
AA     AB       Blue       High        Yes            No
AB     AB       Blue       Med         No             No
AA     BB       Brn        High        No             Yes
AC     AA       Blue       Med         No             Yes
CC     AA       Green      Low         No             No
AA     BB       Brn        High        Yes            No
BB     AB       Hzl        High        Yes            Yes
AA     AB       Blue       High        No             Yes
AB     AB       Brn        Low         No             Yes
…      …        …          …           …              …

A “Soft Science” Model

“Intelligence” → Test 1, Test 2, Test 3

Data for “Soft” Science Model

Latent Constructs    Test Scores
Math   Verbal        Test1  Test2  Test3  Test4
?      ?             79     75     73     79
?      ?             79     69     73     86
?      ?             76     82     83     86
?      ?             80     82     84     74
?      ?             81     80     82     76
?      ?             78     83     84     75
?      ?             85     86     83     76
?      ?             84     80     76     78
?      ?             84     78     81     77
?      ?             88     84     81     87
…      …             …      …      …      …

What is “Intelligence”?

• “An Intelligent person is one who scores high on tests.”
– Circular: Defined in terms of test scores, and yet is also used to predict test scores.
– Usual psychometric model simply assumes that there is a number “intelligence” existing in each individual person (like a genotype).
– It assumes all people in the universe are perfectly ordered by their “intelligence.”
– This is hogwash.

Assumed Psychometric Data

Latent Constructs    Test Scores
Math    Verbal       Test1  Test2  Test3  Test4
 0.27    0.51        79     75     73     79
 0.18   -1.53        79     69     73     86
-1.19   -0.97        76     82     83     86
-0.15    0.39        80     82     84     74
 0.00   -0.53        81     80     82     76
-1.72   -0.40        78     83     84     75
-0.06    0.21        85     86     83     76
-0.21    1.49        84     80     76     78
-0.29   -0.37        84     78     81     77
 2.76   -0.48        88     84     81     87
…       …            …      …      …      …

These numbers are assumed to exist!
People are perfectly ordered by them.
This is hogwash!

SEM (Structural Equations Model)

Measurement Model [equations for Y and X]

Structural Model [equation involving B]

Assumptions:

1. Existence of latent variables
2. Structural form (linearity, constraints)
3. Independence
4. Homogeneity of subjects
5. Normality (not as crucial as all the others)
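
For reference, a common way to write the two parts of an SEM is the LISREL-style notation below. This is an assumption, not a transcription of the slide: the symbols Λ, η, ξ, Γ, ε, δ, and ζ are supplied here, and the talk's own notation may differ.

```latex
% Measurement model: observed indicators Y, X load on latent variables eta, xi
\begin{align}
  Y &= \Lambda_y \eta + \varepsilon \\
  X &= \Lambda_x \xi + \delta
\end{align}
% Structural model: linear relations among the latent variables
\begin{equation}
  \eta = B \eta + \Gamma \xi + \zeta
\end{equation}
```

In this form, assumption 1 is the existence of η and ξ, assumption 2 covers the linearity of all three equations and any constraints placed on Λ, B, and Γ, and assumptions 3 to 5 concern the error terms ε, δ, and ζ.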

The Utility of Better Models

To bring the data into sharper focus:

Clearer focus with SEM model:

When is a Model Good?

• Property 1: A good model is one whose predictions (what comes out of the black box) match reality well.

• Property 2: A good model is one whose constructs (what is inside the black box) match reality well.

• Property 3: A good model is one that has prescriptive utility.

Property 1: Outputs

• Both models predict data that “looks like” the data we see:

– SEM model predicts generally high test scores for a person with “high intelligence.”

– Genotype/phenotype model predicts certain physical characteristics for people sharing a common genotype.

Property 2: Model Constructs

• The latent constructs are not real, thus the model fails on this count.

• The genotype/phenotype constructs are real, and the directional arrows have clear biological justification (genes code for proteins that perform biological functions).

Property 3: Prescription

• Prescriptive use of the SEM model:
– Since latent factors do not exist, we cannot use the model prescriptively.
– But the model is often used for scoring; and scores might be used prescriptively.

• Prescriptive uses of Genotype/Phenotype model:
– Counseling
– Saving lives

Is Psychometric Score Construction Helpful?

1. Many variables (multiple variables X1, X2, …, X20)
2. Psychometric score construction (Cronbach’s alpha, SEM, discriminant and convergent validity; S = X1 + X3 + X17)
3. Use score in future analysis (classification, prediction)

Example 1: Arthritis Pain Measurement

• Ask subjects to rate pain in feet, knees, shoulders, and hands, each in the morning, at midday, and at night.

• Psychometric score: “Advancement of Arthritic Condition” (essentially a sum of all measures).

• If used to evaluate a knee therapy, this score will waste the company’s money and delay the progress of science.
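
A back-of-the-envelope calculation suggests why. Assume, purely for illustration, 12 items (4 sites × 3 times of day), each with independent noise of standard deviation σ, and a knee therapy that lowers each of the 3 knee items by δ while leaving the other 9 unchanged. None of these numbers come from the talk. The standardized treatment effect on each candidate score is then:

```latex
\[
  \text{sum of all 12 items: } \frac{3\delta}{\sigma\sqrt{12}} \approx 0.87\,\frac{\delta}{\sigma},
  \qquad
  \text{sum of the 3 knee items: } \frac{3\delta}{\sigma\sqrt{3}} \approx 1.73\,\frac{\delta}{\sigma}.
\]
```

Under these assumptions the all-item score halves the standardized effect, so a trial would need roughly four times as many subjects for the same power. Real items are correlated, so the exact factor will differ, but the dilution is the same in kind.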

Example 2: The essence of Turtle

Measurements: Log(Length), Log(Width), Log(Height)

Reliability of T = Log(Length) + Log(Width) + Log(Height) as a measure of the “essence of turtle”:

Males: Cronbach’s Alpha = 0.97
Females: Cronbach’s Alpha = 0.98

Exceptional! Alpha > 0.70 often considered “acceptable”.

T is the score we should use in further analysis!
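
As a reminder of what is being computed, Cronbach's alpha for k items is k/(k-1) × (1 - (sum of item variances) / (variance of the total score)). The sketch below is a minimal pure-Python version. The function name and the item scores are hypothetical and are not the talk's turtle data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of k items.

    `items` is a list of k lists; each inner list holds one item's
    values, one per subject, in the same subject order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Per-subject total score (the summated scale).
    totals = [sum(values) for values in zip(*items)]
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores for five subjects on three items (illustration only).
item1 = [1, 2, 3, 4, 5]
item2 = [2, 1, 4, 3, 5]
item3 = [1, 3, 2, 5, 4]

alpha = cronbach_alpha([item1, item2, item3])
print(f"alpha = {alpha:.3f}")
```

Alpha depends only on how strongly the items move together. It says nothing about whether their sum is the right combination for a given scientific question.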

Example 2 Continued:

Despite its high reliability, T is improper for characterizing Female vs. Male turtles.

The best classifier is

W = -2.42 Log(Length) - 0.48 Log(Width) + 3.74 Log(Height).

(Female turtles are shaped differently from males.)

The psychometric scale impedes science.
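
The difference between T and W can be seen on two made-up shells that have the same overall size but different proportions. This is only a sketch: the measurements are hypothetical, natural logarithms are assumed (the slide does not state the base), and the function names are invented for illustration.

```python
import math


def size_score(length, width, height):
    """T from the slide: the sum of the three log-measurements."""
    return math.log(length) + math.log(width) + math.log(height)


def classifier_score(length, width, height):
    """W from the slide, using the coefficients quoted there."""
    return (-2.42 * math.log(length)
            - 0.48 * math.log(width)
            + 3.74 * math.log(height))


# Hypothetical shells: same length * width * height, different proportions.
flat_shell = (120.0, 90.0, 40.0)
tall_shell = (100.0, 90.0, 48.0)

t_flat, t_tall = size_score(*flat_shell), size_score(*tall_shell)
w_flat, w_tall = classifier_score(*flat_shell), classifier_score(*tall_shell)

print(f"T: flat = {t_flat:.3f}, tall = {t_tall:.3f}")
print(f"W: flat = {w_flat:.3f}, tall = {w_tall:.3f}")
```

T is identical for the two shells because it tracks only overall size, while W separates them because it weighs height against length.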

Example 3: Patient Condition

• Measurements (Likert scale): Xi = condition at week i after start of treatment, i=1,2,3,4.

• Psychometric scale: “Condition” = X1+X2+X3+X4.

• But this is inappropriate, because the sum ignores the trend over the four weeks:

“Improvement” = -1.5X1 - 0.5X2 + 0.5X3 + 1.5X4 is better.

• The psychometric scale will cost the drug company more, delay approval, and possibly result in lives lost.
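
The “Improvement” weights coincide with the week numbers centered at their mean (1, 2, 3, 4 minus 2.5), so the score is a linear trend contrast. The sketch below uses two hypothetical patients and assumes that higher ratings mean better condition. The names and data are illustrative only.

```python
WEEKS = [1, 2, 3, 4]
MEAN_WEEK = sum(WEEKS) / len(WEEKS)              # 2.5
TREND_WEIGHTS = [w - MEAN_WEEK for w in WEEKS]   # [-1.5, -0.5, 0.5, 1.5]


def condition(x):
    """Summated scale from the slide: X1 + X2 + X3 + X4."""
    return sum(x)


def improvement(x):
    """Trend contrast from the slide: -1.5*X1 - 0.5*X2 + 0.5*X3 + 1.5*X4."""
    return sum(w * xi for w, xi in zip(TREND_WEIGHTS, x))


# Hypothetical Likert ratings for weeks 1 to 4.
improving = [1, 2, 3, 4]
worsening = [4, 3, 2, 1]

for name, ratings in [("improving", improving), ("worsening", worsening)]:
    print(name, condition(ratings), improvement(ratings))
```

Both patients get the same “Condition” score, while “Improvement” has opposite signs for them. Dividing “Improvement” by 5 (the sum of the squared weights) gives the least-squares slope of rating on week.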

Revised Score Construction Model

1. Many variables (multiple variables X1, X2, …, X20)
2. Pilot study or training sample (construct score using scientific relevance and statistical predictive ability; S = (X2 + X5) – (X7 + X9))
3. Use score in future analysis (classification, prediction)

Follow the Money

• Money talks: “Hard science” approaches receive the money:
– Data mining in business
  • Expensive customer scoring data
  • Analyze money spent, not intention to spend
– Pharmaceutical company
  • Exploration: genes, chemistry
  • Experimentation: 100’s of millions of dollars change hands on a single clinical trial

Then why do we do so much soft science?

• Inertia, inbreeding
– Journals
– Universities, “research methods”

• Money:
– Drug trials: $10,000 per subject
– Undergraduate students: $0 per subject

Inbreeding: The Exponential BS (bogus science) Theory

[Figure: publication tree along a time axis (0 to 4). BS0 is published first, followed by two BS1 papers, then four BS2 papers, then eight BS3 papers.]

Comparison

• Hard Science: Spend a winter collecting and analyzing fungus from caves in Northern Alaska

• Soft Science: Ask students to pretend they are fungus in caves in Northern Alaska

Survey data on undergraduate students

Survey data on undergraduate students analyzed via complex statistical model

Conclusions

Let’s move towards harder science:
– Work harder to get relevant data
– Use more real measures, less fictional
– Use more models that
  • Predict reality
  • Have real constructs
  • Are prescriptive
  • Are falsifiable
– Use more external validation
