Deconstructing Data Science David Bamman, UC Berkeley Info 290 Lecture 6: Validity Feb 8, 2016

Upload: others

Post on 21-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Deconstructing Data Science - Coursescourses.ischool.berkeley.edu/i290-dds/s16/dds/slides/6_validity.pdf · Homework 1, part I • Creativity in conceptualizing what an "ideal" representation

Deconstructing Data Science

David Bamman, UC Berkeley

Info 290

Lecture 6: Validity

Feb 8, 2016

Hypotheses

hypothesis

The average income in two sub-populations is different

Web design A leads to higher CTR than web design B

Self-reported location on Twitter is predictive of political preference

Male and female literary characters become more similar over time

Hypotheses

hypothesis “area”

Voters in big cities prefer Hillary Clinton

Email marketing language A is better than language B

Slapstick comedies do not win Oscars

Joyce’s Ulysses changed the form of the novel after 1922

The first step is formalizing a question into a testable hypothesis.

Null hypothesis

• A claim, assumed to be true, that we’d like to test (because we think it’s wrong)

hypothesis | H0

The average income in two sub-populations is different | The incomes are the same

Web design A leads to higher CTR than web design B | The CTRs are the same

Self-reported location on Twitter is predictive of political preference | Location has no relationship with political preference

Male and female literary characters become more similar over time | There is no difference in M/F characters over time

Hypothesis testing

• If the null hypothesis were true, how likely is it that you’d see the data you see?

Example

• Hypothesis: Berkeley residents tend to be politically liberal

• H0: Among all N registered {Democrat, Republican} primary voters, there are an equal number of Democrats and Republicans in Berkeley.

$\frac{\#\mathrm{dem}}{N} = \frac{\#\mathrm{rep}}{N} = 0.5$

Example

• If we had access to the party registrations (and knew the population), we would have our answer.

Example

[Figure: pairs of counts, each shown with its percentage]

10, 10: 50%

2, 18: 10%

7, 13: 45%

13, 7: 65%

15, 5: 75%

11, 9: 55%
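Each pair of counts above sums to 20, so the pairs can be read as small samples of 20 voters. A minimal simulation, assuming a population split exactly 50/50, shows how much the sample percentage moves from draw to draw. The function name `sample_counts`, the seed, and the number of draws are hypothetical choices for illustration.

```python
import random

def sample_counts(n_voters=20, p_dem=0.5, rng=None):
    """Draw one sample of voters and return (#dem, #rep)."""
    if rng is None:
        rng = random.Random()
    dem = sum(1 for _ in range(n_voters) if rng.random() < p_dem)
    return dem, n_voters - dem

rng = random.Random(0)  # fixed seed (assumption) so the run is repeatable
samples = [sample_counts(rng=rng) for _ in range(6)]
for dem, rep in samples:
    print(f"{dem:2d} dem, {rep:2d} rep -> {dem / (dem + rep):.0%} dem")
```

Even though the population proportion is fixed at 0.5 in this sketch, individual samples of 20 generally do not land on exactly 50%.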

Hypothesis testing

• Hypothesis testing measures our confidence in what we can say about a null from a sample.

Example

Binomial probability distribution for number of Democrats in n = 1000 with p = 0.5

[Figure: x-axis: # Dem, 400 to 600; y-axis: probability, 0.000 to 0.025]
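The distribution in this figure can be recomputed directly. The sketch below evaluates the binomial probability of seeing exactly k Democrats among n = 1000 voters when p = 0.5; the function name `binom_pmf` is an illustrative choice, and the computation is done in log space only to avoid overflow for large n.

```python
from math import exp, lgamma, log

def binom_pmf(k, n=1000, p=0.5):
    """P(exactly k successes in n trials), for 0 < p < 1."""
    # log of the binomial coefficient C(n, k)
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_comb + k * log(p) + (n - k) * log(1 - p))

for k in (450, 500, 510, 580):
    print(k, binom_pmf(k))
```

The peak at k = 500 is about 0.025, which agrees with the top of the y-axis in the figure.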

Example

At what point is a sample statistic unusual enough to reject the null hypothesis?

[Figure: binomial distribution for n = 1000, p = 0.5 (x-axis: # Dem, 400 to 600), with the sample outcomes 510 and 580 marked]

Example

• The form we assume for the null hypothesis lets us quantify that level of surprise.

• We can do this for many parametric forms that allow us to measure P(X ≤ x) for some sample of size n; for large n, we can often make a normal approximation.

Z score

For Normal distributions, transform into standard normal (mean = 0, standard deviation = 1):

$Z = \frac{X - \mu}{\sigma / \sqrt{n}}$

For Binomial distributions, normal approximation (for large n):

$Z = \frac{Y - np}{\sqrt{np(1 - p)}}$

p = 0.5 (proportion we are testing)

n = 1000 (total sample size)

Y = 580 (democrats in sample)

Z score

[Figure: standard normal density (x-axis: z, from -6 to 6; y-axis: density)]

580 democrats = z score 5.06

510 democrats = z score 0.63
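The two z scores reported here follow from the normal approximation to the binomial with n = 1000 and p = 0.5. A short check, with `binomial_z` as an illustrative helper name:

```python
from math import sqrt

def binomial_z(y, n, p):
    """Z score of y successes out of n under null proportion p (normal approximation)."""
    return (y - n * p) / sqrt(n * p * (1 - p))

print(round(binomial_z(580, 1000, 0.5), 2))  # 5.06
print(round(binomial_z(510, 1000, 0.5), 2))  # 0.63
```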

Tests

• We will define “unusual” to equal the most extreme areas in the tails

[Figures: three standard normal density plots (x-axis: z, from -4 to 4; y-axis: density), marking the least likely 10%, 5%, and 1% of outcomes in turn]
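The cutoffs that bound the least likely 10%, 5%, and 1% of a standard normal can be computed from its inverse CDF. This sketch assumes the extreme area is split evenly between the two tails; `two_tailed_cutoff` is a hypothetical helper name.

```python
from statistics import NormalDist

def two_tailed_cutoff(alpha):
    """|z| beyond which the least likely `alpha` share of outcomes lies (both tails)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(two_tailed_cutoff(alpha), 3))
```

Under that assumption the cutoffs are roughly 1.645, 1.960, and 2.576, so a z score of 5.06 is far inside every one of these regions while 0.63 is inside none.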

Tests

[Figure: standard normal density (x-axis: z, from -6 to 6; y-axis: density)]

580 democrats = z score 5.06

510 democrats = z score 0.63

Tests

• Decide on the level of significance α (e.g., 0.05 or 0.01)

• Testing is evaluating whether the sample statistic falls in the rejection region defined by α

Tails

• Two-tailed tests measure whether the observed statistic is different (in either direction)

• One-tailed tests measure difference in a specific direction

• All differ in where the rejection region is located; α = 0.05 for all.

[Figure: three standard normal density plots (x-axis: z, from -4 to 4; y-axis: density) showing the rejection region for a two-tailed test, a lower-tailed test, and an upper-tailed test]

p values

• Two-tailed test: $\text{p-value}(z) = 2 \times P(Z \le -|z|)$

• Lower-tailed test: $\text{p-value}(z) = P(Z \le z)$

• Upper-tailed test: $\text{p-value}(z) = 1 - P(Z \le z)$

A p value is the probability of observing a statistic at least as extreme as the one we did if the null hypothesis were true.
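The three p-value definitions translate directly into code using the standard normal CDF. This is a sketch only; the function name `p_value` and the `tail` argument values are illustrative choices.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def p_value(z, tail="two"):
    """p-value for a z score under a two-, lower-, or upper-tailed test."""
    if tail == "two":
        return 2 * STD_NORMAL.cdf(-abs(z))
    if tail == "lower":
        return STD_NORMAL.cdf(z)
    if tail == "upper":
        return 1 - STD_NORMAL.cdf(z)
    raise ValueError("tail must be 'two', 'lower', or 'upper'")

for z in (5.06, 0.63):
    print(z, p_value(z), p_value(z, "upper"))
```

For the running example, z = 0.63 gives a two-tailed p-value a little above 0.5, while z = 5.06 gives one below one in a million.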

Errors

                     Test results
Truth                keep null            reject null
keep null                                 Type I error (α)
reject null          Type II error (β)    Power

Errors

• Type I error: we reject the null hypothesis but we shouldn’t have.

• Type II error: we don’t reject the null, but we should have.

1 Berkeley residents tend to be politically liberal

2 San Francisco residents tend to be politically liberal

3 Albany residents tend to be politically liberal

4 El Cerrito residents tend to be politically liberal

5 San Jose residents tend to be politically liberal

6 Oakland residents tend to be politically liberal

7 Walnut Creek residents tend to be politically liberal

8 Sacramento residents tend to be politically liberal

9 Napa residents tend to be politically liberal

… …

1,000 Atlanta residents tend to be politically liberal

Errors

• For any significance level α and n hypothesis tests in which the null is true, we can expect α × n type I errors.

• α = 0.01, n = 1000: 10 “significant” results expected simply by chance

• When would this occur in practice?
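A small simulation makes the α × n figure concrete. The sketch assumes every one of the n null hypotheses is true, so each test statistic is a draw from a standard normal, and it uses a two-tailed test; `count_false_positives` and the seed are hypothetical choices.

```python
import random
from statistics import NormalDist

def count_false_positives(n_tests=1000, alpha=0.01, seed=0):
    """Count two-tailed rejections when every null hypothesis is actually true."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    # Under a true null, each z score is just standard normal noise.
    return sum(1 for _ in range(n_tests) if abs(rng.gauss(0, 1)) > cutoff)

print(count_false_positives())
```

Any single run fluctuates around 10, and averaging over many seeds moves the count toward α × n.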

Multiple hypothesis corrections

• Bonferroni correction: for family-wise significance level α0 with n hypothesis tests:

$\alpha = \frac{\alpha_0}{n}$

• [Very strict; controls the probability of at least one type I error.]

• False discovery rate
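Both corrections can be sketched in a few lines. The Bonferroni function applies the α0 / n threshold from the slide. For the false discovery rate, the slide names no specific procedure, so the sketch uses the Benjamini-Hochberg step-up procedure as one common choice; that choice, the function names, and the p-values are all assumptions for illustration.

```python
def bonferroni_reject(p_values, alpha0=0.05):
    """Reject each null whose p-value is at most alpha0 / n."""
    alpha = alpha0 / len(p_values)
    return [p <= alpha for p in p_values]

def benjamini_hochberg_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at q."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / n:
            max_rank = rank  # largest rank whose p-value clears its threshold
    reject = [False] * n
    for i in order[:max_rank]:
        reject[i] = True
    return reject

p_values = [0.001, 0.008, 0.012, 0.039, 0.3]  # made-up values for illustration
print(bonferroni_reject(p_values))
print(benjamini_hochberg_reject(p_values))
```

On these made-up values Bonferroni rejects two hypotheses and Benjamini-Hochberg rejects four, which illustrates why Bonferroni is described as very strict.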

Effect size

• Hypothesis tests measure a binary decision (reject or do not reject a null). Many ways to attain significance; e.g.:

• large true difference in effects

• large n
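To see how a large n alone can produce significance, hold a small observed proportion fixed and grow the sample. The sketch reuses the 51% figure from the 510-of-1000 example; the larger sample sizes and the helper name `two_tailed_p` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_p(p_hat, n, p0=0.5):
    """Two-tailed p-value for observed proportion p_hat in n trials against null p0."""
    z = (p_hat - p0) * n / sqrt(n * p0 * (1 - p0))
    return 2 * NormalDist().cdf(-abs(z))

for n in (1_000, 10_000, 100_000):
    print(n, two_tailed_p(0.51, n))
```

The observed difference is one percentage point in every row; only n changes, yet the p-value drops from above 0.5 to far below 0.01.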

Effect size

• Difference between the observed statistic and null hypothesis

null hypothesis    observed    effect size (%)    effect size (n)
0.50               0.58        0.08               80

[Figure: binomial distribution for n = 1000, p = 0.5 (x-axis: # Dem, 400 to 600), with the observed value 580 marked]

Power

• The probability that a single sample leads us to reject the null hypothesis when it should be rejected

For a fixed effect size, how much of the alternative distribution is in the H0 rejection region?

[Figure: two density plots over the range 400 to 700 (y-axis: density)]

99.90% of samples from the alternative distribution will be in the rejection region (if H0 is false)
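Power can be estimated by asking how much of the alternative distribution falls in the rejection region defined under H0. The slide does not state which settings produced its 99.90% figure, so this sketch assumes a true proportion of 0.58 (the effect size from the earlier example), n = 1000, a two-tailed test at α = 0.05, and normal approximations to both binomials; `power_two_tailed` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed(p_alt, n=1000, p0=0.5, alpha=0.05):
    """Share of the alternative distribution inside the H0 rejection region."""
    null = NormalDist(n * p0, sqrt(n * p0 * (1 - p0)))
    alt = NormalDist(n * p_alt, sqrt(n * p_alt * (1 - p_alt)))
    lower = null.inv_cdf(alpha / 2)      # reject if the count is below this
    upper = null.inv_cdf(1 - alpha / 2)  # or above this
    return alt.cdf(lower) + (1 - alt.cdf(upper))

print(power_two_tailed(0.58))
```

Under these assumptions the result is roughly 0.999. When the alternative coincides with the null, the function returns α, since the only rejections are then type I errors.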

Nonparametric tests

• Many hypothesis tests rely on parametric assumptions (e.g., normality)

• Alternatives that don’t rely on those assumptions:

• permutation test

• the bootstrap
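Both alternatives replace a parametric formula with resampling. The sketch below shows one simple form of each: a permutation test for a difference in means between two groups, and a percentile bootstrap confidence interval for a mean. The function names, seeds, resampling counts, and data are all made up for illustration.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided p-value for a difference in means, obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # The +1 terms count the observed labeling itself as one permutation.
    return (extreme + 1) / (n_perm + 1)

def bootstrap_ci(sample, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)
    means = sorted(mean(rng.choices(sample, k=len(sample))) for _ in range(n_boot))
    lo = means[int((1 - level) / 2 * n_boot)]
    hi = means[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

group_a = [12, 15, 14, 16, 18, 17, 15, 16]  # made-up measurements
group_b = [10, 11, 13, 9, 12, 11, 10, 12]
print(permutation_test(group_a, group_b))
print(bootstrap_ci(group_a))
```

Neither function assumes normality: the permutation test builds its null distribution from relabelings of the observed data, and the bootstrap builds its interval from resamples drawn with replacement.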

Observational data

• A survey of the political affiliation of Berkeley residents is observational data

• the independent variable (living in Berkeley) is not under our control

• Data from tweets, books, surveys, the web, the census, etc. are all observational.

Observational data

• Hypothesis tests for observational data assess the relationship between variables but don’t establish causality.

• Example: if we intervened and relocated someone to Berkeley, would they become liberal?

Experimental data

• Data that allows you to perform an intervention and determine the value of some variable

• Clinical data: treatment vs. placebo

• Web design: one of two homepage designs

• Political email campaigns: one of two (differently worded) solicitations

Experimental data

• A potential confound exists if any other variable is correlated with your intervention decision:

• e.g., users volunteering to receive a drug (and not the placebo)

Randomization experiments

• Users are randomly assigned a condition (e.g., which web page they see), which allows us to better establish causality

• A/B testing = significance test in a randomized experiment with two conditions
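One common significance test for an A/B experiment is the two-proportion z-test, sketched below for click-through rates with H0 that the two designs have the same CTR. The slides do not prescribe this particular test, and the function name `ab_test` and the click counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def ab_test(clicks_a, views_a, clicks_b, views_b):
    """Two-tailed two-proportion z-test; H0: both designs have the same CTR."""
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    # Under H0 both groups share one rate, estimated from the pooled data.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (ctr_a - ctr_b) / se
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p

z, p = ab_test(clicks_a=120, views_a=1000, clicks_b=90, views_b=1000)
print(round(z, 2), round(p, 3))
```

Because users are assigned to a design at random, a small p-value here supports a causal reading in a way the same test on observational data would not.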

Krippendorff (2004)

Face validity

• Does a finding “make sense” (in retrospect)?

• The “gatekeeper for all other kinds of validity”

Social validity

• Does a finding make a “contribution to the public discussion of important social concerns?”

Sampling validity

• Does a finding rely on a sample that is:

• large enough to support its results?

• not biased in the quantity of interest?

• e.g., Twitter

Semantic validity

• Does a finding ascribe meaning to its categories in a way that corresponds to how its subjects understand them?

• e.g., sentiment analysis, {democrat, republican}, libel

Structural validity

• Does a finding rely on methods that have internal coherence?

• e.g., fame from Google Books, historical argument

Functional validity

• Does a finding rely on a method that has a record of success?

Correlative validity

• Convergent validity: Does a finding correlate with another trusted variable?

• Divergent validity: Does a finding not correlate with measures of different phenomena?

Predictive validity

• Does a finding make correct predictions about the future?

Validity

What other forms of validity should we add?

Krippendorff (2004)

Homework 1, part I

• Creativity in conceptualizing what an "ideal" representation would look like, even if impractical.

• Originality in finding or imagining other types of potentially unusual data that could be included; alternatively, justification for the use of simplicity.

• Practice in the formulation of hypotheses (potential features that might be predictive) that can be justified a priori and then tested experimentally.

• Clarity in what counts as an "instance" for each of the nomination categories.

• Clarity in what counts as a "feature" that can be operationalized, and what constitutes sensible values for that feature.

Homework 1, part IIa

• Ability to operationalize the abstract features from part I into a tangible implementation.

• Ambition and creativity in the collection of data from which features can be instantiated.

Homework 1, part IIb

• Understanding of the ways in which a human process can be understood as an "algorithm."

• Strong argument for the ways in which representation is consequential for learning.

• Strong argument for potential sources of bias.

• The use of specific mechanisms/techniques from data science to support your arguments.