Introduction to Econometrics · Lecture 2: Review of Basic Statistics & Randomized Controlled Trials (RCTs)

Zhaopeng Qu, Business School, Nanjing University, Sep. 24th, 2020 (106 slides)


  • Introduction to Econometrics

    Lecture 2: Review of Basic Statistics & Randomized Controlled Trials (RCTs)

    Zhaopeng Qu

    Business School, Nanjing University

    Sep. 24th, 2020

    Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 24th, 2020 1 / 106

  • Outline

    1 A Brief Review of Basic Statistics

    2 Review: Random Experiment as the Research Design

    3 What is an RCT?

    4 Assuming Case: the California School

    5 Limitations of RCTs

    6 An Example of Randomized Controlled Trials

  • A Brief Review of Basic Statistics

  • A Brief Review of Basic Statistics · Basic Concepts

    Population and Sample

    A population is a collection of people, items, or events about which you want to make inferences. A population always has a probability distribution.

    A sample is a subset of the population, drawn from it in a specified way. The sample also follows a probability distribution.

    To represent the population well, a sample should be randomly collected and adequately large.

  • Random Sample and i.i.d.

    Definition: The random variables (r.v.s) $X_1, \dots, X_n$ are called a random sample of size $n$ from the population $f(x)$ if $X_1, \dots, X_n$ are mutually independent and have the same p.d.f./p.m.f. $f(x)$. Equivalently, $X_1, \dots, X_n$ are called independent and identically distributed r.v.s with p.d.f./p.m.f. $f(x)$, commonly abbreviated to i.i.d. r.v.s.

    Example: a random sample of $n$ respondents in a survey.

  • Statistic and Sampling Distribution

    Definition: Let $X_1, \dots, X_n$ be a random sample of size $n$ from the population $f(x)$. A statistic $T$ is a real-valued or vector-valued function that depends only on $X_1, \dots, X_n$, that is,

    $$T = T(X_1, \dots, X_n)$$

    A statistic is a function of the sample only. The probability distribution of a statistic $T$ is called the sampling distribution of $T$.

  • Sample Mean and Sample Variance

    Two common and important estimators.

    Definition: The sample average or sample mean, $\bar{X}$, of the $n$ observations $X_1, \dots, X_n$ is

    $$\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i$$

    Accordingly, the sample variance is

    $$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$
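The two definitions on this slide translate almost line for line into code. The following is a minimal sketch using only the Python standard library; the function names and the six data values are made up for illustration.

```python
import statistics

def sample_mean(xs):
    """Sample mean: (1/n) times the sum of the observations."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical observations
xbar = sample_mean(data)
s2 = sample_variance(data)

# Cross-check against the standard library, which also uses n - 1.
library_s2 = statistics.variance(data)
```

For these values the sample mean is 5 and the squared deviations sum to 24, so the sample variance is 24 / 5 = 4.8.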

  • Sample Mean and Sample Variance

    If $X_i$ is a random variable (r.v.), then any function $f(X_i)$ of it is also an r.v. So if each $X_i$ is an r.v., then $\sum X_i$ is also an r.v.

    The sample mean and the sample variance are functions of such sums, so they are r.v.s too, and each has a probability distribution of its own.

    This raises a natural question: what are the expectation, variance, and p.d.f./c.d.f. of these distributions?

  • A simple case of the sample mean

    Let each $X_i$ take values in $[1, 100]$ and assume $n = 2$, so the sample consists only of $X_1$ and $X_2$.
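With only two draws, the exact sampling distribution of the sample mean can be enumerated. The sketch below assumes, purely for illustration, that each draw is uniform on the integers 1 to 100; the slide does not state the population distribution, so treat that as a hypothetical choice.

```python
from collections import Counter
from fractions import Fraction

# Assumption (not from the slide): each X_i is uniform on the integers 1..100.
values = range(1, 101)

# Enumerate every equally likely pair (x1, x2) and tally the sample mean.
counts = Counter(Fraction(x1 + x2, 2) for x1 in values for x2 in values)
total = sum(counts.values())  # 100 * 100 equally likely pairs
dist = {mean: Fraction(c, total) for mean, c in counts.items()}

# Expectation of the sample mean under its exact sampling distribution.
expected_mean = sum(mean * prob for mean, prob in dist.items())
```

Under this assumption the distribution is triangular: a mean of 50.5 can arise from 100 different pairs, while a mean of 1 arises only from the pair (1, 1).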

  • A Brief Review of Basic Statistics · Large-Sample Approximations to Sampling Distributions

    Sampling Distributions

    There are two approaches to characterizing sampling distributions:

    Exact/finite-sample distribution: the sampling distribution that exactly describes the distribution of $\bar{X}$ for any $n$ is called the exact or finite-sample distribution of $\bar{X}$.

    Approximate/asymptotic distribution: when the sample size $n$ is large, the sampling distribution is well approximated by a known limiting distribution.

  • Two Key Tools

    Two key tools are used to approximate sampling distributions when the sample size is large, that is, as $n \to \infty$:

    The Law of Large Numbers (L.L.N.): when the sample size is large, $\bar{Y}$ will be close to the population mean $\mu_Y$ with very high probability.

    The Central Limit Theorem (C.L.T.): when the sample size is large, the sampling distribution of the standardized sample average, $(\bar{Y} - \mu_Y)/\sigma_{\bar{Y}}$, is approximately normal.

  • Convergence in probability

    Definition: A sequence of random variables $X_1, X_2, \dots$ is said to converge in probability to a value $b$ if, for every $\varepsilon > 0$,

    $$P(|X_n - b| > \varepsilon) \to 0$$

    as $n \to \infty$. We denote this as $X_n \xrightarrow{p} b$ or $\operatorname{plim}(X_n) = b$.

    This is the probabilistic counterpart of the limit of a deterministic sequence.

  • The Law of Large Numbers

    Theorem: Let $X_1, \dots, X_n$ be i.i.d. draws from a distribution (a population) with mean $\mu$ and finite variance $\sigma^2$, and let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the sample mean. Then

    $$\bar{X}_n \xrightarrow{p} \mu$$

    Intuition: the distribution of $\bar{X}_n$ "collapses" on $\mu$. The larger the sample, the closer the sample mean is to the population mean (the sampling distribution becomes more concentrated).

  • A simple case

    Example: Suppose $X$ takes the binary values $X \in \{0, 1\}$ and has a Bernoulli distribution with probability mass function

    $$P(X = x) = \begin{cases} 0.78 & \text{if } x = 1 \\ 0.22 & \text{if } x = 0 \end{cases}$$

    Then $E(X) = p = 0.78$ and $\mathrm{Var}(X) = p(1 - p) = 0.1716$.
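The Law of Large Numbers can be watched at work on this Bernoulli example. The following is a small simulation sketch; the seed, the sample sizes, and the helper name `bernoulli_sample_mean` are arbitrary choices for illustration.

```python
import random

def bernoulli_sample_mean(n, p, rng):
    """Draw n Bernoulli(p) values and return their sample mean."""
    return sum(1 if rng.random() < p else 0 for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is reproducible
p = 0.78
population_variance = p * (1 - p)

# Sample means for increasingly large samples.
means = {n: bernoulli_sample_mean(n, p, rng) for n in (10, 100, 10_000, 100_000)}
```

Printing `means` should show the sample mean settling near 0.78 as `n` grows, although any single small-sample value can land well away from it.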


  • Convergence in Distribution

    Definition: Let $X_1, X_2, \dots$ be a sequence of r.v.s and, for $n = 1, 2, \dots$, let $F_n(x)$ be the c.d.f. of $X_n$. The sequence $X_1, X_2, \dots$ is said to converge in distribution to an r.v. $W$ with c.d.f. $F_W$ if

    $$\lim_{n \to \infty} F_n(x) = F_W(x)$$

    at every point $x$ where $F_W$ is continuous, which we write as $X_n \xrightarrow{d} W$.

    Basically: when $n$ is big, the distribution of $X_n$ is very similar to the distribution of $W$.

    Standardize: an r.v. is standardized by subtracting its expectation and dividing by its standard deviation,

    $$Z = \frac{X - E[X]}{\sqrt{\mathrm{Var}[X]}}$$

  • The Central Limit Theorem

    Theorem: Let $X_1, \dots, X_n$ be i.i.d. draws of sample size $n$ from a distribution with mean $\mu$ and variance $0 < \sigma^2 < \infty$. Then

    $$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \overset{d}{\sim} N(0, 1)$$

    The theorem makes no specific assumption about the distribution of $X_i$, so whatever that distribution is, when $n$ is big the standardized $\bar{X}_n$ is approximately $N(0, 1)$, or equivalently $\bar{X}_n \overset{d}{\sim} N(\mu, \sigma^2/n)$.

    Intuition: the larger the sample, the closer the distribution of the sample mean is to a normal distribution.
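A quick way to check the theorem numerically is to standardize many sample means drawn from a clearly non-normal population and see how often they fall within ±1.96. The sketch below assumes an Exponential(1) population (mean 1, standard deviation 1); that choice, the seed, and the sizes are illustrative only.

```python
import math
import random

rng = random.Random(42)
n, reps = 200, 5000
mu, sigma = 1.0, 1.0  # mean and sd of the assumed Exponential(1) population

def standardized_mean(n):
    """Draw n exponential values and return the standardized sample mean."""
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

z = [standardized_mean(n) for _ in range(reps)]

# Under N(0, 1) about 95% of values lie within +/- 1.96.
share_within = sum(abs(v) < 1.96 for v in z) / reps
```

If the normal approximation is working, `share_within` should come out close to 0.95 even though the underlying draws are strongly skewed.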


  • How large is "large enough"?

    How large must $n$ be for the distribution of $\bar{Y}$ to be approximately normal?

    The answer: it depends.

    If the $Y_i$ are themselves normally distributed, then $\bar{Y}$ is exactly normally distributed for all $n$.

    If the $Y_i$ have a distribution that is far from normal, then the approximation can require $n = 30$ or even more.

  • A Brief Review of Basic Statistics · Statistical Inference: Estimation, Confidence Intervals and Testing

    Statistical Inference: from Samples to Population

    Inference asks two questions: What is our best guess about some quantity of interest? What is a set of plausible values for the quantity of interest?

    Our focus: $\{Y_1, Y_2, \dots, Y_n\}$ are i.i.d. draws from $f(y)$ or $F(y)$, that is, the population distribution. Statistical inference uses samples to infer $f(y)$.

    Normally we do not need to know everything about the population; a few measures (the moments) are enough to describe its characteristics.

  • Statistical Inference: Point estimation

    Point estimation: providing a single "best guess" as to the value of some fixed, unknown quantity of interest, $\theta$, which is a feature of the population distribution $f(y)$. Examples:

    $\mu = E[Y]$

    $\sigma^2 = \mathrm{Var}[Y]$

    $\mu_Y - \mu_X = E[Y] - E[X]$

  • Three Characteristics of an Estimator

    Let $\hat{\mu}_Y$ denote an estimator of the population moment $\mu_Y$, and let $E(\hat{\mu}_Y)$ be the mean of the sampling distribution of $\hat{\mu}_Y$.

    1 Unbiasedness: the estimator of $\mu_Y$ is unbiased if $E(\hat{\mu}_Y) = \mu_Y$.

    2 Consistency: the estimator of $\mu_Y$ is consistent if $\hat{\mu}_Y \xrightarrow{p} \mu_Y$.

    3 Efficiency: let $\tilde{\mu}_Y$ be another estimator of $\mu_Y$ and suppose that both $\tilde{\mu}_Y$ and $\hat{\mu}_Y$ are unbiased. Then $\hat{\mu}_Y$ is said to be more efficient than $\tilde{\mu}_Y$ if $\mathrm{var}(\hat{\mu}_Y) < \mathrm{var}(\tilde{\mu}_Y)$.

    Comparing variances is only meaningful among unbiased estimators, because otherwise we could always use a trivial estimator, such as a constant, that has variance zero but is biased.


  • Properties of the Sample Mean

    Let $\mu_Y$ and $\sigma_Y^2$ denote the population mean and variance of $Y$, and let $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ be the sample mean of the $Y_i$.

    Then the expectation of the sample mean is

    $$E(\bar{Y}) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \mu_Y$$

    so $\bar{Y}$ is an unbiased estimator of $\mu_Y$. By the L.L.N., $\bar{Y} \xrightarrow{p} \mu_Y$, so $\bar{Y}$ is also consistent.

  • Properties of the Sample Mean

    The variance of the sample mean: because the $Y_i$ are independent, the variance of their sum is the sum of their variances, so

    $$\mathrm{Var}(\bar{Y}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(Y_i) = \frac{\sigma_Y^2}{n}$$

    The standard deviation of the sample mean is therefore $\sigma_{\bar{Y}} = \sigma_Y/\sqrt{n}$.

  • Properties of the Sample Mean

    By the C.L.T., for large $n$ the sample mean is approximately normal:

    $$\bar{Y} \overset{d}{\sim} N\left(\mu_Y, \frac{\sigma_Y^2}{n}\right)$$

    Let $Z$ be the standardized $\bar{Y}$; then

    $$Z = \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} \overset{d}{\sim} N(0, 1)$$

  • Properties of the Sample Variance

    Let $\mu_Y$ and $\sigma_Y^2$ denote the mean and variance of $Y_i$. The sample variance is

    $$S_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

    It can then be shown that:

    1 $E(S_Y^2) = \sigma_Y^2$, so $S_Y^2$ is an unbiased estimator of $\sigma_Y^2$. This is also the reason why the average uses the divisor $n - 1$ instead of $n$.

    2 $S_Y^2 \xrightarrow{p} \sigma_Y^2$, so the sample variance is a consistent estimator of the population variance.
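The role of the $n - 1$ divisor is easy to see in a simulation. The sketch below assumes a normal population with variance 4 and a deliberately small sample size of 5; those numbers and the seed are illustrative choices, not taken from the lecture.

```python
import random

rng = random.Random(1)
sigma2, n, reps = 4.0, 5, 40_000  # assumed population variance, sample size, repetitions

sum_unbiased = 0.0
sum_biased = 0.0
for _ in range(reps):
    ys = [rng.gauss(0.0, 2.0) for _ in range(n)]  # sd 2 means variance 4
    ybar = sum(ys) / n
    ss = sum((y - ybar) ** 2 for y in ys)  # sum of squared deviations
    sum_unbiased += ss / (n - 1)  # divisor n - 1
    sum_biased += ss / n          # divisor n

avg_unbiased = sum_unbiased / reps
avg_biased = sum_biased / reps
```

Averaged over many samples, the $n - 1$ version should sit near 4, while the version with divisor $n$ should sit near $4 \times (n-1)/n = 3.2$, which is the downward bias the slide refers to.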

  • The Standard Error

    Recall: the standardized sample mean approximately follows a standard normal distribution when $n$ is large.

    $$Z = \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} \overset{d}{\sim} N(0, 1)$$

    But in general $\sigma_Y$, the standard deviation of the population, is unknown, so we have to use the sample to estimate it.

  • The Standard Error

    Recall that $\sigma_{\bar{Y}} = \sigma_Y/\sqrt{n}$. Because $S_Y^2$ is an unbiased and consistent estimator of $\sigma_Y^2$, we can use $S_Y/\sqrt{n}$ as an estimator of the standard deviation of the sample mean, $\sigma_{\bar{Y}}$. It is called the standard error of the sample mean:

    $$SE[\bar{Y}] = \hat{\sigma}_{\bar{Y}} = \frac{S_Y}{\sqrt{n}}$$

    It is the estimated standard deviation of the sampling distribution of the sample mean, and so measures how much the sample mean varies from sample to sample.

  • Application: Sample Size and Standard Error

    [Figures not preserved in this transcript: a population distributed as $N(0, SD)$, followed by three panels for samples of size n = 250, n = 500, and n = 1000.]
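Since the figures for this application are not available here, the relationship they illustrate can be approximated with a simulation. The sketch below assumes a normal population with mean 0 and standard deviation 1 (the slides leave the SD unspecified) and compares the spread of simulated sample means with $\sigma/\sqrt{n}$ for the three sample sizes named on the slides.

```python
import math
import random
import statistics

rng = random.Random(7)
sigma, reps = 1.0, 1000  # assumed population SD and number of repeated samples

results = {}
for n in (250, 500, 1000):
    # Draw `reps` independent samples of size n and record each sample mean.
    means = [sum(rng.gauss(0.0, sigma) for _ in range(n)) / n for _ in range(reps)]
    # Pair the empirical SD of the sample means with the theoretical sigma / sqrt(n).
    results[n] = (statistics.stdev(means), sigma / math.sqrt(n))
```

Each pair in `results` should be close, and the spread should shrink as `n` grows: quadrupling the sample size from 250 to 1000 roughly halves the standard deviation of the sample mean.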

  • Recall: The Chi-Square Distribution

    Let $Z_i$ ($i = 1, 2, \dots, m$) be independent random variables, each distributed as standard normal. A new random variable can be defined as the sum of the squares of the $Z_i$:

    $$X = \sum_{i=1}^{m} Z_i^2$$

    Then $X$ has a chi-squared distribution with $m$ degrees of freedom.

    It can be proved that, when the $Y_i$ are i.i.d. draws from a normal population, a rescaled version of the sample variance follows a chi-square distribution:

    $$\frac{(n-1)S_Y^2}{\sigma_Y^2} \sim \chi^2_{n-1}$$

  • The Student-t Distribution

    The Student t distribution can be obtained from a standard normal and a chi-square random variable. Let $Z$ have a standard normal distribution, let $X$ have a chi-square distribution with $n$ degrees of freedom, and assume that $Z$ and $X$ are independent. Then the random variable

    $$T = \frac{Z}{\sqrt{X/n}}$$

    has a t-distribution with $n$ degrees of freedom, denoted $T \sim t_n$.

    Consequently, the sample mean standardized with $S_Y$ in place of $\sigma_Y$ follows a Student t distribution (exactly so when the population is normal):

    $$Z_t = \frac{\bar{Y} - \mu_Y}{S_Y/\sqrt{n}} \sim t_{n-1}$$
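Both constructions, the chi-square as a sum of squared standard normals and the t as a ratio, can be simulated directly from their definitions. This is a sketch with an arbitrary choice of 5 degrees of freedom; the helper names are hypothetical.

```python
import math
import random

rng = random.Random(3)
df, reps = 5, 20_000  # degrees of freedom and number of simulated draws

def chi_square_draw(m):
    """Sum of m squared independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))

def t_draw(m):
    """Z / sqrt(X / m) with Z standard normal and X chi-square(m), independent."""
    z = rng.gauss(0.0, 1.0)
    x = chi_square_draw(m)
    return z / math.sqrt(x / m)

chi_draws = [chi_square_draw(df) for _ in range(reps)]
t_draws = [t_draw(df) for _ in range(reps)]

mean_chi = sum(chi_draws) / reps  # a chi-square(m) variable has mean m
tail_share = sum(abs(t) > 1.96 for t in t_draws) / reps
```

With so few degrees of freedom, `tail_share` should be noticeably above the 5% that a standard normal puts outside ±1.96, which shows the heavier tails of the t distribution.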

  • The Student-t Distribution

    The distinction does not matter much in large samples: as the degrees of freedom, which equal $n - 1$ and so grow with the sample size $n$, get large, the t-distribution approaches the standard normal distribution.

  • A Brief Review of Basic Statistics · Interval Estimation and Confidence Intervals

    Interval Estimation

    A point estimate provides no information about how close the estimate is likely to be to the population parameter.

    We cannot know how close an estimate for a particular sample is to the population parameter, because the population parameter is unknown.

    A different (complementary) approach to estimation is to produce a range of values that will contain the truth with some fixed probability.

  • What is a Confidence Interval?

    Definition: A $100(1 - \alpha)\%$ confidence interval for a population parameter $\theta$ is an interval $C_n = (a, b)$, where $a = a(Y_1, \dots, Y_n)$ and $b = b(Y_1, \dots, Y_n)$ are functions of the data such that

    $$P(a < \theta < b) = 1 - \alpha$$

    Here $1 - \alpha$ is called the confidence level and $\alpha$ the significance level. The key is how to obtain or construct the values of $a$ and $b$.

  • Interval Estimation and Confidence Intervals

    Suppose the population has a normal distribution $N(\mu, \sigma^2)$ and let $Y_1, Y_2, \dots, Y_n$ be a random sample from the population.

    Then the sample mean $\bar{Y}$ has a normal distribution: $\bar{Y} \sim N(\mu, \sigma^2/n)$.

    The standardized sample mean $Z$ is given by $Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.

    Choose constants $a < b$ such that $P(a < Z < b) = 1 - \alpha$, that is,

    $$P\left(a < \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < b\right) = 1 - \alpha$$

    Solving the inequalities for $\mu$ gives

    $$P\left(\bar{Y} - b\,\sigma/\sqrt{n} < \mu < \bar{Y} - a\,\sigma/\sqrt{n}\right) = 1 - \alpha$$

    With the symmetric choice $a = -c$, $b = c$ this is the interval $\bar{Y} \pm c\,\sigma/\sqrt{n}$. Thus the random interval contains the population mean with probability $1 - \alpha$.

  • Interval Estimation and Confidence Intervals

    Two cases: $\sigma$ is known and $\sigma$ is unknown.

    When $\sigma$ is known, for example $\sigma = 1$, so that $Y \sim N(\mu, 1)$, then

    $$\bar{Y} \sim N\left(\mu, \frac{\sigma^2}{n}\right) = N\left(\mu, \frac{1}{n}\right)$$

    The standardized version of $\bar{Y}$ has a standard normal distribution, so with $\alpha = 0.05$ we have

    $$P\left(-1.96 < \frac{\bar{Y} - \mu}{1/\sqrt{n}} < 1.96\right) = 1 - 0.05$$

    The event in parentheses is identical to the event $\bar{Y} - 1.96/\sqrt{n} \le \mu \le \bar{Y} + 1.96/\sqrt{n}$, so

    $$P\left(\bar{Y} - 1.96/\sqrt{n} \le \mu \le \bar{Y} + 1.96/\sqrt{n}\right) = 0.95$$

    The interval estimate of $\mu$ may be written as $[\bar{Y} - 1.96/\sqrt{n},\ \bar{Y} + 1.96/\sqrt{n}]$.
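The statement that the random interval covers $\mu$ with probability 0.95 can be checked by repeated sampling. The sketch below follows the known-variance case with $\sigma = 1$; the true mean of 3, the sample size of 25, and the seed are hypothetical values chosen for the illustration.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(11)
mu, sigma, n, reps = 3.0, 1.0, 25, 10_000  # assumed true mean, known sd, sample size
c = NormalDist().inv_cdf(0.975)  # 97.5th percentile of N(0, 1), about 1.96

covered = 0
for _ in range(reps):
    ybar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
    half_width = c * sigma / math.sqrt(n)
    # Does this sample's interval contain the true mean?
    if ybar - half_width <= mu <= ybar + half_width:
        covered += 1

coverage = covered / reps
```

The interval moves from sample to sample while $\mu$ stays fixed; `coverage` estimates the share of samples whose interval contains $\mu$ and should be close to 0.95.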

  • Interval Estimation and Confidence Intervals

    When $\sigma$ is unknown, we can replace it with an estimate, using the standard error $SE[\bar{Y}] = \hat{\sigma}_{\bar{Y}} = S_Y/\sqrt{n}$:

    $$Z_t = \frac{\bar{Y} - \mu_Y}{S_Y/\sqrt{n}} = \frac{\bar{Y} - \mu_Y}{SE(\bar{Y})}$$

    As noted earlier, it follows a Student t distribution:

    $$Z_t \sim t_{n-1}$$

    Definition: The t-statistic or t-ratio is

    $$\frac{\bar{Y} - \mu_Y}{SE(\bar{Y})} \sim t_{n-1}$$

  • Interval Estimation and Confidence Intervals

    To construct a 95% confidence interval, let $c$ denote the 97.5th percentile of the $t_{n-1}$ distribution. Then

    $$P(-c < t \le c) = 0.95$$

    More generally, let $c_{\alpha/2}$ be the critical value of the t distribution, the $100(1 - \alpha/2)$th percentile. The confidence interval may be written as $[\bar{Y} \pm c_{\alpha/2}\, S_Y/\sqrt{n}]$.


  • A simple rule of thumb for a 95% confidence interval

    As the degrees of freedom $n - 1$ grow with the sample size $n$, the t-distribution approaches the standard normal distribution. Since $\Phi(1.96) = 0.975$, a rule of thumb for an approximate 95% confidence interval is

    $$[\bar{Y} \pm 1.96 \times SE(\bar{Y})]$$

    or, rounding,

    $$[\bar{Y} \pm 2 \times SE(\bar{Y})]$$

  • A Brief Review of Basic Statistics · Hypothesis Testing

    Hypothesis Testing

    Definition: A hypothesis is a statement about a population parameter $\theta$. Formally, we want to test whether $\theta$ is significantly different from a certain value $\mu_0$:

    $$H_0 : \theta = \mu_0$$

    which is called the null hypothesis. The (two-sided) alternative hypothesis is

    $$H_1 : \theta \ne \mu_0$$

    If the value $\mu_0$ does not lie within the calculated confidence interval, then we reject the null hypothesis. If the value $\mu_0$ lies within the calculated confidence interval, then we fail to reject the null hypothesis.
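The confidence-interval version of the test can be sketched in a few lines. The example below uses the large-sample interval of the sample mean plus or minus 1.96 standard errors; the function names, the simulated data, and the two hypothesized values are all assumptions made for illustration.

```python
import math
import random
import statistics

def approx_ci_95(ys):
    """Approximate 95% confidence interval: sample mean +/- 1.96 standard errors."""
    ybar = statistics.fmean(ys)
    se = statistics.stdev(ys) / math.sqrt(len(ys))  # S_Y / sqrt(n)
    return ybar - 1.96 * se, ybar + 1.96 * se

def reject_null(ys, mu0):
    """Reject H0: mu = mu0 when mu0 lies outside the interval."""
    lo, hi = approx_ci_95(ys)
    return not (lo <= mu0 <= hi)

rng = random.Random(5)
sample = [rng.gauss(10.0, 2.0) for _ in range(400)]  # hypothetical data
lo, hi = approx_ci_95(sample)

far_value_rejected = reject_null(sample, 12.0)
center_rejected = reject_null(sample, statistics.fmean(sample))
```

A hypothesized value far from the data, such as 12 here, falls outside the interval and is rejected, while a value at the centre of the interval is not.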

  • Introduction

    In criminal law, institutions in most countries follow the rule "innocent until proven guilty".

    The prosecutor wants to prove their hypothesis that the accused person is guilty. However, the burden is on the prosecutor to show guilt.

    The jury or judge starts with the "null hypothesis" that the accused person is innocent.

  • Introduction

    In program evaluations, instead of "presumption of innocence", the rule is "presumption of insignificance".

    Policymaker's hypothesis: the program improves learning.

    Evaluators approach experiments using the hypothesis that there is zero impact of the program. We then test this "null hypothesis".

    The burden of proof is on the program: it should show a statistically significant impact.

  • ...

    .

    ...

    .

    ...

    .

    ...

    .

    ...

    .

    ...

    .

    ...

    .

    ...

    .

    ...

    .

    ...

    .

    An Brief Review of Basic Statistics Hypothesis Testing(假设检验)

    Two Type Errors(两种错误)In both cases, there is a certain risk that our conclusion is wrong

    Type I ErrorA Type I error is when we reject the null hypothesis when it is in facttrue.A Type II error is when we fail to reject the null hypothesis when it isfalse.

    In criminal trialThe Type I : the judge reject the null hypothesis when thesuspect is actually no guilty.“宁可错杀一千,不能放过一个”

    The Type II: the judge fail to reject the null hypothesiswhen the suspect is actually guilty.“宁可放过一千,不能错杀一个”


    The Significance level(显著性水平)

    Definition: the significance level, or size, of a test is the maximum probability of a Type I error:

    $$P(\text{Type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true}) = \alpha$$

    Usually, we have to carry the "burden of proof": we would like to prove that the assertion of H1 is true by showing that the data reject H0.
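The definition of α can be checked by simulation. The sketch below is an illustration only, not part of the lecture: it assumes standard normal data with a true mean of zero, a two-sided test with critical value 1.96, and hypothetical settings (2,000 replications, n = 50, seed 0). It counts how often a true H0 is rejected.

```python
import random
import statistics

def type_one_error_rate(n_sims=2000, n=50, z_crit=1.96, seed=0):
    """Share of simulated samples in which a true H0 (mean = 0) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # H0 is true by construction: the data have population mean 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        if abs(mean / se) > z_crit:  # two-sided rejection rule
            rejections += 1
    return rejections / n_sims

rate = type_one_error_rate()
print(f"Empirical Type I error rate: {rate:.3f}")
```

The printed share should land near the nominal 5%, with some simulation noise and a small upward drift because n = 50 is only moderately large.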


    Testing procedure

    The steps of hypothesis testing are:
    1 Specify H0 and H1.
    2 Choose the significance level α.
    3 Define a decision rule (critical value).
    4 Given the data, compute the test statistic and see whether it falls into the critical region.


    Decision Rule

    The decision rule that leads us to reject or not to reject H0 is based on a test statistic, which is a function of the data:

    $$T_n = T(Y_1, \ldots, Y_n)$$

    Usually, one rejects H0 if the test statistic falls into a critical region (rejection region). The critical region is constructed by taking into account the probability of making a wrong decision, that is, α.
    By convention, α is chosen to be a small number, for example α = 0.01, 0.05, or 0.10.
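To see how the choice of α pins down the critical region, here is a minimal sketch. It assumes a two-sided test and a test statistic that is standard normal under H0 (the large-sample case); the function name `two_sided_critical_value` is hypothetical.

```python
from statistics import NormalDist

def two_sided_critical_value(alpha):
    """Critical value c such that P(|Z| > c) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# The three conventional significance levels named above.
critical_values = {alpha: two_sided_critical_value(alpha) for alpha in (0.01, 0.05, 0.10)}
for alpha, c in critical_values.items():
    print(f"alpha = {alpha:.2f}: reject H0 if |T_n| > {c:.3f}")
```

A smaller α pushes the critical value outward, so stronger evidence is needed before H0 is rejected.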


    P-Value

    To provide additional information, we could ask: what is the largest significance level at which we could carry out the test and still fail to reject the null hypothesis?
    In other words: given the data, what is the smallest significance level at which the null can be rejected?
    This is the p-value of a test:

    1 Calculate the t-statistic t.
    2 The largest significance level at which we would fail to reject H0 is the significance level associated with using t as our critical value:

    $$p\text{-value} = 1 - \Phi(t)$$

    where Φ denotes the standard normal c.d.f. (we assume that n is large enough). This formula is for a one-sided alternative, in which large values of t count as evidence against H0.


    P-Value

    Suppose that t = 1.52. Then the largest significance level at which we would fail to reject H0 is

    $$p\text{-value} = P(T > 1.52 \mid H_0) = 1 - \Phi(1.52) \approx 0.064$$
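The calculation for t = 1.52 can be reproduced with the standard normal c.d.f. from Python's standard library. This is a sketch under the same large-sample assumption; the two-sided variant is added only for comparison and is not in the slide.

```python
from statistics import NormalDist

def one_sided_p_value(t_stat):
    """P(T > t_stat) when T is standard normal under H0."""
    return 1.0 - NormalDist().cdf(t_stat)

p_one_sided = one_sided_p_value(1.52)
# For a two-sided alternative both tails count, so the p-value doubles.
p_two_sided = 2 * one_sided_p_value(abs(1.52))

print(f"one-sided p-value: {p_one_sided:.4f}")
print(f"two-sided p-value: {p_two_sided:.4f}")
```

With the one-sided p-value of roughly 0.064, H0 is rejected at the 10% level but not at the 5% level.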


    Hypothesis Test of Ȳ

    Specify H0 and H1:

    $$H_0: E[Y] = \mu_{Y,0} \qquad H_1: E[Y] \neq \mu_{Y,0}$$

    Choose the significance level α and define a decision rule (critical region or critical value).

    E.g., if we choose α = 0.05, then the critical value is 1.96 and the critical region is (−∞, −1.96] and [1.96, +∞).


    Hypothesis Test of Ȳ

    Given the data, compute the test statistic:
    Step 1: Compute the sample average Ȳ.
    Step 2: Compute the standard error of Ȳ:

    $$SE(\bar{Y}) = \frac{s_Y}{\sqrt{n}}$$

    Step 3: Compute the t-statistic:

    $$t^{act} = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}$$

    Step 4: Reject the null hypothesis if |t^act| > critical value, or equivalently if p-value < significance level.
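The four steps above can be sketched as one small function. This is an illustration under the large-sample normal approximation; the function name `mean_test` and the sample values are hypothetical, not data from the lecture.

```python
import math
import statistics
from statistics import NormalDist

def mean_test(sample, mu_0, alpha=0.05):
    """Two-sided large-sample test of H0: E[Y] = mu_0."""
    n = len(sample)
    y_bar = statistics.fmean(sample)               # Step 1: sample average
    se = statistics.stdev(sample) / math.sqrt(n)   # Step 2: SE = s_Y / sqrt(n)
    t_act = (y_bar - mu_0) / se                    # Step 3: t-statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(t_act)))
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return {                                       # Step 4: decision
        "y_bar": y_bar,
        "se": se,
        "t_act": t_act,
        "p_value": p_value,
        "reject": abs(t_act) > critical,
    }

# Hypothetical sample, used only to exercise the function.
scores = [652.1, 660.4, 648.9, 655.0, 661.7, 650.3, 657.8, 654.2]
result = mean_test(scores, mu_0=650.0)
print(result)
```

The two rejection rules in Step 4 agree by construction: |t^act| exceeds the critical value exactly when the two-sided p-value is below α. With a sample this small the normal approximation is rough, so the numbers are illustrative only.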


    Review: Random Experiment as the Research Design


    Recall the last lecture

    The Core of Empirical Studies: Causality vs. Forecasting
    The Central Question of Causality

    Rubin Causal Model: comparing counterfactuals, or potential outcomes.
    However, we can never observe both counterfactuals: this is the fundamental problem of causal inference.


    Recall the last lecture

    Random assignment solves the selection problem.
    We should treat experimental design as a benchmark.
    To construct the counterfactuals, there are two broad categories of empirical strategies.

    Randomized Controlled Trials/Experiments: they can eliminate selection bias, which is the most important bias that arises in empirical research. If we could observe the counterfactual directly, there would be no evaluation problem, just a simple difference.

    Program Evaluation Econometrics: the various approaches using naturally occurring data provide alternative methods of constructing the proper counterfactual.


    What is an RCT?


    Randomized Controlled Trials (RCTs) (随机可控试验)

    In essence, an RCT is an experiment carried out on two or more groups where participants are randomly assigned to receive an intervention or not.

    Participants are randomly assigned to either a treatment group, who are given the intervention, or a control group, who are not.

    In RCTs, each group is tested at the end of the trial and the results from the groups are compared to see whether the intervention has made a difference and achieved its desired outcome. If the randomized groups are large enough, you can be confident that the differences observed are due to the intervention and not some other factor.
    RCTs are considered the gold standard for establishing a causal link between an intervention and change.
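Why large randomized groups make a fair comparison can be illustrated with a short sketch. Everything here is an assumption for illustration: 1,000 hypothetical participants, a made-up pre-treatment characteristic called `baseline`, and a simple shuffle-and-split assignment rule.

```python
import random
import statistics

def randomize(units, seed=0):
    """Shuffle the units and split them into treatment and control halves."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical pre-treatment characteristic for 1,000 participants.
rng = random.Random(1)
baseline = {i: rng.gauss(50, 10) for i in range(1000)}

treated, control = randomize(baseline.keys())
gap = (statistics.fmean(baseline[i] for i in treated)
       - statistics.fmean(baseline[i] for i in control))
print(f"Baseline gap between groups before any intervention: {gap:.2f}")
```

Because assignment ignores every characteristic of the participants, the two groups look alike on average before the intervention, so a later difference in outcomes can be attributed to the intervention.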


    RCT in History

    The first recorded RCT was done in 1747 by James Lind, a Scottish physician in the Royal Navy.
    Scurvy (坏血病) is a terrible disease caused by vitamin C deficiency. It was a serious issue during long sea voyages.
    Lind took 12 sailors with scurvy and split them into six groups of two.
    The groups were assigned:

    (1) 1 qt of cider (苹果酒)
    (2) 25 drops of vitriol (硫酸)
    (3) 6 spoonfuls of vinegar
    (4) 1/2 pt of sea water
    (5) garlic, mustard (芥末) and barley water (大麦汤)
    (6) 2 oranges and 1 lemon


    RCT in History

    Only Group 6 (citrus fruit) showed substantial improvement.


    RCT in History

    Ronald A. Fisher (1890-1962), British statistician and geneticist who pioneered the application of statistical procedures to the design of scientific experiments.
    "A genius who almost single-handedly created the foundations for modern statistical science."
    Rothamsted Experimental Station


    RCTs in Economics


    Randomized Experimental Methods: Nobel Prize 2019


    RCTs in Social Policies

    According to Baruch (1978), 245 randomized field experiments had been conducted in the U.S. for social policy evaluation up to 1978.
    This huge effort was prompted by the 1% of every social budget devoted to evaluation.
    Some of them were ambitious and very costly, and covered different kinds of policies:

    the Perry Preschool Program in 1961
    the Rand Health Insurance Experiment, 1974-1982


    Education: the Perry Preschool Program

    123 children born between 1958 and 1962 in Michigan.
    Half of them (drawn at random) entered the Perry Preschool Program at 3 or 4 years old.
    Education by skilled professionals in nurseries and kindergartens.
    Program duration: circa 30 weeks.
    Follow-up surveys at ages 14, 15, 19, 27 and 40.


    Health Care: The Rand Health Insurance Experiment

    5,809 people were randomly assigned in 1974 to different insurance programs with 0%, 25%, 50% and 75% cost sharing.
    They were followed until 1982.
    Main result: paying a portion of health costs makes people give up some "superfluous" care, with little harm to their health.
    But there is some heterogeneity: this is not true for poor people.


    RCTs in China

    "One egg a day" program in rural China, by REAP at Stanford.
    https://reap.fsi.stanford.edu/research/egg_programs_and_nutritional_deficiencies_in_gansu_worth_the_effort

    "Free lunch" program in primary schools in Western China.
    http://www.mianfeiwucan.org/


    RCT in Business

    An interesting question: what is the optimal color for taxis?
    Ho, Chong and Xia (2017), "Yellow taxis have fewer accidents than blue taxis because yellow is more visible than blue," PNAS.
    https://www.pnas.org/content/early/2017/02/28/1612551114


    RCT in Business

    Another critical question for business: is working at home better than working at the office?

    Bloom, Liang, Roberts and Ying (2015), "Does Working from Home Work? Evidence from a Chinese Experiment," The Quarterly Journal of Economics.
    https://academic.oup.com/qje/article-abstract/130/1/165/2337855?redirectedFrom=fulltext


    Types of RCTs

    Lab experiments
    e.g., students take part in an experiment in a classroom.
    e.g., a computer gambling game in the lab.

    Field experiments
    e.g., the role of women in household decisions, or fake resumes in job applications.

    Quasi-experiments or natural experiments: some unexpected institutional change or natural shock
    e.g., German reunification (德国统一), the Great Famine in China (1959-1961 年大饥荒) and the U.S. bombing of Vietnam (美国轰炸越南).


    Assuming Case: the California School


    A Case: the California School

    Draw schools (n = 420) randomly from all schools in California.
    Variables:

    5th grade test scores (Stanford-9 achievement test, combined math and reading), district average
    Student-teacher ratio (STR) = number of students in the district divided by the number of full-time equivalent teachers


    Summary Table: Descriptive Statistics

    Does this table tell us anything about the relationship between test scores and the STR?


    Scatterplot: test score v. student-teacher ratio

    What does this figure show, and what might it suggest?


    The California Test Score

    We need some numerical evidence on whether districts with low STRs have higher test scores. But how?

    1 Compare average test scores in districts with low STRs to those with high STRs ("estimation").
    2 Test the "null" hypothesis that the mean test scores in the two types of districts are the same, against the "alternative" hypothesis that they differ ("hypothesis testing").
    3 Estimate an interval for the difference in the mean test scores, high vs. low STR districts ("confidence interval").


    The California Test Score

    Compare districts with "small" and "large" class sizes:

    Class size          Average score (Ȳ)   Standard deviation   N
    Small (STR < 20)    657.4               19.4                 238
    Large (STR ⩾ 20)    650.0               17.9                 182

    1 Estimate ∆ = the difference between group means.
    2 Test the hypothesis that ∆ = 0.
    3 Construct a confidence interval for ∆.


    An Example: Comparing Means from Different Populations

    In an RCT, we would like to estimate the average causal effect over the population:

    $$ATE = ATT = E\{Y_i(1) - Y_i(0)\}$$

    We only have random samples and random assignment to treatment, so what we can estimate instead is

    $$\text{difference in means} = \bar{Y}_{treated} - \bar{Y}_{control}$$

    Under randomization, the difference in means is a good estimate of the ATE.
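The claim that the difference in means recovers the ATE under randomization can be illustrated with simulated potential outcomes. All numbers below are hypothetical (10,000 units, a true average effect of 3); in real data only one of Y(1) and Y(0) is ever observed for each unit, which is exactly why the simulation is useful.

```python
import random
import statistics

rng = random.Random(42)
n = 10_000

# Hypothetical potential outcomes: Y(0) without treatment, Y(1) with treatment.
y0 = [rng.gauss(10.0, 2.0) for _ in range(n)]
y1 = [y + 3.0 + rng.gauss(0.0, 1.0) for y in y0]  # true ATE is about 3

# Only a simulation lets us compute this, because it uses both outcomes per unit.
true_ate = statistics.fmean(a - b for a, b in zip(y1, y0))

# Random assignment: each unit is treated with probability 1/2.
treated_flags = [rng.random() < 0.5 for _ in range(n)]

# Observed data: Y(1) for treated units, Y(0) for control units.
treated_outcomes = [y1[i] for i in range(n) if treated_flags[i]]
control_outcomes = [y0[i] for i in range(n) if not treated_flags[i]]
diff_in_means = statistics.fmean(treated_outcomes) - statistics.fmean(control_outcomes)

print(f"true ATE:            {true_ate:.3f}")
print(f"difference in means: {diff_in_means:.3f}")
```

The two printed numbers should be close, and the gap shrinks as the sample grows. If assignment instead depended on Y(0), the same comparison would pick up selection bias.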


    Hypothesis Tests for the Difference Between Two Means

    To illustrate a test for the difference between two means, let $\mu_w$ be the mean hourly earnings in the population of women recently graduated from college and let $\mu_m$ be the population mean for recently graduated men.
    Then the null hypothesis and the two-sided alternative hypothesis are

    $$H_0: \mu_m = \mu_w \qquad H_1: \mu_m \neq \mu_w$$

    More generally, consider the null hypothesis that mean earnings for these two populations differ by a certain amount, say $d_0$, so that $H_0: \mu_m - \mu_w = d_0$. The null hypothesis that men and women in these populations have the same mean earnings corresponds to $d_0 = 0$.


    The Difference Between Two Means

    Suppose we have samples of $n_m$ men and $n_w$ women drawn at random from their populations. Let the sample average earnings be $\bar{Y}_m$ for men and $\bar{Y}_w$ for women. Then an estimator of $\mu_m - \mu_w$ is $\bar{Y}_m - \bar{Y}_w$.
    Let us discuss the distribution of $\bar{Y}_m - \bar{Y}_w$. In large samples it is approximately normal:

    $$\bar{Y}_m - \bar{Y}_w \sim N\left(\mu_m - \mu_w,\ \frac{\sigma_m^2}{n_m} + \frac{\sigma_w^2}{n_w}\right)$$

    If $\sigma_m^2$ and $\sigma_w^2$ are known, then this approximate normal distribution can be used to compute p-values for the test of the null hypothesis. In practice, however, these population variances are typically unknown, so they must be estimated.
    Thus the standard error of $\bar{Y}_m - \bar{Y}_w$ is

    $$SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}}$$
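As a worked instance of the standard error formula, the sketch below plugs in the summary statistics from the California class-size table earlier in this lecture (small classes: s = 19.4, n = 238; large classes: s = 17.9, n = 182). The function name `se_diff_means` is hypothetical.

```python
import math

def se_diff_means(s_a, n_a, s_b, n_b):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)

# Small-class districts vs. large-class districts.
se = se_diff_means(19.4, 238, 17.9, 182)
print(f"SE of the difference in mean test scores: {se:.2f}")
```

The same function applies to the earnings example by passing $s_m$, $n_m$, $s_w$ and $n_w$.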


    The Difference Between Two Means

    The t-statistic for testing the null hypothesis is constructed analogously to the t-statistic for testing a hypothesis about a single population mean. The t-statistic for comparing two means is

    $$t^{act} = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)}$$

    If both $n_m$ and $n_w$ are large, then this t-statistic has a standard normal distribution when the null hypothesis is true, that is, when $\mu_m - \mu_w = d_0$.


    Confidence Intervals for the Difference Between Two Means

    The 95% two-sided confidence interval for $d = \mu_m - \mu_w$ consists of those values of $d$ within ±1.96 standard errors of $\bar{Y}_m - \bar{Y}_w$:

    $$(\bar{Y}_m - \bar{Y}_w) \pm 1.96\, SE(\bar{Y}_m - \bar{Y}_w)$$
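The interval formula can be applied to the California class-size table from earlier in this lecture (small classes: mean 657.4, s = 19.4, n = 238; large classes: mean 650.0, s = 17.9, n = 182). This is a minimal sketch; the function name `diff_in_means_ci` is hypothetical.

```python
import math

def diff_in_means_ci(mean_a, s_a, n_a, mean_b, s_b, n_b, z=1.96):
    """Large-sample confidence interval for the difference of two means."""
    diff = mean_a - mean_b
    se = math.sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)
    return diff - z * se, diff + z * se

low, high = diff_in_means_ci(657.4, 19.4, 238, 650.0, 17.9, 182)
print(f"95% CI for the small-minus-large score gap: ({low:.1f}, {high:.1f})")
```

The interval excludes zero, which matches rejecting the hypothesis ∆ = 0 at the 5% level.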


    Hypothesis Test of the Difference Between Two Means

    Reject the null hypothesis if

    $$|t^{act}| = \left|\frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)}\right| > \text{critical value}$$

    or, equivalently, if p-value < significance level.
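The rejection rule can be sketched as a single function and applied to the California class-size table from earlier in this lecture, testing ∆ = 0. It assumes the large-sample normal approximation and a two-sided alternative; the function name `two_sample_test` is hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_test(mean_a, s_a, n_a, mean_b, s_b, n_b, d0=0.0, alpha=0.05):
    """Two-sided large-sample test of H0: mu_a - mu_b = d0."""
    se = math.sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)
    t_act = ((mean_a - mean_b) - d0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(t_act)))
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return t_act, p_value, abs(t_act) > critical

# Small classes (STR < 20) vs. large classes (STR >= 20).
t_act, p_value, reject = two_sample_test(657.4, 19.4, 238, 650.0, 17.9, 182)
print(f"t = {t_act:.2f}, p-value = {p_value:.5f}, reject H0: {reject}")
```

The t-statistic is far beyond 1.96, so the equal-means hypothesis is rejected. That is evidence of a difference between the two groups of districts, not by itself evidence that class size causes it, since these data are observational.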


    Limitations of RCTs


    RCTs are far from perfect!

    High costs, long duration.
    Potential ethical problems: "Parachutes reduce the risk of injury after gravitational challenge, but their effectiveness has not been proved with randomized controlled trials."

    Milgram Experiment
    Stanford Prison Experiment
    Monkey Experiment

    Limited generalizability.
    RCTs allow us to gain knowledge about causal effects, but without knowing the mechanism.


    Potential Problems in Practice

    Small sample: Student Effect
    Hawthorne effect (霍桑效应): subjects who know they are in an experiment can change their behavior.
    Attrition (样本流失): subjects drop out of the study after being randomly assigned to the treatment or control group.
    Failure to randomize or failure to follow the treatment protocol: people don't always do what they are told.

    e.g., the eyeglasses program in rural Western China.


    Program Evaluation Econometrics (项目评估计量经济学)

    Question: How can we do empirical research scientifically when we cannot run experiments? Without an experiment we always have selection bias in our data, or, in other terms, "endogeneity."
    Answer: The core of econometric methods is to build a reasonable counterfactual world from naturally occurring data, that is, to find a proper control group.
    Here are the "Furious Seven" weapons of applied econometrics (七种盖世武器):

    1 Randomized Controlled Trials (RCT) (随机实验)
    2 OLS (最小二乘回归)
    3 Decomposition (分解)
    4 Instrumental Variable (工具变量)
    5 Differences in Differences (双差分)
    6 Matching and Propensity Score (匹配)
    7 Regression Discontinuity (断点回归)
    8 Synthetic Control (合成控制法)


    An Example of Randomized Controlled Trials


    Working from Home (WFH) vs. Working from the Office

    "Does Working from Home Work? Evidence from a Chinese Experiment," by Nicholas A. Bloom, James Liang, John Roberts and Zhichun Jenny Ying, The Quarterly Journal of Economics, February 2015, Vol. 130, Issue 1, Pages 165-218.
    Basic question: does WFH (Working from Home) mean SFH (Shirking from Home)?


    Working from Home(WFH) is a trend internationally


    Motivations

    Working from home is a modern management practice that appears to be spreading in the US and Europe.
    20 million people in the US report working from home at least once per week.
    There is little evidence on the effect of workplace flexibility on:

    productivity
    employee satisfaction
    shirking


    Ctrip Experiment

    Ctrip, China's largest travel agent, with 16,000 employees ($6bn, NASDAQ).
    http://www.ctrip.com
    The co-founder of Ctrip, James Liang, was an Econ PhD at Stanford and decided to run an experiment to test WFH.
    http://www.itb-china.com/speaker/mr-james-jianzhang-liang/


    Ctrip Experiment: A call center in Shanghai

    The experiment was run in the airfare and hotel departments in Shanghai.
    Main work: employees take calls and make bookings.


    The Experimental Design

    Treatment: work 4 shifts (days) a week at home and the 5th shift in the office on a fixed day.
    Control: work in the office on all 5 days.


    The Experimental Design: Timeline

    In early November 2010, employees in the airfare and hotel booking departments were informed of the WFH program.
    Of the 994 employees in the airfare and hotel booking departments, 503 (51%) volunteered for the experiment.
    Among the volunteers, 249 (50%) met the eligibility requirements and were recruited into the experiment.
    The treatment and control groups were then determined from this group of 249 employees through a public lottery.
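The recruitment funnel and the lottery step can be sketched as follows. The head counts (994, 503, 249) come from the slide; the shuffle-and-split rule, the seed and the near-equal group sizes are assumptions made for illustration and are not the study's actual assignment procedure or group sizes.

```python
import random

total, volunteers, eligible = 994, 503, 249

share_volunteered = volunteers / total   # volunteers among all employees
share_eligible = eligible / volunteers   # eligible among volunteers
print(f"volunteered: {share_volunteered:.1%}, eligible among volunteers: {share_eligible:.1%}")

# Hypothetical lottery: shuffle anonymous employee ids and split them in two.
rng = random.Random(2010)
ids = list(range(eligible))
rng.shuffle(ids)
treatment_ids = ids[: eligible // 2]
control_ids = ids[eligible // 2:]
print(f"treatment: {len(treatment_ids)}, control: {len(control_ids)}")
```

Note that randomization happens only among eligible volunteers, so the estimated effect applies to that group rather than to all 994 employees.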


    Results: the number of calls received


    Results: Working hours


    Results:Many Outcomes


    Conclusion: Very positive

    They found a highly significant 13% increase in employee performance from WFH, of which about 9% was from employees working more minutes of their shift period (fewer breaks and sick days) and about 4% from higher performance per minute.

    Home workers also reported substantially higher work satisfaction and psychological attitude scores, and their job attrition rates fell by over 50%.
