
Introduction to Econometrics

Stefan Sperlich

Universite de Geneve

June 24, 2016

Stefan Sperlich (Universite de Geneve) Introduction to Econometrics June 24, 2016 1 / 253

Introduction General information

General information

Bibliography
• Wooldridge, J. (2009): Introductory Econometrics: A Modern Approach. South-Western.
• Greene, W. (2003): Econometric Analysis. Prentice Hall.
• Gujarati, D.N. and Porter, D.C. (2008): Basic Econometrics (4th edition). McGraw-Hill Publishing.
• Stock, J. and Watson, M. (2006): Introduction to Econometrics. Pearson Education.
• Judge, G., Hill, R., Griffiths, W., Lütkepohl, H. (1988): Introduction to the Theory and Practice of Econometrics. John Wiley & Sons.

Background
• Mathematics (linear algebra, integration, differentiation)
• Statistics
• Some probability
• Some numerics and programming
• (In parallel: applied econometrics)


Introduction Definition

What is Econometrics?

Pitmaster.com

“the application of statistical and mathematical methods in the field of economics to test and quantify economic theories and the solutions to economic problems.”

The Economist’s Dictionary of Economics

“The setting up of ... mathematical models describing economic relationships, testing the validity of such hypotheses and estimating the parameters ... to ... measure ... influences of the different ... variables.”



Wikipedia

“Econometrics, literally means ‘economic measurement’ ... branch of economics that applies statistical methods to the empirical study of economic theories ... combination of mathematical economics, statistics, economic statistics and economic theory.”

Hicks (1946)

“pure economics has a remarkable way of producing rabbits out of a hat – apparently a priori propositions which apparently refer to reality. It is fascinating to try to discover how the rabbits got in; ...”



Goldberger (1989)

“... econometrics is what econometricians do”

Sherlock Holmes (Sir Arthur Conan Doyle)

“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to fit facts.”


Introduction Motivation

Examples from microeconomics: Relationships between ...

• Production and factors (at any level, ...)

• Supply and demand, salaries and human capital

• consumer behavior and marketing strategies



Examples from macroeconomics: Relationships between ...

• GDP and ...?...

• Poverty, development and growth

• Free trade and the wealth of nations



Examples from the financial world:

• Modelling share prices and stock returns

• Valuation of derivatives (options, hedge-funds, ...)

• Estimation of conditional volatilities and risks (investment, ...)



Fundamental questions of Econometrics

• identification and estimation of ceteris paribus relationships and effects (discussions about ‘spurious regression’, ‘Granger causality’, etc.)

• prediction and simulation

• validation


Introduction Examples

The Question of Causality

• Simply establishing a relationship between variables is rarely sufficient
• We need the effect to be considered causal
• If we have truly controlled for all other effects, then the estimated ceteris paribus effect can be considered causal
• It can be difficult to establish causality

Example: Returns to Education

A model of human capital investment implies that getting more education should lead to higher earnings. In the simplest case:

$$\text{Earnings} = \beta_0 + \beta_1\,\text{Education} + u$$

The estimate of $\beta_1$ is the return to education, but can it be considered causal? Note that the error term $u$ includes all other factors affecting earnings: can we “control” for them this way? What if some of them are related to education?
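The worry in the last question can be made concrete with a small simulation. This is a minimal sketch under made-up assumptions: a hypothetical unobserved "ability" raises both education and earnings, with a true return of 0.08 and an ability effect of 0.5 chosen purely for illustration (NumPy is assumed to be available). Because ability sits in $u$ and is correlated with education, the slope of earnings on education alone tends to $0.08 + 0.5 \cdot \mathrm{Cov}(\text{Education}, \text{ability})/\mathrm{Var}(\text{Education}) = 0.08 + 0.5 \cdot 2/5 = 0.28$ rather than 0.08.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical unobserved factor ("ability"): part of u in the simple model
ability = rng.normal(size=n)
# Education is assumed to rise with ability, so Education and u are correlated
education = 12 + 2 * ability + rng.normal(size=n)
beta0, beta1 = 1.0, 0.08  # assumed true intercept and return to education
earnings = beta0 + beta1 * education + 0.5 * ability + rng.normal(scale=0.3, size=n)

def slope(x, y):
    """Least-squares slope of y on x (with intercept): Cov(x, y) / Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

b1_short = slope(education, earnings)  # ability is left in the error term
print(f"true beta1 = {beta1}, estimate ignoring ability = {b1_short:.3f}")
```

In this simulated setting the estimate overstates the causal return because it also picks up the effect of the omitted factor.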


Some background in Probability and Statistics


• To analyse economic models

• by exploring the data

• it is necessary to estimate, test...

• hence it is necessary to work with the tools of Statistics


Some background in Probability and Statistics Random variables

Random variables

• let us denote by Y some individual characteristic that we are interested in

• these characteristics might vary in time or across individuals

Examples

gender, education, canton, visits to the doctor, age, salary

• For every individual ω ∈ Ω we consider

$Y(\omega) \in \mathbb{R}$, i.e. $Y : \Omega \to \mathbb{R}$, even if we talk about cantons (which then have to be coded as numbers)

• It is a function that assigns a real value to each individual

• Notation: $Y_i$ or $Y_t$ or even $Y_{it}$

• This enables us to describe the characteristics / properties of Ω in terms of formulas



Types of variables (random)

We can distinguish between categorical variables (nominal, ordinal or dichotomous) and numerical or quantitative variables (discrete or continuous)

Examples

nominal: gender (male/female), $Y : \Omega \to \{0, 1\}$

ordinal: education, $Y : \Omega \to \{0, 1, 2, 3, 4, 5\}$ (no education / primary school / secondary school / bachelor / master / Ph.D.)

numerical discrete: visits to the doctor, $Y : \Omega \to \mathbb{N}_0$

(positive integers and zero)

numerical continuous: salary, $Y : \Omega \to [0, \infty)$

Often we can choose the scale (for example, age can be treated as discrete or continuous)

Sometimes there are also aggregation problems


As we always assign numerical values to Y, it is enough to distinguish between discrete and continuous variables:

$Y : \Omega \to \mathbb{Z}$ (discrete, for example) or $Y : \Omega \to \mathbb{R}$ (continuous)

Attention!

The problem of scale also exists for ordinal variables: the distance between their values does not need to be the same!

Note

The variable $Y(\omega) \in \{0.1, 1.5, 2.03, 101.7\}$ is also discrete.

The set of values that the variable Y can take is the support of Y.

• for discrete random variables the support can be either finite or (countably) infinite

• for continuous random variables it is (uncountably) infinite, but it may be composed of a finite number of intervals, such as $[0, 1] \cup [1.5, 3) \cup (4, 7.1]$

However, it is possible to mix both types of variables, for example salary $\in \{0\} \cup [\text{minimum wage}, \infty)$



Summary: random variable Y

• It is a function that assigns a numerical value to each event/subject/individual ω

• of a well-defined population set (Ω)

• referring to characteristics of interest (age, gender, salary, ...)

• that (should) vary across individuals

• the relative frequency of a characteristic in the population gives the probability of observing this characteristic when an individual is drawn at random

Bayesian ideas: an alternative approach

Subjective probability: the parameter itself is treated as a random variable, which is updated according to the information available



Probability Space or Probability Triple

• Let Ω be the sample space (population), with ω ∈ Ω and $Y : \Omega \to \text{Support}$

• Over the entire population, Y can take different values (e.g. $y_1, y_2, \dots$) from the support, with corresponding relative frequencies (probabilities) that represent the percentage of elements ω for which Y(ω) takes that value

• The cumulative percentages go from 0 to 1 (100%)

• The mathematical expression (formula) that describes the probabilities is called the distribution, denoted F

• The probability space is therefore given by (Ω,Y ,F)

How can we characterize this distribution?


Some background in Probability and Statistics The distribution

The probability function

Easiest case: the variables are discrete (and finite)

Example: gender shares, female = 1, male = 0

$P(Y(\omega) = 0 \mid \Omega) = 0.51$ and $P(Y(\omega) = 1 \mid \Omega) = 0.49$ (sum = 1)

It works in the same way for education, etc.

Attention!

If the support is infinite, we still have

$$\sum_{i=0}^{\infty} P(Y(\omega) = y_i \mid \Omega) = 1$$

even though $P(y_i) > 0$ for every $i$.

Hence, if the share of females in Ω is 0.49, the probability of drawing a female (random drawing) is 0.49
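The link between population shares and the probability of a random draw can be illustrated by simulation. This is a minimal sketch assuming NumPy and a hypothetical population of 10,000 people coded as above (female = 1, male = 0) with a female share of 0.49. Drawing many individuals at random, the relative frequency of females settles near 0.49.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical population of 10,000 people: female = 1, male = 0
population = np.array([1] * 4_900 + [0] * 5_100)

share_female = population.mean()             # relative frequency in Omega
draws = rng.choice(population, size=50_000)  # random draws with replacement
freq_female = draws.mean()                   # relative frequency among the draws
print(share_female, round(float(freq_female), 3))
```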



Empirical distribution functions for a discrete and a continuous random variable
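An empirical distribution function like those in the figure can be computed directly from a sample. Here is a minimal sketch assuming NumPy. The visit counts are invented for illustration, and the helper name `ecdf` is hypothetical. For the discrete sample the result is a step function with jumps at the observed values. For a large continuous sample it approaches a smooth curve.

```python
import numpy as np

def ecdf(sample):
    """Return F_n, where F_n(y) is the share of observations <= y."""
    data = np.sort(np.asarray(sample, dtype=float))

    def F_n(y):
        # side="right" counts every observation that is <= y
        return np.searchsorted(data, y, side="right") / data.size

    return F_n

# Discrete example (hypothetical visits to the doctor): a step function
F_visits = ecdf([0, 0, 1, 2, 2, 2, 5])
print(F_visits(-1), F_visits(0), F_visits(2), F_visits(10))

# Continuous example: standard normal draws, F_n(0) should be near 0.5
rng = np.random.default_rng(8)
F_normal = ecdf(rng.normal(size=10_000))
print(F_normal(0.0))
```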



The density function

What happens if the random variable is continuous? The probability of any single value is zero: $P(y_i) = 0 \;\forall i$

What can we do? Even though $P(Y = y) = 0$ for every single value y, an interval I can have positive probability, provided that $1 = P(\Omega) = P(Y \in \text{Support})$ and the support is a union of intervals!

Then, for the discrete case we have

$$P(Y \in I) = \sum_{y_i \in I} P(y_i) = \int_I dF(y)$$

and for the continuous case

$$P(Y \in I) = \int_I dF(y) = \int_I f(y)\,dy$$

where the distribution function F is the integral of the density f, so that $dF(y)/dy = f(y)$



Continuous distribution

Bottom: the probabilities described by a density; top: the cumulative probabilities, i.e. the distribution function



The histogram as a proxy for a density function, etc.
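As a rough illustration of the histogram as a density proxy, consider a simulated standard normal sample. This sketch assumes NumPy, and the bin count and range are arbitrary choices. It rescales a histogram to have total area 1 and compares the bar heights with the true density at the bin midpoints. With more data and narrower bins, the gap tends to shrink.

```python
import numpy as np

rng = np.random.default_rng(2)
sample = rng.normal(size=20_000)   # draws from a standard normal

# density=True rescales the bar heights so that the histogram's total area is 1
heights, edges = np.histogram(sample, bins=40, range=(-4, 4), density=True)
widths = np.diff(edges)
mids = (edges[:-1] + edges[1:]) / 2
true_pdf = np.exp(-mids**2 / 2) / np.sqrt(2 * np.pi)

area = float(np.sum(heights * widths))
max_gap = float(np.max(np.abs(heights - true_pdf)))
print("area under the histogram:", area)
print("largest gap to the true density:", round(max_gap, 3))
```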

Recall:

1. The integral over [a, b] (or (a, b], [a, b), (a, b)) is the area, interpreted as a probability:

$$\int_a^b f(y)\,dy = \int_a^b dF(y) = P(Y \in [a, b]) = F(b) - F(a)$$

2. $f \ge 0$, with $f > 0$ on supp(Y), and

$$\int_{-\infty}^{\infty} f(y)\,dy = \int_{\text{supp}(Y)} f(y)\,dy = P(Y \in \text{supp}(Y)) = 1$$
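Both properties can be checked numerically for the standard normal density. This sketch uses a simple trapezoidal rule, an approximation chosen for transparency rather than the only option. It computes F from `math.erf`. Truncating the real line at ±10 is an assumption that drops only a negligible tail.

```python
import math
import numpy as np

def f(y):
    """Standard normal density (works on NumPy arrays)."""
    return np.exp(-y**2 / 2) / math.sqrt(2 * math.pi)

def F(y):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + math.erf(y / math.sqrt(2)))

def integrate(g, a, b, n=10_001):
    """Trapezoidal rule on a fine grid: an approximation, not an exact integral."""
    y = np.linspace(a, b, n)
    vals = g(y)
    return float(np.sum((vals[1:] + vals[:-1]) / 2 * np.diff(y)))

a, b = -1.0, 2.0
area = integrate(f, a, b)        # integral of the density over [a, b]
print(area, F(b) - F(a))         # both about 0.8186
total = integrate(f, -10, 10)    # the whole real line, truncated at +-10
print(total)                     # about 1
```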



The (standard) normal

• If we have a continuous random variable, that is, if we do not have a mixture of continuous/discrete distributions, F is also continuous (without discontinuity jumps)

• and the best-known distribution among them is the GAUSSIAN density function (normal, bell-shaped curve, ...)

Extract from the 10 DM banknote, with historic buildings of Göttingen



The (cumulative) distribution function

• $P(a \le Y \le b) = F(b) - F(a) = \int_a^b f(y)\,dy$ (for continuous Y)

• it is continuous or a step function (recall the graphics)

• $0 \le F(u) \le 1$, $F(u) \to 0$ as $u \to -\infty$, $F(u) \to 1$ as $u \to \infty$

• the jumps are located where the probabilities are positive

Examples of distribution functions with jumps:

• bilateral trade

• received wages (compare net - gross)



Examples of mixtures of distribution functions (continuous and discrete)

1. Example: a distribution function with positive probability at zero for a r.v. with Supp(Y) = (−∞, ∞)

2. Example: a density with positive probability at zero for a r.v. with Supp(Y) = [0, ∞)
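Censoring at zero is one common way such a mixture arises, and it is assumed here purely for illustration: Y = max(0, Z), with Z continuous (here Z ~ N(0.5, 1), an arbitrary choice). Y then has a point mass at zero, which is a jump of F at 0, and a density on (0, ∞), as in the second example. The sketch assumes NumPy.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(loc=0.5, scale=1.0, size=100_000)  # latent continuous variable
y = np.maximum(z, 0.0)                             # censored at zero: Supp(Y) = [0, inf)

mass_at_zero = np.mean(y == 0.0)                   # discrete part: share of draws with Y = 0
theory = 0.5 * (1 + math.erf((0 - 0.5) / math.sqrt(2)))  # P(Z <= 0) = P(Y = 0)
print(round(float(mass_at_zero), 3), round(theory, 3))
```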


Some background in Probability and Statistics Moments of the model

Expected value of a random variable / distribution

Let Y be a random variable and g a measurable function.

We define the expected value of g(Y) as (S denotes the support of Y)

$$E[g(Y)] = \int_S g(y)\,dF(y) = \begin{cases} \int_S g(y)\, f(y)\,dy, & \text{if } Y \text{ is continuous} \\ \sum_i g(y_i)\, P(Y = y_i), & \text{if } Y \text{ is discrete} \end{cases}$$

You already know that

• The expected value of Y is the expression above with g(Y) = Y (g = identity)

• The variance of Y is the expression above with $g(Y) = (Y - E[Y])^2$

• etc.
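Here is a minimal sketch of the discrete formula. It uses a fair die as a textbook example and exact fractions to avoid rounding. The helper name `expectation` is hypothetical. Choosing g as the identity gives the mean, and g(Y) = (Y − E[Y])² gives the variance.

```python
from fractions import Fraction

def expectation(g, support, probs):
    """E[g(Y)] = sum_i g(y_i) * P(Y = y_i) for a discrete random variable."""
    assert sum(probs) == 1, "probabilities must sum to 1"
    return sum(g(y) * p for y, p in zip(support, probs))

# Fair die: support {1, ..., 6}, each value with probability 1/6
support = range(1, 7)
probs = [Fraction(1, 6)] * 6

mean = expectation(lambda y: y, support, probs)                    # g = identity
variance = expectation(lambda y: (y - mean) ** 2, support, probs)  # g(Y) = (Y - E[Y])^2
print(mean, variance)   # 7/2 35/12
```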



The moments of a random variable

We denote by $E[Y^k]$, with $k \in \mathbb{N}_0$, the k-th moment of Y.

In the literature one also defines the k-th centered moment (centered at the expected value) as

$$E\left[(Y - E[Y])^k\right], \quad k \in \mathbb{N}_0.$$

• The 0th moment $E[(Y - E[Y])^0] = E[Y^0]$ is always 1
• The 1st moment $E[Y^1] = E[Y]$ is the mean
• The 2nd centered moment $E[(Y - E[Y])^2] =: \sigma^2$ is the variance
• The 3rd centered moment, normalized∗, $E[(Y - E[Y])^3] \cdot \sigma^{-3}$ is the asymmetry (skewness)
• The 4th centered moment, normalized∗, $E[(Y - E[Y])^4] \cdot \sigma^{-4} - 3$ is the kurtosis

∗ Normalized means that for the standard normal distribution the asymmetry and the kurtosis both equal 0
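The sample analogues of these moments can be sketched as follows. The sketch assumes NumPy and plug-in averages that divide by n. Software packages often apply small-sample corrections, so their numbers can differ slightly. For a large standard normal sample, the asymmetry and kurtosis should both be close to 0.

```python
import numpy as np

def standardized_moments(sample):
    """Plug-in sample mean, variance, asymmetry (skewness) and kurtosis.

    Averages divide by n; small-sample corrections used elsewhere would
    give slightly different values.
    """
    y = np.asarray(sample, dtype=float)
    mu = y.mean()
    centered = y - mu
    var = np.mean(centered**2)                  # 2nd centered moment, sigma^2
    sigma = np.sqrt(var)
    skew = np.mean(centered**3) / sigma**3      # normalized 3rd centered moment
    kurt = np.mean(centered**4) / sigma**4 - 3  # normalized 4th centered moment minus 3
    return float(mu), float(var), float(skew), float(kurt)

# Symmetric data: asymmetry 0, kurtosis about -1.3
print(standardized_moments([1, 2, 3, 4, 5]))

rng = np.random.default_rng(4)
normal_moments = standardized_moments(rng.normal(size=200_000))
print(normal_moments)   # mean, variance, asymmetry, kurtosis near 0, 1, 0, 0
```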



Vectors

Individuals or subjects ω from Ω frequently exhibit several characteristics; for example, let Y be a variable that assigns to each ω all the characteristics we have considered before:

(gender, educ, age, salary, visits to the doctor)

• Y is then a vector of k random variables, $Y : \Omega \to \mathbb{R}^k$ (here k = 5)

• Accordingly, $P, f, F : \mathbb{R}^k \to \mathbb{R}$ are multivariate functions

• and we talk about joint (or multivariate) probability / density / distribution

• The notation is sometimes less rigorous ...



Matrices

If several observations of some characteristics are available, it is easier to use matrix notation: for example, if Y is an n × k matrix

$$Y = \begin{pmatrix} Y_{11} & Y_{12} & \cdots & Y_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ Y_{n1} & Y_{n2} & \cdots & Y_{nk} \end{pmatrix}$$

To be reviewed during the seminar:

• adding, multiplying and transposing matrices

• square and symmetric matrices

• the rank and the determinant of a square matrix

• the trace and the inverse of a square matrix

• (eigenvalues and eigenvectors, quadratic forms)
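For the seminar review, the operations listed above can be tried out with NumPy. The matrices A and B and the vector x are arbitrary small examples. `eigvalsh` is used because A is symmetric.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # a square, symmetric example matrix
B = np.array([[1.0, 0.0],
              [3.0, 1.0]])

print(A + B)                      # addition (same dimensions)
print(A @ B)                      # matrix product
print(B.T)                        # transpose
print(np.allclose(A, A.T))        # symmetric: A equals its transpose
print(np.linalg.matrix_rank(A))   # rank
print(np.linalg.det(A))           # determinant, about 3
print(np.trace(A))                # trace: sum of the diagonal, 4
print(np.linalg.inv(A))           # inverse, exists since det(A) != 0
print(np.linalg.eigvalsh(A))      # eigenvalues of a symmetric matrix: 1 and 3
x = np.array([1.0, -1.0])
print(x @ A @ x)                  # quadratic form x'Ax = 2
```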



The idea of a model (tool)



We are interested in models ...

Ω → described by Y and F → probability/density → formula with parameters

Sometimes (in our course, often) we use a model to describe the relationships between the variables

Causality? We are going to study the problem of identification later ...



A model and the parameters of interest

• even though nowadays it is quite common to analyze the whole distribution of a population

• we start by analyzing some parameters that characterize the most important features of the distribution

• the most popular ones are the expected value (mean, location parameter) and the variance (standard deviation, scale parameter)

Attention!

The parameters (and their interpretation) depend on the chosen scale


Some background in Probability and Statistics Population & sample

The sample space: Population

• We describe the population Ω through the distribution function F

• using some observations from Y

• Why should we work with a model (population)?
  1. We do not always have a census at hand
  2. Even if we had one, we would like to make general statements



From the (hyper)population to the sample



Random sample, representative sampling

• From a population we ’always’ observe one sample

• Sample: a set of realizations of Y

• we would like to have a representative sample
• one way to do so is to sample / draw randomly
• random sampling ≅ representative sampling

Find the mistake!



Estimation and Inference ⇒ information about population

• We use the sample to analyze the population

• to estimate and test the unknown parameters of our model

• Example: we calculate the moments / parameters from the sample

• In order to make inference, it is necessary to analyze the sampling distribution

• Example: we construct confidence intervals

• Here, in this context, testing and building confidence intervals are equivalent
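The idea of a sampling distribution can be illustrated by repeating the sampling many times. This sketch assumes NumPy and a hypothetical normal population (μ = 10, σ = 2) with samples of size n = 50. It records the sample means and checks how often an approximate 95% interval ȳ ± 1.96·se covers the true μ. Using the normal quantile 1.96 instead of the t quantile is a simplifying choice, so coverage may fall slightly below 95%.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n, reps = 10.0, 2.0, 50, 2_000   # hypothetical population and design

means, covered = [], 0
for _ in range(reps):
    sample = rng.normal(mu, sigma, size=n)       # one random sample
    ybar = sample.mean()                         # estimate of the mean
    se = sample.std(ddof=1) / np.sqrt(n)         # estimated standard error
    lower, upper = ybar - 1.96 * se, ybar + 1.96 * se  # approximate 95% CI
    covered += int(lower <= mu <= upper)
    means.append(ybar)

print("sd of the sample means:", round(float(np.std(means)), 3),
      "theory sigma/sqrt(n):", round(sigma / np.sqrt(n), 3))
print("share of intervals covering mu:", covered / reps)
```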



Sampling distribution

Find the mistake!


Some background in Probability and Statistics Relationship among variables

Analyzing the relationship among variables

• Multivariate model: We have available Y and (vector) X

• we denote by f(y, x) / P(y, x) / F(y, x) the joint distribution:

$$\iint f(y, x)\,dx\,dy = 1, \qquad \sum_i \sum_j P(y_i, x_j) = 1, \qquad 0 \le F(y, x) \le 1$$

$$\int f(y, x)\,dx = f(y), \qquad \sum_i P(y, x_i) = P(y), \quad \text{etc.}$$

• independence: f(y, x) = f(x) · f(y), likewise for P and/or F

• covariance: Cov(Y, X) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[Y] · E[X]

• correlation: $\rho_{y,x} = \mathrm{Cov}(Y, X) / \sqrt{V[Y] \cdot V[X]} = \sigma_{yx} / (\sigma_y \sigma_x)$

• Uncorrelatedness and independence are not the same concept (independence ⇒ ρ = 0, but ρ = 0 ⇏ independence); exception: the (jointly) normal distribution
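A classic counterexample, chosen here for illustration, is X taking the values −2, −1, 0, 1, 2 with equal probability and Y = X². The covariance E[XY] − E[X]E[Y] = E[X³] − 0 is zero, yet Y is completely determined by X. The sketch assumes NumPy and treats the five points as equally likely.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # equally likely values, symmetric around 0
y = x**2                                     # Y is a deterministic function of X

cov = np.mean(x * y) - np.mean(x) * np.mean(y)   # E[XY] - E[X] E[Y]
print(cov)   # 0.0, so X and Y are uncorrelated

# ... but not independent: P(Y = 4 | X = 2) = 1, while P(Y = 4) = 2/5
p_y4 = np.mean(y == 4)
p_y4_given_x2 = np.mean(y[x == 2] == 4)
print(p_y4, p_y4_given_x2)
```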



Once again: do not confuse correlation with causality!




Conditional distribution: probability, density, moments

P(Y = y given that X = x) = P(y | x) = P(y, x)/P(x); similarly for f and F

Verify that these are again probabilities ($0 \le P(y \mid x) \le 1$, $\sum_i P(y_i \mid x) = 1$)

and densities, respectively ($0 \le f(y \mid x)$, $\int f(y \mid x)\,dy = 1$)

Example:

The probability of getting a good grade in the exam (Y > 5) changes (rises) with attendance at the lectures (X = x times)
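Here is a minimal sketch of the definition P(y | x) = P(y, x)/P(x) applied to the exam example. It uses an invented joint distribution, so the probabilities and the two attendance levels "low"/"high" are hypothetical. Exact fractions make it easy to verify that each conditional distribution sums to 1. They also show that, in this made-up table, the chance of a good grade rises with attendance.

```python
from fractions import Fraction as Fr

# Hypothetical joint probabilities P(y, x): y = 1 for a good grade, x = attendance level
joint = {
    (0, "low"): Fr(20, 100), (1, "low"): Fr(10, 100),
    (0, "high"): Fr(15, 100), (1, "high"): Fr(55, 100),
}
assert sum(joint.values()) == 1   # a valid joint distribution

def marginal_x(x):
    """P(x): sum the joint probabilities over all values of y."""
    return sum(p for (_, xi), p in joint.items() if xi == x)

def conditional(y, x):
    """P(y | x) = P(y, x) / P(x)."""
    return joint[(y, x)] / marginal_x(x)

for x in ("low", "high"):
    probs = [conditional(y, x) for y in (0, 1)]
    print(x, [str(p) for p in probs], "sum =", sum(probs))
```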



Conditional distribution: the moments

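Let P(y | x) = P(y, x)/P(x) and f(y | x) = f(y, x)/f(x). Then

$$E[Y \text{ given that } X = x] = E[Y \mid x] = \int y\,dF(y \mid x) = \begin{cases} \int y\, f(y \mid x)\,dy & \text{(continuous)} \\ \sum_i y_i\, P(y_i \mid x) & \text{(discrete)} \end{cases}$$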

Exercise: What happens if Y and X are independent random variables?

Examples:

• Explain salary stratified by education, economic sector, age, gender, ...

• Regress GDP on investment, active population, human capital

• Forecast mean production given a certain amount of inputs

The idea: obtain a model/forecast for the conditional mean
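In a sample, the simplest estimate of E[Y | X = x] for a categorical X is the average of Y within each group, for example salary stratified by education. The data below are invented for illustration, and the helper `conditional_means` is hypothetical.

```python
from collections import defaultdict

# Hypothetical (education level, salary in thousands) observations
data = [("primary", 40), ("primary", 44), ("secondary", 50),
        ("secondary", 54), ("secondary", 58), ("master", 70), ("master", 80)]

def conditional_means(pairs):
    """Sample analogue of E[Y | X = x]: the average of y within each group x."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in pairs:
        sums[x] += y
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in sums}

means_by_group = conditional_means(data)
print(means_by_group)   # {'primary': 42.0, 'secondary': 54.0, 'master': 75.0}
```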


The Linear Regression Model



The Linear Regression Model The univariate model

The (Simple) Linear Regression Model: Some definitions

$$y = \beta_0 + \beta_1 x + u \qquad (1)$$

In this model we refer to y as

• The response or explained variable (causality);

• or the dependent variable (model);

• or simply the left hand side variable

In the regression of y on x we denote by x the

• explanatory or control variable (causality)

• factor, regressor, ...

• independent variable or the covariate (model);

• right hand side variable



The Linear Regression Model: Some definitions 1

y = β0 + β1x + u

The parameter β0 is

• the constant or intercept (model)

The parameter β1 is

• the X coefficient (just indicating the name of the variable: age, education, ...)

• the slope (of X )

• for example the elasticity, in a log-linear model



The Linear Regression Model: Some definitions 2

In this model, we denote by u

• The error term. In statistics an ‘error’ is any departure from the underlying data-generating process

• due to: omission of explanatory variables, misspecification of the functional relationship, measurement errors in Y or X, etc.

• In econometrics it is more complicated and depends on the context. Usually it refers to unobserved individual heterogeneity (due to additional variables or other types of departure from the average behavior)

• Sometimes we call it the residual, but in general this term refers to the sample; if it refers to the population, it has more to do with forecasting, i.e. the departure from the model fitted to the data
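The difference between errors and residuals can be seen in a simulation where, unlike in practice, the errors u are known. This sketch assumes NumPy and made-up true parameters β0 = 1 and β1 = 0.5. It fits the line by least squares with `numpy.polyfit`, and that estimation method is only an assumption for illustration here. The residuals come from the fitted line, so they differ from the errors. With an intercept in the fit they average to zero up to rounding, while the errors generally do not.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
beta0, beta1 = 1.0, 0.5                # assumed true (population) parameters
x = rng.uniform(0, 10, size=n)
u = rng.normal(scale=1.0, size=n)      # errors: unobservable in real data
y = beta0 + beta1 * x + u

# Least-squares line; for deg=1 numpy.polyfit returns (slope, intercept)
b1_hat, b0_hat = np.polyfit(x, y, deg=1)
residuals = y - (b0_hat + b1_hat * x)  # residuals: computed from the sample fit

print("mean of errors:", round(float(u.mean()), 3))
print("mean of residuals:", round(float(residuals.mean()), 10))
print("errors and residuals differ:", not np.allclose(u, residuals))
```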



The conditional mean

Let f(x, y) be the joint density of (X, Y) and let f(x) and f(y) be the respective marginal densities. The conditional density of y | x is defined as f(y | x) = f(x, y)/f(x), and hence the conditional mean is $E[Y \mid X] = \int y\, f(y \mid X)\,dy$. In our model, where we know the functional form, we have

$$E[Y \mid X] = E[\beta_0 + \beta_1 X + u \mid X] = \beta_0 + \beta_1 X + E[u \mid X]$$

and, by the law of iterated expectations,

$$E[Y] = E\big[E[Y \mid X]\big] = \beta_0 + \beta_1 E[X] + E\big[E[u \mid X]\big]$$

From this expression we obtain

$$E[Y \mid X = x] = E[\beta_0 + \beta_1 X + u \mid X = x] = \beta_0 + \beta_1 x + E[u \mid X = x]$$

where E[u | X = x] is a scalar and E[u | X] is a random variable.
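The law of iterated expectations used above can be checked on a small discrete example. The joint distribution below is invented for illustration. For each fixed x, E[Y | X = x] is a number, and averaging these numbers with weights P(x) gives back E[Y].

```python
from fractions import Fraction as Fr

# Hypothetical joint distribution P(x, y) of two discrete variables
joint = {(0, 1): Fr(1, 10), (0, 3): Fr(3, 10),
         (1, 2): Fr(2, 10), (1, 6): Fr(4, 10)}

xs = {x for x, _ in joint}
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in xs}

def cond_mean(x):
    """E[Y | X = x] = sum_y y * P(x, y) / P(x): a number for each fixed x."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / p_x[x]

e_y = sum(y * p for (_, y), p in joint.items())   # E[Y] computed directly
e_e_y = sum(p_x[x] * cond_mean(x) for x in xs)    # E[E[Y | X]]
print(e_y, e_e_y)   # 19/5 19/5
```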



The model, graphic

In the case of homoskedasticity (Var(u | x) = σ², i.e. the variance of u does not depend on x) we have

Definitions: homoskedasticity: Var(u | x) = σ² is constant;

heteroskedasticity: Var(u | x) = σ²(x) is a function of x
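Here is a minimal simulation sketch of the two definitions. It assumes NumPy, x uniform on [1, 5], and an arbitrarily chosen heteroskedastic form Var(u | x) = (0.5x)². It compares the variance of u for small and large values of x. In the first case the variance is roughly constant, and in the second it grows with x.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
x = rng.uniform(1, 5, size=n)

u_homo = rng.normal(scale=1.0, size=n)   # Var(u | x) = 1 for every x
u_hetero = rng.normal(scale=0.5 * x)     # Var(u | x) = (0.5 x)^2, an assumed form

for name, u in [("homoskedastic", u_homo), ("heteroskedastic", u_hetero)]:
    var_low, var_high = u[x < 2].var(), u[x > 4].var()  # small x versus large x
    print(f"{name}: Var(u | x < 2) = {var_low:.2f}, Var(u | x > 4) = {var_high:.2f}")
```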

