Nonparametric regression models estimation in R

New Challenges for Statistical Software - The Use of R in Official Statistics, 27 March 2014

Maer Matei Monica Mihaela

Bucharest University of Economic Studies

National Scientific Research Institute for Labour and Social Protection

Eliza Olivia Lungu

National Scientific Research Institute for Labour and Social Protection

Theoretical background: nonparametric estimation of regression functions with both categorical and continuous data (Racine and Li, 2004)

Software solution: the R np package (Hayfield and Racine, 2008)

Practical problem: estimating the impact of overeducation on earnings

Objectives:

- To model a dataset comprising continuous, discrete, or categorical (nominal or ordinal) data, or any combination of these.

- To construct a more flexible model.

- To let the data determine an appropriate model without specifying functional forms for the objects being estimated.

METHOD: nonparametric regression based on kernel methods

Key notions

- generalized product kernels

- kernels for categorical data

- bandwidth selection
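
The key notions above can be illustrated with a minimal base-R sketch. It is not the np implementation: it combines a Gaussian kernel for one continuous regressor with the matching/non-matching kernel for one unordered categorical regressor (weight 1 when categories match, lambda otherwise), multiplies them into a generalized product kernel, and uses the result in a local-constant (Nadaraya-Watson) estimate. The function names, the simulated data, and the fixed bandwidths h and lambda are illustrative assumptions; in practice the bandwidths would be chosen by a data-driven method such as cross-validation.

```r
# Gaussian kernel for a continuous regressor, bandwidth h > 0
k_cont <- function(xi, x, h) dnorm((xi - x) / h) / h

# Kernel for an unordered categorical regressor:
# weight 1 when categories match, lambda in [0, 1] otherwise
k_cat <- function(zi, z, lambda) ifelse(zi == z, 1, lambda)

# Generalized product kernel: product of the per-variable kernels
k_prod <- function(xi, zi, x, z, h, lambda) {
  k_cont(xi, x, h) * k_cat(zi, z, lambda)
}

# Local-constant (Nadaraya-Watson) estimate of E[y | x = x0, z = z0]
nw_fit <- function(x0, z0, x, z, y, h, lambda) {
  w <- k_prod(x, z, x0, z0, h, lambda)
  sum(w * y) / sum(w)
}

# Simulated data (hypothetical): smooth effect of x plus a shift for group "b"
set.seed(1)
n <- 200
x <- runif(n)
z <- factor(sample(c("a", "b"), n, replace = TRUE))
y <- sin(2 * pi * x) + ifelse(z == "b", 1, 0) + rnorm(n, sd = 0.2)

# lambda = 0 uses only observations in the same category;
# lambda = 1 ignores the categorical variable entirely
fit_split  <- nw_fit(0.25, "b", x, z, y, h = 0.05, lambda = 0)
fit_pooled <- nw_fit(0.25, "b", x, z, y, h = 0.05, lambda = 1)
print(c(split = fit_split, pooled = fit_pooled))
```

Intermediate values of lambda borrow information across categories, which is what lets the estimator handle mixed data without splitting the sample into cells.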

R package “np” (Hayfield and Racine, 2008):

- density estimation

- regression and derivative estimation for both categorical and continuous data

- a range of kernel functions and bandwidth selection methods

- tests of significance for nonparametric regression

- a variety of bootstrap methods for computing standard errors, nonparametric confidence bounds, and bias-corrected bounds

- a variety of bandwidth types (fixed, nearest-neighbor, and adaptive nearest-neighbor)

FUNCTIONS

npunitest - tests equality of two univariate density/probability functions (Maasoumi and Racine, 2002).

npregbw - computes a bandwidth object for a p-variate kernel regression estimator defined over mixed continuous and discrete data, using the method of Racine and Li (2004) and Li and Racine (2004).

npreg - computes a kernel regression estimate of a one-dimensional dependent variable on p-variate explanatory data, given a set of evaluation points, training points (consisting of explanatory data and dependent data), and a bandwidth specification, using the method of Racine and Li (2004) and Li and Racine (2004).
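
A minimal sketch of how npregbw and npreg are typically combined, assuming the np package is installed. The data are simulated and the variable names (x, g, y) are hypothetical, not the variables used in this study; nmulti = 1 is chosen only to keep the example quick.

```r
library(np)
options(np.messages = FALSE)  # silence progress messages

# Simulated mixed data (hypothetical): one continuous, one categorical regressor
set.seed(42)
n <- 100
dat <- data.frame(
  x = runif(n),
  g = factor(sample(c("f", "m"), n, replace = TRUE))
)
dat$y <- sin(2 * pi * dat$x) + (dat$g == "m") + rnorm(n, sd = 0.3)

# Step 1: bandwidth object for a local-linear estimator,
# selected by least-squares cross-validation
bw <- npregbw(y ~ x + g, data = dat, regtype = "ll",
              bwmethod = "cv.ls", nmulti = 1)

# Step 2: kernel regression estimate using the selected bandwidths
model <- npreg(bws = bw)
summary(model)
```

Separating bandwidth selection from estimation means the costly search is run once and the resulting object can be reused.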

The difficulties we encountered relate to estimation time, especially when the bootstrap-based routines for significance testing are called.

- Execution time for most routines is exponentially increasing in the number of observations and increases with the number of variables involved.

- Data-driven bandwidth selection methods involving multivariate numerical search can be time-consuming, particularly for large datasets.

- A version of this package, npRmpi, is under development to facilitate computation involving large datasets.
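
For exploratory work, one way to shorten the bandwidth search is to loosen the search tolerances and reduce the number of restarts. The sketch below assumes the np package is installed and that npregbw accepts the nmulti, ftol, and tol arguments; the data are simulated and the tolerance values are illustrative, not the settings used in this study. Looser tolerances trade accuracy of the selected bandwidths for speed.

```r
library(np)
options(np.messages = FALSE)

# Simulated data (hypothetical): two continuous regressors, one irrelevant
set.seed(7)
n <- 150
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- dat$x1^2 + rnorm(n, sd = 0.2)

# Faster, less thorough search: one restart and loose tolerances
t_fast <- system.time(
  bw_fast <- npregbw(y ~ x1 + x2, data = dat,
                     nmulti = 1, ftol = 0.01, tol = 0.01)
)

# Search with the package defaults, for comparison
t_default <- system.time(
  bw_default <- npregbw(y ~ x1 + x2, data = dat)
)

# Elapsed seconds for each search (values depend on the machine)
print(c(fast = t_fast[["elapsed"]], default = t_default[["elapsed"]]))
```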

Estimating the impact of overeducation on earnings

- The REFLEX database includes information on the early career outcomes of school leavers graduating from ISCED 5 in 1999/2000, for 14 countries

- UK sample: 932 graduates

- Main independent variable: job-education match

Dependent variable: graduates' earnings

Other independent variables:

- gender

- number of months employed since graduation (totworkdu)

- number of months at current job (workdu)

Testing equality of the density functions

‘Srho’: 0.04526657

P Value: < 2.22e-16 ***

Null of equality is rejected at the 0.1% level
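
Output of this kind is produced by npunitest. A minimal sketch on simulated data, assuming the np package is installed; the two samples below are hypothetical and do not reproduce the samples compared in this study, and boot.num = 99 is chosen only to keep the run short.

```r
library(np)
options(np.messages = FALSE)

# Two simulated univariate samples (hypothetical) with different means
set.seed(123)
x <- rnorm(100)
y <- rnorm(100, mean = 1)

# Bootstrap test of the null hypothesis that the two densities are equal
test <- npunitest(x, y, boot.num = 99)

test$Srho  # value of the test statistic
test$P     # bootstrap p-value
```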

Significance tests for the explanatory variables and R-squared

Country  X1 (total work  X2 (work duration,  X1 (job-education  X2 (gender)  R-squared
         duration)       current job)        match)
UK       0.070           0.320               0.008              < 2.22e-16   0.145
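
A sketch of how significance tests and an R-squared of this kind can be obtained with np, assuming the package is installed. The data are simulated and the variable names (tenure, gender, wage) are hypothetical stand-ins, so the numbers will not match the table; boot.num = 99 keeps the bootstrap short.

```r
library(np)
options(np.messages = FALSE)

# Simulated data (hypothetical): one continuous and one categorical regressor
set.seed(2014)
n <- 100
dat <- data.frame(
  tenure = runif(n, 0, 60),
  gender = factor(sample(c("f", "m"), n, replace = TRUE))
)
dat$wage <- 10 + 0.05 * dat$tenure + 2 * (dat$gender == "m") + rnorm(n)

# Bandwidth selection and local-linear fit
bw <- npregbw(wage ~ tenure + gender, data = dat, regtype = "ll", nmulti = 1)
model <- npreg(bws = bw)

# Bootstrap significance test, one p-value per explanatory variable
sig <- npsigtest(bws = bw, boot.num = 99)

sig$P     # p-values, in the order of the regressors
model$R2  # R-squared of the nonparametric fit
```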

Partial local linear nonparametric response plots: UK case

Conclusions

The results allow us to understand the impact of overeducation on the earnings distribution without assuming a functional form for the relationship between overeducation and the earnings of higher education graduates.