
Disentangling Age-Period-Cohort Effects: New Models, Methods, and Empirical Applications

Kenneth C. Land, Duke University

PRI Summer Methodology Workshop Presentation

Pennsylvania State University

June 16, 2008


Objectives of the Presentation

Briefly Review the Early Literature on Cohort Analysis and the Age-Period-Cohort (APC) Identification Problem

Describe Models, Methods, and Empirical Applications Recently Developed for APC Analysis in Three Research Designs:

1) APC Analysis of Age-by-Time Period Tables of Rates

2) APC Analysis of Microdata from Repeated Cross-Section Surveys

3) Cohort Analysis of Accelerated Longitudinal Panel Designs


Part I: The Early Literature on Cohort Analysis and the Age-Period-Cohort (APC) Identification Problem

Why cohort analysis? See the abstract from Norman Ryder’s classic article:

Ryder, Norman B. 1965. “The Cohort as A Concept in the Study of Social Change.” American Sociological Review 30:843-861.

And what is the APC identification problem? See the abstract from the classic Mason et al. article:

Mason, Karen Oppenheim, William M. Mason, H. H. Winsborough, W. Kenneth Poole. 1973. “Some Methodological Issues in Cohort Analysis of Archival Data.” American Sociological Review 38:242-258.

These two articles were particularly important in framing the literature on cohort analysis in sociology, demography, and the social sciences over the past five decades:

Ryder (1965) argued that cohort membership could be as important in determining behavior as other social structural features such as socioeconomic status.

Mason et al. (1973) specified the APC multiple classification/accounting model and defined the identification problem therein.

The Mason et al. (1973) article, in particular, spawned a large methodological literature, beginning with Norval Glenn’s (1976) critique:

Glenn, N. D. 1976. “Cohort Analysts’ Futile Quest: Statistical Attempts to Separate Age, Period, and Cohort Effects.” American Sociological Review, 41:900–905.

and Mason et al.’s (1976) reply:

Mason, W. M., K. O. Mason, and H. H. Winsborough. 1976. “Reply to Glenn.” American Sociological Review, 41:904-905.

The Mason et al. reply continued with Bill Mason’s work with Stephen Fienberg:

Fienberg, Stephen E. and William M. Mason. 1978. "Identification and Estimation of Age-Period-Cohort Models in the Analysis of Discrete Archival Data." Sociological Methodology 8:1-67,

which culminated in their 1985 edited volume:

Fienberg, Stephen E. and William M. Mason, Eds. 1985. Cohort Analysis in Social Research. New York: Springer-Verlag,

a volume that surveyed the methodological literature on APC analysis in the social sciences as of about 25 years ago.

The critiques of new approaches also continued; see, e.g., the article applying a Bayesian statistics approach:

Sasaki, M., & Suzuki, T. 1987. “Changes in Religious Commitment in the United States, Holland, and Japan.” American Journal of Sociology, 92:1055–1076,

and the critique:

Glenn, N. D. 1989. “A Caution About Mechanical Solutions to the Identification Problem in Cohort Analysis: A Comment on Sasaki and Suzuki.” American Journal of Sociology, 95:754–761.



Another approach, developed by Firebaugh (1989), is based on a decomposition of change over time into the relative contributions of intracohort aging and cohort replacement; see Danigelis, Hardy, and Cutler (2007) for a recent application.

Firebaugh, Glenn. 1989. “Methods for Estimating Cohort Replacement Effects.” Sociological Methodology 19:243-262.

Danigelis, Nicholas, Melissa Hardy, and Stephen J. Cutler. 2007. “Population Aging, Intracohort Aging, and Sociopolitical Attitudes.” American Sociological Review 72:812-830.



This decomposition method, called for by Glenn (1977) and developed by Firebaugh, was critiqued by Rodgers (1990), with a reply by Firebaugh (1990). More recently, Glenn (2005: 36) has argued that neither this nor any similar approach to decomposition “is very helpful for understanding change.”

Firebaugh, Glenn. 1990. “Replacement Effects, Cohort and Otherwise: Response to Rodgers.” Sociological Methodology 20:439-446.

Glenn, Norval D. 1977 [2005] Cohort Analysis, [2nd edition]. Thousand Oaks, CA: Sage.

Rodgers, Willard L. 1990. “Interpreting the Components of Time Trends.” Sociological Methodology 20:421-438.

For additional material on these and related contributions to the literature on cohort analysis, see the following recent reviews:

Mason, William M. and N. H. Wolfinger. 2002. “Cohort Analysis.” Pp. 151-228 in International Encyclopedia of the Social and Behavioral Sciences. New York: Elsevier.

Yang, Yang. 2006. “Age/Period/Cohort Distinctions.” Encyclopedia of Health and Aging, K.S. Markides (ed). Thousand Oaks, CA: Sage Publications.

Where does this literature on cohort analysis leave us today?

If a researcher has a temporally-ordered dataset and wants to tease out its age, period, and cohort components, how should he/she proceed?

Are there any methodological guidelines that can be recommended?

The problem with much of the extant literature is that it offers few useful guidelines on how to conduct an APC analysis. Rather, the literature often leads to the conclusion either that:

it is impossible to obtain meaningful estimates of the distinct contributions of age, time period, and cohort to the study of social change,

or that:

the conduct of an APC analysis is an esoteric art that is best left to a few skilled methodologists.

My collaborators (Wenjiang Fu, Sam Schulhofer-Wohl, and Yang Yang) and I seek to redress this situation by focusing on recent methodological contributions to APC analysis that we and others have made for three relatively common research designs.

We think that:

developments in statistics over the past three decades (e.g., mixed (fixed and random) effects models, MCMC estimation of Bayesian models) can lead to better methods for APC analysis that can be applied by ordinary social scientists, and

this, in turn, can lead to the accumulation of more reliable knowledge about age, period, and cohort dynamics.

Part II: First Research Design: APC Analysis of Age-by-Time Period Tables of Rates or Proportions

Major References for Part II:

Fu, W. J. 2000. “Ridge Estimator in Singular Design with Application to Age-Period-Cohort Analysis of Disease Rates.” Communications in Statistics--Theory and Method 29:263-278.

Yang Yang, Wenjiang J. Fu, and Kenneth C. Land. 2004. “A Methodological Comparison of Age-Period-Cohort Models: The Intrinsic Estimator and Conventional Generalized Linear Models.” Sociological Methodology, 34:75-110.

Yang Yang, Sam Schulhofer-Wohl, Wenjiang J. Fu, and Kenneth C. Land. 2008. “The Intrinsic Estimator for Age-Period-Cohort Analysis: What It Is and How To Use It.” American Journal of Sociology 113(May).

Yang Yang. 2008. “Trends in U.S. Adult Chronic Disease Mortality, 1960-1999: Age, Period, and Cohort Variations.” Demography 45(May).


Data Structure: Tabular Rate Data

Example: Lung Cancer Death Rates for U.S. Adult Females: 1960 - 1999

Deaths per 100,000 population, by age group (rows) and period (columns)

Age         1960-64  1965-69  1970-74  1975-79  1980-84  1985-89  1990-94  1995-99
20-24           0.1      0.1      0.1      0.1      0.1      0.1      0.1      0.1
25-29           0.2      0.2      0.2      0.2      0.2      0.2      0.2      0.2
30-34           0.8      0.9      1.0      0.9      0.8      0.8      0.9      0.8
35-39           2.3      3.0      3.6      3.5      3.3      2.7      2.9      3.0
40-44           5.1      7.1      9.1     10.5      9.9      8.9      7.5      7.7
45-49           8.6     12.9     18.1     22.2     23.9     23.1     20.9     17.0
50-54          12.5     19.4     28.9     36.9     44.2     47.2     44.9     39.0
55-59          16.1     25.5     40.1     53.1     69.0     78.3     81.8     74.2
60-64          19.9     28.8     46.6     69.2     92.6    115.2    127.3    125.1
65-69          24.5     33.9     51.3     78.6    114.7    145.5    172.6    180.0
70-74          29.2     38.4     52.8     77.7    120.2    168.3    208.2    233.5
75-79          34.0     41.8     56.1     76.1    111.4    162.0    219.4    251.6
80-84          36.9     45.8     57.3     75.3    102.5    141.1    199.8    249.6
85-89          39.8     48.6     59.7     75.2     96.9    120.9    164.5    214.8
90-94          34.2     43.1     60.6     73.6     91.8    108.8    136.3    166.2
95-125+        26.5     44.2     51.0     68.9     82.7    104.1    120.0    132.8
All            10.3     14.7     21.3     28.9     38.3     47.9     57.0     61.7

Source: CDC/NCHS Multiple Cause of Death File
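Because the age groups and the periods in this table are both five years wide, a birth cohort can be followed along a diagonal: each period later, the same cohort is one age group older. The sketch below reads one such diagonal from the rates above (women aged 40-44 in 1960-64, i.e., born around 1920). The function name `cohort_diagonal` and the choice of starting cell are illustrative assumptions, not part of the original presentation.

```python
# Rates (deaths per 100,000) copied from the table above, ages 40-44 to 75-79.
PERIODS = ["1960-64", "1965-69", "1970-74", "1975-79",
           "1980-84", "1985-89", "1990-94", "1995-99"]
RATES = {
    "40-44": [5.1, 7.1, 9.1, 10.5, 9.9, 8.9, 7.5, 7.7],
    "45-49": [8.6, 12.9, 18.1, 22.2, 23.9, 23.1, 20.9, 17.0],
    "50-54": [12.5, 19.4, 28.9, 36.9, 44.2, 47.2, 44.9, 39.0],
    "55-59": [16.1, 25.5, 40.1, 53.1, 69.0, 78.3, 81.8, 74.2],
    "60-64": [19.9, 28.8, 46.6, 69.2, 92.6, 115.2, 127.3, 125.1],
    "65-69": [24.5, 33.9, 51.3, 78.6, 114.7, 145.5, 172.6, 180.0],
    "70-74": [29.2, 38.4, 52.8, 77.7, 120.2, 168.3, 208.2, 233.5],
    "75-79": [34.0, 41.8, 56.1, 76.1, 111.4, 162.0, 219.4, 251.6],
}
AGES = list(RATES)  # insertion order: youngest to oldest


def cohort_diagonal(start_age_index=0):
    """Follow one birth cohort: one age group older in each successive period."""
    cells = []
    for j, period in enumerate(PERIODS):
        i = start_age_index + j
        if i >= len(AGES):
            break  # the cohort has aged out of the rows included here
        cells.append((AGES[i], period, RATES[AGES[i]][j]))
    return cells


diagonal = cohort_diagonal()
for age, period, rate in diagonal:
    print(f"{age}  {period}  {rate:6.1f}")
```

Reading down the printed diagonal mixes aging with period change for a single cohort, whereas reading across a table row holds age fixed and reading down a column holds period fixed.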

Part II: First Research Design: APC Accounting/Multiple Classification Model

The Algebra of the APC Identification Problem

Model Specification:

$$M_{ij} = D_{ij}/P_{ij} = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ij} \tag{1}$$

$M_{ij}$ denotes the observed occurrence/exposure rate of deaths for the i-th age group for i = 1,…,a age groups at the j-th time period for j = 1,…,p time periods of observed data

$D_{ij}$ denotes the number of deaths in the ij-th group, $P_{ij}$ denotes the size of the estimated population in the ij-th group

$\mu$ denotes the intercept or adjusted mean

$\alpha_i$ denotes the i-th row age effect or the coefficient for the i-th age group

$\beta_j$ denotes the j-th column period effect or the coefficient for the j-th time period

$\gamma_k$ denotes the k-th cohort effect or the coefficient for the k-th cohort for k = 1,…,(a+p-1) cohorts, with k = a-i+j

$\varepsilon_{ij}$ denotes the random errors with expectation $E(\varepsilon_{ij}) = 0$

Fixed effect GLIM reparameterization: $\sum_i \alpha_i = \sum_j \beta_j = \sum_k \gamma_k = 0$, or setting one of each of the categories as the reference group.
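To make the cohort indexing k = a - i + j concrete, the following minimal sketch tabulates the cohort index for every cell of a small age-by-period table. The table size (3 age groups, 4 periods) is a hypothetical choice for illustration only.

```python
def cohort_index(i, j, a):
    """Cohort index k = a - i + j for age group i (1..a) and period j (1..p)."""
    return a - i + j


a, p = 3, 4  # hypothetical table: 3 age groups, 4 periods

# Rows are age groups i = 1..a, columns are periods j = 1..p.
table = [[cohort_index(i, j, a) for j in range(1, p + 1)]
         for i in range(1, a + 1)]
for row in table:
    print(row)

# The distinct indices run from 1 (oldest age group in the first period)
# to a + p - 1 (youngest age group in the last period).
cohorts = sorted({k for row in table for k in row})
print(cohorts)
```

Each diagonal of the table shares one cohort index, which is why the cohort of a cell is fully determined by its age group and period.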

The Algebra of the APC Identification Problem

Generalized Linear Models (GLIM):

Simple Linear Models

$$Y_{ij} = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ij}$$

where $Y_{ij}$ is the outcome in cell (i, j), which is assumed to be normally distributed; equivalently, the error term $\varepsilon_{ij}$ is assumed to be normally distributed with a mean of 0 and variance $\sigma^2$;

Log-Linear Models

$$\log(E_{ij}) = \log(P_{ij}) + \mu + \alpha_i + \beta_j + \gamma_k$$

where $E_{ij}$ denotes the expected number of events in cell (i, j), which is assumed to be distributed as a Poisson variate, and $\log(P_{ij})$ is the log of the exposure $P_{ij}$;

Logistic Models

$$\theta_{ij} = \log\left(\frac{m_{ij}}{1 - m_{ij}}\right) = \mu + \alpha_i + \beta_j + \gamma_k$$

where $\theta_{ij}$ is the log odds of the event and $m_{ij}$ is the probability of the event in cell (i, j).
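As a small numerical illustration of the log-linear and logistic links, the sketch below computes the expected number of events implied by an exposure and a set of effects, and the log odds implied by a probability. All coefficient values and the exposure are hypothetical numbers chosen for illustration; none come from the presentation.

```python
import math


def expected_events(exposure, mu, alpha_i, beta_j, gamma_k):
    """Log-linear model: E_ij = P_ij * exp(mu + alpha_i + beta_j + gamma_k)."""
    return exposure * math.exp(mu + alpha_i + beta_j + gamma_k)


def log_odds(m):
    """Logistic model: theta_ij = log(m_ij / (1 - m_ij))."""
    return math.log(m / (1.0 - m))


# Hypothetical cell: 250,000 person-years of exposure and made-up effects.
exposure = 250_000
mu, alpha_i, beta_j, gamma_k = math.log(0.0004), 0.5, 0.2, -0.3

events = expected_events(exposure, mu, alpha_i, beta_j, gamma_k)
rate_per_100k = 100_000 * events / exposure
print(round(events, 1), round(rate_per_100k, 1))

# The offset log(P_ij) makes the linear predictor a model for the log rate.
linear_predictor = math.log(events) - math.log(exposure)
print(round(linear_predictor - (mu + alpha_i + beta_j + gamma_k), 12))
```

Because the exposure enters as an offset with coefficient fixed at 1, the age, period, and cohort coefficients are interpretable as additive effects on the log rate.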

The Algebra of the APC Identification Problem

Least-squares regression in matrix form:

$$Y = Xb + \varepsilon \tag{2}$$

$$b = (\mu, \alpha_1, \ldots, \alpha_{a-1}, \beta_1, \ldots, \beta_{p-1}, \gamma_1, \ldots, \gamma_{a+p-2})^T$$

Identification Problem:

$$\hat{b} = (X^T X)^{-1} X^T Y$$

A unique solution to the normal equations does not exist because the design matrix X is singular, with rank one less than full column rank, so that $(X^T X)^{-1}$ does not exist. This is due to the exact linear dependency:

Period = Age + Cohort

Conventional Solutions to the APC Identification Problem

Constrained Coefficients GLIM estimator (CGLIM): Impose one or more equality constraints on the coefficients of the coefficient vector in (2) in order to just-identify (one equality constraint) or over-identify (two or more constraints) the model;

Proxy variables approach: Use one or more proxy variables as surrogates for the age, period, or cohort coefficients (see O'Brien, R.M. 2000. "Age Period Cohort Characteristic Models." Social Science Research 29:123-139);

Nonlinear parametric (algebraic) transformation approach: Define a nonlinear parametric function of one of the age, period, or cohort variables so that its relationship to others is nonlinear.

References:

Fienberg and Mason (1985)

Yang, Yang. 2005. New Avenues for Cohort Analysis: Chapter 2. Ph.D. Dissertation. Duke University. [Proquest]

Limitations of Conventional Solutions to the APC Identification Problem

Proxy variables approach: the analyst may not want to assume that all of the variation associated with the A, P, or C dimensions is fully accounted for by a proxy variable;

Nonlinear parametric (algebraic) transformation approach: it may not be evident what nonlinear function should be defined for the effects of age, period, or cohort;

Constrained Coefficients GLIM estimator (CGLIM): it is the most widely used of the three approaches, but suffers from some major problems summarized below.

Limitations of Conventional Solutions to the APC Identification Problem: Constrained Coefficients GLIM estimator (CGLIM)

the analyst desires to employ the flexibility of the APC accounting model with its individual effect coefficients for each of the A, P, or C categories;

the analyst needs to rely on prior or external information to find constraints, and such information rarely exists or can rarely be verified;

different choices of identifying constraints can produce widely different estimates of patterns of change across the A, P, and C categories of the analysis;

all just-identified CGLIM models will produce the same levels of goodness-of-fit to the data, making it impossible to use model fit as the criterion for selecting the best constrained model.

See, e.g., Yang et al. (2004) and Yang et al. (2006), for details.

Guidelines for Estimating APC Models of Rates

Step 1: Descriptive data analyses using graphics

Step 2: Model fitting procedures

Objectives:

to provide qualitative understanding of patterns of age, or period, or cohort variations, or two-way age by period and age by cohort variations;

to ascertain whether the data are sufficiently well described by any single factor or two-way combination of the A, P, and C dimensions or if it is necessary to include all three.

Step 1: Graphical analyses: example from Yang (2008)

As a first step in the analysis of a table of age-period-specific or age-cohort-specific rates, we recommend a graphical representation of the data such as the U.S. female lung cancer mortality rates shown in Figure 3 from Yang (2008).

If there are no cohort effects, then the curves of the age-specific rates should show parallel curvatures. But it can be seen that the curves of age-specific rates show substantial departure from this condition.

For example, the curve of age-specific rates for 1995-99 cuts across a number of birth cohort curves, such as 1900, 1905, 1910, and 1920. Therefore, the shape of the period curve is affected by both varying age effects and cohort effects. The question of how these effects operate simultaneously to shift the period curve motivates the use of statistical regression modeling.

Step 2: Model fitting procedures: examples from Yang et al. (2004) and Yang (2008)

Table 1. Goodness-of-Fit Statistics for Age-Period-Cohort Log Linear Models of U.S. Adult Mortality: Female

Cause of Death   Statistic         A        AP        AC      APC*
                 DF              112       105        90        84
Total            Deviance     695527     40443     72089     18903
                 AIC          695751     40653     72269     19071
                 BIC          695763     40664     72279     19080
Heart Disease    Deviance     782210     52225     18638      9243
                 AIC          782434     52435     18818      9411
                 BIC          782446     52446     18827      9420
Stroke           Deviance     655622     12660     25967      1480
                 AIC          655846     12870     26147      1648
                 BIC          655858     12881     26157      1657
Lung Cancer      Deviance     320050     42126      5296       245
                 AIC          320274     42336      5476       413
                 BIC          320286     42347      5486       422
Breast Cancer    Deviance       9748      7403      1553       512
                 AIC            9972      7613      1733       680
                 BIC            9984      7625      1743       689

Step 2: Model fitting procedures

As a second step in model specification/estimation, we recommend the estimation of a sequence of nested log-linear models as illustrated in Tables 1 and 4 for analyses reported in Yang (2008).

These tables show goodness-of-fit statistics for six reduced log linear models: three gross effects models, namely, model A for age effects, model P for period effects, and model C for cohort effects; and three two-factor models, one for each of three possible pairs of effects, namely, AP, AC, and PC effects models. All of these models then are nested within a full APC model where all three factors are simultaneously controlled.

Goodness-of-fit statistics were calculated and used to select the best fitting models for male and female mortality data. Because likelihood ratio tests (Table 4) tend to favor models with a larger number of parameters, the two most commonly used penalized-likelihood model selection criteria are reported in Table 1, namely, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), each of which adjusts for the impact of model dimensions on model deviances.

For the female lung cancer data, both the AIC and BIC statistics indicate that the full APC model fits the data better than any of the reduced models.
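The selection rule described here (prefer the model with the smallest AIC or BIC) can be sketched in a few lines. The values below are the female lung cancer entries reported in Table 1 for models A, AP, AC, and APC; the helper name `best_model` is an illustrative assumption.

```python
MODELS = ["A", "AP", "AC", "APC"]

# Female lung cancer goodness-of-fit statistics as reported in Table 1.
LUNG_CANCER_FEMALE = {
    "Deviance": [320050, 42126, 5296, 245],
    "AIC": [320274, 42336, 5476, 413],
    "BIC": [320286, 42347, 5486, 422],
}


def best_model(values):
    """Return the model label with the smallest criterion value."""
    return MODELS[values.index(min(values))]


best = {criterion: best_model(values)
        for criterion, values in LUNG_CANCER_FEMALE.items()}
print(best)
```

The deviance alone always favors the largest model, which is why the penalized criteria are the ones used for the comparison.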

Guidelines for Estimating APC Models of Rates

If the foregoing descriptive analyses suggest that only one or two of the A, P, and C dimensions are operative, then the analysis can proceed with a reduced model (2) that omits one or two dimensions and there is no identification problem.

If, however, these analyses suggest that all three dimensions are at work, then Yang et al. (2004, 2008) recommend:

Step 3: Apply the Intrinsic Estimator (IE).

What is the Intrinsic Estimator (IE)?

It is a new method of estimation that yields a unique solution to the model (2) and is the unique estimable function of both the linear and nonlinear components of the APC model determined by the Moore-Penrose generalized inverse. It achieves model identification with minimal assumptions.

Why is the IE useful?

The basic idea of the IE is to remove the influence of the design matrix (which is fixed by the number of age and period groups and not related to $Y_{ij}$) on coefficient estimates. This produces estimates that have desirable statistical properties.

The Intrinsic Estimator (IE): Algebraic Definition

The linear dependency between A, P, and C is mathematically equivalent to:

$$X B_0 = 0 \tag{3}$$

The eigenvector $B_0$ of eigenvalue 0 is fixed by X:

$$B_0 = \frac{\tilde{B}_0}{\|\tilde{B}_0\|}, \qquad \tilde{B}_0 = (0, A, P, C)^T$$

$$A = \left(1 - \frac{a+1}{2}, \ldots, (a-1) - \frac{a+1}{2}\right)$$

$$P = \left(\frac{p+1}{2} - 1, \ldots, \frac{p+1}{2} - (p-1)\right)$$

$$C = \left(1 - \frac{a+p}{2}, \ldots, (a+p-2) - \frac{a+p}{2}\right)$$
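Equation (3) can be checked numerically. The sketch below builds a design matrix for a small a-by-p table and verifies that the closed-form vector $B_0$ is mapped to zero. It assumes sum-to-zero (effect) coding with the last category of each factor omitted, which is one reading of the reparameterization stated with model (1); the function names and the table size (a = 5, p = 4) are hypothetical choices for illustration.

```python
def effect_codes(level, n_levels):
    """Sum-to-zero coding: levels 1..n-1 get an indicator, level n gets all -1."""
    if level == n_levels:
        return [-1.0] * (n_levels - 1)
    return [1.0 if c == level else 0.0 for c in range(1, n_levels)]


def design_matrix(a, p):
    """One row per (age i, period j) cell: intercept, age, period, cohort codes."""
    rows = []
    for i in range(1, a + 1):
        for j in range(1, p + 1):
            k = a - i + j  # cohort index of the cell
            rows.append([1.0] + effect_codes(i, a) + effect_codes(j, p)
                        + effect_codes(k, a + p - 1))
    return rows


def null_vector(a, p):
    """Closed-form B0 = B0_tilde / ||B0_tilde|| with B0_tilde = (0, A, P, C)."""
    A = [i - (a + 1) / 2 for i in range(1, a)]
    P = [(p + 1) / 2 - j for j in range(1, p)]
    C = [k - (a + p) / 2 for k in range(1, a + p - 1)]
    tilde = [0.0] + A + P + C
    norm = sum(v * v for v in tilde) ** 0.5
    return [v / norm for v in tilde]


a, p = 5, 4  # hypothetical table size
X = design_matrix(a, p)
B0 = null_vector(a, p)

# Every row of X should be orthogonal to B0, i.e. X B0 = 0 up to rounding.
XB0 = [sum(x * b for x, b in zip(row, B0)) for row in X]
print(max(abs(v) for v in XB0))
```

Note that $B_0$ depends only on the numbers of age groups and periods, not on the observed rates, which is the sense in which it is "fixed by X".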

The Intrinsic Estimator (IE): Algebraic Definition/Geometric Representation

Parameter vector orthogonal decomposition:

$$b = b_0 + tB_0 \tag{4}$$

$$b_0 = P_{proj}\, b = (I - B_0 B_0^T)\, b \tag{5}$$

$b_0 = P_{proj}\, b$ is the projection of b to the non-null space of X; t is a real number; $tB_0$ is in the null space of X and represents trends of linear constraints. Different equality constraints used by CGLIM estimators, such as $b_1$ and $b_2$, yield different values of t.

[Figure: geometric representation of the decomposition, showing $b_0$, $b_1$, $b_2$, the origin 0, $B_0$, and $tB_0$.]

The Intrinsic Estimator (IE) Method: Algebraic Definition

From the infinite number of estimators of b in model (2):

$$\hat{b} = B + tB_0 \tag{6}$$

the IE estimates the parameter vector $b_0$ corresponding to t = 0:

$$B = (I - B_0 B_0^T)\, \hat{b} \tag{7}$$

The IE is the special estimator that uniquely determines the age, period, and cohort effects in the parameter subspace defined by $b_0$:

$$X\hat{b} = X(B + tB_0) = XB + tXB_0 = XB + 0 = XB \tag{8}$$

The IE also can be viewed as a special form of principal components regression estimator that removes the influence of the null space of the design matrix X on the estimator:

(a) the analyst computes the eigenvalues and eigenvectors (principal components) of the matrix $X^T X$,

(b) normalizes them to have unit length;

(c) identifies the eigenvector $B_0$ corresponding to the unique eigenvalue 0;

(d) estimates a regression model with response vector Y and design matrix U whose column vectors are the principal components determined by the eigenvectors of non-zero eigenvalues, i.e., estimates a principal components regression model; and

(e) then uses the orthonormal matrix of all eigenvectors to transform the coefficients of the principal components regression model to the regression coefficients of the intrinsic estimator B.

The Intrinsic Estimator (IE) Method

Desirable statistical properties (Yang et al. 2004):

Estimability: The IE is an estimable function in the sense that it is invariant to the choice of linear constraints on b.

Unbiasedness: For a fixed number of time periods of data, it is an unbiased estimator of the special parameterization (or linear function) $b_0$ of b.

Relative efficiency: For a fixed number of time periods of data, it has a smaller variance than any CGLIM estimator.

Asymptotic consistency: Under suitable regularity conditions on the error term process and a fixed set of age categories, the IE will converge asymptotically to the “true” parameters.

Monte Carlo Simulation Analysis: Demonstrated numerically the foregoing finite-time-period and asymptotic properties of the IE. Presented at the 2007 Annual Meetings of the ASA: Sociological Methodology Paper Session (Yang, Schulhofer-Wohl, and Land).

Because of its estimability/unbiasedness properties, the IE may provide a means for the accumulation of reliable estimates of the trends of coefficients across the age, period, and cohort categories of the APC accounting model. To provide intuition for this statement, recall the distinction between the homogeneous and inhomogeneous solutions to the ordinary differential equation for, say, Hooke’s Law for the motion of a displaced spring-mass system subject to an external forcing motion in classical mechanics. This law has the algebraic form:

$$F = -kx + a\cos\omega t$$

where F denotes acceleration (second derivative of the motion with respect to time) of the mass, x denotes the distance of displacement, k is a constant unique to the particular spring under study, and $a\cos\omega t$ is the forcing term.

The complete solution (CS) to Hooke’s Law is the sum of two parts: a homogeneous part (HP) and a non-homogeneous part (NP):

CS = HP + NP

The HP of the CS characterizes the motion of the spring-mass system (its characteristic pattern of damped oscillations back and forth) after the initial conditions captured in the NP (the level of the initial force of the oscillations due to the length of extension of the spring from which the oscillations are begun). Thus, the HP is characteristic of the intrinsic motion of the spring-mass system and generalizes from displacement to displacement after the initial conditions have worn off.

In an analogous way, the IE can be thought of as the HP of a CS that purges the estimated coefficient vector of the APC accounting model of the initial conditions (the NP) of the estimation, i.e., the number of time periods of observations in the age-by-time-period table of rates, and captures that part of the analysis that is characteristic of the application and generalizes from tables that may begin with 3, 4, 5, 6, or more time periods of data. And it is the homogeneous part that may be invariant with respect to the initial conditions (number of time periods of data).

The Intrinsic Estimator (IE) Method: Computation Software

S-Plus/R program: can be obtained by writing to Wenjiang J. Fu at [email protected]

Stata Ado Files:

Install by typing “ssc install apc” or “net install apc” on the Stata 9.2 command line on any computer connected to the Internet, or download from the Statistical Software Components archive at http://ideas.repec.org/s/boc/bocode.html

The commands use much the same syntax as Stata’s glm command for generalized linear models. For example, to fit a log-linear model, use the command:

apc_ie y, exposure(exp) family(poisson) link(log) age(a) period(t) cohort(c)

for a dependent variable “y”, an exposure variable “exp”, an age variable “a”, a period variable “t”, and a cohort variable “c”.

See “help apc_ie” and “help apc_cglim” for more detail.

An example of model estimates in Yang et al. (2004) is available at:

http://home.uchicago.edu/~yangy/apc_sectionC

Example: Intrinsic Estimates of Age, Period, and Cohort Effects of Lung Cancer Mortality by Sex (Yang 2008)

[Figure: three panels (Age Effect, Period Effect, Cohort Effect) plotting log coefficients against age, year, and cohort, with separate curves for males and females.]

The Intrinsic Estimator (IE): Conclusion

Is the Intrinsic Estimator a “final” or “universal” solution to the APC “conundrum” in the context of age-by-time period tables of rates? No. There will never be such a solution.

But the IE:

has been shown to be a useful approach to the identification and estimation of the APC accounting model that has desirable mathematical and statistical properties; and

has passed both case studies and simulation tests of model validation.

Part III: Second Research Design: APC Analysis of Repeated Cross-Section Surveys

Major References for Part III:

Yang, Yang. 2006. “Bayesian Inference for Hierarchical Age-Period-Cohort Models of Repeated Cross-Section Survey Data.” Sociological Methodology 36:39-74.

Yang Yang and Kenneth C. Land. 2006. “A Mixed Models Approach to the Age-Period-Cohort Analysis of Repeated Cross-Section Surveys, With an Application to Data on Trends in Verbal Test Scores.” Sociological Methodology 36:75-98.

Yang Yang and Kenneth C. Land. 2008. “Age-Period-Cohort Analysis of Repeated Cross-Section Surveys: Fixed or Random Effects?” Sociological Methods and Research 36(February):297-326.

Yang, Yang. 2008. “Social Inequalities in Happiness in the United States, 1972 to 2004: An Age-Period-Cohort Analysis.” American Sociological Review 73(April):204-226.

Data Structure: Individual-level Data in the Age-by-Period Array

[Figure: array of age i (rows) by period j (columns); each cell contains $n_{ij} > 1$ individuals.]

Solution to the Identification Problem

Many researchers previously have assumed that the APC identification problem for age-by-time period tables of rates transfers over directly to this research design.

But note that this research design yields individual-level data, i.e., microdata on the ages and other characteristics of individuals in the samples.

Solution: Use of different temporal groupings for the A, P, and C dimensions breaks the linear dependency:

Age: single year of age;

Periods: time periods correspond to years in which the surveys are conducted;

Cohorts: can be defined either by five- or ten-year intervals that are conventional in demography or by application of a substantive classification (e.g., War babies, Baby Boomers, Baby Busters, etc.).
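The effect of grouping can be seen in a short sketch. With single-year cohorts, cohort equals survey year minus age exactly; with five-year cohort groups, the grouped cohort differs from survey year minus age by an offset that varies across respondents, so the exact identity no longer holds. The function name `cohort_group`, the grouping rule (rounding the birth year down to a multiple of five), and the ages and survey years used are assumptions for illustration.

```python
def cohort_group(age, survey_year, width=5):
    """Assign a respondent to a birth-cohort group of the given width."""
    birth_year = survey_year - age
    return birth_year - birth_year % width


# Hypothetical respondents: single years of age 30-35 in three survey years.
rows = [(age, year, cohort_group(age, year))
        for year in (1974, 1976, 1978)
        for age in range(30, 36)]

# With single-year cohorts this offset would be identically 0.
offsets = sorted({cohort - (year - age) for age, year, cohort in rows})
print(offsets)

# Respondents with different (year - age) can share one cohort group.
same_group = [(age, year) for age, year, cohort in rows if cohort == 1940]
print(same_group)
```

This only shows that the exact linear dependency is removed; it does not by itself say how sensitive the estimates are to the chosen cohort width.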

Two-way Cross-Classified Data Structure in the GSS: Number of Observations by Cohort and Period in the Verbal Ability Data (Yang and Land 2006)

Rows: cohort (J); columns: survey year (K)

Cohort 1974 1976 1978 1982 1984 1987 1988 1989 1990 1991 1993 1994 1996 1998 2000 Total
1890     12   18    8    0    0    0    0    0    0    0    0    0    0    0    0    38
1895     31   25   19   19    6    0    0    0    0    0    0    0    0    0    0   100
1900     62   52   49   27   18   17   13   11    5    2    0    0    0    0    0   256
1905     88   69   68   43   38   23   11   12   11   11   15   15   10    0    0   414
1910     77   89   69   75   50   48   34   27   25   29   13   31   27   18    8   620
1915    109  111   84  100   81   81   42   36   37   41   37   60   39   24   27   909
1920    115  104  112  110   73   97   60   53   40   56   55   85   59   32   37  1088
1925    113  108  106  131   99   92   52   53   53   40   50   84   81   68   52  1182
1930    129   92   90  111   81   95   47   54   43   62   43   86   72   45   64  1114
1935    130  106  108  112   80  101   39   59   44   37   58  101  100   61   64  1200
1940    119  140  130  127  100  142   49   74   49   65   58  134  117   65   78  1447
1945    179  161  184  163  133  143   98   84   85   74   85  168  161  104   85  1907
1950    179  180  197  199  170  185  101   94   95  111   99  173  169  101  111  2164
1955     89  151  180  260  162  219  102  117  106  118  127  198  213  149  145  2336
1960      0    8   59  175  186  190  109  121  102  118  103  231  208  161  147  1918
1965      0    0    0   38   75  161  101   86   76   91  111  182  188  157  111  1377
1970      0    0    0    0    0   29   32   48   55   77   81  157  188  116  145   928
1975      0    0    0    0    0    0    0    0    0    1   23   59  128   84  107   402
1980      0    0    0    0    0    0    0    0    0    0    0    0    4   34   62   100
Total  1432 1414 1463 1690 1352 1623  890  929  826  933  958 1764 1764 1219 1243 19500

This Data Structure illustrates that:

respondents are nested in and cross-classified simultaneously by the two higher-level social contexts defined by time period and birth cohort, so the basic idea here is to treat time periods and birth cohorts as contexts;

individual members of any birth cohort can be interviewed in multiple replications of the survey; and

individual respondents in any particular wave of the survey can be drawn from multiple birth cohorts.

Further Questions:

Is there evidence for clustering (correlation) of random errors, due to the fact that:

individuals surveyed in the same year may be subject to similar unmeasured events that influence their outcomes?

members of the same birth cohort may be subject to similar unmeasured events that influence their outcomes?

How can this random variability be modeled and explained?

Method

Hierarchical Age-Period-Cohort (HAPC) Models:

Mixed (fixed and random) effects models or hierarchical linear models (HLM)

Cross-classified random effects model (CCREM)

Objective: Model the level-two heterogeneity to:

Assess the possibility that individuals within the same periods and cohorts could share unobserved random variance;

Explain the level-two variance by contextual characteristics of time periods and birth cohorts.

Illustrative Application

APC Analysis of General Social Survey (GSS) Data on Verbal Test Scores: 1974 – 2000

The Initial Papers

Alwin, D. 1991. “Family of Origin and Cohort Differences in Verbal Ability.” American Sociological Review 56:625-38.

Glenn, N.D. 1994. “Television Watching, Newspaper Reading, and Cohort Differences in Verbal Ability.” Sociology of Education 67:216-30.

The debate in the American Sociological Review

Wilson, J.A. and W.R. Gove. 1999. "The Intercohort Decline in Verbal Ability: Does It Exist?" and reply to Glenn and Alwin & McCammon. ASR 64:253-266, 287-302.

Glenn, N.D. 1999. “Further Discussion of the Evidence for An Intercohort Decline in Education-Adjusted Vocabulary.” ASR 64:267-71.

Alwin, D.F. and R.J. McCammon. 1999. “Aging Versus Cohort Interpretations of Intercohort Differences in GSS Vocabulary Scores.” ASR 64:272-86.

Debate Initiation:

Alwin (1991) and Glenn (1994) found evidence of a long-term intercohort decline in verbal ability beginning in the early part of the twentieth century.

Wilson and Gove (1999) took issue with this finding and argued that the Alwin and Glenn analyses confused cohort effects with aging effects.

Wilson and Gove also suggested the possibility of a curvilinear age effect and the importance of treating the collinearity between age and cohort in the GSS data.

While Alwin and Glenn assumed that period effects are minimal or null, Wilson and Gove found “that year of survey [time period] is negatively related to verbal score when education is controlled” and considered this as an indication of “the presence of a period effect.”

Response:

In response, Glenn (1999) disagreed that the decline in GSS vocabulary scores resulted solely from period influences and also argued against the Wilson and Gove claim that cohort differences actually reflected only age effects.

After reexamining aging versus cohort explanations, Alwin and McCammon (1999) similarly insisted that aging explains only a tiny portion of the variation in verbal ability data and therefore is not sufficient to account for the contributions of unique cohort experiences to the decline in verbal skills.

Followup:

More recently, Alwin and McCammon (2001. “Aging, Cohorts, and Verbal Ability.” The Journals of Gerontology Series B: Psychological Sciences and Social Sciences 56:S151–61) analyzed 14 repeated cross-sections from the GSS over a 24-year period and concluded that age-related differences in cognitive abilities observed in cross-sectional samples of individuals may in part be spurious due to the effects of cohort differences in schooling and related factors. They found that “the curvilinear contributions of aging to variation in verbal scores account for less than one-third of 1 percent of the variance in vocabulary knowledge, once cohort is controlled” (Alwin and McCammon 2001:151).

54

Research Questions

Can distinct age, period, and cohort components of change in verbal ability in the U.S. be estimated?

How can period and/or cohort level heterogeneity be explained by period and/or cohort characteristics?

Analytic Method

Apply the HAPC-CCREM to estimate:

fixed effects of age and of other individual-level and level-2 covariates;

random effects of period and cohort, and the associated variance components.

55

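Model Specification

Level 1 / Within-Cell Model:

$$\text{WORDSUM}_{ijk} = \beta_{0jk} + \beta_1 \text{AGE}_{ijk} + \beta_2 \text{AGE}_{ijk}^2 + \beta_3 \text{EDUCATION}_{ijk} + \beta_4 \text{FEMALE}_{ijk} + \beta_5 \text{BLACK}_{ijk} + e_{ijk}, \quad e_{ijk} \sim N(0, \sigma^2) \tag{9}$$

Level 2 / Between-Cell Model:

$$\beta_{0jk} = \gamma_0 + \gamma_1 \text{NEWS}_j + \gamma_2 \text{TV}_j + u_{0j} + v_{0k}, \quad u_{0j} \sim N(0, \tau_u), \quad v_{0k} \sim N(0, \tau_v) \tag{10}$$

Combined Model:

$$\text{WORDSUM}_{ijk} = \gamma_0 + \gamma_1 \text{NEWS}_j + \gamma_2 \text{TV}_j + \beta_1 \text{AGE}_{ijk} + \beta_2 \text{AGE}_{ijk}^2 + \beta_3 \text{EDUCATION}_{ijk} + \beta_4 \text{FEMALE}_{ijk} + \beta_5 \text{BLACK}_{ijk} + u_{0j} + v_{0k} + e_{ijk}$$

for i = 1, 2, …, n_jk individuals within cohort j (j = 1, …, 19) and period k (k = 1, …, 15).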
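As a minimal sketch of how the two levels fit together, the Python snippet below substitutes the level-2 equation (10) into the level-1 equation (9) for one hypothetical respondent and checks that the result matches the combined model written out term by term. The symbol names (gamma0, beta1, and so on) follow one reading of the slide's equations, and every coefficient and covariate value is made up for illustration; none of them are estimates from Yang and Land (2006).

```python
from dataclasses import dataclass


@dataclass
class FixedEffects:
    """Hypothetical fixed-effect coefficients (illustrative values only)."""
    gamma0: float = 6.0    # overall intercept
    gamma1: float = 0.5    # cohort-level NEWS coefficient
    gamma2: float = -0.1   # cohort-level TV coefficient
    beta1: float = 0.02    # AGE
    beta2: float = -0.001  # AGE squared
    beta3: float = 0.35    # EDUCATION
    beta4: float = 0.2     # FEMALE
    beta5: float = -1.0    # BLACK


def level2_intercept(fe, news_j, tv_j, u0j, v0k):
    """Equation (10): intercept of the cell for cohort j and period k."""
    return fe.gamma0 + fe.gamma1 * news_j + fe.gamma2 * tv_j + u0j + v0k


def level1_score(fe, beta0jk, age, education, female, black, e_ijk=0.0):
    """Equation (9): individual score given the cell intercept."""
    return (beta0jk + fe.beta1 * age + fe.beta2 * age ** 2
            + fe.beta3 * education + fe.beta4 * female
            + fe.beta5 * black + e_ijk)


def combined_score(fe, news_j, tv_j, u0j, v0k,
                   age, education, female, black, e_ijk=0.0):
    """Combined model: fixed part plus cohort, period, and individual errors."""
    return (fe.gamma0 + fe.gamma1 * news_j + fe.gamma2 * tv_j
            + fe.beta1 * age + fe.beta2 * age ** 2 + fe.beta3 * education
            + fe.beta4 * female + fe.beta5 * black + u0j + v0k + e_ijk)


fe = FixedEffects()
# Hypothetical centered cohort covariates and random effects for one cell.
cell = dict(news_j=0.1, tv_j=-0.5, u0j=0.15, v0k=-0.05)
# Hypothetical centered individual covariates.
person = dict(age=5.0, education=2.0, female=1, black=0)

two_step = level1_score(fe, level2_intercept(fe, **cell), **person)
one_step = combined_score(fe, **cell, **person)
```

The two routes agree up to floating-point rounding, which is all the "combined model" expresses: the cohort and period random effects enter the individual score only through the cell intercept.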

56

Coefficient Estimation Using Restricted Maximum Likelihood-Empirical Bayes (REML-EB) and SAS PROC MIXED

proc mixed data=gssverb covtest cl;
  class year cohort;
  /* Fixed effects: age terms (age1, age2), individual covariates, cohort-level covariates */
  model wordsum = age1 age2 education female black cohort_news cohort_tv / solution cl;
  /* Crossed random intercepts for period (survey year) and birth cohort */
  random intercept / sub=year solution;
  random intercept / sub=cohort solution;
  title "Final HAPC_CCREM for GSS verbal data";
run;

Source: code used in Yang and Land (2006, 2007).

Note: all explanatory variables have been properly centered (around grand mean or group mean) for the intercept to be meaningful.

57

Estimated Cohort Effects, Period Effects, with 95% CIs, and Age Effects on GSS Verbal Test Scores (Yang and Land 2006)

[Figure: three panels. Cohort Effect: Verbal Test Score (5.00 to 7.00) by Cohort. Period Effect: Verbal Test Score (5.50 to 6.50) by Period. Age Effect: Predicted Verbal Test Score (4.50 to 6.50) by Age.]

58


BACK TO THE DEBATE ON TRENDS IN VERBAL ABILITY: Who is right, Alwin and Glenn or Wilson and Gove?

The results of the HAPC analyses show:

significant random variance components that reside in all three levels of the APC data: individuals nested within cohorts and periods;

controlling for the effects of key individual characteristics, namely, education, sex and race, and period and cohort effects does not explain away all age effects;

significant contextual effects of cohorts and periods on verbal ability; and

strong effects of cohort characteristics: cohorts with a larger proportion of daily newspaper readers have higher verbal ability, while more hours of TV watching per day tend to lower average cohort verbal ability.

59

Extensions of HAPC Modeling

Fixed Effects vs. Random Effects Model

A Full Bayesian HAPC Model

Generalized Linear Mixed Models (GLIMM)

60

Fixed Effects vs. Random Effects Model:

The HAPC-CCREM approach illustrated above uses a mixed (fixed and random) effects model with a random effects specification for the level-2 (time period and cohort) contextual variables.

Alternative: a fixed effects specification for the level-2 variables, in which one uses dummy (indicator) variables to record the cohort and the time period of the survey.

The comparison seems especially pertinent when the number of replications of the survey is relatively small—say 3 to 5.

61

Fixed Effects vs. Random Effects Model:

The estimates of cohort and time period effects from a fixed effects model for the GSS data are quite similar in pattern to those from the random effects model.

The random effects model is preferred to the fixed effects model for three reasons:

It avoids potential model specification error by not relying on the fixed effects model's assumption that the indicator/dummy variables representing the cohort and period effects fully account for all of the group effects;

It allows group level covariates to be incorporated into the model and explicitly models cohort characteristics and period events to test explanatory hypotheses;

For unbalanced research designs (designs in which there are unequal numbers of respondents in the cells), such as one typically has in repeated cross-section survey designs, a random effect model for the level-2 variables generally is more statistically efficient.
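One way to see the efficiency point for unbalanced designs is the shrinkage form of the empirical Bayes estimate in a simple random-intercept model. The sketch below assumes a one-way model with known variance components (`tau` between groups, `sigma2` within groups). This is a textbook simplification, not the cross-classified estimator that PROC MIXED computes, and all names and numbers are illustrative.

```python
def shrinkage_weight(n_j, tau, sigma2):
    """Reliability of a group mean: tau / (tau + sigma2 / n_j)."""
    return tau / (tau + sigma2 / n_j)


def eb_group_effect(group_mean, grand_mean, n_j, tau, sigma2):
    """Empirical Bayes estimate of a group's deviation from the grand mean."""
    weight = shrinkage_weight(n_j, tau, sigma2)
    return weight * (group_mean - grand_mean)


# Two hypothetical cohorts with the same raw deviation (0.5 points)
# but very different numbers of respondents.
small = eb_group_effect(group_mean=6.5, grand_mean=6.0,
                        n_j=20, tau=0.05, sigma2=4.0)
large = eb_group_effect(group_mean=6.5, grand_mean=6.0,
                        n_j=2000, tau=0.05, sigma2=4.0)
```

With these made-up values the sparsely observed cohort is pulled much closer to the grand mean than the well-observed one. A dummy-variable (fixed effects) specification would instead report the raw 0.5 deviation for both, regardless of cell size.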

62

A Full Bayesian HAPC Model:

Limitations of HAPC Modeling Using REML-EB Estimation

Small numbers of cohorts (J) and periods (K)

Unbalanced data

Inaccurate REML estimates of variance-covariance components

Inaccurate EB estimates of fixed effects regression coefficients

A Remedy: Bayesian Model Estimation

A full Bayesian approach, by definition, ensures that inferences about every parameter fully account for the uncertainty associated with all others.

63


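Bayesian HAPC Models: Model Specification

Level-1 Model (Likelihood): $f(Y \mid \beta, \sigma^2)$

Level-2 Model (Stage-1 Prior): $p(\beta \mid \gamma, \tau_u, \tau_v)$

Stage-2 Prior: $p(\gamma, \tau_u, \tau_v, \sigma^2) = p(\gamma)\, p(\tau_u)\, p(\tau_v)\, p(\sigma^2)$

Joint Posterior:

$$p(\beta, \gamma, \tau_u, \tau_v, \sigma^2 \mid Y) \propto f(Y \mid \beta, \sigma^2)\, p(\beta \mid \gamma, \tau_u, \tau_v)\, p(\gamma, \tau_u, \tau_v, \sigma^2)$$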

Gibbs Sampling: Starting from initial values of the parameters, the sampler repeatedly draws each parameter from its conditional distribution given the current values of the others, until the chain reaches stochastic convergence. The sequence of values generated by this procedure is a Markov chain, and the method belongs to the Markov chain Monte Carlo (MCMC) family.
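The alternating-draws idea can be sketched on a much smaller problem than the HAPC model. The Python example below runs a Gibbs sampler for the mean and variance of a single normal sample, using conjugate priors so that both conditional distributions have closed forms. The toy model, the prior settings (`m0`, `s0_sq`, `a0`, `b0`), and the simulated data are all assumptions made for illustration; this is not WinBUGS code and not the authors' implementation.

```python
import random
import statistics


def gibbs_normal(y, n_iter=2000, burn_in=500, seed=1,
                 m0=0.0, s0_sq=1e6, a0=0.01, b0=0.01):
    """Gibbs sampler for y_i ~ N(mu, sigma2) with
    mu ~ N(m0, s0_sq) and sigma2 ~ Inverse-Gamma(a0, b0)."""
    rng = random.Random(seed)
    n = len(y)
    ybar = statistics.fmean(y)
    mu, sigma2 = ybar, statistics.pvariance(y)  # initial values
    draws = []
    for it in range(n_iter):
        # Draw mu from its normal conditional given sigma2 and the data.
        precision = 1.0 / s0_sq + n / sigma2
        cond_mean = (m0 / s0_sq + n * ybar / sigma2) / precision
        mu = rng.gauss(cond_mean, precision ** -0.5)
        # Draw sigma2 from its inverse-gamma conditional given mu and the data.
        shape = a0 + n / 2.0
        rate = b0 + 0.5 * sum((yi - mu) ** 2 for yi in y)
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if it >= burn_in:  # discard early draws taken before convergence
            draws.append((mu, sigma2))
    return draws


# Simulated data: 200 draws from N(6, 2^2), chosen arbitrarily.
data_rng = random.Random(0)
y = [data_rng.gauss(6.0, 2.0) for _ in range(200)]

draws = gibbs_normal(y)
post_mu = statistics.fmean([d[0] for d in draws])
post_sigma2 = statistics.fmean([d[1] for d in draws])
```

Averaging the retained draws gives posterior means for `mu` and `sigma2`. In a full Bayesian HAPC model the same cycle would run over the regression coefficients, the cohort and period effects, and the variance components.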

Computational Software: WinBUGS (the Windows version of BUGS, Bayesian inference Using Gibbs Sampling), freely available from http://www.mrc-bsu.cam.ac.uk/bugs/ at the MRC Biostatistics Unit, Cambridge, UK

64

HAPC Generalized Linear Mixed Models:

Family of GLIMMs

Normal outcome: linear mixed models using the identity link

Binomial outcome: logistic mixed models using the logit link

Ordinal or nominal outcome: ordinal or multinomial logistic mixed models

Count outcome: Poisson mixed models using the log link

Count outcome with overdispersion: negative binomial mixed models

REML-EB Estimation: SAS PROC GLIMMIX
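What changes across this family is the link between the linear predictor and the expected outcome. The cohort and period random effects enter the predictor additively in every case. The sketch below is a hedged illustration with hypothetical function names and made-up numbers; it covers only the identity, logit, and log links and does not reproduce any SAS procedure.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Inverse link for each outcome family (family names are illustrative labels).
INVERSE_LINKS = {
    "normal": lambda eta: eta,   # identity link
    "binomial": inv_logit,       # logit link
    "poisson": math.exp,         # log link
}


def expected_outcome(family, fixed_part, u_cohort, v_period):
    """Expected outcome for one cell: the fixed part of the linear predictor
    plus cohort and period random effects, passed through the inverse link."""
    eta = fixed_part + u_cohort + v_period
    return INVERSE_LINKS[family](eta)


# Hypothetical values on the linear-predictor scale.
p_binary = expected_outcome("binomial", fixed_part=0.2, u_cohort=-0.1, v_period=-0.1)
mean_count = expected_outcome("poisson", fixed_part=0.0, u_cohort=0.0, v_period=0.0)
mean_normal = expected_outcome("normal", fixed_part=6.0, u_cohort=0.15, v_period=-0.05)
```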

65

Part IV: Third Research Design: Cohort Analysis of Accelerated Longitudinal Panels

Major References for Part IV:

Miyazaki, Yasuo and Stephen W. Raudenbush. 2000. “Tests for Linkage of Multiple Cohorts in an Accelerated Longitudinal Design.” Psychological Methods 5:44-63.

Yang, Yang. 2007. “Is Old Age Depressing? Growth Trajectories and Cohort Variations in Late Life Depression.” Journal of Health and Social Behavior 48:16-32.

66


Accelerated Longitudinal Panel Designs

ALPD Definition: A longitudinal panel study of an initial sample of individuals from a broad array of ages (and thus birth cohorts) interviewed or monitored with three or more follow-up waves.

The design allows a more rapid accumulation of information on age and cohort effects than does a single cohort follow-up study.
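A small arithmetic sketch makes the "more rapid accumulation" point concrete. It assumes a hypothetical design in which five cohorts enter at ages 65, 70, 75, 80, and 85 and each is followed for 10 years. These entry ages, the follow-up length, and the function names are invented for illustration and are not taken from any study cited here.

```python
def ages_covered(start_ages, follow_up_years):
    """Set of ages observed when every cohort is followed for the same span."""
    covered = set()
    for start in start_ages:
        covered.update(range(start, start + follow_up_years + 1))
    return covered


def years_needed_single_cohort(covered):
    """Follow-up years a single cohort would need to span the same age range."""
    return max(covered) - min(covered)


# Hypothetical accelerated design: five entry ages, 10 years of follow-up.
covered = ages_covered([65, 70, 75, 80, 85], follow_up_years=10)
age_span = (min(covered), max(covered))
single_cohort_years = years_needed_single_cohort(covered)
# Overlapping cohorts leave no unobserved ages inside the span.
no_gaps = covered == set(range(age_span[0], age_span[1] + 1))
```

Under these assumptions the panel observes every age in its span within 10 calendar years. Following one cohort across the same ages would take `single_cohort_years` years.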

67


Data Structure: Accelerated Longitudinal Panel Design

[Figure: diagram of the data structure, with axes labeled Cohort and Age (Time).]

68


Growth Curve Models of Individual Change:

Assess the intra-individual age changes and birth cohort differences simultaneously;

Assess differential cohort patterns in age changes: age-by-cohort interaction effects;

Period effects?

The time period for an accelerated longitudinal panel study often is short (e.g., a decade or so), so the effects of period usually can be ignored.

In growth curve models, age and time are the same variable, so the effects of period need not be estimated.

Thus, the analysis can focus on the age-by-cohort interactions. If period effects are of concern, estimate the HAPC-CCREM.

69

Illustrative Application: Cohort Variations in Age Trajectories of Depression in the Elderly (Yang 2007)

Research Questions

Does the age growth trajectory show an increase in depressive symptoms in late life?

Is there cohort heterogeneity in levels of depressive symptoms and age growth trajectories of depressive symptoms?

What social risk factors are associated with these effects?

Data

Established Populations for Epidemiologic Studies of the Elderly (EPESE) in North Carolina: A four-wave panel study of older adults aged 65+ from 1986 to 1996

70


Model Specification

Level-1 Repeated Observation Model:

$$Y_{ti} = \beta_{0i} + \beta_{1i}\,\text{Age}_{ti} + \sum_{p} \beta_{pi} X_{pti} + e_{ti}, \quad e_{ti} \overset{iid}{\sim} N(0, \sigma^2) \tag{11}$$

$Y_{ti}$ = CES-D for person i at time t, for i = 1, …, n and t = 1, …, $T_i$

$X_{pti}$ = (marital status, economic status, health status, stress and coping resources)

$\beta_{0i}$ = expected CES-D for person i

$\beta_{1i}$ = expected growth rate per year of age in CES-D for person i

$\beta_{pi}$ = regression coefficient associated with $X_{pti}$

71


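Model Specification

Level-2 Individual Model:

$$\beta_{0i} = \gamma_{00} + \gamma_{01}\,\text{Cohort}_i + \sum_{q} \gamma_{0q} Z_{qi} + r_{0i}$$

$$\beta_{1i} = \gamma_{10} + \gamma_{11}\,\text{Cohort}_i + r_{1i} \tag{12}$$

$$\begin{pmatrix} r_{0i} \\ r_{1i} \end{pmatrix} \overset{iid}{\sim} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tau_{0} & \tau_{01} \\ \tau_{10} & \tau_{1} \end{pmatrix} \right)$$

$Z_{qi}$ = (Female, Black, Education)

$\gamma_{00}$ = expected CES-D for person i for the reference group (at median age in Cohort 1 at T1)

$\gamma_{01}$ = main cohort effect coefficient: mean difference in CES-D between cohorts

$\gamma_{0q}$ = regression coefficient associated with $Z_{qi}$

$\gamma_{10}$ = age effect coefficient: expected rate of change in CES-D

$\gamma_{11}$ = age*cohort coefficient: mean difference in rate of change between cohorts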

72

Coefficient Estimation Using Restricted Maximum Likelihood-Empirical Bayes (REML-EB) and SAS PROC MIXED:

proc mixed data=depression_dat covtest cl;
  class ID;
  /* cesd holds the CES-D score; the name is assumed here because a hyphenated name such as CES-D is not a valid SAS variable name */
  model cesd = age cohort age*cohort x1-x10 / solution cl;
  /* Random intercept and age slope per person; TYPE=UN could be added to estimate their covariance */
  random intercept age / sub=ID solution;
  title "Final growth curve HAPC model of depression data";
run;

Source: code used in Yang (2007).

Note: all explanatory variables have been properly centered (around grand mean or group mean) for the intercept to be meaningful.

73


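Model Estimates

| Fixed Effect | Model 1 (Total) | Model 7 (Net) |
|---|---|---|
| Intercept, β0i: Intercept (γ00) | 2.856*** | 2.525*** |
| Intercept, β0i: Cohort (γ01) | 0.244*** | -0.213** |
| Growth Rate, β1i: Age (γ10) | 0.048*** | -0.018 |
| Growth Rate, β1i: Age * Cohort (γ11) | -0.019# | -0.040*** |

| Random Effect (Variance Component) | Model 1 (Total) | Model 7 (Net) | % Reduction |
|---|---|---|---|
| Level-1: Within person (σ²) | 36.987*** | 35.109*** | 5% |
| Level-2: In intercept (τ0) | 6.170*** | 3.763*** | 39% |
| Level-2: In growth rate (τ1) | 0.057*** | 0.051*** | 11% |

| Goodness-of-fit | Model 1 (Total) | Model 7 (Net) |
|---|---|---|
| AIC (smaller is better) | 51190.5 | 48167.4 |
| BIC (smaller is better) | 51215.6 | 48192.5 |

# p < .10; * p < .05; ** p < .01; *** p < .001.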
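The fixed-effect estimates on this slide can be turned into expected trajectories, and the "% Reduction" column can be recomputed from the variance components. The sketch below takes the reported numbers at face value and makes two labeled assumptions that the slides do not confirm: that `cohort` is coded as an integer offset from the reference cohort (0 = reference), and that `age_c` is age centered at the reference age. The covariates in Model 7 are held at zero, and the dictionary keys are shorthand for the intercept, cohort, age, and age-by-cohort coefficients.

```python
# Fixed-effect estimates as reported on the slide.
MODEL1 = {"g00": 2.856, "g01": 0.244, "g10": 0.048, "g11": -0.019}
MODEL7 = {"g00": 2.525, "g01": -0.213, "g10": -0.018, "g11": -0.040}


def growth_rate(est, cohort):
    """Expected change in CES-D per year of age for a given cohort code."""
    return est["g10"] + est["g11"] * cohort


def expected_cesd(est, age_c, cohort):
    """Expected CES-D at centered age age_c (assumed coding, see lead-in)."""
    return est["g00"] + est["g01"] * cohort + growth_rate(est, cohort) * age_c


def pct_reduction(before, after):
    """Percent of a variance component explained when moving between models."""
    return 100.0 * (before - after) / before


# Variance components: (Model 1, Model 7), as reported on the slide.
components = {
    "within_person": (36.987, 35.109),
    "intercept": (6.170, 3.763),
    "growth_rate": (0.057, 0.051),
}
reductions = {name: round(pct_reduction(m1, m7))
              for name, (m1, m7) in components.items()}

reference_level = expected_cesd(MODEL1, age_c=0, cohort=0)
slope_cohort_2 = growth_rate(MODEL1, cohort=2)
```

The rounded reductions reproduce the 5%, 39%, and 11% shown in the table. The trajectory values depend on the assumed coding and should be read as illustrative only.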

74

Expected Growth Trajectories and Cohort Variations in Depression

[Figure: two panels plotting CES-D (1 to 5) against Age. a. Model 1: Gross Age and Cohort Effects. b. Model 7: Net Age and Cohort Effects. Legend: All, cohort 1, cohort 2, cohort 3, cohort 4, cohort 5.]

75


Summary of Findings:

The gross age trajectory of depressive symptoms during late life is positive and linear.

There is substantial cohort heterogeneity in both average levels of depressive symptoms and age growth trajectories of depressive symptoms.

The age growth trajectories of depressive symptoms are not significant after adjusting for cohort effect and risk factors associated with historical trends in education, life course stages, survival, health decline, stress and coping resources.

Net of all the factors considered, more recent birth cohorts have higher levels of depression.

76

Conclusion

Copies of all of our papers referenced in this presentation as well as others can be obtained from the Webpage:

http://home.uchicago.edu/~yangy/apc

Happy Hunting for Distinct Age, Period, and Cohort Effects!
