
S052/III.2(a) – Slide 1

© Willett, Harvard University Graduate School of Education, 04/22/2023

S052/III.2(a): Applied Data Analysis
Where Does Today’s Topic Area Fall, Within The Overall Roadmap of the Course?

More details can be found in the “Course Objectives and Content” handout on the course webpage.

Multiple Regression Analysis (MRA)

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

Do your residuals meet the required assumptions?
• Test for residual normality.
• Use influence statistics to detect atypical datapoints.

If your residuals are not independent, replace OLS by GLS regression analysis:
• Use individual growth modeling.
• Specify a multi-level model.

If your sole predictor is continuous, MRA is identical to correlational analysis.

If your sole predictor is dichotomous, MRA is identical to a t-test.

If your several predictors are categorical, MRA is identical to ANOVA.

If time is a predictor, you need discrete-time survival analysis…

If your outcome is categorical, you need to use…
• Binomial logistic regression analysis (dichotomous outcome).
• Multinomial logistic regression analysis (polytomous outcome).

If you have more predictors than you can deal with,
• Create taxonomies of fitted models and compare them.
• Form composites of the indicators of any common construct.
• Conduct a principal components analysis.
• Use cluster analysis.
• Use factor analysis: EFA or CFA?

If your outcome vs. predictor relationship is non-linear,
• Use non-linear regression analysis.
• Transform the outcome or predictor.

Today’s Topic Area

S052/III.2(a) – Slide 2

S052/III.2(a): Exploratory Cluster Analysis of Variables
How Does Today’s Topic Map Onto The Printed Syllabus?

Please check inter-connections among the Roadmap, the Daily Topic Area, the Printed Syllabus, and the content of today’s class when you pre-read the day’s materials.

• Taking a Different Perspective on the Standard PCA Solution (Slides 4-11).
• The Cluster Analysis of Variables (Slides 13-20).
• Which Strategy For Forming Composites Of Multiple Indicators Is The “Best”? (Slide 22).

S052/III.2(a) – Slide 3

S052/III.2(a): Exploratory Cluster Analysis of Variables
How Does Today’s Topic Map Onto The Printed Syllabus?

• Taking a Different Perspective on the Standard PCA Solution (Slides 4-11).

S052/III.2(a) – Slide 4

S052/III.2(a): Exploratory Cluster Analysis of Variables
Taking a Different Perspective on the Standard PCA Solution

Here’s a dataset containing teachers’ responses to what the investigators believed were multiple indicators of a single underlying construct of Teacher Job Satisfaction. The data are described in TSUCCESS_info.pdf.

Dataset: TSUCCESS.txt

Overview: Responses of a national sample of teachers to six questions about job satisfaction.

Source: Administrator and Teacher Survey of the High School and Beyond (HS&B) dataset, 1984 administration, National Center for Education Statistics (NCES). All NCES datasets are also available free from the EdPubs on-line supermarket.

Sample Size: 5269 teachers (4955 with complete data).

More Info: HS&B was established to study educational, vocational, and personal development of young people beginning in their elementary or high school years and following them over time as they began to take on adult responsibilities. The HS&B survey included two cohorts: (a) the 1980 senior class, and (b) the 1980 sophomore class. Both cohorts were surveyed every two years through 1986, and the 1980 sophomore class was also surveyed again in 1992.

[Image caption: Principal Component and Frau Himmler]

S052/III.2(a) – Slide 5

S052/III.2(a): Exploratory Cluster Analysis of Variables
Taking a Different Perspective on the Standard PCA Solution

Recall our earlier scree plot inspection of the eigenvalues from the teacher satisfaction example…

Eigenvalues of the Correlation Matrix

     Eigenvalue    Difference    Proportion   Cumulative
1    2.60599489    1.39439026      0.4343       0.4343
2    1.21160463    0.49880170      0.2019       0.6363
3    0.71280293    0.11761825      0.1188       0.7551
4    0.59518468    0.14741881      0.0992       0.8543
5    0.44776587    0.02111886      0.0746       0.9289
6    0.42664701                    0.0711       1.0000

Component Number    Eigenvalue
       1              2.606
       2              1.212
       3              0.713
       4              0.595
       5              0.448
       6              0.426

[Figure: scree plot of Eigenvalue (vertical axis, 0 to 3) against Component # (horizontal axis, 0 to 8), annotated “Looking for the scree?”]

We concluded that this scree plot suggested there may be two important dimensions of information being measured by the six indicators as a group.
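The Proportion and Cumulative columns of the eigenvalue table follow directly from the eigenvalues themselves. Here is a minimal sketch in Python (not part of the course handouts, which use STATA) that reproduces them, assuming only that each of the six standardized indicators contributes one unit of variance:

```python
# Eigenvalues of the correlation matrix, copied from the PCA output on this slide.
eigenvalues = [2.60599489, 1.21160463, 0.71280293,
               0.59518468, 0.44776587, 0.42664701]

# Six standardized indicators, one unit of variance each, so the total is 6.
total = sum(eigenvalues)

# Share of the total standardized variance carried by each component.
proportions = [ev / total for ev in eigenvalues]

# Running total of those shares.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

for k, (ev, p, c) in enumerate(zip(eigenvalues, proportions, cumulative), start=1):
    print(f"Comp{k}: eigenvalue={ev:.4f}  proportion={p:.4f}  cumulative={c:.4f}")
```

The first two components together carry a little under 64% of the six units of standardized variance, which is the basis for focusing on two dimensions.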

S052/III.2(a) – Slide 6

S052/III.2(a): Exploratory Cluster Analysis of Variables
Taking a Different Perspective on the Standard PCA Solution

Principal components (eigenvectors)
------------------------------------------------------------------------
 Variable |    Comp1     Comp2     Comp3     Comp4     Comp5     Comp6
----------+-------------------------------------------------------------
       X1 |   0.3472    0.6182    0.0896    0.0264    0.6261    0.3108
       X2 |   0.3617    0.5950    0.0543   -0.0217   -0.6685   -0.2548
       X3 |   0.3778   -0.3021    0.7555    0.4028    0.0503   -0.1746
       X4 |   0.4144   -0.1807   -0.5972    0.6510   -0.0493    0.1129
       X5 |   0.4727   -0.2067   -0.2418   -0.4501    0.3022   -0.6176
       X6 |   0.4591   -0.3117    0.0558   -0.4584   -0.2548    0.6433
------------------------------------------------------------------------

This suggests that the 1st and 2nd eigenvectors are most interesting, and that perhaps we can ignore the rest …

Previously, we’ve interpreted the elements of these eigenvectors as representing how each of the six original (standardized) indicators is weighted in the orthogonal composite variables PC_1 & PC_2.

Each indicator loads on PC_1 & PC_2 in different ways and, by inspecting the magnitude and direction of the loadings, we have concluded that PC_1 & PC_2 measure Teacher Enthusiasm and Teacher Frustration, respectively.

But now, let’s adopt a different perspective:
• Rather than trying to interpret PC_1 and PC_2 separately as composite variables that measure uncorrelated features of teacher job satisfaction,
• Let’s regard PC_1 and PC_2 as defining orthogonal directions in an underlying two-dimensional space, in which the six original indicators can now be plotted efficiently.
• Let’s try to imagine what the six original variables “look like” in that reduced space.
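Treating PC_1 and PC_2 as axes only makes sense if the two eigenvectors really are perpendicular directions of unit length. A quick check in Python (a sketch, not from the handouts), using the Comp1 and Comp2 columns of the eigenvector table as printed, to four decimals:

```python
# Comp1 and Comp2 columns of the eigenvector table, rows X1 through X6.
comp1 = [0.3472, 0.3617, 0.3778, 0.4144, 0.4727, 0.4591]
comp2 = [0.6182, 0.5950, -0.3021, -0.1807, -0.2067, -0.3117]


def dot(u, v):
    """Sum of element-by-element products of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


length1_sq = dot(comp1, comp1)   # squared length of the first eigenvector
length2_sq = dot(comp2, comp2)   # squared length of the second eigenvector
cross = dot(comp1, comp2)        # zero if the two directions are orthogonal

print(f"squared length of Comp1: {length1_sq:.4f}")
print(f"squared length of Comp2: {length2_sq:.4f}")
print(f"Comp1 . Comp2:           {cross:.4f}")
```

Up to rounding in the printed loadings, both squared lengths are 1 and the cross-product is 0, so the two components can serve as orthogonal axes.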

S052/III.2(a) – Slide 7

S052/III.2(a): Exploratory Cluster Analysis of Variables
Taking a Different Perspective on the Standard PCA Solution

Eigenvectors
                                              Comp1      Comp2
X1  Have high standards of teaching          0.3472     0.6182
X2  Continually learning on job              0.3617     0.5950
X3  Successful in educating students         0.3778    -0.3021
X4  Waste of time to do best as teacher      0.4144    -0.1807
X5  Look forward to working at school        0.4727    -0.2067
X6  Time satisfied with job                  0.4591    -0.3117

This is easiest to imagine by plotting the elements of the eigenvectors on the same plot, as follows ...

[Figure: scatterplot of the six indicators, with axes “Loadings on Comp1” and “Loadings on Comp2”.]
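One simple way to read the plot is to ask which indicators sit on the same side of the Comp2 axis. The rule used in this Python sketch (splitting on the sign of the Comp2 loading) is an assumption made for illustration, not a rule stated on the slides:

```python
# (Comp1 loading, Comp2 loading) for each indicator, from the eigenvector table.
loadings = {
    "X1": (0.3472, 0.6182),
    "X2": (0.3617, 0.5950),
    "X3": (0.3778, -0.3021),
    "X4": (0.4144, -0.1807),
    "X5": (0.4727, -0.2067),
    "X6": (0.4591, -0.3117),
}

# Illustrative grouping rule: split the indicators by the sign of their Comp2 loading.
positive_group = sorted(name for name, (_, c2) in loadings.items() if c2 > 0)
negative_group = sorted(name for name, (_, c2) in loadings.items() if c2 <= 0)

print("Positive on Comp2:", positive_group)
print("Negative on Comp2:", negative_group)
```

All six indicators load positively on Comp1, so Comp2 is what separates them: X1 and X2 fall on one side and X3 through X6 on the other.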

S052/III.2(a) – Slide 8

S052/III.2(a): Exploratory Cluster Analysis of Variables
An Example of The Cluster Analysis of Variables

Here’s the STATA code for Handout III.2(a).1, in which I regroup the indicators and composite them within sensible groups …

    <Usual data-input statements omitted> …
    *------------------------------------------------------------------------------
    * Carry-out the principal components analysis interactively, in successively
    * smaller groups of variables, selected based on the prior pca.
    *------------------------------------------------------------------------------

    * First pca of all the indicators, to determine the initial structure:
      pca X1-X6, means

    * Second pass, within groups of indicators established in the first pass:
    * Group #1, output scores on the first component in the group:
      pca X1 X2, means
      predict GP1_PC1
    * Group #2, output scores on the first component in the group:
      pca X3 X4 X5 X6, means
      predict GP2_PC1

    *------------------------------------------------------------------------------
    * Inspect the properties of composite scores obtained.
    *------------------------------------------------------------------------------
    * List out the indicator and principal component scores for first 35 teachers:
      list X1-X6 GP1_PC1 GP2_PC1 in 1/35, nolabel
    * Estimate univariate descriptive statistics for the two composite scores:
      tabstat GP1_PC1 GP2_PC1, stat(n mean sd) columns(statistics)
    * Estimate the bivariate correlation between the two composite scores:
      pwcorr GP1_PC1 GP2_PC1, sig obs

• First Pass, to provide the initial principal components analysis of all six indicators.
• Second Pass to composite indicators in Group #1, consisting of variables X1 & X2, and provide composites with prefix GP1_ . Hopefully, a single composite will capture most of the variation in X1 & X2.
• Second Pass to composite indicators in Group #2, consisting of variables X3, X4, X5 & X6, and provide composites with prefix GP2_ . Hopefully, a single composite will again capture most of the important variation in X3, X4, X5 & X6.
• Inspect the statistical properties of the obtained “sub-group” composites:
  - List out the values of a few cases.
  - Obtain univariate descriptive statistics on each composite.
  - Estimate the bivariate correlation between the composites.

S052/III.2(a) – Slide 9

S052/III.2(a): Exploratory Cluster Analysis of Variables
An Example of The Cluster Analysis of Variables

Here’s the PCA output for the principal components analysis of the first group of indicators (X1 & X2) …

Principal components/correlation          Number of obs    =   5058
                                          Number of comp.  =      2
                                          Trace            =      2
Rotation: (unrotated = principal)         Rho              = 1.0000

--------------------------------------------------------------------------
   Component |   Eigenvalue   Difference    Proportion   Cumulative
-------------+------------------------------------------------------------
       Comp1 |      1.55199      1.10397        0.7760       0.7760
       Comp2 |      .448013            .        0.2240       1.0000
--------------------------------------------------------------------------

Principal components (eigenvectors)
-----------------------------------
    Variable |    Comp1     Comp2
-------------+---------------------
          X1 |   0.7071    0.7071
          X2 |   0.7071   -0.7071
-----------------------------------

Successful first principal component of X1 & X2, containing almost 78% of the initial two units of standardized variance.

$$GP1\_PC1_i = 0.71\,X^{*}_{1i} + 0.71\,X^{*}_{2i}$$

Teachers who score high on this composite…
• Have high standards of teaching performance.
• Feel that they are continually learning on the job.

A composite measure of TEACHER PERFORMANCE?
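The equal weights of 0.7071 are not a coincidence. For any two standardized variables with a positive correlation r, the correlation matrix has eigenvalues 1 + r and 1 - r, and the first eigenvector weights both variables by 1/sqrt(2). A short Python sketch (not from the handouts) that works backwards from the printed eigenvalues:

```python
import math

# Eigenvalues printed in the PCA output for X1 & X2.
first_eigenvalue = 1.55199
second_eigenvalue = 0.448013

# For two standardized variables the eigenvalues are 1 + r and 1 - r,
# so the correlation between X1 and X2 can be read off the first eigenvalue.
implied_r = first_eigenvalue - 1.0

# Consistency check against the second printed eigenvalue.
second_from_r = 1.0 - implied_r

# Each variable gets the same weight in the first eigenvector.
weight = 1.0 / math.sqrt(2.0)

# Share of the two units of standardized variance held by the first component.
proportion = first_eigenvalue / 2.0

print(f"implied correlation of X1 and X2: {implied_r:.4f}")
print(f"second eigenvalue from 1 - r:     {second_from_r:.4f}")
print(f"eigenvector weight 1/sqrt(2):     {weight:.4f}")
print(f"proportion in first component:    {proportion:.4f}")
```

So, in this sample of 5058 teachers, the “almost 78%” figure is just another way of saying that X1 and X2 correlate at about 0.55.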

S052/III.2(a) – Slide 10

S052/III.2(a): Exploratory Cluster Analysis of Variables
An Example of The Cluster Analysis of Variables

Here’s the PCA output for the principal components analysis of the second group of indicators (X3 thru X6) …

Principal components/correlation          Number of obs    =   5031
                                          Number of comp.  =      4
                                          Trace            =      4
Rotation: (unrotated = principal)         Rho              = 1.0000

--------------------------------------------------------------------------
   Component |   Eigenvalue   Difference    Proportion   Cumulative
-------------+------------------------------------------------------------
       Comp1 |      2.25102      1.52944        0.5628       0.5628
       Comp2 |      .721572      .127624        0.1804       0.7431
       Comp3 |      .593948      .160484        0.1485       0.8916
       Comp4 |      .433464            .        0.1084       1.0000
--------------------------------------------------------------------------

Principal components (eigenvectors)
-------------------------------------------------------
    Variable |    Comp1     Comp2     Comp3     Comp4
-------------+-----------------------------------------
          X3 |   0.4509    0.7636    0.4248    0.1820
          X4 |   0.4687   -0.5960    0.6358   -0.1447
          X5 |   0.5344   -0.2260   -0.4516    0.6778
          X6 |   0.5398    0.1033   -0.4599   -0.6975
-------------------------------------------------------

Successful first principal component of X3, X4, X5 & X6, containing 56% of the initial four units of standardized variance.

$$GP2\_PC1_i = 0.45\,X^{*}_{3i} + 0.47\,X^{*}_{4i} + 0.53\,X^{*}_{5i} + 0.54\,X^{*}_{6i}$$

Teachers who score high on this composite…
• Believe they are successful in educating students.
• Feel that it is not a waste of time to be a teacher.
• Look forward to working at school.
• Are always satisfied on the job.

A composite measure of TEACHER FEELINGS?

S052/III.2(a) – Slide 11

S052/III.2(a): Exploratory Cluster Analysis of Variables
An Example of The Cluster Analysis of Variables

 variable |    N    mean      sd
----------+----------------------
  GP1_PC1 | 5058       0   1.246
  GP2_PC1 | 5031       0   1.500
---------------------------------

             |  GP1_PC1   GP2_PC1
-------------+--------------------
     GP1_PC1 |   1.0000
             |     5058
             |
     GP2_PC1 |   0.3245    1.0000
             |     4955      5031

  +-------------------------------------------------+
  | X1   X2   X3   X4   X5   X6   GP1_PC1   GP2_PC1 |
  |-------------------------------------------------|
  |  5    5    3    3    4    2     1.074    -1.404 |
  |  4    3    2    1    1    2    -0.711    -3.842 |
  |  4    4    2    2    2    2    -0.143    -3.159 |
  |  .    6    3    5    3    3         .    -0.299 |
  |  4    4    3    2    4    3    -0.143    -0.740 |
  |-------------------------------------------------|
  |  .    5    2    4    3    3         .    -1.251 |
  |  4    4    4    4    5    3    -0.143     0.894 |
  |  6    4    4    1    1    2     1.154    -2.500 |
  |  6    6    3    6    5    3     2.291     0.785 |
  |  3    5    3    6    3    3    -0.223    -0.018 |
  |-------------------------------------------------|
  |  4    2    1    3    2    2    -1.279    -3.550 |
  |  5    6    2    6    6    4     1.642     1.460 |
  |  4    3    3    2    5    3    -0.711    -0.339 |
  |  3    3    3    3    4    3    -1.360    -0.459 |
  |  4    4    3    6    3    2    -0.143    -0.963 |
  …

Everyone with complete data has a 1st component score on each new grouping of indicators.

But, because they were obtained in separate PCAs, the two composite scores are no longer uncorrelated with each other (r = 0.3245).
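The standard deviations in the descriptive statistics tie back to the earlier PCA output: a principal component score built from a unit-length eigenvector has variance equal to its eigenvalue. A small Python check (a sketch, not from the handouts), using the first eigenvalues printed for the two groups:

```python
import math

# First eigenvalues from the two within-group PCAs (Slides 9 and 10).
first_eigenvalue_gp1 = 1.55199   # PCA of X1 & X2
first_eigenvalue_gp2 = 2.25102   # PCA of X3, X4, X5 & X6

# The variance of a component score equals its eigenvalue,
# so the standard deviation is the square root of the eigenvalue.
sd_gp1 = math.sqrt(first_eigenvalue_gp1)
sd_gp2 = math.sqrt(first_eigenvalue_gp2)

print(f"sd of GP1_PC1: {sd_gp1:.3f}")
print(f"sd of GP2_PC1: {sd_gp2:.3f}")
```

These agree with the sd column reported by tabstat (1.246 and 1.500).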

S052/III.2(a) – Slide 12

S052/III.2(a): Exploratory Cluster Analysis of Variables
How Does Today’s Topic Map Onto The Printed Syllabus?

• The Cluster Analysis of Variables (Slides 13-20).

S052/III.2(a) – Slide 13

S052/III.2(a): Exploratory Cluster Analysis of Variables
An Example of The Cluster Analysis of Variables

There’s a routine in STATA that conducts a similar clustering of variables automatically. It’s called “clv”… and its use is featured in Data-Analytic Handout III.2(a).2 …

    < usual data-input statements have been omitted … >
    *--------------------------------------------------------------------------------
    * Conducting a cluster analysis of variables.
    *--------------------------------------------------------------------------------
    * Before you execute the rest of this code, make sure the STATA user-supported
    * routine "clv" is available on your workstation. Check by typing "help clv."
    *
    * Now, perform a cluster analysis of all six indicators of teacher satisfaction:
      clv X1 X2 X3 X4 X5 X6, textsize(small)

    *--------------------------------------------------------------------------------
    * Some important ancillary PCA analyses
    *--------------------------------------------------------------------------------
    * To gain insight into the "clv" clustering process, it's useful to conduct some
    * selected ancillary pca analyses, which mirror the critical steps in the "clv"
    * algorithm itself.
    * First, we must conduct a listwise deletion of cases with missing values to
    * ensure that the sample for the ancillary analyses is identical to that used in
    * the clv application, as follows:
      dropmiss, obs any
    * The following steps mirror the steps of the clv process. However, the clv
    * routine carries out far more subsidiary PCA analyses than are listed below,
    * in order to make its critical clustering decisions. But, these steps are the
    * critical decision steps whose consequences appear as summary statistics in the
    * clv output that you will obtain above.
    * Step #1: Combine X1 and X2 to form Object#7:
      pca X1 X2
    * Step #2: Combine X5 and X6 to form Object#8:
      pca X5 X6
    * Step #3: Combine X4 and Object#8(X5,X6) to form Object#9:
      pca X4 X5 X6
    * Step #4: Combine X3 and Object#9(X4,(X5,X6)) to form Object#10:
      pca X3 X4 X5 X6
    * Step #5: Combine Object#7(X1,X2) and Object#10(X3,(X4,(X5,X6))) to form Object#11:
      pca X1 X2 X3 X4 X5 X6

• The clv command calls on the routine to cluster indicators X1 through X6.
• Before you can use the “clv” routine, you must download it into your version of STATA, because it is a user-supported routine. Additional instructions are provided in the comments of the Data-Analytic Handout itself.
• The clv routine works by conducting multiple PCA’s, so we can gain insight into its functioning by conducting a few ourselves.
• You’ve got to carry out listwise deletion of cases with missing data first, to ensure comparability of output.
• The five pca commands are the PCA’s that correspond to the decision steps in the clv analysis.
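To get a feel for why the routine “carries out far more subsidiary PCA analyses” than the five listed in the code, here is a rough count in Python. It assumes (this is an assumption for illustration, not documented behavior of clv) that every remaining pair of objects is evaluated afresh at each step:

```python
import math

# Number of objects available at the start of each of the five steps:
# six original indicators, then one fewer after every merge.
objects_at_step = [6, 5, 4, 3, 2]

# Under the stated assumption, each step considers every pair of remaining objects.
pairs_per_step = [math.comb(n, 2) for n in objects_at_step]
total_pairs = sum(pairs_per_step)

for step, (n, pairs) in enumerate(zip(objects_at_step, pairs_per_step), start=1):
    print(f"Step {step}: {n} objects, {pairs} candidate pairs")
print(f"Total candidate pairings across all steps: {total_pairs}")
```

Only one pairing per step is actually adopted, which is why five ancillary pca commands are enough to mirror the decision steps.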

S052/III.2(a) – Slide 14

S052/III.2(a): Exploratory Cluster Analysis of Variables
An Example of The Cluster Analysis of Variables

The cluster solution is easier to comprehend if it is plotted as a tree diagram or dendrogram:

[Figure: dendrogram titled “Clustering around Latent Variables (CLV)”. Vertical axis: Unexplained Variance (ticks at 0, 25 and 50%). Horizontal axis: Variables X1 through X6, labeled “Have high st~g”, “Continually ~b”, “Successful i~s”, “Waste of tim~c”, “Look forward~o” and “Time satisfi~b”.]

The vertical axis displays the percentage of the total standardized variance in the original indicators that is not contained in the composites that have been formed, at this level of clustering … as follows:
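The heights on the dendrogram’s vertical axis are the complement of the Explained Variance column in the clv output. A minimal Python sketch of that conversion (not from the handouts), using the five percentages printed by clv:

```python
# "Explained Variance" percentages from the clv output, Steps 1 through 5.
explained_pct = [92.581, 85.128, 74.853, 63.477, 43.433]

# Unexplained variance is whatever is left over out of 100%.
unexplained_pct = [100.0 - e for e in explained_pct]

for step, (e, u) in enumerate(zip(explained_pct, unexplained_pct), start=1):
    print(f"Step {step}: explained={e:.3f}%  unexplained={u:.3f}%")
```

Each successive merge sits higher on the tree because each one gives up a little more of the original six units of standardized variance.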

S052/III.2(a) – Slide 15

S052/III.2(a): Exploratory Cluster Analysis of Variables
An Example of The Cluster Analysis of Variables

Here’s the clustering process:

--------------------------------
TOTAL VARIANCE: 6.00000
NUMBER OF INDIVIDUALS: 4955
METHOD: CLASSICAL
-----------------------------------------------------------------------
        # of                                               Explained
Step  clusters   Child 1   Child 2   Parent    T value     Variance
-----------------------------------------------------------------------
  1       5        X1        X2         7      5.5548      92.581%
  2       4        X5        X6         8      5.1077      85.128%
  3       3        X4         8         9      4.4912      74.853%
  4       2        X3         9        10      3.8086      63.477%
  5       1         7        10        11      2.6060      43.433%
-----------------------------------------------------------------------

Before the Clustering begins…
• There are 6 original “Objects”: Indicators X1 thru X6, referred to, oddly, as “children.”
• Each contributes one unit of original standardized variability to the compositing process.
• Thus, the total sum of original standardized variance: T = 1 + 1 + 1 + 1 + 1 + 1 = 6

PCA of X1 & X2
Rotation: (unrotated = principal)
-----------------------------------------
   Component |   Eigenvalue   Difference
-------------+---------------------------
       Comp1 |      1.55484      1.10968
       Comp2 |      .445159            .
-----------------------------------------

Eigenvectors
----------------------------------
    Variable |    Comp1     Comp2
-------------+--------------------
          X1 |   0.7071    0.7071
          X2 |   0.7071   -0.7071
----------------------------------

First Step …
• PCA is conducted on each of all-possible pairs of objects, and the value of the first eigenvalue is noted in each analysis.
• That pair of objects which can be combined best is identified: Here, objects X1 & X2 have the largest first eigenvalue of any pair of objects at this step (1.5548).
• They are then joined and treated as a single object from here on, named Object #7 (see “Parent” column).
• There are now five objects remaining: Original Objects X3, X4, X5 & X6, and newly formed Object #7, a cluster of X1 & X2.
• Total variability in remaining objects is now: T = 1.5548 + 1 + 1 + 1 + 1 = 5.5548 units (or 92.58% of 6).
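The bookkeeping in this first step carries over to every later step. As a sketch, in notation that is assumed here and does not appear on the slides (λ₁(g) for the first eigenvalue from a PCA of the indicators inside object g, with λ₁ = 1 for an indicator that has not yet been merged, and p for the number of original indicators):

```latex
T_k = \sum_{g \,\in\, \text{objects after step } k} \lambda_1(g),
\qquad
\text{Explained Variance}_k = 100\% \times \frac{T_k}{p}
% Step 1 of the example: T_1 = 1.5548 + 1 + 1 + 1 + 1 = 5.5548,
% so Explained Variance_1 = 100\% \times 5.5548 / 6 = 92.58\%.
```

With p = 6, this reproduces the T value and Explained Variance reported for Step 1 of the clv output.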

S052/III.2(a) – Slide 16

S052/III.2(a): Exploratory Cluster Analysis of Variables
An Example of The Cluster Analysis of Variables

Here’s the clustering process:

PCA of X5 & X6
Rotation: (unrotated = principal)
-----------------------------------------
   Component |   Eigenvalue   Difference
-------------+---------------------------
       Comp1 |      1.55286      1.10573
       Comp2 |      .447136            .
-----------------------------------------

Eigenvectors
----------------------------------
    Variable |    Comp1     Comp2
-------------+--------------------
          X5 |   0.7071    0.7071
          X6 |   0.7071   -0.7071
----------------------------------

Second Step …
• PCA is conducted on each of all-possible remaining pairs of objects, and the value of the first eigenvalue is noted in each analysis.
• That pair of objects which can be combined best is identified: Here, objects X5 & X6 have the largest first eigenvalue of any pair of objects at this step (1.5529).
• They are then joined and treated as a single object from here on, named Object #8 (see “Parent” column).
• There are now four objects remaining: Original Objects X3 & X4, Object #7, and newly formed Object #8, a cluster of X5 & X6.
• Total variability in remaining objects is now: T = 1.5548 + 1 + 1 + 1.5529 = 5.1077 units (or 85.13% of 6).

Page 17: More details can be found in the “ Course Objectives and Content ” handout on the course webpage

--------------------------------TOTAL VARIANCE: 6.00000NUMBER OF INDIVIDUALS: 4955METHOD: CLASSICAL------------------------------------------------------------------------------------------------------- # of T ExplainedStep clusters Child 1 Child 2 Parent T value Variance----------------------------------------------------------------------- 1 5 X1 X2 7 5.5548 92.581% 2 4 X5 X6 8 5.1077 85.128% 3 3 X4 8 9 4.4912 74.853% 4 2 X3 9 10 3.8086 63.477% 5 1 7 10 11 2.6060 43.433%-----------------------------------------------------------------------

PCA of X4, X5 & X6 Rotation: (unrotated = principal) ----------------------------------------- Component | Eigenvalue Difference -------------+--------------------------- Comp1 | 1.93635 1.31481 Comp2 | .621538 .179423 Comp3 | .442115 . -----------------------------------------

Eigenvectors ---------------------------------- Variable | Comp1 Comp2 -------------+-------------------- X4 | 0.5392 0.8298 X5 | 0.6043 -0.2618 X6 | 0.5867 -0.4929 ----------------------------------

© Willett, Harvard University Graduate School of Education, 04/22/2023

S052/III.2(a) – Slide 17

S052/III.2(a): Exploratory Cluster Analysis of Variables An Example of The Cluster Analysis of Variables

Here’s the clustering process:

Third Step … PCA is conducted on every possible pair of the remaining objects.

The first eigenvalue is noted in each analysis, and the pair of objects that combines best is identified.

Here, X4 & Object #8 have the largest first eigenvalue of any pair of objects at this step (1.9364).

They are then joined and treated as a single object from here on, named Object #9 (see “Parent” column).

There are now three objects remaining: original object X3, Object #7, and the newly formed Object #9, a cluster of X4, X5 & X6.

Total variability in the remaining objects is now: T = 1.5548 + 1 + 1.9364 = 4.4912 units (or 74.85% of 6).

PCA of X3, X4, X5 & X6

Rotation: (unrotated = principal)
-----------------------------------------
   Component |  Eigenvalue   Difference
-------------+---------------------------
       Comp1 |     2.25375      1.53467
       Comp2 |     .719086      .124064
       Comp3 |     .595022      .162881
       Comp4 |     .432141            .
-----------------------------------------

Eigenvectors
----------------------------------
    Variable |    Comp1     Comp2
-------------+--------------------
          X3 |   0.4499    0.7759
          X4 |   0.4700   -0.5791
          X5 |   0.5337   -0.2340
          X6 |   0.5402    0.0889
----------------------------------

S052/III.2(a) – Slide 18

S052/III.2(a): Exploratory Cluster Analysis of Variables An Example of The Cluster Analysis of Variables

Here’s the clustering process:

Fourth Step … PCA is conducted on every possible pair of the remaining objects.

The first eigenvalue is noted in each analysis, and the pair of objects that combines best is identified.

Here, X3 & Object #9 have the largest first eigenvalue of any pair of objects at this step (2.2538).

They are then joined and treated as a single object from here on, named Object #10 (see “Parent” column).

There are now two objects remaining: Object #7 and the newly formed Object #10, a cluster of X3, X4, X5 & X6.

Total variability in the remaining objects is now: T = 1.5548 + 2.2538 = 3.8086 units (or 63.48% of 6).

PCA of X1, X2, X3, X4, X5 & X6

Rotation: (unrotated = principal)
-----------------------------------------
   Component |  Eigenvalue   Difference
-------------+---------------------------
       Comp1 |     2.60599      1.39439
       Comp2 |      1.2116      .498802
       Comp3 |     .712803      .117618
       Comp4 |     .595185      .147419
       Comp5 |     .447766     .0211189
       Comp6 |     .426647            .
-----------------------------------------

Eigenvectors
----------------------------------
    Variable |    Comp1     Comp2
-------------+--------------------
          X1 |   0.3472    0.6182
          X2 |   0.3617    0.5950
          X3 |   0.3778   -0.3021
          X4 |   0.4144   -0.1807
          X5 |   0.4727   -0.2067
          X6 |   0.4591   -0.3117
----------------------------------
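As a quick sanity check on the six-variable PCA output, the short Python sketch below uses only the numbers printed above. It checks that the eigenvalues sum to the total standardized variance of 6, that the first component's share matches the 43.43% reported for the final clustering step, and that the first eigenvector has unit length. The variable names are illustrative only.

```python
import math

# Eigenvalues and first-component loadings as printed in the six-variable PCA.
eigenvalues = [2.60599, 1.2116, 0.712803, 0.595185, 0.447766, 0.426647]
comp1 = [0.3472, 0.3617, 0.3778, 0.4144, 0.4727, 0.4591]

total = sum(eigenvalues)                        # should be close to 6
share = 100 * eigenvalues[0] / 6                # % of variance in the first component
length = math.sqrt(sum(w * w for w in comp1))   # should be close to 1

print(f"sum of eigenvalues    = {total:.4f}")
print(f"first-component share = {share:.2f}%")
print(f"eigenvector length    = {length:.4f}")
```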

S052/III.2(a) – Slide 19

S052/III.2(a): Exploratory Cluster Analysis of Variables An Example of The Cluster Analysis of Variables

Here’s the clustering process:

Fifth Step … PCA is conducted on the one remaining pair of objects.

The first eigenvalue is noted, as in the earlier steps.

Here, Object #7 & Object #10, the only pair left, have a first eigenvalue of 2.6060.

They are then joined and treated as a single object, named Object #11 (see “Parent” column).

There is now one object remaining: the newly formed Object #11, a cluster of X1, X2, X3, X4, X5 & X6.

Total variability in the remaining object is now: T = 2.6060 units (or 43.43% of 6).
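The running total T can be replayed from the first eigenvalues quoted at each step. The Python sketch below is plain bookkeeping rather than a clustering routine: each original indicator starts with a first eigenvalue of 1, and each merge replaces two objects with one parent carrying the quoted eigenvalue. The value 1.5548 for Object #7 is taken from the T sums shown in the steps.

```python
# Each standardized indicator on its own has a first eigenvalue of 1.
objects = {name: 1.0 for name in ("X1", "X2", "X3", "X4", "X5", "X6")}

# (child 1, child 2, parent, first eigenvalue of the merged object), as quoted.
merges = [
    ("X1", "X2", "7", 1.5548),
    ("X5", "X6", "8", 1.5529),
    ("X4", "8", "9", 1.9364),
    ("X3", "9", "10", 2.2538),
    ("7", "10", "11", 2.6060),
]

t_values = []
for child1, child2, parent, eigenvalue in merges:
    del objects[child1], objects[child2]   # the two children disappear
    objects[parent] = eigenvalue           # the parent takes their place
    t = sum(objects.values())              # T = sum over remaining objects
    t_values.append(t)
    print(f"{len(objects)} objects: T = {t:.4f} ({100 * t / 6:.2f}% of 6)")
```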

--------------------------------------------------------------------
        # of                                       T        Explained
Step    clusters   Child 1   Child 2   Parent      value    Variance
--------------------------------------------------------------------
  1        5       X1        X2           7        5.5548    92.581%
  2        4       X5        X6           8        5.1077    85.128%
  3        3       X4        8            9        4.4912    74.853%
  4        2       X3        9           10        3.8086    63.477%
  5        1       7         10          11        2.6060    43.433%
--------------------------------------------------------------------

S052/III.2(a) – Slide 20

S052/III.2(a): Exploratory Cluster Analysis of Variables An Example of The Cluster Analysis of Variables

Tying it all together …

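[Figure: dendrogram from Clustering around Latent Variables (CLV). Vertical axis: Unexplained Variance, with ticks at 0, 25 and 50%. Horizontal axis: Variables, annotated X1 to X6, with labels Have high st~g, Continually ~b, Successful i~s, Waste of tim~c, Look forward~o and Time satisfi~b.]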

100% - 92.58% = 7.42%

100% - 85.13% = 14.87%

100% - 74.85% = 25.15%

100% - 63.48% = 36.52%

100% - 43.43% = 56.57%

Vertical axis displays the percentage of the total standardized variance in the original indicators that is not contained in the composites formed at this level of clustering.
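The heights on that vertical axis follow directly from the T values in the clustering table: unexplained variance is 100 × (1 − T/6). A minimal Python sketch, using only the tabled T values (the variable names are illustrative):

```python
# T values from the clustering table, steps 1 to 5; total variance is 6.
t_values = [5.5548, 5.1077, 4.4912, 3.8086, 2.6060]
total_variance = 6

# Percentage of standardized variance not captured by the composites.
unexplained = [round(100 * (1 - t / total_variance), 2) for t in t_values]

for step, pct in enumerate(unexplained, start=1):
    print(f"step {step}: {pct:.2f}% unexplained")
```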

S052/III.2(a) – Slide 21

S052/III.2(a): Exploratory Cluster Analysis of Variables How Does Today’s Topic Map Onto The Printed Syllabus?

Please check the inter-connections among the Roadmap, the Daily Topic Area, the Printed Syllabus, and the content of today’s class when you pre-read the day’s materials.

Which Strategy For Forming Composites Of Multiple Indicators Is The “Best”? (Slide 22).

S052/III.2(a) – Slide 22

S052/III.2(a): Exploratory Cluster Analysis of Variables So, From Among This Surfeit Of Riches, Which Compositing Strategy To Choose?

Form a classical composite from the raw indicators

Form a classical composite from the standardized indicators

Use the first principal component as the optimal composite

Use more than one component as several optimal composites

Obtain clusters of indicators by inspecting simultaneous plots of the “important” eigenvectors

Obtain clusters of indicators using PROC VARCLUS
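To make the first three options concrete, here is a small Python sketch on simulated (hypothetical) data. It assumes that a "classical composite" is a simple sum of the indicators; a mean would serve equally well. It is not the course's own code, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 cases, 4 indicators of one construct on different scales.
latent = rng.normal(size=200)
X = np.column_stack(
    [scale * (latent + rng.normal(size=200)) for scale in (1, 2, 5, 10)]
)

# 1. Classical composite from the raw indicators: a simple sum.
raw_composite = X.sum(axis=1)

# 2. Classical composite from the standardized indicators: a sum of z-scores.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
std_composite = Z.sum(axis=1)

# 3. First principal component of the standardized indicators.
eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(X, rowvar=False))
w = eigenvectors[:, -1]        # eigh sorts ascending, so the last column is PC1
w = w if w.sum() > 0 else -w   # fix the arbitrary sign of the eigenvector
pc1_composite = Z @ w

print("variance of PC1 composite:", pc1_composite.var(ddof=1))
print("first eigenvalue:         ", eigenvalues[-1])
```

The raw sum is dominated by the indicators with the largest scales, which is one reason to standardize first. The first principal component is "optimal" in the sense that no other unit-length set of weights gives a composite with larger variance, and that variance equals the first eigenvalue.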