Session 5: Factor Analysis Handout
(Transcript, 7/30/2019)
Factor Analysis
Research Studies
- Several variables are to be studied in multivariate analysis.
- These variables may or may not be mutually independent of each other.
- Some may hold strong correlations with other variables; multi-collinearity may exist among variables.
- Data analysis methods in this situation are called Inter-dependence Methods.
Major Inter-dependence Methods
- Factor Analysis: to reduce several correlated variables into a few uncorrelated, meaningful factors.
- Cluster Analysis: to classify individual elements of the population into a few homogeneous groups.
Research Studies
- Several variables are to be studied.
- The purpose is to establish a cause-and-effect relationship.
- One dependent (effect) variable and several independent (cause) variables.
- Data are obtained on them from a sample.
- Data analysis methods in such situations are called Dependence Methods.
Major Dependence Methods

Dependent Variable:     Metric        Metric       Categorical    Categorical
Independent Variables:  Categorical   Metric       Categorical    Metric
Method:                 Analysis of   Multiple     Canonical      Multiple
                        Variance      Regression   Correlation    Discriminant Analysis

Major Dependence Methods

Similarities (ANOVA, Discriminant Analysis, Regression):
- Number of dependent variables: one.
- Number of independent variables: many.

Differences:
                                    ANOVA         Discriminant Analysis   Regression
Nature of the dependent variable:   Metric        Categorical             Metric
Nature of the independent variables: Categorical  Metric                  Metric
Multivariate Analysis Methods
Major multivariate methods:
1. Factor Analysis
2. Cluster Analysis
3. Multivariate Discriminant Analysis
4. Multivariate Regression Analysis.
Factor Analysis
- Defines the underlying structure among the variables in the analysis.
- Examines the interrelationships among a large number of variables and then attempts to explain them in terms of their common underlying dimensions, referred to as factors.
- Examines the entire set of inter-dependent relationships without making any distinction between dependent and independent variables.
- Reduces the total number of variables in the research study to a smaller number of factors by combining a few correlated variables into a factor.
What is a Factor
A factor is a linear combination of the observed original variables V1, V2, . . ., Vn:
Fi = Wi1V1 + Wi2V2 + Wi3V3 + . . . + WinVn
where
Fi = the ith factor (i = 1, 2, . . ., m; m < n)
Wij = weight (factor score coefficient) of variable j on factor i
n = number of original variables.
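The linear combination above can be sketched directly. The weights and standardized variable values below are hypothetical, chosen only to illustrate the arithmetic:

```python
def factor_score(weights, values):
    """F_i = W_i1*V1 + W_i2*V2 + ... + W_in*Vn
    (weighted sum of standardized variable values)."""
    return sum(w * v for w, v in zip(weights, values))

weights = [0.4, 0.3, 0.3]   # hypothetical factor score coefficients
values = [1.0, -0.5, 2.0]   # hypothetical standardized variable values
print(factor_score(weights, values))  # 0.4 - 0.15 + 0.6 = 0.85
```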
Factor Analysis
- Discovers a smaller set of m uncorrelated factors that significantly represent the original n variables (m < n).
- These factors have no multi-collinearity, i.e. they are orthogonal to each other.
- They can then be used in further multivariate analysis (regression or discriminant analysis).
Example # 1
- Evaluate credit card usage & behavior of customers.
- Initial set of variables is large: Age, Gender, Marital Status, Income, Education, Employment Status, Credit History, Family Background: total 8 variables.
- Fi = Wi1V1 + Wi2V2 + Wi3V3 + . . . + Wi8V8
Example # 1
Reduction of 8 variables into 3 factors (m = 3):
- Factor 1: heavy weightage for age, gender, marital status and low weightages for other variables.
- Factor 2: heavy weightage for income, education, employment status and low weightages for others.
- Factor 3: heavy weightage for credit history & family background and low weightages for other variables.
Example # 1
These 3 un-correlated factors can be identified by commoncharacteristics of variables with heavy weightages & named
accordingly as follows:
z Factor 1: (age, gender, marital status) as Demographic Status
z Factor 2: (income, education, employment status) as Socio-
economic Status
z Factor 3: (credit history & family background) as Background
Status.
Example # 2
- Evaluate customer motivation for buying a two-wheeler.
- Initial set of variables is large:
1. Affordable
2. Sense of freedom
3. Economical
4. Man's vehicle
5. Feel powerful
6. Friends jealous
7. Feel good to see ad of this brand
8. Comfortable ride
9. Safe travel
10. Ride for three.
Example # 2
Reduction of 10 variables to 3 factors:
- Pride: (man's vehicle, feel powerful, sense of freedom, friends jealous, feel good to see ad of this brand)
- Utility: (economical, comfortable ride, safe travel)
- Economy: (affordable, ride for three to be allowed).
Standardize the Data
- Enlist all variables that can be important in resolving the research problem.
- Collect metric data on each variable from all subjects sampled.
- Convert all data on each variable into standard format (Mean: 0, Std. Dev.: 1), since different variables may have different units of measurement.
- SPSS / SAS etc. do it automatically.
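The standardization step can be sketched in a few lines. This minimal version uses the population standard deviation; statistical packages typically use the sample (n-1) version, so small-sample z-scores differ slightly. The five input values are the first five V1 responses from the data table below:

```python
import math

def standardize(xs):
    """Convert raw scores to z-scores (mean 0, std. dev. 1).
    Uses the population standard deviation for simplicity."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]

z = standardize([7.0, 1.0, 6.0, 4.0, 1.0])  # first five V1 values
print(round(sum(z), 10))  # z-scores sum to 0 (mean 0)
```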
Standard Normal Distribution
Two Steps in Factor Analysis
1. Factor Extraction
2. Factor Rotation

What Factor Extraction does
1. It determines the minimum number of factors that can comfortably represent all variables in the research study.
   - Obviously, the maximum number of factors equals the total number of variables.
2. It converts correlated variables into the desired number of uncorrelated factors.
Tool: Principal Component Method.
Principal Component Method
- SPSS gives inter-variable correlations.
- PCM assists in checking the appropriateness of factor analysis (Bartlett's test).
- Assists in checking the adequacy of sample size (KMO test).
- Gives initial eigen values.
- These determine the minimum number of factors that can represent all variables.
Example # 3
- To determine the benefits consumers seek from purchase of a toothpaste.
- A sample of 30 persons was interviewed to indicate agreement with the following statements using a 7-point scale (1 = Strongly agree, 7 = Strongly disagree):
Six Important Variables
V1: Buy a toothpaste that prevents cavities
V2: Like a toothpaste that gives shiny teeth
V3: Toothpaste should strengthen your gums
V4: Prefer toothpaste that freshens breath
V5: Prevention of tooth decay is not an important benefit
V6: Most important concern is attractive teeth
Data obtained are given in the next slide.
Original Data: 30 persons, 6 variables
RESPONDENT
NUMBER V1 V2 V3 V4 V5 V6
1 7.00 3.00 6.00 4.00 2.00 4.00
2 1.00 3.00 2.00 4.00 5.00 4.00
3 6.00 2.00 7.00 4.00 1.00 3.00
4 4.00 5.00 4.00 6.00 2.00 5.00
5 1.00 2.00 2.00 3.00 6.00 2.00
6 6.00 3.00 6.00 4.00 2.00 4.00
7 5.00 3.00 6.00 3.00 4.00 3.00
8 6.00 4.00 7.00 4.00 1.00 4.00
9 3.00 4.00 2.00 3.00 6.00 3.00
10 2.00 6.00 2.00 6.00 7.00 6.00
11 6.00 4.00 7.00 3.00 2.00 3.00
12 2.00 3.00 1.00 4.00 5.00 4.00
13 7.00 2.00 6.00 4.00 1.00 3.00
14 4.00 6.00 4.00 5.00 3.00 6.00
15 1.00 3.00 2.00 2.00 6.00 4.00
16 6.00 4.00 6.00 3.00 3.00 4.00
17 5.00 3.00 6.00 3.00 3.00 4.00
18 7.00 3.00 7.00 4.00 1.00 4.00
19 2.00 4.00 3.00 3.00 6.00 3.00
20 . . . . . .
21 1.00 3.00 2.00 3.00 5.00 3.00
22 5.00 4.00 5.00 4.00 2.00 4.00
23 2.00 2.00 1.00 5.00 4.00 4.00
24 4.00 6.00 4.00 6.00 4.00 7.00
25 6.00 5.00 4.00 2.00 1.00 4.00
26 3.00 5.00 4.00 6.00 4.00 7.00
27 4.00 4.00 7.00 2.00 2.00 5.00
28 3.00 7.00 2.00 6.00 4.00 3.00
29 4.00 6.00 3.00 7.00 2.00 7.00
30 2.00 3.00 2.00 4.00 7.00 2.00
Inter-variable Correlations:
Correlation Matrix from SPSS
Variables V1 V2 V3 V4 V5 V6
V1 1.000
V2 -0.530 1.000
V3 0.873 -0.155 1.000
V4 -0.086 0.572 -0.248 1.000
V5 -0.858 0.020 -0.778 -0.007 1.000
V6 0.004 0.640 -0.018 0.640 -0.136 1.000
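Each entry of the matrix above is a Pearson product-moment correlation between two columns of the raw data. A minimal sketch of that computation (the two short input lists in the usage line are toy data, not from the study):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # approximately 1.0 (perfect linear relation)
```

Applying this function to every pair of columns in the 30 x 6 data matrix reproduces the SPSS correlation matrix.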
Bartlett's Test
- For valid factor analysis, many variables must be correlated with each other.
- That means, if each original variable is completely independent of each of the remaining n-1 variables (i.e. zero correlation among all variables), there is no need to perform factor analysis.
- H0: The correlation matrix is a unit matrix.
Unit Matrix
V1 V2 V3 ---- ---- Vn
V1 1 0 0 0 0 0
V2 0 1 0 0 0 0
V3 0 0 1 0 0 0
---- ---- ---- ---- ---- ---- ----
---- ---- ---- ---- ---- ---- ----
Vn 0 0 0 0 0 1
Bartlett's Test
- For valid factor analysis, many variables must be correlated with each other.
- H0: The correlation matrix is a unit matrix.
- Here, SPSS gives p < 0.05.
- Reject H0 with 95% level of confidence.
- So, the correlation matrix is not a unit matrix.
Conclusion: Factor analysis can be validly done.
-
7/30/2019 Session5 Factor Analysis Handout
8/16
KMO Test
- Kaiser-Meyer-Olkin measure of sampling adequacy in this case = 0.660.
- Values of KMO between 0.5 and 1.0 suggest that the sample is adequate for carrying out factor analysis. Otherwise, we must draw an additional sample.
- Here, 0.660 > 0.5.
- Conclusion: Sample is adequate.
- Thus, these two tests together confirm the appropriateness of factor analysis.
Initial Eigen Values

Factor   Eigen value   % of variance   Cumulative %
1        2.731         45.520           45.520
2        2.218         36.969           82.488
3        0.442          7.360           89.848
4        0.341          5.688           95.536
5        0.183          3.044           98.580
6        0.085          1.420          100.000
Eigen Value
- Variance of each standardized variable is 1.
- Total variance in the study = number of variables (here 6).
- Fi = Wi1V1 + Wi2V2 + Wi3V3 + . . . + Wi6V6
- Variance explained by a factor is called the Eigen Value of that factor.
- It depends on (a) the weights for different variables and (b) the correlations between the factor and each variable (called Factor Loadings).
- The higher the eigen value of a factor, the bigger the amount of variance explained by that factor.
Principal Component Method
- Each original variable has eigen value = 1 due to standardization.
- So, factors with eigen value < 1 are no better than a single variable.
- Only factors with eigen value >= 1 are retained.
- The Principal Component Method determines the least number of factors needed to explain the maximum variance.
-
7/30/2019 Session5 Factor Analysis Handout
9/16
PCM is a Sequential Process
- Selects weights (i.e. factor score coefficients) in such a manner that the first factor explains the largest portion of the total variance:
  F1 = W11V1 + W12V2 + W13V3 + . . . + W1nVn
- Then selects a second set of weights so that the second factor explains the largest portion of the remaining variance, subject to being uncorrelated with the first factor:
  F2 = W21V1 + W22V2 + W23V3 + . . . + W2nVn
- The process goes on till the cumulative variance explained crosses a desired level, usually 60%.
Two Factors Explain > 60% Variation

Factor   Eigen Value   % of Variance   Cumulative %
1        2.731         45.520          45.520
2        2.218         36.969          82.488

Conclusion: The number of factors required to explain > 60% of variation is 2.
Factor Loadings: Correlation Between Each Factor & Each Variable

Factor Matrix
Variables   Factor 1   Factor 2
V1           0.928      0.253
V2          -0.301      0.795
V3           0.936      0.131
V4          -0.342      0.789
V5          -0.869     -0.351
V6          -0.177      0.871
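The eigenvalue of each factor equals the sum of the squared loadings down its column of the factor matrix, which ties the table above back to the initial eigen values. In the check below, the V5 row (unreadable in the handout) is reconstructed so that the column sums match the reported eigenvalues:

```python
# Unrotated factor loadings (V5 row reconstructed; garbled in the handout).
loadings = {
    "V1": ( 0.928,  0.253),
    "V2": (-0.301,  0.795),
    "V3": ( 0.936,  0.131),
    "V4": (-0.342,  0.789),
    "V5": (-0.869, -0.351),
    "V6": (-0.177,  0.871),
}

# Eigenvalue of a factor = sum of squared loadings in its column.
ev1 = sum(l1 ** 2 for l1, _ in loadings.values())
ev2 = sum(l2 ** 2 for _, l2 in loadings.values())
print(round(ev1, 3), round(ev2, 3))  # approximately 2.731 and 2.218
```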
Factor Rotation
- The initial factor matrix rarely results in factors that can be easily interpreted.
- Therefore, through a process of rotation, the initial factor matrix is transformed into a simpler matrix that is easier to interpret.
- This helps identify which factors are strongly associated with which original variables.
Rotation of Factors

Factor Rotation
In factor rotation, the reference axes of the factors are turned about the origin until some other position is reached.
Unrotated factor solutions extract factors in order of how much variance they account for, with each subsequent factor accounting for less variance. The ultimate effect of rotating the factor matrix is therefore to redistribute the variance from earlier factors to later ones, to achieve a simpler, theoretically more meaningful factor pattern.
Two rotational approaches:
1. Orthogonal = axes are maintained at 90 degrees.
2. Oblique = axes are not maintained at 90 degrees.
[Figure: Orthogonal Factor Rotation - variables V1-V5 plotted on unrotated Factor I / Factor II loading axes (-1.0 to +1.0); the rotated axes stay at 90 degrees to each other.]
[Figure: Oblique Factor Rotation - the same loadings with oblique (non-perpendicular) rotated axes, contrasted with the orthogonal rotation.]
Orthogonal Rotation Methods:
- Quartimax (simplifies rows).
- Varimax (simplifies columns).
- Equimax (combination).
Simplification means driving loadings toward zero either:
- in rows (variables, i.e. maximizing a variable's loading on a single factor): making as many values in each row as close to zero as possible, or
- in columns (factors, i.e. making the number of high loadings as few as possible): making as many values in each column as close to zero as possible.
Choosing Factor Rotation Methods
Orthogonal rotation methods:
o are the most widely used rotational methods.
o are the preferred method when the research goal is data reduction to either a smaller number of variables or a set of uncorrelated measures for subsequent use in other multivariate techniques.
Oblique rotation methods:
o are best suited to the goal of obtaining several theoretically meaningful factors or constructs because, realistically, very few constructs in the real world are uncorrelated.
Factor Rotation
- In rotating the factors, we would like each factor to have high loadings on only a few of the variables.
- The process of rotation is called orthogonal rotation if the axes are maintained at right angles.
- Let us see how it is done.
Illustration of Rotation of Axes
Let us take a simpler illustration.
- Suppose the factor loadings of 2 variables on 2 factors are:
        Factor 1   Factor 2
  V1     0.6        0.7
  V2     0.5       -0.5
- Variation explained by V1 = (0.6)^2 + (0.7)^2 = 0.85
- Variation explained by V2 = (0.5)^2 + (-0.5)^2 = 0.50
- None of the loadings is large or small enough to reach any meaningful conclusion.
- Let us rotate the two axes and see what happens.
[Figure: Graph of original loadings - V1 and V2 plotted against Factor 1 / Factor 2 axes (-1 to +1); the axes are then rotated clockwise, and the loadings are read off against the rotated axes.]
Factor Loadings After Rotation
- Factor loadings of the 2 variables on the 2 rotated factors:
        Factor 1   Factor 2
  V1    -0.2        0.9
  V2     0.7        0.1
- Variation explained by V1 = (-0.2)^2 + (0.9)^2 = 0.85
- Variation explained by V2 = (0.7)^2 + (0.1)^2 = 0.50
- Note that the variation explained remains unchanged.
- Now some of the loadings are large and others small.
- So we can reach a meaningful conclusion.
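The rotation above can be verified numerically. The angle is not stated in the handout, but a clockwise rotation with cos(theta) = 0.6 and sin(theta) = -0.8 maps the original loadings exactly onto the rotated ones while preserving each variable's variance explained:

```python
# Original 2-factor loadings from the illustration above.
original = {"V1": (0.6, 0.7), "V2": (0.5, -0.5)}

# Orthogonal (rigid) rotation of the loading vectors; angle inferred,
# not given in the handout: cos = 0.6, sin = -0.8 (clockwise).
c, s = 0.6, -0.8
rotated = {
    v: (round(l1 * c + l2 * s, 2), round(-l1 * s + l2 * c, 2))
    for v, (l1, l2) in original.items()
}
print(rotated)  # V1 -> (-0.2, 0.9), V2 -> (0.7, 0.1)

# Communalities (variance explained per variable) are unchanged:
for v, (l1, l2) in rotated.items():
    print(v, round(l1 ** 2 + l2 ** 2, 2))  # V1: 0.85, V2: 0.5
```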
Example # 3:
Factor Loadings after Rotation

Rotated Factor Matrix
Variables   Factor 1   Factor 2
V1           0.962     -0.027
V2          -0.057      0.848
V3           0.934     -0.146
V4          -0.098      0.845
V5          -0.933     -0.084
V6           0.083      0.885
Weightages to Variables for Each Factor
from SPSS
Factor Score Coefficient Matrix
Variables Factor 1 Factor 2
V1 0.358 0.011
V2 -0.001 0.375
V3 0.345 -0.043
V4 -0.017 0.377
V5 -0.350 -0.059
V6 0.052 0.395
Factors (6 variables into 2 factors)
Fi = Wi1V1 + Wi2V2 + Wi3V3 + . . . + Wi6V6
In example # 3:
F1 = 0.358V1 - 0.001V2 + 0.345V3 - 0.017V4 - 0.350V5 + 0.052V6
F2 = 0.011V1 + 0.375V2 - 0.043V3 + 0.377V4 - 0.059V5 + 0.395V6
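Given the factor score coefficient matrix, both factor scores for a respondent are dot products with that respondent's standardized variable values. The z-values below are hypothetical, used only to show the computation:

```python
# Factor score coefficients from the SPSS matrix above (rows = factors).
W = [
    [0.358, -0.001,  0.345, -0.017, -0.350, 0.052],  # Factor 1
    [0.011,  0.375, -0.043,  0.377, -0.059, 0.395],  # Factor 2
]

# Hypothetical standardized scores of one respondent on V1..V6.
z = [1.2, -0.4, 0.9, -0.2, -1.1, 0.3]

# F_i = sum_j W_ij * z_j for each factor i.
scores = [sum(w * v for w, v in zip(row, z)) for row in W]
print([round(s, 3) for s in scores])
```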
Interpretation of Factors
A factor can be interpreted in terms of the variables that load highly on it.
FACTOR 1 has high coefficients for:
- V1: Buy a toothpaste that prevents cavities
- V3: Toothpaste should strengthen your gums
- V5: Prevention of tooth decay is not an important benefit (note: coefficient is negative)
FACTOR 1 may be labelled the Health Factor.
Interpretation of Factors
F2 = 0.011V1 + 0.375V2 - 0.043V3 + 0.377V4 - 0.059V5 + 0.395V6
FACTOR 2 has high coefficients on:
- V2: Like a toothpaste that gives shiny teeth
- V4: Prefer toothpaste that freshens breath
- V6: Most important concern is attractive teeth
FACTOR 2 may be labelled the Aesthetic Factor.
The factors are jointly called principal components.
Conclusion
From the data gathered from 30 respondents on 6 variables, the benefits that customers seek from purchase of a toothpaste are HEALTH and AESTHETICS.
- Health has 45.5% importance.
- Aesthetics has 36.9% importance.
Selecting a Surrogate Variable
- Sometimes, we are not willing to discover new factors but want to stick to the original variables and know which ones are important.
- By examining the factor matrix, we could select for each factor just one variable with the highest loading on that factor, if possible.
- That variable could then be used as a surrogate variable for the associated factor.
Selecting Surrogate Variables
- V1 has the highest loading on F1.
- So, V1 is the surrogate variable for F1.
- Similarly, V6 could be the surrogate for F2.
- So, we concentrate on only 2 variables: V1 (preventing cavities) & V6 (attractive teeth).
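The surrogate choice is just "the variable with the highest absolute loading per factor" applied to the rotated factor matrix. In the sketch below, the V5 row (garbled in the handout) is reconstructed; the selection is unaffected because V1's loading (0.962) remains the largest on Factor 1:

```python
# Rotated factor loadings (V5 row reconstructed; garbled in the handout).
rotated = {
    "V1": ( 0.962, -0.027),
    "V2": (-0.057,  0.848),
    "V3": ( 0.934, -0.146),
    "V4": (-0.098,  0.845),
    "V5": (-0.933, -0.084),
    "V6": ( 0.083,  0.885),
}

# For each factor, pick the variable with the highest absolute loading.
surrogates = [max(rotated, key=lambda v: abs(rotated[v][f])) for f in (0, 1)]
print(surrogates)  # ['V1', 'V6']
```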
Assessing Factor Loadings
- While factor loadings of +/-0.30 to +/-0.40 are minimally acceptable, values greater than +/-0.50 are considered necessary for practical significance.
- To be considered significant:
  o A smaller loading (e.g. +/-0.30) is enough given either a larger sample size or a larger number of variables being analyzed.
  o A larger loading (e.g. +/-0.50 and above) is needed for a smaller sample size.
- Statistical tests of significance for factor loadings are generally very conservative and should be considered only as starting points for including a variable for further consideration.
Interpreting the Factors
- An optimal structure exists when all variables have high loadings on only a single factor.
- Variables that cross-load (load highly on two or more factors) are usually deleted unless theoretically justified.
- Variables should generally have communalities > 0.50 to be retained in the analysis.
Re-specification of a factor analysis can include options such as:
o deleting a variable(s),
o changing rotation methods, and/or
o increasing or decreasing the number of factors.
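The communality screen described above is the row-wise counterpart of the eigenvalue computation: each variable's communality is the sum of its squared loadings across the retained factors. Applied to the rotated factor matrix of example # 3 (with the V5 row reconstructed, since it is garbled in the handout), every variable clears the 0.50 threshold:

```python
# Rotated factor loadings (V5 row reconstructed; garbled in the handout).
rotated = {
    "V1": ( 0.962, -0.027),
    "V2": (-0.057,  0.848),
    "V3": ( 0.934, -0.146),
    "V4": (-0.098,  0.845),
    "V5": (-0.933, -0.084),
    "V6": ( 0.083,  0.885),
}

# Communality = sum of squared loadings across the retained factors.
communalities = {v: round(l1 ** 2 + l2 ** 2, 3) for v, (l1, l2) in rotated.items()}
print(communalities)
print(all(h > 0.50 for h in communalities.values()))  # True: all six variables retained
```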