disriminanant analysis .final
Post on 30-May-2018
8/9/2019 disriminanant Analysis .Final
1/44
Discriminant Analysis
Prepared by-
Sumit Jain
Introduction-
Discriminant analysis (DA) is a technique for analysing marketing research data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature. In other words, discriminant analysis is a statistical method that researchers use to understand the relationship between a "dependent variable" and one or more "independent variables." A dependent variable is the variable that a researcher is trying to explain or predict from the values of the independent variables. Discriminant analysis is similar to regression analysis and analysis of variance (ANOVA). The principal difference between discriminant analysis and the other two methods lies in the nature of the dependent variable.
Contd..
It is a statistical technique used to classify cases into two or more categories of the dependent variable. Discriminant analysis also estimates a regression-like linear equation, which is used to predict the category of the dependent variable.
F test (Wilks' lambda): The overall significance of the discriminant function is tested with Wilks' lambda. If the overall model is significant, univariate F tests are then used to check whether the group means of each individual variable differ significantly.
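The two tests can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the cases are rows of a numeric array `X` and the group labels are in `y`; the function names `wilks_lambda` and `univariate_f` and the toy data are hypothetical, not taken from any particular package. Wilks' lambda is computed as det(W)/det(T), where W is the within-group and T the total sums-of-squares-and-cross-products matrix.

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda = det(W) / det(T).

    W: pooled within-group sums-of-squares-and-cross-products (SSCP) matrix.
    T: total SSCP matrix around the grand mean.
    Near 0 -> groups well separated; near 1 -> group means barely differ.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    centered = X - X.mean(axis=0)
    T = centered.T @ centered
    W = np.zeros_like(T)
    for g in np.unique(y):
        dev = X[y == g] - X[y == g].mean(axis=0)
        W += dev.T @ dev
    return np.linalg.det(W) / np.linalg.det(T)

def univariate_f(X, y):
    """One-way ANOVA F statistic for each predictor taken separately."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    n, g = len(X), len(groups)
    grand = X.mean(axis=0)
    ssb = sum((y == k).sum() * (X[y == k].mean(axis=0) - grand) ** 2 for k in groups)
    ssw = sum(((X[y == k] - X[y == k].mean(axis=0)) ** 2).sum(axis=0) for k in groups)
    return (ssb / (g - 1)) / (ssw / (n - g))

# Hypothetical toy data: two predictors, two groups of three cases.
X = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 7], [8, 6]]
y = [0, 0, 0, 1, 1, 1]
print(wilks_lambda(X, y), univariate_f(X, y))
```

In practice the lambda is converted to an approximate F or chi-square statistic to obtain a p-value; that step is left to a statistics package here.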
Examples-
For example, an educational researcher may want to investigate which variables discriminate between high school graduates who decide (1) to go to college, (2) to attend a trade or professional school, or (3) to seek no further training or education. For that purpose the researcher could collect data on numerous variables prior to students' graduation. After graduation, most students will naturally fall into one of the three categories. Discriminant analysis could then be used to determine which variable(s) are the best predictors of students' subsequent educational choice.
As another example, a medical researcher may record different variables relating to patients' backgrounds in order to learn which variables best predict whether a patient is likely to recover completely (group 1), partially (group 2), or not at all (group 3). Similarly, a biologist could record different characteristics of similar types (groups) of flowers and then perform a discriminant function analysis to determine the set of characteristics that best discriminates between the types.
Purpose-
The main purpose of a discriminant function analysis is to predict group membership based on a linear combination of the interval variables. The procedure begins with a set of observations where both group membership and the values of the interval variables are known. The end result of the procedure is a model that allows prediction of group membership when only the interval variables are known. A second purpose of discriminant function analysis is to gain an understanding of the data set, as a careful examination of the prediction model that results from the procedure can give insight into the relationship between group membership and the variables used to predict it.
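The fit-then-predict workflow described above can be sketched for the two-group case with Fisher's linear discriminant. This is a minimal illustration, assuming equal prior probabilities for the two groups (so the cutoff sits midway between the group centroids) and groups coded 0 and 1; the function names and the toy data are hypothetical.

```python
import numpy as np

def fit_two_group_lda(X, y):
    """Fisher's two-group linear discriminant, equal priors assumed.

    Returns coefficients b and constant c so that L = X @ b + c;
    L > 0 is assigned to group 1, L <= 0 to group 0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix.
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    b = np.linalg.solve(S, m1 - m0)
    # Constant that places the cutoff at the midpoint of the two centroids.
    c = -float(b @ (m0 + m1)) / 2
    return b, c

def predict(X, b, c):
    """Assign each case to group 0 or 1 from the sign of its discriminant score."""
    return (np.asarray(X, dtype=float) @ b + c > 0).astype(int)

# Known group membership (hypothetical training data).
X = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 7], [8, 6]]
y = [0, 0, 0, 1, 1, 1]
b, c = fit_two_group_lda(X, y)
# New cases where only the interval variables are known.
print(predict([[1.5, 2.5], [7.5, 6.5]], b, c))
```

Examining `b` also serves the second purpose: the size and sign of each coefficient hint at how each variable relates to group membership.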
Objectives-
To classify cases into groups using a discriminant prediction equation.
To test theory by observing whether cases are classified as predicted.
To investigate differences between or among groups.
To determine the most parsimonious way to distinguish among groups.
To determine the percent of variance in the dependent variable explained by the independents.
To determine the percent of variance in the dependent variable explained by the independents over and above the variance accounted for by control variables, using sequential discriminant analysis.
To assess the relative importance of the independent variables in classifying the dependent variable.
To discard variables which are little related to group distinctions.
To infer the meaning of the MDA dimensions which distinguish groups, based on discriminant loadings.
Multiple discriminant analysis (MDA) is an extension of discriminant analysis and a cousin of multivariate analysis of variance (MANOVA), sharing many of the same assumptions and tests. MDA is used to classify a categorical dependent variable which has more than two categories, using as predictors a number of interval or dummy independent variables. MDA is sometimes also called discriminant factor analysis or canonical discriminant analysis.
Assumptions in Discriminant analysis-
1. Independence: Cases should be independent of one another. Correlated data cannot be used in discriminant analysis.
2. Adequate sample size: There must be at least two cases for each category of the dependent variable. However, it is recommended that there be at least four or five times as many cases as independent variables.
3. Interval data: The independent variables should be measured on an interval scale.
4. Variance: No independent variable should have a zero standard deviation in any of the groups formed by the dependent variable.
Contd..
5. Random error: Error terms are assumed to be randomly distributed.
6. Homogeneity of variances: The variances (more precisely, the variance-covariance matrices) of the independent variables should be equal across the groups.
7. Absence of perfect multicollinearity: There should be no perfect multicollinearity between the independent variables.
8. Linearity: The discriminant functions are assumed to be linear combinations of the independent variables, and the relationships among the independent variables are assumed to be linear.
9. Normality: The predictor variables should be normally distributed (multivariate normal within each group).
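Two of the assumptions above, adequate sample size (2) and non-zero variance within groups (4), are easy to screen before fitting. The sketch below is hypothetical: the function name is invented, and the default of five cases per predictor is an assumption taken from the upper end of the "four or five times" rule of thumb. Formal tests of the remaining assumptions (for example Box's M for equal covariance matrices) are not attempted here.

```python
import numpy as np

def check_basic_assumptions(X, y, min_per_group=2, cases_per_predictor=5):
    """Screen assumptions 2 (sample size) and 4 (non-zero variance).

    Returns a list of human-readable warnings; an empty list means both checks passed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    warnings = []
    for g in np.unique(y):
        Xg = X[y == g]
        if len(Xg) < min_per_group:
            warnings.append(f"group {g}: only {len(Xg)} case(s)")
        for j in np.where(Xg.std(axis=0) == 0)[0]:
            warnings.append(f"group {g}: predictor {j} has zero standard deviation")
    if n < cases_per_predictor * p:
        warnings.append(f"{n} cases for {p} predictors: fewer than {cases_per_predictor} per predictor")
    return warnings

# Hypothetical data: predictor 1 is constant within group 0, and the sample is small.
for w in check_basic_assumptions([[1, 5], [2, 5], [3, 1], [4, 2]], [0, 0, 1, 1]):
    print(w)
```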
STEPS
Key Terms and Concepts-
Discriminating variables: Discriminating variables are the independent variables that are used to predict the dependent variable. These variables are also called the predictors.
The criterion variable: The dependent variable is also called the criterion variable.
Discriminant function: The linear combination of the discriminating (independent) variables is called the discriminant function. For example, $L = b_1x_1 + b_2x_2 + \dots + b_nx_n + c$, where $L$ = discriminant function (score), $b_1, \dots, b_n$ = discriminant coefficients, $x_1, \dots, x_n$ = independent variables, and $c$ = constant.
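As a small worked example of the formula, the discriminant score of one case is just the weighted sum of its predictor values plus the constant. The coefficients, constant and case values below are hypothetical numbers chosen only for illustration.

```python
def discriminant_score(x, b, c):
    """L = b1*x1 + b2*x2 + ... + bn*xn + c for a single case."""
    if len(x) != len(b):
        raise ValueError("need exactly one coefficient per independent variable")
    return sum(bi * xi for bi, xi in zip(b, x)) + c

# Hypothetical coefficients b1..b3, constant c and one case x1..x3.
b = [0.5, -0.25, 2.0]
c = -1.0
x = [4.0, 2.0, 1.5]
print(discriminant_score(x, b, c))  # 0.5*4 - 0.25*2 + 2.0*1.5 - 1.0 = 3.5
```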
Number of discriminant functions: For two groups, there is one discriminant function. For multiple discriminant analysis with g groups, there will be g - 1 discriminant functions (or as many as there are predictors, if that number is smaller).
The eigenvalues: The eigenvalue, also called the characteristic root, tells us how much discriminating power each discriminant function has. It is the ratio of the between-groups to the within-groups sum of squares of the discriminant scores for that function.
The discriminant score: The value obtained by applying the discriminant function to a case is called the discriminant score. This discriminant score is used to classify the case into a group category.
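The eigenvalues can be obtained from the within-group (W) and between-group (B) sums-of-squares-and-cross-products matrices as the eigenvalues of W⁻¹B; only the first min(g - 1, p) of them are non-zero, which is why there are at most g - 1 functions. The sketch below is hypothetical (function name and toy data are invented) and also reports each function's share of the total and its canonical correlation, sqrt(λ / (1 + λ)).

```python
import numpy as np

def discriminant_eigenvalues(X, y):
    """Eigenvalues of W^-1 B, their relative shares and canonical correlations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    p = X.shape[1]
    k = min(len(groups) - 1, p)            # number of discriminant functions
    grand = X.mean(axis=0)
    W = np.zeros((p, p))                   # within-group SSCP
    B = np.zeros((p, p))                   # between-group SSCP
    for g in groups:
        Xg = X[y == g]
        mg = Xg.mean(axis=0)
        W += (Xg - mg).T @ (Xg - mg)
        d = (mg - grand)[:, None]
        B += len(Xg) * (d @ d.T)
    eig = np.sort(np.linalg.eigvals(np.linalg.solve(W, B)).real)[::-1][:k]
    share = eig / eig.sum()                # relative discriminating power
    canonical_r = np.sqrt(eig / (1 + eig))
    return eig, share, canonical_r

# Hypothetical three-group data with two predictors -> two discriminant functions.
X3 = [[1, 2], [2, 1], [2, 2], [5, 6], [6, 5], [6, 6], [1, 8], [2, 9], [1, 9]]
y3 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(discriminant_eigenvalues(X3, y3))
```

For two groups there is a single eigenvalue λ, and Wilks' lambda equals 1 / (1 + λ).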
Contd
Cutoff: This is the value that divides the range of discriminant scores into two parts. When a case's discriminant score falls on the negative side of the cutoff point, the case is assigned to the lower category; when it falls on the positive side, the case is assigned to the higher category.
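One common way to place the cutoff between two groups is to weight the two group centroids (mean discriminant scores) by the group sizes, as described in multivariate textbooks such as Hair et al.; with equal group sizes it reduces to the midpoint. The sketch assumes this weighted rule and uses hypothetical function names and numbers.

```python
def cutting_score(z_a, z_b, n_a, n_b):
    """Weighted cutoff between centroids z_a and z_b of groups A and B.

    With n_a == n_b this is the midpoint; otherwise the cutoff moves toward
    the smaller group's centroid, favouring the larger group.
    """
    return (n_a * z_b + n_b * z_a) / (n_a + n_b)

def assign(score, cutoff, low_group, high_group):
    """Scores above the cutoff go to the higher group, the rest to the lower group."""
    return high_group if score > cutoff else low_group

print(cutting_score(-1.0, 1.0, 10, 10))   # equal sizes: midpoint 0.0
cut = cutting_score(-1.0, 2.0, 30, 10)    # 1.25, closer to the smaller group B
print(assign(0.7, cut, "A", "B"))
```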
Unstandardized discriminant coefficients: Unstandardized discriminant coefficients are analogous to unstandardized regression coefficients (b) and are used to compute the discriminant score. Standardized discriminant coefficients, like beta weights in regression, are used to compare the relative importance of the independent variables.
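A standard way to obtain standardized coefficients is to multiply each unstandardized coefficient by the pooled within-group standard deviation of its variable, so that coefficients of variables measured on different scales become comparable. The sketch assumes that convention; the function name and the example coefficients are hypothetical.

```python
import numpy as np

def standardize_coefficients(b, X, y):
    """Unstandardized coefficient * pooled within-group standard deviation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    n, g = len(X), len(groups)
    ssw = sum(((X[y == k] - X[y == k].mean(axis=0)) ** 2).sum(axis=0) for k in groups)
    pooled_sd = np.sqrt(ssw / (n - g))
    return np.asarray(b, dtype=float) * pooled_sd

X = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 7], [8, 6]]
y = [0, 0, 0, 1, 1, 1]
print(standardize_coefficients([4.0, 2.0], X, y))
```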
TYPES OF DISCRIMINANT ANALYSIS-
LINEAR DISCRIMINANT ANALYSIS
The linear discriminant model (LDA) is used when the groups are separable by linear combinations of the discriminating variables. With only two features, the separators between object groups are lines. With three features, the separator is a plane, and when the number of features (i.e. independent variables) is more than three, the separators become hyperplanes. The final value of the discriminant function determines the group to which a particular observation belongs. Appropriate threshold values and the relative significance of the individual discriminant functions lead to the final outcome/group.
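For two features the separator is the set of points where the discriminant score equals the cutoff, i.e. b1*x1 + b2*x2 + c = 0, which is a straight line. A minimal sketch, assuming the cutoff has already been absorbed into the constant c; the function name and numbers are hypothetical.

```python
def boundary_x2(x1, b1, b2, c):
    """Solve b1*x1 + b2*x2 + c = 0 for x2: the separating line for two predictors."""
    if b2 == 0:
        raise ValueError("boundary is vertical: x1 = -c / b1")
    return -(b1 * x1 + c) / b2

# Hypothetical function L = 1*x1 + 2*x2 - 4; points above the line get L > 0.
print(boundary_x2(2.0, 1.0, 2.0, -4.0))   # 1.0, so (2, 1) lies on the boundary
```

With three features the same equation b1*x1 + b2*x2 + b3*x3 + c = 0 describes a plane, and with more features a hyperplane.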
Contd..
LDA is closely related to ANOVA (analysis of variance) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements. In those two methods, however, the dependent variable is a numerical quantity, while for LDA it is a categorical variable (i.e. the class label).
Application-
Career Counsellors
Suppose we have two groups of high school graduates: those who choose to attend college after graduation and those who do not. We could have measured students' stated intention to continue on to college one year prior to graduation. If the means for the two groups (those who actually went to college and those who did not) are different, then we can say that intention to attend college as stated one year prior to graduation allows us to discriminate between those who are and are not college bound (and this information may be used by career counsellors to provide appropriate guidance to the respective students).
Marketing-
In marketing, discriminant analysis was once often used to determine the factors which distinguish different types of customers and/or products on the basis of surveys or other forms of collected data. Logistic regression or other methods are now more commonly used. The use of discriminant analysis in marketing can be described by the following steps:
Formulate the problem and gather data - Identify the salient attributes consumers use to evaluate products in this category - Use quantitative marketing research techniques (such as surveys) to collect data from a sample of potential customers concerning their ratings of all the product attributes. The data collection stage is usually done by marketing research professionals. Survey questions ask the respondent to rate a product from one to five (or 1 to 7, or 1 to 10) on a range of chosen attributes. Anywhere from five to twenty attributes are chosen. They could include things like ease of use, weight, accuracy, durability, colourfulness, price, or size. The attributes chosen will vary depending on the product being studied. The same question is asked about all the products in the study. The data for multiple products is codified and input into a statistical program such as R, SPSS or SAS. (This step is the
Estimate the discriminant function coefficients and determine the statistical significance and validity - Choose the appropriate discriminant analysis method. The direct method involves estimating the discriminant function so that all the predictors are assessed simultaneously. The stepwise method enters the predictors sequentially. The two-group method should be used when the dependent variable has two categories or states. The multiple discriminant method is used when the dependent variable has three or more categorical states. Use Wilks' lambda to test for significance in SPSS or the F statistic in SAS.
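The stepwise method can be illustrated with a simplified forward selection that, at each step, enters the predictor giving the lowest Wilks' lambda and stops when the drop in lambda becomes small. This is a hypothetical sketch: statistical packages such as SPSS typically use an F-to-enter (or similar) criterion rather than the fixed `min_improvement` threshold assumed here, and the function names and data are invented.

```python
import numpy as np

def wilks_lambda(X, y):
    """det(W) / det(T) for the predictors in the columns of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    centered = X - X.mean(axis=0)
    T = centered.T @ centered
    W = np.zeros_like(T)
    for g in np.unique(y):
        dev = X[y == g] - X[y == g].mean(axis=0)
        W += dev.T @ dev
    return np.linalg.det(W) / np.linalg.det(T)

def forward_stepwise(X, y, min_improvement=0.01):
    """Greedy forward selection of predictor columns by Wilks' lambda."""
    X = np.asarray(X, dtype=float)
    remaining = list(range(X.shape[1]))
    chosen, current = [], 1.0
    while remaining:
        # Try each remaining column together with those already entered.
        lam, best = min((wilks_lambda(X[:, chosen + [j]], y), j) for j in remaining)
        if current - lam < min_improvement:
            break
        chosen.append(best)
        remaining.remove(best)
        current = lam
    return chosen, current

# Hypothetical data: column 0 separates the groups, column 1 has equal group means.
X = [[1, 5], [2, 3], [3, 4], [8, 4], [9, 5], [10, 3]]
y = [0, 0, 0, 1, 1, 1]
print(forward_stepwise(X, y))
```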
The most common method used to test validity is to split the sample into an estimation or analysis sample, and a validation or holdout sample.
The estimation sample is used in constructing the discriminant function. The validation sample is used to construct a classification matrix which contains the number of correctly classified and incorrectly classified cases. The percentage of correctly classified cases is called the hit ratio.
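The validation procedure can be sketched with the Python standard library alone: split the cases, then compare actual and predicted groups in the holdout sample. The 30% holdout share, the fixed random seed and the function names are hypothetical choices; the discriminant function itself would be fitted on the estimation sample by whatever method is in use.

```python
import random
from collections import Counter

def split_sample(cases, holdout_fraction=0.3, seed=0):
    """Randomly split cases into (estimation, validation) lists."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    k = round(len(shuffled) * holdout_fraction)
    return shuffled[k:], shuffled[:k]

def classification_matrix(actual, predicted):
    """Counts of (actual group, predicted group) pairs; the diagonal holds the hits."""
    return Counter(zip(actual, predicted))

def hit_ratio(actual, predicted):
    """Share of validation cases that were classified correctly."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)

estimation, validation = split_sample(range(10))
actual = [0, 0, 1, 1]
predicted = [0, 1, 1, 1]      # hypothetical predictions for four holdout cases
print(classification_matrix(actual, predicted))
print(hit_ratio(actual, predicted))   # 3 of 4 correct
```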
Plot the results on a two-dimensional map, define the dimensions, and interpret the results. The statistical program (or a related module) will map the results.
The map will plot each product (usually in two-dimensional space). The distance of products to each other indicates how different they are. The dimensions must be labelled by the researcher. This requires subjective judgement and is often very challenging.
SOCIAL SCIENCES-
Prediction of Elections:
In this case the variables can be various social and economic factors, coupled with party effort parameters. Some of these variables can be as follows:
(1) No. of new projects implemented by the incumbent party
(2) No. of candidates in the fray
(3) National reach of the party (no. of states active in)
(4) SEC division of the electorate (in the form of ratios)
(5) Profession-wise division of the electorate
(6) Age-wise division of the electorate
The variables mentioned above are a few of the representative parameters that might have a bearing on the coming elections. Nowadays another important parameter is the result of exit polls, which are conducted by various media agencies. They indicate the general expectations of the electorate.
Outcome of terrorist attacks with hostages:
With the increasing occurrence of terrorist attacks, it becomes very important for law enforcement bodies and governments to ensure minimal collateral damage during rescue operations. It can often be prudent to predict the possibility of such an operation going badly, i.e. casualties during the rescue. Research on this front has already been initiated. The basic hypothesis is that various variables may be good predictors of the safe release or execution of the hostages. Some of these variables are as follows-
Contd..
(1) Number of terrorists
(2) Strength of their support in the local population
(3) Number of weapons and amount of ammunition with the terrorists
(4) Type of weapons wielded by the attackers
(5) Ratio of terrorists to hostages
(6) Whether the terrorists are independent operators or belong to some large-scale terrorist outfit
(7) Time since the hostages were taken
(8) Female/male ratio among the hostages
(9) Children/adults ratio among the hostages
A model trained carefully on past cases can help the government decide whether to use force or negotiations to neutralize the terrorist threat.
MEDICINE AND DIAGNOSTICS
The application of multivariate analysis, and especially discriminant analysis, to the study of trace elements in food and environmental fields has been widespread. In the clinical field, discriminant analysis has been tentatively used to improve the predictive value of tomography images in the differential diagnosis between Alzheimer's disease (AD) and frontotemporal dementia. Similarly, the need for a non-invasive, specific and sensitive test led researchers to study whether levels of some proteins considered markers of neuronal degeneration were useful to discriminate between patients and control groups.
Hepatitis Disease Detection
Research has been ongoing in this domain. The basic diagnostic flowchart follows. Here LDA is useful in determining the most important features associated with the onset of the disease. Once this feature reduction is done, the actual classification is performed by a fuzzy-network-based classifier. Here LDA acts as a data-conditioning (dimensionality-reduction) step rather than as a predictor.
[Diagram: basic diagnostic flowchart]
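Using LDA as a data-conditioning step means projecting each case onto the leading discriminant directions (eigenvectors of W⁻¹B) and handing the reduced features to another classifier. The sketch below shows only the projection; the fuzzy classifier used in the study is not reproduced, and the function name and toy data are hypothetical.

```python
import numpy as np

def lda_project(X, y, k=1):
    """Project X onto the k leading discriminant directions (eigenvectors of W^-1 B)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    p = X.shape[1]
    grand = X.mean(axis=0)
    W = np.zeros((p, p))   # within-group SSCP
    B = np.zeros((p, p))   # between-group SSCP
    for g in np.unique(y):
        Xg = X[y == g]
        mg = Xg.mean(axis=0)
        W += (Xg - mg).T @ (Xg - mg)
        d = (mg - grand)[:, None]
        B += len(Xg) * (d @ d.T)
    vals, vecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(vals.real)[::-1][:k]
    return X @ vecs[:, order].real

# Hypothetical data: 9 cases, 2 features, 3 groups -> reduced to 2 discriminant features.
X3 = [[1, 2], [2, 1], [2, 2], [5, 6], [6, 5], [6, 6], [1, 8], [2, 9], [1, 9]]
y3 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
Z = lda_project(X3, y3, k=2)
print(Z.shape)
```

The sign of each eigenvector is arbitrary, so only relative positions of the projected cases are meaningful.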
Contd..
The study reported 94.16% accuracy in detecting hepatitis, which is very high. Such accuracy could support prompt treatment and hence faster recovery for the patient.
INSURANCE COMPANIES
Insolvency prediction (Case study on Spanish Banks)
Unlike other financial problems, business failure affects a great number of agents, so research on this topic has been of growing interest in recent decades. Insolvency, early detection of financial distress, or conditions leading to the insolvency of insurance companies have been a concern of parties such as insurance regulators, investors, management, financial analysts, banks, auditors, policy holders and consumers. This concern has arisen from the necessity of protecting the general public against the consequences of insurers' insolvencies, as well as minimizing the costs associated with this problem, such as the effects on state insurance guaranty funds or the responsibilities of management and auditors. It has long been recognized that there needs to be some form of supervision of such entities to minimize the risk of failure. Nowadays, the Solvency II project is intended to lead to the reform of the existing solvency rules in the European Union. Many insolvency cases appeared after the insurance cycles of the 1970s and 1980s in the United States and in the European Union.
Contd..
Several surveys have been devoted to identifying the main causes of insurers' insolvency; in particular, the Müller Group Report (1997) analyses the main identified causes of insurance insolvencies in the European Union. The main reasons can be summarized as follows: operational risks (operational failure related to inexperienced or incompetent management, fraud); underwriting risks (inadequate reinsurance programme and failure to recover from reinsurers, higher losses due to rapid growth, excessive operating costs, poor underwriting process); insufficient provisions and imprudent investments. On the other hand, many insurance companies, especially larger companies, have developed internal risk models for a number of purposes. Such standardized systems are absent in Spain, where most insurance companies have internal check mechanisms to predict insolvency.
A recent study by academicians from Madrid performed an LDA to predict the insolvency of Spanish banks using historical data from 72 banks. The data was collected 1, 2 and 3 years prior to insolvency. Some of the results of the study are given below.
Here Models 1, 2 and 3 are predictors built with data from 1, 2 and 3 years prior to insolvency, respectively.
Table: List of financial ratios used as variables for the predictor model
Table: Final results of the LDA performed in the three models
From the above results we see that the LDA model was probably not the best model to apply here, as the accuracy was very low and, for the test cases, only slightly better than a 0.5 probability. Some other, more sophisticated classification method might work better here.
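Whether a hit ratio is meaningfully "better than chance" can be judged against chance benchmarks commonly cited in the discriminant analysis literature (e.g. Hair et al.): the maximum chance criterion (share of the largest group), the proportional chance criterion (sum of squared group shares), and Press's Q statistic, compared with the chi-square critical value for 1 degree of freedom (3.84 at the 5% level). The sketch assumes those textbook definitions; the function names and the example counts are hypothetical and are not the results of the Spanish study.

```python
def chance_criteria(group_sizes):
    """Return (maximum chance, proportional chance) hit ratios for the given group sizes."""
    n = sum(group_sizes)
    shares = [s / n for s in group_sizes]
    return max(shares), sum(p * p for p in shares)

def press_q(n_total, n_correct, n_groups):
    """Press's Q = (N - n*K)^2 / (N * (K - 1)); above 3.84 suggests better than chance at 5%."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))

print(chance_criteria([36, 36]))   # two equal groups: both benchmarks are 0.5
print(press_q(100, 60, 2))         # hypothetical: 60 of 100 holdout cases correct
```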
CONCLUSIONS-
In short, discriminant analysis is a very useful tool (1) for detecting the variables that allow the researcher to discriminate between different (naturally occurring) groups, and (2) for classifying cases into different groups with better than chance accuracy.