Mining from Open Answers in Questionnaire Data
KDD 01, San Francisco, CA, USA
Copyright ACM 2001
Hang Li*, NEC Corporation
Kenji Yamanishi, NEC Corporation
Agenda
• Analysis of open answers
• Rule Analysis
  – Classification Rules
  – Association Rules
  – Algorithm
• Correspondence Analysis
• Mining Result
Presented by Joyce Chen
Analysis of open answers
• Automatically summarize open answers.
• Automatically mine useful information from open answers.
• The Survey Analyzer (SA) system analyzes open answers.
Two statistical learning methods:
• Rule learning (rule analysis)
• Correspondence analysis
Rule Analysis – Classification Rules
• There are a number of categories, each containing a number of texts.
• Automatically acquire rules from the categorized texts.
• Classify new texts on the basis of the acquired rules.
In SA:
• View each analysis target as a category.
• View the open answers associated with the target as texts.
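The setup above (targets as categories, open answers as texts, word-based rules applied to new texts) can be sketched as follows. This is only an illustration: the sample answers, the function names `learn_rules` and `classify`, and the confidence threshold are all hypothetical, and rule selection by simple confidence is a stand-in for the Stochastic Complexity criterion that SA is described as using.

```python
from collections import Counter, defaultdict

def learn_rules(answers, min_conf=0.8, min_count=2):
    """Learn rules of the form 'if word W appears, then target T'.

    answers: list of (target, text) pairs.
    Selection by confidence is an assumption made for this sketch,
    not the criterion used by SA.
    """
    word_total = Counter()
    word_target = defaultdict(Counter)
    for target, text in answers:
        for word in set(text.lower().split()):
            word_total[word] += 1
            word_target[word][target] += 1

    rules = []
    for word, total in word_total.items():
        target, hits = word_target[word].most_common(1)[0]
        conf = hits / total
        if total >= min_count and conf >= min_conf:
            rules.append((conf, total, word, target))
    # Most confident, best supported rules first; ties broken alphabetically.
    rules.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return [(word, target, conf) for conf, _, word, target in rules]

def classify(text, rules, default=None):
    """Return the target of the first rule whose word occurs in the text."""
    words = set(text.lower().split())
    for word, target, _ in rules:
        if word in words:
            return target
    return default

# Hypothetical open answers, each tagged with its analysis target.
answers = [
    ("car_A", "quiet engine and smooth ride"),
    ("car_A", "very quiet cabin"),
    ("car_B", "sporty design"),
    ("car_B", "sporty and fast"),
    ("car_A", "smooth handling"),
]
rules = learn_rules(answers)
prediction = classify("a sporty look", rules)
```

The rules are kept as an ordered list so that a new text is classified by the first matching rule, which keeps the acquired rules readable one at a time.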
Rule Analysis (Cont.)
Rule Analysis – Algorithm (SC)
SA learns classification rules or association rules based on Stochastic Complexity (SC) and the MDL (Minimum Description Length) principle.
• Rectangles: 10 open answers
• Analysis target: T
• Some answers contain a specific word: W
• ΔSC > 0 (positive): the model that uses W is the one most likely to have given rise to the data.
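The ΔSC test on this slide can be illustrated with a small calculation. The slides do not give the exact formula, so the sketch below assumes a Bernoulli model and the standard asymptotic approximation SC ≈ m·H(m⁺/m) + ½·log₂(mπ/2), where m is the number of answers and m⁺ the number belonging to target T. The function names and the per-answer normalization of ΔSC are also assumptions made for illustration.

```python
import math

def entropy(p):
    """Binary entropy in bits; defined as 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def stochastic_complexity(m, m_pos):
    """Approximate SC (in bits) of m binary labels with m_pos positives.

    Assumed form: data cost m*H(m_pos/m) plus the asymptotic
    Bernoulli model cost 0.5*log2(m*pi/2).
    """
    if m == 0:
        return 0.0
    return m * entropy(m_pos / m) + 0.5 * math.log2(m * math.pi / 2)

def delta_sc(labels, has_word):
    """Per-answer reduction in SC obtained by splitting the answers on word W.

    labels[i]   is 1 if answer i belongs to target T, else 0.
    has_word[i] is 1 if answer i contains word W, else 0.
    """
    m = len(labels)
    with_w = [y for y, w in zip(labels, has_word) if w]
    without_w = [y for y, w in zip(labels, has_word) if not w]
    whole = stochastic_complexity(m, sum(labels))
    split = (stochastic_complexity(len(with_w), sum(with_w))
             + stochastic_complexity(len(without_w), sum(without_w)))
    return (whole - split) / m

# Ten hypothetical open answers, as on the slide.
target = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
informative_word = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]    # W occurs exactly with T
uninformative_word = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]  # W unrelated to T

gain_informative = delta_sc(target, informative_word)
gain_uninformative = delta_sc(target, uninformative_word)
```

Under these assumptions a word that separates the answers for T from the rest gives a positive ΔSC, while an unrelated word gives a negative one because the extra model cost of the split is not repaid.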
Correspondence Analysis
Relationship between Rule analysis and Correspondence analysis
Rule analysis
• Employs a conditional probability model: P(Y|X)
• Provides the facts in detail (Tables 2 and 3).
Correspondence analysis
• Employs a joint probability model: P(Y, X)
• Yields the entire structure (position map).
Y: analysis target
X: words
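The difference between the two models can be seen on a small table of co-occurrence counts. The sketch below uses made-up counts and hypothetical helper names (`joint`, `conditional`). It only contrasts P(Y, X) with P(Y|X) and does not reproduce either analysis as implemented in SA.

```python
from collections import defaultdict

# Hypothetical counts of (analysis target Y, word X) co-occurrences.
counts = {
    ("car_A", "quiet"): 8,
    ("car_A", "sporty"): 2,
    ("car_B", "quiet"): 1,
    ("car_B", "sporty"): 9,
}

def joint(counts):
    """P(Y, X): each cell divided by the grand total."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def conditional(counts):
    """P(Y | X): each cell divided by the total for its word X."""
    word_totals = defaultdict(int)
    for (_, word), n in counts.items():
        word_totals[word] += n
    return {(y, x): n / word_totals[x] for (y, x), n in counts.items()}

p_joint = joint(counts)
p_cond = conditional(counts)
```

The joint table sums to 1 over all cells and describes the whole target-by-word structure at once, whereas the conditional table sums to 1 for each word and reads directly as "given word X, how likely is target Y".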
Mining result
With Car Data
With Eye-drop Data
With Beverage Data
Advantage of the mining system
• It is a much faster and less costly way to summarize or mine from open questions.
• SA is the first system that can perform both rule analysis and correspondence analysis.
• It uses a new statistical learning methodology based on Stochastic Complexity.
• SA has been used successfully in mining various types of questionnaire data.