Finite mixture model of Bounded Semi-Naïve Bayesian Network Classifiers
Kaizhu Huang, Irwin King, Michael R. Lyu
Multimedia Information Processing Laboratory
The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
{kzhuang, king, lyu}@cse.cuhk.edu.hk
ICANN&ICONIP 2003, June 2003, Istanbul, Turkey
ICANN&ICONIP 2003, JUNE, 2003 The Chinese University of Hong Kong Multimedia Information Processing Lab
Outline
Abstract
Background
Classifiers: Naïve Bayesian Classifiers, Semi-Naïve Bayesian Classifiers, Chow-Liu Tree
Bounded Semi-Naïve Bayesian Classifiers
Mixture of Bounded Semi-Naïve Bayesian Classifiers
Experimental Results
Discussion
Conclusion
Abstract
We propose a technique for constructing semi-naïve Bayesian classifiers in which the number of variables that can be combined into a node is bounded.
It has a lower computational cost than the traditional semi-naïve Bayesian networks.
Experiments show the proposed technique is more accurate.
We then upgrade the semi-naïve structure into a mixture structure, which increases its expressive power.
Experiments show the mixture approach outperforms other types of classifiers.
A Typical Classification Problem
Given a set of symptoms, one wants to find out whether these symptoms give rise to a particular disease.
Background

Probabilistic Classifiers
The classification mapping function is defined as:

  c(x) = arg max_{c_l} P(c_l | x)           (posterior probability)
       = arg max_{c_l} P(x, c_l) / P(x)     (joint probability; P(x) is a constant for a given x w.r.t. c_l)

The joint probability is not easily estimated from the dataset; usually, an assumption about the distribution has to be made, e.g., dependent or independent?
Related Work

Naïve Bayesian Classifiers (NB)
Assumption: given the class label C, the attributes are independent:

  P(x_1, ..., x_n | C) = prod_{i=1}^{n} P(x_i | C)

Classification mapping function:

  c(x) = arg max_{c_l} P(c_l) prod_{i=1}^{n} P(x_i | c_l)    (1)
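Equation (1) can be sketched as a small self-contained implementation. The toy dataset, the Laplace smoothing constant, and the function names below are illustrative assumptions, not from the slides:

```python
import math
from collections import Counter, defaultdict

def train_nb(X, y, alpha=1.0):
    """Estimate P(c) and P(x_i | c) from data, with Laplace smoothing alpha."""
    n_attrs = len(X[0])
    class_counts = Counter(y)
    # cond[c][i][v] = how often attribute i takes value v within class c
    cond = defaultdict(lambda: [Counter() for _ in range(n_attrs)])
    values = [set() for _ in range(n_attrs)]
    for x, c in zip(X, y):
        for i, v in enumerate(x):
            cond[c][i][v] += 1
            values[i].add(v)
    return class_counts, cond, values, alpha

def predict_nb(model, x):
    """Equation (1): argmax_c P(c) * prod_i P(x_i | c), evaluated in log space."""
    class_counts, cond, values, alpha = model
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for c, nc in class_counts.items():
        lp = math.log(nc / total)
        for i, v in enumerate(x):
            lp += math.log((cond[c][i][v] + alpha) / (nc + alpha * len(values[i])))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Toy data: class 1 tends to have both attributes equal to 1
X = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (1, 1)]
y = [0, 0, 0, 1, 1, 1]
model = train_nb(X, y)
print(predict_nb(model, (1, 1)))
```

Working in log space avoids numeric underflow when the number of attributes grows.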
Related Work
Naïve Bayesian Classifiers
NB's performance is comparable with some state-of-the-art classifiers even when its independency assumption does not hold in normal cases.
Question: Can the performance be better when the conditional independency assumption of NB is relaxed?
Related Work

Semi-Naïve Bayesian Classifiers (SNB)
A looser assumption than NB: independency occurs among the jointed variables, given the class label C.
Related Work

Chow-Liu Tree (CLT)
Another looser assumption than NB: a dependence tree exists among the variables, given the class variable C.
(Figure: a tree dependence structure.)
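The Chow-Liu construction, pairwise mutual information followed by a maximum-weight spanning tree, can be sketched as follows. The toy data and function names are assumptions for illustration:

```python
import math
from itertools import combinations
from collections import Counter

def mutual_info(data, i, j):
    """Empirical mutual information I(X_i; X_j) in nats."""
    n = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def chow_liu_tree(data, n_vars):
    """Maximum-weight spanning tree over pairwise MI (Kruskal's algorithm)."""
    edges = sorted(((mutual_info(data, i, j), i, j)
                    for i, j in combinations(range(n_vars), 2)), reverse=True)
    parent = list(range(n_vars))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path compression
            u = parent[u]
        return u
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:           # adding this edge creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Toy data: X1 copies X0; X2 is independent noise
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1), (0, 0, 0), (1, 1, 1)]
print(chow_liu_tree(data, 3))
```

On this data the strongest edge is (0, 1), and the tree connects X2 through one of the correlated pair.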
Summary of Related Work

CLT: a conditional tree dependency assumption among variables. Chow & Liu (1968) developed a globally optimal, polynomial-time-cost algorithm.
SNB: a conditional independency assumption among jointed variables. Traditional SNBs are not as well developed as CLT.
Problems of Traditional SNBs

                    Kononenko 91                   Pazzani 96
Approach            Local heuristic                Local heuristic
Efficient?          No: inefficient even in        No: exponential time cost
                    jointing 3 variables
Accurate?           No                             Yes
Strong assumption?  Yes: semi-dependence does not hold in real cases as well
Our Solution

Bounded Semi-Naïve Bayesian Network (B-SNB)
Accurate? We use a global combinatorial optimization method.
Efficient? We find the network based on Linear Programming, which can be solved in polynomial time.

Mixture of B-SNB (MBSNB)
Strong assumption? The mixture structure is a superclass of B-SNB.
Our Solution
(Chart: the problems above are improved significantly.)
Bounded Semi-Naïve Bayesian Network Model Definition

The model consists of jointed variables that are:
- bounded in cardinality,
- completely covering the variable set without overlapping,
- conditionally independent given the class label.
Constraining the Search Space

The search space is large. It is reduced by adding the following constraint: the cardinality of each jointed variable is exactly equal to K.
Hidden principle: when K is small, one jointed variable of cardinality K will be more accurate than separating its variables into several jointed variables.
Example: P(a,b)P(c,d) is closer to P(a,b,c,d) than P(a,b)P(c)P(d).
Search space after reduction:
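The hidden principle can be checked numerically on a hand-built toy distribution (an assumption for illustration) in which (a,b) and (c,d) are each internally correlated. The pairwise factorization then matches the joint exactly, while splitting c and d pays a KL penalty equal to I(c;d):

```python
import math
from itertools import product

# Toy joint over four binary variables: (a,b) correlated, (c,d) correlated
weights = {}
for a, b, c, d in product((0, 1), repeat=4):
    weights[(a, b, c, d)] = (2.0 if a == b else 1.0) * (2.0 if c == d else 1.0)
Z = sum(weights.values())
P = {k: v / Z for k, v in weights.items()}

def marginal(dist, idxs):
    """Marginalize dist onto the given variable indices."""
    m = {}
    for k, v in dist.items():
        key = tuple(k[i] for i in idxs)
        m[key] = m.get(key, 0.0) + v
    return m

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return sum(v * math.log(v / q[k]) for k, v in p.items() if v > 0)

Pab, Pcd = marginal(P, (0, 1)), marginal(P, (2, 3))
Pc, Pd = marginal(P, (2,)), marginal(P, (3,))

Q_pair = {k: Pab[k[:2]] * Pcd[k[2:]] for k in P}                  # P(a,b)P(c,d)
Q_split = {k: Pab[k[:2]] * Pc[(k[2],)] * Pd[(k[3],)] for k in P}  # P(a,b)P(c)P(d)

print(kl(P, Q_pair), kl(P, Q_split))
```

Here KL(P || P(a,b)P(c,d)) is zero while KL(P || P(a,b)P(c)P(d)) is strictly positive, so the coarser factorization is indeed the worse approximation.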
Searching the K-Bounded-SNB Model

How to search for the appropriate model? Find the m = [n/K] K-cardinality subsets (jointed variables) from the variable (feature) set which satisfy the SNB conditions and maximize the log-likelihood. [x] means rounding x to the nearest integer.
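For a tiny n and K = 2, this search can be illustrated by brute-force enumeration of all pair partitions, scoring each partition by the sum of block entropies (minimizing that sum maximizes the log-likelihood). This exhaustive baseline is an illustrative assumption, not the LP method of the slides:

```python
import math
from collections import Counter
from itertools import combinations

def block_entropy(data, block):
    """Empirical entropy of the joint distribution over one block of variables."""
    n = len(data)
    counts = Counter(tuple(row[i] for i in block) for row in data)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def partitions_into_pairs(vars_):
    """All ways to split an even-sized variable list into 2-subsets."""
    if not vars_:
        yield []
        return
    first, rest = vars_[0], vars_[1:]
    for j, other in enumerate(rest):
        remaining = rest[:j] + rest[j + 1:]
        for tail in partitions_into_pairs(remaining):
            yield [(first, other)] + tail

def best_pair_partition(data, n_vars):
    """K = 2 B-SNB structure search by exhaustive enumeration:
    the log-likelihood of a partition is -N * (sum of block entropies)."""
    return min(partitions_into_pairs(list(range(n_vars))),
               key=lambda part: sum(block_entropy(data, b) for b in part))

# Toy data: X0 copies X1 and X2 copies X3, so {(0,1), (2,3)} should win
data = [(0, 0, 1, 1), (1, 1, 0, 0), (0, 0, 0, 0),
        (1, 1, 1, 1), (0, 0, 1, 1), (1, 1, 0, 0)]
print(best_pair_partition(data, 4))
```

The number of such partitions grows super-exponentially with n, which is exactly why the slides replace brute force with an LP relaxation.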
Global Optimization Procedure

Constraints:
- No overlapping among jointed variables.
- All the jointed variables together form the variable set.

Relax the previous 0/1 constraints into 0 <= x <= 1: an integer programming (IP) problem is changed into a linear programming (LP) problem.
Rounding scheme: round the LP solution into an IP solution.
Mixture Upgrading (using EM)

E step: compute each component's posterior responsibility for every sample.
M step: update the mixture weights, and update each component S_k by the B-SNB method.
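The shape of this EM loop can be sketched with product-of-Bernoulli components standing in for the B-SNB components S_k (an illustrative assumption; the real M step would refit each component's network structure):

```python
import math
import random

def em_mixture(data, n_comp=2, iters=50, seed=0):
    """EM for a mixture of product-of-Bernoulli models.
    E step: responsibilities r[t][k] proportional to lam_k * P_k(x_t).
    M step: update mixture weights lam_k and component parameters."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    lam = [1.0 / n_comp] * n_comp
    # theta[k][i] = P(x_i = 1 | component k), randomly initialised
    theta = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(n_comp)]
    for _ in range(iters):
        # E step: posterior responsibility of each component for each sample
        resp = []
        for x in data:
            lik = [lam[k] * math.prod(theta[k][i] if x[i] else 1 - theta[k][i]
                                      for i in range(d)) for k in range(n_comp)]
            s = sum(lik)
            resp.append([l / s for l in lik])
        # M step: responsibility-weighted re-estimation
        for k in range(n_comp):
            nk = sum(r[k] for r in resp)
            lam[k] = nk / n
            theta[k] = [sum(r[k] * x[i] for r, x in zip(resp, data)) / nk
                        for i in range(d)]
    return lam, theta

# Toy data drawn from two clear clusters: all-ones rows vs all-zeros rows
data = [(1, 1, 1)] * 5 + [(0, 0, 0)] * 5
lam, theta = em_mixture(data)
print(lam)
```

On this separable toy data the two components converge to the two clusters with roughly equal mixture weights.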
Experimental Setup
Datasets: 6 benchmark datasets from the UCI machine learning repository, and 1 synthetically generated dataset named "XOR".
Experimental environment: platform: Windows 2000; developing tool: Matlab 6.1.
Experimental Results

Overall Prediction Rate (%)
• We set the bound parameter K to 2 and 3.
• 2-BSNB means the B-SNB model with the bound parameter set to 2.
NB vs MBSNB
BSNB vs MBSNB
CLT vs MBSNB
C4.5 vs MBSNB
Average Error Rate Chart
Observations
B-SNBs with large K are not good for sparse datasets. Post dataset: 90 samples; when K = 3, the accuracy decreases.
Which value of K is good depends on the properties of the dataset. For example, Tic-Tac-Toe and Vehicle have a 3-variable bias; when K = 3, the accuracy increases.
Discussion
When n cannot be divided by K exactly, i.e., (n mod K) = l with l != 0, the assumption that all the jointed variables have the same cardinality K will be violated. Solution:
- Find an l-cardinality jointed variable with the minimum entropy.
- Do the optimization on the other n-l variables, since ((n-l) mod K) = 0.

How to choose K? When the sample number of the dataset is small, a large K may not give a good performance. A good K should be related to the nature of the dataset. A natural way is to use cross-validation to find the optimal K.
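The cross-validation selection of K can be sketched generically. The `majority` learner below is a stand-in (an assumption for illustration), not the B-SNB classifier; any model with the same fit-and-predict interface could be plugged in:

```python
import random

def k_fold_cv(data, labels, fit_predict, k_values, folds=5, seed=0):
    """Pick the bound parameter K by cross-validated accuracy.
    fit_predict(train_X, train_y, test_X, K) -> list of predicted labels."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    chunks = [idx[i::folds] for i in range(folds)]
    best_k, best_acc = None, -1.0
    for K in k_values:
        correct = total = 0
        for f in range(folds):
            test_idx = set(chunks[f])
            tr = [i for i in idx if i not in test_idx]
            te = sorted(test_idx)
            preds = fit_predict([data[i] for i in tr], [labels[i] for i in tr],
                                [data[i] for i in te], K)
            correct += sum(p == labels[i] for p, i in zip(preds, te))
            total += len(te)
        acc = correct / total
        if acc > best_acc:
            best_k, best_acc = K, acc
    return best_k

# Stand-in learner: majority-class predictor (ignores K; illustration only)
def majority(train_X, train_y, test_X, K):
    m = max(set(train_y), key=train_y.count)
    return [m] * len(test_X)

data = [(0,), (1,)] * 10
labels = [0, 1] * 10
print(k_fold_cv(data, labels, majority, k_values=[2, 3]))
```

Ties in accuracy keep the first (smallest) K tried, which matches the observation that small K is safer on sparse datasets.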
Conclusion
A novel Bounded Semi-Naïve Bayesian classifier is proposed.
The direct combinatorial optimization method enables B-SNB to achieve a global optimum.
The transformation from an IP problem into an LP problem reduces the computational complexity to a polynomial one.
A mixture of B-SNB is developed, which expands the expressive power of B-SNB.
Experimental results show the mixture approach outperforms other types of classifiers.
Thank you!