Hamburg, 2015 Roman Rader
Method of Potential Function as Feature Choice Criterion in Alpha Procedure
Roman Rader, National Technical University of Ukraine “Kyiv Polytechnical Institute”, Ukraine
Scientific Advisor: Prof. Dr.-Ing. Tatjana Lange
Contents
● Overview of Alpha Procedure
● Separation power
● Alternative to separation power using Potential Function
● Comparison with the original method, with visualization
Intro
First, let's take a high-level overview of the Alpha Procedure method.
The Alpha Procedure is a pattern recognition algorithm.
Its most important advantages are that it:
– is non-parametric,
– can significantly reduce the feature space,
– operates in 2D and 3D spaces, which makes it convenient to visualize the learning and recognition process.
Alpha Procedure: Input data
The AP is a supervised learning method, so it requires input feature vectors pk = (x1, x2, ..., xn) that have already been classified by a “trainer” into two classes, let's call them A and B. Since the method is non-parametric, no other data needs to be provided.
# p1 p2 p3 class
1 X1,1 X1,2 X1,3 A
2 X2,1 X2,2 X2,3 B
3 X3,1 X3,2 X3,3 A
4 X4,1 X4,2 X4,3 B
$\{(x_1^1, x_2^1, \dots, x_n^1, C_1),\ (x_1^2, x_2^2, \dots, x_n^2, C_2),\ \dots,\ (x_1^k, x_2^k, \dots, x_n^k, C_k)\}$

$x_1 = (x_{11}, x_{12}, \dots, x_{1n})$
$x_2 = (x_{21}, x_{22}, \dots, x_{2n})$
$x_3 = (x_{31}, x_{32}, \dots, x_{3n})$
$x_4 = (x_{41}, x_{42}, \dots, x_{4n})$
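As a concrete illustration, input of this form can be represented in code. A minimal sketch (the feature values below are made up for illustration):

```python
import numpy as np

# Each row is one training object (x_k1, x_k2, x_k3);
# labels[k] is the class assigned by the "trainer".
X = np.array([
    [0.2, 1.5, 3.1],   # object 1 -> A
    [2.4, 0.3, 1.8],   # object 2 -> B
    [0.5, 1.7, 2.9],   # object 3 -> A
    [2.1, 0.1, 2.0],   # object 4 -> B
])
labels = np.array(['A', 'B', 'A', 'B'])

# Being non-parametric, the method needs nothing else as input.
```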
Alpha Procedure: Algorithm
The method is based on step-by-step selection of the most “powerful” feature.
If we represent each feature as an axis and place all training samples on it, we can choose the one that separates the points best (the way of determining this is described later). That is the most “powerful” feature.
Alpha Procedure: Algorithm
1. From the given features we select the one that separates the data best; it becomes our basis feature and the current repère axis, f0.
[Figure: samples projected onto axes p1 and p2; p1 has a smaller intersection area, so it separates the data better and we use it as the repère axis, f0 = p1]
Alpha Procedure: Algorithm
2. Now let's build a set of 2D spaces using f0 as the first axis and each of the remaining features as the second axis.
[Figure: plane with axes f0 and fk]
Alpha Procedure: Algorithm
2. Let's create a new axis that goes through the origin and turn it around the origin by an angle α.
At each step of the rotation, we project the points of the plane onto this axis.
[Figure: plane with axes f0 and fk and the rotating axis]
Alpha Procedure: Algorithm
2. The goal of this step is to find the best pair of second-axis feature and rotation angle.
In the end we will have n−1 (feature, angle) pairs, so using the “power” metric of a data axis we can choose the best pair; after this stage the two features are projected onto the new axis. This new axis becomes our first repère vector, f1.
[Figure: plane with axes f0 and fk; the axis rotated by α1 becomes f1]
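The angle search in this step can be sketched as follows. This is a simplified illustration, not the authors' implementation; `overlap_fraction` is a stand-in for whatever separation-power metric is used (lower is better):

```python
import numpy as np

def best_rotation(f0_vals, fk_vals, power, n_angles=180):
    """Rotate an axis through the origin of the (f0, fk) plane and
    return the angle whose 1-D projection separates the data best."""
    best_alpha, best_score = 0.0, np.inf
    for alpha in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        # Projection of every point onto the axis at angle alpha.
        proj = f0_vals * np.cos(alpha) + fk_vals * np.sin(alpha)
        score = power(proj)
        if score < best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score

# Toy data: class A low on f0, class B high; fk is pure noise.
f0 = np.array([0.0, 0.1, 0.2, 1.0, 1.1, 1.2])
fk = np.array([0.5, -0.3, 0.1, 0.4, -0.2, 0.0])
is_a = np.array([True, True, True, False, False, False])

def overlap_fraction(proj):
    """Share of points inside the interval where the class ranges overlap."""
    lo = max(proj[is_a].min(), proj[~is_a].min())
    hi = min(proj[is_a].max(), proj[~is_a].max())
    return float(((proj >= lo) & (proj <= hi)).mean()) if hi >= lo else 0.0

alpha, score = best_rotation(f0, fk, overlap_fraction)
```

Here the best axis coincides with f0 itself, since fk carries no class information.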
Alpha Procedure: Algorithm
3. In the next step we use f1 as the basis axis.
[Figure: the data projected onto the new axis f1]
Alpha Procedure: Algorithm
3. In the next step we use f1 as the basis axis and repeat the same procedure as in step 2: walk through all the remaining features, build a 2D space with each of them, and rotate a new axis around the origin.
[Figure: plane with axes f1 and fk]
Alpha Procedure: Algorithm
4, 5, … . In the following steps we repeat the previous one until the data is separated or no more features remain.
[Figure: plane with axes f1 and fk; the axis rotated by α2 becomes f2]
Alpha Procedure
As described above, the Alpha Procedure is based on geometric transformations of the space that are quite easy to follow: no matter how many features are given, the method operates only on 2D spaces.
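The whole procedure can then be sketched as a greedy loop. This is a simplified sketch under assumptions: the separation-power metric here is a plain class-overlap fraction and the angle grid is coarse; the metric actually used is discussed in the following sections.

```python
import numpy as np

def overlap(proj, labels):
    """Separation-power stand-in: share of points in the class overlap."""
    a, b = proj[labels == 'A'], proj[labels == 'B']
    lo, hi = max(a.min(), b.min()), min(a.max(), b.max())
    return float(((proj >= lo) & (proj <= hi)).mean()) if hi >= lo else 0.0

def best_rotation(axis, fk, labels, n_angles=180):
    """Best angle for folding feature fk into the current repere axis."""
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    scores = [overlap(axis * np.cos(t) + fk * np.sin(t), labels)
              for t in angles]
    i = int(np.argmin(scores))
    return angles[i], scores[i]

def alpha_procedure(X, labels):
    """Greedy repere construction, always working in 2-D subspaces."""
    remaining = set(range(X.shape[1]))
    f0 = min(remaining, key=lambda j: overlap(X[:, j], labels))
    axis, steps = X[:, f0].astype(float), [(f0, 0.0)]
    remaining.remove(f0)
    while remaining and overlap(axis, labels) > 0:
        cands = [(j, *best_rotation(axis, X[:, j], labels))
                 for j in remaining]
        j, t, _ = min(cands, key=lambda c: c[2])
        axis = axis * np.cos(t) + X[:, j] * np.sin(t)
        remaining.remove(j)
        steps.append((j, t))
    return axis, steps

# Tiny demo: feature 0 already separates the classes perfectly.
X = np.array([[0.0, 5.0], [0.1, 4.0], [1.0, 5.2], [1.1, 4.1]])
labels = np.array(['A', 'A', 'B', 'B'])
axis, steps = alpha_procedure(X, labels)
```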
Method of Potential Function
Let's digress from the Alpha Procedure and look at the Potential Function.
Assume, as previously, that our problem is to classify objects into classes A and B. The input data contains the objects' features and the “teacher's” classification:
$x = (x_1, x_2, \dots, x_n)$
$\{(x_1^1, x_2^1, \dots, x_n^1, C_1),\ (x_1^2, x_2^2, \dots, x_n^2, C_2),\ \dots,\ (x_1^k, x_2^k, \dots, x_n^k, C_k)\}$
Method of Potential Function
In the geometric interpretation, objects are points of a space X with coordinates $(x_1, x_2, \dots, x_n)$.
Assume the space $X = \mathbb{R}^n$, so the features are $x_i \in \mathbb{R},\ i \in 1..n$.
Then the solution of this problem is a scalar field $\Phi = \Phi(x),\ \Phi \in \mathbb{R}$, which is positive if the point should be classified as class A and negative if it should be classified as class B:
$C(x) = \begin{cases} A, & \Phi(x) \ge 0 \\ B, & \Phi(x) < 0 \end{cases}$
Method of Potential Function
Let's introduce a function, the so-called “kernel”: $K(x, x^*),\ x, x^* \in X$.
For a fixed point x*, this function assigns a value to every point of the space X. In physics, such functions are called potential functions, which is the origin of the method's name. The function is defined on the whole space X but depends on the location of the signal source.
The figure below shows an example of a potential function with the signal source at the point 0.
Method of Potential Function
Now let's introduce the functions
$K_A(x) = \sum_{x_j \in A} K(x, x_j)$
$K_B(x) = \sum_{x_j \in B} K(x, x_j)$
In the geometric interpretation, the value of each of these functions at a given point x is the superposition of the potentials of all points of the corresponding class.
Given the properties of the potential function K, in the resulting plots of KA and KB densely located points magnify the potentials of their neighbours and form a common region on the plot.
Method of Potential Function
We now have functions that define the “power” of each class at a point, so we can introduce the recognition function
$\Phi(x) = K_A(x) - K_B(x)$
where x is the feature vector.
This function, which is actually a scalar field, is the solution of our problem.
Having a method of calculating Φ(x), we can predict the class of an arbitrary object x.
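A minimal sketch of Φ(x) in code, assuming for concreteness the kernel K = 1/(1 + a·ρ²) that is introduced later in this talk (the training points below are made up):

```python
import numpy as np

def kernel(x, x_star, a=1.0):
    """K(x, x*) = 1 / (1 + a * |x - x*|^2): the potential at x of a
    unit "charge" located at x*, decaying with squared distance."""
    d2 = float(np.sum((np.asarray(x, float) - np.asarray(x_star, float)) ** 2))
    return 1.0 / (1.0 + a * d2)

def phi(x, a_objects, b_objects, a=1.0):
    """Phi(x) = K_A(x) - K_B(x): superposition of class potentials."""
    return (sum(kernel(x, xj, a) for xj in a_objects)
            - sum(kernel(x, xj, a) for xj in b_objects))

def classify(x, a_objects, b_objects, a=1.0):
    """Positive field -> class A, negative -> class B."""
    return 'A' if phi(x, a_objects, b_objects, a) >= 0 else 'B'

A_train = [[0.0], [0.2]]
B_train = [[2.0], [2.2]]
```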
Method of Potential Function
$\Phi(x) = K_A(x) - K_B(x)$
Separation Power
Now let's return to the Alpha Procedure. As mentioned, it is based on choosing the best axis in terms of data separation quality. Let's elaborate on how the original Alpha Procedure does this.
Separation Power
To calculate which feature separates the data better, the Alpha Procedure offers a straightforward way of calculating the separation power: we find the intersection area, the region where objects cannot be unambiguously classified by putting a “separation point” between the A-class cloud and the B-class cloud.
It can be defined as
$F(p_q) = \frac{\omega_q}{l}$
where l is the overall number of objects and $\omega_q$ is the number of objects in the intersection area.
[Figure: two point clouds on an axis with their intersection area marked]
Separation Power
For the example in the figure, $\omega_q = 3$ and $l = 10$, so F = 3/10 = 0.3.
[Figure: two point clouds on an axis; three objects lie in the intersection area]
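The computation reads directly off the definition. A sketch (the intersection interval here is taken as the overlap of the two class ranges, and the sample is constructed so that ωq = 3 and l = 10, matching the figure's numbers):

```python
import numpy as np

def separation_power(axis_vals, labels):
    """F(p_q) = w_q / l: share of objects inside the intersection area."""
    a = axis_vals[labels == 'A']
    b = axis_vals[labels == 'B']
    lo = max(a.min(), b.min())    # left edge of the overlap interval
    hi = min(a.max(), b.max())    # right edge
    if hi < lo:                   # the clouds are fully separated
        return 0.0
    w_q = int(np.sum((axis_vals >= lo) & (axis_vals <= hi)))
    return w_q / len(axis_vals)

vals = np.array([0.0, 1.0, 2.0, 5.2, 5.5, 5.0, 6.0, 7.0, 8.0, 9.0])
labels = np.array(['A'] * 5 + ['B'] * 5)
```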
Separation Power
Regardless of the way it is calculated, the idea of the function F as a metric of separation quality gives us the ability to find other, possibly better, methods while keeping the same “interface”.
So let's fix that $0 \le F(p_q) \le 1$.
Potential Function as Separation Power
Let's consider a way to use the Potential Function as the separation power.
For each axis at each step, let's find the separation function $\Phi = \Phi(x)$. In our case it is a function of one argument, because our data is defined on a single axis.
To calculate Φ we have to define the kernel K(x, x*). It determines how a point influences the potential depending on the distance from its location.
Let's introduce the distance
$\rho = \rho(x, x^*) = |x - x^*|$
and take the kernel of the potential function to be
$K(\rho) = \frac{1}{1 + a\rho^2}$
Potential Function Shape
For the kernel function $K(\rho) = \frac{1}{1 + a\rho^2}$ the parameter a must be determined. It defines the shape of the potential function: the greater its value, the less influence objects exert on their neighbours, and the more tightly the plot of the potential function fits the original data.
The Alpha Procedure is a non-parametric method, and we want to keep it that way, so automatic determination of the kernel shape will be very helpful.
Potential Function Shape
Let's see how the parameter a influences the kernel shape $K(\rho) = \frac{1}{1 + a\rho^2}$.
[Figure: kernel plots for a = 0.5, a = 1 and a = 5]
Potential Function Shape
To determine the parameter, we have to estimate how accurately the kernel separates the data while also preventing overfitting.
In this study we used cross-validation: the optimal kernel is the one that makes the fewest recognition errors on the test dataset.
Let's introduce the recognition error functions
$\xi_A(\Phi) = \sum_{x_j \in A} \theta(-\Phi(x_j))$
$\xi_B(\Phi) = \sum_{x_j \in B} \theta(\Phi(x_j))$
where $\theta(x)$ is the Heaviside step function: $\theta(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}$
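These error counters translate directly into code. A sketch (note that, exactly as in the formulas above, the boundary Φ = 0 is counted as an error for class A even though the classification rule assigns it to A):

```python
import numpy as np

def heaviside(x):
    """theta(x): 0 for x < 0, 1 for x >= 0."""
    return np.where(np.asarray(x) >= 0, 1, 0)

def recognition_error(phi_vals, labels):
    """xi(Phi) = xi_A(Phi) + xi_B(Phi)."""
    phi_vals = np.asarray(phi_vals)
    xi_a = int(np.sum(heaviside(-phi_vals[labels == 'A'])))  # A with Phi <= 0
    xi_b = int(np.sum(heaviside(phi_vals[labels == 'B'])))   # B with Phi >= 0
    return xi_a + xi_b

# Toy values of Phi at four points: one A and one B are misclassified.
phis = np.array([2.0, -1.0, 0.5, -0.3])
labels = np.array(['A', 'A', 'B', 'B'])
```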
Potential Function Shape
Then the overall recognition error of a function Φ with kernel K is
$\xi(\Phi) = \xi_A(\Phi) + \xi_B(\Phi)$
Now let's treat the kernel parameter a as a parameter of K: $K = K(\rho, a)$. Then the recognition function becomes $\Phi = \Phi(x, a)$, and the recognition error function becomes
$\xi_X(\Phi, a) = \sum_{x_j \in X} \theta(-\Phi(x_j, a))$
Potential Function Shape
Now let's fix the function Φ by currying; since the method of recognition does not matter for the recognition error function, we get
$\xi(a) = \xi_A(a) + \xi_B(a)$
Then the problem of finding the best parameter a reduces to finding the minimum of the function $\xi(a)$:
$a = \underset{a \in [a_{min},\, a_{max}]}{\arg\min}\ \xi(a)$
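A sketch of this parameter search. The talk uses cross-validation; for brevity this sketch scores each candidate a on a single held-out split and scans a log-spaced grid (both are simplifications, and all data below is made up):

```python
import numpy as np

def phi_1d(x, train_a, train_b, a):
    """1-D recognition function with kernel K(rho) = 1 / (1 + a*rho^2)."""
    pot = lambda pts: np.sum(1.0 / (1.0 + a * (x - pts) ** 2))
    return pot(train_a) - pot(train_b)

def xi(a, train_a, train_b, test_x, test_labels):
    """xi(a): misclassified held-out points for kernel parameter a."""
    wrong = 0
    for x, lab in zip(test_x, test_labels):
        pred = 'A' if phi_1d(x, train_a, train_b, a) >= 0 else 'B'
        wrong += (pred != lab)
    return wrong

def best_parameter(train_a, train_b, test_x, test_labels,
                   a_min=0.01, a_max=100.0, n=50):
    """a = argmin of xi(a) over [a_min, a_max], by grid search."""
    grid = np.geomspace(a_min, a_max, n)
    return min(grid, key=lambda a: xi(a, train_a, train_b,
                                      test_x, test_labels))

train_a, train_b = np.array([0.0, 0.2]), np.array([2.0, 2.2])
test_x, test_labels = np.array([0.1, 2.1]), ['A', 'B']
a_opt = best_parameter(train_a, train_b, test_x, test_labels)
```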
Potential Function: Conclusion
● The shape of the kernel function must be determined: $a = \underset{a \in [a_{min},\, a_{max}]}{\arg\min}\ \xi(a)$
● Having the parameter a, we have everything needed to build the recognition function Φ with the potential function for a specific data axis.
● Then we can calculate the separation power of the data using the recognition function Φ, which is now based on the potential function, so we use the recognition error function ξ to determine the better feature.
● All other steps of the Alpha Procedure algorithm remain the same, so we kept the “interface” untouched and changed only the internal way of determining the better feature.
Study
We have described how to incorporate the Potential Function into the Alpha Procedure's feature choice algorithm. Now let's see how it influences the method in general.
Study
● The potential function is able to split our data in a more complicated way than just into two partitions.
● The two-partition division covers only the case with one separation point between two clouds.
Study: Separation of complex data
Let's consider this case:
the blue circles here can represent any class that is described by a two-sided inequality.
Study: Separation of complex data
The figure shows a generated data sample that demonstrates how the PF can split the data into three partitions.
[Figure: three partitions on an axis with the intersection areas marked]
Study: Separation of complex data
● Here we have three partitions: left and right, where the potential function is positive, and a central one, where the potential function is negative.
● This approach gives the Alpha Procedure the flexibility to solve complex problems where the data is arranged on the axes in a complex way, and hence to return more accurate predictions.
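A small generated example of this three-partition behaviour (the data and kernel width are made up): class B sits between two clusters of class A, which no single separation point can handle, yet Φ changes sign twice along the axis.

```python
import numpy as np

a_pts = np.array([0.0, 0.3, 0.6, 4.0, 4.3, 4.6])   # two A clusters
b_pts = np.array([2.0, 2.2, 2.4])                  # B cloud in the middle

def phi(x, a=2.0):
    """Phi(x) = K_A(x) - K_B(x) with K(rho) = 1 / (1 + a*rho^2)."""
    pot = lambda pts: np.sum(1.0 / (1.0 + a * (x - pts) ** 2))
    return pot(a_pts) - pot(b_pts)

# Positive (A) on both outer clusters, negative (B) in the middle.
signs = [bool(phi(x) >= 0) for x in (0.3, 2.2, 4.3)]
```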
Study: Robustness
● Outliers can become a big problem for the quality of separation.
● Let's consider the “Banknote authentication Data Set”: https://archive.ics.uci.edu/ml/datasets/banknote+authentication
On this data an outlier extended the intersection area, and the number of wrongly classified objects is 210.
Study: Robustness
● Now let's use the potential function.
● In this case the number of wrongly classified objects is 151.
This means the separation quality increased by 28%:
$\frac{210 - 151}{210} \cdot 100\% = 28\%$
Study: Robustness
● This happened because an outlier influences the value of the potential function only at its own location and its neighbours. So we cannot break a cloud of objects just by putting one outlier in the middle of it.
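A small synthetic check of this robustness claim (the clouds and kernel width below are made up, not the banknote data): planting one mislabelled outlier in the middle of the opposite cloud inflates the interval-based intersection count dramatically, while the potential-function error count moves by a single object, the outlier itself.

```python
import numpy as np

a_pts = np.linspace(-1.0, 1.0, 20)      # class A cloud
b_pts = np.linspace(3.0, 5.0, 20)       # class B cloud
a_out = np.concatenate([a_pts, [4.0]])  # plus one A-labelled outlier in B

def overlap_count(a, b):
    """w_q of the original metric: objects inside the overlap interval."""
    lo, hi = max(a.min(), b.min()), min(a.max(), b.max())
    if hi < lo:
        return 0
    allv = np.concatenate([a, b])
    return int(np.sum((allv >= lo) & (allv <= hi)))

def pf_errors(a, b, k=2.0):
    """Misclassified points under Phi(x) = K_A(x) - K_B(x)."""
    phi = lambda x: (np.sum(1.0 / (1.0 + k * (x - a) ** 2))
                     - np.sum(1.0 / (1.0 + k * (x - b) ** 2)))
    return int(sum(phi(x) < 0 for x in a) + sum(phi(x) >= 0 for x in b))
```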
Study: Robustness
● Let's consider the same data without the outlier, separated with the original separation power.
● As we see, just one outlier added about 17.6% of wrongly classified objects:
$\frac{210 - 173}{210} \cdot 100\% \approx 17.6\%$
Study: Robustness
● On the other hand, without the outlier the potential function returned 149 errors, almost the same count as with the outlier:
$\frac{151 - 149}{151} \cdot 100\% \approx 1.3\%$
● This means that the Potential Function based separation power is much more robust than the original method.
Study: Overfitting
● Let's consider the “Wine Data Set”:
http://archive.ics.uci.edu/ml/machine-learning-databases/wine/
The data on the x0 axis of this dataset is not dense, and the algorithm chose a rather thin shape for the kernel function. Hence the plot of the potential function is clearly overfitted.
● A wider kernel would be better for overall recognition quality, but recognition on this specific axis would be worse.
Study: Overfitting
This shows that there are ways to enhance the method by improving the calculation of the kernel parameters.
Conclusions: Original separation power

Pros:
● Can uniquely determine the quality of separation of the data on an axis into two clouds
● Easy implementation
● Reliable (doesn't require parametrization to work properly)

Cons:
● Doesn't work on non-linearly separable datasets
● Not robust: single outliers can break the calculations
Conclusions: Potential Method

Pros:
● Can separate complex data dispositions, like segmented clouds
● Robust: outliers can hardly influence the error function value
● On most datasets where the original method worked well, this method also works and chooses the same axes as basis; the order of the axes is also the same or similar

Cons:
● Prone to overfitting
● The implementation is O(x²), while the original method is O(x)
● The potential function tries to separate the data as well as it can, but doesn't show real results if the kernel width (its parameters) is wrong
References
● Aizerman M.A., Braverman E.M., Rozonoer L.I.: Методы потенциальных функций в теории обучения машин (The method of potential functions in the theory of machine learning). Nauka, Moscow, 1970.
● Vasil'ev V.I., Lange T., Baranoff A.E.: Interpretation of fuzzy terms (in Russian). VIII Mezhdunarodnaya Konferenziya KDS-99, 1999, Kiazjaveli (Crimea).
● Vasil'ev V.I.: The reduction principle in problems of revealing regularities (in Russian). Cybernetics and System Analysis 5, Part I: 69–81, 2003; Part II: 7–16, 2004.