
Mingzhu Lu

Department of ECE

University of Texas at San Antonio

This is a joint work with Dr. C. L. Philip Chen, Dr. Long Chen and Dr. Yufei Huang

Study on Statistical Machine Learning: Kernel Methods for Intelligent Systems and Their Applications

Mingzhu Lu, C. L. Philip Chen, Long Chen and Yufei Huang @ UTSA

Outline

• Motivation

• Introduction to Kernel Methods

• Kernel Method for Multi-agent System

• Multiple Kernel Learning for SVM with GA and PSO

• Multiple Kernel Fuzzy c-means for Image Segmentation

• Multiple Kernel Gaussian Process for miRNA Target Prediction (in progress)

• Conclusions

• Future Work


Motivation

• Traditional natural resources are limited and power demand keeps growing, so there is a great need to integrate distributed energy resources (DER) with the existing central power plants.

• As system scale increases, building a stable, reliable and intelligent power system with learning capability becomes very important.

• Existing intelligent systems lack learning capability.

• Kernel methods are currently one of the most popular areas of machine learning.



Kernel Methods

• Kernel methods are machine learning methods employing positive definite kernels.

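• Kernel trick: a mapping function from the input space to a high-dimensional feature space, which is often nonlinear and is denoted by $\phi$. The kernel $K(x_i, x)$ returns the inner product of $\phi(x_i)$ and $\phi(x)$ without computing $\phi$ explicitly.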

[Figure: two classes ($y_i = +1$ and $y_i = -1$) in the input space with axes $x_1$ and $x_2$, mapped to the feature space through the kernel $K(x_i, x)$.]
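As a small illustration of the kernel trick, the sketch below compares an explicit feature map with the matching kernel. It assumes a 2-D input and the homogeneous degree-2 polynomial kernel; the names `phi`, `poly2_kernel` and the sample points are hypothetical and not taken from the slides.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    """Plain inner product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))


def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: K(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))   # inner product computed in the feature space
implicit = poly2_kernel(x, y)    # same value computed in the input space
```

Both values agree, which is the point of the trick: an algorithm that only needs inner products never has to build the feature vectors.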


Commonly Used Kernel Methods and Functions

• Well-known kernel methods

– Support Vector Machine (SVM)

– Gaussian Process

– Principal component analysis

– ... any algorithm that involves the "kernel trick"

• Commonly used kernels

– Polynomial: $K(x, y) = (x \cdot y + c)^p$

– Radial basis function: $K(x, y) = \exp(-\gamma \|x - y\|^2)$

– Two-layer NN: $K(x, y) = \tanh(v\, x \cdot y + c)$, with parameters $c$ and $v$, for $\|x\| = 1$

– ...
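The three kernels listed above can be written as short functions. This is a minimal sketch; the default parameter values (`c`, `p`, `gamma`, `v`) are arbitrary assumptions chosen only for illustration.

```python
import math


def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, c=1.0, p=2):
    """Polynomial kernel (x . y + c)^p."""
    return (dot(x, y) + c) ** p


def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, v=1.0, c=0.0):
    """Two-layer NN (sigmoid) kernel tanh(v * x . y + c)."""
    return math.tanh(v * dot(x, y) + c)
```

Note that the sigmoid kernel is not positive definite for every choice of `v` and `c`, so it needs more care than the other two.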



Distributed Power System

[Figure: The structure of the hybrid distributed power system (DPS). Power supply side: traditional central power plant (steam/gas turbines, cogeneration); solar power plant (PV panels/heat engines, DC/DC converter); wind generation system (wind turbines, AC/DC converter); tidal power plant (tidal stream generators, DC/DC converter); geothermal power plant; other renewable energy plants (geothermal, wave, biofuel, biomass, etc.). These connect to the power storage system and the power demand.]


Architecture for LADA-DPS

[Figure, three panels.
Overall architecture for small-scale LADA-DPS: a microgrid agent with a PSS agent, a cogeneration agent, and WGS, SPP, WPP and GPP agents 1 to N.
Physical layer architecture for medium-scale LADA-DPS: a microgrid cluster agent with WGS, solar PP, wave PP and geothermal PP cluster agents.
Physical layer architecture for large-scale LADA-DPS: super WGS, super SPP and super WPP cluster agents, with WGS cluster agents 1 to N and microgrid agents 1 to N.]

LADA-DPS: LeArning Driven multi-Agent based Distributed Power System


LADA-DPS (Cont.)

[Figure: Learning-driven single agent. Blocks: learning agent (data cleaning, feature extraction, model profile, decision-making model learning), decision-making unit and negotiation feedback controller. Signals: elicitation signal, other agents' decisions, negotiation result signal, decision.]

[Figure: Control of LADA-DPS by FIPA protocol and JADE platform. Message exchanges between the TCPP agent and the PSS agent: Query-ref (store power) answered by Inform (request failure); Query-ref (store power) answered by Confirm (accept request).]


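Intelligent Fault Diagnosis Agent: Intro. of Linear SVM

Given a training set $U = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_i \in R^n$ and $y_i \in \{-1, 1\}$, the optimization problem is

$\min \frac{1}{2}(w \cdot w)$ s.t. $y_i((w \cdot x_i) + b) \ge 1$, $i = 1, 2, \ldots, N$

The optimal hyper-plane is

$f(x) = (w_0 \cdot x) + b_0$, where $w_0 = \sum_{i=1}^{N} \alpha_i^0 y_i x_i$.

[Figure: two classes in the $(x_1, x_2)$ plane separated by the hyper-plane $w \cdot x + b = 0$ with normal vector $w$.]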


Nonlinear SVM

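For a dataset that is not linearly separable, the optimization problem becomes

$\min \frac{1}{2}(w \cdot w) + C \sum_{i=1}^{N} \xi_i$ s.t. $y_i((w \cdot x_i) + b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1, 2, \ldots, N$

It is converted into an optimization problem with equality constraints, which is easier to solve. Its dual problem is

Maximize $W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$

Subject to $\sum_{j=1}^{N} y_j \alpha_j = 0$; $0 \le \alpha_i \le C$, $i = 1, 2, \ldots, N$

After applying the kernel trick, the optimal hyper-plane of the SVM becomes

$f(x) = \sum_{i=1}^{N} \alpha_i^0 y_i K(x_i, x) + b_0$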
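To make the dual problem and the kernel decision function concrete, here is a minimal sketch that evaluates $W(\alpha)$, checks the constraints and computes $f(x)$ for given multipliers. It does not solve the optimization; the two-point dataset and the values of `alpha` and `b` are a hand-worked toy example and are assumptions, not data from the slides.

```python
def linear_kernel(x, y):
    """Linear kernel: the plain inner product."""
    return sum(a * b for a, b in zip(x, y))


def dual_objective(alpha, X, y, kernel):
    """W(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j)."""
    n = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * kernel(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def is_feasible(alpha, y, C, tol=1e-9):
    """Check sum_j y_j alpha_j = 0 and 0 <= alpha_i <= C."""
    balanced = abs(sum(a * t for a, t in zip(alpha, y))) <= tol
    boxed = all(-tol <= a <= C + tol for a in alpha)
    return balanced and boxed


def decision(x, alpha, X, y, b, kernel):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b."""
    return sum(a * t * kernel(xi, x) for a, t, xi in zip(alpha, y, X)) + b


# Toy problem: one point per class on a line; alpha = (0.5, 0.5), b = 0 is optimal here.
X = [(1.0,), (-1.0,)]
y = [1, -1]
alpha = [0.5, 0.5]
b = 0.0
```

For this toy problem the dual value is 0.5, which equals the primal value $\frac{1}{2}(w \cdot w)$ with $w = 1$; swapping `linear_kernel` for any other kernel function changes the decision surface without changing the code.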


Multi-class lArge marGin lEarning SVM (MAGE-SVM)

Suppose a dataset $U = \{x_1, x_2, \ldots, x_N\}$, where $x_i \in R^n$, and consider the set $\{f \mid f$ is a function from $U$ to $\{-1, 1\}\}$. Given such a function $f$, calculate the margin, which is denoted by Margin($f$). Then the large margin learning of SVM is formulated as

$\max_{f} \mathrm{Margin}(f)$ (1)

MAGE-SVM algorithm:

Given a dataset of N pairs with m classes:

– Step 1: Compute Eq. (1) and obtain the separating function. Split the dataset into two subsets Up and Un based on the separating hyper-plane with maximum margin among classes.

– Step 2: Check whether both subsets Up and Un contain only one class. If yes, stop; otherwise, go to Step 3.

– Step 3: If the subset Up is a multi-class problem, treat it the same as the initial dataset and go to Step 1. Deal with the subset Un similarly.
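The recursive splitting in Steps 1 to 3 can be sketched as a binary tree builder. This is only an outline under strong assumptions: instead of training an SVM for each candidate split, it uses the smallest distance between the two groups as a stand-in for the margin, and it enumerates all two-way groupings of the classes. The names `group_gap` and `build_tree` and the toy data are hypothetical.

```python
import math
from itertools import combinations


def group_gap(points_a, points_b):
    """Stand-in for the SVM margin: smallest distance between the two groups."""
    return min(math.dist(p, q) for p in points_a for q in points_b)


def build_tree(data):
    """data maps class label -> list of points; returns a nested tuple tree.

    A leaf is a class label; an inner node is (left_subtree, right_subtree).
    """
    labels = sorted(data)
    if len(labels) == 1:          # Step 2: a single class, stop
        return labels[0]
    first, rest = labels[0], labels[1:]
    best = None
    # Step 1: try every split of the classes into two non-empty groups
    # (the first label always goes left, so mirrored splits are skipped).
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = (first,) + extra
            right = tuple(l for l in rest if l not in extra)
            gap = group_gap([p for l in left for p in data[l]],
                            [p for l in right for p in data[l]])
            if best is None or gap > best[0]:
                best = (gap, left, right)
    _, left, right = best
    # Step 3: treat each subset like the initial dataset.
    return (build_tree({l: data[l] for l in left}),
            build_tree({l: data[l] for l in right}))


toy = {"C1": [(0.0,)], "C2": [(1.0,)], "C3": [(10.0,)]}
tree = build_tree(toy)
```

In this toy case the far-away class is separated first and the two close classes are split at the second level, which mirrors the idea of taking the largest-margin split at each node.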


Intelligent Fault Diagnosis Agent for Transformer

Table. Gas content data of the transformer (unit: ppm)

H2      CH4     C2H6    C2H4    C2H2    Fault class*
14.7    3.8     10.5    2.7     0.2     C1
980     73      58      12      0       C2
181     262     41      28      0       C4
127     107     11      154     224     C3
200     48      14      117     131     C3
21.58   4.81    3.33    20.57   0       C1
220     340     42      480     14      C5
170     320     53      520     3.2     C5
27      90      42      63      0.2     C4
565     93      34      47      0       C2
32.4    5.5     1.4     12.6    13.2    C3
56      286     96      928     7       C5
160     130     33      96      0       C4
650     53      34      20      0       C2
42      98      156     610     0       C5

*C1, C2, C3, C4 and C5 represent normal, low-energy discharge, high-energy discharge, low-temperature heating and high-temperature superheating, respectively.

The authors are with the Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX 78249 USA, (e-mail: [email protected]). Partial support from NASA grant, NNC04GB35G is acknowledged.

[Figure: MAGE-SVM for the fault diagnosis agent of the power transformer. A binary tree of classifiers SVM1 to SVM4 whose leaves are Class 1 to Class 5.]

Table. Mean of 10-fold cross-validation on testing data

Method                   Diagnosis accuracy
MAGE-SVM                 98.9%
1-vs-n SVM               97.8%
1-vs-1 SVM (Max Wins)    97.6%
ANN                      92.7%



Why Multiple Kernel Learning?

• A multi-agent system is composed of homogeneous or heterogeneous agents.

• If the data are obtained from different sources or by different methods, they may have different characteristics.

– Take an image for example: the intensity of a pixel is obtained directly from the image itself, but more complicated texture information is often obtained by wavelet filtering of the image.

• A traditional single kernel is not enough for data fusion. This motivates us to investigate multiple kernels to deal with the data.


Multiple Kernels

Multiple kernels: a combination of kernel functions by different operators, which should satisfy Mercer's theorem. Its representation is

$K_{com} = (K_1)^{e_1} \circ \cdots \circ (K_m)^{e_m}$

where $K_i$ ($i = 1, 2, \ldots, m$) denotes the kernel function, $e_i$ is the exponent of the i-th kernel function and $\circ$ represents the operator between two kernel functions, which can be addition or multiplication.

Without loss of generality, we study the linear multiple kernel

$K_c(x, y) = a_1 K_1(x, y) + a_2 K_2(x, y) + \cdots + a_i K_i(x, y) + \cdots + a_n K_n(x, y)$, where $i = 1, 2, \ldots, n$

and $a_i$ is the weight.
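A linear multiple kernel is simply a weighted sum of base kernels, which the following sketch builds as a closure. The requirement that weights be non-negative is an assumption added here (a non-negative combination of positive definite kernels stays positive definite); the helper name `combine_kernels` and the base kernels are illustrative.

```python
def linear_kernel(x, y):
    """Base kernel 1: inner product."""
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, c=1.0, p=2):
    """Base kernel 2: polynomial kernel (x . y + c)^p."""
    return (linear_kernel(x, y) + c) ** p


def combine_kernels(kernels, weights):
    """Return K_c(x, y) = sum_i a_i * K_i(x, y) for the given weights a_i."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        # Assumption of this sketch: non-negative weights keep K_c a valid kernel.
        raise ValueError("weights must be non-negative")

    def k_c(x, y):
        return sum(w * k(x, y) for w, k in zip(weights, kernels))

    return k_c


k_c = combine_kernels([linear_kernel, poly_kernel], [0.25, 0.75])
value = k_c((1.0, 0.0), (0.0, 1.0))
```

The weights are exactly the quantities that the multiple kernel learning methods on the following slides search for.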


Multiple Kernel Learning for SVM

• Traditional multiple kernel learning is driven by the classification accuracy on the training dataset, so it suffers from overfitting.

• To improve the performance of SVM, learn a multiple kernel on the training set such that it attains the maximum margin.

Now it becomes an optimization problem:

$\max_{K_c} \mathrm{Margin}(K_c)$


Genetic Algorithm Solution

1. Initialize the chromosome $CH$ and fitness function $M(j)$. Specify $S$, $f_s$, $f_c$, $f_m$, $E_c$, $c_f$, $G$ and $G_s$.

2. Randomly generate $S$ chromosomes $CH_i$ ($i = 1, 2, \ldots, S$). For each $CH_i$, use the combined kernel function with parameters $CH_i$ to calculate $b_0$ and $w_0$ on the training set. If $y_i[(w_0 \cdot x_i) + b_0] \ge 1$ for $i = 1, 2, \ldots, l$, then $M(i) = 1 / \|w_0\|$; otherwise $M(i) = 0$.

3. Compare all $M(j)$, where $j = 1, 2, \ldots, S$, then set $M(0) = \max_{j = 1, \ldots, S} M(j)$ and $CH_0 = CH_j$.

4. Choose $E_c$ chromosomes that are guaranteed to survive to the next generation by sorting $M(j)$ in decreasing order.

5. Select $(1 - c_f) S$ members of the current population $P$ to add to the new generation $P_S$ according to their fitness values.

6. Use the specified selection function $f_s$ to select the parents in $P$ for crossover to generate the next-generation chromosomes. Use the specified $f_m$ to perform mutation.

7. $P_S$ becomes the current generation. Compute the fitness value of each chromosome $CH_i$ as in step 2 and sort. Compare the largest value with $M(0)$; if it is greater, update $M(0)$ and $CH_0$.

8. If the number of iterations is less than $G$ (Yes), continue with the next generation; otherwise (No), output $CH_0$ and $M(0)$.


Simulation on UCI Database

Note: The left table shows the mean testing accuracy over 10 runs. The single kernel functions are the polynomial kernel, the Gaussian kernel and the heavy-tailed RBF kernel, respectively.


Particle Swarm Optimization Solution

1. Start: initialize all the particles with random position and velocity vectors; set the number of particles to $N$.

2. For each particle $i$ with position $x_i$, calculate the fitness value (margin) fitness($x_i$).

3. If fitness($x_i$) > fitness($pbest_i$), then $pbest_i = x_i$. Loop over steps 2 and 3 until all particles are exhausted.

4. Set the best of the $pbest_i$ as $gbest$, i.e. the $pbest_i$ that attains $\max_{i = 1, \ldots, N} \mathrm{fitness}(pbest_i)$.

5. Update every particle's velocity $v_i^{k+1}$ and position $x_i^{k+1}$. Loop over steps 2 to 5 until the maximum number of iterations is reached.

6. Stop: output the optimal solution $gbest$, which maximizes the margin.
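The PSO loop above can be sketched in a few lines. This is a generic sketch, not the implementation behind the reported results: the fitness function here is a toy quadratic standing in for the SVM margin, positions are clipped to the unit box, and the inertia and acceleration constants (`w`, `c1`, `c2`) are common textbook values chosen as assumptions.

```python
import random


def pso_maximize(fitness, dim, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over [0, 1]^dim with a basic particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [float("-inf")] * n_particles
    gbest = pos[0][:]
    for _ in range(n_iter):
        # Evaluate every particle and update its personal best.
        for i in range(n_particles):
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
        # The best personal best becomes the global best.
        g = max(range(n_particles), key=lambda i: pbest_fit[i])
        gbest = pbest[g][:]
        # Update velocities and positions (positions clipped to the unit box).
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
    return gbest, fitness(gbest)


def toy_fitness(a):
    """Stand-in for the margin: peaks at kernel weights (0.3, 0.7)."""
    return -((a[0] - 0.3) ** 2 + (a[1] - 0.7) ** 2)


best, best_fit = pso_maximize(toy_fitness, dim=2)
```

To use it for multiple kernel learning, `toy_fitness` would be replaced by a function that trains an SVM with the combined kernel defined by the particle position and returns its margin.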


Simulation on UCI Database



Fuzzy C-means (FCM)

• Given a dataset of size n, $X = \{x_1, \ldots, x_n\}$ with $x_j \in R^p$.

• FCM groups $X$ into $c$ clusters by minimizing the weighted distance between the data and the cluster centers, defined as:

$Q = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m \|x_j - o_i\|^2$

where $u_{ij}$ is the membership of data $x_j$ in cluster $i$.

• FCM iteratively updates the prototypes $[o_1, o_2, \ldots, o_c]$ and the memberships $u_{ij}$ through the equations below:

$o_i = \frac{\sum_{j=1}^{n} u_{ij}^m x_j}{\sum_{j=1}^{n} u_{ij}^m}$

$u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\|x_j - o_i\|^2}{\|x_j - o_k\|^2} \right)^{\frac{1}{m-1}} \right]^{-1}$

• The iteration stops when the difference between the old membership values and the updated ones is small enough.

• Finally, based on the resulting $u_{ij}$, we assign data $x_j$ to cluster $k$, where $u_{kj}$ is the largest membership value among all $u_{ij}$ ($i = 1$ to $c$).
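The two update equations and the stopping rule can be sketched directly. This is a minimal pure-Python version under assumptions of its own: initial centers are passed in by the caller, and a small `eps` is added to squared distances so that a point lying exactly on a center does not cause a division by zero. The toy data are hypothetical.

```python
def sq_dist(x, o):
    """Squared Euclidean distance ||x - o||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, o))


def fcm(X, centers, m=2.0, tol=1e-6, max_iter=100, eps=1e-12):
    """Fuzzy c-means; returns (centers, memberships u[i][j], hard labels)."""
    c, n, dim = len(centers), len(X), len(X[0])
    centers = [tuple(o) for o in centers]
    u = [[0.0] * n for _ in range(c)]
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij^2 / d_kj^2)^(1/(m-1)).
        new_u = []
        for i in range(c):
            row = []
            for j in range(n):
                d_ij = sq_dist(X[j], centers[i]) + eps
                s = sum((d_ij / (sq_dist(X[j], centers[k]) + eps)) ** (1.0 / (m - 1.0))
                        for k in range(c))
                row.append(1.0 / s)
            new_u.append(row)
        change = max(abs(new_u[i][j] - u[i][j]) for i in range(c) for j in range(n))
        u = new_u
        # Prototype update: o_i = sum_j u_ij^m x_j / sum_j u_ij^m.
        centers = []
        for i in range(c):
            wts = [u[i][j] ** m for j in range(n)]
            total = sum(wts)
            centers.append(tuple(sum(wt * X[j][d] for j, wt in enumerate(wts)) / total
                                 for d in range(dim)))
        if change < tol:   # memberships barely changed: stop
            break
    # Hard assignment: the cluster with the largest membership for each point.
    labels = [max(range(c), key=lambda i: u[i][j]) for j in range(n)]
    return centers, u, labels


X = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
centers, u, labels = fcm(X, centers=[(0.0,), (5.2,)])
```

The memberships of each point sum to one across clusters, which is the constraint that the membership update enforces.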


Kernel Fuzzy C-Means

• Original problem:

$Q = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m \|x_j - o_i\|^2$

• Both the data and the cluster centers are mapped from the original space to a new space by $\phi$, and we don't know exactly what $\phi$ is. The original problem with the mapping,

$Q = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m \|\phi(x_j) - \phi(o_i)\|^2$

cannot be solved directly.

• But because of the "kernel trick",

$\|\phi(x_j) - \phi(o_i)\|^2 = \phi(x_j) \cdot \phi(x_j) + \phi(o_i) \cdot \phi(o_i) - 2 \phi(x_j) \cdot \phi(o_i) = k(x_j, x_j) + k(o_i, o_i) - 2 k(x_j, o_i)$

we can reformulate the objective function as a problem with only the kernel function $k$:

$Q = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m (k(x_j, x_j) + k(o_i, o_i) - 2 k(x_j, o_i))$

• Because we know $k$, the new $Q$ has an explicit formulation, so we can optimize it.
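The kernel-induced distance used in the reformulated objective needs only three kernel evaluations. The sketch below is illustrative; the kernel width `r` and the sample points are assumptions.

```python
import math


def linear_kernel(x, y):
    """k(x, y) = x . y, for which phi is the identity map."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, r=0.5):
    """k(x, y) = exp(-r * ||x - y||^2)."""
    return math.exp(-r * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_sq_distance(x, o, k):
    """||phi(x) - phi(o)||^2 = k(x, x) + k(o, o) - 2 k(x, o)."""
    return k(x, x) + k(o, o) - 2.0 * k(x, o)


x, o = (1.0, 2.0), (4.0, 6.0)
d_linear = kernel_sq_distance(x, o, linear_kernel)      # ordinary squared distance
d_gauss = kernel_sq_distance(x, o, gaussian_kernel)     # equals 2 * (1 - k(x, o))
```

With the linear kernel the formula reduces to the ordinary squared Euclidean distance, and with a Gaussian kernel it simplifies to $2(1 - k(x, o))$ because $k(x, x) = 1$.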


Multiple Kernel Fuzzy C-means (MKFCM)

[Figure: data sources 1, 2, ..., n feed kernel functions 1, 2, ..., n, which are merged into a combined kernel function used by kernel fuzzy c-means.]

Without loss of generality, the combined kernel is $k = w_1^b k_1 + w_2^b k_2 + \cdots + w_n^b k_n$.

Now the objective function becomes

$Q = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m \|\phi_L(x_j) - \phi_L(o_i)\|^2$

Then the Lagrange method is used to update the coefficients $w_1, w_2, \ldots, w_n$ automatically.


MKFCM for Image Segmentation

• We mathematically proved that the two most successful kernel fuzzy c-means algorithms are special cases of MKFCM.

• Both use a Gaussian kernel to combine the pixel intensity and spatial information (the mean or median of the neighborhood pixels, written $x'_j$). One uses the traditional objective function; the other uses

$Q = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m (1 - k(x_j, o_i)) + \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m (1 - k(x'_j, o_i))$

• The objective function of MKFCM is

$Q = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m \|\phi_{com}(x_j) - \phi_{com}(o_i)\|^2 = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m (k_{com}(x_j, x_j) + k_{com}(o_i, o_i) - 2 k_{com}(x_j, o_i))$

$= 2 \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m \left(1 - \frac{1}{2}\left(\exp(-r \|x_j - o_i\|^2) + \exp(-r \|x'_j - o_i\|^2)\right)\right)$


Simulation on two-textured image

(a) The 2-textured image.
(b) Segmentation result of AKFCM_meanf (SA=0.716).
(c) Segmentation result of AKFCM_medianf (SA=0.748).
(d) Segmentation result of DKFCM_meanf (SA=0.715).
(e) Segmentation result of DKFCM_medianf (SA=0.747).
(f) Segmentation result of MKFCM-K, third variant (SA=0.753).
(g) Segmentation result of LMKFCM (SA=0.853).
(h) Segmentation result of MKFCM-K, kcom=k1k2k3 (SA=0.723).
(i) Segmentation result of MKFCM-K, kcom=k1+k2+k3 (SA=0.730).
(j) Segmentation result of KFCM, single intensity kernel (SA=0.720).
(k) Segmentation result of KFCM, single spatial kernel (SA=0.709).
(l) Segmentation result of KFCM, texture kernel (SA=0.763).


Simulation on MRI

(a) MR image and its correct segmentation. From left to right: the integrated MR image, the CSF, the GM and the WM. (b) Segmentation results of AKFCM_meanf. (c) Segmentation results of DKFCM_meanf.


Simulation on MRI (Cont.)

(d) Segmentation results of MKFCM-K_meanf (first variant). (e) Segmentation results of MKFCM-K_poly. (f) Segmentation results of LMKFCM.


Simulation on MRI (Cont.)

Table . Segmentation accuracy of different methods on the MRI-brain

Fig. Segmentation results of different methods on a PET dog lung image.

(a) PET dog lung. (b) Segmentation results of AKFCM_meanf. (c) Segmentation results of DKFCM_meanf. (d) Segmentation results of LMKFCM. (e) Segmentation results of MKFCM_poly.



Background of microRNA

• Single-stranded RNA of about 21-23 nucleotides in length.

• Known to regulate more than 20% of human genes.

• Each miRNA is thought to regulate a few hundred targets.

• Plays an important role at the post-transcriptional stage in cell development, stress responses, viral infection and cancer.

• Regulatory modes:

– Primary: inhibit translation

– Secondary: degrade mRNA

[Figure: a miRNA bound to the binding site of an mRNA, leading to mRNA down-regulation and/or protein down-regulation.]


New Way to View the Binding Status

Fig. Match (bind) ratio for miRNA-122 on Positive Luciferase data

Fig. Match (bind) ratio for miRNA-122 on Seed Mapping (proteomic data)

Fig. Match status for miRNA-122 and mRNA pairs on Positive Luciferase data

Fig. Match status for miRNA-122 and mRNA pairs on proteomic data


BCmicrO Algorithm for miRNA target prediction

• Poor sensitivity and specificity of existing algorithms.

• Poor agreement between the results of different algorithms and yet they achieve similar performance.

• Different algorithms rely on different mechanisms in making prediction, each of which has its own advantages.

• For a gene, given the scores of TargetScan, miRanda and PicTar, provide a final score that represents the probability that it is a target of the miRNA.

Graphical model of BCmicrO


BCmicrO Performance

[Figure: Cumulative sum of protein fold change for different numbers of top-ranked predictions of miR-124. x-axis: Top 25, Top 50, Top 100, Top 150, Top 200, Top 300; y-axis: cumulative sum of protein fold change (from -60 to 0). Curves: BCmicrO, PicTar, miRanda, TargetScan.]

[Figure: ROC curves of different miRNA target prediction algorithms.]


Multiple Kernel Gaussian Processes for miRNA target prediction

[Figure: sequence data (ATCGGGCCTT…), expression data and CLIP-Seq data (binding position info) enter through a string kernel, an RBF kernel and a polynomial kernel. These kernels are used to construct the multiple kernel mean and covariance functions for the Gaussian process, which outputs the mRNA and miRNA target relationships.]



Conclusions

• Presented the framework for LADA-DPS, which includes MAGE-SVM for fault diagnosis.

• Investigated multiple kernels for SVM based on large margin learning to improve its generalization capability.

• Developed Multiple Kernel Fuzzy C-means for image segmentation.

• Created a new way to view the miRNA binding status.

• Proposed a multiple kernel Gaussian process for miRNA target prediction (in progress).


Future Work

• Implement multiple intelligent agents and investigate better ways for them to cooperate to finish a task (swarm intelligence).

• Finish the multiple kernel GP and also try some semi-supervised learning methods on the sequencing data.

• Current multiple kernel learning relies heavily on prior experience when choosing kernels for the data. How to design kernels automatically, including kernel types and parameters, is challenging and worth exploring.

Expected graduation time: Aug. 2011


Acknowledgements

This work is supported by NASA grant NNC04GB35G, NSF grant CCF-0546345 and the San Antonio Life Sciences Institute (SALSI).

Dissertation Committee: Dr. C. L. Philip Chen*, Dr. Yufei Huang, Dr. David Akopian, Dr. Keying Ye

Lab Mates: Dr. Long Chen, Mr. Dong Yue

(*Now working at the University of Macau)


Published Papers

• Mingzhu Lu, C. L. Philip Chen, and Long Chen, “The Design and Reliability Assessment of Learning Model Driven Multi-Agent based Distributed Power system”, in IEEE Transactions on system, man and cybernetics: Part C (Submitted).

• Long Chen, C. L. Philip Chen, and Mingzhu Lu, “A Multiple-Kernel Fuzzy C-means Algorithm for Image Segmentation”, in IEEE Transactions on system, man and cybernetics: Part B, accepted to appear.

• Daniel R. Boutz, Patrick Colins, Uthra Suresh, Mingzhu Lu, Yufei Huang et al.,” A two-tiered approach identifies a network of cancer and liver diseases related genes regulated by miR-122”, in The Journal of Biological Chemistry, accepted to appear.

• Jia Meng, Mingzhu Lu, Yidong Chen, et al., “Robust inference of the context specific structure and temporal dynamics of gene regulatory network,” in BMC Genomics 2010, 11(Suppl 3):S11.

• Mingzhu Lu, Long Chen and C. L. Philip Chen, “Sensitivity Analysis of Parametric t-norm and s-norm based Fuzzy Classification System”, in IEEE conference on system, man and cybernetics, Oct. 2010, Turkey.

• Long Chen, Mingzhu Lu, and C. L. Philip Chen, “Multiple Kernel Fuzzy C-means for Image Segmentation”, in IEEE conference on system, man and cybernetics, Oct. 10-13, 2010, Istanbul, Turkey.

• Dong Yue, Hui Liu, Mingzhu Lu, C. L. Philip Chen, Yidong Chen, Yufei Huang., “A Bayesian Decision Fusion Approach for miRNA Target Prediction”, in ACM International Conference on Bioinformatics and Computational Biology, Aug. 2010, NY, USA.

• Mingzhu Lu, and C. L. Philip Chen, “Optimization of Multiple Kernels for SVM by Genetic Algorithm based on Large Margin Learning” (Poster), in CRA-W Graduate Cohort Workshop, Apr. 2010, Bellevue WA, USA.

• Mingzhu Lu, and C. L. Philip Chen, “The Design of Multi-agent based Distributed Energy System”, in IEEE conference on system, man and cybernetics, Oct. 11-14, 2009, San Antonio, TX, USA, pp. 2001-2006.

• Mingzhu Lu, C.L.Philip Chen, Jianbing Huo, et al, “Multi-Stage Decision Tree based on Inter-class and Inner-class Margin of SVM”, in IEEE conference on system, man and cybernetics, Oct. 2009, San Antonio, TX, USA, pp. 1875-1880.

• Mingzhu Lu, C.L.Philip Chen, Jianbing Huo,” Optimization of Combined Kernel Function for SVM by Particle Swarm Optimization”, in International Conference on Machine Learning and Cybernetics, July 2009, Baoding, China, pp. 1160-1166.

• Mingzhu Lu, C.L.Philip Chen, Jianbing Huo, et al, “Optimization of combined kernel function for SVM based on Large margin learning theory”, in IEEE conference on system, man and cybernetics, 2008,Singapore, pp. 353-358.

• Xizhao Wang, Mingzhu Lu, Jianbing Huo, “Fault Diagnosis of Power Transformer Based on Large Margin Learning Classifier,” In International Conference on Machine Learning and Cybernetics, Dalian, Aug. 2006, Vol.5, pp. 2886-2891.


Thank you for your attention!

Comments and

Suggestions?
