
Combined classification and channel/basis selection with L1-L2 regularization, with application to the P300 speller system

Ryota Tomioka & Stefan Haufe
Tokyo Tech / TU Berlin / Fraunhofer FIRST

P300 speller system

Evoked Response (Farwell & Donchin 1988)

The user attends to one character of a 6 x 6 matrix while its rows and columns flash in turn:

A B C D E F
G H I J K L
M N O P Q R
S T U V W X
Y Z 1 2 3 4
5 6 7 8 9 _

ER detected when the attended column flashes. ER detected again when the attended row flashes. The character must be "P": the intersection of the detected column and row.
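The decoding rule at the heart of the speller can be sketched in a few lines of Python; the hard-coded matrix and 0-based indices below are illustrative assumptions, not the original implementation.

MATRIX = [
    "ABCDEF",
    "GHIJKL",
    "MNOPQR",
    "STUVWX",
    "YZ1234",
    "56789_",
]

def decode(row, col):
    # The attended character sits at the intersection of the row and
    # column whose flashes evoked a response (indices are 0-based).
    return MATRIX[row][col]

print(decode(2, 3))  # row "M N O P Q R", column "D J P V 2 8" -> "P"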

Common approach

EEG signal
-> Feature extraction (e.g., ICA or channel selection)
-> Feature vector
-> P300 detection (e.g., binary SVM classifier)
-> Detector outputs (6 columns & 6 rows)
-> Decoding (e.g., compare the detector outputs)
-> Decoded character (36 classes)

Lots of intermediate goals! Each stage is designed and trained against its own target; a sketch of the staged pipeline follows below.
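A minimal Python sketch of the staged pipeline, with placeholder stages standing in for ICA/channel selection, the binary SVM detector, and the comparison rule (all function names and bodies here are hypothetical):

import numpy as np

def extract_features(eeg):
    # Placeholder for, e.g., ICA or channel selection.
    return eeg

def detect_p300(features):
    # Placeholder for a binary SVM detector; returns one score per
    # stimulus (6 columns and 6 rows = 12 scores per trial).
    return features.mean(axis=1)

def decode_character(scores):
    # Compare detector outputs: pick the best-scoring column and row.
    col = int(np.argmax(scores[:6]))
    row = int(np.argmax(scores[6:]))
    return row, col

scores = detect_p300(extract_features(np.random.randn(12, 64)))
print(decode_character(scores))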

Our approach

EEG signal -> Decoded character (36 classes)

The intermediate stages are merged: instead of a separate feature-extraction step (e.g., ICA or channel selection) followed by a binary SVM classifier for P300 detection, we define a single "detector" fW(X) and decode by comparing its outputs directly.

Our approach

Regularized empirical risk minimization:

minimize_W  L(W) + λ Ω(W)

The data-fit term L(W) plays the role of P300 detection (detect the P300); the regularization term Ω(W) plays the role of feature extraction (extract structure). The whole pipeline from EEG signal to decoded character (36 classes) is learned at once.
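The objective is straightforward to write down; a minimal sketch, with the loss and regularizer left abstract for now (both are filled in on later slides):

def objective(W, loss_fn, reg_fn, lam):
    # Regularized empirical risk: data-fit term plus a penalty that
    # encodes the structural prior (e.g., channel sparsity).
    return loss_fn(W) + lam * reg_fn(W)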

Learning the decoding model

• Suppose that we have a detector fW(X) that detects the P300 response in signal X.
• One output per stimulus: f1, ..., f6 for the six columns and f7, ..., f12 for the six rows.
• This is nothing but learning a 2 x 6-class classifier (a sketch follows below).
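A minimal sketch of the 2 x 6-class readout; that the first six detector outputs score the columns and the last six score the rows is an assumption taken from the slide layout:

import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def column_row_probabilities(f):
    # f has shape (12,): two independent 6-class multinomial models,
    # one over the columns and one over the rows.
    return softmax(f[:6]), softmax(f[6:])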

How we do this

The columns and rows flash in random order, e.g. 12 2 8 1 3 4 11 9 5 6 10 7 …

Per trial, the loss combines two multinomial likelihood functions, one over the six columns and one over the six rows:

L(W) = Σ_{i=1}^{n} ( -log P_W(col_i | X_i) - log P_W(row_i | X_i) )
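A sketch of this loss over n trials, assuming the per-trial column and row probabilities have already been computed (e.g., as in the softmax sketch above):

import numpy as np

def multinomial_loss(col_probs, row_probs, true_cols, true_rows):
    # Summed negative log-likelihood of the attended column and row:
    # L(W) = sum_i [ -log P_W(col_i | X_i) - log P_W(row_i | X_i) ].
    # col_probs/row_probs: (n, 6) arrays; true_cols/true_rows: int arrays.
    idx = np.arange(len(true_cols))
    return float(-np.log(col_probs[idx, true_cols]).sum()
                 - np.log(row_probs[idx, true_rows]).sum())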

Detector

fW(X) = <W, X>

where X is the (#samples x #channels) trial matrix and W is a (#samples x #channels) weight matrix.
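In code the detector is a single entrywise inner product; a minimal sketch:

import numpy as np

def detector(W, X):
    # f_W(X) = <W, X>: entrywise inner product of two
    # (#samples x #channels) matrices, i.e. trace(W.T @ X).
    return float(np.sum(W * X))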

L1-L2 regularization

[Figure: three 16 x 16 weight matrices W (#samples x #channels), one per induced sparsity pattern]

(1) Channel selection (linear sum of row norms)
(2) Time sample selection (linear sum of column norms)
(3) Component selection (linear sum of component norms)
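Hedged sketches of the three penalties. Whether channels index the rows or the columns of W follows the slide's labels, and reading "components" as singular components (so that (3) is the sum of singular values, i.e. the trace norm) is an assumption:

import numpy as np

def channel_reg(W):
    # (1) Linear sum of row norms: drives whole channels to zero.
    return np.linalg.norm(W, axis=1).sum()

def time_sample_reg(W):
    # (2) Linear sum of column norms: drives whole time samples to zero.
    return np.linalg.norm(W, axis=0).sum()

def component_reg(W):
    # (3) Linear sum of component norms; assuming singular components,
    # this is the sum of singular values (trace norm) of W.
    return np.linalg.svd(W, compute_uv=False).sum()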

The method

minimize_W  L(W) + λ Ω(W)

with the 2 x 6-class multinomial loss as L(W) and the L1-L2 regularization as Ω(W). This is a nonlinear convex optimization problem with a second-order cone constraint.
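The authors' second-order cone solver is not reproduced here; as a stand-in, a proximal (group soft-thresholding) step for the channel-selection penalty illustrates the kind of update such L1-L2 problems admit:

import numpy as np

def prox_row_l1l2(W, t):
    # Proximal operator of t * (sum of row norms): shrink each row of W
    # toward zero by t in the L2 norm, zeroing rows with small norms.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return scale * W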

Results - BCI competition III dataset II [Albany]
(1) Channel selection regularizer, λ = 5.46
(values in parentheses: Rakotomamonjy & Guigue)

            15 repetitions   5 repetitions
Subject A:  99% (97%)        72% (72%)
Subject B:  93% (96%)        80% (75%)

Results - BCI competition III dataset II [Albany]
(2) Time sample selection regularizer, λ = 5.46
(values in parentheses: Rakotomamonjy & Guigue)

            15 repetitions   5 repetitions
Subject A:  98% (97%)        70% (72%)
Subject B:  94% (96%)        81% (75%)

Results - BCI competition III dataset II [Albany]
(3) Component selection regularizer, λ = 100
(values in parentheses: Rakotomamonjy & Guigue)

            15 repetitions   5 repetitions
Subject A:  98% (97%)        70% (72%)
Subject B:  94% (96%)        82% (75%)

Filters

[Figure: learned filters W under each regularizer]

(1) Channel selection regularizer
(2) Time sample selection regularizer
(3) Component selection regularizer

Summary

• Unified feature extraction and classifier learning
  – L1-L2 regularization
• Use the decoding model to learn the classifier
  – 2 x 6-class multinomial model
• Solve the problem as convex regularized empirical risk minimization
  – Nonlinear second-order cone problem (an efficient subgradient-based optimization routine will be made available soon!)
