
1

A Statistical Mechanical Analysis of Online Learning:

Seiji MIYOSHI, Kobe City College of Technology

miyoshi@kobe-kosen.ac.jp

2

Background (1)

• Batch Learning
  – Examples are used repeatedly
  – Can give correct answers for all the examples
  – Takes a long time
  – Requires a large memory

• Online Learning
  – Each example is used once and then discarded
  – Cannot give correct answers for all the examples
  – A large memory is not necessary
  – Can follow a time-variant teacher

3

A Statistical Mechanical Analysis of Online Learning:

Can the Student Be More Clever than the Teacher?

Seiji MIYOSHI, Kobe City College of Technology

miyoshi@kobe-kosen.ac.jp

Jan. 2006

4

[Figure: true teacher A, moving teacher B, and student J]

5

A Statistical Mechanical Analysis of Online Learning:

Seiji MIYOSHI, Kobe City College of Technology

miyoshi@kobe-kosen.ac.jp

Many Teachers or Few Teachers?

6

[Figure: true teacher A, ensemble teachers B_1, ..., B_k, B_k', ..., B_K, and student J]

7

PURPOSE

To analyze the generalization performance of a model composed of a student, a true teacher, and K teachers (ensemble teachers) who exist around the true teacher.

To discuss the relationship between the number and the diversity of the ensemble teachers and the generalization error.

[Figure: true teacher A, ensemble teachers B_1, ..., B_K, and student J]

8

MODEL (1/4)

• J learns B_1, B_2, ... in turn.
• J cannot learn A directly.
• A, B_1, B_2, ..., and J are linear perceptrons with noise.

[Figure: true teacher A, ensemble teachers B_1, ..., B_K, and student J]

9

Simple Perceptron

[Figure: inputs x_1, ..., x_N, connection weights J_1, ..., J_N, output]

Output = sgn( \sum_{i=1}^{N} J_i x_i ) ∈ {+1, -1}

10

[Figure: inputs x_1, ..., x_N, connection weights J_1, ..., J_N, output]

Simple Perceptron:  Output = sgn( \sum_{i=1}^{N} J_i x_i )

Linear Perceptron:  Output = \sum_{i=1}^{N} J_i x_i
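As a concrete illustration of the two output rules above, here is a minimal sketch; the vector size and random values are only illustrative, not taken from the slides.

```python
import numpy as np

# Illustrative sizes and values; the slides specify only the two formulas.
N = 1000
rng = np.random.default_rng(0)
J = rng.normal(size=N)      # connection weights J_1, ..., J_N
x = rng.normal(size=N)      # inputs x_1, ..., x_N

u = J @ x                   # inner potential  sum_i J_i x_i

simple_output = np.sign(u)  # simple perceptron: +1 or -1
linear_output = u           # linear perceptron: real-valued output
print(simple_output, linear_output)
```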

11

MODEL (2/4)

Linear Perceptrons with Noise

[Figure: true teacher A, ensemble teachers B_1, ..., B_K, and student J, each a linear perceptron with output noise]
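The output equations did not survive the transcript. A hedged reconstruction, assuming additive zero-mean Gaussian noise on each output with the variances σ_A², σ_B², σ_J² that appear in the later figures, is:

```latex
y_A = \mathbf{A} \cdot \mathbf{x} + n_A, \quad
y_{B_k} = \mathbf{B}_k \cdot \mathbf{x} + n_{B_k}, \quad
y_J = \mathbf{J} \cdot \mathbf{x} + n_J,
\qquad
n_A \sim \mathcal{N}(0, \sigma_A^2), \;
n_{B_k} \sim \mathcal{N}(0, \sigma_B^2), \;
n_J \sim \mathcal{N}(0, \sigma_J^2).
```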

12

MODEL (3/4)

• Inputs:
• Initial value of student:
• True teacher:
• Ensemble teachers:
• N → ∞ (thermodynamic limit)
• Order parameters
  – Length of student
  – Direction cosines
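The formulas behind these bullets are not reproduced in the transcript. A hedged reconstruction of the order parameters, assuming the usual conventions for this kind of model (the length of the student measured relative to √N, and direction cosines as normalized inner products), is:

```latex
\ell = \frac{\lVert \mathbf{J} \rVert}{\sqrt{N}}, \qquad
R_J = \frac{\mathbf{A}\cdot\mathbf{J}}{\lVert\mathbf{A}\rVert\,\lVert\mathbf{J}\rVert}, \qquad
R_{B_k} = \frac{\mathbf{A}\cdot\mathbf{B}_k}{\lVert\mathbf{A}\rVert\,\lVert\mathbf{B}_k\rVert}, \qquad
q_{kk'} = \frac{\mathbf{B}_k\cdot\mathbf{B}_{k'}}{\lVert\mathbf{B}_k\rVert\,\lVert\mathbf{B}_{k'}\rVert}, \qquad
R_{B_k J} = \frac{\mathbf{B}_k\cdot\mathbf{J}}{\lVert\mathbf{B}_k\rVert\,\lVert\mathbf{J}\rVert}.
```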

13

[Figure: order parameters as direction cosines among the vectors — R_J between A and J, R_{B_k} between A and B_k, R_{B_k J} between B_k and J, and q_{kk'} between B_k and B_{k'}; labels for true teacher A, ensemble teachers B_1, ..., B_K, and student J]

14

MODEL (4/4)

• The student learns the K ensemble teachers in turn.
• Gradient method applied to the squared errors (update amount f_k^m).
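The slide names the learning rule without showing it. Under the noisy-output convention sketched above, the gradient step on the squared error would read as follows; this is my reconstruction, not an equation taken from the slides:

```latex
\mathbf{J}^{m+1} = \mathbf{J}^{m} + \eta \left( y_{B_k} - y_J \right) \mathbf{x}^{m},
\qquad k = (m \bmod K) + 1,
```

where η is the learning rate and the ensemble teachers are presented in turn.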

15

GENERALIZATION ERROR

• A goal of statistical learning theory is to obtain the generalization error theoretically.
• Generalization error = mean of the error over the distribution of new inputs.
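Written out under the same assumptions as above, the definition on this slide amounts to averaging the squared output difference between the true teacher and the student over a fresh input and the noises; the 1/2 factor is my convention, not taken from the slides:

```latex
\epsilon_g = \left\langle \frac{1}{2} \left( y_A - y_J \right)^2 \right\rangle_{\mathbf{x},\, n_A,\, n_J}
```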

16

Simultaneous differential equations in deterministic forms, which describe the dynamical behaviors of the order parameters
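The equations themselves are not reproduced in this transcript. As a stand-in, the order-parameter trajectories they describe can be checked by simulating the model directly at finite N. The sketch below is my own reconstruction under the assumptions made earlier (teacher and student vectors of length about √N, input components of variance 1/N, additive Gaussian output noise, teachers presented in turn); the construction enforcing the direction cosines R_B and q among the teachers is likewise an assumption.

```python
import numpy as np

# Monte Carlo sketch of the model at finite N, standing in for the
# deterministic order-parameter equations that the transcript does not
# reproduce.  Normalizations (|A| = |B_k| = sqrt(N), input components of
# variance 1/N) and the teacher construction are assumptions.
N, K = 1000, 3
eta = 0.3                                      # learning rate (slide 19)
R_B, q = 0.7, 0.8                              # direction cosines A-B_k and B_k-B_k'
sigma_B2, sigma_J2 = 0.1, 0.2                  # output-noise variances (slide 19)
rng = np.random.default_rng(0)

def unit(v):
    return v / np.linalg.norm(v)

def unit_orthogonal_to(orthonormal_basis):
    """Random unit vector orthogonal to an orthonormal set of vectors."""
    v = rng.normal(size=N)
    for u in orthonormal_basis:
        v -= (v @ u) * u
    return unit(v)

# True teacher A and ensemble teachers B_k with
# A . B_k / N = R_B  and  B_k . B_k' / N = q  (requires q >= R_B**2).
a = unit(rng.normal(size=N))
c = unit_orthogonal_to([a])
A = np.sqrt(N) * a
B, ds = [], []
for _ in range(K):
    d = unit_orthogonal_to([a, c] + ds)
    ds.append(d)
    B.append(np.sqrt(N) * (R_B * a + np.sqrt(q - R_B**2) * c + np.sqrt(1 - q) * d))

J = rng.normal(size=N)                         # initial student (length l ~ 1)

for m in range(20 * N):                        # continuous time t = m / N, up to 20
    x = rng.normal(size=N) / np.sqrt(N)        # input with component variance 1/N
    k = m % K                                  # ensemble teachers learned in turn
    y_B = B[k] @ x + rng.normal(0.0, np.sqrt(sigma_B2))
    y_J = J @ x + rng.normal(0.0, np.sqrt(sigma_J2))
    J = J + eta * (y_B - y_J) * x              # gradient step on (y_B - y_J)^2 / 2

print("R_J =", (A @ J) / (np.linalg.norm(A) * np.linalg.norm(J)))
print("l   =", np.linalg.norm(J) / np.sqrt(N))
```

Run with the slide-19 parameters, this should trace l and R_J on the same t = m/N time scale as the deterministic equations, up to finite-size fluctuations.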

17

Analytical solutions of the order parameters

18

GENERALIZATION ERROR

• A goal of statistical learning theory is to obtain the generalization error theoretically.
• Generalization error = mean of the error over the distribution of new inputs.

19

Dynamical behaviors of the generalization error, R_J and l
(η = 0.3, K = 3, R_B = 0.7, σ_A² = 0.0, σ_B² = 0.1, σ_J² = 0.2)

[Figure: two panels versus t = m/N (0 to 20) for q = 1.00, 0.80, 0.60, 0.49 — the order parameters l and R, and the generalization error of the student and of the ensemble teachers]

20

Analytical solutions of the order parameters

21

Steady state analysis ( t → ∞ )

• If η < 0 or η > 2: the generalization error and the length of the student diverge.

• If 0 < η < 2:
  – If η < 1, the more teachers there are, or the richer their diversity is, the cleverer the student can become.
  – If η > 1, the fewer teachers there are, or the poorer their diversity is, the cleverer the student can become.

22

Steady value of the generalization error, R_J and l
(K = 3, R_B = 0.7, σ_A² = 0.0, σ_B² = 0.1, σ_J² = 0.2)

[Figure: steady-state R and generalization error (log scale, 0.1 to 10) versus learning rate η (0 to 2) for q = 1.00, 0.80, 0.60, 0.49]

23

Steady value of the generalization error, R_J and l
(q = 0.49, R_B = 0.7, σ_A² = 0.0, σ_B² = 0.1, σ_J² = 0.2)

[Figure: steady-state R and generalization error (log scale, 0.1 to 10) versus learning rate η (0 to 2) for K = 1, 3, 10, 30]

24

CONCLUSIONS

We have analyzed the generalization performance of a student in a model composed of linear perceptrons: a true teacher, K teachers, and the student.

Calculating the generalization error of the student analytically using statistical mechanics in the framework of on-line learning, we have proven that when the learning rate satisfies η<1, the larger the number K is and the more diversity the teachers have, the smaller the generalization error is. On the other hand, when η>1, the properties are completely reversed.

If the diversity of the K teachers is rich enough, the direction cosine between the true teacher and the student becomes unity in the limit of η→0 and K→∞.
