Neural Networks for Optimization

Bill Wolfe
California State University Channel Islands
Neural Models

• Simple processing units
• Lots of them
• Highly interconnected
• Exchange excitatory and inhibitory signals
• Variety of connection architectures/strengths
• “Learning”: changes in connection strengths
• “Knowledge”: connection architecture
• No central processor: distributed processing
Simple Neural Model

• ai: Activation
• ei: External input
• wij: Connection strength

Assume wij = wji (“symmetric” network), so W = (wij) is a symmetric matrix.

[Figure: two units ai and aj with external inputs ei and ej, linked by connection wij]
Net Input

neti = ∑j wij aj + ei

Vector format: net = Wa + e
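The net input above can be sketched in NumPy (a minimal illustration; the 3-unit weight matrix and inputs are made-up values, not from the slides):

```python
import numpy as np

# Hypothetical 3-unit network: symmetric weights, external inputs
W = np.array([[0.0, -1.0, 0.5],
              [-1.0, 0.0, 0.5],
              [0.5, 0.5, 0.0]])   # wij = wji
a = np.array([0.2, 0.8, 0.5])    # activations
e = np.array([0.1, 0.1, 0.1])    # external inputs

# Per-unit form: net_i = sum_j w_ij a_j + e_i
net_loop = np.array([sum(W[i, j] * a[j] for j in range(3)) + e[i]
                     for i in range(3)])

# Vector form: net = W a + e
net_vec = W @ a + e

print(np.allclose(net_loop, net_vec))  # True: the two forms agree
```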
Dynamics

• Basic idea: increase ai when neti > 0, decrease ai when neti < 0.

dai/dt = neti
Energy

E = -1/2 aᵀWa - eᵀa
∇E = (∂E/∂a1, ..., ∂E/∂an)
   = -(∑j w1j aj + e1, ..., ∑j wnj aj + en)
   = -(net1, ..., netn)
   = -net
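The identity ∇E = -net can be checked numerically; a minimal sketch with made-up values (since E is quadratic, a central finite difference recovers the gradient essentially exactly):

```python
import numpy as np

def energy(W, a, e):
    # E = -1/2 a^T W a - e^T a
    return -0.5 * a @ W @ a - e @ a

W = np.array([[0.0, -1.0], [-1.0, 0.0]])  # symmetric (wij = wji)
a = np.array([0.3, 0.6])
e = np.array([0.5, 0.5])

net = W @ a + e

# Central finite-difference gradient of E: should equal -net for symmetric W
h = 1e-6
grad = np.array([(energy(W, a + h * np.eye(2)[i], e) -
                  energy(W, a - h * np.eye(2)[i], e)) / (2 * h)
                 for i in range(2)])

print(np.allclose(grad, -net))  # True: grad E = -net
```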
Lower Energy

• da/dt = net = -grad(E) seeks lower energy

[Figure: energy curve with net pointing downhill in energy]
Problem: Divergence

When the energy is unbounded below, following da/dt = net can drive the activations off to infinity.

[Figure: diverging trajectory on the energy surface]
A Fix: Saturation

dai/dt = neti ai (1 - ai),  0 ≤ ai ≤ 1

The neti factor is corner-seeking and lowers energy; the ai(1 - ai) factor saturates the activations.
Keeps the activation vector inside the hypercube boundaries

dai/dt = neti ai (1 - ai),  0 ≤ ai ≤ 1

The ai(1 - ai) factor vanishes at 0 and 1, so trajectories stay inside the hypercube; the corner-seeking, energy-lowering neti factor encourages convergence to corners.
Summary: The Neural Model

dai/dt = neti ai (1 - ai)
neti = ∑j wij aj + ei
0 ≤ ai ≤ 1

ai: Activation
ei: External input
wij: Connection strength
W = (wij), wij = wji (symmetric)
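The full model can be simulated with simple Euler integration; a sketch on a made-up 4-unit mutually inhibitory network (the weights, inputs, and step size are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small symmetric network: mutual inhibition, wii = 0
n = 4
W = -(np.ones((n, n)) - np.eye(n))
e = np.full(n, 0.5)
a = rng.uniform(0.1, 0.9, n)         # start inside the hypercube

def energy(a):
    return -0.5 * a @ W @ a - e @ a

dt = 0.01
E_prev = energy(a)
for _ in range(5000):
    net = W @ a + e
    a = a + dt * net * a * (1 - a)   # da_i/dt = net_i a_i (1 - a_i)

print(np.all((a >= 0) & (a <= 1)))   # True: stays in the hypercube
print(energy(a) <= E_prev)           # True: energy decreased
```

The a(1 - a) factor shrinks the update near the faces of the hypercube, which is what keeps the Euler steps from escaping [0, 1].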
Example: Inhibitory Networks

• Completely inhibitory: wij = -1 for all i ≠ j (k-winner networks)
• Inhibitory grid: neighborhood inhibition
Traveling Salesman Problem

• Classic combinatorial optimization problem
• Find the shortest “tour” through n cities
• n!/(2n) distinct tours

[Figure: two tours over cities A-E: ABCED and ABECD]
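The tour count n!/(2n) follows from dividing out the choice of starting city (n) and direction of travel (2); a quick check:

```python
from math import factorial

def distinct_tours(n):
    # n! orderings, divided by n starting cities and 2 directions
    return factorial(n) // (2 * n)

print(distinct_tours(5))   # 12
print(distinct_tours(10))  # 181440
```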
TSP solution for 15,000 cities in Germany
TSP: 50-City Example
Random
Nearest-City
2-OPT
An Effective Heuristic for the Traveling Salesman Problem
S. Lin and B. W. Kernighan, Operations Research, 1973
http://www.jstor.org/view/0030364x/ap010105/01a00060/0
Centroid
Monotonic
Neural Network Approach

One neuron per (city, time stop) pair: rows are cities (A-D), columns are time stops (1-4).

[Figure: 4 x 4 grid of neurons, cities vs. time stops]
Tours – Permutation Matrices

[Figure: 4 x 4 grid with exactly one active neuron per row and per column, encoding the tour CDBA]

Permutation matrices correspond to the “feasible” states.
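A feasibility check is just a permutation-matrix test; a minimal sketch (the helper name and example encoding are illustrative):

```python
import numpy as np

def is_feasible(state):
    """A state is feasible iff it is a permutation matrix:
    0/1 entries with exactly one 1 in every row and every column."""
    s = np.asarray(state)
    return (s.shape[0] == s.shape[1]
            and bool(np.all((s == 0) | (s == 1)))
            and bool(np.all(s.sum(axis=0) == 1))
            and bool(np.all(s.sum(axis=1) == 1)))

# Tour CDBA: rows = cities A..D, columns = time stops 1..4
tour_CDBA = [[0, 0, 0, 1],   # A visited at stop 4
             [0, 0, 1, 0],   # B at stop 3
             [1, 0, 0, 0],   # C at stop 1
             [0, 1, 0, 0]]   # D at stop 2
print(is_feasible(tour_CDBA))          # True
print(is_feasible([[1, 1], [0, 0]]))   # False: a doubled row
```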
Not Allowed

[Figure: grid state with more than one active neuron in a row or column, i.e. not a permutation matrix]
Only one city per time stop; only one time stop per city.

Inhibitory rows and columns enforce these constraints.
Distance Connections

Inhibit the neighboring cities in proportion to their distances.

[Figure: city C at one time stop inhibits cities A, B, D at the adjacent stops with strengths -dAC, -dBC, -dDC]
Putting it all together:

[Figure: the complete network, combining row/column inhibition with the distance connections -dAC, -dBC, -dDC]
Research Questions

• Which architecture is best?
• Does the network produce feasible solutions? High-quality solutions? Optimal solutions?
• How do the initial activations affect network performance?
• Is the network similar to “nearest city” or any other traditional heuristic?
• How does the particular city configuration affect network performance?
• Is there a better way to understand the nonlinear dynamics?
[Figure: 7 x 7 activation grid, cities A-G vs. time stops 1-7: a typical state of the network before convergence]
“Fuzzy Readout”

[Figure: 7 x 7 activation grid, cities A-G vs. time stops 1-7; reading out the strongest city at each time stop gives the tour GAECBFD]
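The fuzzy readout amounts to an argmax down each column of the activation grid; a sketch on a random (not-yet-converged) grid with made-up activations:

```python
import numpy as np

def fuzzy_readout(act, cities):
    # At each time stop (column), read out the city (row)
    # with the strongest activation
    order = np.argmax(act, axis=0)
    return "".join(cities[i] for i in order)

cities = "ABCDEFG"
rng = np.random.default_rng(1)
act = rng.random((7, 7))        # rows = cities, columns = time stops
tour = fuzzy_readout(act, cities)
print(len(tour))                # 7: one city per time stop
```

Before convergence the readout can repeat a city; only at a permutation-matrix corner is the readout guaranteed to be a valid tour.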
Neural Activations / Fuzzy Tour: Initial Phase
Neural Activations / Fuzzy Tour: Monotonic Phase
Neural Activations / Fuzzy Tour: Nearest-City Phase
Fuzzy Tour Lengths

[Figure: fuzzy tour length vs. iteration (0-1000), passing through a centroid phase, a monotonic phase, and a nearest-city phase. Reference tour lengths: monotonic 19.04, centroid 9.76, nearest-city worst 9.13, nearest-city best 7.66, 2-OPT 6.94]
Average Results for n = 10 to n = 70 Cities

[Figure: average tour length vs. number of cities (50 random runs per n), comparing centroid, nearest-city worst, nearest-city best, the neural network, and 2-OPT]
DEMO 2

Applet by Darrell Long
http://hawk.cs.csuci.edu/william.wolfe/TSP001/TSP1.html
Conclusions

• Neurons stimulate intriguing computational models.
• The models are complex, nonlinear, and difficult to analyze.
• The interaction of many simple processing units is difficult to visualize.
• The neural model for the TSP mimics some of the properties of the nearest-city heuristic.
• Much work remains to be done to understand these models.
EXTRA SLIDES
E = -1/2 { ∑i ∑x ∑j ∑y aix ajy wixjy }
  = -1/2 { ∑i ∑x ∑y (-d(x,y)) aix (ai+1,y + ai-1,y)
         + ∑i ∑x ∑j (-1/n) aix ajx
         + ∑i ∑x ∑y (-1/n) aix aiy
         + ∑i ∑x ∑j ∑y (1/n²) aix ajy }
wix,jy =
  1/n² - 1/n      y = x, j ≠ i                 (row)
  1/n² - 1/n      y ≠ x, j = i                 (column)
  1/n² - 2/n      y = x, j = i                 (self)
  1/n² - d(x,y)   y ≠ x, j = i+1 or j = i-1    (distance)
  1/n²            j ≠ i-1, i, i+1, and y ≠ x   (global)
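The case table above can be sketched as a weight-matrix builder. This is a hedged reconstruction: the function name is made up, and treating the adjacent stops j = i ± 1 as wrapping around modulo n (a cyclic tour) is an assumption not stated on the slide:

```python
import numpy as np

def tsp_weights(d):
    """Weight w[i, x, j, y] between neuron (stop i, city x) and
    neuron (stop j, city y). d is the n x n city-distance matrix."""
    n = len(d)
    w = np.full((n, n, n, n), 1.0 / n**2)          # global term everywhere
    for i in range(n):
        for x in range(n):
            for j in range(n):
                for y in range(n):
                    if y == x and j != i:
                        w[i, x, j, y] -= 1.0 / n   # row inhibition
                    elif y != x and j == i:
                        w[i, x, j, y] -= 1.0 / n   # column inhibition
                    elif y == x and j == i:
                        w[i, x, j, y] -= 2.0 / n   # self term
                    elif j in ((i + 1) % n, (i - 1) % n):
                        w[i, x, j, y] -= d[x][y]   # distance (adjacent stops)
    return w

d = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])
w = tsp_weights(d)
print(np.allclose(w, np.transpose(w, (2, 3, 0, 1))))  # True: symmetric network
```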
Brain

• Approximately 10^10 neurons
• Neurons are relatively simple
• Approximately 10^4 fan-out
• No central processor
• Neurons communicate via excitatory and inhibitory signals
• Learning is associated with modifications of connection strengths between neurons
[Repeated figure: fuzzy tour length vs. iteration]
[Repeated figure: average tour length vs. number of cities, n = 10 to 70, 50 random runs per n]
Two-neuron inhibitory network:

W = [  0  -1
      -1   0 ]

Eigenvectors: v1 = (1, 1) with eigenvalue -1; v2 = (1, -1) with eigenvalue +1.

[Figure: (a1, a2) phase plane with corners at 0 and 1]

Four-neuron version:

W = [  0  -1  -1  -1
      -1   0  -1  -1
      -1  -1   0  -1
      -1  -1  -1   0 ]
[Figure: (a1, a2) phase plane with external input e = 1/2]
Perfect k-winner performance: e = k - 1/2

[Figure: completely inhibitory network with uniform external input e applied to every unit]
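The k-winner setting can be exercised with the saturating dynamics from earlier slides; a sketch on a made-up 8-unit completely inhibitory network (network size, initial activations, step size, and run length are all illustrative choices):

```python
import numpy as np

n, k = 8, 3
W = -(np.ones((n, n)) - np.eye(n))   # completely inhibitory: wij = -1, wii = 0
e = np.full(n, k - 0.5)              # uniform external input e = k - 1/2

a = np.linspace(0.2, 0.8, n)         # distinct initial activations
dt = 0.01
for _ in range(20000):
    net = W @ a + e
    a = a + dt * net * a * (1 - a)   # da_i/dt = net_i a_i (1 - a_i)

winners = int(np.sum(a > 0.9))
print(winners)                       # the slide's claim: exactly k winners
```

Only k-winner corners are stable here: at such a corner a winner sees net = +1/2 and a loser net = -1/2, so both are pushed back toward their corner values.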
[Figure: initial and final activations for e = 1/2 (k = 1) and e = 1 + 1/2 (k = 2)]