
Neural Network Control of an Inverted Pendulum on a Cart

VALERI MLADENOV1, GEORGI TSENOV1, LAMBROS EKONOMOU2, NICHOLAS HARKIOLAKIS3, PANAGIOTIS KARAMPELAS3

1Department of Theoretical Electrical Engineering, Technical University of Sofia
Sofia 1000, “Kliment Ohridski” blvd. 8; [email protected], [email protected]

2A.S.PE.T.E. - School of Pedagogical and Technological Education, Department of Electrical Engineering Educators
N. Heraklion, 141 21 Athens, [email protected]

3IT Faculty, Hellenic American University, 12 Kaplanon Str., 106 80 Athens, GREECE
[email protected]

Abstract: - The balancing of an inverted pendulum by moving a cart along a horizontal track is a classic problem in the area of control. This paper describes two neural network controllers that swing a pendulum attached to a cart from an initial downward position to the upright position and maintain that state. Both controllers are able to learn the demonstrated behavior, which is swinging up and balancing the inverted pendulum in the upright position starting from different initial angles.

Key-Words: - neural networks, inverted pendulum, nonlinear control

1 Introduction
Inverted pendulum control is an old and challenging problem which quite often serves as a test-bed for a broad range of engineering applications. It is a classic problem in dynamics and control theory and is widely used as a benchmark for testing control algorithms (PID controllers, neural networks, fuzzy control, genetic algorithms, etc.).

There are many methods in the literature (see [1], [2], [3] and the references therein) that are used to control an inverted pendulum on a cart, such as classical control and machine learning based techniques. Rocket guidance systems, robotics, and crane control are common areas where these methods are applicable. The largest implemented uses are on huge lifting cranes in shipyards. When moving shipping containers back and forth, the cranes move the box so that it never swings or sways. It always stays positioned under the operator, even when moving or stopping quickly.

The inverted pendulum system inherently has two equilibria, one of which is stable while the other is unstable. The stable equilibrium corresponds to a state in which the pendulum is pointing downwards. In the absence of any control force, the system will naturally return to this state. The stable equilibrium requires no control input to be achieved and, thus, is uninteresting from a control perspective. The unstable equilibrium corresponds to a state in which the pendulum points strictly upwards and, thus, requires a control force to maintain this position. The basic control objective of the inverted pendulum problem is to maintain the unstable equilibrium position when the pendulum initially starts in an upright position.

In this paper we utilize two neural network controllers, based on a Multilayer Feedforward Neural Network (MLFF NN) and a Radial Basis Function Network (RBFN), to solve the control problem. The objective is achieved by means of supervised learning. In practice, the teacher (or supervisor) would be a human performing the task. Since a human teacher is not available in simulation, a nonlinear controller has been selected as the teacher.

The paper is organized as follows. In the next section we give a brief description of the mathematical model of the system and the teacher controller. Then in Section 3 the neural network controllers are introduced. Simulation results are presented in Section 4 and the conclusions are given in the last section.

Proceedings of the 9th WSEAS International Conference on Robotics, Control and Manufacturing Technology

ISSN: 1790-5117 112 ISBN: 978-960-474-078-9

2 Mathematical model
The inverted pendulum system on a cart considered in this paper is shown in Figure 1. A cart equipped with a motor provides horizontal motion, while the cart position x and the joint angle θ can be measured via a quadrature encoder.

Figure 1 Inverted pendulum system

The pendulum's mass m is assumed to be uniformly distributed along its body, so its center of gravity is at the middle of its length, at $l = L/2$.

No damping or other kind of friction has been considered when deriving the model. The parameters of the model that are used in the simulations are given in Table 1.

Table 1 Table of parameters

| Parameter | Value |
|---|---|
| Mass of pendulum, m | 0.1 [kg] |
| Mass of cart, M | 1 [kg] |
| Length of bar, L | 1 [m] |
| Standard gravity, g | 9.81 [m/s^2] |
| Moment of inertia of bar w.r.t. its COG, I = mL^2/12 | 0.0083 [kg*m^2] |

The system is underactuated: it has two degrees of freedom, the pendulum angle and the cart position, but only the cart is actuated, by a horizontal force acting on it.

The equations of motion can be derived either by the Lagrangian method or by the free-body diagram method. The inverted pendulum system can be written in state space form by selecting the states and the input as $x_1 = \theta$, $x_2 = \dot{\theta}$, $x_3 = x$, $x_4 = \dot{x}$ and $u = F$. Thus the state space equations of the inverted pendulum are

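$$
\begin{aligned}
\dot{x}_1 &= x_2 \\
\dot{x}_2 &= \frac{(M+m)\,m g l \sin x_1 - m^2 l^2 x_2^2 \sin x_1 \cos x_1 - m l \cos x_1 \, u}{(I+ml^2)(M+m) - m^2 l^2 \cos^2 x_1} \\
\dot{x}_3 &= x_4 \\
\dot{x}_4 &= \frac{-m^2 l^2 g \sin x_1 \cos x_1 + (I+ml^2)\, m l\, x_2^2 \sin x_1 + (I+ml^2)\, u}{(I+ml^2)(M+m) - m^2 l^2 \cos^2 x_1}
\end{aligned}
\tag{1}
$$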
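As an illustration, the state space model (1) can be written as a short Python function. This is a minimal sketch and not the authors' Simulink model: the function name `pendulum_dynamics` is hypothetical, and the signs assume that the angle is measured from the upright position and is positive when the pendulum leans in the positive x direction.

```python
import math

# Parameter values from Table 1
m, M, L, g = 0.1, 1.0, 1.0, 9.81
l = L / 2.0            # distance from the pivot to the center of gravity
I = m * L**2 / 12.0    # moment of inertia of the bar about its center of gravity


def pendulum_dynamics(state, u):
    """Right-hand side of (1); state = [theta, theta_dot, x, x_dot].

    Assumed convention: theta = 0 is upright, theta = pi is pending.
    """
    x1, x2, x3, x4 = state
    s, c = math.sin(x1), math.cos(x1)
    den = (I + m * l**2) * (M + m) - (m * l * c) ** 2
    x2_dot = ((M + m) * m * g * l * s
              - (m * l) ** 2 * x2**2 * s * c
              - m * l * c * u) / den
    x4_dot = (-(m * l) ** 2 * g * s * c
              + (I + m * l**2) * m * l * x2**2 * s
              + (I + m * l**2) * u) / den
    return [x2, x2_dot, x4, x4_dot]
```

With zero input, both the upright state `[0, 0, 0, 0]` and the pending state `[math.pi, 0, 0, 0]` give (numerically) zero derivatives, as expected for equilibrium points.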

With the angle measured from the upright position, the equilibrium points of the inverted pendulum are the pending position (stable equilibrium point, i.e. $\mathbf{x}_{eq} = [\pi \;\; 0 \;\; x \;\; 0]^T$, where x is any point on the track where the cart moves) and the upright position (unstable equilibrium point, i.e. $\mathbf{x}_{eq} = [0 \;\; 0 \;\; x \;\; 0]^T$). The state space equations linearized around $\mathbf{x} = [\theta \;\; \dot{\theta} \;\; x \;\; \dot{x}]^T = [0 \;\; 0 \;\; 0 \;\; 0]^T$ are

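$$
\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ \dot{x}_4 \end{bmatrix}
=
\begin{bmatrix}
0 & 1 & 0 & 0 \\
\dfrac{(M+m)\,m g l}{(I+ml^2)(M+m) - m^2 l^2} & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
-\dfrac{m^2 l^2 g}{(I+ml^2)(M+m) - m^2 l^2} & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
+
\begin{bmatrix}
0 \\
-\dfrac{m l}{(I+ml^2)(M+m) - m^2 l^2} \\
0 \\
\dfrac{I+ml^2}{(I+ml^2)(M+m) - m^2 l^2}
\end{bmatrix} u
\tag{2}
$$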
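The open-loop behavior of the linearized model (2) can be checked numerically with a few lines of Python. This is only a sketch under the Table 1 parameter values; the variable names are illustrative, and the closed-form eigenvalues rely on the sparsity pattern of the system matrix, whose characteristic polynomial factors as $s^2 (s^2 - a_{21})$.

```python
import math

# Parameter values from Table 1
m, M, L, g = 0.1, 1.0, 1.0, 9.81
l = L / 2.0
I = m * L**2 / 12.0

# Common denominator of the linearized model (cos(theta) = 1 at the upright position)
D0 = (I + m * l**2) * (M + m) - (m * l) ** 2

# Nonzero, non-unit entries of the system and input matrices in (2)
a21 = (M + m) * m * g * l / D0
a41 = -(m * l) ** 2 * g / D0
b2 = -m * l / D0
b4 = (I + m * l**2) / D0

# det(sI - A) = s^2 * (s^2 - a21), so the eigenvalues are +/- sqrt(a21) and a double zero
eigenvalues = [-math.sqrt(a21), math.sqrt(a21), 0.0, 0.0]
print([round(ev, 4) for ev in eigenvalues])
```

The positive root is the one that makes the upright position unstable; the double zero comes from the cart position and velocity, which do not feed back into the pendulum dynamics in the linearized model.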

The eigenvalues of this linearized system with the parameter values given in Table 1 are $\lambda_1 = -3.9739$, $\lambda_2 = 3.9739$, $\lambda_3 = 0$, $\lambda_4 = 0$. Since one of the eigenvalues, $\lambda_2 = 3.9739$, is positive (i.e. in the right half plane), the upright equilibrium point is unstable.

The objective of the control problem is to swing up the pendulum from its pending position or from another state to the upright position and balance it there by using supervised learning. The control problem has two main parts: swinging up and balancing the inverted pendulum. Among classical techniques, energy based methods (i.e. regulating the mechanical energy of the pendulum) [1], [2] are common for swing-up control, and linear control methods (i.e. PD, PID, state feedback, LQR, etc.) are common for balancing control. Artificial neural networks, fuzzy logic algorithms and reinforcement learning [3], [4], [5] are widely used in machine learning based approaches for both the swing-up and the balancing control phases.

A simple controller that swings up the pendulum from its pending position can be developed by means of physical reasoning. A constant force is applied to the cart according to the direction of the angular velocity of the pendulum, so that the pendulum swings up to the vicinity of the upright position. The balancing controller is a stabilizing linear controller, which can be a PD or PID controller or can be designed by pole placement techniques (e.g. Ackermann) or by LQR. Here a PD controller is selected. The switching from the swing-up controller to the linear controller is done when the pendulum angle is in $[0^\circ, 20^\circ]$ or in $[340^\circ, 360^\circ]$. This controller can formally be presented as below

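$$
u = u(\theta, \dot{\theta}) =
\begin{cases}
K_1 \theta + K_2 \dot{\theta}, & \text{if } 0 \le \theta \le \pi/9 \\
K_1 (\theta - 2\pi) + K_2 \dot{\theta}, & \text{if } 17\pi/9 \le \theta \le 2\pi \\
F_0 \, \mathrm{sgn}(\dot{\theta}), & \text{else}
\end{cases}
\tag{3}
$$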

where $F_0 = 2$ N, $K_1 = 40$, $K_2 = 10$. The closed loop poles obtained with these gains are -10.6026, -4.0315, 0, and 0. The results of the simulations performed with this controller with the initial conditions $\mathbf{x}_0 = [\pi \;\; 0 \;\; 0 \;\; 0]^T$, without any kind of sensor noise or actuator disturbance, are given in Figure 2.
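The switching teacher controller and its closed loop poles can be sketched in Python as follows. This is an illustrative reading of the control law, not the authors' implementation: the function name `teacher_force`, the wrapping of the angle into [0, 2π) and the sign of the input coefficient `b2` (taken as negative, which is the sign for which positive PD gains stabilize the linearized model) are assumptions.

```python
import math

# Controller constants from the text
F0, K1, K2 = 2.0, 40.0, 10.0


def teacher_force(theta, theta_dot):
    """Force on the cart: PD near the upright position, constant-magnitude swing-up otherwise."""
    theta = theta % (2.0 * math.pi)           # assumed: angle wrapped into [0, 2*pi)
    if theta <= math.pi / 9.0:                # within 20 degrees on one side of upright
        return K1 * theta + K2 * theta_dot
    if theta >= 17.0 * math.pi / 9.0:         # within 20 degrees on the other side
        return K1 * (theta - 2.0 * math.pi) + K2 * theta_dot
    sign = (theta_dot > 0) - (theta_dot < 0)  # sgn(theta_dot), zero at rest
    return F0 * sign


# Closed loop poles of the pendulum part of the linearized model with the PD law
m, M, L, g = 0.1, 1.0, 1.0, 9.81
l = L / 2.0
I = m * L**2 / 12.0
D0 = (I + m * l**2) * (M + m) - (m * l) ** 2
a21 = (M + m) * m * g * l / D0
b2 = -m * l / D0

# theta_ddot = a21*theta + b2*(K1*theta + K2*theta_dot)
# characteristic polynomial: s^2 + p*s + q
p = -b2 * K2
q = -(a21 + b2 * K1)
root = math.sqrt(p * p - 4.0 * q)
poles = [(-p - root) / 2.0, (-p + root) / 2.0]
print([round(s, 4) for s in poles])
```

The two remaining closed loop poles stay at zero because the teacher uses no cart position or velocity feedback.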

Figure 2 Simulation results with controller for infinite track length for $\mathbf{x}_0 = [\pi \;\; 0 \;\; 0 \;\; 0]^T$ (panels, each plotted against time [sec] from 0 to 20: pendulum angle [rad], pendulum angular velocity [rad/s], cart position [m], cart velocity [m/s], force acting on the cart [N])

It can be observed from Figure 2 that the pendulum has been stabilized in the upright position, whereas the cart continues its travel with a constant velocity.

3 Neural network controllers
The swing-up controller is developed by means of physical reasoning: a constant force is applied according to the direction of the angular velocity of the pendulum. The teacher controller used here has only two inputs (pendulum angle and angular velocity), so it is not possible to keep the cart's travel limited, since no cart position (or velocity) feedback is used. However, with this choice of teacher controller it is easier to visualize, and thus to compare, the input/output mappings of the teacher and the neural network. The input-output map (also called policy) of this nonlinear controller is given in Figure 3.

Figure 3 Input-output map of the teacher controller

Here the horizontal axis represents the pendulum angle, the vertical axis represents the pendulum angular velocity, and the colors represent the actuator forces corresponding to each pair of pendulum angle and angular velocity. From the teacher, sensor values (i.e. pendulum angle and angular velocity) and actuator values (i.e. input force acting on the cart) can be recorded. The simulations are performed by using Matlab/Simulink and the Neural Network Toolbox. The design is separated into two phases: the teaching (demonstration) phase and the execution phase. In both phases the simulations are performed by using a fixed step solver (ode5, Dormand-Prince) with a fixed step size of 0.1 sec. The teaching phase for each demonstration is shown graphically in Figure 4.

Figure 4 Teaching phase for supervised learning

There is sensor noise present in the simulations, which is generated as normally distributed random numbers with a different initial seed for each demonstration. In addition, band-limited white noise is added at the output of the controller to represent the different choices that a human demonstrator can make in the teaching phase; this makes the demonstrations less consistent and therefore a more realistic representation of human behavior.

Figure 5 Execution phase for supervised learning (a) and the corresponding Simulink model (b)

When the teaching phase is finished, the neural network is generated with Matlab's gensim command so that it can be added to the model. After that, the execution phase shown in Figure 5 starts, with the nonlinear controller replaced by the neural network and the actuator disturbance removed.

During the teaching phase 5 examples (or demonstrations) are used. The variance of the normally distributed random numbers for sensor noise in each demonstration is selected as $\sigma^2 = (\pi/180)^2$ rad$^2$, which corresponds to a standard deviation of $1^\circ$. The actuator disturbance, representing the different actions of the demonstrator, has a noise power level of 0.1 N$^2$ and a sampling time of 1 sec, which is close to the free swinging period of the pendulum. This means that the demonstrator can make an inconsistent action every second with a standard deviation of $\sqrt{0.1} \approx 0.316$ N. Each demonstration starts from the pending initial position, $\mathbf{x}_0 = [\pi \;\; 0 \;\; 0 \;\; 0]$, and lasts 15 seconds. The pendulum angles for these 5 demonstrations are given in Figure 6.
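The noise levels quoted above can be reproduced with a short calculation. The sketch below also shows one possible way to draw a noisy angle reading with a separate seed per demonstration; the helper name `noisy_angle` and the use of Python's `random` module are assumptions made for illustration, not a description of the Simulink noise blocks.

```python
import math
import random

# Sensor noise: variance (pi/180)^2 rad^2 corresponds to a standard deviation of 1 degree
sensor_var = (math.pi / 180.0) ** 2
sensor_std_rad = math.sqrt(sensor_var)
sensor_std_deg = math.degrees(sensor_std_rad)

# Actuator disturbance: variance 0.1 N^2 gives a standard deviation of about 0.316 N
actuator_var = 0.1
actuator_std = math.sqrt(actuator_var)


def noisy_angle(theta, rng):
    """Return an angle reading corrupted by zero-mean Gaussian sensor noise."""
    return theta + rng.gauss(0.0, sensor_std_rad)


# One independent random generator (seed) per demonstration, 5 demonstrations in total
generators = [random.Random(seed) for seed in range(5)]
readings = [noisy_angle(math.pi, rng) for rng in generators]

print(round(sensor_std_deg, 3), round(actuator_std, 3))
```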

Figure 6 Pendulum angles obtained from demonstrations

It can be observed from Figure 6 that in all demonstrations the pendulum is stabilized in the upright position. The input-output (I/O) map of the teacher controller, with the demonstrated trajectories in the $(\theta, \dot{\theta})$ portion of the state space drawn on top of it, is given in Figure 7. This figure gives a rough idea of which parts of the input-output space were visited during the demonstrations.

4 Simulation results
In this section we present the results obtained by using the training data described in the previous section.

4.1 Training of the Neural Networks

Figure 7 $(\theta, \dot{\theta})$ trajectories on top of I/O map

4.1.1 Multilayer Feedforward Neural Network
The MLFF NN which was selected in order to approximate the behavior of the teacher (controller) has two inputs (pendulum angle and angular velocity) and a single hidden layer with 25 hidden neurons. The number of hidden layer neurons is selected by trial-and-error according to the success of the network in the testing phase. The activation function used in the hidden layer is tan-sigmoid, whereas a linear activation function is used in the output layer. The network has been trained using Levenberg-Marquardt backpropagation (i.e. the trainlm training algorithm from Matlab) with a constant learning rate of 0.001. The weights are updated for 500 epochs. The desired mse level is selected as the variance of the actuator disturbance, since the neural network would start fitting the noise below that value. The performance (mean square error) obtained with this network is given in Figure 8. It can be observed that the mse has converged but the desired mse level has not been reached, although the mse level at the 500th epoch (approx. 0.165) is close to the desired one (0.1).

The plant input data (i.e. the target for the neural network) and its approximation obtained by the neural network at the end of the training session for these examples are presented in Figure 9.
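The structure of this network (2 inputs, 25 tan-sigmoid hidden neurons, 1 linear output) can be illustrated with a plain Python forward pass. This is a sketch of the architecture only: the function name `mlff_forward` is hypothetical and the weights below are random placeholders, not the weights obtained by trainlm. The tan-sigmoid activation is mathematically the hyperbolic tangent.

```python
import math
import random


def mlff_forward(inputs, W1, b1, W2, b2):
    """Forward pass: tanh hidden layer followed by a linear output neuron."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(W1, b1)
    ]
    return sum(w * h for w, h in zip(W2, hidden)) + b2


n_in, n_hidden = 2, 25
rng = random.Random(0)  # placeholder weights, for illustration only
W1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
W2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
b2 = rng.uniform(-1.0, 1.0)

# Number of adjustable parameters: hidden weights + hidden biases + output weights + output bias
n_params = n_hidden * n_in + n_hidden + n_hidden + 1

# Network output (force on the cart) for the pending position at rest
force = mlff_forward([math.pi, 0.0], W1, b1, W2, b2)
print(n_params, force)
```

With 25 hidden neurons the network has 101 adjustable parameters, which gives a sense of the model size that is fitted to the five demonstrations.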

Figure 8 Training results with MLFF NN



The green plot is obtained by using Matlab's sim command in order to observe how well the neural network fits the desired output for the pendulum angles and angular velocities from the demonstrations.

Figure 9 Actuator forces from demonstrations and MLFF NN approximation

A closer look at the approximation of the 3rd demonstration in Figure 9 is given in Figure 10, from which it can be observed that the neural network has made a good approximation.

Figure 10 Actuator forces from 3rd demonstration and MLFF NN approximation

4.1.2 Radial Basis Function Network
The RBFN which was selected in order to approximate the behavior of the teacher (controller) has two inputs (pendulum angle and angular velocity). The centers of the gaussian functions are selected in a supervised manner. The training starts with no gaussian functions in the hidden layer; gaussian functions are then added incrementally such that the error between the target and the network's output becomes smaller. The spread (width of the gaussian functions) parameter is set to 1, by trial-and-error according to the success of the network in the testing phase. The number of gaussian functions used in the hidden layer is 103. The desired mse level is selected as the variance of the actuator disturbance, since the neural network would start fitting the noise below that value. The performance (mean square error) obtained with this network is given in Figure 11. It can be observed that the mse has converged but the desired mse level (0.1) has not been reached, although the mse level at the end of training (approx. 0.272) is close to the desired one. The training stopped automatically after the radial basis neurons started overlapping.

Figure 11 Training results with RBFN

The plant input data (i.e. the target for the neural network) and its approximation obtained by the neural network at the end of the training session for these examples are presented in Figure 12. The green plot is obtained by using Matlab's sim command in order to observe how well the neural network fits the desired output for the pendulum angles and angular velocities from the demonstrations.

Figure 12 Actuator forces from demonstrations and RBFN approximation

A closer look at the approximation of the 3rd demonstration in Figure 12 is given in Figure 13, from which it can be observed that the neural network has made a fair approximation.

4.2 Testing of the Neural Networks
The neural networks that are trained from the demonstration data are tested in two ways. First, a data set that was not encountered during the training phase is used.



Figure 13 Actuator forces from 3rd demonstration and RBFN approximation

This is performed by simulating the inverted pendulum with initial pendulum angles incremented by $\pi/180$ rad (i.e. $1^\circ$) over $\theta_0 \in [0, 2\pi]$.

The duration of the simulations in the execution phase is taken as 30 seconds. Secondly, the input-output maps of the neural networks are compared with the map of the teacher in order to determine how well each network has learned and generalized the control policy of the teacher. This also helps in assessing the extrapolation capabilities of the designed networks.
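The test grid described above can be written down explicitly. The short Python sketch below builds the set of initial angles and shows how a success rate over that grid would be computed; the helper name `success_rate` is illustrative.

```python
import math

# Initial pendulum angles from 0 to 2*pi in steps of pi/180 rad (1 degree), both ends included
step = math.pi / 180.0
initial_angles = [k * step for k in range(361)]


def success_rate(n_success, n_total):
    """Percentage of initial angles from which the pendulum was swung up and stabilized."""
    return 100.0 * n_success / n_total


print(len(initial_angles))
print(round(success_rate(360, len(initial_angles)), 1))
```

Because both end points of the interval are included, the grid contains 361 angles, so a single failed run already lowers the success rate to about 99.7 %.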

4.2.1 Multilayer Feedforward Neural Network
The pendulum angles for several initial pendulum angles from simulations in the execution phase are given in Figure 14. The MLFF NN managed to swing up and stabilize the pendulum for 361 out of 361 initial angles, which corresponds to a success rate of 100 %.

Figure 14 Several pendulum angle plots from execution phase (four panels of pendulum angle [rad] versus time [sec] from 0 to 30; the first panel is titled "Initial pendulum angle, θ0 = 0.2536")

Figure 15 Input-output map of MLFF NN

The input-output map of the neural controller is given in Figure 15. It can be observed from Figure 15 that the controller has learned most of the swinging region that was demonstrated to it, and that it has made generalization errors in some regions marked with circles. The linear control region on the right, and the angle at which to switch from swinging control to linear control, have been learned better than the ones on the left. This might be due to insufficient demonstrated data in those regions.

4.2.2 Radial Basis Function Network
The pendulum angles for several initial pendulum angles from simulations in the execution phase are given in Figure 16. The RBFN managed to swing up and stabilize the pendulum for 360 out of 361 initial angles, which corresponds to a success rate of 99.7 %.

The input-output map of the neural controller is given in Figure 17. The regions marked with circles at the corners of the plot have nearly zero output; this is most probably related to the fact that radial basis functions give nearly zero output outside their input range. The network has learned to separate the swing-up region into two, even though there are some generalization errors.
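The local nature of radial basis functions mentioned above can be illustrated with a small Python sketch. It assumes a generic Gaussian of the form exp(-d^2 / (2 * spread^2)), which may be scaled differently from the toolbox's own radial basis function; the function name `rbf_output` and the single center used in the example are hypothetical.

```python
import math


def rbf_output(x, centers, weights, spread=1.0, bias=0.0):
    """Output of a Gaussian RBF network: weighted sum of bumps centered at `centers`."""
    out = bias
    for center, weight in zip(centers, weights):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))  # squared distance to center
        out += weight * math.exp(-d2 / (2.0 * spread**2))
    return out


# One hypothetical center at (theta, theta_dot) = (pi, 0) with output weight 2 N
centers = [(math.pi, 0.0)]
weights = [2.0]

near = rbf_output((math.pi, 0.0), centers, weights)  # at the center: full weight
far = rbf_output((0.0, 10.0), centers, weights)      # far from the center: almost zero
print(near, far)
```

An input far from every center produces an output close to the bias alone, which matches the nearly zero output seen in the corner regions of the RBFN map, where no demonstration data were available.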

Figure 16 Several pendulum angle plots from execution phase (four panels of pendulum angle [rad] versus time [sec] from 0 to 20)

Figure 17 Input-output map of RBFN

4.3 Comparison of the Results
The first thing that can be noticed from the results is that the interpolation capabilities of the networks are quite close to each other. This can be observed from the fact that both networks can swing up and balance the inverted pendulum starting from almost any initial pendulum angle $\theta_0 \in [0, 2\pi]$ (361 out of 361 tested angles for the MLFF NN and 360 out of 361 for the RBFN). It can further be examined from the input-output maps, since the parts of the input-output space related to the demonstrated data were learned quite well by both networks. The lowest training error was obtained with the MLFF NN (approx. 0.165, compared with approx. 0.272 for the RBFN). The extrapolation capabilities of the MLFF NN are better than those of the RBFN. The RBFN also contains more hidden neurons (103 gaussian functions) than the MLFF NN (25 hidden neurons). A better generalization is also possible for the MLFF NN by using more, and more varied, training data.

5 Conclusion
The supervised learning method for the neural network controllers consists of two phases, demonstration and execution. In the demonstration phase the necessary data is collected from the supervisor in order to design a neural network. The objective of the neural network is to imitate the control law of the teacher. In the execution phase the neural controller is tested by replacing the teacher with it. The neural network control strategy is tested by using a supervisor with a control policy that ignores the finite track length of the cart. In this way, it is easier to understand what the neural controller that imitates the teacher does, by comparing the input/output (state-action) mappings using 2D colour plots. The policy of a demonstrator (i.e. the designed controller in this case) is not a function that can be directly measured by giving inputs and measuring outputs, so learning the complete I/O map and making a perfect generalization would not be possible. The complexity of the function to be approximated is also important, since the function may include both continuous and discontinuous components (such as switches between different control laws). Both of the designed neural network controllers (MLFF NN and RBFN) are able to learn the demonstrated behaviour, which is swinging up and balancing the inverted pendulum in the upright position starting from different initial angles. The MLFF NN did a better job than the RBFN in terms of generalization and obtained the lower training error.

References:
[1] M. Bugeja, “Nonlinear Swing-Up and Stabilizing Control of an Inverted Pendulum System”, Proc. of EUROCON 2003 Computer as a Tool, Ljubljana, Slovenia, vol. 2, pp. 437-441, 2003.

[2] K. Yoshida, “Swing-up control of an inverted pendulum by energy based methods”, Proceedings of the American Control Conference, pp. 4045-4047, 1999.

[3] Darío Maravall, Changjiu Zhou, Javier Alonso, “Hybrid Fuzzy Control of the Inverted Pendulum via Vertical Forces”, International Journal of Intelligent Systems, vol. 20, pp. 195–211, 2005.

[4] G. Neumann, The Reinforcement Learning Toolbox, Reinforcement Learning for Optimal Control Tasks, May 2005. http://www.igi.tugraz.at/ril-toolbox/thesis/DiplomArbeit.pdf

[5] H. Miyamoto, J. Morimoto, K. Doya, M. Kawato, “Reinforcement learning with via-point representation”, Neural Networks, vol. 17, pp. 299–305, 2004.

