
  • IEEE Power Engineering Society

    A Tutorial Course on Artificial Neural Networks with Applications to Power Systems

    Edited by Mohamed El-Sharkawi and Dagmar Niebur

    96 TP 112-0

  • Application of Artificial Neural Networks to Power Systems

    Sponsored by: Intelligent Systems Applications Working Group, Computer and Analytical Methods Subcommittee,

    Power System Engineering Committee

    Edited by M. A. El-Sharkawi and Dagmar Niebur

  • Abstracting is permitted with credit to the source. For copying, reprint, or republication permission, write to IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. All rights reserved. Copyright 1996 by The Institute of Electrical and Electronics Engineers, Inc.

    IEEE Catalog Number: 96 TP 112-0

    Additional copies can be ordered from

    IEEE Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331

    1-800-678-IEEE / 1-908-981-1393 / 1-908-981-1721 (Fax)

    Cover page design by Sibylle Hagmann, Pasadena, CA; detail of a photo of "The Thinker" by Auguste Rodin, 1880, reproduced by permission of the Norton Simon Art Foundation, Pasadena, CA.


  • From the Course Editors

    In recent years new technologies have been introduced to help the engineer analyze and design large-scale complex systems. While no single technique has emerged as widely applicable to the engineering of such systems, the new tools are recognized as part of the necessary arsenal of the modern engineer. One of these techniques is artificial neural network technology, whose primary advantages are in the areas of learning algorithms; on-line adaptation of dynamic systems; quick parallel computation; and intelligent interpolation of data.

    The purpose of this course is to provide an introduction to artificial neural network (ANN) technology for power system engineers. The tutorial is composed of two parts: The first part gives an overview of ANNs, including network architectures, principles of operation, learning rules, advantages and limitations. The objective is to give the readers a working knowledge of ANN including examples.

    The second part of the tutorial deals with specific applications of ANN to power system problems, such as load forecasting, security assessment, planning, fault diagnosis and control.

    Artificial neural networks represent a growing new technology as indicated by the wide variety of the proposed applications (e.g. remote sensing, control, forecasting, pattern recognition) and by the development of ANN integrated circuits and hardware modules. The main reasons for this growing activity are the ability of ANNs to learn complex nonlinear relations, and their modular structure which allows parallel processing. Neural networks have been shown to be useful in solving algorithmic type problems and, more importantly, to tackle problems for which algorithms are not available but significant data is available.

    The tutorial will emphasize practical aspects of ANN design, namely, the selection of the ANN architecture for a particular application; the ANN training requirements (in terms of the selection of a training set and of a learning scheme); the setting up of the input data (i.e. scaling), and performance evaluation. This goal will be accomplished by presenting and analyzing several case studies.

    The coordinators of this tutorial would like to thank Professor Robert Fischl for his involvement during the early stages of the tutorial planning.

    Mohamed A. El-Sharkawi and Dagmar Niebur


  • For further information, please contact

    Mohamed A. El-Sharkawi

    Department of Electrical Engineering
    Box 352500
    University of Washington
    Seattle, WA 98195-2500
    Phone: (206) 685-2286
    Fax: (206) 543-3842
    e-mail: [email protected]

    Dagmar Niebur

    Jet Propulsion Lab
    M.S. 303-310
    California Inst. of Technology
    4800 Oak Grove Dr.
    Pasadena, CA 91109-8090
    Phone: (818) 354-1739
    Fax: (818) 393-5013
    e-mail: [email protected]


  • List of Contributors

    o Historical Perspective on Neural-Net Computing

    By Yoh-Han Pao, Case Western Reserve University

    o Introduction to Concepts in Artificial Neural Networks
    By Dagmar Niebur, Jet Propulsion Laboratory

    o Artificial Neural Networks: Supervised Models
    By Robert J. Marks, University of Washington

    o An Example of Unsupervised Networks: Kohonen's Self-Organizing Feature Map
    By Dagmar Niebur, Jet Propulsion Laboratory

    o Neural Network and its Ancillary Techniques as Applied to Power Systems
    By M. A. El-Sharkawi, University of Washington

    o State-of-the-art Overview on Artificial Neural Networks in Power Systems
    By H. Mori, Meiji University

    o System Load Forecasting: the US Perspective
    By Alex Papalexopoulos, Pacific Gas & Electric Co.

    o Short Term Load Forecasting: the International Activities
    By Tharam Dillon and S. Sestito, La Trobe University; Antonio Piras, Swiss Federal Institute of Technology; and Thomas Czernichow, National Institute of Telecommunication, Paris

    o Security Assessment and Enhancement
    By Robert Fischl, Drexel University; Dagmar Niebur, Jet Propulsion Laboratory; M. A. El-Sharkawi, University of Washington

    o Planning Tasks in Power Systems
    By H. Sasaki, Hiroshima University

    o System Fault Diagnosis
    By Edmund Handschin, Dietmar Kuhlmann, University of Dortmund, Institute of Electrical Energy Systems; and Wolfgang Hoffmann, ZEDO, R&D Company Dortmund

    o Control of Power Systems
    By Kwang Lee, Pennsylvania State University

    o Additional References on Artificial Neural Networks and its Power System Applications
    By Dagmar Niebur, Jet Propulsion Laboratory


  • Table of Contents

    Page

    Chapter 1   Historical Perspective on Neural-Net Computing  1

    Chapter 2   Introduction to Concepts in Artificial Neural Networks  7

    Chapter 3   Artificial Neural Networks: Supervised Models  20

    Chapter 4   An Example of Unsupervised Networks: Kohonen's Self-Organizing Feature Map  28

    Chapter 5   Neural Network and its Ancillary Techniques as Applied to Power Systems  39

    Chapter 6   State-of-the-art Overview on Artificial Neural Networks in Power Systems  51

    Chapter 7   System Load Forecasting: the US Perspective  71

    Chapter 8   Short Term Load Forecasting: the International Activities  90

    Chapter 9   Security Assessment and Enhancement  104

    Chapter 10  Planning Tasks in Power Systems  128

    Chapter 11  System Fault Diagnosis  138

    Chapter 12  Control of Power Systems  150

    Chapter 13  Additional References on Artificial Neural Networks and its Power System Applications  170

    Contributors' Biographies  191


  • CHAPTER 1

    HISTORICAL PERSPECTIVE ON NEURAL-NET COMPUTING

    Abstract - This introduction focuses on neural-net computing as a computational paradigm and relates it to research activities and achievements in several other disparate disciplines, such as philosophy, neuroscience, psychology, and mathematics. This historical perspective is presented to enrich one's appreciation of the intellectual ancestry of this practice, to trace the roots and the process of its historical development, and to encourage the continuation of importing good ideas from all related disciplines. This discussion can serve as an introductory overview of the structure of the practice of neural-net computing and perhaps also, in retrospect, as a skeletal form around which new insights might develop.

    1.1. INTRODUCTION

    There is an area of research called Artificial Neural Networks with a subfield called neural-net computing. This historical perspective is mostly concerned with the latter, but it cannot be said that it is entirely detached from the whole.

    That there should be an activity such as neural-net computing is of itself a strange thing indeed. How did this come about?

    It could be said that the development of neural-net computing is motivated in part by our desire to further our understanding of links between mind, brain and behaviour, and also by our desire to improve the power of computer-based automation. But the account is not simple. Neural-net computing is truly the result of the confluence of many streams of thought and activities spread over centuries in time and over many disciplines in research context. The purpose of this perspective is to provide an overview of the various factors which have contributed to the development of neural-net computing; this is so that we might not only be able to better appreciate the past but perhaps also have additional insights for dealing with the future.

    The distinctive characteristics of neural-net computing include the extent and specificity to which it advocates and practices parallel distributed processing with elemental processors, and the power of several functionalities, such as supervised learning and self-organization. This historical perspective emphasizes the functionality of supervised learning.

    An outline of our discussion is shown in Fig. 1.1. To begin with, we mention the motivations and stimulations provided by the early theoretical philosophical speculations concerning the nature of knowledge and our understanding of knowledge. We can trace our practices of pattern recognition


    and the learning of multivariate functions to their articulation of the nature of perception, knowledge, and learning.

    Then we point to a dramatic series of basic findings in neuroanatomy, neurophysiology, and biochemical pharmacology which led to the development of a 'neuron doctrine' of cellular interconnections of signaling neurons.

    Then we come to the development of neural-net computing itself. There was an initial phase which consisted of a leap of faith. The early linear networks were not of sufficient generality but stimulated much interest, and the adaptive linear networks contributed to advances in signal processing and pattern recognition.

    In the meantime, along a parallel path, in mathematics, progress had also been made on the matter of the representation of continuous functions of several variables by superposition of functions of one variable and by sums of functions. Remarkable results were obtained but those did not have an influence on the development of neural-net computing until later.

    In due time, a number of new results were reported on how the nonlinear multilayer feedforward net can be trained to 'learn' multivariate functions. This brought on a resurgence in the field of neural-net computing.

    Subsequently, a stream of mathematically oriented investigations have shown that even single hidden layer Perceptrons can be universal function approximators, and have also begun to relate Perceptron-like architectures to network representations of Kolmogorov's results, and to the improved forms of those results.

    Finally, in addition to the supervised learning functionality of neural-net computing, we address the matter of self-organization. There the interaction is strong with psychology, or more accurately with cognitive psychology. In addition to category formation through clustering in pattern space, there is the direction taken by the 'feature map'. A potent idea imported from neural science is that of lateral excitation and inhibition. That mechanism plays a crucial role in the various modes of self-organization. In this perspective we have to content ourselves with merely alerting ourselves to the wonderful and interesting things yet to come, in matters of memory, including the coding of information for storage, binding of information, associative recall, cross-context remembering and many others.

    A full-fledged historical perspective would also address applications of neural-net computing to the difficult tasks of understanding and implementing the functions of sight, hearing, smell, and touch.

    The topics listed in Figure 1.1 are discussed in the following sections.

    1.2. THEORETICAL PHILOSOPHICAL SPECULATIONS

    The British philosophers John Locke, George Berkeley and David Hume, the associationists, also known as the empiricists, had a deep and continuing interest in understanding the nature of knowledge and of links between mind, memory, and human behaviour. They discussed and wrote on matters such as the nature of knowledge, the nature of perception, association, causation, and memory. By so doing they articulated the issues. Such might be considered to be the modern origin of the search for understanding of the links between mind and behaviour. Some topics for which they are most remembered are listed in Fig. 1.2.

    John Locke (1632-1704) argued that all knowledge begins with sensory experience upon which the powers of the mind operate, developing complex ideas, abstractions, and the like. He said that such knowledge, known to varying degrees of certainty, came from inductive generalization. His most important work in this connection is An Essay Concerning Human Understanding [1], [2].

    George Berkeley (1685-1753) seemingly agreed with John Locke in saying that the only reality that exists are perceivers and perceptions, but then ascribed these to be ideas in the mind of a Divine entity [2], [3].

    David Hume (1711-1776) was extreme in pursuing the empirical philosophy of Locke and Berkeley and was also extreme in his skepticism regarding induction. His best known work is An Inquiry Concerning Human Understanding [2], [4].

    In terms of modern scientific knowledge, much of these theoretical speculations might seem to be farfetched and irrelevant, but they are worth noting even when we disagree with them. They generated some of the earliest discussions regarding perception, inductive learning, and memory, as exercised by the human mind.

    However, it was the neuro-sciences which began to supply knowledge of links between brain and behaviour.

    1.3. THE NEURON DOCTRINE

    Despite the attractions of theoretical speculations, it must be said that progress towards the development of neural-net


    computing did not begin until a truly dramatic set of experimental findings were established during the latter part of the 19th century and the early part of the 20th century.

    In neuroanatomy, there were the findings of Camillo Golgi and of Santiago Ramon y Cajal. Golgi [5] developed the silver impregnation methods that allowed microscopic visualization of the whole neuron with all its processes: the cell body, the dendrites and the axon. Using the staining techniques of Golgi, Cajal [6]-[8] was able to show that the nervous system is not a fused mass of cells but a complex system of interconnected discrete cells. Cajal developed the concept of the nervous system consisting of signaling elements, the neurons.

    Contributions from neurophysiology start a bit earlier, from about the end of the 18th century, when Luigi Galvani [9] reported that the nerve cells of animals produce electricity. Subsequently, Emil Dubois-Reymond [10] and Herman von Helmholtz [11] found that nerve cells use their electrical capabilities for signaling information to one another.

    Modern studies of chemical synaptic transmission can be traced to the work of Claude Bernard [12], Paul Ehrlich [13] and John Langley [14], who realized that drugs interacted with specific receptors on the surface of cells.

    The extreme phrenology theories of Franz Joseph Gall [15] were nevertheless useful in raising consciousness of the possibility that brain functions might be localized. The works of Hughlings Jackson [16], Carl Wernicke [17] and Cajal [7] began to establish the view of complex but specific cellular connections within the brain. Regions of the brain are specialized for different functions. Even cognitive functions can be localized within the cerebral cortex. For example, as articulated by Pierre Paul Broca [18], 'we speak with the left hemisphere!'

    In time, these findings from neural science helped to provide the computationalist with important new conceptual ingredients for molding computational structure and paradigms. These investigations and results are listed by name in Figures 1.3 and 1.4.

    The situation is more complex in more recent times. Accumulation of facts continues at a substantial rate. For example, a great deal is now known about the anatomical basis of sensory perception and motor co-ordination, about the retina and phototransduction, about the anatomy of the central visual pathways, about the processing of form and movement in the visual system, about similar matters in the auditory system, about somatic sensation, and about taste and smell. Even an edited summary of some of the basic knowledge of those matters takes on the form of a massive volume [19]. Neural-net computing would like to emulate those circuits, but those are quite complex and it seems that until truly parallel processing architectures become more available for experimentation, the 'neural-nets' of neural-net computing will not seriously attempt to imitate human-brain or lower animal circuits [20]. This observation is correct despite exceptions which prove the rule [21].

    1.4. THE MATHEMATICAL MOTIVATION

    One distinctive feature of neural-net computing is the extent and specificity to which it is dedicated to parallel distributed processing, with use of networks of intercommunicating (adaptive) elemental processors. But it is not the only computational paradigm with that characteristic. What is more distinctive is the success achieved with the use of the multi-layered feedforward net, the Perceptron architecture.

    The functionality in question is that of supervised learning. The question is whether we can infer the values of a function over a continuous domain, given only a discrete set of values of that function. This task might be viewed as reconstruction of a function, or learning a function, or function approximation. The task is very difficult if the function is a multivariate one; but that functionality is very much needed in information processing. It is the essence of modeling, estimation, prediction and other related tasks.

    As indicated in Figure 1.5, Shannon's Theorem [22] states that a one-dimensional band-limited function can be reconstructed in total over the entire continuous domain from values of the function at a discrete set of sampling points. Extension of Shannon's Theorem to the multidimensional case can be done in a straightforward manner, to yield a procedure which grows exponentially in computational complexity with (linear) increase in the number of dimensions; that is one aspect of what is sometimes referred to as the 'curse of dimensions' [23]. It is to exorcise this curse that neural-net computing had to be invented, and it was attempted at first in an empirical manner because of inspiration from the neuron doctrine.
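    Shannon's one-dimensional reconstruction can be demonstrated directly. The sketch below interpolates a band-limited sinusoid from its uniform samples using the normalized sinc kernel sin(πx)/(πx); the signal, sampling rate, and sample count are made-up illustrative choices, not values from the cited text:

```python
import numpy as np

def sinc_reconstruct(samples, T, t):
    """Shannon's interpolation formula: rebuild a band-limited signal at
    times t from uniform samples taken every T seconds."""
    n = np.arange(len(samples))
    # Each sample contributes one shifted sinc kernel.
    return np.sum(samples[None, :] * np.sinc((t[:, None] - n * T) / T), axis=1)

# A 2 Hz sine is band-limited well below the 5 Hz Nyquist limit for T = 0.1 s.
T = 0.1
n = np.arange(200)
samples = np.sin(2 * np.pi * 2.0 * n * T)

t = np.linspace(8.0, 12.0, 41)     # interior points, away from truncation edges
rec = sinc_reconstruct(samples, T, t)
exact = np.sin(2 * np.pi * 2.0 * t)
print(float(np.max(np.abs(rec - exact))))   # small truncation error
```

    Because the sample set here is finite, the sum is a truncated version of Shannon's infinite series, so a small residual error remains near the record edges.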

    1.5. THE LINEAR PERCEPTRON: A LEAP OF FAITH

    It is known that one of the most active proponents of neural-net computing was Rosenblatt [24], [25], who seemingly assembled ideas from the 'neuron doctrine', Hebbian learning and lateral excitation and inhibition, to propose that a scheme of neuron-like processors at a 'hidden' association layer and an output layer of 'winner-take-all' threshold logic units could serve as the basis for learning and for storage of learned functional mappings.


    It was unfortunate for that fledgling effort that there were strong competitors to contend with, and Rosenblatt did not have the opportunity to evolve the linear Perceptron model to the obvious next step, namely the inclusion of a nonlinear association layer.

    From the historical point of view, it is not important to linger on the belated nature of the development of the Perceptron per se, but it is interesting to ask why it finally had a chance to evolve into a valid and useful paradigm, complementary to other well-established practices.

    First of all, despite the limitations of the linear Perceptron, Widrow's Adaline 'neurons' served well as the basis for useful linear signal processing, and for implementing adaptive switching circuits [26], and of course the Widrow-Hoff algorithm [27], based on such concepts, served pattern recognition well as a trainable classifier.
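    The Widrow-Hoff (LMS) rule behind the Adaline can be illustrated in a few lines. Below is a minimal numerical sketch on a made-up system-identification task; the three-weight 'unknown system', learning rate, and sample count are illustrative assumptions, not anything from the original work:

```python
import numpy as np

rng = np.random.default_rng(0)

# An Adaline-style linear unit learns the weights of an unknown linear map
# from input/desired pairs, using the LMS update  w <- w + eta*(d - y)*x.
w_true = np.array([2.0, -1.0, 0.5])   # unknown system to identify (made up)
w = np.zeros(3)                       # adaptive weights
eta = 0.05                            # learning rate

for _ in range(2000):
    x = rng.normal(size=3)            # input pattern
    d = w_true @ x                    # desired response
    y = w @ x                         # Adaline output (linear)
    w += eta * (d - y) * x            # Widrow-Hoff (LMS) update

print(w)                              # converges close to w_true
```

    In the noise-free case shown, each update contracts the weight error, so the adaptive weights converge geometrically to the true ones.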

    In addition, researchers such as Nilsson [28] and Grossberg [29]-[31] continued to make progress in investigations of how nonlinear networks might be configured and trained.

    It is common knowledge that the publication of an excellent monograph entitled Perceptrons: An Introduction to Computational Geometry by Minsky and Papert in 1969 [32] had the mixed result of clarifying the situation conceptually and subduing that field of endeavor.

    Neural-net computing as a computational paradigm awaited new insights and new vigor, seemingly not to be realized until the mid-seventies. Meanwhile interesting things had been happening in mathematics.

    1.6. RELATED PROGRESS IN MATHEMATICS

    Meanwhile, Kolmogorov in 1957 [33] and Sprecher in 1964 [34] had proved remarkable results on the structure of continuous functions of several variables.

    Kolmogorov proved the following theorem:

    Theorem (Kolmogorov). There exist fixed continuous increasing functions \psi_{qi}(x) on I = [0,1] such that each continuous function f on I^s can be written in the form

    f(x_1, \ldots, x_s) = \sum_{q=1}^{2s+1} g_q\!\left( \sum_{i=1}^{s} \psi_{qi}(x_i) \right) \qquad (1.1)

    where the g_q are properly chosen continuous functions of one variable.

    Sprecher showed that the \psi_{qi} functions could be replaced by \lambda_i \phi_q, yielding a stronger version of Kolmogorov's theorem:

    Theorem (Sprecher). There exist constants \lambda_i and fixed continuous increasing functions \phi_q on I = [0,1] such that each continuous function f on I^s can be written in the form

    f(x_1, \ldots, x_s) = \sum_{q=1}^{2s+1} g_q\!\left( \sum_{i=1}^{s} \lambda_i \phi_q(x_i) \right) \qquad (1.2)

    In this theorem, the g_q functions depend on f. The constants \lambda_i and the functions \phi_q do not depend on f, and are universal functions which can be used for any function f.

    These theorems are diagrammed as feedforward nets in Figures 1.6 and 1.7.

    These nets indicate that it is possible to represent a multivariable function as the sum of a finite number of single-variable functions, a marvelous result except for the fact that we do not know what the single variables are, and what the one-variable functions are; and there is no constructive procedure for developing those functions.

    It is not known if those mathematical results had any effect on the evolution of neural-net computing prior to the resurgence stage. In 1987, Hecht-Nielsen [35] drew attention to the relevance of the Kolmogorov and Sprecher results to neural-net computing. In 1993, Sprecher [36] reported on yet another formulation of the Kolmogorov theorem, and that formulation is diagrammed in Figure 1.8. But now we return to the topic of neural-net computing.

    1.7. RESURGENCE OF THE PERCEPTRON

    Werbos [37] in 1974, Le Cun [38] in 1985, and Parker [39] in 1985 all reported on methods for training nonlinear neural-nets. It seems, however, that it remained for Rumelhart, Hinton, and Williams to report the 'internal representation' approach and the 'Backpropagation' algorithm in a manner which made the Backpropagation net readily understandable and widely popular [40], [41]. The single hidden layer feedforward net is diagrammed in Figure 1.9.

    The single hidden layer feedforward net with Backpropagation learning is not only practicable for moderate-sized problems but has also been shown to be capable of serving as a universal approximator by Funahashi [42], Cybenko [43], and Hornik et al. [44]. Further important mathematical studies of the properties of related net


    structures include those of Hornik [45], [46], and Barron [47], [48].
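    As a concrete illustration of a single hidden layer feedforward net trained by Backpropagation, here is a minimal NumPy sketch; the network size, learning rate, iteration count, and the sin(πx) target are illustrative assumptions, not values from any of the cited papers:

```python
import numpy as np

rng = np.random.default_rng(1)

# A single-hidden-layer net fits a smooth 1-D target by gradient descent
# on the mean squared error, with the error backpropagated through the
# tanh hidden layer.
X = np.linspace(-1, 1, 64).reshape(-1, 1)
Y = np.sin(np.pi * X)                       # target function to 'learn'

H = 16                                      # hidden units
W1 = rng.normal(scale=1.0, size=(1, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(H, 1)); b2 = np.zeros(1)

lr = 0.1
for _ in range(30000):
    A = np.tanh(X @ W1 + b1)                # hidden layer (sigmoidal units)
    out = A @ W2 + b2                       # linear output unit
    err = out - Y
    dA = (err @ W2.T) * (1 - A ** 2)        # backpropagated error (tanh')
    W2 -= lr * (A.T @ err) / len(X); b2 -= lr * err.mean(0)
    W1 -= lr * (X.T @ dA) / len(X); b1 -= lr * dA.mean(0)

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - Y) ** 2))
print(mse)
```

    The trained mean squared error ends up far below what the best purely linear fit to this target could achieve, which is the point of the nonlinear hidden layer.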

    Both the radial basis function approach and the Functional-link approach can be diagrammed as shown in Figure 1.10. Moody and Darken [49], [50], Niranjan and Fallside [51], and Casdagli [52] used localized Gaussian functions, as localized radial basis functions, in a net such as that diagrammed in Figure 1.10. Pao and collaborators [53] have suggested and demonstrated that when appropriate functions are used for g_q, essentially none of the internal parameters need to be learned and the net reverts to linear learning. It was also shown in [54] that a random choice of the internal weight vectors in connection with sigmoidal activation functions sufficed to yield good function reconstruction with rapid learning. Igelnik and Pao [55] proved that stochastic choice of basis functions leads to efficient universal function approximation.
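    The 'linear learning' idea described above, in which the internal basis functions are fixed and only the output weights are learned, can be sketched with Gaussian radial basis functions. All specifics below (target function, number of centers, width) are made-up illustrative choices:

```python
import numpy as np

# Gaussian radial basis functions with fixed centers and width: since the
# internal parameters are not learned, fitting reduces to one linear
# least-squares step for the output weights.
X = np.linspace(-1, 1, 100)
Y = np.exp(-X) * np.cos(3 * X)                 # target to reconstruct (made up)

centers = np.linspace(-1.2, 1.2, 20)           # fixed centers (not learned)
width = 0.25                                   # fixed width (not learned)
Phi = np.exp(-((X[:, None] - centers[None, :]) / width) ** 2)

w, *_ = np.linalg.lstsq(Phi, Y, rcond=None)    # the only 'learning' step
mse = float(np.mean((Phi @ w - Y) ** 2))
print(mse)
```

    Because the optimization is linear, training is a single closed-form solve rather than an iterative gradient descent, which is the speed advantage claimed for this family of methods.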

    1.8. SELF-ORGANIZATION

    In this brief historical perspective, we limit ourselves to a few remarks in closing to address the intertwined matters of self-organization, categorization, and associative memories. It would seem that clustering in pattern space is important for data compression and might lead to the formation of categories. The ISODATA and k-nearest neighbor procedures of pattern recognition would certainly work quite well for many purposes [56]. The ART procedures of Carpenter and Grossberg [57]-[59] might be especially relevant if parallel distributed processing is actually implemented. But the feature map mode of self-organization advocated and practiced by Kohonen [60], [61] seems to be especially important and of interest if dimension reduction is involved. This is both from an information processing point of view and also because of what such reduced mapping can teach us with respect to organization of memory.
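    A minimal sketch of the feature-map idea follows: a one-dimensional chain of units self-organizes to cover two-dimensional data, illustrating the dimension reduction mentioned above. The data distribution, chain length, and decay schedules are illustrative assumptions, not Kohonen's original settings:

```python
import numpy as np

rng = np.random.default_rng(3)

# Kohonen-style self-organizing map: a 1-D chain of 20 units adapts to
# 2-D data. Each presentation moves the best-matching unit and its lattice
# neighbours toward the input; the neighbourhood shrinks over time.
data = rng.uniform(0, 1, size=(2000, 2))
n_units = 20
W = rng.uniform(0, 1, size=(n_units, 2))        # codebook vectors

for t, x in enumerate(data):
    frac = 1 - t / len(data)
    lr = 0.5 * frac                              # decaying learning rate
    sigma = max(0.1, 3.0 * frac)                 # shrinking neighbourhood width
    winner = int(np.argmin(np.sum((W - x) ** 2, axis=1)))
    # Lateral-excitation-style neighbourhood on the 1-D index lattice.
    h = np.exp(-((np.arange(n_units) - winner) ** 2) / (2 * sigma ** 2))
    W += lr * h[:, None] * (x - W)               # move winner and neighbours

print(W[:3])                                     # first few trained codebook vectors
```

    The key design choice is the neighbourhood function h: updating lattice neighbours together is what imposes the topological ordering that distinguishes the feature map from plain competitive learning.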

    In closing, we mention that there are now many good texts on neural-net computing. Our own early text [53] was aimed at relating the neural-net computing paradigm to other established activities, such as statistical pattern recognition and others. More recent focused texts include those by Hertz et al. [62], by Haykin [63], and by Hecht-Nielsen [64].

    1.9. REFERENCES

    [1] Locke, J., An Essay Concerning Human Understanding, excerpt available in Pojman, L. P. (ed.), Philosophy: The Quest For Truth, Wadsworth Publishing Co., Belmont, CA, 1992, pp. 119-132. See also reference [4].

    [2] Thomas, H. (ed.), Biographical Encyclopedia of Philosophy, Doubleday, Garden City, N.J., 1965.

    [3] Berkeley, G., Twelve Dialogues between Hylas and Philonous, excerpt available in Pojman, L. P. (ed.), Philosophy: The Quest For Truth, Wadsworth Publishing Co., Belmont, CA, 1992.

    [4] Hume, D., An Inquiry Concerning Human Understanding, excerpt available in Pojman, L. P. (ed.), Philosophy: The Quest For Truth, Wadsworth Publishing Co., Belmont, CA, 1992, pp. 141-150. See also reference [1].

    [5] Golgi, C., "The neuron doctrine -- theory and facts," 1906. In: Nobel Lectures: Physiology or Medicine, 1901-1921, Elsevier, Amsterdam, 1967, pp. 189-217.

    [6] Cajal, S. R., "A new concept of the histology of the central nervous system," 1892. D. A. Rottenberg (trans.). In: D. A. Rottenberg and F. H. Hochberg (eds.), Neurological Classics in Modern Translation, Hafner, New York, 1977, pp. 7-29.

    [7] Cajal, S. R., "The structure and connexions of neurons," 1906. In: Nobel Lectures: Physiology or Medicine, 1901-1921, Elsevier, Amsterdam, 1967, pp. 220-253.

    [8] Cajal, S. R., "Neuron theory or reticular theory? Objective evidence of the anatomical unity of nerve cells," 1908. M. U. Purkiss and C. A. Fox (trans.). Consejo Superior de Investigaciones Científicas, Instituto Ramón y Cajal, Madrid, 1954.

    [9] Galvani, L., "Commentary on the Effect of Electricity on Muscular Motion," 1791. R. M. Green (trans.), Licht, Cambridge, Mass., 1953.

    [10] Du Bois-Reymond, E., Untersuchungen über thierische Elektricität, vols. 1-2, Reimer, Berlin, 1848-1849.

    [11] Helmholtz, H. von, "On the rate of transmission of the nerve impulse," Monatsber. Preuss. Akad. Wiss. Berl., pp. 14-15. Trans. in W. Dennis (ed.), Readings in the History of Psychology, Appleton-Century-Crofts, New York, 1948, pp. 197-198.

    [12] Bernard, C., Leçons sur les phénomènes de la vie communs aux animaux et aux végétaux, Baillière, Paris, 1878.

    [13] Ehrlich, P., "Chemotherapeutics: scientific principles, methods, and results," Lancet, vol. 2, pp. 445-451.

    [14] Langley, J. N., "On nerve endings and on special excitable substances in cells," Proc. R. Soc. Lond. [Biol.], vol. 78, 1906, pp. 170-194.

    [15] Gall, F. J. and Spurzheim, G., Anatomie et physiologie du système nerveux en général, et du cerveau en particulier, avec des observations sur la possibilité de reconnoître plusieurs dispositions intellectuelles et morales de l'homme et des animaux, par la configuration de leurs têtes, Schoell, Paris, 1810.

    [16] Jackson, J. H., "The Croonian Lectures on evolution and dissolution of the nervous system," Br. Med. J., vol. 1, 1884, pp. 591-593, 660-663, 703-707.

    [17] Wernicke, C., "The symptom-complex of aphasia," 1908. In: A. Church (ed.), Diseases of the Nervous System, Appleton, New York, pp. 265-324.

    [18] Broca, P., "Sur le siège de la faculté du langage articulé," Bull. Soc. Anthropol., vol. 6, 1865, pp. 377-393.

    [19] Kandel, E. R. and Schwartz, J. H. (eds.), Principles of Neural Science, Elsevier, New York, 1985.

    [20] Deutsch, S. and Deutsch, A., Understanding the Nervous System: An Engineering Perspective, IEEE Press, Piscataway, N.J., 1993.

    [21] Mead, C., Analog VLSI and Neural Systems, Addison-Wesley, Reading, Mass., 1989.

    [22] Marks, R. J. II, Introduction to Shannon Sampling and Interpolation Theory, Springer-Verlag, New York, 1991.

    [23] Traub, J. F. and Wozniakowski, H., "Breaking intractability," Scientific American, January 1994, pp. 102-107.

    [24] Rosenblatt, F., "Two theorems of statistical separability in the Perceptron," 1959. In: "Mechanization of Thought Processes," Proceedings of Symposium No. 10 held at the National Physical Laboratory, vol. 1, Nov. 1958, H. M. Stationery Office, London, pp. 421-456.

    [25] Rosenblatt, F., Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan, New York, 1962.

    [26] Widrow, B., "Generalization and information storage in networks of Adaline 'neurons'." In: Self-Organizing Systems 1962 (Chicago, 1962), eds. M. C. Yovits, G. T. Jacobi, and G. D. Goldstein, Spartan, Washington, 1962, pp. 435-461.

    [27] Widrow, B. and Hoff, M. E., "Adaptive switching circuits," 1960 IRE WESCON Convention Record, part 4, IRE, New York, 1960, pp. 96-104.

    [28] Nilsson, N. J., Learning Machines: Foundations of Trainable Pattern-Classifying Systems, McGraw-Hill, New York, 1965.


    [29] Grossberg, S., "Some nonlinear networks capable of learning a spatial pattern of arbitrary complexity," Proceedings of the National Academy of Sciences, USA, vol. 59, 1968, pp. 368-372.

    [30] Grossberg, S., "Some physiological and biochemical consequences of psychological postulates," Proceedings of the National Academy of Sciences, USA, vol. 60, 1968, pp. 758-765.

    [31] Grossberg, S., "Embedding fields: a theory of learning with physiological implications," Journal of Mathematical Psychology, vol. 6, 1969, pp. 209-239.

    [32] Minsky, M. and Papert, S., Perceptrons: An Introduction to Computational Geometry, MIT Press, Cambridge, Mass., 1969.

    [33] Kolmogorov, A. N., "On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition," Doklady Akademii Nauk SSSR, vol. 114, 1957, pp. 679-681.

    [34] Sprecher, D. A., "On the structure of continuous functions of several variables," Transactions of the American Mathematical Society, vol. 115, 1964, pp. 340-355.

    [35] Hecht-Nielsen, R., "Kolmogorov's mapping neural network existence theorem," Proc. IEEE First International Conference on Neural Networks, SOS Printing, San Diego, CA, 1987, vol. II, pp. 11-14.

    [36] Sprecher, D. A., "A universal mapping for Kolmogorov's superposition theorem," Neural Networks, vol. 6, 1993, pp. 1089-1094.

    [37] Werbos, P., Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, Ph.D. Thesis, Harvard University, 1974.

    [38] Le Cun, Y., "Une procédure d'apprentissage pour réseau à seuil assymétrique," Cognitiva 85: À la Frontière de l'Intelligence Artificielle, des Sciences de la Connaissance, des Neurosciences (Paris, 1985), CESTA, Paris, 1985, pp. 599-604.

    [39] Parker, D. B., "Learning logic," Technical Report TR-47, Center for Computational Research in Economics and Management Science, Massachusetts Institute of Technology, Cambridge, MA, 1985.

    [40] Rumelhart, D. E., Hinton, G. E. and Williams, R. J., "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533-536, 1986.

    [41] Rumelhart, D. E., Hinton, G. E. and Williams, R. J., "Learning internal representations by error propagation." In: D. E. Rumelhart, J. L. McClelland and the PDP Research Group (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, chap. 8, MIT Press, Cambridge, Mass., 1986.

    [42] Funahashi, K., "On the approximate realization of continuous mappings by neural networks," Neural Networks, vol. 2, 1989, pp. 183-192.

    [43] Cybenko, G., "Approximation by superpositions of a sigmoidal function," Mathematics of Control, Signals and Systems, vol. 2, 1989, pp. 303-314.

    [44] Hornik, K., Stinchcombe, M. and White, H., "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, 1989, pp. 359-366.

    [45] Hornik, K., "Approximation capabilities of multilayer perceptrons," Neural Networks, vol. 4, 1991, pp. 251-257.

    [46] Hornik, K., "Some new results on neural network approximation," Neural Networks, vol. 6, 1993, pp. 1069-1072.

    [47] Barron, A. R., "Universal approximation bounds for superpositions of a sigmoidal function," IEEE Transactions on Information Theory, vol. 39, 1993, pp. 930-945.

    [48] Barron, A. R., "Approximation and estimation bounds for artificial neural networks," Machine Learning, vol. 14, 1994, pp. 115-133.

    [49] Moody, J. and Darken, C., "Learning with localized receptive fields," Proceedings of the 1988 Connectionist Models Summer School (Pittsburgh, 1988), eds. D. Touretzky, G. Hinton, and T. Sejnowski, Morgan Kaufmann, San Mateo, 1988, pp. 133-143.

    [50] Moody, J. and Darken, C., "Fast learning in networks of locally-tuned processing units," Neural Computation, vol. 1, 1989, pp. 281-294.

    [51] Niranjan, M. and Fallside, F., "Neural Networks and Radial BasisFunctions in Classifying Static Speech Patterns," Computer Speechand Language, vol. 4, 1990,pp. 275-289.

  • [52] Casdagli, M., "Nonlinear Prediction of Chaotic Time Series,"Physica, vol. 350, 1989, pp. 335-356.

    [53] Pao, Y. H., Adaptive Pattern Recognition and Neural Networks,Addison-Wesley, Reading, Mass., 1989.

    [54] Pao, Y. H., Park, G. H., Sobajic, O. 1., "Learning and generalizationcharacteristics of the random vector Functional-Link net,"Neurocomputing, vol. 6, no. 2, Elsevier Publishing, Amsterdam,1994, pp. 163-180.

    [55] Igelnik, B. and Pao, Y. H., "Stochastic choice of basis functions inadaptive function approximation and the Functional-Link net," IEEETransactions on Neural Networks (inpress),

    [56] Ouda, R. O. and Hart, P. E., Pattern Classification and SceneAnalysis, Wiley, New York, 1973.

    [57] Carpenter, G. A. and Grossberg, S., "A Massively ParallelArchitecture for a Self-Organizing Neural Pattern RecognitionMachine," Computer Vision, Graphics, and Image Processing, vol.37, 1987,pp. 54-115.

    6

    [58] Carpenter, G. A. and Grossberg, S., "ART2: Self-Organization ofStable Category Recognition Codes for Analog Input Patterns,"Applied Optics, vol. 26, 1987, pp. 4919-4930.

    [59] Carpenter, G. A. and Grossberg, S., "The ART of Adaptive PatternRecognition by a Self-Organizing Neural Network," Computer,March 1988, pp. 77-88.

    [60] Kohonen, T., "Self-Organized Formation of Topologically CorrectFeature Maps," Biological Cybernetics, vol. 43, 1982, pp. 59-69.Reprinted in Anderson and Rosenfeld [1988].

    [61] Kohonen, T., Self-Organizing Maps, Springer-Verlag, New York,1995.

    [62] Hertz, 1., Krogh, A., Palmer, R. G., Introduction to the Theory ofNeural Computing, Addison-Wesley, Reading, Mass., 1991.

    [63] Haykin, S., Neural Networks, A Comprehensive Foundation, IEEEPress #PC04036, Macmillan College Publishing Company, Inc.,Englewood Cliffs, N. 1.,1994.

    [64] Hecht-Nielsen, R., Neurocomputing, Addison-Wesley, Reading,Mass., 1990.

CHAPTER 2

INTRODUCTION TO CONCEPTS IN ARTIFICIAL NEURAL NETWORKS

Abstract- This introduction to artificial neural networks summarizes some basic concepts of computational neuroscience and the resulting models of artificial neurons. The terminology of biological and artificial neurons, biological and machine learning and neural processing is introduced. The concepts of supervised and unsupervised learning are explained with examples from the power system area. Finally, a taxonomy of different types of neurons and different classes of artificial neural networks is presented.

    2.1 INTRODUCTION

The discipline of computational neuroscience has three goals: first, the computer-aided simulation of some functionalities of the brain; second, the understanding of the function of the brain in computational terms; and third, the application of neural concepts for innovative technical problem solving. A detailed discussion of functionalities and models in computational neuroscience as well as references concerning experimental data and theoretical models can be found in [1].

The theory of artificial neural networks (ANN) is mainly motivated by the second goal, i.e. the establishment of simple formal models of biological neurons and their interconnections, called artificial neural networks; for an excellent introduction, see [2, 16]. In the power engineering domain, the third goal predominates, i.e. the application of already simplified ANN tools to technical problems remains the main objective [3].

Although Churchland and Sejnowski [1] give strong arguments for the validity of the simulation of complex behavior with very simple models, this tutorial is not so much concerned with the biological plausibility of the discussed artificial neural network models as with the applicability of the discovered principles to a technical task. Nature and technology have different goals, means, materials and constraints. For instance, as Churchland and Sejnowski note, nature does not start "from scratch." When "developing" birds, nature had to start from dinosaurs. The material as biological matter was given, the means were basically random mutation, and the constraints had to take into account multiple criteria. Concerning the task of flying, an optimally designed bird not only had to fly fast but also had to be able to reproduce. Although inspired by nature, the engineer can specify the constraints, design an object, simulate the model on a computer, choose an appropriate material and finally build, e.g., a machine. His solution can be globally optimal in the given framework, whereas nature can only achieve a local optimum, if only the constraints of a specific task (like flying fast) without reference to the larger context are taken into account. However, when proceeding as described, it is highly unlikely that the engineer will develop a tool which flies only satisfactorily but is a magnificent swimmer and diver, as nature has been able to produce in the form of some birds.

Staying with this analogy, we understand an artificial neural network as a brain-inspired computer which may solve a similar task as the biological brain, but will be an imitation neither in material nor means nor constraints. In the following text the term "neuron" will usually refer to the basic unit of an artificial neural model, and "learning" and "training" designate machine learning techniques.

In the following two sections we will briefly introduce biological neurons and show how a simplified artificial neuron can be derived from biological principles. In section 2.4 we outline the characterization of types of artificial neural networks. These types may vary in the way their neurons are connected (architecture), in the way their neurons process incoming information and in the way the neurons adapt their weights (learning).

Following this outline we will give examples of different architectures of artificial neural networks in section 2.5. Section 2.6 illustrates how artificial neurons respond to incoming information. In section 2.7 we will introduce the concepts of supervised and unsupervised learning. We will discuss common features of learning, optimization and clustering techniques and we will highlight some important differences of learning in artificial neural nets. In section 2.8 these different learning techniques will be illustrated for power system security assessment. The chapter concludes with a summary and the list of references.

    2.2 BIOLOGICAL NEURONS

This section discusses a simple biological neural model which influenced the development of artificial neural networks.

One of the earliest descriptions of biological neurons is due to Cajal in 1894, see [4], who identified a neuron as an independent electric device transmitting and receiving electrical signals. Although at least 500 different types of biological neurons have been distinguished, many neurons have a general structure similar to that shown schematically in Fig. 2.1. The following description of the function of biological neurons is necessarily simplified; see [1] for a more detailed and very readable introduction.

The principal components of a typical biological neuron are the cell body (or soma), the dendrites, the axon, and the synapses. The function of the dendrites is the collection and conduction of electric potentials which are generated at the synapses when a presynaptic neuron experiences an action potential ("spike"). If the intracellular potential in the soma exceeds a certain value, called the threshold, an action potential is generated which travels along the axon and the intracellular voltage is reset to a value close to the so-called resting

[Fig. 2.1 shows, from left to right: dendrites, synapses, soma and axon, with an incoming and an outgoing action potential.]

Fig. 2.1. Schematic drawing of a biological neuron model. We show the principal parts of the neuron, as introduced in the text, as well as schematized shapes of the neural signals. Note that this shape changes along the dendrites but remains the same when traveling along the axon.

[Fig. 2.2 maps biological parts to formal components: incoming action potential => input vector; synapses => weight vector; soma => scalar product; outgoing action potential => output vector; axon => activation function.]

Fig. 2.2. Schematic drawing of the artificial neuron model. Terms in italic denote the biological parts. Their formal components are denoted in normal type.

Although extremely simple, this artificial neuron can calculate Boolean functions. For example, for the Boolean values 1 = TRUE and -1 = FALSE, for the given synaptic weights w1 = w2 = 0.5 and a threshold value θ = -0.5, the neural net computes the output y as the Logical OR between x1 and x2. For the same weights and a threshold value θ = +0.5, the neural net calculates the Logical AND, see Table I.

A straightforward calculation shows that the activation function g can be chosen as the sign function if the dimension of the input vector and weight vector is augmented and their values are clamped to -1 and θ respectively.

potentials and synapses are represented in general by real-valued vectors. The stimulation of the soma is modeled as the weighted sum (i.e. scalar product) of incoming vector and weight vector, and the amplification of the axon is modeled by a (in general non-linear) activation function, also referred to as gain function. In diagrams showing artificial neurons, scalar product and activation function are usually grouped to form the artificial neural unit.

The simplest formal model, the Logic Threshold Unit, is shown in Fig. 2.3. It consists of a two-dimensional binary input vector x, the activation function of the neural unit being the threshold function g(h) with a given threshold θ, and a one-dimensional output vector y defined in (2.1).

The synaptic weights are modeled by two real numbers wk, k = 1, 2. The neuron processes the binary input components xk, k = 1, 2 as follows:

potential (the voltage obtained in the absence of synaptic inputs). The axon is connected via synapses to the dendrites of other neurons (which are called postsynaptic) and therefore the action potential will influence the voltage in these neurons. Functionally, the dendrites are closely associated with the input of the cell and the axon with the output.

There are basically two types of synapses, one called excitatory and the other inhibitory. Activation of excitatory synapses increases the voltage in the postsynaptic neuron while activation of inhibitory synapses decreases the voltage. If many excitatory synapses are activated frequently, and if only few inhibitory synapses are activated, the intracellular voltage of the neuron increases rapidly and reaches threshold fast. Therefore, the neuron will generate action potentials at a high rate, i.e., the neuron will be very active. The soma, where the instantaneous voltage is compared with the threshold and, depending on the result of this comparison, an action potential is either generated or not, corresponds most closely to the decision-making instance of the neuron.

This description of neural function is grossly simplified and there are many exceptions to this prototypical behavior. For instance, there are many classes of neurons which do not have a clear distinction between axons and dendrites. Other neurons are not even capable of generating action potentials and they communicate by other means. Also, our classification of synapses in only two types (excitatory and inhibitory) is simplified, and we have completely neglected the interior dynamics of cells, which are far more complex than just the simple summation of synaptic inputs described above. The final point we would like to make here is that recently also the notion of the spiking frequency being the only signal transmitted between neurons has been questioned, and that there is increasing evidence that the time structure of the sequences of spikes plays an important role. We have presented this simplified description of neuronal function since it is at this level that the elements of artificial neural networks are usually modeled.

y = g(w1 x1 + w2 x2),  with g(h) = -1 for h ≤ θ and g(h) = 1 for h > θ   (2.1)

2.3 THE ARTIFICIAL NEURON - A COMPUTATIONAL MODEL OF THE BIOLOGICAL NEURON

This section defines the simplest and earliest artificial neural unit, the Logic Threshold Unit, and illustrates how this unit computes simple Boolean functions.

McCulloch and Pitts [5] established the first computational model of a biological neuron by translating the biological concepts as shown in Fig. 2.2: Incoming and outgoing action

    8

Fig. 2.3. Logic Threshold Unit with threshold θ being part of the activation function.

TABLE I
BOOLEAN FUNCTIONS AND AND OR

2.5 ARCHITECTURE OF ARTIFICIAL NEURAL NETWORKS

2.4 CHARACTERIZATION OF ARTIFICIAL NEURAL NETWORKS

[Fig. 2.5 shows, bottom to top: the input layer (input vector x), the neural net layer (neurons j with activation function g and weights w), and the output layer (output vector y).]

1) Layered feed-forward neural networks, where a layer of neurons receives input only from neurons of a previous layer; for instance, the multi-layer perceptron (MLP) shown in Fig. 2.6. The functional relation y = f(x) between input and output is usually not given in an analytical form but has to be approximated numerically.

The flow of information for the processing of input vectors with fixed weight vectors is in one direction only. Input units feed the input values directly to the hidden neurons, whereas hidden and output units process their input through a non-linear activation function.

For the MLP we will show later that the flow of information during training is in two directions, forwards to calculate the actual output and backwards in order to back-propagate the error for the correction of the weights.
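The one-directional processing described above can be sketched in a few lines of Python (a minimal sketch: the layer sizes and weight values are invented for illustration, and tanh stands in for the non-linear activation function):

```python
import math

# One forward pass through a tiny MLP: input -> hidden -> output.
# Weights are invented for illustration; tanh is the sigmoidal activation.
def mlp_forward(x, w_hidden, w_output):
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
              for row in w_hidden]
    return [math.tanh(sum(w * hi for w, hi in zip(row, hidden)))
            for row in w_output]

x = [0.5, -0.2, 0.1]                             # 3 input units
w_hidden = [[0.3, -0.1, 0.8], [0.5, 0.4, -0.6]]  # 2 hidden units
w_output = [[1.0, -1.0]]                         # 1 output unit
y = mlp_forward(x, w_hidden, w_output)
print(y)                                         # a single value in (-1, 1)
```

Note that the weights are held fixed here; training (the backward direction) changes them, as discussed in section 2.7.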

In this section we will show various ways in which single units can be connected, resulting in different architectures of artificial neural networks.

In general the architecture, see Fig. 2.5, consists of three parts: the n-dimensional input layer where the input data is fed in, the neural network layer consisting of N neurons interconnected in various ways, and the m-dimensional output layer. The n input vector and m output vector components are either binary or real numbers. Each input and output vector component can be connected with each neuron through a synaptic weight, which is a real number. We therefore have two weight matrices, one n×N-dimensional matrix for the input layer - neural net layer connections and one N×m-dimensional matrix for the neural net layer - output layer connections. The way the neurons are connected in the neural net layer is specific to the different existing models, for example, the multi-layer perceptron, Kohonen's self-organizing feature map and the Hopfield model. Examples of architectures will be explained in the following section. There exist a variety of other types of neural architectures, see [2, 16].

With respect to the architecture, four main types of neural networks can be distinguished:

with g(h) := sign(h) = -1 for h ≤ 0 and 1 for h > 0:

x      x1    x2    h = w1x1 + w2x2    AND: g(h) with θ = 0.5    OR: g(h) with θ = -0.5
x1T    -1    -1         -1                     -1                        -1
x2T    -1     1          0                     -1                         1
x3T     1    -1          0                     -1                         1
x4T     1     1          1                      1                         1
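The Boolean computations of Table I can be checked directly in code (a sketch; the function name is ours, while the weights and thresholds are those given in the text):

```python
# Logic Threshold Unit of (2.1): y = g(w1*x1 + w2*x2) with
# g(h) = -1 for h <= theta and +1 for h > theta.
def threshold_unit(x, w, theta):
    h = sum(wk * xk for wk, xk in zip(w, x))
    return 1 if h > theta else -1

w = (0.5, 0.5)
inputs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
print([threshold_unit(x, w, theta=0.5) for x in inputs])   # AND: [-1, -1, -1, 1]
print([threshold_unit(x, w, theta=-0.5) for x in inputs])  # OR:  [-1, 1, 1, 1]
```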

In the following we will therefore model any threshold θ by an extra synaptic weight. Fig. 2.4 shows the architecture of the equivalent Logic Threshold Unit. The clamped (third) neuron is called the bias neuron.

So far we have defined the architecture of a simple neuron. We have also seen how this neuron processes input vectors, provided the weights are known. In section 2.7 we will discuss strategies, commonly referred to as learning, for determining the weight vectors for a given set of input vectors.
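The threshold-as-weight construction of Fig. 2.4 can be sketched in the same style (the helper name is ours; the clamped third input carries θ as an ordinary weight):

```python
# Threshold folded into the weight vector: augment the input with a
# component clamped to -1 and use theta as the third weight (Fig. 2.4).
def threshold_unit_bias(x, w3):
    x_aug = (x[0], x[1], -1)                  # clamped bias input
    h = sum(wk * xk for wk, xk in zip(w3, x_aug))
    return 1 if h > 0 else -1                 # comparison is now against zero

# Logic AND, previously theta = 0.5, now weight vector (0.5, 0.5, 0.5):
inputs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
print([threshold_unit_bias(x, (0.5, 0.5, 0.5)) for x in inputs])  # [-1, -1, -1, 1]
```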

The described simple model can be generalized in many ways. Every artificial neural network model can be characterized by the following features: its architecture, its processing algorithm and its training algorithm.

Combining several neurons so that the output yj of neuron j serves as input to one or several other neurons leads to networks of artificial neurons. The architecture specifies the arrangement of neural connections as well as the type of units, characterized by their activation function.

For a given architecture the neural network is used in two different modes, the processing mode and the training mode.

In the processing mode, the processing algorithm specifies how the neural unit calculates the output vector y for any input vector x and for a given set of weights w. The type of processing further depends on the type of the activation function.

The training algorithm specifies how the neural network adapts its weights w for all M given input vectors x, called training vectors. The set of training vectors is called the training set.

Let us now examine these three characteristics, architecture, processing and training, in more detail.

Fig. 2.4. Logic Threshold Unit with fixed third input and with threshold θ being part of the weight vector.

Fig. 2.5. General neural network architecture.


[Fig. 2.6 shows an input layer (5 input units), a hidden layer (3 hidden units with sigmoidal activation g(h) = tanh(βh)) and an output layer (4 output units) computing y = f(x).]

Fig. 2.6. Architecture of a multi-layer perceptron containing 5 input, 3 hidden and 4 output units. Weight wji connects hidden neuron j with input vector component xi; weight wkj connects output vector component yk with hidden neuron j. Not all weight vector components are shown in this figure.

[Fig. 2.8 shows an input vector feeding a feature map with a winner unit and its neighborhood, followed by the output classes.]

Fig. 2.8. Architecture of the Kohonen network containing 5 input units and 9 laterally connected units. The number of output classes depends on the characteristics of the training set and is at most equal to the number of neurons. Weight wji connects neuron j with input vector component xi. Not all weight vector components are shown in this figure. The connections from the feature map to the outputs do not carry any weights.

This architecture and the corresponding learning and processing algorithms can easily be generalized for MLPs with two or more hidden layers.

3) Laterally connected neural networks, consisting of feed-forward input units and a lateral layer of m neurons which are laterally connected to their neighbors.

Fig. 2.7. Architecture of the Hopfield network containing 5 input, 5 recurrent and 5 output units. Note that, in contrast to the MLP, the input and output units correspond to system states, not to physical neurons. Weight wji connects neuron j with neuron i. Not all weight vector components are shown in this figure.

2.6 PROCESSING OF INFORMATION WITH ARTIFICIAL NEURONS

The Kohonen network shown in Fig. 2.8 is an example of this type of network. It will be discussed in detail in the fourth tutorial chapter [7].

The number of output classes depends on the characteristics of the training set and can thus not be considered as part of the architecture, which is specified in advance. In the original model, training as well as processing results in a recurrent dynamical process involving all laterally connected neurons. The most commonly used simplification, however, applies a winner-take-all feed-forward strategy where only one neuron and its close neighbors are stimulated by the input.
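The winner-take-all simplification amounts to a single nearest-neighbor search over the weight vectors; a minimal sketch (the weight vectors and input below are invented for illustration):

```python
# Winner-take-all step: the unit whose weight vector is closest to the
# input (smallest Euclidean distance) is the only one activated.
def winner(x, weights):
    dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
    return dists.index(min(dists))

weights = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]  # 3 units in the lateral layer
print(winner((0.9, 1.1), weights))               # unit 1, closest to (1.0, 1.0)
```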

In biological neural networks, the incoming action potential will excite different neurons to a different degree. Depending on the size of the stimulus the neuron will then amplify or inhibit the incoming signal. In artificial neural networks the degree of the excitation of a neuron is usually measured as the similarity between input vector and weight vector. Higher similarity usually results in a larger output. A saturated activation function ensures that this process stays bounded.

Similarity of two vectors can be defined as generalized colinearity. In this case the angle formed by two vectors serves as a measure of similarity. Out of all weight vectors wi, i = 1, ..., m, the weight vector wi* is the most similar to an input vector x if their scalar product (i.e. weighted sum of the input vector components) takes its maximum:

4) Hybrid networks combine two or three of the above features. For example, the Boltzmann machine has a hidden layer with recurrent connections. The two neural layers of the Counter-propagation network consist of a Kohonen layer and a feed-forward layer.

[Fig. 2.7 shows the input state (inputs I1 x1, ..., I5 x5), the initial value at t = 0, the recurrent net iterating for t = 1, ..., tmax - 1, and the final value forming the output state.]

2) Recurrent neural networks, where the inputs to a neuron are the net's previous outputs as well as inputs from external sources, which are the input xi and a bias Ii (equivalent to a shift of the threshold). The fully connected Hopfield net [6] shown in Fig. 2.7 is an example of this type of architecture. Here, the neurons process their input through a threshold function.

During processing the Hopfield net will feed back its output; during training, however, only one feed-forward step is used. A detailed example is discussed in chapter 10 of this tutorial.
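The feedback processing can be sketched for a toy net (a sketch only: the symmetric weight matrix below stores the single pattern (1, -1, 1, -1) in Hebbian fashion, an example of our own making, not the chapter-10 example):

```python
# Hopfield-style processing: the state is repeatedly fed back through a
# threshold function until it settles.
def sign(h):
    return 1 if h > 0 else -1

def hopfield_run(state, w, steps=10):
    s = list(state)
    for _ in range(steps):
        s = [sign(sum(w[i][j] * s[j] for j in range(len(s))))
             for i in range(len(s))]
    return s

# Symmetric weights storing the pattern (1, -1, 1, -1), zero diagonal:
w = [[0, -1, 1, -1],
     [-1, 0, -1, 1],
     [1, -1, 0, -1],
     [-1, 1, -1, 0]]
print(hopfield_run([1, 1, 1, -1], w))  # corrects the flipped bit: [1, -1, 1, -1]
```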


The most commonly used similarity measure is the scalar product of x and w for x, w ∈ R^m:

sim(x, w) := <x, w>,  <x, wi*> = max{ <x, wi> | i = 1, ..., m }   (2.3)

A second concept of similarity uses the Euclidean distance of two vectors as a measure. Here wi* is most similar to x if

sim(x, w) := ||x - w||,  ||x - wi*|| = min{ ||x - wi|| | i = 1, ..., m }   (2.4)

Applying the parallelogram equation it can be shown that these two concepts are equivalent for normalized vectors.

The processing algorithm for one neuron in general is:

y = g(sim(x, w))   (2.5)

The activation function g further determines the way in which the neuron amplifies its output y for a given weight vector w and a given input vector x. The sign function will produce either a TRUE or a FALSE answer for all output neurons. The winner-take-all function will produce a TRUE output only for the most stimulated neuron and FALSE for all others. A linear threshold function will produce saturated responses for very large and very small stimulation only.

The logic threshold model (also called simple perceptron) can be easily extended to m-dimensional real-valued input vectors. Depending on the task to be solved, subsequent models replaced the threshold function by a

1) linear activation function (linear perceptron),
2) linear-saturated activation function (adaptive linear element or "adaline"),
3) non-linear activation function, usually a sigmoid function (non-linear perceptron),
4) Gaussian activation function (radial basis function neuron), or
5) winner-take-all function (self-organizing feature map neuron).

Table II shows examples of commonly used neural units characterized by their gain or activation functions.
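The remark on normalized vectors can be checked numerically: for unit vectors, ||x - w||² = 2 - 2<x, w>, so the weight vector that maximizes the scalar product also minimizes the Euclidean distance. A sketch with invented unit vectors:

```python
import math

# Scalar-product similarity (2.3) and Euclidean-distance similarity (2.4).
def sim_dot(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

def dist(x, w):
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))

ws = [(1.0, 0.0), (0.6, 0.8), (0.0, 1.0)]   # unit-length weight vectors
x = (0.8, 0.6)                              # unit-length input vector
best_by_dot = max(range(len(ws)), key=lambda i: sim_dot(x, ws[i]))
best_by_dist = min(range(len(ws)), key=lambda i: dist(x, ws[i]))
print(best_by_dot, best_by_dist)            # both measures pick the same winner
```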

TABLE II
TAXONOMY FOR ONE FORMAL NEURON

Activation function g: R → R; slope β ∈ R+. ci denotes the center and σi the width of the radial basis function unit.

Neuron                           Activation function g(h)                                  Input and output components
Threshold unit                   sign(h)                                                   x ∈ {-1, 1}^m;  y ∈ {-1, 1}
Linear unit                      identity                                                  x ∈ R^m;  y ∈ R
Linear saturated unit            saturated identity                                        x ∈ R^m;  y ∈ R
Non-linear unit (sigmoid unit)   tanh(βh)                                                  x ∈ R^m;  y ∈ [-1, 1]
Non-linear unit (sigmoid unit)   (1 + exp(-2βh))^(-1)                                      x ∈ R^m;  y ∈ [0, 1]
Radial basis function unit       exp(-(h - ci)^2 / (2σi^2))                                x ∈ R^m;  y ∈ (0, 1]
Winner-take-all unit             g(h) = 1 for the most stimulated unit i*, 0 elsewhere     x ∈ R^m;  y ∈ {0, 1}
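The activation functions of Table II translate directly into code (a sketch; the parameter names β, c and σ follow the table, and the default values are our choices):

```python
import math

# Activation functions from Table II, plain-Python versions.
def g_sign(h):                  return 1.0 if h > 0 else -1.0
def g_linear(h):                return h
def g_saturated(h):             return max(-1.0, min(1.0, h))   # saturated identity
def g_tanh(h, beta=1.0):        return math.tanh(beta * h)      # y in [-1, 1]
def g_logistic(h, beta=1.0):    return 1.0 / (1.0 + math.exp(-2.0 * beta * h))  # y in [0, 1]
def g_rbf(h, c=0.0, sigma=1.0): return math.exp(-(h - c) ** 2 / (2.0 * sigma ** 2))

# The two sigmoids are closely related: logistic(h) == (1 + tanh(h)) / 2.
print(abs(g_logistic(0.7) - (1.0 + g_tanh(0.7)) / 2.0) < 1e-12)
```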

2.7 LEARNING IN NEURAL NETWORKS

Neural networks are commonly used for tasks like pattern recognition, content addressable memories, function approximation, classification, parameter estimation and non-linear control. We will now discuss strategies for the determination of weights in order to achieve the desired objective. These strategies are usually called learning or training.

2.7.1 Hebb's rule - a biological learning hypothesis

In the example of the Logic Threshold Unit, the synaptic weights were assumed fixed and known in advance. In biological terms this would correspond to the synaptic strength being genetically pre-determined. Already from the estimated number of neurons and synapses in the human central nervous system (on the order of 10^12 neurons and 10^15 synapses) it is, however, clear that not all synaptic connections can be pre-coded. It is a biological fact that structural and functional changes in the nervous system, usually called plasticity, can result from experience or damage. In psychological terms, plasticity is at the base of changes in behavior due to experience (learning). Engineers usually prefer the term "training," which does not imply any "intelligence" of the learning individual or computer. Therefore this term is less prone to philosophical discussions about whether a machine can be intelligent.

In 1949, the psychologist D. Hebb [8] formulated the hypothesis that synapses change in efficacy according to the following principle:

The strength increases when both pre- and postsynaptic elements are active simultaneously (learning). The synapse may decrease in strength if there is presynaptic activity without concurrent postsynaptic activation (forgetting).

Formulated in terms of artificial neurons, Hebb's learning rule can be stated as follows: for the weight vector w of an artificial neuron, given input vector x and output y, synaptic learning can be expressed as

Δw = η y x,  η > 0   (2.6)

There are models which also take forgetting into account by adding a negative term proportional to the output and the strength of the synapses, i.e. the weight:

Δw = η y x - α y w = α y ((η/α) x - w),  with η, α > 0   (2.7)

The learning or adaptation rate η and the forgetting factor α are often chosen to be time-dependent, e.g. η(t) and α(t) decrease as 1/t.

Synapses of Hebbian type as well as of non-Hebbian type have been identified physiologically, but the details of the implementation of biological plasticity are still open. Nevertheless, most artificial neural networks base their learning algorithm on Hebb's generalized learning principle (2.7).

2.7.2 Machine learning

For artificial neural networks, the neural net objectives, for example pattern classification, have to be defined for a set of examples called the training set.

The training algorithm specifies how the neural network adapts its weights for all given input vectors x, called training vectors. If for every input vector x the desired output vector ytarget is given, and the weights are adapted in order to produce the desired output, the training process is called supervised learning. If only the input vector is given and the structure of the data is discovered autonomously, the training is called unsupervised learning.

There are other types of learning, for example reinforcement learning, also called learning with a critic, where the teacher or critic directs the learning by indicating whether the ANN response to an input is correct or incorrect without specifying the target output y explicitly. In the following we will restrict our attention to supervised and unsupervised learning only. For more information on other types of learning see [16].

2.7.2.1 Learning as an optimization task

Let us go back to our most simple example of a neural net, the Logic Threshold Unit shown in Fig. 2.3, which calculates the Logic AND function. We will now determine the weights of the unit in order to solve this task.

Defining TRUE as 1 and FALSE as -1, the training set is given by 4 input vectors:

x1 = (-1, -1)T,  x2 = (-1, 1)T,  x3 = (1, -1)T, and  x4 = (1, 1)T   (2.8)

and their corresponding target outputs

ytarget^1 = -1,  ytarget^2 = -1,  ytarget^3 = -1, and  ytarget^4 = 1   (2.9)

or in vector form

ytarget = [-1, -1, -1, 1]T   (2.10)

For the moment, let us assume a linear activation function. We now want to determine the weights such that the calculated output is equal to the target output:

y^μ(w) = <w, x^μ> = ytarget^μ  for all μ = 1, ..., 4   (2.11)

Since this problem is overdetermined we will formulate it as a least square error minimization problem. Note, we are further defining the threshold θ as part of the weight vector as outlined in (2.2) and shown in Fig. 2.4. We therefore added a fixed third component -1 to all input vectors, thus

x1 := (-1, -1, -1)T,  x2 := (-1, 1, -1)T,  x3 := (1, -1, -1)T, and  x4 := (1, 1, -1)T   (2.12)

We now define the input matrix X as

        [ x1T ]   [ -1  -1  -1 ]
    X = [ x2T ] = [ -1   1  -1 ]
        [ x3T ]   [  1  -1  -1 ]        (2.13)
        [ x4T ]   [  1   1  -1 ]

Geometrically our solution represents a hyperplane H = {x ∈ R^2 | ... } which divides the training set into two classes. Fig. 2.9 shows another possible solution of this task.

2.7.3 Supervised learning - learning for parameter estimation

ware has recently been introduced for power system security assessment, see [13].

Supervised learning techniques, also commonly referred to as learning by example or learning with a teacher, fall in the same class of tasks as interpolation and approximation techniques, regression analysis and parameter estimation.

Note that in this framework, classification tasks can be formulated as the task of finding a regression model for the function which maps an input vector x onto its class label, for example TRUE or FALSE coded with binary numbers. A non-linear activation function like the sign function will map real-valued weighted sums onto binary outputs.

In the following, we will define a local learning rule which solves the Logical AND problem while respecting the biological learning paradigms cited above. It is based on a simple iterative optimization algorithm, the steepest gradient descent technique. It converts the task of finding a zero of the function grad E(w) into the task of finding the fixed point of a related function G. Under certain conditions, the latter can be solved by a simple iterative method. Thus

with η(t) bounded. The iterative procedure of steepest descent is shown in (2.25). The expression for the gradient of the error function E was obtained in (2.18).

So far we have replaced the computation of the pseudo-inverse by an iterative procedure. This procedure still needs knowledge of the whole training set.

However, instead of minimizing the error globally for all four training patterns, we can now try to minimize the error locally by randomly taking one training example (x^μ, y^μ_target) at a time.

For example, the Logical XOR cannot be calculated with the Logical Threshold Unit. In Fig. 2.9 this task would require placing a hyperplane, i.e. a straight line, such that x^2 and x^3 lie on one side and x^1 and x^4 on the other side of the line. The interested reader may verify that for y_target = (-1, 1, 1, -1)^T, w = (0, 0, 0)^T is the only solution of the XOR problem.

In the next tutorial chapter we will see that replacing the linear activation function by a sigmoid function provides a powerful remedy to this problem. For the multi-layer perceptron and a sigmoid form of the activation function, the partial derivatives of the error with respect to the weights can easily be computed numerically. The error function is usually minimized with a stochastic gradient descent technique which is known as the back-propagation algorithm and which was independently developed by Werbos [10] and Rumelhart [11].

3) In order to establish the matrix X, we have assumed that all training patterns are known beforehand.

In many adaptive tasks, however, optimization has to be adjusted to new incoming patterns. This task can usually not be solved by otherwise powerful optimization tools like the calculation of the pseudo-inverse or Quasi-Newton techniques. Furthermore, for large training sets the inversion of X^T X represents a heavy computational burden, especially in the case of singular X^T X. We will present an iterative and adaptive algorithm in the next section.

4) We have established the performance criterion (i.e. the error function to be minimized) solely based on the contents of the training set.

In general we expect from an adaptive learning machine a good or even improving performance on new incoming data. This behavior is usually referred to as the generalization capability of the ANN. However, an excellent performance on the training set does not guarantee this behavior. The dilemma of memorization versus generalization will be discussed in the following chapters. Issues related to the generalization of information are still the subject of on-going research in the field of information theory and neural networks.

5) Last, but not least, we did not use a biologically plausible learning concept.

In general we are not overly concerned if our machine learning task achieves the same objective as biological learning by using different means. However, by neglecting biological learning paradigms like simple computation, local adaptation, and robustness with respect to failing neurons, researchers would have missed the opportunity to develop robust, simple, efficient and highly parallel hardware circuits, which are quite different from the techniques employed by conventional super-computers, see [12].


grad E(w*) = 0  ⟺  G(w*) = w*    (2.23)

with

G(w) := w - η(t) grad E(w)    (2.24)

w(t+1) = w(t) - η(t) grad E(w(t))
       = w(t) - η(t) Σ_{μ=1..4} (y^μ(w(t)) - y^μ_target) x^μ    (2.25)

w(t+1) = w(t) - η(t) (y^μ(w(t)) - y^μ_target) x^μ
       = w(t) + η(t) δ^μ x^μ    (2.26)

with

δ^μ := y^μ_target - y^μ(w(t))    (2.27)
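The batch update (2.25) can be sketched numerically for the Logical AND training set. In the following Python sketch the bipolar AND targets y_target = (-1, -1, -1, 1)^T, the learning rate and the iteration count are illustrative assumptions, not values from the text:

```python
import numpy as np

# Input matrix X of (2.13): the fixed third component -1 absorbs the threshold.
X = np.array([[-1., -1., -1.],
              [-1.,  1., -1.],
              [ 1., -1., -1.],
              [ 1.,  1., -1.]])
# Assumed bipolar targets for the logical AND of the first two components.
y_target = np.array([-1., -1., -1., 1.])

eta = 0.1          # learning rate (assumed constant here)
w = np.zeros(3)    # initial weight vector

for t in range(200):
    delta = y_target - X @ w      # per-pattern errors, cf. (2.27)
    w = w + eta * (X.T @ delta)   # summed update over all patterns, cf. (2.25)

y_out = np.sign(X @ w)            # thresholded outputs reproduce the targets
```

Because the columns of X happen to be orthogonal here, the iteration converges to the least-squares solution w = (0.5, 0.5, 0.5)^T, and sign(Xw) reproduces the AND targets. The stochastic variant (2.26) would instead apply the update for one randomly chosen pattern at a time.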

  • 2.7.4 Unsupervised learning - learning for data reduction

This stochastic updating or learning rule is commonly referred to as the LMS rule, the Widrow-Hoff rule, or the delta rule. It is used for the training of one of the earliest adaptive neural units, called the adaptive linear unit or ADALINE, proposed for adaptive control, [14].

The iterative process converges stochastically to the minimum of the error function if the so-called "Robbins-Monro" conditions hold for the learning rate η, see [15].

For a convergence of the algorithm in the mean, these conditions can be relaxed to η being "very small", see [16].

Although only applicable to linearly separable learning tasks, see the remarks in section 2.7.2.2, the delta rule fulfills several of the biological paradigms. It is computationally simple, robust with respect to noisy input data as well as numerical rounding errors, and it is a local adaptation scheme learning one example at a time.

It further obeys a generalization of Hebb's learning principle (2.6). For a fixed input and target output, the weight changes depend on two terms only: the correlation of input x and calculated output y, and a constant stimulus, the product of input and target output.

There are other supervised learning rules based on Hebb's principle, like the perceptron rule and the generalized delta rule, introduced in the next tutorial chapter. Other types of neural networks trained with a different type of supervised learning, like the Functional-Link Net, are discussed in [17].

2.7.4.1 Subspace techniques - reduction of the dimension of the input vector

However, this representation is not robust because when one unit is removed (or one cell dies in a biological brain), all information concerning the corresponding class would be lost.

A solution to this dilemma is proposed by Kohonen's self-organizing feature map, where (ideally) neighboring neurons classify neighboring features and thus the loss of one neuron will result in a decrease of accuracy but not in a complete loss of information.

Let us briefly introduce the main concepts used in some types of unsupervised networks. For more detailed information see [2, 16]. Figs. 2.10, 2.11 and 2.12 show schematically how the data reduction of randomly distributed data is achieved using three different types of unsupervised networks.

The first unsupervised approach for the reduction of the dimension of the input vector falls in the class of subspace techniques, where the input vector is projected onto a linear subspace presenting the most salient features. Statistical principal component analysis chooses the subspace spanned by the eigenvectors of the correlation matrix of the input vector. The standard deviation of the input vectors takes its maximal and minimal values along the eigenvectors corresponding to the maximal and minimal eigenvalues. A simple example is shown in Fig. 2.10, where the data variation along the horizontal axis is more prominent than that along the vertical axis.
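As a small sketch of this idea (the data values are invented for illustration), principal component analysis of zero-mean data reduces each two-dimensional input vector to its scalar projection onto the dominant eigenvector of the correlation matrix, as in Fig. 2.10:

```python
import numpy as np

# Hypothetical zero-mean data whose variation along the horizontal
# axis dominates, as in Fig. 2.10.
data = np.array([[ 3.0,  0.2], [-3.0, -0.2], [ 2.0,  0.1],
                 [-2.0, -0.1], [ 1.0,  0.0], [-1.0,  0.0]])

C = data.T @ data / len(data)        # correlation matrix of the inputs

# eigh returns eigenvalues of a symmetric matrix in ascending order;
# the last eigenvector spans the direction of maximal variance.
eigvals, eigvecs = np.linalg.eigh(C)
principal = eigvecs[:, -1]

projections = data @ principal       # one-dimensional representation
```

Each data point is now represented by a single coordinate along the straight line of Fig. 2.10.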

A non-competitive unsupervised network for principal component analysis based on Hebb's learning rule was proposed by Oja [18] and generalized by Sanger [19]; for details and references to this work see also [16].

Σ_{t=0..∞} η(t) = +∞    (2.28)

Fig. 2.10. Data projection onto a one-dimensional hyperplane. Each data point will be represented by its lower-dimensional projection onto the straight line. The shaded circle denotes a new input vector for which the projection exists, although the classification error will be large.

2.7.4.2 Quantization techniques I - reduction of the number of training vectors to a variable number of classes of fixed size

The second unsupervised approach for the reduction of the number of input vectors is based on clustering techniques. In order to reduce this number, the neural net categorizes the training vectors into classes or clusters based on the concept of similarity introduced in section 2.6. For the examples we will use the Euclidean distance between two vectors as a measure of similarity.

In classical clustering techniques, such as the ISODATA algorithm [15], clusters are formed by computing the distance between an input vector and already existing clusters.


In unsupervised learning the input vectors of the training set are given, but the corresponding target outputs are not specified. Unsupervised neural nets fall into the same class of tools as statistical non-parametric data analysis, clustering algorithms and encoding or decoding techniques. Their main goal consists in data reduction. The reduction of the data set of input vectors can be achieved in two different ways: either by reducing the dimensionality of the input vector, or by reducing the number of input vectors.

The simplest neural network for unsupervised learning consists of a layer of feed-forward winner-take-all units. For each input vector only one such unit will respond, namely the unit characterized by the maximum output, respectively minimum distance, for this input vector x. The units of the network are thus competing for selection. Only the weights of the winner will be adapted. All input vectors responding to the same unit are said to form a class, and the weight vector of this unit is called the class prototype. Here, the activation function is defined to yield "one" for the maximum output, respectively minimum distance, and "zero" otherwise. This type of learning is commonly referred to as competitive learning.
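A minimal sketch of competitive learning (data, prototype initialization, learning rate and epoch count are all assumptions for illustration): the winning unit is the one with minimum distance to the input, and only its weight vector is pulled toward the input.

```python
import numpy as np

# Hypothetical training vectors forming two well-separated groups.
inputs = [np.array([0.1, 0.0]), np.array([5.0, 5.1]),
          np.array([0.0, 0.2]), np.array([4.9, 5.0]),
          np.array([0.2, 0.1]), np.array([5.1, 4.9])]

# One weight vector per winner-take-all unit, initialized to the
# first two training vectors to avoid units that never win.
w = [inputs[0].copy(), inputs[1].copy()]
eta = 0.2

for epoch in range(20):
    for x in inputs:
        # Competition: the unit with minimum distance to x wins ...
        winner = min(range(len(w)), key=lambda j: np.linalg.norm(x - w[j]))
        # ... and only the winner's weights are adapted toward x.
        w[winner] += eta * (x - w[winner])
```

After training, each weight vector is a class prototype lying near the center of one group of inputs.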

Winner-take-all units are related to the biological concept of grandmother cells because they are responsible for selecting one specific feature, e.g. the feature representing the stereotypical grandmother.



Fig. 2.11. Clustering of data into a variable number of classes of fixed diameter. The centers of the circles, not presented in this figure, represent the class prototypes. The shaded circle denotes a new input vector which does not fall into any of the trained classes.

If the distance between the input vector and the reference vector of an existing cluster is smaller than a previously defined threshold, the new input vector is grouped with this cluster; otherwise, a new cluster is formed. Functionally, a spherical neighborhood is formed around the reference vector of each new cluster. Note that the diameter of the sphere is predetermined, whereas the number of clusters is not. An example of this type of clustering is presented in Fig. 2.11. A similar objective is achieved by the Adaptive Resonance Theory (ART) networks [20].
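A simplified, leader-style sketch of this scheme (not the full ISODATA algorithm; the function name and the threshold value are assumptions) groups each vector with the nearest existing cluster if it lies within the threshold, and opens a new cluster otherwise:

```python
import numpy as np

def threshold_cluster(vectors, radius):
    """Cluster into a variable number of classes of fixed size."""
    prototypes = []                  # reference vector of each cluster
    labels = []
    for x in vectors:
        if prototypes:
            d = [np.linalg.norm(x - p) for p in prototypes]
            best = int(np.argmin(d))
            if d[best] < radius:     # close enough: join this cluster
                labels.append(best)
                continue
        prototypes.append(np.array(x, dtype=float))   # open a new cluster
        labels.append(len(prototypes) - 1)
    return prototypes, labels
```

Here the cluster diameter is predetermined by `radius`, whereas the number of clusters grows with the data, exactly as described above.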

2.7.4.3 Quantization techniques II - reduction of the number of training vectors to a fixed number of classes of variable size

In vector quantization techniques based on the LBG algorithm [21] or on k-means clustering, like the Kohonen network [22], the maximal number of clusters is determined by the number of neurons in the map. The weight vectors are the reference vectors or prototypes of the classes. Here the diameter of a cluster containing the reference vector is not predetermined and the region is, in general, not spherical. Instead, the clusters are large in the regions where the probability density of the input vectors is small, and vice-versa, as shown in Fig. 2.12.
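The k-means iteration underlying such vector quantization can be sketched as follows (the data and the deterministic initialization are assumptions; LBG and Kohonen training differ in detail but share the assignment/update structure):

```python
import numpy as np

# Hypothetical data: several vectors near the origin and two far away,
# so one class ends up small (dense region) and one large (sparse region).
points = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
                   [3.0, 3.0], [5.0, 5.0]])

k = 2                                  # fixed number of classes
centers = points[:k].astype(float)     # simple deterministic initialization

for _ in range(10):
    # Assignment: each vector joins its nearest reference vector.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    # Update: each reference vector moves to the mean of its class.
    for j in range(k):
        if np.any(labels == j):
            centers[j] = points[labels == j].mean(axis=0)
```

The four vectors in the dense region share one small class while the two distant vectors form one large class, illustrating how the cluster size adapts to the data density.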

In the case of simple vector quantization, that is for a Kohonen network with winner-take-all units and no neighbor stimulation, the network minimizes the average distortion error between the input vectors and their reference vectors. The regions correspond to the Voronoi tessellation, and the boundaries of the regions around a cluster are hyperplanes. More details will be presented in the chapter on Kohonen networks. Important results and references can be found in [16, 23].


Fig. 2.12. Tessellation of data into a fixed number of classes of variable diameter. The striped circles represent the class prototypes. The shaded circle denotes a new input vector which falls into one of the trained classes, although the classification error will be large.

2.8 PURPOSE OF TRAINING IN POWER SYSTEM SECURITY ASSESSMENT

Let us illustrate the concepts of supervised and unsupervised learning for a very simple power system, shown in Fig. 2.13, consisting of two generation buses a, b, one load bus c, and three lines ab, ac, bc, whose active power flows Pab, Pac, and Pbc are limited by the maximal active line powers, i.e. Pab max, Pac max and Pbc max.

The operating vector can be chosen to consist of the active line powers (Pab, Pac, Pbc)^T. In this case the secure operating space is defined by a parallelepiped whose boundaries are determined by Pab max, Pac max and Pbc max. It is shown as a shaded cube in Fig. 2.14. For simplicity we will throughout this work refer to this parallelepiped as the security "cube". Operating points inside the shaded cube are secure, points inside but at the border are critical, and operating points outside the shaded cube are insecure because they violate at least one constraint on the maximum admissible line powers.
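This classification rule can be sketched directly (the helper function, its signature, and the use of absolute flow values are assumptions for illustration, not from the text):

```python
def classify_operating_point(p_ab, p_ac, p_bc,
                             p_ab_max, p_ac_max, p_bc_max):
    """Classify an operating vector of active line powers against the
    security 'cube' whose boundaries are the maximal line powers.
    (Hypothetical helper; flow magnitudes are compared to the limits.)"""
    flows = (abs(p_ab), abs(p_ac), abs(p_bc))
    limits = (p_ab_max, p_ac_max, p_bc_max)
    if any(f > lim for f, lim in zip(flows, limits)):
        return "insecure"    # at least one line limit is violated
    if any(f == lim for f, lim in zip(flows, limits)):
        return "critical"    # on the border of the secure cube
    return "secure"          # strictly inside the cube
```

For example, with all limits at 1.0 p.u., the point (0.5, 0.5, 0.5) is secure, (1.0, 0.5, 0.5) is critical and (1.2, 0.5, 0.5) is insecure.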

2.8.1 Supervised learning - parameter estimation techniques

This example is based on several simplifications. Only active powers have been considered. In the general case the cube has to be replaced by a non-linear manifold. Furthermore, not all vectors of the three-dimensional power system operating space shown in Fig. 2.14 represent feasible operating states, since voltage-VAr constraints and Kirchhoff's laws apply for each bus and each line. Nevertheless, the example conveniently illustrates the differences between supervised and unsupervised learning.

Supervised training estimates (approximates) the boundaries of the operating space for the training set and interpolates between known data points.


Fig. 2.13. A 3-bus 3-line linear power system model.


Fig. 2.14. The operating space of the 3-bus 3-line linear power system model.


Fig. 2.15. Limitation of the number of contingencies.

Fig. 2.17. Quantization of the operating space.

Fig. 2.16. Reduction of the dimension of the operating vector.

    2.8.2 Unsupervised learning - subspace techniques

It basically constructs separating hyperplanes (manifolds in the non-linear case) corresponding to the surfaces of the shaded secure cube in Fig. 2.14. An example of this technique as well as several enhancements are discussed in [24].

However, because in the general case the dimension of the operating space is very high (on the order of 500 for a medium-sized power system at the transmission level), it is not feasible to generate a set of operating points which is densely distributed in the operating space and to analyze the operating points with multiple contingency analysis off-line. In order to overcome this "curse of dimensionality", unsupervised learning tackles the dimensionality problem first, based on the two different approaches introduced in the previous section: subspace techniques and quantization techniques.

Although usually discussed on equal terms, there is an important difference between supervised and unsupervised learning. Unsupervised learning helps to organize complex features into classes, whereas supervised learning will then calculate follow-up features for specific classes.

Unsupervised networks can therefore be viewed as a data pre-processing step which reduces the size of the data set before learning the data's characteristics with supervised learning. The Functional Link Net (FLN) is often used in combination with the ART2 network [29]. Other ANNs combining an unsupervised and a supervised step are the Counter-Propagation Network (CPN) [30] and the Radial Basis Functions Network (RBF) [31].

2.8.4 Comparison of supervised and unsupervised learning

    2.8.3 Unsupervised learning - quantization techniques

The second class of unsupervised approaches encountered in power system security assessment are quantization techniques. Fig. 2.17 shows an example of the quantization of the operating space into classes of typical states. Depending on the distance measure used for classification, classes may be hyperboxes, spheres or, in the case of the self-organizing feature map, of a more general form because of the arrangement of neurons on a grid. The classes usually do not divide the cube crisply into secure and insecure areas, but may contain critically high-loaded as well as slightly overloaded cases.

The two different clustering approaches discussed in section 2.7.4 have been applied to security assessment.

In the case of a small space station transmission system, Sobajic et al. [27] quantize the operating space into a variable number of hyperspheres of fixed radius using an unsupervised ART2-like ANN algorithm.

In [28] Kohonen's self-organizing feature map is used for the quantization of the operating space. The maximal number of classes is given by the number of neurons, whose weight vectors represent typical operating states. The size of each class depends on the density of the probability distribution of the training vectors. The operating space is represented on the two-dimensional feature map by secure and insecure regions. This case will be discussed in more detail in the following chapter.

The researchers implemented their approach in a conventional algorithmic manner instead of using Oja's and Sanger's neural net approach.


The simplest subspace technique is the conventional contingency ranking technique. If, for example, the outage of line ab is selected as the most important contingency, the operating space of the linear model is projected onto a two-dimensional subspace as illustrated in Fig. 2.15.

Conventional load flow analysis examines the projection of the base case onto this subspace. Supervised techniques are also applied to construct the boundaries of the projected, reduced security cube, see [25].

Fig. 2.16 shows a more general example of the reduction of the operating space by a lower-dimensional manifold. Depending on the projection used for reduction, the manifold may be a linear or even an orthogonal subspace.

In [26] the principal component analysis method (also called the Karhunen-Loeve expansion) is used to reduce the dimensionality of the training vectors and construct the eigenspace corresponding to the most significant components of the input vector.


TABLE III
ARTIFICIAL NEURAL NETWORK PARAMETERS

Neural net parameters

TABLE IV
ARTIFICIAL NEURAL NETWORK ARCHITECTURE

Architecture          Examples
Layered               Multi-layer perceptron
Fully connected       Hopfield
Lateral connections   Kohonen
Hybrid networks       Radial Basis Functions net, Counter-Propagation net, Boltzmann machine

TABLE V
ARTIFICIAL NEURAL NETWORK PROCESSING

Processing (x, w given, calculate y)   Examples
Feed-forward, feed x once to get y     Adaline, Multi-layer perceptron
Recurrent, iterate x to get y          Kohonen, Hopfield, Diagonally recurrent ANN

2.9 SUMMARY

The CPN combines a Kohonen map layer with a feed-forward layer. In the case of the RBF, clustering can be achieved by any unsupervised learning or the k-means algorithm, and the neurons of the hidden layer are represented by these means. The architecture of the supervised part is a linear feed-forward layer. In contrast to the winner-take-all scheme in the Kohonen network, Gaussian activation functions stimulate several neurons at the same time, and the output of the network is a weighted sum of these activations.
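A forward-pass sketch of such an RBF net (all names, shapes and numerical values are illustrative assumptions): Gaussian activations around the cluster means, followed by a linear output layer.

```python
import numpy as np

def rbf_forward(x, centers, widths, out_weights):
    """Output of an RBF net: weighted sum of Gaussian hidden activations."""
    dists = np.linalg.norm(x - centers, axis=1)
    # Several hidden neurons are stimulated at once, in contrast to the
    # winner-take-all scheme of the Kohonen network.
    activations = np.exp(-dists**2 / (2.0 * widths**2))
    return out_weights @ activations

centers = np.array([[0.0, 0.0], [1.0, 1.0]])   # e.g. k-means cluster means
widths = np.array([0.5, 0.5])                  # spread of each Gaussian
out_weights = np.array([1.0, -1.0])            # linear output layer
y = rbf_forward(np.array([0.0, 0.0]), centers, widths, out_weights)
```

Here the input at the first center activates that unit fully (activation 1) while the distant unit contributes only a small Gaussian tail.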

For security assessment, the combination of an unsupervised step for operating space reduction and a supervised step for operating state classification has been applied by several researchers including [27, 24, 32].

Another example in power systems where supervised and unsupervised networks are employed for data clustering and estimation is the area of load forecasting [33]. A Kohonen network separates the forecasting data into representative classes like summer, winter, autumn and spring, and further into weekdays and holidays (see also [34]). For each class of data a supervised network is then used for load prediction on the class's data points. For a similar purpose, Ranaweera et al. [35] apply the RBF network in the area of load forecasting. Further detailed examples will be discussed in the other tutorial chapters.

    2.10 ACKNOWLEDGMENT

    2.11 REFERENCES

[1] Churchland, P. S. and Sejnowski, T. J., The Computational Brain, The MIT Press, Cambridge, MA, 1992.

The concepts of supervised and unsupervised learning were studied using examples from the area of power system security assessment.

The next chapters of this tutorial will discuss some of the outlined ANN models and their application to power systems in more detail.

TABLE VI
ARTIFICIAL NEURAL NETWORK TRAINING

Training (x given, calculate w)      Examples
Supervised learning (y given)        Delta rule, Back-propagation
Unsupervised learning (no y given)   Principal Component Analysis, Self-organization

The work described in this paper was started at the Swiss Federal Institute of Technology, Lausanne (EPFL), sponsored by EPFL, and was completed at the Jet Propulsion Laboratory, California Institute of Technology, sponsored by the U.S. Department of Energy through an agreement with the National Aeronautics and Space Administration.

Reference herein to any commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology.

Part of this material has been reprinted from [36] with the kind permission of CRL Publishing Ltd.

The interdisciplinary domain of biological and artificial neural networks is still a young discipline which profits immensely from the cross-fertilization of the life, natural and technical sciences. New neural models, as well as the interpretation and improvement of existing ANN within the paradigms of signal processing, statistics or other well-established disciplines, have motivated technical researchers to apply these techniques to a class of power system problems, like for example load forecasting, which so far were difficult to solve in a systematic manner.

It was the goal of this chapter to introduce the technical reader to the terminology and the basic concepts in the area of artificial neural networks in order to illustrate the potential of ANN for technical applications.

In the previous sections it was shown how a simple artificial neural model can be derived from basic biological paradigms. These neural units can be connected to form artificial neural networks. The behavior of the ANN is characterized by its architecture, its processing algorithm and its training algorithm. Tables III-VI give a short overview of the different characteristics of neural networks and a non-exhaustive list of examples.

The architecture can be characterized by the dimension of the input and output vectors, the number of neurons and their weighted connections. The activation function g of the units and the architecture determine how the ANN processes incoming information. The ANN further adapts its behavior (i.e. its weights) according to a learning algorithm which itself may be governed by a learning rate η.

We have discussed learning as an optimization task and we have pointed out common and distinct features between learning and optimization.


[2] Hertz, J., Krogh, A. and Palmer, R. G., Introduction to the Theory of Neural Computing, Addison-Wesley, Reading, MA, 1991.

[3] Niebur, D. et al., "Neural network applications in power systems," Int. Journal of Engineering Intelligent Systems, vol. 1, no. 3, December 1993, 133-158.

[4] Cajal, S. R., "La fine structure des centres nerveux," Procs. Roy. Soc. Lond., 55, 1894, 444-468.

[5] McCulloch, W. S. and Pitts, W. H., "A logical calculus of the ideas immanent in nervous activity," Bulletin of Mathematical Biophysics, vol. 5, 1943, 115-133.

[6] Hopfield, J. J., "Neural networks and physical systems with emergent collective computational abilities," Procs. Nat. Acad. Sc., 79, 1982, 2554-2558.

[7] Niebur, D., "An example of unsupervised networks - Kohonen's self-organizing feature map," IEEE-PES Tutorial on Artificial Neural Networks for Power Systems, to be published, 1996.

[8] Hebb, D. O., The Organization of Behavior: A Neuropsychological Theory, Wiley and Sons, New York, 1949.

[9] Cichocki, A. and Unbehauen, R., Neural Networks for Optimization and Signal Processing, John Wiley and Sons, Chichester, 1993.

[10] Werbos, P. J., "Beyond regression: New tools for prediction and analysis in the behavioral sciences," Doctoral Dissertation, Appl. Math., Harvard University, November 1974.

    [11] Rumelhart, D. E., Hinton, G