

Learning Control Strategies for Chemical Processes: A Distributed Approach

Riyaz Sikora, University of Illinois

The problem of inducing rules from examples has received much attention, and such algorithms as version spaces,1 AQ,2 ID3,3 and PLS1 4 have been successful (PLS1 is described in a sidebar on p. 36).

Similarity-based learning methods fall into two categories.5,6 Some methods, including PLS1 and ID3, use data to instantiate a parameterized model that is defined over an instance-space. An instance is a data point or unit of a sample and is represented as a list of attribute values. The attributes define an instance-space in which each attribute is a distinct dimension; for example, (Age, Height). The problem here is to fit the data to a model of the function.

Other methods such as genetic algorithms use data to select a candidate concept defined over a hypothesis-space, that is, the space of all possible concepts, with each point representing a whole concept. The problem here is to optimize some measure of hypothesis quality. Instance-space algorithms are generally fast; hypothesis-space algorithms are slow but stable.

All these algorithms operate on complete data sets to find the concept or rule that explains that data. Taking a different approach, Michael Shaw and I designed the Distributed Learning System (DLS), which combines the features of instance-space and hypothesis-space methods.

THE DISTRIBUTED LEARNING SYSTEM COMBINES SPLIT-BASED AND GENETIC ALGORITHMS. SUCH HYBRID MACHINE-LEARNING METHODS SHOW GREAT PROMISE WHEN AN INDUCTIVE LEARNER CANNOT REPRESENT A CONCEPT COVERING ALL POSITIVE EXAMPLES AND EXCLUDING ALL NEGATIVE EXAMPLES.

This algorithm decomposes a data set of training examples into subsets. After applying an inductive learning program on each subset, it synthesizes the results using a genetic algorithm (see the sidebar on p. 37). This parallel distributed approach is more efficient, since each inductive learning program works on only a subset of data. Also, since the genetic algorithm searches globally in the hypothesis space, this approach gives a more accurate concept description. We implemented DLS in Common Lisp on a Texas Instruments Explorer machine.

The problem

Since it will be easier to discuss how DLS works from the perspective of a specific application, consider the process control problem to which I applied this approach. A process in a particular chemical plant produces an undesirable by-product that is not in the data and is not directly measurable. An expensive chemical, which we'll call Fix, can be added in just sufficient quantities to remove the by-product chemically. The goal is to change the controllable process variables V1, V2, ..., V9 to reduce the amount of needed chemical Fix. Since it is not possible to derive mathematical equations linking the amount of Fix used to the process variables V1 to V9, we used inductive learning to generate decision rules explaining when the amount of Fix is minimal. Each example in the data set consists of a set of plant readings showing the values of all the process variables and the amount of Fix being used. To dichotomize the decisions, the factory experts used a threshold value of 0.35 (all values were scaled between 0 and 1 to protect the process information). We converted the problem to a single-concept learning problem by treating the amount of Fix as a class membership variable: positive when it is below the 0.35 threshold, else negative. The data set has 574 sets of examples, each taken at a different time.

The most common criterion for evaluating a learning algorithm is its prediction accuracy. However, since the objective of this study was to learn strategies for controlling the use of a chemical, it was more important that the rules or strategies learned were concise and easy to implement.
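As a concrete illustration of the dichotomization described above, the short sketch below labels a scaled plant reading by the 0.35 threshold on Fix. The field names (v1..v9, fix) are illustrative assumptions about the data layout, not the plant's actual record format.

FIX_THRESHOLD = 0.35  # threshold chosen by the factory experts (scaled data)

def label_example(reading):
    """Turn one plant reading into an (attribute vector, class) pair.
    `reading` is assumed to be a dict holding the nine scaled process variables
    under 'v1'..'v9' and the scaled amount of Fix under 'fix'."""
    x = [reading["v%d" % i] for i in range(1, 10)]
    positive = reading["fix"] < FIX_THRESHOLD  # positive = Fix usage is low
    return x, 1 if positive else 0

# A single hypothetical scaled reading:
example = {"v%d" % i: 0.5 for i in range(1, 10)}
example["fix"] = 0.21
print(label_example(example))  # nine attribute values and class 1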



The PLS1 algorithm

The PLS1 algorithm follows an inductive process starting with the entire space of possible events (called the feature space).

It splits the space into two regions: those more likely to be in a specific class (positive events), and those more likely to be in other classes (negative events). The splitting continues until a stopping criterion is satisfied. Each split uses only one attribute chosen according to an information-theoretic approach. In each iteration, the region R in the feature space can be defined by a tuple (r, u, e):

- r is the region or disjunct, represented as a conjunction of conditions. This representation resembles DLS, in which a disjunction of regions constitutes a concept or hypothesis.

- u is the utility function, that is, the ratio of positive events to the total events covered by the disjunct. Since the algorithm's purpose is to maximize the dissimilarity between disjuncts, each split is made to maximize the difference in the utilities of the two disjuncts (known as the distance function).

- e is the error rate associated with a disjunct. It is based on the number of positive events covered by the disjunct as compared to the total number of positive events (always less than 1).

The distance function dist is defined as

dist = |log u1 - log u2| - f x log(e1 x e2)

where u1 and u2 are utilities for a tentative region dichotomy, e1 and e2 are their respective error factors, and f is a constant representing the degree of confidence. Larger values of dist correspond to higher dissimilarity.

Let S be the set of positive and negative training events, and let R be the region that contains all events in S. The PLS1 algorithm follows these steps:


Begin
  While any trial hyperplane remains untested, do
    Choose a hyperplane not previously selected to become a tentative boundary for two subregions of R, r1 and r2.
    Using the events from S, determine the utilities u1 and u2 of r1 and r2, and their error factors e1 and e2.
    If this tentative dichotomy produces a dissimilarity d larger than any previous value for d
      Then create two permanent regions R1 = (r1, u1, e1) and R2 = (r2, u2, e2) having the (previously recorded) common boundary that gives the most dissimilar probabilities;
      Else place R in the defined region set to be output, and quit.
End.
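The sidebar's procedure can be sketched in a few lines of Python. This is a simplified reading of the steps above, not Rendell's actual PLS1 code: the candidate "hyperplanes" are taken to be axis-parallel thresholds on single attributes, and the utility and error factors follow the definitions given in the sidebar.

import math

def split_region(events, f=1.0):
    """One PLS1-style split over a list of (x, label) events, where x is a list
    of numeric attribute values and label is 1 (positive) or 0 (negative)."""
    total_pos = sum(label for _, label in events) or 1
    eps = 1e-9  # guard against log(0) for purely negative regions

    def stats(region):
        pos = sum(label for _, label in region)
        u = pos / len(region)      # utility: positives / events covered
        e = pos / total_pos        # error factor as defined in the sidebar
        return max(u, eps), max(e, eps)

    best = None
    for j in range(len(events[0][0])):
        for threshold in sorted({x[j] for x, _ in events}):
            r1 = [ev for ev in events if ev[0][j] <= threshold]
            r2 = [ev for ev in events if ev[0][j] > threshold]
            if not r1 or not r2:
                continue
            (u1, e1), (u2, e2) = stats(r1), stats(r2)
            dist = abs(math.log(u1) - math.log(u2)) - f * math.log(e1 * e2)
            if best is None or dist > best[0]:
                best = (dist, j, threshold, r1, r2)
    return best  # (dist, attribute, threshold, region 1, region 2) or None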


Consider a general learning algorithm that takes the training data set (X, u) as its input and finds a concept u(X) that explains it. For each instance of (X, u), X is the vector of attribute values and u is the corresponding classification variable (usually a binary variable with 1 for positive and 0 for negative examples). The output concept, u(X), is the classification variable as a function of the attribute vector. We can thus address the problem of learning the concept as the problem of finding the function u(X).

Figure 1. The Distributed Learning System: the data set is decomposed into subsamples, each subsample is given to an inductive learner, and the learners' outputs are synthesized by a genetic algorithm.


DLS uses inductive learning programs to generate hypotheses that can be input into the genetic algorithm. By providing a useful initial population, these inputs decrease the time needed to reach the correct concept. The genetic algorithm searches globally and with implicit parallelism, which improves the quality of the learning results. Figure 1 shows a functional diagram of DLS, where (X, u) is represented as P and (X, u)i, the ith subsample, is represented by Pi.

Decomposing the data set. We decompose the data set into subsamples, since averaging a statistic over subsamples gives a better estimate than does calculating the statistic from the whole data set P. We use the jackknife technique of drawing random samples,8 in which one or more data points are removed from the data set P to get a subsample Pi. This is repeated to get more subsamples Pi (i = 1, 2, ..., n). Specifically, we used the leave-out-r technique (0 ≤ r < 1), which obtains a random subsample where each example in the original data set has a probability of (1 - r) of being included in the subsample.

We also use a decomposability index d to calculate the amount of overlap among subsamples. It is defined (for a particular decomposition of the data set) as the ratio of the average number of examples in each subsample to the total number of examples in the data set: d = Num x (1 - r) / Num = (1 - r), where Num is the total number of examples. In general, there is no overlap when d = 1/n (where n is the number of subsamples).

PLS1 programs within DLS use each subset to generate a concept to be passed to the genetic algorithm. To compare the concepts based on their scope, we defined an index s as the ratio of the number of examples covered by a given concept or rule to the number of all possible examples. In other words, s is the fraction of instance-space covered. Thus, if s1 > s2, then a
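A sketch of the leave-out-r decomposition as described here, assuming the data set is simply a Python list; this is an illustration, not the original Common Lisp code.

import random

def leave_out_r(examples, r, rng):
    """One jackknife-style subsample: each example is kept with probability (1 - r)."""
    return [ex for ex in examples if rng.random() >= r]

def decompose(examples, n, r, seed=0):
    """Draw n subsamples P1..Pn and report the realized decomposability index d,
    the average subsample size divided by the size of the full data set."""
    rng = random.Random(seed)
    subsamples = [leave_out_r(examples, r, rng) for _ in range(n)]
    d = sum(len(s) for s in subsamples) / (n * len(examples))
    return subsamples, d

# n = 5 with r = 0.8 gives an expected d of 0.2 = 1/n, the minimal-overlap
# setting used for the chemical-process experiment.
subs, d = decompose(list(range(574)), n=5, r=0.8)
print(len(subs), round(d, 2))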



Genetic algorithms

Genetic algorithms are adaptive parallel-search algorithms that can locate global maxima without getting trapped in local maxima. Goldberg describes genetic algorithms as search algorithms based on the mechanics of natural selection and natural genetics.

A genetic algorithm includes

(1) a chromosomal representation of a solution to the problem;
(2) a way to create an initial population of solutions;
(3) an evaluation function that rates solutions in terms of fitness; and
(4) genetic operators that alter the composition of solutions during reproduction.

Starting from an initial population of solutions, the genetic algorithm works with one population of solutions at a time. The algorithm evaluates each solution and assigns it a fitness score. By applying recombination and genetic operators to the old population, the algorithm generates a new population of solutions, which it then explores.

Three genetic operators are commonly used. The reproduction operator duplicates the members of the population (solutions) that will be used to derive new members. The number of copies of each member is proportional to its fitness score. After reproduction, new individuals are generated by selecting two individuals at a time from the resulting population and applying the crossover operator. This exchanges genes between the two selected individuals (parents) to form two different individuals, and it is usually applied with a constant probability Pc. The mutation operator randomly changes some genes in a selected individual. It is applied much less often, at a rate of Pm (where Pm is much smaller than Pc).


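One generation of the cycle described in this sidebar might look like the sketch below. The operator choices here (fitness-proportional selection, single-point crossover, bit-flip mutation) are generic textbook ones for illustration; the DLS experiments reported later use SUS reproduction and uniform crossover instead.

import random

def next_generation(population, fitness, pc=0.7, pm=0.05, seed=0):
    """One generation over a population of bit-string genotypes (lists of 0/1)."""
    rng = random.Random(seed)
    # Reproduction: copy members with probability proportional to fitness.
    scores = [fitness(ind) + 1e-9 for ind in population]
    pool = rng.choices(population, weights=scores, k=len(population))
    new_population = []
    for i in range(0, len(pool) - 1, 2):
        a, b = pool[i][:], pool[i + 1][:]
        # Crossover: swap the genes past a random cut point with probability pc.
        if rng.random() < pc:
            cut = rng.randrange(1, len(a))
            a[cut:], b[cut:] = b[cut:], a[cut:]
        # Mutation: flip each gene with the much smaller probability pm.
        for child in (a, b):
            for g in range(len(child)):
                if rng.random() < pm:
                    child[g] = 1 - child[g]
            new_population.append(child)
    return new_population

pop = [[random.randint(0, 1) for _ in range(12)] for _ in range(10)]
print(next_generation(pop, fitness=sum)[:2])  # count of 1-bits as a toy fitness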


If a particular cj,i is not present in Ci, it implies that the attribute j can take any value in Ci. In the above case, for example, there is no c1,1 in C1, implying that the attribute V1 can take any value in C1.

C1 and C2 are control strategies for keeping the value of chemical Fix below 0.35. Stated differently, the control strategy C1 keeps the control variable V3 above 0.262 and V6 above 0.071 to keep the value of Fix below the 0.35 threshold. In this application, one of the main criteria in judging a concept is ease of use: We want to have a minimum number of disjuncts k (that is, a minimum k-DNF concept) and a minimum number of terms, cj,i, in each disjunct Ci.

PLS1 uses this type of representation. The genetic algorithm in DLS must represent control strategies in a way compatible with the PLS1 representation. I obtained the genetic algorithm's representation (called its genotype) by mapping the characteristics (that is, the phenotype) of Ci = (c1,i ∧ ... ∧ cm,i) into binary form (see the sidebar on p. 37). For example, if the problem has two attributes A1 and A2, and if we use a 3-bit binary representation to convert the integers into binary form, then a disjunct Ci = c1,i ∧ c2,i, where c1,i = (1 ≤ A1 ≤ 3) and c2,i = (4 ≤ A2 ≤ 2), can be represented as

((1 3) (4 2))                   ---- phenotype
(0 0 1  0 1 1  1 0 0  0 1 0)    ---- genotype
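The phenotype-to-genotype mapping can be sketched as below, assuming (as in the example) that each interval bound is an integer encoded in a fixed number of bits; this is an illustration of the mapping, not the article's Lisp code.

def encode(phenotype, bits=3):
    """Map a disjunct's integer interval bounds, e.g. ((1, 3), (4, 2)), to a bit list."""
    genotype = []
    for bounds in phenotype:
        for value in bounds:
            genotype.extend(int(b) for b in format(value, "0%db" % bits))
    return genotype

def decode(genotype, bits=3):
    """Inverse mapping: regroup the bits into per-attribute bound pairs."""
    values = [int("".join(str(b) for b in genotype[i:i + bits]), 2)
              for i in range(0, len(genotype), bits)]
    return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]

print(encode([(1, 3), (4, 2)]))          # [0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0], as in the text
print(decode(encode([(1, 3), (4, 2)])))  # [(1, 3), (4, 2)]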

Deriving rules. Genetic algorithms can be used to derive rules in two ways: by letting each member in the population represent a complete concept (C1 ∨ C2 ∨ ... ∨ Cp), or by letting each member be a single disjunct Ci = (c1,i ∧ ... ∧ cm,i). DLS uses the second method because, unlike the first method, it does not have to know the number of disjuncts beforehand. If the data set is decomposed into n subsets and each subset is used by a PLS1 program, then the PLS1 programs will generate n concepts, C^1, C^2, ..., C^n, where each C^i = C^i_1 ∨ ... ∨ C^i_mi. The genetic algorithm takes each C^i_j (for j = 1 to mi) from C^i (for i = 1 to n) as one member of the population. The initial population input to the genetic algorithm therefore has a total of m1 + m2 + ... + mn members, represented as genotypes. Thus, translating the output of the inductive-learning component into input for the genetic algorithm merely consists of breaking each concept C^i into its members and converting each C^i_j into its binary genotype.

The algorithm tries to find the best possible hypothesis (or disjunct) by applying genetic operators to the initial population. It then retains the best hypothesis, removes the positive examples that are covered, and repeats the process to find a new hypothesis that covers as many of the remaining positive instances as possible. The process terminates either when all the positive instances are covered, or when the algorithm cannot find a disjunct covering a certain threshold of positive examples (I kept the threshold at 5 for this application). The final concept is the disjunction of all the hypotheses found. This method of searching the instance-space is called explanation-based filtering.5
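The covering loop described here might look like the following sketch. The genetic search over hypotheses is stubbed out (it simply picks the best disjunct from a supplied candidate pool), and the scoring uses the fitness function derived in the next section; both simplifications are assumptions for illustration.

def covers(disjunct, x):
    """A disjunct maps attribute index -> (low, high); x is an attribute vector."""
    return all(low <= x[j] <= high for j, (low, high) in disjunct.items())

def learn_concept(candidates, positives, negatives, min_covered=5):
    """Keep the best disjunct, drop the positives it covers, and repeat until no
    candidate covers at least `min_covered` remaining positives (5 in the article)."""
    concept, remaining = [], list(positives)
    while remaining:
        def score(d):
            pos = sum(covers(d, x) for x in remaining)
            neg = sum(covers(d, x) for x in negatives)
            return pos + len(negatives) - neg, pos   # F = pos + Neg - neg
        best = max(candidates, key=lambda d: score(d)[0])
        if score(best)[1] < min_covered:
            break
        concept.append(best)
        remaining = [x for x in remaining if not covers(best, x)]
    return concept  # the final concept is the disjunction of these hypotheses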

THE WRONG BIAS CAN MAKE LEARNING INEFFICIENT OR A CONCEPT HARD TO LEARN. COMBINING ALGORITHMS CAN ENSURE THAT WE AVOID THESE PROBLEMS.

The genetic algorithm's fitness function. The goal of the ideal learning system is to obtain a concept that covers all the positive examples without covering any negative examples. Due to noise in real-world data, we have to relax this requirement and allow a few negative examples to be covered as well. On the other hand, the positive and negative examples in the application described here constitute two classes that are exclusive of each other. When the learning program gives a concept C for the positive examples, we also want C* (the complement of C) to represent the concept for the negative examples. Thus a concept C actually represents two mutually exclusive classes and hence is associated with two accuracy terms. Consequently, the evaluation function should maximize both accuracy terms.

The two accuracy terms are defined as

F1 = pos / Pos    and    F2 = (Neg - neg) / Neg

where Pos is the total number of positive examples, Neg is the total number of negative examples, pos is the number of positive examples covered by the concept, and neg is the number of negative examples covered. The multiobjective function is reduced to a single-objective function by taking a convex combination of F1 and F2:

F = a x F1 + (1 - a) x F2

where a is any number (0 ≤ a ≤ 1). Let a = Pos / (Pos + Neg), so that the two accuracy terms are weighted proportionally to their class representativeness. The objective function then becomes

F = (pos + Neg - neg) / (Pos + Neg)

Multiplying by the constant (Pos + Neg), we get the fitness function F = (pos + Neg - neg), where 0 ≤ F ≤ (Pos + Neg).
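In code, the fitness reduces to the count below; the rule representation (a mapping from attribute index to an allowed interval) is an assumption for illustration.

def fitness(rule, positives, negatives):
    """F = pos + Neg - neg: positives covered plus negatives correctly excluded."""
    def covered(x):
        return all(low <= x[j] <= high for j, (low, high) in rule.items())
    pos = sum(covered(x) for x in positives)   # positive examples covered
    neg = sum(covered(x) for x in negatives)   # negative examples wrongly covered
    return pos + len(negatives) - neg          # ranges from 0 up to Pos + Neg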


Table 1. The six best control strategies learned by C4.5. The classification column gives (positive, negative) training cases covered.

      V1      V2  V3      V4  V5      V6      V7      V8              V9      Classification  s index
C1    >0.028  -   >0.256  -   -       >0.043  ≤0.405  ≤0.815          >0.457  (66, 7)         0.126
C2    ≤0.028  -   >0.256  -   -       >0.043  ≤0.405  ≤0.889          -       (47, 7)         0.007
C3    >0.028  -   >0.256  -   >0.616  >0.043  ≤0.366  ≤0.815          ≤0.457  (38, 7)         0.037
C4    ≤0.028  -   >0.633  -   ≤0.818  >0.043  ≤0.405  ≤0.519          ≤0.439  (20, 7)         0.026
C5    >0.028  -   >0.256  -   ≤0.768  >0.043  ≤0.405  [0.815, 0.889]  >0.439  (20, 20)        0.009
C6    -       -   >0.256  -   -       >0.043  >0.601  >0.444          -       (14, 6)         0.160

Table 2. The six best control strategies learned by PLS1. The classification and prediction columns give (positive, negative) cases covered in the training and testing sets, respectively.

      V1      V2  V3              V4    V5              V6      V7              V8              V9  Classification  Prediction  s index
C1'   -       -   ≥0.373          -     ≥0.437          ≥0.071  ≤0.564          ≤0.722          -   (183, 14)       (41, 6)     0.134
C2'   -       -   ≥0.373          -     [0.437, 0.802]  ≥0.071  [0.071, 0.579]  ≥0.722          -   (32, 3)         (10, 4)     0.030
C3'   -       -   ≥0.373          -     ≥0.437          ≥0.071  ≥0.722          [0.516, 0.722]  -   (25, 0)         (3, 2)      0.019
C4'   -       -   ≥0.373          ≥0.5  ≥0.437          ≥0.071  [0.563, 0.786]  [0.484, 0.722]  -   (18, 6)         (3, 2)      0.009
C5'   -       -   ≥0.373          ≤0.5  ≥0.437          ≥0.071  [0.564, 0.786]  [0.484, 0.722]  -   (7, 1)          (1, 2)      0.002
C6'   ≤0.183  -   [0.373, 0.627]  -     ≥0.802          ≥0.071  [0.071, 0.579]  ≥0.722          -   (6, 2)          (2, 3)      0.007

The best strategy, C1, calls for keeping V1 above 0.028, V3 above 0.256, V6 above 0.043, V7 at or below 0.405, V8 at or below 0.815, and V9 above 0.457. In other words, the strategy imposes constraints on six of the nine process variables. The s index is one measure of this restrictiveness. If the strategy is very restrictive, that is, if it calls for severe control of all variables, the s index will be low. In this case, the best strategy (C1) has an s index of 0.126 and explains only 66 of the 290 cases where the value of Fix is below 0.35 (positive examples), but wrongly explains seven of the 168 cases where the value of Fix is above 0.35 (negative examples).

While the tree's prediction accuracy was 78.3 percent, Table 1 does not include a column for prediction, since C4.5 does not predict results for individual leaves of the tree.

PLS1. PLS1 describes its concepts in the required disjunctive normal form. The variable sig is the significance level (between 0 and 1) corresponding to the approximate noise level in the data set. The concept given by the PLS1 program for sig = 1 had a size of 18, and its prediction accuracy was 77.2 percent. Table 2 presents the six best control strategies of the 18 that PLS1 developed.

The best strategy, C1', constrains five of the nine process variables and has an s index (scope) of 0.134, about the same as that of the best strategy from C4.5. It explains 183 of the 290 positive cases but also covers 14 of the 168 negative cases, a much better rule than the best one from C4.5. In terms of prediction, C1' correctly predicts 41 of the testing set's 65 positive cases but wrongly covers six of the 49 negative cases. Interestingly, the s indices generally decrease as the rules become weaker.

DLS. For this comparison, I used the parameter values n = 5 and d = 0.2. For the genetic algorithm, I used the Stochastic Universal Sampling (SUS) algorithm9 for reproduction, a uniform crossover operator10 with a 0.7 probability, a 0.05 probability of mutation, and 100 generations. The population size of the genetic algorithm was determined by the PLS1 programs' output and was found to be proportional to the value of d for each n. I scaled the real-valued data set (between 0 and 1) to integer values from 0 to 63 so that the genetic algorithm could use a 6-bit binary representation for the genotypes. Later, I transformed the resulting concept given by DLS back to the original real-valued range. For ease of understanding, DLS outputs are reported as infinite intervals (even though they are actually finite closed intervals) if either end point of an interval corresponds to the domain's lower or upper limit.
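The 6-bit scaling step mentioned here is straightforward; a sketch, assuming values were already normalized to [0, 1]:

def to_gene(value, bits=6):
    """Map a scaled reading in [0, 1] to an integer 0..63 for the 6-bit genotype."""
    return round(value * ((1 << bits) - 1))

def to_real(gene, bits=6):
    """Map a 6-bit integer back to the original [0, 1] range."""
    return gene / ((1 << bits) - 1)

print(to_gene(0.262), round(to_real(to_gene(0.262)), 3))  # 17 and roughly 0.27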



Table 3. The control strategies learned by DLS. The classification and prediction columns give (positive, negative) cases covered in the training and testing sets, respectively.

       V1  V2  V3      V4  V5      V6              V7      V8             V9              Classification  Prediction  s index
C1''   -   -   ≥0.262  -   -       ≥0.071          ≤0.579  -              -               (235, 39)       (55, 14)    0.397
C2''   -   -   ≥0.373  -   ≥0.437  [0.071, 0.754]  ≥0.595  [0.42, 0.722]  [0.516, 0.706]  (42, 11)        (6, 6)      0.006

Figure 2. Plot of accuracy of PLS1 and DLS for various values of n and d (curves for DLS with n = 2, n = 5, and n = 7, and for PLS1 with sig = 0 and sig = 1, plotted against d).

Taking a closer look at Tables 1 and 2, we can see patterns across the control strategies generated by C4.5 and PLS1. For example, the strategies have no constraints on the process variable V2, all place a lower bound on the value of V3 and V6, and so on. It is exactly these generalizations that DLS captures in its strategies. It has a lower bound on V3 that falls between the lower bounds produced by C4.5 and PLS1. Although it is not obvious, the strategies generated by DLS combine and refine those generated by PLS1. For example, taking the union of C1' and C2' from PLS1's output would eliminate the variable V8 and place an upper bound of 0.579 on V7, which is exactly what DLS does. DLS has combined C1' and C2' from the output of PLS1, modified it to remove the restriction on the variable V5, and relaxed the lower bound on V3 to 0.262 to produce a more powerful rule C1''. This demonstrates DLS's ability to refine rules, that is, to combine and modify one or more rules from PLS1's output to form a more concise and powerful rule.

The concept learned by DLS, shown in Table 3, has only two control strategies, as compared to 18 from PLS1 and 20 from C4.5. The best DLS strategy, C1'', calls for maintaining the process variable V3 above 0.262, V6 above 0.071, and V7 below 0.579. Because it restricts only three of the nine process variables, C1'' is easy to implement, and it has an s index of about 0.397 compared with 0.134 for PLS1's output and 0.126 for C4.5's output. In terms of classification, C1'' explains 235 of the 290 positive cases, much more than any strategy from the other two tests, but at the cost of also covering 39 of the 168 negative examples.

Table 4. Overall comparison of C4.5, PLS1, and DLS results.

Learning method        Prediction accuracy (percent)  Number of rules / control strategies
C4.5                   78.3                           20
PLS1 (sig = 1)         77.2                           18
DLS (n = 5, d = 0.2)   79                             2

In terms of prediction, C1'' predicts 55 of the 65 positive test cases, again much better than the previous results, but also wrongly predicts 14 of the 49 negative test cases. Overall, the prediction accuracy of the DLS results is 79 percent. DLS stops learning after covering 277 of the 290 positive training-set examples, because the genetic algorithm cannot find a new strategy explaining more than the threshold of five positive examples.

The results of the experiment show that DLS's control strategy is about 90 percent more concise than those produced by PLS1 and C4.5, and its predictions are more accurate. Table 4 compares the overall concepts generated by each method.

An empirical study

Two DLS parameters, the number of subsamples n and the decomposability index d, must be tuned to achieve good performance. Using the same data set from the chemical process control problem, Michael Shaw and I experimented further to gauge the effect of changing these parameters on system performance. We randomly divided the data set into a training set of 458 and a testing set of 114 examples. We used the same parameter values for the genetic


algorithm as in the first set of experiments, and a different training and testing set in each of five runs, averaging the results.

Figures 2 and 3 show the prediction accuracy and CPU time, respectively, of PLS1 and DLS for different values of n as a function of d. For each value of n, DLS is generally the most accurate when the average amount of overlap between subsamples is minimal, that is, around d = 1/n. Also, the peak performances when n = 2 and n = 5 are better than PLS1 in terms of prediction accuracy, and the CPU times are comparable. The average size of the concept for DLS was approximately three, and for PLS1 it was 49.8 for sig = 0 and 17.4 for sig = 1. Thus the accuracy for n = 5 and d = 0.2 is comparable to that of PLS1 (about 2 percent better), but rule size improves by about 82 percent. However, the prediction accuracy of DLS starts to decrease as the value of n increases, due to the loss of data representativeness as the data set is decomposed into too many insignificant subsamples. The performance of DLS thus depends on both the parameters n and d.

Other applications

We also applied this approach to financial problems5 using real-world data sets for bankruptcy analysis, loan default evaluation, and loan risk classification. An example on loan default evaluation again demonstrates DLS's ability to refine rules. We used this data set to classify firms into those that would default on loan payments and those that would not. Our testing set included 16 different positive instances, and a data set of 32 examples, of which 16 belonged to the default class and the other 16 to the nondefault class. We set the parameter values at n = 20, d = 0.6, the probability of crossover = 0.7, and the probability of mutation = 0.01. Since the testing set had only positive examples, one could argue that the best rule would be "all firms default." However, this involves using information about the testing set prior to learning.

Figure 4 lists the rules obtained by PLS1 and DLS, along with the number of instances they correctly predicted. The PLS1 attribute (total-debt to total-assets ratio) has the value of ≤0.763 in C1' and >0.763 in C2', with the value of (current-asset to current-liability ratio) remaining the same in both rules.

Figure 3. Plot of CPU time of PLS1 and DLS for various values of n and d (curves for DLS with n = 2, n = 5, and n = 7, and for PLS1 with sig = 0 and sig = 1, plotted against d).

PLS1 rules (instances correctly predicted):

C1': If [(total-debt to total-assets ratio ≤ 0.763) And (current-asset to current-liability ratio ≥ 1.967)] Then the firm would default on loan payment.   (6)

C2': If [(net-income to total-assets ratio ≥ -0.015) And (total-debt to total-assets ratio > 0.763) And (current-asset to current-liability ratio ≤ 1.967)] Then the firm would default on loan payment.   (4)

DLS rule:

C1'': If [(1.967

Figure 4. Rules obtained by PLS1 and DLS for the loan default data, with the number of instances correctly predicted.


DLS combined these PLS1 rules to get the simpler rule C1'', which eliminates the variable (total-debt to total-assets ratio). Thus DLS reduces the rule size from four to three and improves the prediction accuracy from 68.8 percent (for PLS1) to 93.8 percent. The predictions of DLS's rule C1'', 14 out of 16 default loans, outnumber the combined predictions of PLS1's output.

Limitations and future work

While this distributed-learning algorithm improves performance, it has limitations.

Real-coded and symbolic-coded genetic algorithms. Since genetic algorithms work on strings of 0s and 1s, our method is currently limited to problems where the inductive learner's output can be translated easily into binary code. Although we used a binary-coded genetic algorithm in DLS, genetic algorithms do work on more than binary alphabets. I am working on replacing the binary-coded genetic algorithm with a real-coded one; that is, one that works directly on real intervals. The only changes needed are slightly different crossover and mutation operators:

• The uniform crossover operator can be used with representations involving real intervals by treating each attribute's interval range as a single entity (see Figure 5).
• The mutation operator can be designed as a specialization/generalization operator that either increases or decreases (with equal probability) the interval range of an attribute.

The same argument also applies to problems involving symbolic values. Instead of using intervals, the genetic algorithm would use sets to represent symbolic variables. The crossover operator would remain the same, but the mutation operator would have to be changed. The new mutation operator could simply replace one value of the symbolic variable with another value from its domain. For example, if color is one of the attributes, and if the domain of color is {red, blue, white, yellow}, then the mutation operator would change the current value of color in the concept by either adding a color from the domain or removing an existing color.

Handling complex representation languages. Although the task is nontrivial, we should also be able to extend DLS to handle complex representation languages based on first-order predicates. This would require developing new genetic operators (crossover and mutation) that respect language syntax. We could use John Koza's genetic-programming paradigm, in which populations of computer programs are genetically bred using a crossover (recombination) operator appropriate for genetically mating computer programs. Many seemingly different problems in artificial intelligence, symbolic processing, and machine learning require computer programs that produce a desired output for particular inputs.

DIFFERENT ALGORITHMS USE DIFFERENT BIASES (IMPLICITLY OR EXPLICITLY), PROVIDING DLS WITH THE UNIQUE APPROACH OF USING MULTIPLE BIASES.

Applying this idea to DLS, we recognize that the outputs of inductive learners can be thought of as programs that take an object as input and give that object's classification as output. For example, the learner's output could be in the form of a schema (as in explanation-based learning) for a particular concept. Given an object as its input, the schema would determine whether the object belongs to the concept. Extending DLS to handle languages based on first-order predicates would require mapping a learner's output to a function or program (such as a Lisp s-expression), and incorporating into the genetic algorithm special genetic operators similar to the ones used in the genetic-programming paradigm.

This argument also applies to representation languages based on decision trees. The trees generated by several inductive learners (ID3, for example) can be mapped to Lisp s-expressions as input and combined to make a more concise tree. The genetic algorithm can then operate on the s-expressions by applying recombination operators. Koza's example shows how a genetic algorithm learns the tree for a symbolic problem involving the attributes Temperature, Humidity, Outlook, and Windy.3

Attribute-based decomposition. Another limitation of DLS is the way in which it decomposes its data set: Since it allocates a fraction of the examples to each inductive learner, DLS might not be effective if the data set is small.

However, we could instead use an attribute-based decomposition scheme where each learning program gets only the data corresponding to a subset of attributes. This might be especially useful in real-world situations where data is spatially distributed (as in air traffic control). In these cases it is more efficient to use the inductive learners where the data resides and then synthesize the results, rather than collect the data at one place and then analyze it.

In this method of decomposition, however, the results of the inductive learners are underspecified with respect to the whole problem. This can be overcome by filling the unspecified portions of the partial solutions with the complete domains of the respective attributes. We could then use the genetic algorithm on these complete solutions.

For example, let's assume that four attributes, A1, A2, A3, and A4, are distributed two each to two programs. The learners generate partial solutions that give a range for a particular attribute; for example, ((0, 3.5) (-8.7, 3)) could be partial solution 1 using the first two attributes, and ((9, 12.5) (3, 8.1)) could be partial solution 2 using the next two attributes. In other words, partial solution 1 says that [(0 ≤ A1 ≤ 3.5) and (-8.7 ≤ A2 ≤ 3)], and partial solution 2 says that [(9 ≤ A3 ≤ 12.5) and (3 ≤ A4 ≤ 8.1)]. If we know the domains of the respective attributes (which we usually do in real-world problems), we can expand the partial solutions by combining them with the domains of the missing attributes (that is, the don't cares). Say, for example, the domains of the attributes are (0, 10), (-10, 10), (0, 20), (0, 10), respectively. Then the above partial solutions can be converted into complete solutions:

((0, 3.5)  (-8.7, 3)   (0, 20)    (0, 10))
((0, 10)   (-10, 10)   (9, 12.5)  (3, 8.1))
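A sketch of this expansion, assuming each partial solution records which attributes it actually constrains:

def expand(partial, domains):
    """Fill a partial solution's unspecified attributes with their full domains.
    `partial` maps attribute index -> (low, high); `domains` lists every attribute's
    (low, high) domain, so unconstrained positions become "don't cares"."""
    return [partial.get(j, domains[j]) for j in range(len(domains))]

domains = [(0, 10), (-10, 10), (0, 20), (0, 10)]
p1 = {0: (0, 3.5), 1: (-8.7, 3)}    # partial solution 1: attributes A1 and A2
p2 = {2: (9, 12.5), 3: (3, 8.1)}    # partial solution 2: attributes A3 and A4
print(expand(p1, domains))  # [(0, 3.5), (-8.7, 3), (0, 20), (0, 10)]
print(expand(p2, domains))  # [(0, 10), (-10, 10), (9, 12.5), (3, 8.1)]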



Parent1: [(-0.5 0.5) (0 1) (0.12 0.34) (0 0.5)]      Mask: 1 0 1 0      Child1: [(0 0.3) (0 1) (0.01 1) (0 0.5)]
Parent2: [(0 0.3) (0.32 0.7) (0.01 1) (-0.1 0.43)]                      Child2: [(-0.5 0.5) (0.32 0.7) (0.12 0.34) (-0.1 0.43)]

Figure 5. Crossover example.
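The interval-level operators sketched below reproduce Figure 5's crossover example; the mutation step size is an illustrative assumption, since the article only specifies that the operator widens or narrows an interval with equal probability.

import random

def uniform_crossover(parent1, parent2, mask):
    """Treat each attribute's interval as one gene; a mask bit of 1 swaps that gene."""
    child1 = [q if bit else p for p, q, bit in zip(parent1, parent2, mask)]
    child2 = [p if bit else q for p, q, bit in zip(parent1, parent2, mask)]
    return child1, child2

def mutate(individual, step=0.05, rng=None):
    """Specialization/generalization: widen or narrow one interval with equal odds."""
    rng = rng or random.Random(0)
    i = rng.randrange(len(individual))
    low, high = individual[i]
    delta = step if rng.random() < 0.5 else -step   # + generalizes, - specializes
    mutated = list(individual)
    mutated[i] = (low - delta, high + delta)
    return mutated

p1 = [(-0.5, 0.5), (0, 1), (0.12, 0.34), (0, 0.5)]
p2 = [(0, 0.3), (0.32, 0.7), (0.01, 1), (-0.1, 0.43)]
c1, c2 = uniform_crossover(p1, p2, mask=[1, 0, 1, 0])
print(c1)  # [(0, 0.3), (0, 1), (0.01, 1), (0, 0.5)]                 -- Child1 of Figure 5
print(c2)  # [(-0.5, 0.5), (0.32, 0.7), (0.12, 0.34), (-0.1, 0.43)]  -- Child2 of Figure 5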

This accomplishes two things: It completely specifies solutions that a genetic algorithm can use, and it retains the partial solutions in their original form. This also corresponds nicely to the theory of schemata of genetic algorithms,12 since the partial results of the inductive learners correspond to schemata (that is, similarity templates that the genetic algorithm treats as building blocks in searching for an optimal solution).

My initial experiments using attribute-based decomposition on a real-world bankruptcy prediction problem show promising results. However, more work needs to be done before making any conclusions.

Using different learners in DLS. Although we used only one similarity-based learning algorithm (PLS1), we could use different learning algorithms on different samples. For instance, we could use PLS1 and ID3 simultaneously on different samples, and then decompose their results into rules before giving them to the genetic algorithm for refinement.

Another important concept in inductive learning is that of bias: The representational language used by a learning algorithm constrains the search space of possible hypotheses. Since different algorithms use different biases (either implicitly or explicitly), they provide DLS with the unique approach of using multiple biases. This also has important implications for solution quality and efficiency: Before testing, we do not know which bias is suitable for the problem at hand, and the wrong bias can sometimes make learning inefficient or a concept hard to learn. Combining different algorithms can ensure that we avoid these problems.

OUR DISTRIBUTED MACHINE-LEARNING method is useful when the language of the empirical inductive learner cannot represent a concept covering all the positive examples without covering any negative examples, or in the case of noisy data. In such situations, a genetic algorithm can look for an optimal concept that covers as many positive examples as possible and as few negative examples as possible. This is important, since such situations are common in real-world applications. The successful implementation of DLS demonstrates a promising direction for hybrid machine-learning systems, which synergistically combine separate learning paradigms.

Acknowledgments

I thank [...] for encouragement and continued support during this work, and Selwyn Piramuthu for providing the results of C4.5 on the process control data set. I would also like to thank the anonymous reviewers for their comments and suggestions.

References

1. T. Mitchell, "Version Spaces: A Candidate Elimination Approach to Rule Learning," Proc. Fifth Int'l Joint Conf. Artificial Intelligence, Morgan Kaufmann, San Mateo, Calif., 1977, pp. 305-310.
2. R.S. Michalski, "A Theory and Methodology of Inductive Learning," in Machine Learning, R. Michalski, J. Carbonell, and T. Mitchell, eds., Tioga Publishing, Palo Alto, Calif., 1983, pp. 83-134.
3. J.R. Quinlan, "Induction of Decision Trees," Machine Learning, Vol. 1, No. 1, 1986, pp. 81-106.
4. L.A. Rendell, "A General Framework for Induction and a Study of Selective Induction," Machine Learning, Vol. 1, No. 2, 1986, pp. 177-226.
5. R. Sikora and M. Shaw, "A Double-Layered Learning Approach to Acquiring Rules for Financial Classification," Tech. Report 90-1693, Bureau of Economic and Business Research, Univ. of Illinois at Urbana-Champaign, 1990.
6. L.A. Rendell, "Induction as Optimization," IEEE Trans. Systems, Man, and Cybernetics, Vol. 20, No. 2, 1990, pp. 326-338.
7. R. Sikora and M. Shaw, "The Distributed Learning System: A Group Problem-Solving Approach to Rule Learning," Tech. Report 91-0143, Bureau of Economic and Business Research, Univ. of Illinois at Urbana-Champaign, 1991.
8. B. Efron, The Jackknife, the Bootstrap, and Other Resampling Plans, SIAM, Philadelphia, Pa., 1982.
9. J.E. Baker, "Reducing Bias and Inefficiency in the Selection Algorithm," Proc. Second Int'l Conf. Genetic Algorithms, Lawrence Erlbaum, Hillsdale, N.J., 1987, pp. 14-21.
10. G. Syswerda, "Uniform Crossover in Genetic Algorithms," Proc. Third Int'l Conf. Genetic Algorithms, Morgan Kaufmann, San Mateo, Calif., 1989, pp. 2-9.
11. J.R. Koza, "Genetic Programming: A Paradigm for Genetically Breeding Populations of Computer Programs to Solve Problems," Tech. Report STAN-CS-90-1314, Dept. of Computer Science, Stanford Univ., Stanford, Calif., 1990.
12. J. Holland, Adaptation in Natural and Artificial Systems, Univ. of Michigan Press, Ann Arbor, Mich., 1975.

Riyaz Sikora is a doctoral candidate in management information systems at the University of Illinois, Urbana-Champaign. His research interests include business applications of genetic algorithms, machine learning, intelligent manufacturing, and modeling of group decision making in distributed systems. After receiving his BS from Osmania University in India, he was a research assistant at Washington University. He is a member of The Institute of Management Sciences, ACM, and the Decision Sciences Institute. Readers can reach Sikora at the Beckman Institute, University of Illinois, 405 N. Mathews Avenue, Urbana, IL 61801, or by e-mail at [email protected].
