

    Committee neural networks with fuzzy genetic algorithm

S.A. Jafari a,*, S. Mashohor a, M. Jalali Varnamkhasti b

a Computer and Communication Systems, Faculty of Engineering, University Putra Malaysia, 43400 Serdang, Selangor, Malaysia

b Laboratory of Applied and Computational Statistics, Institute for Mathematical Research, UPM, 43400 Serdang, Selangor, Malaysia

Article info

    Article history:

    Received 17 October 2009

    Accepted 10 January 2011

    Available online 28 January 2011

    Keywords:

    back propagation neural network

    committee neural network

    fuzzy genetic algorithm

    reservoir properties

Abstract

Combining a number of appropriate experts can improve the generalization performance of the group compared to a single network alone. There are different ways of combining the intelligent systems' outputs in the combiner of a committee neural network, such as simple averaging, gating networks, stacking, support vector machines, and genetic algorithms. Premature convergence is a classical problem in finding the optimal solution with genetic algorithms. In this paper, we propose a new technique for choosing the female chromosome during sexual selection to avoid premature convergence in a genetic algorithm. A bi-linear lifetime-allocation approach is used to label the chromosomes based on their fitness values, and these labels are then used to characterize the diversity of the population. The label of the selected male chromosome and the population diversity of the previous generation are then applied within a set of fuzzy rules to select a suitable female chromosome for recombination. Finally, we use the fuzzy genetic algorithm to combine the outputs of the experts and predict a reservoir parameter in the petroleum industry. The results show that the proposed method (fuzzy genetic algorithm) gives the smallest error and the highest correlation coefficient compared to the five individual members and the plain genetic algorithm, and produces significant information on the reliability of the permeability predictions.

© 2011 Elsevier B.V. All rights reserved.

    1. Introduction

There are several reasons for distributing a learning task among a number of individual networks. The main one is improved generalization ability, because the generalization of individual networks is not unique. The combination of several Artificial Neural Networks (ANNs) performing the same task is called an ensemble of neural networks or a committee of neural networks; when the networks are of different types, it is called a committee machine. In ensemble methods, the ensemble candidates differ, and there are a number of ways to create different individuals: varying the training data, the initial conditions, the topology of the nets, and the training algorithms. After selecting the individuals and training them, their results are combined by some method. The committee machine structure is shown in Fig. 1. In a committee machine, the expectation is that different experts converge to different local minima on the error surface, so that the overall output improves the performance (Wolpert, 1992; Efron and Tibshirani, 1993; Rezaee, 2001).

The mean square error (MSE) between an individual's output and the expected output (target) can be expressed as the bias squared plus the variance (Haykin, 1999).
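Written out (a standard decomposition; here $y(\mathbf{x})$ denotes a network's output for input $\mathbf{x}$, $T$ the target, and the expectation is taken over the ensemble of trained networks):

$$\mathbb{E}\big[(y(\mathbf{x})-T)^2\big] = \underbrace{\big(\mathbb{E}[y(\mathbf{x})]-T\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[(y(\mathbf{x})-\mathbb{E}[y(\mathbf{x})])^2\big]}_{\text{variance}}$$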

This decomposition makes it clear that we can reduce either the bias or the variance to reduce the neural network error. Unfortunately, for an individual ANN the bias is typically reduced at the cost of a large variance. The variance, however, can be reduced by an ensemble of ANNs. From the MSE equation, Naftaly et al. (1997) drew two conclusions:

(1) The bias of the ensemble-averaged function is exactly the same as that of a single NN's function.

(2) The variance of the ensemble-averaged function is less than that of a single NN's function.

These theoretical findings indicate that ensembles of ANNs can easily reduce the variance at little cost in bias. An effective approach is therefore to select or create a set of nets or experts that show high variance but low bias, because combining them reduces the variance. Several methods have been employed for creating committee members. Generally, these methods fall into three categories:

(1) Methods to select diverse training data sets from the original source data

(2) Methods to create different experts or individual neural networks

(3) Methods to combine these individuals and their results

2. Some methods for constructing committee members

In this section, we introduce some methods for committee member construction, as mentioned in Section 1. Several approaches have been used to select training data sets by


varying the source data sets; bagging, noise injection, cross-validation, stacking, and boosting are the most common techniques. Other methods that have been used by researchers to construct committee members include Fuzzy Logic (FL) with different fuzzy inference systems, the Genetic Algorithm (GA), Neuro-Fuzzy systems, and empirical formulas. Kadkhodaie-Ilkhchi et al. (2009a) used a neural network, fuzzy logic, and a fuzzy neural network as committee members. Chen and Lin (2006) used three empirical formulas as committee members. Kadkhodaie-Ilkhchi, Rezaee, et al. (2009b) used back-propagation neural networks with different training algorithms to construct a committee neural network. In this paper, we use five significant training algorithms for artificial neural networks as committee members: Levenberg–Marquardt (LM), Bayesian Regularization (BR), One Step Secant (OSS), Resilient Back Propagation (RP), and Scaled Conjugate Gradient (SCG). The following is a brief description of some important methods to create committee members.

    2.1. Bagging (Breiman, 1996)

One important method of manipulating the data set to create M training sets is bootstrap aggregation, or bagging. The basic idea of bagging is to generate a collection of experts such that every expert uses a bootstrap training data set. Given a data set X = (x1, x2, ..., xn), bootstrap sampling means creating N new data sets X1, X2, X3, ..., XN such that every Xi is generated by randomly picking n data points of X. Clearly, in creating Xi, some points of X may be repeated and some may be ignored. In bagging, we repeat this procedure to create M different training sets for M experts. The bagging method is designed to reduce the error variance, and it is very efficient for constructing a set of training data when the source data size is small.
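For illustration, a minimal bootstrap-sampling sketch in Python (NumPy arrays assumed; the function name is ours, not the paper's):

```python
import numpy as np

def bagging_sets(X, y, n_experts, seed=None):
    """Create one bootstrap training set per expert by drawing len(X)
    points with replacement; some points repeat, others are left out."""
    rng = np.random.default_rng(seed)
    n = len(X)
    sets = []
    for _ in range(n_experts):
        idx = rng.integers(0, n, size=n)  # n draws with replacement
        sets.append((X[idx], y[idx]))
    return sets
```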

    2.2. Noise injection (Raviv and Intrator, 1996)

As mentioned before, the simple bootstrap generates several training data sets from the source data, all of the same size. Efron and Tibshirani (1993) noted that the bootstrap can also be viewed as a method for simulating the noise inherent in the data, thus effectively increasing the number of training patterns. Raviv and Intrator (1996) presented another bootstrap algorithm, the Bootstrap Ensemble with Noise (BEN). In this method, a variable amount of noise is added to the input data before bootstrap sampling is used to assemble the training sets. This method can effectively reduce the variance, since the injection of noise increases the independence between the different training sets derived from the source data. Bhatt and Helle (2002) used noise injection for the construction of committee members.
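A sketch of BEN under these assumptions (Gaussian input noise with an illustrative noise_std; NumPy arrays assumed):

```python
import numpy as np

def ben_sets(X, y, n_experts, noise_std=0.05, seed=None):
    """Bootstrap Ensemble with Noise, sketched: perturb the inputs with
    Gaussian noise, then draw a bootstrap sample for each expert."""
    rng = np.random.default_rng(seed)
    n = len(X)
    sets = []
    for _ in range(n_experts):
        X_noisy = X + rng.normal(0.0, noise_std, size=X.shape)  # inject noise
        idx = rng.integers(0, n, size=n)                        # then bootstrap
        sets.append((X_noisy[idx], y[idx]))
    return sets
```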

    2.3. Cross-validation (Krogh and Vedelsby, 1995)

In this method the available data set is partitioned into M disjoint and equal subsets. We then select one of these subsets as the test set and the (M−1) remaining subsets as the training data. After M repetitions, we have M overlapping training sets and M independent test sets. Since the training sets are different, the trained networks are expected to fall into different local error minima and therefore give different results. The performance of each expert is measured on the corresponding test data set. Breiman, Friedman, Olshen, and Stone used cross-validation to prune classification-tree algorithms.
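A sketch of this partitioning (NumPy assumed; names are ours):

```python
import numpy as np

def cv_splits(n_points, m_folds, seed=None):
    """Partition indices into M disjoint, near-equal subsets and yield
    (train_idx, test_idx) pairs, holding out one fold at a time."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_points), m_folds)
    for i in range(m_folds):
        train = np.concatenate(folds[:i] + folds[i + 1:])
        yield train, folds[i]
```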

    2.4. Stacking (Wolpert, 1992)

The first part of the stacking method is similar to cross-validation. As mentioned above, there are M training sets and M test sets. We use the M training sets to train two generalizers G1 and G2, and then feed the M test sets into G1 and G2 (these outputs will be used as inputs of a second-space generalizer). The outputs of G1 and G2 together with the target values, (g1i, g2i, yi), are used as the training set of the generalizer G, the second-space generalizer.
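A sketch of assembling the second-space training set (names are ours; g1 and g2 are assumed to be trained first-level models exposing a scikit-learn-style .predict method):

```python
import numpy as np

def second_space_training_set(g1, g2, test_folds):
    """Build the (g1_i, g2_i, y_i) triples that train the second-space
    generalizer G; test_folds is a list of (X_test, y_test) pairs."""
    features, targets = [], []
    for X_test, y_test in test_folds:
        preds = np.column_stack([g1.predict(X_test), g2.predict(X_test)])
        features.append(preds)
        targets.append(y_test)
    return np.vstack(features), np.concatenate(targets)
```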

2.5. Boosting by filtering (Schapire, 1990) and AdaBoost (Freund and Schapire, 1995)

In this method, there are three experts. The first expert is trained with the M training data points of the source training data set, and the result of the first expert is applied to the second expert; the second expert is then trained with this filtered data set. After training the second expert, the training data of the source are passed through the first and second experts. Finally, the third expert is trained only on the data on which the outputs of the first and second experts disagree; that is, if the first and second experts disagree on a certain data point, this point is passed to the third expert. The final result is formed from the outputs of the three experts. Freund and Schapire (1995) and Drucker et al. (1994) have shown that the boosting algorithm is very effective in many experiments. Another boosting method is adaptive boosting (AdaBoost). In this method, the training data are selected according to a probability: if the predicted value for a data point is close to the target value, the probability of choosing that point is low; otherwise it is high. This gives poorly predicted points more chances to be used for retraining. For a classification problem we can use majority voting, and for a regression problem the result with the lowest error rate is selected. AdaBoost is sensitive to noisy data and outliers, but it is less sensitive to overfitting than most learning algorithms.
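As an illustration of these selection probabilities (a sketch of the idea, not the canonical AdaBoost weight update; function and parameter names are ours):

```python
import numpy as np

def selection_probabilities(predictions, targets, power=2.0):
    """Boosting-style selection, sketched: the worse a point is predicted,
    the more likely it is to be re-drawn for retraining. Sample with,
    e.g., rng.choice(n, size=n, p=probs)."""
    err = np.abs(np.asarray(predictions) - np.asarray(targets))
    weights = (err + 1e-12) ** power   # epsilon keeps the vector non-zero
    return weights / weights.sum()
```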

    3. Combination methods

The last stage of designing a Committee Machine (CM) is the combination of the expert outputs. Many investigations have been carried out to find combining methods that merge the expert outputs and produce the final output. In this section, we introduce some traditional combining methods for the CM; some are suitable for classifiers and some perform well in regression.

    3.1. Simple averaging (Lincoln and Skrzypek, 1990)

One of the most frequently used combination methods is simple averaging. In this method, after training the committee members, the final output is obtained by averaging the outputs of the committee members. It is easy to show by Cauchy's inequality that the Mean Square Error (MSE) of a committee machine using simple averaging is less than or equal to the average of the MSEs of the individual experts. This method is most useful when the variances of the ensemble members are different, because simple averaging can reduce the variance of the nets. The disadvantage of simple averaging is the equal weight given to every committee member, i.e. there is no difference between the weights of two committee members with low and high generalization ability.
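The inequality behind this claim (with equal weights $1/k$ and $e_j$ denoting the $j$-th expert's error on a given pattern) is

$$\Big(\frac{1}{k}\sum_{j=1}^{k} e_j\Big)^{2} \le \frac{1}{k}\sum_{j=1}^{k} e_j^{2},$$

and summing both sides over the training patterns gives $\mathrm{MSE}_{\text{committee}} \le \frac{1}{k}\sum_{j}\mathrm{MSE}_{j}$.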

[Fig. 1. Committee neural network with k members: the input X(n) is fed to networks NN 1, NN 2, ..., NN k, whose outputs y1(n), y2(n), ..., yk(n) are merged by a combiner into the final output Y(n).]



    3.2. Weighted averaging (Jacobs, 1995)

In this method, every committee member has a weight related to its generalization ability. In Jacobs (1995) the researcher introduced a gating method to determine the weight of every expert. Opitz and Shavlik (1996) used a Genetic Algorithm (GA) to determine the weight of each member. To obtain the optimal combination weights with the GA, the fitness function is defined as follows:

$$\mathrm{MSE}_{GA} = \frac{1}{n}\sum_{i=1}^{n}\left(w_1 y_{1i} + w_2 y_{2i} + \cdots + w_k y_{ki} - T_i\right)^2, \qquad \sum_{j=1}^{k} w_j = 1 \tag{1}$$

where y_{ji} is the output of the j-th network on the i-th training pattern, w_j is the weight of the j-th member, T_i is the target value of the i-th input, and n is the number of training data points.
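A sketch of this fitness evaluation (NumPy assumed; the surrounding GA loop that evolves candidate weight vectors is omitted):

```python
import numpy as np

def committee_mse(weights, member_outputs, targets):
    """Fitness for the GA weight search (Eq. (1)), sketched.

    member_outputs: (k, n) array whose j-th row holds expert j's outputs.
    weights: length-k vector, renormalized to enforce sum(w) = 1.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                # constraint: the weights sum to one
    combined = w @ member_outputs  # w1*y1i + ... + wk*yki for each pattern i
    return np.mean((combined - targets) ** 2)
```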

    3.3. Majority voting (Hansen and Salamon, 1990)

This combination method is most popular for classification problems. If more than half of the individuals vote for a prediction, majority voting selects this prediction as the final output. Majority voting ignores the fact that networks in the minority can sometimes produce the correct result; at the combination stage, it ignores the very diversity that motivates ensembles.
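A minimal sketch of strict majority voting (standard library only; the function name is ours):

```python
from collections import Counter

def majority_vote(votes):
    """Return the prediction chosen by more than half of the experts,
    or None when no strict majority exists."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None
```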

    3.4. Ranking (Ho et al., 1994; Al-Ghoneim and Kumar, 1996)

This method uses experimental results obtained by a set of experts on a collection of data sets to generate a ranking of those experts (each expert has a rank associated with an input data set). The ranks of each expert are then aggregated by methods such as average rank, success-rate ratio, and significant wins to generate a final ranking of the experts. The final rank can be used to select one or more suitable experts for a test (unseen) data set (Brazdil and Soares, 2000).

There are no unique criteria for selecting among the combination methods mentioned above. The choice mainly depends on the characteristics of the particular application at hand, e.g. the nature of the application (classification or regression), the size and quality of the training data, and the errors generated in different regions of the input space. The same combination method applied to an ensemble may generate good results for a regression problem yet fail on a classification problem, and vice versa. Much work has been done on combining methods in ensemble approaches. Major contributions include weighted majority voting (Kuncheva, 2004), decision templates (Kuncheva et al., 2001), naive Bayesian fusion (Xu et al., 1992), Dempster–Shafer combination (Ahmadzadeh and Petrou, 2003), and the fuzzy integral (Cho and Kim, 1995).

    4. Fuzzy genetic algorithm (FGA) for combining

The Genetic Algorithm (GA) is a search and optimization technique that mimics some of the processes of natural selection and evolution. In optimization, when a GA fails to find the global optimum, the problem is often attributed to premature convergence, which means that the sampling process converges on a local optimum rather than the global optimum. Sexual selection by means of female preferences has promoted the evolution of complex male ornaments in many animal groups. A sex-determination system is a biological system that determines the development of sexual characteristics in an organism. Most sexual organisms have two sexes. In many cases, sex determination is genetic: males and females have different alleles or even different genes that determine their sexual morphology. In a classical GA, chromosomes reproduce asexually: any two chromosomes may be parents in crossover. Gender division and sexual selection inspired a model of gendered GA in which crossover takes place only between chromosomes of opposite sexes.

In this study, a relation between age and fitness, as in biological systems, affecting the selection procedure is proposed. A bi-linear lifetime-allocation approach is used to label the chromosomes based on their fitness values, and these labels are then used to characterize the diversity of the population. Inspired by the non-genetic sex-determination system that exists in some species of reptiles, including alligators and some turtles, where sex is determined by the temperature at which the egg is incubated, we divide the population into two groups, male and female, so that males and females can be selected in an alternating way. In each generation, the layout of the selection of males and females is different. During sexual selection, the male chromosome is selected randomly. The label of the selected male chromosome and the population diversity of the previous generation are then applied within a set of fuzzy rules to select a suitable female chromosome (Jalali and Lee, 2009).

Fuzzy systems are encountered in numerous areas of application. Fuzzy rules, for example, viewed as a generic mechanism of granular knowledge representation, are positioned at the center of knowledge-based systems. A fuzzy IF-THEN rule consists of an IF part (antecedent) and a THEN part (consequent). The antecedent is a combination of terms, whereas the consequent is exactly one term. In the antecedent, the terms can be combined using fuzzy conjunction, disjunction, and negation. A term is an expression of the form X = T, where X is a linguistic variable and T is one of its linguistic terms. In this paper, we use the linguistic variable age for chromosomes. Fig. 2 describes the linguistic variable age, where Infant, Teenager, Adult, and Elderly are the linguistic values.

The system applied in our study uses triangular membership functions, the (minimum) intersection operator, and the correlation-product inference procedure. Defuzzification of the outputs was performed using the fuzzy centroid method described by Kosko (1992). To find the membership function, we use the fitness value of each chromosome and the minimum, maximum, and average fitness values of the population in each generation. Each chromosome has its own label, determined by the age function. Let

$$\alpha = \frac{f_i - f_{\min}}{f_{avr} - f_{\min}}, \qquad \beta = \frac{f_i - f_{avr}}{f_{\max} - f_{avr}}, \qquad \gamma = f_{avr} - f_i \tag{2}$$

where f_i is the fitness value of chromosome i, and f_avr, f_min, and f_max are the average, minimum, and maximum fitness values of the population. Then the age function is:

$$\mathrm{age}(c_i) = \begin{cases} L + \eta\,\alpha, & \gamma \ge 0 \\ \mu + \eta\,\beta, & \gamma < 0 \end{cases} \tag{3}$$

    or

$$\mathrm{age}(c_i) = \begin{cases} U - \eta\,\alpha, & \gamma \ge 0 \\ \mu - \eta\,\beta, & \gamma < 0 \end{cases} \tag{4}$$

where c_i is chromosome i, L is the minimum age, U is the maximum age, n is the population size, $\eta = (U - L)/2$, $\mu = (U + L)/2$, and $\alpha$, $\beta$, $\gamma$ are defined in Eq. (2).
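A minimal sketch of this labeling in Python (the function name and scalar form are ours; it follows our reading of the reconstructed Eqs. (2)–(4) and assumes f_avr differs from f_min and f_max so the ratios are defined):

```python
def age_label(f_i, f_min, f_avr, f_max, L=2.0, U=10.0, maximization=True):
    """Bi-linear age label for chromosome c_i; L and U are the minimum
    and maximum ages (L = 2 and U = 10 in this study)."""
    eta, mu = (U - L) / 2.0, (U + L) / 2.0
    alpha = (f_i - f_min) / (f_avr - f_min)   # Eq. (2)
    beta = (f_i - f_avr) / (f_max - f_avr)
    gamma = f_avr - f_i
    if maximization:                          # Eq. (3): higher fitness is better
        return L + eta * alpha if gamma >= 0 else mu + eta * beta
    return U - eta * alpha if gamma >= 0 else mu - eta * beta  # Eq. (4)
```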

[Fig. 2. The linguistic variable age: syntactic rules map it to the linguistic terms Infant, Teenager, Adult, and Elderly.]



Eq. (3) is suited to maximization problems, where higher fitness values are better, while Eq. (4) is suited to minimization problems, where lower fitness values are better. This idea is inspired by the lifetime concept proposed in Arabas et al. (1994). The fuzzification interface defines the possibilities of the four linguistic values for each chromosome: {Infant, Teenager, Adult, Elderly}. These values determine the degree of truth of each rule premise. This computation takes into account all chromosomes in each generation and relies on the triangular membership functions shown in Fig. 3, with L = 2 and U = 10. A bi-linear lifetime-allocation approach, as proposed in Kosko (1992), is used to label the chromosomes based on their fitness values, which are then used to characterize the diversity of the population:

$$D(c_i) = \begin{cases} L + \eta\,\alpha, & \gamma \ge 0 \\ \mu + \eta\,\beta, & \gamma < 0 \end{cases} \tag{5}$$

Let $\sigma$ denote the label of half of the population; then the population can be divided into four diversity levels, Very Low, Low, Medium, and High, as follows:

$$\text{Population Diversity} = \begin{cases} \text{High}, & \sigma \le L + t\lambda \\ \text{Medium}, & L + t\lambda < \sigma \le L + (t+1)\lambda \\ \text{Low}, & L + (t+1)\lambda < \sigma \le L + (t+2)\lambda \\ \text{Very Low}, & \sigma > L + (t+2)\lambda \end{cases} \tag{6}$$

where $t = (L + U)/n$ is a parameter correlated with the domain of labels in the population, and $\lambda = [n/10]$, with [x] the integer nearest to x (for example, [2.3] = 2 and [2.8] = 3).

This computation is performed in every generation and relies on the triangular membership functions shown in Fig. 4. The inputs are combined logically using the AND operator to produce output response values for all expected inputs, and a firing strength is computed for each output membership function. All that remains is to combine these logical sums in a defuzzification process to produce the crisp output. The fuzzy outputs of all rules are finally aggregated into one fuzzy set. To obtain a crisp decision from this fuzzy output, we have to defuzzify the fuzzy set. Defuzzification of the outputs was performed using the fuzzy centroid method on the firing behavior (Kosko, 1992), which may show that some of the rules are unnecessary. The rule base contains 16 fuzzy rules; Table 1 lists the fuzzy rules for selecting the female chromosome. Although we can obtain Fage, we may not be able to find a female chromosome with exactly that age, so we select the female chromosome whose age label is nearest to Fage as the parent. If more than one female chromosome satisfies the Fage condition, we choose the one with the highest fitness value as the parent. This technique is called the Complement Method (Jalali and Lee, 2009).
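A sketch of this selection step (the data layout and names are ours; ages would come from the age labeling of Section 4 and f_age from the defuzzified fuzzy output):

```python
def select_female(females, ages, fitnesses, f_age):
    """Complement Method, sketched: pick the female whose age label is
    nearest to Fage; ties are broken by the highest fitness."""
    best_gap = min(abs(a - f_age) for a in ages)
    nearest = [i for i, a in enumerate(ages) if abs(a - f_age) == best_gap]
    return females[max(nearest, key=lambda i: fitnesses[i])]
```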

    5. Case study

In this section, we use a data set from oil wells in Iran. First, several crossplots were generated between well-log data and core permeability to find which logs have a good relationship with permeability. In this way, we found a logical relationship between five inputs, namely sonic transit time (DT), neutron log (NPHI), density log (RHOB), gamma ray (GR), and true formation resistivity (Rt), and rock permeability (K) as the target. The data points were divided randomly into three parts: sixty percent for training, twenty percent for validation, and twenty percent for testing. Five training algorithms for the back-propagation neural network were selected as committee members: Levenberg–Marquardt (LM), Bayesian Regularization (BR), One Step Secant (OSS), Resilient Back Propagation (RP), and Scaled Conjugate Gradient (SCG). As mentioned above, we used the five wireline logs as input data and core permeability as output data for the analysis of our combining methods.
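A sketch of that split (NumPy assumed; X holds the five logs and y the core permeability):

```python
import numpy as np

def split_60_20_20(X, y, seed=None):
    """Random 60/20/20 train/validation/test split, as in the case study."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr, n_va = int(0.6 * len(X)), int(0.2 * len(X))
    parts = np.split(idx, [n_tr, n_tr + n_va])  # train, validation, test indices
    return [(X[p], y[p]) for p in parts]
```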

A brief description of this data set follows.

Sonic log (DT): The sonic tool measures the time required for an acoustic wave to travel through a unit of formation thickness. The sonic transit time (DT) is used both in porosity determination and to compute secondary porosity in carbonate reservoirs (Service, 1999).

Neutron log (NPHI): A radioactivity well log used to determine formation porosity. The logging tool bombards the formation with neutrons; when the neutrons strike hydrogen atoms in water or oil, gamma rays are released. Since water and oil exist only in pore spaces, a measurement of the gamma rays indicates formation porosity; see radioactivity well logging (Service, 1999).

Density log (RHOB): A special radioactivity log for open-hole surveying that responds to variations in the specific gravity of formations. It is a contact log (i.e., the logging tool is held against the wall of the hole). It emits gamma rays and measures the secondary gamma radiation scattered back to the detector in the instrument. The density log is an excellent porosity-measuring device, especially for shaley sands (Service, 1999).

Gamma ray (GR): A type of radioactivity well log that records the natural radioactivity around the wellbore. Shales generally produce higher levels of gamma radiation and can therefore be detected and studied with the gamma-ray tool; see radioactivity well logging (Service, 1999).

True formation resistivity (Rt): With reference to log analysis, the resistivity of the undisturbed formation. It is derived from a resistivity log that has been corrected as far as possible for all environmental effects, such as the borehole, invasion, and surrounding-bed effects. Hence, it is taken as the true resistivity of the undisturbed formation in situ and is called Rt. With reference to core analysis, the resistivity

[Fig. 3. Triangular membership functions of the age linguistic variable age(ci) for male and female chromosomes, with terms Infant, Teenager, Adult, and Elderly centered near 0.25, 0.45, 0.65, and 0.85.]

[Fig. 4. Triangular membership functions of the population diversity linguistic variable D(ci), with terms High, Medium, Low, and Very Low centered near 2.5, 4.5, 6.5, and 8.5.]

Table 1
Fuzzy rules for selecting the female chromosome.

Male age (Mage)   Diversity   Female age (Fage)
Infant            High        Elderly or adult
Infant            Medium      Adult or teenager
Infant            Low         Teenager or infant
Infant            Very low    Infant
Teenager          High        Elderly or adult
Teenager          Medium      Adult or teenager
Teenager          Low         Teenager or infant
Teenager          Very low    Infant
Adult             High        Elderly or adult
Adult             Medium      Adult or teenager
Adult             Low         Teenager or infant
Adult             Very low    Infant
Elderly           High        Adult or teenager
Elderly           Medium      Teenager or infant
Elderly           Low         Infant
Elderly           Very low    Infant

[Fig. 5. The technique for a two-point cut in offspring: randomly chosen numbers set the cut points at which the male and female parents exchange segments.]



Mutation is performed in four steps (a sketch follows the list):

(1) A random real number in the interval (0, 1) is generated for the probability of mutation.

(2) The GA considers this probability and selects some chromosomes.

(3) For each selected chromosome, a random natural number k, ranging from 1 to the number of genes in the chromosome, is generated.

(4) Gene number k is replaced by another randomly generated gene.
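A minimal sketch of these four steps for a binary-coded GA (standard-library random; the per-chromosome probability draw is our reading of steps (1)–(2)):

```python
import random

def mutate(population, p_m=0.02, alphabet="01"):
    """Four-step mutation sketch (pm = 0.02 as in the experiment);
    each chromosome is a mutable list of genes."""
    for chrom in population:
        if random.random() < p_m:               # steps (1)-(2): select chromosomes
            k = random.randrange(len(chrom))    # step (3): gene index (0-based here)
            chrom[k] = random.choice(alphabet)  # step (4): replace gene k
    return population
```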

A standard GA is used in this experiment with a population size of 20, and the total length of the chromosomes is 85 bits. The crossover probability is pc = 0.50 and the mutation probability is pm = 0.02. Each test function is run on the GA 30 times, with a maximum of 5000 generations per run. Fig. 6 shows how the female's age varies as the male's age and the diversity change. Fig. 7 shows that as the age of the male chromosome increases, the system selects a decreasing age for the female chromosome. Fig. 8 shows that as the diversity of the population increases, the system likewise decreases the selected female age.

[Fig. 9(a–f). Crossplots showing R between core and predicted permeability for the five training algorithms and FGA.]

This technique maintains the diversity of the population, so the GA cannot converge too quickly and premature convergence is avoided. Fig. 9(a–f) shows the correlation coefficients between the core and predicted permeabilities for the five training algorithms and the FGA method. Table 2 shows both the MSE and R2 values for the overall data points using the five training algorithms, the GA, and weighted averaging with the FGA. This table helps us decide which combining model performs best: a good combining scheme should have a higher R2 and a lower MSE.

    6. Conclusion

There are different ways of combining the intelligent systems' outputs in the combiner of a committee neural network; one of these methods is the Genetic Algorithm (GA). The failure to find good results when using a GA in a CM is largely due to premature convergence, and the population diversity of the GA is an important parameter in premature convergence. A technique for controlling the population diversity using fuzzy rules and sexual selection is proposed in this paper. In conclusion, female choice by fuzzy logic is a suitable way to improve the performance of GAs: it maintains the diversity of the population, and premature convergence can be eliminated. In this paper, we used the FGA method to combine the outputs of experts for the prediction of permeability in the oil industry. From the simulation results, the correlation coefficients and MSEs for the five training algorithms are shown in Table 2. The R2 and MSE for the GA combining method are 0.8438 and 0.001 respectively, which are better than those of all the individual training algorithms. Applying the FGA method to the combination improves the correlation coefficient and MSE further, to 0.8523 and 0.00092 respectively.

    References

Ahmadzadeh, M.R., Petrou, M., 2003. Use of Dempster–Shafer theory to combine classifiers which use different class boundaries. Pattern Anal. Appl. 6 (1), 41–46.

Al-Ghoneim, K.A., Kumar, B.V., 1996. Combining neural networks using the ranking figure of merit. Proc. SPIE 2760, 213.

Arabas, J., Michalewicz, Z., et al., 1994. GAVaPS: a genetic algorithm with varying population size. Evolutionary Computation, 1994: IEEE World Congress on Computational Intelligence, vol. 1, pp. 73–78.

Bhatt, A., Helle, H.B., 2002. Committee neural networks for porosity and permeability prediction from well logs. Geophys. Prospect. 50 (6), 645–660.

Brazdil, P., Soares, C., 2000. A comparison of ranking methods for classification algorithm selection, pp. 63–75.

Breiman, L., 1996. Bagging predictors. Mach. Learn. 24 (2), 123–140.

Chen, C.-H., Lin, Z.-S., 2006. A committee machine with empirical formulas for permeability prediction. Comput. Geosci. 32 (4), 485–496.

Cho, S.-B., Kim, J.H., 1995. An HMM/MLP architecture for sequence recognition. Neural Comput. 7 (2), 358–369.

Drucker, H., Cortes, C., et al., 1994. Boosting and other ensemble methods. Neural Comput. 6 (6), 1289–1301.

Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman & Hall, New York.

Freund, Y., Schapire, R., 1995. A decision-theoretic generalization of on-line learning and an application to boosting. European Conference on Computational Learning Theory, pp. 23–37.

Hansen, L.K., Salamon, P., 1990. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 12 (10), 993–1001.

Haykin, S., 1999. Neural Networks: A Comprehensive Foundation. Prentice-Hall, Upper Saddle River, NJ.

Ho, T.K., Hull, J.J., et al., 1994. Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell. 16 (1), 66–75.

Jacobs, R.A., 1995. Methods for combining experts' probability assessments. Neural Comput. 7 (5), 867–888.

Jalali, M., Lee, L.S., 2009. Fuzzy genetic algorithm with sexual selection (FGASS). Second Int. Conf. and Workshop on Basic and Applied Science, 2–4 June, Johor Bahru, Malaysia.

Kadkhodaie-Ilkhchi, A., Rahimpour-Bonab, H., et al., 2009a. A committee machine with intelligent systems for estimation of total organic carbon content from petrophysical data: an example from Kangan and Dalan reservoirs in South Pars Gas Field, Iran. Comput. Geosci. 35 (3), 459–474.

Kadkhodaie-Ilkhchi, A., Rezaee, M.R., et al., 2009b. A committee neural network for prediction of normalized oil content from well log data: an example from South Pars Gas Field, Persian Gulf. J. Petrol. Sci. Eng. 65 (1–2), 23–32.

Kosko, B., 1992. Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence. Prentice-Hall, Inc., p. 449.

Krogh, A., Vedelsby, J., 1995. Neural network ensembles, cross validation, and active learning. Adv. Neural Inf. Process. Syst. 7, 231–238.

Kuncheva, L.I., 2004. Classifier ensembles for changing environments, pp. 1–15.

Kuncheva, L.I., Bezdek, J.C., et al., 2001. Decision templates for multiple classifier fusion: an experimental comparison. Pattern Recognit. 34 (2), 299–314.

Lincoln, W., Skrzypek, J., 1990. Synergy of clustering multiple back propagation networks. Adv. Neural Inf. Process. Syst. 2, 650–657.

Naftaly, U., Intrator, N., Horn, D., 1997. Optimal ensemble averaging of neural networks. Network 8, 283–296.

Opitz, D.W., Shavlik, J.W., 1996. Actively searching for an effective neural network ensemble. Connect. Sci. 8 (3), 337–354.

Raviv, Y., Intrator, N., 1996. Bootstrapping with noise: an effective regularization technique. Connect. Sci. 8 (3), 355–372.

Rezaee, M.R., 2001. Petroleum Geology. Alavi Publications, Tehran, Iran.

Schapire, R.E., 1990. The strength of weak learnability. Mach. Learn. 5 (2), 197–227.

Service, U. o. T. a. A. P. E., 1999. A Dictionary for the Petroleum Industry. Petroleum Extension Service.

Wolpert, D.H., 1992. Stacked generalization. Neural Netw. 5, 241–259.

Xu, L., Krzyzak, A., et al., 1992. Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst. Man Cybern. 22 (3), 418–435.

Table 2
Comparison of MSE and R2 for the test data using the five training algorithms, GA, and FGA.

Algorithm   R2       MSE
LM          0.8274   0.0012
BR          0.8239   0.0012
OSS         0.7257   0.0015
RP          0.751    0.0016
SCG         0.7885   0.0015
GA          0.8438   0.001
FGA         0.8523   0.00092
