final thesis pk12

Upload: suman-pradhan

Post on 05-Apr-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/31/2019 Final Thesis Pk12

    1/97

    CHAPTER-1

    INTRODUCTION:

    1.1 BACK GROUND:

    Outof many applications of adaptive filtering, direct modeling and inverse modeling are very

    important. The direct modeling or system identification finds applications in control system

    engineering including robotics [1], intelligent sensor design [2], process control [3], powersystem engineering [4], image and speech processing [4], geophysics [5], acoustic noise and

    vibration control [6] and biomedical engineering [7]. Similarly inverse modeling technique is

    used in digital data reconstruction [8], channel equalization in digital communication [9], digital

    magnetic data recording [10], and intelligent sensor [2], deconvolution of seismic data [11]. The

    direct modeling mainly refers to adaptive identification of unknown plants. Simple static linear

    plants are easily identified through parameter estimation using conventional derivative based

    least mean square (LMS) type algorithms [12]. But most of the practical plants are dynamic,

    nonlinear and combination of these two characteristics. In many applications Hammerstein and

    MIMO plants need identification. In addition the output of the plant is associated with

    measurement or additive white Gaussian noise (AWGN). Identification of such complex plants

    is a difficult task and poses many challenging problems. Similarly inverse modeling of

    telecommunication and magnetic medium channels are also important for reducing the effect of

    inter symbol interference (ISI) and achieving faithful reconstruction of original data. Similarly

    adaptive inverse modeling of sensors is required to extend their linearity's for direct digital

    readout and enhancement of dynamic range. These two important and complex issues are

    addressed in the thesis and attempts have been made to provide improved efficient and alternate

    promising solutions.

    [1]

  • 7/31/2019 Final Thesis Pk12

    2/97

    The conventional LMS and recursive least square (RLS) [13] techniques work well for

    identification of static plants but when the plants are of dynamic type, the existing forward-

    backward LMS [14] and the RLS algorithms very often lead to non optimal solution due to

    premature convergence of weights to local minima [15]. This is a major drawback of the use of

    existing derivative based techniques. To alleviate this burning issue this thesis suggests the use

    of derivative free optimization techniques in place of conventional techniques.

    In recent past population based optimization techniques have been reported which fall under the

    category of evolutionary computing [16] or computational intelligence [17]. These are also

    called bio-inspired techniques which include genetic algorithm (GA) and its variants [18],

    Differential Evolution [19]. These techniques are suitably employed to obtain efficient iterative

    learning algorithms for developing adaptive direct and inverse models of complex plants and

    channels.Development of direct and inverse adaptive models essentially consists of two components. The

    first component is an adaptive network which may be linear or nonlinear in nature. Use of a

    nonlinear network is preferable when nonlinear plants or channels are to be identified or

    equalized. The linear networks used in the thesis are adaptive linear combiner or all-zero or FIR

    structure [7]under nonlinear category GA and DE are used.

    1.2 MOTIVATIONIn summary the main motivations of the research work carried in the present thesis are the

    following:

    i. To formulate the direct and inverse modeling problems as error square optimization

    problems

    ii. To introduce bio-inspired optimization tools such as GA and DE and their variants to

    efficiently minimize the squared error cost function of the models. In other words todevelop alternate identification scheme.

    iii. To achieve improved identification (direct modeling) of complex nonlinear and channel

    equalization (inverse modeling) of nonlinear noisy digital channels by introducing new

    and improved updating algorithms.

    [2]

  • 7/31/2019 Final Thesis Pk12

    3/97

    1.3 MAJOR CONTRIBUTION OF THE THESIS

    The major contribution of the thesis is outlined below

    i. The GA based approach for both linear and nonlinear system identifications are

    introduced. The GA based approach is found to be more efficient for nonlinear system

    than other standard derivative based learning. In addition the DE based identification

    have been proposed and shown to have better performance and involve less

    computational complexity.

    ii. The GA based approach for linear and nonlinear channel equalizations are introduced.

    The GA based approach is found to be more efficient than other standard derivative

    based learning. In addition DE based equalizers have been proposed and shown to have

    better performance and involve less computational complexity.

    1.4 CHAPTER WISE CONTRIBUTION

    The research work undertaken is embodied in 7 Chapters.

    Chapter-1 gives an introduction to System identification, channel equalization and reviews

    of various learning algorithm such as Least-mean-square (LMS) algorithm, Recursive-

    least-square (RLS) algorithm, Artificial Neural Network (ANN), Genetic Algorithm (GA),

    Differential Evolution (DE) used to identify the system and train to the equalizer. It also

    includes the motivation behind undertaking the thesis work.

    Chapter-2 Discusses about the general form of adaptive algorithm, Adaptive filtering

    problem, derivative based algorithm such as LMS and overview of derivative free basedalgorithm such as Genetic Algorithm and Differential Evolution.

    Chapter-3 Discusses various system identification technique, Develop the algorithm of GA

    for simulation on system identification and taking a comparison study between LMS and

    GA on both linear and nonlinear system.

    [3]

  • 7/31/2019 Final Thesis Pk12

    4/97

    Chapter-4 Discusses various channel equalization technique, Develop the algorithm of GA

    for simulation on channel equalization and taking a comparison between LMS and GA on

    both linear and nonlinear channel.

    Chapter-5 Develop the algorithm of DE for simulation on system identification and taking

    a comparison between LMS, GA and DE on both linear and nonlinear system.

    Chapter-6 Develop the algorithm of DE for simulation on channel equalization and taking a

    comparison between LMS, GA and DE on both linear and nonlinear channel equalizers.

    Chapter-7 deals with the conclusion of the investigation made in the thesis. This chapter

    also suggests some future research related to the topic.

    [4]

  • 7/31/2019 Final Thesis Pk12

    5/97

    CHAPTER-2GENETIC ALGORITHM AND

    DIFFERENTIAL EVOLUTION

    2.1 INTRODUCTION:

    There are many learning algorithms which are employed to train various adaptive models. The

    performance of these models depends on rate of convergence, training time, Computational

    complexity involved and minimum mean square error achieved after training. The learning

    algorithms may be broadly classified into two categories (a) derivative based (b) derivative free.

    The derivative based algorithms include least means square (LMS), IIR LMS (ILMS), back

    propagation (BP) and FLANN-LMS. Under the derivative free algorithms, genetic algorithm

    (GA), differential evolution (DE), particle swarm optimization (PSO), bacterial foraging

    optimization (BFO) and artificial immune system (AIS) have been employed. In this section the

    details of LMS, GA and DE algorithms are outlined in sequel.

    2.2 GRADIENT BASED ADAPTIVE ALGORITHIM:An adaptive algorithm is a procedure for adjusting the parameters of an adaptive filter to

    minimize a cost function chosen for the task at hand. In this section, we describe the general

    form of many adaptive FIR filtering algorithms and present a simple derivation of the LMS

    adaptive algorithm. In our discussion, we only consider an adaptive FIR filter structure. Such

    systems are currently more popular than adaptive IIR filters because

    1. The input-output stability of the FIR filter structure is guaranteed for any set of fixed

    coefficients, and2. The algorithms for adjusting the coefficients of FIR filters are simpler in general than

    those for adjusting the coefficients of IIR filters.

    [5]

  • 7/31/2019 Final Thesis Pk12

    6/97

    2.2.1 GENERAL FORM OF ADAPTIVE FIR ALGORITHMS:

    The general form of an adaptive FIR filtering algorithm is

    W(n+1)=W(n) + (n)G(e(n),X(n),(n)) (2.1)

    Where G(-) is a particular vector-valued nonlinear function, (n) is a step sizeparameter, e(n)

    and X(n) are the error signal and input signal vector, respectively, and (n) is a vector of states

    that store pertinent information about the characteristics of the input and error signals and/or the

    coefficients at previous time instants. In the simplest algorithms, (n) is not used, and the only

    information needed to adjust the coefficients at time n are the error signal, input signal vector,

    and step size.

    The step size is so called because it determines the magnitude of the change or step that is

    taken by the algorithm in iteratively determining a useful coefficient vector. Much research

    effort has been spent characterizing the role that (n) plays in the performance of adaptive

    filters in terms of the statistical or frequency characteristics of the input and desired response

    signals. Often, success or failure of an adaptive filtering application depends on how the value

    of(n) is chosen or calculated to obtain the best performance from the adaptive filter.

    2.2.2 THE MEAN-SQUARED ERROR COST FUNCTION:

    The form ofG(-) depends on the cost function chosen for the given adaptive filtering task. We

    now consider one particular cost function that yields a popular adaptive algorithm. Define the

    mean-squared error(MSE) cost function as

    ( ) ( )21( ) ( ) ( )2mse nJ n e n p e n de n+

    = (2.2)

    = ( )212Ee n (2.3)

    [6]

  • 7/31/2019 Final Thesis Pk12

    7/97

    Wherepn(e) represents the probability density function of the error at time n. and E- is short

    hand for the expectation integral on the right hand side (2.3).

    The MSE cost function is useful for adaptive FIR filters because

    Jmse(n) has a well-defined minimum with respect to the parameters in W(n)

    The coefficient values obtained at this minimum are the ones that minimize the power in

    the error signal e(n), indicating that y(n) has approached d(n) and

    Jmse(n) a smooth function of each of the parameters in W(n), such that it is differentiable

    with respect to each of the parameters in W(n).

    The third point is important in that it enables us to determine both the optimum coefficient

    values given knowledge of the statistics of d(n) and x(w) as well as a simple iterative procedure

    for adjusting the parameters of an FIR filter.

    2.2.3 THE WIENER SOLUTION:

    For the FIR filter structure, the coefficient values in W(n) that minimizeJM SE (n) are well-

    defined if the statistics of the input and desired response signals are known. The formulation of

    this problem for continuous-time signals and the resulting solution was first derived by Wiener.

    Hence, this optimum coefficient vector WM SE (n) is often called the Wiener solution to the

    adaptive filtering problem. The extension of Wieners analysis to the discrete-time case is

    attributed to Levinson . To determine WM SE (n) we note that the function JM SE (n) in is quadraticin the parameters{wi(n)}, and the function is also differentiable. Thus, we can use a result from

    optimization theory that states that the derivatives of a smooth cost function with respect to each

    of the parameters is zero at a minimizing point on the cost function error surface. Thus,WM SE

    (n) can be found from the solution to the system of equation

    ( ) 0.0 ( ) ( )

    ( )

    mse

    i

    J ni L

    w n

    =

    (2.4)

    Taking derivative ofJmse(n) in (2.3) and noting that e(n)=d(n) - y(n) and

    ( ) ( )1

    0

    ( ) ( ) ( ) TL

    ii

    w n X ny n w n x n i

    =

    == respectively, we obtain

    [7]

  • 7/31/2019 Final Thesis Pk12

    8/97

    ( ) ( ( ))( )( ) ( )

    mse

    i i

    J n e nE e n

    w n w n

    = (2.5)

    ( )( ) ( )iy nEe nw n

    = (2.6)

    [ ( ) ( )]E e n x n i= (2.7)

    1

    0( ) ( ) ( ) ( ) ( )

    L

    jj

    E d n x n i E x n i x n j w n

    =

    = (2.8)

    Where we have used the dentitions of e(n) and of y(n) for the FIR filter structure in and

    respectively to expand the last result in (2.8).

    By defining the matrixRXX (n) and vectorPdx (n) as

    ( ) ( )TXX

    R E X n X n = (2.9)

    ( ) ( ) ( )dx

    P n E d n X n = (2.10)

    respectively, we can combine the above equations to obtain the system of equations in vector

    form as

    ( ) ( ) ( ) 0XX MSE dxR n W n P n = (2.11)Where 0 is the zero vector. Thus, so long as the matrix RXX (n) is invertible, the optimum

    Wiener solution vector for this problem is

    [8]

  • 7/31/2019 Final Thesis Pk12

    9/97

    1( ) ( ) ( )XXMSE dx

    W n R n P n= (2.12)

    2.2.4THEMETHODOFSTEEPESTDESCENT:

    The method of steepest descent is a celebrated optimization procedure for minimizing the value

    of a cost function J(n) with respect to a set of adjustable pa-rameters W(n). This procedure

    adjusts each parameter of the system according to

    ( )( 1) ( ) ( )( )i i i

    J nw n w n n

    w n

    + = (2.13)

    In other words, the Ithparameter of the system is altered according to the derivative of the cost

    function with respect to theIthparameter. Collecting these equations in vector form, we have

    ( )( 1) ( ) ( )( )

    J nW n W n n

    W n

    + = (2.14)

    Where( )

    ( )

    J n

    W n

    be the vector form of( )

    ( )i

    J n

    w n

    For an FIR adaptive filter that minimizes the MSE cost function, we can use the result in toexplicitly give the form of the steepest descent procedure in this problem. Substituting these

    results into yields the update equation for W(n) as

    ( )( 1) ( ) ( ) ( ) ( ) ( )XXdxW n W n n P n R n W n+ = + (2.15)

    However, this steepest descent procedure depends on the statistical quantities E{d(n)x(n i)}

    andE{x(n i)x(n j)} contained inPdx(n) andRxx(n) respectively. In practice, we only have

    measurements of both d(n) and x(n) to be used within the adaptation procedure. While suitable

    estimates of the statistical quantities needed for (2.15) could be determined from the signals x(n)

    and d(n) we instead develop an approximate version of the method of steepest descent that

    depends on the signal values themselves. This procedure is known as theLMSalgorithm.

    [9]

  • 7/31/2019 Final Thesis Pk12

    10/97

    2.2.5 THE LMS ALGORITHM:

    The cost function J(n) chosen for the steepest descent algorithm of determines the coefficient

    solution obtained by the adaptive filter. If the MSE cost function in is chosen, the resultingalgorithm depends on the statistics of x(n) and d(n) because of the expectation operation that

    defines this cost function. Since we typically only have measurements of d(n) and of x(n)

    available to us, we substitute an alternative cost function that depends only on these

    measurements.

    One such cost function is the least-squares cost function given by

    ( )2

    0( ) ( ) ( ) ( ) ( )

    n

    TLMSk

    j n k d k W n X k== (2.16)

    Where ( )ka is a suitable weighting sequence for the terms within the summation. This cost

    function, however, is complicated by the fact that it requires numerous computations to

    calculate its value as well as its derivatives with respect to each W(n), although efficient

    recursive methods for its minimization can be developed. Alternatively, we can propose the

    simplified cost function JLM S(n ) Given by

    21( ) ( )2LMS

    J n e n= (2.17)

    function can be thought of as an instantaneous estimate of the MSE cost function, as

    JMSE(n)=EJLMS(n). Although it might not appear to be useful, the resulting algorithm obtained

    when JLMS (n) is used for J(n) in (2.13) is extremely useful for practical applications. Taking

    derivatives of JLMS (n) with respect to the elements of W(n) and substituting the result into(2.13), we obtain the LMS adaptive algorithm given by

    ( 1) ( ) ( ) ( ) ( )W n W n n e n X n+ = + (2.18)

    [10]

  • 7/31/2019 Final Thesis Pk12

    11/97

    Note that this algorithm is of the general form in. It also requires only multiplications and

    additions to implement. In fact, the number and type of operations needed for the LMS

    algorithm is nearly the same as that of the FIR filter structure with fixed coefficient values,

    which is one of the reasons for the algorithms popularity. The behavior of the LMS algorithm

    has been widely studied, and numerous results concerning its adaptation characteristics under

    different situations have been developed. For now, we indicate its useful behavior by noting that

    the solution obtained by the LMS algorithm near its convergent point is related to the Wiener

    solution. In fact, analyses of the LMS algorithm under certain statistical assumptions about the

    input and desired response signals show that

    lim [ ( )] MSEn E W n W = (2.19)

    When the Wiener solution WM SE (n) is a fixed vector. Moreover, the average behavior of the

    LMS algorithm is quite similar to that of the steepest descent algorithm in that depends

    explicitly on the statistics of the input and desired response signals. In effect, the iterative nature

    of the LMS coefficient updates is a form of time-averaging that smoothes the errors in the

    instantaneous gradient calculations to obtain a more reasonable estimate of the true gradient.

    The problem is that gradient descent is a local optimization technique, which is limited because

    it is unable to converge to the global optimum on a multimodal error surface if the algorithm is

    not initialized in the basin of attraction of the global optimum. Several medications' exist for

    gradient based algorithms in attempt to enable them to overcome local optima. One approach is

    to simply add noise or a momentum term to the gradient computation of the gradient descent

    algorithm to enable it to be more likely to escape from a local minimum. This approach is only

    likely to be successful when the error surface is relatively smooth with minor local minima, or

    some information can be inferred about the topology of the surface such that the additional

    gradient parameters can be assigned accordingly. Other approaches attempt to transform the

    error surface to eliminate or diminish the presence of local minima , which would ideally result

    in a unimodal error surface. The problem with these approaches is that the resulting minimum

    [11]

  • 7/31/2019 Final Thesis Pk12

    12/97

    transformed error used to update the adaptive filter can be biased from the true minimum output

    error and the algorithm may not be able to converge to the desired minimum error condition.

    These algorithms also tend to be complex, slow to converge, and may not be guaranteed to

    emerge from a local minimum. Some work has been done with regard to removing the bias of

    equation error LMS and Steiglitz-McBride adaptive IIR filters, which add further complexity

    with varying degrees of success. Another approach, attempts to locate the global optimum by

    running several LMS algorithms in parallel, initialized with different initial coefficients. The

    notion is that a larger, concurrent sampling of the error surface will increase the likelihood that

    one process will be initialized in the global optimum valley. This technique does have potential,

    but it is inefficient and may still suffer the fate of a standard gradient technique in that it will be

    unable to locate the global optimum if none of the initial estimates is located in the basin of

    attraction of the global optimum. By using a similar congregational scheme, but one in whichinformation is collectively exchanged between estimates and intelligent randomization is

    introduced, structured stochastic algorithms are able to hill-climb out of local minima. This

    enables the algorithms to achieve better, more consistent results using a fewer number of total

    estimates. These types of algorithms provide the framework for the algorithms discussed in the

    following sections.

    2.3 DERIVATIVE FREE BASED ALGORITHIM:

    Since the beginning of the nineteenth century, a significant evolution in optimization theory has

    been noticed. Classical linear programming and traditional non-linear optimization techniques

    such as Lagranges Multiplier, Bellmans principle and Pontyagrins principle were prevalent

    until this century. Unfortunately, these derivative based optimization techniques can no longer

    be used to determine the optima on rough non-linear surfaces. One solution to this problem has

    already been put forward by the evolutionary algorithms research community. Genetic

    algorithm (GA), enunciated by Holland, is one such popular algorithm. This chapter provides

    recent algorithms for evolutionary optimization known as deferential evolution (DE). Thealgorithms are inspired by biological and sociological motivations and can take care of

    optimality on rough, discontinuous and multimodal surfaces. The chapter explores several

    schemes for controlling the convergence behaviors DE by a judicious selection of their

    parameters. Special emphasis is given on the hybridizations DE algorithms with other soft

    computing tools.

    [12]

  • 7/31/2019 Final Thesis Pk12

    13/97

    2.4 GENETIC ALGORITHM:

    Genetic algorithms are a class of evolutionary computing techniques, which is a rapidly

    growing area of artificial intelligence. Genetic algorithms are inspired by Darwin's theory ofevolution. Simply said, problems are solved by an evolutionary process resulting in a best

    (fittest) solution (survivor) - in other words, the solution is evolved. Evolutionary computing

    was introduced in the 1960s by Rechenberg in his work "Evolution strategies" (Evolutions

    strategies'in original). His idea was then developed by other researchers. Genetic Algorithms

    (GAs) were invented by John Holland and developed by him and his students and colleagues .

    This led to Holland's book "Adaption in Natural and Artificial Systems" published in 1975.

    The algorithm begins with a set of solutions (represented by chromosomes) called population.

    Solutions from one population are taken and used to form a new population.

    This is motivated by a hope, that the new population will be better than the old one. Solutions

    which are then selected to form new solutions (offspring) are selected according to their fitness -

    the more suitable they are, the more chances they have to reproduce. This is repeated until some

    condition (for example number of populations or improvement of the best solution) is satisfied.

    2.4.1 OUTLINE OF BASIC GA:

    1. [Start] Generate random population of n chromosomes (suitable solutions for the

    problem)

    2. [Fitness] Evaluate the fitnessf(x) of each chromosomex in the population

    3. [New population] Create a new population by repeating following steps until the new

    population is complete

    4. a [Selection] Select two parent chromosomes from a population according to their

    fitness (the better fitness, the bigger chance to be selected)

    5. [Replace] Use new generated population for a further run of the algorithm6. [Test] If the end condition is satisfied, stop, and return the best solution in current

    7. population

    8. [Loop] Go to step 2

    9. The outline of the Basic GA provided above is very general. There are many parameters

    [13]

  • 7/31/2019 Final Thesis Pk12

    14/97

    and settings that can be implemented differently in various problems. Elitism is often

    used as a method of selection. Which means, that at least one of a generation's best

    solution is copied without changes to a new population, so the best solution can survive

    to the succeeding generation

    a. [Crossover] With a crossover probability cross over the parents to form new

    offspring (children). If no crossover was performed, offspring is the exact copy of

    parents.

    b. [Mutation] With a mutation probability mutate new offspring at each locus (position

    in chromosome).

    c. [Accepting] Place new offspring in the new population

    2.4.2OPERATORS OF GA:OVERVIEW:

    The crossover and mutation are the most important parts of the genetic algorithm. The

    performance is influenced mainly by these two operators.

    ENCODING OF A CHROMOSOME:

    A chromosome should in some way contain information about solution that it represents. The

    most commonly used way of encoding is a binary string. A chromosome then could look like

    this:Table-2.1 (Encoding of a chromosome)

    Each chromosome is represented by a binary string. Each bit in the string can represent some

    characteristics of the solution. There are many other ways of encoding. The encoding depends

    mainly on the problem to be solved. For example, one can encode directly integer or real

    numbers; sometimes it is useful to encode some permutations and so on.

    CROSSOVER:

    [14]

    Chromosome 1 1101100100110110

    Chromosome 2 1101111000011110

  • 7/31/2019 Final Thesis Pk12

    15/97

    Crossover operates on selected genes from parent chromosomes and creates new offspring. The

    simplest way how to do that is to choose randomly some crossover point and copy everything

    before this point from the first parent and then copy everything after the crossover point from

    the other parent. Crossover is illustrated in the following (| is the Crossover point)

    Table-2.2 (crossover of Chromosome)

    Chromosome 1 11011 00100110110

    Chromosome 2 11011 11000011110

    Chromosome 3 11011 11000011110

    Chromosome 4 11011 00100110110

    There are other ways how to make crossover, for example we can choose more crossover points.

    MUTATION:

    Mutation is intended to prevent falling of all solutions in the population into a local optimum of

    the solved problem. Mutation operation randomly changes the offspring resulted fromcrossover. In case of binary encoding we can switch a few randomly chosen bits from 1 to 0 or

    from 0 to 1. Mutation can be then illustrated as follows

    Table-2.3(Mutation operation)

    [15]

  • 7/31/2019 Final Thesis Pk12

    16/97

    Original offspring 1 1101111000011110

    Original offspring 2 1101100100110110

    Original offspring 3 1100111000011110

    Original offspring 4 1101101100110110

    The technique of mutation (as well as crossover) depends mainly on the encoding of

    chromosomes. For example when we are encoding by permutations, mutation could be

    performed as an exchange of two genes.

    2.4.3 PARAMETERS OF GA:

    There are two basic parameters of GA - crossover probability and mutation probability.

    CROSSOVER PROBABILITY:

    It indicates how often crossover will be performed. If there is no crossover, offspring are exactcopies of parents. If there is crossover, offspring are made from parts of both parent's

    chromosome. If crossover probability is 100%, then all

    offspring are made by crossover. If it is 0%, whole new generation is made from exact copies of

    chromosomes from old population (but this does not mean that the new generation is the same!).

    Crossover is made in hope that new chromosomes will contain good parts of old chromosomes

    and therefore the new chromosomes will be better. However, it is good to leave some part of old

    population survives to next generation.

    MUTATION PROBABILITY:

    This signifies how often parts of chromosome will be mutated. If there is no mutation, offspring

    [16]

  • 7/31/2019 Final Thesis Pk12

    17/97

    are generated immediately after crossover (or directly copied) without any change. If mutation

    is performed, one or more parts of a chromosome are changed. If mutation probability is100%,

    whole chromosome is changed, if it is 0%, nothing is changed. Mutation generally prevents the

    GA from falling into local extremes. Mutation should not occur very often, because then GA

    will in fact change to random search.

    OTHER PARAMETERS:

    There are also some other parameters of GA. One another particularly important parameter is

    population size.

    POPULATION SIZE:

    It signifies how many chromosomes are present in population (in one generation). If there aretoo few chromosomes, then GA has few possibilities to perform crossover and only a small part

    of search space is explored. On the other hand, if there are too many chromosomes, then GA

    slows down.

    SELECTION:

    The chromosomes are selected from the population to be parents for crossover. The problem is

    how to select these chromosomes. According to Darwin's theory of evolution, the best ones

    survive to create new offspring. There are many methods in selecting the best chromosomes.

    Examples are roulette wheel selection, Boltzmann selection, tournament selection, rank

    selection, steady state selection and some others. In this thesis we have used the tournament

    selection as it performs better than the others.

    TOURNAMENT SELECTION:

    A selection strategy in GA is simply a process that favors the selection of better individuals in

    the population for the mating pool. There are two important issues in the evolution process of

    genetic search, population diversity and selective pressure. Population diversity means that the

    genes from the already discovered good individuals are exploited while promising the new areas

    of the search space continue to be explored. Selective pressure is the degree to which the better

    individuals are favored. The tournament selection strategy provides selective pressure by

    [17]

  • 7/31/2019 Final Thesis Pk12

    18/97

    holding a tournament competition among individuals.

    2.5 DIFFERENTIAL EVALUATION:

    The aim of optimization is to determine the best-suited solution to a problem under a given setof constraints. Several researchers over the decades have come up with different solutions to

    linear and non-linear optimization problems. Mathematically an optimization problem involves

    a fitness function describing the problem, under a set of constraints representing the solution

    space for the problem. Unfortunately, most of the traditional optimization techniques are

    centered around evaluating the first derivatives to locate the optima on a given constrained

    surface. Because of the difficulties in evaluating the first Derivatives, to locate the optima for

    many rough and discontinuous optimization surfaces, in recent times, several derivative free

    optimization algorithms have emerged. The optimization problem, now-a-days, is represented as

    an intelligent search problem, where one or more agents are employed to determine the optima

    on a search landscape, representing the constrained surface for the optimization problem [20].

    In the later quarter of the twentieth century, Holland pioneered a new concept on evolutionary

    search algorithms, and came up with a solution to the so far open-ended problem to non-linear

    optimization problems. Inspired by the natural adaptations of the biological species, Holland

    echoed the Darwinian Theory through his most popular and well known algorithm, currently

    known as genetic algorithms (GA) [21]. Holland and his coworkers including Goldberg and

    Dejong popularized the theory of GA and demonstrated how biological crossovers and

    mutations of chromosomes can be realized in the algorithm to improve the quality of the

    solutions over successive iterations [22]. In mid 1990s Eberhart and Kennedy enunciated an

    alternative solution to the complex non-linear optimization problem by emulating the collective

    behavior of bird flocks, particles, the boids method of Craig Reynolds [23] and socio-cognition

    and called their brainchild the particle swarm optimization (PSO)[23-27]. Around the same

    time, Price and Storn took a serious attempt to replace the classical crossover and mutationoperators in GA by alternative operators, and consequently came up with a suitable deferential

    operator to handle the problem. They proposed a new algorithm based on this operator, and

    called it deferential evolution (DE) [28].

    Both algorithms do not require any gradient information of the function to be optimized uses

    only primitive mathematical operators and are conceptually very simple. They can be

    [18]

  • 7/31/2019 Final Thesis Pk12

    19/97

    implemented in any computer language very easily and requires minimal parameter tuning.

    Algorithm performance does not deteriorate severely with the growth of the search space

    dimensions as well. These issues perhaps have a great role in the popularity of the algorithms

    within the domain of machine intelligence and cybernetics.

    2.5.1 CLASSICAL DE:

    Like any other evolutionary algorithm, DE also starts with a population ofNPD-dimensional

    search variable vectors. We will represent subsequent generations in DE by discrete time steps

    liket = 0, 1, 2. . . t, t+1, etc. Since the vectors are likely to be changed over different generations

    we may adopt the following notation for representing the ith vector of the population at the

    current generation (i.e., at timet = t) as

    Xi(t)= [xi,1(t), xi,2(t), xi,3(t) . . . . . xi,D (t)] (2.20)

    These vectors are referred in literature as genomes or chromosomes. DE is a very simple

    evolutionary algorithm. For each search-variable, there may be a certain range within which

    value of the parameter should lie for better search results. At the very beginning of a DE run or

    at t= 0, problem parameters or independent variables are Initialized somewhere in their feasible

    numerical range. Therefore, if the jth parameter of the given problem has its lower and upperbound as xLj andxUj respectively, then we may initialize the jth component of the ithpopulation

    members as xi,j (0) =xLj + rand(0, 1) (xUj xLj),where rand (0,1) is a uniformly distributed

    random number lying between 0 and 1. Now in each generation (or one iteration of the

    algorithm) to change each population memberXi(t) (say), a Donor vectorVi(t) is created. It is

    the method of creating this donor vector, which demarcates between the various DE schemes.

    However, here we discuss one such specific mutation strategy known as DE/rand/1. In this

    scheme, to create Vi(t) for each ith member, three other parameter vectors (say the r1, r2, and r3th

    vectors) are chosen in a random fashion from the current population. Next, a scalar numberF

    scales the deference of any two of the three vectors and the scaled deference is added to the

    third one whence we obtain the donor vector Vi(t). We can express the process for thejth

    component of each vector as

    [19]

  • 7/31/2019 Final Thesis Pk12

    20/97

    , 1, 2, 3, .( 1) ( ) .( ( ) ( ))..............i j r j r j r jV t x t F x t x t + = + (2.21)

    The process is illustrated in Fig. 2. Closed curves in Fig. 2denote constant cost contours, i.e., for

    a given cost function f, a contour corresponds to f (X) = constant. Here the constant cost

    contours are drawn for the Ackley Function. Next, to increase the potential diversity of the

    population a crossover scheme comes to play. DE can use two kinds of cross over schemes

    namely Exponential and Binomial. The donor vector exchanges its body parts, i.e.,

    components with the target vectorXi(t) under this scheme. In Exponential crossover, we first

    choose an integern randomly among the numbers [0, D1]. This integer acts as starting point in

    the target vector, from where the crossover or exchange of components with the donor vector

    starts. We also choose another integer L from the interval [1, D]. L denotes the number of

    components; the donor vector actually contributes to the target. After a choice ofn and L thetrial vector

    ,1 ,2 ,( ) [ ( ), ( ),....... ( )]i i i i DU t u t u t u t = (2.22)

    is formed with , ,( ) ( )i j i ju t v t = for j= < n > D, < n+1 > D,..,< n L+1 >D

    = ( )ijx t (2.23)

    Where the angular brackets D denote a modulo function with modulus D. The integer L is

    drawn from [1, D] according to the following pseudo code.

    [20]

  • 7/31/2019 Final Thesis Pk12

    21/97

    Fig. 1.1. Illustrating creation of the donor vector in 2-D parameter space (The

    constant cost contours are for two-dimensional Ackley Function)

    L=0;

    Do

    {

    L=L+1;

    }

    While (rand (0, 1) < CR) AND (L m) = (CR)m1 for any m > 0. CR is called Crossover constant

    and it appears as a control parameter of DE just likeF. For each donor vectorV, a new set ofn

    and L must be chosen randomly as shown above. However, in Binomial crossover scheme,

    the crossover is performed on each of the D variables whenever a randomly picked numberbetween 0 and 1 is within the CR value. The scheme may be outlined as

    ui,j (t) =vi,j (t) if rand (0, 1) < CR,

    [21]

  • 7/31/2019 Final Thesis Pk12

    22/97

    =xi,j (t) else. (2.26)

    In this way for each trial vectorXi(t) an offspring vectorUi(t) is created. To keep the population

    size constant over subsequent generations, the next step of the algorithm calls for selection to

    determine which one of the target vector and the trial vector will survive in the next generation,

    i.e., at time t= t+ 1. DE actually involves the Darwinian principle of Survival of the fittest in

    its selection process which may be outlined as

    Xi(t+ 1) =Ui(t) iff(Ui(t)) f(Xi(t)),

    = Xi(t) iff(Ui(t)) f(Xi(t)) (2.27)

    Where f () is the function to be minimized. So if the new trial vector yields a better value of the

    fitness function, it replaces its target in the next generation; otherwise the target vector is

    retained in the population. Hence the population either gets better (w.r.t. the fitness function) or

    remains constant but never deteriorates. The DE/rand/1 algorithm is outlined below

    2.5.2 PROCEDURE:

    Input: Randomly initialized position and velocity of the particles: xi(0)

    Output: Position of the approximate global optima X

    Begin

    Initialize population;

    Evaluate fitness;

    For i = 0 to max-iteration do

    Begin

    Create Difference-Offspring;

    Evaluate fitness;

    If an offspring is better than its parent

    Then replace the parent by offspring in the next generation;

    End If;

    End For;

    End.

    [22]

  • 7/31/2019 Final Thesis Pk12

    23/97

    2.5.3 THE COMPLETE DE FAMILY:

    Actually, it is the process of mutation, which demarcates one DE scheme from another. In the

    former section, we have illustrated the basic steps of a simple DE. The mutation scheme in

    (2.21) uses a randomly selected vectorXr1 and only one weighted difference vectorF (Xr2

    Xr3) is used to perturb it. Hence, in literature the particular mutation scheme is referred to as

    DE/rand/1. We can now have an idea of how different DE schemes are named. The general

    convention used, is DE/x/y. DE stands for DE, x represents a string denoting the type of the

    vector to be perturbed (whether it is randomly selected or it is the best vector in the population

    with respect to fitness value) and y is the

    number of difference vectors considered for perturbation ofx. Below we outline the other four

    different mutation schemes, suggested by Price et al.

    SCHEME DE/RAND TO BEST/1

    DE/rand to best/1 follows the same procedure as that of the simple DE scheme illustrated

    earlier. The only difference being that, now the donor vector, used to perturb each population

    member, is created using any two randomly selected member of the population as well as the

    best vector of the current generation (i.e., the vector yielding best suited objective function

    value at t= t). This can be expressed for the ith donor vector at time t= t+ 1 as

    Vi(t+ 1) = Xi(t) + (Xbest(t) Xi(t)) + F (Xr2 (t) Xr3(t)) (2.28)

    Where is another control parameter of DE in [0, 2], Xi(t) is the target vector and Xbest(t) is the

    best member of the population regarding fitness at current time step t= t. To reduce the number

    of control parameters a usual choice is to put = F

    SCHEME DE/BEST/1

    In this scheme everything is identical to DE/rand/1 except the fact that the

    trial vector is formed as

    Vi(t+ 1) = Xbest(t) + F (Xr1(t) Xr2(t)) (2.29)

    [23]

  • 7/31/2019 Final Thesis Pk12

    24/97

    here the vector to be perturbed is the best vector of the current population and the perturbation is

    caused by using a single difference vector.

    SCHEME DE/BEST/2

    Under this method, the donor vector is formed by using two difference vectors as shown below:

    Vi(t+ 1) = Xbest(t) + F (Xr1(t) + Xr2(t) Xr3(t) Xr4(t)) (2.30)

    Owing to the central limit theorem the random variations in the parameter vector seems to shift

    slightly into the Gaussian direction which seems to be beneficial for many functions.

    SCHEME DE/RAND/2

    Here the vector to be perturbed is selected randomly and two weighted difference vectors are

    added to the same to produce the donor vector. Thus for each target vector, a totality of five

    other distinct vectors are selected from the rest of the population. The process can be expressed

    in the form of an equation as

    Vi(t

    + 1) =Xr

    1(t) +

    F1

    (Xr

    2(t)

    Xr3(

    t)) +

    F2

    (Xr

    4(t)

    X(t)) (2.31)

    Here F1 and F2 are two weighing factors selected in the range from 0 to 1. To reduce the

    number of parameters we may choose F1 = F2 = F.

    SUMMARY OF ALL SCHEMES:

    In 2001 Storn and Price [21] suggested total ten different working strategies of DE and some

    guidelines in applying these strategies to any given problem. These strategies were derived from

    the five different DE mutation schemes outlined above. Each mutation strategy was combined

    with either the exponential type crossover or the binomial type crossover. This yielded 5

    2 = 10 DE strategies, which are listed below.

    DE/best/1/exp

    [24]

  • 7/31/2019 Final Thesis Pk12

    25/97

    DE/rand/1/exp

    DE/rand-to-best/1/exp

    DE/best/2/exp

    DE/rand/2/exp

    DE/best/1/bin

    DE/rand/1/bin

    DE/rand-to-best/1/bin

    DE/best/2/bin

    DE/rand/2/

    The general convention used above is again DE/x/y/z, where DE stands for DE, x represents a

    string denoting the vector to be perturbed, y is the number of difference vectors considered forperturbation of x, and z stands for the type of crossover being used (exp: exponential; bin:

    binomial)

    2.5.4 MORE RECENT VARIANTS OF DE:

    DE is a stochastic, population-based, evolutionary search algorithm. The strength of the

    algorithm lies in its simplicity, speed (how fast an algorithm can find the optimal or suboptimal

    points of the search space) and robustness (producing nearly same results over repeated runs).The rate of convergence of DE as well as its accuracy can be improved largely by applying

    different mutation and selection strategies. A judicious control of the two key parameters

    namely the scale factor F and the crossover rate CR can considerably alter the performance of

    DE. In what follows we will illustrate some recent medications in DE to make it suitable for

    tackling the most difficult optimization problems.

    DE WITH TRIGONOMETRIC MUTATION:Recently, Lampinen and Fan [29] has proposed a trigonometric mutation operator for DE to

    speed up its performance. To implement the scheme, for each target vector, three distinct

    vectors are randomly selected from the DE population. Suppose for the ith target vectorXi(t),

    the selected population members are Xr1(t), Xr2(t) and Xr3(t). The indices r1, r2 and r3 are mutually

    different and selected from [1, 2. . . N] Where N denotes the population size. Suppose the

    [25]

  • 7/31/2019 Final Thesis Pk12

    26/97

    objective function values of these three vectors are given by, f(Xr1(t)), f(Xr2(t)) and f(Xr3(t)).

    Now three weighing coefficients are formed according to the following equations:

    p = f (Xr1) + f (Xr2) + f (Xr3) (2.32)

    p1 = f (Xr1) p (2.33)

    p2 = f (Xr2) p (2.34)

    p3 = f (Xr3) p (2.35)

    Let rand (0, 1) be a uniformly distributed random number in (0, 1) and be the trigonometric

    mutation rate in the same interval (0, 1). The trigonometric mutation scheme may now beexpressed as

    Vi(t+ 1) = (Xr1 + Xr2 + Xr3)/3 + (p2 p1) (Xr1 Xr2)

    + (p3 p2) (Xr2 Xr3) + (p1 p3) (Xr3 Xr1)

    if rand (0, 1) < (2.36)

    Vi(t+ 1) = Xr1 + F (Xr2 + Xr3) else (2.37)

    Thus, we find that the scheme proposed by Lampinen et al. uses trigonometric mutation with a

    probability of and the mutation scheme of DE/rand/1 with a probability of (1 ).

    DERANDSF (DE WITH RANDOM SCALE FACTOR)

    In the original DE [28] the deference vector (Xr1(t) Xr2(t)) is scaled by a constant factor F.

    The usual choice for this control parameter is a number between 0.4 and 1. We propose to vary

    this scale factor in a random manner in the range (0.5, 1) by using the relation

    F = 0.5 (1 + rand (0, 1)) (2.38)

    [26]

  • 7/31/2019 Final Thesis Pk12

    27/97

    where rand (0, 1) is a uniformly distributed random number within the range [0, 1]. We call this

    scheme DERANDSF (DE with Random Scale Factor) . The mean value of the scale factor is

    0.75. This allows for stochastic variations in the amplification of the difference vector and thus

    helps retain population diversity as the search progresses. Even when the tips of most of the

    population vectors point to locations clustered near a local optimum due to the randomly scaled

    difference vector, a new trial vector has fair chances of pointing at an even better location on the

    multimodal functional surface. Therefore, the

    fitness of the best vector in a population is much less likely to get stagnant until a truly global

    optimum is reached.

    DETVSF (DE WITH TIME VARYING SCALE FACTOR)

    In most population-based optimization methods (except perhaps some hybrid global-local

    methods) it is generally believed to be a good idea to encourage

    Fig. 1.2. Illustrating DETVSF scheme on two-dimensional cost contours of Ackley

    Function

    the individuals (here, the tips of the trial vectors) to sample diverse zones of the search spaceduring the early stages of the search. During the later stages it is important to adjust the

    movements of trial solutions finely so that they can explore the interior of a relatively small

    space in which the suspected global optimum lies. To meet this objective we reduce the value of

    the scale factor linearly with time from a (predetermined) maximum to a (predetermined)

    [27]

  • 7/31/2019 Final Thesis Pk12

    28/97

    minimum value:

    R = (Rmax Rmin)(MAXIT iter)/MAXIT (2.39)

    where Fmax and Fmin are the maximum and minimum values of scale factor F, iteris the current

    iteration number and MAXITis the maximum number of allowable iterations. The locus of the

    tip of the best vector in the population under this scheme may be illustrated as in Fig. 2. The

    resulting algorithm is referred as DETVSF (DE with a time varying scale factor).

    DE WITH LOCAL NEIGHBORHOOD:

    Only in 2006, a new DE-variant, based on the neighborhood topology of the parameter vectors

    was developed [30] to overcome some of the disadvantages of the classical DE versions. Theauthors in proposed a neighborhood-based local mutation operator that draws inspiration from

    PSO. Suppose we have a DE population P= [X1, X2. . . XNp ] where each Xi (i = 1, 2. . . Np) is a

    D-dimensional vector. Now for every vectorXi we define a neighborhood of radius k, consisting

    of vectors Xik . . . Xi . . .Xi+k. We assume the vectors to be organized in a circular fashion such

    that two immediate neighbors of vectorX1 are XNp and X2. For each member of the population

    a local mutation is created by employing the fittest vector in the neighborhood of the model may

    be expressed as:

    Li(t)=Xi(t)+ (Xnbest(t) Xi(t)) + F (Xp(t) Xq (t)) (2.40)

    where the subscript nbestindicates the best vector in the neighborhood ofX i and p, q (i k,

    i + k). Apart from this, we also use a global mutation expressed as:

    Gi(t) =Xi(t) + (Xbest(t) Xi(t)) + F (Xr(t) Xs(t)) (2.41)

    where the subscript best indicates the best vector in the entire population, and r, s (1, NP).

    Global mutation encourages exploitation, since all members (vectors) of a population are biased

    by the same individual (the population best); local mutation, in contrast, favors exploration,

    since in general different members of the population are likely to be biased by different

    individuals. Now we combine these two models using a time-varying scalar weight w (0, 1)

    [28]

  • 7/31/2019 Final Thesis Pk12

    29/97

    to form the actual mutation of the new DE as a weighted mean of the local and the global

    components:

    Vi(t) = w Gi(t) + (1 w) Li(t). (2.42)

    The weight factor varies linearly with time as follows:

    w = wmin + (wmax wmin) iter (2.43)

    Where iter is the current iteration number, MAXIT is the maximum number of iterations

    allowed and wmax, wmin denotes, respectively, the maximum and minimum value of the weight,

    with wmax, wmin (0, 1). Thus the algorithm starts at iter = 0 with w = wmin but as iterincreases towards MAXIT, w increases gradually and ultimately when iter = MAXIT w reaches

    wmax. Therefore at the beginning, emphasis is laid on the local mutation scheme, but with time,

    contribution from the global model increases. In the local model attraction towards a single

    point of the search space is reduced, helping DE avoid local optima. This feature is essential at

    the beginning of the search process when the candidate vectors are expected to explore the

    search space vigorously. Clearly, a judicious choice of wmax and wmin is necessary to strike a

    balance between the exploration and exploitation abilities of the algorithm. After someexperimenting, it was found that wmax = 0.8 and wmin = 0.4 seem to improve the performance

    of the algorithm over a number of benchmark function

    [29]

  • 7/31/2019 Final Thesis Pk12

    30/97

    CHAPTER -3

    ADAPTIVE SYSTEM IDENTIFICATION

    USING GA

    3.1 INTRODUCTION:

    Generally the identification of linear system is performed by using LMS algorithm. But most of

    the dynamic systems exhibit nonlinearity. The LMS based technique [31] does not perform

    satisfactory to identify nonlinear system. To improve the identification performance of

    nonlinear systems various techniques such as Artificial Neural Network (ANN) [32], Functional

    Link Artificial Neural Network (FLANN) [33], Radial Basis Function (RBF) [34], etc.

    In this chapter we propose a novel adaptive model based on GA technique for identification of

    nonlinear systems. To apply GAs in systems identification, each individual in the population

    must represent a model of the plant and the objective becomes a quality measure of the model,

    by evaluating its capacity of predicting the evolution of the measured outputs. The measured

    output predictions, inherent to each individual i, is compared with the measurements made on

    the real plant. The obtained error is a function of the individuals quality. As less is this error, as

    more performing the individual is. There are many ways in which the GAs can be used to solve

    system identification tasks.

    3.2. BASIC PRINCIPLE OF ADAPTIVE SYSTEM

    IDENTIFICATION:

    An adaptive filter can be used in modeling that is, imitating the behavior of physical dynamic

    systems which may be regarded as unknown black boxes having one or more inputs andoutputs. Modeling a single input, single output dynamic system is shown in fig(3).Noise is taken

    into consideration because in many practical cases the system to be modeled is noisy, that is,

    has internal random disturbing forces. Internal system noise appears at the system output and is

    commonly represented there as an additive noise. This noise is generally uncorrelated with the

    plant input. If this is the case and if the adaptive model is an adaptive linear combiner whose

    [30]

  • 7/31/2019 Final Thesis Pk12

    31/97

    weights are adjusted to minimize mean square error, it can be shown that the least squares

    solution will be unaffected by the presence of plant noise. This is not to say that the

    convergence of the adaptive process will be unaffected by system noise, only that the expected

    weight vector of the adaptive model after convergence will be unaffected. The least square

    solution will be determined primarily by the impulse response of the system to be modeled. It

    could also be significantly affected by the statistical or spectral character of the system input

    signal.

    Fig.3.1 Modeling the single input, single output System..

    The problem of determining a mathematical model for an unknown system by observing its

    input-output data is known as system identification, Which is performed by suitably

    adjusting the parameters within a given model, such that for a particular input, the model output

    matches with the corresponding actual system output .After a system is identified, the output

    can be predicted for a given input to the system which is the goal of system identification

    problem. When the plant behavior is completely unknown it may be characterized using certain

    adaptive model and then its identification task is carried out using adaptive algorithms like the

    [31]

    Adaptive model

    Unknown System

    Adaptive Algorithm

    +

    -

    x

    noise

    e

    y

  • 7/31/2019 Final Thesis Pk12

    32/97

    LMS. The system identification task is at the heart of numerous adaptive filtering applications.

    We list several of these applications here.

    Channel Identification

    Plant Identification.

    Echo Cancellation for long distance transmission.

    Acoustic Echo Cancellation

    Adaptive Noise Cancellation.

    Fig .4 represents a schematic diagram of system identification of time invariant, causal discrete

    time dynamic plant The output of the plant is given by y = p(x) where x is the input which is

    uniformly bounded function of time .the operator p describes the dynamic plant . The objective

    of identification problem is to construct model generating an output which approximate the

    plant output y when subjected to the same input x so that the squared error(e2) is minimum .

    Fig.3.2 schematic block diagram of a GA based adaptive identification system

    In this chapter the modeling is done in an adaptive manner such that after training the model

    iteratively y and become almost equal and the squared error becomes almost zero. The

    minimization of error in an iterative manner is usually achieved by LMS or RLS methods which


are basically derivative based. The shortcoming of these methods is that, for certain types of plants, the squared error cannot be optimally minimized because the error surface gets trapped in local minima. In this chapter we propose a novel and elegant method which employs the genetic algorithm to minimize the squared error in a derivative-free manner. In essence, in this chapter the system identification problem is viewed as a squared error minimization problem.

The adaptive modeling consists of two steps. In the first step the model is trained using a GA based updating technique. After successful training of the model, the performance evaluation is carried out by feeding a zero mean, uniformly distributed random input. Before we proceed to the identification task using GA, let us discuss the basics of GA based optimization.

3.3. DEVELOPMENT OF GA BASED ALGORITHM FOR SYSTEM IDENTIFICATION:

Referring to Fig.3.2, let the system p(x) be an FIR system represented by the transfer function

P(z) = a_0 + a_1 z^{-1} + a_2 z^{-2} + a_3 z^{-3} + \cdots + a_n z^{-n}        (3.1)

where a_0, a_1, a_2, ..., a_n represent the impulse response (parameters) of the system. The measurement noise of the system is given by n(k), which is assumed to be white and Gaussian distributed. The input x is uniformly distributed white noise lying between -√3 and +√3 and has a variance of unity. The GA based model consists of an equal order FIR system with unknown coefficients. The purpose of the adaptive identification model is to estimate the unknown coefficients â_0, â_1, â_2, ..., â_n such that they match the corresponding parameters a_0, a_1, a_2, ..., a_n of the actual system P(z). If the system is exactly identified (theoretically) then, in the case of a linear system (for example the FIR system), the system parameters and the model parameters become equal, i.e. a_0 = â_0, a_1 = â_1, a_2 = â_2, ..., a_n = â_n. Also the response of the actual system (y) coincides with the response of the model (ŷ). However, in the case of a nonlinear dynamic system the parameters of the two systems do not match, but their responses will match.

The updating of the parameters of the model is carried out using the GA rule as outlined in the following steps.


I. As shown in Fig.3.2, an unknown static or dynamic system to be identified is connected in parallel with an adaptive model to be developed using GA.

II. The coefficients (â) of the model are initially chosen from a population of M chromosomes. Each chromosome constitutes N·L random binary bits, where each sequential group of L bits represents one coefficient of the adaptive model and N is the number of parameters of the model.

III. Generate K (= 500) input signal samples, each of which is zero mean, uniformly distributed between -√3 and +√3 and of unit variance.

IV. Each of the input samples is passed through the plant P(z) and then contaminated with additive noise of known strength. The resultant signal acts as the desired signal. In this way K desired signals are produced by feeding all the K input samples.

V. Each of the input samples is also passed through the model, using each chromosome as the model parameters, and M sets of K estimated outputs are obtained.

VI. Each desired output is compared with the corresponding estimated output and K errors are produced. The mean square error (MSE) for the set of parameters corresponding to the m-th chromosome is determined using the relation

MSE(m) = \frac{1}{K} \sum_{i=1}^{K} e_i^2        (3.2)

This is repeated M times.

VII. Since the objective is to minimize MSE(m), m = 1 to M, GA based optimization is used.

VIII. The tournament selection, crossover and mutation operators are sequentially carried out, following the steps given in Section 3.3.


IX. In each generation the minimum MSE (MMSE) is obtained and plotted against the generation number to show the learning characteristics.

X. The learning process is stopped when the MMSE reaches its minimum level.

XI. At this stage all the chromosomes attain almost identical genes, which represent the estimated parameters of the developed model.
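To make these steps concrete, the following minimal Python sketch implements GA based identification of the 3-tap FIR plant used in Experiment-1 of the simulation studies below. The population size, bit length per coefficient, crossover and mutation probabilities, the coefficient search range [-1, 1] and the noise scaling are illustrative assumptions, not values taken from the thesis.

    import numpy as np

    rng = np.random.default_rng(0)

    # Plant of Experiment-1; the GA settings below are illustrative assumptions.
    plant = np.array([0.2090, 0.9950, 0.2090])
    N, L, M, K = len(plant), 16, 40, 500        # parameters, bits/parameter, chromosomes, samples
    GENS, PC, PM = 100, 0.8, 0.01               # generations, crossover and mutation probabilities
    noise_std = np.sqrt(10 ** (-30 / 10))       # roughly -30 dB NSR, assuming unit signal power

    x = rng.uniform(-np.sqrt(3), np.sqrt(3), K)                       # zero-mean, unit-variance input
    d = np.convolve(x, plant)[:K] + noise_std * rng.normal(size=K)    # noisy desired signal

    def decode(chrom):
        """Map each group of L bits to one coefficient in [-1, 1]."""
        ints = chrom.reshape(N, L) @ (2 ** np.arange(L)[::-1])
        return -1.0 + 2.0 * ints / (2 ** L - 1)

    def mse(chrom):
        e = d - np.convolve(x, decode(chrom))[:K]
        return np.mean(e ** 2)

    pop = rng.integers(0, 2, (M, N * L))
    for gen in range(GENS):
        cost = np.array([mse(c) for c in pop])
        # tournament selection (size 2, lower MSE wins)
        pairs = rng.integers(0, M, (M, 2))
        winners = np.where(cost[pairs[:, 0]] < cost[pairs[:, 1]], pairs[:, 0], pairs[:, 1])
        parents = pop[winners]
        children = parents.copy()
        # single-point crossover on consecutive pairs
        for i in range(0, M - 1, 2):
            if rng.random() < PC:
                cut = rng.integers(1, N * L)
                children[i, cut:], children[i + 1, cut:] = parents[i + 1, cut:].copy(), parents[i, cut:].copy()
        # bit-flip mutation, then elitism (carry over the best chromosome of this generation)
        children ^= (rng.random(children.shape) < PM)
        children[0] = pop[np.argmin(cost)]
        pop = children

    best = decode(pop[np.argmin([mse(c) for c in pop])])
    print("estimated parameters:", np.round(best, 4))

After convergence the decoded genes of the best chromosome approximate the plant coefficients, which is exactly the role played by the MMSE plot in steps IX-XI.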

3.4. SIMULATION STUDIES:

To demonstrate the performance of the proposed GA based approach, numerous simulation studies are carried out on several linear and nonlinear systems. The performance of the proposed structure is compared with that of the corresponding LMS based structure. The block diagram shown in Fig.3.2 is used for the simulation study.

Case-1 (Linear System)

A unit variance, uniformly distributed random signal lying in the range -√3 to +√3 is applied to the known system having transfer function

Experiment-1: H(z) = 0.2090 + 0.9950 z^{-1} + 0.2090 z^{-2}
Experiment-2: H(z) = 0.2600 + 0.9300 z^{-1} + 0.2600 z^{-2}

The output of the system is contaminated with white Gaussian noise of different strengths, -20 dB and -30 dB. The resultant signal y is used as the desired or training signal. The same random input is also applied to the GA based adaptive model, which has the same linear combiner structure as H(z) but random initial weights. The coefficients or weights of the linear combiner are updated using the LMS algorithm as well as the proposed GA based algorithm. The training is complete when the MSE plot in dB becomes parallel to the x-axis. Under this condition, for a linear system, the parameters a_i match the corresponding estimated parameters â_i of the proposed model.
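For comparison, a minimal sketch of LMS based identification of the same plant is given below; the step size, number of iterations and noise scaling are illustrative assumptions rather than the exact settings used in the thesis.

    import numpy as np

    rng = np.random.default_rng(1)
    plant = np.array([0.2090, 0.9950, 0.2090])      # Experiment-1 coefficients
    mu, n_iter = 0.02, 5000                          # step size and run length (assumed)
    noise_std = np.sqrt(10 ** (-30 / 10))            # roughly -30 dB NSR for a unit-power signal

    w = np.zeros(3)                                  # adaptive linear combiner weights
    x_buf = np.zeros(3)                              # tapped delay line: x(k), x(k-1), x(k-2)
    for k in range(n_iter):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = rng.uniform(-np.sqrt(3), np.sqrt(3))
        d = plant @ x_buf + noise_std * rng.normal() # noisy desired signal
        e = d - w @ x_buf                            # instantaneous error
        w += 2 * mu * e * x_buf                      # LMS weight update
    print("estimated weights:", np.round(w, 4))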


Table-3.1 presents the actual and estimated parameters of the 3-tap linear combiner obtained by the LMS and GA based models. From this table it is observed that the GA based model performs better than the LMS based models under different noise conditions.

Experiment   Actual       Estimated parameters
             parameter    LMS based                     GA based
                          NSR = -30 dB   NSR = -20 dB   NSR = -30 dB   NSR = -20 dB
01           0.2090       0.2092         0.2064         0.2100         0.2061
             0.9950       0.9941         1.0094         0.9943         0.9985
             0.2090       0.2071         0.2153         0.2077         0.2077
02           0.2600       0.2631         0.2705         0.2582         0.2566
             0.9300       0.9308         0.9289         0.9301         0.9342
             0.2600       0.2563         0.2624         0.2598         0.2598

Table-3.1 Comparison of actual and estimated parameters of LMS and GA based models
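One way to quantify the comparison in Table-3.1 is to compute, for each method and noise level, the total absolute deviation of the estimated coefficients from the actual ones. The short Python snippet below does this using only the values listed in the table; the choice of the absolute-deviation measure is an illustrative one.

    import numpy as np

    actual = {"Exp-1": [0.2090, 0.9950, 0.2090], "Exp-2": [0.2600, 0.9300, 0.2600]}
    estimated = {  # rows: coefficients; columns: LMS -30dB, LMS -20dB, GA -30dB, GA -20dB
        "Exp-1": [[0.2092, 0.2064, 0.2100, 0.2061],
                  [0.9941, 1.0094, 0.9943, 0.9985],
                  [0.2071, 0.2153, 0.2077, 0.2077]],
        "Exp-2": [[0.2631, 0.2705, 0.2582, 0.2566],
                  [0.9308, 0.9289, 0.9301, 0.9342],
                  [0.2563, 0.2624, 0.2598, 0.2598]],
    }
    labels = ["LMS -30dB", "LMS -20dB", "GA -30dB", "GA -20dB"]
    for exp, a in actual.items():
        dev = np.abs(np.array(estimated[exp]) - np.array(a)[:, None]).sum(axis=0)
        print(exp, {lab: round(float(d), 4) for lab, d in zip(labels, dev)})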


[Plot: MSE in dB vs. number of iterations (samples); CH: [0.2090, 0.9950, 0.2090], NL = 0; curves for NSR = -20 dB and NSR = -30 dB]

Fig.3.3 Learning Characteristics of LMS based Linear System Identification (Experiment-1)

[Plot: MSE in dB vs. number of iterations (samples); CH: [0.2600, 0.9300, 0.2600], NL = 0; curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.4 Learning Characteristics of LMS based Linear System Identification (Experiment-2)


[Plot: mean square error in dB vs. generation; CH: [0.2090, 0.9950, 0.2090], NL = 0; curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.5 Learning Characteristics of GA based Linear System Identification (Experiment-1)

[Plot: mean square error in dB vs. generation; CH: [0.2600, 0.9300, 0.2600], NL = 0; curves for NSR = -20 dB and NSR = -30 dB]

Fig.3.6 Learning Characteristics of GA based Linear System Identification (Experiment-2)


Case-2 (Non-Linear System)

In this simulation the actual system is assumed to be nonlinear in nature. Computer simulation results of two different nonlinear systems are presented. In this case the actual system is

Experiment-3: y_n(k) = tanh{y(k)}
Experiment-4: y_n(k) = y(k) + 0.2 y^2(k) - 0.1 y^3(k)

where y(k) is the output of the linear system and y_n(k) is the output of the nonlinear system. In the case of a nonlinear system the parameters of the two systems do not match; however, the responses of the actual system and the adaptive model match. To demonstrate this observation, training is carried out using both the LMS and GA based algorithms.

[Plot: MSE in dB vs. number of iterations (samples); CH: [0.2090, 0.9950, 0.2090], NL: y = tanh(y); curves for NSR = -20 dB and NSR = -30 dB]

Fig.3.7 Learning Characteristics of LMS based Non Linear System Identification (Experiment-3)


[Plot: MSE in dB vs. number of iterations (samples); CH: [0.2090, 0.9950, 0.2090], NL: y = y + 0.2y^2 - 0.1y^3; curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.8 Learning Characteristics of LMS based Non Linear System Identification (Experiment-4)

[Plot: mean square error in dB vs. generation; CH: [0.2090, 0.9950, 0.2090], NL: y = tanh(y); curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.9 Learning Characteristics of GA based Non Linear System Identification (Experiment-3)


[Plot: mean square error in dB vs. generation; CH: [0.2090, 0.9950, 0.2090], NL: y = y + 0.2y^2 - 0.1y^3; curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.10 Learning Characteristics of GA based Non Linear System Identification (Experiment-4)

[Plot: output vs. sample index; CH: [0.2090, 0.9950, 0.2090], NL: y = tanh(y); curves for the Actual, GA and LMS responses]

Fig.3.11 Comparison of Output response of (Experiment-3) at -30 dB NSR.


[Plot: output vs. sample index; CH: [0.2090, 0.9950, 0.2090], NL: y = y + 0.2y^2 - 0.1y^3; curves for the Actual, GA and LMS responses]

Fig.3.12 Comparison of Output response of (Experiment-4) at -30 dB NSR.

[Plot: MSE in dB vs. number of iterations (samples); CH: [0.2600, 0.9300, 0.2600], NL: y = tanh(y); curves for NSR = -20 dB and NSR = -30 dB]

Fig.3.13 Learning Characteristics of LMS based Non Linear System Identification (Experiment-3)


[Plot: MSE in dB vs. number of iterations (samples); CH: [0.2600, 0.9300, 0.2600], NL: y = y + 0.2y^2 - 0.1y^3; curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.14 Learning Characteristics of LMS based Non Linear System Identification (Experiment-4)

[Plot: mean square error in dB vs. generation; CH: [0.2600, 0.9300, 0.2600], NL: y = tanh(y); curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.15 Learning Characteristics of GA based Non Linear System Identification (Experiment-3)


[Plot: mean square error in dB vs. generation; CH: [0.2600, 0.9300, 0.2600], NL: y = y + 0.2y^2 - 0.1y^3; curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.16 Learning Characteristics of GA based Non Linear System Identification (Experiment-4)

[Plot: output vs. sample index; CH: [0.2600, 0.9300, 0.2600], NL: y = tanh(y); curves for the Actual, GA and LMS responses]

Fig.3.17 Comparison of Output response of (Experiment-3) at -30 dB NSR.


[Plot: output vs. sample index; CH: [0.2600, 0.9300, 0.2600], NL: y = y + 0.2y^2 - 0.1y^3; curves for the Actual, GA and LMS responses]

Fig.3.18 Comparison of Output response of (Experiment-4) at -30 dB NSR

The MSE plots of Experiment-3 and Experiment-4 (with the linear part of Experiment-1) for two different noise conditions, obtained by simulation of the LMS based algorithm, are shown in Fig.3.7 and Fig.3.8 respectively. The corresponding plots for the same systems using the GA based model are shown in Fig.3.9 and Fig.3.10 respectively. The comparison of output responses of the two nonlinear models using the LMS and GA techniques is shown in Fig.3.11 and Fig.3.12 respectively. Similarly, the MSE plots of Experiment-3 and Experiment-4 (with the linear part of Experiment-2) for the two noise conditions and the LMS based algorithm are shown in Fig.3.13 and Fig.3.14 respectively. The corresponding plots for the GA based model are shown in Fig.3.15 and Fig.3.16 respectively. The comparison of output responses of the two nonlinear models using the LMS and GA techniques is shown in Fig.3.17 and Fig.3.18 respectively. Similar results are also observed for other nonlinear models and under various noise conditions.


    3.5. RESULTS AND DISCUSSIONS:

Table-3.1 reveals that, for the FIR linear system, the coefficients of the adaptive model obtained using LMS match the coefficients of the actual system more closely than those obtained using GA. Hence, for a linear FIR system, LMS works well.

For the nonlinear system the learning characteristics of the LMS technique are poor (Fig.3.7) for both noise cases, but they are much improved in the case of GA (Fig.3.9).

The output response of the nonlinear system (Experiment-3) obtained by GA is better than its LMS counterpart because the GA response is closer to the desired response (Fig.3.11).


    CHAPTER-4

    ADAPTIVE CHANNEL EQUALIZATION

    USING GENETIC ALGORITHM.

    4.1 INTRODUCTION:

The digital communication system suffers from the problem of ISI, which essentially deteriorates the accuracy of reception. The probability of error at the receiver can be minimized and reduced to an acceptable level by introducing an equalizer at the front end of the receiver. An adaptive digital channel equalizer is essentially an inverse system of the channel model which primarily combats the effect of ISI. Conventionally the LMS algorithm is employed to design and develop adaptive equalizers [35]. Such equalizers use a gradient based weight update algorithm, and therefore there is a possibility that during training of the equalizer its weights do not attain their optimal values because the MSE gets trapped in a local minimum. On the other hand, GA and DE are derivative-free techniques and hence the local minima problem does not arise during weight updates. The present chapter develops a novel GA based adaptive channel equalizer.

    4.2 BASIC PRINCIPLE OF CHANNEL EQUALIZATION:

In an ideal communication channel, the received information is identical to that transmitted. However, this is not the case for real communication channels, where signal distortions take place. A channel can interfere with the transmitted data through three types of distorting effects: power degradation and fades, multipath time dispersion, and background thermal noise [36]. Equalization is the process of recovering the data sequence from the corrupted channel samples. A typical baseband transmission system is depicted in Fig.4.1, where an equalizer is incorporated within the receiver.


    Fig. 4.1. A Baseband Communication System

    4.2.1 MULTIPATH PROPAGATION:

Within telecommunication channels multiple paths of propagation commonly occur. In practical terms this is equivalent to transmitting the same signal through a number of separate channels, each having a different attenuation and delay. Consider an open-air radio transmission channel that has three propagation paths, as illustrated in Fig.4.2. These could be direct, earth bound and sky bound. Multipath interference between consecutively transmitted signals will take place if one signal is received whilst the previous signal is still being detected. In Fig.4.1 this would occur if the symbol transmission rate is greater than 1/τ, where τ represents the transmission delay. Because bandwidth efficiency leads to high data rates, multipath interference commonly occurs.


    Fig.4.2 Impulse Response of a transmitted signal in a channel which has 3

    modes of propagation, (a) The signal transmitted paths, (b) The received samples

4.2.2 MINIMUM & NON-MINIMUM PHASE CHANNELS:

When all the roots of H(z) lie within the unit circle, the channel is termed minimum phase. The inverse of a minimum phase channel [37] is convergent, as illustrated by (4.1).


H(z) = 1.0 + 0.5 z^{-1}

H^{-1}(z) = \frac{1}{1.0 + 0.5 z^{-1}} = \sum_{i=0}^{\infty} (-0.5)^{i} z^{-i} = 1 - 0.5 z^{-1} + 0.25 z^{-2} - 0.125 z^{-3} + \cdots        (4.1)

Whereas the inverse of a non-minimum phase channel is not convergent, as shown in (4.2).

H(z) = 0.5 + 1.0 z^{-1}

H^{-1}(z) = \frac{1}{0.5 + 1.0 z^{-1}} = z \sum_{i=0}^{\infty} (-0.5)^{i} z^{i} = z\,[1 - 0.5 z + 0.25 z^{2} - 0.125 z^{3} + \cdots]        (4.2)

Since equalizers are designed to invert the channel distortion process, they will in effect model the channel inverse. The minimum phase channel has a linear inverse model, therefore a linear equalization solution exists. However, limiting the inverse model to m dimensions will


approximate the solution, and it has been shown that nonlinear solutions can provide a superior inverse model in the same dimension.

A linear inverse of a non-minimum phase channel does not exist without incorporating time delays. A time delay creates a convergent series for a non-minimum phase model, where longer delays are necessary to provide a reasonable equalizer. Equation (4.3) describes a non-minimum phase channel with a single delay inverse and a four sample delay inverse. The latter of these is the more suitable form for a linear filter.

H(z) = 0.5 + 1.0 z^{-1}

z^{-1} H^{-1}(z) = 1 - 0.5 z + 0.25 z^{2} - 0.125 z^{3} + \cdots        (non-causal)

z^{-4} H^{-1}(z) \approx z^{-3} - 0.5 z^{-2} + 0.25 z^{-1} - 0.125        (truncated and causal)        (4.3)

The three-tap non-minimum phase channel H(z) = 0.3410 + 0.8760 z^{-1} + 0.3410 z^{-2} is used throughout this thesis for simulation purposes. A channel delay, D, is included to assist in the classification, so that the desired output becomes u(n - D).
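The convergence behaviour discussed above can be checked numerically. The short Python sketch below performs long division of 1 by H(z) to obtain the first few coefficients of the causal inverse; the channel taps follow (4.1) and (4.2), while the number of coefficients printed is an arbitrary choice.

    import numpy as np

    def causal_inverse(h, n_taps):
        """First n_taps coefficients of 1/H(z) expanded in powers of z^-1
        (long division of 1 by h[0] + h[1] z^-1 + ...)."""
        g = np.zeros(n_taps)
        g[0] = 1.0 / h[0]
        for k in range(1, n_taps):
            acc = sum(h[j] * g[k - j] for j in range(1, min(k, len(h) - 1) + 1))
            g[k] = -acc / h[0]
        return g

    print(causal_inverse([1.0, 0.5], 6))   # minimum phase: 1, -0.5, 0.25, ... (converges)
    print(causal_inverse([0.5, 1.0], 6))   # non-minimum phase: 2, -4, 8, ... (diverges)

The diverging second series is exactly why a delayed, truncated inverse such as the one in (4.3) is used for non-minimum phase channels.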

    4.2.3 INTERSYMBOL INTERFERENCE:

Inter-symbol interference (ISI) has already been described as the overlapping of the transmitted data. It is difficult to recover the original data from one channel sample dimension because there is no statistical information about the multipath propagation. Increasing the dimensionality of the channel output vector helps characterize the multipath propagation. This has the effect of not only increasing the number of symbols but also increasing the Euclidean distance between the


    output classes.

    Fig. 4.3 Interaction between two neighboring symbols

When additive Gaussian noise is present within the channel, the input samples will form Gaussian clusters around the symbol centers. These symbol clusters can be characterized by a probability density function (PDF) with a noise variance σ², and the noise can cause the symbol clusters to interfere. Once this occurs, equalization filtering becomes inadequate to classify all of the input samples. Error control coding schemes can be employed in such cases, but these often require extra bandwidth.

4.2.4 SYMBOL OVERLAP:

The expected number of errors can be calculated by considering the amount of symbol interaction, assuming Gaussian noise. Taking any two neighboring symbols, the cumulative distribution function (CDF) can be used to describe the overlap between the two noise characteristics. The overlap is directly related to the probability of error between the two symbols, and if these two symbols belong to opposing classes, a class error will occur.

Fig.4.3 shows two Gaussian functions that could represent two symbol noise distributions. The Euclidean distance, L, between the symbol centers and the noise variance σ² can be used in the


    cumulative distribution function of (4.4) to calculate the area of overlap between the two

    symbol noise distributions and therefore the probability of error, as in (4.5).

\mathrm{CDF}(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right) dx        (4.4)

P(c) = 2\,\mathrm{CDF}\!\left(-\frac{L}{2}\right)        (4.5)

Since each channel symbol is equally likely to occur, the probability of unrecoverable errors occurring in the equalization space can be calculated using the sum of all the CDF overlaps between each pair of opposing class symbols. The probability of error is more commonly described as the BER. Equation (4.6) describes the BER based upon the Gaussian noise overlap, where N_sp is the number of symbols in the positive class, N_m is the number of symbols in the negative class and Δ_i is the distance between the i-th positive symbol and its closest neighboring symbol in the negative class.

\mathrm{BER}(\sigma_n) = \log_2\!\left(\frac{1}{N_{sp}+N_m}\sum_{i=1}^{N_{sp}} \mathrm{CDF}\!\left(-\frac{\Delta_i}{2\sigma_n}\right)\right)        (4.6)
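As a sanity check of the overlap idea, the snippet below evaluates the tail probability of a Gaussian noise cluster beyond the midpoint between two neighbouring symbol centres; the centre spacing and noise standard deviation used here are arbitrary illustrative values, not values from the thesis.

    import math

    L_dist, sigma = 2.0, 0.5          # illustrative symbol spacing and noise standard deviation
    # P(noise pushes a sample past the midpoint) = Q(L/(2*sigma)), written via erfc
    p_pair = 0.5 * math.erfc(L_dist / (2 * sigma * math.sqrt(2)))
    print(f"pairwise error probability ~ {p_pair:.3e}")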


    4.3 CHANNEL EQUALIZATION:

The inverse model of a system having an unknown transfer function is itself a system whose transfer function is in some sense a best fit to the reciprocal of the unknown transfer function. Sometimes the inverse model response contains a delay which is deliberately incorporated to improve the quality of the fit. In Fig.4.4, a source signal s(n) is fed into an unknown system that produces the input signal x(n) for the adaptive filter. The output of the adaptive filter is subtracted from a desired response signal that is a delayed version of the source signal, such that d(n) = s(n - Δ), where Δ is a positive integer value. The goal of the adaptive filter is to adjust its characteristics such that the output signal is an accurate representation of the delayed source signal.

There are many applications of the adaptive inverse model of a system. If the system is a communication channel then the inverse model is an adaptive equalizer which compensates for the effects of inter symbol interference (ISI) caused by the restriction of channel bandwidth [38]. Similarly, if the system is the model of a high density recording medium then its corresponding inverse model reconstructs the recorded data without distortion [39]. If the system represents a nonlinear sensor then its inverse model acts as a compensator of environmental as well as inherent nonlinearities [40]. The adaptive inverse model also finds applications in adaptive control [41] as well as in deconvolution in geophysics applications [42].


    Fig. 4.4: Inverse Modeling

Channel equalization is a technique for decoding signals transmitted across non-ideal communication channels. The transmitter sends a sequence s(n) that is known to both the transmitter and the receiver. However, in equalization, the received signal is used as the input signal x(n) to an adaptive filter, which adjusts its characteristics so that its output closely matches a delayed version s(n - Δ) of the known transmitted signal. After a suitable adaptation period, the coefficients of the system either are fixed and used to decode future transmitted messages, or are adapted using a crude estimate of the desired response signal that is computed from y(n). This latter mode of operation is known as decision-directed adaptation. Channel equalization is one of the first applications of adaptive filters and is described in the pioneering work of Lucky. Today it remains one of the most popular uses of an adaptive filter. Practically every computer telephone modem transmitting at rates of 9600 bits per second or greater contains an adaptive equalizer. Adaptive equalization is also useful for wireless communication systems. Qureshi [43] has written an excellent tutorial on adaptive equalization. A problem related to equalization is deconvolution, which appears in the context of geophysical exploration.
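To illustrate the training arrangement just described, the following minimal Python sketch trains an LMS based transversal equalizer on the three-tap channel used in this thesis. The equalizer length, decision delay, step size and noise level are illustrative assumptions; a GA based trainer could replace the LMS update with the kind of evolutionary search described in Chapter 3.

    import numpy as np

    rng = np.random.default_rng(2)
    h = np.array([0.3410, 0.8760, 0.3410])      # three-tap channel used in this thesis
    m, D, mu, noise_std = 8, 4, 0.01, 0.1       # equalizer length, delay, step size, noise (assumed)

    N_train = 20000
    s = rng.choice([-1.0, 1.0], N_train)                                     # BPSK training symbols
    x = np.convolve(s, h)[:N_train] + noise_std * rng.normal(size=N_train)   # received samples

    w = np.zeros(m)
    for n in range(m, N_train):
        u = x[n - m + 1:n + 1][::-1]            # equalizer input vector Y(n)
        e = s[n - D] - w @ u                    # error against the delayed training symbol
        w += 2 * mu * e * u                     # LMS weight update

    # quick test with hard decisions on fresh data
    N_test = 5000
    s_t = rng.choice([-1.0, 1.0], N_test)
    x_t = np.convolve(s_t, h)[:N_test] + noise_std * rng.normal(size=N_test)
    dec = np.sign([w @ x_t[n - m + 1:n + 1][::-1] for n in range(m, N_test)])
    print("BER:", np.mean(dec != s_t[m - D:N_test - D]))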


In many control tasks, the frequency and phase characteristics of the plant hamper the convergence behavior and stability of the control system. We can use an adaptive filter as shown in Fig.4.4 to compensate for the non-ideal characteristics of the plant and as a method for adaptive control. In this case, the signal s(n) is the output of the controller, and the signal x(n) is the signal measured at the output of the plant. The coefficients of the adaptive filter are then adjusted so that the cascade of the plant and the adaptive filter can be nearly represented by the pure delay z^{-Δ}.

Transmission and storage of high density digital information play an important role in the present age of information technology. Digital information obtained from audio, video or text sources needs high density storage or transmission through communication channels. Communication channels and recording media are often modeled as band-limited channels for which the channel impulse response is that of an ideal low pass filter. When sequences of symbols are transmitted or recorded, the low pass filtering of the channel distorts the transmitted symbols over successive time intervals, causing symbols to spread and overlap with adjacent symbols. This resulting linear distortion is known as inter symbol interference. In addition, nonlinear distortion is caused by cross talk in the channel and the use of amplifiers. In the data storage channel, the binary data is stored in the form of tiny magnetized regions called bit cells, arranged along the recording track. At read back, noise and nonlinear distortions (ISI) corrupt the signal. An ANN based equalization technique has been proposed to alleviate the ISI present during read back from the magnetic storage channel. Recently, Sun et al. [44] have reported an improved Viterbi detector to compensate for the nonlinearities and media noise. Thus adaptive channel equalizers play an important role in recovering digital information from digital communication channels and storage media. Preparata had suggested a simple and attractive scheme for dispersal recovery of digital information based on the discrete Fourier transform. Subsequently Gibson et al. reported an efficient nonlinear ANN structure for reconstructing digital signals which have passed through a dispersive channel and been corrupted with additive noise. In a recent publication the authors have proposed optimal preprocessing strategies for perfect reconstruction of binary signals from dispersive communication channels. Touri et al. have developed a deterministic worst case framework for perfect reconstruction of discrete data transmission through a dispersive communication channel. In the recent past, new adaptive equalizers have been suggested using soft computing tools such as the artificial neural network


(ANN), the polynomial perceptron network (PPN) and the functional link artificial neural network (FLANN). It is reported that these methods are best suited for nonlinear and complex channels. Recently, the Chebyshev artificial neural network has also been proposed for nonlinear channel equalization [45]. The drawback of these methods is that the estimated weights are likely to fall into local minima during training. For this reason the genetic algorithm (GA) [46] and differential evolution [19] have been suggested for training adaptive channel equalizers. The main attraction of GA lies in the fact that it does not rely on Newton-like gradient-descent methods, and hence there is no need for the calculation of derivatives. This makes it less likely to be trapped in local minima. However, only two parameters of GA, the crossover and the mutation, help to avoid the local minima problem.

    4.3.1 TRANSVERSAL EQUALIZER:

The transversal equalizer uses a time-delay vector, Y(n) (4.7), of channel output samples to determine the symbol class. The {m}-TE notation used to represent the transversal equalizer specifies m inputs. The equalizer filter output is classified through a threshold activation device (Fig.4.5) so that the equalizer decision belongs to one of the BPSK states u(n) ∈ {-1, +1}.

    Y (n) = [y (n), y (n 1)... y (n (m