final thesis pk12

Upload: suman-pradhan

Post on 05-Apr-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/31/2019 Final Thesis Pk12

    1/97

    CHAPTER-1

    INTRODUCTION:

    1.1 BACK GROUND:

    Outof many applications of adaptive filtering, direct modeling and inverse modeling are very

    important. The direct modeling or system identification finds applications in control system

    engineering including robotics [1], intelligent sensor design [2], process control [3], powersystem engineering [4], image and speech processing [4], geophysics [5], acoustic noise and

    vibration control [6] and biomedical engineering [7]. Similarly inverse modeling technique is

    used in digital data reconstruction [8], channel equalization in digital communication [9], digital

    magnetic data recording [10], and intelligent sensor [2], deconvolution of seismic data [11]. The

    direct modeling mainly refers to adaptive identification of unknown plants. Simple static linear

    plants are easily identified through parameter estimation using conventional derivative based

    least mean square (LMS) type algorithms [12]. But most of the practical plants are dynamic,

    nonlinear and combination of these two characteristics. In many applications Hammerstein and

    MIMO plants need identification. In addition the output of the plant is associated with

    measurement or additive white Gaussian noise (AWGN). Identification of such complex plants

    is a difficult task and poses many challenging problems. Similarly inverse modeling of

    telecommunication and magnetic medium channels are also important for reducing the effect of

    inter symbol interference (ISI) and achieving faithful reconstruction of original data. Similarly

    adaptive inverse modeling of sensors is required to extend their linearity's for direct digital

    readout and enhancement of dynamic range. These two important and complex issues are

    addressed in the thesis and attempts have been made to provide improved efficient and alternate

    promising solutions.

    [1]

  • 7/31/2019 Final Thesis Pk12

    2/97

    The conventional LMS and recursive least square (RLS) [13] techniques work well for

    identification of static plants but when the plants are of dynamic type, the existing forward-

    backward LMS [14] and the RLS algorithms very often lead to non optimal solution due to

    premature convergence of weights to local minima [15]. This is a major drawback of the use of

    existing derivative based techniques. To alleviate this burning issue this thesis suggests the use

    of derivative free optimization techniques in place of conventional techniques.

    In recent past population based optimization techniques have been reported which fall under the

    category of evolutionary computing [16] or computational intelligence [17]. These are also

    called bio-inspired techniques which include genetic algorithm (GA) and its variants [18],

    Differential Evolution [19]. These techniques are suitably employed to obtain efficient iterative

    learning algorithms for developing adaptive direct and inverse models of complex plants and

    channels.Development of direct and inverse adaptive models essentially consists of two components. The

    first component is an adaptive network which may be linear or nonlinear in nature. Use of a

    nonlinear network is preferable when nonlinear plants or channels are to be identified or

    equalized. The linear networks used in the thesis are adaptive linear combiner or all-zero or FIR

    structure [7]under nonlinear category GA and DE are used.

    1.2 MOTIVATIONIn summary the main motivations of the research work carried in the present thesis are the

    following:

    i. To formulate the direct and inverse modeling problems as error square optimization

    problems

    ii. To introduce bio-inspired optimization tools such as GA and DE and their variants to

    efficiently minimize the squared error cost function of the models. In other words todevelop alternate identification scheme.

    iii. To achieve improved identification (direct modeling) of complex nonlinear and channel

    equalization (inverse modeling) of nonlinear noisy digital channels by introducing new

    and improved updating algorithms.

    [2]

  • 7/31/2019 Final Thesis Pk12

    3/97

    1.3 MAJOR CONTRIBUTION OF THE THESIS

    The major contribution of the thesis is outlined below

    i. The GA based approach for both linear and nonlinear system identifications are

    introduced. The GA based approach is found to be more efficient for nonlinear system

    than other standard derivative based learning. In addition the DE based identification

    have been proposed and shown to have better performance and involve less

    computational complexity.

    ii. The GA based approach for linear and nonlinear channel equalizations are introduced.

    The GA based approach is found to be more efficient than other standard derivative

    based learning. In addition DE based equalizers have been proposed and shown to have

    better performance and involve less computational complexity.

    1.4 CHAPTER WISE CONTRIBUTION

    The research work undertaken is embodied in 7 Chapters.

    Chapter-1 gives an introduction to System identification, channel equalization and reviews

    of various learning algorithm such as Least-mean-square (LMS) algorithm, Recursive-

    least-square (RLS) algorithm, Artificial Neural Network (ANN), Genetic Algorithm (GA),

    Differential Evolution (DE) used to identify the system and train to the equalizer. It also

    includes the motivation behind undertaking the thesis work.

    Chapter-2 Discusses about the general form of adaptive algorithm, Adaptive filtering

    problem, derivative based algorithm such as LMS and overview of derivative free basedalgorithm such as Genetic Algorithm and Differential Evolution.

    Chapter-3 Discusses various system identification technique, Develop the algorithm of GA

    for simulation on system identification and taking a comparison study between LMS and

    GA on both linear and nonlinear system.

    [3]

  • 7/31/2019 Final Thesis Pk12

    4/97

    Chapter-4 Discusses various channel equalization technique, Develop the algorithm of GA

    for simulation on channel equalization and taking a comparison between LMS and GA on

    both linear and nonlinear channel.

    Chapter-5 Develop the algorithm of DE for simulation on system identification and taking

    a comparison between LMS, GA and DE on both linear and nonlinear system.

    Chapter-6 Develop the algorithm of DE for simulation on channel equalization and taking a

    comparison between LMS, GA and DE on both linear and nonlinear channel equalizers.

    Chapter-7 deals with the conclusion of the investigation made in the thesis. This chapter

    also suggests some future research related to the topic.

    [4]

  • 7/31/2019 Final Thesis Pk12

    5/97

    CHAPTER-2GENETIC ALGORITHM AND

    DIFFERENTIAL EVOLUTION

    2.1 INTRODUCTION:

    There are many learning algorithms which are employed to train various adaptive models. The

    performance of these models depends on rate of convergence, training time, Computational

    complexity involved and minimum mean square error achieved after training. The learning

    algorithms may be broadly classified into two categories (a) derivative based (b) derivative free.

    The derivative based algorithms include least means square (LMS), IIR LMS (ILMS), back

    propagation (BP) and FLANN-LMS. Under the derivative free algorithms, genetic algorithm

    (GA), differential evolution (DE), particle swarm optimization (PSO), bacterial foraging

    optimization (BFO) and artificial immune system (AIS) have been employed. In this section the

    details of LMS, GA and DE algorithms are outlined in sequel.

    2.2 GRADIENT BASED ADAPTIVE ALGORITHIM:An adaptive algorithm is a procedure for adjusting the parameters of an adaptive filter to

    minimize a cost function chosen for the task at hand. In this section, we describe the general

    form of many adaptive FIR filtering algorithms and present a simple derivation of the LMS

    adaptive algorithm. In our discussion, we only consider an adaptive FIR filter structure. Such

    systems are currently more popular than adaptive IIR filters because

    1. The input-output stability of the FIR filter structure is guaranteed for any set of fixed

    coefficients, and2. The algorithms for adjusting the coefficients of FIR filters are simpler in general than

    those for adjusting the coefficients of IIR filters.

    [5]

  • 7/31/2019 Final Thesis Pk12

    6/97

    2.2.1 GENERAL FORM OF ADAPTIVE FIR ALGORITHMS:

    The general form of an adaptive FIR filtering algorithm is

    W(n+1)=W(n) + (n)G(e(n),X(n),(n)) (2.1)

    Where G(-) is a particular vector-valued nonlinear function, (n) is a step sizeparameter, e(n)

    and X(n) are the error signal and input signal vector, respectively, and (n) is a vector of states

    that store pertinent information about the characteristics of the input and error signals and/or the

    coefficients at previous time instants. In the simplest algorithms, (n) is not used, and the only

    information needed to adjust the coefficients at time n are the error signal, input signal vector,

    and step size.

    The step size is so called because it determines the magnitude of the change or step that is

    taken by the algorithm in iteratively determining a useful coefficient vector. Much research

    effort has been spent characterizing the role that (n) plays in the performance of adaptive

    filters in terms of the statistical or frequency characteristics of the input and desired response

    signals. Often, success or failure of an adaptive filtering application depends on how the value

    of(n) is chosen or calculated to obtain the best performance from the adaptive filter.

    2.2.2 THE MEAN-SQUARED ERROR COST FUNCTION:

    The form ofG(-) depends on the cost function chosen for the given adaptive filtering task. We

    now consider one particular cost function that yields a popular adaptive algorithm. Define the

    mean-squared error(MSE) cost function as

    ( ) ( )21( ) ( ) ( )2mse nJ n e n p e n de n+

    = (2.2)

    = ( )212Ee n (2.3)

    [6]

  • 7/31/2019 Final Thesis Pk12

    7/97

    Wherepn(e) represents the probability density function of the error at time n. and E- is short

    hand for the expectation integral on the right hand side (2.3).

    The MSE cost function is useful for adaptive FIR filters because

    Jmse(n) has a well-defined minimum with respect to the parameters in W(n)

    The coefficient values obtained at this minimum are the ones that minimize the power in

    the error signal e(n), indicating that y(n) has approached d(n) and

    Jmse(n) a smooth function of each of the parameters in W(n), such that it is differentiable

    with respect to each of the parameters in W(n).

    The third point is important in that it enables us to determine both the optimum coefficient

    values given knowledge of the statistics of d(n) and x(w) as well as a simple iterative procedure

    for adjusting the parameters of an FIR filter.

    2.2.3 THE WIENER SOLUTION:

    For the FIR filter structure, the coefficient values in W(n) that minimizeJM SE (n) are well-

    defined if the statistics of the input and desired response signals are known. The formulation of

    this problem for continuous-time signals and the resulting solution was first derived by Wiener.

    Hence, this optimum coefficient vector WM SE (n) is often called the Wiener solution to the

    adaptive filtering problem. The extension of Wieners analysis to the discrete-time case is

    attributed to Levinson . To determine WM SE (n) we note that the function JM SE (n) in is quadraticin the parameters{wi(n)}, and the function is also differentiable. Thus, we can use a result from

    optimization theory that states that the derivatives of a smooth cost function with respect to each

    of the parameters is zero at a minimizing point on the cost function error surface. Thus,WM SE

    (n) can be found from the solution to the system of equation

    ( ) 0.0 ( ) ( )

    ( )

    mse

    i

    J ni L

    w n

    =

    (2.4)

    Taking derivative ofJmse(n) in (2.3) and noting that e(n)=d(n) - y(n) and

    ( ) ( )1

    0

    ( ) ( ) ( ) TL

    ii

    w n X ny n w n x n i

    =

    == respectively, we obtain

    [7]

  • 7/31/2019 Final Thesis Pk12

    8/97

    ( ) ( ( ))( )( ) ( )

    mse

    i i

    J n e nE e n

    w n w n

    = (2.5)

    ( )( ) ( )iy nEe nw n

    = (2.6)

    [ ( ) ( )]E e n x n i= (2.7)

    1

    0( ) ( ) ( ) ( ) ( )

    L

    jj

    E d n x n i E x n i x n j w n

    =

    = (2.8)

    Where we have used the dentitions of e(n) and of y(n) for the FIR filter structure in and

    respectively to expand the last result in (2.8).

    By defining the matrixRXX (n) and vectorPdx (n) as

    ( ) ( )TXX

    R E X n X n = (2.9)

    ( ) ( ) ( )dx

    P n E d n X n = (2.10)

    respectively, we can combine the above equations to obtain the system of equations in vector

    form as

    ( ) ( ) ( ) 0XX MSE dxR n W n P n = (2.11)Where 0 is the zero vector. Thus, so long as the matrix RXX (n) is invertible, the optimum

    Wiener solution vector for this problem is

    [8]

  • 7/31/2019 Final Thesis Pk12

    9/97

    1( ) ( ) ( )XXMSE dx

    W n R n P n= (2.12)

    2.2.4THEMETHODOFSTEEPESTDESCENT:

    The method of steepest descent is a celebrated optimization procedure for minimizing the value

    of a cost function J(n) with respect to a set of adjustable pa-rameters W(n). This procedure

    adjusts each parameter of the system according to

    ( )( 1) ( ) ( )( )i i i

    J nw n w n n

    w n

    + = (2.13)

    In other words, the Ithparameter of the system is altered according to the derivative of the cost

    function with respect to theIthparameter. Collecting these equations in vector form, we have

    ( )( 1) ( ) ( )( )

    J nW n W n n

    W n

    + = (2.14)

    Where( )

    ( )

    J n

    W n

    be the vector form of( )

    ( )i

    J n

    w n

    For an FIR adaptive filter that minimizes the MSE cost function, we can use the result in toexplicitly give the form of the steepest descent procedure in this problem. Substituting these

    results into yields the update equation for W(n) as

    ( )( 1) ( ) ( ) ( ) ( ) ( )XXdxW n W n n P n R n W n+ = + (2.15)

    However, this steepest descent procedure depends on the statistical quantities E{d(n)x(n i)}

    andE{x(n i)x(n j)} contained inPdx(n) andRxx(n) respectively. In practice, we only have

    measurements of both d(n) and x(n) to be used within the adaptation procedure. While suitable

    estimates of the statistical quantities needed for (2.15) could be determined from the signals x(n)

    and d(n) we instead develop an approximate version of the method of steepest descent that

    depends on the signal values themselves. This procedure is known as theLMSalgorithm.

    [9]

  • 7/31/2019 Final Thesis Pk12

    10/97

    2.2.5 THE LMS ALGORITHM:

    The cost function J(n) chosen for the steepest descent algorithm of determines the coefficient

    solution obtained by the adaptive filter. If the MSE cost function in is chosen, the resultingalgorithm depends on the statistics of x(n) and d(n) because of the expectation operation that

    defines this cost function. Since we typically only have measurements of d(n) and of x(n)

    available to us, we substitute an alternative cost function that depends only on these

    measurements.

    One such cost function is the least-squares cost function given by

    ( )2

    0( ) ( ) ( ) ( ) ( )

    n

    TLMSk

    j n k d k W n X k== (2.16)

    Where ( )ka is a suitable weighting sequence for the terms within the summation. This cost

    function, however, is complicated by the fact that it requires numerous computations to

    calculate its value as well as its derivatives with respect to each W(n), although efficient

    recursive methods for its minimization can be developed. Alternatively, we can propose the

    simplified cost function JLM S(n ) Given by

    21( ) ( )2LMS

    J n e n= (2.17)

    function can be thought of as an instantaneous estimate of the MSE cost function, as

    JMSE(n)=EJLMS(n). Although it might not appear to be useful, the resulting algorithm obtained

    when JLMS (n) is used for J(n) in (2.13) is extremely useful for practical applications. Taking

    derivatives of JLMS (n) with respect to the elements of W(n) and substituting the result into(2.13), we obtain the LMS adaptive algorithm given by

    ( 1) ( ) ( ) ( ) ( )W n W n n e n X n+ = + (2.18)

    [10]

  • 7/31/2019 Final Thesis Pk12

    11/97

    Note that this algorithm is of the general form in. It also requires only multiplications and

    additions to implement. In fact, the number and type of operations needed for the LMS

    algorithm is nearly the same as that of the FIR filter structure with fixed coefficient values,

    which is one of the reasons for the algorithms popularity. The behavior of the LMS algorithm

    has been widely studied, and numerous results concerning its adaptation characteristics under

    different situations have been developed. For now, we indicate its useful behavior by noting that

    the solution obtained by the LMS algorithm near its convergent point is related to the Wiener

    solution. In fact, analyses of the LMS algorithm under certain statistical assumptions about the

    input and desired response signals show that

    lim [ ( )] MSEn E W n W = (2.19)

    When the Wiener solution WM SE (n) is a fixed vector. Moreover, the average behavior of the

    LMS algorithm is quite similar to that of the steepest descent algorithm in that depends

    explicitly on the statistics of the input and desired response signals. In effect, the iterative nature

    of the LMS coefficient updates is a form of time-averaging that smoothes the errors in the

    instantaneous gradient calculations to obtain a more reasonable estimate of the true gradient.

    The problem is that gradient descent is a local optimization technique, which is limited because

    it is unable to converge to the global optimum on a multimodal error surface if the algorithm is

    not initialized in the basin of attraction of the global optimum. Several medications' exist for

    gradient based algorithms in attempt to enable them to overcome local optima. One approach is

    to simply add noise or a momentum term to the gradient computation of the gradient descent

    algorithm to enable it to be more likely to escape from a local minimum. This approach is only

    likely to be successful when the error surface is relatively smooth with minor local minima, or

    some information can be inferred about the topology of the surface such that the additional

    gradient parameters can be assigned accordingly. Other approaches attempt to transform the

    error surface to eliminate or diminish the presence of local minima , which would ideally result

    in a unimodal error surface. The problem with these approaches is that the resulting minimum

    [11]

  • 7/31/2019 Final Thesis Pk12

    12/97

    transformed error used to update the adaptive filter can be biased from the true minimum output

    error and the algorithm may not be able to converge to the desired minimum error condition.

    These algorithms also tend to be complex, slow to converge, and may not be guaranteed to

    emerge from a local minimum. Some work has been done with regard to removing the bias of

    equation error LMS and Steiglitz-McBride adaptive IIR filters, which add further complexity

    with varying degrees of success. Another approach, attempts to locate the global optimum by

    running several LMS algorithms in parallel, initialized with different initial coefficients. The

    notion is that a larger, concurrent sampling of the error surface will increase the likelihood that

    one process will be initialized in the global optimum valley. This technique does have potential,

    but it is inefficient and may still suffer the fate of a standard gradient technique in that it will be

    unable to locate the global optimum if none of the initial estimates is located in the basin of

    attraction of the global optimum. By using a similar congregational scheme, but one in whichinformation is collectively exchanged between estimates and intelligent randomization is

    introduced, structured stochastic algorithms are able to hill-climb out of local minima. This

    enables the algorithms to achieve better, more consistent results using a fewer number of total

    estimates. These types of algorithms provide the framework for the algorithms discussed in the

    following sections.

    2.3 DERIVATIVE FREE BASED ALGORITHIM:

    Since the beginning of the nineteenth century, a significant evolution in optimization theory has

    been noticed. Classical linear programming and traditional non-linear optimization techniques

    such as Lagranges Multiplier, Bellmans principle and Pontyagrins principle were prevalent

    until this century. Unfortunately, these derivative based optimization techniques can no longer

    be used to determine the optima on rough non-linear surfaces. One solution to this problem has

    already been put forward by the evolutionary algorithms research community. Genetic

    algorithm (GA), enunciated by Holland, is one such popular algorithm. This chapter provides

    recent algorithms for evolutionary optimization known as deferential evolution (DE). Thealgorithms are inspired by biological and sociological motivations and can take care of

    optimality on rough, discontinuous and multimodal surfaces. The chapter explores several

    schemes for controlling the convergence behaviors DE by a judicious selection of their

    parameters. Special emphasis is given on the hybridizations DE algorithms with other soft

    computing tools.

    [12]

  • 7/31/2019 Final Thesis Pk12

    13/97

    2.4 GENETIC ALGORITHM:

    Genetic algorithms are a class of evolutionary computing techniques, which is a rapidly

    growing area of artificial intelligence. Genetic algorithms are inspired by Darwin's theory ofevolution. Simply said, problems are solved by an evolutionary process resulting in a best

    (fittest) solution (survivor) - in other words, the solution is evolved. Evolutionary computing

    was introduced in the 1960s by Rechenberg in his work "Evolution strategies" (Evolutions

    strategies'in original). His idea was then developed by other researchers. Genetic Algorithms

    (GAs) were invented by John Holland and developed by him and his students and colleagues .

    This led to Holland's book "Adaption in Natural and Artificial Systems" published in 1975.

    The algorithm begins with a set of solutions (represented by chromosomes) called population.

    Solutions from one population are taken and used to form a new population.

    This is motivated by a hope, that the new population will be better than the old one. Solutions

    which are then selected to form new solutions (offspring) are selected according to their fitness -

    the more suitable they are, the more chances they have to reproduce. This is repeated until some

    condition (for example number of populations or improvement of the best solution) is satisfied.

    2.4.1 OUTLINE OF BASIC GA:

    1. [Start] Generate random population of n chromosomes (suitable solutions for the

    problem)

    2. [Fitness] Evaluate the fitnessf(x) of each chromosomex in the population

    3. [New population] Create a new population by repeating following steps until the new

    population is complete

    4. a [Selection] Select two parent chromosomes from a population according to their

    fitness (the better fitness, the bigger chance to be selected)

    5. [Replace] Use new generated population for a further run of the algorithm6. [Test] If the end condition is satisfied, stop, and return the best solution in current

    7. population

    8. [Loop] Go to step 2

    9. The outline of the Basic GA provided above is very general. There are many parameters

    [13]

  • 7/31/2019 Final Thesis Pk12

    14/97

    and settings that can be implemented differently in various problems. Elitism is often

    used as a method of selection. Which means, that at least one of a generation's best

    solution is copied without changes to a new population, so the best solution can survive

    to the succeeding generation

    a. [Crossover] With a crossover probability cross over the parents to form new

    offspring (children). If no crossover was performed, offspring is the exact copy of

    parents.

    b. [Mutation] With a mutation probability mutate new offspring at each locus (position

    in chromosome).

    c. [Accepting] Place new offspring in the new population

    2.4.2OPERATORS OF GA:OVERVIEW:

    The crossover and mutation are the most important parts of the genetic algorithm. The

    performance is influenced mainly by these two operators.

    ENCODING OF A CHROMOSOME:

    A chromosome should in some way contain information about solution that it represents. The

    most commonly used way of encoding is a binary string. A chromosome then could look like

    this:Table-2.1 (Encoding of a chromosome)

    Each chromosome is represented by a binary string. Each bit in the string can represent some

    characteristics of the solution. There are many other ways of encoding. The encoding depends

    mainly on the problem to be solved. For example, one can encode directly integer or real

    numbers; sometimes it is useful to encode some permutations and so on.

    CROSSOVER:

    [14]

    Chromosome 1 1101100100110110

    Chromosome 2 1101111000011110

  • 7/31/2019 Final Thesis Pk12

    15/97

    Crossover operates on selected genes from parent chromosomes and creates new offspring. The

    simplest way how to do that is to choose randomly some crossover point and copy everything

    before this point from the first parent and then copy everything after the crossover point from

    the other parent. Crossover is illustrated in the following (| is the Crossover point)

    Table-2.2 (crossover of Chromosome)

    Chromosome 1 11011 00100110110

    Chromosome 2 11011 11000011110

    Chromosome 3 11011 11000011110

    Chromosome 4 11011 00100110110

    There are other ways how to make crossover, for example we can choose more crossover points.

    MUTATION:

    Mutation is intended to prevent falling of all solutions in the population into a local optimum of

    the solved problem. Mutation operation randomly changes the offspring resulted fromcrossover. In case of binary encoding we can switch a few randomly chosen bits from 1 to 0 or

    from 0 to 1. Mutation can be then illustrated as follows

    Table-2.3(Mutation operation)

    [15]

  • 7/31/2019 Final Thesis Pk12

    16/97

    Original offspring 1 1101111000011110

    Original offspring 2 1101100100110110

    Original offspring 3 1100111000011110

    Original offspring 4 1101101100110110

    The technique of mutation (as well as crossover) depends mainly on the encoding of

    chromosomes. For example when we are encoding by permutations, mutation could be

    performed as an exchange of two genes.

    2.4.3 PARAMETERS OF GA:

    There are two basic parameters of GA - crossover probability and mutation probability.

    CROSSOVER PROBABILITY:

    It indicates how often crossover will be performed. If there is no crossover, offspring are exactcopies of parents. If there is crossover, offspring are made from parts of both parent's

    chromosome. If crossover probability is 100%, then all

    offspring are made by crossover. If it is 0%, whole new generation is made from exact copies of

    chromosomes from old population (but this does not mean that the new generation is the same!).

    Crossover is made in hope that new chromosomes will contain good parts of old chromosomes

    and therefore the new chromosomes will be better. However, it is good to leave some part of old

    population survives to next generation.

    MUTATION PROBABILITY:

    This signifies how often parts of chromosome will be mutated. If there is no mutation, offspring

    [16]

  • 7/31/2019 Final Thesis Pk12

    17/97

    are generated immediately after crossover (or directly copied) without any change. If mutation

    is performed, one or more parts of a chromosome are changed. If mutation probability is100%,

    whole chromosome is changed, if it is 0%, nothing is changed. Mutation generally prevents the

    GA from falling into local extremes. Mutation should not occur very often, because then GA

    will in fact change to random search.

    OTHER PARAMETERS:

    There are also some other parameters of GA. One another particularly important parameter is

    population size.

    POPULATION SIZE:

    It signifies how many chromosomes are present in population (in one generation). If there aretoo few chromosomes, then GA has few possibilities to perform crossover and only a small part

    of search space is explored. On the other hand, if there are too many chromosomes, then GA

    slows down.

    SELECTION:

    The chromosomes are selected from the population to be parents for crossover. The problem is

    how to select these chromosomes. According to Darwin's theory of evolution, the best ones

    survive to create new offspring. There are many methods in selecting the best chromosomes.

    Examples are roulette wheel selection, Boltzmann selection, tournament selection, rank

    selection, steady state selection and some others. In this thesis we have used the tournament

    selection as it performs better than the others.

    TOURNAMENT SELECTION:

    A selection strategy in GA is simply a process that favors the selection of better individuals in

    the population for the mating pool. There are two important issues in the evolution process of

    genetic search, population diversity and selective pressure. Population diversity means that the

    genes from the already discovered good individuals are exploited while promising the new areas

    of the search space continue to be explored. Selective pressure is the degree to which the better

    individuals are favored. The tournament selection strategy provides selective pressure by

    [17]

  • 7/31/2019 Final Thesis Pk12

    18/97

    holding a tournament competition among individuals.

    2.5 DIFFERENTIAL EVALUATION:

    The aim of optimization is to determine the best-suited solution to a problem under a given setof constraints. Several researchers over the decades have come up with different solutions to

    linear and non-linear optimization problems. Mathematically an optimization problem involves

    a fitness function describing the problem, under a set of constraints representing the solution

    space for the problem. Unfortunately, most of the traditional optimization techniques are

    centered around evaluating the first derivatives to locate the optima on a given constrained

    surface. Because of the difficulties in evaluating the first Derivatives, to locate the optima for

    many rough and discontinuous optimization surfaces, in recent times, several derivative free

    optimization algorithms have emerged. The optimization problem, now-a-days, is represented as

    an intelligent search problem, where one or more agents are employed to determine the optima

    on a search landscape, representing the constrained surface for the optimization problem [20].

    In the later quarter of the twentieth century, Holland pioneered a new concept on evolutionary

    search algorithms, and came up with a solution to the so far open-ended problem to non-linear

    optimization problems. Inspired by the natural adaptations of the biological species, Holland

    echoed the Darwinian Theory through his most popular and well known algorithm, currently

    known as genetic algorithms (GA) [21]. Holland and his coworkers including Goldberg and

    Dejong popularized the theory of GA and demonstrated how biological crossovers and

    mutations of chromosomes can be realized in the algorithm to improve the quality of the

    solutions over successive iterations [22]. In mid 1990s Eberhart and Kennedy enunciated an

    alternative solution to the complex non-linear optimization problem by emulating the collective

    behavior of bird flocks, particles, the boids method of Craig Reynolds [23] and socio-cognition

    and called their brainchild the particle swarm optimization (PSO)[23-27]. Around the same

    time, Price and Storn took a serious attempt to replace the classical crossover and mutationoperators in GA by alternative operators, and consequently came up with a suitable deferential

    operator to handle the problem. They proposed a new algorithm based on this operator, and

    called it deferential evolution (DE) [28].

    Both algorithms do not require any gradient information of the function to be optimized uses

    only primitive mathematical operators and are conceptually very simple. They can be

    [18]

  • 7/31/2019 Final Thesis Pk12

    19/97

    implemented in any computer language very easily and requires minimal parameter tuning.

    Algorithm performance does not deteriorate severely with the growth of the search space

    dimensions as well. These issues perhaps have a great role in the popularity of the algorithms

    within the domain of machine intelligence and cybernetics.

    2.5.1 CLASSICAL DE:

    Like any other evolutionary algorithm, DE also starts with a population ofNPD-dimensional

    search variable vectors. We will represent subsequent generations in DE by discrete time steps

    liket = 0, 1, 2. . . t, t+1, etc. Since the vectors are likely to be changed over different generations

    we may adopt the following notation for representing the ith vector of the population at the

    current generation (i.e., at timet = t) as

    Xi(t)= [xi,1(t), xi,2(t), xi,3(t) . . . . . xi,D (t)] (2.20)

    These vectors are referred in literature as genomes or chromosomes. DE is a very simple

    evolutionary algorithm. For each search-variable, there may be a certain range within which

    value of the parameter should lie for better search results. At the very beginning of a DE run or

    at t= 0, problem parameters or independent variables are Initialized somewhere in their feasible

    numerical range. Therefore, if the jth parameter of the given problem has its lower and upperbound as xLj andxUj respectively, then we may initialize the jth component of the ithpopulation

    members as xi,j (0) =xLj + rand(0, 1) (xUj xLj),where rand (0,1) is a uniformly distributed

    random number lying between 0 and 1. Now in each generation (or one iteration of the

    algorithm) to change each population memberXi(t) (say), a Donor vectorVi(t) is created. It is

    the method of creating this donor vector, which demarcates between the various DE schemes.

    However, here we discuss one such specific mutation strategy known as DE/rand/1. In this

    scheme, to create Vi(t) for each ith member, three other parameter vectors (say the r1, r2, and r3th

    vectors) are chosen in a random fashion from the current population. Next, a scalar numberF

    scales the deference of any two of the three vectors and the scaled deference is added to the

    third one whence we obtain the donor vector Vi(t). We can express the process for thejth

    component of each vector as

    [19]

  • 7/31/2019 Final Thesis Pk12

    20/97

    , 1, 2, 3, .( 1) ( ) .( ( ) ( ))..............i j r j r j r jV t x t F x t x t + = + (2.21)

    The process is illustrated in Fig. 2. Closed curves in Fig. 2denote constant cost contours, i.e., for

    a given cost function f, a contour corresponds to f (X) = constant. Here the constant cost

    contours are drawn for the Ackley Function. Next, to increase the potential diversity of the

    population a crossover scheme comes to play. DE can use two kinds of cross over schemes

    namely Exponential and Binomial. The donor vector exchanges its body parts, i.e.,

    components with the target vectorXi(t) under this scheme. In Exponential crossover, we first

    choose an integern randomly among the numbers [0, D1]. This integer acts as starting point in

    the target vector, from where the crossover or exchange of components with the donor vector

    starts. We also choose another integer L from the interval [1, D]. L denotes the number of

    components; the donor vector actually contributes to the target. After a choice ofn and L thetrial vector

    ,1 ,2 ,( ) [ ( ), ( ),....... ( )]i i i i DU t u t u t u t = (2.22)

    is formed with , ,( ) ( )i j i ju t v t = for j= < n > D, < n+1 > D,..,< n L+1 >D

    = ( )ijx t (2.23)

    Where the angular brackets D denote a modulo function with modulus D. The integer L is

    drawn from [1, D] according to the following pseudo code.

    [20]

  • 7/31/2019 Final Thesis Pk12

    21/97

    Fig. 1.1. Illustrating creation of the donor vector in 2-D parameter space (The

    constant cost contours are for two-dimensional Ackley Function)

    L=0;

    Do

    {

    L=L+1;

    }

    While (rand (0, 1) < CR) AND (L m) = (CR)m1 for any m > 0. CR is called Crossover constant

    and it appears as a control parameter of DE just likeF. For each donor vectorV, a new set ofn

    and L must be chosen randomly as shown above. However, in Binomial crossover scheme,

    the crossover is performed on each of the D variables whenever a randomly picked numberbetween 0 and 1 is within the CR value. The scheme may be outlined as

    ui,j (t) =vi,j (t) if rand (0, 1) < CR,

    [21]

  • 7/31/2019 Final Thesis Pk12

    22/97

    =xi,j (t) else. (2.26)

    In this way for each trial vectorXi(t) an offspring vectorUi(t) is created. To keep the population

    size constant over subsequent generations, the next step of the algorithm calls for selection to

    determine which one of the target vector and the trial vector will survive in the next generation,

    i.e., at time t= t+ 1. DE actually involves the Darwinian principle of Survival of the fittest in

    its selection process which may be outlined as

    Xi(t+ 1) =Ui(t) iff(Ui(t)) f(Xi(t)),

    = Xi(t) iff(Ui(t)) f(Xi(t)) (2.27)

    Where f () is the function to be minimized. So if the new trial vector yields a better value of the

    fitness function, it replaces its target in the next generation; otherwise the target vector is

    retained in the population. Hence the population either gets better (w.r.t. the fitness function) or

    remains constant but never deteriorates. The DE/rand/1 algorithm is outlined below

    2.5.2 PROCEDURE:

    Input: Randomly initialized position and velocity of the particles: xi(0)

    Output: Position of the approximate global optima X

    Begin

    Initialize population;

    Evaluate fitness;

    For i = 0 to max-iteration do

    Begin

    Create Difference-Offspring;

    Evaluate fitness;

    If an offspring is better than its parent

    Then replace the parent by offspring in the next generation;

    End If;

    End For;

    End.

    [22]

  • 7/31/2019 Final Thesis Pk12

    23/97

    2.5.3 THE COMPLETE DE FAMILY:

    Actually, it is the process of mutation, which demarcates one DE scheme from another. In the

    former section, we have illustrated the basic steps of a simple DE. The mutation scheme in

    (2.21) uses a randomly selected vectorXr1 and only one weighted difference vectorF (Xr2

    Xr3) is used to perturb it. Hence, in literature the particular mutation scheme is referred to as

    DE/rand/1. We can now have an idea of how different DE schemes are named. The general

    convention used, is DE/x/y. DE stands for DE, x represents a string denoting the type of the

    vector to be perturbed (whether it is randomly selected or it is the best vector in the population

    with respect to fitness value) and y is the

    number of difference vectors considered for perturbation ofx. Below we outline the other four

    different mutation schemes, suggested by Price et al.

    SCHEME DE/RAND TO BEST/1

    DE/rand to best/1 follows the same procedure as that of the simple DE scheme illustrated

    earlier. The only difference being that, now the donor vector, used to perturb each population

    member, is created using any two randomly selected member of the population as well as the

    best vector of the current generation (i.e., the vector yielding best suited objective function

    value at t= t). This can be expressed for the ith donor vector at time t= t+ 1 as

    Vi(t+ 1) = Xi(t) + (Xbest(t) Xi(t)) + F (Xr2 (t) Xr3(t)) (2.28)

    Where is another control parameter of DE in [0, 2], Xi(t) is the target vector and Xbest(t) is the

    best member of the population regarding fitness at current time step t= t. To reduce the number

    of control parameters a usual choice is to put = F

    SCHEME DE/BEST/1

    In this scheme everything is identical to DE/rand/1 except the fact that the

    trial vector is formed as

    Vi(t+ 1) = Xbest(t) + F (Xr1(t) Xr2(t)) (2.29)

    [23]

  • 7/31/2019 Final Thesis Pk12

    24/97

    here the vector to be perturbed is the best vector of the current population and the perturbation is

    caused by using a single difference vector.

    SCHEME DE/BEST/2

    Under this method, the donor vector is formed by using two difference vectors as shown below:

    Vi(t+ 1) = Xbest(t) + F (Xr1(t) + Xr2(t) Xr3(t) Xr4(t)) (2.30)

    Owing to the central limit theorem the random variations in the parameter vector seems to shift

    slightly into the Gaussian direction which seems to be beneficial for many functions.

    SCHEME DE/RAND/2

    Here the vector to be perturbed is selected randomly and two weighted difference vectors are

    added to the same to produce the donor vector. Thus for each target vector, a totality of five

    other distinct vectors are selected from the rest of the population. The process can be expressed

    in the form of an equation as

    Vi(t

    + 1) =Xr

    1(t) +

    F1

    (Xr

    2(t)

    Xr3(

    t)) +

    F2

    (Xr

    4(t)

    X(t)) (2.31)

    Here F1 and F2 are two weighing factors selected in the range from 0 to 1. To reduce the

    number of parameters we may choose F1 = F2 = F.

    SUMMARY OF ALL SCHEMES:

    In 2001 Storn and Price [21] suggested total ten different working strategies of DE and some

    guidelines in applying these strategies to any given problem. These strategies were derived from

    the five different DE mutation schemes outlined above. Each mutation strategy was combined

    with either the exponential type crossover or the binomial type crossover. This yielded 5

    2 = 10 DE strategies, which are listed below.

    DE/best/1/exp

    [24]

  • 7/31/2019 Final Thesis Pk12

    25/97

    DE/rand/1/exp

    DE/rand-to-best/1/exp

    DE/best/2/exp

    DE/rand/2/exp

    DE/best/1/bin

    DE/rand/1/bin

    DE/rand-to-best/1/bin

    DE/best/2/bin

    DE/rand/2/

    The general convention used above is again DE/x/y/z, where DE stands for DE, x represents a

    string denoting the vector to be perturbed, y is the number of difference vectors considered forperturbation of x, and z stands for the type of crossover being used (exp: exponential; bin:

    binomial)

    2.5.4 MORE RECENT VARIANTS OF DE:

    DE is a stochastic, population-based, evolutionary search algorithm. The strength of the

    algorithm lies in its simplicity, speed (how fast an algorithm can find the optimal or suboptimal

    points of the search space) and robustness (producing nearly same results over repeated runs).The rate of convergence of DE as well as its accuracy can be improved largely by applying

    different mutation and selection strategies. A judicious control of the two key parameters

    namely the scale factor F and the crossover rate CR can considerably alter the performance of

    DE. In what follows we will illustrate some recent medications in DE to make it suitable for

    tackling the most difficult optimization problems.

    DE WITH TRIGONOMETRIC MUTATION:Recently, Lampinen and Fan [29] has proposed a trigonometric mutation operator for DE to

    speed up its performance. To implement the scheme, for each target vector, three distinct

    vectors are randomly selected from the DE population. Suppose for the ith target vectorXi(t),

    the selected population members are Xr1(t), Xr2(t) and Xr3(t). The indices r1, r2 and r3 are mutually

    different and selected from [1, 2. . . N] Where N denotes the population size. Suppose the

    [25]

  • 7/31/2019 Final Thesis Pk12

    26/97

    objective function values of these three vectors are given by, f(Xr1(t)), f(Xr2(t)) and f(Xr3(t)).

    Now three weighing coefficients are formed according to the following equations:

    p = f (Xr1) + f (Xr2) + f (Xr3) (2.32)

    p1 = f (Xr1) p (2.33)

    p2 = f (Xr2) p (2.34)

    p3 = f (Xr3) p (2.35)

    Let rand (0, 1) be a uniformly distributed random number in (0, 1) and be the trigonometric

    mutation rate in the same interval (0, 1). The trigonometric mutation scheme may now beexpressed as

    Vi(t+ 1) = (Xr1 + Xr2 + Xr3)/3 + (p2 p1) (Xr1 Xr2)

    + (p3 p2) (Xr2 Xr3) + (p1 p3) (Xr3 Xr1)

    if rand (0, 1) < (2.36)

    Vi(t+ 1) = Xr1 + F (Xr2 + Xr3) else (2.37)

    Thus, we find that the scheme proposed by Lampinen et al. uses trigonometric mutation with a

    probability of and the mutation scheme of DE/rand/1 with a probability of (1 ).

    DERANDSF (DE WITH RANDOM SCALE FACTOR)

    In the original DE [28] the deference vector (Xr1(t) Xr2(t)) is scaled by a constant factor F.

    The usual choice for this control parameter is a number between 0.4 and 1. We propose to vary

    this scale factor in a random manner in the range (0.5, 1) by using the relation

    F = 0.5 (1 + rand (0, 1)) (2.38)

    [26]

  • 7/31/2019 Final Thesis Pk12

    27/97

    where rand (0, 1) is a uniformly distributed random number within the range [0, 1]. We call this

    scheme DERANDSF (DE with Random Scale Factor) . The mean value of the scale factor is

    0.75. This allows for stochastic variations in the amplification of the difference vector and thus

    helps retain population diversity as the search progresses. Even when the tips of most of the

    population vectors point to locations clustered near a local optimum due to the randomly scaled

    difference vector, a new trial vector has fair chances of pointing at an even better location on the

    multimodal functional surface. Therefore, the

    fitness of the best vector in a population is much less likely to get stagnant until a truly global

    optimum is reached.

    DETVSF (DE WITH TIME VARYING SCALE FACTOR)

    In most population-based optimization methods (except perhaps some hybrid global-local

    methods) it is generally believed to be a good idea to encourage

    Fig. 1.2. Illustrating DETVSF scheme on two-dimensional cost contours of Ackley

    Function

    the individuals (here, the tips of the trial vectors) to sample diverse zones of the search spaceduring the early stages of the search. During the later stages it is important to adjust the

    movements of trial solutions finely so that they can explore the interior of a relatively small

    space in which the suspected global optimum lies. To meet this objective we reduce the value of

    the scale factor linearly with time from a (predetermined) maximum to a (predetermined)

    [27]

  • 7/31/2019 Final Thesis Pk12

    28/97

    minimum value:

    R = (Rmax Rmin)(MAXIT iter)/MAXIT (2.39)

    where Fmax and Fmin are the maximum and minimum values of scale factor F, iteris the current

    iteration number and MAXITis the maximum number of allowable iterations. The locus of the

    tip of the best vector in the population under this scheme may be illustrated as in Fig. 2. The

    resulting algorithm is referred as DETVSF (DE with a time varying scale factor).

    DE WITH LOCAL NEIGHBORHOOD:

    Only in 2006, a new DE-variant, based on the neighborhood topology of the parameter vectors

    was developed [30] to overcome some of the disadvantages of the classical DE versions. Theauthors in proposed a neighborhood-based local mutation operator that draws inspiration from

    PSO. Suppose we have a DE population P= [X1, X2. . . XNp ] where each Xi (i = 1, 2. . . Np) is a

    D-dimensional vector. Now for every vectorXi we define a neighborhood of radius k, consisting

    of vectors Xik . . . Xi . . .Xi+k. We assume the vectors to be organized in a circular fashion such

    that two immediate neighbors of vectorX1 are XNp and X2. For each member of the population

    a local mutation is created by employing the fittest vector in the neighborhood of the model may

    be expressed as:

    Li(t)=Xi(t)+ (Xnbest(t) Xi(t)) + F (Xp(t) Xq (t)) (2.40)

    where the subscript nbestindicates the best vector in the neighborhood ofX i and p, q (i k,

    i + k). Apart from this, we also use a global mutation expressed as:

    Gi(t) =Xi(t) + (Xbest(t) Xi(t)) + F (Xr(t) Xs(t)) (2.41)

    where the subscript best indicates the best vector in the entire population, and r, s (1, NP).

    Global mutation encourages exploitation, since all members (vectors) of a population are biased

    by the same individual (the population best); local mutation, in contrast, favors exploration,

    since in general different members of the population are likely to be biased by different

    individuals. Now we combine these two models using a time-varying scalar weight w (0, 1)

    [28]

  • 7/31/2019 Final Thesis Pk12

    29/97

    to form the actual mutation of the new DE as a weighted mean of the local and the global

    components:

    Vi(t) = w Gi(t) + (1 w) Li(t). (2.42)

    The weight factor varies linearly with time as follows:

    w = wmin + (wmax wmin) iter (2.43)

    Where iter is the current iteration number, MAXIT is the maximum number of iterations

    allowed and wmax, wmin denotes, respectively, the maximum and minimum value of the weight,

    with wmax, wmin (0, 1). Thus the algorithm starts at iter = 0 with w = wmin but as iterincreases towards MAXIT, w increases gradually and ultimately when iter = MAXIT w reaches

    wmax. Therefore at the beginning, emphasis is laid on the local mutation scheme, but with time,

    contribution from the global model increases. In the local model attraction towards a single

    point of the search space is reduced, helping DE avoid local optima. This feature is essential at

    the beginning of the search process when the candidate vectors are expected to explore the

    search space vigorously. Clearly, a judicious choice of wmax and wmin is necessary to strike a

    balance between the exploration and exploitation abilities of the algorithm. After someexperimenting, it was found that wmax = 0.8 and wmin = 0.4 seem to improve the performance

    of the algorithm over a number of benchmark function

    [29]

  • 7/31/2019 Final Thesis Pk12

    30/97

    CHAPTER -3

    ADAPTIVE SYSTEM IDENTIFICATION

    USING GA

    3.1 INTRODUCTION:

    Generally the identification of linear system is performed by using LMS algorithm. But most of

    the dynamic systems exhibit nonlinearity. The LMS based technique [31] does not perform

    satisfactory to identify nonlinear system. To improve the identification performance of

    nonlinear systems various techniques such as Artificial Neural Network (ANN) [32], Functional

    Link Artificial Neural Network (FLANN) [33], Radial Basis Function (RBF) [34], etc.

    In this chapter we propose a novel adaptive model based on GA technique for identification of

    nonlinear systems. To apply GAs in systems identification, each individual in the population

    must represent a model of the plant and the objective becomes a quality measure of the model,

    by evaluating its capacity of predicting the evolution of the measured outputs. The measured

    output predictions, inherent to each individual i, is compared with the measurements made on

    the real plant. The obtained error is a function of the individuals quality. As less is this error, as

    more performing the individual is. There are many ways in which the GAs can be used to solve

    system identification tasks.

    3.2. BASIC PRINCIPLE OF ADAPTIVE SYSTEM

    IDENTIFICATION:

    An adaptive filter can be used in modeling that is, imitating the behavior of physical dynamic

    systems which may be regarded as unknown black boxes having one or more inputs andoutputs. Modeling a single input, single output dynamic system is shown in fig(3).Noise is taken

    into consideration because in many practical cases the system to be modeled is noisy, that is,

    has internal random disturbing forces. Internal system noise appears at the system output and is

    commonly represented there as an additive noise. This noise is generally uncorrelated with the

    plant input. If this is the case and if the adaptive model is an adaptive linear combiner whose

    [30]

  • 7/31/2019 Final Thesis Pk12

    31/97

    weights are adjusted to minimize mean square error, it can be shown that the least squares

    solution will be unaffected by the presence of plant noise. This is not to say that the

    convergence of the adaptive process will be unaffected by system noise, only that the expected

    weight vector of the adaptive model after convergence will be unaffected. The least square

    solution will be determined primarily by the impulse response of the system to be modeled. It

    could also be significantly affected by the statistical or spectral character of the system input

    signal.

    Fig.3.1 Modeling the single input, single output System..

    The problem of determining a mathematical model for an unknown system by observing its

    input-output data is known as system identification, Which is performed by suitably

    adjusting the parameters within a given model, such that for a particular input, the model output

    matches with the corresponding actual system output .After a system is identified, the output

    can be predicted for a given input to the system which is the goal of system identification

    problem. When the plant behavior is completely unknown it may be characterized using certain

    adaptive model and then its identification task is carried out using adaptive algorithms like the

    [31]

    Adaptive model

    Unknown System

    Adaptive Algorithm

    +

    -

    x

    noise

    e

    y

  • 7/31/2019 Final Thesis Pk12

    32/97

    LMS. The system identification task is at the heart of numerous adaptive filtering applications.

    We list several of these applications here.

    Channel Identification

    Plant Identification.

    Echo Cancellation for long distance transmission.

    Acoustic Echo Cancellation

    Adaptive Noise Cancellation.

    Fig .4 represents a schematic diagram of system identification of time invariant, causal discrete

    time dynamic plant The output of the plant is given by y = p(x) where x is the input which is

    uniformly bounded function of time .the operator p describes the dynamic plant . The objective

    of identification problem is to construct model generating an output which approximate the

    plant output y when subjected to the same input x so that the squared error(e2) is minimum .

    Fig.3.2 schematic block diagram of a GA based adaptive identification system

    In this chapter the modeling is done in an adaptive manner such that after training the model

    iteratively y and become almost equal and the squared error becomes almost zero. The

    minimization of error in an iterative manner is usually achieved by LMS or RLS methods which


are basically derivative based. The shortcoming of these methods is that, for certain types of plants, the squared error cannot be optimally minimized because the error surface gets trapped in local minima. In this chapter we propose a novel and elegant method which employs the genetic algorithm to minimize the squared error in a derivative-free manner. In essence, in this chapter the system identification problem is viewed as a squared error minimization problem.

The adaptive modeling consists of two steps. In the first step the model is trained using a GA based updating technique. After successful training of the model, the performance evaluation is carried out by feeding a zero mean, uniformly distributed random input. Before we proceed to the identification task using GA, let us discuss the basics of GA based optimization.

3.3. DEVELOPMENT OF GA BASED ALGORITHM FOR SYSTEM IDENTIFICATION:

Referring to Fig.3.2, let the system p(x) be an FIR system represented by the transfer function

P(z) = a_0 + a_1 z^{-1} + a_2 z^{-2} + a_3 z^{-3} + \cdots + a_n z^{-n}        (3.1)

where a_0, a_1, a_2, ..., a_n represent the impulse response (parameters) of the system. The measurement noise of the system is given by n(k), which is assumed to be white and Gaussian distributed. The input x is uniformly distributed white noise lying between -√3 and +√3 and has a variance of unity. The GA based model consists of an equal order FIR system with unknown coefficients. The purpose of the adaptive identification model is to estimate the unknown coefficients â_0, â_1, â_2, ..., â_n such that they match the corresponding parameters a_0, a_1, a_2, ..., a_n of the actual system P(z). If the system is exactly identified (theoretically) then, in the case of a linear system (for example the FIR system), the system parameters and the model parameters become equal, i.e. a_0 = â_0, a_1 = â_1, a_2 = â_2, ..., a_n = â_n. Also the response of the actual system (y) coincides with the response of the model (ŷ). However, in the case of a nonlinear dynamic system the parameters of the two systems do not match, but their responses will match.

The updating of the parameters of the model is carried out using the GA rule as outlined in the following steps.


I. As shown in Fig.3.2, an unknown static or dynamic system to be identified is connected in parallel with an adaptive model to be developed using GA.

II. The coefficients (â) of the model are initially chosen from a population of M chromosomes. Each chromosome constitutes N·L random binary bits, where each sequential group of L bits represents one coefficient of the adaptive model and N is the number of parameters of the model.

III. Generate K (= 500) input signal samples, each of which is zero mean, uniformly distributed between -√3 and +√3 and of unit variance.

IV. Each of the input samples is passed through the plant P(z) and then contaminated with additive noise of known strength. The resultant signal acts as the desired signal. In this way K desired signals are produced by feeding all the K input samples.

V. Each of the input samples is also passed through the model, using each chromosome as the model parameters, and M sets of K estimated outputs are obtained.

VI. Each desired output is compared with the corresponding estimated output and K errors are produced. The mean square error (MSE) for the set of parameters corresponding to the m-th chromosome is determined using the relation

MSE(m) = \frac{1}{K} \sum_{i=1}^{K} e_i^2        (3.2)

This is repeated M times.

VII. Since the objective is to minimize MSE(m), m = 1 to M, GA based optimization is used.

VIII. The tournament selection, crossover and mutation operators are sequentially carried out, following the steps given in Section 3.3.


IX. In each generation the minimum MSE (MMSE) is obtained and plotted against the generation number to show the learning characteristics.

X. The learning process is stopped when the MMSE reaches its minimum level.

XI. At this stage all the chromosomes attain almost identical genes, which represent the estimated parameters of the developed model.
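To make these steps concrete, the following minimal Python sketch implements GA based identification of the 3-tap FIR plant used in Experiment-1 of the simulation studies below. The population size, bit length per coefficient, crossover and mutation probabilities, the coefficient search range [-1, 1] and the noise scaling are illustrative assumptions, not values taken from the thesis.

    import numpy as np

    rng = np.random.default_rng(0)

    # Plant of Experiment-1; the GA settings below are illustrative assumptions.
    plant = np.array([0.2090, 0.9950, 0.2090])
    N, L, M, K = len(plant), 16, 40, 500        # parameters, bits/parameter, chromosomes, samples
    GENS, PC, PM = 100, 0.8, 0.01               # generations, crossover and mutation probabilities
    noise_std = np.sqrt(10 ** (-30 / 10))       # roughly -30 dB NSR, assuming unit signal power

    x = rng.uniform(-np.sqrt(3), np.sqrt(3), K)                       # zero-mean, unit-variance input
    d = np.convolve(x, plant)[:K] + noise_std * rng.normal(size=K)    # noisy desired signal

    def decode(chrom):
        """Map each group of L bits to one coefficient in [-1, 1]."""
        ints = chrom.reshape(N, L) @ (2 ** np.arange(L)[::-1])
        return -1.0 + 2.0 * ints / (2 ** L - 1)

    def mse(chrom):
        e = d - np.convolve(x, decode(chrom))[:K]
        return np.mean(e ** 2)

    pop = rng.integers(0, 2, (M, N * L))
    for gen in range(GENS):
        cost = np.array([mse(c) for c in pop])
        # tournament selection (size 2, lower MSE wins)
        pairs = rng.integers(0, M, (M, 2))
        winners = np.where(cost[pairs[:, 0]] < cost[pairs[:, 1]], pairs[:, 0], pairs[:, 1])
        parents = pop[winners]
        children = parents.copy()
        # single-point crossover on consecutive pairs
        for i in range(0, M - 1, 2):
            if rng.random() < PC:
                cut = rng.integers(1, N * L)
                children[i, cut:], children[i + 1, cut:] = parents[i + 1, cut:].copy(), parents[i, cut:].copy()
        # bit-flip mutation, then elitism (carry over the best chromosome of this generation)
        children ^= (rng.random(children.shape) < PM)
        children[0] = pop[np.argmin(cost)]
        pop = children

    best = decode(pop[np.argmin([mse(c) for c in pop])])
    print("estimated parameters:", np.round(best, 4))

After convergence the decoded genes of the best chromosome approximate the plant coefficients, which is exactly the role played by the MMSE plot in steps IX-XI.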

3.4. SIMULATION STUDIES:

To demonstrate the performance of the proposed GA based approach, numerous simulation studies are carried out on several linear and nonlinear systems. The performance of the proposed structure is compared with that of the corresponding LMS based structure. The block diagram shown in Fig.3.2 is used for the simulation study.

Case-1 (Linear System)

A unit variance, uniformly distributed random signal lying in the range -√3 to +√3 is applied to the known system having transfer function

Experiment-1: H(z) = 0.2090 + 0.9950 z^{-1} + 0.2090 z^{-2}
Experiment-2: H(z) = 0.2600 + 0.9300 z^{-1} + 0.2600 z^{-2}

The output of the system is contaminated with white Gaussian noise of different strengths, -20 dB and -30 dB. The resultant signal y is used as the desired or training signal. The same random input is also applied to the GA based adaptive model, which has the same linear combiner structure as H(z) but random initial weights. The coefficients or weights of the linear combiner are updated using the LMS algorithm as well as the proposed GA based algorithm. The training is complete when the MSE plot in dB becomes parallel to the x-axis. Under this condition, for a linear system, the parameters a_i match the corresponding estimated parameters â_i of the proposed model.
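For comparison, a minimal sketch of LMS based identification of the same plant is given below; the step size, number of iterations and noise scaling are illustrative assumptions rather than the exact settings used in the thesis.

    import numpy as np

    rng = np.random.default_rng(1)
    plant = np.array([0.2090, 0.9950, 0.2090])      # Experiment-1 coefficients
    mu, n_iter = 0.02, 5000                          # step size and run length (assumed)
    noise_std = np.sqrt(10 ** (-30 / 10))            # roughly -30 dB NSR for a unit-power signal

    w = np.zeros(3)                                  # adaptive linear combiner weights
    x_buf = np.zeros(3)                              # tapped delay line: x(k), x(k-1), x(k-2)
    for k in range(n_iter):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = rng.uniform(-np.sqrt(3), np.sqrt(3))
        d = plant @ x_buf + noise_std * rng.normal() # noisy desired signal
        e = d - w @ x_buf                            # instantaneous error
        w += 2 * mu * e * x_buf                      # LMS weight update
    print("estimated weights:", np.round(w, 4))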


Table-3.1 presents the actual and estimated parameters of the 3-tap linear combiner obtained by the LMS and GA based models. From this table it is observed that the GA based model performs better than the LMS based models under different noise conditions.

Experiment   Actual       Estimated parameters
             parameter    LMS based                     GA based
                          NSR = -30 dB   NSR = -20 dB   NSR = -30 dB   NSR = -20 dB
01           0.2090       0.2092         0.2064         0.2100         0.2061
             0.9950       0.9941         1.0094         0.9943         0.9985
             0.2090       0.2071         0.2153         0.2077         0.2077
02           0.2600       0.2631         0.2705         0.2582         0.2566
             0.9300       0.9308         0.9289         0.9301         0.9342
             0.2600       0.2563         0.2624         0.2598         0.2598

Table-3.1 Comparison of actual and estimated parameters of LMS and GA based models
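One way to quantify the comparison in Table-3.1 is to compute, for each method and noise level, the total absolute deviation of the estimated coefficients from the actual ones. The short Python snippet below does this using only the values listed in the table; the choice of the absolute-deviation measure is an illustrative one.

    import numpy as np

    actual = {"Exp-1": [0.2090, 0.9950, 0.2090], "Exp-2": [0.2600, 0.9300, 0.2600]}
    estimated = {  # rows: coefficients; columns: LMS -30dB, LMS -20dB, GA -30dB, GA -20dB
        "Exp-1": [[0.2092, 0.2064, 0.2100, 0.2061],
                  [0.9941, 1.0094, 0.9943, 0.9985],
                  [0.2071, 0.2153, 0.2077, 0.2077]],
        "Exp-2": [[0.2631, 0.2705, 0.2582, 0.2566],
                  [0.9308, 0.9289, 0.9301, 0.9342],
                  [0.2563, 0.2624, 0.2598, 0.2598]],
    }
    labels = ["LMS -30dB", "LMS -20dB", "GA -30dB", "GA -20dB"]
    for exp, a in actual.items():
        dev = np.abs(np.array(estimated[exp]) - np.array(a)[:, None]).sum(axis=0)
        print(exp, {lab: round(float(d), 4) for lab, d in zip(labels, dev)})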


[Plot: MSE in dB vs. number of iterations (samples); CH: [0.2090, 0.9950, 0.2090], NL = 0; curves for NSR = -20 dB and NSR = -30 dB]

Fig.3.3 Learning Characteristics of LMS based Linear System Identification (Experiment-1)

[Plot: MSE in dB vs. number of iterations (samples); CH: [0.2600, 0.9300, 0.2600], NL = 0; curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.4 Learning Characteristics of LMS based Linear System Identification (Experiment-2)


[Plot: mean square error in dB vs. generation; CH: [0.2090, 0.9950, 0.2090], NL = 0; curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.5 Learning Characteristics of GA based Linear System Identification (Experiment-1)

[Plot: mean square error in dB vs. generation; CH: [0.2600, 0.9300, 0.2600], NL = 0; curves for NSR = -20 dB and NSR = -30 dB]

Fig.3.6 Learning Characteristics of GA based Linear System Identification (Experiment-2)


Case-2 (Non-Linear System)

In this simulation the actual system is assumed to be nonlinear in nature. Computer simulation results of two different nonlinear systems are presented. In this case the actual system is

Experiment-3: y_n(k) = tanh{y(k)}
Experiment-4: y_n(k) = y(k) + 0.2 y^2(k) - 0.1 y^3(k)

where y(k) is the output of the linear system and y_n(k) is the output of the nonlinear system. In the case of a nonlinear system the parameters of the two systems do not match; however, the responses of the actual system and the adaptive model match. To demonstrate this observation, training is carried out using both the LMS and GA based algorithms.

[Plot: MSE in dB vs. number of iterations (samples); CH: [0.2090, 0.9950, 0.2090], NL: y = tanh(y); curves for NSR = -20 dB and NSR = -30 dB]

Fig.3.7 Learning Characteristics of LMS based Non Linear System Identification (Experiment-3)


[Plot: MSE in dB vs. number of iterations (samples); CH: [0.2090, 0.9950, 0.2090], NL: y = y + 0.2y^2 - 0.1y^3; curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.8 Learning Characteristics of LMS based Non Linear System Identification (Experiment-4)

[Plot: mean square error in dB vs. generation; CH: [0.2090, 0.9950, 0.2090], NL: y = tanh(y); curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.9 Learning Characteristics of GA based Non Linear System Identification (Experiment-3)


[Plot: mean square error in dB vs. generation; CH: [0.2090, 0.9950, 0.2090], NL: y = y + 0.2y^2 - 0.1y^3; curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.10 Learning Characteristics of GA based Non Linear System Identification (Experiment-4)

[Plot: output vs. sample index; CH: [0.2090, 0.9950, 0.2090], NL: y = tanh(y); curves for the Actual, GA and LMS responses]

Fig.3.11 Comparison of Output response of (Experiment-3) at -30 dB NSR.


[Plot: output vs. sample index; CH: [0.2090, 0.9950, 0.2090], NL: y = y + 0.2y^2 - 0.1y^3; curves for the Actual, GA and LMS responses]

Fig.3.12 Comparison of Output response of (Experiment-4) at -30 dB NSR.

[Plot: MSE in dB vs. number of iterations (samples); CH: [0.2600, 0.9300, 0.2600], NL: y = tanh(y); curves for NSR = -20 dB and NSR = -30 dB]

Fig.3.13 Learning Characteristics of LMS based Non Linear System Identification (Experiment-3)


[Plot: MSE in dB vs. number of iterations (samples); CH: [0.2600, 0.9300, 0.2600], NL: y = y + 0.2y^2 - 0.1y^3; curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.14 Learning Characteristics of LMS based Non Linear System Identification (Experiment-4)

[Plot: mean square error in dB vs. generation; CH: [0.2600, 0.9300, 0.2600], NL: y = tanh(y); curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.15 Learning Characteristics of GA based Non Linear System Identification (Experiment-3)


[Plot: mean square error in dB vs. generation; CH: [0.2600, 0.9300, 0.2600], NL: y = y + 0.2y^2 - 0.1y^3; curves for NSR = -30 dB and NSR = -20 dB]

Fig.3.16 Learning Characteristics of GA based Non Linear System Identification (Experiment-4)

[Plot: output vs. sample index; CH: [0.2600, 0.9300, 0.2600], NL: y = tanh(y); curves for the Actual, GA and LMS responses]

Fig.3.17 Comparison of Output response of (Experiment-3) at -30 dB NSR.


[Plot: output vs. sample index; CH: [0.2600, 0.9300, 0.2600], NL: y = y + 0.2y^2 - 0.1y^3; curves for the Actual, GA and LMS responses]

Fig.3.18 Comparison of Output response of (Experiment-4) at -30 dB NSR

The MSE plots of Experiment-3 and Experiment-4 (with the linear part of Experiment-1) for two different noise conditions, obtained by simulation of the LMS based algorithm, are shown in Fig.3.7 and Fig.3.8 respectively. The corresponding plots for the same systems using the GA based model are shown in Fig.3.9 and Fig.3.10 respectively. The comparison of output responses of the two nonlinear models using the LMS and GA techniques is shown in Fig.3.11 and Fig.3.12 respectively. Similarly, the MSE plots of Experiment-3 and Experiment-4 (with the linear part of Experiment-2) for the two noise conditions and the LMS based algorithm are shown in Fig.3.13 and Fig.3.14 respectively. The corresponding plots for the GA based model are shown in Fig.3.15 and Fig.3.16 respectively. The comparison of output responses of the two nonlinear models using the LMS and GA techniques is shown in Fig.3.17 and Fig.3.18 respectively. Similar results are also observed for other nonlinear models and under various noise conditions.


    3.5. RESULTS AND DISCUSSIONS:

Table-3.1 reveals that, for the FIR linear system, the coefficients of the adaptive model obtained using LMS match the coefficients of the actual system more closely than those obtained using GA. Hence, for a linear FIR system, LMS works well.

For the nonlinear system the learning characteristics of the LMS technique are poor (Fig.3.7) for both noise cases, but they are much improved in the case of GA (Fig.3.9).

The output response of the nonlinear system (Experiment-3) obtained by GA is better than its LMS counterpart because the GA response is closer to the desired response (Fig.3.11).


    CHAPTER-4

    ADAPTIVE CHANNEL EQUALIZATION

    USING GENETIC ALGORITHM.

    4.1 INTRODUCTION:

The digital communication system suffers from the problem of ISI, which essentially deteriorates the accuracy of reception. The probability of error at the receiver can be minimized and reduced to an acceptable level by introducing an equalizer at the front end of the receiver. An adaptive digital channel equalizer is essentially an inverse system of the channel model which primarily combats the effect of ISI. Conventionally the LMS algorithm is employed to design and develop adaptive equalizers [35]. Such equalizers use a gradient based weight update algorithm, and therefore there is a possibility that during training of the equalizer its weights do not attain their optimal values because the MSE gets trapped in a local minimum. On the other hand, GA and DE are derivative-free techniques and hence the local minima problem does not arise during weight updates. The present chapter develops a novel GA based adaptive channel equalizer.

    4.2 BASIC PRINCIPLE OF CHANNEL EQUALIZATION:

In an ideal communication channel, the received information is identical to that transmitted. However, this is not the case for real communication channels, where signal distortions take place. A channel can interfere with the transmitted data through three types of distorting effects: power degradation and fades, multipath time dispersion, and background thermal noise [36]. Equalization is the process of recovering the data sequence from the corrupted channel samples. A typical baseband transmission system is depicted in Fig.4.1, where an equalizer is incorporated within the receiver.


    Fig. 4.1. A Baseband Communication System

    4.2.1 MULTIPATH PROPAGATION:

Within telecommunication channels multiple paths of propagation commonly occur. In practical terms this is equivalent to transmitting the same signal through a number of separate channels, each having a different attenuation and delay. Consider an open-air radio transmission channel that has three propagation paths, as illustrated in Fig.4.2. These could be direct, earth bound and sky bound. Multipath interference between consecutively transmitted signals will take place if one signal is received whilst the previous signal is still being detected. In Fig.4.1 this would occur if the symbol transmission rate is greater than 1/τ, where τ represents the transmission delay. Because bandwidth efficiency leads to high data rates, multipath interference commonly occurs.


    Fig.4.2 Impulse Response of a transmitted signal in a channel which has 3

    modes of propagation, (a) The signal transmitted paths, (b) The received samples

4.2.2 MINIMUM & NON-MINIMUM PHASE CHANNELS:

When all the roots of H(z) lie within the unit circle, the channel is termed minimum phase. The inverse of a minimum phase channel [37] is convergent, as illustrated by (4.1).


H(z) = 1.0 + 0.5 z^{-1}

H^{-1}(z) = \frac{1}{1.0 + 0.5 z^{-1}} = \sum_{i=0}^{\infty} (-0.5)^{i} z^{-i} = 1 - 0.5 z^{-1} + 0.25 z^{-2} - 0.125 z^{-3} + \cdots        (4.1)

Whereas the inverse of a non-minimum phase channel is not convergent, as shown in (4.2).

H(z) = 0.5 + 1.0 z^{-1}

H^{-1}(z) = \frac{1}{0.5 + 1.0 z^{-1}} = z \sum_{i=0}^{\infty} (-0.5)^{i} z^{i} = z\,[1 - 0.5 z + 0.25 z^{2} - 0.125 z^{3} + \cdots]        (4.2)

Since equalizers are designed to invert the channel distortion process, they will in effect model the channel inverse. The minimum phase channel has a linear inverse model, therefore a linear equalization solution exists. However, limiting the inverse model to m dimensions will


approximate the solution, and it has been shown that nonlinear solutions can provide a superior inverse model in the same dimension.

A linear inverse of a non-minimum phase channel does not exist without incorporating time delays. A time delay creates a convergent series for a non-minimum phase model, where longer delays are necessary to provide a reasonable equalizer. Equation (4.3) describes a non-minimum phase channel with a single delay inverse and a four sample delay inverse. The latter of these is the more suitable form for a linear filter.

H(z) = 0.5 + 1.0 z^{-1}

z^{-1} H^{-1}(z) = 1 - 0.5 z + 0.25 z^{2} - 0.125 z^{3} + \cdots        (non-causal)

z^{-4} H^{-1}(z) \approx z^{-3} - 0.5 z^{-2} + 0.25 z^{-1} - 0.125        (truncated and causal)        (4.3)

The three-tap non-minimum phase channel H(z) = 0.3410 + 0.8760 z^{-1} + 0.3410 z^{-2} is used throughout this thesis for simulation purposes. A channel delay, D, is included to assist in the classification, so that the desired output becomes u(n - D).
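The convergence behaviour discussed above can be checked numerically. The short Python sketch below performs long division of 1 by H(z) to obtain the first few coefficients of the causal inverse; the channel taps follow (4.1) and (4.2), while the number of coefficients printed is an arbitrary choice.

    import numpy as np

    def causal_inverse(h, n_taps):
        """First n_taps coefficients of 1/H(z) expanded in powers of z^-1
        (long division of 1 by h[0] + h[1] z^-1 + ...)."""
        g = np.zeros(n_taps)
        g[0] = 1.0 / h[0]
        for k in range(1, n_taps):
            acc = sum(h[j] * g[k - j] for j in range(1, min(k, len(h) - 1) + 1))
            g[k] = -acc / h[0]
        return g

    print(causal_inverse([1.0, 0.5], 6))   # minimum phase: 1, -0.5, 0.25, ... (converges)
    print(causal_inverse([0.5, 1.0], 6))   # non-minimum phase: 2, -4, 8, ... (diverges)

The diverging second series is exactly why a delayed, truncated inverse such as the one in (4.3) is used for non-minimum phase channels.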

    4.2.3 INTERSYMBOL INTERFERENCE:

Inter-symbol interference (ISI) has already been described as the overlapping of the transmitted data. It is difficult to recover the original data from one channel sample dimension because there is no statistical information about the multipath propagation. Increasing the dimensionality of the channel output vector helps characterize the multipath propagation. This has the effect of not only increasing the number of symbols but also increasing the Euclidean distance between the


    output classes.

    Fig. 4.3 Interaction between two neighboring symbols

When additive Gaussian noise is present within the channel, the input samples will form Gaussian clusters around the symbol centers. These symbol clusters can be characterized by a probability density function (PDF) with a noise variance σ², and the noise can cause the symbol clusters to interfere. Once this occurs, equalization filtering becomes inadequate to classify all of the input samples. Error control coding schemes can be employed in such cases, but these often require extra bandwidth.

4.2.4 SYMBOL OVERLAP:

The expected number of errors can be calculated by considering the amount of symbol interaction, assuming Gaussian noise. Taking any two neighboring symbols, the cumulative distribution function (CDF) can be used to describe the overlap between the two noise characteristics. The overlap is directly related to the probability of error between the two symbols, and if these two symbols belong to opposing classes, a class error will occur.

Fig.4.3 shows two Gaussian functions that could represent two symbol noise distributions. The Euclidean distance, L, between the symbol centers and the noise variance σ² can be used in the


    cumulative distribution function of (4.4) to calculate the area of overlap between the two

    symbol noise distributions and therefore the probability of error, as in (4.5).

\mathrm{CDF}(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right) dx        (4.4)

P(c) = 2\,\mathrm{CDF}\!\left(-\frac{L}{2}\right)        (4.5)

Since each channel symbol is equally likely to occur, the probability of unrecoverable errors occurring in the equalization space can be calculated using the sum of all the CDF overlaps between each pair of opposing class symbols. The probability of error is more commonly described as the BER. Equation (4.6) describes the BER based upon the Gaussian noise overlap, where N_sp is the number of symbols in the positive class, N_m is the number of symbols in the negative class and Δ_i is the distance between the i-th positive symbol and its closest neighboring symbol in the negative class.

\mathrm{BER}(\sigma_n) = \log_2\!\left(\frac{1}{N_{sp}+N_m}\sum_{i=1}^{N_{sp}} \mathrm{CDF}\!\left(-\frac{\Delta_i}{2\sigma_n}\right)\right)        (4.6)
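As a sanity check of the overlap idea, the snippet below evaluates the tail probability of a Gaussian noise cluster beyond the midpoint between two neighbouring symbol centres; the centre spacing and noise standard deviation used here are arbitrary illustrative values, not values from the thesis.

    import math

    L_dist, sigma = 2.0, 0.5          # illustrative symbol spacing and noise standard deviation
    # P(noise pushes a sample past the midpoint) = Q(L/(2*sigma)), written via erfc
    p_pair = 0.5 * math.erfc(L_dist / (2 * sigma * math.sqrt(2)))
    print(f"pairwise error probability ~ {p_pair:.3e}")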


    4.3 CHANNEL EQUALIZATION:

The inverse model of a system having an unknown transfer function is itself a system whose transfer function is in some sense a best fit to the reciprocal of the unknown transfer function. Sometimes the inverse model response contains a delay which is deliberately incorporated to improve the quality of the fit. In Fig.4.4, a source signal s(n) is fed into an unknown system that produces the input signal x(n) for the adaptive filter. The output of the adaptive filter is subtracted from a desired response signal that is a delayed version of the source signal, such that d(n) = s(n - Δ), where Δ is a positive integer value. The goal of the adaptive filter is to adjust its characteristics such that the output signal is an accurate representation of the delayed source signal.

There are many applications of the adaptive inverse model of a system. If the system is a communication channel then the inverse model is an adaptive equalizer which compensates for the effects of inter symbol interference (ISI) caused by the restriction of channel bandwidth [38]. Similarly, if the system is the model of a high density recording medium then its corresponding inverse model reconstructs the recorded data without distortion [39]. If the system represents a nonlinear sensor then its inverse model acts as a compensator of environmental as well as inherent nonlinearities [40]. The adaptive inverse model also finds applications in adaptive control [41] as well as in deconvolution in geophysics applications [42].


    Fig. 4.4: Inverse Modeling

Channel equalization is a technique for decoding signals transmitted across non-ideal communication channels. The transmitter sends a sequence s(n) that is known to both the transmitter and the receiver. However, in equalization, the received signal is used as the input signal x(n) to an adaptive filter, which adjusts its characteristics so that its output closely matches a delayed version s(n - Δ) of the known transmitted signal. After a suitable adaptation period, the coefficients of the system either are fixed and used to decode future transmitted messages, or are adapted using a crude estimate of the desired response signal that is computed from y(n). This latter mode of operation is known as decision-directed adaptation. Channel equalization is one of the first applications of adaptive filters and is described in the pioneering work of Lucky. Today it remains one of the most popular uses of an adaptive filter. Practically every computer telephone modem transmitting at rates of 9600 bits per second or greater contains an adaptive equalizer. Adaptive equalization is also useful for wireless communication systems. Qureshi [43] has written an excellent tutorial on adaptive equalization. A problem related to equalization is deconvolution, which appears in the context of geophysical exploration.
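To illustrate the training arrangement just described, the following minimal Python sketch trains an LMS based transversal equalizer on the three-tap channel used in this thesis. The equalizer length, decision delay, step size and noise level are illustrative assumptions; a GA based trainer could replace the LMS update with the kind of evolutionary search described in Chapter 3.

    import numpy as np

    rng = np.random.default_rng(2)
    h = np.array([0.3410, 0.8760, 0.3410])      # three-tap channel used in this thesis
    m, D, mu, noise_std = 8, 4, 0.01, 0.1       # equalizer length, delay, step size, noise (assumed)

    N_train = 20000
    s = rng.choice([-1.0, 1.0], N_train)                                     # BPSK training symbols
    x = np.convolve(s, h)[:N_train] + noise_std * rng.normal(size=N_train)   # received samples

    w = np.zeros(m)
    for n in range(m, N_train):
        u = x[n - m + 1:n + 1][::-1]            # equalizer input vector Y(n)
        e = s[n - D] - w @ u                    # error against the delayed training symbol
        w += 2 * mu * e * u                     # LMS weight update

    # quick test with hard decisions on fresh data
    N_test = 5000
    s_t = rng.choice([-1.0, 1.0], N_test)
    x_t = np.convolve(s_t, h)[:N_test] + noise_std * rng.normal(size=N_test)
    dec = np.sign([w @ x_t[n - m + 1:n + 1][::-1] for n in range(m, N_test)])
    print("BER:", np.mean(dec != s_t[m - D:N_test - D]))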


In many control tasks, the frequency and phase characteristics of the plant hamper the convergence behavior and stability of the control system. We can use an adaptive filter as shown in Fig.4.4 to compensate for the non-ideal characteristics of the plant and as a method for adaptive control. In this case, the signal s(n) is the output of the controller, and the signal x(n) is the signal measured at the output of the plant. The coefficients of the adaptive filter are then adjusted so that the cascade of the plant and the adaptive filter can be nearly represented by the pure delay z^{-Δ}.

Transmission and storage of high density digital information play an important role in the present age of information technology. Digital information obtained from audio, video or text sources needs high density storage or transmission through communication channels. Communication channels and recording media are often modeled as band-limited channels for which the channel impulse response is that of an ideal low pass filter. When sequences of symbols are transmitted or recorded, the low pass filtering of the channel distorts the transmitted symbols over successive time intervals, causing symbols to spread and overlap with adjacent symbols. This resulting linear distortion is known as inter symbol interference. In addition, nonlinear distortion is caused by cross talk in the channel and the use of amplifiers. In the data storage channel, the binary data is stored in the form of tiny magnetized regions called bit cells, arranged along the recording track. At read back, noise and nonlinear distortions (ISI) corrupt the signal. An ANN based equalization technique has been proposed to alleviate the ISI present during read back from the magnetic storage channel. Recently, Sun et al. [44] have reported an improved Viterbi detector to compensate for the nonlinearities and media noise. Thus adaptive channel equalizers play an important role in recovering digital information from digital communication channels and storage media. Preparata had suggested a simple and attractive scheme for dispersal recovery of digital information based on the discrete Fourier transform. Subsequently Gibson et al. reported an efficient nonlinear ANN structure for reconstructing digital signals which have passed through a dispersive channel and been corrupted with additive noise. In a recent publication the authors have proposed optimal preprocessing strategies for perfect reconstruction of binary signals from dispersive communication channels. Touri et al. have developed a deterministic worst case framework for perfect reconstruction of discrete data transmission through a dispersive communication channel. In the recent past, new adaptive equalizers have been suggested using soft computing tools such as the artificial neural network


(ANN), the polynomial perceptron network (PPN) and the functional link artificial neural network (FLANN). It is reported that these methods are best suited for nonlinear and complex channels. Recently, the Chebyshev artificial neural network has also been proposed for nonlinear channel equalization [45]. The drawback of these methods is that the estimated weights are likely to fall into local minima during training. For this reason the genetic algorithm (GA) [46] and differential evolution [19] have been suggested for training adaptive channel equalizers. The main attraction of GA lies in the fact that it does not rely on Newton-like gradient-descent methods, and hence there is no need for the calculation of derivatives. This makes it less likely to be trapped in local minima. However, only two parameters of GA, the crossover and the mutation, help to avoid the local minima problem.

    4.3.1 TRANSVERSAL EQUALIZER:

The transversal equalizer uses a time-delay vector, Y(n) (4.7), of channel output samples to determine the symbol class. The {m}-TE notation used to represent the transversal equalizer specifies m inputs. The equalizer filter output is classified through a threshold activation device (Fig.4.5) so that the equalizer decision belongs to one of the BPSK states u(n) ∈ {-1, +1}.

    Y (n) = [y (n), y (n 1)... y (n (m