Chapter II
LITERATURE SURVEY
This chapter provides a literature review on Artificial Intelligence systems, computer-aided medical diagnosis, image denoising and feature extraction. In the AI systems section, literature reviews on ANNs, fuzzy systems and GAs are provided. In the image denoising section, its evolution and classification are dealt with.
2.1 ARTIFICIAL INTELLIGENCE SYSTEMS
AI is the intelligence of machines and the branch of computer science which aims to
create it. Computational intelligence (CI) was seen as a comprehensive framework to
design and analyze intelligent systems with a focus on all fundamentals of autonomy,
learning, and reasoning (Duch 2007). The idea is to consider computing systems that
are able to learn and deal with new situations using reasoning, generalization,
association, abstraction, and discovery capabilities (Eberhart et al 1996). The
paradigm of CI is shown in Figure 2.1.
Figure 2.1 Paradigm of Computational Intelligence systems
(The figure depicts Computational Intelligence as encompassing neural networks, fuzzy set theory and evolutionary systems, their hybrids (neural fuzzy systems, genetic fuzzy systems, neural evolutionary systems), and the related fields of swarm intelligence, evolving systems and immune systems.)
Growing as a stand-alone field in itself, CI nowadays contains evolving systems (Angelov 2002), swarm intelligence (Kennedy and Eberhart 2001; Dorigo and Stutzle 2004), immune systems (Castro and Timmis 2002), and other forms of natural (viz., biologically inspired) computation. A key issue in CI is adaptation of behavior as a strategy to handle changing environments and deal with unforeseen situations. CI exhibits interesting links with machine intelligence (Mitchell et al 1997), statistical learning (Tibshirani et al 2001), intelligent data analysis and data mining (Berthold and Hand 2006), pattern recognition and classification (Duda et al 2001), control systems (Dorf and Bishop 2004), team learning in robotic soccer (Geetha Ramani 2009) and operations research (Hillier and Lieberman 2005). This research focuses on CI using an ANN, FL and GA hybrid for efficient medical diagnosis, and a detailed literature review on each is provided.
2.1.1 Artificial Neural Networks
Much of the research in ANNs is related to understanding the nonlinear dynamics of specific architectures and to searching for fast, efficient, convergent, stable and robust ways of training and adaptation from noisy sampled data; it is this, rather than developments in fundamental theory, that dominates work in the field. This is unfortunate, as there is no rigorous mathematical foundation for determining the characteristics of a sampled data set and a specific network's ability to generalize from that set (Kung 1993).
There is no general design theory to determine the allocation of neurons to data, or which weights to alter and in what ways, to make an accurate record of the data. The distant goal of 'neural networkers' is to understand how to store, retrieve and process data in neural networks (Judd 1990). In 1988, Hecht-Nielsen listed some of the best-known ANNs in chronological order (entries 1-13). In the same year, Specht announced the PNN, which is placed 14th. The 15th, the Cerebellar Model Arithmetic Computer (CMAC) (Albus 1975; Miller et al 1990), is based on the Cerebellatron (4th in Hecht-Nielsen's list) and has found applications in robotics and other nonlinear industrial control systems. The names of some ANN developers, features, advantages, disadvantages and some possible areas of application (Hecht-Nielsen 1988) are summarized in Table 2.1.
Table 2.1 Milestones of ANN Research

PERCEPTRON, 1958 (Rosenblatt)
Features: The oldest artificial neural network; built in hardware.
Applications: Rarely used today.
Disadvantages: Cannot recognize complex characters (e.g. Chinese); sensitive to differences in scale, translation and distortion.

MADALINE, 1960-62 (Widrow)
Features: Multiple adaptive linear elements; in commercial use for over 20 years.
Applications: Control systems; image processing; antenna systems; pattern recognition; noise cancellation; adaptive nulling of radar jammers; adaptive modems; adaptive equalizers (echo cancellers) in telephone lines.

AVALANCHE, 1967 (Grossberg)
Features: A class of networks; no single network can do all the tasks.
Applications: Continuous speech recognition; teaching motor commands to robotic arms.
Disadvantages: Requires a literal playback of motor sequences; no simple way to alter speed or interpolate movements.

CEREBELLATRON, 1969 (Marr, Albus and Pellionisz)
Features: Similar to the avalanche network; can blend several command sequences with different weights to interpolate motions smoothly.
Applications: Controlling the motor action of robotic arms.
Disadvantages: Requires complicated control input.

MULTI-LAYER PERCEPTRON (BACKPROPAGATION-OF-ERROR), 1974-86 (Werbos, Parker and Rumelhart)
Applications: Many classification applications; speech synthesis from text; adaptive control of robotic arms; scoring bank loan applications; signal processing; control systems.
Advantages: Most popular network; works well generally; simple to learn.
Disadvantages: Supervised training only; abundant correct input/output examples needed; slow to train; may converge to an inferior solution or not at all.

BRAIN STATE IN A BOX, 1977 (Anderson)
Features: Similar to bi-directional associative memory in completing fragmented inputs.
Applications: Extraction of knowledge from databases; psychological experimentation.
Disadvantages: One-shot decision making; no iterative reasoning.

NEOCOGNITRON, 1978-84 (Fukushima)
Features: The most complicated network ever developed; insensitive to differences in scale, rotation and translation; able to identify complex characters (e.g. Chinese).
Applications: Hand-printed character recognition.
Disadvantages: Usually requires a large number of processing elements and connections.

ADAPTIVE RESONANCE THEORY (ART), 1978-86 (Carpenter and Grossberg)
Features: Very sophisticated.
Applications: Pattern recognition, especially of patterns complicated or unfamiliar to humans (e.g. radar, sonar and voiceprints); decision making under risk; neurobiological connections and classical conditioning.
Disadvantages: Sensitive to translation, distortion and changes in scale.

SELF-ORGANISING MAP (SOM), 1972-82 (Kohonen)
Features: Maps one geometric region onto another (e.g. a rectangle onto an aircraft).
Advantages: More effective than many algorithmic techniques for numerical aerodynamic flow calculations.
Disadvantages: Requires extensive training.

HOPFIELD, 1982 (Hopfield)
Features: Can be implemented on a large scale; normally used with binary inputs.
Applications: Retrieval of complete data or images from fragments; olfactory processing; signal processing.
Disadvantages: The weights must be set in advance; the number of patterns that can be stored and accurately recalled is severely limited; an exemplar pattern will be unstable if it shares many bits in common with another exemplar.

BI-DIRECTIONAL ASSOCIATIVE MEMORY, 1985-88 (Kosko)
Features: Associates fragmented pairs of objects with completed pairs.
Applications: Content-addressable associative memory; resource allocation.
Disadvantages: Low storage density; data must be properly coded.

BOLTZMANN / CAUCHY MACHINE, 1985-86 (Hinton and Sejnowski)
Features: Simple network in which noise functions find the global minimum.
Applications: Pattern recognition for images, radar and sonar; graph search and optimization.
Disadvantages: Boltzmann: long training time; Cauchy: generating noise in the proper statistical distribution.

COUNTER PROPAGATION, 1986 (Hecht-Nielsen)
Features: Functions as a self-programming look-up table; similar to backpropagation but less powerful.
Applications: Image compression; statistical analysis; scoring of bank loan applications.
Disadvantages: A large number of processing elements and connections is required for high accuracy for any size of problem.

PROBABILISTIC NEURAL NETWORK (PNN), 1990 (Specht)
Features: Training is much faster than MLP and is done easily in one pass; decision surfaces are guaranteed to approach the Bayes-optimal boundaries as the size of the training sample grows; sparse samples can be adequate for good performance.
Applications: Pattern recognition and classification; mapping; direct estimation of posterior probability density functions.
Disadvantages: All training sample points must be stored and used to classify new patterns, so a large memory is required; classification can be slower than MLP in software realizations.

CEREBELLAR MODEL ARITHMETIC COMPUTER (CMAC), 1971 (Albus)
Features: Training is much faster than MLP; large networks can be used and trained in practical time; practical hardware realization using logic cell arrays.
Applications: Real-time robotics; pattern recognition; signal processing; speech processing.
Disadvantages: Generalization is local rather than global; design care is necessary to assure a low-error solution.

SUPPORT VECTOR MACHINE (SVM), 1995 (Cortes and Vapnik)
Features: Maximizes the margin between classes; suitable for numerical data.
Applications: Classification of web pages; image recognition; shape description.
Disadvantages: For the classic categorization problem the boundary is limited.

NEURO-DYNAMIC PROGRAMMING, 1996-2006 (Bertsekas and Tsitsiklis)
Features: Can deal explicitly with state and control constraints; implemented using standard deterministic optimal control methodology.
Advantages: Applies to both deterministic and stochastic problems; connection with infinite-time reachability.
Amari (1998) indicated that most learning rules of ANNs are formulated as

Δwi = η r x (2.1)

where r is a learning signal function, x is the input vector, w is the weight vector and η is the learning rate. The function r depends on whether the teacher (or target) signal t is available:

r = r(x, w, t) for supervised learning, or
r = r(x, w) for unsupervised learning.

Equation 2.1 corresponds to the Widrow-Hoff rule when r = t - wT x.

For convenience, the learning signal vector (Amari 1998) is defined so that

Δwi = η (learning signal vector) (2.2)
Table 2.2 summarizes typical neural learning rules, with t the target output, y the neural network's output, x the input vector, w the weight vector, J the Jacobian matrix, e the error and α the learning rate (Jang et al 1997). Unsupervised learning is useful for analyzing data without desired outputs; the networks evolve to capture the density characteristics of a data set.

Table 2.2 Typical Neural Network Learning Formulas

Learning algorithm / Learning signal vector / Learning mode
Hebbian (Hebb 1949) / y x / Unsupervised
Perceptron (Rosenblatt 1962) / {t - sgn(wT x)} x / Supervised
Outstar (Grossberg 1973) / t - wi / Supervised
Oja's (reversed least mean square) (Oja 1982) / (x - y w) y / Unsupervised
Winner-take-all (competitive) (Lippmann 1987) / x - w / Unsupervised
Correlation (Grossberg 1988) / t x / Supervised
Least mean square (Widrow 1990) / (t - wT x) x / Supervised
Delta (Widrow 1990) / {(t - y) α} x / Supervised
Recursive Levenberg-Marquardt (Ngia and Sjöberg 2000) / wi+1 = wi + (JiT Ji + αI)^-1 JiT e / Supervised
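The generic update of Equation 2.1 combined with the least-mean-square learning signal from Table 2.2 can be sketched as follows; the linear target and the parameter values are invented for this toy illustration.

```python
import numpy as np

def lms_update(w, x, t, eta=0.1):
    """One step of the least-mean-square (Widrow-Hoff) rule:
    learning signal r = t - w^T x, weight change dw = eta * r * x."""
    r = t - w @ x          # scalar learning signal
    return w + eta * r * x

# Toy example: fit a single linear neuron to the target y = 2*x1 - x2.
rng = np.random.default_rng(0)
w = np.zeros(2)
for _ in range(2000):
    x = rng.uniform(-1, 1, size=2)
    t = 2 * x[0] - x[1]
    w = lms_update(w, x, t)
print(np.round(w, 2))  # prints [ 2. -1.]
```

Because the target is exactly linear and noise-free, the weights converge to the generating coefficients, illustrating why abundant correct input/output examples are needed for supervised rules of this kind.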
In addition to the standard learning methods in Table 2.2, many other learning algorithms are available in the literature, such as the natural gradient method (Amari 1998), the stochastic learning algorithm (Sheta and De Jong 2001) and the terminal attractor-based BP algorithm (Jiang and Yu 2001). The Levenberg-Marquardt learning algorithm is widely used with slight modifications.
Table 2.3 shows the classifiers used in different studies to perform the classification task. Although many classifiers are available in the literature, this study has been restricted to the most commonly used standard classifiers.

Table 2.3 Classifiers used in the Literature

1. Neural networks: Choi et al (1997); Einstein et al (1998); Furundzic et al (1998); Spyridonos et al (2002); Papadopoulos et al (2002); Demir et al (2004); Gunduz et al (2004); Cho and Won (2006); Rodríguez et al (2010)
2. K-nearest neighbour: Schnorrenberg et al (1996); Weyn et al (1999); Ginneken and Mendrik (2006); Huang et al (2009)
3. Logistic regression: Wolberg et al (1995); Einstein et al (1998); Hong and Mitchell (2007)
4. Fuzzy systems: Blekas et al (1998); Nauck et al (1999); Jin (2000); Monzon and Pisarello (2005); Abu-Amara and Abdel-Qader (2009)
5. Linear discriminant analysis: Hamilton et al (1997); Esgiar et al (1998); Smolle (2000); Li and Yuan (2005)
6. Decision trees: Wiltgen et al (2003); Silahtaroğlu (2009)
A classification system requires separate data for the training and testing processes in order to evaluate its success. Since the data available for training are limited, it is important to test the system with additional data, and how to use this limited amount of data for both training and testing is an issue: more data used in training leads to better system designs, whereas more data used in testing leads to a more reliable evaluation of the system (Kaski 1997). Evaluating the system according to the success obtained on the training set brings the risk of memorization of the data and over-optimistic error rates.
To circumvent the memorization problem, the system should be evaluated on a separate data set that is not used in training. One approach is to split the data into two disjoint sets and use these sets to train and test the system. When it is not feasible to use a significant portion of the data as the test set, k-fold cross-validation can be used. This approach randomly partitions the data set into k groups; it then uses k-1 groups to train the system and the remaining group to estimate an error rate. The procedure is repeated k times so that each group is used for testing the system. Leave-one-out is a special case of k-fold cross-validation where k is selected to be the size of the data set, so only a single sample is used to estimate the error rate in each step.

Table 2.4 List of Classifier Evaluation Techniques used in different Studies

1. No separate evaluation set: Thiran and Macq (1996); Anderson et al (1997); Smolle (2000); Cho and Won (2006)
2. Separate training and test sets: Choi et al (1997); Blekas et al (1998); Esgiar et al (1998); Pena-Reyes and Sipper (1999); Wiltgen et al (2003); Gunduz et al (2004); Demir et al (2005); Resul Das et al (2009); Posawang et al (2009); Muthu Rama Krishnan et al (2010)
3. K-fold cross-validation: Wolberg et al (1995); Zhou et al (2002); Wiens et al (2008); Rodríguez (2010)
4. Leave-one-out: Schnorrenberg et al (1996); Einstein et al (1998); Weyn et al (1999); Albregtsen et al (2000); Spyridonos et al (2001); Hong and Mitchell (2007); Kemal Polat and Salih Gunes (2009)
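The k-fold partitioning just described can be sketched as follows; this is a minimal generic illustration, not the procedure of any of the cited studies.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly partition sample indices 0..n-1 into k groups; in each
    round, k-1 groups are used for training and the held-out group for
    testing, so every sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds if f is not folds[i] for j in f]
        yield train, test

# Leave-one-out is the special case k = n: a single sample tests per round.
splits = list(k_fold_indices(10, 5))
print(len(splits))  # prints 5
```

Each of the 5 rounds here holds out 10/5 = 2 samples; calling `k_fold_indices(10, 10)` would yield the leave-one-out splits.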
Since the testing stage should measure how well the system will work on unknown samples in the future, the test set should consist of samples that are independent of those used in training. In the case of k-fold cross-validation, however, random partitioning may result in test sets that do not contain such independent samples, and over-optimistic results may be obtained. An illustrative example of the effect of each approach on system success can be found in Schulerud et al (1998): using the same data, 95% accuracy is achieved when the entire data set is used for both training and testing; 87% testing accuracy is obtained with k-fold cross-validation; but only 60% testing accuracy is obtained when separate training and test sets are used. Table 2.4 lists the classifier evaluation techniques used in different studies; the approach of using separate training and test data sets is becoming a standard method for classifier evaluation.
2.1.2 Fuzzy Systems
To devise a concise theory of logic, and later mathematics, Aristotle postulated the so-called 'Laws of Thought'. One of these, the 'Law of the Excluded Middle', states that every proposition must either be True (T) or False (F). Even when Parmenides proposed the first version of this law (around 400 B.C.) there were strong and immediate objections: for example, Heraclitus proposed that things could be simultaneously True and not True. It was Plato who laid the foundation for what would become Fuzzy Logic (FL), indicating that there was a third region (beyond T and F) where these opposites 'tumbled about'. The notion of an infinite-valued logic
![Page 11: Chapter II LITERATURE SURVEY - Shodhgangashodhganga.inflibnet.ac.in/bitstream/10603/5550/11/12_chapter 2.pdf · focuses on CI using ANNs, FL and GA hybrid for efficient medical diagnosis](https://reader034.vdocument.in/reader034/viewer/2022042414/5f2e99f96a62cb3c0b0131d7/html5/thumbnails/11.jpg)
17
was introduced in Zadeh’s (1965) seminal work ‘Fuzzy Sets’ where he described the
mathematics of fuzzy set theory, and by extension FL. This theory proposed making
the membership function (or the values F and T) operate over the range of real
numbers [0, 1]. The door to the development of fuzzy computers was opened in 1985 by the design of the first fuzzy logic chip by Masaki Togai and Hiroyuki Watanabe (1986) at Bell Telephone Laboratories. In the years to come, fuzzy computers will employ both fuzzy hardware and extended fuzzy software, and they will be much closer in structure to the human brain than present-day computers (Zadeh 2009). The principle of incompatibility, lucidly formulated by Zadeh (1965), states that:
‘As the complexity of a system increases, our ability to make precise and yet
significant statements about its behavior diminishes until a threshold is reached
beyond which precision and significance (or relevance) become almost mutually
exclusive characteristics’
In a fuzzy inference system (FIS), the knowledge base comprises a fuzzy rule base and a database. FISs are universal approximators capable of performing nonlinear mappings between inputs and outputs. The Mamdani (Mamdani 1974) and TSK (Takagi and Sugeno 1985) models are two popular FISs. The Mamdani model is a non-additive fuzzy model that aggregates the outputs of fuzzy rules using the maximum operator, while the TSK model is an additive fuzzy model that aggregates the outputs of rules using the addition operator. Kosko's standard additive model (Kosko 1997) is another additive fuzzy model. All these models can be derived from the fuzzy graph (Yen 1999) and are universal approximators (Kosko 1997). There are two types of fuzzy rules, namely fuzzy mapping rules and fuzzy implication rules (Yen 1999).
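As a minimal illustration of additive (TSK-style) aggregation, the sketch below evaluates a hypothetical zeroth-order TSK system with Gaussian membership functions; the two rules and their parameters are invented for the example.

```python
import numpy as np

def tsk_inference(x, rules):
    """Zeroth-order TSK inference for one input: each rule is
    (MF centre, MF width, crisp consequent). The output is the
    firing-strength-weighted average of the rule consequents,
    i.e. the rule outputs are combined additively."""
    w = np.array([np.exp(-((x - c) ** 2) / (2 * s ** 2)) for c, s, _ in rules])
    y = np.array([b for _, _, b in rules])
    return float(w @ y / w.sum())

# Hypothetical rules: "x near 0 -> output 1" and "x near 2 -> output 3".
rules = [(0.0, 1.0, 1.0), (2.0, 1.0, 3.0)]
print(tsk_inference(1.0, rules))  # equal firing strengths, prints 2.0
```

At x = 1 both rules fire equally, so the additive combination returns the midpoint of the two consequents; a Mamdani system would instead aggregate output fuzzy sets with the maximum operator before defuzzification.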
Complex fuzzy sets and logic are mathematical extensions of fuzzy sets and logic from the real domain to the complex domain (Ramot et al 2003). Japanese industry has acquired more than 2000 patents in applications of the fuzzy technique, spanning a wide spectrum from consumer products and electronic instruments to automobile and traffic monitoring systems. In Wang and Lu (2003), a fuzzy system with nth-order B-spline MFs and a CMAC network (Albus 1971) with nth-order B-spline basis functions are proven to be universal approximators for a smooth function and its derivatives up to the (n-2)th order. Fuzzy systems are widely used in medicine as expert systems for providing disease diagnosis (Friedrich Steimann 2001; Abu-Amara and Abdel-Qader 2009).
2.1.3 Neuro-Fuzzy System (NFS)
A NFS (Jang 1993) is based on a fuzzy system which is trained by a learning
algorithm derived from neural network theory. The (heuristical) learning procedure
operates on local information, and causes only local modifications in the underlying
fuzzy system. ANN learning provides a good way to adjust the expert’s knowledge
and automatically generate additional fuzzy rules and Membership Function (MFs), to
meet certain specifications and reduce design time and costs.The strength of NFS
involves two contradictory requirements in fuzzy modeling: interpretability versus
accuracy. In practice, one of the two properties prevails. Two universal approximation
theorems (A.1 and A.2), one based on linear basis function and the other based on
radial basis function are given in Appendix A and its proof can be referred from Kung
(1993). In ANN context, the theorems suggest that a feed forward network with a
single hidden layer with nonlinear units can approximate any arbitrary function but do
not suggest any method of determining the parameters, such as number of hidden
units and weights to achieve the given accuracy.
ANFIS is a well-known neuro-fuzzy model (Jang et al 1997) and is a graphical representation of the TSK model. The sigmoid-ANFIS (Zhang et al 2004) is a special form of ANFIS in which only sigmoidal MFs are employed. ANFIS unfolded-in-time (Sisman-Yilmaz et al 2004) duplicates the ANFIS T times to integrate temporal information, where T is the number of time intervals needed in the specific problem. Neuro-fuzzy systems are usually trained by the gradient-descent method (Jang et al 1997). Similar to the ANFIS architecture, the self-organizing fuzzy neural network (Leng et al 2004) has a five-layer fuzzy neural network architecture and is an on-line implementation of TSK (Takagi and Sugeno 1985). The merits of both neural and fuzzy systems can be integrated in a neuro-fuzzy approach (Pal et al 2000).
The Dynamic Evolving Neuro-Fuzzy Inference System (Kasabov and Song 2002) is used to perform prediction, with new fuzzy rules created and updated during the operation of the system. An online sequential fuzzy extreme learning machine (OS-Fuzzy-ELM) has been developed by Rong et al (2000) for function approximation and classification problems. The optimized Takagi-Sugeno-type (TS) neuro-fuzzy model proposed in Siekmann et al (1997) for the stock exchange consisted of 31 rules and 22 membership functions.
Fuzzy sets are considered advantageous in the logical field and in handling higher-order processing easily, whereas the higher flexibility produced by learning is a characteristic feature of neural nets and hence suits data-driven processing better (Kosko 1997). The equivalence between fuzzy rule-based systems and neural networks is studied in Zhang et al (2004). Jang et al (1997) have shown that fuzzy systems are functionally equivalent to a class of Radial Basis Function (RBF) networks, based on the similarity between the local receptive fields of the network and the membership functions of the fuzzy system. CANFIS extends ANFIS by using nonlinearity in the TSK rules and can handle multiple inputs (Jang et al 1997).
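The functional equivalence noted by Jang et al (1997) can be checked numerically: a single-input zeroth-order Sugeno system with Gaussian MFs and weighted-average defuzzification computes exactly the same mapping as a normalized RBF network sharing its centres and widths. The centres, width and consequents below are arbitrary illustrative values.

```python
import numpy as np

centers = np.array([-1.0, 0.0, 1.0])
sigma = 0.7
consequents = np.array([0.5, 2.0, -1.0])

def fuzzy_sugeno(x):
    """Zeroth-order Sugeno system: Gaussian MF firing strengths,
    output is the weighted average of the rule consequents."""
    w = np.exp(-((x - centers) ** 2) / (2 * sigma ** 2))
    return w @ consequents / w.sum()

def normalized_rbf(x):
    """Normalized RBF network with the same Gaussian receptive fields."""
    g = np.exp(-((x - centers) ** 2) / (2 * sigma ** 2))
    return (g / g.sum()) @ consequents

print(np.isclose(fuzzy_sugeno(0.3), normalized_rbf(0.3)))  # prints True
```

The two functions are algebraically identical term by term, which is precisely the equivalence condition: Gaussian MFs on the fuzzy side playing the role of the RBF receptive fields.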
2.1.4 GA for Optimization
Genetic algorithms are a class of general-purpose stochastic optimization algorithms
under the universally accepted neo-Darwinian paradigm which is a combination of
classical Darwinian evolutionary theory, the selection of Weismann and the genetics
of Mendel. Generally, the performance of GA is measured by the speed of the search
on one hand and the reliability of the search on the other. Reliability denotes the
chance of getting good results even if the problem is very complex (Back 1995).
There is always a tradeoff between the two factors and the success of GA for a
particular problem optimization depends on the choice of the right set of GA
parameters. GAs have been theoretically and empirically proven to provide robust
search in complex space and have found wide applicability in scientific and
engineering areas including function optimization, machine learning, scheduling, and
others (Buckles and Petry 1992).
Goldberg and Grefenstette (2005) chose individuals for birth according to their objective function values. Variants of unbiased tournament selection were analyzed by Sokolov and Whitley (2005). In uniform crossover, each gene in the offspring is created by copying the corresponding gene from one or the other parent according to a randomly generated crossover mask (Syswerda 1989). There are also crossovers such as cycle crossover, partially mapped crossover, segmented crossover and shuffle crossover, as mentioned in Potts et al (1994). For two-dimensional applications like image processing, conventional mutation and reproduction operators can be applied in the normal way, but an unbiased crossover such as uniform block crossover has to be used. The uniform block crossover is a two-dimensional wraparound crossover and can sample all the matrix positions equally (Cartwright and Harris 1993). The convergence rates of two-dimensional GAs are higher than that of the simple GA for bitmaps (Cartwright and Harris 1993).
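The mask-based uniform crossover described above can be sketched as follows; this is a generic one-dimensional illustration, not the two-dimensional block variant.

```python
import random

def uniform_crossover(p1, p2, seed=None):
    """Create two offspring by copying each gene from one parent or the
    other according to a randomly generated boolean crossover mask."""
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in p1]
    c1 = [a if m else b for m, a, b in zip(mask, p1, p2)]
    c2 = [b if m else a for m, a, b in zip(mask, p1, p2)]
    return c1, c2

# With all-zero and all-one parents the offspring are complementary.
c1, c2 = uniform_crossover([0] * 8, [1] * 8, seed=42)
print(c1, c2)
```

Because each mask bit is drawn independently, every gene position is exchanged with equal probability, which is the "unbiased" property the two-dimensional block crossover extends to matrix positions.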
The parameters of GA that play a vital role in determining the exploitation and
exploration characteristics of the genetic algorithm are the population size, number of
generations, termination condition, elitism strategy, reproduction, crossover and
mutation percentages (DeJong and Spears 1990). The convergence analysis of a
simple GA is based on the concept of schema (Holland 1973).
In the practice of designing efficient GAs, there is strong empirical evidence that the population size is one of the most important and most critical parameters for the performance of genetic algorithms (Lobo and Lima 2005). This parameter is hard to estimate: if it is too small, the GA converges to poor solutions; if it is too large, the GA spends unnecessary computational resources. Determining an appropriate population size is therefore an important task in the design of genetic algorithms and is closely related to the principle of implicit parallelism. The methods of handling the population-size parameter in various GAs can be classified as static, when the size of the population remains unchanged throughout the GA run, and dynamic, when the population size is adjusted on the fly during the GA execution (Arabas et al 1994; Hinterding et al 1996; Back et al 1995).
De Jong and Spears (1990) suggest the following parameters for GA: population size = 50, crossover rate = 0.6, mutation rate = 0.001, crossover type = typically two-point, mutation type = bit flip and number of generations = 1000. Schaffer et al (1989) suggested the following parameter setting after extensive research on these parameters: population size = 20-30, crossover rate = 0.75-0.95, mutation rate = 0.005-0.01. The genetic diversity of the population can be improved, so as to prevent premature convergence, by adapting the size of the population (Goldberg et al 2005; Michalewicz 1996).
Initialization
  1. Generate the initial population randomly.
Fitness evaluation
  2. Evaluate the fitness of each individual.
Group and breeding
  3. Sort the individuals according to their fitness values.
  4. Arrange the population into groups based on their fitness.
  5. For each group:
     a. Select individuals from the group.
     b. Apply crossover/mutation operators.
     c. Evaluate the fitness of the offspring.
     d. Add the offspring to the same group.
Migration
  6. Combine all the groups into a single population.
  7. Sort the population based on fitness values and trim it to the size of the groups.
Iteration
  8. Repeat the process from Step 4 for the required number of generations.
  9. Select the best (highest-fitness) individual.

Figure 2.2 Steps of GA (Melanie 1998)
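The GA loop of Figure 2.2 can be sketched in code. The version below is a simplified single-population variant (without the grouping/migration refinement), using tournament selection, one-point crossover, bit-flip mutation and elitism; the one-max fitness function is a hypothetical stand-in for a real objective.

```python
import random

def genetic_algorithm(fitness, n_bits=10, pop_size=30, generations=50,
                      cx_rate=0.75, mut_rate=0.01, seed=1):
    """Minimal GA: generate, evaluate, select, breed, iterate."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = [pop[0][:]]                       # elitism: keep current best
        while len(nxt) < pop_size:
            # tournament selection of two parents (copies)
            a, b = (max(rng.sample(pop, 3), key=fitness)[:] for _ in range(2))
            if rng.random() < cx_rate:          # one-point crossover
                cut = rng.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for i in range(n_bits):
                    if rng.random() < mut_rate:  # bit-flip mutation
                        child[i] ^= 1
                nxt.append(child)
        pop = nxt[:pop_size]                    # trim to population size
    return max(pop, key=fitness)

# One-max problem: fitness is the number of ones; the optimum is all ones.
best = genetic_algorithm(fitness=sum)
print(sum(best))  # typically reaches the optimum of 10
```

The crossover and mutation rates follow the ranges suggested by Schaffer et al (1989) quoted above; the remaining constants are illustrative defaults.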
To evolve an optimal solution, the steps to be followed are given in the algorithm of
Figure 2.2, adapted from Melanie (1998). Genetic programming (Koza 1992) is a
variant of GA for symbolic regression and can be used for the discovery of empirical
laws. Summarizing, evolutionary algorithms (Goldberg et al 2005; Koza 1992) are:
• easy to use, modular, and supportive of multi-objective optimization
• inherently parallel and easily distributed
• easy to exploit for previous or alternate solutions
• flexible in forming building blocks for hybrid applications
• good in noisy environments, yielding a solution that improves with time
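The loop of Figure 2.2 can be sketched in a few lines. The following is an illustrative one-max example, not an implementation from the thesis; the fitness function, bit length and generation count are arbitrary choices, while the default crossover and mutation rates follow the Schaffer et al (1989) ranges quoted above.

```python
import random

def genetic_algorithm(fitness, n_bits=16, pop_size=30, cx_rate=0.9,
                      mut_rate=0.01, generations=100, seed=0):
    """Minimal GA following the initialise/evaluate/breed/iterate loop of
    Figure 2.2; default rates follow Schaffer et al (1989)."""
    rng = random.Random(seed)
    # Step 1: generate the initial population randomly
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Steps 2 and 4: evaluate and sort individuals by fitness
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]            # rank-based selection
        offspring = []
        while len(offspring) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = a[:]
            if rng.random() < cx_rate:           # two-point crossover
                i, j = sorted(rng.sample(range(n_bits), 2))
                child[i:j] = b[i:j]
            for k in range(n_bits):              # bit-flip mutation
                if rng.random() < mut_rate:
                    child[k] = 1 - child[k]
            offspring.append(child)
        pop = parents + offspring                # retained parents act as elitism
    return max(pop, key=fitness)                 # Step 10: best individual

# toy fitness: number of ones (the "one-max" problem)
best = genetic_algorithm(fitness=sum)
```

Here elitism (retaining the sorted top half) stands in for the migration and trimming steps, and the grouping of Steps 5-6 is collapsed into a single population for brevity.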
Table 2.5 Applications of GA (Goldberg et al 2005)

Domains: Application Types
Control: Gas pipeline, Pole balancing, Missile evasion, Pursuit
Design: Semiconductor layout, Aircraft design, Keyboard configuration, Communication networks
Scheduling: Manufacturing, Facility scheduling, Resource allocation
Robotics: Trajectory planning
Machine learning: Designing neural networks, Classification algorithms
Signal processing: Filter design
Game playing: Poker, Checkers, Prisoner's dilemma
Combinatorial optimization: Set covering, Traveling salesman, Routing, Bin packing, Graph colouring, Partitioning
Table 2.5 lists the fields where GAs have been successfully applied and utilized to
solve multiobjective, multimodal and constraint-satisfaction optimization problems
(Du and Swamy 2008).
2.2 COMPUTER AIDED MEDICAL DIAGNOSIS
Today, cancer constitutes a major health problem. Approximately one out of every
two men and one out of every three women develop cancer at some point during their
lifetime (Cigdem Demir and Bulent Yener 2009). Furthermore, the risk of developing
cancer has been increasing due to changes in lifestyle in this century, such as
increased tobacco use, deteriorating dietary habits, and lack of physical activity.
Fortunately, recent advances in medicine have significantly increased the possibility
of curing cancer. However, the chance of a cure relies primarily on early diagnosis,
and the selection of treatment depends on the malignancy level (Abbass 2002).
It is therefore critical to detect cancer, distinguish cancerous structures from benign
and healthy ones, and identify the malignancy level.
Breast cancer is considered one of the most common and fatal cancers among
women in the USA (http://www.cancer.gov/cancertopics/types/breast, 2008).
According to the National Cancer Institute, 40,480 women died due to this disease,
and on average one woman is diagnosed with this cancer every three minutes. Pisani
et al (1993) estimated worldwide mortality from eighteen major cancers, including
breast cancer. Li et al (1995) developed a method for detecting tumors using a
segmentation process, adaptive thresholding and modified Markov random fields,
followed by a classification step based on a fuzzy binary decision tree. Li (1995)
used Markov random fields for tumor detection in digital mammography. Smart et al
(1995) analyzed the benefits of mammographic screening and showed that it has an
overall accuracy rate of 90%. Tsujii et al (1999) proposed classification of
microcalcifications in mammograms using radial basis function (RBF) networks.
Peña-Reyes and Sipper (1999) applied a combined fuzzy-genetic approach with new
methods as a CAD system. Kim et al (1999) proposed statistical textural features for
the detection of microcalcifications in digitized mammograms. These systems are
regarded as a second reader, and the final
decision is left to the radiologist. CAD algorithms have improved the radiologist's
overall accuracy in detecting cancerous tissues (Giger et al 2001).
Sickles (2000) proposed mammographic follow-up of lesions, and Rudy Setiono
(2000) proposed concise and accurate classification rules for breast cancer diagnosis.
Sheybani (2001) took up the challenge of creating a tele-radiology system consisting
of a fiber-optic network driven by a set of asynchronous transfer mode (ATM)
switches in association with CAD algorithms. This research explored a new
technology, the ATM tele-radiology network, with a high-speed fiber backbone
architecture offering real-time, online, more accurate screening, detection and
diagnosis of breast cancer. The ATM tele-radiology network has thus been an
important tool in the development of tele-mammography (Sheybani 2001). Zhen and
Chan (2001) combined AI methods and the discrete wavelet transform (WT) to build
an algorithm for mass detection. Lisboa (2002) reviewed the evidence of health
benefits from ANNs. Early detection of breast cancer via mammography improves
treatment chances and survival rates (Lee 2002). Unfortunately, mammography is
not perfect: False Positive (FP) rates are 15-30% due to the overlap in the appearance
of malignant and benign abnormalities, while False Negative (FN) rates are 10-30%.
Bocchi et al (2004) developed an algorithm for microcalcification detection and
classification in which existing tumors are detected using a region-growing method
combined with an ANN-based classifier; microcalcification clusters are then detected
and classified using a second fractal model. Hassanien and Ali (2004) proposed an
enhanced rough set technique for feature reduction and classification. Swiniarski and
Lim (2006) integrated independent component analysis (ICA) with a rough set model
for breast cancer detection: features are first reduced and extracted using ICA, the
extracted features are then selected using a rough set model, and finally a rough
set-based method is used for rule-based classifier design. Bommanna Raja (2008)
proposed a hybrid fuzzy neural system for CAD of kidney images. Park et al (2009)
proposed a method for improving the performance of a CAD scheme by combining
results from machine learning classifiers. The first preprocessing step in computer
aided diagnosis is removing noise from the image, thereby enhancing image quality.
2.3 IMAGE DENOISING
In image denoising, a compromise has to be achieved between noise reduction and the
preservation of significant edges, corners and other image details (Civicioglu et al
2004). Window-based filtering algorithms (Yüksel 2006) such as median-based filters
are well known to suppress noise, but a major drawback is that they often remove
important details and blur the image when large window sizes are used, while noise
suppression is insufficient for small window sizes. Another pitfall is that
median-based filters use only local information and do not consider the long-range
correlation within natural images. To overcome these drawbacks, various generations
of the median filter have been proposed, such as switching median filters (Zhang and
Karim 2002), center weighted median filters (Chen and Wu 2001), rank-ordered
median filters, iterative median filters, and noise detection-based median filters with
thresholding operations (Fried et al 2006). Histogram-based fuzzy filters were
proposed by Wang et al (2002), and neuro-fuzzy filters for impulse noise removal
were proposed by Yuksel and Bestok (2004). Besdok et al (2005) used ANFIS to
remove impulse noise. Most natural images have additive random noise, which is
modeled as Gaussian. Speckle noise (Guo et al 1994) is observed in ultrasound (US)
images, and Rician noise (Robert Nowak 1999) affects MRI images.
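The three noise models mentioned above can be simulated directly. The sketch below, with an arbitrary flat test image and arbitrary noise levels, illustrates how additive Gaussian, multiplicative speckle and magnitude Rician corruption are typically generated:

```python
import numpy as np

rng = np.random.default_rng(0)
img = np.full((64, 64), 100.0)          # constant test image

# Additive Gaussian noise: y = x + n, with n ~ N(0, sigma^2)
gaussian = img + rng.normal(0.0, 10.0, img.shape)

# Multiplicative speckle noise (a common ultrasound model): y = x * (1 + n)
speckle = img * (1.0 + rng.normal(0.0, 0.1, img.shape))

# Rician noise (magnitude of a complex MRI signal with Gaussian noise
# in both channels): y = sqrt((x + n1)^2 + n2^2)
n1 = rng.normal(0.0, 10.0, img.shape)
n2 = rng.normal(0.0, 10.0, img.shape)
rician = np.sqrt((img + n1) ** 2 + n2 ** 2)
```

Note that the Rician model is not zero-mean: taking the magnitude biases the noisy image upward, which is one reason MRI denoising is treated separately from the Gaussian case.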
2.3.1 Evolution of Image Denoising
Wavelets give superior performance in image denoising due to properties such as
sparsity and multiresolution structure. Donoho's (1995) methods did not require
tracking or correlation of the wavelet maxima and minima across different scales.
Researchers have described different ways to compute the parameters for thresholding
wavelet coefficients, and these thresholding techniques were applied to non-orthogonal
wavelet coefficients to reduce artifacts. Data-adaptive thresholds (Imola K Fodor and
Chandrika Kamath 2003) were introduced to achieve the optimum threshold value.
Hidden Markov models (HMM) and Gaussian scale mixtures (GSM) have also
become popular, and more research work continues to be reported. Research in
higher-dimensional wavelet transforms has given rise to ridgelets, shearlets, curvelets,
contourlets, etc. (Latha Parthiban and Subramanian 2006).
2.3.2 Classification of Denoising Algorithms
As shown in Figure 2.3, there are two basic approaches to image denoising: spatial
filtering methods and transform domain filtering methods. Transform domain
filtering methods have less computational complexity than spatial filtering methods,
but with a small trade-off in quality.
Figure 2.3 Classification of Image Denoising Techniques

IMAGE DENOISING METHODS
  Spatial Domain
    Linear: Mean, Wiener
    Non-linear: Median, Weighted Median
  Transform Domain
    Data-Adaptive Transform: ICA
    Non-Data-Adaptive Transform
      Spatial-Frequency Domain
      Wavelet Domain
        Linear Filtering: Wiener
        Non-linear Threshold Filtering
          Non-adaptive: VISUShrink
          Adaptive: SUREShrink, BayesShrink, Cross Validation
        Wavelet Coefficient Model
          Deterministic: Tree Approximation
          Statistical: Marginal (GMM, GGD); Joint (RMF, HMM)
        Non-Orthogonal Wavelet Transform: UDWT, SIWPD, Multiwavelets
        Contourlet
2.3.2.1 Spatial Filtering
A traditional way to remove noise from image data is to employ spatial filters, which
can be further classified into linear and non-linear filters. Linear filters include the
mean and Wiener filters, while non-linear filters include the median and weighted
median filters.
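A minimal sketch of the non-linear branch follows, assuming a small grayscale array. The `median_filter` function below is a hypothetical helper, not a thesis implementation; it slides a window over the interior pixels and illustrates why the median removes impulse noise:

```python
import numpy as np

def median_filter(image, size=3):
    """Plain sliding-window median filter; edge pixels are left unchanged.
    Illustrative only - libraries such as scipy.ndimage provide optimised
    versions."""
    out = image.copy()
    r = size // 2
    for i in range(r, image.shape[0] - r):
        for j in range(r, image.shape[1] - r):
            # median of the size x size neighbourhood around (i, j)
            out[i, j] = np.median(image[i - r:i + r + 1, j - r:j + r + 1])
    return out

# salt-and-pepper corruption of a flat image
img = np.full((16, 16), 50.0)
img[4, 4], img[8, 8] = 255.0, 0.0      # isolated impulse pixels
clean = median_filter(img)              # impulses replaced by the local median
```

A single 3x3 pass removes isolated impulses; as noted above, larger windows suppress more noise at the cost of removing detail.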
2.3.2.2 Transform Domain Filtering
The transform domain filtering methods can be subdivided according to the choice of
basis functions, which can be classified as data-adaptive or non-adaptive.
Non-adaptive transforms are discussed first since they are more popular.

a Spatial-Frequency Filtering

Spatial-frequency filtering refers to the use of low-pass filters implemented via the
fast Fourier transform. In frequency smoothing methods (Jain 1989), noise removal is
achieved by designing a frequency domain filter and adapting a cut-off frequency
where the noise components are decorrelated from the useful signal in the frequency
domain. These methods are time consuming and depend on filter function behavior.
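The cut-off-based frequency smoothing described above can be sketched with an ideal low-pass filter. `fft_lowpass` and its `cutoff` parameter are illustrative names only, and a real design would use a smoother filter function to reduce ringing:

```python
import numpy as np

def fft_lowpass(image, cutoff=0.1):
    """Ideal low-pass filter in the Fourier domain: zero every frequency
    whose normalised radius exceeds `cutoff` (illustrative sketch)."""
    F = np.fft.fftshift(np.fft.fft2(image))      # DC moved to the centre
    h, w = image.shape
    y, x = np.ogrid[:h, :w]
    radius = np.sqrt((y - h / 2) ** 2 + (x - w / 2) ** 2)
    F[radius > cutoff * min(h, w) / 2] = 0       # discard high frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

rng = np.random.default_rng(1)
noisy = np.full((64, 64), 10.0) + rng.normal(0, 2.0, (64, 64))
smooth = fft_lowpass(noisy, cutoff=0.2)          # most noise energy removed
```

Because the DC component is kept, the image mean is preserved while the broadband noise variance drops sharply; the hard cut-off, however, is exactly what introduces the ringing that makes these methods dependent on filter function behavior.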
b Wavelet domain Filtering
Several wavelet-based methods, categorized as 'denoising from singularity detection',
have been reported in the literature (Hsung et al 1999). An area that has attracted
much attention is adaptive wavelet-based denoising (Mihcak et al 1999; Li and
Orchard 2000). Such methods often assume a general type of statistical model for the
image wavelet coefficients: for each wavelet coefficient, the parameters of the
statistical model are estimated and used to estimate the clean wavelet coefficient
value. Filtering operations in the wavelet domain can be either linear or nonlinear.
b.1 Linear Filters
Linear filters such as the Wiener filter in the wavelet domain yield optimal results
when the signal corruption can be modeled as a Gaussian process and the accuracy
criterion is the Mean Square Error (MSE). However, designing a filter based on this
assumption frequently results in a filtered image that is more visually displeasing than
the original noisy signal, even though the filtering operation successfully reduces the
MSE (Choi and Baraniuk 1998). Zhang et al (2000) proposed wavelet-domain
spatially adaptive finite impulse response Wiener filtering for image denoising, in
which Wiener filtering is performed only within each scale and interscale filtering is
not allowed; it has become the standard linear filter in this domain.
b.2 Non-Linear Threshold Filtering
The most investigated approach to denoising with the WT is non-linear coefficient
thresholding. The procedure exploits the sparsity property of the WT and the fact that
the WT maps white noise in the signal domain to white noise in the transform domain:
while signal energy becomes concentrated in a few transform coefficients, noise
energy does not.

The procedure in which small coefficients are removed while others are left
untouched is called hard thresholding (Donoho 1995). However, this method
generates spurious blips, better known as artifacts, in the images as a result of
unsuccessful attempts to remove moderately large noise coefficients. To overcome
the demerits of hard thresholding, soft thresholding was also introduced by Donoho
(1995); in this scheme, coefficients above the threshold are shrunk by the absolute
value of the threshold itself. Related techniques are semi-soft thresholding and
Garrote thresholding (Imola K Fodor and Chandrika Kamath 2003). Most of the
wavelet shrinkage literature is based on methods for choosing the optimal threshold,
which can be adaptive or non-adaptive to the image.
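The two rules above reduce to one line each. The coefficient array and threshold below are arbitrary illustrative values:

```python
import numpy as np

def hard_threshold(w, t):
    """Keep coefficients whose magnitude exceeds t, zero the rest
    (Donoho 1995)."""
    return np.where(np.abs(w) > t, w, 0.0)

def soft_threshold(w, t):
    """Zero small coefficients and shrink the survivors toward zero by t,
    avoiding the discontinuity that causes hard-thresholding artifacts."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w = np.array([-4.0, -0.5, 0.2, 3.0])
hard = hard_threshold(w, 1.0)   # large coefficients kept unchanged
soft = soft_threshold(w, 1.0)   # large coefficients shrunk by the threshold
```

Applied to the detail coefficients of a wavelet decomposition, hard thresholding preserves the amplitude of retained coefficients (risking blips), while soft thresholding trades a small bias for smoother output.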
b.3 Wavelet Coefficient Model
This approach exploits the multiresolution properties of the WT, identifying close
correlations of the signal at different resolutions by observing it across multiple
scales. It produces excellent results but is computationally much more complex and
expensive. The modeling of the wavelet coefficients can be either deterministic or
statistical.
i. Deterministic

The deterministic modeling approach creates a tree structure of wavelet coefficients,
with every level of the tree representing a scale of the transformation and the nodes
representing the wavelet coefficients; this approach is adopted in Baraniuk (1999).
The optimal tree approximation displays a hierarchical interpretation of the wavelet
decomposition. Singularities produce large wavelet coefficients that persist along the
branches of the tree: if a coefficient has a strong presence at a particular node, then,
in the case of signal, its presence should be more pronounced at its parent nodes,
whereas for a noisy coefficient, for instance a spurious blip, such consistent presence
will be missing. Lu et al (1992) tracked wavelet local maxima in scale space using a
tree structure. Another denoising method based on wavelet coefficient trees was
proposed by Donoho (1995).
ii. Statistical Modeling of Wavelet Coefficients
This approach focuses on appealing properties of the WT, such as the multiscale
correlation between wavelet coefficients and the local correlation between
neighboring coefficients, and aims at accurately modeling image data in the wavelet
domain. A good review of the statistical properties of wavelet coefficients can be
found in Buccigrossi and Simoncelli (1999) and Romberg et al (2001). While the
objective of denoising is to remove noise from a signal, it is also very important
that the edges in the image are not blurred by the denoising operation. The Lipschitz
regularity theory is widely used to detect edge and non-edge wavelet coefficients,
based on the dyadic discrete WT (Mallat 1999). The following two techniques exploit
the statistical properties of the wavelet coefficients based on a probabilistic model.
Marginal Probabilistic Model
A number of researchers have developed homogeneous local probability models for
images in the wavelet domain. Specifically, the marginal distributions of wavelet
coefficients are highly kurtotic, and usually have a marked peak at zero and heavy
tails. The Gaussian mixture model (GMM) (Chipman et al 1997) and the generalized
Gaussian distribution (GGD) (Liu and Moulin 1999) are commonly used to model the
wavelet coefficient distribution. Although the GGD is more accurate, the GMM is
simpler to use. Mihcak et al (1999) proposed a methodology in which the wavelet
coefficients are assumed to be conditionally independent zero-mean Gaussian random
variables, with variances modeled as identically distributed, highly correlated random
variables. An approximate maximum a posteriori probability rule is used to estimate
the marginal prior distribution of the wavelet coefficient variances. All the methods
mentioned above require a noise estimate, which may be difficult to obtain in
practical applications. Simoncelli and Adelson (1996) used a two-parameter
generalized Laplacian distribution for the wavelet coefficients of the image, estimated
from the noisy observations. Chang et al (2000) proposed adaptive wavelet
thresholding for image denoising, modeling the wavelet coefficients as generalized
Gaussian random variables whose parameters are estimated locally (i.e., within a
given neighborhood).
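As one concrete example of such locally estimated parameters, a BayesShrink-style threshold (Chang et al 2000) can be sketched as follows. The robust noise estimate median(|d|)/0.6745 and the toy sparse detail band are standard illustrative choices, not the thesis method:

```python
import numpy as np

def bayes_shrink_threshold(detail):
    """BayesShrink-style adaptive threshold T = sigma_n^2 / sigma_x
    (Chang et al 2000).  The noise level sigma_n is estimated robustly
    from the detail coefficients via the median absolute deviation."""
    sigma_n = np.median(np.abs(detail)) / 0.6745       # noise std estimate
    sigma_y2 = np.mean(detail ** 2)                    # noisy signal power
    # signal std, clamped so the division below stays well defined
    sigma_x = np.sqrt(max(sigma_y2 - sigma_n ** 2, 1e-12))
    return sigma_n ** 2 / sigma_x

# toy "detail band": a sparse signal plus unit-variance Gaussian noise
rng = np.random.default_rng(2)
signal = np.zeros(1024)
signal[::16] = 8.0
noisy = signal + rng.normal(0.0, 1.0, 1024)
t = bayes_shrink_threshold(noisy)       # threshold adapted to this band
```

The resulting threshold can then be fed into a soft-thresholding rule; because it scales with the estimated noise-to-signal ratio, busy subbands are thresholded lightly and noise-dominated subbands heavily.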
Joint Probabilistic Model
Hidden Markov models (HMM) (Romberg et al 2001) are efficient in capturing
inter-scale dependencies, whereas random Markov field (RMF) models are more
efficient at capturing intrascale correlations (Malfait and Roose 1997). The
correlation between coefficients at the same scale but residing in a close
neighborhood is modeled by a hidden Markov chain model, whereas the correlation
between coefficients across scales is modeled by hidden Markov trees (HMT). Once
the correlation is captured by the HMM, expectation maximization is used to estimate
the required parameters, and from those the denoised signal is estimated from the
noisy observation using the well-known maximum a posteriori estimator. Portilla et
al (2002) described a model in which each neighborhood of wavelet coefficients is
described as a GSM, i.e., the product of a Gaussian random vector and an independent
hidden random scalar multiplier. Strela (2000) described the joint densities of clusters
of wavelet coefficients as a GSM and developed a maximum likelihood solution for
estimating the relevant wavelet coefficients from the noisy observations. A
disadvantage of the HMT is the computational burden of the training stage; to
overcome this problem, a simplified HMT, named uHMT (Romberg et al 2001), was
proposed.
b.4 Non-orthogonal Wavelet Transforms
The undecimated WT (UDWT) has also been used for decomposing the signal to
provide a visually better solution. Since the UDWT is shift invariant, it avoids visual
artifacts such as the pseudo-Gibbs phenomenon, but it adds a large computational
overhead, making it less feasible. In Lang et al (1995), normal hard/soft thresholding
was extended to the shift-invariant discrete WT. In Cohen et al (1999), shift-invariant
wavelet packet decomposition (SIWPD) is exploited to obtain a number of basis
functions; using the minimum description length principle, the best basis function is
found, which yields the smallest code length required to describe the given data, and
thresholding is then applied to denoise the data. Multiwavelets are obtained by
applying more than one mother (scaling) function to the given dataset, and possess
properties such as short support, symmetry and, most importantly, a higher order of
vanishing moments. The combination of shift invariance and multiwavelets is
implemented in Bui et al (1998), giving superior results for the Lena image in terms
of MSE.
b.5 Contourlet Domain
In 2002, Do and Vetterli pioneered a sparse representation for two-dimensional
piecewise smooth signals that resemble images, named the contourlet transform.
However, image denoising by means of the contourlet transform introduces many
visual artifacts because of Gibbs-like phenomena around singularities (Ramin Eslami
and Hayder Radha 2003). The contourlet transform has a fast iterated filter bank
algorithm that requires O(N) operations for N-pixel images and is easily adjustable
for detecting fine details in any orientation (Do and Vetterli 2005) at various scale
levels. Due to the lack of translation invariance of the contourlet transform, the
nonsubsampled contourlet transform (NSCT) was proposed (Arthur L da Cunha
2006); its structure consists of a bank of filters and can be divided into the following
two shift-invariant parts:
i. Nonsubsampled pyramid (NSP) and
ii. Nonsubsampled directional filter bank (NSDFB)
2.4 FEATURE EXTRACTION
After preprocessing the image, features have to be extracted from it. Although it is
possible to extract a large set of features, only a small subset is used in classification
due to the curse of dimensionality, which states that as the dimensionality increases,
the amount of required training data increases exponentially. There may also be
strong correlation between different features, which is a further incentive to reduce
the size of the feature set. The features selected during feature extraction quantify
properties of the biological structures of interest, at either the cellular level or the
tissue level: cellular-level features capture deviations in the cell structures, while
tissue-level features capture changes in the cell distribution across the tissue. The
five groups of features of interest are:
• Textural features provide information about the variation in intensity of a
surface and quantify properties like smoothness, coarseness and regularity.
• Morphological features provide information about the size and shape of a
nucleus/cell.
• Fractal-based features provide information on the regularity and complexity
of a cell/tissue by quantifying its self-similarity level.
• Topological features provide information on the cellular structure of a tissue
by quantifying the spatial distribution of its cells.
• Intensity-based features provide information on the intensity (gray-level or
color) histogram of the pixels located in a nucleus/cell.
The types of features used for the diagnosis of different types of cancer are given in
Table 2.6. A single feature type can be used, as in the case of prostate cancer; two
can be used, as for skin, lung and liver cancers; three can be grouped, as for cervical,
colorectal and gastric cancers; or multiple feature types can be grouped, as for
bladder, breast and mesothelioma cancers (Table 2.6).
The nuclear (morphological) features to be computed for each identified nucleus
(Street et al 1993) from Figure 2.4 for ultimate cancer diagnosis are:
Table 2.6 Types of Features used in the Diagnosis of different Types of Cancer

S.No | Type of Cancer | Features | References
1 | Bladder | Morphological, Textural, Fractal-based, Topological | Choi et al (1997); Rajesh and Dey (2003); Bommanna Raja (2008)
2 | Brain | Textural, Topological | Spyridonos et al (2002); Gunduz et al (2004); Demir et al (2005)
3 | Breast | Morphological, Textural, Fractal-based, Intensity-based | Schnorrenberg et al (1996); Anderson et al (1997); Einstein et al (1998); Dey and Mohanty (2003); Swiniarski and Lim (2006); Zografos et al (2010)
4 | Cervical | Textural, Fractal-based, Topological | Keenan et al (2000); McGregor and Olaitan (2010)
5 | Colorectal | Textural, Fractal-based, Intensity-based | Hamilton et al (1997); Esgiar et al (1998); Vegard et al (2009)
6 | Gastric | Morphological, Textural, Intensity-based | Blekas et al (1998); Cunningham and Schulick (2009)
7 | Liver | Textural, Fractal-based | Nielsen et al (1999); Albregtsen et al (2000); Rong Mu et al (2010)
8 | Lung | Morphological, Intensity-based | Thiran and Macq (1996); Zhou et al (2002); Jiang et al (2009)
9 | Mesothelioma | Morphological, Textural, Topological, Intensity-based | Weyn et al (1999); Wong et al (2009)
10 | Prostate | Textural | Diamond et al (2004); Wong et al (2009)
11 | Skin | Textural, Intensity-based | Smolle (2000); Wiltgen et al (2003); Quéreux et al (2010)
Figure 2.4 A Digital image taken from a breast FNA (Street et al 1993).
Radius: average length of a radial line segment, from the centre of mass to a
snake point.
Perimeter: distance around the boundary.
Area: number of pixels in the interior of the nucleus.
Compactness: (Perimeter)² / Area.
Smoothness: average difference in length of adjacent radial lines.
Concavity: size of any indentations in the nuclear border.
Concave points: number of points on the boundary that lie on an indentation.
Symmetry: relative difference in length between line segments perpendicular to
and on either side of the major axis.
Fractal dimension: fractal dimension of the boundary based on the
'coastline approximation'.
Texture: variance of grey-scale level of internal pixels.
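A few of these morphological features can be computed directly from a sampled boundary. The sketch below uses a hypothetical `nuclear_features` helper and treats the boundary as a closed polygon (shoelace area) rather than the snake of Street et al (1993):

```python
import numpy as np

def nuclear_features(boundary):
    """Area, perimeter and compactness from a closed nuclear boundary
    given as an (N, 2) array of (x, y) points; a simplified take on the
    morphological features of Street et al (1993)."""
    x, y = boundary[:, 0], boundary[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)        # next vertex, wrapping
    area = 0.5 * abs(np.sum(x * yn - xn * y))      # shoelace formula
    perimeter = np.sum(np.hypot(xn - x, yn - y))   # sum of edge lengths
    compactness = perimeter ** 2 / area            # minimised by a circle
    return area, perimeter, compactness

# regular 64-gon approximating a circular nucleus of radius 10
theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
circle = np.column_stack([10 * np.cos(theta), 10 * np.sin(theta)])
area, perim, comp = nuclear_features(circle)
```

For a perfect circle the compactness is 4π (about 12.57), its minimum possible value; irregular, indented nuclear borders drive it upward, which is what makes the feature diagnostically useful.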
2.5 SUMMARY
This chapter provides a detailed literature survey on soft computing techniques,
computer aided medical diagnosis, image denoising algorithms and feature extraction.