
BRAIN IMAGE CLASSIFICATION USING MACHINE LEARNING

A PROJECT REPORT

Submitted by

R.VIGNESH KUMAR - 20110106070

P.YUVARAJ - 20110106074

D.ARUN KUMAR - 20110106301

In partial fulfillment for the award of the degree

Of

BACHELOR OF ENGINEERING

In

ELECTRONICS AND COMMUNICATION ENGINEERING

ARIGNAR ANNA INSTITUTE OF SCIENCE AND TECHNOLOGY

SRIPERUMBUDUR-602 117

ANNA UNIVERSITY: CHENNAI 600 025

APRIL 2014


ANNA UNIVERSITY: CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report “BRAIN IMAGE CLASSIFICATION

USING MACHINE LEARNING” is the bonafide work of “R.VIGNESH

KUMAR (20110106070), P.YUVARAJ (20110106074), D.ARUN KUMAR

(20110106301)” who carried out the project work under my supervision.

SIGNATURE SIGNATURE

Mr. S.KARTHICK BABU M.E., MBA Ms. P. JAYASAKTHI M.E

HEAD OF THE DEPARTMENT ASSISTANT PROFESSOR

DEPARTMENT OF ECE DEPARTMENT OF ECE

Arignar Anna Institute of Science Arignar Anna Institute of Science

& Technology, Chennai - 602 117. & Technology, Chennai - 602 117.

Submitted for the Project and Viva Voce Examination held on …………………………………..

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

We would like to express our deep and heartfelt thanks to our beloved chairman Dr. VIRGAI.G.JAYARAMAN B.A and vice chairman Mr. J.KUMARAN M.E., MBA for giving us the opportunity to undertake and complete this project.

We thank our Principal Dr. BHAGWAN SHREERAM M.E., Ph.D(Engg)., MBA., Ph.D(MBA) and our Vice Principal Mr. M.SUBI STALIN M.E.,(Ph.D) for creating an atmosphere which inspired us to take up this project.

We take this opportunity to convey our heartfelt thanks to our Head of the Department Mr. S.KARTHICK BABU M.E., MBA and our project coordinator Ms. P.MOHANAPRIYA M.E, Assistant Professor, for their valuable support, undivided attention and direction, which kept this project on track.

We extend our sincere thanks to our project supervisors Mr. M.SIVAKUMAR M.E. and Mr. R.DHINESHRAM M.E., Assistant Professors of Electronics and Communication Engineering, for their much needed guidance and especially for entrusting us with this project.

We would also like to express our heartfelt thanks to the lab assistants, to all other faculty and non-teaching members of our department for their support, and to our peers for having stood by us and helped us complete this project.


ABSTRACT

The Automatic Support Intelligent System is used to detect brain tumors through a combination of a neural network and a fuzzy logic system. It aids in the diagnosis and treatment of brain tumors. Detecting a brain tumor is a challenging problem due to the structure of the tumor cells in the brain. This project presents an analytical method that enhances the detection of brain tumor cells in their early stages and analyzes anatomical structures by training and classification of the samples in a neural network system, together with tumor cell segmentation of the sample using a fuzzy clustering algorithm. The artificial neural network is used to train on and classify the stage of a brain tumor as benign, malignant or normal.

The fast discrete curvelet transform is used to analyze the texture of an image. In the brain structure analysis, the white matter (WM) and gray matter (GM) tissues are extracted. A probabilistic neural network (PNN) with a radial basis function (RBF) is employed to implement automated brain tumor classification. Decision making is performed in two stages: feature extraction using the gray-level co-occurrence matrix (GLCM) and classification using the PNN-RBF network. The segmentation is performed by the fuzzy logic system, and its result serves as a basis for early detection of brain tumors, which improves the chances of survival for the patient. The performance of this automated intelligent system is evaluated in terms of training performance and classification accuracy. The simulation results show that the classifier and segmentation algorithm provide better accuracy than previous methodologies.


CHAPTER 1

INTRODUCTION


Automated classification and detection of tumors in different medical images is motivated by the necessity of high accuracy when dealing with a human life. Computer assistance is also demanded in medical institutions because it can improve on human performance in a domain where false negative cases must occur at a very low rate. It has been proven that double reading of medical images can lead to better tumor detection, but the cost implied in double reading is very high, which is why good software to assist humans in medical institutions is of great interest nowadays.

Conventional methods of monitoring and diagnosing diseases rely on detecting the presence of particular features by a human observer. Due to the large number of patients in intensive care units and the need for continuous observation of such conditions, several techniques for automated diagnostic systems have been developed in recent years to attempt to solve this problem. Such techniques work by transforming the mostly qualitative diagnostic criteria into a more objective, quantitative feature classification problem.

In this project, the automated classification of brain magnetic resonance images using prior knowledge such as pixel intensity and some anatomical features is proposed. Currently there are no widely accepted methods; therefore automatic and reliable methods for tumor detection are of great need and interest. The application of the PNN to the classification of MR image data is not yet fully exploited. This includes clustering and classification techniques, especially for MR image problems with a huge scale of data, which consume time and energy if done manually. Thus, fully understanding recognition, classification and clustering techniques is essential to the development of neural network systems, particularly for problems in medicine.

Segmentation of brain tissues into gray matter, white matter and tumor on medical images is not only of high interest for serial monitoring of "disease burden" in oncologic imaging, but is also gaining popularity with the advance of image-guided surgical approaches. Outlining the brain tumor contour is a major step in planning spatially localized radiotherapy (e.g., CyberKnife, IMRT), which is usually done manually on contrast-enhanced T1-weighted magnetic resonance images (MRI) in current clinical practice. On T1 MR images acquired after administration of a contrast agent (gadolinium), blood vessels and the parts of the tumor where the contrast agent can pass the blood–brain barrier are observed as hyperintense areas. There are various attempts at brain tumor segmentation in the literature which use a single modality, combine multiple modalities, or use priors obtained from population atlases.

1.1 IMAGE SEGMENTATION

1.1.1 OVERVIEW

Segmentation problems are the bottleneck to achieving object extraction, object-specific measurements, and fast object rendering from multi-dimensional image data. Simple segmentation techniques are based on local pixel-neighborhood classification. Such methods, however, fail to capture global objects rather than local appearances, and often require intensive operator assistance. The reason is that the "logic" of an object does not necessarily follow that of its local image representation. Local properties such as texture, edginess, ridgeness, etc. do not always represent connected features of a given object.

1.2 REGION GROWING APPROACH

The region growing technique segments image pixels that belong to an object into regions. Segmentation is performed based on some predefined criteria: two pixels can be grouped together if they have the same intensity characteristics or if they are close to each other. It is assumed that pixels that are close to each other and have similar intensity values are likely to belong to the same object. The simplest form of this segmentation can be achieved through thresholding and component labeling. Another method is to find region boundaries using edge detection; the segmentation process then uses the region boundary information to extract the regions. The main disadvantage of the region growing approach is that it often requires a seed point as the starting point of the segmentation process, which requires user interaction. Due to variations in image intensities and noise, region growing can result in holes and over-segmentation, so it sometimes requires post-processing of the segmentation result. A minimal sketch is given below.
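As a minimal sketch of seeded region growing (assuming the image is a 2-D grayscale numpy array; the 4-connectivity, seed point and intensity tolerance are illustrative choices, not parameters taken from this report):

```python
import numpy as np
from collections import deque

def region_grow(image, seed, tol=10):
    """Grow a region from `seed`, adding 4-connected neighbours whose
    intensity differs from the seed intensity by at most `tol`."""
    h, w = image.shape
    seed_val = float(image[seed])
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc]:
                if abs(float(image[nr, nc]) - seed_val) <= tol:
                    mask[nr, nc] = True
                    queue.append((nr, nc))
    return mask
```

The loop is a breadth-first flood fill; in practice the result is often post-processed (e.g. hole filling) for the reasons noted above.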

1.3 CLUSTERING

Clustering can be considered the most important unsupervised learning

problem, so it deals with finding a structure in a collection of unlabeled data. A

cluster is therefore a collection of objects which are “similar” between them and

are “dissimilar” to the objects belonging to other clusters

Clustering algorithms may be classified as listed below

9

(1)Exclusive Clustering

(2)Overlapping Clustering

(3)Hierarchical Clustering

(4)Probabilistic Clustering

In the first case, data are grouped in an exclusive way, so that if a certain datum belongs to a definite cluster it cannot be included in another cluster. By contrast, the second type, overlapping clustering, uses fuzzy sets to cluster data, so that each point may belong to two or more clusters with different degrees of membership; in this case, each data point is associated with an appropriate membership value. A hierarchical clustering algorithm is based on the union of the two nearest clusters: the starting condition is realized by setting every datum as a cluster, and after a few iterations the desired final clusters are reached.

1.4 K-MEANS SEGMENTATION

K-means is one of the simplest unsupervised learning algorithms that solve

the well-known clustering problem. The procedure follows a simple and easy way

to classify a given data set through a certain number of clusters (assume k clusters)

fixed a priori. The main idea is to define k centroids, one for each cluster. These

centroids should be placed in a cunning way because of different location causes

different result. So, the better choice is to place them as much as possible far away

from each other. The next step is to take each point belonging to a given data set

and associate it to the nearest centroids. When no point is pending, the first step is

completed and an early group age is done. At this point we need to re-calculate k

new centroids as centers of the clusters resulting from the previous step. After we

have these k new centroids, a new binding has to be done between the same data

10

set points and the nearest new centroids. A loop has been generated. As a result of

this loop we may notice that the k centroids change their location step by step until

no more changes are done. In other words centroids do not move any more.

Finally, this algorithm aims at minimizing an objective function, in this case a

squared error function.
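A minimal numpy sketch of this loop (the random initialization from data points and the convergence test are illustrative choices):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Basic k-means on X of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its members.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break  # centroids no longer move
        centroids = new
    return labels, centroids
```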

1.5 HIERARCHICAL SEGMENTATION

A hierarchical set of image segmentations is a set of several image

segmentations of the same image at different levels of detail in which the

segmentations at coarser levels of detail can be produced from simple merges of

regions at finer levels of detail. A unique feature of hierarchical segmentation is

that the segment or region boundaries are maintained at the full image spatial

resolution for all segmentations. In a hierarchical segmentation, an object of

interest may be represented by multiple image segments in finer levels of detail in

the segmentation hierarchy, and may be merged into a surrounding region at

coarser levels of detail in the segmentation hierarchy. If the segmentation hierarchy

has sufficient resolution, the object of interest will be represented as a single region

segment at some intermediate level of segmentation detail.

A goal of the subsequent analysis of the segmentation hierarchy is to identify the hierarchical level at which the object of interest is represented by a single region segment. The object may then be identified through its spectral and spatial characteristics. Additional clues for object identification may be obtained from the behavior of the image segmentations at the hierarchical segmentation levels above and below the level at which the object of interest is represented by a single region.


1.6 THRESHOLDING

The simplest method of image segmentation is called the thresholding method. This method is based on a clip level (or a threshold value) to turn a gray-scale image into a binary image. The key to this method is to select the threshold value (or values, when multiple levels are selected). Several popular methods are used in industry, including the maximum entropy method, Otsu's method (maximum variance), and k-means clustering. Recently, methods have been developed for thresholding computed tomography (CT) images; the key idea is that, unlike Otsu's method, the thresholds are derived from the radiographs instead of the (reconstructed) image.

1.7 DESIGN STEPS

(1) Set the initial threshold T = (the maximum value of the image brightness + the minimum value of the image brightness)/2.

(2) Using T, segment the image into two sets of pixels: B (all pixel values less than T) and N (all pixel values greater than or equal to T).

(3) Calculate the average values of B and N separately, the means ub and un.

(4) Calculate the new threshold: T = (ub + un)/2.

(5) Repeat steps (2) to (4) until the iteration condition is met, and obtain the necessary region from the brain image. A sketch of this loop is shown below.
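A direct sketch of steps (1)-(5), assuming a grayscale numpy array and an illustrative stopping tolerance `eps`:

```python
import numpy as np

def iterative_threshold(image, eps=0.5):
    """Iterative mean thresholding, following steps (1)-(5) above."""
    img = image.astype(float)
    T = (img.max() + img.min()) / 2.0            # step (1)
    while True:
        B = img[img < T]                         # step (2): below T
        N = img[img >= T]                        # step (2): at or above T
        ub = B.mean() if B.size else 0.0         # step (3)
        un = N.mean() if N.size else 0.0
        T_new = (ub + un) / 2.0                  # step (4)
        if abs(T_new - T) < eps:                 # step (5): converged
            return img >= T_new                  # binary region mask
        T = T_new
```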


1.8 SYSTEM ANALYSIS

1.8.1 EXISTING TECHNIQUES

(1) PCA and wavelet-based texture characterization

(2) K-means clustering and thresholding method

(3) Manual analysis, which is time consuming, inaccurate and requires an intensively trained person to avoid diagnostic errors.

1.8.2 PRINCIPAL COMPONENT ANALYSIS

PCA is a mathematical procedure that uses an orthogonal transformation to

convert a set of observations of possibly correlated variables into a set of values of

linearly uncorrelated variables called principal components. The number of

principal components is less than or equal to the number of original variables. This

transformation is defined in such a way that the first principal component has the

largest possible variance (that is, accounts for as much of the variability in the data

as possible), and each succeeding component in turn has the highest variance

possible under the constraint that it be orthogonal to (i.e., uncorrelated with) the

preceding components. Principal components are guaranteed to be independent

only if the data set is jointly normally distributed. PCA is sensitive to the relative

scaling of the original variables. Depending on the field of application, it is also

named the discrete Karhunen–Loève transform (KLT), the Hotelling transform or

proper orthogonal decomposition (POD).
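A minimal numpy sketch of the transformation just described, via eigendecomposition of the covariance matrix (one of several equivalent ways to compute principal components):

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its leading principal components."""
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition (ascending order)
    order = np.argsort(eigvals)[::-1]        # sort by decreasing variance
    components = eigvecs[:, order[:n_components]]
    return Xc @ components, components
```

Because PCA is sensitive to the relative scaling of the variables, the columns of X are often standardized before calling such a routine.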


1.8.3 DRAWBACKS

(1) Poor discriminatory power

(2) High computational load

1.9 DISCRETE WAVELET TRANSFORM

In mathematics, a wavelet series is a representation of a square-integrable (real- or complex-valued) function by a certain orthonormal series generated by a wavelet. This section provides a formal, mathematical definition of an orthonormal wavelet and of the integral wavelet transform.

1.9.1 DEFINITION

A function ψ ∈ L²(ℝ) is called an orthonormal wavelet if it can be used to define a Hilbert basis, that is a complete orthonormal system, for the Hilbert space L²(ℝ) of square-integrable functions. The Hilbert basis is constructed as the family of functions

ψ_{jk}(x) = 2^{j/2} ψ(2^j x − k)

for integers j, k ∈ ℤ. This family is an orthonormal system if it is orthonormal under the inner product, ⟨ψ_{jk}, ψ_{lm}⟩ = δ_{jl} δ_{km}, where δ_{jl} is the Kronecker delta and ⟨f, g⟩ is the standard inner product on L²(ℝ).

The requirement of completeness is that every function f ∈ L²(ℝ) may be expanded in the basis as

f(x) = Σ_{j,k = −∞..∞} c_{jk} ψ_{jk}(x)

with convergence of the series understood to be convergence in norm. Such a representation of a function f is known as a wavelet series. This implies that an orthonormal wavelet is self-dual.

1.9.2 DRAWBACKS

(1) Loss of edge details due to the shift-variant property.

1.10 KNN CLASSIFIER

In pattern recognition, the k-nearest neighbor algorithm (k-NN) is a method for classifying objects based on the closest training examples in the feature space. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-nearest neighbor algorithm is amongst the simplest of all machine learning algorithms: an object is classified by a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of its nearest neighbor.

The same method can be used for regression, by simply assigning the property value for the object to be the average of the values of its k nearest neighbors. It can be useful to weight the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. (A common weighting scheme is to give each neighbor a weight of 1/d, where d is the distance to the neighbor. This scheme is a generalization of linear interpolation.) A minimal sketch of the majority vote is given below.
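A minimal numpy sketch of the majority-vote classifier described above (the Euclidean metric and the default k = 3 are illustrative choices):

```python
import numpy as np

def knn_classify(X_train, y_train, x, k=3):
    """Classify x by a majority vote of its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distances to all samples
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[counts.argmax()]                # most common class wins
```

The 1/d weighting mentioned above would replace the unweighted vote with a sum of 1/dists over the selected neighbours, grouped by class.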


1.10.1 DRAWBACKS

(1) Lower accuracy in classification

(2) Magnetic resonance images contain noise caused by operator performance, which can lead to serious inaccuracies in classification.

1.10.2 PROPOSED METHOD

MRI brain image classification and anatomical structure analysis are proposed based on:

(1) PNN-RBF network for classification

(2) Fuzzy clustering for tumor detection and structural analysis

1.10.3 METHODOLOGIES

(1) Fast discrete curvelet decomposition

(2) GLCM feature extraction

(3) PNN-RBF training and classification

(4) Brain structural analysis

1.10.4 ADVANTAGES

(1) It can segment the brain regions from the image accurately.

(2) It is useful for classifying brain tumor images for accurate detection.

(3) Brain tumors will be detected in their early stages.

CHAPTER 2

PROPOSED SYSTEM

2.1 FAST DISCRETE CURVELET TRANSFORMATION

Curvelet implementations are based on the original construction, which uses a pre-processing step involving a special partitioning of phase space followed by the ridgelet transform, which is applied to blocks of data that are well localized in space and frequency.

2.2 BLOCK DIAGRAM

[Block diagram of the proposed system: MRI brain image → fast discrete curvelet decomposition → texture feature extraction → classification by the trained network (built from training samples via feature extraction and PNN-RBF training) → if the input is abnormal: segmentation process → tumor and structural part detection → performance analysis]

In the last two or three years, however, curvelets have actually been redesigned in an effort to make them easier to use and understand. As a result, the new construction is considerably simpler and totally transparent. What is interesting here is that the new mathematical architecture suggests innovative algorithmic strategies, and provides the opportunity to improve upon earlier implementations. There are two new fast discrete curvelet transforms (FDCTs), which are simpler, faster, and less redundant than existing proposals:

(1) curvelets via USFFT, and

(2) curvelets via wrapping.

The block size can be changed at each scale level. In the wrapping construction, the input f[n1, n2] is taken to be a Cartesian array and f̂[n1, n2] denotes its 2-D discrete Fourier transform; the architecture of the FDCT via wrapping is then as follows:

1) Apply the 2-D FFT and obtain the Fourier samples f̂[n1, n2].

2) For each scale j and angle l, form the product Ũ_{j,l}[n1, n2] f̂[n1, n2].

3) Wrap this product around the origin and obtain f̃_{j,l}[n1, n2] = W(Ũ_{j,l} f̂)[n1, n2].

4) Apply the inverse 2-D FFT to each f̃_{j,l}, hence collecting the discrete coefficients c^D(j, l, k).

2.3 TEXTURE ANALYSIS

2.3.1 OVERVIEW

Texture is that innate property of all surfaces that describes visual patterns, each having properties of homogeneity. It contains important information about the structural arrangement of the surface, such as clouds, leaves, bricks, fabric, etc. It also describes the relationship of the surface to the surrounding environment. In short, it is a feature that describes the distinctive physical composition of a surface.

Texture properties include:

(1) Coarseness

(2) Contrast

(3) Directionality

(4) Line-likeness

(5) Regularity

(6) Roughness

Texture is one of the most important defining features of an image. It is characterized by the spatial distribution of gray levels in a neighborhood. In order to capture the spatial dependence of gray-level values, which contributes to the perception of texture, a two-dimensional dependence texture analysis matrix is taken into consideration. This two-dimensional matrix is obtained by decoding the image file (JPEG, BMP, etc.).

2.4 METHODS OF REPRESENTATION

There are three principal approaches used to describe texture: statistical, structural and spectral.

(1) Statistical techniques characterize textures using the statistical properties of the grey levels of the points/pixels comprising a surface image. Typically, these properties are computed using the grey-level co-occurrence matrix of the surface, or the wavelet transformation of the surface.

(2) Structural techniques characterize textures as being composed of simple primitive structures called "texels" (texture elements). These are arranged regularly on a surface according to some surface arrangement rules.

(3) Spectral techniques are based on properties of the Fourier spectrum and describe the global periodicity of the grey levels of a surface by identifying high-energy peaks in the Fourier spectrum.

For optimum classification purposes, what concerns us are the statistical techniques of characterization, because it is these techniques that result in computable texture properties. The most popular statistical representations of texture are:

(1) Co-occurrence Matrix

(2) Tamura Texture

(3) Wavelet Transform


2.4.1 CO-OCCURRENCE MATRIX

Originally proposed by R. M. Haralick, the co-occurrence matrix representation of texture features explores the grey-level spatial dependence of texture. A mathematical definition of the co-occurrence matrix is as follows:

- Given a position operator P(i,j),

- let A be an n × n matrix

- whose element A[i][j] is the number of times that points with grey level (intensity) g[i] occur, in the position specified by P, relative to points with grey level g[j].

- Let C be the n × n matrix produced by dividing A by the total number of point pairs that satisfy P. C[i][j] is a measure of the joint probability that a pair of points satisfying P will have values g[i], g[j].

- C is called a co-occurrence matrix defined by P.

Examples for the operator P are: "i above j", or "i one position to the right and two below j", etc.

This can also be illustrated as follows. Let t be a translation; then a co-occurrence matrix C_t of a region is defined for every grey-level pair (a, b) by [1]:

C_t(a, b) = card{(s, s + t) : f(s) = a, f(s + t) = b}

Here, C_t(a, b) is the number of site couples, denoted by (s, s + t), that are separated by a translation vector t, with a being the grey level of s, and b being the grey level of s + t. For example, with an 8-grey-level image representation and a vector t that considers only one neighbour, a particular co-occurrence matrix is obtained [1].

2.5 CLASSICAL CO-OCCURRENCE MATRIX

First the co-occurrence matrix is constructed, based on the orientation and distance between image pixels. Then meaningful statistics are extracted from the matrix as the texture representation. Haralick proposed the following texture features:

(1) Energy

(2) Contrast

(3) Correlation

(4) Homogeneity

Hence, for each Haralick texture feature, we obtain a co-occurrence matrix. These co-occurrence matrices represent the spatial distribution and the dependence of the grey levels within a local area. Each (i,j)-th entry in the matrices represents the probability of going from one pixel with grey level i to another with grey level j under a predefined distance and angle. From these matrices, sets of statistical measures, called feature vectors, are computed.

2.5.1 ENERGY

Energy is a grey-scale image texture measure of homogeneity, reflecting the uniformity of the image grey-scale distribution:

Energy = Σ_x Σ_y p(x, y)²

where p(x, y) is the GLCM.

2.5.2 CONTRAST

Contrast is the moment of inertia near the main diagonal of the GLCM. It measures how the values of the matrix are distributed and the amount of local variation in the image, reflecting the image clarity and the depth of texture shadow:

Contrast = Σ_x Σ_y (x − y)² p(x, y)

2.5.3 CORRELATION COEFFICIENT

Correlation measures the joint probability of occurrence of the specified pixel pairs:

Correlation = Σ_x Σ_y ((x − μ_x)(y − μ_y) p(x, y)) / (σ_x σ_y)

2.5.4 HOMOGENEITY

Homogeneity measures the closeness of the distribution of elements in the GLCM to the GLCM diagonal:

Homogeneity = Σ_x Σ_y p(x, y) / (1 + |x − y|)

A sketch of the GLCM construction and these four features follows.
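A compact numpy sketch that builds a GLCM for a single offset and evaluates the four features above (the 8-level quantization and the "one pixel to the right" offset are illustrative choices):

```python
import numpy as np

def glcm_features(image, levels=8, offset=(0, 1)):
    """Build a GLCM for one offset and compute energy, contrast,
    correlation and homogeneity as defined above."""
    img = image.astype(float)
    img = (img / max(img.max(), 1e-12) * (levels - 1)).astype(int)
    dr, dc = offset
    h, w = img.shape
    glcm = np.zeros((levels, levels))
    for r in range(max(0, -dr), min(h, h - dr)):
        for c in range(max(0, -dc), min(w, w - dc)):
            glcm[img[r, c], img[r + dr, c + dc]] += 1
    p = glcm / glcm.sum()                        # joint probabilities p(x, y)
    x, y = np.indices(p.shape)
    mx, my = (x * p).sum(), (y * p).sum()        # marginal means
    sx = np.sqrt(((x - mx) ** 2 * p).sum())      # marginal standard deviations
    sy = np.sqrt(((y - my) ** 2 * p).sum())
    return {
        "energy": (p ** 2).sum(),
        "contrast": ((x - y) ** 2 * p).sum(),
        "correlation": ((x - mx) * (y - my) * p).sum() / (sx * sy),
        "homogeneity": (p / (1.0 + np.abs(x - y))).sum(),
    }
```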

CHAPTER 3

PROBABILISTIC NEURAL NETWORKS

3.1 NEURAL NETWORK

Neural networks are predictive models loosely based on the action of biological neurons. The selection of the name "neural network" was one of the great PR successes of the twentieth century. It certainly sounds more exciting than a technical description such as "a network of weighted, additive values with nonlinear transfer functions". However, despite the name, neural networks are far from "thinking machines" or "artificial brains". A typical artificial neural network might have a hundred neurons. In comparison, the human nervous system is believed to have about 3×10^10 neurons. We are still light years from "Data".

The original "perceptron" model was developed by Frank Rosenblatt in 1958. Rosenblatt's model consisted of three layers: (1) a "retina" that distributed inputs to the second layer, (2) "association units" that combine the inputs with weights and trigger a threshold step function which feeds the output layer, and (3) the output layer, which combines the values. Unfortunately, the use of a step function in the neurons made the perceptrons difficult or impossible to train. A critical analysis of perceptrons published in 1969 by Marvin Minsky and Seymour Papert pointed out a number of critical weaknesses of perceptrons, and, for a period of time, interest in perceptrons waned.

Interest in neural networks was revived in 1986 when David Rumelhart, Geoffrey Hinton and Ronald Williams published "Learning Internal Representations by Error Propagation". They proposed a multilayer neural network with nonlinear but differentiable transfer functions that avoided the pitfalls of the original perceptron's step functions. They also provided a reasonably effective training algorithm for neural networks.

In this paper, we investigate the possibility of utilizing the detection capability of neural networks as a diagnostic tool for early breast cancer. Breast cancer is one of the leading causes of death among women in the world. Detection and localization of the tumor at early stages is the only way to decrease the mortality rate. X-ray mammography is the most used diagnostic tool for breast cancer screening. However, it has important limitations; for example, it has high false-negative and false-positive rates that may reach up to 34%. Thus, the development of imaging modalities which enhance, complement, or replace X-ray mammography has been a priority in medical imaging research. Recently, several microwave imaging methods have been developed. Microwave ultra-wideband (UWB) imaging is currently one of the promising methods under investigation by many research groups around the world. This method involves transmitting UWB signals through the breast tissue and recording the scattered signals from different locations surrounding the breast using compact UWB antennas. The basis of this method is the difference in the electrical properties between the tumors and the healthy tissues.

It has been shown by different research groups worldwide that the healthy fat tissues of the breast have a dielectric constant that ranges between 5 and 9 and a conductivity between 0.02 S/m and 0.2 S/m. On the other hand, the malignant tissues have a dielectric constant of around 60 and a conductivity of around 2 S/m. Thus, it is clear that there is a significant contrast between the electrical characteristics of the healthy and malignant tissues. From the electromagnetic point of view, this contrast means a significant difference in the scattering properties of those tissues. From the neural network point of view, it is possible to say that the difference in the scattering properties of healthy and malignant tissues means that the scattered signals recorded in different places around the breast carry the signature of the existence of a tumor. The comparison between those received signals recorded in different places gives an indication of the location of the tumor.

Reviewing the literature shows that most of the research on using neural networks for breast cancer detection was aimed at enabling radiologists to make accurate diagnoses of breast cancer on X-ray mammograms. For example, several neural network methods were used in [15] for automated classification of clustered micro-calcifications in mammograms.

Concerning UWB systems, the neural network has been used extensively to detect the direction of arrival of ultra-wideband signals. It has also been utilized in a simple manner to detect and locate a tumor of 2.5 mm radius along a certain axis (one dimension), and also in two dimensions, by rotating the place of receiving the signal in steps of 1 degree around the breast.

This paper presents the use of a neural network to detect and locate tumors in the breast with sizes down to 1 mm in radius by using four permanent probes around the breast. A three-dimensional breast model is built for the purpose of this investigation. A pulsed plane electromagnetic wave is sent towards the breast and the scattered signals are collected by four antennas surrounding the breast. The transmitted pulse occupies the UWB frequency range from 3.1 GHz to 10.6 GHz.


A simple feed-forward back-propagation neural network is then used to detect the

tumor and locate its position.

3.2 THE NEURAL NETWORK USED

The neural network is one of the best tools for recognition and discrimination between different sets of signals. To get the best results using a neural network, it is necessary to choose a suitable architecture and learning algorithm. Unfortunately there is no guaranteed method to do that; the best way is to choose what is expected to be suitable according to previous experience and then to expand or shrink the neural network size until a reasonable output is obtained. In this work we tried different sizes for the neural network using MATLAB, and we found that the best in our case is the model shown in Fig. 2. It has an input layer with 2000 inputs; a first hidden layer with 11 nodes and a TANSIG transfer function; a second hidden layer with 7 nodes and a TANSIG transfer function; and an output layer with a PURELIN transfer function and 2 outputs. One of the two outputs is used for the detection of the tumor, and the other for the localization. The TANSIG transfer function is selected to limit the signal between −1 and 1.

For the output layer, the PURELIN transfer function is chosen to give all the possible cases for the location of the tumor. The proposed neural network has been used to detect and locate the tumor in two cases. The first case was to detect and locate a tumor in a two-dimensional sector of the breast model; the location of the tumor was considered randomly at the center and in any of the four quadrants. The second case was to detect and locate a tumor anywhere in the three-dimensional model. The neural network was trained using 100 sets of inputs with the training function TRAINSCG. An additional 40 sets of inputs were used to test the performance of each neural network.

3.3 TYPES OF NEURAL NETWORKS

(1) Artificial Neural Networks

(2) Probabilistic Neural Networks

(3) General Regression Neural Networks

DTREG implements the most widely used types of neural networks:

(1) Multilayer Perceptron Networks (also known as multilayer feed-forward networks),

(2) Cascade Correlation Neural Networks,

(3) Probabilistic Neural Networks (PNN),

(4) General Regression Neural Networks (GRNN).

3.4 RADIAL BASIS FUNCTION NETWORKS

(1) Functional Link Networks,

(2) Kohonen Networks,

(3) Gram-Charlier Networks,

(4) Hebb Networks,

(5) Adaline Networks,

(6) Hybrid Networks.


3.5 THE MULTILAYER PERCEPTRON NEURAL NETWORK MODEL

The following diagram illustrates a perceptron network with three layers:

This network has an input layer (on the left) with three neurons, one hidden layer (in the middle) with three neurons and an output layer (on the right) with three neurons.

There is one neuron in the input layer for each predictor variable. In the case of categorical variables, N−1 neurons are used to represent the N categories of the variable.

3.5.1 INPUT LAYER

A vector of predictor variable values (x1 ... xp) is presented to the input layer. The input layer (or processing before the input layer) standardizes these values so that the range of each variable is −1 to 1. The input layer distributes the values to each of the neurons in the hidden layer. In addition to the predictor variables, there is a constant input of 1.0, called the bias, that is fed to each of the hidden neurons; the bias is multiplied by a weight and added to the sum going into the neuron.

3.5.2 HIDDEN LAYER

Arriving at a neuron in the hidden layer, the value from each input neuron is multiplied by a weight (wji), and the resulting weighted values are added together, producing a combined value uj. The weighted sum (uj) is fed into a transfer function, σ, which outputs a value hj. The outputs from the hidden layer are distributed to the output layer.

3.5.3 OUTPUT LAYER

Arriving at a neuron in the output layer, the value from each hidden layer neuron is multiplied by a weight (wkj), and the resulting weighted values are added together, producing a combined value vk. The weighted sum (vk) is fed into a transfer function, σ, which outputs a value yk. The y values are the outputs of the network.

If a regression analysis is being performed with a continuous target variable, then there is a single neuron in the output layer, and it generates a single y value. For classification problems with categorical target variables, there are N neurons in the output layer producing N values, one for each of the N categories of the target variable. A sketch of this forward pass is given below.
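The forward pass of Sections 3.5.1-3.5.3 can be summarized in a few lines. The sketch below assumes tanh as the transfer function σ and a linear output layer (the regression case); both are illustrative choices, not the report's specification:

```python
import numpy as np

def mlp_forward(x, W_hidden, b_hidden, W_out, b_out):
    """One forward pass: x (p,) -> hidden layer -> network outputs."""
    u = W_hidden @ x + b_hidden   # weighted sums u_j, bias weight included
    h = np.tanh(u)                # hidden outputs h_j (tanh stands in for sigma)
    y = W_out @ h + b_out         # weighted sums v_k, linear output layer
    return y
```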

3.6 PROBABILISTIC NEURAL NETWORKS (PNN)


Probabilistic (PNN) and General Regression Neural Networks (GRNN) have

similar architectures, but there is a fundamental difference: Probabilistic networks

perform classification where the target variable is categorical, whereas general

regression neural networks perform regression where the target variable is

continuous. If you select a PNN/GRNN network, DTREG will automatically select

the correct type of network based on the type of target variable.

3.6.1 ARCHITECTURE OF A PNN

Fig. 3.6 Architecture of a PNN network

All PNN networks have four layers:

(1) Input layer - There is one neuron in the input layer for each predictor variable. In the case of categorical variables, N−1 neurons are used, where N is the number of categories. The input neurons (or processing before the input layer) standardize the range of the values by subtracting the median and dividing by the interquartile range. The input neurons then feed the values to each of the neurons in the hidden layer.

(2) Hidden layer - This layer has one neuron for each case in the training data set. The neuron stores the values of the predictor variables for the case along with the target value. When presented with the x vector of input values from the input layer, a hidden neuron computes the Euclidean distance of the test case from the neuron's center point and then applies the RBF kernel function using the sigma value(s). The resulting value is passed to the neurons in the pattern layer.

(3) Pattern layer / Summation layer - The next layer in the network is different for PNN networks and for GRNN networks. For PNN networks there is one pattern neuron for each category of the target variable. The actual target category of each training case is stored with each hidden neuron; the weighted value coming out of a hidden neuron is fed only to the pattern neuron that corresponds to the hidden neuron's category. The pattern neurons add the values for the class they represent (hence, it is a weighted vote for that category).

For GRNN networks, there are only two neurons in the pattern layer. One neuron is the denominator summation unit; the other is the numerator summation unit. The denominator summation unit adds up the weight values coming from each of the hidden neurons. The numerator summation unit adds up the weight values multiplied by the actual target value for each hidden neuron.

(4) Decision layer - The decision layer is different for PNN and GRNN networks. For PNN networks, the decision layer compares the weighted votes for each target category accumulated in the pattern layer and uses the largest vote to predict the target category.

For GRNN networks, the decision layer divides the value accumulated in the numerator summation unit by the value in the denominator summation unit and uses the result as the predicted target value. A minimal PNN sketch follows.
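A minimal sketch of the PNN path just described: a hidden neuron per stored training case, a Gaussian RBF kernel, a per-class summation, and a largest-vote decision. A single scalar sigma is assumed for simplicity:

```python
import numpy as np

def pnn_classify(X_train, y_train, x, sigma=1.0):
    """Classify x with a PNN over the stored training cases."""
    d2 = ((X_train - x) ** 2).sum(axis=1)     # squared distances (hidden layer)
    k = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian RBF kernel values
    classes = np.unique(y_train)
    votes = np.array([k[y_train == c].sum() for c in classes])  # pattern layer
    return classes[votes.argmax()]            # decision layer: largest vote
```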

The following diagram shows the actual proposed network used in our project:

Fig. 3.6b

(1) Input Layer:

The input vector, denoted as p, is presented as the black vertical bar. Its dimension is R × 1. In this paper, R = 3.

(2) Radial Basis Layer:

In the Radial Basis Layer, the vector distances between the input vector p and the weight vector made of each row of the weight matrix W are calculated. Here, the vector distance is defined as the dot product between two vectors [8]. Assume the dimension of W is Q×R. The dot product between p and the i-th row of W produces the i-th element of the distance vector ||W − p||, whose dimension is Q×1. The minus symbol, "−", indicates that it is the distance between vectors. Then, the bias vector b is combined with ||W − p|| by an element-by-element multiplication; the result is denoted as

n = ||W − p|| .* b

The transfer function in the PNN has a built-in distance criterion with respect to a center. In this paper it is defined as

radbas(n) = exp(−n²)    (1)

Each element of n is substituted into Eq. (1) and produces the corresponding element of a, the output vector of the Radial Basis Layer. The i-th element of a can be represented as

ai = radbas(||Wi − p|| · bi)    (2)

where Wi is the vector made of the i-th row of W and bi is the i-th element of the bias vector b.

Some characteristics of the Radial Basis Layer: the i-th element of a equals 1 if the input p is identical to the i-th row of the input weight matrix W. A radial basis neuron with a weight vector close to the input vector p produces a value near 1, and then its output weights in the competitive layer will pass their values to the competitive function. It is also possible that several elements of a are close to 1, since the input pattern may be close to several training patterns.


(3) Competitive Layer:

There is no bias in the Competitive Layer. In the Competitive Layer, the vector a is first multiplied by the layer weight matrix M, producing an output vector d. The competitive function, denoted as C in Fig. 2, produces a 1 corresponding to the largest element of d, and 0's elsewhere. The output vector of the competitive function is denoted as c. The index of the 1 in c is the number of the tumor class that the system assigns. The dimension of the output vector, K, is 5 in this paper.

3.6.2 HOW PNN NETWORKS WORK

Although the implementation is very different, probabilistic neural networks are conceptually similar to k-nearest neighbor (k-NN) models. The basic idea is that the predicted target value of an item is likely to be about the same as that of other items that have close values of the predictor variables. Consider the figure below:

Fig. 3.6.2a

Assume that each case in the training set has two predictor variables, x and y. The cases are plotted using their (x, y) coordinates as shown in the figure. Also assume that the target variable has two categories: positive, which is denoted by a square, and negative, which is denoted by a dash. Now, suppose we are trying to predict the value of a new case represented by the triangle with predictor values x = 6, y = 5.1. Should we predict the target as positive or negative?

Notice that the triangle is positioned almost exactly on top of a dash representing a negative value. But that dash is in a fairly unusual position compared to the other dashes, which are clustered below the squares and left of center. So it could be that the underlying negative value is an odd case.

The nearest neighbor classification performed for this example depends on how many neighboring points are considered. If 1-NN is used and only the closest point is considered, then clearly the new point should be classified as negative, since it is on top of a known negative point. On the other hand, if 9-NN classification is used and the closest 9 points are considered, then the effect of the surrounding 8 positive points may overbalance the close negative point.

A probabilistic neural network builds on this foundation and generalizes it to consider all of the other points. The distance is computed from the point being evaluated to each of the other points, and a radial basis function (RBF), also called a kernel function, is applied to the distance to compute the weight (influence) of each point. The radial basis function is so named because the radius distance is the argument to the function:

Weight = RBF(distance)

The further some other point is from the new point, the less influence it has.

Fig. 3.6.2b

3.7 RADIAL BASIS FUNCTION

Different types of radial basis functions could be used, but the most common is the Gaussian function, which weights a point at distance d by exp(−d²/(2σ²)), where σ is a smoothing parameter.

Fig. 3.7 The Gaussian radial basis function

3.8 ADVANTAGES AND DISADVANTAGES OF PNN NETWORKS

(1) It is usually much faster to train a PNN/GRNN network than a multilayer perceptron network.

(2) PNN/GRNN networks often are more accurate than multilayer perceptron networks.

(3) PNN/GRNN networks are relatively insensitive to outliers (wild points).

(4) PNN networks generate accurate predicted target probability scores.

(5) PNN networks approach Bayes optimal classification.

(6) PNN/GRNN networks are slower than multilayer perceptron networks at classifying new cases.

(7) PNN/GRNN networks require more memory space to store the model.

3.9 REMOVING UNNECESSARY NEURONS

One of the disadvantages of PNN models compared to multilayer perceptron networks is that PNN models are large, due to the fact that there is one neuron for each training row. This causes the model to run more slowly than multilayer perceptron networks when using scoring to predict values for new rows.

DTREG provides an option to remove unnecessary neurons from the model after the model has been constructed. Removing unnecessary neurons has three benefits:

(1) The size of the stored model is reduced.

(2) The time required to apply the model during scoring is reduced.

(3) Removing neurons often improves the accuracy of the model.

The process of removing unnecessary neurons is iterative. Leave-one-out validation is used to measure the error of the model with each neuron removed. The neuron that causes the least increase in error (or possibly the largest reduction in error) is then removed from the model. The process is repeated with the remaining neurons until the stopping criterion is reached. A sketch of this greedy procedure is given below.
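A sketch of this greedy loop, assuming a hypothetical `loo_error` callback that returns the leave-one-out error of a model built from a given list of neurons (the callback is an assumption for illustration, not part of DTREG):

```python
def prune_neurons(neurons, loo_error):
    """Greedily drop the neuron whose removal yields the lowest
    leave-one-out error, stopping when every removal would increase it."""
    best_err = loo_error(neurons)  # loo_error is a hypothetical callback
    while len(neurons) > 1:
        trials = [(loo_error(neurons[:i] + neurons[i + 1:]), i)
                  for i in range(len(neurons))]
        err, i = min(trials)                # best candidate removal
        if err > best_err:
            break                           # any removal now hurts accuracy
        best_err = err
        neurons = neurons[:i] + neurons[i + 1:]
    return neurons
```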

When unnecessary neurons are removed, the "Model Size" section of the analysis report shows how the error changes with different numbers of neurons. You can see a graphical chart of this by clicking Chart/Model size.

There are three criteria that can be selected to guide the removal of neurons:

(1) Minimize error - If this option is selected, DTREG removes neurons as long as the leave-one-out error remains constant or decreases. It stops when it finds a neuron whose removal would cause the error to increase above the minimum found.

(2) Minimize neurons - If this option is selected, DTREG removes neurons until the leave-one-out error would exceed the error for the model with all neurons.

(3) # of neurons - If this option is selected, DTREG removes the least significant neurons until only the specified number of neurons remains.


3.10 PNN

NEWPNN creates a two-layer network. The first layer has RADBAS neurons, and calculates its weighted inputs with DIST and its net input with NETPROD. The second layer has COMPET neurons, and calculates its weighted input with DOTPROD and its net input with NETSUM. Only the first layer has biases. NEWPNN sets the first-layer weights to P', and the first-layer biases are all set to 0.8326/SPREAD, resulting in radial basis functions that cross 0.5 at weighted inputs of +/− SPREAD. The second-layer weights W2 are set to T.

3.10.1 METHODOLOGY

A description of the derivation of the PNN classifier was given above. PNNs have been used for classification problems. The PNN classifier presents good accuracy, very small training time, robustness to weight changes, and negligible retraining time. There are 6 stages involved in the proposed model, starting from the data input to the output. The first stage is the image processing system. Basically, in an image processing system, image acquisition and image enhancement are the steps that have to be done; in this project, these two steps are skipped and all the images are collected from available resources. The proposed model requires converting the image into a format capable of being manipulated by the computer. The MR images are converted into matrix form by using MATLAB. Then, the PNN is used to classify the MR images. Lastly, performance based on the result will be analyzed at the end of the development phase.

CHAPTER 4

SPATIAL FUZZY CLUSTERING MODEL

Fuzzy clustering plays an important role in solving problems in the areas of pattern recognition and fuzzy model identification. A variety of fuzzy clustering methods have been proposed, and most of them are based upon distance criteria. One widely used algorithm is the fuzzy c-means (FCM) algorithm, which uses reciprocal distance to compute fuzzy weights. A more efficient algorithm is the new FCFM, which computes the cluster centers using Gaussian weights, uses large initial prototypes, and adds processes of eliminating, clustering and merging. In the following sections we discuss and compare the FCM and FCFM algorithms.

The spatial fuzzy c-means method incorporates spatial information, and the membership weighting of each cluster is altered after the cluster distribution in the neighborhood is considered. The first pass is the same as in standard FCM: the membership function is calculated in the spectral domain. In the second pass, the membership information of each pixel is mapped to the spatial domain and the spatial function is computed from it. The FCM iteration then proceeds with the new membership incorporating the spatial function. The iteration is stopped when the maximum difference between cluster centers or membership functions at two successive iterations is less than a small threshold value.

The fuzzy c-means (FCM) algorithm was introduced by J. C. Bezdek [2]. The idea of FCM is to use the weights that minimize the total weighted mean-square error:

J(wqk, z(k)) = Σ(q=1..Q) Σ(k=1..K) (wqk)^p || x(q) − z(k) ||²

Σ(k=1..K) wqk = 1 for each q

wqk = (1/(Dqk)²)^(1/(p−1)) / Σ(k=1..K) (1/(Dqk)²)^(1/(p−1)) , p > 1

The FCM allows each feature vector to belong to every cluster with a fuzzy truth value (between 0 and 1), which is computed using Equation (4). The algorithm assigns a feature vector to a cluster according to the maximum weight of the feature vector over all clusters. A minimal sketch of this loop is given below.
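A minimal numpy sketch of the plain (non-spatial) FCM loop using the updates reconstructed above; the fuzzifier p = 2 and the convergence tolerance are illustrative choices:

```python
import numpy as np

def fcm(X, K, p=2.0, n_iter=100, tol=1e-5, seed=0):
    """Fuzzy c-means: alternate membership and center updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(n_iter):
        D2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + 1e-12
        w = (1.0 / D2) ** (1.0 / (p - 1.0))        # reciprocal-distance weights
        w /= w.sum(axis=1, keepdims=True)           # memberships sum to 1 over k
        wp = w.T ** p
        new = (wp @ X) / wp.sum(axis=1, keepdims=True)  # weighted cluster centers
        if np.abs(new - centers).max() < tol:
            break                                   # centers stopped moving
        centers = new
    return w, centers
```

The spatial variant described above would insert a second pass per iteration that recomputes each membership from its neighbourhood before the center update.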

4.1 A NEW FUZZY C-MEANS IMPLEMENTATION

4.1.1 ALGORITHM FLOW

4.2 INITIALIZE THE FUZZY WEIGHTS

In order to compare the FCM with the FCFM, our implementation allows the user to choose to initialize the weights using feature vectors or randomly. The process of initializing the weights using feature vectors assigns the first K_init (user-given) feature vectors to prototypes and then computes the weights by Equation (4).

[Flowchart (Section 4.1.1, algorithm flow): input image and number of clusters → initialize partition matrix → find membership matrix → find the center values → compute distances between data and centers → find the objective function → update membership by the spatial function → if the objective-function difference between two iterations is below a minimum, end; otherwise iterate]

4.3 STANDARDIZE THE WEIGHTS OVER Q

During the FCM iteration, the computed cluster centers get closer and closer. To avoid rapid convergence and always grouping into one cluster, we use

w[q,k] = (w[q,k] − wmin)/(wmax − wmin)

before standardizing the weights over Q, where wmax and wmin are the maximum and minimum weights over the weights of all feature vectors for the particular class prototype.

4.4 ELIMINATING EMPTY CLUSTERS

After the fuzzy clustering loop, we add a step (Step 8) to eliminate the empty clusters. This step is placed outside the fuzzy clustering loop and before the calculation of the modified XB validity. Without the elimination, the minimum distance of the prototype pair used in Equation (8) may be the distance of an empty cluster pair. We invoke the method for eliminating small clusters with a threshold of 0, so that it only eliminates the empty clusters.

After the fuzzy c-means iteration, for the purpose of comparison and to pick the optimal result, we add Step 9 to calculate the cluster centers and the modified Xie-Beni clustering validity.

The Xie-Beni validity is a product of compactness and separation measures. The compactness-to-separation ratio is defined by Equation (6):

XB = {(1/K) Σ(k=1..K) σk²} / Dmin²

σk² = Σ(q=1..Q) wqk || x(q) − c(k) ||²

where Dmin is the minimum distance between the cluster centers.


The variance of each cluster is calculated by summing over only the members of each cluster rather than over all Q for each cluster, which contrasts with the original Xie-Beni validity measure:

σk² = Σ{q : q in cluster k} wqk || x(q) − c(k) ||²

The spatial function is included in the membership function as described above.

4.5 MORPHOLOGICAL PROCESS

Morphological image processing is a collection of non-linear operations related to the shape or morphology of features in an image. Morphological operations rely only on the relative ordering of pixel values, not on their numerical values, and are therefore especially suited to the processing of binary images. Morphological operations can also be applied to grey-scale images whose light transfer functions are unknown, so that their absolute pixel values are of no or minor interest.

Morphological techniques probe an image with a small shape or template called a structuring element. The structuring element is positioned at all possible locations in the image and compared with the corresponding neighbourhood of pixels. Some operations test whether the element "fits" within the neighbourhood, while others test whether it "hits" or intersects the neighbourhood. A morphological operation on a binary image creates a new binary image in which a pixel has a non-zero value only if the test is successful at that location in the input image.


The structuring element is a small binary image, i.e. a small matrix of pixels, each with a value of zero or one. The matrix dimensions specify the size of the structuring element, and the pattern of ones and zeros specifies its shape. The origin of the structuring element is usually one of its pixels, although in general the origin can be outside the structuring element.

fig 4.5 Examples of simple structuring elements.

A common practice is to use odd dimensions for the structuring matrix, with the origin defined as the centre of the matrix. Structuring elements play the same role in morphological image processing as convolution kernels do in linear image filtering.

When a structuring element is placed in a binary image, each of its pixels is associated with the corresponding pixel of the neighbourhood under the structuring element. The structuring element is said to fit the image if, for each of its pixels set to 1, the corresponding image pixel is also 1. Similarly, a structuring element is said to hit, or intersect, an image if, for at least one of its pixels set to 1, the corresponding image pixel is also 1.

fig 4.5a Fitting and hitting of a binary image with structuring elements s1 and s2.

Zero-valued pixels of the structuring element are ignored, i.e. they indicate points where the corresponding image value is irrelevant.

4.6 FUNDAMENTAL OPERATIONS

More formal descriptions and examples of how basic morphological

operations work are given in the Hypermedia Image Processing Reference (HIPR)

developed by Dr. R. Fisher et al. at the Department of Artificial Intelligence in the

University of Edinburgh, Scotland, UK.

4.7 EROSION AND DILATION

The erosion of a binary image f by a structuring element s (denoted f ⊖ s) produces a new binary image g = f ⊖ s with ones in all locations (x,y) of the structuring element's origin at which the structuring element s fits the input image f, i.e. g(x,y) = 1 if s fits f, and 0 otherwise, repeating for all pixel coordinates (x,y).

Erosion with small (e.g. 2×2 to 5×5) square structuring elements shrinks an image by stripping away a layer of pixels from both the inner and outer boundaries of regions. The holes and gaps between different regions become larger, and small details are eliminated:

fig 4.7 Erosion: a 3×3 square structuring element

Larger structuring elements have a more pronounced effect, the result of erosion with a large structuring element being similar to the result obtained by iterated erosion using a smaller structuring element of the same shape. If s1 and s2 are a pair of structuring elements identical in shape, with s2 twice the size of s1, then f ⊖ s2 ≈ (f ⊖ s1) ⊖ s1.

Erosion removes small-scale details from a binary image, but simultaneously reduces the size of the regions of interest too. By subtracting the eroded image from the original image, the boundaries of each region can be found: b = f − (f ⊖ s), where f is an image of the regions, s is a 3×3 structuring element, and b is an image of the region boundaries. A sketch using standard morphology routines follows.
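A small sketch of these operations using SciPy's binary morphology routines (the 5×5 square region and 3×3 structuring element are illustrative):

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

f = np.zeros((9, 9), dtype=bool)
f[2:7, 2:7] = True                        # a 5x5 square region of ones
s = np.ones((3, 3), dtype=bool)           # 3x3 square structuring element

eroded = binary_erosion(f, structure=s)   # strips a one-pixel layer
dilated = binary_dilation(f, structure=s) # adds a one-pixel layer
boundary = f & ~eroded                    # b = f - (f eroded by s)
```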

The dilation of an image f by a structuring element s (denoted f ⊕ s) produces a new binary image g = f ⊕ s with ones in all locations (x,y) of the structuring element's origin at which the structuring element s hits the input image f, i.e. g(x,y) = 1 if s hits f, and 0 otherwise, repeating for all pixel coordinates (x,y). Dilation has the opposite effect to erosion: it adds a layer of pixels to both the inner and outer boundaries of regions. The holes enclosed by a single region and the gaps between different regions become smaller, and small intrusions into the boundaries of a region are filled in:

fig 4.7a Dilation: a 3×3 square structuring element

The results of dilation and erosion are influenced both by the size and by the shape of the structuring element. Dilation and erosion are dual operations in that they have opposite effects. Let f^c denote the complement of an image f, i.e. the image produced by replacing 1 with 0 and vice versa. Formally, the duality is written as

(f ⊖ s)^c = f^c ⊕ s_rot

where s_rot is the structuring element s rotated by 180°. If a structuring element is symmetrical with respect to rotation, then s_rot does not differ from s. If a binary image is considered to be a collection of connected regions of pixels set to 1 on a background of pixels set to 0, then erosion is the fitting of a structuring element to these regions, and dilation is the fitting of a structuring element (rotated if necessary) into the background, followed by inversion of the result.

CHAPTER 5

SIMULATED RESULTS

5.1 MRI Brain Image with Benign Case

(1) Input image with its curvelet decomposition

IMAGE SEGMENTATION

Fig. 5.1.1 MRI Brain Image with Benign Case

5.2 MRI Brain with Malignant Case

Fig. 5.2.1 MRI Brain with Malignant Case

IMAGE SEGMENTATION

Fig. 5.2.2 MRI Brain with Malignant Case

5.3 MRI Brain with Normal Case

Fig. 5.3.1 MRI Brain with Normal Case

IMAGE SEGMENTATION

Fig. 5.3.2 MRI Brain with Normal Case

CONCLUSION

This project presented automated brain image classification for early-stage abnormality detection using a neural network classifier, with spotting of the tumor done by image segmentation. Pattern recognition was performed using a probabilistic neural network with a radial basis function, and patterns were characterized with the help of the fast discrete curvelet transform and Haralick feature analysis. The spatial fuzzy clustering algorithm was utilized effectively for accurate tumor detection and to measure the area of the abnormal region. Experimentally, the system proved that it provides better classification accuracy over various stages of test samples, and it consumed less time in processing.


REFERENCES

[1] N. Kwak and C. H. Choi, "Input Feature Selection for Classification Problems", IEEE Transactions on Neural Networks, 13(1), 143-159, 2002.

[2] E. D. Ubeyli and I. Guler, "Feature Extraction from Doppler Ultrasound Signals for Automated Diagnostic Systems", Computers in Biology and Medicine, 35(9), 735-764, 2005.

[3] D. F. Specht, "Probabilistic Neural Networks for Classification, Mapping, or Associative Memory", Proceedings of the IEEE International Conference on Neural Networks, Vol. 1, IEEE Press, New York, pp. 525-532, June 1988.

[4] D. F. Specht, "Probabilistic Neural Networks", Neural Networks, vol. 3, no. 1, pp. 109-118, 1990.

[5] Georgiadis, P., et al., "Improving brain tumor characterization on MRI by probabilistic neural networks and non-linear transformation of textural features", Computer Methods and Programs in Biomedicine, vol. 89, pp. 24-32, 2008.

[6] Klaus, M., "Automated segmentation of MRI brain tumors", Journal of Radiology, vol. 218, pp. 585-591, 2001.

[7] Kernel, P., Bela, M., Rainer, S., Zalan, D., Zsolt, T. and Janos, F., "Application of neural network in medicine", Diag. Med. Tech., vol. 4, issue 3, pp. 538-54, 1998.

[8] Mammadagha Mammadov and Engin Tas, "An improved version of backpropagation algorithm with effective dynamic learning rate and momentum", International Conference on Applied Mathematics, pp. 356-361, 2006.

[9] Melssen, W., Wehrens, R. and Buydens, L., "Supervised Kohonen networks for classification problems", Chemometrics and Intelligent Laboratory Systems, vol. 83, pp. 99-113, 2006.