Feature Selection for Traditional Malay Musical Instruments Sounds Classification using Rough Set



    JOURNAL OF COMPUTING, VOLUME 3, ISSUE 2, FEBRUARY 2011, ISSN 2151-9617

    HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/

WWW.JOURNALOFCOMPUTING.ORG

Norhalina Senan, Rosziati Ibrahim, Nazri Mohd Nawi, Iwan Tri Riyadi Yanto and Tutut Herawan

Abstract: Finding the most relevant features is crucial in data mining tasks, including the musical instruments sounds classification problem. Various feature selection techniques have been proposed in this domain, focusing on Western musical instruments. However, the study of rough set theory for feature selection of non-Western musical instruments sounds is insufficient and still needs further exploration. Thus, in this paper, an alternative feature selection technique using maximum attributes dependency based on rough set theory for Traditional Malay musical instruments sounds is proposed. The modelling process comprises eight phases: data acquisition, sound editing, data representation, feature extraction, data discretization, data cleansing, feature selection using the proposed technique, and finally feature evaluation via classifiers. The results show that the selected features generated by the proposed technique are able to reduce the complexity of the process and improve the classification performance significantly.

Index Terms: Rough set theory; Dependency of attributes; Feature selection; Classification; Traditional Malay musical instruments sounds.

1 INTRODUCTION

With the growing volume of digital audio data and feature schemes, feature selection has become a vital aspect of musical instruments sounds classification problems. In general, the purpose of feature selection is to alleviate the effect of the curse of dimensionality, while from the classification point of view, the main idea of feature selection is to construct an efficient and robust classifier. It has been proven in practice that even an optimal classifier has difficulty classifying accurately if poor features are presented as the input. This is because some of the input features have poor capability to discriminate among different classes and some are highly correlated [1]. As a consequence, the overall classification performance might decrease with this large number of features available. Therefore, finding only the relevant subset of features may significantly reduce the complexity of the process and improve the classification performance by eliminating irrelevant and redundant features.

This shows that the problem of feature selection must be addressed appropriately. For that, various feature selection algorithms for musical instruments sounds classification have been proposed by several researchers [1,2,3]. Liu and Wan [1] carried out a study on classifying musical instruments into five families (brass, keyboard, percussion, string and woodwind) using NN, k-NN and the Gaussian mixture model (GMM). Three categories of feature schemes, namely temporal features, spectral features and coefficient features (with a total of 58 features), were exploited. Sequential forward selection (SFS) was used to choose the best features. The k-NN classifier using 19 features achieved the highest accuracy of 93%. In [2], the authors conducted a study on selecting the best feature schemes based on their classification performance. The 44 features from three categories of feature schemes, namely human perception, cepstral features and MPEG-7, were used. To select the best features, three entropy-based feature selection techniques, namely Information Gain, Gain Ratio and Symmetrical Uncertainty, were utilized. The performance of the selected features was assessed and compared using five classifiers: k-nearest neighbor (k-NN), naive Bayes, support vector machine (SVM), multilayer perceptron (MLP) and radial basis function (RBF). They found that Information Gain produces the best classification accuracy of up to 95.5% for the 20 best features with the SVM and RBF classifiers. Benetos et al. [3] applied a subset selection algorithm with a branch-and-bound search strategy for feature reduction. A combination of 41 features from general audio data, MFCC and MPEG-7 was used. By using the best 6 features, the non-negative matrix factorization (NMF) classifier yielded an accuracy rate of 95.2% at best. They found that the

N. Senan is with the Universiti Tun Hussein Onn Malaysia, 86400 Batu Pahat, Johor, Malaysia.
R. Ibrahim is with the Universiti Tun Hussein Onn Malaysia, 86400 Batu Pahat, Johor, Malaysia.
N.M. Nawi is with the Universiti Tun Hussein Onn Malaysia, 86400 Batu Pahat, Johor, Malaysia.
I.T.R. Yanto is with the Universitas Ahmad Dahlan, Yogyakarta 55166, Indonesia.
T. Herawan is with the Universiti Malaysia Pahang, 26300 Gambang, Kuantan, Malaysia.



feature subset selection method adopted in their study was able to increase the classification accuracy. Overall, all these works demonstrate that the reduced features are able to produce a high classification rate with less computational time. On the other hand, Deng et al. [2] claimed that benchmarking is still an open issue in this area of research. This suggests that the existing feature selection approaches applied to various sound files may not work effectively under other conditions. Therefore, there is a significant need to explore other feature selection methods with different types of musical instruments sounds in order to find the best solution.

One of the potential techniques for dealing with this problem is based on rough set theory. The theory of rough sets, proposed by Pawlak in the 1980s [4], is a mathematical tool for dealing with vague and uncertain data. Rough set theory is one of the useful tools for feature selection [5,6,7]. Banerjee et al. [5] claimed that the concept of the core in rough set theory is relevant in feature selection for identifying the essential features amongst the non-redundant ones. The attractive characteristics of rough sets in tackling the problems of imprecision, uncertainty, incompleteness, irrelevance and redundancy in large datasets have attracted researchers in wide areas of the data mining domain to utilize rough sets for feature selection. However, to date, studies on rough sets for feature selection in musical instruments sounds classification are scarce and still need intensive research. It is well known that one of the most crucial aspects of musical instruments sounds classification is finding the best feature schemes. With the special capability of rough sets for feature selection, we apply this technique to musical instruments sounds classification to overcome this issue.

In this paper, an alternative feature selection technique based on rough set theory for Traditional Malay musical instruments sounds classification is proposed. This technique is developed based on rough set approximation using the maximum degree of dependency of attributes proposed by [8]. The idea of this technique is to choose the most significant features by ranking the relevant features based on the highest dependency of attributes on the dataset and then to remove the redundant features with a similar dependency value. To accomplish this study, the quality of the instruments sounds is first examined. Then, 37 features from a combination of two feature schemes, namely perception-based features and Mel-Frequency Cepstral Coefficients (MFCC), are extracted [9]. In order to employ rough set theory, this original dataset (continuous values) is then discretized into categorical values by using the equal width and equal frequency binning algorithm [10]. Afterwards, a data cleansing process is carried out to remove the irrelevant features. The proposed technique is then adopted to rank and select the best feature set from the large number of features available in the dataset. Finally, the performance of the selected features in musical instruments sounds classification is evaluated with two classifiers, namely rough set and Multi-Layer Perceptron (MLP).

The rest of this paper is organized as follows: Section 2 presents the theory of rough sets. Section 3 describes the details of the modelling process. A discussion of the results is presented in Section 4, followed by the conclusion in Section 5.

2 ROUGH SET THEORY

In the 1980s, Pawlak [4] introduced rough set theory to deal with the problem of imprecise knowledge. Like fuzzy set theory, it is not an alternative to classical set theory but is embedded in it. Fuzzy sets and rough sets are not competitive but complementary to each other [11,12]. Rough set theory has attracted the attention of many researchers and practitioners all over the world, who have contributed essentially to its development and applications. The original goal of rough set theory is the induction of approximations of concepts. The idea consists of the approximation of a subset by a pair of two precise concepts called the lower approximation and the upper approximation. Intuitively, the lower approximation of a set consists of all elements that surely belong to the set, whereas the upper approximation of the set consists of all elements that possibly belong to the set. The difference between the upper and the lower approximation is a boundary region, which consists of all elements that cannot be classified uniquely into the set or its complement by employing the available knowledge. Thus, any rough set, in contrast to a crisp set, has a non-empty boundary region. The motivation for rough set theory comes from the need to represent a subset of a universe in terms of equivalence classes of a partition of the universe. In this section, the basic concepts of rough set theory in terms of data are presented.

2.1 Information System

Data are often presented as a table whose columns are labeled by attributes, rows by objects of interest, and whose entries are attribute values. An information system is a 4-tuple (quadruple) S = (U, A, V, f), where U is a non-empty finite set of objects, A is a


non-empty finite set of attributes, V = ∪_{a ∈ A} V_a, where V_a is the domain (value set) of attribute a, and f : U × A → V is a total function such that f(u, a) ∈ V_a for every (u, a) ∈ U × A, called the information (knowledge) function. An information system is also called a knowledge representation system or an attribute-valued system and can be intuitively expressed in terms of an information table (refer to Table 1).

In many applications, there is an outcome of classification that is known. This a posteriori knowledge is expressed by one (or more) distinguished attribute called the decision attribute; the process is known as supervised learning. An information system of this kind is called a decision system. A decision system is an information system of the form D = (U, A ∪ {d}, V, f), where d ∉ A is the decision attribute. The elements of A are called condition attributes. A simple example of a decision system is given in Table 2.

TABLE 1
AN INFORMATION SYSTEM

U      a_1            a_2            ...  a_|A|
u_1    f(u_1, a_1)    f(u_1, a_2)    ...  f(u_1, a_|A|)
u_2    f(u_2, a_1)    f(u_2, a_2)    ...  f(u_2, a_|A|)
...    ...            ...            ...  ...
u_|U|  f(u_|U|, a_1)  f(u_|U|, a_2)  ...  f(u_|U|, a_|A|)

Example 2.1. Suppose we are given data about 6 students, as shown in Table 2.

TABLE 2
A DECISION SYSTEM

Student  Analysis  Algebra  Statistics  Decision
1        bad       good     medium      accept
2        good      bad      medium      accept
3        good      good     good        accept
4        bad       good     bad         reject
5        good      bad      medium      reject
6        bad       good     good        accept

From Table 2, we have

U = {1, 2, 3, 4, 5, 6},
C = {Analysis, Algebra, Statistics}, D = {Decision}, A = C ∪ D,
V_Analysis = {bad, good},
V_Algebra = {bad, good},
V_Statistics = {bad, medium, good},
V_Decision = {accept, reject}.

A relational database may be considered as an information system in which rows are labeled by objects (entities), columns are labeled by attributes, and the entry in row u and column a has the value f(u, a). Note that each object u_i ∈ U is associated with a tuple t(u_i) = (f(u_i, a_1), f(u_i, a_2), ..., f(u_i, a_|A|)), for 1 ≤ i ≤ |U|, where |X| denotes the cardinality of X. Note also that the tuple t is not necessarily associated with an entity uniquely (refer to students 2 and 5 in Table 2). In an information table, two distinct entities could have the same tuple representation (a duplicated/redundant tuple), which is not permissible in relational databases. Thus, the concepts in information systems are a generalization of the same concepts in relational databases.

2.2 Indiscernibility Relation

From Table 2, note that students 2, 3 and 5 are indiscernible (similar or indistinguishable) with respect to the attribute Analysis. Meanwhile, students 3 and 6 are indiscernible with respect to the attributes Algebra and Decision, and students 2 and 5 are indiscernible with respect to the attributes Analysis, Algebra and Statistics. The starting point of rough set theory is the indiscernibility relation, which is generated by information about objects of interest. The indiscernibility relation is intended to express the fact that, due to a lack of knowledge, it is difficult to discern some objects employing the available information. This means that, in general, we are unable to deal with single objects; instead, clusters of indiscernible objects must be considered. The notion of the indiscernibility relation between two objects can now be defined precisely.

Definition 2.1. Let S = (U, A, V, f) be an information system and let B be any subset of A. Two elements x, y ∈ U are said to be B-indiscernible (indiscernible by the set of attributes B ⊆ A in S) if and only if f(x, a) = f(y, a) for every a ∈ B.

Obviously, every subset of A induces a unique indiscernibility relation. Notice that the indiscernibility relation induced by the set of attributes B, denoted by IND(B), is an equivalence relation. It is well known that an equivalence relation induces a unique partition.
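Definition 2.1 can be made concrete with a few lines of code. The following is a minimal sketch (our own illustration, not the authors' implementation) that computes the partition U/B induced by IND(B) for the decision system of Table 2; the helper name `partition` and the dictionary encoding are assumptions we introduce here.

```python
# Decision system of Table 2, encoded as object -> attribute -> value.
TABLE2 = {
    1: {"Analysis": "bad",  "Algebra": "good", "Statistics": "medium", "Decision": "accept"},
    2: {"Analysis": "good", "Algebra": "bad",  "Statistics": "medium", "Decision": "accept"},
    3: {"Analysis": "good", "Algebra": "good", "Statistics": "good",   "Decision": "accept"},
    4: {"Analysis": "bad",  "Algebra": "good", "Statistics": "bad",    "Decision": "reject"},
    5: {"Analysis": "good", "Algebra": "bad",  "Statistics": "medium", "Decision": "reject"},
    6: {"Analysis": "bad",  "Algebra": "good", "Statistics": "good",   "Decision": "accept"},
}

def partition(table, B):
    """Return U/B, the set of equivalence classes of IND(B)."""
    classes = {}
    for u, row in table.items():
        # x IND(B) y  iff  f(x, a) = f(y, a) for every a in B
        key = tuple(row[a] for a in B)
        classes.setdefault(key, set()).add(u)
    return {frozenset(c) for c in classes.values()}

# Students 2, 3 and 5 fall into one class with respect to Analysis alone,
# while students 2 and 5 stay indiscernible under all condition attributes.
print(sorted(sorted(c) for c in partition(TABLE2, ["Analysis"])))
print(sorted(sorted(c) for c in partition(TABLE2, ["Analysis", "Algebra", "Statistics"])))
```

Because IND(B) is an equivalence relation, a dictionary keyed by the tuple of B-values is enough to build the whole partition in a single pass over the table.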


The partition of U induced by IND(B) in S = (U, A, V, f) is denoted by U/B, and the equivalence class in the partition U/B containing x ∈ U is denoted by [x]_B.

Given an arbitrary subset X ⊆ U, in general, X might not be expressible as a union of some equivalence classes in U. This means it may not be possible to describe X precisely in S. Instead, X might be characterized by a pair of its approximations, called the lower and upper approximations. It is here that the notion of rough set emerges.

2.3 Set Approximations

The indiscernibility relation will next be used to define the approximations that are the basic concepts of rough set theory. The notions of the lower and upper approximations of a set are defined as follows.

Definition 2.2. Let S = (U, A, V, f) be an information system, let B be any subset of A and let X be any subset of U. The B-lower approximation of X, denoted by B_*(X), and the B-upper approximation of X, denoted by B^*(X), are defined respectively by

B_*(X) = {x ∈ U | [x]_B ⊆ X} and
B^*(X) = {x ∈ U | [x]_B ∩ X ≠ ∅}.

The accuracy of approximation (accuracy of roughness) of any subset X ⊆ U with respect to B ⊆ A, denoted α_B(X), is measured by

α_B(X) = |B_*(X)| / |B^*(X)|,

where |X| denotes the cardinality of X. For the empty set ∅, α_B(∅) = 1 is defined. Obviously, 0 ≤ α_B(X) ≤ 1. If X is a union of some equivalence classes of U, then α_B(X) = 1; thus, the set X is crisp (precise) with respect to B. If X is not a union of some equivalence classes of U, then α_B(X) < 1; thus, the set X is rough (imprecise) with respect to B.
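Definition 2.2 can be exercised on Table 2 with a small sketch (our own illustration, not the authors' code): take X as the set of accepted students and B as the three condition attributes, and compute B_*(X), B^*(X) and α_B(X).

```python
# Table 2 again: object -> attribute -> value.
TABLE2 = {
    1: {"Analysis": "bad",  "Algebra": "good", "Statistics": "medium", "Decision": "accept"},
    2: {"Analysis": "good", "Algebra": "bad",  "Statistics": "medium", "Decision": "accept"},
    3: {"Analysis": "good", "Algebra": "good", "Statistics": "good",   "Decision": "accept"},
    4: {"Analysis": "bad",  "Algebra": "good", "Statistics": "bad",    "Decision": "reject"},
    5: {"Analysis": "good", "Algebra": "bad",  "Statistics": "medium", "Decision": "reject"},
    6: {"Analysis": "bad",  "Algebra": "good", "Statistics": "good",   "Decision": "accept"},
}

def eq_class(table, B, x):
    """[x]_B: every object agreeing with x on all attributes in B."""
    return {u for u, row in table.items()
            if all(row[a] == table[x][a] for a in B)}

def approximations(table, B, X):
    lower = {x for x in table if eq_class(table, B, x) <= X}   # [x]_B subset of X
    upper = {x for x in table if eq_class(table, B, x) & X}    # [x]_B meets X
    return lower, upper

B = ["Analysis", "Algebra", "Statistics"]
X = {1, 2, 3, 6}                            # students with Decision = accept
lower, upper = approximations(TABLE2, B, X)
accuracy = len(lower) / len(upper)          # alpha_B(X)
print(sorted(lower), sorted(upper), accuracy)  # [1, 3, 6] [1, 2, 3, 5, 6] 0.6
```

Since 0 < α_B(X) < 1 here, the accept set is rough with respect to the condition attributes: students 2 and 5 share all condition values but differ in Decision, so they land in the boundary region.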


2.4 Dependency of Attributes

Another basic concept of rough set theory is the dependency between attributes. Let S = (U, A, V, f) be an information system and let D and C be any subsets of A. D depends on C in a degree k (0 ≤ k ≤ 1), denoted C ⇒_k D, where

k = γ(C, D) = |POS_C(D)| / |U|,    (1)

and POS_C(D) = ∪_{X ∈ U/D} C_*(X) is the positive region of the partition U/D with respect to C, i.e., the set of all elements of U that can be uniquely classified to the classes of U/D by means of C. D depends totally on C if k = 1; otherwise, D depends partially on C. Thus, D fully (partially) depends on C if all (some) elements of the universe U can be uniquely classified to equivalence classes of the partition U/D, employing C.

Example 2.3. From Table 2, there are no total dependencies whatsoever. If, in Table 2, the value of the attribute Statistics for student 5 were bad instead of medium, there would be a total dependency {Statistics} ⇒ {Decision}, because to each value of the attribute Statistics there would correspond a unique value of the attribute Decision. For the dependency {Analysis, Algebra, Statistics} ⇒ {Decision}, we obtain k = 4/6 = 2/3, because four out of six students can be uniquely classified into the Decision classes, employing the attributes Analysis, Algebra and Statistics.

Note that a table may be redundant in two ways. The first form of redundancy is easy to notice: some objects may have the same features. This is the case for the condition parts of tuples 2 and 5 of Table 2. A way of reducing data size is to store only one representative object for every set of so-called indiscernible tuples, as in Definition 2.1. The second form of redundancy is more difficult to locate, especially in large data tables: some columns of a table may be erased without affecting the classification power of the system. This concept can also be extended to information systems where the condition and decision attributes are not distinguished. Using the entire attribute set for describing the property is time-consuming, and the constructed rules may be difficult to understand, to apply or to verify [13]. In order to deal with this problem, attribute reduction is required. The objective of reduction is to reduce the number of attributes and, at the same time, preserve the property of information.

2.5 Reducts and Core

A reduct is a minimal set of attributes that preserves the indiscernibility relation. A core is the common part of all reducts. In order to express this idea more precisely, some preliminary definitions are needed.

Definition 2.5. Let S = (U, A, V, f) be an information system, let B be any subset of A and let a belong to B. We say that a is dispensable (superfluous) in B if U/B = U/(B − {a}); otherwise, a is indispensable in B.

To further simplify an information system, some dispensable attributes can be eliminated from the system in such a way that the objects in the table are still discernible as in the original.

Definition 2.6. Let S = (U, A, V, f) be an information system and let B be any subset of A. B is called an independent (orthogonal) set if all its attributes are indispensable.

Definition 2.7. Let S = (U, A, V, f) be an information system and let B be any subset of A. A subset B* of B is a reduct of B if B* is independent and U/B* = U/B.

Thus, a reduct is a set of attributes that preserves the partition. It means that a reduct is a minimal subset of attributes that enables the same classification of elements of the universe as the whole set of attributes. In other words, attributes that do not belong to a reduct are superfluous with regard to the classification of elements of the universe. While computing equivalence classes is straightforward, the problem of finding minimal reducts in information systems is NP-hard. Reducts have several important properties. One of them is the core.

Definition 2.8. Let S = (U, A, V, f) be an information system and let B be any subset of A. The intersection of all reducts of B is called the core of B, i.e.,

Core(B) = ∩ Red(B),

where Red(B) denotes the set of all reducts of B. Thus, the core of B is the set of all indispensable attributes of B. Because the core is the intersection of all reducts, it is included in every reduct, i.e., each element of the core belongs to every reduct. Thus, in a sense, the core is the most important subset of attributes, for none of its elements can be removed without affecting the classification power of the attributes.

Example 2.4. To illustrate the finding of reducts and the core, the information system shown in Table 3 is considered. The information system is modified from Example 3 in [14].

TABLE 3
A MODIFIED INFORMATION SYSTEM [14]

#  A     B     C     D

    1 low bad loss small

    2 low good loss large

    3 high good loss medium

    4 high good loss medium

    5 low good loss large

Let X = {A, B, C, D}, X1 = {A, B, C} and X2 = {C, D}. These sets of attributes produce the partitions

U/X = {{1}, {2, 5}, {3, 4}}, U/X1 = {{1}, {2, 5}, {3, 4}} and U/X2 = {{1}, {2, 5}, {3, 4}},

respectively. Therefore, by Definition 2.5, the sets {D} and {A, B} are dispensable (superfluous). From Definition 2.6, the sets X1 and X2 are independent (orthogonal). Hence, from Definition 2.7, X1 and X2 are confirmed to be reducts of X. Furthermore, from Definition 2.8, the intersection X1 ∩ X2 = {C} is the core of X.

3 THE MODELING PROCESS

In this section, the process of this study is presented. There are seven main phases: data acquisition, sound editing, data representation, feature extraction, data discretization, data cleansing and feature selection using the proposed technique. Figure 1 illustrates the phases of this process. To conduct this study, the proposed model is implemented in MATLAB version 7.6.0.324 (R2008a). It is executed on an Intel Core 2 Duo processor with 2 gigabytes of main memory under the Windows Vista operating system. The details of the modelling process are as follows.

3.1 Data Acquisition, Sound Editing, Data Representation and Feature Extraction

The 150 sound samples of Traditional Malay musical instruments were downloaded from a personal web page [15] and the Warisan Budaya Malaysia web page [16]. The dataset comprises four different families: membranophones, idiophones, aerophones and chordophones. The distribution of the sounds into families is shown in Table 4. This original dataset is non-benchmark (real-world) data. The number of original sounds per family is imbalanced, and the recordings also differ in length. It is well known that the quality of the data is one of the factors that might affect the overall classification task. For this reason, the dataset is first edited and trimmed. Afterwards, two categories of feature schemes, namely perception-based and MFCC features, were extracted. All 37 extracted features from these two categories are shown in Table 5: features 1-11 represent the perception-based features and features 12-37 are the MFCC features. The mean and standard deviation were then calculated for each of these features. In order to avoid biased classification, the dataset is then reduced to a uniform size. The details of these phases can be found in [17].

Fig 1. The modelling process for feature selection of the Traditional Malay musical instruments sounds classification.

TABLE 4
DATA SETS

Family          Instruments
Membranophone   Kompang, Geduk, Gedombak, Gendang, Rebana, Beduk, Jidur, Marwas, Nakara
Idiophone       Gong, Canang, Kesi, Saron, Angklung, Caklempong, Kecerik, Kempul, Kenong, Mong, Mouth Harp
Aerophone       Serunai, Bamboo Flute, Nafiri, Seruling Buluh
Chordophone     Rebab, Biola, Gambus


TABLE 5
FEATURES DESCRIPTIONS

No  Feature    Description
1   ZC         Zero Crossing
2   MEANZCR    Mean of Zero Crossings Rate
3   STDZCR     Standard Deviation of Zero Crossings Rate
4   MEANRMS    Mean of Root-Mean-Square
5   STDRMS     Standard Deviation of Root-Mean-Square
6   MEANC      Mean of Spectral Centroid
7   STDC       Standard Deviation of Spectral Centroid
8   MEANB      Mean of Bandwidth
9   STDB       Standard Deviation of Bandwidth
10  MEANFLUX   Mean of Flux
11  STDFLUX    Standard Deviation of Flux
12  MMFCC1     Mean of the MFCCs #1
13  MMFCC2     Mean of the MFCCs #2
14  MMFCC3     Mean of the MFCCs #3
15  MMFCC4     Mean of the MFCCs #4
16  MMFCC5     Mean of the MFCCs #5
17  MMFCC6     Mean of the MFCCs #6
18  MMFCC7     Mean of the MFCCs #7
19  MMFCC8     Mean of the MFCCs #8
20  MMFCC9     Mean of the MFCCs #9
21  MMFCC10    Mean of the MFCCs #10
22  MMFCC11    Mean of the MFCCs #11
23  MMFCC12    Mean of the MFCCs #12
24  MMFCC13    Mean of the MFCCs #13
25  SMFCC1     Standard Deviation of the MFCCs #1
26  SMFCC2     Standard Deviation of the MFCCs #2
27  SMFCC3     Standard Deviation of the MFCCs #3
28  SMFCC4     Standard Deviation of the MFCCs #4
29  SMFCC5     Standard Deviation of the MFCCs #5
30  SMFCC6     Standard Deviation of the MFCCs #6
31  SMFCC7     Standard Deviation of the MFCCs #7
32  SMFCC8     Standard Deviation of the MFCCs #8
33  SMFCC9     Standard Deviation of the MFCCs #9
34  SMFCC10    Standard Deviation of the MFCCs #10
35  SMFCC11    Standard Deviation of the MFCCs #11
36  SMFCC12    Standard Deviation of the MFCCs #12
37  SMFCC13    Standard Deviation of the MFCCs #13

3.2 Data Discretization

The features (attributes) extracted in the dataset are in the form of continuous, non-categorical values. In order to employ the rough set approach in the proposed technique, it is essential to transform the dataset into a categorical one. For that, the discretization technique known as equal width binning [10] is applied. In this study, this unsupervised method is modified to suit the classification problem. The algorithm first sorts the continuous-valued attribute, then determines the minimum x_min and the maximum x_max of that attribute. The interval width, w, is then calculated by

w = (x_max − x_min) / k*,

where k* is a user-specified parameter for the number of intervals into which each target class is discretized. The interval boundaries are specified as x_min + iw, where i = 1, 2, ..., k − 1. Afterwards, the equal frequency binning method is used to divide the sorted continuous values into k intervals, where each interval contains approximately n/k data instances with adjacent values of each class. In this study, different values of k (from 2 to 10) are examined. The purpose is to identify the best k value, i.e., the one able to produce the highest classification rate. For that, the rough set classifier is used.
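The equal width binning step described above can be sketched as follows (an illustration of the description, not the authors' MATLAB code; the function names and sample values are ours): compute w = (x_max − x_min)/k and cut the range at x_min + i·w.

```python
from bisect import bisect_right

def equal_width_cuts(values, k):
    """Interior boundaries x_min + i*w, i = 1..k-1, with w = (x_max - x_min)/k."""
    x_min, x_max = min(values), max(values)
    w = (x_max - x_min) / k              # interval width
    return [x_min + i * w for i in range(1, k)]

def discretize(values, k):
    """Map each continuous value to a categorical bin label 0..k-1."""
    cuts = equal_width_cuts(values, k)
    return [bisect_right(cuts, v) for v in values]

feature = [0.10, 0.40, 0.35, 0.80, 0.95, 0.20]   # one continuous feature column
print(equal_width_cuts(feature, 3))
print(discretize(feature, 3))  # [0, 1, 0, 2, 2, 0]
```

The paper then follows this with equal frequency binning, splitting the sorted values per class into k groups of roughly n/k instances each; that refinement is omitted from this sketch.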

3.3 Data Cleansing using Rough Set

As mentioned in Section 1, the dataset used in this study is raw data obtained from multiple sources (non-benchmark data). In the sound editing and data representation phases, the reliability of the dataset has been assessed. However, the dataset may still contain irrelevant features. Generally, the irrelevant features present in a dataset are features that have no impact on processing performance; however, their existence in the dataset might increase the response time. For that reason, in this phase, a data cleansing process based on the rough set approach explained in sub-section 2.5 is performed to eliminate the irrelevant features from the dataset.

3.4 The Proposed Technique

In this phase, the construction of the feature selection technique using rough set approximation in an information system based on the dependency of attributes is presented. The idea of this technique is derived from [8]. The relation between the roughness of a subset X ⊆ U and the dependency between two attributes is first presented in Proposition 3.1.

Proposition 3.1. Let S = (U, A, V, f) be an information system and let D and C be any subsets of A. If D depends totally on C, then


α_D(X) ≤ α_C(X),

for every X ⊆ U.

Proof. Let D and C be any subsets of A in the information system S = (U, A, V, f). From the hypothesis, the inclusion IND(C) ⊆ IND(D) holds. Furthermore, the partition U/C is finer than the partition U/D; thus, it is clear that any equivalence class induced by IND(D) is a union of some equivalence classes induced by IND(C). Therefore, for every x ∈ X ⊆ U, the equivalence classes satisfy

[x]_C ⊆ [x]_D.

Hence, for every X ⊆ U, we have the relation

D_*(X) ⊆ C_*(X) ⊆ X ⊆ C^*(X) ⊆ D^*(X).

Consequently,

α_D(X) = |D_*(X)| / |D^*(X)| ≤ |C_*(X)| / |C^*(X)| = α_C(X). ∎

The generalization of Proposition 3.1 is given below.

Proposition 3.2. Let S = (U, A, V, f) be an information system and let C1, C2, ..., Cn and D be any subsets of A. If C1 ⇒_k1 D, C2 ⇒_k2 D, ..., Cn ⇒_kn D, where kn ≤ kn−1 ≤ ... ≤ k2 ≤ k1, then

α_D(X) ≤ α_Cn(X) ≤ α_Cn−1(X) ≤ ... ≤ α_C2(X) ≤ α_C1(X),

for every X ⊆ U.

Proof. Let C1, C2, ..., Cn and D be any subsets of A in the information system S. From the hypothesis and Proposition 3.1, the accuracies of roughness are given as

α_D(X) ≤ α_C1(X),
α_D(X) ≤ α_C2(X),
...,
α_D(X) ≤ α_Cn(X).

Since kn ≤ kn−1 ≤ ... ≤ k2 ≤ k1, then

[x]_Cn ⊆ [x]_Cn−1, [x]_Cn−1 ⊆ [x]_Cn−2, ..., [x]_C2 ⊆ [x]_C1.

Obviously,

α_D(X) ≤ α_Cn(X) ≤ α_Cn−1(X) ≤ ... ≤ α_C2(X) ≤ α_C1(X). ∎

Figure 2 shows the algorithm of the proposed technique. The technique uses the dependency of attributes in rough set theory in information systems. It consists of five main steps. The first step deals with the computation of the equivalence classes of each attribute (feature). The equivalence classes of the set of objects U can be obtained using the indiscernibility relation of attribute a_i ∈ A in the information system S = (U, A, V, f). The second step deals with the determination of the dependency degree of attributes, which can be determined using the formula in equation (1). The third step deals with selecting the maximum dependency degree. In the next step, the attributes are ranked from the highest to the lowest maximum dependency degree. Finally, all the redundant attributes are identified, and the attribute with the highest value of the maximum degree of dependency within these redundant attributes is selected.

Algorithm: FSDA
Input: Data set with categorical values
Output: Selected non-redundant attributes
Begin
Step 1. Compute the equivalence classes using the indiscernibility relation on each attribute.
Step 2. Determine the dependency degree of attribute a_i with respect to each a_j, where i ≠ j.
Step 3. Select the maximum dependency degree of each attribute.
Step 4. Rank the attributes from the highest to the lowest maximum dependency degree.
Step 5. Select the attribute with the highest value of the maximum degree of dependency within the redundant attributes.
End

Fig 2. The FSDA algorithm
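As a concrete illustration, the five steps of FSDA can be sketched in Python. This is a minimal sketch, not the authors' implementation: the information system is a plain dictionary mapping objects to attribute-value rows, ties are broken here by keeping the first attribute per tied degree rather than by the full next-highest-degree rule of [8], and all names and the toy table are illustrative.

```python
def equivalence_classes(table, attr):
    """Step 1: partition objects by their value on `attr` (indiscernibility relation)."""
    blocks = {}
    for obj, row in table.items():
        blocks.setdefault(row[attr], set()).add(obj)
    return list(blocks.values())

def dependency_degree(table, c, d):
    """Step 2: k(c => d) = |POS_c(d)| / |U|, as in equation (1)."""
    d_blocks = equivalence_classes(table, d)
    pos = set()
    for block in equivalence_classes(table, c):
        if any(block <= db for db in d_blocks):  # block lies wholly inside a d-class
            pos |= block
    return len(pos) / len(table)

def fsda(table, attrs):
    """Steps 3-5: rank by maximum dependency degree, keep one attribute per tie."""
    max_dep = {a: max(dependency_degree(table, b, a) for b in attrs if b != a)
               for a in attrs}
    ranked = sorted(attrs, key=lambda a: max_dep[a], reverse=True)
    selected, seen = [], set()
    for a in ranked:
        if max_dep[a] not in seen:  # drop attributes tied with an already-kept one
            seen.add(max_dep[a])
            selected.append(a)
    return selected, max_dep

# Toy information system: 4 objects, 3 categorical attributes.
table = {1: {'A': 0, 'B': 0, 'D': 0},
         2: {'A': 0, 'B': 1, 'D': 0},
         3: {'A': 1, 'B': 1, 'D': 1},
         4: {'A': 1, 'B': 1, 'D': 1}}
selected, max_dep = fsda(table, ['A', 'B', 'D'])
```

On this toy table, A and D induce the same partition, so both reach the maximum degree 1; the sketch keeps one of them and retains B.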

Finding the degree of dependency of attributes of an information system using the formula in equation (1) is illustrated in Example 3.1.


Example 3.1. To illustrate finding the degree of dependency of attributes, the information system shown in Table 3 is considered. From Table 3, there are four partitions of U induced by the indiscernibility relation on each attribute, i.e.,

U/A = {{1, 2, 5}, {3, 4}}, U/B = {{1}, {2, 3, 4, 5}},

U/C = {{1, 2, 3, 4}, {5}} and U/D = {{1, 2}, {5}, {3, 4}}.

Based on the formula in equation (1), the degree of dependency of attribute B on attribute A, denoted k for A ⇒ B, can be calculated as follows:

A ⇒_k B, k = Σ_{X ∈ U/B} |A̲(X)| / |U| = |{3, 4}| / |{1, 2, 3, 4, 5}| = 0.4.

In the same way, the following degrees are obtained:

B ⇒_k C, k = Σ_{X ∈ U/C} |B̲(X)| / |U| = |{1}| / |{1, 2, 3, 4, 5}| = 0.2,

C ⇒_k D, k = Σ_{X ∈ U/D} |C̲(X)| / |U| = |{5}| / |{1, 2, 3, 4, 5}| = 0.2.

The degrees of dependency of all attributes of Table 3 are summarized in Table 6.

TABLE 6
THE DEGREE OF DEPENDENCY OF ATTRIBUTES OF TABLE 3

Attribute   Degree of dependency (depends on)    Maximum dependency of attribute
A           B: 1    C: 0.2   D: 0.2              1
B           A: 1    C: 0.4   D: 0.2              1
C           A: 0.6  B: 0.4   D: 0.2              0.6
D           A: 0.4  B: 0.4   C: 0.2              0.4

From Table 6, the attributes A, B, C and D are ranked based on the maximum degree of dependency. It can be seen that attributes A and B have the same maximum degree of dependency. In order to select the best attributes and reduce the dimensionality, only one of these redundant attributes is chosen. To do this, the selection approach in [8] is adopted, where it is suggested to look at the next highest maximum degree of dependency within the attributes that are tied, and so on, until the tie is broken. In this example, attribute A is deleted from the list.
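The dependency degrees of Example 3.1 can be checked mechanically. The short sketch below (an illustration, not the authors' code) computes k as the size of the positive region divided by |U|, directly from the example's partitions:

```python
def degree(cond_partition, dec_partition, n):
    """k = |POS| / |U|: union of condition blocks wholly inside some decision class."""
    pos = set()
    for block in cond_partition:
        if any(block <= X for X in dec_partition):
            pos |= block
    return len(pos) / n

# Partitions of U = {1, ..., 5} over attributes A, B, C and D (Example 3.1).
UA = [{1, 2, 5}, {3, 4}]
UB = [{1}, {2, 3, 4, 5}]
UC = [{1, 2, 3, 4}, {5}]
UD = [{1, 2}, {5}, {3, 4}]

k_AB = degree(UA, UB, 5)  # B depends on A to degree 0.4
k_BC = degree(UB, UC, 5)  # C depends on B to degree 0.2
k_CD = degree(UC, UD, 5)  # D depends on C to degree 0.2
```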

3.5 Feature Evaluation via Classification

The performance of the best features generated in Section 3.4 is then further evaluated using two different classifiers, rough set and MLP. The classifiers are used to verify the performance of the selected features. The accuracy rate and response time achieved by each classifier are analysed to identify the effectiveness of the selected features. A high accuracy rate is important to ensure that the selected features are the most relevant ones and serve the classification architecture well, while a low response time allows the classifier to operate more efficiently. At the end of this phase, the results for the full features and the selected features are compared, in order to identify the effectiveness of the selected features in handling the classification problem.
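A harness for these two evaluation criteria might look like the following sketch. It is generic (a toy nearest-centroid stand-in, not the rough set or MLP classifiers used in this study) and only illustrates how accuracy rate and response time would be measured for a given feature set; all names and data are illustrative.

```python
import time

def evaluate(classify, X_test, y_test):
    """Return (accuracy rate in %, response time in seconds) for a classifier."""
    start = time.perf_counter()
    predictions = [classify(x) for x in X_test]
    response_time = time.perf_counter() - start
    accuracy = 100.0 * sum(p == y for p, y in zip(predictions, y_test)) / len(y_test)
    return accuracy, response_time

def make_nearest_centroid(X_train, y_train):
    """Toy stand-in classifier: predict the class with the nearest centroid."""
    groups = {}
    for x, y in zip(X_train, y_train):
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(col) / len(rows) for col in zip(*rows)]
                 for y, rows in groups.items()}
    def classify(x):
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
    return classify

# Tiny synthetic example with two instrument families.
clf = make_nearest_centroid([(0, 0), (0, 1), (5, 5), (5, 6)],
                            ['membranophone', 'membranophone',
                             'idiophone', 'idiophone'])
accuracy, response_time = evaluate(clf, [(0, 0), (5, 5)],
                                   ['membranophone', 'idiophone'])
```

Running the same harness once with all features and once with the selected subset yields the two (accuracy, time) pairs that are compared at the end of the phase.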

4 RESULTS AND DISCUSSION

The main objective of this study is to select the best features using the proposed technique. Afterwards, the performance of the selected features is assessed using two different classifiers, rough set and MLP. As mentioned, the assessment of the performance is based on the accuracy rate and response time achieved. Thus, in this section, the results of this study are presented as follows.

4.1 The Best k Value for Discretization is Determined

The original dataset, in continuous values, is discretized into categorical form in order to employ rough set theory. For that, the modified equal width binning technique is employed. In this study, values of k (the number of intervals) from 2 to 10 are investigated. The best k value is determined based on the highest classification accuracy achieved by the rough set classifier. The finding reveals that k = 3 generates the highest classification accuracy, up to 99%, as shown in Table 7. This k value is then applied in the proposed feature selection technique to identify the best features for the dataset.
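For illustration, plain equal-width binning can be sketched as below. This is the unmodified baseline of the technique in [10], not the authors' modified variant, and the sample values are hypothetical.

```python
def equal_width_bins(values, k):
    """Discretize continuous values into k equal-width intervals labelled 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:                    # constant feature: everything in a single bin
        return [0] * len(values)
    labels = [int((v - lo) / width) for v in values]
    return [min(b, k - 1) for b in labels]   # clamp the maximum into the last bin

bins = equal_width_bins([0.1, 0.2, 0.35, 0.6, 0.9], 3)
```

With k = 3, each continuous feature value maps to one of three categorical labels, matching the discretization applied before feature selection.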

4.2 Irrelevant Features are Eliminated

The dataset is represented in decision table form as S = (U, A ∪ {d}, V, f). There are 1116 instances in the universe U, with the family of the instruments as the decision attribute d and all other attributes shown in Table 5 as the set of condition attributes A. The distribution of instances over the classes is uniform, with no missing values in the data. From the data cleansing step, it is found that {MMFCC1, SMFCC1} is the dispensable (irrelevant) set of features. This means that the number of relevant features is 35 out of the 37 original features. Thus, the relevant features can be represented as A − {MMFCC1, SMFCC1}.

4.3 Finding the Best Features

In this experiment, the proposed technique is employed to identify the best features for Traditional Malay musical instruments sounds classification. As demonstrated in Table 8, all 35 relevant features are ranked from the highest to the lowest maximum degree of attribute dependency. From the table, it is interesting to see that some of the features adopted in this study are redundant. In order to reduce the dimensionality of the dataset, only one feature from each redundant group is selected. It is revealed that the proposed feature selection technique successfully selects the best 17 features out of the 35 available. The best selected features are given in Table 9.

4.4 The Performance of the Selected Features

In this study, two datasets, one consisting of the full features and one of the selected features (generated by the proposed technique), are used as input to classify the Traditional Malay musical instruments sounds into four families: membranophone, idiophone, chordophone and aerophone. This approach is meant to assess the performance of the selected features as compared to the full features. The performance is determined based on two factors: the accuracy rate and the response time. For that, two different classifiers, rough set and MLP, are exploited. From Table 10, there was a slight improvement in terms of both the accuracy rate and the response time with the best 17 features as compared to the 35 full features using the MLP classifier. The overall performance of this classifier is satisfactory, up to 95%. On the other hand, with the rough set classifier, the accuracy rate of 99% achieved by the selected features is the same as with the full features. However, it is notable that the response time is up to 80% faster for the selected features as compared to the full features. These results show that the proposed feature selection technique is able to select the best features for Traditional Malay musical instruments sounds and improve the classification performance, especially the response time.

TABLE 7
FINDING THE BEST K VALUE FOR DISCRETIZATION

k                              2     3     4     5     6     7     8     9     10
Classification accuracy (%)   93.6  98.9  98.6  98.4  98.4  98.3  98.3  98.3  98.3

TABLE 8
FEATURE RANKING USING THE PROPOSED METHOD

Number of Feature   Name of Feature   Maximum Degree of Dependency of Attributes
3                   STDZCR            0.826165
36                  SMFCC12           0.655914
23                  MMFCC12           0.52509
24                  MMFCC13           0.52509
22                  MMFCC11           0.237455
30                  SMFCC6            0.208781
31                  SMFCC7            0.208781
1                   ZC                0.193548
37                  SMFCC13           0.1819
32                  SMFCC8            0.108423
33                  SMFCC9            0.108423
34                  SMFCC10           0.108423
35                  SMFCC11           0.108423
27                  SMFCC3            0.087814
29                  SMFCC5            0.087814
11                  STDFLUX           0.077061
21                  MMFCC10           0.077061
20                  MMFCC9            0.074373
6                   MEANC             0.065412
19                  MMFCC8            0.065412
18                  MMFCC7            0.056452
28                  SMFCC4            0.056452
7                   STDC              0.042115
8                   MEANB             0.042115
9                   STDB              0.042115
13                  MMFCC2            0.031362
16                  MMFCC5            0.031362
17                  MMFCC6            0.031362
5                   STDRMS            0.021505
10                  MEANFLUX          0.011649
2                   MEANZCR           0
4                   MEANRMS           0
14                  MMFCC3            0
15                  MMFCC4            0
26                  SMFCC2            0

TABLE 9
THE BEST SELECTED FEATURES

Number of Feature   Name of Feature   Maximum Degree of Dependency of Attributes
3                   STDZCR            0.826165
36                  SMFCC12           0.655914
23                  MMFCC12           0.52509
22                  MMFCC11           0.237455
30                  SMFCC6            0.208781
1                   ZC                0.193548
37                  SMFCC13           0.1819
32                  SMFCC8            0.108423
27                  SMFCC3            0.087814
11                  STDFLUX           0.077061
20                  MMFCC9            0.074373
6                   MEANC             0.065412
18                  MMFCC7            0.056452
7                   STDC              0.042115
13                  MMFCC2            0.031362
5                   STDRMS            0.021505
10                  MEANFLUX          0.011649


TABLE 10
THE COMPARISON OF FEATURES CAPABILITIES VIA CLASSIFICATION PERFORMANCE

              Rough Set                              MLP
Features      Accuracy (%)   Response time (sec)    Accuracy (%)   Response time (sec)
All 35        99             2.075                  94             125.53
Best 17       99             0.405                  95             120.72

5 CONCLUSION AND FUTURE WORKS

In this study, an alternative feature selection technique using rough set theory, based on the maximum dependency of attributes, for Traditional Malay musical instruments sounds is proposed. A non-benchmark dataset of Traditional Malay musical instruments sounds is utilized. Two categories of feature schemes, perception-based and MFCC, comprising 37 attributes in total, are extracted. Afterwards, the dataset is discretized into 3 categorical values. The proposed technique is then adopted for feature selection through feature ranking and dimensionality reduction. Finally, two classifiers, rough set and MLP, are employed to evaluate the performance of the selected features in terms of the accuracy rate and response time produced. Overall, the findings show that the relevant features selected by the proposed model reduce the complexity of the process and produce high classification accuracy. Future work will investigate the effectiveness of the proposed technique in other musical instrument sound domains and apply different types of classifiers to validate the performance of the selected features.

ACKNOWLEDGEMENT

This work was supported by Universiti Tun Hussein Onn Malaysia (UTHM).

REFERENCES

[1] Liu, M., and Wan, C.: Feature Selection for Automatic Classification of Musical Instrument Sounds. In: Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL '01, 247–248 (2001)

[2] Deng, J.D., Simmermacher, C., and Cranefield, S.: A Study on Feature Analysis for Musical Instrument Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 38 (2), 429–438 (2008)

[3] Benetos, E., Kotti, M., and Kotropoulos, C.: Musical Instrument Classification using Non-Negative Matrix Factorization Algorithms and Subset Feature Selection. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006, 5, 221–224 (2006)

[4] Pawlak, Z.: Rough Sets. International Journal of Computer and Information Science 11, 341–356 (1982)

[5] Banerjee, M., Mitra, S., and Anand, A.: Feature Selection using Rough Sets. In: Multi-Objective Machine Learning, Studies in Computational Intelligence 16, 3–20 (2006)

[6] Modrzejewski, M.: Feature Selection using Rough Sets Theory. In: Proceedings of the 11th International Conference on Machine Learning, LNCS 667, 213–226 (1993)

[7] Li, H., Zhang, W., Xu, P., and Wang, H.: Rough Set Attribute Reduction in Decision Systems. In: Proceedings of the 7th International Conference on Artificial Immune Systems, LNCS 5132, 132–141 (2008)

[8] Herawan, T., Mustafa, M.D., and Abawajy, J.H.: Rough set approach for selecting clustering attribute. Knowledge-Based Systems 23 (3), 220–231 (2010)

[9] Senan, N., Ibrahim, R., Nawi, N.M., and Mokji, M.M.: Feature Extraction for Traditional Malay Musical Instruments Classification. In: Proceedings of the International Conference of Soft Computing and Pattern Recognition, SOCPAR '09, 454–459 (2009)

[10] Palaniappan, S., and Hong, T.K.: Discretization of Continuous Valued Dimensions in OLAP Data Cubes. International Journal of Computer Science and Network Security 8, 116–126 (2008)

[11] Pawlak, Z.: Rough sets and fuzzy sets. Fuzzy Sets and Systems 17, 99–102 (1985)

[12] Pawlak, Z., and Skowron, A.: Rudiments of rough sets. Information Sciences 177 (1), 3–27 (2007)

[13] Zhao, Y., Luo, F., Wong, S.K.M., and Yao, Y.Y.: A general definition of an attribute reduct. LNAI 4481, 101–108 (2007)

[14] Pawlak, Z.: Rough classification. International Journal of Human-Computer Studies 51, 369–383 (1999)

[15] Warisan Budaya Malaysia: Alat Muzik Tradisional, http://malaysiana.pnm.my/kesenian/Index.htm

[16] Shriver, R.: Webpage, www.rickshriver.net/hires.htm

[17] Senan, N., Ibrahim, R., Nawi, N.M., Mokji, M.M., and Herawan, T.: The Ideal Data Representation for Feature Extraction of Traditional Malay Musical Instrument Sounds Classification. To appear in: De-Shuang Huang et al. (eds.): ICIC 2010, LNCS (2010)

Norhalina Senan received her B.Sc. and M.Sc. degrees in Computer Science from Universiti Teknologi Malaysia. She is currently pursuing her Ph.D. degree on Feature Selection for Traditional Malay Musical Instruments Sounds Classification using Rough Set at Universiti Tun Hussein Onn Malaysia. She is a lecturer at the Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia. Her research areas include data mining, multimedia and rough set theory.

Rosziati Ibrahim is with the Software Engineering Department, Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM). She obtained her PhD in Software Specification from the Queensland University of Technology (QUT), Brisbane, and her MSc and BSc (Hons) in Computer Science and Mathematics from the University of Adelaide, Australia. Her research area is Software Engineering, covering Software Specification, Software Testing, Operational Semantics, Formal Methods, Data Mining, Image Processing and Object-Oriented Technology.

Nazri Mohd Nawi received his B.S. degree in Computer Science from the University of Science Malaysia (USM), Penang, Malaysia. His M.Sc. degree in Computer Science was received from the University of Technology Malaysia (UTM), Skudai, Johor, Malaysia. He received his Ph.D. degree from the Mechanical Engineering Department, Swansea University, Wales. He is currently a senior lecturer in the Software Engineering Department at Universiti Tun Hussein Onn Malaysia (UTHM). His research interests are in optimization, data mining techniques and neural networks.

Iwan Tri Riyadi Yanto received his B.Sc. degree in Mathematics from Universitas Ahmad Dahlan, Yogyakarta. He is a Master candidate in Data Mining at Universiti Tun Hussein Onn Malaysia (UTHM). His research areas include data mining, KDD, and numeric computation.

Tutut Herawan received his B.Ed. and M.Sc. degrees in Mathematics from Universitas Ahmad Dahlan and Universitas Gadjah Mada, Yogyakarta, respectively. He obtained his Ph.D. from Universiti Tun Hussein Onn Malaysia. Currently, he is a senior lecturer at the Computer Science Program, Faculty of Computer Systems and Software Engineering, Universiti Malaysia Pahang (UMP). He has published more than 40 research papers in journals and conferences. His research areas include data mining and KDD, and rough and soft set theories.