Lecture 5/ 03-08-09 1
Data Reduction: 1. Aggregation
• Combining two or more attributes (or objects) into a single attribute (or object)
• Purpose
– Data reduction: reduce the number of attributes or objects
– Change of scale: cities aggregated into regions, states, countries, etc.
– More “stable” data: aggregated data tends to have less variability
Aggregation
[Figure: Variation of Precipitation in Australia. Panels: standard deviation of average monthly precipitation; standard deviation of average yearly precipitation.]
Motivation for Aggregation
• 1. Smaller data sets resulting from data reduction require less memory and processing time.
• 2. Aggregation can act as a change of scope or scale by providing a high-level view of the data instead of a low-level view.
• For example, aggregating sales data by store location and by month gives a monthly, per-store view of the data rather than a daily, per-item view.
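A minimal sketch of the store example, assuming the raw data are daily, per-item sales records held as Python tuples (the records and the helper name `aggregate_monthly_per_store` are hypothetical). Five rows collapse into three (store, month) totals; the day and the item can no longer be recovered from the result.

```python
from collections import defaultdict

# Hypothetical daily, per-item sales records: (store, date "YYYY-MM-DD", item, amount)
daily_sales = [
    ("S1", "2009-01-03", "milk", 12.0),
    ("S1", "2009-01-17", "bread", 8.5),
    ("S1", "2009-02-02", "milk", 10.0),
    ("S2", "2009-01-09", "milk", 7.0),
    ("S2", "2009-01-09", "eggs", 4.5),
]

def aggregate_monthly_per_store(records):
    """Collapse daily, per-item rows into one total per (store, month)."""
    totals = defaultdict(float)
    for store, date, _item, amount in records:
        month = date[:7]  # keep "YYYY-MM"; the day (and the item) are dropped
        totals[(store, month)] += amount
    return dict(totals)

monthly = aggregate_monthly_per_store(daily_sales)
for key in sorted(monthly):
    print(key, monthly[key])
```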
• DISADVANTAGE of Aggregation
– May lose interesting and potentially important details about the data.
– Ex. aggregating over months loses information about which day of the week has the highest sales.
Data Reduction: 2. Sampling
• Sampling is the main technique employed for data selection.
– It is often used for both the preliminary investigation of the data and the final data analysis.
• Statisticians sample because obtaining the entire set of data of interest is too expensive or time-consuming.
• Sampling is used in data mining because processing the entire set of data of interest is too expensive or time-consuming.
Types of Sampling
• Simple Random Sampling
– There is an equal probability of selecting any particular item.
• Sampling without replacement
– As each item is selected, it is removed from the population.
• Sampling with replacement
– Objects are not removed from the population as they are selected for the sample.
• In sampling with replacement, the same object can be picked more than once.
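Both variants of simple random sampling can be sketched with Python's standard `random` module: `Random.sample` draws without replacement and `Random.choices` draws with replacement, and in both every item has the same chance of being selected. The population of 20 IDs is hypothetical.

```python
import random

population = list(range(1, 21))  # hypothetical population of 20 object IDs
rng = random.Random(42)          # fixed seed so the run is reproducible

# Without replacement: each selected item leaves the pool, so no duplicates.
without_repl = rng.sample(population, k=8)

# With replacement: items stay in the pool, so an object may be picked twice.
with_repl = rng.choices(population, k=8)

print("without replacement:", without_repl)
print("with replacement:   ", with_repl)
```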
• Stratified Sampling:
– Used when the population consists of different types of objects, with widely varying numbers of each type.
– The entire population is divided into strata (pre-specified groups).
– Random samples are then drawn from each stratum.
• Progressive/Adaptive Sampling:
– Starts with a small sample and keeps increasing the sample size until a sufficient size is obtained.
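A sketch of both schemes under stated assumptions. `stratified_sample` draws a fixed number of objects from each stratum (proportional allocation would be an equally valid choice). For `progressive_sample`, the slide does not say how "sufficient size" is judged, so the stopping rule here (the sample mean changes by less than `tol` after doubling the sample) is a hypothetical stand-in. All names and data are illustrative.

```python
import random
from collections import defaultdict

def stratified_sample(objects, stratum_of, per_stratum, rng):
    """Draw up to `per_stratum` objects at random from each stratum."""
    groups = defaultdict(list)
    for obj in objects:
        groups[stratum_of(obj)].append(obj)
    sample = []
    for label in sorted(groups):
        members = groups[label]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

def progressive_sample(values, rng, start=10, tol=0.05):
    """Double the sample size until the sample mean stabilises.

    Assumed stopping rule: relative change of the mean <= tol, or the
    whole population has been reached.
    """
    size = min(start, len(values))
    prev_mean = None
    while True:
        sample = rng.sample(values, size)
        mean = sum(sample) / size
        if prev_mean is not None and abs(mean - prev_mean) <= tol * abs(prev_mean):
            return sample
        if size == len(values):
            return sample
        prev_mean = mean
        size = min(2 * size, len(values))

rng = random.Random(0)
# Hypothetical population: 95 objects of type "A", only 5 of type "B"
objects = [("A", i) for i in range(95)] + [("B", i) for i in range(5)]
strat = stratified_sample(objects, lambda o: o[0], per_stratum=5, rng=rng)
print(sum(1 for o in strat if o[0] == "B"))  # 5: every rare "B" object is kept

values = [rng.gauss(50, 10) for _ in range(10_000)]
prog = progressive_sample(values, rng)
print(len(prog))
```

Stratification guarantees the rare type is represented; a simple random sample of 10 from this population would often contain no "B" objects at all.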
3. Dimensionality Reduction
• Purpose/Benefit:
– Many DM algorithms work better when the data have fewer dimensions.
– Helps to eliminate irrelevant features or reduce noise.
– Avoids the curse of dimensionality.
– A more understandable DM model can be obtained because it involves fewer attributes.
– Reduces the amount of time and memory required by data mining algorithms.
– Allows data to be more easily visualized.
Curse of dimensionality
• As the number of dimensions increases, data become increasingly sparse in the space they occupy.
• For clustering, the concepts of density and distance between points become less meaningful.
• This produces poor-quality clusters or poor classification results.
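A common way to illustrate the loss of meaning in distances: for random points in a unit hypercube, compare the farthest and nearest distances from a query point. The sketch below (the function name and sizes are illustrative choices) computes the relative contrast (max − min)/min; as the dimension grows, it shrinks toward 0, so "near" and "far" points become hard to tell apart.

```python
import math
import random

def relative_contrast(n_points, dim, rng):
    """(max - min) / min of Euclidean distances from a random query point
    to `n_points` random points in the unit hypercube [0, 1]^dim."""
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

rng = random.Random(1)
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(500, dim, rng), 3))
```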
Techniques for Dimensionality Reduction:
– Principal Component Analysis (PCA)
– Singular Value Decomposition (SVD)
– Others: supervised and non-linear techniques
PCA & SVD
• Linear algebra techniques for continuous attributes
• They look for combinations of attributes to find new attributes (principal components) that are:
– 1. linear combinations of the original attributes;
– 2. orthogonal to each other;
– 3. chosen to capture the maximum amount of variability in the data.
• SVD
– Similar and closely related to PCA.
Dimensionality Reduction: PCA
• The goal is to find a projection that captures the largest amount of variation in the data.
[Figure: data points on axes x1 and x2, with vector e marking the direction of largest variation.]
Dimensionality Reduction: PCA
• Find the eigenvectors of the covariance matrix.
• The eigenvectors define the new space.
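For two attributes, as in the x1/x2 figure, the eigenvectors of the 2 × 2 covariance matrix have a closed form, so these steps can be sketched without any library. This is a hypothetical minimal sketch (function names and data are illustrative); with more attributes one would typically call a routine such as NumPy's `numpy.linalg.eigh` on the covariance matrix instead.

```python
import math

def pca_2d(points):
    """PCA for 2-D data via the eigenvectors of the covariance matrix.

    Returns (mean, eigenvalues, eigenvectors), sorted by decreasing
    eigenvalue; each eigenvector is a unit-length (x1, x2) direction.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    half_trace = (a + c) / 2
    disc = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = [half_trace + disc, half_trace - disc]
    if abs(b) < 1e-12:  # already uncorrelated: the axes are the eigenvectors
        eigenvectors = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        eigenvectors = []
        for lam in eigenvalues:
            vx, vy = b, lam - a  # solves [[a, b], [b, c]] v = lam * v
            norm = math.hypot(vx, vy)
            eigenvectors.append((vx / norm, vy / norm))
    return (mx, my), eigenvalues, eigenvectors

def project(points, mean, direction):
    """Coordinates of the centred points along one principal direction."""
    return [(p[0] - mean[0]) * direction[0] + (p[1] - mean[1]) * direction[1]
            for p in points]

data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2),
        (3.1, 3.0), (2.3, 2.7), (2.0, 1.6), (1.0, 1.1)]  # hypothetical data
mean, eigenvalues, eigenvectors = pca_2d(data)
print("first principal direction:", eigenvectors[0])
print("share of variance captured:", eigenvalues[0] / sum(eigenvalues))
scores = project(data, mean, eigenvectors[0])  # 2-D data reduced to 1-D
```

Keeping only the first component replaces two attributes by one, while the eigenvalue ratio reports how much of the variation is retained.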
4. Feature Subset Selection
• Another way to reduce the dimensionality of data
• Redundant features
– duplicate much or all of the information contained in one or more other attributes
– Example: purchase price of a product and the amount of sales tax paid
• Irrelevant features
– contain no information that is useful for the data mining task at hand
– Example: students' ID is often irrelevant to the task of predicting students' GPA
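One simple way to spot the price/sales-tax kind of redundancy is a correlation check. The sketch below assumes a flat 8% tax rate and a redundancy threshold of 0.95; both, like the data, are hypothetical. Because the tax is a fixed multiple of the price, the Pearson correlation is 1 and either attribute could be dropped.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical purchases; a flat 8% sales-tax rate is assumed.
price = [10.0, 25.0, 40.0, 5.0, 60.0]
sales_tax = [round(p * 0.08, 2) for p in price]

r = pearson(price, sales_tax)
print(f"correlation(price, sales_tax) = {r:.4f}")
if abs(r) > 0.95:  # hypothetical redundancy threshold
    print("sales_tax is redundant given price")
```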
Techniques for FSS
• Brute-force approaches
• Embedded approaches
• Filter approaches
• Wrapper approaches
• Brute-force Approaches:
– Try all possible feature subsets as input to the data mining algorithm.
• Embedded Approaches:
– Feature selection occurs automatically as part of the DM algorithm.
– The DM algorithm itself decides which attributes to use and which to ignore.
– Algorithms for constructing decision tree classifiers often operate in this manner.
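The brute-force approach can be sketched with `itertools.combinations`. Here `evaluate` is a hypothetical callback standing in for "run the DM algorithm on this subset and return a quality score"; the toy scores are invented for illustration. With n features there are 2^n − 1 non-empty subsets, which is why this only works for small n.

```python
from itertools import combinations

def brute_force_selection(features, evaluate):
    """Try every non-empty subset of `features`; return the best-scoring one."""
    best_subset, best_score = None, float("-inf")
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            score = evaluate(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score

# Hypothetical per-feature usefulness, minus a small cost per feature used
usefulness = {"age": 0.30, "income": 0.45, "student_id": 0.0, "zip": 0.05}

def toy_score(subset):
    return sum(usefulness[f] for f in subset) - 0.1 * len(subset)

best, score = brute_force_selection(list(usefulness), toy_score)
print(best, round(score, 2))  # 15 subsets examined for 4 features
```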
• Filter approaches:
– Features are selected before the data mining algorithm is run.
– These approaches are independent of the DM task.
– Ex. selecting sets of attributes whose pairwise correlation is as low as possible.
• Wrapper approaches:
– Use the data mining algorithm as a black box to find the best subset of attributes.
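A filter in the spirit of the low-pairwise-correlation example: attributes are scanned in a given order and one is kept only if its absolute correlation with every already-kept attribute is below a threshold. No DM algorithm is consulted. The threshold of 0.9, the greedy scan order and the data are assumptions made for this sketch.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def low_correlation_filter(columns, threshold=0.9):
    """Keep attributes whose |correlation| with every kept attribute is < threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

columns = {  # hypothetical attribute values for five objects
    "price": [10.0, 25.0, 40.0, 5.0, 60.0],
    "sales_tax": [0.8, 2.0, 3.2, 0.4, 4.8],
    "weight_kg": [1.2, 0.4, 3.5, 2.2, 0.9],
}
print(low_correlation_filter(columns))  # sales_tax is dropped as redundant
```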
Architecture of FSS
• Four steps in FSS:
– 1. A search strategy that generates new subsets of features
– 2. A measure for evaluating a subset
– 3. A stopping criterion
– 4. A validation procedure
[Figure: Attributes → Search strategy → Subset of attributes → Evaluation → Stopping criterion; "Not done" loops back to the search strategy, "Done" passes the selected attributes to the validation procedure.]
Filter Approaches
1. Subset evaluation is independent of the DM algorithm.
2. The evaluation procedure attempts to predict how well the DM algorithm will perform on that particular set of attributes.
Wrapper Approaches
1. Subset evaluation uses the DM algorithm.
2. The evaluation procedure consists of actually running the DM application.
• The number of subsets is usually very large, so it is impractical to examine them all; therefore some stopping criterion must be employed.
• The stopping criterion can be based on:
– the number of iterations;
– whether the value of the subset evaluation measure is optimal or exceeds some threshold;
– whether a subset of the desired size has been obtained.
• Once feature subset selection is done, the results of the target DM algorithm on the selected features are validated.
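A sketch of the search/evaluate/stop loop, assuming greedy forward selection as the search strategy (the slides do not fix one) and a hypothetical `evaluate` callback in place of the evaluation measure. All three stopping criteria above are supported, plus stopping when no candidate improves the score; validation is left to the caller, e.g. on held-out data.

```python
def forward_selection(features, evaluate, max_iters=10, threshold=None, target_size=None):
    """Greedy forward search over feature subsets.

    Search strategy: add the one feature that most improves evaluate(subset).
    Stopping: iteration limit, score >= threshold, desired subset size,
    or no improvement. Validation of the result is left to the caller.
    """
    selected, best_score = [], float("-inf")
    for _ in range(max_iters):
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        score, feat = max((evaluate(selected + [f]), f) for f in candidates)
        if score <= best_score:
            break  # no candidate improves the evaluation measure
        selected.append(feat)
        best_score = score
        if threshold is not None and best_score >= threshold:
            break
        if target_size is not None and len(selected) >= target_size:
            break
    return selected, best_score

# Hypothetical evaluation: per-feature usefulness minus a cost per feature
usefulness = {"age": 0.30, "income": 0.45, "student_id": 0.0, "zip": 0.05}

def toy_score(subset):
    return sum(usefulness[f] for f in subset) - 0.1 * len(subset)

print(forward_selection(list(usefulness), toy_score))
print(forward_selection(list(usefulness), toy_score, target_size=1))
```

Greedy search examines far fewer subsets than brute force, at the price of possibly missing the best subset.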
5. Feature Creation
• Create new attributes that can capture the important information in a data set much more efficiently than the original attributes
Methodologies for Feature Creation
– Feature extraction
– Mapping data to a new space
– Feature construction
• Feature Extraction
– The creation of a new set of features from the original raw data.
– Highly domain-specific: FE techniques developed for one field are often not applicable to other fields.
– Whenever DM is applied to a relatively new field, new feature extraction methods may need to be developed.
Mapping Data to a New Space
• A different view of the data can reveal interesting and important features.
• Time series data often contain periodic patterns.
– If there is a single periodic pattern without much noise, the pattern is easily detected.
– If there are multiple periodic patterns with a significant amount of noise, they are hard to detect directly.
– Such patterns are usually detected by applying the Fourier transform (FT) or the wavelet transform (WT).
Mapping Data to a New Space
[Figure: two sine waves; the same time series with noise added; its power spectrum (power vs. frequency). Transforms named: Fourier transform, wavelet transform.]
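A small sketch of the Fourier route (the wavelet transform is not shown): two sine waves with 5 and 12 cycles per window are mixed with Gaussian noise, and a naive O(N²) discrete Fourier transform turns the noisy series into a power spectrum in which the two frequencies stand out. The frequencies, amplitudes, noise level and series length are illustrative choices; a real application would use an FFT routine.

```python
import cmath
import math
import random

def power_spectrum(signal):
    """|DFT|^2 for frequencies 0 .. N/2, using a naive discrete Fourier transform."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

n = 128
rng = random.Random(7)
# Two sine waves (5 and 12 cycles per window) plus Gaussian noise
series = [
    math.sin(2 * math.pi * 5 * t / n)
    + 0.5 * math.sin(2 * math.pi * 12 * t / n)
    + rng.gauss(0, 0.3)
    for t in range(n)
]

power = power_spectrum(series)
top_two = sorted(range(len(power)), key=lambda k: power[k], reverse=True)[:2]
print("dominant frequencies (cycles per window):", sorted(top_two))
```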
Feature Construction
• Example
– A data set containing information about antique items, such as mass, volume, etc.
– Suppose these items are made of wood, clay, bronze, silver, etc.
– DM task: classify the objects with respect to the material they are made of.
– A constructed feature, density = mass/volume, would directly provide an accurate classification.
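A sketch of the antique example. The reference densities (g/cm³) are rough values assumed only for illustration, since real wood, fired clay and bronze vary considerably, and the items are hypothetical. Neither mass nor volume alone separates the materials (a large wooden box can outweigh a small silver cup), but the constructed density feature does.

```python
# Approximate densities in g/cm^3, assumed for illustration only
REFERENCE_DENSITY = {"wood": 0.7, "clay": 2.0, "bronze": 8.8, "silver": 10.5}

def construct_density(mass_g, volume_cm3):
    """Constructed feature: density = mass / volume."""
    return mass_g / volume_cm3

def classify_by_density(mass_g, volume_cm3):
    """Assign the material whose reference density is closest."""
    d = construct_density(mass_g, volume_cm3)
    return min(REFERENCE_DENSITY, key=lambda m: abs(REFERENCE_DENSITY[m] - d))

# Hypothetical items: (mass in g, volume in cm^3)
items = [(350.0, 500.0), (1050.0, 100.0), (440.0, 50.0), (90.0, 45.0)]
for mass, vol in items:
    print(mass, vol, "->", classify_by_density(mass, vol))
```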
6. Binarization
• Both continuous and discrete attributes may be transformed into binary attributes; this is called binarization.
• If there are $m$ categorical values, first assign each original value to an integer in $[0, m-1]$.
• Then $n = \lceil \log_2 m \rceil$ binary digits are required to represent these values.
• Example: a categorical attribute with 5 values {awful, poor, OK, good, great} requires 3 binary variables, since $\log_2 5 \approx 2.32$ and $n = \lceil 2.32 \rceil = 3$.
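| Categorical value | Integer value | x1 | x2 | x3 |
|---|---|---|---|---|
| awful | 0 | 0 | 0 | 0 |
| poor | 1 | 0 | 0 | 1 |
| OK | 2 | 0 | 1 | 0 |
| good | 3 | 0 | 1 | 1 |
| great | 4 | 1 | 0 | 0 |

Table 2.5: Conversion of a categorical attribute to three binary attributes.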
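The integer-then-binary scheme of Table 2.5 can be sketched as follows (the function name is hypothetical). Each value's integer code is written with $n = \lceil \log_2 m \rceil$ bits, most significant bit first, which reproduces columns x1 to x3.

```python
import math

def binary_encode(values):
    """Map m categorical values to integers 0..m-1, then to
    n = ceil(log2 m) binary attributes (most significant bit first)."""
    m = len(values)
    n = max(1, math.ceil(math.log2(m)))
    table = {}
    for i, v in enumerate(values):
        table[v] = tuple((i >> (n - 1 - j)) & 1 for j in range(n))
    return table

codes = binary_encode(["awful", "poor", "OK", "good", "great"])
for value, bits in codes.items():
    print(value, bits)  # e.g. great (1, 0, 0)
```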
• For association analysis, asymmetric binary attributes may be required, where only the presence of the attribute (value = 1) is important.
• In such situations, one binary attribute is needed for each categorical value, as shown in the next table:
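| Categorical value | Integer value | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|---|
| awful | 0 | 1 | 0 | 0 | 0 | 0 |
| poor | 1 | 0 | 1 | 0 | 0 | 0 |
| OK | 2 | 0 | 0 | 1 | 0 | 0 |
| good | 3 | 0 | 0 | 0 | 1 | 0 |
| great | 4 | 0 | 0 | 0 | 0 | 1 |

Table 2.6: Conversion of a categorical attribute to five binary attributes.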
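The one-attribute-per-value scheme of Table 2.6 (often called one-hot encoding) is a one-liner in Python; the function name is hypothetical. Exactly one position is 1, marking the presence of that value.

```python
def one_hot_encode(values):
    """One asymmetric binary attribute per categorical value."""
    m = len(values)
    return {v: tuple(1 if j == i else 0 for j in range(m)) for i, v in enumerate(values)}

codes = one_hot_encode(["awful", "poor", "OK", "good", "great"])
print(codes["good"])  # (0, 0, 0, 1, 0)
```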
• If the number of resulting binary attributes is too large, the number of categorical values should first be reduced.
• Symmetric binary attribute: both states (0 and 1) are equally important and carry equal weight.
– Ex. “gender” can be male or female.
• Asymmetric binary attribute: the outcomes of the two states are not equally important.
– Ex. a positive or negative outcome of a disease test.
6. Discretization
• Transformation of a continuous attribute into a categorical attribute is called discretization.
• It is a two-step process:
– 1. Deciding the number of categories: the values of the continuous attribute are sorted and then partitioned into $n$ intervals by specifying $n-1$ split points, giving $\{(x_0, x_1], (x_1, x_2], \ldots, (x_{n-1}, x_n)\}$.
– 2. Determining how to map continuous values to categorical values (very simple): all values in one interval are mapped to the same categorical value.
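A sketch of both steps, assuming equal-width intervals as the way of choosing the split points (equal-frequency or supervised splits are other options). `bisect.bisect_left` finds the interval, and a value equal to a split point falls into the lower, right-closed interval $(x_{i-1}, x_i]$. The ages and labels are hypothetical.

```python
import bisect

def equal_width_split_points(values, n_intervals):
    """Step 1: sort the values and place n-1 equally spaced split points."""
    ordered = sorted(values)
    lo, hi = ordered[0], ordered[-1]
    width = (hi - lo) / n_intervals
    return [lo + width * i for i in range(1, n_intervals)]

def discretize(value, split_points, labels):
    """Step 2: map a value to the label of its interval (x_{i-1}, x_i]."""
    return labels[bisect.bisect_left(split_points, value)]

ages = [22, 3, 58, 41, 15, 80, 37, 64]  # hypothetical continuous attribute
splits = equal_width_split_points(ages, 3)
labels = ["low", "medium", "high"]
print([(a, discretize(a, splits, labels)) for a in sorted(ages)])
```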
7. Variable Transformation
• Refers to a transformation that is applied to all the values of an attribute (variable).
• Two types of attribute transformation:
– Simple functional transformation:
• A simple mathematical function is applied to each value individually.
• If $x$ is a variable, examples of such functions are $x^k$, $\log x$, $e^x$, $\sin x$ and $|x|$.
– Normalization:
• Its goal is to make an entire set of values have a particular property.
• If $\bar{x}$ is the mean of the attribute values and $s_x$ is their standard deviation, then $x' = (x - \bar{x})/s_x$ creates a new variable with mean 0 and standard deviation 1.
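Both kinds of transformation in a short sketch using Python's `statistics` module. The income values are hypothetical. $\log_{10}$ is applied value by value, while standardization uses the mean and the sample standard deviation (`statistics.stdev`; `statistics.pstdev` would give the population version) of the whole set.

```python
import math
import statistics

def standardize(values):
    """x' = (x - mean) / s_x, giving mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(x - mean) / sd for x in values]

incomes = [12_000, 25_000, 31_000, 48_000, 250_000]  # hypothetical values

# Simple functional transformation: applied to each value individually
log_incomes = [math.log10(x) for x in incomes]

# Normalization: depends on the entire set of values
z = standardize(incomes)
print([round(v, 3) for v in log_incomes])
print(round(statistics.mean(z), 10), round(statistics.stdev(z), 10))
```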