mining approximate functional dependencies (afds) as condensed representations of association rules...
Post on 20-Dec-2015
![Page 1: Mining Approximate Functional Dependencies (AFDs) as Condensed Representations of Association Rules Master’s Thesis Defense by Aravind Krishna Kalavagattu](https://reader036.vdocument.in/reader036/viewer/2022062320/56649d4a5503460f94a27d69/html5/thumbnails/1.jpg)
Mining Approximate Functional Dependencies (AFDs) as
Condensed Representations of Association Rules
Master’s Thesis Defense by Aravind Krishna Kalavagattu
Committee Members: Dr. Subbarao Kambhampati (chair), Dr. Yi Chen, Dr. Huan Liu
Database Systems
• Well-defined schema and method for querying (SQL)
• Query optimization
• Recently, some systems have started supporting IR-style answering of user queries
Data Mining
• Discovering useful patterns from data
• Rule learning is a well-researched method for discovering interesting relations between variables in large databases
• Association rules
Rule mining over databases has several applications
Introduction to AFDs
Approximate Functional Dependencies (AFDs) are rules denoting approximate determinations at the attribute level.
AFDs are of the form (X ~~> Y), where X and Y are sets of attributes: X is the “determining set” and Y is the “dependent set”. Rules with singleton dependent sets are of high interest.
A classic example of an AFD is (Nationality ~~> Language), which indicates that we can approximately guess the language of a person if we know which country she is from.
More examples: Make ~~> Model; (Job Title, Experience) ~~> Salary
Introduction (contd.)
Functional Dependency (FD)
Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R (written X → Y), if and only if each X value is associated with precisely one Y value.
AFDs can be loosely defined as FDs that approximately hold: some exception rows fail to satisfy the dependency over the current relation.
Example: Make ~~> Model (with error = 0.3), i.e., 70% of the tuples satisfy the dependency.
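To make the FD definition and the error figure concrete, the sketch below checks whether X → A holds exactly and computes the g3 error, i.e. the minimum fraction of tuples that must be removed for the FD to hold. The ten-row Make/Model table and the helper names `holds_fd` and `g3_error` are hypothetical, chosen so that Make ~~> Model has error 0.3 as in the example.

```python
from collections import Counter, defaultdict

def holds_fd(rows, X, A):
    """Return True if every X-value combination maps to exactly one A value."""
    seen = {}
    for row in rows:
        key = tuple(row[x] for x in X)
        if seen.setdefault(key, row[A]) != row[A]:
            return False
    return True

def g3_error(rows, X, A):
    """Minimum fraction of rows to delete so that X -> A holds exactly."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[x] for x in X)][row[A]] += 1
    # Within each X group, keep the most frequent A value and delete the rest.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / len(rows)

# Hypothetical 10-row table (illustrative data, not from the thesis).
pairs = ([("Honda", "Accord")] * 4 + [("Honda", "Civic")] * 2
         + [("Toyota", "Camry")] * 3 + [("Toyota", "Corolla")])
rows = [{"Make": make, "Model": model} for make, model in pairs]

print(holds_fd(rows, ["Model"], "Make"))            # True: each model has one make
print(holds_fd(rows, ["Make"], "Model"))            # False
print(round(g3_error(rows, ["Make"], "Model"), 2))  # 0.3
```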
Applications of AFDs
• Predicting missing values of attributes in relational tables (QPIAD), using the values of the attributes in the determining set of an AFD
• Query optimization (CORDS, BHUNT): maintaining correct selectivity estimates
• Query rewriting (AIMQ, QPIAD, QUIC). Example: given Model ~~> BodyStyle, rewrite a query on Model = “RAV4” to retrieve tuples with BodyStyle = “SUV”
• Database design (database normalization, efficient storage), similar to the way FDs are used
FD Mining and Implications
FD mining aims at finding a minimal cover: the minimum set of FDs from which the entire set of FDs can be generated.
Example: if A → B is an FD, then ({A, C} → B) is considered redundant.
Can we likewise generate only minimal dependencies in the case of AFDs?
No, because a non-minimal AFD (Z ~~> B), where Z contains A, may be interesting for the application, and we may prefer it to A ~~> B.
Non-minimal dependencies perform better in QPIAD, QUIC, etc.
Example: the AFD (JobTitle, Experience) ~~> Salary vs. (JobTitle ~~> Salary)
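Why a non-minimal AFD can be preferable: adding attributes to the determining set refines the partition of tuples, so confidence (1 − g3) can never decrease. The sketch below assumes a tiny hypothetical JobTitle/Experience/Salary table, invented for illustration, in which JobTitle alone is a weak predictor of Salary while (JobTitle, Experience) determines it exactly.

```python
from collections import Counter, defaultdict

def confidence(rows, X, A):
    """1 - g3: fraction of rows kept when each X group keeps its most common A value."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[x] for x in X)][row[A]] += 1
    return sum(max(c.values()) for c in groups.values()) / len(rows)

# Hypothetical rows: (JobTitle, Experience, Salary band).
data = [("Engineer", "Junior", "Low"), ("Engineer", "Junior", "Low"),
        ("Engineer", "Senior", "High"), ("Engineer", "Senior", "High"),
        ("Manager", "Junior", "Mid"), ("Manager", "Senior", "High")]
rows = [dict(zip(("JobTitle", "Experience", "Salary"), t)) for t in data]

print(confidence(rows, ["JobTitle"], "Salary"))                # 0.5
print(confidence(rows, ["JobTitle", "Experience"], "Salary"))  # 1.0
```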
Performance Concerns
AFD mining is costly:
• The pruning strategies for FDs are not applicable to AFDs.
• For datasets with a large number of attributes, the search space gets even worse.
• The method for determining whether a dependency holds or not is costly.
• The way to traverse the search space is tricky: bottom-up vs. top-down?
Quality Concerns
Before algorithms for discovering AFDs can be developed, AFDs need better interestingness measures.
• AFDs used as feature selectors in classification are expected to give good accuracy.
• AFDs used in query rewriting are expected to give high throughput per query.
Example: (VIN ~~> Make) vs. (Model ~~> Make). (VIN ~~> Make) looks good under the error metric, since VIN is essentially a key, but intuitively (as well as practically) (Model ~~> Make) is a better AFD.
Challenges in AFD Mining
1. Defining right interestingness measures
2. Performing an efficient traversal in the search space of possible rules
3. Employing effective pruning strategies
Agenda/Outline
• Introduction
• Related Work
• Provide a new perspective for AFDs: roll-ups/condensed representations of association rules
• Define measures for AFDs
• Present the AFDMiner algorithm
• Experimental Results: Performance, Quality
Related Work
• FD mining algorithms (DepMiner, FUN, TANE, FD_Mine) aim at finding a minimal cover
• Existing approximation measures for AFDs: Tau, InD metrics
• Grouping association rules: clustering association rules (v1 ~> u, v2 ~> u as (v1 ^ v2 ~> u))
These do not work well for AFDs:
• The metrics do not seem to matter in practice
• There is no accompanying algorithm to mine AFDs
• No one combines them as AFDs
Existing AFD Miners
• CORDS: SoftFDs (C1 => C2), using |C1,C2| / (|C1||C2|) as the approximation measure. Drawbacks: restricted to a singleton determining set; works from a sample; the measure used is not appropriate.
• AIMQ/QPIAD/QUIC: TANE, with post-processing over TANE. Drawbacks: highly inefficient; the quality of some AFDs is bad.
Condensing Association Rules
Viewing database relations as transactions:
• Itemsets ≈ attribute-value pairs
• Association rules are usually between itemsets (e.g., Beer ~> Diapers); here, they are between attribute-value pairs
• AFDs are rules between attributes, corresponding to many association rules that share the same attributes
Example:
Association rule: (Toyota, Camry) ~> Sedan
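Viewing each tuple as a transaction of (attribute, value) items, the support and confidence of a rule such as (Toyota, Camry) ~> Sedan can be counted directly. A minimal sketch: the four rows and the attribute names Make/Model/BodyStyle are hypothetical, and support here means rule support (the fraction of tuples containing both body and head), the sense used for Support(Make:Honda ~~> Model:Accord) = 3/8 in the confidence example.

```python
def as_items(row):
    """A tuple viewed as a transaction: a set of (attribute, value) items."""
    return frozenset(row.items())

def rule_stats(rows, body, head):
    """Support and confidence of the association rule body ~> head."""
    transactions = [as_items(r) for r in rows]
    body_items = frozenset(body.items())
    rule_items = body_items | frozenset(head.items())
    n_body = sum(body_items <= t for t in transactions)
    n_rule = sum(rule_items <= t for t in transactions)
    support = n_rule / len(rows)
    confidence = n_rule / n_body if n_body else 0.0
    return support, confidence

# Hypothetical rows for illustration only.
rows = [
    {"Make": "Toyota", "Model": "Camry", "BodyStyle": "Sedan"},
    {"Make": "Toyota", "Model": "Camry", "BodyStyle": "Sedan"},
    {"Make": "Toyota", "Model": "RAV4", "BodyStyle": "SUV"},
    {"Make": "Honda", "Model": "Accord", "BodyStyle": "Sedan"},
]
print(rule_stats(rows, {"Make": "Toyota", "Model": "Camry"}, {"BodyStyle": "Sedan"}))
# (0.5, 1.0)
```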
Rolling up association rules as AFDs
Honda ~~> Accord, Toyota ~~> Camry, Tata ~~> Maruti800, …  ⇒  Make ~~> Model
Confidence
Consider an association rule of the form (α → β). Confidence denotes the conditional probability of β (head) given α (body).
Similarly, for an AFD (X ~~> A), confidence should denote the chance of finding the value of A given the values of X.
We define AFD confidence in terms of the confidence of association rules: specifically, by picking the best association rule for every distinct value combination of the body of the association rule.
Confidence
For the example carDB, Confidence(Make ~~> Model) = Support(Make:Honda ~~> Model:Accord) + Support(Make:Toyota ~~> Model:Camry) = 3/8 + 2/8 = 5/8.
Interestingly, this is equal to (1 − g3), where g3 has a natural interpretation as the fraction of tuples with exceptions affecting the dependency.
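The roll-up can be sketched as: group tuples by their X value, keep the best (most frequent) association rule per group, and add up those rules' supports. The eight-row carDB below is a hypothetical reconstruction consistent with the partitions given for this example (tuples 1-5 Honda, 6-8 Toyota; Accord on 1-3, Camry on 7-8); the model names Civic and Corolla are invented placeholders, and `Fraction` keeps the result exact.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def rolled_up_confidence(rows, X, A):
    """AFD confidence as the sum of supports of the best association rule per X value."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[x] for x in X)][row[A]] += 1
    best_rules = {}
    for x_value, counts in groups.items():
        a_value, count = counts.most_common(1)[0]  # highest-confidence rule for this body
        best_rules[x_value] = (a_value, Fraction(count, len(rows)))
    return sum(support for _, support in best_rules.values()), best_rules

# Hypothetical carDB; Civic and Corolla are placeholder model names.
car_db = ([("Honda", "Accord")] * 3 + [("Honda", "Civic")] * 2
          + [("Toyota", "Corolla")] + [("Toyota", "Camry")] * 2)
rows = [{"Make": make, "Model": model} for make, model in car_db]

conf, rules = rolled_up_confidence(rows, ["Make"], "Model")
print(conf)                                              # 5/8
print(rolled_up_confidence(rows, ["Model"], "Make")[0])  # 1
```

The per-group "best rule" is exactly the value a group keeps when g3 deletes its exceptions, which is why the sum matches 1 − g3.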
Specificity
For an association rule (α → β), support is the probability with which the conditioning event (i.e., α) occurs.
• A rule with high confidence yet low support is a bad rule!
• The presence of many association rules with low support makes the AFD bad:
• In classification, this hurts prediction accuracy.
• For query rewriting tasks, per-query throughput is lower.
Types of AFDs
1. Model ~~> Make: few branches, uniform distribution. Good, and might hold universally.
2. VIN ~~> Make: many branches, uniform distribution. Bad: the confidence of each association rule is high, but the supports are low.
3. (Model, Location) ~~> Price: many branches, skewed distribution. A few association rules with high support and many with low support.
Example roll-up: Accord ~~> Honda, Camry ~~> Toyota, Maruti800 ~~> Tata, …  ⇒  Model ~~> Make
Specificity
The Specificity measure captures our intuition about the different types of AFDs.
• It is based on information entropy: the higher the Specificity (above a threshold), the worse the AFD!
• It shares similar motivations with the way SplitInfo is defined in decision trees when computing the Information Gain Ratio.
• It follows monotonicity: Specificity(Y) ≥ Specificity(X) whenever Y is a superset of X.
• It is normalized by the worst-case Specificity, i.e., when X is a key.
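The slides do not spell out the formula, so the sketch below assumes one reading consistent with the bullets above: Specificity(X) is the entropy of the distribution of X values, divided by its worst case log2(N), which is reached when X is a key. The VIN column, the eight rows (with placeholder models Civic and Corolla) and the function name are hypothetical.

```python
import math
from collections import Counter

def specificity(rows, X):
    """Assumed definition: entropy of the X-value distribution divided by log2(N).

    Requires at least two rows, since log2(1) = 0.
    """
    n = len(rows)
    sizes = Counter(tuple(row[x] for x in X) for row in rows).values()
    entropy = -sum((c / n) * math.log2(c / n) for c in sizes)
    return entropy / math.log2(n)  # a key has entropy log2(n), i.e. Specificity 1

car_db = ([("Honda", "Accord")] * 3 + [("Honda", "Civic")] * 2
          + [("Toyota", "Corolla")] + [("Toyota", "Camry")] * 2)
rows = [{"VIN": f"VIN{i}", "Make": make, "Model": model}
        for i, (make, model) in enumerate(car_db, start=1)]

for X in (["VIN"], ["Model"], ["Make"], ["Make", "Model"]):
    print(X, round(specificity(rows, X), 3))
```

Here both VIN ~~> Make and Model ~~> Make hold with confidence 1, so only Specificity separates them: VIN scores 1 (it is a key), Model about 0.64 and Make about 0.32, and adding Model to Make does not lower the score, in line with monotonicity.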
AFD Mining Problem
Good AFDs are those within the desired thresholds of the Confidence and Specificity measures.
Formally, the AFD mining problem can be stated as follows:
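The formal statement itself did not survive in this transcript. Restated using only the thresholds given in the conclusion (Confidence above minConf, Specificity below maxSpecificity) and the singleton dependent sets the slides focus on, it could read as follows; this is a paraphrase, not the thesis's own formulation.

```latex
\begin{aligned}
&\text{Given } R \text{ with attributes } \mathcal{A},\ \text{find all } (X \rightsquigarrow A),\ X \subseteq \mathcal{A},\ A \in \mathcal{A} \setminus X, \text{ such that} \\
&\mathrm{Confidence}(X \rightsquigarrow A) > \mathit{minConf} \quad\text{and}\quad \mathrm{Specificity}(X) < \mathit{maxSpecificity}.
\end{aligned}
```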
AFD Mining
The problem of AFD mining is to learn all AFDs that hold over a given relational table.
Two costs:
1. The major cost is the combinatorial cost of traversing the search space.
2. The cost of visiting the data to validate each rule (to compute the interestingness measures).
The search process for AFDs is exponential in the number of attributes.
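To see how quickly the search space grows, count the candidates when dependent sets are singletons: for each of the n attributes A, any non-empty subset of the remaining n − 1 attributes can be the determining set, giving n(2^(n−1) − 1) candidates before any pruning. The sketch applies this to the attribute counts reported for the data sets in the experiments and shows how a cap on the determining-set length shrinks it; the function name is hypothetical.

```python
from math import comb

def candidate_afd_count(n_attributes, max_len=None):
    """Candidate AFDs X ~~> A with one dependent attribute A and a non-empty
    determining set X drawn from the other n - 1 attributes (no pruning)."""
    others = n_attributes - 1
    top = others if max_len is None else min(max_len, others)
    per_dependent = sum(comb(others, k) for k in range(1, top + 1))
    return n_attributes * per_dependent

print(candidate_afd_count(23))             # MushroomDB: 96468969
print(candidate_afd_count(30))             # CensusDB: 16106127330
print(candidate_afd_count(30, max_len=3))  # 122670
```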
Pruning Strategies
1. Pruning by Specificity: Specificity(Y) ≥ Specificity(X), where Y is a superset of X. If Specificity(X) > maxSpecificity, we can prune all AFDs with X or any of its supersets as the determining set.
2. Pruning applicable to FDs: if (X → A) is an FD, all AFDs of the form (Y ~~> A), where Y is a superset of X, can be pruned.
3. Pruning keys: needed for FDs, but in AFDMiner this is subsumed by case 1, because Specificity(X) = 1 means X is a key.
AFDMiner Algorithm
The search starts from singleton sets of attributes and works its way to larger attribute sets through the set-containment lattice, level by level.
When the algorithm is processing a set X, it tests AFDs of the form ((X \ {A}) ~~> A), where A ∈ X.
Information from previous levels is captured by maintaining an RHS+ candidate set for each set.
Traversal in the Search Space
During the bottom-up, breadth-first search, the stopping criteria at a node are:
1. The AFD confidence becomes 1, and thus it is an FD (FD-based pruning).
2. The Specificity value of X is greater than the maximum value given (Specificity-based pruning).
Example: if A → C is an FD, then C is removed from RHS+(ABC).
Computing Confidence and Specificity
The methods are based on representing attribute sets by equivalence-class partitions of the set of tuples: ∏X is the collection of equivalence classes of tuples for attribute set X.
Example: ∏Make = {{1, 2, 3, 4, 5}, {6, 7, 8}}, ∏Model = {{1, 2, 3}, {4, 5}, {6}, {7, 8}}, ∏{Make ∪ Model} = {{1, 2, 3}, {4, 5}, {6}, {7, 8}}
An FD X → A holds if and only if ∏X = ∏X∪A.
For the AFD (X ~~> A), Confidence = 1 − g3(X ~~> A). In this example, Confidence(Model ~~> Make) = 1 and Confidence(Make ~~> Model) = 5/8.
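A minimal sketch of the partition-based computation, using tuple IDs 1-8 so the output matches ∏Make and ∏Model above (classes are printed as sorted lists). The row values, including the placeholder models Civic and Corolla, are hypothetical. Because ∏X∪A always refines ∏X, the FD test only needs to compare class counts; confidence keeps, for each class of ∏X, its largest sub-class in ∏X∪A.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes (sorted lists of tuple IDs) agreeing on every attribute in attrs."""
    classes = defaultdict(list)
    for tid, row in sorted(rows.items()):
        classes[tuple(row[a] for a in attrs)].append(tid)
    return sorted(classes.values())

def fd_holds(rows, X, A):
    # ∏(X ∪ A) refines ∏X, so they are equal exactly when the class counts match.
    return len(partition(rows, X)) == len(partition(rows, X + [A]))

def confidence(rows, X, A):
    """1 - g3: for each class of ∏X, keep its largest sub-class in ∏(X ∪ A)."""
    largest = defaultdict(int)
    for cls in partition(rows, X + [A]):
        x_value = tuple(rows[cls[0]][x] for x in X)
        largest[x_value] = max(largest[x_value], len(cls))
    return sum(largest.values()) / len(rows)

models = ["Accord"] * 3 + ["Civic"] * 2 + ["Corolla"] + ["Camry"] * 2
rows = {tid: {"Make": "Honda" if tid <= 5 else "Toyota", "Model": m}
        for tid, m in enumerate(models, start=1)}

print(partition(rows, ["Make"]))            # [[1, 2, 3, 4, 5], [6, 7, 8]]
print(partition(rows, ["Model"]))           # [[1, 2, 3], [4, 5], [6], [7, 8]]
print(fd_holds(rows, ["Model"], "Make"))    # True
print(confidence(rows, ["Make"], "Model"))  # 0.625
```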
Algorithms
Algorithm AFDMiner, for each level L_l:
• Computes Confidence
• Applies FD-based pruning
• Computes Specificity and applies Specificity-based pruning
• Computes level L_{l+1}, which contains only those attribute sets of size l+1 whose subsets of size l are all in L_l
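A compact sketch of such a level-wise search, under stated assumptions: it indexes lattice nodes by determining set X (rather than testing (X \ {A}) ~~> A at node X as AFDMiner does), carries a per-node set of still-open dependent attributes in the spirit of RHS+, and assumes Specificity(X) is the entropy of the X-value distribution divided by log2(N), a reading of the slides rather than their stated formula. Function names, the tiny carDB and the thresholds are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

def _values(row, attrs):
    return tuple(row[a] for a in attrs)

def confidence(rows, X, A):
    """1 - g3 for X ~~> A."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[_values(row, X)][row[A]] += 1
    return sum(max(c.values()) for c in groups.values()) / len(rows)

def specificity(rows, X):
    """Assumed definition: entropy of the X-value distribution over log2(N)."""
    n = len(rows)
    sizes = Counter(_values(row, X) for row in rows).values()
    return -sum(c / n * math.log2(c / n) for c in sizes) / math.log2(n)

def mine_afds(rows, attrs, min_conf, max_spec, max_len):
    """Level-wise search over determining sets X with Specificity and FD pruning."""
    results = []
    # Level 1: singleton determining sets; each maps to its open dependent candidates.
    level = {frozenset([a]): set(attrs) - {a} for a in attrs}
    for size in range(1, max_len + 1):
        survivors = {}
        for X, rhs in level.items():
            xs = sorted(X)
            if specificity(rows, xs) > max_spec:
                continue  # every superset is at least as specific: drop X and its supersets
            remaining = set(rhs)
            for A in sorted(rhs):
                conf = confidence(rows, xs, A)
                if conf > min_conf:
                    results.append((tuple(xs), A, conf))
                if conf == 1.0:
                    remaining.discard(A)  # X -> A is an FD: no superset of X needs A
            survivors[X] = remaining
        # Next level: size+1 sets whose size-l subsets all survived (apriori-style join).
        level = {}
        for X, Y in combinations(survivors, 2):
            Z = X | Y
            if len(Z) != size + 1 or Z in level:
                continue
            subsets = [Z - {b} for b in Z]
            if all(s in survivors for s in subsets):
                level[Z] = set.intersection(*(survivors[s] for s in subsets)) - Z
    return results

models = ["Accord"] * 3 + ["Civic"] * 2 + ["Corolla"] + ["Camry"] * 2
rows = [{"Make": "Honda" if i < 5 else "Toyota", "Model": m} for i, m in enumerate(models)]
for afd in mine_afds(rows, ["Make", "Model"], min_conf=0.6, max_spec=0.7, max_len=2):
    print(afd)
```

Skipping a pruned X during the join is safe because Specificity is monotone: any superset would exceed maxSpecificity as well. Lowering max_spec to 0.5 in this toy table prunes the Model node, so only Make ~~> Model is reported.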
Empirical Evaluation: Experimental Setup
• Data sets: CensusDB (199,523 tuples, 30 attributes), MushroomDB (8,124 tuples, 23 attributes)
• Parameters for AFDMiner: minConf, maxSpecificity, number of tuples, number of attributes, maximum length of the determining set
The aim of the experiments is to show that the dual-measure approach (AFDMiner, using both Confidence and Specificity) outperforms the single-measure approach (No_Specificity, which uses Confidence alone).
No_Specificity: a modified version of AFDMiner that uses only Confidence, not Specificity. Thus, it generates all AFDs (X ~~> A) with Confidence(X ~~> A) > minConf.
Evaluating Quality
BestAFD: the most confident AFD among all the AFDs with attribute A as their dependent attribute.
Classification task:
• A classifier is run with the determining set of the BestAFD as features
• Used 10-fold cross-validation and computed the average classification accuracy (Weka toolkit)
• Evaluated over CensusDB
[Figure (CensusDB): classification accuracy, No_InfoSupport vs. AFDMiner]
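BestAFD selection is a simple arg-max over the mined rules. A sketch, assuming AFDs are represented as (determining set, dependent attribute, confidence) triples; the function name and the sample output (including the invented attribute "Year") are hypothetical, and the Weka classification run itself is not reproduced.

```python
def best_afd(afds, target):
    """Highest-confidence AFD whose dependent attribute is `target`, or None.

    afds: iterable of (determining_set, dependent_attribute, confidence).
    Ties keep the first AFD encountered.
    """
    candidates = [afd for afd in afds if afd[1] == target]
    return max(candidates, key=lambda afd: afd[2], default=None)

# Hypothetical mining output; "Year" is an invented attribute.
afds = [(("Make",), "Model", 0.625),
        (("Make", "Year"), "Model", 0.82),
        (("Model",), "Make", 1.0)]
determining_set, _, _ = best_afd(afds, "Model")
print(determining_set)  # ('Make', 'Year'): the features handed to the classifier
```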
Evaluating Quality
Average classification accuracy for all attributes (minConf = 0.8; maxSpecificity = 0.4).
This shows that Specificity is effective in generating better-quality AFDs.
[Figure (CensusDB): No_Specificity vs. AFDMiner; panel: Choosing minConf]
Choosing maxSpecificity
Classification accuracy (varying maxSpecificity):
• Threshold too low ⇒ good rules are pruned
• Threshold too high ⇒ bad rules are not pruned
Classification accuracy approximately forms a double-elbow-shaped curve.
[Figures (CensusDB): classification accuracy and time taken (ms), each plotted against MaxSpecificity]
Choosing maxSpecificity
The time to compute AFDs increases with increasing maxSpecificity, and the rate of change varies.
A good threshold value for Specificity (i.e., maxSpecificity) is the value at the first elbow of the quality graph.
[Figure: time taken (ms) vs. MaxSpecificity, with the best value marked]
Query Throughput
[Figure: number of tuples retrieved per attribute, AFDMiner vs. No_Specificity (labelled No_InfoSupport in the plot)]
Number of tuples returned for the top-10 queries on each distinct determining set (denotes query throughput).
Discussion on TANE
• Primarily designed to generate FDs; a modified version generates approximate dependencies
• Uses the error metric g3 for AFDs
• Bottom-up search in the lattice
• Generates only minimal dependencies
• Uses pruning applicable to FDs
Comparison (AFDMiner vs. TANE)
TANENOMINP is a modified version of TANE that does not stop at minimal dependencies.
minConf is 0.8 (thus, we set the g3 threshold to 0.2).
AFDMiner outperforms both approaches, strengthening the argument that AFDs with high confidence and reasonable Specificity are the best.
Evaluating Performance
• Time varies linearly with the number of tuples; AFDMiner takes less time than No_Specificity.
• Time varies exponentially with the number of attributes; AFDMiner completes much faster than No_Specificity.
[Figures (CensusDB): time taken (ms) vs. number of tuples, and vs. number of attributes; No_Specificity vs. AFDMiner]
Evaluating Performance
[Figures (MushroomDB, CensusDB): time taken (ms) vs. number of tuples; time taken (ms) vs. number of attributes; number of candidates visited vs. length of determining set in each AFD; time taken (ms) vs. length of determining set in each AFD; each comparing No_Specificity and AFDMiner]
These experiments show that AFDMiner is fast.
Conclusion
• Introduced a novel perspective for AFDs: condensed roll-ups of association rules
• Two metrics for AFDs: Confidence and Specificity
• Algorithm AFDMiner: finds all AFDs with Confidence > minConf and Specificity < maxSpecificity, using a bottom-up, breadth-first search in the set-containment lattice of attributes, with pruning based on Specificity
• Experiments: AFDMiner generates high-quality AFDs (high Confidence, reasonable Specificity) faster
A version of this thesis is currently under review at ICDE ’09.
Future Direction: Conditional Functional Dependencies (CFDs)
• Dependencies of the form ({ZipCode → City} if Country = ”England”), i.e., holding true only for certain values of one or more other attributes
• CAFDs (conditional AFDs) are the probabilistic counterpart of CFDs
• CFDs and CAFDs have recently been applied in data cleaning and value prediction, but mining these conditional rules is unexplored
• Intuitively, CFDs are intermediate rules between association rules (value level) and FDs (attribute level), so we believe that our approach can help in generating them!
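To pin down what a CFD asserts, here is a sketch that checks an FD only on the tuples satisfying a condition. The rows, the placeholder zip codes Z1/Z2, and the helper name are hypothetical; a CAFD check would replace the exact test with a confidence threshold computed on the same filtered rows.

```python
def holds_conditional_fd(rows, X, A, condition):
    """True if X -> A holds on the rows matching every attribute=value pair in condition."""
    seen = {}
    for row in rows:
        if all(row[k] == v for k, v in condition.items()):
            key = tuple(row[x] for x in X)
            if seen.setdefault(key, row[A]) != row[A]:
                return False
    return True

# Hypothetical rows; Z1/Z2 are placeholder zip codes.
rows = [
    {"Country": "England", "ZipCode": "Z1", "City": "London"},
    {"Country": "England", "ZipCode": "Z1", "City": "London"},
    {"Country": "England", "ZipCode": "Z2", "City": "Leeds"},
    {"Country": "USA", "ZipCode": "Z1", "City": "Springfield"},
]
print(holds_conditional_fd(rows, ["ZipCode"], "City", {}))                      # False
print(holds_conditional_fd(rows, ["ZipCode"], "City", {"Country": "England"}))  # True
```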
Questions?