A Brief History of Data Mining Society
- 1989 IJCAI Workshop on Knowledge Discovery in Databases
- Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
- 1991-1994 Workshops on Knowledge Discovery in Databases
- Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
- 1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD’95-98)
- Journal of Data Mining and Knowledge Discovery (1997)
- ACM SIGKDD conferences since 1998, and SIGKDD Explorations
- More conferences on data mining: PAKDD (1997), PKDD (1997), SIAM Data Mining (2001), (IEEE) ICDM (2001), etc.
- ACM Transactions on KDD starting in 2007
Conferences and Journals on Data Mining
KDD conferences:
- ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
- SIAM Data Mining Conf. (SDM)
- (IEEE) Int. Conf. on Data Mining (ICDM)
- Conf. on Principles and Practices of Knowledge Discovery and Data Mining (PKDD)
- Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
Where to Find References? DBLP, CiteSeer, Google

Data mining and KDD (SIGKDD: CDROM)
- Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
- Journals: Data Mining and Knowledge Discovery, KDD Explorations, ACM TKDD

Bioinformatics
- Conferences: RECOMB, CSB, PSB, BIBE, etc.
- Journals: Bioinformatics, BMC Bioinformatics, TCBB, …
Top-10 Algorithms Finally Selected at ICDM’06
1. Decision Tree (61 votes)
2. K-Means (60 votes)
3. SVM (58 votes)
4. Apriori (52 votes)
5. EM (48 votes)
6. PageRank (46 votes)
7. AdaBoost (45 votes)
8. kNN (45 votes)
9. Naive Bayes (45 votes)
10. CART (34 votes)
Association Rules
- support, s: the probability that a transaction contains X ∪ Y
- confidence, c: the conditional probability that a transaction containing X also contains Y
Let’s work through an example:
| TID | Items |
| --- | --- |
| T100 | 1, 2, 5 |
| T200 | 2, 4 |
| T300 | 2, 3 |
| T400 | 1, 2, 4 |
| T500 | 1, 3 |
| T600 | 2, 3 |
| T700 | 1, 3 |
| T800 | 1, 2, 3, 5 |
| T900 | 1, 2, 3 |
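With support and confidence defined as above, they can be computed directly from this transaction table. A minimal Python sketch (the rule and the helper names are my own, chosen for illustration):

```python
# Transaction database from the table above (items encoded as integers).
transactions = {
    "T100": {1, 2, 5}, "T200": {2, 4}, "T300": {2, 3},
    "T400": {1, 2, 4}, "T500": {1, 3}, "T600": {2, 3},
    "T700": {1, 3}, "T800": {1, 2, 3, 5}, "T900": {1, 2, 3},
}

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions.values() if itemset <= t)
    return hits / len(transactions)

def confidence(X, Y):
    """Conditional probability that a transaction containing X also contains Y."""
    return support(X | Y) / support(X)

# For the rule {1, 2} => {5}: {1, 2, 5} appears in 2 of 9 transactions,
# and {1, 2} appears in 4 of 9.
print(round(support({1, 2} | {5}), 3))   # -> 0.222
print(round(confidence({1, 2}, {5}), 3)) # -> 0.5
```

So the rule {1, 2} ⇒ {5} has support 2/9 and confidence 2/4.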
Classification
Classification—A Two-Step Process
Classification constructs a model based on the training set and the values (class labels) of a classifying attribute, then uses that model to classify new data. It predicts categorical class labels (discrete or nominal).
Typical applications: credit approval, target marketing, medical diagnosis, fraud detection, and much more.
| age | income | student | credit_rating | buys_computer |
| --- | --- | --- | --- | --- |
| <=30 | high | no | fair | no |
| <=30 | high | no | excellent | no |
| 31…40 | high | no | fair | yes |
| >40 | medium | no | fair | yes |
| >40 | low | yes | fair | yes |
| >40 | low | yes | excellent | no |
| 31…40 | low | yes | excellent | yes |
| <=30 | medium | no | fair | no |
| <=30 | low | yes | fair | yes |
| >40 | medium | yes | fair | yes |
| <=30 | medium | yes | excellent | yes |
| 31…40 | medium | no | excellent | yes |
| 31…40 | high | yes | fair | yes |
| >40 | medium | no | excellent | no |
Decision Tree

Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure, where:
- each internal node denotes a test on an attribute,
- each branch represents an outcome of the test, and
- each leaf node holds a class label.
Decision Tree Example
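The example figure from this slide does not survive the transcript. As a stand-in, here is a small tree over the buys_computer attributes, encoded as nested dicts (a representation I am assuming, not one from the slides; the particular tree is illustrative), together with a routine that follows tests from root to leaf:

```python
# A decision tree as nested dicts: an internal node maps one attribute to its
# branches; each branch leads to a subtree or to a leaf (a class label).
# This particular tree is illustrative, not taken from the slides.
tree = {
    "age": {
        "<=30":    {"student": {"no": "no", "yes": "yes"}},
        "31...40": "yes",
        ">40":     {"credit_rating": {"excellent": "no", "fair": "yes"}},
    }
}

def classify(node, sample):
    """Walk from the root: test the node's attribute, follow the branch
    matching the sample's value, and stop at a leaf (a class label)."""
    while isinstance(node, dict):
        attribute = next(iter(node))          # attribute tested at this node
        node = node[attribute][sample[attribute]]
    return node

print(classify(tree, {"age": "<=30", "student": "yes"}))  # -> yes
```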
Decision Tree Algorithm
Basic algorithm (a greedy algorithm) Tree is constructed in a top-down recursive
divide-and-conquer manner At start, all the training examples are at the
root Attributes are categorical (if continuous-
valued, they are discretized in advance) Test attributes are selected on the basis of a
heuristic or statistical measure (e.g., information gain)
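The greedy, top-down recursion above can be sketched as follows (a schematic, with the attribute-selection measure left as a plug-in parameter; all names are my own):

```python
from collections import Counter

def majority_class(rows, target):
    """Most common class label among `rows` (used when attributes run out)."""
    return Counter(r[target] for r in rows).most_common(1)[0][0]

def build_tree(rows, attributes, target, select_attribute):
    """Top-down, recursive, divide-and-conquer tree construction.
    `select_attribute` is the pluggable measure, e.g. information gain."""
    classes = {r[target] for r in rows}
    if len(classes) == 1:           # pure node -> leaf
        return classes.pop()
    if not attributes:              # no attributes left -> majority-class leaf
        return majority_class(rows, target)
    best = select_attribute(rows, attributes, target)
    tree = {best: {}}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = build_tree(subset, remaining, target, select_attribute)
    return tree

# Toy run with a trivial selector that just picks the first attribute:
toy = [{"outlook": "sunny", "play": "no"}, {"outlook": "rain", "play": "yes"}]
print(build_tree(toy, ["outlook"], "play", lambda r, a, t: a[0]))
# -> {'outlook': {'rain': 'yes', 'sunny': 'no'}}
```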
Attribute Selection Measure: Information Gain (ID3/C4.5)

- Select the attribute with the highest information gain.
- Let $p_i$ be the probability that an arbitrary tuple in $D$ belongs to class $C_i$, estimated by $|C_{i,D}|/|D|$.
- Expected information (entropy) needed to classify a tuple in $D$:

$$\mathit{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$$
Information needed (after using $A$ to split $D$ into $v$ partitions) to classify $D$:

$$\mathit{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathit{Info}(D_j)$$

Information gained by branching on attribute $A$:

$$\mathit{Gain}(A) = \mathit{Info}(D) - \mathit{Info}_A(D)$$
Decision Tree

For the buys_computer table above, with 9 "yes" and 5 "no" tuples:

$$\mathit{Info}(D) = I(9,5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$$
$$\mathit{Info}_{age}(D) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694$$

Here $\frac{5}{14} I(2,3)$ means "age <= 30" has 5 out of 14 samples, with 2 yes's and 3 no's:

$$I(2,3) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}$$
$$\mathit{Gain}(age) = \mathit{Info}(D) - \mathit{Info}_{age}(D) = 0.246$$

Similarly, we can compute Gain(income) = 0.029, Gain(student) = 0.151, and Gain(credit_rating) = 0.048.

Since "age" obtains the highest information gain, we partition the tree on "age".
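Putting the formulas to work, here is a self-contained Python check of these numbers over the buys_computer table (function and variable names are my own):

```python
import math
from collections import Counter

# The buys_computer training table from the slides, as
# (age, income, student, credit_rating, buys_computer) tuples.
rows = [
    ("<=30",    "high",   "no",  "fair",      "no"),
    ("<=30",    "high",   "no",  "excellent", "no"),
    ("31...40", "high",   "no",  "fair",      "yes"),
    (">40",     "medium", "no",  "fair",      "yes"),
    (">40",     "low",    "yes", "fair",      "yes"),
    (">40",     "low",    "yes", "excellent", "no"),
    ("31...40", "low",    "yes", "excellent", "yes"),
    ("<=30",    "medium", "no",  "fair",      "no"),
    ("<=30",    "low",    "yes", "fair",      "yes"),
    (">40",     "medium", "yes", "fair",      "yes"),
    ("<=30",    "medium", "yes", "excellent", "yes"),
    ("31...40", "medium", "no",  "excellent", "yes"),
    ("31...40", "high",   "yes", "fair",      "yes"),
    (">40",     "medium", "no",  "excellent", "no"),
]
ATTRS = {"age": 0, "income": 1, "student": 2, "credit_rating": 3}

def info(labels):
    """Info(D) = -sum p_i log2 p_i over the class distribution of `labels`."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(attr):
    """Gain(A) = Info(D) - Info_A(D) for splitting on attribute `attr`."""
    i = ATTRS[attr]
    labels = [r[-1] for r in rows]
    info_a = sum(
        len(part) / len(rows) * info(part)
        for value in {r[i] for r in rows}
        for part in [[r[-1] for r in rows if r[i] == value]]
    )
    return info(labels) - info_a

for a in ATTRS:
    print(a, round(gain(a), 4))
```

This reproduces the slide's figures; note the slides truncate rather than round: Gain(age) is 0.2467 and Gain(student) is 0.1518, shown as 0.246 and 0.151.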