idea engineering
![Page 1: Idea Engineering](https://reader033.vdocument.in/reader033/viewer/2022052618/554a0f89b4c90507558b4b80/html5/thumbnails/1.jpg)
Idea Engineering
Oct’13
0. algorithm mining
1. landscape mining
2. decision mining
3. discussion mining
yesterday, today, tomorrow, future
![Page 2: Idea Engineering](https://reader033.vdocument.in/reader033/viewer/2022052618/554a0f89b4c90507558b4b80/html5/thumbnails/2.jpg)
The Premises of PROMISE (2005)
– Wanted: predictions
  • Nope. Users want decisions, or engagement
![Page 3: Idea Engineering](https://reader033.vdocument.in/reader033/viewer/2022052618/554a0f89b4c90507558b4b80/html5/thumbnails/3.jpg)
– Data mining will reveal “the truth” about SE
  • [Dejaeger: TSE’11], [Hall: TSE’12], [Shepperd: COW’13]
  • Not (better learners = better conclusions)
![Page 4: Idea Engineering](https://reader033.vdocument.in/reader033/viewer/2022052618/554a0f89b4c90507558b4b80/html5/thumbnails/4.jpg)
– Sooner or later: enough data for general conclusions
  • Found more differences than generalities
  • Special issues: [IST’13], [ESEj’13]
  • Best papers, ASE’11, MSR’12
  • Menzies, Zimmermann et al. [TSE’13]
  • Lots of local models
![Page 5: Idea Engineering](https://reader033.vdocument.in/reader033/viewer/2022052618/554a0f89b4c90507558b4b80/html5/thumbnails/5.jpg)
Landscape mining: look before you leap
• Report what is true about the data
  – Not trivia on how algorithms walk that data
• Map the landscape
  – Reason on each part of the map
• E.g. landscape mining
  – Unsupervised iterative dichotomization
  – Cluster, prune
  – Then generate rules
![Page 6: Idea Engineering](https://reader033.vdocument.in/reader033/viewer/2022052618/554a0f89b4c90507558b4b80/html5/thumbnails/6.jpg)
• Different to “leap before you look”
  – i.e. skew learning by class variable
  – then study the results
• E.g. C4.5, CART, Fayyad-Irani, etc.
  – Supervised iterative dichotomization
• E.g. 61% of 300+ effort estimation papers
  – Algorithm tinkering, without end
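The slides do not say how the unsupervised iterative dichotomization is implemented. The following is only a minimal sketch under stated assumptions: rows are lists of numbers, the split axis is the attribute with the widest range, and the cut point is the median. The names `dichotomize` and `min_size` are hypothetical. The point it illustrates is that no class variable is consulted while the data is being divided.

```python
def dichotomize(rows, min_size=4):
    """Recursively halve `rows` without ever looking at a class variable.

    Assumption (not from the slides): split on the attribute with the
    widest range, at the median row. Returns a list of leaf clusters.
    """
    if len(rows) < 2 * min_size:
        return [rows]  # too small to split into two leaves of min_size

    def spread(dim):
        values = [row[dim] for row in rows]
        return max(values) - min(values)

    # Pick the attribute along which the data is most spread out.
    dim = max(range(len(rows[0])), key=spread)
    ordered = sorted(rows, key=lambda row: row[dim])
    mid = len(ordered) // 2
    return (dichotomize(ordered[:mid], min_size) +
            dichotomize(ordered[mid:], min_size))


rows = [[i, 0] for i in range(16)]
leaves = dichotomize(rows, min_size=4)
```

A supervised learner such as C4.5 or CART would instead choose each split to separate the class values, which is the "leap before you look" contrast drawn on this slide.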
![Page 7: Idea Engineering](https://reader033.vdocument.in/reader033/viewer/2022052618/554a0f89b4c90507558b4b80/html5/thumbnails/7.jpg)
Find landscape = cluster data, assign “heights”
Find decisions = report deltas between highs and lows
Monitor discussions = watch and help communities explore deltas
IDEA Engineering = <landscape, decisions, discussion>
![Page 8: Idea Engineering](https://reader033.vdocument.in/reader033/viewer/2022052618/554a0f89b4c90507558b4b80/html5/thumbnails/8.jpg)
Spectral Landscape Mining
• Spectrum = condition that is not limited to a specific set of values but varies in a continuum.
• Groups together a broad range of conditions or behaviors under one single title
• In mathematics, the spectrum of a (finite-dimensional) matrix is the set of its eigenvalues.
• Nyström algorithms: approximations to eigenvalues
  – FASTMAP: linear time
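To illustrate why FASTMAP runs in linear time, here is a minimal sketch of one FastMap projection step, assuming numeric rows and Euclidean distance. The function names (`dist`, `fastmap_axis`) and the `seed` parameter are illustrative only and are not taken from the slides. Two distant pivots are found with two passes over the data, then every row is placed on the line between them using the cosine rule.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fastmap_axis(rows, seed=1):
    """Project each row onto the line between two distant pivots.

    Uses a constant number of passes over `rows`, so the cost is
    linear in the number of rows.
    """
    rng = random.Random(seed)
    anyone = rng.choice(rows)
    east = max(rows, key=lambda r: dist(anyone, r))  # far from a random row
    west = max(rows, key=lambda r: dist(east, r))    # far from east
    c = dist(east, west)
    if c == 0:
        return [0.0 for _ in rows]  # all rows are identical
    projected = []
    for row in rows:
        a = dist(row, east)
        b = dist(row, west)
        # Cosine rule: position of `row` along the east-west line.
        projected.append((a * a + c * c - b * b) / (2 * c))
    return projected


rows = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0]]
xs = fastmap_axis(rows)
```

Repeating the step on the residual distances would give a second axis; that part is omitted here.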
![Page 9: Idea Engineering](https://reader033.vdocument.in/reader033/viewer/2022052618/554a0f89b4c90507558b4b80/html5/thumbnails/9.jpg)
Project data onto the first 2 PCA components; grid that data
e.g. Nasa93dem
1) Project 23 dimensions into 2
2a) Cluster
2b) Replace clusters with centroids
MOEA: score = effort + defects + months
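Steps 2a and 2b can be sketched as follows. This is a minimal illustration that assumes the rows have already been projected into two dimensions and that each carries a single score, taken here as the plain sum effort + defects + months (lower is better, no normalization). The names `score`, `grid_centroids` and `bins` are hypothetical.

```python
from collections import defaultdict


def score(effort, defects, months):
    """Assumed scoring: a plain sum of the three objectives."""
    return effort + defects + months


def grid_centroids(points, bins=4):
    """Grid 2-D points and replace each non-empty cell by its centroid.

    points: list of (x, y, score) tuples.
    Returns {cell: (mean_x, mean_y, mean_score, count)}.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def cell(value, lo, hi):
        if hi == lo:
            return 0
        # Clamp so the maximum value falls in the last bin.
        return min(bins - 1, int((value - lo) / (hi - lo) * bins))

    groups = defaultdict(list)
    for x, y, s in points:
        key = (cell(x, min(xs), max(xs)), cell(y, min(ys), max(ys)))
        groups[key].append((x, y, s))

    centroids = {}
    for key, members in groups.items():
        n = len(members)
        means = tuple(sum(m[i] for m in members) / n for i in range(3))
        centroids[key] = means + (n,)
    return centroids


points = [
    (0.0, 0.0, score(1, 2, 3)),
    (0.1, 0.1, score(3, 2, 1)),
    (1.0, 1.0, score(10, 10, 10)),
]
cents = grid_centroids(points, bins=2)
```

Each centroid's mean score plays the role of the "height" of that part of the landscape.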
![Page 10: Idea Engineering](https://reader033.vdocument.in/reader033/viewer/2022052618/554a0f89b4c90507558b4b80/html5/thumbnails/10.jpg)
Sanity check: What information loss?
• E.g. POI-3
  – 400+ examples
  – 20 centroids
• Prediction via:
  – Extrapolation between two nearest centroids
• Works as well as
  – Random forest, Naïve Bayes
    • For defect prediction (10 data sets)
  – Linear regression, M5’
    • For effort estimation (10 data sets)
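The slide does not give the extrapolation formula. One plausible reading, shown here purely as an assumption, is to project the test row onto the line through its two nearest centroids and linearly interpolate (or extrapolate) the target value along that line. The name `predict` and the data layout are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict(x, centroids):
    """Estimate a target for `x` from its two nearest centroids.

    centroids: list of (position, target) pairs, at least two of them.
    Assumption: linear change of the target along the line joining
    the two nearest centroids.
    """
    nearest = sorted(centroids, key=lambda c: dist(x, c[0]))[:2]
    (p1, y1), (p2, y2) = nearest
    axis = [b - a for a, b in zip(p1, p2)]
    length2 = sum(v * v for v in axis)
    if length2 == 0:
        return (y1 + y2) / 2  # coincident centroids: just average
    # t = 0 at p1, t = 1 at p2; values outside [0, 1] extrapolate.
    t = sum((xi - a) * v for xi, a, v in zip(x, p1, axis)) / length2
    return y1 + t * (y2 - y1)


centroids = [((0, 0), 10), ((2, 0), 20), ((10, 10), 100)]
between = predict((1, 0), centroids)
beyond = predict((3, 0), centroids)
```

With 20 centroids standing in for 400+ examples, each prediction only has to look at 20 candidates.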
![Page 11: Idea Engineering](https://reader033.vdocument.in/reader033/viewer/2022052618/554a0f89b4c90507558b4b80/html5/thumbnails/11.jpg)
Planning = Inter-cluster contrast sets
• Find delta between neighbors that go worse to better
• Very small rules, found in log-linear time
• Menzies et al. [TSE’13]
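A contrast set between neighboring clusters can be sketched like this. It is a minimal illustration, not the method of the cited paper: it assumes each cluster is summarized by a centroid with numeric attributes and a score where lower is better, and it reports the attribute changes from each centroid to its nearest better-scoring neighbor. The attribute names (`team_size`, `reuse`) and the function names are made up for the example.

```python
import math


def gap(a, b):
    """Euclidean distance between two attribute dicts with the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def contrast_sets(centroids):
    """For each centroid, list what changes on the way to a better neighbor.

    centroids: list of {"attrs": {name: value}, "score": number},
    lower score is better. Returns [(index, {name: (old, new)})].
    """
    plans = []
    for i, worse in enumerate(centroids):
        better = [c for c in centroids if c["score"] < worse["score"]]
        if not better:
            continue  # already the best cluster, nothing to recommend
        target = min(better, key=lambda c: gap(c["attrs"], worse["attrs"]))
        delta = {
            name: (old, target["attrs"][name])
            for name, old in worse["attrs"].items()
            if old != target["attrs"][name]
        }
        plans.append((i, delta))
    return plans


centroids = [
    {"attrs": {"team_size": 10, "reuse": 1}, "score": 30},
    {"attrs": {"team_size": 10, "reuse": 3}, "score": 12},
    {"attrs": {"team_size": 4, "reuse": 3}, "score": 9},
]
plans = contrast_sets(centroids)
```

Because only nearby centroids are compared, the resulting rules mention few attributes, which matches the "very small rules" claim above.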
![Page 12: Idea Engineering](https://reader033.vdocument.in/reader033/viewer/2022052618/554a0f89b4c90507558b4b80/html5/thumbnails/12.jpg)
Applications
• Prediction
• Planning
• Monitoring
• Multi-objective optimization
  – Cluster first on N objectives
• Anomaly detection
• Incremental theory revision
• Compression
• Privacy
• etc.
![Page 13: Idea Engineering](https://reader033.vdocument.in/reader033/viewer/2022052618/554a0f89b4c90507558b4b80/html5/thumbnails/13.jpg)
Idea Engineering
0. algorithm mining
1. landscape mining
2. decision mining
3. discussion mining
yesterday, today, tomorrow, future
Beyond Data Mining, T. Menzies, IEEE Software, 2013, to appear
Q: Why call it mining?
• A1: because all the primitives for the above are in the data mining literature
  – So we know how to get from here to there
• A2: because data mining scales