Post on 14-Dec-2015


Data Mining: 5. Data Mining Research, Romi Satria Wahono, romi@romisatriawahono.net, WA/SMS: +6281586220090

Data Mining: 5. Data Mining Research

Romi Satria Wahono
romi@romisatriawahono.net
http://romisatriawahono.net/dm
WA/SMS: +6281586220090


Romi Satria Wahono

• SD Sompok Semarang (1987)
• SMPN 8 Semarang (1990)
• SMA Taruna Nusantara Magelang (1993)
• B.Eng, M.Eng and Ph.D in Software Engineering from Saitama University, Japan (1994-2004) and Universiti Teknikal Malaysia Melaka (2014)
• Research Interests: Software Engineering, Machine Learning
• Founder and Coordinator of IlmuKomputer.Com
• Researcher at LIPI (2004-2007)
• Founder and CEO of PT Brainmatics Cipta Informatika


Course Outline

1. Introduction to Data Mining
2. The Data Mining Process
3. Evaluation and Validation in Data Mining
4. Data Mining Methods and Algorithms
5. Data Mining Research


5. Data Mining Research

1. Standard Research Process in Data Mining
2. Common Problems in Data Mining Research
3. Journal Publications on Data Mining


1. Standard Research Process in Data Mining


Data Mining Standard Process (CRISP-DM)

• A cross-industry standard was clearly required that is industry-neutral, tool-neutral, and application-neutral
• The Cross-Industry Standard Process for Data Mining (CRISP-DM) was developed in 1996 (Chapman, 2000)
• CRISP-DM provides a nonproprietary and freely available standard process for fitting data mining into the general problem-solving strategy of a business or research unit


CRISP-DM


1. Business Understanding Phase

• Enunciate the project objectives and requirements clearly in terms of the business or research unit as a whole
• Translate these goals and restrictions into the formulation of a data mining problem definition
• Prepare a preliminary strategy for achieving these objectives


2. Data Understanding Phase

• Collect the data
• Use exploratory data analysis to familiarize yourself with the data and discover initial insights
• Evaluate the quality of the data
• If desired, select interesting subsets that may contain actionable patterns
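
The exploratory step in this phase can be sketched in a few lines of Python. This is only an illustrative sketch: the records, the field names `age` and `income`, and the helper `summarize` are hypothetical and do not come from the slides. It counts missing values (a simple data-quality check) and reports basic distribution statistics per field.

```python
from statistics import mean, median

# Hypothetical records (assumption for illustration); None marks a missing value.
records = [
    {"age": 25, "income": 3200.0},
    {"age": 41, "income": None},
    {"age": 37, "income": 5100.0},
    {"age": None, "income": 4700.0},
    {"age": 52, "income": 8800.0},
]

def summarize(rows, field):
    """Return basic quality and distribution facts for one numeric field."""
    values = [r[field] for r in rows if r[field] is not None]
    return {
        "missing": len(rows) - len(values),  # data-quality indicator
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

summary = {field: summarize(records, field) for field in ("age", "income")}
for field, stats in summary.items():
    print(field, stats)
```

A summary like this is usually the starting point for deciding which subsets or variables deserve a closer look.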


3. Data Preparation Phase

• Prepare from the initial raw data the final data set that is to be used for all subsequent phases. This phase is very labor intensive
• Select the cases and variables you want to analyze and that are appropriate for your analysis
• Perform transformations on certain variables, if needed
• Clean the raw data so that it is ready for the modeling tools
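
As a rough illustration of the three preparation activities listed above (selecting variables, cleaning, transforming), the following Python sketch works on a small hypothetical table. The column layout, the choice to drop incomplete cases, and the min-max scaling are all assumptions made for the example; real projects may prefer imputation or other transformations.

```python
# Hypothetical raw rows: (age, income, notes); None marks a missing value.
raw = [
    (25, 3200.0, "a"),
    (41, None, "b"),
    (37, 5100.0, "c"),
    (52, 8800.0, "d"),
]

def prepare(rows):
    """Select variables, drop incomplete cases, and min-max scale each column."""
    # Select: keep only the numeric variables of interest (drop the notes column).
    selected = [(age, income) for age, income, _ in rows]
    # Clean: keep only complete cases.
    complete = [row for row in selected if None not in row]
    # Transform: rescale each column to the range [0, 1].
    scaled_columns = []
    for col in zip(*complete):
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        scaled_columns.append([(v - lo) / span for v in col])
    return [tuple(row) for row in zip(*scaled_columns)]

prepared = prepare(raw)
print(prepared)
```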


4. Modeling Phase

• Select and apply appropriate modeling techniques
• Calibrate model settings to optimize results
• Remember that often, several different techniques may be used for the same data mining problem
• If necessary, loop back to the data preparation phase to bring the form of the data into line with the specific requirements of a particular data mining technique
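
The idea of trying several techniques and calibrating their settings can be sketched as follows. Everything here is an assumption for illustration: the toy data, the two techniques (a majority-class baseline and k-nearest neighbours), and the candidate values of `k` are not taken from the slides.

```python
from collections import Counter

# Hypothetical toy data: (feature vector, class label).
train = [((1.0, 1.0), "no"), ((1.2, 0.8), "no"), ((0.9, 1.3), "no"),
         ((3.0, 3.2), "yes"), ((3.1, 2.9), "yes"), ((2.8, 3.0), "yes")]
valid = [((1.1, 1.0), "no"), ((2.9, 3.1), "yes"),
         ((1.0, 0.9), "no"), ((3.2, 3.0), "yes")]

def majority_baseline(train_rows):
    """Technique 1: always predict the most common training label."""
    label = Counter(lbl for _, lbl in train_rows).most_common(1)[0][0]
    return lambda x: label

def knn(train_rows, k):
    """Technique 2: k-nearest neighbours using squared Euclidean distance."""
    def predict(x):
        def distance(row):
            return sum((a - b) ** 2 for a, b in zip(row[0], x))
        nearest = sorted(train_rows, key=distance)[:k]
        return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
    return predict

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in rows) / len(rows)

# Apply several techniques and calibrate the setting k on held-out data.
candidates = {"baseline": majority_baseline(train)}
for k in (1, 3, 5):
    candidates[f"knn(k={k})"] = knn(train, k)

scores = {name: accuracy(model, valid) for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Comparing against a trivial baseline is a cheap way to check that a more elaborate technique is actually adding value.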


5. Evaluation Phase

• Evaluate the one or more models delivered in the modeling phase for quality and effectiveness before deploying them for use in the field
• Determine whether the model in fact achieves the objectives set for it in the first phase
• Establish whether some important facet of the business or research problem has not been accounted for sufficiently
• Come to a decision regarding use of the data mining results
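
One way to make "does the model achieve the objectives set in the first phase" concrete is to compare measured metrics against explicit targets. The sketch below assumes a binary classifier; the labels, the `objectives` thresholds, and the helper `evaluate` are hypothetical values chosen for the example.

```python
def evaluate(actual, predicted, positive="yes"):
    """Compute accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical labels, and hypothetical targets from the business understanding phase.
actual    = ["yes", "yes", "no", "no", "yes", "no",  "no", "yes"]
predicted = ["yes", "no",  "no", "no", "yes", "yes", "no", "yes"]
objectives = {"accuracy": 0.70, "recall": 0.80}

metrics = evaluate(actual, predicted)
meets_objectives = all(metrics[name] >= target for name, target in objectives.items())
print(metrics, meets_objectives)
```

In this example the accuracy target is met but the recall target is not, so the decision would be to revisit earlier phases rather than deploy.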


6. Deployment Phase

• Make use of the models created: Model creation does not signify the completion of a project
• Example of a simple deployment: Generate a report
• Example of a more complex deployment: Implement a parallel data mining process in another department
• For businesses, the customer often carries out the deployment based on your model
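
The six phases described above can be summarized as a small state machine. This is a simplified sketch, not the full CRISP-DM model: it only encodes the forward order of the phases plus the one loop mentioned in the slides (from modeling back to data preparation). The names `Phase`, `next_phase`, and `run` are hypothetical.

```python
from enum import Enum

class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6

def next_phase(current, needs_rework=False):
    """Advance one phase; modeling may loop back to data preparation."""
    if current is Phase.MODELING and needs_rework:
        return Phase.DATA_PREPARATION
    if current is Phase.DEPLOYMENT:
        return None  # the project cycle is finished
    return Phase(current.value + 1)

def run(rework_once=True):
    """Walk the phases from start to finish and return the visited phase names."""
    trace = []
    phase = Phase.BUSINESS_UNDERSTANDING
    pending_rework = rework_once
    while phase is not None:
        trace.append(phase.name)
        rework = pending_rework and phase is Phase.MODELING
        if rework:
            pending_rework = False  # loop back only on the first modeling pass
        phase = next_phase(phase, needs_rework=rework)
    return trace

print(run())
```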


Exercise

• Study and understand Case Studies 1-5 in Larose (2005), Chapter 1

• Study and understand how CRISP-DM is applied in the thesis by Firmansyah (2011) on using the C4.5 algorithm to determine credit eligibility


2. Common Problems in Data Mining Research


Major Issues in Data Mining Research

• Mining Methodology
  • Mining various and new kinds of knowledge
  • Mining knowledge in multi-dimensional space
  • Data mining: An interdisciplinary effort
  • Boosting the power of discovery in a networked environment
  • Handling noise, uncertainty, and incompleteness of data
  • Pattern evaluation and pattern- or constraint-guided mining

• User Interaction
  • Interactive mining
  • Incorporation of background knowledge
  • Presentation and visualization of data mining results


Major Issues in Data Mining Research

• Efficiency and Scalability
  • Efficiency and scalability of data mining algorithms
  • Parallel, distributed, stream, and incremental mining methods

• Diversity of Data Types
  • Handling complex types of data
  • Mining dynamic, networked, and global data repositories

• Data Mining and Society
  • Social impacts of data mining
  • Privacy-preserving data mining
  • Invisible data mining


3. Journal Publications on Data Mining


Transactions and Journals

• Review Paper (survey and state-of-the-art):
  • ACM Computing Surveys (CSUR)

• Research Paper (technical):
  • ACM Transactions on Knowledge Discovery from Data (TKDD)
  • ACM Transactions on Information Systems (TOIS)
  • IEEE Transactions on Knowledge and Data Engineering
  • Springer Data Mining and Knowledge Discovery
  • International Journal of Business Intelligence and Data Mining (IJBIDM)


Cognitive Assignment III

1. Read the papers at http://romisatriawahono.net/lecture/dm/paper/

2. Summarize each paper as a set of slides with the following structure:
   1. Research Background
   2. Problem Statements
   3. Research Questions
   4. Research Objective
   5. Existing Methods
   6. Proposed Method
   7. Results
   8. Conclusion

3. Present them in front of the class at the next lecture


References

1. Ian H. Witten, Eibe Frank, Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Elsevier, 2011

2. Daniel T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005

3. Florin Gorunescu, Data Mining: Concepts, Models and Techniques, Springer, 2011

4. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 3rd Edition, Elsevier, 2012

5. Oded Maimon and Lior Rokach, Data Mining and Knowledge Discovery Handbook, 2nd Edition, Springer, 2010

6. Warren Liao and Evangelos Triantaphyllou (eds.), Recent Advances in Data Mining of Enterprise Data: Algorithms and Applications, World Scientific, 2007
