an overview of domain-driven data mining: toward actionable knowledge discovery (akd)

36
D D 3 3 M: M: Domain-Driven Data Mining An Overview of Domain-Driven Data Mining: Toward Actionable Knowledge Discovery (AKD) Longbing Cao Faculty of Engineering and Information Technology University of Technology, Sydney, Australia

Upload: sulwyn

Post on 31-Jan-2016

27 views

Category:

Documents


0 download

DESCRIPTION

An Overview of Domain-Driven Data Mining: Toward Actionable Knowledge Discovery (AKD). Longbing Cao Faculty of Engineering and Information Technology University of Technology, Sydney, Australia. Outline. Why Do We Need D 3 M What Is D 3 M The D 3 M Framework - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

DD33M

:M

: D

om

ain

-Dri

ven

Data

Min

ing

An Overview ofDomain-Driven Data Mining:

Toward Actionable Knowledge Discovery (AKD)

Longbing Cao

Faculty of Engineering and Information TechnologyUniversity of Technology, Sydney, Australia

Page 2: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 2

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Outline

Why Do We Need D3M What Is D3M The D3M Framework D3M Theoretical Underpinnings D3M Research Issues D3M Applications D3M References

Page 3: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 3

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Why Do We Need D3M A common scenario in deploying data

mining algorithms I find something interesting!

“Many patterns are found”, “They satisfy technical metric threshold well”

What do business people say? “So what?” “They are just commonsense” “I don’t care about them” “I don’t understand them” “How can I use them?”

“Am I wrong? What can I do better for my business mate?”

Page 4: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 4

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Why Do We Need D3M Where is something wrong?

Gap: academic objectives || business goals Technical outputs || business expectation

macro-level methodological and fundamental issues Academic: technical interest; innovative algorithms &

patterns Practitioner: social, environmental, organizational

factors and impact; getting a problem solved properly micro-level technical and engineering issues

System dynamics, system environment, and interaction in a system

Business processes, organizational factors, and constraints

Human and domain knowledge involvement

Page 5: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 5

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

An example: Problem with association mining Existing association rule mining algorithms are

specifically designed to find strong patterns that have high predictive accuracy or correlation;

While frequent patterns are referred to as commonsense knowledge, they can be eager to discover new and hidden patterns in databases.

Many patterns are found; How associations can be taken over by business

people seamlessly and into operationalizable actions accordingly?

Page 6: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 6

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

What Is D3M Next-generation data mining

methodologies, frameworks, algorithms, evaluation systems, tools and decision support, Cater for business environment Satisfy business needs Deliver business-friendly and decision-making

rules and actions that are of solid technical and business significance

Can be understood & taken over by business people to make decision

aim to promote the paradigm shift from data-data-centered hidden pattern miningcentered hidden pattern mining to domain-domain-driven actionable knowledge discoverydriven actionable knowledge discovery (AKD)

Page 7: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 7

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Involve and synthesize Ubiquitous Intelligence human intelligence, domain intelligence, data intelligence, network intelligence, organizational and social intelligence,

and meta-synthesis of the above ubiquitous

intelligence

Page 8: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 8

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

The D3M Framework

AKD-based problem-solving

Page 9: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 9

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Interestingness & actionability

Page 10: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 10

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Conflicts & tradeoff

Page 11: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 11

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

A framework for AKD Post-analysis-based AKD

Page 12: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 12

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

D3M Theoretical Underpinnings artificial intelligence and intelligent systems, behavior informatics and analytics, business modeling, business process management, cognitive sciences, data integration, human-machine interaction, human-centered computing, knowledge representation and management, machine learning, ontological engineering, organizational and social computing, project management methodology, social network analysis, statistics, system simulation, and so on.

Page 13: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 13

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

D3M Research Issues Data Intelligence:

deep knowledge in complex data structure; mining in-depth data patterns, and mining structured & informative knowledge in complex data

Domain Intelligence: Domain & prior knowledge, business processes/logics/workflow, constraints, and

business interestingness; representation, modeling and involvement of them in KDD

Network Intelligence: network-based data, knowledge, communities and resources; information retrieval,

text mining, web mining, semantic web, ontological engineering techniques, and web knowledge management

Human Intelligence: empirical and implicit knowledge, expert knowledge and thoughts, group/collective

intelligence; human-machine interaction, representation and involvement of human intelligence

Social Intelligence: organizational/social factors, laws/policies/protocols, trust/utility/benefit-cost;

collective intelligence, social network analysis, and social cognition interaction Intelligence metasynthesis:

Synthesize ubiquitous intelligence in KDD; metasynthetic interaction (m-interaction) as working mechanism, and metasynthetic space (m-space) as an AKD-based problem-solving system

Page 14: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 14

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

How to reach an interest tradeoff Balance between technical and business

interests Suppose there are multiple metrics for

each aspect

Page 15: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 15

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

actionable knowledge discovery through m-spaces acquiring and representing unstructured, ill-

structured and uncertain domain/human knowledge supporting dynamic involvement of business experts

and their knowledge/intelligence acquiring and representing expert thinking such as

imaginary thinking and creative thinking in group heuristic discussions during KDD modeling

acquiring and representing group/collective interaction behavior and impact emergence

Building infrastructure supporting the involvement and synthesis of ubiquitous intelligence

Page 16: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 16

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

D3M Applications

Real-world data mining Our recent case studies

Capital markets actionable trading agents actionable trading strategies

Social security activity mining combined mining

Page 17: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 17

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Actionable Trading Evidence for Brokerage Firms

Trading strategy/evidence

Actionable trading evidence

Page 18: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 18

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Domain factors

Page 19: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 19

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Business interest

Page 20: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 20

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Developing in-depth trading strategy

Page 21: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 21

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Page 22: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 22

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Page 23: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 23

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Activity mining for Australian Commonwealth Governmental Debt Prevention

Impact-targeted activity mining

Page 24: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 24

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Impact-targeted activity mining Frequent impact-targeted activity

sequences Impact-contrasted activity sequences Impact-reversed activity sequences Impact-targeted combined association

clusters

Page 25: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 25

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Data intelligence Activity data Itemset imbalance Impact imbalance Seasonal effect Demographic data Transactional data Itemset/tuple selection/construction

Page 26: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 26

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Domain intelligence Business process/event for activity selection Domain knowledge Feature selection Sequence construction Impact target

Positive impact Negative impact Multi-level impacts

Feature/attribute selection Interestingness definition New pattern structures

Page 27: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 27

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Organizational/social factors Operational/intervention activities Seasonal business requirement/

interaction changes Business cost (debt amount/duration) Business benefit (saving/preventing debt

amount or reducing debt duration) Deliverable format

Page 28: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 28

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Impact-reserved pattern pair Underlying pattern 1: Derivative pattern 2:

Impact-targeted combined association clusters

Page 29: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 29

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Conditional impact ratio (Cir)

Conditional Piatetsky-Shapiro’s (P-S) ratio (Cps)

Page 30: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 30

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Interestingness: tech & biz

Page 31: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 31

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

The process

Page 32: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 32

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Impact-reversed sequential activity patterns

Page 33: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 33

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Demographic + transactional combined pattern

Page 34: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 34

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

D3M ReferencesBooks: Cao, L. Yu, P.S., Zhang, C., Zhao, Y. Domain Driven Data Mining, Springer, 2009. Cao, L. Yu, P.S., Zhang, C., Zhang, H.(ed.) Data Mining for Business Applications, Springer, 2008.

Workshops: Domain-driven data mining 2008, joint with ICDM2008. Domain-driven data mining 2007, joint with SIGKDD2007.

Special issues: Domain-driven data mining, IEEE Trans. Knowledge and Data Engineering, 2009. Domain-driven, actionable knowledge discovery, IEEE Intelligent Systems, Department, 22(4): 78-89, 2007.

Some of relevant papers: Longbing Cao, Yanchang Zhao, Huaifeng Zhang, Dan Luo, Chengqi Zhang. Flexible Frameworks for Actionable

Knowledge Discovery, submitted to IEEE Trans. on Knowledge and Data Engineering. Cao, L., Zhang, H., Zhao, Y., Zhang, C. Combined Mining: Discovering More Informative Knowledge in e-

Government Services, submitted to ACM TKDD, 2008. Cao, L., Dai, R., Zhou, M.: Metasynthesis, M-Space and M-Interaction for Open Complex Giant Systems, technical

report, 2008. Cao, L. and Ou, Y. Market Microstructure Patterns Powering Trading and Surveillance Agents. Journal of Universal

Computer Sciences, 2008 (to appear). Cao, L. and He, T. Developing actionable trading agents, Knowledge and Information Systems: An International

Journal, 2008. Cao, L. Developing Actionable Trading Strategies, in edited book: Intelligent Agents in the Evolution of WEB and

Applications, Springer, 2008.

Page 35: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 35

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Some of relevant papers: Cao, L., Zhao, Y., Zhang, C. (2008), Mining Impact-Targeted Activity Patterns in Imbalanced Data, IEEE

Trans. Knowledge and Data Engineering, IEEE, , Vol. 20, No. 8, pp. 1053-1066, 2008. Cao, L., Yu, P., Zhang, C., Zhao, Y., Williams, G.:DDDM2007: Domain Driven Data Mining, ACM SIGKDD

Explorations Newsletter, 9(2): 84-86, 2007. Cao, L., Zhang, C.: Knowledge Actionability: Satisfying Technical and Business Interestingness,

International Journal of Business Intelligence and Data Mining, 2(4): 496-514, 2007. Cao, L., Zhang, C.: The Evolution of KDD: Towards Domain-Driven Data Mining, International Journal of

Pattern Recognition and Artificial Intelligence, 21(4): 677-692, 2007. Cao, L.: Domain-Driven Actionable Knowledge Discovery, IEEE Intelligent Systems, 22(4): 78-89, 2007. Cao, L., and Zhang, C. Domain-driven data mining: A practical methodology, International Journal of

Data Warehousing and Mining (IJDWM), IGI Global, 2(4):49-65, 2006.

Page 36: An Overview of Domain-Driven Data Mining:   Toward Actionable Knowledge Discovery (AKD)

15 December 2008

Cao, L: D3M at DDDM2008 Joint with ICDM2008 36

DD33MM:: D

om

ain

-Dri

ven

Data

Min

ing

The Smart Lab: datamining.it.uts.edu.au

Thank you!

Longbing CAO

Faculty of Engineering and ITUniversity of Technology, Sydney, Australia

Tel: 61-2-9514 4477 Fax: 61-2-9514 1807email: [email protected]: www-staff.it.uts.edu.au/~lbcao/The Smart Lab: datamining.it.uts.edu.au