transforming data into knowledge › ~a78khan › docs › information...transforming data into...

89
Transforming Data Transforming Data into Knowledge into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning

Upload: others

Post on 05-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

Transforming DataTransforming Datainto Knowledgeinto Knowledge

Presented by:

Atif Khan

using Knowledge Engineering

&Machine Learning

Page 2: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

Scope

Building Blocks ontology inference machine learning

DBpedia

InfoTrellisALLSight

Tools

Page 3: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 3

Motivation

Data Streams » Information?

.....1101111000011

.....1101100011111

.....000010010100

.....1101111111011

.....110110001100

.....011010010100

Page 4: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 4

Motivation

Information » KnowledgeCharles Smith @profcharlesJust finished registering for KDD 2012 in Beijing http://www.kdd.org/kdd2012/

Charles Smith shared a linkI am off to knowledge discovery and data mining conference in Beijing. Looking forward to Michael Jordan's keynote address

Page 5: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 5

Motivation

Information » Knowledge

KDD2012

Beijing

URLwww..kdd2012/

CharlesSmith

CharlesSmith

Conference

Knowledge Discoveryand Data Mining

Beijing

MichaelJordan

Page 6: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 6

Motivation

Information » Knowledge

KDD2012

Beijing

URLwww..kdd2012/

CharlesSmith

CharlesSmith

Conference

Knowledge Discoveryand Data Mining

Beijing

registered for

website

heldAt

MichaelJordan

Page 7: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 7

Motivation

Information » Knowledge

KDD2012

Beijing

URLwww..kdd2012/

CharlesSmith

CharlesSmith

Conference

Knowledge Discoveryand Data Mining

Beijing

registered for

website

heldAt

isa

heldAt

isKeyNoteSpeaker

attending

MichaelJordan

Page 8: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 8

Motivation

Information » Knowledge

KDD2012

Beijing

URLwww..kdd2012/

CharlesSmith

CharlesSmith

Conference

Knowledge Discoveryand Data Mining

Beijing

MichaelJordan

same as

registered for

same as

same as

website

heldAt

isa

heldAt

isKeyNoteSpeaker

attending

Page 9: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 9

Motivation

Information » Knowledge■ Charles Smith is an academic

■ KDD is a conference about data mining and knowledge discovery

■ Michael Jordan is an influential academic in data mining community

■ Charles Smith and Michael Jordan will both be in Beijing during KDD 2012

Page 10: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 10

Building Blocks

ontology inference

machinelearning

Page 11: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 11

Hybrid MDSS - Holmes**

Holmes■ a hybrid medical decision support system (MDSS)

Based on ■ ontological knowledge representation

■ logic-based inference

■ machine learning to deal with noise

**Atif Khan, John Doucette, Robin Cohen. "Validation of an Ontological Medical Decision Support System for Patient Treatment Using a Repository of Patient Data: Insights into the Value of Machine Learning". ACM Transactions on Intelligent Systems and Technology (TIST) 2012.

Page 12: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 12

Hybrid MDSS

Line of Inquiry■ which patients can be prescribed

what sleep medications?

Considerations■ patient-centric & evidence-based

■ automated

■ easy to explain and validate

■ noise tolerant

Page 13: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 13

Hybrid MDSS

Datasets

CDC–Behavioral Risk Factor Surveillance System (BRFSS) – 2010

Page 14: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 14

Hybrid MDSS

Datasets

multi-dimensionala. demographic informationb. medical informationc. behavioural Information

Page 15: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 15

Hybrid MDSS

Datasets

Characteristicsa. 450K+ individualsb. 400+ attributes/recordc. high “missingness”d. numeric coding

Page 16: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 16

Hybrid MDSS

Datasets

sleeping pill prescription protocol(HTML)

Page 17: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 17

Hybrid MDSS

Datasets

drug-to-drug interactions(HTML)

Page 18: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 18

Ontology Model

Drug

Patient

Disease

Condition

Page 19: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 19

Ontology Model

Drug

Patient

Disease

Condition

Pain pills

Sleepingpills

Age GenderMale

Female

MedicalRecord

BRFSSField

Page 20: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 20

Ontology Model

Drug

Patient

Disease

Condition

Pain pills

Sleepingpills

Age GenderMale

Female

MedicalRecord

BRFSSField

isa

Page 21: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 21

Ontology Model

Drug

Patient

Disease

Condition

Pain pills

Sleepingpills

Age GenderMale

Female

MedicalRecord

BRFSSField

hasContraIndication

isa

Page 22: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 22

Ontology Model

Drug

Patient

Disease

Condition

Pain pills

Sleepingpills

Age GenderMale

Female

MedicalRecord

BRFSSField

hasContraIndication

isa

hasRecord

isTakingprescribedTo hasC

ondition

hasAgehasGender

hasField

hasDisease

Page 23: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 23

An Ontology Model

Defines■ taxonomy

● a hierarchy of concepts

■ relationships

Scope - “domain of discourse”■ e.g. medical decision support system

Page 24: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 24

Mapping Raw Data

Patient1

MedicalRecord1

GenderField

2hasValue

AgeField

66hasValue

SnoreField

1hasValue

Page 25: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 25

Mapping Raw Data

Patient1

MedicalRecord1

GenderField

2hasValue

Female

hasGender

AgeField

66hasValue

SnoreField

1hasValue

Page 26: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 26

Mapping Raw Data

Patient1

MedicalRecord1

GenderField

2hasValue

Female

hasGender

AgeField

66hasValue hasCondition

Elderly

SnoreField

1hasValue

Page 27: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 27

Mapping BRFSS Data

Patient1

MedicalRecord1

GenderField

2hasValue

Female

hasGender

AgeField

66hasValue hasCondition

Elderly

SnoreField

1hasValue hasDisease

SleepApnea

Page 28: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 28

Mapping Expert Knowledge

Ramelteon

Eszopiclone

Insomnia

prescribedFor

Page 29: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 29

Mapping Expert Knowledge

Ramelteon

Eszopiclone

LungDisease

LiverDisease

Depression

SleepApnea

Asthma

Insomnia

hasContraIndication

prescribedFor

Page 30: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 30

Mapping Expert Knowledge

Ramelteon

Eszopiclone

Pregnancy

BreastFeeding

Elderly

AlcoholAbuse

LungDisease

LiverDisease

Depression

SleepApnea

Asthma

Insomnia

hasContraIndication

prescribedFor

Page 31: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 31

Mapping Expert Knowledge

Ramelteon

Eszopiclone

Pregnancy

BreastFeeding

Elderly

AlcoholAbuse

Propoxyphene

LungDisease

LiverDisease

Depression

SleepApnea

Asthma

Insomnia

hasContraIndication

prescribedFor

Page 32: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 32

Mapping Expert Knowledge

:Propoxyphene a :Drug; :isPrescribedFor :Pain; :hasContraIndication

:Eszopiclone.

:Wygesic a :Drug; :isPrescribedFor :Pain; :hasContraIndication :Eszopiclone.

:Trycet a :Drug; :isPrescribedFor :Pain; :hasContraIndication

:Eszopiclone.

:PropoxypheneCompound65 a :Drug; :isPrescribedFor :Pain; :hasContraIndication

:Eszopiclone.

:Propacet100 a :Drug; :isPrescribedFor :Pain; :hasContraIndication :Eszopiclone.

:Aspirin a :Drug; :isPrescribedFor :Pain.

:Tylenol1 a :Drug; :isPrescribedFor :Pain.

:Tylenol2 a :Drug; :isPrescribedFor :Pain; :hasContraIndication :SleepingMedication.

N3 representation

Page 33: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 33

Mapping Expert Knowledge

:Propoxyphene a :Drug; :isPrescribedFor :Pain; :hasContraIndication

:Eszopiclone.

:Wygesic a :Drug; :isPrescribedFor :Pain; :hasContraIndication :Eszopiclone.

:Trycet a :Drug; :isPrescribedFor :Pain; :hasContraIndication

:Eszopiclone.

:PropoxypheneCompound65 a :Drug; :isPrescribedFor :Pain; :hasContraIndication

:Eszopiclone.

:Propacet100 a :Drug; :isPrescribedFor :Pain; :hasContraIndication :Eszopiclone.

:Aspirin a :Drug; :isPrescribedFor :Pain.

:Tylenol1 a :Drug; :isPrescribedFor :Pain.

:Tylenol2 a :Drug; :isPrescribedFor :Pain; :hasContraIndication :SleepingMedication.

N3 representation

Page 34: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 34

Knowledge Representation-Recap

Why?■ to create, maintain

and share informationin a precise manner without ambiguity of meaning

How?■ ontologies

“Now! That should clear up a few things around here!”

http://photos1.blogger.com/blogger2/1715/1669/1600/larson-oct-1987.gif

Page 35: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 35

Knowledge-Inference

a discovery processto find implied knowledge

using explicitly defined information

Page 36: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 36

Knowledge-Inference

logic-based

a discovery processto find implied knowledge

using explicitly defined information

ontology

Page 37: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 37

Knowledge-Inference: example

What do we know about Mary?

■ Mary is grandmother

■ Mary is a grand parent

■ Mary is woman

Mary EmilyJames

hasChild

hasChild

Page 38: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 38

Knowledge-Inference: example

What do we know about Mary?

■ Mary is grandmother

■ Mary is a grand parent

■ Mary is woman

Mary EmilyJames

hasChild

hasChild

if a person has a child, and that child also has a child, then the person is a grandparent

Page 39: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 39

Inference Rules

Drug-to-Drug Interactions■ If a patient is taking an existing drug (D1) and

D1 has contraindication to another drug D2 then drug D2 should not be prescribed to the patient

{ ?P a :Patient. ?D1 a :Drug.?D2 a :Drug.

?P :isTaking ?D1.?D1 :hasContraIndication ?D2.

} => {?P :cannotBeGiven ?D2}.

Page 40: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 40

Inference Rules

Drug-to-Disease Interactions■ If a patient has a condition that has a contra

indication to a drugthen the patient should not be given the drug

{ ?P a :Patient.?D a :Drug.?DIS a :Disease.

?P :hasDisease ?DIS.?D :hasContraIndication ?DIS.

} => {?P :cannotBeGiven ?D}.

Page 41: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 41

Putting it all Together

triplestore

reasonerquery

inferencerules

Page 42: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 42

Putting it all Together

triplestore

reasonerquery

inferencerules

Result

Proof

Page 43: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 43

Putting it all Together

Result

Proof

logic-based, can be verified by traversing the knowledge graph

query answer

triplestore

reasonerquery

inferencerules

Page 44: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 44

But What about Noise?

Mary Emily?

hasChild

hasChild

Noise■ cripples knowledge-based solutions

■ limits inference capability

Page 45: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 45

But What about Noise?

Noise■ cripples knowledge-based solutions

■ limits inference capability

Mary EmilyJames

hasChild

?

Page 46: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 46

Data is Almost Always Noisy

use “machine learning”

to deal with noise

Page 47: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 47

Machine Learning?

feature 2age

feature 1income

query: given a person's age and income predict if they are happy or sad

Page 48: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 48

Machine Learning?

feature 2age

feature 1income

query: given a person's age and income predict if they are happy or sad

happy people

sad people

Page 49: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 49

Machine Learning?

query: is he a happy or sad person ?

happy person

sad person

Page 50: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 50

Machine Learning 101

Machine Learning■ classification: predict class of an instance

■ regression: prediction of a numeric value

■ clustering: group similar items together

Page 51: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 51

Machine Learning 101

Machine Learning■ classification: predict class of an instance

■ regression: prediction of a numeric value

■ clustering: group similar items together

Page 52: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 52

Machine Learning 101

Machine Learning■ classification: predict class of an instance

Page 53: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 53

Classification

0. Train■ build a classifier based on known exemplars

1. Predict

2. Update

3. Evaluate

Page 54: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 54

Dealing with Noise

query: is John depressed ?

but we don't have access to that information

Page 55: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 55

Dealing with Noise

query: is John depressed ?

depressed

Training exemplars

unknown

healthy

Page 56: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 56

Dealing with Noise

answer:

sameAs( , ) = 0.14

sameAs( , ) = 0.85

sameAs( , ) = 0.01

query: is John depressed ?

Page 57: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 57

Dealing with Noise

answer:

sameAs( , ) = 0.14

sameAs( , ) = 0.85

sameAs( , ) = 0.01

query: is John depressed ?

Page 58: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 58

Recap

a hybrid decision supportsystem with ontology-based knowledge representation and logic- based reasoning augmented with machine learning classification fornoise tolerance

Page 59: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 59

Recap

a hybrid decision support system with ontology-based knowledge representation and logic-based reasoning augmented with machine learning classification for noise tolerance

Page 60: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 60

Taming Data in the Wild

Page 61: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 61

Taming Data in the Wild

“Data-to-knowledge” ■ an expensive journey

however, connecting the “data dots” is the key

“Linked Data”*■ can make this a reality

* http://linkeddata.org/

Page 62: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 62

Linked Data

Brief Summary■ Tim Berners-Lee's vision

■ URIs* to identify 'things'

■ HTTP-based dereferencing of URIs

■ structured representation (RDF**)

■ hyperlink 'things' together

Tim Berners-Lee

http://www.w3.org/Press/Stock/Berners-Lee/2001-eur-head-quarter.jpg

* URI = Universal Resource Identifier ** RDF = Resource Description Framework

Page 63: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 63

DBpedia

Extract Information from Wikipedia■ unstructured information

● articles (free text + noise)

■ structured components● infobox templates● categorisation information● images● geo-coordinates, ● external web links

http://wiki.dbpedia.org

Page 64: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 64

DBpedia

Knowledge Engineering using Ontology*■ 359 classes

● in a subsumption hierarchy

■ 1,775 different properties

*http://wiki.dbpedia.org/Ontology

Page 65: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 65

DBpedia

Knowledge Engineering using Ontology

http://wiki.dbpedia.org/Datasets

“English version of the DBpedia knowledge base currently

describes 3.77 million things, out of which 2.35 million are classified in a consistent Ontology, including

764,000 persons, 573,000 places (including

387,000 populated places), 333,000 creative works (including 112,000 music albums, 72,000

films and 18,000 video games), 192,000 organizations (including 45,000 companies

and 42,000 educational institutions), 202,000 species and 5,500 diseases.”

Page 66: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 66

DBpedia – Interlinked Datasets

2007

http://richard.cyganiak.de/2007/10/lod/lod-datasets_2007-05-01.png

Page 67: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 67

2009

http://richard.cyganiak.de/2007/10/lod/lod-datasets_2009-03-05_colored.png

Page 68: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 68

2011

http://richard.cyganiak.de/2007/10/lod/lod-datasets_2011-09-19_colored.png

Page 69: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 69

DBpedia in Action

http://www.visualdataweb.org/relfinder/relfinder.php

Demo/Video

Page 70: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 70

know your customer

ALLSight Platform

Page 71: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 71

AllSight – 360 View of a Customer

Page 72: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 72

AllSight – 360 View of a Customer

Page 73: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 73

AllSight – 360 View of a Customer

Page 74: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 74

AllSight – 360 View of a Customer

Page 75: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 75

product

Example

location

transactions

customer

Page 76: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 76

Example

female

teenager

adult

senior

male

tech buff

shopaholic

Bieber nation

customer

Page 77: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 77

Page 78: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 78

Page 79: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 79

Page 80: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 80

Under the Hood

Data Enrichment■ taxonomies/ontologies

■ dictionaries/catalogues

Matching (decision making)■ knowledge-based rules

■ machine learning

Page 81: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 81

Challenges

Entity Resolution

Pre-processing

Feature Selection

Training (& retraining)

Noise

Page 82: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 82

Tools of the Trade

Page 83: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 83

Some Basic Tools

Knowledge Engineering■ Protégé

■ Web Ontology Language – OWL

■ Resource Description Framework (RDF)● XML● Notation 3 (N3)

Page 84: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 84

Some Basic Tools

Semantic Reasoners■ CWM

■ Euler Sharp

■ Jenna

■ FuXi

Page 85: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 85

Some Basic Tools

Machine Learning Toolkits■ Weka

■ Stanford NLP

■ Apache Mahout*

■ LIBLINEAR*

■ LIBSVM*

■ numpy, scipy (python libs)

Page 86: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 86

In Summary

The collaborative work of this group has the potential to unlock data to create knowledge

There are many■ uses cases that can benefit from this work

■ tools available to facilitate this process

Page 87: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 87

Thank You!!

Page 88: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 88

Confidence in Your Data

* Adler on Data Governancehttps://www.ibm.com/developerworks/mydeveloperworks/blogs/adler/entry/the_ibm_big_data_governance_summit3?lang=en

“Velocity, Volume, and Variety without Veracity creates Vulnerability”*

Page 89: Transforming Data into Knowledge › ~a78khan › docs › Information...Transforming Data into Knowledge Presented by: Atif Khan using Knowledge Engineering & Machine Learning Scope

01-24-2013 89

Confidence in Your Data

“Velocity, Volume, and Variety without Veracity creates Vulnerability”*

* Adler on Data Governancehttps://www.ibm.com/developerworks/mydeveloperworks/blogs/adler/entry/the_ibm_big_data_governance_summit3?lang=en

evaluate confidence values