Transforming Big Data into Smart Data: Deriving Value via Harnessing Volume, Variety and Velocity using Semantics and Semantic Web. Keynote at the 30th IEEE International Conference on Data Engineering (ICDE) 2014. Amit Sheth, LexisNexis Ohio Eminent Scholar & Exec. Director, The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State, USA

Upload: amit-sheth

Post on 07-May-2015


DESCRIPTION

Keynote given at ICDE2014, April 2014. Details at: http://ieee-icde2014.eecs.northwestern.edu/keynotes.html A video of a version of this talk is available here: http://youtu.be/8RhpFlfpJ-A (download to see many hidden slides). Two versions of this talk, targeted at Smart Energy and Personalized Digital Health domains/apps at: http://wiki.knoesis.org/index.php/Smart_Data Previous (older) version replaced by this version: http://www.slideshare.net/apsheth/big-data-to-smart-data-keynote

TRANSCRIPT

Page 1: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

Transforming Big Data into Smart Data: Deriving Value via Harnessing Volume, Variety and Velocity using Semantics and Semantic Web

Keynote at the 30th IEEE International Conference on Data Engineering (ICDE) 2014

Amit Sheth

LexisNexis Ohio Eminent Scholar & Exec. Director, The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)

Wright State, USA

Page 2

Page 3

Amit Sheth’s PhD students

Ashutosh Jadhav

Hemant Purohit

Vinh Nguyen

Lu Chen

Pramod Anantharam

Sujan Perera

Alan Smith

Maryam Panahiazar

Sarasi Lalithsena

Cory Henson

Kalpa Gunaratna

Delroy Cameron

Sanjaya Wijeratne

Wenbo Wang

Kno.e.sis in 2012 = ~100 researchers (15 faculty, ~50 PhD students)

Special Thanks

Pavan Kapanipathi

Shreyansh Bhatt

Acknowledgements: Kno.e.sis team, Funds - NSF, NIH, AFRL, Industry…

Page 4

2011

How much data?

48(2013)

500(2013)

http://www.knowledgeinfusion.com/blog/2011/11/get-your-head-out-of-the-clouds-and-into-big-data/

Page 5

Only 0.5% to 1% of the data is used for analysis.

http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode

http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume

Page 6

Variety – not just structure but modality: multimodal, multisensory

Structured

Unstructured

Semi-structured

Audio

Video

Images

Page 7

Velocity

Fast Data

Rapid Changes

Real-Time/Stream Analysis

Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail

Page 8

• What if your data volume gets so large and varied you don't know how to deal with it?

• Do you store all your data?

• Do you analyze it all?

• What is coverage, skew, quality?

• How can you find out which data points are really important?

• How can you use it to your best advantage?

Questions typically asked on Big Data

http://www.sas.com/big-data/

Page 9

http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/

Variety of Data Analytics Enablers

Page 10

• Prediction of the spread of flu in real time during H1N1 2009

– Google tested a mammoth 450 million different mathematical models to test the search terms, which provided 45 important parameters

– The model was tested when the H1N1 crisis struck in 2009 and gave more meaningful and valuable real-time information than any official public health system [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]

• FareCast: predict the direction of air fares over different routes [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]

• NY city manholes problem [ICML Discussion, 2012]

Illustrative Big Data Applications

Page 11

The current focus is mainly on serving business intelligence and targeted analytics needs, not on serving complex individual and collective human needs (e.g., empowering humans in health, fitness and well-being; better disaster coordination; personalized smart energy)

What is missing?

Page 12

Highly personalized/individualized/contextualized

Incorporate real-world complexity:

- multi-modal and multi-sensory nature of the physical world and human perception

Can More Data beat better algorithms? Can Big Data replace human judgment?

Many opportunities, many challenges, lessons to apply

Page 14

What is needed? Taking inspiration from cognitive models

• Bottom-up and top-down cognitive processes:

– Bottom-up: find patterns, mine (ML, …)

– Top-down: infusion of models and background knowledge (data + knowledge + reasoning)

Left (plans)/Right (perceives) Brain

Top (plans)/Bottom (perceives) Brain

http://online.wsj.com/news/articles/SB10001424052702304410204579139423079198270

Page 15

• Ambient processing as much as possible while enabling natural human involvement to guide the system

What is needed?

Smart Refrigerator: Low on Apples

Adapting the Plan: shopping for apples

Page 16

Contextual

Information Smart Data

Makes Sense to a human

Is actionable – timely and better decisions/outcomes

Page 17

My 2004-2005 formulation of SMART DATA - Semagix

Formulation of Smart Data strategy providing services for Search, Explore, Notify.

“Use of Ontologies and Data repositories to gain relevant insights”

Page 18

Smart Data (2013 retake)

Smart data makes sense out of Big data

It provides value by harnessing the challenges posed by the volume, velocity, variety and veracity of big data, in turn providing actionable information and improving decision making.

Page 19

OF human, BY human, FOR human

Smart data is focused on the actionable value achieved by human involvement in data creation, processing and consumption phases for improving the human experience.

Another perspective on Smart Data

Page 20

OF human, BY human FOR human

Another perspective on Smart Data

Page 21

Petabytes of Physical (sensory)-Cyber-Social Data every day!

More on PCS Computing: http://wiki.knoesis.org/index.php/PCS

‘OF human’ : Relevant Real-time Data Streams for Human Experience

Page 22

OF human, BY human FOR human

Another perspective on Smart Data

Page 23

Use of Prior Human-created Knowledge Models

‘BY human’: Involving Crowd Intelligence in data processing workflows

Crowdsourcing and Domain-expert guided Machine Learning Modeling

Page 24

OF human, BY human FOR human

Another perspective on Smart Data

Page 25

Detection of events, such as wheezing sound, indoor temperature, humidity, dust, and CO level

Weather Application

Asthma Healthcare Application

Close the window at home during the day to avoid CO in gush, to avoid asthma attacks at night

‘FOR human’ : Improving Human Experience

Population Level

Personal

Public Health

Action in the Physical World

Luminosity

CO level

CO in gush during daytime

Page 26

Electricity usage over a day, device at work, power consumption, cost/kWh, heat index, relative humidity, and public events from social stream

Weather Application

Power Monitoring Application

‘FOR human’ : Improving Human Experience

Population Level Observations

Personal Level Observations

Action in the Physical World

Washing and drying resulted in significant cost since it was done during the peak load period. Consider moving it to the night.

Page 27

Everyone and everything has Big Data – it is Smart Data that matters!

Page 28

• Healthcare: ADHF, Asthma, GI

– Using kHealth system

• Social Media Analysis: Crisis coordination

– Using Twitris platform

• Smart Cities: Traffic management

I will use applications in 3 domains to demonstrate

Page 29

• Healthcare: ADHF, Asthma, GI

– Using kHealth system

• Social Media Analysis: Crisis coordination

– Using Twitris platform

• Smart Cities: Traffic management

Smart Data Applications

Page 30

A Historical Perspective on Collecting Health Observations

Diseases treated only by external observations

First peek beyond just external observations

Information overload!

Doctors relied only on external observations

Stethoscope was the first instrument to go beyond just external observations

Though the stethoscope has survived, it is only one among many observations in modern medicine

http://en.wikipedia.org/wiki/Timeline_of_medicine_and_medical_technology

2600 BC ~1815 Today

Imhotep

Laennec’s stethoscope

Image Credit: British Museum

Page 31

The Patient of the Future

MIT Technology Review, 2012

http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/

Page 32

Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information

canary in a coal mine

Empowering Individuals (who are not Larry Smarr!) for their own health

kHealth: knowledge-enabled healthcare

Page 33

Weight Scale

Heart Rate Monitor

Blood Pressure Monitor

Sensors

Android Device (w/ kHealth App)

Readmissions cost $17B/year: $50K/readmission; Total kHealth kit cost: < $500

kHealth Kit for the application for reducing ADHF readmission

ADHF – Acute Decompensated Heart Failure
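
The figures on this slide imply roughly how many readmissions the $17B covers and how many kits the cost of a single readmission would buy. A small sketch of that arithmetic, using only the slide's numbers (the variable names are ours, and the kit price is treated as its $500 upper bound); it says nothing about how many readmissions the kit would actually prevent.

```python
# All three figures come from the slide; names are illustrative.
READMISSION_COST_PER_YEAR = 17_000_000_000  # $17B/year
COST_PER_READMISSION = 50_000               # $50K per readmission
KIT_COST_UPPER_BOUND = 500                  # kit costs less than $500

# Number of readmissions implied by the yearly total.
implied_readmissions = READMISSION_COST_PER_YEAR // COST_PER_READMISSION

# How many kits the cost of one readmission would pay for (at the upper bound).
kits_per_readmission = COST_PER_READMISSION // KIT_COST_UPPER_BOUND

print(implied_readmissions, kits_per_readmission)
```

Under these numbers, one avoided readmission covers at least 100 kits.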

Page 34

25 million: people in the U.S. are diagnosed with asthma (7 million are children) [1]

300 million: people suffering from asthma worldwide [2]

$50 billion: spent on asthma alone in a year [2]

155,000: hospital admissions in 2006 [3]

593,000: emergency department visits in 2006 [3]

[1] http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/

[2] http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html

[3] Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics, 123(Supplement 3), S131-S145.

Asthma: Severity of the problem

Page 35

Sensordrone (carbon monoxide, temperature, humidity)

Node Sensor (exhaled nitric oxide)

Sensors

Android Device (w/ kHealth App)

Total cost: ~ $500

kHealth Kit for the application for Asthma management

*Along with two sensors in the kit, the application uses a variety of population level signals from the web:

Pollen level Air Quality Temperature & Humidity

Page 36

Data Overload for Patients/health aficionados

Providing actionable information in a timely manner is crucial to avoid information overload or fatigue

Personal level Signals

Public level Signals

Population level Signals

Page 37

Data Overload Spanning Physical-Cyber-Social Modalities

Increasingly, real-world events are:

(a) Continuous: Observations are fine grained over time

(b) Multimodal, multisensory: Observations span PCS modalities

Page 38

What can we do to avoid an asthma episode?

Real-time health signals from the personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, hospital EMR), and the population level (e.g., pollen level, CO2) arrive continuously in fine-grained samples, potentially with missing information and uneven sampling frequencies.

Variety, Volume, Veracity, Velocity

Value: What risk factors influence asthma control? What is the contribution of each risk factor?

Semantics: Understanding relationships between health signals and asthma attacks for providing actionable information

WHY Big Data to Smart Data: Asthma example

Page 39

kHealth: Health Signal Processing Architecture

Personal level Signals

Public level Signals

Population level Signals

Domain Knowledge

Risk Model

Events from Social Streams

Take Medication before going to work

Avoid going out in the evening due to high pollen levels

Contact doctor

Analysis

Personalized Actionable Information

Data Acquisition & Aggregation

Page 40

Asthma Domain Knowledge

Domain Knowledge

ICS= inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA= inhaled short-acting beta2-agonist ; *consider referral to specialist

Asthma Control and Actionable Information

Page 41

Patient Health Score (diagnostic)

Risk assessment model

Semantic Perception

Personal level Signals

Public level Signals

Domain Knowledge

Population level Signals

GREEN – Well controlled

YELLOW – Not well controlled

RED – Poorly controlled

How controlled is my asthma?

Page 42

Patient Vulnerability Score (prognostic)

Risk assessment model

Semantic Perception

Personal level Signals

Public level Signals

Domain Knowledge

Population level Signals

Patient health Score

How vulnerable* is my control level today?

*considering changing environmental conditions and current control level

Page 43

3.4 billion people will have smartphones or tablets by 2017 -- Research2Guidance

“Intelligence at the Edges” for Digital Health

http://www.digikey.com/us/en/techzone/energy-harvesting/resources/articles/zigbees-smart-energy-20-profile.html

m-health app market is predicted to reach $26 billion in 2017 -- Research2Guidance

Page 44

Sensordrone – for monitoring environmental air quality

Wheezometer – for monitoring wheezing sounds

Can I reduce my asthma attacks at night?

What are the triggers? What is the wheezing level?

What is the propensity toward asthma?

What is the exposure level over a day?

Commute to Work

Asthma: Actionable Information for Asthma Patients

Luminosity

CO level

CO in gush during day time

Actionable Information

Personal level Signals

Public level Signals

Population level Signals

What is the air quality indoors?

Page 45

Population Level

Personal

Wheeze – Yes

Do you have tightness of chest? – Yes

Stages: Observations, Physical-Cyber-Social System, Health Signal Extraction, Health Signal Understanding

<Wheezing=Yes, time, location>

<ChestTightness=Yes, time, location>

<PollenLevel=Medium, time, location>

<Pollution=Yes, time, location>

<Activity=High, time, location>

Features: Wheezing, ChestTightness, PollenLevel, Pollution, Activity, RiskCategory

<PollenLevel, ChestTightness, Pollution, Activity, Wheezing, RiskCategory>

<2, 1, 1, 3, 1, RiskCategory>

<2, 1, 1, 3, 1, RiskCategory>

<2, 1, 1, 3, 1, RiskCategory>

<2, 1, 1, 3, 1, RiskCategory>

...

Expert Knowledge

Background Knowledge

tweet reporting pollution level and asthma attacks

Acceleration readings from on-phone sensors

Sensor and personal observations

Signals from personal, personal spaces, and community spaces

Risk Category assigned by doctors

Qualify

Quantify

Enrich

Outdoor pollen and pollution

Public Health

Health Signal Extraction to Understanding

Well Controlled – continue

Not Well Controlled – contact nurse

Poorly Controlled – contact doctor
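
The step from qualitative observations such as <PollenLevel=Medium, time, location> to numeric rows such as <2, 1, 1, 3, 1, RiskCategory> can be sketched as follows. This is a minimal illustration, not the kHealth implementation: the ordinal codes in `LEVELS` are an assumption chosen so that the slide's example values reproduce the slide's row, and `Observation`, `to_feature_vector` and `ACTIONS` are hypothetical names. The risk category itself is assigned by doctors according to the slide, so it is not computed here.

```python
from dataclasses import dataclass

# Assumed ordinal encoding; the slide does not state the actual codes.
LEVELS = {"No": 0, "Yes": 1, "Low": 1, "Medium": 2, "High": 3}

# Feature order taken from the tuple header on the slide.
FEATURES = ["PollenLevel", "ChestTightness", "Pollution", "Activity", "Wheezing"]

# Control level -> recommended action, as listed on the slide.
ACTIONS = {
    "Well Controlled": "continue",
    "Not Well Controlled": "contact nurse",
    "Poorly Controlled": "contact doctor",
}


@dataclass(frozen=True)
class Observation:
    """One extracted health signal: <name=value, time, location>."""
    name: str
    value: str
    time: str
    location: str


def to_feature_vector(observations):
    """Map qualitative observations to ordinal codes in FEATURES order."""
    by_name = {o.name: o.value for o in observations}
    return [LEVELS[by_name[feature]] for feature in FEATURES]


# The five example observations from the slide (time/location are placeholders).
observations = [
    Observation("Wheezing", "Yes", "t0", "home"),
    Observation("ChestTightness", "Yes", "t0", "home"),
    Observation("PollenLevel", "Medium", "t0", "city"),
    Observation("Pollution", "Yes", "t0", "city"),
    Observation("Activity", "High", "t0", "home"),
]

vector = to_feature_vector(observations)
print(vector)
```

Rows built this way, paired with a doctor-assigned risk category, are the kind of labeled data the "Quantify" step on the slide appears to produce.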

Page 46

RDF OWL

How are machines supposed to integrate and interpret sensor data?

Semantic Sensor Networks (SSN)

Page 47

W3C Semantic Sensor Network Ontology

Lefort, L., Henson, C., Taylor, K., Barnaghi, P., Compton, M., Corcho, O., Garcia-Castro, R., Graybeal, J., Herzog, A., Janowicz, K., Neuhaus, H., Nikolov, A., and Page, K.: Semantic Sensor Network XG Final Report, W3C Incubator Group Report (2011).

Page 48

W3C Semantic Sensor Network Ontology

Lefort, L., Henson, C., Taylor, K., Barnaghi, P., Compton, M., Corcho, O., Garcia-Castro, R., Graybeal, J., Herzog, A., Janowicz, K., Neuhaus, H., Nikolov, A., and Page, K.: Semantic Sensor Network XG Final Report, W3C Incubator Group Report (2011).

Page 49

Levels of Abstraction (SSN Ontology, Intellego)

3 Interpreted data (abductive) [in OWL], e.g., diagnosis: Hyperthyroidism

2 Interpreted data (deductive) [in OWL], e.g., threshold: Elevated Blood Pressure

1 Annotated data [in RDF], e.g., label: Systolic blood pressure of 150 mmHg

0 Raw data [in TEXT], e.g., number: “150”

(less useful at level 0 … more useful at level 3)
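
The four levels can be walked through in code with the slide's own blood pressure example. This is a sketch under stated assumptions: the 140 mmHg cut-off and the `CANDIDATE_EXPLANATIONS` table are illustrative placeholders (the slide only says "threshold" and "diagnosis"), plain dictionaries stand in for RDF and OWL, and all function names are hypothetical.

```python
# Assumed threshold; the slide does not give a number.
SYSTOLIC_ELEVATED_MMHG = 140

# Assumed background knowledge: abstraction -> disorders that could explain it.
CANDIDATE_EXPLANATIONS = {
    "elevated blood pressure": ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"],
}


def annotate(raw: str) -> dict:
    """Level 0 -> 1: attach a label and unit to a raw reading."""
    return {"property": "systolic blood pressure", "value": int(raw), "unit": "mmHg"}


def interpret_deductive(annotated: dict):
    """Level 1 -> 2: apply a threshold rule to get an abstraction."""
    if (
        annotated["property"] == "systolic blood pressure"
        and annotated["value"] >= SYSTOLIC_ELEVATED_MMHG
    ):
        return "elevated blood pressure"
    return None


def interpret_abductive(abstraction: str) -> list:
    """Level 2 -> 3: list candidate explanations for the abstraction."""
    return CANDIDATE_EXPLANATIONS.get(abstraction, [])


reading = annotate("150")
abstraction = interpret_deductive(reading)
candidates = interpret_abductive(abstraction)
print(reading, abstraction, candidates)
```

A single abstraction usually leaves several candidate explanations open, which is why the later slides add a discrimination step to decide what to observe next.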

Page 50

Making sense of sensor data with

Page 51

People are good at making sense of sensory input

What can we learn from cognitive models of perception?

• The key ingredient is prior knowledge

Page 52

Perception Cycle*

Observe Property

Perceive Feature

1 Explanation: Translating low-level signals into high-level knowledge

2 Discrimination: Focusing attention on those aspects of the environment that provide useful information

Prior Knowledge

* based on Neisser’s cognitive model of perception

Page 53

To enable machine perception, Semantic Web technology is used to integrate sensor data with prior knowledge on the Web

Page 54

Prior knowledge on the Web

W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph

Page 55

Prior knowledge on the Web

W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph

Page 57

Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features

Observe Property

Perceive Feature

Explanation

2 Discrimination: Focusing attention on those aspects of the environment that provide useful information

Discrimination

Page 58

Discrimination

Discriminating Property: is neither expected nor not-applicable

DiscriminatingProperty ≡ ¬ExpectedProperty ⊓ ¬NotApplicableProperty

elevated blood pressure

clammy skin

palpitations

Hypertension

Hyperthyroidism

Pulmonary Edema

Discriminating Property Explanatory Feature
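
The definition above can be tried out with plain sets. This is a minimal sketch, not the OWL formalization: the slide does not define "expected" or "not-applicable", so the readings used here (expected = shared by every candidate explanation, not-applicable = related to no candidate) are assumptions, and the edges in `KB` are illustrative rather than read off the slide's figure.

```python
# Hypothetical prior knowledge: properties each feature (disorder) can cause.
KB = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def discriminating_properties(kb, candidates):
    """Return properties that are neither expected nor not-applicable."""
    if not candidates:
        return set()
    all_properties = set().union(*kb.values())
    candidate_sets = [kb[feature] for feature in candidates]
    # Assumed reading: a property every candidate shares cannot tell them apart.
    expected = set.intersection(*candidate_sets)
    # Assumed reading: a property no candidate has is irrelevant to the choice.
    not_applicable = all_properties - set.union(*candidate_sets)
    return all_properties - expected - not_applicable


print(sorted(discriminating_properties(KB, ["Hypertension", "Pulmonary Edema"])))
```

With this toy knowledge base, checking for palpitations is the only observation that separates Hypertension from Pulmonary Edema; elevated blood pressure is shared by both, and clammy skin applies to neither.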

Page 59

Semantic scalability: Resource savings of abstracting sensor data

Orders of magnitude resource savings for generating and storing relevant abstractions vs. raw observations.

Relevant abstractions

Raw observations

Page 60

How do we implement machine perception efficiently on a resource-constrained device?

Use of an OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time

• Runs out of resources with prior knowledge >> 15 nodes

• Asymptotic complexity: O(n^3)

Page 61

intelligence at the edge

Approach 1: Send all sensor observations to the cloud for processing

Approach 2: downscale semantic processing so that each device is capable of machine perception

Henson et al. 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices', ISWC 2012.

Page 62

Efficient execution of machine perception

Use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning

0101100011010011110010101100011011011010110001101001111001010110001101011000110100111
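
To make the bit vector idea concrete, the sketch below encodes each feature's properties as an integer bitmask and runs explanation and discrimination with bitwise AND, OR and NOT. It is an illustration of the general technique only, not the algorithm from Henson et al.: the knowledge base, the single-feature reading of "explanation" (a feature explains the observations if it covers all of them), and every name are assumptions.

```python
# Hypothetical property vocabulary; each property gets one bit position.
PROPERTIES = ["elevated blood pressure", "clammy skin", "palpitations"]
BIT = {prop: 1 << i for i, prop in enumerate(PROPERTIES)}


def encode(props):
    """Pack a collection of property names into one integer bit vector."""
    mask = 0
    for prop in props:
        mask |= BIT[prop]
    return mask


def decode(mask):
    """Unpack a bit vector back into property names."""
    return [prop for prop in PROPERTIES if mask & BIT[prop]]


# Illustrative prior knowledge, one bit vector per feature.
FEATURE_MASKS = {
    "Hypertension": encode(["elevated blood pressure"]),
    "Hyperthyroidism": encode(PROPERTIES),
    "Pulmonary Edema": encode(["elevated blood pressure", "palpitations"]),
}


def explain(observed_mask):
    """Features whose bit vector covers every observed property."""
    return [
        feature
        for feature, mask in FEATURE_MASKS.items()
        if (observed_mask & mask) == observed_mask
    ]


def discriminate(candidates):
    """Bits set for some, but not all, candidate features."""
    expected = -1    # all ones: AND identity
    applicable = 0   # all zeros: OR identity
    for feature in candidates:
        expected &= FEATURE_MASKS[feature]
        applicable |= FEATURE_MASKS[feature]
    return applicable & ~expected


observed = encode(["elevated blood pressure", "palpitations"])
candidates = explain(observed)
next_to_check = decode(discriminate(candidates))
print(candidates, next_to_check)
```

Each step touches every feature once with constant-time word operations (for vocabularies that fit in a machine word), which is the intuition behind moving from reasoner-based inference to linear-time bit operations.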

Page 63

O(n^3) < x < O(n^4) → O(n)

Efficiency Improvement

• Problem size increased from 10’s to 1000’s of nodes

• Time reduced from minutes to milliseconds

• Complexity growth reduced from polynomial to linear

Evaluation on a mobile device

Page 64

Semantic Perception for smarter analytics: 3 ideas to take away

1 Translate low-level data to high-level knowledge: Machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making

2 Prior knowledge is the key to perception: Using SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web

3 Intelligence at the edge: By downscaling semantic inference, machine perception can execute efficiently on resource-constrained devices

Page 65

• Healthcare: ADHF, Asthma, GI

– Using kHealth system

• Social Media Analysis: Crisis coordination

– Using Twitris platform

• Smart Cities: Traffic management

Smart Data Applications

Page 66

Smart Data for Social Good

Mining human behavior to help societal and humanitarian development

• crisis response coordination, harassment, gender-based violence, …

Page 67

20 million tweets with “sandy, hurricane” keywords between Oct 27th and Nov 1st

2nd most popular topic on Facebook during 2012

Social (Big) Data during Crisis- Example of Hurricane Sandy

• http://www.guardian.co.uk/news/datablog/2012/oct/31/twitter-sandy-flooding

• http://www.huffingtonpost.com/2012/11/02/twitter-hurricane-sandy_n_2066281.html

• http://mashable.com/2012/10/31/hurricane-sandy-facebook/

Page 68

Social Semantic Web Application

Real time

Multi Faceted Analysis

Insights of Important Events including disaster response coordination

http://usatoday30.usatoday.com/news/politics/twitter-election-meter

http://twitris.knoesis.org/

Page 69

Twitris’ Dimensions of Integrated Semantic Analysis

Sheth et al. Twitris- a System for Collective Social Intelligence, ESNAM-2014

Page 70

What is Smart Data in the context of Disaster Management

ACTIONABLE: Timely delivery of the right resources and information to the right people at the right location!

Because everyone wants to help, but they DON’T KNOW HOW!

Page 71

Really sparse Signal to Noise:

• 2M tweets during the first 48 hrs. of #Oklahoma-tornado-2013

- 1.3% as the precise resource donation requests to help

- 0.02% as the precise resource donation offers to help

• Anyone know how to get involved to help the tornado victims in Oklahoma??#tornado #oklahomacity (OFFER)

• I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER)

Disaster Response Coordination:Finding Actionable Nuggets for Responders to act

• Text REDCROSS to 909-99 to donate to those impacted by the Moore tornado! http://t.co/oQMljkicPs (REQUEST)

• Please donate to Oklahoma disaster relief efforts.: http://t.co/crRvLAaHtk (REQUEST)

For responders, most important information is the scarcity and availability of resources

Blog by our colleague Patrick Meier on this analysis: http://irevolution.net/2013/05/29/analyzing-tweets-tornado/
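
To see how sparse the signal is in absolute terms, the percentages on this slide can be converted to tweet counts. A small sketch using only the slide's figures (2M tweets, 1.3% requests, 0.02% offers); the variable names are ours, and "2M" is treated as exactly 2,000,000 for the sake of the arithmetic.

```python
from fractions import Fraction

TOTAL_TWEETS = 2_000_000                 # "2M tweets" on the slide, taken as exact
REQUEST_SHARE = Fraction("1.3") / 100    # 1.3% are donation requests
OFFER_SHARE = Fraction("0.02") / 100     # 0.02% are donation offers

# Exact rational arithmetic avoids floating-point rounding in the counts.
requests = int(TOTAL_TWEETS * REQUEST_SHARE)
offers = int(TOTAL_TWEETS * OFFER_SHARE)
actionable = requests + offers
noise = TOTAL_TWEETS - actionable
requests_per_offer = requests // offers

print(requests, offers, noise, requests_per_offer)
```

So roughly 26,000 requests and 400 offers sit inside almost two million other tweets, and requests outnumber offers by about 65 to 1.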

Page 72: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

Join us for the Social Good!

http://twitris.knoesis.org

RT @OpOKRelief: Southgate Baptist Church

on 4th Street in Moore has food, water, clothes, diapers, toys, and more. If you can't go,call 794

Text \"FOOD\" to 32333, REDCROSS to 90999, or STORM to 80888 to donate $10

in storm relief. #moore #oklahoma

#disasterrelief #donate

Want to help animals in #Oklahoma? @ASPCA tells

how you can help: http://t.co/mt8l9PwzmO

CITIZEN SENSORS

RESPONSE TEAMS (including humanitarian org. and ‘pseudo’ responders)

VICTIM SITE

Coordination of needs and offers Using Social Media

Does anyone know where to send a check to donate to the tornado victims?

Where do I go to help out for volunteer work around Moore? Anyone know?

Anyone know where to donate to help the animals from the Oklahoma disaster? #oklahoma #dogs

Matched

Matched

Matched

Serving the need!

If you would like to volunteer today, help is desperately needed in Shawnee. Call 273-5331 for more info

http://www.slideshare.net/knoesis/iccm-2013ignitetalkhemantpurohitunnairobi

115

Purohit et al. Emergency-relief coordination on social media: Automatically matching resource requests and offers, 2014. With Int’l collaborator QCRI
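The idea of matching needs with offers on this slide can be illustrated with a deliberately simple token-overlap matcher. This is a toy sketch and not the method of Purohit et al.; the stopword list, the Jaccard scoring, the 0.05 threshold, and the function names are all assumptions chosen for illustration. The tweets are the ones shown on the slide.

```python
import re

# Small hand-picked stopword list (an assumption for this sketch only).
STOPWORDS = {
    "a", "an", "the", "to", "for", "in", "of", "on", "from", "is", "if",
    "i", "you", "do", "does", "and", "or", "how", "can", "where", "out",
    "go", "know", "anyone",
}


def tokens(text: str) -> set:
    """Lowercased word tokens with stopwords removed."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(query: str, candidates: list, threshold: float = 0.05):
    """Return the candidate with the highest overlap, or None below threshold."""
    query_tokens = tokens(query)
    score, candidate = max(
        ((jaccard(query_tokens, tokens(c)), c) for c in candidates),
        key=lambda pair: pair[0],
    )
    return candidate if score >= threshold else None


seekers = [
    "Does anyone know where to send a check to donate to the tornado victims?",
    "Where do I go to help out for volunteer work around Moore? Anyone know?",
    "Anyone know where to donate to help the animals from the Oklahoma "
    "disaster? #oklahoma #dogs",
]
providers = [
    'Text "FOOD" to 32333, REDCROSS to 90999, or STORM to 80888 to donate $10 '
    "in storm relief. #moore #oklahoma #disasterrelief #donate",
    "If you would like to volunteer today, help is desperately needed in "
    "Shawnee. Call 273-5331 for more info",
    "Want to help animals in #Oklahoma? @ASPCA tells how you can help: "
    "http://t.co/mt8l9PwzmO",
]

matches = {s: best_match(s, providers) for s in seekers}
for seeker, provider in matches.items():
    print(f"{seeker}\n  -> {provider}\n")
```

Plain word overlap is brittle (it cannot tell "donate" as a request from "donate" as an offer), which is one reason a real system needs classification and background knowledge rather than keyword matching alone.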

Page 73: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

126

Continuous Semantics for Evolving Events to Extract Smart Data

Page 74: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

127

Heliopolis is a suburb of Cairo.

Dynamic Model Creation

Continuous Semantics

Page 75: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

130

• Healthcare: ADHF, Asthma, GI

– Using kHealth system

• Social Media Analysis: Crisis coordination

– Using Twitris platform

• Smart Cities: Traffic management

Smart Data Applications

Page 76: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

131

Traffic Management

To improve everyday life, which is so often entangled in our most common problem: being ‘stuck in traffic’

Page 77: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

132

[1] IBM Smarter Traffic

Severity of the Traffic Problem

Page 78: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

133

Vehicular traffic data from San Francisco Bay Area aggregated from on-road sensors (numerical) and incident reports (textual)

http://511.org/

Per-minute updates of speed, volume, travel time, and occupancy, resulting in 178 million link status observations, 738 active events, and 146 scheduled events, with many unevenly sampled observations, collected over 3 months.

Variety, Volume, Veracity, Velocity

Value: Can we detect the onset of traffic congestion? Can we characterize traffic congestion based on events? Can we estimate traffic delays in a road network?

Semantics: Representing prior knowledge of traffic leads to a focused exploration of this massive dataset

Big Data to Smart Data: Traffic Management example

Page 79: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

134

Duration: 36 months

Requested funding: 2.531.202 €

CityPulse Consortium

City of Aarhus

City of Brasov

Page 80: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

Textual Streams for City Related Events

135

Page 81: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

City Infrastructure

Tweets from a city

POS Tagging

Hybrid NER + Event term extraction

Geohashing

Temporal Estimation

Impact Assessment

Event Aggregation

OSM Locations

SCRIBE ontology

511.org hierarchy

City Event Extraction

City Event Extraction Solution Architecture

City Event Annotation

OSM – OpenStreetMap; NER – Named Entity Recognition

136
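The Geohashing step in the architecture maps a latitude/longitude pair to a short string that identifies a grid cell, so that events falling in the same cell can be aggregated. The sketch below is a minimal implementation of the standard public geohash encoding; the slide does not say which library or precision the pipeline uses, so the function name and the default precision are assumptions.

```python
# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


# Two nearby points share a prefix, so they land in the same coarse cell.
cell_a = geohash_encode(57.64911, 10.40744, precision=5)
cell_b = geohash_encode(57.64900, 10.40700, precision=5)
print(cell_a, cell_b)
```

Shorter hashes cover larger cells, so truncating the string is a cheap way to aggregate events at a coarser spatial granularity.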

Page 82: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

City Event Annotation – CRF Annotation Examples

Last O night O in O CA... O (@ O Half B-LOCATION Moon I-LOCATION Bay B-LOCATION Brewing I-LOCATION Company O w/ O 8 O others) O http://t.co/w0eGEJjApY O

Tags used in our approach: B-LOCATION, I-LOCATION, B-EVENT, I-EVENT, O

These are the annotations provided by a Conditional Random Field model trained on a tweet corpus to spot city-related events and locations

BIO – Beginning, Intermediate, and Other is a notation used in multi-phrase entity spotting

138
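A BIO-tagged sequence such as the example on this slide can be turned back into multi-word entity spans with a short decoder. This is a minimal sketch of the decoding step only (it does not train or run a CRF); the function name `bio_to_spans` and the choice to ignore an I- tag that does not continue an open span are assumptions.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, phrase) spans.

    A B-X tag opens a new span of type X, an I-X tag extends an open span of
    the same type, and anything else closes the current span.
    """
    spans = []
    current_tokens, current_type = [], None

    def flush():
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open span (treated as outside).
            flush()
            current_tokens, current_type = [], None
    flush()
    return spans


# Token/tag pairs exactly as printed on the slide.
annotated = (
    "Last O night O in O CA... O (@ O Half B-LOCATION Moon I-LOCATION "
    "Bay B-LOCATION Brewing I-LOCATION Company O w/ O 8 O others) O "
    "http://t.co/w0eGEJjApY O"
)
parts = annotated.split()
tokens, tags = parts[0::2], parts[1::2]

spans = bio_to_spans(tokens, tags)
print(spans)
```

With the tags as printed, the decoder yields two LOCATION spans, "Half Moon" and "Bay Brewing", which shows how a single mis-tagged token can split one venue name into two entities.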

Page 83: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

City Events from Sensor and Social Streams can be…

• Complementary

• Additional information

• e.g., slow traffic from sensor data and accident from textual data

• Corroborative

• Additional confidence

• e.g., accident event supporting an accident report from ground truth

• Timely

• Additional insight

• e.g., knowing poor visibility before formal report from ground truth
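One way to read the three categories above is as a small decision rule over a social-stream event and an official (sensor or city department) event. The sketch below is an interpretation, not the talk's implementation: the `CityEvent` fields, the use of a shared location cell, and the 60-minute window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CityEvent:
    kind: str    # e.g. "accident", "fog", "slow_traffic"
    cell: str    # hypothetical location key, e.g. a geohash cell
    minute: int  # minutes since a shared reference time


def relate(social: CityEvent, official: CityEvent, window: int = 60) -> set:
    """Label how a social-stream event relates to an official event.

    Events in different cells, or further apart in time than `window`
    minutes, are treated as unrelated (empty set).
    """
    if social.cell != official.cell:
        return set()
    if abs(social.minute - official.minute) > window:
        return set()
    if social.kind != official.kind:
        # A different kind of event at the same place and time adds information.
        return {"complementary"}
    labels = {"corroborative"}  # same kind of event adds confidence
    if social.minute < official.minute:
        labels.add("timely")    # seen on social media before the formal report
    return labels


tweet_accident = CityEvent("accident", "9q8yy", 10)
sensor_slow = CityEvent("slow_traffic", "9q8yy", 12)
tweet_fog = CityEvent("fog", "9q8yy", 5)
report_fog = CityEvent("fog", "9q8yy", 30)

print(relate(tweet_accident, sensor_slow))
print(relate(tweet_fog, report_fog))
```

Returning a set rather than a single label reflects that the categories are not exclusive: an early tweet about fog both corroborates the later report and is timely.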

143

Page 84: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

Events from Social Streams and City Department*

Corroborative Events

Complementary Events

Event Sources

City events extracted from tweets

511.org, Active events e.g., accidents, breakdowns

511.org, Scheduled events e.g., football game, parade

City event from Twitter providing complementary and corroborative evidence for fog reported by 511.org

*511.org

146

Page 85: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

147

Actionable Information in City Management

Tweets from a City

Traffic Sensor Data

OSM Locations

SCRIBE ontology

511.org hierarchy

Web of Data

How can issues in a city be resolved? e.g., what should I do when there are fog conditions?

Page 86: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

149

• Big Data is everywhere

– at the individual level, not just limited to corporations

– with growing complexity: multimodal, Physical-Cyber-Social

• Analysis is not sufficient

• Bottom-up techniques are not sufficient; need top-down processing, need background knowledge

Take Away

Page 87: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

150

Take Away

• Focus on Humans and improve human life and experience with SMART Data.

– Data to Information to Contextually Relevant Abstractions

– Actionable Information (Value from data) to assist and support humans in decision making.

• Focus on Value -- SMART Data

– Tackling Big Data challenges without the intention of deriving Value is a “Journey without GOAL”.

Page 88: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

153

thank you, and please visit us at

http://knoesis.org/vision

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing

Wright State University, Dayton, Ohio, USA

Smart Data

Page 89: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

Ohio Center of Excellence in Knowledge-enabled Computing

• Among top universities in the world in World Wide Web (cf: 5-yr impact, Microsoft Academic Search: shared 2nd place in Mar13)

• Largest academic group in the US in Semantic Web + Social/Sensor Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical & Biomedicine Applications

• Exceptional student success: internships and jobs at top salary (IBM Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research universities, NLM, startups )

• 100 researchers including 15 World Class faculty (>3K citations/faculty) and 45+ PhD students, practically all funded

• $2M+/yr research for largely multidisciplinary projects; world class resources; industry sponsorships/collaborations (Google, IBM, …)

Page 90: TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, Variety, and Velocity using Semantic Techniques and Technologies

155

Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity

using semantics and the Semantic Web

Amit Sheth, Kno.e.sis, Wright State University