transforming big data into smart data
1
Ohio Center of Excellence in Knowledge-enabled Computing
• Shares 2nd position among all universities in the world in World Wide Web (cf: 5-yr impact, Microsoft Academic Search)
• Largest academic group in the US in Semantic Web + Social/Sensor Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical & Biomedicine Applications
• Exceptional student success: internships and jobs at top salary (IBM Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research universities, NLM, startups)
• 100 researchers including 15 World Class faculty (>3K citations/faculty) and 45+ PhD students, practically all funded
• $2M+/yr research for largely multidisciplinary projects; world class resources; industry sponsorships/collaborations (Google, IBM, …)
2011
How much data?
48 (2013)
500 (2013)
3 http://www.knowledgeinfusion.com/blog/2011/11/get-your-head-out-of-the-clouds-and-into-big-data/
1% of the data is
used for analysis.
4 http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume
Variety
Semi structured
5
Velocity
Fast Data
Rapid Changes
Real-Time/Stream Analysis
Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail 6
• Focus on verticals: advertising‚ social media‚ retail‚ financial services‚ telecom‚ and healthcare
– Aggregate data, focused on transactions, limited integration (limited complexity), analytics to find (simple) patterns
– Emphasis on technologies to handle volume/scale, and to a lesser extent velocity: Hadoop, NoSQL, MPP warehouse, …
– Full faith in the power of data (no hypothesis), bottom up analysis
7
Current Focus on Big Data
• What if your data volume gets so large and varied you don't know how to deal with it?
• Do you store all your data?
• Do you analyze it all?
• How can you find out which data points are really important?
• How can you use it to your best advantage?
8
Questions typically asked on Big Data
http://www.sas.com/big-data/
http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/
Variety of Data Analytics Enablers
9
• Prediction of the spread of flu in real time during H1N1 2009
– Google evaluated a mammoth 450 million different mathematical models to test the search terms, comparing their predictions against the actual flu cases; 45 important parameters were found
– Model was tested when H1N1 crisis struck in 2009 and gave more meaningful and valuable real time information than any public health official system [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]
• FareCast: predict the direction of air fares over different routes [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]
• NY city manholes problem [ICML Discussion, 2012]
10
Illustrative Big Data Applications
• Current focus mainly to serve business intelligence and targeted analytics needs, not to serve complex individual and collective human needs (e.g., empowering humans in health, fitness and well-being; better disaster coordination; personalized smart energy) that are highly personalized/individualized/contextualized
– Incorporate real-world complexity: multi-modal and multi-sensory nature of real-world and human perception
– Need deeper understanding of data and its role in information (e.g., skew, coverage)
• Human involvement and guidance: Leading to actionable information, understanding and insight right in the context of human activities
– Bottom-up & Top-down processing: Infusion of models and background knowledge (data + knowledge + reasoning)
11
What is missing?
Makes Sense
Actionable or help decision support/making
12
13
Before Definition – A short recap of SMART DATA
2004-2005
Notice the formulation of Smart Data strategy providing services for Search, Explore, Notify
14
Semagix – A short recap of SMART DATA
Use of Ontologies and Data
repositories to gain relevant
insights
Smart Data
Smart data makes sense out of Big data
It provides value from harnessing the challenges posed by volume, velocity, variety and veracity of big data, in turn providing actionable information and improving decision making.
15
“OF human, BY human and FOR human”
Smart data is focused on the actionable value achieved by human involvement in data
creation, processing and consumption phases for improving
the human experience.
Another perspective on Smart Data
16
“OF human, BY human and FOR human”
Another perspective on Smart Data
18
Petabytes of Physical(sensory)-Cyber-Social Data every day! More on PCS Computing: http://wiki.knoesis.org/index.php/PCS 19
‘OF human’ : Relevant Real-time Data Streams for Human Experience
“OF human, BY human and FOR human”
20
Another perspective on Smart Data
Use of Prior Human-created Knowledge Models
21
‘BY human’: Involving Crowd Intelligence in data processing workflows
Crowdsourcing and Domain-expert guided Machine Learning Modeling
“OF human, BY human and FOR human”
Another perspective on Smart Data
22
Detection of events, such as wheezing
sound, indoor temperature, humidity,
dust, and CO2 level
Weather Application
Asthma Healthcare Application
Close the window at home during the day to keep CO2 from gushing in, to avoid asthma attacks at night
23
‘FOR human’ : Improving Human Experience
Population Level
Personal
Public Health
Action in the Physical World
Electricity usage over a day, device at
work, power consumption, cost/kWh,
heat index, relative humidity, and public
events from social stream
Weather Application
Power Monitoring Application
24
‘FOR human’ : Improving Human Experience
Population Level Observations
Personal Level Observations
Action in the Physical World
Washing and drying resulted in significant cost because they were done during the peak load period. Consider moving them to night.
25
Why do we care about Smart Data
rather than Big Data?
Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity
using semantics and Semantic Web
The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State, USA
Pavan Kapanipathi
Pramod Anantharam
Amit Sheth
Cory Henson
Dr. T.K. Prasad
Maryam Panahiazar
Contributions by many, but Special Thanks to:
Hemant Purohit
Second-costliest hurricane in United States history estimated damage $75 billion
90-115 mph winds
State of Emergency in New York
285 people killed along the track of Sandy
750,000 without power (NY)
Immense devastation and Human suffering
27
Big Data to Smart Data: Disaster Management example
http://www.huffingtonpost.com/2012/10/30/hurricane-sandy-power-outage-map-infographic_n_2044411.html
20 million tweets with “sandy, hurricane” keywords between Oct 27th and Nov 1st
2nd most popular topic on Facebook during 2012
Social (Big) Data during Hurricane Sandy
28
• http://www.guardian.co.uk/news/datablog/2012/oct/31/twitter-sandy-flooding
• http://www.huffingtonpost.com/2012/11/02/twitter-hurricane-sandy_n_2066281.html
• http://mashable.com/2012/10/31/hurricane-sandy-facebook/
For information seeking
For timely information
For unique information
For unfiltered information
To determine disaster magnitude
To check in with family and friends
To self-mobilize
To maintain a sense of community
To seek emotional support and healing
Governments
Emergency management organizations
Journalists
Disaster responders
Public
BIG DATA TO SMART DATA: WHY? and FOR WHOM?
29
Fraustino et al. Social Media Use during Disasters: A Review of the Knowledge Base and Gaps. US Dept. of Homeland Security, START 2012.
Improving situational awareness - Timely delivery of necessary information to the right people
Improving coordination between resource seekers and suppliers
Detecting the magnitude of a disaster from people's sentiments.
Many more challenges…
Can SNSs make Disaster Management easier? – Giving Actionable Information (Smart Data)
30
http://www.buzzfeed.com/annanorth/how-social-media-is-aiding-the-hurricane-sandy-rec http://blog.twitter.com/2012/10/hurricane-sandy-resources-on-twitter.html http://www.treehugger.com/culture/12-ways-help-hurricane-sandy-relief-efforts.html
Volume
Twitter hits half a billion tweets a day!
Challenges
Delivering the necessary/actionable information to the right people
31 http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US
Velocity
Volume
The @ConEdison Twitter handle, which the company had set up only in June, gained an extra 16,000 followers over the storm. – Did the information reach everyone?
Challenges
Delivering the necessary/actionable information to the right people
Rate of data arrival: approximately 7000 TPS; 10 images per second on Instagram
32
http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US http://www.internews.org/sites/default/files/resources/InternewsEurope_Report_Japan_Connecting%20the%20last%20mile%20Japan_2013.pdf
Velocity
Variety
Volume
Semi Structured
Structured
Unstructured
Sensors Linked Open Data
Wikipedia
Challenges
Delivering the necessary/actionable information to the right people
33
Velocity
Variety
Veracity
Volume
Challenges
Delivering the necessary/actionable information to the right people
34 http://www.buzzfeed.com/jackstuef/the-man-behind-comfortablysmug-hurricane-sandys
Velocity
Variety
Veracity
Volume
35
Value
-Make Sense -Actionable Information -Decision support/making
Data http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/ 36
Smart Data focuses on the
value
Value
-Make Sense -Actionable Information -Decision support/making
Disaster Management
Victims
Timely and Contextual Information about • Electricity, Food, Water, Shelter and
donation offers related to the disaster. Data
http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/ 37
Descriptive Exploratory Inferential Predictive
Causal
Human Centric Computing
Improved Analytics Creation
Processing
Experience
38
• Healthcare – kHealth
– SemHealth
• Social event coordination – Twitris
• Traffic monitoring – kTraffic
39
Applications of Smart Data Analytics
The Patient of the Future MIT Technology Review, 2012
http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/ 40
To gain new insight in patient care & early indications of disease
41
Smart Data in Healthcare
Sensing is a key enabler of the Internet of Things
BUT, how do we make sense of the resulting avalanche of sensor data?
50 Billion Things by 2020 (Cisco)
42
Parkinson’s disease (PD) data from The Michael J. Fox Foundation
for Parkinson’s Research.
43
1https://www.kaggle.com/c/predicting-parkinson-s-disease-progression-with-smartphone-data
8 weeks of data from 5 sensors on a smartphone, collected for 16 patients, resulting in ~12 GB (with a lot of missing data).
Variety Volume
Veracity Velocity
Value Can we detect the onset of Parkinson’s disease? Can we characterize the disease progression? Can we provide actionable information to the patient?
semantics
Representing prior knowledge of PD led to a focused exploration of this massive dataset
WHY Big Data to Smart Data: Healthcare example
44
Big Data to Smart Data Using a Knowledge Based Approach
ParkinsonMild(person) = Tremor(person) ∧ PoorBalance(person)
ParkinsonModerate(person) = MoveSlow(person) ∧ PoorSleep(person) ∧ MonotoneSpeech(person)
ParkinsonAdvanced(person) = Fall(person)
Control Group PD Patients
Movements of an active person have a good distribution over the X, Y, and Z axes
Restricted movements by a PD patient can be seen
in the acceleration readings
Audio is well modulated with good variations in the energy of the voice
Audio is not well modulated, representing monotone speech
Declarative Knowledge of Parkinson’s Disease used to focus
our attention on symptom manifestations in sensor
observations
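The declarative rules on this slide can be sketched in code. This is a minimal illustration, assuming each symptom has already been detected from the sensor observations and is available as a boolean flag; the `Symptoms` class and the `parkinson_stages` function are hypothetical names for illustration, not part of the original system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symptoms:
    """Boolean symptom flags, assumed to be derived upstream from sensor data."""
    tremor: bool = False
    poor_balance: bool = False
    move_slow: bool = False
    poor_sleep: bool = False
    monotone_speech: bool = False
    fall: bool = False


def parkinson_stages(s: Symptoms) -> List[str]:
    """Return every stage whose rule (a conjunction of symptoms) is satisfied."""
    stages = []
    if s.tremor and s.poor_balance:
        stages.append("ParkinsonMild")
    if s.move_slow and s.poor_sleep and s.monotone_speech:
        stages.append("ParkinsonModerate")
    if s.fall:
        stages.append("ParkinsonAdvanced")
    return stages
```

Keeping the rules as explicit conjunctions mirrors the slide: the knowledge says which symptom manifestations to look for, so the analysis only needs detectors for those symptoms rather than an undirected search over all sensor data.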
• 25 million people in the U.S. are diagnosed with asthma (7 million are children)1.
• 300 million people suffering from asthma worldwide2.
• Asthma related healthcare costs alone are around $50 billion a year2.
• 155,000 hospital admissions and 593,000 emergency department visits in 20063.
45
1http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/ 2http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html 3Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics,123(Supplement 3), S131-S145.
Asthma: Severity of the problem
46
Patient Health Score (diagnostic)
Semantic Perception and risk assessment algorithms can transform raw data (hard to comprehend) into abstractions (e.g., Patient Health is 3 on a scale of 5) that are intuitively understandable and valuable for decision makers.
Having a health score for each patient will allow efficient use of a decision maker’s precious attention
Risk assessment model
Semantic Perception
Population health record
Personal health record
Expert opinion
Clinical research
Clinical decision support
47
Patient Vulnerability Score (prognostic)
Clinical Decision Support systems, such as the EMR alert system, currently follow a high-recall philosophy by reporting every possible alert!
Doctors need actionable information, not a deluge of alerts, to make timely and important decisions. Providing a vulnerability score would help direct a doctor’s time toward investigating vulnerabilities further.
Risk assessment model
Semantic Perception
Population health record
Personal health record
Expert opinion
Clinical research
Clinical decision support
48
Value: Patient Context
How could Smart Data help?
49
Data Overload for Patients/health aficionados
Providing actionable information in a timely manner is crucial to avoid information overload or fatigue
Sleep data Community data
Personal Schedule Activity data
Personal health records
50
Optimizing Cost, Benefit, and Preferences
Algorithms on the patient side should consider all the health signals and provide actionable and timely information for informed decision making
What are the reasons for my increasing weight? What should I consider before I get a kidney transplant?
Semantic Perception
Personalized optimization
Personalized recommendation
Img: http://marloncarvallovillae.blogspot.com/2011_02_01_archive.html http://www.1800timeclocks.com/icon-time-systems/icon-time-upgrades/icon-time-advanced-pack-upgrade-sb100-pro/
Sleep data
Community data
Personal Schedule
Activity data
Personal health records
51
3.4 billion people will have smartphones or tablets by 2017 -- Research2Guidance
“Intelligence at the Edges” of Digital Health
http://www.digikey.com/us/en/techzone/energy-harvesting/resources/articles/zigbees-smart-energy-20-profile.html
m-health app market is predicted to reach $26 billion in 2017 -- Research2Guidance
Asthma is a multifactorial disease with health signals spanning personal, public health, and population levels.
52
Real-time health signals from personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and population level (e.g., pollen level, CO2) arriving continuously in fine grained samples potentially with missing information and uneven sampling frequencies.
Variety Volume
Veracity Velocity
Value
Can we detect the asthma severity level? Can we characterize asthma control level? What risk factors influence asthma control? What is the contribution of each risk factor?
semantics
Understanding relationships between health signals and asthma attacks for providing actionable information
WHY Big Data to Smart Data: Healthcare example
53
Population Level
Personal
Public Health
Variety: Health signals span heterogeneous sources Volume: Health signals are fine grained Velocity: Real-time change in situations Veracity: Reliability of health signals may be compromised
Value: Can I reduce my asthma attacks at night?
Decision support to doctors by providing them with
deeper insights into patient asthma care
Asthma: Demonstration of Value
54
Sensordrone – for monitoring environmental air quality
Wheezometer – for monitoring wheezing sounds
Can I reduce my asthma attacks at night?
What are the triggers? What is the wheezing level?
What is the propensity toward asthma?
What is the exposure level over a day?
What is the air quality indoors?
Commute to Work
Personal
Public Health
Population Level
Closing the window at home in the morning and taking an alternate route to office may
lead to reduced asthma attacks
Actionable Information
Asthma: Actionable Information for Asthma Patients
Personal, Public Health, and Population Level Signals for Monitoring Asthma
Asthma Control => | Daily Medication Choices for starting therapy | Not Well Controlled | Poor Controlled
Severity Level of Asthma | (Recommended Action) | (Recommended Action) | (Recommended Action)
Intermittent Asthma | SABA prn | - | -
Mild Persistent Asthma | Low dose ICS | Medium ICS | Medium ICS
Moderate Persistent Asthma | Medium dose ICS alone or with LABA/montelukast | Medium ICS + LABA/Montelukast or High dose ICS | Medium ICS + LABA/Montelukast or High dose ICS*
Severe Persistent Asthma | High dose ICS with LABA/montelukast | Needs specialist care | Needs specialist care
ICS = inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA = inhaled short-acting beta2-agonist; * consider referral to specialist
Asthma Control and Actionable Information
Sensors and their observations for understanding asthma
55
56
Personal Level Signals
Societal Level Signals
(Personal Level Signals)
(Personalized Societal Level Signal)
(Societal Level Signals)
Societal Level Signals Relevant to the Personal Level
Personal Level Sensors
(kHealth**) (EventShop*)
Qualify Quantify Action
Recommendation
What are the features influencing my asthma?
What is the contribution of each of these features?
How controlled is my asthma? (risk score)
What will be my action plan to manage asthma?
Storage
Societal Level Sensors
Asthma Early Warning Model (AEWM)
Query AEWM
Verify & augment
domain knowledge
Recommended
Action
Action
Justification
Asthma Early Warning Model
*http://www.slideshare.net/jain49/eventshop-120721, ** http://www.youtube.com/watch?v=btnRi64hJp4
57
Population Level
Personal
Wheeze – Yes Do you have tightness of chest? –Yes
Observations Physical-Cyber-Social System Health Signal Extraction Health Signal Understanding
<Wheezing=Yes, time, location>
<ChestTightness=Yes, time, location>
<PollenLevel=Medium, time, location>
<Pollution=Yes, time, location>
<Activity=High, time, location>
Wheezing
ChestTightness
PollenLevel
Pollution
Activity
Wheezing
ChestTightness
PollenLevel
Pollution
Activity
RiskCategory
<PollenLevel, ChestTightness, Pollution, Activity, Wheezing, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
.
.
.
Expert
Knowledge
Background
Knowledge
tweet reporting pollution level
and asthma attacks
Acceleration readings from
on-phone sensors
Sensor and personal observations
Signals from personal, personal spaces, and community spaces
Risk Category assigned by
doctors
Qualify
Quantify
Enrich
Outdoor pollen and pollution
Public Health
Health Signal Extraction to Understanding
Well Controlled - continue Not Well Controlled – contact nurse Poor Controlled – contact doctor
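The extraction step on this slide turns qualitative observations into numeric feature vectors that can be paired with a doctor-assigned risk category. The sketch below shows one way to do that encoding. It assumes Low/Medium/High map to 1/2/3 and No/Yes map to 0/1, a scheme chosen only because it reproduces the <2, 1, 1, 3, 1> example; the slides do not state the actual scheme. The names `encode` and `recommended_action` are hypothetical.

```python
# Assumed encodings (not stated in the slides); chosen to reproduce <2, 1, 1, 3, 1>.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}
YES_NO = {"No": 0, "Yes": 1}

# Feature order taken from the slide's tuple layout.
FEATURE_ORDER = ["PollenLevel", "ChestTightness", "Pollution", "Activity", "Wheezing"]

# Actions per risk category, as listed on the slide.
ACTIONS = {
    "Well Controlled": "continue",
    "Not Well Controlled": "contact nurse",
    "Poor Controlled": "contact doctor",
}


def encode(observations):
    """Convert a dict of qualitative health signals into a numeric vector."""
    vector = []
    for name in FEATURE_ORDER:
        value = observations[name]
        if value in YES_NO:
            vector.append(YES_NO[value])
        else:
            vector.append(LEVELS[value])
    return vector


def recommended_action(risk_category):
    """Look up the action associated with a doctor-assigned risk category."""
    return ACTIONS[risk_category]
```

Vectors produced this way, labelled with risk categories assigned by doctors, are the kind of training rows the slide shows feeding the risk model.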
58
RDF OWL
How are machines supposed to integrate and interpret sensor data?
Semantic Sensor Networks (SSN)
59
W3C Semantic Sensor Network Ontology
Lefort, L., Henson, C., Taylor, K., Barnaghi, P., Compton, M., Corcho, O., Garcia-Castro, R., Graybeal, J., Herzog, A., Janowicz, K.,
Neuhaus, H., Nikolov, A., and Page, K.: Semantic Sensor Network XG Final Report, W3C Incubator Group Report (2011).
60
W3C Semantic Sensor Network Ontology
61
W3C Semantic Sensor Network Ontology
62
Semantic Annotation of SWE
… and do it efficiently and at scale
Next: What if we could automate the sense making ability?
63
People are good at making sense of sensory input
What can we learn from cognitive models of perception? • The key ingredient is prior knowledge
64
* based on Neisser’s cognitive model of perception
Observe Property
Perceive Feature
Explanation 1
Discrimination 2
Perception Cycle*
Translating low-level signals into high-level knowledge
Focusing attention on those aspects of the environment that provide useful information
Prior Knowledge
65
To enable machine perception,
Semantic Web technology is used to integrate sensor data with prior knowledge on the Web
66
Prior knowledge on the Web
W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph
67
Prior knowledge on the Web
W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph
68
Observe Property
Perceive Feature
Explanation 1
Translating low-level signals into high-level knowledge
Explanation
Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building
69
Explanation
Inference to the best explanation
• In general, explanation is an abductive problem, and hard to compute
Finding the sweet spot between abduction and OWL
• Single-feature assumption* enables use of OWL-DL deductive reasoner
* An explanation must be a single feature which accounts for all observed properties
Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building
70
Explanation
Explanatory Feature: a feature that explains the set of observed properties
ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Observed Property Explanatory Feature
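Under the single-feature assumption, the definition of an explanatory feature amounts to intersecting, over all observed properties, the sets of features each property belongs to. The sketch below illustrates this with plain Python sets rather than an OWL reasoner. The property-to-feature links in `IS_PROPERTY_OF` are assumed for illustration, since the edges of the slide's diagram are not recoverable from the transcript.

```python
# Assumed isPropertyOf links (illustrative only; the slide's diagram edges are lost).
IS_PROPERTY_OF = {
    "elevated blood pressure": {"Hypertension", "Hyperthyroidism", "Pulmonary Edema"},
    "clammy skin": {"Hyperthyroidism", "Pulmonary Edema"},
    "palpitations": {"Hypertension", "Hyperthyroidism"},
}


def explanatory_features(observed):
    """Features that account for every observed property (set intersection)."""
    feature_sets = [IS_PROPERTY_OF[p] for p in observed]
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)
```

With these assumed links, observing elevated blood pressure and palpitations leaves Hypertension and Hyperthyroidism as the candidate explanations.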
71
Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features
Observe Property
Perceive Feature
Explanation
Discrimination 2
Focusing attention on those aspects of the environment that provide useful information
Discrimination
72
Discrimination
Expected Property: would be explained by every explanatory feature
ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Expected Property Explanatory Feature
73
Discrimination
Not Applicable Property: would not be explained by any explanatory feature
NotApplicableProperty ≡ ¬∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ¬∃ssn:isPropertyOf.{fn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Not Applicable Property Explanatory Feature
74
Discrimination
Discriminating Property: is neither expected nor not-applicable
DiscriminatingProperty ≡ ¬ExpectedProperty ⊓ ¬NotApplicableProperty
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Discriminating Property Explanatory Feature
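The three property classes used in discrimination can be computed by comparing each property's features against the current set of explanatory features. This is a minimal sketch with plain sets, not the OWL-DL formulation itself. The links in `IS_PROPERTY_OF` are assumed for illustration, and `classify_properties` is a hypothetical name.

```python
# Assumed isPropertyOf links (illustrative only).
IS_PROPERTY_OF = {
    "elevated blood pressure": {"Hypertension", "Hyperthyroidism", "Pulmonary Edema"},
    "clammy skin": {"Hyperthyroidism", "Pulmonary Edema"},
    "palpitations": {"Hypertension", "Hyperthyroidism"},
}


def classify_properties(explanatory):
    """Label each property as expected, not applicable, or discriminating."""
    if not explanatory:
        raise ValueError("need at least one explanatory feature")
    labels = {}
    for prop, features in IS_PROPERTY_OF.items():
        shared = features & explanatory
        if shared == explanatory:
            labels[prop] = "expected"          # explained by every explanatory feature
        elif not shared:
            labels[prop] = "not applicable"    # explained by none of them
        else:
            labels[prop] = "discriminating"    # explained by some, so worth observing next
    return labels
```

With Hypertension and Hyperthyroidism as the explanatory features, only clammy skin comes out as discriminating under these assumed links, so it is the one property worth observing next.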
75
Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information
canary in a coal mine
Our Motivation
kHealth: knowledge-enabled healthcare
76
Qualities -High BP -Increased Weight
Entities -Hypertension -Hypothyroidism
kHealth
Machine Sensors
Personal Input
EMR/PHR
Comorbidity risk score e.g., Charlson Index
Longitudinal studies of cardiovascular risks
- Find correlations - Validation - domain knowledge - domain expert
Parameterize the model
Risk Assessment Model
Current Observations -Physical -Physiological -History
Risk Score (Actionable Information)
Model Creation Validate correlations
Historical observations of each patient
Risk Score: from Data to Abstraction and Actionable Information
77
How do we implement machine perception efficiently on a resource-constrained device?
Use of OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time
• Runs out of resources with prior knowledge >> 15 nodes
• Asymptotic complexity: O(n^3)
78
intelligence at the edge
Approach 1: Send all sensor observations to the cloud for processing
Approach 2: downscale semantic processing so that each device is capable of machine perception
79
Henson et al. 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices', ISWC 2012.
Efficient execution of machine perception
Use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning
0101100011010011110010101100011011011010110001101001111001010110001101011000110100111
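To illustrate the bit vector idea, the sketch below encodes each property as an integer bitmask over the known features, so that explanation reduces to a bitwise AND. It is a simplified illustration under assumed property-to-feature links and is not the exact encoding or algorithm of the cited ISWC 2012 paper; `explain` and `decode` are hypothetical names.

```python
FEATURES = ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"]

# One bit per feature: bit i is set when feature i is involved.
BIT = {name: 1 << i for i, name in enumerate(FEATURES)}

# Assumed isPropertyOf links (illustrative only), stored as bitmasks.
PROPERTY_MASKS = {
    "elevated blood pressure": BIT["Hypertension"] | BIT["Hyperthyroidism"] | BIT["Pulmonary Edema"],
    "clammy skin": BIT["Hyperthyroidism"] | BIT["Pulmonary Edema"],
    "palpitations": BIT["Hypertension"] | BIT["Hyperthyroidism"],
}


def explain(observed):
    """AND the masks of all observed properties; surviving bits are explanations."""
    mask = (1 << len(FEATURES)) - 1  # start with every feature as a candidate
    for prop in observed:
        mask &= PROPERTY_MASKS[prop]
    return mask


def decode(mask):
    """Turn a bitmask back into a list of feature names."""
    return [name for name in FEATURES if mask & BIT[name]]
```

Each additional observation costs one AND over a fixed-width word, which is the intuition behind moving this kind of reasoning onto a resource-constrained device.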
80
O(n^3) < x < O(n^4) reduced to O(n)
Efficiency Improvement
• Problem size increased from 10’s to 1000’s of nodes
• Time reduced from minutes to milliseconds
• Complexity growth reduced from polynomial to linear
Evaluation on a mobile device
81
Semantic Perception for smarter analytics: 3 ideas to take away
1 Translate low-level data to high-level knowledge
Machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making
2 Prior knowledge is the key to perception
Using SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web
3 Intelligence at the edge
By downscaling semantic inference, machine perception can execute efficiently on resource-constrained devices
82
• Real Time Feature Streams: http://www.youtube.com/watch?v=_ews4w_eCpg
• kHealth: http://www.youtube.com/watch?v=btnRi64hJp4
83
Demos
84
Smart Data in Social Media Analytics
To Understand the human social dynamics in real world events
0.5B Tweets per day
0.5B Users
60% on Mobile
5530 Tweets per second related to the Japan earthquake and tsunami
17000 Tweets per second
85
Twitter During Real-world Events of Interest
http://www.flickr.com/photos/twitteroffice/5897088517/sizes/o/in/photostream/ http://bayarea.sbnation.com/49ers/2013/2/3/3947738/super-bowl-prop-bets-2013-twitter http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/
86 http://usatoday30.usatoday.com/news/politics/twitter-election-meter
http://twitris.knoesis.org/
State of the Art – Uni/Bi Dimensional Analysis During Elections
Topics
Sentiments
87
Twitris’ Dimensions of Integrated Semantic Analysis
88 Sheth et al. Twitris- a System for Collective Social Intelligence, ESNAM-2013
89 http://semanticweb.com/picking-the-president-twindex-twitris-track-social-media-electorate_b31249 http://semanticweb.com/election-2012-the-semantic-recap_b33278
90
[The screenshots of Twitris+ were taken on Nov. 6th 6 PM EST]
91
Twitris: Sentiment Analysis- Smart Answers with reasoning!
How was Obama doing in the first debate?
92
Red Color: Negative Topics Green Color: Positive Topics
Twitris: Sentiment Analysis- Smart Answers with reasoning!
How was Obama doing in the second debate?
SMART DATA IS ABOUT ANALYSIS FOR REASONING (what caused the positive sentiment for Democrats) BEHIND THE REAL-WORLD ACTIONS (Democrats’ win)
http://knoesis.wright.edu/library/resource.php?id=1787
Top 100 influential users that talk about Barack Obama
Positive or Negative Influence
Twitris: Network Analysis
SMART DATA TELLS YOU HOW A SYSTEM CAN BE TWEAKED FOR THE DESIRED ACTIONS!
Could we engage with users (targeted) with extreme polarity leaning for Obama to spark an agenda in the whole
network of voters (ACTION)? 93
Twitris: Community Evolution
SMART DATA FOCUSES ON THE CAUSALITY OF CHANGES IN REAL-WORLD ACTIONS!
Romney
Obama
Evolution of influencer interaction networks for Romney vs. Obama topical communities, during U.S. Presidential Election 2012 debates
Before 1st debate
After 1st debate
After Hurricane Sandy
After 3rd debate
94
The Dead People mentioned in the event OWC
Twitris: Impact of Background Knowledge
95
How People from Different parts of the world talked
about US Election
Images and Videos Related to US Election
Twitris: Analysis by Location
96
What is Smart Data in the context of Disaster Management
ACTIONABLE: Timely delivery of the right resources and information to the right people at the right location!
97
Because everyone wants to Help, but DOESN’T KNOW HOW!
Join us for the Social Good! http://twitris.knoesis.org
RT @OpOKRelief: Southgate Baptist Church
on 4th Street in Moore has food, water, clothes, diapers, toys, and more. If you can't go,call 794
Text "FOOD" to 32333, REDCROSS to 90999, or STORM to 80888 to donate $10
in storm relief. #moore #oklahoma
#disasterrelief #donate
Want to help animals in #Oklahoma? @ASPCA tells
how you can help: http://t.co/mt8l9PwzmO
CITIZEN SENSORS
RESPONSE TEAMS (including humanitarian
org. and ‘pseudo’ responders)
VICTIM SITE
Coordination of needs and offers Using Social Media
Does anyone know where to send a check to donate to the tornado victims?
Where do I go to help out for volunteer work around Moore? Anyone know?
Anyone know where to donate to help the animals from the Oklahoma disaster? #oklahoma #dogs
Matched
Matched
Matched
Serving the need!
If you would like to volunteer today, help is desperately needed in Shawnee. Call 273-5331 for more info
http://www.slideshare.net/hemant_knoesis/cscw-2012-hemantpurohit-11531612 98 Purohit et al. Framework to Analyze Coordination in Crisis Response, 2012. Int’l Collaboration in-progress: with QCRI
Smart Data from Twitris system for Disaster Response Coordination
Which are the primary locations with most negative sentiments/emotions?
Who are all the people to engage with for better information
diffusion? Which are the most important organizations acting at my
location?
Smart data provides actionable information and improves decision making through semantic analysis of Big Data.
Who are the resource seekers and suppliers? How can one donate?
99
Source: Purohit et al. 2013, Information Filtering and Management Model for Disaster Response Coordination 100
Disaster Response Coordination Framework
Disaster Response Coordination: Twitris Summary for Actionable Nuggets
101
Important tags to summarize Big Data flow
Related to Oklahoma tornado
Images and Videos Related to Oklahoma tornado
102
Disaster Response Coordination: Twitris Real-time information for needs
Incoming Tweets with need types to give quick idea of what is needed and where
currently #OKC
Legends for Different needs #OKC
(This is a real-time widget for monitoring needs, so it will not be active after the event has passed) http://twitris.knoesis.org/oklahomatornado
103
Disaster Response Coordination: Influencers to engage with for specific needs
Influential users for the respective needs, with their interaction network on the right.
Really sparse Signal to Noise:
• 2M tweets during the first week after #Oklahoma-tornado-2013
- 1.3% as the highly precise donation requests to help
- 0.02% as the highly precise donation offers to help
104
• Anyone know how to get involved to help the tornado victims in Oklahoma?? #tornado #oklahomacity (OFFER)
• I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER)
Disaster Response Coordination: Finding Actionable Nuggets for Responders to act
• Text REDCROSS to 909-99 to donate to those impacted by the Moore tornado! http://t.co/oQMljkicPs (REQUEST)
• Please donate to Oklahoma disaster relief efforts.: http://t.co/crRvLAaHtk (REQUEST)
For responders, the most important information is the scarcity and availability of resources. Can we mine it via social media?
• Features driven by the experience of domain experts at the responder organizations
• Examples:
– ‘I want to <donate/ help/ bring>’ for extraction of offering intention
– ‘tent house’ OR ‘cots’ for shelter need types
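The two example features above can be sketched as simple text patterns. This is a minimal illustration, not the Twitris implementation: the regular expressions, the label names `OFFER` and `SHELTER_NEED`, and the function `tag_tweet` are all hypothetical, and a real system would rely on a much larger lexicon curated with domain experts.

```python
import re

# Hypothetical patterns modeled on the two examples from the slide.
OFFER_PATTERN = re.compile(r"\bI want to (donate|help|bring)\b", re.IGNORECASE)
SHELTER_PATTERN = re.compile(r"\b(tent house|cots)\b", re.IGNORECASE)


def tag_tweet(text):
    """Return the set of labels whose pattern matches the tweet text."""
    labels = set()
    if OFFER_PATTERN.search(text):
        labels.add("OFFER")
    if SHELTER_PATTERN.search(text):
        labels.add("SHELTER_NEED")
    return labels


print(tag_tweet("I want to donate to the Oklahoma cause shoes clothes even food if I can"))
```

Patterns like these favor precision over recall, which fits the goal of surfacing a small number of actionable tweets from a very noisy stream.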
105
Disaster Response Coordination: Human Knowledge to drive information extraction
• A knowledge-driven approach
– A rich inventory of metadata for tweets
– Semantic matching for needs (query) vs. offers (documents)
• Example:
– @bladesofmilford please help get the word out,we are accepting kid clothes to send to the lil angels in Oklahoma.Drop off @MilfordGreenPiz (REQUEST)
– I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER)
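One way to picture "needs as queries, offers as documents" is a small ranking function. The sketch below is a simplified stand-in for semantic matching: it maps words to resource types through a hand-made lexicon and ranks offers by Jaccard overlap. The lexicon `RESOURCE_LEXICON` and the function names are assumptions made for illustration.

```python
import re

# Hypothetical resource lexicon: surface word -> resource type.
RESOURCE_LEXICON = {
    "clothes": "clothing", "clothing": "clothing", "shoes": "clothing",
    "food": "food", "water": "water",
    "cots": "shelter", "tent": "shelter",
}


def resource_types(text):
    """Map a tweet to the set of resource types it mentions."""
    words = re.findall(r"[a-z]+", text.lower())
    return {RESOURCE_LEXICON[w] for w in words if w in RESOURCE_LEXICON}


def rank_offers(request, offers):
    """Rank offers (documents) against a request (query) by Jaccard overlap."""
    needed = resource_types(request)
    scored = []
    for offer in offers:
        offered = resource_types(offer)
        union = needed | offered
        score = len(needed & offered) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, offer))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [offer for _score, offer in scored]


request = ("@bladesofmilford please help get the word out,we are accepting kid "
           "clothes to send to the lil angels in Oklahoma.Drop off @MilfordGreenPiz")
offers = [
    "I want to donate to the Oklahoma cause shoes clothes even food if I can",
    "I can bring bottled water",
]
print(rank_offers(request, offers))
```

Here the clothing request matches the first offer (shared resource type "clothing") and ignores the water offer, which is the kind of matchmaking a human coordinator would otherwise do by hand.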
106
Disaster Response Coordination: Automatic Matching of needs and offers
Matching the competitive intentions (needs and offers) automatically can offload the task of resource matchmaking from humans during coordination.
107
Disaster Response Coordination: Engagement Interface for responders
What-Where-How-Who-Why Coordination
Influential users to engage with, and resources for seekers/suppliers at a location, at a timestamp
Contextual information for chosen topical tags
• Illustrative scenario: #Oklahoma-tornado 2013
108
Disaster Response Coordination: Anecdote for the value of Smart Data
FEMA asked us to quickly filter out gas-leak related data
Mining the data for smart nuggets to inform FEMA (Timely needs)
Engaged with the author of this information to confirm (Veracity)
e.g., All gas leaks in #moore were capped and stopped by 11:30 last night (at 5/22/2013 1:41:37)
Many tweets ask ‘how to/where to’ assist (‘pseudo’ responders)
e.g., I want to go to Oklahoma this weekend & do what i can to help those people with food,cloths & supplies,im in the feel of wanting to help ! :)
An event is a dynamic topic that evolves and might later fork into several distinct events.
Smart Data analytics to capture rapidly evolving social data events
109
Social Media is the pulse of the populace, a true reflection of events all over the globe!
Continuous Semantics
110
Dynamic Model Creation:
112
Example of how background knowledge helps in understanding the situation described in the tweets, while also updating the knowledge model
How is Continuous Semantics a form of Smart Data Analytics?
Keeping the background knowledge abreast of the changes in the event
Smartly learning and adapting data acquisition (temporally apt Big Data, i.e. Fast Data)
In turn, providing temporally relevant Smart Data through analysis
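The feedback loop described above (analysis updates the knowledge, and the knowledge steers what is collected next) can be sketched in a few lines. This is a deliberately simplified illustration that uses co-occurring hashtags as a stand-in for a background knowledge model; the function `update_keywords` and the `min_count` threshold are assumptions, not part of the system described in the slides.

```python
from collections import Counter
import re


def update_keywords(keywords, tweets, min_count=2):
    """Add hashtags that co-occur often with already-tracked keywords."""
    counts = Counter()
    for text in tweets:
        tags = {t.lower() for t in re.findall(r"#(\w+)", text)}
        if tags & keywords:                  # tweet is about the tracked event
            counts.update(tags - keywords)   # candidate new terms
    new_terms = {tag for tag, n in counts.items() if n >= min_count}
    return keywords | new_terms


tracked = {"oklahoma"}
batch = [
    "Help needed #oklahoma #moore",
    "Gas leaks capped #moore #oklahoma",
    "Unrelated post #dogs",
    "#oklahoma #okc shelters open",
]
tracked = update_keywords(tracked, batch)
print(sorted(tracked))
```

Running the update on each new batch lets the tracked vocabulary follow the event as it evolves, so later crawls pick up terms (here `moore`) that were unknown when collection started.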
113
114
Smart Data Analytics in Traffic Management
To improve everyday life, which is entangled with one of our most common problems: getting stuck in traffic
By 2001 over 285 million Indians lived in cities, more than in all North American cities combined (Office of the Registrar General of India 2001)1
1 The Crisis of Public Transport in India; 2 IBM Smarter Traffic
Modes of transportation in Indian Cities
Texas Transportation Institute (TTI) Congestion report in U.S.
115
Severity of the Traffic Problem
Vehicular traffic data from San Francisco Bay Area aggregated from on-road sensors (numerical) and incident reports (textual)
116
http://511.org/
Speed, volume, travel time, and occupancy are updated every minute, resulting in 178 million link status observations, 738 active events, and 146 scheduled events, with many unevenly sampled observations, collected over 3 months.
Variety, Volume, Veracity, Velocity
Value:
• Can we detect the onset of traffic congestion?
• Can we characterize traffic congestion based on events?
• Can we provide actionable information to decision makers?
semantics
Representing prior knowledge of traffic leads to a focused exploration of this massive dataset
Big Data to Smart Data: Traffic Management example
[Figure labels: Slow moving traffic; Link Description; Scheduled Event; Schedule Information; Traffic Monitoring (source: 511.org)]
117
Heterogeneity in a Physical-Cyber-Social System
118
• Observation: Slow Moving Traffic
• Multiple causes (uncertain about the cause):
– Scheduled Events: music events, fair, theatre events, concerts, road work, repairs, etc.
– Active Events: accidents, disabled vehicles, breakdown of roads/bridges, fire, bad weather, etc.
– Peak hour: e.g. 7 am – 9 am OR 4 pm – 6 pm
• Each of these events may have a varying impact on traffic.
• A delay prediction algorithm should process multimodal and multi-sensory observations.
Uncertainty in a Physical-Cyber-Social System
119
• Internal observations
– Speed, volume, and travel time observations
– Correlations may exist between these variables across different parts of the network
• External events
– Accident, music event, sporting event, and planned events
– External events and internal observations may exhibit correlations
Modeling Traffic Events
120
Accident
Music event
Sporting event
Road Work
Theatre event
External events <ActiveEvents, ScheduledEvents>
Internal observations <speed, volume, travelTime>
Weather
Time of Day
Modeling Traffic Events
121
Domain Experts
cold
PoorVisibility
SlowTraffic
IcyRoad
Declarative domain knowledge
Causal knowledge
Linked Open Data
Cold (YES/NO)   IcyRoad (ON/OFF)   PoorVisibility (YES/NO)   SlowTraffic (YES/NO)
1               0                  1                         1
1               1                  1                         0
1               1                  1                         1
1               0                  1                         0
Domain Observations
Domain Knowledge
Structure and parameters
Complementing Probabilistic Models with Declarative Knowledge
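The split between "structure" (from domain knowledge) and "parameters" (from domain observations) can be illustrated with the four observation rows above. This is a minimal sketch: which variable is treated as the parent of which is an assumption here (for example IcyRoad as a parent of SlowTraffic), and the function `conditional_probability` is a hypothetical helper using plain maximum-likelihood counting.

```python
# Observations copied from the table above:
# (Cold, IcyRoad, PoorVisibility, SlowTraffic)
OBSERVATIONS = [
    (1, 0, 1, 1),
    (1, 1, 1, 0),
    (1, 1, 1, 1),
    (1, 0, 1, 0),
]
COLUMNS = ("Cold", "IcyRoad", "PoorVisibility", "SlowTraffic")


def conditional_probability(child, parent, parent_value):
    """Maximum-likelihood estimate of P(child = 1 | parent = parent_value)."""
    c, p = COLUMNS.index(child), COLUMNS.index(parent)
    matching = [row for row in OBSERVATIONS if row[p] == parent_value]
    if not matching:
        return None  # no evidence for this parent value
    return sum(row[c] for row in matching) / len(matching)


# Assumed edge IcyRoad -> SlowTraffic, supplied by declarative knowledge.
print(conditional_probability("SlowTraffic", "IcyRoad", 1))
# No row has Cold = 0, so the data alone cannot estimate this parameter.
print(conditional_probability("IcyRoad", "Cold", 0))
```

The second call returns `None` because Cold is 1 in every row. Gaps like this in the observations are where declarative knowledge or priors derived from external sources have to complement the data.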
123
Correlations to causations using Declarative knowledge on the Semantic Web
• Declarative knowledge about various domains is increasingly being published on the web1,2.
• Declarative knowledge describes concepts and relationships in a domain (structure).
• Linked Open Data may be used to derive prior probabilities of events (parameters).
• We explored the use of declarative knowledge for structure using ConceptNet 5.
1 http://conceptnet5.media.mit.edu/ 2 http://linkeddata.org/
Domain Knowledge
124
http://conceptnet5.media.mit.edu/web/c/en/traffic_jam
[Figure: ConceptNet 5 subgraph around "traffic jam". Edges legible in the figure:
– go to baseball game Causes traffic jam
– traffic accident Causes traffic jam
– traffic jam CapableOf slow traffic
– traffic jam CapableOf occur twice each day
– bad weather CapableOf slow traffic
– road ice Causes accident
– go to concert HasSubevent car crash
– accident RelatedTo car crash
Concepts are linked by is_a to the model variables Delay, ActiveEvent, ScheduledEvent, TimeOfDay, and BadWeather.]
ConceptNet 5
125
Three Operations: Complementing graphical model structure extraction
Traffic data from sensors deployed on the road network in the San Francisco Bay Area
Initial variables: traffic jam (Link Description), baseball game (Scheduled Event)
1. Add missing random variables: time of day, bad weather
2. Add missing links
3. Add link direction
Knowledge from ConceptNet 5:
– go to baseball game Causes traffic jam
– traffic jam CapableOf occur twice each day
– traffic jam CapableOf slow traffic
– bad weather CapableOf slow traffic
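The three operations (add missing random variables, add missing links, add link direction) can be sketched over a toy graph. This is an illustration under stated assumptions rather than the authors' procedure: the mapping from ConceptNet concepts to variable names is done by hand, every relation is read as pointing from subject to object, and the function `enrich` is hypothetical.

```python
# Undirected links assumed to come from statistical structure learning.
variables = {"traffic jam", "baseball game", "slow traffic"}
undirected = {frozenset({"traffic jam", "baseball game"})}

# Triples from the slide, with concepts mapped by hand to variable names
# (an assumption, e.g. "go to baseball game" -> "baseball game").
knowledge = [
    ("baseball game", "Causes", "traffic jam"),
    ("traffic jam", "CapableOf", "slow traffic"),
    ("bad weather", "CapableOf", "slow traffic"),
]


def enrich(variables, undirected, knowledge):
    """Apply the three operations; return (variables, directed, undirected)."""
    variables = set(variables)
    directed = set()
    for source, _relation, target in knowledge:
        # 1. Add missing random variables.
        variables.update({source, target})
        # 2 and 3. Add the link if it is missing and orient it source -> target.
        directed.add((source, target))
    # Statistical links the knowledge base says nothing about stay undirected.
    covered = {frozenset(edge) for edge in directed}
    still_undirected = set(undirected) - covered
    return variables, directed, still_undirected


new_vars, edges, leftover = enrich(variables, undirected, knowledge)
print(sorted(new_vars))
print(sorted(edges))
```

In this toy run "bad weather" enters the model as a new variable, the baseball game to traffic jam link gains a direction, and two links that the data alone did not contain are added.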
126
127
Scheduled Event
Active Event
Day of week Time of day
delay
Travel time
speed
volume
Structure extracted from traffic observations (sensors + textual) using statistical techniques
Scheduled Event
Active Event
Day of week
Time of day
delay Travel time
speed
volume
Bad Weather
Enriched structure which has link directions and new nodes such as “Bad Weather” potentially leading to better delay predictions
Enriched Probabilistic Models using ConceptNet 5
Take Away
• It is all about the human, not computing, not device
– Computing for human experience
• Whatever we do in Smart Data, focus on human-in-the-loop (empowering machine computing!):
– Of Human, By Human, For Human
– But in serving human needs, there is a lot more than what current big data analytics handle: variety, contextual, personalized, subjective, spanning data and knowledge across P-C-S dimensions
129
Acknowledgements
• Kno.e.sis team
• Funds: NSF, NIH, AFRL, Industry…
• Note:
• For images and sources, if not on slides, please see slide notes
• Some images were taken from Web search results, and all such images belong to their respective owners. We are grateful to the owners for the usefulness of these images in our context.
130
• OpenSource: http://knoesis.org/opensource
• Showcase: http://knoesis.org/showcase
• Vision: http://knoesis.org/node/266
• Publications: http://knoesis.org/library
131
References and Further Readings
Thanks …
132
133
Physical Cyber Social Computing
Amit Sheth, Kno.e.sis, Wright State
Amit Sheth’s PhD students
Ashutosh Jadhav
Hemant Purohit
Vinh Nguyen
Lu Chen
Pavan Kapanipathi
Pramod Anantharam
Sujan Perera
Alan Smith
Pramod Koneru
Maryam Panahiazar
Sarasi Lalithsena
Cory Henson
Kalpa Gunaratna
Delroy Cameron
Sanjaya Wijeratne
Wenbo Wang
Kno.e.sis in 2012 = ~100 researchers (15 faculty, ~50 PhD students)
135
thank you, and please visit us at
http://knoesis.org/vision
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
Smart Data