data-driven application systems data science for dynamic...
TRANSCRIPT
Kirk Borne@KirkDBorne
Principal Data ScientistBooz Allen Hamilton, Strategic Innovation Group
http://www.boozallen.com/datascience
Data Science for Dynamic Data-Driven Application Systems in the Internet of Things (IoT)
Internet of Things
https://www.nsf.gov/news/news_images.jsp?cntn_id=122028
EverythingInterconnected
https://www.nsf.gov/news/news_images.jsp?cntn_id=122028
http://www.beechamresearch.com/article.aspx?id=4
http://www.datasciencecentral.com/profiles/blogs/what-is-the-internet-of-everything-ioe
1) The 3 E’s of an IoT World▪ Everywhere a sensor▪ Everything quantified and tracked (temporally & spatially)▪ Erosion of Data Privacy
2) The 3 V’s of IoT Data (Deep, Wide, Fast all the time)▪ (Volume) Deep Data = Ubiquitous Sensors everywhere▪ (Variety) Wide Data = Diverse and Complex Data Types▪ (Velocity) Fast Data = Streaming time series all the time
3) The 3 D2D’s of IoT ROI (Return On Innovation)▪ Data2Discovery –– Data2Decisions –– Data2Dollars
(or Data2Dividends)
Internet of Things (IoT) :so many challenges and benefits!
IoT Use Case examples■ Retail (Dynamic Pricing, Smart Supply Chain, Precision Demand Forecasting)■ Marketing (Personalized Real-time Ad Campaigns for Next Best Offer)■ Smart Highways (monitoring vehicles, weather, road conditions, closures)■ Precision Traffic (Self-driving & Self-parking Connected Cars)■ Smart Cities (Growth, Dynamic Street-lighting, Smart Energy Usage)■ Law Enforcement (Predictive, Prescriptive personnel & resource placements)■ Healthcare (Wearables, Personalized Medicine, Patient/Provider Monitoring)■ Online Education (Personalized Learning, Real-time interventions)■ Forests, Farms, Vineyards,… (Precision Planning, Nurturing, Harvesting)■ Financial / Banking / Insurance (Real-time Risk Mitigation, Fraud detection)■ Organizations (Smart Ergonomics, Improved Employee Workflow, Process
Mining for Efficiencies)■ Invisibles (under-the-skin smart sensors – not only measure, but also learn,
react, and proactively respond) = The Internet of Emotions!■ Machines (Early Warning, Prescriptive Maintenance, Smart Obsolescence,
M2M, IIoT = Industrial IoT)
■ Smart X■ Precision Y■ Personalized Z
http://blog.autoserviceintelligence.com/a-smart-crm-in-action-building-consumer-trust-part-2
In a nutshell … the XYZ of IoT:Intelligence at the edge of the network
(at the point of data collection)
The Internet of Things (IoT)is an interconnected universe
of Dynamic Data-Driven Application Systems (DDDAS)
https://www.nsf.gov/news/news_images.jsp?cntn_id=122028
The BIG Big Data Challenge in the IoT• Streaming Data Analytics:❖ Real-Time Event Mining for Actionable Intelligence : ❑ Identifying, characterizing, & responding to millions of events in real-time streaming data❑ Deciding which events (out of millions) need investigation and/or response (= Triage!)
• Web Analytics example:❑ Web Behavior Modeling and Automated System Response (from
online interactions & web browse patterns, behavioral analytics, user segmentation, data-driven discovery,…)
• Many other examples:❖ Health alerts (from EHRs and national health systems)❖ Tsunami alerts (from geo sensors everywhere)❖ Cybersecurity alerts (from network logs)❖ Social event alerts or early warnings (from social media)❖ Preventive Fraud alerts (from financial applications)❖ Predictive Maintenance alerts (from machine / engine sensors)❖ Infrastructure Monitoring alerts (from ubiquitous sensors) Ris
k M
itig
atio
n
Big Data and the fundamental business conflict:
RISK versus REWARD
http://www.telegraph.co.uk/news/worldnews/europe/russia/10061780/Russian-convicts-beat-Americans-in-cyber-chess-battle.html
Creating Rewards (Value) from Big Data in the IoT : The 3 D2D’so Knowledge Discovery
– Data-to-Discovery (D2D)o Data-driven Decision Support
– Data-to-Decisions (D2D)o Big ROI (Return On Innovation)
– Data-to-Dollars or Data-to-Dividends (D2D)– Innovative Applications of sense-making from
IoT sensors and sentinels everywhere
11
1) Class Discovery■ Finding new classes of objects, events, and behaviors■ Learning the rules that constrain class boundaries
2) Correlation (Predictive Power!) Discovery■ Finding patterns and dependencies, which reveal new
governing principles or behavioral patterns (the “DNA”)
3) Association (or Link) Discovery■ Finding unusual (improbable) co-occurring associations
4) Novelty (Surprise!) Discovery ■ Finding new, rare, one-in-a-[million / billion / trillion]
objects and events
Data Science = 4 Types of DiscoveryLearning from Data in the IoT
Consider 3 Case Studies:from the universe of IoT possibilities,
as metaphors for your IoT / DDDAS* challenges[*Dynamic Data-Driven Application Systems]
https://www.nsf.gov/news/news_images.jsp?cntn_id=122028
Case Study #1:Astronomy Big Data
The LSST (Large Synoptic Survey Telescope)14
Creating Rewards (Value) from Big Data in the IoT : The 3 D2D’s✓ Knowledge Discovery
– Data-to-Discovery (D2D)✓ Data-driven Decision Support
– Data-to-Decisions (D2D)✓ Big ROI (Return On Innovation)
– Data-to-Dollars or Data-to-Dividends (D2D)– Innovative Applications of sense-making from
IoT sensors and sentinels everywhere
15
LSST = Large
Synoptic Survey
Telescopehttp://www.lsst.org/
8.4-meter diameterprimary mirror =10 square degrees!
Hello !
(mirror funded by private donors)
16
LSST = Large
Synoptic Survey
Telescopehttp://www.lsst.org/
8.4-meter diameterprimary mirror =10 square degrees!
Hello !
(mirror funded by private donors)
17
Construction began August 2014
(funded by NSF and DOE)
LSST = Large
Synoptic Survey
Telescopehttp://www.lsst.org/
8.4-meter diameterprimary mirror =10 square degrees!
Hello !
(mirror funded by private donors)
18
– 100-200 Petabyte image archive– 20-40 Petabyte database catalog
LSST in time and space:– When? ~2022-2032– Where? Cerro Pachon, Chile
LSST Key Science Drivers: Mapping the Dynamic Universe– Complete inventory of the Solar System (Near-Earth Objects; killer asteroids???)– Nature of Dark Energy (Cosmology; Supernovae at edge of the known Universe)– Optical transients (10 million daily event notifications sent within 60 seconds)– Digital Milky Way (Dark Matter; Locations and velocities of 20 billion stars!)
Architect’s designof LSST Observatory
19
LSST Summaryhttp://www.lsst.org/
• 3-Gigapixel camera• One 6-Gigabyte image every 20 seconds• 30 Terabytes every night for 10 years• Repeat images of the entire night sky every
3 nights: Celestial Cinematography• 100-200 Petabyte final image data archive
anticipated – all data are public!!!• 20-40 Petabyte final database catalog
anticipated• Real-Time Event Mining: ~10 million events
per night, every night, for 10 years!– Follow-up observations required to classify these– Which ones should we follow up? … TRIAGE!
… Decisions! Decisions! ( = D2D !)20
LSST Database = massive complexity challengehttp://www.lsst.org/
• ~30-Petabyte final database catalog anticipated• ~30 trillion rows (astronomical sources), 200-300 columns• How can we find descriptive and predictive models of
trillions of astrophysical sources? …• … Soft10ware.com’s fast statistical modeling technology
offers an excellent approach:– Non-parametric, multi-model generation– Rapid model evaluation & ranking
21
Case Study #2:
Mars Rovers
22
Creating Rewards (Value) from Big Data in the IoT : The 3 D2D’so Knowledge Discovery
– Data-to-Discovery (D2D)o Data-driven Decision Support
– Data-to-Decisions (D2D)o Big ROI (Return On Innovation)
– Data-to-Dollars or Data-to-Dividends (D2D)– Innovative Applications of sense-making from
IoT sensors and sentinels everywhere
23
Mars Rover: intelligent data-gatherer, mobile data mining agent, and autonomous decision-support system
Rove around the surface of Mars and take samples of rocks (experimental data type: mass spectroscopy = data histogram = feature vector)
Intelligent Data Operations in Action:• Classification (assign rocks to known classes)• Supervised Learning (search for rocks with known compositions)• Unsupervised Learning (discover what types of rocks are present,
without preconceived biases)• Clustering (find the set of unique classes of rocks)• Association Mining (find unusual associations)• Deviation/Outlier Detection (one-of-kind; interesting?) • On-board Intelligent Data Understanding & Decision Support Systems
(Fuzzy Logic & Decision Trees & Cased-Based Reasoning ) = = Science Goal Monitoring :
– “stay here and do more” ; or else “follow trend to most interesting location” – “send results to Earth immediately” ; or “send results later”
24
Smart Sensors & Sentinels for Data-DrivenSense-Making and Decision Support
• New knowledge and insights are acquired by monitoring and mining actionable data from all digital inputs (Sensors!)
• Alerts are triggered autonomously, without intervention (if permitted), applying machine learning and actionable business decision rules for pattern detection and diagnosis. (Sentinels!)
• “Smart Sensors” (powered by Machine Learning-enabled sentinels) deliver actionable intelligence (Sense!)
http://legacy.samsi.info/200506/astro/presentations/tut1loredo-7.pdf
From Sensors to Sentinels to Sense (for any application domain with streaming data from sensors)
The MIPS Architecture Frameworkfor Dynamic Data-Driven Application Systems (DDDAS)• MIPS =
– Measurement – Inference – Prediction – Steering• This applies to any Network of Sensors:
– Web user interactions & actions (web analytics data), Cyber network usage logs, Social network sentiment, Machine logs (of any kind), Manufacturing sensors, Health & Epidemic monitoring systems, Financial transactions, National Security, Utilities and Energy, Remote Sensing, Tsunami warnings, Weather/Climate events, Astronomical sky events, …
• Machine Learning enables the “IP” part of MIPS:– Autonomous (or semi-autonomous) Classification– Intelligent Data Understanding– Rule-based– Model-based– Neural Networks– Markov Models– Bayes Inference Engines
Alert & Response systems:•Actionable insights from streaming business data•Automation of any data-driven operational system
http://dddas.org
The MIPS Architecture Frameworkfor Dynamic Data-Driven Application Systems (DDDAS)• MIPS =
– Measurement – Inference – Prediction – Steering• This applies to any Network of Sensors:
– Web user interactions & actions (web analytics data), Cyber network usage logs, Social network sentiment, Machine logs (of any kind), Manufacturing sensors, Health & Epidemic monitoring systems, Financial transactions, National Security, Utilities and Energy, Remote Sensing, Tsunami warnings, Weather/Climate events, Astronomical sky events, …
• Machine Learning enables the “IP” part of MIPS:– Autonomous (or semi-autonomous) Classification– Intelligent Data Understanding– Rule-based– Model-based– Neural Networks– Markov Models– Bayes Inference Engines
http://dddas.org
From Sensors to Sentinels to Sense
Alert & Response systems:•Actionable insights from streaming business data•Automation of any data-driven operational system
Big Data Analytics in the IoT:Take Data to Information to Knowledge to Insights (and Action!)in the Internet of Things using the MIPS Framework for DDDAS
✓ From Sensors (Measurement & Data Collection)…
✓ … to Sentinels (Monitoring & Alerts) …
✓ … to Sense-making (Data Science) …
✓ … to Cents-making (Business ROI)… Actionizing and Productizing your Big Data
28
Case Study #3:
Digital Marketing Analytics
29
http://www.webtwit.com/digital-marketing-company-india.html
Creating Rewards (Value) from Big Data in the IoT : The 3 D2D’so Knowledge Discovery
– Data-to-Discovery (D2D)o Data-driven Decision Support
– Data-to-Decisions (D2D)o Big ROI (Return On Innovation)
– Data-to-Dollars or Data-to-Dividends (D2D)– Innovative Applications of sense-making from
IoT sensors and sentinels everywhere
30
Digital Marketing Analytics – on fast streaming big data…… from Devices… … Intentions…
… Location, weather, and other geographic attributes… … Demographics…
31© SYNTASA.com 2015
AutomatingAnalytics
as-as-Service
• Based on the SYNTASA.com Marketing Analytics-as-a-ServiceTM (MAaaS)
• Your own “Smart Sentinel (Mars Rover) in a box”– Your business rules determine the goals, decision points, alerts, and responses.– Moving beyond historical hindsight and oversight (Descriptive & Diagnostic
Analytics) to new world of insight and foresight (Predictive & Prescriptive AaaS), eventually achieving right sight (Cognitive Analytics = the 360 view, enabling the right action, for right customer, at right place, at right time, in right context).
• Mining multi-channel big data streams (across your organization)• Target Marketing, Personalization, and Segmentation (“segment of one”)• Decision Automation in a rich content (Big Data) environment
32Based on Marketing Analytics-as-a-ServiceTM (MAaaS) from http://www.syntasa.com/
DomainModeling& EventResponse
Summary
33
http://art-of-stories.com/writing-tools-narrative-summary/
Data Science for Dynamic Data-Driven Application Systems in the Internet of Things
• Learning from data (Data Science):– Clustering (= New Class discovery, Segmentation)– Correlation & Association (Link) discovery– Classification, Diagnosis (Predictive power discovery)– Outlier / Anomaly / Novelty / Surprise discovery
• … for D2D in IoT big data:– Data-to-Discoveries– Data-to-Decisions– Data-to-Dividends
(big ROI = Return on Innovation)http://www.boozallen.com/datascience
http://www.dataev.com/it-experts-blog/bid/297713/The-Big-Data-Challenges-of-a-Biotechnology-Startup-Company