what is big data analytics and why should i care? is big... · 2016-09-28 · the 3 v’s of big...

Kirk Borne, Principal Data Scientist, Booz Allen Hamilton What is Big Data Analytics and Why Should I Care? Crystal City, VA September 30, 2016


Instructor: Kirk Borne, Principal Data Scientist, Booz Allen Hamilton

What is Big Data Analytics and Why Should I Care?

Crystal City, VA

September 30, 2016


Workshop in Two Parts

Part 1: Big Data Analytics

Part 2: Going for the Gold


Outline for Part 1: Big Data Analytics

How did we arrive at Big Data?

Domains of Data

Data Science, ML, and Analytics

The Internet of Things = hyper-Big Data

Simple Applications Demonstrated


Ever since we first explored our world…

Source for graphic: http://www.livescience.com/27663-seven-seas.html

… We have asked questions about everything around us.

Source for graphic: https://jefflynchdev.wordpress.com/tag/adobe-photoshop-lightroom-3/page/5/

The result is…

As we collect evidence (data) to answer our questions, the data leads to more questions, etc…


Source for graphic: http://www.airshipman.com/use-people-to-do-your-advertising/


which leads to BIG DATA!

Source: https://www.andertoons.com/data/cartoon/7468/the-data-weve-gathered-discussions-about-big-data-are-up-72

Exponential & Combinatorial Growth (all numbers quoted here are from circa 2014)

• 16 BILLION DVDs needed to store the internet traffic generated in a single hour, a stack 3x the height of Mount Everest.
• 150 BILLION emails sent every day, up to 70% of which are spam.
• 33 PERCENT of children born in the United States have an online presence prior to birth.
• 100 HOURS of video uploaded to YouTube every minute = 16 years of content each day.
• 30 BILLION pieces of content shared monthly on Facebook.
• 300 MILLION photos uploaded to Facebook daily, nearly 20 times larger than all the photos in the Library of Congress.
• 2.4 BILLION tweets every 72 hours from more than 550 million active users.
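Two of the figures above can be sanity-checked with simple unit conversions. The sketch below is only a back-of-the-envelope check; the constant and function names are illustrative and are not part of the original slides.

```python
# Back-of-the-envelope check of two circa-2014 figures quoted on the slide.
HOURS_UPLOADED_PER_MINUTE = 100   # YouTube upload rate (from the slide)
TWEETS_PER_72_HOURS = 2.4e9       # Twitter volume (from the slide)

MINUTES_PER_DAY = 24 * 60
HOURS_PER_YEAR = 24 * 365


def years_of_video_per_day(hours_per_minute: float) -> float:
    """Convert an upload rate (hours of video per minute) into years of content per day."""
    hours_per_day = hours_per_minute * MINUTES_PER_DAY
    return hours_per_day / HOURS_PER_YEAR


def tweets_per_day(tweets_per_72h: float) -> float:
    """Convert a 72-hour tweet count into a daily count."""
    return tweets_per_72h / 3


youtube_years = years_of_video_per_day(HOURS_UPLOADED_PER_MINUTE)
daily_tweets = tweets_per_day(TWEETS_PER_72_HOURS)

print(f"{youtube_years:.1f} years of video per day")  # about 16.4, matching the slide's "16 years"
print(f"{daily_tweets:.2e} tweets per day")
```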


Defining Big Data

We collect evidence (data) to answer our questions about the world around us … How? Why? What if?

– … and that is how we end up in a world of BIG DATA!

Big Data refers to data collections in which “everything is now being quantified and tracked” (= full-population samples of everything = The End of Demographics!)

– Examples: Social networks (Twitter, YouTube), search & online histories, web logs, financial and e-commerce transactions, environment & health monitors (wearable devices, EHRs), IoT, Astronomy,…

– Huge quantities of data are now being used everywhere.


Source for graphic: http://hinalockim.blogspot.com/2012/08/6th-week-cognitive-learning.html


Big Data in Business: Monetization is a Big Challenge

Source for graphic: http://www.gladwinanalytics.com/blog/big-data-business-model-maturity-index-and-the-internet-of-things-iot

Big Data in Business & Government: Analytics-driven innovation

Source: http://www.gao.gov/products/GAO-16-659SP

Big Data in Government: R&D strategic plan

Source: https://www.whitehouse.gov/sites/default/files/microsites/ostp/NSTC/bigdatardstrategicplan-nitrd_final-051916.pdf

Big Data from your body!

Source for graphic: https://datafloq.com/read/body-source-big-data-infographic/413

Big Data in your face!


Source for graphic: http://qz.com/779625/none-of-your-pixelated-or-blurred-information-will-stay-safe-on-the-internet/

Nothing pixelated (or blurred) will stay safe on the internet.

Deep Learning algorithms can discover deep hidden patterns.


Big Data in Science: Discovery at Petascale & Exascale

http://www.extremetech.com/extreme/124561-ibm-to-build-exascale-supercomputer-for-the-worlds-largest-million-antennae-telescope

SKA = Square Kilometer Array

joint project: Australia and South Africa

http://www.ska.gov.au/

~5 exabytes (5,000,000 Terabytes) every day!


Big Data in Environmental Monitoring


From Data to Information to Knowledge to Understanding


Big Data in Science: Example from Astronomy


LSST Construction began 2014. Survey period = 2022-2032


Deep, Wide, Fast Data to answer Big Questions about the Universe

LSST Key Science Drivers: Mapping the Dynamic Universe

– Complete inventory of the Solar System (Near-Earth Objects; killer asteroids???)
– Nature of Dark Energy (Cosmology; Supernovae at edge of the known Universe)
– Optical transients (10 million daily event notifications sent within 60 seconds)
– Digital Milky Way (Dark Matter; Locations and velocities of 20 billion stars!)

LSST in time and space:

– When? ~2022-2032
– Where? Cerro Pachon, Chile

Architect’s design of LSST Observatory

LSST Summary: Big Data & Data Science


• http://www.lsst.org
• 3-Gigapixel camera
• One 6-Gigabyte image every 20 seconds
• 20 Terabytes every night for 10 years
• Repeat images of the entire night sky every 3 nights: Celestial Cinematography
• 100-Petabyte final image data archive anticipated (all data are public!!!)
• 20-Petabyte final database catalog anticipated: ~20 trillion sources with 200+ database attributes each. This is a combinatorial explosion!
• ~10 million events per night, every night, for 10 years. Fast categorization and decisions (triage!) required.

Goal: understand our vast dynamic Universe
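To give a feel for the "combinatorial explosion" mentioned above, here is a minimal sketch that counts how many attribute pairs and triples could be examined when each source has 200 attributes. The catalog figures come from the slide; treating exactly 200 attributes and counting only pairs and triples is an assumption made for illustration.

```python
from math import comb

N_SOURCES = 20e12      # ~20 trillion catalog sources (from the slide)
N_ATTRIBUTES = 200     # "200+" attributes per source; 200 is assumed here


def attribute_subsets(n_attributes: int, k: int) -> int:
    """Number of distinct k-attribute combinations that could be explored."""
    return comb(n_attributes, k)


total_values = N_SOURCES * N_ATTRIBUTES          # individual catalog values
pairs = attribute_subsets(N_ATTRIBUTES, 2)       # 2-D projections of the data
triples = attribute_subsets(N_ATTRIBUTES, 3)     # 3-D projections of the data

print(f"catalog values:    {total_values:.1e}")
print(f"attribute pairs:   {pairs:,}")      # 19,900
print(f"attribute triples: {triples:,}")    # 1,313,400
```

Even before any rows are read, the number of ways to combine attributes grows far faster than the number of attributes itself.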


DEEP | WIDE | FAST | VALUE


The 4 Rewards of Big Data in all Domains

o Knowledge Discovery – Data-to-Discovery (D2D)

o Data-driven Decision Support – Data-to-Decisions (D2D)

o Big ROI (Return On Innovation) – Data-to-Dollars or Data-to-Dividends (D2D)

– Innovative Applications of sense-making from sensors and sentinels everywhere

o Data Science for Social Good – Data for Good (D4G) – follow @DataSci4Good

http://thinkfuture.com/

Challenges to Achieving Rewards

The 3 V’s of Big Data are not just hype – they represent really big challenges:

1. Volume (DEEP)

2. Variety (WIDE)

3. Velocity (FAST)

But… Volume is not the problem! Storage is manageable.

Data Science & Analytics (integrating and combining disparate data sources to achieve Data-to-Discovery, Data-to-Decisions, and Data-to-Dividends) are hard…

… especially on complex (diverse, high-Variety) and fast-moving (real-time, high-Velocity) data!

Focus on Value Creation through Advanced Analytics / Data Science in order to conquer these challenges.

Source for graphic: http://www.vitria.com/blog/Big-Data-Analytics-Challenges-Facing-All-Communications-Service-Providers/

Some Quick Definitions

Statistics = the practice (and science) of collecting and analyzing numerical data.

Machine Learning (ML) = mathematical algorithms that learn from experience (historical data).

Data Mining = application of ML algorithms to data.

Artificial Intelligence (AI) = application of ML algorithms to robotics and machines.

Source for graphic #1: http://www.satyavedism.com/mathematics-astrophysics/mathematics-resources
Source for graphic #2: http://blogs.sas.com/content/subconsciousmusings/2014/08/22/looking-backwards-looking-forwards-sas-data-mining-and-machine-learning/

Data Science = application of scientific method to discovery from data (including statistics, machine learning, and more: visual analytics, machine vision, computational modeling & simulation, semantics, graphs, network analysis, data indexing schemes, …).

Analytics = the products of machine learning & data science.


Machine Learning: 4 Types of Discovery (algorithms that learn from experience)

1) Class Discovery: Finding new classes of objects (population segments), events, and behaviors. This includes: learning the rules that constrain the class boundaries.

2) Correlation (Predictive and Prescriptive Power) Discovery: Finding patterns and dependencies, which reveal new governing principles or behavioral patterns (the “customer DNA”).

3) Novelty (Surprise!) Discovery: Finding new, rare, one-in-a-[million / billion / trillion] objects and events.

4) Association (or Link) Discovery: Finding unusual (improbable) co-occurring associations.
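As one concrete illustration of the third type, Novelty Discovery, the sketch below flags readings that sit far from the bulk of the data using a median-based (robust) z-score. This is only one of many possible techniques and is not taken from the slides; the 3.5 cutoff, the function names, and the sample readings are assumptions chosen for illustration.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores based on the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values (or most of them) are identical: nothing stands out.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def find_novelties(values, threshold=3.5):
    """Return (index, value) pairs whose robust z-score exceeds the threshold."""
    scores = robust_z_scores(values)
    return [(i, v) for i, (v, z) in enumerate(zip(values, scores)) if abs(z) > threshold]


# Hypothetical sensor readings with one surprising value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1]
print(find_novelties(readings))  # [(6, 25.0)]
```

The median and MAD are used instead of the mean and standard deviation so that the rare object being searched for does not distort the baseline it is compared against.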


The Data Analytics Revolution

Exploiting the Value Chain: from Digital Data to Information to Knowledge to Insights (and Action)

– From Sensors (Measurement & Data Collection) … Big Data (Deep, Fast, Wide)
– to Sentinels (Monitoring & Alerts = Information) … Machine Learning
– to Sense-making (Knowledge & Insight Discovery) … Data Science
– to Cents-making (Your Applications of Data = Action!) … Analytics

… Productizing and Actionizing your Big Data

Data Analytics has evolved with growth in data

5 Levels of Analytics Maturity:

1) Descriptive = hindsight : what happened?
2) Diagnostic = oversight : what is happening, and why?
3) Predictive = foresight : what will happen?
– Predictive : given x, find y (needs historical training data)
4) Prescriptive = insight : how can I prescribe a better outcome?
– Prescriptive : given y, find x (needs comprehensive data set)
5) Cognitive = the “right sight” : asking the right question, at the right time, in the right context, in order to make the right decision!
– Cognitive : the “360 view”, take it all in, ask new questions!
– …to identify your “next-best move” or “next-best action”
– “It is not what you look at that matters – it’s what you see” (Henry David Thoreau)
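The "given x, find y" versus "given y, find x" distinction can be sketched with the simplest possible model, a straight line fitted to historical data. The example below is a toy under strong assumptions (one input, a linear relationship, made-up spend and sales numbers); real prescriptive analytics usually involves many inputs and constraints.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predictive: given x, find y."""
    return slope * x + intercept


def prescribe(target_y, slope, intercept):
    """Prescriptive: given a desired y, find the x that the model says would produce it."""
    return (target_y - intercept) / slope


# Hypothetical historical training data: ad spend (x) and resulting sales (y).
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(spend, sales)

print(predict(5.0, slope, intercept))     # expected sales at a spend of 5
print(prescribe(15.0, slope, intercept))  # spend needed to reach sales of 15
```

Note that the prescriptive answer (a spend of 7) lies outside the range of the training data, which is one way to see why the slide says prescriptive analytics needs a comprehensive data set.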


From Descriptive to Predictive to Prescriptive Analytics via Cognitive Analytics: Exploring “Data in Context” leads to new questions and new hypotheses …

http://www.boozallen.com/datascience

The Full Operational Data Analytics Spectrum

© Copyright 2016 Booz Allen Hamilton


Data Science and Analytics

Class Discovery

Correlation (Predictive / Prescriptive Power) Discovery

Surprise (anomaly) Discovery

Association (Link) Discovery

Each one can be applied at the 5 different levels of Analytics Maturity:

Descriptive → Diagnostic → Predictive → Prescriptive → Cognitive


The Future of Big Data Analytics and Data Science

http://www.boozallen.com/datascience

Machine Learning in our Lives

PREDICT: Your Purchase Preferences, Recommender Systems, Credit Scoring, Smart Phone auto-complete, …

OPTIMIZE: Your Thermostat, Your Commute Time and Routing, Personalized Learning, …

DISCOVER: Your Health Issues (wearables), Your Best Deal (Bed & Breakfast or Restaurant), …

DETECT: Your Social Sentiment, Flu Outbreaks, Credit Card Fraud, …

Machine Learning in our Work

PREDICT outcomes, events, needs, costs, risks, product demand, …

OPTIMIZE processes, products, and people (delivery of services, supplies, personnel), …

DISCOVER insights in publications, social media, quarterly business reports, electronic records, …

DETECT fraud, anomalies in safety events, behaviors, outbreaks, data usage (HIPAA), cyber systems (data breaches), …

Data Analytics in Medicine & Health Administration

1. Benefits Administration improvement (“ACO = HIE + Analytics”: process mining, best practices, cost-efficiency, success metrics validation)
2. Do Not Pay initiatives (payment error / fraud analytics)
3. Beneficiary Recommendations ("Amazon-style" predictive analytics, prescriptive modeling)
4. Consumer Engagement (personalized online web experience, "marketing analytics")
5. Health Information Exchange (HIE) Exploitation (population health discovery, link analysis, ICD-10 mining)
6. Personalized Healthcare and Patient Wellness (wearables data-sharing/mining, health baselining)
7. Personalized/Precision Medicine and Care Coordination (EHR, HIE monitoring / mining)
8. Predictive Medicine (readmissions, complications, adverse interactions)
9. At-Risk Precursor Analytics (early warning signals of cancer, diabetes, heart disease, suicidal / mental health issues, ...)
10. Patient Trajectories Analysis (mining / segmentation of whole population EHR histories, pathways, outcomes, outliers)
11. Learning Health System Decision Support (advanced analytics embedded in health system data feeds)
12. What Question Should I Be Asking of My Data? (Cognitive Analytics)


Data Science: Applications and Use Cases are everywhere…

Smart Apps (Find best price; real-time travel adjustments; type-ahead texting)

Predictive Retail (Dynamic Pricing, Smart Supply Chain, Precision Demand Forecasting)

Precision Marketing (SegOne, Personalized Real-time Ad Campaigns for Next Best Offer)

Smart Highways (Real-time intelligence among vehicles, weather, roads, repairs)

Precision Traffic (Self-driving & Self-parking Connected Cars)

Smart Cities (Growth, Dynamic Street-lighting, Smart Energy Usage)

Predictive Law Enforcement (Predictive, Prescriptive personnel & resource placements)

Smart Healthcare (Wearables, Personalized Medicine, Patient/Provider Monitoring)

Invisibles (under-the-skin smart sensors that measure, learn, respond) = The Internet of Emotions!

Personalized Online Education (Dynamic learning, Gamification, Real-time interventions)

Precision Forests, Farms, Vineyards,… (Data-driven Planning, Nurturing, Harvesting)

Fintech / Banks / Insurance (Fast Risk analysis, Fraud detection, Personalized services)

Smart Organizations (Talent Placement, Employee Retention, Workforce Deployment, Process Mining for Efficiencies, Workflow recommender engines)

Predictive Machines (Early Warning, Prescriptive Maintenance & Obsolescence, IoT, Industrial IoT)

The XYZ of Data Science: Intelligence at the edge of the network (Edge Analytics at the point of data collection)

Smart X

– Smart Cities

– Smart Highways

– Smart Supply Chain

Precision Y

– Precision Medicine

– Precision Farming

– Precision Pricing

Personalized Z

– Personalized Health

– Personalized Learning

– Personalized Shopping Experience

http://www.loopcayman.com/content/if-smart-cities-are-next-big-thing-what-about-smart-regions

Internet of Things

https://www.nsf.gov/news/news_images.jsp?cntn_id=122028


Everything Interconnected


The Internet of Things (IoT) is an interconnected universe of Dynamic Data-Driven Application Systems (DDDAS)


Drive Big Benefits with Big Data Analytics Triage

General example of Data Analytics Triage in IoT: Event Mining in Dynamic Big Data Collections for Actionable Intelligence:

– Behavior modeling (anomaly & trend detection) and ad hoc inquiry for Discovery
– Identifying, characterizing, & responding to events for data-driven Decisions
– Deciding which events need immediate investigation and/or intervention = Action!

Many other examples:

– Web user engagement & recommendations (from web analytics data)
– Customer churn early warning (from 360-view customer data)
– Predictive Maintenance alerts (from machine / engine sensors)
– Infrastructure Monitoring alerts (from ubiquitous sensors)
– Supply chain monitoring (from manufacturing & shipping sensors)
– Cybersecurity alerts (from network logs)
– Preventive Fraud alerts (from financial applications)
– Health alerts (from EHRs and national health systems)
– Tsunami alerts (from geo sensors everywhere)
– Social event alerts or early warnings (from social media)

Prescriptive Risk Mitigation
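The "deciding which events need immediate investigation" step can be sketched as a ranking problem: score each event, then hand only the top few to a human or an automated responder. The scoring rule below (anomaly strength times impact), the class and function names, and the sample events are all hypothetical and only illustrate the idea of triage.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    priority: float                   # events are ordered by this score only
    name: str = field(compare=False)  # description is ignored when comparing


def score(anomaly: float, impact: float) -> float:
    """Hypothetical triage score: how unusual the event is times how costly it could be."""
    return anomaly * impact


def triage(events, budget):
    """Return the `budget` highest-scoring events, most urgent first."""
    return heapq.nlargest(budget, events)


# Hypothetical event stream: (anomaly strength in 0..1, impact on a 1..10 scale).
events = [
    Event(score(0.9, 10), "engine vibration spike"),
    Event(score(0.2, 3), "routine log rotation"),
    Event(score(0.7, 8), "login burst from new region"),
    Event(score(0.4, 1), "minor temperature drift"),
]

for event in triage(events, budget=2):
    print(f"{event.priority:.1f}  {event.name}")
```

With a budget of two, only the vibration spike and the login burst are escalated; the remaining events stay in the archive for later analysis.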


Infusing Analytics Capability into your organization

Booz Allen’s approach to helping organizations drive competitive advantage through data analytics

Activities and methods (four groups, in the order they appear on the slide):

1. Activities: Enrich; Integrate and Transform Data. Methods: Descriptive Statistics; Filtering; Aggregation.
2. Activities: Reveal trends; Identify Correlations; Learn Patterns. Methods: Unsupervised Learning; Clustering; Outlier Detection.
3. Activities: Classify Signals; Predict Risks; Forecast Resources. Methods: Random Forest; Neural Networks; Bayesian Analysis; Collaborative Filtering.
4. Activities: Optimize Resources; Simulate Decision Outcomes. Methods: Genetic Algorithms; Integer Programming; Non-Linear Programming; Discrete Event Simulation.

Acquisition, aggregation and enrichment of information from multiple entry points will help create a holistic view that can enhance operations, reduce risk, provide powerful insight, and create value.

Enables Effective Operations and Decision-Making:

• Allows for accurate analysis of trends across the organization against defined KPIs
• Supports strategic C-Suite decision making
• Reveals operational risks and potential bottlenecks in real-time
• Supports critical information infrastructure protection efforts by early detection of vulnerabilities

Products: Reports | Dashboards | Mitigations

360° Data Acquisition:

• Business Operations and Performance Data
• Logs: Systems, Customers, …
• Reports, e-Docs, and Manuals
• Open Data


Mars Rovers (metaphor for general use case)


• Mars Rover = intelligent data-gatherer, mobile data mining agent, and autonomous decision-support system:

– Gathers data (in situ) for remote sensors

– Performs intelligent (autonomous, cognitive) data mining operations

• Class Discovery

• Correlation (Predictive and Prescriptive Power) Discovery

• Novelty Discovery

• Association Discovery

– Enacts on-board Intelligent Data Understanding & Decision Support

• “Stay here and do more, or move elsewhere”

• “Follow trend to more interesting, lucrative, and productive location”

• “Send results immediately, or store for later analysis”


From Sensors to Sentinels to Sense


• New knowledge and insights are acquired by monitoring and mining actionable data from all digital inputs.

–Sensors!

• Alerts are triggered autonomously, without intervention (when it is permitted), applying machine learning and actionable business decision rules for pattern detection and diagnosis.

–Sentinels! (embedded machine learning / data science algorithms)

• “Smart Sensors” (powered by Machine Learning-enabled sentinels) deliver actionable intelligence.

–Sense!

(applies to any application domain with streaming data from sensors)
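A "sentinel" of the kind described above can be sketched as a small object that watches one sensor stream, keeps running statistics, and raises an alert when a reading departs from what it has seen so far. The class name, the 3-sigma rule, the warm-up length, and the sample readings are assumptions for illustration; the running mean and variance use Welford's online update so that no history needs to be stored.

```python
from math import sqrt


class Sentinel:
    """Watches one sensor stream and flags readings far from the running baseline."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # alert when |x - mean| exceeds threshold * std
        self.warmup = warmup        # number of readings to learn from before alerting
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations (Welford)

    def observe(self, x):
        """Return True if x triggers an alert; non-alert readings update the baseline."""
        alert = False
        if self.n >= self.warmup:
            std = sqrt(self.m2 / (self.n - 1))
            alert = std > 0 and abs(x - self.mean) > self.threshold * std
        if not alert:
            # Welford's online update of the mean and squared deviations.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return alert


# Hypothetical temperature stream with one abnormal reading.
stream = [20.0, 20.5, 19.8, 20.2, 20.1, 20.3, 35.0, 20.0]
sentinel = Sentinel()
alert_positions = [i for i, x in enumerate(stream) if sentinel.observe(x)]
print(alert_positions)  # [6]
```

Readings that trigger an alert are deliberately left out of the baseline, so a single spike does not make the sentinel less sensitive to the next one.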


Dynamic Data-Driven Application Systems (DDDAS)

4 steps from data to action = MIPS:

– Measurement – Inference – Prediction – Steering

This applies to any Network of Sensors:

– Web user interactions & actions (web analytics data), Cyber network usage logs, Social network sentiment, Machine logs (of any kind), Manufacturing sensors, Health & Epidemic monitoring systems, Financial transactions, National Security, Utilities and Energy, Remote Sensing, Tsunami warnings, Weather/Climate events, Astronomical sky events, …

– IoT (the Internet of Things) and M2M (Machine-to-Machine): e.g., connected cars, manufacturing plants, transportation systems, locomotive and jet engines, power grid, “smart home”, “smart cities”, “smart farms”, …

Machine Learning enables the “IP” part of MIPS:

– Pattern (Segment) Discovery
– Correlation (Trend) Discovery
– Novelty (Anomaly) Discovery
– Association (Link) Discovery

http://dddas.org

Alert & Response systems:

• Actionable insights from streaming business data
• Automation of any data-driven operational system
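The four MIPS steps can be sketched as a loop in which each new measurement updates an inference, the inference feeds a short-term prediction, and the prediction decides a steering action. Everything concrete below (the temperature readings, the naive trend extrapolation, the 85-degree limit, the action names) is a hypothetical stand-in; in a real DDDAS the steering step would also feed back into what gets measured next.

```python
def measure(sensor_readings, step):
    """Measurement: read the next value from the (simulated) sensor."""
    return sensor_readings[step]


def infer(history, window=3):
    """Inference: estimate the latest level and the recent trend per step."""
    recent = history[-window:]
    trend = (recent[-1] - recent[0]) / max(len(recent) - 1, 1)
    return recent[-1], trend


def predict(latest, trend, horizon=2):
    """Prediction: extrapolate the trend a few steps ahead."""
    return latest + trend * horizon


def steer(forecast, limit):
    """Steering: choose an action based on the forecast."""
    return "reduce_load" if forecast > limit else "hold"


def mips_loop(readings, limit):
    """Run Measurement, Inference, Prediction, Steering once per reading."""
    history, actions = [], []
    for step in range(len(readings)):
        history.append(measure(readings, step))
        latest, trend = infer(history)
        forecast = predict(latest, trend)
        actions.append(steer(forecast, limit))
    return actions


# Hypothetical engine temperatures that rise faster and faster.
temps = [70.0, 71.0, 73.0, 76.0, 80.0]
actions = mips_loop(temps, limit=85.0)
print(actions)  # ['hold', 'hold', 'hold', 'hold', 'reduce_load']
```

The last reading (80) is still below the limit, but the forecast (87) is not, so the system acts before the threshold is actually crossed.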


…but the greatest of V’s is Variety

Source for graphic: http://www.cs.princeton.edu/courses/archive/spr04/cos598B/bib/BrunnerDPS.pdf

The discovery and separation of classes improves when a sufficient number of “correct” features are available for exploration:

(a) 2 classes are discovered and become separable:

(b) One trend line becomes 2 clusters:

Feature Selection and Projection

Feature Selection is important to disambiguate different classes. More importantly, Class Discovery depends on selecting the right features!

Feature Selection and Model Bias: choosing features in the dark

I picked out two socks from my sock drawer this morning!

It was still dark, but that shouldn’t matter, right? After all, they are the same size … THE SAME ?!?

The Era of Big Data represents the END OF DEMOGRAPHICS (i.e., our models should no longer be based on and biased by a limited selection of attributes and features)

Insufficient Variety: multiple classes are not distinguishable using this one feature

Sufficient Variety: two classes are discovered using this new feature
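
A small sketch of the same idea, using the sock drawer as hypothetical data: each sock has a size and a color value (both invented for illustration). The helper `largest_gap` is an assumed, deliberately simple test for "are there two groups along this feature?" and is not a standard algorithm.

```python
def largest_gap(values):
    """Largest gap between consecutive sorted values, as a fraction of the range."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return max(gaps) / (ordered[-1] - ordered[0])

# Hypothetical socks: (size, color) where color runs from 0 = black to 1 = white.
socks = [(10.0, 0.10), (10.2, 0.12), (10.1, 0.90),
         (9.9, 0.88), (10.3, 0.11), (9.8, 0.91)]
sizes = [size for size, _ in socks]
colors = [color for _, color in socks]

# Size alone shows no structure; color splits the socks into two groups.
print("size gap:", round(largest_gap(sizes), 2))
print("color gap:", round(largest_gap(colors), 2))
```

With size as the only feature the socks look like one class; adding the color feature exposes a wide gap that separates two classes.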

Another example of class discovery in a data set: by exploring high-variety (high-dimensional) data

The separation and discovery of classes improves when a sufficient number of “correct” features are available for exploration:

Source for graphic: http://www.cs.princeton.edu/courses/archive/spr04/cos598B/bib/BrunnerDPS.pdf

Clustering for Persona Discovery and Customer Personalization

Exploiting the 3rd V of Big Data

(Data Exploration and Data Exploitation)

1. Volume

2. Velocity

3. Variety

Digital Marketing Analytics Evolution: From Demographics to Personalization to Hyper-personalization

http://www.webtwit.com/digital-marketing-company-india.html

360 Customer View in Digital Marketing

Clustering = Class / Segment Discovery

Clustering = the process of partitioning a set of data into subsets (segments or clusters) such that a data element belonging to any chosen cluster is more similar to data elements belonging to that same cluster than to the data elements belonging to other clusters.

= Grouping together similar items, and separating dissimilar items

= Identifying similar characteristics, patterns, or behaviors among subsets of the data elements.

Challenges:

#1) No prior knowledge of the number of clusters.

#2) No prior knowledge of semantic meaning of the clusters.

#3) Different clusters are possible from the same data set!

#4) Selecting different features can lead to different clusters.

Types of Clustering

In general terms, there are two approaches to clustering:

– Partitional – One set of clusters is created (e.g., K-Means clustering – choose K, the number of clusters).

– Hierarchical – Nested sets of clusters are created sequentially.
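
A minimal sketch of the partitional approach, assuming plain K-Means with K = 2 on a handful of hypothetical 2-D points. The function name `kmeans`, the starting centers, and the fixed iteration count are illustrative choices; production libraries choose starting centers more carefully.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain K-Means: assign points to the nearest center, then re-center."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical customers described by two features.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
print(centers)
print(clusters)
```

K (the number of starting centers) has to be chosen up front, and a different K or different starting centers can give a different partition of the same data.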

Example of Hierarchical Clustering

Starting with (a), then going to (e): Bottom-up, Agglomerative Clustering

Starting with (e), then going to (a): Top-down, Divisive Clustering

The “Google Maps” view for your Customer Space

https://www.researchgate.net/figure/273456906_fig3_Figure-4-Example-of-hierarchical-clustering-clusters-are-consecutively-merged-with-the

Hierarchical Clustering Approaches

Clusters are created at multiple levels – creating a new set of clusters at each level.

There are 2 types of hierarchical clustering:

– Agglomerative Clustering (Bottom-Up)

Initially, each item is in its own cluster.

Then, clusters are merged together iteratively, based upon similarity of data items.

– Divisive Clustering (Top-Down)

Initially, all items are in one cluster.

Then, large clusters are successively divided, based upon distance between data items.
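
The bottom-up procedure can be sketched on one-dimensional data. This assumes single-linkage merging (distance between the closest members of two clusters), which is one of several possible similarity rules; the function name `agglomerate` and the sample values are hypothetical.

```python
def agglomerate(values):
    """Bottom-up clustering of 1-D values; returns the clusters at every level."""
    clusters = [[v] for v in sorted(values)]       # each item starts alone
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Clusters stay sorted and contiguous, so the closest pair is adjacent.
        i = min(range(len(clusters) - 1),
                key=lambda k: clusters[k + 1][0] - clusters[k][-1])
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]   # merge the pair
        levels.append([list(c) for c in clusters])
    return levels

levels = agglomerate([7, 1, 20, 2, 6])
for level in levels:
    print(level)
```

Each printed line is one level of the hierarchy, from individual items up to a single cluster. Reading the levels in reverse order gives the top-down view of the same hierarchy, although a true divisive algorithm would compute the splits directly.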

Segmentation of One = ‘SegOne’ Marketing = Personalization

Marketing Campaign Segments = Customer Personas

Digital Marketing: your “Mars Rover” in a box – 1

Mining multi-channel big data streams (across your organization):

– Class Discovery

– Correlation (Predictive and Prescriptive Power) Discovery

– Novelty Discovery

– Association Discovery

Hierarchical Segmentation for Personalization (“SegOne Marketing”)

Decision Automation in a rich content (Big Data) environment

Digital Marketing: your “Mars Rover” in a box – 2

Your own “Smart Sentinel (Mars Rover)” – Your business rules determine the decision points, alerts, and responses (IF-This-Then-That = IFTTT).

– Move beyond historical hindsight and oversight (Descriptive & Diagnostic Analytics)

– Apply insight and foresight (Predictive & Prescriptive Analytics)

– Achieve right sight for your next-best move (Cognitive Analytics)

The 360 view enables the right question, right action, for the right customer, at the right place, at the right time, in the right context.
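
A minimal sketch of IF-This-Then-That business rules, assuming each rule is a (condition, action) pair evaluated against a customer record. The field names, thresholds, and action names are all hypothetical.

```python
# Each rule pairs a condition (IF this) with a response (THEN that).
rules = [
    (lambda c: c["cart_value"] > 100 and not c["purchased"], "send_discount_offer"),
    (lambda c: c["days_since_visit"] > 30, "send_win_back_email"),
]

def respond(customer):
    """Return every action whose condition holds for this customer."""
    return [action for condition, action in rules if condition(customer)]

# Hypothetical customer record assembled from the 360 view.
customer = {"cart_value": 150, "purchased": False, "days_since_visit": 2}
print(respond(customer))
```

Keeping the rules as data, separate from the code that evaluates them, is what lets the business owner change decision points without touching the sentinel itself.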

Data Science improves your odds in the fundamental business gambit: RISK versus REWARD

http://www.telegraph.co.uk/news/worldnews/europe/russia/10061780/Russian-convicts-beat-Americans-in-cyber-chess-battle.html

Are you ready to reap rewards (the 3 D2D’s) from Hyper-Big Data through Data Science?

Learning from data (Data Science):

– Clustering (= New Class discovery, Segmentation)

– Correlation, Trend, Association, & Link discovery

– Classification, Diagnosis (Predictive power discovery)

– Outlier, Anomaly, Novelty detection (Surprise discovery)

… for business value (the 3 D2D’s):

– Data-to-Discoveries

– Data-to-Decisions

– Data-to-Dividends (big ROI = Return on Innovation)

http://thinkfuture.com/

http://www.hadoop360.com/blog/iot-101-everything-you-need-to-know-to-start-your-iot-project

http://www.dataev.com/it-experts-blog/bid/297713/The-Big-Data-Challenges-of-a-Biotechnology-Startup-Company

SUMMARY – Part 1

Big Data is not about “Big” or “Data”

Big Data is a concept, focused on:

1) Data Science Discovery = Data-to-Discovery

2) Analytics Solutions = Data-to-Decisions

3) Value Creation = Data-to-Dividends (Data-to-Dollars)

… The Right ROI in a Big Data World = Return On Innovation

Machine Learning and Data Science are about:

a) Digital data transformations from Sensors to Sentinels to Sense-Making; and

b) Insights through Predictive & Prescriptive Power Discovery and Cognitive Exploration in DEEP, WIDE, FAST data!

http://www.boozallen.com/datascience @KirkDBorne

Part 2 – Going for the Gold

Steps to Cognitive Analytics

The Data Science Bowl (data for good)

Dare to Change the World

Simple Example of Descriptive, Predictive, Prescriptive, and Cognitive Analytics

© Copyright 2016 Booz Allen Hamilton

Trend Lines in data: Descriptive!

Warning: it is tempting to over-fit every wiggle in the data.

92 Naturally Occurring Elements

All measurements are in kelvin

This is a better fit to the trend line… for use in Predictive & Prescriptive analytics!
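
The difference between fitting every wiggle and fitting the trend can be shown with a small experiment. The data below are invented (an underlying trend y = 2x measured once with noise), not the element data from the plot, and the "wiggle" model is simply a lookup table that memorizes the noisy measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def mean_squared_error(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: trend y = 2x, measured once with noise.
xs = [0, 1, 2, 3, 4, 5]
noisy_ys = [0.5, 1.5, 4.8, 5.4, 8.6, 9.6]
# Hypothetical fresh values at the same x positions (here, the noise-free trend).
fresh_ys = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]

slope, intercept = fit_line(xs, noisy_ys)
memorized = dict(zip(xs, noisy_ys))

def trend(x):
    return slope * x + intercept

def wiggle(x):
    # Reproduces every training point exactly, noise included.
    return memorized[x]

wiggle_train = mean_squared_error(wiggle, xs, noisy_ys)
trend_fresh = mean_squared_error(trend, xs, fresh_ys)
wiggle_fresh = mean_squared_error(wiggle, xs, fresh_ys)
print(wiggle_train, trend_fresh, wiggle_fresh)
```

The wiggle model has zero error on the data it has already seen, yet the simple trend line is closer to the fresh values, which is why the smoother fit is the one to carry into predictive and prescriptive use.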

Trend Lines and Outliers:

Sometimes we are tempted to think that outliers are just noise.

Where is the real discovery?

Add some context to the data!

…that diagonal line in the plot (where melting point = boiling point) provides some context (your expectations)!

Why is that point below the line?

There’s the Real Discovery!

Arsenic!

Melts @ 1089 K

Boils @ 889 K

Cognitive Surprise Discovery

(outlier / anomaly / deviation detection)

Knowing the right question to ask!
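
The "right question" in this example is a context rule rather than a statistical test: does any element melt at a higher temperature than it boils? A minimal sketch, assuming a small hand-picked sample instead of all 92 elements. The arsenic values are the ones quoted on the slide; the other values are approximate, rounded reference figures in kelvin.

```python
# (melting point, boiling point) in kelvin; a small illustrative sample.
elements = {
    "Oxygen": (54, 90),
    "Mercury": (234, 630),
    "Arsenic": (1089, 889),   # values as quoted on the slide
    "Iron": (1811, 3134),
}

# Context (our expectation): a substance melts before it boils.
surprises = [name for name, (melt, boil) in elements.items() if melt > boil]
print(surprises)
```

A trend-line fit would treat the arsenic point as noise around the overall relationship; the expectation encoded in the rule is what turns it into a discovery.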

The Catalyst

Booz Allen’s Data Science Practice

Our Passion for Data Science

Lack of a National Data Science Event

http://www.boozallen.com/datascience/

http://www.datasciencebowl.com/

(www.DataScienceBowl.com)

Citizen Data Science!

About Kaggle

● World’s largest online data science competition community

● Over 500,000 members across ~200 countries

● Community uses diverse backgrounds to solve some of the most complex data science problems in the world

● Extremely strong brand within the data science community

“We and the broader data science community share a common passion, culture, and vision for using data science for social good.”

Last year’s Grand Challenge:

$175,000 in prizes (provided by Booz Allen)

Assess ocean health at a speed and scale that were previously impossible.

Services provided by Plankton:

• Provide food for humans and marine animals

• Produce oxygen (phytoplankton)

• Remove CO2 from the atmosphere

• Contribute to global biodiversity

• Provide biomedical products

• Major source of nutrients for indigenous populations

Assess Ocean Health by classifying 121 Classes of Plankton in >160K images

Last year’s winning approach

● Read all about it here: http://benanne.github.io/2015/03/17/plankton.html

● Deep Learning with convolutional neural networks

● Average accuracy of 81% across all 121 plankton classes

● Code available at https://github.com/benanne/kaggle-ndsb

● More than 1000 competing teams

● More than 15,000 submissions

● Recap: http://www.datasciencebowl.com/recap-first-annual-data-science-bowl/

We did it again this year with a $200K heart health challenge!

Data providers and partners: Drs. Michael Hansen and Andrew Arai, of the NIH National Heart, Lung, and Blood Institute (NHLBI); and the Children’s National Medical Center. Other partners include: NVIDIA; American College of Cardiology; The Children’s Inn at NIH; FNIH (Foundation for the NIH); MedStar Institute for Innovation; and more.

The Challenge: improve diagnosis of heart disease through faster, more accurate measurement of ejection fraction (end-systolic and end-diastolic volumes) in cardiac MRI data.

The Data: time-series of MRI scans from over 1000 patients.
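
For reference, ejection fraction follows directly from the two volumes named in the challenge: it is the share of the end-diastolic volume that leaves the ventricle in one beat. A minimal sketch; the function name and the example volumes are hypothetical and are not taken from the competition data.

```python
def ejection_fraction(end_diastolic_ml, end_systolic_ml):
    """Percentage of the end-diastolic volume pumped out in one beat."""
    stroke_volume = end_diastolic_ml - end_systolic_ml
    return 100.0 * stroke_volume / end_diastolic_ml

# Hypothetical volumes in millilitres.
ef = ejection_fraction(end_diastolic_ml=120.0, end_systolic_ml=50.0)
print(round(ef, 1))
```

The arithmetic is the easy part; the hard part the competitors faced was estimating the two volumes from the MRI time series.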

It was not just about improving Cardio Imaging Analytics. It was about Reed’s story: one in 100 children is born with a congenital heart defect!

Results: Volume Predictions

Data Science Bowl co-winner Tencia Lee visits NIH NHLBI to discuss winning algorithm

http://www.datasciencebowl.com/leading-and-winning-team-submissions-analysis/

2016 Format & Logistics

● Web-based competition (www.DataScienceBowl.com)

● Competition Period: 14 December 2015 through 14 March 2016

● Models were quantitatively scored (i.e., no subjective judging panel)

● We are now seeking ideas for the 2017 Data Science Bowl #3 Challenge:

http://www.datasciencebowl.com

1 GRAND CHALLENGE

90 DAYS = $200,000 PRIZES

1st place: $125,000

2nd place: $50,000

3rd place: $25,000

NVIDIA also provided complimentary GPU Technology Conference passes to top 3 teams

Big Data + the IoT + Citizen Data Scientists = Partners in Sustainability

The Internet of Things (IoT):

• Knowing the knowable via deep, wide, and fast data from ubiquitous sensors!

Big Data:

• In the Big Data era, Everything is Quantified and Tracked!

• Examples:

– Social Networks

– Population & Personal Health

– Smart Cities & Highways

– Retail Analytics

– Cybersecurity

– IoT = Internet of Things

Sustainable Development Goals

17 SDGs are KPIs for the World! (currently, the SDGs have 229 key performance indicators)

Environmental Monitoring with IoT data

Check out and participate in the EPA Smart City Air Quality Challenge: https://www.epa.gov/innovation/epa-challenges-prizes

EPA is challenging communities to deploy hundreds of air quality sensors and to make the data public!

Submissions due October 28, 2016

$100,000 in prizes

@KirkDBorne

@DataSci4Good

@BoozAllen

Are you ready to change the world with Big Data Analytics?

LISTEN

READ www.boozallen.com/datascience

The Field Guide to Data Science

Building a Data Science Capability

Data Science Answers on Demand

10 Signs of Data Science Maturity

PARTICIPATE datasciencebowl.com

Thank you!

Contact information:

[email protected]

@KirkDBorne

http://www.boozallen.com/datascience
