Big Data & Analytics: What This Means to Governments
John Palfreyman
Agenda
1. Big Data & Analytics for Government - Why?
2. Case 1 – Galway Bay Sonar
3. Case 2 – Base Protection
4. Case 3 – Predictive Policing
5. Case 4 – Ebola initiatives in Africa
6. Future - Watson
© 2014 International Business Machines Corporation
External Pressures on Government
Expanding impact of technology
Continuing economic and budget challenges
Accelerated globalization
Pressure for transparency and accountability
Rising environmental concerns
Increased expectations for services and responsiveness
Big Data = Huge opportunity, if harnessed
Volume, Velocity, Variety, Veracity
• 4.6 billion camera phones worldwide
• Facebook processes 10 TB of data every day
• 12 terabytes of Tweets each day, giving insight into public sentiment
• 2 billion people on the Web as of 2011
• 5 million financial transactions occur every single day
• 5 billion mobile phones in use
Big Data – Increasing Veracity
The Dawn of Big Data: This is Only the Beginning
The uncertainty of big data is growing alongside its complexity
[Chart: growth from 2010 to 2015 by source: Sensors & Devices, VoIP, Enterprise Data, Social Media; marker "We are here"]
Analytics can transform Government
To create a strong legacy of transformation
To spend public funds responsibly
To realize results-based government
To drive smarter decision-making
To achieve the best outcomes for everyone, from everyone
To drive transparency and accountability
Galway Bay Marine Mammal Project
Identify marine mammals
• Species
• Count
• Distance
• Individual returning mammals
Method
• Analysis of hydrophone data
– High frequency (500 kHz)
– Medium resolution (16-bit mono)
– Contains environmental (natural and artificial) noise
Stream Computing: Sensor Array, Transform, Filter / Sample, Classify, Correlate, Annotate
Species Identification
• “Click Detection” and “Click Profiling”
• Three-stage process
– Pre-click detection
– Dynamic filtering
– Click profiling & detection
Pre-click Detection
Input: about 0.5 s of WAV data
Stages: High Pass Filter, Pre-click detector, Fast Fourier Transform (FFT), Mean Frequency
Species “hint” from mean frequency:
• Porpoise f=137-144kHz
• Dolphin f=115-120kHz
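The pre-click step above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the "500 kHz" figure is the sampling rate, uses a hypothetical 100 kHz cutoff as a crude high-pass, and takes the power-weighted mean of the FFT bins as the "mean frequency". The function names and the synthetic `tone` helper are invented for the example.

```python
import cmath
import math

SAMPLE_RATE_HZ = 500_000  # assumption: the slide's "500 kHz" is the sampling rate

# Frequency bands from the slide
SPECIES_BANDS_HZ = {"porpoise": (137_000, 144_000), "dolphin": (115_000, 120_000)}


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def mean_frequency(samples, sample_rate_hz, high_pass_hz=100_000):
    """Power-weighted mean frequency of the bins above a (hypothetical) cutoff."""
    spectrum = fft(samples)
    n = len(samples)
    weighted = total = 0.0
    for k in range(1, n // 2):
        freq = k * sample_rate_hz / n
        if freq < high_pass_hz:
            continue  # crude high-pass: ignore low-frequency bins
        power = abs(spectrum[k]) ** 2
        weighted += freq * power
        total += power
    return weighted / total if total > 0 else None


def species_hint(mean_hz):
    """Map a mean frequency to a species hint, or None if outside both bands."""
    if mean_hz is None:
        return None
    for species, (low, high) in SPECIES_BANDS_HZ.items():
        if low <= mean_hz <= high:
            return species
    return None


def tone(freq_hz, n=4096):
    """Synthetic test signal: a pure sine sampled at SAMPLE_RATE_HZ."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ) for i in range(n)]
```

For a synthetic 140 kHz tone, `species_hint(mean_frequency(tone(140_000), SAMPLE_RATE_HZ))` gives "porpoise". A window with no click in it can still produce a misleading mean frequency from leakage or noise, which is one reason to gate on a pre-click detector before computing the hint.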
Dynamic Filtering
Sound pressure level (signal strength) determined by:
• Distance
• Salinity
• Temperature
Apply filter based on:
• Species “hint” (frequency)
• Sound pressure level
Porpoise f=137-144kHz: Calculate Sound Pressure Level, then Band Pass Filter (175 dB), Band Pass Filter (161 dB) or Band Pass Filter (151 dB)
Calculate Sound Pressure Level, then Band Pass Filter (230 dB), Band Pass Filter (216 dB) or Band Pass Filter (210 dB)
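A rough sketch of the "calculate sound pressure level, then pick a filter" step. The SPL formula (20·log10 of RMS pressure over a 1 µPa reference, the usual underwater convention) is standard; everything else is an assumption. In particular, the slides do not say how the dB label on each filter relates to the measured level, so the nearest-level selection below is purely illustrative, and the function names are invented.

```python
import math

REFERENCE_PRESSURE_UPA = 1.0  # usual underwater reference pressure: 1 micropascal

# dB labels of the porpoise filter bank, as listed on the slide
PORPOISE_FILTER_LEVELS_DB = (175, 161, 151)


def sound_pressure_level_db(samples_upa):
    """SPL in dB re 1 uPa from pressure samples in micropascals (must not be all zero)."""
    rms = math.sqrt(sum(p * p for p in samples_upa) / len(samples_upa))
    return 20 * math.log10(rms / REFERENCE_PRESSURE_UPA)


def pick_filter_level(spl_db, levels=PORPOISE_FILTER_LEVELS_DB):
    """Assumption: choose the filter whose dB label is closest to the measured SPL."""
    return min(levels, key=lambda level: abs(level - spl_db))
```

Because the level depends on distance, salinity and temperature, the same species can arrive at very different strengths, which is why the filter is chosen per window rather than fixed.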
Click Profiling & Detection
Mean Frequency
Fast Fourier Transform
Band Energy
Peak Position & Width
Click Length
Click Counter
Spectral frequency in click
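The "Click Length" and "Click Counter" boxes can be illustrated with a simple amplitude-threshold run detector. This is a sketch under assumptions: the threshold, the `max_gap` tolerance and the function name are hypothetical, and the project may well have used a different detection rule.

```python
def find_clicks(samples, sample_rate_hz, threshold, max_gap=2):
    """Return (start_seconds, length_seconds) for each run of loud samples.

    A run ends once more than max_gap consecutive samples fall below threshold.
    """
    clicks = []
    start = None      # index where the current click began
    last_loud = None  # index of the most recent loud sample
    for i, value in enumerate(samples):
        if abs(value) >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud > max_gap:
            clicks.append((start / sample_rate_hz,
                           (last_loud - start + 1) / sample_rate_hz))
            start = None
    if start is not None:  # click still open at the end of the window
        clicks.append((start / sample_rate_hz,
                       (last_loud - start + 1) / sample_rate_hz))
    return clicks
```

The click count for a window is then `len(find_clicks(...))`, and each tuple carries the click length that feeds the profile.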
Base Protection - Project Overview
Requirement
• Detect, classify, locate and track potential threats, above and below ground, to secure base perimeters and border areas
Challenges
• Continuously consume and analyse digital acoustic data
– biological, mechanical and environmental objects-in-motion
• Gather and analyse information simultaneously, at very high speed
Capability
• Collect data from multiple sensor types
• Analyse and classify streaming acoustic data in real time
Base Protection – Solution Outline
Fibre Optic Cable Base Perimeter
Detect
Classify
Locate
Track
Streaming, Time Series and Partner Technology
Base Protection - Capability
• Captures and transmits real-time, streaming acoustical data from around the base
• Enables security personnel to “hear” even when the incident is miles away
• Identify and classify a potential security threat
• Take appropriate action
• Capture, reduce, process and analyse 275 Mbit of acoustic data from 1024 individual sensor channels in 1/14th second (42 TB/day)
• Extendable to include other sources (reduced false alarm rate)
– Airborne
– Video
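The 42 TB/day figure can be checked from the other numbers on the slide. The short calculation below assumes decimal units (1 TB = 1,000,000 MB) and 8 bits per byte; the function name is invented for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(mbit_per_frame, frames_per_second):
    """Convert a per-frame payload in megabits to terabytes per day (decimal units)."""
    mbit_per_second = mbit_per_frame * frames_per_second   # 275 * 14 = 3850
    megabytes_per_second = mbit_per_second / 8              # 481.25
    return megabytes_per_second * SECONDS_PER_DAY / 1_000_000


volume = daily_volume_tb(275, 14)        # 41.58, which rounds to the slide's 42 TB/day
per_channel_mbit_s = 275 * 14 / 1024     # roughly 3.76 Mbit/s per sensor channel
```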
FROM: Traditional Analysis & Classification
Hydrophone Array
Beam Forming
Bearing / Time
Detection
Classification
Tracking
Digital Signal Processing Fast, dedicated purpose hardware / firmware
Intercept data stream Look for patterns, trends, characteristics
History
TO: Adaptive Analysis & Classification
Hydrophone Array
Beam Forming
Bearing / Time
Detection
Classification
Tracking
Stream Computing As fast, low latency
Signal Processing Functions Adaptive
History
Hadoop technologies: Offline Analysis; Build Models & Patterns; Condition Real Time Processing
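One way to picture "offline analysis conditions real-time processing" is a detection threshold that is learned from history and then applied to the live stream. The sketch below is an assumption-laden toy: the mean-plus-k-standard-deviations rule, the dictionary model and both function names are hypothetical and stand in for whatever models the project actually built.

```python
import statistics


def build_noise_model(history):
    """Offline step: summarize historical background energy levels."""
    return {"mean": statistics.mean(history), "stdev": statistics.pstdev(history)}


def detect(stream, model, k=3.0):
    """Real-time step: yield (index, value) for readings above the learned threshold."""
    threshold = model["mean"] + k * model["stdev"]
    for i, value in enumerate(stream):
        if value > threshold:
            yield i, value
```

Re-running `build_noise_model` on newer history and passing the result to `detect` is what makes the processing adaptive: the streaming code stays the same while its parameters change.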
Police Case Work
Domestic Violence Reduction Unit (DVRU)
o 3,000+ cases referred each year
o Investigate ~15%
Original process
o Manual review of case
o Decision by team based on experience
Challenges
o Time spent reviewing cases (20% of overall unit; 2 FTE)
o Manual decision process: Biased? Liability? Best result?
Project Approach
• Data integration (SAS, Excel, SQL, CSV..)
• Visualization
• Development of multivariate predictive models
• Integration of standardized scoring and item weighting
• Text analytics
• Entity analytics (clustering and linking)
• Automated scoring based on standardized input
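"Standardized scoring and item weighting" can be pictured as a weighted checklist applied identically to every case. The items and weights below are hypothetical placeholders, not the unit's real instrument, and the function names are invented for the example.

```python
# Hypothetical items and weights, for illustration only
ITEM_WEIGHTS = {
    "prior_incidents": 3,
    "weapon_involved": 5,
    "protection_order_violated": 4,
}


def risk_score(case):
    """Sum the weights of the items marked true on a standardized case form."""
    return sum(weight for item, weight in ITEM_WEIGHTS.items() if case.get(item, False))


def rank_cases(cases):
    """Return case ids ordered from highest to lowest score."""
    return sorted(cases, key=lambda case_id: risk_score(cases[case_id]), reverse=True)
```

Because every case passes through the same function, two reviewers given the same standardized input get the same score, which is the consistency the manual process lacked.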
Process cycle around the data: Understand Goal, Understand Data, Data Preparation, Modelling, Evaluation, Deployment
Project Outcomes
• Fact-based decision making drives consistent and better results
• Standardized protocol for reviewing and assigning cases: procedural consistency
• Risk information available for prosecutors, probation boards
• Data collection improved to provide input needed for evaluation model
• Increased productivity
o Unit strength decreased (9 to 7 officers)
o 111% increase in cases investigated (453 to 954)
o 21% increase in arrest rate
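The productivity figures can be cross-checked from the raw counts on the slide. The per-officer numbers are derived here for illustration and do not appear in the original; they assume the quoted unit strength applies to the same periods as the case counts.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


cases_increase = percent_change(453, 954)   # about 110.6, reported as 111%
cases_per_officer_before = 453 / 9          # about 50.3
cases_per_officer_after = 954 / 7           # about 136.3
```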
Ebola Initiatives in Africa
1. Citizen engagement and analytics system in Sierra Leone
• Communities communicate issues directly to government
2. IBM Connections technology donation to Nigeria
• Coordinate public health efforts
3. Global platform for sharing Ebola-related data
4. ALL philanthropic
Citizen Engagement & Analytics (Sierra Leone)
• Citizen Reporting, promoted over radio
– Mobile voice: toll-free number
– Toll-free SMS number, via Airtel
• Machine learning & topic classification to identify clusters of issues
• Heat maps using spatio-temporal data
• Passed to Open Government
• Action & policy to contain disease
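The reporting-to-heat-map flow can be sketched as: tag each report with a topic, then count reports per (place, time, topic) cell. The keyword lookup below is a deliberately simple stand-in for the machine-learning classifier mentioned on the slide, and the topics, keywords, district labels and sample messages are all invented for illustration.

```python
from collections import Counter

# Hypothetical topics and keywords; a stand-in for a trained classifier
TOPIC_KEYWORDS = {
    "burial": {"burial", "body"},
    "supplies": {"soap", "water", "food"},
}


def classify(text):
    """Return the first topic whose keywords appear in the message, else 'other'."""
    words = set(text.lower().split())
    for topic, keywords in TOPIC_KEYWORDS.items():
        if words & keywords:
            return topic
    return "other"


def heat_map(reports):
    """Count reports per (district, week, topic) cell for spatio-temporal display."""
    cells = Counter()
    for district, week, text in reports:
        cells[(district, week, classify(text))] += 1
    return cells
```

Cells with unusually high counts are the clusters of issues that would be passed on for action.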
The Jeopardy Challenge
• Jeopardy = US TV game show
• English-language questions, word plays
• Understand complex natural language
• Large knowledge base to find the best answer
• Ability to “train” from previous shows
• Grand challenge in automatic, open domain question-answering
• IBM Research led
• Massive effort
• Won Jeopardy, beating champions
• But then what?
Cognitive systems
Programmatic Systems
• Leverage traditional data sources
• Follow pre-defined rules (programs)
• Provide the same output to all users
Cognitive Systems
• Are taught, not programmed
• Learn and improve based on experience
• Interpret sensory & non-traditional data
• Relate to each of us as individuals
• Expand and scale our own thinking
Expanding Watson Post Jeopardy
Explores
Reasons
Visualizes
Understands natural language
Generates and evaluates hypotheses
Adapts and learns
Three classes of cognitive services
ASK: User has a question and answer requirement, with questions posed in natural language
DISCOVER: Seek answers and insights from a defined data repository comprised largely of unstructured data
DECIDE: Provide supporting evidence for confidence-weighted responses to questions
Decision Support: Healthcare
Watson Analytics
Summary
Predictive analytics
• Predict and target the needs of citizens and match programs and resources to meet highest-priority citizen needs.
• Predict and help prevent outages in key public services.
• Match programs and resources to meet highest-priority citizen needs.
• Position resources to focus on high-priority service areas.
• Improved governance, reduced risk, and compliance
Analytical decision management
• Get a strategic view to manage the delivery of citizen services and program requirements.
• Position resources to focus on high-priority service areas.
Business intelligence
Business outcomes/benefits
• Strategic view of revenue streams, budgets, costs and expenses at all levels of the government enterprise.
• Leverage collaborative budget preparation and execution.
Performance management
Risk management
• More effectively measure and monitor financial and operational risk across agencies.
• Use reporting capabilities to support compliance with internal and external requirements.
Conclusion
1. Big Data = huge opportunity, if harnessed
2. Quality erodes if increasing amounts of low-veracity data are ignored
3. Stream Computing + Hadoop = Adaptive Signal Processing
4. Analytics solutions can make a REAL difference
5. Future = Cognitive underpinning of Analytics