introduction to big datathousands hours of annotated data. 16 ... analytics regional data hub...
TRANSCRIPT
Introduction to Big Data: Harnessing the Benefits of Data Powered Governance
Board of Directors Item 10 | January 24, 2020
Data Science:Connecting the dots in
emerging science, society & living
Rajesh K. GuptaDirector
2
DRAFT
Data have always been with us.
What has changed?
3
1. Ubiquity and scale of data that can be collected
4
DRAFT
5
1. Ubiquity and scale of data that can be collected
2.Development of sophisticated analytics tools
6
DRAFT
7
1. Ubiquity and scale of data that can be collected
2. Development of sophisticated analytics tools
Use of data in ways not previously imagined
8
DRAFT
http://firemap.sdsc.edu
Santa Rosa fire, October 2017
http://wifire.ucsd.edu
9
New algorithms for locally sensitive hashing, approximate computing
10
DRAFT
Similar structures have similar effects
Toxicology tested by matching against a massive DB of molecular structures and safety data
11
Machine predictions that can reduce crime or amount of jailing
12
DRAFT
New algorithms for locally sensitive hashing, approximate computing
Similar structures have similar effects
Toxicology tested by matching against a massive DB of molecular structures and safety data
Machine predictions that can reduce crime or amount of jailing
Materials Society
Biology
13
14
DRAFT
15
Data being used everywhere in daily lives
[Faster R-CNN: Ren, He, Girshick, Sun 2015]
Detection Segmentation
[Farabet et al., 2012]
Lots of training data:• Vision systems trained on millions of images• Speech systems training on tens of
thousands hours of annotated data.
16
DRAFT
FB data
Psychographic data
Tweets
Browsing data
Voter registration
Public databases
. . . and more
Personality traits Race &
ethnicity
Political views
Religious views
Sexuality
Profession
17
HDSI: A Convergence of Multiple Long‐term Initiatives Launched in March 2018
18
DRAFT
188 Faculty Drawn from All Areas of University
EnvironmentTechnologyHealth
MicrobiomeScienceBig Data
Understandingand Protecting the Planet
HDSI
Enriching Human Life and Society
Exploring the Basis of Human Knowledge,
Learning and Creativity
Understanding Cultures and Addressing
Disparities in Society
19
Questions and Answers
20
DRAFT
21
Role of Data at SANDAGSpeaker: Ray Major
22
SANDAG is creating an environment where economic and social benefits are recognized and achieved using empirical information.
BigData
Regional Plan
DRAFT
23
SANDAG’s evolution as a data driven organization
INRIX Travel Time Data Incorporated in ABM
Work begins on Activity Based Model
1999 2005 2006 2009 2010 2012 2013 2016 2017 2018 2019 2020
Intermodal Transportation Management System Integrated
Corridor Management
Data Governance (2018)
Grant to Develop Solution for Analytics
Regional Data Hub
Intelligent Transportation System Architecture Regional
Arterials Management System
Streetlight Origin-Destination Data Acquired for ABM
ATRI Data Commercial Vehicles
AirSage Data for Military Base Trip Validation
Data used to drive Regional Plan
24
Where (BIG) data comes from
External Data
• Navigation apps• Google• Weather • Uber/Lyft• Bird/Lime• Construction• Inrix/streetlights• Telematics
Internal Data
• Sensors• Signals• Cameras• Public transit• Traffic incidents• Crime• Detectors
DRAFT
25
BenefitsFuture
Opportunities
Why big data is essential in making policy decisions for the region
26
Analytic Continuum
What happened
Why did it happen
.
What is happening
now
What may happen
Based on History
What possibly will happen
Based in Real Time
.
What actions should be
Reporting
High
Low High
Complexity
Value of Information
Analysis
Monitoring
Forecasting
Prescriptive
PredictiveDRAFT
27
Anonymization/privacy of the economic, demographic, and transportation data
Anonymization
Data Governance
PrivacySecurity
Smart Cities & Open Data
01/24/20
Andrell Bower, Chief Data Officer
28
DRAFT
01/24/20 29
Background
Open data – an initiative launched in 2015
Engaged with residents and increased transparency
Data is a valuable asset that helps us meet strategic goals
PUBLIC FACING … … WHILE SHIFTING CULTURE
01/24/20 30
Inform policy
We can decide the next category to add to Get it Done by conducting content analysis and text mining on reports submitted to the ”Other” category.
DRAFT
01/24/20 31
Improve service
We used 911 call volumes andambulance time on task to
calculate the optimal number of ambulances for each of our
response zones.
01/24/20 32
Evaluate programs
A tool that calculates parking meter utilization by block using daily transaction data will help program managers pilot changes and make timely adjustments.
DRAFT
01/24/20 33
Thank you
Andrell Bower, [email protected] our Open Data PortalView our open-source data projects on Github
Central Ohio Case Studies inCivic Data Privacy & Governance
Christina Drummond, M.A. ISTPProgram Manager and Senior Analyst
Moritz College of Law, The Ohio State [email protected]
Presentation to SANDAGJanuary 24th, 202034
DRAFT
35
Professional Networks
Efforts
36
Regional EffortsDRAFT
37
• ���������� ������������ � ��� ��� � ������
• ������ �� � ����� ���������� ������� �� ��������� �� � �������
• � ������������������������ �� � ����� ����������
Regional Data Advisory Committee (RDAC)
https://www.morpc.org/committees/regional-data-advisory-committee/Contact Aaron Schill, Director, Data & Mapping. [email protected]
38
������� ��������� �������
! "���������� ��#��������� ��� ����
$! � � � ���%�������� ������ ����
&! '(���)� � ������ ���
*! +���� ������ ������� ��� ��������� ���
,! - ���� ����������������%� ���.���%
MORPC Regional Data Agenda
https://www.morpc.org/wordpress/wp-content/uploads/2019/03/DATA-POLICY-AGENDA_FINAL_WEB.pdf
DRAFT
39
• � � �����%/���������%01������
• ������ �+����� ���� ��� � (���
• ������ �- ����� �"������� ���%
• ���� �� �����%� �#�� ��
Active Working Groups
- 2��3������ �� � �������%3�� � �����
39
40
MORPC RDAC Data Policy Needs Survey & Toolkit WGSurveying and resourcing civic data governance needs
Data topics: e.g. cybersecurity, open data, privacy & ethics, contracting, sourcing and provisioning
Operations: Policy, procedures, staffing and training
DRAFT
41
MORPC Data User Personas
Guiding data delivery & engagement
Personas span:
• Mid-sized City Planner
• Township Administrator
• Engaged Resident
• Elected Village Council Member
• Nonprofit Employee
• Consultancy Project Manager
• Civic Tech Enthusiast
https://www.morpc.org/tool-resource/data-user-personas/
$40 MILLION
42
DRAFT
To empower our residents to live their best lives through responsive, innovative and safe mobility solutions.
VISION MISSIONTo demonstrate how an
intelligent transportation system and equitable access to transportation can have
positive impacts on every day challenges
faced by cities.
43
44
DRAFT
USDOT PORTFOLIO
45
TECHNICAL WORKING GROUP (TWG)
Data
• Identify use cases for data to solve community challenges & continue to make the city smarter
Technical
• Architecture & Design Recommendations
• Best Practices
• Tech Strategies
• Licensing / IP
Policy
• Data Management
• Data Privacy
• Policy Roadmap
46
DRAFT
Data: Privacy, Management and Policies
SMART COLUMBUS DATA PRIVACY AND MANAGEMENT PLANS
47
WHAT IS THE SMART COLUMBUS DMP?
Data Management Plan
• Governs data within Smart Columbus OS
• Addresses metadata management, data size, data acquisition, data access and data use.
https://smart.columbus.gov/programs/smart‐city‐demonstration
48
DRAFT
DATA INVENTORY
• Description of Data• Data Type• Dataset• Source of the Data• Responsible Party• Collection Approach• Frequency• Needed for Performance
Management
• Period of Collection• Users of the Data• Value of the Data to the
Users• Does it Contain PII• Relevant Data Standards• Access Policies for the Data• Where is the data located?
Data Inventory for each dataset:
49
SMART COLUMBUS DATA PRIVACY PLAN
• High-level plan to protect privacy and secure data
• Privacy controls• Security controls• De-identification• Ongoing governance
• A commitment to respect and to be a good steward of personal information
https://smart.columbus.gov/programs/smart‐city‐demonstration
50
DRAFT
HOW WERE THEY DEVELOPED?
• Guiding Light Documents
• Iterative Drafts with Special Reviews
• Diverse Drafting Team
https://www.smartcolumbusos.com/share‐your‐data
51
Types of Guiding Documents
Laws, including:• ORC: Personal Information Systems, Public Records• Federal: Privacy Act 1974• International: GDPR
NIST Standards/Guides• Security categorization standards, defining breach impacts
(FIPS 199)• Minimum security requirement specs, risk-based process
control selection (FIPS 200)• Information type to security category mapping (800-60v1)• Protection of PII confidentiality (800-122)• Security and privacy controls (800-53) *Control defining
52
DRAFT
Guiding documents continued…
Policy Reports/Playbooks
• Open Data Privacy Playbook (Green et al/Berkman)
• De-identification protocol for Open Data (IAPP)
Policy from other cities
• Open Data Release Toolkit – Data SF/San Francisco
• Open Data Risk Assessment - Seattle
• Data Privacy Plan – Connected Vehicle Pilot / Tampa
53
Iterative Plan Development Process
Guiding Documents
Technical
LegalPrivacy
54
DRAFT
Drafting Dream Team
• Over two years• 38 people from across 22 organizations• Facilitated by Moritz Program on Data and Governance • Diverse perspectives:
• Public Sector IT• Lawyers• Information Privacy Professionals• Corporate Chief Privacy Officers• Unaffiliated national privacy experts• OSU students
55
Transparency
User Notice/Consent
PII Use Limitations
Data Minimization
Secure Environment
Identity Shielding
Accountability
PRINCIPLES | SMART COLUMBUS DPP
56
DRAFT
Data Privacy Plan Elements
Principles and
Legal Protections
for
Projects using PII
• Data Stewardship
• Applicable Law Compliance
• PII Definitions
• Administrative/Legal Safeguards
• Scope of plan related demonstration data
57
Data Privacy Plan Elements
Specified
Privacy Controls
1. Notice and Consent
2. Data minimization
3. PII Use/Sharing
4. Data retention
5. Access, correction, deletion
6. Transparency
7. Accountability
8. De-identification
58
DRAFT
Data Privacy Plan Elements
Specified
Privacy Controls
9. Data Curation, incl. Data Inventory spec.Benefit/Risk Analysis process
10.Privacy Testing
11.Data compartmentalization
12.IRB
13.Privacy and Security Board
14.Independent Evaluator Access
15.Contractor / 3rd party compliance
16.Privacy Impact Assessments
59
Data Privacy Plan Elements
Specified
Security Controls
1. Encryption
2. Physical controls
3. Access controls, with logs
4. Role/ID-based authorization
5. Penetration Testing
6. Secure software development
7. System monitoring
8. Data loss prevention
60
DRAFT
Data Privacy Plan Elements
Specified
Security Controls
9. Patching, Antivirus, Malware Checks
10.DPP training
11.Accountability and event review• Incident response plan spec.
61
Data Privacy Plan Elements
Privacy Impact
Assessment
• PIA question prompts• Data description, use, retention• Internal and external sharing/disclosure• Data collection notice, consent• Data access, redress, correction• Project security controls
62
DRAFT
63
Additional Resources
64
• ����4��4������5�����0���������# ���� � ������ �������� ��������� ��
Civic Data Privacy Leaders Network
Contact: Kelsey Finch, Senior Counsel, Future of Privacy Forum. [email protected]
Future of Privacy Forum
• Resources and reports related to civic data privacy and governance
https://fpf.org/2018/10/31/nothing-to-hide-tools-for-talking-and-listening-about-data-privacy-for-integrated-data-systems/
https://fpf.org/2016/04/25/a-visual-guide-to-practical-data-de-identification/DRAFT
65
• 3��%6���������% ������#��
• 7� ������� ������ ������� � 4��������������4� ����
• ���� ���� � ��
� ������683��� �� ����
MetroLab
66
DRAFT
Next OS: Clearinghouse for Transportation DataSpeaker: Sanjiv Nanda
67
68
What is Next OS
Who…
What… access transportation services that are cheaper, faster, and safer
Resident and Visitors, Goods and Freight
Transportation Operators & Transportation Service Providers
manage services or assets as efficiently as possible
Planners and Policymakers
make informed decisions to promote economic growth and quality of life
How…
A suite of integrated applications to browse, book, and pay for services
A suite of dashboards with advanced analytics and dynamic control tools
A platform with public and private data that better informs decisions
Next OS: Data Clearinghouse, Suite of Data Driven Tools and Applications
DRAFT
69
Next OS In Operation
Flexible Fleets• Shared Mobility• Microtransit• Micromobility
Mobility Hubs• Trip Planning Kiosks• Curb Access Management
Transit Leap• Real Time Tracking• Dynamic Scheduling
Complete Corridors• Managed Lanes• Signal Priorities• Dynamic Pricing
Next OS: Data Clearinghouse, Suite of Data Driven Tools and Applications
• Operational EfficiencyCoordination, Scheduling and Pricing of Resources and Assets
• Transportation User EngagementTrip Planning and Payments at your fingertips
DRAFT