Shared data infrastructures: From Smart Cities to Education
Mathieu d’Aquin (@mdaquin)
Data Science Group (@DataScienceGr)
KMi, The Open University
datahub.technology
A city?
A place
Power infrastructure
Water infrastructure
Transport infrastructure
A bunch of people (and organisations)
live / reside there
use
understand???
effective
with???
A smart city?
A place
Power infrastructure
Water infrastructure
Transport infrastructure
Data infrastructure
A bunch of people (and organisations)
mksmart.org
Making Milton Keynes a Smart City
Challenges
Transport: By 2026, transport demand in MK is estimated to grow by 60%, with engineering solutions likely to meet only half of this.
Water: MK is in a water-stressed area, and it is projected that climate change may reduce regional water availability by 29 million litres per day by 2025.
Energy: MK needs to transition to being a low-energy city to support sustainable economic growth. The 2013 core strategy of MK Council aims to achieve a 22% reduction in CO2 emissions per capita from a 2005 base by 2020.
(Diagram: data feeding several apps)
A data infrastructure
Because addressing vertically every route from data to application in isolation is crazy!
Challenges:
- Data heterogeneity: The content of the data is not the same
- Data diversity: The context and conditions under which the data is available are not the same
Data heterogeneity
3 types of data in a city:
- Highly temporal data
- Highly spatial data
- Others...
Typical integration approach
Answer the question: “what do we know about x?” where x is a place, organisation, bus route, roundabout, etc.
(Diagram: Data sources 1 to 6, each loaded through its own ETL process into a Big City Warehouse, which is then accessed)
Hard to maintain and keep running at scale (i.e. as the number of datasets grows)
Taking a Linked Data approach
Answer the question: “what do we know about x?” where x is a place, organisation, bus route, roundabout, etc.
(Diagram: Data sources 1 to 6, each with its own query template; access goes through the query templates)
Can be added and maintained in isolation from each other
Virtual (i.e. never fully materialised) - no need for maintenance
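As a rough illustration of the query-template idea above, here is a minimal Python sketch. It is not the MK:Smart implementation: the data, the template functions and the name what_do_we_know_about are all hypothetical, and in-memory lists stand in for real data sources that would more likely be queried with parameterised SPARQL or SQL. It only shows the shape of the approach: one small template per source, registered independently, with results assembled at request time rather than stored in a warehouse.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-ins for two independent data sources.
BUS_STOPS = [{"id": "stop-42", "name": "Central Station", "ward": "Central"}]
SCHOOLS = [{"ref": "school-7", "label": "Example Primary", "near_stop": "stop-42"}]

# A "query template" is modelled as a function from an entity id to a list
# of facts coming from exactly one source (an assumption of this sketch).
QueryTemplate = Callable[[str], List[dict]]


def bus_stop_template(entity_id: str) -> List[dict]:
    return [
        {"source": "bus_stops", "name": row["name"], "ward": row["ward"]}
        for row in BUS_STOPS
        if row["id"] == entity_id
    ]


def school_template(entity_id: str) -> List[dict]:
    return [
        {"source": "schools", "nearby_school": row["label"]}
        for row in SCHOOLS
        if row["near_stop"] == entity_id
    ]


# Templates are registered per source; adding or removing one entry does not
# touch the others.
TEMPLATES: Dict[str, QueryTemplate] = {
    "bus_stops": bus_stop_template,
    "schools": school_template,
}


def what_do_we_know_about(entity_id: str,
                          templates: Dict[str, QueryTemplate] = TEMPLATES) -> dict:
    """Build the integrated view at request time; nothing is materialised."""
    facts: List[dict] = []
    for template in templates.values():
        try:
            facts.extend(template(entity_id))
        except Exception:
            # A failing source is skipped so that it does not break the rest.
            continue
    return {"entity": entity_id, "facts": facts}


view = what_do_we_know_about("stop-42")
```

Because each template only knows about its own source, a new dataset is integrated by adding one entry to TEMPLATES, which is the "added and maintained in isolation" property the slide describes.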
Result
An “Entity-API” for things in the city, integrating hundreds of datasets and providing thousands of data endpoints, each providing integrated information about a bus stop, an area, a restaurant, a school, a roundabout, etc.
Result - Top MK - a card game played with city data
Result - MK Insight - City data portal
Data Diversity
The Eskimo language has 255 different words for “visiting linguist”
What’s the point of integrating data if we have to go through every bit of it to check if it can be used?
Example
(Diagram, built up over several slides)
Data sources: Smart meter data, Solar panel monitoring, Weather data, Location data, Electricity tariff data
Steps and intermediate data: Anonymisation (after Smart meter data and after Solar panel monitoring), Anon data, analysis, Model, prediction/recommendation, Results
Policy and licence labels: Data prot., Corp lic. 1, Corp lic. 2, Corp lic. 3, User T&C, OGL
Unresolved labels: ?
Semantic approach
Explicit, machine readable representation of data policies and licences...
… as well as of the data flows through which they are processed
Handled through a sophisticated data cataloguing process
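To make the idea of machine-readable policies and data flows a little more concrete, here is a small Python sketch. It rests on assumptions that are not in the slides: the policy labels, the Step class, the propagate function and the rule that anonymisation lifts a data-protection constraint are all invented for illustration, and the assignment of policies to sources is made up. The point is only that once policies and flows are explicit, the policies applying to a derived result can be computed instead of checked by hand.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Step:
    """One process in a data flow (hypothetical model for this sketch)."""
    name: str
    inputs: Tuple[str, ...]
    output: str
    # Policies this step is assumed to lift from its output.
    removes: FrozenSet[str] = frozenset()


def propagate(source_policies: Dict[str, Set[str]],
              steps: List[Step]) -> Dict[str, Set[str]]:
    """Compute the policies attached to every dataset in the flow.

    Steps are assumed to be listed in execution order. An output inherits
    the union of its inputs' policies, minus those the step removes.
    """
    policies = {name: set(p) for name, p in source_policies.items()}
    for step in steps:
        inherited: Set[str] = set()
        for name in step.inputs:
            inherited |= policies[name]
        policies[step.output] = inherited - step.removes
    return policies


# Made-up policy assignments, not taken from the slides.
SOURCES = {
    "smart_meter": {"data-protection", "corp-licence-1"},
    "weather": {"attribution"},
}
FLOW = [
    Step("anonymisation", ("smart_meter",), "anon_meter",
         frozenset({"data-protection"})),
    Step("analysis", ("anon_meter", "weather"), "results"),
]

result = propagate(SOURCES, FLOW)
```

In this toy flow, "results" ends up carrying the corporate licence and the attribution requirement but not the data-protection constraint. Whether a real anonymisation step justifies lifting such a constraint is a legal question the sketch does not settle.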
Result
datahub.mksmart.org
~ 400 registered users
~ 700 datasets
Result - Applications
Result - Reusable components
datahub.technology
Open source data cataloguing components that integrate with a CMS
“Entity API” framework for data integration
High-speed processing components
Data portal framework
What about education?
Learner
Platform
Analytics
VLE | Website | Library | Assessment | Enrollment
School/University
Prediction Drop out
BI
Planning
Recommendation
Sentiment Analysis
Collective Intelligence
Behaviour Analysis
Collaboration
Community Support
AFEL - Analytics for Everyday Learning (afel-project.eu)
Same challenges… same solutions
Pointers and next steps
Deploying to other cities in the UK and Europe, as well as for research projects and other types of organisations.
Dealing with data quality and accuracy - automatic checking based on a cross comparison with other datasets.
Reusable, parameterizable services for analytics - building a catalogue of pipelines and models.
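The cross-comparison idea for data quality mentioned above could look something like the following Python sketch. The function name flag_outliers, the 10% tolerance and the use of the median as the reference value are all assumptions chosen for illustration, not a description of what the project actually does.

```python
from statistics import median
from typing import Dict, List


def flag_outliers(readings: Dict[str, float], tolerance: float = 0.1) -> List[str]:
    """Flag datasets whose value for one entity disagrees with the others.

    `readings` maps a dataset name to the value it reports for the same
    attribute of the same entity. A dataset is flagged when its value
    deviates from the median by more than `tolerance` (relative).
    """
    if len(readings) < 3:
        # With fewer than three datasets there is no majority to compare to.
        return []
    mid = median(readings.values())
    flagged = []
    for name, value in readings.items():
        if mid == 0:
            deviates = value != 0
        else:
            deviates = abs(value - mid) / abs(mid) > tolerance
        if deviates:
            flagged.append(name)
    return sorted(flagged)


suspect = flag_outliers({"dataset_a": 100, "dataset_b": 102, "dataset_c": 150})
```

Here only dataset_c is flagged, since it sits far from the median of 102 while the other two agree within the tolerance.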
Thank you!
@mdaquin @DataScienceGr
http://datahub.technology