Public policy in the ‘big data’ age: Martin Ralphs presentation

Post on 13-Apr-2017


Data Science in Government

Government Data Science Partnership

Raise awareness of data science potential

Embed new approaches and new skills and improve existing capability

Engage with departments to understand opportunities and issues

Build and support a cross-government data science community to share expertise

Break down technical barriers and understand ethical issues

Learning by doing

Government Innovation Group

Government Digital Service

What is data science?

Data science

Volume

Velocity

Variety

New approach

New technology

New approach: A ‘data first’ mindset; exploring the data to find insights and potential improvements using new and innovative techniques

New technology: New, low priced storage in the cloud, with unrestricted technology capable of running software which can gain speedy insights

How can data science improve government policy and operations?

Data visualisation

New data sets and collection methods

Machine learning

Social media

Web scraping

Prediction

Clustering

Unstructured data

Real time data

Interactive web apps

Real-time feeds

Personalisation

Data sources for official statistics

Surveys – e.g. of businesses and households

Census – every 10 years

Administrative data – by-product of government processes

Big Data?

“Data that is difficult to collect, store or process within the conventional systems of statistical organizations. Either, their volume, velocity, structure or variety requires the adoption of new statistical software processing techniques and/or IT infrastructure to enable cost-effective insights to be made.”

(UNECE, 2013)

Big data sources

Social media: posts, pictures and videos

Purchase transaction records

Mobile phone GPS and cell tower signals

High volume administrative & transactional records

Sensors gathering information: e.g. climate, traffic, internet of things etc.

Digital satellite images

New data sets and collection methods

Social media

Web scraping

Real time data

Web scraping supermarket prices

● Prices collection currently manual
● Web scraping offers more detailed, more frequent data at lower cost
● Web scraped data provides an opportunity to gain experience in processing high volume price data

Prototype web scrapers

● 3 supermarkets
● 35 CPI/RPI item categories
● Written using Python (scrapy)
● Daily collection (around 6500 price quotes)
● Item counts monitored daily
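The collection and monitoring steps listed above can be sketched roughly as follows. This is not the ONS scraper: the prototypes used scrapy, whereas this sketch uses only the Python standard library so that it runs anywhere. The HTML markup, the class names `product`, `name` and `price`, and the helpers `parse_quotes` and `check_item_counts` are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical markup: real supermarket pages differ and change over time.
SAMPLE_PAGE = """
<ul>
  <li class="product" data-category="apples">
    <span class="name">Dessert Apples 6 Pack</span><span class="price">£1.50</span>
  </li>
  <li class="product" data-category="fruit juice">
    <span class="name">Pure Apple Juice 2 Litre</span><span class="price">£1.80</span>
  </li>
  <li class="product" data-category="fruit juice">
    <span class="name">Mango Juice Drink 1ltr</span><span class="price">£1.00</span>
  </li>
</ul>
"""


class PriceQuoteParser(HTMLParser):
    """Collects one dict per <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.quotes = []
        self._current = None  # quote being built, if inside a product
        self._field = None    # "name" or "price" while inside that span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            self._current = {"category": attrs.get("data-category")}
        elif tag == "span" and self._current is not None:
            if attrs.get("class") in ("name", "price"):
                self._field = attrs["class"]

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.quotes.append(self._current)
            self._current = None


def parse_quotes(html):
    """Return price quotes as dicts with category, name and numeric price."""
    parser = PriceQuoteParser()
    parser.feed(html)
    parser.close()
    for quote in parser.quotes:
        quote["price"] = float(quote["price"].lstrip("£"))
    return parser.quotes


def check_item_counts(quotes, expected, tolerance=0.5):
    """List categories whose count differs from the expected count
    by more than `tolerance` (a fraction of the expected count)."""
    counts = Counter(q["category"] for q in quotes)
    return [cat for cat, n in expected.items()
            if abs(counts.get(cat, 0) - n) > tolerance * n]
```

A daily count check of this kind is one simple way to notice when a site redesign silently breaks a scraper; the tolerance value here is arbitrary.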

Classification challenge

“This is a dessert apple”

“This is fruit juice (not orange)”

“This is fruit juice (not orange)” and not a dessert apple!

Tesco Mango Juice Drink 1ltr

Tesco Pure Apple Juice 2 Litre

Training Set

Supervised machine learning
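To make the classification idea concrete, here is a minimal sketch of supervised learning on product names using a small multinomial naive Bayes classifier. The slides do not say which algorithm was used, so naive Bayes is an assumption chosen for brevity. The two Tesco names come from the slide; the dessert apple examples and the function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Labelled product names. The two juice names are from the slide;
# the two dessert apple names are made up for illustration.
TRAINING_SET = [
    ("Tesco Mango Juice Drink 1ltr", "fruit juice (not orange)"),
    ("Tesco Pure Apple Juice 2 Litre", "fruit juice (not orange)"),
    ("Pink Lady Apples 4 Pack", "dessert apple"),
    ("Braeburn Apple Loose", "dessert apple"),
]


def tokenize(name):
    """Lower-case the name and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", name.lower())


def train(examples):
    """Count words per label and how often each label occurs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for name, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(name))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(name, model):
    """Return the label with the highest log posterior score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in tokenize(name):
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((word_counts[label][word] + 1)
                              / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_SET)
```

With this toy training set, a name containing both "apple" and "juice" is scored on all of its words together, which is how a classifier can avoid mistaking apple juice for a dessert apple. A production system would need far more labelled examples per item category.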

Price quote distributions

Whisky

Onions

Price Indices Publication 1st September 2015

http://bit.ly/1PRKMGx

“The real finding of the initial research was not that inflation is too high, but the method of collecting prices matters rather a lot”

Paul Johnson, IFS

Smart meters

Rationale: Smart-type electricity meter data to model occupancy or household composition with energy use profiles

Support more efficient field operations (in 2011, £6.6m spent trying to enumerate vacant properties)

Data from smart meter trials in Great Britain and Republic of Ireland

A range of potential methods identified

Significant issues around privacy and ethics

Electricity: smart meters

Half hourly electricity consumption over 7 days at one meter, through 28 consecutive 7 day periods.
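The layout described in that caption (48 half-hourly readings per day, 7 days per period, 28 periods) can be sketched as a simple reshaping step. The readings below are synthetic and the function names are hypothetical; the slides do not describe how the real trial data were processed.

```python
READINGS_PER_DAY = 48                                 # half-hourly readings
DAYS_PER_PERIOD = 7
PERIOD_LENGTH = READINGS_PER_DAY * DAYS_PER_PERIOD    # 336 readings per period


def split_into_periods(readings):
    """Split one meter's readings into consecutive 7 day periods."""
    if len(readings) % PERIOD_LENGTH != 0:
        raise ValueError("readings must cover a whole number of 7 day periods")
    return [readings[i:i + PERIOD_LENGTH]
            for i in range(0, len(readings), PERIOD_LENGTH)]


def mean_weekly_profile(periods):
    """Average consumption for each half-hour slot of the week."""
    n = len(periods)
    return [sum(period[slot] for period in periods) / n
            for slot in range(PERIOD_LENGTH)]


# Synthetic data: a 0.2 kWh baseline with an evening peak of 0.7 kWh
# between 17:00 and 22:00 (slots 34 to 43 of each day), for 28 periods.
readings = [0.7 if 34 <= (i % READINGS_PER_DAY) < 44 else 0.2
            for i in range(28 * PERIOD_LENGTH)]
periods = split_into_periods(readings)
profile = mean_weekly_profile(periods)
```

Comparing each period against an averaged profile like this is one possible starting point for spotting unusual weeks, such as a property that appears unoccupied.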

Twitter

Rationale: Using geo-located Tweets to explore mobility and migration

7 months of geo-located tweets within Great Britain (about 100 million data points)

Can infer place of usual residence

Significant issues around privacy and ethics
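One common heuristic for inferring usual residence from location traces is to take each user's most frequent night-time location. The slides do not state the method actually used, so the sketch below is an assumption throughout: the night-time hours, the grid precision, the minimum tweet threshold and the function names are all hypothetical, and the data are made up.

```python
from collections import Counter, defaultdict

# Assumed night-time window: 22:00 to 05:59.
NIGHT_HOURS = set(range(0, 6)) | {22, 23}


def grid_cell(lat, lon, precision=2):
    """Snap a coordinate to a coarse grid cell (roughly 1 km at precision 2)."""
    return (round(lat, precision), round(lon, precision))


def infer_residence(tweets, min_night_tweets=3):
    """Map each user to their most common night-time grid cell.

    `tweets` is an iterable of (user_id, hour, lat, lon) tuples.
    Users with fewer than `min_night_tweets` night-time tweets are omitted.
    """
    cells = defaultdict(Counter)
    for user, hour, lat, lon in tweets:
        if hour in NIGHT_HOURS:
            cells[user][grid_cell(lat, lon)] += 1
    return {user: counts.most_common(1)[0][0]
            for user, counts in cells.items()
            if sum(counts.values()) >= min_night_tweets}


# Made-up example data.
tweets = [
    ("a", 23, 51.501, -0.142),
    ("a", 1, 51.502, -0.141),
    ("a", 5, 51.499, -0.139),
    ("a", 13, 53.480, -2.240),   # daytime tweet elsewhere, ignored
    ("b", 2, 55.950, -3.190),    # too few night-time tweets to infer anything
]
residences = infer_residence(tweets)
```

Aggregating to coarse grid cells and dropping users with little data reduces precision about individuals, but it does not remove the privacy and ethics concerns noted above.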

Geolocated Tweet penetration rate by local authority

Demographics and Twitter data

Geo-located Tweet volumes by Device Type Great Britain, 15 August to 31 October 2014

Predicting Norovirus (ahead of lab reports) using social media

Machine learning

Prediction

Clustering

Unstructured data

Segmentation

Thank you!

martin.ralphs@ons.gsi.gov.uk

Twitter: @GoodPracticeMR

ONS Big Data Project web page http://bit.ly/1OZAOzs

GDS Data Science blog http://bit.ly/1QCT5Xs

Government Data Programme

Policies and Governance

Modern Data Infrastructure

Data Science

Open Data

Data Leaders Network

Data Steering Group

Inter-Ministerial Group for Digital Transformation

National Information Infrastructure

Common Technology

Services

Platforms and Standards

Registers

Digital Services

Departmental Transformation

Government as a Platform
