When Urban Air Quality Meets Big Data
Yu Zheng, Lead Researcher, Microsoft Research


Post on 23-Feb-2016


Page 1: When Urban Air Quality Meets Big Data

When Urban Air Quality Meets Big Data

Yu Zheng, Lead Researcher, Microsoft Research

Page 2: When Urban Air Quality Meets Big Data

Background
• Air quality
  – NO2, SO2
  – Aerosols: PM2.5, PM10
• Why it matters
  – Healthcare
  – Pollution control and dispersal
• Reality
  – Building a measurement station is not easy
  – A limited number of stations (poor coverage)

Beijing has only 15 air quality monitoring stations in its urban area (50 km × 40 km).

[Figure: Air quality monitor station]

Page 3: When Urban Air Quality Meets Big Data

2PM, June 17, 2013

Page 4: When Urban Air Quality Meets Big Data

Challenges
• Air quality varies by location non-linearly
• Affected by many factors
  – Weather, traffic, land use…
  – Too subtle to model with a clear formula

[Figure: Deviation of PM2.5 between S12 and S13. A) Beijing (8/24/2012 - 3/8/2013). x-axis: deviation, 0 to 480; y-axis: proportion, 0.00 to 0.30; annotation: >35%]
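The chart on this slide summarizes how far apart the PM2.5 readings of two stations are at the same hour. A minimal sketch of that kind of summary follows. It assumes hourly readings that are already aligned by timestamp; the sample values, the bin width of 40 (taken from the axis ticks), and the function name `deviation_histogram` are illustrative assumptions, not the author's data or code.

```python
from collections import Counter


def deviation_histogram(readings_a, readings_b, bin_width=40):
    """Return {bin_start: proportion} for |a - b| over paired hourly readings."""
    if len(readings_a) != len(readings_b) or not readings_a:
        raise ValueError("need two non-empty, equally long series")
    counts = Counter()
    for a, b in zip(readings_a, readings_b):
        deviation = abs(a - b)
        # Assign the deviation to the bin [k * bin_width, (k + 1) * bin_width).
        counts[int(deviation // bin_width) * bin_width] += 1
    total = len(readings_a)
    return {start: counts[start] / total for start in sorted(counts)}


# Made-up hourly PM2.5 values for two hypothetical stations.
station_a = [35, 80, 120, 200, 60, 15]
station_b = [30, 150, 110, 90, 65, 100]
hist = deviation_histogram(station_a, station_b)
```

Each key of `hist` is the lower edge of a bin and each value is the share of hours whose deviation falls in that bin, so the values sum to 1.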

Page 5: When Urban Air Quality Meets Big Data

We do not really know the air quality of a location without a monitoring station!

30,000+ USD, 10 µg/m³, 202 × 85 × 168 mm

Page 6: When Urban Air Quality Meets Big Data

Inferring Real-Time and Fine-Grained air quality throughout a city using Big Data

Meteorology, Traffic, POIs, Road networks, Human mobility

Historical air quality data, Real-time air quality reports

Page 7: When Urban Air Quality Meets Big Data

http://www.uairquality.com/

Page 8: When Urban Air Quality Meets Big Data

Applications
• Location-based air quality awareness
  – Fine-grained pollution alert
  – Routing based on air quality
• Identify candidate locations for setting up new monitoring stations
• A step towards identifying the root cause of air pollution

[Figure: map of monitoring station locations (labels S1 to S10). B) Shanghai]

Page 9: When Urban Air Quality Meets Big Data

Cloud + Client

http://urbanair.msra.cn/

Cloud: MS Azure

Clients

Page 10: When Urban Air Quality Meets Big Data

Difficulties
• Incorporate multiple heterogeneous data sources into a learning model
  – Spatially-related data: POIs, road networks
  – Temporally-related data: traffic, meteorology, human mobility
• Data sparseness (little training data)
  – Limited number of stations
  – Many places to infer
• Efficiency requirements
  – Massive data
  – Answer instant queries

Page 11: When Urban Air Quality Meets Big Data

Methodology Overview
• Partition a city into disjoint grids
• Extract features for each grid from its impacting region
  – Meteorological features
  – Traffic features
  – Human mobility features
  – POI features
  – Road network features
• Co-training-based semi-supervised learning model for each pollutant
  – Predict the AQI labels
  – Data sparsity
  – Two classifiers
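The first two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes points are already expressed as kilometre offsets from the south-west corner of the city's bounding box, it assumes 1 km cells (consistent with a 50 × 50 km area giving 2500 grids), and it models the impacting region as a square neighbourhood of cells, which the slides do not specify. The names `grid_index`, `grid_count`, and `impacting_region` are hypothetical.

```python
import math


def grid_index(x_km, y_km, width_km, height_km, cell_km=1.0):
    """Map a point (km offsets from the south-west corner) to a (row, col) cell."""
    if not (0 <= x_km < width_km and 0 <= y_km < height_km):
        raise ValueError("point lies outside the city bounding box")
    return int(y_km // cell_km), int(x_km // cell_km)


def grid_count(width_km, height_km, cell_km=1.0):
    """Number of disjoint cells needed to cover the bounding box."""
    return math.ceil(width_km / cell_km) * math.ceil(height_km / cell_km)


def impacting_region(row, col, n_rows, n_cols, radius=1):
    """Cells within `radius` cells of (row, col), clipped to the city (assumed shape)."""
    return [
        (r, c)
        for r in range(max(0, row - radius), min(n_rows, row + radius + 1))
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1))
    ]


cell = grid_index(12.3, 40.9, width_km=50, height_km=50)
neighbours = impacting_region(*cell, n_rows=50, n_cols=50)
```

Features for a grid would then be aggregated over the cells returned by `impacting_region`, for example by counting POIs or summing road lengths in those cells.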

Page 12: When Urban Air Quality Meets Big Data

Semi-Supervised Learning Model

• Philosophy of the model
  – States of air quality
    • Temporal dependency in a location
    • Geo-correlation between locations
  – Generation of air pollutants
    • Emission from a location
    • Propagation among locations
  – Two sets of features
    • Spatially-related
    • Temporally-related

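[Figure: locations s1 to s4 with AQI labels and a location l to be inferred, shown at times t1, t2, and ti. Axes: Time, Geospace. Legend: a location with AQI labels; a location to be inferred; temporal dependency; spatial correlation]

[Figure: co-training framework. Spatial features (POIs: Fp; Road networks: Fr) feed the Spatial Classifier. Temporal features (Meteorologic: Fm; Traffic: Ft; Human mobility: Fh) feed the Temporal Classifier. The two classifiers are linked by co-training]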
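The co-training idea, two classifiers on two feature views that label unlabeled grids for each other, can be sketched as follows. This is a generic co-training loop under stated assumptions, not the model from the talk: the nearest-centroid classifier is a deliberately tiny stand-in for the spatial and temporal classifiers, the confidence score (margin between the two nearest centroids) is an assumption, and the feature values and class names are made up.

```python
def centroid_fit(X, y):
    """Per-class mean vector: a tiny stand-in classifier."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def centroid_predict(model, x):
    """Return (label, confidence); confidence is the margin between the two nearest centroids."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, label)
        for label, c in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(spatial, temporal, labels, rounds=10, per_round=1):
    """labels[i] is a class label or None (unlabeled). Returns (models, filled labels)."""
    labels = list(labels)
    views = (spatial, temporal)
    for _ in range(rounds):
        for view in views:
            known = [i for i, lab in enumerate(labels) if lab is not None]
            unknown = [i for i, lab in enumerate(labels) if lab is None]
            if not unknown:
                break
            model = centroid_fit([view[i] for i in known], [labels[i] for i in known])
            # This view labels its most confident unlabeled examples for both views.
            ranked = sorted(
                ((centroid_predict(model, view[i]), i) for i in unknown),
                key=lambda item: item[0][1],
                reverse=True,
            )
            for (label, _margin), i in ranked[:per_round]:
                labels[i] = label
    known = [i for i, lab in enumerate(labels) if lab is not None]
    models = tuple(
        centroid_fit([view[i] for i in known], [labels[i] for i in known])
        for view in views
    )
    return models, labels


# Made-up features: one row per grid, two labeled grids and four unlabeled ones.
spatial = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9], [0.15, 0.25], [0.85, 0.7]]
temporal = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.2], [0.1, 0.8], [0.8, 0.1], [0.2, 0.9]]
labels = ["good", "unhealthy", None, None, None, None]
(spatial_model, temporal_model), filled = co_train(spatial, temporal, labels)
```

Because each view only ever sees its own features, the two classifiers make partly independent mistakes, which is what lets a small labeled set be extended with confident predictions on unlabeled grids.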

Page 13: When Urban Air Quality Meets Big Data

Evaluation
• Datasets

Data sources                  Beijing             Shanghai            Shenzhen           Wuhan
POI    2012 Q1                271,634             321,529             107,061            102,467
       2012 Q3                272,109             317,829             107,171            104,634
Road   # Segments             162,246             171,191             45,231             38,477
       Highways               1,497 km            1,963 km            256 km             1,193 km
       Roads                  18,525 km           25,530 km           6,100 km           9,691 km
       # Intersections        49,981              70,293              32,112             25,359
AQI    # Stations             22                  10                  9                  10
       Hours                  23,300              8,588               6,489              6,741
       Time spans             8/24/2012-3/8/2013  1/19/2013-3/8/2013  2/4/2013-3/8/2013  2/4/2013-3/8/2013
Urban size (grids)            50 × 50 km (2500)   50 × 50 km (2500)   57 × 45 km (2565)  45 × 25 km (1165)

[Figure: maps of monitoring station locations. A) Beijing B) Shanghai C) Shenzhen D) Wuhan]

Page 14: When Urban Air Quality Meets Big Data

Evaluation
• Overall performance of the co-training

[Figure: precision (0.65 to 0.80) versus number of iterations (0 to 160) for SC, TC, and Co-Training]

[Figure: accuracy (0.0 to 0.8) on NO2 and PM10 for U-Air, Linear, Gaussian, Classical, DT, CRF-ALL, and ANN-ALL]
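The charts on this slide report precision and accuracy. A minimal sketch of how such scores can be computed from true and inferred AQI labels follows. The slides do not state the exact definitions used, so the standard ones are assumed here, and the label names and sample values are made up.

```python
def accuracy(y_true, y_pred):
    """Share of examples whose inferred label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty, equally long label lists")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive):
    """Of the examples inferred as `positive`, the share that truly are."""
    truths_for_predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_for_predicted:
        return 0.0
    return sum(t == positive for t in truths_for_predicted) / len(truths_for_predicted)


# Made-up AQI labels for five grids.
y_true = ["good", "good", "moderate", "unhealthy", "moderate"]
y_pred = ["good", "moderate", "moderate", "unhealthy", "good"]
acc = accuracy(y_true, y_pred)
prec_good = precision(y_true, y_pred, positive="good")
```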

Page 15: When Urban Air Quality Meets Big Data

Status
• Publication at KDD 2013: "U-Air: When Urban Air Quality Inference Meets Big Data"
• Website is publicly available via Azure
• A mobile client, "Urban Air", is in the WP App Store
• A component of Urban Air is in the CityNext platform
• Now on Bing Map China
• Working on prediction

http://urbanair.msra.cn/