welcome to ds504/cs586: big data analyticsyli15/courses/ds504fall16/...• final project...

42
DS504/CS586: Big Data Analytics Application I Prof. Yanhua Li Welcome to Time: 6:00pm –8:50pm R Loca2on: AK 232 Fall 2016

Upload: others

Post on 25-Sep-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

DS504/CS586: Big Data Analytics Application I Prof. Yanhua Li

Welcome to

Time:6:00pm–8:50pmRLoca2on:AK232

Fall2016

Page 2: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

•  18 critiques & Next Thur we have the last critique. –  Already graded 8 of them. –  Plan to grade 1-2 more.

Page 3: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

•  Grading –  Projects (40%)

•  Project 1 (10%) •  Project 2 (30%)

–  Finalreportsinthediscussionforum(by11:59pm12/13);–  Self-and-peerevalua2onformforproject2(by11:59PM12/13);

– WriPenwork(30%):•  Cri2ques+Projectreports(20%)•  Quiz(10%,with5%each)

– Oralwork(30%):•  Presenta2on

Page 4: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

•  Final Project Presentation – 22 minutes each group (including Q&A) –  Schedule:

•  12/15Thu

Page 5: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

5

Next Class: Summary and Discussion

v  Review of the semester v  Plus the last critique/review

Page 6: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Ar2ficialNeuralNetworks(ANN)

X1 X2 X3 Y1 0 0 01 0 1 11 1 0 11 1 1 10 0 1 00 1 0 00 1 1 10 0 0 0

X1

X2

X3

Y

Black box

Output

Input

OutputYis1ifatleasttwoofthethreeinputsareequalto1.

Page 7: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Ar2ficialNeuralNetworks(ANN)

X1 X2 X3 Y1 0 0 01 0 1 11 1 0 11 1 1 10 0 1 00 1 0 00 1 1 10 0 0 0

S

X1

X2

X3

Y

Black box

0.3

0.3

0.3 t=0.4

Outputnode

Inputnodes

⎩⎨⎧

=

>−++=

otherwise0 trueis if1

)( where

)04.03.03.03.0( 321

zzI

XXXIY

Page 8: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Ar2ficialNeuralNetworks(ANN)

•  Modelisanassemblyofinter-connectednodesandweightedlinks

•  Outputnodesumsupeachofitsinputvalueaccordingtotheweightsofitslinks

•  Compareoutputnodeagainstsomethresholdt

S

X1

X2

X3

Y

Black box

w1

t

Outputnode

Inputnodes

w2

w3

)( tXwIYi

ii −= ∑PerceptronModel

)( tXwsignYi

ii −= ∑

or

Page 9: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

GeneralStructureofANN

Activationfunction

g(Si )Si Oi

I1

I2

I3

wi1

wi2

wi3

Oi

Neuron iInput Output

threshold, t

InputLayer

HiddenLayer

OutputLayer

x1 x2 x3 x4 x5

y

TrainingANNmeanslearningtheweightsoftheneurons

Page 10: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Real-worldproblemsarealwaysmessy

•  Mul2plemodels

•  Keyfeatures

•  DataSparsity

Page 11: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

•  Whatdowedotosolveaclassifica2on/inference/predic2onproblem?–  DataCleaning

–  Featureselec2on

–  Inferencemodel

–  Evalua2on

•  Anexampleofhowtosolverealworldapplica2onproblem

Page 12: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

U-Air:WhenUrbanAirQualityMeetsBigData

Authors:YuZheng,MicrosogResearchAsia

Page 13: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Background •  Airquality

–  NO2,SO2–  Aerosols:PM2.5,PM10

•  WhyitmaPers–  Healthcare–  Pollu2oncontrolanddispersal

•  Reality–  Buildingameasurementsta2on

isnoteasy–  Alimitednumberofsta2ons

(poorcoverage)

Beijingonlyhas22airqualitymonitorsta2onsinitsurbanareas(50kmx40km)

Air quality monitor station

Page 14: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

2PM,June17,2013

Page 15: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Challenges•  Airqualityvariesbyloca2ons

non-linearly•  Affectedbymanyfactors

–  Weathers,traffic,landuse…–  Subtletomodelwithaclearformula

0 40 80 120 160 200 240 280 320 360 400 440 4800.00

0.05

0.10

0.15

0.20

0.25

0.30

Por

titio

n

Deviation of PM2.5 between S12 and S13

>35%Prop

or2o

n

A) Beijing (8/24/2012 - 3/8/2013)

Page 16: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

WedonotreallyknowtheairqualityofalocaGonwithoutamonitoringstaGon!

Page 17: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Challenges •  Exis2ngmethodsdonotworkwell

–  Linearinterpola2on–  Classicaldispersionmodels

•  GaussianPlumemodelsandOpera2onalStreetCanyonmodels•  Manyparametersdifficulttoobtain:Vehicleemissionrates,street

geometry,theroughnesscoefficientoftheurbansurface…

–  Satelliteremotesensing•  Sufferfromclouds•  Doesnotreflectgroundairquality•  Varyinhumidity,temperature,loca2on,andseasons

–  Outsourcedcrowdsensingusingportabledevices•  Limitedtoafewgasses:CO2andCO•  Sensorsfordetec2ngaerosolarenotportable:PM10,PM2.5•  Alongperiodofsensingprocess,1-2hours

30,000+USD,10ug/m3 202×85×168(mm)

Page 18: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

InferringReal-TimeandFine-GrainedairqualitythroughoutacityusingBigData

Meteorology Traffic POIs RoadnetworksHumanMobility

Historicalairqualitydata Real-2meairqualityreports

Page 19: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:
Page 20: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Applica2ons •  Loca2on-basedairqualityawareness

–  Fine-grainedpollu2onalert–  Rou2ngbasedonairquality

•  Deployingnewmonitoringsta2ons•  Asteptowardsiden2fyingtherootcauseofairpollu2on

S2

S1

S5

S3

S7

S6S4

S1

S8

S9

S10

B) Shanghai

Page 21: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Difficul2es •  1.Howtoiden2fyfeaturesfromeachkindofdatasource

•  2.Incorporatemul2pleheterogeneousdatasourcesintoalearningmodel–  Spa2ally-relateddata:POIs,roadnetworks–  Temporally-relateddata:traffic,meteorology,humanmobility

•  3.Datasparseness(liPletrainingdata)–  Limitednumberofsta2ons–  Manyplacestoinfer

Page 22: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

MethodologyOverview •  Par22onacityintodisjointgrids•  Extractfeaturesforeachgridfromitsaffec2ngregion

–  Meteorologicalfeatures –  Trafficfeatures–  Humanmobilityfeatures–  POIfeatures–  Roadnetworkfeatures

•  Co-training-basedsemi-supervisedlearningmodelforeachpollutant

–  PredicttheAQIlabels–  Datasparsity–  Twoclassifiers

Page 23: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

MeteorologicalFeatures:Fm

•  Rainy,Sunny,Cloudy,Foggy•  Windspeed•  Temperature•  Humidity•  Barometerpressure

GoodModerate

UnhealthyUnhealthy-S

Very Unhealthy

AQIofPM10AugusttoDec.2012inBeijing

Page 24: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

TrafficFeatures:Ft

•  Distribu2onofspeedby2me:F(v)•  Expecta2onofspeed:E(V)•  Standarddevia2onofSpeed:D

Good Moderate

km

Unhealthy-S Very UnhealthyUnhealthy

0≤v<20

20≤v<40

v≥ 40

E(v)

D(v)

GPStrajectoriesgeneratedbyover30,000taxisFromAugusttoDec.2012inBeijing

Page 25: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

HumanMobilityFeatures:Fh •  Humanmobilityimplies

–  Trafficflow–  Landuseofaloca2on–  Func2onofaregion(likeresiden2alorbusinessareas)

•  Features:–  Numberofarrivals 𝑓↓𝑎 andleavings𝑓↓𝑙 

A) AQI of PM10 B) AQI of NO2

fl fl

fa faGoodModerate

UnhealthyUnhealthy-S

Very Unhealthy

GoodModerate

UnhealthyUnhealthy-S

Very Unhealthy

Numberofarrivalsfaandleavings(departures)fl

Parksvsfactories

Page 26: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Extrac2ngTraffic/HumanMobilityFeatures

•  Offlinespa2o-temporalindexing•  ta:arrival2me•  Traj:trajID•  Ii:theindexofthefirstGPSpoint(inthetrajectory)enteringagrid•  Io:theindexofthelastGPSpoint(inthetrajectory)enteringthegrid

Page 27: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

POIFeatures:Fp•  WhyPOI

–  Indicatethelanduseandthefunc2onoftheregion–  thetrafficpaPernsintheregion

•  Features–  NumbersofPOIsovercategories–  Por2onofvacantplaces–  ThechangesinthenumberofPOIs

•  Factories,shoppingmalls,•  hotelandrealestates•  Parks,decora2onandfurnituremarkets

Page 28: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

RoadNetworkFeatures:Fr •  Whyroadnetworks

–  Haveastrongcorrela2onwithtrafficflows–  Agoodcomplementaryoftrafficmodeling

•  Features:

–  Totallengthofhighways𝑓↓ℎ –  Totallengthofother(low-level)roadsegments𝑓↓𝑟 –  Thenumberofintersec2ons𝑓↓𝑠 inthegrid’saffec2ngregion

fh

fr

fs

Page 29: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Semi-SupervisedLearningModel

•  Philosophyofthemodel–  Statesofairquality

•  Temporaldependencyinaloca2on•  Geo-correla2onbetweenloca2ons

–  Genera2onofairpollutants•  Emissionfromaloca2on•  Propaga2onamongloca2ons

–  Twosetsoffeatures•  Spa2ally-related•  Temporally-related

s2s1

s3

s4l

s2s1

s3

s4l

s2s1

s3

s4

ti

t1

t2

lTime

Geospace

A location with AQI labels A location to be inferred Temporal dependencySpatial correlation

POIs: Spatial

Fh Temporal

Road Networks: Fr

Ft FmMeteorologic:Traffic:Human mobility:

FpSpa2alClassifier

TemporalClassifierCo-Training

Page 30: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Co-Training-BasedLearningModel

•  Spa2alclassifier–  Modelthespa2alcorrela2onbetweenAQIofdifferentloca2ons–  Usingspa2ally-relatedfeatures–  BasedonaBPneuralnetwork

•  Inputgenera2on–  Selectnsta2onstopairwith–  Performmrounds

∆P1x

∆R1x

c

D1Fp

Frl1 D2

c

d1x

D1

D2

D1

D1

1

1

Fp

Frlk

k

k

Fpx Frx lx

∆Pkx

∆Rkx

cdkx

1

k

x

ANNInput generation

w'11

w'qr

w1

wr

wpq

w11 b1

bq

b'r

b'1

b''

∆P1x

∆R1x

c

D1Fp

Frl1 D2

c

d1x

D1

D2

D1

D1

1

1

Fp

Frlk

k

k

Fpx Frx lx

∆Pkx

∆Rkx

cdkx

1

k

x

ANNInput generation

w'11

w'qr

w1

wr

wpq

w11 b1

bq

b'r

b'1

b''

Page 31: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Co-Training-BasedLearningModel

•  Temporalclassifier–  Modelthetemporaldependencyoftheairqualityinaloca2on–  Usingtemporallyrelatedfeatures–  BasedonaLinear-ChainCondi2onalRandomField(CRF)

Yt-1

Fm(t-1) t-1Ft(t-1) Fh(t-1) Fm(t) tFt(t) Fh(t) Fm(t+1) t+1Ft(t+1) Fh(t+1)

Yt Yt-1

Page 32: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

LearningProcess

Yt-1

Fm(t-1) t-1Ft(t-1) Fh(t-1) Fm(t) tFt(t) Fh(t) Fm(t+1) t+1Ft(t+1) Fh(t+1)

Yt Yt-1

∆P1x

∆R1x

c

D1Fp

Frl1 D2

c

d1x

D1

D2

D1

D1

1

1

Fp

Frlk

k

k

Fpx Frx lx

∆Pkx

∆Rkx

cdkx

1

k

x

ANNInput generation

w'11

w'qr

w1

wr

wpq

w11 b1

bq

b'r

b'1

b''

Training

Temporally-relatedfeatures

Spa2ally-relatedfeatures

Labeleddata Unlabeleddata

Inference

Page 33: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

InferenceProcess

∆P1x

∆R1x

c

D1Fp

Frl1 D2

c

d1x

D1

D2

D1

D1

1

1

Fp

Frlk

k

k

Fpx Frx lx

∆Pkx

∆Rkx

cdkx

1

k

x

ANNInput generation

w'11

w'qr

w1

wr

wpq

w11 b1

bq

b'r

b'1

b''

Temporally-relatedfeatures

Spa2ally-relatedfeatures

<𝑝↓𝑐1 , 𝑝↓𝑐2 ,…,𝑝↓𝑐𝑛 >

<𝑝′↓𝑐1 , 𝑝′↓𝑐2 ,…,𝑝′↓𝑐𝑛 >

× 𝑐= arg↓𝑐↓𝑖 ∈𝒞 𝑀𝑎𝑥( 𝑝↓𝑐𝑖 × 𝑝′↓𝑐𝑖 )

< pc1 , · · · , pcn >

< p0c1 , · · · , p0cn >

c = argci2CMax(P ciSC ⇥ P

ciTC)

Yt-1

Fm(t-1) t-1Ft(t-1) Fh(t-1) Fm(t) tFt(t) Fh(t) Fm(t+1) t+1Ft(t+1) Fh(t+1)

Yt Yt-1

Page 34: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Evalua2on

Datasources Beijing Shanghai Shenzhen Wuhan POI 2012Q1 271,634 321,529 107,061 102,467

2012Q3 272,109 317,829 107,171 104,634

Road #.Segments 162,246 171,191 45,231 38,477 Highways 1,497km 1,963km 256km 1,193km Roads 18,525km 25,530kmKM 6,100km 9,691km #.Intersec. 49,981 70,293 32,112 25,359

AQI #.Sta2on 22 10 9 10 Hours 23,300 8,588 6,489 6,741 Timespans 8/24/2012-3/8/201

3 1/19/2013-3/8/20

13 2/4/2013-3/8/201

3 2/4/2013-3/8/2013

UrbanSize(grids) 5050km(2500) 5050km(2500) 5745km(2565) 4525km(1165)

•  Datasets

S1

S2

S4

S5

S8

S5

S2

S1

S7

S5

S3

S3

S6 S7S6

S9S10

S12

S11

S13 S14

S22S15

S16

S16

S17

S18

S19

S20

S21S3

S7

S6

S4

S1

S8

S9

S10

S1

S4

S2

S6

S9

S8

S1

S2

S4

S3S10

S5

S9

S6 S7

S8

A) Beijing B) Shanghai C) Shenzhen D) Wuhan

Page 35: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Evalua2on

•  GroundTruth–  Removeasta2on–  Crossci2es

•  Baselines–  LinearandGaussianInterpola2ons–  ClassicalDispersionModel–  DecisionTree(DT):–  CRF-ALL–  ANN-ALL

Page 36: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Evalua2on

PM10 NO2

Features Precision Recall Precision Recall

Fm 0.572 0.514 0.477 0.454 Ft 0.341 0.36 0.371 0.35 Fh 0.327 0.364 0.411 0.483 Fp+Fr 0.441 0.443 0.307 0.354

Fm+Ft 0.664 0.675 0.634 0.635 Fm+Ft+Fp+Fr 0.731 0.734 0.701 0.691

Fm+Ft+Fp+Fr+Fh 0.773 0.754 0.723 0.704

•  Doeseverykindoffeaturecount?

Page 37: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Evalua2on •  Overallperformanceoftheco-training

0 20 40 60 80 100 120 140 160

0.65

0.70

0.75

0.80

Pre

cisi

on

Num. of Iterations

SC TC Co-Training

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

NO2PM10

Acc

urac

y

U-Air Linear Guassian Classical DT CRF-ALL ANN-ALL

Accuracy

Page 38: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Evalua2on

GroundTruth

PredicGons

G M S U

G 3789 402 102 0 0.883

Recall

M 602 3614 204 0 0.818 S 41 200 532 50 0.646 U 0 22 70 219 0.704 0.855 0.853 0.586 0.814 0.828

Precision

•  ConfusionmatrixofCo-TrainingonPM10

Page 39: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Evalua2on

CiGes PM2.5 PM10 NO2

Prec. Rec. Prec. Rec. Prec. Rec.

Beijing 0.764 0.763 0.762 0.745 0.730 0.749

Shanghai 0.705 0.725 0.702 0.718 0.715 0.706

Shenzhen 0.740 0.737 0.710 0.742 0.732 0.722

Wuhan 0.727 0.723 0.731 0.739 0.744 0.719

•  PerformanceofSpa2alclassifier

Page 40: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Evalua2on

Procedures Time(ms) Procedures Time(ms) Feature

extracGon (pergrid)

Ft&Fh 53.2 Inference (pergrid)

SC 21.5 Fp 28.8 TC 13.1 Fr 14.4 Total 131

•  Efficiencystudy•  Singlegrid131s•  InferringtheAQIsforen2reBeijingin5minutes

Page 41: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Conclusion•  Inferfine-grainedairqualitywith

–  Real-2meandhistoricalairqualityreadingsfromexis2ngsta2ons–  Otherdatasources:meteorology,POIs,roadnetwork,humanmobility,and

trafficcondi2on

•  Co-Training-basedsemi-supervisedlearningapproach–  Dealwithdatasparsitybylearningfromunlabeleddata–  Modelthespa2alcorrela2onamongtheairqualityofdifferentloca2ons–  Modelthetemporaldependencyoftheairqualityinaloca2on

•  Results–  0.82withtrafficdata(co-training)–  0.76ifonlyusingspa2alclassifier

Page 42: Welcome to DS504/CS586: Big Data Analyticsyli15/courses/DS504Fall16/...• Final Project Presentation – 22 minutes each group (including Q&A) – Schedule: • 12/15 Thu 5 Next Class:

Ques2ons?