Mining Sensor Data - Roma Tre University (torlone/bigdata/S1-streaming.pdf)
Mining Sensor Data
Donatella Firmani
Preliminary comments
• Internet of Things = a connected-everything world
• According to Cisco, there will be 21 billion connected devices by 2018.
• Analytics of sensor-generated data is mostly about real-time analytics of time series data
Overview of this lecture
• Real-world example
• Theory (data streaming)
• Practice (Spark exercise)
Real-world example
Data collection
• How much climate data is there at NASA?
  ◦ MERRA* Reanalysis Collection: ~200 TB
  ◦ Total data holdings of the NASA Center for Climate Simulation (NCCS): ~40 PB
• Intergovernmental Panel on Climate Change (IPCC)
  ◦ Fifth Assessment Report: ~5 PB (data online now)
  ◦ Sixth Assessment Report: ~100 PB (to be created within the next 5 to 6 years)
*Modern-Era Retrospective Analysis for Research and Applications
MERRA technologies
• Hadoop File System
  ◦ Native MERRA files are sequenced and ingested into the Hadoop cluster in triplicated 640 MB blocks.
  ◦ Total size of the MERRA HDFS repository: ~480 TB.
• MapReduce
  ◦ 36-node Dell cluster, 576 Intel 2.6 GHz Sandy Bridge cores, 1300 TB raw storage, 1250 GB RAM, 11.7 TF theoretical peak compute capacity.
  ◦ FDR InfiniBand network with peak TCP/IP speeds > 20 Gbps.
Impact
• Wei experiment (contribution of irrigation to precipitation)
• Traditional:
  ◦ ~8.4 TB transferred from the archive to a local workstation (weeks)
  ◦ Clipping and averaging performed by a Fortran program on the local workstation (days)
• MERRA:
  ◦ Clipping and averaging performed by MERRA (less than one day)
  ◦ ~35 GB of final product moved to the local workstation
  ◦ Significant time savings in data wrangling
  ◦ Rapid screening over monthly means files takes minutes
Other Applications
• Military: activity monitoring, event detection
• Cosmological: space station data, space telescopes
• Mobile: wearable sensors, social sensing
Theory
Problem definition
[Figure: diagram relating data mining (discovery of meaningful patterns in collections of data), data streaming, and big data]
Not only big data
• If the insight being sought through analytics needs a global context, then all the data will be sent to a back-end big data platform (e.g., a NoSQL database).
• Otherwise:
  ◦ Not all the data may end up in a big data platform (there may be hub nodes in a sensor network that collect and aggregate data from a set of sensors).
  ◦ The data arriving at the big data platform may not always be raw sensor data; it may be data aggregated and preprocessed at the network edge.
Data streaming
• Characteristics of data streams:
  ◦ Continuous flow of data
  ◦ Infinite length
  ◦ Examples: network traffic, sensor data, call center records
Challenges
• Data streaming challenges
  ◦ Volume
  ◦ One pass over the data
• Specific sensor challenges
  ◦ Temporal component (no straightforward adaptation of one-pass algorithms)
  ◦ Data is often uncertain
  ◦ Often mined in a distributed fashion (intermediate sensor nodes have limited processing power)
Typical problems
• Frequent items
• Frequent itemsets:
  ◦ recurring groups of elements
  ◦ used for forecasting
• Clustering: group similar items
• Classification: learn a model based on examples
Typical computational models
• Over the entire data stream
  ◦ considers the data from the beginning until now
  ◦ called the landmark data model
• Over a window
  ◦ considers the data from now up to a certain range in the past
  ◦ sliding window model
• Hybrid
  ◦ associates weights with the data in the stream, giving higher weights to recent data than to past data
  ◦ damped window model
Discussed in this lecture
• Frequent items
• Clustering
Classic Frequent Pattern Mining
[Figure: Venn diagram of customers who buy diapers, customers who buy beer, and customers who buy both]
Tid Items bought
10 Beer, Nuts, Diaper
20 Beer, Coffee, Diaper
30 Beer, Diaper, Eggs
40 Nuts, Eggs, Milk
50 Nuts, Coffee, Diaper, Eggs, Milk
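As an illustration not taken from the slides, the support of an itemset is the fraction of transactions that contain it. The minimal Python sketch below counts every itemset of a given size in the table above; the function name `itemset_supports` is hypothetical, and the brute-force enumeration is only meant to make the definition concrete.

```python
from collections import Counter
from itertools import combinations

# Transactions from the table above: Tid -> set of items bought.
TRANSACTIONS = {
    10: {"Beer", "Nuts", "Diaper"},
    20: {"Beer", "Coffee", "Diaper"},
    30: {"Beer", "Diaper", "Eggs"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
}


def itemset_supports(transactions, size):
    """Return {itemset: support} for every itemset of `size` items that occurs at least once."""
    counts = Counter()
    for items in transactions.values():
        # Enumerate each subset once per transaction (sorted, so the key is canonical).
        for subset in combinations(sorted(items), size):
            counts[subset] += 1
    n = len(transactions)
    return {subset: c / n for subset, c in counts.items()}


pairs = itemset_supports(TRANSACTIONS, 2)
print(pairs[("Beer", "Diaper")])  # 0.6: 3 of the 5 customers buy both
```

Enumerating all subsets grows exponentially with transaction length, which is one reason itemset mining over unbounded streams has to rely on approximation.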
Over the entire stream
[Figure: stream]
Identify all elements whose current frequency exceeds the support threshold s = 0.1%.
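A minimal exact baseline for this problem, assuming the whole stream fits in memory as one counter per distinct element (the function name `frequent_items` is hypothetical). It is one pass, but its memory grows with the number of distinct elements, which is exactly what the approximate synopses discussed later avoid.

```python
from collections import Counter


def frequent_items(stream, s=0.001):
    """Exact one-pass baseline: items whose frequency exceeds s * (stream length)."""
    counts = Counter()
    n = 0
    for x in stream:
        counts[x] += 1
        n += 1
    # Memory grows with the number of distinct items seen so far.
    return {x for x, c in counts.items() if c > s * n}


stream = ["a", "a"] + list(range(998))  # n = 1000, so the threshold is 1 occurrence
print(frequent_items(stream))  # {'a'}
```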
Related problem
[Figure: stream]
Identify all subsets of items (itemsets) whose current frequency exceeds s = 0.1%.
Over a window
[Figure: stream divided into bucket 1, bucket 2, bucket 3]
Divide the stream into possibly overlapping buckets (“sliding window”). It is possible to hold the transactions of each bucket in main memory (i.e., keep exact counters for the items in the buckets).
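A minimal sketch of the bucket idea, assuming bucket boundaries are given and the window slides one whole bucket at a time (so consecutive windows overlap). The class `BucketWindow` is hypothetical: it keeps one exact counter per bucket and sums the counters of the most recent buckets.

```python
from collections import Counter, deque


class BucketWindow:
    """Exact item counts over the most recent `num_buckets` buckets."""

    def __init__(self, num_buckets):
        # deque(maxlen=...) evicts the oldest bucket automatically.
        self.buckets = deque(maxlen=num_buckets)

    def add_bucket(self, transactions):
        # Exact counters for one bucket: number of transactions containing each item.
        bucket = Counter()
        for transaction in transactions:
            bucket.update(set(transaction))
        self.buckets.append(bucket)

    def counts(self):
        # Window counts are the sum of the per-bucket counters.
        total = Counter()
        for bucket in self.buckets:
            total.update(bucket)
        return total


window = BucketWindow(num_buckets=2)
window.add_bucket([["beer", "diaper"], ["beer"]])  # bucket 1
window.add_bucket([["diaper"]])                    # bucket 2
window.add_bucket([["milk", "diaper"]])            # bucket 3: bucket 1 is evicted
print(window.counts())  # Counter({'diaper': 2, 'milk': 1})
```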
Hybrid
• Damped window:
  ◦ decay factor: the weight of each transaction is multiplied by a factor f < 1 when a new transaction arrives
  ◦ the overall effect is an exponential decay function
  ◦ effective for evolving data streams, since recent transactions count more significantly
• Out of the scope of this lecture
  ◦ We focus on “over the entire stream”
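Although the damped window is out of scope here, the decay rule above is easy to state in code. A minimal sketch with a hypothetical `DampedCounter` class: every stored weight is multiplied by f when a transaction arrives, and the new transaction adds weight 1 to each of its items.

```python
from collections import defaultdict


class DampedCounter:
    """Item weights under a damped window with decay factor f < 1."""

    def __init__(self, f=0.99):
        if not 0 < f < 1:
            raise ValueError("decay factor must satisfy 0 < f < 1")
        self.f = f
        self.weights = defaultdict(float)

    def add_transaction(self, items):
        # Every existing weight decays by f when a new transaction arrives...
        for item in self.weights:
            self.weights[item] *= self.f
        # ...and the new transaction contributes weight 1 to each of its items.
        for item in set(items):
            self.weights[item] += 1.0


counter = DampedCounter(f=0.5)
for transaction in (["a"], ["b"], ["b"]):
    counter.add_transaction(transaction)
print(dict(counter.weights))  # {'a': 0.25, 'b': 1.5}
```

This version touches every stored weight on each arrival; keeping a single global scale factor instead is a common way to make updates cheaper.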
Synopsis 1/2
• Fit the entire stream within the available space?
  ◦ Impossible
  ◦ Compute statistical properties of the frequency vector instead of the vector itself
  ◦ Acceptable to generate approximate solutions
• Frequency vector: histogram of our stream, where f(i) is the number of occurrences of value i
• p-th frequency moment of the stream: $F_p = \sum_i f(i)^p$
Notable moments
• Zeroth frequency moment $F_0$: number of distinct elements in our stream (number of non-zero entries of the frequency vector)
• First frequency moment $F_1$: number of elements in the stream
• Second frequency moment $F_2$: classic statistic for streaming applications (e.g., it equals the self-join size of the stream)
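To make the definitions concrete, here is an exact (non-streaming) computation of $F_0$, $F_1$ and $F_2$ on the stream 3, 1, 2, 4, 2, 3, 5 used on the AMS slide. The function name is hypothetical; note that summing only over non-zero entries makes $f(i)^0 = 1$ count each distinct value once.

```python
from collections import Counter


def frequency_moment(stream, p):
    """F_p = sum over distinct values i of f(i)**p (exact, not a streaming synopsis)."""
    freq = Counter(stream)  # frequency vector, non-zero entries only
    return sum(f ** p for f in freq.values())


stream = [3, 1, 2, 4, 2, 3, 5]
print([frequency_moment(stream, p) for p in (0, 1, 2)])  # [5, 7, 11]
```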
Synopsis 2/2
• Approximate solutions by summarizing the data:
  ◦ Sampling
  ◦ Hash sketches, for distinct items: FM (Flajolet-Martin) sketches
  ◦ Linear-projection sketches, for the 2nd frequency moment: AMS (Alon, Matias and Szegedy) sketches
Warm-Up
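• A stream contains d − 1 distinct integers x ∈ [1, d] in an arbitrary order
• How do we compute the missing integer k?
  ◦ initialize counter a = 1 ⊕ 2 ⊕ … ⊕ d
  ◦ update(x): a = a ⊕ x
  ◦ query(): return a
• Every integer other than k appears twice in the XOR sequence (once in the initialization, once in the stream) and cancels out, while k appears only once, so a = k
• Memory: log d + 1 = O(log d) bits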
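The counter above translates almost line for line into Python; a minimal sketch with a hypothetical function name:

```python
from functools import reduce
from operator import xor


def missing_integer(stream, d):
    """Return the integer in [1, d] missing from a stream of d - 1 distinct integers."""
    a = reduce(xor, range(1, d + 1), 0)  # initialize a = 1 ⊕ 2 ⊕ … ⊕ d
    for x in stream:
        a ^= x                           # update(x): a = a ⊕ x
    return a                             # query(): return a


print(missing_integer([3, 1, 5, 2], d=5))  # 4
```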
FM Sketch 1/2
• Assume a hash function h(x) that maps incoming values x in [0, …, N − 1] uniformly across [0, …, 2^L − 1], where L = O(log N)
• Let lsb(y) denote the position of the least-significant 1 bit in the binary representation of y
• A value x is mapped to lsb(h(x))
• Maintain a hash sketch = BITMAP array of L bits, initialized to 0
• For each incoming value x, set BITMAP[lsb(h(x))] = 1
Example: x = 5, h(x) = 101100, lsb(h(x)) = 2, so BITMAP (positions 5 4 3 2 1 0) = 0 0 0 1 0 0
FM Sketch 2/2
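• By uniformity of h(x), for a single incoming value: Prob[BITMAP[k] = 1] = Prob[h(x) ends with the bit pattern $10^k$] = $1/2^{k+1}$
• Assuming d distinct values: expect d/2 to map to BITMAP[0], d/4 to map to BITMAP[1], ...
• Let R = position of the rightmost zero in BITMAP (i.e., the lowest position i with BITMAP[i] = 0)
  ◦ Use it as an indicator of log(d)
• [FM85] prove that $E[R] = \log_2(\varphi d)$, where $\varphi = 0.7735$
  ◦ Estimate $d = 2^R / \varphi$
  ◦ Average several instances (different hash functions) to reduce the estimator variance
[Figure: BITMAP drawn from position L−1 (left) to 0 (right): all 0s at positions ≫ log(d), all 1s at positions ≪ log(d), and a fringe of 0/1s around log(d)]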
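A minimal FM sketch in Python, under stated assumptions: a seeded BLAKE2b hash truncated to L = 32 bits stands in for the uniform hash family, and R is averaged over 32 independently seeded sketches before computing $2^R/\varphi$. The names `h`, `lsb`, `FMSketch` and `estimate_distinct` are hypothetical.

```python
import hashlib

L = 32          # bitmap length (assumed; must exceed log2 of the number of distinct values)
PHI = 0.77351   # FM correction constant


def h(x, seed):
    """Hash x to an L-bit integer; seeded BLAKE2b stands in for a uniform hash family."""
    digest = hashlib.blake2b(f"{seed}:{x}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") & ((1 << L) - 1)


def lsb(y):
    """Position of the least-significant 1 bit of y (L - 1 if y == 0, to stay in range)."""
    return (y & -y).bit_length() - 1 if y else L - 1


class FMSketch:
    def __init__(self, seed):
        self.seed = seed
        self.bitmap = [0] * L

    def add(self, x):
        self.bitmap[lsb(h(x, self.seed))] = 1

    def r(self):
        """R = lowest position holding a 0 (the rightmost zero when printed MSB-first)."""
        return next((i for i, bit in enumerate(self.bitmap) if bit == 0), L)


def estimate_distinct(stream, num_sketches=32):
    """Average R over independently seeded sketches, then estimate d = 2**R / PHI."""
    sketches = [FMSketch(seed) for seed in range(num_sketches)]
    for x in stream:
        for sketch in sketches:
            sketch.add(x)
    mean_r = sum(s.r() for s in sketches) / num_sketches
    return 2 ** mean_r / PHI


print(lsb(0b101100))  # 2, as in the slide example
stream = [i % 1000 for i in range(3000)]  # 1000 distinct values, each seen 3 times
print(round(estimate_distinct(stream)))   # roughly 1000; exact value depends on the hashes
```

Repeated values set the same bit, so duplicates never change the sketch; only the number of distinct values matters.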
Hash sketches properties
• Composable: component-wise OR/add distributed sketches together
  ◦ Estimate |S1 ∪ S2 ∪ … ∪ Sk| = set-union cardinality
• Distributed setting:
  ◦ perform local computation at each node
  ◦ merge these sketches into a single global sketch
• Delete-proof: just use counters instead of bits in the sketch locations
  ◦ +1 for inserts, −1 for deletes
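A minimal sketch of both properties, assuming every node uses the same hash function (here a seeded BLAKE2b hash, an illustrative choice). Counter sketches from different nodes merge by component-wise addition, their bitmaps merge by OR, and a delete undoes the matching insert. All function names are hypothetical.

```python
import hashlib

L = 32  # sketch length (assumed)


def position(x, seed=0):
    """lsb(h(x)) for a seeded BLAKE2b hash truncated to L bits (hash choice is an assumption)."""
    y = int.from_bytes(hashlib.blake2b(f"{seed}:{x}".encode(), digest_size=8).digest(), "big")
    y &= (1 << L) - 1
    return (y & -y).bit_length() - 1 if y else L - 1


def counter_sketch(inserts, deletes=(), seed=0):
    """Delete-proof hash sketch: +1 per insert, -1 per delete at position lsb(h(x))."""
    counts = [0] * L
    for x in inserts:
        counts[position(x, seed)] += 1
    for x in deletes:
        counts[position(x, seed)] -= 1
    return counts


def to_bitmap(counts):
    # A location is "set" while at least one inserted value still maps to it.
    return [1 if c > 0 else 0 for c in counts]


def merge(*sketches):
    # Component-wise add of counter sketches built with the same hash function.
    return [sum(column) for column in zip(*sketches)]


node_a = counter_sketch(["s1", "s2", "s3"])
node_b = counter_sketch(["s3", "s4"])
global_sketch = merge(node_a, node_b)
# Same as building one sketch over the concatenated streams:
print(global_sketch == counter_sketch(["s1", "s2", "s3", "s3", "s4"]))  # True
# OR-ing the local bitmaps gives the bitmap of the set union:
union_bits = [a | b for a, b in zip(to_bitmap(node_a), to_bitmap(node_b))]
print(union_bits == to_bitmap(global_sketch))  # True
# Deleting a value undoes its insertion:
print(counter_sketch(["s1", "s2", "x"], deletes=["x"]) == counter_sketch(["s1", "s2"]))  # True
```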
AMS Sketch
• Goal: build a small-space summary for the distribution vector f(i) (i = 1, ..., N), seen as a stream of i-values
[Figure: the stream 3, 1, 2, 4, 2, 3, 5, ... and the resulting frequency vector f(1), ..., f(5)]
• Basic construct: randomized linear projection of f(), i.e., the dot product of the f-vector and ξ, where ξ is a vector of random values: $\langle f, \xi \rangle = \sum_i f(i)\,\xi_i$
• Simple to compute over the stream: add $\xi_i$ whenever value i arrives
• Tunable probabilistic guarantees on the approximation error
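A minimal AMS sketch for $F_2$, under stated assumptions: $\xi_i \in \{-1, +1\}$ is drawn from a PRNG seeded by (seed, i), standing in for the 4-wise independent family the analysis requires, and the estimate is a median of means of $\langle f, \xi \rangle^2$ (the median-of-means step is how the "tunable guarantees" are usually obtained). Class and function names are hypothetical.

```python
import random
import statistics


def xi(i, seed):
    """Pseudo-random ±1 for value i; a seeded PRNG stands in for a 4-wise independent family."""
    return random.Random(f"{seed}:{i}").choice((-1, 1))


class AMSSketch:
    """Median-of-means AMS estimator for the second frequency moment F2."""

    def __init__(self, groups=7, per_group=16):
        self.seeds = [[g * per_group + j for j in range(per_group)] for g in range(groups)]
        self.z = [[0] * per_group for _ in range(groups)]  # each z = <f, ξ> for one ξ

    def update(self, i, delta=1):
        # Linear projection: seeing value i adds ξ_i (delta = -1 deletes an occurrence).
        for g, seeds in enumerate(self.seeds):
            for j, seed in enumerate(seeds):
                self.z[g][j] += delta * xi(i, seed)

    def estimate_f2(self):
        # E[z^2] = F2; averaging within a group cuts variance, the median boosts confidence.
        means = [sum(v * v for v in row) / len(row) for row in self.z]
        return statistics.median(means)


sketch = AMSSketch()
for i in [3, 1, 2, 4, 2, 3, 5]:
    sketch.update(i)
print(sketch.estimate_f2())  # an estimate of F2 = 11
```

Increasing `per_group` tightens the error of each mean, while increasing `groups` lowers the probability that the median falls outside that error.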
Linear sketches properties
• Composable: simply add independently-built projections (built with the same random vector ξ)
• Delete-proof: just subtract $\xi_i$ to delete an occurrence of value i
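Both properties follow from linearity of $\langle f, \xi \rangle$. A minimal check in Python, assuming each node derives the same $\xi_i$ from a shared seed (the seeded PRNG and the names `xi` and `project` are illustrative assumptions):

```python
import random


def xi(i, seed=0):
    """Pseudo-random ±1 for value i (seeded PRNG; stands in for the AMS random vector ξ)."""
    return random.Random(f"{seed}:{i}").choice((-1, 1))


def project(inserts, deletes=(), seed=0):
    """Linear projection <f, ξ>: add ξ_i per occurrence of i, subtract ξ_i per deletion."""
    return sum(xi(i, seed) for i in inserts) - sum(xi(i, seed) for i in deletes)


node_1 = project([3, 1, 2, 4])
node_2 = project([2, 3, 5])
print(node_1 + node_2 == project([3, 1, 2, 4, 2, 3, 5]))         # True: composable by addition
print(project([3, 1, 2, 9], deletes=[9]) == project([3, 1, 2]))  # True: delete by subtraction
```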
Classic clustering
• The goal of clustering is to
  ◦ group data points that are close (or similar) to each other
  ◦ identify groupings (or clusters) in an unsupervised manner
• Unsupervised: no information is provided to the algorithm about which data points belong to which clusters
• Example
[Figure: scatter plot of points (x) forming a few groups]
Data Stream Clustering
• Clusters over user-specified time horizons
  ◦ “micro-clustering”
• Online component:
  ◦ periodically stores detailed summary statistics
• Offline component:
  ◦ uses only the summary statistics to do clustering
[Figure: view of micro-clusters vs. view of macro-clusters]
Micro-clusters
• A micro-cluster is a set of individual data points that are close to each other and will be treated as a single unit in further offline macro-clustering.
• The micro-clusters are stored at snapshots.
[Figure: micro-clusters stored at successive snapshots]
What to Store in a Micro-Cluster
• Select relevant properties useful for maintaining clusters dynamically
• Only additive/subtractive properties
  ◦ we don't have to compute them from scratch at each snapshot
• Examples
  ◦ first-order moment
  ◦ second-order moment
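A minimal sketch of such an additive summary, similar in spirit to the cluster feature vectors used by micro-clustering algorithms such as CluStream but without their time-stamp moments. The `MicroCluster` class is hypothetical: it stores the point count plus first- and second-order moments per dimension, so summaries can be merged or subtracted and centroid and variance derived without revisiting the points.

```python
class MicroCluster:
    """Additive summary of d-dimensional points: count, first and second moments."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim   # linear sum per dimension (first-order moment)
        self.ss = [0.0] * dim   # sum of squares per dimension (second-order moment)

    def add_point(self, point):
        self.n += 1
        for j, v in enumerate(point):
            self.ls[j] += v
            self.ss[j] += v * v

    def merge(self, other):
        # Additivity: the summary of a union is the sum of the summaries.
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

    def subtract(self, other):
        # Subtractivity: remove the contribution of an earlier summary.
        self.n -= other.n
        self.ls = [a - b for a, b in zip(self.ls, other.ls)]
        self.ss = [a - b for a, b in zip(self.ss, other.ss)]

    def centroid(self):
        return [s / self.n for s in self.ls]

    def variance(self):
        # Per-dimension E[x^2] - E[x]^2, computed from the stored moments only.
        return [q / self.n - (s / self.n) ** 2 for s, q in zip(self.ls, self.ss)]


mc = MicroCluster(dim=2)
for p in [(1.0, 2.0), (3.0, 4.0)]:
    mc.add_point(p)
print(mc.centroid(), mc.variance())  # [2.0, 3.0] [1.0, 1.0]
```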
Macro-Cluster Creation
• Current time T; the window size is h. That means the user wants to find the clusters formed in (T − h, T).
• Approach:
  ◦ 1st step: find the snapshot for T and get the micro-cluster set S(T).
  ◦ 2nd step: find the snapshot for T − h and get the micro-cluster set S(T − h).
• Use S(T) − S(T − h)
  ◦ Specifically, suppose we have a merged cluster with ID list (C1, C2, C3) in S(T) and a cluster with ID C1 in S(T − h).
  ◦ Since C1 was formed before T − h, it should not contribute to the micro-cluster formed in (T − h, T).
• Run k-means on the remaining micro-clusters
Example
• Time T − h: C_ID = [C1]
• Time T: C_ID = [C1, C2, C3]
• Result, S(T) − S(T − h): C_ID = [C2, C3]
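A minimal sketch of the two offline steps, under stated assumptions: a snapshot is modeled as a dict from a tuple of micro-cluster IDs to (count, linear sum), an older cluster is subtracted when its IDs are contained in a newer cluster's ID list, and plain weighted Lloyd iterations with a naive first-k initialization stand in for "run k-means". The function names and numbers are hypothetical.

```python
def subtract_snapshots(s_t, s_t_minus_h):
    """S(T) - S(T-h). A snapshot maps a tuple of micro-cluster IDs to (n, linear_sum)."""
    result = {}
    for ids, (n, ls) in s_t.items():
        remaining = list(ids)
        for old_ids, (old_n, old_ls) in s_t_minus_h.items():
            if set(old_ids) <= set(ids):
                # This older cluster was absorbed into `ids`: remove its contribution.
                n -= old_n
                ls = tuple(a - b for a, b in zip(ls, old_ls))
                remaining = [c for c in remaining if c not in old_ids]
        if n > 0:
            result[tuple(remaining)] = (n, ls)
    return result


def weighted_kmeans(points, weights, k, iterations=10):
    """Plain Lloyd iterations on micro-cluster centroids, weighted by point counts."""
    centers = [list(p) for p in points[:k]]  # naive initialization: first k points
    for _ in range(iterations):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            totals[j] += w
            for d, v in enumerate(p):
                sums[j][d] += w * v
        centers = [[s / totals[j] for s in sums[j]] if totals[j] else centers[j]
                   for j in range(k)]
    return centers


# The example above: [C1, C2, C3] at time T minus [C1] at time T - h leaves [C2, C3].
s_t = {("C1", "C2", "C3"): (30, (60.0, 90.0)), ("C4",): (5, (50.0, 50.0))}
s_t_minus_h = {("C1",): (10, (10.0, 20.0))}
recent = subtract_snapshots(s_t, s_t_minus_h)
print(recent)  # {('C2', 'C3'): (20, (50.0, 70.0)), ('C4',): (5, (50.0, 50.0))}
centroids = [tuple(s / n for s in ls) for n, ls in recent.values()]
weights = [n for n, _ in recent.values()]
print(weighted_kmeans(centroids, weights, k=2))  # [[2.5, 3.5], [10.0, 10.0]]
```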
Distributed setting
• Expensive:
  ◦ transmit all of the data to a centralized server
  ◦ the natural approach
• Efficient:
  ◦ perform local clustering at each node
  ◦ merge these different clusters into a single global clustering
  ◦ low communication cost
Practice
Gap between theory and practice
• Formal methods and popular technologies are somewhat disjoint nowadays
  ◦ Methods → rethinking classical problems
  ◦ Technologies → making computation feasible
• For lack of time, our discussion of practice focuses on more basic problems (level shift) than those discussed in the theory part (note: the design patterns are valid in general)
Exercise: Level shift
• Detection of outliers in a sensor stream using Spark
• Example: consider some product being shipped in temperature-controlled containers. The customer has a Service Level Agreement (SLA) with the transportation company, which defines how the temperature is maintained within a predefined range, e.g.:
  ◦ The mean temperature within a time window has to be below a predefined upper limit or above some predefined lower threshold.
  ◦ Some minimum percentage of the data within a time window has to be below some upper threshold or above some lower threshold.
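A minimal sketch of the two SLA conditions for a single window, assuming the upper-limit form, strict "below" semantics, and hypothetical thresholds; the lower-threshold form is symmetric. Each function returns True when the window violates the SLA.

```python
def mean_violates(window, upper):
    """Condition 1 (upper-limit form): the window mean must stay below `upper`."""
    return sum(window) / len(window) >= upper


def percentage_violates(window, upper, min_fraction):
    """Condition 2 (upper-limit form): at least `min_fraction` of readings must be below `upper`."""
    below = sum(1 for t in window if t < upper)
    return below / len(window) < min_fraction


window = [4.0, 5.0, 9.0, 10.0]  # hypothetical temperatures (°C) in one time window
print(mean_violates(window, upper=8.0))                           # False: mean is 7.0
print(percentage_violates(window, upper=8.0, min_fraction=0.75))  # True: only 50% below 8.0
```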
Caveat
• In machine learning parlance, the problem we are solving is supervised outlier detection. It's supervised because we are specifying the outlier conditions explicitly through the SLA.
Sample architecture
[Figure: sample cluster running HDFS, YARN and Spark on four nodes (192.160.27.100 to 192.160.27.103); components shown include ResourceManager, Spark driver, NodeManager, DataNode and a secondary node]
Sample software
• https://github.com/pranab/ruscello
  ◦ Java: streaming algorithms for level shift detection (can be used by any stream computation framework, e.g., Storm, Spark Streaming)
  ◦ Scala: Spark shell
  ◦ Python: everything else
Spark Streaming
• Spark unifies batch and real-time processing
• Notable characteristics of Spark Streaming:
  ◦ Messages are processed in micro-batches, where the stream is essentially a sequence of RDDs
  ◦ RDDs from the stream are processed like normal offline Spark RDDs
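A minimal PySpark sketch of this model, not the ruscello implementation: it uses the DStream API (`StreamingContext`, `socketTextStream`, `updateStateByKey`), where each 30 s micro-batch acts as a time-bound window and the per-sensor state is a running violation count. The line format `sensorId,timestamp,temperature`, the host/port, the SLA bounds and all function names are hypothetical. Recent Spark releases treat DStreams as legacy in favor of Structured Streaming, so check the version you run against.

```python
# Hypothetical SLA bounds (°C) and input format "sensorId,timestamp,temperature".
LOWER, UPPER = 2.0, 8.0


def parse_line(line):
    sensor_id, _timestamp, temperature = line.strip().split(",")
    return sensor_id, float(temperature)


def update_violations(new_temperatures, num_violations):
    """updateStateByKey function: the state is the running number of violations."""
    num_violations = num_violations or 0
    if new_temperatures:  # empty when the sensor sent nothing in this micro-batch
        mean = sum(new_temperatures) / len(new_temperatures)
        if mean < LOWER or mean > UPPER:
            num_violations += 1
    return num_violations


def main(host="localhost", port=9999, batch_seconds=30):
    # Imported here so the pure functions above can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "SensorSLA")
    ssc = StreamingContext(sc, batch_seconds)  # each micro-batch = one time-bound window
    ssc.checkpoint("checkpoint")               # stateful operations require checkpointing
    readings = ssc.socketTextStream(host, port).map(parse_line)
    state = readings.updateStateByKey(update_violations)  # DStream of (sensor ID, state)
    state.map(lambda kv: f"device:{kv[0]} num violations:{kv[1]}").pprint()
    ssc.start()
    ssc.awaitTermination()
```

Calling `main()` starts the job and blocks; the pure `parse_line` and `update_violations` functions can be tested on their own.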
Sensor data generation
• Temperature at the desired level + some random noise
• Random temperature shifts up/down
• Sensor data:
  ◦ Sensor ID
  ◦ Timestamp
  ◦ Temperature
• Data can be piped to
  ◦ a socket server (Spark Streaming has a socket stream receiver)
  ◦ a Kafka queue
  ◦ HDFS
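A minimal generator for such data, assuming a comma-separated `sensorId,timestamp,temperature` line format and hypothetical parameters (target level, Gaussian noise, shift probability and size). Each sensor occasionally jumps to a new level (up, down, or back to normal). Writing the lines to a socket, Kafka or HDFS is not shown.

```python
import random


def generate_readings(sensor_ids, num_steps, target=5.0, noise=0.3,
                      shift_prob=0.02, shift=3.0, start_time=0, interval=1, seed=None):
    """Yield "sensorId,timestamp,temperature" lines: target level + Gaussian noise,
    with occasional random level shifts up or down (or back to normal)."""
    rng = random.Random(seed)
    offset = {s: 0.0 for s in sensor_ids}
    for step in range(num_steps):
        timestamp = start_time + step * interval
        for s in sensor_ids:
            if rng.random() < shift_prob:
                offset[s] = rng.choice((-shift, 0.0, shift))
            temperature = target + offset[s] + rng.gauss(0.0, noise)
            yield f"{s},{timestamp},{temperature:.2f}"


for line in generate_readings(["U4W8U4L3", "HCEJRWFP"], num_steps=2, seed=42):
    print(line)
```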
Input window
• Since we are dealing with time series data, we use a time-bound window, e.g., every 30 sec
• If data samples arrive at regular intervals and the variability in the sampling period is negligible, we can use a size-bound window, e.g., every 10 samples
• As each data sample arrives:
  ◦ add the data to the window object
  ◦ verify the SLA condition expression
  ◦ if it is true, the violation is appended to the object state
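A minimal sketch of the per-sensor window object, assuming a size-bound window, an upper-limit SLA on the window mean (violated when the mean is not below the limit), and evaluation only once the window is full. The class name and thresholds are hypothetical; the printed line mimics the sample output format shown under "Output stream".

```python
from collections import deque


class SensorWindowState:
    """Per-sensor state: a size-bound window plus the list of recorded SLA violations."""

    def __init__(self, size=10, upper=8.0):
        self.window = deque(maxlen=size)  # the oldest sample drops out automatically
        self.upper = upper                # hypothetical SLA upper limit for the window mean
        self.violations = []              # (timestamp, mean temperature) per violation

    def add(self, timestamp, temperature):
        self.window.append(temperature)
        # Evaluate the SLA condition expression once the window is full.
        if len(self.window) == self.window.maxlen:
            mean = sum(self.window) / len(self.window)
            if mean >= self.upper:
                self.violations.append((timestamp, mean))

    def num_violations(self):
        return len(self.violations)


state = SensorWindowState(size=2, upper=8.0)
for ts, temp in [(0, 5.0), (1, 12.0), (2, 3.0)]:
    state.add(ts, temp)
print(f"device:U4W8U4L3 num violations:{state.num_violations()}")  # device:U4W8U4L3 num violations:1
```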
Output stream
• Spark returns a stream of RDDs, where each RDD consists of (sensor ID, state object) pairs
• Query the state object for the number of violations
• Sample output:
device:U4W8U4L3 num violations:102
device:HCEJRWFP num violations:194
device:U4W8U4L3 num violations:102
device:HCEJRWFP num violations:194
device:U4W8U4L3 num violations:247
device:HCEJRWFP num violations:411
(We could also produce a more detailed output containing the timestamp and mean temperature reading for each violation of each sensor.)
References
Useful References
• Schnase, John L., et al. "MERRA analytic services: meeting the big data challenges of climate science through cloud-enabled climate analytics-as-a-service." Computers, Environment and Urban Systems (2014).
• Aggarwal, Charu C., ed. Managing and Mining Sensor Data. Springer Science & Business Media, 2013.
• http://pkghosh.wordpress.com/2015/02/19/real-time-detection-of-outliers-in-sensor-data-using-spark-streaming