Activity Recognition - Justin Liang · justin-liang.com/talks/activity_recognition.pdf
Activity Recognition
JUSTIN LIANG
MARCH 27, 2016
Agenda
• End-to-end Learning of Action Detection from Frame Glimpses in Videos. S. Yeung, O. Russakovsky, G. Mori, L. Fei-Fei. CVPR 2016.
• Detecting Events and Key Actors in Multi-Person Videos. V. Ramanathan, J. Huang, S. Abu-El-Haija, A. Gorban, K. Murphy and L. Fei-Fei. CVPR 2016.
What is Activity Recognition
• The idea is to be able to detect what event occurs in a video
  • Ex. diving, successful layup, failed layup, successful slam dunk, blocking, setting, standing
• Different subdomains of activity recognition:
  • Individual activity recognition
  • Group activity recognition
  • Temporal activity recognition
[Ibrahim et al. CVPR 2016]
End-to-end Learning of Action Detection from Frame Glimpses in Videos
End-to-end Learning of Action Detection from Frame Glimpses in Videos
• Paper from Serena Yeung, Olga Russakovsky, Greg Mori, Li Fei-Fei in CVPR 2016.
• Objective:
  • Predict actions and their temporal bounds: how long and where they occur in a video clip. Video clips used are untrimmed.
• Key Contributions:
  • End-to-end approach to action detection and temporal localization in videos
  • Train an agent policy to skip video frames to find where the actions are in the video
  • Show that this method can outperform state-of-the-art results
Approach
• Action detection is a process of observation and refinement. Effectively choosing a sequence of frame observations allows us to quickly narrow down when the baseball swing occurs.
Approach (Pipeline)
• $o_n$: observation feature vector
• $h_n$: internal hidden state
• $d_n$: candidate detection
  • $s_n$: action starts
  • $e_n$: action ends
  • $c_n$: action confidence level
• $p_n$: indicator to emit action
• $l_{n+1}$: location of next observation, $l_n \in [0, 1]$
Observation Network
• Both the location $l_n$ and video frame $v_{l_n}$ are mapped to a hidden space and then combined with a fully connected layer to produce the observation vector $o_n$
• $v_{l_n}$ is mapped using the VGG16 network and fc7 features are extracted from it
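The observation network described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the ReLU nonlinearity, the hidden and output sizes, and the random weights are all assumptions. Only the 4096-dimensional fc7 input and the "map both, then combine with a fully connected layer" structure come from the slide.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_observation_params(rng, frame_dim=4096, hidden_dim=32, out_dim=64):
    """Random weights for the sketch; hidden_dim and out_dim are hypothetical."""
    scale = 0.01
    return {
        "w_loc": rng.normal(0.0, scale, hidden_dim),
        "b_loc": np.zeros(hidden_dim),
        "W_frame": rng.normal(0.0, scale, (hidden_dim, frame_dim)),
        "b_frame": np.zeros(hidden_dim),
        "W_out": rng.normal(0.0, scale, (out_dim, 2 * hidden_dim)),
        "b_out": np.zeros(out_dim),
    }

def observation_vector(l_n, frame_features, params):
    """Combine location l_n in [0, 1] with fc7-style frame features into o_n."""
    # Map the scalar location to a hidden space.
    loc_hidden = relu(params["w_loc"] * l_n + params["b_loc"])
    # Map the frame features (e.g. VGG16 fc7 activations) to a hidden space.
    frame_hidden = relu(params["W_frame"] @ frame_features + params["b_frame"])
    # Combine both with one fully connected layer to get o_n.
    combined = np.concatenate([loc_hidden, frame_hidden])
    return relu(params["W_out"] @ combined + params["b_out"])
```

In practice `frame_features` would come from a pretrained VGG16; here any vector of length `frame_dim` works.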
Recurrent Network
• Observation features $o_n$ and previous internal hidden state $h_{n-1}$ are inputs to the recurrent network $f_h$, which is parameterized by $\theta_h$, to produce $h_n$
• Candidate detection $d_n$:
  • $d_n = f_d(h_n; \theta_d)$, $f_d$ is a fully connected layer
• Prediction indicator $p_n$:
  • $p_n = f_p(h_n; \theta_p)$, $f_p$ is a fully connected layer
  • During training, $f_p$ is used to parameterize a Bernoulli distribution from which $p_n$ is sampled. At test time the MAP estimate is used.
• Location of next observation $l_{n+1}$:
  • $l_{n+1} = f_l(h_n; \theta_l)$, $f_l$ is a fully connected layer
  • During training, $l_{n+1}$ is sampled from a Gaussian distribution with mean $f_l(h_n; \theta_l)$ and fixed variance. At test time the MAP estimate is used.
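The difference between training-time sampling and test-time MAP estimates for the three outputs can be sketched as below. The parameter names, the sigmoid squashing, the value of `sigma`, and the clipping to [0, 1] are assumptions made for illustration; the slide only states that the heads are fully connected layers, that $p_n$ is Bernoulli, and that $l_{n+1}$ is Gaussian with fixed variance.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def emit_outputs(h_n, params, rng=None, sigma=0.1):
    """Return (d_n, p_n, l_next) computed from the hidden state h_n.

    rng given   -> training mode: sample p_n and l_next.
    rng is None -> test mode: use the most likely value (MAP estimate).
    """
    # Candidate detection d_n = (s_n, e_n, c_n) from a fully connected layer.
    d_n = params["W_d"] @ h_n + params["b_d"]
    # Bernoulli parameter for the prediction indicator (sigmoid is an assumption).
    p_prob = sigmoid(params["w_p"] @ h_n + params["b_p"])
    # Gaussian mean for the next location, squashed into [0, 1] (assumption).
    l_mean = sigmoid(params["w_l"] @ h_n + params["b_l"])
    if rng is not None:
        p_n = int(rng.random() < p_prob)
        l_next = float(np.clip(rng.normal(l_mean, sigma), 0.0, 1.0))
    else:
        p_n = int(p_prob >= 0.5)   # MAP of a Bernoulli
        l_next = float(l_mean)     # MAP of a Gaussian is its mean
    return d_n, p_n, l_next
```

Passing a `numpy.random.Generator` switches the sketch to training mode; omitting it gives deterministic test-time behaviour.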
Training
• Goal is to train three outputs: candidate detection $d_n$, prediction indicator $p_n$, location of next observation $l_{n+1}$
  • This is difficult due to the challenges of designing suitable loss and reward functions and handling non-differentiable model components
• We use backpropagation to train $d_n$ and REINFORCE to train $p_n$ and $l_{n+1}$
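Training (Candidate Detection $d_n$)
• Match each candidate detection $D = \{d_n \mid n = 1, \dots, N\}$ from the recurrent network to ground truth $g_{1, \dots, M}$
• Matching function:
  • $y_{nm} = \begin{cases} 1 & \text{if } m = \arg\min_{j = 1, \dots, M} \mathrm{dist}(l_n, g_j) \\ 0 & \text{otherwise} \end{cases}$
  • $g_j = (s_j, e_j)$
  • $\mathrm{dist}(l_n, g_j) = \min(|s_j - l_n|, |e_j - l_n|)$
• Loss function:
  • $\sum_n L_{cls}(d_n) + \gamma \sum_n \sum_m \mathbb{1}[y_{nm} = 1] \, L_{loc}(d_n, g_m)$
  • $L_{cls}(d_n)$: cross-entropy loss on detection confidence $c_n$
  • $L_{loc}(d_n, g_m)$: L2 loss to further minimize distance $\|(s_n, e_n) - (s_m, e_m)\|$
• Optimize loss using backpropagation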
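The matching rule and the combined loss above can be sketched in a few lines. This is an illustrative reading of the slide, not the paper's code: the function names are hypothetical, the confidence targets `labels` are passed in by the caller because the slide does not say how they are chosen, and the squared form of the L2 term is an assumption.

```python
import numpy as np

def dist(l_n, g):
    """Distance from observation location l_n to the nearer boundary of g = (s, e)."""
    s, e = g
    return min(abs(s - l_n), abs(e - l_n))

def match(locations, ground_truth):
    """Return y with y[n, m] = 1 iff g_m is the closest ground truth to l_n."""
    y = np.zeros((len(locations), len(ground_truth)), dtype=int)
    for n, l_n in enumerate(locations):
        m = min(range(len(ground_truth)), key=lambda j: dist(l_n, ground_truth[j]))
        y[n, m] = 1
    return y

def detection_loss(detections, labels, y, ground_truth, gamma=1.0):
    """Sum of cross-entropy on c_n plus gamma-weighted localization loss.

    detections: list of (s_n, e_n, c_n) with c_n strictly between 0 and 1.
    labels: 0/1 confidence targets, one per detection (supplied by the caller).
    """
    total = 0.0
    for n, (s_n, e_n, c_n) in enumerate(detections):
        total += -(labels[n] * np.log(c_n) + (1 - labels[n]) * np.log(1.0 - c_n))
        for m, (s_m, e_m) in enumerate(ground_truth):
            if y[n, m] == 1:
                # Squared L2 distance between predicted and matched bounds.
                total += gamma * ((s_n - s_m) ** 2 + (e_n - e_m) ** 2)
    return total
```

With two ground-truth segments and observation locations near each one, `match` assigns each detection to its nearby segment, and the localization term vanishes when the predicted bounds equal the matched bounds.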
Training (Location $l_{n+1}$ and Prediction Indicator $p_n$)
• Use REINFORCE to learn observation and emission policies
• REINFORCE:
  • Objective: $J(\theta) = \sum_{a \in \mathcal{A}} p_\theta(a) \, r(a)$
  • $\mathcal{A}$: space of action sequences
  • $p_\theta(a)$: probability of action
  • $r(a)$: reward
  • Gradient: $\nabla J(\theta) = \sum_{a \in \mathcal{A}} p_\theta(a) \, \nabla \log p_\theta(a) \, r(a)$
  • This is a nontrivial optimization problem due to the high-dimensional space of possible action sequences!
  • Instead we can use Monte Carlo to take the expectation
  • Use Monte Carlo to approximate the gradient:
    • $\nabla J(\theta) \approx \frac{1}{K} \sum_{i=1}^{K} \sum_{n=1}^{N} \nabla \log \pi_\theta(a_n^i \mid h_{1:n}^i, a_{1:n-1}^i) \, R_n^i$
    • $K$: interaction sequences
    • $N$: RNN time steps
    • $\pi_\theta$: agent's policy
    • $a_n$: current action ($l_{n+1}$ or $p_n$)
    • $R_n$: cumulative reward from current time step onward
    • $h_n$: hidden state
  • Optimize by maximizing objective
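To make the Monte Carlo estimate concrete, here is a toy check on a problem small enough to solve exactly. Everything about the toy is an assumption for illustration: a single-parameter Bernoulli policy with $p = \sigma(\theta)$ that ignores the hidden state, a reward of 1 at each step where the action is 1, and the names `reinforce_gradient` and `exact_gradient`. For this policy $\nabla \log \pi_\theta(a) = a - p$, and the exact gradient of the expected total reward $N p$ is $N p (1 - p)$.

```python
import numpy as np

def reinforce_gradient(theta, K=200_000, N=5, seed=0):
    """Monte Carlo REINFORCE estimate over K sampled sequences of N steps."""
    rng = np.random.default_rng(seed)
    p = 1.0 / (1.0 + np.exp(-theta))
    # Sample K action sequences of length N from the Bernoulli policy.
    actions = (rng.random((K, N)) < p).astype(float)
    rewards = actions  # toy reward: 1 whenever the action is 1
    # R_n: cumulative reward from time step n onward (reverse cumulative sum).
    returns = np.cumsum(rewards[:, ::-1], axis=1)[:, ::-1]
    # Gradient of log pi(a_n) with respect to theta for a sigmoid Bernoulli.
    grad_log_pi = actions - p
    # Average over sequences of the per-sequence sum over time steps.
    return float(np.mean(np.sum(grad_log_pi * returns, axis=1)))

def exact_gradient(theta, N=5):
    """Closed-form gradient of the expected total reward N * p."""
    p = 1.0 / (1.0 + np.exp(-theta))
    return N * p * (1.0 - p)
```

Comparing `reinforce_gradient(0.0)` with `exact_gradient(0.0)` shows the sampled estimate approaching the closed-form value as `K` grows, which is the property the approximation on the slide relies on.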
Training (Location $l_{n+1}$ and Prediction Indicator $p_n$)
• Reward function:
  • Want high precision and recall
  • $r_N = \begin{cases} R_p & \text{if } M > 0 \text{ and } N_p = 0 \\ N_+ R_+ + N_- R_- & \text{otherwise} \end{cases}$
  • $N_p$: # predictions emitted by agent
  • $N_+, R_+$: # true positive predictions and reward
  • $N_-, R_-$: # false positive predictions and reward
  • $R_p$: penalty for not emitting a prediction when # ground truth $M > 0$
  • Prediction is correct if its overlap with ground truth is greater than a threshold and higher than any other prediction
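The reward cases above translate directly into a small function. The numeric reward values below are placeholders chosen for the example, not values from the paper, and the function name is hypothetical.

```python
def episode_reward(num_ground_truth, num_true_pos, num_false_pos,
                   r_pos=1.0, r_neg=-1.0, r_none=-1.0):
    """Reward r_N at the end of an episode.

    r_pos, r_neg, r_none stand in for R_+, R_- and R_p; their values here
    are placeholders for illustration.
    """
    num_predictions = num_true_pos + num_false_pos  # N_p
    if num_ground_truth > 0 and num_predictions == 0:
        # Penalize emitting nothing when the video contains actions.
        return r_none
    return num_true_pos * r_pos + num_false_pos * r_neg
```

For example, two true positives and one false positive give `2 * r_pos + 1 * r_neg`, while emitting nothing on a video with ground-truth actions gives `r_none`.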
Strengths/Weaknesses of Approach
• Strengths:
  • Do not need to look at all the frames
  • End-to-end learning
• Weaknesses:
  • Need all the frames in a clip (cannot do online detection)
  • Can be difficult to learn observation policy if event contains less discriminative movements
Results
• Results from THUMOS'14 comparing with top 3 performers. mAP is reported for different IOU thresholds $\alpha$
• Ablation studies show that without localization regression or the learned policy for where to observe next, results are significantly worse
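For reference, the overlap measure behind the IOU threshold $\alpha$ is the intersection-over-union of two time intervals. The helper below is a generic sketch with a hypothetical name; it is not taken from the THUMOS evaluation toolkit.

```python
def temporal_iou(pred, truth):
    """Intersection-over-union of two (start, end) intervals on the time axis."""
    (ps, pe), (ts, te) = pred, truth
    inter = max(0.0, min(pe, te) - max(ps, ts))
    union = (pe - ps) + (te - ts) - inter
    return inter / union if union > 0 else 0.0
```

A detection would count as a match at threshold $\alpha$ when `temporal_iou(pred, truth)` is at least $\alpha$.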
Results (Learned Observation Policy)
Future Direction
• Learn joint spatio-temporal observation policies
Detecting Events and Key Actors in Multi-Person Videos
Detecting Events and Key Actors in Multi-Person Videos
• Paper from Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander Gorban, Kevin Murphy and Li Fei-Fei in CVPR 2016.
• Objective:
  • Predict events and key actors in videos where multiple people are involved
• Key Contributions:
  • Introduce large-scale basketball event dataset
  • Use attention to decide most relevant people to the action being performed
  • Show that the attention model results in better event recognition
Dataset
• Introduced a large dataset with multi-person action videos. The dataset consists of 257 NCAA games, each around 1.5 hours long. 11 different basketball events are densely annotated in the videos.
Approach
• Events in a team sport are performed by a set of key players. It is sufficient to focus only on the players participating to recognize an event. For example, a “steal” event in basketball is defined by the action of the player attempting to pass the ball and the player stealing.
• The idea is to focus on key players to predict events.
Approach (Pipeline)
• Each player track is processed by a BLSTM network. The output hidden state is processed by an attention model to identify key players.
• The thickness of the boxes shows attention weights.
• Each video frame is processed by a BLSTM network.
Feature Extraction
• Each video frame $t$ is represented as a feature vector $f_t$ from the activation of the last fully connected layer of the Inception7 network.
• Each player $i$ bounding box is represented as a feature vector $p_{ti}$ from Inception7.
Event Classification
• Compute global context vector for each frame $t$:
  • $h_t^f = \mathrm{BLSTM}_{frame}(h_{t-1}^f, h_{t+1}^f, f_t)$
• Next compute hidden state of event at time $t$:
  • $h_t^e = \mathrm{LSTM}(h_{t-1}^e, h_t^f, a_t)$
  • $a_t$ is the feature vector for the players from the attention model
• Predict class label using $w_k^\top h_t^e$
• Squared hinge loss function:
  • $L = \frac{1}{2} \sum_{t=1}^{T} \sum_{k=1}^{K} \max(0, 1 - y_k w_k^\top h_t^e)^2$
  • $y_k$ is 1 if the video belongs to class $k$ and -1 otherwise
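The squared hinge loss can be written compactly over a matrix of class scores. This sketch assumes the scores $w_k^\top h_t^e$ have already been computed and stacked into a `(T, K)` array; the function name and the input layout are choices made for the example.

```python
import numpy as np

def squared_hinge_loss(scores, y):
    """Squared hinge loss summed over T time steps and K classes.

    scores: array of shape (T, K) holding w_k^T h_t^e for each step and class.
    y: array of shape (K,) with +1 for the video's class and -1 otherwise.
    """
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=float)
    margins = np.maximum(0.0, 1.0 - y[None, :] * scores)
    return 0.5 * float(np.sum(margins ** 2))
```

Scores that are on the correct side of the margin (at least 1 for the true class, at most -1 for the others) contribute nothing; everything else is penalized quadratically.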
Attention
• How do we get the feature vector $a_t$ for the players from the attention model?
Attention Models (with tracking)
• Attention model with KLT tracking for player $i$ and frame $t$:
  • $h_{ti}^p = \mathrm{BLSTM}_{track}(h_{t-1,i}^p, h_{t+1,i}^p, p_{ti})$
  • $a_t^{track} = \sum_{i=1}^{N_t} \gamma_{ti}^{track} h_{ti}^p$
  • $\gamma_{ti}^{track} = \mathrm{softmax}(\phi(h_t^f, h_{ti}^p, h_{t-1}^e); \tau)$
• $a_t$: weighted combination over players in frame $t$
• $\gamma_{ti}$: attention weights
• $N_t$: # player detections in frame $t$
• $\phi()$: multilayer perceptron
• $\tau$: softmax temperature
Attention Models (without tracking)
• Attention model without tracking:
  • $a_t^{notrack} = \sum_{i=1}^{N_t} \gamma_{ti}^{notrack} p_{ti}$
  • $\gamma_{ti}^{notrack} = \mathrm{softmax}(\phi(h_t^f, p_{ti}, h_{t-1}^e); \tau)$
• $a_t$: weighted combination over players in frame $t$
• $\gamma_{ti}$: attention weights
• $N_t$: # player detections in frame $t$
• $\phi()$: multilayer perceptron
• $\tau$: softmax temperature
• $p_{ti}$: player feature vector from Inception7
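Both attention variants reduce to the same two steps: turn per-player scores into weights with a temperature softmax, then take the weighted sum of per-player features. The sketch below assumes the scores from $\phi$ are already computed, and it assumes the temperature divides the scores before the softmax, which is the common convention but is not spelled out on the slide. The function names are hypothetical.

```python
import numpy as np

def softmax_with_temperature(scores, tau=1.0):
    """Softmax over per-player scores; smaller tau gives a sharper distribution."""
    z = np.asarray(scores, dtype=float) / tau
    z = z - z.max()  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def attend(player_features, scores, tau=1.0):
    """Return (a_t, gamma_t): attended feature vector and attention weights.

    player_features: array of shape (N_t, D), either tracked hidden states
    or raw per-player features, depending on the variant.
    scores: array of shape (N_t,), one MLP score per player.
    """
    gamma = softmax_with_temperature(scores, tau)
    a_t = gamma @ np.asarray(player_features, dtype=float)
    return a_t, gamma
```

With equal scores the result is a plain average of the player features, which matches the averaging baseline; lowering `tau` pushes the weights toward the single highest-scoring player.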
Strengths/Weaknesses of Approach
• Strengths:
  • Attention focuses on key players
• Weaknesses:
  • Need all the frames in a clip (cannot do online detection)
  • Model tends to be reluctant to switch attention between players in a scene
Results (Event Classification)
• Here we compare the ability to classify isolated video clips into 11 classes
• Attention is particularly good for shot-based events where attending to the shot-making person or defenders can be useful
Results (Event Detection)
• Here we compare the ability to temporally localize events in untrimmed videos using a 4-second sliding window through all the videos
• Here, a steal event is particularly challenging as it is often mistaken for a pass
• Combining the player features by averaging, without using attention, also performs very well
  • Possibly because the algorithm has difficulty changing attention since we are dealing with untrimmed videos
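The sliding-window setup for detection can be sketched as a simple window generator. Only the 4-second window length comes from the slide; the stride, the function name, and the choice to drop a trailing partial window are assumptions.

```python
def sliding_windows(duration_s, window_s=4.0, stride_s=1.0):
    """List (start, end) windows, in seconds, that fit inside the video.

    stride_s is a hypothetical value; the slide only gives the window length.
    """
    windows = []
    start = 0.0
    while start + window_s <= duration_s:
        windows.append((start, start + window_s))
        start += stride_s
    return windows
```

Each window would then be scored by the event classifier, and the per-window scores used to localize events in the untrimmed video.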
Results (Attention)
• Attended player is in cyan and ball is in yellow
• Results show that the model attends to the player making the shot at the beginning
Results (Attention Heatmap)
• Distribution of attention shows that attention initially focuses on the shooter and then disperses later in the event
Wrap Up
• Questions?
• Suggestions?