

Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment

Li Ding (MIT), Chenliang Xu (University of Rochester)
liding@mit.edu, chenliang.xu@rochester.edu

[Figure: system overview. A training video with the transcript "take bowl, pour cereals, pour milk, stir cereals". Labeled steps: initialize target, training, inference, assignment, new target, iterate until the stop criteria are met. The result is shown against the ground truth.]

[Figure: ISBA. Plot of probability against frame (t) for the transcript "squeeze orange, pour juice" with SIL segments. Labeled steps: initialize target, inference, new transcript. Boundaries are annotated Δ > ρ or Δ < ρ.]

[Figure: TCFPN. Encoder: video feature → 1D Conv + BN + ReLU + Pool → feature → 1D Conv + BN + ReLU + Pool → feature. Decoder: 1x1 Conv, Upsample, +, and 1D Conv blocks producing features, followed by a (frame-wise) FC + Softmax → prediction.]
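The block labels in the TCFPN figure suggest an encoder that pools over time and a decoder that upsamples and merges encoder features through 1x1 convolutions. The NumPy sketch below shows one plausible wiring of those blocks, not the authors' implementation: the exact connections are not recoverable from the figure text, batch normalization is omitted, and all function and parameter names (`tcfpn_forward`, `enc1`, `lat0`, and so on) are made up for illustration.

```python
import numpy as np

def conv1d(x, w):
    """Same-padded temporal convolution. x: (T, C_in), w: (K, C_in, C_out), K odd."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return sum(xp[i:i + x.shape[0]] @ w[i] for i in range(k))

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2(x):
    """Halve the temporal length by taking the max over frame pairs (T must be even)."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1]).max(axis=1)

def upsample2(x):
    """Double the temporal length by repeating every frame."""
    return np.repeat(x, 2, axis=0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def tcfpn_forward(x, p):
    """x: (T, C_in) with T divisible by 4; p: dict of weight arrays (hypothetical names)."""
    # Encoder: 1D Conv + ReLU + Pool, twice (batch norm left out of this sketch).
    e1 = max_pool2(relu(conv1d(x, p["enc1"])))            # (T/2, H)
    e2 = max_pool2(relu(conv1d(e1, p["enc2"])))           # (T/4, H)
    # Decoder: 1x1 lateral convs, upsampling, addition, then a 1D conv per level.
    d2 = conv1d(e2 @ p["lat2"], p["dec2"])                # (T/4, H)
    d1 = conv1d(upsample2(d2) + e1 @ p["lat1"], p["dec1"])  # (T/2, H)
    d0 = conv1d(upsample2(d1) + x @ p["lat0"], p["dec0"])   # (T, H)
    # Frame-wise fully connected layer + softmax.
    return softmax(d0 @ p["fc"])                          # (T, num_classes)

# Example with random weights: 8 frames, 6 input channels, 4 hidden, 3 classes.
rng = np.random.default_rng(0)
params = {
    "enc1": rng.normal(size=(3, 6, 4)), "enc2": rng.normal(size=(3, 4, 4)),
    "lat0": rng.normal(size=(6, 4)), "lat1": rng.normal(size=(4, 4)),
    "lat2": rng.normal(size=(4, 4)),
    "dec0": rng.normal(size=(3, 4, 4)), "dec1": rng.normal(size=(3, 4, 4)),
    "dec2": rng.normal(size=(3, 4, 4)),
    "fc": rng.normal(size=(4, 3)),
}
probs = tcfpn_forward(rng.normal(size=(8, 6)), params)
```

Each row of `probs` is a per-frame distribution over action classes, which is the form of output the iterative training stage would consume.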

Overview of the Weakly-Supervised Action Segmentation System

Iterative Soft Boundary Assignment (ISBA)

Temporal Convolutional Feature Pyramid Network (TCFPN)

Experiment

Conclusion

• We iteratively train a temporal segmentation network with targets generated from the action transcript, and refine the action transcript based on the inference of the current network.
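The loop described above can be sketched as a small driver function. This is only an outline under stated assumptions: the five callables (`init_target`, `train`, `infer`, `assign`, `should_stop`) are hypothetical placeholders for the steps named on the poster, and their signatures are invented here for illustration.

```python
def iterate_training(features, transcript, init_target, train, infer,
                     assign, should_stop, max_iters=10):
    """Alternate between training on the current target and refining it.

    All callables are hypothetical stand-ins:
      init_target(transcript, num_frames) -> target
      train(features, target)             -> model
      infer(model, features)              -> per-frame probabilities
      assign(transcript, probs)           -> (new transcript, new target)
      should_stop(iteration, probs, target) -> bool
    """
    target = init_target(transcript, len(features))
    model = None
    iteration = 0
    for iteration in range(1, max_iters + 1):
        model = train(features, target)                  # training
        probs = infer(model, features)                   # inference
        transcript, target = assign(transcript, probs)   # assignment -> new target
        if should_stop(iteration, probs, target):        # stop criteria
            break
    return model, transcript, target, iteration
```

Passing the steps in as functions keeps the sketch independent of any particular network or assignment rule; `max_iters` is an extra safety cap that the poster does not mention.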

Code

About

• We propose ISBA as a novel training strategy for weakly-supervised sequence learning, and TCFPN as an advanced temporal convolutional network for supervised action recognition.

• The whole system, TCFPN+ISBA, outperforms the state of the art on both action segmentation and alignment tasks.

• Our training strategy for weakly-supervised sequence learning is general and can be extended to other tasks, such as speech recognition and video segmentation.

• ISBA uses a simple yet effective algorithm that tries to generate a reasonable probabilistic distribution over the different actions through a number of iterations.
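To make the idea of a probabilistic target concrete, the sketch below turns an ordered transcript into per-frame soft labels. It is an illustration under assumptions, not the ISBA procedure itself: it splits the video uniformly among the transcript entries and softens the boundaries with a moving average, and both choices, along with the name `soft_targets` and the `width` parameter, are assumptions made here.

```python
import numpy as np

def soft_targets(transcript, num_frames, num_classes, width=5):
    """Per-frame soft labels from an ordered list of class ids.

    Assumptions of this sketch: segments start with equal length, and
    boundaries are softened by a moving average of odd length `width`.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    # Uniform split of the frames among the transcript entries.
    edges = np.linspace(0, num_frames, len(transcript) + 1).round().astype(int)
    hard = np.zeros((num_frames, num_classes))
    for label, start, end in zip(transcript, edges[:-1], edges[1:]):
        hard[start:end, label] = 1.0
    # Moving average over time; edge padding keeps every row summing to 1.
    pad = width // 2
    padded = np.pad(hard, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(width) / width
    return np.stack(
        [np.convolve(padded[:, c], kernel, mode="valid") for c in range(num_classes)],
        axis=1,
    )

# Three actions over 12 frames, smoothed across 3 frames at each boundary.
targets = soft_targets([0, 1, 2], num_frames=12, num_classes=3, width=3)
```

Frames far from a boundary keep a one-hot label, while frames next to a boundary share probability between the two neighboring actions.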