automatic set list identification and song segmentation of full-length concert videos @ismir2014



DESCRIPTION

Recently, plenty of full-length concert videos have become available on video-sharing websites such as YouTube. As each video generally contains multiple songs, natural questions that arise include “what is the set list?” and “when does each song begin and end?” Indeed, many full concert videos on YouTube contain song lists and timecodes contributed by uploaders and viewers. However, newly uploaded content and videos of lesser-known artists typically lack this metadata. Manually labeling such metadata would be labor-intensive, and thus an automated solution is desirable. In this paper, we define a novel research problem, automatic set list segmentation of full concert videos, which calls for techniques in music information retrieval (MIR) such as audio fingerprinting, cover song identification, musical event detection, music alignment, and structural segmentation. Moreover, we propose a greedy approach that sequentially identifies a song from a database of studio versions and simultaneously estimates its probable boundaries in the concert. We conduct preliminary evaluations on a collection of 20 full concerts and 1,152 studio tracks. Our results demonstrate the effectiveness of the proposed greedy algorithm.

TRANSCRIPT

Page 1

Ju-Chiang Wang (1, 2), Ming-Chi Yen (1), Yi-Hsuan Yang (1), and Hsin-Min Wang (1)

(1) Academia Sinica, Taiwan; (2) University of California, San Diego, USA

Automatic Set List Identification and Song Segmentation for Full-Length Concert Videos


Page 2

Outline

• Motivation

• Problem Definition & Challenges

• Methodology – call for segmentation, song identification & alignment techniques

• Evaluation – dataset, performance metric & result

• Conclusion & Future Work


Page 3

What Is a Set List?

• A list of songs that a band/artist intends to play or has played in a concert


Page 4

Motivation

• Millions of full-concert (unabridged footage) videos have become available on YouTube

About 37 Million Results Returned!


Page 5

Motivation

• Natural questions:
  • What is the set list?
  • When does each song begin and end?

• Set list and timecodes provided by uploaders or viewers


Page 6

Motivation

• Newly uploaded videos and videos of lesser-known artists lack this metadata

• Labeling is labor-intensive:
  • Needs experienced annotators
  • At least one annotator must go through the entire video

• AN AUTOMATED SOLUTION IS DESIRABLE!!

No Timecodes! Song 06 is missing!


Page 7

Problem Definition

• Two sub-tasks: set list identification and song boundary estimation

Figure: a live concert audio is processed with a studio version database to produce a sequence of songs S1, S2, S3, …, each with a song ID, a start time, and an end time.

Page 8

Challenges

• A live song can be played in many different ways
  • Handle musically motivated variations (timbre, tempo, key, & mode)

• Structural variation
  • Transitions between consecutive songs and repeated oscillations between sections of different songs

• Events or sections with no reference in the studio version database
  • Intro, outro, solo, banter, big rock ending, broken instruments, malfunctions

• Cover songs of other artists may be played

• Unstable audio quality in user-contributed concert videos


Page 9

Methodology: Three Components

• Segmentation process – studio version database construction
  • Index the database by extracting the thumbnail of each studio song

• Song identification process – audio fingerprinting (AF) [1] or cover song identification (CSID) [2] techniques
  • Select the top 5 probable song candidates based on the matching scores

• Alignment process – dynamic time warping (DTW) (referring to [3])
  • Search for the boundaries and simultaneously select the best song based on the alignment scores


[1] D. Ellis. Robust landmark-based audio fingerprinting. 2009.
[2] J. Serrà et al. Cross recurrence quantification for cover song identification. New Journal of Physics, 11(9), 2009.
[3] M. Müller. Information retrieval for music and motion. 2007.

Page 10

Studio Version Database Preparation

• For better efficiency and robustness to song structure variation, we index each studio song by its thumbnail

• Assume that each live song contains the thumbnail of its studio version, regardless of any musical variations

• Apply Segmentino [4] and an algorithm similar to [5] to extract the thumbnail
  • Ensure that the thumbnail is highly repetitive, long enough, and important


[4] M. Mauch et al. Using musical structure to enhance automatic chord transcription. In ISMIR, 2009.
[5] B. Martin et al. Indexing musical pieces using their major repetition. In JCDL, 2011.
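The "highly repetitive" criterion for a thumbnail can be illustrated with a deliberately simplified sketch. It is not Segmentino [4] or the algorithm of [5]: it assumes frame-level features (for example chroma vectors), scores every window of a hypothetical fixed length `win` by its most similar non-overlapping repetition elsewhere in the track, and ignores the "long enough" and "important" criteria. The function name `repetition_thumbnail` is hypothetical.

```python
import numpy as np


def repetition_thumbnail(features: np.ndarray, win: int) -> tuple[int, float]:
    """Hypothetical simplified thumbnailing (not Segmentino).

    Returns the start frame of the length-`win` window whose best
    non-overlapping repetition elsewhere in the track is most similar,
    together with that mean cosine similarity.
    """
    # Normalize each frame so that dot products are cosine similarities.
    F = features / np.linalg.norm(features, axis=1, keepdims=True)
    S = F @ F.T  # self-similarity matrix
    n = len(F)
    best_start, best_score = 0, -np.inf
    for s in range(n - win + 1):
        for t in range(n - win + 1):
            if abs(t - s) < win:
                continue  # only compare against non-overlapping windows
            # Mean similarity along the diagonal stripe pairing window s with window t.
            score = float(np.mean(S[np.arange(s, s + win), np.arange(t, t + win)]))
            if score > best_score:
                best_start, best_score = s, score
    return best_start, best_score
```

A real thumbnailer would also favor longer and more central segments; this sketch only shows why an exactly repeated section scores highest.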

Page 11

The Proposed Greedy Approach

Figure: a probe excerpt X of length L is taken from the full concert Z. Song identification (AF or CSID) against the studio database (thumbnails) returns the top K candidates Y1, ..., Y5 (entire tracks). In the alignment process, the candidate that best matches X with the smallest DTW cost (Y3 in the example) is selected, and its start and end points on X are estimated. Y3 is then removed from the database, and a new probe excerpt X of length L starts from the end point of Y3 on Z.
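The greedy loop in the figure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `identify` and `align` are hypothetical callables standing in for the AF/CSID matcher and the DTW boundary search, times are in seconds, and `top_k = 5` follows the three-component slide.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[str, float, float]  # (song ID, start time, end time)


def greedy_set_list(
    concert_len: float,
    probe_len: float,
    database: Sequence[str],
    identify: Callable[[float, float, List[str]], List[str]],
    align: Callable[[str, float, float], Tuple[float, float, float]],
    top_k: int = 5,
) -> List[Segment]:
    """Sequentially identify songs and their boundaries in a full concert.

    identify(start, end, remaining) -> song IDs ranked for the probe (hypothetical)
    align(song, start, end) -> (DTW cost, segment start, segment end) (hypothetical)
    """
    remaining = list(database)
    set_list: List[Segment] = []
    start = 0.0
    while start < concert_len and remaining:
        end = min(start + probe_len, concert_len)  # probe excerpt X of length L
        candidates = identify(start, end, remaining)[:top_k]
        if not candidates:
            break
        # Re-rank the top-K candidates by alignment cost; keep the cheapest.
        best_song, (_, seg_start, seg_end) = min(
            ((song, align(song, start, end)) for song in candidates),
            key=lambda item: item[1][0],
        )
        set_list.append((best_song, seg_start, seg_end))
        remaining.remove(best_song)  # each studio song is identified at most once
        if seg_end <= start:
            break  # no progress; stop rather than loop forever
        start = seg_end  # the next probe starts at the estimated end point
    return set_list
```

Removing the identified song from `remaining` mirrors the "Remove the Identified Song" step, which assumes that a song is not played twice in the same concert.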

Page 12

DTW Alignment for Boundary Estimation

Figure: DTW between a probe excerpt X (len = L) and the entire track of a studio candidate song Yk, showing the global optimal warping path (OWP), the average cost over the search area, and the frame index i with the minimum average cost.

• Goal: search for a local OWP that ends at the frame with the minimum average cost
  • Average cost = accumulated cost / OWP length
  • L = α × mean studio track length
  • Search area: [L/2 : L]

• Set that frame index as a boundary i

Page 13


DTW Alignment for Boundary & Song Selection

• Search backward from i for the frame that results in the minimum average cost; denote this cost by δk

• Set that frame index as a boundary j

• Obtain the boundary pair (i, j)

• Select the song with the smallest δk (re-ranking)

Figure: DTW between the truncated probe excerpt (len = L') and the entire track of a studio candidate song Yk, showing the global optimal warping path (OWP), the search area [L'/2 : L'], and the frame index j with the minimum average cost.
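The two-stage boundary search on the last two slides can be sketched in Python. This is a minimal sketch under assumptions that the slides do not state: frame-level feature vectors compared with Euclidean distance, a classic DTW whose path starts at the first frames of both sequences with steps (1,0), (0,1), (1,1), and an end boundary read off where the path reaches the last frame of the studio track. Frame indices are 0-based, so the search area [L/2 : L] becomes frames L//2 to L-1. All function names are hypothetical.

```python
import numpy as np


def dtw_cost_and_length(C: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Classic DTW on a cost matrix C (probe frames x studio frames).

    Assumption: the path starts at (0, 0) and uses steps (1,0), (0,1), (1,1).
    Returns the accumulated cost D and the length P (number of cells) of the
    optimal path into every cell.
    """
    n, m = C.shape
    D = np.full((n, m), np.inf)
    P = np.zeros((n, m), dtype=int)
    for a in range(n):
        for b in range(m):
            if a == 0 and b == 0:
                D[0, 0], P[0, 0] = C[0, 0], 1
                continue
            preds = [(a - 1, b), (a, b - 1), (a - 1, b - 1)]
            pa, pb = min((p for p in preds if p[0] >= 0 and p[1] >= 0), key=lambda p: D[p])
            D[a, b] = C[a, b] + D[pa, pb]
            P[a, b] = P[pa, pb] + 1
    return D, P


def best_end_frame(X: np.ndarray, Y: np.ndarray) -> tuple[int, float]:
    """Frame of X in the search area [L/2, L) where the path reaching the last
    frame of Y has the minimum average cost (accumulated cost / path length)."""
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)  # Euclidean frame costs
    D, P = dtw_cost_and_length(C)
    avg = D[:, -1] / P[:, -1]
    lo = len(X) // 2
    idx = lo + int(np.argmin(avg[lo:]))
    return idx, float(avg[idx])


def estimate_boundaries(X: np.ndarray, Y: np.ndarray) -> tuple[int, int, float]:
    """Return (j, i, delta): start boundary j, end boundary i, and delta_k."""
    i, _ = best_end_frame(X, Y)  # forward search for the end boundary i
    # Backward search: reverse the truncated probe X[:i+1] and the studio track.
    back, delta = best_end_frame(X[: i + 1][::-1], Y[::-1])
    return i - back, i, delta


def select_song(X: np.ndarray, candidates: dict[str, np.ndarray]) -> tuple[str, int, int, float]:
    """Re-rank the candidate studio tracks by delta_k and keep the smallest."""
    results = {name: estimate_boundaries(X, Y) for name, Y in candidates.items()}
    best = min(results, key=lambda name: results[name][2])
    j, i, delta = results[best]
    return best, j, i, delta
```

Reversing both sequences lets the same forward routine perform the backward search from i; normalizing by path length keeps longer and shorter alignments comparable.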

Page 14

Data Collection

• 20 popular full concert videos with set lists and timecodes

• Music genre: pop/rock

• Manually label and refine the start and end boundaries

• 10 artists

• 115.2 studio tracks per artist

• 16.2 live songs per concert


Artist ID | Artist | # Concerts | # Tracks
01 | Coldplay | 2 | 96
02 | Maroon 5 | 2 | 62
03 | Linkin Park | 4 | 88
04 | Muse | 2 | 100
05 | Green Day | 2 | 184
06 | Guns N' Roses | 2 | 75
07 | Metallica | 1 | 136
08 | Bon Jovi | 1 | 205
09 | The Cranberries | 2 | 100
10 | Placebo | 2 | 126
Total | | 20 | 1152

Page 15

Pilot Study – Song Identification Efficacy

• Goal: compare the performance of AF and CSID for live song identification

• Query: each live song, manually segmented from the concert

• Reference database: complete studio-version tracks


Method | MAP | Precision@1
AF | 0.060 | 0.048
CSID | 0.915 | 0.904
Random | 0.046 | 0.009
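The MAP and Precision@1 figures above can be computed as sketched below, assuming (as the pilot setup suggests but does not state) that each live-song query has exactly one matching studio track in the reference database. Under that assumption, average precision reduces to the reciprocal rank of the matching track. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def map_and_p_at_1(rankings: Sequence[List[str]], truths: Sequence[str]) -> Tuple[float, float]:
    """MAP and Precision@1 when each query has exactly one relevant studio track.

    With a single relevant item, average precision reduces to 1 / rank of that
    item (0 if it is not retrieved at all).
    """
    ap_sum, hits = 0.0, 0
    for ranked, truth in zip(rankings, truths):
        if truth in ranked:
            rank = ranked.index(truth) + 1  # 1-based rank of the matching track
            ap_sum += 1.0 / rank
            hits += rank == 1
    n = len(truths)
    return ap_sum / n, hits / n
```

For three queries whose matching tracks appear at ranks 1, 2, and 4, MAP is (1 + 0.5 + 0.25) / 3 ≈ 0.583 and Precision@1 is 1/3.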

Page 16

Evaluation

• Input: the audio of a full concert and the set of reference studio tracks

• Output: a sequence of song indices with timecodes

• Evaluation metrics:
  • Edit distance (ED): the dissimilarity between the output song sequence and the ground truth
  • Boundary deviation (BD): the start/end time difference, denoted by sBD and eBD
  • Frame accuracy (FA): the percentage of correctly labeled frames (200 ms, non-overlapping); the larger the better

Figure: an example output song sequence (10) compared with the ground truth sequence (8); ED = 8.
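ED and FA can be sketched as follows. Two assumptions are not confirmed by the slide: ED is taken to be the standard Levenshtein distance over song IDs (unit-cost insertion, deletion, and substitution), and for FA each 200 ms frame is labeled by the song covering its center time, with frames outside every song labeled `None` and counted as correct when both labelings agree. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[str, float, float]  # (song ID, start time in s, end time in s)


def edit_distance(output: Sequence[str], truth: Sequence[str]) -> int:
    """Levenshtein distance between two song-ID sequences (unit-cost edits)."""
    prev = list(range(len(truth) + 1))
    for a in range(1, len(output) + 1):
        cur = [a] + [0] * len(truth)
        for b in range(1, len(truth) + 1):
            cur[b] = min(
                prev[b] + 1,  # delete output[a-1]
                cur[b - 1] + 1,  # insert truth[b-1]
                prev[b - 1] + (output[a - 1] != truth[b - 1]),  # substitute or match
            )
        prev = cur
    return prev[-1]


def frame_labels(segments: Sequence[Segment], duration: float, hop: float = 0.2) -> List[Optional[str]]:
    """Label each non-overlapping frame by the song covering the frame's center time."""
    n = int(round(duration / hop))
    labels: List[Optional[str]] = [None] * n
    for song, start, end in segments:
        for k in range(n):
            if start <= (k + 0.5) * hop < end:
                labels[k] = song
    return labels


def frame_accuracy(output: Sequence[Segment], truth: Sequence[Segment],
                   duration: float, hop: float = 0.2) -> float:
    """Fraction of frames whose output label equals the ground-truth label."""
    out = frame_labels(output, duration, hop)
    ref = frame_labels(truth, duration, hop)
    return sum(o == r for o, r in zip(out, ref)) / len(ref)
```

For example, if the ground truth is song A on [0, 1) s and song B on [1, 2) s while the output places the A/B boundary at 1.2 s, one of the ten frames is mislabeled and FA is 0.9.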

Page 17

Quantitative Result for Each Concert (α = 1.5)


Concert ID | Artist | # Songs GT | # Songs OP | ED | sBD | eBD | FA
1 | Metallica | 20 | 15 | 17 | 6.5 | 89.1 | 0.317
2 | Linkin Park | 17 | 17 | 4 | 3.3 | 12.3 | 0.786
3 | Coldplay | 15 | 15 | 3 | 27.2 | 33.2 | 0.744
4 | Bon Jovi | 23 | 25 | 14 | 8.8 | 66.8 | 0.441
5 | Placebo | 19 | 18 | 5 | 11.5 | 27.8 | 0.641
6 | Guns N' Roses | 10 | 11 | 1 | 19.1 | 22.8 | 0.875
7 | Maroon 5 | 10 | 10 | 6 | 28.2 | 39.1 | 0.428
8 | Linkin Park | 22 | 22 | 9 | 28.2 | 39.6 | 0.610
9 | Guns N' Roses | 20 | 21 | 7 | 30.7 | 35.9 | 0.653
10 | The Cranberries | 17 | 15 | 4 | 5.3 | 9.8 | 0.758
11 | The Cranberries | 22 | 21 | 3 | 6.0 | 8.7 | 0.860
12 | Muse | 17 | 19 | 7 | 32 | 21.9 | 0.681
13 | Maroon 5 | 9 | 12 | 5 | 110 | 155 | 0.509
14 | Coldplay | 17 | 17 | 2 | 20.1 | 18.4 | 0.777
15 | Maroon 5 | 11 | 11 | 7 | 50.9 | 72.9 | 0.393
16 | Linkin Park | 17 | 20 | 9 | 36.9 | 24.7 | 0.544
17 | Muse | 13 | 11 | 4 | 48.1 | 94.3 | 0.626
18 | Linkin Park | 23 | 22 | 10 | 10.0 | 34.8 | 0.636
19 | Green Day | 7 | 7 | 3 | 42.4 | 13.6 | 0.584
20 | Green Day | 15 | 13 | 9 | 42.4 | 36.6 | 0.465

(GT = ground truth, OP = system output)

Page 18

Qualitative Result for Three Concerts


Guns N' Roses - Live at the Ritz - 1988 - Full concert

Linkin Park - Road To Revolution [Full Concert] HD

Metallica - Rock am Ring 2012 (Full Concert)

Page 19

Performance Comparison


Method | # Songs GT | # Songs OP | ED | sBD | eBD | FA
Baseline | 16.2 | 19.7 | 8.9 | 229 | 241 | 0.434
α = 1.2 | 16.2 | 18 | 7.3 | 25.7 | 57.3 | 0.582
α = 1.5 | 16.2 | 16.1 | 6.5 | 23.4 | 42.9 | 0.616
α = 1.8 | 16.2 | 14.6 | 8.4 | 29.3 | 45.3 | 0.526

• Baseline method: without DTW alignment, a song starts at the beginning of the probe excerpt and ends after the length of the identified song Y*; the baseline then hops 0.1 × the length of Y* for the next probe

• The full system with probe length L = 1.5μ (α = 1.5) performs best, where μ is the mean length of the studio tracks

Page 20

Conclusion & Future Work

• Propose a novel MIR research problem
  • A new opportunity for MIR researchers to link music/audio technology to real-world applications

• Develop a new dataset and a novel greedy approach

• Expand the size of the dataset and conduct more in-depth signal-level analysis of the dataset

• Propose this task to MIREX to call for more advanced approaches (due to copyright issues with the studio tracks)


Thank You! Any Questions?