multimedia databases - tu braunschweig · 2010-01-08 · • e.g., a temporal model for fades –...

90
Multimedia Databases Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

Upload: others

Post on 06-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

Multimedia DatabasesMultimedia Databases

Wolf-Tilo BalkeSilviu HomoceanuInstitut für InformationssystemeTechnische Universität Braunschweighttp://www.ifis.cs.tu-bs.de

Page 2: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

10 Video Retrieval - Shot Detection

10.1 Video Abstraction

10.2 Shot Detection

10.3 Statistical Structure Models

10 Video Retrieval – Shot Detection

10.3 Statistical Structure Models

10.4 Temporal Models

10.5 Shot Activity

Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2

Page 3: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Temporal and spatial structuring of the content of a video

• Important for questions related to temporal issues: “Find clips in which an object falls down!“

10.1 Video Abstraction

issues: “Find clips in which an object falls down!“

• Basically, two sub-domains

– Video modeling and representation

– Video segmentation and summarization

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 3

Page 4: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Video modeling

– General structure of a video

10.1 Video Abstraction

Story Unit Story Unit Story Unit

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 4

Structural

Unit

Structural

Unit

Structural

Unit

Structural

Unit

Structural

Unit

Shot Shot Shot Shot ShotShot

Frames

Key Frame

Page 5: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• News broadcast

– Story unit:

• War in Iraq

– Structural units:

10.1 Example

• Introduction: “The fighting around the city ...”

• Transmission: various scenes of war

• Summary: “The reaction of the federal parliament ...”

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 5

Page 6: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

– Shots

• Anchorman in a studio

• Pan across a desert landscape

• Bombing of a city

• Refugees

10.1 Example

• Refugees

• Anchorman in a studio

• Speech in the parliament

– Typical frames for all shots

• Usually represented by some key frame

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 6

Page 7: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

10.1 Example

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 7

Page 8: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• But how can shots be detected?

• With the introduction of MPEG-7shot detection is ready-made

– Metadata standard

10.2 Shot Detection

– Metadata standard

– The correct decomposition is already stored in the metadata

• Camera information is easy to extract

– But semantic annotation is unfortunately very expensive

– Archive material still needs a lot of manual work

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 8

Page 9: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• A clip consists of many scenes

• Images belonging to a scene are relatively similar to each other

– Example: anchorman in the newsroom, desert

10.2 Shot Detection

– Example: anchorman in the newsroom, desert landscape

• For this reason, we do not have to index each individual frame to perform efficient video retrieval, but index only key frames

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 9

Page 10: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Problems in finding key frames

– Detecting a scene transitionwith hard or soft transitions

• A hard transition is called a “cut”

• A soft transition “dissolve” (blending) or “fade in/out”

10.2 Shot Detection

• A soft transition “dissolve” (blending) or “fade in/out”

– Selecting a representative image, either by random selection, or with regard to the camera movement or an image with average characteristic values, ...

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 10

Page 11: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• For grouping of frames into shots each transition has to be recognized

– With uncompressed videos

• Information from each image is optimally used but the procedure is relatively inefficient

10.2 Shot Detection

procedure is relatively inefficient

– Or compressed videos

• E.g., only data about the change is available

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 11

Page 12: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Shot detection in uncompressed videos

– Template matching (Zhang and others, 1993)

• Pixel wise comparison: For each pixel (x, y) in the image, the value of the color of the pixel in this frame is compared with the color value in a later frame

10.2 Shot Detection

compared with the color value in a later frame

• If the change between two frames is large enough (larger than a predefined threshold), a cut is assumed

• This only works for hard transitions

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 12

Page 13: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

DDDDcutcutcutcut = = = = ΣΣΣΣx, yx, yx, yx, y |I(x, y, t) |I(x, y, t) |I(x, y, t) |I(x, y, t) ---- I(x, y, t + 1)|I(x, y, t + 1)|I(x, y, t + 1)|I(x, y, t + 1)|

– It is impossible to distinguish small changes in a wide area of major changes in a small area

– Susceptible to noise, object

10.2 Template Matching

– Susceptible to noise, object movements and changes in camera angle

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 13

Page 14: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Histogram-based methods (Tonomura, 1991)

– Assumption: frames containing identical foreground and background elements have a similar brightness distribution

– Classification based on the brightness values

10.2 Histograms

– Classification based on the brightness values

– Histogram columns as the number of image pixels with a specified value

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 14

Page 15: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

– Let H(j, t) be the histogram value for the jth

brightness value in frame t

DDDDcutcutcutcut = = = = ΣΣΣΣjjjj |H( j , t ) |H( j , t ) |H( j , t ) |H( j , t ) –––– H( j , t + 1)|H( j , t + 1)|H( j , t + 1)|H( j , t + 1)|

10.2 Histograms

DDDDcutcutcutcut = = = = ΣΣΣΣjjjj |H( j , t ) |H( j , t ) |H( j , t ) |H( j , t ) –––– H( j , t + 1)|H( j , t + 1)|H( j , t + 1)|H( j , t + 1)|

– Once again using a predefined threshold we can decide whether there is a cut or not

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 15

Page 16: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Histograms are invariant towards image rotation and change only slightly under

– Object translation

– Occlusions caused by moving objects

10.2 Histograms

– Slow camera movements

– Zooming

• Significantly less error sensitive than template matching

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 16

Page 17: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Good choice of thresholds is important

– Too low thresholds produce false cuts

– Too high thresholds leads to missed cuts

• Selection depends on the type of videos (training)

10.2 Threshold

• Selection depends on the type of videos (training)

• Choose the threshold such that as few cuts as possible are overlooked, but not too many false cuts are produced

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 17

Page 18: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Selection, e.g., using distribution functions

10.2 Threshold

number

– Differences within the sequences

– Differences between sequences

– Selection by minimal error rate

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 18

difference

Page 19: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• For smooth transitions (dissolves, fades, ...) there are only small changes between consecutive transitions

– Still, the differences between the middle frames ofdifferent shots, are large enough

10.2 Twin-Thresholding

different shots, are large enough

• Idea: use two thresholds

– One for the determination of hard cuts

– And one for the soft cuts

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 19

Page 20: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Twin comparisons (Zhang and others, 1993)

– Threshold tc corresponds to the size of an intolerable change in the pixel intensities

– Using a threshold ts we can detect possible originsof smooth transitions

t

10.2 Twin-Thresholding

ts

of smooth transitions

– If a possible smooth transition is detected at time t, the frame is marked at this time as a reference frame

• The next frames are compared against this reference frame

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 20

Page 21: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

– All differences of subsequent frames in the interval [t + 1, t + n] are not computed regarding the direct predecessor, but the reference frame t (for some fixed n)

– Only if the difference rises above the threshold tc,

10.2 Twin-Thresholding

n

– Only if the difference rises above the threshold tc,

there is a smooth cut, otherwise differences are simply re-formed between consecutive frames

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 21

Page 22: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Example:

10.2 Twin-Thresholding

possible soft cutdifference

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 22

time

hard cut no soft cut soft cut

Page 23: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Block-based techniques try to avoid the problem of noise and different camera settings (Idris and Panchanathan, 1996)

– Each frame is divided into r blocks

10.2 Block based techniques

– Local characteristics are calculated for each block

– Corresponding sub-frames are compared

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 23

Page 24: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Advantages

– We can detect and ignore effects occurring in only part of the picture through block-wise comparison

• E.g., movement of the anchorman’s head

– If a high number of the r blocks are the same in a

10.2 Block based techniques

– If a high number of the r blocks are the same in a sequence of two consecutive frames, this is an indication of the frames belonging to the same shot

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 24

Page 25: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• There are only a small amount of possible transitions between two shots

– Idea: model the transitions as mathematical operations

– Characteristic temporal patterns in video streams can

10.2 Model based procedure

– Characteristic temporal patterns in video streams can be detected

– Advantage: this doesn’t only recognize transitions, but also their type

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 25

Page 26: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• E.g., a temporal model for fades

– When fading out the pictures of the first shot become darker. The brightness histogram is compressed in the x direction

– Then there are some (almost) black frames

10.2 Model based procedure

– Then there are some (almost) black frames

– When fading in, the images of the second shot become brighter. The histogram is stretched in the x direction

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 26

Page 27: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

– This behavior can be interpreted as the application of mathematical operations on the histogramand observed on a stream of frames

– Defining the start and end of the fade out/in process delivers the shot boundaries

10.2 Model based procedure

delivers the shot boundaries

• Similar models can be set for other transitions (e.g., dissolve)

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 27

Page 28: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

10.2 E.g., Fade Out, Fade In

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 28

Page 29: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Shot detection in compressed videos

– Compressed storage is needed due to the size of video data

– Pixel-based methods for shot detection use uncompressed videos and are therefore usually very

10.2 SD in Compressed Videos

uncompressed videos and are therefore usually very computationally intensive

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 29

Page 30: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Shot detection is possible also on the compressed data however trading between efficiency and accuracy

• Approaches:

10.2 SD in Compressed Videos

• Approaches:

– MPEG compression information

• Cosine transformation coefficients

• I P B frame structure

– Motion vectors information

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 30

Page 31: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Video compression is based often on discrete cosine transform (DCT)

– E.g., MPEG, H.264, MotionJPEG, ...

– The DCT coefficients have correspondents in the real space of the input signal

10.2 Cosine Transformation

264

space of the input signal

– The oscillation of the coefficients can be used for shot detection

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 31

Page 32: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• In compression techniques, the image is dividedinto blocks (e.g., 8x8 pixels in JPEG). Each block is separately transformed using DCT

– The first coefficient (DC) of the DCT is the average intensity of the block

10.2 Cosine Transformation

the average intensity of the block

– A DC-frame is created by using only the DCs of all the blocks and ignore all the higher coefficients

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 32

Page 33: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

– A sequence of DC frames is called DC sequence.DC sequences abstract video clips without having to decode them

– If features are extracted from DC frames we can form traces of sequences, such as the generalized traces

10.2 Cosine Transformation

traces of sequences, such as the generalized traces (Taskiran and Delp, 1998)

– From these traces it is possible to calculate the probability of cuts for each frame

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 33

Page 34: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Compression based onthe encoding of

changes between frames– I-frames are independently coded (I: independent)

– P-frames are encoded with change information from

10.2 MPEG Compression

– P-frames are encoded with change information from preceding I or P-frames (P: predicted)

– B-frames are interpolations between two P or I and P frame (bi-directional)

– B-frames can thus be calculated both from the preceding, and from the subsequent frame (depending on the encoder)

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 34

Page 35: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• A shot is thus a chain of I-, P-and B-frames:

– IBPBPBIBPBP ...

• The video stream is rearranged for transmission:

10.2 MPEG Compression

transmission:

– IPBPBPBIPBP ...

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 35

Page 36: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• I-frames are independently encoded

– Direct access to the DC component to measure differences between two consecutive I-frames

– Recognition method with DC-frames are directly applicable

10.2 Shot Boundaries in MPEG

applicable

– Accuracy: between two I-frames there usually are about 15 B-and P-frames

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 36

Page 37: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• (Block) motion vectors can be extracted directly from an MPEG bitstream

• They tend to change continuously within a scene

10.2 Motion Vectors

scene

– The number of motion vectors, in consecutiveframes belonging to the same shot is similar

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 37

Page 38: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Example of shot detection (Zhang et al., 1993)

– Determine the number of motion vectors in the P-frames

– For B-frames count the number of forward and backward movements

10.2 Motion Vectors

backward movements

– Let M be the smaller of two numbers

– If M is smaller than a specified threshold, then it probably represents a shot boundary

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 38

Page 39: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Procedures for the use of DCTcoefficients and motion vectorscan be combined

– Increase the recognition accuracy

10.2 Hybrid Approaches

accuracy

– Utilization of various frame types in MPEG

– E.g., Meng and others, 1995

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 39

Page 40: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Shot detection at work with MSU Video tool.Shot detection algorithms:

– Pixelwise comparison

– Global histogram

10.2 Shot Detection

– Block based histogram

– Motion based detection

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 40

Page 41: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• E.g., shot detection on Avatar movie trailer

– Pixelwise (1) and motion based (4) produced 16 cuts

10.2 Shot Detection

– Global Histogram (2) produced 18 cuts

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 41

Page 42: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Idea: decomposition of a video in semantic units (shots)

– Previously: low level primitives (brightness, color information, movements, ...)

– Now: perceptional features (e.g., visual structure of

10.3 Statistical Structural Models

– Now: perceptional features (e.g., visual structure of the whole video)

• Film theory: stylistic elements

– Montage: temporal structure, editing, ...

– Mis-en-scene: spatial structure, scenery,lighting, camera position, ...

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 42

Page 43: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Goal: build models of stylistic elements

– Allows the extraction semantic features for the characterization and classification

– Provides background informationfor the use of low level features to

10.3 Statistical Structural Models

for the use of low level features to shot boundary detection

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 43

Page 44: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Trailer for movie arranged according to average shot length (montage) and activity during

10.3 Example

and activity during shots (Mis-en-scene)

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 44

Page 45: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

– Shot duration and shot activity are very rough categories, but have equivalents in movie directing

– Basic trend: the shorter the shot, the higher the action (and vice versa)

– If we widely divide the movies into categories action

10.3 Example

– If we widely divide the movies into categories action film, comedy and love movies, then we can clusteraccording to these categories

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 45

Page 46: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

10.3 Example

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 46

Page 47: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Clusters can be explained through film theory

– If emotions have to be transferred then long passages of text and detailed facial expressions (a long close-up) are required

– The development of a character and his connection

10.3 Example

– The development of a character and his connection with the audience takes time

– Charles Chaplin: “Tragedy is a close-up,comedy a long shot.”

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 47

Page 48: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

– For action or suspense, rhythmic patterns are used (e.g., “Psycho” or “Birds” by Hitchcock)

– Fast cuts require a continuous adaptationof the viewer and create confusion

– Long dialogues are unnecessary,

10.3 Example

– Long dialogues are unnecessary,people express themselves through acts

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 48

Page 49: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Semantic structure assists in categorizing

– Either based on film theory

– Or learned from a sample collection

• From high-level structure patterns emerge

10.3 Video Structure

• From high-level structure patterns emerge “more” semantics than from low level features

– Statistical inference

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 49

Page 50: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• The more a video is structured, the more semantic information can be derived from it

– News programs are highly structured and relatively easy to fragment

– Home made videos are mostly unstructured and

10.3 Assumption

– Home made videos are mostly unstructured and almost impossible to fragment

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 50

Page 51: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• The classical element of the movie direction is the shot duration

• Classic elements of the mis-en-scene are more difficult to capture

10.3 Classical Elements

difficult to capture

– Activity in scenes is important

• Not only between actors (explosions, ...)

• Often correlates to violence

– But also mood (e.g., brightness, colors)

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 51

Page 52: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Temporal video structure: shot boundaries can be modeled as a series of events occurring in succession

– Queuing theory: arrivals of persons

10.4 Temporal Models

– Modeling through a Poisson process

• Number of events in a fixed time interval follows a Poisson distribution

• Temporal distance between two successive events is exponentiallydistributed

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 52

Page 53: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Problem 1111: exponential distribution leads to many short, but very few long shots

• Problem 2222: exponential distribution has no memory, i.e., the probability that within the next t>0

t

10.4 Temporal Video Structure

2222

memory, i.e., the probability that within the next t>0 time units a shot change will happen, is independent of t

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 53

Page 54: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Alternative models: shot durations are not exponentially distributed, but follow distributions like

– Erlang distribution

– Weibull distribution

• Objective: estimate the model parameters from a

10.4 Temporal Video Structure

• Objective: estimate the model parameters from a training collection, were the shot boundary is manually determined

– Maximum likelihood estimate

– This knowledge can then assist in the detection of shot boundary of unknown videos

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 54

Page 55: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Consider shot durations are Erlang distributed

– The length τ of a (fixed) shot has probability density

– Generalization of the exponential distribution (r = 1)

r/λ

10.4 Erlang Model

– Generalization of the exponential distribution (r = 1)

– Expected value (average shot duration): r/λ

– The sum of r independent random variables exponentially distributed with parameter λ is(r, λ)-Erlang distributed

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 55

Page 56: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

10.4 Erlang Model

r = 1, λ = ½

r = 2, λ = ½

r = 3, λ = ½

r = 5, λ = 1

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 56

r = 5, λ = 1

r = 9, λ = 2

Page 57: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• The sum of r independent random variables exponentially distributed with parameter λ is (r, λ)-Erlang distributed

– It represents a Poisson process since only exactly each r-th event is counted

r = 2

10.4 Erlang Model

(r, λ)

r-th event is counted

– r = 2: structure of the context of the whole image, followed by a zoom on the essential details

– r = 3: emotional development, followed by an action, followed by the result of this action

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 57

Page 58: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Likelihood function for a single Erlang-distributed random variable:

10.4 Erlang Model

• Corresponding log-likelihood function:

• Choose the optimal parameters r and λ for a sample of N independent and identically Erlang distributed random variables:

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 58

Page 59: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Optimization problem over a discrete variable (r) and a continuous variable (λ)

• Film theory: r is small

10.4 Erlang Model

r λ

• Film theory: r is small

• Brute-force solution:

– Test all r = 1, ..., 10 and compute the optimal λ

– Choose the pair (r, λ) that maximizes the above expression

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 59

Page 60: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• If r is known then the determination is simplified

• Derivative with respect to λ and zero values

10.4 Erlang Model

• Derivative with respect to λ and zero values returns:

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 60

Page 61: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Estimation of the parameters r and λ from a training collection:

10.4 Erlang Model

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 61

Page 62: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Erlang distribution solves the first problem (distribution of shot durations)

• Problem 2, however, remains

– The Erlang distribution itself has memory but the

10.4 Erlang Model

2

– The Erlang distribution itself has memory but the exponentially distributed random variables underlying each shot have no memory

– Solution: Weibull distribution (a generalization of the exponential distribution)

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 62

Page 63: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• To assess the activity within one shot, we can again rely on low level features

– One possibility: the difference of color histograms of two consecutive frames

10.5 SD through Shot Activity

– Goal: determine a statistical model for the activity within one shots with the help of histograms

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 63

Page 64: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Film theory: continuity in editing

– In order not to confuse the audience, theframes separated through cuts shoulddiffer clearly

• Segment the video into regular frames (state S = 0) and S = 1

10.5 Shot Activity

• Segment the video into regular frames (state S = 0) and shot boundary (S = 1)

• Attempts to classify each frame either as regular frame or shot-boundary

• Additionally use low level features such as color histograms

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 64

Page 65: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Experience:

– Training data for shot activity can not be approximated good enough by means of “standard deviation”

• Therefore use several different distribution components 2000

10.5 Shot Activity

• Therefore use several different distribution components (Vasconcelos and Lippman, 2000)

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 65

Page 66: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Activity within shots (S = 0)

10.5 Shot Activity

Mixture of four random variables:

three Erlang distributed

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 66

three Erlang distributed

one uniform distributed

Distance

Page 67: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Activity in shot transitions (S = 1)

10.5 Shot Activity

Mixture of two random variables:

a normal, and a uniform distribution

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 67

Distance

Page 68: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Application of statistics:

– Given: two frames, there are two hypotheses: H0: there is no cut in between (S = 0) H1: there is a cut in between (S = 1)

– Likelihood ratio test: choose H1 if

10.5 Shot Boundary Detection

H S = 0

H1 S = 1

– Likelihood ratio test: choose H1 if

(or equivalently: )

and H0 otherwise (D is the measured distance between the two frames)

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 68

>

Page 69: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• The likelihood ratio test uses no knowledge about “typical” shot duration

• However, we know the a-priori distribution of the shot duration (or we can at least estimate it)

10.5 Shot Boundary Detection

shot duration (or we can at least estimate it)

• Therefore, we now use Bayesian statistics to test the two hypotheses

• We obtain in this way a generalization of the basic thresholding method for histogram differences

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 69

Page 70: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Notation:

– δ: duration of each frame (constant, determined by frame rate)

– St , t + δ: indicates whether there is a shot boundary between frame t and his immediate successors (or

10.5 Shot Boundary Detection

St , t + δ

between frame t and his immediate successors (or not)

– Dt , t + δ : distance between frame t and his immediate successors

– St: vector with components S0, δ, Sδ,2δ, ..., St, t + δ

– Dt: vector with components D0, δ, Dδ,2δ, ..., Dt, t + δ

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 70

Page 71: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Hypothesis H1 (there is a shot change)is valid, if

• Equivalent formulation:

10.5 Shot Boundary Detection

>

log > 0

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 71

log > 0

Page 72: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• If there was a cut at time t, and none in the interval [t, t + τ], then the probability for a cut in the interval [t + τ , t + τ + δ ] according to Bayes, is:

10.5 Shot Boundary Detection

• γ is a normalization constant

• On the other hand, the probability that there is no cut, is:

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 72

Page 73: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Thus:

10.5 Shot Boundary Detection

• Supposition: Dt, t + δ is conditionally independent (with St, t + δ) from all other D and S

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 73

Page 74: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

10.5 Shot Boundary Detection

Behavior of conditional

probabilities for activity

Behavior of the probabilities for

cuts

• So hypothesis HHHH1111 is valid if the logarithm of the above expression is positive

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 74

probabilities for activity

(is estimated from the

training collection,

shot activity)

cuts

(estimated from the training

collection,

distribution of shot duration)

Page 75: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Intuitive interpretation

– The left side uses information about the “normal”

10.5 Hypothesis Verification

– The left side uses information about the “normal” frame distances within shots and shot transitions

– The right part uses knowledge regarding the ”normal” distribution of the shot duration (a priori probability)

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 75

Page 76: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Define with t as the time of the last cut

• Let be the distribution density of the elapsed time from t until the first cut after t

10.5 Hypothesis Verification

time from t until the first cut after t

• The log posterior odds ratio is then:

(same as , just different notation)

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 76

Page 77: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• According to our initial Bayesian approach, we can decide whether there is a shot transition at point

or not, by using the following threshold based estimation

10.5 Hypothesis Verification

– If the last cut took place at time t, and we now observe , then and only then there is a new cut, if applicable:

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 77

:

Page 78: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

– This means: with the introduction of a priori probability, the verification of our hypotheses doesn’t depend anymore from a fixed threshold

10.5 Hypothesis Verification

:

doesn’t depend anymore from a fixed threshold

– The threshold changes dynamically with the time elapsed since the last cut

– The density can be assumed to be an Erlang or Weibull distribution density

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 78

Page 79: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Density function of the Erlang distribution:

• For the Erlang model, the following threshold

10.5 Erlang Model

• For the Erlang model, the following threshold function results:

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 79

Page 80: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Typical time distribution of thresholds:

10.5 Erlang Model

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 80

Page 81: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Initially, the threshold is high – Cuts are unlikely

– Cuts are therefore accepted only if the frame differences are very large

• Then, the threshold drops

10.5 Erlang Model

• Then, the threshold drops– Cuts are accepted for clearly less changes to the

features

• Problem is the asymptotic convergence to a positive value– Constant level for several consecutive soft cuts

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 81

Page 82: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• For all Erlang Thresholds we have:

and thus there is always such a boundary line Threshold

10.5 Erlang Model

Threshold

– The problem comes from the assumption of the underlying exponential distribution in the Erlangmodel

– Also here is the solution the Weibull distribution

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 82

Page 83: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Experimental verification (Vasconcelos and Lippman, 2000)

– Test within a collection cinema trailers

– Training (determination of model parameters) with the objects from the collection

10.5 Experimental Verification

the objects from the collection

• Task: segmentation of a new trailer (“Blankman”)

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 83

Page 84: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Trailer for “Blankman”

10.5 Experimental Verification

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 84

Page 85: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• For each trailer simple color histogram distances were used for determining the selected activity

• The fixed threshold was chosen as good as

10.5 Experimental Verification

• The fixed threshold was chosen as good as possible (through tests)

• “O”: Missed cut

• “*”: False estimated cut

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 85

Page 86: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Fixed threshold:

10.5 Experimental Verification

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 86

Page 87: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Weibull threshold:

10.5 Experimental Verification

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 87

Page 88: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Direct comparison of two samples

10.5 Experimental Verification

Fixed

threshold

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 88

Weibull-

threshold

Page 89: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Total number of errors:

10.5 Experimental Verification

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 89

Page 90: Multimedia Databases - TU Braunschweig · 2010-01-08 · • E.g., a temporal model for fades – When fading out the pictures of the first shot become darker. The brightness histogram

• Video Signatures

– Intuitive Video Similarity

– Voronoi Video Similarity

Next lecture

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 90