Comments on carriage of timed text and visual overlays in MP4

Institut Mines-Télécom Comments on timed text DIS Cyril Concolato, Jean Le Feuvre 19/04/2013

Upload: cyril-concolato

Post on 26-Jun-2015


DESCRIPTION

Presentation made during the 104th MPEG meeting on the carriage of subtitles and WebVTT in MP4

TRANSCRIPT

Page 1: Comments on carriage of timed text and visual overlays in MP4

Institut Mines-Télécom

Comments on timed text DIS

Cyril Concolato, Jean Le Feuvre

19/04/2013

Page 2: Comments on carriage of timed text and visual overlays in MP4


Proposed editorial changes (section 1)

See attached document

17/04/2013 Timed Text streams in ISOBMFF

Page 3: Comments on carriage of timed text and visual overlays in MP4


IS Timeline (section 2.1)

The ISO standard won’t be able to progress to IS until the WebVTT specification has been transferred from the CG to the WG and until it has reached Proposed Recommendation status


Page 4: Comments on carriage of timed text and visual overlays in MP4

Parser Behaviour (section 2.2)

Problem
• Relationship between WebVTT file parsing and MP4 parsing is unclear
  ─ WebVTT discarded text
  ─ WebVTT invalid cues
  ─ WebVTT parsing character replacement

Page 5: Comments on carriage of timed text and visual overlays in MP4

Valid WebVTT file

WEBVTT

00:11.000 --> 00:13.000
We are in New York City

00:13.000 --> 00:16.000
<v Roger Bingham>We're actually at the Lucern Hotel, just down the street

NOTE – This is a comment

00:16.000 --> 00:18.000
<v Roger Bingham>from the American Museum of Natural History

00:18.000 --> 00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson

Annotations on the slide: Signature, Cues, Comments

Page 6: Comments on carriage of timed text and visual overlays in MP4

Parseable WebVTT File

WEBVTT – this is text after the signature (not syntax compliant)
This is a header
on three lines
headers are not specifically mentioned in the syntax

This line is not in the header
This one too

00:11.000 --> 00:13.000
We are in New York City

00:00001.000 --> 00:16.000
<v Roger Bingham>We're actually at the Lucern Hotel, just down the street

NOTE – This is a comment

00:16.000 --> 00:18.000
<v Roger Bingham>from the American Museum of Natural History

This line is in between cue and is not mentioned in the syntax
This one too

00:18.000 --> 00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson

Annotations on the slide: Trailing text; Header; Additional lines, not a header; Invalid cue; Additional lines, not a comment

Page 7: Comments on carriage of timed text and visual overlays in MP4

Parser Behaviour (section 2.2)

Problem
• Relationship between WebVTT file parsing and MP4 parsing is unclear
  ─ WebVTT discarded text
  ─ WebVTT invalid cues
  ─ WebVTT parsing character replacement

Proposal (1)
• ISOBMFF <-> WebVTT round-trip should be as conservative as possible
  ─ Avoid any forward compatibility issues

Page 8: Comments on carriage of timed text and visual overlays in MP4

WebVTT vs. ISOBMFF Parsing
Which compatibility? (section 2.3)

[Diagram, labels only: WebVTT File 1, ISOBMFF Writer, ISOBMFF File, ISOBMFF Reader, WebVTT File 2, WebVTT Parser (twice), Compare (twice)]

Page 9: Comments on carriage of timed text and visual overlays in MP4

Parser Behaviour (section 2.2)

Problem
• Relationship between WebVTT file parsing and MP4 parsing is unclear
  ─ WebVTT discarded text
  ─ WebVTT invalid cues
  ─ WebVTT parsing character replacement

Proposal (2)
• AdditionalTextBox
  ─ Keeps non-visible text in different boxes
  ─ Avoids storage of empty lines

Page 10: Comments on carriage of timed text and visual overlays in MP4

Proposal AdditionalTextBox

[Sample Entry]
WEBVTT – this is text after the signature (not syntax compliant)
This is a header
on three lines
headers are not specifically mentioned in the syntax

[AdditionalTextBox]
This line is not in the header
This one too

[VTTCueBox]
00:11.000 --> 00:13.000
We are in New York City

[Discarded]
00:00001.000 --> 00:16.000
<v Roger Bingham>We're actually at the Lucern Hotel, just down the street

[AdditionalTextBox]
NOTE – This is a comment

[VTTCueBox]
00:16.000 --> 00:18.000
<v Roger Bingham>from the American Museum of Natural History

[AdditionalTextBox]
This line is in between cue and is not mentioned in the syntax
This one too

[VTTCueBox]
00:18.000 --> 00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson

Page 11: Comments on carriage of timed text and visual overlays in MP4

Other clarifications needed

MP4 Parsing vs. WebVTT Parsing
• Character replacement (optional)

Configuration shall contain:
• Signature + trailing chars + \n
• Header lines + \n (end of each line)

Cue Payload shall be:
• Cue lines + \n (end of each line)

AdditionalText shall be:
• Additional lines or comments + \n (end of each line)

Identifier line shall have:
• No trailing \n (not needed)

Settings string shall have:
• No leading space, no trailing \n
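The string conventions listed above can be illustrated with a small splitter. This is only a sketch under stated assumptions, not the behaviour mandated by the WebVTT or ISOBMFF specifications: line endings are assumed to be already normalised to \n, blocks are assumed to be separated by blank lines, and the function name split_webvtt and the dictionary keys are hypothetical.

```python
import re

# WebVTT timestamp: optional hours, then mm:ss.ttt
_TS = r"(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}"
_TIMING = re.compile(rf"^({_TS})[ \t]+-->[ \t]+({_TS})(?:[ \t]+(.*))?$")


def split_webvtt(text):
    """Classify the blank-line-separated blocks of a WebVTT file.

    Hypothetical helper: the first block is taken as the configuration,
    blocks with a valid timing line become cues, blocks with an invalid
    timing line are discarded, everything else is additional text.
    """
    blocks = [b for b in re.split(r"\n{2,}", text.strip("\n")) if b]
    result = {
        "config": blocks[0] + "\n",  # signature + trailing chars + header lines
        "cues": [],
        "additional": [],
        "discarded": [],
    }
    for block in blocks[1:]:
        lines = block.split("\n")
        # The timing line is the first line, or the second after an identifier.
        idx = next(
            (i for i in (0, 1) if i < len(lines) and "-->" in lines[i]), None
        )
        if idx is None:
            # Additional lines or comments, each terminated by \n.
            result["additional"].append("".join(l + "\n" for l in lines))
            continue
        m = _TIMING.match(lines[idx])
        if m is None:
            result["discarded"].append(block)  # invalid cue
            continue
        result["cues"].append({
            "id": lines[0] if idx == 1 else "",       # no trailing \n
            "start": m.group(1),
            "end": m.group(2),
            "settings": (m.group(3) or "").strip(),   # no leading space, no \n
            "payload": "".join(l + "\n" for l in lines[idx + 1:]),
        })
    return result
```

In this sketch a cue whose timing line reads 00:00001.000 --> 00:16.000 fails the timestamp pattern and ends up in "discarded", while a NOTE block ends up in "additional".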

Page 12: Comments on carriage of timed text and visual overlays in MP4

Cue splitting (Section 2.4)

Cues in WebVTT are ordered by their start times
• But might overlap

Current storage principle in ISOBMFF
• Cue start time = sample time
• Cue end time – cue start time = sample duration

Cue overlap requires splitting cues into >1 samples

[Figure: six arrangements of 2 cues, stored as 1, 2, 2, 2, 3 and 3 samples respectively]
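The splitting rule can be sketched as a sweep over all cue boundaries. This is an illustration only: cues are assumed to be (start, end, payload) tuples with integer times, the name split_into_samples is hypothetical, and gaps between cues are simply skipped rather than filled with any kind of empty sample.

```python
def split_into_samples(cues):
    """Split possibly overlapping cues into non-overlapping samples.

    cues: list of (start, end, payload) with start < end.
    Returns a list of (sample_time, sample_duration, [payloads]) in which
    no cue starts or ends in the middle of a sample.
    """
    # Every cue start or end is a potential sample boundary.
    boundaries = sorted({t for start, end, _ in cues for t in (start, end)})
    samples = []
    for t0, t1 in zip(boundaries, boundaries[1:]):
        # A cue belongs to the sample if it covers the whole interval.
        active = [p for start, end, p in cues if start <= t0 and end >= t1]
        if active:  # assumption: gaps between cues are not stored here
            samples.append((t0, t1 - t0, active))
    return samples
```

Two cues with identical timing give one sample, two cues that merely share a start or touch end to start give two samples, and two partially overlapping or nested cues give three.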

Page 13: Comments on carriage of timed text and visual overlays in MP4

Cue merging

Sample reading may need cue merging
• Simple cue comparison
  ─ SampleEnd(N-1) == SampleStart(N)
  ─ Cue(N-1).id == Cue(N).id
  ─ Cue(N-1).settings == Cue(N).settings
  ─ Cue(N-1).payload == Cue(N).payload
• No need for SourceIDBox or LocalIDBox

Introduces a delay in dispatching cues
• Need to get sample N to make sure of the cue_end(N-1)
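A reader applying the comparison above might look like the following sketch. The data layout is an assumption made for illustration: samples are (start, duration, cues) tuples sorted by start time, each cue is a dict with "id", "settings" and "payload" keys, and merge_cues is a hypothetical name.

```python
def _key(cue):
    # The three fields compared between consecutive samples.
    return (cue["id"], cue["settings"], cue["payload"])


def merge_cues(samples):
    """Rebuild full cues from samples that may contain split cues."""
    merged = []   # cues whose end time is final
    pending = []  # cues of the previous sample that may still continue
    for start, duration, cues in samples:
        next_pending = []
        for cue in cues:
            # Continue a pending cue when the samples are contiguous and
            # id, settings and payload are identical.
            previous = next(
                (p for p in pending
                 if p["end"] == start and _key(p) == _key(cue)),
                None,
            )
            if previous is not None:
                pending.remove(previous)
                previous["end"] = start + duration
                next_pending.append(previous)
            else:
                next_pending.append(
                    dict(cue, start=start, end=start + duration)
                )
        # Cues not continued by this sample can only be dispatched now,
        # which is the delay mentioned above.
        merged.extend(pending)
        pending = next_pending
    merged.extend(pending)
    return sorted(merged, key=lambda c: c["start"])
```

The end time of a cue from sample N-1 only becomes final once sample N has been read, so the sketch holds cues in pending for one extra sample.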

Page 14: Comments on carriage of timed text and visual overlays in MP4

Proposed modifications

Add a CueDurationBox
• Similar to duration embedded in TTML

Simplified split
• Only cues with the same start time but different end times are split
• No need for empty cue boxes anymore
• Not all cues are necessarily RAPs

[Figure: the same six arrangements of 2 cues, now stored as 1, 2, 2, 2, 2 and 2 samples respectively]

Page 15: Comments on carriage of timed text and visual overlays in MP4

ISOBMFF types of text tracks (section 3.1)

[Diagram: classification of text sample entries. Labels as extracted:]
• Metadata Sample Entry: XML Metadata Sample Entry (metx), Text Metadata Sample Entry (mett)
• Non Presentable Text Data
• Presentable Text Data
• Requires text-processing only
• Requires text & image rendering
• Subtitle Sample Entry: XML Subtitle Sample Entry (stpp), Text Subtitle Sample Entry (sbtt)
• Plain Text Sample Entry: WebVTT Sample Entry (wvtt)
• or 3GPP Timed Text (tx3g)
• or 3GPP Timed Graphics (tigr)

Page 16: Comments on carriage of timed text and visual overlays in MP4

Examples

TTML stream
• Presentable
• May require processing of images (e.g. SMPTE-TT)
• TTML documents are XML & non-progressive
« XMLSubtitleSampleEntry » (stpp)

WebVTT stream
• Presentable text
• Cannot display images
• WebVTT documents are not XML but progressive
« PlainTextSampleEntry » (wvtt)

SVG stream
• Presentable text
• Can display readable text and images
• SVG documents are XML but can be progressive
« TextSubtitleSampleEntry » (sbtt)
• Same for HTML streams and CSS streams

Page 17: Comments on carriage of timed text and visual overlays in MP4

Design questions?

Sample Entry types
• Why don’t we have a Sample Entry for Presentable XML w/o images?
• Why do we need to distinguish between text only and text with images?
• What if WebVTT v2 would carry images? CSS inline?
• What if WebVTT carries SVG data or else?

Where should the configuration string be?
• Out-of-band (WVTT or PlainTextSampleEntry)
• In-band (in the first sample or in each RAP)

Page 18: Comments on carriage of timed text and visual overlays in MP4

Proposed Codec parameters (section 3.2)

Sample Entry              4CC    Proposed Bucket media type elements   Observation
MetaDataSampleEntry
URIMetadataSampleEntry    urim   urim.<theURI>
XMLMetaDataSampleEntry    metx   metx.<namespace>
TextMetaDataSampleEntry   mett   mett.<mime_format>                    What if the sub format has parameters?
SubtitleSampleEntry
XMLSubtitleSampleEntry    stpp   stpp.<namespace>
TextSubtitleSampleEntry   sbtt   sbtt.<mime_format>
PlainTextSampleEntry
WVTTSampleEntry           wvtt   wvtt
TextSampleEntry           tx3g   tx3g
???                       text   text                                  Where is this SampleEntry defined?
???                       tigr   tigr                                  What is the syntax of this Sample Entry?
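As an illustration of the proposed values in the table above, the following sketch builds the string for a given 4CC. It only mirrors the proposal as listed on the slide and is not a normative definition; the function name codecs_value and the lookup table are hypothetical.

```python
# 4CCs whose proposed value carries a sub-parameter, mapped to the name
# used for that sub-parameter in the table above.
_SUB_PARAMETER = {
    "urim": "theURI",
    "metx": "namespace",
    "mett": "mime_format",
    "stpp": "namespace",
    "sbtt": "mime_format",
}


def codecs_value(fourcc, sub=None):
    """Return the proposed codecs value for a sample entry 4CC."""
    if fourcc in _SUB_PARAMETER:
        if not sub:
            raise ValueError(f"{fourcc} needs a {_SUB_PARAMETER[fourcc]}")
        return f"{fourcc}.{sub}"
    # wvtt, tx3g and the other listed 4CCs are used on their own.
    return fourcc
```

For example, codecs_value("wvtt") gives "wvtt", while codecs_value("sbtt", "image/svg+xml") gives "sbtt.image/svg+xml". The observation in the table still applies: the sketch does nothing about sub formats that carry their own parameters.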