Comments on carriage of timed text and visual overlays in MP4
Presentation made during the 104th MPEG meeting on the carriage of subtitles and WebVTT in MP4
![Page 1: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/1.jpg)
Institut Mines-Télécom
Comments on timed text DIS
Cyril Concolato, Jean Le Feuvre
19/04/2013
![Page 2: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/2.jpg)
Proposed editorial changes (section 1)
See attached document
17/04/2013 Timed Text streams in ISOBMFF
![Page 3: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/3.jpg)
IS Timeline (section 2.1)
The ISO standard cannot progress to IS until the WebVTT specification has been transferred from the W3C Community Group (CG) to a Working Group (WG) and has reached Proposed Recommendation status.
![Page 4: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/4.jpg)
Parser Behaviour (section 2.2)
Problem
• Relationship between WebVTT file parsing and MP4 parsing is unclear
  ─ WebVTT discarded text
  ─ WebVTT invalid cues
  ─ WebVTT parsing character replacement
![Page 5: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/5.jpg)
Valid WebVTT file
WEBVTT

00:11.000 --> 00:13.000
We are in New York City

00:13.000 --> 00:16.000
<v Roger Bingham>We're actually at the Lucern Hotel, just down the street

NOTE – This is a comment

00:16.000 --> 00:18.000
<v Roger Bingham>from the American Museum of Natural History

00:18.000 --> 00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson

Callouts: Signature, Cues, Comments
![Page 6: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/6.jpg)
Parseable WebVTT File
WEBVTT – this is text after the signature (not syntax compliant)
This is a header
on three lines
headers are not specifically mentioned in the syntax

This line is not in the header
This one too

00:11.000 --> 00:13.000
We are in New York City

00:00001.000 --> 00:16.000
<v Roger Bingham>We're actually at the Lucern Hotel, just down the street

NOTE – This is a comment

00:16.000 --> 00:18.000
<v Roger Bingham>from the American Museum of Natural History

This line is in between cue and is not mentioned in the syntax
This one too

00:18.000 --> 00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson

Callouts: Trailing text; Header; Additional lines, not a header; Additional lines, not a comment; Invalid cue
![Page 7: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/7.jpg)
Parser Behaviour (section 2.2)
Problem
• Relationship between WebVTT file parsing and MP4 parsing is unclear
  ─ WebVTT discarded text
  ─ WebVTT invalid cues
  ─ WebVTT parsing character replacement
Proposal (1)
• ISOBMFF <-> WebVTT round-trip should be as conservative as possible
  ─ Avoid any forward compatibility issues
![Page 8: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/8.jpg)
WebVTT vs. ISOBMFF Parsing: Which compatibility? (section 2.3)
[Diagram blocks: WebVTT File 1, ISOBMFF Writer, ISOBMFF File, ISOBMFF Reader, WebVTT File 2, two WebVTT Parsers, two Compare steps]
![Page 9: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/9.jpg)
Parser Behaviour (section 2.2)
Problem
• Relationship between WebVTT file parsing and MP4 parsing is unclear
  ─ WebVTT discarded text
  ─ WebVTT invalid cues
  ─ WebVTT parsing character replacement
Proposal (2)
• AdditionalTextBox
  ─ Keeps non-visible text in different boxes
  ─ Avoids storage of empty lines
![Page 10: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/10.jpg)
Proposal AdditionalTextBox
WEBVTT – this is text after the signature (not syntax compliant)
This is a header
on three lines
headers are not specifically mentioned in the syntax

This line is not in the header
This one too

00:11.000 --> 00:13.000
We are in New York City

00:00001.000 --> 00:16.000
<v Roger Bingham>We're actually at the Lucern Hotel, just down the street

NOTE – This is a comment

00:16.000 --> 00:18.000
<v Roger Bingham>from the American Museum of Natural History

This line is in between cue and is not mentioned in the syntax
This one too

00:18.000 --> 00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson

Callouts (in extraction order): Sample Entry, AdditionalTextBox, VTTCueBox, Discarded, VTTCueBox, VTTCueBox, AdditionalTextBox, AdditionalTextBox
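One possible reading of this proposal, as a rough Python sketch: split the file on blank lines and label each block with the storage named in the callouts. The function name `classify_blocks`, the loose timing pattern and the assumption of `\n` line endings are illustrative only; they are not taken from the draft or from the WebVTT parsing algorithm.

```python
import re

# Loose timing-line pattern: [hh:]mm:ss.mmm --> [hh:]mm:ss.mmm, then optional settings.
TIMESTAMP = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
TIMING = re.compile(rf"^{TIMESTAMP}[ \t]+-->[ \t]+{TIMESTAMP}(.*)$")


def classify_blocks(text):
    """Split a WebVTT file on blank lines and label each block.

    Labels follow the slide callouts: 'Sample Entry', 'VTTCueBox',
    'AdditionalTextBox' or 'Discarded'.
    """
    blocks = [b for b in re.split(r"\n{2,}", text.strip("\n")) if b]
    result = []
    for index, block in enumerate(blocks):
        lines = block.split("\n")
        if index == 0 and lines[0].startswith("WEBVTT"):
            # Signature, trailing text and header lines go to the configuration.
            label = "Sample Entry"
        elif any("-->" in line for line in lines[:2]):
            # Looks like a cue: timing on line 1, or on line 2 after an identifier.
            timing = next(line for line in lines[:2] if "-->" in line)
            label = "VTTCueBox" if TIMING.match(timing) else "Discarded"
        else:
            # Comments and stray lines that are neither header nor cue.
            label = "AdditionalTextBox"
        result.append((label, block))
    return result
```

Applied to the file on this slide, the malformed timing `00:00001.000` fails the pattern, so that cue is the one labelled as discarded, while the NOTE block and the stray lines become additional text.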
![Page 11: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/11.jpg)
Other clarifications needed
MP4 Parsing vs. WebVTT Parsing
• Character replacement (optional)
Configuration shall contain:
• Signature + trailing chars + \n
• Header lines + \n (end of each line)
Cue Payload shall be:
• Cue lines + \n (end of each line)
AdditionalText shall be:
• Additional lines or comments + \n (end of each line)
Identifier line shall have:
• No trailing \n (not needed)
Settings string shall have:
• No leading space, no trailing \n
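The newline rules above can be illustrated with a small Python sketch. The helper names `terminate_lines` and `build_fields` and the dictionary keys are hypothetical; only the line-termination rules themselves come from the slide.

```python
def terminate_lines(lines):
    """Join lines so that each one, including the last, ends with a newline."""
    return "".join(line + "\n" for line in lines)


def build_fields(signature_line, header_lines, cue_id, settings, cue_lines, extra_lines):
    """Return the strings to store for one fragment, following the rules above."""
    return {
        # Signature + trailing chars + \n, then each header line + \n.
        "configuration": terminate_lines([signature_line] + list(header_lines)),
        # Identifier line: stored without a trailing \n.
        "identifier": cue_id.rstrip("\n"),
        # Settings string: no leading space, no trailing \n.
        "settings": settings.lstrip(" ").rstrip("\n"),
        # Cue payload: every cue line ends with \n.
        "payload": terminate_lines(cue_lines),
        # Additional lines or comments: every line ends with \n.
        "additional_text": terminate_lines(extra_lines),
    }
```

Writing the rules down this way makes the round trip deterministic: a reader can concatenate the stored strings without guessing where line breaks were.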
![Page 12: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/12.jpg)
Cue splitting (Section 2.4)
Cues in WebVTT are ordered by their start times
• But might overlap
Current storage principle in ISOBMFF
• Cue start time = sample time
• Cue end time - cue start time = sample duration
Cue overlap requires splitting cues into more than one sample
[Figure: six timing cases for 2 cues, stored as 1, 2, 2, 2, 3 and 3 samples respectively]
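The splitting described on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not the draft's normative procedure: times are integer milliseconds, every cue has start < end, gaps between cues are simply skipped, and the names `Cue` and `split_into_samples` are made up for the example.

```python
from typing import List, NamedTuple, Tuple


class Cue(NamedTuple):
    start: int  # milliseconds
    end: int    # milliseconds
    payload: str


def split_into_samples(cues: List[Cue]) -> List[Tuple[int, int, List[Cue]]]:
    """Return (sample_time, sample_duration, active_cues) tuples.

    A new sample begins at every cue start or end, so each sample holds
    exactly the cues that are active over its whole duration.
    """
    boundaries = sorted({t for cue in cues for t in (cue.start, cue.end)})
    samples = []
    for begin, end in zip(boundaries, boundaries[1:]):
        active = [cue for cue in cues if cue.start <= begin and cue.end >= end]
        if active:  # intervals with no active cue are skipped in this sketch
            samples.append((begin, end - begin, active))
    return samples
```

With this approach two identical cues share one sample, two back-to-back cues give two samples, and two partially overlapping cues give three samples, the middle one carrying a copy of both cues.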
![Page 13: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/13.jpg)
Cue merging
Sample reading may need cue merging
• Simple cue comparison
  ─ SampleEnd(N-1) == SampleStart(N)
  ─ Cue(N-1).id == Cue(N).id
  ─ Cue(N-1).settings == Cue(N).settings
  ─ Cue(N-1).payload == Cue(N).payload
• No need for SourceIDBox or LocalIDBox
Introduces a delay in dispatching cues
• Need to get sample N to make sure of cue_end(N-1)
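The comparison above can be sketched in Python. The types `StoredCue`, `Sample` and `MergedCue` and the function `merge_cues` are hypothetical names for illustration; the only logic taken from the slide is the four equality tests (contiguous samples, same id, same settings, same payload).

```python
from typing import List, NamedTuple


class StoredCue(NamedTuple):
    id: str
    settings: str
    payload: str


class Sample(NamedTuple):
    start: int
    duration: int
    cues: List[StoredCue]


class MergedCue(NamedTuple):
    start: int
    end: int
    cue: StoredCue


def merge_cues(samples: List[Sample]) -> List[MergedCue]:
    """Rebuild whole cues from consecutive samples, in sample order."""
    finished = []
    open_cues = []  # (start, end, cue) entries still running at the previous sample end
    prev_end = None
    for sample in samples:
        sample_end = sample.start + sample.duration
        contiguous = prev_end == sample.start  # SampleEnd(N-1) == SampleStart(N)
        remaining = list(sample.cues)
        still_open = []
        for start, end, cue in open_cues:
            # Tuple equality compares id, settings and payload together.
            if contiguous and cue in remaining:
                remaining.remove(cue)
                still_open.append((start, sample_end, cue))
            else:
                # The cue end is only known now that sample N has been read.
                finished.append(MergedCue(start, end, cue))
        still_open.extend((sample.start, sample_end, cue) for cue in remaining)
        open_cues = still_open
        prev_end = sample_end
    finished.extend(MergedCue(s, e, c) for s, e, c in open_cues)
    return sorted(finished, key=lambda m: (m.start, m.end))
```

The sketch also shows the dispatch delay noted on the slide: a cue is appended to `finished` only while processing the following sample, or at the end of the track.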
![Page 14: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/14.jpg)
Proposed modifications
Add a CueDurationBox
• Similar to the duration embedded in TTML
Simplified split
• Only cues with the same start time but different end times are split
• No need for empty cue boxes anymore
• Not all cues are necessarily RAPs
[Figure: six timing cases for 2 cues, stored as 1, 2, 2, 2, 2 and 2 samples respectively]
![Page 15: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/15.jpg)
ISOBMFF types of text tracks (section 3.1)
Non Presentable Text Data
• Metadata Sample Entry
  ─ XML Metadata Sample Entry (metx)
  ─ Text Metadata Sample Entry (mett)
Presentable Text Data
• Requires text processing only: Plain Text Sample Entry
  ─ WebVTT Sample Entry (wvtt)
  ─ or 3GPP Timed Text (tx3g)
• Requires text & image rendering: Subtitle Sample Entry
  ─ XML Subtitle Sample Entry (stpp)
  ─ Text Subtitle Sample Entry (sbtt)
  ─ or 3GPP Timed Graphics (tigr)
![Page 16: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/16.jpg)
Examples
TTML stream
• Presentable
• May require processing of images (e.g. SMPTE-TT)
• TTML documents are XML & non-progressive
« XMLSubtitleSampleEntry » (stpp)
WebVTT stream
• Presentable text
• Cannot display images
• WebVTT documents are not XML but progressive
« PlainTextSampleEntry » (wvtt)
SVG stream
• Presentable text
• Can display readable text and images
• SVG documents are XML but can be progressive
« TextSubtitleSampleEntry » (sbtt)
• Same for HTML streams and CSS streams
![Page 17: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/17.jpg)
Design questions?
Sample Entry types
• Why don’t we have a Sample Entry for Presentable XML w/o images?
• Why do we need to distinguish between text only and text with images?
• What if WebVTT v2 would carry images? CSS inline?
• What if WebVTT carries SVG data or else?
Where should the configuration string be?
• Out-of-band (WVTT or PlainTextSampleEntry)
• In-band (in the first sample or in each RAP)
![Page 18: Comments on carriage of timed text and visual overlays in MP4](https://reader036.vdocument.in/reader036/viewer/2022062303/558c8950d8b42a0c3d8b4651/html5/thumbnails/18.jpg)
Proposed Codec parameters (section 3.2)
| Sample Entry | 4CC | Proposed bucket media type elements | Observation |
|---|---|---|---|
| MetaDataSampleEntry | | | |
| URIMetadataSampleEntry | urim | urim.<theURI> | |
| XMLMetaDataSampleEntry | metx | metx.<namespace> | |
| TextMetaDataSampleEntry | mett | mett.<mime_format> | What if the sub format has parameters? |
| SubtitleSampleEntry | | | |
| XMLSubtitleSampleEntry | stpp | stpp.<namespace> | |
| TextSubtitleSampleEntry | sbtt | sbtt.<mime_format> | |
| PlainTextSampleEntry | | | |
| WVTTSampleEntry | wvtt | wvtt | |
| TextSampleEntry | tx3g | tx3g | |
| ??? | text | text | Where is this SampleEntry defined? |
| ??? | tigr | tigr | What is the syntax of this Sample Entry? |
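As a rough illustration of the proposed values, the sketch below builds the string for a given 4CC. The function name `codecs_value` and the two sets are assumptions made for the example; any escaping that a URI or namespace might need inside a codecs parameter is deliberately not handled.

```python
# 4CCs whose proposed value carries a sub-element (URI, namespace or MIME format).
PARAMETERIZED = {"urim", "metx", "mett", "stpp", "sbtt"}
# 4CCs proposed as a bare value; the slide questions where 'text' and 'tigr' are defined.
PLAIN = {"wvtt", "tx3g", "text", "tigr"}


def codecs_value(fourcc, detail=None):
    """Return the proposed codecs element for a sample entry 4CC."""
    if fourcc in PARAMETERIZED:
        if not detail:
            raise ValueError(f"{fourcc} needs a URI, namespace or MIME format")
        return f"{fourcc}.{detail}"
    if fourcc in PLAIN:
        return fourcc
    raise ValueError(f"unknown sample entry type: {fourcc}")
```

For example, `codecs_value("stpp", "http://www.w3.org/ns/ttml")` yields the 4CC followed by a dot and the namespace, while `codecs_value("wvtt")` returns the bare 4CC.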