PDF/A-3 for preservation. Notes on embedded files and JPEG2000


Uploaded by scape-project on 18 November 2014

DESCRIPTION

Johan van der Knijff of the National Library of the Netherlands presented his views on PDF/A-3 for preservation, with notes on embedded files and JPEG 2000. The presentation was given at a DPC briefing (http://bit.ly/1b487mD) that introduced and reviewed recent developments in the PDF/A standard, with particular emphasis on PDF/A version 3, published in October 2012. The meeting took place in Leeds on 13 March 2013.

TRANSCRIPT

Page 1: PDF/A-3 for preservation. Notes on embedded files and JPEG2000

SCAPE

Johan van der Knijff

Koninklijke Bibliotheek – National Library of the Netherlands

DPC, PDF/A-3 Briefing, Leeds, 13.3.2013


Part 1: Embedded files

PDF/A-3: embedding of any file (type)

Key point:

“Embedded files” really means “embedded file streams”: a specific data structure in PDF!

File specification dictionary

31 0 obj
<< /Type /Filespec
   /F (mysvg.svg)
   /EF << /F 32 0 R >>
>>
endobj

The /EF key points to the embedded file stream (object 32)
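To follow the /EF link programmatically, a tool has to pull the indirect reference out of the file specification dictionary. The sketch below is a minimal illustration using a regular expression over raw bytes, not a real PDF parser: it assumes the dictionary is available uncompressed, and the function name `embedded_file_ref` is hypothetical.

```python
import re

# Matches "/EF << /F 32 0 R" (or "/UF") and captures object and generation numbers.
# Assumption: the file specification dictionary is available as uncompressed bytes.
EF_REF = re.compile(rb"/EF\s*<<\s*/U?F\s+(\d+)\s+(\d+)\s+R")


def embedded_file_ref(filespec: bytes):
    """Return (object number, generation) of the embedded file stream, or None."""
    match = EF_REF.search(filespec)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


filespec = b"31 0 obj <</Type /Filespec /F (mysvg.svg) /EF <</F 32 0 R>> >> endobj"
print(embedded_file_ref(filespec))  # (32, 0)
```

A real implementation would resolve the reference through the cross-reference table; the regex only shows where the link lives.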

Embedded file stream

32 0 obj
<< /Type /EmbeddedFile
   /Subtype /image#2Fsvg+xml
   /Length 72
>>
stream
…SVG Data…
endstream
endobj
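The /Subtype value is a MIME type written as a PDF name. Because / is a delimiter in PDF syntax, it is escaped as #2F, which is why image/svg+xml appears as /image#2Fsvg+xml. The sketch below shows that escaping and assembles an embedded file stream object like the one above, with /Length computed from the data. The function names are hypothetical, and for simplicity the stream is written unfiltered.

```python
# Characters that must be #-escaped inside a PDF name: delimiters and "#" itself.
DELIMITERS = set("()<>[]{}/%#")


def pdf_name(value: str) -> str:
    """Encode a string (e.g. a MIME type) as a PDF name object."""
    out = []
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        # Escape anything outside printable ASCII, plus delimiter characters.
        if byte < 0x21 or byte > 0x7E or ch in DELIMITERS:
            out.append(f"#{byte:02X}")
        else:
            out.append(ch)
    return "/" + "".join(out)


def embedded_file_stream(obj_num: int, mime: str, data: bytes) -> bytes:
    """Build an unfiltered embedded file stream object (hypothetical helper)."""
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Type /EmbeddedFile /Subtype {pdf_name(mime)} /Length {len(data)} >>\n"
        "stream\n"
    ).encode("ascii")
    # The EOL before "endstream" is not counted in /Length.
    return header + data + b"\nendstream\nendobj\n"


print(pdf_name("image/svg+xml"))  # /image#2Fsvg+xml
```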

Uses of embedded file streams

File attachments: not meant to be rendered by the viewer

File attachment annotations
EmbeddedFiles entry in the name dictionary

PDF/A-3: permitted
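For triage in a preservation workflow, one quick and admittedly crude way to spot PDFs that carry attachments is to count the relevant markers in the raw bytes. The sketch below is a heuristic, not a parser, and its names are hypothetical. A zero count does not prove absence: /Type is optional in an embedded file stream dictionary, annotations and the name dictionary may sit in compressed object streams, and names may be written with #-escapes.

```python
import re

# The negative lookahead keeps "/EmbeddedFile" (stream type) and
# "/EmbeddedFiles" (name tree key) from matching each other.
PATTERNS = {
    "embedded_file_streams": re.compile(rb"/Type\s*/EmbeddedFile(?![A-Za-z0-9])"),
    "file_attachment_annots": re.compile(rb"/Subtype\s*/FileAttachment(?![A-Za-z0-9])"),
    "embedded_files_name_trees": re.compile(rb"/EmbeddedFiles(?![A-Za-z0-9])"),
}


def scan_for_attachments(pdf_bytes: bytes) -> dict:
    """Count raw-byte occurrences of attachment-related markers (heuristic only)."""
    return {name: len(p.findall(pdf_bytes)) for name, p in PATTERNS.items()}
```

Usage, with a hypothetical file name: `scan_for_attachments(open("example.pdf", "rb").read())`.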

Content rendered in/by the PDF viewer

Rendition actions
Screen annotations

PDF/A-3: not permitted

What about inline images?

Not based on the “embedded file stream” but on the “Image XObject” data structure, which allows only a limited set of predefined formats
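Unlike an embedded file stream, an image XObject does not carry an arbitrary file: its data is interpreted through PDF's standard decoding filters. As an illustration, the sketch below reads the /Filter entry from an image dictionary (as text, with a regular expression rather than a real parser) and maps it to the compression format it implies. The mapping covers the standard image-specific filters of ISO 32000-1; the function names are hypothetical.

```python
import re

# Standard filters whose presence implies a specific image compression format.
IMAGE_FILTERS = {
    "DCTDecode": "JPEG",
    "JPXDecode": "JPEG 2000",
    "JBIG2Decode": "JBIG2",
    "CCITTFaxDecode": "CCITT Group 3/4 fax",
}
# General-purpose filters: the image is stored as raw samples, possibly compressed.
GENERIC_FILTERS = {"FlateDecode", "LZWDecode", "RunLengthDecode",
                   "ASCIIHexDecode", "ASCII85Decode"}


def image_filters(image_dict: str) -> list:
    """Return the filter names from /Filter (a single name or an array)."""
    match = re.search(r"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)", image_dict)
    if match is None:
        return []
    return re.findall(r"/([A-Za-z0-9]+)", match.group(1))


def describe_image(image_dict: str) -> str:
    """Name the image format implied by the filter chain."""
    filters = image_filters(image_dict)
    for name in filters:
        if name in IMAGE_FILTERS:
            return IMAGE_FILTERS[name]
    if all(name in GENERIC_FILTERS for name in filters):
        return "raw samples"
    return "unrecognised filter: " + ", ".join(filters)


print(describe_image("<</Subtype/Image/Width 615/Height 978/Filter/JPXDecode>>"))  # JPEG 2000
```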

Embedded files wrap-up:

No impact on content that is meant to be rendered by the PDF viewer

But a PDF/A-3 may contain a file of any possible format as an attachment

Part 2: JPEG 2000

Supported since PDF/A-2

Image XObject

1614 0 obj
<< /Subtype /Image
   /Width 615 /Height 978
   /ColorSpace /DeviceRGB
   /BitsPerComponent 8
   /Interpolate true
   /Length 5278
   /Filter /JPXDecode
>>
stream
… Image data …
endstream
endobj

/Filter /JPXDecode identifies the object as a JPEG 2000 image
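Given the raw data of such a stream, a quick sanity check is to look at its first bytes. A JP2-family file (JP2 or JPX) opens with a fixed 12-byte signature box, while a bare JPEG 2000 codestream opens with the SOC marker followed by the SIZ marker. The sketch below assumes the bytes between stream and endstream have already been sliced out of the PDF; the names are hypothetical.

```python
# JP2 signature box: length 12, type "jP\x20\x20", contents 0x0D0A870A.
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"
# A bare codestream starts with SOC (FF4F) immediately followed by SIZ (FF51).
CODESTREAM_START = b"\xff\x4f\xff\x51"


def jpx_payload_kind(stream_data: bytes) -> str:
    """Classify the data of a /JPXDecode image stream by its leading bytes."""
    if stream_data.startswith(JP2_SIGNATURE):
        return "jp2-family file"
    if stream_data.startswith(CODESTREAM_START):
        return "raw codestream"
    return "unknown"
```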

ISO 19005-2 (PDF/A-2): JPEG 2000 support is based on a subset of JPEG 2000 Part 2 (JPX baseline)

But only Part 1 of the standard (JP2) is commonly used for archival applications!

JP2 vs JPX

JP2: JPEG 2000 Part 1, basic still image format
JPX: JPEG 2000 Part 2 = JP2 + assorted advanced stuff …
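Which of the two a file claims to be is recorded in the File Type box ('ftyp') that follows the signature box. Its brand is 'jp2 ' for JP2 and 'jpx ' for JPX, and a JPX file meant to be readable by JP2 decoders is expected to list 'jp2 ' among its compatible brands. The sketch below walks the top-level boxes (4-byte length, 4-byte type, optional 64-bit extended length) and reports the brand. It is a minimal illustration with hypothetical function names, and it trusts the brand rather than checking which features are actually used.

```python
import struct


def read_boxes(data: bytes):
    """Yield (box type, box contents) for each top-level box of a JP2-family file."""
    pos = 0
    while pos + 8 <= len(data):
        length, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if length == 1:  # 64-bit extended length (XLBox) follows the type
            (length,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif length == 0:  # box extends to the end of the file
            length = len(data) - pos
        if length < header:
            raise ValueError(f"invalid box length at offset {pos}")
        yield box_type.decode("latin-1"), data[pos + header:pos + length]
        pos += length


def jpeg2000_flavour(data: bytes) -> str:
    """Use the File Type box brand to tell JP2 (Part 1) from JPX (Part 2)."""
    for box_type, contents in read_boxes(data):
        if box_type == "ftyp":
            # Contents: brand (4 bytes), minor version (4 bytes), compatibility list.
            brand = contents[0:4]
            compat = [contents[i:i + 4] for i in range(8, len(contents) - 3, 4)]
            if brand == b"jp2 ":
                return "JP2"
            if brand == b"jpx ":
                return "JPX (JP2-compatible)" if b"jp2 " in compat else "JPX"
            return "other brand: " + brand.decode("latin-1")
    return "no File Type box"
```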

Fragmented codestreams

Allowed in JPX Baseline!
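As we read ISO/IEC 15444-2, a JPX codestream need not sit in a single Contiguous Codestream box ('jp2c'). It can instead be described by a Fragment Table box ('ftbl') whose fragment list points to byte ranges elsewhere in the file, or even in other files via data references. A preservation workflow could flag such files with a check along these lines. The sketch only inspects top-level box types, and its function names are hypothetical.

```python
import struct


def read_boxes(data: bytes):
    """Yield (box type, box contents) for each top-level box of a JP2-family file."""
    pos = 0
    while pos + 8 <= len(data):
        length, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if length == 1:  # 64-bit extended length (XLBox) follows the type
            (length,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif length == 0:  # box extends to the end of the file
            length = len(data) - pos
        if length < header:
            raise ValueError(f"invalid box length at offset {pos}")
        yield box_type.decode("latin-1"), data[pos + header:pos + length]
        pos += length


def codestream_storage(data: bytes) -> str:
    """Report whether the codestream is stored contiguously or as fragments."""
    box_types = [box_type for box_type, _ in read_boxes(data)]
    if "ftbl" in box_types:
        return "fragmented (Fragment Table box)"
    if "jp2c" in box_types:
        return "contiguous (Contiguous Codestream box)"
    return "no codestream box found"
```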

Open-source PDF viewers: JPEG 2000 libraries

Ghostscript: OpenJPEG or JasPer
Evince: OpenJPEG
MuPDF: OpenJPEG
Firefox PDF viewer: built-in decoder

None of these libraries supports fragmented codestreams!

Is it really a problem?

Fragmented codestreams are extremely rare. But why is this feature even allowed in a long-term archival format?

Open-source support for JPEG 2000 in general remains problematic

#SCAPEProject

http://www.scape-project.eu

Funding: This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).