SCAPE
Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool
Johan van der Knijff (1, 2), René van der Ark (1), Carl Wilson (3)
(1) Koninklijke Bibliotheek – National Library of the Netherlands
(2) Open Planets Foundation
(3) The British Library
IS&T, Archiving 2012, Copenhagen, 15.6.2012
National Programme for preservation of paper heritage
Digitisation as a means to conserve threatened paper originals
Metamorfoze
TIFF → JP2: 146 TB, migrate by end of 2012
JP2 from JISC 1 Newspaper Collection (BL)
“Well‐formed and valid”
JP2 from JISC 1 Newspaper Collection (BL)
Hardware failure may result in corrupted images
Source: http://img70.imageshack.us/img70/9950/serversnm2.jpg
Not all encoders produce standard‐compliant images
Possible solutions
Option 1: Improve the JPEG 2000 module of JHOVE
But: no institutional support, superseded by JHOVE2 (?)
Option 2: Develop a JPEG 2000 module for JHOVE2
But: JHOVE2 not ready for operational use (yet)
Option 3: Develop a dedicated tool
Jpylyzer tool
‐ First prototype: December 2011
‐ Refactoring of original code: January 2012
‐ Packaging (Debian): March 2012 (Univ. Southampton, KEEP Solutions, AIT Vienna)
‐ Remaining functionality added, bug fixes: April‐May 2012 (current version: 1.5)
JP2 file
‐ JPEG 2000 Signature box
‐ File Type box
‐ JP2 Header box (superbox)
‐ Contiguous Codestream box 0
‐ Contiguous Codestream box n
‐ IPR box
‐ XML box(es)
‐ UUID box(es)
‐ UUID Info box(es) (superbox)
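To make the box structure concrete, here is a minimal Python sketch (not jpylyzer's own code) that walks the top‐level boxes of a JP2 file. It relies only on the box header layout of ISO/IEC 15444‐1: a 4‐byte big‐endian length (LBox), a 4‐character type (TBox), an 8‐byte extended length (XLBox) when LBox is 1, and LBox 0 meaning the box runs to the end of the file. The function name `list_top_level_boxes` and the `BOX_NAMES` table are illustrative, hypothetical names.

```python
import struct
from typing import List, Tuple

# Four-character box types (ISO/IEC 15444-1) for the boxes listed above.
BOX_NAMES = {
    b"jP  ": "JPEG 2000 Signature box",
    b"ftyp": "File Type box",
    b"jp2h": "JP2 Header box (superbox)",
    b"jp2c": "Contiguous Codestream box",
    b"jp2i": "IPR box",
    b"xml ": "XML box",
    b"uuid": "UUID box",
    b"uinf": "UUID Info box (superbox)",
}


def list_top_level_boxes(data: bytes) -> List[Tuple[str, str, int, int]]:
    """Return (type, name, offset, length) for every top-level box in a JP2 byte string.

    Hypothetical helper for illustration: it checks box lengths, not box contents or order.
    """
    boxes = []
    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError(f"truncated box header at offset {offset}")
        lbox, tbox = struct.unpack(">I4s", data[offset:offset + 8])
        header_len = 8
        if lbox == 1:
            # LBox = 1: the real length is in the 8-byte XLBox field after TBox.
            if offset + 16 > len(data):
                raise ValueError(f"truncated XLBox at offset {offset}")
            (lbox,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header_len = 16
        elif lbox == 0:
            # LBox = 0: the box runs to the end of the file (last box only).
            lbox = len(data) - offset
        if lbox < header_len or offset + lbox > len(data):
            raise ValueError(f"invalid or truncated box at offset {offset}")
        boxes.append((tbox.decode("latin-1"), BOX_NAMES.get(tbox, "other box"),
                      offset, lbox))
        offset += lbox
    return boxes
```

A truncated or overlong box raises `ValueError`, which is the kind of structural damage a broken JP2 tends to show; a real validator also checks box order and the contents of each box.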
Command‐line use
Result
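The command‐line call and its XML result can also be scripted. The sketch below is an illustration under assumptions, not jpylyzer documentation: it assumes a `jpylyzer` executable on the PATH that writes its XML report to standard output for `jpylyzer FILE`, and that the overall verdict sits in an element named `isValidJP2` holding `True` or `False` (check this name against the version you use). The helper names are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Optional


def run_jpylyzer(path: str) -> str:
    """Run jpylyzer on one file and return its XML report (assumed on stdout)."""
    # check=False: the verdict is read from the XML, not from the exit status.
    result = subprocess.run(["jpylyzer", path], capture_output=True,
                            text=True, check=False)
    return result.stdout


def _local_name(tag: str) -> str:
    # Strip a '{namespace}' prefix, if any, so lookups work with or without one.
    return tag.rsplit("}", 1)[-1]


def find_text(xml_text: str, element_name: str) -> Optional[str]:
    """Return the stripped text of the first element with this local name, or None."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if _local_name(elem.tag) == element_name:
            return (elem.text or "").strip()
    return None


def is_valid_jp2(xml_text: str) -> bool:
    # 'isValidJP2' is an assumed element name for the overall validity verdict.
    return find_text(xml_text, "isValidJP2") == "True"
```

In a batch run, `is_valid_jp2(run_jpylyzer(path))` would be mapped over a list of image paths and the failures logged.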
Properties extraction (excerpt)
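Extracted properties come as nested XML, while downstream checks are often easier against a flat mapping. A minimal sketch, assuming only that the properties sit below an element named `properties`; the dotted‐path keys are an illustrative convention of this sketch, not something jpylyzer itself produces, and `flatten_properties` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def _local_name(tag: str) -> str:
    # Strip a '{namespace}' prefix, if any.
    return tag.rsplit("}", 1)[-1]


def flatten_properties(xml_text: str, section: str = "properties") -> Dict[str, List[str]]:
    """Map dotted element paths below <section> to the leaf texts found there.

    Repeated elements (e.g. several boxes of the same kind) yield several values.
    """
    root = ET.fromstring(xml_text)
    start = next((e for e in root.iter() if _local_name(e.tag) == section), None)
    if start is None:
        return {}
    flat: Dict[str, List[str]] = {}

    def walk(elem: ET.Element, path: str) -> None:
        children = list(elem)
        if not children:
            # Leaf element: record its text under the full dotted path.
            flat.setdefault(path, []).append((elem.text or "").strip())
            return
        for child in children:
            walk(child, path + "." + _local_name(child.tag))

    for child in start:
        walk(child, _local_name(child.tag))
    return flat
```

Keeping values as lists avoids silently losing data when a file contains more than one box of the same type.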
Properties of embedded ICC profile
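In a JP2 file an ICC profile is embedded in the Colour Specification box (`colr`) inside the JP2 Header box, when the box's METH field is 2 (restricted ICC profile); METH, PREC and APPROX take one byte each before the profile. The sketch below, assuming you already have the `colr` box content without its 8‐byte box header, reads a few fields of the 128‐byte ICC profile header. `read_icc_header` and its output keys are hypothetical; jpylyzer reports more fields than this.

```python
import struct
from typing import Dict, Union


def read_icc_header(colr_content: bytes) -> Dict[str, Union[int, str, bool]]:
    """Read selected ICC header fields from the content of a JP2 'colr' box."""
    if len(colr_content) < 3:
        raise ValueError("colr box content too short")
    meth = colr_content[0]
    if meth != 2:
        raise ValueError(f"METH is {meth}; no restricted ICC profile (needs METH 2)")
    icc = colr_content[3:]  # skip METH, PREC and APPROX
    if len(icc) < 128:
        raise ValueError("ICC profile shorter than its 128-byte header")
    (size,) = struct.unpack(">I", icc[0:4])
    return {
        "profileSize": size,
        # Byte 8: major version; byte 9: minor (high nibble) and bug-fix (low nibble).
        "version": f"{icc[8]}.{icc[9] >> 4}.{icc[9] & 0x0F}",
        "deviceClass": icc[12:16].decode("ascii"),
        "colourSpace": icc[16:20].decode("ascii"),
        "connectionSpace": icc[20:24].decode("ascii"),
        "signatureOK": icc[36:40] == b"acsp",
    }
```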
Documentation
Example 1: detection of broken JP2s in JISC 1 Newspapers
Number of images       2,152,116
Total size             45 TB
Average image size     21.8 MB
Number of threads      1
Time                   21 days*
Images/day/thread      100,000
TB/day/thread          2
*Includes unzipping; the actual time needed by jpylyzer alone is much less.
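The per‐thread rates in the table follow from the totals. This small check reproduces them (the table rounds to 100,000 images and 2 TB per day); like the table, it counts the unzipping time.

```python
# Totals from the Example 1 table (JISC 1 Newspapers, one thread).
n_images = 2_152_116
total_tb = 45
days = 21

images_per_day = n_images / days  # about 102,482 images/day/thread
tb_per_day = total_tb / days      # about 2.14 TB/day/thread
print(f"{images_per_day:,.0f} images/day, {tb_per_day:.2f} TB/day")
```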
Results
‐ 676 broken JP2s in JISC 1 collection (0.03 %); TIFF originals still available
‐ JISC 2 (> 1 million images): 3 broken JP2s
‐ 19th‐century books (> 22 million images): no broken JP2s
Example 2: quality control Metamorfoze migration
TIFF → JP2: 146 TB, migrate by end of 2012
Workflow: TIFF → Aware JP2K SDK → JP2
Each JP2 passes only if all checks answer yes; any no means fail:
‐ Valid JP2? (jpylyzer*)
‐ Properties match? (image properties from jpylyzer* compared against a properties profile)
‐ Pixels identical? (pixel compare of TIFF and JP2)
*Imported as module in Python‐based workflow
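The pass/fail logic of the workflow can be summarised in one function. This is a sketch, not the KB workflow code: the validity verdict, the extracted properties and the decoded pixel data are assumed to come from elsewhere (e.g. jpylyzer imported as a module and an image decoder), the order of the checks is an assumption, and `qc_migrated_image` and the flat profile format are hypothetical.

```python
from typing import Any, Dict, Sequence, Tuple


def properties_match(props: Dict[str, Any], profile: Dict[str, Any]) -> bool:
    """True if every property named in the profile has exactly the expected value."""
    return all(props.get(key) == expected for key, expected in profile.items())


def qc_migrated_image(valid_jp2: bool,
                      props: Dict[str, Any],
                      profile: Dict[str, Any],
                      tiff_pixels: Sequence[int],
                      jp2_pixels: Sequence[int]) -> Tuple[str, str]:
    """Return ('pass', '') or ('fail', reason), stopping at the first failed check."""
    if not valid_jp2:
        return "fail", "not a valid JP2"
    if not properties_match(props, profile):
        return "fail", "properties do not match profile"
    # 'Pixels identical?' implies a lossless conversion: every sample must agree.
    if list(tiff_pixels) != list(jp2_pixels):
        return "fail", "pixels differ"
    return "pass", ""
```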
Example 3: pre‐ingest quality control at the Wellcome Library
‐ JP2s produced in‐house and by external suppliers
‐ Use jpylyzer to validate against the JP2 specification
‐ Use extracted properties to validate against a profile (progression order, compression ratio, layers, …)
‐ Profile coded as XML schema (so jpylyzer output can be validated against the schema)
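The Wellcome approach expresses the profile as an XML schema. Python's standard library has no XML Schema validator, so the sketch below swaps in a plain‐Python stand‐in for the same idea: a profile of allowed values or inclusive numeric ranges, checked against extracted properties. The property names (`order`, `layers`, `compressionRatio`) and limits in `EXAMPLE_PROFILE` are hypothetical, not the Wellcome profile.

```python
from typing import Any, Dict, List, Set, Tuple, Union

# A rule is either a set of allowed values or an inclusive (min, max) range.
Rule = Union[Set[Any], Tuple[float, float]]

# Hypothetical example profile, not taken from the Wellcome Library.
EXAMPLE_PROFILE: Dict[str, Rule] = {
    "order": {"RPCL"},
    "layers": {8},
    "compressionRatio": (5.0, 15.0),
}


def check_profile(props: Dict[str, Any], profile: Dict[str, Rule]) -> List[str]:
    """Return human-readable violations; an empty list means the image complies."""
    problems = []
    for name, rule in profile.items():
        if name not in props:
            problems.append(f"{name}: missing")
            continue
        value = props[name]
        if isinstance(rule, tuple):
            low, high = rule
            try:
                number = float(value)
            except (TypeError, ValueError):
                problems.append(f"{name}: {value!r} is not numeric")
                continue
            if not low <= number <= high:
                problems.append(f"{name}: {value} outside [{low}, {high}]")
        elif value not in rule:
            problems.append(f"{name}: {value!r} not in {sorted(rule, key=str)}")
    return problems
```

Returning a list of violations, rather than a single boolean, lets a supplier be told exactly which properties are off.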
Platforms and licensing
http://www.openplanetsfoundation.org/software/jpylyzer
Community involvement
Acknowledgements
Debian packages
‐ Dave Tarrant (Univ. Southampton/OPF)
‐ Miguel Ferreira, Rui Castro, Hélder Silva (KEEP Solutions)
‐ Rainer Schmidt (AIT)
Feedback on early versions
‐ Christy Henshaw (Wellcome Library)
‐ Ross Spencer (TNA)
‐ Wouter Kool (KB)
#SCAPEProject
http://www.scape‐project.eu
Funding
This work was partially supported by the SCAPE Project. The SCAPE project is co‐funded by the European Union under FP7 ICT‐2009.4.1 (Grant Agreement number 270137).