Jpylyzer, a validation and feature extraction tool developed in the SCAPE Project
DESCRIPTION
Jpylyzer is a tool for validation and feature extraction for the JP2 (JPEG 2000 Part 1) still image format. The tool is being developed in the SCAPE Project and was presented by Johan van der Knijff at Archiving 2012 in Copenhagen.
TRANSCRIPT
SCAPE
Johan van der Knijff (1, 2), René van der Ark (1), Carl Wilson (3)
(1) Koninklijke Bibliotheek – National Library of the Netherlands
(2) Open Planets Foundation
(3) The British Library
IS&T, Archiving 2012, Copenhagen, 15.6.2012
Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool
Metamorfoze
National Programme for preservation of paper heritage
Digitisation as a means to conserve threatened paper originals
TIFF to JP2: 146 TB, migrate by end 2012
JP2 from JISC 1 Newspaper Collection (BL)
“Well-formed and valid”
Hardware failure may result in corrupted images
Source: http://img70.imageshack.us/img70/9950/serversnm2.jpg
Not all encoders produce standard-compliant images
Possible solutions
- Option 1: Improve the JPEG 2000 module of JHOVE. But no institutional support, superseded by JHOVE2 (?)
- Option 2: Develop a JPEG 2000 module for JHOVE2. Not ready for operational use (yet)
- Option 3: Develop a dedicated tool
Jpylyzer tool
- First prototype: December 2011
- Refactoring of original code: Jan 2012
- Packaging (Debian): Mar 2012 (Univ. Southampton, KEEP Solutions, AIT Vienna)
- Add remaining functionality, bugfixes: Apr-May 2012 (current version: 1.5)
JP2 file: top-level boxes
- JPEG 2000 Signature box
- File Type box
- JP2 Header box (superbox)
- Contiguous Codestream box 0
- Contiguous Codestream box n
- IPR box
- XML box(es)
- UUID box(es)
- UUID Info box(es) (superbox)
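The box structure listed above can be walked with a few lines of Python. This is a minimal sketch, not how jpylyzer itself is implemented: it relies only on the general JP2 box layout (a 4-byte big-endian length, a 4-byte type code, an optional 8-byte extended length when the length field is 1, and a length of 0 meaning "runs to the end of the file"). The function name `list_top_level_boxes` and the sample bytes are made up for illustration.

```python
import struct


def list_top_level_boxes(data: bytes):
    """Return (box_type, offset, length) for each top-level box in JP2 bytes."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        # LBox (4-byte big-endian length) followed by TBox (4-byte type code)
        length, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if length == 1:
            # Extended length: an 8-byte XLBox field follows the type field
            if offset + 16 > len(data):
                break
            (length,) = struct.unpack(">Q", data[offset + 8:offset + 16])
        elif length == 0:
            # A length of 0 means the box runs to the end of the file
            length = len(data) - offset
        if length < 8:
            break  # malformed length; stop rather than loop forever
        boxes.append((box_type.decode("latin-1"), offset, length))
        offset += length
    return boxes


# Illustrative bytes only: signature box, file type box, and the start of a
# codestream box (length 0, so it extends to the end of the data).
sample = (
    struct.pack(">I4s4s", 12, b"jP  ", b"\r\n\x87\n")
    + struct.pack(">I4s4sI4s", 20, b"ftyp", b"jp2 ", 0, b"jp2 ")
    + struct.pack(">I4s", 0, b"jp2c")
    + b"\xff\x4f"
)

for box_type, offset, length in list_top_level_boxes(sample):
    print(repr(box_type), offset, length)
```

A validator would go further than this listing: it would check that the required boxes are present, that they appear in a permitted order, and that their contents are consistent.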
Command-line use
Result
Properties extraction (excerpt)
Properties embedded ICC profile
Documentation
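The screenshots for the command-line and result slides are not part of this transcript. As a rough illustration of how the tool's XML report might be consumed, the sketch below reads a validity flag from a report. The sample report is a hypothetical, heavily shortened stand-in, and the element names (`isValidJP2`, `fileInfo`, `properties`) are assumptions to be checked against the jpylyzer documentation for the version in use.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened stand-in for a jpylyzer report.
# Element names are assumptions, not taken from the slides.
SAMPLE_OUTPUT = """<jpylyzer>
  <fileInfo><fileName>example.jp2</fileName></fileInfo>
  <isValidJP2>True</isValidJP2>
  <properties><compressionRatio>20.5</compressionRatio></properties>
</jpylyzer>"""


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{uri}name' down to 'name'."""
    return tag.rsplit("}", 1)[-1]


def is_valid_jp2(xml_text: str) -> bool:
    """Return True only if the report contains a validity flag set to 'True'."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) == "isValidJP2":
            return (elem.text or "").strip() == "True"
    return False  # no flag found: treat as not valid


print(is_valid_jp2(SAMPLE_OUTPUT))
```

Treating a missing flag as "not valid" is a deliberately conservative choice for a quality-control setting.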
Example 1: detection of broken JP2s in JISC 1 Newspapers
Number of images: 2,152,116
Total size: 45 TB
Average image size: 21.8 MB
Number of threads: 1
Time: 21 days*
Images/day/thread: 100,000
TB/day/thread: 2
*Includes unzipping; the actual time needed by jpylyzer is much less!
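The rounded per-day figures follow directly from the totals on this slide. A short check, using only the numbers reported above (the variable names are illustrative):

```python
# Figures from the slide
images = 2_152_116
total_tb = 45
days = 21
threads = 1

# Throughput per day and per thread
images_per_day_per_thread = images / days / threads
tb_per_day_per_thread = total_tb / days / threads

print(f"{images_per_day_per_thread:,.0f} images/day/thread")
print(f"{tb_per_day_per_thread:.2f} TB/day/thread")
```

This gives roughly 102,000 images and 2.1 TB per day on a single thread, which the slide rounds to 100,000 and 2. Since the 21 days include unzipping, these are lower bounds on jpylyzer's own throughput.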
Results
- 676 broken JP2s in JISC 1 collection (0.03 %); TIFF originals still available
- JISC 2 (> 1 million images): 3 broken JP2s
- 19th Century books (> 22 million images): no broken JP2s
Example 2: quality control Metamorfoze migration
TIFF to JP2: 146 TB, migrate by end 2012
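Workflow (diagram)
- TIFF is converted to JP2 with the Aware JP2K SDK
- Jpylyzer* checks the JP2: valid JP2?
- Image properties are compared with a properties profile: properties match?
- Pixel compare of TIFF and JP2: pixels identical?
- Pass if every answer is yes; fail if any answer is no
*Imported as module in Python-based workflow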
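The pass/fail decision in this workflow can be sketched as below. This is an illustration of the logic only, not the KB's actual workflow code. The `CheckResult` class and `verdict` function are hypothetical names, and the three flags are assumed to come from jpylyzer, the profile comparison, and the pixel comparison respectively.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of the three checks for one migrated image (hypothetical type)."""
    valid_jp2: bool          # assumed to come from jpylyzer's validation
    properties_match: bool   # extracted properties agree with the profile
    pixels_identical: bool   # pixel compare of source TIFF and JP2


def verdict(result: CheckResult) -> str:
    """An image passes only if all three checks succeed."""
    checks = (result.valid_jp2, result.properties_match, result.pixels_identical)
    return "pass" if all(checks) else "fail"


print(verdict(CheckResult(True, True, True)))
print(verdict(CheckResult(True, False, True)))
```

Keeping the three outcomes as separate fields makes it easy to report which check failed, rather than only the overall verdict.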
Example 3: pre-ingest quality control Wellcome Library
- JP2s produced in-house and by external suppliers
- Use jpylyzer to validate against the JP2 specification
- Use extracted properties to validate against a profile (progression order, compression ratio, layers, ...)
- Profile coded as XML schema, so that jpylyzer output can be validated against the schema
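To illustrate the idea of checking extracted properties against a profile, the sketch below compares a few property values with expected ones. It deliberately uses a plain Python dictionary instead of an XML schema, so it is a stand-in for the approach described above, not a reproduction of it. The sample report, the element names (`order`, `layers`, `transformation`), and the profile values are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile: property name -> required value (as text)
PROFILE = {"order": "RPCL", "layers": "8", "transformation": "9-7 irreversible"}

# Hypothetical, heavily shortened stand-in for a jpylyzer report
SAMPLE = """<jpylyzer><properties><contiguousCodestreamBox><cod>
<order>RPCL</order><layers>8</layers>
<transformation>9-7 irreversible</transformation>
</cod></contiguousCodestreamBox></properties></jpylyzer>"""


def profile_violations(xml_text: str, profile: dict) -> dict:
    """Return {name: (expected, found)} for every property that deviates."""
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():
        name = elem.tag.rsplit("}", 1)[-1]  # ignore any XML namespace
        if name in profile and name not in found:
            found[name] = (elem.text or "").strip()
    return {
        name: (expected, found.get(name))
        for name, expected in profile.items()
        if found.get(name) != expected
    }


print(profile_violations(SAMPLE, PROFILE))
```

An empty result means the image conforms to the profile; a property missing from the report is listed with `None` as the found value.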
Platforms and licensing stuff
http://www.openplanetsfoundation.org/software/jpylyzer
Community involvement
Acknowledgements
Debian packages
- Dave Tarrant (Uni Southampton/OPF)
- Miguel Ferreira, Rui Castro, Hélder Silva (KEEP Solutions)
- Rainer Schmidt (AIT)
Feedback on early versions
- Christy Henshaw (Wellcome Library)
- Ross Spencer (TNA)
- Wouter Kool (KB)
#SCAPEProject
http://www.scape-project.eu
Funding
This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).