![Page 1: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/1.jpg)
SCAPE
Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool
Johan van der Knijff (1,2), René van der Ark (1), Carl Wilson (3)
(1) Koninklijke Bibliotheek – National Library of the Netherlands
(2) Open Planets Foundation
(3) The British Library
IS&T, Archiving 2012, Copenhagen, 15.6.2012
![Page 2: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/2.jpg)
Metamorfoze
National Programme for preservation of paper heritage
Digitisation as a means to conserve threatened paper originals
TIFF to JP2
146 TB
Migrate by end 2012
![Page 3: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/3.jpg)
JP2 from JISC 1 Newspaper Collection (BL)
![Page 4: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/4.jpg)
“Well‐formed and valid”
JP2 from JISC 1 Newspaper Collection (BL)
![Page 5: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/5.jpg)
Hardware failure may result in corrupted images
Source: http://img70.imageshack.us/img70/9950/serversnm2.jpg
![Page 6: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/6.jpg)
Not all encoders produce standard-compliant images
![Page 7: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/7.jpg)
Possible solutions
Option 1
Improve the JPEG 2000 module of JHOVE
But no institutional support, superseded by JHOVE2 (?)
Option 2
Develop a JPEG 2000 module for JHOVE2
But JHOVE2 is not ready for operational use (yet)
Option 3
Develop a dedicated tool
![Page 8: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/8.jpg)
Jpylyzer tool
![Page 9: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/9.jpg)
Jpylyzer tool
‐ First prototype: December 2011
‐ Refactoring of original code: Jan 2012
‐ Packaging (Debian): Mar 2012 (Univ. Southampton, KEEP Solutions, AIT Vienna)
‐ Add remaining functionality, bugfixes: Apr‐May 2012 (current version: 1.5)
![Page 10: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/10.jpg)
JP2 file
JPEG 2000 Signature box
File Type box
JP2 Header box (superbox)
Contiguous Codestream box 0
Contiguous Codestream box n
IPR box
XML box(es)
UUID box(es)
UUID Info box(es) (superbox)
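The box layout above can be illustrated with a minimal sketch that walks the top-level boxes of a JP2 byte string. It relies only on the box header layout from the JP2 file format (a 4-byte big-endian length, a 4-byte type, and an optional 8-byte extended length). The function name `list_top_level_boxes` and the sample bytes are assumptions made for illustration; this is not jpylyzer's own code, and the sample is not a complete, valid JP2 (it has no JP2 Header box).

```python
import struct


def list_top_level_boxes(data: bytes):
    """Return (type, offset, length) for each top-level box in a JP2 byte string."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        # Box header: 4-byte big-endian length (LBox), then 4-byte type (TBox)
        lbox, tbox = struct.unpack(">I4s", data[offset:offset + 8])
        header_size = 8
        if lbox == 1:
            # Extended length: an 8-byte XLBox field follows the type
            if offset + 16 > len(data):
                raise ValueError("truncated extended box header")
            (length,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header_size = 16
        elif lbox == 0:
            # Length 0 means the box runs to the end of the file
            length = len(data) - offset
        else:
            length = lbox
        if length < header_size or offset + length > len(data):
            raise ValueError(f"box at offset {offset} has invalid length {length}")
        boxes.append((tbox.decode("latin-1"), offset, length))
        offset += length
    return boxes


# Synthetic sample (hypothetical, for illustration only)
signature = struct.pack(">I4s", 12, b"jP  ") + b"\r\n\x87\n"
file_type = struct.pack(">I4s", 20, b"ftyp") + b"jp2 " + struct.pack(">I", 0) + b"jp2 "
codestream = struct.pack(">I4s", 0, b"jp2c") + b"\xff\x4f\xff\xd9"
sample = signature + file_type + codestream

for box_type, box_offset, box_length in list_top_level_boxes(sample):
    print(repr(box_type), box_offset, box_length)
```

A validator would go further than this: it would check the order and presence of required boxes and then parse each box's contents, which this sketch does not attempt.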
![Page 11: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/11.jpg)
Command‐line use
![Page 12: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/12.jpg)
Result
![Page 13: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/13.jpg)
Properties extraction (excerpt)
![Page 14: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/14.jpg)
Properties embedded ICC profile
![Page 15: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/15.jpg)
Documentation
![Page 16: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/16.jpg)
Example 1: detection of broken JP2s in JISC 1 Newspapers

| Metric | Value |
| --- | --- |
| Number of images | 2,152,116 |
| Total size | 45 TB |
| Average image size | 21.8 MB |
| Number of threads | 1 |
| Time | 21 days* |
| Images/day/thread | 100,000 |
| TB/day/thread | 2 |

*Includes unzipping; actual time needed by jpylyzer much less!
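The per-thread throughput figures follow directly from the image count, total size and elapsed time quoted on this slide. A short sketch of that arithmetic (the variable names are illustrative assumptions):

```python
# Figures as quoted on the slide
images = 2_152_116
total_tb = 45
days = 21
threads = 1

images_per_day_per_thread = images / days / threads
tb_per_day_per_thread = total_tb / days / threads

print(round(images_per_day_per_thread))   # 102482, quoted as roughly 100,000
print(round(tb_per_day_per_thread, 2))    # 2.14, quoted as roughly 2
```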
![Page 17: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/17.jpg)
Results
‐ 676 broken JP2s in JISC 1 collection (0.03 %); TIFF originals still available
‐ JISC 2 (> 1 million images): 3 broken JP2s
‐ 19th Century books (> 22 million images): no broken JP2s
![Page 18: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/18.jpg)
Example 2: quality control Metamorfoze migration
TIFF to JP2
146 TB
Migrate by end 2012
![Page 19: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/19.jpg)
Workflow diagram: TIFF, converted with Aware JP2K SDK, to JP2, followed by three checks:
‐ Jpylyzer*: valid JP2?
‐ Compare image properties (properties profile): properties match?
‐ Pixel compare: pixels identical?
Yes to all three: pass. No to any: fail.
*Imported as module in Python‐based workflow
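The pass/fail logic of this workflow can be sketched as follows. This is a hedged illustration, not the code used in the migration: the three checks are passed in as plain callables, and every name here (`QcResult`, `quality_control`, the stub checks and file names) is hypothetical. In a real workflow the validity check would wrap a call into jpylyzer, and the other two would wrap a properties comparison and a pixel comparison.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QcResult:
    valid_jp2: bool
    properties_match: bool
    pixels_identical: bool

    @property
    def passed(self) -> bool:
        # An image passes only if all three checks answer "yes"
        return self.valid_jp2 and self.properties_match and self.pixels_identical


def quality_control(
    tiff_path: str,
    jp2_path: str,
    is_valid_jp2: Callable[[str], bool],
    properties_match: Callable[[str], bool],
    pixels_identical: Callable[[str, str], bool],
) -> QcResult:
    """Run the three checks on one migrated image and collect the answers."""
    return QcResult(
        valid_jp2=is_valid_jp2(jp2_path),
        properties_match=properties_match(jp2_path),
        pixels_identical=pixels_identical(tiff_path, jp2_path),
    )


# Stub checks standing in for the real tools (assumptions for illustration)
good = quality_control(
    "page1.tif", "page1.jp2",
    is_valid_jp2=lambda jp2: True,
    properties_match=lambda jp2: True,
    pixels_identical=lambda tiff, jp2: True,
)
bad = quality_control(
    "page2.tif", "page2.jp2",
    is_valid_jp2=lambda jp2: True,
    properties_match=lambda jp2: False,
    pixels_identical=lambda tiff, jp2: True,
)
print("pass" if good.passed else "fail")
print("pass" if bad.passed else "fail")
```

Keeping the individual answers in the result, rather than a single boolean, makes it possible to report which check caused a failure.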
![Page 20: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/20.jpg)
Example 3: pre‐ingest quality control Wellcome Library
‐ JP2s produced in‐house and by external suppliers
‐ Use jpylyzer to validate against JP2 spec
‐ Use extracted properties to validate against a profile (progression order, ratio, layers, …)
‐ Profile coded as XML schema (so jpylyzer output can be validated against schema)
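The idea of checking extracted properties against a profile can be sketched with the Python standard library. Note the swap: the slide describes a profile coded as an XML schema, whereas this sketch uses a plain dictionary of expected values to stay dependency-free. The sample XML, its element names, the profile values and the function name `check_against_profile` are all assumptions made for illustration; real jpylyzer output may use different element names or XML namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily abbreviated validator output (not real jpylyzer output)
SAMPLE_OUTPUT = """
<jpylyzer>
  <isValidJP2>True</isValidJP2>
  <properties>
    <contiguousCodestreamBox>
      <cod>
        <order>RPCL</order>
        <layers>8</layers>
      </cod>
    </contiguousCodestreamBox>
  </properties>
</jpylyzer>
"""

# Hypothetical profile: element path mapped to the required text value
PROFILE = {
    "isValidJP2": "True",
    "properties/contiguousCodestreamBox/cod/order": "RPCL",
    "properties/contiguousCodestreamBox/cod/layers": "8",
}


def check_against_profile(xml_text: str, profile: dict) -> list:
    """Return a list of (path, expected, actual) for every mismatch."""
    root = ET.fromstring(xml_text.strip())
    failures = []
    for path, expected in profile.items():
        actual = root.findtext(path)  # None if the element is missing
        if actual is None or actual.strip() != expected:
            failures.append((path, expected, actual))
    return failures


failures = check_against_profile(SAMPLE_OUTPUT, PROFILE)
print("conforms to profile" if not failures else failures)
```

An XML schema, as used on the slide, can express the same constraints declaratively and lets any schema-aware validator perform the check.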
![Page 21: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/21.jpg)
Platforms and licensing stuff
![Page 22: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/22.jpg)
http://www.openplanetsfoundation.org/software/jpylyzer
![Page 23: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/23.jpg)
Community involvement
![Page 24: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/24.jpg)
Acknowledgements
Debian packages
‐ Dave Tarrant (Uni Southampton/OPF)
‐ Miguel Ferreira, Rui Castro, Hélder Silva (KEEP Solutions)
‐ Rainer Schmidt (AIT)
Feedback on early versions
‐ Christy Henshaw (Wellcome Library)
‐ Ross Spencer (TNA)
‐ Wouter Kool (KB)
![Page 25: Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool](https://reader034.vdocument.in/reader034/viewer/2022052619/5550b792b4c905ff618b4c2b/html5/thumbnails/25.jpg)
SCAPE
#SCAPEProject
http://www.scape-project.eu
Funding
This work was partially supported by the SCAPE Project. The SCAPE project is co‐funded by the European Union under FP7 ICT‐2009.4.1 (Grant Agreement number 270137).