Transcript
Page 1: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

SCAPE

Johan van der Knijff (1, 2), René van der Ark (1), Carl Wilson (3)

(1) Koninklijke Bibliotheek – National Library of the Netherlands

(2) Open Planets Foundation

(3) The British Library

IS&T Archiving 2012, Copenhagen, 15.6.2012

Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool

Page 2: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Metamorfoze

National Programme for preservation of paper heritage

Digitisation as a means to conserve threatened paper originals

TIFF to JP2

146 TB

Migrate by end 2012

Page 3: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

JP2 from JISC 1 Newspaper Collection (BL)

Page 4: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

“Well-formed and valid”

JP2 from JISC 1 Newspaper Collection (BL)

Page 5: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Hardware failure may result in corrupted images

Source: http://img70.imageshack.us/img70/9950/serversnm2.jpg

Page 6: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Not all encoders produce standard-compliant images

Page 7: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Possible solutions

Option 1: Improve the JPEG 2000 module of JHOVE

- But no institutional support, superseded by JHOVE2 (?)

Option 2: Develop a JPEG 2000 module for JHOVE2

- Not ready for operational use (yet)

Option 3: Develop a dedicated tool

Page 8: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Jpylyzer tool

Page 9: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Jpylyzer tool

- First prototype: December 2011

- Refactoring of original code: Jan 2012

- Packaging (Debian): Mar 2012 (Univ. Southampton, KEEP Solutions, AIT Vienna)

- Add remaining functionality, bugfixes: Apr-May 2012 (current version: 1.5)

Page 10: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

JP2 file

JPEG 2000 Signature box

File Type box

JP2 Header box (superbox)

Contiguous Codestream box 0

Contiguous Codestream box n

IPR box

XML box(es)

UUID box(es)

UUID Info box(es) (superbox)
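The box layout on this slide can be illustrated with a minimal Python sketch that lists the top-level boxes of a JP2 file. It relies only on the box header layout defined in JPEG 2000 Part 1: a 4-byte big-endian length, a 4-byte type, an optional 8-byte extended length when the length field is 1, and a length of 0 meaning "runs to end of file". The function name `list_top_level_boxes` is hypothetical, and this is not how jpylyzer itself is implemented; it performs no validation at all.

```python
import struct
from io import BytesIO


def list_top_level_boxes(stream):
    """Return (box_type, payload_length) for each top-level box in a JP2 stream.

    Illustrative sketch only: it walks box headers and skips payloads.
    It does not detect truncated payloads and does not descend into superboxes.
    """
    boxes = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of stream (or trailing bytes too short for a header)
        length, box_type = struct.unpack(">I4s", header)
        header_size = 8
        if length == 1:
            # Extended length: the real box length follows as an 8-byte integer.
            extended = stream.read(8)
            if len(extended) < 8:
                break
            (length,) = struct.unpack(">Q", extended)
            header_size = 16
        if length == 0:
            # A length of 0 means the box extends to the end of the file.
            payload = stream.read()
            boxes.append((box_type.decode("latin-1"), len(payload)))
            break
        if length < header_size:
            raise ValueError("box length smaller than its own header")
        payload_length = length - header_size
        stream.seek(payload_length, 1)  # skip the payload
        boxes.append((box_type.decode("latin-1"), payload_length))
    return boxes


# Synthetic byte string for demonstration; it is NOT a valid JP2 file
# (for example, it has no JP2 Header box).
sample = (
    struct.pack(">I4s", 12, b"jP  ") + b"\r\n\x87\n"
    + struct.pack(">I4s", 20, b"ftyp") + b"jp2 " + b"\x00" * 4 + b"jp2 "
    + struct.pack(">I4s", 0, b"jp2c") + b"\xff\x4f\xff\xd9"
)
print(list_top_level_boxes(BytesIO(sample)))
```

With a real file, the stream would come from `open(path, "rb")`; a validator additionally has to check box order, box contents, and the codestream itself.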

Page 11: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Command-line use

Page 12: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Result

Page 13: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Properties extraction (excerpt)

Page 14: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Properties embedded ICC profile

Page 15: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Documentation

Page 16: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Example 1: detection of broken JP2s in JISC 1 Newspapers

Number of images     2,152,116
Total size           45 TB
Average image size   21.8 MB
Number of threads    1
Time                 21 days*
Images/day/thread    100,000
TB/day/thread        2

*Includes unzipping; the actual time needed by jpylyzer is much less!
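The two per-thread throughput figures follow directly from the image count, total size, and elapsed time on this slide. The short Python sketch below reproduces the arithmetic; the function name `throughput` is hypothetical, and the slide evidently rounds the results (to 100,000 images and 2 TB per day).

```python
def throughput(images, total_tb, days, threads=1):
    """Return (images per day per thread, TB per day per thread)."""
    images_per_day = images / days / threads
    tb_per_day = total_tb / days / threads
    return images_per_day, tb_per_day


# Figures taken from the slide: 2,152,116 images, 45 TB, 21 days, 1 thread.
images_per_day, tb_per_day = throughput(2_152_116, 45, 21, threads=1)
print(round(images_per_day))   # about 102,000 images/day/thread
print(round(tb_per_day, 2))    # about 2.1 TB/day/thread
```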

Page 17: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Results

- 676 broken JP2s in JISC 1 collection (0.03 %); TIFF originals still available

- JISC 2 (> 1 million images): 3 broken JP2s

- 19th Century books (> 22 million images): no broken JP2s

Page 18: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Example 2: quality control Metamorfoze migration

TIFF to JP2

146 TB

Migrate by end 2012

Page 19: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

[Flowchart] TIFF is converted to JP2 with the Aware JP2K SDK; the JP2 then goes through three checks:

- valid JP2? (Jpylyzer*)

- properties match? (compare image properties against properties profile)

- pixels identical? (pixel compare)

yes to all checks: pass; no to any check: fail

*Imported as module in Python-based workflow
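The pass/fail logic of this workflow can be sketched in Python as a chain of three checks that stops at the first failure. Everything named here is an assumption for illustration: `quality_check` and the three callables passed into it are hypothetical stand-ins (for jpylyzer validation, the properties-profile comparison, and the pixel comparison), and the order in which the checks run is a guess, since the flattened flowchart does not preserve it.

```python
def quality_check(tiff_path, jp2_path, is_valid_jp2, properties_match, pixels_identical):
    """Return ("pass", None) or ("fail", name_of_first_failed_check).

    The three check arguments are caller-supplied functions (hypothetical here):
      is_valid_jp2(jp2_path) -> bool
      properties_match(jp2_path) -> bool
      pixels_identical(tiff_path, jp2_path) -> bool
    """
    checks = [
        ("valid JP2?", lambda: is_valid_jp2(jp2_path)),
        ("properties match?", lambda: properties_match(jp2_path)),
        ("pixels identical?", lambda: pixels_identical(tiff_path, jp2_path)),
    ]
    for label, check in checks:
        if not check():
            return "fail", label  # any "no" ends the workflow with a fail
    return "pass", None


# Usage with stub checks standing in for the real tools:
result = quality_check(
    "page.tif", "page.jp2",
    is_valid_jp2=lambda jp2: True,
    properties_match=lambda jp2: True,
    pixels_identical=lambda tiff, jp2: False,
)
print(result)
```

Running the cheaper checks first means the expensive pixel comparison is only performed on files that already passed validation and the profile check, which is one plausible reason to order them this way.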

Page 20: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Example 3: pre-ingest quality control Wellcome Library

- JP2s produced in-house and by external suppliers

- Use jpylyzer to validate against JP2 spec

- Use extracted properties to validate against a profile (progression order, ratio, layers, ...)

- Profile coded as XML schema (so jpylyzer output can be validated against the schema)
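The idea of checking extracted properties against a profile can be sketched with Python's standard library. Note the substitution: the slide describes the profile as an XML schema, whereas this sketch uses a plain dictionary of expected values, because the standard library has no schema validator. The XML snippet is a hand-written, simplified stand-in for jpylyzer output; its element names and nesting, the function name `check_profile`, and the profile values are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for tool output (element names are assumptions).
SAMPLE_OUTPUT = """
<jpylyzer>
  <properties>
    <contiguousCodestreamBox>
      <cod>
        <order>RPCL</order>
        <layers>8</layers>
      </cod>
    </contiguousCodestreamBox>
  </properties>
</jpylyzer>
"""

# Hypothetical profile: element path -> required text value.
PROFILE = {
    "properties/contiguousCodestreamBox/cod/order": "RPCL",
    "properties/contiguousCodestreamBox/cod/layers": "8",
}


def check_profile(xml_text, profile):
    """Return a list of (path, expected, actual) for every mismatch."""
    root = ET.fromstring(xml_text)
    failures = []
    for path, expected in profile.items():
        element = root.find(path)
        if element is not None and element.text:
            actual = element.text.strip()
        else:
            actual = None  # element missing or empty
        if actual != expected:
            failures.append((path, expected, actual))
    return failures


print(check_profile(SAMPLE_OUTPUT, PROFILE))  # empty list: profile satisfied
```

An XML schema, as used in the workflow described on the slide, can express the same constraints declaratively and can be applied with any schema-aware validator instead of custom code.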

Page 21: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Platforms and licensing stuff

Page 22: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

http://www.openplanetsfoundation.org/software/jpylyzer

Page 23: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Community involvement

Page 24: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Acknowledgements

Debian packages

- Dave Tarrant (Uni Southampton/OPF)

- Miguel Ferreira, Rui Castro, Hélder Silva (KEEP Solutions)

- Rainer Schmidt (AIT)

Feedback on early versions

- Christy Henshaw (Wellcome Library)

- Ross Spencer (TNA)

- Wouter Kool (KB)

Page 25: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

#SCAPEProject

http://www.scape-project.eu

Funding

This work was partially supported by the SCAPE Project. The SCAPE Project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).

