Jpylyzer, a validation and feature extraction tool developed in the SCAPE project

Johan van der Knijff (1,2), René van der Ark (1), Carl Wilson (3). (1) Koninklijke Bibliotheek – National Library of the Netherlands; (2) Open Planets Foundation; (3) The British Library. IS&T Archiving 2012, Copenhagen, 15.6.2012. Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool

Upload: scape-project

Post on 05-Dec-2014


DESCRIPTION

Jpylyzer is a tool for validation and feature extraction for the JP2 (JPEG 2000 Part 1) still image format. The tool is being developed in the SCAPE Project and was presented by Johan van der Knijff at Archiving 2012 in Copenhagen.

TRANSCRIPT

Page 1: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Johan van der Knijff (1,2), René van der Ark (1), Carl Wilson (3)

1 Koninklijke Bibliotheek – National Library of the Netherlands

2 Open Planets Foundation

3 The British Library

IS&T Archiving 2012, Copenhagen, 15.6.2012

Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool

Page 2: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Metamorfoze

- National Programme for preservation of paper heritage

- Digitisation as a means to conserve threatened paper originals

TIFF → JP2: 146 TB

Migrate by end 2012

Page 3: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

JP2 from JISC 1 Newspaper Collection (BL)

Page 4: Jpylyzer, a validation and feature extraction tool developed in SCAPE project


“Well-formed and valid”

JP2 from JISC 1 Newspaper Collection (BL)

Page 5: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Hardware failure may result in corrupted images

Source: http://img70.imageshack.us/img70/9950/serversnm2.jpg

Page 6: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Not all encoders produce standard-compliant images

Page 7: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Possible solutions

Option 1: Improve JPEG 2000 module of JHOVE

- But no institutional support, superseded by JHOVE2 (?)

Option 2: Develop JPEG 2000 module for JHOVE2

- Not ready for operational use (yet)

Option 3: Develop dedicated tool

Page 8: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Jpylyzer tool

Page 9: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Jpylyzer tool

- First prototype: December 2011

- Refactoring of original code: Jan 2012

- Packaging (Debian): Mar 2012 (Univ. Southampton, KEEP Solutions, AIT Vienna)

- Add remaining functionality, bugfixes: Apr-May 2012 (current version: 1.5)

Page 10: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

JP2 file

- JPEG 2000 Signature box

- File Type box

- JP2 Header box (superbox)

- Contiguous Codestream box 0

- Contiguous Codestream box n

- IPR box

- XML box(es)

- UUID box(es)

- UUID Info box(es) (superbox)
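The box layout above can be walked with very little code. The following is a minimal sketch, not jpylyzer's own implementation: it assumes the standard JP2 box header (a 4-byte big-endian length, a 4-byte type, an optional 8-byte extended length when the length field is 1, and a length of 0 meaning "runs to end of file"). The names `BOX_NAMES` and `iter_boxes` are made up for this illustration.

```python
import struct

# Four-character type codes for the top-level boxes listed on the slide.
BOX_NAMES = {
    b"jP  ": "JPEG 2000 Signature box",
    b"ftyp": "File Type box",
    b"jp2h": "JP2 Header box (superbox)",
    b"jp2c": "Contiguous Codestream box",
    b"jp2i": "IPR box",
    b"xml ": "XML box",
    b"uuid": "UUID box",
    b"uinf": "UUID Info box (superbox)",
}


def iter_boxes(data):
    """Yield (box_type, payload) for each top-level box in a JP2 byte string."""
    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError("truncated box header at offset %d" % offset)
        length, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if length == 1:
            # Extended length: the real size follows the type as 8 bytes.
            if offset + 16 > len(data):
                raise ValueError("truncated extended length at offset %d" % offset)
            (length,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header = 16
        elif length == 0:
            # A length of 0 means the box extends to the end of the file.
            length = len(data) - offset
        if length < header or offset + length > len(data):
            raise ValueError("bad box length at offset %d" % offset)
        yield box_type, data[offset + header:offset + length]
        offset += length


if __name__ == "__main__":
    # Tiny synthetic file: signature box, file type box, stub codestream box.
    sample = (
        struct.pack(">I4s4s", 12, b"jP  ", b"\r\n\x87\n")
        + struct.pack(">I4s4sI4s", 20, b"ftyp", b"jp2 ", 0, b"jp2 ")
        + struct.pack(">I4s", 0, b"jp2c") + b"\xff\x4f\xff\xd9"
    )
    for box_type, payload in iter_boxes(sample):
        print(BOX_NAMES.get(box_type, "unknown box"), len(payload))
```

A real validator has to do far more than this (check box order, descend into superboxes, parse the codestream); the sketch only shows how the top-level structure is discovered.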

Page 11: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Command-line use

Page 12: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Result

Page 13: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Properties extraction (excerpt)

Page 14: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Properties embedded ICC profile

Page 15: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Documentation

Page 16: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Example 1: detection of broken JP2s in JISC 1 Newspapers

- Number of images: 2,152,116

- Total size: 45 TB

- Average image size: 21.8 MB

- Number of threads: 1

- Time: 21 days*

- Images/day/thread: 100,000

- TB/day/thread: 2

*Includes unzipping, actual time needed by jpylyzer much less!
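The per-day figures on this slide follow from dividing the totals by the 21-day run on a single thread. A quick check of that arithmetic, using only the numbers given on the slide (the variable names are ours):

```python
# Totals as stated on the slide.
images = 2_152_116
total_tb = 45
days = 21
threads = 1

# Throughput per day per thread.
images_per_day = images / days / threads
tb_per_day = total_tb / days / threads

print(f"images/day/thread: {images_per_day:,.0f}")
print(f"TB/day/thread:     {tb_per_day:.2f}")
```

This gives roughly 102,000 images and 2.1 TB per day, consistent with the rounded 100,000 and 2 shown on the slide.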

Page 17: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Results

- 676 broken JP2s in JISC 1 collection (0.03 %); TIFF originals still available

- JISC 2 (> 1 million images): 3 broken JP2s

- 19th Century books (> 22 million images): no broken JP2s

Page 18: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Example 2: quality control Metamorfoze migration

TIFF → JP2: 146 TB

Migrate by end 2012

Page 19: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Workflow diagram:

TIFF → Aware JP2K SDK → JP2

- Jpylyzer*: valid JP2?

- Compare image properties against properties profile: properties match?

- Pixel compare: pixels identical?

All three yes: pass. Any no: fail.

*Imported as module in Python-based workflow
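The pass/fail logic of the diagram can be sketched as three gates that must all answer yes. This is an illustration only, not the KB workflow code: the class `MigrationChecks` and the function `verdict` are hypothetical names, and the three booleans stand in for the results of jpylyzer validation, the properties comparison, and the pixel comparison.

```python
from dataclasses import dataclass


@dataclass
class MigrationChecks:
    """Outcome of the three checks in the diagram (hypothetical container)."""
    valid_jp2: bool          # jpylyzer: valid JP2?
    properties_match: bool   # extracted properties match the profile?
    pixels_identical: bool   # pixel compare: pixels identical?


def verdict(checks):
    """Return ("pass", []) if every check succeeded, else ("fail", failed names)."""
    failed = [name for name, ok in vars(checks).items() if not ok]
    return ("pass" if not failed else "fail", failed)


if __name__ == "__main__":
    print(verdict(MigrationChecks(True, True, True)))
    print(verdict(MigrationChecks(True, False, True)))
```

Returning the names of the failed checks alongside the verdict makes it easier to see why a migrated image was rejected.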

Page 20: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Example 3: pre-ingest quality control Wellcome Library

- JP2s produced in-house and by external suppliers

- Use jpylyzer to validate against JP2 spec

- Use extracted properties to validate against a profile (progression order, ratio, layers, …)

- Profile coded as XML schema (so jpylyzer output can be validated against schema)
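The two-step check described above (valid JP2 first, then properties against a profile) can be sketched as follows. Note the swap: the slide describes the profile as an XML schema, whereas this sketch uses a plain dictionary comparison with the standard library, purely for illustration. The sample XML, its element names and paths, and the profile values are all assumptions made for this example and are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened stand-in for jpylyzer output.
# Element names and nesting here are assumptions for illustration.
SAMPLE_OUTPUT = """<jpylyzer>
  <isValidJP2>True</isValidJP2>
  <properties>
    <contiguousCodestreamBox>
      <cod>
        <order>RPCL</order>
        <layers>8</layers>
      </cod>
    </contiguousCodestreamBox>
  </properties>
</jpylyzer>"""

# Hypothetical profile: element path -> required text value.
PROFILE = {
    "properties/contiguousCodestreamBox/cod/order": "RPCL",
    "properties/contiguousCodestreamBox/cod/layers": "8",
}


def check_against_profile(xml_text, profile):
    """Return a list of problems; an empty list means the file passes."""
    root = ET.fromstring(xml_text)
    problems = []
    # Step 1: the validity verdict reported by the validator.
    if root.findtext("isValidJP2") != "True":
        problems.append("not a valid JP2")
    # Step 2: each profiled property must have the required value.
    for path, expected in profile.items():
        actual = root.findtext(path)
        if actual != expected:
            problems.append(f"{path}: expected {expected!r}, found {actual!r}")
    return problems


if __name__ == "__main__":
    print(check_against_profile(SAMPLE_OUTPUT, PROFILE))
```

An XML schema, as used in the workflow on the slide, expresses the same constraints declaratively and can be applied by any schema-aware validator.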

Page 21: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Platforms and licensing stuff

Page 22: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

http://www.openplanetsfoundation.org/software/jpylyzer

Page 23: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Community involvement

Page 24: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

Acknowledgements

Debian packages

- Dave Tarrant (Uni Southampton/OPF)

- Miguel Ferreira, Rui Castro, Hélder Silva (KEEP Solutions)

- Rainer Schmidt (AIT)

Feedback on early versions

- Christy Henshaw (Wellcome Library)

- Ross Spencer (TNA)

- Wouter Kool (KB)

Page 25: Jpylyzer, a validation and feature extraction tool developed in SCAPE project

#SCAPEProject

http://www.scape-project.eu

Funding

This work was partially supported by the SCAPE Project. The SCAPE Project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).