![Page 1: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/1.jpg)
Using JHOVE2 for Policy Assessment of Files
Richard Anderson
Code4LibCon Preconference
2/7/2011
http://code4lib.org/conference/2011/schedule#preconf
13:30-16:30 : Persimmon Room
![Page 2: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/2.jpg)
Agenda 13:30-16:30
• What is JHOVE2?
• Characterization of digital objects
• Validation vs Assessment
• Examples of JHOVE2 output
• Source Units, Modules, Reportable Properties
• Implementation of Assessment
• Configuration of Assessment Rules
![Page 3: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/3.jpg)
JHOVE2 is …
… a project to develop a next-generation open source framework and application for format-aware characterization
… a collaborative undertaking of the California Digital Library (CDL), Portico, and Stanford University
… funded by a two-year grant from the Library of Congress as part of its National Digital Information Infrastructure and Preservation Program (NDIIPP)
![Page 4: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/4.jpg)
“What? So what?”
Characterization is the automated determination of the intrinsic and extrinsic properties of a formatted object
– Identification: determining the presumptive format of a digital object based on suggestive extrinsic hints and intrinsic signatures
– Feature extraction: reporting the intrinsic properties of an object significant for classification, analysis, and planning
– Validation
– Assessment
![Page 5: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/5.jpg)
What's new in JHOVE2?
Processing of multi-file objects as well as embedded objects inside files
Recursive processing of container objects
Plug-in Format Modules
Buffered I/O
Internationalized output
Clean APIs and modern design patterns
Je ne sais quoi !
![Page 6: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/6.jpg)
API design idioms
Separation of concerns
– Annotation and Reflection
  confluence.ucop.edu/display/JHOVE2Info/Background+Papers
Inversion of Control (IOC) / Dependency Injection
– Martin Fowler: martinfowler.com/articles/injection.html
– Spring Framework: www.springsource.org/
![Page 7: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/7.jpg)
Project Home
Domain name
– http://jhove2.org/
Code Repository
– https://bitbucket.org/jhove2/main/wiki/Home
• Public Wiki/Documentation
• Browse/Clone Source Code
• Download Release Packages
• Changeset History
• Issue Tracking
Mailing lists
– [email protected]
– [email protected]
![Page 8: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/8.jpg)
JHOVE2 Documentation
Complete documentation
– User’s guide
– Architectural overview
– Module specifications
– Programmer’s guide
![Page 9: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/9.jpg)
![Page 10: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/10.jpg)
Characterization
![Page 11: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/11.jpg)
Validation vs. Assessment
Validation is the determination of the level of conformance to the normative requirements of a format’s authoritative specification
– To the extent that there is community consensus on these requirements, validation is an objective determination
– Hard coded in JHOVE2 Modules
Assessment is the determination of the level of acceptability for a specific purpose on the basis of locally-defined policy rules
– Since these rules are locally configurable, assessment is a subjective determination
– Scripted via config files
![Page 12: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/12.jpg)
Format Specifications
| Format | Specification |
|---|---|
| JPEG 2000 | JP2 (ISO/IEC 15444-1), JPX (ISO/IEC 15444-2) |
| PDF | PDF 1.0 – 1.7, ISO 32000-1, PDF/A-1 (ISO 19005-1), PDF/X-1 (ISO 15930-1), -1a (ISO 15930-4), -2 (ISO 15930-5), -3 (ISO 15930-6) |
| TIFF | TIFF 4 – 6, Class B, F, G, P, R, Y, TIFF/EP (ISO 12234-2), TIFF/IT (ISO 12639), GeoTIFF, Exif (JEITA CP-3451), DNG |
| UTF-8 | ASCII (ANSI X3.4) |
| WAVE | BWF (EBU N22-1997) |
![Page 13: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/13.jpg)
![Page 14: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/14.jpg)
Putting it another way …
Assessment is the evaluation of a source unit's reportable properties against a set of policy-based rules
![Page 15: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/15.jpg)
Assessment is the evaluation of a source unit's
– File (UTF-8)
– File with embedded ByteStream(s) (TIFF with ICC profile)
– Aggregate (Directory, ZIP)
– ClumpSource (ShapeFile)
reportable properties against a set of policy-based rules
![Page 16: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/16.jpg)
Assessment is the evaluation of a source unit's reportable properties
– Format Identification
– Features
– Validity
against a set of policy-based rules
![Page 17: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/17.jpg)
Assessment is the evaluation of a source unit's reportable properties against a set of policy-based rules
– Is the item acceptable?
– Is there a preservation risk?
– What level of preservation service?
– Should we flag object for future action?
![Page 18: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/18.jpg)
Practical Applications of Assessment
• Ingest workflows
• Migration workflows
• Digitization workflows
• Publishing workflows
![Page 19: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/19.jpg)
![Page 20: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/20.jpg)
Running JHOVE2
jhove2.sh -d Text -o outfile.txt myfile.xml
Display format choices are: Text (default), JSON, and XML.
File argument can be any of:
– Filename
– Directory name
– URL
– Set of space-delimited filepaths
http://bitbucket.org/jhove2/main/wiki/documents/JHOVE2-Users-Guide.pdf
![Page 21: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/21.jpg)
JHOVE2 Output options
• Input File
– xml-schemaLocation-cannot-resolve.xml
• Text
– text-output.txt
• XML
– xml-output.xml
• JSON
– json-output.txt
![Page 22: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/22.jpg)
JHOVE2 Output
FileSource:
  Path: E:\samples\xml\schema-sample.xml
  Size (byte): 9516
  LastModified: 2010-10-12T11:55:29-06:00
  SourceName: schema-sample.xml
  StartingOffset (byte): 0
  …
![Page 23: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/23.jpg)
Format Identification
PresumptiveFormats:
  PresumptiveFormat {FormatIdentification}:
    NativeIdentifier {I8R}:
      Namespace: PUID
      Value: fmt/101 (PRONOM identifier)
    JHOVE2Identifier {I8R}:
      Namespace: JHOVE2
      Value: http://jhove2.org/terms/format/xml
  ...
![Page 24: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/24.jpg)
PRONOM Format Registry
http://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=638
Name: Extensible Markup Language
Version: 1.0
Other names: XML (1.0)
Identifiers: PUID: fmt/101; Apple Uniform Type Identifier: public.xml; MIME: text/xml
Classification: Text (Mark-up)
Description: The Extensible Markup Language (XML) is a general purpose markup language for creating other, special purpose, markup languages, and is a simplified subset of SGML. …
![Page 25: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/25.jpg)
Agent used for Identification
Module {DROIDIdentifier}:
SignatureFile: …/DROID_SignatureFile_V20.xml
Version: 2.0.0
ReleaseDate: 2010-09-10
WrappedProduct:
Name: DROID
Version: 4.0.0
ReleaseDate: 2009-07-23
...
![Page 26: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/26.jpg)
DROID
http://sourceforge.net/projects/droid/
DROID (Digital Record Object Identification) is an automatic file format identification tool. It is the first in a planned series of tools developed by The National Archives under the umbrella of its PRONOM technical registry service
![Page 27: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/27.jpg)
XML Module
Module {XmlModule}:
  SaxParser:
    Parser: org.apache.xerces.parsers.SAXParser
  XmlDeclaration:
    Version: 1.0
    Encoding: UTF-8
    Standalone: no
  RootElement:
    Name: mets
    Namespace: http://www.loc.gov/METS/
![Page 28: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/28.jpg)
XML Module (namespaces)
NamespaceInformation:
  NamespaceCount: 2
  Namespaces:
    Namespace:
      URI: http://www.loc.gov/METS/
      Declarations:
        Prefix: [default]
      SchemaLocations:
        SchemaLocation:
          Location: http://www.loc.gov/standards/mets/version15/mets.xsd
    Namespace:
      URI: http://www.loc.gov/mix/v10
      Declarations:
        Prefix: mix
![Page 29: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/29.jpg)
XML Module (cont)
ValidationResults:
ParserWarnings {ValidationMessageList}:
ValidationMessageCount: 0
ParserErrors {ValidationMessageList}:
ValidationMessageCount: 0
FatalParserErrors {ValidationMessageList}:
ValidationMessageCount: 0
isWellFormed: true
isValid: true
![Page 30: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/30.jpg)
Format Modules from JHOVE2 Team
– ICC color profile
– JPEG 2000
– PDF
– SGML
– Shapefile
– TIFF
– UTF-8
– WAVE
– XML
– Zip
JHOVE2 can identify (by DROID) many more formats than it can validate (by modules)
![Page 31: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/31.jpg)
Other Module Development
3rd party development activities
– NetCDF and GRIB modules (Wegener Institute)
– Integration with DuraCloud (DuraSpace)
– ARC module (Bibliothèque nationale de France)
– WARC, JPEG, GIF modules (CDL, hopefully ;-)
Possible development efforts
– Additional format modules
– Configuration GUIs
– JHOVE2-as-a-service
– Integration with DAITSS, DSpace, Fedora, FITS, etc.
Suggestions, volunteers and funders welcome
![Page 32: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/32.jpg)
AssessmentModule
Module {AssessmentModule}:
  AssessmentResultSets:
    AssessmentResultSet:
      RuleSetName: XmlRuleSet
      RuleSetDescription: RuleSet for Xml Module
      ObjectFilter: org.jhove2.module.format.xml.XmlModule
      BooleanResult: true
      AssessmentResults:
        AssessmentResult:
          RuleName: XmlValidityRule
          RuleDescription: Is the XML file acceptable?
          BooleanResult: true
          NarrativeResult: Acceptable
![Page 33: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/33.jpg)
![Page 34: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/34.jpg)
JHOVE2 Abstractions
• Source Unit
• Module
• Reportable
• Reportable Property
• Message
![Page 35: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/35.jpg)
Source Unit
A formatted object about which characterization information can be meaningfully reported
– Unitary
  • File, e.g. UTF-8 text file
  • File inside of a container, e.g. TIFF inside a Zip
  • Byte stream inside a file, e.g. ICC inside a TIFF
– Aggregate
  • Directory
  • Directory inside of a container
  • Clump, e.g. Shapefile
  • File set, e.g. command line arguments
For purposes of characterization, directories, file sets, and clumps are considered format types
![Page 36: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/36.jpg)
Source Interface (Java)
public Set<FormatIdentification> getPresumptiveFormats() {
    return presumptiveFormatIdentifications;
}
public List<Module> getModules() {
    return this.modules;
}
public List<Source> getChildSources() {
    return this.children;
}
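The child-source accessor is what allows containers to be processed recursively. A minimal sketch of such a walk follows, assuming a simplified stand-in class `SimpleSource` (hypothetical, not the JHOVE2 `Source` type) that models only a name and a list of children. The sample tree and file names are also invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a source unit; only the child list is modeled.
class SimpleSource {
    private final String name;
    private final List<SimpleSource> children = new ArrayList<>();

    SimpleSource(String name) { this.name = name; }

    SimpleSource addChild(SimpleSource child) {
        children.add(child);
        return this;
    }

    String getName() { return name; }

    List<SimpleSource> getChildSources() { return children; }
}

class SourceWalker {
    // Depth-first walk: record the source itself, then descend into each child.
    static List<String> walk(SimpleSource source) {
        List<String> names = new ArrayList<>();
        collect(source, names);
        return names;
    }

    private static void collect(SimpleSource source, List<String> names) {
        names.add(source.getName());
        for (SimpleSource child : source.getChildSources()) {
            collect(child, names);
        }
    }

    // Invented sample: a Zip holding a TIFF (with an embedded ICC profile) and a text file.
    static SimpleSource sampleTree() {
        SimpleSource tiff = new SimpleSource("image.tif")
                .addChild(new SimpleSource("ICC profile"));
        return new SimpleSource("archive.zip")
                .addChild(tiff)
                .addChild(new SimpleSource("notes.txt"));
    }
}
```

Calling `SourceWalker.walk(SourceWalker.sampleTree())` visits the Zip first, then the TIFF and its embedded byte stream, then the text file.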
![Page 37: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/37.jpg)
![Page 38: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/38.jpg)
Format Module
• implements Parser
• implements Validator
• implements Reportable
• imports org.jhove2.annotation.ReportableProperty
public long parse(JHOVE2 jhove2, Source source, Input input) {
    // extract features and
    // fill in the reportable properties fields
    . . .
}
![Page 39: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/39.jpg)
Reportables
A Reportable is a named set of properties
– Reportables correspond to Java classes
– Including classes for sources and modules
Also define reportables for the major conceptual structures inherent to a format
– JPEG 2000: Box
– TIFF: IFH, IFD, IFD entry (“tag”)
– UTF-8: Character stream, character
– WAVE: Chunk
![Page 40: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/40.jpg)
Reportable Interface
package org.jhove2.core;
public interface Reportable {
    public I8R getReportableIdentifier();
    public String getReportableName();
    public void setReportableName(String name);
}
public abstract class AbstractReportable implements Reportable {
    protected I8R reportableIdentifier;
    protected String reportableName;
}
A reportable class implements the Reportable marker interface
![Page 41: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/41.jpg)
ReportableProperties
A ReportableProperty is a named, typed value
– org.jhove2.annotation.ReportableProperty
– Unique formal identifier
– Data type
  Scalar or collection Java types, JHOVE2 primitive types, or JHOVE2 reportables
– Typed value
– Description of correct semantic interpretation
– Properties correspond to fields
![Page 42: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/42.jpg)
ReportableProperty Annotation
Each reportable property is represented by a field and accessor and mutator methods
The accessor method must be marked with the @ReportableProperty annotation
public class MyReportable implements Reportable {
    protected String myProperty;
    @ReportableProperty(order=1, desc="description", ref="reference")
    public String getMyProperty() {
        return this.myProperty;
    }
    public void setMyProperty(String property) {
        this.myProperty = property;
    }
}
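Marking accessors with an annotation lets a framework discover properties by reflection instead of hard-coding them. The sketch below shows that idea only. `DemoReportableProperty`, `DemoReportable`, and `PropertyLister` are hypothetical stand-ins defined here so the example runs on its own; they are not the JHOVE2 annotation or display code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in annotation; RUNTIME retention makes it visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DemoReportableProperty {
    int order();
    String desc();
}

// Hypothetical reportable with two annotated accessors and one plain accessor.
class DemoReportable {
    @DemoReportableProperty(order = 2, desc = "Size in bytes")
    public long getSize() { return 9516L; }

    @DemoReportableProperty(order = 1, desc = "Name of the source")
    public String getSourceName() { return "schema-sample.xml"; }

    public String getInternalState() { return "not reported"; }
}

class PropertyLister {
    // Collect annotated accessors, sort them by 'order', and render "Name: value" lines.
    static List<String> list(Object reportable) {
        List<Method> accessors = new ArrayList<>();
        for (Method m : reportable.getClass().getMethods()) {
            if (m.isAnnotationPresent(DemoReportableProperty.class)) {
                accessors.add(m);
            }
        }
        accessors.sort(Comparator.comparingInt(
                (Method m) -> m.getAnnotation(DemoReportableProperty.class).order()));
        List<String> lines = new ArrayList<>();
        for (Method m : accessors) {
            String name = m.getName().substring("get".length());
            try {
                lines.add(name + ": " + m.invoke(reportable));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read property " + name, e);
            }
        }
        return lines;
    }
}
```

Sorting by `order` matters because reflection does not guarantee the order in which methods are returned; the unannotated accessor is skipped entirely.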
![Page 43: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/43.jpg)
Wave Reportable Properties
chunks[ ]
formatChunkNotBeforeDataChunkMessage
missingRequiredFormatChunkMessage
missingRequiredDataChunkMessage
missingRequiredFactChunkMessage
isValid
Chunk properties:
childChunks[ ]
hasPadByte
identifier
isValid
size
![Page 44: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/44.jpg)
UTF-8 Reportable Properties
byteOrderMark
c0Characters
c1Characters
codeBlocks
eOLMarkers
invalidCharacters[ ]
isValid
numCharacters
numLines
numNonCharacters
Character properties:
c0Control
c1Control
codeBlock
codePoint
codePointOutOfRange
coverage
invalidByteValues
isByteOrderMark
isC0Control
isC1Control
isNonCharacter
isValid
size
![Page 45: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/45.jpg)
XML Reportable Properties
![Page 46: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/46.jpg)
Fields for the reportable properties
protected String saxParser = "org.apache.xerces.parsers.SAXParser";
protected XmlDeclaration xmlDeclaration = new XmlDeclaration();
protected String xmlRootElementName;
protected List<XmlDTD> xmlDTDs;
protected HashMap<String,XmlNamespace> xmlNamespaceMap;
protected List<XmlNotation> xmlNotations;
protected List<String> xmlCharacterReferences;
protected List<XmlEntity> xmlEntitys;
protected List<XmlProcessingInstruction> xmlProcessingInstructions;
protected List<String> xmlComments;
protected XmlValidationResults xmlValidationResults;
protected boolean wellFormed;
![Page 47: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/47.jpg)
Getter methods for reportable properties
import org.jhove2.annotation.ReportableProperty;
@ReportableProperty(order = 1, value = "Java class used to parse the XML")
public String getSaxParser() {
    return saxParser;
}
@ReportableProperty(order = 2, value = "XML Declaration data")
public XmlDeclaration getXmlDeclaration() {
    return xmlDeclaration;
}
@ReportableProperty(order = 3, value = "Name of the document's root element")
public String getXmlRootElementName() {
    return xmlRootElementName;
}
![Page 48: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/48.jpg)
Messages
if (position == start && ch.isByteOrderMark()) {
    Object[] messageParms = new Object[] {position};
    this.bomMessage = new Message(Severity.INFO,
        Context.OBJECT,
        "org.jhove2.module.format.utf8.UTF8Module.bomMessage",
        messageParms);
}
![Page 49: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/49.jpg)
Messages
• Messages are reportable properties
– Unique identifier
  info:jhove2/message/…
– Context
  Process: condition arising from the process of characterization
  Object: condition arising in the object being characterized
– Severity
  Error, Warning, Info
– Internationalizable
![Page 50: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/50.jpg)
![Page 51: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/51.jpg)
Assessment rules
Assertions (logical expressions) based on
– Presence/absence of a property– Constraints on property values– Combinations of properties/values
![Page 52: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/52.jpg)
Predicate Logic
• Rules use a construct whose basic structure looks like this:
If (condition)
Then (consequent)
Else (alternative)
http://en.wikipedia.org/wiki/Conditional_(programming)
![Page 53: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/53.jpg)
Condition
A condition is defined by a universal or existential quantifier
∀ “for all”  ∃ “for any”  ¬∃ “not any”
and an arbitrary set of predicates
{ALL_OF | ANY_OF | NONE_OF}
(predicate) (predicate) ...
http://www.csm.ornl.gov/~sheldon/ds/sec1.6.html
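The three quantifiers map directly onto "all match", "any match", and "none match" over a list of predicates. A minimal sketch follows, using the quantifier names from the slide. The `Quantifier` enum, the `QuantifierDemo` helper, and the sample-rate predicates are illustrative assumptions, not JHOVE2 classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Quantifier names follow the slide; the enum itself is an illustrative sketch.
enum Quantifier {
    ALL_OF, ANY_OF, NONE_OF;

    // Apply this quantifier to a list of predicates over the same subject.
    <T> boolean test(T subject, List<Predicate<T>> predicates) {
        switch (this) {
            case ALL_OF:
                return predicates.stream().allMatch(p -> p.test(subject));
            case ANY_OF:
                return predicates.stream().anyMatch(p -> p.test(subject));
            default: // NONE_OF
                return predicates.stream().noneMatch(p -> p.test(subject));
        }
    }
}

class QuantifierDemo {
    // Invented predicates over an integer sample rate.
    static List<Predicate<Integer>> predicates() {
        List<Predicate<Integer>> list = new ArrayList<>();
        list.add(rate -> rate >= 44100);
        list.add(rate -> rate == 96000);
        return list;
    }
}
```

For a sample rate of 48000, only the first predicate holds, so `ANY_OF` is satisfied while `ALL_OF` and `NONE_OF` are not.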
![Page 54: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/54.jpg)
Predicate
Each predicate is a string containing a boolean expression
xmlDeclaration.standalone == 'yes'
These assertions take the form:
property relation value
Supported relational operators include:
== != < > <= >=
contains
exists ( != null)
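To make the "property relation value" shape concrete, here is a small sketch that evaluates such a string against a flat map of property values. The hand-written parser is a stand-in for illustration only; it is not the expression language JHOVE2 uses, and it handles only the six comparison operators. The class name and the sample values are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class SimplePredicate {
    // Evaluate "property relation value" against a flat property map.
    // == and != compare string forms; < > <= >= require a numeric property.
    static boolean evaluate(String expression, Map<String, Object> properties) {
        String[] parts = expression.trim().split("\\s+", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected: property relation value");
        }
        Object actual = properties.get(parts[0]);
        String relation = parts[1];
        String expected = stripQuotes(parts[2]);
        if (actual == null) {
            return false; // in this sketch a missing property satisfies no relation
        }
        if (relation.equals("==")) return String.valueOf(actual).equals(expected);
        if (relation.equals("!=")) return !String.valueOf(actual).equals(expected);
        if (!(actual instanceof Number)) {
            throw new IllegalArgumentException("Ordering needs a numeric property: " + parts[0]);
        }
        double left = ((Number) actual).doubleValue();
        double right = Double.parseDouble(expected);
        switch (relation) {
            case "<":  return left < right;
            case ">":  return left > right;
            case "<=": return left <= right;
            case ">=": return left >= right;
            default:
                throw new IllegalArgumentException("Unknown relation: " + relation);
        }
    }

    private static String stripQuotes(String value) {
        if (value.length() >= 2 && value.startsWith("'") && value.endsWith("'")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    // Invented property values, keyed by names that appear on the slides.
    static Map<String, Object> sampleProperties() {
        Map<String, Object> props = new HashMap<>();
        props.put("xmlDeclaration.standalone", "yes");
        props.put("waveFormatChunk.nBitsPerSample", 24);
        return props;
    }
}
```

A real expression language resolves dotted names by navigating the object graph; the flat map here only imitates that lookup.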
![Page 55: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/55.jpg)
XML Assessment rule
If ANY_OF
  validity == true ;
  (validity == undetermined) and (wellFormed == true)
Then Acceptable
Else Not acceptable
End If
![Page 56: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/56.jpg)
JPEG 2000 Assessment Rule
If ALL_OF
  validity == true ;
  exists(colourBox) ;
  exists(resolutionBox.capture)
Then Acceptable
Else Not acceptable
End If
![Page 57: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/57.jpg)
Wave Assessment rule
If ALL_OF
  validity == true ;
  exists(broadcastWaveExtensionChunk) ;
  waveFormatChunk.nSamplesPerSec == 96000 ;
  waveFormatChunk.nBitsPerSample == 24
Then Acceptable
Else Not acceptable
End If
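The Wave rule above can be read as four predicates joined by ALL_OF, with a consequent and an alternative. A minimal sketch, assuming the properties are available in a flat map (a simplification of the real reportable object graph); the `WaveRule` class and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative rule: ALL_OF the predicates must hold for the consequent to apply.
class WaveRule {
    private final List<Predicate<Map<String, Object>>> predicates = new ArrayList<>();

    WaveRule() {
        predicates.add(p -> Boolean.TRUE.equals(p.get("validity")));
        predicates.add(p -> p.get("broadcastWaveExtensionChunk") != null); // exists(...)
        predicates.add(p -> Integer.valueOf(96000).equals(p.get("waveFormatChunk.nSamplesPerSec")));
        predicates.add(p -> Integer.valueOf(24).equals(p.get("waveFormatChunk.nBitsPerSample")));
    }

    // Returns the consequent when every predicate holds, otherwise the alternative.
    String assess(Map<String, Object> properties) {
        boolean allHold = predicates.stream().allMatch(pred -> pred.test(properties));
        return allHold ? "Acceptable" : "Not acceptable";
    }

    // Invented property values for a 96 kHz / 24-bit Broadcast Wave file.
    static Map<String, Object> sampleProperties() {
        Map<String, Object> props = new HashMap<>();
        props.put("validity", Boolean.TRUE);
        props.put("broadcastWaveExtensionChunk", "bext");
        props.put("waveFormatChunk.nSamplesPerSec", 96000);
        props.put("waveFormatChunk.nBitsPerSample", 24);
        return props;
    }
}
```

Changing the sample rate in the map to 44100 makes one predicate fail, so the rule falls through to the alternative.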
![Page 58: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/58.jpg)
TIFF Assessment rule
If ANY_OF
  validity == true ;
  ((ifd.messages contains 'offsetNotByteAligned') or (ifd.messages contains 'dateNotWellFormed'))
Then Acceptable
Else Not acceptable
End If
![Page 59: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/59.jpg)
Rules Engines
• JSR 94: Java™ Rule Engine API
  http://jcp.org/en/jsr/detail?id=94
• Rule Engines Overview
  http://jadex-rules.informatik.uni-hamburg.de/xwiki/bin/view/Resources/Rule+Engines
• Top 10 Java Business Rule Engines
  http://blog.taragana.com/index.php/archive/top-10-java-business-rule-engines/
• Introduction to Drools
  http://www.intltechventures.com/presentations/2008-01-26-Introduction-to-Drools.pdf
![Page 60: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/60.jpg)
Expression Languages
• Predicates (conditions) are evaluated using a domain-specific language that supports scripted examination of Java objects
• MVEL (MVFLEX Expression Language)
  http://mvel.codehaus.org/
• OGNL (Object-Graph Navigation Language)
  http://www.opensymphony.com/ognl
• Groovy
  http://groovy.codehaus.org/
• Open Source Expression Languages in Java
  http://java-source.net/open-source/expression-languages
  http://www.java-opensource.com/open-source/expression-languages.html
![Page 61: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/61.jpg)
![Page 62: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/62.jpg)
![Page 63: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/63.jpg)
![Page 64: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/64.jpg)
Assessment Module at work
public void assess(JHOVE2 jhove2, Source source) throws JHOVE2Exception {
    /* Assess the source unit. */
    this.configInfo = jhove2.getConfigInfo();
    List<Module> modules = source.getModules();
    for (Module module : modules) {
        assessObject(module);
        this.getModuleAccessor().persistModule(this);
    }
    assessObject(source);
    this.getModuleAccessor().persistModule(this);
}
![Page 65: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/65.jpg)
AssessObject Method
private void assessObject(Object assessedObject) throws JHOVE2Exception {
    // Rule sets are looked up by the class name of the assessed object
    String objectFilter = assessedObject.getClass().getName();
    List<RuleSet> ruleSetList = getRuleSetFactory()
            .getRuleSetList(objectFilter);
    if (ruleSetList != null) {
        for (RuleSet ruleSet : ruleSetList) {
            if (ruleSet.isEnabled()) {
                AssessmentResultSet resultSet = new AssessmentResultSet();
                assessmentResultSets.add(resultSet);
                resultSet.setRuleSet(ruleSet);
                resultSet.fireAllRules(assessedObject);
            }
        }
    }
}
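The lookup step in this method, matching rule sets to an object by its class name and skipping disabled ones, can be sketched in isolation. `DemoRuleSet` and `DemoRuleSetRegistry` are hypothetical simplifications, not the JHOVE2 `RuleSet` or rule set factory, and the registry is keyed on JDK class names only so that the example runs without JHOVE2 on the classpath.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified rule set: a name, the class name it applies to, and a flag.
class DemoRuleSet {
    final String name;
    final String objectFilter;
    final boolean enabled;

    DemoRuleSet(String name, String objectFilter, boolean enabled) {
        this.name = name;
        this.objectFilter = objectFilter;
        this.enabled = enabled;
    }
}

class DemoRuleSetRegistry {
    private final Map<String, List<DemoRuleSet>> byFilter = new HashMap<>();

    void register(DemoRuleSet ruleSet) {
        byFilter.computeIfAbsent(ruleSet.objectFilter, k -> new ArrayList<>()).add(ruleSet);
    }

    // Names of the enabled rule sets whose objectFilter equals the object's class name.
    List<String> applicableRuleSets(Object assessedObject) {
        String objectFilter = assessedObject.getClass().getName();
        List<DemoRuleSet> candidates =
                byFilter.getOrDefault(objectFilter, Collections.<DemoRuleSet>emptyList());
        List<String> names = new ArrayList<>();
        for (DemoRuleSet ruleSet : candidates) {
            if (ruleSet.enabled) {
                names.add(ruleSet.name);
            }
        }
        return names;
    }

    // Invented registry contents for the demonstration.
    static DemoRuleSetRegistry sample() {
        DemoRuleSetRegistry registry = new DemoRuleSetRegistry();
        registry.register(new DemoRuleSet("StringRuleSet", "java.lang.String", true));
        registry.register(new DemoRuleSet("DisabledStringRuleSet", "java.lang.String", false));
        registry.register(new DemoRuleSet("IntegerRuleSet", "java.lang.Integer", true));
        return registry;
    }
}
```

Because the match is on the exact class name, an object whose class has no registered rule set is simply not assessed, which mirrors the null check in the method above.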
![Page 66: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/66.jpg)
Fire Off the Rules
![Page 67: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/67.jpg)
![Page 68: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/68.jpg)
Sequence Diagram
![Page 69: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/69.jpg)
Identification
![Page 70: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/70.jpg)
Feature extraction
![Page 71: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/71.jpg)
Assessment
![Page 72: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/72.jpg)
![Page 73: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/73.jpg)
Assessment Configuration
• Lists of properties for a Module can be generated using the ReportableInstanceTraverser utility
USAGE: java -cp CLASSPATH org.jhove2.app.util.traverser.ReportableInstanceTraverser fully-qualified-class-name output-file-path {optional boolean should-recurse(default true)}
• wave-property-list.txt
• tiff-module-properties.txt
![Page 74: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/74.jpg)
Assessment Configuration
• Rules are configured using the ARules utility
– Utility developed by CDL to create rule set in XML
– Future plans: a GUI
• ARules output is a Spring config file
![Page 75: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/75.jpg)
ARules configuration
ruleset XmlRuleSet enabled org.jhove2.module.format.xml.XmlModule
desc Ruleset for XML module
rule XmlStandaloneRule enabled
desc Does XML Declaration specify standalone status?
cons Is Standalone
alt Is Not Standalone
quant all
pred xmlDeclaration.standalone == "yes"
rule XmlAcceptableRule enabled
desc Is the XML status acceptable?
cons Acceptable
alt Not Acceptable
quant any
pred valid.name() == "True"
pred (valid.name() == "Undetermined") && (wellFormed.name() == "True")
![Page 76: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/76.jpg)
RuleSet Spring Bean
<!-- RuleSet bean for the XmlModule -->
<bean id="XmlRuleSet" class="org.jhove2.module.assess.RuleSet"
      scope="singleton">
  <property name="name" value="XmlRuleSet"/>
  <property name="description"
            value="RuleSet for Xml Module"/>
  <property name="objectFilter"
            value="org.jhove2.module.format.xml.XmlModule"/>
  <property name="rules">
    <list value-type="org.jhove2.module.assess.Rule">
      <ref local="XmlStandaloneRule"/>
      <ref local="XmlValidityRule"/>
    </list>
  </property>
  <property name="enabled" value="true"/>
</bean>
![Page 77: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/77.jpg)
Rule Spring Bean
<!-- Rule bean for evaluating validity value -->
<bean id="XmlValidityRule"
      class="org.jhove2.module.assess.Rule" scope="singleton">
  <property name="name" value="XmlValidityRule"/>
  <property name="description"
            value="Is the XML validity status acceptable?"/>
  <property name="consequent" value="Acceptable"/>
  <property name="alternative" value="Not Acceptable"/>
  <property name="quantifier" value="ANY_OF"/>
  <property name="predicates">
    <list value-type="java.lang.String">
      <value><![CDATA[ valid.toString() == 'true' ]]></value>
      <value><![CDATA[ (valid.toString() == 'undetermined') &&
                       (wellFormed.toString() == 'true') ]]></value>
    </list>
  </property>
  <property name="enabled" value="true"/>
</bean>
![Page 78: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/78.jpg)
Spring Config Files
config
└───spring
    └───module
        ├───aggrefy
        │       jhove2-aggrefy-config.xml
        ├───assess
        │       jhove2-assess-config.xml
        │       jhove2-ruleset-xml-config.xml
        ├───digest
        │       jhove2-digest-config.xml
        ├───display
        │       jhove2-display-config.xml
        ├───identify
        │       jhove2-display-config.xml
![Page 79: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/79.jpg)
Assessment Output
Results stored as new characterization properties
Rule evaluation output includes
– Rule's name and brief description
– Boolean value of the condition that was evaluated
– Text value of the consequent or alternative
– Details of the predicate evaluation results
![Page 80: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/80.jpg)
Assessment Output Example
Module {AssessmentModule}:
  AssessmentResultSets:
    AssessmentResultSet:
      RuleSetName: XmlRuleSet
      RuleSetDescription: Ruleset for XML module
      ObjectFilter: org.jhove2.module.format.xml.XmlModule
      BooleanResult: false
      AssessmentResults:
        AssessmentResult:
          RuleName: XmlStandaloneRule
          RuleDescription: Does XML Declaration specify standalone status?
          BooleanResult: false
          NarrativeResult: Is Not Standalone
          AssessmentDetails: ALL_OF { xmlDeclaration.standalone == "yes" => false; }
        AssessmentResult:
          RuleName: XmlAcceptableRule
          RuleDescription: Is the XML status acceptable?
          BooleanResult: true
          NarrativeResult: Acceptable
          AssessmentDetails: ANY_OF { valid.name() == "True" => true; (valid.name() == "Undetermined") && (wellFormed.name() == "True") => false; }
![Page 81: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/81.jpg)
Actionable Outcomes?
– Assessment outcome is informational data
– Surrounding workflows may utilize assessment results to guide control mechanisms
– JHOVE2 provides API, but does not initiate actions
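Since JHOVE2 reports results but does not act on them, the decision logic lives in the surrounding workflow. The sketch below shows one way a workflow might turn per-rule boolean results into an action. The `IngestRouter` class, its `Action` values, and the policy itself are assumptions made for illustration; only the rule names and their results are taken from the output example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IngestRouter {
    enum Action { ACCEPT, REVIEW, REJECT }

    // Hypothetical local policy: every rule true -> ACCEPT, no rule true -> REJECT,
    // a mix -> REVIEW. An empty result map counts as ACCEPT in this sketch.
    static Action route(Map<String, Boolean> ruleResults) {
        long passed = ruleResults.values().stream().filter(Boolean::booleanValue).count();
        if (passed == ruleResults.size()) {
            return Action.ACCEPT;
        }
        if (passed == 0) {
            return Action.REJECT;
        }
        return Action.REVIEW;
    }

    // Rule names and boolean results as shown in the assessment output example.
    static Map<String, Boolean> sampleResults() {
        Map<String, Boolean> results = new LinkedHashMap<>();
        results.put("XmlStandaloneRule", false);
        results.put("XmlAcceptableRule", true);
        return results;
    }
}
```

With one rule false and one true, this policy routes the object to manual review; a different institution could map the same results to a different action without touching the rules.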
![Page 82: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/82.jpg)
Assessment Enhancements
• Assessment Config file editing
– Make it easier for a non-programmer to edit
– Editing should be bullet-proofed if possible
• GUI User interface
– Presents a GUI treeview that lists reportable properties in a navigable hierarchy.
• Sanity checking
– Pre-test config files to ensure compatibility
  • Command-line invocation of the sanity checker
  • Run check whenever installed modules have been changed
– Also have robust reporting in case property is missing
![Page 83: Using JHOVE2 for Policy Assessment of Files](https://reader036.vdocument.in/reader036/viewer/2022062408/56814477550346895db11034/html5/thumbnails/83.jpg)
JHOVE2 Community
Wiki
– http://jhove2.org/
– https://bitbucket.org/jhove2/main/wiki/Modules
Mailing lists
– [email protected]
– [email protected]