
Page 1: METS Java Toolkit Stephen L. Abrams Harvard University Library stephen_abrams@harvard.edu DLF Spring Forum May 10-12, 2002, Chicago, IL

METS Java Toolkit

Stephen L. Abrams
Harvard University Library

stephen_abrams@harvard.edu

DLF Spring Forum
May 10-12, 2002, Chicago, IL


Why Do We Need a Toolkit?

• Automation for archiving project with multiple content providers.
  – METS used in hierarchical SIP
  – Client-side tools to produce syntactically valid SIPs

• Use of METS to encapsulate complex objects, with multiple content streams.
  – Page turner, currently based on MOA2


Functional Requirements

• Java API to provide support for generic METS.

• Support procedural:
  – Construction of in-memory representation
  – Validation
  – Marshalling/unmarshalling to/from instance documents

• Usable as basis for application-specific tools.
  – Sub-class for specific functionality or restrictions


JAXB

• API based on Sun’s JAXB specification, but not the tools.

[Diagram labels: JAXB compiler; source schema; binding schema; schema classes; JAXB bind package; JAXB marshal package]


Toolkit API

• Each schema element corresponds to a class.
    Mets mets = new Mets();

• Accessor/mutator methods for each attribute.
    mets.setID(id);
    String id = mets.getID();

• Accessor/mutator methods for content model.
    List content = mets.getContent();
    content.add(child);
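The pattern in these three bullets can be sketched without the toolkit itself. The classes below are hypothetical stand-ins, not the toolkit's source; only the method names setID, getID, and getContent are taken from the slide, and everything else is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins that mimic the API pattern on the slide;
// they are not the toolkit's own classes.
class ApiPatternSketch {
    /** One class per schema element: here, a stand-in for <mets>. */
    static class Mets {
        private String id;                                       // the ID attribute
        private final List<Object> content = new ArrayList<>();  // the content model

        void setID(String id) { this.id = id; }
        String getID() { return id; }

        // Returns the live list, so callers add child elements to it directly.
        List<Object> getContent() { return content; }
    }

    /** Stand-in for a child element such as <metsHdr>. */
    static class MetsHdr { }

    public static void main(String[] args) {
        Mets mets = new Mets();
        mets.setID("1234");
        mets.getContent().add(new MetsHdr());
        System.out.println(mets.getID() + " has "
                + mets.getContent().size() + " child element(s)");
    }
}
```

Exposing the content model as a live list keeps the API small: one accessor covers adding, removing, and reordering children.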


Toolkit API UML

[UML class diagram, flattened in extraction. Classes and members, in the order recovered:]

«interface» Element
  +validateThis()

«interface» IdentifiableElement
  +id() : String

«interface» RootElement
  +validate()

Marshaller
  +marshal(in mob : MarshallableObject)
  +writer() : XMLWriter

Unmarshaller
  +scanner() : XMLScanner
  +unmarshal() : MarshallableObject

ValidatableObject
  -_valid : bool
  +invalidate()
  +validate()
  +validateThis()

MarshallableObject
  +marshal()
  +unmarshal()

MarshallableRootElement
  +marshal()
  +validate()

PCData
  +chars() : String
  +chars(in chars : String)

Validator
  +validate(in vob : ValidatableObject)

(package label: javax.xml.bind)

XMLScanner
  +atAttribute() : boolean
  +atAttributeValue() : boolean
  +atEnd() : boolean
  +atStart() : boolean
  +atChars() : boolean
  +takeAttributeName() : String
  +takeAttributeValue() : String
  +takeChars() : String
  +takeEmpty()
  +takeEnd() : String
  +takeStart() : String

XMLWriter
  +chars(in chars : String)
  +end(in name : String)
  +flush()
  +leaf()
  +start(in name : String)
  +attribute(in name : String, in value : String)

(package label: javax.xml.marshal)

Mets
  -_ID : String
  -_OBJID : String
  -_LABEL : String
  -_TYPE : String
  -_PROFILE : String
  -_content : List
  +get*()
  +set*()
  +marshal(in m : Marshaller)
  +validate(in v : Validator)
  +validateThis()
  +unmarshal(in u : Unmarshaller)

MetsHdr, FileSec, StructMap, DmdSec, AmdSec, BehaviorSec

(package label: ...mets)


Why Do We Need a New API?

• Why not use DOM?
  – Unnatural unit of granularity: elements and attributes are both nodes in DOM tree

• Why not JDOM?
  – Explicit support for validation

• JAXB compiler could (potentially) be used to support METS upgrades.
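The DOM granularity point can be seen with the standard org.w3c.dom API that ships with the JDK. The snippet below is only an illustration: the class name DomGranularitySketch and the one-element sample document are made up for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomGranularitySketch {
    /** Parses a small XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<mets ID=\"1234\"/>");
        Element root = doc.getDocumentElement();
        Attr idAttr = root.getAttributeNode("ID");

        // Both the element and its attribute are handled as generic Nodes,
        // distinguished only by their node type.
        Node elementAsNode = root;
        Node attributeAsNode = idAttr;
        System.out.println(elementAsNode.getNodeType() == Node.ELEMENT_NODE);
        System.out.println(attributeAsNode.getNodeType() == Node.ATTRIBUTE_NODE);
        System.out.println(attributeAsNode.getNodeValue());
    }
}
```

Compare this with the toolkit style shown earlier, where the same attribute is reached through a typed accessor such as mets.getID().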


Procedural Construction

• The initial current element is <mets>

• For each child element in the current element’s content model:
  – Instantiate an appropriate element object
  – Set its attributes
  – Define its content model
  – Add it to the content model of its parent


Procedural Construction (Ex.)

    Mets mets = new Mets();
    mets.setID ("1234");

    MetsHdr metsHdr = new MetsHdr();
    metsHdr.setCREATEDATE(new Date());

    Agent agent = new Agent();
    agent.setROLE(Role.CREATOR);

    Name name = new Name ();
    name.getContent().add(new PCData ("S. Abrams"));

    agent.getContent().add(name);
    metsHdr.getContent().add(agent);
    mets.getContent().add(metsHdr);
    ...


Validation

• Global
  – ID uniqueness
  – IDREF-to-ID consistency

• Local
  – Existence of required attributes and content model elements

    Mets mets = new Mets();
    ...
    mets.validate ();
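The two global checks can be sketched in two passes over the elements of a document. This is an illustrative sketch, not the toolkit's validator: the Node class is a hypothetical flattened view of an element (its own ID plus the IDs it refers to), and the sample IDs are borrowed from the Marshal.java example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GlobalValidationSketch {
    /** Hypothetical flattened view of one element: its ID and the IDs it refers to. */
    static class Node {
        final String id;            // null when the element carries no ID
        final List<String> idrefs;

        Node(String id, String... idrefs) {
            this.id = id;
            this.idrefs = Arrays.asList(idrefs);
        }
    }

    /** Returns a list of problems; an empty list means both global checks passed. */
    static List<String> validate(List<Node> nodes) {
        List<String> problems = new ArrayList<>();
        Set<String> ids = new HashSet<>();

        // Pass 1: ID uniqueness.
        for (Node n : nodes) {
            if (n.id != null && !ids.add(n.id)) {
                problems.add("duplicate ID: " + n.id);
            }
        }
        // Pass 2: every IDREF must name an ID collected in pass 1.
        for (Node n : nodes) {
            for (String ref : n.idrefs) {
                if (!ids.contains(ref)) {
                    problems.add("unresolved IDREF: " + ref);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node("t-1234"),
                new Node("a1b2c3", "t-1234"),   // reference resolves
                new Node("a1b2c3", "s-9012"));  // duplicate ID, dangling reference
        System.out.println(validate(nodes));
    }
}
```

Two passes are needed because an IDREF may point forward to an element that appears later in the document.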


Marshalling

• Serializing in-memory representation to an output stream.

    Mets mets = new Mets();
    ...
    FileOutputStream out = new FileOutputStream("mets.xml");
    mets.validate ();
    mets.marshal(out);
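What marshalling involves can be sketched with a small depth-first serializer. The Elem class and the escape and marshal methods below are hypothetical and are not the toolkit's implementation; the sketch only shows the general technique, including the character escaping that is visible in the marshal.xml sample, where "<>" in OBJID appears as "&lt;&gt;".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MarshalSketch {
    /** Hypothetical generic element: a name, attributes, and mixed content. */
    static class Elem {
        final String name;
        final Map<String, String> attrs = new LinkedHashMap<>(); // keeps insertion order
        final List<Object> content = new ArrayList<>();          // Elem or String children

        Elem(String name) { this.name = name; }
    }

    /** Escapes characters that are unsafe in text and in quoted attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Depth-first serialization of the tree into XML text. */
    static void marshal(Elem e, StringBuilder out) {
        out.append('<').append(e.name);
        for (Map.Entry<String, String> a : e.attrs.entrySet()) {
            out.append(' ').append(a.getKey()).append("=\"")
               .append(escape(a.getValue())).append('"');
        }
        if (e.content.isEmpty()) {
            out.append("/>");   // empty element
            return;
        }
        out.append('>');
        for (Object child : e.content) {
            if (child instanceof Elem) {
                marshal((Elem) child, out);
            } else {
                out.append(escape(child.toString()));
            }
        }
        out.append("</").append(e.name).append('>');
    }

    public static void main(String[] args) {
        Elem mets = new Elem("mets");
        mets.attrs.put("OBJID", "9:1<>1.0");
        Elem name = new Elem("name");
        name.content.add("S. Abrams");
        mets.content.add(name);

        StringBuilder out = new StringBuilder();
        marshal(mets, out);
        System.out.println(out);
    }
}
```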


Unmarshalling

• Parsing instance document and creating in-memory representation.
• Implicit local validation during parsing; global validation must be explicit.
• Internal parsing with Jim Clark’s XP.

    FileInputStream in = new FileInputStream("mets.xml");
    Mets mets = Mets.unmarshal(in);
    mets.validate ();
    ...
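The idea of local validation happening implicitly during parsing can be sketched with the JDK's StAX pull parser standing in for XP. Everything here is an assumption made for illustration: the Elem class, the unmarshal method, and the single local rule that a <file> element must carry an ID attribute. Character content is ignored to keep the sketch short.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class UnmarshalSketch {
    /** Hypothetical in-memory element: name, attributes, child elements. */
    static class Elem {
        final String name;
        final Map<String, String> attrs = new LinkedHashMap<>();
        final List<Elem> children = new ArrayList<>();

        Elem(String name) { this.name = name; }
    }

    /** Builds an element tree; the local check runs as each start tag is read. */
    static Elem unmarshal(String xml) {
        Deque<Elem> stack = new ArrayDeque<>();
        Elem root = null;
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    Elem e = new Elem(r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        e.attrs.put(r.getAttributeLocalName(i), r.getAttributeValue(i));
                    }
                    // Illustrative local rule, applied implicitly while parsing.
                    if (e.name.equals("file") && !e.attrs.containsKey("ID")) {
                        throw new IllegalArgumentException("<file> is missing its ID attribute");
                    }
                    if (stack.isEmpty()) {
                        root = e;
                    } else {
                        stack.peek().children.add(e);
                    }
                    stack.push(e);
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    stack.pop();
                }
            }
            r.close();
        } catch (XMLStreamException ex) {
            throw new IllegalArgumentException("document is not well-formed", ex);
        }
        return root;
    }

    public static void main(String[] args) {
        Elem root = unmarshal("<fileSec><fileGrp><file ID=\"a1b2c3\"/></fileGrp></fileSec>");
        Elem file = root.children.get(0).children.get(0);
        System.out.println(root.name + " contains file " + file.attrs.get("ID"));
    }
}
```

Global checks such as ID uniqueness need the whole tree, which is why they remain a separate, explicit call after parsing.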


Extension Schemas

• Toolkit could be extended to include explicit support for additional schemas.

• Generic namespace-aware Any class:

    Any any = new Any("elem");
    any.setAttribute("attr", value);
    String attr = any.getAttribute("attr");
    any.getContent().add(child);
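A generic element class of this kind could look roughly like the sketch below. It is a hypothetical stand-in, not the toolkit's Any class: the constructor and method names follow the slides, getQName is invented for the demonstration, and the sketch tracks only a namespace prefix rather than a full prefix-to-URI binding.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AnySketch {
    /** Hypothetical generic element identified by an optional prefix and a local name. */
    static class Any {
        private final String prefix;      // null for an unprefixed element
        private final String localName;
        private final Map<String, String> attributes = new LinkedHashMap<>();
        private final List<Object> content = new ArrayList<>();

        Any(String localName) { this(null, localName); }

        Any(String prefix, String localName) {
            this.prefix = prefix;
            this.localName = localName;
        }

        void setAttribute(String name, String value) { attributes.put(name, value); }
        String getAttribute(String name) { return attributes.get(name); }
        List<Object> getContent() { return content; }

        /** Qualified name as it would appear in a start tag (invented helper). */
        String getQName() {
            return prefix == null ? localName : prefix + ":" + localName;
        }
    }

    public static void main(String[] args) {
        Any any = new Any("my", "techMD");
        any.setAttribute("ID", "AB123");
        any.getContent().add("...technical MD...");
        System.out.println(any.getQName() + " ID=" + any.getAttribute("ID"));
    }
}
```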


Additional Work

• To be done any day now…
  – Support for <area>, <par>, and <seq>
  – Strict validation of sequence ordering
  – Marshal non-UTF-8 encodings
  – Base64 encoding/decoding methods for binData and FContent
  – Support for entity references
  – Diagnostic error messages
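For the Base64 item, helper methods of this kind might look like the sketch below on a current JDK. It relies on java.util.Base64, which was added in Java 8 and so is not available on the JDK 1.3.1 named in these slides. The class and method names are hypothetical; the sample value "MS0yLTM=" is the FContent payload used in the Marshal.java example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Sketch {
    /** Encodes text as Base64 over its UTF-8 bytes. */
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes Base64 back into text, assuming the payload is UTF-8. */
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("MS0yLTM="));
        System.out.println(encode("1-2-3"));
    }
}
```

Real binData or FContent payloads are arbitrary bytes rather than text, so a production helper would more likely work on byte arrays or streams.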


Distribution

• HUL’s intent is to make the toolkit freely available under an Open Source license.

• Minimal support (if any).

• Community process for maintenance?

• Does an appropriate organizational home exist?


Implementation

• METS schema, Version 1.0 (zeta)

• JAXB specification, Version 0.21 <http://java.sun.com/xml/jaxb>

• XP, Version 0.5 <http://jclark.com/xml/xp>

• Java J2SE and JDK 1.3.1

• Solaris 2.7

• Home page: <http://hul.harvard.edu/mets>


Marshal.java

import java.util.*;
import org.mets.xml.bind.*;
import org.mets.xml.mets.*;

public class Marshal
{
  public static void main (String [] args)
  {
    // Root <mets> element
    Mets mets = new Mets ();
    mets.setOBJID ("1234-5678(2002)9:1<>1.0.CO;9-X");
    mets.setLABEL ("METS Java toolkit");
    mets.setTYPE ("Article");

    // Header: creation date, record status, creating agent
    MetsHdr metsHdr = new MetsHdr ();
    metsHdr.setCREATEDATE (new Date ());
    metsHdr.setRECORDSTATUS ("DRAFT");
    Agent agent = new Agent ();
    agent.setROLE (Role.CREATOR);
    Name name = new Name ();
    name.getContent ().add (new PCData ("S. L. Abrams"));
    agent.getContent ().add (name);
    Note note = new Note ();
    note.getContent ().add (new PCData ("HUL/OIS"));
    agent.getContent ().add (note);
    note = new Note ();
    note.getContent ().add (new PCData ("Special order, 2002/02/25"));
    agent.getContent ().add (note);
    metsHdr.getContent ().add (agent);

    // Alternate record identifiers
    AltRecordID doi = new AltRecordID ();
    doi.setTYPE ("DOI");
    doi.getContent ().add (new PCData ("10.1234/56789"));
    AltRecordID nrs = new AltRecordID ();
    nrs.setTYPE ("NRS");
    nrs.getContent ().add (new PCData ("nrs:hul.ois:10203"));
    metsHdr.getContent ().add (doi);
    metsHdr.getContent ().add (nrs);
    mets.getContent ().add (metsHdr);

    // Descriptive metadata section
    DmdSec dmdSec = new DmdSec ();
    dmdSec.setID ("xyz-123");
    MdRef mdRef = new MdRef ();
    mdRef.setLOCTYPE (Loctype.DOI);
    mdRef.setMDTYPE (Mdtype.DC);
    mdRef.setMIMETYPE ("text/xml");
    ...


Marshal.java (cont.)

    ...
    mdRef.setXlinkHref ("10.9876/54321");
    dmdSec.getContent ().add (mdRef);
    MdWrap mdWrap = new MdWrap ();
    mdWrap.setMDTYPE (Mdtype.MARC);
    BinData binData = new BinData ();
    binData.getContent ().add (new PCData ("AbC…Yz0123456789"));
    mdWrap.getContent ().add (binData);
    dmdSec.getContent ().add (mdWrap);
    mets.getContent ().add (dmdSec);

    // Administrative metadata section: technical metadata
    AmdSec amdSec = new AmdSec ();
    TechMD techMD = new TechMD ();
    techMD.setID ("t-1234");
    mdWrap = new MdWrap ();
    mdWrap.setMDTYPE (Mdtype.OTHER);
    mdWrap.setOTHERMDTYPE ("MyTechMD");
    XmlData xmlData = new XmlData ();
    Any any = new Any ("my", "techMD");
    any.getAttributes ().add (new Attribute ("ID", "AB123"));
    any.getAttributes ().add (new Attribute ("my", "type", "TIFF"));
    any.getContent ().add (new PCData ("...technical MD..."));
    xmlData.getContent ().add (any);
    mdWrap.getContent ().add (xmlData);
    techMD.getContent ().add (mdWrap);
    amdSec.getContent ().add (techMD);

    // Rights metadata: three wrapped elements with different prefixes
    RightsMD rightsMD = new RightsMD ();
    rightsMD.setID ("r-5678");
    mdWrap = new MdWrap ();
    mdWrap.setMDTYPE (Mdtype.OTHER);
    mdWrap.setOTHERMDTYPE ("MyRightsMD");
    xmlData = new XmlData ();
    any = new Any ("my", "rightsMD");
    any.getContent ().add (new PCData ("...rights MD..."));
    xmlData.getContent ().add (any);
    any = new Any ("your", "rightsMD");
    any.getContent ().add (new PCData ("...rights MD..."));
    xmlData.getContent ().add (any);
    any = new Any ("their", "rightsMD");
    any.getContent ().add (new PCData ("...rights MD..."));
    ...


Marshal.java (cont.)

    ...
    xmlData.getContent ().add (any);
    mdWrap.getContent ().add (xmlData);
    rightsMD.getContent ().add (mdWrap);
    amdSec.getContent ().add (rightsMD);

    // Source metadata
    SourceMD sourceMD = new SourceMD ();
    sourceMD.setID ("s-9012");
    mdWrap = new MdWrap ();
    mdWrap.setMDTYPE (Mdtype.OTHER);
    mdWrap.setOTHERMDTYPE ("MySourceMD");
    xmlData = new XmlData ();
    any = new Any ("my", "sourceMD");
    any.getAttributes ().add (new Attribute ("aat", "type",
                                             new Integer (178684)));
    any.getContent ().add (new PCData ("...source MD..."));
    xmlData.getContent ().add (any);
    mdWrap.getContent ().add (xmlData);
    sourceMD.getContent ().add (mdWrap);
    amdSec.getContent ().add (sourceMD);

    // Digital provenance metadata
    DigiprovMD digiprovMD = new DigiprovMD ();
    digiprovMD.setID ("d-3456");
    mdWrap = new MdWrap ();
    mdWrap.setMDTYPE (Mdtype.OTHER);
    mdWrap.setOTHERMDTYPE ("MyDigiprovMD");
    xmlData = new XmlData ();
    any = new Any ("my", "digiprovMD");
    any.getContent ().add (new PCData ("...provenance MD..."));
    xmlData.getContent ().add (any);
    mdWrap.getContent ().add (xmlData);
    digiprovMD.getContent ().add (mdWrap);
    amdSec.getContent ().add (digiprovMD);
    mets.getContent ().add (amdSec);

    // File section
    FileSec fileSec = new FileSec ();
    FileGrp fileGrp = new FileGrp ();
    fileGrp.getADMID ().add ("t-1234");
    fileGrp.getADMID ().add ("s-9012");
    File file = new File ();
    file.setID ("a1b2c3");
    FLocat flocat = new FLocat ();
    flocat.setLOCTYPE (Loctype.URN);
    flocat.setXlinkHref ("urn:nid:nss");
    file.getContent ().add (flocat);
    FContent fcontent = new FContent ();
    ...


Marshal.java (cont.)

    ...
    fcontent.getContent ().add (new PCData ("MS0yLTM="));
    file.getContent ().add (fcontent);
    fileGrp.getContent ().add (file);
    fileSec.getContent ().add (fileGrp);
    mets.getContent ().add (fileSec);

    // Structural map: a chapter with two sections
    StructMap structMap = new StructMap ();
    structMap.setID ("A125");
    structMap.setLABEL ("Individual volumes");
    Div div = new Div ();
    div.setORDER (25);
    div.setORDERLABEL ("xxv");
    div.setTYPE ("Chapter");
    Div sec = new Div ();
    sec.setTYPE ("Section");
    Div sub = new Div ();
    sub.setTYPE ("Sub-section");
    Fptr fptr = new Fptr ();
    fptr.setFILEID ("a1b2c3");
    sub.getContent ().add (fptr);
    sec.getContent ().add (sub);
    div.getContent ().add (sec);
    sec = new Div ();
    sec.setTYPE ("Section");
    Mptr mptr = new Mptr ();
    mptr.setID ("123-45-6789");
    mptr.setLOCTYPE (Loctype.OTHER);
    mptr.setOTHERLOCTYPE ("filepath");
    mptr.setXlinkHref ("dir/file.xml");
    sec.getContent ().add (mptr);
    div.getContent ().add (sec);
    structMap.getContent ().add (div);
    mets.getContent ().add (structMap);

    // Behavior section
    BehaviorSec behavior = new BehaviorSec ();
    behavior.setID ("killerapp");
    behavior.getSTRUCTID ().add ("A125");
    behavior.getSTRUCTID ().add ("s-9012");
    Mechanism mechanism = new Mechanism ();
    mechanism.setLOCTYPE (Loctype.URL);
    mechanism.setXlinkHref ("http://host/path");
    behavior.getContent ().add (mechanism);
    mets.getContent ().add (behavior);

    // Validate the whole object, then serialize it to standard output
    mets.validate ();
    mets.marshal (System.out);
  }
}


marshal.xml

<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.loc.gov/METS/
                          http://www.loc.gov/standards/mets/mets.xsd"
      OBJID="1234-5678(2002)9:1&lt;&gt;1.0.CO;9-X"
      LABEL="METS Java toolkit" TYPE="Article">
  <metsHdr CREATEDATE="2002-03-15T161023"
           RECORDSTATUS="DRAFT">
    <agent ROLE="CREATOR">
      <name>S. L. Abrams</name>
      <note>HUL/OIS</note>
      <note>Special order, 2002/02/25</note>
    </agent>
    <altRecordID TYPE="DOI">10.1234/56789</altRecordID>
    <altRecordID TYPE="NRS">nrs:hul.ois:10203</altRecordID>
  </metsHdr>
  <dmdSec ID="xyz-123">
    <mdRef LOCTYPE="DOI" xlink:type="simple"
           xlink:href="10.9876/54321" MDTYPE="DC"
           MIMETYPE="text/xml"/>
    <mdWrap MDTYPE="MARC">
      <binData>AbCdEfGhIjKlMnOpQrStUvWxYz0123456789</binData>
    </mdWrap>
  </dmdSec>
  <amdSec>
    <techMD ID="t-1234">
      <mdWrap MDTYPE="OTHER" OTHERMDTYPE="MyTechMD">
        <xmlData>
          <my:techMD ID="AB123" my:type="TIFF">...technical MD...</my:techMD>
        </xmlData>
      </mdWrap>
    </techMD>
    <rightsMD ID="r-5678">
      <mdWrap MDTYPE="OTHER" OTHERMDTYPE="MyRightsMD">
        <xmlData>
          <my:rightsMD>...rights MD...</my:rightsMD>
          <your:rightsMD>...rights MD...</your:rightsMD>
          <their:rightsMD>...rights MD...</their:rightsMD>
        </xmlData>
      </mdWrap>
    </rightsMD>
    ...


marshal.xml (cont.)

    ...
    <sourceMD ID="s-9012">
      <mdWrap MDTYPE="OTHER" OTHERMDTYPE="MySourceMD">
        <xmlData>
          <my:sourceMD aat:type="178684">...source MD...</my:sourceMD>
        </xmlData>
      </mdWrap>
    </sourceMD>
    <digiprovMD ID="d-3456">
      <mdWrap MDTYPE="OTHER" OTHERMDTYPE="MyDigiprovMD">
        <xmlData>
          <my:digiprovMD>...provenance MD...</my:digiprovMD>
        </xmlData>
      </mdWrap>
    </digiprovMD>
  </amdSec>
  <fileSec>
    <fileGrp ADMID="t-1234 s-9012">
      <file ID="a1b2c3">
        <FLocat LOCTYPE="URN" xlink:type="simple"
                xlink:href="urn:nid:nss"/>
        <FContent>MS0yLTM=</FContent>
      </file>
    </fileGrp>
  </fileSec>
  <structMap ID="A125" LABEL="Individual volumes">
    <div ORDER="25" ORDERLABEL="xxv" TYPE="Chapter">
      <div TYPE="Section">
        <div TYPE="Sub-section">
          <fptr FILEID="a1b2c3"/>
        </div>
      </div>
      <div TYPE="Section">
        <mptr ID="123-45-6789" LOCTYPE="OTHER"
              OTHERLOCTYPE="filepath"
              xlink:type="simple" xlink:href="dir/file.xml"/>
      </div>
    </div>
  </structMap>
  <behaviorSec ID="killerapp" STRUCTID="A125 s-9012">
    <mechanism LOCTYPE="URL" xlink:type="simple"
               xlink:href="http://host/path"/>
  </behaviorSec>
</mets>