METS Java Toolkit
Stephen L. Abrams
Harvard University Library
DLF Spring Forum
May 10-12, 2002, Chicago, IL
DLF Spring Forum 2002
METS Java Toolkit 2
Why Do We Need a Toolkit?
• Automation for an archiving project with multiple content providers.
  – METS used in a hierarchical SIP
  – Client-side tools to produce syntactically valid SIPs
• Use of METS to encapsulate complex objects with multiple content streams.
  – Page turner, currently based on MOA2
Functional Requirements
• Java API to provide support for generic METS.
• Support for:
  – Procedural construction of an in-memory representation
  – Validation
  – Marshalling/unmarshalling to/from instance documents
• Usable as a basis for application-specific tools.
  – Subclass for specific functionality or restrictions
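As a sketch of the subclassing idea, assume a generic element class with a local validateThis() hook; an application-specific subclass can then add a restriction of its own, here a hypothetical rule that archive SIPs must carry an OBJID. The class names GenericMets and ArchiveMets and the rule itself are illustrative assumptions, not part of the toolkit, and the code targets a current JDK rather than JDK 1.3.1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a generic toolkit element class. */
class GenericMets {
    private String objid;
    private final List<Object> content = new ArrayList<>();

    public String getOBJID() { return objid; }
    public void setOBJID(String objid) { this.objid = objid; }
    public List<Object> getContent() { return content; }

    /** Generic local checks (none are needed for this sketch). */
    public void validateThis() { }
}

/** Hypothetical application-specific subclass that tightens the generic rules. */
class ArchiveMets extends GenericMets {
    @Override
    public void validateThis() {
        super.validateThis();               // keep whatever the generic class checks
        String id = getOBJID();
        if (id == null || id.isEmpty()) {   // hypothetical local restriction
            throw new IllegalStateException("OBJID is required for archive SIPs");
        }
    }
}
```

The generic class stays usable as-is, while the archive-specific tool rejects objects the generic METS rules would accept.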
JAXB
• API based on Sun’s JAXB specification, but not the tools.
Diagram: source schema + binding schema → JAXB compiler → schema classes, which rely on the JAXB bind and JAXB marshal packages
Toolkit API
• Each schema element corresponds to a class.
Mets mets = new Mets();
• Accessor/mutator methods for each attribute.
mets.setID(id);
String id = mets.getID();
• Accessor/mutator methods for the content model.
List content = mets.getContent();
content.add(child);
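The three conventions can be illustrated with a minimal, hypothetical stand-in for a toolkit element class. Only ID, OBJID and a List-based content model are shown, written for a current JDK; the toolkit's own classes carry every METS attribute and also marshal and validate themselves.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one class per schema element (here <mets>). */
class Mets {
    private String id;                                       // ID attribute
    private String objid;                                    // OBJID attribute
    private final List<Object> content = new ArrayList<>();  // child elements, in document order

    public String getID() { return id; }
    public void setID(String id) { this.id = id; }
    public String getOBJID() { return objid; }
    public void setOBJID(String objid) { this.objid = objid; }

    /** Returns the live list: adding to it adds a child to this element. */
    public List<Object> getContent() { return content; }
}

/** Hypothetical child element class (here <metsHdr>). */
class MetsHdr {
    private final List<Object> content = new ArrayList<>();
    public List<Object> getContent() { return content; }
}
```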
Toolkit API UML
Package javax.xml.bind
• «interface» Element: +validateThis()
• «interface» IdentifiableElement: +id() : String
• «interface» RootElement: +validate()
• Marshaller: +marshal(in mob : MarshallableObject), +writer() : XMLWriter
• Unmarshaller: +scanner() : XMLScanner, +unmarshal() : MarshallableObject
• ValidatableObject: -_valid : bool; +invalidate(), +validate(), +validateThis()
• MarshallableObject: +marshal(), +unmarshal()
• MarshallableRootElement: +marshal(), +validate()
• PCData: +chars() : String, +chars(in chars : String)
• Validator: +validate(in vob : ValidatableObject)
Package javax.xml.marshal
• XMLScanner: +atAttribute(), +atAttributeValue(), +atEnd(), +atStart(), +atChars() : boolean; +takeAttributeName(), +takeAttributeValue(), +takeChars() : String; +takeEmpty(); +takeEnd(), +takeStart() : String
• XMLWriter: +chars(in chars : String), +end(in name : String), +flush(), +leaf(), +start(in name : String), +attribute(in name : String, in value : String)
Package mets
• Mets: -_ID, -_OBJID, -_LABEL, -_TYPE, -_PROFILE : String; -_content : List; +get*(), +set*(), +marshal(in m : Marshaller), +validate(in v : Validator), +validateThis(), +unmarshal(in u : Unmarshaller)
• MetsHdr, FileSec, StructMap, DmdSec, AmdSec, BehaviorSec, ...
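The UML lists a private _valid flag alongside invalidate(), validate() and validateThis() on ValidatableObject. One plausible reading, sketched here as an assumption rather than the toolkit's actual code, is a cached-validity pattern: validate() runs the local check only when the object is not already known to be valid, and every mutator calls invalidate(). The Agent class, its required-ROLE rule and the checks counter are hypothetical illustrations.

```java
/** Hypothetical reading of ValidatableObject's cached-validity design. */
abstract class ValidatableObject {
    private boolean valid = false;   // plays the role of -_valid : bool in the UML

    /** Local, element-specific check; throws if the object is invalid. */
    protected abstract void validateThis();

    /** Re-runs validateThis() only if the object changed since the last successful check. */
    public void validate() {
        if (!valid) {
            validateThis();
            valid = true;            // reached only if validateThis() did not throw
        }
    }

    /** Called by mutators so that the next validate() re-checks the object. */
    public void invalidate() { valid = false; }
}

/** Hypothetical element with one required attribute. */
class Agent extends ValidatableObject {
    private String role;
    int checks = 0;                  // counts validateThis() calls, for illustration only

    public void setROLE(String role) {
        this.role = role;
        invalidate();                // any change forces re-validation
    }

    @Override
    protected void validateThis() {
        checks++;
        if (role == null) throw new IllegalStateException("ROLE is required");
    }
}
```

Calling validate() twice on an unchanged object runs validateThis() only once, which keeps repeated validation of a large in-memory tree cheap.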
Why Do We Need a New API?
• Why not use DOM?
  – Unnatural unit of granularity: elements and attributes are both nodes in the DOM tree
• Why not JDOM?
  – Need explicit support for validation
• JAXB compiler could (potentially) be used to support METS upgrades.
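The granularity point can be seen with the JDK's standard org.w3c.dom API: an attribute is an Attr, which extends Node, so a METS attribute such as OBJID is reached as a node and told apart from elements by its node type, rather than through a typed accessor like getOBJID(). The class name DomGranularity is arbitrary; the code uses only standard JDK calls.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomGranularity {
    /** Parses a small METS fragment and returns its OBJID attribute as a DOM node. */
    static Attr objidNode(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        DocumentBuilder b = f.newDocumentBuilder();
        Document doc = b.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        return root.getAttributeNode("OBJID");   // an attribute, but also a Node
    }

    public static void main(String[] args) throws Exception {
        String xml = "<mets xmlns=\"http://www.loc.gov/METS/\" OBJID=\"1234\"><metsHdr/></mets>";
        Attr a = objidNode(xml);
        // Elements and attributes share the Node type; getNodeType() distinguishes them.
        System.out.println(a.getNodeType() == Node.ATTRIBUTE_NODE);  // true
        System.out.println(a.getValue());                            // 1234
    }
}
```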
Procedural Construction
• The initial current element is <mets>
• For each child element in the current element’s content model:
  – Instantiate an appropriate element object
  – Set its attributes
  – Define its content model
  – Add it to the content model of its parent
Procedural Construction (Ex.)
Mets mets = new Mets();
mets.setID ("1234");
MetsHdr metsHdr = new MetsHdr();
metsHdr.setCREATEDATE(new Date());
Agent agent = new Agent();
agent.setROLE(Role.CREATOR);
Name name = new Name ();
name.getContent().add(new PCData ("S. Abrams"));
agent.getContent().add(name);
metsHdr.getContent().add(agent);
mets.getContent().add(metsHdr);
...
Validation
• Global
  – ID uniqueness
  – IDREF-to-ID consistency
• Local
  – Existence of required attributes and content model elements
Mets mets = new Mets();
...
mets.validate ();
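The two global rules can be sketched independently of the toolkit's classes by flattening each element to its ID (if any) and the IDREF values it carries, such as FILEID, ADMID or STRUCTID. The GlobalValidation class and its Item record are hypothetical illustrations of the rules, not the toolkit's validate(), and the code assumes a current JDK.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GlobalValidation {
    /** Hypothetical flattened view of one element: its ID (may be null) and its IDREFs. */
    static final class Item {
        final String id;
        final List<String> idrefs;
        Item(String id, List<String> idrefs) { this.id = id; this.idrefs = idrefs; }
    }

    /** Returns the problems found; an empty list means both global rules hold. */
    static List<String> check(List<Item> items) {
        List<String> problems = new ArrayList<>();
        Set<String> ids = new HashSet<>();
        // Pass 1: ID uniqueness.
        for (Item it : items) {
            if (it.id != null && !ids.add(it.id)) {
                problems.add("duplicate ID: " + it.id);
            }
        }
        // Pass 2: IDREF-to-ID consistency, run only after every ID is known.
        for (Item it : items) {
            for (String ref : it.idrefs) {
                if (!ids.contains(ref)) problems.add("dangling IDREF: " + ref);
            }
        }
        return problems;
    }
}
```

All IDs are collected before any IDREF is resolved because a reference may appear before its target in document order.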
Marshalling
• Serializing the in-memory representation to an output stream.
Mets mets = new Mets();
...
FileOutputStream out = new FileOutputStream("mets.xml");
mets.validate ();
mets.marshal(out);
out.close();
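Marshalling ultimately reduces to writer calls like the start, attribute, chars and end methods listed for XMLWriter in the toolkit's UML. The sketch below is a hypothetical, simplified writer (no pretty-printing, no namespace handling, encoding left to the underlying Writer) showing two details any such writer must get right: deferring the end of a start tag until its attributes are written, and escaping markup characters. A literal < is not allowed inside an XML attribute value, so an OBJID such as the one in Marshal.java has to be written with &lt;.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical minimal writer in the spirit of XMLWriter's start/attribute/chars/end calls. */
class TinyXmlWriter {
    private final Writer out;
    private final Deque<String> open = new ArrayDeque<>(); // names of unclosed elements
    private boolean inStartTag = false;                    // true while attributes may still be added

    TinyXmlWriter(Writer out) { this.out = out; }

    void start(String name) throws IOException {
        closeStartTag();
        out.write("<" + name);
        open.push(name);
        inStartTag = true;
    }

    void attribute(String name, String value) throws IOException {
        if (!inStartTag) throw new IllegalStateException("attribute outside a start tag");
        out.write(" " + name + "=\"" + escape(value, true) + "\"");
    }

    void chars(String text) throws IOException {
        closeStartTag();
        out.write(escape(text, false));
    }

    void end(String name) throws IOException {
        String expected = open.pop();
        if (!expected.equals(name)) throw new IllegalStateException("expected </" + expected + ">");
        if (inStartTag) {            // element had no content: emit an empty-element tag
            out.write("/>");
            inStartTag = false;
        } else {
            out.write("</" + name + ">");
        }
    }

    private void closeStartTag() throws IOException {
        if (inStartTag) { out.write(">"); inStartTag = false; }
    }

    /** Escapes markup characters; double quotes only need escaping inside attribute values. */
    static String escape(String s, boolean inAttribute) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append(inAttribute ? "&quot;" : "\""); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }
}
```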
Unmarshalling
• Parsing an instance document and creating an in-memory representation.
• Implicit local validation during parsing; global validation must be explicit.
• Internal parsing with James Clark’s XP.
FileInputStream in = new FileInputStream("mets.xml");
Mets mets = Mets.unmarshal(in);
mets.validate ();
...
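The toolkit parses with XP behind the XMLScanner interface, stepping through start tags, attributes and character data. As a present-day sketch of the same pull-style idea, the JDK's StAX API (javax.xml.stream) can do the stepping; here only the attributes of the root <mets> element are read into a map. This swaps in a different parser, is not the toolkit's unmarshalling code, and performs no validation; the class name RootAttributes is hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class RootAttributes {
    /** Returns the attributes of the document element, in document order. */
    static Map<String, String> read(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {  // first start tag is the root
                    Map<String, String> attrs = new LinkedHashMap<>();
                    // Namespace declarations (xmlns=...) are reported separately, not as attributes.
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        attrs.put(r.getAttributeLocalName(i), r.getAttributeValue(i));
                    }
                    return attrs;
                }
            }
            throw new XMLStreamException("no root element");
        } finally {
            r.close();
        }
    }
}
```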
Extension Schemas
• Toolkit could be extended to include explicit support for additional schemas.
• Generic namespace-aware Any class:
Any any = new Any("elem");
any.setAttribute("attr", value);
String attr = any.getAttribute("attr");
any.getContent().add(child);
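A generic element class needs little more than a namespace prefix, a local name, an attribute map and a content list. The sketch below is a hypothetical stand-in that mirrors the calls above (setAttribute, getAttribute, getContent); the two-argument (prefix, local name) constructor follows the form used in Marshal.java, and the qualifiedName() helper is an addition for illustration. Attribute values are kept as strings here for simplicity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical generic element for content from extension schemas. */
class Any {
    private final String prefix;                                           // null if unprefixed
    private final String localName;
    private final Map<String, String> attributes = new LinkedHashMap<>(); // keeps insertion order
    private final List<Object> content = new ArrayList<>();               // child Any objects or text

    Any(String localName) { this(null, localName); }
    Any(String prefix, String localName) { this.prefix = prefix; this.localName = localName; }

    void setAttribute(String name, String value) { attributes.put(name, value); }
    String getAttribute(String name) { return attributes.get(name); }
    List<Object> getContent() { return content; }

    /** For example "my:techMD", or just "elem" when there is no prefix. */
    String qualifiedName() { return prefix == null ? localName : prefix + ":" + localName; }
}
```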
Additional Work
• To be done any day now…
  – Support for <area>, <par>, and <seq>
  – Strict validation of sequence ordering
  – Marshal non-UTF-8 encodings
  – Base64 encoding/decoding methods for binData and FContent
  – Support for entity references
  – Diagnostic error messages
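The Base64 item concerns the text content of <binData> and <FContent>. Present-day Java ships java.util.Base64 (added in Java 8, so not available to a JDK 1.3.1 toolkit); the sketch below, with hypothetical helper names, shows the round trip. The FContent value in Marshal.java, MS0yLTM=, decodes to the five bytes of the ASCII string 1-2-3.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Content {
    /** Encodes raw bytes as text suitable for <binData> or <FContent>. */
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Decodes <binData>/<FContent> text; the MIME decoder skips line breaks in wrapped content. */
    static byte[] decode(String text) {
        return Base64.getMimeDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] bytes = decode("MS0yLTM=");
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // 1-2-3
        System.out.println(encode(bytes));                                 // MS0yLTM=
    }
}
```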
Distribution
• HUL’s intent is to make the toolkit freely available under an Open Source license.
• Minimal support (if any).
• Community process for maintenance?
• Does an appropriate organizational home exist?
Implementation
• METS schema, Version 1.0 (zeta)
• JAXB specification, Version 0.21 <http://java.sun/xml/jaxb>
• XP, Version 0.5 <http://jclark.com/xml/xp>
• Java J2SE and JDK 1.3.1
• Solaris 2.7
• Home page: <http://hul.harvard.edu/mets>
Marshal.java
import java.util.*;
import org.mets.xml.bind.*;
import org.mets.xml.mets.*;
public class Marshal
{ public static void main (String [] args)
{
Mets mets = new Mets ();
mets.setOBJID ("1234-5678(2002)9:1<>1.0.CO;9-X");
mets.setLABEL ("METS Java toolkit");
mets.setTYPE ("Article");
MetsHdr metsHdr = new MetsHdr ();
metsHdr.setCREATEDATE (new Date ());
metsHdr.setRECORDSTATUS ("DRAFT");
Agent agent = new Agent ();
agent.setROLE (Role.CREATOR);
Name name = new Name ();
name.getContent ().add (new
PCData ("S. L. Abrams"));
agent.getContent ().add (name);
Note note = new Note ();
note.getContent ().add (new
PCData ("HUL/OIS"));
agent.getContent ().add (note);
note = new Note ();
note.getContent ().add (new
PCData ("Special order, 2002/02/25"));
agent.getContent ().add (note);
metsHdr.getContent ().add (agent);
AltRecordID doi = new AltRecordID ();
doi.setTYPE ("DOI");
doi.getContent ().add (new
PCData ("10.1234/56789"));
AltRecordID nrs = new AltRecordID ();
nrs.setTYPE ("NRS");
nrs.getContent ().add (new
PCData ("nrs:hul.ois:10203"));
metsHdr.getContent ().add (doi);
metsHdr.getContent ().add (nrs);
mets.getContent ().add (metsHdr);
DmdSec dmdSec = new DmdSec ();
dmdSec.setID ("xyz-123");
MdRef mdRef = new MdRef ();
mdRef.setLOCTYPE (Loctype.DOI);
mdRef.setMDTYPE (Mdtype.DC);
mdRef.setMIMETYPE ("text/xml");
mdRef.setXlinkHref ("10.9876/54321");
dmdSec.getContent ().add (mdRef);
MdWrap mdWrap = new MdWrap ();
mdWrap.setMDTYPE (Mdtype.MARC);
BinData binData = new BinData ();
binData.getContent ().add (new
PCData ("AbC…Yz0123456789"));
mdWrap.getContent ().add (binData);
dmdSec.getContent ().add (mdWrap);
mets.getContent ().add (dmdSec);
AmdSec amdSec = new AmdSec ();
TechMD techMD = new TechMD ();
techMD.setID ("t-1234");
mdWrap = new MdWrap ();
mdWrap.setMDTYPE (Mdtype.OTHER);
mdWrap.setOTHERMDTYPE ("MyTechMD");
XmlData xmlData = new XmlData ();
Any any = new Any ("my", "techMD");
any.getAttributes ().add (new
Attribute ("ID", "AB123"));
any.getAttributes ().add (new
Attribute ("my", "type", "TIFF"));
any.getContent ().add (new
PCData ("...technical MD..."));
xmlData.getContent ().add (any);
mdWrap.getContent ().add (xmlData);
techMD.getContent ().add (mdWrap);
amdSec.getContent ().add (techMD);
RightsMD rightsMD = new RightsMD ();
rightsMD.setID ("r-5678");
mdWrap = new MdWrap ();
mdWrap.setMDTYPE (Mdtype.OTHER);
mdWrap.setOTHERMDTYPE ("MyRightsMD");
xmlData = new XmlData ();
any = new Any ("my", "rightsMD");
any.getContent ().add (new
PCData ("...rights MD..."));
xmlData.getContent ().add (any);
any = new Any ("your", "rightsMD");
any.getContent ().add (new
PCData ("...rights MD..."));
xmlData.getContent ().add (any);
any = new Any ("their", "rightsMD");
any.getContent ().add (new
PCData ("...rights MD..."));
xmlData.getContent ().add (any);
mdWrap.getContent ().add (xmlData);
rightsMD.getContent ().add (mdWrap);
amdSec.getContent ().add (rightsMD);
SourceMD sourceMD = new SourceMD ();
sourceMD.setID ("s-9012");
mdWrap = new MdWrap ();
mdWrap.setMDTYPE (Mdtype.OTHER);
mdWrap.setOTHERMDTYPE ("MySourceMD");
xmlData = new XmlData ();
any = new Any ("my", "sourceMD");
any.getAttributes ().add (new
Attribute ("aat", "type",
new Integer (178684)));
any.getContent ().add (new
PCData ("...source MD..."));
xmlData.getContent ().add (any);
mdWrap.getContent ().add (xmlData);
sourceMD.getContent ().add (mdWrap);
amdSec.getContent ().add (sourceMD);
DigiprovMD digiprovMD = new DigiprovMD ();
digiprovMD.setID ("d-3456");
mdWrap = new MdWrap ();
mdWrap.setMDTYPE (Mdtype.OTHER);
mdWrap.setOTHERMDTYPE ("MyDigiprovMD");
xmlData = new XmlData ();
any = new Any ("my", "digiprovMD");
any.getContent ().add (new
PCData ("...provenance MD..."));
xmlData.getContent ().add (any);
mdWrap.getContent ().add (xmlData);
digiprovMD.getContent ().add (mdWrap);
amdSec.getContent ().add (digiprovMD);
mets.getContent ().add (amdSec);
FileSec fileSec = new FileSec ();
FileGrp fileGrp = new FileGrp ();
fileGrp.getADMID ().add ("t-1234");
fileGrp.getADMID ().add ("s-9012");
File file = new File ();
file.setID ("a1b2c3");
FLocat flocat = new FLocat ();
flocat.setLOCTYPE (Loctype.URN);
flocat.setXlinkHref ("urn:nid:nss");
file.getContent (). add (flocat);
FContent fcontent = new FContent ();
fcontent.getContent ().add (new
PCData ("MS0yLTM="));
file.getContent ().add (fcontent);
fileGrp.getContent ().add (file);
fileSec.getContent ().add (fileGrp);
mets.getContent ().add (fileSec);
StructMap structMap = new StructMap ();
structMap.setID ("A125");
structMap.setLABEL ("Individual volumes");
Div div = new Div ();
div.setORDER (25);
div.setORDERLABEL ("xxv");
div.setTYPE ("Chapter");
Div sec = new Div ();
sec.setTYPE ("Section");
Div sub = new Div ();
sub.setTYPE ("Sub-section");
Fptr fptr = new Fptr ();
fptr.setFILEID ("a1b2c3");
sub.getContent ().add (fptr);
sec.getContent ().add (sub);
div.getContent ().add (sec);
sec = new Div ();
sec.setTYPE ("Section");
Mptr mptr = new Mptr ();
mptr.setID ("123-45-6789");
mptr.setLOCTYPE (Loctype.OTHER);
mptr.setOTHERLOCTYPE ("filepath");
mptr.setXlinkHref ("dir/file.xml");
sec.getContent ().add (mptr);
div.getContent ().add (sec);
structMap.getContent ().add (div);
mets.getContent ().add (structMap);
BehaviorSec behavior = new BehaviorSec ();
behavior.setID ("killerapp");
behavior.getSTRUCTID ().add ("A125");
behavior.getSTRUCTID ().add ("s-9012");
Mechanism mechanism = new Mechanism ();
mechanism.setLOCTYPE (Loctype.URL);
mechanism.setXlinkHref ("http://host/path");
behavior.getContent ().add (mechanism);
mets.getContent ().add (behavior);
mets.validate ();
mets.marshal (System.out);
}
}
marshal.xml
<mets xmlns="http://www.loc.gov/METS/"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.loc.gov/METS/
http://www.loc.gov/standards/mets/mets.xsd"
OBJID="1234-5678(2002)9:1<>1.0.CO;9-X"
LABEL="METS Java toolkit" TYPE="Article">
<metsHdr CREATEDATE="2002-03-15T161023"
RECORDSTATUS="DRAFT">
<agent ROLE="CREATOR">
<name>S. L. Abrams</name>
<note>HUL/OIS</note>
<note>Special order, 2002/02/25</note>
</agent>
<altRecordID TYPE="DOI">10.1234/56789</altRecordID>
<altRecordID TYPE="NRS">nrs:hul.ois:10203</altRecordID>
</metsHdr>
<dmdSec ID="xyz-123">
<mdRef LOCTYPE="DOI" xlink:type="simple"
xlink:href="10.9876/54321" MDTYPE="DC"
MIMETYPE="text/xml"/>
<mdWrap MDTYPE="MARC">
<binData>AbCdEfGhIjKlMnOpQrStUvWxYz0123456789</binData>
</mdWrap>
</dmdSec>
<amdSec>
<techMD ID="t-1234">
<mdWrap MDTYPE="OTHER" OTHERMDTYPE="MyTechMD">
<xmlData>
<my:techMD ID="AB123" my:type="TIFF">...technical MD...</my:techMD>
</xmlData>
</mdWrap>
</techMD>
<rightsMD ID="r-5678">
<mdWrap MDTYPE="OTHER" OTHERMDTYPE="MyRightsMD">
<xmlData>
<my:rightsMD>...rights MD...</my:rightsMD>
<your:rightsMD>...rights MD...</your:rightsMD>
<their:rightsMD>...rights MD...</their:rightsMD>
</xmlData>
</mdWrap>
</rightsMD>
<sourceMD ID="s-9012">
<mdWrap MDTYPE="OTHER" OTHERMDTYPE="MySourceMD">
<xmlData>
<my:sourceMD aat:type="178684">...source MD...</my:sourceMD>
</xmlData>
</mdWrap>
</sourceMD>
<digiprovMD ID="d-3456">
<mdWrap MDTYPE="OTHER" OTHERMDTYPE="MyDigiprovMD">
<xmlData>
<my:digiprovMD>...provenance MD...</my:digiprovMD>
</xmlData>
</mdWrap>
</digiprovMD>
</amdSec>
<fileSec>
<fileGrp ADMID="t-1234 s-9012">
<file ID="a1b2c3">
<FLocat LOCTYPE="URN" xlink:type="simple"
xlink:href="urn:nid:nss"/>
<FContent>MS0yLTM=</FContent>
</file>
</fileGrp>
</fileSec>
<structMap ID="A125" LABEL="Individual volumes">
<div ORDER="25" ORDERLABEL="xxv" TYPE="Chapter">
<div TYPE="Section">
<div TYPE="Sub-section">
<fptr FILEID="a1b2c3"/>
</div>
</div>
<div TYPE="Section">
<mptr ID="123-45-6789" LOCTYPE="OTHER"
OTHERLOCTYPE="filepath"
xlink:type="simple" xlink:href="dir/file.xml"/>
</div>
</div>
</structMap>
<behaviorSec ID="killerapp" STRUCTID="A125 s-9012">
<mechanism LOCTYPE="URL" xlink:type="simple"
xlink:href="http://host/path"/>
</behaviorSec>
</mets>