Superset Me—Not: Why the JPTS Is Sufficient if You Use Appropriate Layer Validation
Alexander (“Sasha”) Schwarzman, American Geophysical Union (AGU)
JATS-Con, November 2, 2010
Summary
We built a superset of the NLM Journal Publishing Tag Set (JPTS) to enforce business rules, data types, and house style. Having done that, we realized that a JPTS subset would have been sufficient to meet AGU's needs if it were used in conjunction with an appropriate layer validation technology, such as Schematron.
Alexander (“Sasha”) Schwarzman 2 Superset Me—Not JATS-Con Nov 2, 2010
Contents
• Why we built a JPTS superset
• DTD vs. Schematron
  – Attribute values
  – Number of element occurrences
  – Element position & sequence
  – References
• Lessons learned
Why we built a JPTS superset
• No generic book model
• Lack of familiarity with Schematron
• Lack of mature tool support (running SVRL not a viable option in Production environment)
• Lack of expertise on integrating Schematron with validation against relational DB
• JATS v2.3: no Compound Keywords, not all content models parameterized
DTD vs. Schematron: Attribute values
Requirement: Article type is required and can be one of three types: a regular article (rga), a correction (cor), or an editorial (edt)
Strict DTD
<!ATTLIST article article-type (rga | cor | edt) #REQUIRED >
JPTS
<!ATTLIST article article-type CDATA #IMPLIED >
DTD vs. Schematron: Attribute values (cont’d)
XML instance (contains non-allowed article type)
<article article-type='xxx'/>
Schematron
<rule context="article">
  <assert test="@article-type=('rga','cor','edt')">
    @article-type '<value-of select='@article-type'/>' not allowed, must be 'rga', 'cor', or 'edt'</assert>
</rule>
Schematron message
@article-type 'xxx' not allowed, must be 'rga', 'cor', or 'edt'
DTD vs. Schematron: Number of element occurrences
Requirement: Acknowledgments, if present, must contain exactly one paragraph, except for two journals (journal code ‘ja’ and ‘rg’) where Acknowledgments must contain two paragraphs
Strict DTD
<!ELEMENT ack (p, p?) >
JPTS
<!ELEMENT ack (p*) >
DTD vs. Schematron: Number of occurrences (cont’d)
XML instance (wrong number of paragraphs)
<article>
  ...
  <journal-id>jb</journal-id>
  ...
  <ack>
    <p>Blah</p>
    <p>Blah-blah</p>
  </ack>
</article>
DTD vs. Schematron: Number of occurrences (cont’d)
Schematron (rule order matters: within a pattern, a node is tested only against the first rule whose context matches it)
<rule context="ack[ancestor::*/journal-id=('ja','rg')]">
  <assert test="count(p) eq 2">
    '<name/>' in '<value-of select="ancestor::*/journal-id"/>' must contain exactly two paragraphs</assert>
</rule>
<rule context="ack">
  <assert test="count(p) eq 1">
    '<name/>' in '<value-of select="ancestor::*/journal-id"/>' must contain only one paragraph</assert>
</rule>
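The same decision logic can be sketched outside Schematron to show why the special case has to be tested first. This is only an illustration in Python, not the AGU implementation; the function name `check_ack` is hypothetical, and the sketch assumes the simplified instance from the slides, where `journal-id` and `ack` are direct children of `article`.

```python
import xml.etree.ElementTree as ET

# Journal codes whose Acknowledgments must have two paragraphs (from the requirement).
TWO_PARAGRAPH_JOURNALS = {"ja", "rg"}


def check_ack(article):
    """Return an error message for the article's <ack>, or None if it passes."""
    ack = article.find("ack")
    if ack is None:  # Acknowledgments are optional
        return None
    journal = article.findtext("journal-id", default="")
    paragraphs = len(ack.findall("p"))
    # The special case is tested first, mirroring the order of the two rules.
    if journal in TWO_PARAGRAPH_JOURNALS:
        if paragraphs != 2:
            return f"'ack' in '{journal}' must contain exactly two paragraphs"
        return None
    if paragraphs != 1:
        return f"'ack' in '{journal}' must contain only one paragraph"
    return None


# Simplified instance from the slides: journal 'jb' with two paragraphs.
article = ET.fromstring(
    "<article><journal-id>jb</journal-id>"
    "<ack><p>Blah</p><p>Blah-blah</p></ack></article>"
)
print(check_ack(article))
```

If the general check ran first, a two-paragraph `ack` in journal 'ja' would be rejected even though it is correct.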
DTD vs. Schematron: Number of occurrences (cont’d)
Schematron message
'ack' in 'jb' must contain only one paragraph
DTD vs. Schematron: Element position & sequence
Requirement: If a journal has subject grouping (ToC category, subset) and the article belongs to a special collection (special section, theme), then the subject grouping information must precede the special collection information
Strict DTD
<!ELEMENT article-categories (subject-group*, special-collection?) >
JPTS
<!ELEMENT article-categories (subj-group*) >
DTD vs. Schematron: Element position & sequence (cont’d)
XML instance (wrong sequence of subject groups)
<article-categories>
  <subj-group subj-group-type="special-section">
    <subject content-type="EARLYWARN1">New Methods and Applications of Earthquake Early Warning</subject>
  </subj-group>
  <subj-group subj-group-type="toc-category">
    <subject content-type="SDE">Solid Earth</subject>
  </subj-group>
</article-categories>
DTD vs. Schematron: Element position & sequence (cont’d)
Schematron
<rule context="article-categories/subj-group[@subj-group-type=('special-section','theme')]">
  <assert test="not(following-sibling::subj-group[@subj-group-type=('toc-category','subset')])">
    <name/>/@subj-group-type='<value-of select='@subj-group-type'/>' must appear after a ToC Category or a Subset when either is present</assert>
</rule>
Schematron message
subj-group/@subj-group-type='special-section' must appear after a ToC Category or a Subset when either is present
DTD vs. Schematron: References
Validating references is a challenge:
• Variety vs. the need to enforce editorial style
Strict DTD:
• Fixed element order, no mixed content
• Punctuation, spacing, and face markup are generated on output
JPTS:
• Lots of elements, any order, mixed content
• Punctuation, spacing, and face markup are included in the content
DTD vs. Schematron: References (cont’d)
Strict DTD
<!ELEMENT book-standalone-citation
  ((person-group | string-name), year, source, edition?,
   (person-group | string-name)?, size?, elocation-id?,
   publisher-name, publisher-loc) >
<!ATTLIST book-standalone-citation
  id ID #REQUIRED >
DTD vs. Schematron: References (cont’d)
JPTS
<!ELEMENT mixed-citation
  (#PCDATA | person-group | string-name | year | source | edition | size |
   elocation-id | publisher-name | publisher-loc | ... | ...)* >
<!ATTLIST mixed-citation
  id ID #IMPLIED
  publication-type CDATA #IMPLIED >
DTD vs. Schematron: References (cont’d)
Example:
Mood, A. M., and F. A. Graybill (1963), Introduction to the Theory of Statistics, 2nd ed., 295 pp., McGraw-Hill, New York.
DTD vs. Schematron: References (cont’d)
XML instance (strict DTD)
<book-standalone-citation id="mood63">
  <person-group person-group-type="author">
    <name><surname>Mood</surname> <given-names>A. M.</given-names></name>
    <name><surname>Graybill</surname> <given-names>F. A.</given-names></name>
  </person-group>
  <year>1963</year>
  <source>Introduction to the Theory of Statistics</source>
  <edition>2nd</edition>
  <size units="page">295 pp</size>
  <publisher-name>McGraw-Hill</publisher-name>
  <publisher-loc>New York</publisher-loc>
</book-standalone-citation>
DTD vs. Schematron: References (cont’d)
XML instance (JPTS)
<mixed-citation publication-type="book-standalone">
  <string-name><surname>Mood</surname>, <given-names>A. M.</given-names></string-name>, and
  <string-name><given-names>F. A.</given-names> <surname>Graybill</surname></string-name>
  (<year>1963</year>),
  <source><italic>Introduction to the Theory of Statistics</italic></source>,
  <edition>2</edition>nd ed.,
  <size units="page">295</size> pp.,
  <publisher-name>McGraw-Hill</publisher-name>,
  <publisher-loc>New York</publisher-loc>.
</mixed-citation>
DTD vs. Schematron: References (cont’d)
Schematron can check that all required elements are present and are in the correct sequence (note the required elements and that edition, if present, follows source):
<!ELEMENT book-standalone-citation
  ((person-group | string-name), year, source, edition?,
   (person-group | string-name)?, size?, elocation-id?,
   publisher-name, publisher-loc) >
DTD vs. Schematron: References (cont’d)
• Schematron can check that all required elements are present:
<rule context="mixed-citation[@publication-type='book-standalone']">
  <assert test="(person-group | string-name) and year and source and publisher-name and publisher-loc">
    required element missing</assert>
</rule>
• and that the elements are in the correct sequence (see the following slides)
DTD vs. Schematron: References (cont’d)
XML instance (JPTS) (edition is in the wrong place)
<mixed-citation publication-type="book-standalone">
  <string-name><surname>Mood</surname>, <given-names>A. M.</given-names></string-name>, and
  <string-name><given-names>F. A.</given-names> <surname>Graybill</surname></string-name>
  (<year>1963</year>),
  <edition>2</edition>nd ed.,
  <source><italic>Introduction to the Theory …</italic></source>,
  <size units="page">295</size> pp.,
  <publisher-name>McGraw-Hill</publisher-name>,
  <publisher-loc>New York</publisher-loc>.
</mixed-citation>
DTD vs. Schematron: References (cont’d)
This Schematron uses the positional predicate [1] to check that year is immediately followed by source:
<rule context="mixed-citation[@publication-type='book-standalone']/year">
  <assert test="following-sibling::*[1]/self::source">
    '<name/>' must be immediately followed by 'source', not by '<value-of select='name(following-sibling::*[1])'/>'</assert>
</rule>
Schematron message
'year' must be immediately followed by 'source', not by 'edition'
DTD vs. Schematron: References (cont’d)
But how to check the sequence of required elements when there might be optional elements interspersed between them?
This Schematron checks that the required publisher-name is preceded by the required source, regardless of any optional elements that may occur in between:
<rule context="mixed-citation[@publication-type='book-standalone']/publisher-name">
  <assert test="preceding-sibling::source">
    '<name/>' must be preceded by 'source'</assert>
</rule>
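One way to generalize this pairwise check to all required elements at once is to drop the optional children and compare what remains with the expected order. The Python sketch below only illustrates that idea and is not part of the original slides; `REQUIRED_ORDER` and `required_sequence_errors` are hypothetical names, and the author elements are left out of the required list for brevity.

```python
import xml.etree.ElementTree as ET

# Required children of a stand-alone book citation, in house-style order
# (taken from the strict content model; author elements omitted for brevity).
REQUIRED_ORDER = ["year", "source", "publisher-name", "publisher-loc"]


def required_sequence_errors(citation):
    """List problems with the presence and relative order of required children."""
    # Keep only required names, so optional elements in between are ignored.
    present = [child.tag for child in citation if child.tag in REQUIRED_ORDER]
    errors = [
        f"required element '{name}' missing"
        for name in REQUIRED_ORDER
        if name not in present
    ]
    expected = [name for name in REQUIRED_ORDER if name in present]
    if present != expected:
        errors.append(f"required elements appear as {present}, expected {expected}")
    return errors


# Optional <edition> and <size> sit between required elements and are ignored.
citation = ET.fromstring(
    "<mixed-citation><string-name/><year/><source/><edition/><size/>"
    "<publisher-name/><publisher-loc/></mixed-citation>"
)
print(required_sequence_errors(citation))
```

The filter is what makes interspersed optional elements harmless: only the relative order of the required names is compared.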
DTD vs. Schematron: References (cont’d)
• Rick Jelliffe’s approach combines the flexibility of JPTS with the benefits of a DTD-like fixed element order:
  – Each citation element is rewritten as a string of its child element names
  – The content model is represented as a regular expression
  – Schematron checks the string of names against the regex
  – Schematron generates an error message if the content does not match the model
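The four steps above can be sketched in a few lines. The slides do not show the Schematron for this, so the Python below only illustrates the idea under stated assumptions: the regular expression is hand-written from the strict book-standalone model, the author slot is made repeatable so that citations with two `string-name` elements match, and the names `BOOK_STANDALONE`, `child_names`, and `check_citation` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written regular expression for the book-standalone model; each
# element name is followed by one space so names cannot run together.
BOOK_STANDALONE = re.compile(
    r"(string-name |person-group )+"  # authors (repetition is an assumption)
    r"year source (edition )?"
    r"(string-name |person-group )?"
    r"(size )?(elocation-id )?"
    r"publisher-name publisher-loc "
)


def child_names(element):
    """Rewrite an element as a string of its child element names."""
    return "".join(child.tag + " " for child in element)


def check_citation(citation):
    """Return an error message if the children do not match the model."""
    names = child_names(citation)
    if BOOK_STANDALONE.fullmatch(names):
        return None
    return f"'{citation.tag}' children [{names.strip()}] do not match the model"


# The faulty instance, reduced to its structure: edition precedes source.
bad = ET.fromstring(
    '<mixed-citation publication-type="book-standalone">'
    "<string-name>Mood</string-name>, and <string-name>Graybill</string-name> "
    "(<year>1963</year>), <edition>2</edition>nd ed., "
    "<source>Introduction to the Theory</source>, "
    "<publisher-name>McGraw-Hill</publisher-name>, "
    "<publisher-loc>New York</publisher-loc>.</mixed-citation>"
)
print(check_citation(bad))
```

Text and punctuation between the elements play no part in the check, which is why mixed content can stay in the citation.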
DTD vs. Schematron: References (cont’d)
An XML file, e.g., citation-models.xml, specifies structured citation models:
...
<model publication-type="book-standalone">
  ((string-name | person-group), year, source, edition?,
   (string-name | person-group)?, size?, elocation-id?,
   publisher-name, publisher-loc)
</model>
...
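A model written in this DTD-like notation can be turned into a regular expression mechanically. The Python sketch below shows one possible translation and is not the conversion used in the original work. The `<models>` wrapper element and the function names are assumptions, and `edition` is treated as optional, as in the strict DTD shown earlier.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for citation-models.xml; the <models> wrapper is an assumption.
MODELS_XML = """
<models>
  <model publication-type="book-standalone">
    ((string-name | person-group), year, source, edition?,
     (string-name | person-group)?, size?, elocation-id?,
     publisher-name, publisher-loc)
  </model>
</models>
""".strip()


def model_to_regex(model):
    """Translate a DTD-style content model into a regex over 'name ' tokens."""
    compact = re.sub(r"\s+", "", model)
    # Wrap every element name as a group that also consumes the trailing space.
    wrapped = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # In a regex, sequence is plain concatenation, so the commas go away.
    return wrapped.replace(",", "")


def load_models(xml_text):
    """Map each publication type to its compiled content-model regex."""
    root = ET.fromstring(xml_text)
    return {
        m.get("publication-type"): re.compile(model_to_regex(m.text))
        for m in root.iter("model")
    }


book = load_models(MODELS_XML)["book-standalone"]
# Each citation is represented by its child element names, space-terminated.
print(bool(book.fullmatch("string-name year source publisher-name publisher-loc ")))
print(bool(book.fullmatch("string-name year edition source publisher-name publisher-loc ")))
```

The occurrence indicators `?`, `*`, and `+` and the `|` operator mean the same in both notations, so only the names and the commas need rewriting.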
DTD vs. Schematron: References (cont’d)
• Advantages:
  – Document is still DTD-valid
  – Mixed content is permitted
  – Type-sensitive handling of references is possible
• Caveat: requires XSLT 2.0!
Lessons learned
• AGU Tag Set + Schematron (200+ checks)
  – Ensures data quality
  – Ensures markup integrity
  – Provides control over production processes
• AGU Tag Set is a superset of JPTS
  – Based on JPTS
  – Uses the same modularization principles
  – Can be easily mapped to JPTS
• Were we to do this again, we would develop a JPTS subset and a Schematron schema
Lessons learned (cont’d)
• Appropriate layer validation
  – Even the most “Prussian” DTD can’t enforce all business rules, data types, and house style
  – Rules-based checking needed anyway
  – May as well use “Californian” JPTS (de facto industry standard) adopted by publishers, conversion & composition vendors, archives, etc.
• Paradigm shift: the crux of validation shifts from XML parser to Schematron engine
Lessons learned (cont’d)
• This shift is not without costs:
  – Content may be valid to JPTS but make no sense
  – Dependency on Schematron for semantic integrity
  – Constraints on business partners: must be Schematron-capable and have tools
  – Schematron does not “fix” problems; people do. Processes and procedures must be well-defined
Lessons learned (cont’d)
• Writing a simple Schematron is easy; building a complex and efficient one is not:
  – Elicit, document, convey, and clarify the Requirements
  – Ensure Schematron fits into your workflow
  – Modularize Schematron
  – Ensure that individual Schematron rules aren’t in conflict
  – Optimize Schematron performance
  – Employ XSLT 2.0
  – Test, test, test
  – Cultivate Schematron & XSLT 2.0 expertise in-house
Conclusion
• What about content that is not like a journal article, e.g., generic (non-NCBI) books and their parts/chapters?
• When this deficiency is addressed, the NLM Archiving and Interchange Tag Suite could truly say: “Superset Me—Not!”