Schematron and Other Useful Tools
An Aside: AP’s Ingestion Pipeline
This is greatly simplified, obviously.
ATOM + XHTML
XSLT Transform
APPL + NITF
One way we ingest content:
we transform ATOM and XHTML into
our internal XML (APPL) and NITF
Converting from HTML to XML
<p>The budget was just £100.</p>
<p>How could it be done for so little money?
<p>Luckily open source tools were available.</p>
These are not new problems.</p>
The solutions were even standardized.<p/>
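For comparison, here is one well-formed XHTML rendering of the fragment above. It is hand-written for illustration, on the assumption that each sentence was meant to be its own paragraph. It is not actual HTML Tidy output, which may differ in how it repairs the stray tags.

```xml
<!-- Every paragraph is opened and closed exactly once -->
<p>The budget was just £100.</p>
<p>How could it be done for so little money?</p>
<p>Luckily open source tools were available.</p>
<p>These are not new problems.</p>
<p>The solutions were even standardized.</p>
```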
Hard to enforce rules in the spec
“HeadLine - this element must contain the same
value as the entry’s <title> element”
“summary is required for non-text content items,
such as news photos and video. This element is
optional for text story content items.”
XML structure complies with XSD…
…but can fail in downstream systems
Validate and Fix Prior to Ingestion
Original ATOM + XHTML
Tidy fixes sloppy HTML
Custom XSLT tidies up XML
W3C schema validates structure & syntax
Schematron schema validates business rules
Valid ATOM + XHTML, ready for ingestion
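A minimal sketch of what the "custom XSLT" step could look like: an XSLT 1.0 identity transform with one override. The clean-up rule shown (dropping empty XHTML paragraphs) is a hypothetical example, not the rule set actually used in the pipeline.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">

  <!-- Identity template: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical clean-up: drop XHTML paragraphs that have no child nodes -->
  <xsl:template match="xhtml:p[not(node())]"/>

</xsl:stylesheet>
```

Starting from the identity transform means everything passes through untouched unless a more specific template says otherwise, so each clean-up stays a small, isolated override.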
HTML Tidy
Fix sloppy HTML
HTML -> XHTML
Schematron
Fact checker for XML documents
Business rules that can’t be expressed in W3C XSD schema
• MediaType="Video"
• Format="ANPA1312"
Previously, we had to inspect new feeds to catch errors
The risk is that feeds are approved but errors appear later
(Not to mention manual checking of XML is tedious)
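The two rules quoted from the spec earlier can be sketched as Schematron assertions. This is an illustration under stated assumptions: the `ex` prefix and its namespace URI are hypothetical stand-ins for the feed's real metadata namespace, `HeadLine` and `MediaType` are assumed to be elements somewhere inside each entry, and `'Text'` is an assumed value for text stories. The ISO Schematron namespace is also an assumption; older Schematron 1.5 schemas use a different one.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <sch:ns prefix="atom" uri="http://www.w3.org/2005/Atom"/>
  <!-- Hypothetical namespace for the feed's metadata extension elements -->
  <sch:ns prefix="ex" uri="http://example.com/ns/metadata"/>

  <sch:pattern>
    <sch:rule context="atom:entry">
      <!-- Compare HeadLine and title, ignoring leading, trailing, and repeated whitespace -->
      <sch:assert test="normalize-space(.//ex:HeadLine) = normalize-space(atom:title)">
        HeadLine must contain the same value as the entry's title element
      </sch:assert>
      <!-- Passes if the item is text, or if a non-empty summary is present -->
      <sch:assert test=".//ex:MediaType = 'Text' or normalize-space(atom:summary) != ''">
        summary is required for non-text content items such as news photos and video
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Both assertions sit in a single rule because, within one pattern, a node is only checked against the first rule whose context matches it.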
Schematron
Small, powerful, lightweight fact-checker for XML documents
Schematron Schema
Validate
Specify constraints using XPATH rules
You write the error messages
One-time compile into an XSLT
Validation reports
Validation as an XSLT transform
Presence or absence of specific content
Relationships between elements and attributes
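To make "validation as an XSLT transform" concrete, here is a hand-simplified sketch of the kind of stylesheet a Schematron compiler could produce for the atom:link rule from the "Anatomy of a Schematron Rule" slide. Real compiled output is considerably longer and its report format depends on the implementation; this only shows the core idea.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom">
  <xsl:output method="text"/>

  <!-- The rule's context becomes a template match pattern -->
  <xsl:template match="atom:feed/atom:link">
    <!-- The assert's test is negated: the message is emitted only on failure -->
    <xsl:if test="not(starts-with(@href, 'http://'))">
      <xsl:text>The feed/link/@href must contain an http url&#10;</xsl:text>
    </xsl:if>
  </xsl:template>

  <!-- Suppress text nodes so only validation messages reach the output -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Running this stylesheet against a feed yields an empty result when every link passes, and one message line per failing link otherwise.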
Anatomy of a Schematron Rule
<sch:rule context="atom:feed/atom:link">
<sch:assert test="starts-with(@href, 'http://')">
The feed/link/@href must contain an http url
</sch:assert>
</sch:rule>
Establish the context of the rule with an XPATH expression
XSLT-style test establishes the constraint for each assert
You write the error message to be used if the assert fails
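A rule cannot be compiled on its own; it needs a surrounding schema that declares the namespace prefixes it uses. A minimal sketch of that wrapper follows, assuming ISO Schematron (the slide does not say which Schematron version is in use) and the standard Atom namespace.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <!-- Bind the atom prefix used in the XPath expressions below -->
  <sch:ns prefix="atom" uri="http://www.w3.org/2005/Atom"/>

  <!-- A pattern groups related rules -->
  <sch:pattern>
    <sch:rule context="atom:feed/atom:link">
      <sch:assert test="starts-with(@href, 'http://')">
        The feed/link/@href must contain an http url
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```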
DSDL – Pipeline Validation
Grammar: XSD, RELAX NG
Rules: Schematron
Namespace dispatch: NVDL
Datatype: DTLL
Character repertoire: CREPDL
Document semantics renaming: DSRL
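As an illustration of namespace dispatch, an NVDL script could route the Atom and XHTML parts of a feed to separate schemas. This is a sketch only: the schema file names are hypothetical, and it assumes RELAX NG grammars exist for both vocabularies.

```xml
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0">
  <!-- Elements in the Atom namespace are validated against an Atom grammar -->
  <namespace ns="http://www.w3.org/2005/Atom">
    <validate schema="atom.rng"/>       <!-- hypothetical file name -->
  </namespace>
  <!-- Embedded XHTML content is validated against its own grammar -->
  <namespace ns="http://www.w3.org/1999/xhtml">
    <validate schema="xhtml.rng"/>      <!-- hypothetical file name -->
  </namespace>
</rules>
```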
Still under development
Declaratively specify a pipeline (using XML, naturally)
Similar in concept to
Yahoo! Pipes
BizTalk
But XML specific and a W3C standard
Thanks!