
MHE

MHE - the print2image2Internet consultants

Combined XML, SGML Issues

William J. ‘Bill’ McCalpin

MIT, LIT, CDIA, EDP

AIIM 2002 - March 6, 2002

About MHE

• MHE is the “print2image2Internet” consulting firm

• MHE’s principals have nearly 40 years of experience in electronic print streams, in taking electronic print streams to imaging systems, and now in taking legacy information to the Internet

• See http://www.mhe-consulting.com

About the Speaker

• William J. ‘Bill’ McCalpin is a principal at MHE

• Mr. McCalpin was the first - and for years the only - person in the world to have the MIT, LIT, CDIA, and EDP designations

• Mr. McCalpin serves on the AIIM Accreditation Committee and AIIM Conference Committee

About the Speaker (cont.)

• Mr. McCalpin is on the Xplor Board of Directors and is Treasurer

• Mr. McCalpin recently completed a two-year stint as Xploration Editor-in-Chief

• Mr. McCalpin is a frequent speaker at both AIIM and Xplor

What Do You Say When They Ask You, “When Are You Going To Support XML?”

But The Real Question Is, “Why Should I Support XML?”

Agenda

• What is XML?
• What do we do in “e-Business”?
• When do you want to use XML?
• The Right Way and the Wrong Way to use XML
• The Flow of Information
• The XML Bubble
• The answer to “when” and “why”

What is XML?

XML And SGML

• XML is eXtensible Markup Language

• XML is a simplified subset (a profile) of SGML, the Standard Generalized Markup Language, an ISO standard (ISO 8879)

• XML is “extensible” because its tag set is not fixed: people and enterprises with common interests get together to define the tags which describe their data

XML and HTML

• HTML is a tagged language, but its tags are a fixed, predefined set of “grammatical” tags like <p> or <h1>

• XML is a tagged language, and its tags are (usually) created and agreed on by “domains” or vertical industry segments, e.g., <account_number> or <city>

The ‘Document’

• A document is “an organized collection of information in time”

• A document contains information that can be understood by a human or a machine, and that is valid for some period of time

• The information in a document can be organized in many ways - as text, bitmaps, print streams, tagged languages, etc.

The New Document

• Per this definition, the document:

– does not depend on which organization of the information is used (so long as author and recipient agree)

– does not depend on the medium (paper, film, optical, magnetic, or even parchment are all fine)

– does not have to have presentation information, because the recipient may be a machine

Three Parts of an XML ‘Document’

Tagged Data (in XML)

Presentation (in XSL or CSS)

Tag Definitions (in DTD or Schema)

The XML Document

• Data - data values bounded by XML tags

• Presentation:
– CSS - Cascading Style Sheets, as used with HTML
– XSL - Extensible Stylesheet Language, formatting information written in XML

• Tag Definitions:
– DTD - Document Type Definition, the older SGML method
– Schema - definitions written in XML itself

Data In the XML Document

• Data is the purpose of an XML document

• Each piece of data is specifically identified by a tag

• Data is organized because the tags match patterns in the DTD or Schema

• An example of data in XML:

Data Example in XML

<AUTHOR>
  <NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
  <JOBTITLE>Principal</JOBTITLE>
  <AFFILIATION>MHE</AFFILIATION>
  <ADDRESS>
    <STREET>1400 Cheyenne Dr.</STREET>
    <CITY>Richardson</CITY>
    <STATE>Texas</STATE>
    <ZIPCODE>75080</ZIPCODE>
    <EMAIL>[email protected]</EMAIL>
  </ADDRESS>
</AUTHOR>
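To make the point that each value is identified by its tag rather than by its position, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is an illustration, not part of the original talk: the record is the one above minus the EMAIL element, and read_author and its dictionary keys are hypothetical names.

```python
import xml.etree.ElementTree as ET

# The slide's AUTHOR record, minus the EMAIL element (its value is
# obfuscated in the source and cannot be recovered).
AUTHOR_XML = """
<AUTHOR>
  <NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
  <JOBTITLE>Principal</JOBTITLE>
  <AFFILIATION>MHE</AFFILIATION>
  <ADDRESS>
    <STREET>1400 Cheyenne Dr.</STREET>
    <CITY>Richardson</CITY>
    <STATE>Texas</STATE>
    <ZIPCODE>75080</ZIPCODE>
  </ADDRESS>
</AUTHOR>
"""


def read_author(xml_text: str) -> dict:
    """Pull values out by the tags that identify them, not by position."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.findtext("NAME"),
        "affiliation": root.findtext("AFFILIATION"),
        # Nested tags are addressed with a path that follows the structure.
        "city": root.findtext("ADDRESS/CITY"),
        "zipcode": root.findtext("ADDRESS/ZIPCODE"),
    }


if __name__ == "__main__":
    print(read_author(AUTHOR_XML))
```

If the ADDRESS block moved elsewhere in the record, the lookups would still work, because they depend only on the tag names and nesting.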

Presentation in XML

• Tags in XML have no built-in formatting (unlike HTML), so if presentation is needed, it must be explicitly defined

• CSS can be used for both HTML and XML

• XSL is itself written in XML, so it can be parsed by an XML parser; its transformation language, XSLT, can turn XML into other formats such as HTML

• XSL example:

Presentation Example

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="AUTHOR">
    <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0" ...>
      <TR>
        <TD COLSPAN="2">
          <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0" ...>
          <FONT COLOR="#000000"><xsl:value-of select="NAME"/></FONT>
        </TD>
        ...
  </xsl:template>
</xsl:stylesheet>
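Python's standard library has no XSLT processor, so the sketch below swaps in plain Python for the stylesheet; it is not XSL. It shows the same separation the slide describes: the data carries no formatting, and a separate rendering step supplies the table and font markup. author_to_html is a hypothetical name, and the tag names are upper case to match the data example, since XML tag names are case-sensitive.

```python
import xml.etree.ElementTree as ET

# Abridged AUTHOR record; tag names follow the data example.
DATA = "<AUTHOR><NAME>William J. McCalpin</NAME><AFFILIATION>MHE</AFFILIATION></AUTHOR>"


def author_to_html(xml_text: str) -> str:
    """Render each child of AUTHOR as one row of an HTML table.

    A Python stand-in for an xsl:template matching AUTHOR: all of the
    presentation (table, border, font colour) is supplied here; none of
    it comes from the data.
    """
    root = ET.fromstring(xml_text)
    table = ET.Element("TABLE", {"WIDTH": "100%", "BORDER": "1", "CELLSPACING": "0"})
    for child in root:
        row = ET.SubElement(table, "TR")
        ET.SubElement(row, "TD").text = child.tag  # label column
        cell = ET.SubElement(row, "TD")
        font = ET.SubElement(cell, "FONT", {"COLOR": "#000000"})
        font.text = child.text  # value column, like xsl:value-of
    return ET.tostring(table, encoding="unicode")


if __name__ == "__main__":
    print(author_to_html(DATA))
```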

Why Two Style Sheet Languages?

Style Sheet Format           CSS    XSL
Can be used with HTML?       Yes    No
Can be used with XML?        Yes    Yes
Transformation language?     No     Yes
Syntax                       CSS    XML

DTD/Schema in XML

• The DTD is the “old” (SGML) way of defining not only which tags are valid, but also their relative order, their number, mandatory/optional attributes, and so on

• The Schema is a complete redesign, written in XML itself, which defines all of the above as well as the legal values for a tag’s content (e.g., integer, date, days of the week, etc.)
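A DTD cannot say that a ZIP code must be a number or that a date must be a real date; a Schema can. Python's standard library has no Schema validator, so this is a hand-rolled sketch of the idea only. The element names DUE_DATE and DELIVERY_DAY, the RULES table, and datatype_errors are hypothetical.

```python
import datetime
import xml.etree.ElementTree as ET


def is_integer(value: str) -> bool:
    try:
        int(value)
        return True
    except ValueError:
        return False


def is_date(value: str) -> bool:
    try:
        datetime.date.fromisoformat(value)  # expects YYYY-MM-DD
        return True
    except ValueError:
        return False


WEEKDAYS = {"Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"}

# Hypothetical element-to-datatype rules, standing in for Schema datatypes.
RULES = {
    "ZIPCODE": is_integer,
    "DUE_DATE": is_date,
    "DELIVERY_DAY": lambda value: value in WEEKDAYS,
}


def datatype_errors(xml_text: str) -> list:
    """Return one message per element whose text is not a legal value."""
    errors = []
    for elem in ET.fromstring(xml_text).iter():
        check = RULES.get(elem.tag)
        if check is not None and not check((elem.text or "").strip()):
            errors.append(f"{elem.tag}: {elem.text!r}")
    return errors
```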

Schema Example

<?xml version="1.0"?>
<Schema name="sample_schema" ...>
  ...
  <!-- ********** Element Types ************ -->
  <!-- *** data *** -->
  <ElementType name="author">
    <element type="name" minOccurs="1" maxOccurs="1"/>
  </ElementType>
  ...
</Schema>
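The minOccurs/maxOccurs attributes above say how many times a child element may appear. Python's standard library does not validate DTDs or Schemas, so here is a hand-rolled sketch of just that one rule, assuming the content model from the example (an author contains exactly one name). CONTENT_MODEL and occurrence_errors are hypothetical names, and None stands in for an unbounded maxOccurs="*".

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical content model mirroring the ElementType above:
# child tag -> (minOccurs, maxOccurs); None means unbounded ("*").
CONTENT_MODEL = {
    "author": {"name": (1, 1)},
}


def occurrence_errors(xml_text: str) -> list:
    """Report children that appear too few or too many times."""
    errors = []
    for elem in ET.fromstring(xml_text).iter():
        model = CONTENT_MODEL.get(elem.tag)
        if model is None:
            continue  # no rule declared for this element
        counts = Counter(child.tag for child in elem)
        for child_tag, (min_occurs, max_occurs) in model.items():
            n = counts[child_tag]
            if n < min_occurs or (max_occurs is not None and n > max_occurs):
                upper = "*" if max_occurs is None else max_occurs
                errors.append(
                    f"<{elem.tag}> has {n} <{child_tag}>, expected {min_occurs}..{upper}"
                )
    return errors
```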

What do we do in “e-Business”?

What is “e-Business”?

• Of course, e-Business is really just doing business using 100% electronic methods such as the Internet

• In e-Business, we do transactions or exchange information using electronic media rather than the usual paper media

• e-Business can be broken down into two parts:
– B2C
– B2B

B2C

• B2C is “Business to Consumer”

• Your business generates the information, and a consumer receives it

• The consumer is normally interested only in the data and its presentation

• Thus, in this scenario, the consumer needs only an XML document and CSS/XSL - which is more or less the same as HTML!

Important Fact #1

• When you are engaged in B2C, and the recipient is a consumer with a “thin” client, then HTML is usually sufficient

– Supplying the data in XML is usually a waste of time, because the recipient gets no additional value from the XML over HTML

– XHTML is simply HTML reformulated to be XML-compliant

B2B

• B2B is “Business to Business”

• Your business generates the information, and another business receives it

• Frequently, the recipient is not a person, but a software process in the business

• Thus, in this scenario, the recipient often needs only the XML data and the reference to the DTD or Schema - no presentation may be needed!

Important Fact #2

• When you are engaged in B2B, and the recipient is a software process, then XML is often the most appropriate format

– Binary data formats may be smaller, but will require more work and more maintenance

– Don’t send presentation information unless the recipient actually wants your presentation information!
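A sketch of the receiving side of such an exchange, assuming a hypothetical invoice message whose tag names (invoices, invoice, account_no, amount) are illustrative and not taken from any published DTD or Schema. The receiving software process reads values by tag and never needs presentation information.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical B2B message: data and tags only, no presentation.
MESSAGE = """<invoices>
  <invoice><account_no>12345</account_no><amount>19.95</amount></invoice>
  <invoice><account_no>67890</account_no><amount>5.05</amount></invoice>
  <invoice><account_no>12345</account_no><amount>0.05</amount></invoice>
</invoices>"""


def totals_by_account(xml_text: str) -> dict:
    """Sum invoice amounts per account; Decimal avoids binary rounding."""
    totals = {}
    for invoice in ET.fromstring(xml_text).iter("invoice"):
        account = invoice.findtext("account_no")
        amount = Decimal(invoice.findtext("amount"))
        totals[account] = totals.get(account, Decimal("0")) + amount
    return totals


if __name__ == "__main__":
    print(totals_by_account(MESSAGE))
```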

When do you want to use XML?

When Do I Use XML?

• As we have seen, XML is best suited for the preservation of the “author’s” content

• And (X)HTML is best suited for presentation of information to an end user

• And this leads us to...

Important Fact #3

• In today’s market:

– XML is better utilized when communicating with a “thick” client - that is, most B2B in which a software process is the recipient

– (X)HTML is better utilized when communicating with a “thin” client - that is, most B2C in which an Internet browser is the recipient

• And when is this not true?

Exceptions to Fact #3

• XML can be used in B2C when the browser is used with so much Java and other local applications that the overall process resembles a thick client

• (X)HTML can be used in B2B if the recipient is just a human being rather than a software process, e.g., when information is transmitted only to be viewed

The Right Way And The Wrong Way To Use XML

CML: Chemical Markup Language

• One of the early “vertical” implementations of XML

• The official site is http://www.xml-cml.org/

• A “better” site is http://www.ch.ic.ac.uk/chimeral/

• CML uses the trio of tagged data, Schema, and XSL

A CML XML Document

<molecule title="caffeine" id="mol_caffeine">

<formula>C8 H10 N4 O2</formula>

<string title="CAS">58-08-2</string>

...

</molecule>

CML Data
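Because the formula is tagged, a program can use it for something other than the browser image. This Python sketch computes caffeine's molar mass from the <formula> element. Only the elements shown on the slide are used; the atomic weights are rounded standard values supplied here, not part of CML, and molar_mass is a hypothetical helper that only understands the space-separated formula style shown.

```python
import re
import xml.etree.ElementTree as ET

CAFFEINE = """<molecule title="caffeine" id="mol_caffeine">
  <formula>C8 H10 N4 O2</formula>
  <string title="CAS">58-08-2</string>
</molecule>"""

# Rounded standard atomic weights, only for the elements needed here.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(formula: str) -> float:
    """Sum atomic weights for a formula written like 'C8 H10 N4 O2'."""
    total = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[symbol] * int(count or 1)
    return total


mol = ET.fromstring(CAFFEINE)

if __name__ == "__main__":
    mass = molar_mass(mol.findtext("formula"))
    print(mol.get("title"), mol.findtext("string[@title='CAS']"), round(mass, 2))
```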

The CML Schema

<?xml version="1.0"?>
<Schema name="cml_dev_karne" xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes">
  ...
  <!-- ********** Element Types ************ -->
  <!-- *** data *** -->
  <ElementType name="molecule" content="eltOnly" model="open" order="many">
    <element type="formula" minOccurs="0" maxOccurs="*"/>
    ...

CML Schema

A CML Stylesheet

<xsl:template match="molecule">
  <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0" CELLPADDING="3" BORDERCOLOR="#CCCCFF" BGCOLOR="#EEEEFF">
    <TR>
      <TD COLSPAN="2">
        <FONT COLOR="#0000AA">Formula
        <FONT COLOR="#000000"><xsl:value-of select="formula"/></FONT></TD><TD>
        ...

CML XSL

The CML Document

• Note that each data item is tagged

• Note that each tag matches the standard Schema

• Note that the data is used to create a complex image in the browser - but not the only possible image!

caffeine.xml

A Print to XML/HTML Conversion

• A print stream contains no metadata, only data and presentation information

• Meaningful tags cannot be assigned unless the meaning of each field is reverse-engineered

• The result might be only the tagged data and the stylesheet

• Too often, the “XML” looks like this, which is really just positioned HTML:

Bad XML Example

/* text positioning information */
.ps0{position:absolute;top:533px;left:29px;width:40px;}
.ps1{position:absolute;top:533px;left:317px;width:38px;}
.ps2{position:absolute;top:533px;left:454px;width:90px;}
...
/* font properties information */
.ft1{font-weight:bold;font-size:22px;}
.ft2{font-size:17px;}
.ft3{font-size:11px;}

<!-- text starts here -->
<SPAN CLASS="ps0"><NOBR>Account Number</NOBR></SPAN>
<SPAN CLASS="ps1"><NOBR>12345</NOBR></SPAN>
<SPAN CLASS="ps2"><NOBR>Name</NOBR></SPAN>
...

bad HTML example.html
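To show why positioned text is hard to reuse, this Python sketch (standard library html.parser) pulls the SPAN texts back out and guesses label/value pairs. The pairing rule, that each label is immediately followed by its value, is an assumption that happens to hold for this layout and breaks as soon as the layout changes; the fourth span ("Bill McCalpin") is a hypothetical continuation of the elided "...", and SpanText and guess_fields are hypothetical names.

```python
from html.parser import HTMLParser

BAD = """<SPAN CLASS="ps0"><NOBR>Account Number</NOBR></SPAN>
<SPAN CLASS="ps1"><NOBR>12345</NOBR></SPAN>
<SPAN CLASS="ps2"><NOBR>Name</NOBR></SPAN>
<SPAN CLASS="ps3"><NOBR>Bill McCalpin</NOBR></SPAN>"""


class SpanText(HTMLParser):
    """Collect the text of each SPAN, in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._in_span = False

    def handle_starttag(self, tag, attrs):
        if tag == "span":  # HTMLParser reports tag names in lower case
            self._in_span = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_span = False

    def handle_data(self, data):
        if self._in_span:
            self.texts[-1] += data


def guess_fields(html_text: str) -> dict:
    parser = SpanText()
    parser.feed(html_text)
    parser.close()
    texts = [t.strip() for t in parser.texts]
    # Assumption: every label is immediately followed by its value.
    return dict(zip(texts[0::2], texts[1::2]))
```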

An Image to XML Example

• Most information may not be tagged:

<invoice>
  <account_no>12345</account_no>
  <name>Bill McCalpin</name>
  <data>70 02 02 02 02 FE A7 47 47 48 03 F9 A7 42 27 4A 74…</data>
</invoice>

The Flow of Information

The Flow of Information

• E-Business is about the flow of information between parties as well as within the enterprise

• Traditionally, as information moves through the business process, we lose as much information as we add

• Look at how we used to treat information:

As Information Flow Used to Be

[Diagram: Generation (data, data awareness/metadata) → Composition (data, presentation information) → Distribution (toner on paper) → Archival (raster image)]

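As Information Flow Used To Be

[Diagram: data and data awareness (metadata) enter the composer; the metadata is lost (“Zap!”), leaving data and presentation information, which become toner on paper. The paper is then scanned, and the archive holds only bits (X’010101’).]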

As Information Flow Is Today

[Diagram: Generation (data, data awareness/metadata) → Composition (data, presentation information) → Distribution, push or pull (data, presentation information, distribution metadata) → Archival in a presentation format like PDF (data, presentation information)]

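As Information Flow Is Today

[Diagram: data and data awareness (metadata) enter the composer, which again discards the metadata (“Zap!”); the resulting data and presentation information are output as PDF, or as text and graphics transformed into web pages, emails, etc.]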

As Information Flow Should Be

[Diagram: Generation (data, data awareness/metadata) → Composition (data, data awareness, presentation information) → Distribution, push or pull (data, data awareness, presentation information, distribution metadata) → Archival in XML (data, data awareness, presentation information, distribution metadata)]

As Information Flow Should Be

[Diagram: complete XML documents, carrying data, data awareness (metadata), and presentation information, reach the user through email, WAP, web pages, the archive, and paper.]

Or, As In The XML Bubble...

[Diagram: data and metadata travel together through successive processes, with presentation added late, feeding web pages, email, cell phones, the archive, and B2B applications.]

Important Fact #4

• Use XML to delay the loss of important information

• Don’t throw away information until you commit the document to a final format which can’t support it

• In other words, keep the information in XML as long as possible
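Fact #4 in practice: a sketch that keeps a hypothetical billing record in XML and adds presentation only at the moment it is published to a channel. The channel names, tag names, and renderer functions are illustrative assumptions; an archive would simply keep BILL itself, with nothing thrown away.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical billing record, kept in XML until the last possible moment.
BILL = ("<bill><account_no>12345</account_no><name>Bill McCalpin</name>"
        "<amount_due>42.50</amount_due></bill>")


def as_text(root: ET.Element) -> str:
    """Plain-text rendering, e.g. for an email body or a cell phone."""
    return f"Account {root.findtext('account_no')}: {root.findtext('amount_due')} due"


def as_html(root: ET.Element) -> str:
    """HTML rendering for a web page; values are escaped on the way out."""
    rows = "".join(
        f"<tr><td>{html.escape(c.tag)}</td><td>{html.escape(c.text or '')}</td></tr>"
        for c in root
    )
    return f"<table>{rows}</table>"


RENDERERS = {"email": as_text, "web": as_html}


def publish(xml_text: str, channel: str) -> str:
    # Data and tags survive until this call; presentation is added only here.
    return RENDERERS[channel](ET.fromstring(xml_text))
```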

The XML Bubble

[Diagram, built up over several slides: the enterprise’s document applications (Policy Print, Reports, 1:1 Marketing, Billing, EDI, Compliance, Campaign Management, CRM, Policies & Procedures, Archive, Notices, New Sales, HR, Reprints) are shown first; XML and EBPP are then added, followed by the “XML Bubble” label.]

Today’s Billing Process + XML

[Diagram: Billing Extract, Print/Format, Post Process, and Database, with an XML application added to the process.]

[Diagram: XML applications with business rules feeding several output drivers, including an email driver.]

Remember the Question, “Why Should I Support XML?”

Why Should I Support XML?

• I should support XML in B2B, unless the recipient wants only to view my presentation

• I should support (X)HTML in B2C, unless the recipient has a thick client which can utilize the XML (cf. Quicken and OFX)

How Should I Use XML?

• Once information is in XML, I should keep it there as long as possible

• I should use industry-accepted DTDs and Schemas

• To avoid confusion, I shouldn’t think of merely “well-formed” XML (syntactically correct, but with no DTD/Schema) as real XML
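The gap between “well-formed” and valid can be seen in a few lines of Python. A parser accepts any syntactically correct document, including one whose tags (here the hypothetical <foo>) nobody agreed on; only a check against a DTD or Schema catches that. has_required_children is a deliberately crude, hypothetical stand-in for such validation, since Python's standard library does not validate against DTDs or Schemas.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML. This says nothing about validity."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def has_required_children(xml_text: str, required: list) -> bool:
    """Crude stand-in for validation: each required child tag appears exactly once."""
    root = ET.fromstring(xml_text)
    return all(len(root.findall(tag)) == 1 for tag in required)
```

For example, "<invoice><foo>1</foo></invoice>" is well-formed, yet it fails a check that requires an <account_no> element.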

A Final Note

• The World Wide Web Consortium (www.w3c.org) is the standards body for the generic protocols of XML, such as XML syntax itself, XSL, RDF, etc.

• Most “domain” or vertical-specific XML definitions are supported by the verticals themselves, e.g., CML, GEML (Gene Expression Markup Language), etc.

A Final Note, Part Deux

• At www.xml.org, there are nearly 100 Schemas/DTDs listed from 31 different industries, from AIML (Astronomical Instrument Markup Language) to RecipeML (Recipe Markup Language) – yes, XML for the kitchen.

• Also see Robin Cover’s excellent work at xml.coverpages.org/sgml-xml.html

Contact Information

William J. ‘Bill’ McCalpin MIT, LIT, CDIA, EDP

Principal, MHE

1400 Cheyenne Dr.
Richardson, Texas 75080-3921 USA

(972) 231-3660 (v)
(972) 690-4521 (f)
[email protected]

www.mhe-consulting.com