open xml deep dive
TRANSCRIPT
Satisfy Your Technical Curiosity
Open XML Deep Dive
Doug Mahugh
Technical Evangelist, Microsoft
http://blogs.msdn.com/dmahugh

Application type: Document Assembly
Server environment: Linux, Java, Apache, MySQL
Desktop environment: Office 2007

Session Objectives
Satisfy your curiosity about Open XML:
• Architecture
• The three main Open XML schemas
• Development options
• Custom XML support
• Development scenarios

Today is the tip of the iceberg
Comprehensive 2-day Open XML Developer workshop scheduled for Belgium on May 21
Contact Imma Verheyen, Partner Development Manager: [email protected]

Diverse Environments
All you need is ZIP and XML support
• Linux: ZIP library: Minizip, zLib. XML library: Apache Xerces
• Java: ZIP library: J2SE java.util.zip. XML library: JAXP
• Microsoft: ZIP library: .NET Framework 3.0 System.IO.Packaging *, Xceed .NET controls. XML library: .NET Framework 3.0 System.Xml
• COM: ZIP library: Xceed ActiveX controls. XML library: MSXML
* Also includes abstractions for OPC concepts (Open Packaging Convention)
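As a rough illustration of the "ZIP and XML" claim, the sketch below uses only Python's standard `zipfile` and `xml.etree.ElementTree` modules. The package it builds is a deliberately incomplete stand-in (one part, no content types or relationships), and the helper names are made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_minimal_package() -> bytes:
    """Write a tiny ZIP holding one XML part (not a complete, Word-openable file)."""
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        "<w:t>HELLO!</w:t></w:r></w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


def read_root_tag(package_bytes: bytes, part_name: str) -> str:
    """Open the ZIP, parse one part as XML, and return its root tag."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read(part_name))
    return root.tag


package_bytes = build_minimal_package()
root_tag = read_root_tag(package_bytes, "word/document.xml")
print(root_tag)
```

The same two steps (unzip, parse) are what the libraries listed above provide on each platform.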

Development Scenarios
Each scenario is followed by an example.
• Document Assembly: Server-based or user-assisted construction of documents from archived content or database content. Example: create sales reports from financial and forecast data stored in a CRM system.
• Integration & Content Reuse: Much easier to move content between documents, including different document types. Example: quickly and efficiently apply content stored in Word documents to Web pages.
• Document Sanitization: Remove unwanted content like comments, embedded code or potentially sensitive items from your document when appropriate. Example: remove all tracked changes and comments from a Word document before it is published.
• Document Interrogation: Query document repositories based on custom data, content types or document metadata. Example: search for all documents containing a specific company name or sales contact.
• Content Tagging: Adding a tagging schema to content can dramatically improve content searches and the value of the data stored in documents. Example: organizations can create their own smart tags, then use them as the basis for searches.
• Document Archival: Ensuring document formats can be consumed long into the future without vendor-specific clients or applications. Example: XML-based document archives include the data and presentation information.

XML in Office: the last 10 years
• Office 97: Existing binary file formats designed in 1994, launched in Office 97
• Office 2000: Early Innovation. XML Document Properties
• Office XP: First XML Formats. Spreadsheet XML
• Office 2003: Breakthrough XML Support. WordProcessingML, SpreadsheetML, Custom-defined schema
• 2007 Office system: New XML-based Formats. XML File format Default, XML PowerPoint Format

Open XML Architecture
• Markup Languages: WordprocessingML, SpreadsheetML, PresentationML
• Shared Vocabularies: DrawingML, Custom XML, Bibliography, Metadata, VML (legacy), Equations
• Open Packaging Convention: Content Types, Relationships, Digital Signatures
• Core Technologies: ZIP, XML + Unicode

Open Packaging Convention
• Low-level conventions that define the structure of an Office Open XML document
• Also used by XPS, and some third-party implementations are under development
• Key concepts: package, parts, relationships, and content types

Parts
• Stored inside the package in a specific location
• Reachable via a URI
• Associated with a specific content type
• Often XML, but can be of any defined content type (including custom types)

Content Types
• Every part must have a content type
• Most OXML parts are content type XML
• Consumers support a specific set of content types
• You can define custom content types, and consumers will preserve them – this is a key area of opportunity for developer innovation
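To make the content-type rule concrete, here is a minimal sketch of how a consumer might look up a part's content type from the package's `[Content_Types].xml` stream, where an `Override` for an exact part name takes precedence over a `Default` keyed by file extension. The sample XML and the `content_type_of` helper are illustrative assumptions, not code from the presentation.

```python
import xml.etree.ElementTree as ET
from typing import Optional

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

# Illustrative content-types stream for a small word-processing package.
CONTENT_TYPES_XML = f"""<Types xmlns="{CT_NS}">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Default Extension="jpg" ContentType="image/jpeg"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""


def content_type_of(part_name: str, types_xml: str) -> Optional[str]:
    """Return the content type for a part: Override first, then Default by extension."""
    root = ET.fromstring(types_xml)
    for override in root.findall(f"{{{CT_NS}}}Override"):
        if override.get("PartName", "").lower() == part_name.lower():
            return override.get("ContentType")
    extension = part_name.rsplit(".", 1)[-1].lower()
    for default in root.findall(f"{{{CT_NS}}}Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None


main_type = content_type_of("/word/document.xml", CONTENT_TYPES_XML)
image_type = content_type_of("/word/media/image1.jpg", CONTENT_TYPES_XML)
unknown_type = content_type_of("/custom/data.bin", CONTENT_TYPES_XML)
```

A custom content type would simply be one more `Default` or `Override` entry with a type string of your own choosing.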

Relationships
• Tie elements inside the package to each other
• Allow you to step through the document without parsing parts
• Are required: a part without a relationship is not part of the package, and may be discarded

OPC is a Logical Structure
• Files and folders – NO! These details may vary.
• Parts should be referenced by their relationship type.
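One way to follow this advice is to locate the main document part through the package-level relationships instead of assuming a path such as `word/document.xml`. The sketch below does that for an in-memory relationships stream; the `content/main.xml` target and the `targets_by_type` helper are assumptions chosen to show that the physical location can differ.

```python
import posixpath
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOCUMENT = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
)

# Illustrative package-level relationships (the content of /_rels/.rels).
# The main part deliberately lives at an unusual location.
ROOT_RELS = f"""<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId1" Type="{OFFICE_DOCUMENT}" Target="content/main.xml"/>
</Relationships>"""


def targets_by_type(rels_xml: str, rel_type: str, base_dir: str = "/") -> list:
    """Return part names of internal relationships with the given type."""
    root = ET.fromstring(rels_xml)
    return [
        posixpath.normpath(posixpath.join(base_dir, rel.get("Target")))
        for rel in root.findall(f"{{{REL_NS}}}Relationship")
        if rel.get("Type") == rel_type and rel.get("TargetMode") != "External"
    ]


main_parts = targets_by_type(ROOT_RELS, OFFICE_DOCUMENT)
print(main_parts)
```

Code written this way keeps working when a producer stores parts under different folder or file names.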

Types of Interoperability
• Reference Schemas: display-oriented. Enables technical interoperability
• Custom-defined Schemas: data-oriented. Enables semantic interoperability
Brian Jones, ODC2006

WordprocessingML
Document architecture
Document (body, properties), with related parts:
fontTable
headers/footers
images
numberingDefinitions
styles
customXML
footnotes/endnotes
comments

Paragraphs, Runs and Text
How text is stored in WordprocessingML
The document element
• Contains a body element
  • Contains paragraphs
    • Contains runs
      • Contains text elements
<document>
  <body>
    <p>
      <r>
        <t>HELLO!</t>
      </r>
    </p>
  </body>
</document>
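The nesting above can be produced programmatically. This is a small sketch using Python's `xml.etree.ElementTree`; it emits the namespaced form (`w:` prefix) that later slides use, and `build_document` is a helper invented for the example.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def build_document(paragraph_texts):
    """Build document > body > p > r > t, one paragraph per input string."""
    document = ET.Element(w("document"))
    body = ET.SubElement(document, w("body"))
    for text in paragraph_texts:
        paragraph = ET.SubElement(body, w("p"))
        run = ET.SubElement(paragraph, w("r"))
        ET.SubElement(run, w("t")).text = text
    return document


document = build_document(["HELLO!"])
xml_text = ET.tostring(document, encoding="unicode")
print(xml_text)
```

The result is only the main document part; a usable file would still need the packaging pieces (content types and relationships) described earlier.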

Direct Formatting Example
Simple formatting at paragraph/run levels:
<w:p>
  <w:pPr>
    <w:b/>
  </w:pPr>
  <w:r>
    <w:t>The quick</w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:i/>
    </w:rPr>
    <w:t>brown</w:t>
  </w:r>
  <w:r>
    <w:t>fox.</w:t>
  </w:r>
</w:p>
• Paragraph properties specify bold (default for the entire paragraph)
• Run properties specify italics (override for this run)

Paragraph Properties
• Can be set directly or in a paragraph style
• 24 total property settings
<w:p>
  <w:pPr>
    <w:widowControl w:val="on" />
    <w:keepNext/>
    <w:keepLines/>
    <w:pageBreakBefore/>
    <w:suppressLineNumbers />
    <w:suppressAutoHyphens />
    <w:textBoxTightWrap />
  </w:pPr>
  … runs, paragraph content …
</w:p>

Run Properties
• Define formatting for individual characters
• Font attributes, size/position, other settings
• 24 total properties
<w:r>
  <w:rPr>
    <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial" />
    <w:b/>
    <w:i/>
    <w:sz w:val="11" />
    <w:dstrike w:val="true" />
  </w:rPr>
  … run content …
</w:r>

Text <w:t>
• The only element in the main story that can contain text – all other text is in attributes
• Three other types of text are allowed in runs:
  • Deleted text <w:delText>
  • Field code <w:instrText>
  • Deleted field codes <w:delInstrText>
• By looking at <w:t> nodes, you can be sure you’re seeing only displayed text
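A short sketch of that idea: collect only `w:t` nodes per paragraph, so that tracked deletions stored in `w:delText` are skipped. The sample markup (including the `w:del` wrapper and its attribute values) is illustrative, and `displayed_text` is a name made up for this example.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Illustrative paragraph with one tracked deletion ("red ").
SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body>
  <w:p>
    <w:r><w:t xml:space="preserve">The quick </w:t></w:r>
    <w:del w:id="1" w:author="someone"><w:r><w:delText>red </w:delText></w:r></w:del>
    <w:r><w:t>brown fox.</w:t></w:r>
  </w:p>
</w:body></w:document>"""


def displayed_text(document_xml: str) -> list:
    """Return one string per paragraph, built from w:t nodes only."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        pieces = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs


paragraph_texts = displayed_text(SAMPLE)
print(paragraph_texts)
```

This simple version does not treat nested paragraphs (for example inside text boxes) specially; a fuller implementation would need to decide how to handle them.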

Revision IDs (RSIDs)
• RSID values are used to identify a set of changes that were made during the same editing session
• Found in many elements:
  • Paragraphs, runs, sections, styles
  • Table rows, table properties, charts, diagrams
• Allows for merging revisions, without the privacy and security issues involved in tracking who changed what
• Optional, but recommended for applications that modify existing documents

Images
• An image is a w:pict element inside a run <w:r>
• The v:imagedata element is defined in VML: xmlns:v="urn:schemas-microsoft-com:vml"
• The actual image is referenced via a relationship:
<w:pict>
  <v:shape id="_x0000_i1025" type="#_x0000_t75" style="width:250; height:200">
    <v:imagedata r:id="rId4"/>
  </v:shape>
</w:pict>
• The relationship points to an image part in the package:
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="image1.jpg"/>
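The lookup from `r:id` to the image part can be sketched as follows. The two XML strings reuse the slide's markup with namespace declarations added so they parse on their own; `image_targets` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
V_NS = "urn:schemas-microsoft-com:vml"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

RUN_XML = f"""<w:r xmlns:w="{W_NS}" xmlns:v="{V_NS}" xmlns:r="{R_NS}">
  <w:pict>
    <v:shape id="_x0000_i1025" type="#_x0000_t75" style="width:250; height:200">
      <v:imagedata r:id="rId4"/>
    </v:shape>
  </w:pict>
</w:r>"""

RELS_XML = f"""<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId4" Type="{R_NS}/image" Target="image1.jpg"/>
</Relationships>"""


def image_targets(run_xml: str, rels_xml: str) -> list:
    """Map each v:imagedata r:id in the run to its relationship target."""
    relationships = ET.fromstring(rels_xml).findall(f"{{{REL_NS}}}Relationship")
    targets_by_id = {rel.get("Id"): rel.get("Target") for rel in relationships}
    run = ET.fromstring(run_xml)
    return [
        targets_by_id[imagedata.get(f"{{{R_NS}}}id")]
        for imagedata in run.iter(f"{{{V_NS}}}imagedata")
    ]


targets = image_targets(RUN_XML, RELS_XML)
print(targets)
```

Targets are relative to the part that owns the relationships, so a real reader would resolve `image1.jpg` against the main document part's folder.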

Tables
• Tables are a set of paragraphs which are arranged into rows and columns
• In WordprocessingML, tables are block level content, and are specified using the table element (<w:tbl>)
• Analogous to the HTML <table> element

What’s in a table?
• Properties
• Grid
• Rows
• Cells
<w:tbl>
  <w:tblPr>
    <w:tblStyle w:val="TableGrid"/>
    <w:tblW w:w="0" w:type="auto"/>
    <w:tblLook w:val="01E0"/>
  </w:tblPr>
  <w:tblGrid>
    <w:gridCol w:w="2952"/>
    <w:gridCol w:w="2952"/>
    <w:gridCol w:w="2952"/>
  </w:tblGrid>
  <w:tr>
    <w:tc>
      <w:tcPr> <w:tcW w:w="2952" w:type="dxa"/> </w:tcPr>
      <w:p> <w:r> <w:t>1,1</w:t> </w:r> </w:p>
    </w:tc>
    <w:tc>
      <w:tcPr> <w:tcW w:w="2952" w:type="dxa"/> </w:tcPr>
      <w:p> <w:r> <w:t>1,2</w:t> </w:r> </w:p>
    </w:tc>
  </w:tr>
</w:tbl>
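Reading such a table back into rows and cells is a matter of walking `w:tr` and `w:tc`. A minimal sketch, assuming a shortened copy of the markup above with the namespace declared; `table_rows` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

TABLE_XML = f"""<w:tbl xmlns:w="{W_NS}">
  <w:tblGrid><w:gridCol w:w="2952"/><w:gridCol w:w="2952"/></w:tblGrid>
  <w:tr>
    <w:tc><w:p><w:r><w:t>1,1</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>1,2</w:t></w:r></w:p></w:tc>
  </w:tr>
</w:tbl>"""


def w(tag: str) -> str:
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def table_rows(tbl_xml: str) -> list:
    """Return the table as a list of rows, each a list of cell strings."""
    tbl = ET.fromstring(tbl_xml)
    rows = []
    for tr in tbl.findall(w("tr")):
        cells = []
        for tc in tr.findall(w("tc")):
            # A cell holds paragraphs; join the text of each, one line per paragraph.
            paragraphs = [
                "".join(t.text or "" for t in p.iter(w("t")))
                for p in tc.findall(w("p"))
            ]
            cells.append("\n".join(paragraphs))
        rows.append(cells)
    return rows


rows = table_rows(TABLE_XML)
print(rows)
```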

Styles
• A style defines a specific set of values for formatting properties that may be applied as a single logical unit
• For example, the Normal style in Word 2007 defines these formatting properties:
  • Font = Calibri (body)
  • Font Size = 11 point
  • Font Language = Word default (as configured by user)
  • Justification = Left
  • Line Spacing = Single
  • Widow/Orphan control

Style Types
WordprocessingML supports six style types:
• Paragraph styles
• Character styles
• Linked styles
• Table styles
• List styles
• Default style (linked type, but applies when no style specified)

Paragraph Styles Example
Step 1: define a paragraph style
Styles are defined in the style part:
<w:style w:type="paragraph" w:styleId="TestParagraphStyle">
  <!-- Common properties -->
  <w:name w:val="Test Paragraph Style"/>
  <w:qFormat/>
  <w:rsid w:val="009E253E"/>
  <!-- Paragraph properties -->
  <w:pPr>
    <w:pStyle w:val="TestParagraphStyle"/>
    <w:spacing w:line="480" w:lineRule="auto"/>
    <w:ind w:firstLine="1440"/>
  </w:pPr>
  <!-- Character (run) properties -->
  <w:rPr>
    <w:rFonts w:ascii="Algerian" w:hAnsi="Algerian"/>
    <w:b/>
    <w:color w:val="ED1C24"/>
    <w:sz w:val="40"/>
  </w:rPr>
</w:style>

Paragraph Styles Example
Step 2: apply the style to a paragraph
The pStyle element associates a style with a paragraph:
<w:p>
  <w:pPr>
    <w:pStyle w:val="TestParagraphStyle"/>
  </w:pPr>
  <w:r>
    <w:t>Text</w:t>
  </w:r>
</w:p>
The paragraph is displayed with the style applied. [Screenshot not included in transcript]
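The two steps connect through the style ID: a consumer reads `w:pStyle` from the paragraph and finds the `w:style` whose `w:styleId` matches. The sketch below shows that lookup on trimmed copies of the two snippets; `style_for_paragraph` is a hypothetical helper, and the font-size conversion relies on `w:sz` being expressed in half-points.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

STYLES_XML = f"""<w:styles xmlns:w="{W_NS}">
  <w:style w:type="paragraph" w:styleId="TestParagraphStyle">
    <w:name w:val="Test Paragraph Style"/>
    <w:pPr><w:spacing w:line="480" w:lineRule="auto"/><w:ind w:firstLine="1440"/></w:pPr>
    <w:rPr><w:b/><w:color w:val="ED1C24"/><w:sz w:val="40"/></w:rPr>
  </w:style>
</w:styles>"""

PARAGRAPH_XML = f"""<w:p xmlns:w="{W_NS}">
  <w:pPr><w:pStyle w:val="TestParagraphStyle"/></w:pPr>
  <w:r><w:t>Text</w:t></w:r>
</w:p>"""


def w(tag: str) -> str:
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def style_for_paragraph(paragraph_xml: str, styles_xml: str):
    """Return the w:style element referenced by the paragraph, or None."""
    paragraph = ET.fromstring(paragraph_xml)
    p_style = paragraph.find(f"{w('pPr')}/{w('pStyle')}")
    if p_style is None:
        return None
    style_id = p_style.get(w("val"))
    for style in ET.fromstring(styles_xml).findall(w("style")):
        if style.get(w("styleId")) == style_id:
            return style
    return None


style = style_for_paragraph(PARAGRAPH_XML, STYLES_XML)
style_name = style.find(w("name")).get(w("val"))
# w:sz is measured in half-points, so 40 corresponds to 20 pt.
font_size_pt = int(style.find(f"{w('rPr')}/{w('sz')}").get(w("val"))) / 2
```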

Numbering Styles
Flexible hierarchical definition
• Numbering styles are styles which define the structure of a multi-level numbering format
• Numbering definition instances are based on an abstract numbering definition
• Abstract numbering definitions define paragraph properties for up to 9 hierarchical levels
• NOTE: items in a list are simply paragraphs. There is no list “container” as in HTML.

Table Styles
A table style is associated with a table via the tblStyle element in the table properties:
<w:tbl>
  <w:tblPr>
    <w:tblStyle w:val="Style20"/>
    <w:tblW w:w="5000" w:type="pct"/>
    <w:tblLook w:val="0220"/>
  </w:tblPr>
  … tblGrid, table rows and cells …
</w:tbl>
Table style Style20 is applied to the table

Style Application Hierarchy
Direct formatting overrides style settings
Table
Paragraph
Character
Direct Formatting
Numbering
Document Defaults

Subdocuments
Mechanism for “rolling up” documents
• Subdocuments are well-formed Open XML documents and can be edited independently
• Subdocuments don’t know they’re part of something bigger – they’re just stand-alone documents

Subdocuments
Implementation details
• Main document part contains subDoc elements that indicate where to insert subdocuments
• The subdocument’s location is stored in a relationship
Main document part:
<w:body>
  <w:subDoc r:id="rId1"/>
  <w:subDoc r:id="rId2"/>
  <w:subDoc r:id="rId3"/>
Relationships:
<Relationship Id="rId1" Type="…/subDocument" Target="Part1.docx" TargetMode="External"/>
<Relationship Id="rId2" Type="…/subDocument" Target="Part2.docx" TargetMode="External"/>
<Relationship Id="rId3" Type="…/subDocument" Target="Part3.docx" TargetMode="External"/>

Document Sections
• A document may be divided into sections
• Allows formatting at a higher level than paragraphs:
  • Landscape/portrait orientation
  • Page margins, etc.
• Section properties are defined in sectPr:
<w:sectPr>
  <w:pgSz w:w="12240" w:h="15840"/>
  <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
  <w:cols w:space="720"/>
  <w:docGrid w:linePitch="360"/>
</w:sectPr>

Section Properties
Example
In Word, section properties are specified in the Page Setup dialog
<w:sectPr>
  <w:pgSz w:w="12240" w:h="15840" />
  <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0" />
  <w:cols w:space="720" />
  <w:docGrid w:linePitch="360" />
</w:sectPr>
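The numbers in `pgSz` and `pgMar` are twentieths of a point (1440 per inch), so the values above describe a US Letter page with one-inch margins. A small sketch of the conversion, with `page_setup_inches` as an illustrative helper name:

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TWIPS_PER_INCH = 1440  # 20 twips per point, 72 points per inch

SECT_PR_XML = f"""<w:sectPr xmlns:w="{W_NS}">
  <w:pgSz w:w="12240" w:h="15840"/>
  <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"
           w:header="720" w:footer="720" w:gutter="0"/>
</w:sectPr>"""


def w(tag: str) -> str:
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def page_setup_inches(sect_pr_xml: str) -> dict:
    """Convert page size and margins from twips to inches."""
    sect_pr = ET.fromstring(sect_pr_xml)
    pg_sz = sect_pr.find(w("pgSz"))
    pg_mar = sect_pr.find(w("pgMar"))
    setup = {
        "width": int(pg_sz.get(w("w"))) / TWIPS_PER_INCH,
        "height": int(pg_sz.get(w("h"))) / TWIPS_PER_INCH,
    }
    for side in ("top", "right", "bottom", "left"):
        setup[f"{side}_margin"] = int(pg_mar.get(w(side))) / TWIPS_PER_INCH
    return setup


page_setup = page_setup_inches(SECT_PR_XML)
print(page_setup)
```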

Custom XML Support
Merging the worlds of documents and data

Why Custom XML?
Enables semantic interoperability
• Documents can provide a rich view of back-end data
• Documents can update back-end data sources
• Exposes business data within documents to heterogeneous systems
• Business-specific semantics can be applied to document data
• Separates presentation and data
Custom XML schema support was a key design objective for Open XML: any schema can be used in Open XML documents.

Custom XML
Developer options for custom XML support
• Custom-defined XML is stored in its own discrete part
• Any XML can be stored, with or without a schema
• Only one requirement: must be well-formed XML
• External applications (client/server) can process the store or populate the store
[Diagram: a document template containing visual document parts and XML data, exchanged with an external system]

Custom XML Properties
• Information about a custom XML part is stored in a custom XML properties part
• Stored via an implicit customXmlProps relationship from the custom XML part
• Contains two types of information:
  • Part ID: uniquely identifies a part within a document; maintained through editing sessions
  • XML Schema references

Structured Document Tags
Known as "content controls" in MS-Office
• Smart tags and custom XML markup add semantics, but do not have any effect on presentation
• Sometimes you want to affect presentation
  • Data-entry restrictions, multi-select, etc.
• Solution: the structured document tag <sdt>

Types of Content Controls
• Plain text
• Combobox
• Dropdown list
• Document building block
• Date picker
• Rich text
• Picture

Data Binding
2-way synchronization between:
• Content controls (structured document tags)
• Custom XML nodes (data in your schema)

Data Binding Basics
How to bind XML nodes to structured document tags
• Add a <dataBinding> element to the structured document tag properties <sdtPr>
• <dataBinding> specifies a custom XML part (by Custom XML Data Identifier) and an XPath to a specific node within that part
• Custom XML Data Identifier? What’s that?
  • The custom XML part has a properties part
  • Implicit relationship in customXmlPart.xml.rels
  • The properties part specifies a Custom XML Data Identifier
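The binding can be sketched as two lookups: first pick the custom XML part whose identifier matches the `w:storeItemID` on `<w:dataBinding>`, then follow the `w:xpath` to a node. Everything specific below is an assumption for illustration: the GUID, the `customer` data, the in-memory `STORE` dictionary standing in for the package's custom XML parts, and a deliberately simplified path evaluator that only handles plain element steps without namespaces or predicates.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ITEM_ID = "{11111111-2222-3333-4444-555555555555}"  # hypothetical identifier

SDT_XML = f"""<w:sdt xmlns:w="{W_NS}">
  <w:sdtPr>
    <w:dataBinding w:xpath="/customer/name" w:storeItemID="{ITEM_ID}"/>
  </w:sdtPr>
  <w:sdtContent><w:r><w:t>placeholder</w:t></w:r></w:sdtContent>
</w:sdt>"""

# Stand-in for the custom XML parts, keyed by their data identifier.
STORE = {ITEM_ID: ET.fromstring("<customer><name>Contoso</name></customer>")}


def w(tag: str) -> str:
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def bound_value(sdt_xml: str, store: dict):
    """Return the text of the custom XML node a content control is bound to."""
    sdt = ET.fromstring(sdt_xml)
    binding = sdt.find(f"{w('sdtPr')}/{w('dataBinding')}")
    if binding is None:
        return None
    custom_root = store.get(binding.get(w("storeItemID")))
    if custom_root is None:
        return None
    # Simplified evaluation: "/root/child/..." with plain element names only.
    steps = binding.get(w("xpath")).strip("/").split("/")
    if steps[0] != custom_root.tag:
        return None
    node = custom_root if len(steps) == 1 else custom_root.find("/".join(steps[1:]))
    return None if node is None else node.text


value = bound_value(SDT_XML, STORE)
print(value)
```

Real bindings typically use namespace prefixes and positional predicates in the XPath, which this sketch does not attempt to evaluate.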

Content Control Toolkit
• Open-source developer tool
  http://www.codeplex.com/Wiki/View.aspx?ProjectName=dbe
• Automatically generates parts, relationships, and markup to bind custom XML parts to content controls

Custom XML Markup
Tagging document content with custom semantics
• Allows embedding the structure from any XML schema into a WordprocessingML document
• Schema not required
• XML doesn’t have to validate against your schema
• Custom XML elements may have custom attributes
• Consumers/producers preserve your attributes

Custom XML Markup
Example

XML Mapping in SpreadsheetML
• XML elements and attributes may be mapped to cells and tables
• Store a copy of the schema in the workbook
• Data is in an external XML file

SpreadsheetML
Document architecture
[Diagram: Workbook (properties) and its related parts: sheet1..N, table, chart, drawing, styles, calcChain, sharedStrings]

SpreadsheetML
Performance optimizations
SpreadsheetML has been optimized based on analysis of typical spreadsheet usage patterns:
• Small tag size (often a single character)
• Shared strings
• Shared formulas
• Sparse table markup allowed
• Optional r="A1" attribute for faster loading

SpreadsheetML Strings
Two alternatives for storing text strings
1. Inline strings
  • Provided for ease of translation/conversion
  • Useful in XSLT scenarios
  • Excel and other consumers may convert to shared strings on document save
2. An entry in the shared-strings table
  • May be either a simple string or formatted text
These approaches may be mixed/combined

Shared Strings
Repetitive strings are common in typical spreadsheets
• Strings are stored in a shared-strings part:
  • Each unique string is stored once
  • Cells store the index (0-based) of the string
• Benefits:
  • Users: reduced file size, improved performance
  • Developers: all strings are in one part, simplifying search, localization, and other common string-handling tasks

Shared Strings
Sample shared-strings table
<sst xmlns="..." count="6" uniqueCount="4">
  <si> <t>Paris</t> </si>
  <si> <t>Seattle</t> </si>
  <si> <t>London</t> </si>
  <si> <t>Copenhagen</t> </si>
</sst>
6 string references, 4 unique strings
Paris = string 0
<row r="1" spans="1:1">
  <c r="A1" t="s"> <v>0</v> </c>
</row>
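The `count` and `uniqueCount` values fall out naturally when the table is built from the cell contents. The sketch below assumes six cells containing the four city names (the same sequence as the inline-string slide) and returns both the `sst` element and the per-cell indexes; `build_shared_strings` is a name invented for this example.

```python
import xml.etree.ElementTree as ET

S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def s(tag: str) -> str:
    """Qualify a local name with the SpreadsheetML namespace."""
    return f"{{{S_NS}}}{tag}"


def build_shared_strings(cell_texts):
    """Return (sst element, list of 0-based string indexes, one per cell)."""
    unique_strings = []
    index_of = {}
    cell_indexes = []
    for text in cell_texts:
        if text not in index_of:
            index_of[text] = len(unique_strings)
            unique_strings.append(text)
        cell_indexes.append(index_of[text])
    sst = ET.Element(s("sst"))
    sst.set("count", str(len(cell_texts)))          # total string references
    sst.set("uniqueCount", str(len(unique_strings)))  # distinct strings stored
    for text in unique_strings:
        ET.SubElement(ET.SubElement(sst, s("si")), s("t")).text = text
    return sst, cell_indexes


cities = ["Paris", "Seattle", "London", "Copenhagen", "Paris", "London"]
sst, cell_indexes = build_shared_strings(cities)
print(sst.get("count"), sst.get("uniqueCount"), cell_indexes)
```

Each index would be written into a cell as `<c t="s"><v>index</v></c>`, as in the row shown above.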

Inline Strings
• No shared-strings part required
• Especially useful in XSLT scenarios
• If you’re consuming Open XML documents, you must handle both cases: inline strings and/or shared strings
• Excel 2007 converts to shared strings on save
<sheetData>
  <row><c t="inlineStr"><is><t>Paris</t></is></c></row>
  <row><c t="inlineStr"><is><t>Seattle</t></is></c></row>
  <row><c t="inlineStr"><is><t>London</t></is></c></row>
  <row><c t="inlineStr"><is><t>Copenhagen</t></is></c></row>
  <row><c t="inlineStr"><is><t>Paris</t></is></c></row>
  <row><c t="inlineStr"><is><t>London</t></is></c></row>
</sheetData>
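A consumer that handles both cases might branch on the cell's `t` attribute, as in this sketch. The sample sheet data mixes a shared string, an inline string, and a plain number; the `SHARED` list stands in for an already-parsed shared-strings part, and `cell_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

SHEET_DATA = f"""<sheetData xmlns="{S_NS}">
  <row><c t="s"><v>0</v></c></row>
  <row><c t="inlineStr"><is><t>Seattle</t></is></c></row>
  <row><c><v>42</v></c></row>
</sheetData>"""

# Stand-in for the strings read from the shared-strings part, in order.
SHARED = ["Paris", "Seattle", "London", "Copenhagen"]


def s(tag: str) -> str:
    """Qualify a local name with the SpreadsheetML namespace."""
    return f"{{{S_NS}}}{tag}"


def cell_text(cell, shared_strings):
    """Return a cell's text for shared strings, inline strings, or raw values."""
    cell_type = cell.get("t")
    if cell_type == "s":
        # <v> holds a 0-based index into the shared-strings table.
        return shared_strings[int(cell.find(s("v")).text)]
    if cell_type == "inlineStr":
        return "".join(t.text or "" for t in cell.find(s("is")).iter(s("t")))
    value = cell.find(s("v"))
    return None if value is None else value.text


values = [cell_text(c, SHARED) for c in ET.fromstring(SHEET_DATA).iter(s("c"))]
print(values)
```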

SpreadsheetML Tables
Design goals for SpreadsheetML tables:
1. Separate presentation and data
  • Data stays in the worksheet
  • Table definition is in a separate part (referenced via a relationship)
2. Cell definition lightweight but extensible
  • Complex type with future storage capabilities
  • Named ranges written in their own collection instead of on each cell
Open XML has different types of tables for each document type, optimized for different scenarios:
• WordprocessingML has its tbl element
• SpreadsheetML has its table element
• PresentationML uses DrawingML tables (tbl inside graphicData)

SpreadsheetML Table Example
Worksheet part (headings = shared strings):
<sheetData>
  <row r="1" spans="1:2">
    <c r="A1" t="s"><v>0</v></c>
    <c r="B1" t="s"><v>1</v></c>
  </row>
  <row r="2" spans="1:2">
    <c r="A2"><v>1</v></c>
    <c r="B2"><v>4</v></c>
  </row>
  <row r="3" spans="1:2">
    <c r="A3"><v>2</v></c>
    <c r="B3"><v>5</v></c>
  </row>
  <row r="4" spans="1:2">
    <c r="A4"><v>3</v></c>
    <c r="B4"><v>6</v></c>
  </row>
</sheetData>
...
<tableParts count="1">
  <tablePart r:id="rId2"/>
</tableParts>
Table-definition part:
<table … ref="A1:B4" …>
  <autoFilter ref="A1:B4"/>
  <tableColumns count="2">
    <tableColumn id="1" name="Column1" />
    <tableColumn id="2" name="Column2" />
  </tableColumns>
  <tableStyleInfo …/>
</table>

AutoFilter Example

Formulas
• Stored as plain text
• Documented in the spec to provide for predictable interoperability
<row> <c> <v>1</v> </c> </row>
<row> <c> <v>2</v> </c> </row>
<row> <c> <v>3</v> </c> </row>
<row> <c> <f>SUM(A1:A3)</f> </c> </row>

DrawingML

DrawingML vs. VML
• Per the Ecma spec: “VML should be considered a deprecated format included in Office Open XML for legacy reasons only.”
• VML was not entirely replaced by DrawingML before submission to Ecma
• Main remaining uses of VML:
  • WordprocessingML: OfficeArt shapes, textboxes
  • SpreadsheetML/PresentationML: comments, embedded OLE objects

3-D Effects
3-D Scene Definition
Before Apply 3-D Scene
Apply 3-D Bevels
Adjust Material types
DrawingML
Implementation varies for each document type
Location varies (main body, drawing part, slide)
Packaging (“shim”) varies
WordprocessingML (in Word):
SpreadsheetML (in Excel):
PresentationML (in PowerPoint):
WordprocessingML
DrawingML is stored in the document body
Shim defines graphic frame and locked canvas
Shape definition is DrawingML
SpreadsheetML
Drawing is in a separate drawing part
Shim defines anchor position and type
Shape definition uses spreadsheetDrawing namespace for non-visual properties
PresentationML
DrawingML is stored in the slide part
No shim – the shape is in the shape tree
Shape definition is DrawingML
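One rough way to see the three hosting styles in code is to look at which wrapper namespace appears in a part. The sketch below does only that; the sample drawing part is heavily trimmed (a real anchor carries more child elements), and `drawing_hosts` and the label strings are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Wrapper ("shim") namespaces mapped to where the DrawingML lives.
HOSTS = {
    "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing":
        "WordprocessingML (document body)",
    "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing":
        "SpreadsheetML (drawing part)",
    "http://schemas.openxmlformats.org/presentationml/2006/main":
        "PresentationML (slide part, no shim)",
}

# Trimmed, illustrative SpreadsheetML drawing part; not a complete anchor.
DRAWING_XML = """<xdr:wsDr
    xmlns:xdr="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <xdr:twoCellAnchor>
    <xdr:sp>
      <xdr:nvSpPr><xdr:cNvPr id="2" name="Shape 1"/></xdr:nvSpPr>
      <xdr:spPr><a:prstGeom prst="rect"/></xdr:spPr>
    </xdr:sp>
  </xdr:twoCellAnchor>
</xdr:wsDr>"""


def drawing_hosts(part_xml):
    """Return the sorted host labels whose namespace is used by any element."""
    found = set()
    for el in ET.fromstring(part_xml).iter():
        if el.tag.startswith("{"):
            namespace = el.tag[1:].split("}", 1)[0]
            if namespace in HOSTS:
                found.add(HOSTS[namespace])
    return sorted(found)


HOST_LABELS = drawing_hosts(DRAWING_XML)
print(HOST_LABELS)
```

Note that the non-visual properties (nvSpPr, cNvPr) sit in the spreadsheetDrawing namespace here, while the geometry is plain DrawingML.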
PresentationML
Document architecture
View Properties
Presentation Properties
Code
Themes
Fonts
Notes Masters
Slides
Handout Masters
Slide Masters
Notes Slides
Slide Layouts
Presentation
Sample Slide
Typical presentationML content
Shape, Chart, Textbox
Slide Part
Shape tree contains slide content definitions
  <p:sld xmlns:p="…/presentationml/2006/main" xmlns:a="…/drawingml/2006/main" …>
    <p:cSld>
      <p:spTree>
        <p:sp>
          <p:nvSpPr>
            <p:cNvPr id="2" name="7-Point Star 1" />
            …
        <p:sp>
          <p:nvSpPr>
            <p:cNvPr id="3" name="TextBox 2" />
            …
        <p:graphicFrame>
          <p:nvGraphicFramePr>
            <p:cNvPr id="4" name="Chart 3" />
            …
      </p:spTree>
    </p:cSld>
    <p:clrMapOvr>
      <a:masterClrMapping />
    </p:clrMapOvr>
  </p:sld>
Shape
Chart
Textbox
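The shape tree can be walked with any XML library. The sketch below lists each direct child of spTree with its element kind, id, and name. The slide XML is a closed-up version of the fragment shown (the elided content is simply left out), the full namespace URIs are written out, and `list_shapes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}

# Minimal, well-formed stand-in for the slide part on the slide.
SLIDE_XML = """<p:sld
    xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <p:cSld>
    <p:spTree>
      <p:sp><p:nvSpPr><p:cNvPr id="2" name="7-Point Star 1"/></p:nvSpPr></p:sp>
      <p:sp><p:nvSpPr><p:cNvPr id="3" name="TextBox 2"/></p:nvSpPr></p:sp>
      <p:graphicFrame>
        <p:nvGraphicFramePr><p:cNvPr id="4" name="Chart 3"/></p:nvGraphicFramePr>
      </p:graphicFrame>
    </p:spTree>
  </p:cSld>
</p:sld>"""


def list_shapes(slide_xml):
    """Return (element kind, id, name) for each direct child of the shape tree."""
    tree = ET.fromstring(slide_xml).find("p:cSld/p:spTree", NS)
    shapes = []
    for child in tree:
        # cNvPr sits one level down, inside nvSpPr or nvGraphicFramePr.
        props = child.find("*/p:cNvPr", NS)
        if props is not None:
            kind = child.tag.split("}", 1)[1]
            shapes.append((kind, int(props.get("id")), props.get("name")))
    return shapes


SHAPES = list_shapes(SLIDE_XML)
print(SHAPES)
```

Both the star and the textbox come back as sp elements; only the chart is a graphicFrame, which matches the callouts on the slide.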
Shape, Chart, Textbox
Chart Part (chart1.xml)
Data source
PresentationML Tables
Slide part contains table definition
In a graphicFrame element
All DrawingML is in the slide – no separate “table part”
Table position
Table definition
Header-row formatting
Banded-row formatting
TableStyleID = GUID
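The callouts above (position, header-row and banded-row formatting, style GUID) can be read straight out of the slide part. The sketch below assumes a heavily trimmed table with no grid or rows, a placeholder all-zero GUID rather than a real style id, and made-up offsets; `describe_tables` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}
TABLE_URI = "http://schemas.openxmlformats.org/drawingml/2006/table"

# Trimmed, illustrative slide part; offsets and GUID are placeholders.
SLIDE_XML = """<p:sld
    xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <p:cSld><p:spTree>
    <p:graphicFrame>
      <p:xfrm><a:off x="1524000" y="1397000"/><a:ext cx="6096000" cy="741680"/></p:xfrm>
      <a:graphic>
        <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/table">
          <a:tbl>
            <a:tblPr firstRow="1" bandRow="1">
              <a:tableStyleId>{00000000-0000-0000-0000-000000000000}</a:tableStyleId>
            </a:tblPr>
          </a:tbl>
        </a:graphicData>
      </a:graphic>
    </p:graphicFrame>
  </p:spTree></p:cSld>
</p:sld>"""


def describe_tables(slide_xml):
    """Return position, formatting flags and style id for each table on a slide."""
    tables = []
    for frame in ET.fromstring(slide_xml).iterfind(".//p:graphicFrame", NS):
        data = frame.find("a:graphic/a:graphicData", NS)
        if data is None or data.get("uri") != TABLE_URI:
            continue  # some other graphicFrame content, such as a chart
        off = frame.find("p:xfrm/a:off", NS)
        props = data.find("a:tbl/a:tblPr", NS)
        tables.append({
            "position": (int(off.get("x")), int(off.get("y"))),
            "header_row": props.get("firstRow") in ("1", "true"),
            "banded_rows": props.get("bandRow") in ("1", "true"),
            "style_id": props.findtext("a:tableStyleId", namespaces=NS),
        })
    return tables


TABLES = describe_tables(SLIDE_XML)
print(TABLES)
```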
OpenXmlDeveloper.org
Formed by 40 companies to share developer information about the Office Open XML file formats
Articles with source code for C#, VB, Java, PHP, XSLT
Forums for posting technical questions
The Ecma Spec
1. Fundamentals
2. Open Packaging Convention
3. Primer (start here)
4. Markup Language Reference (huge!)
5. Markup Compatibility and Extensibility
Reference Schemas (XSD, RelaxNG)
Tips:
• Start with part 3, Primer
• Use the PDF version of part 4 to look up elements/attributes
Open XML Blogs
Brian Jones: http://blogs.msdn.com/brian_jones
Doug Mahugh: http://blogs.msdn.com/dmahugh
Kevin Boske: http://blogs.msdn.com/kboske
Wouter Van Vugt: http://blogs.infosupport.com/wouterv
Erika Ehrli: http://blogs.msdn.com/erikaehrli
See complete list on www.OpenXmlDeveloper.org