Introduction to SpreadsheetML

A high-level overview of the structure of a SpreadsheetML workbook

Upload: shawn-villaron

Post on 07-Aug-2015



Design Strategies

• Highly repetitive XML collections have terse tag names
  – Cell Table
  – Pivot Cache Records
  – Shared Workbook Revision Log
• Keep the cell table definition lightweight but extensible
  – E.g. Named Ranges written in their own collection instead of on each cell
  – The cell definition is a complex type with future storage capabilities
• Always use A1 style notation
• Keep the ‘minimal workbook’ definition simple
  – Most collections are optional
• Mirror Excel’s in-memory data structures for fast load/save
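To make the A1-style notation concrete, here is a minimal Python sketch that converts a reference such as C7 into 1-based (row, column) numbers. The function name a1_to_row_col is hypothetical and not part of any Excel API, and the sketch assumes a plain relative reference with no $ markers and no sheet prefix.

```python
import re

# One or more column letters followed by a row number that does not start with 0.
_A1_PATTERN = re.compile(r"^([A-Z]+)([1-9][0-9]*)$")


def a1_to_row_col(ref: str) -> tuple[int, int]:
    """Convert an A1-style reference like 'C7' to 1-based (row, column)."""
    match = _A1_PATTERN.match(ref.upper())
    if match is None:
        raise ValueError(f"not a simple A1 reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters:
        # Column letters behave like a base-26 number with digits A=1 .. Z=26.
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), column


print(a1_to_row_col("C7"))    # (7, 3)
print(a1_to_row_col("AA10"))  # (10, 27)
```

A reader of the cell table would typically apply a conversion like this to the reference stored on each cell before placing the value in a grid.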

Example Workbook

[Diagram of the example package: Workbook; Styles; Calc Chain; Shared Strings; File Properties; Sheets 1..N (Sheet2, Sheet3); Chart; Table; Drawing]

The Workbook-Level Pieces

• “Book-Level” Properties (views, lastSavedVersion)
• Sheets – a workbook must have at least 1 sheet
  – Spreadsheet
  – Chart Sheet
  – Macro Sheet
  – Dialog Sheet
• Shared String Table – unique list of all strings in the workbook
• Styles Definition
• Themes (shared schema)
• Connections
• Calculation Chain
• Custom XML Maps
• Shared Workbook Properties (if shared)
• Future Extensibility
• Volatile Dependencies

Book Properties

A Typical Sheet

Sheet Properties

Shared Strings

• Workbooks have LOTS of repeated strings

• Shared String Table is a load/save optimization
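The idea behind the shared string table can be sketched in a few lines of Python: each distinct string is stored once, and every cell keeps only an index into that list. The function name build_shared_strings and the sample data are hypothetical, and this illustrates the concept only, not Excel's actual implementation.

```python
def build_shared_strings(cell_texts):
    """Return (table, indexes): every distinct string is stored once in
    `table`, and each cell is represented by its index into that table."""
    table = []      # unique strings, in first-seen order
    position = {}   # string -> index in `table`
    indexes = []    # one entry per input cell
    for text in cell_texts:
        if text not in position:
            position[text] = len(table)
            table.append(text)
        indexes.append(position[text])
    return table, indexes


cells = ["North", "South", "North", "North", "South"]
table, indexes = build_shared_strings(cells)
print(table)    # ['North', 'South']
print(indexes)  # [0, 1, 0, 0, 1]
```

In the file itself, a string cell is marked with t="s" and its value element holds the index, so five cells above cost two stored strings plus five small integers.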

Formatting & Styles

• Direct Cell Formatting (XF)
  – Fonts
  – Fills
  – Borders
  – Numeric Formatting
• Cell Styles
• Table Styles
• PivotTable Styles

Direct Format

Cell, Table, PivotTable Styles

• Referenced by Name
• Explicit formatting is described using formatting records (xf)

Workbook Connections

• All external data connections in the workbook
• Excludes external formula references & DDE Links

Calculation Chain

• Calc-related information
  – A formula’s position in the calculation order
  – Multi-threaded calculation information

Custom XML Source

• A cache of the user schemas added to the workbook (xmlmaps.xml)

• Nodes mapped to tables (table1.xml)
• Nodes mapped to single cells (tableSingleCells1.xml)

Shared Workbooks

• Supports the Shared Workbook Feature
  – Revision Headers (used to know which revision logs to read in)
  – Revisions log parts (the actual changes)
  – User name listing (who made the changes)

Future Extensibility

• Future Storage Buckets
  – A location to store future feature data
• Alternate Content Blocks & ‘Must Understand’ semantics (like Word & PPT)

Volatile Dependencies

• Used for Real Time Data & OLAP functions

• Caches Server, Topic, Subtopic, and last returned value

• Enables Excel to avoid a full recalculation on load just to compute these input values (server, topic, and subtopic can themselves be calculated values from the grid)

The Sheet-Level Pieces

• Comments
• Formulas & References & Defined Names
• Tables
• AutoFilter
• External Links
  – General
  – Special Directory Relationships
• PivotTable
  – PivotTable
  – PivotCache
• QueryTable
• Metadata

Comments

• Content stored in separate part to facilitate 3rd party identification & removal

• Drawing object information stored separately

• Sheet stores the relationship information

Formulas, References, Defined Names

• Excel saves out exactly what you see in the cell at runtime.

• Implication: Excel re-parses the formula on load and serializes it on save
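Because the formula is stored as text, a consumer can read it without any binary decoding. The following Python sketch collects formula cells from a worksheet fragment. The sample XML and the function name formula_cells are illustrative assumptions; the f (formula) and v (cached value) element names and the namespace follow the published SpreadsheetML schema.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hypothetical worksheet fragment: two numbers and one formula cell.
SHEET_XML = """\
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1"><v>2</v></c>
      <c r="B1"><v>3</v></c>
      <c r="C1"><f>SUM(A1:B1)</f><v>5</v></c>
    </row>
  </sheetData>
</worksheet>
"""


def formula_cells(sheet_xml):
    """Return {cell_ref: (formula_text, cached_value_text)} for formula cells."""
    root = ET.fromstring(sheet_xml)
    found = {}
    for cell in root.iterfind(".//m:c", NS):
        formula = cell.find("m:f", NS)
        if formula is not None:
            value = cell.find("m:v", NS)
            cached = None if value is None else value.text
            found[cell.get("r")] = (formula.text, cached)
    return found


print(formula_cells(SHEET_XML))  # {'C1': ('SUM(A1:B1)', '5')}
```

The formula text here is exactly what a user would see in the cell, which is why a loader has to parse it again before it can calculate.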

Tables

• Data is stored in cell table (sheet1.xml)

• Table Metadata is stored separately (table1.xml)

AutoFilters

• Note enumerated values vs custom expressions
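The distinction between enumerated values and custom expressions can be seen by reading an autoFilter element. The Python sketch below classifies each filtered column. The sample XML is an illustrative assumption written against the element names in the published schema (filters/filter and customFilters/customFilter), and describe_filters is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hypothetical filter: column 0 lists allowed values, column 2 uses an expression.
AUTOFILTER_XML = """\
<autoFilter xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" ref="A1:C10">
  <filterColumn colId="0">
    <filters>
      <filter val="East"/>
      <filter val="West"/>
    </filters>
  </filterColumn>
  <filterColumn colId="2">
    <customFilters>
      <customFilter operator="greaterThan" val="100"/>
    </customFilters>
  </filterColumn>
</autoFilter>
"""


def describe_filters(autofilter_xml):
    """Map each filtered column id to ('enumerated', values) or ('custom', rules)."""
    root = ET.fromstring(autofilter_xml)
    described = {}
    for column in root.iterfind("m:filterColumn", NS):
        col_id = int(column.get("colId"))
        values = [f.get("val") for f in column.iterfind("m:filters/m:filter", NS)]
        rules = [(c.get("operator"), c.get("val"))
                 for c in column.iterfind("m:customFilters/m:customFilter", NS)]
        if values:
            described[col_id] = ("enumerated", values)
        elif rules:
            described[col_id] = ("custom", rules)
    return described


print(describe_filters(AUTOFILTER_XML))
```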

Formula Links to External Workbooks

• File path is abstracted out to the relationships part
• Excel caches snapshot of external workbook structure (sheets & cell tables)

PivotTable

• PivotTable
  – View information
    • Row & Column axis
    • Page field information
    • Cached values in the cells
    • Pivot Table state & settings (filters, outline mode)
• PivotCache
  – PivotCache Definition (which cache to use, data type definitions of the records)
  – PivotCacheRecords (the underlying data)

QueryTable

• A query table is a specific range connected to an external data source

• In the file, we save QT-specific properties
  – various formats to apply (numeric, font, borders, connectionId)

Metadata

• Stores extra data about OLAP formulas in cells
  – Cell-level information
    • E.g. copy/paste/paste special and insert/delete behaviors
  – Value-level information
    • E.g. does this metadata transfer with the value via formula references?

Minimal Workbook Tags

xl\workbook.xml

<workbook>
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
  </sheets>
</workbook>

xl\worksheets\sheet1.xml

<worksheet>
  <sheetData/>
</worksheet>

…plus the content types and relationships to hook it all up.
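As a rough sketch of what hooking it all up involves, the Python below assembles the two parts shown above together with a content types part and two relationships parts into one ZIP package. The namespace declarations, which the slide omits, are added here, and the PARTS dictionary and build_minimal_xlsx function are hypothetical names for illustration. Treat this as a starting point under those assumptions rather than a validated writer.

```python
import io
import zipfile

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
CT_PREFIX = "application/vnd.openxmlformats-officedocument.spreadsheetml"
XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'

# Part name inside the ZIP -> XML text of that part.
PARTS = {
    # Declares the content type of every part in the package.
    "[Content_Types].xml": f"""{XML_DECL}<Types xmlns="{CT_NS}">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/xl/workbook.xml" ContentType="{CT_PREFIX}.sheet.main+xml"/>
  <Override PartName="/xl/worksheets/sheet1.xml" ContentType="{CT_PREFIX}.worksheet+xml"/>
</Types>""",
    # Package-level relationship: points at the main workbook part.
    "_rels/.rels": f"""{XML_DECL}<Relationships xmlns="{PKG_REL_NS}">
  <Relationship Id="rId1" Type="{REL_NS}/officeDocument" Target="xl/workbook.xml"/>
</Relationships>""",
    "xl/workbook.xml": f"""{XML_DECL}<workbook xmlns="{MAIN_NS}" xmlns:r="{REL_NS}">
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
  </sheets>
</workbook>""",
    # Workbook-level relationship: resolves r:id="rId1" to the sheet part.
    "xl/_rels/workbook.xml.rels": f"""{XML_DECL}<Relationships xmlns="{PKG_REL_NS}">
  <Relationship Id="rId1" Type="{REL_NS}/worksheet" Target="worksheets/sheet1.xml"/>
</Relationships>""",
    "xl/worksheets/sheet1.xml": f"""{XML_DECL}<worksheet xmlns="{MAIN_NS}">
  <sheetData/>
</worksheet>""",
}


def build_minimal_xlsx() -> bytes:
    """Zip the five parts into an in-memory package and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name, xml_text in PARTS.items():
            package.writestr(name, xml_text)
    return buffer.getvalue()
```

Writing the returned bytes to a file with an .xlsx extension should give a workbook with one empty sheet. Note that r:id="rId1" in the workbook part only has meaning through the matching entry in xl/_rels/workbook.xml.rels.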

File Format Types

• Template – “XLTX”
• Workbook – “XLSX”

• Both use the same file format – differentiation is a function of the main content type and file extension.
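The role of the main content type can be illustrated with a short Python sketch that inspects a [Content_Types].xml part and reports whether the package declares itself a workbook or a template. The classify_package function and the sample input are hypothetical; the two content type strings are the ones defined for the workbook and template main parts in the Open XML formats.

```python
import xml.etree.ElementTree as ET

CT_NS = {"ct": "http://schemas.openxmlformats.org/package/2006/content-types"}
CT_PREFIX = "application/vnd.openxmlformats-officedocument.spreadsheetml"

# Main-part content type -> kind of file it identifies.
MAIN_CONTENT_TYPES = {
    CT_PREFIX + ".sheet.main+xml": "workbook (XLSX)",
    CT_PREFIX + ".template.main+xml": "template (XLTX)",
}


def classify_package(content_types_xml):
    """Return the file kind implied by the main part's content type."""
    root = ET.fromstring(content_types_xml)
    for override in root.iterfind("ct:Override", CT_NS):
        kind = MAIN_CONTENT_TYPES.get(override.get("ContentType"))
        if kind is not None:
            return kind
    return "unknown"


# Hypothetical content types part for a template package.
SAMPLE = (
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Override PartName="/xl/workbook.xml" ContentType="'
    + CT_PREFIX + '.template.main+xml"/>'
    "</Types>"
)
print(classify_package(SAMPLE))  # template (XLTX)
```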

Disclaimer

This presentation is for informational purposes only, and should not be relied upon as a substitute or replacement for Microsoft formal file format documentation, which is available at the following website: https://msdn.microsoft.com/en-us/library/cc313118(v=office.12).aspx. Any views or opinions presented in this material are solely those of the author and do not necessarily represent those of Microsoft. Microsoft disclaims all liability for mistakes or inaccuracies in this presentation.