SpreadsheetML Overview
Design Strategies
• Highly repetitive XML collections have terse tag names
  – Cell Table
  – Pivot Cache Records
  – Shared Workbook Revision Log
• Keep the cell table definition lightweight but extensible
  – E.g. Named Ranges written in their own collection instead of on each cell
  – The cell definition is a complex type with future storage capabilities
• Always use A1 style notation
• Keep the ‘minimal workbook’ definition simple
  – Most collections are optional
• Mirror Excel’s in-memory data structures for fast load/save
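Because references are always written in A1 style, a consumer has to translate between strings such as "C7" and numeric row/column positions. A minimal sketch of that translation follows; the function names are hypothetical and not part of any Excel API, and the pattern assumes at most three column letters.

```python
import re
from typing import Tuple

# Optional "$" markers (absolute references) are accepted and ignored.
_A1_RE = re.compile(r"^\$?([A-Za-z]{1,3})\$?([1-9][0-9]*)$")


def a1_to_row_col(ref: str) -> Tuple[int, int]:
    """Convert an A1-style cell reference to 1-based (row, column)."""
    match = _A1_RE.match(ref)
    if match is None:
        raise ValueError(f"not an A1-style cell reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters are a bijective base-26 number: A=1 ... Z=26, AA=27.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), col


def row_col_to_a1(row: int, col: int) -> str:
    """Convert 1-based (row, column) back to an A1-style reference."""
    if row < 1 or col < 1:
        raise ValueError("row and column are 1-based")
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row}"


print(a1_to_row_col("AA10"))   # (10, 27)
print(row_col_to_a1(10, 27))   # AA10
```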
Example Workbook
[Diagram of package parts: Workbook; Styles; Calc Chain; Shared Strings; File Properties; Sheets 1..N (Sheet2, Sheet3); Chart; Table; Drawing]
The Workbook-Level Pieces
• “Book-Level” Properties (views, lastSavedVersion)
• Sheets – a workbook must have at least 1 sheet
  – Spreadsheet
  – Chart Sheet
  – Macro Sheet
  – Dialog Sheet
• Shared String Table – unique list of all strings in the workbook
• Styles Definition
• Themes (shared schema)
• Connections
• Calculation Chain
• Custom XML Maps
• Shared Workbook Properties (if shared)
• Future Extensibility
• Volatile Dependencies
Shared Strings
• Workbooks have LOTS of repeated strings
• Shared String Table is a load/save optimization
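The idea behind the shared string table can be sketched as follows: each distinct string is stored once, and every cell stores only an index into that list. The function name and sample data are illustrative assumptions, not Excel's actual implementation.

```python
from typing import Dict, List, Tuple


def build_shared_strings(cell_texts: List[str]) -> Tuple[List[str], List[int]]:
    """Return (unique string table, per-cell index into that table)."""
    table: List[str] = []
    index_of: Dict[str, int] = {}
    cell_indexes: List[int] = []
    for text in cell_texts:
        if text not in index_of:
            # First time this string is seen: append it to the table.
            index_of[text] = len(table)
            table.append(text)
        cell_indexes.append(index_of[text])
    return table, cell_indexes


cells = ["East", "West", "East", "East", "West", "North"]
table, indexes = build_shared_strings(cells)
print(table)    # ['East', 'West', 'North']
print(indexes)  # [0, 1, 0, 0, 1, 2]
```

Six cells are written as six small integers plus three strings, which is where the load/save saving comes from when a workbook repeats the same labels thousands of times.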
Formatting & Styles
• Direct Cell Formatting (XF)
  – Fonts
  – Fills
  – Borders
  – Numeric Formatting
• Cell Styles
• Table Styles
• PivotTable Styles
Cell, Table, PivotTable Styles
• Referenced by Name
• Explicit formatting is described using formatting records (xf)
Workbook Connections
• All external data connections in the workbook
• Excludes external formula references & DDE Links
Calculation Chain
• Calc-related information
  – A formula’s position in the calculation order
  – Multi-threaded calculation information
Custom XML Source
• A cache of the user schemas added to the workbook (xmlmaps.xml)
• Nodes mapped to tables (table1.xml)
• Nodes mapped to single cells (tableSingleCells1.xml)
Shared Workbooks
• Supports the Shared Workbook Feature
  – Revision Headers (used to know which revision logs to read in)
  – Revision log parts (the actual changes)
  – User name listing (who made the changes)
Future Extensibility
• Future Storage Buckets
  – A location to store future feature data
• Alternate Content Blocks & ‘Must Understand’ semantics (like Word & PPT)
Volatile Dependencies
• Used for Real Time Data & OLAP functions
• Caches Server, Topic, Subtopic, and last returned value
• Enables Excel to avoid a full recalculation on load just to compute these input values (server, topic, and subtopic can themselves be values calculated from the grid)
The Sheet-Level Pieces
• Comments
• Formulas & References & Defined Names
• Tables
• AutoFilter
• External Links
  – General
  – Special Directory Relationships
• PivotTable
  – PivotTable
  – PivotCache
• QueryTable
• Metadata
Comments
• Content stored in separate part to facilitate 3rd party identification & removal
• Drawing object information stored separately
• Sheet stores the relationship information
Formulas, References, Defined Names
• Excel saves out exactly what you see in the cell at runtime.
• Implication: Excel re-parses the formula text on load, and serializes it back to text on save
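To illustrate that a formula is stored as plain text next to its last computed value, here is a small sketch that reads both from a worksheet fragment. The fragment is invented for illustration, the function name is hypothetical, and the namespace URI is the one from the published ECMA-376 schema rather than anything stated in the slides.

```python
import xml.etree.ElementTree as ET

# Namespace of the published SpreadsheetML schema (assumption: not given in the slides).
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hypothetical sheet fragment: C1 holds a formula plus its cached result.
SHEET_XML = """\
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1"><v>2</v></c>
      <c r="B1"><v>3</v></c>
      <c r="C1"><f>SUM(A1:B1)</f><v>5</v></c>
    </row>
  </sheetData>
</worksheet>
"""


def formulas_in_sheet(sheet_xml: str) -> dict:
    """Map each formula cell's A1 reference to (formula text, cached value)."""
    root = ET.fromstring(sheet_xml)
    found = {}
    for cell in root.iterfind(".//m:c", NS):
        formula = cell.find("m:f", NS)
        if formula is None:
            continue  # plain value cell, no formula to re-parse
        value = cell.find("m:v", NS)
        found[cell.get("r")] = (formula.text, value.text if value is not None else None)
    return found


print(formulas_in_sheet(SHEET_XML))  # {'C1': ('SUM(A1:B1)', '5')}
```

The formula arrives as an ordinary string, so a loader has to parse it before it can calculate anything; the cached value lets a reader display a result without doing so.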
Tables
• Data is stored in cell table (sheet1.xml)
• Table Metadata is stored separately (table1.xml)
Formula Links to External Workbooks
• Abstract file path to relationships part
• Excel caches snapshot of external workbook structure (sheets & cell tables)
PivotTable
• PivotTable
  – View information
    • Row & Column axis
    • Page field information
    • Cached values in the cells
    • Pivot Table state & settings (filters, outline mode)
• PivotCache
  – PivotCache Definition (which cache to use, data type definitions of the records)
  – PivotCacheRecords (the underlying data)
QueryTable
• A query table is a specific range connected to an external data source
• In the file, we save QT-specific properties
  – various formats to apply (numeric, font, borders, connectionId)
Metadata
• Stores extra data about OLAP formulas in cells
  – Cell-level information
    • E.g. copy/paste/paste special and insert/delete behaviors
  – Value-level information
    • E.g. does this metadata transfer with the value via formula references?
Minimal Workbook Tags
xl\workbook.xml
<workbook>
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
  </sheets>
</workbook>
xl\worksheets\sheet1.xml
<worksheet>
  <sheetData/>
</worksheet>
…plus the content types and relationships to hook it all up.
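The "content types and relationships" that hook the two parts together can be sketched by writing the whole package with Python's standard zipfile module. This is a sketch under stated assumptions: the namespace URIs and content type strings come from the published ECMA-376 and Open Packaging Conventions documents, not from the slides, and the result has not been checked against any particular Excel version. Inside the ZIP, part names use forward slashes rather than the backslashes shown above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed values from the published specifications (not given in the slides).
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
CT_PREFIX = "application/vnd.openxmlformats-officedocument.spreadsheetml"
DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'

PARTS = {
    # Declares the content type of every part in the package.
    "[Content_Types].xml": DECL + (
        f'<Types xmlns="{CT_NS}">'
        '<Default Extension="rels" '
        'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Default Extension="xml" ContentType="application/xml"/>'
        f'<Override PartName="/xl/workbook.xml" ContentType="{CT_PREFIX}.sheet.main+xml"/>'
        f'<Override PartName="/xl/worksheets/sheet1.xml" ContentType="{CT_PREFIX}.worksheet+xml"/>'
        '</Types>'
    ),
    # Package-level relationship: points at the main workbook part.
    "_rels/.rels": DECL + (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId1" Type="{DOC_REL}/officeDocument" Target="xl/workbook.xml"/>'
        '</Relationships>'
    ),
    # Workbook-level relationship: resolves r:id="rId1" to the sheet part.
    "xl/_rels/workbook.xml.rels": DECL + (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId1" Type="{DOC_REL}/worksheet" Target="worksheets/sheet1.xml"/>'
        '</Relationships>'
    ),
    "xl/workbook.xml": DECL + (
        f'<workbook xmlns="{MAIN_NS}" xmlns:r="{DOC_REL}">'
        '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets>'
        '</workbook>'
    ),
    "xl/worksheets/sheet1.xml": DECL + f'<worksheet xmlns="{MAIN_NS}"><sheetData/></worksheet>',
}


def build_minimal_xlsx() -> bytes:
    """Zip the five parts into an in-memory package and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name, xml_text in PARTS.items():
            package.writestr(name, xml_text)
    return buffer.getvalue()


def sheet_names(xlsx_bytes: bytes) -> list:
    """Read the sheet names back out of xl/workbook.xml."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as package:
        root = ET.fromstring(package.read("xl/workbook.xml"))
    return [s.get("name") for s in root.iterfind(f"{{{MAIN_NS}}}sheets/{{{MAIN_NS}}}sheet")]


print(sheet_names(build_minimal_xlsx()))  # ['Sheet1']
```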
File Format Types
• Template – “XLTX”
• Workbook – “XLSX”
• Both use the same file format – differentiation is a function of the main content type and file extension.
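Since the two types differ only in the main content type (and extension), a reader can tell them apart by looking up the workbook part in [Content_Types].xml. The sketch below assumes the content type strings from the published specification and a main part at /xl/workbook.xml; the function name and sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
PREFIX = "application/vnd.openxmlformats-officedocument.spreadsheetml"

# Main-part content types as published in the specification (assumption: not in the slides).
KINDS = {
    f"{PREFIX}.sheet.main+xml": "workbook (XLSX)",
    f"{PREFIX}.template.main+xml": "template (XLTX)",
}


def classify(content_types_xml: str, main_part: str = "/xl/workbook.xml") -> str:
    """Classify a package by the content type declared for its main part."""
    root = ET.fromstring(content_types_xml)
    for override in root.iterfind(f"{{{CT_NS}}}Override"):
        if override.get("PartName") == main_part:
            return KINDS.get(override.get("ContentType"), "unknown")
    return "unknown"


SAMPLE = (
    f'<Types xmlns="{CT_NS}">'
    f'<Override PartName="/xl/workbook.xml" ContentType="{PREFIX}.template.main+xml"/>'
    '</Types>'
)
print(classify(SAMPLE))  # template (XLTX)
```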
Disclaimer
This presentation is for informational purposes only, and should not be relied upon as a substitute or replacement for Microsoft formal file format documentation, which is available at the following website: https://msdn.microsoft.com/en-us/library/cc313118(v=office.12).aspx. Any views or opinions presented in this material are solely those of the author and do not necessarily represent those of Microsoft. Microsoft disclaims all liability for mistakes or inaccuracies in this presentation.