
Office Open XML: a technical approach for OOo
OOoCon 2007, Barcelona, September 21st, 2007

Hubert Figuiere

Software Engineer, OpenOffice.org

Novell

Getting Started

What is Office Open XML?

An office application file format

XML based

Created by Microsoft...

...for Microsoft Office 2007

ECMA standard 376

Proposed to ISO

What is Office Open XML not?

Office Open XML is not OpenDocument (ISO 26300)

... nor the previous XML formats for Microsoft Office introduced in the last few MS-Office releases

... nor an ISO standard though it has been proposed

Why support Open XML?

Support = importing from (and/or exporting to)

For interoperability reasons with Microsoft Office 2007

Overview of the format

The specification

Available to anybody as ECMA standard 376

5 PDF documents

Fundamentals

Open Packaging Conventions

Primer

Markup Language Reference

Markup Compatibility and Extensibility

173 + 129 + 472 + 5129 + 43 = 5946 pages

The specification (cont.)

Some have printed it.

OpenXML printed spec, photo by Pavel Janik
http://blog.janik.cz/archives/2007/05/19/T20_32_07/

Packaging Conventions

A zip file: Open Package

Contains the main content...

... and the embedded content

Same container used for other Microsoft formats like XPS

Replaces the old OLE structured storage

In principle similar to OpenDocument, but not really.
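
Since the package is an ordinary zip file, a quick sanity check before parsing is to look at its first bytes. The sketch below assumes the file begins with a ZIP local file header (the bytes 'P' 'K' 0x03 0x04); the helper names are hypothetical and not part of any OpenOffice.org module, and passing this check does not mean the package is valid.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// First four bytes of a ZIP local file header: 'P' 'K' 0x03 0x04.
constexpr std::array<std::uint8_t, 4> kZipLocalHeader{{0x50, 0x4B, 0x03, 0x04}};

// Returns true if the buffer starts with the ZIP local file header signature.
// Hypothetical helper, for illustration only.
bool startsWithZipSignature(const std::uint8_t* data, std::size_t size)
{
    if (data == nullptr || size < kZipLocalHeader.size())
        return false;
    for (std::size_t i = 0; i < kZipLocalHeader.size(); ++i)
        if (data[i] != kZipLocalHeader[i])
            return false;
    return true;
}

// Convenience wrapper: reads the first four bytes of a file and checks them.
// Returns false if the file cannot be opened or is shorter than four bytes.
bool fileLooksLikePackage(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::array<char, 4> head{};
    if (!in.read(head.data(), static_cast<std::streamsize>(head.size())))
        return false;
    return startsWithZipSignature(
        reinterpret_cast<const std::uint8_t*>(head.data()), head.size());
}
```

An old binary document in OLE structured storage starts with the bytes D0 CF 11 E0 instead, so the same check also tells the two container types apart.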

Content

DrawingML

Diagrams, Charts, etc.

WordprocessingML

Word document

SpreadsheetML

Excel document

PresentationML

PowerPoint presentation

Heavily relies on DrawingML

Content (cont.)

Relationships

Maps embedded objects

Sets the relationships between fragments
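
One way to picture relationships is as a small lookup table attached to each part: the content refers to an id, and the table says which fragment or embedded object that id points to. The sketch below is only an illustration; the struct, the class and the type URI used in the usage note are hypothetical and are not taken from the OOX module.

```cpp
#include <map>
#include <string>
#include <utility>

// One relationship: an id local to the source part, a type URI,
// and the target fragment or embedded object.
struct Relationship
{
    std::string id;
    std::string type;
    std::string target;
};

// Relationships of a single source part, indexed by id.
// Hypothetical class, for illustration only.
class RelationshipTable
{
public:
    // Adds a relationship, replacing any earlier entry with the same id.
    void add(Relationship rel)
    {
        std::string key = rel.id;
        entries_[key] = std::move(rel);
    }

    // Resolves an id found in the content; returns nullptr if it is unknown.
    const Relationship* find(const std::string& id) const
    {
        auto it = entries_.find(id);
        return it == entries_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, Relationship> entries_;
};
```

With a table filled by `add({"rId1", "http://example.invalid/type/image", "media/image1.png"})`, a slide that references `rId1` resolves to `media/image1.png`, while an unknown id yields `nullptr`.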

Content (cont.)

VML

Legacy format from Office 2000

Embedded objects

Sound files

Images

Can be anything!

I have seen a PowerPoint document with an OpenDocument chart in an OLE container that was referenced from a slide

OpenOffice implementation

Plans

Implement a native filter for Office Open XML

Import (in progress)

Export (Novell is committed to doing it)

Split into 2 modules

Target is tentatively 2.4

Novell's ooo-build 2.3 has it:

Ships with openSUSE 10.3

Will ship with other Linux distros

Joint effort between

Sun and Novell

[...] a team of 5 developers will implement 25 handlers a week, which means that we'd have all the XML handlers written in 44 weeks.

[...] Nevertheless, we've taken a little less than a year to get the converters reading the new file format.

[...] This is just for Word.

-- Rick Schaut, Mac Office team, about implementing the Office 2007 importer for Word for Mac, December 2006.

http://blogs.msdn.com/rick_schaut/archive/2006/12/07/open-xml-converters-for-mac-office.aspx

Microsoft released the beta version of the Word 2007 to RTF converter for MacOS in May 2007...

...and PowerPoint support was released July 31st 2007

Modules

Writerfilter

Word import

Refactoring of the RTF and binary .doc filters

See Fridrich Strba's presentation for all the details

OOX

Excel and PowerPoint, but not Word

CWS xmlfilter02

implements VML as well

called by the writerfilter if needed.

No XSLT

OOX is not an XSLT based filter.

Process XML to input into OpenOffice.org internal model

Written in C++

The fast SAX parser

5568 tokens are listed in our code

String comparisons for tokens are slow

The fast SAX parser is designed to

reduce the number of string comparisons by using a 32-bit hash for string tokens (including the XML namespace)

offer that API through UNO

It lives in the sax module

Of course it is generic and could be used anywhere
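
The idea can be sketched as follows: each element name is converted to an integer once, when it is read, and every handler afterwards compares integers only. This is a simplified illustration, not the code from the sax module; `std::unordered_map` stands in for the generated hash, and the token values and the `tokenFromName` / `describe` helpers are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

using Token = std::int32_t;

// Illustrative token values; the real list is generated at build time.
constexpr Token TOKEN_UNKNOWN = -1;
constexpr Token XML_lnSpc     = 1;
constexpr Token XML_spcBef    = 2;
constexpr Token XML_spcAft    = 3;

// Converts an element name to its token. This is the only place where the
// string itself is hashed and compared.
Token tokenFromName(const std::string& name)
{
    static const std::unordered_map<std::string, Token> table = {
        {"lnSpc",  XML_lnSpc},
        {"spcBef", XML_spcBef},
        {"spcAft", XML_spcAft},
    };
    auto it = table.find(name);
    return it == table.end() ? TOKEN_UNKNOWN : it->second;
}

// A handler dispatches on the integer token; no string comparison happens here.
const char* describe(Token token)
{
    switch (token)
    {
        case XML_lnSpc:  return "line spacing";
        case XML_spcBef: return "spacing before";
        case XML_spcAft: return "spacing after";
        default:         return "unknown";
    }
}
```

With several thousand known names, replacing chains of string comparisons in every handler by a single lookup per element is where the saving is expected to come from.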

Fast parser details

Hash tokens are generated by gperf at compile time

From a compile time generated list (OOX)

Each known string token is referenced by a constant such as XML_token

XML namespace in the high order bits of token

Allows selecting the namespace with a simple bit-mask

Example

switch( aToken )
{
    case NMSP_DRAWINGML|XML_lnSpc:   // line spacing
        break;
    case NMSP_DRAWINGML|XML_spcBef:  // spacing before the paragraph
        break;
    case NMSP_DRAWINGML|XML_spcAft:  // spacing after the paragraph
        break;
    default:
        break;
}
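
To make the bit layout concrete, here is a minimal sketch of how such combined tokens could be built and taken apart. The 16/16 split, the numeric values, `NMSP_PRESENTATIONML` and the helper functions are assumptions made for this illustration; the actual layout used by the parser may differ.

```cpp
#include <cstdint>

// Assumed layout: namespace id in the upper 16 bits, element token in the lower 16.
constexpr unsigned      NMSP_SHIFT = 16;
constexpr std::uint32_t NMSP_MASK  = 0xFFFF0000u;
constexpr std::uint32_t TOKEN_MASK = 0x0000FFFFu;

// Hypothetical namespace ids, already shifted into the high bits.
constexpr std::uint32_t NMSP_DRAWINGML      = 1u << NMSP_SHIFT;
constexpr std::uint32_t NMSP_PRESENTATIONML = 2u << NMSP_SHIFT;

// Hypothetical element tokens.
constexpr std::uint32_t XML_lnSpc  = 1u;
constexpr std::uint32_t XML_spcBef = 2u;
constexpr std::uint32_t XML_spcAft = 3u;

// Extracts the namespace part of a combined token.
constexpr std::uint32_t getNamespace(std::uint32_t token)
{
    return token & NMSP_MASK;
}

// Extracts the element part of a combined token.
constexpr std::uint32_t getBaseToken(std::uint32_t token)
{
    return token & TOKEN_MASK;
}

// Tests the namespace with a single mask and compare.
constexpr bool isInNamespace(std::uint32_t token, std::uint32_t nmsp)
{
    return getNamespace(token) == nmsp;
}
```

Because namespace and element share one integer, `NMSP_DRAWINGML|XML_spcBef` is a compile-time constant and can be used directly as a `case` label, as in the example above.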

API

The OOX module depends only on the UNO API

Can't always get inspiration from the binary filters that mostly use the internal APIs

Some UNO APIs are incomplete or missing

They need to be implemented

The data model

The Office Open XML data model is very close to the one from the binary formats

[...] XLSX may be ugly, but its concepts were very familiar from XLS. We already had much of the code required to handle it.

-- Jody Goldberg about Gnumeric Excel 2007 support,

http://blogs.gnome.org/jody/2007/09/10/odf-vs-oox-asking-the-wrong-questions/

Excel vs Calc

Excel 2007 has features that Calc lacks

Dealing with missing features in Calc:

Find a workaround

Downgrade the data

Problem with round-trip conversions

Implement the missing feature

Excel 2007 vs Excel 2003

No notable new features in the core

Overall structures are very similar

A shared string table contains the cell strings

Sheet protection data contains an identical set of options.

Autofilter uses internal cell range names (not visible to the user) that are identical both in xlsx and xls.
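
As an illustration of the shared string table idea, each distinct string is stored once and cells refer to it by index. The class below is a hypothetical sketch of that data structure, not code from Calc or from the filter.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Stores each distinct cell string once; cells keep only the index.
// Hypothetical class, for illustration only.
class SharedStringTable
{
public:
    // Returns the index of `text`, adding it to the table on first use.
    std::size_t intern(const std::string& text)
    {
        auto found = index_.find(text);
        if (found != index_.end())
            return found->second;
        strings_.push_back(text);
        index_.emplace(text, strings_.size() - 1);
        return strings_.size() - 1;
    }

    // Looks a string up by index; throws std::out_of_range for a bad index.
    const std::string& at(std::size_t i) const { return strings_.at(i); }

    // Number of distinct strings stored.
    std::size_t uniqueCount() const { return strings_.size(); }

private:
    std::vector<std::string> strings_;
    std::unordered_map<std::string, std::size_t> index_;
};
```

A sheet that repeats the same label in many cells then stores the text once and repeats only the small index.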

Excel 2007 vs Excel 2003 (cont.)

Overall structures are very similar (cont.)

In both xls and xlsx formats, pivot table record contains a cached source data.

Excel allows rich text and field objects in the header and footer, and they are encoded. In both xls and xlsx, the same encoding scheme is used.

PowerPoint vs Impress

Pixel perfect rendering

People spend hours in airports refining their PowerPoint presentations...

...so the import has to be perfect

SmartArt

This is a big feature in PowerPoint 2007

Animation / transition

Both based on SMIL

PowerPoint 2007 vs PowerPoint 2003

Not many changes

SmartArt

Saving in PowerPoint 2007 as binary PPT turns it into an embedded OLE object

Of course, this requires having the engine

DrawingML

A shared ML

Used directly by PresentationML

Encountered in WordprocessingML and SpreadsheetML documents.

Defines styles, shapes, text, charts, diagrams, audio/video, etc

Supposed to be more functional than VML, therefore to replace it.

VML

Legacy Microsoft XML format

Still generated by the 2007 versions of MS applications

Replaces the binary EMF for OLE

Used by annotations in Excel

and a lot of drawing features in Word

supposed to be superseded by DrawingML

Alternative Implementations

odf-converter (Free Software)

Microsoft sponsored ODF to Office OpenXML converter

XSLT based

Written in C# / .Net

Also runs with Mono (Free Software platform)

Free Software (MIT style license)

Currently shipped by Novell for SUSE and Windows

GNOME (Free Software)

libgsf

Implements OpenPackage reading and writing

Gnumeric

Import .xlsx files

Export .xlsx files (somewhat)

AbiWord

Import .docx

Both run on non-GNOME platforms like Windows

The initial importer was written on the flight to London for the ECMA meeting, and export was added on the flight back. Toss in a few hours of debugging and the sample file [...] was under a week of effort to read and write.

-- Jody Goldberg about Gnumeric Excel 2007 support,

http://blogs.gnome.org/jody/2007/09/10/odf-vs-oox-asking-the-wrong-questions/

Apple iWork '08 (non-Free)

Pages

Import and export .docx

Numbers

Import and export .xlsx

Keynote

Import and export .pptx

Questions?

