PLANETS & Document Conversion Tools Preserving our Digital Assets Document Interoperability Initiative (DII’09) London, 18 May 2009 Wolfgang Keber ([email protected]) Natasa Milic-Frayling ([email protected])


Page 1: PLANETS & Document Conversion Toolsdownload.microsoft.com/download/C/D/1/CD1AA269-0362-4EFA... · 2018-10-16 · Document Conversion Tools –Our Approach Three-step approach, resulting

PLANETS & Document Conversion Tools

Preserving our Digital Assets

Document Interoperability Initiative (DII’09), London, 18 May 2009

Wolfgang Keber ([email protected])

Natasa Milic-Frayling ([email protected])


Overview

PLANETS Project

Document conversion tools

Objectives

Technical Approach

Current status

Demo


PLANETS

Preservation and Long-term Access through Networked Services

PLANETS is a four-year project co-funded by the European Union under the Sixth Framework Programme to address core digital preservation challenges.

“The primary goal for Planets is to build practical services and tools to help ensure long-term access to our digital cultural and scientific assets.” (excerpt from http://www.planets-project.eu/)


Partners

The British Library; National Library, Netherlands; Austrian National Library; State and University Library, Denmark; Royal Library, Denmark

National Archives, UK; Swiss Federal Archives; National Archives, Netherlands

HATII at University of Glasgow; University of Freiburg; Technical University of Vienna; University at Cologne

Tessella Plc; IBM Netherlands; Microsoft Research, Cambridge; ARC Seibersdorf research


Sub Projects


Microsoft & PLANETS: Office Documents

Microsoft Research role within PLANETS:

Conversion of binary Microsoft Office Documents into Office Open XML File Format (OpenXML)

We extended the effort to include other formats

More legacy formats, e.g. WordPerfect

Other open standards, e.g. Open Document Format.

[Diagram: Binary MS Office → OpenXML, extended to the source formats WordPerfect, Binary MS Office and DOS Word and the target formats ODF, OpenXML and UOF]


Document Conversion Tools – Our Approach

Three-step approach, resulting in a modular and extendible infrastructure

Identify existing conversion tools and libraries

Wrap these tools and libraries into re-usable components

Integrate these components into PLANETS and other systems.

If possible, do not use the office applications (e.g., Microsoft Office or OpenOffice.org)

They are designed as interactive applications

Message boxes might pop up (“Do you want …”)

Licensing is unclear when they run on a server.
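The wrapping step, and the concern about interactive prompts, can be sketched in Python. This is a minimal illustration, not the PLANETS implementation: the class name `CommandLineWrapper`, the `{src}`/`{dst}` placeholders and the timeout value are all assumptions made for the example. The wrapper runs a stand-alone executable with no console input attached and a time limit, so a tool that waits for a prompt fails instead of blocking a server.

```python
import subprocess
import sys
from pathlib import Path


class CommandLineWrapper:
    """Hypothetical wrapper that turns a converter executable into a component.

    `command` is an argument list in which the items "{src}" and "{dst}"
    are replaced by the input and output paths (an assumed convention).
    """

    def __init__(self, command, timeout_seconds=60):
        self.command = list(command)
        self.timeout_seconds = timeout_seconds

    def convert(self, source: Path, target: Path) -> Path:
        # Substitute only the exact placeholder items; leave other arguments alone.
        args = [
            str(source) if a == "{src}" else str(target) if a == "{dst}" else a
            for a in self.command
        ]
        try:
            result = subprocess.run(
                args,
                stdin=subprocess.DEVNULL,  # the tool cannot wait for user input
                capture_output=True,
                timeout=self.timeout_seconds,  # a hanging tool is stopped
            )
        except subprocess.TimeoutExpired as exc:
            raise RuntimeError(f"converter timed out: {args[0]}") from exc
        if result.returncode != 0:
            raise RuntimeError(result.stderr.decode(errors="replace"))
        if not target.exists():
            raise RuntimeError("converter exited cleanly but wrote no output")
        return target
```

Any executable that accepts an input path and an output path could be wrapped this way; the calling system only sees `convert(source, target)`.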


Reusable Components

Transformer Box (Wrapper)

“Binary → OpenXML”

TB Interface

Watch Folder Tool

Web Service

ToooXML (GUI)


Extendible Architecture

Transformer Box (Wrapper)

“ODF → OpenXML”

Transformer Box (Wrapper)

“WP → OpenXML”

Transformer Box (Wrapper)

“Binary → OpenXML”

TB Interface

Watch Folder Tool

Web Service

ToooXML (GUI)
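One way to picture this architecture is a single interface that every Transformer Box implements, plus a registry that the front ends (watch folder tool, web service, GUI) query. The Python sketch below is an assumption about the shape of such a design; `TransformerBox`, `Registry` and `FakeBox` are hypothetical names and the slides do not show the actual TB Interface.

```python
from abc import ABC, abstractmethod


class TransformerBox(ABC):
    """Hypothetical common interface implemented by every wrapper."""

    source_format: str
    target_format: str

    @abstractmethod
    def convert(self, data: bytes) -> bytes:
        """Convert a document from source_format to target_format."""


class Registry:
    """Lookup table that front ends use instead of calling wrappers directly."""

    def __init__(self):
        self._boxes = {}

    def register(self, box: TransformerBox) -> None:
        self._boxes[(box.source_format, box.target_format)] = box

    def targets_for(self, source_format: str) -> list:
        return sorted(t for (s, t) in self._boxes if s == source_format)

    def convert(self, data: bytes, source_format: str, target_format: str) -> bytes:
        try:
            box = self._boxes[(source_format, target_format)]
        except KeyError:
            raise LookupError(
                f"no wrapper for {source_format} -> {target_format}"
            ) from None
        return box.convert(data)


class FakeBox(TransformerBox):
    """Stand-in converter for the demonstration; it only tags the payload."""

    def __init__(self, source_format: str, target_format: str):
        self.source_format = source_format
        self.target_format = target_format

    def convert(self, data: bytes) -> bytes:
        return f"[{self.target_format}]".encode() + data


registry = Registry()
for source in ("Binary", "ODF", "WP"):
    registry.register(FakeBox(source, "OpenXML"))
```

Adding a new conversion then means registering one more box; the front ends need no changes.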


More Technical Details (1)

Currently there are two types of wrappers:

Command-line tools (stand-alone executables)

OpenXML/ODF Translator (OpenXML ↔ ODF)

OpenXML Document Viewer (OpenXML → HTML)

Microsoft conversion libraries (CNV libraries)

WordPerfect → RTF

RTF → OpenXML

Wrappers can be chained

WordPerfect → RTF → OpenXML → ODF.
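Chaining can be treated as a path search over the available single-step conversions. The following Python sketch is an assumption about how a chain might be found, not a description of the actual tools: the `STEPS` table lists only the conversions named on this slide (treating the OpenXML/ODF Translator as working in both directions), and `find_chain` and `run_chain` are hypothetical helper names.

```python
from collections import deque

# Single-step conversions named on the slide, as a directed graph.
# Treating OpenXML/ODF as bidirectional is an assumption.
STEPS = {
    "WordPerfect": ["RTF"],
    "RTF": ["OpenXML"],
    "OpenXML": ["ODF", "HTML"],
    "ODF": ["OpenXML"],
}


def find_chain(source, target, steps=STEPS):
    """Breadth-first search: shortest list of formats from source to target."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for next_format in steps.get(path[-1], []):
            if next_format not in seen:
                seen.add(next_format)
                queue.append(path + [next_format])
    return None  # no chain of wrappers connects the two formats


def run_chain(data, path, converters):
    """Apply one converter per hop; `converters` maps (src, dst) to a callable."""
    for src, dst in zip(path, path[1:]):
        data = converters[(src, dst)](data)
    return data


print(find_chain("WordPerfect", "ODF"))
# ['WordPerfect', 'RTF', 'OpenXML', 'ODF']
```

Each hop can lose formatting, so a shortest chain is a reasonable default, although a real system might prefer to weight hops by conversion quality.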

More Technical Details (2)

Microsoft conversion libraries (CNV libraries)

Originally designed to import/export “foreign” document formats into/from Microsoft Word

Based on the Microsoft Conversion API

Foreign2RTF

RTF2Foreign

Transformer Box CNV Wrapper follows this API.

[Diagram: Microsoft Word and the Transformer Box CNV Wrapper each call the CNV Library through Foreign2RTF and RTF2Foreign]
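Because every CNV library offers the same two directions, RTF can serve as a pivot between any two wrapped libraries. The Python sketch below only illustrates that idea. It does not call the Microsoft Conversion API, which is a native interface; `CnvLibrary`, `CnvWrapper`, `ToyLibrary` and the method names are hypothetical stand-ins for the Foreign2RTF and RTF2Foreign directions.

```python
from typing import Protocol


class CnvLibrary(Protocol):
    """Hypothetical Python view of a converter library with two directions."""

    def foreign_to_rtf(self, foreign: bytes) -> bytes: ...

    def rtf_to_foreign(self, rtf: bytes) -> bytes: ...


class CnvWrapper:
    """Exposes one library as an import step, an export step, or both."""

    def __init__(self, name: str, library: CnvLibrary):
        self.name = name
        self.library = library

    def to_rtf(self, data: bytes) -> bytes:
        return self.library.foreign_to_rtf(data)

    def from_rtf(self, rtf: bytes) -> bytes:
        return self.library.rtf_to_foreign(rtf)


def convert_via_rtf(data: bytes, source: CnvWrapper, target: CnvWrapper) -> bytes:
    """Import with the source library, then export with the target library."""
    return target.from_rtf(source.to_rtf(data))


class ToyLibrary:
    """Toy stand-in: the 'foreign' form is the RTF bytes behind a tag."""

    def __init__(self, tag: bytes):
        self.tag = tag

    def foreign_to_rtf(self, foreign: bytes) -> bytes:
        if foreign.startswith(self.tag):
            return foreign[len(self.tag):]
        return foreign

    def rtf_to_foreign(self, rtf: bytes) -> bytes:
        return self.tag + rtf
```

With two toy libraries tagged `b"WP:"` and `b"OX:"`, `convert_via_rtf(b"WP:hello", wp, ox)` yields `b"OX:hello"`, mirroring the WordPerfect to RTF to OpenXML route.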


Supported Formats

Source formats

WordPerfect 5

WordPerfect 6

DOS Word

Word 2, 6, 95

Word 97-2003

RTF

ODF

OpenXML

Target formats

OpenXML

ODF

UOF

HTML

XCDL (format defined in PLANETS/PC)


ToOOXML++ Demo

ToOOXML++

UI for demonstrating the conversion tools

Allows documents to be selected

Automatically determines the document format

Offers a list of available target formats.
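Automatic format detection is commonly done by inspecting the first bytes of a file. The slides do not say how ToOOXML++ does it, so the Python sketch below is only one plausible approach. The signatures used are well-known file markers, but the function names and the `TARGETS` table are illustrative assumptions, and the checks are heuristics (for example, the OLE2 header is shared by other binary Office files).

```python
import io
import zipfile

# OLE2 compound-file header, used by binary Word documents among others.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Illustrative only: which targets a UI might offer per detected format.
TARGETS = {
    "WordPerfect": ["OpenXML", "ODF"],
    "RTF": ["OpenXML", "ODF"],
    "Binary MS Office": ["OpenXML", "ODF"],
    "OpenXML": ["ODF", "HTML"],
    "ODF": ["OpenXML"],
}


def detect_format(data: bytes) -> str:
    """Guess the document format from leading bytes and ZIP contents."""
    if data.startswith(b"{\\rtf"):
        return "RTF"
    if data.startswith(b"\xffWPC"):
        return "WordPerfect"
    if data.startswith(OLE2_MAGIC):
        return "Binary MS Office"
    if data.startswith(b"PK\x03\x04"):
        # OpenXML and ODF are both ZIP packages; tell them apart by entries.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = set(archive.namelist())
        except zipfile.BadZipFile:
            return "unknown"
        if "[Content_Types].xml" in names:
            return "OpenXML"
        if "mimetype" in names:
            return "ODF"
    return "unknown"


def available_targets(data: bytes) -> list:
    """Return the target formats to offer for this document."""
    return TARGETS.get(detect_format(data), [])
```

A production tool would look deeper, for instance at the version byte in a WordPerfect header or the streams inside an OLE2 file, to distinguish the individual source formats.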

Virtual PC

Pre-installed ToOOXML++

Legacy applications for viewing documents in the software that created them

WinWord 2

WordPerfect 5


Next Steps

The web services architecture in the PLANETS Interoperability Framework is currently being defined

Adapting our current web service to the PLANETS interface

Adopting Microsoft’s new OpenXML-based External File Converter API

Making the document conversion tools available to a broader community (ConvPortal)

Adding comparison support (source/target document).


Q & A

Document Interoperability Initiative (DII’09), London, 18 May 2009

Wolfgang Keber ([email protected])

Natasa Milic-Frayling ([email protected])