word to · pdf filea sample documents 14 ... execute setup.exe in the setup\word-to-latex...

37
Word to L A T E X User’s Manual Michal Kebrt

Upload: doanngoc

Post on 06-Feb-2018

217 views

Category:

Documents


1 download

TRANSCRIPT

Wordto

LATEX

User’s Manual

Michal Kebrt

Contents

1 User’s manual 31.1 Requirements and installation . . . . . . . . . . . . . . . . . . . . 31.2 Uninstallation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.3 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.4 Command-line convertor . . . . . . . . . . . . . . . . . . . . . . . 41.5 EPS to TIF image conversion . . . . . . . . . . . . . . . . . . . . 51.6 Graphic user interface . . . . . . . . . . . . . . . . . . . . . . . . 5

1.6.1 Running the conversion . . . . . . . . . . . . . . . . . . . . 51.6.2 Figures, Equations and Translations . . . . . . . . . . . . . 61.6.3 Document preamble . . . . . . . . . . . . . . . . . . . . . 81.6.4 Special characters . . . . . . . . . . . . . . . . . . . . . . . 91.6.5 Styles and Font sizes . . . . . . . . . . . . . . . . . . . . . 101.6.6 Miscellaneous options . . . . . . . . . . . . . . . . . . . . . 11

1.7 Running Word-to-LATEX from Word . . . . . . . . . . . . . . . . . 121.8 Conversion to XML, XHTML, MathML . . . . . . . . . . . . . . 12

A Sample documents 14

B Structure of configuration files 19B.1 Conversion options . . . . . . . . . . . . . . . . . . . . . . . . . . 20B.2 Conversion mappings . . . . . . . . . . . . . . . . . . . . . . . . . 23B.3 Special characters . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2

Chapter 1

User’s manual

1.1 Requirements and installation

• Microsoft Windows 2000 or XP is required.

• Microsoft .NET Framework Version 1.1 or higher is required. We stronglyrecommend .NET Framework 1.1 because the convertor cannot be run as aWord addin with .NET Framework 2.0. Only the standalone version (whichis much slower) can be run with .NET Framework 2.0. .NET Framework1.1 can be downloaded from Microsoft and it can be installed together with.NET Framework 2.0 if you already have it.

• Microsoft Word XP (2002) or higher is required to be installed on yoursystem.

• If you want to export mathematical equations not only as images, but also toLATEX or MathML formats, you will have to install Design Science MathType(it’s a commercial product).

• You must have a PostScript printer driver installed on your system to beable to export images to EPS format. You can try this printer.

After you have installed all the required software, close Word (if it’s run-ning), execute setup.exe in the setup\Word-to-LaTeX directory, and follow theinstructions. You must have administrator privileges to install the whole appli-cation properly. Once the installation is finished, you will find a couple of files inyour Word-to-LATEX directory. Some of them are listed here:

• word-to-latex.exe – Word-to-LATEX command-line convertor

• word-to-latex-gui.exe – Word-to-LATEX graphic user interface

• config.xml, XMLconfig.xml – convertor configuration for LATEX and XMLoutput

• html.xsl – XSL file which transforms XML output to HTML

• manual.pdf – user’s manual

• eps2tif – directory containing a batch file for converting EPS images toTIF format

3

1.2 Uninstallation

If you want to uninstall Word-to-LATEX from your system, go to Control Panel| Add or Remove programs and select Word-to-LATEX. Please close Word (ifit’s running) before uninstalling.

1.3 Configuration

All the program configuration is stored in an XML file with a public format whichis defined using XML Schema in the config.xsd file. Before the conversionprocedure starts, the configuration is validated against the schema, so you mustbe very careful when editing the file manually.

There are two predefined configuration files in your Word-to-LATEX directory,config.xml for conversion to LATEX and XMLConfig.xml for conversion to XMLformat.

Don’t be afraid if XML is an unknown abbreviation for you. There is noneed to know anything about XML technologies because you can customize theconvertor also through the graphic interface which will be described in section1.6.

Appendix B describes the XML structure of configuration files and possiblevalues in each element and attribute.

1.4 Command-line convertor

When the command-line convertor (word-to-latex.exe) is executed without anyparameters, the list of all possible options from table 1.1 will be printed.

word-to-latex.exe -i inputFile [-o outputFile] [-opt confFile]

-i input file name-o output file name-opt configuration file name

Table 1.1: word-to-latex.exe options

The only required option is “-i”. When the output file is omitted, the inputfile name appended with “.tex” extension is taken instead. If the configurationfile is not specified, the default configuration stored in the config.xml file is usedfor the conversion.

After you run the program with correct options, it prints all the file names(input, output, configuration) and also your Microsoft Word version which canbe useful when an error occurs. Then the conversion routine is started and youwill be informed about the progress.

Please be patient when you are converting a large document, it can take along time to convert it. Much more faster way of running the conversion will bedescribed in section 1.7.

4

1.5 EPS to TIF image conversion

As not all images included in Word documents can be converted to bitmaps, Iwrote a simple batch file (eps2tif.bat in the eps2tif directory) which convertsEPS files to TIF format. It benefits from the fact that Word-to-LATEX can exportall images to EPS format.

This batch file requires Ghostscript program which is free for non-commercialuse. The path to the Ghostscript executable must be specified at the top of theeps2tif.bat file.

When you want to export all images from a Word document to some bitmapformat (PNG, JPEG, and so on), just run Word-to-LATEX to have an EPS versionof each image and then execute the eps2tif.bat file with the options describedin table 1.2. Finally you can convert the output TIF files to the format you prefer(for example Irfanview does this very effectively).

eps2tif.bat inDir outDir

inDir directory from which the files with .eps extension are takenoutDir directory where the .tif files will be saved

Table 1.2: eps2tif.bat options

1.6 Graphic user interface

For most of users the graphic interface will be the most frequent way of usingWord-to-LATEX convertor. To run it, just click the icon on your Desktop or in theStart menu, or execute the word-to-latex-gui.exe file in your Word-to-LATEXdirectory.

After executing the program, the configuration dialog will appear. All the sixtabs will be described now.

1.6.1 Running the conversion

Only the Input document is required to be selected. When the Output fileis omitted, the Input document file name appended with “.tex” extension istaken instead.

Two configuration files can be found in your Word-to-LATEX directory,config.xml for conversion to LATEX and XMLConfig.xml for conversion to XML.

When the Configuration file is omitted, config.xml will be used instead.But be careful, it’s recommended to customize the settings for each documentyou convert. Save as . . . , Save and Load commands in the Configurationmenu can be used to load and save convertor configurations. Remember that thecurrent configuration must be saved before it is applied during the conversion.You can check the option Save configuration before conversion to save theconfiguration automatically after pressing the Convert button.

When you press the Convert button, all the file names (input, output, con-figuration) and also your Microsoft Word version will be written to the text box

5

Figure 1.1: “Running” tab

below. This can be useful when an error occurs. Then the conversion routineis started and you will be informed about the progress in the text box. Pleasebe patient when you are converting a large document, it can take a long time toconvert it. Much more faster way of running the conversion will be described insection 1.7.

1.6.2 Figures, Equations and Translations

Figure 1.2: “Figures/Eq/Document” tab

6

FiguresCheck Only figures to convert only figures and ignore the text content of theinput document. Word-to-LATEX exports images (including embedded objectslike Excel graphs) in two formats – vector Encapsulated PostScript (EPS) orbitmap PNG. If you want to export images to EPS format, you must specify thePostScript printer. This topic was mentioned in section 1.1.

EPS format is recommended because EPS images can be easily integrated intoLATEX documents and moreover some images included in Word documents (e.g.Word drawings) cannot be exported as bitmaps. If this occurs, the convertor willgive you a notice and after it finishes, you can export all images to EPS formatand use eps2tif program described in section 1.5 to have a bitmap version ofeach image.

EquationsIf you have MathType installed on your system, you can check convert and allequations inserted through Equation Editor, MathType and Word EQ fields willbe converted. Otherwise you have to select ignore to ignore all equations orto images for exporting equations to images.

When the convert option is selected, the output format of converted equa-tions depends on the translation file defined in the TDL filename box. See theTranslators subdirectory of your MathType directory for possible values. Youcan edit or add new files to this directory if you want to customize the conversionof equations.

Document settingsAs the convertor performs a few special actions depending on the Output for-mat, you must select LATEX or XML. But remember that it doesn’t change anyTranslations.

The @WL-DOC_CLASS macro used in the document preamble will be replacedwith the value of the Document class option. The @WL-PAGE_SIZE macro willbe replaced with a value depending on the Page size processing option as showstable 1.3.

Option name @WL-PAGE_SIZE will be replaced with

complete the complete definition of the page size matchingthe page size of the input document

symbolic the convertor will try to translate the symbolicpage size (e.g. A4) of the input document to anappropriate LATEX size (e.g. letterpaper)

use “Page size” the value of the Page size option

Table 1.3: Page size processing options

TranslationsThe translation mappings between input document elements and LATEX com-mands are defined here. It comprises of headings, font styles, footnotes, tables,

7

alignments, colors, and so on. Each element has a Start command which isinserted before the element itself and an End command inserted after the ele-ment.

One example: Let “some text” appear in the document and the FONT_ITALIC

mapping is “\textit” for the start command and “” for the end command.Then “\textitSome text” will be written to the output file.

The complete overview of translated elements with the default mappings forLATEX and XML output can be found in section B.2.

1.6.3 Document preamble

Figure 1.3: “Preamble” tab

Document preamble, inserted at the top of output files, can be easily edited inthis dialog. Table 1.4 shows the list of macros that can be used in the preamble.

The translations of Output format special characters (e.g. “\” in LATEXor “<” in XML) are defined in the right part of this dialog. Don’t forget to fill inthese characters in the right order because some special characters can be usedfor the translation of other special characters (e.g. “\” must be at the top forLATEX output). New characters can be added double-clicking the pink row.

1.6.4 Special characters

Special characters are divided into groups according to their Unicode [1] posi-tions. Each character can have a translation used in regular text context and amath translation used in math context. Currently when a character has bothtranslations defined, the text translation is always used. If it has only a mathtranslation, the character is inserted as a simple inline equation. If no translationis defined, the character is inserted “as is” (in UTF-8 encoding).

The math translation does not influence the conversion of equations. whichis completely defined in a TDL file (see section 1.6.2 for details).

8

Macro Replaced with

@WL-DOC_CLASS the Document class option from the previous di-alog

@WL-DOC_AUTHOR the input document’s author (retrieved from thedocument’s properties)

@WL-DOC_TITLE the input document’s title (retrieved from the doc-ument’s properties)

@WL-PAGE_SIZE see the Document settings in the previous sec-tion

@WL-DEFAULT_FONT_SIZE the default font size; details in section 1.6.5

@WL-STYLE_COMMANDS the commands created from paragraph and charac-ter user styles, see the Styles/Fonts tab in section1.6.5 for details.

Table 1.4: Document preamble macros

Figure 1.4: “Characters” tab

9

Default translations can be changed double-clicking the field you want toedit. The encoding of output files is UTF-8 which covers all national characters,so there is no need to define translations for Latin extended characters (e.g. “a”)or Cyrillic ones. Just make sure that you have appropriate commands in thedocument preamble, for example:

\usepackage[T2A]fontenc

\usepackage[utf8]inputenc

1.6.5 Styles and Font sizes

Figure 1.5: “Styles/Fonts” tab

The translations of paragraph and character user styles can be defined inthis dialog. Press Add new . . . and fill in the name of a style, the startcommand inserted before the text content of the style and the end commandinserted after the text content. When you omit the definition of some style,appropriate commands will be created automatically on the basis of the styleproperties. Word built-in styles are skipped.

You can edit the list of styles double-clicking any of the fields. Write Y(or N) to the leave as is field if you don’t want to make any changes (charactertranslations, wrapping) in the text content of the style. It’s suitable for stylesthat are translated to the verbatim environment.

Check Create commands in the preamble to make a special commandfor each style in the document preamble. It’s recommended to enable this optionbecause it makes output files much more maintainable. For example, if you havea style named“code”, \stylecode command will be created and when you decideto change the definition of the style, you will do it only in one place.

Font sizes are split into 10 groups which are converted to the commands de-fined in Translations (see 1.6.2 for details). Each group has a point range ofsizes that it covers – from the start size (exclusively) to the end size (inclu-sively). You can edit the default settings double-clicking the end size field of agroup you want to change. Start sizes are counted automatically.

10

The portions of text that have the Default font size won’t be marked withany command defining the font size. Therefore it’s very important to have acorrect value in this field to avoid a lot of unnecessary font size commands in theoutput file. Check Auto detect default font size to retrieve the default sizefrom the Word built-in Normal style.

1.6.6 Miscellaneous options

Figure 1.6: “Misc” tab

OutputCheck Wrap paragraphs and insert an integer number to wrap the paragraphsin the output text file. The following line separators can be used in output files:CRLF (Windows), LF (Unix), CR (Macintosh).

ParagraphsCheck Process paragraph alignments and Process paragraph indenta-tions to take them into account. Sometimes it’s better to ignore Word alignmentsand indentations because LATEX can make them automatically and better.

ColorsCheck Convert colored text to convert colored portions of text using xcolor

package. But be very careful when checking this option because it takes a lot oftime to find and convert the colored text.

The same package is used when you check Convert highlighted text (markedwith the Word Highlight tool) and Convert colored table cells.

When any option is unchecked, it only means that commands defining colorswon’t be inserted into the output file. The whole text content will be, of course,converted.

MiscCheck Convert multicolumns to convert multicolumn sections inserted throughFormat | Columns. Sans-serif fonts like Arial or Verdana are converted toappropriate commands only when Convert sans-serif fonts is checked.

11

Check the option Automatically recognize math in italicized text andsimple math expressions like i or k < 30 will be inserted as math text instead oftext in italics.

The convertor can Recognize references to numbered equations if theymatch the pattern ([1-9]+) or ([1-9]+.[1-9]+) (e.g. (3.15)). A numberedequation must be inserted on a separate line and its label must be written at theright part of the same line. Any number of white space characters between theequation and its label is allowed.

Paragraphs not containing any text won’t be converted when Ignore emptyparagraphs is checked.

Word-to-LATEX can Convert endnotes into bibliography items and Rec-ognize bibliography references (citations) if they match the pattern\[[A-Za-z0-9]+\] (e.g. [4] or [Ka76]). But if you don’t use endnotes forbibliography items, you will still have to edit the bibliography section manually.

1.7 Running Word-to-LATEX from Word

The conversion will be at least 10 times faster if you press the button on the Word-to-LATEX toolbar installed directly into your Word application. The convertorinterface is completely the same as the one described in the previous section.

If you have problems with running the convertor from Word, please verifythat you have Medium or Low option checked in the Word Tools | Macro |Security menu.

Figure 1.7: Word-to-LATEX toolbar in Word

1.8 Conversion to XML, XHTML, MathML

The output of the convertor completely depends on the configuration. There isno need to convert documents only to LATEX. The XMLConfig.xml configurationfile, stored in the Word-to-LATEX directory, is used for conversion to XML [2]which is a nice intermediate format that can be easily transformed to whateverformat you need. You should be familiar with XML and related technologies tounderstand a short overview.

The best way to insert mathematical equations into XML documents isMathML language. Word-to-LATEX uses MathType built-in capability to exportequations to MathML format.

XML format is very strict – XML files must be so-called “well-formed”. Some-times the convertor produces a file that is not well-formed, but it’s never difficultto correct such a file manually.

Once we have a well-formed XML file, an XSLT style [3] can be used totransform the file into the format we need. The html.xsl style, located in theWord-to-LATEX directory, transforms the input file to XHTML format [4] com-bined with CSS [5]. This style was tested with saxon XSLT processor.

12

Appendix A

Sample documents

The following pages show two documents converted with Word-to-LATEX.

13

Original Word document

1. Font styles

1.1. Styles 1

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. UT SED NISI vel justo

lobortis venenatis. Sed id risus. Donec sollicitudin. Aenean nulla. Nam blandit, sapien a venenatis viverra, velit nisl mattis urna, non luctus sapien ante et leo. H2O, E = mc2

1.2. Styles 2

Lorem ipsum dolor sit amet1, consectetuer adipiscing elit. Ut sed nisi vel justo lobortis venenatis. Sed id risus. Donec sollicitudin. Aenean nulla. Nam blandit, sapien a venenatis viverra, velit nisl mattis urna, non luctus sapien ante et leo.

2. Special characters in list • Žluťoučký kůň úpěl ďábelské ódy.

o Ψ Ω α ζ δ; i ∈ T; (a,b) ∉ A × B.

3. Paragraph indentation Lorem ipsum dolor sit amet, consectetuer adipiscing elit.

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut sed nisi vel justo lobortis.

4. Simple table Blue Center bold Right 2-1 Italics Pink

5. Complex table Header

a b A c d

B

1 Lorem ipsum dolor sit amet

14

LATEX output compiled to PostScript

1 Font styles

1.1 Styles 1

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut sed nisi vel justo

lobortis venenatis. Sed id risus. Donec sollicitudin. Aenean nulla. Namblandit, sapien a venenatis viverra, velit nisl mattis urna, non luctus sapien ante etleo. H2O, E = mc2

1.2 Styles 2

Lorem ipsum dolor sit amet1, consectetuer adipiscing elit. Ut sed nisi vel justo

lobortis venenatis. Sed id risus. Donec sollicitudin. Aenean nulla. Nam blandit ,

sapien a venenatis viverra, velit nisl mattis urna, non luctus sapien ante et leo.

2 Special characters in list

• Zlut’oucky kun upel d’abelske ody.

– Ψ Ω α ζ δ; i ∈ T; (a,b) 6∈ A × B.

3 Paragraph indentation

Lorem ipsum dolor sit amet, consectetueradipiscing elit.

Lorem ipsum dolor sit amet, consectetueradipiscing elit. Ut sed nisi vel justo lobortis.

4 Simple table

Blue Center bold Right

2-1 Italics Pink

5 Complex table

Header

A a b Bc d

15

XML output transformed to HTML and rendered in Mozilla

Font styles

Styles 1

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. UT SED NISI vel justo

lobortis venenatis. Sed id risus. Donec sollicitudin. Aenean nulla. Nam blandit, sapien a venenatis viverra , velit nisl mattis urna, non luctus

sapien ante et leo. H2O, E = mc2

Styles 2

Lorem ipsum dolor sit amet ( Lorem ipsum dolor sit amet) , consectetueradipiscing elit. Ut sed nisi vel justo lobortis venenatis. Sed id risus. Donec sollicitudin. Aenean nulla. Nam blandit, sapien a venenatis viverra, velit nisl mattis urna, non luctus sapien ante et leo.

Special characters in list

Žluťoučký kůň úpěl ďábelské ódy.

Ψ Ω α ζ δ; i ∈ T; (a,b) ∉ A × B.

Paragraph indentation

Lorem ipsum dolor sit amet, consectetuer adipiscing elit.

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut sed nisi vel justo lobortis.

Simple table

Blue Center bold Right

2-1 Italics Pink

Complex table

Header

Aa b

Bc d

16

Original Word document at the top, LATEX output compiled toPostScript at the bottom

Bitmap image

0

10

20

30

40

I II III

Energy Water Wood

Microsoft Excel graph

Equation editor expressions

∑=

=),max(

1

),(),(

ji ll

k

kj

kiji oodooD (1)

Given a set of paths PX and a set of path contents PCX , binary relation PPC PCP XX ×⊆ is

defined. An PPCse ∈, denotes the assignment of the path keeee /// 21 …= to the path content

kssss /// 21 …= .

EQ field expression - 3 . See expression (1).

Bitmap image

0

10

20

30

40

I II III

Energy Water Wood

Microsoft Excel graph

Equation editor expressions

D(oi, oj) =

max(li,lj)∑

k=1

d(oki , ok

j ) (1)

Given a set of paths XP and a set of path contents XPC , binary relation PPC ⊆

17

Appendix B

Structure of configuration files

<?xml version="1.0" encoding="utf-8" ?>

<configuration xmlns=’http://kebrt.cz/word-to-latex’

xmlns:xsi=’http://www.w3.org/2001/XMLSchema-instance’>

<variousOptions>

<option name="OUTPUT_FORMAT" value="latex" />

<option name="EQUATIONS" value="toimages" />

...

</variousOptions>

<translationTable>

<docElement name=’FONT_BOLD’

start=’\textbf’ end=’’ />

<docElement name=’HEADING1’ start=’\part’ end=’’ />

...

</translationTable>

<specialChars>

<latexChar char=’\’ convertTo=’\textbackslash ’ />

...

</specialChars>

</configuration>

Figure B.1: Fragment of the config.xml configuration file

All the configuration is stored in an XML file with the <configuration> rootelement which contains three subelements:

<variousOptions> various options applied during the conversion (out-put format, PostScript printer name, . . . )

<translationTable> table containing mappings between input docu-ment elements (sections, paragraphs, footnotes,and so on) and LATEX commands

<specialChars> translation mappings between special (and na-tional) characters and LATEX commands

18

B.1 Conversion options

All the options, listed in table B.1, belong to the <variousOptions> parent ele-ment. Each of the them is inserted into the <option> element with two attributes,name and value.

Option name Description and possible values

ONLY_IMAGES Convert only images and ignore text content.• yes × no

PRINTER_NAME The name of a PostScript printer which is used forexporting images in EPS format. The printer driverhas to be installed on your system.• e.g. Generic Color PS

IMAGE_FORMAT The output format of images.• eps for EPS vector format; requires a PostScriptprinter• png for PNG bitmap format; not all the images canbe exported as bitmaps

TDL_FILENAME The translation file used for the conversion of equa-tions. See the Translators subdirectory of yourMathType directory for possible values (rememberthat MathType must be installed on your system tobe able to convert equations). You can edit or addnew files into this directory if you want to customizethe conversion of equations.• e.g. LaTeX.tdl

EQUATIONS The conversion of equations, covers Equation Editor,MathType and EQ fields equations.• ignore – do not convert• convert – convert using the translation file speci-fied in the TDL_FILENAME option• toimages – convert to images

CREATE_COMMANDS_

FOR_STYLES

The convertor will create (or not) new commands forparagraph and characters user styles in the preamble.Output text files are more maintainable if commandslike \code are used instead of for example \texttt.• yes × no

DOC_CLASS The @WL-DOC_CLASS macro used in the preamble willbe replaced with the value of this option.• e.g. article

Table B.1: Conversion options

19

Option name Description and possible values

OUTPUT_FORMAT The format of output files. Please remember that alltranslations mappings described in B.2 should be setto match this output format. The convertor performsa few special actions depending on two possible val-ues:• latex

• xml

PAGE_SIZE The @WL-PAGE_SIZE macro used in the documentpreamble will be replaced with the value of this op-tion (only if the PAGE_SIZE_PROCESSING option is setto my).• e.g. a4paper

PAGE_SIZE_

PROCESSING

Specifies how the page size will be processed, possiblevalues are:• complete – the @WL-PAGE_SIZE macro used in thedocument preamble will replaced with the completepage size definition matching the page size of the in-put document• symbolic – the convertor will try to translate thesymbolic page size of the input document (e.g. A4)to an appropriate LATEX size (e.g. letterpaper)• my – see the previous option

DEFAULT_FONT_SIZE Defines the default font size of the input document.The portions of text having this size won’t be markedwith any font size command in the output file. Onlyinteger numbers are allowed.• e.g. 12

PARAGRAPH_

ALIGNMENTS

Convert paragraph alignments.

– yes × no

PARAGRAPH_

INDENTATION

Convert paragraph indentations.

– yes × no

COLOR_TEXT Use special commands for colored text.• yes × no

COLOR_BG Use special commands for text with colored back-ground.• yes × no

COLOR_TABLE Use special commands for table cells with coloredbackground.• yes × no

Table B.1: Conversion options

20

Option name Description and possible values

AUTO_DETECT_

DEFAULT_FONT_SIZE

Detect the default font size of the input documentautomatically or not. The font size of the Word built-in Normal style will be taken as the default one if thisoption is set to yes.• yes × no

MULTICOLUMN Convert multicolumn sections.• yes × no

WRAP_PARAGRAPHS A positive value causes paragraphs to be wrappedinto lines after each x characters. Any other valueforces the convertor not to wrap paragraphs.• e.g. 80

NEW_LINE Defines the line separator, possible values are:• crlf – Windows line separator• cr – Macintosh line separator• lf – Unix line separator

SANS_SERIF Use special commands for sans-serif fonts.• yes × no

AUTO_RECOGNIZE_

MATH

Recognize math expressions written in italics (e.g. i).

• yes × no

IGNORE_EMPTY_PAR Ignore paragraphs not containing any text.• yes × no

RECOGNIZE_

NUMBERED_EQ_REF

Recognize references to numbered equations markedwith labels like “(5)” or “(5.2)”.• yes × no

ENDNOTES_TO_BIBLIO Convert endnotes to bibliography items.• yes × no

RECOGNIZE_BIBLIO_

REF

Recognize in-text citations (references to bibliogra-phy items, e.g. “[4]”).– yes × no

FONT_SIZE[1-10] These options define ranges for each converted fontsize group. The range for the i-th group isfrom FONT_SIZE(i-1)+1 to FONT_SIZE(i) (inclu-sive). The first group (FONT_SIZE1) starts with thesize 1. Only integer numbers are allowed.• e.g. 11 for the FONT_SIZE4 option and 12 for theFONT_SIZE5 option when the default font size is 12

Table B.1: Conversion options

21

B.2 Conversion mappings

Table B.2 shows the complete list of conversion mappings between input docu-ment elements (sections, paragraphs, lists, and so on) and Word-to-LATEX. Eachmapping has a start command (S:) which is inserted before the element and mostof them have also an end command (E:) inserted after the element. Some ele-ments like tabulators doesn’t have any content, others hold some kind of content(text, equation, another element) which is inserted between the start and endcommand.

Names of macros that are specific to each element begin with “#”, macroscommon to all elements begin with “@”.

• @WL-NL new line• @WL-TAB tabulator

Table B.2 also contains the default mappings for LATEX and XML output.When E: is omitted, the end command is always ignored by the convertor, “—”stands for the empty translation command.

FONT_BOLD bold font

S: \textbf

E:

S: <font type="bold">

E: </font>

FONT_ITALIC italic font

S: \textit

E:

S: <font type="italic">

E: </font>

FONT_SMALLCAPS small caps font

S: \textsc

E:

S: <font type="smallcaps">

E: </font>

FONT_HIDDEN hidden font

S: @WL-NL%

E: @WL-NL

S: <font type="hidden">

E: </font>

Table B.2: Conversion mappings

22

FONT_SUBSCRIPT subscript font

S: $_

E: $

S: <font type="subscript">

E: </font>

FONT_SUPERSCRIPT superscript font

S: $^

E: $

S: <font type="superscript">

E: </font>

FONT_COURIER courier font (e.g. Courier, Courier New)

S: \texttt

E:

S: <font type="courier">

E: </font>

FONT_UPPERCASE uppercase font

S: \uppercase

E:

S: <font type="uppercase">

E: </font>

FONT_UNDERLINE underlined font

S: \uline

E:

S: <font type="wave-underline">

E: </font>

FONT_DOUBLE_UNDERLINE double-underlined font

S: \uuline

E:

S: <font type="double-underline">

E: </font>

FONT_WAVE_UNDERLINE wavy-underlined font

S: \uwave

E:

S: <font type="wave-underline">

E: </font>

Table B.2: Conversion mappings

23

FONT_STRIKE strikethrough font

S: \sout

E:

S: <font type="strike">

E: </font>

FONT_SANS_SERIF sans-serif font (e.g. Arial, Verdana)

S: \textsf

E:

S: <font type="sans-serif">

E: </font>

FONT_SIZE1 font size (group 1)

S: \tiny

E:

S: <font-size value="1">

E: </font-size>

FONT_SIZE2 font size (group 2)

S: \scriptsize

E:

S: <font-size value="2">

E: </font-size>

FONT_SIZE3 font size (group 3)

S: \footnotesize

E:

S: <font-size value="3">

E: </font-size>

FONT_SIZE4 font size (group 4)

S: \small

E:

S: <font-size value="4">

E: </font-size>

FONT_SIZE5 font size (group 5)

S: \normalsize

E:

S: <font-size value="5">

E: </font-size>

Table B.2: Conversion mappings

24

FONT_SIZE6 font size (group 6)

S: \large

E:

S: <font-size value="6">

E: </font-size>

FONT_SIZE7 font size (group 7)

S: \Large

E:

S: <font-size value="7">

E: </font-size>

FONT_SIZE8 font size (group 8)

S: \LARGE

E:

S: <font-size value="8">

E: </font-size>

FONT_SIZE9 font size (group 9)

S: \huge

E:

S: <font-size value="9">

E: </font-size>

FONT_SIZE10 font size (group 10)

S: \Huge

E:

S: <font-size value="10">

E: </font-size>

HEADING1 heading (level 1); headings have to bemarked with the Word built-in styles; theycan be defined up to level 9

S: \section

E:

S: <heading level="1">

E: </heading>

HEADING2 heading (level 2)

S: \subsection

E:

S: <heading level="2">

E: </heading>

Table B.2: Conversion mappings

25

HEADING3 heading (level 3)

S: \subsubsection

E:

S: <heading level="3">

E: </heading>

ALIGN_CENTER paragraph alignment – centered

S: \begincenter@WL-NL

E: @WL-NL\endcenter

S: <align type="center" />

E: —

ALIGN_LEFT paragraph alignment – left

S: \raggedright@WL-NL

E: @WL-NL

S: <align type="left" />

E: —

ALIGN_RIGHT paragraph alignment – right

S: \raggedleft@WL-NL

E: @WL-NL

S: <align type="right" />

E: —

TABLE_ALIGN_CENTER table paragraph alignment – centered

• #WIDTH table cell width (in points)

S: \parbox#WIDTHpt\centering

E:

S: <align type="center" />

E: —

TABLE_ALIGN_LEFT table paragraph alignment – left

• #WIDTH table cell width (in points)

S: \parbox#WIDTHpt\raggedright

E:

S: <align type="left" />

E: —

Table B.2: Conversion mappings

26

TABLE_ALIGN_RIGHT table paragraph alignment – right

• #WIDTH table cell width (in points)

S: \parbox#WIDTHpt\raggedleft

E:

S: <align type="right" />

E: —

FOOTNOTE footnote

S: \footnote

E:

S: <footnote>

E: </footnote>

PAGE_BREAK page break

S: \pagebreak@WL-NL@WL-NL

S: <pagebreak />

EQUATION_INLINE inline equation

S: \beginmath

E: \endmath

S: <equation type="inline">

E: </equation>

EQUATION_NUMBERED numbered equation

• #ORIG_LABEL original equation label retrieved from theinput document

S: \beginequation

E: @WL-NL%#ORIG_LABEL@WL-NL\endequation

S: <equation type="numbered" origlabel="#ORIG_LABEL">

E: </equation>

EQUATION_LABEL equation label inserted into theEQUATION_NUMEBERED element

• #NAME auto-generated label (auto-incrementingcounter is used)

S: \label#NAME

S: <label name="#NAME"/>

Table B.2: Conversion mappings

27

EQUATION_OUTLINE equation displayed on a separate line

S: \begindisplaymath

E: \enddisplaymath

S: <equation type="outline">

E: </equation>

INDEX_ENTRY index entry (Word XE field)

S: \index

E:

S: <index-entry>

E: </index-entry>

INDEX index (Word INDEX field), LATEX generatesthe whole index automatically

S: \printindex

S: <printindex />

IMAGE_COMMAND image

• #WIDTH image width (in points)

• #FILENAME auto-generated image filename(e.g. img1.eps)

• #TITLE image title (if present)

S: \includegraphics[width=#WIDTHpt]#FILENAME@WL-NL

S: <image width="#WIDTH" src="#FILENAME" title="#TITLE" />

IMAGE_CONTAINER image container (used when the image hasa title)

S: \beginfigure[h]@WL-NL

E: \endfigure

S: —

E: —

IMAGE_TITLE image title inserted into the IMAGE_

CONTAINER element

• #TITLE title

S: \caption#TITLE

S: —

TOC table of contents (Word TOC field), LATEXgenerates the table of contents automati-cally as well as Word

S: \tableofcontents

S: <table-of-contents />

Table B.2: Conversion mappings

28

HYPERLINK hyperlink

• #HREF hyperlink target; the macro can be usedalso in the end command

S: \href#HREF

E:

S: <link href="#HREF">

E: </link>

SPECIAL_COMMAND LATEX command(s) inserted into the doc-ument through the Word PRIVATE fieldwhose content must begin with the case-insensitive string latex:, such a field maylook like this: PRIVATE LaTeX: \indent

(\indent will be inserted between the startand end command)

S: —

E: —

S: —

E: —

REFERENCE bookmark reference

• #NAME name of the bookmark that is being refer-enced

S: \ref#NAME

S: <reference name="#NAME" />

MATH_REFERENCE equation reference; the Word hard-codedreference (e.g. “(3)”) will be the content ofthis element

• #NAME name of the equation that is being refer-enced, it is generated for each numberedequation in the document (e.g. “eq3”).

S: (\ref#NAME)@WL-NL%

E: @WL-NL

S: <math-reference name="#NAME">

E: </math-reference>

NOTE_REFERENCE note reference; currently only endnotes aresupported

• #NAME name of the note (typically number) thatis being referenced

S: \citeref#NAME

S: <note-reference name="#NAME" />

Table B.2: Conversion mappings

29

BIBLIO_REFERENCE reference to a bibliography item (“cita-tion”); the Word hard-coded citation (e.g.“[Ka75]”) will be the content of this ele-ment

• #NAME name of the bibitem (e.g. “Ka75”)

S: \citeref#NAME@WL-NL%

E: @WL-NL

S: <biblio-reference name="#NAME">

E: </biblio-reference>

PAGE_REFERENCE page reference

• #NAME name of the bookmark that is being refer-enced

S: \pageref#NAME

BOOKMARK_LABEL bookmark

• #NAME name of the bookmark

S: \label#NAME

S: <bookmark name="#NAME" />

STYLE paragraph or character user style

• #NAME name of the style; all numbers in the nameare replaced with words (e.g. “1”→“One”)

S: \#NAME

E:

S: <style name="#NAME">

E: </style>

STYLE_DEFINITION container for a single user style definition;commands describing the style will be in-serted into

• #NAME name of the user style

S: \newcommand\#NAME[1]

E:

S: <style-definition name="#NAME">

E: </style-definition>

DOCUMENT_BODY document body

S: \begindocument@WL-NL

E: \enddocument

S: <body>

E: </body></document>

Table B.2: Conversion mappings

30

LIST_ENUMERATE enumerated list

S: \beginenumerate@WL-NL

E: \endenumerate@WL-NL@WL-NL

S: @WL-NL<list type="enumerate">

E: </list>@WL-NL

LIST_ITEMIZE itemized list

S: \beginitemize@WL-NL

E: \enditemize@WL-NL@WL-NL

S: @WL-NL<list type="itemize">

E: </list>@WL-NL

LIST_ITEM list item

S: @WL-TAB\item

E: —

S: <list-item>

E: </list-item>@WL-NL

PARAGRAPH common paragraph

S: —

E: @WL-NL@WL-NL

S: @WL-NL<para>

E: </para>@WL-NL

TABLE_PARAGRAPH paragraph in a table

S: @WL-NL

E: @WL-NL

S: @WL-NL<table-para>

E: </table-para>@WL-NL

LIST_PARAGRAPH paragraph in a list

S: —

E: @WL-NL

S: <list-para>

E: </list-para>

LINE_BREAK line break

S: @WL-NL\\@WL-NL

S: <linebreak />

TAB tabulator

S: \hspace15pt

S: <tab />

Table B.2: Conversion mappings

31

TABLE_CELL table cell

• #WIDTH cell width

S: &

E: —

S: <table-cell width="#WIDTH">

E: </table-cell>

TABLE_ROW table row

S: —

E: \\@WL-NL

S: <table-row>

E: </table-row>

TABLE table

• #TITLE title of the table

S: @WL-NL\vspace3pt \noindent@WL-NL\begintabular

E: \endtabular\\@WL-NL\vspace2pt@WL-NL

S: @WL-NL<table title="#TITLE">

E: </table>@WL-NL

TABLE_CONTAINER table container (used when the table has atitle)

S: @WL-NL\begintable[h]

E: \endtable@WL-NL

S: —

E: —

TABLE_TITLE table title inserted into the TABLE_

CONTAINER element

• #TITLE title

S: \caption#TITLE

S: —

TABLE_MULTIROW table cell with merged rows

• #ROWS number of merged rows in the cell

S: \multirow#ROWS*

E:

S: <table-multirow-cell multi="#ROWS" />

E: —

Table B.2: Conversion mappings

32

TABLE_CELL_COLOR command for the colored background of ta-ble cells; the #COLOR macro in the next el-ement (TABLE_MULTI_COLUMN) will be re-placed with this command

• #COLOR background color in HTML notation (e.g.FF0000)

S: >\columncolor[HTML]#COLOR

S: color="#COLOR"

TABLE_MULTICOLUMN table cell with merged columns

• #COLS number of merged columns

• #LEFT_BORDER “|” if the cell has a left border

• #RIGHT_BORDER “|” if the cell has a right border

• #COLOR see the previous element

• #ALIGN cell content alignment; l (left), r (right),c (center)

S: \multicolumn#COLS#LEFT_BORDER#COLOR#ALIGN#RIGHT_BORDER

E:

S: <table-cell multi="#COLS" left-border="#LEFT_BORDER"

right-border="#RIGHT_BORDER" align="#ALIGN" width="#WIDTH"

#COLOR>

E: </table-cell>

PAR_INDENT paragraph indentation

• #LEFT_INDENT left indentation (in points)

• #RIGHT_INDENT right indentation (in points)

• #FIRST_LINE_INDENT first line indentation (in points)

S: \beginindentation#LEFT_INDENTpt#RIGHT_INDENTpt

#FIRST_LINE_INDENTpt@WL-NL

E: @WL-NL\endindentation

S: @WL-NL<par-indent left="#LEFT_INDENT" right="#RIGHT_INDENT"

first-line="#FIRST_LINE_INDENT" />@WL-NL

E: —

MULTICOLUMN multicolumn section

• #COLS number of columns in the section

S: \beginmulticols#COLS

E: \endmulticols

S: <multicol count="#COLS">

E: </multicol>

Table B.2: Conversion mappings

33

COLOR_TEXT colored text

• #COLOR color in HTML notation (e.g. FF0000)

S: \textcolor[HTML]#COLOR

E:

S: <font-color color="#COLOR">

E: </font-color>

COLOR_BG text with colored background

• #COLOR color in HTML notation (e.g. FF0000)

S: \colorbox[HTML]#COLOR

E:

S: <font-background color="#COLOR">

E: </font-background>

ENDNOTES_SECTION container for endnotes, can be used for in-serting the bibliography

S: \beginthebibliography99@WL-NL

E: \endthebibliography@WL-NL

S: <bibliography>

E: </bibliography>

ENDNOTE endnote, this translation is used in theENDNOTES_SECTION context, suitable forinserting a single bibliography item

• #NUMBER number of the endnote

S: @WL-TAB\bibitem[#NUMBER]ref#NUMBER

E: @WL-NL

S: @WL-TAB<bib-item name="#NUMBER">

E: </bib-item>

ENDNOTE_REFERENCE endnote, this translation is used at theendnote’s insertion point

• #NUMBER number of the endnote

• #CONTENT endnote’s text content (can be used whentranslating endnotes to footnotes)

S: \citeref#NAME

S: <endnote-reference name="#NUMBER" />

Table B.2: Conversion mappings

34

COLOR_BG_AND_BORDER text with colored border and background

• #BORDER_COLOR border color, in HTML notation (e.g.FF0000)

• #COLOR text color, dtto

S: \fcolorbox[HTML]#BORDER_COLOR[HTML]#COLOR

E:

S: <box border-color="#BORDER_COLOR" background-color="#COLOR">

E: </box>

COLOR_BORDER colored border around text

• #BORDER_COLOR border color, in HTML notation (e.g.FF0000)

S: \fcolorbox[HTML]#BORDER_COLOR[HTML]FFFFFF

E:

S: <box border-color="#BORDER_COLOR">

E: </box>

BORDER black border around text

S: \fbox

E:

S: <box>

E: </box>

Table B.2: Conversion mappings

35

B.3 Special characters

The configuration of special characters is enclosed in the <specialChars> ele-ment. <latexChar> elements are used for defining characters that have a specialmeaning in the output format. They must be written in a correct order becauseone special character can be used for translating another special character whichis illustrated in the following example.

<latexChar char=’\’ convertTo=’\textbackslash ’ />

<latexChar char=’’ convertTo=’\’ />

All the other special and national characters are defined in <char> elements.The code attribute contains the Unicode [1] number of each character. Thedetails about the common context translation (convertTo attribute) and themath context translation (mathConvertTo attribute) can be found in section 1.6.4.A short example follows.

<char code="010C" convertTo="\vC" mathConvertTo="\checkC" />

<char code="010D" convertTo="\vc" mathConvertTo="\checkc" />

36

Bibliography

[1] Unicode Home Page, http://www.unicode.org/

[2] Extensible Markup Language (XML), http://www.w3.org/XML/

[3] XSL Transformations (XSLT), http://www.w3.org/TR/xslt

[4] XHTML 1.0 The Extensible HyperText Markup Language,http://www.w3.org/TR/xhtml1/

[5] Cascading Style Sheets, http://www.w3.org/Style/CSS/

37