9 odt2daisy: producing digital talking books with open-source software
DESCRIPTION
odt2daisy is an open-source add-on for OpenOffice.org that converts word-processing files to digital talking books in the DAISY format (ANSI/NISO Z39.86). Digital talking books make print material accessible to blind and other print-disabled persons. DAISY provides features that allow users to navigate by headings or page numbers, and to have a text version that is synchronised with the audio version.
odt2daisy produces both Full DAISY 3 (text synchronised with audio) and DAISY 3 XML (text without audio). For compatibility with older DAISY software, it also supports DAISY 2.02. odt2daisy also supports mathematical content (Mathematical Markup Language, MathML). odt2daisy works on Microsoft Windows, Mac OS X, Linux and Solaris.
For the production of audio, odt2daisy relies on the DAISY Pipeline Lite (open-source software developed by the DAISY Consortium), the LAME MP3 encoding technology, and the operating system’s text-to-speech (TTS) engine(s). The supported languages depend on the TTS engines available on the user’s system. On Unix-based systems, odt2daisy relies on the open-source eSpeak TTS engine, which supports 27 languages. odt2daisy thus enables the production of DAISY books with only open-source software: for example, Ubuntu Linux, OpenOffice.org, odt2daisy and eSpeak together constitute a completely open-source software stack.
The next step is the development of an accessibility evaluation and repair add-on for OpenOffice.org, to ensure that documents produced with OpenOffice.org are more accessible and serve as a better basis for export to other formats such as DAISY, PDF and HTML.
Vincent Spiewak started working on odt2daisy at the Université Pierre et Marie Curie (Paris, France) and continued the work at the Katholieke Universiteit Leuven (Leuven, Belgium) in the framework of ÆGIS, a research and development project co-financed by the European Commission’s 7th Framework Programme.
TRANSCRIPT
FOSS-AMA Satellite event
odt2daisy: digital talking books with open-source software
Christophe Strobbe
Katholieke Universiteit Leuven
Belgium
27-28 March 2010, Paphos, Cyprus
Motivation & Problem Area
Digital talking books
• For persons with “print disabilities”
• DAISY: ANSI/NISO Z39.86
• Production: typically
  – by specialised production centres
  – for blind & visually impaired users
  – i.e. not by users (in 2007)
Objectives
Enable end-users to produce DAISY
• In most European languages
• In a free and open-source office suite
• Support:
  – DAISY 3 (with or without audio)
  – DAISY 2.02 (for older players)
  – Multilingual content
  – Mathematical Markup Language
Methodology
• Build OpenOffice.org extension
  – Odt2dtbook by Vincent Spiewak available in 2008
  – Functionality available as extension and as reusable JAR (Java Archive)
  – Add:
    • DAISY 3 audio, DAISY 2.02
    • Comprehensive set of test documents (regression testing)
    • Support for multilingual content on Windows
odt2daisy Components (1)
• Java Open Document Library (JODL)
  – For ODT / XML preprocessing
• odt2daisy library
  – Converts ODT to DAISY XML (XSLT)
  – Validates output
  – Reusable Java library
  – Command-line interface
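To illustrate the XSLT step, here is a minimal sketch using the standard Java JAXP API (javax.xml.transform). It assumes a tiny, hypothetical stylesheet that maps ODF headings (text:h with text:outline-level) to DTBook h1…h6 and ODF paragraphs to p. This is not the odt2daisy stylesheet: it drops all other content and its output is not schema-valid DTBook. The input string stands in for the content.xml file stored inside an .odt ZIP package.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OdtToDtbookSketch {
    // Hypothetical, heavily simplified stylesheet (not the one shipped with odt2daisy).
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:office='urn:oasis:names:tc:opendocument:xmlns:office:1.0'"
        + " xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'"
        + " xmlns='http://www.daisy.org/z3986/2005/dtbook/'"
        + " exclude-result-prefixes='office text'>"
        + "<xsl:template match='/'>"
        + "<dtbook version='2005-3'><book><bodymatter>"
        + "<xsl:apply-templates select='//office:text/*'/>"
        + "</bodymatter></book></dtbook>"
        + "</xsl:template>"
        // text:outline-level holds the heading level, so level 2 becomes h2
        + "<xsl:template match='text:h'>"
        + "<xsl:element name='h{@text:outline-level}'><xsl:value-of select='.'/></xsl:element>"
        + "</xsl:template>"
        + "<xsl:template match='text:p'><p><xsl:value-of select='.'/></p></xsl:template>"
        // Everything else is dropped in this sketch
        + "<xsl:template match='*'/>"
        + "</xsl:stylesheet>";

    // Minimal stand-in for the content.xml file inside an .odt package.
    static final String SAMPLE_CONTENT =
        "<office:document-content"
        + " xmlns:office='urn:oasis:names:tc:opendocument:xmlns:office:1.0'"
        + " xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'>"
        + "<office:body><office:text>"
        + "<text:h text:outline-level='1'>Introduction</text:h>"
        + "<text:p>Digital talking books.</text:p>"
        + "<text:h text:outline-level='2'>Background</text:h>"
        + "</office:text></office:body></office:document-content>";

    /** Applies the sketch stylesheet to an ODF content.xml string using JAXP. */
    static String transform(String contentXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(contentXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE_CONTENT));
    }
}
```

Keeping the mapping in a stylesheet rather than in Java code means the document model can change without recompiling the converter; a real converter would also validate the result against the DTBook schema, which this sketch does not attempt.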
odt2daisy Components (2)
• odt2daisy extension
  – Wrapper for other components:
    – Uses OpenOffice.org UNO API
    – Uses odt2daisy library
    – Uses DAISY Pipeline Lite (speech synthesis)
  – Includes templates
• Templates with custom styles for DAISY production
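In Full DAISY, text and audio are linked through SMIL files, where each par element pairs a reference into the text file with an audio clip. The sketch below shows only the timing arithmetic, assuming that each text fragment has already been synthesised into consecutive segments of a single audio file with known durations in milliseconds. The fragment ids, file names and the H:MM:SS.mmm clock-value format are illustrative assumptions, not taken from odt2daisy or the DAISY Pipeline Lite. (DAISY 3 uses the SMIL 2 attributes clipBegin/clipEnd; DAISY 2.02 uses clip-begin/clip-end.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SmilSyncSketch {
    /** Formats milliseconds as a SMIL full clock value, e.g. 1500 -> "0:00:01.500". */
    static String clockValue(long ms) {
        long hours = ms / 3_600_000;
        long minutes = (ms / 60_000) % 60;
        long seconds = (ms / 1_000) % 60;
        long millis = ms % 1_000;
        return String.format(Locale.ROOT, "%d:%02d:%02d.%03d", hours, minutes, seconds, millis);
    }

    /**
     * Builds one par element per text fragment. The clips are assumed to be
     * consecutive segments of one audio file, so each clip begins where the
     * previous one ended.
     */
    static List<String> pars(List<String> textIds, long[] durationsMs,
                             String textFile, String audioFile) {
        if (textIds.size() != durationsMs.length) {
            throw new IllegalArgumentException("one duration is needed per text fragment");
        }
        List<String> result = new ArrayList<>();
        long begin = 0;
        for (int i = 0; i < durationsMs.length; i++) {
            long end = begin + durationsMs[i];
            result.add("<par><text src='" + textFile + "#" + textIds.get(i) + "'/>"
                + "<audio src='" + audioFile + "' clipBegin='" + clockValue(begin)
                + "' clipEnd='" + clockValue(end) + "'/></par>");
            begin = end; // the next clip starts where this one stops
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical fragment ids, file names and durations.
        pars(List.of("s1", "s2"), new long[] {1500, 2250}, "book.xml", "part1.mp3")
            .forEach(System.out::println);
    }
}
```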
Results (1)
• odt2daisy released November 2009
  – Tutorials in various formats (text, DAISY, video)
  – Developer documentation
  – Test files for regression testing
  – TTS in 27 languages where eSpeak is available (Linux, Windows)
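Where eSpeak is the TTS engine, a text fragment in a given language can be synthesised from the command line with eSpeak’s -v (voice) and -w (write a WAV file) options. The sketch below only builds such a command; it is not how the DAISY Pipeline Lite actually calls eSpeak. It assumes that voice names equal ISO 639-1 language codes, and the set of installed voices is a hypothetical subset (check the real list with `espeak --voices`).

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

class EspeakCommandSketch {
    // Hypothetical subset of installed eSpeak voices.
    static final Set<String> VOICES = Set.of("en", "fr", "de", "nl", "el", "es", "it");
    static final String FALLBACK_VOICE = "en";

    /** Maps a BCP 47 language tag such as "fr-BE" to an eSpeak voice name (assumed: the language code). */
    static String voiceFor(String languageTag) {
        String language = Locale.forLanguageTag(languageTag).getLanguage();
        return VOICES.contains(language) ? language : FALLBACK_VOICE;
    }

    /** -v selects the voice; -w writes speech to a WAV file instead of playing it. */
    static List<String> command(String text, String languageTag, String wavFile) {
        return List.of("espeak", "-v", voiceFor(languageTag), "-w", wavFile, text);
    }

    /** Prepares (but does not start) the process; call start() on the result to run eSpeak. */
    static ProcessBuilder processFor(String text, String languageTag, String wavFile) {
        return new ProcessBuilder(command(text, languageTag, wavFile)).inheritIO();
    }
}
```

The WAV output would then be encoded to MP3 (odt2daisy relies on LAME for that step). Falling back silently to an English voice keeps the sketch short; a production tool should rather report text in unsupported languages, since reading it with the wrong voice is hard to understand.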
Results (2)
• Support for ODT features
  – Heading, List, Table, Images, Captions, Notes, Foot/Rear notes, Math, TOC, Section, Frame, Bookmark, Metadata, ...
  – Page numbering (1, i, I, a, A; advanced)
  – Front / body / rear matter
  – “Complex text layout” and East-Asian languages not supported
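The basic page-numbering styles listed above (1, i, I, a, A) can be illustrated with a small converter. This is a minimal sketch, not odt2daisy code; it assumes that alphabetic numbering continues a…z, aa, ab, … (bijective base 26). Other overflow schemes (such as aa, bb, cc) exist and are not covered, nor is the “advanced” page numbering.

```java
import java.util.Locale;

class PageNumberSketch {
    private static final int[] VALUES = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS =
        {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

    /** Upper-case Roman numeral for 1..3999, built greedily from the largest symbol down. */
    static String roman(int n) {
        if (n < 1 || n > 3999) throw new IllegalArgumentException("out of range: " + n);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < VALUES.length; i++) {
            while (n >= VALUES[i]) {
                sb.append(SYMBOLS[i]);
                n -= VALUES[i];
            }
        }
        return sb.toString();
    }

    /** Bijective base-26 letters: 1 -> A, 26 -> Z, 27 -> AA, 28 -> AB. */
    static String alpha(int n) {
        if (n < 1) throw new IllegalArgumentException("out of range: " + n);
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            n--; // shift to 0-based so that Z is a digit rather than a carry
            sb.append((char) ('A' + n % 26));
            n /= 26;
        }
        return sb.reverse().toString();
    }

    /** Formats a page number in one of the styles "1", "i", "I", "a", "A". */
    static String format(int page, String numFormat) {
        switch (numFormat) {
            case "1": return Integer.toString(page);
            case "I": return roman(page);
            // Locale.ROOT avoids locale-specific case mapping (e.g. Turkish dotless i)
            case "i": return roman(page).toLowerCase(Locale.ROOT);
            case "A": return alpha(page);
            case "a": return alpha(page).toLowerCase(Locale.ROOT);
            default: throw new IllegalArgumentException("unsupported format: " + numFormat);
        }
    }
}
```

For example, front-matter pages in lower-case Roman numerals would use format(page, "i"), and body pages format(page, "1").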
Conclusion and Outlook
• Some ODT features are hard to parse (e.g. multilingual text; “Asian” languages)
• Licensing: MP3 vs Ogg Vorbis for TTS
• TTS quality: TTS as an internet service / in cloud computing?
• Accessibility checking before export
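One check that accessibility evaluation before export could include is detecting skipped heading levels (for example a level-3 heading directly after a level-1 heading), because DAISY navigation is built on the heading hierarchy. The sketch below is hypothetical and not part of odt2daisy or of the planned evaluation add-on; it assumes the outline levels of the headings have already been extracted in document order.

```java
import java.util.ArrayList;
import java.util.List;

class HeadingCheckSketch {
    /**
     * Returns one warning per heading that is more than one level deeper than the
     * heading before it. Moving back up any number of levels is allowed.
     */
    static List<String> skippedLevels(int[] outlineLevels) {
        List<String> warnings = new ArrayList<>();
        int previous = 0; // 0 means "no heading yet", so the first heading should be level 1
        for (int i = 0; i < outlineLevels.length; i++) {
            int level = outlineLevels[i];
            if (level > previous + 1) {
                String context = previous == 0
                    ? "first heading has level " + level
                    : "level " + level + " follows level " + previous;
                warnings.add("Heading " + (i + 1) + ": " + context);
            }
            previous = level;
        }
        return warnings;
    }

    public static void main(String[] args) {
        // Hypothetical outline: 1, 2, 4 (skips level 3), 2
        System.out.println(skippedLevels(new int[] {1, 2, 4, 2}));
    }
}
```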
Start Using It!
• http://odt2daisy.sf.net/
• Developer site: http://sourceforge.net/projects/odt2daisy/