Improving Writing Aids, the Community Way
OOoCon Budapest
2 September 2010
Andrea Pescetti, Italian N-L Project Lead
Getting Writing Aids Started
Writing Aids: Overview
Spell Checker
Thesaurus
Hyphenation Patterns
Grammar Checker
Spell Checker
The spell checking engine Hunspell is integrated in all versions of OOo.
Hunspell dictionaries (suitable for OOo, Thunderbird and more) are available for about 100 languages.
http://hunspell.sf.net
Thesaurus
Engine: integrated in all recent versions of OOo.
OOo-specific tool and format; you will usually have to start from scratch.
Documentation: OOo lingucomponent project, lingucomponent.openoffice.org
Hyphenation Patterns
Engine: Hyphen, included in the Hunspell project; integrated in all versions of OOo.
Format: tool-specific, but conversion from TeX patterns is available (with caveats), so start from TeX patterns!
TeX Archive: http://ctan.org/
Grammar Checker
Not integrated in OOo as a user-visible tool as of 3.2.1, but API available.
Several options available as extensions: LanguageTool, LightProof, CoGrOO and more.
Rules for your language: tool-dependent format.
Licensing Issues
Mere Aggregation
Wide spectrum of licenses for writing aids; most are incompatible with the OOo license, LGPLv3.
But they are pure data files.
FSF: this is mere aggregation, so licenses do not need to be compatible (see issue 65039).
Extensions OXT
Data for writing aids (except grammar) have been packaged as extensions since OOo 3.x.
This reinforces the mere aggregation concept.
Data files within the extension may have different licenses: still mere aggregation.
Choose your license
LGPLv3 (latest) is compatible with the OOo codebase and ensures that any distributed modified versions remain free.
GPLv3: in OOo, no significant differences (mere aggregation).
AGPLv3: usage on a network (WWW) counts as distribution.
Meet Sun/Oracle legal
Licenses aside, copyright holders must sign the OCA for their work to appear in the OOo code repository.
Usual choice: external contribution, no OCA required.
Sun legal was very slow, but Oracle legal froze the process!
Distributed Management
Use a repository
Make writing aids available to all contributors in an online repository.
Use version control.
Expose an easy, web-based, change tracking interface to show differences between revisions.
Spell Checker
One file in text format.
Human readable, except rules.
Good for collaborative editing.
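As a rough illustration of why the word list is easy to edit by hand, here is a minimal sketch that reads the text of a Hunspell `.dic` file. It assumes the common layout: a first line with the entry count, then one `word/FLAGS` entry per line, where the flags point to the affix rules (the part that is not human readable). The function name `parse_dic` and the sample words are made up for this example; morphological fields and escaped slashes are not handled.

```python
def parse_dic(text):
    """Parse the text of a Hunspell .dic file into {word: flags}.

    Simplified sketch: assumes one "word" or "word/FLAGS" entry per line.
    """
    lines = text.splitlines()
    entries = {}
    # The first line of a .dic file holds the (approximate) entry count.
    for line in lines[1:]:
        line = line.strip()
        if not line:
            continue
        # Everything after the first slash is the list of affix flags.
        word, _, flags = line.partition("/")
        entries[word] = flags
    return entries


# Made-up sample content, not taken from a real dictionary.
sample = "3\nhello\nwork/DGS\ntable/S\n"
entries = parse_dic(sample)
print(entries)
```

Since each entry sits on its own line, a version control diff shows exactly which words a contributor added or removed.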
Thesaurus
One file in text format and an automatically generated index.
Human readable.
Good for collaborative editing.
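The index can be regenerated by a script after every edit, so contributors only ever touch the text file. The sketch below assumes the usual MyThes layout, which should be checked against the lingucomponent documentation: the data file starts with an encoding line, each entry is a `word|N` line followed by N meaning lines, and the index lists the encoding, the number of entries and one `word|byte offset` line per entry, sorted by word. The function name `build_index` and the sample entries are hypothetical.

```python
def build_index(dat_bytes):
    """Build the index text for a thesaurus data file given as bytes."""
    lines = dat_bytes.splitlines(keepends=True)
    encoding = lines[0].strip().decode("ascii")
    offset = len(lines[0])  # entries start right after the encoding line
    entries = []
    i = 1
    while i < len(lines):
        head = lines[i].decode(encoding).strip()
        if not head:  # tolerate blank lines
            offset += len(lines[i])
            i += 1
            continue
        word, _, count = head.partition("|")
        entries.append((word, offset))
        # Skip the entry line together with its meaning lines.
        block = lines[i : i + 1 + int(count)]
        offset += sum(len(line) for line in block)
        i += len(block)
    entries.sort()
    out = [encoding, str(len(entries))]
    out += [f"{word}|{pos}" for word, pos in entries]
    return "\n".join(out) + "\n"


# Made-up sample data file with two entries.
sample = (
    b"UTF-8\n"
    b"simple|1\n"
    b"(adj)|elementary|uncomplicated\n"
    b"easy|1\n"
    b"(adj)|simple|effortless\n"
)
index_text = build_index(sample)
print(index_text)
```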
Hyphenation
One text file.
Format: as arcane as it can get!
Changes very rarely.
Fix bugs upstream, in TeX.
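To make the arcane format a little less mysterious, here is a minimal sketch of how TeX-style patterns are applied to a word (Liang's algorithm): digits inside a pattern are weights for the gaps between letters, the highest weight wins at each gap, and an odd result allows a break. The sketch works on TeX-style patterns only and does not read the Hyphen file format; the function names, the `left_min`/`right_min` parameters and the tiny pattern list are assumptions chosen for the illustration, not a real pattern set.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and gap weights."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)  # weight of the gap before letters[pos]
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Return the word split at the break points allowed by the patterns."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[i]: gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights:
                for k, weight in enumerate(weights):
                    scores[start + k] = max(scores[start + k], weight)
    parts, last = [], 0
    for j in range(left_min, len(word) - right_min + 1):
        if scores[j + 1] % 2 == 1:  # odd weight: a break is allowed
            parts.append(word[last:j])
            last = j
    parts.append(word[last:])
    return parts


# Toy pattern list for illustration only.
toy_patterns = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
print("-".join(hyphenate("hyphenation", toy_patterns)))
```

A single pattern can affect many words, which is one reason fixes are better made upstream, where they can be tested against the full word list.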
Grammar checker
LanguageTool: rules in XML.
Basic XML knowledge needed.
Fix upstream, in LanguageTool.
Collaboration possible.
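Basic XML knowledge is enough to inspect a rule file with a few lines of script. The sketch below lists rule ids and the tokens of each pattern; the rule shown is a hypothetical, heavily simplified example in the general shape of a LanguageTool rule file (`rule`, `pattern`, `token`, `message` elements), not a rule copied from the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified rule file for illustration.
SAMPLE_RULES = """
<rules lang="en">
  <category name="Example category">
    <rule id="EXAMPLE_COULD_OF" name="could of (could have)">
      <pattern>
        <token>could</token>
        <token>of</token>
      </pattern>
      <message>Did you mean "could have"?</message>
    </rule>
  </category>
</rules>
"""


def list_rules(xml_text):
    """Return (rule id, [token texts]) for every rule in the XML text."""
    root = ET.fromstring(xml_text)
    rules = []
    for rule in root.iter("rule"):
        tokens = [token.text or "" for token in rule.iter("token")]
        rules.append((rule.get("id"), tokens))
    return rules


for rule_id, tokens in list_rules(SAMPLE_RULES):
    print(rule_id, tokens)
```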
Packaging
Generation of the OXT extension can be scripted.
It is even possible to automatically generate an updated OXT for every committed change of a file.
Keep generated OXT files in the same repository.
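An OXT file is a ZIP archive, so scripting its generation needs little more than a standard library. A minimal sketch follows, assuming a dictionary extension that bundles a manifest, a description, a configuration file and the dictionary data; the file names and the placeholder contents below are illustrative assumptions, and a real package needs valid XML in those files. The function name `build_oxt` is made up.

```python
import io
import zipfile


def build_oxt(files):
    """Pack {archive path: bytes} into an OXT (ZIP) archive, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as oxt:
        for name in sorted(files):  # stable order gives reproducible listings
            oxt.writestr(name, files[name])
    return buffer.getvalue()


# Placeholder contents; a real extension needs valid XML and real data here.
package = build_oxt({
    "META-INF/manifest.xml": b"<!-- manifest placeholder -->",
    "description.xml": b"<!-- description placeholder -->",
    "dictionaries.xcu": b"<!-- configuration placeholder -->",
    "it_IT.dic": b"0\n",
    "it_IT.aff": b"SET UTF-8\n",
})
print(len(package), "bytes")
```

Calling such a function from a commit hook or a scheduled job is one way to get an updated OXT for every committed change.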
Team Structure
All components are independent; collaboration is possible in every component.
A packaging manager (or a script!) to generate extensions.
A release manager to make stable versions available in OOo.
Community Involvement
The Native-Lang community is the best group of people to improve writing aids.
Motivated users, who will benefit directly from their work.
Main issue: providing tools that make it possible to manage contributions efficiently.
Web-based interface
Allow quick and easy reporting of missing, erroneous and wrongly hyphenated words.
Easy to set up: a basic web form, or embed it in, e.g., a Drupal site.
Notifications: e-mail to maintainers group, suggestions stored in online database.
Expose web services
Allow direct usage of the web application, with no need to submit a form.
Parameters can be embedded in a URL, so users don't have to explicitly open the site.
Suitable for inclusion in applications or macros.
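As a sketch of what "parameters embedded in a URL" could look like, the snippet below builds a report URL that an application or macro could open directly. The endpoint `dictionary.example.org` and the parameter names `word`, `action` and `lang` are hypothetical; a real service would define its own.

```python
from urllib.parse import urlencode

# Hypothetical endpoint of the suggestion service.
BASE_URL = "https://dictionary.example.org/suggest"


def build_report_url(word, action, lang):
    """Return a URL that reports `word` with the given action for a language."""
    # urlencode percent-encodes non-ASCII characters as UTF-8.
    query = urlencode({"word": word, "action": action, "lang": lang})
    return f"{BASE_URL}?{query}"


url = build_report_url("perché", "add", "it")
print(url)
```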
Web services in OXT
Ideally, embed a macro in the OXT dictionary package distributed with OOo.
Right click on a word to show:
Nominate for inclusion in dictionary.
Nominate for removal from dictionary.
Report wrong hyphenation.
Thesaurus maintenance
Vithesaurus: Existing online tool for collaboratively creating and maintaining a thesaurus.
In use (German) at http://www.openthesaurus.de
Free software; can be installed on your own server.
http://vithesaurus.sf.net
Handling Duplicates
In a large community, the same suggestion is usually reported more than once by different users.
This is a plus: the web application can deal with duplicates and rank suggestions by frequency, which makes processing more efficient.
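Ranking by frequency can be sketched in a few lines. The example assumes each report is an `(action, word)` pair; the function name and the sample reports are made up for the illustration.

```python
from collections import Counter


def rank_suggestions(reports):
    """Merge duplicate reports and return them ordered by frequency."""
    # Case is kept as reported, since it matters in a dictionary.
    counts = Counter((action, word.strip()) for action, word in reports)
    return counts.most_common()


# Made-up reports from different users.
reports = [
    ("add", "webinar"),
    ("add", "blog"),
    ("add", "webinar "),
    ("remove", "teh"),
    ("add", "webinar"),
    ("add", "blog"),
]
for (action, word), count in rank_suggestions(reports):
    print(count, action, word)
```

Maintainers can then start from the top of the list, where the most requested changes are.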
Handling Wrong Reports
Most annoying use case: users make wrong suggestions and keep repeating them!
The web application helps by blacklisting with a stated reason: repeated wrong submissions are caught, and a message explaining the rejection can be shown to the user.
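A minimal sketch of blacklisting with a reason, assuming rejected words are stored together with the explanation written by a maintainer. The data structures, the function name and the sample entry are hypothetical.

```python
# Hypothetical blacklist: rejected word -> reason given by a maintainer.
REJECTED = {"alot": "the correct spelling is 'a lot'."}


def handle_submission(word, rejected, pending):
    """Queue a new suggestion, or explain why it was already rejected."""
    key = word.strip()
    if key in rejected:
        # Repeated wrong submission: do not queue it again, show the reason.
        return f"'{key}' was already reviewed and rejected: {rejected[key]}"
    pending.append(key)
    return f"Thank you, '{key}' was queued for review."


pending = []
print(handle_submission("alot", REJECTED, pending))
print(handle_submission("webinar", REJECTED, pending))
```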
Thanks for your attention
Andrea Pescetti, Italian N-L Project Lead, PLIO Board Member
Image credits: Flickr, PLIO Archives.