Sustainability of the work and PANL10n network: Vision beyond 2010 Regional Conference on Localized ICT Development & Dissemination Across Asia PAN Localization project Vientiane, Laos from 11th - 16th January, 2009 Mr. B. Batpurev, CEO, InfoCon, Mongolia Prof. J. Purev, Head of CRLP, NUM Mr. A.Altangerel, ASR team coordinator at MUST Ms. Ch. Munkhzul, project manager, InfoCon


Page 1: Sustainability of the work and PANL10n network: Vision beyond 2010 Regional Conference on Localized ICT Development & Dissemination Across Asia PAN Localization

Sustainability of the work and PANL10n network: Vision beyond 2010

Regional Conference on Localized ICT Development & Dissemination Across Asia PAN Localization project

Vientiane, Laos from 11th - 16th January, 2009

Mr. B. Batpurev, CEO, InfoCon, Mongolia
Prof. J. Purev, Head of CRLP, NUM
Mr. A. Altangerel, ASR team coordinator at MUST
Ms. Ch. Munkhzul, project manager, InfoCon

Page 2

Outline

• Sustainability of the work
• Vision beyond 2010

Page 3

Sustainability of the work

• Policy level
– PANL10n will develop a ‘Policy recommendation paper’ in Mongolia
– Inclusion of local language computing studies in university curricula
– PANL10n project results are widely disseminated to policy makers
– Necessary standards for local language computing are being approved
• Human resources
– PANL10n project has built the capacity of researchers in many ways
– More researchers are interested: e.g. more students are writing B.Sc. theses on local language computing issues
– Master's students are now researching spell checking, machine translation, ASR and TTS

Page 4

Sustainability of the work (cont.)

• Infrastructure development
– NUM has established the Center for Research in Language Processing (CRLP)
– Tools: spell checker, glossary of IT terms, corpus and ASR trained toolkits are prepared for future R&D
– Information infrastructure is in place and foundational research has been done
• Initiatives, projects and funding
– Government officials favor the PANL10n project and are supportive of our activities; E-Mongolia implementation should fund NLP development
– Mongolian researchers are more connected to other researchers in the region for collaborative actions
– Looking ahead to PANL10n phase III
• PANL10n phase II
– Has laid the foundation of local language computing
– Needs to maintain the progress and develop it further by exploring various resources locally and internationally

Page 5

Future work

• IDN
– Approaching ICANN to implement .MN in the local language
• OCR
– Based on a common framework, by providing segmentation of Mongolian Cyrillic & studying Russian OCR
• ASR
– Increase the number of speakers for better recognition
– Continuous speech recognition (improve…)
– Separate out the female and male voices for better results
• TTS
– Improving the quality of the diphone database
– Adding a male voice
– Experimenting with an HMM model
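As a rough illustration of the IDN item above: a domain name in a local script is carried in the DNS as an ASCII-compatible "xn--" form. The sketch below uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the domain `жишээ.мон` is a made-up example and is not taken from the slides.

```python
# Minimal sketch: convert a Cyrillic domain name to its ASCII-compatible
# (Punycode-based) form and back, using Python's built-in "idna" codec.
# Assumption: "жишээ.мон" is a hypothetical name used only for illustration.

def to_ascii(domain: str) -> str:
    """Encode each label of a Unicode domain name into its 'xn--' form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ascii_domain: str) -> str:
    """Decode an ASCII-compatible domain name back to Unicode."""
    return ascii_domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    name = "жишээ.мон"          # hypothetical local-language domain
    encoded = to_ascii(name)
    print(encoded)              # every label starts with "xn--"
    print(to_unicode(encoded))  # round-trips to the original name
```

Plain ASCII names such as `example.mn` pass through unchanged, so the same helper can sit in front of any resolver or registry tool.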

Page 6

Future work (cont.)

• Machine translation
– Parallel corpus -> machine translation
– Grammar checker, morphological analyzer, sentence parser
– National corpus (expand current one)
– Translation memory (.tmx files used for sharing translation memories)
– Statistical machine translation (GIZA, MOSES: tutorials e.g. Kevin Knight)
• Integration of systems for application
– Integrate TTS, ASR and spell checker into widely used software such as OpenOffice.org, Firefox, Thunderbird, SeaMonkey, etc.
– Develop online TTS, ASR and spell checker systems
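To make the translation memory item above more concrete, here is a minimal sketch of reading source/target pairs from a TMX document with Python's standard library. The sample entries, the `read_tmx` helper and the `en`/`mn` language codes are illustrative assumptions, not material from the project.

```python
# Minimal sketch: extract (source, target) segment pairs from a TMX document.
# Assumptions: SAMPLE_TMX and read_tmx are hypothetical, for illustration only.
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>File</seg></tuv>
      <tuv xml:lang="mn"><seg>Файл</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Save</seg></tuv>
      <tuv xml:lang="mn"><seg>Хадгалах</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(text: str, src: str = "en", tgt: str = "mn") -> list[tuple[str, str]]:
    """Return (source, target) pairs for translation units that have both languages."""
    root = ET.fromstring(text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                segments[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src in segments and tgt in segments:
            pairs.append((segments[src], segments[tgt]))
    return pairs


if __name__ == "__main__":
    for source, target in read_tmx(SAMPLE_TMX):
        print(f"{source}\t{target}")
```

Pairs extracted this way could also be written out as two line-aligned text files, which is the usual input shape for statistical alignment and translation toolkits.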

Page 7

Future work (cont.)

• Training and content
– Training rural people on localized software (SeaMonkey, Pidgin)
– Comparing the effectiveness of localized software vs. the English version
– Support development of local language content
• Next generation of researchers
– University curriculum
– Introduce NLP courses in graduate training
• Publication (immediate)
– Publication of IT terms glossary
– Publication of “Mongolian language processing” book
• Partnership
– Integrate Mongolian researchers/institutions with others for sharing and learning
– Foster PPP for development of local language computing in the country
• Policy recommendation from PANL10n project (immediate)

Page 8

Thank you

PANL10n Mongolia team:
Mr. B. Batpurev, CEO, InfoCon, Mongolia
Prof. J. Purev, Head of CRLP, NUM
Mr. A. Altangerel, ASR team coordinator, MUST
Ms. Ch. Munkhzul, project manager, InfoCon