Spl chkng cmc txt (Spell Checking CMC text)
Christopher Johnson
Introduction
What is Computer-Mediated Communication (CMC)?
◦ Short Message Service (SMS)
◦ Blogs and microblogs (Twitter)
◦ E-mail
◦ Instant messages
Language observed in such communication:
◦ Lo (Microsoft Messenger)
◦ Happy bday hpe u hv a gd day x (SMS)
◦ Awe! Ur so welcome! Sorry I was so sleepy! Lol (Twitter)
What is the problem?
Most people use some form of CMC
◦ Children
◦ Adults
People can hide behind any persona they create for themselves
◦ For example, paedophiles lure children by pretending to be other children
How can we solve it?
Have a person read every message? No
◦ Would this even suffice?
Process messages automatically? Yes
◦ At least, this is the most practical approach
How can we do that?
We need an understanding of the messages
◦ SMS
◦ Blogs
◦ E-mails
◦ And others
We know that abbreviations are used
◦ But how can we expand these abbreviations back to standard text?
◦ What about misspellings?
How do we get a large real-world corpus to train and test on?
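One naive way to approach the abbreviation question is a lookup table applied word by word. The following is a minimal Python sketch under stated assumptions. The `ABBREVIATIONS` table and the `expand` function are hypothetical. They were built by hand from the example messages in the introduction and are not part of VARD or any existing tool.

```python
import re

# Hypothetical abbreviation table, hand-built from the example messages above.
# A real system would need a far larger, corpus-derived table.
ABBREVIATIONS = {
    "bday": "birthday",
    "hpe": "hope",
    "u": "you",
    "hv": "have",
    "gd": "good",
    "ur": "you're",  # ambiguous: could also mean "your"
    "lol": "laugh out loud",
}


def expand(message: str) -> str:
    """Replace known abbreviations word by word; leave everything else untouched."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        expansion = ABBREVIATIONS.get(word.lower())
        if expansion is None:
            return word  # unknown token: keep as-is
        # Preserve an initial capital, e.g. "Ur" -> "You're".
        if word[0].isupper():
            return expansion[0].upper() + expansion[1:]
        return expansion

    # Only runs of letters/apostrophes are looked up, so punctuation survives.
    return re.sub(r"[A-Za-z']+", replace, message)


print(expand("Happy bday hpe u hv a gd day x"))
```

On the SMS example this prints `Happy birthday hope you have a good day x`. A plain lookup cannot resolve ambiguous forms such as "ur" ("your" or "you're"). It also cannot handle misspellings that are missing from the table, which is where context and phonetic matching would be needed.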
What tools already exist?
VARD
NLP techniques
◦ N-grams (this project will use bigrams and trigrams)
Phonetic algorithms
◦ Soundex
◦ Metaphone
These techniques are commonly used in spell checkers
But how well do they apply to CMC?
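To make the question "how well do these apply to CMC?" concrete, here is a Python sketch of two of the listed techniques. It combines American Soundex with a character-bigram Dice score for ranking spelling candidates. The Soundex code follows the standard rules, as in the National Archives reference:

- keep the first letter;
- code consonants as digits;
- collapse adjacent duplicates;
- let H and W not separate same-coded consonants;
- pad the result to four characters.

It is an assumption that the project's bigrams are character bigrams; they could equally be word bigrams used for context. The names `soundex`, `bigrams`, `dice` and `suggest` and the tiny lexicon are hypothetical, and Metaphone is not shown.

```python
# American Soundex digit groups; vowels, Y, H and W have no digit.
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for letter in letters
}


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. "Robert" -> "R163"."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")  # first letter's digit still blocks a repeat
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code:
            if code != prev:
                result += code
                if len(result) == 4:
                    break
            prev = code
        elif ch not in "hw":
            # A vowel (or Y) separates consonants, so the same digit may appear
            # again; H and W do not separate them.
            prev = ""
    return (result + "000")[:4]


def bigrams(word: str) -> set[str]:
    """Character bigrams, with '#' marking the word boundaries."""
    padded = f"#{word.lower()}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient of two words' bigram sets (1.0 = identical sets)."""
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def suggest(token: str, lexicon: list[str]) -> list[tuple[str, float]]:
    """Keep words sharing the token's Soundex code, best bigram match first."""
    code = soundex(token)
    scored = [(w, dice(token, w)) for w in lexicon if soundex(w) == code]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


print(soundex("Robert"), soundex("Ashcraft"))  # R163 A261
print(suggest("gd", ["good", "god", "gold", "glad", "day"]))
```

With this lexicon, "gd" shares the code G300 with both "god" and "good". The bigram score ranks "god" (about 0.57) above "good" (0.5), so choosing "good" in "a gd day" needs context. Soundex also gives "bday" (B300) and "birthday" (B630) different codes, so filtering by code would miss that expansion entirely.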
Proposed Plan
Research current techniques that could be applicable
Create a large corpus of CMC text
Improve techniques for very similar languages
◦ (standard English and English CMC)
Create a system that can distinguish between CMC text and unabridged text
◦ Test the system's success rate
Convert CMC to unabridged text
◦ (Ambitious, so only if time permits)
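The plan includes a system that distinguishes CMC text from unabridged text. One simple baseline (an assumption, not the project's chosen method) flags a message as CMC when too many of its words are missing from a standard-English dictionary. The following sketch uses a tiny hypothetical `LEXICON` and an arbitrary threshold of 0.2. It measures the success rate as plain accuracy over labelled examples.

```python
import re

# Hypothetical stand-in for a full standard-English dictionary.
LEXICON = {
    "happy", "birthday", "hope", "you", "have", "a", "good", "day",
    "so", "welcome", "sorry", "i", "was", "sleepy", "thank", "for",
    "the", "message", "see", "tomorrow", "your", "are",
}


def oov_rate(text: str) -> float:
    """Fraction of word tokens not found in LEXICON (0.0 for empty text)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    unknown = sum(1 for t in tokens if t not in LEXICON)
    return unknown / len(tokens)


def is_cmc(text: str, threshold: float = 0.2) -> bool:
    """Classify as CMC when the out-of-vocabulary rate exceeds the threshold."""
    return oov_rate(text) > threshold


def accuracy(samples: list[tuple[str, bool]]) -> float:
    """Success rate: share of labelled samples classified correctly."""
    correct = sum(1 for text, label in samples if is_cmc(text) == label)
    return correct / len(samples)


samples = [
    ("Happy bday hpe u hv a gd day x", True),
    ("Awe! Ur so welcome! Sorry I was so sleepy! Lol", True),
    ("Happy birthday, I hope you have a good day", False),
    ("Thank you for the message, see you tomorrow", False),
]
print(accuracy(samples))
```

On these four hand-picked messages it prints `1.0`. That says nothing about real performance, because the lexicon was built from the very sentences being classified. A meaningful success rate would need the large CMC corpus from the plan, with held-out test data.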
References
The Real World – National Education Association Health Information Network
◦ http://bnetsavvy.org/wp/a-teen-talks-about-texting-and-what-parentseducators-need-to-know-about-it/
About VARD 2 – Baron, Alistair
◦ http://www.comp.lancs.ac.uk/~barona/vard2/
Lawrence Philips' Metaphone Algorithm – Atkinson, Kevin
◦ http://aspell.net/metaphone/
The Soundex Indexing System – The National Archives
◦ http://www.archives.gov/publications/general-info-leaflets/55.html