Natural Language Processing in iOS / OS X
Tech Talk NLP Tools in iOS/OSX
Todd Kramer
NLP Tools in iOS/OSX: Topics
• CFStringTransform: transliteration, normalization
• CFStringTokenizer: string tokenization, language identification
• UITextChecker: spell check
• NSLinguisticTagger: parts of speech tagging, named entity recognition, lemmatization, language/script identification
• NSDataDetector: data detection
NLP Tools in iOS/OSX: CFStringTransform
The CFStringTransform Function
Transliterate Thai to Latin
Original: สวัสดี; Transformed: sw̄ạsd̄ī
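The code slide for this output did not survive in the transcript. A minimal sketch of how such a result can be produced, assuming the built-in kCFStringTransformToLatin identifier was used; the helper name applyTransform is hypothetical and not from the talk:

```swift
import Foundation

// Hypothetical helper (not from the talk): applies a transform to a copy of the input.
func applyTransform(_ input: String, _ transformID: CFString, reverse: Bool = false) -> String? {
    let buffer = NSMutableString(string: input)
    // Passing nil for the range transforms the whole string in place.
    // The function returns false if the transform could not be applied.
    guard CFStringTransform(buffer, nil, transformID, reverse) else { return nil }
    return buffer as String
}

let thai = "สวัสดี"
let latin = applyTransform(thai, kCFStringTransformToLatin)
print("Original: \(thai); Transformed: \(latin ?? "<transform failed>")")
```

CFStringTransform mutates its argument, which is why the sketch copies the input into an NSMutableString first.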
Transliterate Latin to Gujarati
Original: Gujarātī; Transformed: ગuજરાતી
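There is no dedicated Core Foundation constant for Gujarati, so a sketch of this step has to assume that an ICU transform ID ("Latin-Gujarati") was passed as a plain string. That is an assumption about the talk's code, and the exact output may vary by OS version:

```swift
import Foundation

let gujaratiBuffer = NSMutableString(string: "Gujarātī")
// Assumption: "Latin-Gujarati" is an ICU transform ID, passed here in place of a kCFStringTransform... constant.
let didTransform = CFStringTransform(gujaratiBuffer, nil, "Latin-Gujarati" as CFString, false)
print(didTransform ? "Transformed: \(gujaratiBuffer)" : "Transform not available")
```

Passing true as the last argument requests the reverse direction (Gujarati to Latin) for the same identifier.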
Remove Diacritics and Accents
Original: sw̄ạsd̄ī; Transformed: swasdi
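A hedged sketch of the normalization step, assuming kCFStringTransformStripDiacritics was the identifier used (kCFStringTransformStripCombiningMarks is a closely related alternative); the helper name is hypothetical:

```swift
import Foundation

// Hypothetical helper: returns a copy of `input` with diacritics and accents removed.
func strippingDiacritics(_ input: String) -> String {
    let buffer = NSMutableString(string: input)
    // Assumption: the slide used kCFStringTransformStripDiacritics.
    CFStringTransform(buffer, nil, kCFStringTransformStripDiacritics, false)
    return buffer as String
}

let romanized = "sw̄ạsd̄ī"
print("Original: \(romanized); Transformed: \(strippingDiacritics(romanized))")
```

Chaining this after a to-Latin transform is a common way to get ASCII-friendly search keys.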
Describe Unicode Characters
Original: 👍; Transformed: \N{THUMBS UP SIGN}
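The output above matches what kCFStringTransformToUnicodeName produces. A small sketch (the helper name unicodeNames is hypothetical):

```swift
import Foundation

// Hypothetical helper: replaces every character with its \N{UNICODE NAME} form.
func unicodeNames(_ input: String) -> String {
    let buffer = NSMutableString(string: input)
    CFStringTransform(buffer, nil, kCFStringTransformToUnicodeName, false)
    return buffer as String
}

print("Original: 👍; Transformed: \(unicodeNames("👍"))")
```

Note that the transform applies to every character in the string, including plain ASCII letters, so it is usually applied to short inputs.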
CFStringTokenizer
NLP Tools in iOS/OSX: CFStringTokenizer
Tokenize Into Words: Simplified Chinese
Tokens: [人, 人生, 而, 自由, 在, 尊严, 和, 权利, 上, 一律, 平等, 他们, 赋有, 理性, 和, 良心, 并, 应, 以, 兄弟, 关系, 的, 精神, 互相, 对待]
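A sketch of word tokenization with CFStringTokenizer. The input sentence is reconstructed from the tokens on the slide (it appears to be Article 1 of the Universal Declaration of Human Rights), and the helper name words(in:localeID:) is hypothetical:

```swift
import Foundation

// Hypothetical helper: splits `text` into word tokens for the given locale.
func words(in text: String, localeID: String) -> [String] {
    let nsText = text as NSString
    let locale = Locale(identifier: localeID) as CFLocale
    let tokenizer = CFStringTokenizerCreate(
        kCFAllocatorDefault, text as CFString,
        CFRangeMake(0, nsText.length), kCFStringTokenizerUnitWord, locale)
    var tokens: [String] = []
    // AdvanceToNextToken returns an empty token type once the input is exhausted.
    while CFStringTokenizerAdvanceToNextToken(tokenizer) != [] {
        let range = CFStringTokenizerGetCurrentTokenRange(tokenizer)
        tokens.append(nsText.substring(with: NSRange(location: range.location, length: range.length)))
    }
    return tokens
}

// Assumption: input reconstructed from the slide's tokens.
let article = "人人生而自由,在尊严和权利上一律平等。他们赋有理性和良心,并应以兄弟关系的精神互相对待。"
print("Tokens: \(words(in: article, localeID: "zh_Hans"))")
```

The locale matters for languages such as Chinese that do not separate words with spaces; the exact segmentation can differ between OS versions.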
Transliterate Tokens: Simplified Chinese
Tokens: [rén, rénshēng, ér, zìyóu, zài, zūnyán, hé, quánlì, shàng, yīlǜ, píngděng, tāmén, fùyǒu, lǐxìng, hé, liángxīn, bìng, yìng, yǐ, xiōngdì, guānxī, de, jīngshén, hùxiāng, duìdài]
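One way to get romanized tokens is to ask the tokenizer for the Latin transcription attribute of each token. This sketch assumes that approach; the talk may instead have run CFStringTransform over each token. The helper name is hypothetical:

```swift
import Foundation

// Hypothetical helper: returns the Latin transcription of each word token.
func latinTokens(in text: String, localeID: String) -> [String] {
    let nsText = text as NSString
    let locale = Locale(identifier: localeID) as CFLocale
    let tokenizer = CFStringTokenizerCreate(
        kCFAllocatorDefault, text as CFString,
        CFRangeMake(0, nsText.length), kCFStringTokenizerUnitWord, locale)
    var result: [String] = []
    while CFStringTokenizerAdvanceToNextToken(tokenizer) != [] {
        let range = CFStringTokenizerGetCurrentTokenRange(tokenizer)
        let token = nsText.substring(with: NSRange(location: range.location, length: range.length))
        // The attribute may be missing for some tokens, so fall back to the raw token.
        let latin = CFStringTokenizerCopyCurrentTokenAttribute(
            tokenizer, kCFStringTokenizerAttributeLatinTranscription) as? String
        result.append(latin ?? token)
    }
    return result
}

print("Tokens: \(latinTokens(in: "在尊严和权利上一律平等", localeID: "zh_Hans"))")
```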
Language Identification: Icelandic
Language Code: is
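A sketch of language identification with CFStringTokenizerCopyBestStringLanguage. The transcript does not include the input text, so the Icelandic sample below is an assumption (the first article of the Universal Declaration of Human Rights); the helper name is hypothetical:

```swift
import Foundation

// Hypothetical helper: returns the best-guess language code, or nil if none is found.
func bestLanguage(for text: String) -> String? {
    let range = CFRangeMake(0, (text as NSString).length)
    let guess: CFString? = CFStringTokenizerCopyBestStringLanguage(text as CFString, range)
    return guess.map { $0 as String }
}

// Assumption: sample text chosen for illustration, not taken from the talk.
let sample = "Hver maður er borinn frjáls og jafn öðrum að virðingu og réttindum. "
    + "Menn eru gæddir vitsmunum og samvizku, og ber þeim að breyta bróðurlega hverjum við annan."
print("Language Code: \(bestLanguage(for: sample) ?? "unknown")")
```

Identification is statistical, so longer inputs tend to give more reliable guesses than single words.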
UITextChecker
NLP Tools in iOS/OSX: UITextChecker
Spell Check
Misspelled Range: (7,4); Guesses: Optional([ice, Bice, bide, nice, vice, bike, bile, bite, bace, bbce, bcce, bdce, bece, bfce, dice, lice, mice, pice, rice, brice, bicep])
Misspelled Range: (12,3); Guesses: Optional([ay, cay, day, say])
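A sketch of the loop that could produce output like this on iOS. UITextChecker lives in UIKit, so this only builds for iOS targets. The input sentence is not in the transcript; the one used here is made up for illustration, and the helper name is hypothetical:

```swift
import UIKit

// Hypothetical helper: finds each misspelled word and collects spelling guesses for it.
func misspellings(in text: String, language: String = "en_US") -> [(range: NSRange, guesses: [String])] {
    let checker = UITextChecker()
    let fullRange = NSRange(location: 0, length: (text as NSString).length)
    var results: [(range: NSRange, guesses: [String])] = []
    var offset = 0
    while offset < fullRange.length {
        let found = checker.rangeOfMisspelledWord(
            in: text, range: fullRange, startingAt: offset, wrap: false, language: language)
        // NSNotFound signals that no further misspellings exist.
        if found.location == NSNotFound { break }
        let guesses = checker.guesses(forWordRange: found, in: text, language: language) ?? []
        results.append((range: found, guesses: guesses))
        offset = found.location + found.length
    }
    return results
}

// Assumption: illustrative input, not the sentence used in the talk.
for item in misspellings(in: "Have a bice dya") {
    print("Misspelled Range: (\(item.range.location),\(item.range.length)); Guesses: \(item.guesses)")
}
```

Ranges are reported in UTF-16 units, matching NSString indexing rather than Swift Character counts.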
NSLinguisticTagger
NLP Tools in iOS/OSX: NSLinguisticTagger
NSLinguisticTagger Schemes
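The list of schemes from this slide is missing from the transcript. As a hedged illustration, the schemes available for a language can be queried at runtime; which ones appear depends on the OS version and language:

```swift
import Foundation

// Lists the tag schemes the system supports for English.
let schemes = NSLinguisticTagger.availableTagSchemes(forLanguage: "en")
for scheme in schemes {
    print(scheme.rawValue)
}
```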
Parts of Speech Tagging and Named Entity Recognition
Token: What; Tag: Pronoun
Token: is; Tag: Verb
Token: the; Tag: Determiner
Token: capital; Tag: Noun
Token: of; Tag: Preposition
Token: New York; Tag: PlaceName
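A sketch of tagging with the combined name-type-or-lexical-class scheme, written against the current Swift API rather than the one available at the time of the talk. The input sentence is inferred from the tokens above, and the helper name is hypothetical. NSLinguisticTagger is deprecated on recent OS releases in favor of the NaturalLanguage framework, so expect deprecation warnings:

```swift
import Foundation

// Hypothetical helper: returns (token, tag) pairs for each word in `text`.
func lexicalTags(in text: String) -> [(token: String, tag: String)] {
    let scheme = NSLinguisticTagScheme.nameTypeOrLexicalClass
    let tagger = NSLinguisticTagger(tagSchemes: [scheme], options: 0)
    tagger.string = text
    let nsText = text as NSString
    // joinNames keeps multi-word names such as "New York" together as one token.
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
    var result: [(token: String, tag: String)] = []
    tagger.enumerateTags(in: NSRange(location: 0, length: nsText.length),
                         unit: .word, scheme: scheme, options: options) { tag, range, _ in
        guard let tag = tag else { return }
        result.append((token: nsText.substring(with: range), tag: tag.rawValue))
    }
    return result
}

// Assumption: sentence reconstructed from the slide's tokens.
for pair in lexicalTags(in: "What is the capital of New York?") {
    print("Token: \(pair.token); Tag: \(pair.tag)")
}
```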
Script Identification
Token: hello; Tag: Latn
Token: สวัสดี; Tag: Thai
Token: bonjour; Tag: Latn
Token: 你; Tag: Hani
Token: 好; Tag: Hani
Token: હેલો; Tag: Gujr
Token: привет; Tag: Cyrl
Token: नमस्ते; Tag: Deva
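Script identification uses the same enumeration pattern with the script scheme, which yields four-letter ISO 15924 codes. A sketch, with an input string assembled from some of the tokens above (an assumption) and a hypothetical helper name:

```swift
import Foundation

// Hypothetical helper: returns (token, script code) pairs for each word in `text`.
func scriptTags(in text: String) -> [(token: String, tag: String)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.script], options: 0)
    tagger.string = text
    let nsText = text as NSString
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation]
    var result: [(token: String, tag: String)] = []
    tagger.enumerateTags(in: NSRange(location: 0, length: nsText.length),
                         unit: .word, scheme: .script, options: options) { tag, range, _ in
        guard let tag = tag else { return }
        result.append((token: nsText.substring(with: range), tag: tag.rawValue))
    }
    return result
}

// Assumption: sample assembled from the slide's tokens.
for pair in scriptTags(in: "hello สวัสดี bonjour 你好 привет नमस्ते") {
    print("Token: \(pair.token); Tag: \(pair.tag)")
}
```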
NSDataDetector
NLP Tools in iOS/OSX: NSDataDetector
Extracting Structured Data
Match: Lunch tomorrow at 12:30PM; - Date: Optional(2014-11-20 20:30:00 +0000)
Match: 1600 Pennsylvania Ave. NW, Washington, D.C. 20500; - Street: Optional(1600 Pennsylvania Ave.); - Zip: Optional(20500)
Match: 202-456-1414
Match: 2:15PM; - Date: Optional(2014-11-19 22:15:00 +0000)
Match: Southwest Airlines Flight 737
Match: www.southwest.com
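A sketch of how NSDataDetector can produce matches like these. The original input text is not in the transcript, so the sample string is an assumption stitched together from the matches above. The helper name is hypothetical, and the exact matches and dates depend on the OS version, locale, and current date:

```swift
import Foundation

// Hypothetical helper: returns one description line per detected item.
func describeMatches(in text: String) -> [String] {
    let types: NSTextCheckingResult.CheckingType =
        [.date, .address, .phoneNumber, .link, .transitInformation]
    guard let detector = try? NSDataDetector(types: types.rawValue) else { return [] }
    let nsText = text as NSString
    let range = NSRange(location: 0, length: nsText.length)
    return detector.matches(in: text, options: [], range: range).map { match -> String in
        var line = "Match: \(nsText.substring(with: match.range))"
        switch match.resultType {
        case .date:
            if let date = match.date { line += "; - Date: \(date)" }
        case .address:
            if let street = match.addressComponents?[.street] { line += "; - Street: \(street)" }
            if let zip = match.addressComponents?[.zip] { line += "; - Zip: \(zip)" }
        default:
            break
        }
        return line
    }
}

// Assumption: sample input reconstructed from the slide's matches.
let message = "Lunch tomorrow at 12:30PM at 1600 Pennsylvania Ave. NW, Washington, D.C. 20500. "
    + "Call 202-456-1414. Leaving at 2:15PM on Southwest Airlines Flight 737, see www.southwest.com"
describeMatches(in: message).forEach { print($0) }
```

Each NSTextCheckingResult exposes type-specific properties (date, addressComponents, phoneNumber, url, components), so the switch can be extended for the other result types.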