Natural Language Processing in iOS/OSX

Posted on 16-Jul-2015

Tech Talk: NLP Tools in iOS/OSX

Todd Kramer

NLP Tools in iOS/OSX: Topics

• CFStringTransform: transliteration, normalization

• CFStringTokenizer: string tokenization, language identification

• UITextChecker: spell check

• NSLinguisticTagger: part-of-speech tagging, named entity recognition, lemmatization, language/script identification

• NSDataDetector: data detection

NLP Tools in iOS/OSX: CFStringTransform

The CFStringTransform Function

Transliterate Thai to Latin

Original: สวัสดี; Transformed: sw̄ạsd̄ī

Transliterate Latin to Gujarati

Original: Gujarātī; Transformed: ગuજરાતી
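
Both transliteration slides come down to one call. The Swift sketch below is written against a current toolchain rather than the Swift of 2015, and the helper name `applyTransform` is hypothetical. `CFStringTransform` rewrites a mutable string in place and returns `true` on success. Thai uses the built-in `kCFStringTransformToLatin` constant; Core Foundation has no named Gujarati constant, so the sketch assumes the ICU transform ID `"Latin-Gujarati"` is accepted as the transform argument.

```swift
import Foundation

/// Hypothetical helper: applies a CFStringTransform identifier to `text`.
/// Returns nil if Core Foundation reports that the transform failed.
func applyTransform(_ text: String, _ transform: CFString, reverse: Bool = false) -> String? {
    // CFStringTransform edits a CFMutableString in place; NSMutableString is toll-free bridged.
    let mutable = NSMutableString(string: text)
    guard CFStringTransform(mutable as CFMutableString, nil, transform, reverse) else {
        return nil
    }
    return mutable as String
}

// Thai -> Latin via the built-in "any script to Latin" transform.
print(applyTransform("สวัสดี", kCFStringTransformToLatin) ?? "transform failed")

// Latin -> Gujarati: no named CF constant exists, so an ICU transform ID is passed (assumption).
print(applyTransform("Gujarātī", "Latin-Gujarati" as CFString) ?? "transform failed")
```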

Remove Diacritics and Accents

Original: sw̄ạsd̄ī; Transformed: swasdi

Describe Unicode Characters

Original: 👍; Transformed: \N{THUMBS UP SIGN}
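
The last two slides use the same function with different transform identifiers. The slides do not say which constant removed the diacritics; this sketch assumes `kCFStringTransformStripCombiningMarks` (Core Foundation also defines `kCFStringTransformStripDiacritics`). `kCFStringTransformToUnicodeName` replaces each character with its name in `\N{...}` form. The helper name `applyTransform` is hypothetical.

```swift
import Foundation

/// Hypothetical helper: runs one CFStringTransform on a copy of `text`.
func applyTransform(_ text: String, _ transform: CFString) -> String? {
    let mutable = NSMutableString(string: text)
    // Returns false if the transform could not be applied.
    guard CFStringTransform(mutable as CFMutableString, nil, transform, false) else { return nil }
    return mutable as String
}

// Strip combining marks (accents and diacritics) from the romanized Thai greeting.
print(applyTransform("sw̄ạsd̄ī", kCFStringTransformStripCombiningMarks) ?? "transform failed")

// Describe a character by its Unicode name.
print(applyTransform("👍", kCFStringTransformToUnicodeName) ?? "transform failed")
```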

CFStringTokenizer

NLP Tools in iOS/OSX: CFStringTokenizer

Tokenize Into Words: Simplified Chinese

Tokens: [人, 人生, 而, 自由, 在, 尊严, 和, 权利, 上, 一律, 平等, 他们, 赋有, 理性, 和, 良心, 并, 应, 以, 兄弟, 关系, 的, 精神, 互相, 对待]
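
A sketch of word tokenization with `CFStringTokenizer`, assuming a `zh-Hans` locale for the Chinese example (the talk's own input string and locale are not shown). The tokenizer is advanced until it reports no further tokens, and each token's `CFRange` is sliced out of the original string; because `CFRange` offsets count UTF-16 units, the slice goes through `NSString`. The function name `wordTokens` is hypothetical.

```swift
import Foundation

/// Splits `text` into word tokens with CFStringTokenizer.
/// The locale identifier (e.g. "zh-Hans") steers word-boundary rules; the value is an assumption.
func wordTokens(_ text: String, localeIdentifier: String) -> [String] {
    let cfText = text as CFString
    let fullRange = CFRange(location: 0, length: CFStringGetLength(cfText))
    let locale = CFLocaleCreate(kCFAllocatorDefault, localeIdentifier as CFString)
    let tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, cfText, fullRange,
                                            kCFStringTokenizerUnitWord, locale)
    var tokens: [String] = []
    // AdvanceToNextToken returns an empty token type once no tokens remain.
    while CFStringTokenizerAdvanceToNextToken(tokenizer) != [] {
        let range = CFStringTokenizerGetCurrentTokenRange(tokenizer)
        let nsRange = NSRange(location: range.location, length: range.length)
        tokens.append((text as NSString).substring(with: nsRange))
    }
    return tokens
}

print(wordTokens("人人生而自由，在尊严和权利上一律平等。", localeIdentifier: "zh-Hans"))
```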

Transliterate Tokens: Simplified Chinese

Tokens: [rén, rénshēng, ér, zìyóu, zài, zūnyán, hé, quánlì, shàng, yīlv̀, píngděng, tāmén, fùyǒu, lǐxìng, hé, liángxīn, bìng, yìng, yǐ, xiōngdì, guānxī, de, jīngshén, hùxiāng, duìdài]
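
The transliterated tokens can be produced by asking the tokenizer for each token's Latin transcription. This sketch assumes the attribute must be requested when the tokenizer is created, so `kCFStringTokenizerAttributeLatinTranscription` is OR-ed into the options; tokens without a transcription fall back to their original text. The function name `latinTranscriptions` is hypothetical.

```swift
import Foundation

/// Returns the Latin transcription of each word token in `text`,
/// or the token itself when no transcription is available.
func latinTranscriptions(_ text: String, localeIdentifier: String) -> [String] {
    let cfText = text as CFString
    let fullRange = CFRange(location: 0, length: CFStringGetLength(cfText))
    // Request the transcription attribute together with word-level tokenization.
    let options = kCFStringTokenizerUnitWord | kCFStringTokenizerAttributeLatinTranscription
    let tokenizer = CFStringTokenizerCreate(kCFAllocatorDefault, cfText, fullRange, options,
                                            CFLocaleCreate(kCFAllocatorDefault, localeIdentifier as CFString))
    var results: [String] = []
    while CFStringTokenizerAdvanceToNextToken(tokenizer) != [] {
        let range = CFStringTokenizerGetCurrentTokenRange(tokenizer)
        let token = (text as NSString).substring(with: NSRange(location: range.location, length: range.length))
        // The attribute comes back as a CFTypeRef; bridge it to String if present.
        let latin = CFStringTokenizerCopyCurrentTokenAttribute(tokenizer, kCFStringTokenizerAttributeLatinTranscription) as? String
        results.append(latin ?? token)
    }
    return results
}

print(latinTranscriptions("人人生而自由", localeIdentifier: "zh-Hans"))
```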

Language Identification: Icelandic

Language Code: is
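
For language identification, `CFStringTokenizerCopyBestStringLanguage` returns a language code such as `is` for a string, or `NULL` when it cannot decide. A minimal sketch; the wrapper name `bestLanguage(of:)` and the Icelandic sample sentence are illustrative, not the talk's input.

```swift
import Foundation

/// Returns the language code CFStringTokenizer considers most likely for `text`, or nil.
func bestLanguage(of text: String) -> String? {
    let cfText = text as CFString
    let range = CFRange(location: 0, length: CFStringGetLength(cfText))
    // NULL (nil in Swift) means the language could not be determined.
    guard let code = CFStringTokenizerCopyBestStringLanguage(cfText, range) else { return nil }
    return code as String
}

print(bestLanguage(of: "Góðan daginn, hvernig hefur þú það í dag?") ?? "unknown")
```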

UITextChecker

NLP Tools in iOS/OSX: UITextChecker

Spell Check

Misspelled Range: (7,4); Guesses: Optional([ice, Bice, bide, nice, vice, bike, bile, bite, bace, bbce, bcce, bdce, bece, bfce, dice, lice, mice, pice, rice, brice, bicep])
Misspelled Range: (12,3); Guesses: Optional([ay, cay, day, say])
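
A sketch of a spell-check loop that reports ranges and guesses in the same shape as the output above. It walks the string with `rangeOfMisspelledWord(in:range:startingAt:wrap:language:)` until `NSNotFound` comes back and asks for `guesses(forWordRange:in:language:)` at each hit. The `"en_US"` language code is an assumption (check `UITextChecker.availableLanguages`), the function name `misspellings` is hypothetical, and on macOS the AppKit counterpart is `NSSpellChecker`.

```swift
import UIKit

/// Collects every misspelled word range in `text` together with UITextChecker's guesses.
/// The default language "en_US" is an assumption.
func misspellings(in text: String, language: String = "en_US") -> [(range: NSRange, guesses: [String])] {
    let checker = UITextChecker()
    let fullRange = NSRange(location: 0, length: (text as NSString).length)
    var results: [(range: NSRange, guesses: [String])] = []
    var offset = 0
    while offset < fullRange.length {
        let range = checker.rangeOfMisspelledWord(in: text, range: fullRange, startingAt: offset,
                                                  wrap: false, language: language)
        // NSNotFound means there are no more misspellings after `offset`.
        if range.location == NSNotFound { break }
        let guesses = checker.guesses(forWordRange: range, in: text, language: language) ?? []
        results.append((range: range, guesses: guesses))
        offset = NSMaxRange(range)
    }
    return results
}

for result in misspellings(in: "This is a sentance.") {
    print("Misspelled Range: (\(result.range.location),\(result.range.length)); Guesses: \(result.guesses)")
}
```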

NSLinguisticTagger

NLP Tools in iOS/OSX: NSLinguisticTagger

Parts of Speech Tagging and Named Entity Recognition

NSLinguisticTagger Schemes
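
Which tag schemes are available depends on the language and the OS version, so it can help to ask the tagger directly. A minimal sketch that lists the schemes reported for English; the choice of `"en"` is arbitrary.

```swift
import Foundation

// Ask NSLinguisticTagger which tag schemes it supports for English on this system.
let schemes = NSLinguisticTagger.availableTagSchemes(forLanguage: "en")
for scheme in schemes {
    print(scheme.rawValue)
}
```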

Parts of Speech Tagging and Named Entity Recognition

Token: What; Tag: Pronoun
Token: is; Tag: Verb
Token: the; Tag: Determiner
Token: capital; Tag: Noun
Token: of; Tag: Preposition
Token: New York; Tag: PlaceName
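
A sketch of a tagging run that could produce output like the above, using the `nameTypeOrLexicalClass` scheme so that recognized names get a name tag (such as `PlaceName`) and other words get a lexical class. It uses the block-based `enumerateTags(in:unit:scheme:options:using:)` API from iOS 11 / macOS 10.13, which is newer than the talk; the `.joinNames` option is what lets "New York" come back as one token. The function name `tagWords` is hypothetical.

```swift
import Foundation

/// Tags each word with a lexical class, or with a name type for recognized names.
func tagWords(_ text: String) -> [(token: String, tag: String)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.nameTypeOrLexicalClass], options: 0)
    tagger.string = text
    let nsText = text as NSString
    let range = NSRange(location: 0, length: nsText.length)
    // Skip whitespace and punctuation; merge multi-word names into single tokens.
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
    var results: [(token: String, tag: String)] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .nameTypeOrLexicalClass,
                         options: options) { tag, tokenRange, _ in
        guard let tag = tag else { return }
        results.append((token: nsText.substring(with: tokenRange), tag: tag.rawValue))
    }
    return results
}

for (token, tag) in tagWords("What is the capital of New York?") {
    print("Token: \(token); Tag: \(tag)")
}
```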

Script Identification

Token: hello; Tag: Latn
Token: สวัสดี; Tag: Thai
Token: bonjour; Tag: Latn
Token: 你; Tag: Hani
Token: 好; Tag: Hani
Token: !લો; Tag: Gujr
Token: привет; Tag: Cyrl
Token: नमस्ते; Tag: Deva
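
Script identification uses the same enumeration with the `.script` scheme, whose tags are ISO 15924 script codes such as `Latn`, `Cyrl` and `Deva`. A minimal sketch under the same iOS 11 / macOS 10.13 API assumption as a current-Swift rewrite; the function name `scriptTags` and the sample string are illustrative.

```swift
import Foundation

/// Tags each word token with its ISO 15924 script code.
func scriptTags(_ text: String) -> [(token: String, script: String)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.script], options: 0)
    tagger.string = text
    let nsText = text as NSString
    let range = NSRange(location: 0, length: nsText.length)
    var results: [(token: String, script: String)] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .script,
                         options: [.omitWhitespace, .omitPunctuation]) { tag, tokenRange, _ in
        guard let tag = tag else { return }
        results.append((token: nsText.substring(with: tokenRange), script: tag.rawValue))
    }
    return results
}

for (token, script) in scriptTags("hello привет नमस्ते") {
    print("Token: \(token); Tag: \(script)")
}
```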

NSDataDetector

NLP Tools in iOS/OSX: NSDataDetector

Extracting Structured Data

Match: Lunch tomorrow at 12:30PM
  - Date: Optional(2014-11-20 20:30:00 +0000)
Match: 1600 Pennsylvania Ave. NW, Washington, D.C. 20500
  - Street: Optional(1600 Pennsylvania Ave.)
  - Zip: Optional(20500)
Match: 202-456-1414
Match: 2:15PM
  - Date: Optional(2014-11-19 22:15:00 +0000)
Match: Southwest Airlines Flight 737
Match: www.southwest.com
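
A sketch of a detector setup that could yield matches like these, assuming date, address, phone number, link and transit-information detection are all enabled. Relative expressions such as "tomorrow" are resolved against the current date and time zone, and `Date` values print in UTC, which is why 12:30PM appears above as 20:30 +0000 (a UTC-8 offset). The function name `detectData(in:)` is hypothetical.

```swift
import Foundation

/// Runs NSDataDetector over `text`, prints each match with a few extracted fields,
/// and returns the raw results.
func detectData(in text: String) throws -> [NSTextCheckingResult] {
    let types: NSTextCheckingResult.CheckingType = [.date, .address, .phoneNumber, .link, .transitInformation]
    let detector = try NSDataDetector(types: types.rawValue)
    let nsText = text as NSString
    let matches = detector.matches(in: text, options: [], range: NSRange(location: 0, length: nsText.length))
    for match in matches {
        print("Match: \(nsText.substring(with: match.range))")
        switch match.resultType {
        case .date:
            print("  - Date: \(String(describing: match.date))")
        case .address:
            print("  - Street: \(String(describing: match.addressComponents?[.street]))")
            print("  - Zip: \(String(describing: match.addressComponents?[.zip]))")
        case .transitInformation:
            print("  - Flight: \(String(describing: match.components?[.flight]))")
        default:
            break // phone numbers and links need no extra fields here
        }
    }
    return matches
}

_ = try? detectData(in: "Call 202-456-1414 or visit www.southwest.com.")
```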
