natural language processing - github pages

19
African Institute for Mathematical Sciences South Africa Natural Language Processing Machine Learning has revolutionized Natural Language Processing Salomon Kabongo [email protected] December 31, 2018

Upload: others

Post on 01-Mar-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Natural Language Processing - GitHub Pages

African Institute for Mathematical SciencesSouth Africa

Natural Language Processing

Machine Learning has revolutionizedNatural Language Processing

Salomon [email protected]

December 31, 2018

Page 2: Natural Language Processing - GitHub Pages

1

Content

IntroductionWhat is Natural Language Processing ?

Knowledge Base Approach vs Machine LearningHistoryKnowledge BaseMachine Learning

Natural Language Processing Sub-FieldNatural-language understandingMachine TranslationText ProcessingNatural Language Processing Sub-Field

Practical Example

Research AreaOpen Questions

References

Salomon Kabongo | Natural Language Processing

Page 3: Natural Language Processing - GitHub Pages

2

IntroductionWhat is Natural Language Processing

Computers have long been able to defeat even the best human chessplayer, but are only recently matching some of the abilities of averagehuman beings to recognize objects or speech.The true challenge to artificial intelligence proved to be solving thetasks that are easy for people to perform but hard for people todescribe formally—problems that we solve intuitively, that feelautomatic, like recognizing spoken words or faces in images [4].

DefinitionNatural Language Processing or computer speech and languageprocessing or human language technology or computationallinguistic is a subset of Artificial Intelligence that deals with enablinghuman-machine communication.

NLP is a phrase that is formed from 3 components - natural - asexists in nature, language - that we use to communicate with eachother, processing - something that is done automatically.

Salomon Kabongo | Natural Language Processing

Page 4: Natural Language Processing - GitHub Pages

3

Knowledge Base Approach vs Machine LearningHistory

Historically language processing has been treated very differently incomputer Science (Natural Language Processing), electricalengineering (Speech Recognition), linguistics (ComputationalLinguistics) and Cognitive Science(Computational Psycholinguistics).The earliest roots of the field date to the intellectually fertile periodjust after the second Wold War (1940’s-1950’s) [5].

Salomon Kabongo | Natural Language Processing

Page 5: Natural Language Processing - GitHub Pages

4

Knowledge BaseKnowledge Base Approach

We introduced this presentation by talking about the defeat of thechess world champion Kasparov vs IBM’s Deep Blue chess-playingsystem. This is one of simplest illustrations of the knowledge baseapproach, where a programmer can completely describe the briefrules of the a game to a computer ahead of time.Because of how complex and usually difficult the wold is, most AIprojects based on this approach failed, and lead researchers to thinkof another, more flexible, approach that can learn like a child.

Salomon Kabongo | Natural Language Processing

Page 6: Natural Language Processing - GitHub Pages

5

Machine LearningMachine Learning Approach

The difficulties faced by systems relying on hard-coded knowledgesuggest that AI systems need the ability to acquire their ownknowledge, by extracting patterns from raw data. This capability isknown as machine learning.Introduction of machine learning allowed computers to tackleproblems involving knowledge of the real world and make decisionsthat appeared impossible before.For example a simple machine learning algorithm naive Bayes canseparate legitimate e-mail from spam e-mail [4].

Salomon Kabongo | Natural Language Processing

Page 7: Natural Language Processing - GitHub Pages

6

Natural Language Processing Sub-FieldNatural-language understanding

As a human, I understand English when someone talks to me, is thatNLP? Yes! When done automatically, it is called Natural LanguageUnderstanding (NLU) [3].

Figure: http://www.stuartduncan.name

Salomon Kabongo | Natural Language Processing

Page 8: Natural Language Processing - GitHub Pages

7

Natural Language Processing Sub-FieldMachine Translation

I translated some Chinese to English for my friend, is that NLP? Yes,it is called Machine Translation (MT) when done automatically [3].

Figure: Image Credit: Google Cloud

Salomon Kabongo | Natural Language Processing

Page 9: Natural Language Processing - GitHub Pages

8

Natural Language Processing Sub-FieldText Processing

Word FrequencyThe idea is that the more often a word or term appears in a body oftext, the more semantically significant it is.In other words, the frequency of a word might tell us something aboutthe meaning of the text.

Term Frequency (TF) and Inverse Document Frequency(IDF)This approach use a more sophisticated measure of wordimportance.Now let’s look at the inverse document frequency. This is a measureof the relative number of documents within which the term appears.It’s calculated as the log of total documents divided by the number ofdocuments containing the term.And finally, we just multiply TF by IDF to work out the overallimportance of each term to the documents in which they appear.

Salomon Kabongo | Natural Language Processing

Page 10: Natural Language Processing - GitHub Pages

9

Natural Language Processing Sub-FieldText Processing II

Figure: Microsoft, AI

Stemming or Lemmatization : There are sometimes words that arevery similar, they’re from the same root, stemming consist of judgingthose words as the same.

Salomon Kabongo | Natural Language Processing

Page 11: Natural Language Processing - GitHub Pages

10

Natural Language Processing Sub-FieldOthers

I Text Categorization

I Information RetrievalI Speech Recognition, and many others

Salomon Kabongo | Natural Language Processing

Page 12: Natural Language Processing - GitHub Pages

10

Natural Language Processing Sub-FieldOthers

I Text CategorizationI Information Retrieval

I Speech Recognition, and many others

Salomon Kabongo | Natural Language Processing

Page 13: Natural Language Processing - GitHub Pages

10

Natural Language Processing Sub-FieldOthers

I Text CategorizationI Information RetrievalI Speech Recognition, and many others

Salomon Kabongo | Natural Language Processing

Page 14: Natural Language Processing - GitHub Pages

11

Practical ExampleTF-IDF (using python)

In this short tutorial, we will :I Applying Term Frequency (TF) technique to a part of John F.

Kennedy’s talk (We choose to go to the Moon)

I Applying Inverse Document Frequency (IDF) [1] on 3 differentdocuments

1. John F. Kennedy2. Abraham Lincoln talk3. Aims South Africa (About)

Salomon Kabongo | Natural Language Processing

Page 15: Natural Language Processing - GitHub Pages

11

Practical ExampleTF-IDF (using python)

In this short tutorial, we will :I Applying Term Frequency (TF) technique to a part of John F.

Kennedy’s talk (We choose to go to the Moon)I Applying Inverse Document Frequency (IDF) [1] on 3 different

documents1. John F. Kennedy2. Abraham Lincoln talk3. Aims South Africa (About)

Salomon Kabongo | Natural Language Processing

Page 16: Natural Language Processing - GitHub Pages

12

Open Questions IFurther study

I Translation : using world vectors between two languages canend poorly [2].

Figure: The German language has many different words related toeconomics, and they are all simply closest to the english vector foreconomics

Salomon Kabongo | Natural Language Processing

Page 17: Natural Language Processing - GitHub Pages

13

Open Questions IIFurther study

I Language can contain its own prejudices and unfair treatmenttowards groups. When we then train word vector on theseprejudiced texts, our word vectors will likely reflect thoseproblems [2].

Salomon Kabongo | Natural Language Processing

Page 18: Natural Language Processing - GitHub Pages

14

References

[1] Online course : "Introduction to Artificial Intelligence (AI)".

[2] Online course : "NLT Fundamentals In Python".

[3] What is natural language processing (NLP)? accessed 19December 2018.

[4] I. Goodfellow.Deep learning.Adaptive computation and machine learning. 2016.

[5] D. Jurafsky.Speech and language processing : an introduction to naturallanguage processing, computational linguistics, and speechrecognition.Prentice Hall series in artificial intelligence. Pearson Prentice Hall,Upper Saddle River, N.J., 2nd ed. edition, 2009.

Salomon Kabongo | Natural Language Processing

Page 19: Natural Language Processing - GitHub Pages

TUASAKIDILA