Named Entity Recognition - Natural Language Processing Fundamentals in Python

DataCamp Natural Language Processing Fundamentals in Python Named Entity Recognition NATURAL LANGUAGE PROCESSING FUNDAMENTALS IN PYTHON Katharine Jarmul Founder, kjamistan

Upload: vandung

Post on 18-Feb-2019


TRANSCRIPT

Page 1: Named Entity Recognition

DataCamp Natural Language Processing Fundamentals in Python

NATURAL LANGUAGE PROCESSING FUNDAMENTALS IN PYTHON

Katharine Jarmul, Founder, kjamistan

Page 2: What is Named Entity Recognition?

NLP task to identify important named entities in the text
  People, places, organizations
  Dates, states, works of art
  ... and other categories!
Can be used alongside topic identification
  ... or on its own!
Who? What? When? Where?

Page 3: Example of NER

(Source: Europeana Newspapers (http://www.europeana-newspapers.eu))

Page 4: nltk and the Stanford CoreNLP Library

The Stanford CoreNLP library:
  Integrated into Python via nltk
  Java based
  Support for NER as well as coreference and dependency trees

Page 5: Using nltk for Named Entity Recognition

In [1]: import nltk

In [2]: sentence = '''In New York, I like to ride the Metro to visit MOMA and some restaurants rated well by Ruth Reichl.'''

In [3]: tokenized_sent = nltk.word_tokenize(sentence)

In [4]: tagged_sent = nltk.pos_tag(tokenized_sent)

In [5]: tagged_sent[:3]
Out[5]: [('In', 'IN'), ('New', 'NNP'), ('York', 'NNP')]

Page 6: nltk's ne_chunk()

In [6]: print(nltk.ne_chunk(tagged_sent))
(S
  In/IN
  (GPE New/NNP York/NNP)
  ,/,
  I/PRP
  like/VBP
  to/TO
  ride/VB
  the/DT
  (ORGANIZATION Metro/NNP)
  to/TO
  visit/VB
  (ORGANIZATION MOMA/NNP)
  and/CC
  some/DT
  restaurants/NNS
  rated/VBN
  well/RB
  by/IN
  (PERSON Ruth/NNP Reichl/NNP)
  ./.)
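
The chunk output on this slide is a tree whose labeled subtrees are the named entities. A minimal sketch of pulling those entities out follows. To avoid depending on downloaded tokenizer and tagger data, it rebuilds the tree from the printed output with `Tree.fromstring` rather than calling `ne_chunk` again; the helper name `extract_entities` is hypothetical and not part of nltk.

```python
from nltk.tree import Tree
from nltk.tag import str2tuple

# Printed output of nltk.ne_chunk(tagged_sent), copied from the slide.
chunk_output = """
(S
  In/IN
  (GPE New/NNP York/NNP)
  ,/,
  I/PRP
  like/VBP
  to/TO
  ride/VB
  the/DT
  (ORGANIZATION Metro/NNP)
  to/TO
  visit/VB
  (ORGANIZATION MOMA/NNP)
  and/CC
  some/DT
  restaurants/NNS
  rated/VBN
  well/RB
  by/IN
  (PERSON Ruth/NNP Reichl/NNP)
  ./.)
"""

# Rebuild the tree; str2tuple turns a leaf like 'New/NNP' into ('New', 'NNP').
chunked = Tree.fromstring(chunk_output, read_leaf=str2tuple)


def extract_entities(tree):
    """Return (label, text) pairs for every labeled chunk directly under the root."""
    entities = []
    for node in tree:
        # Plain tokens are (word, tag) tuples; named entities are nested Tree objects.
        if isinstance(node, Tree):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((node.label(), text))
    return entities


entities = extract_entities(chunked)
for label, text in entities:
    print(label, text)
```

The same loop should work on the tree returned by `nltk.ne_chunk(tagged_sent)` directly, since that tree also holds `(word, tag)` tuples as leaves.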

Page 7: Let's practice!

Page 8: Introduction to SpaCy

Katharine Jarmul, Founder, kjamistan

Page 9: What is SpaCy?

NLP library similar to gensim, with different implementations
Focus on creating NLP pipelines to generate models and corpora
Open-source, with extra libraries and tools
  Displacy

Page 10: Displacy entity recognition visualizer

(source: https://demos.explosion.ai/displacy-ent/)

Page 11: SpaCy NER

In [1]: import spacy

In [2]: nlp = spacy.load('en')

In [3]: nlp.entity
Out[3]: <spacy.pipeline.EntityRecognizer at 0x7f76b75e68b8>

In [4]: doc = nlp("""Berlin is the capital of Germany; and the residence of Chancellor Angela Merkel.""")

In [5]: doc.ents
Out[5]: (Berlin, Germany, Angela Merkel)

In [6]: print(doc.ents[0], doc.ents[0].label_)
Berlin GPE
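
The session on this slide uses the spaCy API from when the course was recorded. In spaCy 3 the 'en' shortcut and the `nlp.entity` attribute are no longer available: trained pipelines are loaded by package name (for example `en_core_web_sm`) and the recognizer is reached with `nlp.get_pipe("ner")`. The sketch below, which assumes spaCy 3, shows the same `doc.ents` and `label_` access without downloading a trained model. It swaps the statistical recognizer for a rule-based `entity_ruler` with hand-written patterns, so it illustrates the interface only, not the model's predictions.

```python
import spacy

# Blank English pipeline: tokenizer only, so no trained model download is needed.
nlp = spacy.blank("en")

# Rule-based stand-in for the statistical NER component (an assumption of this sketch).
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "GPE", "pattern": "Berlin"},
    {"label": "GPE", "pattern": "Germany"},
    {"label": "PERSON", "pattern": "Angela Merkel"},
])

doc = nlp("Berlin is the capital of Germany; and the residence of Chancellor Angela Merkel.")

# Each entity is a Span; .text is the surface string and .label_ the entity type.
entities = [(ent.text, ent.label_) for ent in doc.ents]
for text, label in entities:
    print(text, label)
```

With a trained pipeline installed, replacing the first two steps with `spacy.load("en_core_web_sm")` leaves the rest of the code unchanged.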

Page 12: Why use SpaCy for NER?

Easy pipeline creation
Different entity types compared to nltk
Informal language corpora
  Easily find entities in Tweets and chat messages
Quickly growing!

Page 13: Let's practice!

Page 14: Multilingual NER with polyglot

Katharine Jarmul, Founder, kjamistan

Page 15: What is polyglot?

NLP library which uses word vectors
Why polyglot?
  Vectors for many different languages
  More than 130!

Page 16: Spanish NER with polyglot

In [1]: from polyglot.text import Text

In [2]: text = """El presidente de la Generalitat de Cataluña, Carles Puigdemont, ha afirmado hoy a la alcaldesa de Madrid, Manuela Carmena, que en su etapa de alcalde de Girona (de julio de 2011 a enero de 2016) hizo una gran promoción de Madrid."""

In [3]: ptext = Text(text)

In [4]: ptext.entities
Out[4]: [I-ORG(['Generalitat', 'de']), I-LOC(['Generalitat', 'de', 'Cataluña']), I-PER(['Carles', 'Puigdemont']), I-LOC(['Madrid']), I-PER(['Manuela', 'Carmena']), I-LOC(['Girona']), I-LOC(['Madrid'])]
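
The entity list on this slide mixes tags and repeats Madrid, so a common next step is to group the names by tag. The sketch below does this in plain Python on the slide's output, transcribed as `(tag, tokens)` pairs, so it runs without polyglot or its language models. The helper `group_by_tag` is hypothetical. With polyglot itself, the pairs could presumably be built as `(ent.tag, list(ent))` for each chunk in `ptext.entities`, assuming each chunk exposes a `.tag` and iterates over its tokens.

```python
from collections import defaultdict

# Entities transcribed from Out[4] on the slide as (tag, tokens) pairs.
# This is stand-in data, not live polyglot output.
slide_entities = [
    ("I-ORG", ["Generalitat", "de"]),
    ("I-LOC", ["Generalitat", "de", "Cataluña"]),
    ("I-PER", ["Carles", "Puigdemont"]),
    ("I-LOC", ["Madrid"]),
    ("I-PER", ["Manuela", "Carmena"]),
    ("I-LOC", ["Girona"]),
    ("I-LOC", ["Madrid"]),
]


def group_by_tag(entities):
    """Map each tag to its distinct entity names, in order of first appearance."""
    grouped = defaultdict(list)
    for tag, tokens in entities:
        name = " ".join(tokens)
        if name not in grouped[tag]:  # skip repeats such as the second 'Madrid'
            grouped[tag].append(name)
    return dict(grouped)


grouped = group_by_tag(slide_entities)
for tag, names in grouped.items():
    print(tag, names)
```

Grouping this way also makes the overlapping 'Generalitat de' and 'Generalitat de Cataluña' spans easy to spot, since they land under different tags.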

Page 17: Let's practice!