DataCamp: Natural Language Processing Fundamentals in Python
Named Entity Recognition
NATURAL LANGUAGE PROCESSING FUNDAMENTALS IN PYTHON
Katharine Jarmul, Founder, kjamistan
What is Named Entity Recognition?
NLP task to identify important named entities in the text
  People, places, organizations
  Dates, states, works of art
  ... and other categories!
Can be used alongside topic identification
  ... or on its own!
Who? What? When? Where?
Example of NER
(Source: Europeana Newspapers (http://www.europeana-newspapers.eu))
nltk and the Stanford CoreNLP Library
The Stanford CoreNLP library:
  Integrated into Python via nltk
  Java based
  Support for NER as well as coreference and dependency trees
Using nltk for Named Entity Recognition
In [1]: import nltk
In [2]: sentence = '''In New York, I like to ride the Metro to visit MOMA and some restaurants rated well by Ruth Reichl.'''
In [3]: tokenized_sent = nltk.word_tokenize(sentence)
In [4]: tagged_sent = nltk.pos_tag(tokenized_sent)
In [5]: tagged_sent[:3]
Out[5]: [('In', 'IN'), ('New', 'NNP'), ('York', 'NNP')]
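The POS tags matter for the next step because named entities in English tend to show up as runs of proper nouns (NNP). As a rough illustration only, the sketch below groups consecutive NNP tokens into candidate names. The tagged pairs are transcribed by hand from the output on the slides so that the block runs without nltk or its models; `proper_noun_runs` is a hypothetical helper, and this heuristic is not how nltk's chunker works.

```python
from itertools import groupby

# (token, POS tag) pairs for the example sentence, transcribed by hand
# from the output shown on the slides (not computed here).
tagged_sent = [
    ('In', 'IN'), ('New', 'NNP'), ('York', 'NNP'), (',', ','),
    ('I', 'PRP'), ('like', 'VBP'), ('to', 'TO'), ('ride', 'VB'),
    ('the', 'DT'), ('Metro', 'NNP'), ('to', 'TO'), ('visit', 'VB'),
    ('MOMA', 'NNP'), ('and', 'CC'), ('some', 'DT'),
    ('restaurants', 'NNS'), ('rated', 'VBN'), ('well', 'RB'),
    ('by', 'IN'), ('Ruth', 'NNP'), ('Reichl', 'NNP'), ('.', '.'),
]


def proper_noun_runs(tagged):
    """Join each run of consecutive NNP tokens into one candidate name."""
    runs = []
    for is_nnp, group in groupby(tagged, key=lambda pair: pair[1] == 'NNP'):
        if is_nnp:
            runs.append(' '.join(token for token, _tag in group))
    return runs


candidates = proper_noun_runs(tagged_sent)
print(candidates)
```

This finds candidate spans but says nothing about whether each one is a person, a place, or an organization, which is the part a named entity chunker adds.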
nltk's ne_chunk()
In [6]: print(nltk.ne_chunk(tagged_sent))
(S
  In/IN
  (GPE New/NNP York/NNP)
  ,/,
  I/PRP
  like/VBP
  to/TO
  ride/VB
  the/DT
  (ORGANIZATION Metro/NNP)
  to/TO
  visit/VB
  (ORGANIZATION MOMA/NNP)
  and/CC
  some/DT
  restaurants/NNS
  rated/VBN
  well/RB
  by/IN
  (PERSON Ruth/NNP Reichl/NNP)
  ./.)
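To pull just the entities out of output like this, one option is to read the bracketed chunks back from the printed tree. The sketch below does that with a standard-library regular expression so that it runs without nltk; `extract_entities` and `CHUNK` are hypothetical names, and the pattern assumes flat, non-nested entity chunks as in this example. With nltk itself you would more likely walk the tree returned by ne_chunk and check each subtree's label().

```python
import re

# Printed ne_chunk tree from the slide, collapsed onto fewer lines.
tree_text = (
    "(S In/IN (GPE New/NNP York/NNP) ,/, I/PRP like/VBP to/TO ride/VB "
    "the/DT (ORGANIZATION Metro/NNP) to/TO visit/VB (ORGANIZATION MOMA/NNP) "
    "and/CC some/DT restaurants/NNS rated/VBN well/RB by/IN "
    "(PERSON Ruth/NNP Reichl/NNP) ./.)"
)

# A chunk is "(LABEL word/TAG word/TAG ...)" with no nested parentheses.
CHUNK = re.compile(r"\((\w+)((?:\s+[^\s()]+/[^\s()]+)+)\s*\)")


def extract_entities(text):
    """Return (label, entity text) pairs for each flat chunk in the tree."""
    entities = []
    for match in CHUNK.finditer(text):
        label, body = match.groups()
        # Drop the POS tag after the last slash of every token.
        words = [token.rsplit('/', 1)[0] for token in body.split()]
        entities.append((label, ' '.join(words)))
    return entities


entities = extract_entities(tree_text)
for label, name in entities:
    print(label, name)
```

The outer `(S ...)` node is skipped because it contains nested chunks and so never matches the flat pattern.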
Let's practice!
Introduction to SpaCy
What is SpaCy?
NLP library similar to gensim, with different implementations
Focus on creating NLP pipelines to generate models and corpora
Open-source, with extra libraries and tools
  Displacy
Displacy entity recognition visualizer
(source: https://demos.explosion.ai/displacy-ent/)
SpaCy NER
In [1]: import spacy
In [2]: nlp = spacy.load('en')
In [3]: nlp.entity
Out[3]: <spacy.pipeline.EntityRecognizer at 0x7f76b75e68b8>
In [4]: doc = nlp("""Berlin is the capital of Germany; and the residence of Chancellor Angela Merkel.""")
In [5]: doc.ents
Out[5]: (Berlin, Germany, Angela Merkel)
In [6]: print(doc.ents[0], doc.ents[0].label_)
Berlin GPE
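Each entity in doc.ents carries its text and a label, so a common next step is to group entities by label. The sketch below shows that grouping. To stay runnable without spaCy or a downloaded model, it uses a hypothetical `Ent` namedtuple as a stand-in for spaCy's entity spans, exposing the same `text` and `label_` attributes. Only Berlin's GPE label comes from the slide; the labels for the other two entities are assumed for illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a spaCy entity span: it has the same attribute
# names (text, label_) as the objects found in doc.ents.
Ent = namedtuple('Ent', ['text', 'label_'])

# Entities from the slide. Only Berlin's label (GPE) is shown there;
# the other two labels are assumptions made for this sketch.
ents = (
    Ent('Berlin', 'GPE'),
    Ent('Germany', 'GPE'),
    Ent('Angela Merkel', 'PERSON'),
)


def group_by_label(entities):
    """Map each entity label to the entity texts carrying it, in order."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


by_label = group_by_label(ents)
print(by_label)
```

With a real pipeline, passing doc.ents to `group_by_label` should behave the same way, since the function only reads `text` and `label_`. Note that recent spaCy releases no longer accept the 'en' shortcut and load a named model package such as en_core_web_sm instead.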
Why use SpaCy for NER?
Easy pipeline creation
Different entity types compared to nltk
Informal language corpora
  Easily find entities in Tweets and chat messages
Quickly growing!
Let's practice!
Multilingual NER with polyglot
What is polyglot?
NLP library which uses word vectors
Why polyglot?
  Vectors for many different languages
  More than 130!
Spanish NER with polyglot
In [1]: from polyglot.text import Text
In [2]: text = """El presidente de la Generalitat de Cataluña, Carles Puigdemont, ha afirmado hoy a la alcaldesa de Madrid, Manuela Carmena, que en su etapa de alcalde de Girona (de julio de 2011 a enero de 2016) hizo una gran promoción de Madrid."""
In [3]: ptext = Text(text)
In [4]: ptext.entities
Out[4]:
[I-ORG(['Generalitat', 'de']),
 I-LOC(['Generalitat', 'de', 'Cataluña']),
 I-PER(['Carles', 'Puigdemont']),
 I-LOC(['Madrid']),
 I-PER(['Manuela', 'Carmena']),
 I-LOC(['Girona']),
 I-LOC(['Madrid'])]
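polyglot reports each entity as a tagged chunk of words, with tags such as I-PER, I-LOC and I-ORG. The sketch below groups the chunks from the output above by tag and joins the words back into names. It uses plain (tag, words) tuples copied from the slide so that it runs without polyglot or its language models; `names_by_tag` is a hypothetical helper. With polyglot itself, each entity's tag should be available through its `tag` attribute.

```python
# (tag, words) pairs copied from the ptext.entities output on the slide.
entities = [
    ('I-ORG', ['Generalitat', 'de']),
    ('I-LOC', ['Generalitat', 'de', 'Cataluña']),
    ('I-PER', ['Carles', 'Puigdemont']),
    ('I-LOC', ['Madrid']),
    ('I-PER', ['Manuela', 'Carmena']),
    ('I-LOC', ['Girona']),
    ('I-LOC', ['Madrid']),
]


def names_by_tag(tagged_chunks):
    """Group entity names by tag, dropping repeats but keeping order."""
    grouped = {}
    for tag, words in tagged_chunks:
        name = ' '.join(words)
        names = grouped.setdefault(tag, [])
        if name not in names:
            names.append(name)
    return grouped


grouped = names_by_tag(entities)
for tag, names in grouped.items():
    print(tag, names)
```

The overlapping I-ORG and I-LOC chunks around "Generalitat" are kept as reported, which shows that the same words can end up under more than one tag.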
Let's practice!