large scale nlp using python's nltk on azure

Upload: cloudbeatsch

Post on 11-Apr-2017

TRANSCRIPT

Page 1: Large scale nlp using python's nltk on azure

Beat Schwegler
head in the cloud, feet on the ground

Twitter: @cloudbeatsch Blog: http://cloudbeatsch.com

I saw Mr. Washington with a saw!
large scale NLP using python's NLTK on Azure

Page 2: Large scale nlp using python's nltk on azure

I saw Mr. Washington.
This is your saw… I told you!
Is this really a chainsaw?

Page 3: Large scale nlp using python's nltk on azure

fundamentals of nlp

natural language toolkit (nltk)

running python and nltk on Azure

Page 4: Large scale nlp using python's nltk on azure

source: http://www.nltk.org/book_1ed/ch01.html

simple pipeline architecture for a spoken dialogue system

Page 5: Large scale nlp using python's nltk on azure

dialogue with a chatbot

Page 6: Large scale nlp using python's nltk on azure
Page 7: Large scale nlp using python's nltk on azure

identify language
tokenize & tag part of speech (pos)
identify named entities

Page 8: Large scale nlp using python's nltk on azure

corpora and lexical resources
corpus is a large body of text
lexical resource is a collection of words associated with additional information

Page 9: Large scale nlp using python's nltk on azure

e.g. brown corpus
first million-word electronic corpus of english, created in 1961 at brown university

Page 10: Large scale nlp using python's nltk on azure

segmentation
tokenize
tag part of speech (pos)
identify named entities

source: http://www.nltk.org/book_1ed/ch07.html

Page 11: Large scale nlp using python's nltk on azure

entity detection using chunking
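To make the idea of chunking concrete, here is a toy sketch that groups consecutive proper-noun tags into candidate entities. It is not how NLTK's chunker works internally. The tagged tokens are written by hand for illustration, and `chunk_proper_nouns` is a hypothetical helper name.

```python
# Hand-written (token, tag) pairs for illustration only; real tags would
# come from a part-of-speech tagger.
tagged = [("I", "PRP"), ("saw", "VBD"), ("Mr.", "NNP"),
          ("Washington", "NNP"), ("with", "IN"), ("a", "DT"), ("saw", "NN")]


def chunk_proper_nouns(tagged_tokens):
    """Group runs of consecutive NNP-tagged tokens into candidate entities."""
    chunks, current = [], []
    for token, tag in tagged_tokens:
        if tag == "NNP":
            current.append(token)
        elif current:
            # A non-NNP token closes the run collected so far.
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks


candidates = chunk_proper_nouns(tagged)
print(candidates)
```

A real named-entity chunker also assigns a type (person, location, organisation) to each chunk, which this sketch does not attempt.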

Page 12: Large scale nlp using python's nltk on azure

fundamentals of nlp

natural language toolkit (nltk)

running python and nltk on Azure

Page 13: Large scale nlp using python's nltk on azure

text as a sequence of words and punctuation represented as a list

    sent = ['I', 'love', 'Dublin', '!']
    upper_sent = [w.upper() for w in sent]
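Because a tokenized text is an ordinary Python list, the usual list tools apply. The sketch below uses only the standard library; the variable names are illustrative.

```python
import string

sent = ["I", "love", "Dublin", "!"]

# Drop tokens that consist only of punctuation characters.
words = [w for w in sent if not all(ch in string.punctuation for ch in w)]

# Normalise case, e.g. before counting word frequencies.
lowered = [w.lower() for w in words]

# Map each remaining word to its length.
lengths = {w: len(w) for w in words}

print(words, lowered, lengths)
```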

Page 14: Large scale nlp using python's nltk on azure

downloading corpus and lexical resources
    import nltk
    nltk.download('all')    # everything (large download)
    nltk.download('brown')  # or a single corpus

Page 15: Large scale nlp using python's nltk on azure

segment text into sentences
    from nltk.tokenize import sent_tokenize
    sent_tokenize_list = sent_tokenize(text)
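The "Mr. Washington" example shows why sentence segmentation is harder than splitting on punctuation. The standard-library sketch below is a deliberately naive splitter (`naive_sentence_split` is a hypothetical helper). It breaks after the abbreviation, which is the kind of error a trained segmenter is meant to avoid.

```python
import re

text = "I saw Mr. Washington. This is your saw... I told you!"


def naive_sentence_split(s):
    """Split after '.', '!' or '?' whenever whitespace follows."""
    return re.split(r"(?<=[.!?])\s+", s.strip())


naive = naive_sentence_split(text)
for piece in naive:
    print(repr(piece))
```

The first "sentence" comes out as 'I saw Mr.', so the name is cut in half.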

Page 16: Large scale nlp using python's nltk on azure

tokenize sentence
    from nltk.tokenize import word_tokenize
    tokens = word_tokenize(sentence)

Page 17: Large scale nlp using python's nltk on azure

tag part of speech (pos)
    import nltk
    tags = nltk.pos_tag(tokens)  # list of (token, tag) tuples

Page 18: Large scale nlp using python's nltk on azure
Page 19: Large scale nlp using python's nltk on azure

identify named entities
    import nltk
    entities = nltk.ne_chunk(tags)  # tags: output of nltk.pos_tag
    entities.draw()                 # opens a window showing the tree

Page 20: Large scale nlp using python's nltk on azure
Page 21: Large scale nlp using python's nltk on azure

demo

Page 22: Large scale nlp using python's nltk on azure

language recognition
    import langid
    lang = langid.classify(text)[0]

Page 23: Large scale nlp using python's nltk on azure

fundamentals of nlp

natural language toolkit (nltk)

running python and nltk on Azure

Page 24: Large scale nlp using python's nltk on azure

azure cloud services
azure webjobs
azure functions

Page 25: Large scale nlp using python's nltk on azure

azure cloud services & python
pip's requirements.txt
PowerShell scripts for setup and launch

Page 26: Large scale nlp using python's nltk on azure

azure webjobs & python
upload zip (inc. dependencies)
runs run.py (or the first py file it finds)

Page 27: Large scale nlp using python's nltk on azure

configuration settings
    import os
    key = os.environ["STORAGE_KEY"]

Page 28: Large scale nlp using python's nltk on azure

publish webjob
pip packages into site-packages
zip application (inc. dependent packages)
upload zip file
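The zip step can be scripted. The sketch below assumes the dependencies were already installed into a `site-packages` folder inside the application directory (for example with `pip install -r requirements.txt -t site-packages`). `zip_directory` is a hypothetical helper, and the upload itself is not shown.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Write every file under src_dir into zip_path, keeping relative paths."""
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                if os.path.abspath(full) == zip_abs:
                    continue  # never add the archive to itself
                zf.write(full, os.path.relpath(full, src_dir))
    return zip_path


# Example call (paths are placeholders):
# zip_directory("my_webjob", "my_webjob.zip")
```

Paths are stored relative to the application folder, so `run.py` ends up at the root of the archive.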

Page 29: Large scale nlp using python's nltk on azure

add package location to sys.path
    import os
    import sys
    p = os.path.join(os.getcwd(), "site-packages")
    sys.path.append(p)
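A slightly more defensive variant of the same idea, as a sketch: it avoids adding the folder twice and lets the base directory be passed in. `add_site_packages` is a hypothetical helper name.

```python
import os
import sys


def add_site_packages(base_dir=None, folder="site-packages"):
    """Append <base_dir>/<folder> to sys.path once and return the path."""
    base = base_dir if base_dir is not None else os.getcwd()
    path = os.path.join(base, folder)
    if path not in sys.path:
        sys.path.append(path)
    return path


packages_dir = add_site_packages()
print(packages_dir)
```

Imports of the bundled packages have to come after this call, otherwise Python will not find them.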

Page 30: Large scale nlp using python's nltk on azure

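downloading corpus
D:\\local\\AppData\\nltk_data
    import os
    import nltk
    # environment values are strings, so compare against text rather than True
    if os.getenv("DOWNLOAD", "true").lower() == "true":
        dest = os.environ["NLTK_DATA_DIR"]
        nltk.download('all', download_dir=dest)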
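Environment variables always arrive as strings, so a flag such as DOWNLOAD needs explicit parsing before it can be treated as a boolean. A minimal sketch, where `env_flag` and the accepted spellings are assumptions:

```python
import os


def env_flag(name, default=True, environ=None):
    """Interpret an environment variable as a boolean flag."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default  # variable not set: fall back to the default
    return raw.strip().lower() in ("1", "true", "yes", "on")


should_download = env_flag("DOWNLOAD", default=True)
print(should_download)
```

With this helper, setting DOWNLOAD to "false" in the app settings switches the download off, while leaving it unset keeps the default.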

Page 31: Large scale nlp using python's nltk on azure

using queues for communication
reads text from input queue
writes processed text into output queues
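The read-process-write pattern can be sketched with the standard library's in-memory `queue.Queue` standing in for the cloud queues. The Azure queue client is deliberately not shown, and `process_text` is a placeholder for the real NLTK pipeline.

```python
import queue


def process_text(text):
    """Placeholder for the NLP step: plain whitespace tokenization."""
    return text.split()


def drain(input_queue, output_queue):
    """Process every message currently waiting in input_queue."""
    processed = 0
    while True:
        try:
            text = input_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to do
        output_queue.put({"text": text, "tokens": process_text(text)})
        processed += 1
    return processed


inbox, outbox = queue.Queue(), queue.Queue()
inbox.put("I saw Mr. Washington with a saw!")
inbox.put("I love Dublin!")
handled = drain(inbox, outbox)
print(handled)
```

Because workers only share the queues, several instances of the same job can run side by side without coordinating with each other.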

Page 32: Large scale nlp using python's nltk on azure

auto scale
based on queue length
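In practice the scaling rule is configured on the platform rather than written in code, but the underlying arithmetic can be illustrated. In this sketch the function name, the messages-per-instance target and the instance limits are all hypothetical values.

```python
import math


def desired_instances(queue_length, messages_per_instance=100,
                      min_instances=1, max_instances=10):
    """Return how many workers a given backlog would call for."""
    if messages_per_instance <= 0:
        raise ValueError("messages_per_instance must be positive")
    needed = math.ceil(queue_length / messages_per_instance)
    # Clamp to the configured lower and upper bounds.
    return max(min_instances, min(max_instances, needed))


for backlog in (0, 250, 5000):
    print(backlog, desired_instances(backlog))
```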

Page 33: Large scale nlp using python's nltk on azure

debugging python webjobs
local: vs and webjob simulator
cloud: use kudu (xyz.scm.azurewebsites.net) and logs

Page 34: Large scale nlp using python's nltk on azure
Page 35: Large scale nlp using python's nltk on azure
Page 36: Large scale nlp using python's nltk on azure

demo

Page 37: Large scale nlp using python's nltk on azure

in closing…

Page 38: Large scale nlp using python's nltk on azure

nltk is a great toolkit to perform nlp tasks
azure provides an elastic and scalable platform to run python nltk jobs

Page 39: Large scale nlp using python's nltk on azure

http://www.nltk.org/
http://www.nltk.org/book_1ed

http://azure.com/