Introduction to Natural Language Processing
Steven Bird Ewan Klein Edward Loper
University of Melbourne, AUSTRALIA
University of Edinburgh, UK
University of Pennsylvania, USA
August 27, 2008
Knowledge and Communication in Language
• human knowledge, human communication, expressed in language
• language technologies: process human language automatically
  • handheld devices: predictive text, handwriting recognition
  • web search engines: access to information locked up in text
• two facets of the multilingual information society:
  • natural human-machine interfaces
  • access to stored information
Problem
• awash with language data
• inadequate tools (will this ever change?)
• overheads: Perl, Prolog, Java
• Natural Language Toolkit (NLTK) as a solution
NLTK: What you get...
• Book
• Documentation
• FAQ
• Installation instructions for Python, NLTK, data
• Distributions: Windows, Mac OS X, Unix, data, documentation
• CD-ROM: Python, NLTK, documentation, third-party libraries for numerical processing and visualization, instructions
• Mailing lists: nltk-announce, nltk-devel, nltk-users, nltk-portuguese
NLTK: Who it is for...
• people who want to learn how to:
  • write programs
  • to analyze written language
• does not presume programming abilities:
  • working examples
  • graded exercises
• experienced programmers:
  • quickly learn Python (if necessary)
  • Python features for NLP
  • NLP algorithms and data structures
NLTK: What you will learn...
1 how to analyze language data
2 key concepts from linguistic description and analysis
3 how linguistic knowledge is used in NLP components
4 data structures and algorithms used in NLP and linguistic data management
5 standard corpora and their use in formal evaluation
6 organization of the field of NLP
7 skills in Python programming for NLP
NLTK: Your likely goals...

| Goals | Background: Arts and Humanities | Background: Science and Engineering |
|---|---|---|
| Language Analysis | Programming to manage language data, explore linguistic models, and test empirical claims | Language as a source of interesting problems in data modeling, data mining, and knowledge discovery |
| Language Technology | Learning to program, with applications to familiar problems, to work in language technology or other technical field | Knowledge of linguistic algorithms and data structures for high quality, maintainable language processing software |
Philosophy
• practical
• programming
• principled
• pragmatic
• pleasurable
• portal
Structure
• Three parts:
  1 Basics: text processing, tokenization, tagging, lexicons, language engineering, text classification
  2 Parsing: phrase structure, trees, grammars, chunking, parsing
  3 Advanced Topics: selected topics in greater depth: feature-based grammar, unification, semantics, linguistic data management
• each part: chapter on programming; three chapters on NLP
• each chapter: motivation, sections, graded exercises, summary, further reading
Python: Key Features
• simple yet powerful, shallow learning curve
• object-oriented: encapsulation, re-use
• scripting language, facilitates interactive exploration
• excellent functionality for processing linguistic data
• extensive standard library, including graphics, web, numerical processing
• downloaded for free from http://www.python.org/
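
As a rough illustration of the "processing linguistic data" point, the sketch below counts word frequencies using only the Python standard library. It is not taken from the slides or from NLTK: the helper names `tokenize` and `top_words`, the sample sentence, and the crude letters-only tokenization rule are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as tokens.

    This is a deliberately crude rule (an assumption for illustration);
    real tokenizers handle punctuation, digits, and non-ASCII text.
    """
    return re.findall(r"[a-z]+", text.lower())


def top_words(text, n=3):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(n)


# Hypothetical sample text, not from any corpus.
sample = "The cat sat on the mat. The mat was flat."
print(top_words(sample, 2))  # [('the', 3), ('mat', 2)]
```

Typing the same lines at the interactive interpreter and varying `sample` or `n` gives the kind of exploratory workflow the "scripting language" bullet refers to.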
Python Example
    import sys

    # print every word on standard input that ends in 'ing'
    for line in sys.stdin.readlines():
        for word in line.split():
            if word.endswith('ing'):
                print(word)

1 whitespace: nesting lines of code; scope
2 object-oriented: attributes, methods (e.g. line)
3 readable
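To try the same logic interactively rather than on standard input, it can be wrapped in a function. This is a minimal sketch; the name `ing_words` and the sample sentences are introduced here for illustration and are not part of the original slides.

```python
def ing_words(lines):
    """Return every whitespace-separated word that ends in 'ing'."""
    found = []
    for line in lines:                    # each line is a string object
        for word in line.split():         # split() is a string method
            if word.endswith('ing'):      # so is endswith()
                found.append(word)
    return found


sample = ["the king was singing", "nothing rhymes with orange"]
print(ing_words(sample))  # ['king', 'singing', 'nothing']
```

Note that the suffix test is purely orthographic: "king" and "nothing" match even though neither is a verb form.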
Comparison with Perl
    while (<>) {
        foreach my $word (split) {
            if ($word =~ /ing$/) {
                print "$word\n";
            }
        }
    }

1 syntax is obscure: what are: <> $ my split ?
2 “it is quite easy in Perl to write programs that simply look like raving gibberish, even to experienced Perl programmers” (Hammond, Perl Programming for Linguists, 2003:47)
3 large programs difficult to maintain, reuse
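For a closer side-by-side comparison, the Perl pattern match `/ing$/` can also be expressed in Python with the standard `re` module. The sketch below is illustrative only; the names `ING` and `ing_words_re` are assumptions made for this example.

```python
import re

# Same pattern as the Perl version: 'ing' at the end of the word.
ING = re.compile(r'ing$')


def ing_words_re(text):
    """Return the words in text whose final letters are 'ing'."""
    return [word for word in text.split() if ING.search(word)]


print(ing_words_re("Running and jumping bring joy"))
# ['Running', 'jumping', 'bring']
```

For a fixed suffix, `str.endswith` is the simpler choice; the regular expression becomes worthwhile once the pattern is more complex than a literal string.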
What NLTK adds to Python
NLTK defines a basic infrastructure that can be used to build NLP programs in Python. It provides:
• Basic classes for representing data relevant to natural language processing
• Standard interfaces for performing tasks, such as tokenization, tagging, and parsing
• Standard implementations for each task, which can be combined to solve complex problems
• Demonstrations (parsers, chunkers, chatbots)
• Extensive documentation, including tutorials and reference documentation
NLTK Design: Requirements
1 simplicity: intuitive framework with substantial building blocks
2 consistency: uniform data structures, interfaces; predictability
3 extensibility: accommodates new components (replicate vs extend existing functionality)
4 modularity: interaction between components
5 well-documented: substantial documentation
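As a rough illustration of what consistency and modularity can mean in practice, the sketch below gives two components the same `tokenize` method so that calling code can swap one for the other. The classes `SpaceSplitter` and `PatternSplitter` and the helper `count_tokens` are hypothetical names invented for this example; they are not NLTK's actual classes or API.

```python
import re


class SpaceSplitter:
    """Hypothetical tokenizer: splits on whitespace only."""

    def tokenize(self, text):
        return text.split()


class PatternSplitter:
    """Hypothetical tokenizer: returns every match of a regular expression."""

    def __init__(self, pattern):
        self._regex = re.compile(pattern)

    def tokenize(self, text):
        return self._regex.findall(text)


def count_tokens(tokenizer, text):
    """Works with any object that offers a tokenize(text) method."""
    return len(tokenizer.tokenize(text))


text = "Hello, world!"
for tokenizer in (SpaceSplitter(), PatternSplitter(r"\w+|[^\w\s]")):
    print(type(tokenizer).__name__, tokenizer.tokenize(text))
```

Because `count_tokens` depends only on the shared method name, a new tokenizer can be added without changing any existing code, which is the extensibility point made on the slide.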
NLTK Design: Non-requirements
1 encyclopedic: has many gaps; opportunity for students to extend it
2 efficiency: not highly optimised for runtime performance
3 programming tricks: avoided in favour of clear implementations (replicate vs extend existing functionality)
Corpora Distributed with NLTK
• Australian ABC News, 2 genres, 660k words, sentence-segmented
• Brown Corpus, 15 genres, 1.15M words, tagged
• CMU Pronouncing Dictionary, 127k entries
• CoNLL 2000 Chunking Data, 270k words, tagged and chunked
• CoNLL 2002 Named Entity, 700k words, pos- and named-entity-tagged (Dutch, Spanish)
• Floresta Treebank, 9k sentences (Portuguese)
• Genesis Corpus, 6 texts, 200k words, 6 languages
• Gutenberg (sel), 14 texts, 1.7M words
• Indian POS-Tagged Corpus, 60k words pos-tagged (Bangla, Hindi, Marathi, Telugu)
• NIST 1999 Info Extr (sel), 63k words, newswire and named-entity SGML markup
• Names Corpus, 8k male and female names
• PP Attachment Corpus, 28k prepositional phrases, tagged as noun or verb modifiers
• Presidential Addresses, 485k words, formatted text
• Roget’s Thesaurus, 200k words, formatted text
• SEMCOR, 880k words, part-of-speech and sense tagged
• SENSEVAL 2, 600k words, part-of-speech and sense tagged
• Shakespeare XML Corpus (sel), 8 books
• Stopwords Corpus, 2,400 stopwords for 11 languages
• Switchboard Corpus (sel), 36 phonecalls, transcribed, parsed
• Univ Decl Human Rights, 480k words, 300+ languages
• US Pres Addr Corpus, 480k words
• Penn Treebank (sel), 40k words, tagged and parsed
• TIMIT Corpus (sel), audio files and transcripts for 16 speakers
• Wordlist Corpus, 960k words and 20k affixes for 8 languages
• WordNet, 145k synonym sets