CS 679: Advanced NLP Lecture #1: Introduction to Text Mining This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License .

Upload: tyler-obrien

Post on 27-Dec-2015

CS 679: Advanced NLP

Lecture #1: Introduction to Text Mining

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

Objectives for Today

1. Quick course info
2. Overview of Text Mining
3. Discuss your applications of Text Mining
4. Elements of Text Mining
5. Introduce course objectives

Course Info.

Office Hours: Tue & Thu. 3-4pm (without appointment) or by appointment
TA: TBD
Web page: https://facwiki.cs.byu.edu/cs679
  Syllabus
  Regularly updated schedule: due dates, reading assignments, project guidelines, lecture notes
Google Group: “BYU CS 679”
Email: ringger AT cs DOT byu DOT edu
Grades: http://gradebook.byu.edu

Assignments

Readings – with max. one-page reports
  Mostly research papers (see course web page for all hyperlinks)
  Usually one reading report per week
Intro. Projects
  Presentation
  Report
Semester Project
  Proposal
  Presentation
  Report

Course Policies

Early
Late
Grades
Other

See Syllabus for details

Text Mining

The process of discovering previously unknown information in large text collections

Paraphrased from M. Hearst

Other Definitions

Looking for patterns in unstructured text (Nahm)

Text mining applies the same analytical functions of data mining to the domain of textual information (Doore)

“Search” versus “Discover”

                            Search (goal-oriented)    Discover (opportunistic)
Structured Data             Data Retrieval            Data Mining
Unstructured Data (Text)    Information Retrieval     Text Mining

Credit: adapted from slide by Nathan Treloar, AvaQuest
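
The contrast between goal-oriented search and opportunistic discovery on unstructured text can be sketched in a few lines. This is only a toy illustration, not material from the lecture: the four documents are invented, and the helper names `search` and `discover` are hypothetical. Search answers a question the user already has (which documents mention these terms?), while discovery surfaces patterns nobody asked for (here, word pairs that co-occur in several documents).

```python
from collections import Counter
from itertools import combinations

# Toy corpus: hypothetical documents invented for this illustration.
DOCS = [
    "stock prices fall as oil prices rise",
    "oil prices rise after supply cut",
    "team wins final after late goal",
    "late goal decides final",
]


def search(docs, query):
    """Goal-oriented: indices of documents that contain every query term."""
    terms = set(query.lower().split())
    return [i for i, doc in enumerate(docs) if terms <= set(doc.lower().split())]


def discover(docs, min_docs=2):
    """Opportunistic: word pairs that co-occur in at least min_docs documents."""
    pair_counts = Counter()
    for doc in docs:
        # Sort the distinct words so each pair has one canonical ordering.
        words = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(words, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_docs}


hits = search(DOCS, "oil prices")
patterns = discover(DOCS)
print("documents matching the query:", hits)
print("recurring word pairs:", sorted(patterns))
```

Raw co-occurrence counting is about the simplest possible stand-in for "discovery"; real systems would at least remove stop words and use an association measure rather than raw counts.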

Your Exciting Applications

F2011: Your Exciting Applications

W2011: Exciting Applications

2010: Exciting Applications

2009: Exciting Applications

Additional Applications

News Mining
Sentiment Detection
Summarization
Trend Analysis
Association Detection

Course Objectives

Acquire experience conducting exploratory data analysis on large collections of text

Gain in-depth experience with and understanding of approaches to
  document classification
  sentiment classification
  feature engineering
  feature selection
  document clustering
  unsupervised topic identification
  visualization, including document summarization

Build a foundation of techniques for approximate Bayesian reasoning for unsupervised text analysis

Course Objectives (2)

Obtain experience with techniques for evaluating and visualizing the results of unsupervised learning processes

Independent investigation of methods of your choice!

Application of your methods to learn something important from a significant text corpus of your choice
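
As one concrete example of evaluating the result of an unsupervised process, the sketch below computes cluster purity: each cluster is credited with its most common gold label, and the credited documents are divided by the total. Purity is chosen here only as a simple illustration; the slides do not say which evaluation measures the course uses, and the cluster assignments and labels below are made up.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, gold_labels):
    """Fraction of documents that carry the majority gold label of their cluster."""
    if len(cluster_ids) != len(gold_labels):
        raise ValueError("cluster_ids and gold_labels must have the same length")
    if not cluster_ids:
        raise ValueError("need at least one document")
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, gold_labels):
        by_cluster[cluster][label] += 1
    # Credit each cluster with the size of its largest label group.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical clustering of six documents into two clusters.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["sports", "sports", "finance", "finance", "finance", "sports"]
score = purity(clusters, labels)
print(f"purity = {score:.3f}")
```

Purity is easy to interpret but rewards having many small clusters (one cluster per document scores 1.0), so it is usually reported alongside other measures.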

Simplistic Text Mining Process

Credit: NCSA

Methods

Feature Engineering
Feature Selection
Information Extraction
Categorization (Supervised)
Clustering (Unsupervised)
Topic Identification / Topic Modeling
Visualization
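
To make three of the methods listed above concrete, here is a minimal sketch that chains feature engineering (bag-of-words counts), feature selection (a document-frequency filter), and supervised categorization. The classifier is a nearest-centroid rule with cosine similarity, picked only because it is short; it is not presented as the course's method. The training sentences, labels, thresholds, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical labeled documents, invented for this illustration.
TRAIN = [
    ("the match ended with a late goal", "sports"),
    ("the team scored a goal in the match", "sports"),
    ("the market fell as oil prices rose", "finance"),
    ("oil prices lifted the market", "finance"),
]


def featurize(text):
    """Feature engineering: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())


def select_features(vectors, min_df=2, max_df=3):
    """Feature selection: keep terms whose document frequency is in [min_df, max_df]."""
    df = Counter(term for vec in vectors for term in vec)
    return {term for term, n in df.items() if min_df <= n <= max_df}


def project(vec, vocab):
    """Drop every term that did not survive feature selection."""
    return Counter({t: c for t, c in vec.items() if t in vocab})


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * b.get(t, 0) for t, c in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def train(examples, vocab):
    """Categorization (supervised): sum the projected vectors of each class."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(project(featurize(text), vocab))
    return centroids


def classify(text, centroids, vocab):
    """Assign the label whose centroid is most similar to the document."""
    vec = project(featurize(text), vocab)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


vocab = select_features([featurize(text) for text, _ in TRAIN])
centroids = train(TRAIN, vocab)
print(sorted(vocab))
print(classify("oil and the market", centroids, vocab))
print(classify("a goal", centroids, vocab))
```

The `max_df` cutoff plays the role of a crude stop-word filter: "the" appears in all four training documents and is dropped, while rarer terms that appear in only one document are dropped by `min_df`.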

Some Available Data Sets

20 Newsgroups -- Usenet
Reuters (1990s) newswire
Del.icio.us bookmarked web pages
Enron Email
Movie Reviews
Gamespot game reviews
General Conference
State of the Union
Campaign Speeches

… Yours!

Assignment

Reading for next time:
  Course Syllabus
  "Tapping the Power of Text Mining" by Fan et al. (CACM 9/2006)
  "Text-Mining the Voice of the People" by Evangelopoulos et al. (CACM 2/2012)
  Skim: Alta Plana Text Analytics Report

Reading Report #1
  % Completed
  Questions