TRANSCRIPT
CS 679: Advanced NLP
Lecture #1: Introduction to Text Mining
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
Objectives for Today
1. Quick course info
2. Overview of Text Mining
3. Discuss your applications of Text Mining
4. Elements of Text Mining
5. Introduce course objectives
Course Info
Office Hours: Tue & Thu. 3-4pm (without appointment) OR by appointment
TA: TBD
Web page: https://facwiki.cs.byu.edu/cs679
  Syllabus
  Regularly updated schedule: Due dates, Reading assignments, Projects guidelines, Lecture Notes
Google Group “BYU CS 679”
Email: ringger AT cs DOT byu DOT edu
Grades: http://gradebook.byu.edu
Assignments
Readings – with max. one page reports
  Mostly research papers (see course web page for all hyperlinks)
  Usually one reading report per week
Intro. Projects: Presentation, Report
Semester Project: Proposal, Presentation, Report
Course Policies
Early Late Grades Other
See Syllabus for details
Text Mining
The process of discovering previously unknown information in large text collections
Paraphrased from M. Hearst
Other Definitions
Looking for patterns in unstructured text (Nahm)
Text mining applies the same analytical functions of data mining to the domain of textual information (Doore)
“Search” versus “Discover”
                           Search (goal-oriented)    Discover (opportunistic)
Structured Data            Data Retrieval            Data Mining
Unstructured Data (Text)   Information Retrieval     Text Mining
Credit: adapted from slide by Nathan Treloar, AvaQuest
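The contrast between goal-oriented search and opportunistic discovery can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy corpus `DOCS` and the helper names `search` and `discover` are hypothetical and do not come from the lecture, and counting word co-occurrence is just one simple stand-in for "discovering" patterns.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus (not from the lecture).
DOCS = [
    "stock prices fall as oil prices rise",
    "oil prices rise after supply cut",
    "team wins final after late goal",
    "late goal decides cup final",
]

def search(docs, query):
    """Goal-oriented: indices of documents containing every query term."""
    terms = set(query.lower().split())
    return [i for i, d in enumerate(docs) if terms <= set(d.lower().split())]

def discover(docs, min_count=2):
    """Opportunistic: word pairs that co-occur in at least min_count documents."""
    pairs = Counter()
    for d in docs:
        # Sort the distinct words so each pair has one canonical ordering.
        words = sorted(set(d.lower().split()))
        pairs.update(combinations(words, 2))
    return {p: c for p, c in pairs.items() if c >= min_count}

if __name__ == "__main__":
    print(search(DOCS, "oil prices"))  # the user already knows what to ask for
    print(discover(DOCS))              # patterns nobody asked for in advance
```

In `search` the user supplies the question; in `discover` the recurring pairs emerge from the collection itself, which is the sense of "previously unknown information" in the definition above.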
Your Exciting Applications
F2011: Your Exciting Applications
W2011: Exciting Applications
2010: Exciting Applications
2009: Exciting Applications
Additional Applications
News Mining
Sentiment Detection
Summarization
Trend Analysis
Association Detection
Course Objectives
Acquire experience conducting exploratory data analysis on large collections of text
Gain in-depth experience with and understanding of approaches to:
  document classification
  sentiment classification
  feature engineering
  feature selection
  document clustering
  unsupervised topic identification
  visualization, including document summarization
Build a foundation of techniques for approximate Bayesian reasoning for unsupervised text analysis
Course Objectives (2)
Obtain experience with techniques for evaluating and visualizing the results of unsupervised learning processes
Independent investigation of methods of your choice!
Application of your methods to learn something important from a significant text corpus of your choice
Simplistic Text Mining Process
Credit: NCSA
Methods
Feature Engineering
Feature Selection
Information Extraction
Categorization (Supervised)
Clustering (Unsupervised)
Topic Identification / Topic Modeling
Visualization
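As a rough illustration of the first two methods in this list, the sketch below builds bag-of-words features and then keeps only terms that appear in at least `min_df` documents. It is a minimal example, not the course's prescribed approach: the sample `REVIEWS`, the function names, and the document-frequency threshold are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample documents (not from the lecture).
REVIEWS = [
    "the movie was great great fun",
    "the plot was dull",
    "great cast and great plot",
]

def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]

def bag_of_words(docs):
    """Feature engineering: one term-count mapping per document."""
    return [Counter(tokenize(d)) for d in docs]

def select_features(bags, min_df=2):
    """Feature selection: keep terms found in at least min_df documents."""
    df = Counter()
    for bag in bags:
        df.update(bag.keys())  # count each term once per document
    return sorted(t for t, n in df.items() if n >= min_df)

def vectorize(bags, vocab):
    """Turn each bag into a fixed-length count vector over the vocabulary."""
    return [[bag[t] for t in vocab] for bag in bags]

if __name__ == "__main__":
    bags = bag_of_words(REVIEWS)
    vocab = select_features(bags)
    print(vocab)
    print(vectorize(bags, vocab))
```

The resulting count vectors are the kind of input that categorization and clustering methods typically consume; other selection criteria (for example, mutual information) could replace the document-frequency cutoff.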
Some Available Data Sets
20 Newsgroups -- Usenet
Reuters (1990s) newswire
Del.icio.us bookmarked web pages
Enron Email
Movie Reviews
Gamespot game reviews
General Conference
State of the Union
Campaign Speeches
…
Yours!
Assignment
Reading for next time:
  Course Syllabus
  "Tapping the Power of Text Mining" by Fan et al. (CACM 9/2006)
  "Text-Mining the Voice of the People" by Evangelopoulos et al. (CACM 2/2012)
  Skim: Alta Plana Text Analytics Report
Reading Report #1: % Completed, Questions