September 5, 2001 Melanie Martin - AI Seminar 1
AI Seminar
Our web page is at: www.cs.nmsu.edu/~gradrep
Under “Events” in left frame
Identifying Ideological Point of View
Part II
Melanie Martin
September 5, 2001
Outline of this presentation
– Where are we???
– Ideology
– Statistical NLP and Machine Learning
– Discourse features
– Internet
– Conclusion
Where are we???
Let’s recall what we want to do:
Build a system that could take information from web pages and Usenet newsgroups on a given topic and segment, classify or cluster it by ideological point of view…
The Proposed System
[Diagram; box labels: User inputs topic; Search Engine; Internet: Web pages, Usenet; Set of documents on topic; Topic Clustering, Filtering; Ideological Clustering; Docs on topic clustered by IPV]
Where are we???
What do we need?
– A computationally feasible definition of ideological point of view
– A search engine, possibly with additional processing, to produce a collection of documents on the topic specified by the user
Where are we???
What else do we need?
– A module to cluster documents by ideological point of view
– A user interface
– A way to evaluate the system
Where are we???
Why do we need this? Some examples using Google:
– query: back pain ~2,220,000
  • scoliosis ~121,000
– query: lyme disease ~163,000
– query: zoning shopping center ~65,100
  • (add) clark county nv ~299
– query: un racism conference ~74,000
Ideology
Working definition from van Dijk: “Ideologies are the fundamental beliefs of a group and its members.”
– instantiated as Us vs. Them
– predefined ideologies will not work across domains
– want to avoid researcher bias
– definition likely needs more work
Ideology
Linguistics
– van Dijk (1998)
– Blommaert & Verschueren (1998)
– Wang (1993)
– Wortham & Locher (1996)
Ideology
The Systems
– Ideology Machine - 1965 to 1973 - Abelson et al.
– Politics - 1979 - Carbonell
– Pauline - 1987 - Hovy
– Tracking Point of View in Narrative - 1994 - Wiebe
– Spin Doctor - 1994 - Sack
– Terminal Time - 2000 - Mateas et al.
Ideology
Some issues
– Evaluation!!!
– Hard-coded knowledge
– Domain dependence
– Cognitive plausibility
– More precise definitions
Statistical NLP and ML
Two techniques we will consider
– Latent Semantic Analysis
– Probabilistic Classification
Statistical NLP and ML
Issues
– clustering versus classification
  • categories may not be predefined
  • may want to take a variety of features into account
– favor learning over hard-coding knowledge
– supervised versus unsupervised
  • cost of annotated training data
Statistical NLP and ML
Latent Semantic Analysis
– text represented as a matrix
  • entries are weighted frequency of word in context
– semantic space obtained through SVD
  • words appearing in similar context have similar feature vectors
– characterizes semantic content of words in context
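The LSA steps above can be sketched in a few lines. This is a minimal illustration and not code from the talk: the five toy documents, the use of raw counts in place of a weighted frequency, the choice of k = 2 dimensions, and the helper name `cosine` are all assumptions, and NumPy's standard SVD stands in for whatever implementation a real system would use.

```python
import numpy as np

# Toy corpus (hypothetical): three documents on budgets, two on nature.
docs = [
    "tax cut budget",
    "tax budget deficit",
    "tax cut deficit",
    "forest wildlife habitat",
    "wildlife habitat river",
]

vocab = sorted({w for d in docs for w in d.split()})
row = {w: i for i, w in enumerate(vocab)}

# Term-by-document matrix. Raw counts are used here for brevity;
# LSA normally applies a weighting to each entry first.
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[row[w], j] += 1.0

# Singular value decomposition, then keep the k largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (Vt[:k, :] * s[:k, None]).T   # one k-dimensional row per document
term_vecs = U[:, :k] * s[:k]             # one k-dimensional row per term

def cosine(u, v):
    """Cosine similarity between two vectors in the reduced space."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

same_topic = cosine(doc_vecs[0], doc_vecs[1])
cross_topic = cosine(doc_vecs[0], doc_vecs[3])
```

In the reduced space, documents 0 and 1 (both about budgets) end up much closer to each other than document 0 is to document 3, which is the property a clustering step would rely on.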
Statistical NLP and ML
Why LSA is a good choice here
– semantics is a key component of ideological discourse
– clustering without need for predefined categories
– already shown useful for:
  • summarization (Ando 2000)
  • text segmentation (Choi 2001)
  • measuring text coherence (Foltz 1998)
Statistical NLP and ML
We want to look a little more closely at Ando’s work
– uses term, sentence, and document vectors
– modified SVD algorithm
– interesting interface
Multi-document summarization by visualizing topical content. Rie Kubota Ando, Branimir Boguraev, Roy Byrd, and Mary Neff. ANLP/NAACL '00 Workshop on Automatic Summarization
Statistical NLP and ML
Another option is a probabilistic classifier
– assigns most probable class to an object based on a probability model
– can we get around predefined classes?
Statistical NLP and ML
Probability model
– defines joint distribution of variables
  • set of feature variables and a class variable
Wiebe and Bruce (1995) got around the issue of not knowing the classes in advance by breaking up the problem and using a series of classifiers
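To make "assign the most probable class under a probability model" concrete, here is a minimal sketch. The slides do not name a specific model, so a Bernoulli naive Bayes classifier with add-one smoothing is assumed here purely for illustration; the feature names (`us_pos`, `them_neg`, and so on), the class labels, and the training examples are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count classes and per-class feature occurrences.

    examples: list of (set_of_feature_names, class_label) pairs.
    """
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for features, label in examples:
        class_counts[label] += 1
        for f in features:
            feature_counts[label][f] += 1
            vocabulary.add(f)
    return class_counts, feature_counts, vocabulary

def classify(model, features):
    """Return the class with the highest joint log-probability."""
    class_counts, feature_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n in class_counts.items():
        logp = math.log(n / total)  # class prior
        for f in vocabulary:
            # Smoothed estimate of P(feature present | class).
            p = (feature_counts[label][f] + 1) / (n + 2)
            logp += math.log(p if f in features else 1.0 - p)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Hypothetical annotated documents, each reduced to a set of binary features.
training = [
    ({"us_pos", "them_neg"}, "group_1"),
    ({"us_pos", "disclaimer"}, "group_1"),
    ({"them_pos"}, "group_2"),
    ({"them_pos", "us_neg"}, "group_2"),
]
model = train(training)
predicted = classify(model, {"us_pos"})
```

Note that this sketch still needs predefined classes and annotated training data, which is exactly the cost the slides raise.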
Statistical NLP and ML
We need to come up with a set of features…our next topic
Which features to use can then be decided statistically, using the goodness of fit of graphical models
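As a rough illustration of a goodness-of-fit test, the sketch below computes the likelihood-ratio statistic G² for a model in which a feature and the class variable are independent. The slides do not say which statistic or which family of models is meant, so the choice of G², the function name `g_squared`, and the two count tables are assumptions.

```python
import math

def g_squared(table):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E)) for a
    two-way contingency table, where E is the expected count under
    independence of rows (feature values) and columns (classes)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            if observed == 0:
                continue  # a zero cell contributes nothing to the sum
            expected = row_totals[i] * col_totals[j] / n
            g2 += 2.0 * observed * math.log(observed / expected)
    return g2

# Hypothetical counts: rows = feature present / absent, columns = two classes.
informative = g_squared([[30, 10], [10, 30]])
uninformative = g_squared([[20, 20], [20, 20]])
```

A large value means the independence model fits poorly, so the feature carries information about the class. For a 2 x 2 table the statistic is compared against a chi-square distribution with one degree of freedom, whose 0.05 critical value is about 3.84.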
Statistical NLP and ML
Both methods seem to have a lot of potential
LSA would be easier to implement
– possibly a baseline for evaluation of probabilistic classifiers
Less linguistic knowledge gain likely with LSA
Discourse features
If we use probabilistic classifiers we need features, so we look at:
– linguistics
– previous systems
– discourse theory
– literary theory
Discourse features
From linguistics and discourse:
General strategy of most ideological discourse (van Dijk’s Ideological Square):
– Emphasize positive things about Us
– Emphasize negative things about Them
– De-emphasize negative things about Us
– De-emphasize positive things about Them
Discourse features
How are these strategies instantiated in discourse? (van Dijk)
– What is there:
  • argument structure
  • syntactic patterns
  • style and non-literal language
  • actor descriptions
  • thematic structure
  • topoi (standardized topics)
Discourse features
– What is not there
  • implication
  • presupposition
  • inference
  • goals and plans
Discourse features
Disclaimers, selected examples:
– Apparent Negation: I have nothing against X, but...
– Apparent Concession: They may be very smart, but...
– Apparent Empathy: They may have had problems, but...
– Apparent Effort: We do everything we can, but...
Positive self-representation and face keeping
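One way such disclaimers might be turned into binary features for a classifier is with surface patterns. The sketch below is only an illustration under strong assumptions: the regular expressions are built directly from the four example phrasings above, the feature names are hypothetical, and real text would need far more general patterns.

```python
import re

# Each pattern: an opening phrase, then "but" before the sentence ends.
# The phrases are taken from the slide's examples; the names are made up.
DISCLAIMER_PATTERNS = {
    "apparent_negation": re.compile(
        r"\bI have nothing against\b[^.?!]*\bbut\b", re.IGNORECASE),
    "apparent_concession": re.compile(
        r"\bthey may be\b[^.?!]*\bbut\b", re.IGNORECASE),
    "apparent_empathy": re.compile(
        r"\bthey may have had\b[^.?!]*\bbut\b", re.IGNORECASE),
    "apparent_effort": re.compile(
        r"\bwe do everything we can\b[^.?!]*\bbut\b", re.IGNORECASE),
}

def disclaimer_features(text):
    """Map each disclaimer type to True if its pattern occurs in the text."""
    return {name: bool(pattern.search(text))
            for name, pattern in DISCLAIMER_PATTERNS.items()}

features = disclaimer_features(
    "I have nothing against them, but they should follow the rules.")
```

The resulting dictionary can be fed to a classifier alongside other lexical and discourse features.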
Discourse features
Some discourse theories from Computational Linguistics
– Mann & Thompson (RST) (1988)
– Grosz & Sidner (G&S) (1986)
– Morris & Hirst (Lexical chains) (1991)
Discourse features
Issues
– implementation
  • G&S, RST
– finite number of fixed primitives
  • RST
– domain specific
  • RST depends on training
Discourse features
A reasonable first approach: Lexical Chains (Morris & Hirst)
Sequences of related words spanning a topical unit in the text
– based on lexical cohesion
– encapsulates context
– helps identify key phrases
![Page 33: AI Seminar](https://reader035.vdocument.in/reader035/viewer/2022070502/56814af3550346895db80529/html5/thumbnails/33.jpg)
September 5, 2001 Melanie Martin - AI Seminar 33
Discourse features
Idea of Algorithm
– read next word
  • if candidate
    – check chains within suitable span
      » check thesaurus or WordNet
      » check other knowledge sources
    – if found
      » include in chain
      » recalculate chain
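The loop above can be sketched as follows. This is a simplified illustration and not Morris & Hirst's algorithm as published: a small hand-written table (`RELATED`) stands in for the thesaurus or WordNet lookup, the stop-word list decides what counts as a candidate, the span is measured in word positions, and the "recalculate chain" step is reduced to appending the word. All names and data are hypothetical.

```python
# Hypothetical stand-in for a thesaurus or WordNet lookup.
RELATED = {
    "tax": {"budget", "bill"},
    "forest": {"wildlife"},
}
STOPWORDS = {"the", "while", "and", "its"}

def related(a, b):
    """True if the two words are identical or linked in the toy thesaurus."""
    return a == b or b in RELATED.get(a, set()) or a in RELATED.get(b, set())

def build_chains(words, span=5):
    """Group candidate words into lexical chains.

    Each chain is a list of (position, word) pairs. A word joins the first
    chain whose most recent member is at most `span` positions back and
    that contains a related word; otherwise it starts a new chain.
    """
    chains = []
    for pos, word in enumerate(words):
        if word in STOPWORDS:          # not a candidate
            continue
        for chain in chains:
            last_pos = chain[-1][0]
            if pos - last_pos <= span and any(
                    related(word, member) for _, member in chain):
                chain.append((pos, word))
                break
        else:
            chains.append([(pos, word)])
    return chains

text = "the tax bill cuts the budget while the forest and its wildlife suffer"
chains = build_chains(text.split())
chain_words = [[w for _, w in chain] for chain in chains]
```

On this sentence the sketch yields one chain for the budget vocabulary (tax, bill, budget) and one for the nature vocabulary (forest, wildlife), with the unrelated words left in singleton chains.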
Discourse features
Lexical chains could help us in:
– topic segmentation
– intentional structure
– lexical features for a classifier
Discourse features
Lexical chains are easy to implement, but are unlikely to be sufficient…
For the next approximation: RST
– Marcu’s implementation incorporating G&S
– Mostly used for summarization and generation
– Would help get at the argument structure of the text
Discourse features
RST Basics
– about 23 rhetorical relations
  • account for discourse coherence
  • link adjacent spans of text
– 5 schemas
  • defined in terms of relations
  • specify how spans can co-occur
– nucleus and satellite spans
– end up with tree structure
Discourse features
Would most likely use RST to generate features for a classifier or as input to a pattern recognizer
Nucleus spans help pick out the more important segments of text
Produces a tree that gives the rhetorical structure of the text
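A rhetorical structure tree and the use of nuclei to rank segments can be sketched with a small data structure. This is an assumption-laden simplification: every relation is treated as binary with one nucleus and one satellite (multinuclear relations are ignored), the class and function names are hypothetical, and the example tree is hand-built and not produced by a parser.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple

@dataclass
class Span:
    """A node in a simplified rhetorical structure tree.

    Leaves carry text; internal nodes carry a relation name plus a
    nucleus child and a satellite child.
    """
    text: Optional[str] = None
    relation: Optional[str] = None
    nucleus: Optional["Span"] = None
    satellite: Optional["Span"] = None

def leaves_by_salience(node: Span, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (satellite_depth, text) for each leaf.

    satellite_depth counts how many satellite links lie between the root
    and the leaf; 0 marks the most central segment.
    """
    if node.text is not None:
        yield depth, node.text
        return
    yield from leaves_by_salience(node.nucleus, depth)
    yield from leaves_by_salience(node.satellite, depth + 1)

# Hand-built example using two relation names from RST.
tree = Span(
    relation="Concession",
    nucleus=Span(
        relation="Evidence",
        nucleus=Span(text="they are wrong about this"),
        satellite=Span(text="the numbers show it"),
    ),
    satellite=Span(text="They may be very smart"),
)
ranked = list(leaves_by_salience(tree))
```

The satellite depth, the relation names along the path, or the text of the depth-0 segments could then serve as features for a classifier or as input to a pattern recognizer.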
Internet
We would like to mine the structure of the internet
– see if there is a correspondence with groups
– improved IR by topic
– figure out what search engine to use as a base for our system
Internet
Issues
– topic or query disambiguation
– what is a minimal unit
– how to use the structure of the web
  • finding authorities
  • communities and subgraphs
– Evaluation!!!
Internet
Kleinberg (1997)
– link based model
– hub - links to many related authorities
– authority
– iterative weighting algorithm that converges (rapidly in practice)
– can disambiguate authorities by sense
– can be used to trawl for cyber communities
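The iterative weighting can be sketched as below. This is a minimal version of the hub and authority updates on a tiny made-up link graph; the fixed iteration count, the page names, and the function name `hits` are assumptions, and the construction of the query-specific subgraph that precedes this step is omitted.

```python
import math

def hits(links, iterations=50):
    """Compute hub and authority scores for a directed link graph.

    links: dict mapping each page to the list of pages it links to.
    Returns (hub, authority), two dicts of scores with unit Euclidean norm.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of authority scores of the pages linked to.
        hub = {n: sum(auth[t] for t in links.get(n, ())) for n in nodes}
        # Normalize both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

# Hypothetical graph: three hub-like pages pointing at two candidate authorities.
links = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
    "a1": [],
    "a2": [],
}
hub, auth = hits(links)
top_authority = max(auth, key=auth.get)
```

Here `a1` receives the highest authority score because all three hubs point to it, and `h1` and `h2` outrank `h3` as hubs because they point to more of the well-supported pages.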
Conclusion
It seems that such a system can be built
– find a good search engine
– use Kleinberg’s algorithm to improve collection of documents retrieved
– use LSA and/or a probabilistic classifier to handle the ideological point of view
– with a probabilistic classifier use linguistic and discourse features
– develop evaluation methodology
The End
Thanks for listening!
If you want to know more, my Comprehensive Exam paper is at:
www.CS.NMSU.Edu/~mmartin/courses/comps_all.html