sais svcc
![Page 1: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/1.jpg)
Simple Sentiment Analysis Using Solr
Silicon Valley Code Camp, Foothill College – Oct 6, 2013
By: Pradeep Pujari
![Page 2: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/2.jpg)
Sentiment Analysis?
Sentiment Analysis – General Architecture
Little Lucene
Sentiment Analysis and Solr
Applications of Sentiment Analysis
Code Walkthrough
Objectives
![Page 3: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/3.jpg)
Working mostly in Search domain
Search = IR + ML + NLP
Who am I?
Works for
![Page 4: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/4.jpg)
Contributing to SolrSherlock
- Open Source Project
Who am I?
http://solrsherlock.github.io/SolrSherlock/
![Page 5: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/5.jpg)
What is Sentiment Analysis?
A linguistic analysis technique that identifies opinion early in a piece of text.
The movie is great.
The movie stars Mr. X
The movie is horrible.
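A minimal sketch of how the three example sentences above could be told apart, assuming a tiny hand-made opinion lexicon. The class name `ToyPolarity` and both word lists are hypothetical and only for illustration; a real system would use a trained classifier or a proper sentiment lexicon.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy lexicon-based polarity labeling; the word lists are illustrative assumptions.
final class ToyPolarity {
    private static final Set<String> POSITIVE =
            new HashSet<>(Arrays.asList("great", "good", "excellent"));
    private static final Set<String> NEGATIVE =
            new HashSet<>(Arrays.asList("horrible", "bad", "awful"));

    // Returns "positive", "negative", or "neutral" from opinion-word counts.
    static String label(String sentence) {
        int score = 0;
        // Lowercase, then split on anything that is not a letter.
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(token)) score++;
            if (NEGATIVE.contains(token)) score--;
        }
        if (score > 0) return "positive";
        if (score < 0) return "negative";
        return "neutral";
    }

    public static void main(String[] args) {
        String[] examples = {
            "The movie is great.", "The movie stars Mr. X", "The movie is horrible."
        };
        for (String s : examples) {
            System.out.println(label(s) + " <- " + s);
        }
    }
}
```

The middle sentence states a fact and contains no opinion word, so this sketch labels it neutral.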
![Page 6: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/6.jpg)
Challenging
Too easy | Too hard
[Chart with axes labeled "Difficulty" and "misclassification"]
What is Sentiment Analysis?
![Page 7: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/7.jpg)
Sentiment Analysis
NLP
Cognitive Science
What is Sentiment Analysis?
![Page 8: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/8.jpg)
Humans can easily understand emotions.
Can a machine be trained to do it?
What is Sentiment Analysis?
![Page 9: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/9.jpg)
Sentiment analysis (SA) offers organizations the ability to monitor sentiment in real time and act accordingly
Marketing managers, PR firms, campaign managers, politicians, equity investors, and online shoppers are direct beneficiaries
http://www.tweetfeel.com
http://www.nytimes.com/interactive/us/politics/2010-twitter-candidates.html
Key Insights
![Page 10: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/10.jpg)
Generic Sentiment Analysis System
![Page 11: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/11.jpg)
Document-Level: supervised/unsupervised learning
Sentence-Level: supervised learning
Feature-Based Sentiment Analysis: all noun phrases (NPs) in the corpus and their polarity
Sentiment Lexicon Acquisition: WordNet
Complexity
![Page 12: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/12.jpg)
Open-source Java based search engine
Provides document indexing w/ arbitrary fields and fast search
Several relevance and ranking algorithms
Apache Lucene
![Page 13: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/13.jpg)
1. Create an index
2. Add ‘document’ representations of items
3. Construct queries
4. Ask for results (will be scored)
Using Lucene
![Page 14: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/14.jpg)
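// Lucene 4.x API
IndexWriterConfig config = /* configure */ ;
Directory dir = FSDirectory.open(indexFile);
IndexWriter w = new IndexWriter(dir, config);
for (ItemInfo item : getItems()) {
    Document doc = new Document();
    // TextField is tokenized and searchable; Store.YES keeps the original value retrievable
    doc.add(new TextField("title", item.title, Field.Store.YES));
    doc.add(new TextField("tags", item.tags, Field.Store.YES));
    w.addDocument(doc);
}
w.close();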
Building the Index
![Page 15: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/15.jpg)
IndexSearcher idx = getIndexSearcher();
IndexReader reader = idx.getIndexReader();
TopDocs results = idx.search(q, n + 1);
Finding Items
![Page 16: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/16.jpg)
PyLucene provides Python bindings for Lucene
Lucy is a C port with bindings for other languages
Lucene.NET is a port to .NET
Solr provides a search server (with a REST-style HTTP API) on top of Lucene
Related Projects
![Page 17: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/17.jpg)
Solr?
[Architecture diagram; component labels:]
HTTP Request Servlet
Admin Interface
Update Servlet
Standard Request Handler
Custom Request Handler
Response Writer
Solr Core
Lucene
Analysis
UIMA
Config
Caching
Update Handler
![Page 18: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/18.jpg)
Linguistics module: stems, lemmas and synonyms
Multi-language capability: CJKAnalyzer, UIMA analyzers
UIMA integration: UpdateProcessorChain
Why Solr?
![Page 19: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/19.jpg)
Why Solr?
Extract domain-specific entities and concepts
Time and Cost
Solr Set Up – 5 mins
UIMA Annotators - 5 days
Enrich text, write to dedicated field
![Page 20: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/20.jpg)
Tagging entities in review text
Applications:
I wasn't really in the market for another tablet, but my girlfriend ended up getting one for me so she got me on this one. I would like to say that this tablet reminds me of the first Motorola Droid smartphone that came out several years back. The phone jam packed a ton of bells & whistles into its hardware and software to give a lot of bang for your buck. This is what it feels like amazon has done with the Kindle Fire 8.9. They have put a lot of advanced hardware and innovative software, so for the average user, specially someone who absorbs a lot of media, you get a lot for the price. But just because you get a lot for the price, doesn't mean it is without its flaws.
![Page 21: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/21.jpg)
Applications:
Consumer feedback about products
Which product features are more relevant
Polarity
![Page 22: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/22.jpg)
Digital SLR with Full 1080p HD Video
There are many preprogrammed scene modes that make this a very easy camera to use. The picture quality is beyond belief, and even better for the price.
Price:
Use case
![Page 23: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/23.jpg)
Why UIMA?
UIMA Framework manages components and data flow – no coding
Deploy pipeline of analysis engines
AEs wrap NLP algorithms
[Diagram labels: Aggregate analysis engine; Language Detection; Sentence Annotator; POS Annotator; NER; Person, Place, Organization]
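The idea of an aggregate analysis engine, where several engines run in sequence over one shared document and each adds its own annotations, can be sketched in plain Java. This is not the UIMA API: `ToyPipeline`, `Span`, `Doc` and `Engine` are hypothetical names that only loosely mirror UIMA annotations, the CAS and analysis engines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of an aggregate analysis engine; these types are NOT the UIMA API.
final class ToyPipeline {
    // A typed span over the document text, loosely like a UIMA annotation.
    static final class Span {
        final String type;
        final int begin;
        final int end;
        Span(String type, int begin, int end) {
            this.type = type; this.begin = begin; this.end = end;
        }
    }

    // Shared container handed to every engine, loosely like a CAS.
    static final class Doc {
        final String text;
        final List<Span> spans = new ArrayList<>();
        Doc(String text) { this.text = text; }
        String covered(Span s) { return text.substring(s.begin, s.end); }
    }

    interface Engine { void process(Doc doc); }

    // Marks sentences by cutting at '.', '!' and '?' (naive: abbreviations would break it).
    static final class SentenceEngine implements Engine {
        public void process(Doc doc) {
            int start = 0;
            for (int i = 0; i < doc.text.length(); i++) {
                char c = doc.text.charAt(i);
                if (c == '.' || c == '!' || c == '?') {
                    doc.spans.add(new Span("Sentence", start, i + 1));
                    start = i + 1;
                    while (start < doc.text.length() && doc.text.charAt(start) == ' ') start++;
                }
            }
        }
    }

    // Marks every occurrence of a known term with the given annotation type.
    static final class DictionaryEngine implements Engine {
        private final String type;
        private final List<String> terms;
        DictionaryEngine(String type, String... terms) {
            this.type = type; this.terms = Arrays.asList(terms);
        }
        public void process(Doc doc) {
            for (String term : terms) {
                int at = doc.text.indexOf(term);
                while (at >= 0) {
                    doc.spans.add(new Span(type, at, at + term.length()));
                    at = doc.text.indexOf(term, at + term.length());
                }
            }
        }
    }

    // The "aggregate": runs the engines in order over one shared Doc.
    static Doc run(String text, Engine... engines) {
        Doc doc = new Doc(text);
        for (Engine e : engines) e.process(doc);
        return doc;
    }

    public static void main(String[] args) {
        Doc doc = run("Amazon makes the Kindle Fire. It ships today!",
                new SentenceEngine(), new DictionaryEngine("Organization", "Amazon"));
        for (Span s : doc.spans) System.out.println(s.type + ": " + doc.covered(s));
    }
}
```

In UIMA itself the ordering is declared in an aggregate descriptor rather than coded by hand, which is the "no coding" point on the slide.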
![Page 24: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/24.jpg)
Index
Lucene
Solr UpdateRequestProcessor
Solr
QParser Data
Solr+UIMA
UIMA AE
![Page 25: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/25.jpg)
NLP+UIMA
Use POS in query understanding
Boosting terms
Synonym expansion
Extract concepts/entities
Faceting using entities
Identify places in query and use spatial queries
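As a rough illustration of synonym expansion combined with term boosting, the sketch below rewrites a user query into Lucene query-parser syntax (`OR`, `AND`, `term^boost`). The class `ToyQueryExpander`, the synonym table and the boost value 2.0 are assumptions made for the example; special characters in terms are not escaped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: expands query terms with synonyms and boosts the original term.
final class ToyQueryExpander {
    private final Map<String, List<String>> synonyms = new HashMap<>();

    void addSynonyms(String term, String... alternatives) {
        synonyms.put(term, Arrays.asList(alternatives));
    }

    // Terms with synonyms become "(term^2.0 OR syn1 OR syn2)"; others pass through unchanged.
    String expand(String query) {
        List<String> clauses = new ArrayList<>();
        for (String term : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            List<String> alts = synonyms.get(term);
            if (alts == null) {
                clauses.add(term);
                continue;
            }
            // The user's own word is boosted above its synonyms (2.0 is an arbitrary choice).
            StringBuilder group = new StringBuilder("(" + term + "^2.0");
            for (String alt : alts) group.append(" OR ").append(alt);
            clauses.add(group.append(")").toString());
        }
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        ToyQueryExpander expander = new ToyQueryExpander();
        expander.addSynonyms("camera", "dslr");
        System.out.println(expander.expand("Camera cheap"));
    }
}
```

In a Solr deployment this kind of expansion is more commonly configured through a synonym filter in the analysis chain; building the string by hand here just makes the resulting query visible.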
![Page 26: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/26.jpg)
Ideas: Sentiment Analysis App
Identify Subjective Sentences from text
Remove noisy sentences – regex, conditional probability
Graph min cut – LingPipe
Subjectivity Lexicons
Discard Facts and Objective Sentences
![Page 27: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/27.jpg)
Subjectivity detector
Subjective
Objective
Polarity Classifier
Ideas: Sentiment Analysis App
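The two-stage flow (subjectivity detector first, polarity classifier second) might look like the following sketch. It assumes a tiny hand-made word list as both the subjectivity cue and the polarity lexicon; `ToySentimentApp` and its lists are hypothetical, and a real detector would more likely be a trained classifier such as the LingPipe min-cut approach mentioned in the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Two-stage sketch: drop objective sentences, then score polarity on the rest.
// The word lists are illustrative assumptions, not a real subjectivity lexicon.
final class ToySentimentApp {
    private static final Set<String> POSITIVE =
            new HashSet<>(Arrays.asList("great", "easy", "better", "love"));
    private static final Set<String> NEGATIVE =
            new HashSet<>(Arrays.asList("horrible", "flaws", "worse", "hate"));

    private static String[] tokens(String sentence) {
        return sentence.toLowerCase(Locale.ROOT).split("[^a-z]+");
    }

    // Stage 1: a sentence counts as subjective if it contains any lexicon word.
    static boolean isSubjective(String sentence) {
        for (String t : tokens(sentence)) {
            if (POSITIVE.contains(t) || NEGATIVE.contains(t)) return true;
        }
        return false;
    }

    // Stage 2: positive hits minus negative hits.
    static int polarity(String sentence) {
        int score = 0;
        for (String t : tokens(sentence)) {
            if (POSITIVE.contains(t)) score++;
            if (NEGATIVE.contains(t)) score--;
        }
        return score;
    }

    // Splits after sentence-ending punctuation and keeps only subjective sentences.
    static List<String> subjectiveSentences(String text) {
        List<String> kept = new ArrayList<>();
        for (String s : text.split("(?<=[.!?])\\s+")) {
            if (isSubjective(s)) kept.add(s);
        }
        return kept;
    }

    // Overall review score: sum of polarity over the subjective sentences.
    static int score(String text) {
        int total = 0;
        for (String s : subjectiveSentences(text)) total += polarity(s);
        return total;
    }

    public static void main(String[] args) {
        String review = "Digital SLR with Full 1080p HD Video. "
                + "There are many preprogrammed scene modes that make this a very easy camera to use. "
                + "The picture quality is beyond belief, and even better for the price.";
        System.out.println(subjectiveSentences(review));
        System.out.println("score = " + score(review));
    }
}
```

With the camera review from the use-case slide, the product title is discarded as objective and the two opinion sentences are kept.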
![Page 28: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/28.jpg)
Sentiment intensity: SentiWordNet
WordNet-Affect: WordNet + annotated concepts
Ideas: Sentiment Analysis App
Hybrid model with an added dictionary
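One possible reading of "intensity plus an added dictionary" is a general lexicon of weighted words whose entries can be overridden by a domain dictionary. The sketch below assumes that reading; `ToyIntensityScorer` is a hypothetical class and the weights are made up for the example, not taken from SentiWordNet.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Intensity-weighted scoring where a domain dictionary overrides a general lexicon.
// All weights are invented for illustration; they are not SentiWordNet values.
final class ToyIntensityScorer {
    private final Map<String, Double> general = new HashMap<>();
    private final Map<String, Double> domain = new HashMap<>();

    void addGeneral(String word, double weight) { general.put(word, weight); }
    void addDomain(String word, double weight) { domain.put(word, weight); }

    // Average weight of all matched words, in [-1, 1]; 0.0 when nothing matches.
    double score(String text) {
        double sum = 0.0;
        int hits = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            // The domain dictionary takes precedence over the general lexicon.
            Double weight = domain.containsKey(token) ? domain.get(token) : general.get(token);
            if (weight != null) {
                sum += weight;
                hits++;
            }
        }
        return hits == 0 ? 0.0 : sum / hits;
    }

    public static void main(String[] args) {
        ToyIntensityScorer scorer = new ToyIntensityScorer();
        scorer.addGeneral("great", 0.75);
        scorer.addGeneral("cheap", -0.25);   // mildly negative in general usage (assumption)
        System.out.println(scorer.score("Great and cheap"));
        scorer.addDomain("cheap", 0.5);      // positive in product reviews (assumption)
        System.out.println(scorer.score("Great and cheap"));
    }
}
```

The override shows why a domain dictionary helps: the same word can carry a different polarity in product reviews than in general text.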
![Page 29: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/29.jpg)
Update Handler with processor chain
Remove Duplicates processor
Logging processor
Custom Transform processor
Index processor
Update Processor Chain
Text Analyzers
Lucene
Lucene Index
Sentence Detection processor
Sentiment Classifier
Company Name Annotator
Sentiment Score processor
Product Reviews
![Page 30: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/30.jpg)
Let’s look at the code
![Page 31: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/31.jpg)
Data transformation or post-processing
UpdateRequestProcessorFactory: LogUpdateProcessorFactory, UIMAUpdateRequestProcessorFactory
UpdateRequestProcessorChain: pipeline of UpdateRequestProcessors
UpdateRequestProcessor
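The chain pattern itself can be sketched without Solr on the classpath: each processor does its work on the incoming document and then hands it to the next one. The classes below are hypothetical stand-ins, not Solr's `UpdateRequestProcessor` API; only the field names `content_text` and `sentiment_type` are borrowed from the configuration slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java sketch of a processor chain; these classes are NOT Solr's API.
final class ToyUpdateChain {
    // One stage in the chain; the default behaviour forwards the document unchanged.
    static abstract class Processor {
        private final Processor next;
        Processor(Processor next) { this.next = next; }
        void processAdd(Map<String, Object> doc) {
            if (next != null) next.processAdd(doc);
        }
    }

    // Enriches the document with a sentiment_type field derived from content_text (toy rule).
    static final class SentimentProcessor extends Processor {
        SentimentProcessor(Processor next) { super(next); }
        @Override
        void processAdd(Map<String, Object> doc) {
            String text = String.valueOf(doc.get("content_text")).toLowerCase(Locale.ROOT);
            String type = text.contains("great") ? "positive"
                    : text.contains("horrible") ? "negative" : "neutral";
            doc.put("sentiment_type", type);
            super.processAdd(doc); // hand over to the next stage
        }
    }

    // Stands in for the final stage that would write to the index.
    static final class IndexProcessor extends Processor {
        final List<Map<String, Object>> index = new ArrayList<>();
        IndexProcessor() { super(null); }
        @Override
        void processAdd(Map<String, Object> doc) { index.add(doc); }
    }

    public static void main(String[] args) {
        IndexProcessor indexer = new IndexProcessor();
        Processor chain = new SentimentProcessor(indexer);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("content_text", "The movie is great.");
        chain.processAdd(doc);
        System.out.println(indexer.index);
    }
}
```

Because enrichment happens before the indexing stage, the computed field is stored alongside the original text and can later be searched or faceted on.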
![Page 32: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/32.jpg)
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" >
<lst name="defaults">
<!-- "update.chain" replaced the older "update.processor" parameter name in Solr 3.2 and later -->
<str name="update.chain">uima</str>
</lst>
</requestHandler>
How to configure
![Page 33: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/33.jpg)
Stanford NER
Additional Libraries
![Page 34: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/34.jpg)
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
      </lst>
      <lst name="analysisEngine">
        <str name="defaultanalysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
      </lst>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content_text</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.DictionaryEntry</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">sentiment_keyword,sentiment_type</str>
          </lst>
        </lst>
![Page 35: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/35.jpg)
![Page 36: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/36.jpg)
References
http://lucene.apache.org/solr/
http://uima.apache.org/
http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html
http://openie.cs.washington.edu/
http://wiki.apache.org/solr/SolrUIMA
![Page 37: Sais svcc](https://reader034.vdocument.in/reader034/viewer/2022052522/554e8bd5b4c90573338b4a8c/html5/thumbnails/37.jpg)
Questions?