
Deep Learning for Sentiment Analysis: A Survey

Lei Zhang, LinkedIn Corporation, [email protected]
Shuai Wang, University of Illinois at Chicago, [email protected]
Bing Liu, University of Illinois at Chicago, [email protected]

Abstract

Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with its success in many other application domains, deep learning has also been popularly used in sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.

INTRODUCTION

Sentiment analysis, or opinion mining, is the computational study of people's opinions, sentiments, emotions, appraisals, and attitudes towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes.1 The inception and rapid growth of the field coincide with those of social media on the Web, for example, reviews, forum discussions, blogs, micro-blogs, Twitter, and social networks, because for the first time in human history, we have a huge volume of opinionated data recorded in digital form. Since the early 2000s, sentiment analysis has grown to be one of the most active research areas in natural language processing (NLP). It is also widely studied in data mining, Web mining, text mining, and information retrieval. In fact, it has spread from computer science to management sciences and social sciences such as marketing, finance, political science, communications, health science, and even history, due to its importance to business and society as a whole. This proliferation is due to the fact that opinions are central to almost all human activities and are key influencers of our behaviours.
Our beliefs and perceptions of reality, and the choices we make, are, to a considerable degree, conditioned upon how others see and evaluate the world. For this reason, whenever we need to make a decision, we often seek out the opinions of others. This is true not only for individuals but also for organizations.

Nowadays, if one wants to buy a consumer product, one is no longer limited to asking one's friends and family for opinions, because there are many user reviews and discussions about the product in public forums on the Web. For an organization, it may no longer be necessary to conduct surveys, opinion polls, and focus groups in order to gather public opinions, because there is an abundance of such information publicly available. In recent years, we have witnessed opinionated postings in social media help reshape businesses and sway public sentiments and emotions, which has profoundly impacted our social and political systems. Such postings have also mobilized masses for political changes, such as those that happened in some Arab countries in 2011. It has thus become a necessity to collect and study opinions.1

However, finding and monitoring opinion sites on the Web and distilling the information contained in them remains a formidable task because of the proliferation of diverse sites. Each site typically contains a huge volume of opinion text that is not always easily deciphered in long blogs and forum postings. The average human reader will have difficulty identifying relevant sites and extracting and summarizing the opinions in them. Automated sentiment analysis systems are thus needed. Because of this, there are many start-ups focusing on providing sentiment analysis services, and many big corporations have also built their own in-house capabilities. These practical applications and industrial interests have provided strong motivations for research in sentiment analysis.



Existing research has produced numerous techniques for the various tasks of sentiment analysis, including both supervised and unsupervised methods. In the supervised setting, early papers used all types of supervised machine learning methods (such as Support Vector Machines (SVM), Maximum Entropy, and Naïve Bayes) and feature combinations. Unsupervised methods include various approaches that exploit sentiment lexicons, grammatical analysis, and syntactic patterns. Several survey books and papers have been published that cover those early methods and applications extensively.1,2,3

Over the past decade, deep learning has emerged as a powerful machine learning technique4 and produced state-of-the-art results in many application domains, ranging from computer vision and speech recognition to NLP. Applying deep learning to sentiment analysis has also become very popular recently. This paper first gives an overview of deep learning and then provides a comprehensive survey of the sentiment analysis research based on deep learning.

NEURAL NETWORKS

Deep learning is the application of artificial neural networks (neural networks for short) to learning tasks using networks of multiple layers. It exploits much more of the learning (representation) power of neural networks, which were once deemed practical only with one or two layers and a small amount of data.

Inspired by the structure of the biological brain, neural networks consist of a large number of information-processing units (called neurons) organized in layers, which work in unison. Such a network can learn to perform tasks (e.g., classification) by adjusting the connection weights between neurons, resembling the learning process of a biological brain.

Figure 1: Feedforward neural network

Based on network topologies, neural networks can generally be categorized into feedforward neural networks and recurrent/recursive neural networks, which can also be mixed and matched. We will describe recurrent/recursive neural networks later. A simple example of a feedforward neural network is given in Figure 1, which consists of three layers L_1, L_2, and L_3. L_1 is the input layer, which corresponds to the input vector (x_1, x_2, x_3) and the intercept term +1. L_3 is the output layer, which corresponds to the output vector (s_1). L_2 is the hidden layer, whose output is not visible as a network output. A circle in L_1 represents an element of the input vector, while a circle in L_2 or L_3 represents a neuron, the basic computation element of a neural network; we also call it an activation function. A line between two neurons represents a connection for the flow of information. Each connection is associated with a weight, a value controlling the signal between the two neurons. The learning of a neural network is achieved by adjusting the weights between neurons with the


information flowing through them. Neurons read output from neurons in the previous layer, process the information, and then generate output to neurons in the next layer. As in Figure 1, the neural network alters its weights based on training examples (x^(i), y^(i)). After the training process, it obtains a complex form of hypothesis h_{W,b}(x) that fits the data.

Diving into the hidden layer, we can see that each neuron in L_2 takes the inputs x_1, x_2, x_3 and the intercept +1 from L_1, and outputs the value f(W^T x) = f(Σ_{i=1}^{3} W_i x_i + b) through the activation function f. The W_i are the weights of the connections; b is the intercept or bias; f is normally non-linear. The common choices of f are the sigmoid function, the hyperbolic tangent function (tanh), and the rectified linear function (ReLU). Their equations are as follows.

f(W^T x) = sigmoid(W^T x) = 1 / (1 + exp(-W^T x))    (1)

f(W^T x) = tanh(W^T x) = (e^{W^T x} - e^{-W^T x}) / (e^{W^T x} + e^{-W^T x})    (2)

f(W^T x) = ReLU(W^T x) = max(0, W^T x)    (3)

The sigmoid function takes a real-valued number and squashes it to a value in the range between 0 and 1. The function has been in frequent use historically due to its nice interpretation as the firing rate of a neuron: 0 for not firing and 1 for firing. But the sigmoid non-linearity has recently fallen out of favour because its activations can easily saturate at either tail of 0 or 1, where gradients are almost zero and the information flow would be cut. Moreover, its output is not zero-centered, which can introduce undesirable zig-zagging dynamics in the gradient updates for the connection weights during training. Thus, the tanh function is often preferred in practice, as its output range, [-1, 1], is zero-centered, unlike [0, 1]. The ReLU function has also become popular lately. Its activation is simply thresholded at zero when the input is less than 0. Compared with the sigmoid and tanh functions, ReLU is easy to compute, fast to converge in training, and yields equal or better performance in neural networks.5
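The three activation functions above can be sketched directly in NumPy (a minimal illustration; the sample inputs are invented for demonstration):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real value into (0, 1); 0.5 at z = 0.
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Zero-centered squashing into (-1, 1).
    return np.tanh(z)

def relu(z):
    # Thresholds at zero; cheap to compute and does not saturate for z > 0.
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # all values in (0, 1)
print(tanh(z))     # all values in (-1, 1)
print(relu(z))     # [0. 0. 2.]
```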

In L_3, we can use the softmax function as the output neuron. It is a generalization of the logistic function that squashes a K-dimensional vector X of arbitrary real values to a K-dimensional vector σ(X) of real values in the range (0, 1) that add up to 1. The function definition is as follows.

σ(X)_j = e^{X_j} / Σ_{k=1}^{K} e^{X_k}    for j = 1, …, K    (4)

Generally, softmax is used in the final layer of feedforward neural networks for the final classification.
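Equation (4) is a one-liner in NumPy. This sketch subtracts the maximum before exponentiating, a standard numerical-stability trick that leaves the result unchanged (softmax is invariant to adding a constant to all inputs); the example scores are invented:

```python
import numpy as np

def softmax(x):
    # Shift by the max so exp() cannot overflow; the ratio is unaffected.
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # entries in (0, 1), largest score gets largest probability
print(probs.sum())  # 1.0
```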

By connecting all neurons together, the neural network in Figure 1 has parameters (W, b) = (W^(1), b^(1), W^(2), b^(2)), where W^(l)_{ij} denotes the weight associated with the connection between neuron j in layer l and neuron i in layer l+1, and b^(l)_i is the bias associated with neuron i in layer l+1.

To train a neural network, stochastic gradient descent via backpropagation6 is usually employed to minimize the cross-entropy loss, a loss function for softmax output. Gradients of the loss function with respect to the weights from the last hidden layer to the output layer are calculated first, and then gradients with respect to the weights of the preceding layers are calculated recursively by applying the chain rule in a backward manner. With those gradients, the weights between layers are adjusted accordingly. It is an iterative refinement process that continues until certain stopping criteria are met. The pseudocode for training the neural network in Figure 1 is as follows.


Training algorithm: stochastic gradient descent via backpropagation

initialize the weights W and biases b of the neural network N with random values
do
    for each training example (x_i, y_i)
        p_i = neural-network-prediction(N, x_i)
        calculate the gradients of the loss function (p_i, y_i) with respect to w_2 at layer L_3
        get Δw_2 for all weights from hidden layer L_2 to output layer L_3
        calculate the gradients with respect to w_1 by the chain rule at layer L_2
        get Δw_1 for all weights from input layer L_1 to hidden layer L_2
        update (w_1, w_2)
until all training examples are classified correctly or other stopping criteria are met
return the trained neural network

Table 1: Training the neural network in Figure 1.

The above algorithm can be extended to generic feedforward neural network training with multiple hidden layers. Note that stochastic gradient descent updates the parameters for every training example, as opposed to the whole set of training examples as in batch gradient descent. Therefore, the parameter updates have high variance and cause the loss function to fluctuate with varying intensity, which helps discover new and possibly better local minima.
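The training loop of Table 1 can be sketched for a tiny two-layer network. This is an illustrative toy, not code from the survey: the XOR task, layer sizes, learning rate, and mean-squared-error loss (standing in for the cross-entropy loss discussed above) are all assumed choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, a task a linear model cannot fit.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer (2 -> 4 -> 1), weights initialized randomly.
W1 = rng.normal(0, 1.0, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1.0, (4, 1)); b2 = np.zeros(1)
lr = 1.0

def forward(x):
    h = sigmoid(x @ W1 + b1)          # hidden layer L2
    return h, sigmoid(h @ W2 + b2)    # output layer L3

def loss():
    _, p = forward(X)
    return float(np.mean((p - y) ** 2))

before = loss()
for epoch in range(2000):
    for i in rng.permutation(4):      # stochastic: one example at a time
        x, t = X[i:i+1], y[i:i+1]
        h, p = forward(x)
        # Backpropagation: output layer first, then chain rule back to layer 1.
        dp = (p - t) * p * (1 - p)
        dW2 = h.T @ dp; db2 = dp[0]
        dh = dp @ W2.T * h * (1 - h)
        dW1 = x.T @ dh; db1 = dh[0]
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1
print(before, "->", loss())  # the loss should drop after training
```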

DEEP LEARNING

The research community lost interest in neural networks in the late 1990s, mainly because they were regarded as practical only for "shallow" neural networks (neural networks with one or two layers), since training a "deep" neural network (a neural network with more layers) is complicated and computationally very expensive. However, in the past 10 years, deep learning has made breakthroughs and produced state-of-the-art results in many application domains, starting with computer vision, then speech recognition, and, more recently, NLP.7,8 The renaissance of neural networks can be attributed to many factors. The most important ones include: (1) the availability of computing power due to advances in hardware (e.g., GPUs), (2) the availability of huge amounts of training data, and (3) the power and flexibility of learning intermediate representations.9

In a nutshell, deep learning uses a cascade of multiple layers of nonlinear processing units for feature extraction and transformation. The lower layers close to the data input learn simple features, while higher layers learn more complex features derived from the lower-layer features. The architecture forms a hierarchical and powerful feature representation. Figure 2 shows the feature hierarchy, from the left (a lower layer) to the right (a higher layer), learned by deep learning in face image classification.10 We can see that the learned image features grow in complexity, starting from blobs/edges, then noses/eyes/cheeks, to faces.

Figure 2: Feature hierarchy by deep learning


In recent years, deep learning models have been extensively applied in the field of NLP and have shown great potential. In the following several sections, we briefly describe the main deep learning architectures and related techniques that have been applied to NLP tasks.

WORD EMBEDDING

Many deep learning models in NLP need word embeddings as input features.7 Word embedding is a technique for language modelling and feature learning, which transforms the words in a vocabulary into vectors of continuous real numbers (e.g., the word "hat" → (…, 0.15, …, 0.23, …, 0.41, …)). The technique normally involves a mathematical embedding from a high-dimensional sparse vector space (e.g., a one-hot encoding vector space, in which each word takes one dimension) to a lower-dimensional dense vector space. Each dimension of the embedding vector represents a latent feature of a word. The vectors may encode linguistic regularities and patterns.

The learning of word embeddings can be done using neural networks11-15 or matrix factorization.16,17 One commonly used word embedding system is Word2Vec,i which is essentially a computationally efficient neural network prediction model that learns word embeddings from text. It contains the Continuous Bag-of-Words model (CBOW)13 and the Skip-Gram model (SG)14. The CBOW model predicts the target word (e.g., "wearing") from its context words ("the boy is _ a hat", where "_" denotes the target word), while the SG model does the inverse, predicting the context words given the target word. Statistically, the CBOW model smooths over a great deal of distributional information by treating the entire context as one observation; it is effective for smaller datasets. The SG model, in contrast, treats each context-target pair as a new observation and is better for larger datasets. Another frequently used learning approach is Global Vectors (GloVe),ii which is trained on the non-zero entries of a global word-word co-occurrence matrix.17
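A minimal skip-gram trainer can be sketched in NumPy. This is a toy illustration only: real Word2Vec relies on efficiency tricks such as negative sampling or hierarchical softmax, whereas this sketch uses a full softmax, and the corpus, window size, dimensions, and learning rate are all invented:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy corpus; in practice Word2Vec trains on billions of tokens.
corpus = "the boy is wearing a hat the girl is wearing a coat".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 5               # vocabulary size, embedding dimension

W_in = rng.normal(0, 0.1, (V, D))  # input (target) embeddings
W_out = rng.normal(0, 0.1, (D, V)) # output (context) weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Skip-gram pairs: (target, context) within a window of 1.
pairs = [(idx[corpus[t]], idx[corpus[c]])
         for t in range(len(corpus))
         for c in (t - 1, t + 1) if 0 <= c < len(corpus)]

lr = 0.5
for _ in range(300):
    for t, c in pairs:
        v = W_in[t].copy()                # embedding of the target word
        p = softmax(v @ W_out)            # predicted context distribution
        grad = p.copy(); grad[c] -= 1.0   # d(cross-entropy)/d(logits)
        W_in[t] -= lr * (W_out @ grad)
        W_out -= lr * np.outer(v, grad)

# After training, each row of W_in is a dense word vector.
print(W_in[idx["hat"]].shape)  # (5,)
```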

AUTOENCODER AND DENOISING AUTOENCODER

An autoencoder neural network is a three-layer neural network which sets the target values to be equal to the input values. Figure 3 shows an example of an autoencoder architecture.

Figure 3: Autoencoder neural network

i Source code: https://code.google.com/archive/p/word2vec/
ii Source code: https://github.com/stanfordnlp/GloVe


Given an input vector x ∈ [0,1]^d, the autoencoder first maps it to a hidden representation y ∈ [0,1]^d' by an encoder function h(·) (e.g., the sigmoid function). The latent representation y is then mapped back by a decoder function g(·) into a reconstruction r(x) = g(h(x)). The autoencoder is typically trained to minimize a form of reconstruction error loss(x, r(x)). The objective of the autoencoder is to learn a representation of the input, namely the activation of the hidden layer. Due to the nonlinear functions h(·) and g(·), the autoencoder is able to learn non-linear representations, which give it much more expressive power than its linear counterparts, such as Principal Component Analysis (PCA) and Latent Semantic Analysis (LSA).

Autoencoders are often stacked into layers, with a higher-level autoencoder using the output of the lower one as its training data. Stacked autoencoders18, along with Restricted Boltzmann Machines (RBMs)19, are among the earliest approaches to building deep neural networks. Once a stack of autoencoders has been trained in an unsupervised fashion, their parameters, which describe multiple levels of representation for x (intermediate representations), can be used to initialize a supervised deep neural network, which has been shown empirically to work better than random parameter initialization.

The Denoising Autoencoder (DAE)20 is an extension of the autoencoder in which the input vector x is stochastically corrupted into a vector x̃, and the model is trained to denoise it, that is, to minimize a denoising reconstruction error loss(x, r(x̃)). The idea behind the DAE is to force the hidden layer to discover more robust features and to prevent it from simply learning the identity. A robust model should be able to reconstruct the input well even in the presence of noise. For example, deleting or adding a few words from or to a document should not change its semantics.
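The encode/corrupt/decode pipeline of a DAE can be sketched as a forward pass with untrained weights (no training loop; the dimensions, masking-style corruption, and squared-error loss are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d, d_hidden = 8, 3                      # compress 8 inputs to 3 latent units
W_enc = rng.normal(0, 0.5, (d, d_hidden)); b_enc = np.zeros(d_hidden)
W_dec = rng.normal(0, 0.5, (d_hidden, d)); b_dec = np.zeros(d)

def encode(x):
    return sigmoid(x @ W_enc + b_enc)   # h(x): latent representation y

def decode(y):
    return sigmoid(y @ W_dec + b_dec)   # g(y): reconstruction r

def corrupt(x, p=0.25):
    # Denoising variant: randomly zero out a fraction p of the inputs.
    mask = rng.random(x.shape) > p
    return x * mask

x = rng.integers(0, 2, d).astype(float)  # a toy binary input vector
x_tilde = corrupt(x)
r = decode(encode(x_tilde))
# Reconstruction error is measured against the *clean* input, as in a DAE.
loss = float(np.mean((x - r) ** 2))
print(x_tilde.shape, r.shape, loss)
```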

CONVOLUTIONAL NEURAL NETWORK

The Convolutional Neural Network (CNN) is a special type of feedforward neural network originally employed in the field of computer vision. Its design is inspired by the human visual cortex, a visual mechanism in the animal brain. The visual cortex contains many cells that are responsible for detecting light in small and overlapping sub-regions of the visual field, called receptive fields. These cells act as local filters over the input space. A CNN consists of multiple convolutional layers, each of which performs the function that is carried out by the cells in the visual cortex.

Figure 4 shows a CNN for recognizing traffic signs.21 The input is a 32x32x1 pixel image (32 x 32 is the image width x height; 1 is the number of input channels). In the first stage, a filter (of size 5x5x1) is used to scan the image. Each region in the input image that the filter projects onto is a receptive field. The filter is actually an array of numbers (called weights or parameters). As the filter slides (or convolves), it multiplies its weight values with the original pixel values of the image (element-wise multiplications). The multiplications are summed up into a single number, which is representative of the receptive field. Every receptive field produces one number. After the filter finishes scanning the image, we get an array (of size 28x28x1), called an activation map or feature map. In a CNN, we need to use different filters to scan the input. In Figure 4, we apply 108 kinds of filters and thus have 108 stacked feature maps in the first stage, which make up the first convolutional layer. Following the convolutional layer, a subsampling (or pooling) layer is usually used to progressively reduce the spatial size of the representation, and thus to reduce the number of features and the computational complexity of the network. For example, after subsampling in the first stage, the output of the convolutional layer is reduced to dimensions 14x14x108. Note that while the dimensionality of each feature map is reduced, the subsampling step retains the most important information; a commonly used subsampling operation is max pooling. Afterwards, the output from the first stage becomes the input to the second stage, and new filters are employed. The new filter size is 5x5x108, where 108 is the number of feature maps of the previous layer. After the second stage, the CNN uses a fully connected layer and then a softmax readout layer with the output classes for classification.
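The convolution and max-pooling steps can be sketched in NumPy for a single channel and a single filter. The shapes follow the 32x32 input and 5x5 filter described above; the image content and filter values are invented:

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over the image; each dot product summarizes one
    # receptive field, producing a feature map ("valid" padding, stride 1).
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # Keep the maximum in each size x size block, halving each dimension.
    H, W = fmap.shape
    out = np.empty((H // size, W // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

image = np.arange(32 * 32, dtype=float).reshape(32, 32)  # stand-in 32x32x1 image
kernel = np.ones((5, 5)) / 25.0                          # one 5x5 averaging filter
fmap = conv2d_valid(image, kernel)
print(fmap.shape)            # (28, 28), matching the text
print(max_pool(fmap).shape)  # (14, 14) after 2x2 subsampling
```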


Convolutional layers in a CNN play the role of a feature extractor that extracts local features, as they restrict the receptive fields of the hidden layers to be local. That is, a CNN exploits spatially-local correlation by enforcing a local connectivity pattern between neurons of adjacent layers. Such a characteristic is useful for classification in NLP, in which we expect to find strong local clues regarding class membership, but these clues can appear in different places in the input. For example, in a document classification task, a single key phrase (or an n-gram) can help in determining the topic of the document. We would like to learn that certain sequences of words are good indicators of the topic, without necessarily caring where they appear in the document. Convolutional and pooling layers allow the CNN to learn to find such local indicators, regardless of their positions.8

Figure 4: Convolutional neural network

RECURRENT NEURAL NETWORK

The Recurrent Neural Network (RNN)22 is a class of neural networks whose connections between neurons form a directed cycle. Unlike feedforward neural networks, an RNN can use its internal "memory" to process a sequence of inputs, which makes it popular for processing sequential information. The "memory" means that the RNN performs the same task for every element of a sequence, with each output being dependent on all previous computations; this is like "remembering" information about what has been processed so far.

Figure 5: Recurrent neural network

Figure 5 shows an example of an RNN. The left graph is the folded network with a cycle, while the right graph is the unfolded sequence network with three time steps. The number of time steps is determined by the length of the input. For example, if the word sequence to be processed is a sentence of six words, the RNN would be unfolded into a neural network with six time steps or layers, one layer corresponding to each word.


In Figure 5, x_t is the input vector at time step t, and h_t is the hidden state at time step t, which is calculated based on the previous hidden state and the input at the current time step.

h_t = f(W_hh h_{t-1} + W_xh x_t)    (5)

In Equation (5), the activation function f is usually the tanh function or the ReLU function. W_xh is the weight matrix used to condition the input x_t, and W_hh is the weight matrix used to condition the previous hidden state h_{t-1}.

y_t is the output probability distribution over the vocabulary at step t. For example, if we want to predict the next word in a sentence, y_t would be a vector of probabilities across the word vocabulary.

y_t = softmax(W_hy h_t)    (6)

The hidden state h_t is regarded as the memory of the network. It captures information about what happened in all previous time steps. The output y_t is calculated solely based on the memory h_t at time t and the corresponding weight matrix W_hy.

Unlike a feedforward neural network, which uses different parameters at each layer, an RNN shares the same parameters (W_xh, W_hh, W_hy) across all steps. This means that it performs the same task at each step, just with different inputs, which greatly reduces the total number of parameters to be learned.

Theoretically, an RNN can make use of the information in arbitrarily long sequences, but in practice, the standard RNN is limited to looking back only a few steps due to the vanishing gradient and exploding gradient problems.23
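Equations (5) and (6) amount to a few lines per time step. A forward-only NumPy sketch with invented toy dimensions and random, untrained weights, which also shows the parameter sharing across steps:

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_h, V = 4, 5, 6   # toy input size, hidden size, vocabulary size

Wx = rng.normal(0, 0.1, (d_h, d_in))  # W_xh: conditions the input x_t
Wh = rng.normal(0, 0.1, (d_h, d_h))   # W_hh: conditions the state h_{t-1}
Wy = rng.normal(0, 0.1, (V, d_h))     # W_hy: maps the state to output scores

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def rnn_step(h_prev, x_t):
    # Equation (5): h_t = f(W_hh h_{t-1} + W_xh x_t), with f = tanh.
    h_t = np.tanh(Wh @ h_prev + Wx @ x_t)
    # Equation (6): y_t = softmax(W_hy h_t).
    y_t = softmax(Wy @ h_t)
    return h_t, y_t

# The same weights are reused at every time step (parameter sharing).
h = np.zeros(d_h)
for x_t in rng.normal(0, 1, (3, d_in)):  # a sequence of three inputs
    h, y = rnn_step(h, x_t)
print(y.shape, float(y.sum()))           # a distribution over the vocabulary
```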

Figure 6: Bidirectional RNN (left) and deep bidirectional RNN (right)

Researchers have developed more sophisticated types of RNN to deal with the shortcomings of the standard RNN model: the bidirectional RNN, the deep bidirectional RNN, and the Long Short-Term Memory network. The bidirectional RNN is based on the idea that the output at each time step may depend not only on the previous elements in the sequence but also on the next elements. For instance, to predict a missing word in a sequence, we may need to look at both the left and the right context. A bidirectional RNN24 consists of two RNNs stacked on top of each other: one processes the input in its original order, and the other processes the reversed input sequence. The output is then computed based on the hidden states of both RNNs. A deep bidirectional RNN is similar to a bidirectional RNN; the only difference is that it has multiple layers per time step,


which provides higher learning capacity but needs a lot of training data. Figure 6 shows examples of a bidirectional RNN and a deep bidirectional RNN (with two layers), respectively.

LSTM NETWORK

The Long Short-Term Memory network (LSTM)25 is a special type of RNN capable of learning long-term dependencies.

All RNNs have the form of a chain of repeating modules. In standard RNNs, the repeating module normally has a simple structure, whereas the repeating module of an LSTM is more complicated. Instead of a single neural network layer, it has four layers that interact in a special way. Besides, it has two states: the hidden state and the cell state.

Figure 7: Long Short-Term Memory network

Figure 7 shows an example of an LSTM. At time step t, the LSTM first decides what information to dump from the cell state. This decision is made by a sigmoid function/layer σ called the "forget gate". The function takes h_{t-1} (the output from the previous hidden layer) and x_t (the current input), and outputs a number in [0, 1], where 1 means "completely keep" and 0 means "completely dump", as in Equation (7).

f_t = σ(W_f x_t + U_f h_{t-1})    (7)

The LSTM then decides what new information to store in the cell state. This has two steps. First, a sigmoid function/layer, called the "input gate" as in Equation (8), decides which values the LSTM will update. Next, a tanh function/layer creates a vector of new candidate values C̃_t, as in Equation (9), which could be added to the cell state. The LSTM combines these two to create an update to the state.

i_t = σ(W_i x_t + U_i h_{t-1})    (8)

C̃_t = tanh(W_c x_t + U_c h_{t-1})    (9)

It is now time to update the old cell state C_{t-1} into the new cell state C_t, as in Equation (10). Note that the forget gate f_t can control the gradient that passes through it and allows for explicit "memory" deletes and updates, which helps alleviate the vanishing and exploding gradient problems of the standard RNN.

C_t = f_t ∗ C_{t-1} + i_t ∗ C̃_t    (10)


Finally, the LSTM decides the output, which is based on the cell state. The LSTM first runs a sigmoid layer, called the "output gate", which decides which parts of the cell state to output, as in Equation (11). Then, the LSTM puts the cell state through the tanh function and multiplies it by the output of the sigmoid gate, so that it only outputs the parts it decided to, as in Equation (12).

o_t = σ(W_o x_t + U_o h_{t-1})    (11)

h_t = o_t ∗ tanh(C_t)    (12)
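Equations (7)-(12) can be collected into a single cell-step function. A forward-only NumPy sketch with invented dimensions and random, untrained weights:

```python
import numpy as np

rng = np.random.default_rng(4)
d_x, d_h = 3, 4   # toy input size and hidden/cell size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One (W, U) weight pair per gate, as in Equations (7)-(9) and (11).
P = {g: (rng.normal(0, 0.1, (d_h, d_x)), rng.normal(0, 0.1, (d_h, d_h)))
     for g in ("f", "i", "c", "o")}

def lstm_step(x_t, h_prev, C_prev):
    f = sigmoid(P["f"][0] @ x_t + P["f"][1] @ h_prev)        # forget gate, Eq. (7)
    i = sigmoid(P["i"][0] @ x_t + P["i"][1] @ h_prev)        # input gate,  Eq. (8)
    C_tilde = np.tanh(P["c"][0] @ x_t + P["c"][1] @ h_prev)  # candidates,  Eq. (9)
    C = f * C_prev + i * C_tilde                             # cell update, Eq. (10)
    o = sigmoid(P["o"][0] @ x_t + P["o"][1] @ h_prev)        # output gate, Eq. (11)
    h = o * np.tanh(C)                                       # hidden state, Eq. (12)
    return h, C

# Both states are carried forward across the sequence.
h, C = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(0, 1, (5, d_x)):
    h, C = lstm_step(x_t, h, C)
print(h.shape, C.shape)
```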

LSTM is commonly applied to sequential data but can also be used for tree-structured data. Tai et al.26 introduced a generalization of the standard LSTM to the Tree-structured LSTM (Tree-LSTM) and showed that it performs better for representing sentence meaning than a sequential LSTM.

A slight variation of the LSTM is the Gated Recurrent Unit (GRU).27,28 It combines the "forget" and "input" gates into a single update gate. It also merges the cell state and hidden state, and makes some other changes. The resulting model is simpler than the standard LSTM model and has been growing in popularity.

ATTENTION MECHANISM WITH RECURRENT NEURAL NETWORK

In principle, the bidirectional RNN and LSTM should be able to deal with long-range dependencies in data, but in practice, long-range dependencies are still problematic to handle. Thus, a technique called the attention mechanism was proposed.

The attention mechanism in neural networks is inspired by the visual attention mechanism found in humans: human visual attention is able to focus on a certain region of an image with "high resolution" while perceiving the surrounding image in "low resolution", and then adjust the focal point over time. In NLP, the attention mechanism allows the model to learn what to attend to based on the input text and what it has produced so far, rather than encoding the full source text into a fixed-length vector as standard RNNs and LSTMs do.

Figure 8: Attention mechanism in a bidirectional recurrent neural network

Bahdanau et al.29 first utilized the attention mechanism for machine translation in NLP. They proposed an encoder-decoder framework in which an attention mechanism is used to select reference words in the original language for words in the target language before translation. Figure 8 illustrates the use of the attention mechanism in their bidirectional RNN. Note that each decoder


output word y_t depends on a weighted combination of all the input states, not just the last state as in the normal case. The a_{t,j} are weights that define how much each input state should be weighted for each output. For example, if a_{2,2} has a big value, it means that the decoder pays a lot of attention to the second state in the source sentence while producing the second word of the target sentence. The weights a_{t,j} normally sum to 1.
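The weighted combination can be sketched in NumPy. Note that the scoring function here is a simple dot product standing in for the learned alignment model of Bahdanau et al., and the encoder/decoder states are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(5)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

T, d = 4, 6                                # source length, state size
encoder_states = rng.normal(0, 1, (T, d))  # one state per input word
decoder_state = rng.normal(0, 1, d)        # current decoder state

# Score each input state against the decoder state, then normalize:
# the softmax makes the attention weights a_{t,1..T} sum to 1.
scores = encoder_states @ decoder_state
a = softmax(scores)
context = a @ encoder_states               # weighted combination of all inputs

print(a.round(3), float(a.sum()))          # weights sum to 1
print(context.shape)                       # one context vector of size d
```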

MEMORY NETWORK

Weston et al.30 introduced the concept of Memory Networks (MemNN) for the question answering problem. A MemNN works with several inference components combined with a large long-term memory. The components can be neural networks, and the memory acts as a dynamic knowledge base. The four learnable/inference components function as follows: the I component converts the incoming input into the internal feature representation; the G component updates old memories given the new input; the O component generates an output (also in the feature representation space); and the R component converts the output into a response format. For instance, given a list of sentences and a question for question answering, MemNN finds evidence from those sentences and generates an answer. During inference, the I component reads one sentence at a time and encodes it into a vector representation. The G component then updates a piece of memory based on the current sentence representation. After all sentences are processed, a memory matrix (each row representing a sentence) is generated, which stores the semantics of the sentences. For a question, MemNN encodes it into a vector representation; the O component then uses this vector to select some related evidence from the memory and generates an output vector. Finally, the R component takes the output vector as input and outputs a final response.

Based on MemNN, Sukhbaatar et al.31 proposed an End-to-End Memory Network (MemN2N), a neural network architecture with a recurrent attention mechanism over the long-term memory component, which can be trained end-to-end through standard backpropagation. It demonstrates that multiple computational layers (hops) in the O component can uncover more abstract evidence than a single layer and yield improved results for question answering and language modelling. It is worth noting that each computational layer can be a content-based attention model. Thus, MemN2N refines the attention mechanism to some extent. Note also that a similar idea is the Neural Turing Machine reported by Graves et al.32
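The multi-hop reading described above can be sketched as repeated content-based attention over a fixed memory. This is a minimal numpy sketch under simplifying assumptions (a single shared memory matrix and a linear query update W u + o between hops), not the full MemN2N parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memn2n_hops(M, q, W, hops=3):
    """Each hop attends over memory M with the current query u,
    reads an attended vector o, and updates the query: u = W u + o."""
    u = q
    for _ in range(hops):
        a = softmax(M @ u)             # content-based attention over memory
        o = (a[:, None] * M).sum(0)    # attended memory read
        u = W @ u + o                  # query update between layers (hops)
    return u

rng = np.random.default_rng(1)
d = 6
M = rng.normal(size=(4, d))            # 4 memory slots (e.g., sentences)
q = rng.normal(size=d)                 # question representation
W = rng.normal(size=(d, d)) * 0.1
u = memn2n_hops(M, q, W)
```

Stacking hops lets later layers condition their attention on what earlier layers retrieved, which is the source of the "more abstract evidence" effect noted above.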

RECURSIVE NEURAL NETWORK

Recursive Neural Network (RecNN) is a type of neural network that is usually used to learn a directed acyclic graph structure (e.g., a tree structure) from data. A recursive neural network can be seen as a generalization of the recurrent neural network. Given the structural representation of a sentence (e.g., a parse tree), RecNN recursively generates parent representations in a bottom-up fashion, by combining tokens to produce representations for phrases, and eventually the whole sentence. The sentence-level representation can then be used to make a final classification (e.g., sentiment classification) for a given input sentence. An example of the vector composition process in RecNN is shown in Figure 9 (from Qian et al.33). The vector of the node "very interesting" is composed from the vectors of the node "very" and the node "interesting". Similarly, the node "is very interesting" is composed from the phrase node "very interesting" and the word node "is".
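The bottom-up composition on the example tree can be sketched with the standard RecNN composition function parent = tanh(W [left; right] + b). The weights and toy embeddings below are random placeholders, not learned values.

```python
import numpy as np

def compose(left, right, W, b):
    """Standard RecNN composition: parent = tanh(W [left; right] + b)."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

rng = np.random.default_rng(2)
d = 5
W = rng.normal(size=(d, 2 * d)) * 0.3   # shared composition matrix
b = np.zeros(d)
emb = {w: rng.normal(size=d) for w in ["is", "very", "interesting"]}

# bottom-up over the parse of "is very interesting"
very_interesting = compose(emb["very"], emb["interesting"], W, b)
sentence = compose(emb["is"], very_interesting, W, b)  # sentence-level vector
```

The final `sentence` vector is what a softmax classifier on top would consume for sentiment classification.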


Figure 9: Recursive Neural Network

SENTIMENT ANALYSIS TASKS

We are now ready to survey deep learning applications in sentiment analysis. But before doing that, we first briefly introduce the main sentiment analysis tasks in this section. For additional details, please refer to Liu's book1 on sentiment analysis.

Researchers have mainly studied sentiment analysis at three levels of granularity: document level, sentence level, and aspect level. Document level sentiment classification classifies an opinionated document (e.g., a product review) as expressing an overall positive or negative opinion. It considers the whole document as the basic information unit and assumes that the document is known to be opinionated and contain opinions about a single entity (e.g., a particular phone). Sentence level sentiment classification classifies individual sentences in a document. However, each sentence cannot be assumed to be opinionated. Traditionally, one often first classifies a sentence as opinionated or not opinionated, which is called subjectivity classification. Then the resulting opinionated sentences are classified as expressing positive or negative opinions. Sentence level sentiment classification can also be formulated as a three-class classification problem, that is, to classify a sentence as neutral, positive, or negative. Compared with document level and sentence level sentiment analysis, aspect level sentiment analysis or aspect-based sentiment analysis is more fine-grained. Its task is to extract and summarize people's opinions expressed on entities and aspects/features of entities, which are also called targets. For example, in a product review, it aims to summarize positive and negative opinions on different aspects of the product respectively, although the general sentiment on the product could be positive or negative. The whole task of aspect-based sentiment analysis consists of several subtasks such as aspect extraction, entity extraction, and aspect sentiment classification. For example, from the sentence "the voice quality of iPhone is great, but its battery sucks", entity extraction should identify "iPhone" as the entity, and aspect extraction should identify "voice quality" and "battery" as two aspects. Aspect sentiment classification should classify the sentiment expressed on the voice quality of the iPhone as positive and on the battery of the iPhone as negative. Note that for simplicity, in most algorithms aspect extraction and entity extraction are combined and called aspect extraction or sentiment/opinion target extraction.
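For the example sentence above, the combined output of these subtasks can be pictured as a simple data structure. The field names below are our own illustration, not a standard schema.

```python
# Hypothetical structured output of aspect-based sentiment analysis for
# "the voice quality of iPhone is great, but its battery sucks".
result = {
    "entity": "iPhone",                       # from entity extraction
    "aspects": [                              # from aspect extraction +
        {"aspect": "voice quality", "sentiment": "positive"},  # aspect sentiment
        {"aspect": "battery", "sentiment": "negative"},        # classification
    ],
}
sentiments = {a["aspect"]: a["sentiment"] for a in result["aspects"]}
```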

Apart from these core tasks, sentiment analysis also studies emotion analysis, sarcasm detection, multilingual sentiment analysis, etc. See Liu's book1 for more details. In the following sections, we survey the deep learning applications in all these sentiment analysis tasks.


DOCUMENT LEVEL SENTIMENT CLASSIFICATION

Sentiment classification at the document level is to assign an overall sentiment orientation/polarity to an opinion document, i.e., to determine whether the document (e.g., a full online review) conveys an overall positive or negative opinion. In this setting, it is a binary classification task. It can also be formulated as a regression task, for example, to infer an overall rating score of 1 to 5 stars for the review. Some researchers also treat this as a 5-class classification task.

Sentiment classification is commonly regarded as a special case of document classification. In such a classification, document representation plays an important role, as it should reflect the original information conveyed by words or sentences in a document. Traditionally, the bag-of-words model (BoW) is used to generate text representations in NLP and text mining, by which a document is regarded as a bag of its words. Based on BoW, a document is transformed into a numeric feature vector with a fixed length, each element of which can be the word occurrence (absence or presence), word frequency, or TF-IDF score. Its dimension equals the size of the vocabulary. A document vector from BoW is normally very sparse since a single document only contains a small number of the words in the vocabulary. Early neural networks adopted such feature settings.
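A count-based BoW vector can be built in a few lines. This sketch uses a toy vocabulary of our choosing; it also illustrates the word-order limitation discussed next, since reordering the words leaves the vector unchanged.

```python
from collections import Counter

def bow_vector(doc, vocab):
    """Map a document to a fixed-length count vector over the vocabulary.
    Most entries are zero, which is why BoW vectors are sparse."""
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]   # Counter returns 0 for absent words

vocab = ["the", "movie", "was", "great", "not", "boring"]
v1 = bow_vector("the movie was great", vocab)
v2 = bow_vector("great was the movie", vocab)   # word order ignored: same vector
```

Replacing the raw counts with binary occurrence or TF-IDF scores changes only the per-element value, not the fixed-length, vocabulary-indexed layout.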

Despite its popularity, BoW has some disadvantages. Firstly, the word order is ignored, which means that two documents can have exactly the same representation as long as they share the same words. Bag-of-N-Grams, an extension of BoW, can consider the word order in a short context (an n-gram), but it also suffers from data sparsity and high dimensionality. Secondly, BoW can barely encode the semantics of words. For example, the words "smart", "clever" and "book" are equidistant from each other in BoW, but "smart" should be semantically closer to "clever" than to "book".

To tackle the shortcomings of BoW, word embedding techniques based on neural networks (introduced in the aforementioned section) were proposed to generate dense vectors (or low-dimensional vectors) for word representation, which are, to some extent, able to encode some semantic and syntactic properties of words. With word embeddings as the input, a document representation as a dense vector (also called a dense document vector) can be derived using neural networks.

Notice that in addition to the above two approaches, i.e., using BoW and learning dense vectors for documents through word embeddings, one can also learn a dense document vector directly from BoW. We distinguish the different approaches used in related studies in Table 2.

When documents are properly represented, sentiment classification can be conducted using a variety of neural network models following the traditional supervised learning setting. In some cases, neural networks may only be used to extract text features/representations, and these features are fed into some other non-neural classifier (e.g., SVM) to obtain a final global optimum classifier. The properties of neural networks and SVM complement each other in such a way that their advantages are combined.

Besides sophisticated document/text representations, researchers have also leveraged the characteristics of the data, such as product reviews, for sentiment classification. For product reviews, several researchers found it beneficial to jointly model sentiment and some additional information (e.g., user information and product information) for classification. Additionally, since a document often contains long dependency relations, the attention mechanism is also frequently used in document level sentiment classification. We summarize the existing techniques in Table 2.


Research Work | Document/Text Representation | Neural Network Model | Use Attention Mechanism | Joint Modelling with Sentiment
Moraes et al.34 | BoW | ANN (Artificial Neural Network) | No | -
Le and Mikolov35 | Learning dense vector at sentence, paragraph, document level | Paragraph Vector | No | -
Glorot et al.36 | BoW to dense document vector | SDA (Stacked Denoising Autoencoder) | No | Unsupervised data representation from target domains (in transfer learning settings)
Zhai and Zhang37 | BoW to dense document vector | DAE (Denoising Autoencoder) | No | -
Johnson and Zhang38 | BoW to dense document vector | BoW-CNN and Seq-CNN | No | -
Tang et al.39 | Word embeddings to dense document vector | CNN/LSTM (to learn sentence representation) + GRU (to learn document representation) | No | -
Tang et al.40 | Word embeddings to dense document vector | UPNN (User Product Neural Network) based on CNN | No | User information and product information
Chen et al.41 | Word embeddings to dense document vector | UPA (User Product Attention) based on LSTM | Yes | User information and product information
Dou42 | Word embeddings to dense document vector | Memory Network | Yes | User information and product information
Xu et al.43 | Word embeddings to dense document vector | LSTM | No | -
Yang et al.44 | Word embeddings to dense document vector | GRU-based sequence encoder | Hierarchical attention | -
Yin et al.45 | Word embeddings to dense document vector | Input encoder and LSTM | Hierarchical attention | Aspect/target information
Zhou et al.46 | Word embeddings to dense document vector | LSTM | Hierarchical attention | Cross-lingual information
Li et al.47 | Word embeddings to dense document vector | Memory Network | Yes | Cross-domain information

Table 2: Deep learning methods for document level sentiment classification

Below, we also give a brief description of these existing representative works.

Moraes et al.34 made an empirical comparison between Support Vector Machines (SVM) and Artificial Neural Networks (ANN) for document level sentiment classification, which demonstrated that ANN produced results competitive with SVM's in most cases.

To overcome the weakness of BoW, Le and Mikolov35 proposed Paragraph Vector, an unsupervised learning algorithm that learns vector representations for variable-length texts such as sentences, paragraphs and documents. The vector representations are learned by predicting the surrounding words in contexts sampled from the paragraph.


Glorot et al.36 studied the domain adaptation problem for sentiment classification. They proposed a deep learning system based on Stacked Denoising Autoencoders with sparse rectifier units, which can perform unsupervised text feature/representation extraction using both labeled and unlabeled data. The features are highly beneficial for domain adaptation of sentiment classifiers.

Zhai and Zhang37 introduced a semi-supervised autoencoder, which further considers the sentiment information in its learning stage in order to obtain better document vectors for sentiment classification. More specifically, the model learns a task-specific representation of the textual data by relaxing the loss function in the autoencoder to the Bregman Divergence and also deriving a discriminative loss function from the label information.

Johnson and Zhang38 proposed a CNN variant named BoW-CNN that employs bag-of-words conversion in the convolution layer. They also designed a new model, called Seq-CNN, which keeps the sequential information of words by concatenating the one-hot vectors of multiple words.

Tang et al.39 proposed a neural network to learn document representation, with the consideration of sentence relationships. It first learns the sentence representation with a CNN or LSTM from word embeddings. Then a GRU is utilized to adaptively encode semantics of sentences and their inherent relations in document representations for sentiment classification.

Tang et al.40 applied user representations and product representations in review classification. The idea is that those representations can capture important global clues such as individual preferences of users and overall qualities of products, which can provide better text representations.

Chen et al.41 also incorporated user information and product information for classification, but via word and sentence level attentions, which can take into account the global user preference and product characteristics at both the word level and the semantic level. Likewise, Dou42 used a deep memory network to capture user and product information. The proposed model can be divided into two separate parts. In the first part, an LSTM is applied to learn a document representation. In the second part, a deep memory network consisting of multiple computational layers (hops) is used to predict the review rating for each document.

Xu et al.43 proposed a cached LSTM model to capture the overall semantic information in a long text. The memory in the model is divided into several groups with different forgetting rates. The intuition is to enable the memory groups with low forgetting rates to capture global semantic features and the ones with high forgetting rates to learn local semantic features.

Yang et al.44 proposed a hierarchical attention network for document level sentiment rating prediction of reviews. The model includes two levels of attention mechanisms: one at the word level and the other at the sentence level, which allow the model to pay more or less attention to individual words or sentences when constructing the representation of a document.

Yin et al.45 formulated the document-level aspect-sentiment rating prediction task as a machine comprehension problem and proposed a hierarchical interactive attention-based model. Specifically, documents and pseudo aspect-questions are interleaved to learn an aspect-aware document representation.

Zhou et al.46 designed an attention-based LSTM network for cross-lingual sentiment classification at the document level. The model consists of two attention-based LSTMs for bilingual representation, and each LSTM is also hierarchically structured. In this setting, it effectively adapts the sentiment information from a resource-rich language (English) to a resource-poor language (Chinese) and helps improve the sentiment classification performance.


Li et al.47 proposed an adversarial memory network for cross-domain sentiment classification in a transfer learning setting, where the data from the source and the target domains are modelled together. It jointly trains two networks for sentiment classification and domain classification (i.e., whether a document is from the source or the target domain).

SENTENCE LEVEL SENTIMENT CLASSIFICATION

Sentence level sentiment classification is to determine the sentiment expressed in a single given sentence. As discussed earlier, the sentiment of a sentence can be inferred with subjectivity classification48 and polarity classification, where the former classifies whether a sentence is subjective or objective and the latter decides whether a subjective sentence expresses a negative or positive sentiment. In existing deep learning models, sentence sentiment classification is usually formulated as a joint three-way classification problem, namely, to predict a sentence as positive, neutral, or negative.

As with document level sentiment classification, sentence representation produced by neural networks is also important for sentence level sentiment classification. Additionally, since a sentence is usually short compared to a document, some syntactic and semantic information (e.g., parse trees, opinion lexicons, and part-of-speech tags) may be used to help. Additional information such as review ratings, social relationships, and cross-domain information can be considered too. For example, social relationships have been exploited in discovering sentiments in social media data such as tweets.

In early research, parse trees (which provide some semantic and syntactic information) were used together with the original words as the input to neural models, so that the sentiment composition could be better inferred. But lately, CNNs and RNNs have become more popular, and they do not need parse trees to extract features from sentences. Instead, CNNs and RNNs use word embeddings as input, which already encode some semantic and syntactic information. Moreover, the model architecture of a CNN or RNN can help learn the intrinsic relationships between words in a sentence too. The related works are introduced in detail below.

Socher et al.49 first proposed a semi-supervised Recursive Autoencoder Network (RAE) for sentence level sentiment classification, which obtains a reduced-dimensional vector representation for a sentence. Later on, Socher et al.50 proposed the Matrix-Vector Recursive Neural Network (MV-RNN), in which each word is additionally associated with a matrix representation (besides a vector representation) in a tree structure. The tree structure is obtained from an external parser. In Socher et al.51, the authors further introduced the Recursive Neural Tensor Network (RNTN), where tensor-based composition functions are used to better capture the interactions between elements. Qian et al.33 proposed two more advanced models: the Tag-guided Recursive Neural Network (TG-RNN), which chooses a composition function according to the part-of-speech tags of a phrase, and the Tag-embedded Recursive Neural Network / Recursive Neural Tensor Network (TE-RNN/RNTN), which learns tag embeddings and then combines tag and word embeddings together.

Kalchbrenner et al.52 proposed a Dynamic CNN (called DCNN) for semantic modelling of sentences. DCNN uses the dynamic k-max pooling operator as a non-linear subsampling function. The feature graph induced by the network is able to capture word relations. Kim53 also proposed to use CNN for sentence-level sentiment classification and experimented with several variants, namely CNN-rand (where word embeddings are randomly initialized), CNN-static (where word embeddings are pre-trained and fixed), CNN-non-static (where word embeddings are pre-trained and fine-tuned), and CNN-multichannel (where multiple sets of word embeddings are used).
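The k-max pooling operator at the heart of DCNN keeps the k largest activations in each feature row while preserving their original left-to-right order. A minimal numpy sketch (with k fixed for illustration, whereas DCNN chooses k dynamically from the sentence length):

```python
import numpy as np

def kmax_pooling(features, k):
    """Keep the k largest values in each feature row, preserving
    their original left-to-right (positional) order."""
    # indices of the k largest entries per row, re-sorted by position
    idx = np.sort(np.argsort(features, axis=1)[:, -k:], axis=1)
    return np.take_along_axis(features, idx, axis=1)

feat = np.array([[0.1, 0.9, 0.3, 0.7, 0.2],   # one row per feature map
                 [0.5, 0.1, 0.8, 0.2, 0.6]])
pooled = kmax_pooling(feat, k=2)
```

Unlike ordinary max pooling, this retains several strong activations per feature map together with their relative order, which is what lets the induced feature graph capture word relations.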

dos Santos and Gatti54 proposed a Character to Sentence CNN (CharSCNN) model. CharSCNN uses two convolutional layers to extract relevant features from words and sentences of any size to perform sentiment analysis of short texts. Wang et al.55 utilized LSTM for Twitter sentiment classification by simulating the interactions of words during the compositional process. Multiplicative operations between word embeddings through gate structures are used to provide more flexibility and to produce better compositional results compared to the additive ones in simple recurrent neural networks. Similar to a bidirectional RNN, the unidirectional LSTM can be extended to a bidirectional LSTM56 by allowing bidirectional connections in the hidden layer.

Wang et al.57 proposed a regional CNN-LSTM model, which consists of two parts, a regional CNN and an LSTM, to predict the valence-arousal ratings of text.

Wang et al.58 described a joint CNN and RNN architecture for sentiment classification of short texts, which takes advantage of the coarse-grained local features generated by the CNN and the long-distance dependencies learned via the RNN.

Guggilla et al.59 presented an LSTM- and CNN-based deep neural network model, which utilizes word2vec and linguistic embeddings for claim classification (classifying sentences as factual or feeling).

Huang et al.60 proposed to encode syntactic knowledge (e.g., part-of-speech tags) in a tree-structured LSTM to enhance phrase and sentence representation.

Akhtar et al.61 proposed several multi-layer perceptron based ensemble models for fine-grained sentiment classification of financial microblogs and news.

Guan et al.62 employed a weakly-supervised CNN for sentence (and also aspect) level sentiment classification. It contains a two-step learning process: it first learns a sentence representation weakly supervised by overall review ratings and then uses the sentence (and aspect) level labels for fine-tuning.

Teng et al.63 proposed a context-sensitive lexicon-based method for sentiment classification based on a simple weighted-sum model, using a bidirectional LSTM to learn the sentiment strength, intensification, and negation of lexicon sentiments in composing the sentiment value of a sentence.
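The weighted-sum idea can be illustrated with hypothetical numbers: the sentence score is a sum of lexicon word scores scaled by context-sensitive weights, where a weight can flip sign under negation or grow under intensification. In the actual model those weights are predicted by a bidirectional LSTM; the lexicon entries and weight value below are toy assumptions.

```python
# Toy weighted-sum composition in the spirit of a context-sensitive lexicon
# model: sentence score = sum over lexicon words of (context weight * prior).
lexicon = {"good": 1.0, "bad": -1.0}       # hypothetical prior polarity scores
sentence = ["not", "good"]
context_weight = -0.8                       # hypothetical: negation flips sign
                                            # (a BiLSTM would predict this)
score = sum(context_weight * lexicon[w] for w in sentence if w in lexicon)
```

A positive lexicon word under negation thus contributes a negative amount to the sentence-level sentiment value.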

Yu and Jiang64 studied the problem of learning generalized sentence embeddings for cross-domain sentence sentiment classification and designed a neural network model containing two separate CNNs that jointly learn two hidden feature representations from both the labeled and unlabeled data.

Zhao et al.65 introduced a recurrent random walk network learning approach for sentiment classification of opinionated tweets by exploiting the deep semantic representation of both user posted tweets and their social relationships.

Mishra et al.66 utilized CNN to automatically extract cognitive features from the eye-movement (or gaze) data of human readers reading the text and used them as enriched features along with textual features for sentiment classification.

Qian et al.67 presented a linguistically regularized LSTM for the task. The proposed model incorporates linguistic resources such as sentiment lexicons, negation words, and intensity words into the LSTM so as to capture the sentiment effect in sentences more accurately.

ASPECT LEVEL SENTIMENT CLASSIFICATION

Different from document level and sentence level sentiment classification, aspect level sentiment classification considers both the sentiment and the target information, as a sentiment always has a target. As mentioned earlier, a target is usually an entity or an entity aspect. For simplicity, both entity and aspect are usually just called aspect. Given a sentence and a target aspect, aspect level sentiment classification aims to infer the sentiment polarity/orientation of the sentence toward the target aspect. For example, in the sentence "the screen is very clear but the battery life is too short," the sentiment is positive if the target aspect is "screen" but negative if the target aspect is "battery life". We will discuss automated aspect or target extraction in the next section.

Aspect level sentiment classification is challenging because modelling the semantic relatedness of a target with its surrounding context words is difficult. Different context words have different influences on the sentiment polarity of a sentence towards the target. Therefore, it is necessary to capture the semantic connections between the target word and the context words when building learning models using neural networks.

There are three important tasks in aspect level sentiment classification using neural networks. The first task is to represent the context of a target, where the context means the contextual words in a sentence or document. This issue can be similarly addressed using the text representation approaches mentioned in the above two sections. The second task is to generate a target representation, which can properly interact with its context. A general solution is to learn a target embedding, which is similar to word embedding. The third task is to identify the important sentiment context (words) for the specified target. For example, in the sentence "the screen of iPhone is clear but battery life is short", "clear" is the important context word for "screen" and "short" is the important context word for "battery life". This task has recently been addressed by the attention mechanism. Although many deep learning techniques have been proposed to deal with aspect level sentiment classification, to our knowledge, there are still no dominating techniques in the literature. Related works and their main focuses are introduced below.
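The three tasks fit together as: encode the context words, embed the target, and let the target act as an attention query over the context. A minimal numpy sketch follows; the embeddings are random placeholders, so the resulting weights are meaningless here, but the mechanics are the generic dot-product attention form, not any specific published model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(3)
d = 8
words = ["the", "screen", "of", "iPhone", "is", "clear"]
E = {w: rng.normal(size=d) for w in words}       # toy embeddings
context = np.stack([E[w] for w in words])        # task 1: context representation

target = E["screen"]                             # task 2: target representation
a = softmax(context @ target)                    # task 3: per-word relevance
target_aware = (a[:, None] * context).sum(0)     # attended sentence vector
```

With trained embeddings, the weight on a word like "clear" would dominate when the target is "screen", giving the classifier a target-specific view of the sentence.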

Dong et al.68 proposed an Adaptive Recursive Neural Network (AdaRNN) for target-dependent Twitter sentiment classification, which learns to propagate the sentiments of words towards the target depending on the context and syntactic structure. It uses the representation of the root node as the features and feeds them into a softmax classifier to predict the distribution over classes.

Vo and Zhang69 studied aspect-based Twitter sentiment classification by making use of rich automatic features, which are additional features obtained using unsupervised learning methods. The paper showed that multiple embeddings, multiple pooling functions, and sentiment lexicons can offer rich sources of feature information and help achieve performance gains.

Since LSTM can capture semantic relations between the target and its context words in a more flexible way, Tang et al.70 proposed Target-Dependent LSTM (TD-LSTM) and Target-Connection LSTM (TC-LSTM) to extend LSTM by taking the target into consideration. They regarded the given target as a feature and concatenated it with the context features for aspect sentiment classification.

Ruder et al.71 proposed to use a hierarchical and bidirectional LSTM model for aspect level sentiment classification, which is able to leverage both intra- and inter-sentence relations. The sole dependence on sentences and their structures within a review renders the proposed model language-independent. Word embeddings are fed into a sentence-level bidirectional LSTM. Final states of the forward and backward LSTMs are concatenated together with the target embedding and fed into a bidirectional review-level LSTM. At every time step, the outputs of the forward and backward LSTMs are concatenated and fed into a final layer, which outputs a probability distribution over sentiments.

Considering the limitations of the work by Dong et al.68 and Vo and Zhang69, Zhang et al.72 proposed a sentence level neural model to address the weakness of pooling functions, which do not explicitly model tweet-level semantics. To achieve that, two gated neural networks are presented. First, a bidirectional gated neural network is used to connect the words in a tweet so that pooling functions can be applied over the hidden layer instead of the words for better representing the target and its contexts. Second, a three-way gated neural network structure is used to model the interaction between the target mention and its surrounding contexts, with the gated structures modelling the syntax and semantics of the enclosing tweet and the interaction between the surrounding contexts and the target, respectively. Gated neural networks have been shown to reduce the bias of standard recurrent neural networks towards the ends of a sequence through better propagation of gradients.

Wang et al.73 proposed an attention-based LSTM method with target embedding, which was proven to be an effective way to force the neural model to attend to the part of a sentence related to a given target. The attention mechanism is used to make the model attend to the important part of a sentence in response to a specific aspect. Likewise, Yang et al.74 proposed two attention-based bidirectional LSTMs to improve the classification performance. Liu and Zhang75 extended the attention modelling by differentiating the attention obtained from the left context and the right context of a given target/aspect. They further controlled their attention contributions by adding multiple gates.

Tang et al.76 introduced an end-to-end memory network for aspect level sentiment classification, which employs an attention mechanism with an external memory to capture the importance of each context word with respect to the given target aspect. This approach explicitly captures the importance of each context word when inferring the sentiment polarity of the aspect. Such importance degrees and text representations are calculated with multiple computational layers, each of which is a neural attention model over an external memory.

Lei et al.77 proposed a neural network approach to extract pieces of input text as rationales (reasons) for review ratings. The model consists of a generator and an encoder. The generator specifies a distribution over possible rationales (extracted text) and the encoder maps any such text to a task-specific target vector. For multi-aspect sentiment analysis, each coordinate of the target vector represents the response or rating pertaining to the associated aspect.

Li et al.78 integrated the target identification task into the sentiment classification task to better model the aspect-sentiment interaction. They showed that sentiment identification can be solved with an end-to-end machine learning architecture, in which the two subtasks are interleaved by a deep memory network. In this way, signals produced in target detection provide clues for polarity classification, and conversely, the predicted polarity provides feedback for the identification of targets.

Ma et al.79 proposed an Interactive Attention Network (IAN) that considers attention on both the target and the context. That is, it uses two attention networks to interactively detect the important words of the target expression/description and the important words of its full context.

Chen et al.80 proposed to utilize a recurrent attention network to better capture the sentiment of complicated contexts. To achieve that, their proposed model uses a recurrent/dynamic attention structure and learns a non-linear combination of the attention in GRUs.

Tay et al.81 designed a Dyadic Memory Network (DyMemNN) that models dyadic interactions between aspect and context, by using either neural tensor compositions or holographic compositions for its memory selection operation.

ASPECT EXTRACTION AND CATEGORIZATION

To perform aspect level sentiment classification, one needs to have aspects (or targets), which can be manually given or automatically extracted. In this section, we discuss existing work on automated aspect extraction (or aspect term extraction) from a sentence or document using deep learning models. Let us use an example to state the problem. In the sentence "the image is very clear", the word "image" is an aspect term (or sentiment target). The associated problem of aspect categorization is to group the same aspect expressions into a category. For instance, the aspect terms "image", "photo" and "picture" can be grouped into one aspect category named Image. In the review below, we include the extraction of both aspects and entities that are associated with opinions.
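In most of the works reviewed below, aspect term extraction is cast as word-level sequence labeling. The example sentence above then looks as follows; the BIO tag scheme shown is the common encoding for such taggers, not one named by any particular paper here.

```python
# Aspect term extraction as word-level tagging (BIO scheme) for the
# example sentence: B-/I- mark words inside an aspect term, O the rest.
tokens = ["the", "image", "is", "very", "clear"]
tags   = ["O",   "B-ASP", "O",  "O",    "O"]

# recover aspect terms from the tag sequence
aspects = [t for t, g in zip(tokens, tags) if g.startswith(("B", "I"))]
```

A neural tagger (e.g., an LSTM or CNN over the token sequence, possibly with a CRF output layer) predicts the tag column; the extraction step itself is then trivial.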

One reason why deep learning models can be helpful for this task is that deep learning is essentially good at learning (complicated) feature representations. When an aspect is properly characterized in some feature space, for example, in one or more hidden layers, the semantics or correlation between an aspect and its context can be captured through the interplay between their corresponding feature representations. In other words, deep learning provides a possible approach to automated feature engineering without human involvement.

Katiyar and Cardie82 investigated the use of deep bidirectional LSTMs for the joint extraction of opinion entities and the IS-FROM and IS-ABOUT relationships that connect the entities. Wang et al.83 further proposed a joint model integrating RNN and Conditional Random Fields (CRF) to co-extract aspects and opinion terms or expressions. The proposed model can learn high-level discriminative features and double-propagate information between aspect and opinion terms simultaneously. Wang et al.84 further proposed a Coupled Multi-Layer Attention Model (CMLA) for the co-extraction of aspect and opinion terms. The model consists of an aspect attention and an opinion attention using GRU units. An improved LSTM-based approach was reported by Li and Lam85, specifically for aspect term extraction. It consists of three LSTMs, of which two are for capturing aspect and sentiment interactions. The third LSTM uses the sentiment polarity information as additional guidance.

He et al.86 proposed an attention-based model for unsupervised aspect extraction. The main intuition is to utilize the attention mechanism to focus more on aspect-related words while de-emphasizing aspect-irrelevant words during the learning of aspect embeddings, similar to the autoencoder framework.

Zhang et al.87 extended a CRF model using a neural network to jointly extract aspects and corresponding sentiments. The proposed CRF variant replaces the original discrete features in CRF with continuous word embeddings and adds a neural layer between the input and output nodes.

Zhou et al.88 proposed a semi-supervised word embedding learning method to obtain continuous word representations on a large set of reviews with noisy labels. With the word vectors learned, deeper and hybrid features are learned by stacking a neural network on the word vectors. Finally, a logistic regression classifier trained with the hybrid features is used to predict the aspect category.

Yin et al.89 first learned word embeddings by considering the dependency path connecting words. They then designed embedding features that consider the linear context and dependency context information for CRF-based aspect term extraction.

Xiong et al.90 proposed an attention-based deep distance metric learning model to group aspect phrases. The attention-based model learns feature representations of contexts. Both the aspect phrase embedding and context embedding are used to learn a deep feature subspace metric for K-means clustering.
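Once aspect phrases are embedded in a learned feature space, grouping them reduces to standard K-means over the embedding vectors. The toy 2-D "embeddings" below are made up for illustration; a real system would use vectors produced by the learned metric model.

```python
def kmeans(points, k, iters=20):
    """Plain K-means with deterministic initialization from the first k points."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels

# Hypothetical aspect-phrase embeddings: "price"/"cost" sit near each
# other, as do "screen"/"display".
phrases = ["price", "screen", "cost", "display"]
vecs = [(0.9, 0.1), (0.1, 0.9), (1.0, 0.2), (0.2, 1.0)]
labels = kmeans(vecs, k=2)
print(labels)   # -> [0, 1, 0, 1], i.e., {price, cost} vs {screen, display}
```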

Poria et al.91 proposed to use CNN for aspect extraction. They developed a seven-layer deep convolutional neural network to tag each word in opinionated sentences as either an aspect or non-aspect word. Some linguistic patterns are also integrated into the model for further improvement.

Ying et al.92 proposed two RNN-based models for cross-domain aspect extraction. They first used rule-based methods to generate an auxiliary label sequence for each sentence. They then trained the models using both the true labels and auxiliary labels, which shows promising results.

OPINION EXPRESSION EXTRACTION

In this and the next few sections, we discuss deep learning applications to some other sentiment analysis related tasks. This section focuses on the problem of opinion expression extraction (or opinion term extraction, or opinion identification), which aims to identify the expressions of sentiment in a sentence or a document.

Similar to aspect extraction, opinion expression extraction is workable with deep learning models because the characteristics of opinion expressions can likewise be captured in a learned feature space.

Irsoy and Cardie93 explored the application of deep bidirectional RNNs to the task, which outperform traditional shallow RNNs with the same number of parameters and also previous CRF-based methods.94

Liu et al.95 presented a general class of discriminative models based on the RNN architecture and word embedding. The authors used pre-trained word embeddings from three external sources in different RNN architectures including Elman-type, Jordan-type, LSTM and their variations.

Wang et al.83 proposed a model integrating recursive neural networks and CRF to co-extract aspect and opinion terms. The aforementioned CMLA is also proposed for co-extraction of aspect and opinion terms.84

SENTIMENT COMPOSITION

Sentiment composition claims that the sentiment orientation of an opinion expression is determined by the meaning of its constituents as well as the grammatical structure. Due to their particular tree-structure design, RecNNs are naturally suitable for this task.51 Irsoy and Cardie96 reported that a RecNN with a deep architecture can more accurately capture different aspects of compositionality in language, which benefits sentiment compositionality. Zhu et al.97 proposed a neural network for integrating the compositional and non-compositional sentiment in the process of sentiment composition.
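The compositional idea can be made concrete with a toy recursion over a parse tree. A real RecNN learns the composition function from data; here hand-written rules stand in for it: negators flip polarity, intensifiers amplify it, and other nodes combine children additively. The lexicon values are toy numbers.

```python
# Toy sentiment composition over a binary parse tree, with rule-based
# stand-ins for the learned RecNN composition function.
LEXICON = {"good": 1.0, "terrible": -1.0, "not": "NEG", "very": "INT"}

def compose(node):
    """node is a word (leaf) or a pair of child nodes."""
    if isinstance(node, str):                 # leaf: look up prior polarity
        return LEXICON.get(node, 0.0)
    left, right = compose(node[0]), compose(node[1])
    if left == "NEG":                         # negation flips the sign
        return -right
    if left == "INT":                         # intensifier scales the magnitude
        return 1.5 * right
    return left + right                       # default: additive combination

print(compose(("not", ("very", "good"))))    # -> -1.5
print(compose(("movie", "terrible")))        # -> -1.0
```

The point of the learned version is precisely that such rules need not be hand-crafted: the grammatical structure routes the recursion, and the composition function is trained end to end.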

OPINION HOLDER EXTRACTION

Opinion holder (or source) extraction is the task of recognizing who holds the opinion (or whom/where the opinion is from).1 For example, in the sentence "John hates his car", the opinion holder is "John". This problem is commonly formulated as a sequence labelling problem, like opinion expression extraction or aspect extraction. Notice that an opinion holder can be either explicit (from a noun phrase in the sentence) or implicit (from the writer), as shown by Yang and Cardie98. Deng and Wiebe99 proposed to use word embeddings of opinion expressions as features for recognizing sources of participant opinions and non-participant opinions, where a source can be a noun phrase or the writer.

TEMPORAL OPINION MINING

Time is also an important dimension in the problem definition of sentiment analysis (see Liu's book1). As time passes, people may maintain or change their minds, or even form new viewpoints. Therefore, predicting future opinions is important in sentiment analysis. Some research using neural networks has been reported recently to tackle this problem.

Chen et al.100 proposed a Content-based Social Influence Model (CIM) to make opinion behaviour predictions for Twitter users. That is, it uses past tweets to predict users' future opinions. It is based on a neural network framework that encodes both the user content and the social relation factor (one's opinion about a target is influenced by one's friends).

Rashkin et al.101 used LSTMs for targeted sentiment forecast in the social media context. They introduced multilingual connotation frames, which aim at forecasting implied sentiments among world event participants engaged in a frame.

SENTIMENT ANALYSIS WITH WORD EMBEDDING

It is clear that word embeddings have played an important role in deep learning based sentiment analysis models. It has also been shown that even without the use of deep learning models, word embeddings can be used as features for non-neural learning models for various tasks. This section thus specifically highlights the contribution of word embeddings to sentiment analysis.

We first present the works on sentiment-encoded word embeddings. For sentiment analysis, directly applying regular methods like CBOW or Skip-gram to learn word embeddings from context can encounter problems, because words with similar contexts but opposite sentiment polarities (e.g., "good" and "bad") may be mapped to nearby vectors in the embedding space. Therefore, sentiment-encoded word embedding methods have been proposed. Maas et al.102 learned word embeddings that can capture both semantic and sentiment information. Bespalov et al.103 showed that an n-gram model combined with latent representation would produce a more suitable embedding for sentiment classification. Labutov and Lipson104 re-embedded existing word embeddings with logistic regression by regarding sentiment supervision of sentences as a regularization term.
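The general idea behind such methods can be caricatured with a tiny re-embedding objective (this is not any cited paper's exact formulation): start from context-based vectors where "good" and "bad" sit close together, then take gradient steps on a loss that keeps each vector near its original position while pushing opposite-polarity words apart. All numbers are toy values.

```python
# Toy sentiment-aware re-embedding: anchor term ||e - e_orig||^2 plus a
# separation term -w * ||e_good - e_bad||^2, optimized by plain gradient
# descent. Vectors and weights are made-up illustration values.
orig = {"good": [0.50, 0.50], "bad": [0.55, 0.45]}   # nearby, as plain CBOW might give
emb = {w: v[:] for w, v in orig.items()}

lr, sep_weight = 0.1, 0.3
for _ in range(50):
    for w in emb:
        other = "bad" if w == "good" else "good"
        for d in range(2):
            g = 2 * (emb[w][d] - orig[w][d])                       # anchor gradient
            g += -2 * sep_weight * (emb[w][d] - emb[other][d])     # separation gradient
            emb[w][d] -= lr * g

dist = sum((a - b) ** 2 for a, b in zip(emb["good"], emb["bad"])) ** 0.5
orig_dist = sum((a - b) ** 2 for a, b in zip(orig["good"], orig["bad"])) ** 0.5
print(dist > orig_dist)   # re-embedded vectors end up further apart -> True
```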

Le and Mikolov35 proposed the concept of paragraph vector to first learn fixed-length representations for variable-length pieces of texts, including sentences, paragraphs and documents. They experimented on both sentence and document-level sentiment classification tasks and achieved performance gains, which demonstrates the merit of paragraph vectors in capturing semantics to help sentiment classification. Tang et al.105,106 presented models to learn Sentiment-Specific Word Embeddings (SSWE), in which not only the semantic but also the sentiment information is embedded in the learned word vectors. Wang and Xia107 developed a neural architecture to train a sentiment-bearing word embedding by integrating the sentiment supervision at both the document and word levels. Yu et al.108 adopted a refinement strategy to obtain joint semantic-sentiment-bearing word vectors.

Feature enrichment and multi-sense word embeddings have also been investigated for sentiment analysis. Vo and Zhang69 studied aspect-based Twitter sentiment classification by making use of rich automatic features, which are additional features obtained using unsupervised learning techniques. Li and Jurafsky109 experimented with the utilization of multi-sense word embeddings on various NLP tasks. Experimental results show that while such embeddings do improve the performance of some tasks, they offer little help to sentiment classification tasks. Ren et al.110 proposed methods to learn topic-enriched multi-prototype word embeddings for Twitter sentiment classification.

Multilingual word embeddings have also been applied to sentiment analysis. Zhou et al.111 reported a Bilingual Sentiment Word Embedding (BSWE) model for cross-language sentiment classification. It incorporates the sentiment information into English-Chinese bilingual embeddings by employing labeled corpora and their translation, instead of large-scale parallel corpora. Barnes et al.112 compared several types of bilingual word embeddings and neural machine translation techniques for cross-lingual aspect-based sentiment classification.

Zhang et al.113 integrated word embeddings with matrix factorization for personalized review-based rating prediction. Specifically, the authors refine existing semantics-oriented word vectors (e.g., word2vec and GloVe) using sentiment lexicons. Sharma et al.114 proposed a semi-supervised technique that uses sentiment-bearing word embeddings for ranking the sentiment intensity of adjectives. Word embedding techniques have also been utilized or improved to help address various sentiment analysis tasks in many other recent studies.55,62,87,89,95

SARCASM ANALYSIS

Sarcasm is a form of verbal irony and a concept closely related to sentiment analysis. Recently, there has been growing interest in the NLP community in sarcasm detection. Researchers have attempted to solve it using deep learning techniques due to their impressive success in many other NLP problems.

Zhang et al.115 constructed a deep neural network model for tweet sarcasm detection. Their network first uses a bidirectional GRU model to capture the syntactic and semantic information over tweets locally, and then uses a pooling neural network to extract contextual features automatically from history tweets for detecting sarcastic tweets.

Joshi et al.116 investigated word embedding-based features for sarcasm detection. They experimented with four past sarcasm detection algorithms augmented with word embedding features and showed promising results.

Poria et al.117 developed a CNN-based model for sarcasm detection (sarcastic or non-sarcastic tweet classification), by jointly modelling pre-trained emotion, sentiment and personality features, along with the textual information in a tweet.

Peled and Reichart118 proposed to interpret sarcastic tweets based on an RNN neural machine translation model.

Ghosh and Veale119 proposed a CNN and bidirectional LSTM hybrid for sarcasm detection in tweets, which models both linguistic and psychological contexts.

Mishra et al.66 utilized CNN to automatically extract cognitive features from the eye-movement (or gaze) data to enrich information for sarcasm detection. Word embeddings are also used for irony recognition in English tweets120 and for controversial word identification in debates.121

EMOTION ANALYSIS

Emotions are the subjective feelings and thoughts of human beings. The primary emotions include love, joy, surprise, anger, sadness and fear. The concept of emotion is closely related to sentiment. For example, the strength of a sentiment can be linked to the intensity of a certain emotion like joy or anger. Thus, many deep learning models have also been applied to emotion analysis in the same way as in sentiment analysis.

Wang et al.122 built a bilingual attention network model for code-switched emotion prediction. An LSTM model is used to construct a document-level representation of each post, and the attention mechanism is employed to capture the informative words from both the monolingual and bilingual contexts.

Zhou et al.123 proposed an emotional chatting machine to model the emotion influence in large-scale conversation generation based on GRU. The technique has also been applied in other papers.39,72,115

Abdul-Mageed and Ungar124 first built a large dataset for emotion detection automatically by using distant supervision and then used a GRU network for fine-grained emotion detection.

Felbo et al.125 used millions of emoji occurrences in social media for pretraining neural models in order to learn better representations of emotional contexts.

A question-answering approach using a deep memory network has been proposed for emotion cause extraction.126 Emotion cause extraction aims to identify the reasons behind a certain emotion expressed in text.

MULTIMODAL DATA FOR SENTIMENT ANALYSIS

Multimodal data, such as the data carrying textual, visual, and acoustic information, has been used to help sentiment analysis as it provides additional sentiment signals beyond the traditional text features. Since deep learning models can map inputs to some latent space for feature representation, the inputs from multimodal data can also be projected simultaneously to learn multimodal data fusion, for example, by using feature concatenation, a joint latent space, or other more sophisticated fusion approaches. There is now a growing trend of using multimodal data with deep learning techniques.
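The simplest fusion strategy mentioned above, feature concatenation, can be sketched in a few lines: per-modality feature vectors are concatenated and passed through a linear layer to a sentiment score. The feature values and weights below are toy numbers; a real system would obtain them from trained per-modality encoders.

```python
import random

random.seed(0)

def linear(x, weights, bias):
    """One linear unit over the fused feature vector."""
    return sum(a * b for a, b in zip(x, weights)) + bias

# Hypothetical per-modality features (e.g., from a text CNN, an audio
# encoder, and a visual encoder).
text_feat = [0.8, -0.1]
audio_feat = [0.3]
visual_feat = [0.5, 0.2]

fused = text_feat + audio_feat + visual_feat      # early fusion by concatenation
weights = [random.uniform(-1, 1) for _ in fused]  # stand-in for learned weights
score = linear(fused, weights, bias=0.0)

print(len(fused))   # 5: the fused vector keeps every modality's dimensions
```

More sophisticated approaches replace the concatenation with a learned joint latent space or explicit cross-modal interaction terms, as in the tensor-based fusion discussed below in this section.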

Poria et al.127 proposed a way of extracting features from short texts based on the activation values of an inner layer of a CNN. The main novelty of the paper is the use of a deep CNN to extract features from text and the use of multiple kernel learning (MKL) to classify heterogeneous multimodal fused feature vectors.

Bertero et al.128 described a CNN model for emotion and sentiment recognition in acoustic data from interactive dialog systems.

Fung et al.129 demonstrated a virtual interaction dialogue system that incorporates sentiment, emotion and personality recognition capabilities trained by deep learning models.

Wang et al.130 reported a CNN-structured deep network, named Deep Coupled Adjective and Noun (DCAN) neural network, for visual sentiment classification. The key idea of DCAN is to harness the adjective and noun text descriptions, treating them as two (weak) supervision signals to learn two intermediate sentiment representations. Those learned representations are then concatenated and used for sentiment classification.

Yang et al.131 developed two algorithms based on a conditional probability neural network to analyse visual sentiment in images.

Zhu et al.132 proposed a unified CNN-RNN model for visual emotion recognition. The architecture leverages a CNN with multiple layers to extract different levels of features (e.g., colour, texture, object, etc.) within a multi-task learning framework. A bidirectional RNN is then proposed to integrate the learned features from different layers in the CNN model.

You et al.133 adopted the attention mechanism for visual sentiment analysis, which can jointly discover the relevant local image regions and build a sentiment classifier on top of these local regions.

Poria et al.134 proposed a deep learning model for multi-modal sentiment analysis and emotion recognition on video data. In particular, an LSTM-based model is proposed for utterance-level sentiment analysis, which can capture contextual information from the surrounding utterances in the same video.

Tripathi et al.135 used deep and CNN-based models for emotion classification on the multimodal dataset DEAP, which contains electroencephalogram, peripheral physiological, and video signals.

Zadeh et al.136 formulated the problem of multimodal sentiment analysis as modelling intra-modality and inter-modality dynamics and introduced a new neural model named Tensor Fusion Network to tackle it.
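The inter-modality dynamics in tensor-style fusion can be illustrated with an outer product: each modality vector is extended with a constant 1, so the product contains unimodal terms, cross-modal interaction terms, and a bias term in a single tensor. This is a simplified two-modality sketch, not the full Tensor Fusion Network architecture, and the feature values are toy numbers.

```python
def tensor_fuse(text, audio):
    """Outer product of 1-extended modality vectors: t_ext ⊗ a_ext."""
    t = text + [1.0]     # appending 1 keeps the unimodal features in the product
    a = audio + [1.0]
    return [[ti * aj for aj in a] for ti in t]

fused = tensor_fuse([0.5, -0.25], [0.5])
# Rows mix text dims (and the bias 1) with audio dims (and the bias 1):
print(fused)   # -> [[0.25, 0.5], [-0.125, -0.25], [0.5, 1.0]]
```

With a third modality the same construction yields a 3-way tensor, which a downstream network then flattens and classifies.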

Long et al.137 proposed an attention neural model trained with cognition-grounded eye-tracking data for sentence-level sentiment classification. A Cognition Based Attention (CBA) layer is built for neural sentiment analysis.

Wang et al.138 proposed a Select-Additive Learning (SAL) approach to tackle the confounding factor problem in multimodal sentiment analysis, which removes the individual-specific latent representations learned by neural networks (e.g., CNN). To achieve this, two learning phases are involved, namely, a selection phase for confounding factor identification and a removal phase for confounding factor removal.

RESOURCE-POOR LANGUAGE AND MULTILINGUAL SENTIMENT ANALYSIS

Recently, sentiment analysis in resource-poor languages (compared to English) has also achieved significant progress due to the use of deep learning models. Additionally, multilingual features can also help sentiment analysis, just like multimodal data. In the same way, deep learning has been applied to the multilingual sentiment analysis setting.

Akhtar et al.139 reported a CNN-based hybrid architecture for sentence- and aspect-level sentiment classification in a resource-poor language, Hindi.

Dahou et al.140 used word embeddings and a CNN-based model for Arabic sentiment classification at the sentence level.

Singhal and Bhattacharyya141 designed a solution for multilingual sentiment classification at the review/sentence level and experimented with multiple languages, including Hindi, Marathi, Russian, Dutch, French, Spanish, Italian, German, and Portuguese. The authors applied machine translation tools to translate these languages into English and then used English word embeddings, polarities from a sentiment lexicon and a CNN model for classification.

Joshi et al.142 introduced a sub-word level representation in an LSTM architecture for sentiment classification of Hindi-English code-mixed sentences.

OTHER RELATED TASKS

There are also applications of deep learning in some other sentiment analysis related tasks.

Sentiment Intersubjectivity: Gui et al.143 tackled the intersubjectivity problem in sentiment analysis, where the problem is to study the gap between the surface form of a language and the corresponding abstract concepts, and incorporated the modelling of intersubjectivity into a proposed CNN.

Lexicon Expansion: Wang et al.144 proposed a PU learning-based neural approach for opinion lexicon expansion.

Financial Volatility Prediction: Rekabsaz et al.145 made volatility predictions using financial disclosure sentiment with word embedding-based information retrieval models, where word embeddings are used in similar word set expansion.

Opinion Recommendation: Wang and Zhang146 introduced the task of opinion recommendation, which aims to generate a customized review score of a product that a particular user is likely to give, as well as a customized review that the user would have written for the target product if the user had reviewed the product. A multiple-attention memory network was proposed to tackle the problem, which considers users' reviews, the product's reviews, and users' neighbours (similar users).

Stance Detection: Augenstein et al.147 proposed a bidirectional LSTM with a conditional encoding mechanism for stance detection in political Twitter data. Du et al.148 designed a target-specific neural attention model for stance classification.

CONCLUSION

Applying deep learning to sentiment analysis has become a popular research topic lately. In this paper, we introduced various deep learning architectures and their applications in sentiment analysis. Many of these deep learning techniques have shown state-of-the-art results for various sentiment analysis tasks. With the advances of deep learning research and applications, we believe that there will be more exciting research on deep learning for sentiment analysis in the near future.

Acknowledgments

Bing Liu and Shuai Wang's work was supported in part by the National Science Foundation (NSF) under grant nos. IIS-1407927 and IIS-1650900, and by a research gift from Huawei Technologies Co. Ltd.

References

[1] Liu B. Sentiment analysis: mining opinions, sentiments, and emotions. Cambridge University Press, 2015.

[2] Liu B. Sentiment analysis and opinion mining (introduction and survey). Morgan & Claypool, May 2012.

[3] Pang B and Lee L. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2008. 2(1–2): pp. 1–135.

[4] Goodfellow I, Bengio Y, Courville A. Deep learning. The MIT Press, 2016.

[5] Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS 2011), 2011.

[6] Rumelhart D.E, Hinton G.E, Williams R.J. Learning representations by back-propagating errors. Cognitive modelling, 1988.

[7] Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, and Kuksa P. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 2011.

[8] Goldberg Y. A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research, 2016.

[9] Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.

[10] Lee H, Grosse R, Ranganath R, and Ng A.Y. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the International Conference on Machine Learning (ICML 2009), 2009.

[11] Bengio Y, Ducharme R, Vincent P, and Jauvin C. A neural probabilistic language model. Journal of Machine Learning Research, 2003.

[12] Morin F, Bengio Y. Hierarchical probabilistic neural network language model. In Proceedings of the International Workshop on Artificial Intelligence and Statistics, 2005.

[13] Mikolov T, Chen K, Corrado G, and Dean J. Efficient estimation of word representations in vector space. In Proceedings of the International Conference on Learning Representations (ICLR 2013), 2013.

[14] Mikolov T, Sutskever I, Chen K, Corrado G, and Dean J. Distributed representations of words and phrases and their compositionality. In Proceedings of the Annual Conference on Advances in Neural Information Processing Systems (NIPS 2013), 2013.

[15] Mnih A, Kavukcuoglu K. Learning word embeddings efficiently with noise-contrastive estimation. In Proceedings of the Annual Conference on Advances in Neural Information Processing Systems (NIPS 2013), 2013.

[16] Huang E.H, Socher R, Manning C.D, and Ng A.Y. Improving word representations via global context and multiple word prototypes. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2012), 2012.

[17] Pennington J, Socher R, Manning C.D. GloVe: global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), 2014.

[18] Bengio Y, Lamblin P, Popovici D, and Larochelle H. Greedy layer-wise training of deep networks. In Proceedings of the Annual Conference on Advances in Neural Information Processing Systems (NIPS 2006), 2006.

[19] Hinton G.E, Salakhutdinov R.R. Reducing the dimensionality of data with neural networks. Science, July 2006.

[20] Vincent P, Larochelle H, Bengio Y, and Manzagol P-A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the International Conference on Machine Learning (ICML 2008), 2008.

[21] Sermanet P, LeCun Y. Traffic sign recognition with multi-scale convolutional networks. In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2011), 2011.

[22] Elman J.L. Finding structure in time. Cognitive Science, 1990.

[23] Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 1994.

[24] Schuster M, Paliwal K.K. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 1997.

[25] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.

[26] Tai K.S, Socher R, Manning C.D. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2015), 2015.

[27] Cho K, Bahdanau D, Bougares F, Schwenk H and Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), 2014.

[28] Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modelling. arXiv preprint arXiv:1412.3555, 2014.

[29] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

[30] Weston J, Chopra S, Bordes A. Memory networks. arXiv preprint arXiv:1410.3916, 2014.

[31] Sukhbaatar S, Weston J, Fergus R. End-to-end memory networks. In Proceedings of the 29th Conference on Neural Information Processing Systems (NIPS 2015), 2015.

[32] Graves A, Wayne G, Danihelka I. Neural Turing machines. arXiv preprint arXiv:1410.5401, 2014.

[33] Qian Q, Tian B, Huang M, Liu Y, Zhu X and Zhu X. Learning tag embeddings and tag-specific composition functions in the recursive neural network. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2015), 2015.

[34] Moraes R, Valiati J.F, Neto W.P. Document-level sentiment classification: an empirical comparison between SVM and ANN. Expert Systems with Applications, 2013.

[35] Le Q, Mikolov T. Distributed representations of sentences and documents. In Proceedings of the International Conference on Machine Learning (ICML 2014), 2014.

[36] Glorot X, Bordes A, Bengio Y. Domain adaptation for large-scale sentiment classification: a deep learning approach. In Proceedings of the International Conference on Machine Learning (ICML 2011), 2011.

[37] Zhai S, Zhongfei (Mark) Zhang. Semisupervised autoencoder for sentiment analysis. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2016), 2016.

[38] Johnson R, Zhang T. Effective use of word order for text categorization with convolutional neural networks. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2015), 2015.

[39] Tang D, Qin B, Liu T. Document modelling with gated recurrent neural network for sentiment classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), 2015.

[40] Tang D, Qin B, Liu T. Learning semantic representations of users and products for document level sentiment classification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2015), 2015.

[41] Chen H, Sun M, Tu C, Lin Y, and Liu Z. Neural sentiment classification with user and product attention. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 2016.

[42] Dou Z.Y. Capturing user and product information for document level sentiment analysis with deep memory network. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[43] Xu J, Chen D, Qiu X, and Huang X. Cached long short-term memory neural networks for document-level sentiment classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 2016.

[44] Yang Z, Yang D, Dyer C, He X, Smola A.J, and Hovy E.H. Hierarchical attention networks for document classification. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2016), 2016.

[45] Yin Y, Song Y, Zhang M. Document-level multi-aspect sentiment classification as machine comprehension. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[46] Zhou X, Wan X, Xiao J. Attention-based LSTM network for cross-lingual sentiment classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 2016.

[47] Li Z, Zhang Y, Wei Y, Wu Y, and Yang Q. End-to-end adversarial memory network for cross-domain sentiment classification. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2017), 2017.

[48] Wiebe J, Bruce R, and O'Hara T. Development and use of a gold standard data set for subjectivity classifications. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 1999), 1999.

[49] Socher R, Pennington J, Huang E.H, Ng A.Y, and Manning C.D. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), 2011.

[50] Socher R, Huval B, Manning C.D, and Ng A.Y. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2012), 2012.

[51] Socher R, Perelygin A, Wu J.Y, Chuang J, Manning C.D, Ng A.Y, and Potts C. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 2013.

[52] Kalchbrenner N, Grefenstette E, Blunsom P. A convolutional neural network for modelling sentences. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2014), 2014.

[53] Kim Y. Convolutional neural networks for sentence classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), 2014.

[54] dos Santos C.N, Gatti M. Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of the International Conference on Computational Linguistics (COLING 2014), 2014.

[55] Wang X, Liu Y, Sun C, Wang B, and Wang X. Predicting polarities of tweets by composing word embeddings with long short-term memory. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2015), 2015.

[56] Graves A, Schmidhuber J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 2005.

[57] Wang J, Yu L-C, Lai R.K, and Zhang X. Dimensional sentiment analysis using a regional CNN-LSTM model. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2016), 2016.

[58] Wang X, Jiang W, Luo Z. Combination of convolutional and recurrent neural network for sentiment analysis of short texts. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[59] Guggilla C, Miller T, Gurevych I. CNN- and LSTM-based claim classification in online user comments. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[60] Huang M, Qian Q, Zhu X. Encoding syntactic knowledge in neural networks for sentiment classification. ACM Transactions on Information Systems, 2017.

[61] Akhtar M.S, Kumar A, Ghosal D, Ekbal A, and Bhattacharyya P. A multilayer perceptron based ensemble technique for fine-grained financial sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[62] Guan Z, Chen L, Zhao W, Zheng Y, Tan S, and Cai D. Weakly-supervised deep learning for customer review sentiment classification. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2016), 2016.

[63] Teng Z, Vo D-T, and Zhang Y. Context-sensitive lexicon features for neural sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 2016.

[64] Yu J, Jiang J. Learning sentence embeddings with auxiliary tasks for cross-domain sentiment classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 2016.

[65] Zhao Z, Lu H, Cai D, He X, Zhuang Y. Microblog sentiment classification via recurrent random walk network learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2017), 2017.

[66] Mishra A, Dey K, Bhattacharyya P. Learning cognitive features from gaze data for sentiment and sarcasm classification using convolutional neural network. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2017), 2017.

[67] Qian Q, Huang M, Lei J, and Zhu X. Linguistically regularized LSTM for sentiment classification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2017), 2017.

[68] Dong L, Wei F, Tan C, Tang D, Zhou M, and Xu K. Adaptive recursive neural network for target-dependent Twitter sentiment classification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2014), 2014.

[69] Vo D-T, Zhang Y. Target-dependent Twitter sentiment classification with rich automatic features. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2015), 2015.

[70] Tang D, Qin B, Feng X, and Liu T. Effective LSTMs for target-dependent sentiment classification. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[71] Ruder S, Ghaffari P, Breslin J.G. A hierarchical model of reviews for aspect-based sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 2016.

[72] Zhang M, Zhang Y, Vo D-T. Gated neural networks for targeted sentiment analysis. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2016), 2016.

[73] Wang Y, Huang M, Zhu X, and Zhao L. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 2016.

[74] Yang M, Tu W, Wang J, Xu F, and Chen X. Attention-based LSTM for target-dependent sentiment classification. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2017), 2017.

[75] Liu J, Zhang Y. Attention modeling for targeted sentiment. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), 2017.

[76] Tang D, Qin B, and Liu T. Aspect-level sentiment classification with deep memory network. arXiv preprint arXiv:1605.08900, 2016.

[77] Lei T, Barzilay R, Jaakkola T. Rationalizing neural predictions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 2016.

[78] Li C, Guo X, Mei Q. Deep memory networks for attitude identification. In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM 2017), 2017.

[79] Ma D, Li S, Zhang X, Wang H. Interactive attention networks for aspect-level sentiment classification. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2017), 2017.

[80] Chen P, Sun Z, Bing L, and Yang W. Recurrent attention network on memory for aspect sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[81] Tay Y, Tuan L.A, Hui S.C. Dyadic memory networks for aspect-based sentiment analysis. In Proceedings of the International Conference on Information and Knowledge Management (CIKM 2017), 2017.

[82] Katiyar A, Cardie C. Investigating LSTMs for joint extraction of opinion entities and relations. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2016), 2016.

[83] Wang W, Pan SJ, Dahlmeier D, and Xiao X. Recursive neural conditional random fields for aspect-based sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 2016.

[84] Wang W, Pan SJ, Dahlmeier D, and Xiao X. Coupled multi-layer attentions for co-extraction of aspect and opinion terms. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2017), 2017.

[85] Li X, Lam W. Deep multi-task learning for aspect term extraction with memory interaction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[86] He R, Lee WS, Ng HT, and Dahlmeier D. An unsupervised neural attention model for aspect extraction. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2017), 2017.

[87] Zhang M, Zhang Y, Vo D-T. Neural networks for open domain targeted sentiment. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), 2015.

[88] Zhou X, Wan X, Xiao J. Representation learning for aspect category detection in online reviews. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2015), 2015.

[89] Yin Y, Wei F, Dong L, Xu K, Zhang M, and Zhou M. Unsupervised word and dependency path embeddings for aspect term extraction. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2016), 2016.

[90] Xiong S, Zhang Y, Ji D, and Lou Y. Distance metric learning for aspect phrase grouping. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[91] Poria S, Cambria E, Gelbukh A. Aspect extraction for opinion mining with a deep convolutional neural network. Knowledge-Based Systems, 2016.

[92] Ding Y, Yu J, Jiang J. Recurrent neural networks with auxiliary labels for cross-domain opinion target extraction. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2017), 2017.

[93] Irsoy O, Cardie C. Opinion mining with deep recurrent neural networks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), 2014.

[94] Yang B, Cardie C. Extracting opinion expressions with semi-Markov conditional random fields. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2012), 2012.

[95] Liu P, Joty S, Meng H. Fine-grained opinion mining with recurrent neural networks and word embeddings. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), 2015.

[96] Irsoy O, Cardie C. Deep recursive neural networks for compositionality in language. In Proceedings of the Annual Conference on Advances in Neural Information Processing Systems (NIPS 2014), 2014.

[97] Zhu X, Guo H, Sobhani P. Neural networks for integrating compositional and non-compositional sentiment in sentiment composition. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2015), 2015.

[98] Yang B, Cardie C. Joint inference for fine-grained opinion extraction. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2013), 2013.

[99] Deng L, Wiebe J. Recognizing opinion sources based on a new categorization of opinion types. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2016), 2016.

[100] Chen C, Wang Z, Lei Y, and Li W. Content-based influence modeling for opinion behavior prediction. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[101] Rashkin H, Bell E, Choi Y, and Volkova S. Multilingual connotation frames: a case study on social media for targeted sentiment analysis and forecast. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2017), 2017.

[102] Maas AL, Daly RE, Pham PT, Huang D, Ng AY, and Potts C. Learning word vectors for sentiment analysis. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2011), 2011.

[103] Bespalov D, Bai B, Qi Y, and Shokoufandeh A. Sentiment classification based on supervised latent n-gram analysis. In Proceedings of the International Conference on Information and Knowledge Management (CIKM 2011), 2011.

[104] Labutov I, Lipson H. Re-embedding words. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2013), 2013.

[105] Tang D, Wei F, Yang N, Zhou M, Liu T, and Qin B. Learning sentiment-specific word embedding for Twitter sentiment classification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2014), 2014.

[106] Tang D, Wei F, Qin B, Yang N, Liu T, and Zhou M. Sentiment embeddings with applications to sentiment analysis. IEEE Transactions on Knowledge and Data Engineering, 2016.

[107] Wang L, Xia R. Sentiment lexicon construction with representation learning based on hierarchical sentiment supervision. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[108] Yu LC, Wang J, Lai KR, and Zhang X. Refining word embeddings for sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[109] Li J, Jurafsky D. Do multi-sense embeddings improve natural language understanding? In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), 2015.

[110] Ren Y, Zhang Y, Zhang M, and Ji D. Improving Twitter sentiment classification using topic-enriched multi-prototype word embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2016), 2016.

[111] Zhou H, Chen L, Shi F, Huang D. Learning bilingual sentiment word embeddings for cross-language sentiment classification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2015), 2015.

[112] Barnes J, Lambert P, Badia T. Exploring distributional representations and machine translation for aspect-based cross-lingual sentiment classification. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[113] Zhang W, Yuan Q, Han J, and Wang J. Collaborative multi-level embedding learning from reviews for rating prediction. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2016), 2016.

[114] Sharma R, Somani A, Kumar L, and Bhattacharyya P. Sentiment intensity ranking among adjectives using sentiment bearing word embeddings. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[115] Zhang M, Zhang Y, Fu G. Tweet sarcasm detection using deep neural network. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[116] Joshi A, Tripathi V, Patel K, Bhattacharyya P, and Carman M. Are word embedding-based features useful for sarcasm detection? In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 2016.

[117] Poria S, Cambria E, Hazarika D, and Vij P. A deeper look into sarcastic tweets using deep convolutional neural networks. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[118] Peled L, Reichart R. Sarcasm SIGN: interpreting sarcasm with sentiment based monolingual machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2017), 2017.

[119] Ghosh A, Veale T. Magnets for sarcasm: making sarcasm detection timely, contextual and very personal. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[120] Van Hee C, Lefever E, Hoste V. Monday mornings are my fave :) #not exploring the automatic recognition of irony in English tweets. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[121] Chen WF, Lin FY, Ku LW. WordForce: visualizing controversial words in debates. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 2016.

[122] Wang Z, Zhang Y, Lee S, Li S, and Zhou G. A bilingual attention network for code-switched emotion prediction. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[123] Zhou H, Huang M, Zhang T, Zhu X, and Liu B. Emotional chatting machine: emotional conversation generation with internal and external memory. arXiv preprint arXiv:1704.01074, 2017.

[124] Abdul-Mageed M, Ungar L. EmoNet: fine-grained emotion detection with gated recurrent neural networks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2017), 2017.

[125] Felbo B, Mislove A, Søgaard A, Rahwan I, and Lehmann S. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[126] Gui L, Hu J, He Y, Xu R, Lu Q, and Du J. A question answering approach to emotion cause extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[127] Poria S, Cambria E, Gelbukh A. Deep convolutional neural text features and multiple kernel learning for utterance-level multimodal sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), 2015.

[128] Bertero D, Siddique FB, Wu CS, Wan Y, Chan RH, and Fung P. Real-time speech emotion and sentiment recognition for interactive dialogue systems. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 2016.

[129] Fung P, Dey A, Siddique FB, Lin R, Yang Y, Bertero D, Wan Y, Chan RH, and Wu CS. Zara: a virtual interactive dialogue system incorporating emotion, sentiment and personality recognition. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[130] Wang J, Fu J, Xu Y, and Mei T. Beyond object recognition: visual sentiment analysis with deep coupled adjective and noun neural networks. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2016), 2016.

[131] Yang J, Sun M, Sun X. Learning visual sentiment distributions via augmented conditional probability neural network. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2017), 2017.

[132] Zhu X, Li L, Zhang W, Rao T, Xu M, Huang Q, and Xu D. Dependency exploitation: a unified CNN-RNN approach for visual emotion recognition. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2017), 2017.

[133] You Q, Jin H, Luo J. Visual sentiment analysis by attending on local image regions. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2017), 2017.

[134] Poria S, Cambria E, Hazarika D, Majumder N, Zadeh A, and Morency LP. Context-dependent sentiment analysis in user-generated videos. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2017), 2017.

[135] Tripathi S, Acharya S, Sharma RD, Mittal S, and Bhattacharya S. Using deep and convolutional neural networks for accurate emotion classification on DEAP dataset. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2017), 2017.

[136] Zadeh A, Chen M, Poria S, Cambria E, and Morency LP. Tensor fusion network for multimodal sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[137] Long Y, Qin L, Xiang R, Li M, and Huang CR. A cognition based attention model for sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[138] Wang H, Meghawat A, Morency LP, and Xing EP. Select-additive learning: improving generalization in multimodal sentiment analysis. In Proceedings of the International Conference on Multimedia and Expo (ICME 2017), 2017.

[139] Akhtar MS, Kumar A, Ekbal A, and Bhattacharyya P. A hybrid deep learning architecture for sentiment analysis. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[140] Dahou A, Xiong S, Zhou J, Haddoud MH, and Duan P. Word embeddings and convolutional neural network for Arabic sentiment classification. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[141] Singhal P, Bhattacharyya P. Borrow a little from your rich cousin: using embeddings and polarities of English words for multilingual sentiment classification. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[142] Joshi A, Prabhu A, Shrivastava M, and Varma V. Towards sub-word level compositions for sentiment analysis of Hindi-English code mixed text. In Proceedings of the International Conference on Computational Linguistics (COLING 2016), 2016.

[143] Gui L, Xu R, He Y, Lu Q, and Wei Z. Intersubjectivity and sentiment: from language to knowledge. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2016), 2016.

[144] Wang Y, Zhang Y, Liu B. Sentiment lexicon expansion based on neural PU learning, double dictionary lookup, and polarity association. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[145] Rekabsaz N, Lupu M, Baklanov A, Hanbury A, Dür A, and Anderson L. Volatility prediction using financial disclosures sentiments with word embedding-based IR models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2017), 2017.

[146] Wang Z, Zhang Y. Opinion recommendation using a neural model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), 2017.

[147] Augenstein I, Rocktäschel T, Vlachos A, Bontcheva K. Stance detection with bidirectional conditional encoding. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 2016.

[148] Du J, Xu R, He Y, Gui L. Stance classification with target-specific neural attention networks. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2017), 2017.