

Interpreting Neural Networks to Improve Politeness Comprehension

Malika Aubakirova

University of Chicago
[email protected]

Mohit Bansal

UNC Chapel Hill
[email protected]

Abstract

We present an interpretable neural network approach to predicting and understanding politeness in natural language requests. Our models are based on simple convolutional neural networks directly on raw text, avoiding any manual identification of complex sentiment or syntactic features, while performing better than such feature-based models from previous work. More importantly, we use the challenging task of politeness prediction as a testbed to next present a much-needed understanding of what these successful networks are actually learning. For this, we present several network visualizations based on activation clusters, first derivative saliency, and embedding space transformations, helping us automatically identify several subtle linguistic markers of politeness theories. Further, this analysis reveals multiple novel, high-scoring politeness strategies which, when added back as new features, reduce the accuracy gap between the original featurized system and the neural model, thus providing a clear quantitative interpretation of the success of these neural networks.

1 Introduction

Politeness theories (Brown and Levinson, 1987; Gu, 1990; Bargiela-Chiappini, 2003) include key components such as modality, indirection, deference, and impersonalization. Positive politeness strategies focus on making the hearer feel good through offers, promises, and jokes. Negative politeness examples include favor seeking, orders, and requests. Differentiating among politeness types is a highly nontrivial task, because it depends on factors such as context, relative power, and culture.

Danescu-Niculescu-Mizil et al. (2013) proposed a useful computational framework for predicting politeness in natural language requests by designing various lexical and syntactic features about key politeness theories, e.g., first or second person start vs. plural. However, manually identifying such politeness features is very challenging, because there exist several complex theories and politeness in natural language is often realized via subtle markers and non-literal cues.

Neural networks have been achieving high performance in sentiment analysis tasks, via their ability to automatically learn short and long range spatial relations. However, it is hard to interpret and explain what they have learned. In this paper, we first propose to address politeness prediction via simple CNNs working directly on the raw text. This helps us avoid the need for any complex, manually-defined linguistic features, while still performing better than such featurized systems. More importantly, we next present an intuitive interpretation of what these successful neural networks are learning, using the challenging politeness task as a testbed.

To this end, we present several visualization strategies: activation clustering, first derivative saliency, and embedding space transformations, some of which are inspired by similar strategies in computer vision (Erhan et al., 2009; Simonyan et al., 2014; Girshick et al., 2014), and have also been recently adopted in NLP for recurrent neural networks (Li et al., 2016; Kádár et al., 2016). The neuron activation clustering method not only rediscovers and extends several manually defined features from politeness theories, but also uncovers multiple novel strategies, whose importance we measure quantitatively.


The first derivative saliency technique allows us to identify the impact of each phrase on the final politeness prediction score via heatmaps, revealing useful politeness markers and cues. Finally, we also plot lexical embeddings before and after training, showing how specific politeness markers move and cluster based on their polarity. Such visualization strategies should also be useful for understanding similar state-of-the-art neural network models on various other NLP tasks.

Importantly, our activation clusters reveal two novel politeness strategies, namely indefinite pronouns and punctuation. Both strategies display high politeness and top-quartile scores (as defined by Danescu-Niculescu-Mizil et al. (2013)). Also, when added back as new features to the original featurized system, they improve its performance and reduce the accuracy gap between the featurized system and the neural model, thus providing a clear, quantitative interpretation of the success of these neural networks in automatically learning useful features.

2 Related Work

Danescu-Niculescu-Mizil et al. (2013) presented one of the first useful datasets and computational approaches to politeness theories (Brown and Levinson, 1987; Goldsmith, 2007; Kádár and Haugh, 2013; Locher and Watts, 2005), using manually defined lexical and syntactic features. Substantial previous work has employed machine learning models for other sentiment analysis style tasks (Pang et al., 2002; Pang and Lee, 2004; Kennedy and Inkpen, 2006; Go et al., 2009; Ghiassi et al., 2013). Recent work has also applied neural network based models to sentiment analysis tasks (Chen et al., 2011; Socher et al., 2013; Moraes et al., 2013; Dong et al., 2014; dos Santos and Gatti, 2014; Kalchbrenner et al., 2014). However, none of the above methods focused on visualizing and understanding the inner workings of these successful neural networks.

There have been a number of visualization techniques explored for neural networks in computer vision (Krizhevsky et al., 2012; Simonyan et al., 2014; Zeiler and Fergus, 2014; Samek et al., 2016; Mahendran and Vedaldi, 2015). Recently in NLP, Li et al. (2016) successfully adopt computer vision techniques, namely first-order saliency, and present representation plotting for sentiment compositionality across RNN variants.

Similarly, Kádár et al. (2016) analyze the omission scores and top-k contexts of hidden units of a multimodal RNN. Karpathy et al. (2016) visualize character-level language models. We instead adopt visualization techniques for CNN-style models for NLP [1] and apply these to the challenging task of politeness prediction, which often involves identifying subtle and non-literal sociolinguistic cues. We also present a quantitative interpretation of the success of these CNNs on the politeness prediction task, based on closing the performance gap between the featurized and neural models.

3 Approach

3.1 Convolutional Neural Networks

We use one convolutional layer followed by a pooling layer. For a sentence $v_{1:n}$ (where each word $v_i$ is a $d$-dimensional vector), a filter $m$ applied on a window of $t$ words produces a convolution feature $c_i = f(m \cdot v_{i:i+t-1} + b)$, where $f$ is a non-linear function and $b$ is a bias term. Applying the filter to each possible window of words gives a feature map $c = [c_1, \ldots, c_{n-t+1}] \in \mathbb{R}^{n-t+1}$. This convolutional layer is then followed by a max-over-time pooling operation (Collobert et al., 2011) that gives $C = \max\{c\}$ for the particular filter. To obtain multiple features, we use multiple filters of varying window sizes. The result is then passed to a fully-connected softmax layer that outputs probabilities over labels.
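For concreteness, the following is a minimal sketch of this single-convolutional-layer architecture. We assume a Keras/TensorFlow implementation (the paper reports using TensorFlow but does not publish code), and identifiers such as build_politeness_cnn, vocab_size, and max_len are illustrative placeholders rather than the authors' actual code.

```python
# Minimal sketch (not the authors' released code) of the text CNN described
# above: one convolutional layer per window size, max-over-time pooling,
# and a fully-connected softmax output.
import tensorflow as tf
from tensorflow.keras import layers

def build_politeness_cnn(vocab_size, embed_dim=300, max_len=100,
                         window_sizes=(3, 4, 5), feature_maps=75,
                         num_classes=2):
    inputs = layers.Input(shape=(max_len,), dtype="int32")
    # Word ids -> d-dimensional word vectors v_1, ..., v_n.
    emb = layers.Embedding(vocab_size, embed_dim)(inputs)
    pooled = []
    for t in window_sizes:
        # c_i = f(m . v_{i:i+t-1} + b) for every window of t words.
        conv = layers.Conv1D(filters=feature_maps, kernel_size=t,
                             activation="relu", padding="valid")(emb)
        # Max-over-time pooling: C = max{c} for this filter bank.
        pooled.append(layers.GlobalMaxPooling1D()(conv))
    features = layers.Concatenate()(pooled)
    features = layers.Dropout(0.5)(features)
    # Fully-connected softmax layer over the politeness labels.
    outputs = layers.Dense(num_classes, activation="softmax")(features)
    return tf.keras.Model(inputs, outputs)
```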

4 Experimental Setup

4.1 Datasets

We used the two datasets released by Danescu-Niculescu-Mizil et al. (2013): Wikipedia (Wiki) and Stack Exchange (SE), containing community requests with politeness labels. Their ‘feature development’ was done on the Wiki dataset, and SE was used as the ‘feature transfer’ domain. We use a simpler train-validation-test split based setup for these datasets instead of the original leave-one-out cross-validation setup, which makes training extremely slow for any neural network or sizable classifier. [2]

[1] The same techniques can also be applied to RNN models.

[2] The result trends and visualizations using cross-validation were similar to our current results, in preliminary experiments. We will release our exact dataset split details.


4.2 Training Details

Our tuned hyperparameter values (on the dev set of Wiki) are a mini-batch size of 32, a learning rate of 0.001 for the Adam (Kingma and Ba, 2015) optimizer, a dropout rate of 0.5, CNN filter windows of 3, 4, and 5 with 75 feature maps each, and ReLU as the non-linear function (Nair and Hinton, 2010). For convolution layers, we use valid padding and strides of all ones. We followed Danescu-Niculescu-Mizil et al. (2013) in using SE only as a transfer domain, i.e., we do not re-tune any hyperparameters or features on this domain and simply use the chosen values from the Wiki setting. The split and other training details are provided in the supplement.
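As an illustration only, these hyperparameters could be wired up as follows, reusing the hypothetical build_politeness_cnn sketch from Section 3.1; the arrays below are random stand-ins shaped like the tokenized Wiki splits, and the epoch count is a placeholder since it is not reported.

```python
# Sketch of training with the tuned hyperparameters (assumed Keras setup;
# data arrays are random stand-ins shaped like the Wiki train/dev splits).
import numpy as np
import tensorflow as tf

x_train = np.random.randint(0, 20000, size=(1523, 100))  # placeholder token ids
y_train = np.random.randint(0, 2, size=(1523,))          # 0/1 politeness labels
x_dev = np.random.randint(0, 20000, size=(218, 100))
y_dev = np.random.randint(0, 2, size=(218,))

model = build_politeness_cnn(vocab_size=20000)            # from the Sec. 3.1 sketch
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_dev, y_dev),
          batch_size=32, epochs=10)                       # epoch count is a placeholder
```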

5 Results

Table 1 first presents our reproduced classification accuracy test results (two labels: positive or negative politeness) for the bag-of-words and linguistic features based models of Danescu-Niculescu-Mizil et al. (2013) (for our dataset splits) as well as the performance of our CNN model. As seen, without using any manually defined, theory-inspired linguistic features, the simple CNN model performs better than the feature-based methods. [3]

Next, we also show how the linguistic features baseline improves on adding our newly discovered features (plus correcting some existing features), revealed via the analysis in Sec. 6. This reduces the gap in performance between the linguistic features baseline and the CNN, and in turn provides a quantitative reasoning for the success of the CNN model. More details are given in Sec. 6.

6 Analysis and Visualization

We present the primary interest and contribution of this work: performing an important qualitative and quantitative analysis of what is being learned by our neural networks w.r.t. politeness strategies. [4]

6.1 Activation Clusters

Activation clustering is a non-parametric approach (adopted from Girshick et al. (2014)) in which we compute each CNN unit's activations on a dataset and then analyze the top-scoring samples in each cluster.

[3] For reference, human performance on the original task setup of Danescu-Niculescu-Mizil et al. (2013) was 86.72% and 80.89% on the Wiki and SE datasets, respectively.

[4] We only use the Wiki train/dev sets for all analysis.

Model                        Wiki     SE

Bag-of-Words                 80.9%    64.6%
Linguistic Features          82.6%    65.2%
With Discovered Features     83.8%    65.7%
CNN                          85.8%    66.4%

Table 1: Accuracy results on Wikipedia and Stack Exchange.

We keep track of which neurons get maximally activated for which Wikipedia requests and analyze the most frequent requests in each neuron's cluster, to understand what each neuron reacts to.
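A sketch of this clustering procedure, under the assumption of the Keras model sketched in Section 3.1 (one GlobalMaxPooling1D output per filter bank): for every filter ("neuron") we record its max-over-time activation on each request and keep the top-scoring requests for manual inspection. Function and variable names are illustrative.

```python
# Sketch of activation clustering: collect, for each conv filter, the
# requests that activate it most strongly (assumes the Sec. 3.1 Keras model).
import numpy as np
import tensorflow as tf

def top_requests_per_filter(model, x, texts, k=10):
    # Tap the pooled (max-over-time) outputs of every Conv1D branch.
    pool_outs = [l.output for l in model.layers
                 if isinstance(l, tf.keras.layers.GlobalMaxPooling1D)]
    probe = tf.keras.Model(model.input, pool_outs)
    # acts[i, j] = max activation of filter j on request i.
    acts = np.concatenate(probe.predict(x), axis=-1)
    clusters = {}
    for j in range(acts.shape[1]):
        top = np.argsort(-acts[:, j])[:k]            # indices of top-k requests
        clusters[j] = [texts[i] for i in top]        # read these off by hand
    return clusters
```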

6.1.1 Rediscovering Existing Strategies

We find that the different activation clusters of our neural network automatically rediscover a number of strategies from politeness theories considered in Danescu-Niculescu-Mizil et al. (2013) (see Table 3 in their paper). We present a few such strategies here with their supporting examples, and the rest (e.g., Gratitude, Greeting, Positive Lexicon, and Counterfactual Modal) are presented in the supplement. The majority politeness label of each category is indicated by (+) and (-).

Deference (+) A way of sharing the burden of a request placed on the addressee. Activation cluster examples: {“nice work so far on your rewrite...”; “hey, good work on the new pages...”}

Direct Question (-) Questions imposed on the converser in a direct manner with a demand of a factual answer. Activation cluster examples: {“what’s with the radio , and fist in the air?”; “what level warning is appropriate?”}

6.1.2 Extending Existing Strategies

We also found that certain activation clusters depicted interesting extensions of the politeness strategies given in previous work.

Gratitude (+) Our CNN learns a special shade of gratitude, namely it distinguishes a cluster consisting of the bigram thanks for. Activation cluster examples: {“thanks for the good advice.”; “thanks for letting me know.”}

Counterfactual Modal (+) Sentences with Would you/Could you get grouped together as expected; but in addition, the cluster contains requests with Do you mind as well as gapped 3-grams like Can you ... please?, which presumably implies that the combination of a later please with future-oriented variants can/will in the request gives a similar effect as the conditional-oriented variants would/could. Activation cluster examples: {“can this be reported ... grid, please?”; “do you mind having another look?”}


Figure 1: Saliency heatmaps for correctly classified sentences.


6.1.3 Discovering Novel Strategies

In addition to rediscovering and extending politeness strategies mentioned in previous work, our network also automatically discovers some novel activation clusters, potentially corresponding to new politeness strategies.

Indefinite Pronouns (-) Danescu-Niculescu-Mizil et al. (2013) distinguishes requests with first and second person (plural, starting position, etc.). However, we find activations that also react to indefinite pronouns such as something/somebody. Activation cluster examples: {“am i missing something here?”; “wait for anyone to discuss it.”}

Punctuation (-) Though non-characteristic in direct speech, punctuation appears to be an important special marker in online communities, which in some sense captures verbal emotion in text. E.g., one of our neuron clusters gets activated on question marks “???” and one on ellipsis “...”. Activation cluster examples: {“now???”; “original article????”; “helllo?????”} [5]

In the next section, via saliency heatmaps, we will further study the impact of indefinite pronouns in the final decision-making of the classifier. Finally, in Sec. 6.4, we will quantitatively show how our newly discovered strategies help directly improve the accuracy performance of the linguistic features baseline and achieve high politeness and top-quartile scores as per Danescu-Niculescu-Mizil et al. (2013).

[5] More examples are given in the supplement.

6.2 First Derivative Saliency

Inspired by neural network visualization in computer vision (Simonyan et al., 2014), the first derivative saliency method indicates how much each input unit contributes to the final decision of the classifier. If $E$ is the input embedding, $y$ is the true label, and $S_y(E)$ is the neural network output, then we consider the gradients $\partial S_y(E) / \partial e$ with respect to each input embedding dimension $e$. Each image in Fig. 1 is a heatmap of the magnitudes (absolute values) of these derivatives.
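A sketch of how such saliency heatmaps can be computed, assuming the network has been split into an embedding stage (word ids to E) and a convolutional head (E to softmax scores), e.g., two chained Keras models; this split and the function name are illustrative assumptions, not the authors' code.

```python
# Sketch of first-derivative saliency: |dS_y(E)/de| for every word position
# and embedding dimension (assumed two-part Keras model: ids->E and E->scores).
import numpy as np
import tensorflow as tf

def saliency_heatmap(embed_model, conv_head, word_ids, label):
    # Look up the input embeddings E for one request.
    E = tf.convert_to_tensor(embed_model.predict(np.array([word_ids])))
    with tf.GradientTape() as tape:
        tape.watch(E)
        score = conv_head(E)[0, label]        # S_y(E), score of the true label
    grad = tape.gradient(score, E)[0]         # dS_y/dE, shape (n_words, embed_dim)
    return np.abs(grad.numpy())               # heatmap of derivative magnitudes
```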

The first heatmap gets signals from please (Please strategy) and could you (Counterfactual Modal strategy), but effectively puts much more mass on help. This is presumably due to the nature of Wikipedia requests such that the meaning boils down to asking for some help that reduces the social distance. In the second figure, the highest emphasis is put on why would you, conceivably used by Wikipedia administrators as an indicator of questioning. Also, the indefinite pronoun somebody makes a relatively high impact on the decision. This relates back to the activation clustering mentioned in the previous section, where indefinite pronouns had their own cluster. In the third heatmap, the neural network does not put much weight on the greeting-based start hey, because it instead focuses on the higher polarity [6] gratitude part after the greeting, i.e., on the words thanks for. This will be further connected in Sec. 6.3.

6.3 Embedding Space Transformations

We selected key words from Danescu-Niculescu-Mizil et al. (2013) and from our new activation clusters (Sec. 6.1) and plotted (via PCA) their embedding space positions before and after training, to help us gain insights into specific sentiment transformations.

[6] See Table 3 of Danescu-Niculescu-Mizil et al. (2013) for polarity scores of the various strategies.


Strategy                   Politeness   In top quartile   Examples

21. Indefinite Pronouns    -0.13        39%               am i missing something here?
22. Punctuation            -0.71        62%               helllo?????

Table 2: Extending Table 3 of Danescu-Niculescu-Mizil et al. (2013) with our newly discovered politeness strategies.

Figure 2: Projection before (red) and after (blue) training.

Fig. 2 shows that the most positive keys such as hi, appreciate, and great get clustered even more tightly after training. The key thanks gets a notably separated position on the positive spectrum, signifying its importance in the NN's decision-making (also depicted via the saliency heatmaps in Sec. 6.2).

The indefinite pronoun something is located near the direct question politeness strategy keys why and what. Please, as was shown by Danescu-Niculescu-Mizil et al. (2013), is not always a positive word because its sentiment depends on its sentence position, and it moves further away from the positive key group. The Counterfactual Modal keys could and would, as well as the Indicative Modal key can, get far more separated from the positive keys. Moreover, after training, the distance between could and would increases but is preserved between can and would, which might suggest that could has a far stronger sentiment.
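The plot itself can be reproduced along the following lines, given the embedding matrices saved before and after training; this is an assumed setup (the word list, matrices, and a shared PCA fit over both snapshots are our illustrative choices, not the authors' code).

```python
# Sketch of the embedding-space visualization: 2D PCA projection of selected
# key words before (red) and after (blue) training (assumed setup).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_embedding_shift(keys, vocab_index, emb_before, emb_after):
    ids = [vocab_index[w] for w in keys]
    # Fit one PCA on both snapshots so the two projections are comparable.
    proj = PCA(n_components=2).fit_transform(
        np.vstack([emb_before[ids], emb_after[ids]]))
    before, after = proj[:len(ids)], proj[len(ids):]
    plt.scatter(before[:, 0], before[:, 1], c="red", label="before training")
    plt.scatter(after[:, 0], after[:, 1], c="blue", label="after training")
    for i, w in enumerate(keys):
        plt.annotate(w, before[i])
        plt.annotate(w, after[i])
    plt.legend()
    plt.show()
```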

6.4 Quantitative Analysis

In this section, we present quantitative measures of the importance and polarity of the newly discovered politeness strategies in the above sections, as well as how they explain some of the improved performance of the neural model.

In Table 3 of Danescu-Niculescu-Mizil et al. (2013), the pronoun politeness strategy with the highest percentage in top quartile is 2nd Person (30%). Our extension in Table 2 shows that our newly discovered Indefinite Pronouns strategy represents a higher percentage (39%), with a politeness score of -0.13. Moreover, our Punctuation strategy also turns out to be a top scoring negative politeness strategy and in the top three among all strategies (after Gratitude and Deference). It has a score of -0.71, whereas the second top negative politeness strategy (Direct Start) has a much lower score of -0.43.

Finally, in terms of accuracies, our newly discovered features of Indefinite Pronouns and Punctuation improved the featurized system of Danescu-Niculescu-Mizil et al. (2013) (see Table 1). [7] This reduction of the performance gap w.r.t. the CNN partially explains the success of these neural models in automatically learning useful linguistic features.

7 Conclusion

We presented an interpretable neural network approach to politeness prediction. Our simple CNN model improves over previous work with manually-defined features. More importantly, we then understand the reasons for these improvements via three visualization techniques and discover some novel high-scoring politeness strategies which, in turn, quantitatively explain part of the performance gap between the featurized and neural models.

Acknowledgments

We would like to thank the anonymous reviewers for their helpful comments. This work was supported by an IBM Faculty Award, a Bloomberg Research Grant, and an NVIDIA GPU donation to MB.

[7] Our NN visualizations also led to an interesting feature correction. In the ‘With Discovered Features’ result in Table 1, we also removed the existing pronoun features (#14-18) based on the observation that those had weaker activation and saliency contributions (and lower top-quartile %) than the new indefinite pronoun feature. This correction and adding the two new features contributed ~50-50 to the total accuracy improvement.


References

Francesca Bargiela-Chiappini. 2003. Face and politeness: new (insights) for old (concepts). Journal of Pragmatics, 35(10):1453–1469.

Penelope Brown and Stephen C. Levinson. 1987. Politeness: Some Universals in Language Usage, volume 4. Cambridge University Press.

Long-Sheng Chen, Cheng-Hsiang Liu, and Hui-Ju Chiu. 2011. A neural network based approach for sentiment classification in the blogosphere. Journal of Informetrics, 5(2):313–322.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493–2537.

Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. A computational approach to politeness with application to social factors. In Proceedings of ACL.

Li Dong, Furu Wei, Chuanqi Tan, Duyu Tang, Ming Zhou, and Ke Xu. 2014. Adaptive recursive neural network for target-dependent Twitter sentiment classification. In Proceedings of ACL, pages 49–54.

Cícero Nogueira dos Santos and Maíra Gatti. 2014. Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING, pages 69–78.

Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2009. Visualizing higher-layer features of a deep network. University of Montreal, 1341.

M. Ghiassi, J. Skinner, and D. Zimbra. 2013. Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network. Expert Systems with Applications, 40(16):6266–6282.

Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of CVPR, pages 580–587.

Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1:12.

Daena J. Goldsmith. 2007. Brown and Levinson's politeness theory. Explaining Communication: Contemporary Theories and Exemplars, pages 219–236.

Yueguo Gu. 1990. Politeness phenomena in modern Chinese. Journal of Pragmatics, 14(2):237–257.

Dániel Z. Kádár and Michael Haugh. 2013. Understanding Politeness. Cambridge University Press.

Ákos Kádár, Grzegorz Chrupała, and Afra Alishahi. 2016. Representation of linguistic form and function in recurrent neural networks. arXiv preprint arXiv:1602.08952.

Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. 2014. A convolutional neural network for modelling sentences. In Proceedings of ACL.

Andrej Karpathy, Justin Johnson, and Fei-Fei Li. 2016. Visualizing and understanding recurrent networks. In Proceedings of ICLR Workshop.

Alistair Kennedy and Diana Inkpen. 2006. Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence, 22(2):110–125.

Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of ICLR.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of NIPS, pages 1097–1105.

Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Jurafsky. 2016. Visualizing and understanding neural models in NLP. In Proceedings of NAACL.

Miriam A. Locher and Richard J. Watts. 2005. Politeness theory and relational work. Journal of Politeness Research. Language, Behaviour, Culture, 1(1):9–33.

Aravindh Mahendran and Andrea Vedaldi. 2015. Understanding deep image representations by inverting them. In Proceedings of CVPR, pages 5188–5196. IEEE.

Rodrigo Moraes, João Francisco Valiati, and Wilson P. Gavião Neto. 2013. Document-level sentiment classification: An empirical comparison between SVM and ANN. Expert Systems with Applications, 40(2):621–633.

Vinod Nair and Geoffrey E. Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In Proceedings of ICML, pages 807–814.

Bo Pang and Lillian Lee. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of ACL, page 271.

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of EMNLP, pages 79–86.

Wojciech Samek, Alexander Binder, Grégoire Montavon, Sebastian Bach, and Klaus-Robert Müller. 2016. Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems.

Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Deep inside convolutional networks: Visualising image classification models and saliency maps. In Proceedings of ICLR Workshop.


Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of EMNLP.

Matthew D. Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In Proceedings of ECCV, pages 818–833. Springer.


Supplementary Material:

Interpreting Neural Networks to Improve Politeness Comprehension

Malika Aubakirova

University of Chicago
[email protected]

Mohit Bansal

UNC Chapel Hill
[email protected]

1 Supplementary Material

This is supplementary material for the main paper, where we present more analysis and visualization examples, and our dataset and training details.

2 Activation Clusters

2.1 Rediscovering Existing Strategies

Gratitude (+) Respect and appreciation paid to the listener. Activation cluster examples: {“thanks for the heads up”; “thank you very much for this kind gesture”; “thanks for help!”}

Greeting (+) A welcoming message for the converser. Activation cluster examples: {“hey, long time no seeing! how’s stuff?”; “greetings, sorry to bother you here... ”}

Positive Lexicon (+) Expressions that build a positive relationship in the conversation and contain positive words from the sentiment lexicon, e.g., great, nice, good. Activation cluster examples: {“your new map is a great”; “very nice article”; “yes, this is a nice illustration. i ’d love to...”}

Counterfactual Modal (+) Indirect strategies that imply a burden on the addressee and yet provide a face-saving opportunity of denying the request, usually containing hedges such as Would it be.../Could you please. Activation cluster examples: {“would you be interested in creating an infobox for windmills...?”; “would you mind retriveing the bibliographic data?”}

Deference (+) A way of sharing the burden of a request placed on the addressee. Activation cluster examples: {“nice work so far on your rewrite...”; “hey, good work on the new pages...”; “good point for the text...”; “you make some good points...”}

Direct Question (-) Questions imposed on the converser in a direct manner with a demand of a factual answer. Activation cluster examples: {“why would one want to re-create gnaa?”; “what’s with the radio , and fist in the air?”; “what level warning is appropriate?”}

2.2 Extending Existing Strategies

Counterfactual Modal (+) Sentences with Would you/Could you get grouped together as expected; but in addition, the cluster contains requests with Do you mind. Activation cluster examples: {“do you mind having another look?”; “do you mind if i migrate these to your userspace for you?”}

Gratitude (+) Our CNN learns a special shade of gratitude, namely it distinguishes a cluster consisting of the bigram thanks for. Activation cluster examples: {“thanks for the good advice.”; “thanks for letting me know.”; “fair enough, thanks for assuming good faith”}

Indicative Modal (+) The same neuron as for the counterfactual modal cluster above also gets activated on gapped 3-grams like Can you ... please?, which presumably implies that the combination of a later please with future-oriented variants can/will in the request gives a similar effect as the conditional-oriented variants would/could. Activation cluster examples: {“can this be reported to london grid, please?”; “can you delete it again, please?”; “good start . can you add more, please?”}



2.3 Discovering Novel Strategies

Indefinite Pronouns (-) Danescu-Niculescu-Mizil et al. (2013) distinguishes requests with first and second person (plural, starting position, etc.). However, we find activations that also react to indefinite pronouns such as something/somebody. Activation cluster examples: {“am i missing something here?”; “he ’s gone. was it something i said?”; “you added the tag and then mentioned it on talk - you did not gain consensus first or even wait for anyone to discuss it.”; “but how can something be both correct and incorrect”}

Punctuation (-) Though non-characteristic in direct speech, punctuation appears to be an important special marker in online communities, which in some sense captures verbal emotion in text. One of our neuron clusters gets activated on question marks “???” and one on ellipsis “...”. Activation cluster examples of question marks: {“now???”; “original article????”; “helllo?????”} Activation cluster examples of ellipsis: {“ummm , it ’s a soft redirect. a placeholder for a future page ... is there a problem?”; “Indeed ... the trolls just keep coming.”; “I can’t remember if i asked/checked to see if it got to you? so ... did it ?”}

3 First Derivative Saliency

In Fig. 1, we show some additional examples of saliency heatmaps. In the first heatmap, we see a clear example of the Positive Lexicon politeness strategy. The key great captures most of the weight for the final decision making. Note that, in particular, the question mark in this case provides no influence. Contrast that to the second figure, which echoes back the proposed negative politeness strategy on punctuation from Section 6.1.3. Initial question marks give a high influence in magnitude for the negative predicted label. In the third example, we see that these punctuation markers still provide a lot of emphasis. For instance, other words such as really, successful and the personal pronoun I have very little impact. Overall, this exemplifies the Direct Question strategy since most of the focus is on why.

As was noted in the embedding space transformations discussion, the Gratitude key thanks with the preposition for has a much stronger polarity than other positive politeness keys in the fourth heatmap. Indeed, can you please does not provide nearly as much value. In the fifth heatmap, the sensitivity of the final score comes more from the greeting, namely hi, as compared to the phrase can you please tell me or the positive lexicon very nice. These results match the politeness score results in Table 3 of Danescu-Niculescu-Mizil et al. (2013), where the Greeting strategy has a score of 0.87 compared to 0.49 for the Please strategy and 0.12 for the Positive Lexicon strategy. The sixth and last heatmap demonstrates the contribution of indefinite pronouns. In this case the phrase am I missing something, with the focus on the latter two words, decides the final label prediction.

4 Dataset and Training Details

We split the Wikipedia and Stack Exchange datasets of Danescu-Niculescu-Mizil et al. (2013) into training, validation and test sets with 70%, 10%, and 20% of the data respectively (after random shuffling). Therefore, the final split for Wikipedia is 1523, 218 and 436; and for Stack Exchange it is 2298, 328, and 657, respectively. We will make the dataset split indices publicly available.

We use 300-dim pre-trained word2vec embeddings (Mikolov et al., 2014) as input to the CNN, and then allow fine-tuning of the embeddings during training. All sentence tokenization is done using NLTK (Bird, 2006). For words not present in the pre-trained set, we use uniform unit scaling initialization.
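A sketch of this input pipeline, assuming gensim for the pre-trained word2vec vectors and NLTK for tokenization; the embedding file path, vocabulary format, and the exact form of the "uniform unit scaling" initialization are our assumptions for illustration.

```python
# Sketch of the embedding initialization and tokenization described above
# (assumed libraries and placeholder path; not the authors' released code).
import numpy as np
import nltk
from gensim.models import KeyedVectors

w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)  # placeholder path

def tokenize(request):
    # NLTK word tokenization (requires the 'punkt' model to be downloaded).
    return nltk.word_tokenize(request.lower())

def build_embedding_matrix(vocab, dim=300):
    # vocab: dict mapping word -> row index of the CNN's embedding layer.
    matrix = np.zeros((len(vocab), dim), dtype="float32")
    for word, idx in vocab.items():
        if word in w2v:
            matrix[idx] = w2v[word]
        else:
            # One plausible reading of "uniform unit scaling" initialization.
            matrix[idx] = np.random.uniform(-1.0, 1.0, dim) / np.sqrt(dim)
    return matrix
```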

We implement our model using a python version of TensorFlow (Abadi et al., 2015). Hyperparameters, e.g., the mini-batch size, learning rate, optimizer type, and dropout rate, were tuned using the validation set of Wikipedia via grid search. [1] The final chosen values were a mini-batch size of 32, a learning rate of 0.001 for the Adam optimizer, a dropout rate of 0.5, filter windows of 3, 4, and 5 with 75 feature maps each, and ReLU as the non-linear transformation function (Nair and Hinton, 2010). For convolution layers, we use valid padding and strides of all ones.

[1] Grid search was performed over dropout rates from 0.1 to 0.9 in increments of 0.1; four learning rates from 1e-1 to 1e-4; Adam, SGD, and AdaGrad optimizers; filter window sizes from 1- to 3-grams; and feature maps ranging from 10 to 200 with an incremental step of 20.


Figure 1: Additional saliency heatmaps for correctly classified sentences.


References

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.

Steven Bird. 2006. NLTK: the natural language toolkit. In Proceedings of the COLING/ACL on Interactive Presentation Sessions, pages 69–72. Association for Computational Linguistics.

Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. A computational approach to politeness with application to social factors. In Proceedings of ACL.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2014. word2vec.

Vinod Nair and Geoffrey E. Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814.