Understanding Hidden Memories of Recurrent Neural Networks
Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, Yangqiu Song, Huamin Qu
THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY


Page 1:

Understanding Hidden Memories of Recurrent Neural Networks
Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, Yangqiu Song, Huamin Qu

THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY

Page 2:

What is a Recurrent Neural Network?

Page 3:

Introduction


[Figure: a vanilla RNN, with input x(t), hidden state h(t), a tanh activation, and output y(t).]

What is a Recurrent Neural Network (RNN)?

A deep learning model used for:
Machine Translation, Speech Recognition, Language Modeling, …

Page 4:

[Figure: an RNN block mapping an input to an output through its hidden state.]

A 2-layer RNN

A vanilla RNN takes an input $\boldsymbol{x}^{(t)}$ and updates its hidden state from $\boldsymbol{h}^{(t-1)}$ to $\boldsymbol{h}^{(t)}$ using:

$\boldsymbol{h}^{(t)} = \tanh\big(\boldsymbol{W}\boldsymbol{h}^{(t-1)} + \boldsymbol{V}\boldsymbol{x}^{(t)}\big)$
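To make the recurrence concrete, here is a minimal NumPy sketch of the update above; the dimensions and random weights are illustrative only, not the ones used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_input = 8, 5            # illustrative sizes, not from the talk

W = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden-to-hidden weights
V = rng.normal(scale=0.1, size=(n_hidden, n_input))   # input-to-hidden weights

def rnn_step(h_prev, x_t):
    """One vanilla RNN update: h(t) = tanh(W h(t-1) + V x(t))."""
    return np.tanh(W @ h_prev + V @ x_t)

# Unroll the cell over a short input sequence.
h = np.zeros(n_hidden)
for x_t in rng.normal(size=(4, n_input)):   # four time steps
    h = rnn_step(h, x_t)
print(h.shape)  # (8,)
```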

[Figure: the RNN unrolled over time, with inputs x(1), …, x(4) and outputs y(1), …, y(4).]

Introduction: What is a Recurrent Neural Network (RNN)?

[Figure: a 2-layer RNN, stacking two recurrent cells with hidden states h1(t) and h2(t) between input x(t) and output y(t).]

Page 5:

What has the RNN learned from data?

[Figure: input → RNN → output?]

Page 6:

A. Map the values of a single hidden unit over the data (Karpathy A. et al., 2015)

Motivation: What has the RNN learned from data?

A unit sensitive to position in a line.

Many more units have no clear meaning.

Page 7:

B. Matrix plots (Li J. et al., 2016)

Each column represents the value of the hidden state vector when the model reads an input word.

Motivation: What has the RNN learned from data?

Scalability!
Machine Translation: 4-layer, 1000 units/layer (Sutskever I. et al., 2014)
Language Modeling: 2-layer, 1500 units/layer (Zaremba et al., 2015)

Page 8:

Our Solution - RNNVis

Page 9:

Our Solution
• Explaining individual hidden units
• Bi-graph and co-clustering
• Sequence evaluation

Page 10:

Solution: Explaining an individual hidden unit using its most salient words

How do we define salience?

The model's response to a word $w$ at step $t$ is the update of the hidden state $\Delta\boldsymbol{h}^{(t)}$:

$\Delta\boldsymbol{h}^{(t)} = \big(\Delta h_i^{(t)}\big),\quad i = 1, \dots, n.$

A larger $|\Delta h_i^{(t)}|$ implies that the word $w$ is more salient to unit $i$.

Since $\Delta h_i^{(t)}$ can vary given the same word $w$, we use the expectation

$\mathbb{E}\big[\Delta\boldsymbol{h}^{(t)} \mid w^{(t)} = w\big],$

which can be estimated by running the model on the dataset and taking the mean.
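A minimal sketch of how this expectation could be estimated by averaging over a corpus; the `run_rnn` helper and the token handling here are hypothetical stand-ins for whatever model interface is actually available.

```python
from collections import defaultdict
import numpy as np

def expected_response(corpus, run_rnn):
    """Estimate E[Δh(t) | w(t) = w] by averaging hidden-state updates per word.

    corpus:  an iterable of token lists.
    run_rnn: hypothetical helper returning the hidden states h(1..T)
             for a token list as an array of shape (T, n_hidden).
    """
    sums = defaultdict(float)      # word -> running sum of Δh (becomes an array)
    counts = defaultdict(int)      # word -> number of occurrences
    for tokens in corpus:
        h = run_rnn(tokens)                              # (T, n_hidden)
        h_prev = np.vstack([np.zeros((1, h.shape[1])), h[:-1]])
        deltas = h - h_prev                              # Δh(t) at every step
        for w, d in zip(tokens, deltas):
            sums[w] = sums[w] + d
            counts[w] += 1
    return {w: sums[w] / counts[w] for w in sums}
```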

Page 11:

Solution: Explaining an individual hidden unit using its most salient words

[Figure: top 4 positive and negative salient words of unit #36 in an RNN (GRU) trained on Yelp review data; boxes show the 25%–75% and 9%–91% ranges of the response.]

Page 12:

Solution: Explaining an individual hidden unit using its most salient words

Distribution of the model's response given the word "he", with units reordered according to the mean (an LSTM with 600 units).

[Figure: per-unit response distributions (mean, 25%–75%, and 9%–91% ranges) over the unit index, highlighting the highly responsive hidden units.]

Page 13:

Solution: Explaining an individual hidden unit using its most salient words

Investigating one unit/word at a time…

Problem: too much user burden!

Solution: an overview for easier exploration.

Page 14:

Solution
• Explaining individual hidden units
• Bi-graph and co-clustering
• Sequence evaluation

Page 15:

Solution: Bi-graph Formulation

[Figure: a bipartite graph between hidden units and words ("he", "she", "by", "can", "may"), with an edge for each unit–word response.]

Page 16:

Solution: Bi-graph Formulation


Page 17:

Solution: Co-clustering

[Figure: the bipartite graph co-clustered into groups of hidden units and groups of words.]

Algorithm: spectral co-clustering (Dhillon I. S., 2001)
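The bi-graph can be co-clustered with off-the-shelf spectral co-clustering; a sketch using scikit-learn, with a random stand-in response matrix in place of the real expected responses.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Rows = hidden units, columns = words; entries = |expected response|.
# Random stand-in data; in RNNVis this matrix comes from E[Δh | w].
rng = np.random.default_rng(0)
response = np.abs(rng.normal(size=(600, 200)))

model = SpectralCoclustering(n_clusters=10, random_state=0)
model.fit(response)

unit_clusters = model.row_labels_     # cluster id for each hidden unit
word_clusters = model.column_labels_  # cluster id for each word
```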

Page 18:

Solution: Co-clustering – Edge Aggregation

[Figure: edges aggregated between hidden unit clusters and word clusters.]

Color: sign of the average edge weight
Width: scale of the average edge weight
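One way to compute that aggregation is to average the signed edge weights inside each cluster pair, then map the sign to color and the magnitude to width; a minimal sketch under that assumption.

```python
import numpy as np

def aggregate_edge(response, unit_ids, word_ids):
    """Average edge weight between a hidden-unit cluster and a word cluster.

    response: (n_units, n_words) signed response matrix (e.g. E[Δh | w]).
    Returns (sign, magnitude): sign drives the color, magnitude the width.
    """
    block = response[np.ix_(unit_ids, word_ids)]   # edges within the cluster pair
    mean_w = block.mean()
    return np.sign(mean_w), abs(mean_w)
```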

Page 19:

Solution: Co-clustering - Visualization

[Figure: word clusters ("he"/"she", "by"/"can", "may") linked to hidden unit clusters.]

Page 20:

Solution: Co-clustering - Visualization

[Figure: hidden unit clusters rendered as memory chips and word clusters rendered as word clouds.]

Color: each unit’s salience to the selected word

Page 21:

Solution
• Explaining individual hidden units
• Bi-graph and co-clustering
• Sequence evaluation

Page 22:

Solution: Glyph design for evaluating sentences

Each glyph summarizes the dynamics of the hidden unit clusters when reading a word.
• Each bar represents the average scale of the values in a hidden unit cluster: the current value, the increased value, and the decreased value.
• The ratio of preserved value: more positive values are preserved vs. more negative values are preserved.
• The direction of the update: towards positive vs. towards negative.
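As a rough illustration of the quantities such a glyph could encode, here is one plausible set of per-cluster summaries, assuming access to the cluster's hidden values before and after reading the word; the exact statistics used in RNNVis may differ.

```python
import numpy as np

def glyph_stats(h_prev, h_curr):
    """One plausible set of summaries for a hidden-unit cluster reading a word.

    h_prev, h_curr: 1-D arrays of the cluster's hidden values at steps t-1 and t.
    """
    preserved = np.sign(h_curr) == np.sign(h_prev)        # values whose sign survives
    return {
        "bar_height": np.mean(np.abs(h_curr)),            # average scale of the values
        "preserved_ratio": preserved.mean(),              # ratio of preserved value
        "preserved_sign": np.sign(h_curr[preserved].sum()) if preserved.any() else 0.0,
        "update_direction": np.mean(h_curr - h_prev),     # >0: update towards positive
    }
```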

Page 23:

Case Studies
• How do RNNs handle sentiments?
• The language of Shakespeare

Page 24:

Case Study – Sentiment Analysis: Each unit has two sides
Single-layer GRU with 50 hidden units (cells), trained on Yelp review data

Page 25:

Case Study – Sentiment Analysis: RNNs can learn to handle the context
Single-layer GRU with 50 hidden units (cells), trained on Yelp review data

Sentence A: I love the food, though the staff is not helpful
Sentence B: The staff is not helpful, though I love the food

[Figure: glyph sequences for sentences A and B, showing updates towards positive and towards negative and the resulting positive/negative sentiment.]

Page 26:

Case Study – Sentiment Analysis: Clues for the problem
Single-layer GRU with 50 hidden units (cells), trained on Yelp review data.

Problem: the data is not evenly sampled.

Page 27:

Case Study – Sentiment Analysis: Visual indicator of the performance
Single-layer GRUs with 50 hidden units (cells), trained on Yelp review data.

Balanced Dataset: accuracy (test) 88.6%
Unbalanced Dataset: accuracy (test) 91.9%

Page 28:

Case Studies
• How do RNNs handle sentiments?
• The language of Shakespeare

Page 29:

Case Study – Language Modeling
The language of Shakespeare – A mixture of the old and the new

Page 30:

Case Study – Language Modeling
The language of Shakespeare – A mixture of the old and the new

Page 31:

Discussion & Future Work
• Clustering. The quality of co-clustering? Interactive clustering?
• Glyph-based sentence visualization. Scalability?
• Text data. How about speech data?
• RNN models. More advanced RNN-based models like attention models?

Page 32:

Thank you!
Contact: Yao Ming, [email protected]
www.myaooo.com/rnnvis
Code: www.github.com/myaooo/rnnvis

Page 33:

Technical Details: Explaining individual hidden units - Decomposition

The output of an RNN at step $t$ is typically a probability distribution:

$p_i = \mathrm{softmax}\big(\boldsymbol{U}\boldsymbol{h}^{(t)}\big)_i = \dfrac{\exp\big(\boldsymbol{u}_i^{\top}\boldsymbol{h}^{(t)}\big)}{\sum_j \exp\big(\boldsymbol{u}_j^{\top}\boldsymbol{h}^{(t)}\big)}$

where $\boldsymbol{U} = [\boldsymbol{u}_i^{\top}],\ i = 1, 2, \dots, n$, is the output projection matrix.

The numerator of $p_i$ can be decomposed as:

$\exp\big(\boldsymbol{u}_i^{\top}\boldsymbol{h}^{(t)}\big) = \exp\Big(\sum_{\tau=1}^{t} \boldsymbol{u}_i^{\top}\big(\boldsymbol{h}^{(\tau)} - \boldsymbol{h}^{(\tau-1)}\big)\Big) = \prod_{\tau=1}^{t} \exp\big(\boldsymbol{u}_i^{\top}\Delta\boldsymbol{h}^{(\tau)}\big)$

Here $\exp\big(\boldsymbol{u}_i^{\top}\Delta\boldsymbol{h}^{(\tau)}\big)$ is the multiplicative contribution of the input word $w^{(\tau)}$, so the update of the hidden state $\Delta\boldsymbol{h}^{(\tau)}$ can be regarded as the model's response to $w^{(\tau)}$.
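A small numerical check of the telescoping decomposition above, using random vectors and $\boldsymbol{h}^{(0)} = \boldsymbol{0}$ as assumed in the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_hidden = 5, 8
u_i = rng.normal(size=n_hidden)                   # one row u_i of the projection U
h = np.vstack([np.zeros(n_hidden),                # h(0) = 0
               rng.normal(size=(T, n_hidden))])   # h(1..T)

lhs = np.exp(u_i @ h[T])                          # exp(u_i^T h(T))
delta = h[1:] - h[:-1]                            # Δh(τ), τ = 1..T
rhs = np.prod(np.exp(delta @ u_i))                # ∏ exp(u_i^T Δh(τ))
assert np.isclose(lhs, rhs)                       # the two sides agree
```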

Page 34:

Evaluation: Expert Interview

1. Show a tutorial video
2. Explore the tool
3. Compare two models
4. Answer questions
5. Finish a survey

Page 35:

Challenges: What are the challenges?

1. The complexity of the model
• Machine Translation: 4-layer LSTMs, 1000 units/layer (Sutskever I. et al., 2014)
• Language Modeling: 2-layer LSTMs, 650 or 1500 units/layer (Zaremba et al., 2015)

2. The complexity of the hidden memory
• Semantic information is distributed in the hidden states of an RNN.

3. The complexity of the data
• Patterns in sequential data like texts are difficult to analyze and interpret.

Page 36:

Other Findings: Comparing LSTMs and vanilla RNNs

Left (A-C): co-cluster visualization of the last layer of an RNN. Right (D-F): visualization of the cell states of the last layer of an LSTM. Bottom (G, H): the two models' responses to the same word "offer".

Page 37:

Contribution
• A visual technique for understanding what RNNs have learned.
• A VA tool that ablates the hidden dynamics of a trained RNN.
• Interesting findings with RNN models.