CIS 522: Lecture 12
RNNs and LSTMs
Penn Engineering, 2/25/2020
Feedback
HW2 took a mean of 25h. Will somewhat shorten it next year.
Some did it in <5h. How?
HW3 is, as far as I know, the last long HW.
We continue to get “more code”, “more examples”, and “more theory” requests. Next year: fewer guest lectures.
Exam + Projects
Today's Agenda
● Some neuroscience background
● Review of NLP and variable-length problems
● RNNs
○ Backpropagation through time (BPTT)
● LSTMs
○ Motivation
○ Forget gates
● BiLSTMs
○ Motivation
○ Architecture
Neuroscience: The ubiquity of memory across timescales
● PPF (paired-pulse facilitation): Zhang et al. 2015
● Ca2+ spikes: Larkum
● Recurrent activity
● Plasticity
Working memory
Pencil, Automobile, Evil
Working memory
Graphic, Element, Mediocre, Steel, Plate, Nociception, Memory
Working memory
Random, Bottle, Phone, Overeager, Card, Plant, Chimney, Tree, House, Roof
A multitude of memory elements
● Working memory
● Episodic memory
● Procedural memory
● Declarative memory
Hippocampus
Thoughts have persistence over time. How can we capture this with an architecture?
The API for feedforward nets is too restrictive.
Fully connected network
PollEV: How does the number of parameters scale as we increase the relevant time horizon?
What if we use a deep convnet?
● Fixed stride
● Fixed filter size on each layer
Does it help?
RNNs: We'd like to have a notion of time and maintain state.
We can't learn a new function for each timestep! How do we simplify our architecture?
We can't learn a new function for each timestep! How do we simplify our architecture?
The way we think doesn't change from moment to moment.
We share weights across time.
Recurrent neural networks (RNNs)
● Prior: the function modifying the state of a thought is invariant to temporal shifts.
PollEV: How many parameters?
RNNs typically have (at least) 3 weight tensors (U, W, V) and 2 biases.
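The slide's diagram is not in the transcript, but the update it describes can be sketched in NumPy. This is an illustrative vanilla RNN step, not the lecture's exact code; the weight names U (input-to-hidden), W (hidden-to-hidden), V (hidden-to-output) follow the slide, while the dimensions and initialization are made up:

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, V, b_h, b_y):
    """One step of a vanilla RNN: update the hidden state, emit an output."""
    h_t = np.tanh(U @ x_t + W @ h_prev + b_h)  # state update, squashed by tanh
    y_t = V @ h_t + b_y                        # readout from the hidden state
    return h_t, y_t

# Toy dimensions: input size 4, hidden size 8, output size 3.
rng = np.random.default_rng(0)
d_in, d_h, d_out = 4, 8, 3
U = rng.normal(size=(d_h, d_in))    # input-to-hidden
W = rng.normal(size=(d_h, d_h))     # hidden-to-hidden (shared across time)
V = rng.normal(size=(d_out, d_h))   # hidden-to-output
b_h, b_y = np.zeros(d_h), np.zeros(d_out)

# Unroll over a length-5 sequence; the SAME (U, W, V) are reused each step.
h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):
    h, y = rnn_step(x, h, U, W, V, b_h, b_y)

# Parameter count is independent of sequence length:
n_params = U.size + W.size + V.size + b_h.size + b_y.size
```

Because the weights are shared across time, the answer to the PollEV question is that the parameter count does not grow with the time horizon at all.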
Activation functions
● Usually tanh
● Want to keep activations from exploding
What does the compute tree look like for an RNN?
What do we want to produce?
● A vector (see last lecture)
● Text output
● Text translations
● Video translations
● Pose tracking
● etc.
Sequence-to-Sequence (Seq2Seq) Learning
Example application
Source: Towards Data Science
The exact unrolled architecture depends on the learning problem.
Variable-length input, single output at the end.
Variable-length input, simultaneous outputs.
Teacher forcing helps learn complex patterns concurrently.
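A minimal sketch of the idea (the toy decoder, vocabulary size, weights, and the start token 0 are all invented for illustration): during training, each step is conditioned on the ground-truth previous token rather than the model's own output, so later steps can be learned before earlier ones are correct; at inference the model feeds back its own predictions.

```python
import numpy as np

rng = np.random.default_rng(1)
d_h, vocab = 8, 5
W_h = rng.normal(size=(d_h, d_h)) * 0.1
W_e = rng.normal(size=(d_h, vocab)) * 0.1   # embeds the previous token (one-hot)
W_o = rng.normal(size=(vocab, d_h)) * 0.1   # readout logits

def decoder_step(prev_token, h):
    x = np.eye(vocab)[prev_token]           # one-hot embed the previous token
    h = np.tanh(W_h @ h + W_e @ x)
    logits = W_o @ h
    return h, logits

target = [2, 4, 1, 3]                       # ground-truth output sequence

# Teacher forcing (training): condition each step on the TRUE previous token.
h = np.zeros(d_h)
forced_inputs = [0] + target[:-1]           # 0 = hypothetical start token
for prev in forced_inputs:
    h, logits = decoder_step(prev, h)
    # a loss would compare `logits` against the matching target token here

# Free running (inference): feed back the model's own prediction instead.
h = np.zeros(d_h)
prev, generated = 0, []
for _ in range(len(target)):
    h, logits = decoder_step(prev, h)
    prev = int(np.argmax(logits))
    generated.append(prev)
```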
Backpropagation through time (BPTT)
RNNs are awful at learning long-term patterns.
What will the gradients do?
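We can make this concrete with a scalar RNN, h_t = tanh(w * h_{t-1}). By the chain rule, dh_T/dh_0 is a product of one Jacobian factor w * (1 - h_t^2) per timestep (since tanh'(a) = 1 - tanh(a)^2). Each factor has magnitude at most |w|, so with |w| < 1 the gradient vanishes exponentially in T. A numerical sketch (assumed toy values, not lecture code):

```python
import numpy as np

def grad_through_time(w, h0, T):
    """Return dh_T/dh_0 for the scalar RNN h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(T):
        h = np.tanh(w * h)
        grad *= w * (1.0 - h ** 2)  # one Jacobian factor per timestep
    return grad

# With |w| < 1 the gradient shrinks exponentially with the horizon T.
g_short = abs(grad_through_time(w=0.5, h0=0.5, T=5))
g_long = abs(grad_through_time(w=0.5, h0=0.5, T=50))
```

With |w| > 1 the product can instead explode, until tanh saturation (the 1 - h^2 factor) kills it; either way, long-range credit assignment is hard.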
Which properties do we want memory in an RNN to have?
● Store information for arbitrary duration
● Be resistant to noise
● Be trainable
A hard task for this
● A relevant sequence (length L)
● Followed by an irrelevant sequence (length T >> L)
● Answer at the end
Intuition: figure out the relevant information, then keep it. Trivial solution: use only one tanh neuron! Latch to +/- 1.
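The latch can be simulated directly. A sketch with assumed values: a single tanh neuron with strong self-excitation (self-weight w = 3, chosen for illustration) updated as h = tanh(w*h + x). A brief relevant pulse sets the sign, and the state then stays pinned near +/- 1 through a long tail of small irrelevant noise inputs:

```python
import numpy as np

def latch(inputs, w=3.0):
    """Single tanh neuron with self-excitation: h <- tanh(w*h + x)."""
    h = 0.0
    for x in inputs:
        h = np.tanh(w * h + x)
    return h

rng = np.random.default_rng(0)
noise = 0.1 * rng.standard_normal(100)   # long irrelevant tail (T >> L)

h_pos = latch([+2.0, *noise])            # brief relevant pulse, then noise
h_neg = latch([-2.0, *noise])            # opposite pulse latches the other way
```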
Dynamics and failure
Let us think about the gradients
● We want the irrelevant input to have little influence on the stored state
● What does that mean for the gradient? It must be (near) zero with respect to those inputs.
LSTMs
Leaning on Colah’s blog: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNN
LSTM
Stepping through the LSTM: delete if it makes sense
Say the cell should store whether the last image I saw was a puppy.
Step 1: forget if there is a new subject
Write a new value if it makes sense
Tanh roughly binarizes the to-be-stored values, making them even more resilient.
Step 2: write if there is a new subject
Commit to memory
Step 3: commit to hidden state
Get output hidden state
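Putting the steps together, a minimal NumPy sketch of one LSTM step, following the standard formulation in Colah's walkthrough that these slides lean on (weight names, sizes, and initialization are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, P):
    """One LSTM step: forget, write, then expose a gated view of memory."""
    z = np.concatenate([h_prev, x_t])          # all gates read [h_{t-1}, x_t]
    f = sigmoid(P["Wf"] @ z + P["bf"])         # step 1: forget gate (delete)
    i = sigmoid(P["Wi"] @ z + P["bi"])         # step 2: input gate (write)
    c_tilde = np.tanh(P["Wc"] @ z + P["bc"])   #   candidate values, ~binarized by tanh
    c_t = f * c_prev + i * c_tilde             # step 3: update the cell memory
    o = sigmoid(P["Wo"] @ z + P["bo"])         # output gate
    h_t = o * np.tanh(c_t)                     # gated view of memory as hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
P = {}
for g in ("f", "i", "c", "o"):                 # one weight matrix + bias per gate
    P["W" + g] = rng.normal(size=(d_h, d_h + d_in)) * 0.1
    P["b" + g] = np.zeros(d_h)

h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):
    h, c = lstm_step(x, h, c, P)
```

Note that the cell state c_t is updated additively (f * c_prev + i * c_tilde), which is exactly what lets gradients flow through long spans when f is near 1.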
Variant: Look at memory
Why not also directly see the memory?
Gers and Schmidhuber 2000
Variant: Couple read and write
Couple delete and write, but then the cell can no longer decay its contents independently!
Gated Recurrent Unit
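The slide's diagram is not in the transcript; a NumPy sketch of one GRU step (standard formulation, illustrative sizes and weights). The update gate z plays both the forget and write roles at once, which is the coupling discussed on the previous slide:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, P):
    """One GRU step: the update gate z couples 'forget' and 'write'."""
    v = np.concatenate([h_prev, x_t])
    z = sigmoid(P["Wz"] @ v + P["bz"])           # update gate
    r = sigmoid(P["Wr"] @ v + P["br"])           # reset gate
    v_r = np.concatenate([r * h_prev, x_t])      # gated read of the old state
    h_tilde = np.tanh(P["Wh"] @ v_r + P["bh"])   # candidate state
    return (1 - z) * h_prev + z * h_tilde        # keep vs. overwrite is one knob

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
P = {}
for g in ("z", "r", "h"):
    P["W" + g] = rng.normal(size=(d_h, d_h + d_in)) * 0.1
    P["b" + g] = np.zeros(d_h)

h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):
    h = gru_step(x, h, P)
```

There is no separate cell state, so the GRU has fewer parameters than an LSTM of the same hidden size.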
Success stories of LSTMs
● SOTA in precipitation nowcasting (2016)
● SOTA in named entity recognition (2016)
● Neural decoding (spykesML)
● Countless others
Above all: easy to train. Crazy useful for just about anything with time.
BiLSTMs
Architecture of a biLSTM
BiLSTM performance
● In practice, performs better and trains faster than a vanilla LSTM
● Shorter paths to much of the information; shorter paths mean better gradients
● There is speculation as to why exactly it is better (e.g. that it weights past and future data equally), but no one really knows.
Success stories
● SOTA in POS tagging 2015 (Bidirectional LSTM-CRF Models for Sequence Tagging)
● SOTA in speech recognition 2013
Above all: it is reasonably fast to train. A workhorse for countless research applications.
InferSent
● Achieved good results in:
○ Semantic Textual Similarity
○ Paraphrase detection
○ Caption-image retrieval
● Outperforms newer, more sophisticated models like BERT on tasks such as Semantic Textual Similarity
InferSent architecture
Fine-tuning InferSent for Natural Language Inference
Classes: entailment, contradiction, and neutral
Embeddings from Language Models (ELMo)
• A BiLSTM developed in 2018 that generates word embeddings based on the context a word appears in
• Achieved state of the art in many NLP tasks, including:
– Question Answering
– Sentiment Analysis (e.g., for movie reviews)
• Has since been surpassed by BERT-like architectures, which we will cover soon!
ELMo architecture
What we learned today
● Discussed modern NLP for variable-length problems
● RNNs
○ Architecture
○ Backpropagation through time (BPTT)
● LSTMs
○ Motivation
○ Forget gates
● BiLSTMs
○ Motivation
○ Architecture
PyTorch implementation
LSTM
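The slide's code is not preserved in the transcript; a minimal `torch.nn.LSTM` usage along these lines (the batch size, sequence length, and layer sizes below are illustrative):

```python
import torch
import torch.nn as nn

# Batch of 3 sequences, length 7, 10 input features, hidden size 20, 2 layers.
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

x = torch.randn(3, 7, 10)       # (batch, time, features) with batch_first=True
out, (h_n, c_n) = lstm(x)       # out: top-layer hidden state at every timestep

print(out.shape)                # torch.Size([3, 7, 20])
print(h_n.shape)                # torch.Size([2, 3, 20]) -- final state per layer
```

`h_n` and `c_n` hold the final hidden and cell states of each layer, which is what you feed forward when processing a stream in chunks.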
biLSTM
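Again the slide's code is missing; a minimal bidirectional variant (illustrative sizes). In PyTorch this is just `bidirectional=True`: a forward and a backward LSTM run over the input, and their hidden states are concatenated at every timestep:

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1,
                 batch_first=True, bidirectional=True)

x = torch.randn(3, 7, 10)       # (batch, time, features)
out, (h_n, c_n) = bilstm(x)

# The feature dimension doubles: 20 forward + 20 backward.
print(out.shape)                # torch.Size([3, 7, 40])
print(h_n.shape)                # torch.Size([2, 3, 20]) -- one final state per direction
```

Note that at each timestep `out` already contains information from both the past and the future of the sequence, which is the short-path property discussed above.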