Neural Networks and applications to natural language processing
John Arevalo
MindLAB Research Group - Universidad Nacional de Colombia
June 21, 2018
John Arevalo Neural Networks and applications to natural language processing
Outline
1 Neural Networks
  Introduction
  Interactive demo
  Neural Network Training
2 Feature extraction and Learning
  Feature extraction
  Feature learning
  Neural Network Types
3 Convolutional neural networks
  Operations
  CNN For NLP
4 Language modeling with recurrent neural networks
  Recurrent neural networks
  Long short-term memory networks
  Variant
  Some applications
  Resources
Neural Networks
Inspired by nature (the brain)
Simple processing units but many of them and highly interconnected
Distributed processing and memory
Learn from data samples
Interactive demo
Quick and dirty introduction to neural networks
Learning as optimization
Problem definition: from $x$, predict $y$, i.e. find a function $f$ such that $f(x) \to y$.
$H$: hypothesis space, $D$: training data
$$D = \{(x_1, y_1), \ldots, (x_\ell, y_\ell)\}$$
$f_\Theta \in H$ denotes a function $f$ parametrized by $\Theta$. To measure how well $f_\Theta(x)$ estimates $y$, a loss $L$ is defined.
General optimization problem:
$$\min_{f \in H} L(f, D)$$
Let $\hat{y} = f_\Theta(x)$ be the prediction made by $f_\Theta$ given $x$ as input. Squared loss is an example:
$$L(f_\Theta, D) = E(\Theta, D) = \sum_{i=1}^{\ell} \|y_i - \hat{y}_i\|_2^2$$
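As a minimal sketch of the squared loss above, assume a hypothetical linear model f(x) = theta . x + bias with scalar targets (the slides do not fix a model family). The names `predict`, `squared_loss` and the toy dataset are illustrative only.

```python
def predict(theta, bias, x):
    """Hypothetical linear model: f_theta(x) = theta . x + bias."""
    return sum(t * xi for t, xi in zip(theta, x)) + bias


def squared_loss(theta, bias, data):
    """Sum over the dataset of (y_i - y_hat_i)^2 for scalar targets."""
    return sum((y - predict(theta, bias, x)) ** 2 for x, y in data)


# Toy dataset D = {(x_i, y_i)} generated by y = 2 * x + 1 (illustrative values).
D = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0)]

perfect = squared_loss([2.0], 1.0, D)  # parameters that fit D exactly
poor = squared_loss([0.0], 0.0, D)     # every prediction is 0
```

The loss is zero for the parameters that generated the data and grows as the predictions move away from the targets, which is what makes it usable as an optimization objective.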
Other loss functions
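L1 loss:
$$E(\Theta, D) = \sum_{i=1}^{\ell} \|y_i - \hat{y}_i\|_1$$
Cross-entropy loss:
$$E(\Theta, D) = -\ln \prod_{i=1}^{\ell} p(y_i \mid x_i, \Theta) = -\sum_{i=1}^{\ell} \left[ y_i \ln \hat{y}_i + (1 - y_i) \ln(1 - \hat{y}_i) \right]$$
Hinge loss:
$$E(\Theta, D) = \sum_{i=1}^{\ell} \max(0, 1 - y_i \hat{y}_i)$$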
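The three losses above can be sketched for scalar targets as follows. The function names and the clipping constant `eps` are assumptions added for illustration; the cross-entropy version assumes binary targets in {0, 1} and the hinge version assumes targets in {-1, +1}.

```python
import math


def l1_loss(targets, preds):
    """Sum of absolute errors |y_i - y_hat_i|."""
    return sum(abs(y - p) for y, p in zip(targets, preds))


def cross_entropy_loss(targets, preds, eps=1e-12):
    """Binary cross-entropy; targets in {0, 1}, preds are probabilities."""
    total = 0.0
    for y, p in zip(targets, preds):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total


def hinge_loss(targets, scores):
    """Hinge loss; targets in {-1, +1}, scores are real-valued predictions."""
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(targets, scores))
```

Note that the hinge loss is zero for any example classified correctly with a margin of at least 1, while the cross-entropy loss keeps rewarding more confident correct predictions.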
Optimization by Gradient descent
$$\Theta_{t+1} = \Theta_t - \eta_t \nabla_\Theta E(\Theta_t)$$
$$\nabla_\Theta E(\Theta) = \frac{\partial E(\Theta)}{\partial \Theta}$$
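A minimal sketch of the update rule, assuming a one-parameter model y = w * x trained with the squared loss and a constant learning rate (the slides allow a step-dependent rate). The names `gradient`, `gradient_descent` and the toy data are hypothetical.

```python
def gradient(w, data):
    """dE/dw for E(w) = sum_i (y_i - w * x_i)^2."""
    return sum(-2.0 * x * (y - w * x) for x, y in data)


def gradient_descent(data, w0=0.0, eta=0.05, steps=100):
    """Repeat the update w <- w - eta * dE/dw with a constant learning rate."""
    w = w0
    for _ in range(steps):
        w = w - eta * gradient(w, data)
    return w


data = [(1.0, 3.0), (2.0, 6.0)]  # generated by y = 3 * x
w_hat = gradient_descent(data)   # approaches 3.0
```

For this quadratic objective the learning rate must stay below a threshold set by the curvature; a much larger `eta` would make the iterates diverge instead of converge.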
Backpropagation [Rumelhart, Hinton, Williams, 1986]
Efficient strategy to calculate the gradient.
Errors are back-propagated through the network to assign 'responsibility' to each neuron ($\delta_i$)
Gradient is calculated based on delta values.
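The delta-based computation can be sketched on a tiny network with one sigmoid hidden unit and one linear output unit. The architecture, the 0.5 factor in the loss and all names are assumptions chosen to keep the example short.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """One sigmoid hidden unit followed by a linear output unit."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    y_hat = w2 * h + b2
    return h, y_hat


def loss(params, x, y):
    """E = 0.5 * (y_hat - y)^2 (the 0.5 factor is a convenience assumption)."""
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2


def backprop(params, x, y):
    """Return dE/d(w1, b1, w2, b2) computed from the delta values."""
    w1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    delta_out = y_hat - y                          # error at the output unit
    delta_hidden = delta_out * w2 * h * (1.0 - h)  # error propagated back
    return [delta_hidden * x, delta_hidden, delta_out * h, delta_out]
```

Each gradient entry is a delta multiplied by the input feeding that weight, so a single backward pass yields all partial derivatives. Comparing the result against finite differences of `loss` is a common way to check such an implementation.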
Backpropagation [Rumelhart, Hinton, Williams, 1986]
Inside a neuron
Backpropagation [Rumelhart, Hinton, Williams, 1986]
A bigger network can be composed of thousands of operations
Feature extraction and Learning
Feature extraction
Features
Features represent our prior knowledge of the problem
Depend on the type of data
Specialized features for practically any kind of data (images, video, sound, speech, text, web pages, etc.)
Medical imaging:
  Standard computer vision features (color, shape, texture, edges, local-global, etc.)
  Specialized features tailored to the problem at hand
New trend: learning features from data
Feature learning
Feature learning approaches
Unsupervised feature learning
Convolutional neural networks
Recurrent neural networks
Unsupervised feature learning
Deep feed-forward neural networks
ImageNet 2012 [Krizhevsky, Sutskever, Hinton 2012]
(source: ICML 2013 Deep Learning Tutorial, Yann LeCun et al.)
Histopathology basal cell carcinoma
Tumor
Non-tumor
Convolutional Autoencoder for Histopathology Image Representation Learning
Digital staining results
Cancer Cancer Cancer Non-cancer Non-cancer Non-cancer
Cancer Cancer Cancer Non-cancer Non-cancer Non-cancer
0.8272 0.9604 0.7944 0.2763 0.0856 0.0303
TICA learned features
Feature learning for natural language data
But what about text?
Neural networks are a hot topic in NLP nowadays:
  "NN language models and word embeddings were everywhere at NAACL2015 and ACL2015" C. Manning.
Many successful applications:
  Speech recognition
  Language modeling
  Translation
  Image captioning
Practical considerations
Traditional backpropagation does not work well with multiple layers
It gets stuck in local minima
In recent years, several strategies have been developed or discovered (tricks of the trade):
  Stochastic gradient descent with minibatches and adaptive learning rate
  Logistic regression/softmax for classification
  Normalization of input variables, shuffling of training samples
  Regularization using L1 and L2 norms and dropout
Implementation
Use of GPUs is mandatory (speed-up > 100x)
Sometimes combined with distributed processing
Practically all the libraries use CUDA
Several higher-level frameworks:
  NVIDIA CUDA Deep Neural Network library (cuDNN)
  Caffe
  Torch
  Theano
  Blocks
  Etc.
Types
Feed-forward, multilayer perceptrons
Convolutional neural networks
Recurrent neural networks
Convolutional neural networks
Convolution
Pooling
Convolutional neural networks
Repeat this conv->pooling operation multiple times.
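One conv->pooling stage can be sketched in plain Python as below. This assumes no padding, stride 1, non-overlapping 2x2 max pooling, and the cross-correlation form of convolution (no kernel flip) that CNN libraries commonly use. The image and filter values are illustrative.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 cross-correlation of a 2D image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


image = [
    [1, 0, 2, 1, 0],
    [0, 1, 3, 0, 1],
    [2, 1, 0, 1, 2],
    [1, 0, 1, 3, 0],
    [0, 2, 1, 0, 1],
]
kernel = [[1, 0], [0, 1]]  # illustrative 2x2 filter

feature_map = conv2d(image, kernel)  # 4x4 feature map
pooled = max_pool2d(feature_map)     # 2x2 map after pooling
```

Stacking several such stages shrinks the spatial size while each remaining value summarizes a progressively larger region of the input.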
Interactive demo
CNN for text classification...
Convolution
Use word embeddings to represent words. Each row represents a word.
Pooling
CNN For NLP
Repeat conv->pool operations multiple times. Multiple filters can be used in the same layer.
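For text, a filter typically spans a few consecutive words and the full embedding width, so it slides only along the word axis. The sketch below assumes that setup followed by max-over-time pooling, one common choice for sentence classification. The embeddings and filters are hypothetical toy values.

```python
def conv_text(sentence, filt):
    """Slide a filter spanning len(filt) consecutive words over the sentence."""
    h = len(filt)
    return [
        sum(sentence[i + a][d] * filt[a][d]
            for a in range(h) for d in range(len(filt[0])))
        for i in range(len(sentence) - h + 1)
    ]


def max_over_time(feature_map):
    """Keep the strongest response of a filter, wherever it occurs."""
    return max(feature_map)


# Hypothetical 2-dimensional embeddings for a 4-word sentence (one row per word).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# Two illustrative filters, each covering 2 consecutive words.
filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

features = [max_over_time(conv_text(sentence, f)) for f in filters]
```

Each filter contributes one number to `features`, so the sentence ends up with a fixed-length representation regardless of how many words it contains.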
Language modeling with recurrent neural networks
Recurrent neural network
Neural networks with memory
Feed-forward NN: output depends exclusively on the current input
Recurrent NN: output depends on the current and previous states
This is accomplished through lateral/backward connections which carry information while processing a sequence of inputs
(source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
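The dependence on previous states can be sketched with a simple recurrent update, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b). This is a common textbook form and an assumption here, since the slides do not spell out the equations. The weights and names are illustrative.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b):
    """h_t = tanh(W_xh x_t + W_hh h_{t-1} + b) for plain Python lists."""
    h = []
    for k in range(len(b)):
        z = b[k]
        z += sum(W_xh[k][j] * x[j] for j in range(len(x)))
        z += sum(W_hh[k][j] * h_prev[j] for j in range(len(h_prev)))
        h.append(math.tanh(z))
    return h


def run_rnn(inputs, W_xh, W_hh, b):
    """Process a sequence, carrying the hidden state from step to step."""
    h = [0.0] * len(b)
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b)
    return h


# Illustrative weights: 1 input feature, 2 hidden units.
W_xh = [[0.5], [-0.5]]
W_hh = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]

h_ab = run_rnn([[1.0], [0.0]], W_xh, W_hh, b)
h_ba = run_rnn([[0.0], [1.0]], W_xh, W_hh, b)
```

The two sequences contain the same inputs in a different order and end in different hidden states, which a feed-forward network applied to each input separately could not distinguish.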
Neural Net Language Model
Problem: predict the next word given the previous 3 words (4-gram language model)
The matrix U corresponds to the word vector representation of the words.
Bengio, Y., Ducharme, R., Vincent, P., & Janvin, C. (2003). A neural probabilistic language model. The Journal of Machine Learning Research, 3, 1137-1155.
Character-level language model
(source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
Sequence learning alternatives
(source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
Network unrolling
(source: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/)
Backpropagation through time (BPTT)
(source: http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/)
BPTT is hard
The vanishing and the exploding gradient problem
Gradients could vanish (or explode) when propagated several steps back
This makes it difficult to learn long-term dependencies.
Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training Recurrent Neural Networks. Proc. of ICML, abs/1211.5063.
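A small numeric sketch of why this happens, assuming a scalar RNN h_t = tanh(w * h_{t-1} + x_t): the derivative of the last state with respect to the first is a product with one factor per time step. With the zero inputs used here the tanh stays in its linear regime, so each factor is simply w. All names are illustrative.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule: one factor per time step
    return grad


steps = [0.0] * 50                         # 50 time steps with zero input
small = gradient_through_time(0.5, steps)  # factors below 1: gradient vanishes
large = gradient_through_time(1.5, steps)  # factors above 1: gradient explodes
```

After 50 steps the first gradient is on the order of 1e-15 and the second is in the hundreds of millions, so an error signal from the end of a sequence either barely reaches the early steps or overwhelms them.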
Long term dependencies
(source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
Long short-term memory (LSTM)
Hochreiter, Sepp, and Jurgen Schmidhuber. "Long short-term memory." Neural Computation 9, no. 8 (1997): 1735-1780.
LSTM networks address the long-term dependency problem.
They use gates that allow the memory to be kept across long sequences and updated only when required.
Conventional RNN vs LSTM
(image source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
Forget gate
Controls the flow of the previous internal state $C_{t-1}$
$f_t = 1 \Rightarrow$ keep previous state
$f_t = 0 \Rightarrow$ forget previous state
(image source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
Input gate
Controls the flow of input information ($x_t$)
$i_t = 1 \Rightarrow$ take input into account
$i_t = 0 \Rightarrow$ ignore input
(image source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
Current state calculation
(image source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
Output gate
Controls the flow of information from the internal state ($C_t$) to the outside ($h_t$)
$o_t = 1 \Rightarrow$ allows internal state out
$o_t = 0 \Rightarrow$ doesn't allow internal state out
(image source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
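The three gates and the state update can be put together in a sketch of one LSTM step for a scalar cell. It assumes the commonly used formulation C_t = f_t * C_{t-1} + i_t * tanh(...) and h_t = o_t * tanh(C_t). The parameter layout and the saturating biases are hypothetical, chosen only to make the gate behaviour visible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps gate name -> (w_h, w_x, bias)."""
    def pre(name):
        w_h, w_x, bias = p[name]
        return w_h * h_prev + w_x * x + bias

    f = sigmoid(pre("forget"))        # how much of c_prev to keep
    i = sigmoid(pre("input"))         # how much of the candidate to write
    c_tilde = math.tanh(pre("cand"))  # candidate state from the input
    o = sigmoid(pre("output"))        # how much of the state to expose
    c = f * c_prev + i * c_tilde      # current state calculation
    h = o * math.tanh(c)
    return h, c


# Illustrative parameters: large biases saturate the gates near 0 or 1.
keep = {"forget": (0, 0, 20.0), "input": (0, 0, -20.0),
        "cand": (0, 1.0, 0), "output": (0, 0, 20.0)}
overwrite = {"forget": (0, 0, -20.0), "input": (0, 0, 20.0),
             "cand": (0, 1.0, 0), "output": (0, 0, 20.0)}

_, c_keep = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, p=keep)
_, c_over = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, p=overwrite)
```

With the forget gate open and the input gate closed the state stays at roughly 0.7; with the opposite setting it is replaced by the candidate computed from the input. In a trained network these gate values are learned rather than fixed by hand.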
Gated recurrent units
Cho, K., Van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
(image source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
The Unreasonable Effectiveness of Recurrent Neural Networks
Famous blog post by Andrej Karpathy (Stanford University)
Character-level language models based on multi-layer LSTMs.
Data:
  Shakespeare plays
  Wikipedia
  LaTeX
  Linux source code
Algebraic geometry book in LaTeX
(source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
Linux source code
/** Increment the size file of the new incorrect UI_FILTER group information
* of the size generatively.
*/
static int indicate_policy(void)
{
int error;
if (fd == MARN_EPT) {
/*
* The kernel blank will coeld it to userspace.
*/
if (ss->segment < mem_total)
unblock_graph_and_set_blocked();
else
ret = 1;
goto bail;
}
segaddr = in_SB(in.addr);
selector = seg / 16;
setup_works = true;
for (i = 0; i < blocks; i++) {
seq = buf[i++];
bpf = bd->bd.next + i * search;
if (fd) {
current = blocked;
}
}
Interactive demo
RNN for language modeling...
Image captioning
Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." CVPR2015. arXiv preprint arXiv:1412.2306 (2014).
Approach
Image-sentence score model
Multimodal RNN
Alignment results
Captioning results
Papers (1)
General:
  S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. Based on TR FKI-207-95, TUM (1995).
  J. Schmidhuber. Deep Learning in Neural Networks: An Overview. Neural Networks, Volume 61, January 2015, Pages 85-117 (DOI: 10.1016/j.neunet.2014.09.003)
Language modeling:
  Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010.
  Mikolov, Tomas, et al. "Extensions of recurrent neural network language model." Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on. IEEE, 2011.
  Sutskever, Ilya, James Martens, and Geoffrey E. Hinton. "Generating text with recurrent neural networks." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
Papers (2)
Machine translation:
  Liu, Shujie, et al. "A recursive recurrent neural network for statistical machine translation." Proceedings of ACL. 2014.
  Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.
  Auli, Michael, et al. "Joint Language and Translation Modeling with Recurrent Neural Networks." EMNLP. Vol. 3. No. 8. 2013.
Speech recognition:
  Graves, Alex, and Navdeep Jaitly. "Towards end-to-end speech recognition with recurrent neural networks." Proceedings of the 31st International Conference on Machine Learning (ICML-14). 2014.
Papers (3)
Image captioning:
  Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." CVPR2015. arXiv preprint arXiv:1412.2306 (2014).
  Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." CVPR2015. arXiv preprint arXiv:1411.4555 (2014).
  Chen, Xinlei, and C. Lawrence Zitnick. "Learning a recurrent visual representation for image caption generation." arXiv preprint arXiv:1411.5654 (2014).
  Fang, Hao, et al. "From captions to visual concepts and back." CVPR2015, arXiv preprint arXiv:1411.4952 (2014).
Other resources
This presentation was adapted from http://fagonzalezo.github.io/nn_nlp_tutorial/
Christopher Olah, Understanding LSTM Networks, http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Denny Britz, Recurrent Neural Networks Tutorial, http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
Andrej Karpathy, The Unreasonable Effectiveness of Recurrent Neural Networks, http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Jurgen Schmidhuber, Recurrent Neural Networks, http://people.idsia.ch/~juergen/rnn.html
Thanks!
http://www.mindlaboratory.org