nyai #5 - fun with neural nets by jason yosinski
TRANSCRIPT
MEETUP #5: Neural Nets (Jason Yosinski) &
ML for Production (Ken Sanford)
Fun with Neural Nets
NYAI meetup 24 August 2016 Jason Yosinski
Original slides available under Creative Commons Attribution-ShareAlike 3.0
Geometric Intelligence
Neural nets start working
1950 1960 1970 1980 1990 2000 2010 2020 …
Progress in AI
Chen et al., 2014
SMALL-FOOTPRINT KEYWORD SPOTTING USING DEEP NEURAL NETWORKS
Guoguo Chen*1, Carolina Parada2, Georg Heigold2
1 Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD
2 Google Inc., Mountain View, CA
[email protected], [email protected], [email protected]
ABSTRACT
Our application requires a keyword spotting system with a small memory footprint, low computational cost, and high precision. To meet these requirements, we propose a simple approach based on deep neural networks. A deep neural network is trained to directly predict the keyword(s) or subword units of the keyword(s), followed by a posterior handling method producing a final confidence score. Keyword recognition results achieve 45% relative improvement with respect to a competitive Hidden Markov Model-based system, while performance in the presence of babble noise shows 39% relative improvement.
Index Terms— Deep Neural Network, Keyword Spotting, Embedded Speech Recognition
1. INTRODUCTION
Thanks to the rapid development of smartphones and tablets, interacting with technology using voice is becoming commonplace. For example, Google offers the ability to search by voice [1] on Android devices, and Apple’s iOS devices are equipped with a conversational assistant named Siri. These products allow a user to tap a device and then speak a query or a command.
We are interested in enabling users to have a fully hands-free experience by developing a system that listens continuously for specific keywords to initiate voice input. This could be especially useful in situations like driving. The proposed system must be highly accurate, low-latency, small-footprint, and run in computationally constrained environments such as modern mobile devices. Running the system on the device avoids the latency and power implications of connecting to a server for recognition.
Keyword Spotting (KWS) aims at detecting predefined keywords in an audio stream, and it is a potential technique to provide the desired hands-free interface. There is an extensive literature on KWS, although most of the proposed methods are not suitable for low-latency applications in computationally constrained environments. For example, several KWS systems [2, 3, 4] assume offline processing of the audio using large vocabulary continuous speech recognition (LVCSR) systems to generate rich lattices. In this case, their task focuses on efficient indexing and search for keywords in the lattices. These systems are often used to search large databases of audio content. We focus instead on detecting keywords in the audio stream without any latency.
A commonly used technique for keyword spotting is the Keyword/Filler Hidden Markov Model (HMM) [5, 6, 7, 8, 9]. Despite being initially proposed over two decades ago, it remains highly competitive. In this generative approach, an HMM model is trained for each keyword, and a filler model HMM is trained from the non-keyword segments of the speech signal (fillers). At runtime, these systems require Viterbi decoding, which can be computationally expensive depending on the HMM topology. Other recent work explores discriminative models for keyword spotting based on large-margin formulations [10, 11] or recurrent neural networks [12, 13]. These systems show improvement over the HMM approach but require processing of the entire utterance to find the optimal keyword region, or take information from a long time span to predict the entire keyword, increasing detection latency.
*The author performed the work as a summer intern at Google, MTV.
We propose a simple discriminative KWS approach based on deep neural networks that is appropriate for mobile devices. We refer to it as Deep KWS. A deep neural network is trained to directly predict the keyword(s) or subword units of the keyword(s), followed by a posterior handling method producing a final confidence score. In contrast with the HMM approach, this system does not require a sequence search algorithm (decoding), leading to a significantly simpler implementation, reduced runtime computation, and smaller memory footprint. It also makes a decision every 10 ms, minimizing latency. We show that the Deep KWS system outperforms a standard HMM-based system on both clean and noisy test sets, even when a smaller amount of data is used for training.
We describe our DNN-based KWS framework in Section 2, and the baseline HMM-based KWS system in Section 3. The experimental setup, results, and some discussion follow in Section 4. Section 5 closes with the conclusions.
2. DEEP KWS SYSTEM
The proposed Deep KWS framework is illustrated in Figure 1. The framework consists of three major components: (i) a feature extraction module, (ii) a deep neural network, and (iii) a posterior handling module. The feature extraction module (i) performs voice-activity detection and generates a vector of features every frame (10 ms). These features are stacked using the left and right context to create…
Fig. 1. Framework of the Deep KWS system, components from left to right: (i) Feature Extraction, (ii) Deep Neural Network, (iii) Posterior Handling
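The paper's pipeline (per-frame keyword posteriors from the DNN, then a posterior-handling step that turns them into a single confidence score) can be sketched roughly as below. This is only a sketch: the window lengths `w_smooth` and `w_max` and the geometric-mean confidence formula are illustrative stand-ins, not necessarily the paper's exact definitions.

```python
import numpy as np

def smooth_posteriors(posteriors, w_smooth=30):
    """Average each label's posterior over a trailing window of frames."""
    smoothed = np.zeros_like(posteriors)
    for j in range(len(posteriors)):
        start = max(0, j - w_smooth + 1)
        smoothed[j] = posteriors[start:j + 1].mean(axis=0)
    return smoothed

def confidence(smoothed, w_max=100):
    """Confidence at the last frame: geometric mean, over keyword labels,
    of each label's max smoothed posterior within a trailing window."""
    j = len(smoothed) - 1
    start = max(0, j - w_max + 1)
    window = smoothed[start:j + 1]      # shape (frames, n_labels)
    peaks = window.max(axis=0)[1:]      # skip label 0 = filler/non-keyword
    return float(peaks.prod() ** (1.0 / len(peaks)))

# Toy example: 200 frames (10 ms each), 3 labels (filler + 2 keyword subwords).
rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(3), size=200)
score = confidence(smooth_posteriors(post))
assert 0.0 <= score <= 1.0
```

Because the decision uses only a trailing window of frames, a score like this can be emitted every 10 ms, which is the low-latency property the paper emphasizes.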
Speech recognition, natural language conversation
Reinforcement Learning
Silver et al., 2016
Not just perceiving the world, but also generating…
Robot Gait Discovery
Hand-Coded Gait
Fixed Shallow Topology, Learned Parameters
Learned Deep Topology, Learned Parameters
9x faster than the human-designed gait
Lion
Krizhevsky et al. 2012
AlexNet
Recipe for understanding:
• architecture
5 convolutional layers, 3 FC layers
Recipe for understanding:
• architecture
• dataset (big: 250b)
ImageNet, Deng et al. 2009
jaguar gibbon great white shark water bottle
golden retriever orangutan fireboat bubble
tobacco shop ambulance cowboy hat mixing bowl
Recipe for understanding:
• architecture
• dataset (big: 250b)
• parameters (big: 60m)
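The "60m parameters" figure for the 5-conv, 3-FC AlexNet stack can be roughly verified by summing parameters layer by layer. Layer shapes below follow Krizhevsky et al. 2012; this sketch ignores the original's two-GPU grouped convolutions, which reduce the count somewhat, so it slightly overestimates.

```python
def conv_params(k, c_in, c_out):
    """Parameters of a k×k convolution: weights plus one bias per filter."""
    return k * k * c_in * c_out + c_out

def fc_params(n_in, n_out):
    """Parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

layers = [
    conv_params(11, 3, 96),        # conv1
    conv_params(5, 96, 256),       # conv2
    conv_params(3, 256, 384),      # conv3
    conv_params(3, 384, 384),      # conv4
    conv_params(3, 384, 256),      # conv5
    fc_params(256 * 6 * 6, 4096),  # fc6 (flattened 6×6×256 feature map)
    fc_params(4096, 4096),         # fc7
    fc_params(4096, 1000),         # fc8 (1000 ImageNet classes)
]
total = sum(layers)
print(f"{total / 1e6:.1f}M parameters")  # prints "62.4M parameters"
```

Notice the fully connected layers dominate: fc6 alone holds more than half the weights, which is why later architectures worked hard to shrink or remove the FC layers.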
< DeepVis Toolbox demo >
Code at: http://yosinski.com/
See also: Erhan et al., 2009; Szegedy et al., 2013.
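Activation-maximization visualizations of the kind the DeepVis Toolbox shows (and Erhan et al. and Simonyan et al. describe) ascend the gradient of a chosen unit's activation with respect to the input, usually with a regularizer such as L2 decay. A minimal sketch, with a toy linear "unit" standing in for a real network (all names and constants here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 64))   # toy "network": one linear unit per row
unit = 3                        # the unit whose preferred input we seek

x = rng.normal(scale=0.01, size=64)   # start from small random noise
lr, l2 = 0.1, 0.05
for _ in range(200):
    grad = W[unit]              # d(activation)/dx for a linear unit
    x += lr * (grad - l2 * x)   # gradient ascent with L2 decay

# For a linear unit, the optimized input aligns with the weight vector.
cos = (x @ W[unit]) / (np.linalg.norm(x) * np.linalg.norm(W[unit]))
assert cos > 0.99
```

With a real convnet the gradient comes from backpropagation rather than a closed form, and the regularizers (L2, blurring, spatial priors) are what keep the result looking like a natural image instead of high-frequency noise.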
[Figure: an input image as an x–y grid of r, g, b values (similar to this)]
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
Simonyan, ICLR ’14: L2
Dai, Lu, Wu, ICLR ’15
Peacock (learned, no regularization)
L2 + L1 + spatial
No regularization
Nguyen, Dosovitskiy, Yosinski, Brox, Clune. “Synthesizing the preferred inputs for neurons in neural networks via deep generator networks”
[Figure: a deep generator network (the prior) maps a code through upconvolutional layers u9 … u2, u1 to an image; the image feeds the DNN being visualized (convolutional layers c1–c5, fully connected fc6–fc8), whose outputs include classes such as banana, convertible, and candle. Forward and backward passes run through both networks.]
Castle + Candle = [synthesized image]
Fireboat + Candle = [synthesized image]
“What I cannot create, I do not understand.”
Richard Feynman’s blackboard
Car Engine vs. Intelligence
[Chart: AI Progress — ability (y) vs. time (x), with curves for computation, data, and scientific understanding]
Waiting for EEs and Internet
New field
“Pseudobiology”? (study of fake life)
Thanks!
Hod Lipson
Jeff Clune
Yoshua Bengio
Anh Nguyen
Code/etc: http://yosinski.com
Email: [email protected]
( Slides: http://s.yosinski.com/nyai.pdf )
Food & Drinks:
O’Reilly AI Conference Ticket Giveaway
INTERMISSION
Randomly selected by Jason & Ken