![Page 1: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/1.jpg)
Can Cognitive Neuroscience Provide a Theory of Deep Learning Capacity?
Ted Willke and the Mind’s Eye Team
Intel Labs
May 20, 2016
![Page 2: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/2.jpg)
2
“Breakthrough innovation occurs when we bring down boundaries and encourage disciplines to learn from each other”
― Gyan Nagpal, Talent Economics: The Fine Line Between Winning and Losing the Global War for Talent
MIND’S EYE
![Page 3: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/3.jpg)
3
Cognitive Neuroscience
![Page 4: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/4.jpg)
4
Cognitive neuroscience is the study of the neurobiological mechanisms that underlie cognitive processes such as attention, control, and decision making
It answers questions like:
How does the brain coordinate behaviour to achieve goals?
What are the brain structures upon which these functions depend?
How does brain function differ amongst people?
It draws upon brain imaging, recordings, and other observations to derive models
![Page 5: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/5.jpg)
5
Context-Dependent Decision Making
Michael Shvartsman, Vaibhav Srivastava, Narayanan Sundaram, Jonathan D. Cohen, “Using behavior to decode allocation of attention in context dependent decision making”, accepted at International Conference on Cognitive Modeling, 2016.
![Page 6: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/6.jpg)
6
Selective Forgetting
Kim, Ghootae and Lewis-Peacock, Jarrod A. and Norman, Kenneth A. and Turk-Browne, Nicholas B., “Pruning of memories by context-based prediction error,” Proceedings of the National Academy of Sciences, 2014
![Page 7: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/7.jpg)
7
Production and comprehension of naturalistic narrative speech
Silbert LJ, Honey CJ, Simony E, Poeppel D, Hasson U (2014) Coupled neural systems underlie the production and comprehension of naturalistic narrative speech. Proc Natl Acad Sci USA 111:E4687-4696.
![Page 8: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/8.jpg)
8
CRACKS APPEAR, DISRUPTIVE IDEAS 30 years on
MIT Press, 1986
![Page 9: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/9.jpg)
9
Cognitive Neuroscience
Adapted from Marvin Minsky in Artificial Intelligence at MIT, Expanding Frontiers, Patrick H. Winston (Ed.), Vol.1, MIT Press, 1990. Reprinted in AI Magazine, Summer 1991
evolve
![Page 10: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/10.jpg)
10
Neural networks
![Page 11: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/11.jpg)
11
Neural Network preliminaries
http://wiki.apache.org/hama/MultiLayerPerceptron
![Page 12: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/12.jpg)
12
Neural Network preliminaries
LeCun et al., “Deep Learning” in Nature (2015)
![Page 13: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/13.jpg)
13
Arbitrary functions
https://upload.wikimedia.org/wikipedia/commons/7/7b/XOR_perceptron_net.png
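The XOR network referenced on this slide is the classic example of a function that a single-layer perceptron cannot represent but a network with one hidden layer can. A minimal sketch follows, assuming hand-picked weights and thresholds (they are illustrative and not taken from the linked figure): one hidden unit computes OR, the other computes AND, and the output fires for "OR but not AND".

```python
import numpy as np

def step(x):
    """Heaviside threshold: 1 where x > 0, else 0."""
    return (np.asarray(x) > 0).astype(int)

def xor_net(x1, x2):
    """Two-layer threshold network computing XOR (weights are assumptions)."""
    x = np.array([x1, x2], dtype=float)
    # Hidden layer: first unit acts as OR, second unit acts as AND.
    w_hidden = np.array([[1.0, 1.0],
                         [1.0, 1.0]])
    b_hidden = np.array([-0.5, -1.5])
    h = step(w_hidden @ x + b_hidden)
    # Output unit: active when OR is on and AND is off.
    w_out = np.array([1.0, -1.0])
    b_out = -0.5
    return int(w_out @ h + b_out > 0)

truth_table = [xor_net(a, b) for a in (0, 1) for b in (0, 1)]
print(truth_table)
```

The hidden layer is what makes the function representable: without it, no single line separates the inputs (0,1) and (1,0) from (0,0) and (1,1).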
![Page 14: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/14.jpg)
14
The original tenets of parallel distributed processing (roughly)
1. Cognitive processes arise from the real-time propagation of activation via weighted connections
2. Active representations are patterns of activation distributed over ensembles of units
3. Processing is interactive (bidirectional)
4. Knowledge is encoded in the connection weights (not in a separate store)
5. Learning and long-term memory depend on changes to these weights
6. Processing, learning, and representation are graded and continuous
7. Processing, learning, and representation depend on the environment
T.T. Rogers, J.L. McClelland / Cognitive Science 38 (2014)
![Page 15: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/15.jpg)
15
Brain-Inspired machine learning
Structure-Inspired Learning
Neurons (e.g., spiking models)
Networks (e.g., deep belief networks)
Architectures (e.g., Human Brain Project)
Cognitive-Inspired Learning
Reinforcement Learning
Context-based Memory
Noisy Decision Making
"Gray754" by Henry Vandyke Carter - Henry Gray(1918) Anatomy of the Human Body
![Page 16: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/16.jpg)
16
Deep learning takes advantage of parallel distributed processing
http://www.amax.com/blog/wp-content/uploads/2015/12/blog_deeplearning3.jpg
![Page 17: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/17.jpg)
17
Winning top spots in visual recognition challenges, etc.
(1) MS COCO (Common Objects in Context): Lin et al., 2015
(2) CityScapes Datasets (Semantic Understanding): https://www.cityscapes-dataset.com/dataset-overview/
(3) ImageNet (Object Localization): Deng et al., 2009
(4) LSUN (Saliency Prediction): http://lsun.cs.princeton.edu/2015.html
![Page 18: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/18.jpg)
18
Yang et al. (2015)
What are sitting in the basket on a bicycle?
![Page 19: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/19.jpg)
19
Yang et al. (2015)
Stacked Attention Networks for Image Question Answering
![Page 20: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/20.jpg)
20
The Glory and the remaining mystery
We have achieved…
Exceeding human-level performance on visual recognition tasks
Mastering more and more complex games (Go)
Demonstrating human-level control in reinforcement learning (Atari)
Question-answering and other AI services are upon us
but we still don’t know…
How learnt (feature) representations are encoded (or if they converge for the same networks trained on the same data)
The capacity for learning representations
The trade-off between efficiency of representation and flexibility of processing
How things learnt interfere with each other
![Page 21: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/21.jpg)
21
Representations and Learning Capacity
![Page 22: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/22.jpg)
22
Li et al. (ICLR 2016)
Representation encoding: meaningful and consistent?
Can we reliably map feature representations between these networks?
![Page 23: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/23.jpg)
23
Li et al. (ICLR 2016)
Convergent Learning?
Conclusions:
1. Some features are learned reliably in multiple nets (some are not)
2. Units learn to span low-D subspaces, which are common (but specific basis vectors are not)
3. Representations are encoded as a mix of single unit and slightly distributed codes
4. Mean activation values across different networks converge to a nearly identical distribution
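One simple way to ask whether two networks learned the same feature is to record unit activations on the same inputs and correlate every unit in one network with every unit in the other. The sketch below is an illustration of that idea only, not the pipeline from Li et al.; the function name `match_units` and the synthetic activations (network B is a shuffled, noisy copy of network A) are assumptions made for the example.

```python
import numpy as np

def match_units(acts_a, acts_b):
    """For each unit in A, find the most correlated unit in B.

    acts_a, acts_b: arrays of shape (n_samples, n_units), recorded on
    the same inputs. Returns (best_index_in_b, correlation) per A unit.
    """
    za = (acts_a - acts_a.mean(axis=0)) / acts_a.std(axis=0)
    zb = (acts_b - acts_b.mean(axis=0)) / acts_b.std(axis=0)
    corr = za.T @ zb / acts_a.shape[0]      # Pearson correlation matrix
    best = corr.argmax(axis=1)              # greedy one-way match
    return best, corr[np.arange(corr.shape[0]), best]

# Synthetic stand-in for real activations (assumption for illustration).
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(500, 4))
perm = np.array([2, 0, 3, 1])               # hidden unit shuffling
acts_b = acts_a[:, perm] + 0.1 * rng.normal(size=(500, 4))

best, score = match_units(acts_a, acts_b)
print(best, np.round(score, 2))
```

A high correlation for a matched pair suggests a feature learned by both networks; units with no strong partner are candidates for features that are specific to one network or encoded in a distributed way. A one-to-one assignment would require a bipartite matching step rather than the greedy `argmax` used here.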
![Page 24: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/24.jpg)
24
Can cognitive neuroscience provide any insight into the nature of learning and task capacity?
![Page 25: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/25.jpg)
25
The appeal of highly-parallel neural networks
Both cognitive neuroscience and machine learning applications exploit the following two features of neural networks to great benefit:
a) The ability to learn and process complex representations, taking into account a large number of interrelated and interacting constraints
b) The ability for the same network to process a wide range of potentially disparate representations (or tasks), sometimes called “multitask learning.”
But what are their limits??
![Page 26: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/26.jpg)
26
The brain: The black box at the end of our necks
• Facts:
Only 2% of body weight but uses up to 20% of energy
~200B neurons
Neurons fire up to ~10 kHz
1K to 10K connections per neuron
• Cerebral neocortex:
~20B neurons
~125 trillion synapses
There are more ways to organize the neocortex’s ~125 trillion synapses than stars in the known universe
![Page 27: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/27.jpg)
27
The paradox – one task at a time
![Page 28: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/28.jpg)
28
A fundamental puzzle concerning human processing
Why, in some circumstances, is the brain capable of a remarkable degree of parallelism (e.g., locomotion, navigation, speech, and bimanual gesticulation), while in others its capacity for parallelism is radically limited (e.g., the inability to conduct mental arithmetic while constructing a grocery list at the same time)?
![Page 29: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/29.jpg)
29
A theory
The difference in multitasking ability may reflect the degree to which different tasks rely on shared representations
The more that different processes interact, the stronger the imposition of seriality
This may reflect a fundamental trade-off in neural network architectures between the efficiency of shared representations (and the capacity for generalization that they afford) and the effectiveness of multitasking.
![Page 30: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/30.jpg)
30
Multi-tasking and cross-talk
Feng et al. (CABN 2014)
![Page 31: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/31.jpg)
31
You will see a sequence of words. Quickly say the color of the letters.
![Page 32: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/32.jpg)
32
SNOW
![Page 33: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/33.jpg)
33
Ready!
![Page 34: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/34.jpg)
34
BLUE
![Page 35: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/35.jpg)
35
RED
![Page 36: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/36.jpg)
36
BLACK
![Page 37: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/37.jpg)
37
GREEN
![Page 38: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/38.jpg)
41
BLACK
![Page 39: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/39.jpg)
42
BLUE
![Page 40: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/40.jpg)
46
GREEN
![Page 41: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/41.jpg)
49
RED
![Page 42: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/42.jpg)
52
BLUE
![Page 43: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/43.jpg)
54
Now with the words upside down.
![Page 44: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/44.jpg)
55
BLACK
![Page 45: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/45.jpg)
56
GREEN
![Page 46: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/46.jpg)
58
BLACK
![Page 47: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/47.jpg)
59
RED
![Page 48: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/48.jpg)
60
Were you faster to answer?
![Page 49: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/49.jpg)
61
A Demonstration of interference
Stroop (1935)
![Page 50: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/50.jpg)
62
Multi-tasking interference (in the Stroop test)
Cohen et al. (1990)
[Figure: network diagram with components labeled Color, Word, Verbalize, and Task]
![Page 51: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/51.jpg)
63
Control-Demanding Behavior (Feng et al. 2014)
First to describe the trade-off between the efficiency of representation (“multiplexing”) and the simultaneous engagement of different processing pathways (“multitasking”)
Showed that even a modest amount of multiplexing rapidly introduces cross-talk among processing pathways
Proposed that the large advantages of efficient encoding have driven the human brain to favour this over the capacity for control-demanding processes.
![Page 52: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/52.jpg)
64
Types of interference
![Page 53: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/53.jpg)
65
Maximum independent set (MIS)
The MIS is the largest set of processes in the network that can be simultaneously executed without interference.
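To make the definition concrete, processes can be treated as nodes of a graph with an edge between any two processes that interfere; the MIS is then the largest set of nodes with no edge between them. The sketch below uses a brute-force search, which is only practical for small graphs because the general problem is NP-hard. The five processes and their interference pairs are hypothetical.

```python
from itertools import combinations

def maximum_independent_set(nodes, edges):
    """Return one largest set of nodes with no interfering pair.

    Brute force: tries subsets from largest to smallest, so the running
    time is exponential in the number of nodes.
    """
    conflicts = {frozenset(e) for e in edges}
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            pairs = combinations(subset, 2)
            if all(frozenset(p) not in conflicts for p in pairs):
                return set(subset)
    return set()

# Hypothetical example: five processes, each interfering with its neighbour.
processes = ["A", "B", "C", "D", "E"]
interference = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

mis = maximum_independent_set(processes, interference)
print(sorted(mis), len(mis))
```

In this chain only three of the five processes can run at once, even though each process interferes with at most two others.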
![Page 54: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/54.jpg)
66
network structure (distribution complexity)
The network capacity for multitasking depends on the distribution of in-degrees and out-degrees of the network (here we vary only the in-degree of the output components)
We represent this with a “distribution complexity” symmetry measure (maximized for uniform distribution)
We study the characteristics of the network with DC fixed
![Page 55: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/55.jpg)
67
Takeaway: Even modest amounts of process overlap impose dramatic constraints on parallel processing capability
![Page 56: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/56.jpg)
68
Trade-off between generalization and parallelism: Feed-Forward simulation
![Page 57: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/57.jpg)
69
Training/Test details
Training
20 network groups, 20 random initializations per group
All networks trained on same stimuli, 16 tasks
Trained to generate 1-hot task outputs (MSE < 0.0001)
Test
70/30 split
Generalization is the average MSE over all stimuli in the test set
Parallel processing is measured as the response when 2, 3, or 4 tasks are activated simultaneously, scored by the MSE against the target pattern
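The two measurements above both reduce to a mean squared error between network outputs and target patterns. The sketch below shows the bookkeeping only, under stated assumptions: the 70% portion is taken to be the training set, and the target and output vectors are made-up numbers rather than results from the simulation.

```python
import numpy as np

def split_70_30(n_items, rng):
    """Shuffle item indices and split them 70/30 (train/test assumed)."""
    idx = rng.permutation(n_items)
    cut = int(round(0.7 * n_items))
    return idx[:cut], idx[cut:]

def mean_squared_error(outputs, targets):
    """Average squared difference between outputs and targets."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.mean((outputs - targets) ** 2))

rng = np.random.default_rng(0)
train_idx, test_idx = split_70_30(100, rng)

# Hypothetical 1-hot target and two hypothetical network responses.
target = np.array([0.0, 1.0, 0.0, 0.0])
single_task_out = np.array([0.0, 1.0, 0.0, 0.0])   # one task active
multitask_out = np.array([0.0, 0.6, 0.4, 0.0])     # cross-talk from a second task

mse_single = mean_squared_error(single_task_out, target)
mse_multi = mean_squared_error(multitask_out, target)
print(len(train_idx), len(test_idx), mse_single, round(mse_multi, 3))
```

A rise in MSE when several tasks are active together, relative to the single-task case, is the signature of interference that the parallel-processing measure is meant to capture.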
![Page 58: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/58.jpg)
70
Shared Representations
[Figure panels: (a) smaller weights, (b) larger weights]
![Page 59: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/59.jpg)
71
Generalization vs parallel processing capability
![Page 60: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/60.jpg)
72
Parallel processing capability vs max initial weights
![Page 61: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/61.jpg)
73
Future work
Extend analysis to weighted graphs
Study more complex networks (i.e., deeper structures, recurrent connections)
Study human performance (via neuroimaging data)!
![Page 62: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/62.jpg)
74
C. elegans
The OpenWorm Project (image generated by neuroConstruct)
SINCE 1986
![Page 63: Ted Willke, Sr Principal Engineer, Intel at MLconf SEA - 5/20/16](https://reader031.vdocument.in/reader031/viewer/2022030314/5899b8d31a28aba11e8b686f/html5/thumbnails/63.jpg)
Thank you!