Using auto-encoders to model early infant categorization: results, predictions and insights
TRANSCRIPT
Overview
• An odd categorization asymmetry was observed in 3-4 month old infants.
• We explain this asymmetry using a connectionist auto-encoder model.
• Our model made a number of predictions, which turned out to be correct.
• We used a more neurobiologically plausible encoding for the stimuli.
• The model can now show how young infants’ reduced visual acuity may actually help them do basic-level categorization.
Background on infant statistical category-learning
Quinn, Eimas, & Rosenkrantz (1993) noticed a rather surprising categorization asymmetry in 3-4 month old infants:
– Infants familiarized on cats are surprised by novel dogs
– BUT infants familiarized on dogs are bored by novel cats.
How their experiment worked
Familiarization phase: infants saw 6 pairs of pictures of animals from one category (say, cats) – i.e., a total of 12 different animals.
Test phase: infants saw a pair consisting of a new cat and a new dog. Their gaze time was measured for each of the two novel animals.
[Figure: familiarization trials, followed by a test phase in which the infant's looking times to the two novel animals are compared.]
Results (Quinn et al., 1993): The categorization asymmetry
– Infants familiarized on cats look significantly longer at the novel dog in the test phase than the novel cat.
– No significant difference for infants familiarized on dogs on the time they look at a novel cat compared to a novel dog.
Our hypothesis
We assume that infants are hard-wired to be sensitive to novelty (i.e., they look longer at novel objects than at familiar objects).
Cats, on the whole, are less varied than dogs, and their feature distributions are largely included within those of Dogs.
Thus, when they have seen a number of cats, a dog is perceived as novel. But, when they have seen a number of dogs, the new cat is perceived as “just another dog.”
Statistical distributions of patterns are what count
The infants are becoming sensitive to the statistical distributions of the patterns they are observing.
Consider the distribution of values of a particular characteristic for Cats and Dogs.
[Figure: overlapping distributions of one feature value for cats and dogs.]
Note that the distribution for Cats is both narrower than that of Dogs and included in it.
Suppose an infant has become familiarized with the distribution for Cats and then sees a dog. Chances are the new stimulus will fall outside of the familiarized range of values.
On the other hand, if an infant has become familiarized with the distribution for Dogs and then sees a cat, chances are the new stimulus will be inside the familiarized range of values.
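This inclusion argument can be sketched numerically. In the toy simulation below (the Gaussian means and standard deviations are invented for illustration, not measured from the actual stimuli), a narrow "cat" distribution sits inside a broad "dog" distribution, and we estimate how often a novel exemplar of one category falls outside the familiarized range of the other:

```python
import random

random.seed(0)

# Invented distributions for one feature: cats tightly clustered,
# dogs spread out, so the cat range sits inside the dog range.
cats = [random.gauss(0.5, 0.05) for _ in range(1000)]
dogs = [random.gauss(0.5, 0.20) for _ in range(1000)]

cat_lo, cat_hi = min(cats), max(cats)   # familiarized range after seeing cats
dog_lo, dog_hi = min(dogs), max(dogs)   # familiarized range after seeing dogs

# How often does a novel dog fall outside the familiarized cat range,
# and how often does a novel cat fall outside the familiarized dog range?
dog_outside_cats = sum(not (cat_lo <= d <= cat_hi) for d in dogs) / len(dogs)
cat_outside_dogs = sum(not (dog_lo <= c <= dog_hi) for c in cats) / len(cats)

print(dog_outside_cats, cat_outside_dogs)   # many dogs look novel; almost no cats do
```

A large fraction of the dogs fall outside the cats' range, while essentially no cat falls outside the dogs' range – the asymmetry in miniature.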
How could we model this asymmetry?
We based our connectionist model on a model of infant categorization proposed by Sokolov (1963).
Sokolov’s (1963) model
A stimulus in the environment is encoded into an internal representation; the representation is decoded and compared with the stimulus. If the two are not equal, the representation is adjusted and the cycle repeats: Encode → Decode and Compare → Adjust → …
Continue looping until the internal representation corresponds to the external stimulus.
Using an autoassociator to simulate the Sokolov model
The stimulus from the environment is encoded by the network and decoded back; the decoded output is compared with the original stimulus and the network's weights are adjusted: encode → decode → compare → adjust weights → …
Continue looping until the internal representation corresponds to the external stimulus.
Infant looking time ≈ network error
In the Sokolov model, an infant continues to look at the image until the discrepancy between the image and the internal representation of the image drops below a certain threshold.
In the auto-encoder model, the network continues to process the input until the discrepancy between the input and the (decoded) internal representation of the input drops below a certain (error) threshold.
Input to our model
We used a three-layer, 10-8-10, non-linear auto-encoder (i.e., a network that tries to reproduce on output what it sees on input) to model the data.
The inputs were ten feature values, normalized between 0 and 1.0 across all of the images, taken from the original stimuli used by Quinn et al. (1993). They were head length, head width, eye separation, ear separation, ear length, nose length, nose width, leg length, vertical extent, and horizontal extent.
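A minimal sketch of this kind of auto-encoder, trained Sokolov-style (encode, decode, compare, adjust) until its mean reconstruction error drops below a criterion. The sigmoid activations, learning rate, error threshold, and the toy "cat"/"dog" feature vectors are all illustrative assumptions, not the authors' actual stimuli or settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 10 input features -> 8 hidden units -> 10 output features.
W1 = rng.normal(0.0, 0.5, (10, 8))
W2 = rng.normal(0.0, 0.5, (8, 10))

def familiarize(stimuli, lr=0.5, threshold=0.05, max_epochs=2000):
    """Sokolov-style loop: encode, decode, compare, adjust the weights,
    until the mean reconstruction error drops below the threshold."""
    global W1, W2
    for epoch in range(max_epochs):
        total = 0.0
        for x in stimuli:
            h = sigmoid(x @ W1)              # encode
            y = sigmoid(h @ W2)              # decode
            err = x - y                      # compare
            total += float(err @ err)
            d_out = err * y * (1.0 - y)      # adjust: one backprop step
            d_hid = (d_out @ W2.T) * h * (1.0 - h)
            W2 += lr * np.outer(h, d_out)
            W1 += lr * np.outer(x, d_hid)
        if total / len(stimuli) < threshold:
            return epoch                     # "looking time": epochs to criterion
    return max_epochs

def reconstruction_error(x):
    y = sigmoid(sigmoid(x @ W1) @ W2)
    return float(np.sum((x - y) ** 2))

# Toy familiarization set: 12 "cat" vectors whose features cluster in a
# narrow band; "dogs" are drawn from a much broader band (both invented).
cats = rng.uniform(0.4, 0.6, (12, 10))
familiarize(cats)

novel_cats = rng.uniform(0.4, 0.6, (50, 10))
novel_dogs = rng.uniform(0.0, 1.0, (50, 10))
cat_err = float(np.mean([reconstruction_error(x) for x in novel_cats]))
dog_err = float(np.mean([reconstruction_error(x) for x in novel_dogs]))
print(cat_err, dog_err)   # novel "dogs" are reconstructed less well
```

As in the infant data, a novel stimulus drawn from a broader, unfamiliar range yields a larger reconstruction error – the model's analogue of longer looking.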
The distributions of these features – and, especially, the amount of inclusion – are shown in the following graphs.
[Figure: comparing the distributions of the input features – head length, head width, eye separation, ear separation, ear length, and vertical extent – for Dogs and Cats.]
Results of Our Simulation
[Figure: network error on the novel cat vs. the novel dog, for networks that learned "cats" first and networks that learned "dogs" first.]
A Prediction of the auto-encoder model
• If we were to reverse the inclusion relationship between Dogs and Cats, we should be able to reverse the asymmetry.
• We selected the new stimuli from dog- and cat-breeder books (and very slightly morphed some of these stimuli).
• We created a set of Cats and Dogs, such that Cats now included Dogs – i.e., the Cat category was the broad category and the Dog category was the narrow category.
Reversing the Inclusion Relationship
[Figure: eye-separation and ear-length distributions for Cats and Dogs. Old distributions: Dogs include Cats. "Reversed" distributions: Cats include Dogs.]
Results
[Figure: left, the model's prediction – network error on the new cat vs. the new dog after familiarization on Cats or Dogs; right, 3-4 month old infant data – attention to the new cat vs. the new dog. With the inclusion relationship reversed, the asymmetry reverses.]
Removing the inclusion relationship: Another prediction from the model
Our model also predicts that, regardless of the variance of each category, if we remove the inclusion relationship, we should eliminate the categorization asymmetry.
A new set of cat/dog stimuli was created in which there is no inclusion relationship
Cats
Dogs
Prediction and Empirical Results: The categorization asymmetry disappears.
[Figure: left, prediction of the auto-encoder – average error on novel dogs vs. novel cats after familiarization on Dogs or Cats; right, infant data – attention (%) to novel dogs vs. novel cats. Neither shows a significant asymmetry.]
A critique of our methodology: The use of explicit features
• We used explicit features (head length, leg length, ear separation, nose length, etc.) to characterize the animals (we hand-measured the values using the photos shown to the infants).
• We decided instead to use only Gabor-filtered spatial-frequency information to characterize the pictures.
The Forest and the Trees: What are “spatial frequencies”?
Each band of spatial frequencies corresponds, roughly, to the level of detail you can make out from a given distance:
• Very low spatial frequencies: the Forest from 5 miles away
• Low spatial frequencies: the Forest from 1 mile away
• Medium spatial frequencies: the Forest from 1/2 mile away; outline of some Trees
• Medium-high spatial frequencies: the Forest from 400 m. away; outline of some Trees
• High spatial frequencies: the Forest from 200 m. away; Trees visible, but no branches or leaves
• Very high spatial frequencies: 50 m. away; Forest no longer visible. Trees with branches visible but no leaves
• Extremely high spatial frequencies: 10 m. away; Forest no longer visible. Trees with branches and individual leaves visible
Combining all of these spatial frequencies yields the full image.
Cats: infant-to-adult visual acuity
[Figure: the same cat image at increasing acuity – very low spatial frequencies (two-month-old vision), 3-4 month old vision, (almost) adult vision, and adult vision with the full range of spatial frequencies.]
Spatial-frequency maps of images with Gabor filtering
We “cover” the spatial-frequency map of each image with spatial-frequency ovals along various orientations, from low frequencies to high frequencies. (Each oval is normalized to have approximately the same energy.) This allows us to characterize each dog/cat image with a 26-unit vector.
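The idea can be sketched as follows. This is not the authors' actual 26-oval encoding; it is a generic Gabor filter bank with invented frequencies and orientations, applied to a toy striped "image", showing how filter energies at several frequencies and orientations summarize an image as a short vector:

```python
import numpy as np

def gabor_kernel(freq, theta, size=32, sigma=6.0):
    """A 2-D Gabor filter: a Gaussian-windowed sinusoid at one
    spatial frequency (cycles/pixel) and orientation."""
    xs = np.arange(size) - size // 2
    X, Y = np.meshgrid(xs, xs)
    Xr = X * np.cos(theta) + Y * np.sin(theta)
    envelope = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * Xr)

def gabor_energies(image, freqs, thetas):
    """Convolve the image (via FFT) with each (frequency, orientation)
    filter and return the normalized response energies as one vector."""
    F = np.fft.fft2(image)
    energies = []
    for f in freqs:
        for t in thetas:
            K = np.fft.fft2(gabor_kernel(f, t), s=image.shape)
            response = np.real(np.fft.ifft2(F * K))
            energies.append(np.sum(response**2))
    v = np.array(energies)
    return v / v.sum()   # normalized, so energies are comparable across images

# Toy "image": vertical stripes at 0.125 cycles/pixel.
img = np.cos(2 * np.pi * 0.125 * np.arange(64))[None, :] * np.ones((64, 1))

freqs = [0.0625, 0.125, 0.25]                      # low / medium / high
thetas = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]  # four orientations
vec = gabor_energies(img, freqs, thetas)           # a 12-unit vector here
print(vec.round(3))   # peaks at the matching frequency and orientation
```

With more frequency bands and orientations (and oval-shaped regions over the spatial-frequency map), the same scheme yields a fixed-length vector such as the 26-unit encoding described above.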
This is an experiment.
Consider the following image.
Moral of the story:
Sometimes too much detail hinders categorization (even for adults!)
The same is true for infants: Reducing high-frequency information improves category discrimination for distinct categories
Reducing the range of the spatial frequencies from the retinal map to V1 decreases within-category variance.
This decreases the difference between two exemplars of the same category, but increases the difference between exemplars from two different categories.
This will make learning “distant” basic-level or super-ordinate category distinctions easier (but subordinate-level category distinctions will be more difficult).
In other words, reduced visual acuity might actually be good for infant categorization.
• Visual acuity in infants is not the same as that of adults. They do not perceive high spatial frequencies (i.e., fine details), or perceive them only poorly.
• This reduced visual acuity may actually improve perceptual efficiency by eliminating the “information overload” caused by too many extraneous fine details likely to overwhelm their cognitive system.
• Thus, distant basic-level category and super-ordinate level category learning may actually be facilitated by reduced visual acuity.
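A small numerical illustration of this argument, using invented toy "images": two same-category exemplars share coarse structure but differ in fine high-frequency texture, while a different category differs in coarse structure. Low-pass filtering (simulated reduced acuity) shrinks the within-category distance far more than the between-category distance:

```python
import numpy as np

rng = np.random.default_rng(1)

def lowpass(image, cutoff):
    """Zero out all spatial frequencies above `cutoff` (cycles per image)."""
    F = np.fft.fft2(image)
    fy = np.fft.fftfreq(image.shape[0]) * image.shape[0]
    fx = np.fft.fftfreq(image.shape[1]) * image.shape[1]
    FY, FX = np.meshgrid(fy, fx, indexing="ij")
    F[np.sqrt(FX**2 + FY**2) > cutoff] = 0
    return np.real(np.fft.ifft2(F))

n = 64
X, Y = np.meshgrid(np.arange(n), np.arange(n))

# Coarse "shapes" for two categories; fine detail is independent noise.
shape_cat = np.sin(2 * np.pi * X / n) * np.sin(2 * np.pi * Y / n)
shape_dog = np.sin(4 * np.pi * X / n) * np.sin(2 * np.pi * Y / n)

cat1 = shape_cat + 0.5 * rng.standard_normal((n, n))   # same category,
cat2 = shape_cat + 0.5 * rng.standard_normal((n, n))   # different fine texture
dog1 = shape_dog + 0.5 * rng.standard_normal((n, n))   # other category

within_full = np.linalg.norm(cat1 - cat2)
between_full = np.linalg.norm(cat1 - dog1)

# Simulated infant acuity: keep only the lowest spatial frequencies.
within_blur = np.linalg.norm(lowpass(cat1, 4) - lowpass(cat2, 4))
between_blur = np.linalg.norm(lowpass(cat1, 4) - lowpass(dog1, 4))

print(within_full, within_blur)                       # within-category distance shrinks
print(between_full / within_full, between_blur / within_blur)
```

The between/within distance ratio – a crude proxy for category discriminability – is therefore higher under the simulated reduced acuity.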
Reducing visual acuity in our model to simulate young-infant vision by removing high spatial frequencies
The high spatial frequencies have been removed. The auto-encoder will work with input from these images, thereby simulating early infant vision.
Two simulations with Gabor-filtered input
• Reproducing previous results: Using vectors of the 26 weighted spatial-frequency values, instead of explicit feature values, produces auto-encoder network results similar to those produced by infants tested on the same images.
• Reduced visual acuity: Largely eliminating high-spatial-frequency information from the input (i.e., “blurry” vision) actually significantly improves the network’s ability to categorize the images presented to it.
Reproducing previous results (Cats are the more variable category)
[Figure: network generalization errors with Gabor-filtered spatial-frequency information, alongside results for 3-4 month old infants and results with explicit feature values (French et al., 2001). In all three cases, familiarization on Cats produces a large jump in error (or attention) for the new dog, while familiarization on Dogs produces very little jump for the new cat.]
Conclusion about the use of Gabor-filtered input instead of explicit feature measurements
• Spatial frequency data in the model produces a reasonable fit to empirical data.
• We avoid the thorny issue of using a particular set of “high-level” feature measurements (ear length, eye separation, etc.) to characterize the images used in the simulations.
Reduced visual acuity
Reduced perceptual acuity in 3-4 month old infants produces an advantage for differentiating perceptually distant basic-level categories and super-ordinate categories.
Simulation 2: The advantage in 3-4 month old infants of reduced visual acuity
The frequencies removed or reduced were:
• Above 3-4 cycles/degree: very little contribution
• Above 7.1 cycles/degree: no contribution
Network used: a 26-16-26 feedforward BP auto-encoder (learning rate: 0.1, momentum: 0.9)
Close categories vs. Very dissimilar categories
When a network is familiarized on one category (say, Cat), reduced visual acuity decreases errors (i.e., improves generalization) for novel exemplars of the same category or of very similar categories (like Dog).
But it should help in discriminating dissimilar categories. So, for example, reduced visual acuity should produce a greater jump in error for a network (or increased attention for an infant) familiarized on Cats when exposed to Cars.
When trained on one category (Cats), errors on dissimilar categories (Cars) are increased by reduced visual acuity (i.e., better category discrimination).
The larger the error, the better the discrimination.
[Figure: jump in error for adult vision vs. simulated infant vision; the jump is larger with infant vision.]
A Prediction of the Model: Consider Quinn et al. (1993)
[Figure: familiarized on Cats, a novel Dog produces a jump in interest; familiarized on Dogs, a novel Cat produces no jump in interest.]
But what if we took this test Cat and, by adding only high spatial-frequency information, transformed it into this Dog?
[Figure: prediction – familiarized on Cats, the transformed Dog now produces no jump in interest; familiarized on Dogs, the Cat produces no jump in interest.]
Presumably what the 3-month old infant would see is this: [image: the transformed Dog with its high spatial frequencies removed, which is indistinguishable from a Cat].
The asymmetry would disappear, even though adults would perceive a series of cats followed by a dog and would expect a jump in infants’ interest, as there usually is for a novel dog following familiarization on cats.
Modeling Dogs and Cats: Conclusions
A simple connectionist auto-encoder does a good job of reproducing certain surprising infant categorization data.
This model makes testable predictions that have subsequently been confirmed in infants.
Gabor-filtered spatial-frequency input is neurobiologically plausible and produces a good approximation to infant categorization data.
A counter-intuitive learning advantage for categorizing distant basic-level categories and super-ordinate categories arises from reduced-acuity input.
This supports a statistical, perceptually based, on-line categorization mechanism in young infants.