this is a heavily data-oriented
TRANSCRIPT
Thomas G. Dietterich
Department of Computer Science
Oregon State University
Corvallis, Oregon 97331
http://www.cs.orst.edu/~tgd
Machine Learning: Making Computer Science Scientific
Acknowledgements
VLSI Wafer Testing: Tony Fountain
Robot Navigation: Didac Busquets, Carles Sierra, Ramon Lopez de Mantaras
NSF grants IIS-0083292 and ITR-085836
Outline
Three scenarios where standard software engineering methods fail
Machine learning methods applied to these scenarios
Fundamental questions in machine learning
Statistical thinking in computer science
Scenario 1: Reading Checks
Find and read “courtesy amount” on checks:
Possible Methods:
Method 1: Interview humans to find out what steps they follow in reading checks
Method 2: Collect examples of checks and the correct amounts. Train a machine learning system to recognize the amounts
Scenario 2: VLSI Wafer Testing
Wafer test: Functional test of each die (chip) while on the wafer
Which Chips (and how many) should be tested?
Tradeoff:
Test all chips on wafer?
  Avoid cost of packaging bad chips
  Incur cost of testing all chips
Test none of the chips on the wafer?
  May package some bad chips
  No cost of testing on wafer
Possible Methods
Method 1: Guess the right tradeoff point
Method 2: Learn a probabilistic model that captures the probability that each chip will be bad
  Plug this model into a Bayesian decision-making procedure to optimize expected profit
Scenario 3: Allocating mobile robot camera
Binocular
No GPS
Camera tradeoff
Mobile robot uses camera both for obstacle avoidance and landmark-based navigation
Tradeoff:
  If camera is used only for navigation, robot collides with objects
  If camera is used only for obstacle avoidance, robot gets lost
Possible Methods
Method 1: Manually write a program to allocate the camera
Method 2: Experimentally learn a policy for switching between obstacle avoidance and landmark tracking
Challenges for SE Methodology
Standard SE methods fail when…
1) System requirements are hard to collect
2) The system must resolve difficult tradeoffs
(1) System requirements are hard to collect
There are no human experts
  Cellular telephone fraud
Human experts are inarticulate
  Handwriting recognition
The requirements are changing rapidly
  Computer intrusion detection
Each user has different requirements
  E-mail filtering
(2) The system must resolve difficult tradeoffs
VLSI Wafer testing
  Tradeoff point depends on probability of bad chips, relative costs of testing versus packaging
Camera Allocation for Mobile Robot
  Tradeoff depends on probability of obstacles, number and quality of landmarks
Machine Learning: Replacing guesswork with data
In all of these cases, the standard SE methodology requires engineers to make guesses
  Guessing how to do character recognition
  Guessing the tradeoff point for wafer test
  Guessing the tradeoff for camera allocation
Machine Learning provides a way of making these decisions based on data
Outline
Three scenarios where software engineering methods fail
Machine learning methods applied to these scenarios
Fundamental questions in machine learning
Statistical thinking in computer science
Basic Machine Learning Methods
Supervised Learning
Density Estimation
Reinforcement Learning
Supervised Learning
[Figure: training examples (digit images labeled 8, 3, 6, 0, 1) are fed to a learning algorithm, which produces a classifier; the classifier labels a new example as 8]
AT&T/NCR Check Reading System
Recognition transformer is a neural network trained on 500,000 examples of characters
The entire system is trained given entire checks as input and dollar amounts as output
LeCun, Bottou, Bengio & Haffner (1998) Gradient-Based Learning Applied to Document Recognition
Check Reader Performance
82% of machine-printed checks correctly recognized
1% of checks incorrectly recognized
17% “rejected”: check is presented to a person for manual reading
Fielded by NCR in June 1996; reads millions of checks per month
Supervised Learning Summary
Desired classifier is a function y = f(x)
Training examples are desired input-output pairs (x_i, y_i)
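As a minimal sketch of this setup, the following learns a function f from input-output pairs. It uses a 1-nearest-neighbor rule purely for brevity; this is not the neural network used in the check reader, and the two-dimensional feature vectors and the name `train_nearest_neighbor` are hypothetical.

```python
from math import dist


def train_nearest_neighbor(examples):
    """Build a classifier f from (x, y) training pairs (illustrative only)."""
    stored = list(examples)

    def f(x):
        # Predict the label of the stored training input closest to x.
        _, label = min(stored, key=lambda pair: dist(pair[0], x))
        return label

    return f


# Hypothetical training pairs (x_i, y_i): feature vectors with digit labels.
training_examples = [
    ((0.0, 0.0), "0"),
    ((0.1, 0.2), "0"),
    ((1.0, 1.0), "8"),
    ((0.9, 1.1), "8"),
]
f = train_nearest_neighbor(training_examples)
prediction = f((0.8, 0.9))
```

Any learning algorithm that maps training pairs to a function y = f(x) fits the same interface; only the body of `train_nearest_neighbor` would change.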
Density Estimation
[Figure: training examples are fed to a learning algorithm, which produces a density estimator; given a partially-tested wafer, the estimator outputs, for example, P(chip_i is bad) = 0.42]
On-Wafer Testing System
Trained density estimator on 600 wafers from mature product (HP; Corvallis, OR)
Probability model is “naïve Bayes” mixture model with four components (trained with EM)
[Figure: mixture node W with child nodes C1, C2, C3, …, C209]
One-Step Value of Information
Choose the larger of
  Expected profit if we predict remaining chips, package, and re-test
  Expected profit if we test chip Ci, then predict remaining chips, package, and re-test [for all Ci not yet tested]
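The comparison above can be sketched as follows. This is an illustration under stated assumptions, not the fielded system: the mixture parameters, the revenue and cost figures, and all function names are hypothetical, and the final re-test after packaging is left out to keep the example short.

```python
def posterior(weights, theta, evidence):
    """P(mixture component | test results so far)."""
    scores = []
    for w, th in zip(weights, theta):
        score = w
        for chip, is_bad in evidence.items():
            score *= th[chip] if is_bad else 1.0 - th[chip]
        scores.append(score)
    total = sum(scores)
    return [s / total for s in scores]


def prob_bad(weights, theta, evidence, chip):
    """Predictive probability that an untested chip is bad."""
    post = posterior(weights, theta, evidence)
    return sum(p * th[chip] for p, th in zip(post, theta))


def profit_if_stop(weights, theta, evidence, revenue, package_cost):
    """Expected profit if we stop testing and package only the chips worth packaging."""
    total = 0.0
    for chip in range(len(theta[0])):
        if chip in evidence:
            p_good = 0.0 if evidence[chip] else 1.0
        else:
            p_good = 1.0 - prob_bad(weights, theta, evidence, chip)
        total += max(0.0, p_good * revenue - package_cost)
    return total


def one_step_voi(weights, theta, evidence, revenue, package_cost, test_cost):
    """Pick the better of stopping now or testing one more chip, then stopping."""
    best_action = ("stop", None)
    best_value = profit_if_stop(weights, theta, evidence, revenue, package_cost)
    for chip in range(len(theta[0])):
        if chip in evidence:
            continue
        p_bad = prob_bad(weights, theta, evidence, chip)
        value = -test_cost
        for is_bad, p in ((True, p_bad), (False, 1.0 - p_bad)):
            if p > 0.0:
                outcome = {**evidence, chip: is_bad}
                value += p * profit_if_stop(weights, theta, outcome, revenue, package_cost)
        if value > best_value:
            best_action, best_value = ("test", chip), value
    return best_action, best_value


# Hypothetical two-component model over three chips: a "good wafer" and a "bad wafer".
weights = [0.5, 0.5]
theta = [[0.05, 0.05, 0.05], [0.9, 0.9, 0.9]]  # theta[k][i] = P(chip i bad | component k)
action, value = one_step_voi(weights, theta, {}, revenue=10.0, package_cost=4.0, test_cost=0.5)
```

In this toy model a single test is worth paying for because its result shifts the belief about which kind of wafer is on the tester, which in turn changes the packaging decision for the untested chips.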
On-Wafer Chip Test Results
[Figure: bar chart of profit ($K), vertical axis from $1,160 to $1,230, comparing “Test all” with “VOI testing”]
3.8% increase in profit
Density Estimation Summary
Desired output is a joint probability distribution P(C1, C2, …, C203)
Training examples are points X = (C1, C2, …, C203) sampled from this distribution
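One way to estimate such a joint distribution from sampled points is a mixture of independent Bernoulli variables fitted with EM. The sketch below is a small pure-Python illustration, not the trained wafer model: the three-chip data set, the smoothing constants, the iteration count, and the function names are all assumptions.

```python
import random


def em_bernoulli_mixture(data, k, iterations=50, seed=0):
    """Fit a k-component mixture of independent Bernoullis with EM."""
    rng = random.Random(seed)
    n_features = len(data[0])
    weights = [1.0 / k] * k
    theta = [[rng.uniform(0.25, 0.75) for _ in range(n_features)] for _ in range(k)]
    for _ in range(iterations):
        # E-step: responsibility of each component for each training point.
        resp = []
        for x in data:
            joint = []
            for w, th in zip(weights, theta):
                p = w
                for xi, t in zip(x, th):
                    p *= t if xi else 1.0 - t
                joint.append(p)
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate mixing weights and per-chip failure probabilities.
        for c in range(k):
            mass = sum(r[c] for r in resp)
            weights[c] = mass / len(data)
            for f in range(n_features):
                bad = sum(r[c] * x[f] for r, x in zip(resp, data))
                theta[c][f] = (bad + 1.0) / (mass + 2.0)  # smoothing keeps values in (0, 1)
    return weights, theta


def joint_probability(x, weights, theta):
    """P(C1 = x[0], C2 = x[1], ...) under the fitted mixture."""
    total = 0.0
    for w, th in zip(weights, theta):
        p = w
        for xi, t in zip(x, th):
            p *= t if xi else 1.0 - t
        total += p
    return total


# Hypothetical wafers with three chips each; 1 marks a bad chip.
data = [[0, 0, 0]] * 6 + [[1, 1, 0]] * 4
weights, theta = em_bernoulli_mixture(data, k=2)
p_all_good = joint_probability([0, 0, 0], weights, theta)
```

The fitted `weights` and `theta` define a full joint distribution, so `joint_probability` sums to 1 over all possible wafers.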
Reinforcement Learning
[Figure: agent and environment loop; the agent sends action a to the environment, and the environment returns state s and reward r]
Agent’s goal: Choose actions to maximize total reward
Action Selection Rule is called a “policy”: a = π(s)
Reinforcement Learning for Robot Navigation
Learning from rewards and punishments in the environment
  Give reward for reaching goal
  Give punishment for getting lost
  Give punishment for collisions
Experimental Results: % trials robot reaches goal
Busquets, Lopez de Mantaras, Sierra, Dietterich (2002)
Reinforcement Learning Summary
Desired output is an action selection policy
Training examples are <s,a,r,s’> tuples collected by the agent interacting with the environment
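To make the tuple format concrete, here is a minimal sketch of tabular Q-learning over a batch of <s,a,r,s’> tuples. Q-learning is swapped in as a familiar example; the transcript does not say which algorithm the robot experiments used, and the states, actions, rewards, learning rate, and discount below are hypothetical.

```python
from collections import defaultdict


def q_learning(transitions, actions, alpha=0.5, gamma=0.9, sweeps=50):
    """Estimate action values from (s, a, r, s_next) tuples."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next in transitions:
            # States that never appear as s (terminal states) keep value 0.
            best_next = max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q


def greedy_policy(q, actions):
    """Return a policy that picks the highest-valued action in each state."""
    return lambda s: max(actions, key=lambda a: q[(s, a)])


# Hypothetical camera-allocation experience: look for landmarks or for obstacles.
actions = ["landmarks", "obstacles"]
transitions = [
    ("clear", "landmarks", 1.0, "goal"),        # reward for reaching the goal
    ("clear", "obstacles", 0.0, "clear"),
    ("cluttered", "landmarks", -1.0, "crash"),  # punishment for a collision
    ("cluttered", "obstacles", 0.0, "clear"),
]
q = q_learning(transitions, actions)
policy = greedy_policy(q, actions)
```

Here `policy("clear")` and `policy("cluttered")` give the camera choice learned for each state from the logged rewards and punishments.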
Outline
Three scenarios where software engineering methods fail
Machine learning methods applied to these scenarios
Fundamental questions in machine learning
Statistical thinking in computer science
Fundamental Issues in Machine Learning
Incorporating Prior Knowledge
Incorporating Learned Structures into Larger Systems
Making Reinforcement Learning Practical
Triple Tradeoff: accuracy, sample size, hypothesis complexity
Incorporating Prior Knowledge
How can we incorporate our prior knowledge into the learning algorithm?
  Difficult for decision trees, neural networks, support-vector machines, etc.
    Mismatch between form of our knowledge and the way the algorithms work
  Easier for Bayesian networks
    Express knowledge as constraints on the network
Incorporating Learned Structures into Larger Systems
Success story: Digit recognizer incorporated into check reader
Challenges:
  Larger system may make several coordinated decisions, but learning system treats each decision as independent
  Larger system may have complex cost function: errors in the thousands place versus the cents place: $7,236.07
Making Reinforcement Learning Practical
Current reinforcement learning methods do not scale well to large problems
Need robust reinforcement learning methodologies
The Triple Tradeoff
Fundamental relationship between
  amount of training data
  size and complexity of hypothesis space
  accuracy of the learned hypothesis
Explains many phenomena observed in machine learning systems
Learning Algorithms
Set of data points
Class H of hypotheses
Optimization problem: Find the hypothesis h in H that best fits the data
[Figure: Training Data; hypothesis h inside the Hypothesis Space]
Triple Tradeoff
Amount of Data – Hypothesis Complexity – Accuracy
[Figure: accuracy versus hypothesis space complexity, with one curve each for N = 10, N = 100, and N = 1000]
Triple Tradeoff (2)
[Figure: accuracy versus number of training examples N, with one curve each for hypothesis spaces H1, H2, and H3, ordered by hypothesis complexity]
Intuition
With only a small amount of data, we can only discriminate between a small number of different hypotheses
As we get more data, we have more evidence, so we can consider more alternative hypotheses
Complex hypotheses give better fit to the data
Fixed versus Variable-Sized Hypothesis Spaces
Fixed size
  Ordinary linear regression
  Bayes net with fixed structure
  Neural networks
Variable size
  Decision trees
  Bayes nets with variable structure
  Support vector machines
Corollary 1: Fixed H will underfit
[Figure: accuracy versus number of training examples N for hypothesis spaces H1 and H2, with a region marked “underfit”]
Corollary 2: Variable-sized H will overfit
[Figure: accuracy versus hypothesis space complexity for N = 100, with a region marked “overfit”]
Ideal Learning Algorithm: Adapt complexity to data
[Figure: accuracy versus hypothesis space complexity, with one curve each for N = 10, N = 100, and N = 1000]
Adapting Hypothesis Complexity to Data Complexity
Find hypothesis h to minimize error(h) + λ complexity(h)
Many methods for adjusting λ
  Cross-validation
  MDL
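The penalized objective can be illustrated with a small sketch. It assumes a tradeoff weight, called `lam` here, that multiplies the complexity term; the hypothesis class (piecewise-constant functions on [0, 1)), the use of the bin count as the complexity measure, and the data are all hypothetical choices made for brevity.

```python
def fit_piecewise_constant(points, bins):
    """Fit a piecewise-constant function with `bins` equal-width pieces on [0, 1)."""
    sums = [0.0] * bins
    counts = [0] * bins
    for x, y in points:
        b = min(int(x * bins), bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(y for _, y in points) / len(points)
    # Empty bins fall back to the overall mean.
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * bins), bins - 1)]


def mean_squared_error(h, points):
    return sum((h(x) - y) ** 2 for x, y in points) / len(points)


def select_hypothesis(points, candidate_bins, lam):
    """Pick the bin count minimizing error(h) + lam * complexity(h)."""

    def objective(bins):
        h = fit_piecewise_constant(points, bins)
        return mean_squared_error(h, points) + lam * bins

    best = min(candidate_bins, key=objective)
    return best, fit_piecewise_constant(points, best)


# Hypothetical data: a step at x = 0.5 plus small deterministic "noise".
points = [(i / 20, (1.0 if i >= 10 else 0.0) + ((i * 7) % 5 - 2) * 0.05) for i in range(20)]
candidates = [1, 2, 4, 10, 20]
bins_small_penalty, _ = select_hypothesis(points, candidates, lam=0.01)
bins_large_penalty, _ = select_hypothesis(points, candidates, lam=1.0)
```

A larger `lam` pushes the choice toward simpler hypotheses. In practice `lam` would itself be chosen by a method such as cross-validation rather than set by hand as it is here.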
Outline
Three scenarios where software engineering methods fail
Machine learning methods applied to these scenarios
Fundamental questions in machine learning
Statistical thinking in computer science
The Data Explosion
NASA Data
  284 Terabytes (as of August, 1999)
  Earth Observing System: 194 G/day
  Landsat 7: 150 G/day
  Hubble Space Telescope: 0.6 G/day
http://spsosun.gsfc.nasa.gov/eosinfo/EOSDIS_Site/index.html
The Data Explosion (2)
Google indexes 2,073,418,204 web pages
US Year 2000 Census: 62 Terabytes of scanned images
Walmart Data Warehouse: 7 (500?) Terabytes
Missouri Botanical Garden TROPICOS plant image database: 700 Gbytes
Old Computer Science Conception of Data
Store Retrieve
New Computer Science Conception of Data
[Figure: data flows from Store to Build Models to Solve Problems; problems go in and solutions come out]
Machine Learning: Making Data Active
Methods for building models from data
Methods for collecting and/or sampling data
Methods for evaluating and validating learned models
Methods for reasoning and decision-making with learned models
Theoretical analyses
Machine Learning and Computer Science
Natural language processing
Databases and data mining
Computer architecture
Compilers
Computer graphics
Hardware Branch Prediction
Source: Jiménez & Lin (2000) Perceptron Learning for Predicting the Behavior of Conditional Branches
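The idea of a perceptron-based branch predictor can be sketched in a few lines. This is a simplified illustration under assumptions, not the hardware design from the cited paper: it uses a single perceptron rather than a table indexed by branch address, and the history length and training threshold are hypothetical values.

```python
class PerceptronPredictor:
    """Single perceptron over recent branch outcomes (+1 taken, -1 not taken)."""

    def __init__(self, history_length=4, threshold=10):
        self.history = [-1] * history_length
        self.weights = [0] * (history_length + 1)  # weights[0] is the bias
        self.threshold = threshold

    def output(self):
        return self.weights[0] + sum(
            w * x for w, x in zip(self.weights[1:], self.history)
        )

    def predict(self):
        # Predict "taken" when the weighted sum is non-negative.
        return self.output() >= 0

    def update(self, taken):
        y = self.output()
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            self.weights[0] += t
            for i, x in enumerate(self.history):
                self.weights[i + 1] += t * x
        # Shift the newest outcome into the history.
        self.history = self.history[1:] + [t]


def run(predictor, outcomes):
    """Return, per branch, whether the prediction matched the actual outcome."""
    correct = []
    for taken in outcomes:
        correct.append(predictor.predict() == taken)
        predictor.update(taken)
    return correct


# Hypothetical branch that alternates between taken and not taken.
outcomes = [i % 2 == 0 for i in range(200)]
correct = run(PerceptronPredictor(), outcomes)
```

After a short warm-up the predictor learns that this branch does the opposite of what it did last time, so the later entries of `correct` are all true.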
Instruction Scheduler for New CPU
The performance of modern microprocessors depends on the order in which instructions are executed
Modern compilers rearrange instruction order to optimize performance (“instruction scheduling”)
Each new CPU design requires modifying the instruction scheduler
Instruction Scheduling
Moss, et al. (1997): Machine Learning scheduler can beat performance of commercial compilers and match the performance of research compiler.
Training examples: small basic blocks
  Experimentally determine optimal instruction order
  Learn preference function
Computer Graphics: Video Textures
Generate new video by splicing together short stretches of old video
[Figure: original frame sequence A B C D E F; generated sequence B D E D E F A]
Apply reinforcement learning to identify good transition points
Arno Schödl, Richard Szeliski, David H. Salesin, Irfan Essa (SIGGRAPH 2000)
Video Textures
Arno Schödl, Richard Szeliski, David H. Salesin, Irfan Essa (SIGGRAPH 2000)
You can find this video at Virtual Fish Tank Movie
Graphics: Image Analogies
[Figure: image analogy A : A’ :: B : ?]
Hertzmann, Jacobs, Oliver, Curless, Salesin (2000) SIGGRAPH
Learning to Predict Textures
A(p) A’(p)
B(q) B’(q)
Find p to minimize Euclidean distance between [image] and [image]
B’(q) := A’(p)
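The matching rule on this slide can be sketched as a nearest-neighbor search. The transcript does not preserve what exactly is compared, so this sketch assumes each pixel is described by a feature vector; the feature values, the output values, and the function name are hypothetical.

```python
from math import dist


def predict_by_analogy(features_a, values_a_prime, features_b):
    """For each q in B, copy A'(p) from the p in A whose features are closest."""
    result = []
    for fq in features_b:
        # Find p minimizing the Euclidean distance between feature vectors.
        p = min(range(len(features_a)), key=lambda i: dist(features_a[i], fq))
        result.append(values_a_prime[p])  # B'(q) := A'(p)
    return result


# Hypothetical per-pixel features for A and B, and filtered values for A'.
features_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
values_a_prime = ["dark", "striped", "bright"]
features_b = [(0.9, 0.1), (0.1, 0.8), (0.05, 0.0)]
b_prime = predict_by_analogy(features_a, values_a_prime, features_b)
```

The brute-force search keeps the example short; a spatial index would be the usual choice for real images.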
Image Analogies
[Figure: image analogy A : A’ :: B : B’]
A video can be found at Image Analogies Movie
Summary
Standard Software Engineering methods fail in many application problems
Machine Learning methods can replace guesswork with data to make good design decisions
Machine Learning and Computer Science
Machine Learning is already at the heart of speech recognition and handwriting recognition
Statistical methods are transforming natural language processing (understanding, translation, retrieval)
Statistical methods are creating opportunities in databases, computer graphics, robotics, computer vision, networking, and computer security
Computer Power and Data Power
Data is a new source of power for computer science
Every computer science student should learn the fundamentals of machine learning and statistical thinking
By combining engineered frameworks with models learned from data, we can develop the high-performance systems of the future