Post on 19-Dec-2015
1
Machine Learning
Spring 2013
Rong Jin
2
CSE847 Machine Learning
• Instructor: Rong Jin
• Office hours: Tuesday 4:00pm-5:00pm
• TA: Qiaozi Gao, Thursday 4:00pm-5:00pm
• Textbooks
  • Machine Learning
  • The Elements of Statistical Learning
  • Pattern Recognition and Machine Learning
• Many subjects are from papers
Web site: http://www.cse.msu.edu/~cse847
3
Requirements
• ~10 homework assignments
• Course project
  • Topic: visual object recognition
  • Data: over one million images with extracted visual features
  • Objective: build a classifier that automatically identifies the class of objects in images
• Midterm exam & final exam
4
Goal
• Familiarize you with the state of the art in machine learning
  • Breadth: many different techniques
  • Depth: project
  • Hands-on experience
• Develop the machine learning way of thinking
  • Learn how to model real-world problems with machine learning techniques
  • Learn how to deal with practical issues
5
Course Outline
Theoretical Aspects
• Information Theory
• Optimization Theory
• Probability Theory
• Learning Theory
Practical Aspects
• Supervised Learning Algorithms
• Unsupervised Learning Algorithms
• Important Practical Issues
• Applications
6
Today’s Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning
7
Why Machine Learning?
• Past: most computer programs were written by hand
• Future: computers should be able to program themselves through interaction with their environment
8
Recent Trends
• Recent progress in algorithms and theory
• Growing flood of online data
• Computational power is available
• Growing industry
Big Data Challenge
• 2.7 zettabytes (1 zettabyte = $10^{21}$ bytes) of data exist in the digital universe today.
• A huge amount of data is generated on the Internet every minute:
  • YouTube users upload 48 hours of video
  • Facebook users share 684,478 pieces of content
  • Instagram users share 3,600 new photos
Source: http://www.visualnews.com/2012/06/19/how-much-data-created-every-minute/
Big Data Challenge
• High-dimensional data appears in many applications of machine learning
• Fine-grained visual classification [1]: 250,000 features
Why Data Size Matters?
• Matrix completion
  • Classification, clustering, recommender systems
• The matrix can be perfectly recovered provided the number of observed entries is at least $O(rn\log^2(n))$
• The recovery error can be arbitrarily large if the number of observed entries is below $O(rn\log(n))$
[Figure: error vs. number of observed entries, with $O(rn\log(n))$ and $O(rn\log^2(n))$ marked on the axis and a region labeled "Unknown"]
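To get a feel for these bounds, the sketch below compares the two thresholds with the total number of matrix entries. It assumes that r is the rank and n the dimension of a square matrix (the slides do not define them), uses natural logarithms, and drops every constant hidden by the O-notation, so the numbers are only indicative.

```python
import math


def observed_entry_thresholds(n, r):
    """Return (lower, upper, total) entry counts for an n x n matrix of rank r.

    lower ~ r * n * log(n):   below this, the recovery error can be arbitrarily large
    upper ~ r * n * log(n)^2: at or above this, perfect recovery is possible
    Constants hidden by O(.) are dropped (an assumption of this sketch).
    """
    log_n = math.log(n)
    lower = r * n * log_n
    upper = r * n * log_n ** 2
    total = n * n
    return lower, upper, total


# Hypothetical problem size chosen only for illustration.
lower, upper, total = observed_entry_thresholds(n=100_000, r=10)
print(f"lower threshold: {lower:.3g} entries ({lower / total:.3%} of the matrix)")
print(f"upper threshold: {upper:.3g} entries ({upper / total:.3%} of the matrix)")
```

For a large, low-rank matrix both thresholds are a small fraction of all entries, which is why the gap between them matters in practice.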
Alibaba Small and Micro Financial Services
• Small and medium businesses find it difficult to access finance
• Minimum loan
• Tedious loan approval procedure
• Low approval rate
• Long cycle
• Completely big data driven
• Leverage e-commerce data for financial services
• Insurance contracts have a year-on-year growth rate of 100%.
• Over 1 billion contracts in 2013
• Over 100 million contracts in one day on November 11, 2013
[Chart: overall rate of compensation by month, 2013-05 through 2014-03; axis ticks at 40.00%, 80.00%, 120.00%]
Shipping Insurance for Returned Products
Fixed rate
• Uniform 5% fixed rate
• Simple, easy to explain
Actuarial approach
• Solely based on historical data and demographics
• Pricing model based on a few parameters
• Data-based pricing
• Relatively accurate
Machine-learned model
• Millions of features, real-time pricing
• Dynamic pricing
• Highly accurate
18
Three Niches for Machine Learning
• Data mining: using historical data to improve decisions
  • Medical records → medical knowledge
• Software applications that are difficult to program by hand
  • Autonomous driving
  • Image classification
• User modeling
  • Automatic recommender systems
19
Typical Data Mining Task
Given:
• 9147 patient records, each describing a pregnancy and birth
• Each patient record contains 215 features
Task:
• Predict classes of future patients at high risk for Emergency Cesarean Section
20
Data Mining Results
One of 18 learned rules:
If no previous vaginal delivery, and
   abnormal 2nd trimester ultrasound, and
   malpresentation at admission
Then probability of Emergency C-Section is 0.6
21
Credit Risk Analysis
Learned Rules:
If Other-Delinquent-Account > 2, and
   Number-Delinquent-Billing-Cycles > 1
Then Profitable-Customer? = no
If Other-Delinquent-Account = 0, and
   (Income > $30K or Years-of-Credit > 3)
Then Profitable-Customer? = yes
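Learned rules of this kind translate directly into code. The sketch below is one possible reading, assuming the conditions inside each rule are joined by "and" and that a customer matching neither rule gets no prediction; the function and parameter names are hypothetical.

```python
def profitable_customer(other_delinquent_accounts, delinquent_billing_cycles,
                        income, years_of_credit):
    """Apply the two learned rules; return "no", "yes", or None if neither fires."""
    # Rule 1: several delinquent accounts and repeated late billing cycles.
    if other_delinquent_accounts > 2 and delinquent_billing_cycles > 1:
        return "no"
    # Rule 2: no delinquent accounts, plus sufficient income or credit history.
    if other_delinquent_accounts == 0 and (income > 30_000 or years_of_credit > 3):
        return "yes"
    # Neither rule covers this customer (assumed behavior, not from the slides).
    return None


print(profitable_customer(3, 2, income=50_000, years_of_credit=5))
print(profitable_customer(0, 0, income=20_000, years_of_credit=5))
print(profitable_customer(1, 0, income=50_000, years_of_credit=5))
```

The third call shows that two rules do not cover every customer, which is why a learned rule set usually contains many more of them.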
22
Programs too Difficult to Program By Hand
ALVINN drives 70mph on highways
24
Programs too Difficult to Program By Hand
Visual object recognition: classify bird images
[Figure: positive examples and negative examples are used to train a statistical model, which is then applied at test time]
25
Image Retrieval using Texts
26
Software that Models Users
History
Description: A homicide detective and a fire marshal must stop a pair of murderers who commit videotaped crimes to become media darlings.
Rating:
Description: Benjamin Martin is drawn into the American Revolutionary War against his will when a brutal British commander kills his son.
Rating:
Description: A biography of sports legend Muhammad Ali, from his early days to his days in the ring.
Rating:
What to Recommend?
Description: A high-school boy is given the chance to write a story about an up-and-coming rock band as he accompanies it on its concert tour.
Recommend: ?
Description: A young adventurer named Milo Thatch joins an intrepid group of explorers to find the mysterious lost continent of Atlantis.
Recommend: ?
No
Yes
27
Netflix Contest
28
Relevant Disciplines
• Artificial Intelligence
• Statistics (particularly Bayesian statistics)
• Computational complexity theory
• Information theory
• Optimization theory
• Philosophy
• Psychology
• …
29
Today’s Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning
30
What is the Learning Problem
Learning = improving with experience at some task
• Improve over task T
• With respect to performance measure P
• Based on experience E
Example: Learning to Play Backgammon
• T: play backgammon
• P: % of games won in world tournament
• E: opportunity to play against itself
31
Backgammon
• More than $10^{20}$ states (boards)
• The best human players see only a small fraction of all boards during their lifetime
• Searching is hard because of the dice (branching factor > 100)
32
TD-Gammon by Tesauro (1995)
• Trained by playing against itself
• Now approximately equal to the best human player
33
Learn to Play Chess
• Task T: play chess
• Performance P: percent of games won in the world tournament
• Experience E: what experience?
• How shall it be represented?
• What exactly should be learned?
• What specific algorithm to learn it?
34
Choose a Target Function
• Goal: policy $\pi: b \to m$
• Choice of value function: $V: b, m \to \mathbb{R}$
• $b$ = board
• $\mathbb{R}$ = real values
35
Choose a Target Function
• Goal: policy $\pi: b \to m$
• Choice of value function: $V: b, m \to \mathbb{R}$ or $V: b \to \mathbb{R}$
• $b$ = board
• $\mathbb{R}$ = real values
36
Value Function V(b): Example Definition
• If $b$ is a final board that is won: $V(b) = 1$
• If $b$ is a final board that is lost: $V(b) = -1$
• If $b$ is not a final board: $V(b) = E[V(b^*)]$, where $b^*$ is the final board reached after playing optimally
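The definition can be made concrete on a game small enough to search completely. The sketch below uses a hypothetical toy game, not backgammon: two players alternately remove 1 or 2 stones from a pile, and whoever takes the last stone wins. There are no dice, so the expectation collapses to a single outcome, and "playing optimally" is assumed to mean that we maximize and the opponent minimizes the value.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def value(stones, our_turn):
    """V(b) for the toy board b = (stones, our_turn), seen from our side."""
    if stones == 0:
        # Final board: the player who just moved took the last stone and won.
        return -1 if our_turn else 1
    # Non-final board: value of the final board reached under optimal play.
    successors = [value(stones - take, not our_turn)
                  for take in (1, 2) if take <= stones]
    return max(successors) if our_turn else min(successors)


for stones in range(1, 7):
    print(stones, value(stones, True))
```

Exhaustive search of this kind is impossible for a game with a state space as large as backgammon's, which motivates approximating V(b) instead of computing it exactly.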
37
Representation of Target Function V(b)
• Same value for each board: no learning
• Lookup table (one entry for each board): no generalization
• In between: summarize experience into
  • Polynomials
  • Neural networks
38
Example: Linear Feature Representation
Features:
• $p_b(b)$, $p_w(b)$ = number of black (white) pieces on board $b$
• $u_b(b)$, $u_w(b)$ = number of unprotected black (white) pieces
• $t_b(b)$, $t_w(b)$ = number of black (white) pieces threatened by the opponent
Linear function: $V(b) = w_0 p_b(b) + w_1 p_w(b) + w_2 u_b(b) + w_3 u_w(b) + w_4 t_b(b) + w_5 t_w(b)$
Learning: estimation of the parameters $w_0, \dots, w_5$
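As a minimal sketch, the linear value function is just a dot product between a weight vector and the six board features. The feature counts and weights below are made up for illustration; extracting the features from a real board is not shown.

```python
def linear_value(features, weights):
    """Return V(b) = sum_i w_i * f_i(b) for one board's feature vector."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features))


# Feature order follows the slide: p_b, p_w, u_b, u_w, t_b, t_w.
features = [12, 11, 2, 3, 1, 0]                  # hypothetical board
weights = [1.0, -1.0, -0.5, 0.5, -0.25, 0.25]    # hypothetical w_0 .. w_5
print(linear_value(features, weights))
```

With this representation, learning reduces to choosing the six numbers in `weights`.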
39
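Tuning Weights: Gradient Descent Optimization
Given: board $b$, predicted value $V(b)$, desired value $V^*(b)$
• Calculate $\mathrm{error}(b) = V^*(b) - V(b)$
• For each board feature $f_i$: $w_i \leftarrow w_i + c \cdot \mathrm{error}(b) \cdot f_i$
• This stochastically minimizes $\sum_b (V^*(b) - V(b))^2$: the update is a gradient step on the squared error, with the constant factor 2 absorbed into the step size $c$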
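The weight-tuning loop can be sketched as follows. The sketch uses the signed difference between desired and predicted value as the error, which is the quantity that appears in the gradient of the squared loss for a linear V. The training data is synthetic: random feature vectors labeled by a hypothetical "true" weight vector, so that convergence can be checked.

```python
import random


def predict(weights, features):
    """Linear value estimate V(b) = sum_i w_i * f_i."""
    return sum(w * f for w, f in zip(weights, features))


def lms_update(weights, features, target, c=0.05):
    """One stochastic gradient step on (target - V(b))^2 for a linear V."""
    error = target - predict(weights, features)
    return [w + c * error * f for w, f in zip(weights, features)]


# Synthetic, noiseless training set (an assumption made for this sketch).
rng = random.Random(0)
true_weights = [0.5, -0.5, 0.2]
boards = [[rng.uniform(-1, 1) for _ in true_weights] for _ in range(200)]
targets = [predict(true_weights, f) for f in boards]

weights = [0.0] * len(true_weights)
for _ in range(50):                      # passes over the training boards
    for features, target in zip(boards, targets):
        weights = lms_update(weights, features, target)

print([round(w, 3) for w in weights])
```

The step size `c` is a design choice: too large and the updates diverge, too small and learning is slow.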
40
Obtain Boards
• Random boards
• Beginner plays
• Professional plays
41
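Obtain Target Values
• A person provides the value $V(b)$
• Play until termination. If the outcome is
  • Win: $V(b) \leftarrow 1$ for all boards
  • Loss: $V(b) \leftarrow -1$ for all boards
  • Draw: $V(b) \leftarrow 0$ for all boards
• Play one move: $b \to b'$, then $V(b) \leftarrow V(b')$
• Play $n$ moves: $b \to b' \to \dots \to b^{(n)}$, then $V(b) \leftarrow V(b^{(n)})$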
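Two of these labeling schemes can be sketched in a few lines: labeling every board of a finished game with its outcome, and labeling a board with the current value estimate of the board reached n moves later. The function names and the tiny example game are hypothetical; boards are represented by plain strings for brevity.

```python
def targets_from_outcome(boards, outcome):
    """Label every board of a finished game with the final outcome."""
    value = {"win": 1.0, "loss": -1.0, "draw": 0.0}[outcome]
    return [(b, value) for b in boards]


def targets_from_lookahead(boards, value_fn, n=1):
    """Label the board at step t with the current estimate V of the board at step t + n."""
    return [(boards[t], value_fn(boards[t + n]))
            for t in range(len(boards) - n)]


# Hypothetical game of four boards with made-up current value estimates.
game = ["b0", "b1", "b2", "b3"]
estimates = {"b0": 0.0, "b1": 0.1, "b2": 0.4, "b3": 1.0}

print(targets_from_outcome(game, "win"))
print(targets_from_lookahead(game, estimates.get, n=1))
print(targets_from_lookahead(game, estimates.get, n=2))
```

Outcome labels need a complete game but no value estimate, while lookahead labels can be produced after every move but depend on how good the current estimate already is.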
42
A General Framework
[Diagram: Mathematical Modeling (Statistics) + Finding Optimal Parameters (Optimization) = Machine Learning]
43
Today’s Topics
• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning
44
Important Issues in Machine Learning
• Obtaining experience
  • How to obtain experience? Supervised learning vs. unsupervised learning
  • How many examples are enough? PAC learning theory
• Learning algorithms
  • Which algorithm can approximate the function well, and when?
  • How does the complexity of learning algorithms affect the learning accuracy?
  • Is the target function learnable?
• Representing inputs
  • How to represent the inputs?
  • How to remove irrelevant information from the input representation?
  • How to reduce the redundancy of the input representation?