Supervised Learning

Communication Systems and Networks

Part 2: Supervised Learning

Supervised Learning

Supervised learning is the machine learning task of inferring a function from supervised (labeled) training data (Wikipedia). The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value. A supervised learning algorithm analyzes the training data and produces an inferred function, which is called a classifier (if the output is discrete) or a regression function (if the output is continuous). The inferred function should predict the correct output value for any valid input object. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.
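As a concrete illustration (not part of the original slides), the sketch below fits both kinds of inferred function on tiny hand-made (input, output) pairs; the data and the scikit-learn models are illustrative assumptions.

```python
# Minimal sketch: learning a classifier (discrete outputs) and a regression
# function (continuous outputs) from labeled (input, output) pairs.
from sklearn.linear_model import LogisticRegression, LinearRegression

X = [[1.0], [2.0], [3.0], [4.0]]      # input objects (here 1-D feature vectors)

# Discrete desired outputs -> the inferred function is a classifier.
y_class = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y_class)
print(clf.predict([[2.5]]))           # predicted label for an unseen input

# Continuous desired outputs -> the inferred function is a regression function.
y_reg = [1.1, 1.9, 3.2, 3.9]
reg = LinearRegression().fit(X, y_reg)
print(reg.predict([[2.5]]))           # predicted value for an unseen input
```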

Learning a Class from Examples

Class C of a "family car":
- Prediction: Is car x a family car?
- Knowledge extraction: What do people expect from a family car?
- Output: positive (+) and negative (−) examples
- Input representation: x1: price, x2: engine power

Training set X

Class C

Hypothesis class H

Error of h on the training set X
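The equations that accompanied the four slides above are not reproduced in the transcript. In the standard notation of Alpaydın's Introduction to Machine Learning, which these slides appear to follow, they are likely:

\[
\mathcal{X} = \{\mathbf{x}^t, r^t\}_{t=1}^{N},
\qquad
r^t = \begin{cases} 1 & \text{if } \mathbf{x}^t \text{ is a positive example} \\ 0 & \text{if } \mathbf{x}^t \text{ is a negative example,} \end{cases}
\qquad
\mathbf{x}^t = \begin{bmatrix} x_1^t \\ x_2^t \end{bmatrix}
\]

Class C is a rectangle in the (price, engine power) plane:
\[
(p_1 \le \text{price} \le p_2) \;\text{AND}\; (e_1 \le \text{engine power} \le e_2)
\]

Hypothesis class H is the set of all such rectangles, each hypothesis h giving
\[
h(\mathbf{x}) = \begin{cases} 1 & \text{if } h \text{ classifies } \mathbf{x} \text{ as positive} \\ 0 & \text{otherwise,} \end{cases}
\]

and the error of h on the training set X counts its misclassified examples:
\[
E(h \mid \mathcal{X}) = \sum_{t=1}^{N} \mathbf{1}\big(h(\mathbf{x}^t) \ne r^t\big).
\]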

S, G, and the Version Space

- S: the most specific hypothesis
- G: the most general hypothesis
- Any h ∈ H between S and G is consistent with the training set; these hypotheses make up the version space (Mitchell, 1997).
- Margin: choose the h with the largest margin.
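In the same notation, the version space can be written as the set of hypotheses with zero error on the training set; S and G are its most specific and most general members:

\[
\mathrm{VS}_{H,\mathcal{X}} \;=\; \{\, h \in H : E(h \mid \mathcal{X}) = 0 \,\}
\]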

Steps in Supervised Learning

1. Determine the type of training examples. In the case of handwriting analysis, for example, this might be a single handwritten character, an entire handwritten word, or an entire line of handwriting.
2. Gather a training set. The training set needs to be representative of the real-world use of the function. Thus, a set of input objects is gathered and corresponding outputs are also gathered, either from human experts or from measurements.

3. Determine the input feature representation of the learned function. The accuracy of the learned function depends strongly on how the input object is represented. Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object. The number of features should not be too large, because of the curse of dimensionality, but should contain enough information to accurately predict the output.
4. Determine the structure of the learned function and the corresponding learning algorithm. For example, the engineer may choose to use support vector machines or decision trees.

5. Complete the design. Run the learning algorithm on the gathered training set. Some supervised learning algorithms require the user to determine certain control parameters. These parameters may be adjusted by optimizing performance on a subset (called a validation set) of the training set, or via cross-validation.
6. Evaluate the accuracy of the learned function. After parameter adjustment and learning, the performance of the resulting function should be measured on a test set that is separate from the training set. (An end-to-end sketch of these steps follows below.)
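The sketch uses scikit-learn; the synthetic family-car data, the choice of an SVM, and the parameter grid are illustrative assumptions rather than part of the original slides.

```python
# A sketch of steps 1-6 on synthetic "family car" data (price, engine power);
# the data, the SVM, and the parameter grid are assumptions for illustration.
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Steps 1-2: "gather" labeled examples -- here generated instead of measured.
X = rng.uniform(low=[5_000, 50], high=[80_000, 400], size=(300, 2))
y = ((X[:, 0] < 40_000) & (X[:, 1] > 100) & (X[:, 1] < 250)).astype(int)

# Hold out a test set that is never touched during training or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Steps 3-4: feature representation (here just feature scaling) and the choice
# of learning algorithm (a support vector machine).
model = make_pipeline(StandardScaler(), SVC())

# Step 5: set the control parameter C by cross-validation on the training set.
grid = GridSearchCV(model, param_grid={"svc__C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X_train, y_train)

# Step 6: evaluate the tuned function on the separate test set.
print("test accuracy:", accuracy_score(y_test, grid.predict(X_test)))
```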

Facts in Supervised Learning

A wide range of supervised learning algorithms is available, each with its strengths and weaknesses. There is no single learning algorithm that works best on all supervised learning problems (the "no free lunch" theorem).

Noise and Model Complexity

Use the simpler model because it is:
- simpler to use (lower computational complexity),
- easier to train (lower space complexity),
- easier to explain (more interpretable), and
- more likely to generalize well (lower variance; Occam's razor: simpler explanations are more plausible, and any unnecessary complexity should be shaved off).

A small numerical illustration follows.
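In this sketch the sinusoidal data and polynomial models are assumptions, not from the slides: higher-degree polynomials fit the noisy training points more closely, but error on held-out points typically favors a simpler model.

```python
# Fit polynomials of increasing degree to noisy samples and compare their
# squared error on held-out validation points.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy target

x_tr, y_tr = x[:30], y[:30]   # training points
x_va, y_va = x[30:], y[30:]   # validation points

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_tr, y_tr, degree)                 # fit the polynomial
    val_mse = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)
    print(f"degree {degree}: validation MSE = {val_mse:.3f}")
```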

Multiple Classes, C_i, i = 1, ..., K

Train K hypotheses h_i(x), i = 1, ..., K:
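The corresponding equations are missing from the transcript; in the standard one-vs-all ("1-of-K") notation they are likely:

\[
\mathcal{X} = \{\mathbf{x}^t, \mathbf{r}^t\}_{t=1}^{N},
\qquad
r_i^t = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \ne i, \end{cases}
\]
\[
h_i(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{x} \in C_i \\ 0 & \text{if } \mathbf{x} \in C_j,\ j \ne i, \end{cases}
\qquad
E(\{h_i\}_{i=1}^{K} \mid \mathcal{X}) = \sum_{t=1}^{N} \sum_{i=1}^{K} \mathbf{1}\big(h_i(\mathbf{x}^t) \ne r_i^t\big).
\]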

Regression
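The regression equations are likewise missing from the transcript; in the same notation they are likely:

\[
\mathcal{X} = \{\mathbf{x}^t, r^t\}_{t=1}^{N}, \qquad r^t \in \mathbb{R}, \qquad r^t = f(\mathbf{x}^t) + \varepsilon,
\]
with the empirical error of an estimate g on X
\[
E(g \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \big(r^t - g(\mathbf{x}^t)\big)^2,
\]
e.g. for a linear model \( g(x) = w_1 x + w_0 \):
\[
E(w_1, w_0 \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \big(r^t - (w_1 x^t + w_0)\big)^2.
\]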

Empirical error: the error of the fitted function g, measured on the training set X.

Model Selection & Generalization

- Learning is an ill-posed problem; the data alone are not sufficient to find a unique solution.
- Generalization: how well a model performs on new data.
- Overfitting: H is more complex than C or f.
- Underfitting: H is less complex than C or f.

Triple Trade-Off

There is a trade-off between three factors (Dietterich, 2003):
- the complexity of the hypothesis class, c(H),
- the training set size, N, and
- the generalization error, E, on new data.

As N increases, E decreases; as c(H) increases, E first decreases and then increases.

Cross-Validation

To estimate generalization error, we need data unseen during training. We split the data as:
- Training set (50%)
- Validation set (25%)
- Test (publication) set (25%)

Resampling is used when there is little data.

Dimensions of a Supervised Learner

1. Model:
2. Loss function:
3. Optimization procedure:
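The three items above are listed without their formulas in the transcript; in the standard notation they are likely:

\[
\text{Model:}\quad g(\mathbf{x} \mid \theta)
\]
\[
\text{Loss function:}\quad E(\theta \mid \mathcal{X}) = \sum_{t} L\big(r^t,\, g(\mathbf{x}^t \mid \theta)\big)
\]
\[
\text{Optimization procedure:}\quad \theta^{*} = \arg\min_{\theta}\, E(\theta \mid \mathcal{X})
\]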