Know What it Knows : A Framework for Self-Aware Learning
Lihong Li, Michael L. Littman, Thomas J. Walsh (Rutgers University)
ICML-08: 25th International Conference on Machine Learning, Helsinki, Finland, July 2008
Presented by Praveen Venkateswaran, 2/26/18
Slides and images based on the paper


Page 1:

Know What it Knows : A Framework for Self-Aware Learning

Lihong Li, Michael L. Littman, Thomas J. Walsh (Rutgers University)
ICML-08: 25th International Conference on Machine Learning, Helsinki, Finland, July 2008

Presented by Praveen Venkateswaran, 2/26/18

Slides and images based on the paper

Page 2:

KWIK Framework

• The paper introduces a new framework called "Know What It Knows" (KWIK)

• It is useful in settings where active exploration can impact the training examples the learner is exposed to, e.g., reinforcement learning and active learning

• At the core of many reinforcement-learning algorithms is the idea of distinguishing between instances that have been learned with sufficient accuracy and those whose outputs are still unknown

Page 3:

KWIK Framework

• The paper makes explicit some properties that are sufficient for a learning algorithm to be used in efficient exploration algorithms.

• It describes a set of hypothesis classes for which KWIK algorithms can be created.

• The learning algorithm needs to make only accurate predictions, although it can opt out of a prediction by saying "I don't know" (⊥).

• However, there must be a (polynomial) bound on the number of times the algorithm can respond ⊥. The authors call such a learning algorithm KWIK.

Page 4:

Motivation : Example

• Each edge is associated with a binary cost vector of dimension d = 3
• Cost of traversing an edge = cost vector · fixed weight vector
• The fixed weight vector is given to be w = [1,2,0]
• The agent does not know w, but knows the topology and all cost vectors
• In each episode, the agent starts from the source and moves along a path to the sink
• Every time an edge is crossed, the agent observes its true cost

[Figure: directed graph from Source to Sink with three paths (top, middle, bottom); edges labeled with cost vectors]

Learning task: take non-cheapest paths in as few episodes as possible (i.e., minimize the number of suboptimal episodes)!

• Given w, the ground-truth cost of the top path = 12, middle path = 13, and bottom path = 15

Page 5:

Motivation : Simple Approach

• Simple approach: the agent assumes edge costs are uniform and walks the shortest path (middle)

• Since the agent observes true costs upon traversal, it gathers 4 instances of [1,1,1] → 3 and 1 instance of [1,0,0] → 1

• Using standard regression, the learned weight vector is ŵ = [1,1,1]

• Using ŵ, the estimated cost of top = 14, middle = 13, bottom = 14

• Using these estimates, the agent will take the middle path forever without realizing it is not optimal

Page 6:

Motivation : KWIK Approach

• Consider instead a learning algorithm that "knows what it knows"

• Instead of fitting ŵ, it attempts to reason only about whether edge costs can be deduced from the available data

• After the first episode, it knows the middle path's cost is 13

• The last edge of the bottom path has cost vector [0,0,0], so its cost must be 0; the penultimate edge has cost vector [0,1,1]

• Regardless of w: w·[0,1,1] = w·([1,1,1] − [1,0,0]) = w·[1,1,1] − w·[1,0,0] = 3 − 1 = 2

• Hence, the agent knows the cost of the bottom path is 14, which is worse than the middle path's 13

Page 7:

Motivation : KWIK Approach

• However, [0,0,1] on the top path is linearly independent of the previously seen cost vectors, i.e., its cost is unconstrained in the eyes of the agent

• The agent "knows that it doesn't know"

• It tries the top path in the second episode, observes [0,0,1] → 0, and can then solve for w and accurately predict the cost of any vector

• It also finds, with high confidence, that the optimal path is the top one

Page 8:

Motivation : KWIK Approach

• In general, any algorithm that merely guesses a weight vector may never find the optimal path.

• An algorithm that uses linear algebra to distinguish known from unknown costs will, in each episode, either take an optimal route or discover the cost of a linearly independent cost vector. Thus, it can choose suboptimal paths at most d times. (A sketch of this known/unknown test appears below.)

• KWIK's ideation originated from its use in multi-state sequential decision-making problems like this one.

• Action selection in bandit and associative bandit problems (bandits with inputs) can also be addressed by choosing the better arm when payoffs are known and an unknown arm otherwise.

• KWIK can also be used in active learning and anomaly detection, since both require some reasoning about whether a recently presented input is predictable from previous examples.
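To make the known/unknown test concrete, here is a minimal sketch (illustrative code, not from the paper; the class and method names are assumptions): for noise-free linear costs y = w·x, a query is "known" exactly when it lies in the span of previously observed cost vectors.

```python
import numpy as np

class LinearCostKWIK:
    """Sketch of a deterministic KWIK learner for linear costs y = w . x."""

    def __init__(self, d):
        self.X = np.empty((0, d))  # observed cost vectors (one per row)
        self.y = np.empty(0)       # their observed costs

    def predict(self, x):
        """Return the cost if x lies in the span of observed vectors, else None (⊥)."""
        if len(self.y) > 0:
            # Find coefficients c with X^T c ≈ x; x is "known" iff the fit is exact.
            c, *_ = np.linalg.lstsq(self.X.T, x, rcond=None)
            if np.allclose(self.X.T @ c, x):
                # y_i = w . x_i, so w . x = w . (sum_i c_i x_i) = sum_i c_i y_i.
                return float(c @ self.y)
        return None  # ⊥: x is linearly independent of everything seen so far

    def observe(self, x, y):
        """Record a noise-free observation (called only after predicting ⊥)."""
        self.X = np.vstack([self.X, x])
        self.y = np.append(self.y, y)

# The slides' example: after [1,1,1] -> 3 and [1,0,0] -> 1,
# the cost of [0,1,1] is forced to 2, while [0,0,1] is still unknown.
learner = LinearCostKWIK(d=3)
learner.observe(np.array([1.0, 1.0, 1.0]), 3.0)
learner.observe(np.array([1.0, 0.0, 0.0]), 1.0)
print(learner.predict(np.array([0.0, 1.0, 1.0])))  # 2.0
print(learner.predict(np.array([0.0, 0.0, 1.0])))  # None (⊥)
```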

Page 9:

KWIK : Formal Definition

• Input set X, output set Y. The hypothesis class H consists of a set of functions from X to Y: H ⊆ (X → Y)

• The target function h* ∈ H is the source of training examples and is unknown to the learner (we assume the target function lies in the hypothesis class)

Protocol for a run:

• H and accuracy parameters ε and δ are known to both the learner and the environment

ยด Environment selects ๐’‰โˆ— adversarially

• The environment selects an input x ∈ X adversarially and informs the learner

• The learner predicts an output ŷ ∈ Y ∪ {⊥}, where ⊥ means "I don't know"

ยด If ๐’š" โ‰  โŠฅ, it should be accurate (i.e.) |๐’š" - ๐’š|โ‰ค ๐, where ๐’š = ๐’‰โˆ—(๐’™). Otherwise the run is considered a failure.

• The probability of a failed run must be bounded by δ

Page 10:

KWIK : Formal Definition

• Over a run, the total number of steps on which ŷ = ⊥ must be bounded by B(ε, δ), ideally polynomial in 1/ε, 1/δ, and the parameters defining H.

ยด If ๐’š" = โŠฅ, the learner makes an observation ๐’› โˆˆ Z, of the output where ยด ๐’› = ๐’š in the deterministic case

ยด ๐’› = 1 with probability ๐’š and ๐’› = 0 with probability (1 - ๐’š) in the Bernoulli case

ยด ๐’› = ๐’š + ๐œผ for zero-mean random variable ๐œผ in the additive noise case

Page 11:

KWIK : Connection to PAC and MB

• In all three frameworks, PAC (Probably Approximately Correct), MB (Mistake-Bound), and KWIK (Know What It Knows), a series of inputs (instances) is presented to the learner.

(In the accompanying figure, each rectangular box is an input.)

• In the PAC model, the learner is provided with labels (correct outputs) for an initial sequence of inputs, depicted by crossed boxes in the figure.

• The learner is then responsible for producing accurate outputs (empty boxes) for all new inputs, where inputs are drawn from a fixed distribution.

Page 12:

KWIK : Connection to PAC and MB

[Figure: MB model input sequence, with correct outputs (empty boxes) and incorrect outputs (black boxes)]

• In the MB model, the learner is expected to produce an output for every input.

• Labels are provided to the learner whenever it makes a mistake (black boxes)

• Inputs are selected adversarially, so there is no bound on when the last mistake may be made

• MB algorithms guarantee that the total number of mistakes made is small, i.e., the ratio of incorrect to correct responses goes to 0 asymptotically


Page 13:

KWIK : Connection to PAC and MB

• The KWIK model has elements of both PAC and MB. Like PAC, KWIK algorithms are not allowed to make mistakes. Like MB, inputs to a KWIK algorithm are selected adversarially.

• Instead of bounding mistakes, a KWIK algorithm must have a bound on the number of label requests (⊥) it can make.

• A KWIK algorithm can be turned into an MB algorithm with the same bound by having the algorithm guess an output each time it is not certain.

Page 14:

Examples of KWIK algorithms

Memorization Algorithm - The memorization algorithm can learn any hypothesis class with input space X with a KWIK bound of |X|. This algorithm can be used when the input space X is finite and observations are noise-free.

• The algorithm keeps a mapping ĥ, initialized to ĥ(x) = ⊥ for all x ∈ X

• The algorithm reports ĥ(x) for every input x chosen by the environment

• If ĥ(x) = ⊥, the environment reports the correct label y (noise-free) and the algorithm reassigns ĥ(x) = y

• Since it reports ⊥ at most once for each input, the KWIK bound is |X| (a minimal sketch follows)
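A minimal sketch of the memorization learner (illustrative; the predict/observe interface is an assumption reused by the later sketches): it is simply a dictionary over the finite input space.

```python
class MemorizationKWIK:
    """Sketch: KWIK memorization learner for a finite input space, noise-free labels."""

    def __init__(self):
        self.h = {}  # maps each seen input to its observed label

    def predict(self, x):
        # Return the stored label, or None (⊥) for a never-seen input.
        return self.h.get(x)

    def observe(self, x, y):
        # Called only after predicting ⊥; this fires at most once per input,
        # so the total number of ⊥ responses is at most |X|.
        self.h[x] = y
```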

Page 15:

Examples of KWIK algorithms

Enumeration Algorithm - The enumeration algorithm can learn any hypothesis class H with a KWIK bound of |H| − 1. This algorithm can be used when the hypothesis class H is finite and observations are noise-free.

• The algorithm initializes the version space Ĥ = H.

• For every input x, it computes L̂ = {h(x) | h ∈ Ĥ}, i.e., the set of outputs over all hypotheses that have not yet been ruled out.

• If |L̂| = 0, the version space has been exhausted and h* ∉ H.

• If |L̂| = 1, all hypotheses in Ĥ agree on the output.

• If |L̂| > 1, at least two hypotheses disagree; ⊥ is returned and the true label y is received. The version space is then updated: Ĥ′ = {h | h ∈ Ĥ ∧ h(x) = y}.

• Each such update shrinks the version space: |Ĥ′| ≤ |Ĥ| − 1.

• If |Ĥ| = 1, then |L̂| = 1 and the algorithm will no longer return ⊥. Hence |H| − 1 bounds the number of ⊥'s that can be returned. (A minimal sketch follows.)
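A matching sketch of the enumeration learner (same assumed interface; hypotheses modeled as callables):

```python
class EnumerationKWIK:
    """Sketch: KWIK enumeration learner for a finite hypothesis class, noise-free labels."""

    def __init__(self, hypotheses):
        self.H = list(hypotheses)  # version space of callables h: X -> Y

    def predict(self, x):
        outputs = {h(x) for h in self.H}
        if len(outputs) == 1:
            return outputs.pop()  # all surviving hypotheses agree on the output
        return None               # ⊥: at least two surviving hypotheses disagree

    def observe(self, x, y):
        # Keep only hypotheses consistent with the true label. Since at least two
        # hypotheses disagreed on x, at least one is removed: at most |H| - 1 ⊥'s.
        self.H = [h for h in self.H if h(x) == y]
```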

Page 16:

Illustrative Example

Given a group of people P, where a ∈ P is a troublemaker and b ∈ P is a nice person who can control a, predict for any subset G ⊆ P whether there will be trouble, without knowing the identities of a and b. (Per the paper's example, there is trouble exactly when a is in G and b is not.)

• Input space X = 2^P and output space Y = {trouble, no trouble}

• The memorization algorithm achieves a bound of 2^n, since it may have to see every possible subset of people (where n = |P|)

• The enumeration algorithm achieves a bound of n(n − 1), since there is one hypothesis for each possible assignment of a and b

• Each time the algorithm reports ⊥, it can rule out at least one possible combination (a code instantiation follows)
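Continuing the enumeration sketch above, this example can be instantiated directly (names are illustrative, and the trouble condition is the one stated above):

```python
from itertools import permutations

people = ["alice", "bob", "carol"]

def make_hypothesis(a, b):
    # Trouble iff the troublemaker a attends without the controlling person b.
    return lambda G: a in G and b not in G

# One hypothesis per ordered pair (a, b): n(n - 1) in total.
learner = EnumerationKWIK(make_hypothesis(a, b) for a, b in permutations(people, 2))

G = frozenset(["alice"])
print(learner.predict(G))  # None (⊥): the hypotheses disagree on this subset
learner.observe(G, True)   # suppose there was trouble
print(len(learner.H))      # 2 remain: (a="alice", b="bob") and (a="alice", b="carol")
```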

Page 17:

Examples of KWIK algorithms

KWIK bounds can also be achieved when the hypothesis class and input space are infinite.

Planar Distance Algorithm - Given an unknown point and a target function that maps each input point to its Euclidean distance |x − c|₂ from that unknown point, the planar distance algorithm can learn this hypothesis class with a KWIK bound of 3.

Page 18:

Examples of KWIK algorithms

• Given the initial input x, the algorithm returns ⊥ and receives the output y. Hence the unknown point must lie on the circle depicted in Figure (a).

• For a new input, the algorithm again returns ⊥ and similarly finds a second circle (Figure (b)). If the circles intersect at a single point, the problem becomes trivial; otherwise there are two potential locations.

• Next, for any new input that is not collinear with the first two inputs, the algorithm returns ⊥ one final time; the observation then identifies the location of the unknown point (inputs collinear with the first two are equidistant from both candidate locations, so their outputs are already known).

• In general, the d-dimensional version of this problem has a KWIK bound of d + 1. (A sketch of the final solve appears below.)
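The final step can be written as linear algebra: subtracting pairs of squared-distance equations ‖x_i − c‖² = y_i² cancels the quadratic term, leaving a linear system in c. The sketch below (an illustrative helper, not the paper's pseudocode) shows only that solve; the KWIK learner itself keeps answering ⊥ until the observed inputs pin c down uniquely.

```python
import numpy as np

def solve_center(xs, ys):
    """Recover the unknown point c from d+1 observations ||x_i - c|| = y_i,
    assuming the inputs x_i are in general position (sketch)."""
    x0, y0 = xs[0], ys[0]
    # Subtracting the i-th squared equation from the 0-th gives, for each i:
    #   2 (x_i - x0) . c = ||x_i||^2 - ||x0||^2 - y_i^2 + y0^2
    A = 2 * (np.array(xs[1:]) - x0)
    b = np.array([x @ x - x0 @ x0 - y**2 + y0**2 for x, y in zip(xs[1:], ys[1:])])
    return np.linalg.solve(A, b)

# 2-D check: hidden point c = (1, 2); three non-collinear inputs determine it.
c = np.array([1.0, 2.0])
xs = [np.array(p) for p in [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]]
ys = [np.linalg.norm(x - c) for x in xs]
print(solve_center(xs, ys))  # ~ [1. 2.]
```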

Page 19:

Examples of KWIK algorithms

What if the observations are not noise-free?


Coin Learning Algorithm - The coin learning algorithm can accurately predict the probability that a biased coin will come up heads, given Bernoulli observations, with a KWIK bound of B(ε, δ) = (1/(2ε²)) ln(2/δ) = O((1/ε²) ln(1/δ)).

• The unknown probability of heads is p, and we want to estimate p̂ with high accuracy (|p̂ − p| ≤ ε) and high probability (1 − δ)

• Since observations are noisy, we see either 1 with probability p or 0 with probability 1 − p.

• Each time the algorithm returns ⊥, it gets an independent trial that it can use to compute p̂ = (1/T) Σ_{t=1}^{T} z_t, where z_t ∈ {0, 1} is the t-th observation in T trials.

• The number of trials T needed to be (1 − δ)-certain that the estimate is within ε can be computed using a Hoeffding bound. (A sketch follows.)
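A sketch of the coin learner with T taken from the Hoeffding bound (interface as in the earlier sketches; the input argument is irrelevant here since there is a single coin):

```python
import math
import random

class CoinLearningKWIK:
    """Sketch: answer ⊥ and collect Bernoulli samples until the Hoeffding bound
    guarantees an epsilon-accurate estimate with probability at least 1 - delta."""

    def __init__(self, epsilon, delta):
        # Hoeffding: T >= (1 / (2 eps^2)) ln(2 / delta) i.i.d. trials give
        # P(|p_hat - p| > eps) <= delta.
        self.T = math.ceil(math.log(2 / delta) / (2 * epsilon**2))
        self.samples = []

    def predict(self, x=None):
        if len(self.samples) < self.T:
            return None  # ⊥: not enough trials yet
        return sum(self.samples) / len(self.samples)  # p_hat

    def observe(self, x=None, z=0):
        self.samples.append(z)  # z ∈ {0, 1}

# Usage: estimate a p = 0.3 coin to within 0.05, with failure probability 0.05.
learner = CoinLearningKWIK(epsilon=0.05, delta=0.05)
while learner.predict() is None:
    learner.observe(z=1 if random.random() < 0.3 else 0)
print(round(learner.predict(), 3))  # ≈ 0.3 with probability ≥ 0.95
```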

Page 20:

Combining KWIK Learners

Can KWIK learners be combined to provide learning guarantees for more complex hypothesis classes?

Union Algorithm - Let F : X → Y, and let H₁, …, H_k be learnable hypothesis classes with bounds B₁(ε, δ), …, B_k(ε, δ), where H_i ⊆ F for all 1 ≤ i ≤ k, i.e., the hypothesis classes share the same input/output sets. The union algorithm can learn the joint hypothesis class H = ∪_i H_i with a KWIK bound of B(ε, δ) = (k − 1) + Σ_i B_i(ε, δ).

• The union algorithm maintains a set of active sub-algorithms Â, one for each hypothesis class.

• Given an input x, the algorithm queries each i ∈ Â for its prediction ŷ_i and adds it to the set L̂.

• If ⊥ ∈ L̂, it returns ⊥, obtains the correct output y, and sends it to every sub-algorithm i with ŷ_i = ⊥ so that it can learn.

ยด If |๐‘ณ6| > 1, then there is a disagreement among the sub-algorithms and again returns โŠฅ and removes any sub-algorithm that made a prediction other than ๐’šor โŠฅ.

• So for each input on which ⊥ is returned, either a sub-algorithm reported ⊥ (at most Σ_i B_i(ε, δ) times in total) or two sub-algorithms disagreed and at least one was removed (at most k − 1 times), which yields the bound. (A sketch follows.)

Page 21:

Combining KWIK Learners : Example

Let X = Y = ℝ. Define H₁ as the class of functions measuring the distance of the input x from an unknown point, f(x) = |x − c|; H₁ can be learned with a bound of 2 using a 1-D version of the planar distance algorithm. Let H₂ be the set of lines {f | f(x) = mx + b}, which can also be learned with a bound of 2 using the regression algorithm. We need to learn H = H₁ ∪ H₂.

• Let the first input be x₁ = 2. The union algorithm asks H₁ and H₂ for their outputs; both return ⊥, and say it receives the output y₁ = 2.

• For the next input x₂ = 8, H₁ and H₂ still return ⊥ and receive y₂ = 4.

• Then, for the third input x₃ = 1, H₁ returns 3 (since |2 − 4| = 2 and |8 − 4| = 4, the unknown point must be c = 4, giving |1 − 4| = 3), while H₂, having computed m = 1/3 and b = 4/3, returns 5/3.

ยด Since ๐‘ฏ๐Ÿ and ๐‘ฏ๐Ÿ disagree, the algorithm returns โŠฅ and then finds out ๐’š๐Ÿ‘= 3, and then makes all future predictions accurately.

Page 22:

Combining KWIK Learners

What about disjoint input spaces, i.e., X_i ∩ X_j = ∅ for i ≠ j?

Let ๐‘ฏ๐Ÿ, โ€ฆ, ๐‘ฏ๐’Œ be the learnable hypothesis classes with bounds of ๐‘ฉ๐Ÿ(๐, ๐œน), โ€ฆ, ๐‘ฉ๐’Œ(๐, ๐œน)where ๐‘ฏ๐’Š โˆˆ (๐‘ฟ๐’Š โ†’ Y). The input-partition algorithm can learn the hypothesis class H = ๐‘ฟ๐Ÿ โˆช โ‹ฏโˆช ๐‘ฟ๐’Œ โ†’ Y with a KWIK bound of ๐‘ฉ(๐, ๐œน) = โˆ‘ ๐‘ฉ๐’Š(๐, ๐œน/๐’Œ)๏ฟฝ

๐’Š

• The input-partition algorithm runs the learning algorithm for each subclass H_i.

• When it receives an input x ∈ X_i, it returns the response from H_i, which is ε-accurate.

• To achieve (1 − δ) certainty, it insists on (1 − δ/k) certainty from each sub-algorithm. By the union bound, the overall failure probability is at most the sum of the sub-algorithms' failure probabilities. (A sketch follows.)
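A sketch of the input-partition algorithm (`partition` is an assumed helper mapping each input to the index of its subspace; each sub-learner should be constructed with failure probability δ/k):

```python
class InputPartitionKWIK:
    """Sketch: route each input to the sub-learner that owns its region X_i."""

    def __init__(self, partition, learners):
        self.partition = partition  # function: x -> index i with x ∈ X_i
        self.learners = learners    # k sub-learners, each run with delta / k

    def predict(self, x):
        return self.learners[self.partition(x)].predict(x)

    def observe(self, x, z):
        self.learners[self.partition(x)].observe(x, z)
```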

Page 23:

Disjoint Spaces Example

• An MDP consists of n states and m actions. For each combination of state, action, and next state, the transition function returns a probability.

• As the reinforcement-learning agent moves around in the state space, it observes state-action-state transitions and must predict the probabilities for transitions it has not yet observed.

• In the model-based setting, an algorithm learns a mapping from the size-n²m input space of state-action-state combinations to probabilities, via Bernoulli observations.

• Thus, the problem can be solved via the input-partition algorithm over a set of individual probabilities, each learned via the coin-learning algorithm. The resulting KWIK bound is B(ε, δ) = O((n²m / ε²) ln(nm / δ)). (A combined sketch follows.)
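Putting the compositions together, a sketch of the model-based transition learner: one coin per (s, a, s′) triple, each run with failure probability δ/(n²m). The coin logic is inlined rather than reusing the earlier class, and the per-query Bernoulli observation is a simplification of how transition data would be shared in practice.

```python
import math

class TransitionModelKWIK:
    """Sketch: KWIK learning of MDP transition probabilities via input-partition
    over n^2 m coin learners, one per (s, a, s') combination."""

    def __init__(self, n_states, n_actions, epsilon, delta):
        k = n_states**2 * n_actions  # number of sub-problems (partition size)
        # Coin-learning bound with per-coin failure probability delta / k.
        self.T = math.ceil(math.log(2 * k / delta) / (2 * epsilon**2))
        self.obs = {}                # (s, a, s') -> list of Bernoulli outcomes

    def predict(self, s, a, s_next):
        trials = self.obs.get((s, a, s_next), [])
        if len(trials) < self.T:
            return None              # ⊥: this coin needs more trials
        return sum(trials) / len(trials)

    def observe(self, s, a, s_next, s_landed):
        # After taking a in s and landing in s_landed, the Bernoulli observation
        # for the queried triple (s, a, s_next) is 1 iff s_next == s_landed.
        self.obs.setdefault((s, a, s_next), []).append(int(s_next == s_landed))
```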

Page 24:

Conclusion

• The paper describes the KWIK ("Know What It Knows") model of supervised learning and identifies and generalizes key steps for efficient exploration.

• The authors provide algorithms for some basic hypothesis classes, for both deterministic and noisy observations.

• They also describe methods for composing hypothesis classes to create more complex algorithms.

• Open problem: how can the KWIK framework be adapted when the target hypothesis can be chosen from outside the hypothesis class H?