
Classification with Costly Features using Deep Reinforcement Learning
Jaromír Janisch • Tomáš Pevný • Viliam Lisý

Artificial Intelligence Center, Dept. of Computer Science, FEE, CTU in Prague

Abstract

TL;DR: With approximated Q-learning, we train a classifier that maximizes accuracy while minimizing required information.

We study a classification problem where each feature can be acquired for a cost and the goal is to optimize a trade-off between the expected classification error and the feature cost. We revisit a former approach that framed the problem as sequential decision-making and solved it by Q-learning with a linear approximation, where individual actions either request feature values or terminate the episode by providing a classification decision. On a set of eight problems, we demonstrate that by replacing the linear approximation with neural networks the approach becomes comparable to state-of-the-art algorithms developed specifically for this problem. The approach is flexible: it can be improved with any new reinforcement learning enhancement, it allows inclusion of a pre-trained high-performance classifier, and, unlike prior art, its performance is robust across all evaluated datasets.

Costly Features

Features can only be retrieved after paying some cost, which can be in the form of money, time, or other resources.

Feature group                                      Cost
Common information (weight, age, …)                $0
Simple examinations (heart rate, blood pressure)   $5
Blood screening                                    $20
Radiograph                                         $50
Magnetic resonance                                 $200

Table 1. Example features and their costs from the medical domain.

The goal is to find an optimal solution for any given sample that maximizes accuracy and minimizes cost.

Sequential Decision-Making Problem

One sample at a time, we sequentially acquire features until enough information is gathered and a classification is performed.

Problem definition

The goal is to maximize expected accuracy while minimizing cost. The model is a pair of functions: y_\theta(x) \to \text{class}, z_\theta(x) \to \text{cost}.

Objective:

    \min_\theta \; \mathbb{E}_{(x,y) \in \mathcal{D}} \left[ \ell(y_\theta(x), y) + \lambda z_\theta(x) \right]

We use a binary loss \ell and a fixed cost vector c:

    \ell(\hat{y}, y) = \begin{cases} 0 & \text{if } \hat{y} = y \\ 1 & \text{if } \hat{y} \neq y \end{cases} \qquad\qquad z_\theta(x) = c \cdot f_\theta(x)

where f_\theta(x) is the 0/1 indicator vector of the features acquired for sample x.
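To make the objective concrete, the following is a minimal sketch (illustrative names, plain NumPy; not the authors' code) that evaluates \ell(y_\theta(x), y) + \lambda z_\theta(x) for a single sample, given the fixed cost vector c and a 0/1 mask of acquired features:

```python
import numpy as np

def per_sample_objective(y_pred, y_true, acquired_mask, costs, lam):
    """Binary loss plus lambda-weighted feature cost for one sample.

    y_pred        -- predicted class label y_hat
    y_true        -- ground-truth class label y
    acquired_mask -- 0/1 vector f_theta(x): 1 where a feature was acquired
    costs         -- fixed per-feature cost vector c
    lam           -- trade-off parameter lambda
    """
    loss = 0.0 if y_pred == y_true else 1.0        # binary loss l(y_hat, y)
    z = float(np.dot(costs, acquired_mask))        # z_theta(x) = c . f_theta(x)
    return loss + lam * z

# Example: a correct prediction that acquired features 0 and 2 (costs from Table 1).
costs = np.array([0.0, 5.0, 20.0, 50.0, 200.0])
mask = np.array([1, 0, 1, 0, 0])
print(per_sample_objective(1, 1, mask, costs, lam=0.01))   # -> 0.2
```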

Effect of different λ

The λ controls the weight of the feature costs in the objective. A low λ makes features effectively cheap; a high λ makes them expensive.

[Plot: accuracy vs. cost for λ = 0.0001, 0.001, 0.01, 0.1 and 1.0.]

Markov Decision Process

[Diagram: an episode starts from the masked sample \bar{x}; feature-acquisition actions a_{f_i} give reward r = -λc_{f_i} and reveal the feature, while a classification action a_{y_j} terminates the episode with reward r = -\ell(\hat{y}, y).]

The problem is modeled as an MDP with full information. The agent has only partial visibility of the states.

    s = (x, y, \mathcal{F}) \in \mathcal{S} \qquad \mathcal{A} = \mathcal{A}_c \cup \mathcal{A}_f

    r(s, a) = \begin{cases} -\lambda c(a) & \text{if } a \in \mathcal{A}_f \\ -\ell(a, y) & \text{if } a \in \mathcal{A}_c \end{cases} \qquad
    t(s, a) = \begin{cases} (x, y, \mathcal{F} \cup a) & \text{if } a \in \mathcal{A}_f \\ \mathcal{T} & \text{if } a \in \mathcal{A}_c \end{cases}

We treat each sample as a separate episode. The total reward per episode is

    R(x, y) = -\left[ \ell(y_\theta(x), y) + \lambda z_\theta(x) \right]

so by maximizing the expected reward we are solving the problem objective.
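For concreteness, here is a minimal sketch of the per-sample MDP as an episodic environment (class and method names are ours, not the authors' implementation; a gym-style reset/step interface is assumed):

```python
import numpy as np

class CostlyFeaturesEnv:
    """Per-sample MDP sketch: feature actions yield reward -lambda*c(a),
    classification actions terminate the episode with reward -l(y_hat, y)."""

    def __init__(self, X, Y, costs, lam):
        self.X, self.Y, self.costs, self.lam = X, Y, costs, lam
        self.n_features = X.shape[1]

    def reset(self):
        i = np.random.randint(len(self.X))
        self.x, self.y = self.X[i], self.Y[i]
        self.mask = np.zeros(self.n_features)            # acquired-feature set F
        return self._observe()

    def _observe(self):
        # The agent sees only the acquired features (x-bar = x * mask) and the mask m.
        return np.concatenate([self.x * self.mask, self.mask])

    def step(self, action):
        if action < self.n_features:                     # a in A_f: acquire a feature
            self.mask[action] = 1.0
            return self._observe(), -self.lam * self.costs[action], False
        y_hat = action - self.n_features                 # a in A_c: classify and terminate
        return self._observe(), (0.0 if y_hat == self.y else -1.0), True
```

With this convention the undiscounted return of an episode equals R(x, y) above, so a standard Q-learning agent that maximizes return also optimizes the classification objective.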

Model

We use Q-learning with function approximation in various implementations: linear approximation, approximation with neural networks (DQN), and with recent deep RL techniques (double Q-learning, dueling architecture and Retrace).
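As one hedged illustration of the double Q-learning part (dueling architecture and Retrace are omitted; function and argument names are ours, not taken from the authors' code), a Double DQN bootstrap target can be computed as:

```python
import torch

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=1.0):
    """Double DQN target: the online network selects the next action, the target
    network evaluates it. Terminal (classification) steps bootstrap nothing."""
    best_actions = q_online_next.argmax(dim=1, keepdim=True)        # argmax_a Q_online(s', a)
    next_values = q_target_next.gather(1, best_actions).squeeze(1)  # Q_target(s', argmax)
    return rewards + gamma * (1.0 - dones) * next_values
```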

[Diagram: a neural network takes the masked features \bar{x} and the mask m as input and outputs Q-values for all actions.]
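The poster does not specify the exact architecture, so the following is only a sketch of one plausible network: a fully connected model that takes the masked feature vector \bar{x} and the mask m and outputs one Q-value per feature-acquisition action and one per class (layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps (masked features x-bar, mask m) to Q-values for all actions A_f ∪ A_c."""

    def __init__(self, n_features, n_classes, hidden=128):
        super().__init__()
        n_actions = n_features + n_classes
        self.net = nn.Sequential(
            nn.Linear(2 * n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x_masked, mask):
        return self.net(torch.cat([x_masked, mask], dim=-1))
```

Already-acquired features can be excluded at action-selection time by setting their Q-values to a large negative number before taking the argmax.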

Extensions

Pretraining: Since the Q-values for A_c actions are terminal, this part of the model can be pretrained in a supervised way (see the sketch after this list).

High-Performance Classifier: We can include another classifier as a terminal action in our model.

Deep RL extensions: Any new techniques from deep RL can be used directly. We implemented Double DQN with a dueling architecture and Retrace importance sampling.

Other possibilities: cost-sensitive classification, more HPCs, feature grouping, a hard budget, etc.
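Because classification actions end the episode, their Q-values equal the immediate reward -\ell(\hat{y}, y), so the corresponding network outputs can be regressed towards this target before RL training starts. A minimal sketch, assuming the illustrative QNetwork above and random subsets of acquired features (our simplification, not necessarily the authors' exact pretraining procedure):

```python
import torch
import torch.nn.functional as F

def pretrain_classification_head(qnet, X, Y, n_features, lr=1e-3):
    """Regress the Q-values of classification actions towards their terminal
    reward: 0 for the correct class, -1 for every other class."""
    opt = torch.optim.Adam(qnet.parameters(), lr=lr)
    for x, y in zip(X, Y):
        # Simulate a partially observed sample with a random subset of features.
        mask = (torch.rand(n_features) < torch.rand(1)).float()
        x_masked = torch.as_tensor(x, dtype=torch.float32) * mask
        q_class = qnet(x_masked, mask)[n_features:]    # Q-values of A_c actions
        target = torch.full_like(q_class, -1.0)
        target[int(y)] = 0.0                           # reward of the correct class
        loss = F.mse_loss(q_class, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```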

Analysis of behaviour

We analyzed the behaviour of the agent on the miniboone dataset, plotting histograms of the number of used features for different settings.

[Histograms of the percentage of samples vs. number of used features: (a) λ = 0.003, (b) λ = 0.0003, (c) with HPC, λ = 0.0003 (legend: HPC used).]

Acknowledgments: This research was supported by the European Office of Aerospace Research and Development (grant no. FA9550-18-1-7008) and by the Czech Science Foundation (grants no. 18-21409S and 18-27483Y). The GPU used for this research was donated by the NVIDIA Corporation. Computational resources were provided by the CESNET LM2015042 and the CERIT Scientific Cloud LM2015085, provided under the program Projects of Large Research, Development, and Innovations Infrastructures.

Results

We compare to AdaptGbrt [2], BudgetPrune [3], Q-learning with linear approximation [1], a plain DQN implementation, and baseline neural network and SVM classifiers using all features.

[Plots of accuracy vs. used features or cost for RL, RL-dqn, RL-lin, BudgetPrune, AdaptGbrt, plain NN and SVM on: (a) miniboone, (b) forest-2, (c) cifar, (d) mnist, (e) wine, (f) yeast, (g) forest, (h) cifar-2.]

Figure 1. Comparison with prior-art algorithms.

Dataset     feats.  cls.  #trn   #val  #tst   costs
mnist       784     10    50k    10k   10k    U
cifar       400     10    40k    10k   10k    U
cifar-2     400     2     40k    10k   10k    U
forest      54      7     200k   81k   300k   U
forest-2    54      2     200k   81k   300k   U
miniboone   50      2     45k    19k   65k    U
wine        13      3     70     30    78     V
yeast       8       10    600    200   684    V

Table 2. Used datasets. The cost is either uniform (U) or variable (V).

Using HPC and pretraining

We studied the effect of the HPC and of pretraining and noted that using the HPC is not always helpful (see Figure 2c). However, pretraining consistently bootstraps the training so that it converges more quickly.

[Plots of accuracy vs. used features for (a) miniboone and (b) forest (RL, RL-dqn, RL-nohpc, RL-nopretrain, plain NN, SVM) and (c) forest-2 with a fake HPC (RL, RL-nohpc, RL-fakehpc, plain NN, SVM), plus reward vs. training steps for RL, RL-dqn, RL-nohpc and RL-nopretrain.]

Figure 2. Do HPC and pretraining bring any value?

References

[1] Gabriel Dulac-Arnold, Ludovic Denoyer, Philippe Preux, and Patrick Gallinari. Datum-wise classification: a sequential approach to sparsity. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 375–390. Springer, 2011.

[2] Feng Nan and Venkatesh Saligrama. Adaptive classification for prediction under a budget. In Advances in Neural Information Processing Systems, pages 4730–4740, 2017.

[3] Feng Nan, Joseph Wang, and Venkatesh Saligrama. Pruning random forests for prediction on a budget. In Advances in Neural Information Processing Systems, pages 2334–2342, 2016.