Classification with Costly Features using Deep Reinforcement Learning
Jaromír Janisch • Tomáš Pevný • Viliam Lisý
Artificial Intelligence Center, Dept. of Computer Science, FEE, CTU in Prague
Abstract
TL;DR: With approximated Q-learning, we train a classifier that maximizes accuracy while minimizing the required information.
We study a classification problem where each feature can be acquired for a cost, and the goal is to optimize a trade-off between the expected classification error and the feature cost. We revisit a former approach that framed the problem as a sequential decision-making problem and solved it by Q-learning with a linear approximation, where individual actions either request feature values or terminate the episode by providing a classification decision. On a set of eight problems, we demonstrate that by replacing the linear approximation with neural networks, the approach becomes comparable to state-of-the-art algorithms developed specifically for this problem. The approach is flexible: it can be improved with any new reinforcement learning enhancement, it allows the inclusion of a pretrained high-performance classifier, and, unlike prior art, its performance is robust across all evaluated datasets.
Costly Features
Features can only be retrieved after paying some cost, which can be in the form of money, time, or other resources.
Feature group                                      Cost
Common information (weight, age, …)                $0
Simple examinations (heart rate, blood pressure)   $5
Blood screening                                    $20
Radiograph                                         $50
Magnetic resonance                                 $200

Table 1. Example features and their costs from the medical domain.
The goal is to find, for any given sample, an optimal solution that maximizes accuracy and minimizes cost.
Sequential Decision-Making Problem
One sample at a time, we sequentially acquire features until enough information is gathered and a classification is performed.
Problem definition
The goal is to maximize expected accuracy while minimizing cost. The model is a pair of functions: $y_\theta(x) \to \text{class}$ and $z_\theta(x) \to \text{cost}$.

Objective:

$$\min_\theta \; \mathbb{E}_{(x,y) \in \mathcal{D}} \left[ \ell(y_\theta(x), y) + \lambda z_\theta(x) \right]$$
We use a binary loss and a fixed cost vector $c$, with $f_\theta(x)$ denoting the binary indicator vector of the acquired features:

$$\ell(\hat{y}, y) = \begin{cases} 0 & \text{if } \hat{y} = y \\ 1 & \text{if } \hat{y} \neq y \end{cases} \qquad z_\theta(x) = c \cdot f_\theta(x)$$
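As a concrete illustration, the per-sample objective can be evaluated directly from a prediction and the acquisition mask. A minimal sketch (the names `binary_loss` and `objective` and the toy numbers are ours, not from the paper):

```python
import numpy as np

def binary_loss(y_pred, y_true):
    # 0-1 loss: 0 for a correct prediction, 1 otherwise
    return 0.0 if y_pred == y_true else 1.0

def objective(y_pred, y_true, acquired_mask, costs, lam):
    # Per-sample value of l(y_theta(x), y) + lambda * z_theta(x),
    # where z_theta(x) = c . f_theta(x) is the cost of acquired features
    feature_cost = float(np.dot(costs, acquired_mask))
    return binary_loss(y_pred, y_true) + lam * feature_cost

# Example: correct prediction, 3 of 5 unit-cost features acquired
costs = np.ones(5)
mask = np.array([1, 0, 1, 1, 0], dtype=float)
print(objective(y_pred=2, y_true=2, acquired_mask=mask, costs=costs, lam=0.01))
# -> 0.03
```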
Effect of different λ
The parameter λ controls the weight of the feature costs in the objective: a low λ makes features relatively cheap, so the model acquires more of them; a high λ makes them expensive, so it acquires fewer.
[Plot: accuracy (0.75 to 0.95) vs. cost (0 to 30) trade-off curves for λ = 0.0001, 0.001, 0.01, 0.1, and 1.0.]
Markov Decision Process
[Diagram: an episode unrolls as a tree over the sample x; feature-acquiring actions a_f yield reward r = -λc_f and reveal feature values, while classification actions a_y terminate the episode with reward r = -ℓ(ŷ, y).]
The problem is modeled as an MDP with full information; the agent, however, has only partial visibility of the states.
$$s = (x, y, \mathcal{F}) \in \mathcal{S} \qquad \mathcal{A} = \mathcal{A}_c \cup \mathcal{A}_f$$

$$r(s, a) = \begin{cases} -\lambda c(a) & \text{if } a \in \mathcal{A}_f \\ -\ell(a, y) & \text{if } a \in \mathcal{A}_c \end{cases} \qquad t(s, a) = \begin{cases} (x, y, \mathcal{F} \cup \{a\}) & \text{if } a \in \mathcal{A}_f \\ \mathcal{T} & \text{if } a \in \mathcal{A}_c \end{cases}$$
We treat each sample as a separate episode. The total reward per episode is

$$R(x, y) = -\left[ \ell(y_\theta(x), y) + \lambda z_\theta(x) \right]$$

so, by maximizing the expected reward, we are solving the problem objective.
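To make the MDP above concrete, here is a minimal per-sample environment sketch under these definitions (the class name `CwcfEnv` and its structure are our own illustration, not the authors' code):

```python
import numpy as np

class CwcfEnv:
    # One episode = one labeled sample (x, y). Feature actions reveal a
    # value for reward -lambda*c(a); class actions terminate with -loss.
    def __init__(self, x, y, costs, lam):
        self.x, self.y = x, y
        self.costs, self.lam = costs, lam
        self.mask = np.zeros_like(x)              # encodes the set F

    def observation(self):
        # The agent never sees y; unacquired features are zeroed out
        return np.concatenate([self.x * self.mask, self.mask])

    def step(self, action):
        n = len(self.x)
        if action < n:                            # a in A_f: buy a feature
            self.mask[action] = 1.0
            return self.observation(), -self.lam * self.costs[action], False
        y_pred = action - n                       # a in A_c: classify
        reward = -(0.0 if y_pred == self.y else 1.0)
        return None, reward, True                 # terminal state T
```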
Model
We use Q-learning with function approximation in several implementations: linear approximation, approximation with neural networks (DQN), and with recent deep RL techniques (double Q-learning, dueling architecture, and Retrace).
[Diagram: a neural network maps the input (features x and mask m) to Q-values, one per action.]
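A minimal PyTorch sketch of such a network (the hidden sizes and two-layer shape are our assumption; the key point is that the input concatenates the masked feature vector with the mask, and the output has one Q-value per action):

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, n_features, n_classes, hidden=128):
        super().__init__()
        n_actions = n_features + n_classes        # |A_f| + |A_c|
        self.net = nn.Sequential(
            nn.Linear(2 * n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x, mask):
        # The mask distinguishes "missing" zeros from observed zeros
        return self.net(torch.cat([x * mask, mask], dim=-1))
```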
Extensions
Pretraining: Since the Q-values for $\mathcal{A}_c$ actions are terminal (there is no bootstrapped future term), this part of the model can be pretrained in a supervised way.
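Concretely, because a classification action ends the episode, $Q(s, a)$ for $a \in \mathcal{A}_c$ equals its immediate reward $-\ell(a, y)$, so these outputs can be regressed directly from labeled data under random feature masks. A hedged sketch (the training loop and masking scheme are our own, reusing the hypothetical `QNet` above):

```python
import torch
import torch.nn.functional as F

def pretrain_step(qnet, optimizer, x, y, n_classes):
    # Random per-sample masks simulate partially acquired feature sets
    mask = (torch.rand_like(x) < torch.rand(x.size(0), 1)).float()
    q = qnet(x, mask)                      # (batch, n_features + n_classes)
    q_class = q[:, -n_classes:]            # Q-values of the A_c actions
    # Terminal actions have no future term: the target is the reward
    # itself, 0 for the correct class and -1 for every other class
    target = torch.full_like(q_class, -1.0)
    target.scatter_(1, y.unsqueeze(1), 0.0)
    loss = F.mse_loss(q_class, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```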
High-Performance Classifier (HPC): We can include another, separately trained classifier as an additional terminal action of our model.
Deep RL extensions: Any new techniques from deep RL can be used directly. We implemented Double DQN with a dueling architecture and Retrace importance sampling.
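For reference, the double Q-learning target replaces the vanilla max with an argmax taken from the online network but evaluated by the target network; a standard formulation, sketched against the hypothetical `QNet` interface above:

```python
import torch

def double_dqn_target(reward, done, next_x, next_mask,
                      online_net, target_net, gamma=1.0):
    # y = r                             if the episode ended
    # y = r + gamma * Q_target(s', a*)  otherwise,
    #     where a* = argmax_a Q_online(s', a)
    with torch.no_grad():
        a_star = online_net(next_x, next_mask).argmax(dim=1, keepdim=True)
        q_next = target_net(next_x, next_mask).gather(1, a_star).squeeze(1)
    return reward + gamma * (1.0 - done) * q_next
```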
Other possibilities: Cost-sensitive classification, more HPCs, feature grouping, a hard budget, etc.
Analysis of behaviour
We analyzed the behaviour of the agent on the miniboone dataset, plotting a histogram of the number of used features under different settings.
[Histograms: percentage of samples vs. number of used features (0 to 40+) on miniboone for (a) λ = 0.003, (b) λ = 0.0003, and (c) with HPC, λ = 0.0003, where samples delegated to the HPC are marked.]
Acknowledgments: This research was supported by the European Office of Aerospace Research and Development (grant no. FA9550-18-1-7008) and by the Czech Science Foundation (grants no. 18-21409S and 18-27483Y). The GPU used for this research was donated by the NVIDIA Corporation. Computational resources were provided by the CESNET LM2015042 and the CERIT Scientific Cloud LM2015085, provided under the program Projects of Large Research, Development, and Innovations Infrastructures.
Results
We compare against AdaptGbrt [2], BudgetPrune [3], Q-learning with linear approximation [1], our DQN implementation, and baseline neural network and SVM classifiers trained with all features.
[Plots: accuracy vs. number of used features (or cost) for RL, RL-dqn, RL-lin, BudgetPrune, AdaptGbrt, plain NN, and SVM on (a) miniboone, (b) forest-2, (c) cifar, (d) mnist, (e) wine, (f) yeast, (g) forest, (h) cifar-2.]

Figure 1. Comparison with prior-art algorithms.
Dataset     features  classes  #train  #valid  #test  costs
mnist         784       10      50k     10k     10k     U
cifar         400       10      40k     10k     10k     U
cifar-2       400        2      40k     10k     10k     U
forest         54        7      200k    81k     300k    U
forest-2       54        2      200k    81k     300k    U
miniboone      50        2      45k     19k     65k     U
wine           13        3      70      30      78      V
yeast           8       10      600     200     684     V

Table 2. Used datasets. The cost is either uniform (U) or variable (V).
Using HPC and pretraining
We studied the use of the HPC and pretraining and found that the HPC is not always helpful (see Figure 2c). Pretraining, however, always bootstraps the training so that it converges more quickly.
[Plots: accuracy vs. number of used features for RL, RL-dqn, RL-nohpc, RL-nopretrain, RL-fakehpc, plain NN, and SVM on (a) miniboone, (b) forest, and (c) forest-2 (fake-HPC); plus reward vs. training steps for RL, RL-dqn, RL-nohpc, and RL-nopretrain.]
Figure 2. Do HPC and pretraining bring any value?
References
[1] Gabriel Dulac-Arnold, Ludovic Denoyer, Philippe Preux, and Patrick Gallinari. Datum-wise classification: a sequential approach to sparsity. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 375–390. Springer, 2011.
[2] Feng Nan and Venkatesh Saligrama. Adaptive classification for prediction under a budget. In Advances in Neural Information Processing Systems, pages 4730–4740, 2017.
[3] Feng Nan, Joseph Wang, and Venkatesh Saligrama. Pruning random forests for prediction on a budget. In Advances in Neural Information Processing Systems, pages 2334–2342, 2016.