
Classification with Costly Features using Deep Reinforcement Learning
Jaromír Janisch • Tomáš Pevný • Viliam Lisý

Artificial Intelligence Center, Dept. of Computer Science, FEE, CTU in Prague

Abstract

TL;DR: With approximated Q-learning, we train a classifier that maximizes accuracy while minimizing required information.

We study a classification problem where each feature can be acquired for a cost and the goal is to optimize a trade-off between the expected classification error and the feature cost. We revisit a former approach that framed the problem as sequential decision-making and solved it by Q-learning with a linear approximation, where individual actions either request feature values or terminate the episode by providing a classification decision. On a set of eight problems, we demonstrate that by replacing the linear approximation with neural networks the approach becomes comparable to state-of-the-art algorithms developed specifically for this problem. The approach is flexible: it can be improved with any new reinforcement learning enhancement, it allows inclusion of a pre-trained high-performance classifier, and, unlike prior art, its performance is robust across all evaluated datasets.

Costly Features

Features can only be retrieved after paying some cost, which can be in the form of money, time, or other resources.

Feature group                                      Cost
Common information (weight, age, …)                $0
Simple examinations (heart rate, blood pressure)   $5
Blood screening                                    $20
Radiograph                                         $50
Magnetic resonance                                 $200

Table 1. Example features and their costs from the medical domain.

The goal is to find an optimal solution for any given sample that maximizes accuracy and minimizes cost.

Sequential Decision-Making Problem

One sample at a time, we sequentially acquire features until enough information is gathered and a classification is performed.

Problem definition

The goal is to maximize expected accuracy while minimizing cost. The model is a pair of functions: y_\theta(x) \to \text{class}, z_\theta(x) \to \text{cost}.

Objective:

    \min_\theta \; \mathbb{E}_{(x,y) \in \mathcal{D}} \left[ \ell(y_\theta(x), y) + \lambda z_\theta(x) \right]

We use a binary loss \ell and a fixed cost vector c:

    \ell(\hat{y}, y) = \begin{cases} 0 & \text{if } \hat{y} = y \\ 1 & \text{if } \hat{y} \neq y \end{cases} \qquad\qquad z_\theta(x) = c \cdot f_\theta(x)

where f_\theta(x) is the 0/1 indicator vector of the features acquired for sample x.
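To make the objective concrete, the following is a minimal sketch (illustrative names, plain NumPy; not the authors' code) that evaluates \ell(y_\theta(x), y) + \lambda z_\theta(x) for a single sample, given the fixed cost vector c and a 0/1 mask of acquired features:

```python
import numpy as np

def per_sample_objective(y_pred, y_true, acquired_mask, costs, lam):
    """Binary loss plus lambda-weighted feature cost for one sample.

    y_pred        -- predicted class label y_hat
    y_true        -- ground-truth class label y
    acquired_mask -- 0/1 vector f_theta(x): 1 where a feature was acquired
    costs         -- fixed per-feature cost vector c
    lam           -- trade-off parameter lambda
    """
    loss = 0.0 if y_pred == y_true else 1.0        # binary loss l(y_hat, y)
    z = float(np.dot(costs, acquired_mask))        # z_theta(x) = c . f_theta(x)
    return loss + lam * z

# Example: a correct prediction that acquired features 0 and 2 (costs from Table 1).
costs = np.array([0.0, 5.0, 20.0, 50.0, 200.0])
mask = np.array([1, 0, 1, 0, 0])
print(per_sample_objective(1, 1, mask, costs, lam=0.01))   # -> 0.2
```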

Effect of different λ

The λ controls the weight of the feature costs in the objective. A low λ makes features effectively cheap; a high λ makes them expensive.

[Plot: accuracy vs. cost for λ = 0.0001, 0.001, 0.01, 0.1 and 1.0.]

Markov Decision Process

[Diagram: an episode starts from the masked sample \bar{x}; feature-acquisition actions a_{f_i} give reward r = -λc_{f_i} and reveal the feature, while a classification action a_{y_j} terminates the episode with reward r = -\ell(\hat{y}, y).]

The problem is modeled as an MDP with full information. The agent has only partial visibility of the states.

    s = (x, y, \mathcal{F}) \in \mathcal{S} \qquad \mathcal{A} = \mathcal{A}_c \cup \mathcal{A}_f

    r(s, a) = \begin{cases} -\lambda c(a) & \text{if } a \in \mathcal{A}_f \\ -\ell(a, y) & \text{if } a \in \mathcal{A}_c \end{cases} \qquad
    t(s, a) = \begin{cases} (x, y, \mathcal{F} \cup a) & \text{if } a \in \mathcal{A}_f \\ \mathcal{T} & \text{if } a \in \mathcal{A}_c \end{cases}

We treat each sample as a separate episode. The total reward per episode is

    R(x, y) = -\left[ \ell(y_\theta(x), y) + \lambda z_\theta(x) \right]

so by maximizing the expected reward we are solving the problem objective.
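For concreteness, here is a minimal sketch of the per-sample MDP as an episodic environment (class and method names are ours, not the authors' implementation; a gym-style reset/step interface is assumed):

```python
import numpy as np

class CostlyFeaturesEnv:
    """Per-sample MDP sketch: feature actions yield reward -lambda*c(a),
    classification actions terminate the episode with reward -l(y_hat, y)."""

    def __init__(self, X, Y, costs, lam):
        self.X, self.Y, self.costs, self.lam = X, Y, costs, lam
        self.n_features = X.shape[1]

    def reset(self):
        i = np.random.randint(len(self.X))
        self.x, self.y = self.X[i], self.Y[i]
        self.mask = np.zeros(self.n_features)            # acquired-feature set F
        return self._observe()

    def _observe(self):
        # The agent sees only the acquired features (x-bar = x * mask) and the mask m.
        return np.concatenate([self.x * self.mask, self.mask])

    def step(self, action):
        if action < self.n_features:                     # a in A_f: acquire a feature
            self.mask[action] = 1.0
            return self._observe(), -self.lam * self.costs[action], False
        y_hat = action - self.n_features                 # a in A_c: classify and terminate
        return self._observe(), (0.0 if y_hat == self.y else -1.0), True
```

With this convention the undiscounted return of an episode equals R(x, y) above, so a standard Q-learning agent that maximizes return also optimizes the classification objective.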

Model

We use Q-learning with function approximation in various implementations: linear approximation, approximation with neural networks (DQN), and with recent deep RL techniques (double Q-learning, dueling architecture and Retrace).
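As one hedged illustration of the double Q-learning part (dueling architecture and Retrace are omitted; function and argument names are ours, not taken from the authors' code), a Double DQN bootstrap target can be computed as:

```python
import torch

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=1.0):
    """Double DQN target: the online network selects the next action, the target
    network evaluates it. Terminal (classification) steps bootstrap nothing."""
    best_actions = q_online_next.argmax(dim=1, keepdim=True)        # argmax_a Q_online(s', a)
    next_values = q_target_next.gather(1, best_actions).squeeze(1)  # Q_target(s', argmax)
    return rewards + gamma * (1.0 - dones) * next_values
```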

[Diagram: a neural network takes the masked features \bar{x} and the mask m as input and outputs Q-values for all actions.]
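The poster does not specify the exact architecture, so the following is only a sketch of one plausible network: a fully connected model that takes the masked feature vector \bar{x} and the mask m and outputs one Q-value per feature-acquisition action and one per class (layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps (masked features x-bar, mask m) to Q-values for all actions A_f ∪ A_c."""

    def __init__(self, n_features, n_classes, hidden=128):
        super().__init__()
        n_actions = n_features + n_classes
        self.net = nn.Sequential(
            nn.Linear(2 * n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x_masked, mask):
        return self.net(torch.cat([x_masked, mask], dim=-1))
```

Already-acquired features can be excluded at action-selection time by setting their Q-values to a large negative number before taking the argmax.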

Extensions

Pretraining: Since the Q-values for A_c actions are terminal, this part of the model can be pretrained in a supervised way (see the sketch after this list).

High-Performance Classifier: We can include another classifier as a terminal action in our model.

Deep RL extensions: Any new techniques from deep RL can be used directly. We implemented Double DQN with a dueling architecture and Retrace importance sampling.

Other possibilities: cost-sensitive classification, more HPCs, feature grouping, a hard budget, etc.
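Because classification actions end the episode, their Q-values equal the immediate reward -\ell(\hat{y}, y), so the corresponding network outputs can be regressed towards this target before RL training starts. A minimal sketch, assuming the illustrative QNetwork above and random subsets of acquired features (our simplification, not necessarily the authors' exact pretraining procedure):

```python
import torch
import torch.nn.functional as F

def pretrain_classification_head(qnet, X, Y, n_features, lr=1e-3):
    """Regress the Q-values of classification actions towards their terminal
    reward: 0 for the correct class, -1 for every other class."""
    opt = torch.optim.Adam(qnet.parameters(), lr=lr)
    for x, y in zip(X, Y):
        # Simulate a partially observed sample with a random subset of features.
        mask = (torch.rand(n_features) < torch.rand(1)).float()
        x_masked = torch.as_tensor(x, dtype=torch.float32) * mask
        q_class = qnet(x_masked, mask)[n_features:]    # Q-values of A_c actions
        target = torch.full_like(q_class, -1.0)
        target[int(y)] = 0.0                           # reward of the correct class
        loss = F.mse_loss(q_class, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```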

Analysis of behaviour

We analyzed the behaviour of the agent on the miniboone dataset, plotting histograms of the number of used features for different settings.

[Histograms of the percentage of samples vs. number of used features: (a) λ = 0.003, (b) λ = 0.0003, (c) with HPC, λ = 0.0003 (legend: HPC used).]

Acknowledgments: This research was supported by the European Office of Aerospace Research and Development (grant no. FA9550-18-1-7008) and by the Czech Science Foundation (grants no. 18-21409S and 18-27483Y). The GPU used for this research was donated by the NVIDIA Corporation. Computational resources were provided by the CESNET LM2015042 and the CERIT Scientific Cloud LM2015085, provided under the program Projects of Large Research, Development, and Innovations Infrastructures.

Results

We compare to AdaptGbrt [2], BudgetPrune [3], Q-learning with linear approximation [1], a plain DQN implementation, and baseline neural network and SVM classifiers using all features.

[Plots of accuracy vs. used features or cost for RL, RL-dqn, RL-lin, BudgetPrune, AdaptGbrt, plain NN and SVM on: (a) miniboone, (b) forest-2, (c) cifar, (d) mnist, (e) wine, (f) yeast, (g) forest, (h) cifar-2.]

Figure 1. Comparison with prior-art algorithms.

Dataset     feats.  cls.  #trn   #val  #tst   costs
mnist       784     10    50k    10k   10k    U
cifar       400     10    40k    10k   10k    U
cifar-2     400     2     40k    10k   10k    U
forest      54      7     200k   81k   300k   U
forest-2    54      2     200k   81k   300k   U
miniboone   50      2     45k    19k   65k    U
wine        13      3     70     30    78     V
yeast       8       10    600    200   684    V

Table 2. Used datasets. The cost is either uniform (U) or variable (V).

Using HPC and pretraining

We studied the effect of the HPC and of pretraining and noted that using the HPC is not always helpful (see Figure 2c). However, pretraining consistently bootstraps the training so that it converges more quickly.

[Plots of accuracy vs. used features for (a) miniboone and (b) forest (RL, RL-dqn, RL-nohpc, RL-nopretrain, plain NN, SVM) and (c) forest-2 with a fake HPC (RL, RL-nohpc, RL-fakehpc, plain NN, SVM), plus reward vs. training steps for RL, RL-dqn, RL-nohpc and RL-nopretrain.]

Figure 2. Do HPC and pretraining bring any value?

References

[1] Gabriel Dulac-Arnold, Ludovic Denoyer, Philippe Preux, and Patrick Gallinari. Datum-wise classification: a sequential approach to sparsity. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 375–390. Springer, 2011.

[2] Feng Nan and Venkatesh Saligrama. Adaptive classification for prediction under a budget. In Advances in Neural Information Processing Systems, pages 4730–4740, 2017.

[3] Feng Nan, Joseph Wang, and Venkatesh Saligrama. Pruning random forests for prediction on a budget. In Advances in Neural Information Processing Systems, pages 2334–2342, 2016.