
George Boulougaris, Kostas Kolomvatsos, Stathes Hadjiefthymiades

Building the Knowledge Base of a Buyer Agent Using Reinforcement Learning Techniques

Pervasive Computing Research Group, Department of Informatics and Telecommunications, University of Athens, Greece

WCCI – IJCNN 2010

Barcelona, Spain

Outline


- Introduction
- Market Members
- Scenario
- Buyer Q-Table
- Buyer Purchase Behavior
- Results

Introduction


Intelligent agents
- Autonomous software components
- Represent users
- Learn from their owners

Electronic markets
- Places where entities that do not know each other in advance can negotiate over the exchange of products

Reinforcement learning
- A general framework for sequential decision making
- Aims at selecting, at every state of the world, the action that leads to the maximum long-term reward

Market Members


- Buyers
- Sellers
- Middle entities (matchmakers, brokers, market entities)

Intelligent agents may represent each of these entities.
Entities do not have any information about the other entities in the market.

Scenario (1/2)


Buyers:
- can interact with sellers
- can interact with brokers or matchmakers (matchmakers cannot sell products)
- want to buy the most appropriate product at the most profitable price

We focus on the interaction between buyers and selling entities (sellers or brokers).
Most research efforts focus only on the reputation of entities.
We use Q-Learning, which is suited to selecting, at every state of the world, the actions that lead to the maximum long-term reward (based on a number of parameters).

Scenario (2/2)


The product parameters for each selling entity are:
- ID
- Time validity
- Price
- Time availability
- Relevance

Each selling entity represents a state in which the buyer can be.
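The parameters above can be held in a simple record per product and selling entity. The sketch below is a minimal illustration assuming Python dataclasses; the class name `ProductOffer`, the field names and types, the `market` layout (a list of per-entity catalogues, one per state) and the helper `entities_selling` are all hypothetical, since the slides only list the parameter names.

```python
from dataclasses import dataclass


# Hypothetical record for one product offer held by a selling entity.
# Only the parameter names come from the slide; types and units are assumptions.
@dataclass
class ProductOffer:
    product_id: int
    time_validity: float      # semantics not specified on the slide
    price: float
    time_availability: float  # semantics not specified on the slide
    relevance: float          # assumed to lie in [0, 1]


# Each selling entity is one state; state s is simply the index into this list.
# Each catalogue maps product_id -> offer.
market = [
    {7: ProductOffer(product_id=7, time_validity=10.0, price=25.0,
                     time_availability=2.0, relevance=0.9)},   # entity/state 0
    {7: ProductOffer(product_id=7, time_validity=5.0, price=20.0,
                     time_availability=4.0, relevance=0.7)},   # entity/state 1
    {},                                                         # entity/state 2 lacks product 7
]


def entities_selling(market, product_id):
    """Return the states (entity indices) whose catalogue contains the product."""
    return [s for s, catalogue in enumerate(market) if product_id in catalogue]


print(entities_selling(market, 7))  # [0, 1]
```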

Buyer Q-Table (1/3)


- The buyer keeps one Q-Table for each product
- Rows represent states and columns represent actions
- There are M+1 columns (M is the number of selling entities)
- Actions 1..M represent the transition to entity 1..M (the corresponding row of the Q-Table)
- A transition to another entity corresponds to a ‘not-buy-from-this-entity’ action
- Action M+1 represents the purchase action (buying from the current entity)
- The buyer's final Q-Table is therefore a 3D table (one M × (M+1) table per product)
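A minimal sketch of this layout, using nested Python lists and 0-based indices (slide actions 1..M become columns 0..M−1 and action M+1 becomes column M). The helper names `make_q_tables` and `buy_column` are hypothetical, and initialising every entry to 0.0 is our assumption.

```python
def make_q_tables(num_products, num_entities, init=0.0):
    """Hypothetical 3D Q-Table indexed as q[product][state][action].

    For each product: M rows (one state per selling entity) and M+1 columns.
    Columns 0..M-1 (slide actions 1..M) mean "move to that entity",
    i.e. do not buy from the current one; column M (slide action M+1) means buy here.
    """
    m = num_entities
    # A fresh list is built for every row, so rows never share storage.
    return [[[init] * (m + 1) for _ in range(m)] for _ in range(num_products)]


def buy_column(num_entities):
    """Index of the purchase action (slide action M+1) with 0-based columns."""
    return num_entities


q = make_q_tables(num_products=5, num_entities=4)
print(len(q), len(q[0]), len(q[0][0]))  # 5 4 5
```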

Buyer Q-Table (2/3)


The buyer takes the following information into consideration when building the Q-Table:
- Relevancy factor
- Price
- Response time
- Number of transitions

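The update rule used is:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + l \left[ r + \gamma \max_{a' \in A} Q(s', a') - Q(s_t, a_t) \right]$$

where $l$ is the learning rate, $r$ is the reward, $\gamma$ is the discount factor for future rewards, $s_t$ and $a_t$ are the state and the action at time $t$, $s'$ is the state reached after taking $a_t$, and $A$ is the set of available actions.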
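The update can be written as one function over a single product's table. This is standard tabular Q-learning; the function name `q_update`, the parameter `lr` (standing for $l$), and the `terminal` flag that drops the future term after a purchase are our assumptions, not details given on the slides.

```python
def q_update(q_table, s, a, r, s_next, lr, gamma, terminal=False):
    """Apply one tabular Q-learning update to a single product's Q-Table.

    Implements Q(s,a) <- Q(s,a) + lr * (r + gamma * max_a' Q(s',a') - Q(s,a)),
    where lr plays the role of the learning rate l.
    terminal=True (an assumption for the purchase action) drops the future
    term, since no further decision follows a completed purchase.
    """
    future = 0.0 if terminal else gamma * max(q_table[s_next])
    q_table[s][a] += lr * (r + future - q_table[s][a])
    return q_table[s][a]


# Two entities; columns: 0 = go to entity 0, 1 = go to entity 1, 2 = buy here.
table = [[0.0, 0.0, 0.0],
         [0.0, 0.5, 1.0]]
print(q_update(table, s=0, a=1, r=0.25, s_next=1, lr=0.5, gamma=0.5))  # 0.375
```

Here the target is 0.25 + 0.5 × 1.0 = 0.75, and half of the gap from 0.0 is added, giving 0.375.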

Buyer Q-Table (3/3)


Issues concerning the reward:
- It is decremented by 5% when the buyer deals with entities that do not have the product
- It is based on:
  - the reward for the relevancy
  - the reward for the price
  - the reward for the response time
  - the reward for the required transitions

The greater the relevancy, the greater the reward.
The smaller the price, the greater the reward.
The smaller the response time, the greater the reward.
The smaller the number of transitions, the greater the reward.
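The slides state which way each component pushes the reward but not how the components are combined. The sketch below assumes a weighted sum of components normalised to [0, 1] and applies the 5% decrement multiplicatively; the function name `reward`, the equal weights, the `max_*` normalisers and the multiplicative reading of the decrement are all hypothetical choices.

```python
def reward(relevance, price, response_time, transitions, entity_has_product,
           max_price, max_response_time, max_transitions,
           weights=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical combined reward for one step of the buyer.

    Each component is mapped to [0, 1] so that it grows in the direction
    stated on the slide; weights and linear normalisation are assumptions.
    """
    w_rel, w_price, w_time, w_trans = weights
    r_rel = relevance                                  # higher relevancy -> higher reward (assumed in [0, 1])
    r_price = 1.0 - price / max_price                  # lower price -> higher reward
    r_time = 1.0 - response_time / max_response_time  # shorter response time -> higher reward
    r_trans = 1.0 - transitions / max_transitions      # fewer transitions -> higher reward
    total = w_rel * r_rel + w_price * r_price + w_time * r_time + w_trans * r_trans
    if not entity_has_product:
        total *= 0.95  # 5% decrement, read here as multiplicative (assumption)
    return total


print(reward(0.8, 20.0, 2.0, 1, True, 100.0, 10.0, 10))  # approximately 0.825
```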

Buyer Purchase Behavior


The buyer relies on the Q-Table for the purchase action.
Its behavior has two phases:

First phase
- It creates the Q-Table
- It uses a specific number of episodes in the training phase

Second phase
- It uses the Q-Table for its purchases
- It first selects an entity (row) at random for a specific product
- It then selects the action with the highest expected reward (Q-value)
- If the best action is to return to a previously visited entity that cannot deliver the product, the purchase is not feasible
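The second phase can be sketched as a greedy walk over one product's learned table. The function `purchase`, the `can_deliver` list (assuming the buyer learns on arrival whether an entity can deliver), counting the purchase itself as a move, and the cycle guard of 2(M+1) moves are hypothetical additions for illustration.

```python
import random


def purchase(q_table, can_deliver, start=None, rng=None):
    """Hypothetical second-phase walk over one product's learned Q-Table.

    q_table[s][a]: rows are entities (states); columns 0..M-1 move to
    entity a, column M buys from the current entity.
    can_deliver[s]: whether entity s can deliver the product (assumed known).
    Returns (entity bought from, or None if infeasible, number of moves).
    """
    m = len(q_table)
    buy = m                                   # purchase column (slide action M+1)
    if rng is None:
        rng = random.Random()
    state = rng.randrange(m) if start is None else start   # random initial entity (row)
    visited = {state}
    moves = 0
    while moves < 2 * (m + 1):                # guard against endless cycling (our addition)
        row = q_table[state]
        action = max(range(m + 1), key=row.__getitem__)    # greedy: highest Q-value
        moves += 1
        if action == buy:
            return (state if can_deliver[state] else None), moves
        if action in visited and not can_deliver[action]:
            return None, moves                # best move returns to a visited entity that cannot deliver
        visited.add(action)
        state = action
    return None, moves


example_q = [[0.0, 0.9, 0.1, 0.2],   # entity 0: best action is "go to entity 1"
             [0.1, 0.0, 0.3, 1.0],   # entity 1: best action is "buy here"
             [0.0, 0.5, 0.0, 0.1]]   # entity 2: best action is "go to entity 1"
print(purchase(example_q, [True, True, False], start=0))  # (1, 2)
```

Starting from a random row, as on the slide, is obtained by leaving `start` unset and passing a seeded `random.Random` for reproducibility.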

Results (1/4)


We consider a dynamic market where the number and the characteristics of entities are not static.

In our experiments we use the following probabilities:
- 2% that a new product becomes available in an entity
- 5% that a product is totally new in the market
- 5% that a product is no longer available in an entity
- 2% that an entity is totally new in the market
- 1% that an entity is no longer available for negotiations

We examine the purchases of 400 products in each experiment.
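A sketch of how these market changes could be sampled. The slides give the probabilities but not when they are applied; here we assume each event is drawn independently once per purchase step. `EVENT_PROBS`, `sample_events` and the seed are hypothetical.

```python
import random
from collections import Counter

# Probabilities from the experiment setup; drawing each one independently
# once per purchase step is an assumption, not stated on the slides.
EVENT_PROBS = {
    "new product available in an entity": 0.02,
    "totally new product in the market": 0.05,
    "product no longer available in an entity": 0.05,
    "totally new entity in the market": 0.02,
    "entity no longer available for negotiations": 0.01,
}


def sample_events(rng, probs=EVENT_PROBS):
    """Return the market changes that occur in one step."""
    return [event for event, p in probs.items() if rng.random() < p]


rng = random.Random(42)
counts = Counter()
for _ in range(400):          # one step per purchase, 400 purchases per experiment
    counts.update(sample_events(rng))
print(counts)
```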

Results (2/4)


Table creation time results

Entities (5 products each) | First table creation time (ms) | Average table creation time, excluding the first table (ms)
---|---|---
4 | 15 | 0
15 | 125 | 17.86
50 | 1685 | 402.73
100 | 16520 | 3546.44
200 | 208088 | 41846.64

Entities (40 products each) | First table creation time (ms) | Average table creation time, excluding the first table (ms)
---|---|---
6 | 156 | 15.50
15 | 561 | 114.33
30 | 3510 | 453.73
60 | 32667 | 2386.50
100 | 191303 | 8254.22

Products (15 entities) | First table creation time (ms) | Average table creation time, excluding the first table (ms)
---|---|---
5 | 125 | 17.86
40 | 561 | 114.33
80 | 1029 | 210.60
150 | 1731 | 374.57
500 | 6155 | 1319.14
1000 | 14917 | 2940.86
5000 | 193644 | 14914.00

Results (3/4)


Q-Learning reduces the required purchase steps

Entities (5 products each) | Total moves for 400 products | Total moves for 400 products (without Q-learning) | Moves reduction using Q-learning
---|---|---|---
4 | 653 | 2000 | -67.35%
15 | 716 | 6400 | -88.81%
50 | 714 | 20400 | -96.50%
100 | 732 | 40400 | -98.19%
200 | 768 | 80400 | -99.04%

Entities (40 products each) | Total moves for 400 products | Total moves for 400 products (without Q-learning) | Moves reduction using Q-learning
---|---|---|---
6 | 718 | 2800 | -74.36%
15 | 705 | 6400 | -88.98%
30 | 703 | 12400 | -94.33%
60 | 693 | 24400 | -97.16%
100 | 712 | 40400 | -98.24%
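The reduction column can be checked with simple arithmetic: it equals (moves with Q-learning − moves without) / moves without × 100. In every row of both tables the baseline also equals 400 × (M + 1), i.e. M + 1 moves per purchased product; the code asserts this observation, although the slides do not state how the baseline is produced. The function name `reduction_pct` is hypothetical.

```python
def reduction_pct(moves_with_q, moves_without_q):
    """Relative change (%) in total moves versus the baseline without Q-learning."""
    return 100.0 * (moves_with_q - moves_without_q) / moves_without_q


# (entities M, total moves with Q-learning, total moves without Q-learning)
five_products = [(4, 653, 2000), (15, 716, 6400), (50, 714, 20400),
                 (100, 732, 40400), (200, 768, 80400)]
forty_products = [(6, 718, 2800), (15, 705, 6400), (30, 703, 12400),
                  (60, 693, 24400), (100, 712, 40400)]

for m, with_q, without_q in five_products + forty_products:
    # Observation from the tables: baseline = 400 products x (M + 1) moves.
    assert without_q == 400 * (m + 1)
    print(f"M={m:>3}: {reduction_pct(with_q, without_q):.2f}%")
```

Running it reproduces the percentage column of both tables to two decimal places.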

Results (4/4)


Q-Learning reduces the average price and the average response time as the number of entities increases

Q-Learning does not affect basic parameters as the number of products increases


Thank you

http://p-comp.di.uoa.gr
