Modeling the Process of Collaboration and Negotiation with
Incomplete Information
Katia Sycara, Praveen Paruchuri, Nilanjan Chakraborty
Collaborators: Roie Zivan, Laurie Weingart, Geoff Gordon, Miro Dudik
MURI 14 Program Review-- September 10, 2009
[Diagram: project overview]
• Theory Formation
• Identify Cultural Factors: CUNY, Georgetown, CMU
• Computational Models: CMU, USC
• Virtual Humans: USC
• Implementation: CMU
• Surveys & Interviews: CUNY, CMU, U Mich, Georgetown
• Cross-Cultural Interactions: U Pitt, CMU
• Data Analysis: CUNY, Georgetown, U Pitt, CMU
• Research Products: Validated Theories, Models, Modeling Tools, Briefing Materials, Scenarios, Training Simulations
• Arrows labeled "validation"
• Legend: Common task, Subgroup task
Problem
• Computational model of reasoning in Cooperation and Negotiation (C&N)
• Capture the rich process of C&N
  – Not just outcome
  – Not just offer-counteroffer but additional communications
• Account for cultural, social factors
• Rewards of other agents not known
• Uncertain and dynamic environment
Contributions
• Created an initial model from real human data. The model:
  – Is applicable in a uniform way to both collaboration and negotiation
  – Derives sequences of actions for an agent from real transcripts, as opposed to state-of-the-art work where action selection is constructed heuristically
  – Adapts its beliefs during the course of the interaction
  – Learns elements of the negotiation (e.g. other party type) as the interaction proceeds
  – Produces optimal activity sequences considering also the other agents
  – Has only incomplete information about others
POMDP: Partially Observable Markov Decision Process
• Agent has initial beliefs
• Agent takes an action
• Gets an observation
• Interprets the observation
• Updates beliefs
• Decides on an action
• Repeats
Agent takes optimal action considering world/other agents
Elements: {States, Actions, Transitions, Rewards, Observations}
[Diagram: the Agent sends an Action to the World (other agents) and receives an Observation back.]
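The "updates beliefs" step of the loop above can be sketched in code. This is a minimal sketch assuming the standard discrete Bayes-filter form of a POMDP belief update; the function name `belief_update`, the tables `T` and `O`, and all numbers are illustrative assumptions, not values from the slides.

```python
def belief_update(belief, action, observation, T, O):
    """One belief-update step.

    New belief b'(s') is proportional to
    O[s'][action][observation] * sum over s of T[s][action][s'] * b(s).
    """
    states = list(belief)
    new_belief = {}
    for s_next in states:
        predicted = sum(T[s][action][s_next] * belief[s] for s in states)
        new_belief[s_next] = O[s_next][action][observation] * predicted
    total = sum(new_belief.values())
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in new_belief.items()}


# Hypothetical toy model: the hidden state is the other agent's type,
# which is assumed not to change during the interaction.
STATES = ["coop", "ncoop"]
T = {s: {"concede": {s2: 1.0 if s2 == s else 0.0 for s2 in STATES}} for s in STATES}
O = {
    "coop": {"concede": {"concedes": 0.8, "holds": 0.2}},
    "ncoop": {"concede": {"concedes": 0.3, "holds": 0.7}},
}

belief = {"coop": 0.5, "ncoop": 0.5}          # initial beliefs
belief = belief_update(belief, "concede", "concedes", T, O)
```

After seeing the other agent concede, the belief shifts toward "coop"; repeating the call once per conversational turn gives the take-action, observe, update cycle listed on the slide.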
Why POMDP-based modeling?
– Decentralized algorithm
– Incorporated in an agent that interacts with others
– Can represent communication (arguments, offers, preferences, etc.)
– Many conversational turns
– Learns e.g. the model of the other player
– Adaptive best response
– Computationally efficient for realistic interactions
– Extendable to more than two agents
Natural way to represent cultural and social factors in C&N
Output of POMDP
• The output is a policy matrix
• Policy: Optimal action to take, given current state (observations and other’s model)
• At run-time, agent consults the matrix and takes appropriate action
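A run-time lookup of this kind might look as follows. This is a simplified sketch: it assumes the policy is stored as a table keyed by the agent's current estimate of the other's type plus the two standing offers. The keys, the entries in `POLICY`, and the helper `choose_action` are hypothetical; a solved POMDP policy would normally be indexed by the full belief.

```python
# Hypothetical policy table:
# (believed buyer type, seller price, buyer price) -> action.
POLICY = {
    ("unknown", 10, 0): "Concede 1",
    ("coop", 9, 1): "Concede 1",
    ("ncoop", 9, 0): "Concede 0",
    ("coop", 5, 5): "Accept",
}


def choose_action(policy, believed_type, seller_price, buyer_price, default="Concede 0"):
    """Look up the precomputed action for the current state.

    Falls back to `default` when the state has no entry in the table.
    """
    return policy.get((believed_type, seller_price, buyer_price), default)


first_move = choose_action(POLICY, "unknown", 10, 0)
```

The expensive part (solving for the policy) happens offline; at run time the agent only performs a dictionary lookup per turn.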
Simplified Example
• Two agents negotiating
  – Seller S (POMDP Agent)
  – Buyer B (Other player)
• Single item negotiation
• Initially buyer at 0 price and seller at max = 10
Example: State Space
• State composed of 2 parts
  – Seller type, Buyer type
  – Negotiation status: current offers
• Agent types: cooperative or non-cooperative
• Negotiation modeled from Seller’s perspective
  – Initially high uncertainty of Buyer type
• Seller’s belief about Buyer, and state of negotiation are dynamic
Example: POMDP State
• Agent Type: cooperative vs non-cooperative
  – 0 cooperative, 1 non-cooperative
  – Discretized to {0, .5, 1}
• Price discretized to the set {0, 1, ..., 9, 10}
• Sample state:
  Me (Seller) Type = Coop
  You (Buyer) = Unknown
  Negotiation status: <S price = $10; B price = $0>
• State space = Number of Buyer types * Negotiation states = 3 * (11 * 11) = 363
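The state count can be checked by enumerating the states directly. This is a small sketch; the field names and the reading of 0.5 as "unknown" are assumptions made for illustration.

```python
from itertools import product

# Buyer type discretized to {0, .5, 1}: 0 = cooperative, 1 = non-cooperative.
# Treating 0.5 as "unknown" is an assumed reading of the slide.
BUYER_TYPES = [0.0, 0.5, 1.0]
PRICES = range(11)  # {0, 1, ..., 10}

# One state per (buyer type, seller price, buyer price) combination.
states = [
    {"buyer_type": t, "seller_price": s, "buyer_price": b}
    for t, s, b in product(BUYER_TYPES, PRICES, PRICES)
]

# The sample state from the slide: buyer type unknown, seller at $10, buyer at $0.
initial_state = {"buyer_type": 0.5, "seller_price": 10, "buyer_price": 0}
```

Three buyer types times 11 seller prices times 11 buyer prices gives the 363 states quoted on the slide.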
Example: Action & Transition
• Action set: {Concede 2, Concede 1, Concede 0, Accept, Reject}
• Transition: Probability of ending in some state if agent takes a particular action in current state
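A transition model of this kind can be stored as a table from (state, action) to a distribution over next states. The sketch below is illustrative only: the table `TRANSITIONS`, the helper names, and the probabilities are assumptions, and states are reduced to (seller price, buyer price) pairs.

```python
import random

# Hypothetical transition table:
# (seller_price, buyer_price, action) -> [(next_state, probability), ...]
TRANSITIONS = {
    (10, 0, "Concede 1"): [((9, 0), 0.1), ((9, 1), 0.7), ((9, 2), 0.2)],
    (9, 1, "Concede 1"): [((8, 1), 0.1), ((8, 2), 0.7), ((8, 3), 0.2)],
}


def next_state_distribution(state, action):
    """Return the (next_state, probability) pairs for a state-action pair."""
    return TRANSITIONS[(state[0], state[1], action)]


def sample_next_state(state, action, rng=random):
    """Draw one successor state according to the transition probabilities."""
    outcomes = next_state_distribution(state, action)
    next_states = [s for s, _ in outcomes]
    weights = [p for _, p in outcomes]
    return rng.choices(next_states, weights=weights, k=1)[0]


# Seller concedes 1 from $10; the buyer's reply determines the new buyer price.
successor = sample_next_state((10, 0), "Concede 1", random.Random(0))
```

The seller's own price moves deterministically with its action, while the spread over buyer prices captures uncertainty about how the buyer responds.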
[Figure: example transition tree. The root state is Me = Coop, You = Unknown, my price = $10, your price = $0. Branches are labeled with the actions Concede 0, Concede 1, Concede 2 and Agree. Successor states pair a buyer type (Coop or Ncoop) with an offer pair, e.g. ($9, $0), ($9, $1), ($9, $2), ($8, $0) to ($8, $2), ($7, $0) to ($7, $2), and later ($4, $6), ($6, $4), ($5, $5). Edges carry transition probabilities such as 0.1, 0.7, 0.2 and 0.6, 0.35, 0.05.]
Building Initial Simplified POMDP
• Human negotiation transcripts
  – 2 players (Grocer and Florist) with 4 issues
• Mapped dialogues to 14 base codes (actions)
• Other player’s type known for each transcript
  – Used for training and validation of the model
• Transition: Frequency of reaching some state, given a code
• Observation: Frequency of observing a code given some negotiation state
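The two frequency estimates can be sketched as simple counting over coded transcripts. This is a minimal sketch under stated assumptions: each transcript is assumed to be already converted into (state, code, next state) triples, every coded unit is treated alike regardless of speaker, and the function `estimate_models` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_models(episodes):
    """Estimate T(next_state | state, code) and O(code | state) by relative frequency.

    Each episode is a list of (state, code, next_state) triples.
    """
    trans_counts = defaultdict(Counter)
    obs_counts = defaultdict(Counter)
    for episode in episodes:
        for state, code, next_state in episode:
            trans_counts[(state, code)][next_state] += 1
            obs_counts[state][code] += 1

    def normalize(counts):
        # Turn each row of counts into a probability distribution.
        return {
            key: {k: n / sum(c.values()) for k, n in c.items()}
            for key, c in counts.items()
        }

    return normalize(trans_counts), normalize(obs_counts)


# Hypothetical coded data: states are (grocer offer, florist offer) temperatures.
episodes = [
    [((70, 62), "OS", (70, 64)), ((70, 64), "OS", (66, 64)), ((66, 64), "RPO", (66, 66))],
    [((70, 62), "OS", (70, 64)), ((70, 64), "SBF", (70, 64))],
]
T_hat, O_hat = estimate_models(episodes)
```

With only a handful of transcripts many (state, code) pairs will never be seen, so a practical version would likely need smoothing; that choice is outside what the slides describe.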
POMDP construction
[Diagram: Grocer-Florist transcript, as <Player, Action code> pairs → Model Generator → model generated → reasoning over model → prescription of optimal actions given state of interaction. Additional labels: "(Empty)", "Learns".]
Codes used

Code  Definition          Code  Definition                    Code  Definition
OFFER                     REACTIONS                           Misc  Miscellaneous
OS    Single-Issue        RPO   Agreement to offer made       SBF   Substantiation
OM    Multi-Issue         RPS   Agreement with statement      Q     Question
PROVIDE INFORMATION       RNO   Disagreement with offer       PC    Procedural Comment
IP    Issue Preferences   RNS   Disagreement with statement   INT   Summarizing
IR    Priorities          TP    Threat/Power
IB    Bottom-line

Courtesy of Laurie Weingart
Sample Grocer-Florist Transcript

Speaker  Code  Unit
Florist  PC    So let’s start with temperature
Grocer   RPS   Okay
Florist  OS    So I would suggest a temperature of 64 degrees
Grocer   RPS   Okay
Florist  Q     How does that work for you?
Grocer   IP    Well personally for the grocery I think it is better to have a higher temperature
Grocer   SBF   Just because I want the customers to feel comfortable
Grocer   SBF   And if it is too cold that might turn the customers away a little bit
Florist  RPS   Okay
Grocer   SBF   And also if it is warm, people are more apt to buy cold drinks to keep themselves comfortable and cool
Florist  RPS   That's true.
Grocer   OS    I think 66 would be good.
Grocer   SBF   That way it is not too cold and it is not too hot as well.
Grocer   SBF   And its good for the customers.
Florist  RPO   Okay, yeah

• Assumed Florist is Cooperative
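Coded units like these can be turned into <Player, Action code> pairs with a few lines of parsing. The sketch below assumes one unit per line in the form "Speaker CODE text"; the `Unit` type and `parse_unit` helper are hypothetical names, and the input lines are a subset of the transcript above.

```python
from collections import Counter
from typing import NamedTuple


class Unit(NamedTuple):
    speaker: str
    code: str
    text: str


def parse_unit(line):
    """Split 'Speaker CODE free text' into a Unit; the text may contain spaces."""
    speaker, code, text = line.split(maxsplit=2)
    return Unit(speaker, code, text)


lines = [
    "Florist PC So let's start with temperature",
    "Grocer RPS Okay",
    "Florist OS So I would suggest a temperature of 64 degrees",
    "Grocer OS I think 66 would be good.",
    "Florist RPO Okay, yeah",
]
units = [parse_unit(line) for line in lines]

# <Player, Action code> pairs, in conversational order.
action_sequence = [(u.speaker, u.code) for u in units]

# How often each speaker used each code.
codes_by_speaker = Counter(action_sequence)
```

The ordered `action_sequence` is the kind of input a model generator would consume, while the counts give a quick per-speaker profile of the codes used.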
Grocer POMDP generated
[Diagram: state sequence, each with Me = Coop, You = Coop: (70F, 62F) → (70F, 64F) → (66F, 64F) → (66F, 66F). Labels in the diagram: "Discuss preferences and support their positions"; Florist: 64F; "Agrees without committing"; "Proposes 66F"; Florist: "Doesn’t commit"; "Grocer substantiates his offer"; Florist: "Agrees to 66F". Reward: 60 points for both Grocer and Florist.]
Negotiation Game
[Diagram: the Agent (Grocer), following the optimal POMDP policy, and the Human (Florist) exchange Grocer Actions and Florist Actions.]
• Sequential
• Process oriented
• Blends computational and social science results
Initial results – Classification of Florist
• 10 transcripts for training: 4 cooperatives, 6 non-cooperatives
• 5 for testing – average of correctly classified
• X axis – Number of communications
• Y axis – Uncertainty of belief of grocer about florist
[Plot: uncertainty of belief (y axis, 0 to 1.2) versus number of communications (x axis, 1 to 8).]
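One way to track how uncertainty about the florist's type changes with each communication is sketched below. The slides do not state how uncertainty is measured; Shannon entropy of a two-type belief is an assumption here, and the likelihood table `CODE_LIKELIHOOD` is made up for illustration rather than estimated from the transcripts.

```python
import math

# Hypothetical per-type code frequencies; a real model would estimate
# these from the training transcripts.
CODE_LIKELIHOOD = {
    "coop": {"RPS": 0.4, "RPO": 0.3, "RNO": 0.1, "TP": 0.2},
    "ncoop": {"RPS": 0.2, "RPO": 0.1, "RNO": 0.3, "TP": 0.4},
}


def update_type_belief(belief, code):
    """Bayes update of P(type) after observing one code from the other party."""
    unnormalized = {t: belief[t] * CODE_LIKELIHOOD[t][code] for t in belief}
    total = sum(unnormalized.values())
    return {t: p / total for t, p in unnormalized.items()}


def entropy_bits(belief):
    """Shannon entropy in bits: 1.0 for a 50/50 belief, 0.0 when certain."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


belief = {"coop": 0.5, "ncoop": 0.5}
uncertainty = [entropy_bits(belief)]
for code in ["RPS", "RPO", "RPS"]:      # codes observed from the florist
    belief = update_type_belief(belief, code)
    uncertainty.append(entropy_bits(belief))
```

With these illustrative numbers the entropy falls after each agreeing code, which is the qualitative shape one would expect a plot of uncertainty against number of communications to show.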
Modeling Cultural Factors
• How do we model cultural factors for C&N in a POMDP?
• How do we validate the model?
• Is the model general enough to exhibit plausible culturally-specific human behavior?
Culture and POMDP
• Initial beliefs about others’ social value orientation and behavior usually reflect own culture beliefs about the interaction
• Culture influences frequency of particular actions and communications
• Interpretation of each observation refines the agent’s model of others
• Interpretation is influenced by culture
  – Model can capture cultural misinterpretations and their consequences in terms of strategy and outcomes
• Agents from different cultures can have different rewards for the same actions
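The idea that culture shapes both the initial belief and the interpretation of an observation can be illustrated with a small sketch. Everything here is hypothetical: the `CultureProfile` type, the two unnamed cultures, and all probabilities are invented for illustration and do not come from the project's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CultureProfile:
    """Hypothetical culture-dependent parameters of an agent's model."""
    prior_coop: float          # initial belief that the other party is cooperative
    p_code_given_coop: dict    # how likely a cooperative partner is to use each code
    p_code_given_ncoop: dict   # same for a non-cooperative partner

    def posterior_coop(self, code):
        """Belief that the other is cooperative after observing one code."""
        num = self.prior_coop * self.p_code_given_coop[code]
        den = num + (1 - self.prior_coop) * self.p_code_given_ncoop[code]
        return num / den


# Illustrative numbers only: culture X reads a direct rejection (RNO) as
# fairly routine; culture Y reads the same code as a strong sign of
# non-cooperation.
culture_x = CultureProfile(0.6, {"RNO": 0.3}, {"RNO": 0.4})
culture_y = CultureProfile(0.6, {"RNO": 0.05}, {"RNO": 0.5})

belief_x = culture_x.posterior_coop("RNO")
belief_y = culture_y.posterior_coop("RNO")
```

Both agents start from the same prior and see the same code, yet end with different beliefs about the other party; a gap of this kind is one way a model could represent a cultural misinterpretation.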
Other’s type
• Includes factors such as:
  – Social Value Orientation
    • Pro-Social/cooperative, individualistic, competitive, altruistic
  – Trust, Reputation etc
  – Cultural factors
    • Individualist vs Collectivist
    • Egalitarian vs Hierarchy
    • Direct vs Indirect communication
Cognitive Schema of A
[Diagram: A’s schema and B’s schema. Nodes for each party: culture, history with the other party, context, real intent, behavior, and interpretation of the other’s intent. A POMDP box lists State Space, Initial Beliefs, Actions, Observations, Transition, Reward.]
Capturing initial state of model
[Same diagram, annotated with Survey experiments and Observer Experiments.]
Capturing model dynamics
[Same diagram, annotated with Intercultural transcripts.]
Plans for Next Year
• Initial beliefs from Observer Experiment and from surveys (US, Turkey, Egypt, Qatar)
• Collect intra-cultural negotiation transcripts
  – US, Turkey, Egypt
• Build POMDPs from intra-cultural negotiation transcripts
  – US, Turkey, Egypt
• Build POMDPs from inter-cultural negotiation transcripts
  – US-Hong Kong, US-German, US-Israeli (have) (courtesy of Wendi Adair and Jeanne Brett)
  – US-Turkish, US-Egyptian, US-Qatari (collect)
Plans for Next Year
• Validate the predictive behavior of the models
  – Using the transcripts for training and testing
• Use the models in negotiation with humans
• Use the models in what-if scenarios
• Use the models to generate hypotheses to test with human subjects
• Initial models for collaboration scenarios using POMDP