


Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 1762–1771, Hong Kong, China, November 3–7, 2019. © 2019 Association for Computational Linguistics


Are You for Real? Detecting Identity Fraud via Dialogue Interactions

Weikang Wang¹,², Jiajun Zhang¹,², Qian Li³, Chengqing Zong¹,²,⁴ and Zhifei Li³
¹ National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China
² University of Chinese Academy of Sciences, Beijing, China
³ Mobvoi, Beijing, China
⁴ CAS Center for Excellence in Brain Science and Intelligence Technology, Beijing, China
{weikang.wang, jjzhang, cqzong}@nlpr.ia.ac.cn
{qli, zfli}@mobvoi.com

Abstract

Identity fraud detection is of great importance in many real-world scenarios such as the financial industry. However, few studies have addressed this problem before. In this paper, we focus on identity fraud detection in loan applications and propose to solve this problem with a novel interactive dialogue system which consists of two modules. One is the knowledge graph (KG) constructor, which organizes the personal information for each loan applicant. The other is the structured dialogue management, which can dynamically generate a series of questions based on the personal KG to ask the applicants and determine their identity states. We also present a heuristic user simulator based on problem analysis to evaluate our method. Experiments have shown that the trainable dialogue system can effectively detect fraudsters and achieves higher recognition accuracy than rule-based systems. Furthermore, our learned dialogue strategies are interpretable and flexible, which can help promote real-world applications.¹

1 Introduction

Identity fraud occurs when one person uses another person's personal information, or combines a few pieces of real data with bogus information, to deceive a third party. Nowadays, identity fraud is becoming an increasingly prevalent issue and has left many financial firms nursing huge losses. Besides, persons whose identities have been stolen may receive unexpected bills, and their credit will also be affected. Although identity fraud is a very serious problem in modern society, there are no effective fraud detection methods at present and little attention has been paid to this problem.

Intuitively, a simple way to detect identity fraud in loan applications is directly asking applicants

¹ https://github.com/Leechikara/Dialogue-Based-Anti-Fraud

The fake personal information of a loan applicant:
School: Nanjing University    Company: Baidu    Residence: No.3 Gulou Street

Method 1 (direct questions):
System: Which university did you graduate from?
Applicant: Nanjing University.
System: Which company do you work for?
Applicant: Baidu.
System: Where do you live?
Applicant: No.3 Gulou Street.
Decision: Non-Fraud (Wrong)

Method 2 (derived questions):
System: When was Nanjing University founded?
Applicant: I am not sure.
System: Which is the nearest subway station to Nanjing University?
Applicant: I am not quite clear.
Decision: Fraud (Correct)

Figure 1: Dialogue examples of two possible fraud detection methods. The first one directly asks applicants about their personal information. The second one asks applicants questions that are related to their personal information.

about their personal information. However, as shown in Fig. 1, this method is prone to errors because fraudsters may well know the fake information. Fortunately, we find that fraudsters generally are not clear about the answers to questions that are related to the fake information.² We refer to these questions as derived questions, which can be constructed from triplets whose head entity is the personal information entity. For example, the first derived question about "Nanjing University" is based on (Nanjing University, FoundedDate, 1902). In Fig. 1, the applicant claims to have graduated from "Nanjing University" but cannot answer derived questions about this school. This fact indicates that the applicant is likely to be a fraudster.

Based on the above finding, we aim to design a

² This finding is based on the premise that loan applicants answer questions without any help (e.g., using automatic QA systems or information retrieval tools). In fact, this premise is reasonable in many real scenarios, such as dialogues over video calls and phone calls, in which we can monitor the applicants with a camera or require them to answer questions within a few seconds (e.g., 5 seconds).


dialogue system to detect identity fraud by asking derived questions. However, there are three major challenges in achieving this goal.

First, designing derived questions requires a high-quality KG. However, owing to the sparseness problem (Ji et al., 2016; Trouillon et al., 2017) of the KG, many entities have no triplets for derived question generation. Second, randomly selecting triplets to generate questions is feasible, but it is not the optimal questioning strategy to detect fraudsters. Third, because of privacy issues, evaluating anti-fraud systems with real applicants is not practical, and existing user simulation methods (Li et al., 2016; Georgila et al., 2006; Pietquin and Dutoit, 2006) do not apply to our task. Hence, how to evaluate our systems efficiently is a problem.

To address the above problems, we first complete an existing KG with geographic information from an electronic map (Section 2). In the new KG, nearly all personal information entities can find triplets for derived question generation. Then, based on the KG, we present structured dialogue management (Section 3) to explore the optimal dialogue strategy with reinforcement learning. Specifically, our dialogue management consists of (1) the KG-based dialogue state tracker (KG-DST), which treats embeddings of nodes in the KG as dialogue states, and (2) the hierarchical dialogue policy (HDP), where high-level and low-level agents unfold the dialogue together. Finally, based on intuitive analysis, we find the applicants' behavior is related to some factors (Section 5.1). Thus, we introduce hypotheses to formalize the effect of these factors on the applicants' behavior and propose a heuristic user simulator to evaluate our systems.

Experiments have shown that the data-driven system significantly outperforms rule-based systems on the fraud detection task. Besides, the ablation study proves that the proposed dialogue management can improve recognition accuracy and learning efficiency because of its ability to model structured information. We also analyze the behavior of our system and find that the learned anti-fraud policy is interpretable and flexible.

To summarise, our main contributions are threefold: (1) As far as we know, this is the first work to detect identity fraud through dialogue interactions. (2) We point out three major challenges of identity fraud detection and propose corresponding solutions. (3) Experiments have shown that our approach can detect identity fraud effectively.

2 Knowledge Graph Constructor

There are four types of personal information in a Chinese loan application form: "School", "Company", "Residence" and "BirthPlace". To generate derived questions, we link all personal information entities to nodes in an existing Chinese KG³ and crawl triplets that are directly related to them. However, owing to the fact that the KG is largely sparse, nearly half of the entities⁴ cannot be linked. Thus we use the wealth of geographic information about organizations and locations in electronic maps (e.g., Amap⁵) to complete the KG.

(Nanjing University, FoundedDate, 1902)
(Nanjing University, Park, Gulou Park)
(Nanjing University, SubwayStation, Gulou Subway Station)
(Gulou Park, SubwayStation, Gulou Subway Station)
(Gulou Subway Station, Park, Gulou Park)

Figure 2: An example of the KG for Nanjing University. The green edge represents the triplet crawled from the existing KG and the blue edges represent the triplets generated based on a navigation electronic map.

Specifically, for each personal information entity, we first crawl its points of interest (POI⁶) within one kilometer and the POI types from the Amap. If there are multiple POI of the same type, we only keep the nearest one. Then we generate triplets of the form ($Personal Information Entity$, $POI type$, $POI$) to indicate the fact that the nearest $POI type$ to the $Personal Information Entity$ is $POI$. Besides, for any two entities, if the distance between them is less than 100 meters, we generate two triplets to represent the bi-directional adjacency relation between them. In the end, as shown in Fig. 2, we combine triplets from the two information sources (the Chinese KG and the electronic map) to construct a new KG. In this KG, nearly all personal information entities can be linked. And for each relation⁷, we design a language template for question generation.
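As a concrete illustration, the POI-to-triplet step above might be sketched as follows. The `pois` tuple layout, the planar-distance check, and the function name are assumptions for illustration, not the authors' implementation:

```python
from math import hypot

def build_poi_triplets(entity, pois, adjacency_threshold_m=100.0):
    """Sketch of the KG-completion step: generate (entity, POI type, POI)
    triplets from nearby points of interest, keeping only the nearest POI
    per type, plus bi-directional adjacency triplets between close POI."""
    # Keep the nearest POI for each type; pois: list of (name, type, distance_m, x, y).
    nearest = {}
    for name, poi_type, dist, x, y in pois:
        if poi_type not in nearest or dist < nearest[poi_type][1]:
            nearest[poi_type] = (name, dist, x, y)

    triplets = [(entity, poi_type, name)
                for poi_type, (name, _, _, _) in nearest.items()]

    # Bi-directional adjacency triplets for POI pairs closer than the threshold;
    # the relation label is the POI type of the tail entity (as in Fig. 2).
    kept = [(name, poi_type, x, y) for poi_type, (name, _, x, y) in nearest.items()]
    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            n1, t1, x1, y1 = kept[i]
            n2, t2, x2, y2 = kept[j]
            if hypot(x1 - x2, y1 - y2) < adjacency_threshold_m:
                triplets.append((n1, t2, n2))
                triplets.append((n2, t1, n1))
    return triplets
```

Each relation produced this way can then be paired with a language template for question generation.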

3 Dialogue System Design

The overview of our system is shown in Fig. 3. The core of the system is the dialogue management

³ https://www.ownthink.com
⁴ Most of them are "Residence" and "BirthPlace".
⁵ https://www.amap.com
⁶ The POI are the specific locations (e.g., subway stations) that someone may find useful in navigation systems.
⁷ After the data cleaning, there are 40 relations in all.


[Figure: the applicant, the natural language generation module, and the dialogue management over the personal KG, which contains the "User" node, the personal information nodes "Nanjing University" (School), "Baidu" (Company) and "No.3 Gulou Street" (Residence), and their answer nodes.]

Question: Which is the nearest subway station to Nanjing University?
A. Gulou Subway Station  B. Xin Jiekou Subway Station  C. Jimingsi Subway Station  D. I am not quite clear
Response: D. I am not quite clear
System Action: (Nanjing University, SubwayStation, Gulou Subway Station)

Figure 3: Overview of our approach. To build the directed graph for dialogue management, we reverse the directions of all edges in the original personal KG to make the head entity read information from its tail entities. Besides, we add a special node "User" and new edges to represent the applicant's personal information. In this graph, the direction of each edge is the direction of message passing in KG-DST. The blue and green edges indicate that the two agents select nodes to unfold the dialogue according to HDP.

which is organized as a directed graph G(V, E). In each turn, our system first infers dialogue states with the KG-based dialogue state tracker by computing embeddings of nodes. In this graph, the embedding of the "User" node is the dialogue state of a high-level agent (manager), and the embeddings of nodes adjacent to "User" (named personal information nodes) are the dialogue states of low-level agents (workers). Then our system unfolds the dialogue according to the hierarchical dialogue policy. Concretely, the manager first selects a personal information node (e.g., "Nanjing University") as the worker, and then the worker will select a node (e.g., "Gulou Subway Station") from its predecessors (named answer nodes). After that, the sampled nodes of the two agents form the final system action (a triplet). Next, based on the triplet and a predefined template, the natural language generation module will give a multiple-choice question to the applicant. After the applicant gives a response, the embeddings of all nodes will be updated to generate new dialogue states for the next turn.

3.1 KG-based Dialogue State Tracker

There are three types of nodes in G(V, E): the "User" node v_u, the personal information nodes v_p ∈ V_p and the answer nodes v_a ∈ V_a. In the t-th turn, KG-DST first gives an initial embedding to each v ∈ V_p ∪ V_a. The initial embedding is the concatenation of static features and dialogue features. Then, v will gather information from its predecessors N(v). After multiple message passings, we get its final embedding E_t(v). Next, v_u will aggregate information from V_p to generate its embedding E_t(v_u). Finally, E_t(v_u) and E_t(v_p) are the dialogue states of the manager and worker respectively.

Static Features. Specifically, for v ∈ V_p ∪ V_a, the static features include the degree and type. Besides, for v_a ∈ V_a, we use the "spread degree on the internet" to distinguish different answer nodes, because we find an obvious correlation between this "spread degree" feature and applicants' behavior in our human experiments (Section 5.1). To get the "spread degree" feature, we first treat the answer node v_a and its adjacent personal information node v_p as the keyword⁸, and then search it in the search engine. The number⁹ of retrieved results will be the "spread degree" feature of v_a. In the end, each static feature is encoded as a one-hot vector and they are concatenated to form a vector S_t(v).
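A rough sketch of this static-feature encoding is given below; the maximum degree, node-type vocabulary, and bin width are illustrative assumptions (the bin width follows the axis of Fig. 4):

```python
import math

def one_hot(index, size):
    """Return a one-hot list of the given size."""
    v = [0.0] * size
    v[index] = 1.0
    return v

def static_features(degree, node_type, spread_count=None,
                    max_degree=10, types=("personal", "answer"),
                    n_spread_bins=10, bin_width=1.84):
    """Sketch of the static feature vector S_t(v): one-hot degree, one-hot
    node type and, for answer nodes, a one-hot bin of the log "spread
    degree" (the retrieved-result count). All sizes here are assumptions."""
    feats = one_hot(min(degree, max_degree), max_degree + 1)
    feats += one_hot(types.index(node_type), len(types))
    if spread_count is not None:  # only answer nodes carry this feature
        b = min(int(math.log(spread_count + 1) / bin_width), n_spread_bins - 1)
        feats += one_hot(b, n_spread_bins)
    return feats
```

The concatenation of such one-hot blocks plays the role of S_t(v) in the initial embedding E_t^0(v).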

Dialogue Features. The dialogue features record the dynamic information of v ∈ V_p ∪ V_a during the dialogue. Specifically, the dialogue features include whether the node has been explored by the manager or workers and whether the node appeared in the system action of the last turn. In addition, for v_p ∈ V_p, the dialogue features include the interaction turns of the corresponding worker and the number of correctly/incorrectly answered questions about v_p. For v_a ∈ V_a, the dialogue features

⁸ In fact, the keyword is the head entity and tail entity of a triplet. For example, for the answer node "1902", the keyword is "Nanjing University 1902".
⁹ If there are multiple keywords for an answer node (e.g., "Gulou Subway Station"), we take the average.


include whether applicants know v_a is the answer to a derived question. Similarly, the dialogue features will be encoded as a one-hot vector D_t(v).

Message Passing. In Fig. 3, the applicant does not know "Gulou Subway Station" is the nearest subway station to "Nanjing University". In such a case, the personal information about "School" may be fake. Besides, for another question, "What's the nearest park to Nanjing University?", the applicant may not know the answer because the distance between "Gulou Park" and "Gulou Subway Station" is less than 100 meters. Thus, we want the known information about "Gulou Subway Station" to be sent to its successors.

Specifically, for v ∈ V_p ∪ V_a, we compute its embedding recursively as follows:

E^k_t(v) = \max_{v' ∈ N(v)} \tanh(W_k E^{k-1}_t(v'))    (1)

where E^k_t(v) is the depth-k node embedding in the t-th turn, N(v) denotes the set of nodes adjacent to v, W_k is the parameter in the k-th iteration, and the aggregate function is the element-wise max operation. The final node embedding is the concatenation of the embeddings at each depth:

E_t(v) = [E^0_t(v), ..., E^K_t(v)]    (2)

where E^0_t(v) = [S_t(v), D_t(v)] and K is a hyperparameter.

After getting the embeddings of v_p ∈ V_p, we compute the embedding of v_u by aggregating information from V_p:

E_t(v_u) = \max_{v_p ∈ V_p} \tanh(W_p E_t(v_p))    (3)

where W_p is the parameter.

In the end, E_t(v_p) is the worker's dialogue state, which contains information of a part of the graph, and E_t(v_u) is the manager's dialogue state, which contains information of the whole graph.
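The max-aggregation message passing of Eqs. (1)–(3) can be sketched as follows; this is a simplified NumPy version, and the zero vector used for predecessor-less nodes is an assumption:

```python
import numpy as np

def message_passing(init, preds, weights):
    """Sketch of Eqs. (1)-(2): K rounds of element-wise-max aggregation over
    each node's predecessors, then concatenation across depths.
    `init` maps node -> E^0_t(v) = [S_t(v), D_t(v)]; `preds` maps node ->
    its predecessor set N(v); `weights` is [W_1, ..., W_K]."""
    E = {v: [e] for v, e in init.items()}          # E[v][k] = depth-k embedding
    for k, W in enumerate(weights, start=1):
        for v in init:
            msgs = [np.tanh(W @ E[u][k - 1]) for u in preds.get(v, [])]
            if msgs:
                E[v].append(np.max(msgs, axis=0))  # element-wise max over N(v)
            else:
                # No predecessors: use a zero vector (an assumption of this sketch).
                E[v].append(np.zeros(W.shape[0]))
    # Eq. (2): the final embedding is the concatenation over depths 0..K.
    return {v: np.concatenate(E[v]) for v in init}
```

The manager's state E_t(v_u) in Eq. (3) is then one more max-aggregation step over the personal information nodes V_p.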

3.2 Hierarchical Dialogue Policy

After obtaining the dialogue states and node embeddings, our system will unfold the dialogue according to a hierarchical policy.

Specifically, the manager first selects v_p ∈ V_p as a worker to verify the identity state of v_p according to a high-level policy π^m. Then, the worker will choose some answer nodes from its predecessors N(v_p) to generate questions about v_p according to a low-level policy π^w. If the worker gives the decision d^w ∈ {Fraud, Non-Fraud} about the identity state of v_p, π^w will end and the manager will select a new worker again or give the final decision. If the manager gives the final decision d^m ∈ {Fraud, Non-Fraud} about the applicant's identity state, π^m will end. Formally, π^m and π^w are defined as follows:

π^m_t(v_p | E_t(v_u)) ∝ \exp(W_m [E_t(v_u), E_t(v_p)] + b_m)
π^m_t(d^m | E_t(v_u)) ∝ \exp(W_m [E_t(v_u), E(d^m)] + b_m)
π^w_t(v_a | E_t(v_p)) ∝ \exp(W_w [E_t(v_p), E_t(v_a)] + b_w)
π^w_t(d^w | E_t(v_p)) ∝ \exp(W_w [E_t(v_p), E(d^w)] + b_w)    (4)

where {W_m, W_w, b_m, b_w, E(d^m), E(d^w)} are parameters, E_t(v_u) and E_t(v_p) are the dialogue states of the manager and worker in the t-th turn, E(d^m) is the encoding of the manager's terminal action, which has the same dimension as E_t(v_p), and E(d^w) is the encoding of the worker's terminal action, which has the same dimension as E_t(v_a).

Besides, to prevent the two agents from making decisions in haste, domain rules are applied to their dialogue policies via an "Action Mask" (Williams et al., 2017). Specifically, the domain rules are defined as follows. First, only after all, or at least three, answer nodes related to a worker have been explored can the worker make its decision. Second, only after all workers have made decisions, or at least one worker's decision is "Fraud", can the manager make the final decision.
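One policy head of Eq. (4) combined with the action mask might look like the following sketch; a single scoring vector stands in for the separate (W_m, b_m) and (W_w, b_w) parameters of the manager and workers:

```python
import numpy as np

def masked_policy(state, candidates, W, b, mask):
    """Sketch of one head of Eq. (4) with an "Action Mask": each candidate
    (a node embedding or a terminal-decision encoding) is scored from the
    concatenation [state, candidate]; actions forbidden by the domain
    rules get probability 0 before the softmax renormalisation."""
    logits = np.array([W @ np.concatenate([state, c]) + b for c in candidates])
    logits[~np.asarray(mask)] = -np.inf   # mask out rule-violating actions
    logits -= logits.max()                # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()
```

For example, the worker's "Fraud"/"Non-Fraud" terminal actions would carry `mask=False` until at least three of its answer nodes have been explored.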

4 Training

4.1 Reward Function

We expect the system to give correct decisions about applicants within minimum turns. Thus, at the end of each dialogue, the manager receives a positive reward r^m_crt for a correct decision, or a negative reward -r^m_wrg for a wrong decision. If the manager selects a worker to unfold the dialogue in the t-th turn and the worker gives n^w_t questions to the applicant, the manager will receive a negative reward -n^w_t * r_turn. Besides, we provide an internal reward to optimize the low-level policy. Specifically, if the worker gives a correct decision about the corresponding personal information, it will receive a positive reward r^w_crt. Otherwise, it will receive a negative reward -r^w_wrg. And in each turn, the worker receives a negative reward -r_turn to encourage shorter interactions.
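A minimal sketch of this reward scheme, using the values given later in Section 5.3 and ignoring discounting for clarity:

```python
def manager_reward(correct, questions_per_step, r_crt=3.0, r_wrg=3.0, r_turn=0.1):
    """Sketch of the manager's total reward: -n_t^w * r_turn for each
    selected worker step (n_t^w questions asked in that step), plus a
    terminal +r^m_crt for a correct decision or -r^m_wrg for a wrong one.
    Default values follow Section 5.3; discounting is omitted here."""
    step_cost = -r_turn * sum(questions_per_step)
    terminal = r_crt if correct else -r_wrg
    return step_cost + terminal

def worker_reward(correct, n_questions, r_crt=1.0, r_wrg=1.0, r_turn=0.1):
    """The worker's internal reward: -r_turn per asked question plus a
    terminal +r^w_crt / -r^w_wrg for its decision on one information item."""
    return -r_turn * n_questions + (r_crt if correct else -r_wrg)
```

The per-question penalty is what pushes both agents toward shorter interactions.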


4.2 Reinforcement Learning

The two agents can be trained with the policy gradient (Williams, 1992) approach as follows:

∇π^m_t = (R^m_t − V^m_t(E_t(v_u))) ∇ \log π^m_t(a^m_t | E_t(v_u))
∇π^w_t = (R^w_t − V^w_t(E_t(v_p))) ∇ \log π^w_t(a^w_t | E_t(v_p))    (5)

where R^m_t and R^w_t are the discounted returns of the two agents, a^m_t and a^w_t are their sampled actions, and V^m_t(E_t(v_u)) and V^w_t(E_t(v_p)) are value networks which are optimized by minimizing the mean-square errors to R^m_t and R^w_t respectively.
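Eq. (5) amounts to REINFORCE with a learned value baseline; a compact sketch, where the gradient of the log-probability is taken as given (e.g. from an autodiff framework):

```python
import numpy as np

def reinforce_with_baseline(log_prob_grad, discounted_return, value_estimate):
    """Sketch of Eq. (5): the policy-gradient update direction for one
    sampled action, with a learned value baseline subtracted to reduce
    variance. The same form applies to the manager and each worker."""
    advantage = discounted_return - value_estimate
    return advantage * log_prob_grad

def discounted_returns(rewards, gamma):
    """Compute R_t = sum_{i >= t} gamma^(i - t) * r_i for one episode."""
    R, out = 0.0, []
    for r in reversed(rewards):
        R = r + gamma * R
        out.append(R)
    return out[::-1]
```

The value networks V^m and V^w are then regressed onto these returns with a mean-square loss, as the paper describes.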

4.3 Pre-Training

Before reinforcement learning (RL), supervised learning (SL) is applied to mimic dialogues provided by a rule-based system. The rules are defined as follows. First, the manager selects a worker randomly. Then, the worker will select answer nodes randomly to generate questions. Let n_crt/n_wrg denote the number of correctly/incorrectly answered questions in this worker's decision process. If |n_crt − n_wrg| ≥ 3 or all answer nodes related to this worker have been explored, the worker will give its decision. If n_crt < n_wrg, the worker's decision will be "Fraud" and the manager's decision will be "Fraud" too. Otherwise, the worker's decision will be "Non-Fraud" and the manager will choose a new worker to continue the dialogue. In the end, if all workers' decisions are "Non-Fraud", the manager's decision will be "Non-Fraud".
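One worker's decision process under these rules can be sketched as follows; `answer_correct`, a hypothetical map from answer nodes to whether the simulated applicant answers correctly, is an assumption of this sketch:

```python
import random

def rule_based_worker(answer_correct, threshold=3):
    """Sketch of the Section 4.3 hand-crafted policy for one worker:
    explore answer nodes in random order, stop once |n_crt - n_wrg| >= 3
    or all nodes are explored, and decide "Fraud" iff n_crt < n_wrg."""
    nodes = list(answer_correct)
    random.shuffle(nodes)                      # random question selection
    n_crt = n_wrg = 0
    for node in nodes:
        if answer_correct[node]:
            n_crt += 1
        else:
            n_wrg += 1
        if abs(n_crt - n_wrg) >= threshold:    # early-stopping rule
            break
    return "Fraud" if n_crt < n_wrg else "Non-Fraud"
```

The manager-level rule then stops the whole dialogue at the first "Fraud" worker decision, or answers "Non-Fraud" once every worker has decided "Non-Fraud".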

5 Experiments and Results

5.1 User Simulator and Human Experiments

Simulating users' behavior is an efficient way to evaluate dialogue systems. In our task, the applicants' behavior is answering derived questions. Thus, the key of the user simulator is to estimate the probability p(k_i), where k_i is a binary random variable which denotes whether or not the applicant knows the triplet fact t_i behind a question q_i.

Intuitively, p(k_i) depends on three factors. First, if the applicant's identity state is "Non-Fraud", p(k_i = 1) will be greater than p(k_i = 0). Second, the wider a triplet fact t_i spreads on the internet, the more likely applicants are to know it. For example, almost all applicants know (Baidu, Founder, Robin Li) because there are a lot of web pages containing this fact on the internet. Third, if applicants know other triplets that are related to t_i, they may well know t_i because it is easy to deduce t_i from what they know. For example, if applicants know (Nanjing University, Park, Gulou Park) and (Gulou Park, SubwayStation, Gulou Subway Station), they may well know (Nanjing University, SubwayStation, Gulou Subway Station).

To formalize the effect of the three factors on applicants' behavior, we introduce three hypotheses: (1) For both fraudsters and normal applicants, p(k_i = 1) is proportional to the "spread degree" of t_i. (2) The "spread degree" of t_i can be approximated by the number of retrieved results (denoted as Freq(e^h_i, e^t_i)) in a search engine, where the keyword is the head entity e^h_i and the tail entity e^t_i of t_i. (3) For any three triplets, if they form a closed loop (regardless of directions) and applicants know two of them, the applicants must know all of them.

To generate simulated loan applicants, we first estimate the function relations between p(k_i = 1) and Freq(e^h_i, e^t_i) via human experiments. Specifically, we ask 31 volunteers to answer derived questions¹⁰ about their own and others' personal information. Then, for each question q_i, we place it into a discrete bin according to the logarithm of Freq(e^h_i, e^t_i). In each bin, we use the ratio of correctly answered questions to approximate p(k_i = 1). In the end, the relations are shown in Fig. 4. We can find that the statistical distributions of the real behavior patterns of normal applicants and fraudsters are distinguishable, and the results agree with our first two intuitions.
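The binning estimate of p(k_i = 1) can be sketched as follows; the bin width 1.84 matches the axis intervals of Fig. 4, and the observation format is an assumption:

```python
import math
from collections import defaultdict

def estimate_p_know(observations, bin_width=1.84):
    """Sketch of the Fig. 4 estimation: drop each answered question into a
    bin by log(Freq(e_h, e_t) + 1) and use the in-bin ratio of correctly
    answered questions as the estimate of p(k_i = 1).
    `observations` is a list of (freq, answered_correctly) pairs."""
    hits, totals = defaultdict(int), defaultdict(int)
    for freq, correct in observations:
        b = int(math.log(freq + 1) / bin_width)  # bin index on the log axis
        totals[b] += 1
        hits[b] += int(correct)
    return {b: hits[b] / totals[b] for b in totals}
```

Fitting one such curve for normal applicants and one for fraudsters yields the two function relations the simulator samples from.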

[Figure: two curves (Non-Fraud and Fraud) of p(k_i = 1) against log(Freq(e^h_i, e^t_i) + 1), binned into ten intervals from [0.00, 1.84) to [16.58, +∞).]

Figure 4: The relations between Freq(e^h_i, e^t_i) and p(k_i) for the two kinds of applicants. Freq(e^h_i, e^t_i) is used to approximate the "spread degree" of t_i. p(k_i = 1) indicates the probability that applicants know t_i.

Then, we get simulated loan applicants¹¹ following a "sampling and calibration" manner. Specifically, given an applicant's personal information, we first sample the identity state randomly. If the

¹⁰ There are 1516 derived questions in all.
¹¹ Note that we can generate any number of simulated applicants based on one applicant's personal information.


System             Accuracy   Average Turns
Flat Rule          0.748      10.0
Hierarchical Rule  0.787      10.4
MP-S               0.532       8.0
HP-S               0.804       7.9
Full-S             0.884       9.8

Figure 5: Performance of different systems. Tested on 10 epochs using the best model during training.

sampling result is "Fraud", we will sample 1∼4 information item(s) randomly to be the fake information. Generally, forging information about "School" and "Company" may result in a larger loan. Thus, when sampling the fake information, the sampling probability of "School" and "Company" is twice the sampling probability of "Residence" and "BirthPlace". Then, for each personal information item and its related triplet t_i, we sample k_i based on (1) whether the personal information is fake, (2) Freq(e^h_i, e^t_i), and (3) the corresponding function relation in Fig. 4. Because the sampling results {k_1, ..., k_n} are independent from each other, there may be situations where the sampling results do not satisfy the rule defined in our third hypothesis. If that happens, we calibrate them until all sampling results agree with the hypothesis. Finally, if k_i = 1, the applicant will give the correct answer to the question q_i. Otherwise, the applicant's response is "D. I am not quite clear.".

5.2 Baselines

We compare our model (denoted as Full-S) with two rule-based baselines. In addition, to study the effect of message passing and hierarchical policy on the model training, we compare Full-S with two neural baselines for the ablation study.

• Flat Rule: The system selects 10 questions randomly to ask applicants. If the number of correctly answered questions is fewer than the number of incorrectly answered questions, the system's decision will be "Fraud". Otherwise, the system's decision will be "Non-Fraud".

• Hierarchical Rule: A rule-based system which uses a hand-crafted hierarchical policy to unfold dialogues. As shown in Section 4.3, we use this system to pre-train Full-S.

• MP-S: A neural dialogue system which uses message passing to infer dialogue states but uses a flat policy to unfold dialogues. That is, the manager selects answer nodes directly to generate derived questions.

• HP-S: A neural dialogue system which uses the hierarchical policy to unfold dialogues but does not use message passing to infer dialogue states. That is, K is 0 in Eq. 2.

5.3 Implementation Details

We collect 906 applicants' personal information, and randomly select 706 for training, 100 for dev, and 100 for test. In each batch, we sample 32 applicants' information for simulation. The maximum interaction turns of the system and the worker are 40 and 10 respectively. The iteration depth K is 2 in message passing. In the reward function, r^m_crt = 3, r^m_wrg = 3, r^w_crt = 1, r^w_wrg = 1, and r_turn = 0.1. The discount factors are 0.999 and 0.99 for the manager and worker respectively. All neural dialogue systems are pre-trained with rule-based systems for 20 epochs. We pre-train MP-S with Flat Rule because they both use the flat policy. Besides, we pre-train HP-S and Full-S with Hierarchical Rule because they both use the hierarchical policy. In the RL stage, all neural dialogue systems are trained for 300 epochs. When testing, we repeat 10 epochs and take the average.

5.4 Test Performance

We compare Full-S with the baselines in terms of two metrics: recognition accuracy and average turns.

Fig. 5 shows the test performance. We can see that the accuracy of Flat Rule is lower than that of Hierarchical Rule, and the accuracy of the data-driven counterpart of Flat Rule (MP-S) is just slightly


[Figure: dev-set accuracy curves of Full-S, HP-S and MP-S over simulation epochs.]

Figure 6: Accuracy curves of the different neural models on the dev set. The first "20 epochs" indicates the pre-training stage. The last "300 epochs" indicates the RL stage.

higher than randomly guessing. It means that using the hierarchical policy to unfold dialogues is necessary for our task. Besides, HP-S achieves a higher accuracy than its rule-based counterpart (Hierarchical Rule) within much fewer turns. It proves that the data-driven system is more efficient than the rule-based system. Finally, equipped with message passing and the hierarchical policy, Full-S achieves the best accuracy. And it is interesting to note that Full-S requires more turns but achieves much higher accuracy than HP-S. One possible reason is that HP-S may easily get trapped in a local optimum without message passing to infer dialogue states.

5.5 Ablation Study

To study the effect of message passing and the hierarchical policy, we show the learning curves of the three neural dialogue systems in Fig. 6. Each learning curve is averaged over 10 epochs.

We find that, compared with Full-S and HP-S, MP-S is unable to learn any useful dialogue policy during training. There are two reasons for this. First, the action space of the flat policy is too large, so MP-S suffers from sparse reward and long horizon issues. Second, without explicitly modeling the logical relation between the manager and workers, MP-S is prone to errors. Besides, we can see that the convergence speed of Full-S is faster than that of HP-S in both the pre-training and the RL stages. This is because message passing can model the structured information of the KG, and hence Full-S is more efficient in policy learning.

5.6 Manager’s Policy Analysis

To better understand the high-level dialogue policy, we analyze the manager's behavior in Full-S.

Figure 7: Manager's action probability curves. Each curve shows the probability of selecting a piece of personal information ("School", "Company", "Residence", or "BirthPlace") to verify at each of the manager's decision steps, averaged over all dialogues during testing.

First, we show the manager's action probability curves in Fig. 7. We can see that, in the first decision step, selecting "School" and "Company" to verify personal information takes priority over "Residence" and "BirthPlace". In the following two decision steps, the probabilities of selecting "Residence" and "BirthPlace" increase. This is because simulated applicants tend to forge personal information about "School" and "Company" for a larger loan. Consequently, to discover fake information faster, the manager learns to prioritize the different information items.

Second, intuitively, the manager's policy should follow two logical rules in our task:

Rule 1: If any worker's decision is "Fraud" (Cond1), the dialogue should end immediately and the manager's decision should be "Fraud" (RS1).

Rule 2: If all workers' decisions are "Non-Fraud" (Cond2), the manager's decision should be "Non-Fraud" (RS2).

To test whether the manager follows the two rules, we calculate the probabilities of RS1 and RS2 under Cond1 and Cond2, respectively. On the test data, p(RS1|Cond1) = 0.95 and p(RS2|Cond2) = 0.96, which shows that the manager adopts the workers' suggestions in most situations.
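The rule-consistency check above can be sketched as follows; the log format and field names are illustrative assumptions, not our actual implementation.

```python
# Hypothetical test logs: each record holds the workers' decisions and
# the manager's final decision for one dialogue.
logs = [
    {"workers": ["Fraud", "Non-Fraud"], "manager": "Fraud"},
    {"workers": ["Non-Fraud", "Non-Fraud"], "manager": "Non-Fraud"},
    {"workers": ["Fraud", "Fraud"], "manager": "Non-Fraud"},  # rule violated
]

def rule_consistency(logs):
    """Return p(RS1|Cond1) and p(RS2|Cond2) over the logs."""
    cond1 = [d for d in logs if "Fraud" in d["workers"]]
    cond2 = [d for d in logs if all(w == "Non-Fraud" for w in d["workers"])]
    p_rs1 = sum(d["manager"] == "Fraud" for d in cond1) / len(cond1)
    p_rs2 = sum(d["manager"] == "Non-Fraud" for d in cond2) / len(cond2)
    return p_rs1, p_rs2

print(rule_consistency(logs))  # (0.5, 1.0) on the toy logs above
```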

Meanwhile, we study the cases where the manager does not follow the two rules and find some interesting phenomena. Specifically, if only one worker's decision is "Fraud" but the applicant can answer a few of the questions given by this worker, the manager's decision may be "Non-Fraud". Conversely, if all workers' decisions are "Non-Fraud" but the applicant cannot answer most of the questions given


All triplets related to "Shanghai Sports University" (abbreviated as $School$):

($School$, SuperMarket, Educational Supermarket)
($School$, PetMarket, Seasons Garden)
($School$, LocatedIn, Shanghai)
($School$, FoundedDate, 2002)
($School$, DigitalMall, JinLu Security)
($School$, FruitShop, Xiao Liu Fruit)
($School$, ConvenienceStore, HaoDe)
(Xiao Liu Fruit, ConvenienceStore, HaoDe)
(HaoDe, FruitShop, Xiao Liu Fruit)

HP-S:
  System: Which is the nearest pet market to $School$?
  Applicant: I am not quite clear.
  System: Which is the nearest digital mall to $School$?
  Applicant: I am not quite clear.
  System: Where is $School$ located?
  Applicant: Shanghai.
  Decision: Fraud (Wrong)

Full-S:
  System: Which is the nearest pet market to $School$?
  Applicant: I am not quite clear.
  System: Which is the nearest digital mall to $School$?
  Applicant: I am not quite clear.
  System: Which is the nearest fruit shop to $School$?
  Applicant: Xiao Liu Fruit.
  System: Which is the nearest supermarket to $School$?
  Applicant: Educational Supermarket.
  System: When was $School$ founded?
  Applicant: 2002.
  Decision: Non-Fraud (Correct)

Table 1: Examples of the low-level policies in two systems. Note that the information about “School” is not fake.

by one worker, the manager's decision may still be "Fraud". In fact, when these two cases happen, the worker may have made a wrong decision, yet the manager can still give the correct one. This means the manager is robust to the workers' mistakes.

5.7 Worker’s Policy Analysis

To better understand the low-level dialogue policy and the effect of message passing on it, we compare the workers' behaviors in HP-S and Full-S.

Table 1 shows an example of verifying personal information about "School" in HP-S and Full-S. The two systems ask the same two questions in the first two turns, because the triplets behind these questions are rarely known to fraudsters; the low-level policies thus learn to prioritize such triplets to better distinguish fraudsters from normal applicants. In the third turn, HP-S asks a question that is easy for fraudsters to answer and makes the wrong decision. Full-S, however, notices that the applicant has given the correct answer to a question that is hard for fraudsters to answer, so it does not decide in haste but continues the dialogue. Besides, it is worth noting that Full-S does not choose ($School$, ConvenienceStore, HaoDe) to generate a derived question. This is because the message passing mechanism models the relation between "HaoDe" and "Xiao Liu Fruit": since the two entities are closely related, an applicant who knows "Xiao Liu Fruit" is likely to know "HaoDe" as well, so there is no need to select this triplet.
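The intuition behind skipping the "HaoDe" triplet can be sketched as a single message-passing step over the small KG above: once a triplet about one entity is verified, neighboring entities inherit part of that evidence. The graph, scores, and damping factor are illustrative assumptions, not our actual model.

```python
from collections import defaultdict

# Undirected adjacency built from the KG triplets around $School$.
edges = [
    ("$School$", "Xiao Liu Fruit"),
    ("$School$", "HaoDe"),
    ("Xiao Liu Fruit", "HaoDe"),  # the two entities reference each other
]
neighbors = defaultdict(set)
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

def propagate(scores, damping=0.5):
    """One message-passing step: each node mixes in its neighbors' scores."""
    new = {}
    for node, nbrs in neighbors.items():
        msg = sum(scores.get(n, 0.0) for n in nbrs) / len(nbrs)
        new[node] = scores.get(node, 0.0) + damping * msg
    return new

# The applicant correctly answered the "Xiao Liu Fruit" question.
scores = {"Xiao Liu Fruit": 1.0}
scores = propagate(scores)
# "HaoDe" now carries evidence, so the triplet about it can be skipped.
print(scores["HaoDe"])  # 0.25 with these toy numbers
```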

6 Related work

As far as we know, there is no published work on detecting identity fraud via interactions. We describe the two most related research directions as follows:

Deception Detection. Detecting deception is a longstanding research goal in many artificial intelligence topics. Existing work has mainly focused on extracting useful features from non-verbal behaviors (Meservy et al., 2005; Lu et al., 2005; Bhaskaran et al., 2011), speech cues (Levitan et al., 2018; Graciarena et al., 2006), or both (Krishnamurthy et al., 2018; Perez-Rosas et al., 2015) to train a classification model. In that work, deception is defined as telling a lie. Moreover, existing work requires labeled data, which is often hard to obtain. In contrast, we focus on detecting identity fraud through multi-turn interactions and use reinforcement learning to explore the anti-fraud policy without any labeled data.

Dialogue System. Our work is also related to task-oriented dialogue systems (Young et al., 2013; Wen et al., 2017; Li et al., 2017; Gasic et al., 2011; Wang et al., 2018, 2019). Existing systems have mainly focused on slot-filling tasks (e.g., booking a hotel), in which a set of system actions can be pre-defined based on the business logic and slots. In contrast, the system actions in our task select nodes in the KG to generate questions, so the structured information is important. Besides, some works also try to model structured information in dialogue systems. For example, Peng et al. (2017) used hierarchical reinforcement learning (Vezhnevets et al., 2017; Kulkarni et al., 2016; Florensa et al., 2017) to design multi-domain dialogue management. Chen et al. (2018) used graph neural networks (Battaglia et al., 2018; Li et al., 2015; Scarselli et al., 2009; Niepert et al., 2016) to improve the sample efficiency of reinforcement learning. He et al. (2017) used DynoNet to incorporate structured information in a collaborative dialogue setting. Compared with them, our method combines graph neural networks with hierarchical reinforcement learning, and experiments show that both contribute to this novel dialogue task.

7 Conclusion

This paper proposes to detect identity fraud automatically via dialogue interactions. To achieve this goal, we present structured dialogue management that explores anti-fraud dialogue strategies over a KG with reinforcement learning, together with a heuristic user simulator to evaluate our systems. Experiments have shown that end-to-end systems outperform rule-based systems, and that the proposed dialogue management can learn interpretable and flexible dialogue strategies to detect identity fraud more efficiently. We believe this work is a first step in a promising research direction and will help promote many real-world applications.

8 Acknowledgments

The research work described in this paper has been supported by the National Key Research and Development Program of China under Grant No. 2017YFB1002103. We would like to thank Shaonan Wang, Yang Zhao, Haitao Lin, Cong Ma, Lu Xiang and Junnan Zhu for their suggestions on this paper, and Fenglv Lin for his help with the POI dataset construction.

References

Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. 2018. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261.

Nisha Bhaskaran, Ifeoma Nwogu, Mark G. Frank, and Venu Govindaraju. 2011. Lie to me: Deceit detection via online behavioral learning. In Face and Gesture 2011, pages 24–29. IEEE.

Lu Chen, Bowen Tan, Sishan Long, and Kai Yu. 2018. Structured dialogue policy with graph neural networks. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1257–1268.

Carlos Florensa, Yan Duan, and Pieter Abbeel. 2017. Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012.

Milica Gasic, Filip Jurcicek, Blaise Thomson, Kai Yu, and Steve Young. 2011. On-line policy optimisation of spoken dialogue systems via live interaction with human subjects. In 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, pages 312–317. IEEE.

Kallirroi Georgila, James Henderson, and Oliver Lemon. 2006. User simulation for spoken dialogue systems: Learning and evaluation. In Ninth International Conference on Spoken Language Processing.

Martin Graciarena, Elizabeth Shriberg, Andreas Stolcke, Frank Enos, Julia Hirschberg, and Sachin Kajarekar. 2006. Combining prosodic lexical and cepstral systems for deceptive speech detection. In 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, volume 1, pages I–I. IEEE.

He He, Anusha Balakrishnan, Mihail Eric, and Percy Liang. 2017. Learning symmetric collaborative dialogue agents with dynamic knowledge graph embeddings. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1766–1776, Vancouver, Canada. Association for Computational Linguistics.

Guoliang Ji, Kang Liu, Shizhu He, and Jun Zhao. 2016. Knowledge graph completion with adaptive sparse transfer matrix. In Thirtieth AAAI Conference on Artificial Intelligence.

Gangeshwar Krishnamurthy, Navonil Majumder, Soujanya Poria, and Erik Cambria. 2018. A deep learning approach for multimodal deception detection. arXiv preprint arXiv:1803.00344.

Tejas D. Kulkarni, Karthik Narasimhan, Ardavan Saeedi, and Josh Tenenbaum. 2016. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In Advances in Neural Information Processing Systems, pages 3675–3683.

Sarah Ita Levitan, Angel Maredia, and Julia Hirschberg. 2018. Acoustic-prosodic indicators of deception and trust in interview dialogues. Proc. Interspeech 2018, pages 416–420.

Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, and Asli Celikyilmaz. 2017. End-to-end task-completion neural dialogue systems. arXiv preprint arXiv:1703.01008.

Xiujun Li, Zachary C. Lipton, Bhuwan Dhingra, Lihong Li, Jianfeng Gao, and Yun-Nung Chen. 2016. A user simulator for task-completion dialogues. arXiv preprint arXiv:1612.05688.

Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493.

Shan Lu, Gabriel Tsechpenakis, Dimitris N. Metaxas, Matthew L. Jensen, and John Kruse. 2005. Blob analysis of the head and hands: A method for deception detection. In Proceedings of the 38th Annual Hawaii International Conference on System Sciences, pages 20c–20c. IEEE.

Thomas O. Meservy, Matthew L. Jensen, John Kruse, Judee K. Burgoon, Jay F. Nunamaker, Douglas P. Twitchell, Gabriel Tsechpenakis, and Dimitris N. Metaxas. 2005. Deception detection through automatic, unobtrusive analysis of nonverbal behavior. IEEE Intelligent Systems, 20(5):36–43.

Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. 2016. Learning convolutional neural networks for graphs. In International Conference on Machine Learning, pages 2014–2023.

Baolin Peng, Xiujun Li, Lihong Li, Jianfeng Gao, Asli Celikyilmaz, Sungjin Lee, and Kam-Fai Wong. 2017. Composite task-completion dialogue policy learning via hierarchical deep reinforcement learning. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2231–2240.

Veronica Perez-Rosas, Mohamed Abouelenien, Rada Mihalcea, Yao Xiao, CJ Linton, and Mihai Burzo. 2015. Verbal and nonverbal clues for real-life deception detection. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2336–2346.

Olivier Pietquin and Thierry Dutoit. 2006. A probabilistic framework for dialog simulation and optimal strategy learning. IEEE Transactions on Audio, Speech, and Language Processing, 14(2):589–599.

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80.

Theo Trouillon, Christopher R. Dance, Eric Gaussier, Johannes Welbl, Sebastian Riedel, and Guillaume Bouchard. 2017. Knowledge graph completion via complex tensor factorization. The Journal of Machine Learning Research, 18(1):4735–4772.

Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu. 2017. FeUdal networks for hierarchical reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 3540–3549. JMLR.org.

Weikang Wang, Jiajun Zhang, Qian Li, Mei-Yuh Hwang, Chengqing Zong, and Zhifei Li. 2019. Incremental learning from scratch for task-oriented dialogue systems. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3710–3720, Florence, Italy. Association for Computational Linguistics.

Weikang Wang, Jiajun Zhang, Han Zhang, Mei-Yuh Hwang, Chengqing Zong, and Zhifei Li. 2018. A teacher-student framework for maintainable dialog manager. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3803–3812, Brussels, Belgium. Association for Computational Linguistics.

Tsung-Hsien Wen, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young. 2017. A network-based end-to-end trainable task-oriented dialogue system. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 438–449.

Jason D. Williams, Kavosh Asadi, and Geoffrey Zweig. 2017. Hybrid code networks: Practical and efficient end-to-end dialog control with supervised and reinforcement learning. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 665–677.

Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256.

Steve Young, Milica Gasic, Blaise Thomson, and Jason D. Williams. 2013. POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE, 101(5):1160–1179.