1
Yuxiao Dong*, Jie Tang$, Tiancheng Lou#, Bin Wu&, Nitesh V. Chawla*
How Long will She Call Me? Distribution, Social Theory and Duration Prediction
*University of Notre Dame, $Tsinghua University, #Google Inc., &Beijing U. of Posts & Telecoms
Yuxiao Dong, Jie Tang, Tiancheng Lou, Bin Wu, Nitesh V. Chawla. How Long will She Call Me? Distribution, Social Theory and Duration Prediction. In ECML/PKDD’13.
2
Outline
Motivation
Dynamic Distribution on Duration
Social Theory on Duration
Duration Prediction
Conclusion
3
Motivation
Mobile calls between humans are ubiquitous at any time …
91% of American adults had a mobile phone as of May 2013 [1].
Mobile users can’t leave their phone alone for 6 minutes and check it up to 150 times a day [2].
People make, receive or avoid 22 phone calls every day [2].
1. Pew Internet: Mobile Reports. June 6, 2013. http://pewinternet.org/Commentary/2012/February/Pew-Internet-Mobile.aspx
2. Tomi Ahonen. Communities Dominate Brands. http://communities-dominate.blogs.com/
4
Duration Macro-Distribution
Double Pareto lognormal distribution (DPLN) [1].
Truncated log-logistic distribution (TLAC) [2].
1. M. Seshadri, A. Sridharan, J. Bolot, C. Faloutsos and J. Leskovec. Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions. In KDD’08.
2. P. Melo, L. Akoglu, C. Faloutsos and A. Loureiro. Surprising Patterns for the Call Duration Distribution of Mobile Phone Users. In PKDD’10.
5
Mobile Data
Call Detail Records (CDR): 3.9 million CDRs; 2 months (Dec. 2007 & Jan. 2008); collected outside America.
Mobile Network: 272,345 users and 521,925 call edges.
Pareto Principle: 20% of user pairs produce 80% of the calls.
One week of data is available at http://arnetminer.org/mobile-duration
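The Pareto observation above can be checked with a few lines of code. The following is a minimal sketch, assuming the CDRs have been reduced to (caller, callee) tuples; the function name `top_pair_call_share` and the toy records are hypothetical and not taken from the dataset.

```python
from collections import Counter

def top_pair_call_share(calls, top_fraction=0.2):
    """Share of all calls produced by the most active user pairs.

    `calls` is an iterable of (caller, callee) tuples; direction is ignored.
    """
    pair_counts = Counter(frozenset(pair) for pair in calls)
    counts = sorted(pair_counts.values(), reverse=True)
    if not counts:
        return 0.0
    # Number of pairs that make up the top fraction (at least one pair).
    k = max(1, round(len(counts) * top_fraction))
    return sum(counts[:k]) / sum(counts)

# Hypothetical toy records: one very active pair and four occasional pairs.
toy_calls = [("a", "b")] * 16 + [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
share = top_pair_call_share(toy_calls)
```

In this toy input the single most active pair (20% of the five pairs) accounts for 16 of the 20 calls.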
8
Roadmap
Existing Macro-Distribution: DPLN distribution; TLAC distribution.
Dynamic Dist. on Duration: Temporal distribution; Demographics distribution.
Social Theory on Duration: Strong/weak tie; Homophily; Opinion leader; Social balance.
Duration Prediction: Dynamic factors; Social factors.
[1]
1. V. Palchykov, K. Kaski, J. Kertész, A.-L. Barabási and R.I.M. Dunbar. Sex differences in intimate relationships. Scientific Reports 2:370, 2012.
10
Periodicity
Periodic patterns for mobile call duration:
Working time (8:00AM-7:00PM): 75 seconds on average;
Evening (7:00PM-12:00AM): increasing to 150 seconds at midnight;
Early morning (12:00AM-8:00AM): decreasing to 50 seconds.
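A periodic pattern like the one above could be measured by bucketing calls by hour of day. This is a minimal sketch, assuming each call is reduced to an (hour, duration in seconds) pair; the function names and the toy values are illustrative assumptions, not the authors' code or data.

```python
from collections import defaultdict

def period_of(hour):
    """Map an hour of day (0-23) to the three periods named on the slide."""
    if 8 <= hour < 19:
        return "working time"
    if hour >= 19:
        return "evening"
    return "early morning"

def mean_duration_by_period(calls):
    """`calls` is an iterable of (hour, duration_seconds) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # period -> [sum of durations, call count]
    for hour, duration in calls:
        bucket = totals[period_of(hour)]
        bucket[0] += duration
        bucket[1] += 1
    return {period: total / n for period, (total, n) in totals.items()}

# Hypothetical toy calls chosen to mirror the averages quoted on the slide.
toy = [(9, 70), (14, 80), (22, 150), (3, 50)]
means = mean_duration_by_period(toy)
```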
11
Demographics
Call Duration vs. Demographics:
Females make longer calls than males;
Calls between two females are longer than calls between two males;
Calls from a male to a female are longer than calls from a female to a male;
Younger users make longer calls.
13
Social Theory
Strong/weak tie: How long do people with a strong or weak tie call?
Link homophily: Do similar users tend to call each other with long or short duration?
Opinion leader: How different are the calling behaviours between opinion leaders and ordinary users?
Social balance: How does the duration-based network satisfy social balance theory?
15
Strong/Weak Tie
Using the number of calls (#calls) to measure the tie strength between two users.
Call Duration vs. Social Tie: The stronger the tie, the shorter the calls. A call is shorter than 60s with 80% probability if the two users call each other 1000 times in two months. This differs from the online instant messaging network [2].
Probability that the call is < 60s.
[1]
1. http://www.thomashutter.com/index.php/2012/01/facebook-die-rolle-von-social-networks-in-der-informationsverbreitung/
2. Jure Leskovec and Eric Horvitz. Planetary-Scale Views on a Large Instant-Messaging Network. In WWW’08.
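The relation between tie strength and short calls can be tabulated directly from call records. The sketch below assumes records of the form (caller, callee, duration in seconds), treats ties as undirected, and groups pairs by their exact number of calls; these choices and the function name are assumptions made for illustration only.

```python
from collections import defaultdict

def short_call_rate_by_tie_strength(calls, threshold=60):
    """Return {tie strength (#calls of a pair): fraction of calls shorter than threshold}."""
    per_pair = defaultdict(list)
    for caller, callee, duration in calls:
        per_pair[frozenset((caller, callee))].append(duration)

    short = defaultdict(int)
    total = defaultdict(int)
    for durations in per_pair.values():
        strength = len(durations)  # tie strength = number of calls between the pair
        total[strength] += strength
        short[strength] += sum(d < threshold for d in durations)
    return {s: short[s] / total[s] for s in total}

# Hypothetical toy records: one pair with four calls, one pair with a single call.
toy = [("a", "b", 30), ("b", "a", 45), ("a", "b", 20), ("a", "b", 90), ("c", "d", 200)]
rates = short_call_rate_by_tie_strength(toy)
```

On real data one would likely bin the tie strength (for example logarithmically) instead of using exact counts.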
17
Link Homophily
Using the number of common neighbours (#common neighbours) between two users to measure homophily.
Call Duration vs. Link Homophily: The more common neighbours, the shorter the calls. A call is shorter than 60s with 80% probability if the two users have more than 30 common neighbours.
Call Duration vs. Social Tie + Link Homophily: The more homophily and the stronger the tie, the shorter the calls.
Probability that the call is < 60s.
[1]
1. Lilian Weng, Filippo Menczer, Yong-Yeol Ahn. Virality Prediction and Community Structure in Social Networks. Scientific Reports. Aug. 2013.
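Counting common neighbours only needs an adjacency structure over the call graph. A minimal sketch, assuming an undirected edge list of user pairs; the helper names and the toy graph are hypothetical.

```python
from collections import defaultdict

def build_neighbors(edges):
    """Build an undirected adjacency map from (u, v) call edges."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return neighbors

def common_neighbors(neighbors, u, v):
    """Number of users that both u and v have called or been called by."""
    return len(neighbors.get(u, set()) & neighbors.get(v, set()))

# Hypothetical toy call graph.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
nbrs = build_neighbors(toy_edges)
cn_ab = common_neighbors(nbrs, "a", "b")
cn_ae = common_neighbors(nbrs, "a", "e")
```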
19
Opinion Leader
Using PageRank to mine the top 1% of users as opinion leaders (OL) in the mobile call network; the others are ordinary users (OU).
Call Duration vs. Opinion Leader: OLs make shorter calls in general; the probability that an OL’s call is < 60s is about 80%. Calls between two OLs are shorter.
Probability that the call is < 60s.
[1]
1. Katz, E. The two-step flow of communication: an up-to-date report of an hypothesis. In: Enis, Cox (eds.) Marketing Classics, 1973.
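The opinion-leader selection can be illustrated with a plain power-iteration PageRank. This is a sketch under stated assumptions: edges point from caller to callee, the damping factor is 0.85, and dangling users spread their rank uniformly. The slides do not state these details, so none of them should be read as the authors' settings.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {user: [users they call]}."""
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling user (calls nobody): spread the rank over all users.
                share = damping * rank[u] / n
                for v in nodes:
                    new_rank[v] += share
        rank = new_rank
    return rank

def opinion_leaders(rank, top_fraction=0.01):
    """Return the top fraction of users by rank (at least one user)."""
    k = max(1, int(len(rank) * top_fraction))
    return set(sorted(rank, key=rank.get, reverse=True)[:k])

# Hypothetical toy network in which everyone calls "hub".
toy_links = {"u1": ["hub"], "u2": ["hub"], "u3": ["hub", "u1"], "hub": []}
ranks = pagerank(toy_links)
leaders = opinion_leaders(ranks)
```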
21
Social Balance
Structural balance: all three users are friends, or only one pair of them are friends.
Assume two users are friends if they call each other at least once.
Relationship balance: the balance rate is the percentage of triangles with an even number of negative ties.
Assume a tie is negative based on #calls or the average duration between two nodes.
Call Duration vs. Social Balance: the network is unbalanced under structural balance and balanced under relationship balance.
< 20%: not balanced.
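The relationship balance rate defined above (share of triangles with an even number of negative ties) can be computed as follows. The sketch takes the tie signs as given, because the slide does not state the exact #calls or duration threshold used to mark a tie as negative; the function name and the toy signs are assumptions. The brute-force triangle enumeration is only suitable for small examples.

```python
from itertools import combinations

def balance_rate(signs):
    """Fraction of triangles with an even number of negative ties.

    `signs` maps frozenset({u, v}) to +1 (positive tie) or -1 (negative tie).
    """
    nodes = sorted({u for pair in signs for u in pair})
    triangles = 0
    balanced = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in signs for e in edges):
            triangles += 1
            negatives = sum(signs[e] < 0 for e in edges)
            if negatives % 2 == 0:
                balanced += 1
    return balanced / triangles if triangles else 0.0

# Hypothetical signed ties on four fully connected users.
toy_signs = {
    frozenset(("a", "b")): 1, frozenset(("b", "c")): 1, frozenset(("a", "c")): 1,
    frozenset(("b", "d")): -1, frozenset(("c", "d")): -1, frozenset(("a", "d")): 1,
}
rate = balance_rate(toy_signs)
```

Here triangles a-b-c (no negative tie) and b-c-d (two negative ties) count as balanced, while a-b-d and a-c-d each have exactly one negative tie.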
25
Prediction Scenario
[Figure: a call network of five users (v1-v5) shown at Time 1, Time 2 and Time 3; each edge is labeled with its call duration in seconds.]
Can we predict how long this call lasts for?
Attribute factors: v1: female, 29y; v2: male, 31y; v3: male, 60y; v4: female, 63y; v5: female, 27y.
Social factors: Opinion leader: v5; Strong tie: v4, v5; Weak tie: v1, v3; Homophily: v3, v5; Social balance: v3, v4, v5.
Temporal factors: v5 calls v3 on Mon. 10:00PM.
27
Social Time-dependent Factor Graph (STFG)
PFG: partially labeled factor graph[1]
TRFG: social triad based factor graph[2]
STFG: partially labeled + social triad + time dependent
1. W. Tang, H. Zhuang and J. Tang. Learning to infer social ties in large networks. In ECML/PKDD’11.
2. J. Hopcroft, T. Lou and J. Tang. Who will follow you back? Reciprocal relationship prediction. In CIKM’11.
30
Social Time-dependent FG
Joint distribution over attribute, social and temporal factors:
Attribute factor:
Social factor:
Temporal factor:
Exponential-linear functions to initialize factors.
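The formulas for this slide are not preserved in the transcript. As a rough orientation only, a factor graph with attribute, social (triad) and temporal factors typically has a joint distribution of the following generic shape, with each factor instantiated as an exponential-linear function. Every symbol here (f, g, h, Φ, α, Z, the index sets) is an assumption made for illustration and may differ from the notation and definitions in the paper.

```latex
% Generic sketch, not the paper's exact formulation.
P(Y \mid G) = \frac{1}{Z}
  \prod_{i} f(x_i, y_i)
  \prod_{(i,j,k)} g(y_i, y_j, y_k)
  \prod_{i} h\left(y_i^{t-1}, y_i^{t}\right),
\qquad
f(x_i, y_i) = \exp\left\{ \alpha^{\top} \Phi(x_i, y_i) \right\}
```

Here Z is the normalization constant, and g and h would be defined analogously to f with their own weight vectors and feature functions.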
33
Learning Algorithm
Gradient descent method.
Using Loopy Belief Propagation to compute the expectation.
1. J. Hopcroft, T. Lou and J. Tang. Who will follow you back? Reciprocal relationship prediction. In CIKM’11.
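For exponential-linear models, the gradient of the log-likelihood with respect to a weight is the observed feature value minus its expected value under the current model. The sketch below shows that update on a tiny log-linear binary classifier where the expectation can be computed exactly. This is a deliberate simplification: in a loopy factor graph such as STFG the expectation is approximated with Loopy Belief Propagation instead. The function names, learning rate and toy data are assumptions.

```python
import math

def predict(w, x):
    """P(y = 1 | x) for a log-linear model with weights w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_log_linear(samples, labels, lr=0.5, epochs=200):
    """Gradient ascent on the average log-likelihood."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(samples, labels):
            p = predict(w, x)
            # Observed feature value (y * x_j) minus its model expectation (p * x_j).
            for j, xj in enumerate(x):
                grad[j] += (y - p) * xj
        w = [wi + lr * gj / len(samples) for wi, gj in zip(w, grad)]
    return w

# Hypothetical toy data: a constant bias feature plus one centered feature.
toy_x = [[1.0, -1.5], [1.0, -0.5], [1.0, 0.5], [1.0, 1.5]]
toy_y = [0, 0, 1, 1]
weights = train_log_linear(toy_x, toy_y)
```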
36
Experimental Setup
Prediction
Case 1: predict the duration of the next call in the future.
Case 2: predict the average duration of calls in a future period.
Data
The first 7 weeks of CDR data serve as historic data.
Case 1: the duration of the 1st call in the 8th week is the next-call target.
Case 2: the average duration in the 8th week is the next-average target.
Binary Prediction
60% of calls are shorter than 60 seconds and the remaining 40% are longer than 60 seconds.
There is a jump in the telephone bill when a call reaches 1 minute.
The threshold is set to 60 seconds to classify calls as long or short in this work.
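The setup above can be turned into labeled examples roughly as follows. This is a minimal sketch, assuming each record is a (pair id, week, timestamp, duration in seconds) tuple with 1-based weeks, and assuming a call of exactly 60 seconds counts as long (the slide does not say which side the boundary falls on). The function name and the toy records are hypothetical.

```python
from collections import defaultdict

def build_targets(calls, history_weeks=7, threshold=60):
    """Split calls into history and binary targets (1 = long, 0 = short).

    Returns (history, case1, case2), where case1 labels the first call of the
    test week for each pair and case2 labels the pair's average duration in it.
    """
    history = []
    test_durations = defaultdict(list)
    for pair, week, timestamp, duration in sorted(calls, key=lambda c: c[2]):
        if week <= history_weeks:
            history.append((pair, week, timestamp, duration))
        elif week == history_weeks + 1:
            test_durations[pair].append(duration)
    case1 = {p: int(d[0] >= threshold) for p, d in test_durations.items()}
    case2 = {p: int(sum(d) / len(d) >= threshold) for p, d in test_durations.items()}
    return history, case1, case2

# Hypothetical toy records.
toy = [
    ("a-b", 1, 10, 30), ("a-b", 8, 800, 40), ("a-b", 8, 810, 120),
    ("c-d", 7, 700, 90), ("c-d", 8, 805, 61),
]
history, case1, case2 = build_targets(toy)
```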
37
Experimental Setup (Cont.)
Baseline Predictors
SVM: support vector machine by SVM-light.
LRC: logistic regression in Weka.
Bnet: Bayes network.
CRF: conditional random field.
Evaluation
Precision / Recall / F1-Measure
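The evaluation metrics can be computed from binary predictions as in the sketch below. Which class is treated as positive is an assumption here (label 1); the slides do not state it.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the chosen positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: two true positives, one false positive, one false negative.
p, r, f1 = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
```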
39
Factor Contribution
G: gender; A: age
B: social balance; T: social tie
H: homophily; O: opinion leader
W: week; D: day
41
Conclusion & Future Work
Conclusions:
Social theories and dynamic distributions are clearly present in the duration network;
Our proposed model can significantly improve the prediction accuracy.
Interesting observations:
Young females tend to make long calls, in particular in the evening;
Familiar people (more calls and more common neighbours) make shorter calls.
Future work:
Inferring call duration with a regression model.
Building duration prediction into a mobile application.