jaideep
TRANSCRIPT
![Page 1: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/1.jpg)
4/9/2013 University of Minnesota 1
Social AnalyticsMining Behaviors of a Connected
World
PAKDD SchoolApril 11, 2013
Sydney, Ausytralia
Jaideep Srivastava
University of Minnesota
![Page 2: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/2.jpg)
4/9/2013 University of Minnesota 2
Course Outline
• Module 1• Introduction to Social Analytics – applying data mining to social computing
systems; examples of a number of social computing systems, e.g.
FaceBook, MMO games, etc.
• Module 2 • Computational online trust
• Identifying key influencers
• Information flow in networks
• Module 3• Analysis of clandestine networks
• Katana - game analytics engine
![Page 3: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/3.jpg)
Part I: Background
![Page 4: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/4.jpg)
Social Network Analysis
Social science networks have widespread application in various fields
Most of the analyses techniques have come from Sociology, Statistics and Mathematics
See (Wasserman and Faust, 1994) for a comprehensive introduction to social network analysis
12/02/06 IEEE ICDM 2006 4
Epidemiology
Sociology
Organizational
Theory
Social
PsychologyAnthropology
Socio-Cognitive
Networks
Cognitive
Knowledge
Networks
Social
Networks
Knowledge
Networks
Perception
Reality
Acquaintance
(links)
Knowledge
(content)Epidemiology
Sociology
Organizational
Theory
Social
PsychologyAnthropology
Socio-Cognitive
Networks
Cognitive
Knowledge
Networks
Social
Networks
Knowledge
Networks
Perception
Reality
Acquaintance
(links)
Knowledge
(content)
Socio-Cognitive
Networks
Cognitive
Knowledge
Networks
Social
Networks
Knowledge
Networks
Socio-Cognitive
Networks
Cognitive
Knowledge
Networks
Social
Networks
Knowledge
Networks
Perception
Reality
Acquaintance
(links)
Knowledge
(content)
![Page 5: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/5.jpg)
What have been it‟s key scientific successes?
In classical social sciences numerous results
„Six degree of separation‟ [Milgram]
Popularized by the „Kevin Bacon game‟
„The strength of weak ties‟ [Granovetter]
„Online networks as social networks‟ [Wellman, Krackhardt]
„Dunbar Number‟
Various types of centrality measures
Etc.
In the Web era
„The Bow-Tie model of the Web‟ [Raghavan]
„Preferential attachment model‟ [Barabasi] (Yes and No)
„Powerlaw of degree distribution‟ [Lots of people] (NO!)
Etc.
![Page 6: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/6.jpg)
Application successes
Numerous in social sciences
Google – PageRank
LinkedIn – expanding your Cognitive Social Network
making you aware that „you‟re more connected and closer than you
think you are‟
Expertise discovery in organizations
Knowledge experts, „authorities‟
Well-connected individuals, „hubs‟
Rapid-response teams in emergency management
Information flow in organizations
Twitter – real time information dissemination
Etc.
![Page 7: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/7.jpg)
Online (Multiplayer) Games
How enagaging
How
social
Low
Low
High
High
![Page 8: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/8.jpg)
Player Behavior & Revenue Model
Blizzard (subscription)
World of Warcraft
12 million subscribers
Revenue model
$15/month
Approx $3billion annual
revenue
4 hours a day, 7 days a
week!
Zynga (free2play)
Farmville, Fishville, Mafia
Wars, etc.
180 million players
Revenue model
Virtual goods
$700 million in 2010
0.5 hrs a day, 7 days a week
Hard core gamers Everyone
Less socially acceptable More socially acceptable
Like Cocaine Like Caffeine
![Page 9: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/9.jpg)
Implications of this „addiction‟
3 billion hours a week are being spent playing online
games
Jane McGonigal in “Reality is Broken”
Labor economics
What is the impact of so much labor being removed from the pool
[Castranova]
Entertainment economics
If MMO players can get 100 hrs/month of entertainment by spending
$25 or so, what will happen to other entertainment industries?
Psychological/Sociological
Is it an addiction – the prevailing view (Chinese government‟s „detox
centers‟ for kids)
Are they fulfilling a deeper need that real world is not (McGonigal)
Societal
A trend far too important to not be taken seriously!
![Page 10: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/10.jpg)
Business Example
![Page 11: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/11.jpg)
Levis‟ – Example of Social Retail
Levis‟ leverages its brand to ensure customers provide their social
network
Levis‟ can leverage predictive social analytics technology to understand
the value of the customer‟s social network
11
![Page 12: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/12.jpg)
Ninja Metrics confidential information. Copyright 2012
Opportunity, Innovation, Impact
Companies do not understand the social graph of their customers
It‟s not just about how they relate to their customers, but also about
how customers relate to each other
Understanding these relationships unlocks immense value
Innovation: Understanding the social network of customers
Key influencers, relationship strength, …
Impact: Deriving actionable insights from this understanding
Customer acquisition, retention, customer care, …
Social recommendation, influence-based marketing, identifying trend-setters, …
12
vs.
![Page 13: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/13.jpg)
13
Unlocking true value by product, category, or store
-$79.63
-$128.61
-$293.79
0021
![Page 14: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/14.jpg)
True Value of each customer
True value = individual value + social value
Who really matters, and to what degree
Some empirical facts
31% activity due to socialization
23% more individual + 8% more social activity
14
The individual‟s
lifetime value
their social
influence
and their true
total
![Page 15: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/15.jpg)
Impact of New Instrumentation on Science
1950s
Invention of the electron microscope fundamentally changed
chemistry from „playing with colored liquids in a lab‟ to „truly
understanding what‟s going on‟
1970s
Invention of gene sequencing fundamentally changed biology from
a qualitative field to a quantitative field
1980s
Deployment of the Hubble (and other) Space telescopes has had
fundamental impact on astronomy and astrophysics
2000s
Massive adoption is fundamentally changing social science
research
Massively Multiplayer Online Games (MMOGs) and Virtual Worlds
(VWs) are acting as „macroscopes of human behavior‟
![Page 16: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/16.jpg)
The Virtual World
Observatory (VWO) Project
• Four PIs, 30+ Post-docs, PhD and MS students, UGs, high-schoolers
• Noshir Contractor, Northwestern: Networks
• M. Scott Poole, Illinois Urbana-Champaign/NCSA: Groups
• Jaideep Srivastava, Minnesota: Computer Science
• Dmitri Williams, USC: Social Psychology
• Collaborators
• Castronova (Sociology, Indiana), Yee (Xerox PARC), Consalvo, Caplan
(Economics, Delaware), Burt (Sociology, U of Chicago), Adamic (Info Sci,
Michigan), …
• Data and technology partners
• Sony (EverQuest 2), Linden Labs (2nd Life), Bungie (Halo3), Kingsoft
(Chevalier‟s Romance), others …
• Cloudera Systems (Hadoop), Microsoft (SQL Server), Weka, …
![Page 17: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/17.jpg)
Overall Goals of the VWO Project
Sticky Social
Media
• Games
• Virtual Worlds
• Other social
apps
Analytics
• Social science
models
• New algorithms
• Ultra-scalability
tons of
data
Basic Science
• behavior, socialization, …
• novel, scalable algorithms
• NSF
Government applications
• team dynamics (Army)
• skills acq, leadership (Army)
• social influence & adolescent
health (CDC)
• etc.
Business applications
• customer churn
• game design
• anti-social behavior
• etc.
![Page 18: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/18.jpg)
Collaborators, Sponsors, Partners
Team
Financial Sponsors
Data Partners
Technology Partners
![Page 19: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/19.jpg)
Part II – Impact on Science
![Page 20: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/20.jpg)
Findings from a Player Survey
![Page 21: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/21.jpg)
Who is playing?
It is not just a bunch of kids
Average age is 31.16 (US population median is 35)
More players in their 30s than in their 20s.
![Page 22: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/22.jpg)
How much do they play?
Mean is 25.86
hours/week
Compares to
US mean of
31.5 for TV
(Hu et al,
2001)
• From prior experimental work, MMO play eats into entertainment TV and going out, not news
• So much for kids being the ones with the free time.
![Page 23: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/23.jpg)
Gender Differences More men players (78/22%)
Men played to compete , and women played to socialize
Men play more other games, but it was the women who were more
satisfied EQ2 players
Women: 29.32 hours/week
Men: 25.03 hours/week
Likelihood of quitting: “no plans to quit”:
women 48.66%, men 35.08%
Self reported play times
Women: 26.03 (3 hours less than actual)
Men: 24.10 (1 hour less than actual)
Boys and girls are socialized early on, and thus have clear role expectations for their behaviors and identities (Gender Role Theory in action!!)
![Page 24: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/24.jpg)
Playing with a partner
![Page 25: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/25.jpg)
Inferring RW gender from VW data
Goal
What virtual world behaviors and characteristics predict real world gender?
Data:
Survey Data n=7119
Survey Character Store
EQ2 Character Store
Variables
Avatar Characteristics:
Gender, Race, Class, Experience, Guild Rank, Alignment, Archetypes
Game play Behaviors:
Total Deaths & Quests, PvP Kills & Deaths, Achievement Points, Number of Characters, Time played, Communication patterns
25
![Page 26: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/26.jpg)
Gender prediction results
Close to 95% prediction accuracy Decision trees work rather well
Character Gender, Race and Class are significant predictors to real life gender.
Gender swapping is rarer, but systematically different by real gender
Players tend to choose:
character gender based on their real life gender
character races that are gendered: women play elves/men play barbarians
classes that are gendered: women play priests/men play fighters
26
![Page 27: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/27.jpg)
Gender swapping behavior
27
Game Character
Gender
Real Gender
Male Female Total
Male 4065 82.6% 98 8.2% 4163 68.0%
Female 855 17.4% 1104 91.8% 1959 32.0%
Total 4920 100.0% 1202 100.0% 6122 100.0%
Observation• Far more males gender swap than females
• Why?• Men are more creative?
• Women have less identity confusion?
• Women get their „fill of gender swapping in real life‟
![Page 28: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/28.jpg)
Economics: A test of RW VW
mapping
Do players behave in virtual worlds as we expect
them to in the actual world?
Economics is an obvious dimension to test
In the real world, perfect aggregate data are hard to
get
![Page 29: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/29.jpg)
GDP and Price Level
GDP and price levels are robust but comparatively unstable
GDP and Prices on Antonia Bayle
0
500,000
1,000,000
1,500,000
2,000,000
2,500,000
3,000,000
3,500,000
4,000,000
4,500,000
5,000,000
January February March April May
GD
P (
Go
ld)
0
20
40
60
80
100
120
140
160
Pri
ces (
Jan
uary
= 1
00)
Nominal GDP Price Level (January = 100)
![Page 30: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/30.jpg)
Money Supply and Price
The instability is explicable through the Quantity Theory of Money
a rapid influx of money . . .
. . . dramatically boosted prices
More evidence that this behaves like a real economy
Change in Money Supply and Population on Antonia Bayle
0
500
1000
1500
2000
2500
February March April May
Cha
nge
in M
oney
(000
Gol
d)
-5000
-4000
-3000
-2000
-1000
0
1000
2000
3000
4000
Cha
nge
in A
ccou
nts
Change in Money Supply (000 Gold) Change in Active Accounts
-20
-10
0
10
20
30
40
50
February March April May June
Perc
en
t C
han
ge i
n P
rice L
ev
el
Price Level on Antonia Bayle
Price Level
![Page 31: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/31.jpg)
Networks in Virtual Worlds
SONIC
Advancing the Science of Networks in Communities
![Page 32: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/32.jpg)
Why do we create and sustain
networks?
Theories of self-interest
Theories of social and resource exchange
Theories of mutual interest and collective action
Theories of contagion
Theories of balance
Theories of homophily
Theories of proximity
Theories of co-evolution
Sources: Contractor, N. S., Wasserman, S. & Faust, K. (2006). Testing multi-theoretical multilevel hypotheses
about organizational networks: An analytic framework and empirical example. Academy of Management Review.
Monge, P. R. & Contractor, N. S. (2003). Theories of Communication Networks. New York: Oxford University Press.
SONIC
Advancing the Science of Networks in Communities
![Page 33: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/33.jpg)
B F
C
A
E
D
-
+
B F
C
A
E
D
- +
B F
C
A
E
D
+
F
E
D
B
C
A
- +
Novice Expert
“Structural signatures” of Social Theories
Self interest Exchange
Collective Action
Balance
Homophily Contagion
SONIC
Advancing the Science of Networks in Communities
![Page 34: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/34.jpg)
Partnership
Trade Mail
Instant messaging
Black: maleRed: female
SONIC
Advancing the Science of Networks in Communities
![Page 35: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/35.jpg)
Social Networks
as network structure frequency
vectors in a bag-of-words model
0
0
1
1
0
0
0
4
0
1
2
3
1
0
0
5
(1)
(2)
(n)
(1) (2) (n)
0
0
0
0
0
0
0
2
Cluster Structure
Vectors using
Text clustering
methods
Cluster means provide
modes of network
structure configurations
Making up all the
social networks
Attribute values for
each cluster can be
used to discover
trends between network
structures and attributes
Social theory + IR
based network
analysis
![Page 36: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/36.jpg)
Results – normalized network structure
vector means for all clusters
![Page 37: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/37.jpg)
Results Clusters 1 and 4 are similar
Groups kill fewer monsters
Group members in cluster 4 do not communicate much
Group members in cluster 1 generally limit their communication to just one other person in the group
Most people belong to these two clusters
Consistent with previous research - users in virtual environments are
less likely to interact with strangers
[N. Ducheneaut, N. Yee, E. Nickell and R. Moore, “Alone Together?” Exploring
the social dynamics of massively multiplayer online games, Proceedings
CHI06, ACM Press, New York, 407-416.]
Cluster 5 groups have many 1-edge and 2-out stars Most of the communication is one way possibly indicating presence of
central people
Maximum number of monsters killed out of all clusters
Performance of the groups is very good
Minimal communication
It is possible that cluster 5 consists of groups more focused on playing and performing well in the game and less on socializing
![Page 38: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/38.jpg)
Some Open Questions
How similar/dissimilar are online social networks from real-
world social networks?
Is online socialization
Only a sustainability activity of real-world networks?
Causing new social networks to be formed?
Is fundamentally a different type of networking activity?
A fundamental tenet of socialization has been
“geography/proximity drives socialization”
How is this being impacted by online socialization?
![Page 39: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/39.jpg)
TeamSkill: Modeling Team
Chemistry in Online Multi-
Player Games
![Page 40: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/40.jpg)
Description and motivation
The goal of this work is to improve skill assessment approaches used in multiplayer games, especially team-based games
Xbox Live, PSN, Steam, and Battle.net have between 149-167 million total users, collectively (and that number is growing) Many of the most popular games are team-based: Halo, Call of Duty,
TF2, CS, etc
Why? Better skill assessment = fairer games = less player attrition
More accurate rankings of players/teams
Most previous work has focused on individuals, not teams
It‟s a hard problem Online setting: Updates to a player‟s skill distribution must be done
after each game is played
Applicability: Any generalized assessment technique cannot include game-specific data
Our datasets are from the games of professional Halo 3 players
40
![Page 41: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/41.jpg)
What is Halo?
Halo is one of the most well-known “first-person shooter” video game series in the world
Online/LAN-based multiplayer is its most significant component
650,000 to 850,000 unique users per day at its peak
41
![Page 42: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/42.jpg)
Our work focuses on professional Halo 3 players
Online scrimmage data, as well as complete tournament data, is readily available from bungie.net and mlgpro.com
Players are highly-skilled individually – allows us to better focus on group-level performance characteristics
Players change teams regularly, helps isolate impact of particular players on overall team performance
Unexplored boundary case for skill assessment
What is Major League Gaming (MLG)?
• A professional league for competitive gaming, including Halo 3 from ‟08-‟10
• Best players in the world compete in this league
• Last tournament watched by over 1,000,000 people online
• Our datasets are comprised of games played by these players
Major League Gaming
42
![Page 43: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/43.jpg)
An old problem (with different applications in different contexts)
Paired comparison estimation Foundational work by Thurstone (1927) and Bradley-Terry (1952)
Elo (1959), popularized in chess ranking (FIDE, USCF)
Glicko (1993) – player-level ratings volatility incorporated (σ2), addition of rating periods
TrueSkill (2006) - factor-graph based approach used in Microsoft‟s Xbox Live gaming service Used to match players/teams up with each other online
Our work focuses on Elo, Glicko, and TrueSkill
Skill assessment
43
![Page 44: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/44.jpg)
Elo (1959)
Arpad Elo, Dept of Physics, Marquette University
Was a master chess player in USCF
Proposed the Elo Rating System
Replaced earlier systems of competitive rewards (i.e.,
tournaments) in USCF/FIDE
Simple to implement: assumes player skill is normally-
distributed with constant variance β2
Still widely-used today
44
![Page 45: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/45.jpg)
Glicko (1993)
Mark E. Glickman, Department of Health Policy and Management , Boston University
Proposed the Glicko Rating System Addresses rating reliability issue through the use of
rating periods and player-specific variances
Elo is a special case of Glicko
Iterative approach for approximating the marginal posterior distribution of a player‟s skill conditional on other players‟ priors More computationally tractable for large datasets
45
![Page 46: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/46.jpg)
TrueSkill (2006)
Ralf Herbrich and ThoreGraepel at Microsoft Research, Cambridge, UK
Used for automated ranking/matchmaking on Xbox Live
Uses factor graphs to model multi-player, multi-team environments
Converges quickly (~5 games for a player)
Large-scale deployment 30 million Xbox Live members
150+ games use TrueSkill for ranking & matchmaking
46
![Page 47: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/47.jpg)
Issues with current approaches
Basic idea
Given: skill ratings of each team member, i.e., si ~ N(μi, σi2)
Sum across all team members
Not intuitive: Team chemistry is a well-known concept in team-based competition [Martens 1987, Yukelson 1997] and it is not captured in any of these models
Can think of it as the overall dynamics of a team resulting from leadership, confidence, relationships, and mutual trust
Independence assumption not realistic in teams, especially at high levels of play
1234
1 2 3 4
+
47
![Page 48: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/48.jpg)
More information is available, however…
Observation: We have more information than just the history of individual players – we also know the histories of groups of players
Player 1‟s history ∩ player 2‟s history history of {1, 2}
Existing approaches only make use of top row
Idea: Estimate the skills of subgroups of players on a team, combine in some way, and use to produce better estimate of a team‟s skill
1234
1 2 3 4
123 124 134 234
14 2312 13 24 34
48
![Page 49: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/49.jpg)
Reframing the assessment
problem
Alterations to existing approaches necessary (i.e., Elo/Glicko/TrueSkill)
Player-level skill representation generalized subgroup skill representation
For each game and for each k <= size of the team (K), treat each group as you would an individual player and update skill accordingly
Hashing of skill variable matrix according to unordered subgroup membership
Treat Elo/Glicko/TrueSkill as „base learners‟, a la boosting
Each rating says something about the skill of that particular subgroup
…but how do we aggregate these ratings to estimate the skill of a team?
k=4
k=3
k=2
k=1
1234
1 2 3 4
123 124 134 234
14 2312 13 24 34
49
![Page 50: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/50.jpg)
TeamSkill
Four different aggregation approaches:
TeamSkill-K
TeamSkill-AllK
TeamSkill-AllK-EV
TeamSkill-AllK-LS
50
![Page 51: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/51.jpg)
Aggregation issues
Real world case: assume that players can leave/join the 4-player team and look at the timeline
The question at each point in time, t: how best to combine the available group ratings to produce a team rating?
The problem: the feature space is expanding and contracting over time.
t = 1 t = 100 t = 200time:
After 1 game New player – 5 - who‟s
never played with 1, 2, or 3New player – 6 – who‟s
played with 3 or 5 (not both)
black = history available red = no history available
51
![Page 52: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/52.jpg)
Data set overview
Collected over the course of 2009
7,590 games (2,076 from tournaments and 5,514 from Xbox Live scrimmages)
448 players on 140 different professional and semi-professional teams
Games took place during January 2008 through January 2010
Websites pertaining to this data http://stats.halofit.org - Player/team
statistics
http://halofit.org – Datasets and related information
52
“Friend or foe” social network for all tournaments in 2008 and
2009
![Page 53: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/53.jpg)
Evaluation The TeamSkill approaches were evaluated by predicting the outcomes
of games occurring prior to 10 MLG tournaments and comparing their
accuracy to unaltered versions (k = 1) of their base learner rating
systems - Elo, Glicko, and TrueSkill (for TeamSkill-K, all possible
choices of k for teams of 4, 1 ≤ k ≤ 4, were used)
For each tournament, we evaluated each rating approach using:
3 types of training data sets - games consisting only of previous tournament data,
games from online scrimmages only, and games of both types.
3 periods of game history - all data except for the data between the test tournament
and the one preceding it (“long”), all data between the test tournament and the one
preceding it (“recent”), and all data before the test tournament (“complete”).
We will only show “complete” as results for “long” and “recent” mirrored those for “complete”
2 types of games - the full dataset and those games considered “close” (i.e., prior
probability of one team winning close to 50%).
53
![Page 54: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/54.jpg)
Results – all games
Prediction accuracy for both tournament and scrimmage/custom games using complete history
Prediction accuracy for tournament games using complete history
Prediction accuracy for scrimmage/custom games using complete history54
![Page 55: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/55.jpg)
Results – “close” games
Prediction accuracy for both tournament and scrimmage/custom games using complete history
Prediction accuracy for tournament games using complete history
Prediction accuracy for scrimmage/custom games using complete history55
![Page 56: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/56.jpg)
Conclusions
Modified several existing assessment approaches to
operate on generalized player subgroup entities (instead of
just individual players)
Introduced four aggregation methods for combining player
subgroup information to produce final forecast
Shown evidence that close games are decided on the
basis of team chemistry, consistent with sports psychology
research
56
![Page 57: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/57.jpg)
Part III: Impact on Business
![Page 58: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/58.jpg)
Player Churn Prediction
![Page 59: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/59.jpg)
Time Equals Stickiness
There are less quitters as the levels go up, and focus
should be on the first 20 levels.
0.00
1.00
2.00
3.00
4.00
5.00
6.00
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69
Rat
io o
f Q
uitt
ers
to S
taye
rs
Character Level
6
![Page 60: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/60.jpg)
Solo vs. Social Players
Isolated players are 3.5x more likely to quit (B = 1.26, p<.001).
Focus design on facilitating social interaction.
0.00
1.00
2.00
3.00
4.00
5.00
6.00
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69
Solo
Social
Rat
io o
f Q
uitt
ers
to S
taye
rs
Character Level
![Page 61: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/61.jpg)
Problem Statement & Approach
Objective: At time t, for player p, given a window w, compute
the probability, P(p,t,w), of p churning within the interval
(t,t+w), and the confidence in this probability.
Approach: Estimate P(p,t,w) and its confidence using
Data about p‟s activity and socialization behavior
Statistics and machine learning (linear, non-linear, ensemble, etc.)
Use of socio-psychological theories of player motivation
Novel synthesis of data driven (DD) and theory driven (TD) approaches
61
![Page 62: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/62.jpg)
Player Motivation Theories
62
Nick Yee, 2005
Richard Bartle, 1996
![Page 63: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/63.jpg)
Mapping MMO Behaviors to Theory
Example MMORPG
Persistent virtual world
Players have avatars
Quest-driven
Participation in groups,
guilds
Dataset (game logs)
Time period: FEB-JUN, 2006
No. of accounts: ~16000
Churner definition: 2 months of inactivity
63
![Page 64: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/64.jpg)
Synthesis of TD and DD approach
64
![Page 65: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/65.jpg)
Model Evaluation (Confusion Matrix)
Model performance for different features (10 fold cross-validation)
Observations Decision tree learning works quite well
DD model very accurate (76%); difficult to interpret (485 nodes)
TD model interpretable (7 – 59 nodes), but less accurate
Achievement-orientation more important than socialization
65
Features Tree size (nodes)
Precision (%) Recall (%) F-measure (%)
Pure DD 485 69.3 84 76
TD:Achievement+Social
59 67.1 76.2 71.3
TD: Achievement 37 67.2 73.2 70.1
TD: Socialization 7 64.3 55.4 59.5
![Page 66: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/66.jpg)
Model Evaluation (Lift Chart)
Observations
TD model does better in predicting top quintile of churners
DD model performs better in the 40%-70% range
66
![Page 67: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/67.jpg)
Ensemble Approach – Model
Evaluation
Ensemble approach
A „committee of classifiers‟ votes on the result
Heterogeneity helps
Improvements over single model Best recall value shows 7.94% improvement , i.e. model
can identify larger proportion of potential churners)
Best F-measure shows 1.25% improvement
67
No. of clusters
Precision Recall F-measure
5 66.23 91.94 76.99
10 66.49 91.08 76.87
15 67.58 89.80 77.12
20 67.6 90.13 77.25
![Page 68: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/68.jpg)
Conclusions
Comparison of theory-driven and data-driven model in
terms of prediction accuracy and model interpretability
Achievement-orientation is more important than
socialization-orientation in identifying potential churners
Ensemble model can identify larger proportion of
potential churners as compared to single global model
68
![Page 69: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/69.jpg)
4/9/2013 University of Minnesota 69
Course Outline
• Module 1• Introduction to Social Analytics – applying data mining to social computing
systems; examples of a number of social computing systems, e.g.
FaceBook, MMO games, etc.
• Module 2 • Computational online trust
• Identifying key influencers
• Module 3• Analysis of clandestine networks
• Katana - game analytics engine
![Page 70: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/70.jpg)
Computational Trust in
Multiplayer Online
Games
![Page 71: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/71.jpg)
“If you want to go fast walk alone, if you
want to go far then walk with a group.”- Proverb from Ghana
![Page 72: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/72.jpg)
Virtual Worlds & Massive Online
Games
Massively Multiplayer Online Role
Playing Games
(MMORPGs/MMOs) Simulated Environments like
SecondLife
Millions of people can interact with one another is shared virtual environment
People can engage in a large number of activities with one another and with the environment
Many of the observed behaviors have offline analogs
7272
![Page 73: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/73.jpg)
Big Picture Questions
How is trust expressed differently in different social
contexts?
Cooperative (PvE), Adversarial (PvP), …
How is trust expressed in different types of social
networks?
Housing, Mentoring, Trade, Group, …
What are the characteristics of trust and related networks
in MMOs?
Similarities and differences with social networks in other domains
e.g., citation networks, co-authorship networks
What role can features derived from the trust network play
in prediction tasks e.g., link prediction (formation,
breakage, change), trust propensity, success prediction
73
![Page 74: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/74.jpg)
Computational Trust in Multiplayer Online GamesDepartment of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Trust as a multi-modal multi-level
network phenomenonNetwork formulations of Traditional
Trust related concepts
Incorporating social science theories
in trust related prediction task
Generative Network Models for Trust based
social interactions
Trust as a Multi-Level Network Phenomenon
![Page 75: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/75.jpg)
Computational Trust in Multiplayer Online Games
Effect of Social Environments on Trust Trust and Homophily Trust and Clandestine Behavior Trust and Mentoring Trust and Trade
Different Social environments result in
differences in network signatures
Trust based and other social networks in MMOs
exhibit anomalous network characteristics
No Honor Amongst Thieves
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Most types of homophily do not
carry over to the MMO domain
Trust as a multi-modal multi-level
network phenomenon
Social Characteristics of Trust
Network formulations of Traditional
Trust related concepts
Incorporating social science theories
in trust related prediction task
Generative Network Models for Trust based
social interactions
Trust as a Multi-Level Network Phenomenon
![Page 76: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/76.jpg)
Computational Trust in Multiplayer Online Games
Effect of Social Environments on Trust Trust and Homophily Trust and Clandestine Behavior Trust and Mentoring Trust and Trade
Trust Prediction Family of Problems
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Item Recommendation Success Prediction (Social Capital)
Effect of network structure on trustEffects of social environments on trust
Different Social environments result in
differences in network signatures
Trust based and other social networks in MMOs
exhibit anomalous network characteristics
No Honor Amongst Thieves
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Most types of homophily do not
carry over to the MMO domain
Trust as a multi-modal multi-level
network phenomenon
Social Characteristics of Trust
Network formulations of Traditional
Trust related concepts
Incorporating social science theories
in trust related prediction task
Generative Network Models for Trust based
social interactions
Trust and Prediction
Trust as a Multi-Level Network Phenomenon
![Page 77: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/77.jpg)
Computational Trust in Multiplayer Online Games
Effect of Social Environments on Trust Trust and Homophily Trust and Clandestine Behavior Trust and Mentoring Trust and Trade
Trust Prediction Family of Problems
Predict Formation, Change, Breakage of Trust with in the same
network and across Social networks. Predict trust propensity.
Item Recommendation Success Prediction (Social Capital)
Effect of network structure on trustEffects of social environments on trust
Different Social environments result in
differences in network signatures
Trust based and other social networks in MMOs
exhibit anomalous network characteristics
No Honor Amongst Thieves
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Most types of homophily do not
carry over to the MMO domain
Trust as a multi-modal multi-level
network phenomenon
Social Characteristics of Trust
Network formulations of Traditional
Trust related concepts
Incorporating social science theories
in trust related prediction task
Generative Network Models for Trust based
social interactions
Trust and Prediction
Trust as a Multi-Level Network Phenomenon
![Page 78: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/78.jpg)
Trust and Homophily
Most types of homophily do not
carry over to the MMO domain
Computational Social Science
Semantics
Algorithmic
Trust Prediction Family of Problems
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Structure
Trust based and other social networks in MMOs
exhibit anomalous network characteristics
Applications
Characteristics of Trust Networks
Gold Farmer Detection
No Honor Amongst Thieves
Detection of Clandestine Actors
Computational Trust in Multiplayer Online GamesDepartment of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
![Page 79: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/79.jpg)
Trust and Homophily
Most types of homophily do not
carry over to the MMO domain
Computational Social Science
Semantics
Algorithmic
Trust Prediction Family of Problems
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Structure
Trust based and other social networks in MMOs
exhibit anomalous network characteristics
Applications
Characteristics of Trust Networks
Gold Farmer Detection
No Honor Amongst Thieves
Detection of Clandestine Actors
Computational Trust in Multiplayer Online GamesDepartment of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
![Page 80: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/80.jpg)
Housing-Trust in EQ2
• Access permissions to in-game house
as trust relationships
• None: Cannot enter house.
• Visitor: Can enter the house and can
interact with objects in the house.
• Friend: Visitor + move items
• Trustee: Friend + remove items
• Houses can contain also items which
allow sales to other characters without
exchanging on the market
80
![Page 81: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/81.jpg)
Homophily and Trust
Homophily: Birds of a feather flock together
There is no one form of homophily and homophily in
general is described in multiple ways: Status vs. Value
Homophily
Each of these homophiles are in turn defined in multiple
ways themselves
Previous literature instantiates homophily in MMOs in
terms of player characteristics and behavior in the game
81
![Page 82: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/82.jpg)
Network Models, Homophily and
Trust Networks in MMOs
RQ1: Does homophily in MMOs operate in ways similar to
homophily in the offline world?
RQ2: How do we map characteristics that define
homophily in the offline world to online settings?
82
![Page 83: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/83.jpg)
Mapping Homophily in MMOs
In general, studies of homophily in MMOs assume only one type of homophily and generalize based on that type
Even in the offline world homophily is of different types
Hence the necessity of Mapping Homophily which we address here
Mapping and Proteus Effect83
![Page 84: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/84.jpg)
Trust and Homophily in MMOs
Homophily Type Hypothesis Observation
H
1
Gender Homophily Players trust other players who
are of the same gender
?
H
2
Age Homophily Players trust other players who
are of the same age cohorts
?
H
3
Class Homophily Players trust other players who
are of the same class
?
H
4
Race Homophily Players trust other players who
are of the same race
?
H
5
Guild Homophily Players trust other players who
belong to the same guild
?
H
6
Level Homophily Players trust other players who
are at a similar level
?
H
7
Challenge
Homophily
Players trust other players who
like similar types of challenges
?
84
![Page 85: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/85.jpg)
Key Observations
H1: Players trust other players who are of the same gender?
H2: Players trust other players who are of similar age?
85
H3: Players trust other players who are of the same class?
In general players trust other players who are of
the same gender
The stronger the type of trust the lesser is the age
difference between the people specifying trust
Class does not seem to effect the choice of trusting
others
![Page 86: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/86.jpg)
Key Observations
H4: Players trust other players who are of the same race?
H5: Players trust other players who are of the same guild?
86
H6: Players trust other players more who are level at a similar rate?
Race does not seem to effect the choice of trusting
others
In general, the stronger the type of trust, the greater
is the percentage of the people who trust people in
their own guilds
Leveling at the same rate does not seem to greatly
effect trust amongst players
![Page 87: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/87.jpg)
Key Observations
H7: Players trust other players who are of the same level?
• Level difference seems to have some effect on trust
• For Trustee (strongest) and the Visitor (weakest) form of trust, the lower level
players are more likely to trust players who are at a higher level
87
Summary:
• Homophily is observed for a subset of types in MMOs as compared to what it is
observed for in the offline world
• The types of homophily which are not observed in MMOs are the ones which are
greatly effected by game mechanics
![Page 88: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/88.jpg)
Trust and Homophily
Most types of homophily do not
carry over to the MMO domain
Computational Social Science
Semantics
Algorithmic
Trust Prediction Family of Problems
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Structure
Trust based and other social networks in MMOs
exhibit anomalous network characteristics
Applications
Characteristics of Trust Networks
Gold Farmer Detection
No Honor Amongst Thieves
Detection of Clandestine Actors
Computational Trust in Multiplayer Online GamesDepartment of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
![Page 89: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/89.jpg)
Social Networks: General
Observations There is an extensive literature on characteristics of social
networks (Leskovec PAKDD 2005, Leskovec ICDM 2005,
McGlohon ICDM 2008, McGlohon KDD 2008)
The network exhibits monotonically
shrinking diameter over time (Leskovec PAKDD 2005,
Leskovec ICDM 2005, McGlohon ICDM 2008)
At a certain point in time called the Gelling Point many
smaller connected connect together and become part of
the largest connected component (Leskovec PAKDD 2005,
Leskovec ICDM 2005, McGlohon ICDM 2008)
The largest connected component (LCC) comprises of
the majority of the nodes in the network (>= 80%)
(McGlohon ICDM 2008, McGlohon KDD 2008)
89
![Page 90: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/90.jpg)
Social Networks: General
Observations The size of the second and the third largest connected
components remain constant (more or less) even though
the identity of these components change over time
(Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon
ICDM 2008, McGlohon KDD 2008)
Network isolates are few in number (<5%)
(Leskovec PAKDD 2005, Leskovec ICDM 2005)
The number of connected components decreases over
time
(Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon
ICDM 2008)
Relatively fast growth of LCC close to the gelling point
(Leskovec PAKDD 2005, Leskovec ICDM 2005)
90
![Page 91: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/91.jpg)
Trust Networks in MMOs
Data from 4 servers is available. Results from one server
(Player vs. Environment, „guk‟) are shown
The network consists of 15,237 nodes, 30,686 edges
and 1,476 connected components
Dataset spans from January 2006 to August 2006
Average node degree of 4.03. The size of the three
largest connected components are as follows: 9039, 51
and 49. The largest connected component accounts for
59% of all the nodes in the network
The Trust Network on „guk‟ on August 31, 2006
91
![Page 92: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/92.jpg)
Key Observations
Observation 1: Preferential Attachment: The rich get richer but not too richExplanation: Social bandwidth is limited, Dunbar Number
Observation 2: The growth of the LCC is retarded after the gelling pointExplanation: The trust network has a relatively low growth rate as compared to the other networks
Observation 3: Non-monotonic change in the diameter of the largest connected componentExplanation: Players have different levels of activity at various points in time and can also “drop out” of the network if they churn from the game
92
![Page 93: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/93.jpg)
Key Observations
Observation 4: A large number of isolate components are
observed (> 1000)
Explanation: People join in groups and spend all the time
playing with one another instead of interacting with people
from the outside
Observation 5: The number of isolate components
increases monotonically over time
Explanation: (Same as observation 4)
Observation 6: Nodes in the non-LCC constitute a
significant portion of the network (41%, 8 months after
gelling point)
Explanation: (Same as observation 4)
93
![Page 94: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/94.jpg)
Generative Models of Trust Networks
Time bound Preferential Attachment: The rich get rich
but not so much after a certain point in time. Edge
formation is bound by time
Presence of Auxiliary Components: Isolate components
are added to the network at an almost constant rate over
time
94
![Page 95: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/95.jpg)
Generative Models of Trust Networks
(ii)
Non-Monotonic Decrease in the Diameter:
Nodes become inert after a certain point in time. Sample
the lifetime of nodes from a normal distribution
Homophily in Edge formation: Probability of edge
formation dependent upon node degree as well as
agreement (similarity) in node characteristics
95
![Page 96: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/96.jpg)
Results
Diameter: Non-Monotonically Changing Diameter
% LCC as being relatively small
96
![Page 97: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/97.jpg)
Results
Number of Connected Components
97
Network growth and the gelling point
![Page 98: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/98.jpg)
Conclusion: Trust Networks
Trust Networks in MMOs exhibit many properties which are
not exhibited by other social networks in most other
domains
Proposed a model based on observations and domain
knowledge
Models of social networks should incorporate the
peculiarities which are observed in MMOs in general
Generalization? Similar observations have been made for
mentoring networks but not for PvP, Trade and Chat
Networks
98
![Page 99: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/99.jpg)
Trust and Homophily
Most types of homophily do not
carry over to the MMO domain
Computational Social Science
Semantics
Algorithmic
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Applications
Gold Farmer Detection
No Honor Amongst Thieves
Detection of Clandestine Actors
Computational Trust in Multiplayer Online GamesDepartment of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Trust Prediction Family of Problems
Structure
Trust based and other social networks in MMOs
exhibit anomalous network characteristics
Characteristics of Trust Networks
![Page 100: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/100.jpg)
Trust Prediction Family of
Problems
Trust Prediction: Given a trust network G predict which
nodes are going to trust one another in the future
Prediction Across Networks: Given a set of actors who participate in multiple types of interactions using features from one network predict the existence of links in the other network
100
![Page 101: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/101.jpg)
Trust Prediction Family of
Problems
Machine Learning/Classification approach to the problem
of trust prediction
Introduced a set of new problems (inter-network link
prediction, trust propensity prediction)
Proposed an algorithm for link prediction which used
domain knowledge from social science theories
New Contributions:
Does the social context (adversarial vs. cooperative) effect
prediction results?
If so then what is the implication for generalization?
101
![Page 102: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/102.jpg)
Prediction Task
Trust Prediction as a classification problem
60,000 examples for each prediction task
10 Fold Cross-validation
Data from Guk (Cooperative) and Nagafen (Adversarial) Servers
Six Standard Classifiers for Comparison: J48, JRip, AdaBoost, Bayes Network, Naive Bayes and k-nearest neighbor
Positive Example:
Training Period Test Period
Negative Example:
Training Period Test Period
102
![Page 103: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/103.jpg)
Prediction: Group Network
103
In general good prediction results are obtained for the combat network (in either direction), except one case
The grouping network is a union of all grouping instances and thus it is extremely dense (1,796,438 edges, 31,900 nodes)
![Page 104: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/104.jpg)
Prediction: Group Network
104
Grouping is not a good predictor of mentoring but the vice versa is correct
Mentoring is (more often than not is accompanied by grouping) but grouping can be in a variety of contexts e.g., raids, quests, dungeon instances, …
![Page 105: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/105.jpg)
Prediction: Combat Network
In general good prediction results are obtained for the combat network (in either direction)
In general if players have played against another person then they have friends in other networks who have done the same or something similar
105
![Page 106: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/106.jpg)
Comparison across Social
Environments
106
T: Trust; M: Mentoring; B: Trade
In general the results are similar for both the
environments with some exceptions
Mentoring-Mentoring: In the adversarial environment a
more cliqueish behavior is observed i.e., friends of
friends are likely to mentor one another in the future
which is not the case for the cooperative environment
![Page 107: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/107.jpg)
Comparison across Social
Environments
107
Mentoring-Related Tasks: In general mentoring is
much more prevalent in the adversarial environment as
compared to the cooperative environment (~3 times) and
is much more intense. Overlap between mentoring and
trade is thus more likely
![Page 108: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/108.jpg)
Trust and Homophily
Most types of homophily do not
carry over to the MMO domain
Computational Social Science
Semantics
Computational Trust in Multiplayer Online GamesDepartment of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Structure
Trust based and other social networks in MMOs
exhibit anomalous network characteristics
Characteristics of Trust Networks
Algorithmic
Trust Prediction Family of Problems
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Applications
Gold Farmer Detection
No Honor Amongst Thieves
Detection of Clandestine Actors
![Page 109: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/109.jpg)
Trust Amongst Clandestine Actors
“A plague upon‟t when thieves
cannot be true one to another!”– Sir Falstaff, Henry IV, Part 1, II.ii
Do gold farmers
trust each other?
109
![Page 110: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/110.jpg)
Hypergraphs to Represent Tripartite Graphs
110
• Accounts can have several characters
• Houses can be accessed by several characters
• Projecting to one- or two-model data obscures crucial
information about embededdness and paths
• Figure 2a: Can ca31 access the same house as ca11?
• Figure 2b: Are characters all owned by same account?
![Page 111: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/111.jpg)
Hypergraphs: Key Concepts
• Hyperedge: An edge between three or
more nodes in a graph. We use three
types of nodes: Character, account and
house
• Node Degree: The number of
hyperedges which are connected to a
node
NDh1 = 3
• Edge Degree: The number of
hyperedges that an edge participates in
EDa1-h1 = 2
111
![Page 112: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/112.jpg)
Network Characteristics
112
• Long tail distributions are observed for the various degree distributions
• The mapping from character-house to an account is always unique
• Players who are connected to a large number of houses are highly active
players otherwise as well
![Page 113: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/113.jpg)
Characteristics of Hypergraph Projection Networks
• Account Projection: Majority of the gold farmer nodes are isolates
(79%). Affiliates well-connected (8.89) vs. non-affiliates (3.47)
• Character Projection: Majority of the gold farmer nodes are isolates
(84%). Affiliates well-connected (10.42) vs. non-affiliates (3.23)
• House Projection: 521 gold farmer houses. Most are isolates (not
shown) but others are part of complex structures. Densely connected
network with gold farmers (7.56) and affiliates (84.02)
113The Housing Projection Network
![Page 114: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/114.jpg)
Key Observations
• Gold farmers grant trust ties less frequently than either
affiliates or general players
• Gold farmers grant and receive fewer housing permissions
(1.82) than their affiliates (4.03) or general player population
(2.73)
114
Total degree In degree Out degree
< n > < nGF > < nAff > < n > < nGF > < nAff > < n > < nGF > < nAff >
Farmers 1.82 0.29 1.82 0.89 0.29 0.89 1.07 0.29 1.07
Affiliates 4.03 1.28 0.70 1.55 0.75 0.70 2.88 0.63 0.70
Non-Affiliates 2.73 - 7.77 1.57 - 5.98 1.56 - 2.34
![Page 115: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/115.jpg)
Key Observations
• No honor among thieves
• Gold farmers also have very low tendency to grant other
gold farmers permission (0.29)
• Affiliates also unlikely to trust other affiliates (0.70)
115
Total degree In degree Out degree
< n > < nGF > < nAff > < n > < nGF > < nAff > < n > < nGF > < nAff >
Farmers 1.82 0.29 1.82 0.89 0.29 0.89 1.07 0.29 1.07
Affiliates 4.03 1.28 0.70 1.55 0.75 0.70 2.88 0.63 0.70
Non-Affiliates 2.73 - 7.77 1.57 - 5.98 1.56 - 2.34
![Page 116: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/116.jpg)
Key Observations
• Affiliates are brokers:
• Farmers trust affiliates more (1.82) than other farmers (0.29)
• Affiliates trust farmers more (1.28) than other affiliates (0.70)
• Non-affiliates have a greater tendency to grant permissions to
affiliates (7.77) than in general (2.73)
116
Total degree In degree Out degree
< n > < nGF > < nAff > < n > < nGF > < nAff > < n > < nGF > < nAff >
Farmers 1.82 0.29 1.82 0.89 0.29 0.89 1.07 0.29 1.07
Affiliates 4.03 1.28 0.70 1.55 0.75 0.70 2.88 0.63 0.70
Non-Affiliates 2.73 - 7.77 1.57 - 5.98 1.56 - 2.34
![Page 117: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/117.jpg)
Frequent Itemset Mining for Frequent Hyper-subgraphs
Support of a Hyper-subgraph: Given a sub-hypergraph of size k,
subP is the pattern of interest containing the label P, shP is a pattern
of the same size as subP and contains the label P, the support is
defined as follows:
117
Support of pattern also containing a gold farmer (red) = 5/8
![Page 118: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/118.jpg)
Frequent Itemset Mining for Frequent Hyper-subgraphs
Confidence of a Hyper-Subgraph: Given a sub-hypergraph of
size k, subP is the pattern of interest containing the label P, subG is
a pattern which is structurally equivalent but which does not contain
the label P, the confidence is defined as follows:
118
Confidence of pattern and containing a gold farmer = 5/7
![Page 119: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/119.jpg)
Frequent Patterns of GFs
• Very low (s ≤ 0.1) support and confidence for almost all
(except 8) frequent patterns with gold farmers
• Remaining 8 patterns can be used for discrimination
between gold farmers and non-gold farmers in a subset of
the instances
• Gold farmers & affiliates are more connected: A 3rd of more
complex patterns are associated with affiliates (15/44)
119
![Page 120: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/120.jpg)
Application: Gold Farmer Detection (Behavioral)
120
Classification using in-game behavioral and demographic features
Each model corresponds to a grouping of different features
![Page 121: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/121.jpg)
Classifier Metric Initial Dataset Label Prop Change In Performance
Bayes Net Precision 0.17 0.189 0.019
Recall 0.834 0.819 -0.015
F-Score 0.282 0.307 0.025
J48 Precision 0.494 0.62 0.126
Recall 0.189 0.337 0.148
F-Score 0.273 0.437 0.164
J Rip Precision 0.495 0.537 0.042
Recall 0.462 0.462 0
F-Score 0.478 0.497 0.019
KNN Precision 0.436 0.46 0.024
Recall 0.396 0.428 0.032
F-Score 0.415 0.443 0.028
Logistic Regression Precision 0.455 0.534 0.079
Recall 0.189 0.271 0.082
F-Score 0.267 0.36 0.093
Naïve Bayes Precision 0.146 0.142 -0.004
Recall 0.538 0.502 -0.036
F-Score 0.23 0.221 -0.009
Adaboost w/ DT Precision 0.405 0.471 0.066
Recall 0.105 0.08 -0.025
F-Score 0.167 0.137 -0.03
Application: Gold Farmer Detection (Label
Propagation)• Problem: Not all people who are labeled as normal players are such. Some
of them are gold farmers but have not been identified as such
• Analogue: If someone is socializing almost exclusively with criminals then
he may be a criminal
121
![Page 122: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/122.jpg)
Gold Farmer Detection
Model 1 (Player Attribute Based Features): These features are based on the attributes of the player‟s character in the game e.g., character race, character gender, distribution of gaming activities etc.
Model 2 (Item Based Features): These are the features which are derived from items bought and sold from the consignment network. These features are based on the frequency of the frequent items sold or bought by gold farmers.
Model 3 (Player Attribute & Item Based Features): All the attributes from the previous two models.
Model 4 (Item Network Based Features): Features which are derived from the item network in a manner analogous to Model 2.
Model 5 (Player Attribute & Item-Network Based Features): A combination of features from Model 1 and Model 4.
Model 6 (Item Network & Item-Network Based Features): A combination of features from Model 2 and Model 4.
Model 7 (Player Attribute, Item & Item-Network Based Features): Union of all the features described above.
122
![Page 123: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/123.jpg)
Application: Structural Signatures Approach Applied to Trade Networks
A set of standard machine learning models are used: Naive Bayes, Bayes Net,
Logistic Regression, KNN, J48, JRip, AdaBoost and SMO
Not sufficient data is present to use the structural signature approach to catch
gold farmers
Alternative: Application of the same approach in other networks: Trade Network
123
![Page 124: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/124.jpg)
Conclusion: Gold Farmer Behavior Analysis and Prediction
124
Representation Related Issues
addressed with respect to trust in
MMOsApplication of frequent pattern mining to
discover distinct trust patterns associated
with gold farmers
No honor between thieves:
Gold farmers tend not to trust other
gold farmers
![Page 125: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/125.jpg)
Computational Trust in Multiplayer Online Games
Effect of Social Environments on Trust Trust and Homophily Trust and Clandestine Behavior Trust and Mentoring Trust and Trade
Trust Prediction Family of Problems
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Item Recommendation Success Prediction (Social Capital)
Effect of network structure on trustEffects of social environments on trust
Different Social environments result in
differences in network signatures
Trust based and other social networks in MMOs
exhibit anomalous network characteristics
No Honor Amongst Thieves
Department of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
Most types of homophily do not
carry over to the MMO domain
Trust as a multi-modal multi-level
network phenomenon
Social Characteristics of Trust
Network formulations of Traditional
Trust related concepts
Incorporating social science theories
in trust related prediction task
Generative Network Models for Trust based
social interactions
Trust and Prediction
Trust as a Multi-Level Network Phenomenon
![Page 126: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/126.jpg)
Trust and Homophily
Most types of homophily do not
carry over to the MMO domain
Computational Social Science
Semantics
Predictive Analysis
Trust Prediction Family of Problems
Predict Formation, Change, Breakage of Trust with in the same
network and across social networks. Predict trust propensity.
Structure
Trust based and other social networks in MMOs
exhibit anomalous network characteristics
Applications
Characteristics of Trust Networks
Gold Farmer Detection
No Honor Amongst Thieves
Detection of Clandestine Actors
Computational Trust in Multiplayer Online GamesDepartment of Computer Science, University of Minnesota
Muhammad Aurangzeb Ahmad
![Page 127: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/127.jpg)
Conclusion
Availability of data allows one to analyze phenomenon
where it was not possible to do so in the past (Trust in
MMOs in the current case)
Explored a series of big-picture questions pertaining to
trust in MMOs
Similar affordances and contexts (with respect to the offline
world) lead to similar outcomes in the online world
Explored and expanded the scope of trust related
prediction tasks
Analysis is required on more datasets for generalizability
127
![Page 128: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/128.jpg)
Acknowledgement
DMR Lab
Professor Jaideep Srivastava and all DMR lab members especially
Zoheb Borbora, Amogh Mahapatra, Young Ae Kim, Nishith Pathak,
Kyong Jin Shim and Nisheeth Srivastava
Virtual Worlds Observatory
Professor Noshir Contractor (Northwestern), Professor Scott Poole
(UIUC), Professor Dmitri Williams (USC)
External Collaborators at Northwestern, UIUC, USC, U Toronto
especially Brian Keegan
Funding Agencies:
NSF, AFRL, NSCTA, IARPA
128
![Page 129: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/129.jpg)
Publication Summary
Publication Type Total Thesis Related Comment
Book 1 1 Book on Clandestine
Behaviors and Networks
(Springer 2012)
Conference
Papers
14 5
Book Chapters 1 -
Journal Papers 2 -
Workshop Papers 10 3 2 Best Paper Awards
Short/Poster
Papers
8 1
Technical Reports 4 1
Tutorials 4 1
Patents 2 127 Co-authors
129
![Page 130: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/130.jpg)
A Computational Model
for
Social Influence
![Page 131: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/131.jpg)
Objective: Model social
influence in a dynamic network
setting as a spread of cascades
to identify key influential nodes
131
![Page 132: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/132.jpg)
Outline
Team Overview
Optimal Target Selection
Current approaches
Proposed Approach
Results
Update summary
Q&A
132
![Page 133: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/133.jpg)
Motivation
Courtesy:http://blog.twitter.com/2011/06/global-pulse.html
Global retweets of Tweets coming from Japan for one
hour after the earthquake
133
OriginalRetweet
![Page 134: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/134.jpg)
Optimal Target People Selection
Goal: Find the optimal targets to maximize influence
spread in network
Why is it important?
Public Polls: Sentiment spread
Sales and Marketing: Word-of-mouth spread
Public Health: Disease spread
Influence: A force that attempts to change the opinion or
behavior of the individual
Influence is causal
My friend buys a product -> I buy a product
134
![Page 135: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/135.jpg)
Current Approaches (1/2)
Assume a certain model for propagation of influence [1]
Independent Cascade Model
Each node is independently influences the neighbor
using a biased coin toss
Linear Threshold Model
Each node picks a random threshold
Each node infects the neighbor by a specified
amount
All of them need to know the
propagation probability135
![Page 136: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/136.jpg)
Current Approaches (1/2)
Use a two step approach
Step 1: Choose influence Model
Step 2: Find the subset of top-k nodes that maximizes
the influence
Bad News: Step 2 is NP-Hard [2]
Popular method for step 2: Greedy Heuristics
Greedy heuristics gives (1-1/e) approximation
Address scalability issues in the second step [3][4][5][6]
CELF [3]: Influence maximization function follows law
of diminishing returns (sub-modular)
136
![Page 137: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/137.jpg)
Limitations of current approaches
Estimation/Learning of propagation probabilities
Data-driven and model free approach
Propagation models are not content unaware
Content-aware influencer mining
No causality effect in influence model
Add causal effect in influence propagation
137
![Page 138: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/138.jpg)
Proposed Approach
Communication
Logs
Network
Structure
Frequent
Sequence
Mining
Network
Discovery of
Influencers
Ordered list of
influencers
138
![Page 139: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/139.jpg)
Sequence Cascade Mining
Let Support = 2
Peter, Charles, Kim, Vicky, Nancy
Network structure
Append neighbors from network
Check downward closure and
support count
Example
i=2
Temporal order
139
![Page 140: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/140.jpg)
Sequence Cascade Mining
Let Support = 2
Peter, Charles, Kim, Vicky, Nancy
Network structure
Append neighbors from network
Example
i=3
140
Check downward closure and
support count
![Page 141: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/141.jpg)
Sequence Cascade Mining
Let Support = 2
Peter, Charles, Kim, Vicky, Nancy
Network structure
Append neighbors from network
Example
i=4
Break loop; No more work to do…
Return Frequent Sequences
141
![Page 142: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/142.jpg)
Sequence Cascade Mining
L1 Frequent Sequence Mining
Candidate Generation for k+1
By appending neighbors
Downward closure pruning
Support Count and reorganize
142
![Page 143: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/143.jpg)
Influence Function
Node pattern Influence Set (Q(i,p))
=
Influence Set of a Node
Influence Set of a Node Set
It can be shown that influence function I(V,s) is
monotone and sub-modular143
![Page 144: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/144.jpg)
Network Discovery of Influencers
Greedy heuristic with (1-1/e) approximation
guarantee 144
![Page 145: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/145.jpg)
Network Discovery of Influencers
Frequent Sequences
P
C
K V
N
Example
Choose the node with max out-degree
C chosen
Current Set of Influenced People
(Randomly break tie between P
and C)
145
C K P
i=1
Find top-3
influencers
![Page 146: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/146.jpg)
Network Discovery of Influencers
Frequent Sequences
P
C
K V
N
Example
Choose the node with max out-degree
P chosen
Current Set of Influenced People
146
C K P
i=2
V N
![Page 147: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/147.jpg)
Network Discovery of Influencers
Frequent Sequences
P
C
K V
N
Example
Choose the node with max out-degree
K chosen
Current Set of Influenced People
147
C K P
i=3
V N
(Randomly break tie between K, V,
N)
![Page 148: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/148.jpg)
Context-Aware Influencers
Bag words
representing
the context
Frequent
Sequence
Mining
Network
Discovery of
Influencers
Context Specific
influencers
148
![Page 149: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/149.jpg)
Initial Results (1/2)
Significant performance gain over established
baseline PrefixSpan
DBLP Dataset USPTO Dataset
149
![Page 150: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/150.jpg)
Initial Results (2/2)
DBLP
Dataset
USPTO
Dataset
150
![Page 151: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/151.jpg)
Top Influencers
151
Thomas Huang
Philip Yu
Elisa Bertino
DBLP
Dataset
USPTO
Dataset
Dieter FreitagCTO, FRX Polymers
American Plastics Hall of FameProfessor, UIUC
Professor, UIC
Professor, Purdue
Akira Suzuki2010 Nobel Prize winner
In Chemistry
*
*
Wilhelm BrandesScientist, Bayer Crop Science
* Prolific inventors 1988-1997 as per USPTO announcement [7]
![Page 152: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/152.jpg)
What‟s next?
Short Term
o Implement NDISC algorithm
o Verify the influence spread of NDISC compared to popular
baselines (PMIA, DegreeDiscountIC, SPM, and SP1M)
o Qualitatively analyze the influencer results compared to
baseline algorithms
Long Term
o Need to extend the approach for dynamic time evolving
content and network structures
o Need implement influence analysis for streaming
algorithms
o Modeling of influence propagation in collaborative networks
using Hypergraphs
o Need to do this for multi-relational graphs152
![Page 153: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/153.jpg)
Overall Summary
Optimal Target People Selection
Task 3.3
UMN Team:
Karthik Subbian (Reseacher)
Jaideep Srivastava(Faculty)
Goals Optimal target people selection using
model-free influence models
Develop a content-centric approach for sequential cascade mining
Develop greedy heuristics for influencer mining from cascades
Tracking the influence of target nodes using online algorithms
Novel Ideas Optimal Target Selection
Content-centric
Mining Sequential cascades in network
Finding influencers from cascades
Discover context specific influencers
Future Plans Short Term
Implement NDISC and compare with baseline algorithms
Long Term
Finding influencer in dynamic time evolving network structures
Modeling of influence propagation in hyper-graphs and multi-relational graphs
![Page 154: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/154.jpg)
4/9/2013 University of Minnesota 154
Course Outline
• Module 1• Introduction to Social Analytics – applying data mining to social computing
systems; examples of a number of social computing systems, e.g.
FaceBook, MMO games, etc.
• Module 2 • Computational online trust
• Identifying key influencers
• Information flow in networks
• Module 3• Analysis of clandestine networks
• Katana - game analytics engine
![Page 155: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/155.jpg)
155
Untangling Dark WebsTheories, Methods, and Models for a
Computational Social Science of
Clandestine Networks
![Page 156: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/156.jpg)
Defining Clandestine Behavior
Clandestine behavior is socially and culturally constructed.
There is no computational definition of what constitutes
clandestine behavior
“Kept secret or done secretively, esp. because illicit”
(Webster‟s Definition)
Clandestine network thus involve actors and/or activities
which are illicit in nature
![Page 157: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/157.jpg)
Kevin Bacon Linked to Al-Qaeda !
Just because two things are related does not mean that they have a concrete
relationship. Unless …..
![Page 158: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/158.jpg)
Clandestine organizations as
networks
Networks are more flexible organizational forms than
markets or hierarchies [Powell 1990; Podolny 1998; Brass, Galaskiewicz, et al. 2004; Robins 2009]
Criminals are embedded within organizations
supporting division of labor and specialization [Cressy 1972; Canter & Alison 2000; Waring 2002]
Trust relations mediate functional relations like
grouping, exchange, & communication [McIntosh 1974; von Lampe 2004]
Balancing security vs. efficiency; time-to-task; resilience
vs. flexibility [Milward & Raab 2003, 2006; Morselli, Giguere, & Petit 2007]
![Page 159: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/159.jpg)
Contrasting network
assumptions
Current assumptions Network theory assumptions
1 Networks are information systems Networks are multi-functional
communication systems
2 Networks have uniplex ahistoric relations Networks have multiplex and dynamic
relations
3 Networks are hierarchically organized, C2
structures
Networks are temporary, dynamic,
emergent, adaptive, flexible
4 Boundary specification is a political tool Boundary specification is an analytic tool
5 Networks are globalized and homophilous Networks are local, glocal, global, and
heterogeneous
Stohl & Stohl (2005)
![Page 160: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/160.jpg)
Information systems?
Network ties aren‟t just for sharing information and
resources, but also represent latent relationships
Networks are not a machine to be broken, but a social
organism that can adapt and reproduce
What are underlying processes that govern how networks evolve
and maintain their structure?
Networks are structures for sensemaking & socialization
Bomb-making websites easy to disrupt but message boards which
foster communication and solidarity among people with similar
sympathies and values much less so
![Page 161: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/161.jpg)
Uniplex, ahistoric relationships?
Essential covert relationships like trust & knowledge exchanged based
on multiplex ties and attributes
Shared background, common identity, family ties, reciprocity
Jihadi cells prevalent in Montreal, London, Madrid, Hamburg because of
poor social integration & high disaffection
Longitudinal data necessary to measure changes over time
Repeated measurements of the same network actors, ties, attributes
Collection problems:
Left censoring: important changes happened before data collection
Right censoring: data collection does not last long enough to capture important
changes
Boundary specification: Omitting important actors, ties, attributes
Cognitive biases: Over or under-recall
Instrumentation: measurement inconsistencies, panel conditioning & attrition,
missing data
![Page 162: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/162.jpg)
Hierarchies?
Licit and illicit organizations are no longer hierarchical pyramids with
clear chains of command
Network identities and roles are not fixed and constant
Networks do not operate according to formal & unambiguous rules
Cells: Operations can be performed spontaneously & independently
Individuals may identify with an organization, but are not part of it
Networks operate at multiple levels
“Organizations created out of complex webs of exchange, dependency, reciprocity
among multiple organizations” (Monge &Contractor 2003)
Political parties, bureaucrats, businesses, charitable organizations, clinics, schools, &
houses of worship are legitimate peripheral actors which enable covert organizations
![Page 163: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/163.jpg)
Easily specified?
Specifying rules for including/excluding actors in network membership
is essential part of research design
Family members, politicians, businesses, charitable organizations, …
Unit of analysis: individuals, teams, organizations?
Example: Snowball sampling
Identify initial set then their links to second degree, third degree, etc.
Missing crucial seed actors, time intensive, super-abundance of unreliable data,
“everyone” connected by 6 degrees of separation
Labels are fuzzy & prone to political framing
“Terrorist vs. freedom-fighter”: UK vs. IRA? China vs. ETIM? Russia vs. Chechnya?
Afghanistan/Pakistan vs. Taliban? Iraq/Turkey vs. Kurds?
Distinguish between being in contact vs. being operational
Ability to mobilize, control, coordinate members
Look for how organizations cleave when they are forming or being changed
![Page 164: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/164.jpg)
Global & homophilous?
Terrorist networks encompass very different ideologies and goals
Religious (salafism), ethnic (Kurds), and nationalist (ETA) movements
Coercive bargaining (IRA, PLO, ETA) vs. war-inducing (AQ)
Different motivations leads to different formation processes &
structures
Hamas, Hezbollah, ETA emphasize shared background/ethnicity vs. “movements”
like al Qaeda or militias committed to shared ideology
Relational homophily drives creation of self-similar strong ties which remain dormant,
difficult to identify & enter, easy to disassemble
Ideological homophily drives creation of single-issue ties which remain salient, more
permeable boundaries means easier to enter but harder to prevent re-formation
![Page 165: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/165.jpg)
Criminal networks literature
Balancing security vs. efficiency; time-to-task; resilience vs. flexibility Milward & Raab 2003; Morselli, Giguere, Petit 2007
Secrecy-oriented networks emphasize sparse, decentralized networks to avoid
detection Erickson 1981
Decentralization a common response to authorities‟ targeting and seizure Morselli & Petit 2007
Leaders on periphery (low closeness) to avoid detection Baker & Faulkner 1993
Drug traffickers exhibit higher centralization in core of participants with stable
roles, but insulate by adding participants to extend periphery of core Morselli, Giguere, Petit 2007; Dorn, Oette, White 1998
165
![Page 166: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/166.jpg)
Covert networks are dynamic
networks
Multiple types of relationships
Familial ties, communication, exchange, authority, & other latent relationships
Actor-actor, actor-attribute, actor-event networks
Individuals central in one network are peripheral in other networks
Positions and relationships are not stable
Removing links & nodes can alter pattern but does not address underlying processes
that govern how networks evolve, maintain, dissolve
Outwardly different networks share common structural properties
Faust & Skvoretz (2002): Senate co-sponsorship most similar to cow-licking
Outwardly similar networks generated by different processes
![Page 167: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/167.jpg)
Cutpoints & bridges
Brokers are excellent targets
to disassemble networks
Cutpoint
Removing a node creates a new
component
Bridges
Removing a link creates a new
component
![Page 168: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/168.jpg)
![Page 169: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/169.jpg)
Clandestine networks are
complex
Multiple types of relationships [Stohl & Stohl 2007]
Networks have multiplex and dynamic relations – trust, exchange,
communication, authority, and other relationships
Networks are temporary, dynamic, emergent, adaptive, flexible
Networks are local, glocal, global, and heterogeneous – different ideologies,
motivations, goals lead to different structures & processes
Descriptive network analysis does not address
underlying processes of how networks emerge, stabilize,
dissolve
Goal: To disrupt network, understand and attack the
processes which create and stabilize
![Page 170: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/170.jpg)
Modeling Clandestine
Organizations
and Behaviors as
Networks
![Page 171: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/171.jpg)
Introduction to network analysis
Networks are sets of nodes
connected by links
Nodes can be people, groups,
webpages, etc.
Links can be friendships, exchanges,
affiliations, etc.
A B
![Page 172: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/172.jpg)
Networks - Directed & undirected
Communication vs. friendship networks
A B
D E
F G
C A B
D E
F G
C
![Page 173: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/173.jpg)
Degree distributions
What‟s the probability P(k)
of randomly selecting a
node with degree k in this
network?
24/31
6/31
1/31
P(k
)
k1 4 12
![Page 174: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/174.jpg)
Power laws
• Large networks can have
degree distributions that span
several orders of magnitude
• Many real world networks follow
a power law degree
distribution
• Scale free networks, 80/20 rule,
Pareto principle, Zipf‟s Law, long tail,
etc.
kkP ~)(
Internet routers Movie actors
Physicists Neuroscientists
ᵞ ᵞ
ᵞ ᵞ
![Page 175: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/175.jpg)
Deg distributions across
networks
![Page 176: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/176.jpg)
Path length
Path length: number of links
between two nodes (degrees
of separation)
BACDE = 4
Geodesic: Shortest path
length between two nodes
BAE = 2
Diameter: Network‟s largest
geodesic\eccenctricity
BAEFGH\BACJIH
A B
DC
E
F
G H
I
J K
![Page 177: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/177.jpg)
Density, clustering, centralization
Density
Observed edges in network / maximum possible
edges
Clustering
Count ties among alters, removing ego and ties to
ego
Observed ties in actor‟s ego network / maximum
possible ties in ego network
Network centralization
Variation in individual actors‟ centralities
High centralization when few actors possess higher
centrality than average
Low centralization when actors all have similar
centralities
B
D
G
F
E
C
H
I
A
![Page 178: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/178.jpg)
Paths & clustering across
networks
![Page 179: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/179.jpg)
Small worlds
Paradox: Individuals within the
network are highly clustered but
also have small average
geodesics to other members
Randomly rewiring a fraction of
links on a regularly-clustered
network drastically shortens
average eccentricity
Random rewiring, however
still maintains high clustering
over several orders of
magnitude
![Page 180: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/180.jpg)
Degree centrality
Degree: total number of links with other actors
In-degree: Directional links to actor from other actors
Out-degree: Directional links from actor to other actors
“Popularity”
A
D
B
H I
F
G
E
C
J
![Page 181: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/181.jpg)
Closeness centrality
How easily one actor can reach rest of network
Actor with shortest average path length
“Pulse-taker”
A
D
B
H I
F
G
E
C
J
![Page 182: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/182.jpg)
Betweenness centrality
How much an actor lies between distinct groups
Number of geodesics passing through actor
“Broker”
A
D
B
H I
F
G
E
C
J
![Page 183: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/183.jpg)
Multi-Dimensional Networks -
Attributes
Selection: Immutable characteristics
Race, ethnicity, gender, etc.
Simple process: Attributes remain fixed and influence how
connections are formed
Homophily
Influence: alterable characteristics
Interests, activities, infectiousness, etc.
Complex process: Feedback and interactions between ego‟s
attributes, network structure, alters‟ attributes
Diffusion and coevolution
![Page 184: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/184.jpg)
Selection and homophily
Existing attributes drive creation & destruction of
connections
“Birds of a feather flock together”
A B
D E
F G
C A B
D E
F G
C
Time
![Page 185: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/185.jpg)
Influence and diffusion
Existing connections drive creation & destruction of
attributes
A B
D E
F G
C A B
D E
F G
C
Time
![Page 186: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/186.jpg)
Multi-relational
Organizations: authority, trust, & friendship
A B
D E
F G
C
![Page 187: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/187.jpg)
Triad census & network motifs
16 possible triads types\motifs\isomorphisms in a directed network
(Reciprocated ties, unreciprocated ties, null ties)
Triad census: frequency of each structure in a network
Compare frequencies in observed network, measuring deviations against
frequencies in random networks
Simmelian ties: Transitive-reciprocated triads (3-0-0) occur appear
more frequently than any other motif in social networks
003 012 102 021D 021U 021C 111D 111U
030T 030C 201 120D 120U 120C 210 300
![Page 188: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/188.jpg)
Network families
![Page 189: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/189.jpg)
Growth & preferential
attachment
How do you generate scale-free networks? Rich get richer (Yule 1925, Simon 1955, de Solla Price 1976, Albert & Barabasi 1998)
?
![Page 190: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/190.jpg)
Data collection issues
Data on clandestine organizations are doubly hard to
obtain Clandestine networks by definition seek to avoid detection or identification
Law enforcement prohibited from collecting some types of data, reluctant to disclose
extant data to prevent adaptation
Criminal network studies to date generally rely on:
Evidence entered into court proceedings [Sparrow 1991; Baker & Faulkner 1993]
Imputation from secondary or tertiary sources [Krebs 2002; Sageman 2004]
![Page 191: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/191.jpg)
Problems with approaches
Collection of incomplete network
data seriously compromises
reliability of findings [Wasserman & Faust
1994]
Boundary specification of nodes –
peripheral & legitimate actors often
omitted despite playing crucial roles
Censored data – only one type of
interaction recorded; earlier & later ties
may be impossible to capture
Lack of attributes – occupation, gender,
affiliation, personality, psychological
states strongly influence tie-formation
behavior over time [Robins 2009]
Krebs 2002
![Page 192: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/192.jpg)
Introduction to MMOGs,
Gold Farming, and
Everquest II
![Page 193: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/193.jpg)
MMOGs
Massively Multiplayer Online Games (MMOGs)
Shared online persistent virtual environments where
millions of people can interact with one another
Players can complete quests, interact with the game
environment, interact with other players
Many of the behaviors which are observed in the real world
are also observed in MMOGs e.g., friendship, economic
behavior, backstabbing, illicit behaviors etc
![Page 194: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/194.jpg)
Gold farming
194
• Gold farming and real money trade
involve the exchange of virtual in-game
resources for “real world” money
• Laborers in China and S.E. Asia paid to
perform repetitive in-game practices
(“farming”) to accumulate virtual wealth
(“gold”)
• Western players purchase farmed gold to
obtain more powerful items/abilities and
open new areas within the game
• Market for real money trade exceeds $3
billion annually [Lehdonvirta & Ernkvist 2011]
![Page 195: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/195.jpg)
Deviance
Game companies actively ban accounts involved in gold farming operations
Why?
Upsets game economy equilibrium by inflating prices
Excludes other players from shared game environments
Automated bots/scripts ruin social interactions
Pay-to-play upsets meritocratic expectations
Theft of billing or account information
Legal ramifications of virtual items as “property”
195
![Page 196: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/196.jpg)
Identification
Most gold farmers are caught by:
Other players reporting behavior
Farmers’ solicitations and spamming
Organized “sting operations”
Administrator heuristics
Farming operations employ highly-specialized operations that have to balance practices to efficiently accumulate gold with practices to avoid detection
![Page 197: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/197.jpg)
Changes in ban reasons, all
197
![Page 198: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/198.jpg)
Ninja Metrics confidential information. Copyright 2012
Dynamics, bursts, lifecycle
![Page 199: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/199.jpg)
Mapping
Gold farmers potentially operate under similar motivations
and constraints as other clandestine or criminal
organizations
Profit motive – illicit goods with minimal input costs provide
arbitrage opportunities
Distribution challenges – suboptimal processes and
structures for generating and distributing goods so to avoid
risk of detection
Selection pressures – authorities confiscate goods and detain
participants when detected
![Page 200: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/200.jpg)
EverQuest II
Massively Multiplayer Online Role
Playing Game (MMORPG, MMO)
Data spanning from January 2006
to Mid-September 2006
2.1 million players across multiple
servers
Our analysis focuses on a subset of
the server
In-game action data available as
well as social interaction data (trust,
mentoring, questing, grouping,
trade) etc
![Page 201: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/201.jpg)
EverQuest II – Gold Farmers
Gold Farmers are explicitly labeled in the dataset
Gold Farmers constitute only a small percentage of all of
the players (1-3%)
Gold Farmers bots are not as common in EverQuest II as it
is generally assumed to be the case in MMOs
![Page 202: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/202.jpg)
Types of Gold Farmers
There are multiple types of Gold Farmers
• Gatherers: Accumulate gold or other resources
• Bankers: Low-activity reserve accounts
• Mules and dealers: Single user accounts to transmit money and
interact with the customer
• Barkers: Spammers marking services in the game
Issues: The dataset however does not have labels for
the gold farmer sub-types
Open Question: Are there other types of gold farmers?
![Page 203: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/203.jpg)
Descriptive properties
and
Statistical network
models
![Page 204: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/204.jpg)
Properties of Clandestine
Networks
Clandestine Networks are embedded in larger networks
and have been studied in isolation as well as being
embedded in larger networks
Do clandestine networks exhibit properties that distinguish
them from normal networks?
Behavioral Signatures
Structural Signatures
Spatio-Temporal Signatures
![Page 205: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/205.jpg)
Centrality comparisons
Compared to the population-at-large, do farmers or affiliates have higher or lower…
Incoming trade relationships? (In-degree)
Outgoing trade relationships? (Out-degree)
Incoming transactions? (In-weight)
Outgoing transactions? (Out-weight)
Proximity to the rest of the network? (Closeness)
Level of brokerage? (Betweenness)
Higher levels of “prestige”? (Eigenvector)
Tendency for counterparties to also trade? (Clustering)
![Page 206: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/206.jpg)
Multinomial logit
Compared to non-farmers, do farmers & affiliates have significantly different structures?
Farmers (z-statistic) Affiliates (z-statistic)
In-degree -0.0473 (-9.24)*** 0.0626 (47.21)***
Out-degree -0.189 (-5.11)*** 0.0529 (44.25)***
In-weight 0.00691 (23.04)*** 0.00851 (32.64)***
Out-weight 0.00796 (24.32)*** 0.009556 (32.97)***
Closeness -0.679 (-6.38)*** -2.438 (-13.24)***
Betweenness 1.56x10-7 (1.94) 1.36x10-7 (35.81)***
Eigenvector -10300 (-8.60)*** 5377 (35.73)***
Clustering coefficient 0.905 (8.64)*** -0.0926 (-0.81)
![Page 207: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/207.jpg)
Network characterizations
Degree distribution
What fraction, P(k), of nodes in the network have kconnections?
Weight distribution
What fraction, P(s), of links in the network have had stransactions?
Power law/scale free distributions
80/20 rule: minority of nodes have majority of links
Generated by growth and preferential attachment
Linear on a log-log plot… with some interesting exceptions
![Page 208: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/208.jpg)
Degree distribution & attenuation
1.E-05
1.E-04
1.E-03
1.E-02
1.E-01
1.E+00
1 10 100
P(k
)
k
Farmer-In Farmer-Out Affiliates-In Affiliates-Out Non-Affiliates-In Non-Affiliates-Out
![Page 209: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/209.jpg)
Growth constraints
• Power law scaling followed by exponential cut-off• Aging: old nodes stop accepting
new links
• Cost: becomes more expensive to accept new links
• Capacity: nodes stop accepting links above threshold
• Copying: new nodes imitate connections of existing nodes
1.E-05
1.E-04
1.E-03
1.E-02
1.E-01
1.E+00
1 10 100
P(k
)
k
Farmer-In Farmer-Out Affiliates-In
Affiliates-Out Non-Affiliates-In Non-Affiliates-Out
![Page 210: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/210.jpg)
Assortative v. random mixing
Is the degree of a node correlated with its neighbors’ degrees? No: random mixing
Yes, positive: assortative mixing
Yes, negative: dissortative mixing
Assortative mixing Well-connected nodes are connected to
other well-connected nodes
Poorly connected nodes connected to other poorly connected nodes
Dissortative mixing Well-connected nodes are connected to
poorly-connected nodes
![Page 211: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/211.jpg)
Assortativity in context
Dissortativity found in biological, ecological, & technological networks that require failure tolerance
Assortativity found in social and collaboration networks
![Page 212: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/212.jpg)
Comparative network analysis
How does the gold farming network compare against a “real world” criminal network?
Criminal network data hard to get a hold of in first place
Problems defining boundaries, multiplex relations, etc.
Carlo Morelli’s CAVIAR drug trafficking network (N=110, E=295)
![Page 213: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/213.jpg)
Assortativity
0.1
1
10
1 10 100
Kn
n(k
)
k
Affiliates-InIn Affiliates-OutOut Non-Affiliates-InIn Non-Affiliates-OutOut
Farmer-InIn Farmer-OutOut Caviar-InIn Caviar-OutOut
Gold farmers and drug traffickers both adopt
avoidance interaction structures
Normal players and farmer affiliates both adopt
collaborative interaction structures
![Page 214: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/214.jpg)
Conclusions – Assortativity
Non-affiliated players exhibit assortativity
Collaboration > Resilience
Affiliate network generally assortative
Collaboration > Resilience
High degree outliers likely unidentified farmers
Farmers’ network is dissortative
Collaboration < Resilience
Drug trafficking network similarly dissortative
Collaboration < Resilience
![Page 215: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/215.jpg)
Attack & failure tolerance
Failure: random removal of nodes
Attack strategies Degree attack: removal of best-connected nodes
Edge attack: removal of high-transaction dyads
Outcomes Fraction of remaining nodes in largest connected component
Fraction of remaining nodes as isolates
![Page 216: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/216.jpg)
Study 2 - Attack & failure analysis
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0.01% 0.10% 1.00% 10.00% 100.00%
Fra
cti
on
Node fraction removed
Degree attack - LCC Degree attack - Isolate fraction Random failure - LCC
Random failure - Isolate fraction Edge attack - LCC Edge attack - Isolate fraction
Degree attack fractures farming network
faster than random failure or edge attack
![Page 217: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/217.jpg)
Comparative attack & failure analysis – LCC fraction
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0.01% 0.10% 1.00% 10.00% 100.00%
Fra
cti
on
Node fraction removed
Farmer attack - LCC fraction Farmer failure - LCC fraction Caviar attack - LCC fraction Cavaiar failure - LCC fraction
![Page 218: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/218.jpg)
Comparative attack & failure analysis – Isolate fraction
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0.01% 0.10% 1.00% 10.00% 100.00%
Fra
cti
on
Node fraction removed
Farmer attack - Isolate Farmer failure - Isolate Caviar attack - Isolate Caviar failure - Isolate
![Page 219: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/219.jpg)
Conclusions – Attack tolerance
Edge attack strategy has poorer performance than even random failure
Gold farmers & drug traffickers respond similarly to degree attack
![Page 220: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/220.jpg)
Month 1
220
![Page 221: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/221.jpg)
Month 2
221
![Page 222: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/222.jpg)
Month 3
222
![Page 223: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/223.jpg)
Month 4
223
![Page 224: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/224.jpg)
Month 5
224
![Page 225: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/225.jpg)
Month 6
225
![Page 226: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/226.jpg)
Month 7
226
![Page 227: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/227.jpg)
Month 8
227
![Page 228: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/228.jpg)
Month 1 in Month 5
228
![Page 229: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/229.jpg)
Month 4 in Month 5
229
![Page 230: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/230.jpg)
Month 5 in Month 5
230
![Page 231: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/231.jpg)
Month 6 in Month 5
231
![Page 232: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/232.jpg)
Month 8 in Month 5
232
![Page 233: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/233.jpg)
Predictive Modeling and
Machine Learning
approaches
![Page 234: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/234.jpg)
Gold Farmer Detection
How does one catch gold farmers?
Are there characteristics which can
be used to distinguish gold farmers?
What attributes can be used to detect GFs?
Player Character Attributes: Gender, Class etc of the player
character
Player Activity Data: Player activities over the course of time e.g.,
number of quests, type of quests, NPCs killed?
Player Socialization Data: Grouping others, trust, mentoring
Player Demographics: Real world age, gender, location
![Page 235: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/235.jpg)
Machine Learning Approaches
Multiple ways to catch gold farmers:
Binary Classification Problem
Multi-class Classification
One class Classification problem
Cascading Classifiers Problem
Outlier Detection
Label Propagation Problem
Combination of these
Class Labeling Issues: Not all players who are labeled as
normal players are normal players
![Page 236: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/236.jpg)
Machine Learning Binary Classification
Approach
Two main classes: GFs and non-GFs
Highly Skewed Distribution
9,178 Gold Farmer Characters out of a total of 2.1
million characters
Standard ser of combinations of classifiers and
features e.g., Naive Bayes, Bayes Net, Logistic
Regression, KNN, J48, JRip, AdaBoost and SMO etc
236
![Page 237: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/237.jpg)
Feature Set
• Demographic Features*
• Performance Features
• Task distributions (set of tasks performed)
• Sequence of activities performed by gold farmers
• Examples: KKKdDKdEESSKD, SSSEKdKdDD
– K= Killed Monster, d = damage points, D = Character Death, S =
Completed a recipe e.g., spell* All information is annonymized.
237
![Page 238: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/238.jpg)
Sequence Patterns
238
![Page 239: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/239.jpg)
239
Binary Classification Approach:
Classification Models
![Page 240: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/240.jpg)
Initial Classification Results
Frequent Pattern Mining Association
![Page 241: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/241.jpg)
Label Propagation
Research Problem: Not all people who are labeled as
normal players are such. Some of them are gold farmers
but have not been identified as such
Analogue: Not all people who are free i.e., not in jails
are innocent
Solution: Use label propagation to label people who
may be gold farmers
Analogue: If someone is socializing almost exclusively
with criminals then he may be a criminal
![Page 242: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/242.jpg)
Label Propagation
(a) Classification Results (b) Social Networks within
class boundaries
![Page 243: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/243.jpg)
Label Propagation
Propagate Labels based on the Social Networks of the Gold Farmers
• Guilt by association
• Propagate labels based on
the social neighborhood and
socialization patterns of players
• If player A spends > 80% of time
socializing (grouping, trusting,
mentoring other gold farmers then
he is likely a gold farmer
• Alternative methods: Propagation
based on similarities
![Page 244: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/244.jpg)
Results
Classifier Metric Initial Dataset Label Prop Change In Performance Lift
Bayes Net Precision 0.17 0.189 0.019 1.11
Recall 0.834 0.819 -0.015 0.98
F-Score 0.282 0.307 0.025 1.09
J48 Precision 0.494 0.62 0.126 1.26
Recall 0.189 0.337 0.148 1.78
F-Score 0.273 0.437 0.164 1.60
J Rip Precision 0.495 0.537 0.042 1.08
Recall 0.462 0.462 0 1.00
F-Score 0.478 0.497 0.019 1.04
KNN Precision 0.436 0.46 0.024 1.06
Recall 0.396 0.428 0.032 1.08
F-Score 0.415 0.443 0.028 1.068
Logistic Regression Precision 0.455 0.534 0.079 1.17
Recall 0.189 0.271 0.082 1.43
F-Score 0.267 0.36 0.093 1.35
Naïve Bayes Precision 0.146 0.142 -0.004 0.97
Recall 0.538 0.502 -0.036 0.93
F-Score 0.23 0.221 -0.009 0.96
Adaboost w/ DT Precision 0.405 0.471 0.066 1.16
Recall 0.105 0.08 -0.025 0.76
F-Score 0.167 0.137 -0.03 0.82
![Page 245: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/245.jpg)
Results Comparison
MetricSocialComp’09
(F0)Label
Propagation
Lift (Social Comp vsNew Features + Label
Propagation)
Precision 0.493 0.537 1.089Recall 0.304 0.462 1.520
F-Score 0.376 0.497 1.322
• The recall improves significantly from the previous results which implies that the
accounts that we are catching more gold farmers
• In case of Label propagation the precision increases from 0.49 to 0.54 which
implies that more of the users being identified as gold farmers by us are indeed
gold farmers
![Page 246: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/246.jpg)
Ninja Metrics confidential information. Copyright 2012
Gold Farmer Detection as a One class
classification Problem
• The labels of one class are known for certain
• The labels for the rest of the records are not known
with certainty
• Use the known class for training and classify the
rest of the records for the known class
• Issues: The known class consists of many
subclasses which different feature sets
![Page 247: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/247.jpg)
Disambiguating Gold Farming
sub-classes
Heuristics can be used to disambiguate the different sub-
classes
Gatherers have high in-game intense activity associated
with them
Mules have high trade volume but low in-game activity
Bankers have low level trade volume and low in-game
activity
Spammers have little of no trade activity (in general) but
denser chat networks
![Page 248: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/248.jpg)
Ninja Metrics confidential information. Copyright 2012
Disambiguating Gold Farming sub-
classes
248
High Intensity Gatherer
High Intensity Normal Player
Banker
Player with Periodic Behavior
Low intensity Player
0000 hour 1200 hours
0000 hour 1200 hours
0000 hour 1200 hours
0000 hour 1200 hours
0000 hour 1200 hours
![Page 249: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/249.jpg)
Research Question
“How do Gold Farmers change their
behaviors as a consequence of game
admin's behaviors?”
Can we anticipate GF change in
behaviors in advance?
![Page 250: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/250.jpg)
Change Detection in Gold Farmer Behavior
• Research Question: How do gold famers respond to global
enforcement of policies by the game admins
• The in-game activities of gold farmers can be represented as a time series
• Applied clustering to these series and 3 clusters made the
most sense (most clear separation)
• Each cluster can be mapped to a gold farmer subtype
• However 20-30% of all players
in each cluster are non-GFs
• Change detection to determine
when the time series changed
250
![Page 251: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/251.jpg)
Interpretation: Global Changes in GF Behaviors - Adaptation
• Gold farmers and game admins change their activities
based on how the other acts in the game
• Previously there was anecdotal evidence for this change,
we have established that this happens at not just the activity
level but at the structural pattern level
• Can we predict how the gold farmers will act if game
admins adopt a certain banning policy?
251
![Page 252: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/252.jpg)
Research Question
“A plague upon‟t when thieves
cannot be true one to another!”– Sir Falstaff, Henry IV, Part 1, II.ii
Do gold farmers
trust each other?
![Page 253: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/253.jpg)
Housing-Trust in EQ2
• Access permissions to in-game house
as trust relationships
• None: Cannot enter house.
• Visitor: Can enter the house and can
interact with objects in the house.
• Friend: Visitor + move items
• Trustee: Friend + remove items
• Houses can contain also items which
allow sales to other characters without
exchanging on the market
253
![Page 254: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/254.jpg)
Hypergraphs to Represent Tripartite Graphs
254
• Accounts can have several characters
• Houses can be accessed by several characters
• Projecting to one- or two-model data obscures crucial
information about embededdness and paths
• Figure 2a: Can ca31 access the same house as ca11?
• Figure 2b: Are characters all owned by same account?
![Page 255: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/255.jpg)
Hypergraphs: Key Concepts
• Hyperedge: An edge between three or
more nodes in a graph. We use three
types of nodes: Character, account and
house
• Node Degree: The number of
hyperedges which are connected to a
node
• NDh1 = 3
• Edge Degree: The number of
hyperedges that an edge participates in
• EDa1-h1 = 2
255
![Page 256: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/256.jpg)
Approach
• Game administrators miss gold farmers and deviance is not
a simple binary classification task
• Guilt by association: Identify “affiliates” who have ever
interacted with identified gold farmers, but have not been
identified as gold farmers themselves
B CA
Farmer Affiliate Non-affiliate
![Page 257: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/257.jpg)
Network Characteristics
257
• Long tail distributions are observed for the various degree distributions
• The mapping from character-house to an account is always unique
![Page 258: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/258.jpg)
Characteristics of Hypergraph Projection Networks
• Account Projection: Majority of the gold farmer nodes are isolates
(79%). Affiliates well-connected (8.89) vs non-affiliates (3.47)
• Character Projection: Majority of the gold farmer nodes are isolates
(84%). Affiliates well-connected (10.42) vs non-affiliates (3.23)
• House Projection: 521 gold farmer houses. Most are isolates (not
shown) but others are part of complex structures. Densely connected
network with gold farmers (7.56) and affiliates (84.02)
258
![Page 259: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/259.jpg)
Key Observations
• Picky picky: Gold farmers grant trust ties less frequently than
either affiliates or general players
• Gold farmers grant and receive fewer housing permissions
(1.82) than their affiliates (4.03) or general player population
(2.73)
259
Total degree In degree Out degree
< n > < nGF > < nAff > < n > < nGF > < nAff > < n > < nGF > < nAff >
Farmers 1.82 0.29 1.82 0.89 0.29 0.89 1.07 0.29 1.07
Affiliates 4.03 1.28 0.70 1.55 0.75 0.70 2.88 0.63 0.70
Non-Affiliates 2.73 - 7.77 1.57 - 5.98 1.56 - 2.34
![Page 260: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/260.jpg)
Key Observations
• No honor among thieves
• Gold farmers also have very low tendency to grant other
gold farmers permission (0.29)
• Affiliates also unlikely to trust other affiliates (0.70)
260
Total degree In degree Out degree
< n > < nGF > < nAff > < n > < nGF > < nAff > < n > < nGF > < nAff >
Farmers 1.82 0.29 1.82 0.89 0.29 0.89 1.07 0.29 1.07
Affiliates 4.03 1.28 0.70 1.55 0.75 0.70 2.88 0.63 0.70
Non-Affiliates 2.73 - 7.77 1.57 - 5.98 1.56 - 2.34
![Page 261: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/261.jpg)
Key Observations
• Affiliates are brokers:
• Farmers trust affiliates more (1.82) than other farmers (0.29)
• Affiliates trust farmers more (1.28) than other affiliates (0.70)
• Non-affiliates have a greater tendency to grant permissions to
non-affiliates (7.77) than in general (2.73)
261
Total degree In degree Out degree
< n > < nGF > < nAff > < n > < nGF > < nAff > < n > < nGF > < nAff >
Farmers 1.82 0.29 1.82 0.89 0.29 0.89 1.07 0.29 1.07
Affiliates 4.03 1.28 0.70 1.55 0.75 0.70 2.88 0.63 0.70
Non-Affiliates 2.73 - 7.77 1.57 - 5.98 1.56 - 2.34
![Page 262: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/262.jpg)
Frequent Pattern Mining: Key Terms
262
t1: Beer, Diaper, Milk
t2: Beer, Cheese
t3: Cheese, Boots
t4: Beer, Diaper, Cheese
t5: Beer, Diaper, Clothes, Cheese, Milk
t6: Diaper, Clothes, Milk
t7: Diaper, Milk, Clothes
• Items: Cheese, Milk, Beer, Clothes, Diaper, Boots
• Transactions: t1,t2, …, tn
• Itemset: {Cheese, Milk, Butter}
• Support of an itemset: Percentage of transactions which
contain that itemset
• Support( {Diaper, Clothes, Milk} ) = 3/7
Market Basket Transaction
dataset example
![Page 263: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/263.jpg)
Frequent Itemset Mining for Frequent Hyper-subgraphs
o Support of a Hyper-subgraph: Given a sub-hypergraph of size
k, subP is the pattern of interest containing the label P, shP is a
pattern of the same size as subP and contains the label P, the
support is defined as follows:
263
Support of pattern also containing a gold farmer (red) = 5/8
![Page 264: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/264.jpg)
Frequent Itemset Mining for Frequent Hyper-subgraphs
o Confidence of a Hyper-Subgraph: Given a sub-hypergraph of
size k, subP is the pattern of interest containing the label P, subG
is a pattern which is structurally equivalent but which does not
contain the label P, the confidence is defined as follows:
264
Confidence of pattern and containing a gold farmer = 5/7
![Page 265: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/265.jpg)
Frequent Patterns of GFs
• Less than 0.1 support and confidence for almost all
(except 8) frequent patterns with gold farmers
• Remaining 8 patterns can be used for discrimination
between gold farmers and non-gold farmers
• Gold farmers & affiliates are more connected: A third of
more complex patterns (k >= 10 nodes) are associated
with affiliates (15/44)
265
![Page 266: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/266.jpg)
Conclusion and contributions
266
Using hypergraphs to represent
complex data structures and
dependencies
Application of frequent pattern mining to
discover distinct trust patterns associated
with gold farmers
No honor between thieves:
Gold farmers tend not to trust other
gold farmers
![Page 267: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/267.jpg)
Implications
Social organization and behavioral
patterns of clandestine activity as
co-evolutionary outcomes
Using online behavioral patterns to inform
and develop metrics/algorithms for
detecting offline clandestine activity
Clandestine networks as “dual use”
technologies – ethical and legal
implications of improving detection?
[Keegan, Ahmad, et al. 2011]
![Page 268: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/268.jpg)
Limitations and future work
• Housing/trust ties mediated by other or multiplex
relationships
• Communication, grouping, mentoring, trading, etc.
• Multiple types of deviance and deviants: Modeling
role specialization & division of labor
• Using frequent subgraphs patterns as
discriminating features for ML models
• Changes in frequent subgraphs over time
268
![Page 269: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/269.jpg)
Contraband; Online and Offline
o Contraband are illegally obtained items
constituting a parallel or shadow economy
which evade regulation or taxation.
269
o Governments try to interrupt such exchanges especially
when they involve dangerous items like weapons and
drugs
o Extremely difficult to obtain data about contraband. Thus
analysis is limited
o Online analogues of such behaviors offer the possibility
of analyzing such behaviors and closing this gap
![Page 270: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/270.jpg)
Research Questions
oWhat do the trade networks of gold
farmers & normal players look like?
270
oDo gold farmers exhibit distinctive
behavioral patterns for buying and
selling items?
oCan we use contraband
networks to catch gold farmers?
oWhat are the characteristics of
contraband networks in MMOs?
![Page 271: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/271.jpg)
271
Consignment Trade in EverQuest II
o Gold farmer trading activity is a significant fraction of the trading activity
and then it significantly decline
o Possible Explanation: (i) Real decline in gold farming activity (ii) Gold
farmers change their strategies to evade detection
![Page 272: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/272.jpg)
272
Consignment Trade in EverQuest II
![Page 273: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/273.jpg)
273
Gold Farmer Trading Items
• Even though the trade volume of gold farmers decreases over
time, the total number of unique items traded by them
remains constant
• This implies that there is a subset of items that gold farmers
are interested in buying and selling
• Specialized items have a low trade volume
![Page 274: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/274.jpg)
Frequent Pattern Mining for Contraband and Trade
Network
oMain Idea: There certain behaviors and/or
activities that are associated more with gold
farmers as compared to other people
oOnce identified these can be used as feature
sets to build models to classify people as
deviant vs. non-deviant
oWe analyze the social
networks as well as the
item-usage networks of gold farmers
![Page 275: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/275.jpg)
Item-Projection Networks and FPM Framework
oTwo mode network of players and items sold
oProjection network of items: An edge is
created between two items if they have been
traded by the same person
oWhat are the items which are traded by gold
farmers as compared to others?
Item A
Item B
Player XItem A
Item B
![Page 276: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/276.jpg)
Frequent Items Associated with Gold Farmers
![Page 277: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/277.jpg)
Characteristics of Items associated with Gold Farmers
o With normal player a range of items types are associated with buying a
selling activity
o Surprisingly not only are certain items associated with gold farmer
selling activity but also buying activity
o The items that gold farmers buy are usually low end items (i) Used for
crafting (ii) Cornering the market?
o The items that are sold by gold
farmers are usually high end items
and in many cases almost
exclusively sold by them
![Page 278: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/278.jpg)
Construction of Item-Networks
o Apply standard frequent-pattern mining techniques (Apriori, FP-Tree) to
determine frequently traded items
o For all the items which occur frequently create an edge between them
o For items which are sold in different transactions by the same people then
also form an edge between them
![Page 279: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/279.jpg)
Examples of Extracted Item Network
![Page 280: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/280.jpg)
Frequent Subgraphs as features
oMain Idea: In addition to the features based on
player characteristics use sub-graphs as
features
![Page 281: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/281.jpg)
Prediction Models
Model 1 (Player Attribute Based Features): These features are based on the attributes of the player‟s
character in the game e.g., character race, character gender, distribution of gaming activities etc. These
are the same features which were used by Ahmad et al [SocCom‟09].
Model 2 (Item Based Features): These are the features which are derived from items bought and sold
from the consignment network. These features are based on the frequency of the frequent items sold or
bought by gold farmers.
Model 3 (Player Attribute & Item Based Features): All the attributes from the previous two models.
Model 4 (Item Network Based Features): Features which are derived from the item network in a
manner analogous to Model 2.
Model 5 (Player Attribute & Item-Network Based Features): A combination of features from Model 1
and Model 4.
Model 6 (Item Network & Item-Network Based Features): A combination of features from Model 2 and
Model 4.
Model 7 (Player Attribute, Item & Item-Network Based Features): Union of all the features described
above.
![Page 282: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/282.jpg)
Results
A set of standard machine learning models are used: Naive Bayes, Bayes Net,
Logistic Regression, KNN, J48, JRip, AdaBoost and SMO
![Page 283: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/283.jpg)
283
Conclusion
o Described a phenomenon analogous to contraband in the offline world
o Analysis of gold farmer item networks reveal that they exhibit characteristics
which are different from that of normal players
o Items that gold farmers buy are usually low end items and items that they sell
are often high end items
o One can use frequent items and networks
of such items associated with gold farmers to build classifiers which can be
used to catch gold farmers
![Page 284: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/284.jpg)
Future work
Association rule learning for temporal patterns bursts of
activity predict gold farming?
Statistical modeling of sudden changes and system
responses in network over time
Expanding hypergraph approaches to representing
complex relationships
Comparative analysis against EVE Online, other games
with similar “gold farmers get banned” regimes
![Page 285: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/285.jpg)
Some ethical quandaries
Euphemisms abound: “removing” links and nodes Should scholars be engaged in “destructive science”?
Clandestine network analysis as dual use technology “Terrorist vs. freedom-fighter”
Used for good (MENA revolutions), evil (al-Qaeda), unclear (Wikileaks)
Legal dimensions of information theory and methods Different assumptions in model & methods foreground different suspects
Minimize false positives or false negatives? Maximize true positives or true
negatives?
![Page 286: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/286.jpg)
Legal questions
If moral, ethical, and legal boundaries that constrain
behavior are indistinct or unenforceable in virtual world,
who defines regulations to define and curb excesses? Some normative rules hard-wired into code, others permitted by code
[Lastowka & Hunter 2006; Grimmelmann 2006; Ondrejka 2006]
From false positives to due process Responsibility to disclose proprietary methodological approaches?
Heightened or different burden of proof given superabundance of data?
Expectations of privacy?
Demonstrating intent?
Right to representation and due process?
![Page 287: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/287.jpg)
Katana:
A Game Analytics Engine
![Page 288: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/288.jpg)
![Page 289: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/289.jpg)
MMOG/VW
Game Knowledge
Game logs @ partner
3rd Party Custom
Data/Axciom/TRW
Analysis Engine
HADOOP – Cloudera ReleaseRDB - MSSQL Server
MMOG Vendor
Actionable Insights
Free Websites
AnalyticsArchitecture
Data Piped from partner
289
![Page 290: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/290.jpg)
Game
Data
290
Sony
CR3
…
Data
Modeling
Data
Warehouse
Hadoop Cluster• 4 Units of Dell PowerEdge R510 (rack server)• Each with 14 TB hard drive, 12 GB memory, 2 CPU x 8 parallelization = 16 CPUs
Data
Mart
Domain
Knowledge
• Data cleaning• Data transformation• Normalization• Loading
MS SQL Server 2008• Dell Precision 1500 (Workstation)• 1.5 TB hard drive, 4 GB memory, 2 Dual core CPUs
Katana Analytics Engine (UI)
Analytics Engines
Standalone Java applications
Churn
Analysis
Gold
Farming
Network
Value
Terabytes
•Alienware workstations• 2 TB hard drive, 8 GB memory, 2 Dual core CPUs
• Java applet embedded in web browser• Stand-alone deployable Java appletScheduled
Pipeline Flow
Analytics Pipeline
![Page 291: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/291.jpg)
![Page 292: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/292.jpg)
![Page 293: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/293.jpg)
![Page 294: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/294.jpg)
![Page 295: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/295.jpg)
![Page 296: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/296.jpg)
Google: Aspiring to know all
aspects of your social life
![Page 297: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/297.jpg)
Social Networks
Social Multimedia
Networks
Text-based Conversation
Networks
![Page 298: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/298.jpg)
Data Collected, Benefits, Impact Data collected
Activity logs: Clicks, page visits, chats, friends, uploads,
downloads, tags, comments, responses, joining and leaving of
communities, people friended, people blocked, ads clicked on,
ads ignored, frequency of logging in.
Network Logs: What are people‟s activities when I tag,
comment, respond, stay idle, post, recommend and ignore?
Basically: „Too much data about you‟
Benefits
Advertising: Easiest way to spread an infection by tapping key
nodes in Google's network
Sentiment Analysis: Detect how your product or service is
doing
Etc., etc., etc., etc., basically too many to list
Impact
Tremendous
Also very scary!
![Page 299: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/299.jpg)
Facebook: The operating
system for your social life,
and more
![Page 300: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/300.jpg)
> 950 million users for Facebook
>14% of humanity!!
>35% of all who have computers!!!
> 50% of Facebook users log on every day
(http://www.facebook.com/press/info.php?statistics)
spending an average of 14 minutes per day
(http://mashable.com/2010/02/16/facebook-nielsen-stats/)
Ultimate social data, no end to what can be done with it
No wonder FB is being pegged at $100+ billion IPO
They know everything
So even if Google doesn‟t scare you, Facebook should!
![Page 301: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/301.jpg)
Impact of new instrumentation on science
1950s
Invention of the electron microscope fundamentally changed
chemistry from „playing with colored liquids in a lab‟ to „truly
understanding what‟s going on‟
1970s
Invention of gene sequencing fundamentally changed biology from
a qualitative field to a quantitative field
1980s
Deployment of the Hubble (and other) Space telescopes has had
fundamental impact on astronomy and astrophysics
2000s
Massive adoption is fundamentally changing social science
research
Massively Multiplayer Online Games (MMOGs) and Virtual Worlds
(VWs) are acting as „macroscopes of human behavior‟
![Page 302: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/302.jpg)
Some leading edge US research
programs we are participating in
Biggest
US Army‟s Network Science Collaborative Technology Alliance
Led by BBN, with over 25 institutions and 150 researchers
Approximately $200 million over a 10 year period (2009 – 2019)
http://www.ns-cta.org/ns-cta-blog/
A number of DARPA programs
Social Media in Strategic Communication (SMISC)
Started in November 2011
http://www.darpa.mil/Our_Work/I2O/Programs/Social_Media_in_Strategic_Comm
unication_(SMISC).aspx
Graph-Theoretic Research in Algorithms and PHenomenology of Social
Networks (GRAPHS)
Started in March 2012
http://www.darpa.mil/Our_Work/DSO/Programs/Graph-
theoretic_Research_in_Algorithms_and_the_Phenomenology_of_Social_Network
s_%28GRAPHS%29.aspx
![Page 303: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/303.jpg)
Summary – The Big Picture
Converging trends
Rapid increase in the usage of the Internet/Web
increased amount of interactions on line
huge amount of socialization on line
Increase in resolution and deployment of data collection „probes‟, e.g. GPS,
cell phone/PDA, wireless enabled laptop, RFID tags, …
increased ability to monitor and record interactions at a really fine
granularity
Dramatic increase in storage capacity and decrease in storage costs
feasible to store all the data collected
Fundamental advances in computational methods for data analytics
Becoming possible to really understand individual and group
behavior at a fine granularity
Great opportunities for
Basic R&D
Applied R&D
Entrepreneurship
But, putting together the right team and partnerships is critical!
![Page 304: Jaideep](https://reader035.vdocument.in/reader035/viewer/2022081801/5551355bb4c905b3598b529f/html5/thumbnails/304.jpg)
and last,
but certainly not the least
- thank you for your invitation