jaideep

304
4/9/2013 University of Minnesota 1 Social Analytics: Mining Behaviors of a Connected World. PAKDD School, April 11, 2013, Sydney, Australia. Jaideep Srivastava, University of Minnesota [email protected]


Page 1: Jaideep

4/9/2013 University of Minnesota 1

Social Analytics: Mining Behaviors of a Connected World

PAKDD School, April 11, 2013

Sydney, Australia

Jaideep Srivastava

University of Minnesota

[email protected]

Page 2: Jaideep

4/9/2013 University of Minnesota 2

Course Outline

• Module 1
  • Introduction to Social Analytics: applying data mining to social computing systems; examples of a number of social computing systems, e.g. Facebook, MMO games, etc.

• Module 2
  • Computational online trust
  • Identifying key influencers
  • Information flow in networks

• Module 3
  • Analysis of clandestine networks
  • Katana: game analytics engine

Page 3: Jaideep

Part I: Background

Page 4: Jaideep

Social Network Analysis

Social science networks have widespread application in various fields

Most of the analysis techniques have come from Sociology, Statistics and Mathematics

See (Wasserman and Faust, 1994) for a comprehensive introduction to social network analysis

12/02/06 IEEE ICDM 2006 4

[Figure: fields contributing to social network analysis: Epidemiology, Sociology, Organizational Theory, Social Psychology, Anthropology. Network types shown: Socio-Cognitive Networks, Cognitive Knowledge Networks, Social Networks, Knowledge Networks. Axes: Perception vs. Reality; Acquaintance (links) vs. Knowledge (content).]

Page 5: Jaideep

What have been its key scientific successes?

In classical social sciences numerous results

'Six degrees of separation' [Milgram]

Popularized by the 'Kevin Bacon game'

'The strength of weak ties' [Granovetter]

'Online networks as social networks' [Wellman, Krackhardt]

'Dunbar Number'

Various types of centrality measures

Etc.

In the Web era

'The Bow-Tie model of the Web' [Raghavan]

'Preferential attachment model' [Barabasi] (Yes and No)

'Power law of degree distribution' [Lots of people] (NO!)

Etc.

Page 6: Jaideep

Application successes

Numerous in social sciences

Google – PageRank

LinkedIn – expanding your Cognitive Social Network

making you aware that 'you're more connected and closer than you think you are'

Expertise discovery in organizations

Knowledge experts, 'authorities'

Well-connected individuals, 'hubs'

Rapid-response teams in emergency management

Information flow in organizations

Twitter – real time information dissemination

Etc.

Page 7: Jaideep

Online (Multiplayer) Games

[Figure: games placed on two axes, how engaging (low to high) and how social (low to high).]

Page 8: Jaideep

Player Behavior & Revenue Model

Blizzard (subscription)

World of Warcraft

12 million subscribers

Revenue model: $15/month, approx. $3 billion annual revenue

Play time: 4 hours a day, 7 days a week!

Audience: hard-core gamers; less socially acceptable; like cocaine

Zynga (free2play)

Farmville, Fishville, Mafia Wars, etc.

180 million players

Revenue model: virtual goods, $700 million in 2010

Play time: 0.5 hrs a day, 7 days a week

Audience: everyone; more socially acceptable; like caffeine

Page 9: Jaideep

Implications of this 'addiction'

3 billion hours a week are being spent playing online games

Jane McGonigal in “Reality is Broken”

Labor economics

What is the impact of so much labor being removed from the pool? [Castronova]

Entertainment economics

If MMO players can get 100 hrs/month of entertainment by spending $25 or so, what will happen to other entertainment industries?

Psychological/Sociological

Is it an addiction? This is the prevailing view (Chinese government's 'detox centers' for kids)

Are they fulfilling a deeper need that the real world is not? (McGonigal)

Societal

A trend far too important to not be taken seriously!

Page 10: Jaideep

Business Example

Page 11: Jaideep

Levi's – Example of Social Retail

Levi's leverages its brand to ensure customers provide their social network

Levi's can leverage predictive social analytics technology to understand the value of the customer's social network

Page 12: Jaideep

Ninja Metrics confidential information. Copyright 2012

Opportunity, Innovation, Impact

Companies do not understand the social graph of their customers

It's not just about how they relate to their customers, but also about how customers relate to each other

Understanding these relationships unlocks immense value

Innovation: Understanding the social network of customers

Key influencers, relationship strength, …

Impact: Deriving actionable insights from this understanding

Customer acquisition, retention, customer care, …

Social recommendation, influence-based marketing, identifying trend-setters, …

12

vs.

Page 13: Jaideep

Unlocking true value by product, category, or store

[Figure: values shown include -$79.63, -$128.61, and -$293.79.]

Page 14: Jaideep

True Value of each customer

True value = individual value + social value

Who really matters, and to what degree

Some empirical facts

31% activity due to socialization

23% more individual + 8% more social activity

[Figure: the individual's lifetime value, their social influence, and their true total.]
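The "true value = individual value + social value" relation above can be sketched in a few lines. This is only an illustration: the class name, the customer names, and the dollar amounts are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class CustomerValue:
    individual: float  # value from the customer's own activity
    social: float      # value attributed to their influence on others

    @property
    def true_value(self) -> float:
        # True value = individual value + social value (relation from the slide)
        return self.individual + self.social


# Hypothetical customers; the amounts are made up for illustration.
customers = {
    "alice": CustomerValue(individual=120.0, social=80.0),
    "bob": CustomerValue(individual=150.0, social=5.0),
}

# Rank customers by true value rather than by individual spend alone.
ranked = sorted(customers, key=lambda name: customers[name].true_value, reverse=True)
```

In this made-up case bob spends more individually, but alice ranks first once social value is counted, which is the point of "who really matters, and to what degree".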

Page 15: Jaideep

Impact of New Instrumentation on Science

1950s

Invention of the electron microscope fundamentally changed chemistry from 'playing with colored liquids in a lab' to 'truly understanding what's going on'

1970s

Invention of gene sequencing fundamentally changed biology from

a qualitative field to a quantitative field

1980s

Deployment of the Hubble (and other) Space telescopes has had

fundamental impact on astronomy and astrophysics

2000s

Massive adoption is fundamentally changing social science

research

Massively Multiplayer Online Games (MMOGs) and Virtual Worlds (VWs) are acting as 'macroscopes of human behavior'

Page 16: Jaideep

The Virtual World

Observatory (VWO) Project

• Four PIs, 30+ Post-docs, PhD and MS students, UGs, high-schoolers

• Noshir Contractor, Northwestern: Networks

• M. Scott Poole, Illinois Urbana-Champaign/NCSA: Groups

• Jaideep Srivastava, Minnesota: Computer Science

• Dmitri Williams, USC: Social Psychology

• Collaborators

• Castronova (Sociology, Indiana), Yee (Xerox PARC), Consalvo, Caplan

(Economics, Delaware), Burt (Sociology, U of Chicago), Adamic (Info Sci,

Michigan), …

• Data and technology partners

• Sony (EverQuest 2), Linden Lab (Second Life), Bungie (Halo 3), Kingsoft (Chevalier's Romance), others …

• Cloudera Systems (Hadoop), Microsoft (SQL Server), Weka, …

Page 17: Jaideep

Overall Goals of the VWO Project

Sticky Social Media
• Games
• Virtual Worlds
• Other social apps

Analytics
• Social science models
• New algorithms
• Ultra-scalability

tons of data

Basic Science
• behavior, socialization, …
• novel, scalable algorithms
• NSF

Government applications
• team dynamics (Army)
• skills acquisition, leadership (Army)
• social influence & adolescent health (CDC)
• etc.

Business applications
• customer churn
• game design
• anti-social behavior
• etc.

Page 18: Jaideep

Collaborators, Sponsors, Partners

Team

Financial Sponsors

Data Partners

Technology Partners

Page 19: Jaideep

Part II – Impact on Science

Page 20: Jaideep

Findings from a Player Survey

Page 21: Jaideep

Who is playing?

It is not just a bunch of kids

Average age is 31.16 (US population median is 35)

More players in their 30s than in their 20s.

Page 22: Jaideep

How much do they play?

Mean is 25.86 hours/week

Compares to US mean of 31.5 for TV (Hu et al., 2001)

• From prior experimental work, MMO play eats into entertainment TV and going out, not news

• So much for kids being the ones with the free time.

Page 23: Jaideep

Gender Differences

More male players (78% vs. 22%)

Men played to compete, and women played to socialize

Men play more other games, but the women were the more satisfied EQ2 players

Actual play times

Women: 29.32 hours/week

Men: 25.03 hours/week

Likelihood of quitting (“no plans to quit”): women 48.66%, men 35.08%

Self-reported play times

Women: 26.03 (3 hours less than actual)

Men: 24.10 (1 hour less than actual)

Boys and girls are socialized early on, and thus have clear role expectations for their behaviors and identities (Gender Role Theory in action!!)

Page 24: Jaideep

Playing with a partner

Page 25: Jaideep

Inferring RW gender from VW data

Goal

What virtual world behaviors and characteristics predict real world gender?

Data:

Survey Data n=7119

Survey Character Store

EQ2 Character Store

Variables

Avatar Characteristics:

Gender, Race, Class, Experience, Guild Rank, Alignment, Archetypes

Game play Behaviors:

Total Deaths & Quests, PvP Kills & Deaths, Achievement Points, Number of Characters, Time played, Communication patterns

25

Page 26: Jaideep

Gender prediction results

Close to 95% prediction accuracy

Decision trees work rather well

Character gender, race and class are significant predictors of real-life gender

Gender swapping is relatively rare, but systematically different by real gender

Players tend to choose:

character gender based on their real-life gender

character races that are gendered: women play elves, men play barbarians

classes that are gendered: women play priests, men play fighters

Page 27: Jaideep

Gender swapping behavior

Game character gender by real gender

Character gender    Real male        Real female      Total
Male                4065 (82.6%)     98 (8.2%)        4163 (68.0%)
Female              855 (17.4%)      1104 (91.8%)     1959 (32.0%)
Total               4920 (100.0%)    1202 (100.0%)    6122 (100.0%)

Observation

• Far more males gender swap than females (17.4% vs. 8.2%)

• Why?

• Men are more creative?

• Women have less identity confusion?

• Women get their 'fill of gender swapping in real life'
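The swap rates behind this observation can be recomputed from the counts in the table. The sketch below uses only those four counts; the dictionary layout and function name are illustrative choices.

```python
# Counts from the table, keyed as (character gender, real gender).
counts = {
    ("male", "male"): 4065,
    ("male", "female"): 98,
    ("female", "male"): 855,
    ("female", "female"): 1104,
}


def swap_rate(real_gender: str) -> float:
    """Share of players of a given real gender whose character has the other gender."""
    total = sum(n for (_, real), n in counts.items() if real == real_gender)
    swapped = sum(
        n for (char, real), n in counts.items()
        if real == real_gender and char != real_gender
    )
    return swapped / total


male_rate = swap_rate("male")      # 855 / 4920
female_rate = swap_rate("female")  # 98 / 1202
```

Rounded to one decimal, these give 17.4% for men and 8.2% for women, matching the column percentages in the table.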

Page 28: Jaideep

Economics: A test of the RW-VW mapping

Do players behave in virtual worlds as we expect them to in the actual world?

Economics is an obvious dimension to test

In the real world, perfect aggregate data are hard to get

Page 29: Jaideep

GDP and Price Level

GDP and price levels are robust but comparatively unstable

[Figure: GDP and Prices on Antonia Bayle, January to May. Left axis: GDP (Gold), 0 to 5,000,000; right axis: Prices (January = 100), 0 to 160. Series: Nominal GDP; Price Level (January = 100).]

Page 30: Jaideep

Money Supply and Price

The instability is explicable through the Quantity Theory of Money

a rapid influx of money . . .

. . . dramatically boosted prices

More evidence that this behaves like a real economy
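The Quantity Theory of Money is usually written M · V = P · Q (money supply times velocity equals price level times real output). A minimal sketch of why a money influx raises prices, assuming velocity and output stay fixed; all numbers are hypothetical and not taken from the Antonia Bayle data.

```python
def price_level(money_supply: float, velocity: float, real_output: float) -> float:
    """Quantity Theory of Money, M * V = P * Q, solved for the price level P."""
    return money_supply * velocity / real_output


# Hypothetical figures: a 50% influx of gold with velocity and output held constant.
before = price_level(money_supply=1_000_000, velocity=2.0, real_output=20_000)
after = price_level(money_supply=1_500_000, velocity=2.0, real_output=20_000)
inflation_pct = (after - before) / before * 100
```

Under these assumptions the price level rises in proportion to the money supply, which is the pattern the slide describes as "a rapid influx of money dramatically boosted prices".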

[Figure: Change in Money Supply and Population on Antonia Bayle, February to May. Left axis: change in money (000 Gold); right axis: change in accounts. Series: Change in Money Supply (000 Gold); Change in Active Accounts.]

[Figure: Price Level on Antonia Bayle, February to June. Axis: percent change in price level.]

Page 31: Jaideep

Networks in Virtual Worlds

SONIC

Advancing the Science of Networks in Communities

Page 32: Jaideep

Why do we create and sustain

networks?

Theories of self-interest

Theories of social and resource exchange

Theories of mutual interest and collective action

Theories of contagion

Theories of balance

Theories of homophily

Theories of proximity

Theories of co-evolution

Sources: Contractor, N. S., Wasserman, S. & Faust, K. (2006). Testing multi-theoretical multilevel hypotheses

about organizational networks: An analytic framework and empirical example. Academy of Management Review.

Monge, P. R. & Contractor, N. S. (2003). Theories of Communication Networks. New York: Oxford University Press.


Page 33: Jaideep

“Structural signatures” of Social Theories

[Figure: small example networks over nodes A to F, with +/- and Novice/Expert node labels, illustrating the signatures of Self interest, Exchange, Collective Action, Balance, Homophily, and Contagion.]


Page 34: Jaideep

[Figure: network panels labeled Partnership, Trade, Mail, and Instant messaging. Black: male; Red: female.]


Page 35: Jaideep

Social networks as network structure frequency vectors in a bag-of-words model

[Figure: each social network (1), (2), …, (n) is represented as a vector of network structure counts, e.g. (0, 0, 1, 1, 0, 0, 0, 4), (0, 1, 2, 3, 1, 0, 0, 5), (0, 0, 0, 0, 0, 0, 0, 2).]

Cluster the structure vectors using text clustering methods

Cluster means provide the modes of the network structure configurations making up all the social networks

Attribute values for each cluster can be used to discover trends between network structures and attributes

Social theory + IR based network analysis
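A minimal sketch of the bag-of-words idea: each group's directed communication graph becomes a vector of substructure counts, the way a document becomes a vector of word counts. The four structures counted here (edges, mutual pairs, 2-out-stars, 2-in-stars) are an illustrative choice, not the set used in the study, and the example group is hypothetical.

```python
from collections import defaultdict
from math import comb


def structure_vector(edges):
    """Count a few directed substructures in one group's communication graph.

    Returns [edges, mutual pairs, 2-out-stars, 2-in-stars].
    """
    edge_set = set(edges)
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for src, dst in edge_set:
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)
    # A mutual pair is counted once, from the lexicographically smaller endpoint.
    mutual = sum(1 for a, b in edge_set if a < b and (b, a) in edge_set)
    # A node with d outgoing (incoming) edges is the centre of C(d, 2) 2-stars.
    out_stars = sum(comb(len(n), 2) for n in out_nbrs.values())
    in_stars = sum(comb(len(n), 2) for n in in_nbrs.values())
    return [len(edge_set), mutual, out_stars, in_stars]


# Hypothetical group: player "a" messages three others and nobody replies.
hub_group = [("a", "b"), ("a", "c"), ("a", "d")]
vector = structure_vector(hub_group)
```

Vectors of this kind, one per group, could then be fed to any text-clustering method; the cluster means would be read as typical structural configurations.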

Page 36: Jaideep

Results – normalized network structure vector means for all clusters

Page 37: Jaideep

Results

Clusters 1 and 4 are similar

Groups kill fewer monsters

Group members in cluster 4 do not communicate much

Group members in cluster 1 generally limit their communication to just one other person in the group

Most people belong to these two clusters

Consistent with previous research: users in virtual environments are less likely to interact with strangers

[N. Ducheneaut, N. Yee, E. Nickell and R. Moore, “'Alone Together?' Exploring the social dynamics of massively multiplayer online games”, Proceedings of CHI 2006, ACM Press, New York, 407-416.]

Cluster 5 groups have many 1-edge and 2-out stars

Most of the communication is one way, possibly indicating the presence of central people

Maximum number of monsters killed out of all clusters

Performance of the groups is very good

Minimal communication

It is possible that cluster 5 consists of groups more focused on playing and performing well in the game and less on socializing

Page 38: Jaideep

Some Open Questions

How similar or dissimilar are online social networks to real-world social networks?

Is online socialization

Only an activity that sustains real-world networks?

Causing new social networks to be formed?

A fundamentally different type of networking activity?

A fundamental tenet of socialization has been “geography/proximity drives socialization”

How is this being impacted by online socialization?

Page 39: Jaideep

TeamSkill: Modeling Team Chemistry in Online Multi-Player Games

Page 40: Jaideep

Description and motivation

The goal of this work is to improve skill assessment approaches used in multiplayer games, especially team-based games

Xbox Live, PSN, Steam, and Battle.net collectively have between 149 and 167 million total users (and that number is growing)

Many of the most popular games are team-based: Halo, Call of Duty, TF2, CS, etc.

Why?

Better skill assessment = fairer games = less player attrition

More accurate rankings of players/teams

Most previous work has focused on individuals, not teams

It's a hard problem

Online setting: updates to a player's skill distribution must be done after each game is played

Applicability: a generalized assessment technique cannot include game-specific data

Our datasets are from the games of professional Halo 3 players

Page 41: Jaideep

What is Halo?

Halo is one of the most well-known “first-person shooter” video game series in the world

Online/LAN-based multiplayer is its most significant component

650,000 to 850,000 unique users per day at its peak

41

Page 42: Jaideep

Our work focuses on professional Halo 3 players

Online scrimmage data, as well as complete tournament data, is readily available from bungie.net and mlgpro.com

Players are highly-skilled individually – allows us to better focus on group-level performance characteristics

Players change teams regularly, helps isolate impact of particular players on overall team performance

Unexplored boundary case for skill assessment

What is Major League Gaming (MLG)?

• A professional league for competitive gaming, including Halo 3 from ‟08-‟10

• Best players in the world compete in this league

• Last tournament watched by over 1,000,000 people online

• Our datasets are comprised of games played by these players

Major League Gaming

42

Page 43: Jaideep

Skill assessment

An old problem (with different applications in different contexts)

Paired comparison estimation

Foundational work by Thurstone (1927) and Bradley-Terry (1952)

Elo (1959), popularized in chess ranking (FIDE, USCF)

Glicko (1993) – player-level ratings volatility incorporated (σ²), addition of rating periods

TrueSkill (2006) – factor-graph based approach used in Microsoft's Xbox Live gaming service

Used to match players/teams up with each other online

Our work focuses on Elo, Glicko, and TrueSkill

Page 44: Jaideep

Elo (1959)

Arpad Elo, Dept of Physics, Marquette University

Was a master chess player in USCF

Proposed the Elo Rating System

Replaced earlier systems of competitive rewards (i.e., tournaments) in USCF/FIDE

Simple to implement: assumes player skill is normally distributed with constant variance β²

Still widely-used today
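To show how simple an Elo update is, here is a minimal sketch. It uses the logistic expected-score formula with a 400-point scale and K = 32, which are conventions common in chess ratings; these constants and the logistic form are assumptions here (the slide describes the normal-distribution formulation), and the function names are illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the logistic Elo curve (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two equally rated players; A wins.
new_a, new_b = elo_update(1500.0, 1500.0, score_a=1.0)
```

The update is zero-sum: whatever A gains, B loses, and an upset win moves ratings more than an expected one.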

44

Page 45: Jaideep

Glicko (1993)

Mark E. Glickman, Department of Health Policy and Management , Boston University

Proposed the Glicko Rating System

Addresses rating reliability issue through the use of rating periods and player-specific variances

Elo is a special case of Glicko

Iterative approach for approximating the marginal posterior distribution of a player's skill conditional on other players' priors

More computationally tractable for large datasets

Page 46: Jaideep

TrueSkill (2006)

Ralf Herbrich and Thore Graepel at Microsoft Research, Cambridge, UK

Used for automated ranking/matchmaking on Xbox Live

Uses factor graphs to model multi-player, multi-team environments

Converges quickly (~5 games for a player)

Large-scale deployment

30 million Xbox Live members

150+ games use TrueSkill for ranking & matchmaking

Page 47: Jaideep

Issues with current approaches

Basic idea

Given: skill ratings of each team member, i.e., s_i ~ N(μ_i, σ_i²)

Sum across all team members

Not intuitive: Team chemistry is a well-known concept in team-based competition [Martens 1987, Yukelson 1997] and it is not captured in any of these models

Can think of it as the overall dynamics of a team resulting from leadership, confidence, relationships, and mutual trust

Independence assumption not realistic in teams, especially at high levels of play
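The "sum across all team members" step can be sketched as follows. If member skills are independent Gaussians, their sum is Gaussian with the means added and the variances added; the (μ, σ) values below are hypothetical.

```python
from math import sqrt


def team_skill(members):
    """Sum independent Gaussian skills: means add and variances add.

    members is a list of (mu, sigma) pairs; returns (team_mu, team_sigma).
    """
    mu = sum(m for m, _ in members)
    variance = sum(s ** 2 for _, s in members)
    return mu, sqrt(variance)


# Hypothetical (mu, sigma) pairs for a four-player team.
team = [(25.0, 1.0), (27.0, 2.0), (30.0, 2.0), (22.0, 4.0)]
mu, sigma = team_skill(team)
```

Nothing in this sum depends on who is playing with whom, which is exactly the independence assumption the slide questions: two teams with the same individual ratings always get the same team rating.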

[Figure: the rating of team 1234 is formed by adding the ratings of players 1, 2, 3, and 4.]

Page 48: Jaideep

More information is available, however…

Observation: We have more information than just the history of individual players – we also know the histories of groups of players

Player 1's history ∩ player 2's history = history of {1, 2}

Existing approaches only make use of top row

Idea: Estimate the skills of subgroups of players on a team, combine them in some way, and use the result to produce a better estimate of a team's skill

[Figure: subgroups of a 4-player team. Individuals: 1, 2, 3, 4. Pairs: 12, 13, 14, 23, 24, 34. Triples: 123, 124, 134, 234. Full team: 1234.]

Page 49: Jaideep

Reframing the assessment

problem

Alterations to existing approaches are necessary (i.e., Elo/Glicko/TrueSkill)

Player-level skill representation → generalized subgroup skill representation

For each game and for each k ≤ K (the size of the team), treat each group of k players as you would an individual player and update its skill accordingly

Hashing of skill variable matrix according to unordered subgroup membership

Treat Elo/Glicko/TrueSkill as 'base learners', a la boosting

Each rating says something about the skill of that particular subgroup

…but how do we aggregate these ratings to estimate the skill of a team?
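The subgroup bookkeeping described above can be sketched with `itertools.combinations` and `frozenset` keys, so that a subgroup is looked up by unordered membership. The fixed rating change `delta` is a placeholder standing in for whatever a base learner (Elo, Glicko, or TrueSkill) would compute; `DEFAULT_RATING` and the function names are likewise hypothetical.

```python
from itertools import combinations


def subgroups(team):
    """All non-empty subgroups of a team, grouped by size k = 1..K."""
    return {
        k: [frozenset(c) for c in combinations(sorted(team), k)]
        for k in range(1, len(team) + 1)
    }


# Ratings keyed by unordered membership: frozenset({1, 2}) == frozenset({2, 1}).
ratings = {}
DEFAULT_RATING = 1500.0  # hypothetical starting value


def record_game(team, delta):
    """Apply a placeholder rating change to every subgroup of the team."""
    for groups in subgroups(team).values():
        for group in groups:
            ratings[group] = ratings.get(group, DEFAULT_RATING) + delta


record_game([1, 2, 3, 4], +10.0)  # team 1234 wins
record_game([4, 2, 1, 5], -10.0)  # players 1, 2, 4 lose with new teammate 5
```

After these two games the pair {1, 2} has history from both, while {1, 2, 3} only has history from the first, which is the kind of uneven information the aggregation step has to cope with.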

[Figure: subgroup levels for a 4-player team. k=4: 1234. k=3: 123, 124, 134, 234. k=2: 12, 13, 14, 23, 24, 34. k=1: 1, 2, 3, 4.]

Page 50: Jaideep

TeamSkill

Four different aggregation approaches:

TeamSkill-K

TeamSkill-AllK

TeamSkill-AllK-EV

TeamSkill-AllK-LS

50

Page 51: Jaideep

Aggregation issues

Real world case: assume that players can leave/join the 4-player team and look at the timeline

The question at each point in time, t: how best to combine the available group ratings to produce a team rating?

The problem: the feature space is expanding and contracting over time.

[Figure: timeline of a 4-player team. t = 1: after 1 game. t = 100: new player 5, who has never played with 1, 2, or 3. t = 200: new player 6, who has played with 3 or 5 (not both). Black = history available; red = no history available.]

Page 52: Jaideep

Data set overview

Collected over the course of 2009

7,590 games (2,076 from tournaments and 5,514 from Xbox Live scrimmages)

448 players on 140 different professional and semi-professional teams

Games took place during January 2008 through January 2010

Websites pertaining to this data

http://stats.halofit.org – Player/team statistics

http://halofit.org – Datasets and related information

[Figure: “Friend or foe” social network for all tournaments in 2008 and 2009]

Page 53: Jaideep

Evaluation

The TeamSkill approaches were evaluated by predicting the outcomes of games occurring prior to 10 MLG tournaments and comparing their accuracy to unaltered versions (k = 1) of their base learner rating systems: Elo, Glicko, and TrueSkill (for TeamSkill-K, all possible choices of k for teams of 4, 1 ≤ k ≤ 4, were used)

For each tournament, we evaluated each rating approach using:

3 types of training data sets: games consisting only of previous tournament data, games from online scrimmages only, and games of both types.

3 periods of game history: all data except for the data between the test tournament and the one preceding it (“long”), all data between the test tournament and the one preceding it (“recent”), and all data before the test tournament (“complete”).

We will only show “complete”, as results for “long” and “recent” mirrored those for “complete”

2 types of games: the full dataset and those games considered “close” (i.e., prior probability of one team winning close to 50%).

Page 54: Jaideep

Results – all games

Prediction accuracy for both tournament and scrimmage/custom games using complete history

Prediction accuracy for tournament games using complete history

Prediction accuracy for scrimmage/custom games using complete history

Page 55: Jaideep

Results – “close” games

Prediction accuracy for both tournament and scrimmage/custom games using complete history

Prediction accuracy for tournament games using complete history

Prediction accuracy for scrimmage/custom games using complete history

Page 56: Jaideep

Conclusions

Modified several existing assessment approaches to

operate on generalized player subgroup entities (instead of

just individual players)

Introduced four aggregation methods for combining player

subgroup information to produce final forecast

Shown evidence that close games are decided on the

basis of team chemistry, consistent with sports psychology

research

56

Page 57: Jaideep

Part III: Impact on Business

Page 58: Jaideep

Player Churn Prediction

Page 59: Jaideep

Time Equals Stickiness

There are fewer quitters as the levels go up, so the focus should be on the first 20 levels.

[Figure: ratio of quitters to stayers (0.00 to 6.00) by character level (1 to 69).]

Page 60: Jaideep

Solo vs. Social Players

Isolated players are 3.5x more likely to quit (B = 1.26, p<.001).

Focus design on facilitating social interaction.
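The "3.5x" figure and the coefficient B = 1.26 are consistent if B is a logistic-regression coefficient, in which case exp(B) is the odds ratio. The slide does not name the model, so treat that reading as an assumption.

```python
from math import exp

B = 1.26  # coefficient reported on the slide

# Under a logistic-regression reading, exp(B) is the multiplicative change in
# the odds of quitting for isolated players relative to social players.
odds_ratio = exp(B)
```

Note that an odds ratio of about 3.5 is a statement about odds, not probabilities; the two are close only when quitting is rare.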

[Figure: ratio of quitters to stayers (0.00 to 6.00) by character level (1 to 69), plotted separately for solo and social players.]

Page 61: Jaideep

Problem Statement & Approach

Objective: At time t, for player p, given a window w, compute

the probability, P(p,t,w), of p churning within the interval

(t,t+w), and the confidence in this probability.

Approach: Estimate P(p,t,w) and its confidence using

Data about p's activity and socialization behavior

Statistics and machine learning (linear, non-linear, ensemble, etc.)

Use of socio-psychological theories of player motivation

Novel synthesis of data driven (DD) and theory driven (TD) approaches

61

Page 62: Jaideep

Player Motivation Theories

62

Nick Yee, 2005

Richard Bartle, 1996

Page 63: Jaideep

Mapping MMO Behaviors to Theory

Example MMORPG

Persistent virtual world

Players have avatars

Quest-driven

Participation in groups,

guilds

Dataset (game logs)

Time period: FEB-JUN, 2006

No. of accounts: ~16000

Churner definition: 2 months of inactivity
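The churner definition can be turned into a labeling rule over the game logs. A minimal sketch, assuming "2 months" is approximated as 60 days and that each account has a known last-activity date; the function name and the example dates are hypothetical.

```python
from datetime import date, timedelta

INACTIVITY = timedelta(days=60)  # "2 months", approximated as 60 days (assumption)


def is_churner(last_seen: date, log_end: date) -> bool:
    """Label an account as a churner if it was inactive for the final 60+ days of the log."""
    return log_end - last_seen >= INACTIVITY


# Hypothetical accounts, with the log assumed to end on 30 June 2006.
log_end = date(2006, 6, 30)
labels = {
    "account_a": is_churner(date(2006, 3, 15), log_end),
    "account_b": is_churner(date(2006, 6, 1), log_end),
}
```

Labels produced this way would serve as the target variable when training the churn models discussed on the following slides.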

63

Page 64: Jaideep

Synthesis of TD and DD approach

64

Page 65: Jaideep

Model Evaluation (Confusion Matrix)

Model performance for different features (10 fold cross-validation)

Observations

Decision tree learning works quite well

DD model is the most accurate (F-measure 76%) but difficult to interpret (485 nodes)

TD models are interpretable (7 – 59 nodes), but less accurate

Achievement-orientation more important than socialization

Features                   Tree size (nodes)   Precision (%)   Recall (%)   F-measure (%)
Pure DD                    485                 69.3            84           76
TD: Achievement + Social   59                  67.1            76.2         71.3
TD: Achievement            37                  67.2            73.2         70.1
TD: Socialization          7                   64.3            55.4         59.5
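The F-measure column is the harmonic mean of precision and recall, F = 2PR / (P + R). A short sketch that recomputes it from the precision and recall values in the table; the dictionary of rows simply restates three of the table's entries.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) taken from the table above.
rows = {
    "Pure DD": (69.3, 84.0),
    "TD: Achievement": (67.2, 73.2),
    "TD: Socialization": (64.3, 55.4),
}
scores = {name: f_measure(p, r) for name, (p, r) in rows.items()}
```

Because the harmonic mean is pulled toward the smaller of the two values, the socialization-only model's low recall drags its F-measure below both of its inputs' average.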

Page 66: Jaideep

Model Evaluation (Lift Chart)

Observations

TD model does better in predicting top quintile of churners

DD model performs better in the 40%-70% range

66

Page 67: Jaideep

Ensemble Approach – Model

Evaluation

Ensemble approach

A 'committee of classifiers' votes on the result

Heterogeneity helps

Improvements over single model

Best recall value shows a 7.94% improvement (i.e., the model can identify a larger proportion of potential churners)

Best F-measure shows a 1.25% improvement

No. of clusters   Precision   Recall   F-measure
5                 66.23       91.94    76.99
10                66.49       91.08    76.87
15                67.58       89.80    77.12
20                67.6        90.13    77.25
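The committee idea can be sketched as a plain majority vote over the labels predicted by each member. The slides do not say how the committee is built or how votes are combined, so unweighted majority voting, the function name, and the example votes are all assumptions.

```python
from collections import Counter


def committee_vote(predictions):
    """Return the label predicted by the most committee members.

    On a tie, the label that appears first in the list wins.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions from three classifiers for one player.
votes = ["churn", "stay", "churn"]
decision = committee_vote(votes)
```

Heterogeneous members matter because a vote only helps when the classifiers make different mistakes; identical models would simply repeat the same error.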

Page 68: Jaideep

Conclusions

Comparison of theory-driven and data-driven model in

terms of prediction accuracy and model interpretability

Achievement-orientation is more important than

socialization-orientation in identifying potential churners

Ensemble model can identify larger proportion of

potential churners as compared to single global model

68

Page 69: Jaideep

4/9/2013 University of Minnesota 69

Course Outline

• Module 1
  • Introduction to Social Analytics: applying data mining to social computing systems; examples of a number of social computing systems, e.g. Facebook, MMO games, etc.

• Module 2
  • Computational online trust
  • Identifying key influencers

• Module 3
  • Analysis of clandestine networks
  • Katana: game analytics engine

Page 70: Jaideep

Computational Trust in Multiplayer Online Games

Page 71: Jaideep

“If you want to go fast walk alone, if you want to go far then walk with a group.”

- Proverb from Ghana

Page 72: Jaideep

Virtual Worlds & Massive Online

Games

Massively Multiplayer Online Role Playing Games (MMORPGs/MMOs)

Simulated environments like Second Life

Millions of people can interact with one another in a shared virtual environment

People can engage in a large number of activities with one another and with the environment

Many of the observed behaviors have offline analogs

Page 73: Jaideep

Big Picture Questions

How is trust expressed differently in different social

contexts?

Cooperative (PvE), Adversarial (PvP), …

How is trust expressed in different types of social

networks?

Housing, Mentoring, Trade, Group, …

What are the characteristics of trust and related networks

in MMOs?

Similarities and differences with social networks in other domains

e.g., citation networks, co-authorship networks

What role can features derived from the trust network play

in prediction tasks e.g., link prediction (formation,

breakage, change), trust propensity, success prediction

73

Page 74: Jaideep

Computational Trust in Multiplayer Online Games

Department of Computer Science, University of Minnesota

Muhammad Aurangzeb Ahmad

Trust as a multi-modal multi-level network phenomenon

Network formulations of traditional trust-related concepts

Incorporating social science theories

in trust related prediction task

Generative Network Models for Trust based

social interactions

Trust as a Multi-Level Network Phenomenon

Page 75: Jaideep

Computational Trust in Multiplayer Online Games

Effect of Social Environments on Trust

Trust and Homophily

Trust and Clandestine Behavior

Trust and Mentoring

Trust and Trade

Different Social environments result in

differences in network signatures

Trust based and other social networks in MMOs

exhibit anomalous network characteristics

No Honor Amongst Thieves

Department of Computer Science, University of Minnesota

Muhammad Aurangzeb Ahmad

Most types of homophily do not

carry over to the MMO domain

Trust as a multi-modal multi-level

network phenomenon

Social Characteristics of Trust

Network formulations of Traditional

Trust related concepts

Incorporating social science theories

in trust related prediction task

Generative Network Models for Trust based

social interactions

Trust as a Multi-Level Network Phenomenon

Page 76: Jaideep

Computational Trust in Multiplayer Online Games

Effect of Social Environments on Trust

Trust and Homophily

Trust and Clandestine Behavior

Trust and Mentoring

Trust and Trade

Trust Prediction Family of Problems

Predict formation, change, and breakage of trust within the same network and across social networks. Predict trust propensity.

Item Recommendation

Success Prediction (Social Capital)

Effect of network structure on trust

Effects of social environments on trust

Different Social environments result in

differences in network signatures

Trust based and other social networks in MMOs

exhibit anomalous network characteristics

No Honor Amongst Thieves

Department of Computer Science, University of Minnesota

Muhammad Aurangzeb Ahmad

Most types of homophily do not

carry over to the MMO domain

Trust as a multi-modal multi-level

network phenomenon

Social Characteristics of Trust

Network formulations of Traditional

Trust related concepts

Incorporating social science theories

in trust related prediction task

Generative Network Models for Trust based

social interactions

Trust and Prediction

Trust as a Multi-Level Network Phenomenon


Page 78: Jaideep

Trust and Homophily

Most types of homophily do not

carry over to the MMO domain

Computational Social Science

Semantics

Algorithmic

Trust Prediction Family of Problems

Predict Formation, Change, Breakage of Trust within the same

network and across social networks. Predict trust propensity.

Structure

Trust based and other social networks in MMOs

exhibit anomalous network characteristics

Applications

Characteristics of Trust Networks

Gold Farmer Detection

No Honor Amongst Thieves

Detection of Clandestine Actors

Computational Trust in Multiplayer Online Games

Department of Computer Science, University of Minnesota

Muhammad Aurangzeb Ahmad

Page 80: Jaideep

Housing-Trust in EQ2

• Access permissions to in-game house

as trust relationships

• None: Cannot enter house.

• Visitor: Can enter the house and can

interact with objects in the house.

• Friend: Visitor + move items

• Trustee: Friend + remove items

• Houses can also contain items which allow sales to other characters without exchanging on the market
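The permission levels above form an ordered scale, which is what allows housing access to be read as graded trust. A minimal Python sketch of that ordering follows; the names HousePermission, ALLOWED_FROM and can are hypothetical and are not part of EverQuest II or of the original study.

```python
from enum import IntEnum


class HousePermission(IntEnum):
    """Housing access levels, ordered from weakest to strongest trust."""
    NONE = 0
    VISITOR = 1
    FRIEND = 2
    TRUSTEE = 3


# Lowest level at which each action becomes available (illustrative mapping
# taken from the bullet list; the action names are assumptions).
ALLOWED_FROM = {
    "enter": HousePermission.VISITOR,
    "interact": HousePermission.VISITOR,
    "move_items": HousePermission.FRIEND,
    "remove_items": HousePermission.TRUSTEE,
}


def can(permission, action):
    """Return True if the given permission level allows the action."""
    return permission >= ALLOWED_FROM[action]
```

Because each level includes the rights of the one below it, a single comparison is enough to decide whether an action is allowed.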

80

Page 81: Jaideep

Homophily and Trust

Homophily: Birds of a feather flock together

There is no single form of homophily; in general it is described in multiple ways, e.g., status homophily vs. value homophily

Each of these types of homophily is in turn defined in multiple ways

Previous literature instantiates homophily in MMOs in

terms of player characteristics and behavior in the game

81

Page 82: Jaideep

Network Models, Homophily and

Trust Networks in MMOs

RQ1: Does homophily in MMOs operate in ways similar to

homophily in the offline world?

RQ2: How do we map characteristics that define

homophily in the offline world to online settings?

82

Page 83: Jaideep

Mapping Homophily in MMOs

In general, studies of homophily in MMOs assume only one type of homophily and generalize based on that type

Even in the offline world homophily is of different types

Hence the necessity of Mapping Homophily which we address here

Mapping and Proteus Effect

Page 84: Jaideep

Trust and Homophily in MMOs

ID | Homophily Type | Hypothesis | Observation
H1 | Gender Homophily | Players trust other players who are of the same gender | ?
H2 | Age Homophily | Players trust other players who are of the same age cohort | ?
H3 | Class Homophily | Players trust other players who are of the same class | ?
H4 | Race Homophily | Players trust other players who are of the same race | ?
H5 | Guild Homophily | Players trust other players who belong to the same guild | ?
H6 | Level Homophily | Players trust other players who are at a similar level | ?
H7 | Challenge Homophily | Players trust other players who like similar types of challenges | ?

84

Page 85: Jaideep

Key Observations

H1: Players trust other players who are of the same gender?

In general, players trust other players who are of the same gender

H2: Players trust other players who are of similar age?

The stronger the type of trust, the smaller the age difference between the people specifying trust

H3: Players trust other players who are of the same class?

Class does not seem to affect the choice of trusting others

Page 86: Jaideep

Key Observations

H4: Players trust other players who are of the same race?

Race does not seem to affect the choice of trusting others

H5: Players trust other players who are of the same guild?

In general, the stronger the type of trust, the greater the percentage of people who trust people in their own guilds

H6: Players trust other players who level at a similar rate?

Leveling at the same rate does not seem to greatly affect trust amongst players

Page 87: Jaideep

Key Observations

H7: Players trust other players who are of the same level?

• Level difference seems to have some effect on trust

• For the Trustee (strongest) and the Visitor (weakest) forms of trust, lower-level players are more likely to trust players who are at a higher level

Summary:

• Homophily is observed in MMOs for only a subset of the types for which it is observed in the offline world

• The types of homophily which are not observed in MMOs are the ones which are greatly affected by game mechanics
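One simple way to check a homophily hypothesis such as H1 to H7 is to compare the share of trust ties that connect players with the same attribute value against the share expected if partners were picked at random. The sketch below illustrates that comparison only; it is not the test used in the original study, and the function names and toy data are assumptions.

```python
from collections import Counter


def same_attribute_rate(edges, attribute):
    """Fraction of trust edges whose two endpoints share the attribute value."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if attribute[u] == attribute[v])
    return same / len(edges)


def baseline_rate(attribute):
    """Chance that two distinct players drawn at random share the value."""
    n = len(attribute)
    if n < 2:
        return 0.0
    counts = Counter(attribute.values())
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Hypothetical toy data: four players in two guilds, three trust ties.
guild = {"a": "g1", "b": "g1", "c": "g2", "d": "g2"}
trust_edges = [("a", "b"), ("c", "d"), ("a", "c")]
observed = same_attribute_rate(trust_edges, guild)
expected = baseline_rate(guild)
```

An observed rate well above the baseline would be read as evidence for that type of homophily; a rate close to the baseline corresponds to the "does not seem to affect" findings.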

Page 89: Jaideep

Social Networks: General Observations

There is an extensive literature on characteristics of social networks (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008, McGlohon KDD 2008)

The network exhibits a monotonically shrinking diameter over time (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008)

At a certain point in time, called the gelling point, many smaller connected components join together and become part of the largest connected component (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008)

The largest connected component (LCC) comprises the majority of the nodes in the network (>= 80%) (McGlohon ICDM 2008, McGlohon KDD 2008)

89

Page 90: Jaideep

Social Networks: General Observations

The sizes of the second and third largest connected components remain more or less constant even though the identity of these components changes over time (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008, McGlohon KDD 2008)

Network isolates are few in number (<5%) (Leskovec PAKDD 2005, Leskovec ICDM 2005)

The number of connected components decreases over time (Leskovec PAKDD 2005, Leskovec ICDM 2005, McGlohon ICDM 2008)

Relatively fast growth of the LCC close to the gelling point (Leskovec PAKDD 2005, Leskovec ICDM 2005)

90

Page 91: Jaideep

Trust Networks in MMOs

Data from 4 servers is available. Results from one server (Player vs. Environment, 'guk') are shown

The network consists of 15,237 nodes, 30,686 edges and 1,476 connected components

Dataset spans from January 2006 to August 2006

Average node degree is 4.03. The sizes of the three largest connected components are 9,039, 51 and 49. The largest connected component accounts for 59% of all the nodes in the network

The Trust Network on 'guk' on August 31, 2006
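The summary figures on this slide can be reproduced from the node, edge and component counts. The sketch below assumes the average degree counts each edge at both endpoints (2E/N), which matches the reported 4.03, and uses a small union-find helper (a hypothetical name, component_sizes) to show how component sizes would be obtained from an edge list.

```python
def component_sizes(nodes, edges):
    """Sizes of connected components, largest first (edges treated as undirected)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


# Figures quoted on the slide for the 'guk' trust network.
n_nodes, n_edges, lcc_size = 15237, 30686, 9039
avg_degree = 2 * n_edges / n_nodes   # about 4.03
lcc_share = lcc_size / n_nodes       # about 0.59
```

On a real edge list, the first entry returned by component_sizes divided by the node count gives the LCC share directly.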

91

Page 92: Jaideep

Key Observations

Observation 1: Preferential Attachment: The rich get richer but not too rich

Explanation: Social bandwidth is limited (Dunbar's number)

Observation 2: The growth of the LCC slows after the gelling point

Explanation: The trust network has a relatively low growth rate compared to the other networks

Observation 3: Non-monotonic change in the diameter of the largest connected component

Explanation: Players have different levels of activity at various points in time and can also “drop out” of the network if they churn from the game

92

Page 93: Jaideep

Key Observations

Observation 4: A large number of isolate components are

observed (> 1000)

Explanation: People join in groups and spend all the time

playing with one another instead of interacting with people

from the outside

Observation 5: The number of isolate components

increases monotonically over time

Explanation: (Same as observation 4)

Observation 6: Nodes in the non-LCC constitute a

significant portion of the network (41%, 8 months after

gelling point)

Explanation: (Same as observation 4)

93

Page 94: Jaideep

Generative Models of Trust Networks

Time-bound Preferential Attachment: The rich get richer, but not so much after a certain point in time. Edge formation is bound by time

Presence of Auxiliary Components: Isolate components are added to the network at an almost constant rate over time

94

Page 95: Jaideep

Generative Models of Trust Networks (ii)

Non-Monotonic Decrease in the Diameter:

Nodes become inert after a certain point in time. Sample

the lifetime of nodes from a normal distribution

Homophily in Edge formation: Probability of edge

formation dependent upon node degree as well as

agreement (similarity) in node characteristics
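The four ingredients listed on these two slides can be combined into a small simulation. The sketch below is only one possible reading, not the authors' model: every parameter name and default value is an assumption, node characteristics are reduced to a random group label, and auxiliary components are modeled as isolated pairs added every few steps.

```python
import random


def grow_trust_network(steps, seed=0, isolate_every=5,
                       mean_life=30.0, sd_life=10.0, homophily=2.0):
    """Grow a toy trust network; all parameters are illustrative assumptions."""
    rng = random.Random(seed)
    degree, born, life, group = {}, {}, {}, {}
    auxiliary, edges = set(), []

    def add_node(t):
        n = len(degree)
        degree[n] = 0
        born[n] = t
        life[n] = max(1.0, rng.gauss(mean_life, sd_life))  # lifetime ~ normal
        group[n] = rng.randrange(3)  # stand-in for a node characteristic
        return n

    def link(u, v):
        edges.append((u, v))
        degree[u] += 1
        degree[v] += 1

    link(add_node(0), add_node(0))
    for t in range(1, steps + 1):
        new = add_node(t)
        if t % isolate_every == 0:
            # Auxiliary component: a pair that never attaches to the rest.
            partner = add_node(t)
            auxiliary.update((new, partner))
            link(new, partner)
            continue
        # Time-bound attachment: nodes past their lifetime are inert.
        active = [n for n in degree
                  if n != new and n not in auxiliary and t - born[n] <= life[n]]
        if not active:
            continue
        # Edge probability grows with degree and with attribute agreement.
        weights = [(degree[n] + 1) * (homophily if group[n] == group[new] else 1.0)
                   for n in active]
        link(new, rng.choices(active, weights=weights, k=1)[0])
    return degree, edges, auxiliary
```

Tracking the diameter, the LCC share and the number of components over the steps of such a run is how a model of this kind would be compared with the observed network.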

95

Page 96: Jaideep

Results

Figure panels: non-monotonically changing diameter; relatively small LCC share (%)

96

Page 97: Jaideep

Results

Number of Connected Components

97

Network growth and the gelling point

Page 98: Jaideep

Conclusion: Trust Networks

Trust Networks in MMOs exhibit many properties which are

not exhibited by other social networks in most other

domains

Proposed a model based on observations and domain

knowledge

Models of social networks should incorporate the

peculiarities which are observed in MMOs in general

Generalization? Similar observations have been made for

mentoring networks but not for PvP, Trade and Chat

Networks

98

Page 100: Jaideep

Trust Prediction Family of

Problems

Trust Prediction: Given a trust network G, predict which nodes are going to trust one another in the future

Prediction Across Networks: Given a set of actors who participate in multiple types of interactions, use features from one network to predict the existence of links in the other network

100

Page 101: Jaideep

Trust Prediction Family of

Problems

Machine Learning/Classification approach to the problem

of trust prediction

Introduced a set of new problems (inter-network link

prediction, trust propensity prediction)

Proposed an algorithm for link prediction which used

domain knowledge from social science theories

New Contributions:

Does the social context (adversarial vs. cooperative) affect prediction results?

If so, what is the implication for generalization?

101

Page 102: Jaideep

Prediction Task

Trust Prediction as a classification problem

60,000 examples for each prediction task

10 Fold Cross-validation

Data from Guk (Cooperative) and Nagafen (Adversarial) Servers

Six Standard Classifiers for Comparison: J48, JRip, AdaBoost, Bayes Network, Naive Bayes and k-nearest neighbor

Positive Example:

Training Period Test Period

Negative Example:

Training Period Test Period
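The positive/negative figures are missing from the transcript, so the sketch below uses the usual link-prediction convention as an assumption: a pair that is unlinked in the training period is a positive example if it is linked in the test period and a negative example otherwise. The single common-neighbor feature and the function names are illustrative and are not the feature set of the original study.

```python
def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v in the training network."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def build_examples(train_edges, test_edges):
    """Return ((u, v), feature, label) for every pair unlinked during training."""
    adj = {}
    for u, v in train_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    train = {frozenset(e) for e in train_edges}
    test = {frozenset(e) for e in test_edges}
    nodes = sorted(adj)
    examples = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            pair = frozenset((u, v))
            if pair in train:
                continue  # already linked, nothing to predict
            label = 1 if pair in test else 0
            examples.append(((u, v), common_neighbors(adj, u, v), label))
    return examples


# Hypothetical toy snapshots.
train_edges = [("a", "b"), ("b", "c"), ("c", "d")]
test_edges = [("a", "c")]
examples = build_examples(train_edges, test_edges)
```

The resulting rows are what a classifier such as J48 or Naive Bayes would be trained and cross-validated on.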

102

Page 103: Jaideep

Prediction: Group Network

103

In general good prediction results are obtained for the combat network (in either direction), except one case

The grouping network is a union of all grouping instances and thus it is extremely dense (1,796,438 edges, 31,900 nodes)

Page 104: Jaideep

Prediction: Group Network

104

Grouping is not a good predictor of mentoring, but the converse holds

Mentoring is more often than not accompanied by grouping, but grouping can occur in a variety of contexts, e.g., raids, quests, dungeon instances, …

Page 105: Jaideep

Prediction: Combat Network

In general good prediction results are obtained for the combat network (in either direction)

In general if players have played against another person then they have friends in other networks who have done the same or something similar

105

Page 106: Jaideep

Comparison across Social

Environments

106

T: Trust; M: Mentoring; B: Trade

In general the results are similar for both the

environments with some exceptions

Mentoring-Mentoring: In the adversarial environment a

more cliqueish behavior is observed i.e., friends of

friends are likely to mentor one another in the future

which is not the case for the cooperative environment

Page 107: Jaideep

Comparison across Social

Environments

107

Mentoring-Related Tasks: In general mentoring is

much more prevalent in the adversarial environment as

compared to the cooperative environment (~3 times) and

is much more intense. Overlap between mentoring and

trade is thus more likely

Page 109: Jaideep

Trust Amongst Clandestine Actors

“A plague upon't when thieves cannot be true one to another!” – Sir Falstaff, Henry IV, Part 1, II.ii

Do gold farmers

trust each other?

109

Page 110: Jaideep

Hypergraphs to Represent Tripartite Graphs

110

• Accounts can have several characters

• Houses can be accessed by several characters

• Projecting to one- or two-mode data obscures crucial information about embeddedness and paths

• Figure 2a: Can ca31 access the same house as ca11?

• Figure 2b: Are characters all owned by same account?

Page 111: Jaideep

Hypergraphs: Key Concepts

• Hyperedge: An edge between three or

more nodes in a graph. We use three

types of nodes: Character, account and

house

• Node Degree: The number of

hyperedges which are connected to a

node

Example: ND(h1) = 3

• Edge Degree: The number of

hyperedges that an edge participates in

Example: ED(a1-h1) = 2
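Node degree and edge degree as defined above can be counted directly from a list of hyperedges. In the sketch below each hyperedge is a (character, account, house) triple; the three triples are hypothetical and were chosen only so that the counts match the two examples on the slide, since the original figure is not available.

```python
from collections import Counter
from itertools import combinations


def hypergraph_degrees(hyperedges):
    """Count hyperedges per node and per node pair."""
    node_degree = Counter()
    edge_degree = Counter()
    for hyperedge in hyperedges:
        members = sorted(set(hyperedge))
        node_degree.update(members)
        # Every pair inside a hyperedge participates in that hyperedge once.
        edge_degree.update(combinations(members, 2))
    return node_degree, edge_degree


# Hypothetical data: two characters of account a1 and one of a2 share house h1.
hyperedges = [("c11", "a1", "h1"), ("c12", "a1", "h1"), ("c21", "a2", "h1")]
node_degree, edge_degree = hypergraph_degrees(hyperedges)
```

Pairs are stored with their members in sorted order, so the account-house pair is looked up as ("a1", "h1").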

111

Page 112: Jaideep

Network Characteristics

112

• Long tail distributions are observed for the various degree distributions

• The mapping from character-house to an account is always unique

• Players who are connected to a large number of houses are highly active

players otherwise as well

Page 113: Jaideep

Characteristics of Hypergraph Projection Networks

• Account Projection: Majority of the gold farmer nodes are isolates

(79%). Affiliates well-connected (8.89) vs. non-affiliates (3.47)

• Character Projection: Majority of the gold farmer nodes are isolates

(84%). Affiliates well-connected (10.42) vs. non-affiliates (3.23)

• House Projection: 521 gold farmer houses. Most are isolates (not

shown) but others are part of complex structures. Densely connected

network with gold farmers (7.56) and affiliates (84.02)

The Housing Projection Network

Page 114: Jaideep

Key Observations

• Gold farmers grant trust ties less frequently than either

affiliates or general players

• Gold farmers grant and receive fewer housing permissions

(1.82) than their affiliates (4.03) or general player population

(2.73)

114

               | Total degree        | In degree           | Out degree
               | <n>   <nGF>  <nAff> | <n>   <nGF>  <nAff> | <n>   <nGF>  <nAff>
Farmers        | 1.82  0.29   1.82   | 0.89  0.29   0.89   | 1.07  0.29   1.07
Affiliates     | 4.03  1.28   0.70   | 1.55  0.75   0.70   | 2.88  0.63   0.70
Non-Affiliates | 2.73  -      7.77   | 1.57  -      5.98   | 1.56  -      2.34

Page 115: Jaideep

Key Observations

• No honor among thieves

• Gold farmers also have very low tendency to grant other

gold farmers permission (0.29)

• Affiliates also unlikely to trust other affiliates (0.70)

115

Page 116: Jaideep

Key Observations

• Affiliates are brokers:

• Farmers trust affiliates more (1.82) than other farmers (0.29)

• Affiliates trust farmers more (1.28) than other affiliates (0.70)

• Non-affiliates have a greater tendency to grant permissions to

affiliates (7.77) than in general (2.73)

116

Page 117: Jaideep

Frequent Itemset Mining for Frequent Hyper-subgraphs

Support of a Hyper-subgraph: Given a sub-hypergraph of size k,

subP is the pattern of interest containing the label P, shP is a pattern

of the same size as subP and contains the label P, the support is

defined as follows:

117

Support of the pattern containing a gold farmer (red) = 5/8

Page 118: Jaideep

Frequent Itemset Mining for Frequent Hyper-subgraphs

Confidence of a Hyper-Subgraph: Given a sub-hypergraph of

size k, subP is the pattern of interest containing the label P, subG is

a pattern which is structurally equivalent but which does not contain

the label P, the confidence is defined as follows:

118

Confidence of the pattern containing a gold farmer = 5/7
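The defining formulas for support and confidence did not survive in the transcript. The sketch below is one reading that is consistent with the two worked values (5/8 and 5/7), and it should be treated as an assumption: support is the count of subP divided by the count of all size-k patterns carrying the label P, and confidence is the count of subP divided by the combined count of subP and its unlabeled structural equivalent subG.

```python
from fractions import Fraction


def support(n_sub_p, n_size_k_with_p):
    """Share of size-k patterns with label P that match the pattern of interest."""
    return Fraction(n_sub_p, n_size_k_with_p)


def confidence(n_sub_p, n_sub_g):
    """Share of structurally equivalent occurrences that carry the label P."""
    return Fraction(n_sub_p, n_sub_p + n_sub_g)


# Counts assumed from the worked examples: 5 labeled occurrences,
# 8 size-k patterns with the label, 2 unlabeled equivalents.
example_support = support(5, 8)
example_confidence = confidence(5, 2)
```

Under this reading a pattern with high confidence occurs mostly around gold farmers, which is what makes it usable as a discriminating signature.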

Page 119: Jaideep

Frequent Patterns of GFs

• Very low (s ≤ 0.1) support and confidence for almost all

(except 8) frequent patterns with gold farmers

• Remaining 8 patterns can be used for discrimination

between gold farmers and non-gold farmers in a subset of

the instances

• Gold farmers & affiliates are more connected: A third of the more complex patterns are associated with affiliates (15/44)

119

Page 120: Jaideep

Application: Gold Farmer Detection (Behavioral)

120

Classification using in-game behavioral and demographic features

Each model corresponds to a grouping of different features

Page 121: Jaideep

Application: Gold Farmer Detection (Label Propagation)

Classifier | Metric | Initial Dataset | Label Prop | Change in Performance
Bayes Net | Precision | 0.17 | 0.189 | 0.019
Bayes Net | Recall | 0.834 | 0.819 | -0.015
Bayes Net | F-Score | 0.282 | 0.307 | 0.025
J48 | Precision | 0.494 | 0.62 | 0.126
J48 | Recall | 0.189 | 0.337 | 0.148
J48 | F-Score | 0.273 | 0.437 | 0.164
JRip | Precision | 0.495 | 0.537 | 0.042
JRip | Recall | 0.462 | 0.462 | 0
JRip | F-Score | 0.478 | 0.497 | 0.019
KNN | Precision | 0.436 | 0.46 | 0.024
KNN | Recall | 0.396 | 0.428 | 0.032
KNN | F-Score | 0.415 | 0.443 | 0.028
Logistic Regression | Precision | 0.455 | 0.534 | 0.079
Logistic Regression | Recall | 0.189 | 0.271 | 0.082
Logistic Regression | F-Score | 0.267 | 0.36 | 0.093
Naïve Bayes | Precision | 0.146 | 0.142 | -0.004
Naïve Bayes | Recall | 0.538 | 0.502 | -0.036
Naïve Bayes | F-Score | 0.23 | 0.221 | -0.009
Adaboost w/ DT | Precision | 0.405 | 0.471 | 0.066
Adaboost w/ DT | Recall | 0.105 | 0.08 | -0.025
Adaboost w/ DT | F-Score | 0.167 | 0.137 | -0.03

• Problem: Not all players who are labeled as normal are in fact normal. Some of them are gold farmers who have not been identified as such

• Analogue: If someone socializes almost exclusively with criminals, then he may be a criminal
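The analogue suggests a simple relabeling rule: a player labeled normal whose contacts are almost all known gold farmers is treated as a likely gold farmer before the classifiers are retrained. The slides do not state the actual rule, so the threshold, the minimum number of contacts and the function name below are assumptions.

```python
def propagate_labels(adj, farmers, threshold=0.8, min_neighbors=2):
    """Return players to relabel, using one pass over the original labels."""
    relabeled = set()
    for node, neighbors in adj.items():
        if node in farmers or len(neighbors) < min_neighbors:
            continue
        share = sum(1 for n in neighbors if n in farmers) / len(neighbors)
        if share >= threshold:
            relabeled.add(node)
    return relabeled


# Hypothetical toy network: f1 and f2 are known gold farmers.
adj = {
    "x": {"f1", "f2"},
    "y": {"f1", "p"},
    "p": {"y"},
    "f1": {"x", "y"},
    "f2": {"x"},
}
suspects = propagate_labels(adj, farmers={"f1", "f2"})
```

Requiring a minimum number of contacts keeps players with a single tie from being relabeled on thin evidence.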

121

Page 122: Jaideep

Gold Farmer Detection

Model 1 (Player Attribute Based Features): These features are based on the attributes of the player's character in the game, e.g., character race, character gender, distribution of gaming activities etc.

Model 2 (Item Based Features): These are the features which are derived from items bought and sold from the consignment network. These features are based on the frequency of the frequent items sold or bought by gold farmers.

Model 3 (Player Attribute & Item Based Features): All the attributes from the previous two models.

Model 4 (Item Network Based Features): Features which are derived from the item network in a manner analogous to Model 2.

Model 5 (Player Attribute & Item-Network Based Features): A combination of features from Model 1 and Model 4.

Model 6 (Item & Item-Network Based Features): A combination of features from Model 2 and Model 4.

Model 7 (Player Attribute, Item & Item-Network Based Features): Union of all the features described above.

122

Page 123: Jaideep

Application: Structural Signatures Approach Applied to Trade Networks

A set of standard machine learning models is used: Naive Bayes, Bayes Net, Logistic Regression, KNN, J48, JRip, AdaBoost and SMO

There is not enough data to use the structural signature approach to catch gold farmers

Alternative: Application of the same approach in other networks: Trade Network

123

Page 124: Jaideep

Conclusion: Gold Farmer Behavior Analysis and Prediction

124

Representation-related issues addressed with respect to trust in MMOs

Application of frequent pattern mining to discover distinct trust patterns associated with gold farmers

No honor among thieves: Gold farmers tend not to trust other gold farmers

Page 126: Jaideep

Trust and Homophily

Most types of homophily do not

carry over to the MMO domain

Computational Social Science

Semantics

Predictive Analysis

Trust Prediction Family of Problems

Predict Formation, Change, Breakage of Trust within the same

network and across social networks. Predict trust propensity.

Structure

Trust based and other social networks in MMOs

exhibit anomalous network characteristics

Applications

Characteristics of Trust Networks

Gold Farmer Detection

No Honor Amongst Thieves

Detection of Clandestine Actors

Computational Trust in Multiplayer Online Games

Department of Computer Science, University of Minnesota

Muhammad Aurangzeb Ahmad

Page 127: Jaideep

Conclusion

Availability of data allows one to analyze phenomena that could not be analyzed in the past (trust in MMOs in the current case)

Explored a series of big-picture questions pertaining to

trust in MMOs

Similar affordances and contexts (with respect to the offline

world) lead to similar outcomes in the online world

Explored and expanded the scope of trust related

prediction tasks

Analysis is required on more datasets for generalizability

127

Page 128: Jaideep

Acknowledgement

DMR Lab

Professor Jaideep Srivastava and all DMR lab members especially

Zoheb Borbora, Amogh Mahapatra, Young Ae Kim, Nishith Pathak,

Kyong Jin Shim and Nisheeth Srivastava

Virtual Worlds Observatory

Professor Noshir Contractor (Northwestern), Professor Scott Poole

(UIUC), Professor Dmitri Williams (USC)

External Collaborators at Northwestern, UIUC, USC, U Toronto

especially Brian Keegan

Funding Agencies:

NSF, AFRL, NSCTA, IARPA

128

Page 129: Jaideep

Publication Summary

Publication Type | Total | Thesis Related | Comment
Book | 1 | 1 | Book on Clandestine Behaviors and Networks (Springer 2012)
Conference Papers | 14 | 5 |
Book Chapters | 1 | - |
Journal Papers | 2 | - |
Workshop Papers | 10 | 3 | 2 Best Paper Awards
Short/Poster Papers | 8 | 1 |
Technical Reports | 4 | 1 |
Tutorials | 4 | 1 |

Patents 2 127 Co-authors

129

Page 130: Jaideep

A Computational Model

for

Social Influence

Page 131: Jaideep

Objective: Model social

influence in a dynamic network

setting as a spread of cascades

to identify key influential nodes

131

Page 132: Jaideep

Outline

Team Overview

Optimal Target Selection

Current approaches

Proposed Approach

Results

Update summary

Q&A

132

Page 133: Jaideep

Motivation

Global retweets of Tweets coming from Japan for one hour after the earthquake (legend: Original, Retweet)

Courtesy: http://blog.twitter.com/2011/06/global-pulse.html

Page 134: Jaideep

Optimal Target People Selection

Goal: Find the optimal targets to maximize influence

spread in network

Why is it important?

Public Polls: Sentiment spread

Sales and Marketing: Word-of-mouth spread

Public Health: Disease spread

Influence: A force that attempts to change the opinion or

behavior of the individual

Influence is causal

My friend buys a product -> I buy a product

134

Page 135: Jaideep

Current Approaches (1/2)

Assume a certain model for propagation of influence [1]

Independent Cascade Model

Each node independently influences its neighbor using a biased coin toss

Linear Threshold Model

Each node picks a random threshold

Each node infects the neighbor by a specified

amount

All of them need to know the propagation probability
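The Independent Cascade Model can be simulated in a few lines, which also makes clear why the propagation probability has to be known in advance. This is a generic sketch of the standard model with a single probability p for every edge; the function and variable names are assumptions, and the Linear Threshold Model is not covered.

```python
import random


def independent_cascade(graph, seeds, p, rng):
    """Simulate one cascade; each newly active node gets one try per neighbor."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Biased coin toss: activation succeeds with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Hypothetical directed graph.
graph = {"a": ["b"], "b": ["c"], "d": []}
everyone = independent_cascade(graph, ["a"], 1.0, random.Random(0))
nobody_new = independent_cascade(graph, ["a"], 0.0, random.Random(0))
```

Averaging the size of the returned set over many simulated cascades gives the expected spread that influence maximization tries to optimize.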

Page 136: Jaideep

Current Approaches (2/2)

Use a two step approach

Step 1: Choose influence Model

Step 2: Find the subset of top-k nodes that maximizes

the influence

Bad News: Step 2 is NP-Hard [2]

Popular method for step 2: Greedy Heuristics

The greedy heuristic gives a (1-1/e) approximation

Address scalability issues in the second step [3][4][5][6]

CELF [3]: Influence maximization function follows law

of diminishing returns (sub-modular)
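The diminishing-returns property is what makes lazy evaluation possible: a marginal gain computed in an earlier round is an upper bound on the current gain, so most candidates never need to be re-evaluated. The sketch below shows that lazy greedy idea in the spirit of CELF on a toy coverage function; it is not the published implementation, and the names lazy_greedy, gain and reach are assumptions.

```python
import heapq


def lazy_greedy(candidates, gain, k):
    """Pick k items greedily; gain(c, chosen) must be monotone and submodular."""
    chosen = []
    # Heap entries: (-gain, tie-breaker, candidate, round in which gain was computed).
    heap = [(-gain(c, chosen), i, c, 0) for i, c in enumerate(candidates)]
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        neg_gain, i, c, stamp = heapq.heappop(heap)
        if stamp == len(chosen):
            chosen.append(c)  # gain is up to date and still the largest
        else:
            # Stale entry: recompute against the current selection and retry.
            heapq.heappush(heap, (-gain(c, chosen), i, c, len(chosen)))
    return chosen


# Hypothetical reach sets; the gain of a node is the number of newly covered items.
reach = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}}


def coverage_gain(c, chosen):
    covered = set().union(*(reach[x] for x in chosen)) if chosen else set()
    return len(reach[c] - covered)


top_two = lazy_greedy(["A", "B", "C"], coverage_gain, 2)
```

In this toy run only one candidate is re-evaluated after the first pick, which is the saving the lazy scheme is meant to deliver.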

136

Page 137: Jaideep

Limitations of current approaches

Estimation/Learning of propagation probabilities

Data-driven and model free approach

Propagation models are not content-aware

Content-aware influencer mining

No causality effect in influence model

Add causal effect in influence propagation

137

Page 138: Jaideep

Proposed Approach

Communication

Logs

Network

Structure

Frequent

Sequence

Mining

Network

Discovery of

Influencers

Ordered list of

influencers

138

Page 139: Jaideep

Sequence Cascade Mining

Let Support = 2

Peter, Charles, Kim, Vicky, Nancy

Network structure

Append neighbors from network

Check downward closure and

support count

Example

i=2

Temporal order

139

Page 140: Jaideep

Sequence Cascade Mining

Let Support = 2

Peter, Charles, Kim, Vicky, Nancy

Network structure

Append neighbors from network

Example

i=3

140

Check downward closure and

support count

Page 141: Jaideep

Sequence Cascade Mining

Let Support = 2

Peter, Charles, Kim, Vicky, Nancy

Network structure

Append neighbors from network

Example

i=4

Break loop; No more work to do…

Return Frequent Sequences

141

Page 142: Jaideep

Sequence Cascade Mining

L1 Frequent Sequence Mining

Candidate Generation for k+1

By appending neighbors

Downward closure pruning

Support Count and reorganize
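The loop summarized above can be sketched as a level-wise miner in which candidates are extended only along network edges. The details are assumptions: support is taken to be the number of logged cascades that contain the pattern as an order-preserving subsequence, pruning checks only the suffix of each candidate, and the toy network and logs are hypothetical (the initials stand for the names used in the walkthrough).

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, logs):
    return sum(1 for seq in logs if is_subsequence(pattern, seq))


def mine_cascades(logs, neighbors, min_support):
    """Level-wise mining of frequent sequences that follow network edges."""
    nodes = sorted({n for seq in logs for n in seq})
    level = [(n,) for n in nodes if support((n,), logs) >= min_support]
    frequent = list(level)
    while level:
        current = set(level)
        candidates = []
        for seq in level:
            for nxt in neighbors.get(seq[-1], ()):   # append neighbors
                cand = seq + (nxt,)
                # Downward closure: the suffix must itself be frequent.
                if nxt not in seq and cand[1:] in current:
                    candidates.append(cand)
        level = [c for c in candidates if support(c, logs) >= min_support]
        frequent.extend(level)
    return frequent


# Hypothetical network (who can pass content to whom) and temporal logs.
neighbors = {"P": ["C", "K"], "C": ["K"], "K": ["V"]}
logs = [["P", "C", "K"], ["P", "C", "K", "V"], ["P", "K", "N"]]
frequent = mine_cascades(logs, neighbors, min_support=2)
```

With a support of 2 the loop stops after the third level because no four-node candidate survives pruning, mirroring the "no more work to do" step in the walkthrough.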

142

Page 143: Jaideep

Influence Function

Node pattern Influence Set (Q(i,p))

=

Influence Set of a Node

Influence Set of a Node Set

It can be shown that the influence function I(V,s) is monotone and sub-modular

Page 144: Jaideep

Network Discovery of Influencers

Greedy heuristic with (1-1/e) approximation guarantee
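A greedy selection over influence sets can be sketched as follows. The formulas for the influence sets are missing from the transcript, so the sketch assumes that a node's influence set is everyone who appears after it in some frequent sequence and that the influence set of a node set is the union. The sequences are hypothetical, and ties are broken alphabetically here rather than at random.

```python
def influence_sets(frequent_sequences):
    """Map each node to the nodes that follow it in any frequent sequence."""
    reach = {}
    for seq in frequent_sequences:
        for i, node in enumerate(seq):
            reach.setdefault(node, set()).update(seq[i + 1:])
    return reach


def greedy_influencers(reach, k):
    """Repeatedly pick the node that adds the most not-yet-influenced people."""
    chosen, covered = [], set()
    for _ in range(min(k, len(reach))):
        remaining = [n for n in sorted(reach) if n not in chosen]
        best = max(remaining, key=lambda n: len(reach[n] - covered))
        chosen.append(best)
        covered |= reach[best]
    return chosen, covered


# Hypothetical frequent sequences over the five people in the example.
sequences = [("P", "C", "K"), ("P", "K"), ("C", "K"), ("C", "V"), ("K", "N")]
reach = influence_sets(sequences)
top3, influenced = greedy_influencers(reach, 3)
```

Because the union of influence sets is a coverage function, it is monotone and submodular, which is the property the approximation guarantee relies on.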

Page 145: Jaideep

Network Discovery of Influencers

Frequent Sequences

P

C

K V

N

Example

Choose the node with max out-degree

C chosen

Current Set of Influenced People

(Randomly break tie between P

and C)

145

C K P

i=1

Find top-3

influencers

Page 146: Jaideep

Network Discovery of Influencers

Frequent Sequences

P

C

K V

N

Example

Choose the node with max out-degree

P chosen

Current Set of Influenced People

146

C K P

i=2

V N

Page 147: Jaideep

Network Discovery of Influencers

Frequent Sequences

P

C

K V

N

Example

Choose the node with max out-degree

K chosen

Current Set of Influenced People

147

C K P

i=3

V N

(Randomly break tie between K, V,

N)

Page 148: Jaideep

Context-Aware Influencers

Bag of words

representing

the context

Frequent

Sequence

Mining

Network

Discovery of

Influencers

Context Specific

influencers

148

Page 149: Jaideep

Initial Results (1/2)

Significant performance gain over established

baseline PrefixSpan

DBLP Dataset USPTO Dataset

149

Page 150: Jaideep

Initial Results (2/2)

DBLP

Dataset

USPTO

Dataset

150

Page 151: Jaideep

Top Influencers

151

DBLP Dataset

Thomas Huang: Professor, UIUC

Philip Yu: Professor, UIC

Elisa Bertino: Professor, Purdue

USPTO Dataset

Dieter Freitag: CTO, FRX Polymers; American Plastics Hall of Fame

Akira Suzuki: 2010 Nobel Prize winner in Chemistry

*

*

Wilhelm Brandes: Scientist, Bayer Crop Science

* Prolific inventors 1988-1997 as per USPTO announcement [7]

Page 152: Jaideep

What's next?

Short Term

o Implement NDISC algorithm

o Verify the influence spread of NDISC compared to popular

baselines (PMIA, DegreeDiscountIC, SPM, and SP1M)

o Qualitatively analyze the influencer results compared to

baseline algorithms

Long Term

o Need to extend the approach for dynamic time evolving

content and network structures

o Need to implement influence analysis for streaming algorithms

o Modeling of influence propagation in collaborative networks

using Hypergraphs

o Need to do this for multi-relational graphs

Page 153: Jaideep

Overall Summary

Optimal Target People Selection

Task 3.3

UMN Team:

Karthik Subbian (Researcher)

Jaideep Srivastava (Faculty)

Goals

Optimal target people selection using model-free influence models

Develop a content-centric approach for sequential cascade mining

Develop greedy heuristics for influencer mining from cascades

Tracking the influence of target nodes using online algorithms

Novel Ideas

Optimal Target Selection

Content-centric

Mining sequential cascades in network

Finding influencers from cascades

Discover context-specific influencers

Future Plans

Short Term

Implement NDISC and compare with baseline algorithms

Long Term

Finding influencers in dynamic time-evolving network structures

Modeling of influence propagation in hyper-graphs and multi-relational graphs

Page 154: Jaideep

Course Outline

• Module 1

• Introduction to Social Analytics – applying data mining to social computing systems; examples of a number of social computing systems, e.g. Facebook, MMO games, etc.

• Module 2

• Computational online trust

• Identifying key influencers

• Information flow in networks

• Module 3

• Analysis of clandestine networks

• Katana - game analytics engine

Page 155: Jaideep

Untangling Dark Webs

Theories, Methods, and Models for a Computational Social Science of Clandestine Networks

Page 156: Jaideep

Defining Clandestine Behavior

Clandestine behavior is socially and culturally constructed.

There is no computational definition of what constitutes

clandestine behavior

“Kept secret or done secretively, esp. because illicit” (Webster's Definition)

Clandestine networks thus involve actors and/or activities which are illicit in nature

Page 157: Jaideep

Kevin Bacon Linked to Al-Qaeda !

Just because two things are related does not mean that they have a concrete

relationship. Unless …..

Page 158: Jaideep

Clandestine organizations as

networks

Networks are more flexible organizational forms than

markets or hierarchies [Powell 1990; Podolny 1998; Brass, Galaskiewicz, et al. 2004; Robins 2009]

Criminals are embedded within organizations

supporting division of labor and specialization [Cressy 1972; Canter & Alison 2000; Waring 2002]

Trust relations mediate functional relations like

grouping, exchange, & communication [McIntosh 1974; von Lampe 2004]

Balancing security vs. efficiency; time-to-task; resilience

vs. flexibility [Milward & Raab 2003, 2006; Morselli, Giguere, & Petit 2007]

Page 159: Jaideep

Contrasting network

assumptions

#   Current assumptions                                   Network theory assumptions
1   Networks are information systems                      Networks are multi-functional communication systems
2   Networks have uniplex ahistoric relations             Networks have multiplex and dynamic relations
3   Networks are hierarchically organized, C2 structures  Networks are temporary, dynamic, emergent, adaptive, flexible
4   Boundary specification is a political tool            Boundary specification is an analytic tool
5   Networks are globalized and homophilous               Networks are local, glocal, global, and heterogeneous

Stohl & Stohl (2005)

Page 160: Jaideep

Information systems?

Network ties aren't just for sharing information and

resources, but also represent latent relationships

Networks are not a machine to be broken, but a social

organism that can adapt and reproduce

What are underlying processes that govern how networks evolve

and maintain their structure?

Networks are structures for sensemaking & socialization

Bomb-making websites easy to disrupt but message boards which

foster communication and solidarity among people with similar

sympathies and values much less so

Page 161: Jaideep

Uniplex, ahistoric relationships?

Essential covert relationships like trust & knowledge exchange are based

on multiplex ties and attributes

Shared background, common identity, family ties, reciprocity

Jihadi cells prevalent in Montreal, London, Madrid, Hamburg because of

poor social integration & high disaffection

Longitudinal data necessary to measure changes over time

Repeated measurements of the same network actors, ties, attributes

Collection problems:

Left censoring: important changes happened before data collection

Right censoring: data collection does not last long enough to capture important

changes

Boundary specification: Omitting important actors, ties, attributes

Cognitive biases: Over or under-recall

Instrumentation: measurement inconsistencies, panel conditioning & attrition,

missing data

Page 162: Jaideep

Hierarchies?

Licit and illicit organizations are no longer hierarchical pyramids with

clear chains of command

Network identities and roles are not fixed and constant

Networks do not operate according to formal & unambiguous rules

Cells: Operations can be performed spontaneously & independently

Individuals may identify with an organization, but are not part of it

Networks operate at multiple levels

“Organizations created out of complex webs of exchange, dependency, reciprocity

among multiple organizations” (Monge & Contractor 2003)

Political parties, bureaucrats, businesses, charitable organizations, clinics, schools, &

houses of worship are legitimate peripheral actors which enable covert organizations

Page 163: Jaideep

Easily specified?

Specifying rules for including/excluding actors in network membership

is essential part of research design

Family members, politicians, businesses, charitable organizations, …

Unit of analysis: individuals, teams, organizations?

Example: Snowball sampling

Identify initial set then their links to second degree, third degree, etc.

Missing crucial seed actors, time intensive, super-abundance of unreliable data,

“everyone” connected by 6 degrees of separation

Labels are fuzzy & prone to political framing

“Terrorist vs. freedom-fighter”: UK vs. IRA? China vs. ETIM? Russia vs. Chechnya?

Afghanistan/Pakistan vs. Taliban? Iraq/Turkey vs. Kurds?

Distinguish between being in contact vs. being operational

Ability to mobilize, control, coordinate members

Look for how organizations cleave when they are forming or being changed

Page 164: Jaideep

Global & homophilous?

Terrorist networks encompass very different ideologies and goals

Religious (salafism), ethnic (Kurds), and nationalist (ETA) movements

Coercive bargaining (IRA, PLO, ETA) vs. war-inducing (AQ)

Different motivations lead to different formation processes &

structures

Hamas, Hezbollah, ETA emphasize shared background/ethnicity vs. “movements”

like al Qaeda or militias committed to shared ideology

Relational homophily drives creation of self-similar strong ties which remain dormant,

difficult to identify & enter, easy to disassemble

Ideological homophily drives creation of single-issue ties which remain salient, more

permeable boundaries means easier to enter but harder to prevent re-formation

Page 165: Jaideep

Criminal networks literature

Balancing security vs. efficiency; time-to-task; resilience vs. flexibility Milward & Raab 2003; Morselli, Giguere, Petit 2007

Secrecy-oriented networks emphasize sparse, decentralized networks to avoid

detection Erickson 1981

Decentralization a common response to authorities' targeting and seizure Morselli & Petit 2007

Leaders on periphery (low closeness) to avoid detection Baker & Faulkner 1993

Drug traffickers exhibit higher centralization in core of participants with stable

roles, but insulate by adding participants to extend periphery of core Morselli, Giguere, Petit 2007; Dorn, Oette, White 1998


Page 166: Jaideep

Covert networks are dynamic

networks

Multiple types of relationships

Familial ties, communication, exchange, authority, & other latent relationships

Actor-actor, actor-attribute, actor-event networks

Individuals central in one network are peripheral in other networks

Positions and relationships are not stable

Removing links & nodes can alter pattern but does not address underlying processes

that govern how networks evolve, maintain, dissolve

Outwardly different networks share common structural properties

Faust & Skvoretz (2002): Senate co-sponsorship most similar to cow-licking

Outwardly similar networks generated by different processes

Page 167: Jaideep

Cutpoints & bridges

Brokers are excellent targets

to disassemble networks

Cutpoint

Removing a node creates a new

component

Bridges

Removing a link creates a new

component
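The two definitions above can be checked mechanically. The following is a minimal sketch, assuming an undirected network given as a list of links; the function name `cutpoints_and_bridges` and the six-node example are hypothetical and not taken from the slides. It uses the standard depth-first low-link technique.

```python
from collections import defaultdict

def cutpoints_and_bridges(edges):
    """Return (cutpoints, bridges) of an undirected graph given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    disc, low = {}, {}            # discovery order and low-link value per node
    cutpoints, bridges = set(), set()
    counter = [0]

    def dfs(node, parent):
        disc[node] = low[node] = counter[0]
        counter[0] += 1
        children = 0
        for nbr in adj[node]:
            if nbr == parent:
                continue
            if nbr in disc:                       # back edge to an ancestor
                low[node] = min(low[node], disc[nbr])
            else:                                 # tree edge
                children += 1
                dfs(nbr, node)
                low[node] = min(low[node], low[nbr])
                if low[nbr] > disc[node]:         # nothing below reaches above this link
                    bridges.add(frozenset((node, nbr)))
                if parent is not None and low[nbr] >= disc[node]:
                    cutpoints.add(node)
        if parent is None and children > 1:       # root with several DFS subtrees
            cutpoints.add(node)

    for node in list(adj):
        if node not in disc:
            dfs(node, None)
    return cutpoints, bridges

# Hypothetical example: two triangles joined only through the link C-D.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]
cutpoints, bridges = cutpoints_and_bridges(edges)
print(sorted(cutpoints), [sorted(b) for b in bridges])
```

In this toy network the brokers C and D are the cutpoints and the link between them is the only bridge. The recursive search is fine for small examples; a large network would need an iterative version.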

Page 168: Jaideep
Page 169: Jaideep

Clandestine networks are

complex

Multiple types of relationships [Stohl & Stohl 2007]

Networks have multiplex and dynamic relations – trust, exchange,

communication, authority, and other relationships

Networks are temporary, dynamic, emergent, adaptive, flexible

Networks are local, glocal, global, and heterogeneous – different ideologies,

motivations, goals lead to different structures & processes

Descriptive network analysis does not address

underlying processes of how networks emerge, stabilize,

dissolve

Goal: To disrupt network, understand and attack the

processes which create and stabilize

Page 170: Jaideep

Modeling Clandestine

Organizations

and Behaviors as

Networks

Page 171: Jaideep

Introduction to network analysis

Networks are sets of nodes

connected by links

Nodes can be people, groups,

webpages, etc.

Links can be friendships, exchanges,

affiliations, etc.

[Figure: two nodes, A and B, connected by a link]

Page 172: Jaideep

Networks - Directed & undirected

Communication vs. friendship networks

[Figure: two drawings of a network with nodes A-G, one directed and one undirected]

Page 173: Jaideep

Degree distributions

What's the probability P(k)

of randomly selecting a

node with degree k in this

network?

[Figure: bar chart of P(k) against k, with bars of height 24/31, 6/31, and 1/31 at k = 1, 4, and 12]
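As a rough illustration of how P(k) is obtained, the sketch below counts degrees from an edge list and divides by the number of nodes. The function name `degree_distribution` and the star network are hypothetical; nodes with no links do not appear in an edge list and are therefore not counted here.

```python
from collections import Counter
from fractions import Fraction

def degree_distribution(edges):
    """Map each degree k to P(k), the fraction of nodes with that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)                      # nodes that appear in at least one link
    counts = Counter(degree.values())    # how many nodes have each degree
    return {k: Fraction(c, n) for k, c in sorted(counts.items())}

# Hypothetical star network: hub "H" linked to four leaves.
star = [("H", leaf) for leaf in "ABCD"]
dist = degree_distribution(star)
print(dist)
```

For the star, four of the five nodes have degree 1 and one has degree 4, so P(1) = 4/5 and P(4) = 1/5.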

Page 174: Jaideep

Power laws

• Large networks can have

degree distributions that span

several orders of magnitude

• Many real world networks follow

a power law degree

distribution

• Scale free networks, 80/20 rule,

Pareto principle, Zipf's Law, long tail,

etc.

$P(k) \sim k^{-\gamma}$

[Figure: four degree-distribution panels, each annotated with its exponent γ: Internet routers, Movie actors, Physicists, Neuroscientists]

Page 175: Jaideep

Degree distributions across

networks

Page 176: Jaideep

Path length

Path length: number of links

between two nodes (degrees

of separation)

BACDE = 4

Geodesic: Shortest path

length between two nodes

BAE = 2

Diameter: Network's largest

geodesic / eccentricity

BAEFGH / BACJIH

[Figure: example network with nodes A-K]
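Geodesics and the diameter can be computed with breadth-first search. This is a minimal sketch under the assumption of an unweighted, connected, undirected network stored as an adjacency dictionary; the five-node example and the function names are hypothetical and do not reproduce the figure on the slide.

```python
from collections import deque

def geodesics_from(adj, source):
    """Shortest path length (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def diameter(adj):
    """Largest geodesic in the network (assumes the network is connected)."""
    return max(max(geodesics_from(adj, s).values()) for s in adj)

# Hypothetical network: a square A-B-C-D-A with a tail D-E.
adj = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["A", "C", "E"],
    "E": ["D"],
}
print(geodesics_from(adj, "B")["E"], diameter(adj))
```

The largest value in `geodesics_from(adj, s)` is the eccentricity of node `s`, so the diameter is simply the largest eccentricity.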

Page 177: Jaideep

Density, clustering, centralization

Density

Observed edges in network / maximum possible

edges

Clustering

Count ties among alters, removing ego and ties to

ego

Observed ties in actor's ego network / maximum

possible ties in ego network

Network centralization

Variation in individual actors' centralities

High centralization when few actors possess higher

centrality than average

Low centralization when actors all have similar

centralities

[Figure: example network with nodes A-I]
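The density and clustering ratios defined above translate directly into code. The sketch below assumes an undirected network with at least two nodes, stored as a dictionary of neighbor sets; the four-node example and the function names are hypothetical.

```python
from itertools import combinations

def density(adj):
    """Observed edges divided by the maximum possible number of edges."""
    n = len(adj)
    observed = sum(len(nbrs) for nbrs in adj.values()) / 2   # each edge stored twice
    return observed / (n * (n - 1) / 2)

def local_clustering(adj, ego):
    """Observed ties among ego's alters divided by the maximum possible ties."""
    alters = adj[ego]
    k = len(alters)
    if k < 2:
        return 0.0
    ties = sum(1 for u, v in combinations(alters, 2) if v in adj[u])
    return ties / (k * (k - 1) / 2)

# Hypothetical network: triangle A-B-C plus a pendant node D attached to A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(density(adj), local_clustering(adj, "A"), local_clustering(adj, "B"))
```

Here 4 of the 6 possible edges exist, so the density is 2/3. Only one of the three possible ties among A's alters exists (B-C), so A's clustering is 1/3.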

Page 178: Jaideep

Paths & clustering across

networks

Page 179: Jaideep

Small worlds

Paradox: Individuals within the

network are highly clustered but

also have small average

geodesics to other members

Randomly rewiring a fraction of

links on a regularly-clustered

network drastically shortens

average eccentricity

Random rewiring, however

still maintains high clustering

over several orders of

magnitude

Page 180: Jaideep

Degree centrality

Degree: total number of links with other actors

In-degree: Directional links to actor from other actors

Out-degree: Directional links from actor to other actors

“Popularity”

[Figure: example network with nodes A-J]

Page 181: Jaideep

Closeness centrality

How easily one actor can reach rest of network

Actor with shortest average path length

“Pulse-taker”

[Figure: example network with nodes A-J]
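One common way to turn "shortest average path length" into a score is the inverse of the mean geodesic distance to all other reachable actors. The sketch below uses that definition as an assumption, since the slide does not state a formula; the path network and the function name `closeness` are hypothetical.

```python
from collections import deque

def closeness(adj, node):
    """(number of reachable others) / (sum of geodesic distances to them)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for nbr in adj[current]:
            if nbr not in dist:
                dist[nbr] = dist[current] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical path network A-B-C-D-E: the middle actor reaches everyone fastest.
adj = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
scores = {v: closeness(adj, v) for v in adj}
pulse_taker = max(scores, key=scores.get)
print(pulse_taker, scores)
```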

Page 182: Jaideep

Betweenness centrality

How much an actor lies between distinct groups

Number of geodesics passing through actor

“Broker”

[Figure: example network with nodes A-J]
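Counting geodesics through each actor can be done with Brandes' algorithm, which weights each geodesic by the share of shortest paths it represents. This is a sketch for unweighted, undirected networks; the seven-node example (two triangles joined through a broker "X") and the function name are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected adjacency dictionary."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                               # nodes in order of distance from s
        preds = {v: [] for v in adj}             # predecessors on geodesics from s
        sigma = {v: 0 for v in adj}              # number of geodesics from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:       # v lies on a geodesic to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}            # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical network: triangles A-B-C and D-E-F joined through broker X.
adj = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "X"],
    "X": ["C", "D"],
    "D": ["X", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
scores = betweenness(adj)
print(scores)
```

All nine pairs that span the two triangles must pass through X, so X scores 9; C and D each score 8, and the remaining actors score 0.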

Page 183: Jaideep

Multi-Dimensional Networks -

Attributes

Selection: Immutable characteristics

Race, ethnicity, gender, etc.

Simple process: Attributes remain fixed and influence how

connections are formed

Homophily

Influence: alterable characteristics

Interests, activities, infectiousness, etc.

Complex process: Feedback and interactions between ego's

attributes, network structure, alters' attributes

Diffusion and coevolution

Page 184: Jaideep

Selection and homophily

Existing attributes drive creation & destruction of

connections

“Birds of a feather flock together”

[Figure: network of nodes A-G shown at two points in time]

Page 185: Jaideep

Influence and diffusion

Existing connections drive creation & destruction of

attributes

[Figure: network of nodes A-G shown at two points in time]

Page 186: Jaideep

Multi-relational

Organizations: authority, trust, & friendship

[Figure: network of nodes A-G]

Page 187: Jaideep

Triad census & network motifs

16 possible triad types / motifs / isomorphism classes in a directed network

(Reciprocated ties, unreciprocated ties, null ties)

Triad census: frequency of each structure in a network

Compare frequencies in observed network, measuring deviations against

frequencies in random networks

Simmelian ties: Transitive-reciprocated triads (3-0-0) occur

more frequently than any other motif in social networks

003 012 102 021D 021U 021C 111D 111U

030T 030C 201 120D 120U 120C 210 300
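The three digits in each triad label count the mutual (reciprocated), asymmetric (unreciprocated), and null dyads in the triad. The sketch below computes only that three-digit part of the census, which is a simplification: it does not distinguish the letter variants (for example 021D, 021U, and 021C all fall under "021"). The function name `man_census` and the four-node example are hypothetical.

```python
from collections import Counter
from itertools import combinations

def man_census(nodes, arcs):
    """Count triads by their numbers of Mutual, Asymmetric and Null dyads."""
    arcs = set(arcs)
    census = Counter()
    for triple in combinations(sorted(nodes), 3):
        mutual = asym = null = 0
        for u, v in combinations(triple, 2):
            forward, backward = (u, v) in arcs, (v, u) in arcs
            if forward and backward:
                mutual += 1
            elif forward or backward:
                asym += 1
            else:
                null += 1
        census[f"{mutual}{asym}{null}"] += 1
    return census

# Hypothetical directed network: A, B, C all reciprocate; A also sends a tie to D.
nodes = ["A", "B", "C", "D"]
arcs = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"),
        ("A", "C"), ("C", "A"), ("A", "D")]
census = man_census(nodes, arcs)
print(dict(census))
```

To follow the comparison described above, the same census would be computed on random networks of matching size and density and the observed counts compared against those frequencies.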

Page 188: Jaideep

Network families

Page 189: Jaideep

Growth & preferential

attachment

How do you generate scale-free networks? Rich get richer (Yule 1925, Simon 1955, de Solla Price 1976, Albert & Barabasi 1998)

?
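A minimal "rich get richer" generator can be sketched as follows. It is a simplified illustration in which each new node adds exactly one link, chosen with probability proportional to the current degree of the target; the function name, the one-link-per-node choice, and the seed are assumptions made for the example.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, seed=0):
    """Grow a network one node at a time; targets are picked in proportion to degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]          # every node appears once per link it holds
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)      # high-degree nodes appear more often
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges

edges = preferential_attachment(1000)
degree = Counter(node for edge in edges for node in edge)
print(max(degree.values()), Counter(degree.values()).most_common(3))
```

Sampling uniformly from the list of link endpoints is what makes the choice proportional to degree: a node with ten links appears ten times in the list.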

Page 190: Jaideep

Data collection issues

Data on clandestine organizations are doubly hard to obtain

Clandestine networks by definition seek to avoid detection or identification

Law enforcement prohibited from collecting some types of data, reluctant to disclose

extant data to prevent adaptation

Criminal network studies to date generally rely on:

Evidence entered into court proceedings [Sparrow 1991; Baker & Faulkner 1993]

Imputation from secondary or tertiary sources [Krebs 2002; Sageman 2004]

Page 191: Jaideep

Problems with approaches

Collection of incomplete network

data seriously compromises

reliability of findings [Wasserman & Faust

1994]

Boundary specification of nodes –

peripheral & legitimate actors often

omitted despite playing crucial roles

Censored data – only one type of

interaction recorded; earlier & later ties

may be impossible to capture

Lack of attributes – occupation, gender,

affiliation, personality, psychological

states strongly influence tie-formation

behavior over time [Robins 2009]

Krebs 2002

Page 192: Jaideep

Introduction to MMOGs,

Gold Farming, and

Everquest II

Page 193: Jaideep

MMOGs

Massively Multiplayer Online Games (MMOGs)

Shared online persistent virtual environments where

millions of people can interact with one another

Players can complete quests, interact with the game

environment, interact with other players

Many of the behaviors which are observed in the real world

are also observed in MMOGs e.g., friendship, economic

behavior, backstabbing, illicit behaviors etc

Page 194: Jaideep

Gold farming


• Gold farming and real money trade

involve the exchange of virtual in-game

resources for “real world” money

• Laborers in China and S.E. Asia paid to

perform repetitive in-game practices

(“farming”) to accumulate virtual wealth

(“gold”)

• Western players purchase farmed gold to

obtain more powerful items/abilities and

open new areas within the game

• Market for real money trade exceeds $3

billion annually [Lehdonvirta & Ernkvist 2011]

Page 195: Jaideep

Deviance

Game companies actively ban accounts involved in gold farming operations

Why?

Upsets game economy equilibrium by inflating prices

Excludes other players from shared game environments

Automated bots/scripts ruin social interactions

Pay-to-play upsets meritocratic expectations

Theft of billing or account information

Legal ramifications of virtual items as “property”


Page 196: Jaideep

Identification

Most gold farmers are caught by:

Other players reporting behavior

Farmers’ solicitations and spamming

Organized “sting operations”

Administrator heuristics

Farming operations employ highly-specialized operations that have to balance practices to efficiently accumulate gold with practices to avoid detection

Page 197: Jaideep

Changes in ban reasons, all


Page 198: Jaideep

Ninja Metrics confidential information. Copyright 2012

Dynamics, bursts, lifecycle

Page 199: Jaideep

Mapping

Gold farmers potentially operate under similar motivations

and constraints as other clandestine or criminal

organizations

Profit motive – illicit goods with minimal input costs provide

arbitrage opportunities

Distribution challenges – suboptimal processes and

structures for generating and distributing goods so to avoid

risk of detection

Selection pressures – authorities confiscate goods and detain

participants when detected

Page 200: Jaideep

EverQuest II

Massively Multiplayer Online Role

Playing Game (MMORPG, MMO)

Data spanning from January 2006

to Mid-September 2006

2.1 million players across multiple

servers

Our analysis focuses on a subset of

the server

In-game action data available as well as social interaction data (trust, mentoring, questing, grouping, trade, etc.)

Page 201: Jaideep

EverQuest II – Gold Farmers

Gold Farmers are explicitly labeled in the dataset

Gold Farmers constitute only a small percentage of all of

the players (1-3%)

Gold farmer bots are not as common in EverQuest II as is generally assumed for MMOs

Page 202: Jaideep

Types of Gold Farmers

There are multiple types of Gold Farmers

• Gatherers: Accumulate gold or other resources

• Bankers: Low-activity reserve accounts

• Mules and dealers: Single user accounts to transmit money and

interact with the customer

• Barkers: Spammers marketing services in the game

Issues: The dataset however does not have labels for

the gold farmer sub-types

Open Question: Are there other types of gold farmers?

Page 203: Jaideep

Descriptive properties

and

Statistical network

models

Page 204: Jaideep

Properties of Clandestine

Networks

Clandestine networks are embedded in larger networks and have been studied both in isolation and within those larger networks

Do clandestine networks exhibit properties that distinguish

them from normal networks?

Behavioral Signatures

Structural Signatures

Spatio-Temporal Signatures

Page 205: Jaideep

Centrality comparisons

Compared to the population-at-large, do farmers or affiliates have higher or lower…

Incoming trade relationships? (In-degree)

Outgoing trade relationships? (Out-degree)

Incoming transactions? (In-weight)

Outgoing transactions? (Out-weight)

Proximity to the rest of the network? (Closeness)

Level of brokerage? (Betweenness)

Higher levels of “prestige”? (Eigenvector)

Tendency for counterparties to also trade? (Clustering)

Page 206: Jaideep

Multinomial logit

Compared to non-farmers, do farmers & affiliates have significantly different structures?

Measure                  Farmers (z-statistic)     Affiliates (z-statistic)
In-degree                -0.0473 (-9.24)***        0.0626 (47.21)***
Out-degree               -0.189 (-5.11)***         0.0529 (44.25)***
In-weight                0.00691 (23.04)***        0.00851 (32.64)***
Out-weight               0.00796 (24.32)***        0.009556 (32.97)***
Closeness                -0.679 (-6.38)***         -2.438 (-13.24)***
Betweenness              1.56×10^-7 (1.94)         1.36×10^-7 (35.81)***
Eigenvector              -10300 (-8.60)***         5377 (35.73)***
Clustering coefficient   0.905 (8.64)***           -0.0926 (-0.81)

Page 207: Jaideep

Network characterizations

Degree distribution

What fraction, P(k), of nodes in the network have k connections?

Weight distribution

What fraction, P(s), of links in the network have had s transactions?

Power law/scale free distributions

80/20 rule: minority of nodes have majority of links

Generated by growth and preferential attachment

Linear on a log-log plot… with some interesting exceptions

Page 208: Jaideep

Degree distribution & attenuation

[Figure: log-log plot of P(k) (1.E-05 to 1.E+00) against k (1 to 100) for Farmer-In, Farmer-Out, Affiliates-In, Affiliates-Out, Non-Affiliates-In, Non-Affiliates-Out]

Page 209: Jaideep

Growth constraints

• Power law scaling followed by exponential cut-off

• Aging: old nodes stop accepting new links

• Cost: becomes more expensive to accept new links

• Capacity: nodes stop accepting links above threshold

• Copying: new nodes imitate connections of existing nodes

[Figure: log-log plot of P(k) (1.E-05 to 1.E+00) against k (1 to 100) for Farmer-In, Farmer-Out, Affiliates-In, Affiliates-Out, Non-Affiliates-In, Non-Affiliates-Out]

Page 210: Jaideep

Assortative v. random mixing

Is the degree of a node correlated with its neighbors’ degrees?

No: random mixing

Yes, positive: assortative mixing

Yes, negative: dissortative mixing

Assortative mixing

Well-connected nodes are connected to other well-connected nodes

Poorly connected nodes are connected to other poorly connected nodes

Dissortative mixing

Well-connected nodes are connected to poorly-connected nodes
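One way to answer the question above numerically is the Pearson correlation between the degrees at the two ends of every link. The sketch below assumes an undirected network given as an edge list and counts each link in both directions; the function name and the two toy networks are hypothetical, and this is not necessarily the measure used in the study.

```python
from collections import Counter

def degree_assortativity(edges):
    """Pearson correlation of the degrees found at either end of each link."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    xs, ys = [], []
    for u, v in edges:                    # count each link in both directions
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)              # xs and ys share the same mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var else 0.0

# Hypothetical hub-and-spoke network: the hub links only to poorly connected leaves.
star = [("H", leaf) for leaf in "ABCD"]
# Hypothetical network of like-with-like: a pair (degree 1) and a triangle (degree 2).
like_with_like = [("P", "Q"), ("X", "Y"), ("Y", "Z"), ("X", "Z")]

r_star = degree_assortativity(star)
r_like = degree_assortativity(like_with_like)
print(r_star, r_like)
```

A negative value indicates dissortative mixing, a positive value assortative mixing, and a value near zero random mixing.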

Page 211: Jaideep

Assortativity in context

Dissortativity found in biological, ecological, & technological networks that require failure tolerance

Assortativity found in social and collaboration networks

Page 212: Jaideep

Comparative network analysis

How does the gold farming network compare against a “real world” criminal network?

Criminal network data hard to get a hold of in first place

Problems defining boundaries, multiplex relations, etc.

Carlo Morselli’s CAVIAR drug trafficking network (N=110, E=295)

Page 213: Jaideep

Assortativity

[Figure: log-log plot of average nearest-neighbor degree Knn(k) (0.1 to 10) against k (1 to 100) for Affiliates, Non-Affiliates, Farmers, and Caviar, each as In-In and Out-Out series]

Gold farmers and drug traffickers both adopt

avoidance interaction structures

Normal players and farmer affiliates both adopt

collaborative interaction structures

Page 214: Jaideep

Conclusions – Assortativity

Non-affiliated players exhibit assortativity

Collaboration > Resilience

Affiliate network generally assortative

Collaboration > Resilience

High degree outliers likely unidentified farmers

Farmers’ network is dissortative

Collaboration < Resilience

Drug trafficking network similarly dissortative

Collaboration < Resilience

Page 215: Jaideep

Attack & failure tolerance

Failure: random removal of nodes

Attack strategies

Degree attack: removal of best-connected nodes

Edge attack: removal of high-transaction dyads

Outcomes

Fraction of remaining nodes in largest connected component

Fraction of remaining nodes as isolates
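The degree attack, random failure, and the two outcome measures can be sketched as below. This is an illustrative simplification: degrees are ranked once rather than recomputed after each removal, the edge attack is omitted, and the hub-and-spoke network and all function names are hypothetical.

```python
import random

def outcomes(adj, removed):
    """Return (LCC fraction, isolate fraction) among the nodes left after removal."""
    remaining = set(adj) - set(removed)
    if not remaining:
        return 0.0, 0.0
    seen = set()
    largest = isolates = 0
    for start in remaining:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:                          # explore one connected component
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr in remaining and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        largest = max(largest, size)
        if size == 1:
            isolates += 1
    return largest / len(remaining), isolates / len(remaining)

def degree_attack(adj, fraction):
    """Pick the best-connected nodes (static ranking by initial degree)."""
    k = round(fraction * len(adj))
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]

def random_failure(adj, fraction, seed=0):
    """Pick nodes uniformly at random."""
    k = round(fraction * len(adj))
    return random.Random(seed).sample(sorted(adj), k)

# Hypothetical hub-and-spoke network: hub "H" linked to nine leaves.
leaves = [f"L{i}" for i in range(9)]
star = {"H": set(leaves), **{leaf: {"H"} for leaf in leaves}}

attack = outcomes(star, degree_attack(star, 0.10))
failure = outcomes(star, random_failure(star, 0.10))
print(attack, failure)
```

Removing the hub leaves every remaining node isolated, while a random removal usually hits a leaf and leaves the network intact.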

Page 216: Jaideep

Study 2 - Attack & failure analysis

[Figure: fraction (0 to 1) against node fraction removed (0.01% to 100%) for Degree attack - LCC, Degree attack - Isolate fraction, Random failure - LCC, Random failure - Isolate fraction, Edge attack - LCC, Edge attack - Isolate fraction]

Degree attack fractures farming network

faster than random failure or edge attack

Page 217: Jaideep

Comparative attack & failure analysis – LCC fraction

[Figure: LCC fraction (0 to 1) against node fraction removed (0.01% to 100%) for Farmer attack, Farmer failure, Caviar attack, Caviar failure]

Page 218: Jaideep

Comparative attack & failure analysis – Isolate fraction

[Figure: isolate fraction (0 to 1) against node fraction removed (0.01% to 100%) for Farmer attack, Farmer failure, Caviar attack, Caviar failure]

Page 219: Jaideep

Conclusions – Attack tolerance

Edge attack strategy has poorer performance than even random failure

Gold farmers & drug traffickers respond similarly to degree attack

Page 220: Jaideep

Month 1


Page 221: Jaideep

Month 2


Page 222: Jaideep

Month 3


Page 223: Jaideep

Month 4


Page 224: Jaideep

Month 5


Page 225: Jaideep

Month 6


Page 226: Jaideep

Month 7


Page 227: Jaideep

Month 8


Page 228: Jaideep

Month 1 in Month 5


Page 229: Jaideep

Month 4 in Month 5


Page 230: Jaideep

Month 5 in Month 5


Page 231: Jaideep

Month 6 in Month 5


Page 232: Jaideep

Month 8 in Month 5


Page 233: Jaideep

Predictive Modeling and

Machine Learning

approaches

Page 234: Jaideep

Gold Farmer Detection

How does one catch gold farmers?

Are there characteristics which can

be used to distinguish gold farmers?

What attributes can be used to detect GFs?

Player Character Attributes: gender, class, etc. of the player character

Player Activity Data: Player activities over the course of time, e.g., number of quests, type of quests, NPCs killed

Player Socialization Data: Grouping others, trust, mentoring

Player Demographics: Real world age, gender, location

Page 235: Jaideep

Machine Learning Approaches

Multiple ways to catch gold farmers:

Binary Classification Problem

Multi-class Classification

One class Classification problem

Cascading Classifiers Problem

Outlier Detection

Label Propagation Problem

Combination of these

Class Labeling Issues: Not all players who are labeled as

normal players are normal players

Page 236: Jaideep

Machine Learning Binary Classification

Approach

Two main classes: GFs and non-GFs

Highly Skewed Distribution

9,178 Gold Farmer Characters out of a total of 2.1

million characters

Standard set of combinations of classifiers and features, e.g., Naive Bayes, Bayes Net, Logistic Regression, KNN, J48, JRip, AdaBoost, and SMO


Page 237: Jaideep

Feature Set

• Demographic Features*

• Performance Features

• Task distributions (set of tasks performed)

• Sequence of activities performed by gold farmers

• Examples: KKKdDKdEESSKD, SSSEKdKdDD

– K = Killed Monster, d = damage points, D = Character Death, S = Completed a recipe, e.g., spell

* All information is anonymized.


Page 238: Jaideep

Sequence Patterns


Page 239: Jaideep


Binary Classification Approach:

Classification Models

Page 240: Jaideep

Initial Classification Results

Frequent Pattern Mining Association

Page 241: Jaideep

Label Propagation

Research Problem: Not all people who are labeled as

normal players are such. Some of them are gold farmers

but have not been identified as such

Analogue: Not all people who are free (i.e., not in jail) are innocent

Solution: Use label propagation to label people who

may be gold farmers

Analogue: If someone is socializing almost exclusively

with criminals then he may be a criminal

Page 242: Jaideep

Label Propagation

(a) Classification Results (b) Social Networks within

class boundaries

Page 243: Jaideep

Label Propagation

Propagate Labels based on the Social Networks of the Gold Farmers

• Guilt by association

• Propagate labels based on

the social neighborhood and

socialization patterns of players

• If player A spends > 80% of time socializing (grouping, trusting, mentoring) with other gold farmers, then he is likely a gold farmer

• Alternative methods: Propagation

based on similarities
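The 80% rule above can be sketched as a single propagation pass. The data layout (a dictionary mapping each player to the time spent with each partner), the function name `propagate_labels`, and the example players are all hypothetical; the study's actual features and procedure may differ.

```python
def propagate_labels(interactions, farmers, threshold=0.8):
    """Flag unlabeled players who spend more than `threshold` of their
    socializing time with known gold farmers (one pass, no iteration)."""
    suspects = set()
    for player, partners in interactions.items():
        if player in farmers:
            continue
        total = sum(partners.values())
        with_farmers = sum(t for p, t in partners.items() if p in farmers)
        if total > 0 and with_farmers / total > threshold:
            suspects.add(player)
    return suspects

# Hypothetical data: time spent grouping, trusting, or mentoring with each partner.
farmers = {"gf1"}
interactions = {
    "p1": {"gf1": 9, "p2": 1},      # 90% of time with a known farmer
    "p2": {"p1": 1, "p3": 5},       # no time with farmers
    "p3": {"gf1": 4, "p2": 5},      # about 44% of time with a farmer
}
suspects = propagate_labels(interactions, farmers)
print(suspects)
```

Repeating the pass with the newly flagged players added to `farmers` would spread labels further, at the cost of compounding any false positives.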

Page 244: Jaideep

Results

Classifier            Metric      Initial Dataset   Label Prop   Change in Performance   Lift
Bayes Net             Precision   0.17              0.189        0.019                   1.11
                      Recall      0.834             0.819        -0.015                  0.98
                      F-Score     0.282             0.307        0.025                   1.09
J48                   Precision   0.494             0.62         0.126                   1.26
                      Recall      0.189             0.337        0.148                   1.78
                      F-Score     0.273             0.437        0.164                   1.60
JRip                  Precision   0.495             0.537        0.042                   1.08
                      Recall      0.462             0.462        0                       1.00
                      F-Score     0.478             0.497        0.019                   1.04
KNN                   Precision   0.436             0.46         0.024                   1.06
                      Recall      0.396             0.428        0.032                   1.08
                      F-Score     0.415             0.443        0.028                   1.068
Logistic Regression   Precision   0.455             0.534        0.079                   1.17
                      Recall      0.189             0.271        0.082                   1.43
                      F-Score     0.267             0.36         0.093                   1.35
Naïve Bayes           Precision   0.146             0.142        -0.004                  0.97
                      Recall      0.538             0.502        -0.036                  0.93
                      F-Score     0.23              0.221        -0.009                  0.96
AdaBoost w/ DT        Precision   0.405             0.471        0.066                   1.16
                      Recall      0.105             0.08         -0.025                  0.76
                      F-Score     0.167             0.137        -0.03                   0.82
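The F-Score and Lift columns follow from the precision and recall values. The sketch below assumes the F-Score is the usual harmonic mean of precision and recall and that lift is the ratio of the label-propagation value to the initial value; both assumptions reproduce the J48 row. The function names are hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def lift(before, after):
    """Ratio of the new value to the baseline value."""
    return after / before

# J48 row: precision and recall before and after label propagation.
f_before = f_score(0.494, 0.189)
f_after = f_score(0.62, 0.337)
print(round(f_before, 3), round(f_after, 3), round(lift(0.273, 0.437), 2))
```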

Page 245: Jaideep

Results Comparison

Metric      SocialComp’09 (F0)   Label Propagation   Lift (SocialComp vs. New Features + Label Propagation)
Precision   0.493                0.537               1.089
Recall      0.304                0.462               1.520
F-Score     0.376                0.497               1.322

• Recall improves significantly over the previous results, which implies that we are catching more gold farmers

• With label propagation, precision increases from 0.49 to 0.54, which implies that more of the users we identify as gold farmers are indeed gold farmers

Page 246: Jaideep


Gold Farmer Detection as a One class

classification Problem

• The labels of one class are known for certain

• The labels for the rest of the records are not known

with certainty

• Use the known class for training and classify the

rest of the records for the known class

• Issues: The known class consists of many

subclasses with different feature sets

Page 247: Jaideep

Disambiguating Gold Farming

sub-classes

Heuristics can be used to disambiguate the different sub-

classes

Gatherers have high-intensity in-game activity associated

with them

Mules have high trade volume but low in-game activity

Bankers have low level trade volume and low in-game

activity

Spammers have little or no trade activity (in general) but

denser chat networks

Page 248: Jaideep


Disambiguating Gold Farming sub-

classes

[Figure: five time-of-day panels (0000 hour to 1200 hours): High Intensity Gatherer, High Intensity Normal Player, Banker, Player with Periodic Behavior, Low Intensity Player]

Page 249: Jaideep

Research Question

“How do Gold Farmers change their

behaviors as a consequence of game

admin's behaviors?”

Can we anticipate GF change in

behaviors in advance?

Page 250: Jaideep

Change Detection in Gold Farmer Behavior

• Research Question: How do gold farmers respond to global enforcement of policies by the game admins?

• The in-game activities of gold farmers can be represented as a time series

• Applied clustering to these series and 3 clusters made the

most sense (most clear separation)

• Each cluster can be mapped to a gold farmer subtype

• However 20-30% of all players

in each cluster are non-GFs

• Change detection to determine

when the time series changed


Page 251: Jaideep

Interpretation: Global Changes in GF Behaviors - Adaptation

• Gold farmers and game admins change their activities

based on how the other acts in the game

• Previously there was anecdotal evidence for this change,

we have established that this happens at not just the activity

level but at the structural pattern level

• Can we predict how the gold farmers will act if game

admins adopt a certain banning policy?


Page 252: Jaideep

Research Question

“A plague upon't when thieves cannot be true one to another!”

(Sir Falstaff, Henry IV, Part 1, II.ii)

Do gold farmers

trust each other?

Page 253: Jaideep

Housing-Trust in EQ2

• Access permissions to in-game house

as trust relationships

• None: Cannot enter house.

• Visitor: Can enter the house and can

interact with objects in the house.

• Friend: Visitor + move items

• Trustee: Friend + remove items

• Houses can also contain items which

allow sales to other characters without

exchanging on the market


Page 254: Jaideep

Hypergraphs to Represent Tripartite Graphs


• Accounts can have several characters

• Houses can be accessed by several characters

• Projecting to one- or two-mode data obscures crucial

information about embeddedness and paths

• Figure 2a: Can ca31 access the same house as ca11?

• Figure 2b: Are characters all owned by same account?

Page 255: Jaideep

Hypergraphs: Key Concepts

• Hyperedge: An edge between three or

more nodes in a graph. We use three

types of nodes: Character, account and

house

• Node Degree: The number of

hyperedges which are connected to a

node

• ND(h1) = 3

• Edge Degree: The number of

hyperedges that an edge participates in

• ED(a1-h1) = 2


Page 256: Jaideep

Approach

• Game administrators miss gold farmers and deviance is not

a simple binary classification task

• Guilt by association: Identify “affiliates” who have ever

interacted with identified gold farmers, but have not been

identified as gold farmers themselves

[Figure: example network of nodes A, B, C with legend Farmer, Affiliate, Non-affiliate]

Page 257: Jaideep

Network Characteristics


• Long tail distributions are observed for the various degree distributions

• The mapping from character-house to an account is always unique

Page 258: Jaideep

Characteristics of Hypergraph Projection Networks

• Account Projection: Majority of the gold farmer nodes are isolates

(79%). Affiliates well-connected (8.89) vs non-affiliates (3.47)

• Character Projection: Majority of the gold farmer nodes are isolates

(84%). Affiliates well-connected (10.42) vs non-affiliates (3.23)

• House Projection: 521 gold farmer houses. Most are isolates (not

shown) but others are part of complex structures. Densely connected

network with gold farmers (7.56) and affiliates (84.02)


Page 259: Jaideep

Key Observations

• Picky picky: Gold farmers grant trust ties less frequently than

either affiliates or general players

• Gold farmers grant and receive fewer housing permissions

(1.82) than their affiliates (4.03) or general player population

(2.73)

                 Total degree             In degree                Out degree
                 <n>    <nGF>  <nAff>     <n>    <nGF>  <nAff>     <n>    <nGF>  <nAff>
Farmers          1.82   0.29   1.82       0.89   0.29   0.89       1.07   0.29   1.07
Affiliates       4.03   1.28   0.70       1.55   0.75   0.70       2.88   0.63   0.70
Non-Affiliates   2.73   -      7.77       1.57   -      5.98       1.56   -      2.34

Page 260: Jaideep

Key Observations

• No honor among thieves

• Gold farmers also have very low tendency to grant other

gold farmers permission (0.29)

• Affiliates also unlikely to trust other affiliates (0.70)

Page 261: Jaideep

Key Observations

• Affiliates are brokers:

• Farmers trust affiliates more (1.82) than other farmers (0.29)

• Affiliates trust farmers more (1.28) than other affiliates (0.70)

• Non-affiliates have a greater tendency to grant permissions to

non-affiliates (7.77) than in general (2.73)

Page 262: Jaideep

Frequent Pattern Mining: Key Terms

t1: Beer, Diaper, Milk

t2: Beer, Cheese

t3: Cheese, Boots

t4: Beer, Diaper, Cheese

t5: Beer, Diaper, Clothes, Cheese, Milk

t6: Diaper, Clothes, Milk

t7: Diaper, Milk, Clothes

• Items: Cheese, Milk, Beer, Clothes, Diaper, Boots

• Transactions: t1,t2, …, tn

• Itemset: A set of items, e.g. {Cheese, Milk, Beer}

• Support of an itemset: Fraction of transactions which contain that itemset

• Support( {Diaper, Clothes, Milk} ) = 3/7
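
As a quick check of the support definition, the seven transactions on this slide can be encoded as sets and counted directly. The function name `support` is only illustrative.

```python
from fractions import Fraction

# Transactions t1..t7 exactly as listed on the slide.
transactions = [
    {"Beer", "Diaper", "Milk"},
    {"Beer", "Cheese"},
    {"Cheese", "Boots"},
    {"Beer", "Diaper", "Cheese"},
    {"Beer", "Diaper", "Clothes", "Cheese", "Milk"},
    {"Diaper", "Clothes", "Milk"},
    {"Diaper", "Milk", "Clothes"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return Fraction(hits, len(transactions))

print(support({"Diaper", "Clothes", "Milk"}, transactions))  # 3/7
```

Only t5, t6 and t7 contain all three items, which gives the 3/7 stated above.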

Caption for t1 to t7 above: Market basket transaction dataset example

Page 263: Jaideep

Frequent Itemset Mining for Frequent Hyper-subgraphs

o Support of a Hyper-subgraph: Given a sub-hypergraph of size

k, subP is the pattern of interest containing the label P, shP is a

pattern of the same size as subP and contains the label P, the

support is defined as follows:

Support of the pattern containing a gold farmer (red) = 5/8
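
The formula itself did not survive in this transcript. One reading that is consistent with the definition above, offered as an assumption rather than as the authors' stated formula, is the ratio of the two counts, where |·| denotes the number of occurrences of a pattern:

```latex
\[
\mathrm{support}(sub_P) \;=\; \frac{|sub_P|}{|sh_P|}
\]
```

Under this reading, the 5/8 example would correspond to 5 occurrences of the pattern of interest among 8 same-size patterns that contain a gold farmer.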

Page 264: Jaideep

Frequent Itemset Mining for Frequent Hyper-subgraphs

o Confidence of a Hyper-Subgraph: Given a sub-hypergraph of

size k, subP is the pattern of interest containing the label P, subG

is a pattern which is structurally equivalent but which does not

contain the label P, the confidence is defined as follows:

Confidence of the pattern containing a gold farmer = 5/7

Page 265: Jaideep

Frequent Patterns of GFs

• Less than 0.1 support and confidence for almost all

(except 8) frequent patterns with gold farmers

• Remaining 8 patterns can be used for discrimination

between gold farmers and non-gold farmers

• Gold farmers & affiliates are more connected: A third of

more complex patterns (k >= 10 nodes) are associated

with affiliates (15/44)

Page 266: Jaideep

Conclusion and contributions

Using hypergraphs to represent

complex data structures and

dependencies

Application of frequent pattern mining to

discover distinct trust patterns associated

with gold farmers

No honor among thieves:

Gold farmers tend not to trust other

gold farmers

Page 267: Jaideep

Implications

Social organization and behavioral

patterns of clandestine activity as

co-evolutionary outcomes

Using online behavioral patterns to inform

and develop metrics/algorithms for

detecting offline clandestine activity

Clandestine networks as “dual use”

technologies – ethical and legal

implications of improving detection?

[Keegan, Ahmad, et al. 2011]

Page 268: Jaideep

Limitations and future work

• Housing/trust ties mediated by other or multiplex

relationships

• Communication, grouping, mentoring, trading, etc.

• Multiple types of deviance and deviants: Modeling

role specialization & division of labor

• Using frequent subgraphs patterns as

discriminating features for ML models

• Changes in frequent subgraphs over time

Page 269: Jaideep

Contraband: Online and Offline

o Contraband consists of illegally obtained items that constitute a parallel or shadow economy which evades regulation or taxation.

o Governments try to interrupt such exchanges especially

when they involve dangerous items like weapons and

drugs

o Extremely difficult to obtain data about contraband. Thus

analysis is limited

o Online analogues of such behaviors offer the possibility

of analyzing such behaviors and closing this gap

Page 270: Jaideep

Research Questions

o What do the trade networks of gold farmers & normal players look like?

o Do gold farmers exhibit distinctive behavioral patterns for buying and selling items?

o Can we use contraband networks to catch gold farmers?

o What are the characteristics of contraband networks in MMOs?

Page 271: Jaideep

Consignment Trade in EverQuest II

o Gold farmer trading activity is initially a significant fraction of all trading activity and then declines significantly

o Possible explanations: (i) a real decline in gold farming activity (ii) gold farmers changed their strategies to evade detection

Page 272: Jaideep

Consignment Trade in EverQuest II

Page 273: Jaideep

Gold Farmer Trading Items

• Even though the trade volume of gold farmers decreases over

time, the total number of unique items traded by them

remains constant

• This implies that there is a subset of items that gold farmers

are interested in buying and selling

• Specialized items have a low trade volume

Page 274: Jaideep

Frequent Pattern Mining for Contraband and Trade Network

o Main Idea: There are certain behaviors and/or activities that are associated more with gold farmers than with other players

o Once identified, these can be used as feature sets to build models to classify people as deviant vs. non-deviant

o We analyze the social networks as well as the item-usage networks of gold farmers

Page 275: Jaideep

Item-Projection Networks and FPM Framework

o Two-mode network of players and items sold

o Projection network of items: An edge is created between two items if they have been traded by the same person

o Which items are traded by gold farmers as compared to others?
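
The item projection described above can be sketched with the standard library alone. The sales records and the function `project_items` are hypothetical; the edge weight (number of players who traded both items) is an added convenience that the slide does not mention.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical two-mode data: (player, item sold) pairs.
sales = [("X", "Item A"), ("X", "Item B"),
         ("Y", "Item B"), ("Y", "Item C"),
         ("Z", "Item A")]

def project_items(sales):
    """Project the player-item network onto items: two items share an edge
    if at least one player traded both; the weight counts such players."""
    items_by_player = defaultdict(set)
    for player, item in sales:
        items_by_player[player].add(item)
    edges = defaultdict(int)
    for items in items_by_player.values():
        for a, b in combinations(sorted(items), 2):
            edges[(a, b)] += 1
    return dict(edges)

item_edges = project_items(sales)
print(item_edges)  # {('Item A', 'Item B'): 1, ('Item B', 'Item C'): 1}
```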

[Figure: Player X linked to Item A and Item B; in the item projection, Item A and Item B share an edge]

Page 276: Jaideep

Frequent Items Associated with Gold Farmers

Page 277: Jaideep

Characteristics of Items associated with Gold Farmers

o With normal players, a range of item types is associated with buying and selling activity

o Surprisingly not only are certain items associated with gold farmer

selling activity but also buying activity

o The items that gold farmers buy are usually low end items (i) Used for

crafting (ii) Cornering the market?

o The items that are sold by gold

farmers are usually high end items

and in many cases almost

exclusively sold by them

Page 278: Jaideep

Construction of Item-Networks

o Apply standard frequent-pattern mining techniques (Apriori, FP-Tree) to determine frequently traded items

o For all the items which occur frequently, create an edge between them

o For items which are sold in different transactions by the same people, also form an edge between them
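
The three construction steps above might look roughly like this. It is a simplified sketch under stated assumptions: plain single-item counting with a `min_count` threshold stands in for Apriori or FP-Tree, the transaction records are hypothetical, and restricting the same-seller edges to frequent items is one interpretation of the slide.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical consignment transactions: (seller, items in that transaction).
transactions = [
    ("s1", {"ore", "gem"}),
    ("s2", {"ore", "gem"}),
    ("s1", {"potion"}),
    ("s3", {"potion", "cloth"}),
    ("s3", {"ore"}),
]

def build_item_network(transactions, min_count=2):
    """Return {(item_a, item_b): reason} for edges among frequent items."""
    # Step 1: frequent items (simple counting, a stand-in for Apriori/FP-Tree).
    item_counts = Counter(item for _, items in transactions for item in items)
    frequent = {item for item, n in item_counts.items() if n >= min_count}

    edges = {}
    items_by_seller = defaultdict(set)
    # Step 2: edges between frequent items that occur in the same transaction.
    for seller, items in transactions:
        kept = sorted(items & frequent)
        items_by_seller[seller].update(kept)
        for pair in combinations(kept, 2):
            edges[pair] = "same transaction"
    # Step 3: edges between frequent items sold by the same seller
    # in different transactions.
    for items in items_by_seller.values():
        for pair in combinations(sorted(items), 2):
            edges.setdefault(pair, "same seller")
    return edges

item_network = build_item_network(transactions)
```

In this toy data "cloth" is traded only once, so it is dropped before any edges are formed.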

Page 279: Jaideep

Examples of Extracted Item Network

Page 280: Jaideep

Frequent Subgraphs as features

o Main Idea: In addition to the features based on player characteristics, use sub-graphs as features

Page 281: Jaideep

Prediction Models

Model 1 (Player Attribute Based Features): These features are based on the attributes of the player's character in the game, e.g., character race, character gender, distribution of gaming activities etc. These are the same features which were used by Ahmad et al. [SocCom'09].

Model 2 (Item Based Features): These are the features which are derived from items bought and sold

from the consignment network. These features are based on the frequency of the frequent items sold or

bought by gold farmers.

Model 3 (Player Attribute & Item Based Features): All the attributes from the previous two models.

Model 4 (Item Network Based Features): Features which are derived from the item network in a

manner analogous to Model 2.

Model 5 (Player Attribute & Item-Network Based Features): A combination of features from Model 1

and Model 4.

Model 6 (Item & Item-Network Based Features): A combination of features from Model 2 and Model 4.

Model 7 (Player Attribute, Item & Item-Network Based Features): Union of all the features described

above.
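
The seven models are exactly the non-empty combinations of three feature groups. A small sketch makes this explicit; the group keys and the function name are illustrative, and the enumeration order differs from the slide's model numbering.

```python
from itertools import combinations

# Illustrative keys for the three feature groups named in the model descriptions.
groups = ["player_attribute", "item", "item_network"]

def feature_set_combinations(groups):
    """All non-empty combinations of feature groups, smallest first."""
    combos = []
    for size in range(1, len(groups) + 1):
        combos.extend(combinations(groups, size))
    return combos

model_feature_sets = feature_set_combinations(groups)
print(len(model_feature_sets))  # 7
```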

Page 282: Jaideep

Results

A set of standard machine learning models is used: Naive Bayes, Bayes Net, Logistic Regression, KNN, J48, JRip, AdaBoost and SMO

Page 283: Jaideep

Conclusion

o Described a phenomenon analogous to contraband in the offline world

o Analysis of gold farmer item networks reveals that they exhibit characteristics which are different from those of normal players

o Items that gold farmers buy are usually low end items and items that they sell

are often high end items

o One can use frequent items and networks

of such items associated with gold farmers to build classifiers which can be

used to catch gold farmers

Page 284: Jaideep

Future work

Association rule learning for temporal patterns: do bursts of activity predict gold farming?

Statistical modeling of sudden changes and system

responses in network over time

Expanding hypergraph approaches to representing

complex relationships

Comparative analysis against EVE Online, other games

with similar “gold farmers get banned” regimes

Page 285: Jaideep

Some ethical quandaries

Euphemisms abound: “removing” links and nodes

Should scholars be engaged in “destructive science”?

Clandestine network analysis as dual use technology

“Terrorist vs. freedom-fighter”

Used for good (MENA revolutions), evil (al-Qaeda), unclear (Wikileaks)

Legal dimensions of information theory and methods

Different assumptions in model & methods foreground different suspects

Minimize false positives or false negatives? Maximize true positives or true negatives?

Page 286: Jaideep

Legal questions

If moral, ethical, and legal boundaries that constrain behavior are indistinct or unenforceable in a virtual world, who defines regulations to define and curb excesses?

Some normative rules hard-wired into code, others permitted by code [Lastowka & Hunter 2006; Grimmelmann 2006; Ondrejka 2006]

From false positives to due process

Responsibility to disclose proprietary methodological approaches?

Heightened or different burden of proof given superabundance of data?

Expectations of privacy?

Demonstrating intent?

Right to representation and due process?

Page 287: Jaideep

Katana:

A Game Analytics Engine

Page 288: Jaideep
Page 289: Jaideep

Analytics Architecture

[Figure: architecture diagram. Labels: MMOG/VW; Game Knowledge; Game logs @ partner; Data Piped from partner; 3rd Party Custom Data/Acxiom/TRW; Free Websites; Analysis Engine; HADOOP – Cloudera Release; RDB - MSSQL Server; MMOG Vendor; Actionable Insights]

Page 290: Jaideep

Analytics Pipeline

[Figure: pipeline diagram, Scheduled Pipeline Flow. Recoverable labels are listed below.]

• Game Data; Sony; CR3; Data Modeling; Data Warehouse; Data Mart; Domain Knowledge

• Data cleaning, data transformation, normalization, loading

• Hadoop Cluster: 4 units of Dell PowerEdge R510 (rack server), each with 14 TB hard drive, 12 GB memory, 2 CPU x 8 parallelization = 16 CPUs

• MS SQL Server 2008: Dell Precision 1500 (workstation), 1.5 TB hard drive, 4 GB memory, 2 dual core CPUs

• Analytics Engines: standalone Java applications; Churn Analysis; Gold Farming; Network Value; Terabytes

• Alienware workstations: 2 TB hard drive, 8 GB memory, 2 dual core CPUs

• Katana Analytics Engine (UI)

• Java applet embedded in web browser; stand-alone deployable Java applet

Page 291: Jaideep
Page 292: Jaideep
Page 293: Jaideep
Page 294: Jaideep
Page 295: Jaideep
Page 296: Jaideep

Google: Aspiring to know all

aspects of your social life

Page 297: Jaideep

Social Networks

Social Multimedia

Networks

Text-based Conversation

Networks

Page 298: Jaideep

Data Collected, Benefits, Impact

Data collected

Activity logs: Clicks, page visits, chats, friends, uploads,

downloads, tags, comments, responses, joining and leaving of

communities, people friended, people blocked, ads clicked on,

ads ignored, frequency of logging in.

Network Logs: What are people's activities when I tag, comment, respond, stay idle, post, recommend and ignore?

Basically: 'Too much data about you'

Benefits

Advertising: Easiest way to spread an infection by tapping key

nodes in Google's network

Sentiment Analysis: Detect how your product or service is

doing

Etc., etc., etc., etc., basically too many to list

Impact

Tremendous

Also very scary!

Page 299: Jaideep

Facebook: The operating

system for your social life,

and more

Page 300: Jaideep

Facebook

> 950 million users for Facebook

>14% of humanity!!

>35% of all who have computers!!!

> 50% of Facebook users log on every day

(http://www.facebook.com/press/info.php?statistics)

spending an average of 14 minutes per day

(http://mashable.com/2010/02/16/facebook-nielsen-stats/)

Ultimate social data, no end to what can be done with it

No wonder FB is being pegged at $100+ billion IPO

They know everything

So even if Google doesn't scare you, Facebook should!

Page 301: Jaideep

Impact of new instrumentation on science

1950s

Invention of the electron microscope fundamentally changed chemistry from 'playing with colored liquids in a lab' to 'truly understanding what's going on'

1970s

Invention of gene sequencing fundamentally changed biology from

a qualitative field to a quantitative field

1990s

Deployment of the Hubble (and other) Space telescopes has had

fundamental impact on astronomy and astrophysics

2000s

Massive adoption is fundamentally changing social science

research

Massively Multiplayer Online Games (MMOGs) and Virtual Worlds (VWs) are acting as 'macroscopes of human behavior'

Page 302: Jaideep

Some leading edge US research

programs we are participating in

Biggest

US Army's Network Science Collaborative Technology Alliance

Led by BBN, with over 25 institutions and 150 researchers

Approximately $200 million over a 10 year period (2009 – 2019)

http://www.ns-cta.org/ns-cta-blog/

A number of DARPA programs

Social Media in Strategic Communication (SMISC)

Started in November 2011

http://www.darpa.mil/Our_Work/I2O/Programs/Social_Media_in_Strategic_Communication_(SMISC).aspx

Graph-Theoretic Research in Algorithms and PHenomenology of Social

Networks (GRAPHS)

Started in March 2012

http://www.darpa.mil/Our_Work/DSO/Programs/Graph-theoretic_Research_in_Algorithms_and_the_Phenomenology_of_Social_Networks_%28GRAPHS%29.aspx

Page 303: Jaideep

Summary – The Big Picture

Converging trends

Rapid increase in the usage of the Internet/Web

increased amount of interactions on line

huge amount of socialization on line

Increase in resolution and deployment of data collection 'probes', e.g. GPS, cell phone/PDA, wireless enabled laptop, RFID tags, …

increased ability to monitor and record interactions at a really fine

granularity

Dramatic increase in storage capacity and decrease in storage costs

feasible to store all the data collected

Fundamental advances in computational methods for data analytics

Becoming possible to really understand individual and group

behavior at a fine granularity

Great opportunities for

Basic R&D

Applied R&D

Entrepreneurship

But, putting together the right team and partnerships is critical!

Page 304: Jaideep

and last,

but certainly not the least

- thank you for your invitation