Jure Leskovec (Stanford), Daniel Huttenlocher and Jon Kleinberg (Cornell)

Rich social structure in online computing applications
- Such structures are modeled by networks
- Most social network analyses view links as positive: friends, fans, followers
- But more generally, links can convey either friendship or antagonism

Our plan
- Study social interactions on the Web that have positive and negative relationships
- Question: How do edge signs and network structure interact?
- Approach: Edge sign prediction problem
  - Given a network and signs on all but one edge, predict the missing sign
- Applications: Friend recommendation for social media
  - Easier to predict whether you know someone than to predict what you think of them

Each link A → B is explicitly tagged with a sign:
- Epinions (Trust/Distrust): Does A trust B's product reviews? (Only positive links are visible.)
- Wikipedia (Support/Oppose): Does A support B to become a Wikipedia administrator?
- Slashdot (Friend/Foe): Does A like B's comments?
- Other examples: World of Warcraft [Szell et al. 2010]

Edge signs can be predicted with ~90% accuracy using only the local network structure
- No need for global trust-propagation mechanisms

Data-oriented justification of classical theories from social psychology
- Our models align with the theories of Balance and Status

Near-perfect generalization
- Same underlying mechanism of signed edge creation
- Can train on how people vote and predict trust as well as the model trained on trust itself

Machine learning formulation
- Predict the sign of edge (u,v)
- Class label: +1 for a positive edge, -1 for a negative edge
- Learning method: logistic regression
- Evaluation: accuracy and ROC curves
- Dataset: original (80% positive edges) and balanced (50% positive edges)
- Features for learning: described next
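The balanced dataset mentioned in the formulation above can be illustrated with a short sketch. The slides do not say how the balanced set was built, so downsampling the majority sign is an assumption here, and the helper names `balance_edges` and `accuracy` are hypothetical. The toy example shows why a balanced set matters: on the original 80% positive data, always predicting + already scores 80% accuracy.

```python
import random

def balance_edges(signed_edges, seed=0):
    """Downsample the majority sign so both signs are equally frequent.

    signed_edges: list of (u, v, sign) tuples with sign in {+1, -1}.
    Assumption: balancing is done by random downsampling.
    """
    rng = random.Random(seed)
    pos = [e for e in signed_edges if e[2] == +1]
    neg = [e for e in signed_edges if e[2] == -1]
    n = min(len(pos), len(neg))
    sample = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(sample)
    return sample

def accuracy(true_signs, predicted_signs):
    """Fraction of edges whose predicted sign equals the true sign."""
    correct = sum(t == p for t, p in zip(true_signs, predicted_signs))
    return correct / len(true_signs)

# Toy network: 8 positive and 2 negative edges (80% positive).
edges = [(i, i + 1, +1) for i in range(8)] + [(10, 11, -1), (12, 13, -1)]
balanced = balance_edges(edges)

# Baseline that ignores the network and always predicts a positive edge.
always_plus_original = accuracy([s for _, _, s in edges], [+1] * len(edges))
always_plus_balanced = accuracy([s for _, _, s in balanced], [+1] * len(balanced))
```

On the balanced set the trivial baseline drops to 50%, so any accuracy above that reflects information in the features.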
For each edge (u,v) create features:
- Triad counts (16 features): counts of each type of signed triad that the edge u → v takes part in
- Node degree (7 features):
  - Signed degree: $d^{+}_{out}(u)$, $d^{-}_{out}(u)$, $d^{+}_{in}(v)$, $d^{-}_{in}(v)$
  - Total degree: $d_{out}(u)$, $d_{in}(v)$
  - Embeddedness of edge (u,v), i.e. the number of common neighbors of u and v

Error rates:
- Epinions: 6.5%
- Slashdot: 6.6%
- Wikipedia: 19%
- Signs can be modeled from local network structure alone
  - The trust propagation model of [Guha et al. 04] has 14% error on Epinions
- Triad features perform less well for less embedded edges
- Wikipedia is harder to model: votes are publicly visible
[Figure: panels for Epinions, Slashdot, Wikipedia]

Our goal is not just to predict signs but also to derive insights into the usage of signed edges
- Logistic regression learns a weight $b_i$ for each feature $x_i$:
  $P(+ \mid x) = \frac{1}{1 + e^{-(b_0 + \sum_i b_i x_i)}}$
- Connection to theories from social psychology: structural balance and the theory of status, which both give predictions on the sign of the edge (u,v) based on the triad it is embedded in

Structural balance
- Consider edges as undirected
- Start with the intuition of [Heider 46]:
  - Friend of my friend is my friend
  - Enemy of my enemy is my friend
  - Enemy of my friend is my enemy
- Look at connected triples of nodes that are consistent with this logic

Status theory [Davis-Leinhardt 68, Guha et al. 04, Leskovec et al. 10]
- A positive link u → v means: v has higher status than u
- A negative link u → v means: v has lower status than u
- Based on the signs and directions of links from/to node x, make a prediction

Status and balance can make different predictions
[Figure: example triads on nodes u, v, x with the signs predicted by Balance, Status, and logistic regression (LogReg)]

Both theories agree well with learned models. Further observations:
- Backward-backward triads have smaller weights than forward and mixed-direction triads
- Balance is in better agreement with Epinions and Slashdot, while Status is in better agreement with Wikipedia
- Balance consistently disagrees with "enemy of my enemy is my friend"

Balance-based and learned coefficients
[Table: coefficient per feature (including const) under Balance theory and as learned on Epinions, Slashdot, Wikipedia]
- Balance theory column: the model that would result if signs were created purely based on Balance theory

Status-based and learned coefficients
[Table: coefficient per feature (including const) under Status theory and as learned on Epinions, Slashdot, Wikipedia; triads grouped by the relative status of u, x, v, e.g. u < x < v, u > x > v, u > x < v]
- Status theory column: the model that would result if signs were created purely based on Status theory

Deterministic models compare well to learned models
- Epinions and Slashdot: more embedded edges are easier to predict
- Wikipedia: Status outperforms Balance
- Learned balance performs nearly as well as the full model
[Figure: panels for Epinions, Slashdot, Wikipedia]

Do people use these very different linking systems by obeying the same principles?
- How generalizable are the results across the datasets?
- Train on the row dataset, predict on the column dataset
- Almost perfect generalization of the models, even though the networks come from very different applications

Suppose we are only interested in predicting whether there is a trust edge or no edge
- Does knowing negative edges help? YES!
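The 16 triad features and the per-triad predictions of Balance and Status discussed earlier can be made concrete with a small sketch. The dictionary representation of the network and the names `triad_counts`, `balance_vote`, and `status_vote` are hypothetical choices for illustration. Combining the two edges of a triad by summing unit status steps is an assumption about how Status is applied, not a statement of the authors' exact procedure.

```python
from collections import Counter

def triad_counts(sign, u, v):
    """Count the signed, directed triad types that edge (u, v) takes part in.

    sign maps a directed edge (a, b) to +1 or -1. A triad type is a pair
    (u-w edge, w-v edge); each edge is (direction, sign), where 'F' points
    along u -> w -> v and 'B' points against it. 2 x 2 x 2 x 2 = 16 types.
    """
    nodes = {n for edge in sign for n in edge}
    counts = Counter()
    for w in nodes - {u, v}:
        # Edges between u and w: 'F' is u -> w, 'B' is w -> u.
        first = [(d, sign[e]) for d, e in (("F", (u, w)), ("B", (w, u))) if e in sign]
        # Edges between w and v: 'F' is w -> v, 'B' is v -> w.
        second = [(d, sign[e]) for d, e in (("F", (w, v)), ("B", (v, w))) if e in sign]
        for a in first:
            for b in second:
                counts[(a, b)] += 1
    return counts

def balance_vote(triad):
    """Balance ignores direction: the predicted sign is the product of signs."""
    (_, s1), (_, s2) = triad
    return s1 * s2

def status_vote(triad):
    """Status: sum the status steps from u to w and from w to v.

    A positive edge a -> b raises status by one step from a to b,
    a negative edge lowers it. Returns +1, -1, or 0 (no prediction).
    """
    (d1, s1), (d2, s2) = triad
    gain = (s1 if d1 == "F" else -s1) + (s2 if d2 == "F" else -s2)
    return (gain > 0) - (gain < 0)

# Toy network around the unknown edge (u, v): one all-positive path via x
# and one all-negative path via y.
sign = {("u", "x"): +1, ("x", "v"): +1, ("u", "y"): -1, ("y", "v"): -1}
counts = triad_counts(sign, "u", "v")
```

For the all-negative path through y, `balance_vote` returns +1 (enemy of my enemy is my friend) while `status_vote` returns -1, which is one concrete case where the two theories make different predictions.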
Both theories make predictions about the global structure of the network
- Structural balance: factions
  - Put nodes into groups such that the number of in-group + edges and between-group - edges is maximized
- Status theory: global status
  - Flip the direction and sign of negative edges
  - Assign each node a unique status value so that most edges point from low to high

What fraction of the edges of the network satisfy Balance and Status?
- No evidence for global balance beyond the random baselines
  - Real data is 80% consistent vs. 80% consistency under the random baseline
- Evidence for global status beyond the random baselines
  - Real data is 80% consistent, but only 50% consistency under the random baseline

Signed networks provide insight into how social computing systems are used: Status vs. Balance
- The sign of a relationship can be reliably predicted from the local network context
  - ~90% accuracy in predicting the sign of an edge
- More evidence that networks are globally organized based on status
- People use signed edges consistently regardless of the particular application
  - Near-perfect generalization of models across datasets
- Negative information helps in predicting positive edges

Heuristic predictors for (u,v):
- Balance: choose the sign that makes the majority of the triads balanced
- Status: predict based on $\mathrm{status}(v) = d^{+}_{in}(u) + d^{-}_{out}(u) - d^{+}_{out}(v) - d^{-}_{in}(v)$
- Out-sign of v: majority sign
- In-sign of u: majority sign
- Observation: triadic models do better with increasing embeddedness
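The global consistency measures described above (the fraction of edges that satisfy Balance or Status) can be sketched for a given assignment of nodes to factions or status values. Finding the best assignment is a separate optimization problem that this sketch does not attempt. The function names and the toy network are hypothetical.

```python
def balance_consistency(signed_edges, group):
    """Fraction of edges that are + inside a group or - between groups.

    signed_edges: list of (u, v, sign); group: dict node -> faction label.
    """
    ok = sum((group[u] == group[v]) == (s > 0) for u, v, s in signed_edges)
    return ok / len(signed_edges)

def status_consistency(signed_edges, status):
    """Fraction of edges pointing from lower to higher status.

    Negative edges are first flipped in direction (and treated as positive),
    as described for the global status model. status: dict node -> value.
    """
    ok = 0
    for u, v, s in signed_edges:
        low, high = (u, v) if s > 0 else (v, u)
        ok += status[low] < status[high]
    return ok / len(signed_edges)

# Toy network with four signed, directed edges.
edges = [("a", "b", +1), ("b", "c", +1), ("c", "a", -1), ("c", "b", +1)]

# Hypothetical assignments chosen by hand for illustration.
factions = {"a": 0, "b": 0, "c": 1}
ranks = {"a": 0, "b": 1, "c": 2}

balance_fraction = balance_consistency(edges, factions)
status_fraction = status_consistency(edges, ranks)
```

Comparing such fractions against the same quantity on a network with randomly shuffled signs gives the kind of random baseline the observations refer to.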
