machine learning at peerindex

Machine Learning at PeerIndex Ferenc Huszár @fhuszar Wednesday, 16 May 12

Uploaded by: ferenc-huszar

Posted on 30-Oct-2014

DESCRIPTION

Slides for a talk given at the London Machine Learning Meetup on 29 Feb about the machine learning behind measuring people's influence at PeerIndex.

TRANSCRIPT

Page 1: Machine Learning at PeerIndex

Machine Learning at PeerIndex

Ferenc Huszár

@fhuszar

Wednesday, 16 May 12

PeerIndex.com: understand your influence

PeerPerks.com: free stuff for influencers

PeerPerks: free stuff for influencers

Machine Learning @ PeerIndex

• The usual stuff

  • topic modelling/classification of tweets/statuses/URLs

  • identity resolution across Twitter, Facebook, LinkedIn

  • spambot/fraud detection: identify people gaming the system

  • sentiment classification: happy/sad/neutral

• The really exciting stuff

  • inferring networks of influence (more about this later)

  • visualising different aspects of influence in an engaging way

  • influence maximisation via submodular optimisation

Inferring networks of influence

[Figure panels: Social network; Propagation probabilities $p_{i,j}$; Information cascade logs, one per shared URL (e.g. http://www.pcworld.com/article/239719, http://techcrunch.com/2011/11/21/...), each listing user IDs and share timestamps such as 1079306 2011-08-25T00:03:06+01:00.]
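Each cascade log is a list of user IDs with the time each user shared the item. A minimal sketch of turning one log into the ordered cascade $u_1, \ldots, u_n$, assuming one "user_id timestamp" entry per line in ISO 8601 format; parse_cascade is a hypothetical helper, and the sample rows are the 2011-08-25 entries shown on the slide.

```python
from datetime import datetime

def parse_cascade(lines):
    """Return the user IDs in a cascade log, ordered by share time (u_1 first)."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, timestamp = line.split()
        # fromisoformat (Python 3.7+) keeps the +01:00 offset, so times are tz-aware
        entries.append((datetime.fromisoformat(timestamp), int(user_id)))
    entries.sort()  # earliest share first; ties broken by user ID
    return [user_id for _, user_id in entries]

LOG = """
1079306 2011-08-25T00:03:06+01:00
4549198 2011-08-25T04:32:25+01:00
2662975 2011-08-25T00:35:11+01:00
2333224 2011-08-25T01:43:18+01:00
3141371 2011-08-25T01:52:06+01:00
3482720 2011-08-25T07:18:24+01:00
1403682 2011-08-25T03:52:58+01:00
4679657 2011-08-25T01:07:48+01:00
32460 2011-08-25T01:11:43+01:00
"""

cascade = parse_cascade(LOG.splitlines())
```

The log rows are not in time order, so sorting is what recovers who shared first.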

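Heuristic approaches to estimate $p_{i,j}$

• purely based on local network structure: $p_{i,j} = \frac{1}{d_{\mathrm{in}}(j)}$, where $d_{\mathrm{in}}(j)$ is the in-degree of $j$

• trivalency “model”, my personal favourite :) : $p_{i,j}$ drawn at random from $\{0.1, 0.01, 0.001\}$

• data-driven heuristics: $p_{i,j} = \dfrac{\text{number of items shared by } j \text{ after } i \text{ shared it}}{\text{number of items shared by } i}$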
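The three heuristics are simple enough to write down directly. A minimal sketch, assuming an edge (i, j) means i can influence j and that shares[user][item] holds the time at which user shared item; the function names and toy data are hypothetical.

```python
import random
from collections import defaultdict

def structural_heuristic(edges):
    """p_ij = 1 / d_in(j): every edge into j gets an equal share."""
    d_in = defaultdict(int)
    for _, j in edges:
        d_in[j] += 1
    return {(i, j): 1.0 / d_in[j] for i, j in edges}

def trivalency_heuristic(edges, seed=0):
    """p_ij drawn uniformly at random from {0.1, 0.01, 0.001}."""
    rng = random.Random(seed)
    return {(i, j): rng.choice((0.1, 0.01, 0.001)) for i, j in edges}

def data_driven_heuristic(edges, shares):
    """p_ij = #items j shared after i shared them / #items shared by i."""
    p = {}
    for i, j in edges:
        items_i = shares.get(i, {})
        items_j = shares.get(j, {})
        followed = sum(
            1 for item, t_i in items_i.items()
            if item in items_j and items_j[item] > t_i
        )
        p[(i, j)] = followed / len(items_i) if items_i else 0.0
    return p

edges = [("a", "c"), ("b", "c"), ("a", "b")]
shares = {"a": {"x": 1, "y": 2}, "b": {"x": 3}, "c": {"x": 0, "y": 5}}
p_struct = structural_heuristic(edges)
p_tri = trivalency_heuristic(edges)
p_data = data_driven_heuristic(edges, shares)
```

In the toy data, c shared x before a did, so only y counts towards $p_{a,c}$, giving 1/2.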

How do you solve this with machine learning?

The likelihood

[Figure: cascade log D for http://www.pcworld.com/article/239719 and the propagation probabilities $p_{i,j}$.]

$P(D \mid \theta)$, $\theta = \{p_{i,j}\}$: what’s the probability of the cascade $u_1, u_2, u_3, \ldots, u_n$?

For subsequent users in the cascade (node 0 is the source of the item):

$$P(D \mid \theta) = p_{0,u_1}\,\bigl(1 - (1 - p_{0,u_2})(1 - p_{u_1,u_2})\bigr) \cdots$$

The likelihood

$P(D \mid \theta)$: what’s the probability of the cascade $u_1, u_2, u_3, \ldots, u_n$?

$$P(D \mid \theta) = \prod_{i=1}^{n} \Bigl(1 - \prod_{j=0}^{i-1} (1 - p_{u_j,u_i})\Bigr) \prod_{u \notin \{u_1,\ldots,u_n\}} \; \prod_{v \in \{u_0, u_1,\ldots,u_n\}} (1 - p_{v,u})$$

The first product is for subsequent users in the cascade, each reached by at least one earlier user; the second is for users that are not in the cascade, whom nobody in it reached. Here $u_0 \equiv 0$ is the source of the item, so the $i = 1$ factor is $p_{0,u_1}$.
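The likelihood can be evaluated in a few lines. A minimal sketch of the log-likelihood of one cascade, assuming p is a dict keyed by (v, u) for "v passes the item on to u", missing pairs mean probability 0, node 0 plays the role of the source $u_0$ (as in the $p_{0,u_1}$ factor), and users outside the cascade contribute $(1 - p_{v,u})$ for every cascade member v. The function name and toy numbers are hypothetical.

```python
import math

def _log(x):
    """log that returns -inf instead of raising when x == 0."""
    return math.log(x) if x > 0 else -math.inf

def cascade_log_likelihood(p, cascade, users, source=0):
    """log P(D | theta) for one cascade.

    p       -- dict (v, u) -> probability that v passes the item on to u
    cascade -- time-ordered users u_1, ..., u_n who shared the item
    users   -- every user in the network
    source  -- stand-in node u_0 for the item's origin
    """
    active = [source]  # u_0, u_1, ..., u_{i-1}
    ll = 0.0
    for u in cascade:
        # probability that every earlier user failed to reach u
        miss = 1.0
        for v in active:
            miss *= 1.0 - p.get((v, u), 0.0)
        ll += _log(1.0 - miss)  # at least one earlier user reached u
        active.append(u)
    # users outside the cascade: nobody in it reached them
    in_cascade = set(cascade)
    for u in users:
        if u == source or u in in_cascade:
            continue
        for v in active:
            ll += _log(1.0 - p.get((v, u), 0.0))
    return ll

p = {(0, "a"): 0.5, (0, "b"): 0.2, ("a", "b"): 0.5, (0, "c"): 0.1, ("b", "c"): 0.5}
ll = cascade_log_likelihood(p, ["a", "b"], ["a", "b", "c"])
```

For the toy cascade a, b with c left out, the three factors are 0.5, 1 - 0.8 * 0.5 = 0.6 and 0.9 * 0.5 = 0.45, so the likelihood is 0.135. Working in log space avoids underflow when many non-cascade users are multiplied in.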

Maximum likelihood at scale

• data too sparse to learn one parameter per edge

• large-scale gradient-based optimisation is costly

• Solution: combine an ensemble of heuristics with ML

  • use the heuristics to compute probabilities at scale

  • use ML to tune their parameters on small-scale data
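One way to read "heuristics at scale, ML on small data" is to fix a parametrised combination of heuristic tables and fit only its few parameters by maximum likelihood on a small sample of cascades. The sketch below is a hypothetical illustration, not the method used in the talk: it mixes two heuristic tables with a single weight alpha and picks alpha by grid search, reusing the cascade likelihood with node 0 as the source.

```python
import math

def _log(x):
    return math.log(x) if x > 0 else -math.inf

def log_likelihood(p, cascade, users, source=0):
    """Cascade log-likelihood with u_0 = source; missing pairs in p mean 0."""
    active, ll = [source], 0.0
    for u in cascade:
        miss = math.prod(1.0 - p.get((v, u), 0.0) for v in active)
        ll += _log(1.0 - miss)
        active.append(u)
    in_cascade = set(cascade)
    for u in users:
        if u != source and u not in in_cascade:
            ll += sum(_log(1.0 - p.get((v, u), 0.0)) for v in active)
    return ll

def mix(h1, h2, alpha):
    """Hypothetical ensemble: convex combination of two heuristic tables."""
    keys = set(h1) | set(h2)
    return {k: alpha * h1.get(k, 0.0) + (1 - alpha) * h2.get(k, 0.0) for k in keys}

def tune_alpha(h1, h2, cascades, users, grid=21):
    """Grid-search the mixing weight that maximises the sample likelihood."""
    best_alpha, best_ll = 0.0, -math.inf
    for step in range(grid):
        alpha = step / (grid - 1)
        p = mix(h1, h2, alpha)
        ll = sum(log_likelihood(p, c, users) for c in cascades)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha

h1 = {(0, "a"): 0.5, ("a", "b"): 0.9}  # e.g. a structural heuristic
h2 = {(0, "a"): 0.5, ("a", "b"): 0.1}  # e.g. a data-driven heuristic
alpha = tune_alpha(h1, h2, cascades=[["a", "b"], ["a"]], users=["a", "b"])
```

In the toy sample b follows a once out of two cascades, so the best mixture puts $p_{a,b}$ at 0.5, which happens at alpha = 0.5. Only the mixing weight is learned; the per-edge tables still come from cheap heuristics.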

Influence maximisation

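• Select a set of users to maximise outreach

• Influence of people combines non-linearly

• In many models it combines submodularly:

$A \subseteq B \implies f(A \cup \{x\}) - f(A) \ge f(B \cup \{x\}) - f(B)$

• these functions are fun to optimise: for monotone submodular $f$, greedily adding the user with the largest marginal gain gets within a factor $1 - 1/e$ of the best set of the same size

• submodularity pops up many times in machine learning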
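A minimal sketch of greedy influence maximisation, assuming the independent cascade model with known propagation probabilities and estimating the expected number of users reached by Monte Carlo simulation; the graph format and function names are hypothetical.

```python
import random

def simulate_ic(graph, seeds, rng):
    """One independent-cascade run; returns the set of users reached.

    graph: dict u -> list of (v, p_uv); each newly reached u gets a single
    chance to pass the item to each out-neighbour v with probability p_uv.
    """
    reached = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p_uv in graph.get(u, []):
                if v not in reached and rng.random() < p_uv:
                    reached.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return reached

def expected_spread(graph, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of users reached."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs

def greedy_seed_set(graph, candidates, k, runs=1000):
    """Greedily add the candidate with the largest estimated marginal gain."""
    chosen = []
    for _ in range(k):
        base = expected_spread(graph, chosen, runs)
        best, best_gain = None, float("-inf")
        for u in candidates:
            if u in chosen:
                continue
            gain = expected_spread(graph, chosen + [u], runs) - base
            if gain > best_gain:
                best, best_gain = u, gain
        chosen.append(best)
    return chosen

graph = {"a": [("b", 1.0), ("c", 1.0)], "d": [("e", 1.0)]}
seeds = greedy_seed_set(graph, ["a", "b", "c", "d", "e"], k=2, runs=10)
```

With every probability set to 1 the spread is deterministic: the greedy first picks a (3 users reached), then d (2 more), since adding b or c to a gains nothing. On real graphs this naive greedy needs many simulations per step; lazy evaluation of marginal gains (CELF) is a common speed-up that exploits submodularity.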

Wrap up

• two lines of ‘data’ products: PeerIndex, PeerPerks

• lots of ‘standard’ machine learning tasks

• some uniquely exciting problems

  • inferring propagation probabilities

  • computing the expected number of users one reaches

  • putting all aspects together into a single number, and visualising them

  • influence maximisation

Thanks

We’re hiring ML scientists, interns and engineers...

@fhuszar
