machine learning at peerindex

Posted on 30-Oct-2014


DESCRIPTION

Slides for a talk given at the London Machine Learning Meetup on 29 Feb about the machine learning behind measuring people's influence at PeerIndex.

TRANSCRIPT

Machine Learning at PeerIndex

Ferenc Huszár

@fhuszar

Wednesday, 16 May 12

PeerIndex.com: understand your influence


PeerPerks.com: free stuff for influencers


Machine Learning @ PeerIndex

• The usual stuff
  • topic modelling/classification of tweets, statuses and URLs
  • identity resolution across Twitter, Facebook and LinkedIn
  • spambot/fraud detection: identifying people gaming the system
  • sentiment classification: happy/sad/neutral
• The really exciting stuff
  • inferring networks of influence (more about this later)
  • visualising different aspects of influence in an engaging way
  • influence maximisation via submodular optimisation

Inferring networks of influence

Social network
Propagation probabilities $p_{i,j}$

Information cascade logs, one (user, share timestamp) pair per line; URLs on the slide: http://www.pcworld.com/article/239719 and http://techcrunch.com/2011/11/21/...

1079306 2011-08-25T00:03:06+01:00
4549198 2011-08-25T04:32:25+01:00
2662975 2011-08-25T00:35:11+01:00
2333224 2011-08-25T01:43:18+01:00
3141371 2011-08-25T01:52:06+01:00
3482720 2011-08-25T07:18:24+01:00
1403682 2011-08-25T03:52:58+01:00
4679657 2011-08-25T01:07:48+01:00
32460 2011-08-25T01:11:43+01:00

259725 2011-10-24T03:32:19+01:00
76539 2011-10-24T03:32:23+01:00
1922351 2011-10-24T04:28:47+01:00
9183 2011-10-24T03:30:57+01:00
3335398 2011-10-24T03:34:01+01:00
1616885 2011-10-24T03:48:16+01:00
82198 2011-10-24T03:48:29+01:00
906390 2011-10-24T23:13:51+01:00
1051322 2011-10-24T03:40:02+01:00
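A minimal sketch of turning such a log into an ordered cascade $u_1, \ldots, u_n$, assuming each non-blank line holds a user ID and an ISO-8601 timestamp separated by whitespace, as in the nine PCWorld lines above. The helper name `parse_cascade` is hypothetical, not PeerIndex code.

```python
from datetime import datetime

def parse_cascade(lines):
    """Return [(user_id, share_time), ...] sorted by share time (hypothetical helper).

    Assumes each non-blank line holds a user ID and an ISO-8601 timestamp.
    """
    events = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        user, stamp = line.split()
        # fromisoformat accepts offsets such as "+01:00" (Python 3.7+)
        events.append((int(user), datetime.fromisoformat(stamp)))
    # The cascade u_1, ..., u_n is the list of sharers in order of share time
    events.sort(key=lambda event: event[1])
    return events

pcworld_log = """\
1079306 2011-08-25T00:03:06+01:00
4549198 2011-08-25T04:32:25+01:00
2662975 2011-08-25T00:35:11+01:00
2333224 2011-08-25T01:43:18+01:00
3141371 2011-08-25T01:52:06+01:00
3482720 2011-08-25T07:18:24+01:00
1403682 2011-08-25T03:52:58+01:00
4679657 2011-08-25T01:07:48+01:00
32460 2011-08-25T01:11:43+01:00
"""

cascade = [user for user, _ in parse_cascade(pcworld_log.splitlines())]
print(cascade)
# [1079306, 2662975, 4679657, 32460, 2333224, 3141371, 1403682, 4549198, 3482720]
```

The log lines are not stored in time order, so sorting is what recovers the cascade order the likelihood below depends on.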

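Heuristic approaches to estimate $p_{i,j}$

• purely based on local network structure: $p_{i,j} = \frac{1}{d_{\text{in}}(j)}$, where $d_{\text{in}}(j)$ is the in-degree of $j$
• the trivalency "model", my personal favourite :) : $p_{i,j}$ drawn at random from $\{0.1, 0.01, 0.001\}$
• data-driven heuristics:

$$p_{i,j} = \frac{\text{number of items shared by } j \text{ after } i \text{ shared them}}{\text{number of items shared by } i}$$

How do you solve this with machine learning?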
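The three heuristics can be sketched as follows, under an assumed data layout: `in_neighbours` maps each user $j$ to the users $i$ with an edge $i \to j$, and `cascades` maps an item to its sharers in order of share time. All names are hypothetical. For brevity the data-driven estimate counts every ordered pair within a cascade; a real system would presumably restrict it to edges of the social network.

```python
import random
from collections import defaultdict

def in_degree_estimate(in_neighbours, j):
    """Structure only: p_{i,j} = 1 / d_in(j), the same for every in-neighbour i of j."""
    return 1.0 / len(in_neighbours[j])

def trivalency_estimates(edges, seed=0):
    """Trivalency: each edge gets a probability picked at random from {0.1, 0.01, 0.001}."""
    rng = random.Random(seed)
    return {edge: rng.choice((0.1, 0.01, 0.001)) for edge in edges}

def data_driven_estimates(cascades):
    """p_{i,j} = #items shared by j after i shared them / #items shared by i.

    `cascades` maps an item to its sharers in order of share time (assumed layout).
    """
    shared_by = defaultdict(int)     # number of items shared by i
    shared_after = defaultdict(int)  # number of items shared by j after i
    for sharers in cascades.values():
        for pos, i in enumerate(sharers):
            shared_by[i] += 1
            for j in sharers[pos + 1:]:
                shared_after[(i, j)] += 1
    return {(i, j): n / shared_by[i] for (i, j), n in shared_after.items()}

cascades = {"item1": ["alice", "bob", "carol"],
            "item2": ["alice", "carol"],
            "item3": ["bob", "alice"]}
p = data_driven_estimates(cascades)
print(p[("alice", "carol")])  # carol followed alice on 2 of alice's 3 items
```

The structural and trivalency estimates need no sharing data at all, which is why they are cheap at scale but ignore how people actually behave.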

The likelihood $P(\mathcal{D} \mid \theta)$

Data $\mathcal{D}$: a cascade log such as the one for http://www.pcworld.com/article/239719 shown above. Parameters $\theta$: the propagation probabilities $p_{i,j}$.

$P(\mathcal{D} \mid \theta)$: what's the probability of the cascade $u_1, u_2, u_3, \ldots, u_n$? Writing $u_0 = 0$ for the source node, the first users contribute $p_{0,u_1}\,\big(1 - (1 - p_{0,u_2})(1 - p_{u_1,u_2})\big) \cdots$, and in general

$$P(\mathcal{D} \mid \theta) = \underbrace{\prod_{i=1}^{n}\Big(1 - \prod_{j=0}^{i-1}\big(1 - p_{u_j,u_i}\big)\Big)}_{\text{subsequent users in the cascade}} \; \underbrace{\prod_{v \notin \{u_1,\ldots,u_n\}} \; \prod_{j=0}^{n}\big(1 - p_{u_j,v}\big)}_{\text{users not in the cascade}}$$
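A minimal sketch of evaluating the logarithm of this likelihood for one cascade, assuming the probabilities are stored in a dict keyed by $(i, j)$, that pairs missing from the dict have probability 0, and that user ID 0 plays the role of the source $u_0$. Working in log space avoids underflow when many small factors are multiplied. The function name is hypothetical.

```python
import math

SOURCE = 0  # u_0, the source node in p_{0,u_1}

def cascade_log_likelihood(cascade, users, p):
    """log P(D | theta) for one cascade u_1..u_n (a sketch of the formula above).

    cascade: users in order of sharing; users: every user in the network;
    p: dict mapping (i, j) to p_{i,j}; missing pairs count as probability 0.
    """
    active = [SOURCE]
    log_lik = 0.0
    for u in cascade:
        # u_i was reached by at least one of u_0, ..., u_{i-1}
        miss = math.prod(1.0 - p.get((v, u), 0.0) for v in active)
        if miss >= 1.0:
            return float("-inf")  # nobody could have reached u
        log_lik += math.log(1.0 - miss)
        active.append(u)
    in_cascade = set(cascade)
    for v in users:
        if v in in_cascade:
            continue
        # v was reached by none of u_0, ..., u_n
        for u in active:
            p_uv = p.get((u, v), 0.0)
            if p_uv >= 1.0:
                return float("-inf")
            log_lik += math.log1p(-p_uv)
    return log_lik

p = {(0, 1): 0.5, (0, 2): 0.1, (1, 2): 0.5, (0, 3): 0.1, (1, 3): 0.2}
print(cascade_log_likelihood([1, 2], [1, 2, 3], p))  # log(0.5 * 0.55 * 0.9 * 0.8)
```

Summing this over all observed cascades gives the log-likelihood that maximum likelihood estimation would maximise over the $p_{i,j}$. Note that the second loop touches every user outside the cascade, which hints at why the next slide worries about scale.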

Maximum likelihood at scale

• the data are too sparse to learn one parameter per edge
• large-scale gradient-based optimisation is costly
• Solution: combine an ensemble of heuristics with ML
  • use the heuristics to compute probabilities at scale
  • use ML to tune their parameters on small-scale data
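One possible reading of "use ML to tune parameters on small-scale data" is sketched below, under assumptions of our own: each edge probability is a convex combination $p_{i,j} = \sum_k w_k h_k(i,j)$ of heuristic estimates $h_k$, and the weights $w_k$ are chosen by a grid search that maximises the cascade likelihood on a small data set. The combination rule, the grid search and all names are hypothetical; the slides do not say how the heuristics were actually combined.

```python
import itertools
import math

def log_likelihood(cascades, users, p):
    """Sum of cascade log-likelihoods; p(i, j) returns p_{i,j}, user 0 is the source u_0."""
    total = 0.0
    for cascade in cascades:
        active = [0]
        for u in cascade:
            # probability that neither the source nor any earlier sharer reached u
            miss = math.prod(1.0 - p(v, u) for v in active)
            if miss >= 1.0:
                return float("-inf")
            total += math.log(1.0 - miss)
            active.append(u)
        for v in users:
            if v in cascade:
                continue
            for u in active:
                if p(u, v) >= 1.0:
                    return float("-inf")
                total += math.log1p(-p(u, v))
    return total

def fit_ensemble_weights(heuristics, cascades, users, step=0.1):
    """Grid-search convex weights w_k for p_{i,j} = sum_k w_k * h_k(i, j)."""
    n = round(1 / step)
    best_w, best_ll = None, float("-inf")
    for combo in itertools.product(range(n + 1), repeat=len(heuristics)):
        if sum(combo) != n:
            continue  # only weights that sum to one
        w = [c / n for c in combo]

        def p(i, j, w=w):
            return sum(wk * h(i, j) for wk, h in zip(w, heuristics))

        ll = log_likelihood(cascades, users, p)
        if ll > best_ll:
            best_w, best_ll = w, ll
    return best_w, best_ll

# Toy usage: h_1 says every edge fires, h_2 says none does.
heuristics = [lambda i, j: 1.0, lambda i, j: 0.0]
weights, ll = fit_ensemble_weights(heuristics, cascades=[[1]], users=[1, 2])
print(weights)
```

Only a handful of weights are learned, however many edges there are, which is the point of the slide: the heuristics carry the per-edge detail, and the small-scale fit only decides how much to trust each of them.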

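Influence maximisation

• Select a set of users to maximise outreach
• The influence of people combines non-linearly
• In many models it combines submodularly, i.e. with diminishing returns: for the influence function $f$ and any user $x \notin B$,

$$A \subseteq B \implies f(A \cup \{x\}) - f(A) \ge f(B \cup \{x\}) - f(B)$$

• these functions are fun to optimise: for monotone submodular $f$, greedily adding the user with the largest marginal gain achieves at least a $(1 - 1/e)$ fraction of the optimum for a fixed number of users
• submodularity pops up many times in machine learning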
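A minimal sketch of greedy influence maximisation, assuming the independent cascade model with known $p_{u,v}$ stored as a dict of directed edges. $f(S)$ is estimated from a fixed set of sampled "live-edge" graphs (each edge kept independently with probability $p_{u,v}$); an average of reachable-set sizes is itself monotone and submodular, so the greedy guarantee applies to this estimate. All function names are hypothetical.

```python
import random

def sample_live_edge_graphs(edges, num_samples, seed=0):
    """Each sample keeps edge (u, v) independently with probability p_{u,v}."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        adj = {}
        for (u, v), p in edges.items():
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
        samples.append(adj)
    return samples

def reach(adj, seeds):
    """Set of users reachable from `seeds` in one live-edge graph (iterative DFS)."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def spread(samples, seeds):
    """Monte Carlo estimate of f(S): average number of users reached."""
    return sum(len(reach(adj, seeds)) for adj in samples) / len(samples)

def greedy_seeds(users, samples, k):
    """Add, k times, the user with the largest marginal gain f(S + x) - f(S)."""
    seeds = []
    for _ in range(k):
        base = spread(samples, seeds)
        best = max((u for u in users if u not in seeds),
                   key=lambda u: spread(samples, seeds + [u]) - base)
        seeds.append(best)
    return seeds

edges = {(1, 2): 1.0, (2, 3): 1.0, (4, 5): 1.0}
samples = sample_live_edge_graphs(edges, num_samples=10)
print(greedy_seeds([1, 2, 3, 4, 5], samples, k=2))  # [1, 4]
```

In the toy graph, user 2 reaches as many users as user 4 on its own, but after picking user 1 its marginal gain drops to zero: exactly the diminishing returns the inequality describes.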

Wrap up

• two lines of ‘data’ products: PeerIndex, PeerPerks
• lots of ‘standard’ machine learning tasks
• some uniquely exciting problems
  • inferring propagation probabilities
  • computing the expected number of users one reaches
  • putting all aspects together into a single number, and visualising it
  • influence maximisation
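The expected number of users one reaches can be made concrete under the same independent cascade assumption: it is the size of the set reachable from the seed users, averaged over all live-edge graphs weighted by their probability. The brute-force sketch below enumerates all $2^{|E|}$ such graphs, so it only illustrates the definition on toy graphs; on real networks one would estimate the same quantity by sampling. The function name `expected_reach` is hypothetical.

```python
import itertools

def expected_reach(edges, seeds):
    """Exact expected number of users reached from `seeds` under independent cascade.

    `edges` maps (u, v) to p_{u,v}. Enumerates all 2^|E| live-edge graphs,
    so it is only usable on toy graphs.
    """
    edge_list = list(edges.items())
    total = 0.0
    for live in itertools.product((False, True), repeat=len(edge_list)):
        prob = 1.0
        adj = {}
        for ((u, v), p), is_live in zip(edge_list, live):
            # probability of this particular combination of live/dead edges
            prob *= p if is_live else 1.0 - p
            if is_live:
                adj.setdefault(u, []).append(v)
        # users reachable from the seeds through live edges
        seen, stack = set(seeds), list(seeds)
        while stack:
            for v in adj.get(stack.pop(), ()):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        total += prob * len(seen)
    return total

print(expected_reach({(1, 2): 0.5, (2, 3): 0.5}, [1]))  # 1.75
```

For the chain $1 \to 2 \to 3$ the result is $1 + 0.5 + 0.25$: the seed itself, user 2 with probability $0.5$, and user 3 only if both edges fire.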


Thanks

We’re hiring ML scientists, interns and engineers...

fh@peerindex.com

@fhuszar

