Tackling toxic content - irsg.bcs.org/searchsolutions/2017/presentations/tackling toxic...

Elastic November 29th 2017 @elasticmark Tackling toxic content Mark Harwood, developer


Page 1: Tackling toxic content - irsg.bcs.orgirsg.bcs.org/SearchSolutions/2017/presentations/Tackling toxic content.pdfTackling toxic content Mark Harwood, developer. Business drivers for

Elastic

November 29th 2017

@elasticmark

Tackling toxic content

Mark Harwood, developer

Page 2:

Business drivers for tackling toxic content

Toxic content: fake news, hate speech, extremist videos

• Advertisers - withdrawing ads

• Government - fines, legal restrictions

• Consumers - reputational damage, loss of audience

• Pressure groups - public shaming

Page 3:

How?

Page 4:

Two approaches:

• Proactive

• • Root out content before it gathers an audience

• • How do your staff determine what is “toxic”?

• Reactive

• • Respond to complaints from the audience

• • Whose opinions do you trust?

Page 5:

Proactive challenge

How do we determine what is toxic?

Page 6:

Content-based analysis is hard

• Parsing is hard - content is often binary, e.g. audio or video

• Limited metadata - lack of descriptions or keywords

Page 7:

Easier to examine activity around content

• Reuse the basis of recommendation engines - people who liked X also like Y

Page 8:

Recommendations recap: MovieLens data

http://files.grouplens.org/datasets/movielens/ml-10m-README.html

Page 9:

Random samples should hold no surprises

• 17% of all people like “Forrest Gump”

• In a random sample of people, 17% of them will also like “Forrest Gump”

Dull. But in non-random samples something interesting happens...

Page 10:

Non-random sample: people who liked “Talladega Nights”

Fewer than 0.5% of all people like “Anchorman”

In the set of “Talladega-likers”, 20% of them like “Anchorman”, a huge uplift in popularity from the norm!

• Find all people who liked movie #46970

• Summarise how their movie tastes differ from everyone else
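
The two steps above can be sketched in plain Python. This is only an illustration of comparing a "foreground" group against the whole population, not the query used in the talk. The function name `taste_uplift`, the `min_foreground` cutoff and the toy `likes` data are all assumptions made up for this example.

```python
from collections import Counter

def taste_uplift(likes, seed_item, min_foreground=2):
    """Rank items by how much more popular they are among fans of seed_item.

    likes maps a user id to the set of items that user liked (hypothetical shape).
    Returns (item, foreground_rate, background_rate, uplift) tuples, highest uplift first.
    """
    total_users = len(likes)
    fans = [user for user, items in likes.items() if seed_item in items]
    if not fans:
        return []
    # Background: how many of ALL users like each item.
    background = Counter(item for items in likes.values() for item in items)
    # Foreground: how many of the seed item's fans like each other item.
    foreground = Counter(
        item for user in fans for item in likes[user] if item != seed_item
    )
    rows = []
    for item, fg_count in foreground.items():
        if fg_count < min_foreground:  # ignore items with too little evidence
            continue
        fg_rate = fg_count / len(fans)
        bg_rate = background[item] / total_users
        rows.append((item, fg_rate, bg_rate, fg_rate / bg_rate))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows

# Toy data (invented): 10 users, 4 of whom like "talladega".
likes = {
    "u1": {"talladega", "anchorman", "gump"},
    "u2": {"talladega", "anchorman"},
    "u3": {"talladega", "gump"},
    "u4": {"talladega"},
    "u5": {"gump"}, "u6": {"gump"}, "u7": {"gump"}, "u8": {"gump"},
    "u9": {"other"}, "u10": {"other"},
}
rows = taste_uplift(likes, "talladega")
for item, fg, bg, uplift in rows:
    print(f"{item}: {fg:.0%} of fans vs {bg:.0%} overall (uplift x{uplift:.2f})")
```

In the toy data "anchorman" is liked by half of the "talladega" fans but only a fifth of all users, so it ranks first, while "gump" is no more popular among fans than in general and ranks last.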

Page 11:

Proactive demo

Page 12:

Reactive challenge

Whose opinions do we trust?

Page 13:

Allow end users to report toxic content

Page 14:

BUT - some user reports, like some content, can be questionable

Page 15:

Review fraud is a thing

• Positive reviews - “shill” or “sock puppet” accounts are used to artificially inflate the reputation of sellers in a marketplace.

• Negative reviews - fake accounts or mob-rallying are used to sabotage the reputation of an innocent party.

• Tell-tale signs of collusion might include:

• • A common IP address or user agent

• • A common "hit list" of items being flagged

• • A common phrase used in feedback

• • The same time-of-day when logging requests

• • The same site join-date
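
The tell-tale signs listed on this slide can be checked pairwise. The sketch below is a minimal illustration under assumed data: the field names (`ip`, `user_agent`, `phrase`, `hour`, `join_date`, `flagged`), the function `suspicious_pairs` and the threshold of two shared signals are all hypothetical and do not come from the talk.

```python
from itertools import combinations

# Hypothetical per-account fields, one for each signal on the slide.
SIGNALS = ("ip", "user_agent", "phrase", "hour", "join_date")

def suspicious_pairs(accounts, threshold=2):
    """Return (account_a, account_b, shared_signals) for pairs with many coincidences."""
    pairs = []
    for a, b in combinations(sorted(accounts), 2):
        rec_a, rec_b = accounts[a], accounts[b]
        shared = [
            signal for signal in SIGNALS
            if rec_a.get(signal) is not None and rec_a.get(signal) == rec_b.get(signal)
        ]
        # A common "hit list": any overlap in the items each account flagged.
        if rec_a.get("flagged", set()) & rec_b.get("flagged", set()):
            shared.append("flagged")
        if len(shared) >= threshold:
            pairs.append((a, b, shared))
    return pairs

# Invented example records.
accounts = {
    "alice": {"ip": "10.0.0.1", "user_agent": "UA-1", "phrase": "great seller",
              "hour": 14, "join_date": "2017-01-05", "flagged": {"item1"}},
    "bob": {"ip": "10.0.0.9", "user_agent": "UA-2", "phrase": "total scam",
            "hour": 3, "join_date": "2017-11-01", "flagged": {"item7", "item8"}},
    "carol": {"ip": "10.0.0.9", "user_agent": "UA-2", "phrase": "total scam",
              "hour": 3, "join_date": "2017-11-01", "flagged": {"item7", "item9"}},
}
pairs = suspicious_pairs(accounts)
for a, b, shared in pairs:
    print(a, b, shared)
```

Comparing every pair is quadratic in the number of accounts, so this only suits small batches. It is meant to show the idea of counting coincidences, not a production design.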

Page 16:

Components of a fraud detection stack

• Ingest - cleansing, enriching, normalisation

• Linking - entity resolution, filtering

• Risk-scoring - graph exploration, anomaly detection, scoring

• Investigation - task lists, case management, visualisation

Outcomes

Page 17:

Bad actors make strange shapes

It is hard for identity manipulators to avoid reusing resources (IP addresses, join dates, subject lists, phrases, time). Fraudsters generate too many “coincidences”.

Use the Graph API to gather related data then raise alerts on anomalies.

See example: http://bit.ly/es_fraud
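
To illustrate why reused resources produce "strange shapes", the sketch below links accounts to the resources they touch and groups everything that ends up connected. It is plain Python using a union-find structure and does not call the Graph API. The function `linked_clusters`, the `min_accounts` cutoff and the sample events are assumptions for illustration only.

```python
from collections import defaultdict

def linked_clusters(events, min_accounts=3):
    """Group accounts that are connected through any chain of shared resources.

    events is an iterable of (account, resource) pairs, where a resource is
    something like "ip:10.0.0.9" or "phrase:total scam" (hypothetical format).
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each event is an edge between an account node and a resource node.
    for account, resource in events:
        union(("account", account), ("resource", resource))

    clusters = defaultdict(set)
    for node in list(parent):
        if node[0] == "account":
            clusters[find(node)].add(node[1])
    # Large connected groups of accounts are the anomalies worth an alert.
    return sorted(sorted(group) for group in clusters.values() if len(group) >= min_accounts)

# Invented events: u1-u3 are chained together by a shared IP and a shared phrase.
events = [
    ("u1", "ip:10.0.0.9"),
    ("u2", "ip:10.0.0.9"),
    ("u2", "phrase:total scam"),
    ("u3", "phrase:total scam"),
    ("u4", "ip:10.0.0.50"),
    ("u5", "ip:10.0.0.51"),
]
clusters = linked_clusters(events)
print(clusters)
```

Honest users mostly form tiny isolated components, so a size threshold is one simple anomaly test. A real deployment would likely also need to discount legitimately shared resources such as office IP addresses.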

Page 18:

Responding to alerts

Kibana with the Graph plugin allows investigators to examine details behind alerts.

See example: http://bit.ly/es_fraud

Page 19:

Demo