exploiting a/b testing ekoparty 2016 slides

Exploiting A/B Testing for Fun and Profit

Juan Berner

About me

● @89berner

● Security researcher

● Developer

● SpaceX Fan

Why am I here?

Growing trend ignored by security teams and assessment tools

Not what I was looking for =>

??

Who is using it?

So... What is A/B Testing?

● Simple way to evaluate different versions of the same product

● Provides a data-driven way of making decisions

● Key metrics are usually the decisive factors

Which one was chosen?

Though most people prefer the first one

Not always A/B

● Multivariate experiments

● Might only be used for a percentage of traffic

● Can be short or long lived

● Usually forgotten after the fact

Why should we care?

● Number of companies using it keeps growing

● Decisions are based on data from untrusted parties

● False sense of security

Detecting A/B Testing

Are we only seeing part of the picture?

Detecting A/B Testing

What to look for:

● JS

● Images

● Additional code sections

● New links

● Display changes
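One way to spot these differences is to fetch the same page several times, each time with a fresh cookie jar so the site buckets you again, and compare what comes back. A minimal sketch, assuming the responses have already been fetched (the fetching is not shown); `ResourceCollector` and `variable_resources` are hypothetical names. It only covers JS, images and links; additional code sections and display changes would need a DOM or visual diff.

```python
from html.parser import HTMLParser
from typing import Sequence


class ResourceCollector(HTMLParser):
    """Collects script sources, image sources and link targets from one page."""

    def __init__(self) -> None:
        super().__init__()
        self.resources = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.resources.add(("script", attrs["src"]))
        elif tag == "img" and attrs.get("src"):
            self.resources.add(("img", attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.resources.add(("a", attrs["href"]))


def collect(html: str) -> set:
    parser = ResourceCollector()
    parser.feed(html)
    parser.close()
    return parser.resources


def variable_resources(pages: Sequence[str]) -> set:
    """Resources present in some responses but not in all of them.

    These are candidate experiment artifacts. Needs at least one page.
    """
    sets = [collect(page) for page in pages]
    return set.union(*sets) - set.intersection(*sets)
```

Anything returned appears only for some visitors, so it is worth a closer look; a stable page returns an empty set.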

Adversarial A/B Testing

● We make decisions based on user input.

● What could ever go wrong, Tay?

Please Track Me, Bro

● By user account / email / identifier

● Visitor’s cookie

● Mobile app id
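A common way to turn one of these identifiers into a variant is to hash it together with the experiment name. This is an assumption for illustration; the talk does not describe any vendor's scheme, and `assign_variant` is a hypothetical name. The point it shows: the bucket depends only on the identifier, so whoever controls the identifier (a new cookie, a new account) gets a fresh draw.

```python
import hashlib
from typing import Optional, Sequence


def assign_variant(experiment: str, identifier: str,
                   variants: Sequence[str] = ("A", "B"),
                   weights: Sequence[int] = (50, 50)) -> Optional[str]:
    """Deterministically map an identifier to a variant (hypothetical scheme).

    Weights are percentages of total traffic. If they sum to less than 100,
    the remaining visitors are left out of the experiment and get None.
    """
    digest = hashlib.sha256(f"{experiment}:{identifier}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in [0, 100)
    threshold = 0
    for variant, weight in zip(variants, weights):
        threshold += weight
        if bucket < threshold:
            return variant
    return None  # not enrolled in this experiment
```

The same identifier always lands in the same variant; calling it with `weights=(10, 10)` models an experiment that only sees 20% of traffic.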

Manipulating results

“Unexpected results are expected in some degree”

“Decisions are made based on user input”

Abnormal

Exploiting A/B Testing

Phase 1: Exploring

Exploiting A/B Testing

Phase 2: Mapping

Exploiting A/B Testing

Phase 3: Pretending

We will need to blend in

● The usual ways of getting armies of IPv4 addresses

● NATs will be expected

● Geographical distribution

● Fingerprinting evasion

All about the metrics

● Some metrics are pretty easy to guess

● More activity is usually better

● Not always financial

How do we find out about them?

They

https://www.optimizely.com/case-studies/brooks-running/

They wouldn’t

http://eng.wealthfront.com/2016/04/11/building-mobile-ab-testing-infrastructure/

They wouldn’t just

https://blog.twitter.com/2015/twitter-experimentation-technical-overview

They wouldn’t just blog

www.slideshare.net/SteveUrban/experimentation-platform-at-netflix

They wouldn’t just blog about it.

https://www.optimizely.com/case-studies/sony/

They wouldn’t just blog about it.

Right?

http://www.slideshare.net/KrishnaGade2/why-eveyrthing-is-an-ab-test-at-pinterest

Impacting metrics

● Can be costly to go for the known metrics

● Use business logic in your favor

Not just a good thing

Users just drop out after watching this feature

This new feature will have more users closing their accounts

No financial cost associated

Scaling the attack

● We would need to keep normal user behaviour.

● This could mean a big financial investment depending on the metrics used.

● Could we just crowdsource?

Finding volunteers

Botnets

Malware

MITM

Open proxies

As little interference as possible

Finding volunteers

Interaction as usual on most of the site.

When faced with the variant you are betting against, create small disruptions that could be attributed to chance.

No real effect on users means no attention is drawn.

Remember these guys?

News sites experiment too

Source: https://freedom-to-tinker.com/2016/05/26/a-peek-at-ab-testing-in-the-wild/

The other side of A/B Testing

Decisions are based on data

Data is based on untrusted user input

Results can be unintuitive
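To see how little untrusted input it can take, here is a sketch assuming the experimenter compares conversion rates with a pooled two-proportion z-test at the 5% level (the talk does not say which test any platform uses). The counts are hypothetical: variant B starts out significantly better, then 50 extra sessions (about 5% of B's traffic) that see B and never convert are added.

```python
import math


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical counts: A converts 10.0%, B converts 13.0%.
honest = two_proportion_z(100, 1000, 130, 1000)
# Same experiment plus 50 sessions that see B and never convert.
manipulated = two_proportion_z(100, 1000, 130, 1050)
```

With these numbers B wins at p < 0.05 in the honest run, while the manipulated run no longer reaches significance, even though no real user changed behaviour.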

Demo: Manipulating news headlines

Instead of faking users, let’s get real ones

You don’t need to win, just have someone else lose instead

Setup:

● Python open proxy

● Sentiment classification to detect positive or negative news related to a keyword
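The talk does not say which classifier the demo used. As a stand-in, here is a toy word-list scorer that only rates headlines mentioning a given keyword; the word lists and `headline_sentiment` are hypothetical, and a real setup would use a trained model.

```python
import re
from typing import Optional

# Hypothetical word lists, for illustration only.
POSITIVE = {"wins", "growth", "success", "record", "praised", "gains"}
NEGATIVE = {"scandal", "crisis", "loses", "fraud", "collapse", "criticized"}


def headline_sentiment(headline: str, keyword: str) -> Optional[str]:
    """Classify a headline that mentions a single-word keyword.

    Returns 'positive', 'negative' or 'neutral', or None when the
    keyword does not appear in the headline.
    """
    words = re.findall(r"[a-z']+", headline.lower())
    if keyword.lower() not in words:
        return None
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `headline_sentiment("Acme hit by fraud scandal", "Acme")` returns `"negative"`.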

Demo: Manipulating news headlines

Fewer negative stories => Better conversion

What lies behind the experiments


How do we calculate the number of scans needed?

h: Desired probability of getting all scans (seeing both variants of every experiment)

N: Number of experiments expected on the site

Assumptions:

● All experiments use a 50/50 split

● Experiments receive 100% of traffic


Scans needed = ceil(log2(N) + log2(1/(1-h)) + 1)
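The formula follows from a union bound, under the slide's assumptions (N independent 50/50 experiments, each scan a fresh visitor). After k scans, an experiment has shown only one of its variants if every scan landed on the same side; requiring the total chance of that happening anywhere to be at most 1-h gives the bound, rounded up to a whole number of scans:

```latex
\begin{aligned}
P(\text{experiment } i \text{ shows only one variant after } k \text{ scans}) &= 2\left(\tfrac{1}{2}\right)^{k} = 2^{1-k} \\
P(\text{some experiment shows only one variant}) &\le N \cdot 2^{1-k} \quad \text{(union bound)} \\
N \cdot 2^{1-k} \le 1-h \iff k &\ge \log_2 N + \log_2 \frac{1}{1-h} + 1
\end{aligned}
```

The union bound makes the estimate slightly conservative; for N = 1 it is exact.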


For example:

ceil(log2(1) + log2(1/(1-0.5)) + 1) = 2

ceil(log2(1) + log2(1/(1-0.99)) + 1) = ceil(7.64) = 8

ceil(log2(1000) + log2(1/(1-0.99)) + 1) = ceil(17.61) = 18

ceil(log2(10000) + log2(1/(1-0.99)) + 1) = ceil(20.93) = 21
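The numbers can be checked numerically. A minimal sketch under the same assumptions (independent 50/50 experiments, every visitor enrolled, each scan a fresh visitor); `scans_needed` and `simulate` are illustrative names, and the simulation is a plain Monte Carlo estimate.

```python
import math
import random


def scans_needed(n_experiments: int, h: float) -> int:
    """Scans so that, with probability >= h, every 50/50 experiment
    has shown both variants (union-bound formula, rounded up)."""
    return math.ceil(math.log2(n_experiments) + math.log2(1 / (1 - h)) + 1)


def simulate(n_experiments: int, scans: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(both variants seen in every experiment)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        all_seen = True
        for _ in range(n_experiments):
            # Each scan is a fresh visitor bucketed 50/50 into variant 0 or 1.
            seen = {rng.randrange(2) for _ in range(scans)}
            if len(seen) < 2:
                all_seen = False
                break
        hits += all_seen
    return hits / trials
```

`scans_needed(10000, 0.99)` gives 21, matching the last example, and the simulated probability at the computed scan count should come out at or above `h`, up to sampling noise.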

Backend vs Frontend experiments

● Backend experiments will seem transparent

● Frontend experiments will require simulating the browser

● Common practice of JS rendering

Demo: Finding vulnerabilities behind experiments

● One variant

● 25% of traffic
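A scanner that keeps a single session only reaches a variant served to 25% of traffic one time in four. A sketch of how many fresh sessions it would need, assuming each session is assigned independently; `sessions_needed` is a hypothetical helper.

```python
import math


def sessions_needed(traffic_share: float, confidence: float) -> int:
    """Fresh sessions needed so that at least one lands in a variant
    served to `traffic_share` of visitors, with probability >= confidence.

    Assumes independent assignment per session: P(miss) = (1 - share) ** k.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - traffic_share))
```

For the demo's setup, `sessions_needed(0.25, 0.99)` is 17: sixteen sessions still miss the variant about 1% of the time.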

Defensive techniques

Finding the fakes

Looking for manipulation

Retroactive experimentation

Human analysis
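One possible check under "Looking for manipulation" (our example, not a technique named in the talk) is a sample ratio mismatch test: compare how many users each variant actually recorded against the configured split. Fake or filtered traffic that favors one variant can skew those counts; fakes that follow the normal assignment will not show up this way. A minimal sketch for two variants, using a chi-square test with one degree of freedom; `srm_p_value` is a hypothetical name.

```python
import math


def srm_p_value(count_a: int, count_b: int, share_a: float = 0.5) -> float:
    """p-value that the observed A/B counts came from the configured split.

    Chi-square goodness-of-fit with 1 degree of freedom; a very small
    value means the split looks off and the experiment deserves a look.
    """
    total = count_a + count_b
    expected_a = total * share_a
    expected_b = total * (1 - share_a)
    chi2 = ((count_a - expected_a) ** 2 / expected_a
            + (count_b - expected_b) ** 2 / expected_b)
    # Survival function of chi-square with 1 dof: erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))
```

A 5000/5400 split on a configured 50/50 experiment gives a p-value well below 0.001, which is worth a human look before trusting the result.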

Final Remarks

● External data should be untrusted by default

● Experiments can’t replace human reasoning

● Experiments are here to stay

Questions?

Thanks!