exploiting a/b testing ekoparty 2016 slides
Exploiting A/B Testing for Fun and Profit
Juan Berner
About me
● @89berner
● Security researcher
● Developer
● SpaceX Fan
Why am I here?
Growing trend ignored by security teams and assessment tools
Not what I was looking for =>
??
Who is using it?
So.. What is A/B Testing?
● Simple way to evaluate different versions of the same product
● Provides a data-driven way of making decisions
● Key metrics are usually the decisive factors
Which one was chosen?
Though most people prefer the first one
Not always A/B
● Multivariate experiments
● Might only be used for a percentage of traffic
● Can be short or long lived
● Usually forgotten after the fact
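The slides do not show how visitors end up in a variant. A common approach, assumed here rather than taken from the talk, is to hash a visitor identifier so that the same visitor always lands in the same bucket and only a fraction of traffic is enrolled. The function name `assign_variant` and its parameters are hypothetical.

```python
import hashlib


def assign_variant(visitor_id, experiment, variants=("A", "B"), traffic_fraction=1.0):
    """Return the variant for a visitor, or None if the visitor is not enrolled.

    Hypothetical sketch: hash-based bucketing is an assumption, not the
    scheme of any specific A/B testing product.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).digest()
    # Map the hash to one of 10,000 buckets (0..9999).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # Only the first `enrolled` buckets take part in the experiment.
    enrolled = int(traffic_fraction * 10_000)
    if bucket >= enrolled:
        return None
    # Split the enrolled buckets evenly across the variants.
    return variants[bucket * len(variants) // enrolled]
```

With `variants=("A", "B", "C")` the same sketch covers a multivariate experiment, and `traffic_fraction=0.25` enrolls roughly a quarter of visitors.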
Why should we care?
● Number of companies using it keeps growing
● Decisions are based on data from untrusted parties
● False sense of security
Detecting A/B Testing
Are we only seeing part of the picture?
Detecting A/B Testing
What to look for:
● JS
● Images
● Additional code sections
● New links
● Display changes
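One way to look for these signals is to fetch the same page twice (for example with different cookies) and compare the scripts, images and links each response references. The sketch below uses only the standard library. The names `AssetCollector`, `collect_assets` and `diff_pages` are hypothetical, and it works on HTML strings, so fetching the pages is left out.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect script sources, image sources and link targets from a page."""

    def __init__(self):
        super().__init__()
        self.assets = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.assets.add(("script", attrs["src"]))
        elif tag == "img" and attrs.get("src"):
            self.assets.add(("img", attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.assets.add(("link", attrs["href"]))


def collect_assets(html):
    parser = AssetCollector()
    parser.feed(html)
    parser.close()
    return parser.assets


def diff_pages(html_a, html_b):
    """Report assets that appear in only one of the two responses."""
    a, b = collect_assets(html_a), collect_assets(html_b)
    return {"only_in_first": sorted(a - b), "only_in_second": sorted(b - a)}
```

A non-empty difference does not prove an experiment is running (ads and rotating content also change between requests), so the output is a list of candidates to inspect by hand.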
Adversarial A/B Testing
● We make decisions based on user input.
● What could ever go wrong, Tay?
Please Track Me, Bro
● By user account / email / identifier
● Visitor’s cookie
● Mobile app id
Manipulating results
“Unexpected results are expected to some degree”
“Decisions are made based on user input”
Abnormal
Exploiting A/B Testing
Phase 1: Exploring
Phase 2: Mapping
Phase 3: Pretending
We will need to blend
● Usual ways of getting armies of IPv4 addresses
● NATs will be expected
● Geographical distribution
● Fingerprinting evasion
All about the metrics
● Some metrics are pretty easy to guess
● More activity is usually better
● Not always financial
How do we find out about them?
They
https://www.optimizely.com/case-studies/brooks-running/
They wouldn’t
http://eng.wealthfront.com/2016/04/11/building-mobile-ab-testing-infrastructure/
They wouldn’t just
https://blog.twitter.com/2015/twitter-experimentation-technical-overview
They wouldn’t just blog
www.slideshare.net/SteveUrban/experimentation-platform-at-netflix
They wouldn’t just blog about it.
https://www.optimizely.com/case-studies/sony/
They wouldn’t just blog about it.
Right?
http://www.slideshare.net/KrishnaGade2/why-eveyrthing-is-an-ab-test-at-pinterest
Impacting metrics
● Can be costly to go for the known metrics
● Use business logic in your favor
Not just a good thing
Users just drop out after watching this feature
This new feature will have more users closing their accounts
No financial cost associated
Scaling the attack
● We would need to keep normal user behaviour.
● This could mean a big financial investment depending on the metrics used.
● Could we just crowdsource?
Finding volunteers
Botnets
Malware
MITM
Open proxies
As little interference as possible
Finding volunteers
Interaction as usual in most of the site.
When faced with the variants you are not betting on, create small disruptions that could be attributed to chance.
No real effect on users means no attention is drawn.
Remember these guys?
News sites experiment too
Source: https://freedom-to-tinker.com/2016/05/26/a-peek-at-ab-testing-in-the-wild/
The other side of A/B Testing
Decisions are based on data
Data is based on untrusted user input
Results can be unintuitive
Demo: Manipulating news headlines
Instead of faking users, let’s get real ones
You don’t need to win, just have someone else lose instead
Setup:
● Python open proxy
● Sentiment classification to detect positive or negative news related to a keyword
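The slides do not show the classifier used in the demo. As a toy stand-in, the sketch below labels a headline as positive, negative or neutral for a given keyword by counting words from two small lists. The word lists, the function name `classify_headline` and the scoring rule are all assumptions made for illustration; a real setup would likely use a trained model.

```python
import re

# Hypothetical word lists, for illustration only.
NEGATIVE_WORDS = {"crash", "scandal", "lawsuit", "fails", "recall", "fraud"}
POSITIVE_WORDS = {"record", "wins", "growth", "success", "praised", "launches"}


def classify_headline(headline, keyword):
    """Return 'positive', 'negative' or 'neutral', or None if the keyword is absent."""
    words = re.findall(r"[a-z']+", headline.lower())
    if keyword.lower() not in words:
        return None  # headline is not about the keyword
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"
```

How the proxy acts on the label is not spelled out in the slides, so that part is left out of the sketch.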
Demo: Manipulating news headlines
Fewer negative stories => Better conversion
What lies behind the experiments
How do we calculate the number of scans needed?
h: Desired probability that the scans cover every experiment variant
N: Number of experiments expected on the site
Assumptions:
● All experiments in a 50/50 situation
● Getting 100% of traffic
Scans needed = log2(N) + log2(1/(1-h)) + 1, rounded up
For example:
log2(1) + log2(1/(1-0.5)) + 1 = 2
log2(1) + log2(1/(1-0.99)) + 1 ≈ 7.64, rounded up to 8
log2(1000) + log2(1/(1-0.99)) + 1 ≈ 17.61, rounded up to 18
log2(10000) + log2(1/(1-0.99)) + 1 ≈ 20.93, rounded up to 21
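The expression can be evaluated with a few lines of Python. One plausible reading of where it comes from, which is not stated in the slides: with k scans and a 50/50 split, the chance of missing one variant of a single experiment is 2 * (1/2)^k, so across N experiments the chance of missing anything is at most N * 2^(1-k); requiring that to be at most 1-h gives the expression. The function name `scans_needed` is hypothetical, and the result is rounded up to a whole number of scans.

```python
import math


def scans_needed(n_experiments, h):
    """Evaluate log2(N) + log2(1/(1-h)) + 1 and round up to whole scans.

    Assumes every experiment splits traffic 50/50 and covers 100% of traffic.
    """
    if n_experiments < 1:
        raise ValueError("n_experiments must be at least 1")
    if not 0 <= h < 1:
        raise ValueError("h must be in the range [0, 1)")
    raw = math.log2(n_experiments) + math.log2(1 / (1 - h)) + 1
    return math.ceil(raw)


for n, h in [(1, 0.5), (1, 0.99), (1000, 0.99), (10000, 0.99)]:
    print(n, h, scans_needed(n, h))
```

The number of scans grows with the logarithm of N, which is why going from 1,000 to 10,000 experiments adds only a handful of scans.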
Backend vs Frontend experiments
● Backend experiments will seem transparent
● Frontend experiments will require simulating the browser
● Common practice of JS rendering
Demo: Finding vulnerabilities behind experiments
● One variant
● 25% of traffic
Defensive techniques
Finding the fakes
Looking for manipulation
Retroactive experimentation
Human analysis
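One simple way to look for manipulation is to recompute an experiment's result after excluding sessions flagged as suspicious and see whether the winning variant changes. The slides do not describe a specific method, so this is only a sketch: the session fields (`variant`, `converted`, `flagged`), the function names and the idea of a pre-computed `flagged` marker (for example, traffic from known open proxies) are all assumptions.

```python
from collections import defaultdict


def conversion_rates(sessions, exclude_flagged=False):
    """Return the conversion rate per variant, optionally skipping flagged sessions."""
    totals = defaultdict(int)
    conversions = defaultdict(int)
    for session in sessions:
        if exclude_flagged and session["flagged"]:
            continue
        totals[session["variant"]] += 1
        conversions[session["variant"]] += int(session["converted"])
    return {variant: conversions[variant] / totals[variant] for variant in totals}


def winner_changes(sessions):
    """True if dropping flagged sessions changes which variant has the best rate."""
    all_rates = conversion_rates(sessions)
    clean_rates = conversion_rates(sessions, exclude_flagged=True)
    if not all_rates or not clean_rates:
        return False
    return max(all_rates, key=all_rates.get) != max(clean_rates, key=clean_rates.get)
```

A changed winner is a reason to hand the experiment to a person for review, not proof of an attack, since flagged traffic may simply behave differently.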
Final Remarks
● External data should be untrusted by default
● Experiments can’t replace human reasoning
● Experiments are here to stay