8. A/B Test and Experimental Research
Jonathan Zhu 祝建華
MACNM Computational Workshop, Jan 13, 2019
Who Did People Guess Would Win?
Did Facebook Help Trump Win Election 2016?
▪ Yes
o Cambridge Analytica worked for him
o He used Twitter around the clock
o There was a lot of false information about Hillary
o Russian hackers were highly active during the election
o …
▪ No
o More Facebook users supported Clinton than him
o Most of his supporters did not use Twitter
o There was a lot of negative information about him too
o There is no such direct link yet
o …
How to Find the True Cause(s)?
[Diagram: Voting Decision as the outcome, with Personal-Family Background, Social Influence, and Media Influence as candidate causes; rival causes are to be removed or isolated, and the focal cause confirmed or ensured using an experiment.]
What Are the Powerful Doing: A/B Tests
Source: Tan of Millward Brown, 2009
Example of A/B Test (Online Experiment)
Source: Tan of Millward Brown, 2009
Four Types of A/B Test and Online Experiment
Salganik (2018):
1. Partner with the powerful (websites, government, advertisers, NGOs, etc.)
2. Use an existing system (e.g., running on Facebook, Twitter, etc.)
3. Build a standalone experiment platform (built once, solely for the experiment, e.g., MusicLab)
4. Build a product system (real application, e.g., MovieLens)
Partner with the Powerful
Bond, R. M., et al. (2012). A 61-million-person experiment in social influence and political mobilization. Nature, 489(7415), 295.
Build a Standalone Experiment Platform
Randomized Screen
Follow-up Pages for Rating and Downloading
Build a Complete Product System
Comparisons among the Four Types of A/B Test

                             Partner       Use existing   Build own     Build a
                             with power    systems        experiment    real product
Cost (money and time)        Low           Low            Medium        High
Control (subjects, stimuli)  Medium        Low            High          High
Realism (setting)            High          High           High          High
Ethics (impact on systems)   Complex       Complex        Easy          Easy

Source: based on Salganik (2018).
A Larger Picture of Experiment
▪ Major Types:
o Pseudo experiment
o Lab experiment
o Field experiment
o Natural experiment
o A/B Test & online experiment
▪ Key Ingredients:
o Study subjects (S)
o Testing setting (T)
o Random assignment (R)
o Manipulated stimulus (X)
o Observed outcome (O)
Pseudo Experiment
Source: Babbie (2007) Fig 8.3
• Change in Y: unknown; Effect of X: unknown; Confound: unknown
• Change in Y: known; Effect of X: unknown; Confound: possible
• Change in Y: unknown; Effect of X: unknown; Confound: possible
Three Most Deadly Sins at Harrah’s
Gary Loveman, CEO Harrah’s:
“… you don't harass women, you don't steal, and you've got to have a control group. … you can lose your job at Harrah's for not running a control group.”
Source: Salganik (2018)
Lab Experiment
▪ Also known as the “classical experiment” or “randomized controlled experiment,” marked by the presence of the following “controls”:
1. Randomized assignment (R)
2. Control condition (C)
3. Manipulated stimulus (X)
4. Observed outcome (O)
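The four controls above can be illustrated in a small simulation (all numbers here, including the +5 treatment effect, are invented for illustration):

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical subjects: baseline outcome scores before any stimulus
baseline = [random.gauss(50, 10) for _ in range(1000)]

# 1. Randomized assignment (R): each subject is assigned by a fair coin flip
treated = [random.random() < 0.5 for _ in baseline]

# 2. Control condition (C) and 3. manipulated stimulus (X):
# the stimulus adds an assumed +5 to the outcome of treated subjects only
outcomes = [y + 5 if t else y for y, t in zip(baseline, treated)]

# 4. Observed outcome (O): the difference in group means estimates the effect
treat_mean = statistics.mean(y for y, t in zip(outcomes, treated) if t)
ctrl_mean = statistics.mean(y for y, t in zip(outcomes, treated) if not t)
effect = treat_mean - ctrl_mean
print(f"Estimated treatment effect: {effect:.2f}")
```

Because assignment is random, the two groups differ only by the stimulus, so the mean difference recovers the true effect up to sampling error.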
Basic Experimental Design
Source: Babbie (2007) Fig 8.1
Field Experiment
Same as lab experiment
▪ Randomization
▪ Control condition
▪ Manipulated stimulus
▪ Observed outcome
Different from lab experiment
▪ More “realistic” setting (e.g., shopping mall, office, etc.)
▪ More “normal” subjects (e.g., adults)
Natural Experiment
Same as field experiment
▪ Realistic setting
▪ Normal subjects
Different from field experiment
▪ “Naturally occurring” stimulus, uncontrolled by the researcher
Online vs. Offline Experiment
Same as offline experiment
▪ Randomization
▪ Control condition
▪ Manipulated stimulus
▪ Observed outcome
Different from offline experiment
▪ More heterogeneous subjects
▪ Larger sample size
▪ Faster data collection
▪ More “realistic” setting
▪ More likely to run into ethical problems
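A typical online A/B test compares two proportions, such as click-through rates. The sketch below (the click and impression counts are hypothetical) runs a two-proportion z-test using only the standard library:

```python
import math

# Hypothetical A/B test result: clicks out of impressions per variant
n_a, clicks_a = 10_000, 450   # variant A (control)
n_b, clicks_b = 10_000, 520   # variant B (treatment)

p_a, p_b = clicks_a / n_a, clicks_b / n_b
p_pool = (clicks_a + clicks_b) / (n_a + n_b)  # pooled rate under the null

# Standard error of the difference under the null, then the z statistic
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
print(f"CTR A={p_a:.2%}, CTR B={p_b:.2%}, z={z:.2f}")
```

A |z| above 1.96 indicates a difference significant at the 5% level; the large samples typical of online experiments make such tests sensitive to small effects.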
Online Experiment Stands out as a Winner
[Figure: Survey, Lab Experiment, Field Experiment, and Online Experiment plotted by internal validity (low-high) against external validity (low-high), with Online Experiment rating high on both dimensions.]
9. Overview of Computational Methods
Summary of the Workshop
▪ Python Basics
o Concepts: List, Dictionary, DataFrame, Missing value, Discretization, Permutation/Random sampling, Dummy coding, Merge/Join/Concatenate, GroupBy/apply, Cross tabulations
o Python packages: pandas
o Data (CityU Tweets): user_profiles_ch_anony.csv, tweet_ch_anony.xlsx
▪ Web Data Collection
o Concepts: Webpage scraping, API retrieval, HTML, CSS
o Python packages: Tweepy, BeautifulSoup
▪ Text Mining
o Concepts: Word extraction, Bag of words, TF-IDF, Feature transformation, Topic modeling, Tokenization, Normalization, Stemming, Lemmatization, Word tagging
o Python packages: NLTK, jieba, sklearn
o Data (CityU Tweets): tweet_en_anony, tweet_ch_anony
▪ User Profiling
o Concepts: Audience targeting, Behavior analytics, Timing analysis, Machine learning, Clustering, Classification, Regression
o Python packages: pandas, scikit-learn
o Data (CityU Tweets): user_profiles_ch_anony.csv, user_profiles_en_anony.csv
▪ Visualization
o Concepts: Univariate: histogram & KDE; Bivariate: bar, pie, line charts, hexbin, scatterplot; 3-variate: superposed line, grouped & stacked bar; Multivariate: parallel coordinates, scatterplot matrix
o Python packages: pandas, seaborn, numpy
o Data (CityU Tweets): user_profiles_ch_anony.csv, user_profiles_en_anony.csv
▪ A/B Test
o Concepts: Experiment design, Experiment group, Control group, Stimulus, Randomization, Partner with power, Existing platform, Stand-alone platform, Product systems
▪ Network Analysis
o Concepts: Graph, Edge list, Graph-, node-, and community-level analysis, Degree distribution, Clustering, Path length, Centrality, Component, Community, Ego network
o Python packages: NetworkX
o Data (CityU Tweets): edgelist_following.csv
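To make a few of the pandas concepts above concrete, here is a minimal sketch; the DataFrame and its column names are invented for illustration, not the workshop's actual files:

```python
import pandas as pd

# Invented toy data shaped like a small tweet table
df = pd.DataFrame({
    "user": ["a", "a", "b", "b", "c", "c"],
    "lang": ["en", "en", "ch", "en", "ch", "ch"],
    "retweets": [3, None, 5, 2, 0, 8],
})

df["retweets"] = df["retweets"].fillna(0)        # missing-value handling
per_user = df.groupby("user")["retweets"].sum()  # GroupBy
table = pd.crosstab(df["user"], df["lang"])      # cross tabulation
print(per_user)
print(table)
```

The same three operations (fill missing values, aggregate by group, cross-tabulate two categorical columns) scale unchanged from six rows to millions.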
Real Challenges in Computational Research
▪ Causal inference
▪ Sampling
▪ Mixed methods
▪ Research ethics
Causal Inference
▪ Causality remains the ultimate goal of scientific research
▪ Experimental design is the best tool for establishing causality
▪ Online experiments excel over their offline counterparts
▪ The major challenge for online experiments lies in research ethics
Sampling
▪ Sampling is still necessary in the age of big data
▪ A good sample is more informative than raw big data
▪ A sample size of around 10,000 to 100,000 is optimal, balancing quality and cost
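As an illustration of the sampling claim, the sketch below (simulated data, all numbers invented) draws a 10,000-case random sample from a million-row "population" and shows that the sample mean tracks the population mean closely:

```python
import random
import statistics

random.seed(7)  # reproducible draw

# Simulated "big data": one million observations
population = [random.gauss(100, 15) for _ in range(1_000_000)]

# A random sample in the recommended 10,000 range
sample = random.sample(population, 10_000)

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(f"population mean={pop_mean:.2f}, sample mean={sample_mean:.2f}")
```

With a well-drawn random sample, the standard error of the mean here is about 0.15, so the estimate is accurate at a fraction of the cost of processing the full data.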
Integration of Multi-source Data
▪ Found vs. Made Data
o Found data:
• Log files
• Web content
• etc.
o Made data:
• Experiment observations
• Survey responses
• etc.
▪ Offline vs. Online Data
o Offline data
• Small size
• Ground truth known (→supervised learning)
o Online data
• Big size
• Ground truth unknown (→unsupervised learning)
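The ground-truth distinction above maps directly onto the two branches of machine learning. A minimal scikit-learn sketch (synthetic data with assumed cluster centers, purely for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic data with two well-separated groups (centers are assumptions)
X, y = make_blobs(n_samples=300, centers=[[-5, -5], [5, 5]], random_state=0)

# Ground truth known (offline-style data) -> supervised learning
clf = LogisticRegression().fit(X, y)
acc = clf.score(X, y)

# Ground truth unknown (online-style data): ignore y -> unsupervised learning
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(f"supervised accuracy={acc:.2f}")
print(f"recovered cluster sizes={sorted(list(km.labels_).count(c) for c in (0, 1))}")
```

In practice, a small labeled offline dataset often trains the supervised model, which is then applied to the large unlabeled online data.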
The Rwanda Study
[Figure: workflow combining a survey (n=1,000) with mobile phone users (n=1.5 million): 1. Sampling (draw the survey from the user base); 2. Prediction (predict wealth from call logs, n=1,000); 3. Estimation (estimate wealth and location for all 1.5 million users); 4. Aggregation; 5. Cross-validation (against the survey data); 6. Projection (wealth projected onto 2,148 cells in 30 districts).]
Source: Blumenstock et al. (2015). Predicting poverty and wealth from mobile phone metadata.
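The study's predict-then-aggregate logic can be sketched with the standard library alone; the data below are simulated stand-ins (sizes scaled down, and the wealth-calls relation, district counts, and variable names are all assumptions for illustration):

```python
import random
import statistics

random.seed(1)

# Simulated stand-ins: mobile users with a call-log feature and a district
users = [{"calls": random.gauss(30, 10), "district": random.randrange(30)}
         for _ in range(15_000)]  # scaled down from 1.5 million

# 1. Sampling: survey a random subsample and observe its "true" wealth
survey = random.sample(users, 1_000)
for u in survey:
    u["wealth"] = 2.0 * u["calls"] + random.gauss(0, 5)  # assumed relation

# 2. Prediction: fit wealth ~ calls on the survey (one-predictor OLS)
xs = [u["calls"] for u in survey]
ys = [u["wealth"] for u in survey]
mx, my = statistics.mean(xs), statistics.mean(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

# 3-4. Estimation and aggregation: predict wealth for every mobile user,
# then average the predictions within each district
by_district = {}
for u in users:
    by_district.setdefault(u["district"], []).append(intercept + slope * u["calls"])
district_wealth = {d: statistics.mean(w) for d, w in by_district.items()}

print(f"fitted slope={slope:.2f}, districts covered={len(district_wealth)}")
```

The survey supplies the ground truth for a model that is then projected onto the full user base, which is the core move that lets n=1,000 of made data describe n=1.5 million of found data.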
Data in the Study
Research Ethics
▪ Privacy concerns
▪ National security concerns
▪ Commercial interests
▪ Data ownership
▪ etc.
We’re in a post-API era → relying on data made by you
10. Student Reflections
What Do You Think about the Workshop?
▪ What did you expect to learn?
▪ What have you actually learned?
▪ Which part of what you learned is most relevant/useful for your job?
▪ What else do you think is relevant/useful but has been missed?
▪ Is there anything else you would like to share?
Thank you so much for your participation!