TRANSCRIPT
Date: Tuesday, March 19, 2019
EXPERIMENTS IN DATA SCIENCE
Examples of the need for experimentation:
1. Economic indicators based on a country's poverty, employment rate, happiness, etc.
2. Medical treatments: will treatment A help control ailment B?
Focus: assessing cause and effect, a.k.a. causal analysis.
Purposeful data collection
Notation and Nomenclature
Dependent variable (Y): measures the outcome that we want to optimize. Ex: CTR (click-through rate), session duration, bounce rate, etc.
Explanatory variables (X_1, X_2, ..., X_p): variables that we expect to influence our dependent variable Y. In an experiment, explanatory variables are referred to as factors, and the values they can take on (i.e., their domain) are called levels.
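As a rough illustration of the dependent variables named above, the Python sketch below computes click-through rate, mean session duration, and bounce rate from a few session records. The `Session` record type, its fields, and the sample values are hypothetical, and bounce rate is taken here to mean the share of sessions that viewed only one page, which is a common but not universal definition.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    # Hypothetical session record; field names are illustrative only.
    impressions: int    # times the button/link was shown
    clicks: int         # times it was clicked
    duration_s: float   # session length in seconds
    pages_viewed: int   # pages viewed during the session


def click_through_rate(sessions):
    """Total clicks divided by total impressions (0.0 if nothing was shown)."""
    shown = sum(s.impressions for s in sessions)
    return sum(s.clicks for s in sessions) / shown if shown else 0.0


def mean_session_duration(sessions):
    """Average session length in seconds."""
    return mean(s.duration_s for s in sessions)


def bounce_rate(sessions):
    """Share of sessions with a single page view (one common definition)."""
    return sum(s.pages_viewed == 1 for s in sessions) / len(sessions)


# Made-up toy records, not data from the notes.
sessions = [
    Session(impressions=1, clicks=1, duration_s=120.0, pages_viewed=3),
    Session(impressions=1, clicks=0, duration_s=15.0, pages_viewed=1),
    Session(impressions=2, clicks=1, duration_s=45.0, pages_viewed=2),
    Session(impressions=1, clicks=0, duration_s=5.0, pages_viewed=1),
]
print(click_through_rate(sessions))     # 0.4  (2 clicks / 5 impressions)
print(mean_session_duration(sessions))  # 46.25
print(bounce_rate(sessions))            # 0.5  (2 of 4 sessions)
```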
Primary aim: understand which combinations of explanatory variables have a causal relationship with Y. This inference tells us what action to take in future design/engineering.
Experimental conditions: unique combinations of the levels of one or more factors.
Experimental units: the entities (e.g., users) assigned to each condition; a response value is recorded for each unit.
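Since conditions are unique combinations of factor levels, they can be enumerated as a Cartesian product of the factors' level sets. A minimal Python sketch using `itertools.product`; the factor names `layout` and `font_size` and their levels are hypothetical placeholders.

```python
from itertools import product


def experimental_conditions(factors):
    """Return every unique combination of factor levels.

    `factors` maps a factor name to the list of its levels; each returned
    condition is a dict assigning exactly one level to every factor.
    """
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


# Hypothetical factors with 2 and 3 levels -> 2 * 3 = 6 conditions.
factors = {"layout": ["grid", "list"], "font_size": [12, 14, 16]}
conditions = experimental_conditions(factors)
print(len(conditions))  # 6
print(conditions[0])    # {'layout': 'grid', 'font_size': 12}
```

The number of conditions is the product of the number of levels per factor, so it grows quickly as factors are added.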
Example 1: Button message
Y_i = 1 if individual i clicks the button, 0 otherwise
X_i1 = message on button i: 1 if "submit", 2 if "go", 3 if "let's go"
X_i2 = color of button i: 1 if button i is red, 2 if button i is blue
Conditions (3 messages x 2 colors = 6; R = red, B = blue): (submit, R), (submit, B), (go, R), (go, B), (let's go, R), (let's go, B)
Experimental units: individuals that we've assigned to each condition above.
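A minimal Python sketch of the button example, assuming each observation is stored as a tuple (X_i1, X_i2, Y_i) using the integer codes for message (1 = "submit", 2 = "go", 3 = "let's go") and color (1 = red, 2 = blue). The observations are made-up toy values. The estimated click rate for a condition is simply the mean of Y_i over the individuals assigned to it.

```python
from collections import defaultdict
from itertools import product

MESSAGES = {1: "submit", 2: "go", 3: "let's go"}  # levels of X_i1
COLORS = {1: "red", 2: "blue"}                    # levels of X_i2

# All 3 x 2 = 6 conditions as (message code, color code) pairs.
CONDITIONS = list(product(MESSAGES, COLORS))


def click_rate_by_condition(observations):
    """observations: iterable of (x_i1, x_i2, y_i) with y_i in {0, 1}.

    Returns {(message, color): mean of y_i} for conditions that have data.
    """
    totals = defaultdict(lambda: [0, 0])  # condition -> [clicks, units]
    for x1, x2, y in observations:
        totals[(x1, x2)][0] += y
        totals[(x1, x2)][1] += 1
    return {
        (MESSAGES[x1], COLORS[x2]): clicks / units
        for (x1, x2), (clicks, units) in totals.items()
    }


# Hypothetical toy observations, one tuple per individual.
obs = [(1, 1, 1), (1, 1, 0), (2, 2, 1), (2, 2, 1), (3, 1, 0), (3, 1, 1)]
print(len(CONDITIONS))  # 6
print(click_rate_by_condition(obs))
```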
Experiments vs. Observational Studies
In an experiment, we control and know how units are assigned to a condition; we can then assess causal relationships between conditions and the response.
In an observational study, we have no control over the assignment to conditions; instead, the data is observed passively. It is difficult to test for causality here, though methods do exist. Ex: DAGs (directed acyclic graphs), propensity score matching, Granger causality.
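To give a flavor of one of the methods listed, here is a Python sketch of only the matching step of propensity score matching. It assumes each unit's propensity score (its estimated probability of being in the treated condition given its covariates) has already been estimated elsewhere, e.g. by a logistic regression; the scores and outcomes below are hypothetical. Each treated unit is matched to the control unit with the closest score (one-nearest-neighbor, with replacement, which is only one of several matching variants), and the average outcome difference estimates the average treatment effect on the treated (ATT).

```python
def match_on_propensity(treated, control):
    """1-nearest-neighbor matching with replacement on the propensity score.

    treated, control: lists of (propensity_score, outcome) tuples.
    Returns the mean outcome difference over matched pairs (an ATT estimate).
    """
    if not treated or not control:
        raise ValueError("need at least one treated and one control unit")
    diffs = []
    for score_t, y_t in treated:
        # Control unit whose score is closest to this treated unit's score.
        _, y_c = min(control, key=lambda unit: abs(unit[0] - score_t))
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)


# Hypothetical scores (assumed already estimated) and outcomes.
treated = [(0.80, 12.0), (0.55, 9.0), (0.30, 7.0)]
control = [(0.78, 10.0), (0.50, 8.0), (0.25, 6.5), (0.10, 5.0)]
print(match_on_propensity(treated, control))  # (2.0 + 1.0 + 0.5) / 3
```

Matching only balances the covariates that went into the propensity model, which is why causal claims from observational data remain harder to defend than those from a randomized experiment.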
Example: A/B testing of user activity (in seconds) on version A vs. version B of a website
Conditions: version A, version B (2 conditions)
Dependent variable: y_i = time in seconds that user i stays on the site
Experimental units: the users
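The notes do not say how the A/B responses are analyzed; one common choice is to compare mean time on site between versions with Welch's two-sample t statistic, which does not assume equal variances. A minimal Python sketch using the standard `statistics` module, with made-up time-on-site values; turning t into a p-value (via a t distribution with Welch-Satterthwaite degrees of freedom) is left out here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Difference in means (a - b) and Welch's t statistic for two independent samples."""
    diff = mean(a) - mean(b)
    # Standard error from each group's sample variance and size.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff, diff / se


# Hypothetical time-on-site values (seconds) for users shown each version.
version_a = [30.0, 45.0, 50.0, 38.0, 42.0]
version_b = [25.0, 31.0, 40.0, 28.0, 33.0]
diff, t = welch_t(version_a, version_b)
print(round(diff, 2))  # 9.6
print(t)
```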
Note: Assignment of units to conditions is done using various forms of randomization. The choice of randomization is typically referred to as the design.
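Two simple forms of randomization, sketched in Python with the standard `random` module: complete randomization shuffles the units and deals them out so group sizes differ by at most one, while Bernoulli randomization assigns each unit independently, so group sizes can vary. The function names and user IDs are hypothetical.

```python
import random


def complete_randomization(units, conditions, seed=None):
    """Shuffle units, then deal them round-robin so group sizes differ by at most 1."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: conditions[i % len(conditions)] for i, unit in enumerate(shuffled)}


def bernoulli_randomization(units, conditions, seed=None):
    """Assign each unit to a condition independently and uniformly at random."""
    rng = random.Random(seed)
    return {unit: rng.choice(conditions) for unit in units}


users = [f"user{i}" for i in range(10)]  # hypothetical unit IDs
design = complete_randomization(users, ["A", "B"], seed=1)
print(list(design.values()).count("A"))  # 5
```

Passing a fixed `seed` makes an assignment reproducible, which helps when the design has to be audited later.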
Usually we cannot, or do not want to, assign units to multiple conditions (e.g., a drug treatment, or versions of a webpage, where seeing too many could confuse or frustrate a user). Because of this, for each user there is at least one condition under which we do not measure the dependent variable. The unobserved response for that user/condition pair is called a counterfactual. The primary aim of design is to ensure that the only differences we see in response are due to differences in conditions; thus, we need to control for other intrinsic features.
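The counterfactual idea is often formalized as "potential outcomes": each unit has a response under every condition, but only the one for its assigned condition is observed. The Python sketch below invents both columns for four hypothetical users (so the true average effect can be computed for comparison), randomly assigns half to each version, hides the counterfactuals as `None`, and estimates the effect as a difference in observed group means.

```python
import random
from statistics import mean

# Hypothetical potential outcomes per unit: (y under A, y under B).
potential = {"u1": (30, 36), "u2": (42, 45), "u3": (25, 33), "u4": (50, 52)}

# True average effect of B vs. A; computable only because both columns are invented.
true_effect = mean(yb - ya for ya, yb in potential.values())


def observe(potential, assignment):
    """Keep only the outcome for the assigned condition; the other is a counterfactual (None)."""
    table = {}
    for unit, (ya, yb) in potential.items():
        table[unit] = (ya, None) if assignment[unit] == "A" else (None, yb)
    return table


# Complete randomization: half the units to A, half to B.
rng = random.Random(0)
units = list(potential)
rng.shuffle(units)
assignment = {u: ("A" if i < len(units) // 2 else "B") for i, u in enumerate(units)}
observed = observe(potential, assignment)

# Difference in observed group means estimates the average effect.
group_b = [yb for ya, yb in observed.values() if yb is not None]
group_a = [ya for ya, yb in observed.values() if ya is not None]
estimate = mean(group_b) - mean(group_a)
print(true_effect)  # 4.75
print(estimate)     # varies with the random assignment
```

With so few units the estimate can be far from the true effect for a particular assignment; randomization makes it unbiased on average, not exact.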