quick & simple simulation in excel with clinical trials applications
DESCRIPTION
Quick & Simple Simulation in Excel with Clinical Trials Applications. Presented to the Delaware Chapter of the American Statistical Association 20 October 2011 Dennis Sweitzer, Ph.D. www.Dennis-Sweitzer.com. Background. Occasional need for simulations Excel is convenient, but - PowerPoint PPT PresentationTRANSCRIPT
Quick & SimpleSimulation in Excel
with Clinical Trials ApplicationsPresented to the
Delaware Chapter of the American Statistical Association
20 October 2011
Dennis Sweitzer, Ph.D.www.Dennis-Sweitzer.com
Background
• Occasional need for simulations• Excel is convenient, but
– does not explicitly support simulations– Simulation usually requires VBA programming
(so why not use R or SAS instead)
– Or Add-in commercial programs (eg., @Risk)– Or some academic add-ins
• Does have iterative calculations, Solver• Why not simulation?
Simulate what?
• Stochastic Models– Unknown parameters? Guestimate a distribution– Optimizing policy? Test each with simulations
• Sensitivity Analysis– Variations in Inputs Variations in Outputs– 2 parameters: use a table– >2 parameters: simulate & compare variation
Excel: ProsCommon Language / Common Tools
• Most people understand Excel
• Many tools available in Excel
Transparency: Modeling assumptions can be:
Specified -- Graphed -- Debated
What you see is what you get!More hands on deck, more eyes on the prize….:
Statistician Team Member
Initial Model Explores & breaks model
Repair & enhance …Repeat until satisfied
MEGO
Excel Cons
Slower than in SAS, S+, R, etc
Lacks some statistical/probability functions
• Latest versions are a little better
• Still need to add some VBA code
• Known bugs in statistical routines (often fixed)
Tradeoffs:
• Quicker modifications vs slower execution
Simple Solution: Data Tables
Excel Data Tables
•Creates a table of values of a function– (ie, Random Variables)
•Leftmost column is used as an argument– (which is ignored in a simulation)
•Data Table repeats calculations for each row– (Each row is an iteration of the simulation)
1. Create Simulation
Create Random Variables using Inverse Probability Method:For Random Variable X with distribution function F(x),
F(x): → [0,1]If Random Uniform U
X = F-1(U) (Excel: U=Rand() )
2. Align Random Variables
• Calculations can be anywhere in Spreadsheet
• Reference the Variables in a row
• Is best to label variables in same way
3. Select Data Table
• Select table region– 1st row is Rand Vars– 1st column is not used
(can label iterations)
• From toolbar:– Data>Data Table
4. Create Simulation Table• Column input cell =
Upper left hand corner of table
• Row input cell = ignore
• OK Populates the table
• (may have to manually recalcule)
5. Execute Simulation
Iterative development
•Simulation can be changed
•Add reporting variables
•Recalculate to rerun– (no need to use Data Table
again, unless expanding)
•Hint: debug with short table, expand for final run
The End (of the key concepts)
But still more….
• Why use inverse probability distributions (instead of random variables)?
• When not to use a spreadsheet for simulation?
• Tools:– Macros to set up a simulation– VBA functions for common simulation distributions
Inverse Probability Function• Most systems directly generate random variables with the
desired distribution
• Why use Inverse Probability Functions? – Which are (probably) slower?
Personal opinion
• Testing & Debugging
• Verification Calculates correctly
• Validation Calculations answer Problem
• Sensitivity Input vs Output variability
As Mapping function
Probability Distribution: F(x): → [0,1]
Random Uniform: UInverse PDF: X = F-1(U)
For Continuous (or monotone) F-1
Small changes in u∈U small changes in F-1 (u)
⟼F-1U
Mapping
2 Random Uniform Var
As input to
Deterministic Function
Mapping
Random numbers in
(should)
Map to outputs in
A Max value looks high.Is it a bug? If not, how often?
Example #1Simple model,
function of 2 RV
Saved random U[0,1]For each iteration
Check u U[0,1]∈That generated high valueu=0.983… random high Rarely happens
Saving {Ui}:
•Verify
•Replicate
•Quantify
Example #1 (Sensitivity)
Sort by U1, U2
Sensitive to U1
Insensitive to U2
Spreadsheet limitations• Only simple data structures are available
– Rows & columns, no lists & trees– Discrete event simulations
• Complex algorithms: difficult– Eg, While or for loops– Can improvise (cumbersome, slow, buggy)
• Speed: slow
• Data Storage: what-you-see-is-all-you-get
Tools: Excel Simulation Template
• Adds some missing random functions
• Adds some set-up macros
Macro SimulateSampler
To start a new simulation when you don't remember the names & parameters of common random variables used in simulation:
•Run the Macro SimulationSample
•Copy, delete, and edit as needed.
•Make sure all random values are referenced in the first row of the data table at the bottom.
Macro SimulationSampler
• Creates a simulation with each of common simulation functions
Macro SimulationSampler………
•Sets up header row for data table
•Sets up a place for statistics
Macro Simulate• Highlight the row of random variables
– (1st row of simulation table)
• Run macro "Simulate”– Prompts for which will ask for the number of simulation
iterations,– The default number of iterations is 100– Debug & develop (manually recalculate)– Final run with >1000 iterations– Visual Basic code is computationally intensive,
Macro Simulate
Note bene
• Run Simulate right after SimulationSampler– Risk of “Ref!” error
• SimTemplate,Plot,Sampler contains – The sampler – A distribution plot of all random variable
• Crude, but handy for quick comparisons
– Ready to edit
SimTemplate,Plot,Sampler• Crude
Distribution plot of ALL variables
• Uses Percentile Ranksto save space
• Good for Continuous Var.• Bad for Discrete Var.
1. Copy
2. Delete unwanted variables
3. Make it pretty
Excel Random Variables
Rand() --Random Uniform [0,1]NormSInv() – Inverse Standard Normal Distribution CriticalBinomial() – Inverse Binomial DistributionLogNormInv() - Inverse Log Normal Distribution
Caveat: parameters are mean, SD after the Log transformation
Erlang Distribution
How long do you wait until you get a predetermined number of arrivals?
•Interarrival times are distributed IID exponential
•Erlang is Gamma with integer parameter
Beta Distribution
Can use as •Distribution of a Binomial probability• Range = [0,1]
•Generic bounded hump (vs Normal as generic unbounded hump)
Example#2, Problem
Client: “Here’s our plan….”
•Simple spreadsheet calculation– But only the expected value, – but not variability
Example #2, Simulation• Time to 100th
patient
• Patients arrive IID Exponential
Summary Statistics of Simulated values (below)Interpretation: under the assumptions, 90% of simulations required more than 4.4 months
Added VBA Functions
Inverse Functions Needed for Simulation
•Poisson, Negative Binomial
Interpolation from Table
•Interpolate: 1 or 2 dimensional interpolation
Convenience
• Beta with Mean, SD as parameters
• Beta with Hi, Low, and Mode used for parameters (often used for PERT/CPM charts)
•Log Normal with mean, SD as parameters
Missing Statistical Functions
• InvPoisson :: Poisson Distribution
• InvPascal :: Integer valued Negative Binomial – (how many failures before k successes)
Negative Binomial is continuous valued distribution; discrete version is often denoted Pascal distribution
Example#3, Patients to Screen
Expected Enrollment rate = 75% ± 5%
~ Beta Distribution
# Screen Failures
~ Negative Binomial (Pascal)– Depends on Enrollment
Rate
Beta Distribution (2)
For Convenience•Beta distribution given Mean, SD•Beta distribution given Mean, SD, upper, lower bounds•Beta distribution given Mode, Upper, Lower bounds
– Sometimes used for PERT/Critical Path Analysis• 3 estimates for tasks: Optimistic, Pessimistic, Most Likely• Beta distributed time for each task
– Assumes SD = 1/6 of the interval [low, high]
Simulation from a Table
Simulate arbitrary distribution: •Top Row: values in [0,1]•Bottom Row: Quantiles•Result: interpolated value of U from table
Or a function: y=f(x)•X is found in top row, y is interpolated from bottom row
Table Simulation Uses
•Polygonal distributions (like Triangular)
•Survival curve (for time to event)–Est. K-M curve from data, simulate rest of trial
•Arbitrary empirical distributions
•Distribution from observations
Simulation from a 2-dimensional table
Here:•Rows are quartiles of a random function•Left column is value of a parameter•A family of distributions which vary with the parameter
• Parameter y=75% (can be random)• Generate random numbers from the interpolated distribution.
Example #4: Interim Review• After 2 months, review randomization rates
• Continue to Randomize to 100 patients
• How long?
Example#4: Interim Review (Simulation)
Y= # Patients at 2 mos
~ Poisson
Time to Randomize (100-Y) additional pts
~ Erlang (Gamma)
80% CI:; (2.5, 3.7) months
Clinical Trials Applications
• Simulations for planning
• Prototyping larger simulation
• Checking assumptions/validation
Why Simulate?Expected Trial Performance• Usually not of interest -- already done w/o simulation Variability of Trial Performance• Important for Risk Management: “What’s the earliest, the
latest, the most, the least, etc”• 80% CIsStructural Problems• Interactions of parameters may doom the trial before it even
starts! (eg, mean (max{ X, Y} ) vs max{ mean(X), mean(Y) } )
PrototypingPrototyping:• Toy simulation with hands-on teamwork • Development model• Get team buy-in on assumptions• Processing speed not important• Rapid modifications are importantIdeal? • Develop a prototype in an 1 hour meeting• Check for errors later• Run large simulations later for precise estimates
Checking planning assumptions• H0 = Simulation assumptions• Observed: a value X • {xi} = corresponding values in simulation• Rank of X in {xi} ≈ p-valueStored Values: Use Function Percent RankDescriptive Statistics: Use Frequency Count
Use to:• Test assumptions, validate model, +??• If an observed value of X is rare in the simulation, question
assumptions!
Checking Assumptions (2)Example: • A trial is designed based on a non-trivial simulation. • The model predicts a completion rate of 65%
with 95% C.I.= (55%, 75%)• 4 months into the trial, a 50% completion rate is observed. • How significant is this discrepancy?Resimulate:• {xi} = simulated completion rates (1/iteration)• Rank of observed 50% in {xi} ≈ p-value• “How likely is the observation, under the modeled assumptions?”
Example #5: Simulating a 30 patient trial
• Each patient is a random variable• Survival times are interpolated • Estimated survival curves have confidence intervals•All 30 patients in an iteration use the same random conf.level•Conf. Level is updated each iteration
Example #5: Testing Assumptions• Assume the trial was
carried out
• 70% of patients complete
Q: is this consistent with the simulations?
A: Yes, but… Only 6.1% of simulations had >70% completion
Statistics on the patients are the simulation random variables
Macro ManagementVBA Editor:
Alt-F11 (or find the menu)• Some versions of Excel
• Copy Module between sheets
• Copy code from .xls sheet & insert into VBA editor
• Open & save as new sheet
Macro Management (newer)
Later versions:
In Visual Basic
From the
Tool Bar
•File > Export File– Export VBA code
(module: “SweitzerSimulationCoreCode”)
•File > Import File– Imports VBA code (into a module)
Further resources
Commercial and Free software packages
Provide:
•More rigorous algorithms
•More functions – Resampling, multivariate, etc
•More support
Commercial Add-Ins
@RISK
www.palisade.com
Crystal Ball
www.decisioneering.com
Free Add-InsPopTools
www.cse.csiro.au/poptools
SimTools.xla
http://home.uchicago.edu/~rmyerson/addins.htm
Caveat: Licensing
•Free for non-commercial (eg, education)
•Not clear for other uses
Semi-Commercial
Low-cost Excel simulation add-in:
•RiskSim by Michael Middleton
www.treeplan.com/–Currently 1 output var with lots of graphs– also: decision trees, sensitivity analysis– on-line text-book:
http://www.treeplan.com/chapters.htm
Additional ReadingINTRODUCTION TO MODELING AND GENERATING PROBABILISTIC INPUT PROCESSES FOR SIMULATION
www.informs-sim.org/wsc07papers/008.pdf
Spreadsheet Simulation (Seila, 2006)
www.informs-sim.org/wsc06papers/002.pdf
Work Smarter, Not Harder: Guidelines for Designing Simulation Experiments
www.informs-sim.org/wsc06papers/005.pdf
Tips for the Successful Practice of Simulation
www.informs-sim.org/wsc06papers/007.pdf
The End (Actual – not simulated)