quick & simple simulation in excel with clinical trials applications presented to the delaware...

Post on 22-Dec-2015

216 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Quick & SimpleSimulation in Excel

with Clinical Trials ApplicationsPresented to the

Delaware Chapter of the American Statistical Association

20 October 2011

Dennis Sweitzer, Ph.D.www.Dennis-Sweitzer.com   

Background

• Occasional need for simulations• Excel is convenient, but

– does not explicitly support simulations– Simulation usually requires VBA programming

(so why not use R or SAS instead)

– Or Add-in commercial programs (eg., @Risk)– Or some academic add-ins

• Does have iterative calculations, Solver• Why not simulation?

Simulate what?

• Stochastic Models– Unknown parameters? Guestimate a distribution– Optimizing policy? Test each with simulations

• Sensitivity Analysis– Variations in Inputs Variations in Outputs– 2 parameters: use a table– >2 parameters: simulate & compare variation

Excel: ProsCommon Language / Common Tools

• Most people understand Excel

• Many tools available in Excel

Transparency: Modeling assumptions can be:

Specified -- Graphed -- Debated

What you see is what you get!More hands on deck, more eyes on the prize….:

Statistician Team Member

Initial Model Explores & breaks model

Repair & enhance …Repeat until satisfied

MEGO

Excel Cons

Slower than in SAS, S+, R, etc

Lacks some statistical/probability functions

• Latest versions are a little better

• Still need to add some VBA code

• Known bugs in statistical routines (often fixed)

Tradeoffs:

• Quicker modifications vs slower execution

Simple Solution: Data Tables

Excel Data Tables

•Creates a table of values of a function– (ie, Random Variables)

•Leftmost column is used as an argument– (which is ignored in a simulation)

•Data Table repeats calculations for each row– (Each row is an iteration of the simulation)

1. Create Simulation

Create Random Variables using Inverse Probability Method:For Random Variable X with distribution function F(x),

F(x): → [0,1]If Random Uniform U

X = F-1(U) (Excel: U=Rand() )

2. Align Random Variables

• Calculations can be anywhere in Spreadsheet

• Reference the Variables in a row

• Is best to label variables in same way

3. Select Data Table

• Select table region– 1st row is Rand Vars– 1st column is not used

(can label iterations)

• From toolbar:– Data>Data Table

4. Create Simulation Table• Column input cell =

Upper left hand corner of table

• Row input cell = ignore

• OK Populates the table

• (may have to manually recalcule)

5. Execute Simulation

Iterative development

•Simulation can be changed

•Add reporting variables

•Recalculate to rerun– (no need to use Data Table

again, unless expanding)

•Hint: debug with short table, expand for final run

The End (of the key concepts)

But still more….

• Why use inverse probability distributions (instead of random variables)?

• When not to use a spreadsheet for simulation?

• Tools:– Macros to set up a simulation– VBA functions for common simulation distributions

Inverse Probability Function• Most systems directly generate random variables with the

desired distribution

• Why use Inverse Probability Functions? – Which are (probably) slower?

Personal opinion

• Testing & Debugging

• Verification Calculates correctly

• Validation Calculations answer Problem

• Sensitivity Input vs Output variability

As Mapping function

Probability Distribution: F(x): → [0,1]

Random Uniform: UInverse PDF: X = F-1(U)

For Continuous (or monotone) F-1

Small changes in u∈U small changes in F-1 (u)

⟼F-1U

Mapping

2 Random Uniform Var

As input to

Deterministic Function

Mapping

Random numbers in

(should)

Map to outputs in

A Max value looks high.Is it a bug? If not, how often?

Example #1Simple model,

function of 2 RV

Saved random U[0,1]For each iteration

Check u U[0,1]∈That generated high valueu=0.983… random high Rarely happens

Saving {Ui}:

•Verify

•Replicate

•Quantify

Example #1 (Sensitivity)

Sort by U1, U2

Sensitive to U1

Insensitive to U2

Spreadsheet limitations• Only simple data structures are available

– Rows & columns, no lists & trees– Discrete event simulations

• Complex algorithms: difficult– Eg, While or for loops– Can improvise (cumbersome, slow, buggy)

• Speed: slow

• Data Storage: what-you-see-is-all-you-get

Tools: Excel Simulation Template

• Adds some missing random functions

• Adds some set-up macros

Macro SimulateSampler

To start a new simulation when you don't remember the names & parameters of common random variables used in simulation:

•Run the Macro SimulationSample

•Copy, delete, and edit as needed.

•Make sure all random values are referenced in the first row of the data table at the bottom.

Macro SimulationSampler

• Creates a simulation with each of common simulation functions

Macro SimulationSampler………

•Sets up header row for data table

•Sets up a place for statistics

Macro Simulate• Highlight the row of random variables

– (1st row of simulation table)

• Run macro "Simulate”– Prompts for which will ask for the number of simulation

iterations,– The default number of iterations is 100– Debug & develop (manually recalculate)– Final run with >1000 iterations– Visual Basic code is computationally intensive,

Macro Simulate

Note bene

• Run Simulate right after SimulationSampler– Risk of “Ref!” error

• SimTemplate,Plot,Sampler contains – The sampler – A distribution plot of all random variable

• Crude, but handy for quick comparisons

– Ready to edit

SimTemplate,Plot,Sampler• Crude

Distribution plot of ALL variables

• Uses Percentile Ranksto save space

• Good for Continuous Var.• Bad for Discrete Var.

1. Copy

2. Delete unwanted variables

3. Make it pretty

Excel Random Variables

Rand() --Random Uniform [0,1]NormSInv() – Inverse Standard Normal Distribution CriticalBinomial() – Inverse Binomial DistributionLogNormInv() - Inverse Log Normal Distribution

Caveat: parameters are mean, SD after the Log transformation

Erlang Distribution

How long do you wait until you get a predetermined number of arrivals?

•Interarrival times are distributed IID exponential

•Erlang is Gamma with integer parameter

Beta Distribution

Can use as •Distribution of a Binomial probability• Range = [0,1]

•Generic bounded hump (vs Normal as generic unbounded hump)

Example#2, Problem

Client: “Here’s our plan….”

•Simple spreadsheet calculation– But only the expected value, – but not variability

Example #2, Simulation• Time to 100th

patient

• Patients arrive IID Exponential

Summary Statistics of Simulated values (below)Interpretation: under the assumptions, 90% of simulations required more than 4.4 months

Added VBA Functions

Inverse Functions Needed for Simulation

•Poisson, Negative Binomial

Interpolation from Table

•Interpolate: 1 or 2 dimensional interpolation

Convenience

• Beta with Mean, SD as parameters

• Beta with Hi, Low, and Mode used for parameters (often used for PERT/CPM charts)

•Log Normal with mean, SD as parameters

Missing Statistical Functions

• InvPoisson :: Poisson Distribution

• InvPascal :: Integer valued Negative Binomial – (how many failures before k successes)

Negative Binomial is continuous valued distribution; discrete version is often denoted Pascal distribution

Example#3, Patients to Screen

Expected Enrollment rate = 75% ± 5%

~ Beta Distribution

# Screen Failures

~ Negative Binomial (Pascal)– Depends on Enrollment

Rate

Beta Distribution (2)

For Convenience•Beta distribution given Mean, SD•Beta distribution given Mean, SD, upper, lower bounds•Beta distribution given Mode, Upper, Lower bounds

– Sometimes used for PERT/Critical Path Analysis• 3 estimates for tasks: Optimistic, Pessimistic, Most Likely• Beta distributed time for each task

– Assumes SD = 1/6 of the interval [low, high]

Simulation from a Table

Simulate arbitrary distribution: •Top Row: values in [0,1]•Bottom Row: Quantiles•Result: interpolated value of U from table

Or a function: y=f(x)•X is found in top row, y is interpolated from bottom row

Table Simulation Uses

•Polygonal distributions (like Triangular)

•Survival curve (for time to event)–Est. K-M curve from data, simulate rest of trial

•Arbitrary empirical distributions

•Distribution from observations

Simulation from a 2-dimensional table

Here:•Rows are quartiles of a random function•Left column is value of a parameter•A family of distributions which vary with the parameter

• Parameter y=75% (can be random)• Generate random numbers from the interpolated distribution.

Example #4: Interim Review• After 2 months, review randomization rates

• Continue to Randomize to 100 patients

• How long?

Example#4: Interim Review (Simulation)

Y= # Patients at 2 mos

~ Poisson

Time to Randomize (100-Y) additional pts

~ Erlang (Gamma)

80% CI:; (2.5, 3.7) months

Clinical Trials Applications

• Simulations for planning

• Prototyping larger simulation

• Checking assumptions/validation

Why Simulate?Expected Trial Performance• Usually not of interest -- already done w/o simulation Variability of Trial Performance• Important for Risk Management: “What’s the earliest, the

latest, the most, the least, etc”• 80% CIsStructural Problems• Interactions of parameters may doom the trial before it even

starts! (eg, mean (max{ X, Y} ) vs max{ mean(X), mean(Y) } )

PrototypingPrototyping:• Toy simulation with hands-on teamwork • Development model• Get team buy-in on assumptions• Processing speed not important• Rapid modifications are importantIdeal? • Develop a prototype in an 1 hour meeting• Check for errors later• Run large simulations later for precise estimates

Checking planning assumptions• H0 = Simulation assumptions• Observed: a value X • {xi} = corresponding values in simulation• Rank of X in {xi} ≈ p-valueStored Values: Use Function Percent RankDescriptive Statistics: Use Frequency Count

Use to:• Test assumptions, validate model, +??• If an observed value of X is rare in the simulation, question

assumptions!

Checking Assumptions (2)Example: • A trial is designed based on a non-trivial simulation. • The model predicts a completion rate of 65%

with 95% C.I.= (55%, 75%)• 4 months into the trial, a 50% completion rate is observed. • How significant is this discrepancy?Resimulate:• {xi} = simulated completion rates (1/iteration)• Rank of observed 50% in {xi} ≈ p-value• “How likely is the observation, under the modeled assumptions?”

Example #5: Simulating a 30 patient trial

• Each patient is a random variable• Survival times are interpolated • Estimated survival curves have confidence intervals•All 30 patients in an iteration use the same random conf.level•Conf. Level is updated each iteration

Example #5: Testing Assumptions• Assume the trial was

carried out

• 70% of patients complete

Q: is this consistent with the simulations?

A: Yes, but… Only 6.1% of simulations had >70% completion

Statistics on the patients are the simulation random variables

Macro ManagementVBA Editor:

Alt-F11 (or find the menu)• Some versions of Excel

• Copy Module between sheets

• Copy code from .xls sheet & insert into VBA editor

• Open & save as new sheet

Macro Management (newer)

Later versions:

In Visual Basic

From the

Tool Bar

•File > Export File– Export VBA code

(module: “SweitzerSimulationCoreCode”)

•File > Import File– Imports VBA code (into a module)

Further resources

Commercial and Free software packages

Provide:

•More rigorous algorithms

•More functions – Resampling, multivariate, etc

•More support

Commercial Add-Ins

@RISK

www.palisade.com

Crystal Ball

www.decisioneering.com

Free Add-InsPopTools

www.cse.csiro.au/poptools

SimTools.xla

http://home.uchicago.edu/~rmyerson/addins.htm

Caveat: Licensing

•Free for non-commercial (eg, education)

•Not clear for other uses

Semi-Commercial

Low-cost Excel simulation add-in:

•RiskSim by Michael Middleton

www.treeplan.com/–Currently 1 output var with lots of graphs– also: decision trees, sensitivity analysis– on-line text-book:

http://www.treeplan.com/chapters.htm

Additional ReadingINTRODUCTION TO MODELING AND GENERATING PROBABILISTIC INPUT PROCESSES FOR SIMULATION

www.informs-sim.org/wsc07papers/008.pdf

Spreadsheet Simulation (Seila, 2006)

www.informs-sim.org/wsc06papers/002.pdf

Work Smarter, Not Harder: Guidelines for Designing Simulation Experiments

www.informs-sim.org/wsc06papers/005.pdf

Tips for the Successful Practice of Simulation

www.informs-sim.org/wsc06papers/007.pdf

The End (Actual – not simulated)

top related