Statistical Simulation: Learning and playing with statistics in R
Prathiba Natesan
Associate Professor
University of North Texas
Statistics
• Extracting scientifically meaningful information from data of all types
• Summarize large amounts of data with a few numbers
  • insight into the process that generated the observed data
• Determining probabilities
  • deductive
  • computing probabilities given a statistic: 𝑝𝑟|𝑠
• Statistical reasoning
  • inductive
  • guessing the best choice for parameters given the data: 𝑠|𝑑𝑎𝑡𝑎
  • how close our guess is to the real population parameters
Probability Distributions
• All possible events and their respective probabilities
• Univariate:
  • Normal
  • 𝑡
  • 𝜒²
  • Skewed normal
  • Uniform
• Multivariate:
  • Multivariate normal
  • Wishart
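As a rough illustration, most of the distributions listed above can be sampled with generators that ship with R. The sketch below is only one way to do it: the sample size, degrees of freedom, and covariance matrix `Sigma` are assumed values chosen for illustration, and the multivariate normal draws are built by hand from a Cholesky factor rather than taken from any particular package. The skewed normal is left out because base R has no generator for it.

```r
set.seed(1)  # arbitrary seed, only so the draws are reproducible

n <- 1000    # number of draws per distribution (assumed value)

# Univariate draws using base R generators
x_norm  <- rnorm(n, mean = 0, sd = 1)   # normal
x_t     <- rt(n, df = 5)                # t with 5 degrees of freedom
x_chisq <- rchisq(n, df = 3)            # chi-square with 3 degrees of freedom
x_unif  <- runif(n, min = 0, max = 1)   # uniform on [0, 1]

# Multivariate normal via the Cholesky factor of an assumed covariance matrix
Sigma <- matrix(c(1.0, 0.5,
                  0.5, 1.0), nrow = 2, byrow = TRUE)
z     <- matrix(rnorm(n * 2), ncol = 2)  # independent standard normals
x_mvn <- z %*% chol(Sigma)               # rows now have covariance close to Sigma

# Wishart draws: each slice of the array is one random 2 x 2 matrix
w <- rWishart(3, df = 4, Sigma = Sigma)

print(round(cov(x_mvn), 2))  # should be near Sigma
```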
Statistical Simulation
• Investigate the performance of statistical estimates under varying conditions
• Usually the generating parameters, distributions, and models are known
• Monte Carlo methods used to generate data
  • rely on repeated random sampling
• Generate draws from a probability distribution
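A minimal sketch of this idea, under assumed settings: data are generated from a normal distribution with a known mean, the sample median is computed in each replication, and its variability is compared across two sample sizes. The sample sizes, the number of replications, and the choice of the median as the estimate are all illustrative assumptions, not taken from the slides.

```r
set.seed(2)  # arbitrary seed for reproducibility

true_mean    <- 0           # known generating parameter
sample_sizes <- c(10, 100)  # the conditions being varied (assumed values)
reps         <- 2000        # number of Monte Carlo replications (assumed)

# For each sample size, draw `reps` samples and record the sample median
spread <- sapply(sample_sizes, function(n) {
  medians <- replicate(reps, median(rnorm(n, mean = true_mean, sd = 1)))
  sd(medians)               # variability of the estimate across replications
})
names(spread) <- paste0("n=", sample_sizes)

print(round(spread, 3))     # the larger sample size should show less spread
```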
Normal Distribution Example in R
Normal_distribution.R
data <- rnorm(n=5, mean = 0, sd = 1)
#you don't have to name the arguments n, mean, and sd
#if you pass them in that order, so you can simply type
data <- rnorm(5, 0, 1)
#let us plot the probability density
plot(density(data))
Normal Distribution Example in R
• What is the mean of this distribution?
• What is the SD?
• How can I get estimates that more accurately reflect the population?
Normal_distribution_2.R
Rewritten as Normal_distribution_2b.R
Skew_Normal.R
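One way to explore these questions (a sketch, not the contents of the scripts named above) is to compute the mean and SD of a small and a large sample drawn from the same standard normal population. The sample sizes 5 and 5000 are assumed for illustration.

```r
set.seed(3)  # arbitrary seed for reproducibility

small <- rnorm(5,    mean = 0, sd = 1)  # same population, small sample
large <- rnorm(5000, mean = 0, sd = 1)  # same population, large sample

# Collect the estimates in one table for comparison with the true values 0 and 1
estimates <- rbind(
  small = c(mean = mean(small), sd = sd(small)),
  large = c(mean = mean(large), sd = sd(large))
)

print(round(estimates, 3))
```

The estimates from the larger sample will typically sit much closer to the population values, which is one answer to the third question.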
Exercise
• Generate two uniform distributions as follows
• Sample 1 ~ unif(-1,1); n = 5
• Sample 2 ~ unif(-100, 100); n = 5000
• Compare the descriptives
• Plot the densities
Uniform_distribution_2c.R
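One possible way to approach the exercise is sketched below. It is not the referenced script; the seed, variable names, and plot titles are assumptions.

```r
set.seed(4)  # arbitrary seed for reproducibility

sample1 <- runif(5,    min = -1,   max = 1)    # Sample 1 ~ unif(-1, 1), n = 5
sample2 <- runif(5000, min = -100, max = 100)  # Sample 2 ~ unif(-100, 100), n = 5000

# Compare the descriptives
print(summary(sample1))
print(sd(sample1))
print(summary(sample2))
print(sd(sample2))

# Plot the two densities side by side
op <- par(mfrow = c(1, 2))
plot(density(sample1), main = "Sample 1: unif(-1, 1), n = 5")
plot(density(sample2), main = "Sample 2: unif(-100, 100), n = 5000")
par(op)  # restore the previous plotting layout
```

For reference, a uniform distribution on (a, b) has mean (a + b)/2 and SD (b - a)/sqrt(12), so the large sample should give values near 0 and about 57.7.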
Autoregression
• Autoregression_example.R
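The slide gives no details, so the following is only a sketch of what simulating autoregressive data can look like, assuming a first-order model y[i] = phi * y[i - 1] + e[i] with standard normal errors. The coefficient `phi`, the series length `n_obs`, and the starting value are all assumptions and may differ from the referenced script.

```r
set.seed(5)   # arbitrary seed for reproducibility

n_obs <- 500  # length of the series (assumed)
phi   <- 0.6  # autoregressive coefficient (assumed; |phi| < 1 keeps it stationary)

e <- rnorm(n_obs)     # white-noise errors
y <- numeric(n_obs)   # container for the simulated series
y[1] <- e[1]          # simple starting value
for (i in 2:n_obs) {
  y[i] <- phi * y[i - 1] + e[i]  # each value depends on the previous one
}

# The lag-1 autocorrelation of the simulated series should be near phi
r1 <- acf(y, lag.max = 1, plot = FALSE)$acf[2]
print(round(r1, 3))
```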
Why Simulation?
• Understand the nuts and bolts of statistical concepts
• Because you already know the true values
• Test the concepts for irregular/idiosyncratic data
• Extend the concepts to newer applications/situations
• Develop new statistical concepts/models
• GREAT teaching tool!
Understanding sampling distribution
• Define sampling distribution
  • Distribution of a statistic (e.g., the mean) across repeated samples of size n
• Sampling distributions contain statistics and not scores
Sampling distribution of the mean: Algorithm
1. Create a population so you know the “true” parameter values {y}
2. Decide on a sample size (or many sample sizes) {n}
3. Draw a sub-sample and compute its mean {sub.sample}
4. Store the mean {averages}
5. Repeat steps 3 and 4
6. {averages} is the sampling distribution of the mean
7. SD of {averages} is the empirical standard error: compare with theoretical se
8. Theoretical se = SD(y)/sqrt(n)
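The steps above can be sketched as follows, reusing the object names given in braces (`y`, `n`, `sub.sample`, `averages`). The population parameters, the sample size, and the number of repetitions `reps` are assumed values, and this is not necessarily how the accompanying script is written.

```r
set.seed(6)  # arbitrary seed for reproducibility

y    <- rnorm(100000, mean = 50, sd = 10)  # step 1: population with known parameters
n    <- 25                                 # step 2: sample size
reps <- 5000                               # number of repetitions for step 5 (assumed)

averages <- numeric(reps)                  # container for the stored means
for (i in seq_len(reps)) {
  sub.sample  <- sample(y, size = n)       # step 3: draw a sub-sample
  averages[i] <- mean(sub.sample)          # steps 3 and 4: compute and store its mean
}

empirical_se   <- sd(averages)             # step 7: SD of the sampling distribution
theoretical_se <- sd(y) / sqrt(n)          # step 8: SD(y) / sqrt(n)

print(round(c(empirical = empirical_se, theoretical = theoretical_se), 3))
```

With these settings the two standard errors should agree closely, and increasing `reps` should generally tighten the agreement.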
Sampling Distribution
• Sampling_Distribution_a.R
But how close is close enough?
Simulation Diagnostics (1/3)
• RMSE: Root mean squared error
  • $\mathrm{RMSE} = \sqrt{\frac{1}{\mathit{rep}} \sum_{i=1}^{\mathit{rep}} (s_i - S)^2}$
• Bias
  • Average bias: cancels out
• Relative bias
• Probabilities (e.g. Natesan et al., under review)
• Bounded vs unbounded
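A small sketch of how these diagnostics could be computed, assuming that s_i is the estimate from replication i and S is the true generating value (the slide does not spell this out). Here the statistic is the sample SD; the true value, sample size, and number of replications are assumed, and relative bias is taken to be average bias divided by the true value, which is one common definition.

```r
set.seed(7)   # arbitrary seed for reproducibility

S    <- 2     # true value of the parameter (here the population SD; assumed)
n    <- 20    # sample size per replication (assumed)
reps <- 1000  # number of replications, "rep" in the formula (assumed)

# s holds one estimate per replication: the sample SD of n normal draws
s <- replicate(reps, sd(rnorm(n, mean = 0, sd = S)))

rmse     <- sqrt(mean((s - S)^2))  # root mean squared error
avg_bias <- mean(s - S)            # over- and under-estimates can cancel here
rel_bias <- avg_bias / S           # bias as a proportion of the true value

print(round(c(RMSE = rmse, bias = avg_bias, relative_bias = rel_bias), 4))
```

Because RMSE squares each error before averaging, it cannot be smaller in magnitude than the average bias.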
Creating a function in R
averages <- function(vec){ #vec is the numeric vector passed to the function
S <- sum(vec) #variables created within the function
L <- length(vec) #do not exist outside the function
A <- S/L
return(A) #Asks the function to output A
} #end of function
Sampling_distribution_b.R
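A short usage sketch, assuming the function takes the input vector as an argument named `vec`. The input values are made up for illustration.

```r
# Function that returns the average of a numeric vector
averages <- function(vec) {
  S <- sum(vec)     # S, L, and A are local to the function
  L <- length(vec)
  A <- S / L
  return(A)         # hand the result back to the caller
}

result <- averages(c(2, 4, 6, 8))
print(result)                                  # 5
print(isTRUE(all.equal(result, mean(c(2, 4, 6, 8)))))  # agrees with the built-in mean()
print(exists("A"))                             # FALSE in a fresh session: A lives only inside the function
```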