1
EXPLORING, DISPLAYING, AND EXAMINING DATA
2
Types of Data Analysis
• Exploratory data analysis
  – The data guide the choice of analysis, or a revision of the planned analysis
• Confirmatory data analysis
  – Closer to classical statistical inference in its use of significance and confidence
  – May use information from a closely related data set, or validate findings by gathering and analyzing new data
3
Techniques to Display and Examine Distributions
• Frequency table
• Visual displays
  – Histograms
  – Stem-and-leaf displays
  – Box-plots
• Crosstabulation of variables
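A stem-and-leaf display can be sketched in a few lines. The example below is only an illustration: it assumes non-negative whole numbers, uses the tens digit as the stem and the units digit as the leaf, and the function name stem_and_leaf and the sample scores are made up for this sketch.

```python
def stem_and_leaf(values):
    """Group whole numbers by tens digit (stem) and list units digits (leaves)."""
    data = sorted(values)
    # Create every stem between the smallest and largest, even if it stays empty.
    stems = {stem: [] for stem in range(data[0] // 10, data[-1] // 10 + 1)}
    for v in data:
        stems[v // 10].append(v % 10)
    return stems


scores = [12, 15, 21, 27, 27, 34, 51]  # hypothetical sample data
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Unlike a histogram, the display keeps the individual values visible, so the original data can be read back from the plot.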
4
Techniques to Display and Examine Distributions
Histograms
• Display all intervals in a distribution, even those without observed values
• Examine the shape of the distribution for skewness, kurtosis, and the modal pattern
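The two points above can be illustrated with a small sketch. It assumes integer data and fixed-width intervals; the helper names histogram_counts and skewness, the interval bounds, and the sample ages are all hypothetical. The skewness shown is the simple moment-based coefficient, one of several definitions in use.

```python
from collections import Counter


def histogram_counts(values, low, high, width):
    """Count values per interval [low, low + width), ..., keeping empty intervals."""
    n_bins = (high - low) // width
    counts = Counter((v - low) // width for v in values if low <= v < high)
    return [(low + i * width, low + (i + 1) * width, counts.get(i, 0))
            for i in range(n_bins)]


def skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


ages = [21, 22, 22, 23, 25, 27, 28, 29, 31, 52, 58]  # hypothetical sample data
bins = histogram_counts(ages, 20, 60, 10)
for lo, hi, count in bins:
    # The 40-49 interval has no observations but is still printed.
    print(f"{lo}-{hi - 1}: {'*' * count}")
print("skewness > 0 (right-skewed):", skewness(ages) > 0)
```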
5
Techniques to Display and Examine Distributions (cont.)
Box-plot (box-and-whisker plot)
• Rectangular box encompasses 50% of the data values
• Edges of the box mark the hinges
• Center line through the width of the box marks the median
• Whiskers extend from the right and left hinges to the largest and smallest values
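The five values a box-plot draws can be computed directly. The sketch below assumes one common convention for hinges (the median of each half of the sorted data, with the overall median included in both halves when the count is odd); other quartile conventions give slightly different edges. The function name five_number_summary is hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (smallest, lower hinge, median, upper hinge, largest)."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # when n is odd, the median belongs to both halves
    lower_hinge = median(data[:half])
    upper_hinge = median(data[n - half:])
    return data[0], lower_hinge, median(data), upper_hinge, data[-1]


summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)  # whisker end, box edge, center line, box edge, whisker end
```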
6
Techniques to Display and Examine Distributions (cont.)
Transformation
• To improve interpretation and compatibility with other data sets
• To enhance symmetry and stabilize spread
• To improve linear relationships between and among variables
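As a rough illustration of how a transformation can enhance symmetry, the sketch below applies a base-10 logarithm to a right-skewed set of values and compares the skewness before and after. The log transform is only one possible choice, and the income figures and the skewness helper are assumptions made for this example.

```python
import math


def skewness(values):
    """Moment-based skewness of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


incomes = [20, 25, 30, 35, 40, 60, 90, 200, 800]  # hypothetical, in thousands
logged = [math.log10(v) for v in incomes]

print(f"skewness before: {skewness(incomes):.2f}")
print(f"skewness after log10: {skewness(logged):.2f}")
```

The transformed values are still skewed to the right, but much less so, and the spread is no longer dominated by the single largest observation.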
7
Improvement & Control Analysis
Statistical process control
• Uses statistical tools to analyze, monitor, and improve process performance
• Total Quality Management
• Control chart
  – Displays sequential measurements of a process together with a center line and control limits
  – Upper control limit
  – Lower control limit
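The slide does not say how the limits are set. A minimal sketch, assuming the common convention of placing the center line at the mean of a baseline sample and the control limits three sample standard deviations above and below it, might look like this. The function names, the multiplier k, and the measurements are all hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower limit, center line, upper limit) from baseline measurements.

    Assumption: limits sit k sample standard deviations from the mean.
    """
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center, center + k * spread


def out_of_control(measurements, lcl, ucl):
    """List (position, value) for sequential measurements outside the limits."""
    return [(i, x) for i, x in enumerate(measurements) if x < lcl or x > ucl]


baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]  # hypothetical data
lcl, center, ucl = control_limits(baseline)
new_points = [10.0, 10.3, 9.7, 10.9, 10.1]
flagged = out_of_control(new_points, lcl, ucl)
print(f"LCL={lcl:.3f} center={center:.3f} UCL={ucl:.3f}")
print("flagged:", flagged)
```

Production X-bar and R charts usually derive their limits from subgroup ranges and tabulated constants rather than from a single standard deviation, so this should be read as a simplification.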
8
Types of Control Charts
Variables data (ratio or interval measurements)
• X-bar charts
• R-charts
• s-charts
Pareto diagrams
• Bar chart whose percentages sum to 100 percent
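The figures behind a Pareto diagram can be sketched as follows: categories are sorted from most to least frequent, and each gets a percentage and a running cumulative percentage that ends at 100. The function name pareto_table and the defect counts are made up for this illustration.

```python
def pareto_table(counts):
    """Return rows of (category, count, percent, cumulative percent), largest first."""
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        percent = 100.0 * count / total
        cumulative += percent
        rows.append((category, count, round(percent, 1), round(cumulative, 1)))
    return rows


defects = {"dent": 30, "scratch": 50, "other": 5, "misalignment": 15}  # hypothetical
rows = pareto_table(defects)
for category, count, percent, cumulative in rows:
    print(f"{category:<14}{count:>4}{percent:>7.1f}%{cumulative:>7.1f}%")
```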
9
Geographic Information Systems
Systems of hardware, software, and procedures that capture, store, manipulate, integrate, and display spatially-referenced data
10
Geographic Information Systems
A minimum of four components
• Integrating information from various sources
• Capturing data
• Projection and restructuring
• Modeling
11
Crosstabulation
A technique for comparing two classification variables
– Cells
– Marginals
– Contingency tables
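To make the terms concrete, the sketch below builds a small contingency table from paired observations: the cells hold the joint counts, and the marginals are the row and column totals. The crosstab helper and the survey responses are hypothetical.

```python
from collections import Counter


def crosstab(pairs):
    """Return (cells, row marginals, column marginals) for (row, column) pairs."""
    cells = Counter(pairs)
    row_totals = Counter(r for r, _ in pairs)
    col_totals = Counter(c for _, c in pairs)
    return cells, row_totals, col_totals


responses = [  # hypothetical (gender, answer) observations
    ("male", "yes"), ("male", "no"), ("female", "yes"),
    ("female", "yes"), ("male", "yes"), ("female", "no"),
]
cells, row_totals, col_totals = crosstab(responses)

for r in sorted(row_totals):
    counts = "  ".join(f"{c}={cells[(r, c)]}" for c in sorted(col_totals))
    print(f"{r:<8}{counts}  total={row_totals[r]}")
print("column totals:", dict(col_totals))
```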
12
Percentaging Errors
• Averaging percentages without weighting
• Using too-large percentages (>100%)
• Using percentages with a very small sample
• Citing a percentage decrease exceeding 100 percent
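The first error is easy to show with numbers. In the sketch below, two groups of very different size report different percentages; the simple average of the two percentages is far from the figure obtained by weighting each by its group size. The group data and function names are invented for the example.

```python
def unweighted_mean_percent(groups):
    """Average the percentages as if every group were the same size."""
    return sum(p for p, _ in groups) / len(groups)


def weighted_mean_percent(groups):
    """Average the percentages weighted by each group's sample size."""
    total_n = sum(n for _, n in groups)
    return sum(p * n for p, n in groups) / total_n


# Hypothetical (percent answering yes, sample size) for two groups.
groups = [(50.0, 10), (10.0, 990)]
print(f"unweighted: {unweighted_mean_percent(groups):.1f}%")
print(f"weighted:   {weighted_mean_percent(groups):.1f}%")
```

The small group also illustrates the third error: 50 percent of a sample of 10 is only five cases.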
13
Other Table-based Analysis
Automatic Interaction Detection (AID)
• Sequential partitioning procedure that uses a dependent variable and a set of predictors
• Searches among up to 300 variables for the best single division of data into subsets according to each predictor variable
• Chooses one division approach
• Splits the sample using chi-square tests to create multi-way splits
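One step of such a partitioning procedure can be sketched roughly as below. This is not the AID algorithm itself: it assumes categorical predictors, scores each predictor with a plain Pearson chi-square statistic against the dependent variable, and picks the highest score, without the category merging, significance adjustment, or recursion a real implementation would use. Comparing raw statistics is only reasonable here because both predictors have the same number of categories. All names and records are hypothetical.

```python
from collections import Counter


def chi_square(xs, ys):
    """Pearson chi-square statistic for the contingency table of xs by ys."""
    n = len(xs)
    cells = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat


def best_predictor(records, predictors, target):
    """Score every predictor against the target and return the best with all scores."""
    ys = [rec[target] for rec in records]
    scores = {p: chi_square([rec[p] for rec in records], ys) for p in predictors}
    return max(scores, key=scores.get), scores


records = [  # hypothetical survey records
    {"gender": "m", "region": "north", "buys": "yes"},
    {"gender": "m", "region": "south", "buys": "yes"},
    {"gender": "m", "region": "north", "buys": "yes"},
    {"gender": "m", "region": "south", "buys": "no"},
    {"gender": "f", "region": "north", "buys": "no"},
    {"gender": "f", "region": "south", "buys": "no"},
    {"gender": "f", "region": "north", "buys": "no"},
    {"gender": "f", "region": "south", "buys": "yes"},
]
best, scores = best_predictor(records, ["gender", "region"], "buys")
print("split on:", best, scores)
```

A full procedure would then split the sample on the chosen predictor and repeat the search within each resulting subset.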