Combining DOD Test Information from Disparate Test Events
Mark London
May 12, 2012
NAVAIR Public Release YY-2012-530
Contents
• Introduction
• Preliminary Comments
• Problem Statement
• Proposed Solution: Meta-Analysis
• Test Setup and Data Set
• Data Analysis
• Data Results
• Conclusions
• Summary
• References
Introduction
• Declining DOD budgets require improvements in DOD T&E acquisition processes
• DT&E, IT&E, and OT&E need to provide system performance results in a more efficient manner
• Design of Experiments is useful but doesn't solve all problems
• Methods of combining information from multiple test sources must be developed:
  – Meta-Analysis
  – Bayesian Analysis
  – Bayesian Meta-Analysis (combination of both)
Preliminary Comments
Defense Acquisition University. Test and Evaluation Management Guide. The Defense Acquisition University Press, Ft. Belvoir, VA. 2005.
(Image courtesy of DAU Test and Evaluation Management Guide 2005)
Cost Influence of T&E:
Early detection of system issues can dramatically influence total program expenditures.
Preliminary Comments
SE Design Processes (left side of the “V”): Testers are involved in writing the verification procedures for requirements at each level.
SE Realization Processes (right side of the “V”): Testers verify that products at each level meet their requirements before integration at the next higher level.
(Image adapted from URL source: http://ops.fhwa.dot.gov/publications/seitsguide/section3.htm)
US Dept. of Transportation Federal Highway Administration. Web. URL: http://ops.fhwa.dot.gov/publications/seitsguide/section3.htm
Preliminary Comments
Product Lifecycle Test Phases (Image adapted from Systems Engineering Guide (2011), Mitre Corporation.)
• Multiple test phases
• Test phases provide different sets of test data
• Different sets of data answer different questions about the system
Mitre Corporation. (2011). Systems Engineering Guide: Test and Evaluation. Web. URL: http://www.mitre.org/work/systems_engineering/guide/se_lifecycle_building_blocks/test_evaluation/.
Problem Statement
We need to find ways of combining test data from disparate test events and test phases.
Two “normative” inferential statistical methods are available:
• Meta-Analysis
• Bayesian Estimation
We will focus on Meta-Analysis.
Purpose of Study: Determine the utility of Meta-Analysis for simple analysis of multiple flight test data sets.
Problem Statement
What’s our goal? To integrate multiple test data sets into a (hopefully) more statistically significant set of results.
Proposed Solution: Meta-Analysis
What IS Meta-Analysis?
• Combines results from several studies to address a set of related research hypotheses
• The statistical synthesis of results from a series of studies (Borenstein, 2009)
Where is Meta-Analysis used?
• Health (Sandelowski, 2000), Medicine, Pharmacology, Education, Psychology, Business, Finance, Computer Simulations (Reese, 1996)
• Almost anywhere there is a need to assemble a summary of research studies on a given topic
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. John Wiley & Sons, West Sussex, UK.
Reese, C. S., Wilson, A. G., Hamada, M. S., Martz, H. F., & Ryan, K. J. (1996). Integrated Analysis of Computer and Physical Experiments. Los Alamos National Labs, Report No. LA-UR-00-2915.
Sandelowski, M. (2000). Focus on Research Methods: Combining Qualitative and Quantitative Sampling, Data Collection, and Analysis Techniques in Mixed-Method Studies. Research in Nursing & Health, 23, 246-258.
Proposed Solution: Meta-Analysis
What can Meta-Analysis provide?
• A way to combine results from multiple studies
• A way to broadly cover large numbers of studies/tests
Limitations of Meta-Analysis? (Aguinis et al., 2011)
• Viewed with suspicion in technical fields
• File drawer problem (unpublished or null results are missing from the pool of studies)
• Mixing apples & oranges (combining dissimilar studies)
• Some studies may be ignored
Aguinis, H., Pierce, C. A., Bosco, F. A., Dalton, D. R., & Dalton, C. M. (2011). Debunking Myths and Urban Legends about Meta-Analysis. Organizational Research Methods, 14(2), 306-331.
Proposed Solution: Meta-Analysis
Different “flavors” of Meta-Analysis:
• Fixed Effects Model (assumes one common true effect across all studies)
• Random Effects Model (allows the true effect to vary from study to study)
“Effect sizes” measure the strength of the relationship between variables and are the summary statistic in Meta-Analysis (Shelby, 2008).
Effect sizes may come from different families:
• d-family (e.g., Cohen's d, Hedges' g): compares mean differences
• r-family: compares correlation coefficients
Shelby, L. B. & Vaske, J. J. (2008). Understanding Meta-Analysis: A Review of the Methodological Literature. Leisure Sciences, 30, 96-110.
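The fixed vs. random effects distinction can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used in this study: it assumes each data set supplies an effect size g_i and its variance V_i, uses inverse-variance weights for the fixed effects summary, and for the random effects summary swaps in the DerSimonian-Laird moment estimate of the between-study variance τ² (one common estimator, described in Borenstein et al., 2009). The function names are hypothetical.

```python
def fixed_effects_summary(g, v):
    """Inverse-variance (fixed effects) summary effect M and its variance V_M."""
    w = [1.0 / vi for vi in v]
    m = sum(wi * gi for wi, gi in zip(w, g)) / sum(w)
    return m, 1.0 / sum(w)


def random_effects_summary(g, v):
    """Random effects summary using the DerSimonian-Laird estimate of tau^2."""
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    sum_wg = sum(wi * gi for wi, gi in zip(w, g))
    # Cochran's Q: weighted dispersion of the effects about the fixed effects mean
    q = sum(wi * gi ** 2 for wi, gi in zip(w, g)) - sum_wg ** 2 / sum_w
    df = len(g) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    # Random effects weights add tau^2 to each within-study variance
    w_star = [1.0 / (vi + tau2) for vi in v]
    m = sum(wi * gi for wi, gi in zip(w_star, g)) / sum(w_star)
    return m, 1.0 / sum(w_star), tau2


if __name__ == "__main__":
    # Illustrative inputs only, not the flight test data
    print(fixed_effects_summary([0.1, 0.3], [0.01, 0.01]))
    print(random_effects_summary([0.1, 0.3], [0.01, 0.01]))
```

With equal within-study variances the two models give the same M here (0.2), but because τ² > 0 the random effects variance of M is larger (0.01 vs. 0.005), which widens the confidence interval.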
Proposed Solution: Meta-Analysis
Lipsey, M. W. & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Shelby, L. B. & Vaske, J. J. (2008). Understanding Meta-Analysis: A Review of the Methodological Literature. Leisure Sciences, 30, 96-110.
(Adapted from Shelby & Vaske, 2008 and Lipsey & Wilson, 2001)
Test Setup and Data Set
Laser spot tracking “miss” distance
• Airborne FLIR/Laser system
• Tested at NAS Patuxent River
5 separate data sets were collected.
The “effect” is the impact of the flight environment on the mean laser spot placement.
A Fixed Effects model was used, in accordance with (IAW) Ruzni (2010).
Improved Mobile IR Signature Target System
Ruzni, N., Idris, N., & Saidin, N. (2010). The Effects of the Choice of Meta-Analysis Model in the Overall Estimates for Continuous Data with Missing Standard Deviations. 2nd International Conference on Computer Engineering and Applications, 369-373.
Test Setup and Data Set
Improved Mobile IR Signature Target System (IMISTS)
Table: IMISTS parameters
Parameter                            Value
Pixel Pattern                        16 x 16
Temperature Range                    0.25 °C / ~15 °C above ambient
Target Board Emissivity (3-12 µm)    0.95
EO Camera Measurement Capability     YES
Laser Target Board Capability        YES
Test Setup and Data Set
Ground Test: Measures static laser system boresight error (“Control”)
Flight Test: Measures pointing accuracy under flight conditions (“Experiment”)
(Figures: Flight Test Approach; Video of Laser Spot on IMISTS; Sample Data Points)
Test Setup and Data Set
Each of the 5 ground data sets serves as a “Control” group:
• Ground data simulated in MATLAB
• Radial offset distance simulated as N(0, 0.1)
• Polar angle simulated as U(0, 2π)
• The number of simulated points in each set matched the corresponding number of measured data points
Each of the 5 flight data sets serves as an “Experimental” group to capture the effect of the flight environment.
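The ground-data simulation was done in MATLAB; a rough Python equivalent is sketched below for illustration only. It assumes the second parameter of N(0, 0.1) is the standard deviation (the slide does not say whether it is the SD or the variance) and that each point is also converted to Cartesian coordinates for plotting. The function name `simulate_ground_set` and the seed are hypothetical.

```python
import math
import random


def simulate_ground_set(n_points, sigma=0.1, seed=None):
    """Simulate one ground ("Control") data set of laser spot offsets.

    Radial offset ~ N(0, sigma), with sigma assumed to be the SD;
    polar angle ~ U(0, 2*pi). Returns a list of (r, theta, x, y) tuples.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        r = rng.gauss(0.0, sigma)  # a negative draw lands on the opposite side of the origin
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append((r, theta, r * math.cos(theta), r * math.sin(theta)))
    return points


# Match the number of measured flight points (300 per set in this study)
ground_l1 = simulate_ground_set(300, seed=1)
print(len(ground_l1))  # 300
```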
Data Analysis: Plots of Simulated Ground Test Data
• All simulated data modeled as N(0, 0.1) in radius and U(0, 2π) in polar angle
• Results shown are averaged over 1000 simulations
Data Analysis: Plots of Flight Test Data
• Note the difference in grouping for each separate flight test event
• Most data are contained within 2 radius “units”
Data Analysis
Descriptive Statistics of Simulated Data Sets
Table: Simulated data set statistics (average of 1000 simulations).
Sim Data       L1      L2      L3      L4      L5
Mean, X̄S       0.0038  0.0033  0.0008  0.0028  0.0062
SD, sS         0.0519  0.0525  0.0509  0.0473  0.0499
Samples, nS    300     300     300     300     300

Descriptive Statistics of Flight Data Sets
Table: Flight test data statistics.
Flight Data    L1      L2      L3      L4      L5
Mean, X̄F       0.7038  1.5316  1.7316  0.4875  0.6012
SD, sF         0.3138  0.2770  0.3859  0.3212  0.2506
Samples, nF    300     300     300     300     300
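The simulated-data statistics are averages over 1000 simulations. A minimal sketch of that Monte Carlo averaging follows, assuming the statistics are computed on the simulated radial offsets, that 0.1 is the SD of the radial draw, and that the SD is the sample SD (n - 1 denominator); the slides do not state these details. `average_descriptives` is a hypothetical name.

```python
import random
import statistics


def average_descriptives(n_points=300, n_sims=1000, sigma=0.1, seed=0):
    """Average the sample mean and SD of simulated radial offsets over n_sims runs.

    The polar angle does not affect the radial statistics, so it is not drawn here.
    """
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(n_sims):
        radii = [rng.gauss(0.0, sigma) for _ in range(n_points)]
        means.append(statistics.mean(radii))
        sds.append(statistics.stdev(radii))  # sample SD, n - 1 denominator
    return statistics.mean(means), statistics.mean(sds), n_points


mean_s, sd_s, n_s = average_descriptives()
print(f"Mean {mean_s:.4f}  SD {sd_s:.4f}  n {n_s}")
```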
Data Analysis: Combining Data Sets into a Summary Table
Table: Summary table of all data sets.
              Flight Data                   Simulated Data
Test Event    Mean (X̄F)  SD (sF)  n (nF)    Mean (X̄S)  SD (sS)  n (nS)
L1            0.7038     0.3138   300       0.0038     0.0519   300
L2            1.5316     0.2770   300       0.0033     0.0525   300
L3            1.7316     0.3859   300       0.0008     0.0509   300
L4            0.4875     0.3212   300       0.0028     0.0473   300
L5            0.6012     0.2506   300       0.0062     0.0499   300
Data Analysis
To calculate the Cohen effect size, d, for each data set using the direct calculation method, we use the standard formulas (Borenstein, 2009):

$$d_i = \frac{\bar{X}_{F_i} - \bar{X}_{S_i}}{s_{\text{pooled},i}}, \qquad s_{\text{pooled},i} = \sqrt{\frac{(n_{F_i}-1)\,s_{F_i}^2 + (n_{S_i}-1)\,s_{S_i}^2}{n_{F_i} + n_{S_i} - 2}}$$

The variance, Vd, and standard error, SEd, are given by

$$V_{d_i} = \frac{n_{F_i} + n_{S_i}}{n_{F_i}\,n_{S_i}} + \frac{d_i^2}{2\,(n_{F_i} + n_{S_i})}, \qquad SE_{d_i} = \sqrt{V_{d_i}}$$

where i = data set number (i = 1, 2, …, 5); X̄F, X̄S = sample means; sF, sS = sample SDs; nF = nS = 300 = number of samples in each set.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. John Wiley & Sons, West Sussex, UK.
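The d, Vd and SEd formulas translate directly into code. This is a minimal sketch; the function name `cohen_d` is hypothetical, and the usage line uses illustrative inputs rather than the study's data.

```python
import math


def cohen_d(mean_f, sd_f, n_f, mean_s, sd_s, n_s):
    """Standardized mean difference (flight minus simulated ground) with its
    variance and standard error, following Borenstein et al. (2009)."""
    # Pooled within-group standard deviation
    s_pooled = math.sqrt(((n_f - 1) * sd_f ** 2 + (n_s - 1) * sd_s ** 2)
                         / (n_f + n_s - 2))
    d = (mean_f - mean_s) / s_pooled
    v_d = (n_f + n_s) / (n_f * n_s) + d ** 2 / (2 * (n_f + n_s))
    return d, v_d, math.sqrt(v_d)


# Illustrative inputs: a mean difference of 1 with both SDs equal to 1
print(cohen_d(1.0, 1.0, 300, 0.0, 1.0, 300))
```

For these inputs d = 1 and Vd = 600/90000 + 1/1200 = 0.0075.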
Data Analysis
Our resulting table of Cohen effect sizes:
Table: Calculated Cohen d effect size parameter.
Test Event    d        Vd       SEd
L1            0.0011   0.0008   0.0289
L2            0.0024   0.0008   0.0289
L3            0.0027   0.0008   0.0289
L4            0.0008   0.0008   0.0289
L5            0.0009   0.0008   0.0289
These effect sizes d are VERY small! (d < 0.1)
Data Analysis
However, the Cohen d effect size parameter tends to overestimate the population effect size (the bias is largest in small samples), so we apply the Hedges J correction:

$$J_i = 1 - \frac{3}{4\,df_i - 1}, \qquad df_i = n_{F_i} + n_{S_i} - 2$$

Table: Bias conversion using Hedges J parameter.
Test Event    d        Vd       SEd      J
L1            0.0011   0.0008   0.0289   0.9987
L2            0.0024   0.0008   0.0289   0.9987
L3            0.0027   0.0008   0.0289   0.9987
L4            0.0008   0.0008   0.0289   0.9987
L5            0.0009   0.0008   0.0289   0.9987
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. John Wiley & Sons, West Sussex, UK.
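The J correction depends only on the degrees of freedom. A minimal sketch (the function name `hedges_j` is hypothetical) reproduces the tabulated value for nF = nS = 300:

```python
def hedges_j(n_f, n_s):
    """Small-sample correction factor J = 1 - 3 / (4*df - 1), with df = nF + nS - 2."""
    df = n_f + n_s - 2
    return 1.0 - 3.0 / (4.0 * df - 1.0)


print(round(hedges_j(300, 300), 4))  # 0.9987
```

Because J < 1 for any finite df and approaches 1 as the samples grow, the correction matters little at n = 300 per group.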
Data Analysis
We then calculate the Hedges g effect size parameter:

$$g_i = J_i\,d_i$$

and its variance, Vg, and standard error, SEg:

$$V_{g_i} = J_i^2\,V_{d_i}, \qquad SE_{g_i} = \sqrt{V_{g_i}}$$

Table: Calculation of Hedges g parameter using J conversion.
(d, Vd, SEd: Cohen d effect size parameter; J: Hedges J conversion; g, Vg, SEg: Hedges g, variance, and standard error)
Test Event    d        Vd       SEd      J        g        Vg       SEg
L1            0.0011   0.0008   0.0289   0.9987   0.0011   0.0017   0.0408
L2            0.0024   0.0008   0.0289   0.9987   0.0024   0.0017   0.0408
L3            0.0027   0.0008   0.0289   0.9987   0.0027   0.0017   0.0408
L4            0.0008   0.0008   0.0289   0.9987   0.0008   0.0017   0.0408
L5            0.0009   0.0008   0.0289   0.9987   0.0009   0.0017   0.0408
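A minimal sketch of the d-to-g conversion, applying g = J·d and Vg = J²·Vd; the function name `hedges_g` is hypothetical and the usage inputs are illustrative, not the study's data.

```python
import math


def hedges_g(d, v_d, n_f, n_s):
    """Convert Cohen's d and its variance to Hedges' g, Vg and SEg."""
    df = n_f + n_s - 2
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # Hedges J correction
    g = j * d
    v_g = j ** 2 * v_d
    return g, v_g, math.sqrt(v_g)


# Illustrative inputs: d = 1 with Vd = 0.0075 and 300 samples per group
print(hedges_g(1.0, 0.0075, 300, 300))
```

Since J < 1, g is always slightly smaller in magnitude than d, and Vg slightly smaller than Vd.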
Data Results
For our Fixed Effects model we also need the weight of each data set. The weighting factor, Wi, and the relative weighting factor, Wr, are

$$W_i = \frac{1}{V_{g_i}}, \qquad W_{r,i} = \frac{W_i}{\sum_{i=1}^{5} W_i}$$

The table also lists the product of Wi and the effect size parameter g, and the sums of Wi and Wi*g.

Table: Calculation of Weighting Factors.
Test Event    g        Vg       SEg      Wi          Wr       Wi*g
L1            0.0011   0.0017   0.0408   600.0658    0.2001   0.6571
L2            0.0024   0.0017   0.0408   599.5196    0.1999   1.4432
L3            0.0027   0.0017   0.0408   599.4099    0.1998   1.6010
L4            0.0008   0.0017   0.0408   600.2063    0.2001   0.4548
L5            0.0009   0.0017   0.0408   600.1295    0.2001   0.5654
SUM                                      2999.3312   1.0      4.7215
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. John Wiley & Sons, West Sussex, UK.
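The weighting step is a plain inverse-variance calculation. A minimal sketch follows; the function name `fixed_effect_weights` is hypothetical and the variances in the usage line are illustrative.

```python
def fixed_effect_weights(v_g):
    """Inverse-variance weights W_i = 1/Vg_i and relative weights W_i / sum(W)."""
    w = [1.0 / v for v in v_g]
    total = sum(w)
    return w, [wi / total for wi in w]


# Illustrative variances: halving the variance doubles the weight
weights, relative = fixed_effect_weights([0.01, 0.02, 0.04])
print(weights, relative)
```

The relative weights always sum to 1, so with equal variances (as in the table above) each of the 5 data sets receives a relative weight of about 0.2.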
Data Results
Finally, to compute our summary effect statistics we use the following:

$$M = \frac{\sum_{i=1}^{5} W_i\,g_i}{\sum_{i=1}^{5} W_i}, \qquad V_M = \frac{1}{\sum_{i=1}^{5} W_i}, \qquad SE_M = \sqrt{V_M}$$

and calculate the upper and lower 95% confidence limits as:

$$UL_{95} = M + 1.96\,SE_M, \qquad LL_{95} = M - 1.96\,SE_M$$

This produces the summary effect of the flight vs. simulated data:
Table: Summary effect statistics.
(M = summary effect; VM = summary variance; SEM = summary standard error; UL95, LL95 = upper and lower 95% confidence limits)
M        VM       SEM      UL95     LL95
0.0016   0.0003   0.0183   0.0374   -0.0342
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. John Wiley & Sons, West Sussex, UK.
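A minimal sketch of the summary-effect calculation, assuming the effect sizes g and weights W are already available; `summary_effect` is a hypothetical name and the usage inputs are illustrative.

```python
import math


def summary_effect(g, w, z=1.96):
    """Fixed effects summary M, V_M, SE_M and the 95% upper/lower confidence limits."""
    sum_w = sum(w)
    m = sum(wi * gi for wi, gi in zip(w, g)) / sum_w
    v_m = 1.0 / sum_w
    se_m = math.sqrt(v_m)
    return m, v_m, se_m, m + z * se_m, m - z * se_m


m, v_m, se_m, ul, ll = summary_effect([0.1, 0.3], [100.0, 100.0])
print(m, v_m, se_m, ul, ll)
```

A quick sanity check on any result is that the two limits are symmetric about M, i.e. (UL95 + LL95) / 2 = M.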
Data Results
(Figure: forest plot of the g effect sizes and the summary effect M)
Data Results
However, we also need to check the homogeneity of the data sets using Cochran's Q statistic:

$$Q = \sum_{i=1}^{5} W_i\,g_i^2 - \frac{\left(\sum_{i=1}^{5} W_i\,g_i\right)^2}{\sum_{i=1}^{5} W_i}$$

This produces a value of Q = 3002! The chi-square critical value (α = 0.05, df = 5 - 1 = 4) is 9.488. Since Q exceeds the critical value (3002 > 9.488), we reject the null hypothesis that the variability among the effect sizes is due to sampling error alone.
Homogeneity is NOT confirmed! To continue, we would consider a Meta-Regression analysis.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. John Wiley & Sons, West Sussex, UK.
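The homogeneity test can be sketched as follows. This is an illustration, not the study's code: the function names are hypothetical, and instead of calling a statistics library it hard-codes standard upper 5% chi-square critical values for df = 1 to 5 (including the 9.488 used above for df = 4). The subtraction form of Q is algebraically equal to Σ Wi (gi − M)².

```python
# Upper 5% points of the chi-square distribution (standard table values)
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def cochran_q(g, w):
    """Cochran's Q = sum(W*g^2) - (sum(W*g))^2 / sum(W)."""
    sum_w = sum(w)
    sum_wg = sum(wi * gi for wi, gi in zip(w, g))
    sum_wg2 = sum(wi * gi ** 2 for wi, gi in zip(w, g))
    return sum_wg2 - sum_wg ** 2 / sum_w


def homogeneity_rejected(g, w):
    """Return (Q, df, reject) for the homogeneity test at alpha = 0.05."""
    q = cochran_q(g, w)
    df = len(g) - 1
    if df not in CHI2_CRIT_05:
        raise ValueError("critical value not tabulated for df = %d" % df)
    return q, df, q > CHI2_CRIT_05[df]


# Illustrative inputs only
print(homogeneity_rejected([0.1, 0.3], [100.0, 100.0]))
```

For other degrees of freedom a chi-square quantile function from a statistics library would replace the lookup table.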
Data Results
(Adapted from Shelby & Vaske, 2008 and Lipsey & Wilson, 2001)
Lipsey, M. W. & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Shelby, L. B. & Vaske, J. J. (2008). Understanding Meta-Analysis: A Review of the Methodological Literature. Leisure Sciences, 30, 96-110.
Conclusions
Results of preliminary Meta-Analysis:
1. Very small effect sizes (d, g, M all < 0.1)
   – Flight data do not show a statistically significant difference from ground data
   – Large data sets of the same dimensions
   – Significant overlap between flight and ground data
2. Homogeneity NOT confirmed via Q test
   – Random Effects models are probably more accurate
   – Meta-Regression is probably needed
3. Application to flight test data is problematic
Future work should include:
1. Complete a full Meta-Regression for the Random Effects model
2. Explore analysis of other flight test regimes
3. Compare and contrast with Bayesian methods
Summary
Purpose of Study: Determine the utility of Meta-Analysis for simple analysis of multiple flight test data sets.
Did our study succeed? Not as originally planned!
Additional Observations:
• Need to ensure a sufficient number of data sets
• Meta-Analysis is more complicated than initially thought
• Homogeneity of data sets is of primary importance
• Advanced methods (e.g., Meta-Regression) start to look more like conventional ANOVA or multiple regression
• Application to flight test data is still unclear
References
Aguinis, H., Pierce, C. A., Bosco, F. A., Dalton, D. R., & Dalton, C. M. (2011). Debunking Myths and Urban Legends about Meta-Analysis. Organizational Research Methods, 14(2), 306-331.
Anderson-Cook, C. M. (2009). Opportunities and issues in Multiple Data Type Meta-Analyses. Quality Engineering, 21, 243-253.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. John Wiley & Sons, West Sussex, UK.
Defense Acquisition University. (2005). Test and Evaluation Management Guide. The Defense Acquisition University Press, Ft. Belvoir, VA.
Lipsey, M. W. & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Mitre Corporation. (2011). Systems Engineering Guide: Test and Evaluation. Web. URL: http://www.mitre.org/work/systems_engineering/guide/se_lifecycle_building_blocks/test_evaluation/
Reese, C. S., Wilson, A. G., Hamada, M. S., Martz, H. F., & Ryan, K. J. (1996). Integrated Analysis of Computer and Physical Experiments. Los Alamos National Labs, Report No. LA-UR-00-2915.
Ruzni, N., Idris, N., & Saidin, N. (2010). The Effects of the Choice of Meta-Analysis Model in the Overall Estimates for Continuous Data with Missing Standard Deviations. 2nd International Conference on Computer Engineering and Applications, 369-373.
Sandelowski, M. (2000). Focus on Research Methods: Combining Qualitative and Quantitative Sampling, Data Collection, and Analysis Techniques in Mixed-Method Studies. Research in Nursing & Health, 23, 246-258.
Shelby, L. B. & Vaske, J. J. (2008). Understanding Meta-Analysis: A Review of the Methodological Literature. Leisure Sciences, 30, 96-110.
US Dept. of Transportation Federal Highway Administration. Web. URL: http://ops.fhwa.dot.gov/publications/seitsguide/section3.htm