Combining DOD Test Information from Disparate Test Events

Mark London

May 12, 2012

NAVAIR Public Release YY-2012-530

Contents
• Introduction
• Preliminary Comments
• Problem Statement
• Proposed Solution: Meta-Analysis
• Test Setup and Data Set
• Data Analysis
• Data Results
• Conclusions
• Summary
• References


Introduction

Declining DOD budgets require improvements in DOD T&E acquisition processes
• DT&E, IT&E, and OT&E need to provide system performance results in a more efficient manner
• Design of Experiments is useful but doesn’t solve all problems
• Methods of combining information from multiple test sources must be developed
  – Meta-Analysis
  – Bayesian Analysis
  – Bayesian Meta-Analysis (combination of both)


Preliminary Comments

Defense Acquisition University. Test and Evaluation Management Guide. The Defense Acquisition University Press, Ft. Belvoir, VA. 2005.

(Image courtesy of DAU Test and Evaluation Management Guide 2005)

Cost Influence of T&E:

Early detection of system issues can dramatically influence total program expenditures.



SE Design Processes (left side of the “V”): Testers are involved in writing the verification procedures for requirements at each level.

SE Realization Processes (right side of the “V”): Testers verify that products at each level meet their requirements before integration at the next higher level.

(Image adapted from URL source: http://ops.fhwa.dot.gov/publications/seitsguide/section3.htm)

US Dept. of Transportation Federal Highway Administration, Web. URL: http://ops.fhwa.dot.gov/publications/seitsguide/section3.htm



Product Lifecycle Test Phases

(Image adapted from Systems Engineering Guide (2011), Mitre Corporation.)

• Multiple test phases
• Test phases provide different sets of test data
• Different sets of data answer different questions about the system

Mitre Corporation. (2011). Systems Engineering Guide: Test and Evaluation. Web. URL: http://www.mitre.org/work/systems_engineering/guide/se_lifecycle_building_blocks/test_evaluation/.


Problem Statement

We need to find ways of combining test data from disparate test events and test phases.

Two “normative” inferential statistical methods are available:
• Meta-Analysis
• Bayesian Estimation

We will focus on Meta-Analysis.

Purpose of Study: Determine utility of Meta-Analysis for simple analysis of multiple flight test data sets.


Problem Statement

What’s our goal? To integrate multiple test data sets into (hopefully) a more statistically significant set of data results.


Proposed Solution: Meta-Analysis

What IS Meta-Analysis?
• Combines results from several studies to address a set of related research hypotheses
• The statistical synthesis of results from a series of studies (Borenstein, 2009)

Where is Meta-Analysis used?
• Health (Sandelowski, 2000), Medicine, Pharmacology, Education, Psychology, Business, Finance, Computer Simulations (Reese, 1996)
• Almost anywhere there is a need to assemble a summary of research studies on a given topic

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. John Wiley & Sons, West Sussex, UK.
Reese, C. S., Wilson, A. G., Hamada, M. S., Martz, H. F., & Ryan, K. J. (1996). Integrated Analysis of Computer and Physical Experiments. Los Alamos National Labs, Report No. LA-UR-00-2915.
Sandelowski, M. (2000). Focus on Research Methods: Combining Qualitative and Quantitative Sampling, Data Collection, and Analysis Techniques in Mixed-Method Studies. Research in Nursing & Health, 23, 246-258.


Proposed Solution: Meta-Analysis

What can Meta-Analysis provide?
• A way to combine results from multiple studies
• A way to broadly cover a large number of studies/tests

Limitations of Meta-Analysis? (Aguinis, 2011)
• Viewed with suspicion in technical fields
• File drawer problem
• Mixing apples & oranges
• Some studies may be ignored

Aguinis, H., Pierce, C. A., Bosco, F. A., Dalton, D. R., & Dalton, C. M. (2011). Debunking Myths and Urban Legends about Meta-Analysis. Organizational Research Methods, 14(2), 306-331.


Proposed Solution: Meta-Analysis

Different “flavors” of Meta-Analysis
• Fixed Effects Model
• Random Effects Model

“Effect Sizes” measure the strength of the relationship between variables and are the summary statistic in Meta-Analysis (Shelby, 2008)

Effect sizes may come from different families:
• d-family (e.g., Hedges’ g): compares mean differences
• r-family: compares correlation coefficients

Shelby, L. B. & Vaske, J. J. (2008). Understanding Meta-Analysis: A Review of the Methodological Literature. Leisure Sciences, 30, 96-110.
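For reference, the two model flavors listed above are commonly written as follows. This is a sketch of the usual textbook formulation (see, e.g., Borenstein, 2009); the symbols Y_i (observed effect for data set i), θ, μ, ζ_i, ε_i, V_i and τ² are notation assumed here and do not appear in the slides.

```latex
% Fixed effects: every data set estimates one common true effect \theta;
% the only source of variation is within-set sampling error.
Y_i = \theta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, V_i)

% Random effects: each data set has its own true effect, scattered around
% a mean \mu with between-set variance \tau^2.
Y_i = \mu + \zeta_i + \varepsilon_i, \qquad
\zeta_i \sim N(0, \tau^2), \quad \varepsilon_i \sim N(0, V_i)
```

Under the fixed effects model the weights depend only on V_i; under the random effects model τ² is added to each variance before weighting.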


Proposed Solution: Meta-Analysis

Lipsey, M. W. & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Shelby, L. B. & Vaske, J. J. (2008). Understanding Meta-Analysis: A Review of the Methodological Literature. Leisure Sciences, 30, 96-110.

(Adapted from Shelby & Vaske, 2008 and Lipsey & Wilson, 2001)


Test Setup and Data Set

Laser spot tracking “miss” distance
• Airborne FLIR/Laser system
• Tested at NAS Patuxent River

5 separate data sets collected

“Effect” is the impact of the flight environment on the mean of laser spot placement.

Fixed Effects model IAW (Ruzni, 2010)

Improved Mobile IR Signature Target System

Ruzni, N., Idris, N., & Saidin, N. (2010). The Effects of the Choice of Meta-Analysis Model in the Overall Estimates for Continuous Data with Missing Standard Deviations. 2nd International Conference on Computer Engineering and Applications, 369-373.


Test Setup and Data Set

Improved Mobile IR Signature Target System (IMISTS)

Table: IMISTS parameters

Parameter                            IMISTS
Pixel Pattern                        16 x 16
Temperature Range                    0.25 °C / ~15 °C above ambient
Target Board Emissivity (3-12 µm)    0.95
EO Camera Measurement Capability     YES
Laser Target Board Capability        YES


Test Setup and Data Set

Ground Test: Measures static laser system boresight error (“Control”)

Flight Test: Measures pointing accuracy under flight conditions (“Experiment”)

(Figure panels: Flight Test Approach; Video of Laser Spot on IMISTS; Sample Data Points)


Test Setup and Data Set

Each of the 5 ground data sets is a “Control” group
• Ground data simulated in Matlab
• Data radial offset distance simulated as N(0,0.1)
• Data polar angle simulated as U(0,2π)
• Number of simulated points for each set matched the corresponding number of measured data points

Each of the 5 flight data sets is an “Experimental” group used to consider the effect of the flight environment
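The ground-data simulation described above can be sketched in a few lines. The original work used Matlab; this is a Python stand-in, not the author's script. It assumes the 0.1 in N(0,0.1) is a standard deviation (the slides do not say whether it is a variance), and the function names `simulate_ground_set` and `mean_and_sd` are hypothetical.

```python
import math
import random


def simulate_ground_set(n_points, sigma=0.1, seed=None):
    """Simulate one ground ("Control") data set of laser spot offsets.

    Assumption: sigma is the standard deviation of the radial offset.
    Returns a list of (radius, angle) pairs.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        radius = rng.gauss(0.0, sigma)            # radial offset ~ N(0, sigma)
        angle = rng.uniform(0.0, 2.0 * math.pi)   # polar angle ~ U(0, 2*pi)
        points.append((radius, angle))
    return points


def mean_and_sd(values):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var)


# One simulated set with as many points as a measured set (300 here).
ground = simulate_ground_set(300, seed=1)
radii = [r for r, _ in ground]
mean_r, sd_r = mean_and_sd(radii)
```

Repeating the call with different seeds and averaging the resulting statistics would mirror the "averaged over 1000 simulations" step mentioned in the slides.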


Data Analysis

PLOTS OF SIMULATED GROUND TEST DATA

• All simulated data modeled as N(0,0.1) in radius and U(0,2π) in polar angle.
• Results shown are averaged over 1000 simulations.


Data Analysis

PLOTS OF FLIGHT TEST DATA

• Note the difference in grouping for each separate flight test event
• Most data contained within 2 radius “units”


Data Analysis

Descriptive Statistics of Simulated data sets

Table: Simulated data set statistics for average of 1000 simulations.

Sim Data       L1       L2       L3       L4       L5
Mean, XS       0.0038   0.0033   0.0008   0.0028   0.0062
SD, sS         0.0519   0.0525   0.0509   0.0473   0.0499
Samples, nS    300      300      300      300      300

Descriptive Statistics of Flight data sets

Table: Flight test data statistics

Flight Data    L1       L2       L3       L4       L5
Mean, XF       0.7038   1.5316   1.7316   0.4875   0.6012
SD, sF         0.3138   0.2770   0.3859   0.3212   0.2506
Samples, nF    300      300      300      300      300


Data Analysis

Combining data sets into a Summary Table

Table: Summary table of all data sets.

              Flight Data                       Simulated Data
Test Event    Mean (XF)   SD (sF)   n (nF)      Mean (XS)   SD (sS)   n (nS)
L1            0.7038      0.3138    300         0.0038      0.0519    300
L2            1.5316      0.2770    300         0.0033      0.0525    300
L3            1.7316      0.3859    300         0.0008      0.0509    300
L4            0.4875      0.3212    300         0.0028      0.0473    300
L5            0.6012      0.2506    300         0.0062      0.0499    300


Data Analysis

Calculating the Cohen Effect Size, d, using the Direct Calculation Method: for each of the data sets we use the standard formulas (Borenstein, 2009):

$$d_i = \frac{\bar{X}_{F_i} - \bar{X}_{S_i}}{s_{\mathrm{pooled},i}}, \qquad s_{\mathrm{pooled},i} = \sqrt{\frac{(n_{F_i}-1)\,s_{F_i}^2 + (n_{S_i}-1)\,s_{S_i}^2}{n_{F_i}+n_{S_i}-2}}$$

The variance, Vd, and Standard Error, SEd, are given by

$$V_{d_i} = \frac{n_{F_i}+n_{S_i}}{n_{F_i}\,n_{S_i}} + \frac{d_i^2}{2\,(n_{F_i}+n_{S_i})}, \qquad SE_{d_i} = \sqrt{V_{d_i}}$$

where: i = data set number (i=1,2,…,5); XF, XS = sample means; sF, sS = sample SDs; nF, nS = 300 = # samples for each set.

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. John Wiley & Sons, West Sussex, UK.
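As a minimal sketch of the direct calculation, the function below evaluates Cohen's d, its variance and its standard error from summary statistics, using the standard two-group formulas from Borenstein (2009). The function name `cohen_d` and the input values are hypothetical and chosen only so the arithmetic is easy to follow; they are not the flight test data.

```python
import math


def cohen_d(mean_f, sd_f, n_f, mean_s, sd_s, n_s):
    """Return (d, V_d, SE_d) for one flight/simulated pair of data sets."""
    # Pooled standard deviation of the two groups.
    pooled_var = ((n_f - 1) * sd_f ** 2 + (n_s - 1) * sd_s ** 2) / (n_f + n_s - 2)
    s_pooled = math.sqrt(pooled_var)

    # Standardized mean difference.
    d = (mean_f - mean_s) / s_pooled

    # Variance of d: sample-size term plus a term that grows with d.
    v_d = (n_f + n_s) / (n_f * n_s) + d ** 2 / (2 * (n_f + n_s))
    return d, v_d, math.sqrt(v_d)


# Hypothetical inputs: equal SDs of 2.0, so s_pooled = 2.0 and d = 0.5.
d, v_d, se_d = cohen_d(mean_f=1.0, sd_f=2.0, n_f=50,
                       mean_s=0.0, sd_s=2.0, n_s=50)
```

With these inputs d = 0.5 and V_d = 100/2500 + 0.25/200 = 0.04125. Note that for equal group sizes the first term alone is 2/n, so the variance has a floor set by sample size even when d is near zero.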


Data Analysis

Our resulting table of Cohen effect sizes becomes:

Table: Calculated Cohen d effect size parameter.

              Cohen d effect size parameter
Test Event    d        Vd       SEd
L1            0.0011   0.0008   0.0289
L2            0.0024   0.0008   0.0289
L3            0.0027   0.0008   0.0289
L4            0.0008   0.0008   0.0289
L5            0.0009   0.0008   0.0289

These Effect Sizes d are VERY Small! (d < 0.1)


Data Analysis

But the Cohen d effect size parameter tends to overestimate our effect size, so we apply the Hedges J conversion using:

$$J_i = 1 - \frac{3}{4\,df_i - 1}, \qquad df_i = n_{F_i} + n_{S_i} - 2$$

Table: Bias conversion using Hedges J parameter.

              Cohen d effect size parameter    Hedges J Conversion
Test Event    d        Vd       SEd            J
L1            0.0011   0.0008   0.0289         0.9987
L2            0.0024   0.0008   0.0289         0.9987
L3            0.0027   0.0008   0.0289         0.9987
L4            0.0008   0.0008   0.0289         0.9987
L5            0.0009   0.0008   0.0289         0.9987

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. John Wiley & Sons, West Sussex, UK.
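As a quick check of the J column, the correction factor can be worked by hand with nF = nS = 300, the sample size stated for every data set:

```latex
df = 300 + 300 - 2 = 598, \qquad
J = 1 - \frac{3}{4(598) - 1} = 1 - \frac{3}{2391} \approx 0.9987
```

Because every data set has the same sample sizes, J is identical for all five test events, and with samples this large the correction changes d by only about 0.1%.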


Data Analysis

So we calculate our Hedges g effect size parameter using the following formula:

$$g_i = J_i\,d_i$$

And the variance, Vg, and Standard Error, SEg, are given by

$$V_{g_i} = J_i^2\,V_{d_i}, \qquad SE_{g_i} = \sqrt{V_{g_i}}$$

Table: Calculation of Hedges g parameter using J conversion.

              Cohen d effect size parameter    Hedges J      Hedges    Hedges      Hedges
                                               Conversion    g         Variance    Standard Error
Test Event    d        Vd       SEd            J             g         Vg          SEg
L1            0.0011   0.0008   0.0289         0.9987        0.0011    0.0017      0.0408
L2            0.0024   0.0008   0.0289         0.9987        0.0024    0.0017      0.0408
L3            0.0027   0.0008   0.0289         0.9987        0.0027    0.0017      0.0408
L4            0.0008   0.0008   0.0289         0.9987        0.0008    0.0017      0.0408
L5            0.0009   0.0008   0.0289         0.9987        0.0009    0.0017      0.0408
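The conversion from d to g can be sketched as a small helper. The name `hedges_g` and the example inputs are hypothetical, and the sketch assumes the same J, g and Vg definitions given above.

```python
import math


def hedges_g(d, v_d, n_f, n_s):
    """Apply the Hedges small-sample correction to Cohen's d.

    Returns (J, g, V_g, SE_g).
    """
    df = n_f + n_s - 2
    j = 1.0 - 3.0 / (4.0 * df - 1.0)   # correction factor, always below 1
    g = j * d                          # bias-corrected effect size
    v_g = j ** 2 * v_d                 # variance scales with J squared
    return j, g, v_g, math.sqrt(v_g)


# Hypothetical example: d = 0.5 and V_d = 0.04125 from two groups of 50.
j, g, v_g, se_g = hedges_g(d=0.5, v_d=0.04125, n_f=50, n_s=50)
```

Since J is below 1, both g and Vg come out slightly smaller than d and Vd. For two groups of 300 the same function returns J ≈ 0.9987, in line with the J column of the table.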


Data Results

But for our Fixed Effects model we also need the respective weighting of each data set, using:

The weighting factor, Wi:

$$W_i = \frac{1}{V_{g_i}}$$

The relative weighting factor, Wr:

$$W_{r_i} = \frac{W_i}{\sum_{j=1}^{5} W_j}$$

Product of Wi and Effect Size parameter, g

Sum of Wi and Wi*g

Table: Calculation of Weighting Factors.

              Hedges    Hedges      Hedges            Weighting    Rel. Weighting    Product
              g         Variance    Standard Error    Factor       Factor            W*g
Test Event    g         Vg          SEg               Wi           Wr                W*g
L1            0.0011    0.0017      0.0408            600.0658     0.2001            0.6571
L2            0.0024    0.0017      0.0408            599.5196     0.1999            1.4432
L3            0.0027    0.0017      0.0408            599.4099     0.1998            1.6010
L4            0.0008    0.0017      0.0408            600.2063     0.2001            0.4548
L5            0.0009    0.0017      0.0408            600.1295     0.2001            0.5654
SUM:                                                  2999.3312    1.0               4.7215
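The inverse-variance weighting can be illustrated with a short sketch. The function name `fixed_effect_weights` and the three variances are hypothetical, picked so the weights come out as round numbers.

```python
def fixed_effect_weights(variances):
    """Return (weights, relative_weights) for a list of variances V_g.

    Each weight is the inverse of the variance, so more precise data
    sets count for more; relative weights are normalized to sum to 1.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    relative = [w / total for w in weights]
    return weights, relative


# Hypothetical variances: weights are 100, 50 and 25.
w, w_r = fixed_effect_weights([0.01, 0.02, 0.04])
```

Here the relative weights are 4/7, 2/7 and 1/7. When all data sets have nearly equal variances, as in the table above, each of the five relative weights sits close to 0.2.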



Data Results

Finally, to compute our Summary Effect statistics we use the following:

$$M = \frac{\sum_{i=1}^{5} W_i\,g_i}{\sum_{i=1}^{5} W_i}, \qquad V_M = \frac{1}{\sum_{i=1}^{5} W_i}, \qquad SE_M = \sqrt{V_M}$$

And calculate our upper and lower 95% confidence limits as:

$$UL_{95} = M + 1.96\,SE_M, \qquad LL_{95} = M - 1.96\,SE_M$$

Producing the summary effects of the flight vs. simulated data:

Summary    Summary     Summary           Upper     Lower
Effect     Variance    Standard Error    95% CL    95% CL
M          Vm          SEm               UL95      LL95
0.0016     0.0003      0.0183            0.0374    -0.0342
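The summary-effect step can be reproduced from the tabulated g and Wi values with a short sketch. The function name `fixed_effect_summary` is hypothetical; the inputs are the rounded values printed in the weighting table, so results agree with the slides only to the displayed precision.

```python
import math


def fixed_effect_summary(effects, weights, z=1.96):
    """Return (M, V_M, SE_M, UL95, LL95) for a fixed effects model."""
    sum_w = sum(weights)
    sum_wg = sum(w * g for w, g in zip(weights, effects))
    m = sum_wg / sum_w          # weighted mean effect
    v_m = 1.0 / sum_w           # variance of the summary effect
    se_m = math.sqrt(v_m)
    return m, v_m, se_m, m + z * se_m, m - z * se_m


# Hedges g and weighting factors W_i for L1..L5, as tabulated.
g = [0.0011, 0.0024, 0.0027, 0.0008, 0.0009]
w = [600.0658, 599.5196, 599.4099, 600.2063, 600.1295]
m, v_m, se_m, ul95, ll95 = fixed_effect_summary(g, w)
```

With these inputs the sketch gives M ≈ 0.0016 and SE_M ≈ 0.0183, with a 95% interval of roughly (-0.034, 0.037). The interval is symmetric about M by construction and spans zero.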



Data Results

A forest plot of our g effect sizes and Summary M effect produces:


Data Results

But… we need to confirm homogeneity of the data sets using Cochran’s Q statistic:

$$Q = \sum_{i=1}^{5} W_i\,g_i^2 - \frac{\left(\sum_{i=1}^{5} W_i\,g_i\right)^2}{\sum_{i=1}^{5} W_i}$$

Produces a Q value of Q=3002!!

Our Chi-Square Critical Value (p=0.05, df=5-1=4) is: 9.488

Since our Q exceeds the CV (9.488 < 3002), we reject the null hypothesis that our variability is due to sampling error alone.

Homogeneity is NOT confirmed!

To continue we would consider Meta-Regression analysis.
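The homogeneity check can be sketched as follows. The function name `cochran_q` and the effect sizes and weights are hypothetical illustration values, not the flight test results; the critical value 9.488 (p=0.05, df=4) is the one quoted above.

```python
def cochran_q(effects, weights):
    """Cochran's Q: weighted sum of squared deviations from the summary effect.

    Uses the computational form
    sum(W * g^2) - (sum(W * g))^2 / sum(W).
    """
    sum_w = sum(weights)
    sum_wg = sum(w * g for w, g in zip(weights, effects))
    sum_wg2 = sum(w * g ** 2 for w, g in zip(weights, effects))
    return sum_wg2 - sum_wg ** 2 / sum_w


CHI2_CRIT_DF4 = 9.488  # chi-square critical value, p = 0.05, df = 5 - 1 = 4

# Hypothetical effect sizes for five data sets with equal weights.
effects = [0.2, 0.5, 0.9, 0.1, 0.4]
weights = [100.0] * 5
q = cochran_q(effects, weights)

# Homogeneity is rejected when Q exceeds the critical value.
heterogeneous = q > CHI2_CRIT_DF4
```

For these inputs Q = 127 - 210²/500 = 38.8, which exceeds 9.488, so homogeneity would be rejected. Five identical effect sizes would give Q = 0, the fully homogeneous case.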



Data Results

(Adapted from Shelby & Vaske, 2008 and Lipsey & Wilson, 2001)

Lipsey, M. W. & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Shelby, L. B. & Vaske, J. J. (2008). Understanding Meta-Analysis: A Review of the Methodological Literature. Leisure Sciences, 30, 96-110.


Conclusions

Results of preliminary Meta-Analysis:
1. Very small effect sizes (d, g, M all < 0.1)
   – Flight data does not produce a statistically significant difference from Ground data
   – Large data sets of same dimensions
   – Significant overlap between Flight vs. Ground
2. Homogeneity NOT confirmed via Q test
   – Random Effects models probably more accurate
   – Meta-Regression probably needed
3. Application to flight test data problematic

Future Work should include:
1. Complete full Meta-Regression for Random Effects model
2. Explore analysis of other flight test regimes
3. Compare & contrast with Bayesian methods


Summary

Purpose of Study: Determine utility of Meta-Analysis for simple analysis of multiple flight test data sets.

Did our study succeed?—Not as originally planned!

Additional Observations:
• Need to ensure a sufficient number of data sets
• Meta-Analysis is more complicated than initially thought
• Homogeneity of data sets is of primary importance
• Advanced methods (e.g. Meta-Regression) start to look more like conventional ANOVA or Multiple-Regression
• Application to Flight Test Data still unclear


References

Aguinis, H., Pierce, C. A., Bosco, F. A., Dalton, D. R., & Dalton, C. M. (2011). Debunking Myths and Urban Legends about Meta-Analysis. Organizational Research Methods, 14(2), 306-331.
Anderson-Cook, C. M. (2009). Opportunities and Issues in Multiple Data Type Meta-Analyses. Quality Engineering, 21, 243-253.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. John Wiley & Sons, West Sussex, UK.
Defense Acquisition University. Test and Evaluation Management Guide. The Defense Acquisition University Press, Ft. Belvoir, VA. 2005.
Lipsey, M. W. & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Mitre Corporation. (2011). Systems Engineering Guide: Test and Evaluation. Web. URL: http://www.mitre.org/work/systems_engineering/guide/se_lifecycle_building_blocks/test_evaluation/.
Reese, C. S., Wilson, A. G., Hamada, M. S., Martz, H. F., & Ryan, K. J. (1996). Integrated Analysis of Computer and Physical Experiments. Los Alamos National Labs, Report No. LA-UR-00-2915.
Ruzni, N., Idris, N., & Saidin, N. (2010). The Effects of the Choice of Meta-Analysis Model in the Overall Estimates for Continuous Data with Missing Standard Deviations. 2nd International Conference on Computer Engineering and Applications, 369-373.
Sandelowski, M. (2000). Focus on Research Methods: Combining Qualitative and Quantitative Sampling, Data Collection, and Analysis Techniques in Mixed-Method Studies. Research in Nursing & Health, 23, 246-258.
Shelby, L. B. & Vaske, J. J. (2008). Understanding Meta-Analysis: A Review of the Methodological Literature. Leisure Sciences, 30, 96-110.
US Dept. of Transportation Federal Highway Administration, Web. URL: http://ops.fhwa.dot.gov/publications/seitsguide/section3.htm

