

Optimal Data Analysis Copyright 2020 by Optimal Data Analysis, LLC

Vol. 9 (June 27, 2020), 195-206 2155-0182/10/$3.00


What is Novometric Data Analysis?

Paul R. Yarnold, Ph.D. Optimal Data Analysis, LLC

Novometric (Latin: new measure) statistical theory is introduced.

The mathematical modeling methodology called optimal discriminant analysis (ODA) identifies nonparametric statistical models that explicitly maximize the (weighted) accuracy of predicted class assignments for the sample [1-5].

It is crucial to understand that accuracy is not the same thing as explained variation or maximum likelihood. Neither of those centuries-old objective functions motivated development of methods which predict data very well, not even if effect sizes approach their theoretically maximum-possible values, thereby making p-values incomputable due to sparsity issues [6-9].

In contrast, what one seeks using ODA is predictive precision. For example, imagine each individual case (“observation”) in a sample is a member of either class A or class B. Further imagine a statistical model is obtained to predict whether each individual case in the sample is a member of class A or class B. ODA finds the model which makes the most accurate possible predictions into classes A vs. B for the sample, given the data. The most accurate solution that exists for the sample is the optimal solution [1-5].

This paper briefly reviews the etiology and ontogenesis of novometric (Latin: new measurement) statistical theory, the state-of-the-art ODA methodology [5]. Discussion first addresses crucial aspects of ODA classification models and of classification performance indices.

Classification Models and Performance

ODA is used in an effort to accurately predict a class variable, known as a dependent variable in legacy methods. Attributes, called independent variables in legacy methods, are used to predict class variables. Class variables and attributes may be ordered or (multi)categorical [10-12]; any positive number may be used as an observation-level weight (e.g., propensity score [13-15]); models may be static [16,17] or dynamic [18-26]; a priori hypotheses may be exploratory or confirmatory (i.e., two- or one-tailed, respectively); exact p-values and statistical power do not require distributional assumptions, and are easily computed [27-34]; and optimal methods are rapidly and successfully adopted by research faculty and students [35].

Early use of maximum-accuracy predictive methods often addressed engineering objectives. Prevalent then, and still today, the tradition was to maximize the (weighted) proportion of the total sample correctly classified, without considering the class membership status of individual observations [4,36-41]. The associated Percentage Accuracy in Classification (PAC) index ranges between 0% and 100% correct; however, this index is not normed vs. chance.

The focus of modern research, the Effect Strength for Sensitivity (ESS) index of classification accuracy (a function of the model mean sensitivity across classes), is adjusted by prior odds in order to eliminate the effect of chance [4]. ESS=0 is the level of classification accuracy that is expected by chance; ESS=100 is perfect (errorless) classification; and -100<ESS<0 indicates classification accuracy worse than is expected by chance [4,42-47].
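For readers who want the index in symbols, one common way to write ESS is sketched here. It is an assumption drawn from the wider ODA literature rather than a formula stated in this paper, and it uses the per-class sensitivities $s_c$ (in percent) of a model for $C$ classes:

```latex
\bar{s} = \frac{1}{C}\sum_{c=1}^{C} s_c, \qquad
\mathrm{ESS} = 100 \times \frac{\bar{s} - 100/C}{100 - 100/C}
```

With two classes this reduces to $\mathrm{ESS} = 2(\bar{s} - 50)$, so a mean sensitivity of 75% gives ESS=50, and a mean sensitivity of 50% gives the chance value ESS=0.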

Regarding the theoretical meaning of ESS, the findings of original Monte Carlo investigations, in conjunction with advances made in subsequent research, motivate the continuing use of the categorical ordinal ESS descriptors as originally proposed, which have passed the test of time: ESS<25 is a relatively weak effect; ESS<50 is a moderate effect; ESS<75 is a relatively strong effect; ESS<90 is a strong effect; and ESS>90 is a very strong effect [4]. Models with a high ESS are favored in applied translational and precision forecasting applications, owing to their adaptability to the diversity reflected by legions of individual case profiles.

As ESS adjusts classification accuracy to compensate for the role of chance, the distance or D statistic adjusts ESS to compensate for the role of model complexity, operationalized formulaically in terms of the number of sample strata identified by the globally optimal (GO) model [5]. D indicates the additional number of effects (which return the mean ESS) needed to obtain a perfectly accurate model [47-49]. Models with a small D are favored in theoretical applications, because they retain only the empirically strongest findings.
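A compact way to write D is sketched below. It is stated as an assumption taken from the papers cited for the statistic rather than from this text: the perfect score of 100 is divided by the ESS obtained per stratum, and the strata already in the model are subtracted.

```latex
D = \frac{100}{\mathrm{ESS}/\mathrm{Strata}} - \mathrm{Strata}
```

For a hypothetical two-strata model with ESS=50, $D = 100/25 - 2 = 2$: two more effects of the same average size would be needed for perfect classification. This form also reproduces the D point estimates reported in Table 1.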

Novometric Statistical Theory [5,50,51]

Novometric theory postulates the following set of four axioms:

Axiom 1: For a random statistical sample S1 consisting of a class variable, one or more attributes and a weight, corresponding exact discrete 95% confidence intervals obtained for model and for chance do not overlap (i.e., a “statistically significant effect” exists) [52-56].

Axiom 2: In applications involving more than one attribute, the subset of attributes in S1 which yields the globally optimal (GO) model having the lowest D statistic is identified via structural decomposition analysis (SDA), which involves iterative application of ODA in a manner that is conceptually analogous to principal components analysis, but that explicitly maximizes classification accuracy instead of variance [4,35,57-61].

Note: In my experience using ODA software, a primary utility of Axiom 2 (which required four years to authenticate) lies in the analysis of data sets having large N and many attributes bearing a weak- to moderate-strength relationship to the class variable that degrades in leave-one-out (LOO) analysis. When analyzing such data it is not inconceivable to use CPU days in unsuccessful attempts (attributable to researcher fatigue) to identify the initial classification tree analysis (CTA) model. In my personal experience, if I can obtain a descendant family (see Axiom 3) in reasonable time-on-task (an hour or less), then I often forgo Axiom 2.

Axiom 3: The GO model for S1 is found in the descendant family of models that is obtained by applying the minimum denominator selection algorithm (MDSA) to an initially unrestricted enumerated classification tree analysis (CTA) model configured to predict the class variable using only the attribute(s) selected by SDA. The MDSA enables discovery of all CTA models in S1 which vary as a function of complexity and that originate from an unadulterated (initially unrestricted) model [5,13,18].

Note: I learned not to make assumptions when obtaining a descendant family. Suffice it to say, I continue to augment the minimum denominator until the software returns a message stating “no model found”.

Axiom 4: Validity is assessed by using the GO model to classify an independent random sample S2 (if possible), and/or for S1 by cross-generalizability methods such as hold-out, leave-one-out (LOO) one-sample jackknife, K-of-N jackknife, split-half, multi-sample and/or bootstrap methods for static data [4], and using test-retest [62-67], survival (time-to-event) [21,23], little jiffy [68,69], and weighted Markov [70-74] methods with dynamic (repeated-measures) data. These types of analyses enable researchers to identify the limits of cross-generalizability of GO models when they are used to classify independent random samples [4,5].
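The search described in Axiom 3 can be sketched as a simple loop. The code below is a minimal illustration under stated assumptions, not the ODA software: `fit_cta` is a hypothetical callable standing in for a CTA program, `stub_fit_cta` merely replays the three models reported in Table 1, and the form of D used to pick the GO model is assumed from the cited D-statistic papers.

```python
from typing import Callable, List, NamedTuple, Optional


class CTAModel(NamedTuple):
    strata: int          # number of model endpoints
    min_endpoint_n: int  # smallest endpoint sample size (MinD in Table 1)
    ess: float           # model ESS


def d_statistic(ess: float, strata: int) -> float:
    # Assumed form of D: 100 / (ESS per stratum) minus the number of strata.
    return 100.0 / (ess / strata) - strata


def descendant_family(
    fit_cta: Callable[[int], Optional[CTAModel]],
) -> List[CTAModel]:
    """Collect models by raising the minimum denominator until none is found."""
    family: List[CTAModel] = []
    min_denominator = 1  # initially unrestricted model
    while True:
        model = fit_cta(min_denominator)
        if model is None:  # corresponds to the "no model found" message
            break
        if model.min_endpoint_n < min_denominator:
            raise ValueError("fitter ignored the minimum denominator")
        family.append(model)
        # Exclude the current model by requiring every endpoint to be
        # larger than its smallest endpoint.
        min_denominator = model.min_endpoint_n + 1
    return family


def stub_fit_cta(min_denominator: int) -> Optional[CTAModel]:
    # Hypothetical stand-in for CTA software: replays the Table 1 models.
    table_1 = [CTAModel(6, 2, 33.2), CTAModel(5, 63, 33.2), CTAModel(3, 80, 31.9)]
    for model in table_1:
        if model.min_endpoint_n >= min_denominator:
            return model
    return None


family = descendant_family(stub_fit_cta)
go_model = min(family, key=lambda m: d_statistic(m.ess, m.strata))
print([m.strata for m in family], go_model.strata)
```

With the stub, the loop requests minimum denominators of 1, 3, 64 and 81, and stops at the last request because no model satisfies it. Selecting the lowest D is only one part of the decision described in this paper, which also weighs LOO stability and confidence-interval overlap.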

Discussion now considers two simple real-world examples of novometric analysis.

Gender and Cancer Incidence

Imagine evaluating whether the cancer incidence rate (CIR) for all cancer types combined, assessed across numerous independent sites, differs by gender (male vs. female) [50].

If using a legacy methodology such as t-test [75-77] or logistic regression [9,78-80] to compare CIR between genders, one’s conclusion would necessarily be either: (a) the absence of a statistically significant effect indicates the CIR of males and females does not differ; or (b) the presence of a statistically significant effect indicates males have a (higher or lower) CIR vs. females.

In contrast, novometric analysis is not restricted to these two possible analytic solutions.

Data were drawn from the Surveillance, Epidemiology and End Results (SEER) Program, which collects and publishes cancer incidence and survival data to assemble and report estimates of cancer incidence, survival, mortality, other measures of the cancer burden, and patterns of care, in the USA. SEER statistics routinely include information specific to race/ethnic populations and other populations defined by age, gender and geography. Herein, CIR is the number of new cancers of a specific site that occurs in a specified population in one year, expressed as the number of new cancers for every 100,000 population at risk.

Analysis involved N=608 observations. Assuming proportional reduction in sample size across successive parses: a single binary parse creates two strata each having N=304 observations; two binary parses create four strata each having N=152 observations; and three binary parses create eight strata each having N=76 observations. This sample size is more than sufficient [34] to achieve greater than 90% power to identify a moderate effect with p<0.05.

Table 1 summarizes findings of novometric analysis using gender to parse CIR data. The initial (six-strata) CTA model that emerged had moderate ESS: its point estimate (33.2) and the upper (41.2) and lower (25.4) bounds of the 95% CI for model-based ESS all indicate a moderate effect. The 95% CIs for chance-based ESS lie substantially beneath corresponding 95% CIs for model-based ESS. Note that since no two-strata model comparing all males vs. all females exists in the descendant family, such a comparison is statistically unmotivated and exploratory.

Table 1: Parsing Cancer Incidence by Gender: All Sites Combined

Strata   MinD   ESS          Efficiency   D
-----------------------------------------------------------------
6        2      33.2         5.54         12.1
                25.4-41.2    4.22-6.87    8.6-17.6
                0.33-7.57    0.06-1.26    73.3-1812

5        63     33.2         6.64         10.1
                25.3-41.2    5.05-8.23    7.1-14.8
                0.33-6.91    0.07-1.38    67.4-1510

3        80     31.9         10.6         6.4
                22.8-40.6    7.60-13.5    4.4-10.2
                0.33-7.57    0.11-2.52    36.6-906
-----------------------------------------------------------------

Note: Strata is the number of CTA model endpoints. MinD (for minimum denominator) is the smallest sample size for any stratum. Efficiency, defined as ESS/Strata, is a normed index of relative strength of the class variable(s) used in identifying sample strata. Results for every step of MDSA analysis are tabled. For each model the first line is a point estimate; the second line gives a 95% confidence interval (CI) for the discrete distribution obtained for the model by bootstrap analysis; and the third line is a 95% CI for chance obtained using Monte Carlo analysis with 100,000 iterations.
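The two kinds of interval in Table 1 can be illustrated with a short simulation. What follows is a sketch under explicit assumptions, not the procedure used to produce the table: the data are synthetic, the model interval is a plain percentile bootstrap of case pairs, and chance is simulated by permuting class labels against fixed predictions (the published procedure may differ, for example by refitting the model in each iteration). The ESS formula is the two-or-more-class form assumed from the ODA literature.

```python
import random
from typing import List, Optional, Sequence, Tuple


def ess(actual: Sequence[int], predicted: Sequence[int],
        classes: Sequence[int]) -> Optional[float]:
    """ESS from mean per-class sensitivity; None if a class has no cases."""
    sensitivities = []
    for c in classes:
        n_c = sum(1 for a in actual if a == c)
        if n_c == 0:
            return None
        hits = sum(1 for a, p in zip(actual, predicted) if a == c and p == c)
        sensitivities.append(100.0 * hits / n_c)
    mean_sens = sum(sensitivities) / len(classes)
    chance = 100.0 / len(classes)
    return 100.0 * (mean_sens - chance) / (100.0 - chance)


def percentile_ci(values: List[float], level: float = 0.95) -> Tuple[float, float]:
    ordered = sorted(values)
    last = len(ordered) - 1
    tail = (1.0 - level) / 2.0
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]


def model_ci(actual, predicted, classes, iterations, rng):
    """Percentile bootstrap: resample (actual, predicted) pairs with replacement."""
    n = len(actual)
    stats: List[float] = []
    while len(stats) < iterations:
        idx = [rng.randrange(n) for _ in range(n)]
        value = ess([actual[i] for i in idx], [predicted[i] for i in idx], classes)
        if value is not None:  # skip resamples that lost a class entirely
            stats.append(value)
    return percentile_ci(stats)


def chance_ci(actual, predicted, classes, iterations, rng):
    """Monte Carlo chance: shuffle class labels against fixed predictions."""
    shuffled = list(actual)
    stats: List[float] = []
    for _ in range(iterations):
        rng.shuffle(shuffled)
        stats.append(ess(shuffled, predicted, classes))
    return percentile_ci(stats)


# Synthetic two-class sample: 80% sensitivity in each class (ESS = 60).
actual = [0] * 100 + [1] * 100
predicted = [0] * 80 + [1] * 20 + [1] * 80 + [0] * 20
rng = random.Random(2020)
point = ess(actual, predicted, [0, 1])
model_interval = model_ci(actual, predicted, [0, 1], 2000, rng)
chance_interval = chance_ci(actual, predicted, [0, 1], 2000, rng)
significant = model_interval[0] > chance_interval[1]
print(point, model_interval, chance_interval, significant)
```

In the sense of Axiom 1, the effect counts as statistically significant when the two intervals do not overlap, which is what the final comparison checks.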


As seen in Table 1, the initial CTA model identified six strata, the smallest of which represented only two observations. The 95% CI for this endpoint (not tabled) included 0%, rendering the CTA model redundant.

In step two of the minimum denominator selection algorithm, a LOO-stable five-strata CTA model was obtained yielding moderate ESS equivalent to that of the initial six-strata model, as assessed by point estimate and by 95% CI overlap. The five-strata model had 6.64/5.54 or 19.9% greater efficiency as assessed by point estimate; however, this difference was not statistically significant because the five- and six-strata 95% CIs for efficiency overlapped. The smallest stratum for this model had N=63 patients.

In step three, the final LOO-stable model in the descendant family identified three patient strata. Based on point estimates, the final model achieved 31.9/33.2 or 96% of the overall accuracy (ESS) of the other models. Examining 95% CIs for model-based ESS reveals that all models had comparable ESS, whereas the three-strata model had greater efficiency than the six-strata model. However, because the lower bound of the 95% CI for ESS of the three-strata model was 22.8 (relatively weak), the ESS for the three-strata model indicates a weak-to-moderate effect.
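The arithmetic behind these comparisons can be retraced from the tabled values. The sketch below is illustrative only: efficiency is ESS/Strata as defined in the note to Table 1, the form of D is assumed from the cited D-statistic papers, and `overlaps` is a hypothetical helper for the interval comparison. Because the tabled ESS values are rounded, recomputed efficiencies can differ from the table in the second decimal.

```python
from typing import Tuple

Interval = Tuple[float, float]


def efficiency(ess: float, strata: int) -> float:
    return ess / strata


def d_statistic(ess: float, strata: int) -> float:
    # Assumed form: effects of average size still needed for ESS = 100.
    return 100.0 / efficiency(ess, strata) - strata


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


# Point estimates of ESS and tabled 95% CIs for efficiency, keyed by strata.
ess_point = {6: 33.2, 5: 33.2, 3: 31.9}
efficiency_ci = {6: (4.22, 6.87), 5: (5.05, 8.23), 3: (7.60, 13.5)}

for strata, ess_value in ess_point.items():
    print(strata,
          round(efficiency(ess_value, strata), 1),
          round(d_statistic(ess_value, strata), 1))

gain = 100.0 * (6.64 / 5.54 - 1.0)   # five- vs. six-strata efficiency
retained = 100.0 * 31.9 / 33.2       # three-strata ESS relative to the others
five_vs_six = overlaps(efficiency_ci[5], efficiency_ci[6])
three_vs_six = overlaps(efficiency_ci[3], efficiency_ci[6])
print(round(gain, 1), round(retained), five_vs_six, three_vs_six)
```

The five- and six-strata efficiency intervals overlap, while the three- and six-strata intervals do not, which matches the verbal conclusions drawn in the text.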

Having achieved comparable strength, and greater parsimony and efficiency vs. other models in the descendant family, the three-strata model is selected as being the GO model of the relationship between gender and all-site cancer incidence. In Figure 1, circles (model nodes) represent attributes (cancer incidence); arrows indicate model branches; rectangles are model endpoints and represent sample strata; numbers adjacent to arrows are the value of the cutpoint for the node; numbers beneath nodes are the exact per-comparison Type I error rate for the parse; the number of observations classified in each endpoint is indicated beneath the endpoint; and the percentage of male observations is given inside the endpoint.

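Figure 1: Three-Strata CTA Model Predicting Cancer Incidence by Gender, All Cancer Sites

[Figure not reproduced. Node attribute: Cancer Incidence, parsed twice (p<0.0001 for each parse). Endpoints: incidence <249.439, 53.45% males, N=275; incidence <2066.0435, 30.83% males, N=253; incidence >2066.0435, 98.75% males, N=80.]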

As seen, the sample stratum having the lowest cancer incidence (<0.25%) was approximately equally represented by males and females, and it comprised 275/608 or 45% of the total sample. In contrast, the sample stratum with the highest cancer incidence (>2.07%) was dominated (98.8%) by males, and comprised 13% of the sample. The sample stratum having an intermediate cancer incidence (0.25%-2.07%) was 69.2% female and comprised 42% of the total sample.

Fear, Specific Recommendation, and Inoculation Shot-Taking Behavior

Classic research tested the a priori hypothesis that fear and specificity of recommendation synergistically influence a person’s decision to have a tetanus inoculation [81]. Data analysis by χ² missed the hypothesized interaction: “…specific plans for action influence behavior while level of fear does not” (p. 27).

For these data, CTA with shot-taking (yes vs. no) used as the class variable, and fear and specific recommendation as attributes, found a relatively strong, cross-generalizable three-strata model (Figure 2) supporting the a priori hypothesis that fear and specific recommendations synergistically predict shot-taking behavior [82]. The confusion matrix for this three-strata model is presented in Table 2.


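Figure 2: Fear and Specific Recommendation Predict Shot-Taking Behavior

[Figure not reproduced. Root node: Fear (p<0.001). Fear = No: 0% took shot, N=90. Fear = Yes: node Specific Recommendation (p<0.014). Specific Recommendation = No: 3.3% took shot, N=30. Specific Recommendation = Yes: 27.6% took shot, N=29.]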

Table 2: Confusion Matrix for CTA Model, Factorial Design, Fear and Specific Recommendation

                              Predicted Behavior
                              No Shot    Shot    Sensitivity
Actual Behavior    No Shot    119        21      85.0%
                   Shot       1          8       88.9%

A statistically reliable, reproducible, and relatively strong effect (ESS=73.9) emerged: as seen, 119 of 140 (85.0%) of the people refusing the shot, and 8 of 9 (88.9%) of the people accepting the shot, were correctly predicted by the model. Consistent with the original hypothesis, fear and specific recommendation interact to affect shot-taking behavior. As the root variable, fear plays the dominant role. When there is no fear, 0% of 90 people took the shot. When there is fear, but no specific recommendation is given, 3.3% of 30 people took the shot. However, when there is fear and specific recommendation, 27.6% of 29 people took the shot.
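The reported accuracy indices can be recomputed directly from Table 2. The following is a small check under one assumption: the two-class ESS is taken to be the mean sensitivity rescaled so that 50% maps to 0 and 100% maps to 100, a form drawn from the ODA literature rather than spelled out in this paper. PAC is included only for contrast.

```python
from typing import List

# Rows are actual behavior (no shot, shot); columns are predicted behavior.
confusion = [[119, 21],
             [1, 8]]


def sensitivities(matrix: List[List[int]]) -> List[float]:
    """Percentage of each actual class that the model classified correctly."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def ess(matrix: List[List[int]]) -> float:
    """Mean sensitivity normed against chance (assumed ESS definition)."""
    n_classes = len(matrix)
    mean_sens = sum(sensitivities(matrix)) / n_classes
    chance = 100.0 / n_classes
    return 100.0 * (mean_sens - chance) / (100.0 - chance)


def pac(matrix: List[List[int]]) -> float:
    """Overall percentage of the sample classified correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


print([round(s, 1) for s in sensitivities(confusion)])  # 85.0 and 88.9
print(round(ess(confusion), 1))                         # 73.9
print(round(pac(confusion), 1))                         # 85.2
```

PAC is dominated here by the 140 people who refused the shot, whereas ESS weights the 9 people who accepted it equally with the refusers.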

Coda

In the event that this brief introduction leaves one wanting to know more, I recommend a few papers discussing modeling ordered class variables [83-85], matrix display of findings [86,87], and generating novometric confidence intervals using R software [88]. In addition, articles are available which contrast novometric analysis with legacy methods such as ANOVA [89-93], Friedman test [94], log-linear analysis [95-99], logit analysis [100], logistic regression [101], Markov analysis [102,103], regression/correlation [104-109], polychoric correlation [110,111], Probit analysis [112], recursive causal analysis [113], sign test [114], stepwise analysis [115], temporal analysis [116-119], and Yule’s Q [120].

References

[1] Yarnold PR (2014). “A statistical guide for the ethically perplexed” (Chapter 4, Panter & Sterba, Handbook of Ethics in Quantitative Methodology, Routledge, 2011): Clarifying disorientation regarding the etiology and meaning of the term Optimal as used in the Optimal Data Analysis (ODA) paradigm. Optimal Data Analysis, 3, 30-31.

[2] Yarnold PR, Soltysik RC (2010). Optimal data analysis: A general statistical analysis paradigm. Optimal Data Analysis, 1, 10-22.

[3] Yarnold PR (2017). What is optimal data analysis? Optimal Data Analysis, 6, 26-42.

[4] Yarnold PR, Soltysik RC (2005). Optimal data analysis: Guidebook with software for Windows. Washington, D.C.: APA Books.

[5] Yarnold PR, Soltysik RC (2016). Maximizing predictive accuracy. Chicago, IL: ODA Books. DOI: 10.13140/RG.2.1.1368.3286


[6] Grimm LG, Yarnold PR [Eds.] (1995). Reading and understanding multivariate statistics. Washington, D.C.: APA Books.

[7] Grimm LG, Yarnold PR [Eds.] (2000). Reading and understanding more multivariate statistics. Washington, D.C.: APA Books.

[8] Yarnold PR, Bryant FB, Soltysik RC (2013). Maximizing the accuracy of multiple regression models via UniODA: Regression away from the mean. Optimal Data Analysis, 2, 19-25.

[9] Yarnold PR (2013). Univariate and multivariate analysis of categorical attributes with many response categories. Optimal Data Analysis, 2, 177-190.

[10] Yarnold PR (2018). Minimize usage of binary measurement scales in rigorous classical research. Optimal Data Analysis, 7, 3-9.

[11] Yarnold PR (2010). Aggregated vs. referenced categorical attributes in UniODA and CTA. Optimal Data Analysis, 1, 46-49.

[12] Yarnold PR (2014). “Breaking-up” an ordinal variable can reduce model classification accuracy. Optimal Data Analysis, 3, 19.

[13] Linden A, Yarnold PR (2017). Using classification tree analysis to generate propensity score weights. Journal of Evaluation in Clinical Practice, 23, 703-712. DOI: 10.1111/jep.12744

[14] Yarnold PR, Linden A (2017). Computing propensity score weights for CTA models involving perfectly predicted endpoints. Optimal Data Analysis, 6, 43-46.

[15] Linden A, Yarnold PR (2016). Combining machine learning and propensity score weighting to estimate causal effects in multivalued treatments. Journal of Evaluation in Clinical Practice, 22, 875-885. DOI: 10.1111/jep.12610

[16] Linden A, Yarnold PR (2016). Using data mining techniques to characterize participation in observational studies. Journal of Evaluation in Clinical Practice, 22, 839-847. DOI: 10.1111/jep.12515

[17] Linden A, Yarnold PR (2016). Using machine learning to assess covariate balance in matching studies. Journal of Evaluation in Clinical Practice, 22, 848-854. DOI: 10.1111/jep.12538

[18] Linden A, Yarnold PR (2016). Using machine learning to identify structural breaks in single-group interrupted time series designs. Journal of Evaluation in Clinical Practice, 22, 855-859. DOI: 10.1111/jep.12544

[19] Linden A, Yarnold PR, Nallamothu BK (2016). Using machine learning to model dose-response relationships. Journal of Evaluation in Clinical Practice, 22, 860-867. DOI: 10.1111/jep.12573

[20] Yarnold PR, Linden A (2016). Using machine learning to model dose-response relationships: Eliminating response variable baseline variation by ipsative standardization. Optimal Data Analysis, 5, 41-52.

[21] Linden A, Yarnold PR (2017). Modeling time-to-event (survival) data using classification tree analysis. Journal of Evaluation in Clinical Practice, 23, 1299-1308. DOI: 10.1111/jep.12779

[22] Linden A, Yarnold PR (2018). Identifying causal mechanisms in health care interventions using classification tree analysis. Journal of Evaluation in Clinical Practice, 24, 353-361. DOI: 10.1111/jep.12848


[23] Linden A, Yarnold PR (2018). Estimating causal effects for survival (time-to-event) outcomes by combining classification tree analysis and propensity score weighting. Journal of Evaluation in Clinical Practice, 24, 380-387. DOI: 10.1111/jep.12859

[24] Linden A, Yarnold PR (2018). Using machine learning to evaluate treatment effects in multiple-group interrupted time series analysis. Journal of Evaluation in Clinical Practice, 24, 740-744. DOI: 10.1111/jep.12966

[25] Yarnold PR (2020). Optimal weighted Markov analysis: Modeling weekly RG-score. Optimal Data Analysis, 9, 122-123.

[26] Yarnold PR (2020). Optimal weighted Markov analysis: Predicting serial RG-score. Optimal Data Analysis, 9, 124-127.

[27] Yarnold PR, Soltysik RC (1991). Theoretical distributions of optima for univariate discrimination of random data. Decision Sciences, 22, 739-752.

[28] Soltysik RC, Yarnold PR (1994). Univariable optimal discriminant analysis: One-tailed hypotheses. Educational and Psychological Measurement, 54, 646-653.

[29] Carmony L, Yarnold PR, Naeymi-Rad F (1998). One-tailed Type I error rates for balanced two-category UniODA with a random ordered attribute. Annals of Operations Research, 74, 223-238.

[30] Soltysik RC, Yarnold PR (2010). Automated CTA software: Fundamental concepts and control commands. Optimal Data Analysis, 1, 144-160.

[31] Soltysik RC, Yarnold PR (2013). MegaODA large sample and BIG DATA time trials: Separating the chaff. Optimal Data Analysis, 2, 194-197.

[32] Soltysik RC, Yarnold PR (2013). MegaODA large sample and BIG DATA time trials: Harvesting the Wheat. Optimal Data Analysis, 2, 202-205.

[33] Yarnold PR, Soltysik RC (2013). MegaODA large sample and BIG DATA time trials: Maximum velocity analysis. Optimal Data Analysis, 2, 220-221.

[34] Rhodes NJ (2020). Statistical power analysis in ODA, CTA and novometrics (Invited). Optimal Data Analysis, 9, 21-25.

[35] Bryant FB (2010). The Loyola experience (1993-2009): Optimal data analysis in the Department of Psychology. Optimal Data Analysis, 1, 4-9.

[36] Yarnold PR, Soltysik RC, Martin GJ (1994). Heart rate variability and susceptibility for sudden cardiac death: An example of multivariable optimal discriminant analysis. Statistics in Medicine, 13, 1015-1021.

[37] Soltysik RC, Yarnold PR (1994). The Warmack-Gonzalez algorithm for linear two-category multivariable optimal discriminant analysis. Computers and Operations Research, 21, 735-745.

[38] Yarnold PR, Soltysik RC, McCormick WC, Burns R, Lin EHB, Bush T, Martin GJ (1995). Application of multivariable optimal discriminant analysis in general internal medicine. Journal of General Internal Medicine, 10, 601-606.

[39] Yarnold PR, Soltysik RC, Lefevre F, Martin GJ (1998). Predicting in-hospital mortality of patients receiving cardiopulmonary resuscitation: Unit-weighted MultiODA for binary data. Statistics in Medicine, 17, 2405-2414.


[40] Soltysik RC, Yarnold PR (2010). Two-group MultiODA: Mixed-integer linear programming solution with bounded M. Optimal Data Analysis, 1, 31-37.

[41] Yarnold PR (2016). Maximizing overall percentage accuracy in classification: Discriminating study groups in the National Pressure Ulcer Long-Term Care Study (NPULS). Optimal Data Analysis, 5, 29-30.

[42] Yarnold PR (2018). Visualizing application and summarizing accuracy of ODA models. Optimal Data Analysis, 7, 85-89.

[43] Yarnold PR (2019). Value-added by ODA vs. chi-square. Optimal Data Analysis, 8, 11-14.

[44] Yarnold PR, Brofft GC (2013). Comparing knot strength with UniODA. Optimal Data Analysis, 2, 54-59.

[45] Yarnold PR, Brofft GC (2013). ODA range test vs. one-way analysis of variance: Comparing strength of alternative line connections. Optimal Data Analysis, 2, 198-201.

[46] Yarnold PR (2015). ESS as an index of decision consistency. Optimal Data Analysis, 4, 197.

[47] Yarnold PR (2015). Knowing (ESS) and not knowing (D). Optimal Data Analysis, 4, 198.

[48] Yarnold PR (2015). Distance from a theoretically ideal statistical classification model defined as the number of additional equivalent effects needed to obtain perfect classification for the sample. Optimal Data Analysis, 4, 81-86.

[49] Yarnold PR, Linden A (2016). Theoretical aspects of the D statistic. Optimal Data Analysis, 5, 171-174.

[50] Yarnold PR, Soltysik RC (2014). Globally optimal statistical classification models, I: Binary class variable, one ordered attribute. Optimal Data Analysis, 3, 55-77.

[51] Yarnold PR, Soltysik RC (2014). Globally optimal statistical classification models, II: Unrestricted class variable, two or more attributes. Optimal Data Analysis, 3, 78-84.

[52] Yarnold PR (2020). Reformulating the first axiom of novometric theory: Assessing minimum sample size in experimental design. Optimal Data Analysis, 9, 7-8.

[53] Yarnold PR (2014). Illustrating how 95% confidence intervals indicate model redundancy. Optimal Data Analysis, 3, 96-97.

[54] Yarnold PR, Soltysik RC (2014). Discrete 95% confidence intervals for ODA model- and chance-based classifications. Optimal Data Analysis, 3, 110-112.

[55] Rhodes JN, Yarnold PR (2020). Generating novometric confidence intervals in R: Bootstrap analyses to compare model and chance ESS. Optimal Data Analysis, 9, 172-177.

[56] Rhodes NJ (2020). Assessing reproducibility of novometric bootstrap confidence interval analysis using multiple seed numbers (Invited). Optimal Data Analysis, 9, 190-194.

[57] Yarnold PR (2015). Modeling intended and awarded college degree vis-à-vis UniODA-based structural decomposition. Optimal Data Analysis, 4, 177-178.

[58] Yarnold PR (2015). UniODA-based structural decomposition vs. log-linear model: Statics and dynamics of intergenerational class mobility. Optimal Data Analysis, 4, 179-181.


59Yarnold PR (2015). Modeling religious

mobility by UniODA-based structural

decomposition. Optimal Data Analysis, 4, 192-

193.

60Yarnold PR (2015). UniODA-based structural

decomposition vs. legacy linear models: Statics

and dynamics of intergenerational occupational

mobility. Optimal Data Analysis, 4, 194-196.

61Yarnold PR (2016). GenODA structural

decomposition vs. log-linear model of one-step

Markov transition data: Stability and change in

male geographic mobility in 1944-1951 and

1951-1953. Optimal Data Analysis, 5, 213-215.

62Yarnold PR (2013). Comparing attributes

measured with “identical” Likert-type scales in

single-case designs with UniODA. Optimal

Data Analysis, 2, 148-153.

63Yarnold PR (2013). Comparing responses to

dichotomous attributes in single-case designs.

Optimal Data Analysis, 2, 154-156.

64Yarnold PR (2014). UniODA vs. kappa:

Evaluating the long-term (27-year) test-retest

reliability of the Type A Behavior Pattern.

Optimal Data Analysis, 3, 14-16.

65Yarnold PR (2014). How to assess inter-

observer reliability of ratings made on ordinal

scales: Evaluating and comparing the

Emergency Severity Index (Version 3) and

Canadian Triage Acuity Scale. Optimal Data

Analysis, 3, 42-49.

66Yarnold PR (2015). UniODA vs. sign test:

Comparing repeated ordinal scores. Optimal

Data Analysis, 4, 151-153

67Yarnold PR (2015). UniODA vs. eyeball

analysis: Comparing repeated ordinal scores.

Optimal Data Analysis, 4, 154-155.

68Yarnold PR (2013). Surfing the Index of

Consumer Sentiment: Identifying statistically

significant monthly and yearly changes. Optimal

Data Analysis, 2, 211-216.

69Yarnold PR (2013). Determining when annual

crude mortality rate most recently began

increasing in North Dakota counties, I:

Backward-stepping little jiffy. Optimal Data

Analysis, 2, 217-219.

70Yarnold PR (2018). Initial research using

ODA in Markov process modelling. Optimal

Data Analysis, 7, 90-91.

71Yarnold PR (2019). Maximum-precision

Markov transition table: Successive daily

change in closing price of a utility stock.

Optimal Data Analysis, 8, 3-10.

72Yarnold PR, Soltysik RC (2019). Confirming

the efficacy of weighting in optimal Markov

analysis: Modeling serial symptom ratings.

Optimal Data Analysis, 8, 53-55.

73Yarnold PR (2019). Optimal Markov model

relating two time-lagged outcomes. Optimal

Data Analysis, 8, 61-63.

74Yarnold PR (2020). Modeling an individual’s

weekly change in RG-score via novometric

single-case analysis. Optimal Data Analysis, 9,

114-121.

75Yarnold PR (2014). UniODA vs. t-Test:

Comparing two migraine treatments. Optimal

Data Analysis, 3, 6-8.

76Yarnold PR (2015). UniODA vs. t-Test: Low

brain uptake of L-[11C]5-hydroxytryptophan in

major depression. Optimal Data Analysis, 4,

146-147.

77Yarnold PR (2019). ODA vs. t-test: Lysozyme

levels in the gastric juice of patients with peptic

ulcer vs. normal controls. Optimal Data

Analysis, 8, 112-113.

78Yarnold PR (2014). UniODA vs. logistic

regression analysis: Serum cholesterol and

coronary heart disease and mortality among

middle aged diabetic men. Optimal Data

Analysis, 3, 17-18.

79Yarnold PR (2015). UniODA vs. logistic

regression and Fisher’s linear discriminant

analysis: Modeling 10-year population change.

Optimal Data Analysis, 4, 139-145.

80Linden A, Bryant FB, Yarnold PR (2019).

Logistic discriminant analysis and structural

equation modeling both identify effects in

random data. Optimal Data Analysis, 8, 97-102.

81Leventhal H, Singer R, Jones S (1965). Effects

of fear and specificity of recommendation upon

attitudes and behavior. Journal of Personality

and Social Psychology, 2, 20-29.

82Yarnold PR (2016). CTA vs. not chi-square:

Fear and specific recommendations do

synergistically affect behavior. Optimal Data

Analysis, 5, 108-111.

83Yarnold PR, Linden A (2016). Novometric

analysis with ordered class variables: The

optimal alternative to linear regression analysis.

Optimal Data Analysis, 5, 65-73.

84Yarnold PR, Bennett CL (2016). Novometrics

vs. correlation: Age and clinical measures of

PCP survivors. Optimal Data Analysis, 5, 74-

78.

85Yarnold PR, Bennett CL (2016). Novometrics

vs. multiple regression analysis: Age and

clinical measures of PCP survivors. Optimal

Data Analysis, 5, 79-82.

86Yarnold PR (2016). Matrix display of pairwise

novometric associations for ordered variables.

Optimal Data Analysis, 5, 94-101.

87Yarnold PR, Batra M (2016). Matrix display of

pairwise novometric associations for mixed-

metric variables. Optimal Data Analysis, 5, 104-

107.

88Rhodes JN, Yarnold PR (2020). Generating

novometric confidence intervals in R: Bootstrap

analyses to compare model and chance ESS.

Optimal Data Analysis, 9, 172-177.

89Yarnold PR (2016). Novometrics vs. ODA vs.

One-Way ANOVA: Evaluating comparative

effectiveness of sales training programs, and the

importance of conducting LOO with small

samples. Optimal Data Analysis, 5, 131-132.

90Yarnold PR (2016). Novometric analysis vs.

MANOVA: MMPI codetype, gender, setting,

and the MacAndrew Alcoholism scale. Optimal

Data Analysis, 5, 177-178.

91Yarnold PR (2018). ANOVA with one

between-groups factor vs. novometric analysis.

Optimal Data Analysis, 7, 23-25.

92Yarnold PR (2018). ANOVA with two

between-groups factors vs. novometric analysis.

Optimal Data Analysis, 7, 26-27.

93Yarnold PR (2018). ANOVA with three

between-groups factors vs. novometric analysis.

Optimal Data Analysis, 7, 36-39.

94Yarnold PR (2018). Friedman test vs. ODA vs.

novometry: Rating violin excellence. Optimal

Data Analysis, 7, 16-18.

95Yarnold PR (2016). Novometric analysis

predicting voter turnout: Race, education, and

organizational membership status. Optimal Data

Analysis, 5, 194-197.

96Yarnold PR (2016). Novometric analysis vs.

GenODA vs. log-linear model: Temporal

stability of the association of presidential vote

choice and party identification. Optimal Data

Analysis, 5, 204-207.

97Yarnold PR (2016). Novometric analysis vs.

ODA vs. log-linear model in analysis of a two-

wave panel design: Assessing temporal stability

of Catholic party identification in the 1956-1960

SRC panels. Optimal Data Analysis, 5, 208-212.

98Yarnold PR (2016). Novometric vs. log-linear

model: Intergenerational occupational mobility

of white American men. Optimal Data

Analysis, 5, 218-222.

99Yarnold PR (2016). Novometric vs. log-linear

analysis: Church attendance, age and religion.

Optimal Data Analysis, 5, 229-232.

100Yarnold PR (2016). Novometric vs. logit

analysis: Abortion attitude by religion and time.

Optimal Data Analysis, 5, 225-228.

101Yarnold PR (2016). Novometric models of

smoking habits of male and female friends of

American college undergraduates: Gender,

smoking, and ethnicity. Optimal Data

Analysis, 5, 146-150.

102Yarnold PR (2017). Novometric analysis of

transition matrices to ascertain Markovian order.

Optimal Data Analysis, 6, 5-8.

103Yarnold PR (2017). Novometric comparison

of Markov transition matrices for heterogeneous

populations. Optimal Data Analysis, 6, 9-12.

104Yarnold PR (2016). Novometrics vs.

regression analysis: Literacy, and age and

income, of ambulatory geriatric patients.

Optimal Data Analysis, 5, 83-85.

105Yarnold PR (2016). Novometrics vs.

regression analysis: Modeling patient

satisfaction in the Emergency Room. Optimal

Data Analysis, 5, 86-93.

106Linden A, Yarnold PR (2018). Comparative

accuracy of a diagnostic index modeled using

(optimized) regression vs. novometrics. Optimal

Data Analysis, 7, 66-71.

107Yarnold PR (2019). Multiple regression vs.

novometric analysis of a contingency table.

Optimal Data Analysis, 8, 48-52.

108Yarnold PR (2019). Regression vs.

novometric analysis predicting income based on

education. Optimal Data Analysis, 8, 81-83.

109Yarnold PR (2019). Regression vs.

novometric-based assessment of inter-examiner

reliability. Optimal Data Analysis, 8, 107-111.

110Yarnold PR (2016). Novometric vs. ODA

reliability analysis vs. polychoric correlation

with relaxed distributional assumptions: Inter-

rater reliability of independent ratings of plant

health. Optimal Data Analysis, 5, 179-183.

111Yarnold PR (2016). Novometrics vs.

polychoric correlation: Number of lambs born

over two years. Optimal Data Analysis, 5, 184-

185.

112Yarnold PR (2016). Novometric vs. logit vs.

probit analysis: Using gender and race to predict

if adolescents ever had sexual intercourse.

Optimal Data Analysis, 5, 223-224.

113Yarnold PR (2016). Novometric vs. recursive

causal analysis: The effect of age, education,

and region on support of civil liberties. Optimal

Data Analysis, 5, 200-203.

114Yarnold PR (2016). Using novometrics to

disentangle complete sets of sign-test-based

multiple-comparison findings. Optimal Data

Analysis, 5, 175-176.

115Yarnold PR, Linden A (2019). Novometric

stepwise CTA analysis discriminating three

class categories using two ordered attributes.

Optimal Data Analysis, 8, 68-71.

116Yarnold PR (2017). Novometric comparison

of patient satisfaction with nurse responsiveness

over successive quarters. Optimal Data

Analysis, 6, 13-17.

117Yarnold PR (2017). Novometric pairwise

comparisons in consolidated temporal series.

Optimal Data Analysis, 6, 18-24.

118Linden A, Yarnold PR (2018). The Australian

Gun Buyback Program and rate of suicide by

firearm. Optimal Data Analysis, 7, 28-35.

119Yarnold PR (2020). Modeling an individual’s

weekly change in RG-score via novometric

single-case analysis. Optimal Data Analysis, 9,

114-121.

120Yarnold PR (2016). Novometrics vs. Yule’s

Q: Voter turnout and organizational

membership. Optimal Data Analysis, 5, 198-

199.

Author Notes

No conflicts of interest were reported.