Token Swap Contingency Tables in Three Dimensions: Paradigm for Biomedical Data Analysis. G. William Moore, MD, PhD. Grover M. Hutchins, MD. Lawrence A. Brown, MD.

Post on 16-Dec-2015

Token Swap Contingency Tables in Three Dimensions: Paradigm for Biomedical Data Analysis.

G. William Moore, MD, PhD.

Grover M. Hutchins, MD.

Lawrence A. Brown, MD.

Disclaimer.

United States Government Work, uncopyrighted, public-domain, DRAFT COPY ONLY. This document does not necessarily represent the views or policies of any United States Government agency. This document is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and non-infringement. In no event shall the authors be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of, or in connection with the document or the use or other dealings made with the document.

Abstract.

Context: Contingency tables are commonly used for organizing frequency data in biomedical databases. Classical statistical methods applied to contingency tables include the chi-square and Fisher exact methods, based upon squared-normal and binomial distributions. In the token swap method, patients, or tokens, in the contingency table are randomly swapped, to determine whether observed data deviate from a preset null hypothesis.

Technology: Perl programming language, theory of statistics.

Design: The simplest contingency table is a rectangular table, consisting of four cells, two rows by two columns, that measures association between row and column variables in a misclassification space. The null hypothesis predicts expected values for each cell; tokens are randomly swapped until they match observed values. More generally, a three-dimensional contingency table has rows, columns, and depths, with the depth dimension representing a variable for ultimate biomedical outcome.

Results: The two- and three-dimensional token swap methods satisfy the Neyman-Pearson condition for power of the alternative hypothesis. Unlike classical methods, the token swap method supports a range of null hypotheses, including those with zero cell totals.

Conclusion: The present model extends the range of existing contingency table analysis to incorporate additional clinicopathologic information, and to explore customized null hypotheses.

Contingency Table.

1. Commonly used for organizing biomedical frequency data.

2. Simplest contingency table: 2×2 table.

3. Rectangular table, 2 rows, 2 columns.

4. Φ: established/old test; Ψ: new test.

5. Tests for statistical association between the variables Φ and Ψ.

Contingency Table: Example.

1. 71 patients autopsied with sickle cell disease.

2. 20 patients with pain crisis, 9 deaths unexplained at autopsy (45%).

3. 51 patients without pain crisis, 4 deaths unexplained at autopsy (7.8%).

4. Φ: established/old test, i.e., death unexplained at autopsy.

5. Ψ: new test, i.e., clinical pain crisis.

Contingency Table: Example.

1. Is there a correlation between pain crisis and death unexplained at autopsy?

2. Chi-square method: χ² = 11.18551587, 1 d.f., p<0.001.

3. Fisher exact method: p=0.000792.

4. Token swap method: p=0.0012452925.

Contingency Table: Terminology.

1. Marginal totals: Φ:False = a+c = x; Φ:True = b+d = y; Ψ:False = a+b = v; Ψ:True = c+d = w.

2. Grand total: (a+b+c+d) = (v+w) = (x+y) = z.

3. Positive diagonal (true negatives, true positives): a, d.

4. Negative diagonal (false positives, false negatives): b, c.
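
The terminology above can be sketched in a few lines of Python. The class name `Table2x2` is hypothetical, and the cell counts are an assumption: they are read off the sickle cell example (20 patients with pain crisis, 9 of them with death unexplained; 51 without pain crisis, 4 with death unexplained), taking Φ as death unexplained at autopsy and Ψ as clinical pain crisis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    """Hypothetical container for a 2x2 table, using the slide's cell letters."""
    a: int  # Φ false, Ψ false
    b: int  # Φ true,  Ψ false
    c: int  # Φ false, Ψ true
    d: int  # Φ true,  Ψ true

    def marginals(self) -> dict:
        """Marginal totals x, y, v, w and grand total z, as named on the slide."""
        return {
            "x": self.a + self.c,  # Φ:False
            "y": self.b + self.d,  # Φ:True
            "v": self.a + self.b,  # Ψ:False
            "w": self.c + self.d,  # Ψ:True
            "z": self.a + self.b + self.c + self.d,  # grand total
        }


# Assumed counts, inferred from the sickle cell example.
observed = Table2x2(a=47, b=4, c=11, d=9)
print(observed.marginals())
```

With these assumed counts the Ψ margins are 51 and 20 and the grand total is 71, in agreement with the example.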

Problems with Classical Methods.

• 1. Chi-square (χ²) method is unreliable if more than 20% of expected cell totals are small (less than 5).

• 2. Both classical methods (chi-square, Fisher exact) assume random sampling and statistical independence.

• 3. Limited freedom to customize null hypothesis.

• 4. No distinction between established test and ultimate followup.

Misclassification Paradigm.

• 1. Classical statistics, cell-frequencies: either entire population, or random sample of population.

• 2. Token swap method, cell frequencies: misclassifications: false negatives, false positives.

• 3. Classical null hypothesis: cross-products of marginal totals, i.e., statistical independence.

• 4. Token swap method: how many swaps to transform the observed into the expected cell-frequencies?

Misclassification Paradigm.

• 1. Classical null hypothesis: statistical independence.

• 2. What if null hypothesis is zero false positives?

• 3. Trade-off Ratio: relative cost of false negatives versus false positives.

• 4. Screening test, e.g., gynecologic cytology: a false negative (losing the patient to followup) is more costly than a false positive (an additional gynecologic cytology).

Token Swap Method.

• 1. Patients (tokens) randomly swapped in contingency table.

• 2. Determine whether observed data deviate from null hypothesis.

• 3. Null hypothesis: need not be statistical independence.

Token Swap Method: Usual Null Hypothesis.

• 1. Upper contingency table: expected table: cross-products of marginal totals:

expected_a = (v×x)/z.
expected_b = (v×y)/z.
expected_c = (w×x)/z.
expected_d = (w×y)/z.

• 2. Five swaps: transform expected table into observed table.

• 3. Each swap: move forward or fall back.

• 4. Token swap, p=0.0012452925.
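
The expected table and the chi-square value reported for this example can be checked with a short Python sketch. It rests on assumptions that the slide does not state: the observed cells (47, 4, 11, 9) are inferred from the sickle cell example, and the expected cells are the cross-products rounded to whole tokens. The function names are hypothetical.

```python
from fractions import Fraction
from math import erfc, sqrt


def expected_table(a, b, c, d):
    """Cross-products of marginal totals, rounded to whole tokens (the rounding is an assumption)."""
    v, w = a + b, c + d  # Ψ:False, Ψ:True
    x, y = a + c, b + d  # Φ:False, Φ:True
    z = v + w            # grand total
    return tuple(round(m * n / z) for m, n in ((v, x), (v, y), (w, x), (w, y)))


def chi_square(observed, expected):
    """Pearson sum of (O - E)^2 / E over the four cells, kept exact with Fraction."""
    return sum(Fraction((o - e) ** 2, e) for o, e in zip(observed, expected))


observed = (47, 4, 11, 9)             # inferred from the example, not given on the slide
expected = expected_table(*observed)  # (42, 9, 16, 4)
chi2 = chi_square(observed, expected)
p = erfc(sqrt(float(chi2) / 2))       # upper tail of chi-square with 1 d.f.
swaps = observed[0] - expected[0]     # forward swaps separating the two tables

print(expected, float(chi2), p, swaps)
```

Under these assumptions the Pearson sum equals 11275/1008, which agrees with the reported χ2 to the digits shown, and the observed table lies five swaps from the expected table.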

Token Swap Method: Customized Null Hypothesis, Trade-off Ratio.

• 1. Upper contingency table: customized expected table.

• 2. Three swaps: transform customized expected table into observed table.

• 3. Each swap: move forward or fall back.

• 4. Token swap, p=0.0075322901118097.
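
Both token swap p-values shown in the live demonstrations can be reproduced by a short calculation, sketched here in Python under explicit assumptions. The cell values are inferred rather than given (observed 47, 4, 11, 9; usual expected 42, 9, 16, 4; customized expected 44, 7, 14, 6). Each forward swap from a table (a, b, c, d) is assumed to occur with probability b×c/(a×d + b×c), after which one token leaves each of b and c and one joins each of a and d. The p-value is then taken as the probability of an unbroken run of forward swaps from the expected to the observed table. This rule is not spelled out on the slides; it is offered only because it matches the reported figures, and the function name is hypothetical.

```python
def straight_path_p(expected, observed):
    """Probability of reaching `observed` from `expected` by forward swaps only (assumed rule)."""
    a, b, c, d = expected
    swaps = observed[0] - a
    p = 1.0
    for _ in range(swaps):
        p *= (b * c) / (a * d + b * c)  # assumed forward-swap probability
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    if (a, b, c, d) != tuple(observed):
        raise ValueError("observed table is not reachable by forward swaps alone")
    return p


observed = (47, 4, 11, 9)                              # inferred from the example
p_usual = straight_path_p((42, 9, 16, 4), observed)    # five swaps
p_custom = straight_path_p((44, 7, 14, 6), observed)   # three swaps
print(p_usual, p_custom)
```

Because each further swap multiplies the result by a factor below one, a null hypothesis that sits closer to the observed table (three swaps instead of five) yields the larger p-value.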

Token Swap Live Demonstration: Usual Null Hypothesis.

• http://www.netautopsy.org/toknusul.htm

• TOKENSWAP, p: 0.00124529250987441

• CHISQUARE, χ2: 11.1855158730159

Token Swap Live Demonstration: Customized Null Hypothesis.

• http://www.netautopsy.org/tokncust.htm

• TOKENSWAP, p: 0.0075322901118097

• CHISQUARE, χ2: 3.63311688311688

Token Swap Distribution.

Neyman-Pearson Condition: Definition.

• The Neyman-Pearson condition states that, for a hypothesis test between two point hypotheses H0: θ=θ0 and H1: θ=θ1, the likelihood-ratio test that rejects H0 in favor of H1 when

$$\Lambda(x) = \frac{L(\theta_0 \mid x)}{L(\theta_1 \mid x)} < \eta, \qquad \text{where } P(\Lambda(X) < \eta \mid H_0) = \alpha,$$

is the most powerful test of size α for threshold η.

• L(θ0|x)/L(θ1|x): likelihood ratio; η: threshold that defines the critical region for the test; α: significance level for Type I (false positive) error; β: Type II (false negative) error.

• Statistical method decreases β-error only by increasing α-error.
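
The trade-off between α and β can be illustrated with a small Python sketch for two point hypotheses about a binomial success probability. Everything specific here is an assumption chosen for illustration: the sample size of 20, the values θ0 = 0.10 and θ1 = 0.45, the thresholds η, and the function names.

```python
from math import comb


def binom_pmf(k, n, theta):
    """Probability of k successes in n trials with success probability theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def likelihood_ratio(k, n, theta0, theta1):
    """Λ(k) = L(θ0 | k) / L(θ1 | k)."""
    return binom_pmf(k, n, theta0) / binom_pmf(k, n, theta1)


def error_rates(n, theta0, theta1, eta):
    """Return (α, β) for the test that rejects H0 when Λ(k) < η."""
    outcomes = range(n + 1)
    reject = {k for k in outcomes if likelihood_ratio(k, n, theta0, theta1) < eta}
    alpha = sum(binom_pmf(k, n, theta0) for k in reject)                            # Type I
    beta = sum(binom_pmf(k, n, theta1) for k in outcomes if k not in reject)        # Type II
    return alpha, beta


for eta in (0.1, 1.0, 10.0):  # hypothetical thresholds
    alpha, beta = error_rates(20, 0.10, 0.45, eta)
    print(f"eta={eta}: alpha={alpha:.5f}, beta={beta:.5f}")
```

Raising η enlarges the rejection region, so α can only grow while β can only shrink, which is the trade-off stated in the last bullet.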

Neyman-Pearson Condition.

Neyman-Pearson Condition: high α, low β

Neyman-Pearson Condition: low α, high β.

Definition 1. Token swap distribution:

• T(a,k<0)=0 at swaps k<0;

• T(a,k=0)=1 and T(≠a,k=0)=0 at swap k=0;

• at swap k>0,

$$T(a+j,\,k) = T(a+j-1,\,k-1)\times\frac{(a+j-1)(d+j-1)}{(a+j-1)(d+j-1)+(c-j+1)(b-j+1)} + T(a+j+1,\,k-1)\times\frac{(a+j+1)(d+j+1)}{(a+j+1)(d+j+1)+(c-j-1)(b-j-1)},$$

where 0×(.../...) = 0.

Theorem 1: Step k, zero tail beyond k.

1a. T(a+j,k) = 0 when j>k;

1b. T(a-j,k) = 0 when j>k.

Proof. 1a. By induction on k. At swap k=0, any j>0 gives T(a+j,0) = 0, by the clause T(≠a,k=0)=0 of Definition 1. Assume true for swap k-1; consider swap k with j>k. Since j-1 > k-1 and j+1 > k-1, then by the inductive hypothesis: T(a+j,k) = T(a+j-1,k-1)×(.../...) + T(a+j+1,k-1)×(.../...) = 0×(.../...) + 0×(.../...) = 0.

Theorem 2.

2a. The second term of Definition 1 vanishes at j=k:

$$T(a+k,\,k) = T(a+k-1,\,k-1)\times\frac{(a+k-1)(d+k-1)}{(a+k-1)(d+k-1)+(c-k+1)(b-k+1)} + 0.$$

2b. T(a-k,k) = 0 + T(a-k+1,k-1)×.......

Proof. 2a. By Theorem 1a, the second term contains T(a+k+1,k-1) = 0, since k+1 > k-1.

Proof. 2b. Analogous to Proof 2a.

Theorem 3: Step k, kth tail; less than step (k-1), (k-1)th tail.

3a. T(a+k,k) < T(a+k-1,k-1).

Proof. 3a. Since the swaps terminate when either c=k or b=k, then c>(k-1), b>(k-1), (c-k+1)>0, (b-k+1)>0, and q=...... is less than 1. Then by Theorem 2, T(a+k,k) = T(a+k-1,k-1)×q < T(a+k-1,k-1).

Proof. 3b. Analogous to Proof 3a.
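
Theorems 1 and 3a can be checked numerically with a short Python sketch that transcribes the recurrence of Definition 1 exactly as written. The starting table (42, 9, 16, 4) and the function names are hypothetical, and no claim is made that this transcription matches the authors' Perl program.

```python
from fractions import Fraction


def factor(a, b, c, d, j):
    """Factor that multiplies T(a+j, k-1) in Definition 1."""
    num = (a + j) * (d + j)
    den = num + (c - j) * (b - j)
    return Fraction(num, den) if den else Fraction(0)


def token_swap_distribution(a, b, c, d, steps):
    """rows[k][j] holds T(a+j, k) for 0 <= k <= steps and |j| <= steps + 1."""
    span = range(-steps - 1, steps + 2)
    rows = [{j: Fraction(int(j == 0)) for j in span}]  # T(a,0)=1, T(≠a,0)=0
    for _ in range(steps):
        prev, row = rows[-1], {}
        for j in span:
            total = Fraction(0)
            for src in (j - 1, j + 1):
                t = prev.get(src, Fraction(0))
                if t:  # zero terms are skipped: 0 x (.../...) = 0
                    total += t * factor(a, b, c, d, src)
            row[j] = total
        rows.append(row)
    return rows


rows = token_swap_distribution(42, 9, 16, 4, steps=4)  # hypothetical starting table
for k, row in enumerate(rows):
    print(k, float(row[k]), float(row[-k]))  # the two kth tails at step k
```

The rows are kept as exact fractions, so the zero tails of Theorem 1 and the strict inequality of Theorem 3a can be tested without rounding error.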

Misclassification: Three Dimensions.

• 1. Some biomedical analyses involve: old test, Φ; new test, Ψ; ultimate test, Ω, for example, long-term follow-up or autopsy findings.

• 2. Tests Φ, Ψ: conceptually comparable; but the ultimate test, Ω, takes precedence over the other two tests.

• 3. Token swaps: swaps that favor Φ versus swaps that favor Ψ.

Token Swap Method: Three Dimensions

• 1. Cell a: true negative for Φ, Ψ: Φ, Ψ, Ω all false.

• 2. Cell b: false positive for Φ (Φ true, Ω false), but true negative for Ψ (Ψ, Ω both false); etc.

• 3. Status of all cells:

a = Φ,Ψ true negative.
b = Φ false positive.
c = Ψ false positive.
d = Φ,Ψ false positive.
e = Φ,Ψ false negative.
f = Ψ false negative.
g = Φ false negative.
h = Φ,Ψ true positive.
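
The status list can be derived mechanically by comparing each test with the ultimate test Ω, as in the following Python sketch. The truth-value assignment for each cell is inferred from the statuses above, and the names `CELLS`, `status`, and `classify` are hypothetical.

```python
# Cell letter -> (Φ, Ψ, Ω) truth values, inferred from the status list.
CELLS = {
    "a": (False, False, False),
    "b": (True, False, False),
    "c": (False, True, False),
    "d": (True, True, False),
    "e": (False, False, True),
    "f": (True, False, True),
    "g": (False, True, True),
    "h": (True, True, True),
}


def status(test, omega):
    """Judge one test result against the ultimate test Ω."""
    if test == omega:
        return "true positive" if omega else "true negative"
    return "false positive" if test else "false negative"


def classify(cell):
    """Return the (Φ status, Ψ status) pair for a cell letter."""
    phi, psi, omega = CELLS[cell]
    return status(phi, omega), status(psi, omega)


for cell in CELLS:
    print(cell, classify(cell))
```

For example, cell b comes out as a false positive for Φ and a true negative for Ψ, as described in item 2.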

Constraints: 3D paired swaps.

• 1. No swaps across Ω-true, Ω-false.

• 2. No net gain or loss in marginal totals for Ω.

• 3. No net gain or loss in Φ-true, Φ-false, Ψ-true, Ψ-false, Ω-true, or Ω-false.

• 4. 8 permitted swaps, to or from cell a, as follows:

1. a→b.
2. a→c.
3. a→d, c→b.
4. a→d, b→c.
5. b→a.
6. c→a.
7. d→a, b→c.
8. d→a, c→b.

3D Token Swaps: 1, 2, 3, 4.

3D Token Swaps: 5, 6, 7, 8.

3D Swap 1.

• If a→b,

• Then d→c, f→e, and g→h.

• Net: no changes.

3D Swap 2.

• If a→c,

• Then d→b, g→e, and f→h.

• Net: no changes.

3D Swap 3.

• If a→d and c→b,

• Then h→e and f→g.

• Net: +2 Φ false positive, +2 Φ false negative,

• Favors Ψ.

3D Swap 4.

• If a→d and b→c,

• Then h→e and g→f.

• Net: +2 Ψ false positive, +2 Ψ false negative,

• Favors Φ.

3D Swap 5.

• If b→a,

• Then c→d, e→f, and h→g.

• Net: no changes.

3D Swap 6.

• If c→a,

• Then b→d, e→g, and h→f.

• Net: no changes.

3D Swap 7.

• If d→a and b→c,

• Then e→h and g→f.

• Net: -2 Φ false positive, -2 Φ false negative,

• Favors Φ.

3D Swap 8.

• If d→a and c→b,

• Then e→h and f→g.

• Net: -2 Ψ false positive, -2 Ψ false negative,

• Favors Ψ.

Summary of Swaps.
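
The eight swaps can be tabulated with a brief Python sketch that applies the token moves listed for each swap, confirms that the Φ, Ψ, and Ω marginal totals are unchanged, and reports which test each swap favors. The cell sets are inferred from the cell statuses, "favors" is read here as "ends with fewer net misclassifications than the other test", and all names are hypothetical.

```python
PHI_TRUE = {"b", "d", "f", "h"}    # cells where Φ is true
PSI_TRUE = {"c", "d", "g", "h"}    # cells where Ψ is true
OMEGA_TRUE = {"e", "f", "g", "h"}  # cells where Ω is true
PHI_ERR = {"b", "d", "e", "g"}     # Φ false positives and false negatives
PSI_ERR = {"c", "d", "e", "f"}     # Ψ false positives and false negatives

# Each move is written "source cell, destination cell", e.g. "ab" means a→b.
SWAPS = {
    1: ["ab", "dc", "fe", "gh"],
    2: ["ac", "db", "ge", "fh"],
    3: ["ad", "cb", "he", "fg"],
    4: ["ad", "bc", "he", "gf"],
    5: ["ba", "cd", "ef", "hg"],
    6: ["ca", "bd", "eg", "hf"],
    7: ["da", "bc", "eh", "gf"],
    8: ["da", "cb", "eh", "fg"],
}


def net_change(moves, cells):
    """Net change in the number of tokens lying in `cells` after the moves."""
    return sum((dst in cells) - (src in cells) for src, dst in moves)


def summarize(n):
    """Return (marginal changes, ΔΦ errors, ΔΨ errors, favored test) for swap n."""
    moves = SWAPS[n]
    margins = [net_change(moves, s) for s in (PHI_TRUE, PSI_TRUE, OMEGA_TRUE)]
    d_phi, d_psi = net_change(moves, PHI_ERR), net_change(moves, PSI_ERR)
    favors = "Φ" if d_phi < d_psi else "Ψ" if d_psi < d_phi else "neither"
    return margins, d_phi, d_psi, favors


for n in SWAPS:
    print(n, summarize(n))
```

Swaps 1, 2, 5, and 6 leave both error counts unchanged, swaps 4 and 7 favor Φ, and swaps 3 and 8 favor Ψ.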

Sample Problem: Sickle Cell Crisis.

• 1. Suppose: autopsy findings: Ω; clinical findings: Φ; hypothetical new pain test: Ψ.

• 2. Two Φ-favorable swaps: transform upper figure into lower figure.

Live Demonstration: 3D Token Swap.

• http://www.netautopsy.org/toknlive.htm

• TOKENSWAP, p: 0.107142857142857

Summary, Conclusions

• Two- and three-dimensional token swap methods satisfy Neyman-Pearson condition, for power of alternative hypothesis.

• Unlike classical methods, the token swap method supports a range of null hypotheses, including zero cell totals.

• Present model extends range of existing contingency table analysis.

• Incorporates additional clinicopathologic information.

• Explores customized null hypotheses.

References.

• 1. Parfrey NA, Moore GW, Hutchins GM. Is pain crisis a cause of death in sickle cell disease? Am J Clin Pathol. 1985 Aug;84(2):209-212.

• 2. Moore GW, Hutchins GM, Miller RE. Token swap test of significance for serial medical data bases. Am J Med. 1986 Feb;80(2):182-190.

• 3. Moore GW, Hutchins GM, Miller RE. A new paradigm for hypothesis testing in medicine, with examination of the Neyman Pearson condition. Theor Med. 1986 Oct;7(3):269-282.

• 4. Heckering PS. Token swap test revisited. Comput Methods Programs Biomed. 2003 Mar;70(3):265-269.