beta tucker decomposition for dna methylation dataas5530/scheinflahertyzhousheldonwall... · 2019....

Post on 08-Mar-2021


Beta Tucker Decomposition for DNA Methylation Data

Aaron Schein, UMass Amherst

Joint work with:

Pat Flaherty, UMass Amherst

Mingyuan Zhou, Univ. Texas at Austin

Dan Sheldon, UMass Amherst

Hanna Wallach, Microsoft Research

DNA methylation

CATTCCGCCTTCTCTCCCGAGG

CpG dinucleotides: sites where a C is immediately followed by a G

Each CpG site is either methylated (marked M) or unmethylated

[Figure: a long DNA sequence containing a gene and a CG-rich CpG island (often in the promoter region). Methylated CpG sites are marked M.]

• CpG island heavily methylated: gene is silenced

• CpG island largely unmethylated: gene is expressed

Abnormal DNA methylation

It causes cancer

• Hypomethylation of oncogenes

• Hypermethylation of tumor suppressor genes

[Baylin & Ohm (2006)]

Cancer taxonomies

[Figure: Sample 1 and Sample 2, labeled "Breast cancer"]

[Figure: Sample 3 and Sample 4, labeled "Ovarian cancer" and "Breast cancer"]

Anatomically similar cancer cells may be genetically different

Anatomically different cancer cells may be genetically similar

Goal: Develop new taxonomies based on genetic information

ML solution: Unsupervised dimensionality reduction (PCA, NMF, ICA, …)

[Flusberg et al. (2010)] [Teschendorff et al. (2007)] [Wang et al. (2006)]

DNA methylation data

$\beta_{ij}$ = how methylated locus $j$ is in sample $i$

$\beta_{ij} \in [0, 1]$

[Figure: matrix of $\beta_{ij}$ values with rows Sample 1 to Sample 4 and columns Locus 1 to Locus 6]

CP decomposition

[Figure: the Sample × Locus matrix of $\beta_{ij}$ values is approximated by the product of $\Theta$ (Sample 1 to 4 by $k = 1, 2, 3$) and $\Phi$ ($k = 1, 2, 3$ by Locus 1 to 6)]

$K$ "components"

$$\beta_{ij} \approx \sum_{k=1}^{K} \theta_{ik}\,\phi_{kj}$$

[Figure: the same factorization with a $K \times K$ matrix, indexed by $k = 1, 2, 3$ on both sides, placed between $\Theta$ and $\Phi$]

$$\beta_{ij} \approx \sum_{k=1}^{K} \theta_{ik}\,\pi_{k}\,\phi_{kj}$$

Tucker decomposition

[Figure: the Sample × Locus matrix of $\beta_{ij}$ values is approximated by $\Theta$ (Sample 1 to 4 by $c = 1, 2$), a core matrix ($c = 1, 2$ by $k = 1, 2, 3$), and $\Phi$ ($k = 1, 2, 3$ by Locus 1 to 6)]

$C$ "clusters" and $K$ "components"

$$\beta_{ij} \approx \sum_{c=1}^{C} \theta_{ic} \sum_{k=1}^{K} \pi_{ck}\,\phi_{kj}$$
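The Tucker sum above can be written out directly. The following is a minimal sketch in plain Python, not the authors' code: the function name `tucker_reconstruct` and all toy values are assumptions made for illustration. It also shows that CP corresponds to the special case $C = K$ with a diagonal core.

```python
def tucker_reconstruct(theta, pi, phi):
    """Return p[i][j] = sum_c theta[i][c] * sum_k pi[c][k] * phi[k][j]."""
    n_samples, n_clusters = len(theta), len(pi)
    n_components, n_loci = len(phi), len(phi[0])
    return [
        [
            sum(
                theta[i][c]
                * sum(pi[c][k] * phi[k][j] for k in range(n_components))
                for c in range(n_clusters)
            )
            for j in range(n_loci)
        ]
        for i in range(n_samples)
    ]


# Toy sizes (illustrative only): 2 samples, C = 2 clusters, K = 3 components, 2 loci.
theta = [[1.0, 0.0], [0.5, 0.5]]             # sample-by-cluster weights
pi = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]      # cluster-by-component core
phi = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]   # component-by-locus weights

approx = tucker_reconstruct(theta, pi, phi)

# CP as a special case: C = K and the core is diagonal (pi[c][k] = 0 unless c == k).
cp_theta = [[0.2, 0.3, 0.5]]
cp_core = [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
cp_approx = tucker_reconstruct(cp_theta, cp_core, phi)
```

With a full (non-diagonal) core, every cluster can draw on every component, which is what lets the number of clusters $C$ differ from the number of components $K$.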

Beta Tucker decomposition

Our contributions:

• Novel generative model

  • Based on the Tucker decomposition

  • Matches the true data-generating process

    ✓ Beta likelihood

    ✓ Latent variables match real ones

    ✓ Priors match known sources of noise

• Gibbs sampler with closed-form conditionals

Is it better than PCA/NMF/ICA/etc. in theory?

• Yes

Is it better than PCA/NMF/ICA/etc. in practice?

• ??

• Comparable performance on (contrived) prediction tasks

[Ma et al. (2015)]

DNA methylation data

[Figure: sample $i$ drawn as a set of loci (short sequences such as CGTTTTTCTC, CGCCTTCTCTCCCG, CTCCCGCGTCCCGCGA, ACGCGCCTTCTCT, and CGCCTTCTTTTT), with methylated CpG sites marked M; locus $j$ is one of them]

$y^{(m)}_{ij}$ = number of methylated CpG sites in locus $j$ of sample $i$

$y^{(u)}_{ij}$ = number of unmethylated CpG sites in locus $j$ of sample $i$

[Figure from Wang & Petronis (2008): measurement of locus $j$ in sample $i$]

Two real-valued fluorescent intensities for locus $j$ of sample $i$: $\lambda^{(m)}_{ij}$ and $\lambda^{(u)}_{ij}$ [Wang & Petronis (2008)]

"Beta value":

$$\beta_{ij} := \frac{\lambda^{(m)}_{ij}}{\lambda^{(m)}_{ij} + \lambda^{(u)}_{ij}}$$

[Figure: histogram of the intensities $\{\lambda^{(m)}_{ij}, \lambda^{(u)}_{ij}\}_{j=1}^{J}$ for a given sample $i$]

$$\lambda^{(m)}_{ij} \sim \mathrm{Gam}(\cdots, c_i) \qquad \lambda^{(u)}_{ij} \sim \mathrm{Gam}(\cdots, c_i)$$

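Gamma-Beta relationship

If $\lambda_1 \sim \mathrm{Gam}(\alpha_1, c)$ and $\lambda_2 \sim \mathrm{Gam}(\alpha_2, c)$ are independent, then

$$\frac{\lambda_1}{\lambda_1 + \lambda_2} \sim \mathrm{Beta}(\alpha_1, \alpha_2)$$

Applied to the two intensities, which share $c_i$:

$$\lambda^{(m)}_{ij} \sim \mathrm{Gam}(\cdots, c_i) \qquad \lambda^{(u)}_{ij} \sim \mathrm{Gam}(\cdots, c_i)$$

$$\beta_{ij} := \frac{\lambda^{(m)}_{ij}}{\lambda^{(m)}_{ij} + \lambda^{(u)}_{ij}} \sim \mathrm{Beta}(\cdots, \cdots)$$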
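The Gamma-Beta relationship can be checked by simulation. This is a minimal sketch using only the Python standard library; the shape values, the two rates, and the helper name `beta_from_gammas` are illustrative assumptions. It treats the second argument of Gam as a rate (the identity holds equally for a shared scale) and shows that the ratio's mean does not depend on that shared parameter.

```python
import random


def beta_from_gammas(alpha1, alpha2, rate, rng):
    """Draw lam1 / (lam1 + lam2) for independent gammas sharing one rate."""
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    lam1 = rng.gammavariate(alpha1, 1.0 / rate)
    lam2 = rng.gammavariate(alpha2, 1.0 / rate)
    return lam1 / (lam1 + lam2)


rng = random.Random(0)
alpha1, alpha2 = 3.0, 5.0   # illustrative shapes
n_draws = 100_000

# Empirical mean of the ratio under two very different shared rates.
means = {}
for rate in (0.1, 10.0):
    draws = [beta_from_gammas(alpha1, alpha2, rate, rng) for _ in range(n_draws)]
    means[rate] = sum(draws) / n_draws

# Mean of Beta(alpha1, alpha2); both entries of `means` should be close to it.
expected_mean = alpha1 / (alpha1 + alpha2)
```

Because the shared parameter cancels in the ratio, the per-sample $c_i$ affects the raw intensities but not the beta values.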

Beta Tucker decomposition

[Figure: locus $j$ of sample $i$ (CTCCCGCGTCCCGCGA) with three methylated CpG sites, numbered 1 to 3, and one unmethylated site, numbered 1. Each intensity is drawn as a sum of one term per CpG site plus one additional term.]

$\lambda^{(m)}_{ij}$ = sum over $s = 1, \ldots, y^{(m)}_{ij}$ of one term per methylated site, each $\sim \mathrm{Gam}(1, c_i)$, plus one additional term $\sim \mathrm{Gam}(b_0, c_i)$

$\lambda^{(u)}_{ij}$ = sum over $s = 1, \ldots, y^{(u)}_{ij}$ of one term per unmethylated site, each $\sim \mathrm{Gam}(1, c_i)$, plus one additional term $\sim \mathrm{Gam}(b_0, c_i)$

$$\lambda^{(m)}_{ij} \sim \mathrm{Gam}\left(b_0 + y^{(m)}_{ij},\; c_i\right) \qquad \lambda^{(u)}_{ij} \sim \mathrm{Gam}\left(b_0 + y^{(u)}_{ij},\; c_i\right)$$

$$\beta_{ij} := \frac{\lambda^{(m)}_{ij}}{\lambda^{(m)}_{ij} + \lambda^{(u)}_{ij}}$$

Equivalent to:

$$\beta_{ij} \sim \mathrm{Beta}\left(b_0 + y^{(m)}_{ij},\; b_0 + y^{(u)}_{ij}\right)$$
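The step from per-site terms to a single gamma uses the additivity of independent gammas that share a rate. A minimal simulation sketch follows, using only the standard library; the values of `y`, `b0`, and `rate` and the helper name `intensity_by_parts` are assumptions for illustration. The sum of `y` draws from Gam(1, c) plus one draw from Gam(b0, c) should match the mean and variance of Gam(b0 + y, c).

```python
import random


def intensity_by_parts(y, b0, rate, rng):
    """Sum y per-site Gam(1, rate) terms and one Gam(b0, rate) term."""
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    per_site = sum(rng.gammavariate(1.0, 1.0 / rate) for _ in range(y))
    extra = rng.gammavariate(b0, 1.0 / rate)
    return per_site + extra


rng = random.Random(1)
y, b0, rate = 3, 0.5, 2.0   # illustrative values
n_draws = 100_000

draws = [intensity_by_parts(y, b0, rate, rng) for _ in range(n_draws)]
mean = sum(draws) / n_draws
var = sum((d - mean) ** 2 for d in draws) / n_draws

# Moments of Gam(shape = b0 + y, rate): shape / rate and shape / rate**2.
target_mean = (b0 + y) / rate
target_var = (b0 + y) / rate ** 2
```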


$$y^{(m)}_{ij} \sim \mathrm{Pois}\left(\delta\, p_{ij}\right) \qquad y^{(u)}_{ij} \sim \mathrm{Pois}\left(\delta\,(1 - p_{ij})\right)$$

$$p_{ij} := \sum_{c=1}^{C} \theta_{ic} \sum_{k=1}^{K} \pi_{ck}\,\phi_{kj}$$

• $\theta_{ic}$: the probability that sample $i$ is in cluster $c$

• $\pi_{ck}$: the probability that samples in cluster $c$ methylate loci in component $k$

• $\phi_{kj}$: the probability that locus $j$ is in component $k$

• $p_{ij}$: the probability that sample $i$ methylates CpG sites in locus $j$

• $\delta$: the occurrence rate of CpG sites

Priors:

$$\theta_{i} \sim \mathrm{Dir}(\eta_1, \ldots, \eta_C) \qquad \pi_{ck} \sim \mathrm{Beta}\left(\eta^{(m)}_0, \eta^{(u)}_0\right) \qquad \phi_{j} \sim \mathrm{Dir}(\nu_1, \ldots, \nu_K)$$

Intensities and beta values:

$$\lambda^{(m)}_{ij} \sim \mathrm{Gam}\left(b_0 + y^{(m)}_{ij},\; c_i\right) \qquad \lambda^{(u)}_{ij} \sim \mathrm{Gam}\left(b_0 + y^{(u)}_{ij},\; c_i\right)$$

$$\beta_{ij} := \frac{\lambda^{(m)}_{ij}}{\lambda^{(m)}_{ij} + \lambda^{(u)}_{ij}}$$

Equivalently:

$$\beta_{ij} \sim \mathrm{Beta}\left(b_0 + y^{(m)}_{ij},\; b_0 + y^{(u)}_{ij}\right)$$
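To make the generative process concrete, here is a minimal forward-sampling sketch using only the Python standard library. It is not the authors' implementation. The function names, the symmetric hyperparameters (all set to 1.0), and the name `delta` for the CpG occurrence rate are assumptions for illustration. It draws the beta values through the equivalent Beta form, so the per-sample $c_i$ does not appear.

```python
import math
import random


def dirichlet(alphas, rng):
    """Draw from a Dirichlet by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    if rate <= 0.0:
        return 0
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_beta_values(n_samples, n_loci, n_clusters, n_components, delta, b0, rng):
    """Forward-sample a matrix of beta values under assumed symmetric priors."""
    theta = [dirichlet([1.0] * n_clusters, rng) for _ in range(n_samples)]
    pi = [[rng.betavariate(1.0, 1.0) for _ in range(n_components)]
          for _ in range(n_clusters)]
    # phi_cols[j][k] holds phi_kj: each locus has a distribution over components.
    phi_cols = [dirichlet([1.0] * n_components, rng) for _ in range(n_loci)]

    beta = []
    for i in range(n_samples):
        row = []
        for j in range(n_loci):
            p = sum(
                theta[i][c]
                * sum(pi[c][k] * phi_cols[j][k] for k in range(n_components))
                for c in range(n_clusters)
            )
            y_m = poisson(delta * p, rng)          # methylated CpG count
            y_u = poisson(delta * (1.0 - p), rng)  # unmethylated CpG count
            row.append(rng.betavariate(b0 + y_m, b0 + y_u))
        beta.append(row)
    return beta


beta = sample_beta_values(4, 6, 2, 3, delta=5.0, b0=0.5, rng=random.Random(0))
```

Since each $\theta_i$ and each $\phi_j$ sums to one and each $\pi_{ck}$ lies in $[0, 1]$, $p_{ij}$ is a convex combination of probabilities and stays in $[0, 1]$, which keeps both Poisson rates non-negative.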

[Figure: the factor matrices $\Theta$ (Sample 1 to 4 by $c = 1, 2$), $\Pi$ ($c = 1, 2$ by $k = 1, 2, 3$), and $\Phi$ ($k = 1, 2, 3$ by Locus 1 to 6), together with the Sample × Locus count matrices $Y^{(m)}$ and $Y^{(u)}$ and the Sample × Locus intensity matrices $\Lambda^{(m)}$ and $\Lambda^{(u)}$]

Inference

[Figure: the same diagram of $\Theta$, $\Pi$, $\Phi$, $Y^{(m)}$, $Y^{(u)}$, $\Lambda^{(m)}$, and $\Lambda^{(u)}$]

$P\left(\Theta, \Pi, \Phi \mid Y^{(m)}, Y^{(u)}, \cdots\right)$ = Poisson Tucker decomposition [Schein et al. (2016)]

$P\left(Y^{(m)}, Y^{(u)} \mid \Lambda^{(m)}, \Lambda^{(u)}, \cdots\right)$

$$y^{(m)}_{ij} \sim \mathrm{Pois}\left(\delta\, p_{ij}\right) \qquad \lambda^{(m)}_{ij} \sim \mathrm{Gam}\left(b_0 + y^{(m)}_{ij},\; c_i\right)$$

$$P\left(y^{(m)}_{ij} \mid \lambda^{(m)}_{ij}, \cdots\right) = \;?$$

Poisson is not conjugate to the gamma…

…but maybe the posterior still has a closed form…

The Bessel distribution!

$$P\left(y^{(m)}_{ij} \mid \lambda^{(m)}_{ij}, \cdots\right) = \mathrm{Bes}\left(b_0 - 1,\; 2\sqrt{c_i\,\lambda^{(m)}_{ij}\,\delta\, p_{ij}}\right)$$

[Yuan & Kalbfleisch (2000)]

The Bessel distribution

$$\mathrm{Bes}(y;\, v, a) \propto \frac{1}{y!\;\Gamma(y + v + 1)} \left(\frac{a}{2}\right)^{2y + v}$$
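The closed form can be checked numerically against a direct application of Bayes' rule. The sketch below uses only the standard library and assumes the standard Bessel parameterization with $\Gamma(y + v + 1)$ in the denominator, Gam(shape, rate) for the intensity, and a truncated support; the function names and all numeric values are illustrative assumptions. The name `mu` stands for the Poisson rate, that is, the CpG occurrence rate times $p_{ij}$.

```python
import math


def normalize(log_weights):
    """Exponentiate and normalize log weights in a numerically stable way."""
    peak = max(log_weights)
    weights = [math.exp(x - peak) for x in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def bessel_pmf(v, a, y_max=200):
    """Bes(y; v, a) on y = 0..y_max, proportional to (a/2)^(2y+v) / (y! Gamma(y+v+1))."""
    log_w = [
        (2 * y + v) * math.log(a / 2.0) - math.lgamma(y + 1) - math.lgamma(y + v + 1)
        for y in range(y_max + 1)
    ]
    return normalize(log_w)


def posterior_by_bayes(lam, mu, b0, c, y_max=200):
    """P(y | lam) from Pois(y; mu) times Gam(lam; shape = b0 + y, rate = c)."""
    log_w = []
    for y in range(y_max + 1):
        log_pois = y * math.log(mu) - mu - math.lgamma(y + 1)
        shape = b0 + y
        log_gam = (shape * math.log(c) - math.lgamma(shape)
                   + (shape - 1.0) * math.log(lam) - c * lam)
        log_w.append(log_pois + log_gam)
    return normalize(log_w)


lam, mu, b0, c = 2.5, 3.0, 0.5, 1.5   # illustrative values
direct = posterior_by_bayes(lam, mu, b0, c)
bessel = bessel_pmf(b0 - 1.0, 2.0 * math.sqrt(c * lam * mu))

# Largest pointwise difference between the two probability vectors.
max_gap = max(abs(p - q) for p, q in zip(direct, bessel))
```

The two vectors agree because, as a function of $y$, the product of the Poisson and gamma terms reduces to $(c\,\lambda\,\mu)^{y} / (y!\,\Gamma(y + b_0))$, which is the Bessel form with $v = b_0 - 1$ and $a = 2\sqrt{c\,\lambda\,\mu}$.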

Sampling the Bessel

• Basic properties of Bessel distribution [Yuan & Kalbfleisch (2000)]

• Exact rejection sampling (four methods) [Devroye (2002)]

• Stable computation of Bessel functions [Amos (1974)]

• Table sampling [Zhou (2015)]

It's easy and fast

https://github.com/aschein/fatwalrus
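For intuition only, Bessel draws can also be produced by inverse-CDF sampling over a truncated support. The sketch below is a simple stand-in and is not one of the cited methods, nor the code in the repository linked above; the function names, the truncation point `y_max`, and the parameter values are assumptions for illustration.

```python
import bisect
import math
import random


def bessel_cdf(v, a, y_max=200):
    """Cumulative probabilities of Bes(y; v, a) truncated to y = 0..y_max."""
    log_w = [
        (2 * y + v) * math.log(a / 2.0) - math.lgamma(y + 1) - math.lgamma(y + v + 1)
        for y in range(y_max + 1)
    ]
    peak = max(log_w)
    weights = [math.exp(x - peak) for x in log_w]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w / total
        cdf.append(running)
    return cdf


def sample_bessel(cdf, rng):
    """Return the smallest y whose cumulative probability exceeds a uniform draw."""
    return min(bisect.bisect_right(cdf, rng.random()), len(cdf) - 1)


rng = random.Random(0)
v, a = -0.5, 2.0 * math.sqrt(1.5 * 2.5 * 3.0)   # illustrative parameters
cdf = bessel_cdf(v, a)
draws = [sample_bessel(cdf, rng) for _ in range(20_000)]

sample_mean = sum(draws) / len(draws)
# Mean of the truncated distribution, recovered from successive CDF differences.
exact_mean = sum(
    y * (cdf[y] - (cdf[y - 1] if y > 0 else 0.0)) for y in range(len(cdf))
)
```

Building the cumulative table costs time proportional to `y_max` for each distinct pair `(v, a)`, so this approach is mainly useful for checking a faster sampler rather than replacing one.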

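MCMC algorithm

• Sample $\Theta, \Pi, \Phi$ from $P\left(\Theta, \Pi, \Phi \mid Y^{(m)}, Y^{(u)}, \cdots\right)$: Poisson Tucker decomposition, $O(CK\,|Y_{>0}|)$

• Sample Bessel counts from $P\left(Y^{(m)}, Y^{(u)} \mid \Lambda^{(m)}, \Lambda^{(u)}, \cdots\right)$: $O(2IJ)$

$\delta$ controls sparsity!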

Example results

[Figure: $\Theta$ and $\Pi$]

Top locus in component 8 is in the promoter region of FLJ1030207

Hypomethylation of FLJ1030207 is a strong indicator of ovarian cancer [Model & Rujan (2009)]
