genomic signatures for metagenomic data analysis

Upload: fabio-gori

Post on 05-Apr-2018


  • 8/2/2019 Genomic signatures for metagenomic data analysis

    1/62

    Metagenomics and Binning · Genomic Signatures for Binning · Experiments

    Genomic Signatures for Metagenomic Data Analysis: Exploiting the Reverse Complementarity of Tetranucleotides

    Fabio Gori (1), Dimitrios Mavroedis (1), Mike S. M. Jetten (2), Elena Marchiori (1)

    (1) Radboud University Nijmegen, Institute for Computing and Information Sciences, The Netherlands
    (2) Radboud University Nijmegen, Department of Microbiology, The Netherlands

    Hong Kong University, 12 September 2011

    gori@science.ru.nl

  • Table of Contents

    Metagenomics and Binning
    Genomic Signatures for Binning
    Experiments

  • What is Metagenomics?

    Metagenomics: the study of microbial communities by analysing their genetic material.

    Why?
    99% of microbes cannot be studied in laboratories
    To understand interactions between organisms

  • How? DNA Sequencing Technology

    [Figure: environmental sample, DNAs, small-insert library cloning; each sequenced fragment is a string such as TACCACAGATATCAG...]

    A metagenomic dataset is made up of these DNA sequences.

  • What kind of data? A meta... jigsaw puzzle

    Fragments of DNAs
    Pieces are similar
    Original pictures are unknown
    Missing pieces

  • An interesting problem: Metagenomic Binning

    Clustering together sequences sampled from the same genome

  • Metagenomic binning

    Clustering together sequences sampled from the same genome (unsupervised approach):

    sequences over {A, C, G, T}  →  points in $\mathbb{R}^n$  →  clustering

    In this study: focus on the map from sequences to $\mathbb{R}^n$.

  • What should $\rho$ do?

    $\rho$ maps the sequences z, s, r to points $\rho(z)$, $\rho(s)$, $\rho(r)$ in $\mathbb{R}^n$.

    $\rho$ needs to be a genomic signature [Karlin et al., Trends in Genetics, 1995]:

    $\rho(s) \neq \rho(z)$ (s and z sampled from different genomes)
    $\rho(s) = \rho(r)$ (s and r sampled from the same genome)

  • Typical $\rho$'s used in binning

    $\rho^T(s)$ := frequencies of the $4^k$ sequences of length $k$ ($k$-mers).

    Usually $k = 4$, giving $4^k = 256$ features: $\rho^T(s) \in \mathbb{N}^{256}$

    [Mohammed et al., Bioinformatics, 2011], [Diaz et al., BMC Bioinformatics, 2009],
    [Chan et al., J. Biomed. Biotech., 2008], [Teeling et al., Environ. Microb., 2004]

    Example:

    s = AGCATGCAGCATATGTGGAGCA
    $\rho^T(s)$ = (#AAAA = 0, ..., #AGCA = 3, ..., #ATAT = 1, ..., #GCAT = 2, ...)
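The tetranucleotide count vector from the example slide can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `kmer_signature` and the lexicographic ordering of the 256 features are illustrative assumptions, not something the talk specifies. Overlapping occurrences are counted with a stride of one, which is what the counts on the slide imply.

```python
from collections import Counter
from itertools import product


def kmer_signature(seq, k=4):
    """Count every overlapping k-mer of seq.

    Returns (words, counts): all 4**k words over ACGT in lexicographic
    order (an assumed ordering) and the matching count vector.
    """
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    return words, [observed[w] for w in words]


# The example sequence from the slide.
words, counts = kmer_signature("AGCATGCAGCATATGTGGAGCA")
by_word = dict(zip(words, counts))
print(len(counts), by_word["AAAA"], by_word["AGCA"], by_word["ATAT"], by_word["GCAT"])
```

For the slide's sequence this yields 256 features with #AAAA = 0, #AGCA = 3, #ATAT = 1 and #GCAT = 2.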

  • Metacluster and signature $\rho^{Rank}$ [B. Yang et al., ACM-BCB, 2010]

    Symmetrized signature $\rho^S : \mathcal{S} \to \mathbb{N}^{136}$:
    $\rho^S_i(s) = \#w_i + \#w_i^C$, $i = 1, \dots, 136$

    Symmetrized rank signature $\rho^{Rank} : \mathcal{S} \to S_{136}$:
    $\rho^{Rank}(s)$ := ranking induced by sorting the elements of $\rho^S(s)$.
    For instance, if $\rho^S(s) = (7, 0, 3)$ then $\rho^{Rank}(s) = (1, 3, 2)$.

    Spearman footrule distance between s and z:
    Manhattan distance between $\rho^{Rank}(s)$ and $\rho^{Rank}(z)$
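A possible way to compute the symmetrized and rank signatures is sketched below, under a few loudly stated assumptions: each tetranucleotide is merged with its reverse complement by taking the lexicographically smaller of the two as representative, the 16 self-complementary words are counted once rather than twice, and ties in the ranking are broken by feature position. None of these conventions is stated on the slide, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(word):
    """Representative of {word, revcomp(word)}: the lexicographic minimum (assumption)."""
    return min(word, revcomp(word))


# 256 tetranucleotides collapse to 136 classes: 16 words equal their own
# reverse complement, the remaining 240 pair up into 120 classes.
CANONICAL_WORDS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def symmetrized(seq):
    """136 counts; each one merges a tetranucleotide with its reverse complement."""
    counts = Counter(canonical(seq[i:i + 4]) for i in range(len(seq) - 3))
    return [counts[w] for w in CANONICAL_WORDS]


def rank_signature(values):
    """Rank 1 goes to the largest value; ties are broken by position (assumption)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def footrule(seq_a, seq_b):
    """Manhattan distance between the rank signatures of two sequences."""
    ranks_a = rank_signature(symmetrized(seq_a))
    ranks_b = rank_signature(symmetrized(seq_b))
    return sum(abs(a - b) for a, b in zip(ranks_a, ranks_b))
```

With these conventions `rank_signature([7, 0, 3])` gives `[1, 3, 2]`, matching the example on the slide, and a sequence and its reverse complement receive identical signatures.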

  • What is missing?

    The $\rho$'s used in binning:
    Not designed as signatures for metagenomic data
    No thorough comparative analysis

  • In this study

    1. Introduce new genomic signatures for binning
    2. Test & compare performances of new and known signatures
    3. ... and signature combinations (extra)
    4. Relation between taxonomic divergence and signature dissimilarity (extra)

    TEST: is $\rho$ a signature on metagenomic data?
    $\rho(s) \neq \rho(z)$
    $\rho(s) = \rho(r)$

  • Special requirements for metagenomics

    A genomic signature needs to:
    Work on sequences of 1,000 bp (standard test: 10,000 bp)
    Not rely on the source genome
    Be strand independent

  • Sequences can be sampled from both strands

    s   = AGCATGCAGCATATGTGGAGCA
    s^C = TCGTACGTCGTATACACCTCGT

    We want: $\rho(s) = \rho(s^C)$
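To see why strand independence does not come for free, the sketch below compares plain tetranucleotide counts on the slide's sequence and on its opposite strand. It assumes that the opposite strand is read in its own 5'-to-3' direction, that is, as the reverse of the complement printed on the slide; the helper names are illustrative.

```python
from collections import Counter


def revcomp(seq):
    """Complement each base, then reverse, to read the opposite strand 5'-to-3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmer_counts(seq, k=4):
    """Plain (strand-dependent) counts of overlapping k-mers."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


s = "AGCATGCAGCATATGTGGAGCA"
s_c = revcomp(s)

# AGCA occurs three times in s but never in s_c, where its reverse
# complement TGCT takes over those three occurrences.
print(kmer_counts(s)["AGCA"], kmer_counts(s_c)["AGCA"], kmer_counts(s_c)["TGCT"])
```

Plain counts therefore give different vectors for the two strands of the same fragment, which is the problem a strand-independent signature has to avoid.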

  • Tested signatures

    Signatures exploit frequencies of subsequences (length = 4):
    3 known signatures ($\rho^S$ not used in metagenomics)
    6 new strand-independent signatures

    Data:
    1,284 prokaryotic genomes (NCBI)
    Sequence length: 1,000 bp [B. Yang et al., ACM-BCB, 2010]
    Max output of 454 GS FLX+ System

    Dissimilarity measured with the signature distance (Manhattan):

    $d(\rho(s), \rho(z)) := \|\rho(s) - \rho(z)\|_1 = \sum_{i=1}^{n} |\rho_i(s) - \rho_i(z)|$

    [Mohammed et al., Bioinformatics, 2011], [Mrázek et al., Mol. Biol. Evol., 2009],
    [Bohlin et al., ScientificWorldJournal, 2011], [Karlin et al., Annu. Rev. Genet., 1998]
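The Manhattan (L1) signature distance is simple enough to write out directly. The sketch below takes two signature vectors of equal length; the function name and the toy vectors are illustrative only.

```python
def manhattan(u, v):
    """L1 distance between two signature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("signature vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


# Toy three-feature signatures: |7 - 5| + |0 - 1| + |3 - 3| = 3.
print(manhattan([7, 0, 3], [5, 1, 3]))  # 3
```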

  • Performance evaluation

    WITHIN-genome distance: $d(\rho(s_h), \rho(s_i))$, compared against a distance threshold $t$.

  • How we compare: ROC curve

    For each distance threshold $t$:

    $\mathrm{Sensitivity}(t) = \frac{\#\,\text{between-distances} > t}{\#\,\text{between-distances}}$

    $\mathrm{Specificity}(t) = \frac{\#\,\text{within-distances} \le t}{\#\,\text{within-distances}}$

  • How we compare: ROC curve

    Signature a is always better than signature b if and only if the ROC of a lies above the ROC of b.

    Alternative index: Area Under the Curve (AUC)
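The ROC construction can be sketched as follows, assuming two plain lists of distances (within-genome and between-genome) and the convention that within-distances at or below the threshold count as correct. The trapezoidal rule for the AUC and the function names are assumptions made for illustration; the talk does not say how the area was computed.

```python
def roc_points(within, between):
    """Return ROC points (1 - specificity, sensitivity), sorted by the first coordinate."""
    points = [(1.0, 1.0)]  # threshold below every distance
    for t in sorted(set(within) | set(between)):
        sensitivity = sum(d > t for d in between) / len(between)
        specificity = sum(d <= t for d in within) / len(within)
        points.append((1.0 - specificity, sensitivity))
    return sorted(points)


def auc(points):
    """Area under the ROC curve by the trapezoidal rule (assumed method)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy distances: three of the four (within, between) pairs are ordered correctly.
print(auc(roc_points([1, 3], [2, 4])))  # 0.75
```

An AUC of 1 means every between-genome distance exceeds every within-genome distance; 0.5 corresponds to a signature that does not separate the two groups at all.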

  • Results

    Table: Comparison of genomic signatures

    Signature   AUC     Features
    S           0.912   136
    max         0.900   120
    T           0.884   256
    min         0.881   120
    I           0.851   16
    Rank        0.794   136
    Ratio1      0.707   120
    Ratio2      0.686   120
    JS          0.573   120

  • Conclusion

    What we did
    First comparative test of $\rho$'s for metagenomics
    1,284 prokaryotic genomes studied
    New signatures designed for metagenomics

    Results
    Support some known signatures ($\rho^S$ better than $\rho^T$ but not used)
    New signatures: comparable results with fewer features

    Future work
    Test on shorter sequences (150-500 bp): preliminary results
    Analyze performances at other taxonomic levels (family, genus, ...)

  • Thank you!

    Questions?

    gori@science.ru.nl

  • Within-genome distance-values derivation

    For each genome (1,284): take 10,000 sequences, compute the distances for all pairs of these sequences, and take the mean.

  • Between-genome distance-values derivation

    For 8,000 genome pairs: take 10,000 sequence pairs, compute the distances, and take the mean.

    1,000 genome pairs for each level of taxonomic diversity, randomly selected.

  • Taxonomic diversity

    Two genomes $g_i$, $g_j$ have taxonomic diversity at rank $r$ iff the Lowest Common Ancestor (LCA) of $g_i$ and $g_j$ is at rank $r$.

    [Figure: taxonomy tree with leaves $g_1, \dots, g_{12}$ and an LCA node marked]
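The definition of taxonomic diversity can be illustrated with a small sketch. It assumes each genome comes with a lineage listed from the top rank downwards; the seven rank names and the placeholder taxon labels are hypothetical and are not taken from the talk or from any real taxonomy.

```python
RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


def lca_rank(lineage_a, lineage_b):
    """Rank of the lowest common ancestor of two lineages.

    Each lineage lists one taxon name per entry of RANKS, top rank first.
    Returns None when the lineages already differ at the top rank.
    """
    shared = None
    for rank, taxon_a, taxon_b in zip(RANKS, lineage_a, lineage_b):
        if taxon_a != taxon_b:
            break
        shared = rank
    return shared


# Placeholder lineages: identical down to the family, different genus.
g1 = ("D1", "P1", "C1", "O1", "F1", "G1", "S1")
g2 = ("D1", "P1", "C1", "O1", "F1", "G2", "S2")
print(lca_rank(g1, g2))  # family
```

Under this convention the two placeholder genomes have taxonomic diversity at rank "family".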

  • Taxonomic divergence and signature distance

    For each signature:
    For each pair of ranks $(r_1, r_2)$:
    Check that: distance distribution $r_1$