influence function for robust phylogenetic reconstruction

32
Influence Function for Robust Phylogenetic Reconstruction Mahendra Mariadassou Unité MIG INRA Jouy-en-Josas JOBIM 2011 Institut Pasteur M. Mariadassou (INRA MIG) June 11 1 / 25

Upload: others

Post on 16-Oct-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Influence Function for Robust Phylogenetic Reconstruction

Influence Function for Robust PhylogeneticReconstruction

Mahendra Mariadassou

Unité MIGINRA Jouy-en-Josas

JOBIM 2011Institut Pasteur

M. Mariadassou (INRA MIG) June 11 1 / 25

Page 2: Influence Function for Robust Phylogenetic Reconstruction

Outline

1 Motivation

2 Outlier Sites

3 Taxon Influence Index

M. Mariadassou (INRA MIG) June 11 2 / 25

Page 3: Influence Function for Robust Phylogenetic Reconstruction

Goal of Phylogenetic Reconstruction

Chimpanzee Bonobo Human Gorilla Orangutan

Pictures: National Geographic

Last common ancestor of bonobo

and chimpanzee

M. Mariadassou (INRA MIG) June 11 3 / 25

Page 4: Influence Function for Robust Phylogenetic Reconstruction

Applications in Many Domains

Including but not limited to:

Studies of an Evolutionary Group:Estimate Time to Most Recent Common Ancestor (TMRCA);Find genes under positive/purifying selection;Identifying Horizontal Gene Transfer;Testing evolutionary hypothesis.

Systematics:Reconstruct Tree of X, phylogeny of all living X;DNA barcoding: easily identify the species of a new organism;Natural way to measure biodiversity.

Most of these applications require “good” trees.

M. Mariadassou (INRA MIG) June 11 4 / 25

Page 5: Influence Function for Robust Phylogenetic Reconstruction

Applications in Many Domains

Including but not limited to:

Studies of an Evolutionary Group:Estimate Time to Most Recent Common Ancestor (TMRCA);Find genes under positive/purifying selection;Identifying Horizontal Gene Transfer;Testing evolutionary hypothesis.

Systematics:Reconstruct Tree of X, phylogeny of all living X;DNA barcoding: easily identify the species of a new organism;Natural way to measure biodiversity.

Most of these applications require “good” trees.

M. Mariadassou (INRA MIG) June 11 4 / 25

Page 6: Influence Function for Robust Phylogenetic Reconstruction

Applications in Many Domains

Including but not limited to:

Studies of an Evolutionary Group:Estimate Time to Most Recent Common Ancestor (TMRCA);Find genes under positive/purifying selection;Identifying Horizontal Gene Transfer;Testing evolutionary hypothesis.

Systematics:Reconstruct Tree of X, phylogeny of all living X;DNA barcoding: easily identify the species of a new organism;Natural way to measure biodiversity.

Most of these applications require “good” trees.

M. Mariadassou (INRA MIG) June 11 4 / 25

Page 7: Influence Function for Robust Phylogenetic Reconstruction

Reconstruction Methods

Three main families:Distance based: Neighbor-Joining (NJ),

Parsimony based: Maximum Parcimony(MP),

Likelihood based: Maximum Likelihood(ML), Bayesian Inference (BI).

Alignment −→ Phylogenetic Tree

A C C T TB G G A AC G G A C

NJ, MP,ML, BI,

. . .A B C

M. Mariadassou (INRA MIG) June 11 5 / 25

Page 8: Influence Function for Robust Phylogenetic Reconstruction

Validating the Tree

Inference Problems:Compare inferred tree to true tree to assess how good it is,

Inferred True

A B C A B C

But the true tree is not available!

Confidence Issue:How confident are we on the inferred tree ?

Which parts of the tree are reliable/not reliable ?How robust is the tree to small changes in the data and outliers ?

M. Mariadassou (INRA MIG) June 11 6 / 25

Page 9: Influence Function for Robust Phylogenetic Reconstruction

Validating the Tree

Inference Problems:Compare inferred tree to true tree to assess how good it is,

Inferred True

A B C

?But the true tree is not available!

Confidence Issue:How confident are we on the inferred tree ?

Which parts of the tree are reliable/not reliable ?How robust is the tree to small changes in the data and outliers ?

M. Mariadassou (INRA MIG) June 11 6 / 25

Page 10: Influence Function for Robust Phylogenetic Reconstruction

Validating the Tree

Inference Problems:Compare inferred tree to true tree to assess how good it is,

Inferred True

A B C

?But the true tree is not available!

Confidence Issue:How confident are we on the inferred tree ?

Which parts of the tree are reliable/not reliable ?How robust is the tree to small changes in the data and outliers ?

M. Mariadassou (INRA MIG) June 11 6 / 25

Page 11: Influence Function for Robust Phylogenetic Reconstruction

Bootstrap Values: the Theory

Original Dataset:

Alignment Phylogenetic treeA A C T TB G G A TC G G C C

A B C

66

Bootstrap Datasets:A A C T CB G G A GC G G C G

A C A T AB G G A GC G G C G

A T T T TB A T A TC C C C C

A B C A B C A B C

M. Mariadassou (INRA MIG) June 11 7 / 25

Page 12: Influence Function for Robust Phylogenetic Reconstruction

Bootstrap Values: A Robustness Index ?

Potential Causes for Uncertainty:Finite sequence lengths (sampling errors);Poor alignment quality (influent sites);Poor taxon sampling (rogue taxa);Model misspecification,. . .

Different Tools to Assess Them:Bootstrap: deals with global sources of uncertainty;Unable to pinpoint local sources of uncertainty;Need for other indexes to detect outliers

M. Mariadassou (INRA MIG) June 11 8 / 25

Page 13: Influence Function for Robust Phylogenetic Reconstruction

Bootstrap Values: A Robustness Index ?

Potential Causes for Uncertainty:Finite sequence lengths (sampling errors);Poor alignment quality (influent sites);Poor taxon sampling (rogue taxa);Model misspecification,. . .

Different Tools to Assess Them:Bootstrap: deals with global sources of uncertainty;Unable to pinpoint local sources of uncertainty;Need for other indexes to detect outliers

M. Mariadassou (INRA MIG) June 11 8 / 25

Page 14: Influence Function for Robust Phylogenetic Reconstruction

Outlier Sites: Motivation and Goal

Motivation: Filter DataIdentifying/filter out outliers corresponding to:

Sequencing errors;Alignment errors;Presence of an atypical DNA segment;. . .

ProcedureQuantify influence of site i by

Removing site i from alignment;Computing new tree T−i from smaller (jackknife) alignment;Comparing new tree to original tree;

Compute phylogeny from “not too influent sites”.

M. Mariadassou (INRA MIG) June 11 9 / 25

Page 15: Influence Function for Robust Phylogenetic Reconstruction

Outlier Sites: Motivation and Goal

Motivation: Filter DataIdentifying/filter out outliers corresponding to:

Sequencing errors;Alignment errors;Presence of an atypical DNA segment;. . .

ProcedureQuantify influence of site i by

Removing site i from alignment;Computing new tree T−i from smaller (jackknife) alignment;Comparing new tree to original tree;

Compute phylogeny from “not too influent sites”.

M. Mariadassou (INRA MIG) June 11 9 / 25

Page 16: Influence Function for Robust Phylogenetic Reconstruction

For Phylogenies...

DefinitionFor alignment of size n, the influence value of site i is:

IF(site i) = (n− 1)(LogLik(T−i)

n− 1− LogLik(T)

n

)Difference between support of alignments for their ML trees;Negative / Positive value: enhanced / weakened support (whenadding the site);Expect most sites with small negative values and a few with largepositive values.

Strategy towards greater stabilityFocus on outliers: sites with IF(site i) > 0;Rank them in decreasing IF(site i);Remove them one at the time until a stable tree is found.

M. Mariadassou (INRA MIG) June 11 10 / 25

Page 17: Influence Function for Robust Phylogenetic Reconstruction

Data: Zygomycetes & Chytridiomycetes

Lower mushrooms;

Biology: unknown!

Enough signal toresolve the topology;

1026 sites, 158 taxa,GTR model.

sBasidiomycota

Ascomycota

Zygomicota

Glomeromycota

Chytridiomycota

M. Mariadassou (INRA MIG) June 11 11 / 25

Page 18: Influence Function for Robust Phylogenetic Reconstruction

Information About Sites

0 200 400 600 800 1000

510

1520

25

Sites

num

ber o

f int

erna

l nod

es d

iffer

ent f

rom

ML

tree

0 200 400 600 800 1000

05

01

00

Sites

Influ

en

ce

va

lue

M. Mariadassou (INRA MIG) June 11 12 / 25

Page 19: Influence Function for Robust Phylogenetic Reconstruction

Distance Between Trees

0

2

4

6

8

10

1 2

14

10

2 0

30

4 05 0

1 02 0

3 04 0

5 0

Pen

ny-H

endy

dis

tanc

e

Tree w

ithout i

site

s i=1,..

.50

T ree w ithou t i s ites i=1 ,...50

D is ta n ce b e tw e e n tre e s

d(T0,Ti) ≈ 18 and d(Ti,Tj) ≤ 2 for i, j = 1..45

M. Mariadassou (INRA MIG) June 11 13 / 25

Page 20: Influence Function for Robust Phylogenetic Reconstruction

Observations

Strongest Outlier (position 142): Highly variable site located on amed loop (5nt) located on a conserved hairpin;

Removing most influent sites leads to Increased bootstrap valuesand loss of 20% of inner nodes;

Confirms monophyly of phyla Glomeromycota

Reinforces polyphyletic status of phyla Chytridiomycota andZygomycota

M. Mariadassou (INRA MIG) June 11 14 / 25

Page 21: Influence Function for Robust Phylogenetic Reconstruction

Taxon Influence Index: Motivation and Goal

Motivation: Filter DataStudy the robustness of the tree with respect to the speciesIdentify rogue taxa.

ProcedureQuantify influence of taxon i by:

Removing site i from alignment;Computing new tree T−i from smaller (jackknife) alignment;Comparing new tree to original tree;

Compute phylogeny from “not too influent sites”.

M. Mariadassou (INRA MIG) June 11 15 / 25

Page 22: Influence Function for Robust Phylogenetic Reconstruction

Taxon Influence Index: Motivation and Goal

Motivation: Filter DataStudy the robustness of the tree with respect to the speciesIdentify rogue taxa.

ProcedureQuantify influence of taxon i by:

Removing site i from alignment;Computing new tree T−i from smaller (jackknife) alignment;Comparing new tree to original tree;

Compute phylogeny from “not too influent sites”.

M. Mariadassou (INRA MIG) June 11 15 / 25

Page 23: Influence Function for Robust Phylogenetic Reconstruction

Method

Pruning

Inference

Inference

Ti T (−i)

T

i

TII(i) = 2

Species set (whole)

Species set (partial)

M. Mariadassou (INRA MIG) June 11 16 / 25

Page 24: Influence Function for Robust Phylogenetic Reconstruction

Data: Placental Mammal Phylogeny

Mitochondrial genome of 68 mammals;Amino Acids sequences;Sequences are 3658 AA long;MtMam + I + Γ4 model;Phylogeny published in Nikaido et al. in 2003.

M. Mariadassou (INRA MIG) June 11 17 / 25

Page 25: Influence Function for Robust Phylogenetic Reconstruction

Taxon Influence Index

BDS RF

00.

050.

10.

15

02

46

810

12

Eur. red squirrel

Guinea pig

RabbitEur. red squirrel

Guinea Pig

Rabbit

BSD: Branch-score distance / RF: Robinson-Foulds distancemean edge length: 0.05

M. Mariadassou (INRA MIG) June 11 18 / 25

Page 26: Influence Function for Robust Phylogenetic Reconstruction

Complete Phylogeny

American.pika

Australian.echidna

baboonbarbary.macaque

blue.whale

bonobo

Bornean.orangutan

cape.golden.mole

cat

chimpanzee

common.gibbon

common.wombat

donkey

dugong

elephant.shrew

fat.dormouse

formosan.lesser.horseshoe.bat

formosan.shrew

gorilla

greater.cane.rat

grey.sealharbor.seal

hippopotamus

horsehorseshoe.bat

human

Japanese.mole

Japanese.pipistrelle

little.red.flying.fox

long−eared.hedgehog

mouse

nine−banded.armadillo

North.American.opossumnorthern.brown.bandicoot

pig

polar.bear

rat

Ryukyu.flying.fox

sheep

silver−gray.brushtail.possum

sperm.whale

Sumatran.orangutan

vole

wallaroo

white.rhinoceros

78

93

91

97

83

76

99

79

97

69

74

85

87

76

85

9999

9984

89

98/75

94/28

92/14

97/54

98/5534/11

40/54

94/23

94/30

Eurasian.red.squirrel (8)

guinea.pig (12)

rabbit (8)

aardvark

African.elephant

alpaca

cape.hyrax

cow

dog

European.mole

fin.whale

greater.Japanese.shrew−mole

greater.moonrathedgehog

Indian.rhinoceros

Jamaican.fruit−eating.batlong−clawed.shrew

longtailed.bat

northern.tree.shrew

platypus

slow.loris

tenrec1

white−fronted.capchin

●●●●●

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●

●●●●●●●●●●●

●●●●●●●●●●

0 10 20 30 40 50 60

4060

8010

0

Nodes

BS

sco

res

(%)

●●

● ●

● ●

M. Mariadassou (INRA MIG) June 11 19 / 25

Page 27: Influence Function for Robust Phylogenetic Reconstruction

Influential Species (mostly in Afrotheria)American.pika

1

1

Eurasian.red.squirrel

●●

1

11

1

fat.dormouse

1

1

greater.cane.rat

1

1

1

guinea.pig

1

1

1

1

1

1

mouse

●●

11

1

nine−banded.armadillo

1

1

rabbit

●●

1

11

1

M. Mariadassou (INRA MIG) June 11 20 / 25

Page 28: Influence Function for Robust Phylogenetic Reconstruction

Data: Bilaterian Transcription Factor T-Box

T-box TF of 164 metazoans, involved in gastrulation;

Amino Acids sequences;

Sequences are 296 AA long;

LG + I + Γ4 model;

Ancient family with 8 subfamiliesBrachyuryTbx1/10Tbx15/18/22Tbx20Tbx2/3Tbx4/5Tbx6/VegTEomes/Tbr1/Tbx21

Interest lies in the position of TF OlTbx present in a Oscarellalobularis.

M. Mariadassou (INRA MIG) June 11 21 / 25

Page 29: Influence Function for Robust Phylogenetic Reconstruction

AaTbx6a

AaTbx6b

AgXP31467

AgXP31592

AgXP32060

AgXP32179

AgXP55771

AmphiEomes

AmphiTbx1

AmphiTbx2

AmphiTbx6

AmqTbx115

AmqTbx45

AmqTbxA

AmqTbxBAmqTbxC

AmqTbxE

ApBra

ApTbrain

AtH15−1

AtH15−2

AtOptomot

AtTbx1

Avtbx115

BfTbx1518

Cap1241000

Cap190074

Cap1960038

Cap2380027

Cap2960003

Cap7420011

Cape gw1 7

CiBra

CiTbx1

CiTbx1518CiTbx20

CiTbx23

CiTbx6aCiTbx6b

CiTbxA

Dmbi

Dmbyn

DmCG6634P

DmDoc1DmDoc3

DmH15

Dmorg1

DrEomesod

Drnotail

Drsimilar

DrT−box15

DrT−box18

DrT−box24

Drtbx1

Drtbx16

Drtbx2

Drtbx20

Drtbx4

Drtbx5

Drtbx6

GgTbx6

HeBraHmBra

HmTbxAHmTbxB

HmTbxC

HpBra

HrTbxA

HrTbxB

HrTbxCHrTbxD

HrTbxG

HrTbxH

HrTbxI

HrTbxJ

HrTbxK

HrTbxL

HrTbxM

HrTbxN

HrTbxO

HrTbxP

HybraB

LgTbr

LgTbxA

LgTbxBLgTbxC

LgTbxD

LgTbxE

LgTbxF

LgTbxG

LgTbxH

LgTbxI

LgTbxJ

LvBra

LvTbx23

Mlbra

MlTbx1

MltbxC

MltbxD

MltbxE

MmEomes

MmT

MmTbr1

MmTbx1MmTbx10

MmTbx15

MmTbx18

MmTbx19

MmTbx2

MmTbx20

MmTbx21

MmTbx22

MmTbx3

MmTbx4

MmTbx5

MmTbx6

MtTbx6

NvBra

Nvtbx1

NvTbx15b

NvTbx20

NvTbx23

NvTbxA

NvTbxB

NvTbxC

NvTbxD

NvTbxE

OlTbx

OmBra

PcBra

PcTbx45

Pdbra

Plbra

PpBra

PpTbx23

PvBra

SdBra

SdTbx2

SgTbx6aSgTbx6b

SpTBR1

SpTbx

SpTbx2

SpTbx2b

SrBra

TaBra

TaTbx23

TaTbxB

TaTbxDTaTbxE

Tcdorsocr

TcTbox20

XBRA

Xleomes

Xltbx1

Xltbx2

Xltbx20

Xltbx3

Xltbx5

Xltbx6

XlTbxAXlTbxB

XlTbxC

XlVEGT

880.96

920.98

981.00

710.93

840.76

810.99

680.98

880.98

750.99

880.98

520.78

851.00

620.97 99

1.00

790.98

871.00

780.97

540.92

730.99

1001.00

991.00 61

0.66

920.99

940.84

910.6

720.92

550.99

620.94

990.99

880.99

961.00

740.97

660.75

920.99

530.9279

0.99

780.99

1001.00

1001.00

1001.00

830.99

1000.64 89

0.98

1001.00

1001.00

1000.99

850.74

901.00

971.00

981.00

800.81

780.97

750.98

540.83

930.99

991.00

991.00

740.69

360.73

330.5

510.8

820.99

981.00

760.94

470.46

951.00

921.00

971.00

460.94

981.00

640.83

330.57

1001.00

390.53

770.85

981.00

540.95

1001.00

Tbr1/E

omes

Brachyury

Tbx2/3

Tbx6/D

oc Tbx6/V

egT

Tbx4/5

Tbx20

Tbx15/18/22

Tbx1

0.5

AaTbx6aAaTbx6b

AgX

P31

467

AgXP31592

AgXP32060

AgXP32179

AgX

P55

771

AmphiEomes

AmphiTbx1

AmphiTbx2

AmphiTbx6

AmqTbx115

Am

qTbx45

Am

qTbx

AA

mqT

bxB

Am

qTbx

CA

mqT

bxE

ApBra

ApTbrain

AtH

15−1

AtH

15−2

AtOptomot

AtTbx1 Avtbx115

BfTb

x151

8

Cap1241000

Cap190074

Cap1960038

Cap2380027

Cap2960003

Cap

7420

011

Cape gw

1 7

CiBra

CiTbx1

CiT

bx15

18

CiT

bx20

CiTbx23

CiTbx6aCiTbx6b

CiTbxA

Dmbi

Dm

byn

Dm

CG

6634

P

DmDoc1DmDoc3

Dm

H15

Dmorg1

DrEomesod

Drnotail

Drsim

ilar

DrT−b

ox15

DrT−box

18

DrT−box24Drtbx1

Drtbx16

Drtbx2

Drtbx20

Drtbx4

Drtbx5

Drtbx6

GgTbx6

HeB

raH

mB

ra

HmTbxAHmTbxB

HmTbxC

HpBra

HrTbx

A

HrTbxB

HrT

bxC

HrT

bxD

HrTbx

G

HrTbxH

HrT

bxI

HrT

bxJ

HrT

bxK

HrTbxL

HrTbxM

HrTbxN

HrTbxOHrTbxP

HybraB

LgTbr

LgTbxA

LgTbx

B

LgTbx

C

LgTbxD

LgT

bxE

LgTbxF

LgTbxG

LgTbxH

LgT

bxI

LgT

bxJ

LvBra

LvTbx2

3

Mlbra

MlTbx1

Mltb

xC

MltbxD

MltbxE

MmEomes

Mm

T

MmTbr1

MmTbx1MmTbx10

Mm

Tbx1

5

MmTbx18

Mm

Tbx19

MmTbx2

Mm

Tbx

20

MmTbx21

MmTbx22

MmTbx3

Mm

Tbx4M

mT

bx5

Mm

Tbx6

MtTbx6

NvB

ra

Nvtbx1

NvTbx1

5b

NvT

bx20

NvT

bx23

NvTbxA

NvT

bxB

NvT

bxC

NvT

bxD

NvTbxE

OlTbx

Om

Bra

PcBra

PcT

bx45

PdbraPlbra

PpB

ra

PpTb

x23

PvBra

SdB

ra

SdT

bx2

SgT

bx6a

SgT

bx6b

SpTBR1

SpTbx

SpT

bx2

SpTbx2b

SrB

ra

TaBra

TaTb

x23

TaTbxB

TaTbxD

TaT

bxE

Tcdorsocr

TcTb

ox20

XB

RA

Xleomes

Xltbx1

Xltbx2

Xltbx20

Xltbx3

Xltbx5

Xltbx6

XlTbx

A

XlTbxB

XlTbxCXlVEGT

M. Mariadassou (INRA MIG) June 11 22 / 25

Page 30: Influence Function for Robust Phylogenetic Reconstruction

AgXP32060

AgXP32179

AgXP55771

AmphiEomes

AmphiTbx1

AmphiTbx2

AmphiTbx6

AmqTbx45

ApBra

ApTbrain

AtH15−1AtH15−2

AtOptomot

AtTbx1

BfTbx1518

Cap1241000

Cap190074

Cap1960038

Cap2380027

Cap2960003

Cap7420011

CiBra

CiTbx1

CiTbx1518CiTbx20

CiTbx23

Dmbi

Dmbyn

DmCG6634P

DmH15

Dmorg1

DrEomesod

DrnotailDrsimilar

DrT−box15DrT−box18

DrT−box24

Drtbx1

Drtbx16

Drtbx2

Drtbx20

Drtbx4

Drtbx5

Drtbx6

GgTbx6

HmBra

HpBra

HrTbxAHrTbxB

HrTbxGHrTbxH

HrTbxJHrTbxK

HrTbxN

HrTbxO

HrTbxP

HybraB

LgTbxA

LgTbxBLgTbxC

LgTbxD

LgTbxE

LgTbxF

LgTbxH

LgTbxI

Mlbra

MlTbx1

MltbxC

MltbxD

MmEomes

MmT

MmTbr1

MmTbx1MmTbx10

MmTbx15

MmTbx18

MmTbx19

MmTbx2

MmTbx20

MmTbx21

MmTbx22

MmTbx3

MmTbx4

MmTbx5

MmTbx6

NvBra

Nvtbx1

NvTbx15b

NvTbx20

NvTbx23

OlTbx

OmBra

PcBra

PcTbx45

Pdbra

Plbra

PvBra

SdTbx2

SpTBR1

SpTbx

SpTbx2b

TaBra

TaTbx23

TcTbox20

XBRA

Xleomes

Xltbx1

Xltbx2

Xltbx20

Xltbx3

Xltbx5

Xltbx6

XlTbxA

XlTbxB

XlTbxC

XlVEGT

990.52

740.5

46

42*

500.98

730.48

720.27

96*

91

93

760.28

660.8 85

0.99

740.5

540.56

621.00

6668

561.00

910.4

4858*

1000.97

84

740.92

100

55

94

1000.64

85

700.66

7882

0.46

530.6

10086

43 46

72*

76

431.00

420.87

42

44*

880.83

100*

98

99100

54

5696

980.97

100

5646

0.57

98

99

74

46

42

5073

0.95

720.6

960.54

910.72

930.99

76

660.64 85

0.78

741.00

540.72

621.00

6668

560.81

91

480.98 58

1000.99

840.98

740.94

1001.00 55

94

1000.69

24*

52*

980.49

41*

660.76

96

980.98

94

341.00

590.99

790.94

100

100

24*

520.44

98

410.38

66

960.44

980.99

940.92

980.73

96

98

Tbr1/E

omes

Brachyury

Tbx2/3

Tbx6/V

egTT

bx4/5

Tbx20

Tbx15/18/22

Tbx1

0.5

AgX

P32060

AgXP32

179

AgX

P55771

AmphiEomes

AmphiTbx1

Am

phiT

bx2

AmphiTbx6

AmqTbx45

ApB

ra

ApTbrain

AtH

15−1

AtH

15−2

AtOpt

omot

AtTbx1

BfT

bx15

18

Cap

1241

000

Cap190074

Cap1960038

Cap2380027

Cap

2960

003

Cap7420011

CiB

ra

CiTbx1

CiT

bx15

18

CiT

bx20

CiT

bx23

Dmbi

Dm

byn

Dm

CG

6634P

Dm

H15

Dmorg1

DrEomesod

Drnotail

Drsimilar

DrT

−box

15

DrT

−box

18

DrT−box24

Drtbx1

Drtbx16

Drtbx2

Drt

bx20

Drtbx4

Drtbx5

Drtbx6GgTbx6

HmBra

HpB

ra

HrT

bxA

HrT

bxB

HrT

bxG

HrT

bxH

HrT

bxJH

rTbxK

HrTbxN

HrTbxO

HrTbxP

HybraB

LgTbxA

LgTb

xBLg

TbxC

LgTb

xD

LgTbxE

LgTbxF

LgTbxH

LgTbxI

Mlbra

MlTbx1

MltbxCMltb

xD

MmEomes

Mm

T

MmTbr1

MmTbx1

MmTbx10

Mm

Tbx1

5

Mm

Tbx1

8

MmTbx19

MmTbx2

Mm

Tbx

20

MmTbx21

Mm

Tbx2

2

MmTbx3

MmTbx4

Mm

Tbx5

MmTbx6

NvBra

Nvtbx1

NvTbx

15b

NvT

bx20

NvTbx23

OlTbx

Om

Bra

PcB

ra

PcTbx45

Pdb

ra

Plbra

PvB

ra

SdTbx2

SpTBR1

SpTbx

SpTbx2b

TaBra

TaTbx23

TcT

box20

XBRA

Xleomes

Xltbx1

Xltbx2

Xltb

x20

Xltbx3

Xltbx5

Xltbx6

XlTbx

AX

lTbx

B

XlTbxC

XlVEGT

M. Mariadassou (INRA MIG) June 11 23 / 25

Page 31: Influence Function for Robust Phylogenetic Reconstruction

T-Box TF Phylogeny

Branch Depth

Boo

tstr

ap V

alue

0

20

40

60

80

100

0 20 40 60

●●

● ●

●●●

●●

●●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●●

127 Tbox

mean 164

loess 164

mean 127

loess 127

0 20 40 60

●●

● ●

●●●

●●

●●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●●

123 Tbox

mean 164

loess 164

mean 123

loess 123

0 20 40 60

●●

● ●

●●●

●●

●●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●●

116 Tbox

mean 164

loess 164

mean 116

loess 116

Number of Tbx in tree 164 127 123 116BP of OlTbx in clade TBX2/3 19 32 34 59

No sponge (yet) identified as a member of the Tbx2/3 subfamily: newevolutionary hypothesis.

M. Mariadassou (INRA MIG) June 11 24 / 25

Page 32: Influence Function for Robust Phylogenetic Reconstruction

Summary

Potential Sources of Instability and Indexes to Detect Them1 Data sampling: Bootstrap;

2 Outliers: Influence function for sites;

3 Rogue species: Taxon Influence Index.

Pros and Cons1 Assess uncertainty coming from different sources;

2 Highlight potential outliers;

3 No rigorous statistical threshold for inclusion (p-values,...)

M. Mariadassou (INRA MIG) June 11 25 / 25