Current Biology, Volume 27 Supplemental Information Phylogenomics and Morphology of Extinct Paleognaths Reveal the Origin and Evolution of the Ratites Takahiro Yonezawa, Takahiro Segawa, Hiroshi Mori, Paula F. Campos, Yuichi Hongoh, Hideki Endo, Ayumi Akiyoshi, Naoki Kohno, Shin Nishida, Jiaqi Wu, Haofei Jin, Jun Adachi, Hirohisa Kishino, Ken Kurokawa, Yoshifumi Nogi, Hideyuki Tanabe, Harutaka Mukoyama, Kunio Yoshida, Armand Rasoamiaramanana, Satoshi Yamagishi, Yoshihiro Hayashi, Akira Yoshida, Hiroko Koike, Fumihito Akishinonomiya, Eske Willerslev, and Masami Hasegawa

Current Biology, Volume 27

Supplemental Information

Phylogenomics and Morphology

of Extinct Paleognaths

Reveal the Origin and Evolution of the Ratites

Takahiro Yonezawa, Takahiro Segawa, Hiroshi Mori, Paula F. Campos, Yuichi Hongoh, Hideki Endo, Ayumi Akiyoshi, Naoki Kohno, Shin Nishida, Jiaqi Wu, Haofei Jin, Jun Adachi, Hirohisa Kishino, Ken Kurokawa, Yoshifumi Nogi, Hideyuki Tanabe, Harutaka Mukoyama, Kunio Yoshida, Armand Rasoamiaramanana, Satoshi Yamagishi, Yoshihiro Hayashi, Akira Yoshida, Hiroko Koike, Fumihito Akishinonomiya, Eske Willerslev, and Masami Hasegawa

Figure S1

Figure S1 | Maximum likelihood tree of Aves as inferred from concatenated sequences of multiple nuclear genes (nuc) and mitochondrial genomes (mt) (Related to Figure 1). The numbers on the branches indicate the bootstrap probabilities (nuc+mt/nuc/mt). Several nodes do not exist in some datasets because of missing taxa; the bootstrap probabilities for such nodes are indicated as NA (not applicable). “--” indicates a node not supported by the maximum likelihood tree of the respective dataset. This tree topology was used in estimating the divergence times, ancestral body (or egg) weights and geographic distributions, except that the basal position of Anhimidae within Anseriformes was assumed following previous studies [S1, S2], as shown in the sub-tree in the dashed rectangle. Although the sister relationship between Anhimidae and Anseranas was preferred by our analysis, it may be an analytical artefact caused by the very limited number of shared sites within Anseriformes in our supermatrix. The nodal numbers correspond to the numbers in Tables S2, S3 and S5. The black nodes were calibrated using fossil evidence for the divergence time estimations. The grey nodes were optionally calibrated to test hypotheses about the divergence times.

Figure S2

Figure S2 | Effects of taxon sampling and fossil calibrations on the divergence time estimates of

Palaeognathae (Related to Figure 1). The black branches indicate divergence time estimates without

any calibration between Neognathae and Palaeognathae. The time tree with blue branches was

calibrated with the fossil of Ichthyornis (Neognathae/Palaeognathae split was younger than 86.5 mya),

and the time tree with red branches was calibrated with the fossil of Enaliornis

(Neognathae/Palaeognathae split was younger than 100.5 mya). The images on the left indicate taxon

sampling. The first time tree was based on all taxa including ratites, tinamous, neognaths and reptiles

(outgroup). The second time tree was based on the Aves including ratites, tinamous and neognaths (no

root in this tree). The third time tree was based on ratites, tinamous and reptiles. The last time tree was

based on ratites, neognaths and reptiles.

Figure S3

Figure S3 | Reconstruction of the ancestral geographic distribution areas (Related to Figure 3

and Figure 4). The colours of the branches and the nodes indicate the geographic distribution areas.

Fossil species without molecular data are indicated by †. The species indicated by circles were used for

the reconstruction of the ancestral geographic distribution. The species indicated by squares are fossil

species and were not involved in the analysis. (A) Reconstructed ancestral geographic distribution

areas based on the 2 states (Northern Hemisphere and Southern Hemisphere) model by the parsimony

method. (B) Reconstructed ancestral geographic distribution areas based on the 7 states (Palearctic,

Nearctic, Afrotropical, South America, Australia, Madagascan, and Zealandia) model by the Bayesian

method. Since there are no paleognaths, including fossil species, in the Indomalay region, Indomalay was

merged into the Palearctic region. (C) Reconstructed ancestral geographic distribution areas based on the 5

states (Palearctic+Nearctic, Afrotropical, South America+Australia+Antarctica, Madagascan, and

Zealandia) model by the parsimony method. (D) Reconstructed ancestral geographic distribution areas

based on the 5 states (Palearctic+Nearctic, Afrotropical, South America+Australia+Antarctica,

Madagascan, and Zealandia) model by the Bayesian method. The regions in a pie chart on a node are

proportional to the posterior probabilities.

Table S1: List of ancient elephant bird samples analyzed in this study (Related to Figure 1)

Lab. No.  Location      Excavation age  Genus        Part             GLa    PBb    PDc    Date BP
07AEP05   Beloha                        Aepyornis    Tarsometatarsus  354.5  152.8  131.2  1580 ± 80
07AEP06   Beloha                        Mullerornis  Tarsometatarsus  292.9  74.8   73.6   1290 ± 90
07APE07   Belo-sur-Mer                  Mullerornis  Tarsometatarsus  306.3  78.4   73.2
07APE08   Beloha                        Mullerornis  Tarsometatarsus  292.2  71.9   70.6
07APE09   Beloha                        Mullerornis  Tarsometatarsus  339    81.6   78
07APE10   E25B                          Aepyornis    Tarsometatarsus  338.5  120.4  128.1
07APE11   Antseirab                     Mullerornis  Tarsometatarsus  267.8  98.7   94.9
08AEP07   Beloha        1914            Aepyornis    Tibiotarsus      618    591    472    1581 ± 23*
08APE01   Antsirabe     1914            Aepyornis?   Tarsometatarsus  261.8  100.1  98.2
08APE03   ?                             Aepyornis    Tarsometatarsus  334    132.7  --

Index refers to the index used while constructing Illumina DNA libraries.

14C dates were determined by AMS at the MALT facility, the University of Tokyo.

*Graphite preparation was performed at the 14C Dating Laboratory, the University Museum, the University of Tokyo and 14C dates were measured at Paleo Lab Co., Ltd.

aGL: Greatest length (mm)

bPB: Breadth of the proximal end (mm)

cPD: Breadth of the distal end (mm)

Table S2: Fossil calibrations used for the divergence time estimations (related to Figure 1)

Nodal number | Divergence event | Maximum time | Minimum time | Reference

node23 Crocodilia - Aves 259 mya 243 mya Haddrath & Baker [S3], Benton et al. [S4], Muller & Reisz [S8]

node12 Casuarius - Dromaius 35 mya 25 mya Haddrath & Baker [S3], Boles [S6]

node24 Procellariiformes - Sphenisciformes 62 mya 60 mya Haddrath & Baker [S3], Slack et al. [S12]

node25 Rostratulidae - Jacanidae* 32 mya 30 mya Haddrath & Baker [S3], Rasmussen et al. [S11]

node26 Anatidae - Anseranatidae 68 mya 66 mya Haddrath & Baker [S3], Clarke et al. [S7]

node27 Caiman - Alligator 71 mya 66 mya Haddrath & Baker [S3], Muller & Reisz [S8]

node1 Neognathae - Palaeognathae 100.5 mya Bell & Chiappe [S5], O’Connor [S9]

86.5 mya Benton et al. [S4]

node14 Dinornithidae - Tinamidae 66 mya Parris & Hope [S10]

Nodal numbers correspond to those in Figure S1.

Basic calibrations

Optional calibrations

Optional calibrations were used for evaluating the stability of the divergence times and for testing the evolutionary hypothesis.

*This calibration was originally used for the Scolopacidae - Jacanidae split [S3]. However, since this calibration is based on the oldest fossil of Jacanidae, it should be applied to the split between Jacanidae and its closest relative (Rostratulidae in this case) rather than to the Scolopacidae - Jacanidae split. To evaluate the appropriateness of this calibration, we optionally excluded it, but this had only a limited effect on the estimates (see Table S3).

Table S4: Species identification of the eggshells from Madagascar (related to Figure 2)

The numbers of base differences between the nucleotide sequences from egg shells and those of known species are shown.

All ambiguous sites were removed from this analysis.

                                          Sequences from egg shell
Reference sequence                        Aepyornis GU799601  Aepyornis GU799600  Mullerornis GU799591
Aepyornis maximus 07AEP05 (this study)    0                   1                   10
Aepyornis maximus 08AEP07 (this study)    0                   1                   10
Aepyornis hildebrandti KJ749824           1                   2                   10
Mullerornis 07AEP06 (this study)          5                   6                   0
Mullerornis agilis AY016018               5                   6                   0
Mullerornis agilis KJ749825               5                   6                   0
Apteryx haastii AF338708                  12                  13                  19
Apteryx owenii GU071052                   13                  14                  20
Casuarius casuarius AF338713              14                  15                  13
Dromaius novaehollandiae AF338711         14                  13                  14
Dinornis giganteus AY016013               9                   8                   16
Anomalopteryx didiformis NC_002779        10                  9                   16
Emeus crassus AY016015                    9                   8                   17
Eudromia elegans AF338710                 14                  14                  24
Tinamus major AF338707                    15                  14                  23
Rhea americana AF090339                   8                   9                   15
Pterocnemia pennata AF338709              7                   8                   16
Struthio camelus AF338715                 15                  14                  16
Total sequence length                     82 bp               82 bp               230 bp
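The counts in Table S4 are simple pairwise base differences with ambiguous sites removed. A minimal sketch of such a count is given below. It assumes the two sequences are already aligned to the same length, and it assumes that "ambiguous" means any symbol other than A, C, G or T (including gaps); the function name and the example sequences are hypothetical and are not taken from the study.

```python
# Sketch only: counts differing sites between two pre-aligned sequences,
# skipping every column in which either sequence is not A, C, G or T.
UNAMBIGUOUS = set("ACGT")


def count_differences(seq_a, seq_b):
    """Return (number of differing sites, number of compared sites)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in UNAMBIGUOUS or b not in UNAMBIGUOUS:
            continue  # ambiguous base or gap: column is ignored
        compared += 1
        if a != b:
            differences += 1
    return differences, compared


# Hypothetical toy sequences, not data from the study.
eggshell = "ACGTNACGTA"
reference = "ACGTTACCT-"
result = count_differences(eggshell, reference)
```

Returning the number of compared sites alongside the differences makes it easier to see how short the comparison is, which matters for fragments of 82 bp or 230 bp.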

Table S3: Stability of the estimated divergence times (Related to Figure 1) [separate file]

Table S5: Body and egg weights of the extant and ancestral Aves (Related to Figure 2 and Figure 4) [separate file] [S13-S25]

Supplemental Experimental Procedures

Ancient DNA

Information on the specimens

The sub-fossils of elephant birds used for this study were kept in the specimen room of the

paleontological laboratory of the Department of Biological Anthropology and Palaeontology, Faculty of

Science, University of Antananarivo (Antananarivo, Madagascar). We transferred the specimens between

27 and 30 August 2007 and between 16 and 18 September 2008 with the permission of the government of

Madagascar. Although detailed information on the sampling area and date is not archived, locality names

and numbers were written on the surface of the bones. For example, bones inscribed with Beloha or

Antsirabe and 1914 were assumed to have been collected at Beloha or Antsirabe in 1914. The specimens

from Beloha were morphologically classified into Aepyornis maximus (abbreviated as AEP) and

Mullerornis sp. (abbreviated as MUL). To reduce the possibility of sampling more than once from the

same individual, only the tarsometatarsus was used for the analysis, except for specimen 08AEP07, for

which the tibiotarsus was used. Approximately 0.4 g of bone was sampled from a section of

tarsometatarsus or tibiotarsus with a Volvere GX dental drill (Nakanishi) using a diamond disk55 dental

disk (Shofu Inc.), and the outermost layers were abraded to remove contaminating material. Detailed

information on these specimens is summarized in Table S1. Dates before present (BP) of the AEP and

MUL bone samples were measured by 14C analysis at the laboratory of carbon dating, University

Museum, University of Tokyo (07AEP05, 1580 ± 80 BP; 07AEP06, 1290 ± 90 BP; 08AEP07, 1581 ± 23

BP; Table S1).

DNA extraction from fossil species

DNA extraction from 10 ancient elephant bird bone samples (Table S1) was performed at the Centre for

GeoGenetics, Denmark. DNA extraction was performed as described previously [S26]. In brief,

approximately 0.15 g of bone powder generated by drilling was incubated overnight at 55°C in 1 ml of

buffer containing 0.5 M EDTA (pH 8.0) (Life Technologies) and 0.1 mg/ml proteinase K (Roche). After

centrifugation at 470 relative centrifugal force for 5 minutes at room temperature, supernatant was

transferred to 30-kDa Amicon filter units (Millipore) and centrifuged at 4000 g. Approximately 250 µl of

concentrate was recovered from each sample and further purified using a MinElute PCR Purification Kit

(Qiagen) following the manufacturer’s instructions, except that in the elution step the spin column was

incubated with 45 μl of EB buffer at 37°C for 10 minutes. The same procedure without bone powder was

conducted as a negative control and the resulting eluate was examined to identify contaminating DNA.

Genome library construction

A-tailed libraries were prepared with 16 μl of DNA sample using a NEBNext Quick DNA Library

Prep Master Mix Set for 454 (E6090; New England BioLabs) according to the manufacturer’s protocol but

without the DNA fragmentation process. Illumina sequencing adapters were added to the end-repair

reaction at a final concentration of 0.25 μM together with 1 unit of Quick T4 DNA ligase. DNA fragments

in the reaction mixture were purified using the MinElute PCR Purification Kit (Qiagen) and eluted in 20

μl of EB buffer at 37°C for 10 minutes.

Three microlitres of the eluate were subjected to PCR amplification in 50 μl of reaction mixture

containing 1× High Fidelity PCR buffer, 2 mM MgSO4, 0.2 mM of each dNTP, 1 unit Platinum Taq

DNA polymerase High Fidelity (Life Technologies), 0.2 μM of the Illumina Multiplexing PCR primer

InPE1.0 and 0.004 µM of InPE2.0, and 0.2 µM of PCR primer with an index [S27]. PCR conditions

consisted of initial denaturation for 4 minutes at 94°C, 12 cycles of 30 seconds at 94°C, 30 seconds at

60°C and 45 seconds at 68°C, and a final extension for 7 minutes at 72°C. Three independent reaction

mixtures were combined after PCR and purified using a NucleoSpin Gel and PCR Clean-up Kit

(Macherey-Nagel) with a 10 minute incubation at 37°C for the elution step. A second-round PCR consisting

of 18 cycles was performed under the same conditions except that only a modified InPE1.0 primer and the

indexed primer were used. The PCR products were separated by 3% agarose gel electrophoresis and

purified as described above.

Blunt-end libraries were constructed according to Orlando et al. [S28], using 21.25 µl of DNA

sample and the NEBNext DNA Library Prep Master Mix Set for 454 (E6070; New England BioLabs).

DNA purification after adapter ligation was performed as described above. PCR conditions were the same

as above except that 10 instead of 12 cycles were used in the second-round PCR.

Illumina sequencing

Single-end and paired-end reads were generated on an Illumina MiSeq platform using the MiSeq Reagent

Kit v2 or v3 (Illumina) at the NIPR. Read files (fastq.gz) were generated using MiSeq Reporter software

version 2.3.32 (Illumina). As a result, raw sequence reads (208,414,418 for Aepyornis maximus and

176,042,008 for Mullerornis sp.) were generated.

PCR amplification and Sanger sequencing of D-loop and tRNA (Pro-Thr) regions of mitochondrial

genomes

Mitochondrial genome regions not recovered by Illumina sequencing were amplified by PCR using

the primers shown in DataBaseTable2 (http://aepyornis.paleogenome.jp/) and the p5 or p7 region of the

Illumina adapter sequences. PCR amplification was performed in 25 μl of reaction mixture containing 1 μl

of genome library DNA, 1.25 U Platinum Taq DNA polymerase High Fidelity (Life Technologies), 1×

High Fidelity PCR buffer, 2 mM MgSO4, 0.25 mM of each dNTP and 0.25 µM of each primer. PCR

conditions were as follows: 3 minutes of initial denaturation at 94°C, 30 cycles of 45 seconds of

denaturation at 94°C, 45 seconds of annealing at 53°C, and 60 seconds extension at 68°C, with a final

extension for 10 minutes at 72°C. A control reaction without the DNA template was performed for each

PCR attempt. PCR products were sequenced using a BigDye Terminator Cycle Sequencing Kit v3.1 and

an ABI 3130xl Genetic Analyser (Applied Biosystems).

Bioinformatics

Quality filtering of Illumina sequence data

We discarded the Illumina MiSeq reads that contained ambiguous nucleotides or that mapped to the PhiX genome sequence using Bowtie 2 version 2.1.0 [S29] with default parameters. We then removed the adapter sequences from the reads using Cutadapt version 1.2.1 [S30] and trimmed low-quality regions (Phred-like quality score <17) from the 3′ ends of the reads. In addition, we discarded the reads that were shorter than 50 bp or had an average Phred-like quality score <25.
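The thresholds above can be illustrated with a small sketch. It is not the Cutadapt or Bowtie 2 implementation: the 3′ trimming here simply strips trailing bases whose quality is below the cutoff, which is an assumption, and it is also assumed that the length and mean-quality filters are applied after trimming. Adapter removal and PhiX mapping are not covered. All function names are hypothetical.

```python
# Sketch only: applies the stated thresholds (3' trim at Q<17, minimum
# length 50 bp, minimum mean quality 25) to one read held in memory.

def trim_3prime(seq, quals, min_q=17):
    """Strip trailing bases whose Phred-like quality is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def passes_filters(seq, quals, min_len=50, min_mean_q=25):
    """Reject reads with ambiguous bases, short reads and low-quality reads."""
    if "N" in seq.upper():
        return False
    if len(seq) < min_len:
        return False
    return sum(quals) / len(quals) >= min_mean_q


def clean_read(seq, quals):
    """Return the trimmed (seq, quals) pair, or None if the read is discarded."""
    seq, quals = trim_3prime(seq, quals)
    return (seq, quals) if passes_filters(seq, quals) else None
```

For example, a 70-base read whose last 10 bases have quality 5 would be trimmed to 60 bases and kept, whereas a 40-base read would be discarded regardless of its quality.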

Identification of Aepyornis maximus and Mullerornis sp. reads

The MiSeq reads that were derived from Aepyornis maximus or Mullerornis sp. genomes were identified

as follows. (i) An in-house nucleotide database comprising the genome sequences of paleognaths and their

relatives (designated as in-house Palaeognathae DB) was constructed by combining previously reported

sequence data [S3, S31-32] (DataBaseDataset1: http://aepyornis.paleogenome.jp/ ). (ii) All of the high

quality reads were subjected to BLASTN searches against the in-house Palaeognathae DB with E-value

<0.001. (iii) The reads that matched the sequences in the in-house Palaeognathae DB were subjected to

BLASTN searches against the GenBank-nt database (January 2014) with E-value <1e−4. (iv) The reads

that matched the sequences from birds or reptiles in the GenBank-nt database were regarded as genome

fragments of Aepyornis maximus or Mullerornis sp.

Analyses of DNA fragmentation and nucleotide misincorporation patterns

Although the MiSeq sequencing conditions used in this study can generate reads with >250 bp, the

average and median of the actual lengths of the reads identified as Aepyornis maximus or Mullerornis sp.

mitochondrial sequences were only 75–89 bp (DataBaseTable3: http://aepyornis.paleogenome.jp/ ).

We used mapDamage version 0.3.3 [S33] to analyse DNA fragmentation and nucleotide

misincorporation patterns across all of the identified mitochondrial reads from four samples (two from

Aepyornis maximus and two from Mullerornis sp.). In each sample, the reads were mapped to the

Aepyornis maximus or Mullerornis sp. mitochondrial genome sequence with BWA version 0.6.1 [S34],

and analysed using mapDamage [S33] with parameters (-l 300 -u).

The patterns of DNA fragmentation and nucleotide misincorporation analysed using mapDamage are shown in DataBaseFigure5 (http://aepyornis.paleogenome.jp/). The results showed patterns typical of ancient DNA: (a) an increase in misincorporation of thymine instead of cytosine residues at the 5′-end regions of the reads, paralleled by misincorporation of adenine instead of guanine residues at the 3′-end regions, and (b) a higher frequency of guanine and adenine at the nucleotide site adjacent to the 5′-end of the reads. This suggests that the former reflects cytosine deamination and that the latter reflects strand breaks resulting from depurination [S35].
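The first of these signals, the excess of C-to-T changes near the 5′ ends of reads, can be illustrated with a toy calculation. The sketch below is not mapDamage and does not reproduce its output. It assumes each read has already been aligned without gaps to the matching stretch of the reference, and it takes these pairs as plain strings; the function name and the example pairs are hypothetical.

```python
# Sketch only: for each of the first max_pos positions from the 5' end,
# report the fraction of reference cytosines that are read as thymine.

def c_to_t_by_position(pairs, max_pos=5):
    """pairs: iterable of (reference_segment, read) strings of equal length."""
    c_sites = [0] * max_pos
    c_to_t = [0] * max_pos
    for ref, read in pairs:
        for i in range(min(max_pos, len(ref), len(read))):
            if ref[i] == "C":
                c_sites[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    # None marks positions where no reference cytosine was observed.
    return [c_to_t[i] / c_sites[i] if c_sites[i] else None
            for i in range(max_pos)]


# Hypothetical toy alignments, not data from the study.
toy_pairs = [("CACGT", "TACGT"), ("CCGTA", "CTGTA"), ("CAGTC", "CAGTC")]
frequencies = c_to_t_by_position(toy_pairs)
```

In authentic ancient DNA the frequency would be expected to be highest at the first position and to decay towards the interior of the read; the mirror-image G-to-A pattern at the 3′ end could be tallied the same way on reversed strings.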

Mitochondrial genome reconstruction

MiSeq reads identified as mitochondrial genome fragments of Aepyornis maximus or Mullerornis sp.

were used for the reconstruction of the nearly complete mitochondrial genome sequences of each species

by the following procedure: (i) We conducted the initial assembly by combining the identified

mitochondrial reads from two samples per species and mapping them against the mitochondrial genome

sequence of Dromaius novaehollandiae (AF338711) using MIRA version 4.0 [S36] with parameters

(-NW:cnfs=warn -NW:cmrnl=no -AS:nop=1 -SK:bph=10 -CL:pecbph=20 SOLEXA_SETTINGS -CO:msr=no

-AS:epoq=no -AS:mrpc=2 -SK:pr=80 -AL:ms=10 -AL:mo=15). (ii) An iterative refinement

of the assembly was performed using MITObim version 1.7 [S37] with a maximum of 20 rounds of

iteration. (iii) For each gap region on the mitochondrial genome, candidate reads that filled the gap were

identified by a BLASTN search using all of the high quality reads before taxonomic assignment and

Sanger sequencing reads obtained as described above, as queries against the initial assembly of the

mitochondrial genome sequence with E-value <0.1, identity >80%, and alignment length >20 bp. (iv) To

remove partially matched false positive reads, we constructed a multiple alignment of the candidate reads

and the initial assembly of the mitochondrial genome sequence using MAFFT version 7.13 [S38] with

parameters (--localpair --maxiterate 1000). (v) The reads that overlapped gap regions and almost

completely aligned with the initial assembly of the mitochondrial genome sequence were used for manual

gap filling. The assembled sequences of Aepyornis maximus and Mullerornis sp. were included in the

phylogenomic analyses. Because coverage of the Aepyornis maximus and Mullerornis sp. genomes is not

extensive, their ancestral mitochondrial genome sequences were reconstructed from the combined data of

two individuals for each species using PAML ver. 4.7 [S39] with the codon substitution+Γ model for

protein coding genes and the GTR+Γ model for RNA genes and introns. The composite genomes of

Aepyornis maximus and Mullerornis sp. were aligned with the nucleotide sequences of other avian species

and reptilian species.
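The candidate search in step (iii) of the procedure above amounts to filtering BLASTN hits by three thresholds. The sketch below assumes the hits are available in the standard 12-column tabular BLAST output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The additional test that a hit lies near a gap, together with the `flank` parameter and all function names, is a hypothetical illustration and is not described in the procedure.

```python
# Sketch only: keeps BLASTN hits with E-value < 0.1, identity > 80% and
# alignment length > 20 bp whose subject interval lies at or near a gap.

def parse_tabular_hit(line):
    """Parse one line of 12-column tabular BLAST output into a dict."""
    f = line.rstrip("\n").split("\t")
    return {
        "query": f[0],
        "subject": f[1],
        "identity": float(f[2]),
        "length": int(f[3]),
        "sstart": int(f[8]),
        "send": int(f[9]),
        "evalue": float(f[10]),
    }


def is_gap_candidate(hit, gap_start, gap_end, flank=10,
                     max_evalue=0.1, min_identity=80.0, min_length=20):
    """True if the hit passes the thresholds and reaches the gap window."""
    if not (hit["evalue"] < max_evalue
            and hit["identity"] > min_identity
            and hit["length"] > min_length):
        return False
    lo, hi = sorted((hit["sstart"], hit["send"]))  # handles reverse-strand hits
    # Assumed criterion: the hit overlaps the gap extended by `flank` bases.
    return lo <= gap_end + flank and hi >= gap_start - flank
```

Reads passing such a filter would still need the alignment check and manual inspection described in steps (iv) and (v), since the thresholds are deliberately permissive.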

The numbers of MiSeq raw reads and high quality reads and the results of automated identification

of the Aepyornis maximus and Mullerornis sp. reads are presented in DataBaseTable4, DataBaseTable5

and DataBaseTable6 (http://aepyornis.paleogenome.jp/). The total number of contigs and

singleton reads for each nuclear gene after manual sequence validation is presented in DataBaseTable7

(http://aepyornis.paleogenome.jp/). The assembly results of Aepyornis maximus and Mullerornis sp.

mitochondrial genomes are summarized in DataBaseTable8 (http://aepyornis.paleogenome.jp/). The

Aepyornis maximus and Mullerornis sp. mitochondrial genome sequences were deposited with accession

numbers AP014697 and AP014698, respectively. The nucleotide alignments are available from

http://aepyornis.paleogenome.jp/.

Reconstruction of nuclear genes

MiSeq reads identified as a part of nuclear genes of Aepyornis maximus or Mullerornis sp. were used

for the reconstruction of each gene sequence as follows. (i) The identified reads were independently

assembled for each gene using CAP3 software [S40]. (ii) Assembled contigs and singletons were multiply

aligned with reference gene sequences in the in-house Palaeognathae DB using MAFFT [S38] with

parameters (--adjustdirectionaccurately --genafpair --maxiterate 1000). (iii) Partially matched false

positive reads were manually removed by checking the multiple alignment result with the UGENE viewer

[S41]. (iv) Each gene sequence of Aepyornis maximus and Mullerornis sp. was manually reconstructed

from the multiple alignment result. The minimum read depth required for reconstruction of a nuclear

sequence was one read.

As shown in DataBaseTable5, the numbers of high quality MiSeq reads are 187,903,977 (Aepyornis maximus in total) and 150,210,376 (Mullerornis sp. in total). The numbers of MiSeq reads identified as nuclear gene (locus) sequences are shown in DataBaseTable6: 30,174 reads were identified as Aepyornis maximus and 29,134 reads as Mullerornis sp. Therefore, 0.016% (Aepyornis maximus) to 0.019% (Mullerornis sp.) of the high quality reads were identified as endogenous DNA. Since our in-house database (DataBaseTable9) consists of 871,499 sites (≈0.87 Mbp), if the avian genome size is assumed to be 1 Gbp, our in-house database covers about 1/1,250 of the whole genome. Scaling the observed fractions by this factor, theoretically about 20% of the MiSeq reads are endogenous DNA. However, considering that the reconstructed genome fragments of Aepyornis maximus and Mullerornis sp. cover only 89,203 sites (10.2% of the in-house database) and 49,480 sites (5.68% of the in-house database), respectively, it is plausible that the portion of endogenous DNA is much smaller.
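The arithmetic behind these percentages can be retraced with the read and site counts quoted in this paragraph. The sketch below uses only those figures and the assumed genome size of 1 Gbp; the variable and function names are hypothetical. Dividing 871,499 by 1 Gbp directly gives a database fraction of roughly 1/1,150, slightly larger than the rounded 1/1,250 quoted in the text, and the extrapolated endogenous fractions then come out at about 18% and 22%, in line with the stated "about 20%".

```python
# Sketch only: recomputes the observed and extrapolated endogenous fractions
# from the counts quoted in the text.
HQ_READS = {"Aepyornis maximus": 187_903_977, "Mullerornis sp.": 150_210_376}
NUCLEAR_READS = {"Aepyornis maximus": 30_174, "Mullerornis sp.": 29_134}
RECOVERED_SITES = {"Aepyornis maximus": 89_203, "Mullerornis sp.": 49_480}
DB_SITES = 871_499            # size of the in-house database
GENOME_SIZE = 1_000_000_000   # assumed avian genome size (1 Gbp)


def summarise(species):
    """Return (observed fraction, extrapolated fraction, database coverage)."""
    observed = NUCLEAR_READS[species] / HQ_READS[species]
    db_fraction = DB_SITES / GENOME_SIZE
    extrapolated = observed / db_fraction   # scale up to the whole genome
    coverage = RECOVERED_SITES[species] / DB_SITES
    return observed, extrapolated, coverage


for name in HQ_READS:
    obs, extra, cov = summarise(name)
    print(f"{name}: observed {obs:.3%}, extrapolated {extra:.1%}, "
          f"database coverage {cov:.2%}")
```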

Evolutionary Analyses

Molecular phylogenetic inference

Our data consisted of mitochondrial genomes (13 protein-coding regions, 2 ribosomal RNA

genes, 22 transfer RNA genes: 15,977 bp in total), and multiple nuclear genes including the following

three datasets: those of Haddrath and Baker [S3] (10 nuclear loci; exons: 9,888 bp), Harshman et al.

[S32](19 nuclear loci; exons and introns: 26,775 bp) and Smith et al. [S42] (40 nuclear loci; introns:

25,131 bp). These sequences were separately aligned using MAFFT [S38] and MUSCLE [S43], and were

carefully checked visually. Detailed information on these genes is presented in DataBaseTable9

(http://aepyornis.paleogenome.jp/). The alignment was deposited on the web site

http://aepyornis.paleogenome.jp/. Taking account of the differences in taxon sampling, the partitions of

the mitochondrial genomes and the three nuclear gene datasets were analysed separately. Moreover, the

exon regions and intron regions in the dataset of Harshman et al. [S32] were dealt with separately. Then,

the best partition among loci within each data set was chosen using PartitionFinder [S44] under the

GTR+I+Γ model with the unlinked branch length model. The phylogenetic tree was inferred by the

maximum likelihood method using RAxML ver. 7.8.1 [S45]. The MTMAM+F+I+Γ model was used for

the amino acid sequences of protein-coding genes of the mitochondrial genomes. The GTR+I+Γ model

was used for the other regions. Taking account of the difference among codon positions, the first, second

and third positions of nuclear exons were partitioned. The confidence limits of the internal branches were

evaluated using the rapid bootstrap algorithm. The branch lengths of each partition were independently

estimated (unlinked branch length model).

The maximum likelihood trees as inferred from mitochondrial genomes (mt), multiple nuclear

genes (nuc), and combined sequence data (mt+nuc) are shown in Figure S1. The mt and nuc data support

an essentially identical topology within the Palaeognathae. A sister group relationship of the elephant bird

and the kiwi was supported by the mitochondrial genome [S46], and also by the combined analysis of

multiple nuclear genes and the mitochondrial genome, although the support value from the nuclear genes

alone was not high and was dependent on the model (partition model) and taxon sampling (data not

shown). The maximum likelihood tree as inferred from the combined sequence data (mt+nuc) gives strong

support to the basal position of the ostrich (100% BP). The sister group relationships between the elephant

bird and the kiwi (99% BP), between the cassowary and the emu (100% BP) and between the tinamou

and the moa (100% BP) were strongly supported, and that between the elephant bird–kiwi clade and the

cassowary–emu clade was moderately supported (80% BP). However, the position of the rhea was only

weakly supported (33% BP), and was highly affected by the model and taxon sampling (data not shown).

Morphological phylogenetic inference

To clarify the phylogenetic positions of the fossil paleognaths that became extinct in the

Palaeogene and the Neogene, such as Lithornis, Palaeotis and Emuarius, we reanalysed the morphological

data matrix created by Mayr [S47]. Mitchell et al. [S46] and Mayr [S47] carried out phylogenetic analyses

of the Palaeognathae based on total evidence from molecular and morphological data, and identified a

monophyletic relationship between Lithornis and the Tinamiformes. However, according to the results of

the pseudo-extinct analysis by Springer et al. [S48], in which they treated particular extant taxa (for

example, Carnivora and Afrotheria in Eutherian mammals) as if extinct and assumed that only osteologic

morphology data were available, phylogenetic analyses based on total evidence often failed to reconstruct

the well-established tree.

The major difficulty in phylogenetic inference based on morphologic data is that there is a

considerable amount of convergence in morphologic characters, but the current substitution model does

not take convergent evolution into account [S49, S50]. Consequently, independent lineages that possess

convergently acquired characters (for example, two distantly related groups such as Xenarthra and

Pholidota), or groups retaining morphologically primitive (symplesiomorphic) characters (for example,

polyphyletic or paraphyletic groups such as the so called “Lipotyphla” and “Artiodactyla”) are often

recognized erroneously as a monophyletic group (see the molecular and morphologic trees in O’Leary et

al. [S51]). To overcome this difficulty, the morphologic characters without homoplasy (convergence or

parallelism) should be used in phylogenetic reconstruction. Such characters were selected by the following

procedures. First, the morphologic characters of the extant species (including recently extinct elephant

birds and moas) were selected from the matrix in Mayr [S47]. Subsequently, the characters with

convergent, parallel or reversal evolution were excluded under the tree topology as inferred from the

molecular data (Figure S1). If a morphologic character has N states, the characters with ≤N – 1 steps were

selected. This procedure was carried out using Mesquite version 3.3 (http://mesquiteproject.org [S52])

with the parsimony criterion. Although the matrix presented by Mayr [S47] is limited to the Neornithes

(Palaeognathae + Neognathae), cases with information on basal fossil lineages of Aves were also taken

into account (characters 180–243 in Mayr’s [S47] matrix correspond to those in the matrix of Johnston et

al. [S53], including the basal fossil lineages of the Aves). Finally, the selected characters of the fossil

species were added to the data matrix.
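The selection rule above relies on the fact that a character with N observed states needs at least N – 1 changes on any tree, so a character that needs no more than N – 1 steps on the molecular topology shows no convergence, parallelism or reversal. The sketch below illustrates this with a small Fitch parsimony count on a fixed tree. It is not the Mesquite procedure: it assumes a strictly bifurcating tree written as nested pairs, unordered states and no missing data, and the function names and toy taxa are hypothetical.

```python
# Sketch only: Fitch parsimony step count for one character on a fixed
# binary tree, followed by the "at most N - 1 steps" selection rule.

def fitch_steps(tree, states):
    """tree: nested 2-tuples of leaf names; states: leaf name -> state."""
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):          # leaf: its own state set
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                         # no change needed at this node
            return common
        steps += 1                         # disjoint sets: one extra step
        return left | right

    visit(tree)
    return steps


def is_homoplasy_free(tree, states):
    """True if the character needs no more than N - 1 steps on the tree."""
    n_states = len(set(states.values()))   # assumes no missing-data symbols
    return fitch_steps(tree, states) <= n_states - 1


# Hypothetical toy tree and characters, not data from the study.
toy_tree = ((("A", "B"), "C"), ("D", "E"))
clean = {"A": "0", "B": "0", "C": "0", "D": "1", "E": "1"}
noisy = {"A": "1", "B": "0", "C": "0", "D": "1", "E": "0"}
```

Here `clean` changes once on the toy tree and would be retained, whereas `noisy` requires two changes for two states and would be excluded.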

The phylogenetic tree including both extant and extinct species was inferred from the selected

morphologic characters by the ML method using RAxML 7.2.6 [S45] and by the Bayesian method using MrBayes ver. 3.2 [S54]

with the BIN+Γ model. The tree topology as inferred from the molecular data was given as the constraint

(Figure 3). Because the BIN model is a binary state version of the JC model, each multistate character (N

≥3) was converted into multiple binary state characters using the following procedures. If a character

changed from state X to state Y, and then changed from state Y to another state Z, we regarded it as a

nested character. If a character started from state X, and independently evolved into different states Y and

Z in two different lineages, we regarded it as an independent character. The final data matrix is presented

in DataBaseTable10 (http://aepyornis.paleogenome.jp/).
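The text distinguishes nested from independent multistate characters but does not spell out the binary coding that was used. One common way to recode them, shown here purely as an assumption, is additive coding for nested characters (X = 00, Y = 10, Z = 11) and presence/absence coding for independent characters (X = 00, Y = 10, Z = 01). The function names are hypothetical, and "?" is assumed to mark missing data.

```python
# Sketch only: two assumed recodings of a multistate character into
# binary characters, one for nested and one for independent states.

def recode_nested(state, order):
    """order lists the states in their evolutionary sequence, e.g. X, Y, Z."""
    if state == "?":
        return ["?"] * (len(order) - 1)
    rank = order.index(state)
    # Binary character i is 1 once the lineage has passed state number i.
    return [1 if rank > i else 0 for i in range(len(order) - 1)]


def recode_independent(state, ancestral, derived):
    """derived lists the states that arose separately from the ancestral one."""
    if state == "?":
        return ["?"] * len(derived)
    if state != ancestral and state not in derived:
        raise ValueError(f"unknown state: {state!r}")
    # One binary character per derived state; the ancestral state is all 0.
    return [1 if state == d else 0 for d in derived]
```

Under this coding a nested state Z shares a derived binary character with Y, reflecting that Z arose from Y, while independently derived states share none.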

After morphological characters with homoplasy were excluded, 34 characters remained

(DataBaseTable10: http://aepyornis.paleogenome.jp/). There are many missing data for Palaeotis [S47],

and only seven characters remained in this taxon. All of these seven characters were commonly shared

with the ostrich and the rhea. Therefore, Palaeotis was excluded from the subsequent phylogenetic

analyses. The maximum likelihood tree as inferred from the morphologic data is shown in Figure 3A. The

most basal position of Lithornis among the Palaeognathae was supported by this analysis (77% BP/0.90

PP). The basal position of Lithornis and the ostrich among the Palaeognathae (irrespective of whether they

are monophyletic or paraphyletic) was also supported by a relatively high bootstrap value (92% BP/0.99

PP). Another extinct fossil taxon, Emuarius, was clustered together with the emu (99% BP/0.98 PP).

This method excludes most homoplastic characters, but not all of them. As far as the

extant species alone are concerned, character 12 (Os basisphenoidale [os parasphenoidale], position of

proc. basipterygoidei) of Mitchell et al. [S46] is not homoplastic. However, character state 1 (anterior to

basitemporal platform on the caudal end of rostrum parasphenoidale and widely separated) seems to have

evolved independently in Anhimidae (Anseriformes) and Lithornis. When this character was

excluded from the analysis, Lithornis formed a monophyletic group with the ostrich with a low bootstrap

value (17% BP), but the basal position of Lithornis + ostrich was still supported by a relatively high

bootstrap value (91% BP). Although Palaeotis was not included in the phylogenetic inference, Palaeotis

shared a common state with the ostrich and the rhea (character 89). The possible phylogenetic positions of

Palaeotis based on this finding are also shown in Figure 3A (indicated by arrows).

Lithornis is a member of the family Lithornithidae together with Paracathartes and

Pseudocrypturus. According to Houde [S55], the ratites have evolved polyphyletically from the

Lithornithidae multiple times, and he suggested the Lithornis-cohort as an ancestral state of the ratites.

Therefore, it is also important to elucidate the phylogenetic positions of Paracathartes and

Pseudocrypturus among the Palaeognathae. Houde [S56] carried out an extensive phylogenetic analysis of

the Palaeognathae based on morphologic data of extant and extinct species including three species of

Lithornithidae (Lithornis, Paracathartes and Pseudocrypturus) as well as Palaeotis and Diogenornis.

Since he did not provide the character matrix, we reconstructed the character matrix based on his

phylogenetic tree (Figure 39 in Houde [S56]). In this process, we assumed there are no missing data.

Subsequently, we excluded morphologic characters with homoplasy under the assumption of the tree

topology of (((rhea, tinamou, (kiwi, cassowary)), Lithornis, ostrich), Neognathae) following Figure 3A.

Finally, 10 morphologic characters remained (DataBaseTable10: http://aepyornis.paleogenome.jp/) and the

maximum likelihood tree and the Bayesian tree were reconstructed under the constraint of the same

topology used in selecting the morphologic characters. The tree is shown in Figure 3B. Pseudocrypturus

was placed in the most basal position of the Palaeognathae, and the family Lithornithidae was recognized

as a paraphyletic group. Palaeotis and the ostrich form a monophyletic group. Therefore, all the fossil

Palaeognathae species distributed in the Northern Hemisphere were placed in basal positions among the

Palaeognathae. An essentially identical topology with the basal position of Lithornis+Paracathartes

among the Palaeognathae was also supported from the morphological character matrix selected from

Worthy et al. [S57] (92 characters remained; DataBaseTable10: http://aepyornis.paleogenome.jp/). The tree

is shown in Figure 3C.

On the other hand, Diogenornis, discovered in Brazil, clustered together with the rhea. As

mentioned above, Emuarius can be recognized as a sister group of the emus (Figure 3A), and all the fossil

species involved in this analysis from the Southern Hemisphere can be recognized as members of the

Notopalaeognathae, which includes the order Rheiformes (the rhea), the clade Novaeratitae (the kiwi,

emu, cassowary, and elephant bird), the order Tinamiformes (the tinamou) and the extinct order

Dinornithiformes (the moa) [S58].

Divergence time estimations

The divergence times in the evolution of the Palaeognathae were estimated by the relaxed molecular clock

method [S59, S60] using MCMCTREE implemented in PAML ver. 4.7 [S39]. To reduce the

computational burden, we applied the two-stage procedure of the Laplace method [S59]. Because most of

the information on divergence times and evolutionary rates is contained in the branch lengths, we

considered the likelihood of the branch lengths. In the first stage, we obtained maximum likelihood

estimates of the branch lengths and the Fisher information (Hessian matrix) from the sequence data

without a constraint of molecular clock. The likelihood of the branch lengths is approximated by a

multivariate normal distribution. The mean was the maximum likelihood estimate obtained above, and

the covariance matrix was the inverse of the negative Hessian matrix.
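
The normal approximation described above can be written out explicitly. The following is a sketch under stated assumptions: the symbols b (vector of branch lengths), b̂ (its maximum likelihood estimate) and H (Hessian of the log-likelihood evaluated at b̂) are introduced here for illustration, and the gradient term is taken to vanish because b̂ is assumed to be an interior maximum.

```latex
\ln L(\mathbf{b}) \;\approx\; \ln L(\hat{\mathbf{b}})
  + \tfrac{1}{2}\,(\mathbf{b}-\hat{\mathbf{b}})^{\mathsf{T}}\,\mathbf{H}\,(\mathbf{b}-\hat{\mathbf{b}}),
\qquad
\mathbf{b} \;\sim\; \mathcal{N}\!\left(\hat{\mathbf{b}},\; (-\mathbf{H})^{-1}\right)
```

Because H is negative definite at a maximum, (−H)⁻¹ is a valid covariance matrix. In the second stage the sequence data then enter only through b̂ and H.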

The fossil records used as the calibration points are summarized in Table S2 [S3-S12].

The prior distribution of the root rate was the gamma distribution with the shape parameter α set to 2 and

the scale parameter β set to 5.33. The prior distribution of σ2 was the gamma distribution with the shape

parameter α set to 1 and the scale parameter β set to 0.3. These prior distributions were roughly estimated

from the genomic data of Baker et al. [S31] assuming the root of the tree, that is the divergence between

the Archosauria and Lepidosauria, as 300 mya (because we used one billion years as one time unit, 300

million years is 0.3 in this case) under the strict clock. For the Markov Chain Monte Carlo (MCMC)

analysis, the first 500,000 generations were discarded as the burn-in, and 250,000 samples were then

taken, one every 20 generations. Each analysis was run at least twice to confirm consistency between runs.
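
The chain length implied by these settings, and the conversion of ages into the time unit of one billion years, can be checked with a short sketch. The variable and function names are illustrative, and the reading "one retained sample every 20 generations after the burn-in" is an assumption about how the settings combine.

```python
# Settings quoted in the text; the names are illustrative only.
BURN_IN = 500_000        # generations discarded as burn-in
N_SAMPLES = 250_000      # number of retained samples
SAMPLE_EVERY = 20        # generations between retained samples
TIME_UNIT_MYR = 1_000    # one time unit = one billion years


def total_generations(burn_in, n_samples, sample_every):
    """Total chain length implied by the burn-in and sampling settings."""
    return burn_in + n_samples * sample_every


def to_time_units(age_myr, unit_myr=TIME_UNIT_MYR):
    """Convert an age in million years into analysis time units."""
    return age_myr / unit_myr


CHAIN_LENGTH = total_generations(BURN_IN, N_SAMPLES, SAMPLE_EVERY)
ROOT_AGE = to_time_units(300)  # Archosauria-Lepidosauria split assumed at 300 mya

print(CHAIN_LENGTH)  # 5500000
print(ROOT_AGE)      # 0.3
```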

The mitochondrial genomes, the three datasets of nuclear loci used in reconstructing the tree and the

genome scale dataset of Baker et al. [S31] (594 nuclear loci; exon: 795,492 bp) were used for this

analysis. Taking account of the difference of the taxon sampling, these five data sets were treated as

different partitions. Moreover, the first, second and third codon positions of the protein-encoding genes,

introns and RNAs were partitioned. For the genomic data of Baker et al. [S31], the partitions were

determined by the following two strategies. The first strategy is optimization by PartitionFinder

[S44], which separated the genomic data of Baker et al. [S31] into 236 optimised partitions. However, these

genomic data could not be separated into three codon positions because of a computational problem. In this

framework, our 5 data sets were finally separated into 248 partitions [3 partitions (three codon positions)

for [S3], 4 partitions (three codon positions and intron) for [S32], 1 partition (intron) for [S42], 4

partitions (three codon positions and RNAs) for the mitogenome, and 236 partitions for [S31]]. The GTR+Γ

model was used for the nucleotide substitutions. However, since this model could not be applied to the data of

Baker et al. [S31] because of a computational problem, we applied Tamura’s three parameter model [S61]

(without Γ) to these data. The second strategy is simply separating the genomic data of Baker et al. [S31]

into three codon positions (three partitions), because the time estimation based on the

optimised 248 partitions is computationally too expensive. In this second framework, our sequence data

sets were simply separated into 15 partitions taking account of the taxon samplings of each data set, as

well as three codon positions, coding and non-coding, mitochondrial protein coding genes and nuclear

coding genes [15 partitions: 3 partitions (three codon positions) for [S3], 4 partitions (three codon

positions and intron) for [S32], 1 partition (intron) for [S42], 4 partitions (three codon positions and

RNAs) for the mitogenome, and 3 partitions (three codon positions) for [S31]]. The results are shown in

Table S3. The divergence time estimates based on the partitions optimized by PartitionFinder

(248 partitions) are generally larger than, but close to, the estimates based on the simpler empirical

partitioning (15 partitions), and the differences between these two analyses are within the standard errors.

Although Nishihara et al. [S62] demonstrated the sensitivity of phylogenomic inference on the

evolutionary model, especially the partition model, differences in partitioning had only a limited effect on

divergence time estimations in the present study. The results are shown in DataBase Figure 6

(http://aepyornis.paleogenome.jp/). However, the 248-partition strategy imposed a huge computational burden,

and the calculation of the likelihood function did not work well. In addition, as mentioned above, the

248-partition strategy requires several compromises, such as the use of a rate-homogeneous model (that is, without the Γ model

and without codon partitions). Therefore, we mainly used the results of the empirical 15-partition strategy

for this study.
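
The partition totals of the two strategies follow directly from the per-dataset counts given above. A minimal bookkeeping sketch (the dictionary keys are labels chosen here for illustration):

```python
# Partitions shared by both strategies, as listed in the text.
SHARED_PARTITIONS = {
    "S3": 3,          # three codon positions
    "S32": 4,         # three codon positions and intron
    "S42": 1,         # intron
    "mitogenome": 4,  # three codon positions and RNAs
}

# Partitions assigned to the genomic data of Baker et al. [S31] under each strategy.
BAKER_PARTITIONS = {
    "partitionfinder": 236,  # optimised partitions
    "empirical": 3,          # three codon positions
}


def total_partitions(strategy):
    """Total number of partitions under the named strategy."""
    return sum(SHARED_PARTITIONS.values()) + BAKER_PARTITIONS[strategy]


print(total_partitions("partitionfinder"))  # 248
print(total_partitions("empirical"))        # 15
```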

The time tree of the Palaeognathae is shown in Figure 1. Our estimates are significantly younger

than those of Haddrath & Baker [S3], but slightly older than those of Mitchell et al. [S46]. The divergence

time between the Palaeognathae and the Neognathae was estimated to be about 110 mya. The emergence

of the crown Palaeognathae (the divergence time between the ostrich and the others) was about 80 mya.

Then, the rhea, the tinamou–moa clade, the cassowary–emu clade and the elephant bird–kiwi clade

successively diverged during 71-62 mya, which roughly coincides with the K-Pg boundary. As mentioned

before, the phylogenetic relationships among these four clades are not strongly supported (<80% BP), and

in particular, the position of the rhea remains controversial. This suggests rapid successive divergence

events among the paleognaths. The divergences between the elephant bird (Madagascar) and the kiwi

(New Zealand), and between the tinamou (South America) and the moa (New Zealand) occurred about 62

(58.9-64.7) mya and about 53 (50.2-56.1) mya, respectively. The divergence events among the taxa that

are currently distributed in different landmasses ended during this stage. The emergence times of the

current crown taxa are as follows: the elephant bird (about 35 (29.9-39.9) mya, Madagascar), the tinamou

(about 30 (26.6-33.0) mya, South America), the cassowary–emu (about 32 (27.4-35.3) mya, Australia–

New Guinea), the moa (about 13 (9.7-15.8) mya, New Zealand), the kiwi (about 12 (9.0-15.8) mya, New

Zealand) and the rhea (about 9 (6.9-12.1) mya, South America).

Evaluation of stability of divergence time estimations

To evaluate the stability of the divergence time estimations, the effects of taxon sampling and fossil

calibrations were investigated in detail. For the effect of taxon sampling, estimates based on the dataset of

all species (ratites, tinamous, neognaths, reptiles), the avian dataset (ratites, tinamous, neognaths), the

excl-tinamous dataset (ratites, neognaths, reptiles) and the excl-neognaths dataset (ratites, tinamous,

reptiles) were compared following Mitchell et al. [S46].

For the effect of fossil constraints, we focused on the divergence time between the Palaeognathae and

Neognathae. The first assumption had no constraints on the divergence time between the Palaeognathae

and Neognathae. The second assumption is that the divergence time between the Palaeognathae and

Neognathae was after the emergence of the first fossil record of Ichthyornis (86.5 mya [S4]). The third

assumption is that the divergence time between the Palaeognathae and Neognathae was after the

emergence of the first fossil record of Enaliornis (100.5 mya [S5, S9]).

In contrast to the findings of Mitchell et al. [S46], whose estimates were highly dependent on the

taxon sampling, the effect of taxon sampling was negligible when the sequence data of about 873 Kbp

were analysed as in this work (Table S3, Figure S2).

Our estimates are also stable against different assumptions of fossil constraints. It is generally

difficult to give the maximum ages (the older limit) of the nodes. As mentioned above, we used three

types of assumptions on the divergence of the Palaeognathae and the Neognathae; that is, (a) no maximum

limitation, (b) later than 86.5 mya based on the oldest fossil record of Ichthyornis, and (c) later than 100.5

mya based on the oldest fossil record of Enaliornis. Because the soft boundary method [S63, S64] was

applied in this study, if the molecular data strongly support a divergence time earlier than a maximum

constraint such as the fossils of Ichthyornis or Enaliornis, the posterior mean of the divergence time can be

earlier than these constraints (Figure 1B, Figure S2 and Table S3). No matter which constraints were

applied, the estimated divergence times were stable. Only in the case in which an out-group (reptiles) was

excluded (Aves data set) and when the split between the Palaeognathae and the Neognathae was

constrained by the ages of Ichthyornis (86.5 mya) or of Enaliornis (100.5 mya), the estimated time was

significantly later than that of the other cases. In particular, the divergence time between the

Palaeognathae and the Neognathae was strongly affected by this calibration. Probably, this is due to the

rooting problem as discussed in detail in the main text. However, the estimates (about 104 mya) were still

much earlier than the given constraints. It is also notable that, although most of the calibration points

were in the Neognathae clade or in the non-Avian outgroup in our major analyses, the excl-Neognathae

data, in which one calibration within the Palaeognathae and two calibrations within the reptiles were used,

yielded nearly the same divergence time estimates as the data including all taxa (Figure S2 and Table S3).

The Aves data (because the non-Avian outgroup was not used, the calibrations were limited to those within the

Aves) also yielded very similar time estimates.

The only fossil calibration commonly used in this taxon sampling sensitivity analysis is Emuarius

(25-35 mya) for the split of Casuarius – Dromaius. The divergence time of Casuarius – Dromaius without

this calibration was estimated to be about 32 mya (27.1-38.1 mya), and very close to this fossil calibration

(Table S3).

A recently developed method that jointly uses molecular and morphological data for tip-dating [e.g.,

S65] might provide a novel framework for divergence time estimation. The 92 selected morphological characters

from Worthy et al. [S57] were applied to time estimation among the crown Aves. The results are

shown in Table S3. The first split within the crown Aves (about 110 mya) is consistent with the

traditional calibration method. However, the divergence times within the Palaeognathae and the

Neognathae were generally estimated to be older than those of the traditional calibration method. The first

splits of the Palaeognathae and the Notopalaeognathae were estimated to be about 84.0 mya (about 79 mya

by the traditional internal node calibration method) and 76 mya (about 70 mya by the traditional internal

node calibration method), respectively, and the first splits of the Neognathae, the Galloanserae, and the

Neoaves were about 91 mya (about 87 mya by the traditional method), 75 mya (about 74 mya by the

traditional method), and 77 mya (about 72 mya by the traditional method), respectively.

In addition to the issues of taxon sampling and fossil calibrations, the stability and

sensitivity of the divergence time estimates with respect to the uncertainty of the tree topology, the

saturation of fast-evolving genes, and the use of clock-like genes were further examined. Taking into

account the instability of the phylogenetic position of the rhea, we also estimated the divergence times

assuming alternative positions of the rhea (the rhea is closer to the elephant bird–kiwi–cassowary–emu

clade or the rhea is closer to the moa–tinamou clade). Alternative placements of the rhea had a very

limited effect on the time estimations (Table S3).

Since several fast evolving gene loci such as the mitochondrial genome or introns were included in

this study, it might be possible that the saturation of these loci caused a considerable effect on our time

estimation because of the long evolutionary time scale of the crown Aves (~110 mya) or Archosauria

(~300 mya). To address this issue, the effect of fast-evolving loci on the time estimations

was evaluated by excluding such loci. Among the 15 simple empirical partitions in this study, the

mitochondrial 3rd codon positions are the fastest evolving sites. Exclusion of the mitochondrial 3rd codon

positions had only a limited effect on the time estimation (DataBase Figure 7). The 3rd codon positions of

the nuclear protein coding genes, the introns, and the entire mitochondrial genome were also successively

excluded, but these exclusions had no considerable effect (DataBase Figure 7). Finally only the

1st and 2nd codon positions of the nuclear genes remained. Because there were no remarkable differences

in the estimates by these data sets, the effect of the saturation if any in limited gene loci (e.g.,

mitochondrial 3rd codon positions) is negligible in this analysis.

The effect of clock-like and non-clock-like genes was also evaluated by a similar method. The

standard deviations of the root-tip lengths within the crown Aves were estimated for each partition, and

they were normalized by dividing by the mean root-tip length of each partition. There was a tendency that

the non-coding regions such as introns and RNAs show more clock-like evolution than the protein coding

genes. The results from the top 5 clock-like partitions, top 10 clock-like partitions, and 15 partitions were

compared. The estimated divergence times were as follows: the first splits within the crown Aves [110.0

(105.0-115.8) mya for the 15 partitions, 106.8 (101.5-112.7) mya for the 10 partitions, and 105.7

(98.6-113.7) mya for the 5 partitions], the first splits within the crown Palaeognathae [79.6 (77.0-83.9) mya,

78.3 (75.5-81.4) mya, and 76.5 (70.2-79.8) mya for the 15, 10, and 5 partitions, respectively], the first

splits within the Notopalaeognathae [70.6 (68.5-74.6) mya, 70.2 (67.8-72.9) mya, and 69.2 (64.0-71.4)

mya with the same order], the first splits within the crown Neognathae [90.2 (87.1-93.6) mya, 88.3 (85.1-

91.9) mya, and 87.8 (83.5-92.6) mya], the first splits within the Galloanseres [75.3 (73.3-77.6) mya, 74.9

(72.7-77.4) mya, and 74.1 (70.3-77.6) mya], and the first splits within the Neoaves [72.0 (69.9-74.4) mya,

71.4 (69.5-74.1) mya, and 69.7 (67.9-71.4) mya]. The exclusion of the partitions with the large normalized

standard deviations resulted in slightly younger estimates, but these did not differ very much from the others.
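
The clock-likeness measure described above (standard deviation of root-tip lengths divided by their mean) can be sketched as follows. The root-tip lengths are hypothetical values chosen for illustration, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def normalized_sd(root_tip_lengths):
    """Standard deviation of root-tip lengths divided by their mean.

    Smaller values indicate more clock-like evolution in that partition.
    """
    return stdev(root_tip_lengths) / mean(root_tip_lengths)


def rank_by_clock_likeness(partitions):
    """Partition names ordered from most to least clock-like."""
    return sorted(partitions, key=lambda name: normalized_sd(partitions[name]))


# Hypothetical root-tip lengths (substitutions/site) for two partitions.
EXAMPLE_PARTITIONS = {
    "intron": [0.10, 0.11, 0.09, 0.10],
    "mt_codon3": [0.50, 0.90, 0.30, 0.70],
}

RANKED = rank_by_clock_likeness(EXAMPLE_PARTITIONS)
print(RANKED)
```

Taking the first 5 or 10 names of such a ranking would give subsets analogous to the "top 5" and "top 10" clock-like partitions compared in the text.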

The mitochondrial mutation rate and flightlessness

Because mitochondria play an important role in aerobic metabolism, it is expected that

the mitochondrial evolution is highly correlated with the evolution of flight capability, which requires a

high metabolic rate. In this study, we focused on the mitochondrial synonymous substitution rate. This

rate is likely to be correlated with the flight capability, because it has a negative correlation with body size

[S66] and it is plausible that it also has a strong correlation with the amount of free radicals generated by

the high metabolism associated with volant behaviour.

Stability of divergence time estimates is essential for the reliable estimation of

molecular substitution rates. In this respect, our stable time estimation (Figure 1B) has a crucial advantage.

The substitution rates of the third codon positions estimated by the MCMCTREE program with the

GTR+Γ model were used as approximations of the synonymous substitution rates (Table S5). The

substitution rates of volant birds such as songbirds, hummingbirds and tinamous (harmonic mean of the

substitution rates in the terminal branches of volant birds, 4.64%/site/million years) are significantly

higher than those of flightless birds such as ratites and penguins (harmonic mean of the substitution rates

in the terminal branches of flightless birds: 1.22%/site/million years; t test, P = 1.21×10⁻⁵) (Figure 2B). The

substitution rate in the branch of the common ancestral crown Aves was estimated to be

2.75%/site/million years, and the substitution rates of the ancestral branches of paleognaths (the branches

coloured in red seen in Figure 1A, Figure S1) were estimated to be 3.71–5.03%/site/million years (Table

S5). The harmonic mean was 4.18%/site/million years, which is significantly higher than the substitution

rates for flightless birds such as ratites and penguins (t test, P = 5.9×10⁻⁸), but not significantly different

from those for the volant birds (P = 0.19; Figure 2B). Because it is generally more difficult to reconstruct

the ancestral character states (volant or flightless) of the internal branches than those of the terminal

branches, the substitution rates of the terminal branches only were analysed in this study. However, if only

one species represents an old lineage, such as the extant ostrich representing Struthionidae, even for the

terminal branch, the substitution rate along the branch is actually an average of the ancestral volant form

(with higher substitution rate) and the descendant flightless form (with lower substitution rate). Therefore,

compared with other terminal branches of flightless birds whose sister-groups are also flightless, it is

likely that the branches of the ostrich and the penguin (only one mitogenome was involved in this

analysis) have higher substitution rates. Indeed, the substitution rates of these branches are high among the

flightless birds (Table S5 and DataBase Figure S2). For this reason, the mean substitution rates of the

flightless birds might be overestimated. To examine this possibility, we excluded the branches of the

ostrich and penguin. However, there was no considerable effect (harmonic mean of the substitution rates

in the terminal branches of flightless birds excluding ostrich and penguin: 1.14%/site/million years; t test

(flightless birds vs. volant birds), P = 1.99×10⁻⁵).
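
The harmonic means used above to summarize terminal-branch rates can be computed as in the following sketch. The rates are hypothetical and are not the values in Table S5; they serve only to show the calculation and the fact that the harmonic mean never exceeds the arithmetic mean, so a few very fast branches have limited influence on it.

```python
from statistics import harmonic_mean, mean

# Hypothetical terminal-branch rates (%/site/million years); NOT the Table S5 values.
VOLANT_RATES = [9.0, 6.0, 4.5, 3.0]
FLIGHTLESS_RATES = [1.0, 1.2, 1.5, 2.0]

HM_VOLANT = harmonic_mean(VOLANT_RATES)
HM_FLIGHTLESS = harmonic_mean(FLIGHTLESS_RATES)

print(f"volant: harmonic {HM_VOLANT:.2f}, arithmetic {mean(VOLANT_RATES):.2f}")
print(f"flightless: harmonic {HM_FLIGHTLESS:.2f}, arithmetic {mean(FLIGHTLESS_RATES):.2f}")
```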

A scatter diagram between body weight and substitution rate in the mitochondrial third

codon positions along the terminal branches is shown in Figure 2C and Database Figure 2

(http://aepyornis.paleogenome.jp/). There is a negative correlation between substitution rate and body

weight. Although the coefficient of determination is not high (R² = 0.364), the negative correlation is

significant (PP>0.99). Since the mitochondrial substitution rate of the ancestral paleognaths was estimated

to be 3.71–5.03%/site/million years as mentioned above, their body masses can be roughly inferred from

this approximate prediction. They were estimated to be 1,500–2,800 g.
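
The inverse prediction of body mass from substitution rate can be illustrated with an ordinary least-squares sketch. It assumes a linear relation between substitution rate and log10 body weight, which is a modelling choice made here for illustration and not necessarily the one behind Figure 2C. The data points are toy values that lie exactly on a line so that the arithmetic can be checked by hand; they are not the values in Table S5.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def predict_weight(rate, slope, intercept):
    """Invert the fitted line to get body weight (g) from a substitution rate."""
    return 10 ** ((rate - intercept) / slope)


# Toy data: body weight (g) and rate (%/site/million years), exactly linear in log10(weight).
TOY_WEIGHTS = [10, 100, 1_000, 10_000, 100_000]
TOY_RATES = [9.0, 7.0, 5.0, 3.0, 1.0]

SLOPE, INTERCEPT, R2 = fit_line([math.log10(w) for w in TOY_WEIGHTS], TOY_RATES)
PREDICTED = predict_weight(5.0, SLOPE, INTERCEPT)
print(SLOPE, INTERCEPT, R2, PREDICTED)
```

With real data the fit is far from exact (R² = 0.364 in the text), so a predicted body mass obtained this way should be read as a rough range rather than a point estimate.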

Among the volant birds, substitution rates of songbirds (9.69%/site/million years), parakeets

(6.49%/site/million years) and hummingbirds (7.22%/site/million years) are especially high, probably

because of their small body size and high metabolic rate associated with powerful flight behaviour.

Although the reason is unclear, the substitution rate of the tinamou is extraordinarily high (harmonic

mean, 13.15%/site/million years). If we exclude the tinamou, the correlation between body weight and

substitution rate becomes stronger (R² = 0.59), and the difference between the ancestral paleognaths and

the extant volant birds becomes smaller (P = 0.65). The body weight of the ancestral paleognaths as

inferred from this approximate prediction is 500–1,700 g.

Ancestral state reconstruction of body weight

Body weight data for extant species were collected from the literature (Table S5 [S13-S25]).

Because genomic sequences include abundant information on the rates and the patterns of molecular

evolution, the precision of the ancestral state reconstruction of phenotypic traits could be significantly

improved, especially when there is a strong correlation between the values of the phenotypic traits and the

rate of molecular evolution [S67]. Lartillot & Delsuc [S67] reported a negative correlation between the

synonymous substitution rate and the body weight. This is probably an indirect correlation mediated by

generation intervals. Larger animals generally have later female sexual maturation, which results in a

longer generation interval than for smaller animals. Lartillot & Delsuc [S67] estimated the body weights

of the ancestral Eutherian mammals by using their method taking account of the correlation of the

synonymous substitution rates and the body weights (correlation model), and compared these estimates

with the traditional method without taking into account the correlations (uncorrelation model). The body

weights of the ancestral Eutherian mammals estimated by either method were generally very close, but

there were remarkable differences in Cetartiodactyla, especially in Whippomorpha (hippos + cetaceans). The

estimate by the correlation model was 6.0–376.1 kg (geometric mean is 47.5 kg), and the estimate by the

uncorrelation model was 32.3–2086.1 kg (geometric mean is 259.6 kg). Lartillot & Delsuc [S67] argued

that the early cetacean species probably had small body sizes, such as Himalayacetus and Pakicetus (wolf

size) or Ichthyolestes (fox size) [S68]. The extant hippopotamus and the extant cetaceans seem to have

gained large body size convergently because of the independent aquatic adaptation. In such a case, if the

ancestral body weight is estimated only from the body weights of the extant species, it may give an

overestimation of the ancestral body weight.

To take account of the correlation between body weight and the synonymous substitution rate in

avian evolution, we applied the Ancov program of Coevol [S69]. We adopted mitochondrial genomes as a

material to support the ancestral state reconstruction of body weight. However, the current version of

Coevol does not incorporate variable evolutionary rates among sites (e.g., the gamma model), and this

may have a crucial effect on phylogenetic inference [S70]. To take account of the possible among-sites

variability of the rate of synonymous substitutions, we adopted the following two-stage procedure. First,

using the above MCMC sample of the divergence times as the constraints on the divergence times of the

internal nodes, we applied MCMCTREE [S39] to the data of the third codon positions of the

mitochondrial genomes with the discrete gamma model of variable rates among sites. We treated the

estimated evolutionary rates at the internal and the external nodes as measured values of a phenotypic trait

to reconstruct the ancestral state of body weight. The synonymous substitution rate and body weight were

negatively correlated (Figure 2C); the posterior mean of the correlation was −0.848 and the posterior

probability of negative correlation was 0.99997. The model assumed by default that the log-transformed

values follow the Brownian process. Accordingly, we calculated the geometric mean of the upper and

lower limits of the 95% credibility intervals to represent the ancestral values of body weight. The

estimated substitution rates of the mitochondrial 3rd codon positions, the body weights of the Aves (and

the non-Avian outgroups) reported by the previous studies, and the estimated body weights of the

ancestral Aves are summarized in Table S5.
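
The summary described above, the geometric mean of the lower and upper limits of a 95% credibility interval, is the midpoint of the interval on the log scale. As a small check, the sketch below applies it to the Whippomorpha intervals quoted earlier from Lartillot & Delsuc [S67]; the geometric means quoted there appear consistent with this calculation. The function name is illustrative.

```python
import math


def geometric_midpoint(lower, upper):
    """Geometric mean of two interval limits (midpoint on the log scale)."""
    return math.sqrt(lower * upper)


# Intervals (kg) quoted in the text for the ancestral Whippomorpha.
CORRELATION_MODEL = geometric_midpoint(6.0, 376.1)
UNCORRELATION_MODEL = geometric_midpoint(32.3, 2086.1)

print(round(CORRELATION_MODEL, 1))    # 47.5
print(round(UNCORRELATION_MODEL, 1))  # 259.6
```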

As shown in Figure 4 and Table S5, the body weight of the ancestral paleognaths

was estimated to be about 5,000 g (3,800–5,500 g). Although this estimate is relatively large, it is still

within the range of the volant birds [S15]. It is also notable that these estimates are close to the estimated

size of the fossil Lithornithidae. According to Mayr [S71], “Lithornithid species significantly vary in size,

with the turkey-sized P. howardae being about twice as large as P. cercanaxius” (the body weight of the

turkey is about 6,000 g). When the correlation between body weight and substitution rate was not taken

into account, the estimates of body weight became grossly larger (data not shown). This is probably

because the conventional substitution model does not take account of convergence [S50]. Although

circumstantial, this work provides the first reconstruction of the ancestral state of the paleognaths.

However, we feel that there is room for improvement in the methodologies used in the present

study. Because the mitochondrial substitution rate of birds, especially the synonymous rate, is very high, it

is sometimes difficult to estimate an accurate number of substitutions along a branch, particularly for a

deep ancestral branch. It is plausible that the substitution rates inferred in this study (Figure 2B, Table S5)

are still underestimates. If this is the case, the ancestral body weights may be overestimated. For example,

the body weight of the ancestral crown Aves was estimated to be 3,500 g, which is larger than the

body weights of Mesozoic birds estimated from fossil evidence (32–1,779 g) [S72]. Furthermore, to test

the tendency of estimated body weight, the weights of several extant species were assumed as missing

data. The estimated mean body weight of extant species using this test was generally higher than the actual

body weight, although it was still within 95% confidence intervals (data not shown). Improvements in the

substitution model are important to resolve this issue.

Species identification of eggshells and ancestral state reconstruction of egg weight

One of the remarkable features of Aepyornis maximus is that it laid the largest eggs

known among animals, including dinosaurs; these often weighed more than 9 kg. A huge number of

eggshell fragments have been excavated from the west to the southwest coast of Madagascar, and the dune

of Faux Cap is known as the site with the most abundant eggshell deposits. However, somewhat

surprisingly there is no direct evidence that these huge eggs were laid by Aepyornis, because there are no

reports of sympatric excavation sites of Aepyornis bones and these eggshell fragments (but see also the

embryological study by Balanoff & Rowe [S73]). Oskam et al. [S74] first reported DNA sequences from

eggshells identified as Aepyornis and Mullerornis based on their thickness. However, there were no

sequence data of homologous nucleotide sites from Aepyornis bones available at that time. Although

Mitchell et al. [S46] reported nearly complete mitochondrial genome sequences of these two genera, they

did not focus on the species identification of the eggs. In the present study, we confirmed that the eggs

putatively identified as Aepyornis and Mullerornis are from these genera (Table S2). The nucleotide

sequence data (GU799601: 82 bp) from thick egg shells from Meanderare Estuary (25°09'S, 46°26'E),

Toriala, Madagascar were exactly identical with the mitochondrial DNA data of Aepyornis maximus

sequenced in this study, but one base substitution was observed from the comparison with

Aepyornis hildebrandti reported by Mitchell et al. [S46]. Based on the findings presented above, here we

discuss the evolution of reproductive strategy in the ratites.

There is a strong correlation between body weight (log-scale) and the egg size in the Aves,

especially in the Palaeognathae (Figure 2A, DataBase Figure S1). It is noteworthy that the eggs of

Aepyornis and the kiwi are extraordinarily large compared with their body sizes. The correlation between

body and egg size is very high among the extant Palaeognathae (R² = 0.873). Although the egg size of

Aepyornis lies within the 95% CI of the extant Palaeognathae, it is noteworthy that Aepyornis laid huge

eggs not only in absolute terms, but also relative to its

body mass. If the kiwis are excluded, the correlation becomes higher (R² = 0.983), and these two taxa are

clearly outside this range (Figure 2A, DataBase Figure 1: http://aepyornis.paleogenome.jp/). Endo et al.

[S75] indicated an anatomical similarity between the coxae of Aepyornis and the kiwi, suggesting similar

reproductive strategies. Molecular phylogenetic studies (Mitchell et al. [S46] and the present study)

strongly support the sister group relationship of these two taxa. Does this mean that the similarity in the

enormous relative size of the egg is a synapomorphy of these taxa? To address this issue, absolute

ancestral egg sizes were estimated using the Ancov program, taking into account the correlation with

molecular evolutionary rate. The rate of synonymous substitutions was negatively correlated with egg

weight; the posterior mean of the correlation was −0.703 and the posterior probability of negative

correlation was 0.9964. Our result suggests that the common ancestor of Aepyornis and the kiwi laid larger

eggs relative to body size compared with the other extant paleognaths. Although the common ancestral

branch between Aepyornis and the kiwi is short, this suggests that the gigantism of the egg had already

started to evolve during this stage. A huge egg is a typical character of K strategists,

which are adapted to stable environments. It should be noted that the extinct insular family Dinornithidae

also showed a tendency toward egg gigantism (Figure 2A and DataBase Figure 1:

http://aepyornis.paleogenome.jp/). K strategists might have advantages on remote, isolated islands with

fewer migrants and small carrying capacities, such as Madagascar and New Zealand. Although fossil

records of the elephant bird and the kiwi are limited and no fossils have been reported in the Paleogene

and Neogene, except for a recently described fossil of a kiwi from the Early Miocene [S76], these

ancestral state reconstructions provide insight into their evolution in terms of morphology, ethology and ecology.
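
The allometric comparison described in this section can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of log10 egg weight on log10 body weight; the text states only that body weight is log-scaled, so the scaling of egg weight and the fitting procedure are assumptions. All taxon names and weights below are hypothetical placeholders, not the data plotted in Figure 2A.

```python
import math

def fit_log_linear(body_g, egg_g):
    """Least-squares fit of log10(egg) on log10(body).

    Returns (slope, intercept, r_squared).
    """
    x = [math.log10(b) for b in body_g]
    y = [math.log10(e) for e in egg_g]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

def columns(table):
    """Split {taxon: (body_g, egg_g)} into two parallel lists."""
    return [b for b, _ in table.values()], [e for _, e in table.values()]

# Hypothetical (body weight in g, egg weight in g) pairs; NOT the Figure 2A data.
typical = {
    "taxon_a": (500, 38),
    "taxon_b": (2000, 85),
    "taxon_c": (10000, 260),
    "taxon_d": (40000, 600),
    "taxon_e": (100000, 1150),
}
# Hypothetical taxon with an unusually large egg for its body weight.
outlier = {"large_egg_taxon": (2500, 400)}

# R^2 with the large-egg taxon included, then with it excluded.
r2_all = fit_log_linear(*columns({**typical, **outlier}))[2]
slope, intercept, r2_trimmed = fit_log_linear(*columns(typical))

# Residual of the large-egg taxon against the fit that excludes it (log10 units).
body, egg = outlier["large_egg_taxon"]
residual = math.log10(egg) - (intercept + slope * math.log10(body))

print(f"R^2 with outlier: {r2_all:.3f}; without: {r2_trimmed:.3f}")
print(f"log10 residual of large-egg taxon: {residual:.2f}")
```

Excluding a taxon with a disproportionately large egg raises R², and its positive residual against the trimmed fit quantifies how far its egg exceeds the allometric expectation. A confidence or prediction band, as used in the text, would require an additional interval calculation that is not sketched here.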

The ancestral geographic distributions of Palaeognathae

The ancestral geographic distribution of the Palaeognathae was inferred by the Bayesian method

using BayesTraits [S77], based on the time-calibrated tree shown in Figure 1A.

In addition, the maximum parsimony method was applied using Mesquite [S52]. The

fossil paleognaths, Lithornis and Palaeotis, were also included. Because the number of morphological

characters used to estimate the phylogenetic positions of these two taxa was small, the divergence times of

these taxa from the other paleognaths were not estimated, but it was assumed that (a) Lithornis branched

out from the other paleognaths at the midpoint of the common ancestral branch of the extant paleognaths,

and (b) the relationships among the ostrich, Palaeotis, and the other extant paleognaths

(Notopalaeognathae) were represented by a trifurcation. The geological ages of Lithornis and Palaeotis were

assumed to be 61 mya [S56, S78] and 48 mya [S79], respectively. The geographic distribution of Lithornis

and Palaeotis was in the Northern Hemisphere, while that of the Notopalaeognathae is in the Southern

Hemisphere. We assumed the ostrich to be distributed in both the Northern and the Southern Hemispheres

because, although the extant ostriches are found only in Africa, they were also distributed in Eurasia until

recently [S80, S81] as mentioned in the main text.

The inferred ancestral geographic distribution of the Palaeognathae is shown in

Figure 3D (Bayesian method) and Figure S3A (parsimony method). Our results indicated that the common

ancestor of all paleognaths including fossil species was distributed in the Northern Hemisphere, with the

highest posterior probability (100% PP). The distribution of the common ancestor of the ostrich, Palaeotis

and the Notopalaeognathae was also inferred to be in the Northern Hemisphere with the highest posterior

probability (100% PP). The common ancestor of the Notopalaeognathae was inferred to be in the Southern

Hemisphere with a high posterior probability (97.9% PP). The geographic distribution reconstructed by

the maximum parsimony method is also consistent with this result (Figure S3A). The Bayesian method

takes all possibilities into account when reconstructing the ancestral states. Therefore, the possibilities of Northern

Hemispheric distributions at several ancestral nodes within the Notopalaeognathae (e.g., the common

ancestor of the Apterygidae and the Aepyornithidae) were not completely excluded. In contrast, the

ancestral reconstructions of the maximum parsimony method are clearer (Figure S3A). The Northern

Hemispheric origin of the Palaeognathae and a single migration event to the Southern Hemisphere by the

common ancestor of the Notopalaeognathae were strongly supported by both analyses.
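
The parsimony reconstruction can be sketched with a Fitch-style downpass. This is a minimal illustration, not the Mesquite implementation. The toy tree follows the assumptions stated above (Lithornis branching first, then a trifurcation of the ostrich, Palaeotis, and the Notopalaeognathae), but collapsing the Notopalaeognathae to a single Southern Hemisphere tip and treating the ostrich as an ambiguous N/S tip are simplifications made here. The function and variable names are hypothetical.

```python
from collections import Counter

def fitch_downpass(node, tip_states):
    """Fitch downpass generalized to multifurcations.

    A node is either a tip name (str) or a tuple of child nodes.
    Returns (state_set, cost): the downpass state set at this node and
    the minimum number of state changes within its subtree.
    """
    if isinstance(node, str):
        return frozenset(tip_states[node]), 0
    counts = Counter()
    cost = 0
    for child in node:
        child_states, child_cost = fitch_downpass(child, tip_states)
        counts.update(child_states)  # one vote per child containing the state
        cost += child_cost
    best = max(counts.values())
    # Every child whose set lacks the chosen state costs one change.
    cost += len(node) - best
    return frozenset(s for s, c in counts.items() if c == best), cost

# Toy coding: N = Northern Hemisphere, S = Southern Hemisphere.
tip_states = {
    "Lithornis": {"N"},
    "Palaeotis": {"N"},
    "ostrich": {"N", "S"},          # coded as present in both hemispheres
    "Notopalaeognathae": {"S"},     # simplification: whole clade as one tip
}
tree = ("Lithornis", ("Palaeotis", "ostrich", "Notopalaeognathae"))

root_states, cost = fitch_downpass(tree, tip_states)
print(sorted(root_states), cost)
```

On this toy tree the downpass assigns the root to the Northern Hemisphere with a single change. Only the downpass is shown; final state sets at internal nodes would additionally require an uppass.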

Taking into account the uncertainty of the phylogenetic position of Palaeotis, the ancestral

geographic distribution of the root of all paleognaths was also estimated based on seven possible

topologies (there are seven possible alternative positions of Palaeotis; Figure 3A). Equal prior

probabilities among these seven tree topologies were assumed in this analysis. Again, the Northern

Hemispheric origin of the paleognaths was strongly supported with a high Bayesian posterior probability

(100% PP; data not shown).
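
One simple way to combine reconstructions across alternative topologies is to weight each per-topology posterior by the prior probability of that topology. The sketch below shows this with equal weights of 1/7. It is an assumption that the analysis was combined in this way; the text states only that equal prior probabilities were assumed. The sketch does not update topology weights by their fit to the data, and the per-topology values are hypothetical placeholders, not results from this study.

```python
def average_over_topologies(per_topology_probs, priors=None):
    """Prior-weighted average of root-state probabilities across topologies.

    per_topology_probs: list of {state: probability} dicts, one per topology.
    priors: topology weights summing to 1 (defaults to equal weights).
    """
    n = len(per_topology_probs)
    if priors is None:
        priors = [1.0 / n] * n
    if len(priors) != n or abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must match the topologies and sum to 1")
    states = per_topology_probs[0].keys()
    return {
        s: sum(w * probs[s] for w, probs in zip(priors, per_topology_probs))
        for s in states
    }

# Hypothetical root-state probabilities for seven placements of Palaeotis.
northern = [1.00, 0.99, 0.98, 1.00, 0.97, 0.99, 1.00]
per_topology = [{"N": p, "S": 1.0 - p} for p in northern]

averaged = average_over_topologies(per_topology)
print({state: round(prob, 3) for state, prob in averaged.items()})
```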

To draw a more detailed picture of how the current geographic

distribution of the paleognaths was established, the seven zoogeographic regions (Nearctic, Palearctic, Afrotropical, South

America, Australia, Madagascan, and Zealandia) used by Claramunt and Cracraft [S1] were further

applied. The Indomalay region was not considered because it has no extant or fossil record of

paleognaths. The Northern Hemispheric origin of the paleognaths was again supported (Figure S3B).

Although the Palearctic region rather than the Nearctic region was supported as the origin of the paleognaths, this

may be an analytical artefact because Nearctic species such as Paracathartes and Pseudocrypturus were

not included in this analysis. Therefore, the origin of the paleognaths may be either the Palearctic or the Nearctic

region. In either case, the ancestor of the notopaleognaths migrated to South America via North America, and

its descendants then spread to other zoogeographic regions such as Australia, Madagascan, and Zealandia. Because the

number of paleognath species, including fossil species, is small (only 24 species were included in

this analysis), the data probably lack sufficient power to resolve the order of migrations

among the seven zoogeographic regions.

Because, as discussed in the main text, the timing of their intercontinental migrations is limited to the Late

Cretaceous to Paleocene, the seven zoogeographic regions were merged into five zoogeographic

regions, taking into account the continental positions at that time, namely (1) Northern Hemisphere

(Nearctic+Palearctic), (2) Afrotropical, (3) South America+Australia+Antarctica, (4) Madagascan, and (5)

Zealandia. The ancestral geographic distributions were estimated based on these five zoogeographic

regions. The results are shown in Figure S3. Both the MP analysis (Figure S3C) and the Bayesian

analysis (Figure S3D) supported the following: The Palaeognathae originated in the Northern Hemisphere.

The first split of the crown Palaeognathae also occurred in the Northern Hemisphere. The Notopalaeognathae

originated in South America+Australia+Antarctica. The Dinornithidae migrated to Zealandia from

South America+Australia+Antarctica. The geographic distribution of the common ancestor of Apterygidae

(Zealandia) and Aepyornithidae (Madagascan) is unclear.
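
The recoding from seven to five regions described above amounts to a lookup table. The sketch below is illustrative only: the mapping follows the grouping given in the text, whereas the tip codings are a small hypothetical subset chosen for demonstration (Palaeotis is coded as Palearctic on the basis of its European provenance, and the ostrich as both Afrotropical and Palearctic), not the authors' actual data matrix.

```python
# Seven zoogeographic regions mapped onto the five merged regions.
# Antarctica has no entry of its own because it is not one of the seven
# input regions; it appears only in the name of the merged region.
SEVEN_TO_FIVE = {
    "Nearctic": "Northern Hemisphere",
    "Palearctic": "Northern Hemisphere",
    "Afrotropical": "Afrotropical",
    "South America": "South America+Australia+Antarctica",
    "Australia": "South America+Australia+Antarctica",
    "Madagascan": "Madagascan",
    "Zealandia": "Zealandia",
}

def recode(tip_regions):
    """Map each taxon's set of seven-region labels onto the merged regions."""
    return {
        taxon: {SEVEN_TO_FIVE[region] for region in regions}
        for taxon, regions in tip_regions.items()
    }

# Hypothetical subset of tip codings, for illustration only.
tips = {
    "Apterygidae": {"Zealandia"},
    "Aepyornithidae": {"Madagascan"},
    "Palaeotis": {"Palearctic"},
    "ostrich": {"Afrotropical", "Palearctic"},
}

recoded = recode(tips)
for taxon, regions in recoded.items():
    print(taxon, sorted(regions))
```

Representing each tip as a set of regions allows taxa that occur in more than one region, such as the ostrich, to be recoded without special cases.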

REFERENCES:

S1. Claramunt, S., and Cracraft, J. (2015). A new time tree reveals Earth history’s imprint on the

evolution of modern birds. Sci. Adv. 1, e1501005. doi: 10.1126/sciadv.1501005

S2. Prum, R.O., Berv, J.S., Dornburg, A., Field, D.J., Townsend, J.P., Lemmon, E.M., and Lemmon,

A.R. (2015). A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA

sequencing. Nature 526, 569-573.

S3. Haddrath, O., and Baker, A.J. (2012). Multiple nuclear genes and retroposons support vicariance

and dispersal of the palaeognaths, and an Early Cretaceous origin of modern birds. Proc. Biol.

Sci. 279, 4617-4625.

S4. Benton, M.J., Donoghue, P.C.J., and Asher, R.J. (2009). Calibrating and constraining molecular

clocks. In The Timetree of Life, (Oxford: Oxford University Press).

S5. Bell, A., and Chiappe, L.M. (2015). A species-level phylogeny of the Cretaceous

Hesperornithiformes (Aves: Ornithuromorpha): Implications for body size evolution amongst the

earliest diving birds. J. Syst. Palaeontol. 14, 239-251.

S6. Boles, W.E. (1992). Revision of Dromaius gidju Patterson and Rich 1987 from Riversleigh,

northwestern Queensland, Australia, with a reassessment of its generic position. Nat. Hist. Mus.

Los Angeles Cty. Sci. Ser. 36, 195-208.

S7. Clarke, J.A., Tambussi, C.P., Noriega, J.I., Erickson, G.M., and Ketcham, R.A. (2005). Definitive

fossil evidence for the extant avian radiation in the Cretaceous. Nature 433, 305-308.

S8. Muller, J., and Reisz, R.R. (2005). Four well-constrained calibration points from the vertebrate

fossil record for molecular clock estimates. Bioessays 27, 1069-1075.

S9. O'Connor, J.Z. (2013). A redescription of Chaoyangia beishanensis (Aves) and a comprehensive

phylogeny of Mesozoic birds. J. Syst. Palaeontol. 11, 889–906.

S10. Parris, D.C., and Hope, S. (2002). New interpretations of birds from the Hornerstown and

Navesink formations, New Jersey. In Proceedings of the 5th International Meeting of the Society

of Avian Paleontology and Evolution. (Beijing: Science Press).

S11. Rasmussen, D.T., Olson, S.L., and Simmons, E.L. (1987). Fossil birds from the Oligocene Jebel

Qatrani Formation, Fayum Province, Egypt. Smithsonian Contrib. Paleobiol. 62, 1-20.

S12. Slack, K.E., Jones, C.M., Ando, T., Harrison, G.L., Fordyce, R.E., Arnason, U., and Penny, D.

(2006). Early penguin fossils, plus mitochondrial genomes, calibrate avian evolution. Mol. Biol.

Evol. 23, 1144-1155.

S13. Davies, S.J.J.F. (2003). Ostriches. Grzimek's Animal Life Encyclopedia 8, (Farmington Hills:

Gale Group).

S14. Dickison, M.R. (2007). The Allometry of Giant Flightless Birds. PhD thesis, (Durham: Duke

University).

S15. Dunning, J.B.J. (2008). CRC Handbook of Avian Body Masses Second Edition, (Boca Raton:

CRC press).

S16. Hauber, M.E. (2014). The book of eggs. A life size guide to the eggs of six hundred of the

world’s bird species, (Lewes: The Ivy Press).

S17. Higgins, P.J., Peter, J.M., and Steele, W.K. (2001). Handbook of Australian, New Zealand &

Antarctic birds. Volume 5: Tyrant-flycatchers to Chats, (Melbourne: Oxford University Press).

S18. Huynen, L., Gill, B.J., Millar, C.D., and Lambert, D.M. (2010). Ancient DNA reveals extreme

egg morphology and nesting behavior in New Zealand's extinct moa. Proc. Natl. Acad. Sci. USA

107, 16201-16206.

S19. Johnsgard, P.A. (1981). The plovers, sandpipers, and snipes of the world, (Lincoln: University of

Nebraska Press).

S20. Johnsgard, P.A. (1999). The pheasants of the world, (Oxford).

S21. Kear, J. (2005). Ducks, Geese and Swans: Volume 1: general chapters, species accounts (Anhima

to Salvadorina), (Oxford: Oxford University Press).

S22. Williams, T.D. (1995). The penguins: Spheniscidae, (Oxford: Oxford University Press).

S23. Yoshimura, T., and Suzuki, M. (2014). Tori to Tamago to Su no Daizukan, (Tokyo: Bookman

press).

S24. Brooke, M. (2004). Albatrosses and Petrels across the world, (Oxford: Oxford University Press).

S25. Cramp, S., Perrins, C.M., and Brooks, D.J. (1977). Handbook of the birds of Europe the Middle

East and North Africa: the birds of the western Palearctic. volume I: Ostrich to Ducks, (Oxford:

Oxford University Press).

S26. Campos, P.F., Willerslev, E., Sher, A., Orlando, L., Axelsson, E., Tikhonov, A., Aaris-Sørensen,

K., Greenwood, A.D., Kahlke, R.D., Kosintsev, P., et al. (2010). Ancient DNA analyses exclude

humans as the driving force behind late Pleistocene musk ox (Ovibos moschatus) population

dynamics. Proc. Natl. Acad. Sci. USA 107, 5675-5680.

S27. Kampmann, M.L., Fordyce, S.L., Avila-Arcos, M.C., Rasmussen, M., Willerslev, E., Nielsen,

L.P., and Gilbert, M.T. (2011). A simple method for the parallel deep sequencing of full

influenza A genomes. J. Virol. Methods 178, 243-248.

S28. Orlando, L., Ginolhac, A., Zhang, G., Froese, D., Albrechtsen, A., Stiller, M., Schubert, M.,

Cappellini, E., Petersen, B., Moltke, I., et al. (2013). Recalibrating Equus evolution using the

genome sequence of an early Middle Pleistocene horse. Nature 499, 74-78.

S29. Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat.

Methods 9, 357-359.

S30. Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads.

EMBnet.journal 17, 10–12.

S31. Baker, A.J., Haddrath, O., McPherson, J.D., and Cloutier, A. (2014). Genomic support for a moa-

tinamou clade and adaptive morphological convergence in flightless ratites. Mol. Biol. Evol. 31,

1686-1696.

S32. Harshman, J., Braun, E.L., Braun, M.J., Huddleston, C.J., Bowie, R.C., Chojnowski, J.L.,

Hackett, S.J., Han, K.L., Kimball, R.T., Marks, B.D., et al. (2008). Phylogenomic evidence for

multiple losses of flight in ratite birds. Proc. Natl. Acad. Sci. USA 105, 13462-13467.

S33. Ginolhac, A., Rasmussen, M., Gilbert, M.T., Willerslev, E., and Orlando, L. (2011).

mapDamage: testing for damage patterns in ancient DNA sequences. Bioinformatics 27, 2153-

2155.

S34. Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler

transform. Bioinformatics 25, 1754-1760.

S35. Lindahl, T. (1993). Instability and decay of the primary structure of DNA. Nature 362, 709-715.

S36. Chevreux, B., Pfisterer, T., Drescher, B., Driesel, A.J., Muller, W.E., Wetter, T., and Suhai, S.

(2004). Using the miraEST assembler for reliable and automated mRNA transcript assembly and

SNP detection in sequenced ESTs. Genome Res. 14, 1147-1159.

S37. Hahn, C., Bachmann, L., and Chevreux, B. (2013). Reconstructing mitochondrial genomes

directly from genomic next-generation sequencing reads--a baiting and iterative mapping

approach. Nucleic Acids Res. 41, e129.

S38. Katoh, K., and Standley, D.M. (2013). MAFFT multiple sequence alignment software version 7:

improvements in performance and usability. Mol. Biol. Evol. 30, 772-780.

S39. Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24,

1586-1591.

S40. Huang, X., and Madan, A. (1999). CAP3: A DNA sequence assembly program. Genome Res. 9,

868-877.

S41. Okonechnikov, K., Golosova, O., Fursov, M., and the UGENE team (2012). Unipro UGENE: a unified

bioinformatics toolkit. Bioinformatics 28, 1166-1167.

S42. Smith, J.V., Braun, E.L., and Kimball, R.T. (2013). Ratite nonmonophyly: independent evidence

from 40 novel Loci. Syst. Biol. 62, 35-49.

S43. Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high

throughput. Nucleic Acids Res. 32, 1792-1797.

S44. Lanfear, R., Calcott, B., Ho, S.Y., and Guindon, S. (2012). Partitionfinder: combined selection of

partitioning schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol. 29,

1695-1701.

S45. Stamatakis, A., Hoover, P., and Rougemont, J. (2008). A rapid bootstrap algorithm for the

RAxML Web servers. Syst. Biol. 57, 758-771.

S46. Mitchell, K.J., Llamas, B., Soubrier, J., Rawlence, N.J., Worthy, T.H., Wood, J., Lee, M.S., and

Cooper, A. (2014). Ancient DNA reveals elephant birds and kiwi are sister taxa and clarifies

ratite bird evolution. Science 344, 898-900.

S47. Mayr, G. (2014). The middle Eocene European “ratite” Palaeotis (Aves, Palaeognathae) restudied

once more. Paläontol. Z., doi: 10.1007/s12542-014-0248-y.

S48. Springer, M.S., Meredith, R.W., Eizirik, E., Teeling, E., and Murphy, W.J. (2008). Morphology

and placental mammal phylogeny. Syst. Biol. 57, 499-503.

S49. Yonezawa, T., and Hasegawa, M. (2010). Was the universal common ancestry proved? Nature

468, E9; discussion E10.

S50. Yonezawa, T., and Hasegawa, M. (2012). Some problems in proving the existence of the

universal common ancestor of life on Earth. ScientificWorldJournal 2012, 479824.

S51. O'Leary, M.A., Bloch, J.I., Flynn, J.J., Gaudin, T.J., Giallombardo, A., Giannini, N.P., Goldberg,

S.L., Kraatz, B.P., Luo, Z.X., Meng, J., et al. (2013). The placental mammal ancestor and the

post-K-Pg radiation of placentals. Science 339, 662-667.

S52. Maddison, W.P., and Maddison, D.R. (2015). MESQUITE: a modular system for evolutionary

analysis. Volume Version 3.03.

S53. Johnston, P. (2011). New morphological evidence supports congruent phylogenies and

Gondwana vicariance for palaeognathous birds. Zool. J. Linn. Soc. 163, 959–982.

S54. Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D.L., Darling, A., Höhna, S., Larget, B.,

Liu, L., Suchard, M.A., and Huelsenbeck, J.P. (2012). MrBayes 3.2: efficient Bayesian phylogenetic

inference and model choice across a large model space. Syst. Biol. 61, 539-542. doi:

10.1093/sysbio/sys029.

S55. Houde, P. (1986). Ancestors of ostriches found in the Northern Hemisphere suggest a new

hypothesis for origin of ratites. Nature 324, 563–565.

S56. Houde, P. (1988). Paleognathous Birds from the Early Tertiary of the Northern Hemisphere,

(Cambridge MA: Publications of the Nuttall Ornithological Club).

S57. Worthy, T.H., Mitri, M., Handley, W.D., Lee, M.S.Y., Anderson, A., and Sand, C. (2016). Osteology

supports a stem-galliform affinity for the giant extinct flightless bird Sylviornis neocaledoniae

(Sylviornithidae, Galloanseres). PLoS ONE 11, e0150871. doi: 10.1371/journal.pone.0150871

S58. Yuri, T., Kimball, R.T., Harshman, J., Bowie, R.C., Braun, M.J., Chojnowski, J.L., Han, K.L.,

Hackett, S.J., Huddleston, C.J., Moore, W.S., et al. (2013). Parsimony and model-based analyses

of indels in avian nuclear genes reveal congruent and incongruent phylogenetic signals. Biology

(Basel) 2, 419-444.

S59. Thorne, J.L., Kishino, H., and Painter, I.S. (1998). Estimating the rate of evolution of the rate of

molecular evolution. Mol. Biol. Evol. 15, 1647-1657.

S60. Thorne, J.L., and Kishino, H. (2002). Divergence time and evolutionary rate estimation with

multilocus data. Syst. Biol. 51, 689-702.

S61. Tamura, K. (1992). Estimation of the number of nucleotide substitutions when there are strong

transition-transversion and G+C-content biases. Mol. Biol. Evol. 9, 678-687.

S62. Nishihara, H., Okada, N., and Hasegawa, M. (2007). Rooting the eutherian tree: the power and

pitfalls of phylogenomics. Genome Biol. 8, R199.

S63. Inoue, J., Donoghue, P.C., and Yang, Z. (2010). The impact of the representation of fossil

calibrations on Bayesian estimation of species divergence times. Syst. Biol. 59, 74-89.

S64. Yang, Z., and Rannala, B. (2006). Bayesian estimation of species divergence times under a

molecular clock using multiple fossil calibrations with soft bounds. Mol. Biol. Evol. 23, 212-226.

S65. Zhang, C., Stadler, T., Klopfstein, S., Heath, T.A., and Ronquist, F. (2015). Total-Evidence

Dating under the Fossilized Birth-Death Process. Syst. Biol. doi: 10.1093/sysbio/syv080

S66. Nabholz, B., Lanfear, R., and Fuchs, J. (2016). Body mass-corrected molecular rate for bird

mitochondrial DNA. Mol. Ecol. 25, 4438–4449.

S67. Lartillot, N., and Delsuc, F. (2012). Joint reconstruction of divergence times and life-history

evolution in placental mammals using a phylogenetic covariance model. Evolution 66, 1773-

1787.

S68. Thewissen, J.G., Williams, E.M., Roe, L.J., and Hussain, S.T. (2001). Skeletons of terrestrial

cetaceans and the relationship of whales to artiodactyls. Nature 413, 277–281.

S69. Lartillot, N. (2014). A phylogenetic Kalman filter for ancestral trait reconstruction using

molecular data. Bioinformatics 30, 488-496.

S70. Yang, Z. (1994). Maximum likelihood phylogenetic estimation from DNA sequences with

variable rates over sites: approximate methods. J. Mol. Evol. 39, 306-314.

S71. Mayr, G. (2009). Paleogene Fossil Birds, (New York: Springer Publishing).

S72. Serrano, F.J., Palmqvist, P., and Sanz, J.L. (2015). Multivariate analysis of neognath skeletal

measurements: implications for body mass estimation in Mesozoic birds. Zool. J. Linn. Soc. 173,

929-955.

S73. Balanoff, A.M., and Rowe, T. (2007). Osteological description of an embryonic skeleton of the

extinct elephant bird, Aepyornis (Palaeognathae: Ratitae). J. Vert. Paleontol. Memoir 9, 27, 53.

S74. Oskam, C.L., Haile, J., McLay, E., Rigby, P., Allentoft, M.E., Olsen, M.E., Bengtsson, C.,

Miller, G.H., Schwenninger, J.L., Jacomb, C., et al. (2010). Fossil avian eggshell preserves

ancient DNA. Proc. Biol. Sci. 277, 1991-2000.

S75. Endo, H., Akishinonomiya, F., Yonezawa, T., Hasegawa, M., Rakotondraparany, F., Sasaki, M.,

Taru, H., Yoshida, A., Yamasaki, T., Itou, T., et al. (2012). Coxa morphologically adapted to

large egg in aepyornithid species compared with various palaeognaths. Anat. Histol. Embryol.

41, 31-40.

S76. Worthy, T.H., Worthy, J.P., Tennyson, A.J.D., Salisbury, S.W., Hand, S.J., and Scofield, R.P.

(2013). Miocene fossils show that kiwi (Apteryx, Apterygidae) are probably not phyletic

dwarves. (Proceedings of the 8th International Meeting Society of Avian Paleontology and

Evolution).

S77. Pagel, M., Meade, A., and Barker, D. (2004). Bayesian estimation of ancestral character states on

phylogenies. Syst. Biol. 53, 673-684.

S78. Jarvis, E.D., Mirarab, S., Aberer, A.J., Li, B., Houde, P., Li, C., Ho, S.Y., Faircloth, B.C.,

Nabholz, B., Howard, J.T., et al. (2014). Whole-genome analyses resolve early branches in the

tree of life of modern birds. Science 346, 1320-1331.

S79. Lenz, O.K., Wilde, V., Mertz, D.F., and Riegel, W. (2014). New palynology-based astronomical

and revised 40Ar/39Ar ages for the Eocene maar lake of Messel (Germany). Int. J. Earth Sci. 21.

doi:10.1007/s00531-014-1126-2

S80. Hou, L., Zhou, Z., Zhang, F., and Wang, Z. (2005). A Miocene ostrich fossil from Gansu

Province, northwest China. Chin. Sci. Bull. 50, 1808–1810.

S81. Janz, L., Elston, R.G., and Burr, G.S. (2009). Dating North Asian surface assemblages with

ostrich eggshell: implications for palaeoecology and extirpation. J. Archaeol. Sci. 36, 1982-1989.