Genome-wide signatures of convergent evolution in echolocating mammals

Posted: September 5, 2013 at 10:42 am

Taxonomic coverage

We collected new genome-wide sequence data from four bat species, selected from the two suborders and encompassing the paraphyly of echolocating bat lineages (see ref. 13). From the suborder Yinpterochiroptera we studied the non-echolocating Old World fruit bat Eidolon helvum (family Pteropodidae) and two laryngeal echolocating species, Megaderma lyra (Megadermatidae) and Rhinolophus ferrumequinum (Rhinolophidae). From the suborder Yangochiroptera we studied the laryngeal echolocating species Pteronotus parnellii (Mormoopidae) that has independently evolved constant frequency echolocation.

From Ensembl (http://www.ensembl.org/), we also obtained sequence data from two additional bats, the laryngeal echolocating species Myotis lucifugus (Yangochiroptera; Broad Institute) and the non-echolocating Old World fruit bat Pteropus vampyrus (Yinpterochiroptera; Baylor College of Medicine Human Genome Sequencing Center), as well as the echolocating bottlenose dolphin Tursiops truncatus. Genomic sequences from 15 additional mammal species were downloaded from Ensembl, giving a total of 22 mammals (listed in Supplementary Table 1).

To investigate the prevalence of convergent evolution at a genome-wide level associated with the independent evolution of echolocation in bats and cetaceans, we used a method that builds on maximum-likelihood phylogenetic reconstruction. For a given alignment of orthologous coding sequences (CDS), this method compares the goodness-of-fit of the accepted phylogenetic tree with that of an alternative convergent hypothesis (in this case, one in which echolocating taxa were forced into a spurious monophyletic clade). From our data set, we identified and tested three hypotheses: (1) H0, the commonly accepted species phylogeny (for example, refs 13, 23, 24, 25), in which cetaceans (represented in our data set by the common bottlenose dolphin Tursiops truncatus) are nested within the even-toed ungulates in the order Cetartiodactyla, and the order Chiroptera is split into the suborders Yangochiroptera and Yinpterochiroptera, with paraphyly of bat laryngeal echolocation (ref. 13); (2) H1, or bat-bat echolocation convergence (monophyly of all echolocating bats in the data set); and (3) H2, or bat-dolphin convergence (monophyly of all echolocating mammals in the data set). All three phylogenetic hypotheses are shown in Fig. 1. The scale bar (in amino acid substitutions) is provided for approximate reference only, as branch lengths were optimized at runtime.
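The goodness-of-fit comparison described above can be illustrated with a minimal Python sketch. It assumes that per-site log-likelihoods for one alignment have already been computed under two fixed topologies and summarizes them as a mean per-site difference. That statistic, the function name and the numbers are illustrative assumptions; this section does not specify the exact measure used.

```python
from typing import Sequence


def mean_sitewise_support(lnl_h0: Sequence[float], lnl_alt: Sequence[float]) -> float:
    """Mean per-site difference lnL(H0) - lnL(alternative).

    Under this (assumed) convention, negative values mean the alternative
    topology fits the alignment better than the species tree H0.
    """
    if len(lnl_h0) != len(lnl_alt) or not lnl_h0:
        raise ValueError("site likelihood vectors must be non-empty and of equal length")
    return sum(a - b for a, b in zip(lnl_h0, lnl_alt)) / len(lnl_h0)


# Hypothetical per-site log-likelihoods for a four-site locus.
h0_sites = [-2.0, -3.5, -1.0, -4.0]
h1_sites = [-2.5, -3.0, -1.0, -3.0]
delta = mean_sitewise_support(h0_sites, h1_sites)
```

Averaging over sites makes the value comparable between loci of different lengths, which is one reason to prefer it over a raw difference in total log-likelihood.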

Because the H2 (bat-dolphin) hypothesis is necessarily a radical rearrangement of the commonly accepted species topology, and the concept of an exact branching order or 'true' topology does not apply in this case, we proposed a number of separate but related versions of this hypothesis, all of which were evaluated equally in the analysis. In each case the rest of the mammalian species phylogeny was fixed, as in the H1 hypothesis. First, we constrained all five echolocating taxa to a single ancestral node (hard polytomy). Second, we enumerated the seven bifurcating trees that are possible when the position of T. truncatus is free to vary but the suborders of echolocating bats, Yangochiroptera (P. parnellii and M. lucifugus) and Yinpterochiroptera (R. ferrumequinum and M. lyra), are preserved. Finally, we specified a soft polytomy, in which the clade of echolocators was resolved by RAxML at runtime while the rest of the phylogeny remained constrained. A majority clade-consensus (MCC) summary phylogeny was constructed from the 2,326 inferred soft-polytomy H2 trees using TreeAnnotator v1.7.4 (in the BEAST v1.7.4 distribution; ref. 31). This phylogeny recovered the Yangochiroptera and Yinpterochiroptera clades of echolocating bats with good (>50%) node support.

When we compared the goodness-of-fit of all phylogenies (as opposed to pairwise comparison relative to the species phylogeny H0), we found that the species phylogeny was preferred at 1,170 loci (55%), with the bat-bat phylogeny H1 preferred next most often (548 loci; 26%). The soft-polytomy version of H2 (resolved by RAxML) was the preferred phylogeny at 50% of the remaining loci, with the remaining support split equally between the other H2 versions. We therefore adopted the soft-polytomy, RAxML-resolved version of H2 as our main bat-dolphin hypothesis.
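The count of seven bifurcating trees can be reproduced with a short Python sketch. It assumes one reading of the constraint: the echolocating bats form a fixed rooted backbone of two suborder pairs, and T. truncatus is attached as sister to every possible subtree (four tips, two suborder clades and the whole bat clade). The function names and underscore-separated taxon labels are hypothetical.

```python
def graft_everywhere(tree, tip):
    """Yield each rooted tree made by attaching `tip` as sister to one subtree of `tree`.

    Trees are nested 2-tuples of strings; a bare string is a tip.
    """
    yield (tree, tip)  # attach on the branch leading to this subtree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in graft_everywhere(left, tip):
            yield (new_left, right)
        for new_right in graft_everywhere(right, tip):
            yield (left, new_right)


def to_newick(tree):
    """Serialize a nested-tuple tree as a Newick string (no terminating semicolon)."""
    if isinstance(tree, tuple):
        return "(" + ",".join(to_newick(child) for child in tree) + ")"
    return tree


# Assumed backbone: (Yangochiroptera pair, Yinpterochiroptera pair).
backbone = (("P_parnellii", "M_lucifugus"), ("R_ferrumequinum", "M_lyra"))
trees = [to_newick(t) + ";" for t in graft_everywhere(backbone, "T_truncatus")]
```

A rooted four-tip tree of this shape has seven branches (four terminal, two internal and the root branch), so the generator yields exactly seven topologies.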

Novel sequence data from the four bat species listed above were generated by BGI on an Illumina Genome Analyzer platform (Illumina), based on genomic libraries with 500-bp insert sizes. Using this method we obtained approximately 33 to 41 Gb of short-read sequence data per species.

The CLC de novo algorithm (CLC bio) was used to assemble raw reads into contigs, with k-mer sizes ranging from 32 to 50. The contigs output by CLC were then processed with the Prepare module of the SOAP package, and scaffolds were assembled with the scaff command of SOAPdenovo. Finally, gaps were filled using the GapCloser tool (ref. 32). The resulting assemblies consisted of between 210,080 and 315,526 genomic sequences (depending on species), with an average depth of coverage of 17× to 18×. Estimated genome size was approximately 2 Gb in all four bats, whereas contiguity (as assessed by the N50 statistic) ranged from 16,292 bp (M. lyra genomic sequences) to 27,140 bp (E. helvum). Homology-based gene prediction analyses using the genBlastG tool (ref. 33) recovered 20,424 gene models for R. ferrumequinum, 20,043 for M. lyra, 20,455 for E. helvum and 20,357 for P. parnellii, in line with published gene content values for other mammals (ref. 34). The completeness and contiguity of the gene representation were evaluated using the CEGMA (Core Eukaryotic Genes Mapping Approach) pipeline (refs 35, 36). Across species, scores ranged from 61.29% to 77.02% for complete genes and from 90.32% to 96.77% for partial genes. These compared well with the published M. lucifugus genome; when we analysed that genome using CEGMA, the comparable scores for complete and partial genes fell within these ranges (62.9% and 91.5%, respectively).
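For readers unfamiliar with the N50 statistic used above, a minimal Python sketch of the standard definition follows: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name and example lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Hypothetical toy assembly of eight sequences totalling 32 bp.
example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```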

To identify genes suitable for systematic phylogeny-based analyses of convergent sequence evolution, we next filtered the above predictions for single-copy orthologous protein-coding genes conserved across the Eutheria. This was achieved by performing reciprocal BLAST searches against a database consisting of the gene models for the four bats, using as queries the human sequences of 11,185 genes reported as 1-to-1 or apparent 1-to-1 orthologues between the human and Myotis genomes in Ensembl databases (http://www.ensembl.org/, release 63). In total we identified 7,612 1-to-1 orthologous genes, for which the longest coding sequences (CDSs) were then retrieved from Ensembl for the 18 additional mammalian genomes (Supplementary Table 1).
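The reciprocal search logic can be sketched in Python as a reciprocal-best-hit filter. The sketch assumes the best hit in each direction has already been parsed from the search output into dictionaries; the identifiers are hypothetical, and the exact acceptance criteria used in the study are not given in this section.

```python
def reciprocal_best_hits(forward, reverse):
    """Keep query -> target pairs whose target's best hit is the original query.

    forward: dict mapping each query gene to its best-scoring target gene model.
    reverse: dict mapping each target gene model to its best-scoring query gene.
    """
    return {q: t for q, t in forward.items() if reverse.get(t) == q}


# Hypothetical identifiers: human queries against one bat's gene models.
forward_hits = {"HUMAN_G1": "BAT_M1", "HUMAN_G2": "BAT_M2", "HUMAN_G3": "BAT_M2"}
reverse_hits = {"BAT_M1": "HUMAN_G1", "BAT_M2": "HUMAN_G3"}
orthologue_pairs = reciprocal_best_hits(forward_hits, reverse_hits)
```

In this toy case HUMAN_G2 is dropped because its best target, BAT_M2, points back to a different human gene, which is the kind of ambiguity a 1-to-1 filter is meant to exclude.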

Coding sequences (CDS) of individual loci were built and aligned as codons using a modified version of transAlign (ref. 37) incorporating MAFFT (ref. 38), such that all sequences remained in the correct reading frame. Any ambiguously aligned sites, and codons with excessive numbers of gaps, were removed from each gene alignment using Gblocks (ref. 39) under the following options: -t=c -b1=$b1 -b2=$b1 -b3=1 -b4=6 -b5=h, where b1 = 70% of the sequences sampled in the data set.
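As a sketch of how the per-alignment Gblocks options might be assembled, the Python function below derives b1 from the number of sequences and builds an argument list. Rounding 70% up to the next integer, the executable name `Gblocks` on the search path and the file name are assumptions; the option letters and values are those quoted above.

```python
def gblocks_args(alignment_path, n_sequences):
    """Build a Gblocks argument list with b1 = b2 = 70% of the sequences.

    Assumption: 70% is rounded up to the next integer (integer arithmetic
    avoids floating-point surprises such as 0.7 * 10 > 7).
    """
    if n_sequences < 1:
        raise ValueError("alignment must contain at least one sequence")
    b1 = -(-7 * n_sequences // 10)  # ceiling of 0.7 * n_sequences
    return [
        "Gblocks", alignment_path,
        "-t=c",        # treat the alignment as codons
        f"-b1={b1}",   # min. sequences for a conserved position
        f"-b2={b1}",   # min. sequences for a flank position
        "-b3=1",       # max. contiguous non-conserved positions
        "-b4=6",       # min. block length
        "-b5=h",       # allow gap positions in up to half the sequences
    ]


# Hypothetical file name; 22 sequences corresponds to the full taxon set.
args = gblocks_args("locus_0001.fasta", 22)
```

The list can be passed to `subprocess.run(args)` where Gblocks is installed.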

To avoid potential biases due to either sequencing or assembly errors, we focused all phylogenetic and molecular evolution analyses on a subset of the identified genes. Specifically, we restricted our downstream analyses to data sets that, after ambiguous sites had been filtered out, showed no missing data in any of the sampled bats. The exception to this rule was P. vampyrus, which, because of its comparatively lower genome coverage, was missing from around 2% of CDS alignments. All final CDS alignments used in our analyses had a minimum length of 450 bp (150 codons/amino acids) and included a minimum of six bat species, the dolphin Tursiops truncatus and the following additional mammals as outgroups: Canis familiaris, Equus caballus, Bos taurus, Mus musculus and Homo sapiens. Of the 2,326 loci examined, 642 were also included in the analysis of ref. 20.
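The inclusion criteria above translate naturally into a filter function. The following Python sketch assumes each alignment is held as a dictionary from taxon name to aligned sequence; the function name, label format and data structure are illustrative assumptions, and the check covers only the length and taxon-sampling rules stated above.

```python
BATS = {
    "Eidolon_helvum", "Megaderma_lyra", "Rhinolophus_ferrumequinum",
    "Pteronotus_parnellii", "Myotis_lucifugus", "Pteropus_vampyrus",
}
REQUIRED = {
    "Tursiops_truncatus", "Canis_familiaris", "Equus_caballus",
    "Bos_taurus", "Mus_musculus", "Homo_sapiens",
}


def passes_filters(alignment, min_length=450, min_bats=6):
    """Return True if an alignment meets the length and taxon-sampling criteria.

    alignment: dict mapping taxon name -> aligned nucleotide sequence.
    """
    if not alignment:
        return False
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1 or lengths.pop() < min_length:
        return False  # ragged alignment or shorter than 150 codons
    taxa = set(alignment)
    return REQUIRED <= taxa and len(BATS & taxa) >= min_bats


# Hypothetical alignment: all twelve taxa, 450 aligned positions each.
toy_alignment = {name: "A" * 450 for name in BATS | REQUIRED}
keep = passes_filters(toy_alignment)
```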
