The ctenophore genome and the evolutionary origins of neural systems

Posted: May 22, 2014 at 11:43 am

Source material

Animals (Pleurobrachia bachei, Euplokamis dunlapae, Dryodora glandiformis, Beroe abyssicola, Bolinopsis infundibulum and a mertensiid) were collected at Friday Harbor Laboratories (Pacific north-western coast of the USA) and maintained in running seawater for up to 2 weeks. Other species were collected on the Atlantic coast of Florida and around Woods Hole, Massachusetts (Pleurobrachia pileus, Pleurobrachia sp., Mnemiopsis leidyi) as well as in the central Pacific (Palau and Hawaii; Coeloplana astericola, Vallicula multiformis). Animals were anaesthetized in 60% (volume/body weight) isotonic MgCl2 (337 mM). Specific tissues were surgically removed with sterile fine forceps and scissors and processed for DNA/RNA isolation as well as for metabolomics or pharmacological/electrophysiological tests. Whole animals were used for all in situ hybridization and immunohistochemical tests as described (ref. 35). Genomic DNA (gDNA) was isolated using Genomic-tip (QIAGEN), and total RNA was extracted using RNAqueous-Micro (Ambion/Life Technologies) or RNAqueous according to the manufacturers' recommendations. The quality and quantity of gDNA were analysed on a Qubit 2.0 Fluorometer (Life Technologies), and RNA was analysed on a 2100 Bioanalyzer (Agilent Technologies). For all details see Supplementary Methods sections 1.1–1.3.

All genomic sequence data for de novo assembly were generated on Roche 454 Titanium and Illumina Genome Analyzer IIx, HiSeq2000 and MiSeq instruments using both shotgun paired-end and mate-pair sequencing libraries with 3–9 kb inserts, as summarized in Supplementary Tables 1 and 2. Shotgun sequencing was performed from a single individual. Owing to a limited amount of starting gDNA, mate-pair libraries were constructed from 10–12 individuals. In total, the genome sequencing comprises 132,015,600,107 bp, or ~132 Gb of data, which corresponds to 733–825× physical coverage of the Pleurobrachia genome (the size of the P. bachei genome is estimated to be ~160–180 Mb); see Supplementary Methods sections 1.4–2.1.2.
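The coverage range quoted above is consistent with dividing the total number of sequenced bases by the lower and upper genome size estimates. The short check below reproduces that arithmetic; the function and variable names are illustrative and do not come from the original study.

```python
# Figures quoted in the text above.
TOTAL_SEQUENCED_BP = 132_015_600_107
GENOME_SIZE_RANGE_BP = (160_000_000, 180_000_000)  # estimated P. bachei genome size


def fold_coverage(total_bp: int, genome_size_bp: int) -> float:
    """Return sequenced bases divided by genome size (fold coverage)."""
    return total_bp / genome_size_bp


smallest_genome, largest_genome = GENOME_SIZE_RANGE_BP
# The larger genome estimate gives the lower bound on coverage, and vice versa.
low = fold_coverage(TOTAL_SEQUENCED_BP, largest_genome)
high = fold_coverage(TOTAL_SEQUENCED_BP, smallest_genome)
print(f"coverage range: {int(low)}x to {int(high)}x")
```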

The Pleurobrachia bachei draft genome was assembled using a custom approach designed to leverage the individual strengths of three popular de novo assembly packages and strategies: Velvet (ref. 36), SOAPdenovo (ref. 37), and pseudo-454 hybrid assembly with ABySS (ref. 38). First, using filtered and corrected data, we performed individual assemblies from 454 and Illumina reads with the Newbler (Roche, Inc.) software. Then the merged/hybrid assembly was produced from three individual assemblies (SOAPdenovo, Velvet and ABySS/Newbler, as described in Supplementary Methods 2.2). Three sets of gene model predictions were generated with Augustus (ref. 39) and with Fgenesh, using the Softberry Inc. Fgenesh++ pipeline (refs 40, 41) to incorporate information from full-length cDNA alignments and similar proteins from the eukaryotic section of the NCBI NR database (ref. 42). After initial gene predictions in each of the three sets of genomic scaffolds, we screened each set of gene models for internal redundancy with the BLASTP program from NCBI's BLAST+ software suite (ref. 43). A model was considered redundant if it had at least 90% identity to another model, the alignment between the two models had a bit score of at least 100, and the model was shorter than the other model.
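The three redundancy criteria can be sketched as a simple filter over pairwise BLASTP hits. This is a minimal illustration rather than the authors' pipeline: the tuple layout, the function name and the example identifiers are hypothetical, and the 90% identity figure is treated as a minimum, which is an assumption.

```python
from typing import Dict, Iterable, Set, Tuple

# One BLASTP hit: (query_id, subject_id, percent_identity, bit_score).
# This layout is a hypothetical simplification of tabular BLAST output.
Hit = Tuple[str, str, float, float]

MIN_IDENTITY = 90.0    # assumption: the 90% figure is treated as a minimum
MIN_BIT_SCORE = 100.0  # "a bit score of at least 100"


def redundant_models(hits: Iterable[Hit], lengths: Dict[str, int]) -> Set[str]:
    """Return IDs of gene models flagged as redundant by the three criteria."""
    redundant: Set[str] = set()
    for query, subject, identity, bit_score in hits:
        if query == subject:
            continue  # ignore self-hits
        if (identity >= MIN_IDENTITY
                and bit_score >= MIN_BIT_SCORE
                and lengths[query] < lengths[subject]):
            redundant.add(query)
    return redundant


example_lengths = {"g1": 300, "g2": 450, "g3": 200}
example_hits = [
    ("g1", "g2", 95.0, 250.0),  # g1 is shorter than g2: flagged
    ("g2", "g1", 95.0, 250.0),  # g2 is the longer model: kept
    ("g3", "g2", 80.0, 150.0),  # identity too low: kept
    ("g3", "g1", 92.0, 60.0),   # bit score too low: kept
]
print(sorted(redundant_models(example_hits, example_lengths)))
```

Because the length test is strict, two models of equal length never flag each other in this sketch; how such ties were handled in the actual screen is not stated in the text.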

Scaffolds producing these gene models were pooled and then screened for prokaryotic contamination using UCSC's BLAT software package (ref. 44) to produce the draft genome assembly version 1.0 (statistics can be found in Supplementary Table 5 and Supplementary Methods 2).

For annotation, gene models were uploaded to the In-VIGO BLAST interface and aligned with blastp against the entirety of NCBI's non-redundant protein database and the Swiss-Prot protein database; they were subsequently annotated in terms of Gene Ontology and KEGG pathways as well as Pfam domain identification. Transposable elements (TEs) were identified using WU-BLAST and its implementation in CENSOR, together with databases for all known classes, superfamilies and clades of TEs described in the literature and/or collected in Repbase (ref. 45). Detected sequences were clustered on the basis of their pairwise identities using BLASTclust. All autonomous non-LTR retrotransposons were classified using RTclass1 (ref. 46). To merge partially predicted, non-redundant gene models with assembled transcriptome data, a custom Java tool was developed. The tool extends partial gene model predictions by using transcriptome sequences to bridge the 5′ and 3′ fragments of partially predicted genes. Using this tool, analysis of alignments of non-redundant gene models to assembled Pleurobrachia transcriptomes resulted in 19,523 gene models (Supplementary Table 30). These gene models were also used to identify possible homologues in assembled transcriptomes from 10 other ctenophore species (Supplementary Tables 10 and 11). All genomic sequences were submitted to the NCBI SRA under project accession number SRP001155 (Supplementary Methods 3.1–3.2).
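Clustering detected TE sequences by pairwise identity, as mentioned above, can be illustrated with a small single-linkage routine: two sequences share a cluster if a chain of sufficiently similar pairs connects them. BLASTclust is generally described as a single-linkage method, but the parameters used in the study are not given here, so the threshold, function name and identifiers below are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def single_linkage_clusters(
    ids: Iterable[str],
    pairs: Iterable[Tuple[str, str, float]],
    min_identity: float,
) -> List[List[str]]:
    """Group sequence IDs connected by pairs with identity >= min_identity."""
    parent: Dict[str, str] = {seq_id: seq_id for seq_id in ids}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity in pairs:
        if identity >= min_identity:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for seq_id in parent:
        clusters.setdefault(find(seq_id), []).append(seq_id)
    return sorted(sorted(members) for members in clusters.values())


te_ids = ["te1", "te2", "te3", "te4"]
te_pairs = [
    ("te1", "te2", 96.0),
    ("te2", "te3", 91.0),  # links te3 to te1 through te2
    ("te3", "te4", 70.0),  # below the threshold: te4 stays alone
]
print(single_linkage_clusters(te_ids, te_pairs, min_identity=90.0))
```

In the example, te1 and te3 are never compared directly yet end up in the same cluster, which is the defining behaviour of single linkage.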

Three sequencing technology platforms were used for transcriptome profiling (RNA-seq): Roche 454 Titanium, Illumina HiSeq2000 and Ion Proton/PGM (Ion Torrent, Life Technologies). RNA-seq was performed on all major embryonic and developmental stages (1 cell, 2 cells, 4 cells, 8 cells, 16 cells, 32 cells, 64 cells, early and late gastrula, 1-day and 3-day larvae), major adult tissues and organs (combs, mouth, tentacles, stomach, the aboral organ, body walls), and the whole body of Pleurobrachia bachei. We developed a reduced representation sequencing protocol for the 454 and Ion Torrent sequencing platforms that can detect low-abundance transcripts (ref. 47). The method reduces the amount of sequencing required and gives more accurate quantification; additional details of the procedure are reported elsewhere (refs 47, 48). In summary, we generated 499,699,347 reads, or ~47.9 Gb, to achieve approximately 2,000× coverage of the Pleurobrachia transcriptome.

In addition, Illumina HiSeq sequencing was performed with RNA extracted from the following ctenophore species: Euplokamis dunlapae, Coeloplana astericola, Vallicula multiformis, Pleurobrachia pileus, Pleurobrachia sp. (collected from the Middle Atlantic and later identified as a subspecies of P. pileus), Dryodora glandiformis, Beroe abyssicola, Mnemiopsis leidyi, Bolinopsis infundibulum and an undescribed species that belongs to the family Mertensiidae (Supplementary Table 3). Each sequencing project was individually assembled using the Trinity de novo assembly package (ref. 49) and, in selected cases, MIRA. Reads from developmental stages were also assembled using the CLCBio Genomics Workbench. Before each assembly, reads were quality trimmed and adaptor contamination was removed with cutadapt (ref. 50). Full summaries of the transcriptome assemblies are presented in Supplementary Tables 4 and 10. Each transcriptome was mapped to the Pleurobrachia genome and aligned to both NCBI's non-redundant protein database (NR) and the UniProtKB/Swiss-Prot (SP) protein database. Gene Ontology (ref. 51) and Kyoto Encyclopedia of Genes and Genomes (KEGG; refs 52, 53) terms were associated with each transcript. Transcripts were first translated in all six reading frames, and Pfam/SMART domains (ref. 54) were then assigned to each reference transcriptome.
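Translating a transcript in all six reading frames, the step that precedes domain assignment above, means reading three frames on the forward strand and three on the reverse complement. The sketch below shows the idea in plain Python. It assumes the standard genetic code and uses hypothetical function names; it is not the software used in the study.

```python
from typing import Dict

# Standard genetic code, listed in TCAG order (an assumption for this sketch).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unrecognized codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq: str) -> Dict[str, str]:
    """Translate a transcript in frames +1..+3 and -1..-3."""
    seq = seq.upper().replace("U", "T")
    reverse = reverse_complement(seq)
    frames: Dict[str, str] = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for frame, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
    print(frame, protein)
```

Each of the six protein strings would then be passed to a domain search tool; stop codons appear as `*` and are left in place here so that open reading frames remain visible.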

Each reference transcriptome and its full set of annotation and expression data were uploaded to our transcriptome database http://moroz.hpc.ufl.edu/slimebase2/browse.php for downstream analysis and visualization (refs 55, 56). The database is integrated with a UCSC-type genome browser. All data sets can be downloaded directly via the genome project homepage (http://neurobase.rc.ufl.edu/Pleurobrachia). Gene expression was quantified for all transcriptional data as described in Supplementary Methods 4.4. Hierarchical clustering was performed with the Spotfire agglomerative algorithm. All primary transcriptome data were submitted to the NCBI SRA under project accession number SRP000992. (See Supplementary Methods 4.1–4.2.3 for details.)
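Agglomerative hierarchical clustering starts with every expression profile in its own cluster and repeatedly merges the two closest clusters. The text does not state which distance measure or linkage rule Spotfire was run with, so the sketch below assumes Euclidean distance and average linkage purely for illustration; the gene names and values are invented.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Tuple

Cluster = Tuple[str, ...]


def agglomerative_merges(
    profiles: Dict[str, List[float]],
) -> List[Tuple[Cluster, Cluster, float]]:
    """Return the sequence of merges as (left, right, average distance)."""
    clusters: List[Cluster] = [(name,) for name in sorted(profiles)]

    def average_distance(a: Cluster, b: Cluster) -> float:
        # Average linkage: mean Euclidean distance over all cross-cluster pairs.
        total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    merges: List[Tuple[Cluster, Cluster, float]] = []
    while len(clusters) > 1:
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(pair[0], pair[1]),
        )
        merges.append((left, right, average_distance(left, right)))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges


# Invented expression profiles (two conditions per gene).
toy_profiles = {
    "geneA": [1.0, 1.0],
    "geneB": [1.0, 2.0],
    "geneC": [8.0, 8.0],
    "geneD": [9.0, 9.0],
}
for left, right, distance in agglomerative_merges(toy_profiles):
    print(left, right, round(distance, 3))
```

The list of merges is what a dendrogram draws: early merges join similar profiles, and the final merge joins the two most dissimilar groups.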

To reconstruct basal metazoan phylogeny (see controversies in refs 10–15, 57), we conducted two sets of phylogenomic analyses using tools described elsewhere (ref. 58). All analyses included new data from Pleurobrachia bachei and the sponges Sycon (Calcarea) and Aphrocallistes (Hexactinellida). For the first set of analyses, Ctenophora was represented by two species of Pleurobrachia and Mnemiopsis leidyi. Initial analyses included the taxa in Supplementary Table 12. For a subsequent analysis, sampling within Ctenophora was expanded to include ten additional taxa, each represented by a relatively deeply sequenced Illumina transcriptome (Supplementary Table 13). To reduce noise in the phylogenetic signal, we used strict criteria to exclude paralogues, highly derived sequences, mistranslated sequence regions, and ambiguously aligned positions in sequence alignments. Analyses were conducted in RAxML 7.2.7 (ref. 59) using maximum likelihood (ML) with the CAT + WAG + F model. Topological robustness (that is, nodal support) for all ML analyses was assessed with 100 replicates of nonparametric bootstrapping. Details of the phylogenomic analyses are presented in Supplementary Methods 7. The Shimodaira–Hasegawa test (ref. 17) was implemented in RAxML with the PROTGAMMAWAGF model.
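Nonparametric bootstrapping, used above to assess nodal support, resamples alignment columns with replacement to build pseudo-replicate alignments of the same length; a tree is inferred from each replicate, and support for a node is the fraction of replicate trees that contain it. RAxML performs this internally. The sketch below only illustrates the resampling step, with hypothetical function names and an invented toy alignment.

```python
import random
from typing import Dict, List


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement, keeping the same length."""
    n_columns = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_columns) for _ in range(n_columns)]
    # The same sampled column indices are applied to every taxon.
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


def bootstrap_replicates(
    alignment: Dict[str, str], n_replicates: int = 100, seed: int = 0
) -> List[Dict[str, str]]:
    """Generate pseudo-replicate alignments (100 by default, as in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Invented toy protein alignment; all rows must have equal length.
toy_alignment = {
    "taxon_1": "MKVLAGTW",
    "taxon_2": "MKVLSGTW",
    "taxon_3": "MRILSGSW",
}
replicates = bootstrap_replicates(toy_alignment, n_replicates=100, seed=1)
print(len(replicates), replicates[0])
```

Tree inference on each replicate and the tallying of support values are left out of this sketch.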
