Pan genome of the phytoplankton Emiliania underpins its global distribution

Posted: June 12, 2013 at 3:46 pm

Fundamental uncertainties exist regarding the physiology and ecology of E. huxleyi, and the relationships between different morphotypes (Fig. 1a). To investigate its gene repertoire and physiological capacity, we sequenced the diploid genome of CCMP1516 using the Sanger shotgun approach. The haploid genome is estimated to be 141.7 megabases (Mb) and 97% complete on the basis of conserved eukaryotic single-copy genes5, 6 (Supplementary Table 1, Supplementary Data 7 and Supplementary Information 1.1–1.4). It is dominated by repetitive elements, which constitute >64% of the sequence, a much greater proportion than seen for sequenced diatoms (Fig. 2 and Supplementary Information 2.10). Of the 30,569 protein-coding genes predicted, 93% of which have transcriptomic support (expressed sequence tag or RNA-seq; Supplementary Information 1.5–1.7, 2.1–2.2 and Supplementary Data 1–3), we identified expansions in gene families specific to iron/macromolecular transport, post-translational modification, cytoskeletal development and signal transduction relative to other sequenced eukaryotic algae (Supplementary Information 2.3).

Figure 1. a, E. huxleyi has five well-characterized calcification morphotypes and an overcalcified state1. b, Cladogram showing the distinct branch occupied by the haptophyte lineage on the basis of RAxML analysis of concatenated, nuclear-encoded proteins after addition of homologues from CCMP1516 and a pico-prymnesiophyte-targeted metagenome8. Lineages with algal taxa are indicated (symbol). Filled circles represent nodes with ≥70% bootstrap support. The tree is rooted for display purposes only.

Figure 2. Structural composition of genomes from CCMP1516 and the diatom P. tricornutum. Grey-shaded regions of each class depict proportions of tandem repeats and low-complexity regions. The grey vertical box contains only tandem repeats and low-complexity sequence. Pie charts indicate the proportion of non-repeated (white) and repeated or low-complexity (black) sequences in each haploid genome.

The E. huxleyi genome provides a crucial reference point for evolutionary, cellular and physiological studies because haptophytes represent a distinct branch on the eukaryotic tree of life (Fig. 1b). Consistent with other published analyses7, conserved marker genes show that the haptophytes branch as a sister clade to heterokonts, alveolates and rhizarians. However, as a lineage possessing secondary plastids, the evolutionary history of haptophyte genomes may be more complex8 than that suggested by a single concatenated analysis. Thus, individual gene phylogenies were constructed using clusters of orthologous proteins (1,563) identified by comparative analysis of E. huxleyi and at least 9 of 48 taxa sampled from across eukaryotes (Supplementary Information 2.4). E. huxleyi was monophyletic with heterokonts in 28–33% of the resolved trees and with the green lineage (green algae and plants) in 11–14%. Less frequent relationships were also observed, presumably reflecting a mosaic genome8 with contributions from the host lineage, the eukaryotic endosymbiont, and possibly horizontal gene transfer (Supplementary Fig. 1 and Supplementary Data 4).

Coccolithophores produce the anti-stress osmolyte dimethylsulphoniopropionate (DMSP), which can be demethylated to produce methylmercaptopropionate and/or cleaved by some organisms, such as E. huxleyi, to produce the predominant natural source of atmospheric sulphur, dimethylsulphide. Although the gene encoding the DmdA protein, which catalyses the initial demethylation of DMSP, was not detected in the genome, genes that produce sulphur and carbon intermediates and function in later stages of DMSP degradation were identified9. Also present is an intron-containing, but otherwise bacterial dddD-like, gene encoding an acetyl-coenzyme A (acetyl-CoA) transferase proposed to add CoA to DMSP before cleavage9 (Supplementary Table 2). These data will facilitate molecular approaches for probing DMSP biogeochemistry and the environmental importance of sulphur production and biotransformations.

E. huxleyi synthesizes unusual lipids that are used as nutritional/feedstock supplements, polymer precursors and petrochemical replacements. Two functionally redundant pathways for the synthesis of omega-3 polyunsaturated eicosapentaenoic and docosahexaenoic fatty acids were partially characterized10 (Supplementary Table 3). Pathway analysis indicates that E. huxleyi sphingolipids are primarily glucosylceramides, often with an unusual C9 methyl branch (Supplementary Table 3) found only in fungi and some animals11. Genes for two zinc-containing quinone reductases, involved in reduction of alkenone α,β-double bonds used in paleotemperature reconstructions and proposed biofuels, were also identified12, 13.

Coccoliths have precise nanoscale architecture and unique light-scattering properties of interest to material and optoelectronic scientists. Carbonic anhydrase is associated with biomineralization in other organisms14 and accelerates bicarbonate formation. The 15 E. huxleyi carbonic anhydrase isozymes and genes involved in calcium and carbon transport, H+ efflux, cytoskeleton organization and polysaccharide modulation (Supplementary Table 4) represent targets for resolving molecular mechanisms governing coccolith formation, and will aid in predicting response patterns to anthropogenic CO2 increases and ocean acidification.

The global distribution of E. huxleyi (for example, Fig. 3a, c) and its capacity for bloom formation under different physicochemical parameters are puzzling. To investigate the potential influence of genome variation in this ecological dynamic, three E. huxleyi isolates (92A, EH2 and Van556) from different oceanic regions were deeply sequenced (265–352-fold coverage) (Fig. 3a, c, Supplementary Tables 5–7 and Supplementary Information 2.6). Two approaches were used to compare genomes. First, sequence reads were assembled and contigs aligned to the CCMP1516 reference genome using Standard Nucleotide BLAST (BLASTn; Supplementary Information 2.6.1). Although these isolates show >98% 18S ribosomal RNA (rRNA) identity, only 54–77% of their contigs showed similarity to CCMP1516. Of the remaining contigs, 71 Mb were shared between at least two deeply sequenced strains, and 8–40 Mb appeared to be isolate specific, as did 27 Mb of CCMP1516. Flow cytometric genome-size estimates also showed heterogeneity across isolates, with haploid genome sizes ranging from 99 to 133 Mb (Supplementary Information 2.5, 2.6.1 and Supplementary Table 5). These findings indicated considerable intraspecific variation.

Figure 3. a, Isolation locations shown over the averaged Reynolds monthly sea-surface temperature (SST) climatology (1985–2007). b, tBLASTn homology search results using predicted CCMP1516 proteins against assemblies from other strains. Bars are coloured according to the number of gene products and nucleotide per cent identity. c, Best Bayesian topology, where node values indicate posterior probability/maximum-likelihood bootstrap support. Haploid genome sizes (in Mb) are provided in brackets (with ND indicating not determined), and shaded boxes denote robust clades of geographically dispersed strains. The variable distribution of nitrite reductase (NirS) and plastocyanin (PetE) is shown.

To examine potential variations in gene content further, sequence reads were directly mapped to the CCMP1516 genome. Of the 30,569 predicted genes in CCMP1516, between 1,373 and 2,012 different genes were not found in each of 92A, Van556 and EH2 (cumulatively 5,218, or 17% of CCMP1516 genes), and 364 appeared to be missing from all three. These findings cannot be explained by poor coverage or sequencing bias alone. Of 458 highly conserved eukaryotic genes from the CEGMA set5, 95–97% were identified in the isolates, indicating nearly complete genome sequences (Supplementary Data 7). Together, de novo assemblies and direct mapping to CCMP1516 indicate that the pan genome of E. huxleyi represents a rapidly changing repository of genetic information, with genomic fluidity estimated to be ~10%15 (on the basis of CCMP1516 gene content).
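The gene-content comparison above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes one published definition of genomic fluidity: for each pair of genomes, the number of genes found in only one of the two is divided by the sum of both gene counts, and the ratios are averaged over all pairs. The gene identifiers, the toy gene sets, and the function names `genomic_fluidity` and `missing_from_reference` are hypothetical and chosen only for illustration. Only the counts 5,218 and 30,569 come from the text.

```python
from itertools import combinations


def genomic_fluidity(gene_sets):
    """Average, over all genome pairs, of (genes unique to either genome)
    divided by (sum of the two genomes' gene counts).

    Assumed definition; the source text only cites a reference for it.
    """
    genomes = list(gene_sets.values())
    if len(genomes) < 2:
        raise ValueError("need at least two genomes")
    ratios = []
    for a, b in combinations(genomes, 2):
        unique = len(a ^ b)        # genes present in exactly one of the pair
        total = len(a) + len(b)    # sum of both gene counts
        ratios.append(unique / total)
    return sum(ratios) / len(ratios)


def missing_from_reference(reference, isolates):
    """Return (genes missing from at least one isolate,
    genes missing from every isolate), relative to the reference."""
    missing = [reference - genes for genes in isolates.values()]
    in_any = set().union(*missing)
    in_all = set.intersection(*missing)
    return in_any, in_all


# Hypothetical toy data: a reference gene set and two isolates.
reference = {"g1", "g2", "g3", "g4", "g5"}
isolates = {
    "isolate_a": {"g1", "g2", "g3", "g4"},
    "isolate_b": {"g1", "g2", "g3", "g6"},
}

all_genomes = {"reference": reference, **isolates}
fluidity = genomic_fluidity(all_genomes)
in_any, in_all = missing_from_reference(reference, isolates)

print(f"toy genomic fluidity: {fluidity:.3f}")
print(f"missing from at least one isolate: {sorted(in_any)}")
print(f"missing from every isolate: {sorted(in_all)}")

# Arithmetic check on the figures quoted in the text.
cumulative_missing_pct = 100 * 5218 / 30569
print(f"cumulative missing genes: {cumulative_missing_pct:.1f}% of reference")
```

The set union corresponds to the cumulative count of genes missing from one or more isolates, and the set intersection to the genes missing from all of them. With real data, the gene sets would come from read-mapping or homology-search results rather than hand-written identifiers.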

