Mapping epigenetic divergence in the massive radiation of Lake Malawi cichlid fishes – Nature.com

Posted: October 11, 2021 at 10:54 am

The methylomes of Lake Malawi cichlids feature conserved vertebrate characteristics

To characterise the methylome variation and assess possible functional relationships in natural populations of Lake Malawi cichlids, we performed high-coverage whole-genome bisulfite sequencing of methylomes (WGBS) from liver tissues of six different cichlid species. Muscle methylome (WGBS) data for three of the six species were also generated to assess the extent to which methylome divergence was tissue-specific. Moreover, to examine the correlation between transcriptome and methylome divergences, total transcriptomes (RNAseq) from both liver and muscle tissues of four species were generated. Only wild-caught male specimens (23 biological replicates for each tissue and each species) were used for all sequencing datasets (Fig.1ac, Supplementary Fig.1, Supplementary Data1, and Supplementary Table1). The species selected were: Rhamphochromis longiceps (RL), a pelagic piscivore (Rhamphochromis group); Diplotaxodon limnothrissa (DL), a deep-water pelagic carnivore (Diplotaxodon group); Maylandia zebra (MZ) and Petrotilapia genalutea (PG), two rock-dwelling algae eaters (Mbuna group); Aulonocara stuartgranti (AS), a benthic invertebrate-eating sand/rock-dweller that is genetically part of the deep-benthic group; Astatotilapia calliptera (AC), a species of rivers and lake margins40 (Fig.1b).

a Map of Africa (main river systems are highlighted in white) and magnification of Lake Malawi (scale bar: 40km). b Photographs (not to scale) of the six Lake Malawi cichlid species part of this study spanning five of the seven described eco-morphological groups. The symbols represent the different habitats (pelagic/benthic [wave symbol], rock/sand-dwelling/littoral [rock symbol]and adjacent rivers part of Lake Malawi catchment), and the type of diet (fish, fish/zooplankton, algae, invertebrates) for each group. The species representing each group are indicated by their initials (see below). c Diagram summarising the sampling and sequencing strategies for liver and muscle methylome (whole-genome bisulfite sequencing, WGBS) and whole transcriptome (RNAseq) datasets. See Methods, Supplementary Fig.1 and Supplementary Table1. d Violin plots showing the distribution of liver DNA methylation levels in CG sequence context (averaged mCG/CG levels over 50bp-long bins genome-wide) in different genomic regions: overall, gene bodies, exons, promoter regions (TSS500bp), CpG-islands in promoters and outside (orphan) and in repeat/transposon regions. mC levels for two different repeat classes are given: DNA transposon superfamily Tc2-Mariner (n=5,378) and LINE I (n=407). e Average liver mCG profiles across genes differ depending on their transcriptional activity in liver: from non-expressed (0) to genes showing low (1), intermediate (2), high (3) and highest (4) expression levels (Methods). Results shown in (d, e) are for Mbuna MZ (liver, n=3) and are representative of the results for all other species, and are based on average mC/C in 50bp non-overlapping windows. RL, Rhamphochromis longiceps; DL, Diplotaxodon limnothrissa; MZ, Maylandia zebra; PG, Petrotilapia genalutea; AS, Aulonocara stuartgranti; AC, Astatotilapia calliptera. CreditsFish photographs: Hannes Svardal and M. Emlia Santos. Geographical map modified from http://www.d-maps.com/.

On average, 285.5155.6 million paired-end reads (see Supplementary Data1) for liver and muscle methylomes were generated with WGBS, yielding ~1015x per-sample coverage at CG dinucleotide sites (Supplementary Fig.2ad; see Methods and Supplementary Notes). To account for species-specific genotype and avoid methylation biases due to species-specific single nucleotide polymorphism (SNP), WGBS reads were mapped to SNP-corrected versions of the Maylandia zebra reference genome (UMD2a; see Methods). Mapping rates were not significantly different among all WGBS samples (Dunns test with Bonferroni correction, p>0.05; Supplementary Fig.2e), reflecting the high level of conservation at the DNA sequence level across the Malawi radiation (Supplementary Fig.3). In parallel, liver and muscle transcriptomes were generated for four species using the same specimens as used for WGBS, yielding on average 11.90.7 million paired-end reads (meansd; Fig.1c, Supplementary Data1 and Methods).

We first characterised global features of the methylome of Lake Malawi cichlids. The genome of Lake Malawi cichlid was found to have copies of DNA methyltransferases (DNMTs) and ten-eleven translocation methylcytosine dioxygenases (TETs), the readers and erasers of DNA methylation respectively (Supplementary Fig.4ac). Like that of mammals and other teleost fish, the genomes of Lake Malawi cichlids have high levels of DNA methylation genome-wide in the CG dinucleotide sequence context, consistently across all samples in both tissues analysed (Fig.1d and Supplementary Fig.2ac). Gene bodies generally show higher methylation levels than the genome-wide average, while the majority of promoter regions are unmethylated (Fig.1d). CpG islands (CGIs; i.e., CpG-rich regionsabundant in Lake Malawi cichlid genomes; Supplementary Fig.5ai, Supplementary Notes and Methods) are almost entirely devoid of methylation in promoters, while orphan CGIs, residing outside promoters, are mostly highly methylated (Fig.1d and Supplementary Fig.5f, g). While 70% of mammalian promoters contain CGIs41, only 1520% of promoters in Lake Malawi cichlids harbour CGIs (Supplementary Fig.5d), similar to frog and zebrafish genomes41. Notably, orphan CGIs, which may have important cis-regulatory functions42, compose up to 80% of all predicted CGIs in Lake Malawi cichlids (Supplementary Fig.5e). Furthermore, repetitive regions, as well as transposable elements, are particularly enriched for cytosine methylation, suggesting a methylation-mediated silencing of their transcription (Fig.1d, Supplementary Fig.6ad), similar to that observed in zebrafish and other animals8,18. Interestingly, certain transposon families, such as LINE I and Tc2-Mariner, part of the DNA transposon familythe most abundant TE family predicted in Lake Malawi cichlid genome (Supplementary Fig.6a, b, Supplementary Notes, and ref. 38)have recently expanded considerably in the Mbuna genome (Supplementary Fig.6c and refs. 38,43). While Tc2-Mar DNA transposons show the highest median methylation levels, LINE I elements have some of the lowest, yet most variable, methylation levels of all transposon families, which correlates with their evolutionary recent expansion in the genome (Fig.1d, e and Supplementary Fig.6d, e). Finally, transcriptional activity in liver and muscle tissues of Lake Malawi cichlids was negatively correlated with methylation in promoter regions (Spearmans correlation test, =0.40, p<0.002), while being weakly positively correlated with methylation in gene bodies (=0.1, p<0.002; Fig.1e and Supplementary Fig.7ad and Supplementary Table2). This is consistent with previous studies highlighting high methylation levels in bodies of active genes in plants and animals, and high levels of methylation at promoters of weakly expressed genes in vertebrates8,24. We conclude that the methylomes of Lake Malawi cichlids share many regulatory features, and possibly associated functions, with those of other vertebrates, which renders Lake Malawi cichlids a promising model system in this context.

To assess the possible role of DNA methylation in phenotypic diversification, we then sought to quantify and characterise the differences in liver and muscle methylomes across the genomes of Lake Malawi haplochromine cichlids. Despite overall very low sequence divergence36 (Supplementary Fig.3), Lake Malawi cichlids were found to show substantial methylome divergence across species within each tissue type, while within-species biological replicates always clustered together (Fig.2a). The species relationships inferred by clustering of the liver methylomes at conserved individual CG dinucleotides recapitulate some of the genetic relationship inferred from DNA sequence36, with one exceptionthe methylome clusters A. calliptera samples as an outgroup, not a sister group to Mbuna (Fig.2a and Supplementary Fig.3a, b). This is consistent with its unique position as a riverine species, while all species are obligate lake dwellers (Fig.1b).

a Unbiased hierarchical clustering and heatmap of Spearmans rank correlation scores for genome-wide methylome variation in Lake Malawi cichlids at conserved CG dinucleotides. Dotted boxes group samples by species within each tissue. b Observed/Expected ratios (O/E ratio, enrichment) for some genomic localisations of differentially methylated regions (DMRs) predicted between livers (green) and between muscles (purple) of three Lake Malawi cichlid species, and between tissues (within-species, grey); 2 tests for between categories (p<0.0001), for O/E between liver and muscle DMRs (p=0.99) and between Liver+Muscle vs Tissues (p=0.04). Expected values were determined by randomly shuffling DMRs of each DMR type across the genome (1000 iterations). Categories are not mutually exclusive. c Gene ontology (GO) enrichment for DMRs found between liver methylomes localised in promoters. GO terms: Kyoto Encyclopaedia of Genes and Genomes (KEGG), molecular functions (MF), cellular component (CC), and biological processes (BP). Only GO terms with FDR<0.05 shown. N indicates the number of genes associated with each GO term. Only GO terms with p<0.05 (BenjaminiHochberg false discovery rate [FDR]-corrected p-values) are shown. d Genomic localisation of liver DMRs containing repeats/transposons (TE-DMRs). e. O/E ratios for species TE-DMRs for each TE family. Only O/E2 and 0.5 shown. 2 tests, p<0.0001. f Violin plots showing TE sequence divergence (namely, CpG-adjusted Kimura substitution level as given by RepeatMasker) in M. zebra genome for species TE-DMRs, TEs outside species DMRs (outside) and randomly shuffled TE-DMRs (500 iterations, shuffle). Mean values indicated by red dots, median values by black lines and shown above each graph. Total DMR counts indicated below each graph. Two-sided p-values for KruskalWallis test are shown above the graph. DMR, differentially methylated region; TE, repeat/transposon regions; CGI, predicted CpG islands.

As DNA methylation variation tends to correlate over genomic regions consisting of several neighbouring CG sites, we defined and sought to characterise differentially methylated regions (DMRs) among Lake Malawi cichlid species (50bp-long, 4 CG dinucleotide, and 25% methylation difference across any pair of species, p<0.05; see Methods). In total, 13,331 between-species DMRs were found among the liver methylomes of the six cichlid species (Supplementary Fig.8a). We then compared the three species for which liver and muscle WGBS data were available and found 5,875 and 4,290 DMRs among the liver and muscle methylomes, respectively. By contrast, 27,165 within-species DMRs were found in the between-tissue comparisons (Supplementary Fig.8b). Overall, DMRs in Lake Malawi cichlids were predicted to be as long as 5,000bp (95% CI of median size: 282298bp; Supplementary Fig.8c). While the methylation differences between liver and muscle were the most prominent at single CG dinucleotide resolution (Fig.2a) and resulted in the highest number of DMRs, we found DMRs to be slightly larger and methylation differences within them substantially stronger among species than between tissues (Dunns test, p<2.21016; Supplementary Fig.8c, d).

Next, we characterised the genomic features enriched for between-species methylome divergence in the three cichlid species for which both muscle and liver WGBS data were available (i.e., RL, PG, DL; Fig.1c). In the liver, promoter regions and orphan CGIs have 3.0- and 3.6-fold enrichment respectively for between-species liver DMRs over random expectation (2 test, p<0.0001; Fig.2b)between-species muscle DMRs show similar patterns as well (p=0.99, compared to liver O/E ratios). Methylome variation at promoter regions has been shown to affect transcription activity via a number of mechanisms (e.g., transcription factor binding affinity, chromatin accessibility)21,44 and, in this way, may participate in phenotypic adaptive diversification in Lake Malawi cichlids. In particular, genes with DMRs in their promoter regions show enrichment for enzymes involved in hepatic metabolic functions (Fig.2c). Furthermore, the high enrichment of DMRs in intergenic orphan CGIs (Fig.2b), accounting for n=691 (11.94%) of total liver DMRs, suggests that intergenic CGIs may have DNA methylation-mediated regulatory functions.

The majority of between-species liver DMRs (65.0%, n=3,764) are within TE regions (TE-DMRs; Supplementary Fig.8a, b, e), approximately two-thirds of which are located in unannotated intergenic regions (Fig.2d). However, a small fraction of TE-DMRs are located in gene promoters (12% of all TE-DMRs) and are significantly enriched in genes associated with metabolic pathways (Fig.2d and Supplementary Fig.8f). While there is only a 1.1-fold enrichment of DMRs globally across all TEs (Fig.2b), some TE families are particularly enriched for DMRs, most notably the DNA transposons hAT (hAT6, 10.5-fold), LINE/l (>3.7-fold) and the retrotransposons SINE/Alu (>3.5-fold). On the other hand, the degree of methylation in a number of other TE families shows unexpected conservation among species, with substantial DMR depletion (e.g., LINE/R2-Hero, DNA/Maverick; Fig.2e). Overall, we observe a pattern whereby between-species methylome differences are significantly localised in younger transposon sequences (Dunns test, p=2.21016; Fig.2f). Differential methylation in TE sequences may affect their transcription and transposition activities, possibly altering or establishing new transcriptional activity networks via cis-regulatory functions45,46,47. Indeed, the movement of transposable elements has recently been shown to contribute to phenotypic diversification in Lake Malawi cichlids48.

In contrast to the between-species liver DMRs, within-species DMRs based on comparison of liver against muscle methylomes show much less variation in enrichment across genomic features. Only gene bodies show weak enrichment for methylome variation (Fig.2b). Moreover, both CGI classes, as well as repetitive and intergenic regions show considerable tissue-DMR depletion, suggesting a smaller DNA methylation-related contribution of these elements to tissue differentiation (Fig.2b and Supplementary Fig.8e).

We hypothesised that adaptation to different diets in Lake Malawi cichlids could be associated with distinct hepatic functions, manifesting as differences in transcriptional patterns which, in turn, could be influenced by divergent methylation patterns. To investigate this, we first performed differential gene expression analysis. In total, 3,437 genes were found to be differentially expressed between livers of the four Lake Malawi cichlid species investigated (RL, DL, MZ, and PG; Wald test, false discovery rate adjusted two sided p-value using BenjaminiHochberg[FDR]<0.01; Fig.3a and Supplementary Fig.9ac; see Methods). As with methylome variation, transcriptome variation clustered individuals by species (Supplementary Fig.9d), consistent with species-specific functional liver transcriptome activity.

a Heatmap and unsupervised hierarchical clustering of gene expression values (Z-score) of all differentially expressed genes (DEGs) found among livers of four Lake Malawi cichlid species (Wald tests corrected for multiple testing using false discovery rate FDR<1%). GO enrichment analysis for three DEG clusters are shown in Supplementary Fig.9c. b Significant overlap between DEG and differentially expressed regions (DMRs; p<0.05) linked to a gene (exact hypergeometric test, p=4.71105), highlighting putative functional DMRs (pfDMRs). c Bar plot showing the percentage of pfDMRs localised in either promoters, intergenic regions (0.54kbp away from genes), or in gene bodies, with the proportion of TE content for each group. d Heatmap representing significant GO terms for DEGs associated with pfDMRs for each genomic feature. GO categories: BP, Biological Process; MF, Molecular Function. Only GO terms with BenjaminiHochberg FDR-corrected p-values < 0.05 are shown.Examples of pfDMRs significantly associated with species-specific liver transcriptional changes for the genes thiosulfate:glutathione sulfurtransferase tstd1-like (LOC101468457; q=6.821016) (e), carbonyl reductase [NADPH]-1 cbr1-like (LOC101465189; MZ vs DL, q=0.002; MZ vs RL, q=1.18107) (f) and perforin-1 prf1-like (LOC101465185; MZ vs DL, q=3.681019; MZ vs RL, q=0.00034) (g). Liver and muscle methylome profiles in green and purple, respectively (averaged mCG/CG levels [%] in 50bp bins; n=3 biological replicates for liver DL, PG, and MZ; n=2 biological replicates for liver RL, AS, and AC, and muscle DL, RL, and PG). hj Boxplots showing gene expression values (transcript per million) for the genes in (eg). in livers (green) and muscle (pink). n=3 biological replicates for liver DL, MZ, PG; n=2 biological replicates for liver RL and muscle DL, MZ, PG, and RL. Two-sided q values for Wald tests corrected for multiple testing (BenjaminiHochberg FDR) are shown in graphs. Box plots indicate median (middle line), 25th, 75th percentile (box), and 5th and 95th percentile (whiskers) as well as outliers (single points). CGI, CpG islands; Repeats, transposons and repetitive regions.

Next, we checked for the association between liver DMRs and transcriptional changes. Of the 6,797 among-species DMRs that could be assigned to a specific gene (i.e., DMRs within promoters, gene bodies or located 0.54kbp away from a gene; see Methods), 871 were associated with differentially expressed genes, which is greater than expected by chance (Fig.3b; p<4.7105), suggesting that DMRs are significantly associated with liver gene expression. Of these 871 putative functional DMRs (pfDMRs), the majority (42.8%) are localised over gene bodies, hinting at possible intronic cis-regulatory elements or alternative splicing49. The remaining pfDMRs are in intergenic (30.2%) or promoters (27%) (Fig.3c). The majority of pfDMRs contain younger TE sequences, in particular in intronic regions, while only few contain CGIs (Supplementary Fig.10ac). In promoters and intergenic regions, 63% of pfDMR sequences contain TEs (Fig.3c). As methylation levels at cis-regulatory regions may be associated with altered transcription factor (TF) activity22,24,25, we performed TF binding motif enrichment analysis using between-species liver DMRs and found significant enrichment for specific TF recognition binding motifs. Several TF genes known to recognise some of the enriched binding motifs are differentially expressed among the livers of the three cichlid species and have liver-associated functions (Supplementary Fig.10d, e). For example, the gene of the transcription factor hepatocyte nuclear factor 4 alpha (hnf4a), with important functions in lipid homeostasis regulation and in liver-specific gene expression50, is >2.5x-fold downregulated (q9105) in the rock-dwelling algae-eater P. genalutea compared to the pelagic piscivores D. limnothrissa and R. longiceps, possibly in line with adaptation to different diets (Supplementary Fig.10e).

Furthermore, genomic regions containing pfDMRs are also significantly associated in the livers with altered transcription of many other genes involved in hepatic and metabolic oxidation-reduction processes (Fig.3d and Supplementary Fig.10f). These include genes encoding haem-containing cytochrome P450 enzymes (such as cyp3a4, cy7b1; Supplementary Fig.10f), which are important metabolic factors in steroid and fatty acid metabolism, as well as genes encoding other hepatic enzymes involved in energy balance processes. This enrichment is associated with significant methylome divergence among species, in particular in promoter regions and gene bodies (Fig.3d). For example, the gene sulfurtransferase tstd1-like, an enzyme involved in energy balance and the mitochondrial metabolism, is expressed exclusively in the liver of the deep-water pelagic species D. limnothrissa, where it shows ~80% reduced methylation levels in a gene-body DMR compared to all the other species (Fig.3e, h). Another example is the promoter of the enzyme carbonyl reductase [NADPH] 1 (cbr1) which shows significant hypomethylation (2.2kbp-long DMR) in the algae-eaters MZ and PG, associated with up to ~60-fold increased gene expression in their livers compared to the predatory Rhamphochromis and Diplotaxodon (Fig.3f, i). Interestingly, cbr1 is involved in the metabolism of various fatty acids in the liver and has been associated with fatty acid-mediated cellular signalling in response to environmental perturbation51. As a final example, we highlight the cytotoxic effector perforin 1-like (prf1-like), an important player in liver-mediated energy balance and immune functions52. Its promoter is hypermethylated (>88% mCG/CG) exclusively in the liver of the deep-water species DL, while having low methylation levels (~25%) in the four other species (Fig.3g). This gene is not expressed in DL livers but is highly expressed in the livers of the other species that all show low methylationlevels at their promoters (Fig.3j). Taken together, these results suggest that species-specific methylome divergence is associated with transcriptional remodelling of ecologically-relevant genes, which might facilitate phenotypic diversification associated with adaption to different diets.

We further hypothesised that between-species DMRs that are found in both the liver and muscle methylomes could relate to functions associated with early development/embryogenesis. Given that liver is endoderm-derived and muscle mesoderm-derived, such shared multi-tissue DMRs could be involved in processes that find their origins prior to or early in gastrulation. Such DMRs could also have been established early on during embryogenesis and may have core cellular functions. Therefore, we focussed on the three species for which methylome data were available for both tissues (Fig.1c) to explore the overlap between muscle and liver DMRs (Fig.4a). Based on pairwise species comparisons (Supplementary Fig.11a, b), we identified methylome patterns unique to one of the three species. We found that 4048% of these were found in both tissues (multi-tissue DMRs), while 3943% were liver-specific and only 1318% were muscle-specific (Fig.4b).

a Distinct species-specific methylome patterns in Lake Malawi cichlids can be found in liver or muscle tissues, or in both tissues (multi-tissue). b Histograms showing the total counts of species DMRs that are either liver-, muscle-specific or present in both (multi). Only species DMRs showing distinct DNA methylation patterns in one species are shown. c GO enrichment plots for each DMR class. Only GO terms with BenjaminiHochberg FDR-corrected p-values < 0.05 are shown. df Examples of species multi-tissue DMRs in genes related to embryonic and developmental processes. Namely, in the genes coding for visual system homeobox 2 vsx2 (LOC101486458), growth-associated protein 43 gap43 (LOC101472990) and teneurin transmembrane protein 2 tenm2 (LOC101470261). Liver and muscle methylome profiles shownin green and purple, respectively (averaged mCG/CG levels [%] in 50bp bins for 2 samples per tissue per species; scale indicated below each graph).

The relatively high proportion of multi-tissue DMRs suggests there may be extensive among-species divergence in core cellular or metabolic pathways. To investigate this further, we performed GO enrichment analysis. As expected, liver-specific DMRs are particularly enriched for hepatic metabolic functions, while muscle-specific DMRs are significantly associated with muscle-related functions, such as glycogen catabolic pathways (Fig.4c). Multi-tissue DMRs, however, are significantly enriched for genes involved in development and embryonic processes, in particular related to cell differentiation and brain development (Fig.4cf), and show different properties from tissue-specific DMRs. Indeed, in all the three species, multi-tissue DMRs are three times longer on average (median length of multi-tissue DMRs: 726bp; Dunns test, p<0.0001; Supplementary Fig.11c), are significantly enriched for TE sequences (Dunns test, p0.03; Supplementary Fig.11d) and are more often localised in promoter regions (Supplementary Fig.11e) compared to liver and muscle DMRs. Furthermore, multi-tissue species-specific methylome patterns show significant enrichment for specific TF binding motif sequences. These binding motifs are bound by TFs with functions related to embryogenesis and development, such as the transcription factors Forkhead box protein K1 (foxk1) and Forkhead box protein A2 (foxa2), with important roles during liver development53 (Supplementary Fig.11f), possibly facilitating core phenotypic divergence early on during development.

Several examples of multi-tissue DMRs are worth highlighting as generating hypotheses for potential future functional studies (Fig.4df). The visual system homeobox 2 (vsx2) gene in the offshore deep-water species Diplotaxodon limnothrissa is almost devoid of methylation in both liver and muscle, in contrast to the other species (1.9kbp-long DMR; Fig.4d and Supplementary Fig.11g). vsx2 has been reported to play an essential role in the development of the eye and retina in zebrafish with embryonic and postnatal active transcription localised in bipolar cells and retinal progenitor cells54. D. limnothrissa populates the deepest parts of the lake of all cichlid species (down to approximately 250m, close to the limits of oxygenation) and features morphological adaptations to dimly-lit environments, such as larger eye size55. vsx2 may therefore participate in the visual adaptation of Diplotaxodon to the dimmer parts of the lake via DNA methylation-mediated gene regulation during development. Another example of a multi-tissue DMR specific to D. limnothrissa is located in the promoter of the gene coding for the growth-associated protein 43 (gap43) involved in neural development and plasticity, and also neuronal axon regeneration56. The promoter of gap43 is largely devoid of methylation (overall <5% average mCG/CG levels over this 5.2 kbp-long DMR) in both muscle and liver tissues of D. limnothrissa, while being highly methylated (>86% mCG/CG) in the other species (Fig.4e). In A. calliptera, the transcription of gap43 is restricted to the brain and embryo (Supplementary Fig.11h), consistent with a role in neural development and in the adult brain. Finally, another multi-tissue DMR potentially involved in neural embryonic functions is located in the promoter region of the gene tenm2, coding for teneurin transmembrane protein (Fig.4f). tenm2 is a gene expressed early on during zebrafish embryogenesis as well as in cichlid brain and embryo (Supplementary Fig.11i) and is involved in neurodevelopment and neuron migration-related cell signalling57. This 2.7kbp-long DMR is completely unmethylated in the algae-eating rock-dweller Petrotilapia genalutea (almost 80% reduction in methylation levelsoverall compared to the other species) and may mediate species-specific adaptive phenotypic plasticity related to synapse formation and neuronal networks.

View original post here:
Mapping epigenetic divergence in the massive radiation of Lake Malawi cichlid fishes - Nature.com

Related Posts