Integrating genomic and multiomic data for Angelica sinensis provides insights into the evolution and biosynthesis of … – Nature.com

Posted: November 30, 2023 at 8:34 pm

Genome assembly and annotation

The widely cultivated A. sinensis cultivar Qinggui1 was selected for genome sequencing (Fig.1a). We generated a total of 376.4Gb Single Molecule Real-Time (PacBio SMRT) sequences and 60.8Gb paired HiSeq reads (PE150), along with 325.0Gb effective chromosome conformation capture (Hi-C) reads (TableS1). The assembly was initialized by PacBio SMRT sequences, which were corrected with high-quality paired HiSeq reads. A genome size of 2.16Gb was obtained after the final assembly. The Hi-C interaction matrices showed a distinct separation pattern of 11 blocks that could be used to cluster and orient the contigs and anchor them to 11 chromosomes (Fig.1b, Table1 and TableS2). The size of the genome that we assembled was similar to the size estimated by flow cytometry13. Mapping the short reads back to the assembly led to a correction of 29,533 single-base errors and 9426 small indels. The identification of 1,588,740 heterozygous SNPs showed a low level of heterozygosity in this self-fertilized plant. Evaluation by the Benchmarking Universal Single-Copy Orthologs (BUSCO) method19,20 showed >99% completeness of the genome (TableS3). These results confirm a high-quality genome assembly. Please refer to Table1 and Data availability for detailed information on the genome assembly.
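As a rough illustration of why 1,588,740 heterozygous SNPs counts as low heterozygosity, the figure can be expressed as a density over the 2.16Gb assembly. This is a minimal sketch: dividing by the full assembly length, rather than by the number of callable sites, is a simplifying assumption and not necessarily how the authors computed any rate.

```python
# Figures quoted in the text above.
heterozygous_snps = 1_588_740
assembly_size_bp = 2.16e9  # 2.16 Gb final assembly


def heterozygosity_percent(n_het_sites: int, genome_bp: float) -> float:
    """Heterozygous sites as a percentage of assembled bases.

    Assumption: the denominator is the whole assembly, not only callable sites.
    """
    return 100.0 * n_het_sites / genome_bp


rate = heterozygosity_percent(heterozygous_snps, assembly_size_bp)
print(f"Heterozygous SNP density: {rate:.3f}% of assembled bases")
```

Under this assumption the density comes out well below 0.1%, in line with the description of a low level of heterozygosity.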

a Morphology of the sequenced plant. b Hi-C map of chromosomes. c Circular map of the genome with tracks a–e: a–b, SNP and indel density and distribution identified between A. sinensis (GS) and A. sinensis (QH); c, density and distribution of LTR retrotransposons (purple: LTR; blue: Copia-type; dark green: Gypsy-type); d, gene density and distribution; e, collinear gene pairs within the genome. The colors of the linking lines indicate the number of one-to-one gene pairs in the collinearity blocks: 40, green: 20, blue: 10, gray: 5. This figure was prepared using shinyCircos110.

Approximately 80.24% of the assembly (1.66Gb) was identified to be repetitive sequences, which was higher than estimates in another Apiaceae family member, coriander (70.59%) (Fig.1c, TableS4). Long terminal repeats (LTRs), primarily consisting of Gypsy and Copia subtypes, were most abundant. The other repeats were categorized as DNA transposons (3.65%), long interspersed nuclear elements (LINEs; 1.26%), short interspersed nuclear elements (SINEs, 0.03%), and uncharacterized repeats (19.77%) (TableS5).

We predicted a total of 41,040 protein-coding genes (TableS6) using ab initio methods, protein homology, and RNA-seq reads from different tissues. Of them, 98.3% were mapped to the chromosomes, and most were distributed in the terminal regions (Fig.1c). Using the iTAK pipeline21, we predicted 2,996 transcription factor (TF) genes in the A. sinensis genome. The top five TF families were MYB/MYB-related (209), AP2/ERF-ERF (172), bHLH (166), C2H2 (154), and NAC (135). Compared with those in other Apiaceae plants, GeBP, HSF, GARP-G2-like, C2C2-GATA, C2C2-Co-like, HB-WOX, and Trihelix families were expanded whereas C2C2-YABBY, B3-ARF, and GRAS genes dramatically decreased in A. sinensis (Fig.S1). The genome that we assembled in this study included more TF genes in most TF families than that in the published A. sinensis (GS) genome (Fig.S1).

Despite the increasing number of sequenced genomes of medicinal plants, systematic studies of their evolutionary relationships are relatively scarce. To explore the phylogenetic position of A. sinensis in the Apiaceae family and its evolutionary relationships with other species, we selected typical representative families/orders and medicinal plant species of rosids and asterids according to the Angiosperm Phylogeny Group IV (APG IV) classification system22 and constructed a phylogenetic tree using one-to-one homologous gene families. These 20 representative angiosperms included 12 well-known medicinal plant species (TableS7) from 14 families and 12 orders, representing the major botanical taxonomic groups of core eudicots.

Among these species, Vitis vinifera was chosen for its important evolutionary position and its wide use as a model and basal plant in plant evolutionary research23. Arabidopsis thaliana and Solanum lycopersicum are well-studied model eudicot plants24,25. Theobroma cacao and Camellia sinensis are two of the most important beverage crops and are rich in secondary metabolites such as caffeine26,27. C. sinensis is also one of the basal species of asterid plants27. Populus trichocarpa was selected as a model plant for the study of lignin biosynthesis and phenylpropanoid metabolism28; the phenylpropanoid pathway is also one of the most important metabolic pathways in A. sinensis, as it is related to the bioactive metabolites ferulic acid, lignans, and coumarins. Cannabis sativa is a valuable crop and is also the source of the well-known drugs tetrahydrocannabinol (THC) and cannabidiol (CBD)29. Ophiorrhiza pumila, which belongs to the family Rubiaceae, is an important herbaceous medicinal plant that accumulates camptothecin (CPT)30. Scutellaria baicalensis, Salvia miltiorrhiza, Taraxacum mongolicum, Artemisia annua, Lonicera japonica, Panax notoginseng, Panax ginseng, Angelica sinensis, and snapdragon (Antirrhinum majus L.) are widely used as traditional Chinese medicines with thousands of years of history in China. In addition, we included Daucus carota, Apium graveolens, and Coriandrum sativum, which are important members of the Apiaceae family, to examine the evolutionary relationships within the family and the evolutionary status of A. sinensis.

We identified a total of 2133 one-to-one orthologous gene families shared by all the species (Fig.S2). Using these orthologs, we constructed a phylogenetic tree by the concatenation method. As expected, the topology of the tree was consistent with the APG IV classification. In the Apiales order, Araliaceae was grouped with Apiaceae, and Araliaceae was considered to be the ancestral family. Divergence time estimates showed that these two families separated around 58 MYA. Within the Apiaceae family, A. graveolens and D. carota diverged approximately 23 MYA, which is much earlier than the divergence of A. sinensis (QH) and its sister clade C. sativum (12 MYA) (Fig.2a).
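The concatenation method mentioned above joins the per-family alignments end to end so that each species contributes one long sequence to tree inference. The following is a minimal sketch of that joining step only; the function name, the toy sequences, and the species keys are illustrative assumptions, and the actual alignment and tree-building tools used by the authors are not shown.

```python
from typing import Dict, List


def concatenate_alignments(gene_alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments into one supermatrix keyed by species.

    Each dict maps species name -> aligned sequence for one ortholog family.
    Every family must contain the same species, and all sequences within a
    family must have the same aligned length.
    """
    if not gene_alignments:
        return {}
    species = sorted(gene_alignments[0])
    parts: Dict[str, List[str]] = {sp: [] for sp in species}
    for index, alignment in enumerate(gene_alignments):
        if sorted(alignment) != species:
            raise ValueError(f"family {index}: species set differs")
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError(f"family {index}: unequal aligned lengths")
        for sp in species:
            parts[sp].append(alignment[sp])
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Toy example with two hypothetical ortholog families and three species.
toy_families = [
    {"A_sinensis": "MK-L", "C_sativum": "MKAL", "D_carota": "MRAL"},
    {"A_sinensis": "GSV", "C_sativum": "GSI", "D_carota": "G-I"},
]
supermatrix = concatenate_alignments(toy_families)
for name, sequence in supermatrix.items():
    print(name, sequence)
```

Requiring identical species sets mirrors the use of one-to-one orthologs shared by all species, so no padding for missing taxa is needed in this sketch.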

a Molecular phylogenetic tree of 20 representative angiosperm species constructed using 2133 concatenated conserved protein sequences by the ML and BI methods. b Phylogenetic tree of A. sinensis and other Apiaceae species, inferred by estimating divergence time using 3188 single-copy ortholog sequences. P. notoginseng was used as an outgroup. The numbers in green and red colors indicate gene family expansion and contraction compared with the most recent common ancestors, respectively. Estimated divergence times (MYA, million years ago) are indicated at each node. The Venn diagram shows the proportion of gene families under the unchanged (blue), expansion (red) and contraction (green) scenarios. c KEGG pathway enrichment analysis of expanded gene families in the A. sinensis (QH) genome. Only the enriched KEGG pathways with p values<0.05 are displayed. d Distribution of 4DTv distances of syntenic orthologous genes of Apiaceae species. The black arrows mark the WGD events. e The KS distribution for orthologous gene pairs within Apiaceae species. V. vinifera was used as the model organism for evolutionary analysis. The shape of the curve and the position of the peak are almost identical between A. sinensis (QH) and A. sinensis (GS). The highlighted peak regions represent two WGD events.

To further investigate the evolutionary relationships among Apiaceae species, we clustered approximately 91.3% (206,682) of the genes from five Apiaceae species and one outgroup species (P. notoginseng) into 29,108 orthologous groups and extracted 3189 single-copy genes (TableS8). We constructed a phylogenetic tree based on the concatenated sequence alignment of these single-copy gene families (Fig.2b). C. sativum showed the most marked gene expansion. A. sinensis (QH) and A. sinensis (GS) were clustered together and C. sativum was their closest relative. A. sinensis (QH) had more expanded and fewer contracted gene families than A. sinensis (GS) (Fig.2b).

We identified 3698 genes as members of significantly expanded gene families (P<0.01) in A. sinensis (QH) and mapped them to the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways for functional enrichment analysis. We detected 33 significantly enriched pathways (P<0.05), and the top enriched metabolic pathways included Glycosphingolipid biosynthesis, Zeatin biosynthesis, Benzoxazinoid biosynthesis, Oxidative phosphorylation, Sesquiterpenoid and triterpenoid biosynthesis, Biosynthesis of unsaturated fatty acids, Selenocompound metabolism, and Indole alkaloid biosynthesis (Fig.2c and TableS9). Some of the enriched KEGG pathways were involved in plant volatile biosynthesis, such as Sesquiterpenoid and triterpenoid biosynthesis and Phenylpropanoid biosynthesis, which suggested that these genes may contribute to the adaptive phenotypic diversification of A. sinensis species.
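Pathway enrichment of this kind is often assessed with a one-sided hypergeometric test. The text does not say which test the authors used, so the sketch below is an assumption that only illustrates the general idea; the pathway counts in the example are invented for illustration.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided enrichment p-value, P(X >= k).

    N: genes in the background set
    K: background genes annotated to the pathway
    n: genes in the test set (e.g. expanded-family genes)
    k: test-set genes annotated to the pathway
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Small case that can be checked by hand: (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120.
p_small = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
print(f"p = {p_small:.4f}")

# Hypothetical pathway: 120 of the 41,040 predicted genes annotated,
# 25 of them among the 3698 expanded-family genes (counts are illustrative only).
p_example = hypergeom_enrichment_p(k=25, n=3698, K=120, N=41040)
print(f"p = {p_example:.3g}")
```

In practice the p-values for all tested pathways would also be corrected for multiple testing before applying a cutoff such as P<0.05.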

Whole-genome duplications (WGDs) are widely recognized as a major source of species diversification in many eukaryotic lineages based on various lines of evidence31. To identify potential WGD events, we calculated the nucleotide divergence at fourfold synonymous third-codon transversion positions (4dTv) and the synonymous substitution rates (Ks) for collinear gene pairs within each species. In addition to the five members of the Apiaceae family, namely, D. carota, A. graveolens, C. sativum, A. sinensis (GS), and A. sinensis (QH), we also included the model plant V. vinifera in our study.
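To make the 4dTv measure concrete, the sketch below counts transversions at the third position of fourfold-degenerate codons in a pair of aligned coding sequences. It computes only the raw, uncorrected proportion; published analyses usually apply a correction for multiple substitutions, which is not implemented here, and the example sequences are invented.

```python
# First two bases of codon families in which any third base encodes the same
# amino acid under the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(a: str, b: str) -> bool:
    """True when one base is a purine and the other a pyrimidine."""
    return (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)


def raw_4dtv(cds1: str, cds2: str) -> float:
    """Uncorrected transversion rate at fourfold-degenerate third-codon sites."""
    if len(cds1) != len(cds2) or len(cds1) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    sites = 0
    transversions = 0
    for i in range(0, len(cds1), 3):
        codon1 = cds1[i:i + 3].upper()
        codon2 = cds2[i:i + 3].upper()
        # Both codons must belong to the same fourfold-degenerate family.
        if codon1[:2] != codon2[:2] or codon1[:2] not in FOURFOLD_PREFIXES:
            continue
        # Skip gaps and ambiguous bases at the third position.
        if codon1[2] not in "ACGT" or codon2[2] not in "ACGT":
            continue
        sites += 1
        if is_transversion(codon1[2], codon2[2]):
            transversions += 1
    return transversions / sites if sites else float("nan")


# Toy pair: GCT/GCA (transversion), CCA/CCG (transition), GGT/GGT (identical), ATG/ATG (not fourfold).
value = raw_4dtv("GCTCCAGGTATG", "GCACCGGGTATG")
print(f"raw 4DTv = {value:.3f}")
```

Plotting the distribution of such values over all collinear gene pairs is what produces peaks that can be attributed to polyploidy events.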

The intragenomic paralogous genes of the Apiaceae species exhibit three distinct peaks in their 4dTv distributions (Fig.2d). The last peak, shared with V. vinifera, signifies an ancient whole-genome triplication (WGT) event common to all eudicot plants. The first two peaks indicate two recent lineage-specific whole-genome duplication (WGD) events that took place prior to the divergence of the family members within the Apiaceae family. This observation aligns with a previous study which suggested that A. sinensis has undergone three polyploidy events13. By comparing the peak positions across species, we inferred a sequence of WGD events: A. sinensis experienced the most recent event, followed by C. sativum and then A. graveolens. This sequence is consistent with our phylogenetic tree and divergence time estimates.

Ks values of homologous genes from different genomes can be used to estimate the time of species divergence32. In this study, we compared the Ks peak values within each species and identified two distinct peaks at Ks values of about 0.5 and 1.0, corresponding to two WGD events (Fig.2e). The peak positions of A. sinensis (QH) and A. sinensis (GS) were nearly identical (see TableS10 for complete peak values), suggesting similar evolutionary histories for these two varieties. However, the peak at around 1.7 was not evident, likely due to the loss or divergence of ancient duplicate genes following the earliest WGD event. The order of the peak values aligned with the phylogenetic relationships of carrot, celery, coriander, and Angelica. This implies that the WGD events in these species occurred in the order carrot, celery, coriander, and Angelica, which is also consistent with the 4dTv analysis above.

A total of 41,040 high-confidence genes were predicted, which is 2,162 fewer than the 43,202 genes in the published genome annotation. To evaluate the integrity of the gene set, both gene sets were first compared using the same BUSCO version and parameters. Complete BUSCO genes accounted for 96.41% in A. sinensis (QH) but only 88.10% in A. sinensis (GS). Second, common databases, including InterProScan33, Gene Ontology (GO)34, KEGG35, SwissProt36, TrEMBL, KOG, and the NCBI nonredundant protein database, were used to functionally annotate these two gene sets. Approximately 95.76% of the genes were annotated in A. sinensis (QH), while only 90.38% were annotated in A. sinensis (GS). Third, OrthoFinder (v2.5.4)37 was used to cluster these two gene sets for further analysis. The percentage of genes in orthologous groups was 94.9% in A. sinensis (QH), while it was only 82.6% in A. sinensis (GS). The species-specific gene number was 2,111 in A. sinensis (QH) and 7,496 in A. sinensis (GS). In summary, we provide a better reference gene annotation for A. sinensis.

The genomic differences between A. sinensis (QH) and A. sinensis (GS) were investigated. Highly collinear relationships were evident between these two genomes (Fig.3a, b). A large inversion was also observed between homologous chromosomes Chr09 (A. sinensis (QH)) and chr04 (A. sinensis (GS)), which is highlighted by a red arrow in Fig.3a and a red square in Fig.3b. Good collinearity was found in this region between A. sinensis (QH) and A. graveolens, suggesting either that A. sinensis (GS) has an assembly error in this region or that the inversion is an inherent feature of the A. sinensis (GS) genome. Relatively good collinearity was observed at the genome level between A. sinensis and A. graveolens. Furthermore, reciprocal translocations were observed along chromosomes 05 and 07 in A. sinensis (QH), as well as along chromosomes 09, 11, and 10 in A. graveolens (Fig.3b). This pattern was consistent between A. sinensis (GS) and A. graveolens, further confirming the occurrence of translocations between these chromosomes. The collinearities between A. sinensis (QH) and other species in Apiaceae are displayed in Fig.S3.

a Macrosynteny between A. sinensis (QH) and A. sinensis (GS) was verified using MUMmer98 (version 4.0). Each dot represents a homologous block. Blue and green colors indicate different orientations of the sequences, while the red arrow refers to intrachromosomal inversions. The plot was generated using Dot (https://dot.sandbox.bio/). b Genome collinearity analysis among A. sinensis (QH), A. sinensis (GS), and A. graveolens. MCScanX86 was used to identify collinear gene blocks among these three genomes. The red square highlights intrachromosomal inversions between A. sinensis (QH) and A. sinensis (GS). The color of linking lines indicates the number of one-to-one gene pairs in the collinearity blocks: orange (40), green (20), and gray (5). c The genome distribution of genes with strong functional effects between A. sinensis (QH) and A. sinensis (GS). d KEGG pathway enrichment analysis of genes with strong functional effects.

a Changes in metabolites between NG and EF samples. The horizontal axis shows log2-fold changes, and the vertical axis shows log2 absolute content changes. The dot colors represent the different compound classes. Numbers in brackets indicate the number of compounds upregulated in NG and EF samples. b Heatmap of coumarins and lignans, and of terpenoids and phthalides, whose contents differed between the NG and EF groups. The data were normalized by the Z score in rows. The red and blue arrows indicate the upregulated and downregulated metabolites, respectively (VIP ≥ 1 and log2(fold change) ≥ 1 or ≤ −1). c Heatmap showing differential gene expression related to coumarin, lignan and lignin biosynthesis between NG and EF samples in Angelica roots. The red and blue arrows indicate the upregulated and downregulated genes (log2(fold change) ≥ 1 or ≤ −1 and p ≤ 0.05), respectively. Only the genes with FPKM ≥ 5 in at least one sample are shown.

A total of 1.227 million SNPs and 242,250 indels were detected in syntenic blocks between the two A. sinensis genomes. The distributions of SNPs and indels were similar but uneven across the whole genome (Fig.1c). Most of the genetic variations were located in the intergenic regions. Of these variations, 38,862 SNPs and 8887 indels were located in the coding regions, affecting 9,547 and 5,125 genes, respectively. Within coding regions, 909 genetic variations (affecting 686 genes) were annotated as having a strong effect on gene function, with frameshifts or changes at the start or stop codon (Supplementary Data1). These genes were not evenly distributed across the whole genome (Fig.3c) and were enriched in KEGG pathways for the biosynthesis of various secondary metabolites, such as Indole alkaloid, Betalain, Isoquinoline alkaloid, and Sesquiterpenoid and triterpenoid biosynthesis (Fig.3d). The numbers of SNPs and indels were higher on chromosomes 10 and 11 than on other chromosomes (Fig.1c and TableS11).

To understand the biosynthesis of various bioactive components in Angelica roots, we conducted nontargeted metabolomics profiling on normally growing and early-flowering Angelica roots. More than 716 high-confidence metabolites were detected and identified, including 39 flavonoids, 12 terpenoids, 47 alkaloids, 74 phenolic acids, 10 phthalides, 31 coumarins, and 24 lignans (Supplementary Data2). Of these, 299 compounds were determined to be differential metabolites using univariate and multivariate statistical methods with the parameters FC ≥ 2 or ≤ 0.5 and VIP (variable importance in projection) ≥ 1, including 145 upregulated and 154 downregulated metabolites.
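The selection rule for differential metabolites can be sketched as a simple filter, assuming the thresholds are a fold change of at least 2 or at most 0.5 together with a VIP of at least 1. The metabolite names and values below are invented for illustration, and the direction of the fold change (which group is the numerator) is an assumption left to the caller.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metabolite:
    name: str
    fold_change: float  # ratio of mean abundance between the two groups (direction assumed)
    vip: float          # variable importance in projection from the multivariate model


def classify(m: Metabolite, fc_cutoff: float = 2.0, vip_cutoff: float = 1.0) -> str:
    """Label a metabolite as 'up', 'down', or 'unchanged' under the assumed thresholds."""
    if m.vip < vip_cutoff:
        return "unchanged"
    if m.fold_change >= fc_cutoff:
        return "up"
    if m.fold_change <= 1.0 / fc_cutoff:
        return "down"
    return "unchanged"


# Hypothetical measurements, not taken from the study.
measurements: List[Metabolite] = [
    Metabolite("compound_A", fold_change=3.1, vip=1.4),
    Metabolite("compound_B", fold_change=0.4, vip=1.2),
    Metabolite("compound_C", fold_change=2.5, vip=0.7),
    Metabolite("compound_D", fold_change=1.3, vip=1.8),
]
labels = {m.name: classify(m) for m in measurements}
print(labels)
```

Requiring both criteria means a large fold change alone (compound_C) or a high VIP alone (compound_D) is not enough to call a metabolite differential.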

The metabolite classes showed markedly different accumulation patterns in Angelica roots between NG (normal growth) and EF (early flowering and bolting) samples. NG roots were rich in organic acids, amino acids and derivatives, saccharides and alcohols, and nucleotides and derivatives, while EF roots were rich in phenolic acids, LPC, LPE, coumarins, lignans, flavonols, and flavonoids (Fig.4a). In particular, some phthalides and coumarins accumulated to higher levels in NG roots than in EF roots, whereas most lignans accumulated at higher levels in EF roots than in NG roots (Fig.4b). This suggests that NG roots have higher medicinal value than EF roots, since these phthalides and coumarins have displayed more important bioactivities in experimental and clinical studies.

Transcriptome analyses of these Angelica roots under different developmental conditions also unveiled the differentially expressed metabolic genes in their biosynthesis pathways in line with metabolomics data (Fig.4c). The metabolic genes putatively involved in the biosynthesis of lignans and coumarins, both of which are derived from the phenylpropanoid pathway that often leads to the biosynthesis of well-known lignin and flavonoids, were upregulated in EF roots compared with NG roots (Fig.4c). In contrast, most genes putatively involved in phthalide and coumarin biosynthesis were expressed at higher levels in NG roots than in EF roots, consistent with their higher pharmaceutical values (Fig.4c).

Although the shared metabolic enzymes and pathways involving lignin, coumarins, lignans, and flavonoids are well known, the specific genes/enzymes involved in the production of many coumarins and lignans are poorly understood13,38,39. This new Angelica genome assembly provided more than 100 metabolic genes that encode all known enzyme homologs involved in the biosynthesis of coumarins and lignans (Supplementary Data3). The phenylpropanoid pathway genes, including phenylalanine ammonia lyase (PAL), cinnamate 4-hydroxylase (C4H), 4-coumaroyl-CoA ligase (4CL), hydroxycinnamoyl-CoA shikimate/quinate hydroxycinnamoyltransferase (HCT), caffeic acid O-methyltransferase (COMT), and caffeoyl-CoA O-methyltransferase (CCoAOMT), were all assembled and annotated in our genome. These genes contribute to lignin biosynthesis via HCT and CCR, to lignan biosynthesis via dirigent protein (DIR), to flavonoid biosynthesis via CHS, and to coumarin biosynthesis from different products of 4CL via cinnamic acid 2-hydroxylase (C2H), p-coumarate 3-hydroxylase (C3H) with HCT, or feruloyl-CoA hydroxylase (F6H), providing insights into the biosynthesis of various pharmaceutically important products (Fig.5a). Lignans have unique antitumor activities and reduce lifestyle-related diseases40. Lignans were also enriched in Angelica roots, particularly in EF roots, in which the contents of lignans and their derivatives and a subset of the biosynthesis genes were upregulated. These genes included dirigent protein (DIR), pinoresinol-lariciresinol reductase (PLR), and secoisolariciresinol dehydrogenase (SIRD), which act in the biosynthesis of the pinoresinol, lariciresinol, secoisolariciresinol, and matairesinol aglycones, as well as the UGT71/74 glycosyltransferases that produce their glycosides40 (Fig.5a).

a Putative biosynthesis pathways of coumarins, lignin, lignans and flavonoids. The numbers in parentheses indicate the number of genes. Different background colors represent the synthetic pathways of different products. The PT genes are highlighted in red. The genes in different gene families are listed in Supplementary Data3. b Unrooted phylogenetic tree of PT genes. The tree shows the grouping of PT genes according to the type of substrate (a–h). The orthologous genes in A. sinensis (QH) and A. sinensis (GS) are highlighted. The genes in the c and d subtrees had relatively high expression levels.

Prenyltransferases (PTs) catalyze the prenylation of umbelliferone in linear and/or angular furanocoumarin biosynthesis34,35. PTs are involved in the biosynthesis of chlorophyll, vitamin E, heme, phylloquinone, and various secondary metabolites through prenyl modification of chlorophyllide a/b, vitamin E, heme B, and many other metabolites, such as 1,4-dihydroxy-2-naphthoic acid, p-hydroxybenzoic acid, flavonoids, phloroglucinol, homogentisate, and coumarins, with different prenyl donors, such as isoprenyl diphosphate, dimethylallyl diphosphate, and geranyl diphosphate (Fig.5b). Despite the divergent functions of these PTs, those involved in coumarin biosynthesis most likely arose via convergent evolution, since coumarins mainly occur in a few unrelated plant families, such as Fabaceae, Moraceae, Apiaceae and Rutaceae34,35. This finding is also supported by a previous study19, which showed independent evolution of coumarin biosynthesis-related PTs in these families. Furthermore, the PTs that catalyze linear (demethylsuberosin, e.g., PsPT1 and PcPT1) and angular (osthenol, e.g., PsPT2) furanocoumarin biosynthesis cluster together in one clade for Apiaceae species (Fig.5b), likely as a result of gene duplications followed by neofunctionalization and positive selection38,41.

As two major pharmaceutically important components in Angelica roots, ligustilide and butylidenephthalide are generally regarded as essential contributors to the main medical functions of Angelica roots42,43,44,45. However, their biosynthesis pathways remain elusive. The oxidation or transfer of isoprenoids or condensation of malonyl CoAs with other acyl CoAs by type III polyketide synthases (PKSs) or their combinations could be involved in the biosynthesis of these phthalides46,47. We therefore examined the A. sinensis genome together with transcriptome and metabolite profiling for the biosynthesis of ligustilide and butylidenephthalide and other monoterpene volatiles that contribute to the medicinal functions of Angelica roots.

To profile the bioactive components in Angelica roots more clearly, volatile terpenoids and phthalides were examined using headspace solid-phase microextraction-gas chromatography-mass spectrometry (SPME-GC-MS). The volatiles of early-flowering (EF) and normally growing (NG) roots showed notable differences. Z-ligustilide, Z-butylidenephthalide, and their E-type isomers were the major components in NG roots (~47% of total volatiles), whereas EF roots contained fewer phthalides (34% of total volatiles) and much less abundant monoterpenes, such as pinene and E-farnesene (Figs.6a, b). These data indicate that early bolting and flowering also negatively impacted volatile accumulation in Angelica roots.

a Headspace solid-phase microextraction-gas chromatography-mass spectrometry (SPME-GC-MS) analysis of the contents and composition of volatiles in Angelica roots from early-flowering (EF) and normally growing (NG) plants. b Differential content analysis of the volatiles in Angelica roots between EF and NG plants. c Enzymatic reactions in the mevalonate (MVA) and methylerythritol phosphate (MEP) pathways in plants and synthesis of short-chain prenyl diphosphates. The MVA pathway is shown in light red; the MEP pathway is shown in light green. Abbreviations and full names are given in TableS16. Data are expressed as the means ± SDs from at least three independent experiments with triplicates. Differences between NG and EF samples are considered significant when **P<0.01 and *P<0.05 in Student's t test.

Genome analyses revealed that three key gene families involved in the MEP pathway toward monoterpene synthesis, MCT, HDS, and HDR, were expanded in the A. sinensis genome in comparison with the Arabidopsis and grapevine genomes (Supplementary Data4). A. sinensis genome sequences revealed an extremely enhanced monoterpene pathway during the evolution of several genera in the Apiaceae family (Supplementary Data4), which is consistent with the diverse and enriched monoterpene volatile profiles in these plants (Fig.6a).

Transcriptome data showed that genes involved in glycolysis and the pentose phosphate pathway were downregulated in EF Angelica roots, which also negatively affected the mevalonate (MVA) pathway and the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway, both of which lead to the biosynthesis of mono- and sesquiterpenoids (Fig.6c). The DXS, MDS, CMK, and HDR genes involved in the plastidial MEP pathway, as well as one IPPI and two GPPS genes for monoterpenoid biosynthesis, were significantly downregulated in EF Angelica roots compared with NG Angelica roots (Fig.6c).

A. sinensis is a triennial medicinal plant that typically flowers in its third year but can flower early, in May of its second year (Fig.7a). Angelica roots contain a wide range of terpenoid volatiles at abundant levels, and these volatiles are also regarded as major components contributing to clinical functions48. Terpenoid synthase (TPS) family genes play key catalytic roles in plant terpenoid biosynthesis. A total of 28 putative TPS genes in the A. sinensis genome belonging to five TPS subfamilies (TPS-a, TPS-b, TPS-c, TPS-e/f, and TPS-g) were identified (Fig.7b). The TPS-b subfamily was expanded in both A. sinensis (15) and C. sativum (20), and the expansion of TPS-b genes in the A. sinensis genome was mainly due to tandem duplication (Ks<0.1). There were 5 more TPS genes in A. sinensis (QH) than in A. sinensis (GS), which suggests that the A. sinensis (QH) annotation is more complete than that of A. sinensis (GS). We detected 8 TPS genes that were expressed in Angelica roots (FPKM ≥ 1 in at least one sample), and most of them had higher expression levels in NG roots than in EF roots (Fig.7b).
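The expression criterion can be illustrated with a short sketch, assuming the cutoff is an FPKM of at least 1 in at least one sample. FPKM here follows the standard definition (fragments per kilobase of transcript per million mapped fragments); the gene identifiers, counts, and lengths are hypothetical and not taken from the study.

```python
from typing import Dict, List


def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def expressed_genes(table: Dict[str, List[float]], cutoff: float = 1.0) -> List[str]:
    """Return genes whose FPKM reaches the cutoff in at least one sample."""
    return sorted(gene for gene, values in table.items() if max(values) >= cutoff)


# 500 fragments on a 2000 bp gene in a library of 20 million mapped fragments.
example_value = fpkm(500, 2000, 20_000_000)
print(f"FPKM = {example_value}")

# Hypothetical FPKM values across four root samples (e.g. two NG, two EF).
fpkm_table = {
    "tps_gene_1": [12.5, 10.1, 3.2, 2.8],
    "tps_gene_2": [0.2, 0.0, 0.4, 0.1],
    "tps_gene_3": [0.6, 1.0, 0.3, 0.2],
}
print(expressed_genes(fpkm_table))
```

A gene such as tps_gene_3 passes because a single sample reaches the cutoff, which is what "in at least one sample" implies.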

a Plants were sown simultaneously and grown in the same environment. Samples were taken at the same time for observation and analysis. EF early flowering, NG normal growth. We highlight the highly lignified Angelica root of the EF plant and the normally developed storage root of the NG plant on the right side. b Five TPS subfamilies (TPS-a, TPS-b, TPS-c, TPS-e/f, and TPS-g) were clearly identified. The genes from A. sinensis (QH) and A. sinensis (GS) are highlighted by red and green dots, respectively. The heatmap of gene expression is illustrated.

To further verify the possibility that PKSs are involved in the biosynthesis of the polyketide derivatives ligustilide and butylidenephthalide in A. sinensis, we analyzed genes that are involved in the biosynthesis of acetyl-CoA and malonyl CoA, which are used as substrates for type II and III PKSs for the production of polyketides (Fig.8a)46,47. Acetyl-CoA carboxylase (ACC) is the main enzyme catalyzing the conversion of glycolysis pathway-derived acetyl-CoA into malonyl CoA, which is a key intermediate for fatty acid, polyketide, and flavonoid biosynthesis47. Plant ACC is composed of two subunits, the biotin carboxylase and carboxyl transferase subunits47. The coding genes for two ACC subunits, BCCP2 (CAC1) (4) and CAC2-CAC3 (5), were expanded in the A. sinensis genome in comparison with the Arabidopsis and grapevine genomes, respectively (TableS12). Consistent with lower Z-ligustilide and Z-butylidenephthalide levels in EF Angelica roots, at least two ACC subunit genes were downregulated in EF roots compared with NG roots (Fig.8b).

a The malonyl-CoA biosynthesis metabolic pathway. b Heatmap displaying the expression of typical ACC genes in Angelica roots between EF and NG plants. c The overall expression (FPKM) of ACC and PKS genes in Angelica roots between EF and NG plants. d Phylogeny of polyketide synthase genes (PKSs). The heatmap displays the gene expression in Angelica roots between EF and NG plants. The color of gene IDs shows the source of different species: red: A. sinensis; blue: A. thaliana; black: seed sequences. The red stars highlight the upregulated genes, and the blue stars highlight the downregulated genes.

PKSs constitute a large gene family encoding multifunctional enzymes that catalyze the condensation of malonyl CoAs, or of malonyl CoA with other acyl CoAs, to generate diverse polyketides46,47. In particular, type III PKS (TKS) catalyzes linear tetraketide-CoA synthesis with hexanoyl-CoA and malonyl CoA and might provide a backbone for Z-ligustilide and Z-butylidenephthalide biosynthesis49. A previous study showed that olivetolic acid cyclase (OAC) catalyzed a C2–C7 intramolecular aldol condensation with carboxylate retention in the linear tetraketide-CoA to form olivetolic acid in Cannabis sativa49. Olivetolic acid is structurally similar to Z-ligustilide and Z-butylidenephthalide, differing only in the position of the olefinic link and hydroxyl group49. A multifunctional protein (MFP) could handle the switch of olefinic links and hydroxyl groups in the lipid metabolism process50. It has thus been proposed that Z-ligustilide and Z-butylidenephthalide are synthesized via a similar mechanism through the PKS pathway, although the exact enzyme or gene responsible for their biosynthesis remains unknown. In the A. sinensis genome, PKSs also formed a large gene family of 120 members, among which the type III PKS genes are expanded (TableS13 and Fig.8d).

Transcriptome analyses showed that four PKS genes, namely, As05G08873, As11G04238, As10G03800, and As08G02849, were highly expressed in Angelica roots (Fig.S4), and in particular, we also found that some of the PKS genes were repressed in EF Angelica roots as compared with NG roots (Fig.8d), indicating that these PKSs might be involved in the biosynthesis of phthalides. The overall expression of ACC and PKS genes in Angelica roots was lower in EF plants (Fig.8c). Further studies with isotope-labeled substrates in tracer experiments, together with enzyme and molecular approaches, are needed to unveil the mechanism underlying the biosynthesis of Z-ligustilide and Z-butylidenephthalide in A. sinensis.
