Category Archives: Transhuman News

"Ground-Breaking" Release of World’s Largest Whole Genome Resource – Inside Precision Medicine

Posted: November 30, 2023 at 8:35 pm

Entire genome sequences for nearly half a million people have been released by the UK Biobank, representing the largest dataset of its kind in the world.

The resource has the potential to offer new insights into the causes of major common diseases and guide the choice of potential therapeutic targets.

It has been hailed as a step change in genomics and is available to approved researchers around the world through the UK Biobank Research Analysis Platform.

"This is a veritable treasure trove for approved scientists undertaking health research, and I expect it to have transformative results for diagnoses, treatments and cures around the globe," said UK Biobank principal investigator Sir Rory Collins, PhD.

John Reed, PhD, executive vice president for innovative medicine research and development at industry partner Johnson & Johnson, maintained that the findings could pave the way for more efficient clinical development and drive progress towards precision medicine.

"This landmark dataset will enable us to leverage the power of artificial intelligence and machine learning for rapidly identifying novel disease targets and helping researchers predict how a candidate medicine might impact certain subpopulations of patients, based on their genetics," he said.

The UK Biobank whole genome sequencing (WGS) consortium was formed in 2018 with the goal of sequencing the genomes of all UK Biobank participants.

The five-year project cost £200m, involved 11 partners and took 350,000 hours of sequencing time to create 27.5 petabytes of genetic data. At its peak, over 20,000 whole genomes, each with around three billion base pairs of DNA, were being sequenced each month. It resulted in the genomes of 491,554 UK Biobank volunteers being sequenced overall.

Half the funding came from the UK government and the Wellcome research organisation. The remaining £100m was given by the biopharmaceutical and healthcare companies Amgen, AstraZeneca, GlaxoSmithKline, and Johnson & Johnson.

In return for their £25m investment, each of the four companies received a nine-month head start with the data before its public release.

UK Biobank, a large-scale biomedical database and research resource, follows the health of half a million volunteers recruited in 2006 and has already provided numerous clinical insights.

Data collected on over 10,000 variables, including blood pressure, cognitive function, diet and bone density, have been studied to examine why people with the same genetic predisposition to a disease can have different outcomes, and different reactions and side-effects to identical treatments.

It has led to thousands of scientific studies being published, and to major insights such as the discovery that Type 1 diabetes is as common in adults as in children.

David Rees, PhD, executive vice president of research and development at Amgen, said: "This ground-breaking dataset allows scientists to explore how genetics affect levels of proteins, metabolites and other physiological factors, more closely than ever before, promising to accelerate our understanding of the genetic underpinnings of disease."

Professor Dame Ottoline Leyser, PhD, chief executive of UK Research and Innovation (UKRI), noted: "Researchers can now apply to access de-identified full genome data from half a million participants, alongside a rich combination of medical, biochemical, lifestyle and environmental data from volunteers involved.

"Today marks an important milestone in UKRI's commitment to realise the potential of genetics for biomedical research, innovation and translation to the clinic."

Posted in Genome | Comments Off

Pangenome analysis reveals genomic variations associated with domestication traits in broomcorn millet – Nature.com

Posted: at 8:35 pm

Posted in Genome | Comments Off

Global genetic diversity, introgression, and evolutionary adaptation of indicine cattle revealed by whole genome … – Nature.com

Posted: at 8:35 pm

Posted in Genome | Comments Off

Genome characteristics of atypical porcine pestivirus from abortion cases in Shandong Province, China – Virology Journal

Posted: at 8:35 pm

Viral metagenomic analysis

The number of clean reads was 21,157,543 for the RNA sample and 26,789,502 for the DNA sample. For RNA, the data were assembled to a total sequence length of 2,337,534 with 60.92% GC content. The largest contig, 11,556 nt in length, was identified as APPV (Table 1) and named APPV-SDHY-2022 for further analysis in this study. For DNA, the data were assembled to a total sequence length of 38,447,346 with 41.71% GC content. Other viruses, including Getah virus, porcine picobirnavirus, porcine kobuvirus, porcine sapovirus, Po-Circo-like virus, porcine serum-associated circular virus, porcine bocavirus 1, porcine parvovirus 1, porcine parvovirus 5 and porcine circovirus 3, were also identified by sequence alignment (Table 1); however, most contigs of these viruses were shorter than 500 bp (see Additional file 2: Table s2 & Table s3). No other known abortion-related pathogens (PRRSV, PPV2-4/68, CSFV, PCV2 and Japanese encephalitis virus) were detected in the sequencing data.

APPV presence was confirmed in the pooled sample by RT-PCR amplification targeting the NS3 gene (see Additional file 3: Fig. s1A). The assembled sequence of the PCR products was identical to that of APPV-SDHY-2022 (see Additional file 3: Fig. s1B). This provided additional evidence of APPV presence in the abortion cases.

The genome of strain APPV-SDHY-2022 (GenBank accession no. OP381297) contains 11,556 nucleotides (nt) and consists of a 5′ UTR (370 nt, positions 1 to 370), a CDS (10,909 nt, positions 371 to 11,279), and a 3′ UTR (277 nt, positions 11,280 to 11,556). The nucleotide and amino acid sequences of the individual proteins of the strains were aligned separately, and the homology between APPV-SDHY-2022 and the reference strains was determined (Table 2). Sequence alignment based on the APPV polyprotein CDS showed that the nucleotide identities of APPV-SDHY-2022 with Clade I, Clade II, and Clade III strains were 82.6-84.2%, 93.2-93.6%, and 80.7-85%, respectively, while the amino acid identities were 91.4-92.4%, 96.4-97.7%, and 90.6-92.2%, respectively. APPV-SDHY-2022 shared the highest nucleotide identity (93.6%) with APPV-China/GD-SHM/2016, and the highest amino acid identity (97.7%) with GD-YJHSEY2N. Among the 12 mature proteins, NS5A showed the lowest homology (77.6-93.3% at the nt level) with the reference strains.
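The reported genome layout can be checked with a few lines of arithmetic. The following is only a minimal sketch: the coordinates are copied from the paragraph above and treated as 1-based, inclusive intervals, and the function names are illustrative rather than taken from the study.

```python
# Coordinates as reported for APPV-SDHY-2022 (1-based, inclusive).
GENOME_LENGTH = 11_556
REGIONS = {
    "5' UTR": (1, 370),
    "CDS": (371, 11_279),
    "3' UTR": (11_280, 11_556),
}


def region_length(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def check_layout(regions, genome_length):
    """Check that the regions tile the genome without gaps or overlaps.

    Returns a dict mapping each region name to its length.
    """
    expected_start = 1
    lengths = {}
    for name, (start, end) in regions.items():
        if start != expected_start:
            raise ValueError(f"{name} starts at {start}, expected {expected_start}")
        lengths[name] = region_length(start, end)
        expected_start = end + 1
    if expected_start - 1 != genome_length:
        raise ValueError("regions do not cover the whole genome")
    return lengths


lengths = check_layout(REGIONS, GENOME_LENGTH)
for name, length in lengths.items():
    print(f"{name}: {length} nt")
```

Under these assumptions the three lengths come out as the 370, 10,909 and 277 nt quoted in the text, and they sum to the 11,556 nt genome.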

Phylogenetic analysis was performed based on the complete polyprotein CDS and the NS5A nucleotide sequences. The results showed that APPV-SDHY-2022 belongs to a separate branch of Clade II (Fig. 2A). Moreover, the results revealed that the homology of NS5A nucleotide sequences was above 94.6% for the same isoform, 84.7-94.5% for different isoforms of the same clade and 76.8-81.1% for different clades (Table 3). Therefore, we propose that Clade II strains can be further divided into three subclades and that APPV-SDHY-2022 belongs to subclade 2.3. APPV-China/GD-SD/2016 and APPV-China/GZ01/2016 belong to subclade 2.2, and the other Chinese strains in the Clade II cluster belong to subclade 2.1 (Fig. 2B). Since Clade II strains have been found only in China, this typing method can help us better analyze the evolution of Clade II strains.
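The NS5A identity ranges quoted above can be read as a simple typing rule. The sketch below is an illustration, not the authors' pipeline: `percent_identity` is a hypothetical helper that compares two pre-aligned sequences and skips gapped columns (an assumption; alignment software may count gaps differently), and values that fall between the reported ranges are returned as "indeterminate".

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped; this gap
    handling is an assumption made for the sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def classify_ns5a_identity(identity_pct):
    """Map an NS5A nucleotide identity (%) onto the ranges reported in the text."""
    if identity_pct > 94.6:
        return "same subclade"
    if 84.7 <= identity_pct <= 94.5:
        return "same clade, different subclade"
    if 76.8 <= identity_pct <= 81.1:
        return "different clades"
    return "indeterminate"  # falls between the reported ranges


# Toy aligned fragments, not real APPV sequences.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(identity, classify_ns5a_identity(identity))
```

The explicit "indeterminate" branch is a design choice: the reported ranges leave gaps (for example between 81.1% and 84.7%), and a sketch should not guess a call there.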

Fig. 2. Phylogenetic analysis of Chinese APPV strains. Phylogenetic trees based on the nucleotide sequences of the complete polyprotein CDS (A) and the NS5A gene (B) were constructed by the neighbor-joining (NJ) method with 1,000 bootstrap replicates in MEGA11 software. The APPV-SDHY-2022 strain reported in this study is indicated with a red dot.

To further explore the genetic evolution of APPV, potential recombination events were identified using Recombination Detection Program version 4 (RDP4) and then examined using SimPlot version 3.5.1. Among all available APPV strains, 8 strains (GD-DH01-2018, GD-BZ01-2018, JX-JM01-2018A01, GD2, GD-HJ-2017.04, GD-LN-2017.04, GD-CT4, and GD-MH01-2018) had potential genetic recombination events. Although RDP4 analysis of the NGS-derived APPV-SDHY-2022 sequence indicated a potential recombination event involving JX-JM01-2018A01 and GD-HJ-2017.04 (see Additional file 4: Table s4), SimPlot revealed no obvious genetic recombination in APPV-SDHY-2022 (Fig. 3).

Fig. 3. Recombination analysis of the complete genome of the APPV-SDHY-2022 strain from Shandong Province. Potential recombination events were identified using Recombination Detection Program 4 (RDP4) and then examined using similarity plots and bootstrap analysis in SimPlot 3.5.1. The major and minor parents were JX-JM01-2018A01 and GD-HJ-2017.04, respectively.

Amino acid sequences of the individual viral proteins of all the Chinese APPV strains were analyzed. No amino acid insertions or deletions were found in the APPV-SDHY-2022 strain. The amino acid sequences of the individual proteins were compared to identify residues that differentiate Clade II from Clade I and Clade III, and 20 unique amino acids were found in Clade II strains (Fig. 4). Most of these sites were located in NS5A (7H, 16A, 69Q, 131Q, 152M, 189I, 280A, 397F, 437A) and NS5B (77V, 139P, 193P, 231K, 274A), and the remaining sites were in Npro (85D, 120E), C (90K), Erns (91K, 139Y) and NS3 (30T). Interestingly, the amino acids at these sites were identical between Clade I and Clade III strains, indicating that it is possible to determine the type of a strain from these specific amino acids alone.
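The 20 Clade II marker residues listed above lend themselves to a simple lookup. The following is a rough sketch under stated assumptions: positions are taken to be 1-based within each mature protein, the input is a hypothetical dict of protein name to amino acid sequence, and the toy sequences built here are placeholders rather than real APPV proteins.

```python
# Clade II marker residues as listed in the text (position -> amino acid).
# Positions are assumed to be 1-based within each mature protein.
CLADE_II_MARKERS = {
    "Npro": {85: "D", 120: "E"},
    "C": {90: "K"},
    "Erns": {91: "K", 139: "Y"},
    "NS3": {30: "T"},
    "NS5A": {7: "H", 16: "A", 69: "Q", 131: "Q", 152: "M",
             189: "I", 280: "A", 397: "F", 437: "A"},
    "NS5B": {77: "V", 139: "P", 193: "P", 231: "K", 274: "A"},
}


def count_marker_matches(proteins):
    """Return (matched, total) marker sites for a dict of protein sequences."""
    matched = 0
    total = 0
    for protein, sites in CLADE_II_MARKERS.items():
        sequence = proteins.get(protein, "")
        for position, residue in sites.items():
            total += 1
            if position <= len(sequence) and sequence[position - 1] == residue:
                matched += 1
    return matched, total


def build_toy_protein(length, sites, filler="G"):
    """Build a placeholder sequence carrying the given residues (illustration only)."""
    residues = [filler] * length
    for position, residue in sites.items():
        residues[position - 1] = residue
    return "".join(residues)


toy_clade_ii = {name: build_toy_protein(500, sites)
                for name, sites in CLADE_II_MARKERS.items()}
toy_other = {name: "G" * 500 for name in CLADE_II_MARKERS}

print(count_marker_matches(toy_clade_ii))
print(count_marker_matches(toy_other))
```

Reporting a count rather than a yes/no call leaves room for strains that carry only some of the marker residues.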

Fig. 4. The unique amino acids found in Clade II APPV strains. Amino acid sequences of viral proteins were aligned with reference strains using MEGA11 and BioEdit software.

In this study, putative N-glycosylation sites in the three important glycoproteins, Erns, E1, and E2, in Chinese APPV strains were also predicted. APPV-SDHY-2022, along with most of the strains in Clade II, is heavily glycosylated, with a total of ten N-glycosylation sites (N104 in the E1 protein; N12, N26, N43, N64, and N99 in the Erns protein; N51, N64, N103, and N127 in the E2 protein) (Fig. 5). All the Chinese APPV strains had a conserved putative N-glycosylation site at N104 with a consensus N-I-T motif in the E1 protein. The putative N-glycosylation sites in the Erns and E2 proteins differed greatly among strains in different subclades, and 9 patterns of putative N-glycosylation sites were observed in E2 proteins: N51+N64+N103, N64+N103, N51+N64+N103+N141, N51+N64+N103+N127+N141, N51+N64+N103+N127, N64+N103+N127, N51+N127, N51+N64, and N64 (Fig. 5). Among the N-glycosylation sites of E2 proteins, the putative site at N64 was highly conserved.
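Putative N-glycosylation sites such as the N-I-T motif mentioned above are commonly located by scanning for the N-X-S/T sequon, where X is any residue except proline. The sketch below shows only that basic scan; it is an assumption that it approximates the (unnamed) prediction algorithm used in the study, and dedicated predictors apply additional criteria. The example sequence is made up.

```python
import re

# N-X-S/T sequon, X != P. The lookahead lets overlapping sequons be reported.
SEQUON_PATTERN = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein_sequence):
    """Return (1-based position of N, motif) for every N-X-S/T sequon."""
    sequence = protein_sequence.upper()
    return [(match.start() + 1, match.group(1))
            for match in SEQUON_PATTERN.finditer(sequence)]


# Toy sequence for illustration, not an APPV protein.
example = "MKNITAAGNPSLLNGSQ"
for position, motif in find_sequons(example):
    print(f"N{position}: {motif}")
```

Sites are reported in the same "N + position" style used in the text, so the output of a scan over a real E2 sequence could be compared directly with patterns such as N51+N64+N103.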

Fig. 5. Putative N-glycosylation sites of Erns, E1 and E2 proteins. The putative N-glycosylation sites within the Erns, E1 and E2 sequences of Chinese APPV strains were predicted according to a glycosylation analysis algorithm and are shown as blue shaded boxes.

To analyze the effect of glycosylation sites on the antigenicity of the E2 protein, the antigenic index was determined by the Jameson-Wolf method. The results showed that amino acid positions 1–9, 15–28, 34–44, 49–55, 62–82, 118–130, 136–158, 174–184, 188–196 and 200–205 of the E2 protein were the potential immunodominant regions. A comparison of the antigenic index between Chinese strains with and without a specific putative site showed that the putative N-glycosylation site at N51 had a negative effect on the antigenicity of the corresponding region (Fig. 6).

Fig. 6. Antigenicity prediction for the E2 protein. The Jameson-Wolf algorithm, which combines secondary structure information with backbone flexibility to predict surface accessibility, was used to determine the predicted antigenic index, with a threshold value of 1.7. The putative N-glycosylation sites within the E2 sequences of Chinese APPV strains are shown as blue arrows. Representative strains from different clades/subclades or patterns of putative N-glycosylation sites were included, and the strains in each subclade with different patterns of putative N-glycosylation sites are underlined.

To further analyze the effect of glycosylation sites on conformational epitopes of the E2 protein, BepiPred-3.0 was used to predict B-cell conformational epitopes. The results showed that the 15 most likely B-cell conformational epitope residues varied among different clades/subclades or patterns of N-glycosylation sites, and 39E, 70R, 173R, 190K, and 191N were conserved residues among all Chinese strains (Table 4) (see also the graphical representations of the predicted epitopes in Fig. 7).
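Selecting the most likely epitope residues from per-residue scores can be sketched as a threshold-and-rank step. This is an illustration only: the score list is invented, the threshold of 0.1512 is the BepiPred-3.0 default quoted in the Fig. 7 caption, and combining the threshold with a top-15 cut is an assumption about how the "15 most likely" residues were chosen.

```python
def top_epitope_residues(scores, threshold=0.1512, top_n=15):
    """Return up to top_n (1-based position, score) pairs scoring above the threshold.

    Results are ordered from highest to lowest score; ties keep sequence order.
    """
    above = [(position, score)
             for position, score in enumerate(scores, start=1)
             if score > threshold]
    above.sort(key=lambda item: item[1], reverse=True)
    return above[:top_n]


# Made-up per-residue scores for a short peptide, for illustration only.
toy_scores = [0.10, 0.20, 0.30, 0.15, 0.25]
print(top_epitope_residues(toy_scores, top_n=2))
```

Comparing the positions returned for strains from different subclades would then show which residues, such as those named in the text, are shared across all of them.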

Conformational B-cell epitope prediction for the E2 protein. The potential B-cell conformational epitopes of the E2 protein in APPV Chinese strains were predicted by BepiPred-3.0, and the residues with scores above the threshold (default value is 0.1512) are predicted to be part of an epitope and colored in yellow on the graph (where Y-axes depict BepiPred-3.0 epitope scores and X-axes protein sequence positions). Shown is the graphical output of B-cell discontinuous epitope predictions for the E2 protein with APPV-SDHY-2022 as an example

Read the original here:
Genome characteristics of atypical porcine pestivirus from abortion cases in Shandong Province, China - Virology Journal - Virology Journal

Posted in Genome | Comments Off

Correcting modification-mediated errors in nanopore sequencing by nucleotide demodification and reference-based … – Nature.com

Posted: at 8:35 pm

Unusual low-quality ONT genomes due to extensive modifications

We sequenced 12 microbial strains of Listeria monocytogenes using Illumina and ONT R9.4 flowcells (~200–990 Mbp, SUP model) (Fig. 1a, Supplementary Tables 1 and 2). The ONT reads were assembled into genomes, and the remaining sequencing errors were polished by Medaka and Homopolish (Supplementary Table 3, see Methods). The Illumina and ONT reads were hybrid assembled for evaluation purposes (Supplementary Table 4). When compared with the Illumina/ONT hybrid assemblies (Fig. 1b), seven ONT-only genomes exhibited high quality (HQ), ranging from Q47 to Q60 (e.g., R19-2905 and R20-0088). However, five isolates (R20-0026, R20-0030, R20-0127, R20-0148, and R20-0150) showed unexpectedly low quality (LQ), varying from Q26 to Q32. The accuracy of these five LQ genomes did not improve after replicated ONT sequencing. Further investigation of the five LQ genomes revealed excessive numbers of mismatch errors (1533–5670) compared with the seven HQ ones (0–40 mismatches) (Fig. 1c). Homopolymer errors (i.e., indels) were not the source of the inferior quality (7–306, Supplementary Table 5).
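To relate the Q scores above to error counts, here is a small sketch of the usual Phred-scaled formula, Q = -10 log10(errors / genome length). The genome length of 2.9 Mbp is an assumed round figure for L. monocytogenes, not a value from the article, and the cap at Q60 for an error-free assembly is likewise an assumption. Only mismatches are counted here, so the result will sit slightly above a score that also includes indels.

```python
import math

def q_score(errors: int, genome_length: int, cap: float = 60.0) -> float:
    """Phred-scaled assembly quality: Q = -10 * log10(errors / length), capped."""
    if errors == 0:
        return cap  # no observed errors: report the cap instead of infinity
    return min(cap, -10.0 * math.log10(errors / genome_length))

GENOME_LENGTH = 2_900_000  # assumed genome size in bp (not from the article)

for mismatches in (5670, 40, 0):
    print(mismatches, round(q_score(mismatches, GENOME_LENGTH), 1))
```

Under these assumptions, a few thousand mismatches in a genome of this size land in the high Q20s, which matches the order of magnitude reported for the LQ genomes.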

a Workflow of ONT-only and ONT/Illumina hybrid assembly; b Q scores; c number of mismatches (red: LQ, gray: HQ); d comparison of ONT and Illumina reads by IGV; e numbers of 5mC, 6mA, and mismatches between HQ/LQ strains (n=12, red: LQ, gray: HQ). Error bars represent the minimum and maximum values.

Manual inspection revealed that these mismatches were ONT basecalling errors that remained uncorrected after genome polishing (Fig. 1d and Supplementary Fig. 1). As mismatch errors in ONT are mainly due to epigenetic modifications, we computed the frequency of well-known methylation types in these isolates (see Methods and Supplementary Table 6). In terms of 5-methylcytosine (5mC), the numbers of modified loci in the five LQ genomes (~240–340k) were not significantly higher than those in the HQ ones (210–345k, P=0.89, Fig. 1e). Similarly, the numbers of N6-methyladenine (6mA) modifications showed no significant difference between the LQ and HQ groups (98–218k vs. 126–223k, P=0.34). Because the numbers of mismatch errors in LQ genomes are significantly higher than those in HQ ones (P=0.005), we suspected that the ONT basecalling algorithms failed to recognize novel modification types in the LQ isolates.

We removed the modifications in all microbial samples by WGA (Fig. 2a), which randomly amplifies the genome fragments without retaining any epigenetic modification (see Methods). The WGA-demodified samples were sequenced by ONT (R9.4), assembled into chromosomes, and compared with the Illumina/ONT hybrid genomes (Fig. 2a, Supplementary Tables 7 and 8). The five LQ genomes after WGA exhibited significantly higher quality than those without demodification (e.g., Q26 to Q53 in R20-0026) (Fig. 2b, Supplementary Table 9). In particular, the numbers of mismatch errors fell sharply after demodification (e.g., 5670 to 16 in R20-0026) (Fig. 2c). Consequently, the unexpectedly low quality of ONT was due to excessive modification-induced errors that the basecalling model had not been trained on. Demodification by WGA can produce high-quality ONT genomes without the need for Illumina short reads.

a Workflow of WGA-demodified ONT; b Q scores of the WGA-demodified and ONT-only genomes (gray: ONT, black: WGA ONT); c numbers of mismatches of the WGA-demodified and ONT-only genomes (gray: ONT, black: WGA ONT); d WGA and ONT-only genome quality with respect to sequencing depth (shading: minimum and maximum quality in five replicates, line: median quality); e numbers of active/available pores during WGA-demodified and ordinary ONT sequencing.

However, while WGA successfully erased these modifications, the sequencing cost increased for two reasons. First, WGA required a higher sequencing depth (~100×) for assembling a complete genome when compared with ordinary ONT sequencing (~30×) (Fig. 2d and Supplementary Figs. 2 and 3). This was due to the uneven amplification of WGA, which led to non-uniform sequencing depth and a fragmented assembly at moderate coverage. Second, the WGA-demodified samples may reduce the ONT yields. We observed that the number of available/active pores could sometimes decrease quickly (e.g., fewer than 100 pores after 12 h) (Fig. 2e), possibly owing to the hyperbranched structure left unresolved after WGA10. Consequently, the sequencing cost of WGA-demodified samples using ONT is much higher than that of ordinary sequencing.

We developed a novel computational method (called Modpolish) for correcting these modification-mediated errors without WGA and without prior knowledge of the modification systems. Modpolish identifies and corrects the modification-mediated errors by leveraging basecalling quality, basecalling consistency, and evolutionary conservation (Fig. 3a, see Methods). Briefly, because the ONT signals are disturbed by modifications, the basecalling quality at modified loci is substantially lower than at modification-free loci (Supplementary Fig. 4). As such, the basecalled nucleotides are often inconsistent at the modified loci (Supplementary Fig. 5), yet these loci lie within conserved motifs (Supplementary Fig. 6). In conjunction with the degree of conservation measured across closely related genomes, only the modified loci with ultra-high conservation are corrected by Modpolish, which avoids false corrections of strain variations and keeps specificity high.
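The three signals described above can be combined as a simple per-locus decision rule. The sketch below is not Modpolish's actual implementation: the data structure, the function name, and all three thresholds are hypothetical, chosen only to show how low quality, inconsistent basecalls, and high conservation might jointly gate a correction.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Locus:
    draft_base: str           # base in the draft (polished) assembly
    read_bases: list[str]     # basecalls from reads covering this locus
    mean_quality: float       # mean basecalling quality at this locus
    homolog_bases: list[str]  # bases at the aligned position in related genomes

def correct_locus(locus, max_quality=15.0, max_agreement=0.8, min_conservation=0.99):
    """Return the corrected base, or the draft base if any check fails.

    All thresholds are illustrative assumptions, not values from the article.
    """
    if not locus.read_bases or not locus.homolog_bases:
        return locus.draft_base
    # 1. Basecalling quality must be low (signal disturbed by a modification).
    if locus.mean_quality > max_quality:
        return locus.draft_base
    # 2. Basecalls must be inconsistent across reads.
    top_count = Counter(locus.read_bases).most_common(1)[0][1]
    if top_count / len(locus.read_bases) > max_agreement:
        return locus.draft_base
    # 3. The position must be ultra-conserved among closely related genomes,
    #    so that genuine strain variation is left untouched.
    base, count = Counter(locus.homolog_bases).most_common(1)[0]
    if count / len(locus.homolog_bases) < min_conservation:
        return locus.draft_base
    return base
```

Requiring all three conditions is what keeps a true strain-specific variant, which is usually well supported by reads or not conserved across relatives, from being overwritten.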

a Workflow of Modpolish; b Q scores before and after Modpolish; c numbers of mismatches before and after Modpolish (gray: before Modpolish, black: after Modpolish); d the antiviral defending systems encoded by the 12 strains (gray: before Modpolish, black: after Modpolish); e the sequence motif of modification sites in the four mza-encoding strains; f the sequence motif of modification sites on the R20-0026 strain.

We assessed the accuracy of Modpolish by comparing the quality of the ONT-only genomes (polished by Medaka) with those further polished by Modpolish. The results indicated that Modpolish significantly improved the quality of all LQ genomes from Q27–34 to Q60 (Fig. 3b, Supplementary Table 10). The number of mismatches also greatly decreased (e.g., from 5670 to 67 in R20-0026) (Fig. 3c). The numbers of mismatches in some HQ genomes were also reduced by Modpolish. For instance, the mismatches in R19-2905 were reduced from 40 to 6. Consequently, our results suggested that Modpolish made no false corrections on the HQ genomes (Supplementary Tables 11–13). The comparison of different basecaller versions and models (v4.0.14 vs. v6.3.4, HAC vs. SUP) indicated that these errors persist and that Modpolish successfully erases most of them (Supplementary Fig. 7).

As the modification systems often involve anti-phage defense (e.g., R-M, BREX, DISARM)11,12,13, we investigated the defense systems possessed by the HQ and LQ strains (Fig. 3d) (Supplementary Data 1). All the HQ genomes encompass at least one R-M system (e.g., Type I, II, or III), which is missing in all LQ isolates. Instead, four LQ strains (i.e., R20-0030, R20-0127, R20-0148, R20-0150) carry a novel methyltransferase-encoding mza defense system, which is absent in all HQ genomes (Supplementary Fig. 8). Analysis of the modification sites of the four mza-encoding LQ strains revealed the pentanucleotide motif GCAGC (Fig. 3e, Supplementary Fig. 6). On the other hand, modification loci in the LQ strain R20-0026 all centered on the motif GCTGG (Fig. 3f). Together, these results suggested that two lineage-specific modification systems extensively edited the five LQ genomes. Although their underlying mechanisms remain unclear, the editing at specific motifs with high conservation within each lineage allowed cost-effective in silico correction of these errors by Modpolish.

We then assessed the performance of Modpolish on public ONT datasets sequenced by R9.4 (SUP) and R10.4 flowcells (SUP, duplex/simplex modes). In the R9.4 dataset14, we first compared the quality of seven bacterial genomes polished by Medaka and Modpolish (Fig. 4a, Supplementary Table 14). The quality of five genomes significantly improved from ~Q45 to Q60. As before, the improvement was mainly due to the reduction of mismatches (Fig. 4b). For instance, the number of mismatches decreased from 388 to 13 in the Staphylococcus genome after Modpolish. Across genomes, the mismatch reduction rates ranged from 50% to 96%. Consequently, although these bacterial genomes are not extensively modified, Modpolish can further improve their quality after Medaka without false corrections.

Comparison of Medaka and Modpolish for a Q scores and b mismatches on the R9.4 dataset; comparison of Medaka and Modpolish for c Q scores and d mismatches on the R10.4 dataset.

In the R10.4 (duplex mode) dataset3, we compared the genome qualities polished by Medaka and Modpolish (downsampled to ~60×) (Fig. 4c, Supplementary Table 15). In general, Modpolish made little or no improvement in the duplex dataset. For instance, Modpolish reduced the mismatches only from 20 to 19 on the Bacillus genome (Fig. 4d). The overall genome quality is so high that no differences can be seen (Q60). Modpolish showed only marginal improvement on a recently published simplex dataset (R10.4, kit 14, Dorado v0.1.1) (Supplementary Fig. 9). Therefore, the quality of ONT R10.4 flowcells, in particular in duplex mode, is not only higher than that of R9.4 but also requires nearly no further correction. On the other hand, Modpolish may be used to fill the accuracy gap between simplex and duplex modes when projects aim for higher throughput.

View original post here:
Correcting modification-mediated errors in nanopore sequencing by nucleotide demodification and reference-based ... - Nature.com

Posted in Genome | Comments Off

CRISPR-Based "Genome Shredding" Technique Shows Promise in Treating Glioblastoma – Inside Precision Medicine

Posted: at 8:35 pm

Researchers at the Gladstone Institutes have developed a CRISPR-based genome shredding technique that shows promise in treating glioblastoma, an incurable brain cancer. Their research was published this week in the journal Cell Reports.

Much of the work to develop the technique was done in the lab of Jennifer Doudna, PhD, an author on the paper and co-winner of the 2020 Nobel Prize in Chemistry for the discovery of CRISPR-Cas9 gene editing technology. Other key players are Mitchel Berger, MD, a neurosurgeon and director of the Brain Tumor Center at University of California, San Francisco (UCSF), whose team helped secure patient-derived cell samples, and Alexendar Perez, MD, PhD, a resident at UCSF who performed much of the computational analysis needed for the study.

Computational analysis was necessary for diving into the non-coding portions of the genome to identify repetitive sequences shared by the glioblastoma cells. Cancer treatments rarely kill all tumor cells. In glioblastoma and other highly recurrent cancers, tumor cells that escape treatment develop multiple genetic mutations that allow them to proliferate.

Building on prior research, the Gladstone team surmised that mutated glioblastoma cells have a unique genetic signature that could be targeted. According to the paper, the team identified unique recurrent GBM-specific sgRNAs mainly in the non-coding genome that were generated by TMZ [chemotherapy] signature mutations characteristic of hypermutated gliomas. Those sequences are the beacon that guides CRISPR to the cancerous cells, where it cuts them up, leading to genome fragmentation and DNA damage-induced cell death.

There is a lot to do before this CRISPR-based genome shredding technique can be used therapeutically. For example, the researchers noted that there are inefficiencies in the delivery modalities that need to be addressed. And it is important to note that the work published in Cell Reports does not detail a path to direct clinical implementation for this approach. But the results are promising evidence of CRISPR's potential to treat not just glioblastoma but other hypermutated tumors, according to Christof Fellmann, PhD, study lead and corresponding author on the paper. "We see CRISPR as a gateway to a new therapeutic approach that won't be subject to the possibility of tumor cell escape."

And the researchers have reason to be hopeful. Results in the paper indicate that the technique works only on the tumor cells, sparing healthy ones during treatment. And in cases where tumor cells escaped the initial shredding, they succumbed to a second round of treatment. "We understand so much today about glioblastoma and its biology, yet the treatment regimens haven't improved," said I-Li Tan, PhD, first author on the paper. "Now we have a precise way to target the cells that are driving the cancer, and we hope this may one day lead to a cure."

View post:
CRISPR-Based "Genome Shredding" Technique Shows Promise in Treating Glioblastoma - Inside Precision Medicine

Posted in Genome | Comments Off

Genome wide analysis revealed conserved domains involved in the effector discrimination of bacterial type VI secretion … – Nature.com

Posted: at 8:35 pm

Construction of the VgrG database

Encoded as a stand-alone gene or fused at the N-terminus of the toxin, the MIX domains can assist the delivery of their cognate T6SS effector19,20. As the central component of the spike complex, VgrG is a good marker to explore the potential conserved domains involved in the delivery of T6SS effectors. Therefore, we set out to create a comprehensive dataset of VgrG proteins from available Gram-negative genome sequences lodged in the public GenBank database.

Previous studies have revealed that the Afp8 proteins of extracellular contractile injection systems (eCISs) are homologous to VgrG proteins, and thus could confound the dataset24,25,26. Therefore, we first downloaded 872 experimentally verified VgrG proteins from the established SecReT6 T6SS database27. These provide a positive control dataset that helps avoid potential false-positive hits (such as Afp8 homologs). A bioinformatic scan for conserved domains confirmed that the VgrG domain (accession: COG3501) was present in all 872 verified VgrG proteins, as well as in 472 Afp8 proteins available from the dbeCIS database26. Importantly, the domains identified in 861 (99%) of the verified VgrGs range between 451 and 750 amino acids in length, whereas only 10 (2%) of the Afp8 proteins fall within this size range (Fig. 1a). We therefore used this domain-length range as an empirical criterion for the systematic screening of bona fide VgrG proteins in the 133,722 publicly available bacterial genomes (Fig. 1a). Using this approach, a total of 130,825 VgrG proteins were identified from 45,041 Gram-negative bacterial genomes.
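The empirical criterion above amounts to a length filter on the COG3501 domain hit. The following is a minimal sketch; the function name and the assumption of 1-based inclusive hit coordinates are illustrative, while the range and the counts come from the paragraph above.

```python
MIN_LEN, MAX_LEN = 451, 750  # VgrG domain length range reported above (amino acids)

def is_candidate_vgrg(domain_start: int, domain_end: int) -> bool:
    """True if a COG3501 hit (1-based, inclusive coordinates) has a length in range."""
    length = domain_end - domain_start + 1
    return MIN_LEN <= length <= MAX_LEN

# Fractions of the control sets that pass the filter, using the counts given above.
verified_pass = 861 / 872   # verified VgrGs retained
afp8_pass = 10 / 472        # Afp8 homologs wrongly retained
print(f"verified VgrG retained: {verified_pass:.1%}")
print(f"Afp8 retained: {afp8_pass:.1%}")
```

The trade-off is deliberate: a small fraction of true VgrGs is lost in exchange for excluding almost all Afp8 homologs.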

a The workflow for the identification of valid VgrGs from 133,722 publicly available bacterial genomes. The 872 VgrGs available from the established T6SS database SecReT6 (red) and 472 putative Afp8 proteins, encoding VgrG domains, available from the eCIS database dbeCIS (green) were used as positive and negative datasets respectively for the selection of the empirical criteria for large-scale VgrG screening. b The 872 VgrGs available from the SecReT6 database with predefined subtype information are indicated by colored stars (key). VgrGs from subtypes i4a and i5 were mixed within the same clade in the tree, but these two subtypes were indeed closely related in the previous study27. The known type iii T6SS clade, derived mostly from Bacteroidetes, is highlighted with red shadow.

To further characterize the VgrG proteins identified above, we constructed a maximum-likelihood (ML) phylogenetic tree based specifically on the sequences of the conserved VgrG domains (Fig. 1b). Using the aforementioned 872 previously defined VgrGs as indicators, we observed that our ML tree exhibited a similar overall topology regarding types/subtypes of T6SS operons as previously described27, supporting the validity of our approach.

First, a screen was performed to identify MIX-containing proteins, based on the aforementioned VgrG database. A total of 7208 MIX-containing proteins within vgrG loci were identified, which are widely distributed among various bacteria (Supplementary Fig. 1). Importantly, sandwiched between vgrG and the downstream effector gene, MIX domains exhibit multiple encoding configurations, including single proteins and fusions at the C-terminus of VgrG or the N-terminus of the effector (Supplementary Fig. 2).

Based on the encoding features of the MIX domain, we then developed a screening strategy to identify more conserved domains that, like MIX, show multiple encoding configurations within vgrG loci from the VgrG database created above (Fig. 2). In brief, we scanned a maximum of three downstream genes of each vgrG locus to collect the conserved domains within the proteins sandwiched between vgrG and the downstream toxin (if present). A domain family was reported if it was present in both of two encoded forms: as a stand-alone gene (i.e., single form) and fused to either the C-terminus of VgrG or the N-terminus of a toxin (i.e., fusion form). Finally, to explore the presence of these domain families within vgrG loci in finer detail, we extended our search without the limitation of linkage to known toxins to identify more candidate domain-containing proteins within vgrG loci (Fig. 2).
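The reporting rule described above (a family must occur both as a stand-alone gene and in at least one fusion form) can be sketched as a small set operation. The form labels and the function name are hypothetical; the real pipeline's data model is not described in this excerpt.

```python
from collections import defaultdict

FUSION_FORMS = {"vgrg_c_fusion", "toxin_n_fusion"}  # hypothetical labels

def families_in_both_forms(observations):
    """Return domain families seen as 'single' and in at least one fusion form.

    observations: iterable of (family, form) pairs, where form is 'single',
    'vgrg_c_fusion', or 'toxin_n_fusion'.
    """
    forms = defaultdict(set)
    for family, form in observations:
        forms[family].add(form)
    return sorted(
        family for family, seen in forms.items()
        if "single" in seen and seen & FUSION_FORMS
    )

# Illustrative input; family_X and family_Y are placeholders, not real families.
obs = [
    ("DUF2345", "single"),
    ("DUF2345", "vgrg_c_fusion"),
    ("family_X", "single"),
    ("family_Y", "toxin_n_fusion"),
]
print(families_in_both_forms(obs))
```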

For each vgrG locus, a maximum of three contiguous downstream genes encoded on the same strand as vgrG, with an intergenic distance between adjacent genes of <1 kb, were collected. Known components of the T6SS operon and any annotated pseudogenes were excluded. Then, the 280,581 remaining downstream genes were scanned for conserved domains by batch CD-search. A total of 1321 putative toxin domain families were deduced from a collection of 928 experimentally verified exotoxins/effectors available from the VFDB database53. Each domain family identified within the downstream gene dataset was further classified into one of three cases for final manual curation and determination.
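The collection rule above can be sketched as a walk along one contig. This is an illustration under stated assumptions: the `Gene` record and function name are hypothetical, all genes are taken to lie on the same contig, the walk stops at the first opposite-strand gene or gap of 1 kb or more, and excluded genes (T6SS components, pseudogenes) are dropped after collection, a detail the excerpt does not specify.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    start: int              # 1-based, inclusive
    end: int                # 1-based, inclusive
    strand: str             # '+' or '-'
    excluded: bool = False  # known T6SS component or annotated pseudogene

def downstream_genes(vgrg, genes, max_genes=3, max_gap=1000):
    """Collect up to max_genes contiguous same-strand genes downstream of vgrG."""
    if vgrg.strand == "+":
        candidates = sorted((g for g in genes if g.start > vgrg.end),
                            key=lambda g: g.start)
    else:
        # On the minus strand, downstream means lower coordinates.
        candidates = sorted((g for g in genes if g.end < vgrg.start),
                            key=lambda g: -g.end)
    collected, previous = [], vgrg
    for gene in candidates:
        if len(collected) == max_genes or gene.strand != vgrg.strand:
            break
        if vgrg.strand == "+":
            gap = gene.start - previous.end - 1
        else:
            gap = previous.start - gene.end - 1
        if gap >= max_gap:  # intergenic distance must be < 1 kb
            break
        collected.append(gene)
        previous = gene
    return [g for g in collected if not g.excluded]
```

For example, with vgrG at 100..2000 on the plus strand and genes at 2050..2500, 2600..3000 and 4500..5000, only the first two are returned because the third follows a gap of about 1.5 kb.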

After the screening process and careful manual curation, DUF2345 (cl01733), FIX-like (cl41761), LysM (cl21525), 5 (cl33691), PG_binding_1 (cl38043) and PHA00368 (cl30808) were identified (Supplementary Table 1). As shown in Supplementary Fig. 3, besides the single form, all these domain families have at least one fusion form. Further, the FIX-like (cl41761), LysM (cl21525), 5 (cl33691) and PG_binding_1 (cl38043) families can be found in both fusion forms. Notably, some of them were encoded adjacent to known T6SS adaptors, which implies that their functions may differ from those of T6SS adaptors.

Besides the MIX domain, three well-characterized T6SS adaptor families (DUF4123, DUF2169, and DUF1795) have been reported to assist the interaction between VgrG and its cognate effectors. We further screened for these adaptor families encoded within vgrG loci. Among the 130,825 vgrG loci, besides those encoding the three adaptor domains (37.44%) and the MIX domain (3.14%), 31.33% encode at least one of the six conserved domain families identified here. Only 28.09% of vgrG loci include none of the adaptor/MIX/conserved domains mentioned above (Supplementary Fig. 4).

Although DUF2345 is considered an extension of the VgrG gp5 domain, it is not encoded by all VgrGs6,28,29. Nevertheless, among the aforementioned six conserved domains, the DUF2345 domain is the most frequently identified in vgrG loci (Supplementary Table 1). We therefore explored its function in T6SS. Three vgrG loci encoding the DUF2345 domain were found in Escherichia coli PAR and in Pseudomonas aeruginosa strains PAO1 and PS42 (Fig. 3a). Sequence comparison indicated that AKO63_2953 (VgrGPAR), AKO63_2954 (DUF2345PAR) and AKO63_2955 (M35PAR) correspond to the VgrG domain, the DUF2345 domain and the M35 (metallopeptidase) toxin domain of PA0262 (VgrG2bPA), respectively. Similarly, Q094_05019 (VgrGPS) of P. aeruginosa PS42 encodes the VgrG domain, whereas Q094_05020 encodes an N-terminal DUF2345 domain and a C-terminal M35 domain. AlphaFold v2.0 predicted that VgrGPAR, VgrGPS and the VgrG domain of VgrG2bPA have the same conformation (Supplementary Fig. 5a). Further, the E. coli locus (VgrGPAR, DUF2345PAR and M35PAR), the PS42 locus (VgrGPS and Q094_05020) and VgrG2bPA form similar trimer structures, which implies that these three complexes might have similar biological functions (Supplementary Fig. 5b). As these three loci encode VgrG, toxin and immunity proteins, we speculate that DUF2345 may be involved in the interaction between VgrG and its cognate effector.

a The vgrG loci of E. coli PAR, P. aeruginosa PAO1 and PS42. b E. coli expressing VgrG2bPA or its truncated mutant VgrG2bPAΔM35 were detected by Western blot. Anti-RpoB is the lysis control. c Survival of E. coli expressing VgrG2bPA or its truncated mutant VgrG2bPAΔM35 in pET22b. Ten-fold serial dilutions of cultures were spotted on LB agar containing the stated concentrations of IPTG and grown for 24 h. The image is representative of three independent experiments. d Intraspecies P. aeruginosa competition assay between the ΔVgrG2bPAΔPA0261 strain and various isogenic attacker strains at 37 °C for 24 h. The competition assay between the parental strain (PAO1) and itself (gray) is the internal control. The values and error bars represent the mean ± SD (n=3 biological replicates). A one-way ANOVA with Dunnett's test was employed using the parent versus prey competition as the comparator (*p<0.05; ns, not significant). e E. coli expressing M35PAR, AKO63_2955-2956 or DUF2345PAR were detected by Western blot. Anti-RpoB is the lysis control. f Survival of E. coli expressing M35PAR, AKO63_2955-2956 or DUF2345PAR in pET22b. Ten-fold serial dilutions of cultures were spotted on LB agar containing the given concentrations of IPTG and grown for 24 h. The image is representative of three independent experiments. g Interactions between DUF2345PAR and VgrGPAR or M35PAR. Shown here are immunoblots of lysates (total) and immunoprecipitates with anti-FLAG affinity beads (IP: FLAG) of DUF2345PAR transformed with empty vector or a plasmid encoding Myc-tagged VgrGPAR or S-tagged M35PAR. GFP and VgrGRPE are control proteins. h DUF2345PAR mediates the interaction between VgrGPAR and M35PAR. Shown here are immunoblots of lysates (total) and immunoprecipitates with anti-FLAG affinity beads (IP: FLAG) of M35PAR transformed with a plasmid encoding either Myc-tagged VgrGPAR or S-tagged DUF2345PAR.

Wood et al. showed that VgrG2bPA-PA0261 constitutes a T6SS antibacterial effector-immunity pair30. An E. coli toxicity assay was used to test whether the DUF2345 domain in VgrG2bPA is toxic to bacteria (Fig. 3b, c). As expected, VgrG2bPA overexpressed in E. coli exhibited acute toxicity, and co-expression of the immunity gene (PA0261) relieved this growth defect. Crucially, truncation of the M35 domain of VgrG2bPA restored growth, which indicated that DUF2345 in itself is not toxic to E. coli. Intraspecies P. aeruginosa competition assays were also performed to determine whether the DUF2345 domain could affect the function of VgrG2bPA (Fig. 3d). Although the ΔVgrG2bPAΔPA0261 strain exhibited a significant growth disadvantage against the wildtype PAO1 strain, it could no longer be outcompeted by either the ΔClpV2PA or the ΔVgrG2bPA attacker strain. Notably, in contrast to the wildtype vgrG2bPA gene, complementation with vgrG2bPAΔDUF2345 could not restore the growth advantage of the attacker strain. Further, although the secretion of Hcp (the T6SS inner stylet protein) was not affected, VgrG2bPAΔDUF2345 complemented in the ΔVgrG2bPA strain could only be detected in the cells, but not in the supernatant (Supplementary Fig. 6a). In addition, the production of VgrG2bPAΔDUF2345 was still detrimental to E. coli when it remained in the periplasm (Supplementary Fig. 6b, c). Therefore, deletion of the DUF2345 domain impairs the antibacterial ability of VgrG2bPA by ablating its secretion.

We subsequently explored the function of DUF2345 when encoded as a distinct gene, within the locus containing vgrGPAR and M35PAR along with the cognate immunity protein (Fig. 3a). An E. coli toxicity assay demonstrated that M35PAR exhibited bacterial killing activity, which was inhibited by its immunity protein (Fig. 3e, f). Consistent with the results of Fig. 3c, expression of DUF2345PAR in isolation had no deleterious effect on bacterial growth (Fig. 3f). Immunoprecipitation assays of proteins co-expressed in E. coli confirmed that DUF2345PAR can specifically bind VgrGPAR and M35PAR, but not VgrGRPE (VgrG in Burkholderia sp. RPE67) (Fig. 3g). Importantly, M35PAR could not interact with VgrGPAR in the absence of DUF2345PAR (Fig. 3h). These results implied that DUF2345PAR is involved in the interaction between VgrGPAR and M35PAR to assist the loading of M35PAR onto the T6SS spike.

Taken together, the DUF2345 domain is indispensable for the delivery of its cognate toxin, whether fused at the C-terminus of VgrG or encoded as a single gene.

Considering that DUF2345 is encoded as either a fusion at the C-terminus of VgrG or a distinct gene downstream of vgrG, we then investigated whether the sequences of VgrG domains showed a correlation with those of DUF2345. An iterative procedure was devised to hierarchically cluster the 52,277 VgrG domains and their cognate DUF2345 domains, respectively. At the 30% amino-acid sequence similarity cutoff, VgrG domains form three major clusters and ten outliers, whereas DUF2345 domains were classified into 37 distinct groups (Supplementary Fig.7). These findings imply that, compared to the relatively conserved VgrG domains, the sequences of DUF2345 domains exhibited higher diversity.

As we demonstrated above, DUF2345 is involved in the interaction between VgrG and the toxin protein. To delve further into this, we performed a Sankey analysis to investigate the relationship between DUF2345 domains and their downstream toxins in greater detail. Most DUF2345 clusters showed an obvious taxon-specific distribution and correlated well with their downstream toxins (Fig. 4). Meanwhile, we also noticed that some toxins, such as the Lyz-like and DUF2235 domains, correlated with more than one DUF2345 cluster. To test whether this is a result of the intrinsic sequence diversity of these toxins, an iterative procedure was applied to further subdivide these toxin groups. As expected, the sub-clusters of Lyz-like and DUF2235 domains also correlated well with DUF2345 groups (Supplementary Fig. 8). Thus, our data reveal that DUF2345 domains exhibit high sequence diversity and correlate well with their downstream toxins.
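The tabulation behind a Sankey diagram of this kind is a count of co-occurring pairs. The sketch below shows one way to compute link weights and to flag toxin families linked to more than one DUF2345 cluster. The cluster labels (C1, C2, C3) and the input pairs are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def sankey_links(loci):
    """Count loci per (DUF2345 cluster, toxin family) pair; the counts are link widths."""
    return Counter(loci)

def toxins_with_multiple_clusters(links):
    """Map each toxin family linked to more than one cluster to its sorted clusters."""
    clusters = defaultdict(set)
    for cluster, toxin in links:
        clusters[toxin].add(cluster)
    return {toxin: sorted(c) for toxin, c in clusters.items() if len(c) > 1}

# Made-up example: one (cluster, toxin family) pair per locus.
loci = [
    ("C1", "M35"), ("C1", "M35"),
    ("C2", "Lyz-like"), ("C3", "Lyz-like"),
    ("C2", "DUF2235"),
]
links = sankey_links(loci)
print(links[("C1", "M35")])
print(toxins_with_multiple_clusters(links))
```

Toxin families flagged this way are the natural candidates for the further sub-clustering step described above.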

A Sankey diagram showing the relationship between bacterial phylum/class, family, the corresponding DUF2345 clusters and the downstream toxin domain families (from left to right). Only DUF2345-encoding loci with adjacent known toxin domains were included. Loci from genomes without the necessary taxonomic information were excluded. The number of sequences involved in each node is given after the node name. The red arrows on the right indicate some toxins that were linked to more than one DUF2345 cluster.

Although absent from T6SS, the LysM-containing protein is one of the core components of eCIS, which shares several key homologous proteins with T6SS and forms a similar architecture31,32. It is therefore notable that our systematic screening implied that the LysM domain is likely to be functional in T6SS.

Figure 5a shows a vgrG locus encoding a LysM-containing protein in Ketobacter alkanivorans GI5. An E. coli toxicity assay showed that Kalk_10455 exhibited acute toxicity and that co-expression of Kalk_10450 relieved this growth defect, which indicated that Kalk_10450 is an immunity protein against Kalk_10455 (Fig. 5b, c). Notably, Kalk_10465 (VgrGG15) and Kalk_10460 (LysMG15) exhibited no toxicity when they were expressed in E. coli (Fig. 5c). Although immunoprecipitation assays of proteins co-expressed in E. coli confirmed that Kalk_10455 specifically binds LysMG15 and VgrGG15, Kalk_10455 could not bind VgrGG15 in the absence of LysMG15 (Fig. 5d).

a The vgrG loci of Ketobacter alkanivorans GI5 and Burkholderia sp. RPE67. b Immunoblots demonstrating the expression of VgrG2bG15, LysMG15 and Kalk_10455 in E. coli. Anti-RpoB is the lysis control. c Survival of E. coli expressing VgrGG15, LysMG15 and Kalk_10455 in pETduet. Ten-fold serial dilutions of cultures were spotted on LB agar containing the stated concentrations of IPTG and grown for 24 h. The image is representative of three independent experiments. d Interactions between VgrGG15, LysMG15 and Kalk_10455. Shown here are immunoblots of lysates (total) and immunoprecipitates with anti-FLAG affinity beads (IP: FLAG) of Kalk_10455 and GFP transformed with a plasmid encoding Myc-tagged VgrGG15 or Strep-tagged LysMG15. 0423PA is the control protein. e Immunoblots demonstrating the expression of BRPE_05220 and the NLPC_P60 domain in E. coli. Anti-RpoB is the lysis control. f Survival of E. coli expressing BRPE_05220 and the NLPC_P60RPE domain in pETduet. Ten-fold serial dilutions of cultures were spotted on LB agar containing the stated concentrations of IPTG and grown for 24 h. The image is representative of three independent experiments. g The LysM domain mediates the interaction between VgrGRPE and BRPE_05220. Shown here are immunoblots of lysates (Input) and immunoprecipitates with anti-FLAG affinity beads (IP: FLAG) of BRPE_05220 or BRPE_05220ΔLysM transformed with a plasmid encoding either Myc-tagged VgrGRPE or Myc-tagged 0423PA. 0423PA is the control protein.

BRPE67_05220 in Burkholderia sp. RPE67, which includes both LysMRPE and NLPC_P60RPE domains, was used to further explore the function of the LysM domain (Fig. 5a). E. coli toxicity assays demonstrated that BRPE_05220 exhibited bacterial killing activity. Moreover, expression of the NLPC_P60RPE domain in isolation had a deleterious effect on bacterial growth, which was inhibited by BRPE_05230 (Fig. 5e, f). Further, immunoprecipitated wildtype BRPE_05220, but not BRPE_05220 with the LysM domain truncated (BRPE_05220ΔLysM), interacted with BRPE_05210 (VgrGRPE) (Fig. 5g).

AlphaFold v2.0 predicted that BRPE_05210 (VgrG) and BRPE_05220 (LysM and NLPC_P60) form a trimer structure similar to that of VgrG2bPA, which implied that LysM may mediate the interaction between VgrG and the toxin (Supplementary Fig. 9). Further, phylogenetic analysis of the LysM domain revealed the diversity of T6SS-related LysM domains, which are evolutionarily distinct from the phage-/eCIS-associated LysM domains (Supplementary Fig. 10).

In sum, whether encoded downstream of a LysM-containing gene or fused to the C-terminus of a LysM domain, the toxin interacts with VgrG in a LysM-dependent manner, implying that LysM may assist the loading of its cognate effector onto the secretion apparatus.

The DUF2345-containing proteins exhibit a specific correlation with their diverse downstream toxins (Fig. 4). A similar Sankey analysis was performed to investigate the relationship between the other five identified conserved domain families, along with the confirmed co-effector (MIX), and their downstream toxins (Supplementary Fig. 11). Notably, most of the characterized toxin domains showed an obvious domain-specific distribution, with limited exceptions. For instance, as polymorphic toxins, RHS-containing proteins encode variable C-terminal toxic domains with a conserved N-terminal RHS domain13. Most of the Rhs superfamily are linked to FIX-like (cl41761) and 5 (cl33691) domains. LysM domains are mainly correlated with the Lyz_like, NlpD and NLPC_P60 superfamilies. As the domain families identified in this study, including FIX-like (cl41761), LysM (cl21525), 5 (cl33691), PG_binding_1 (cl38043) and PHA00386 (cl30808), share a genetic organization and a correlation with downstream toxins similar to those of the DUF2345 domain, it is reasonable to speculate that they also function in T6SS effector discrimination.

The overall distribution of the six conserved domain families was then analyzed (Fig.6). Interestingly, these families were not evenly encoded among different bacterial families. For example, although DUF2345 domains are widely distributed among Proteobacteria genomes, they are rarely encoded in the genomes of the Vibrionaceae and Rhodospirillaceae families. In contrast, the PG_binding_1 domain is limited to the genomes of γ-proteobacteria, including the families Chromatiaceae, Sinobacteraceae and Vibrionaceae. In general, although these conserved domains are widely encoded among various bacteria, their distributions exhibit obvious taxonomic specificity, which coincides with that of their corresponding cognate effectors as shown in Fig.4 and Supplementary Fig.11.

Only taxa with genomes encoding at least one of the six conserved domains within the vgrG loci are shown for brevity. A total of 55,228 vgrG loci are included, but genomes without known assigned genus are excluded. The circles represent phylum, class, order, family and genus from inner to outer, and are color-coded by phylum/class (key). The family names are given outside the taxonomic tree. The outer heatmaps represent the percentage of genomes encoding the corresponding conserved domains for each genus (key).

Posted in Genome | Comments Off

TRISH to investigate the effects of spaceflight on the human genome, central nervous system – Odessa American

Posted: at 8:34 pm

HOUSTON – The Translational Research Institute for Space Health (TRISH) will conduct a suite of human health and performance research projects during Axiom Space's upcoming Axiom Mission 3 (Ax-3) to the International Space Station (ISS), scheduled to launch in 2024. TRISH is a consortium led by Baylor College of Medicine's Center for Space Medicine with partners California Institute of Technology and Massachusetts Institute of Technology.

The selected research projects are designed to enhance understanding of the human experience in space and inform the development of high-impact scientific and technological solutions to help humans thrive on future space missions. Each project is part of TRISH's commercial spaceflight health research program, Enhancing eXploration Platforms and Analog Definition (EXPAND). The projects are led by researchers from across the nation who will investigate key space health topics, including space motion sickness, sleep disturbance, genome alterations, changes to cognitive function and eye and brain health, a news release said.

"Our commercial spaceflight partners such as Axiom Space are instrumental to cutting-edge research, including these projects designed to reveal how the human body and mind function in the extreme environment of space," said Dr. Emmanuel Urquieta, TRISH chief medical officer, EXPAND program lead and assistant professor in the Center for Space Medicine at Baylor. "This work represents an important step in our journey to understand the body's response to challenging conditions, which is critical for improving human health both here on Earth and on future long-duration missions, including to the Moon and Mars."

Ax-3 is the third commercial astronaut mission to the ISS. The Ax-3 crew will live and work aboard the ISS for up to 14 days, implementing a full mission comprising microgravity research, educational outreach and technology demonstrations. The four-person crew includes Commander Michael López-Alegría, Pilot Walter Villadei, and Mission Specialists Alper Gezeravcı and Marcus Wandt. A SpaceX Falcon 9 rocket will launch the Ax-3 crew aboard a SpaceX Dragon spacecraft to the ISS from NASA's Kennedy Space Center in Florida.

"Axiom Space appreciates the continued partnership with TRISH on our commercial astronaut missions and the opportunity to further our knowledge of human health through rigorous scientific studies," said Lucie Low, chief scientist for microgravity research at Axiom Space. "TRISH's growing database of medical information collected from commercial spaceflight participants provides additional data sets that can help to inform the expanding commercial space industry."

The TRISH EXPAND biomedical research projects for Ax-3 include:

Cognitive and Physiologic Responses in Commercial Space Crew on Short-Duration Missions, Mathias Basner, M.D., Ph.D., M.S., University of Pennsylvania Perelman School of Medicine

Spaceflight participants experience a multitude of stressors that can affect brain function and crew performance. Basner's team will track spaceflight participants' performance in memory, abstraction, spatial orientation, emotion recognition, risk decision-making and sustained attention before and after the mission to assess the mental impact of space travel.

Otolith and Posture Evaluation II, Mark Shelhamer, Sc.D., Johns Hopkins University

Many space travelers develop motion sickness, nausea and disorientation shortly after launch and landing, which can impact performance. Using a series of tests administered on a tablet device, Shelhamer will study how astronauts' inner ears and eyes sense and respond to motion before and immediately after spaceflight to better predict who is likely to develop space motion sickness.

Space Omics + BioBank, Richard Gibbs, Ph.D., Baylor College of Medicine

Gibbs' team will gather biological specimens from astronauts before and after their mission to assess the effects of spaceflight on the human body at the genomic level. Comparisons of the pre- and post-flight samples can yield critical insights into the impact of space travel on human health and advance health care on Earth by revealing alterations in gene expression in response to extreme environmental stressors.

SANS Surveillance, TRISH

Understanding Spaceflight Associated Neuro-Ocular Syndrome (SANS), which involves changes to the eyes and brain during spaceflight, is of critical importance to NASA. This project collects related ocular images and vision function data during the ground phases of the mission.

Standardized research questionnaires, TRISH

TRISH has implemented a set of standardized research questionnaires for the crew to collect data on their sleep, personality, health history, team dynamics and immune-related symptoms. These additional contextual and qualitative data points will become part of TRISH's EXPAND research database, which collects and stores pre-flight, in-flight and post-flight health data from commercial astronauts in a centralized research database, available to current and future scientists exploring space health.

Sensorimotor adaptation, TRISH

The ability to stand, balance and have full body control will be critical elements when astronauts return to the moon. TRISH collects data before and after flight to help understand the level of sensorimotor ability and change as well as time to recovery.

"TRISH is thrilled to continue our work in advancing human health in space with the help of Axiom Space," said Jimmy Wu, TRISH senior biomedical engineer, EXPAND program lead and instructor in Baylor's Center for Space Medicine. "The Axiom team and spaceflight participants are helping us make strides in understanding the risks to human health during space travel."

TRISH is an applied space health research catalyst empowered by the NASA Human Research Program to solve the challenges of human deep space exploration. Led by Baylor College of Medicine's Center for Space Medicine, the consortium leverages partnerships with Caltech and MIT.

Posted in Genome | Comments Off

The venom preceded the stinger: Genomic studies shed light on the origins of bee venom – EurekAlert

Posted: at 8:34 pm

Components of the venom cocktail used by wild bees such as the Banded Mud-Bee (Megachile ericetorum) are evolutionarily older than their sting.

Credit: Björn von Reumont

FRANKFURT. Venoms have developed in many animal groups independently of each other. One group that has many venomous species is Hymenoptera, an insect order that also includes aculeates (stinging insects) such as bees, wasps and ants. Hymenoptera is very species-rich, with over 6,000 species of bees alone. And yet, despite the great ecological and economic importance of hymenopterans, very little is known about the evolutionary development of their venoms.

By means of comparative genomics, researchers led by Dr. Björn von Reumont, who is currently a visiting scientist in the Applied Bioinformatics Working Group at the Institute for Cell Biology & Neuroscience of Goethe University Frankfurt, have now examined systematically and for the first time how the most important components of the venom of bees and other hymenopteran taxa developed in the course of evolution. The toxins are complex mixtures composed of small proteins (peptides) and a few large proteins and enzymes. Stinging insects actively inject this poisonous cocktail into their prey or attackers with the help of a special sting apparatus.

In the first step, the researchers identified which of the peptides and proteins in the venom were most prevalent in hymenopterans. To do this, they drew on information from protein databases, although this was sparse. In addition, they analyzed the proteins in the venoms of two wild bee species, the violet carpenter bee (Xylocopa violacea) and the great-banded furrow-bee (Halictus scabiosae), as well as of the honeybee (Apis mellifera). They found the same 12 families of peptides and proteins in all the hymenopteran venoms analyzed. These are evidently a common ingredient in these venom cocktails.

In collaboration with colleagues from the Leibniz Institute for the Analysis of Biodiversity Change (LIB), the Technical University of Munich (TUM) and the LOEWE Center for Translational Biodiversity Genomics (LOEWE TBG), the research team then searched for the genes of these 12 peptide and protein families in the genomes of 32 hymenopteran taxa, including sweat bees and stingless bees, but also wasps and ants such as the notorious fire ant (Solenopsis invicta). The differences in these genes, in some cases only the exchange of single letters of the genetic code, helped the scientists to determine the relationship between the genes of different species and later, with the help of artificial intelligence and machine learning, to compile a lineage of the venom genes.

The surprising result was that many of the venom genes analyzed are present in all hymenopterans. Evidently the common ancestor of all hymenopteran taxa already possessed these genes. "This makes it highly probable that hymenopterans are venomous as an entire group," concludes von Reumont. For other groups, such as Toxicofera, which includes snakes, anguids (lizards) and iguania, science is still debating whether the venoms can be traced back to a common ancestor or whether they evolved separately.

Within Hymenoptera, only the stinging insects (bees, wasps and ants) have an actual stinger to administer the venom. The evolutionarily old parasitic sawflies, by contrast, use their ovipositor along with their eggs to inject substances that alter their host plant's physiology: The sirex wood wasp (Sirex noctilio), for example, not only introduces a fungus into the plant, which facilitates the colonization of the wood by its larvae, but also its own poisonous cocktail with the venom proteins examined in the study. The purpose of these proteins is to create suitable conditions in the plant for the larvae. "This means that the sirex wood wasp can also be classified as venomous," says von Reumont.

New venom components in bees are the gene for the peptide melittin and genes for representatives of the newly described protein family anthophilin-1. The fact that melittin is encoded by just one single gene came as a surprise to the researchers, as von Reumont explains: "Not only are there many different variants of melittin, but the peptide also accounts for up to 60 percent of the dry weight of bee venom. That is why science previously assumed that there must be many gene copies. We were able to disprove this quite clearly." Because they found the melittin gene only in bees, the researchers also invalidated the hypothesis that it belongs to a group of venom genes postulated for stinging insects called aculeatoxins. Von Reumont is convinced: "This shows us once again that genome data are the only way to draw meaningful conclusions about the evolution of venom genes."

The Frankfurt study is the first one to show for an entire insect group with around one million species where venom genes originated and how they have developed. It provides a starting point for tracing the evolution of venom genes in the ancestors of Hymenoptera as well as specializations within the group. However, to be able to perform comparative genomics on a large scale, analysis methods for the partly very large protein families must first be automated.

Method of Research: Experimental study

Subject of Research: Animals

Article Title: Prevalent bee venom genes evolved before the aculeate stinger and eusociality

Article Publication Date: 23-Oct-2023

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.

Posted in Genome | Comments Off

Integrating genomic and multiomic data for Angelica sinensis provides insights into the evolution and biosynthesis of … – Nature.com

Posted: at 8:34 pm

Genome assembly and annotation

The widely cultivated A. sinensis cultivar Qinggui1 was selected for genome sequencing (Fig.1a). We generated a total of 376.4Gb Single Molecule Real-Time (PacBio SMRT) sequences and 60.8Gb paired HiSeq reads (PE150), along with 325.0Gb effective chromosome conformation capture (Hi-C) reads (TableS1). The assembly was initialized by PacBio SMRT sequences, which were corrected with high-quality paired HiSeq reads. A genome size of 2.16Gb was obtained after the final assembly. The Hi-C interaction matrices showed a distinct separation pattern of 11 blocks that could be used to cluster and orient the contigs and anchor them to 11 chromosomes (Fig.1b and Tables 1 and S2). The size of the genome that we assembled was similar to the size estimated by flow cytometry13. Mapping the short reads back to the assembly led to a correction of 29,533 single-base errors and 9,426 small indels. The identification of 1,588,740 heterozygous SNPs showed a low level of heterozygosity in this self-fertilized plant. Evaluation by the Benchmarking Universal Single-Copy Orthologs (BUSCO) method19,20 showed >99% completeness of the genome (TableS3). These results confirm a high-quality genome assembly. Please refer to Table1 and Data availability for detailed information on the genome assembly.

a Morphology of the sequenced plant. b Hi-C map of chromosomes. c Genomic features, shown as tracks a-e: a-b, SNP and indel density and distribution identified between A. sinensis (GS) and A. sinensis (QH); c, density and distribution of LTR retrotransposons (purple: LTR; blue: Copia-type; dark green: Gypsy-type); d, gene density and distribution; e, collinear gene pairs within the genome. The colors of linking lines indicate the number of one-to-one gene pairs in the collinearity blocks: 40, green: 20, blue: 10, gray: 5. This figure was prepared by using shinyCircos110.

Approximately 80.24% of the assembly (1.66Gb) was identified to be repetitive sequences, which was higher than estimates in another Apiaceae family member, coriander (70.59%) (Fig.1c, TableS4). Long terminal repeats (LTRs), primarily consisting of Gypsy and Copia subtypes, were most abundant. The other repeats were categorized as DNA transposons (3.65%), long interspersed nuclear elements (LINEs; 1.26%), short interspersed nuclear elements (SINEs, 0.03%), and uncharacterized repeats (19.77%) (TableS5).

We predicted a total of 41,040 protein-coding genes (TableS6) using ab initio methods, protein homology, and RNA-seq reads from different tissues. Of them, 98.3% were mapped to the chromosomes, and most were distributed in the terminal regions (Fig.1c). Using the iTAK pipeline21, we predicted 2,996 transcription factor (TF) genes in the A. sinensis genome. The top five TF families were MYB/MYB-related (209), AP2/ERF-ERF (172), bHLH (166), C2H2 (154), and NAC (135). Compared with those in other Apiaceae plants, the GeBP, HSF, GARP-G2-like, C2C2-GATA, C2C2-Co-like, HB-WOX, and Trihelix families were expanded, whereas the numbers of C2C2-YABBY, B3-ARF, and GRAS genes were dramatically decreased in A. sinensis (Fig.S1). The genome that we assembled in this study included more TF genes in most TF families than the published A. sinensis (GS) genome (Fig.S1).

Despite the increasing number of sequenced genomes of medicinal plants, systematic studies of their evolutionary relationships are relatively scarce. To explore the phylogenetic position of A. sinensis in the Apiaceae family and its evolutionary relations with other species, we selected typical representative families/orders and medicinal plant species of rosids and asterids according to the Angiosperm Phylogeny Group (APG V4) classification system22 and constructed a phylogenetic tree using one-to-one homologous gene families. These 20 representative angiosperms included 12 well-known medicinal plant species (TableS7) from 14 families and 12 orders, representing the major botanical taxonomic groups of core eudicots.

Among these species, Vitis vinifera was chosen for its important evolutionary position and its wide use as a model and basal plant for plant evolutionary research23. Arabidopsis thaliana and Solanum lycopersicum are well-studied model eudicot plants24,25. Theobroma cacao and Camellia sinensis are two of the most important beverage crops and are rich in secondary metabolites such as caffeine26,27. C. sinensis is also one of the basal species of asterid plants27. Populus trichocarpa was selected as a model plant for the study of lignin biosynthesis and phenylpropanoid metabolism28, which is also one of the most important metabolic pathways in A. sinensis related to the bioactive metabolites of ferulic acid, lignans, and coumarins. Cannabis sativa is an agriculturally valuable crop and is also the source of the well-known drugs tetrahydrocannabinol (THC) and cannabidiol (CBD)29. Ophiorrhiza pumila, belonging to the family Rubiaceae, is an important herbaceous medicinal plant and can accumulate camptothecin (CPT)30. Scutellaria baicalensis, Salvia miltiorrhiza, Taraxacum mongolicum, Artemisia annua, Lonicera japonica, Panax notoginseng, Panax ginseng, Angelica sinensis, and snapdragon (Antirrhinum majus L.) are widely used as traditional Chinese medicines with thousands of years of history in China. In addition, we also included Daucus carota, Apium graveolens, and Coriandrum sativum, which are important members of the Apiaceae family, to examine the evolutionary relationships within the family and the evolutionary status of A. sinensis.

We identified a total of 2133 one-to-one orthologous gene families shared by all the species (Fig.S2). Using these orthologs, we constructed a phylogenetic tree by the concatenation method. As expected, the topology of the tree was consistent with the APG V4 classification. In the Apiales order, Araliaceae was grouped with Apiaceae, and Araliaceae was considered to be the ancestral family. Divergence time estimates showed that these two families separated around 58 MYA. Within the Apiaceae family, A. graveolens and D. carota diverged approximately 23 MYA, which is much earlier than the divergence of A. sinensis (QH) and its sister clade C. sativum (12 MYA) (Fig.2a).

a Molecular phylogenetic tree of 20 representative angiosperm species constructed using 2133 concatenated conserved protein sequences by the ML and BI methods. b Phylogenetic tree of A. sinensis and other Apiaceae species, inferred by estimating divergence time using 3188 single-copy ortholog sequences. P. notoginseng was used as an outgroup. The numbers in green and red colors indicate gene family expansion and contraction compared with the most recent common ancestors, respectively. Estimated divergence times (MYA, million years ago) are indicated at each node. The Venn diagram shows the proportion of gene families under the unchanged (blue), expansion (red) and contraction (green) scenarios. c KEGG pathway enrichment analysis of expanded gene families in the A. sinensis (QH) genome. Only the enriched KEGG pathways with p values<0.05 are displayed. d Distribution of 4DTv distances of syntenic orthologous genes of Apiaceae species. The black arrows mark the WGD events. e The KS distribution for orthologous gene pairs within Apiaceae species. V. vinifera was used as the model organism for evolutionary analysis. The shape of the curve and the position of the peak are almost identical between A. sinensis (QH) and A. sinensis (GS). The highlighted peak regions represent two WGD events.

To further investigate the evolutionary relationships among Apiaceae species, we clustered approximately 91.3% (206,682) of the genes from five Apiaceae species and one outgroup species (P. notoginseng) into 29,108 orthologous groups and extracted 3189 single-copy genes (TableS8). We constructed a phylogenetic tree based on the concatenated sequence alignment of these single-copy gene families (Fig.2b). C. sativum showed the most marked gene expansion. A. sinensis (QH) and A. sinensis (GS) were clustered together and C. sativum was their closest relative. A. sinensis (QH) had more expanded and fewer contracted gene families than A. sinensis (GS) (Fig.2b).

We identified 3698 genes as members of significantly expanded gene families (P<0.01) in A. sinensis (QH) and mapped them to the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways for functional enrichment analysis. We detected 33 significantly enriched pathways (P<0.05), and the top enriched metabolic pathways included Glycosphingolipid biosynthesis, Zeatin biosynthesis, Benzoxazinoid biosynthesis, Oxidative phosphorylation, Sesquiterpenoid and triterpenoid biosynthesis, Biosynthesis of unsaturated fatty acids, Selenocompound metabolism, and Indole alkaloid biosynthesis (Fig.2c and TableS9). Some of the enriched KEGG pathways were involved in plant volatile biosynthesis, such as Sesquiterpenoid and triterpenoid biosynthesis and Phenylpropanoid biosynthesis, which suggested that these genes may contribute to the adaptive phenotypic diversification of A. sinensis species.

Whole-genome duplications (WGDs) are widely recognized as a major source of species diversification in many eukaryotic lineages based on various lines of evidence31. To identify potential WGD events, we calculated the nucleotide divergence at fourfold synonymous third-codon transversion positions (4dTv) and the synonymous substitution rates (Ks) for collinear gene pairs within each species. In addition to the five members of the Apiaceae family, namely, D. carota, A. graveolens, C. sativum, A. sinensis (GS), and A. sinensis (QH), we also included the model plant V. vinifera in our study.
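As a rough illustration of the 4dTv statistic, the sketch below counts transversions at fourfold degenerate third-codon positions for one pair of codon-aligned coding sequences. It is a minimal sketch under stated assumptions: the function name `raw_4dtv` and the toy sequences are hypothetical, the standard genetic code is assumed, and the value returned is the uncorrected fraction. Published pipelines often apply a correction for multiple substitutions, which is omitted here, and this is not the authors' implementation.

```python
# Codon prefixes whose third position is fourfold degenerate in the
# standard genetic code (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN,
# Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def raw_4dtv(cds_a, cds_b):
    """Uncorrected transversion fraction at shared fourfold degenerate sites.

    Both inputs must be gap-free, codon-aligned coding sequences of equal
    length (an assumption of this sketch).
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and equal in length")
    sites = 0
    transversions = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        # Keep only codons sharing the same fourfold degenerate prefix.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        if codon_a[2] not in "ACGT" or codon_b[2] not in "ACGT":
            continue
        sites += 1
        # A transversion swaps a purine for a pyrimidine or vice versa.
        if (codon_a[2] in PURINES) != (codon_b[2] in PURINES):
            transversions += 1
    return transversions / sites if sites else float("nan")


# Toy example: GCT/GCA and GGA/GGC differ by transversions, CCA/CCG by a
# transition, and ATG/ATG is not a fourfold degenerate codon.
value = raw_4dtv("GCTCCAGGAATG", "GCACCGGGCATG")
print(round(value, 3))
```

In a genome-scale analysis this value would be computed for every collinear gene pair and the resulting distribution inspected for peaks.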

The intragenomic paralogous genes of the Apiaceae species exhibit three distinct peaks in their 4dTv distributions (Fig.2d). The last peak (γ), shared with V. vinifera, signifies an ancient Whole Genome Triplication (WGT) event common to all eudicot plants. The first two peaks indicate two recent lineage-specific Whole Genome Duplication (WGD) events that took place prior to the divergence of the family members within the Apiaceae family. This observation aligns with a previous study which suggested that A. sinensis has undergone three polyploidy events13. By comparing the peak positions across species, we inferred a sequence of WGD events: A. sinensis experienced the most recent event, followed by C. sativum and then A. graveolens. This sequence corroborates our phylogenetic tree and divergence time estimates, thereby enhancing the consistency of our findings.

Ks values of homologous genes from different genomes can be used to estimate the time of species divergence32. In this study, we compared the Ks peak values within each species and identified two distinct peaks at Ks ≈ 0.5 and Ks ≈ 1.0, corresponding to two WGD events (Fig.2e). The peak positions of A. sinensis (QH) and A. sinensis (GS) were nearly identical (see TableS10 for complete peak values), suggesting similar evolutionary histories for these two varieties. However, the peak at around 1.7 is not evident, likely due to the loss or divergence of ancient duplicate genes following the earliest WGD event. The order of the peak values aligned with the phylogenetic relationships of carrot, celery, coriander, and Angelica. This implies that the WGD events in these species occurred in the order carrot, celery, coriander, and Angelica, which is also consistent with the preceding 4dTv analysis.
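One simple way to locate peaks in a Ks distribution is to bin the values and look for local maxima in the histogram. The sketch below does this with the standard library only. The function name `ks_histogram_peaks`, the bin width, the minimum count and the toy Ks values are all assumptions made for illustration. Real analyses commonly fit kernel density or mixture models instead, and the source does not say which method was used.

```python
def ks_histogram_peaks(ks_values, bin_width=0.1, max_ks=3.0, min_count=2):
    """Return bin centers that are local maxima of a Ks histogram.

    A bin is reported when it holds at least `min_count` values, exceeds
    its left neighbor and is not smaller than its right neighbor.
    """
    n_bins = int(round(max_ks / bin_width))
    counts = [0] * n_bins
    for ks in ks_values:
        if 0 <= ks < max_ks:
            counts[min(int(ks / bin_width), n_bins - 1)] += 1
    peaks = []
    for i, count in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < n_bins - 1 else 0
        if count >= min_count and count > left and count >= right:
            peaks.append(round((i + 0.5) * bin_width, 3))
    return peaks


# Hypothetical Ks values for collinear gene pairs within one genome.
toy_ks = [0.31, 0.42, 0.44, 0.46, 0.48, 0.62,
          0.88, 0.95, 1.02, 1.04, 1.06, 1.08, 1.15]
peaks = ks_histogram_peaks(toy_ks)
print(peaks)
```

With these toy values the two densest bins are centered at 0.45 and 1.05, which mimics the two-peak pattern described in the text. The bin width controls how finely nearby peaks can be separated.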

A total of 41,040 high-confidence genes were predicted, which is 2,162 fewer than the 43,202 genes in the published genome annotation. To evaluate the integrity of the gene set, both gene sets were first compared using the same BUSCO version and parameters. A proportion of complete genes of 96.41% was found in A. sinensis (QH), while A. sinensis (GS) had only 88.10%. Second, common databases, including the InterproScan33, Gene Ontology (GO)34, KEGG35, SwissProt36, TrEMBL, KOG, and nonredundant protein NCBI databases, were used to functionally annotate these two gene sets. Approximately 95.76% of the genes were annotated in A. sinensis (QH), while only 90.38% were annotated in A. sinensis (GS). Third, OrthoFinder (v2.5.4)37 was used to cluster these two gene sets for further analysis. The percentage of genes in orthologous groups was 94.9% in A. sinensis (QH), while it was only 82.6% in A. sinensis (GS). The species-specific gene number was 2,111 in A. sinensis (QH) and 7,496 in A. sinensis (GS). In summary, we provided a better reference gene annotation for A. sinensis species.

The genomic differences between A. sinensis (QH) and A. sinensis (GS) were investigated. Highly collinear relationships were evident between these two genomes (Fig.3a, b). A large inversion was also observed along homologous chromosomes Chr09 (A. sinensis (QH)) and chr04 (A. sinensis (GS)), which is highlighted by a red arrow in Fig.3a and a red square in Fig.3b. Good collinearity was found in this region between A. sinensis (QH) and A. graveolens, suggesting that A. sinensis (GS) had an assembly error in this region or that this is an inherent feature of the A. sinensis (GS) genome. Relatively good collinearity was observed at the genome level between A. sinensis and A. graveolens. Furthermore, reciprocal translocations were observed along chromosomes 05 and 07 in A. sinensis (QH), as well as along chromosomes 09, 11, and 10 in A. graveolens (Fig.3b). This phenomenon was consistent between A. sinensis (GS) and A. graveolens, further confirming the occurrence of translocations between these chromosomes. The collinearities between A. sinensis (QH) and other species in Apiaceae are displayed in Fig.S3.

a Macrosynteny between A. sinensis (QH) and A. sinensis (GS) was verified using MUMmer98 (version 4.0). Each dot represents a homologous block. Blue and green colors indicate different orientations of the sequences, while the red arrow refers to intrachromosomal inversions. The plot was generated using Dot (https://dot.sandbox.bio/). b Genome collinearity analysis among A. sinensis (QH), A. sinensis (GS), and A. graveolens. MCScanX86 was used to identify collinear gene blocks among these three genomes. The red square highlights intrachromosomal inversions between A. sinensis (QH) and A. sinensis (GS). The color of linking lines indicates the number of one-to-one gene pairs in the collinearity blocks: orange (40), green (20), and gray (5). c The genome distribution of genes with strong functional effects between A. sinensis (QH) and A. sinensis (GS). d KEGG pathway enrichment analysis of genes with strong functional effects.

a Changes in metabolites between NG and EF samples. The horizontal axis shows log2-fold changes, and the vertical axis shows log2 absolute content changes. The dot colors represent the different compound classes. Numbers in brackets indicate the number of compounds upregulated in NG and EF samples. b Heatmap of the contents of metabolites in the classes "Coumarins and lignans" and "Terpenoids and phthalides" that differ between the NG and EF groups. The data were normalized by the Z score in rows. The red and blue arrows indicate the upregulated and downregulated metabolites, respectively (VIP ≥ 1 and log2(fold change) ≥ 1 or ≤ -1). c Heatmap showing differential gene expression related to coumarin, lignan and lignin biosynthesis between NG and EF samples in Angelica roots. The red and blue arrows indicate the upregulated and downregulated genes (log2(fold change) ≥ 1 or ≤ -1 and p ≤ 0.05), respectively. Only the genes with FPKM ≥ 5 in at least one sample are shown.

A total of 1.227 million SNPs and 242,250 indels were detected in syntenic blocks between the two A. sinensis genomes. The distributions of SNPs and indels were similar but uneven across the whole genome (Fig.1c). Most of the genetic variations were located in the intergenic regions. Of these, 38,862 SNPs and 8,887 indels were located in the coding regions, affecting 9,547 and 5,125 genes, respectively. Within coding regions, 909 genetic variations (affecting 686 genes) were annotated as having a strong effect on gene function, with frameshifts or changes at the start or stop codon (Supplementary Data1). These genes were not evenly distributed across the whole genome (Fig.3c) and were enriched in the KEGG pathways for the biosynthesis of various secondary metabolites, such as Indole alkaloid, Betalain, Isoquinoline alkaloid, and Sesquiterpenoid and triterpenoid biosynthesis (Fig.3d). The numbers of SNPs and indels were higher on chromosomes 10 and 11 than on other chromosomes (Fig.1c and TableS11).

To understand the biosynthesis of various bioactive components in Angelica roots, we conducted nontargeted metabolomics profiling on normally growing and early-flowering Angelica roots. More than 716 high-confidence metabolites were detected and identified, including 39 flavonoids, 12 terpenoids, 47 alkaloids, 74 phenolic acids, 10 phthalides, 31 coumarins, and 24 lignans (Supplementary Data2), of which 299 compounds were determined as differential metabolites using univariate and multivariate statistical methods with the parameters of FC ≥ 2 or ≤ 0.5 and VIP (variable importance in projection) ≥ 1, including 145 upregulated and 154 downregulated metabolites.
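The selection rule for differential metabolites can be sketched as a small filter. The example assumes the thresholds read as fold change ≥ 2 or ≤ 0.5 together with VIP ≥ 1. The function name `classify_metabolite`, the metabolite labels and all numeric values are hypothetical and do not come from the study's data.

```python
def classify_metabolite(fold_change, vip, fc_up=2.0, fc_down=0.5, vip_min=1.0):
    """Label a metabolite as 'up', 'down' or 'unchanged'.

    `fold_change` is the ratio of mean abundance between two groups and
    `vip` is the variable importance in projection from a multivariate
    model; both are assumed to be computed elsewhere.
    """
    if vip < vip_min:
        return "unchanged"
    if fold_change >= fc_up:
        return "up"
    if fold_change <= fc_down:
        return "down"
    return "unchanged"


# Hypothetical measurements: (name, fold change, VIP).
measurements = [
    ("metabolite_1", 3.2, 1.4),   # large increase, informative
    ("metabolite_2", 0.4, 1.1),   # large decrease, informative
    ("metabolite_3", 2.5, 0.7),   # large increase, low VIP
    ("metabolite_4", 1.3, 1.8),   # small change, high VIP
]

labels = {name: classify_metabolite(fc, vip) for name, fc, vip in measurements}
for name, label in labels.items():
    print(name, label)
```

Requiring both criteria means that a metabolite with a large fold change but a low VIP score, or the reverse, is not counted as differential.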

The metabolite classes showed markedly different accumulation patterns in Angelica roots between NG (normal growth) and EF (early flowering and bolting) samples. The Angelica roots in NG samples were rich in organic acids, amino acids and derivatives, saccharides and alcohols, and nucleotides and derivatives, while the Angelica roots in EF samples were rich in phenolic acids, LPC, LPE, coumarins, lignans, flavonols, and flavonoids (Fig.4a). In particular, some phthalides and coumarins accumulated at higher levels in NG roots than in EF roots, whereas most lignans accumulated at higher levels in EF roots than in NG roots (Fig.4b). This suggests that NG roots have a higher medicinal value than EF roots, since these phthalides and coumarins displayed more important bioactivities in experimental and clinical studies.

Transcriptome analyses of these Angelica roots under different developmental conditions also unveiled the differentially expressed metabolic genes in their biosynthesis pathways in line with metabolomics data (Fig.4c). The metabolic genes putatively involved in the biosynthesis of lignans and coumarins, both of which are derived from the phenylpropanoid pathway that often leads to the biosynthesis of well-known lignin and flavonoids, were upregulated in EF roots compared with NG roots (Fig.4c). In contrast, most genes putatively involved in phthalide and coumarin biosynthesis were expressed at higher levels in NG roots than in EF roots, consistent with their higher pharmaceutical values (Fig.4c).

Although the shared metabolic enzymes and pathways leading to lignin, coumarins, lignans, and flavonoids are well known, the specific genes and enzymes involved in the production of many coumarins and lignans are poorly understood13,38,39. This new Angelica genome assembly provided more than 100 metabolic genes that encode all known enzyme homologs involved in the biosynthesis of coumarins and lignans (Supplementary Data 3). The phenylpropanoid pathway genes, including phenylalanine ammonia lyase (PAL), cinnamate 4-hydroxylase (C4H), 4-coumaroyl-CoA ligase (4CL), hydroxycinnamoyl-CoA shikimate/quinate hydroxycinnamoyltransferase (HCT), caffeic acid O-methyltransferase (COMT), and caffeoyl-CoA O-methyltransferase (CCoAOMT), were all assembled and annotated in our genome (Fig. 5a). These genes contribute to lignin biosynthesis via HCT and CCR, to lignan biosynthesis via dirigent protein (DIR), to flavonoid biosynthesis via CHS, and to coumarin biosynthesis from different products of 4CL via cinnamic acid 2-hydroxylase (C2H), p-coumarate 3-hydroxylase (C3H) with HCT, or feruloyl-CoA hydroxylase (F6H), providing insights into the biosynthesis of various pharmaceutically important products. Lignans have unique antitumor activities and reduce lifestyle-related diseases40. Lignans were also enriched in Angelica roots, particularly in EF roots, in which the contents of lignans and their derivatives and a subset of their biosynthesis genes were upregulated. These genes included dirigent protein (DIR), pinoresinol-lariciresinol reductase (PLR), and secoisolariciresinol dehydrogenase (SIRD), which act in the biosynthesis of the pinoresinol, lariciresinol, secoisolariciresinol, and matairesinol aglycones, and the UGT71/74 glycosyltransferases that produce their glycosides40 (Fig. 5a).

a Putative biosynthesis pathways of coumarins, lignin, lignans and flavonoids. The numbers in parentheses indicate the number of genes. Different background colors represent the synthetic pathways of different products. The PT genes are highlighted in red. The genes in different gene families are listed in Supplementary Data 3. b Unrooted phylogenetic tree of PT genes. The tree shows the grouping of PT genes according to the type of substrate (a–h). The orthologous genes in A. sinensis (QH) and A. sinensis (GS) are highlighted. The genes in the c and d subtrees had relatively high expression levels.

Prenyltransferases (PTs) catalyze the prenylation of umbelliferone in linear and/or angular furanocoumarin biosynthesis34,35. PTs are involved in the biosynthesis of chlorophyll, vitamin E, heme, phylloquinone, and various secondary metabolites through prenyl modification of chlorophyllide a/b, vitamin E, heme B, and many other metabolites, such as 1,4-dihydroxy-2-naphthoic acid, p-hydroxybenzoic acid, flavonoids, phloroglucinol, homogentisate, and coumarins, using different prenyl donors, such as isoprenyl diphosphate, dimethylallyl diphosphate, and geranyl diphosphate (Fig. 5b). Despite the divergent functions of these PTs, those involved in coumarin biosynthesis most likely arose via convergent evolution, since coumarins mainly occur in a few unrelated plant families, such as Fabaceae, Moraceae, Apiaceae and Rutaceae34,35. This finding is also supported by a previous study19, which showed independent evolution of coumarin biosynthesis-related PTs in these families. Furthermore, the PTs that catalyze linear (demethylsuberosin, e.g., PsPT1 and PcPT1) and angular (osthenol, e.g., PsPT2) furanocoumarin biosynthesis cluster together in one clade for Apiaceae species (Fig. 5b), likely as a result of gene duplications followed by neofunctionalization and positive selection38,41.

As two major pharmaceutically important components of Angelica roots, ligustilide and butylidenephthalide are generally regarded as essential contributors to the main medicinal functions of Angelica roots42,43,44,45. However, their biosynthesis pathways remain elusive. The oxidation or transfer of isoprenoids, the condensation of malonyl-CoA with other acyl-CoAs by type III polyketide synthases (PKSs), or a combination of these could be involved in the biosynthesis of these phthalides46,47. We therefore examined the A. sinensis genome together with transcriptome and metabolite profiling data for the biosynthesis of ligustilide, butylidenephthalide, and other monoterpene volatiles that contribute to the medicinal functions of Angelica roots.

To profile the bioactive components in Angelica roots more clearly, volatile terpenoids and phthalides were examined using headspace solid-phase microextraction-gas chromatography-mass spectrometry (SPME-GC-MS). The volatiles of early-flowering (EF) and normally growing (NG) roots showed notable differences. Z-ligustilide and Z-butylidenephthalide and their E-isomers were the major components in NG roots (~47% of total volatiles), whereas EF roots of A. sinensis contained fewer phthalides (34% of total volatiles) as well as much less abundant monoterpenes, such as α-pinene and E-β-farnesene (Fig. 6a, b). These data indicate that early bolting and flowering also negatively affect volatile accumulation in Angelica roots.

a Headspace solid-phase microextraction-gas chromatography-mass spectrometry (SPME-GC-MS) analysis of the contents and composition of volatiles in Angelica roots from early-flowering (EF) and normally growing (NG) plants. b Differential content analysis of the volatiles in Angelica roots between EF and NG plants. c Enzymatic reactions in the mevalonate (MVA) and methylerythritol phosphate (MEP) pathways in plants and synthesis of short-chain prenyl diphosphates. The MVA pathway is shown in light red; the MEP pathway is shown in light green. Abbreviations and full names are given in Table S16. Data are expressed as the means ± SDs from at least three independent experiments with triplicates. Differences between NG and EF samples are considered significant when **P < 0.01 and *P < 0.05 in Student's t test.

Genome analyses revealed that three key gene families involved in the MEP pathway toward monoterpene synthesis, MCT, HDS, and HDR, were expanded in the A. sinensis genome in comparison with the Arabidopsis and grapevine genomes (Supplementary Data 4). The A. sinensis genome sequence revealed a strongly enhanced monoterpene pathway arising during the evolution of several genera in the Apiaceae family (Supplementary Data 4), which is consistent with the diverse and enriched monoterpene volatile profiles in these plants (Fig. 6a).

Transcriptome data showed that genes involved in glycolysis and the pentose phosphate pathway were downregulated in EF Angelica roots, which also negatively affected the mevalonate (MVA) pathway and the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway, both of which lead to the biosynthesis of mono- and sesquiterpenoids (Fig. 6c). The DXS, MDS, CMK, and HDR genes involved in the plastidic MEP pathway, together with one IPPI and two GPPS genes for monoterpenoid biosynthesis, were significantly downregulated in EF Angelica roots compared with NG Angelica roots (Fig. 6c).

A. sinensis is a triennial medicinal plant that typically flowers in its third year but can flower early in May of its second year (Fig. 7a). Because Angelica roots contain a wide range of terpenoid volatiles at abundant levels, these volatiles are also regarded as major contributors to the clinical functions of the roots48. Terpene synthase (TPS) family genes play key catalytic roles in plant terpenoid biosynthesis. A total of 28 putative TPS genes belonging to five TPS subfamilies (TPS-a, TPS-b, TPS-c, TPS-e/f, and TPS-g) were identified in the A. sinensis genome (Fig. 7b). The TPS-b family was expanded in both A. sinensis (15) and C. sativum (20), and the expansion of TPS-b genes in the A. sinensis genome was mainly due to tandem duplication (Ks < 0.1). There were 5 more TPS genes in A. sinensis (QH) than in A. sinensis (GS), which suggests that the A. sinensis (QH) assembly is more complete than the A. sinensis (GS) assembly. We detected 8 TPS genes that were expressed in Angelica roots (FPKM ≥ 1 in any sample), and most of them had higher expression levels in NG roots than in EF roots (Fig. 7b).
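The expression filter mentioned above (a gene counts as expressed if it reaches the FPKM cutoff in at least one sample) can be sketched as follows. This is an illustration under stated assumptions: the cutoff is taken to be FPKM ≥ 1, sample names are assumed to start with "NG" or "EF", and the gene IDs, values, and function names are hypothetical rather than drawn from the study.

```python
from statistics import mean


def expressed_genes(fpkm_by_gene, threshold=1.0):
    """Return gene IDs whose FPKM reaches the threshold in at least one sample."""
    return [
        gene
        for gene, samples in fpkm_by_gene.items()
        if any(value >= threshold for value in samples.values())
    ]


def higher_in_ng(samples):
    """Compare mean FPKM of NG samples with mean FPKM of EF samples."""
    ng = mean(v for name, v in samples.items() if name.startswith("NG"))
    ef = mean(v for name, v in samples.items() if name.startswith("EF"))
    return ng > ef


# Made-up FPKM values for illustration only.
fpkm = {
    "TPS_example_1": {"NG_1": 12.0, "NG_2": 10.5, "EF_1": 3.2, "EF_2": 2.8},
    "TPS_example_2": {"NG_1": 0.2, "NG_2": 0.4, "EF_1": 0.1, "EF_2": 0.0},
    "TPS_example_3": {"NG_1": 0.6, "NG_2": 0.9, "EF_1": 1.3, "EF_2": 0.7},
}
expressed = expressed_genes(fpkm)
ng_biased = [gene for gene in expressed if higher_in_ng(fpkm[gene])]
```

Here TPS_example_3 passes the filter through a single EF sample, which shows why an "any sample" rule is more permissive than requiring the cutoff in every sample.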

a Plants were sown simultaneously and grown in the same environment. Samples were taken at the same time for observation and analysis. EF early flowering, NG normal growth. We highlight the highly lignified Angelica root of the EF plant and the normally developed storage root of the NG plant on the right side. b Five TPS subfamilies (TPS-a, TPS-b, TPS-c, TPS-e/f, and TPS-g) were clearly identified. The genes from A. sinensis (QH) and A. sinensis (GS) are highlighted by red and green dots, respectively. The heatmap of gene expression is illustrated.

To further test the possibility that PKSs are involved in the biosynthesis of the polyketide derivatives ligustilide and butylidenephthalide in A. sinensis, we analyzed genes involved in the biosynthesis of acetyl-CoA and malonyl-CoA, which are used as substrates by type II and III PKSs for the production of polyketides (Fig. 8a)46,47. Acetyl-CoA carboxylase (ACC) is the main enzyme catalyzing the conversion of glycolysis-derived acetyl-CoA into malonyl-CoA, which is a key intermediate for fatty acid, polyketide, and flavonoid biosynthesis47. Plant ACC is composed of two subunits, the biotin carboxylase and carboxyl transferase subunits47. The coding genes for two ACC subunits, BCCP2 (CAC1) (4) and CAC2-CAC3 (5), were expanded in the A. sinensis genome in comparison with the Arabidopsis and grapevine genomes (Table S12). Consistent with the lower Z-ligustilide and Z-butylidenephthalide levels in EF Angelica roots, at least two ACC subunit genes were downregulated in EF roots compared with NG roots (Fig. 8b).

a The malonyl-CoA biosynthesis metabolic pathway. b Heatmap displaying the expression of typical ACC genes in Angelica roots between EF and NG plants. c The overall expression (FPKM) of ACC and PKS genes in Angelica roots between EF and NG plants. d Phylogeny of polyketide synthase genes (PKSs). The heatmap displays the gene expression in Angelica roots between EF and NG plants. The color of gene IDs shows the source of different species: red: A. sinensis; blue: A. thaliana; black: seed sequences. The red stars highlight the upregulated genes, and the blue stars highlight the downregulated genes.

PKSs constitute a large gene family encoding multifunctional enzymes that catalyze the condensation of malonyl-CoA units, or of malonyl-CoA with other acyl-CoAs, to generate diverse polyketides46,47. In particular, type III PKS (TKS) catalyzes linear tetraketide-CoA synthesis from hexanoyl-CoA and malonyl-CoA and might provide a backbone for Z-ligustilide and Z-butylidenephthalide biosynthesis49. A previous study showed that a TKS olivetolic acid cyclase (OAC) catalyzed a C2–C7 intramolecular aldol condensation with carboxylate retention in the linear tetraketide-CoA to form olivetolic acid in Cannabis sativa49. Olivetolic acid is structurally similar to Z-ligustilide and Z-butylidenephthalide, differing only in the position of the olefinic link and hydroxyl group49. A multifunctional protein (MFP) could handle the switch of olefinic links and hydroxyl groups in the lipid metabolism process50. It has thus been proposed that Z-ligustilide and Z-butylidenephthalide are synthesized via a similar mechanism through the PKS pathway, although the exact enzyme or gene responsible for their biosynthesis remains unknown. In the A. sinensis genome, PKSs also formed a large gene family of 120 members, among which the type III PKS genes are expanded (Table S13 and Fig. 8d).

Transcriptome analyses showed that four PKS genes, namely, As05G08873, As11G04238, As10G03800, and As08G02849, were highly expressed in Angelica roots (Fig. S4). We also found that some of the PKS genes were repressed in EF Angelica roots compared with NG roots (Fig. 8d), suggesting that these PKSs might be involved in the biosynthesis of phthalides. The overall expression of ACC and PKS genes in Angelica roots was lower in EF plants (Fig. 8c). Further studies with isotope-labeled substrates in tracer experiments, together with enzymatic and molecular approaches, are needed to reveal the mechanism underlying the biosynthesis of Z-ligustilide and Z-butylidenephthalide in A. sinensis.

See more here:
Integrating genomic and multiomic data for Angelica sinensis provides insights into the evolution and biosynthesis of ... - Nature.com
