
Category Archives: Genome

Genome characteristics of atypical porcine pestivirus from abortion cases in Shandong Province, China – Virology Journal – Virology Journal

Posted: November 30, 2023 at 8:35 pm

Viral metagenomic analysis

The number of clean reads was 21,157,543 for the RNA sample and 26,789,502 for the DNA sample. For RNA, the reads were assembled into contigs with a total length of 2,337,534 nt and 60.92% GC content. The largest contig, 11,556 nt long, was identified as APPV (Table 1) and named APPV-SDHY-2022 for further analysis in this study. For DNA, the reads were assembled into contigs with a total length of 38,447,346 bp and 41.71% GC content. Other viruses, including Getah virus, porcine picobirnavirus, porcine kobuvirus, porcine sapovirus, Po-Circo-like virus, porcine serum-associated circular virus, porcine bocavirus 1, porcine parvovirus 1, porcine parvovirus 5 and porcine circovirus 3, were also identified by sequence alignment (Table 1); however, most contigs of these viruses were shorter than 500 bp (see Additional file 2: Tables S2 and S3). No other known abortion-related pathogens (PRRSV, PPV2-4/6-8, CSFV, PCV2 and Japanese encephalitis virus) were detected.
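The assembly summary statistics above (total assembled length, GC content, largest contig, and contigs shorter than 500 bp) can be computed in a few lines. The following is a minimal Python sketch on invented toy contigs; the function name `assembly_stats` is hypothetical, and this is not the pipeline used in the study.

```python
def assembly_stats(contigs: dict[str, str], min_len: int = 500) -> dict:
    """Summarise an assembly given as {contig name: sequence}."""
    total = sum(len(s) for s in contigs.values())
    # Count G and C bases (case-insensitive) across all contigs.
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in contigs.values())
    largest = max(contigs, key=lambda name: len(contigs[name]))
    # Contigs below the length cut-off (500 bp by default, as in the text).
    short = sorted(name for name, s in contigs.items() if len(s) < min_len)
    return {
        "total_length": total,
        "gc_percent": round(100.0 * gc / total, 2),
        "largest_contig": largest,
        "largest_length": len(contigs[largest]),
        "short_contigs": short,
    }

# Toy contigs (invented for illustration only).
toy = {"contig_1": "ATGC" * 200, "contig_2": "GGCC" * 50, "contig_3": "AATT" * 100}
print(assembly_stats(toy))
```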

APPV presence was confirmed in the pooled sample by RT-PCR amplification targeting the NS3 gene (see Additional file 3: Fig. S1A). The assembled sequence of the PCR products was identical to that of APPV-SDHY-2022 (see Additional file 3: Fig. S1B), providing additional evidence of APPV in the abortion cases.

The genome of strain APPV-SDHY-2022 (GenBank accession no. OP381297) is 11,556 nucleotides (nt) long and consists of a 5′ UTR (370 nt, positions 1 to 370), a CDS (10,909 nt, positions 371 to 11,279), and a 3′ UTR (277 nt, positions 11,280 to 11,556). The nucleotide and amino acid sequences of the individual proteins were aligned separately, and the sequence identity between APPV-SDHY-2022 and the reference strains was determined (Table 2). Alignment of the APPV polyprotein CDS showed that the nucleotide identities of APPV-SDHY-2022 with Clade I, Clade II, and Clade III strains were 82.6-84.2%, 93.2-93.6%, and 80.7-85.0%, respectively, while the amino acid identities were 91.4-92.4%, 96.4-97.7%, and 90.6-92.2%, respectively. APPV-SDHY-2022 shared the highest nucleotide identity (93.6%) with APPV-China/GD-SHM/2016 and the highest amino acid identity (97.7%) with GD-YJHSEY2N. Among the 12 mature proteins, NS5A showed the lowest identity with the reference strains (77.6-93.3% at the nucleotide level).
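The region lengths follow from 1-based, inclusive coordinates (length = end - start + 1), and percent identity is the fraction of matching aligned columns. A minimal sketch, assuming pre-aligned sequences and that gap columns are simply skipped; alignment tools such as MEGA offer several gap-handling options, so this convention is illustrative only, and the short test sequences are invented.

```python
# Region coordinates reported for APPV-SDHY-2022 (1-based, inclusive).
REGIONS = {"5' UTR": (1, 370), "CDS": (371, 11_279), "3' UTR": (11_280, 11_556)}

def region_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1

def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns, skipping columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

lengths = {name: region_length(*span) for name, span in REGIONS.items()}
print(lengths)
print(sum(lengths.values()))  # should equal the genome length, 11556
print(percent_identity("ATGC-A", "ATGTTA"))
```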

Phylogenetic analysis was performed based on the complete polyprotein CDS and NS5A nucleotide sequences. The results showed that APPV-SDHY-2022 belongs to a separate branch of Clade II (Fig. 2A). Moreover, NS5A nucleotide identity was above 94.6% between strains of the same subclade, 84.7-94.5% between different subclades of the same clade, and 76.8-81.1% between different clades (Table 3). We therefore propose that Clade II strains can be further divided into three subclades: APPV-SDHY-2022 belongs to subclade 2.3, APPV-China/GD-SD/2016 and APPV-China/GZ01/2016 belong to subclade 2.2, and the other Chinese strains in the Clade II cluster belong to subclade 2.1 (Fig. 2B). Because Clade II strains have been found only in China, this typing scheme may help to better analyze the evolution of Clade II strains.
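As a rough illustration, the NS5A identity ranges reported above can be turned into a simple rule for relating a pair of strains. This is a hypothetical helper, not a published typing tool: the cut-offs come from the observed ranges, values in the unobserved gap between 81.1% and 84.7% are returned as unassigned, and treating everything at or below 81.1% as inter-clade is an assumption.

```python
def ns5a_relationship(identity: float) -> str:
    """Classify a pairwise NS5A nucleotide identity (%) using the reported ranges.

    Hypothetical rule: >= 94.6 same subclade; 84.7 up to (not including) 94.6
    different subclades of the same clade; <= 81.1 different clades;
    anything else is unassigned.
    """
    if identity >= 94.6:
        return "same subclade"
    if 84.7 <= identity < 94.6:
        return "different subclades, same clade"
    if identity <= 81.1:
        return "different clades"
    return "unassigned"

for value in (96.0, 90.2, 79.5, 83.0):
    print(value, ns5a_relationship(value))
```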

Fig. 2. Phylogenetic analysis of Chinese APPV strains. Phylogenetic trees based on the nucleotide sequences of the complete polyprotein CDS (A) and the NS5A gene (B) were constructed by the neighbor-joining (NJ) method with 1,000 bootstrap replicates in MEGA11 software. The APPV-SDHY-2022 strain reported in this study is indicated by a red dot.
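MEGA11 was used for the actual trees. For readers unfamiliar with the method, here is an independent, minimal pure-Python neighbor-joining sketch without bootstrapping, run on the classic five-taxon textbook distance matrix. It is illustrative only and does not reproduce MEGA's implementation details (for example, how ties or negative branch lengths are handled); here ties go to the first pair encountered.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> tuple[str, list]:
    """Build an unrooted tree by neighbor joining; return (Newick string, join log)."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]
    joins = []  # (left, right, left_branch, right_branch) for each merge
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to the new internal node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joins.append((nodes[i], nodes[j], li, lj))
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new internal node to every remaining node.
        new = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[x][y] for y in keep] + [new[pos]] for pos, x in enumerate(keep)]
        d.append(new + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"]
    # Three nodes remain: join them at a single unrooted trifurcation.
    a, b, c = nodes
    dab, dac, dbc = d[0][1], d[0][2], d[1][2]
    la = (dab + dac - dbc) / 2
    lb = (dab + dbc - dac) / 2
    lc = (dac + dbc - dab) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});", joins

taxa = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
tree, log = neighbor_joining(taxa, D)
print(tree)
```

Bootstrap support, as reported in the figure, would repeat this procedure on distance matrices computed from alignments resampled by column with replacement (1,000 times in the study) and record how often each clade recurs.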

To further explore the genetic evolution of APPV, potential recombination events were identified using the Recombination Detection Program version 4 (RDP4) and then examined using SimPlot version 3.5.1. Among all available APPV strains, 8 strains (GD-DH01-2018, GD-BZ01-2018, JX-JM01-2018A01, GD2, GD-HJ-2017.04, GD-LN-2017.04, GD-CT4, and GD-MH01-2018) showed potential recombination events. Although RDP4 analysis of the NGS-derived APPV-SDHY-2022 genome detected a potential recombination event involving JX-JM01-2018A01 and GD-HJ-2017.04 (see Additional file 4: Table S4), SimPlot revealed no obvious recombination in APPV-SDHY-2022 (Fig. 3).

Fig. 3. Recombination analysis of the complete genome of the APPV-SDHY-2022 strain from Shandong Province. Potential recombination events were identified using Recombination Detection Program 4 (RDP4) and then examined using similarity plots and bootstrap analysis in SimPlot 3.5.1. The major and minor parents were JX-JM01-2018A01 and GD-HJ-2017.04, respectively.
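A similarity plot compares a query with candidate parents in a sliding window along the alignment; a switch in the best-matching parent suggests a possible breakpoint. The sketch below illustrates that idea on toy aligned sequences. The 200-nt window and 20-nt step defaults are assumptions, gap columns are skipped, and windows with no comparable columns score 0. It is not a reimplementation of RDP4 or SimPlot.

```python
def window_identity(query: str, ref: str, window: int = 200,
                    step: int = 20) -> list[tuple[int, float]]:
    """(window midpoint, % identity) along two aligned sequences; gap columns skipped."""
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        cols = [(x, y) for x, y in zip(query[start:start + window], ref[start:start + window])
                if x != "-" and y != "-"]
        ident = 100.0 * sum(x == y for x, y in cols) / len(cols) if cols else 0.0
        points.append((start + window // 2, ident))
    return points

def best_parent_per_window(query: str, parents: dict[str, str], window: int = 200,
                           step: int = 20) -> list[tuple[int, str]]:
    """Name of the most similar candidate parent in each window."""
    profiles = {name: window_identity(query, seq, window, step) for name, seq in parents.items()}
    names = list(profiles)
    result = []
    for points in zip(*(profiles[name] for name in names)):
        best = max(range(len(names)), key=lambda k: points[k][1])
        result.append((points[0][0], names[best]))
    return result

# Toy example: the query resembles "major" in its first half and "minor" in its second.
q = "A" * 100 + "C" * 100
parents = {"major": "A" * 200, "minor": "G" * 100 + "C" * 100}
print(best_parent_per_window(q, parents, window=50, step=50))
```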

The amino acid sequences of the individual viral proteins of all Chinese APPV strains were analyzed. No amino acid insertions or deletions were found in the APPV-SDHY-2022 strain. Comparing the individual protein sequences to identify residues that differentiate Clade II from Clades I and III revealed 20 amino acids unique to Clade II strains (Fig. 4). Most of these sites were located in NS5A (7H, 16A, 69Q, 131Q, 152M, 189I, 280A, 397F, 437A) and NS5B (77V, 139P, 193P, 231K, 274A), and the remaining sites were in Npro (85D, 120E), C (90K), Erns (91K, 139Y) and NS3 (30T). Interestingly, the amino acids at these sites were identical between Clade I and Clade III strains, suggesting that Clade II strains could be distinguished from Clade I and III strains by examining these specific amino acids alone.

Fig. 4. The unique amino acids found in Clade II APPV strains. Amino acid sequences of viral proteins were aligned with those of reference strains using MEGA11 and BioEdit software.
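Because the 20 Clade II signature residues are listed explicitly above, checking a new strain against them is a simple lookup. A hypothetical sketch, assuming positions are numbered 1-based within each mature protein and that the per-protein sequences have already been extracted; the test input is a toy fragment.

```python
# Clade II signature residues from the text: (1-based position, amino acid) per protein.
CLADE_II_SIGNATURE = {
    "NS5A": [(7, "H"), (16, "A"), (69, "Q"), (131, "Q"), (152, "M"),
             (189, "I"), (280, "A"), (397, "F"), (437, "A")],
    "NS5B": [(77, "V"), (139, "P"), (193, "P"), (231, "K"), (274, "A")],
    "Npro": [(85, "D"), (120, "E")],
    "C": [(90, "K")],
    "Erns": [(91, "K"), (139, "Y")],
    "NS3": [(30, "T")],
}

def signature_matches(proteins: dict[str, str]) -> tuple[int, int]:
    """Count how many signature residues a strain carries; returns (matched, total)."""
    matched = total = 0
    for protein, sites in CLADE_II_SIGNATURE.items():
        seq = proteins.get(protein, "")  # missing proteins count as non-matching
        for pos, aa in sites:
            total += 1
            if len(seq) >= pos and seq[pos - 1].upper() == aa:
                matched += 1
    return matched, total

# Toy input: only a fragment of NS3 carrying T at position 30.
print(signature_matches({"NS3": "A" * 29 + "T"}))
```

A strain carrying all 20 residues would be consistent with Clade II under this check.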

In this study, putative N-glycosylation sites in the three important glycoproteins Erns, E1, and E2 of Chinese APPV strains were also predicted. APPV-SDHY-2022, like most Clade II strains, is predicted to be heavily glycosylated, with a total of ten putative N-glycosylation sites (N104 in E1; N12, N26, N43, N64, and N99 in Erns; and N51, N64, N103, and N127 in E2) (Fig. 5). All Chinese APPV strains shared a conserved putative N-glycosylation site at N104 in E1, with a consensus N-I-T motif. The putative N-glycosylation sites in Erns and E2 differed greatly among strains of different subclades, and nine patterns were observed in E2: N51+N64+N103, N64+N103, N51+N64+N103+N141, N51+N64+N103+N127+N141, N51+N64+N103+N127, N64+N103+N127, N51+N127, N51+N64, and N64 (Fig. 5). Among the E2 sites, the putative site at N64 was highly conserved.

Fig. 5. Putative N-glycosylation sites of the Erns, E1 and E2 proteins. The putative N-glycosylation sites within the Erns, E1 and E2 sequences of Chinese APPV strains were predicted using a glycosylation analysis algorithm and are shown as blue shaded boxes.
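The glycosylation analysis algorithm is not named in the text. A common first pass is to scan for the N-X-S/T sequon, where X is any residue except proline. The sketch below does only that and ignores structural context, so it is an assumed, simplified stand-in for the prediction tool actually used; the toy peptide is invented.

```python
import re

# N-X-S/T with X != P; the lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def sequon_sites(protein: str) -> list[int]:
    """1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]

def site_pattern(sites: list[int]) -> str:
    """Format sites like the patterns in the text, e.g. 'N51+N64+N103'."""
    return "+".join(f"N{p}" for p in sites)

peptide = "MNITANPSQNAT"  # N2-I-T is a sequon; N6-P-S is excluded; N10-A-T is a sequon
print(site_pattern(sequon_sites(peptide)))
```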

To analyze the effect of glycosylation sites on the antigenicity of the E2 protein, the antigenic index was calculated by the Jameson-Wolf method. The results showed that amino acid positions 1-9, 15-28, 34-44, 49-55, 62-82, 118-130, 136-158, 174-184, 188-196 and 200-205 of E2 were potential immunodominant regions. Comparing the antigenic index of Chinese strains with and without a given putative site showed that the putative N-glycosylation site at N51 was predicted to reduce the antigenicity of the corresponding region (Fig. 6).

Fig. 6. Antigenicity prediction for the E2 protein. The Jameson-Wolf algorithm, which combines secondary structure information with backbone flexibility to predict surface accessibility, was used to determine the predicted antigenic index, with a threshold value of 1.7. The putative N-glycosylation sites within the E2 sequences of Chinese APPV strains are indicated by blue arrows. Representative strains from different clades/subclades or patterns of putative N-glycosylation sites were included, and strains within each subclade that have different patterns of putative N-glycosylation sites are underlined.
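Given a per-residue antigenic-index profile (for example, exported from a Jameson-Wolf calculation, which is not reimplemented here), the immunodominant regions listed above correspond to contiguous runs at or above the cut-off (1.7 in Fig. 6). A sketch of that extraction step, assuming 1-based positions and that values equal to the threshold count as above it; the profile values are toy numbers.

```python
def regions_above(profile: list[float], threshold: float = 1.7,
                  min_len: int = 1) -> list[tuple[int, int]]:
    """1-based (start, end) runs where the index is >= threshold."""
    regions, run_start = [], None
    for pos, value in enumerate(profile, start=1):
        if value >= threshold:
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            if pos - run_start >= min_len:
                regions.append((run_start, pos - 1))
            run_start = None
    if run_start is not None and len(profile) - run_start + 1 >= min_len:
        regions.append((run_start, len(profile)))
    return regions

def site_in_regions(site: int, regions: list[tuple[int, int]]) -> bool:
    """Whether a residue position (e.g. a putative glycosylation site) lies in a region."""
    return any(s <= site <= e for s, e in regions)

toy_profile = [0.5, 1.8, 2.0, 1.0, 1.9, 1.9, 1.9]
print(regions_above(toy_profile))
```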

To further analyze the effect of glycosylation sites on conformational epitopes of the E2 protein, BepiPred-3.0 was used to predict B-cell conformational epitopes. The 15 residues most likely to be part of B-cell conformational epitopes varied among clades/subclades and N-glycosylation patterns, whereas 39E, 70R, 173R, 190K, and 191N were conserved among all Chinese strains (Table 4; see also the graphical representations of the predicted epitopes in Fig. 7).

Fig. 7. Conformational B-cell epitope prediction for the E2 protein. The potential B-cell conformational epitopes of the E2 protein in Chinese APPV strains were predicted by BepiPred-3.0; residues with scores above the threshold (default value 0.1512) are predicted to be part of an epitope and are colored yellow on the graph (the y-axis shows BepiPred-3.0 epitope scores and the x-axis protein sequence positions). Shown is the graphical output of B-cell discontinuous epitope prediction for the E2 protein, with APPV-SDHY-2022 as an example.
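BepiPred-3.0 reports a per-residue epitope score. Assuming those scores have been exported for each strain's E2 sequence, selecting the 15 top-scoring residues above the threshold and intersecting the sets across strains gives the kind of conserved-residue list reported in Table 4. The functions below are hypothetical helpers run on toy scores; they are not part of BepiPred.

```python
def top_epitope_residues(seq: str, scores: list[float], k: int = 15,
                         threshold: float = 0.1512) -> set[str]:
    """The k highest-scoring residues above threshold, labelled like '39E'."""
    if len(seq) != len(scores):
        raise ValueError("one score per residue is required")
    ranked = sorted(range(len(seq)), key=lambda i: scores[i], reverse=True)
    return {f"{i + 1}{seq[i]}" for i in ranked[:k] if scores[i] > threshold}

def conserved_residues(per_strain: list[set[str]]) -> set[str]:
    """Residues (position plus amino acid) shared by every strain's top set."""
    return set.intersection(*per_strain)

# Toy scores for two strains with the same six-residue sequence (k = 3 for brevity).
strain_1 = top_epitope_residues("ACDEFG", [0.05, 0.3, 0.2, 0.16, 0.9, 0.1], k=3)
strain_2 = top_epitope_residues("ACDEFG", [0.5, 0.3, 0.01, 0.16, 0.9, 0.1], k=3)
print(sorted(conserved_residues([strain_1, strain_2])))
```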

Read the original here:
Genome characteristics of atypical porcine pestivirus from abortion cases in Shandong Province, China - Virology Journal - Virology Journal

Posted in Genome | Comments Off on Genome characteristics of atypical porcine pestivirus from abortion cases in Shandong Province, China – Virology Journal – Virology Journal

Correcting modification-mediated errors in nanopore sequencing by nucleotide demodification and reference-based … – Nature.com

Posted: at 8:35 pm

Unusual low-quality ONT genomes due to extensive modifications

We sequenced 12 strains of Listeria monocytogenes using Illumina and ONT R9.4 flowcells (~200-990 Mbp, SUP model) (Fig. 1a, Supplementary Tables 1 and 2). The ONT reads were assembled into genomes, and sequencing errors were further polished by Medaka and Homopolish (Supplementary Table 3, see Methods). For evaluation purposes, the Illumina and ONT reads were also hybrid-assembled (Supplementary Table 4). Compared with the Illumina/ONT hybrid assemblies (Fig. 1b), seven ONT-only genomes were of high quality (HQ), ranging from Q47 to Q60 (e.g., R19-2905 and R20-0088). However, five isolates (R20-0026, R20-0030, R20-0127, R20-0148, and R20-0150) showed unexpectedly low quality (LQ), ranging from Q26 to Q32. The accuracy of these five LQ genomes did not improve after replicate ONT sequencing. Further investigation revealed excessive numbers of mismatch errors in the five LQ genomes (1533-5670) compared with the seven HQ genomes (0-40 mismatches) (Fig. 1c). Homopolymer errors (i.e., indels) were not the source of the inferior quality (7-306; Supplementary Table 5).

Fig. 1. a Workflow of ONT-only and ONT/Illumina hybrid assembly; b Q scores; c numbers of mismatches (red: LQ, gray: HQ); d comparison of ONT and Illumina reads by IGV; e numbers of 5mC, 6mA, and mismatches in HQ/LQ strains (n = 12; red: LQ, gray: HQ). Error bars represent the minimum and maximum values.
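The Q scores above are Phred-scaled consensus error rates, Q = -10 log10(errors / genome length). A minimal sketch; the 3.0 Mb genome length is a round illustrative value for a Listeria genome rather than a figure from the study, and exactly which error types the authors counted is not restated here.

```python
import math

def phred_q(errors: int, length: int) -> float:
    """Phred-scaled quality of an assembly with `errors` errors over `length` bases."""
    if errors == 0:
        return math.inf  # no observed errors: the score is unbounded
    return -10 * math.log10(errors / length)

# One error per million bases corresponds to Q60.
print(round(phred_q(1, 1_000_000), 1))
# About 5670 mismatches over an assumed 3.0 Mb genome gives roughly Q27.
print(round(phred_q(5670, 3_000_000), 1))
```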

Manual inspection revealed that these mismatches were ONT basecalling errors that remained uncorrected after genome polishing (Fig. 1d and Supplementary Fig. 1). Because mismatch errors in ONT data arise mainly from epigenetic modifications, we computed the frequency of well-known methylation types in these isolates (see Methods and Supplementary Table 6). For 5-methylcytosine (5mC), the numbers of modified loci in the five LQ genomes (~240-340k) were not significantly higher than those in the HQ genomes (210-345k, P = 0.89, Fig. 1e). Similarly, the numbers of N6-methyladenine (6mA) modifications showed no significant difference between the LQ and HQ groups (98-218k vs. 126-223k, P = 0.34). Because the numbers of mismatch errors in LQ genomes were significantly higher than those in HQ genomes (P = 0.005), we suspected that the ONT basecalling algorithms failed to recognize novel modification types in the LQ isolates.

We removed the modifications in all microbial samples by whole-genome amplification (WGA) (Fig. 2a), which randomly amplifies genome fragments without retaining any epigenetic modifications (see Methods). The WGA-demodified samples were sequenced by ONT (R9.4), assembled into chromosomes, and compared with the Illumina/ONT hybrid genomes (Fig. 2a, Supplementary Tables 7 and 8). After WGA, the five LQ genomes were of significantly higher quality than without demodification (e.g., from Q26 to Q53 in R20-0026) (Fig. 2b, Supplementary Table 9). In particular, the number of mismatch errors decreased significantly after demodification (e.g., from 5670 to 16 in R20-0026) (Fig. 2c). Thus, the unexpectedly low quality of these ONT genomes was due to excessive modification-induced errors that the basecalling model had not been trained to handle. Demodification by WGA can therefore produce high-quality ONT genomes without the need for Illumina short reads.

Fig. 2. a Workflow of WGA-demodified ONT sequencing; b Q scores of the WGA-demodified and ONT-only genomes (gray: ONT, black: WGA ONT); c numbers of mismatches in the WGA-demodified and ONT-only genomes (gray: ONT, black: WGA ONT); d WGA and ONT-only genome quality with respect to sequencing depth (shading: minimum and maximum quality in five replicates; line: median quality); e numbers of active/available pores during WGA-demodified and ordinary ONT sequencing.

However, while WGA successfully erased these modifications, it increased the sequencing cost for two reasons. First, WGA required a higher sequencing depth (~100×) to assemble a complete genome than ordinary ONT sequencing (~30×) (Fig. 2d and Supplementary Figs. 2 and 3). This was due to the uneven amplification of WGA, which led to non-uniform sequencing depth and fragmented assemblies at moderate coverage. Second, WGA-demodified samples may reduce ONT yields: the number of available/active pores sometimes decreased quickly (e.g., fewer than 100 pores after 12 h) (Fig. 2e), possibly owing to hyperbranched structures left unresolved after WGA10. Consequently, sequencing WGA-demodified samples with ONT is much more expensive than ordinary sequencing.

We developed a novel computational method, Modpolish, for correcting these modification-mediated errors without WGA or prior knowledge of the modification systems. Modpolish identifies and corrects modification-mediated errors by leveraging basecalling quality, basecalling consistency, and evolutionary conservation (Fig. 3a, see Methods). Briefly, because modifications disturb the ONT signals, the basecalling quality at modified loci is substantially lower than at modification-free loci (Supplementary Fig. 4). As a result, the basecalled nucleotides are often inconsistent at modified loci (Supplementary Fig. 5), yet these loci lie within conserved motifs (Supplementary Fig. 6). Using the degree of conservation measured across closely related genomes, Modpolish corrects only modified loci with ultra-high conservation, which avoids false corrections of genuine strain variants and yields high specificity.

Fig. 3. a Workflow of Modpolish; b Q scores before and after Modpolish; c numbers of mismatches before and after Modpolish (gray: before Modpolish, black: after Modpolish); d the antiviral defense systems encoded by the 12 strains (gray: before Modpolish, black: after Modpolish); e the sequence motif of modification sites in the four mza-encoding strains; f the sequence motif of modification sites in the R20-0026 strain.
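To make the three signals concrete, the following is a deliberately simplified, hypothetical decision rule in the spirit of the description above. It is not Modpolish's code: the thresholds, the `Locus` record, and the exact form of each test are assumptions. A locus is changed only when read quality is low, the basecalls disagree, and all closely related genomes agree on a base that differs from the assembly.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Locus:
    assembly_base: str        # base in the Medaka-polished assembly
    read_bases: List[str]     # basecalled bases from reads covering the locus
    read_quals: List[int]     # per-base basecalling qualities of those reads
    related_bases: List[str]  # bases at the aligned position in related genomes

def propose_correction(locus: Locus, max_mean_q: float = 10.0,
                       max_consistency: float = 0.8,
                       min_conservation: float = 1.0) -> Optional[str]:
    """Return a corrected base, or None to leave the locus unchanged (hypothetical rule)."""
    mean_q = sum(locus.read_quals) / len(locus.read_quals)
    top_count = Counter(locus.read_bases).most_common(1)[0][1]
    consistency = top_count / len(locus.read_bases)
    ref_base, ref_count = Counter(locus.related_bases).most_common(1)[0]
    conservation = ref_count / len(locus.related_bases)
    if (mean_q < max_mean_q and consistency < max_consistency
            and conservation >= min_conservation
            and ref_base != locus.assembly_base):
        return ref_base
    return None

suspect = Locus("T", ["T", "T", "C", "C", "T"], [5, 6, 4, 7, 5], ["C", "C", "C"])
variant = Locus("T", ["T"] * 5, [30] * 5, ["C", "C", "C"])
print(propose_correction(suspect), propose_correction(variant))
```

A high-quality, consistent disagreement with related genomes is treated here as a genuine strain variant and left alone, mirroring the specificity argument in the text.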

We assessed the accuracy of Modpolish by comparing the quality of the ONT-only genomes (polished by Medaka) with those further polished by Modpolish. Modpolish significantly improved the quality of all LQ genomes, from Q27-34 to Q60 (Fig. 3b, Supplementary Table 10). The number of mismatches also decreased greatly (e.g., from 5670 to 67 in R20-0026) (Fig. 3c). Modpolish also reduced the mismatches in some HQ genomes; for instance, the mismatches in R19-2905 decreased from 40 to 6. Moreover, Modpolish made no false corrections in the HQ genomes (Supplementary Tables 11-13). A comparison of different basecaller versions and models (v4.0.14 vs. v6.3.4, HAC vs. SUP) indicated that these errors persist and that Modpolish successfully corrects most of them (Supplementary Fig. 7).
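The mismatch counts above translate into percentage reductions, (before - after) / before. A small sketch using the R20-0026 and R19-2905 counts quoted in the text:

```python
def reduction_rate(before: int, after: int) -> float:
    """Percentage of mismatches removed by polishing."""
    if before == 0:
        raise ValueError("no mismatches to reduce")
    return 100.0 * (before - after) / before

for label, before, after in [("R20-0026", 5670, 67), ("R19-2905", 40, 6)]:
    print(f"{label}: {reduction_rate(before, after):.1f}%")
```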

As modification systems are often involved in anti-phage defense (e.g., R-M, BREX, DISARM)11,12,13, we investigated the defense systems carried by the HQ and LQ strains (Fig. 3d, Supplementary Data 1). All HQ genomes encode at least one R-M system (e.g., Type I, II, or III), whereas none of the LQ isolates do. Instead, four LQ strains (R20-0030, R20-0127, R20-0148, and R20-0150) carry a novel methyltransferase-encoding mza defense system that is absent from all HQ genomes (Supplementary Fig. 8). Analysis of the modification sites of the four mza-encoding LQ strains revealed the pentanucleotide motif GCAGC (Fig. 3e, Supplementary Fig. 6). By contrast, the modification loci in the LQ strain R20-0026 were all centered on the motif GCTGG (Fig. 3f). Together, these results suggest that two lineage-specific modification systems extensively modified the five LQ genomes. Although the underlying mechanisms remain unclear, modification at specific motifs that are highly conserved within each lineage allowed cost-effective in silico correction of these errors by Modpolish.
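Occurrences of the two motifs can be located on both strands by searching for each motif and its reverse complement (GCAGC pairs with GCTGC, and GCTGG with CCAGC). A minimal sketch on a toy sequence; it is not the motif-discovery procedure used in the study.

```python
def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def motif_sites(genome: str, motif: str) -> list[tuple[int, str]]:
    """1-based start positions of `motif` on the + strand and of its reverse
    complement (i.e., the motif read on the - strand)."""
    genome, motif = genome.upper(), motif.upper()
    rc = reverse_complement(motif)
    # Palindromic motifs are searched once to avoid double counting.
    queries = [("+", motif)] + ([] if rc == motif else [("-", rc)])
    hits = []
    for strand, pattern in queries:
        pos = genome.find(pattern)
        while pos != -1:
            hits.append((pos + 1, strand))
            pos = genome.find(pattern, pos + 1)  # allow overlapping matches
    return sorted(hits)

print(motif_sites("AAGCAGCTTGCTGCAA", "GCAGC"))
```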

We then assessed the performance of Modpolish on public ONT datasets sequenced with R9.4 (SUP) and R10.4 flowcells (SUP, duplex/simplex modes). In the R9.4 dataset14, we first compared the quality of seven bacterial genomes polished by Medaka and by Modpolish (Fig. 4a, Supplementary Table 14). The quality of five genomes improved significantly, from ~Q45 to Q60. Again, the improvement was mainly due to a reduction in mismatches (Fig. 4b); for instance, the number of mismatches in the Staphylococcus genome decreased from 388 to 13 after Modpolish. The mismatch reduction rates across all genomes ranged from 50% to 96%. Thus, although these bacterial genomes are not extensively modified, Modpolish can further improve their quality after Medaka without false corrections.

Fig. 4. Comparison of Medaka and Modpolish for a Q scores and b mismatches on the R9.4 dataset; comparison of Medaka and Modpolish for c Q scores and d mismatches on the R10.4 dataset.

In the R10.4 (duplex mode) dataset3, we compared the genome quality after polishing with Medaka and with Modpolish (downsampled to ~60×) (Fig. 4c, Supplementary Table 15). In general, Modpolish made little or no improvement on the duplex dataset; for instance, it reduced the mismatches in the Bacillus genome only from 20 to 19 (Fig. 4d). The overall genome quality was so high (Q60) that no differences could be seen. Modpolish showed only a marginal improvement on a recently published simplex dataset (R10.4, kit 14, Dorado v0.1.1) (Supplementary Fig. 9). Therefore, the quality of ONT R10.4 flowcells, particularly in duplex mode, is higher than that of R9.4 and requires nearly no further correction. On the other hand, Modpolish may help close the accuracy gap between simplex and duplex modes in projects that prioritize higher throughput.

View original post here:
Correcting modification-mediated errors in nanopore sequencing by nucleotide demodification and reference-based ... - Nature.com


CRISPR-Based "Genome Shredding" Technique Shows Promise in Treating Glioblastoma – Inside Precision Medicine

Posted: at 8:35 pm

Researchers at the Gladstone Institutes have developed a CRISPR-based genome shredding technique that shows promise in treating glioblastoma, an incurable brain cancer. Their research was published this week in the journal Cell Reports.

Much of the work to develop the technique was done in the lab of Jennifer Doudna, PhD, an author on the paper and co-winner of the 2020 Nobel Prize in Chemistry for the development of CRISPR-Cas9 gene editing technology. Other key players are Mitchel Berger, MD, a neurosurgeon and director of the Brain Tumor Center at the University of California, San Francisco (UCSF), whose team helped secure patient-derived cell samples, and Alexendar Perez, MD, PhD, a resident at UCSF who performed much of the computational analysis needed for the study.

Computational analysis was needed to search the non-coding portions of the genome for repetitive sequences shared by the glioblastoma cells. Cancer treatments rarely kill all tumor cells. In glioblastoma and other highly recurrent cancers, tumor cells that escape treatment develop multiple genetic mutations that allow them to proliferate.

Building on prior research, the Gladstone team surmised that mutated glioblastoma cells carry a unique genetic signature that could be targeted. According to the paper, the team identified "unique recurrent GBM-specific sgRNAs mainly in the non-coding genome that were generated by TMZ [chemotherapy] signature mutations characteristic of hypermutated gliomas." These sequences act as beacons that guide CRISPR to the cancer cells, where it cuts their DNA at many sites, leading to genome fragmentation and DNA damage-induced cell death.
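The article does not describe the computational pipeline, but the general idea of tumor-specific, recurrent guide targets can be illustrated with a hypothetical sketch: enumerate 20-nt protospacers followed by an NGG PAM (assuming SpCas9), then keep those that occur at several sites in a tumor sequence and never in a matched normal sequence. Only the forward strand is scanned, the `min_sites` cut-off is an assumption, and the sequences are toy strings.

```python
from collections import Counter

def spcas9_protospacers(seq: str, k: int = 20) -> Counter:
    """Count k-nt protospacers immediately followed by an NGG PAM (+ strand only)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k - 2):
        # The PAM occupies seq[i+k : i+k+3]; its last two bases must be GG.
        if seq[i + k + 1:i + k + 3] == "GG":
            counts[seq[i:i + k]] += 1
    return counts

def tumor_specific_guides(tumor: str, normal: str, min_sites: int = 2) -> dict:
    """Guides hitting the tumor sequence at >= min_sites sites and absent from normal."""
    tumor_hits = spcas9_protospacers(tumor)
    normal_hits = spcas9_protospacers(normal)
    return {g: n for g, n in tumor_hits.items() if n >= min_sites and g not in normal_hits}

guide = "ACGTACGTACGTACGTACGT"
tumor = (guide + "TGG" + "AAAA") * 3                  # three copies of the target site
normal = "ACGTACGTACGTACGTACGA" + "TGG"               # differs at the last protospacer base
print(tumor_specific_guides(tumor, normal))
```

Requiring many target sites is what would allow many simultaneous cuts; a practical design would also scan the reverse strand and allow for mismatches when checking the normal genome.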

Much remains to be done before this CRISPR-based genome shredding technique can be used therapeutically. For example, the researchers noted inefficiencies in the delivery modalities that need to be addressed, and the work published in Cell Reports does not detail a path to direct clinical implementation of this approach. But the results are promising evidence of CRISPR's potential to treat not just glioblastoma but other hypermutated tumors, according to Christof Fellmann, PhD, study lead and corresponding author on the paper. "We see CRISPR as a gateway to a new therapeutic approach that won't be subject to the possibility of tumor cell escape."

And the researchers have reason to be hopeful. Results in the paper indicate that the technique works only on tumor cells, sparing healthy ones during treatment. And in cases where tumor cells escaped the initial shredding, they succumbed to a second round of treatment. "We understand so much today about glioblastoma and its biology, yet the treatment regimens haven't improved," said I-Li Tan, PhD, first author on the paper. "Now we have a precise way to target the cells that are driving the cancer, and we hope this may one day lead to a cure."

View post:
CRISPR-Based "Genome Shredding" Technique Shows Promise in Treating Glioblastoma - Inside Precision Medicine


Genome wide analysis revealed conserved domains involved in the effector discrimination of bacterial type VI secretion … – Nature.com

Posted: at 8:35 pm

Construction of the VgrG database

Encoded as a stand-alone gene or fused to the N-terminus of the toxin, MIX domains can assist the delivery of their cognate T6SS effectors19,20. As the central component of the spike complex, VgrG is a good marker for exploring conserved domains potentially involved in the delivery of T6SS effectors. We therefore set out to create a comprehensive dataset of VgrG proteins from the Gram-negative bacterial genome sequences available in the public GenBank database.

Previous studies have revealed that the Afp8 proteins of extracellular contractile injection systems (eCISs) are homologous to VgrG proteins and could therefore compromise the integrity of the dataset24,25,26. We therefore first downloaded 872 experimentally verified VgrG proteins from the established SecReT6 T6SS database27 to serve as a positive control dataset that helps avoid false-positive hits (such as Afp8 homologs). A bioinformatic scan for conserved domains confirmed that the VgrG domain (accession: COG3501) was present in all 872 verified VgrG proteins as well as in 472 Afp8 proteins available from the dbeCIS database26. Importantly, the VgrG domains identified in 861 (99%) of the verified VgrGs were between 451 and 750 amino acids long, whereas only 10 (2%) Afp8 proteins fell within this size range (Fig. 1a). We therefore used this domain-length range as an empirical criterion for systematically screening for bona fide VgrG proteins in the 133,722 publicly available bacterial genomes (Fig. 1a). Using this approach, a total of 130,825 VgrG proteins were identified from 45,041 Gram-negative bacterial genomes.

Fig. 1. a The workflow for the identification of valid VgrGs from 133,722 publicly available bacterial genomes. The 872 VgrGs available from the established T6SS database SecReT6 (red) and 472 putative Afp8 proteins encoding VgrG domains, available from the eCIS database dbeCIS (green), were used as positive and negative datasets, respectively, to select the empirical criteria for large-scale VgrG screening. b The 872 VgrGs from the SecReT6 database with predefined subtype information are indicated by colored stars (key). VgrGs of subtypes i4a and i5 were intermixed within the same clade of the tree, consistent with the close relationship between these two subtypes reported in a previous study27. The known type iii T6SS clade, derived mostly from Bacteroidetes, is highlighted with red shading.
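The empirical criterion amounts to a length filter on the detected VgrG (COG3501) domain. A minimal sketch, assuming domain lengths have already been obtained from a conserved-domain scan (the protein IDs and lengths are toy values); it also reproduces the retention percentages quoted for the two control sets.

```python
def passes_length_filter(domain_len: int, lo: int = 451, hi: int = 750) -> bool:
    """True if a detected VgrG-domain length falls within the empirical window."""
    return lo <= domain_len <= hi

def screen(domain_lengths: dict[str, int]) -> list[str]:
    """IDs of proteins whose VgrG-domain length passes the filter."""
    return sorted(pid for pid, n in domain_lengths.items() if passes_length_filter(n))

# Retention in the control sets reported in the text.
print(f"verified VgrGs retained: {861 / 872:.1%}")  # 861 of 872
print(f"Afp8 proteins retained: {10 / 472:.1%}")    # 10 of 472
print(screen({"prot_A": 600, "prot_B": 300, "prot_C": 751, "prot_D": 451}))
```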

To further characterize the VgrG proteins identified above, we constructed a maximum-likelihood (ML) phylogenetic tree based specifically on the sequences of the conserved VgrG domains (Fig. 1b). Using the 872 previously defined VgrGs as indicators, we observed that the overall topology of our ML tree with respect to T6SS types/subtypes was similar to that described previously27, supporting the validity of our approach.

First, we screened the VgrG database described above for MIX-containing proteins. A total of 7208 MIX-containing proteins were identified within vgrG loci, and these are widely distributed among various bacteria (Supplementary Fig. 1). Importantly, MIX domains located between vgrG and the downstream effector gene exhibit multiple encoding configurations, including stand-alone proteins and fusions to the C-terminus of VgrG or the N-terminus of the effector (Supplementary Fig. 2).

Based on these encoding features of the MIX domain, we then developed a screening strategy to identify other conserved domains within vgrG loci that share the MIX-like encoding configurations, using the VgrG database created above (Fig. 2). In brief, we scanned up to three genes downstream of each vgrG locus to collect the conserved domains within the proteins located between vgrG and the downstream toxin (if present). A domain family was reported if it was present in both encoding forms: as a stand-alone gene (single form) and fused to either the C-terminus of VgrG or the N-terminus of a toxin (fusion form). Finally, to explore the presence of these domain families within vgrG loci in finer detail, we extended our search without requiring linkage to known toxins, to identify additional candidate domain-containing proteins within vgrG loci (Fig. 2).
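The reporting rule (a family must occur in the single form and in at least one fusion form) can be sketched as follows. This is a hypothetical simplification: each protein is represented by its domain names in N- to C-terminal order, "single" means a protein carrying only that domain, and the domain labels, toxin set, and form names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

FUSION_FORMS = {"vgrg_c_fusion", "toxin_n_fusion"}

def encoding_form(domains: List[str], family: str, toxins: Set[str]) -> Optional[str]:
    """Classify how `family` is encoded in one protein (domains listed N- to C-terminal)."""
    if family not in domains:
        return None
    idx = domains.index(family)
    if domains == [family]:
        return "single"
    if "VgrG" in domains[:idx]:
        return "vgrg_c_fusion"    # domain lies C-terminal to a VgrG domain
    if any(d in toxins for d in domains[idx + 1:]):
        return "toxin_n_fusion"   # domain lies N-terminal to a toxin domain
    return None

def reportable_families(proteins: List[List[str]], families: List[str],
                        toxins: Set[str]) -> List[str]:
    """Families seen both as a stand-alone protein and in at least one fusion form."""
    forms: Dict[str, Set[str]] = defaultdict(set)
    for domains in proteins:
        for fam in families:
            form = encoding_form(domains, fam, toxins)
            if form:
                forms[fam].add(form)
    return sorted(f for f, s in forms.items() if "single" in s and s & FUSION_FORMS)

toy_proteins = [["DUF2345"], ["VgrG", "DUF2345"], ["LysM"], ["FamX", "M35"]]
print(reportable_families(toy_proteins, ["DUF2345", "LysM", "FamX"], {"M35"}))
```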

For each vgrG locus, up to three consecutive downstream genes encoded on the same strand as vgrG, with an intergenic distance of <1 kb between adjacent genes, were collected. Known components of the T6SS operon and any annotated pseudogenes were excluded. The remaining 280,581 downstream genes were then scanned for conserved domains by batch CD-Search. A total of 1321 putative toxin domain families were deduced from a collection of 928 experimentally verified exotoxins/effectors available from the VFDB database53. Each domain family identified in the downstream gene dataset was further classified into one of three cases for final manual curation and determination.
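The gene-collection rules can be sketched as below. The `Gene` record is hypothetical, and the interpretation that excluded genes (T6SS components, pseudogenes) still occupy one of the three slots before being dropped is an assumption; the text does not say which way this was handled.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Gene:
    locus_tag: str
    start: int                  # 1-based, inclusive
    end: int
    strand: str                 # "+" or "-"
    t6ss_component: bool = False
    pseudogene: bool = False

def downstream_candidates(genes: List[Gene], vgrg_idx: int,
                          max_genes: int = 3, max_gap: int = 1000) -> List[str]:
    """Collect up to `max_genes` consecutive same-strand genes downstream of vgrG.

    `genes` must be sorted by start coordinate along one contig. Collection stops at
    a strand change or an intergenic gap of `max_gap` bp or more; T6SS components
    and pseudogenes are then dropped from the collected window.
    """
    vgrg = genes[vgrg_idx]
    step = 1 if vgrg.strand == "+" else -1  # "downstream" depends on the strand
    window, prev, i = [], vgrg, vgrg_idx + step
    while 0 <= i < len(genes) and len(window) < max_genes:
        gene = genes[i]
        gap = (gene.start - prev.end - 1) if step == 1 else (prev.start - gene.end - 1)
        if gene.strand != vgrg.strand or gap >= max_gap:
            break
        window.append(gene)
        prev, i = gene, i + step
    return [g.locus_tag for g in window if not (g.t6ss_component or g.pseudogene)]

locus = [Gene("vgrG", 1, 2000, "+"), Gene("g1", 2050, 2500, "+"),
         Gene("g2", 2600, 3000, "+", t6ss_component=True),
         Gene("g3", 3100, 3500, "+"), Gene("g4", 3600, 4000, "+")]
print(downstream_candidates(locus, 0))
```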

After the screening process and careful manual curation, DUF2345 (cl01733), FIX-like (cl41761), LysM (cl21525), 5 (cl33691), PG_binding_1 (cl38043) and PHA00368 (cl30808) were identified (Supplementary Table 1). As shown in Supplementary Fig. 3, in addition to the single form, all these domain families have at least one fusion form, and the FIX-like (cl41761), LysM (cl21525), 5 (cl33691) and PG_binding_1 (cl38043) families occur in both fusion forms. Notably, some of them were encoded adjacent to known T6SS adaptors, implying that their functions may differ from those of the T6SS adaptors.

Besides the MIX domain, three well-characterized T6SS adaptor families (DUF4123, DUF2169, and DUF1795) have been reported to assist the interaction between VgrG and its cognate effectors. We therefore also screened for these adaptor families within vgrG loci. Among the 130,825 vgrG loci, in addition to those encoding the three adaptor domains (37.44%) or the MIX domain (3.14%), 31.33% encode at least one of the six conserved domain families identified here. By contrast, only 28.09% of vgrG loci do not include any of the adaptor, MIX, or conserved domains mentioned above (Supplementary Fig. 4).

Although DUF2345 is considered an extension of the VgrG gp5 domain, it is not encoded by all VgrGs6,28,29. Nevertheless, among the six conserved domains identified above, DUF2345 is the most frequently found in vgrG loci (Supplementary Table 1). We therefore explored its function in the T6SS. Three vgrG loci encoding the DUF2345 domain were found in Escherichia coli PAR and Pseudomonas aeruginosa strains PAO1 and PS42 (Fig. 3a). Sequence comparison indicated that AKO63_2953 (VgrGPAR), AKO63_2954 (DUF2345PAR) and AKO63_2955 (M35PAR) correspond to the VgrG domain, the DUF2345 domain and the M35 (metallopeptidase) toxin domain of PA0262 (VgrG2bPA), respectively. Similarly, Q094_05019 (VgrGPS) of P. aeruginosa PS42 encodes a VgrG domain, whereas Q094_05020 encodes an N-terminal DUF2345 domain and a C-terminal M35 domain. AlphaFold v2.0 predicted that VgrGPAR, VgrGPS and the VgrG domain of VgrG2bPA adopt the same conformation (Supplementary Fig. 5a). Further, the E. coli locus (VgrGPAR, DUF2345PAR and M35PAR), the PS42 locus (VgrGPS and Q094_05020) and VgrG2bPA form similar trimeric structures, implying that these three complexes might have similar biological functions (Supplementary Fig. 5b). As these three loci encode VgrG, toxin and immunity proteins, we speculated that DUF2345 may be involved in the interaction between VgrG and its cognate effector.

Fig. 3. a The vgrG loci of E. coli PAR, P. aeruginosa PAO1 and PS42. b Western blot detection of VgrG2bPA or its truncated mutant VgrG2bPAΔM35 expressed in E. coli. Anti-RpoB serves as a lysis control. c Survival of E. coli expressing VgrG2bPA or its truncated mutant VgrG2bPAΔM35 from pET22b. Ten-fold serial dilutions of cultures were spotted on LB agar containing the stated concentrations of IPTG and grown for 24 h. The image is representative of three independent experiments. d Intraspecies P. aeruginosa competition assay between the ΔvgrG2bPAΔPA0261 strain and various isogenic attacker strains at 37 °C for 24 h. Competition between the parental strain (PAO1) and itself (gray) serves as the internal control. Values and error bars represent the mean ± SD (n = 3 biological replicates). A one-way ANOVA with Dunnett's test was used, with the parent versus prey competition as the comparator (*p < 0.05; ns, not significant). e Western blot detection of M35PAR, AKO63_2955-2956 or DUF2345PAR expressed in E. coli. Anti-RpoB serves as a lysis control. f Survival of E. coli expressing M35PAR, AKO63_2955-2956 or DUF2345PAR from pET22b. Ten-fold serial dilutions of cultures were spotted on LB agar containing the given concentrations of IPTG and grown for 24 h. The image is representative of three independent experiments. g Interactions between DUF2345PAR and VgrGPAR or M35PAR. Shown are immunoblots of lysates (total) and immunoprecipitates obtained with anti-FLAG affinity beads (IP: FLAG) from cells expressing DUF2345PAR together with an empty vector or a plasmid encoding Myc-tagged VgrGPAR or S-tagged M35PAR. GFP and VgrGRPE are control proteins. h DUF2345PAR mediates the interaction between VgrGPAR and M35PAR. Shown are immunoblots of lysates (total) and immunoprecipitates obtained with anti-FLAG affinity beads (IP: FLAG) from cells expressing M35PAR together with a plasmid encoding either Myc-tagged VgrGPAR or S-tagged DUF2345PAR.

Wood et al. showed that VgrG2bPA and PA0261 constitute a T6SS antibacterial effector-immunity pair30. An E. coli toxicity assay was used to test whether the DUF2345 domain of VgrG2bPA is toxic to bacteria (Fig. 3b, c). As expected, VgrG2bPA was acutely toxic when overexpressed in E. coli, and co-expression of the immunity gene (PA0261) relieved this growth defect. Crucially, truncation of the M35 domain of VgrG2bPA restored growth, which indicated that DUF2345 by itself is not toxic to E. coli. Intraspecies P. aeruginosa competition assays were also performed to determine whether the DUF2345 domain affects the function of VgrG2bPA (Fig. 3d). Although the ΔVgrG2bPAΔPA0261 strain exhibited a significant growth disadvantage against the wild-type PAO1 strain, it was no longer outcompeted by either the ΔClpV2PA or the ΔVgrG2bPA attacker strain. Notably, unlike the wild-type vgrG2bPA gene, complementation with vgrG2bPAΔDUF2345 could not restore the growth advantage of the attacker strain. Further, although secretion of Hcp (the T6SS inner tube protein) was not affected, VgrG2bPAΔDUF2345 complemented in the ΔVgrG2bPA strain was detected only in the cells and not in the supernatant (Supplementary Fig. 6a). In addition, VgrG2bPAΔDUF2345 was still toxic to E. coli when targeted to the periplasm (Supplementary Fig. 6b, c). Therefore, deletion of the DUF2345 domain abolishes the antibacterial activity of VgrG2bPA by blocking its secretion rather than by inactivating its toxin domain.

We next explored the function of DUF2345 when it is encoded as a distinct gene, within the locus containing vgrGPAR and M35PAR along with the cognate immunity protein (Fig. 3a). An E. coli toxicity assay demonstrated that M35PAR has bacterial killing activity, which was inhibited by its immunity protein (Fig. 3e, f). Consistent with the results in Fig. 3c, expression of DUF2345PAR alone had no deleterious effect on bacterial growth (Fig. 3f). Immunoprecipitation assays of proteins co-expressed in E. coli confirmed that DUF2345PAR specifically binds VgrGPAR and M35PAR, but not VgrGRPE (VgrG of Burkholderia sp. RPE67) (Fig. 3g). Importantly, M35PAR could not interact with VgrGPAR in the absence of DUF2345PAR (Fig. 3h). These results imply that DUF2345PAR mediates the interaction between VgrGPAR and M35PAR, assisting the loading of M35PAR onto the T6SS spike.

Taken together, the DUF2345 domain is indispensable for delivery of its cognate toxin, whether it is fused to the C-terminus of VgrG or encoded as a separate gene.

Considering that DUF2345 is encoded either as a C-terminal fusion to VgrG or as a distinct gene downstream of vgrG, we then investigated whether the sequences of VgrG domains correlate with those of DUF2345. An iterative procedure was devised to hierarchically cluster the 52,277 VgrG domains and their cognate DUF2345 domains separately. At a 30% amino acid sequence similarity cutoff, VgrG domains formed three major clusters and ten outliers, whereas DUF2345 domains fell into 37 distinct groups (Supplementary Fig. 7). These findings indicate that DUF2345 domains are far more diverse in sequence than the relatively conserved VgrG domains.
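The idea of clustering at a fixed similarity cutoff can be illustrated with a minimal single-linkage sketch. This is not the authors' iterative procedure: it assumes a crude similarity score from Python's `difflib.SequenceMatcher.ratio()` in place of an alignment-based amino acid similarity, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude stand-in for amino acid similarity (assumption): 2 * matches / total length."""
    return SequenceMatcher(None, a, b).ratio()


def single_linkage_clusters(seqs: list[str], cutoff: float = 0.30) -> list[list[int]]:
    """Group sequence indices so that any pair with similarity >= cutoff ends up together."""
    parent = list(range(len(seqs)))  # union-find forest, one node per sequence

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if similarity(seqs[i], seqs[j]) >= cutoff:
                parent[find(i)] = find(j)  # merge the two clusters

    groups: dict[int, list[int]] = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Hypothetical toy sequences, clustered at a stricter cutoff for illustration
print(single_linkage_clusters(["AAAA", "AAAT", "GGGG"], cutoff=0.7))  # [[0, 1], [2]]
```

Single linkage is chosen here only because a single cutoff then reduces to connected components of a similarity graph; a real pipeline over tens of thousands of domains would use a dedicated clustering tool.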

As demonstrated above, DUF2345 is involved in the interaction between VgrG and the toxin protein. To examine this further, we performed a Sankey analysis of the relationship between DUF2345 domains and their downstream toxins. Most DUF2345 clusters showed a clear taxon-specific distribution and correlated well with their downstream toxins (Fig. 4). However, some toxins, such as the Lyz-like and DUF2235 domains, were linked to more than one DUF2345 cluster. To test whether this reflects the intrinsic sequence diversity of these toxins, an iterative procedure was applied to subdivide these toxin groups further. As expected, the sub-clusters of the Lyz-like and DUF2235 domains also correlated well with DUF2345 groups (Supplementary Fig. 8). Thus, our data reveal that DUF2345 domains exhibit high sequence diversity and correlate well with their downstream toxins.
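The link widths in a Sankey diagram like this are simply counts of loci per (DUF2345 cluster, toxin family) pair. The sketch below, with hypothetical function names and invented cluster labels, counts those links and flags toxins that connect to more than one cluster, the situation described for the Lyz-like and DUF2235 domains.

```python
from collections import Counter, defaultdict


def sankey_links(loci):
    """Count links between DUF2345 clusters and downstream toxin families.

    `loci` is an iterable of (duf2345_cluster, toxin_family) pairs, one per vgrG locus.
    """
    return Counter(loci)


def toxins_with_multiple_clusters(links):
    """Return toxin families linked to more than one DUF2345 cluster, sorted by name."""
    clusters_per_toxin = defaultdict(set)
    for cluster, toxin in links:  # iterating a Counter yields its (cluster, toxin) keys
        clusters_per_toxin[toxin].add(cluster)
    return sorted(t for t, clusters in clusters_per_toxin.items() if len(clusters) > 1)


# Hypothetical toy data; cluster labels C1..C4 are invented for illustration
loci = [("C1", "M35"), ("C1", "M35"), ("C2", "Lyz-like"), ("C3", "Lyz-like"), ("C4", "DUF2235")]
links = sankey_links(loci)
print(links[("C1", "M35")])                  # 2
print(toxins_with_multiple_clusters(links))  # ['Lyz-like']
```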

A Sankey diagram showing the relationships between bacterial phylum/class, family, the corresponding DUF2345 clusters and the downstream toxin domain families (from left to right). Only DUF2345-encoding loci with adjacent known toxin domains were included; loci from genomes lacking the necessary taxonomic information were excluded. The number of sequences in each node is given after the node name. The red arrows on the right indicate toxins linked to more than one DUF2345 cluster.

Although it is absent from the T6SS, the LysM-containing protein is one of the core components of the extracellular contractile injection system (eCIS), which shares several key homologous proteins with the T6SS and forms a similar architecture31,32. It is therefore notable that our systematic screening suggested that the LysM domain is likely to be functional in the T6SS.

Figure 5a shows a vgrG locus encoding a LysM-containing protein in Ketobacter alkanivorans GI5. An E. coli toxicity assay showed that Kalk_10455 was acutely toxic and that co-expression of Kalk_10450 relieved this growth defect, indicating that Kalk_10450 is an immunity protein against Kalk_10455 (Fig. 5b, c). Notably, Kalk_10465 (VgrGG15) and Kalk_10460 (LysMG15) were not toxic when expressed in E. coli (Fig. 5c). Immunoprecipitation assays of proteins co-expressed in E. coli confirmed that Kalk_10455 specifically binds LysMG15 and VgrGG15; however, Kalk_10455 could not bind VgrGG15 in the absence of LysMG15 (Fig. 5d).

a The vgrG loci of Ketobacter alkanivorans GI5 and Burkholderia sp. RPE67. b Immunoblots demonstrating the expression of VgrGG15, LysMG15 and Kalk_10455 in E. coli. Anti-RpoB is the lysis control. c Survival of E. coli expressing VgrGG15, LysMG15 and Kalk_10455 from pETduet. Ten-fold serial dilutions of cultures were spotted on LB agar containing the stated concentrations of IPTG and grown for 24 h. The image is representative of three independent experiments. d Interactions between VgrGG15, LysMG15 and Kalk_10455. Shown are immunoblots of lysates (total) and immunoprecipitates with anti-FLAG affinity beads (IP: FLAG) of Kalk_10455 and GFP transformed with a plasmid encoding Myc-tagged VgrGG15 or Strep-tagged LysMG15. 0423PA is a control protein. e Immunoblots demonstrating the expression of BRPE_05220 and the NLPC_P60 domain in E. coli. Anti-RpoB is the lysis control. f Survival of E. coli expressing BRPE_05220 and the NLPC_P60RPE domain from pETduet. Ten-fold serial dilutions of cultures were spotted on LB agar containing the stated concentrations of IPTG and grown for 24 h. The image is representative of three independent experiments. g The LysM domain mediates the interaction between VgrGRPE and BRPE_05220. Shown are immunoblots of lysates (Input) and immunoprecipitates with anti-FLAG affinity beads (IP: FLAG) of BRPE_05220 or BRPE_05220ΔLysM transformed with a plasmid encoding either Myc-tagged VgrGRPE or Myc-tagged 0423PA. 0423PA is a control protein.

BRPE67_05220 of Burkholderia sp. RPE67, which contains both LysMRPE and NLPC_P60RPE domains, was used to further explore the function of the LysM domain (Fig. 5a). E. coli toxicity assays demonstrated that BRPE_05220 has bacterial killing activity. Moreover, expression of the NLPC_P60RPE domain alone had a deleterious effect on bacterial growth, which was inhibited by BRPE_05230 (Fig. 5e, f). Further, wild-type BRPE_05220, but not its LysM-truncated variant (BRPE_05220ΔLysM), co-immunoprecipitated with BRPE_05210 (VgrGRPE) (Fig. 5g).

AlphaFold v2.0 predicted that BRPE_05210 (VgrG) and BRPE_05220 (LysM and NLPC_P60) form a trimeric structure similar to that of VgrG2bPA, which implies that LysM may mediate the interaction between VgrG and the toxin (Supplementary Fig. 9). Further, phylogenetic analysis of LysM domains revealed the diversity of T6SS-associated LysM domains, which are evolutionarily distinct from phage- and eCIS-associated LysM domains (Supplementary Fig. 10).

In sum, whether the toxin is encoded downstream of a LysM-containing gene or fused to the C-terminus of a LysM domain, it interacts with VgrG in a LysM-dependent manner, implying that LysM may assist the loading of its cognate effector onto the secretion apparatus.

DUF2345-containing proteins exhibit a specific correlation with their diverse downstream toxins (Fig. 4). A similar Sankey analysis was performed to investigate the relationship between the other five identified conserved domain families, along with the confirmed co-effector (MIX), and their downstream toxins (Supplementary Fig. 11). Notably, most of the characterized toxin domains showed a clear domain-specific distribution, with limited exceptions. For instance, RHS-containing proteins are polymorphic toxins that encode variable C-terminal toxic domains together with a conserved N-terminal RHS domain13. Most members of the Rhs superfamily are linked to the FIX-like (cl41761) and 5 (cl33691) domains. LysM domains are mainly correlated with the Lyz_like, NlpD and NLPC_P60 superfamilies. Because the domain families identified in this study, including FIX-like (cl41761), LysM (cl21525), 5 (cl33691), PG_binding (cl38043) and PHA00386 (cl30808), share a genetic organization and a correlation with downstream toxins similar to those of the DUF2345 domain, it is reasonable to speculate that they also function in T6SS effector discrimination.

The overall distribution of the six conserved domain families was then analyzed (Fig. 6). Notably, these families are not evenly distributed among bacterial families. For example, although DUF2345 domains are widely distributed among Proteobacteria genomes, they are rarely encoded in the genomes of the Vibrionaceae and Rhodospirillaceae. In contrast, the PG_binding_1 domain is limited to the genomes of γ-proteobacteria, including the families Chromatiaceae, Sinobacteraceae and Vibrionaceae. In general, although these conserved domains are widely encoded among diverse bacteria, their distributions show clear taxonomic specificity, consistent with that of their cognate effectors (Fig. 4 and Supplementary Fig. 11).

For brevity, only taxa whose genomes encode at least one of the six conserved domains within vgrG loci are shown. A total of 55,228 vgrG loci are included; genomes without an assigned genus are excluded. The circles represent phylum, class, order, family and genus from inner to outer, and are color-coded by phylum/class (key). Family names are given outside the taxonomic tree. The outer heatmaps show the percentage of genomes encoding the corresponding conserved domains in each genus (key).
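The heatmap values described in this caption are per-genus percentages. A minimal sketch of that arithmetic follows, assuming each genome is summarized as a (genus, set of conserved domains found in its vgrG loci) pair; the function name and the toy counts are hypothetical, and the genus names are used only as labels.

```python
from collections import defaultdict


def domain_percentages(genomes):
    """Percentage of genomes per genus that encode each conserved domain.

    `genomes` is an iterable of (genus, domains) pairs. Genomes without an assigned
    genus are skipped, and genera with no domain hits are omitted, mirroring the caption.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for genus, domains in genomes:
        if not genus:
            continue
        totals[genus] += 1
        for domain in set(domains):  # count each domain at most once per genome
            hits[genus][domain] += 1
    return {
        genus: {d: 100.0 * n / totals[genus] for d, n in hits[genus].items()}
        for genus in totals
        if hits[genus]
    }


# Made-up counts for illustration only
genomes = [
    ("Pseudomonas", {"DUF2345"}),
    ("Pseudomonas", {"DUF2345", "LysM"}),
    ("Pseudomonas", set()),
    ("Vibrio", {"PG_binding"}),
    (None, {"LysM"}),  # no assigned genus, so excluded
]
result = domain_percentages(genomes)
print(round(result["Pseudomonas"]["DUF2345"], 1))  # 66.7
```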

Read more:
Genome wide analysis revealed conserved domains involved in the effector discrimination of bacterial type VI secretion ... - Nature.com


TRISH to investigate the effects of spaceflight on the human genome, central nervous system – Odessa American

Posted: at 8:34 pm

HOUSTON – The Translational Research Institute for Space Health (TRISH) will conduct a suite of human health and performance research projects during Axiom Space's upcoming Axiom Mission 3 (Ax-3) to the International Space Station (ISS), scheduled to launch in 2024. TRISH is a consortium led by Baylor College of Medicine's Center for Space Medicine with partners California Institute of Technology and Massachusetts Institute of Technology.

The selected research projects are designed to enhance understanding of the human experience in space and inform the development of high-impact scientific and technological solutions to help humans thrive on future space missions. Each project is part of TRISH's commercial spaceflight health research program, Enhancing eXploration Platforms and Analog Definition (EXPAND). The projects are led by researchers from across the nation who will investigate key space health topics, including space motion sickness, sleep disturbance, genome alterations, changes to cognitive function and eye and brain health, a news release said.

"Our commercial spaceflight partners such as Axiom Space are instrumental to cutting-edge research, including these projects designed to reveal how the human body and mind function in the extreme environment of space," said Dr. Emmanuel Urquieta, TRISH chief medical officer, EXPAND program lead and assistant professor in the Center for Space Medicine at Baylor. "This work represents an important step in our journey to understand the body's response to challenging conditions, which is critical for improving human health both here on Earth and on future long-duration missions, including to the Moon and Mars."

Ax-3 is the third commercial astronaut mission to the ISS. The Ax-3 crew will live and work aboard the ISS for up to 14 days, implementing a full mission comprising microgravity research, educational outreach and technology demonstrations. The four-person crew includes Commander Michael López-Alegría, Pilot Walter Villadei, and Mission Specialists Alper Gezeravcı and Marcus Wandt. A SpaceX Falcon 9 rocket will launch the Ax-3 crew aboard a SpaceX Dragon spacecraft to the ISS from NASA's Kennedy Space Center in Florida.

"Axiom Space appreciates the continued partnership with TRISH on our commercial astronaut missions and the opportunity to further our knowledge of human health through rigorous scientific studies," said Lucie Low, chief scientist for microgravity research at Axiom Space. "TRISH's growing database of medical information collected from commercial spaceflight participants provides additional data sets that can help to inform the expanding commercial space industry."

The TRISH EXPAND biomedical research projects for Ax-3 include:

Cognitive and Physiologic Responses in Commercial Space Crew on Short-Duration Missions, Mathias Basner, M.D., Ph.D., M.S., University of Pennsylvania Perelman School of Medicine

Spaceflight participants experience a multitude of stressors that can affect brain function and crew performance. Basner's team will track spaceflight participants' performance in memory, abstraction, spatial orientation, emotion recognition, risk decision-making and sustained attention before and after the mission to assess the mental impact of space travel.

Otolith and Posture Evaluation II, Mark Shelhamer, Sc.D., Johns Hopkins University

Many space travelers develop motion sickness, nausea and disorientation shortly after launch and landing, which can impact performance. Using a series of tests administered on a tablet device, Shelhamer will study how astronauts' inner ears and eyes sense and respond to motion before and immediately after spaceflight to better predict who is likely to develop space motion sickness.

Space Omics + BioBank, Richard Gibbs, Ph.D., Baylor College of Medicine

Gibbs' team will gather biological specimens from astronauts before and after their mission to assess the effects of spaceflight on the human body at the genomic level. Comparisons of the pre- and post-flight samples can yield critical insights into the impact of space travel on human health and advance health care on Earth by revealing alterations in gene expression in response to extreme environmental stressors.

SANS Surveillance, TRISH

Understanding Spaceflight Associated Neuro-Ocular Syndrome (SANS), which involves changes to the eyes and brain during spaceflight, is of critical importance to NASA. This project collects related ocular images and vision function data during the ground phases of the mission.

Standardized research questionnaires, TRISH

TRISH has implemented a set of standardized research questionnaires for the crew to collect data on their sleep, personality, health history, team dynamics and immune-related symptoms. These additional contextual and qualitative data points will become part of TRISH's EXPAND research database, which collects and stores pre-flight, in-flight and post-flight health data from commercial astronauts in a centralized research database, available to current and future scientists exploring space health.

Sensorimotor adaptation, TRISH

The ability to stand, balance and have full body control will be critical elements when astronauts return to the moon. TRISH collects data before and after flight to help understand the level of sensorimotor ability and change as well as time to recovery.

"TRISH is thrilled to continue our work in advancing human health in space with the help of Axiom Space," said Jimmy Wu, TRISH senior biomedical engineer, EXPAND program lead and instructor in Baylor's Center for Space Medicine. "The Axiom team and spaceflight participants are helping us make strides in understanding the risks to human health during space travel."

TRISH is an applied space health research catalyst empowered by the NASA Human Research Program to solve the challenges of human deep space exploration. Led by Baylor College of Medicine's Center for Space Medicine, the consortium leverages partnerships with Caltech and MIT.


More:
TRISH to investigate the effects of spaceflight on the human genome, central nervous system - Odessa American


The venom preceded the stinger: Genomic studies shed light on the origins of bee venom – EurekAlert

Posted: at 8:34 pm


Components of the venom cocktail used by wild bees such as the Banded Mud-Bee (Megachile ericetorum) are evolutionarily older than their sting.

Credit: Björn von Reumont

FRANKFURT. Venoms have developed in many animal groups independently of each other. One group that has many venomous species is Hymenoptera, an insect order that also includes aculeates (stinging insects) such as bees, wasps and ants. Hymenoptera is very species-rich, with over 6,000 species of bees alone. And yet, despite the great ecological and economic importance of hymenopterans, very little is known about the evolutionary development of their venoms.

By means of comparative genomics, researchers led by Dr. Björn von Reumont, who is currently a visiting scientist in the Applied Bioinformatics Working Group at the Institute for Cell Biology & Neuroscience of Goethe University Frankfurt, have now examined systematically, and for the first time, how the most important components of the venom of bees and other hymenopteran taxa developed in the course of evolution. The venoms are complex mixtures composed of small proteins (peptides) and a few large proteins and enzymes. Stinging insects actively inject this poisonous cocktail into their prey or attackers with the help of a special sting apparatus.

In the first step, the researchers identified which of the peptides and proteins in the venom were most prevalent in hymenopterans. To do this, they drew on information from protein databases, although this was sparse. In addition, they analyzed the proteins in the venoms of two wild bee species, the violet carpenter bee (Xylocopa violacea) and the great-banded furrow-bee (Halictus scabiosae), as well as of the honeybee (Apis mellifera). They found the same 12 families of peptides and proteins in all the hymenopteran venoms analyzed; these are evidently common ingredients of these venom cocktails.

In collaboration with colleagues from the Leibniz Institute for the Analysis of Biodiversity Change (LIB), the Technical University of Munich (TUM) and the LOEWE Center for Translational Biodiversity Genomics (LOEWE TBG), the research team then searched for the genes of these 12 peptide and protein families in the genomes of 32 hymenopteran taxa, including sweat bees and stingless bees, but also wasps and ants such as the notorious fire ant (Solenopsis invicta). The differences in these genes, in some cases only the exchange of single letters of the genetic code, helped the scientists to determine the relationships between the genes of different species and later, with the help of artificial intelligence and machine learning, to compile a lineage of the venom genes.
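To make the idea of "differences in single letters" concrete, the simplest measure of how far apart two aligned gene sequences are is the p-distance: the fraction of compared sites that differ. The release does not describe the researchers' actual methods (which involved machine learning), so this is only an illustrative sketch with a hypothetical function name.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in compared) / len(compared)


print(p_distance("ATGCA", "ATGTA"))  # 0.2 (one difference in five sites)
```

Smaller distances suggest more closely related gene copies; building a lineage from many such comparisons requires a proper phylogenetic method.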

The surprising result was that many of the venom genes analyzed are present in all hymenopterans. Evidently the common ancestor of all hymenopteran taxa already possessed these genes. "This makes it highly probable that hymenopterans are venomous as an entire group," concludes von Reumont. For other groups, such as Toxicofera, which includes snakes, anguids (lizards) and iguania, science is still debating whether the venoms can be traced back to a common ancestor or whether they evolved separately.

Within Hymenoptera, only the stinging insects (bees, wasps and ants) have an actual stinger to administer the venom. The evolutionarily old parasitic sawflies, by contrast, use their ovipositor to inject, along with their eggs, substances that alter their host plant's physiology. The sirex wood wasp (Sirex noctilio), for example, introduces into the plant not only a fungus, which facilitates colonization of the wood by its larvae, but also its own poisonous cocktail containing the venom proteins examined in the study. The purpose of these proteins is to create suitable conditions in the plant for the larvae. "This means that the sirex wood wasp can also be classified as venomous," says von Reumont.

New venom components in bees are the gene for the peptide melittin and genes for representatives of the newly described protein family anthophilin-1. The fact that melittin is encoded by a single gene came as a surprise to the researchers, as von Reumont explains: "Not only are there many different variants of melittin, but the peptide also accounts for up to 60 percent of the dry weight of bee venom. That is why science previously assumed that there must be many gene copies. We were able to disprove this quite clearly." Because they found the melittin gene only in bees, the researchers also invalidated the hypothesis that it belongs to a group of venom genes postulated for stinging insects called aculeatoxins. Von Reumont is convinced: "This shows us once again that genome data are the only way to draw meaningful conclusions about the evolution of venom genes."

The Frankfurt study is the first to show, for an entire insect group with around one million species, where venom genes originated and how they have developed. It provides a starting point for tracing the evolution of venom genes in the ancestors of Hymenoptera as well as specializations within the group. However, to be able to perform comparative genomics on a large scale, analysis methods for the partly very large protein families must first be automated.

Method of Research: Experimental study

Subject of Research: Animals

Article Title: Prevalent bee venom genes evolved before the aculeate stinger and eusociality

Article Publication Date: 23-Oct-2023


Read this article:
The venom preceded the stinger: Genomic studies shed light on the origins of bee venom - EurekAlert


Integrating genomic and multiomic data for Angelica sinensis provides insights into the evolution and biosynthesis of … – Nature.com

Posted: at 8:34 pm

Genome assembly and annotation

The widely cultivated A. sinensis cultivar Qinggui1 was selected for genome sequencing (Fig. 1a). We generated a total of 376.4 Gb of Single Molecule Real-Time (PacBio SMRT) sequences and 60.8 Gb of paired-end HiSeq reads (PE150), along with 325.0 Gb of effective chromosome conformation capture (Hi-C) reads (Table S1). The assembly was built from the PacBio SMRT sequences, which were corrected with the high-quality paired HiSeq reads. The final assembly has a genome size of 2.16 Gb. The Hi-C interaction matrices showed a distinct separation pattern of 11 blocks that was used to cluster and orient the contigs and anchor them to 11 chromosomes (Fig. 1b, Table 1 and Table S2). The assembled genome size was similar to the size estimated by flow cytometry13. Mapping the short reads back to the assembly led to the correction of 29,533 single-base errors and 9426 small indels. The identification of 1,588,740 heterozygous SNPs indicated a low level of heterozygosity in this self-fertilized plant. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis19,20 showed >99% completeness of the genome (Table S3). These results confirm a high-quality genome assembly. Please refer to Table 1 and Data availability for detailed information on the genome assembly.
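To put "a low level of heterozygosity" into numbers, a rough back-of-the-envelope rate can be derived from the two figures above. The study does not report this rate; dividing by the full 2.16 Gb assembly size is an assumption (a stricter estimate would use only callable sites), so treat the result as approximate.

```python
HET_SNPS = 1_588_740       # heterozygous SNPs reported above
ASSEMBLY_SIZE_BP = 2.16e9  # final assembly size (2.16 Gb), assumed as the denominator

snp_rate = HET_SNPS / ASSEMBLY_SIZE_BP
print(f"{snp_rate:.4%} of sites, about {snp_rate * 1000:.2f} heterozygous SNPs per kb")
```

This gives roughly 0.07% of sites, or fewer than one heterozygous SNP per kilobase.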

a Morphology of the sequenced plant. b Hi-C map of chromosomes. c a-b. SNP and indel density and distribution identified between A. sinensis (GS) and A. sinensis (QH); c Density and distribution of LTR retrotransposons (purple: LTR; blue: Copia-type; dark green: Gypsy-type); d Gene density and distribution; e Colinear gene pairs within the genome. The colors of linking lines indicate the number of one-to-one gene pairs in the collinearity blocks: 40, green: 20, blue: 10, gray: 5. This figure was prepared by using shinyCircos110.

Approximately 80.24% of the assembly (1.66 Gb) was identified as repetitive sequence, a higher proportion than in another Apiaceae member, coriander (70.59%) (Fig. 1c, Table S4). Long terminal repeats (LTRs), primarily of the Gypsy and Copia subtypes, were the most abundant. The remaining repeats were categorized as DNA transposons (3.65%), long interspersed nuclear elements (LINEs; 1.26%), short interspersed nuclear elements (SINEs; 0.03%), and uncharacterized repeats (19.77%) (Table S5).

We predicted a total of 41,040 protein-coding genes (Table S6) using ab initio methods, protein homology, and RNA-seq reads from different tissues. Of these, 98.3% were mapped to the chromosomes, most of them in the terminal chromosomal regions (Fig. 1c). Using the iTAK pipeline21, we predicted 2,996 transcription factor (TF) genes in the A. sinensis genome. The top five TF families were MYB/MYB-related (209), AP2/ERF-ERF (172), bHLH (166), C2H2 (154), and NAC (135). Compared with other Apiaceae plants, the GeBP, HSF, GARP-G2-like, C2C2-GATA, C2C2-Co-like, HB-WOX, and Trihelix families were expanded in A. sinensis, whereas the C2C2-YABBY, B3-ARF, and GRAS families were markedly reduced (Fig. S1). Our assembly contains more TF genes in most TF families than the published A. sinensis (GS) genome (Fig. S1).

Despite the increasing number of sequenced medicinal plant genomes, systematic studies of their evolutionary relationships remain relatively scarce. To explore the phylogenetic position of A. sinensis in the Apiaceae and its evolutionary relationships with other species, we selected representative families/orders and medicinal plant species of rosids and asterids according to the Angiosperm Phylogeny Group (APG V4) classification system22 and constructed a phylogenetic tree using one-to-one orthologous gene families. These 20 representative angiosperms included 12 well-known medicinal plant species (Table S7) from 14 families and 12 orders, representing the major taxonomic groups of core eudicots.

Among these species, Vitis vinifera was chosen for its important evolutionary position and its wide use as a model basal plant in plant evolutionary research23. Arabidopsis thaliana and Solanum lycopersicum are well-studied model eudicots24,25. Theobroma cacao and Camellia sinensis are two of the most important beverage crops and are rich in secondary metabolites such as caffeine26,27; C. sinensis is also one of the basal species of the asterids27. Populus trichocarpa was selected as a model plant for the study of lignin biosynthesis and phenylpropanoid metabolism28; phenylpropanoid metabolism is also one of the most important metabolic pathways in A. sinensis, related to the bioactive metabolites ferulic acid, lignans, and coumarins. Cannabis sativa is one of the most valuable agricultural crops and is also used to produce the well-known drugs tetrahydrocannabinol (THC) and cannabidiol (CBD)29. Ophiorrhiza pumila, of the family Rubiaceae, is an important herbaceous medicinal plant that accumulates camptothecin (CPT)30. Scutellaria baicalensis, Salvia miltiorrhiza, Taraxacum mongolicum, Artemisia annua, Lonicera japonica, Panax notoginseng, Panax ginseng, Angelica sinensis, and snapdragon (Antirrhinum majus L.) are widely used in traditional Chinese medicine, with thousands of years of history in China. In addition, we included Daucus carota, Apium graveolens, and Coriandrum sativum, important members of the Apiaceae, to examine the evolutionary relationships within the family and the evolutionary status of A. sinensis.

We identified a total of 2133 one-to-one orthologous gene families shared by all the species (Fig. S2) and used them to construct a phylogenetic tree by the concatenation method. As expected, the topology of the tree was consistent with the APG V4 classification. In the order Apiales, Araliaceae was grouped with Apiaceae, and Araliaceae was considered to be the ancestral family. Divergence time estimates showed that these two families separated around 58 MYA. Within the Apiaceae, A. graveolens and D. carota diverged approximately 23 MYA, much earlier than the divergence of A. sinensis (QH) from its sister clade C. sativum (12 MYA) (Fig. 2a).
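The concatenation method joins each ortholog family's alignment end to end into one supermatrix per species before tree inference. The sketch below shows only that joining step, under the assumption that every family is already aligned and covers every taxon (true here, since the 2133 families are shared by all species); the function name and toy sequences are hypothetical.

```python
def concatenate_alignments(alignments):
    """Build a supermatrix from per-family alignments (each a dict: taxon -> aligned sequence)."""
    if not alignments:
        raise ValueError("no alignments given")
    taxa = set(alignments[0])
    for i, aln in enumerate(alignments):
        if set(aln) != taxa:
            raise ValueError(f"alignment {i} does not cover the same taxa")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError(f"alignment {i} has sequences of unequal length")
    # Append each family's aligned block, in the same order, for every taxon
    return {taxon: "".join(aln[taxon] for aln in alignments) for taxon in sorted(taxa)}


fam1 = {"A_sinensis": "MKV-L", "C_sativum": "MKVAL"}  # toy protein alignment
fam2 = {"A_sinensis": "GG", "C_sativum": "GA"}
print(concatenate_alignments([fam1, fam2]))
# {'A_sinensis': 'MKV-LGG', 'C_sativum': 'MKVALGA'}
```

The resulting supermatrix would then be passed to an ML or Bayesian tree-building program; that step is outside this sketch.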

a Molecular phylogenetic tree of 20 representative angiosperm species constructed from 2133 concatenated conserved protein sequences using the ML and BI methods. b Phylogenetic tree of A. sinensis and other Apiaceae species, with divergence times estimated from 3188 single-copy ortholog sequences. P. notoginseng was used as the outgroup. The numbers in green and red indicate gene family expansion and contraction, respectively, relative to the most recent common ancestors. Estimated divergence times (MYA, million years ago) are indicated at each node. The Venn diagram shows the proportion of gene families under the unchanged (blue), expansion (red) and contraction (green) scenarios. c KEGG pathway enrichment analysis of expanded gene families in the A. sinensis (QH) genome. Only enriched KEGG pathways with p values < 0.05 are displayed. d Distribution of 4dTv distances of syntenic orthologous genes of Apiaceae species. The black arrows mark the WGD events. e Ks distribution for orthologous gene pairs within Apiaceae species. V. vinifera was used as the model organism for evolutionary analysis. The shape of the curve and the position of the peak are almost identical between A. sinensis (QH) and A. sinensis (GS). The highlighted peak regions represent two WGD events.

To further investigate the evolutionary relationships among Apiaceae species, we clustered approximately 91.3% (206,682) of the genes from five Apiaceae species and one outgroup species (P. notoginseng) into 29,108 orthologous groups and extracted 3189 single-copy genes (Table S8). We constructed a phylogenetic tree based on the concatenated sequence alignment of these single-copy gene families (Fig. 2b). C. sativum showed the most pronounced gene family expansion. A. sinensis (QH) and A. sinensis (GS) clustered together, with C. sativum as their closest relative. A. sinensis (QH) had more expanded and fewer contracted gene families than A. sinensis (GS) (Fig. 2b).

We identified 3698 genes as members of significantly expanded gene families (P < 0.01) in A. sinensis (QH) and mapped them to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways for functional enrichment analysis. We detected 33 significantly enriched pathways (P < 0.05); the top enriched metabolic pathways included Glycosphingolipid biosynthesis, Zeatin biosynthesis, Benzoxazinoid biosynthesis, Oxidative phosphorylation, Sesquiterpenoid and triterpenoid biosynthesis, Biosynthesis of unsaturated fatty acids, Selenocompound metabolism, and Indole alkaloid biosynthesis (Fig. 2c and Table S9). Some of the enriched KEGG pathways are involved in plant volatile biosynthesis, such as Sesquiterpenoid and triterpenoid biosynthesis and Phenylpropanoid biosynthesis, suggesting that these genes may contribute to the adaptive phenotypic diversification of A. sinensis.
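The text does not name the enrichment test. A common choice for KEGG over-representation is a one-sided hypergeometric test, sketched below as an assumption; the function name is hypothetical, the toy numbers are invented, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(k: int, N: int, K: int, n: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N: annotated background genes, K: background genes in the pathway,
    n: genes in the expanded families, k: expanded-family genes in the pathway.
    """
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper, so impossible terms vanish
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


print(enrichment_p_value(k=3, N=20, K=5, n=4))  # 155/4845, about 0.032
```

A pathway is called enriched when this tail probability falls below the chosen threshold, ideally after adjusting for the number of pathways tested.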

Several lines of evidence indicate that whole-genome duplications (WGDs) are a major source of species diversification in many eukaryotic lineages31. To identify potential WGD events, we calculated the transversion rate at fourfold degenerate synonymous third-codon positions (4dTv) and the synonymous substitution rate (Ks) for collinear gene pairs within each species. In addition to the five Apiaceae genomes (D. carota, A. graveolens, C. sativum, A. sinensis (GS), and A. sinensis (QH)), we included the model plant V. vinifera.
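To make the 4dTv measure concrete, the minimal Python sketch below computes the raw proportion of transversions at third-codon positions that are fourfold degenerate in both codons of an aligned, in-frame coding-sequence pair. It assumes the standard genetic code. Published 4dTv pipelines usually also apply a substitution-model correction (for example HKY), and that step is omitted here. The function names are hypothetical.

```python
# Codon prefixes (first two bases) whose third position is fourfold degenerate
# in the standard genetic code: Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def is_transversion(a, b):
    """True if a and b differ by a purine <-> pyrimidine change."""
    return a != b and ((a in PURINES) != (b in PURINES))


def raw_4dtv(cds1, cds2):
    """Uncorrected 4dTv for two aligned, in-frame coding sequences."""
    if len(cds1) != len(cds2) or len(cds1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    sites = transversions = 0
    for i in range(0, len(cds1), 3):
        c1, c2 = cds1[i:i + 3].upper(), cds2[i:i + 3].upper()
        if set(c1 + c2) - set("ACGT"):
            continue  # skip codons containing gaps or ambiguous bases
        if c1[:2] != c2[:2] or c1[:2] not in FOURFOLD_PREFIXES:
            continue  # third position is not fourfold degenerate in both codons
        sites += 1
        if is_transversion(c1[2], c2[2]):
            transversions += 1
    return transversions / sites if sites else float("nan")
```

For example, `raw_4dtv("CTAGCTATG", "CTCGCTATA")` counts two fourfold sites (the Leu and Ala codons), one of which carries an A/C transversion, and returns 0.5. The Met/Ile codon pair is skipped.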

The intragenomic paralogous genes of the Apiaceae species show three distinct peaks in their 4dTv distributions (Fig. 2d). The last peak (γ), shared with V. vinifera, represents the ancient whole-genome triplication (WGT) event common to core eudicots. The first two peaks indicate two recent lineage-specific WGD events that occurred before the divergence of the Apiaceae species examined here. This observation is consistent with a previous study suggesting that A. sinensis has undergone three polyploidy events13. By comparing peak positions across species, we inferred the order of the WGD events: A. sinensis experienced the most recent event, followed by C. sativum and then A. graveolens. This order is consistent with our phylogenetic tree and divergence time estimates.

Ks values of homologous genes from different genomes can be used to estimate the time of species divergence32. Comparing the Ks distributions within each species, we identified two distinct peaks at Ks ≈ 0.5 and 1.0, corresponding to the two WGD events (Fig. 2e). The peak positions of A. sinensis (QH) and A. sinensis (GS) were nearly identical (see Table S10 for complete peak values), suggesting similar evolutionary histories for these two varieties. However, the peak at around Ks ≈ 1.7 is not evident, likely because ancient duplicate genes were lost or diverged after the earliest WGD event. The order of the peak values matched the phylogenetic relationships of carrot, celery, coriander, and Angelica. This implies that the WGD events occurred in the order carrot, celery, coriander, and Angelica, which is also consistent with the 4dTv analysis.
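Locating Ks peaks and converting a peak to a divergence time can be sketched as follows. This Python example is illustrative only and rests on two assumptions. Peaks are found as local maxima of a Gaussian kernel density estimate with a fixed bandwidth. Times use the common relation T = Ks / (2r), where r is a per-site, per-year synonymous substitution rate that the user must supply. The text states neither the rate nor the peak-finding method the authors used, and the rate in the test below is an arbitrary placeholder.

```python
import math


def ks_density_peaks(ks_values, bandwidth=0.05, ks_max=3.0, step=0.01):
    """Return Ks grid positions that are local maxima of a Gaussian KDE."""
    n = len(ks_values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    grid = [i * step for i in range(int(round(ks_max / step)) + 1)]
    density = [
        norm * sum(math.exp(-0.5 * ((x - k) / bandwidth) ** 2) for k in ks_values)
        for x in grid
    ]
    # A peak is strictly higher than its left neighbour and not lower than its right one.
    return [
        round(grid[i], 2)
        for i in range(1, len(grid) - 1)
        if density[i] > density[i - 1] and density[i] >= density[i + 1]
    ]


def divergence_time_mya(ks, rate_per_site_per_year):
    """T = Ks / (2r), converted from years to million years."""
    return ks / (2 * rate_per_site_per_year) / 1e6
```

With two tight clusters of Ks values centred on 0.5 and 1.0, `ks_density_peaks` returns `[0.5, 1.0]`. The bandwidth controls how readily neighbouring peaks merge.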

A total of 41,040 high-confidence genes were predicted, 2,162 fewer than the 43,202 genes in the published genome annotation. To evaluate the completeness of the gene set, we first assessed both gene sets with the same BUSCO version and parameters: 96.41% of BUSCO genes were complete in A. sinensis (QH), compared with only 88.10% in A. sinensis (GS). Second, we functionally annotated both gene sets against common databases, including InterProScan33, Gene Ontology (GO)34, KEGG35, Swiss-Prot36, TrEMBL, KOG, and the NCBI non-redundant protein database. Approximately 95.76% of the genes were annotated in A. sinensis (QH), compared with only 90.38% in A. sinensis (GS). Third, we used OrthoFinder (v2.5.4)37 to cluster the two gene sets. The percentage of genes assigned to orthologous groups was 94.9% in A. sinensis (QH) and only 82.6% in A. sinensis (GS), and the number of species-specific genes was 2,111 in A. sinensis (QH) and 7,496 in A. sinensis (GS). Together, these comparisons indicate that our annotation provides a better reference gene set for A. sinensis.
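The two orthology-based metrics can be illustrated with a small Python sketch. It works on an in-memory mapping of orthogroup ID to per-species gene lists, not on OrthoFinder's output files. It also assumes two definitions that the text does not spell out. First, a gene counts as "in an orthogroup" only if its group has at least two genes. Second, a species-specific gene is one in a multi-gene group containing only that species. The function name and example data are hypothetical.

```python
def orthogroup_stats(orthogroups, species):
    """Return (% of genes in multi-gene orthogroups, number of species-specific genes).

    orthogroups: dict mapping orthogroup ID -> dict of species -> list of gene IDs.
    """
    total = assigned = specific = 0
    for members in orthogroups.values():
        genes = members.get(species, [])
        if not genes:
            continue
        group_size = sum(len(g) for g in members.values())
        total += len(genes)
        if group_size >= 2:
            assigned += len(genes)
            # Species-specific: every gene in the group comes from this species.
            if all(sp == species or not g for sp, g in members.items()):
                specific += len(genes)
    percent = 100.0 * assigned / total if total else 0.0
    return percent, specific
```

Under these definitions, singleton groups lower the assigned percentage but do not add to the species-specific count. A different convention would change the absolute numbers.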

We next investigated the genomic differences between A. sinensis (QH) and A. sinensis (GS). The two genomes were highly collinear (Fig. 3a, b). A large inversion was also observed between the homologous chromosomes Chr09 (A. sinensis (QH)) and chr04 (A. sinensis (GS)), highlighted by a red arrow in Fig. 3a and a red square in Fig. 3b. This region was well collinear between A. sinensis (QH) and A. graveolens, suggesting either that A. sinensis (GS) has an assembly error in this region or that the inversion is an inherent feature of the A. sinensis (GS) genome. Collinearity between A. sinensis and A. graveolens was relatively good at the genome level. Furthermore, reciprocal translocations were observed between chromosomes 05 and 07 in A. sinensis (QH) and among chromosomes 09, 11, and 10 in A. graveolens (Fig. 3b). The same pattern was observed between A. sinensis (GS) and A. graveolens, further supporting translocations between these chromosomes. The collinearity between A. sinensis (QH) and other Apiaceae species is shown in Fig. S3.

a Macrosynteny between A. sinensis (QH) and A. sinensis (GS) was verified using MUMmer98 (version 4.0). Each dot represents a homologous block. Blue and green colors indicate different orientations of the sequences, while the red arrow refers to intrachromosomal inversions. The plot was generated using Dot (https://dot.sandbox.bio/). b Genome collinearity analysis among A. sinensis (QH), A. sinensis (GS), and A. graveolens. MCScanX86 was used to identify collinear gene blocks among these three genomes. The red square highlights intrachromosomal inversions between A. sinensis (QH) and A. sinensis (GS). The color of linking lines indicates the number of one-to-one gene pairs in the collinearity blocks: orange (40), green (20), and gray (5). c The genome distribution of genes with strong functional effects between A. sinensis (QH) and A. sinensis (GS). d KEGG pathway enrichment analysis of genes with strong functional effects.

a Changes in metabolites between NG and EF samples. The horizontal axis shows log2 fold changes, and the vertical axis shows log2 absolute content changes. Dot colors represent different compound classes. Numbers in brackets indicate the numbers of compounds upregulated in NG and EF samples. b Heatmap of the contents of coumarins and lignans, and of terpenoids and phthalides, that differed between the NG and EF groups. Data were normalized by Z score in rows. Red and blue arrows indicate upregulated and downregulated metabolites, respectively (VIP ≥ 1 and log2(fold change) ≥ 1 or ≤ -1). c Heatmap of differential expression of genes related to coumarin, lignan and lignin biosynthesis between NG and EF samples in Angelica roots. Red and blue arrows indicate upregulated and downregulated genes, respectively (log2(fold change) ≥ 1 or ≤ -1 and P < 0.05). Only genes with FPKM ≥ 5 in at least one sample are shown.

A total of 1.227 million SNPs and 242,250 indels were detected in syntenic blocks between the two A. sinensis genomes. SNPs and indels showed similar but uneven distributions across the genome (Fig. 1c), and most variants were located in intergenic regions. Of these, 38,862 SNPs and 8887 indels were located in coding regions, affecting 9,547 and 5,125 genes, respectively. Within coding regions, 909 variants (affecting 686 genes) were annotated as having a strong effect on gene function, such as frameshifts or changes at start or stop codons (Supplementary Data 1). These genes were unevenly distributed across the genome (Fig. 3c). They were enriched in KEGG pathways for the biosynthesis of various secondary metabolites, including indole alkaloid, betalain, isoquinoline alkaloid, and sesquiterpenoid and triterpenoid biosynthesis (Fig. 3d). The numbers of SNPs and indels were higher on chromosomes 10 and 11 than on other chromosomes (Fig. 1c and Table S11).
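The "strong effect" categories named above (frameshifts and start- or stop-codon changes) can be illustrated with a toy classifier. The text does not say which annotation tool was used. This Python sketch only mimics the idea for a single variant on a forward-strand coding sequence. It assumes 0-based CDS coordinates and the standard stop codons, and its category labels are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
HIGH_IMPACT = {"frameshift", "start_lost", "stop_lost", "stop_gained"}


def classify_coding_variant(cds, pos, ref, alt):
    """Classify one variant in a forward-strand CDS (pos is 0-based)."""
    cds, ref, alt = cds.upper(), ref.upper(), alt.upper()
    if cds[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the CDS")
    if len(ref) != len(alt):
        # Indels shift the reading frame unless the length change is a multiple of 3.
        return "frameshift" if (len(alt) - len(ref)) % 3 else "inframe_indel"
    if len(ref) != 1:
        return "other"  # multi-base substitutions are not handled in this sketch
    codon_start = pos - pos % 3
    old_codon = cds[codon_start:codon_start + 3]
    mutated = cds[:pos] + alt + cds[pos + 1:]
    new_codon = mutated[codon_start:codon_start + 3]
    if codon_start == 0 and old_codon == "ATG" and new_codon != "ATG":
        return "start_lost"
    if old_codon in STOP_CODONS and new_codon not in STOP_CODONS:
        return "stop_lost"
    if old_codon not in STOP_CODONS and new_codon in STOP_CODONS:
        return "stop_gained"
    return "other"


def is_high_impact(category):
    """True for the categories treated here as strong functional effects."""
    return category in HIGH_IMPACT
```

For the toy CDS `ATGAAATGGTAA`, changing G to A at position 8 turns TGG into TGA and is classified as `stop_gained`. A 1-bp insertion anywhere is a `frameshift`.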

To understand the biosynthesis of various bioactive components in Angelica roots, we conducted nontargeted metabolomic profiling of roots from normally growing and early-flowering Angelica plants. More than 716 high-confidence metabolites were detected and identified, including 39 flavonoids, 12 terpenoids, 47 alkaloids, 74 phenolic acids, 10 phthalides, 31 coumarins, and 24 lignans (Supplementary Data 2). Of these, 299 compounds were identified as differential metabolites using univariate and multivariate statistics with thresholds of fold change (FC) ≥ 2 or ≤ 0.5 and variable importance in projection (VIP) ≥ 1, comprising 145 upregulated and 154 downregulated metabolites.
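The differential-metabolite criteria (fold change of at least 2 or at most 0.5, together with VIP of at least 1) amount to a simple two-part filter. This Python sketch applies the filter to precomputed values. It assumes FC is a ratio of group means in one fixed comparison direction, which the text does not specify, and that VIP scores come from a separate multivariate model. The function name and sample records are hypothetical.

```python
def split_differential(records, fc_up=2.0, fc_down=0.5, vip_min=1.0):
    """Split (name, fold_change, vip) records into up- and downregulated lists.

    A metabolite is differential only if VIP >= vip_min AND its fold change
    is >= fc_up (upregulated) or <= fc_down (downregulated).
    """
    up, down = [], []
    for name, fold_change, vip in records:
        if vip < vip_min:
            continue  # fails the multivariate (VIP) criterion
        if fold_change >= fc_up:
            up.append(name)
        elif fold_change <= fc_down:
            down.append(name)
    return up, down
```

Both conditions must hold together. A metabolite with FC = 3 but VIP = 0.8 is therefore not counted as differential.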

Metabolite classes showed markedly different accumulation patterns in Angelica roots between NG (normal growth) and EF (early flowering and bolting) samples. NG roots were rich in organic acids, amino acids and derivatives, saccharides and alcohols, and nucleotides and derivatives. EF roots were rich in phenolic acids, lysophosphatidylcholines (LPC), lysophosphatidylethanolamines (LPE), coumarins, lignans, flavonols, and flavonoids (Fig. 4a). Among the bioactive compounds, some phthalides and coumarins accumulated to higher levels in NG roots than in EF roots, whereas most lignans accumulated to higher levels in EF roots (Fig. 4b). These phthalides and coumarins have shown more important bioactivities in experimental and clinical studies, so these results point to a higher medicinal value for NG roots than for EF roots.

Transcriptome analyses of these roots also revealed differentially expressed metabolic genes in the corresponding biosynthetic pathways, consistent with the metabolomic data (Fig. 4c). Genes putatively involved in the biosynthesis of lignans and coumarins were upregulated in EF roots compared with NG roots (Fig. 4c). Both compound classes derive from the phenylpropanoid pathway, which also gives rise to lignin and flavonoids. In contrast, most genes putatively involved in phthalide and coumarin biosynthesis were expressed at higher levels in NG roots than in EF roots, consistent with the higher pharmaceutical value of NG roots (Fig. 4c).

Although the enzymes and pathways shared by lignin, coumarin, lignan, and flavonoid biosynthesis are well known, the specific genes and enzymes involved in producing many coumarins and lignans remain poorly understood13,38,39. This new Angelica genome assembly provides more than 100 metabolic genes encoding homologs of all known enzymes involved in coumarin and lignan biosynthesis (Supplementary Data 3). All phenylpropanoid pathway genes were assembled and annotated in our genome, including phenylalanine ammonia lyase (PAL), cinnamate 4-hydroxylase (C4H), 4-coumaroyl-CoA ligase (4CL), hydroxycinnamoyl-CoA shikimate/quinate hydroxycinnamoyltransferase (HCT), caffeic acid O-methyltransferase (COMT), and caffeoyl-CoA O-methyltransferase (CCoAOMT) (Fig. 5a). These genes feed several branches: lignin biosynthesis via HCT and CCR, lignan biosynthesis via dirigent proteins (DIR), and flavonoid synthesis via CHS. They also feed coumarin biosynthesis from different 4CL products via cinnamic acid 2-hydroxylase (C2H), p-coumarate 3-hydroxylase (C3H) together with HCT, or feruloyl-CoA 6'-hydroxylase (F6'H). Together, these annotations provide insights into the biosynthesis of various pharmaceutically important products (Fig. 5a). Lignans have unique antitumor activities and help reduce lifestyle-related diseases40. Lignans were also enriched in Angelica roots, particularly in EF roots, where both lignan contents and a subset of lignan biosynthesis genes were upregulated. These genes encode DIR, pinoresinol-lariciresinol reductase (PLR), and secoisolariciresinol dehydrogenase (SIRD), which produce the aglycones pinoresinol, lariciresinol, secoisolariciresinol, and matairesinol. Their glycosides are formed by UGT71/74 glycosyltransferases40 (Fig. 5a).

a Putative biosynthetic pathways of coumarins, lignin, lignans and flavonoids. Numbers in parentheses indicate the number of genes. Background colors distinguish the pathways leading to different products. PT genes are highlighted in red. Genes in the different gene families are listed in Supplementary Data 3. b Unrooted phylogenetic tree of PT genes, grouped according to substrate type (a to h). Orthologous genes in A. sinensis (QH) and A. sinensis (GS) are highlighted. Genes in subtrees c and d had relatively high expression levels.

Prenyltransferases (PTs) catalyze the prenylation of umbelliferone, the step leading into linear and/or angular furanocoumarin biosynthesis34,35. PTs also participate in the biosynthesis of chlorophyll, vitamin E, heme, phylloquinone, and various secondary metabolites. They prenylate chlorophyllide a/b, vitamin E, heme B, and many other metabolites, such as 1,4-dihydroxy-2-naphthoic acid, p-hydroxybenzoic acid, flavonoids, phloroglucinol, homogentisate, and coumarins, using different prenyl donors such as isoprenyl diphosphate, dimethylallyl diphosphate, and geranyl diphosphate (Fig. 5b). Despite the divergent functions of these PTs, those involved in coumarin biosynthesis most likely arose through convergent evolution. Coumarins occur mainly in a few unrelated plant families, such as Fabaceae, Moraceae, Apiaceae, and Rutaceae34,35. This interpretation is supported by a previous study19 showing independent evolution of coumarin biosynthesis-related PTs in these families. Furthermore, the PTs that catalyze linear (demethylsuberosin-forming, e.g., PsPT1 and PcPT1) and angular (osthenol-forming, e.g., PsPT2) furanocoumarin biosynthesis cluster together in a single clade in Apiaceae species (Fig. 5b). This pattern likely results from gene duplication followed by neofunctionalization and positive selection38,41.

Ligustilide and butylidenephthalide, two major pharmaceutically important components of Angelica roots, are generally regarded as essential contributors to the main medicinal functions of these roots42,43,44,45. However, their biosynthetic pathways remain elusive. Phthalide biosynthesis could involve the oxidation or transfer of isoprenoids, the condensation of malonyl-CoA with other acyl-CoAs by type III polyketide synthases (PKSs), or a combination of these46,47. We therefore combined the A. sinensis genome with transcriptome and metabolite profiling to examine the biosynthesis of ligustilide, butylidenephthalide, and other monoterpene volatiles that contribute to the medicinal functions of Angelica roots.

To profile the bioactive components of Angelica roots more precisely, we examined volatile terpenoids and phthalides by headspace solid-phase microextraction-gas chromatography-mass spectrometry (SPME-GC-MS). The volatiles of early-flowering (EF) and normally growing (NG) roots differed notably. Z-ligustilide, Z-butylidenephthalide, and their E isomers were the major components of NG roots (~47% of total volatiles). EF roots contained fewer phthalides (34% of total volatiles) and much lower levels of terpenes such as pinene and (E)-β-farnesene (Fig. 6a, b). These data indicate that early bolting and flowering also reduce volatile accumulation in Angelica roots.

a Headspace solid-phase microextraction-gas chromatography-mass spectrometry (SPME-GC-MS) analysis of the contents and composition of volatiles in Angelica roots from early-flowering (EF) and normally growing (NG) plants. b Differential content analysis of the volatiles in Angelica roots between EF and NG plants. c Enzymatic reactions in the mevalonate (MVA) and methylerythritol phosphate (MEP) pathways in plants and synthesis of short-chain prenyl diphosphates. The MVA pathway is shown in light red; the MEP pathway is shown in light green. Abbreviations and full names are given in Table S16. Data are expressed as means ± SDs from at least three independent experiments with triplicates. Differences between NG and EF samples are considered significant at **P < 0.01 and *P < 0.05 (Student's t test).

Genome analyses revealed that three key gene families of the MEP pathway toward monoterpene synthesis, MCT, HDS, and HDR, were expanded in the A. sinensis genome compared with the Arabidopsis and grapevine genomes (Supplementary Data 4). The A. sinensis genome sequence suggests that the monoterpene pathway was substantially expanded during the evolution of several genera in the Apiaceae family (Supplementary Data 4). This is consistent with the diverse and abundant monoterpene volatile profiles of these plants (Fig. 6a).

Transcriptome data showed that genes involved in glycolysis and the pentose phosphate pathway were downregulated in EF Angelica roots. This downregulation also negatively affected the mevalonate (MVA) pathway and the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway, which lead to mono- and sesquiterpenoid biosynthesis (Fig. 6c). The DXS, MDS, CMK, and HDR genes of the plastidic MEP pathway, as well as one IPPI and two GPPS genes for monoterpenoid biosynthesis, were significantly downregulated in EF roots compared with NG roots (Fig. 6c).

A. sinensis is a triennial medicinal plant that typically flowers in its third year but can flower early, in May of its second year (Fig. 7a). Angelica roots contain a wide range of abundant terpenoid volatiles, which are also regarded as major contributors to their clinical functions48. Terpene synthase (TPS) family genes play key catalytic roles in plant terpenoid biosynthesis. We identified 28 putative TPS genes in the A. sinensis genome, belonging to five TPS subfamilies (TPS-a, TPS-b, TPS-c, TPS-e/f, and TPS-g) (Fig. 7b). The TPS-b subfamily was expanded in both A. sinensis (15 genes) and C. sativum (20 genes), mainly through tandem duplication in A. sinensis (Ks < 0.1). A. sinensis (QH) contained 5 more TPS genes than A. sinensis (GS), suggesting that the A. sinensis (QH) assembly is more complete. Eight TPS genes were expressed in Angelica roots (FPKM ≥ 1 in at least one sample), and most of them had higher expression levels in NG roots than in EF roots (Fig. 7b).
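Calling paralogs tandem duplicates usually combines physical proximity with a low Ks value. The following Python sketch shows one way to apply that rule. The Ks < 0.1 cutoff comes from the text. The adjacency window (within 10 gene positions on the same chromosome), the `Gene` record, and the gene IDs in the example are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    index: int  # order of the gene along its chromosome


def tandem_pairs(genes, pair_ks, max_gap=10, ks_max=0.1):
    """Return sorted paralog pairs that are physically close and have Ks < ks_max.

    genes: iterable of Gene; pair_ks: dict mapping (gene_id, gene_id) -> Ks.
    """
    by_id = {g.gene_id: g for g in genes}
    pairs = []
    for (a, b), ks in pair_ks.items():
        ga, gb = by_id.get(a), by_id.get(b)
        if ga is None or gb is None:
            continue  # pair involves a gene outside the family of interest
        nearby = ga.chrom == gb.chrom and abs(ga.index - gb.index) <= max_gap
        if nearby and ks < ks_max:
            pairs.append(tuple(sorted((a, b))))
    return sorted(pairs)
```

Both filters matter here. A young paralog on a different chromosome, or a distant copy on the same chromosome, is excluded even when its Ks is below 0.1.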

a Plants were sown simultaneously and grown in the same environment. Samples were taken at the same time for observation and analysis. EF early flowering, NG normal growth. We highlight the highly lignified Angelica root of the EF plant and the normally developed storage root of the NG plant on the right side. b Five TPS subfamilies (TPS-a, TPS-b, TPS-c, TPS-e/f, and TPS-g) were clearly identified. The genes from A. sinensis (QH) and A. sinensis (GS) are highlighted by red and green dots, respectively. The heatmap of gene expression is illustrated.

To further assess whether PKSs are involved in the biosynthesis of the polyketide derivatives ligustilide and butylidenephthalide in A. sinensis, we analyzed the genes responsible for producing acetyl-CoA and malonyl-CoA. These two compounds are the substrates that type II and III PKSs use to produce polyketides (Fig. 8a)46,47. Acetyl-CoA carboxylase (ACC) is the main enzyme converting glycolysis-derived acetyl-CoA into malonyl-CoA, a key intermediate for fatty acid, polyketide, and flavonoid biosynthesis47. Plant ACC is composed of two subunits, the biotin carboxylase and carboxyl transferase subunits47. The genes encoding two ACC subunits, BCCP2 (CAC1) (4) and CAC2-CAC3 (5), were expanded in the A. sinensis genome in comparison with the Arabidopsis and grapevine genomes, respectively (Table S12). Consistent with the lower Z-ligustilide and Z-butylidenephthalide levels in EF Angelica roots, at least two ACC subunit genes were downregulated in EF roots compared with NG roots (Fig. 8b).

a The malonyl-CoA biosynthesis metabolic pathway. b Heatmap displaying the expression of typical ACC genes in Angelica roots between EF and NG plants. c The overall expression (FPKM) of ACC and PKS genes in Angelica roots between EF and NG plants. d Phylogeny of polyketide synthase genes (PKSs). The heatmap displays the gene expression in Angelica roots between EF and NG plants. The color of gene IDs shows the source of different species: red: A. sinensis; blue: A. thaliana; black: seed sequences. The red stars highlight the upregulated genes, and the blue stars highlight the downregulated genes.

PKSs form a large gene family encoding multifunctional enzymes that catalyze the condensation of malonyl-CoA units, or of malonyl-CoA with other acyl-CoAs, to generate diverse polyketides46,47. In particular, the type III PKS tetraketide synthase (TKS) synthesizes a linear tetraketide-CoA from hexanoyl-CoA and malonyl-CoA and might provide a backbone for Z-ligustilide and Z-butylidenephthalide biosynthesis49. A previous study in Cannabis sativa showed that olivetolic acid cyclase (OAC) catalyzes a C2-C7 intramolecular aldol condensation of this linear tetraketide-CoA, with carboxylate retention, to form olivetolic acid49. Olivetolic acid is structurally similar to Z-ligustilide and Z-butylidenephthalide, differing only in the positions of the olefinic bond and hydroxyl group49. A multifunctional protein (MFP) can shift olefinic bonds and hydroxyl groups during lipid metabolism50. It has therefore been proposed that Z-ligustilide and Z-butylidenephthalide are synthesized through the PKS pathway by a similar mechanism, although the exact enzyme or gene responsible for their biosynthesis remains unknown. In the A. sinensis genome, PKSs also form a large family of 120 members, among which type III PKS genes are expanded (Table S13 and Fig. 8d).

Transcriptome analyses showed that four PKS genes, As05G08873, As11G04238, As10G03800, and As08G02849, were highly expressed in Angelica roots (Fig. S4). Some PKS genes were also repressed in EF roots compared with NG roots (Fig. 8d), suggesting that these PKSs might be involved in phthalide biosynthesis. The overall expression of ACC and PKS genes in Angelica roots was lower in EF plants (Fig. 8c). Tracer experiments with isotope-labeled substrates, together with enzymatic and molecular approaches, are needed to unveil the mechanism underlying Z-ligustilide and Z-butylidenephthalide biosynthesis in A. sinensis.

See more here:
Integrating genomic and multiomic data for Angelica sinensis provides insights into the evolution and biosynthesis of ... - Nature.com

Posted in Genome | Comments Off on Integrating genomic and multiomic data for Angelica sinensis provides insights into the evolution and biosynthesis of … – Nature.com

Genetic diversity and ancestry of the Khmuic-speaking ethnic groups … – Nature.com

Posted: September 21, 2023 at 10:16 am

Ethical statement

Ethical approval for this study was granted by the Human Experimentation Committee of the Research Institute for Health Sciences, Chiang Mai University, Thailand (Certificate of Ethical Clearance No. 31/2022). Throughout the research, we protected the rights and identities of participants. All experiments were performed in accordance with relevant guidelines and regulations, following the experimental protocol for human subjects under the Declaration of Helsinki. Written informed consent was obtained from all volunteers before the interview and sample collection.

A total of 95 unrelated subjects residing in five villages of Nan Province, Thailand, were enrolled with written informed consent. Volunteers were healthy and over 20 years old, were of Khmuic-speaking ethnicity, and had no known ancestors from other recognized ethnic groups for at least three generations. We collected personal data on self-reported unrelated lineages, language, and migration history through form-based oral interviews. We collected buccal or saliva samples and extracted DNA using the Gentra Puregene Buccal Cell Kit (Qiagen, Germany) according to the manufacturer's instructions.

Genotyping was carried out using the Affymetrix Axiom Genome-Wide Human Origins array10. Primary screening with Affymetrix Genotyping Console v4.2 yielded 93 samples genotyped for 622,834 loci on hg19 human reference genome coordinates (genotype call rate ≥ 97%). We used PLINK24 (version 1.90b5.2) to exclude loci and individuals with more than 5% missing data and to exclude mtDNA and sex chromosomes from the analysis. We further excluded loci that failed the Hardy-Weinberg equilibrium test (P < 0.00005) or had more than 5% missing data within any population. We used KING25 (version 2.3) to estimate individual relatedness and removed one individual from each first-degree related pair. After these quality control steps, 81 Khmuic-speaking individuals (Fig. 1) and 612,614 loci remained.
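The per-locus filters above (at most 5% missing data and a Hardy-Weinberg equilibrium P of at least 0.00005) can be sketched in Python. PLINK's HWE filter is based on an exact test. The function below follows the standard SNP exact-test recurrence of Wigginton, Cutler and Abecasis (2005), but it is a minimal illustration, not PLINK's code. The genotype coding (0/1/2 alternate-allele counts, None for a missing call) and the function names are assumptions.

```python
def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Exact-test P value for Hardy-Weinberg equilibrium at one biallelic SNP."""
    n_homr, n_homc = sorted((n_hom1, n_hom2))  # rare / common homozygote counts
    n = n_het + n_homr + n_homc  # genotyped individuals
    if n == 0:
        return 1.0
    rare = 2 * n_homr + n_het  # copies of the rarer allele
    probs = [0.0] * (rare + 1)  # relative probability of each heterozygote count

    # Start at the most likely heterozygote count, with the same parity as `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0

    # Recurse toward fewer heterozygotes.
    homr = (rare - mid) // 2
    homc = n - mid - homr
    het = mid
    while het > 1:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (homr + 1) * (homc + 1))
        homr, homc, het = homr + 1, homc + 1, het - 2

    # Recurse toward more heterozygotes.
    homr = (rare - mid) // 2
    homc = n - mid - homr
    het = mid
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * homr * homc / ((het + 2.0) * (het + 1.0))
        homr, homc, het = homr - 1, homc - 1, het + 2

    # P = total probability of configurations no more likely than the observed one.
    observed = probs[n_het]
    p = sum(pr for pr in probs if pr <= observed) / sum(probs)
    return min(1.0, p)


def missing_rate(genotypes):
    return sum(g is None for g in genotypes) / len(genotypes)


def passes_locus_qc(genotypes, max_missing=0.05, hwe_p_min=0.00005):
    """Keep a locus only if missingness <= max_missing and HWE P >= hwe_p_min."""
    if missing_rate(genotypes) > max_missing:
        return False
    called = [g for g in genotypes if g is not None]
    return hwe_exact_p(called.count(1), called.count(0), called.count(2)) >= hwe_p_min
```

A locus with 25/50/25 genotype counts sits at the Hardy-Weinberg expectation and passes. A locus with 50 individuals of each homozygote and no heterozygotes fails with an extremely small P.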

We next used PLINK (version 1.90b5.2) to merge our new genotypes with a published genome-wide SNP dataset8. This dataset includes populations from East/Southeast Asia and South Asia, African Mbuti, European French, and ancient Southeast Asian samples9,10,11,12,13. In this collection, genotypes of ancient samples are pseudo-haploid calls, and samples with fewer than 15,000 informative loci were excluded. After restricting the data to SNP positions that could be jointly analyzed, we excluded SNPs that had more than 5% missing data, a minor allele frequency (MAF) below 3.3 × 10⁻⁴, or a Hardy-Weinberg equilibrium P < 0.00005. The resulting dataset of 353,505 positions in 979 individuals from 90 populations (Supplementary Tables 1 and 2) was used for subsequent analyses.

To investigate the genetic structure and relationships of the samples, we used PLINK (version 1.90b5.2) to prune for linkage disequilibrium. We excluded one variant from each pair with r² > 0.4 within windows of 200 variants and a step size of 25 variants. The analysis included 959 individuals (excluding the Mbuti and French populations) and 149,384 SNP positions. Principal component analysis (PCA) was carried out using smartpca from EIGENSOFT with the "lsqproject" and "autoshrink" options.
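The sliding-window pruning rule (window of 200 variants, step of 25, r² threshold of 0.4) can be sketched as below. This is a simplified, hypothetical Python illustration, not PLINK's implementation. It computes r² as the squared Pearson correlation of complete genotype dosage vectors. Within each window, it greedily drops the later variant of any correlated pair, whereas PLINK applies its own rule for choosing which variant of a pair to remove.

```python
from statistics import fmean


def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors (0/1/2)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: treated as uncorrelated
    return sxy * sxy / (sxx * syy)


def ld_prune(dosages, window=200, step=25, r2_max=0.4):
    """Greedy sliding-window LD pruning; returns indices of retained variants.

    dosages: list of per-variant genotype vectors in genomic order.
    """
    n = len(dosages)
    keep = [True] * n
    start = 0
    while start < n:
        end = min(start + window, n)
        for i in range(start, end):
            if not keep[i]:
                continue
            for j in range(i + 1, end):
                if keep[j] and r_squared(dosages[i], dosages[j]) > r2_max:
                    keep[j] = False  # drop the later variant of the pair
        if end == n:
            break
        start += step
    return [i for i, kept in enumerate(keep) if kept]
```

Overlapping windows let a variant be compared with neighbours on both sides of a window boundary. The step size trades runtime against how thoroughly such pairs are checked.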

To infer population structure with ADMIXTURE, we used 155,709 SNP positions from 979 individuals, including both the Asian samples and the Mbuti and French outgroups. ADMIXTURE14 (version 1.3.0) was run from K = 2 to K = 10 with 100 replicates for each K, using random seeds and the -P option. For each K, the top 20 replicates with the highest likelihood for the major mode were displayed using PONG26 (version 1.4.7). For both the PCA and ADMIXTURE analyses, the ancient samples and highly drifted modern populations (Mlabri, Onge, Mamanwa, Khamu, and Lua) were projected.

To test for admixture and excess ancestry sharing, we calculated f3- and f4-statistics with admixr27 (version 0.7.1), an R interface to ADMIXTOOLS10 (version 5.1). Significance was assessed by block jackknife resampling across the genome, with Mbuti as the outgroup. These analyses used 353,505 SNPs from 979 samples. When ancient samples were involved, we computed additional f4-statistics with French as the outgroup, to avoid deep attraction to Africans. These analyses used only transversions (2,947 to 51,452 SNPs, depending on sample quality) to avoid potential noise from ancient DNA damage patterns28. We used the pheatmap package in R version 3.6.0 to visualize the f3 and f4 profiles as heatmaps.
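For readers unfamiliar with f-statistics, the sketch below computes f4(A, B; C, D) as the mean of (pA - pB)(pC - pD) across SNPs. It then estimates the standard error and Z score with a delete-one-block jackknife. This is a simplified Python illustration under stated assumptions: blocks are fixed numbers of consecutive SNPs with equal weights. ADMIXTOOLS typically defines blocks by genetic distance and uses a weighted jackknife. The function names and toy frequencies are hypothetical.

```python
import math


def f4_per_snp(pa, pb, pc, pd):
    """Per-SNP f4 terms (pA - pB)(pC - pD) from population allele frequencies."""
    return [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]


def block_jackknife(values, block_size):
    """Mean of per-SNP values with delete-one-block jackknife SE and Z score.

    Blocks are consecutive runs of block_size SNPs with equal weights;
    at least two blocks are required.
    """
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    g = len(blocks)
    if g < 2:
        raise ValueError("need at least two blocks")
    total, n = sum(values), len(values)
    estimate = total / n
    # Leave-one-block-out estimates of the genome-wide mean.
    loo = [(total - sum(b)) / (n - len(b)) for b in blocks]
    loo_mean = sum(loo) / g
    se = math.sqrt((g - 1) / g * sum((x - loo_mean) ** 2 for x in loo))
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z
```

An f4 value significantly different from zero (for example |Z| > 3) is the usual signal of excess allele sharing. Blocks must be large enough that SNPs in different blocks are roughly independent.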

To examine haplotype sharing between groups, we phased the modern samples with SHAPEIT29 (version 4.1.3). The reference panel comprised South Asian and East Asian populations (excluding the Kinh Vietnamese), and we also used the recombination map from the 1000 Genomes Project30 (Phase 3). This analysis focused on modern population data comprising 359,539 SNPs. To prepare the reference panel, we used bcftools version 1.4 to extract the East and South Asian individuals, and the sites overlapping our data, for each chromosome from the 1000 Genomes Phase 3 data. The phasing accuracy of SHAPEIT4 can be improved by increasing the number of conditioning neighbors in the positional Burrows-Wheeler transform (PBWT) on which haplotype estimation is based29. We therefore phased with the option --pbwt-depth 8 (8 conditioning neighbors), keeping other parameters at their defaults. We then applied ChromoPainter31 (version 2) to the phased dataset to investigate haplotype sharing, randomly down-sampling each population to 4 and to 8 individuals. The 4-individual subset was used for 10 iterations of expectation maximization (EM) to estimate the switch rate and global mutation probability. The 8-individual subset was used for chromosome painting with the estimated switch and global mutation rates, and the output of this step was used for downstream analyses. We painted the chromosomes of each individual, with all modern Asian samples serving as both donors and recipients via the -a argument. The EM estimation yielded a switch rate of approximately 251.21 and a global mutation probability of approximately 0.00001. These values were then used as starting values for all donors in the painting process. Heatmaps were generated using the pheatmap package in R.

To construct the admixture graph, we first selected backbone populations representing different language families in Southeast Asia. Specifically, we used f4-statistics to choose representative ethnic groups speaking Austronesian, Tai-Kadai, Austroasiatic, Hmong-Mien, and Sino-Tibetan languages: Atayal, Dai, Cambodian, Miao, and Naxi, respectively. The African Mbuti and Indo-European-speaking North Indian populations (Gujarati, Brahmin Tiwari, and Lodhi) served as outgroups. Because our focus was the admixture graph of the Austroasiatic language family in Thailand, we grouped these populations by linguistic branch: Katuic (Bru and Soa), Monic (Mon), Palaungic (Lawa_Eastern, Lawa_Western, Palaung, Blang), and Mlabri. The Khmuic-speaking groups of interest were divided into Khamu (four Khamu populations) and Lua (two Lua populations together with HtinMal and HtinPray).

To model the admixture graph, we used a dataset of 359,539 SNPs from modern populations as input for ADMIXTOOLS32 (version 2). We first computed pairwise f2-statistics between groups using the extract_f2 function with the following parameters: maxmiss=0 (no missing SNPs allowed in the calculation), useallsnp: NO (no missing data allowed), and blg=0.05 (SNP block size of 0.05 morgans). We then extracted allele frequency products from the precomputed f2 blocks using f2_from_precomp. Next, for each scenario, we searched for the best-fitting admixture graph by running ten independent runs of find_graphs. From the 100 independent runs, we selected the graph with the lowest score (computed from the residuals between expected and observed f-statistics) using random_admixturegraph. To confirm the fit, we tested the lowest-scoring graph using qpgraph with parameters numstart=100, diag=0.0001, return_fstats=TRUE, and checked whether the absolute value of the worst-fitting Z score was below 3. Starting with no migrations (numadmix=0), we gradually added migration edges until a fitting graph was found, which we considered the best-fitting graph for that scenario.
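The stopping rule described above can be summarized as a small search loop: keep adding admixture edges until the lowest-scoring graph has a worst-fit |Z| below 3. In this Python sketch the `fit` callable is a hypothetical stand-in for running find_graphs/qpgraph and returning a score and worst-fit Z. It is not the ADMIXTOOLS 2 API, and the default run count and maximum number of admixture edges are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GraphFit:
    numadmix: int   # number of admixture (migration) edges in the graph
    score: float    # lower is better (residual-based fit score)
    worst_z: float  # Z score of the worst-fitting f-statistic


def search_best_graph(
    fit: Callable[[int, int], GraphFit],
    n_runs: int = 10,
    max_admix: int = 5,
    z_max: float = 3.0,
) -> Optional[GraphFit]:
    """Return the lowest-score graph with the fewest edges whose |worst Z| < z_max."""
    for numadmix in range(max_admix + 1):
        # Independent runs (e.g. different random seeds) for this edge count.
        runs = [fit(numadmix, seed) for seed in range(n_runs)]
        best = min(runs, key=lambda r: r.score)
        if abs(best.worst_z) < z_max:
            return best
    return None  # no acceptable graph up to max_admix edges
```

Stopping at the first edge count that fits favours the most parsimonious graph. Adding edges beyond that point can only improve the fit, at the cost of overfitting.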

Go here to see the original:
Genetic diversity and ancestry of the Khmuic-speaking ethnic groups ... - Nature.com

Posted in Genome | Comments Off on Genetic diversity and ancestry of the Khmuic-speaking ethnic groups … – Nature.com

Researchers to Apply Genome Analysis to Childhood Cancers; Goal … – The Japan News

Posted: at 10:16 am

Yomiuri Shimbun file photo: The National Cancer Center Hospital in Chuo Ward, Tokyo, in May 2021

The Yomiuri Shimbun

13:51 JST, September 19, 2023

While it is a simple fact that every case of cancer begins with a genetic mutation that occurs when cells divide, it is a complex reality that types of cancer and the right drugs for treating them vary depending on the specific type of mutation. A team of researchers is working to address this problem via genome analysis of pediatric cancer patients.

As early as November, researchers from the University of Tokyo Hospital, the National Cancer Center and other institutions will begin research on whole genome analysis of pediatric cancer patients to help diagnose and treat cancers that affect children. The team's goal is to confirm the efficacy of whole genome analysis, put the analysis into practice and, in the future, establish a system in which most patients can be diagnosed using this analysis.

Cancer begins when cells in the body become abnormal because of a mistake in copying a gene during cell division and proliferate out of control. By identifying the particular mutation, genome analysis leads to the selection of the best treatment for each patient.

Pediatric cancer, with 2,000 to 2,500 new cases diagnosed each year in Japan, is a general term for cancers that develop in children under the age of 15. The affected population is small and diverse, making accurate diagnosis difficult.

As part of the government's 2019 action plan for whole genome analysis, the team will receive research funding from the Japan Agency for Medical Research and Development.

Gene panel testing, which has been covered by insurance since 2019, examines some of the genetic mutations associated with cancer, but advances in technology have now made it possible to analyze the entire genome. The panel test was developed primarily for adults, and it is believed that some genetic mutations unique to pediatric cancer can only be detected by whole genome analysis.

About 20 university hospitals, including those of the University of Tokyo and Kyoto University, as well as hospitals specializing in pediatric cancer treatment, will participate in the research. Among other goals, they hope to determine how well they can detect genomic abnormalities that can be diagnosed and treated.

Tissues and other specimens containing cancer cells sampled from patients at each medical institution will be collected at the National Center for Child Health and Development and sent to a private laboratory for analysis.

The National Cancer Center will analyze the data, and a group of about five experts in pediatric cancer and genomics, led by the University of Tokyo Hospital, will discuss treatment methods for each case based on clinical information such as the patient's symptoms. The patient's doctor will then explain the potential treatments and other information to the patient's family.

The research results will also be anonymized and made available to pharmaceutical companies and research institutions for use in, among other things, the development of new pediatric medicines. The team envisions expanding the number of eligible patients in fiscal 2024 and beyond.

"We would like to make whole genome sequencing a standard test so that pediatric cancer patients can have it covered by their insurance, leading to the best possible treatment," said University of Tokyo Prof. Motohiro Kato, the team leader.

Follow this link:
Researchers to Apply Genome Analysis to Childhood Cancers; Goal ... - The Japan News

Posted in Genome | Comments Off on Researchers to Apply Genome Analysis to Childhood Cancers; Goal … – The Japan News

How Bats’ Genomes May Help Them Avoid Cancer and Survive … – Technology Networks

Posted: at 10:16 am

A new study has analyzed the genomes of bats to investigate their ability to tolerate viral infections and avoid cancer. The findings could have implications for our understanding of human cancers as well as of virus transmission from animals. The research is published in Genome Biology and Evolution.

Mice are some of the most commonly used animals in experiments that inform human health, but another mammal may be even more informative. Enter the bat, famed as the only mammal capable of flight but also for its longevity, low cancer rates and strong immune systems.

Bats' unusual immune systems allow them to better tolerate viral infections, though this can also spell danger for human health. They can play a key role in the spillover of viral infections into humans.

Studying bats' immune systems could reveal more about cancer development and provide insights into preventing the spread of disease from animals to people. However, research efforts to uncover exactly what makes bats' immune systems tick have been hampered by small sample sizes and limitations in genetic analysis approaches.

In the current study, researchers utilized long-read sequencing to carry out a comprehensive genomic analysis of two bat species, adding these to existing high-quality genomes to characterize the genetic features associated with their low cancer rates and robust immune responses.

The study's lead author, Dr. Armin Scheben, explained that the team compared 13 existing bat genomes plus their 2 new genomes against those of humans, mice, dogs, pigs and horses. "Our study increased the quantity of data by sampling 15 bat species and also increased the quality of data by using more complete genomes mainly generated with long-read DNA sequencing," said Scheben, a postdoctoral fellow at Cold Spring Harbor Laboratory, speaking to Technology Networks.

"We looked for changes in both gene gains and losses as well as more subtle adaptive changes in DNA sequences that make bats different from the other mammals," he added.

They investigated positive selection in cancer-related genes, defined as genes included in either the Tumor Suppressor Database or the Catalogue of Somatic Mutations in Cancer. They found evidence of positive selection in 33 tumor suppressor genes and 6 DNA repair genes, suggesting a link to bats' low rates of cancer and increased longevity. Strikingly, cancer-related genes were also enriched more than twofold in bat genomes compared to those of other mammals.

The researchers also found changes in type I interferon (IFN) genes, which are part of the innate immune system and help to activate antiviral responses. They observed a loss of IFN-α genes, while IFN-ω genes were relatively unaffected. By relying on the potentially more potent IFN-ω instead of IFN-α, bats may have improved antiviral responses, possibly contributing to their ability to tolerate viruses that can be transmitted to humans.

"We show that the bat immune system differs strongly from our own in a gene region known as the type I interferon locus," Scheben said. "By targeting this gene region and the proteins it produces with therapeutics, we may be able to treat infectious diseases better in humans. Similarly, bats show signs of genetic adaptations in many anti-cancer genes, which could inspire therapeutics to treat cancer."

Scheben goes on to explain that, although the research is somewhat limited because these genetic mechanisms were not tested experimentally, he considers the study to be more of a hypothesis generator. To dig deeper into these findings, the team is now developing what he calls "batified" mouse models: mice genetically modified to carry bat variants of genes.

"By testing these batified mice, we aim to better understand how bats resist infections and cancer," Scheben explains. "These findings can help other researchers, at universities and in industry, to prioritize specific genes and gene variants as targets for therapeutics."

Reference: Scheben A, Ramos OM, Kramer M, et al. Long-read sequencing reveals rapid evolution of immunity- and cancer-related genes in bats. 2023. Genome Biol. Evol. doi: 10.1093/gbe/evad148

Dr. Armin Scheben was speaking to Dr. Sarah Whelan, Science Writer for Technology Networks.

More here:
How Bats' Genomes May Help Them Avoid Cancer and Survive ... - Technology Networks

Posted in Genome | Comments Off on How Bats’ Genomes May Help Them Avoid Cancer and Survive … – Technology Networks
