Genome wide analysis revealed conserved domains involved in the effector discrimination of bacterial type VI secretion … – Nature.com

Posted: November 30, 2023 at 8:35 pm

Construction of the VgrG database

MIX domains, encoded either as stand-alone genes or fused to the N-terminus of a toxin, can assist the delivery of their cognate T6SS effectors19,20. As the central component of the spike complex, VgrG is a good marker for exploring potential conserved domains involved in the delivery of T6SS effectors. We therefore set out to create a comprehensive dataset of VgrG proteins from the Gram-negative bacterial genome sequences available in the public GenBank database.

Previous studies have revealed that the Afp8 proteins of extracellular contractile injection systems (eCISs) are homologous to VgrG proteins and could therefore compromise the integrity of the dataset24,25,26. We therefore first downloaded 872 experimentally verified VgrG proteins from the established SecReT6 T6SS database27 as a positive control dataset, to help exclude false-positive hits such as Afp8 homologs. A bioinformatic scan for conserved domains confirmed that the VgrG domain (accession: COG3501) was present in all 872 verified VgrG proteins, as well as in 472 Afp8 proteins available from the dbeCIS database26. Importantly, the VgrG domains identified in 861 (99%) of the verified VgrGs range between 451 and 750 amino acids in length, whereas only 10 (2%) Afp8 proteins fall within this size range (Fig. 1a). We therefore proposed this length range as an empirical criterion for systematically screening the 133,722 publicly available bacterial genomes for bona fide VgrG proteins (Fig. 1a). Using this approach, a total of 130,825 VgrG proteins were identified from 45,041 Gram-negative bacterial genomes.
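The length criterion can be illustrated with a short Python sketch. It assumes each CD-search hit has been reduced to a protein ID plus the start and end coordinates of the COG3501 domain; the `DomainHit` record, the helper names and the toy coordinates are hypothetical and are not taken from the study's pipeline. For reference, the counts reported above correspond to 861/872 ≈ 98.7% of verified VgrGs and 10/472 ≈ 2.1% of Afp8 proteins falling inside the 451-750 aa window.

```python
from __future__ import annotations

from dataclasses import dataclass

# Length window (amino acids) for the VgrG (COG3501) domain, as stated in the text.
MIN_LEN, MAX_LEN = 451, 750


@dataclass
class DomainHit:
    """Hypothetical minimal record of one COG3501 hit on a protein."""
    protein_id: str
    start: int  # 1-based first residue of the domain
    end: int    # 1-based last residue of the domain (inclusive)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def passes_length_filter(hit: DomainHit) -> bool:
    """Keep a hit only if its domain length lies inside the empirical window."""
    return MIN_LEN <= hit.length <= MAX_LEN


def fraction_passing(hits: list[DomainHit]) -> float:
    """Fraction of hits retained by the filter (0.0 for an empty list)."""
    if not hits:
        return 0.0
    return sum(passes_length_filter(h) for h in hits) / len(hits)


# Toy data with invented IDs and coordinates (not from SecReT6 or dbeCIS).
verified = [DomainHit("vgrg_a", 1, 600), DomainHit("vgrg_b", 10, 520), DomainHit("vgrg_c", 1, 300)]
afp8 = [DomainHit("afp8_a", 1, 300), DomainHit("afp8_b", 5, 250)]
print(fraction_passing(verified), fraction_passing(afp8))
```

In a genome-wide scan the same predicate would simply be applied to every COG3501 hit; the contrast between the retained fractions of the positive and negative sets is what justifies the chosen window.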

a The workflow for the identification of valid VgrGs from 133,722 publicly available bacterial genomes. The 872 VgrGs available from the established T6SS database SecReT6 (red) and 472 putative Afp8 proteins encoding VgrG domains, available from the eCIS database dbeCIS (green), were used as positive and negative datasets, respectively, to select the empirical criterion for large-scale VgrG screening. b The 872 VgrGs available from the SecReT6 database with predefined subtype information are indicated by colored stars (key). VgrGs of subtypes i4a and i5 are intermixed within the same clade, consistent with the close relationship between these two subtypes reported previously27. The known type iii T6SS clade, derived mostly from Bacteroidetes, is highlighted with red shading.

To further characterize the VgrG proteins identified above, we constructed a maximum-likelihood (ML) phylogenetic tree based specifically on the sequences of the conserved VgrG domains (Fig. 1b). Using the 872 previously defined VgrGs as reference points, we observed that the ML tree grouped T6SS types/subtypes with an overall topology similar to that described previously27, supporting the validity of our approach.

First, we screened the VgrG database described above for MIX-containing proteins. A total of 7208 MIX-containing proteins were identified within vgrG loci, widely distributed among various bacteria (Supplementary Fig. 1). Importantly, MIX domains located between vgrG and the downstream effector gene exhibit multiple encoding configurations, including stand-alone proteins and fusions to the C-terminus of VgrG or the N-terminus of the effector (Supplementary Fig. 2).

Based on the encoding features of the MIX domain, we then developed a screening strategy to identify, within the vgrG loci of the VgrG database, other conserved domains with similarly diverse encoding configurations (Fig. 2). In brief, we scanned up to three genes downstream of each vgrG to collect the conserved domains within the proteins located between vgrG and the downstream toxin gene (if present). A domain family was reported if it was present in both encoded forms: as a stand-alone gene (i.e., single form) and fused to either the C-terminus of VgrG or the N-terminus of a toxin (i.e., fusion form). Finally, to explore the presence of these domain families within vgrG loci in finer detail, we extended our search without requiring linkage to known toxins, in order to identify further candidate domain-containing proteins within vgrG loci (Fig. 2).
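The reporting rule can be sketched in Python under simplifying assumptions: each protein is represented only by its ordered list of domain names, a small hypothetical set of toxin domains stands in for the curated toxin families, and any protein lacking both a VgrG domain and a toxin domain is treated as a single form. The function names and the `DUF_X` domain are illustrative and not part of the authors' workflow.

```python
from __future__ import annotations

from collections import defaultdict

VGRG_DOMAIN = "COG3501"  # VgrG domain accession used in the text
# Illustrative stand-in for the curated toxin domain set (hypothetical subset).
KNOWN_TOXINS = {"M35", "Lyz_like", "NLPC_P60"}


def encoding_forms(domains: list[str]) -> dict[str, str]:
    """Classify each candidate domain on one protein (domains ordered N- to C-terminus)."""
    forms: dict[str, str] = {}
    toxin_positions = [i for i, d in enumerate(domains) if d in KNOWN_TOXINS]
    if VGRG_DOMAIN in domains:
        # Candidate domains C-terminal to the VgrG domain count as VgrG fusions.
        for d in domains[domains.index(VGRG_DOMAIN) + 1:]:
            if d not in KNOWN_TOXINS:
                forms[d] = "vgrg_C_fusion"
    elif toxin_positions:
        # Candidate domains N-terminal to the first toxin domain count as toxin fusions.
        for d in domains[:toxin_positions[0]]:
            forms[d] = "toxin_N_fusion"
    else:
        # A protein with neither a VgrG nor a toxin domain is a stand-alone (single) form.
        for d in domains:
            forms[d] = "single"
    return forms


def report_families(proteins: list[list[str]]) -> list[str]:
    """Return domain families seen both as a single form and as at least one fusion form."""
    seen: dict[str, set[str]] = defaultdict(set)
    for domains in proteins:
        for d, form in encoding_forms(domains).items():
            seen[d].add(form)
    fusion = {"vgrg_C_fusion", "toxin_N_fusion"}
    return sorted(d for d, forms in seen.items() if "single" in forms and forms & fusion)


proteins = [
    ["COG3501", "DUF2345", "M35"],  # VgrG2b-like: DUF2345 fused to the VgrG C-terminus
    ["DUF2345", "M35"],             # DUF2345 fused to the toxin N-terminus
    ["DUF2345"],                    # stand-alone DUF2345
    ["COG3501", "DUF_X"],           # hypothetical domain seen only as a fusion
]
print(report_families(proteins))
```

Here `DUF_X` is dropped because it never occurs as a single form, while DUF2345 qualifies through all three configurations.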

For each vgrG locus, up to three consecutive downstream genes encoded on the same strand as vgrG, with an intergenic distance of <1 kb between adjacent genes, were collected. Known components of the T6SS operon and any annotated pseudogenes were excluded. The remaining 280,581 downstream genes were then scanned for conserved domains by batch CD-search. A total of 1321 putative toxin domain families were deduced from a collection of 928 experimentally verified exotoxins/effectors available from the VFDB database53. Each domain family identified within the downstream gene dataset was further classified into one of three cases for final manual curation and determination.
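A minimal Python sketch of this neighborhood rule follows, assuming the genes of one contig are already sorted by coordinate and carry flags for T6SS core components and pseudogenes. The `Gene` record and its flags are hypothetical. The text does not say whether excluded genes count toward the three-gene limit; here the three-gene window is fixed first and exclusions are applied afterwards, which is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_GENES = 3    # at most three consecutive downstream genes
MAX_GAP = 1000   # intergenic distance must be < 1 kb


@dataclass
class Gene:
    gene_id: str
    start: int          # lower genomic coordinate (1-based)
    end: int            # higher genomic coordinate (1-based, inclusive)
    strand: str         # "+" or "-"
    is_t6ss_core: bool = False
    is_pseudo: bool = False


def downstream_genes(genes: list[Gene], vgrg_index: int) -> list[Gene]:
    """Collect genes downstream of genes[vgrg_index]; `genes` is one contig sorted by start."""
    vgrg = genes[vgrg_index]
    step = 1 if vgrg.strand == "+" else -1  # "downstream" runs leftwards on the minus strand
    window: list[Gene] = []
    prev = vgrg
    i = vgrg_index + step
    while 0 <= i < len(genes) and len(window) < MAX_GENES:
        g = genes[i]
        if g.strand != vgrg.strand:
            break  # same-strand requirement
        gap = g.start - prev.end - 1 if step == 1 else prev.start - g.end - 1
        if gap >= MAX_GAP:
            break  # adjacent genes must be < 1 kb apart
        window.append(g)
        prev = g
        i += step
    # Exclusions are applied after the window is fixed (assumed order of steps).
    return [g for g in window if not (g.is_t6ss_core or g.is_pseudo)]


genes = [
    Gene("vgrG", 1, 2100, "+"),
    Gene("g1", 2150, 2900, "+"),
    Gene("g2", 2950, 3400, "+", is_pseudo=True),
    Gene("g3", 3500, 4000, "+"),
    Gene("g4", 4050, 4500, "+"),  # fourth gene, outside the window
]
print([g.gene_id for g in downstream_genes(genes, 0)])
```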

After screening and careful manual curation, DUF2345 (cl01733), FIX-like (cl41761), LysM (cl21525), 5 (cl33691), PG_binding_1 (cl38043) and PHA00368 (cl30808) were identified (Supplementary Table 1). As shown in Supplementary Fig. 3, in addition to the single form, each of these domain families occurs in at least one fusion form. Further, the FIX-like (cl41761), LysM (cl21525), 5 (cl33691) and PG_binding_1 (cl38043) families are found in both fusion forms. Notably, some of them are encoded adjacent to known T6SS adaptors, which implies that their functions may differ from those of T6SS adaptors.

Besides the MIX domain, three well-characterized T6SS adaptor families (DUF4123, DUF2169 and DUF1795) have been reported to mediate the interaction between VgrG and its cognate effectors. We therefore also screened vgrG loci for these adaptor families. Among the 130,825 vgrG loci, 37.44% encode one of the three adaptor domains and 3.14% encode a MIX domain, while a further 31.33% encode at least one of the six conserved domain families identified here. Only 28.09% of vgrG loci encode none of the adaptor, MIX or conserved domains mentioned above (Supplementary Fig. 4).
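Because the four reported percentages sum to exactly 100% (37.44 + 3.14 + 31.33 + 28.09), they appear to partition the loci into mutually exclusive categories. The Python sketch below shows one way such a partition could be computed; the priority order (adaptor, then MIX, then the conserved families) is an assumption, and the domain sets are abbreviated for illustration.

```python
from __future__ import annotations

from collections import Counter

# Percentages reported in the text; together they account for every vgrG locus.
REPORTED = {"adaptor": 37.44, "MIX": 3.14, "conserved": 31.33, "none": 28.09}
assert abs(sum(REPORTED.values()) - 100.0) < 1e-6

ADAPTORS = {"DUF4123", "DUF2169", "DUF1795"}
CONSERVED = {"DUF2345", "FIX-like", "LysM", "PG_binding_1"}  # illustrative subset of the six families


def categorize(locus_domains: set[str]) -> str:
    """Assign a locus to one category using an assumed priority: adaptor > MIX > conserved > none."""
    if locus_domains & ADAPTORS:
        return "adaptor"
    if "MIX" in locus_domains:
        return "MIX"
    if locus_domains & CONSERVED:
        return "conserved"
    return "none"


def category_percentages(loci: list[set[str]]) -> dict[str, float]:
    """Percentage of loci falling into each category."""
    counts = Counter(categorize(d) for d in loci)
    return {cat: 100 * n / len(loci) for cat, n in counts.items()}


toy_loci = [{"DUF4123", "DUF2345"}, {"MIX"}, {"LysM"}, set()]
print(category_percentages(toy_loci))
```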

Although DUF2345 is considered an extension of the VgrG gp5 domain, it is not encoded by all VgrGs6,28,29. Nevertheless, among the six conserved domains identified here, DUF2345 is the most frequently found in vgrG loci (Supplementary Table 1). We therefore explored its function in the T6SS. Three vgrG loci encoding the DUF2345 domain were found in Escherichia coli PAR and Pseudomonas aeruginosa strains PAO1 and PS42 (Fig. 3a). Sequence comparison indicated that AKO63_2953 (VgrGPAR), AKO63_2954 (DUF2345PAR) and AKO63_2955 (M35PAR) correspond, respectively, to the VgrG domain, the DUF2345 domain and the M35 (metallopeptidase) toxin domain of PA0262 (VgrG2bPA). Similarly, Q094_05019 (VgrGPS) of P. aeruginosa PS42 encodes a VgrG domain, whereas Q094_05020 encodes an N-terminal DUF2345 domain and a C-terminal M35 domain. AlphaFold v2.0 predicted that VgrGPAR, VgrGPS and the VgrG domain of VgrG2bPA adopt the same conformation (Supplementary Fig. 5a). Further, the E. coli PAR locus (VgrGPAR, DUF2345PAR and M35PAR), the PS42 locus (VgrGPS and Q094_05020) and VgrG2bPA are predicted to form similar trimeric structures, which implies that these three complexes might have similar biological functions (Supplementary Fig. 5b). As these three loci encode VgrG, toxin and immunity proteins, we speculate that DUF2345 may be involved in the interaction between VgrG and its cognate effector.

a The vgrG loci of E. coli PAR, P. aeruginosa PAO1 and PS42. b Western blot detection of VgrG2bPA and its truncated mutant VgrG2bPAΔM35 expressed in E. coli. Anti-RpoB is the lysis control. c Survival of E. coli expressing VgrG2bPA or its truncated mutant VgrG2bPAΔM35 from pET22b. Ten-fold serial dilutions of cultures were spotted on LB agar containing the stated concentrations of IPTG and grown for 24 h. The image is representative of three independent experiments. d Intraspecies P. aeruginosa competition assay between the ΔvgrG2bPAΔPA0261 prey strain and various isogenic attacker strains at 37 °C for 24 h. Competition between the parental strain (PAO1) and itself (gray) is the internal control. Values and error bars represent the mean ± SD (n = 3 biological replicates). A one-way ANOVA with Dunnett's test was performed using the parent versus prey competition as the comparator (*p < 0.05; ns, not significant). e Western blot detection of M35PAR, AKO63_2955-2956 or DUF2345PAR expressed in E. coli. Anti-RpoB is the lysis control. f Survival of E. coli expressing M35PAR, AKO63_2955-2956 or DUF2345PAR from pET22b. Ten-fold serial dilutions of cultures were spotted on LB agar containing the given concentrations of IPTG and grown for 24 h. The image is representative of three independent experiments. g Interactions between DUF2345PAR and VgrGPAR or M35PAR. Shown are immunoblots of lysates (total) and anti-FLAG immunoprecipitates (IP: FLAG) of DUF2345PAR transformed with empty vector or a plasmid encoding Myc-tagged VgrGPAR or S-tagged M35PAR. GFP and VgrGRPE are control proteins. h DUF2345PAR mediates the interaction between VgrGPAR and M35PAR. Shown are immunoblots of lysates (total) and anti-FLAG immunoprecipitates (IP: FLAG) of M35PAR transformed with a plasmid encoding either Myc-tagged VgrGPAR or S-tagged DUF2345PAR.

Wood et al. showed that VgrG2bPA and PA0261 constitute a T6SS antibacterial effector-immunity pair30. An E. coli toxicity assay was used to test whether the DUF2345 domain of VgrG2bPA is toxic to bacteria (Fig. 3b, c). As expected, VgrG2bPA was acutely toxic when overexpressed in E. coli, and co-expression of the immunity gene (PA0261) relieved this growth defect. Crucially, truncation of the M35 domain of VgrG2bPA restored growth, indicating that DUF2345 itself is not toxic to E. coli. Intraspecies P. aeruginosa competition assays were also performed to determine whether the DUF2345 domain affects the function of VgrG2bPA (Fig. 3d). Although the ΔvgrG2bPAΔPA0261 prey strain was significantly outcompeted by the wild-type PAO1 strain, it was no longer outcompeted by either the ΔclpV2PA or the ΔvgrG2bPA attacker strain. Notably, unlike the wild-type vgrG2bPA gene, complementation with vgrG2bPAΔDUF2345 did not restore the growth advantage of the attacker strain. Further, when expressed in the ΔvgrG2bPA strain, VgrG2bPAΔDUF2345 was detected only in the cells and not in the supernatant, although secretion of Hcp (the T6SS inner tube protein) was unaffected (Supplementary Fig. 6a). In addition, VgrG2bPAΔDUF2345 remained toxic to E. coli when it was localized to the periplasm (Supplementary Fig. 6b, c). These results indicate that deletion of the DUF2345 domain abolishes the antibacterial activity of VgrG2bPA by preventing its secretion rather than by inactivating its toxin domain.
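How the competition outcome is quantified is not detailed here; one common readout is a competitive index computed from attacker and prey colony counts before and after co-culture. The Python sketch below assumes that definition and uses invented CFU values. The group comparison itself (one-way ANOVA with Dunnett's test, as in the Fig. 3d legend) could be run with a statistics package, for example `scipy.stats.dunnett` in SciPy 1.11 or later.

```python
from __future__ import annotations

import statistics


def competitive_index(attacker_t0: float, prey_t0: float,
                      attacker_t24: float, prey_t24: float) -> float:
    """Attacker:prey ratio after co-culture divided by the ratio in the inoculum (CFU counts)."""
    return (attacker_t24 / prey_t24) / (attacker_t0 / prey_t0)


def summarize(replicates: list[tuple[float, float, float, float]]) -> tuple[float, float]:
    """Mean and sample SD of the competitive index over biological replicates."""
    cis = [competitive_index(*r) for r in replicates]
    return statistics.mean(cis), statistics.stdev(cis)


# Hypothetical CFU counts (t0 attacker, t0 prey, 24 h attacker, 24 h prey) for three replicates.
replicates = [(1e6, 1e6, 1e6, 1e6), (1e6, 1e6, 2e6, 1e6), (1e6, 1e6, 3e6, 1e6)]
mean_ci, sd_ci = summarize(replicates)
print(f"competitive index = {mean_ci:.2f} ± {sd_ci:.2f}")
```

An index near 1 means neither strain gained ground; an attacker unable to deliver its effector (as with the non-secreted DUF2345 deletion) would be expected to fall back towards that value.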

We next explored the function of DUF2345 when encoded as a separate gene, using the E. coli PAR locus, which also encodes VgrGPAR, M35PAR and the cognate immunity protein (Fig. 3a). An E. coli toxicity assay demonstrated that M35PAR has bacterial killing activity that is inhibited by its immunity protein (Fig. 3e, f). Consistent with the results in Fig. 3c, expression of DUF2345PAR alone had no deleterious effect on bacterial growth (Fig. 3f). Immunoprecipitation assays of proteins co-expressed in E. coli showed that DUF2345PAR specifically binds VgrGPAR and M35PAR, but not VgrGRPE (the VgrG of Burkholderia sp. RPE67) (Fig. 3g). Importantly, M35PAR could not interact with VgrGPAR in the absence of DUF2345PAR (Fig. 3h). These results suggest that DUF2345PAR bridges the interaction between VgrGPAR and M35PAR, assisting the loading of M35PAR onto the T6SS spike.

Taken together, these results indicate that the DUF2345 domain is indispensable for the delivery of its cognate toxin, whether it is fused to the C-terminus of VgrG or encoded as a separate gene.

Because DUF2345 is encoded either as a C-terminal fusion to VgrG or as a separate gene downstream of vgrG, we then investigated whether the sequences of VgrG domains correlate with those of their cognate DUF2345 domains. An iterative procedure was devised to hierarchically cluster the 52,277 VgrG domains and their cognate DUF2345 domains separately. At a 30% amino-acid sequence similarity cutoff, the VgrG domains formed three major clusters and ten outliers, whereas the DUF2345 domains were classified into 37 distinct groups (Supplementary Fig. 7). These findings imply that DUF2345 domains are considerably more diverse in sequence than the relatively conserved VgrG domains.
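The iterative procedure is not specified in the text. As a rough illustration of threshold-based sequence clustering, the Python sketch below swaps in a simpler technique, greedy incremental clustering in the style of CD-HIT, and uses the ratio from Python's `difflib.SequenceMatcher` as a crude stand-in for alignment-based similarity. Real analyses would rely on an aligner or a dedicated clustering tool, so the 30% threshold here is only approximate in meaning; the toy sequences are invented.

```python
from __future__ import annotations

from difflib import SequenceMatcher

THRESHOLD = 0.30  # 30% cutoff from the text, applied here to a crude difflib ratio


def similarity(a: str, b: str) -> float:
    """Crude stand-in for alignment-based similarity (not percent identity from a real aligner)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs: list[str], threshold: float = THRESHOLD) -> list[int]:
    """Greedy incremental clustering: longest sequences first; each sequence joins the first
    representative it matches at >= threshold, otherwise it founds a new cluster."""
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    reps: list[int] = []          # indices of cluster representatives
    labels = [-1] * len(seqs)     # cluster label per input sequence
    for i in order:
        for cluster_id, rep in enumerate(reps):
            if similarity(seqs[i], seqs[rep]) >= threshold:
                labels[i] = cluster_id
                break
        else:
            reps.append(i)
            labels[i] = len(reps) - 1
    return labels


toy = ["MKKLLAAVVG", "MKKLLAAVV", "WWPPHHQQRR", "WWPPHHQQR"]
print(greedy_cluster(toy))
```

Running the same function separately on VgrG domains and on DUF2345 domains, and comparing the number of clusters, mirrors the comparison made above.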

As shown above, DUF2345 is involved in the interaction between VgrG and the toxin protein. To explore this further, we performed a Sankey analysis of the relationship between DUF2345 domains and their downstream toxins. Interestingly, most DUF2345 clusters showed an obvious taxon-specific distribution and correlated well with their downstream toxins (Fig. 4). However, some toxins, such as the Lyz-like and DUF2235 domains, were associated with more than one DUF2345 cluster. To test whether this reflects the intrinsic sequence diversity of these toxins, an iterative procedure was applied to further subdivide these toxin groups. As expected, the sub-clusters of the Lyz-like and DUF2235 domains also correlated well with DUF2345 groups (Supplementary Fig. 8). Thus, our data reveal that DUF2345 domains exhibit high sequence diversity and correlate well with their downstream toxins.
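The Sankey edges amount to a cross-tabulation of DUF2345 clusters against downstream toxin domains. A minimal Python sketch, assuming each locus has already been reduced to one (cluster, toxin) pair, computes the edge weights and flags toxins linked to several clusters, which are the cases then sub-clustered in the text. The cluster labels are hypothetical, and drawing the diagram is left to any Sankey-capable plotting library.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def link_counts(pairs: list[tuple[str, str]]) -> Counter:
    """Edge weights for a DUF2345-cluster -> toxin Sankey diagram (one pair per locus)."""
    return Counter(pairs)


def toxins_with_multiple_clusters(pairs: list[tuple[str, str]]) -> list[str]:
    """Toxin domains linked to more than one DUF2345 cluster (candidates for sub-clustering)."""
    clusters_per_toxin: dict[str, set[str]] = defaultdict(set)
    for cluster, toxin in pairs:
        clusters_per_toxin[toxin].add(cluster)
    return sorted(t for t, cs in clusters_per_toxin.items() if len(cs) > 1)


# Hypothetical (DUF2345 cluster, downstream toxin) pairs; cluster labels are invented.
pairs = [("C1", "M35"), ("C1", "M35"), ("C2", "Lyz_like"), ("C3", "Lyz_like"), ("C4", "DUF2235")]
print(link_counts(pairs).most_common(1))
print(toxins_with_multiple_clusters(pairs))
```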

A Sankey diagram showing the relationships between bacterial phylum/class, family, the corresponding DUF2345 clusters and the downstream toxin domain families (from left to right). Only DUF2345-encoding loci with adjacent known toxin domains were included. Loci from genomes lacking the necessary taxonomic information were excluded. The number of sequences in each node is given after the node name. The red arrows on the right indicate toxins linked to more than one DUF2345 cluster.

LysM-containing proteins are core components of eCISs, which share several key homologous proteins with the T6SS and form a similar architecture31,32, but they have not been described as part of the T6SS. It is therefore intriguing that our systematic screening suggests the LysM domain is likely to be functional in the T6SS.

Figure 5a shows a vgrG locus encoding a LysM-containing protein in Ketobacter alkanivorans GI5. An E. coli toxicity assay showed that Kalk_10455 was acutely toxic and that co-expression of Kalk_10450 relieved this growth defect, indicating that Kalk_10450 is an immunity protein against Kalk_10455 (Fig. 5b, c). Notably, Kalk_10465 (VgrGG15) and Kalk_10460 (LysMG15) were not toxic when expressed in E. coli (Fig. 5c). Immunoprecipitation assays of proteins co-expressed in E. coli confirmed that Kalk_10455 specifically binds LysMG15 and VgrGG15; however, Kalk_10455 could not bind VgrGG15 in the absence of LysMG15 (Fig. 5d).

a The vgrG loci of Ketobacter alkanivorans GI5 and Burkholderia sp. RPE67. b Immunoblots showing the expression of VgrGG15, LysMG15 and Kalk_10455 in E. coli. Anti-RpoB is the lysis control. c Survival of E. coli expressing VgrGG15, LysMG15 and Kalk_10455 from pETduet. Ten-fold serial dilutions of cultures were spotted on LB agar containing the stated concentrations of IPTG and grown for 24 h. The image is representative of three independent experiments. d Interactions between VgrGG15, LysMG15 and Kalk_10455. Shown are immunoblots of lysates (total) and anti-FLAG immunoprecipitates (IP: FLAG) of Kalk_10455 and GFP transformed with a plasmid encoding Myc-tagged VgrGG15 or Strep-tagged LysMG15. 0423PA is a control protein. e Immunoblots showing the expression of BRPE_05220 and the NLPC_P60RPE domain in E. coli. Anti-RpoB is the lysis control. f Survival of E. coli expressing BRPE_05220 and the NLPC_P60RPE domain from pETduet. Ten-fold serial dilutions of cultures were spotted on LB agar containing the stated concentrations of IPTG and grown for 24 h. The image is representative of three independent experiments. g The LysM domain mediates the interaction between VgrGRPE and BRPE_05220. Shown are immunoblots of lysates (Input) and anti-FLAG immunoprecipitates (IP: FLAG) of BRPE_05220 or BRPE_05220ΔLysM transformed with a plasmid encoding either Myc-tagged VgrGRPE or Myc-tagged 0423PA. 0423PA is a control protein.

BRPE67_05220 of Burkholderia sp. RPE67, which contains both a LysMRPE and an NLPC_P60RPE domain, was used to further explore the function of the LysM domain (Fig. 5a). E. coli toxicity assays demonstrated that BRPE_05220 has bacterial killing activity. Moreover, expression of the NLPC_P60RPE domain alone was toxic, and this toxicity was inhibited by BRPE_05230 (Fig. 5e, f). Further, wild-type BRPE_05220, but not a LysM-truncated variant (BRPE_05220ΔLysM), co-immunoprecipitated with BRPE_05210 (VgrGRPE) (Fig. 5g).

AlphaFold v2.0 predicted that BRPE_05210 (VgrG) and BRPE_05220 (LysM and NLPC_P60) form a trimeric structure similar to that of VgrG2bPA, which implies that LysM may mediate the interaction between VgrG and the toxin (Supplementary Fig. 9). Further, phylogenetic analysis of LysM domains revealed that T6SS-related LysM domains are diverse and evolutionarily distinct from phage- and eCIS-associated LysM domains (Supplementary Fig. 10).

In sum, whether the toxin is encoded downstream of a LysM-containing gene or fused to the C-terminus of a LysM domain, it interacts with VgrG in a LysM-dependent manner, implying that LysM may assist the loading of its cognate effector onto the secretion apparatus.

DUF2345-containing proteins show specific correlations with their diverse downstream toxins (Fig. 4). A similar Sankey analysis was performed to investigate the relationship between the other five identified conserved domain families, together with the confirmed co-effector MIX, and their downstream toxins (Supplementary Fig. 11). Notably, most of the characterized toxin domains showed an obvious domain-specific distribution, with limited exceptions. For instance, RHS-containing proteins are polymorphic toxins that carry a conserved N-terminal RHS domain and variable C-terminal toxic domains13. Most members of the Rhs superfamily are linked to the FIX-like (cl41761) and 5 (cl33691) domains. LysM domains are mainly associated with the Lyz_like, NlpD and NLPC_P60 superfamilies. Because the domain families identified in this study, including FIX-like (cl41761), LysM (cl21525), 5 (cl33691), PG_binding_1 (cl38043) and PHA00386 (cl30808), share a genetic organization and a correlation with downstream toxins similar to those of the DUF2345 domain, it is reasonable to speculate that they also function in T6SS effector discrimination.

The overall distribution of the six conserved domain families was then analyzed (Fig. 6). Interestingly, these families are not evenly distributed among bacterial families. For example, although DUF2345 domains are widely distributed among Proteobacteria genomes, they are rarely encoded in genomes of the Vibrionaceae and Rhodospirillaceae. In contrast, the PG_binding_1 domain is restricted to γ-proteobacterial genomes, including those of the families Chromatiaceae, Sinobacteraceae and Vibrionaceae. In general, although these conserved domains are widely encoded among various bacteria, their distributions show clear taxonomic specificity, which mirrors that of their cognate effectors (Fig. 4 and Supplementary Fig. 11).

Only taxa with genomes encoding at least one of the six conserved domains within vgrG loci are shown, for brevity. A total of 55,228 vgrG loci are included; genomes without an assigned genus are excluded. The circles represent phylum, class, order, family and genus from inner to outer, and are color-coded by phylum/class (key). The family names are given outside the taxonomic tree. The outer heatmaps represent the percentage of genomes encoding the corresponding conserved domains for each genus (key).
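Each heatmap ring reduces to a per-genus percentage. A minimal Python sketch, assuming every genome is summarized as its genus plus the set of conserved domains found in its vgrG loci; the denominator (all genomes of that genus in the dataset) is an assumption, since the legend does not state it, and the genus and domain values are illustrative.

```python
from __future__ import annotations

from collections import defaultdict


def percent_genomes_per_genus(genomes: list[tuple[str | None, set[str]]], domain: str) -> dict[str, float]:
    """Percentage of genomes in each genus whose vgrG loci encode `domain`.
    Genomes without an assigned genus are skipped, as in the figure legend."""
    total: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for genus, domains in genomes:
        if not genus:
            continue
        total[genus] += 1
        if domain in domains:
            hits[genus] += 1
    return {g: 100 * hits[g] / total[g] for g in total}


# Illustrative genomes: (genus, conserved domains found in its vgrG loci).
genomes = [
    ("Pseudomonas", {"DUF2345"}),
    ("Pseudomonas", set()),
    ("Vibrio", {"PG_binding_1"}),
    (None, {"DUF2345"}),  # no assigned genus, excluded
]
print(percent_genomes_per_genus(genomes, "DUF2345"))
```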
