A machine learning method for the identification and … – Nature.com

GuiltyTargets-COVID-19 web tool

We start by providing a high-level overview of the capabilities of the GuiltyTargets-COVID-19 web tool. The web application initially allows the user to browse through a ranked list of potential targets. This list was generated by applying six bulk RNA-Seq and three single-cell RNA-Seq datasets to a lung-specific protein-protein interaction (PPI) network reconstruction. The website also offers several filtering options so that users can quickly obtain the most relevant results. The candidate targets were ranked using a machine learning algorithm, GuiltyTargets [19], which quantifies how similar a candidate target is to other known (candidate) drug targets. Further details about GuiltyTargets are outlined in the Methods section of this paper.

The user can retrieve a consensus ranking for any desired combination of datasets (Fig. 1). For each listed protein, a color code shows its differential expression status (upregulated, downregulated, or not differentially expressed). The list also shows the protein's association with COVID-19 as described in the literature. This association is determined by an automated PubMed search for scientific articles that mention the protein together with COVID-19.
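The text above does not say how the consensus ranking is computed. The sketch below therefore assumes a simple mean-rank (Borda-style) aggregation, and should not be read as the tool's actual method. It orders proteins by their average position across the selected per-dataset rankings, and it breaks ties alphabetically so the output is deterministic. The dataset names and rankings are hypothetical toy inputs.

```python
from statistics import mean


def consensus_ranking(rankings: dict[str, list[str]]) -> list[str]:
    """Order proteins by their mean rank across the selected datasets.

    Assumption: each list is ordered best-first. A protein missing from some
    datasets is averaged only over the datasets in which it appears.
    """
    positions: dict[str, list[int]] = {}
    for ranked in rankings.values():
        for rank, protein in enumerate(ranked, start=1):
            positions.setdefault(protein, []).append(rank)
    # Sort by mean rank, then by name to make ties deterministic
    return sorted(positions, key=lambda p: (mean(positions[p]), p))


# Hypothetical per-dataset rankings (best candidate first)
rankings = {
    "bulk_A": ["AKT3", "MAPK11", "MLKL"],
    "bulk_B": ["MAPK11", "AKT3", "MLKL"],
}
print(consensus_ranking(rankings))
```

A mean rank is only one reasonable choice. Rank products or score averaging would weight disagreements between datasets differently.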

Although we provide nine different RNA-Seq datasets to explore, the tool also allows users to upload their own gene expression data. Uploaded data are processed by the GuiltyTargets algorithm. After a short time, a ranking of candidate proteins is made available for the user to download and explore.

To further elucidate how candidate targets are linked to known disease mechanisms, GuiltyTargets-COVID-19 lets users explore the neighborhood of any candidate target within the lung tissue-specific PPI network reconstruction (Fig. 2). The network is annotated with known human disease associations as well as virus-host interactions.

Importantly, to present the user with a list of possible drug candidates for a given protein, we parsed the ChEMBL database to map known ligands to each of the prioritized proteins. We then included this information in the web application. Direct links to each ligand's description page allow researchers to quickly explore each compound's profile.

To point out potential target-related safety issues, GuiltyTargets-COVID-19 lists the adverse effects of each target-linked compound, all derived from the NSIDES database [20]. With this information readily available, users can quickly judge which compounds for a given target are most viable.

Altogether, GuiltyTargets-COVID-19 implements a comprehensive workflow involving computational target prioritization supplemented with annotations from several key databases.

Screenshot of the GuiltyTargets-COVID-19 web application available at https://guiltytargets-covid.eu/.

In the following sections, we demonstrate the utility of GuiltyTargets-COVID-19 by analyzing six bulk RNA-Seq and three single-cell RNA-Seq datasets. A detailed overview of the data and workflow can be found in the Differential gene expression section of the Methods. In brief, GuiltyTargets-COVID-19 maps the differentially expressed genes in each of these datasets onto a lung tissue-specific, genome-wide PPI network. This network was constructed from BioGRID [21], IntAct [22] and STRING [23] (see PPI Network Construction in Methods). Users can select any combination of these datasets. For each selected dataset, the tool then ranks every protein by its similarity to known drug targets. If multiple datasets are selected, the tool also calculates a consensus ranking.
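The network construction and DGE mapping described above can be sketched as follows, under loudly stated assumptions. Edge lists from several interaction databases are merged into one undirected graph. "Lung tissue-specific" is approximated by keeping only edges whose two endpoints appear in a hypothetical set of lung-expressed genes. Each node is then labeled up, down, or not differentially expressed using hypothetical log fold-change and adjusted p-value cut-offs. The actual construction and thresholds are described in the paper's Methods, not here.

```python
def build_network(sources, tissue_genes):
    """Merge undirected edge lists from several interaction databases.

    Assumption: tissue specificity is approximated by keeping only edges
    whose endpoints are both in `tissue_genes` (a hypothetical input set).
    """
    adjacency = {}
    for edges in sources.values():
        for a, b in edges:
            if a == b or a not in tissue_genes or b not in tissue_genes:
                continue  # skip self-loops and proteins outside the tissue set
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)  # undirected: store both ways
    return adjacency


def annotate_dge(adjacency, log_fc, p_adj, lfc_cut=1.0, p_cut=0.05):
    """Label each node 'up', 'down' or 'none' (not differentially expressed).

    The cut-offs are hypothetical defaults, not the values used in the paper.
    """
    labels = {}
    for node in adjacency:
        lfc = log_fc.get(node, 0.0)
        significant = p_adj.get(node, 1.0) < p_cut
        if significant and lfc >= lfc_cut:
            labels[node] = "up"
        elif significant and lfc <= -lfc_cut:
            labels[node] = "down"
        else:
            labels[node] = "none"
    return labels


# Toy edge lists standing in for BioGRID, IntAct and STRING exports
sources = {
    "BioGRID": [("AKT3", "PIK3CA"), ("AKT3", "MLKL")],
    "IntAct": [("PIK3CA", "AKT3"), ("AKT3", "NOT_IN_LUNG")],
    "STRING": [("MLKL", "MLKL")],
}
network = build_network(sources, {"AKT3", "PIK3CA", "MLKL"})
labels = annotate_dge(
    network,
    {"AKT3": 1.5, "PIK3CA": -2.0, "MLKL": 0.3},
    {"AKT3": 0.01, "PIK3CA": 0.001, "MLKL": 0.2},
)
```

Storing the graph as a set-valued adjacency dictionary collapses duplicate interactions reported by more than one database into a single edge.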

For our analysis, we first computed a ranking for each individual dataset using the GuiltyTargets positive-unlabeled machine learning algorithm [19]. This algorithm combines a PPI network, a differential gene expression (DGE) dataset, and a list of network nodes labeled as putative targets. From these inputs, GuiltyTargets quantifies the probability that a candidate protein could also be labeled as a target. To train the model, GuiltyTargets-COVID-19 used a set of 218 proteins targeted by small compounds extracted from ChEMBL. These proteins were previously found to be involved in COVID-19-specific cellular response mechanisms that are transcriptionally dysregulated in several bulk RNA-Seq datasets [15]. The set of 218 proteins may thus be regarded as an extendable set of candidate targets. We chose this approach because very few drugs are currently approved for COVID-19 (7 in the European Union as of December 2022). A machine learning ranking based only on the known targets of approved drugs would therefore be questionable.
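To illustrate the positive-unlabeled (PU) setting in general terms, the sketch below uses PU bagging with a nearest-centroid classifier. This is a swapped-in, deliberately simple technique and does not reproduce GuiltyTargets, whose model is described in the Methods and in [19]. Known targets serve as positives. In each round, a random bag of unlabeled nodes is treated as pseudo-negatives, and every out-of-bag unlabeled node receives a vote when it lies closer to the positive centroid than to the pseudo-negative centroid. A node's score is its vote fraction. The node features are hypothetical two-dimensional vectors.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(features, positives, n_rounds=50, seed=0):
    """Score unlabeled nodes by PU bagging with a nearest-centroid classifier.

    Illustrative only; not the GuiltyTargets model.
    """
    rng = random.Random(seed)
    unlabeled = sorted(n for n in features if n not in positives)
    pos_centroid = centroid([features[n] for n in positives])
    votes = {n: 0 for n in unlabeled}
    rounds_scored = {n: 0 for n in unlabeled}
    k = min(len(positives), len(unlabeled))  # bag size balanced with positives
    for _ in range(n_rounds):
        bag = set(rng.choices(unlabeled, k=k))  # pseudo-negatives this round
        neg_centroid = centroid([features[n] for n in bag])
        for n in unlabeled:
            if n in bag:
                continue  # score only nodes left out of this round's bag
            rounds_scored[n] += 1
            if sq_dist(features[n], pos_centroid) < sq_dist(features[n], neg_centroid):
                votes[n] += 1
    return {n: votes[n] / rounds_scored[n] for n in unlabeled if rounds_scored[n]}


# Hypothetical features: T1/T2 are known targets, A resembles them, B-D do not
features = {
    "T1": [1.0, 1.0], "T2": [0.9, 1.1], "A": [1.0, 0.95],
    "B": [-1.0, -1.0], "C": [-0.9, -1.1], "D": [-1.1, -0.9],
}
scores = pu_bagging_scores(features, {"T1", "T2"})
```

Bagging matters here because the unlabeled set may contain undiscovered targets. Sampling different pseudo-negative subsets keeps any single hidden positive from dominating the negative class.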

To maximize transparency, GuiltyTargets-COVID-19 also reports the ranking performance of the GuiltyTargets algorithm, measured as the cross-validated area under the receiver operating characteristic curve (AUC). As shown in Fig. 6, the cross-validated AUCs for the nine datasets used in this work ranged between 85% and 90%, in line with the results reported in [19]. Additional details on the algorithm's performance can be found in the Methods section.
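As a reminder of what the reported number measures, the AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. The sketch below computes it directly from that pairwise (Mann-Whitney) definition and averages it over folds. It assumes each fold supplies held-out scores with binary labels. In a positive-unlabeled setting, unlabeled nodes would typically be treated as label 0 for this purpose, which is an assumption rather than a detail given in the text.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney formulation; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def cross_validated_auc(folds):
    """Average per-fold AUCs; `folds` is a list of (scores, labels) pairs."""
    return sum(roc_auc(s, y) for s, y in folds) / len(folds)


# Toy held-out predictions for two folds
folds = [
    ([0.9, 0.1], [1, 0]),
    ([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]),
]
print(cross_validated_auc(folds))
```

The pairwise loop is quadratic, which is fine for illustration. A rank-based formula gives the same value in O(n log n) for larger folds.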

First-degree neighbors of the (a) AKT3 and (b) PIK3CA proteins. Nodes are colored according to their associations: light orange means that no viral or human association was found, dark orange indicates a human association only, purple signifies a viral association, and dark blue marks proteins associated with both viral mechanisms and human processes. The neighboring proteins of AKT3 and PIK3CA and their associations are listed in Supplementary Data S1 and S2, respectively.

For our use case, we focused on proteins with a predicted target likelihood above 85% in each of the nine datasets. This yielded 51 to 67 candidate targets per bulk RNA-Seq dataset and 45 to 65 candidate targets per scRNA-Seq dataset. By enabling the "novel" filter option in the web tool, we can restrict the results to prioritized targets that are not among the original 218 proteins labeled as known targets and used to train the model.
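The two filters just described (a likelihood above 0.85 and the "novel" option that excludes the 218 training targets) can be sketched as follows. The function name and the score dictionary are hypothetical, and `KNOWN_TARGET` is a placeholder for any protein from the training set. The comparison is strict (`>`) to match "higher than 85%".

```python
def prioritize(probabilities, known_targets, threshold=0.85, novel_only=True):
    """Return proteins above `threshold`, best first.

    With novel_only=True, proteins already used as training labels are
    dropped, mirroring the web tool's "novel" filter option.
    """
    hits = {p: s for p, s in probabilities.items() if s > threshold}
    if novel_only:
        hits = {p: s for p, s in hits.items() if p not in known_targets}
    # Highest likelihood first; ties broken by name for stable output
    return sorted(hits, key=lambda p: (-hits[p], p))


# Hypothetical predicted likelihoods for one dataset
probabilities = {"AKT3": 0.93, "PIK3CA": 0.88, "MLKL": 0.80, "KNOWN_TARGET": 0.95}
known_targets = {"KNOWN_TARGET"}
print(prioritize(probabilities, known_targets))
```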

Among these prioritized targets, the bulk RNA-Seq datasets differed considerably: only a single protein, AKT3, appeared among the top candidates in all six datasets (Fig. 3). AKT3 is of great interest in COVID-19 research because the PI3K/AKT signaling pathway plays a central role in cell survival. Moreover, researchers have observed an association between this pathway and coagulopathies in SARS-CoV-2-infected patients [24]. It has been suggested that the PI3K/AKT signaling pathway can be over-activated in COVID-19 patients through direct or indirect mechanisms. This suggests that the pathway may serve as a potential therapeutic target [25].

To better understand the relationship of AKT3 with known COVID-19 disease mechanisms, the user can also download a CSV file of the direct (first-degree) neighbors of AKT3 in the lung tissue-specific PPI network used for our analysis. Each first-degree neighbor is annotated to indicate whether the corresponding protein is associated with the disease or with the virus itself. Figure 2a visualizes the AKT3 neighbor network, generated using Cytoscape 3.9.1 [26].
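A minimal sketch of such a neighbor export follows, assuming the network is held as a set-valued adjacency dictionary. It also assumes that human and viral associations are available as two sets of protein names. The column names and the four category labels are hypothetical. The labels mirror the four node colors used in Fig. 2 (light orange, dark orange, purple, dark blue), and the actual file layout offered by the web tool may differ. The neighbor names are placeholders with made-up associations.

```python
import csv
import io


def association_category(in_human, in_viral):
    """Map the two association flags onto the four node colors of Fig. 2."""
    if in_human and in_viral:
        return "both"  # dark blue
    if in_viral:
        return "viral"  # purple
    if in_human:
        return "human"  # dark orange
    return "none"  # light orange


def neighbors_csv(adjacency, protein, human_assoc, viral_assoc):
    """Return a CSV table of the first-degree neighbors of `protein`."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["protein", "neighbor", "category"])
    for neighbor in sorted(adjacency.get(protein, ())):
        category = association_category(neighbor in human_assoc, neighbor in viral_assoc)
        writer.writerow([protein, neighbor, category])
    return buffer.getvalue()


# Hypothetical neighbors and association sets
adjacency = {"AKT3": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}}
table = neighbors_csv(adjacency, "AKT3", {"GENE_A", "GENE_C"}, {"GENE_B", "GENE_C"})
print(table)
```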

A larger number of shared prioritized targets was found in the scRNA-Seq data. Across the 17 cell types identified in the three datasets, four common target candidates were identified: AKT2, AKT3, MAPK11, and MLKL. AKT3 and its isoform AKT2 are both among the prioritized targets. This supports the association of the PI3K/AKT signaling pathway with COVID-19 that our analysis of the bulk RNA-Seq datasets suggested. Our analysis of the single-cell datasets also revealed two additional proteins of interest, MAPK11 and MLKL. MAPK11 is targeted by the compound losmapimod, which was tested against COVID-19 in a phase III clinical trial (NCT04511819). The trial was terminated in August 2021. The stated reasons were the rapidly evolving environment for the treatment of COVID-19 and ongoing challenges in identifying and enrolling qualified patients (https://clinicaltrials.gov/ct2/show/NCT04511819). MLKL is a pseudokinase that plays a key role in TNF-induced necroptosis, a programmed cell death process. Recent evidence suggests that MLKL can become dysregulated by the inflammatory response to SARS-CoV-2 infection [27]. According to the DGIdb database [28] (which is cross-referenced by GuiltyTargets-COVID-19), the protein is also druggable and thus may serve as a therapeutic target.
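Finding candidates shared by every dataset or cell type, such as the single bulk target AKT3 or the four single-cell targets above, amounts to a set intersection. This corresponds to the central region of a Venn diagram. The sketch below assumes each dataset's prioritized targets are available as an iterable of protein names. The cell-type names and the extra `GENE_X`/`GENE_Y` entries are hypothetical placeholders.

```python
from functools import reduce


def shared_targets(prioritized):
    """Return the targets prioritized in every dataset or cell type."""
    sets = [set(targets) for targets in prioritized.values()]
    return reduce(set.intersection, sets) if sets else set()


# Hypothetical prioritized lists for three cell types
prioritized = {
    "cell_type_1": ["AKT2", "AKT3", "MAPK11", "MLKL", "GENE_X"],
    "cell_type_2": ["AKT2", "AKT3", "MAPK11", "MLKL"],
    "cell_type_3": ["AKT2", "AKT3", "MAPK11", "MLKL", "GENE_Y"],
}
print(sorted(shared_targets(prioritized)))
```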

Overall, these results demonstrate that GuiltyTargets-COVID-19 can identify candidate targets with a clear disease association and assess their potential druggability.

Venn diagram of the number of prioritized targets from the bulk RNA-Seq datasets.

After analyzing the top-ranked protein targets shared within each group of RNA-Seq data, we next characterized the candidates found only in individual cell types (Table 1). Notably, PIK3CA was ranked among the top therapeutic candidates only in goblet cells. Goblet cells are modified epithelial cells that secrete mucus onto the surface of mucous membranes, particularly those of the lower digestive tract and the airways. Dactolisib, a compound targeting PIK3CA, was tested in a phase II clinical trial for its ability to reduce COVID-19 disease severity (NCT04409327). The trial was terminated due to an insufficient accrual rate (https://clinicaltrials.gov/ct2/show/NCT04409327). Figure 2b depicts PIK3CA and its first-degree neighbors in the PPI network used by the GuiltyTargets-COVID-19 algorithm.
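The complementary question, which candidates are unique to one cell type (such as PIK3CA in goblet cells), is a set difference. The sketch takes each cell type's prioritized set and subtracts the union of all other cell types' sets. The input dictionary is a hypothetical toy example and is not taken from Table 1.

```python
def unique_targets(prioritized):
    """For each cell type, return the targets prioritized in that cell type only."""
    unique = {}
    for name, targets in prioritized.items():
        # Union of every other cell type's prioritized targets
        others = set().union(*(set(v) for k, v in prioritized.items() if k != name))
        unique[name] = set(targets) - others
    return unique


# Hypothetical toy input
prioritized = {
    "goblet": {"AKT3", "PIK3CA"},
    "NKT": {"AKT3", "PLA2G2A"},
    "secretory": {"AKT3"},
}
print(unique_targets(prioritized))
```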

Another interesting compound identified in our analysis is varespladib, which is currently being tested in a phase II clinical trial (NCT04969991). Varespladib targets PLA2G2A, a candidate target prioritized specifically in NKT cells (Table 1). To help users find more information about the disease context of such candidate targets, GuiltyTargets-COVID-19 also links to PubMed articles that discuss the protein and its role in COVID-19. The identification of relevant articles is described in the Methods section.
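How the tool identifies relevant articles is described in the Methods, not here. As an illustration only, the sketch below assumes a query against NCBI's E-utilities ESearch endpoint. It builds the request URL without sending it. The query term (the protein name in title or abstract combined with COVID-19 or SARS-CoV-2) is a hypothetical choice, not the paper's search strategy.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_query_url(protein, retmax=20):
    """Build an ESearch URL for PubMed articles mentioning `protein` and COVID-19.

    The search term is a hypothetical example; adjust it to the desired strategy.
    """
    term = f'"{protein}"[tiab] AND (COVID-19[tiab] OR SARS-CoV-2[tiab])'
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


url = pubmed_query_url("PLA2G2A")
print(url)
```

Fetching this URL returns a JSON list of matching PubMed IDs. Each ID can be turned into an article link of the form https://pubmed.ncbi.nlm.nih.gov/<PMID>/.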

Altogether, these results demonstrate that the tool presented here can be used for cell type-specific target prioritization and can help characterize the prioritized proteins in the context of COVID-19.

GuiltyTargets-COVID-19 also includes a feature for identifying small-compound ligands from the ChEMBL database with reported activity (pChEMBL > 5) against candidate targets. In our use case, we identified 186 ligands for AKT3, the top prioritized target across the bulk RNA-Seq datasets. Furthermore, 126 ligands were mapped to the four candidate targets found in all single-cell RNA-Seq datasets. Table 2 reports the number of ligands mapped to the protein targets unique to each cell type. The number of mapped ligands was highly imbalanced across cell types, with the vast majority of compounds targeting proteins prioritized in secretory cells.
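The pChEMBL value is the negative log10 of a molar potency measure, so pChEMBL > 5 means potency better than 10 µM. The sketch below applies that cut-off to activity records, maps each target to its distinct active compounds, and then counts distinct ligands per group of targets, as in a per-cell-type tally. It assumes the activities have already been extracted as (target, molecule ID, pChEMBL value) tuples, with `None` where ChEMBL reports no pChEMBL value. The `CHEMBL_A`-style IDs and the group names are hypothetical placeholders, not real ChEMBL identifiers.

```python
from collections import defaultdict


def map_ligands(activities, min_pchembl=5.0):
    """Map each target to the set of compounds with pChEMBL > min_pchembl."""
    ligands = defaultdict(set)
    for target, molecule, pchembl in activities:
        if pchembl is not None and pchembl > min_pchembl:
            ligands[target].add(molecule)  # a set counts repeated assays once
    return dict(ligands)


def ligands_per_group(ligands, groups):
    """Count distinct ligands over all targets assigned to each group."""
    return {
        group: len(set().union(*(ligands.get(t, set()) for t in targets)))
        for group, targets in groups.items()
    }


# Hypothetical activity records: (target, molecule_id, pchembl_value)
activities = [
    ("AKT3", "CHEMBL_A", 7.2),
    ("AKT3", "CHEMBL_B", 4.8),    # below the cut-off
    ("AKT3", "CHEMBL_A", 6.1),    # duplicate measurement of the same compound
    ("MLKL", "CHEMBL_C", None),   # no pChEMBL value reported
    ("MLKL", "CHEMBL_D", 5.5),
    ("MAPK11", "CHEMBL_A", 6.0),  # same compound hitting a second target
]
ligands = map_ligands(activities)
counts = ligands_per_group(ligands, {"shared": ["AKT3", "MLKL", "MAPK11"], "other": ["PIK3CA"]})
```

Counting distinct compounds per group, rather than summing per-target counts, avoids double-counting promiscuous ligands such as `CHEMBL_A` above.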

In total, these results demonstrate the ability of GuiltyTargets-COVID-19 to efficiently identify active ligands against candidate targets, thus supporting researchers in rapidly identifying potential new drugs for therapeutic intervention or repurposing.

For new target candidates, the adverse events associated with the drugs that target these proteins must also be taken into consideration. To better assess the suggested therapeutics, we mapped significant adverse effects from the NSIDES database (http://tatonettilab.org/offsides) to the extracted ChEMBL compounds. Each protein can thus be visualized together with the ligands that target it and any side effects associated with those compounds. To showcase this feature, Fig. 4 depicts the AKT3 protein together with its associated ligands and their side effects as shown in the GuiltyTargets-COVID-19 web application.
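The protein-ligand-side-effect network described above can be sketched as a list of typed edges. The sketch assumes a target-to-ligand mapping like the one built from ChEMBL and side-effect records given as (compound, effect, p-value) tuples that have already been matched to the same compound identifiers. The matching step and the significance rule (p < 0.05 here) are hypothetical simplifications, not the procedure used with NSIDES. The compound IDs and effects are toy placeholders.

```python
def adverse_effect_network(protein, ligands, side_effect_records, alpha=0.05):
    """Build protein -> compound -> side-effect edges for one protein.

    Only side effects with p_value < alpha are kept (a hypothetical rule).
    """
    significant = {}
    for compound, effect, p_value in side_effect_records:
        if p_value < alpha:
            significant.setdefault(compound, set()).add(effect)
    edges = []
    for compound in sorted(ligands.get(protein, ())):
        edges.append((protein, compound, "targets"))
        for effect in sorted(significant.get(compound, ())):
            edges.append((compound, effect, "associated_with"))
    return edges


# Hypothetical toy inputs
ligands = {"AKT3": {"CHEMBL_A", "CHEMBL_B"}}
records = [
    ("CHEMBL_A", "nausea", 0.001),
    ("CHEMBL_A", "headache", 0.01),
    ("CHEMBL_A", "rash", 0.3),     # not significant, dropped
    ("CHEMBL_B", "fatigue", 0.2),  # not significant, dropped
]
edges = adverse_effect_network("AKT3", ligands, records)
```

An edge list like this can be loaded directly into a network viewer such as Cytoscape, with the third field used as the edge type.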

Screenshot of part of the adverse effect network for the AKT3 protein.
