Description of datasets
In this study we, utilized five independent cohorts including four cancer- and one Alzheimer-related datasets. Gene expression profiling was done using RNA-seq and DNA methylation data were obtained using the Illumina Infinium HumanMethylation450 BeadChip, measuring DNA methylation levels (beta values) on about 450,000 genomic loci.
The TCGA cohorts were obtained using the TCGAbiolinks package50 (Version2.24.3). TCGA-LUSC22 and TCGA-LUAD23 had clinical and genomic data from 589 and 592 patients, respectively (Supplementary TableS2). Information on the pathological stages of the tumors was included in both datasets. We used this information to stratify the patients into distinct risk groups and compared the resulting stratification with clusters obtained from our approach.
TCGA-LIHC24 was provided by a comprehensive study that included 436 cases with clinical information available in the data. We used the Ishak fibrosis score51 and alpha-fetoprotein (AFP) level52,53,54,55,56 to stratify patients into low-, intermediate-, and high-risk groups. The employed score is described later in this section.
TCGA-L AML was provided by a thorough genomic and epigenomic study on 200 adult cases with AML25. The risk groups were defined based on cytogenetic abnormalities25,57.
In addition, we used the ROSMAP cohort provided by the longitudinal cohort studies of aging and dementia. We downloaded the ROSMAP dataset from accelerating Medicines Partnership- AD58 with Synapse IDs syn3388564 (bulk RNA-seq) and syn5850422 (DNA methylation), using the synapser (https://r-docs.synapse.org/articles/synapser.html) R package (Version0.6.61) and a custom R scripts (Version3.6.1)59.
In the TCGA cohorts, events were defined by patients death and the time to an event referred to the duration from the initial diagnosis to death time or the last follow-up. In the ROSMAP cohort, the event was clinical diagnosis of any dementia including mild cognitive impairment with or without other cognitive conditions, Alzheimers dementia with or without other cognitive conditions, and other primary causes of dementia without clinical evidence of Alzheimers dementia. The time to an event in this context referred to the age at which the first dementiarelated diagnosis was made.
To enhance the power of our network, we included cases that have either a single type of data (i.e., gene expression or DNA methylation) or both data available. In the survival analysis, we included only patients whose gene expression, DNA methylation, and survival data were available (Supplementary TableS2).
The initial step in preprocessing involves normalizing the gene expression data. This is accomplished via a logarithmic transformation in based 10 to stabilize the variance and make the data more amenable to following analyses. Because logarithm of zero is not defined, a small offset is added to the expression levels prior to applying this transformation. iNETgrate further preprocesses data in two steps: cleaning and filtering. The former step involved cleaning DNA methylation and clinical data using the wrapper function cleanAllData(). Loci with more than (50%) missing beta values were removed, while loci with less than (50%) missing values were imputed. The imputation was performed by replacing each missing value with the mean of the beta values for the corresponding locus (preprocessDnam()). The clinical data was subsequently cleaned by removing cases with missing survival time and status (prepareSurvival()). The cleaned survival data had patient information including ID, events, time, and risk based on the clinical gold standard.
The second step in the preprocessing data was filtering out genes and loci that have a weak absolute Pearson correlation with survival time and vital status. This was performed by calling electGenes() inside the cleanAllData() wrapper function. In this study, we set the absolute correlation coefficient cutoffs to 0.2 in all TCGA datasets and 0.1 in the ROSMAP dataset.
Every gene and locus that met the quality control criteria was retained for the subsequent steps. In addition, we used computeUnion() to include corresponding loci of the selected genes and corresponding genes of the selected loci in the next steps of analysis.
In iNETgrate, every node represents a gene with two features (i.e., gene expression and DNA methylation values). Therefore, we needed to calculate the DNA methylation value for each gene using computEigenloci(). This function calculated a weighted average of loci levels for their corresponding gene in the following way. When the number of loci corresponding to a gene was less than six, the first principal component (i.e., eigenloci) was calculated directly by taking a weighted average of beta values using PCA. This was the case for almost (95%) of loci in our datasets (Supplementary Fig.S1).
For the remaining (5%) of cases, in which the number of loci representing a gene was six or more, we used findCore() to determine the most connected cluster of loci for each gene. Specifically, a graph was constructed for each gene using the igraph package (Version 1.5.0). In this graph, each locus is represented by a node. We used a fast greedy algorithm60 to calculate the pairwise correlation between loci and detected communities (i.e., clusters) in the graph. Within each community, the average pairwise correlation was computed. The community with the highest average pairwise correlation was identified as a dense subset of highly co-methylated loci in the graph, and the eigenloci value was then computed based on this subset.
We constructed a network in which nodes represent genes and edges are weighted based on the absolute correlation of gene expression and DNA methylation levels for each pair of genes. This was achieved using the makeNetwork() function. The weight of the edges between genes (g_i) and (g_j) was calculated using the following equation:
$$begin{aligned} mathscr {W}(g_i,g_j)=(1-mu )|{{,textrm{cor},}}_E(g_i,g_j)|+ mu |{{,textrm{cor},}}_M(g_i,g_j)|, end{aligned}$$
(1)
Here, (mathscr {W}(g_i,g_j)) describes the integrated similarity between genes (g_i) and (g_j). The term (|{{,textrm{cor},}}_E(g_i,g_j)|) represents the absolute value of the Pearson correlation between the gene expression levels of genes (g_i) and (g_j). Similarly, (|{{,textrm{cor},}}_M(g_i,g_j)|) represents the absolute value of the Pearson correlation between the DNA methylation levels of these two genes. The hyperparameter (mu ) is an integrative factor controlling the relative contributions of gene expression and DNA methylation data in the network. When (mu =0), the network is based solely on gene expression data. Increasing the value of (mu ) emphasizes the DNA methylation data in the model, whereas (mu =1) indicates that only DNA methylation data is used in calculating the edge weights (i.e., gene similarities).
Construction of the network and identification of the modules were done by the wrapper function makeNetwork(), which first uses the pickSoftTreshold() function (RsquaredCut=0.75) from the weighted gene co-expression network analysis20(WGCNA) package (Version 1.72.1) to determine the optimal soft-thresholding power for our integrated network. Then, the blockwiseModules() function (with minModuleSize=5, the absolute value of Pearson correlation, and the default values for the rest of parameters) is utilized to execute a hierarchical clustering approach. This leads to identification of modules, where each module is a group of genes that exhibit similar patterns of expression and DNA methylation. Additionally, module zero is designed to contain outlier genes that cannot be confidently assigned to any module due to their weak or negligible correlation with other genes.
We employed PCA to compute an eigengene for every module (computEgengenes()). In order to balance the contribution of high-risk and low-risk groups, the gene expression and DNA methylation data were oversampled. Intermediate-risk cases were not included in the PCA. An eigengene is computed from a weighted average of gene expression levels ((E^e)), DNA methylation levels ((E^m)), or both ((E^{em})), using the following equations:
$$begin{aligned} E^e = alpha ^e_{_1} g^e_{_1} + alpha ^e_{_2} g^e_{_2} + cdots + alpha ^e_{_n} g^e_{_n}, end{aligned}$$
(2)
$$begin{aligned} E^m = alpha ^m_{_1} g^m_{_1} + alpha ^m_{_2} g^m_{_2} + cdots + alpha ^m_{_n} g^m_{_n}, end{aligned}$$
(3)
$$begin{aligned} E^{em} = (1-mu ) E^e + mu E^m. end{aligned}$$
(4)
Here, n is the number of genes in the module, (g^e_{_i}) is the expression level of gene i, and (g^m_{_i}) is the methylation level corresponding to gene i (i.e., eigenloci), while (alpha ^e_{_n}) and (alpha ^m_{_n}) are the corresponding weights. These weights are computed using PCA ensuring maximum variance and minimum loss of biological information. The eigengene levels are then inferred for the intermediate-risk group using the same weights obtained from PCA. It should be emphasized that regardless of which eigengenes are used, our network and the corresponding modules are consistently constructed based on both gene expression and DNA methylation data and they depend on the (mu ) hyperparameter. The resulting eigengenes are robust features, carrying useful biological information, which can be leveraged in classification, clustering, and other downstream analyses including survival analysis.
To identify the optimal subset of modules for precise prognostication of risk groups, we conducted a two-step survival analysis using analyzeSurvival(). In the first step, we performed a penalized Cox regression analysis using the least absolute shrinkage and selection operator (lasso) penalty29,30 from the glmnet R package61 (Version 4.1.7). This analysis identified the three modules that were most associated with the survival data. Second, we fitted an AFT model31 to each combination of the top three modules and determined the optimal combination that leads to the smallest p-value. p-values were based on a log-rank test with a null hypothesis that there is no difference between the two high- and low-risk groups62.
To categorize the risk groups, iNETgrate uses findAliveCutoff() that searches for a cutoff on the AFT predictions such that the difference between high- vs. low-risk groups is optimized. More specifically, for each risk group, the function iterates over all possible cutoff values leading to a recall of more than a given threshold (i.e., for low-risk: minRecall=0.2, for high-risk: minRecall=0.1 in ROSMAP and 0.05 in other datasets) and selects the cutoff value that maximizes precision.
To ensure the reliability of our integrative approach, we performed a comparative analysis by benchmarking our results against alternative methodologies including a well-known patient similarity network called SNFtool. We also compared our results vs. risk classification according to the clinical gold standards based on the intrinsic nature of the disease in each cohort.
The SNFtool first computes a similarity matrix for each data type (i.e., gene expression and DNA methylation). That is, using each data type independently, a network is constructed where nodes are patients and weights of the edges represent similarity between patients computed based on correlation. The networks (similarity matrices) are then fused to create a consensus network representing the overall similarity between patients across different data types. The resulting patient similarity network is then used to cluster patients into subgroups. We noted that the SNFtool faced some limitations in using all the DNA methylation loci due to memory exhaustion while computing the similarity matrices. We tackled this issue by filtering out loci with a relatively low variation characterized by a standard deviation of less than 0.1. Determining the appropriate cutoff for a given dataset is subjective and challenging for SNFtool users.
In lung cohorts (LUSC and LUAD), we evaluated the risk groups based on the tumor stage. Specifically, we classified stages I,IA,IB,II, and IIA as the low-risk group, stages IIIB and IV as the high-risk group, and the remaining stages as the intermediate-risk group. In the LIHC cohort, we considered a case high-risk if the AFP level was greater than 500 or the Ishak fibrosis score was six. In contrast, patients were considered low-risk if their AFP levels were smaller than 250 and their Ishak fibrosis scores were 0, 1, or 2. The remaining cases were considered intermediate-risk. In the LAML cohort, we utilized the classification system available in the clinical data that categorized cases based on cytogenetic criteria into three groups of favorable (low-risk), intermediate, and poor (high-risk). We utilized the Braak score63 to stratify the ROSMAP cohort into three risk groups. Cases with a Braak score of 0, 1, or 2 were considered low-risk, those with a Braak score of 5 or 6 were classified high-risk, while the remaining cases were grouped as intermediate-risk.
See the rest here:
Integrating DNA methylation and gene expression data in a single gene network using the iNETgrate package ... - Nature.com
- Discovering the mysteries of human DNA - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- Scientists go deeper into DNA - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- Instant Egghead - Genes vs. DNA vs. Chromosomes - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- DNA Calls Out Lineup Of Rappers For Future Battles - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- What is DNA? - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- Turn Your DNA Into Fine Art, BMW Zagato Roadster - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- DNA - OFFICIAL URLTV SUMMER MADNESS 2 RECAP! - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- "Binary DNA" - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- 16x9 - DNA Prophecies: Code reveals your future - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- Gilbert Gottfried - Space DNA, Sexy Weight Loss, Badonkadonk Booty - Gilbert Gets It - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- Animated Health Video Production | DNA Services of America - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- Michael Tsarion ~ Mayans ~ 2012 ~ DNA - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- Mini-drones to take your DNA? - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- C2CAM - DNA Research - 07-09-2012 - Coast To Coast AM - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- Inside The DNA Of MDNA - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- KOTD - Rap Battle - DNA vs Eurgh - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- Starchild DNA Showing "Wright" Stuff - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- Chrome Cats - DNA of a Winner(Official Video) - Video [Last Updated On: September 7th, 2012] [Originally Added On: September 7th, 2012]
- DNA leads to arrest in 1980 murder of Oxnard girl [Last Updated On: September 8th, 2012] [Originally Added On: September 8th, 2012]
- 'Junk' DNA: Not So Useless After All [Last Updated On: September 8th, 2012] [Originally Added On: September 8th, 2012]
- Decoding Human DNA [Last Updated On: September 9th, 2012] [Originally Added On: September 9th, 2012]
- Planet of the Apes: What is that big hunk of 'junk' DNA up to ? [Last Updated On: September 10th, 2012] [Originally Added On: September 10th, 2012]
- Genetics Breakthrough Changes Thinking About DNA [Last Updated On: September 11th, 2012] [Originally Added On: September 11th, 2012]
- 'Junk DNA' and the mystery of mankind's missing genes [Last Updated On: September 11th, 2012] [Originally Added On: September 11th, 2012]
- Real-time observation of single DNA molecule repair [Last Updated On: September 12th, 2012] [Originally Added On: September 12th, 2012]
- Court hears DNA findings in child sex case [Last Updated On: September 12th, 2012] [Originally Added On: September 12th, 2012]
- 2012 International Symposium on Human Identification Features Emerging and Best Practice Forensic DNA Techniques ... [Last Updated On: September 12th, 2012] [Originally Added On: September 12th, 2012]
- DNA could help ID a king [Last Updated On: September 13th, 2012] [Originally Added On: September 13th, 2012]
- DNA with a Twist [Last Updated On: September 13th, 2012] [Originally Added On: September 13th, 2012]
- Three reasons to like junk DNA [Last Updated On: September 13th, 2012] [Originally Added On: September 13th, 2012]
- LBNL Seeks Licensees for Highly Specific and Sensitive DNA Extraction Method [Last Updated On: September 13th, 2012] [Originally Added On: September 13th, 2012]
- Under-twisted DNA origami delivers cancer drugs to tumors [Last Updated On: September 13th, 2012] [Originally Added On: September 13th, 2012]
- DNA ‘junk' contains a treasure of information about disease [Last Updated On: September 14th, 2012] [Originally Added On: September 14th, 2012]
- Research: Hopping DNA supercoils [Last Updated On: September 14th, 2012] [Originally Added On: September 14th, 2012]
- DNA evidence missing in Assange case [Last Updated On: September 16th, 2012] [Originally Added On: September 16th, 2012]
- Missing DNA evidence in Assange case [Last Updated On: September 16th, 2012] [Originally Added On: September 16th, 2012]
- No Assange DNA on torn condom - report [Last Updated On: September 16th, 2012] [Originally Added On: September 16th, 2012]
- Calif. DNA Collection From Arrestees Challenged [Last Updated On: September 17th, 2012] [Originally Added On: September 17th, 2012]
- Federal appeals court to hear challenge to California DNA collection law [Last Updated On: September 17th, 2012] [Originally Added On: September 17th, 2012]
- Applied DNA Sciences Contracts With Inventionland [Last Updated On: September 18th, 2012] [Originally Added On: September 18th, 2012]
- Applied DNA Sciences, Textile Centre of Excellence Unveil Textiles Anti-Counterfeiting Platform [Last Updated On: September 18th, 2012] [Originally Added On: September 18th, 2012]
- Rapist caught by DNA test jailed [Last Updated On: September 18th, 2012] [Originally Added On: September 18th, 2012]
- FBI eager to embrace mobile 'Rapid DNA' testing [Last Updated On: September 19th, 2012] [Originally Added On: September 19th, 2012]
- Expansion of criminal DNA collection proposed [Last Updated On: September 19th, 2012] [Originally Added On: September 19th, 2012]
- Assessment of HPV DNA Alone Insufficient to Identify HPV-Driven Head and Neck Cancers [Last Updated On: September 19th, 2012] [Originally Added On: September 19th, 2012]
- George Zimmerman's DNA, not Trayvon Martin's, found on gun [Last Updated On: September 20th, 2012] [Originally Added On: September 20th, 2012]
- George Zimmerman: No DNA evidence of a struggle for his gun [Last Updated On: September 20th, 2012] [Originally Added On: September 20th, 2012]
- DNA evidence links Vallejo man to January stabbing in SLO, police say [Last Updated On: September 20th, 2012] [Originally Added On: September 20th, 2012]
- Legal hurdles threaten to slow FBI's 'Rapid DNA' revolution [Last Updated On: September 21st, 2012] [Originally Added On: September 21st, 2012]
- Judge denies motions to dismiss DNA evidence in Hudson murder case [Last Updated On: September 22nd, 2012] [Originally Added On: September 22nd, 2012]
- Researchers report novel approach for single molecule electronic DNA sequencing [Last Updated On: September 22nd, 2012] [Originally Added On: September 22nd, 2012]
- Novel approach for single molecule electronic DNA sequencing [Last Updated On: September 22nd, 2012] [Originally Added On: September 22nd, 2012]
- DNA helps Wyckoff police nab 'motorcycle burglar' [Last Updated On: September 22nd, 2012] [Originally Added On: September 22nd, 2012]
- Novel DNA barcode engineered: New technology could launch biomedical imaging to next level [Last Updated On: September 25th, 2012] [Originally Added On: September 25th, 2012]
- DNA Microarray 2012: A Focus on Sales Growth [Last Updated On: September 25th, 2012] [Originally Added On: September 25th, 2012]
- DNA in 1980 Maine murder case shown to match defendant [Last Updated On: September 25th, 2012] [Originally Added On: September 25th, 2012]
- DNA recovered during Rayney probe [Last Updated On: September 26th, 2012] [Originally Added On: September 26th, 2012]
- FBI makes headway on DNA testing backlog, report says [Last Updated On: September 26th, 2012] [Originally Added On: September 26th, 2012]
- Male DNA found for first time in female brains [Last Updated On: September 27th, 2012] [Originally Added On: September 27th, 2012]
- Bearing Sons Leaves Male DNA Traces in Mom's Brain [Last Updated On: September 28th, 2012] [Originally Added On: September 28th, 2012]
- Many female brains contain male DNA [Last Updated On: September 28th, 2012] [Originally Added On: September 28th, 2012]
- New drive to take criminals' DNA [Last Updated On: September 28th, 2012] [Originally Added On: September 28th, 2012]
- DNA remains focus in Highway of Tears cases [Last Updated On: September 28th, 2012] [Originally Added On: September 28th, 2012]
- Analysing The Evidence On DNA [Last Updated On: September 29th, 2012] [Originally Added On: September 29th, 2012]
- DNA Clears Death Row Inmate [Last Updated On: September 29th, 2012] [Originally Added On: September 29th, 2012]
- Burn victim identified by DNA in maggots [Last Updated On: September 29th, 2012] [Originally Added On: September 29th, 2012]
- DNA fails to match couple on two other skeletons [Last Updated On: September 29th, 2012] [Originally Added On: September 29th, 2012]
- DNA Dynamics Update on Sports Title [Last Updated On: September 30th, 2012] [Originally Added On: September 30th, 2012]
- DNA solves teen's 1974 murder [Last Updated On: September 30th, 2012] [Originally Added On: September 30th, 2012]
- Some Women's Brains Contain Male DNA: Study [Last Updated On: September 30th, 2012] [Originally Added On: September 30th, 2012]
- DNA exonerates man after 15 years on death row - Video [Last Updated On: September 30th, 2012] [Originally Added On: September 30th, 2012]
- DNA link prompts charges in cold case rapes - Video [Last Updated On: September 30th, 2012] [Originally Added On: September 30th, 2012]
- DNA testing has its limits [Last Updated On: October 1st, 2012] [Originally Added On: October 1st, 2012]
- DNA evidence exonerates 300th prisoner nationwide [Last Updated On: October 1st, 2012] [Originally Added On: October 1st, 2012]
- DNA testing facility in Pune to speed up cases in Mumbai [Last Updated On: October 1st, 2012] [Originally Added On: October 1st, 2012]
- Rape DNA process 'not adequate' [Last Updated On: October 2nd, 2012] [Originally Added On: October 2nd, 2012]
- IntegenX Announces U.S. Launch of the RapidHIT™ 200 System – Rapid DNA Technology That Will Revolutionize the Use of ... [Last Updated On: October 2nd, 2012] [Originally Added On: October 2nd, 2012]
- 300th person exonerated by DNA evidence [Last Updated On: October 2nd, 2012] [Originally Added On: October 2nd, 2012]
- Inherited Diseases Found Sooner in Newborns With DNA Scan [Last Updated On: October 3rd, 2012] [Originally Added On: October 3rd, 2012]
- Woman charged in husband's death gives DNA sample [Last Updated On: October 3rd, 2012] [Originally Added On: October 3rd, 2012]