The Dark Corners of Our DNA Hold Clues about Disease

Posted: December 19, 2014 at 2:44 pm

A deep-learning algorithm shines a light on mutations in once-obscure areas of the genome


The so-called streetlight effect has often fettered scientists who study complex hereditary diseases. The term refers to an old joke about a drunk searching for his lost keys under a streetlight. A cop asks, "Are you sure this is where you lost them?" The drunk says, "No, I lost them in the park, but the light is better here."

For researchers who study the genetic roots of human diseases, most of the light has shone down on the 2 percent of the human genome that includes protein-coding DNA sequences. "That's fine. Lots of diseases are caused by mutations there, but those mutations are low-hanging fruit," says University of Toronto (U.T.) professor Brendan Frey, who studies genetic networks. "They're easy to find because the mutation actually changes one amino acid to another one, and that very much changes the protein."

The trouble is, many disease-related mutations also happen in noncoding regions of the genome: the parts that do not directly make proteins but that still regulate how genes behave. Scientists have long been aware of how valuable it would be to analyze the other 98 percent, but there has not been a practical way to do it.

Now Frey has developed a deep-learning algorithm that effectively shines a light on the entire genome. A paper appearing December 18 in Science describes how this algorithm can identify patterns of mutation across coding and noncoding DNA alike. The algorithm can also predict how likely each variant is to contribute to a given disease. "Our method works very differently from existing methods," says Frey, the study's lead author. "GWAS-, QTL- and ENCODE-type approaches can't figure out causal relationships. They can only correlate. Our system can predict whether or not a mutation will cause a change in RNA splicing that could lead to a disease phenotype."

RNA splicing is one of the major steps in turning genetic blueprints into living organisms. Splicing determines which bits of DNA code get included in the messenger-RNA strings that build proteins. Different configurations yield different proteins. Misregulated splicing contributes to an estimated 15 to 60 percent of human genetic diseases.
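For readers who think in code, the idea that different splicing configurations yield different proteins can be sketched in a few lines of Python. This is a toy illustration, not how cells or Frey's algorithm work: the gene, its exon and intron names, the sequences and the six-entry codon table are all invented for the example. It joins the kept exons into a messenger-RNA string, then translates it, so you can see that skipping one exon produces a shorter, different protein.

```python
# Toy model of splicing: keep only the chosen exons, join them in order,
# then translate the resulting mRNA. All sequences are invented.

# Hypothetical partial codon table covering only the codons used below.
CODONS = {"AUG": "M", "GCU": "A", "CCA": "P", "GAA": "E", "UGG": "W", "UAA": "*"}


def splice(segments, included_exons):
    """Join the included exons in genomic order; introns are always removed.

    segments: list of (name, kind, sequence) tuples, kind is "exon" or "intron".
    included_exons: set of exon names to keep.
    """
    return "".join(
        seq for name, kind, seq in segments
        if kind == "exon" and name in included_exons
    )


def translate(mrna):
    """Read codons three bases at a time until a stop codon ("*")."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = [
    ("E1", "exon", "AUGGCU"),
    ("I1", "intron", "GUAAGU"),
    ("E2", "exon", "CCAGAA"),
    ("I2", "intron", "GUGAGC"),
    ("E3", "exon", "UGGUAA"),
]

full = splice(gene, {"E1", "E2", "E3"})   # every exon kept
skipped = splice(gene, {"E1", "E3"})      # exon E2 skipped

print(full, translate(full))         # AUGGCUCCAGAAUGGUAA MAPEW
print(skipped, translate(skipped))   # AUGGCUUGGUAA MAW
```

The same stretch of DNA gives rise to two different proteins depending only on which exons are kept, which is why a mutation that shifts the splicing pattern can matter even when it never touches a protein-coding letter directly.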

Frey, a computer engineer who has a cross appointment in the university's Department of Medical Research, trained his algorithm using millions of data points: DNA sequences, genetic variations and RNA splicing patterns. The algorithm was then able to extrapolate how likely it was that any of tens of thousands of mutations could cause a splicing error associated with a particular disease.
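The article does not describe the model's internals, but one common way to score a mutation with any trained predictor is to run it on the original sequence and on the mutated sequence and compare the two predictions. The Python sketch below illustrates that comparison under loudly stated assumptions: `toy_inclusion_probability` is a hypothetical stand-in that looks only at whether an intron starts with the usual "GU" splice signal, whereas a real model learns from millions of examples. The helper names are likewise invented for this illustration.

```python
# Sketch: score each single-base mutation by how much it changes a model's
# predicted splicing outcome (mutant prediction minus original prediction).
# toy_inclusion_probability is a HYPOTHETICAL stand-in for a trained model.


def toy_inclusion_probability(intron):
    """Hypothetical model: chance the upstream exon is kept in the mRNA.

    It checks only for the "GU" dinucleotide that usually marks the start
    of an intron; a real model weighs many sequence features.
    """
    return 0.9 if intron.startswith("GU") else 0.1


def mutate(seq, position, base):
    """Return seq with the base at `position` replaced by `base`."""
    return seq[:position] + base + seq[position + 1:]


def variant_score(seq, position, base, model=toy_inclusion_probability):
    """Predicted change in exon inclusion caused by one substitution."""
    return model(mutate(seq, position, base)) - model(seq)


def rank_variants(seq, model=toy_inclusion_probability):
    """Score every possible single-base substitution, largest effect first."""
    scored = []
    for pos, ref_base in enumerate(seq):
        for base in "ACGU":
            if base != ref_base:
                score = variant_score(seq, pos, base, model)
                scored.append((pos, ref_base, base, score))
    return sorted(scored, key=lambda v: abs(v[3]), reverse=True)


intron = "GUAAGU"
for pos, ref_base, alt_base, score in rank_variants(intron)[:3]:
    print(f"position {pos}: {ref_base}>{alt_base}  change in inclusion {score:+.2f}")
```

Because the score is a difference between two predictions, it applies equally to mutations in coding and noncoding DNA: here only changes to the first two intron letters move the toy prediction, and they rise to the top of the ranking.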

The research team tested the method on spinal muscular atrophy as well as nonpolyposis colorectal cancer. Frey says the team's most ambitious case was its study of autism spectrum disorder; about 100 genes are known to be associated with it. In fact, many researchers think it is likely that autism comprises many disorders, each arising from distinct mutations but all producing common symptoms.

Working with U.T. autism researcher Stephen Scherer, Frey compared mutations in autism patients' genomes with those of controls. Nothing unusual popped up. But when Frey and Scherer tested the genomes against the mutations flagged by Frey's algorithm, they saw patterns emerge. According to Frey, "Kids with autism are more likely to have these high-scoring mutations that change the meaning of the genome, and that are thought to be involved with brain functions and developmental functions."
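Why would filtering by score reveal a pattern that a plain comparison missed? The Python sketch below illustrates the logic with made-up numbers: every score here is invented and none comes from the study. The function name `carrier_fraction` and the 0.75 cutoff are assumptions for the example. Each person is a list of variant scores, and the function counts the share of people carrying at least one variant at or above a threshold.

```python
# Sketch: compare cases and controls by the share of people who carry at
# least one variant above a score threshold. All numbers are INVENTED for
# illustration; none come from the autism study.


def carrier_fraction(genomes, threshold):
    """Fraction of genomes with at least one variant scoring >= threshold.

    genomes: list of per-person lists of variant scores, where a higher
    score means a larger predicted effect on splicing.
    """
    carriers = sum(1 for scores in genomes if any(s >= threshold for s in scores))
    return carriers / len(genomes)


cases = [[0.9, 0.1], [0.2, 0.3], [0.8]]       # hypothetical autism genomes
controls = [[0.1], [0.3, 0.2], [0.8, 0.1]]    # hypothetical control genomes

for threshold in (0.0, 0.75):
    print(
        f"threshold {threshold}: cases {carrier_fraction(cases, threshold):.2f}, "
        f"controls {carrier_fraction(controls, threshold):.2f}"
    )
```

With no threshold, everyone carries some variant and the groups look identical (1.00 versus 1.00), echoing "Nothing unusual popped up." Keeping only high-scoring variants separates the toy groups (0.67 versus 0.33). A real analysis would also need a statistical test to show such a difference is unlikely to arise by chance.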
