Machine Intelligence Cracks Genetic Controls

Posted: December 29, 2014 at 4:43 pm

Every recipe has both instructions and ingredients. So does the human genome. An error in the instructions can raise the risk for disease.

Every cell in your body reads the same genome, the DNA-encoded instruction set that builds proteins. But your cells couldn't be more different. Neurons send electrical messages, liver cells break down chemicals, muscle cells move the body. How do cells employ the same basic set of genetic instructions to carry out their own specialized tasks? The answer lies in a complex, multilayered system that controls how proteins are made.

Frey compares the genome to a recipe that a baker might use. All recipes include a list of ingredients (flour, eggs and butter, say) along with instructions for what to do with those ingredients. Inside a cell, the ingredients are the parts of the genome that code for proteins; surrounding them are the genome's instructions for how to combine those ingredients.

Just as flour, eggs and butter can be transformed into hundreds of different baked goods, genetic components can be assembled into many different configurations. This process is called alternative splicing, and it's how cells create such variety out of a single genetic code. Frey and his colleagues used a sophisticated form of machine learning to identify mutations in this instruction set and to predict what effects those mutations have.
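To make the recipe analogy concrete, here is a toy sketch of how one gene could yield several configurations. It is only an illustration, not the researchers' machine-learning method: the exon names `E1` to `E4` and the helper `enumerate_isoforms` are hypothetical, and the model assumes the simplest kind of alternative splicing, in which each optional segment is either kept or skipped.

```python
from itertools import product


def enumerate_isoforms(exons, optional):
    """List every transcript obtained by keeping or skipping each optional exon.

    exons    -- ordered exon names for one hypothetical gene
    optional -- names of exons that may be skipped; all others are always kept
    """
    optional_positions = [i for i, name in enumerate(exons) if name in optional]
    isoforms = []
    # Try every keep/skip combination for the optional exons.
    for choices in product([True, False], repeat=len(optional_positions)):
        keep = dict(zip(optional_positions, choices))
        # Exons that are not optional default to "kept"; order is preserved.
        isoforms.append([name for i, name in enumerate(exons) if keep.get(i, True)])
    return isoforms


# Hypothetical gene with four exons, two of which may be skipped.
exons = ["E1", "E2", "E3", "E4"]
isoforms = enumerate_isoforms(exons, optional={"E2", "E3"})
for isoform in isoforms:
    print("-".join(isoform))
```

With two optional exons the sketch lists four configurations. Real genes can have many more exons and other kinds of splicing choices, so the number of possible products grows quickly.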

Olena Shmahalo/Quanta Magazine

The researchers have already identified possible risk genes for autism and are working on a system to predict whether mutations in cancer-linked genes are harmful. "I hope this paper will have a big impact on the field of human genetics by providing a tool that geneticists can use to identify variants of interest," said Chris Burge, a computational biologist at the Massachusetts Institute of Technology who was not involved in the study.

But the real significance of the research may come from the new tools it provides for exploring vast sections of DNA that have been very difficult to interpret until now. Many human genetics studies have sequenced only the small part of the genome that produces proteins. "This makes an argument that the sequence of the whole genome is important too," said Tom Cooper, a biologist at Baylor College of Medicine in Houston, Texas.

The splicing code is just one part of the noncoding genome, the area that does not produce proteins. But it's a very important one. Approximately 90 percent of genes undergo alternative splicing, and scientists estimate that variations in the splicing code make up anywhere between 10 and 50 percent of all disease-linked mutations. "When you have mutations in the regulatory code, things can go very wrong," Frey said.

"People have historically focused on mutations in the protein-coding regions, to some degree because they have a much better handle on what these mutations do," said Mark Gerstein, a bioinformatician at Yale University, who was not involved in the study. "As we gain a better understanding of [the DNA sequences] outside of the protein-coding regions, we'll get a better sense of how important they are in terms of disease."

Scientists have made some headway into understanding how the cell chooses a particular protein configuration, but much of the code that governs this process has remained an enigma. Frey's team was able to decipher some of these regulatory regions in a paper published in 2010, identifying a rough code within the mouse genome that regulates splicing. Over the past four years, the quality of genetics data, particularly human data, has improved dramatically, and machine-learning techniques have become much more sophisticated, enabling Frey and his collaborators to predict how splicing is affected by specific mutations at many sites across the human genome. "Genome-wide data sets are finally able to enable predictions like this," said Manolis Kellis, a computational biologist at MIT who was not involved in the study.
