Next Phase of ENCODE Finds MORE Functional Information in Genome Junk – Discovery Institute

Image credit: Geralt, via Pixabay.

The first publications from the ENCODE project (Encyclopedia of DNA Elements) made a big splash at Evolution News in 2013, and around the world, because they undermined the "junk DNA" myth and simultaneously fulfilled an ID prediction: that non-coding parts of the genome would prove functional. Junk-DNA proponents like Dan Graur were upset at the time, admitting, as Jonathan Wells reported, "If ENCODE is right, evolution is wrong."

Well, ModENCODE (ENCODE for model organisms) found unprecedented complexity in the fruit fly genome in 2014, then ENCODE 2 followed up with more discoveries of function. Now, ENCODE 3 has just finished submitting its reports, with record numbers of DNA annotations listed, and ENCODE 4 is gearing up. Nothing like a little overkill to drive the point home: "then evolution is wrong." Look at how much constructive science is being done with the assumption that DNA elements are there for a purpose.

Before introducing the latest results, Nature provides an overview, "Perspectives on ENCODE," that recounts the history and purposes of the project:

The ENCODE Project was launched in 2003, as the first nearly complete human genome sequence was reported. At that time, our understanding of the human genome was limited. For example, although 5% of the genome was known to be under purifying selection in placental mammals, our knowledge of specific elements, particularly with regards to non-protein coding genes and regulatory regions, was restricted to a few well-studied loci.

ENCODE commenced as an ambitious effort to comprehensively annotate the elements in the human genome, such as genes, control elements, and transcript isoforms, and was later expanded to annotate the genomes of several model organisms. Mapping assays identified biochemical activities and thus candidate regulatory elements. [Emphasis added.]

Annotations are like labels or comments on things. For instance, if you have a stereo system with a lot of cables, you might affix tags on them to indicate where the TV plugs in, or where each speaker wire goes. In computer programming, wise programmers add comments in English to explain what a section of code does. Comments do not affect the function of the code, but help the next programmer follow the logic.

DNA is like a program, only it did not come with English comments! That is why ENCODE is important; the project is building a searchable database for researchers to find out what a string like ACCCTGTAAAGTG is doing. Is it a gene? Is it a control region? Medical researchers will want to see if a SNP (single nucleotide polymorphism) in that string correlates with a disease.
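The labeling idea can be pictured with a minimal sketch. It assumes an annotation is simply a labeled range of positions on a sequence; the `Annotation` class, the `annotations_at` helper, and the two labels are hypothetical illustrations, not real ENCODE data or the project's actual database interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # 0-based position where the label begins (inclusive)
    end: int    # 0-based position where the label stops (exclusive)
    label: str


# Hypothetical labels, invented for illustration only.
ANNOTATIONS = [
    Annotation(0, 5, "candidate regulatory element"),
    Annotation(3, 13, "transcribed region"),
]


def annotations_at(position, annotations=ANNOTATIONS):
    """Return the label of every annotation that covers the given position."""
    return [a.label for a in annotations if a.start <= position < a.end]


sequence = "ACCCTGTAAAGTG"

# Look up what is "tagged" at two positions in the string.
print(annotations_at(4))   # covered by both hypothetical labels
print(annotations_at(10))  # covered only by the second label
```

Like comments in a program, the labels do not change the sequence itself; they only record what a researcher has learned about each stretch of it.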

The project is sending scientists all over the world on a treasure hunt for functions in the junk pile assumed to exist from evolutionary history. The first results in 2013 indicated that over 80 percent of the genome was being transcribed. That was a strong clue that most non-coding regions were functional, even if the functions were unknown, because a cell would be unlikely to spend the energy transcribing nonsense. Indeed, many of those transcriptions turned out to be important regulatory regions.

The project began slowly but is accelerating.

Phase I (2003–2007) interrogated a specified 1% of the human genome in order to evaluate emerging technologies.

Phase II (2007–2012) introduced sequencing-based technologies (for example, chromatin immunoprecipitation with sequencing (ChIP-seq) and RNA sequencing (RNA-seq)) that interrogated the whole human genome and transcriptome. General assays such as transcript, open-chromatin and histone modification mapping were used on a wide variety of cell lines, while more specific assays, such as mapping transcription factor binding regions, were performed extensively on a smaller number of cell lines to provide detailed annotations on, and to investigate the relationships of, many regulatory proteins across the genome.

The findings of ENCODE are accelerating along with sequencing technologies. The latest Phase III reports were published July 29 in Nature. Some of the labs involved are telling what they found.

Cold Spring Harbor Laboratory includes a must-watch video:

It begins with a striking moment where Ewan Birney, senior ENCODE researcher at CSHL, opens a huge tome containing a complete printout of the human genome. He calls it a big achievement in 2000, but it's just a set of boring letters that ENCODE is bringing to life. "These letters actually do something," he says, "they mean something." The goal is to find out what they mean, to learn their functions. Magdalena Skipper, the next speaker, says that the ENCODE Consortium considers functional elements in very broad terms, beyond genes to the regulatory elements and switches, and even to parts "where we have no idea what they are doing," Birney adds. This fits ID proponent Paul Nelson's motto, "If something works, it's not happening by accident."

So far, ENCODE has produced a staggering amount of raw data, hundreds of terabytes, in detailed form:

In Phase 3, researchers took advantage of the latest genetic technologies to glean data from biological specimens and deeply investigate the regulatory regions outside of genes, where most of the genome's person-to-person variation lies. Their data identifies some 900,000 candidate regulatory elements from the human genome and more than 300,000 from the mouse, which can be explored through ENCODE's new online browser.

Within the hundreds of cell types studied, ENCODE is helping scientists understand why your liver cell is different from your kidney cell, Birney says; the secrets will be found in the switches that turn genes on and off. "It's really a first view of that complexity that generates a human being."

Skipper says it was striking to find that they were able to assign a biochemical function to 80 percent of the genome: striking, "because not such a long time ago, we still considered that a vast proportion of the human genome was simply junk." Birney comments, "It's very hard to get over the density of information in the genome." They found places that are much more complex than expected, and loci thought to be completely silent are actually "teeming with life, teeming with things going on; we still really don't understand that." Another surprise is that portions corresponding with disease are being found in non-coding parts of the genome.

This encyclopedia is a living resource. It has a beginning but really no end. It will continue to be improved, and grown, as time goes on.

MIT's news release is titled, "Bringing RNA into genomics." It describes the new technologies MIT used to identify candidate RNA transcripts and then determine their functions. And "function" is the key word in their work:

These RNA sequences do not get translated into proteins, but act in a variety of ways to control how much protein is made from protein-coding genes. The research team, which includes scientists from MIT and several other institutions, made use of RNA-binding proteins to help them locate and assign possible functions to tens of thousands of sequences of the genome.

The National Institutes of Health (NIH) describes the search for genetic switches that turn genes on and off in different cell types and in various stages of development. This is being determined for the mouse genome as well as the human genome.

"A key challenge in ENCODE is that different genes and functional regions are active in different cell types," said Elise Feingold, Ph.D., scientific advisor for strategic implementation in the Division of Genome Sciences at NHGRI and a lead on ENCODE for the institute. "This means that we need to test a large and diverse number of biological samples to work towards a catalog of candidate functional elements in the genome."

Significant progress has been made in characterizing protein-coding genes, which comprise less than 2% of the human genome. Researchers know much less about the remaining 98% of the genome, including how much and which parts of it perform other functions. ENCODE is helping to fill in this significant knowledge gap.

The human body is composed of trillions of cells, with thousands of types of cells. While all these cells share a common set of DNA instructions, the diverse cell types (e.g., heart, lung and brain) carry out distinct functions by using the information encoded in DNA differently. The DNA regions that act as switches to turn genes on or off, or tune the exact levels of gene activity, help drive the formation of distinct cell types in the body and govern their functioning in health and disease.

Nature's own News and Views story, "Expanded ENCODE delivers invaluable genomic encyclopedia," boasts that Phase III has generated the most comprehensive catalogue yet of the functional elements that regulate our genes.

In the current third phase of the project, the consortium moved from cell lines to cells taken directly from human and mouse tissues, providing a more biologically relevant encyclopedia. They also introduced assays to investigate the broader aspects of functional elements, for example, to characterize the elements embedded in RNAs or to analyse chromatin looping, which brings separate CREs into close proximity to enable gene regulation.

There are eight technical papers in the special issue of Nature, all open access. Start with "Expanded encyclopaedias of DNA elements in the human and mouse genomes" for the details, scan through the other papers, and marvel at the exciting discoveries from Phase III. Or you can read "ENCODE explained" for an overview with illustrations. It's all part of the ENCODE Collection going back to 2012.

What is next for ENCODE? The "Perspectives on ENCODE" article cited earlier explains why much more research is still needed to understand the vast volume of functional information in our DNA:

It is now apparent that elements that govern transcription, chromatin organization, splicing, and other key aspects of genome control and function are densely encoded in the human genome; however, despite the discovery of many new elements, the annotation of elements that are highly selective for particular cell types or states is lagging behind.

Thus, as part of ENCODE 4, considerable effort is being devoted to expanding the cell types and tissues analysed as well as mapping the binding regions for many more transcription factors and RNA-binding proteins.

Further research may even find new functions for repetitive sequences, or for silent sequences that only get switched on under unique circumstances, or in particular cell types, or during certain stages of development. In short: the search is on for function in the junk!
