Psst, the human genome was never completely sequenced. Some scientists say it should be – STAT

Posted: June 25, 2017 at 1:46 pm

The feat made headlines around the world: "Scientists Say Human Genome is Complete," the New York Times announced in 2003. "The Human Genome," the journals Science and Nature said in identical ta-dah cover lines unveiling the historic achievement.

There was one little problem.

"As a matter of truth in advertising, the 'finished' sequence isn't finished," said Eric Lander, who led the lab at the Whitehead Institute that deciphered more of the genome for the government-funded Human Genome Project than any other. "I always say 'finished' is a term of art."

"It's very fair to say the human genome was never fully sequenced," Craig Venter, another genomics luminary, told STAT.

"The human genome has not been completely sequenced and neither has any other mammalian genome as far as I'm aware," said Harvard Medical School bioengineer George Church, who made key early advances in sequencing technology.

What insiders know, however, is not well understood by the rest of us, who take for granted that each A, T, C, and G that makes up the DNA of all 23 pairs of human chromosomes has been completely worked out. When scientists finished the first draft of the human genome, in 2001, and again when they had the final version in 2003, no one lied, exactly. FAQs from the National Institutes of Health refer to the sequence's "essential completion," and to the question, "Is the human genome completely sequenced?" they answer, "Yes," with the caveat that it's "as complete as it can be" given available technology.

Perhaps nobody paid much attention because the missing sequences didn't seem to matter. But now it appears they may play a role in conditions such as cancer and autism.

"A lot of people in the 1980s and 1990s [when the Human Genome Project was getting started] thought of these regions as nonfunctional," said Karen Miga, a molecular biologist at the University of California, Santa Cruz. "But that's no longer the case." Some of them, called satellite regions, misbehave in some forms of cancer, she said, "so something is going on in these regions that's important."

Miga regards them as the explorer Livingstone did Africa: terra incognita whose inaccessibility seems like a personal affront. Sequencing the unsequenced, she said, "is the last frontier for human genetics and genomics."

Church, too, has been making that point, mentioning it at both the May meeting of an effort to synthesize genomes and last weekend's meeting of the International Society for Stem Cell Research. Most of the unsequenced regions, he said, have some connection to aging and aneuploidy (an abnormal number of chromosomes, such as what occurs in Down syndrome). Church estimates 4 percent to 9 percent of the human genome hasn't been sequenced. Miga thinks it's 8 percent.

The reason for these gaps is that DNA sequencing machines don't read genomes the way humans read books, from the first word to the last. Instead, they first randomly chop up copies of the 23 pairs of chromosomes, which total some 3 billion letters, so the machines aren't overwhelmed. The resulting chunks contain from 1,000 letters (during the Human Genome Project) to a few hundred (in today's more advanced sequencing machines). The chunks overlap. Computers match up the overlaps, assembling the chunks into the correct sequence.
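The chop-and-reassemble idea can be illustrated with a toy sketch. This is not how the Human Genome Project's software worked; it is a minimal greedy overlap merge, assuming error-free reads from a single strand. The 20-letter `GENOME` string and the function names `chop`, `overlap`, and `assemble` are made up for illustration.

```python
import random

# A made-up 20-letter "genome" in which every 4-letter stretch is unique.
GENOME = "ATGCGTACCTGAAGTTCAGG"

def chop(genome, read_len, step, seed=0):
    """Cut the genome into overlapping reads and shuffle them, losing their order."""
    starts = list(range(0, len(genome) - read_len + 1, step))
    if starts[-1] != len(genome) - read_len:
        starts.append(len(genome) - read_len)  # make sure the tail is covered
    reads = [genome[s:s + read_len] for s in starts]
    random.Random(seed).shuffle(reads)
    return reads

def overlap(a, b, min_len):
    """Length of the longest suffix of a that is also a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def assemble(reads, min_len=4):
    """Repeatedly merge the pair of pieces with the largest overlap."""
    pieces = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(pieces) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more: what is left is separated by gaps
        merged = pieces[best_i] + pieces[best_j][best_n:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces

contigs = assemble(chop(GENOME, read_len=8, step=4))
print(contigs)
```

The toy works only because no 4-letter stretch of `GENOME` appears twice, so every overlap the computer finds is a real one.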

That's between difficult and impossible to do if the chunks contain lots of repetitive segments, such as TTAATATTAATATTAATA, or TTAATA three times. "The problem is, when you have the same exact words, it's hard to assemble," said Lander, just as if jigsaw puzzle pieces show the same exact blue sky.
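The blue-sky problem can be made concrete with a small sketch. It assumes error-free reads and ignores how many times each read is seen. The flanking strings `LEFT` and `RIGHT` are invented for illustration, and `UNIT` is the TTAATA repeat from the example above. Two genomes that differ only in the number of repeat copies produce exactly the same collection of short reads, so no program could tell them apart from those reads alone.

```python
def read_set(genome, read_len):
    """Every distinct read of the given length that could be drawn from the genome."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}

LEFT, RIGHT = "GATTACAGGC", "CCGTAGCATG"  # made-up unique sequence on either side
UNIT = "TTAATA"                           # the repeated "word"

three_copies = LEFT + UNIT * 3 + RIGHT
five_copies = LEFT + UNIT * 5 + RIGHT

# With 8-letter reads, both genomes yield the identical set of pieces.
indistinguishable = read_set(three_copies, 8) == read_set(five_copies, 8)
print(indistinguishable)
```

Every 8-letter read is shorter than the repeated stretch, so none of them can reveal whether the stutter occurs three times or five.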

In 2004, the genome project reported that there were 341 gaps in the sequence. Most of the gaps (250) are in the main part of each chromosome, where genes make the proteins that life runs on. These gaps are tiny. Only a few gaps (33 at last count) lie in or near each chromosome's centromere (where the two parts of a chromosome connect) and telomeres (the caps at the end of chromosomes), but these 33 are 10 times as long in total as the 250 gaps.

That makes the centromeres in particular the genome's uncharted Zambezi. Evan Eichler of the University of Washington said every chromosome has such sequence-defying repetitive elements (think of them as DNA stutters), including an infamous one that's 171 letters long and repeated end-to-end for thousands of letters.

At the beginning of the Human Genome Project, said Lander, now director of the Broad Institute of MIT and Harvard, "it became very clear these highly repetitive sequences would not be tractable with existing technology. It wasn't a cause of a great deal of agonizing at the time," since he and other project leaders expected the next generation of scientists to find a solution.

That hasn't really happened, partly because there hasn't been much motivation to map these regions. "I'm between agnostic and a little skeptical that these bits will be important for disease, but maybe I'm saying that because we can't read them," Lander said.

As new sequencing technology has begun allowing scientists to peek into unsequenced territory, however, they have seen that these tough-to-sequence regions frequently have important genes, said Michael Hunkapiller, chairman and CEO of Pacific Biosciences, which makes DNA sequencers. (In 1998, Hunkapiller recruited Venter to his new company, Celera Genomics, to race the government-backed genome project; the race ended in a de facto tie.)

"PacBio's reason for being is to increase the length of DNA segments that can be read and assemble them," Hunkapiller said. Longer reads have an effect like enlarging jigsaw puzzle pieces; even though the pieces still contain a lot of repeated blue sky, the greater size makes it more likely they'll also contain something sufficiently novel to make assembling them easier. PacBio's maximum DNA read is now about 60,000 letters, Hunkapiller said, and averages 15,000.
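The enlarged-puzzle-piece effect can be sketched with a toy calculation, assuming error-free reads and an invented 80-letter genome: ten copies of a six-letter stutter sitting between two made-up unique flanks. The function name `unique_fraction` is hypothetical. It measures what share of reads of a given length occur only once in the genome and so can be placed without ambiguity.

```python
from collections import Counter

LEFT, RIGHT = "GATTACAGGC", "CCGTAGCATG"  # made-up unique flanks
genome = LEFT + "TTAATA" * 10 + RIGHT     # 10 + 60 + 10 = 80 letters

def unique_fraction(genome, read_len):
    """Share of reads of this length whose sequence occurs only once in the genome."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    return sum(1 for r in reads if counts[r] == 1) / len(reads)

for read_len in (8, 30, 70):
    print(read_len, round(unique_fraction(genome, read_len), 2))
```

Once a read is longer than the 60-letter stutter, it always includes some of the unique sequence on one side or the other, which is what lets it be placed.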

"With such long reads," Lander said, "you could get through a lot of these nasty [unsequenced] regions."

That's looking more and more like a worthy undertaking, and not only because the unsequenced regions might contain actual protein-making genes. There is evidence that the non-gene parts, especially the DNA stutters, clearly have disease implications, Hunkapiller said. "Three-quarters of the [genome] differences between one person and another are in [such] variants" rather than the single-letter spelling differences in As, Ts, Cs, and Gs which get all the attention. In a 2007 paper, Venter (now the chairman of Human Longevity Inc.) and his team showed that there are more person-to-person differences like this, called structural variants, than there are single-letter changes.

Yet about 90 percent of the structural variants, the vast majority of which weren't sequenced by either the genome project or a later effort called the 1000 Genomes Project, have been missed, Eichler and his colleagues reported last year.

One reason the stutters are unusually influential is that this repetitive DNA can move around, make copies of itself, flip its orientation, and do other acrobatics that can have quite dramatic functional effects, Hunkapiller said. For one thing, repetitive elements around the centromeres, called satellites, might cause a dividing cell to become cancerous, Miga said, because they can destabilize the entire genome.

When researchers at Stanford University tried to find the genetic cause of a young man's mysterious disease, which caused non-cancerous tumors to grow throughout his body, they found nothing using standard whole-genome sequencing, Hunkapiller said. But when they used the long reads made possible by the PacBio machines to look for structural variants, they found the problem right away, he said.

The stutters might even be what makes us human. Some of these complex duplications appear to be important for the evolution of higher neuroadaptive function (aka brain development), Eichler said. A gene called ARHGAP11B, which was created by one such duplication, causes the cortex to develop the myriad folds that support complex thought; SRGAP2C, also a duplication, triggers brain development.

"These are new genes that evolved specifically in our lineage over the last few million years," said Eichler. The same duplications can also produce DNA rearrangements associated with neurodevelopmental disorders such as autism and intellectual disability.

"Finish the sequence!" hasn't become a rallying cry, but maybe it should be, Venter said: "I'd be the last one to give you a quote saying that we don't need to bother with these [unsequenced] regions."

Sharon Begley can be reached at sharon.begley@statnews.com. Follow Sharon on Twitter @sxbegle.
