Ten million sequences of COVID-19's genomic code have now been organized into a phylogenetic tree in the UC Santa Cruz SARS-CoV-2 Browser, the largest tree of genomic sequences of a single species ever assembled. The accomplishment is impressive both as a computer engineering feat of processing such a massive amount of data and as a testament to the dedication and coordination of the researchers involved.
"It is an astounding thing that has happened there," said Clay Fischer, Project Manager for the UCSC Genome Browser.
All of these sequences are assembled by the researchers into a phylogenetic tree that shows the evolutionary history of the virus, with different branches representing the lineages that have mutated throughout the pandemic. This tree is powered by a software tool called UShER that was developed at the UC Santa Cruz Genomics Institute and is hosted on the UCSC Genome Browser website.
Many hands from around the world have brought the Genomics Institute these 10 million sequences that live on the UShER tree. Clinicians worldwide have administered tests to be sent off to local labs, which then sent the samples on for sequencing. Once they are sequenced, they become digital files that are uploaded to databases for genomic information such as GISAID, GenBank, or the COG-UK database.
Angie Hinrichs, a senior software architect at the UCSC Genome Browser and self-described "data wrangler," built a pipeline to pull these sequences into the UShER tree automatically. The process was complicated, however, because some databases, like GISAID, had restrictions that necessitated downloading sequences manually.
"For the first half of 2021, I would download them every night before bed," Hinrichs said.
Hinrichs has worked at the UCSC Genome Browser for twenty years. She keeps a low profile, usually preferring to work behind the scenes rather than in the spotlight. But according to her colleagues, her work curating the tree of COVID-19 genomes and coordinating with the CDC and other health organizations has been of great importance to the pandemic relief effort. She is part of the Pango team of volunteers who have been monitoring virus sequences to identify new variants. She also handles the daily maintenance of updating and annotating the UShER tree, which recently became the default software used by the Pangolin tool, a system used by health officials worldwide to track the spread of variants in their communities.
UShER was created early in the pandemic, when researchers at the UC Santa Cruz Genomics Institute recognized that tracing the evolution of a quickly evolving global pathogen like COVID-19 would require a phylogenetic tree able to handle an unprecedented amount of data. So the Genomics Institute's scientific director, David Haussler, gathered a team to focus on pathogen genomics, led by Assistant Professor of Biomolecular Engineering Russell Corbett-Detig and including then-postdoc Yatish Turakhia. Turakhia originally wrote the UShER software, which can rapidly add a new genome sequence to a very large tree of genome sequences.
Making a tree that can handle so much data is a feat of computer engineering that has required herculean efforts from a number of researchers. Before the current pandemic, phylogenetic trees for comparing viral samples were relatively common, but they were built from comparatively small numbers of sequences.
As unprecedented numbers of SARS-CoV-2 sequences became available, the standard tree-building tools simply could not keep up, and researchers often struggled to make sure their analysis kept pace with the number of samples they received. UShER's software and the sustained effort of the team made it possible to grow the tree apace with the pandemic's flood of sequences.
Hinrichs says that her two decades of experience working with the massive amounts of data stored on the UCSC Genome Browser helped prepare her to work with the COVID-19 lineages on UShER.
"This data coordination is what makes our resources really powerful," Hinrichs said. "We have really great resources here, and really great people."
One of those great resources is UCSC's computing hardware, maintained by Jorge Garcia, Haifang Telc, and Erich Weiler. Hinrichs explained that having that computing power has been essential for this project.
"Big data is our thing, so we were ready to jump on this," she said.
At the beginning of the pandemic, the UCSC pathogen genomics team made guesses as to how many COVID-19 sequences the tree would need to be able to handle. Only Corbett-Detig thought it would reach a million; no one anticipated reaching 10 million.
"I still get surprised at how far we've come," Turakhia said. "The unimaginable amount of data we were able to handle and the fact that we are able to make sense of it quickly is mind-boggling as a computational genomicist."
As the tree has grown, it has required constant attention and updates. Cheng Ye, an undergraduate in Turakhia's new lab at UC San Diego, figured out a way to add new sequences faster once the tree had grown to contain millions of sequences. Ye also helped develop a tool called MatOptimize, which moves sequences around on the tree when more data makes it apparent that their original placement was not optimal.
Accumulating reliable data has been instrumental to better understanding what we are up against in the fight against COVID-19 and all its variants. While little was known about this virus at the start of the pandemic, the tree-building tools developed at UC Santa Cruz have helped to put the history of the virus in some perspective and to predict its future, and researchers across campus have leveraged their expertise to aid in the relief efforts. The progress has been astounding, but for the researchers on the browser team, the urgency of their mission and the sheer amount of data that needs to be curated have also been overwhelming at times. Fischer acknowledges that this level of dedication comes at a cost.
"It has been two years of blood, sweat, and tears," he said.
See the original post:
The team behind a tree of 10 million Covid sequences - University of California, Santa Cruz