Unusual low-quality ONT genomes due to extensive modifications
We sequenced 12 microbial strains of Listeria monocytogenes using Illumina and ONT R9.4 flowcells (~200-990 Mbp, SUP model) (Fig. 1a, Supplementary Tables 1 and 2). The ONT reads were assembled into genomes, and the remaining sequencing errors were polished by Medaka and Homopolish (Supplementary Table 3, see Methods). The Illumina and ONT reads were also hybrid-assembled for evaluation purposes (Supplementary Table 4). When compared with the Illumina/ONT hybrid assemblies (Fig. 1b), seven ONT-only genomes exhibited high quality (HQ), ranging from Q47 to Q60 (e.g., R19-2905 and R20-0088). However, five isolates (R20-0026, R20-0030, R20-0127, R20-0148, and R20-0150) showed unexpectedly low quality (LQ), varying from Q26 to Q32. The accuracy of these five LQ genomes did not improve after repeated ONT sequencing. Further investigation of the five LQ genomes revealed excessive numbers of mismatch errors (1533-5670) compared with the seven HQ ones (0-40 mismatches) (Fig. 1c). Homopolymer errors (i.e., indels) were not the source of the inferior quality (7-306, Supplementary Table 5).
a Workflow of ONT-only and ONT/Illumina hybrid assembly; b Q scores; c number of mismatches (red: LQ, gray: HQ); d comparison of ONT and Illumina reads by IGV; e numbers of 5mC, 6mA, and mismatches between HQ/LQ strains (n=12, red: LQ, gray: HQ). Error bars represent the minimum and maximum values.
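The Q scores above follow the usual Phred convention, in which quality is a log-scaled error rate. The sketch below is illustrative only: it counts mismatches alone (ignoring indels), assumes a genome length of about 3 Mbp (a round figure chosen here, not a value from the text), and caps the score at Q60 because that is the highest value reported. Under these assumptions, 5670 mismatches correspond to roughly Q27, in line with the LQ range.

```python
import math

# Assumption for illustration: a genome of roughly 3 Mbp.
ASSUMED_GENOME_LENGTH = 3_000_000


def phred_q(errors: int, genome_length: int, q_cap: float = 60.0) -> float:
    """Phred-scaled consensus quality: Q = -10 * log10(errors / length).

    The cap (q_cap) is an assumption so that an error-free genome gets a
    finite score instead of infinity.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if errors < 0:
        raise ValueError("errors must be non-negative")
    if errors == 0:
        return q_cap
    return min(q_cap, -10.0 * math.log10(errors / genome_length))


if __name__ == "__main__":
    # Mismatch counts quoted in the text for LQ and HQ genomes.
    for mismatches in (5670, 40, 0):
        q = phred_q(mismatches, ASSUMED_GENOME_LENGTH)
        print(f"{mismatches:>5} mismatches -> Q{q:.1f}")
```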
Manual inspection revealed that these mismatches were ONT basecalling errors that remained uncorrected after genome polishing (Fig. 1d and Supplementary Fig. 1). As mismatch errors in ONT are mainly due to epigenetic modifications, we computed the frequency of well-known methylation types in these isolates (see Methods and Supplementary Table 6). In terms of 5-methylcytosine (5mC), the numbers of modified loci in the five LQ genomes (~240-340k) were not significantly higher than those in the HQ ones (210-345k, P=0.89, Fig. 1e). Similarly, the numbers of N6-methyladenine (6mA) modifications showed no significant difference between the LQ and HQ groups (98-218k vs. 126-223k, P=0.34). Because the numbers of mismatch errors in the LQ genomes were nonetheless significantly higher than those in the HQ ones (P=0.005), we suspected that the ONT basecalling algorithms failed to distinguish novel modification types in the LQ isolates.
We removed the modifications in all microbial samples by whole-genome amplification (WGA) (Fig. 2a), which randomly amplifies genome fragments without retaining any epigenetic modification (see Methods). The WGA-demodified samples were sequenced by ONT (R9.4), assembled into chromosomes, and compared with the Illumina/ONT hybrid genomes (Fig. 2a, Supplementary Tables 7 and 8). After WGA, the five LQ genomes exhibited significantly higher quality than those without demodification (e.g., Q26 to Q53 in R20-0026) (Fig. 2b, Supplementary Table 9). In particular, the number of mismatch errors decreased significantly after demodification (e.g., 5670 to 16 in R20-0026) (Fig. 2c). The unexpectedly low quality of the ONT genomes was therefore due to extensive modification-induced errors that the basecalling model had not been trained on. Demodification by WGA can produce high-quality ONT genomes without the need for Illumina short reads.
a Workflow of WGA-demodified ONT; b Q scores of the WGA-demodified and ONT-only genomes (gray: ONT, black: WGA ONT); c numbers of mismatches of the WGA-demodified and ONT-only genomes (gray: ONT, black: WGA ONT); d WGA and ONT-only genome quality with respect to sequencing depth (shading: minimum and maximum quality in five replicates, line: median quality); e numbers of active/available pores during WGA-demodified and ordinary ONT sequencing.
However, while WGA successfully erased these modifications, the sequencing cost increased for two reasons. First, WGA required a higher sequencing depth (~100×) for assembling a complete genome than ordinary ONT sequencing (~30×) (Fig. 2d and Supplementary Figs. 2 and 3). This was due to the uneven amplification of WGA, which led to non-uniform sequencing depth and a fragmented assembly at moderate coverage. Second, the WGA-demodified samples may reduce ONT yields. We observed that the number of available/active pores could sometimes decrease quickly (e.g., fewer than 100 pores after 12 h) (Fig. 2e), possibly owing to hyperbranched structures left unresolved after WGA (ref. 10). Consequently, the sequencing cost of WGA-demodified samples using ONT is much higher than that of ordinary sequencing.
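To make the first cost factor concrete, the required yield scales linearly with target depth, so moving from about 30× to about 100× coverage means sequencing a little over three times as many bases. The sketch below shows this arithmetic; the 3 Mbp genome size is an assumption chosen for illustration, and the function name is hypothetical.

```python
# Assumption for illustration: a genome of roughly 3 Mbp.
ASSUMED_GENOME_SIZE_MBP = 3.0


def required_yield_mbp(genome_size_mbp: float, depth: float) -> float:
    """Bases (in Mbp) needed to reach a target mean depth over the genome."""
    if genome_size_mbp <= 0 or depth <= 0:
        raise ValueError("genome size and depth must be positive")
    return genome_size_mbp * depth


# Depths quoted in the text: ~30x for ordinary ONT, ~100x for WGA-demodified ONT.
ordinary = required_yield_mbp(ASSUMED_GENOME_SIZE_MBP, 30)
wga = required_yield_mbp(ASSUMED_GENOME_SIZE_MBP, 100)

if __name__ == "__main__":
    print(f"ordinary ONT: {ordinary:.0f} Mbp")
    print(f"WGA ONT:      {wga:.0f} Mbp")
    print(f"ratio:        {wga / ordinary:.2f}")
```

This covers only the depth requirement; any loss of yield from declining pore counts would add to the cost and is not modeled here.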
We developed a novel computational method (called Modpolish) for correcting these modification-mediated errors without WGA or prior knowledge of the modification systems. Modpolish identifies and corrects the modification-mediated errors by leveraging basecalling quality, basecalling consistency, and evolutionary conservation (Fig. 3a, see Methods). Briefly, because the ONT signals are disturbed by modifications, the basecalling quality at modified loci is substantially lower than that at modification-free loci (Supplementary Fig. 4). As a result, the basecalled nucleotides are often inconsistent at the modified loci (Supplementary Fig. 5), yet these loci lie within conserved motifs (Supplementary Fig. 6). Using the degree of conservation measured across closely related genomes, Modpolish corrects only the modified loci with ultra-high conservation, which avoids false corrections of strain variations and keeps specificity high.
a Workflow of Modpolish; b Q scores before and after Modpolish; c numbers of mismatches before and after Modpolish (gray: before Modpolish, black: after Modpolish); d the antiviral defense systems encoded by the 12 strains (gray: before Modpolish, black: after Modpolish); e the sequence motif of modification sites in the four mza-encoding strains; f the sequence motif of modification sites on the R20-0026 strain.
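The three signals described above (low basecalling quality, inconsistent basecalls, and high conservation) can be combined into a simple per-locus decision rule. The sketch below is a toy illustration of that idea, not the Modpolish implementation: the `Locus` record, the function names, and all three thresholds are hypothetical and chosen only to make the logic concrete.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Locus:
    draft_base: str      # base in the polished ONT-only assembly
    read_bases: str      # bases called by the reads covering this position
    mean_quality: float  # mean basecalling quality of those calls
    homolog_bases: str   # aligned bases from closely related genomes


# Hypothetical thresholds, for illustration only.
MAX_QUALITY = 15.0       # "low quality" if mean quality is at or below this
MAX_AGREEMENT = 0.8      # "inconsistent" if the top call is at or below this share
MIN_CONSERVATION = 0.95  # "ultra-high conservation" across related genomes


def major_fraction(bases: str) -> Tuple[str, float]:
    """Return the most common base and its fraction among the given bases."""
    if not bases:
        raise ValueError("bases must not be empty")
    base, count = Counter(bases).most_common(1)[0]
    return base, count / len(bases)


def polish_locus(locus: Locus) -> str:
    """Return the base to keep at this locus under the toy rule."""
    _, agreement = major_fraction(locus.read_bases)
    conserved_base, conservation = major_fraction(locus.homolog_bases)
    suspicious = locus.mean_quality <= MAX_QUALITY and agreement <= MAX_AGREEMENT
    if suspicious and conservation >= MIN_CONSERVATION:
        return conserved_base  # likely modification-mediated error: correct it
    return locus.draft_base    # otherwise leave the draft untouched


if __name__ == "__main__":
    modified = Locus("A", "AAGGAGGA", 9.0, "G" * 20)
    strain_variant = Locus("T", "TTTTTTTT", 30.0, "C" * 20)
    print(polish_locus(modified), polish_locus(strain_variant))
```

In this toy rule, a confidently called site that differs from its relatives is left alone, which mirrors the stated goal of not overwriting genuine strain variation.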
We assessed the accuracy of Modpolish by comparing the quality of the ONT-only genomes (polished by Medaka) with those further polished by Modpolish. The results indicated that Modpolish significantly improved the quality of all LQ genomes from Q27-34 to Q60 (Fig. 3b, Supplementary Table 10). The number of mismatches also decreased greatly (e.g., from 5670 to 67 in R20-0026) (Fig. 3c). The numbers of mismatches in some HQ genomes were also reduced by Modpolish; for instance, the mismatches in R19-2905 fell from 40 to 6. Consequently, our results suggested that Modpolish made no false corrections on the HQ genomes (Supplementary Tables 11-13). The comparison of different basecaller versions and models (v4.0.14 vs. v6.3.4, HAC vs. SUP) indicated that these errors persist and that Modpolish successfully erases most of them (Supplementary Fig. 7).
As modification systems are often involved in anti-phage defense (e.g., R-M, BREX, DISARM) (refs. 11-13), we investigated the defense systems possessed by the HQ and LQ strains (Fig. 3d, Supplementary Data 1). All the HQ genomes encode at least one R-M system (e.g., Type I, II, or III), which is missing in all LQ isolates. Instead, four LQ strains (i.e., R20-0030, R20-0127, R20-0148, and R20-0150) carry a novel methyltransferase-encoding mza defense system, which is absent in all HQ genomes (Supplementary Fig. 8). Analysis of the modification sites in the four mza-encoding LQ strains revealed the pentanucleotide motif GCAGC (Fig. 3e, Supplementary Fig. 6). In contrast, the modification loci in the LQ strain R20-0026 all centered on the motif GCTGG (Fig. 3f). Together, these results suggested that two lineage-specific modification systems extensively edited the five LQ genomes. Although their underlying mechanisms remain unclear, the editing at specific motifs with high conservation within each lineage allowed cost-effective in silico correction of these errors by Modpolish.
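Because the errors cluster at fixed motifs, a first check on a new assembly is to list where a motif occurs on either strand. The sketch below is a minimal illustration with hypothetical function names and a made-up toy sequence; it is not part of the published analysis. An occurrence of the reverse complement on the forward strand is reported as a minus-strand site.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif_sites(genome: str, motif: str):
    """Return (0-based start, strand) for each motif occurrence on either strand."""
    genome = genome.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    k = len(motif)
    sites = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        if window == motif:
            sites.append((start, "+"))
        elif window == rc_motif:
            sites.append((start, "-"))
    return sites


if __name__ == "__main__":
    toy_genome = "TTGCAGCAAGCTGCTT"  # made-up sequence for illustration
    print(find_motif_sites(toy_genome, "GCAGC"))
    print(reverse_complement("GCTGG"))
```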
We then assessed the performance of Modpolish on public ONT datasets sequenced by R9.4 (SUP) and R10.4 flowcells (SUP, duplex/simplex modes). In the R9.4 dataset (ref. 14), we first compared the quality of seven bacterial genomes polished by Medaka and Modpolish (Fig. 4a, Supplementary Table 14). The quality of five genomes improved significantly from ~Q45 to Q60. As before, the improvement was mainly due to the reduction of mismatches (Fig. 4b). For instance, the number of mismatches decreased from 388 to 13 in the Staphylococcus genome after Modpolish. Across genomes, the mismatch reduction rates ranged from 50% to 96%. Consequently, although these bacterial genomes are not extensively modified, Modpolish can further improve their quality after Medaka without false corrections.
Comparison of Medaka and Modpolish for a Q scores and b mismatches on the R9.4 dataset; comparison of Medaka and Modpolish for c Q scores and d mismatches on the R10.4 dataset.
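The reduction rates quoted for the R9.4 dataset are simple relative differences between the mismatch counts before and after polishing. As a small worked example (the function name is hypothetical), the Staphylococcus counts of 388 and 13 give a reduction of just under 97%.

```python
def mismatch_reduction(before: int, after: int) -> float:
    """Fraction of mismatches removed by polishing: (before - after) / before."""
    if before <= 0:
        raise ValueError("before must be positive")
    if after < 0:
        raise ValueError("after must be non-negative")
    return (before - after) / before


if __name__ == "__main__":
    # Counts quoted in the text for the Staphylococcus genome.
    print(f"{mismatch_reduction(388, 13):.1%}")
```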
In the R10.4 (duplex mode) dataset (ref. 3), we compared the quality of genomes polished by Medaka and Modpolish (downsampled to ~60×) (Fig. 4c, Supplementary Table 15). In general, Modpolish made little or no improvement on the duplex dataset. For instance, Modpolish reduced the mismatches only from 20 to 19 on the Bacillus genome (Fig. 4d). The overall genome quality is so high (Q60) that no differences can be seen. Modpolish demonstrated only marginal improvement on a recently published simplex dataset (R10.4, kit 14, Dorado v0.1.1) (Supplementary Fig. 9). Therefore, the quality of genomes from ONT R10.4 flowcells, in particular in duplex mode, is not only higher than that from R9.4 but also requires almost no further correction. On the other hand, Modpolish may be used to fill the accuracy gap between simplex and duplex modes when projects aim for higher throughput.
View original post here:
Correcting modification-mediated errors in nanopore sequencing by nucleotide demodification and reference-based ... - Nature.com