Category Archives: Genome

Why Didn't the U.S. Detect Omicron Cases Sooner? – The New York Times

Posted: December 7, 2021 at 6:02 am

Last Friday, just a day after South African scientists first announced the discovery of the Omicron variant, Europe reported its first case: The new coronavirus variant was in Belgium. Before the weekend was out, Australia, Britain, Canada, Denmark, Germany, Israel, Italy and other countries had all found cases.

But in the United States, scientists kept searching.

"If we start seeing a variant popping up in multiple countries across the world, usually my intuition is that it's already here," said Taj Azarian, a genomic epidemiologist at the University of Central Florida.

On Wednesday, American officials announced that scientists had found it in a California patient who had recently returned from South Africa. By then, Canada had already identified six cases; Britain had found more than a dozen.

On Thursday, additional cases were identified in Minnesota, Colorado, New York and Hawaii, and a second case was found in California, indicating that more are almost certainly lurking, scientists said. Why wasn't the variant detected sooner?

There are various potential explanations, including travel patterns and stringent entrance requirements that may have delayed the variant's introduction to the United States. But there are also blind spots and delays in the country's genomic surveillance system. With many labs now conducting a targeted search for the variant, the pace of detection could quickly pick up.

Since the beginning of the pandemic, scientists have been sequencing the genetic material from samples of the virus, a process that allows them to spot new mutations and identify specific variants. When done routinely and on a large scale, sequencing also allows researchers and officials to keep tabs on how the virus is evolving and spreading.

In the United States, this kind of broad genomic surveillance got off to a very slow start. While Britain quickly harnessed its national health care system to launch an intensive sequencing program, early sequencing efforts in the United States, based primarily out of university laboratories, were more limited and ad hoc.

Even after the C.D.C. launched a sequencing consortium in May 2020, sequencing efforts were stymied by a fragmented health care system, a lack of funding and other challenges.

In January, when cases were surging, the United States was sequencing fewer than 3,000 samples a week, according to the C.D.C.'s dashboard, far less than 1 percent of reported cases. (Experts recommend sequencing at least 5 percent of cases.)

But in recent months, the situation has improved dramatically, thanks to a combination of new federal leadership, an infusion of funding and an increasing concern about the emergence and spread of new variants, experts said.

"Genomic surveillance really has caught up in the U.S., and it is very good," said Dana Crawford, a genetic epidemiologist at Case Western Reserve University.

The country is now sequencing approximately 80,000 virus samples a week and 14 percent of all positive P.C.R. tests, which are conducted in labs and considered the gold standard for detecting the virus, Dr. Rochelle P. Walensky, the director of the Centers for Disease Control and Prevention, said at a White House briefing on Tuesday.

The problem is that the process takes time, especially when done in volume. The C.D.C.'s own sequencing process typically takes about 10 days to complete after it receives a specimen.

"We have really good surveillance in terms of quantity," said Trevor Bedford, an expert on viral evolution and surveillance at the Fred Hutchinson Cancer Research Center in Seattle. He added, "But by nature, it lags compared to your case reporting. And so we'll have good eyes on things from two weeks ago."

This kind of delay is not uncommon in countries that have a lot of samples to sequence, Dr. Bedford said.

In some states, the timeline is even longer. The Ohio Department of Health notes that, from start to finish, the process of collecting the sample, testing it, sequencing it and reporting it can take a minimum of 3-4 weeks.

But now that scientists know what they are looking for, they should be able to expedite the process by prioritizing samples that seem most likely to be Omicron, scientists said.

In one small bit of luck, Omicron generates a different genetic signal on P.C.R. tests than the Delta variant, which currently accounts for essentially all coronavirus cases in the United States. (In short, mutations in the new variant's spike gene mean that Omicron samples test negative for the gene, while testing positive for a different telltale gene.)

Many labs are now expediting these samples, as well as samples from people who recently returned from abroad, for sequencing.
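The triage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PcrSample` fields, the three priority levels and the function names are hypothetical, not any lab's actual workflow. The article says only that samples with the spike-gene signal and samples from recent travelers are being expedited.

```python
from dataclasses import dataclass


@dataclass
class PcrSample:
    # All field names are hypothetical, chosen for this sketch only.
    sample_id: str
    s_gene_detected: bool      # spike (S) gene target result
    other_gene_detected: bool  # another viral target on the same test
    recent_travel: bool        # patient recently returned from abroad


def sequencing_priority(sample: PcrSample) -> int:
    """Return 0 (sequence first), 1, or 2 (routine queue)."""
    if not (sample.s_gene_detected or sample.other_gene_detected):
        raise ValueError("a negative sample has nothing to sequence")
    # Negative for the spike gene but positive for another gene:
    # the signal the article describes for Omicron.
    if sample.other_gene_detected and not sample.s_gene_detected:
        return 0
    if sample.recent_travel:
        return 1
    return 2


def order_queue(samples):
    # sorted() is stable, so equal-priority samples keep arrival order.
    return sorted(samples, key=sequencing_priority)
```

Ranking spike-gene dropout above travel history is a design choice made for the sketch; the article does not say which signal labs weigh more heavily.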

"All of the agencies that are involved with genomic surveillance are prioritizing those recent travel-associated cases," Dr. Azarian said.

That may have been how the first California case was flagged so quickly. The patient returned from South Africa on Nov. 22 and began feeling sick on Nov. 25. The person tested positive for the virus on Monday and scientists then sequenced the virus, announcing that they had detected Omicron two days later.

"The quick turnaround by the U.S. genomic surveillance system is another example of how much better our system has become over the past few months," Dr. Crawford said.

As much as surveillance has improved, there are still gaps that could slow the detection of more cases in the United States, including enormous geographic variation.

"Some states are lagging behind," said Massimo Caputi, a molecular virologist at the Florida Atlantic University School of Medicine.

Over the last 90 days, for instance, Vermont has sequenced and shared about 30 percent of its virus cases and Massachusetts has sequenced about 20 percent, according to GISAID, an international database of viral genomes. Six states, on the other hand (Kentucky, Pennsylvania, Ohio, South Carolina, Alabama and Oklahoma), have each sequenced and reported fewer than 3 percent of their cases, according to GISAID.

Moreover, scientists can only sequence samples from cases that are detected, and the United States has often struggled to perform enough testing.

"Testing is the weakest part of our pandemic response," said Dr. Eric Topol, the founder and director of Scripps Research Translational Institute in La Jolla, Calif. "It has been from day one."

Although testing, like genomic surveillance, has vastly improved since the early days of the pandemic, it is still highly uneven. And while rapid, at-home tests have many advantages, the shift of some testing from the lab to the home may present new challenges for surveillance.

"With increasing at-home rapid diagnostic tests, if that isn't followed up with, like, a P.C.R. test, those cases won't get sequenced," said Joseph Fauver, a genomic epidemiologist at the University of Nebraska Medical Center. The problem is not insurmountable, he added, but "maybe there's a little blind spot there."

There are other, more optimistic reasons that scientists have not detected more cases, although they remain theoretical.

"Perhaps infected patients have mild symptoms, and hence are not getting tested and are not subject to genomic surveillance," said Janet Robishaw, the senior associate dean for research at the Florida Atlantic University College of Medicine.

(It is still far too early to know whether Omicron causes disease that is any more or less severe than other variants, scientists stress. Even if the cases are disproportionately mild, which is not yet clear, that could be because the variant has mostly infected young or vaccinated people so far, who are less likely to develop severe disease.)

It is also possible that there was not much community spread of the variant in the United States until recently. When the cases are mostly isolated, and tied to foreign travel, they can fly under the surveillance radar.

"We're kind of looking for a needle in the haystack if we're looking for just single cases that are unrelated," Dr. Azarian said.

Although it is not yet clear where Omicron emerged, the first outbreaks were detected in South Africa, where the variant is now widespread.

There are fewer flights between southern Africa and the United States than between that region and Europe, where other early Omicron cases were detected, Dr. Caputi said.

And until early November, the United States had banned international travelers from the European Union and South Africa, he noted. Even when officials lifted the ban, travelers from those locations were still required to provide proof of both vaccination and a recent negative Covid test. These measures may have postponed Omicron's arrival.

"It is conceivable that Omicron spread is lagging behind in the U.S.," Dr. Caputi said in an email.

Either way, he added, he expected scientists to find more cases soon.

Posted in Genome | Comments Off

Omicron Alert: Infected samples from past week to be sent for genome sequence – The Indian Express

Posted: at 6:02 am

Amid growing concern over the new Covid-19 variant Omicron, the BMC has decided to conduct genome sequencing on all samples taken from people who tested positive for the coronavirus in the past week.

The civic body has already started tracing international travelers who have been landing at the Mumbai airport since November 11. If found positive, their samples are being sent for genome sequencing to check for Omicron.

Officials from the BMC's public health department said that besides international travelers, samples of infected patients from the past week are also being sent for genome sequencing as a precautionary measure.

"According to a list received from the airport, more than 1,000 international passengers from countries at risk have arrived in Mumbai since November 11. Many of them have already completed 15 days and there is very little chance that they may test positive for the virus. However, they could have come in contact with other people. So, samples of daily cases will be sent for genome sequencing," said an official.

Among the international passengers who have arrived at Mumbai airport since November 11, around 100 are from the city. They have been traced and tested, said BMC officials. So far, six international passengers from countries at risk have been found positive and their samples sent for genome sequencing. They are from Mumbai, Mumbai Metropolitan Region, Pimpri-Chinchwad and Pune.

From November 11 until Thursday, 2,868 passengers from 40 countries arrived at the airport.

The BMC's genome sequencing lab in Kasturba Hospital takes about four to five days to test a sample. Also, the machine requires a minimum of 350 samples at a time to conduct tests.

Mumbai on Wednesday reported 108 Covid-19 cases, the lowest in 19 months. In all, 37,877 samples were tested. Over the past month, the number of daily cases has gradually decreased.
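The batch requirement and the low case count interact, and a rough calculation shows how. The sketch below assumes, purely for illustration, that every positive sample is sent for sequencing and that daily positives stay constant at Wednesday's figure of 108. The 350-sample minimum and the four-to-five-day processing time are the figures reported above; the function names are made up for this example.

```python
import math

MIN_BATCH = 350        # minimum samples per run, as reported
RUN_DAYS = (4, 5)      # reported lab time to test a sample, in days
DAILY_POSITIVES = 108  # Mumbai's reported cases on Wednesday


def days_to_fill(batch_size: int, daily_samples: int) -> int:
    """Whole days needed to collect one full batch at a constant daily rate."""
    if daily_samples <= 0:
        raise ValueError("daily_samples must be positive")
    return math.ceil(batch_size / daily_samples)


def turnaround_range(batch_size: int, daily_samples: int, run_days):
    """(shortest, longest) days from the first sample collected to a result."""
    wait = days_to_fill(batch_size, daily_samples)
    return wait + run_days[0], wait + run_days[1]


print(days_to_fill(MIN_BATCH, DAILY_POSITIVES))                # 4
print(turnaround_range(MIN_BATCH, DAILY_POSITIVES, RUN_DAYS))  # (8, 9)
```

Under those assumptions, the first sample in a batch could wait about four days for the run to fill before the lab's own four to five days begin.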

The BMC is also increasing daily testing in the wards. "We are doubling the number of daily tests. Contact tracing of all international passengers who have come to Mumbai in the last couple of weeks has also been done," said Dr Avinash Vaydande, Medical Officer of Health (MOH) of R North ward (Dahisar).

A few administrative wards have recorded zero daily cases in the recent past. "In the last few days, the daily cases in our ward have gone down to one or two," said Dr Jitendra Jadhav, MOH of L ward (Kurla).

Planned to setup in Jammu & Srinagar during 2nd COVID wave, two Genome Sequencing labs nowhere in sight – Jammu Kashmir Latest News | Tourism |…

Posted: at 6:02 am

Govind Sharma

JAMMU, Dec 6: Two genome sequencing laboratories that were planned for Jammu and Srinagar during the second wave of COVID-19 are nowhere in sight, and the health authorities in both regions still have to send samples from coronavirus-positive patients to the National Centre for Disease Control (NCDC), Delhi, for genome sequencing.

As the Omicron variant of the coronavirus has started spreading in India, it has become necessary for the health authorities to carry out genome sequencing for every coronavirus-positive patient so that the new variant can be detected in time and stopped from spreading. According to experts across the globe, the Omicron variant spreads two times faster than the Delta variant.

In view of the Omicron threat, the Government of India, in its latest guidelines, has asked the States and Union Territories to send samples of every COVID-positive patient for genomic testing at the INSACOG laboratory network. Following these guidelines, health authorities in both Jammu and Kashmir Divisions are sending the samples of COVID-positive patients to NCDC Delhi, which is unable to provide the genome testing reports in time, thereby defeating the objective of the genome sequencing.

Official sources told Excelsior that during the peak of the second COVID wave, health authorities of the UT had decided to set up one genome sequencing lab each in Jammu division and Kashmir region, as NCDC Delhi was taking weeks to send genome sequencing reports.

They said it was decided at that time that in Jammu region the genome sequencing lab would be established in Government Medical College Jammu, while in Kashmir it was planned for either SKIMS Soura or GMC Srinagar.

Accordingly, the technical specifications for the machine used in genome sequencing were shared with Jammu and Kashmir Medical Supplies Corporation Limited (JKMSCL) for procurement of two such machines worth approximately Rs 2 crore each, sources said, lamenting that even after more than six months the machines had not been procured and the labs had not been established.

Sources said that there are only a few genome sequencing labs in the country and all the States depend on these labs for genome testing of samples from COVID-positive patients. Had the genome sequencing labs been established in the UT on time as planned, it would have helped the health authorities of the UT to detect the Omicron variant in time and check its spread, they said.

When contacted, Dr Yashpal Sharma, Managing Director, JKMSCL, said that tenders for the procurement of the genome testing machines had been floated three times but that only a single bid was received each time. He said the authorities concerned in Jammu as well as in Kashmir are now evaluating the technical specifications of the received bid, and if the specifications match, the financial bid will be opened and the machines will be procured. The whole process will take at least 40-60 days, he added.

Surveillance officers asked to ensure expedited genomic analysis through IGSLs – Greater Kashmir

Posted: at 6:02 am

Meanwhile, an official in the Health department said, "We have directions to send all samples of foreign returnees (international travellers) for genome sampling to the lab in Delhi."

He said, however, that in general they sent 5 percent of the total COVID-positive cases to Delhi for genome sampling.

"They take around one and a half months, and that period is too long. It needs to be reduced to one week so that the infected patients can be treated in time," he said. The official said that they had a review meeting yesterday in which they asked their higher-ups to ensure that genome testing reports come within a week's time. "During the second wave, we had sent samples for genome sequencing and the report came too late, when the infected person had already recovered from COVID-19. In such cases the chances of infection to other persons can be more," the official said, citing an example.

The UK Government Wants to Sequence Your Baby's Genome – WIRED

Posted: November 28, 2021 at 9:53 pm

In November 2019, Matt Hancock, then the United Kingdom's health secretary, unveiled a lofty ambition: to sequence the genome of every baby in the country. It would usher in a "genomic revolution," he said, with the future being "predictive, preventative, personalized health care."

Hancock's dreams are finally coming to pass. In October, the government announced that Genomics England, a government-owned company, would receive funding to run a research pilot in the UK that aims to sequence the genomes of between 100,000 and 200,000 babies. Dubbed the Newborn Genomes Programme, the plan will be embedded within the UK's National Health Service and will specifically look for "actionable" genetic conditions (meaning those for which there are existing treatments or interventions) and which manifest in early life, such as pyridoxine-dependent epilepsy and congenital adrenal hyperplasia.

It will be at least 18 months before recruitment for participants starts, says Simon Wilde, engagement director at Genomics England. The program won't reach Hancock's goal of including every baby; during the pilot phase, parents will be recruited to join. The results will be fed back to the parents "as soon as possible," says Wilde. "For many of the rare diseases we will be looking for, the earlier you can intervene with a treatment or therapy, the better the longer-term outcomes for the child are."

The babies' genomes will also be de-identified and added to the UK's National Genomic Research Library, where the data can be mined by researchers and commercial health companies to study, with the goal of developing new treatments and diagnostics. The aims of the research pilot, according to Genomics England, are to expand the number of rare genetic diseases screened for in early life to enable research into new therapies, and to explore the potential of having a person's genome be part of their medical record that can be used at later stages of life.

Whole genome sequencing, the mapping of the 3 billion base pairs that make up your genetic code, can return illuminating insights into your health. By comparing a genome to a reference database, scientists can identify gene variants, some of which are associated with certain diseases. As the cost of whole genome sequencing has taken a nosedive (it now costs just a few hundred bucks and can return results within the day), its promises to revolutionize health care have become all the more enticing, and ethically murky. Unraveling a bounty of genetic knowledge from millions of people requires keeping it safe from abuse. But advocates have argued that sequencing the genomes of newborns could help diagnose rare diseases earlier, improve health later in life, and further the field of genetics as a whole.

Back in 2019, Hancock's words left a bad taste in Josephine Johnston's mouth. "It sounded ridiculous, the way he said it," says Johnston, director of research at the Hastings Center, a bioethics research institute in New York, and a visiting researcher at the University of Otago in New Zealand. "It had this other agenda, which isn't a health-based agenda; it's an agenda of being perceived to be technologically advanced, and therefore winning some kind of race."

Chloranthus genome provides insights into the early diversification of angiosperms – Nature.com

Posted: at 9:53 pm

Genome sequencing, assembly, and annotation

Chloranthus spicatus has a genome size of 2.97 Gb (gigabases) based on K-mer analysis (Supplementary Fig. 2, Supplementary Data 1); the individual sequenced had a heterozygosity rate of 0.99%, which is possibly associated with the obligate outcrossing system of this species [31]. Genomic DNA was sequenced using three different methods: 182 Gb of Oxford Nanopore Technologies (ONT) long reads, 240 Gb of shotgun short reads (BGIseq 2000), and 240 Gb of Hi-C data.

The assembled genome was 2,964.14 Mb with a contig N50 size of 4.59 Mb, covering 99.7% of the genome size as estimated by K-mer analysis (Supplementary Data 2, 3). Assembled contigs were then clustered into 15 pseudochromosomes, covering 99.9% of the original 2,964.14 Mb assembly, with a super-scaffold N50 of 191.37 Mb. After performing the Hi-C validation, the genome showed high contiguity, completeness, and accuracy (Supplementary Fig. 3), with the 15 pseudochromosomes corroborated by previous chromosome counts of 2n = 30 [32]. In all, 21,392 protein-coding genes were predicted using a combination of homology-based and transcriptome-based approaches. The proteome was estimated to be at least 96.8% complete based on BUSCO (benchmarking universal single-copy orthologs) assessment (Supplementary Data 4).

The results obtained by Tandem Repeats Finder were mapped to the predicted coding genes of C. spicatus to estimate the proportion of incorrectly detected paralogous genes (Supplementary Data 5). In the C. spicatus genome, repetitive elements accounted for 70.09% of the genome assembly, of which 97.67% were annotated as transposable elements (TEs) (Supplementary Data 5). Long terminal-repeat retrotransposons (LTRs) were the major class of TEs and accounted for 58.79% of the genome. Among the LTRs, the most abundant elements were Gypsy (68.03% of the LTRs), followed by Copia (19.01% of the LTRs) (Supplementary Data 6). The time of insertion of LTRs in the genome of C. spicatus was estimated based on a peak substitution rate of 0.03 (Supplementary Fig. 4). We assumed a synonymous substitution rate of 1.51 × 10⁻⁹ bases per year following two recent studies of the closely related magnoliid lineages Liriodendron and Chimonanthus, resulting in an LTR burst time of approximately 9.9 Ma (see Methods).
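As a check on the arithmetic, the burst time can be reproduced with the relation T = K / (2r), where K is the divergence between the two LTRs of an element and r is the substitution rate per site per year. This relation is widely used for dating LTR insertions, but the Methods are not part of this excerpt, so treating it as the authors' formula is an assumption. The function name is hypothetical, and the rate is taken to be 1.51 × 10⁻⁹ per site per year.

```python
def ltr_insertion_time_ma(divergence: float, rate_per_site_per_year: float) -> float:
    """Insertion time in millions of years, using T = K / (2r).

    The factor of 2 reflects that both LTR copies of one element
    accumulate substitutions independently after insertion.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    years = divergence / (2.0 * rate_per_site_per_year)
    return years / 1e6


# Peak divergence of 0.03 and a rate of 1.51e-9, as quoted in the text.
print(round(ltr_insertion_time_ma(0.03, 1.51e-9), 1))  # 9.9
```

With the two quoted values the relation gives roughly 9.9 Ma, which matches the reported burst time.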

Comparison of the gene and genome characteristics (e.g., genome size, gene size, exon and intron sizes) of C. spicatus and 17 other phylogenetically diverse flowering plants (Supplementary Data 7) revealed that long genes and long introns were more prevalent in the genomes of Chloranthales and magnoliids compared to other angiosperms (Fig. 2a, b; Supplementary Data 8). As the presence of nonfunctional genes and variation in total gene numbers among different species would bias the statistics of gene characteristics, we selected 2,184 high-confidence orthologs from C. spicatus, six magnoliids, and two well-characterized angiosperm genomes, Arabidopsis thaliana and Oryza sativa (Supplementary Fig. 5a, Supplementary Data 9). Comparison of the lengths of the coding regions and introns revealed that the average coding region lengths in all nine plant genomes were similar (ranging from 1,533–1,557 bp), whereas the lengths of introns varied greatly (ranging from 153–3,681 bp; Supplementary Data 9). Chloranthus spicatus (3,681 bp) and the six magnoliid genomes (ranging from 1,270–2,390 bp) displayed much longer introns than those of A. thaliana (153 bp) and O. sativa (372 bp), signifying that the presence of longer genes is due to the extension of the intron length rather than coding regions. We separated the 2,184 high-confidence orthologs into groups based on length: <5 kb (short genes), 5–10 kb, 10–20 kb, and >20 kb (long genes). Long genes (>20 kb) in C. spicatus (876) were much greater in number than those in Oryza (2) and Arabidopsis (0) (Supplementary Data 8, 9).
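The length grouping can be illustrated with a short Python sketch. The four classes are the ones named in the text (<5 kb, 5–10 kb, 10–20 kb, >20 kb). How values exactly on a boundary are assigned is not stated, so the choice below is an assumption, and the gene lengths in the usage example are invented rather than taken from the supplementary data.

```python
from collections import Counter


def length_class(gene_length_bp: int) -> str:
    """Assign a gene to one of the four length classes used in the text.

    Boundary handling is an assumption: 5 kb and 10 kb go to the upper
    class, while exactly 20 kb stays in 10-20 kb because the long-gene
    class is described as strictly greater than 20 kb.
    """
    if gene_length_bp < 0:
        raise ValueError("gene length cannot be negative")
    if gene_length_bp < 5_000:
        return "<5 kb"
    if gene_length_bp < 10_000:
        return "5-10 kb"
    if gene_length_bp <= 20_000:
        return "10-20 kb"
    return ">20 kb"


# Invented ortholog lengths (bp) for one species, for illustration only.
example_lengths = [1_850, 4_200, 7_900, 15_300, 26_400, 41_000]
print(Counter(length_class(n) for n in example_lengths))
```

Running the same tally per species over the 2,184 orthologs would give counts comparable to the long-gene figures quoted above.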

Fig. 2: a, Comparison of gene and genome characteristics (i.e., genome size, gene size, exon, and intron sizes) of C. spicatus and 17 other flowering plants. b, Comparison of the lengths of the coding regions among nine representative plant genomes. c, Collinearity patterns between genomic regions of Amborella, Vitis, and Chloranthus. The colored (red/grey) wedges highlight the major syntenic blocks shared among these species. d, The number of synonymous substitutions per synonymous site (Ks) distributions confirming the occurrence of a whole-genome duplication (WGD) event in C. spicatus. Source data underlying Fig. 2a are provided as a Source Data file.

We found a significant correlation between intron length and genome size (R² = 0.8869). The highly conserved average length and number of exons among the nine species compared further indicated that exon structure is mostly consistent among the angiosperms. The average length of introns was approximately 1.66 kb, 2.87 kb, and 3.35 kb for Lauraceae, Magnoliaceae, and Chloranthus, respectively (Supplementary Data 8).

LTR retrotransposons (LTR-RTs) represent a major fraction of plant genomes, particularly in gymnosperms and magnoliids [13]. Thus, to understand the constitution of introns in C. spicatus, we looked for repeated elements. LTRs were widely detected in the long introns of C. spicatus and appear to be the major contributor to the long introns in this species. For instance, the gene AT1G04950.1, located on Chromosome 1: 1402606–1408184, encodes Transcription initiation factor TFIID subunit 6 in A. thaliana. The LTR length of this orthologous gene in C. spicatus (Cspi02386) was significantly longer than that in Lauraceae, Magnoliaceae, O. sativa, and A. thaliana (Supplementary Fig. 5b).

We discovered 11,500 intact LTRs and classified them into two groups, Gene-20K LTR (LTRs distributed in genes >20 kb in length) and ALL LTR (LTRs distributed throughout the genome, Supplementary Fig. 6). A similar model distribution of Gene-20K LTR and ALL LTR (Supplementary Figs. 7, 8) suggested that the insertion timing of both LTR groups was the same. Further analyses of expression levels revealed that genes with short introns were more likely to exhibit low expression, while genes with long introns exhibited higher expression. However, when the intron length was larger than 40 kb, the expression level subsequently declined in C. spicatus (Supplementary Fig. 9).

Our investigation of collinearity and synteny patterns between genomic regions of Amborella trichopoda (sister to all other extant angiosperms), Vitis vinifera (sister to all other rosids), and C. spicatus showed highly conserved synteny among these three species (Fig. 2c). In addition, this analysis showed clear structural evidence of a single ancient WGD in C. spicatus. The syntenic depth ratio between C. spicatus and A. trichopoda was found to be 1:2, which means that each A. trichopoda region could be matched to two genomic regions in the C. spicatus genome, while the comparison of C. spicatus with the ancient hexaploid V. vinifera genome revealed a 2:3 syntenic depth ratio (Fig. 2c).

To further investigate the extent of conservation of genome structure between C. spicatus and other angiosperms, we performed pairwise synteny comparisons with several species of magnoliids (Magnolia biondii, Liriodendron chinensis, Persea americana, Cinnamomum kanehirae, Litsea cubeba, Phoebe bournei) (Supplementary Data 10). Our results clearly showed that C. spicatus shared a higher number (3,029; i.e., 62.7%) of syntenic blocks (at both the scaffold and chromosome level) with species in its sister clade, the magnoliids, than with Ceratophyllales (2,483, 52.5%), V. vinifera (2,275, 56.5%), or the monocot Oryza sativa (1,700, 45.3%) (Supplementary Fig. 10, Supplementary Data 10). Amborella trichopoda (1,150, 57%) shared the fewest syntenic blocks with C. spicatus among all the species used for comparative analysis (Supplementary Data 10); overall, the number of shared syntenic blocks between these representative genomes generally coincided with phylogenetic relationships.

To further investigate the phylogenetic placement of the C. spicatus WGD, we compared the distribution of Ks values, the number of synonymous substitutions per synonymous site. The Ks distribution for C. spicatus paralogs showed an obvious peak at approximately Ks = 0.9, and peaks at similar Ks values were identified for other species (Ascarina rubricaulis, Chloranthus japonicus, and Sarcandra glabra) of Chloranthales (Fig. 2d); the coincidence of these Ks values suggests that an ancient WGD event may be shared among all extant members of this clade. Further, the Ks values for orthologs shared by C. spicatus and Phoebe bournei (Lauraceae; magnoliids) show a peak value slightly greater than that observed for paralogs in C. spicatus and other Chloranthales species, which suggests that the Chloranthales WGD occurred after the divergence of Chloranthales and magnoliids (Fig. 2d). These observations suggest that the ancient WGD event we detected was specific to Chloranthales.

Although magnoliids also exhibit an ancient WGD event (Supplementary Data 11a), this event was not shared with Chloranthales. The age of the Chloranthales WGD is similar to that of a number of ancient polyploidy events that occurred independently in several major clades of angiosperms: the gamma (γ) event (103.67–129.35 Ma) in the common ancestor of core eudicots; the tau (τ) event (101.82–138.82 Ma) during the early diversification of monocots; the lambda (λ) event (98.22–130.04 Ma) during the early diversification of magnoliids; the pi (π) event (85.78–119.82 Ma) in Nymphaeales; the kappa (κ) event (98.06–130.54 Ma) in Chloranthales (this study); and an unnamed event specific to Ceratophyllales [33] (Fig. 1d). Although these WGD events occurred independently, many of the same stress-related genes were retained independently after these WGD events, including two heat shock transcription factors and Arabidopsis response regulators [34]. These genes also appear to be retained in Chloranthus (Supplementary Figs. 11, 12).
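One way to make "similar in age" concrete is to ask whether the estimated age ranges share a common window. The sketch below reads each pair of figures quoted above as a minimum–maximum range in Ma and intersects them; the function name and dictionary layout are hypothetical, and this is only an illustration of the comparison, not an analysis taken from the paper.

```python
# Age ranges in Ma for the five named events, as quoted in the text.
WGD_AGE_RANGES_MA = {
    "gamma (core eudicots)": (103.67, 129.35),
    "tau (monocots)": (101.82, 138.82),
    "lambda (magnoliids)": (98.22, 130.04),
    "pi (Nymphaeales)": (85.78, 119.82),
    "kappa (Chloranthales)": (98.06, 130.54),
}


def common_window(ranges):
    """Return the (youngest, oldest) interval shared by all ranges, or None."""
    ranges = list(ranges)
    if not ranges:
        raise ValueError("need at least one range")
    low = max(lo for lo, _ in ranges)   # latest of the lower bounds
    high = min(hi for _, hi in ranges)  # earliest of the upper bounds
    return (low, high) if low <= high else None


print(common_window(WGD_AGE_RANGES_MA.values()))  # (103.67, 119.82)
```

All five ranges overlap between about 103.67 and 119.82 Ma, which is consistent with the statement that the events are similar in age. Overlapping estimates do not by themselves show the events were simultaneous.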

To resolve the long-standing uncertainty regarding the phylogenetic position of Chloranthales and relationships among the five major lineages of Mesangiospermae, 257 single-copy nuclear (SCN) genes were identified using whole-genome sequences from C. spicatus and 17 other flowering plants (strict single-copy genes for each species without missing genes; see Methods for species). The aligned protein-coding regions were analyzed using coalescent and concatenation approaches. Both analyses yielded an identical and highly supported topology (bootstrap values of 100%) (Supplementary Fig. 13) in which monocots were sister to all other mesangiosperms; Chloranthales appeared as the sister group to magnoliids, and Chloranthales + magnoliids together were sister to Ceratophyllales + eudicots (Fig. 2a, Supplementary Fig. 13). We also performed phylogenetic inference based on a 937-SCN gene data set with selection criteria allowing a maximum of three missing species. The phylogenetic results showed an identical topology to that of the 257-SCN gene data set, supporting Chloranthales as the sister to magnoliids (Supplementary Fig. 13).

To avoid potential errors caused by sparse gene sampling, we extracted 2,329 low-copy nuclear (LCN) genes, allowing a maximum of three copies for each gene in each species. The phylogenetic trees were then similarly reconstructed by both coalescent and concatenation methods as described above. The resulting species trees were topologically identical to the phylogenetic findings as revealed above based on the 257-SCN and 937-SCN data sets (Supplementary Fig. 13). Among the 2,329 LCN gene trees, 61% of the trees (454 out of 742 trees) placed Chloranthales as the sister lineage to magnoliids, with bootstrap support greater than 70% (type I, Fig. 1c).

As poor taxon sampling may lead to topological errors, we added a large number of published genomes of the angiosperms and two transcriptomes of Chloranthales to increase our taxon sampling. We extracted 612 mostly single-copy orthologous genes following Yang et al. [21] and generated a 218-species dataset. The phylogenetic results were congruent with the topologies based on analyses of the 257-SCN, 937-SCN, and 2,329-LCN data sets, supporting monocots and a clade of Chloranthales plus magnoliids as successive sister lineages to a clade of Ceratophyllales plus eudicots (Fig. 1d, Supplementary Fig. 14).

Phylogenetic analyses were also conducted based on chloroplast DNA sequence data. We selected 80 genes, following a recent study that analyzed 2,881 plastomes [1], and obtained two data sets, with 18 species and 134 species, respectively. The resultant topologies using both chloroplast data sets agree with those from the four nuclear data sets in strongly supporting a sister relationship between Chloranthales and magnoliids (Supplementary Figs. 15, 16).

Although the same pattern of phylogenetic relationships among the five major groups of Mesangiospermae was consistently recovered in analyses of all four nuclear data sets, phylogenetic incongruence was observed among nuclear gene trees. A major conflict was identified among single-gene trees in all four nuclear gene data sets (257-SCN, 937-SCN, 2,329-LCN, and 612-SCN) involving the placement of the Chloranthales-magnoliids clade relative to monocots and eudicots. We summarized the conflict among gene trees in the 2,329-LCN data set with regard to the proportions of trees supporting three different branching patterns for Chloranthales-magnoliids, monocots, and eudicots. The percentage of gene trees supporting the Chloranthales-magnoliids clade plus eudicots together forming a sister group to monocots (Type II) was higher than percentages for the other two topologies (gene trees with BS>70%: Type I, 30%; Type II, 53%; Type III, 17%; Fig.3a). It is notable that Type I and Type III, the two relationships conflicting with the most likely species tree, are not equal in frequency, suggesting gene tree incongruence patterns not expected under ILS alone (below).

a A summary of the conflicts among gene trees in the 2,329-LCN data set with regard to the proportions of trees supporting three different branching patterns for Chloranthales-magnoliids, monocots, and eudicots. b Gene tree incongruence between nuclear (2,329 LCN genes) and plastid (80 plastid genes) trees in a consensus DensiTree plot. c A consensus scenario showing ancient gene flow between monocots and eudicots, inferred by QuIBL, PhyloNet, and ABBA-BABA D-statistics. Source data underlying Fig. 3b are provided as a Source Data file.

Furthermore, gene tree discordances were also observed between chloroplast and nuclear gene trees. Phylogenetic analyses of these 18 and 134 flowering plants inferred from 80 concatenated plastid genes strongly supported the placement of the Chloranthales-magnoliids clade as sister to all other Mesangiospermae (Fig.3b and Supplementary Figs.15, 16), which is consistent with the Type I nuclear topology (Fig.3a).

A nonrandom incongruence pattern was observed among different topology types: constituent species of monocots (3 spp.), eudicots (4 spp.), and magnoliids (7 spp.) were assigned to a clade. For each topology type, the majority of genes supported the monophyly of C. spicatus and seven species of magnoliids (Type I: 168/316=53.2%; Type II: 297/496=59.9%; Type III: 122/203=60.1%). We also mapped genes that caused conflict on the chromosomes. Genes that supported both Type I and Type II topologies were found to be evenly distributed on the 15 chromosomes (Supplementary Fig.17). Chi-squared tests showed that gene numbers and locations on each chromosome do not differ significantly (Supplementary Data12, 13).

The observed gene tree incongruence between nuclear and chloroplast trees and among nuclear single-gene trees indicates the possibility of incomplete lineage sorting (ILS) and/or hybridization events during early angiosperm evolution. We first used QuIBL, an approach using branch length distributions across gene trees, to infer putative hybridization patterns35. In all, 100 runs of QuIBL were conducted using 500 randomly selected trees from the 2,329 LCN gene trees. Strong hybridization signals (rate >0.1) were identified in several pairs of major clades of angiosperms (Supplementary Figs.18, 19), including: (i) ancestor of eudicots and ancestor of monocots; (ii) ancestor of eudicots and C. spicatus; (iii) ancestor of the species pair Arabidopsis thaliana-Erythranthe guttata and Vitis vinifera; (iv) Erythranthe guttata and Ceratophyllum demersum. Strong signals of ILS were also detected between Lauraceae and Magnoliaceae (Supplementary Figs.18, 19). Among these events, cases (i) and (ii) can be explained as the causes of gene tree incongruence of the Chloranthales-magnoliids clade relative to monocots and eudicots.

A second analytical approach, PhyloNet, was used to further assess putative hybridization events in our phylogeny. Five network searches were carried out by allowing one to five reticulation events. The species network under the best model (AICs=50.78; BICs=30.52, Supplementary Data14) identified two hybridization events among major clades of angiosperms (Supplementary Fig.20a), supporting ancient hybridization between early members of Nymphaeales and monocots. Ancestral gene flow between monocots and eudicots (Supplementary Fig.20a) was additionally supported by results of QuIBL (Supplementary Fig.19). To test whether the PhyloNet results identified hybridization correctly, we repeated the PhyloNet analyses using coalescent trees simulated without hybridization under the ASTRAL species tree (Supplementary Fig.11). As expected, the species network under the best model (AICs=51.21; BICs=30.25, Supplementary Data14) detected no hybridization events among monocots, eudicots, and magnoliids (Supplementary Fig.20b), suggesting that the analysis using empirical gene trees was not susceptible to false positives.

The unequal frequencies of Type I and Type III topologies discordant with the species tree suggested that ILS alone may not explain the gene tree conflicts in this study; therefore, we also used the ABBA-BABA approach to explicitly model patterns of discordant genealogies. This analysis also inferred frequent hybridization signals (Supplementary Fig.21). Consistent with the other two methods employed, the hybridization event between monocots and eudicots was detected, with the largest absolute Z-value (14.6).
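The asymmetry argument can be illustrated with a simple symmetry test. Under ILS alone, the two topologies that conflict with the species tree are expected to occur at equal frequency, so their counts should look like fair coin flips. The sketch below is not the ABBA-BABA analysis used in the study; it is a minimal Python illustration that assumes the per-topology gene totals quoted earlier (316 Type I and 203 Type III genes) are the relevant counts and uses a normal approximation to the binomial. The function name is hypothetical.

```python
import math

def discordant_symmetry_test(n_type_a: int, n_type_b: int) -> tuple[float, float]:
    """Test whether two discordant topologies are equally frequent.

    Returns (z, two_sided_p) using the normal approximation to a
    Binomial(n, 0.5) null, which is the expectation under ILS alone.
    """
    n = n_type_a + n_type_b
    expected = n / 2.0
    sd = math.sqrt(n * 0.25)  # standard deviation of Binomial(n, 0.5)
    z = (n_type_a - expected) / sd
    # Two-sided tail probability of a standard normal variate
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Counts assumed from the per-topology totals quoted in the text
z, p = discordant_symmetry_test(316, 203)
print(f"z = {z:.2f}, two-sided p = {p:.2e}")
```

A large z-score here only says that the two discordant topologies are not equally common; it does not by itself identify which lineages exchanged genes, which is what the D-statistics address.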

In summary, all three methods used to investigate hybridization (QuIBL, PhyloNet, and ABBA-BABA D-statistics) were unanimous in suggesting ancient gene flow between monocots and eudicots, although with variation among methods in the number of hybridization events and any additional lineages involved in hybridization. A consensus scenario is presented (Fig.3c) showing ancient gene flow between monocots and eudicots.

Terpene synthases (TPSs) are key enzymes that control the production of terpenoids, crucial defense compounds in plants36. To explore the evolution of the TPS family in Magnolia and Chloranthales, as well as to garner a better understanding of terpene evolution in angiosperms, we searched for candidate TPSs in C. spicatus and 17 other flowering plants (the same taxon sampling as in the comparative genomics analyses). Chloranthus spicatus encodes 73 TPSs (Supplementary Data15), similar to V. vinifera (75) and A. coerulea (74), while C. kanehirae exhibited the largest number (95) of TPSs. Particularly, compared to the ANA grade, there was higher diversity in almost all of the magnoliid species and C. spicatus (Supplementary Data16). Furthermore, according to the subfamily classification of TPS genes, TPSs were divided into 6 clades: TPS-a, b, c, e, f, and g (Fig.4b). In Amborella and members of Nymphaeaceae (Euryale ferox and Nymphaea colorata), TPS-a was absent (Supplementary Data16). Furthermore, when we performed GO enrichment analyses using the shared genes between magnoliids and Chloranthales, the genes related to terpene synthase activity (GO:0010333) exhibited a low P-value, indicating that terpene synthase activity was the most enriched of all GO categories (Supplementary Data17). Moreover, our gene family analysis indicates that the TPS-a and TPS-b gene clades expanded remarkably in magnoliids and Chloranthales compared to all other angiosperm clades (Supplementary Data16, Fig.4b); these gene clades primarily consist of angiosperm-specific sesquiterpene and monoterpene synthases, respectively. Several unique Chloranthus-specific sesquiterpenoids, including chlorahololides A, chloranthalactone A, and chlotrichenes A and B with bioactive potential, have been isolated and chemically synthesized in the lab37,38,39.

a A total of 44 genes related to the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway and the mevalonate (MVA) pathway were identified in C. spicatus (left panel). HMGR and DXS exhibited the highest copy numbers in the MVA and MEP pathways, respectively. Differentially expressed genes among seven representative tissues of C. spicatus involved in MEP and MVA pathways are shown in the right panel. b Identification of candidate terpene synthases (TPSs) in C. spicatus and subfamily classification revealed six major clades (TPS-a, b, c, e, f, and g). The gene family tree indicates that TPS-a and TPS-b gene clades are significantly expanded in magnoliids and Chloranthales. c Contraction of R genes in Chloranthales. The nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes were divided into three classes: TIR-NBS-LRR (TNL), CC-NBS-LRR (CNL), and RPW8-NBS-LRR (RNL). In all, 3,518 NBS genes were identified in 28 angiosperm species. * indicates the data were obtained from a previous study40.

To understand the origin of paralog generation of these TPS genes, we compared the numbers of genes in each duplication type among species of magnoliids and Chloranthus (Supplementary Data18). The results showed tandem (23, 33.3%), WGD (18, 26.1%), and transposition (21, 30.4%) duplication events contributed to the expansion of TPSs in C. spicatus, with only a few proximal repeats (7, 10.1%). The 73 CsTPS genes are not evenly distributed across the 15 chromosomes, with Chr2 having the highest concentration of TPS genes. Tandem repeats are mainly present on Chr2 and Chr7 (5 and 6 tandem repeats, respectively), but are also present on chromosomes 4, 5, 9, 14, and 15. We hypothesize that WGD contributed to TPS expansion as well, for instance, the higher copy number of the pairs CsTPS03 and CsTPS33 and CsTPS05 and CsTPS19 (Supplementary Fig.22, Supplementary Data16, 18).
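As a quick arithmetic check on the duplication categories, the reported percentages can be recomputed from the counts. The short Python sketch below uses only the four counts given in the text; the variable names are illustrative. It suggests that the percentages are relative to the 69 genes assigned to a duplication type rather than to all 73 CsTPS genes.

```python
# Counts of CsTPS genes per duplication type, as quoted in the text
duplication_counts = {
    "tandem": 23,
    "WGD": 18,
    "transposition": 21,
    "proximal": 7,
}

# Number of genes that were assigned to any duplication type
total_classified = sum(duplication_counts.values())

# Percentage of classified genes in each duplication type
percentages = {
    kind: round(100.0 * count / total_classified, 1)
    for kind, count in duplication_counts.items()
}

print(total_classified)
for kind, pct in percentages.items():
    print(f"{kind}: {pct}%")
```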

Next, we investigated the genes involved in the production of non-volatile isoprenoids via the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway and the mevalonate (MVA) pathway and identified 44 genes in C. spicatus that may be involved in these pathways (Supplementary Data 19, 20). There were multiple copies of the genes encoding enzymes related to the MVA pathway, and the number was approximately double that detected for genes in the MEP pathway. The gene encoding HMGR (3-hydroxy-3-methylglutaryl-CoA reductase) displayed the highest number of gene copies (12), followed by AACT (acetoacetyl-CoA thiolase; 6 copies). In the MEP pathway, except for DXS (1-deoxy-D-xylulose 5-phosphate synthase), DXR (1-deoxy-D-xylulose 5-phosphate reductoisomerase), and GGPS (geranylgeranyl pyrophosphate synthase), each remaining enzyme had only one corresponding gene copy. To further validate this observation, a differential gene expression (DE) analysis was performed using the transcriptome data from different plant parts (stamen, carpel, and peduncle) (Fig. 4a). Regardless of the number of gene copies encoding the enzymes of these pathways, at least one gene copy for each enzyme was highly expressed in each tissue. However, for the multiple-copy genes, a few genes were responsible for most of the expression, while the remaining copies were weakly expressed. Altogether, the analyses of expansion and differential expression of TPSs suggest that the appearance of multiple-copy genes in the MVA pathway could be related to the expansion of the TPS-a subfamily, which is probably responsible for the production of sesquiterpenes in Chloranthales.

Nucleotide-binding site-leucine-rich repeat (NBS-LRR, NBS for short) genes encompass more than 80% of the characterized R genes40. We identified 3,518 NBS genes in 28 angiosperm species and classified them into three classes: Toll/interleukin-1 receptor TIR-NBS-LRR (TNL), N-terminal coiled-coil motif CC-NBS-LRR (CNL), and resistance to powdery mildew 8 RPW8-NBS-LRR (RNL)40 (Supplementary Data 21). The gene copy number in each NBS class showed considerable variation among the analyzed species (Fig. 4c). The highest number of TNL genes was found in M. truncatula (of the 28 species examined), while the highest numbers of CNLs and RNLs were in S. tuberosum and G. max, respectively. Moreover, M. truncatula, G. max, and S. tuberosum contained more R genes than the other angiosperms examined; Chloranthales and magnoliids contained many fewer R genes. TNL and RNL were absent from Chloranthus and the magnoliids (as in the monocot species, O. sativa), and only 19 and 13 CNLs were present in C. spicatus and Magnolia biondii, respectively (Fig. 4c, Supplementary Data 21). In the species having both TNL and CNL genes, the CNLs are generally more common than the TNLs, with the exception of A. thaliana, V. vinifera, A. trichopoda, E. ferox, and N. colorata.

Read the original post:
Chloranthus genome provides insights into the early diversification of angiosperms - Nature.com

Posted in Genome | Comments Off

Bihar govt directs officials to conduct genome study, increase vaccination to tackle new Covid variant – Hindustan Times

Posted: at 9:53 pm

In the wake of the emergence of the new Covid-19 variant Omicron, the Bihar government on Sunday directed health officials to conduct genome surveillance of all Covid-positive samples to ascertain the strain of the virus.

By Anirban Guha Roy/HTC, Patna

Chief minister Nitish Kumar, during a review meeting of the health department on Sunday, directed officials to be alert and ensure that there are adequate stocks of medicines and health facilities at the hospitals.

The chief minister also stressed that people who have returned from abroad should be monitored and those who have tested positive should be screened for the new variant. Kumar, who also reviewed the ongoing vaccination programme against Covid-19, laid stress on accelerating the vaccination drive and directed officials to use all modes of publicity to encourage people who have taken the first dose to get the second dose as well. "There should be state level monitoring of those districts where the vaccination drive and testing is slow. There should not be any laxity on testing and vaccination drive should be widely publicised," the CM said. Till Sunday, Bihar had inoculated 80 million citizens, according to officials.

During the meeting, the Bihar representative of the World Health Organisation (WHO), Dr Subramaniyam, apprised the CM about the new Covid variant Omicron and its various aspects. Additional chief secretary (health) Prataya Amrit said the health department is on high alert to tackle the new variant.

Health minister Mangal Pandey, while talking to mediapersons on Sunday, said, "Though there has been no positive cases of the new variant in the country so far, the health department is on high alert and directives have been issued to health officials to maintain tight vigil while conducting RT-PCR tests to detect any strain of the new variant so that immediate steps can be taken."

Asked about the central government's directives to test around 281 persons who have arrived in Bihar from foreign countries in the last two months, the health minister said health officials have started visiting the residences of the persons mentioned in the list.

"Health officials have started visiting the persons mentioned in the list who have returned from abroad. There are instances where people have their address in Bihar but live in different parts of the country. So, we will conduct tests of all those people staying in Bihar," said the minister, who also attended the review meeting of the health department along with the CM.

The rest is here:
Bihar govt directs officials to conduct genome study, increase vaccination to tackle new Covid variant - Hindustan Times

FPJ explains: Know what Genome sequencing is as it may get mandatory for international travel amid COVID-19 – Free Press Journal

Posted: at 9:53 pm

Amid fears of an anticipated rise in Covid cases on account of an increased flow of tourists during the Christmas season, the Brihanmumbai Municipal Corporation (BMC) requested that the state task force members ask for genome sequencing reports of passengers flying in from Europe, the USA, and Russia.

Genomic sequencing has emerged as an increasingly necessary tool in the effort to trace hotspots of Covid-19, as governments around the world grapple with the inevitability of continued transmission in the absence of a vaccine.

According to the Centers for Disease Control and Prevention, "Genomic sequencing allows scientists to identify SARS-CoV-2 and monitor how it changes over time into new variants, understand how these changes affect the characteristics of the virus, and use this information to better understand how it might impact health."

In short, genomic sequencing allows health authorities to map coronavirus clusters. Matching the genomic findings of a Covid-19 case to epidemiological information can help authorities track down the source of the virus.

What is genomic sequencing?

The SARS-CoV-2 genome, which is what COVID-19 tests look for, encodes instructions, organized into sections called genes, to build the virus. Scientists use a process called genomic sequencing to decode the genes and learn more about the virus.

Scientists believe that the genome sequence represents a valuable cue, helping them find genes much more easily and quickly. Genomic sequencing analyses the virus sample collected from a diagnosed patient and compares it with other cases.

According to WHO, "Sequencing enabled the world to rapidly identify SARS-CoV-2 and develop diagnostic tests and other tools for outbreak management. Continued genome sequencing supports the monitoring of the disease's spread and evolution of the virus."

As per the current norms on travel amid COVID-19, fully vaccinated passengers are allowed to leave the airport without RT-PCR tests, while others need to submit updated RT-PCR test reports. Yet Mumbai officials are considering asking international passengers for genome sequencing reports.

Continued here:
FPJ explains: Know what Genome sequencing is as it may get mandatory for international travel amid COVID-19 - Free Press Journal

Researchers can trace the family tree of individual mutations inside our cells – Massive Science

Posted: at 9:53 pm

We all start out as a single cell. That cell divides many, many times to form the trillions of cells in an adult human body. Each of these cells has two copies of all the genes in the human genome, inherited from our biological parents. While copying the genome trillions of times, unsurprisingly, some mistakes are made. Slight genetic variations, called mutations, accumulate in our cells as we grow from a single cell to an adult.

We usually think of mutations as harmful, but they often have no consequence and don't change the meaning of our genetic code. There are two main categories of mutations: germline mutations and somatic mutations. Germline mutations, found in our sperm or egg cells, can be passed on to our children. Somatic mutations, found in other cells of our body such as those in the brain, liver, or lungs, are never passed on to our children. Although somatic mutations cannot be inherited, they can still be important for human health. For example, somatic mutations in a subset of brain cells can cause seizures within a brain region, either with or without visible changes to brain structure.

Over many years, the Walsh lab at Harvard University has contributed to our understanding of somatic mutations in the human brain. The group has studied how somatic mutations in brain cells may contribute to seizures, developmental disorders, or psychiatric and neurological conditions including autism spectrum disorders. They, and others, have investigated how often changes in a single letter of the DNA genetic code, called single-nucleotide variants (SNVs), can be detected in typical and atypical brain cells.

Interestingly, by finding which somatic mutations are present in which cells, we can build a family tree of cells in the body and trace back the path of cell divisions that produced our trillions of cells from the single original cell. This family tree would be a treasure trove for understanding how our bodies develop. The Brain Somatic Mosaic Network is a group of researchers who study this problem specifically in the area of human brain development. Since we cannot watch the cells dividing inside a human embryo and cannot see the mutations within cells in tissue samples, these types of studies are quite technically challenging.

The Walsh group recently applied several ingenious methods to trace the developmental path of single cells in the human brain. How complex would this task be? The genome is like a book with six billion letters. Our cells are like trillions of copies of this book in a gigantic library: our body. Trying to find somatic SNVs is like trying to find single-letter typos present in some copies of the book but not all of them. It is a mammoth task. So how would you find these typos? You could read the entire book multiple times, or focus on a section you suspect has a typo and read that section multiple times. The first approach would be akin to whole genome sequencing, while the second would be equivalent to targeted amplicon sequencing. The researchers used both approaches in their study.

They also compared somatic SNVs in different organs, including the brain, liver, spleen, and heart, among others. Different body organs grow from different parts of the embryo and start out as distinct groups of cells. The authors were able to see these patterns of branches in the cell family tree. The fraction of cells in each organ with a specific somatic SNV varied depending on the organ's origin within the embryo. Even within the brain, a single organ, the fraction of cells with specific somatic SNVs varied from the forebrain to the hindbrain, showing that they develop from different groups of cells in the embryo.

Measuring the frequency of specific somatic SNVs, the authors also estimated that around 50-100 cells within the embryo probably contributed to forming the brain. The estimate is not exact, because there are many unknowns. The methods used cannot detect all the somatic SNVs in the samples. Even with very deep genome sequencing (reading the book many times) somatic mutations present only in a small number of cells can still be missed. Moreover, the estimate was based on measurements mostly in one individual donor, highlighting another limitation of these approaches. Such studies are technically challenging and expensive, so they mostly include only a few individuals and a few organs from those individuals.

Nevertheless, the study brings us closer to understanding the complex, awe-inspiring process of human development from a single cell. It demonstrates that we can track how individual cells in a human embryo give rise to the organs of an adult human, and how even within the same organ, groups of cells have unique barcodes made of somatic SNVs recording the history of their ancestor embryo cells.

Original post:
Researchers can trace the family tree of individual mutations inside our cells - Massive Science

A supergene underlies linked variation in color and morphology in a Holarctic songbird – Nature.com

Posted: at 9:52 pm

To evaluate population structure in redpolls, we sequenced genomes of 73 individuals from the three described redpoll ecotypes (Supplementary Data File 1). Our results from whole genomes confirm findings from a previous study using a reduced-representation approach (ddRAD-seq)18: redpolls lack population genetic structure by either geography or ecotype boundaries (Fig. 1b), with spatially explicit clustering analyses supporting K=2 and failing to group all individuals according to their species classification. In addition, principal component analysis (PCA) (Supplementary Fig. 1) of 25 million single nucleotide polymorphisms (SNPs) further reveals that PC1 explains only 3.14% of total genomic variation across all ecotypes and the majority of their global distribution. However, both PCA and population assignment analyses nonetheless indicate some degree of genetic clustering (Fig. 1b, Supplementary Fig. 1). PC1 visually separated samples into three clusters, with a left-most cluster containing both lesser and common redpolls, a right-most cluster containing almost entirely hoary redpolls, and a central cluster containing a mix of both common and hoary redpolls. However, many localities were recovered in all three groups, suggesting no influence of geography on genetic structure (Supplementary Fig. 2). Because neither geography nor ecotype mapped perfectly onto the clusters, we were interested in identifying the genomic regions responsible for generating these clusters, and in investigating their potential evolutionary impacts.

To identify divergent regions of the genome in redpolls, we aligned sequences to a brown-capped rosy-finch (Leucosticte australis) reference genome and searched for local peaks of differentiation between PCA clusters by calculating FST and dXY in 25-kb windows across all chromosomes including all ecotypes. These scans identified a highly differentiated region across 55 Mb of chromosome 1 (Fig. 2a, c, Supplementary Fig. 3). Rerunning PCA and population assignment analyses after the removal of this chromosome either eliminated or reduced the variation explained (Supplementary Figs. 1 and 4), demonstrating the strong contribution of this region to total genetic differentiation in redpolls. Further, conducting a PCA using only chromosome 1 qualitatively produced much stronger definition in the three clusters originally identified (Fig. 2e). Within-group heterozygosity of the middle cluster for the highly differentiated region (0.626) was roughly double that of the outside clusters (0.388 and 0.378 for left and right clusters, respectively), suggesting that the PCA groups represent three possible inversion genotypes. We hereafter refer to these putative genotypes as AA, AB, and BB in left to right order across PC1. We do not distinguish between the ancestral and inverted haplotypes, and use the term inversion to refer to the inversion region rather than the inverted haplotype.
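The windowed scan described above can be sketched in a few lines. The text does not say which FST estimator was used, so the Python example below assumes Hudson's estimator combined as a ratio of sums within non-overlapping 25-kb windows; the function names and the toy allele frequencies are hypothetical and serve only to show the mechanics.

```python
from collections import defaultdict

def hudson_fst_terms(p1, p2, n1, n2):
    """Per-SNP numerator and denominator of Hudson's FST estimator.

    p1, p2: allele frequencies in the two groups.
    n1, n2: number of sampled chromosomes in each group.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den

def windowed_fst(snps, n1, n2, window_size=25_000):
    """Ratio-of-sums FST in non-overlapping windows.

    snps: iterable of (position, p1, p2) tuples on one chromosome.
    Returns {window_start: fst}; windows with a zero denominator are skipped.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for pos, p1, p2 in snps:
        start = (pos // window_size) * window_size
        num, den = hudson_fst_terms(p1, p2, n1, n2)
        sums[start][0] += num
        sums[start][1] += den
    return {start: num / den for start, (num, den) in sorted(sums.items()) if den > 0}

# Hypothetical toy data: (position, frequency in group 1, frequency in group 2)
toy_snps = [
    (1_000, 0.50, 0.50),
    (12_000, 0.40, 0.45),
    (30_000, 1.00, 0.00),  # fixed difference between groups
    (41_000, 0.95, 0.05),
]
fst_by_window = windowed_fst(toy_snps, n1=20, n2=20)
for start, fst in fst_by_window.items():
    print(start, round(fst, 3))
```

In the toy data the first window holds SNPs with nearly identical frequencies and the second holds near-fixed differences, which is the kind of contrast a peak of differentiation would show against the genomic background.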

a Chromosome 1 SNPs significantly associated with phenotype (black dots) using mixed model analysis in GEMMA with an alpha of 1 × 10⁻⁵ to correct for multiple comparisons. Numbers indicate the two most significant SNP associations outside (1,2) and inside (3,4) the inversion region and correspond to genotypes in (b) for those specific SNPs. Lightest yellow cells indicate individuals homozygous for the reference allele, darkest indicate homozygous for the alternative allele, intermediate indicate heterozygotes, and white are missing data. c Pi (black and gray lines for AA and AB individuals, respectively) and dXY (green line) in 50-kb windows for the first 80 Mb of chromosome 1. d LD calculated as r² comparing chromosome 1 (black line) to 4 of the other largest chromosomes (shades of gray) and LD within the supergene (black) to the rest of chromosome 1 (gray). e PCA of chromosome 1 with a 40% minor allele frequency cutoff showing three main clusters along PC1. Taxa colors correspond to Fig. 1. All sequences were aligned to a brown-capped rosy-finch (Leucosticte australis) reference genome. Source data are provided as Source data file.

Broadly, the pattern of divergence recovered here is consistent with a large pericentric chromosomal inversion10,19,20, including abrupt changes in FST corresponding to the inversion breakpoints, and a central spike at the centromere (Fig. 2a, c). Reduced recombination within an inversion is expected to produce patterns of elevated linkage disequilibrium (LD), along with a decrease in nucleotide diversity along the inversion in homozygotes. These patterns are both confirmed here, including a within-cluster decrease in homozygote (AA and BB) nucleotide diversity (π), and elevated LD within the inversion when compared both to regions outside the inversion and to other chromosomes (Fig. 2c, d). To further characterize the inversion, we selected one individual each from the AA and BB genotype groups to resequence using Oxford Nanopore Technologies MinION long-read sequencing. Structural variant calling with SVIM v1.4.2 (ref. 21) identified an inversion extending from 18.9 to 75 Mb along chromosome 1; however, the overall number of reads supporting the variant call was low due to the size of the inversion and low yield from the MinION runs.
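To make the LD measure concrete, the sketch below computes r² between two biallelic SNPs. It assumes phased haplotypes coded as 0/1, which is a simplification; the study's pipeline is not described here and may well have estimated r² from unphased genotypes. The function name and example haplotypes are hypothetical.

```python
def ld_r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs from phased haplotypes.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    haplotype. Returns None if either SNP is monomorphic.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    # Frequency of haplotypes carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom

# Hypothetical haplotypes: perfectly linked SNPs vs. SNPs in linkage equilibrium
linked = ld_r_squared([0, 0, 1, 1, 0, 1], [0, 0, 1, 1, 0, 1])
unlinked = ld_r_squared([0, 0, 1, 1], [0, 1, 0, 1])
print(linked, unlinked)
```

Suppressed recombination between inversion haplotypes keeps alleles travelling together, so SNP pairs inside the inversion would be expected to sit closer to the first case than pairs elsewhere in the genome.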

Because redpolls overlap extensively in distribution, species identification is made primarily on the basis of a suite of morphological characters, including plumage coloration (extent of brown and red pigments), bill size and shape, and body size. Transitioning from the AA, to AB, to BB genotype also broadly mirrors a transition in phenotype from dark to light plumage coloration, where the AA genotype is associated with dark plumage, BB is associated with light plumage, and AB is intermediate. Mason and Taylor18 paired phenotypic measurements of plumage and bill morphology with gene expression data to reveal a strong, linear correlation between gene expression and morphology (see ref. 18, Fig.3a). Superimposing inversion genotype on this relationship for the same individuals reveals that inversion homozygotes form the extremes of these categories, while the single heterozygote forms an intermediate (Fig.3a). Although sample size in this comparison is small, it provides strong independent evidence that the chromosome 1 inversion plays a large role in redpoll morphology, and that phenotypic variation may be additive with respect to inversion haplotype copy number.

a Phenotype PC1 and gene expression data (as mds PC1) from Mason and Taylor (2015) colored by inversion genotype, with extreme phenotypes produced by homozygotes shown in gold and pink, and an intermediate phenotype produced by a heterozygote shown in green. b Latitude of sampling site for each individual grouped by inversion genotype demonstrating B haplotypes increase in frequency with latitude (AA n=37, AB n=7, BB n=28). Box hinges represent the first and third quartiles, with centers representing medians. Whiskers represent maximum and minimum values except for BB, where outliers are values exceeding 1.5 times the interquartile range. Source data are provided as Source data file.

In total, we identified 498 annotated genes within the chromosome 1 inversion region. While all genes within the inversion are likely to be linked through the suppression of recombination, and thus could be contributing equally to phenotype, we nonetheless attempted to narrow down candidate gene regions in order to infer which biological processes and pathways were potentially influenced by the inversion and identify associated regions elsewhere in the genome. To do so, we applied two approaches: (1) compiling a list of genes that fell within the highest FST peaks, and (2) identifying SNPs significantly associated with species classification using a genome-wide efficient mixed model analysis (GEMMA)22. While species identity does not perfectly correlate with redpoll phenotype because redpolls exhibit continuous phenotypic variation, the fact that species classification relies almost entirely on morphology makes it a reasonable proxy for total phenotypic variation. Finally, we annotated missense mutations within the identified genes based on a variant's location with respect to open reading frames using SnpEff v4.3 (ref. 23). Our results suggest that the vast majority of SNPs associated with phenotypic variation in redpolls are within or close to the inversion: 99% of 20,443 SNPs significantly associated with redpoll phenotype were located on chromosome 1, with only 167 located elsewhere in the genome (Supplementary Fig. 3). To evaluate the reliability of these SNPs in predicting phenotype, we used a Bayesian sparse linear mixed model in a leave-one-out cross validation framework. Predicted phenotypes suggest that allelic variation of the identified SNPs explains a significant proportion of the observed phenotypic variation (R² = 0.79; Supplementary Fig. 5).

We filtered annotations for genes that either contained, or were adjacent to, significant SNPs as identified by GEMMA or FST outlier analysis, resulting in 322 genes across 7 chromosomes (Supplementary Data File 2). Within this gene set, the gene ontology category of biological regulation was overrepresented. While this category is broad and difficult to interpret meaningfully, we note that a number of genes on chromosomes 1 and 2 identified by our analysis had annotations that either relate to coloration or bird bill development or have been implicated in coloration or bird bill development in previous studies (Table 1).

Within the chromosome 1 inversion region, some of the most differentiated and significantly associated regions include key genes relating to melanin synthesis: TYR, TYRP2, FZD4, TSKU, and FSTL1 (refs. 24,25,26,27,28,29). Both TYR and TYRP2 produce melanogenic enzymes that directly synthesize melanin. In addition, FZD4 produces a G protein-coupled receptor in the Wingless-type signaling pathway, which acts as one of the main pathways affecting the regulation of MITF (refs. 29,30). Previous studies of gene expression in redpolls18 also reported differential gene expression of FZD3, suggesting that Frizzled family receptors may play a significant role in further modulating melanogenesis in this group.

Redpoll phenotype also varies in the amount of red feather coloration resulting from carotenoid pigmentation. Carotenoid pigments are unique in animals in that they cannot be synthesized endogenously and must instead be taken in through the diet before they can be deposited in feathers. Previous studies of genes involved in carotenoid pigmentation in birds highlight the role of two scavenger receptor genes, SCARB1 (ref. 31) and SCARF2 (ref. 32). The proteins produced by these genes likely function in the recognition of the lipoproteins that transport the hydrophobic carotenoid pigments. We identified two genes (ATP8A2, STARD13) within the inversion region that may also be related to carotenoid pigmentation through their involvement in lipid transport. Specifically, STARD13 produces a StAR-related lipid transfer protein, a member of a protein family involved in intracellular lipid transport, metabolism, and cell signaling events33. While further validation studies are required to understand the role of these genes in carotenoid variation, their functions appear to be in line with other recently reported genes associated with carotenoid pigmentation.

Two additional genes within the chromosome 1 inversion region that could be affecting phenotype are well-characterized: TSKU and FSTL1 are known antagonists of bone morphogenic protein (BMP) signaling24,26. However, the effects of BMP inhibition may influence phenotype in at least two disparate ways: through the regulation of melanogenesis, or by contributing to differences in bill morphology. BMPs are regulators with important roles in epidermal homeostasis and hair follicle growth and pigmentation34. Specifically, BMP4 and BMP6 products have been shown to inhibit and stimulate melanogenesis, respectively34. However, other studies have also implicated BMP4 in the development of bird bill morphology35,36. For example, studies of BMP4 in Darwin's finches find strong correlations of BMP4 expression with both bill depth and width35, two traits known to vary in redpolls18,37. Similar to Frizzled, TSKU was also shown to be differentially expressed in redpolls18. We therefore emphasize that the observed differences in TSKU and FSTL1 documented here could influence biologically important phenotypic variation in redpoll coloration, bill morphology, or both. Given the implication of BMP4 in multiple pathways affecting different phenotypes, there could be pleiotropic effects resulting from one or more loci altering BMP signaling. Taken together, these candidate loci provide evidence that multiple aspects of redpoll phenotype are likely affected by a single genomic region maintaining associated SNPs from numerous genes in tight physical linkage.

While nearly all associated SNPs with gene annotations were within the chromosome 1 inversion region, three additional genes containing, or neighboring, associated SNPs may also have important phenotypic effects. Two of these, FILIP1L (chromosome 1 but outside of the inversion region) and SFRP4 (chromosome 2), act as regulators in the WNT pathway, suggesting they likely play roles in further modulating melanogenesis38,39. Similar to TSKU and FSTL1, SFRP4 has also been demonstrated to regulate BMP, further emphasizing the possibility of singular or joint effects on plumage coloration and bill morphology.

A third locus outside of the inversion region near an associated SNP on chromosome 2 includes a polyketide synthase (PKS). While this gene was annotated based on similarity to Mycobacterium PKS15/1, its function in birds has yet to be fully validated. However, its synteny with RAB18 and YME1L1 suggests homology with a PKS described in budgerigars (Melopsittacus undulatus)40. Functional validation through yeast-based expression demonstrated that the budgerigar PKS plays a critical role in the accumulation of red/yellow, parrot-specific pigments known as psittacofulvins. The association of PKS with redpoll phenotype indicates that it might play a similar role in organisms that contain carotenoids instead of psittacofulvins. While this requires further investigation, PKSs have been demonstrated elsewhere as important in animal pigment biosynthesis41.

Broadly, redpoll phenotype appears to function as a balanced polymorphism resulting from a 55-Mb inversion that affects plumage coloration and bill morphology. Genetic associations that include loci outside of the inversion region suggest that phenotype is likely modulated further by several independent gene regions to generate the varied forms seen across all redpoll ecotypes. Examination of genotypes at SNPs associated with redpoll morphology (Fig. 2b) suggests that the inversion region primarily separates the hoary redpoll from both the common and lesser redpolls, while the additional associations with other genomic regions separate the lesser redpoll from both the hoary and common redpolls. These results demonstrate that the chromosome 1 inversion contains multiple, linked genetic elements that together affect a suite of phenotypic traits in redpolls, providing evidence that redpoll phenotype is broadly controlled by a supergene genetic architecture11. As lesser redpolls form the darkest and smallest end of the redpoll phenotype distribution, the associated SNPs located outside the inversion may also be additive with respect to overall phenotype. Given the range restriction and more extreme phenotype of the lesser redpoll, there is less opportunity for disassortative mating, and it is unclear how the derived SNPs outside of the inversion that further modulate phenotype interact with the B inversion haplotype. A previous study of an avian supergene in white-throated sparrows (Zonotrichia albicollis)10 demonstrated that one of the supergene haplotypes in sparrows had likely introgressed from a closely related species. However, topology weighting across windows of the redpoll supergene favored a topology that included a sister relationship between the two haplotypes, with a combined average weight of 54% among the three topologies that included this sister relationship (Supplementary Fig. 6), providing evidence that the redpoll supergene likely evolved within the redpoll lineage42.

Considerable theoretical attention has recently been given to the evolution and degradation of supergenes43,44. A primary consequence of supergene-bearing inversions is increased mutational load12,44 due to the difficulty of purging deleterious mutations in the absence (or severe reduction) of recombination. This simple scenario could result in a balanced polymorphism stemming from associative overdominance, where inversion heterozygotes perform best because heterozygosity masks some of the deleterious mutations12. However, redpolls heterozygous for the inversion appear to occur in fewer numbers than homozygotes (7/73 samples in this study), suggesting an alternative mechanism may be responsible for maintaining the polymorphism. Given the presence of all three inversion genotypes in redpolls, no combination of the supergene appears to be lethal, a finding in contrast to other recently described supergenes of similar size9,45. Because there is no lethal inversion genotype, recombination likely occurs regularly in homozygotes (and possibly at low levels in heterozygotes), potentially allowing for some purging of deleterious mutations. This could have a considerable influence on the maintenance of the variation, and the evolutionary consequences of this supergene.
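To put the scarcity of heterozygotes in context, the following Python sketch compares the observed genotype counts with Hardy-Weinberg expectations. This is an illustrative calculation rather than an analysis reported by the authors; it uses the counts given in the figure caption (AA n=37, AB n=7, BB n=28), which sum to 72 rather than the 73 samples cited in the text, and it treats all samples as one randomly mating population, which is an assumption.

```python
# Illustrative Hardy-Weinberg comparison using genotype counts from the
# figure caption. Pooling all samples into one population is an assumption.

def hardy_weinberg_expected(n_aa, n_ab, n_bb):
    """Return expected (AA, AB, BB) counts under Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the A haplotype
    q = 1.0 - p                      # frequency of the B haplotype
    return (p * p * n, 2 * p * q * n, q * q * n)


observed = (37, 7, 28)  # AA, AB, BB
expected = hardy_weinberg_expected(*observed)

# Heterozygote deficit: F = 1 - observed heterozygotes / expected heterozygotes
deficit = 1.0 - observed[1] / expected[1]

print("expected AA, AB, BB:", [round(x, 1) for x in expected])
print("heterozygote deficit F:", round(deficit, 2))
```

Under these assumptions roughly 35 heterozygotes would be expected against the 7 observed, which is the opposite of the heterozygote excess that associative overdominance would tend to produce.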

Understanding the effects of the redpoll supergene, and the forces responsible for its maintenance, is difficult. In the absence of selection (imposed by the environment, or through mate choice), the supergene would function as a single locus with one of the haplotypes eventually becoming fixed or lost due to drift12. Even with some selection, high levels of migration (between inversion genotypes) would swamp out any loci contributing to local adaptation. The persistence of the redpoll supergene is therefore likely dependent on both selection and migration. One scenario that is supported by field data is that the supergene remains balanced through assortative mating. Redpolls often mate assortatively46, but intermediates and mixed pairs have also been observed from multiple localities37,47. Thus, the strictness of assortative mating may vary depending on the locality or may relax during irruptive population years47. Relaxation of mate choice and mixed pairings would produce the intermediate number of inversion heterozygotes seen in our data, and ultimately maintain the supergene as a stable polymorphism. However, this scenario alone does not provide an explanation for the maintenance of latitudinal differences between ecotypes. Furthermore, no hybrid zone has ever been documented in redpolls, which would be expected under a strict assortative mating scenario. While regions of hybridization have been suggested in places like Iceland, where high color variation exists48,49, previous genetic studies have not recovered support for this hypothesis50.
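The claim that a haplotype behaving as a single neutral locus is eventually fixed or lost can be illustrated with a minimal Wright-Fisher sketch in Python. The population size, starting frequency, replicate count, and random seed are arbitrary illustrative choices and do not come from the study.

```python
import random

# Neutral Wright-Fisher drift for a single biallelic locus (here standing in
# for an inversion haplotype). All parameter values are hypothetical.

def drift_to_absorption(n_diploid, start_freq, rng):
    """Resample haplotype copies each generation until one haplotype is
    fixed or lost. Returns (final_frequency, generations_elapsed)."""
    copies = 2 * n_diploid
    count = round(start_freq * copies)
    generations = 0
    while 0 < count < copies:
        freq = count / copies
        # Each copy in the next generation is drawn from the current pool.
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        generations += 1
    return count / copies, generations


rng = random.Random(1)  # seeded so that reruns give the same replicates
outcomes = [drift_to_absorption(50, 0.5, rng) for _ in range(20)]
fixed = sum(1 for freq, _ in outcomes if freq == 1.0)

print(f"{fixed} of {len(outcomes)} replicates fixed the haplotype; "
      f"the rest lost it")
```

Every replicate ends at a frequency of 0 or 1, so without selection or some form of balancing such as assortative mating, the polymorphism is not expected to persist indefinitely.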

The phenotypes produced by the supergene are likely subject to environmentally mediated selection: notably, the more northerly distributed redpoll ecotype demonstrates features associated with high-latitude adaptation in other bird species (e.g., whiter color, smaller bill)51,52. Despite including some individuals sampled during the non-breeding season (n=27), we are able to detect differences in latitude by inversion genotype group, with B haplotypes significantly more common at higher latitudes (Fig. 3b). This pattern holds when examining only breeding season birds. While it is plausible that an alternative locus is affecting ecotypic distribution, the overall low levels of background genetic variation reflect ongoing gene flow within this system. This pattern could instead reflect incomplete lineage sorting and recent divergence times; however, tests for introgression using an ABBA-BABA framework detected a significant signal of gene flow among redpoll taxa (D=0.0027, p=0.0003). Gene flow among ecotypes would be expected to disrupt linkage between any latitude-associated loci and phenotype through recombination unless those loci were tightly linked as in an inversion. In light of the link between the redpoll supergene and phenotype and differences in breeding distribution between ecotypes18, the supergene may impart local adaptation to the environment. However, given the detection of inversion heterozygotes and the presence of gene flow, the inversion likely does not influence reproductive isolation. Thus, redpolls appear to function as a single species harboring ecotypic variation, rather than as three distinct species.
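For readers unfamiliar with the ABBA-BABA framework, the D statistic in its simplest site-counting form is D = (nABBA - nBABA) / (nABBA + nBABA), computed over four taxa (P1, P2, P3, outgroup). The Python sketch below shows that form only. The site patterns are invented, the function name is hypothetical, and the study may have used an allele-frequency version of the statistic, with significance typically assessed by a block jackknife that is not shown here.

```python
# Site-counting form of Patterson's D. Alleles are coded 0 (ancestral) or
# 1 (derived); the example sites are hypothetical.

def patterson_d(sites):
    """sites: iterable of (p1, p2, p3, outgroup) allele tuples.
    Returns (n_abba - n_baba) / (n_abba + n_baba)."""
    abba = 0
    baba = 0
    for p1, p2, p3, outgroup in sites:
        if outgroup != 0:
            continue  # the outgroup must carry the ancestral allele
        if (p1, p2, p3) == (0, 1, 1):
            abba += 1  # P2 and P3 share the derived allele
        elif (p1, p2, p3) == (1, 0, 1):
            baba += 1  # P1 and P3 share the derived allele
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


example_sites = (
    [(0, 1, 1, 0)] * 3    # ABBA patterns
    + [(1, 0, 1, 0)] * 1  # BABA pattern
    + [(1, 1, 0, 0)] * 2  # BBAA patterns, uninformative for D
)
print("D =", patterson_d(example_sites))
```

With incomplete lineage sorting alone, ABBA and BABA sites are expected in equal numbers and D is near zero; a consistent excess of one pattern is the signal interpreted as gene flow.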

To explore the evolutionary conditions under which the observed pattern of the inversion polymorphism can remain balanced, we used the program SLiM (ref. 53) to simulate data under two spatial models of evolution informed by the aspects described above (Fig. 4a, b). Both models simulated 100-kb chromosomes, including a 50-kb inversion that contributed to phenotype, in diploid individuals54. The first model included one population with spatially varying selection along the y-axis to approximate differences in fitness for a particular inversion genotype by latitude. In addition, we included assortative mating as determined by an adjustable parameter, and spatial competition such that individuals surrounded by fewer individuals in space received an increase in fitness. The second model also included spatially varying selection but considered two ecotypes as two populations with gene flow. We then varied the strength of selection and the strength of assortative mating or migration parameters and quantified (1) whether or not a simulation resulted in a stable polymorphism, and (2) the inversion genotype ratios that were produced. We compared these ratios to the inversion genotype ratio in redpolls captured by our sampling. These simulations revealed that the strength of assortative mating, or amount of migration, played a larger role in the balancing of the inversion polymorphism than selection did at the levels tested (Fig. 4c, d; Supplementary Table 1). Regardless of the strength of selection, weak assortative mating or high migration invariably led to the loss of an inversion haplotype due to drift. Strong assortative mating or low levels of migration did maintain both haplotypes but failed to produce inversion heterozygotes. Further, all levels of selection produced spatial stratification of inversion genotypes along the selection gradient.

a, b 1000 diploid individuals were randomly placed in (x,y) coordinate space across one population in model 1 and two populations in model 2, respectively. Phenotype was additive, determined by inversion genotype, with blue squares representing BB individuals, green squares representing AA individuals, and cyan squares representing AB individuals. The strength of selection was determined for all individuals by a combination of the difference between phenotype and position along the y-axis (dotted line), and by a varied selection parameter. The strength of assortative mating or gene flow (solid arrow) was included in model 1 and model 2, respectively, and was varied across iterations. Model 1 also included a fitness adjustment for individuals based on the number of other individuals close by (solid circle). c, d Results from model 1 and model 2, respectively, show the similarity between the genotype ratios produced by simulation and the genotype ratio recovered in our sampling of redpolls, where red represents the greatest similarity, gray represents the least similarity, and white represents simulations that failed to produce a stable polymorphism. Scale bar numbers represent the difference between the simulated and empirical genotype ratios, with sign indicating either greater or fewer simulated heterozygotes. Numbers inside a cell indicate that only a portion of the 50 simulation iterations produced a stable polymorphism.

While these models are relatively simple and only represent two possibilities, they provide a starting point for further exploration of the complex dynamics that affect the maintenance of supergenes. For example, in redpolls, these simulations suggest that even very weak selection can produce the spatial variation seen among ecotypes, and that some relaxation of assortative mating is likely occurring, as has been proposed for populations in Canada and Alaska (USA)37 and Norway47.

As whole-genome sequences proliferate, an emerging body of literature is providing empirical evidence of intraspecific variation maintained through inversion polymorphism9,10,14,15,55. The maintenance of redpoll ecotypes via an inversion across an environmental gradient places redpolls within this growing number of species. In some cases, such as monkeyflowers (Mimulus guttatus)14, inversion polymorphisms may confer sex-specific effects, and can be maintained within a population through a balance of positive and negative fitness interactions. In other cases, though not exclusive to sex-specific effects, inversions may affect phenotypes related to local adaptation, and species distributed across a heterogeneous environment may retain an inversion polymorphism through spatially varying selection, as suggested here for redpolls. This has recently been demonstrated in seaweed flies (Coelopa frigida)15 and deer mice (Peromyscus maniculatus)55. In addition, studies of Drosophila have reported clinal variation in multiple survival traits controlled by an inversion as a result of spatially varying selection across latitude56. These findings in Drosophila highlight the need for further investigation into the selection pressures and fitness effects of the inversion we report in redpolls.

We provide evidence from whole-genome sequence data of loci associated with redpoll plumage coloration and bill morphology contained within a ~55-Mb inversion supergene. While some authorities classify redpolls as three separate species (e.g., ref. 57), we find no evidence of genome-wide population genetic structure consistent with current taxonomy. Instead, we provide evidence that the suite of morphological traits used to describe redpoll species differences is linked within the identified supergene. The presence of all possible inversion genotypes suggests there are no lethal supergene combinations and indicates that while these traits are likely involved in local adaptation, they are not involved in reproductive isolation. Though breeding distributions vary latitudinally, even minor levels of contemporary gene flow within broad areas of sympatry likely maintain these traits as stable polymorphisms. Manipulations involving common garden experiments or aviary crosses will help elucidate the strength of selection and may reveal additional unknown genetic interactions with the supergene that are affecting the evolution of redpolls.

With the explosive growth in the number of sequenced genomes and increasing sophistication of analytical tools, detecting structural variants or complex genetic architectures is likely to become common. The large size and high gene content of these classes of variants may in some cases translate to large evolutionary effects. While key theoretical work continues to emerge, the further exploration of empirical patterns between phenotype and supergenes will provide useful insight into the evolutionary effects of similar genetic architectures for a wide range of organisms.

See the rest here:
A supergene underlies linked variation in color and morphology in a Holarctic songbird - Nature.com
