Detection of SARS-CoV-2 intra-host recombination during superinfection with Alpha and Epsilon variants in New York City – Nature.com

Posted: June 26, 2022 at 10:10 pm

Index case and named contact partner epidemiology

In December 2020, researchers and public health officials in the United Kingdom identified a rapidly spreading SARS-CoV-2 variant within England, then designated as PANGO lineage B.1.1.7 (ref. 21) and now designated as the Alpha variant of concern in the WHO nomenclature. In NYC, a SARS-CoV-2 genome sequence classified as belonging to the Alpha lineage was obtained from a sample on 4 January 2021 (the index case): NYCPHL-002130 (GISAID accession number EPI_ISL_857200). Due to the potential public health importance of Alpha variant cases in NYC in early 2021, NYC DOHMH conducted a public health investigation related to the individual from whom this sample had been obtained. This investigation determined that the individual had recently traveled to Ghana (late December/early January) and developed symptoms consistent with COVID-19 while in Ghana. Contact tracing in New York City identified another case of an Alpha variant infection, sampled on 14 January 2021, in a named contact with a similar travel history (the named contact partner): NYCPHL-002461 (GISAID accession EPI_ISL_883324). The named contact partner had also developed symptoms consistent with COVID-19 while in Ghana, prior to returning to the United States.

Typical of the Alpha variant (ref. 21), NYCPHL-002130 from the index case exhibited the S gene target failure (SGTF) phenotype with the TaqPath COVID-19 RT-PCR assay (Table 1). NYC PHL uses the ARTIC amplicon-based protocol V3 to sequence full viral genomes and capture intra-host diversity. All 24 mutations diagnostic of the Alpha variant were found in >90% of reads (Table 2). The viral genome from this index case showed limited intra-host viral diversity (Fig. 1). A single variable site was found at position 23099, with C in 20.4% of reads and A in 79.6% of reads.

Frequencies of individual alleles are shown as ticks; a smoothed kernel density plot is used to highlight clustering patterns, and colors represent allele types.

During the initial PCR screening of the sample collected from the named contact partner (NYCPHL-002461-A), the SGTF characteristic of the Alpha variant was not observed (Table 1). Furthermore, genome sequencing revealed substantial intra-host viral diversity within the viral genome, a possible signature of superinfection (Fig. 1). To confirm that this intra-host diversity was not attributable to experimental or sequencing artifacts, the original sample was re-extracted and re-sequenced (NYCPHL-002461-B), and a similar absence of SGTF was observed. Additional extractions were then performed in duplicate from the original stock (NYCPHL-002461-C and -D) and sequenced. The same signature of intra-host diversity was confirmed in all four sequenced extractions. Four nucleotide (nt) substitutions differentiating this sequence from the reference genome were identified at >90% frequency: C241T, C3037T, C14408T, and A23403G (Fig. 1; Table 2). These four substitutions were all present in the lineage B.1 virus that is ancestral to the named SARS-CoV-2 variants. Numerous additional substitutions, including A23063T (S N501Y), were present, but at slightly lower frequencies. Nonetheless, this genome was classified as an Alpha variant. Notably, the 69/70 and 144 deletions were found at >97% in the sequencing reads, despite the lack of SGTF.

NYCPHL-002461-A, -B, and -D extracts exhibited low Ct values for the ORF1ab and N gene targets, ranging between 15 and 16 (Table 1). The S gene target Ct values were around 2 to 3 cycles higher. This difference suggests a reduction of viral template in the S gene target region, but not SGTF. We note that NYCPHL-002461-C yielded an invalid result, as the TaqPath assay showed no amplification of any target, including the MS2 phage extraction-control target.
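As a rough guide to what a 2 to 3 cycle shift means, the sketch below converts a Ct difference into an approximate fold reduction in detectable template. It assumes idealised amplification (template doubling every cycle), which real assays only approximate; the function name and the `efficiency` parameter are illustrative and not part of the TaqPath workflow.

```python
def fold_reduction(delta_ct: float, efficiency: float = 1.0) -> float:
    """Approximate fold difference in template implied by a Ct shift.

    Assumes each PCR cycle multiplies the product by (1 + efficiency).
    efficiency=1.0 means perfect doubling, which is an idealisation.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** delta_ct


# Ct shifts of the size reported for the S gene target.
for delta in (2, 3):
    fold = fold_reduction(delta)
    print(f"delta Ct = {delta}: about {fold:.0f}-fold less detectable template")
```

Under this idealisation, a shift of 2 to 3 cycles corresponds to roughly 4- to 8-fold less amplifiable S gene target than expected from the other two targets.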

The presence of multiple intermediate frequency alleles and the lack of SGTF in the TaqPath assay prompted us to investigate the intra-host diversity in the named contact partner, NYCPHL-002461. Using the previously described and validated Galaxy SARS-CoV-2 allelic variation pipeline (ref. 22), we identified four categories of allelic frequencies: shared, major strain, minor strain, and other (see Fig. 1, interactive notebook at https://observablehq.com/@spond/nyc-superinfection). The four replicate sequencing runs for NYCPHL-002461 yielded remarkably similar patterns of these allelic frequencies.

Alleles that fell into the shared category were present at ≥90% allele frequency in three or more samples. Shared alleles included all four substitutions characteristic of B.1 (Table 2) and two deletions in the S gene (69-70 and 144) diagnostic of the Alpha variant.

Major strain alleles occurred at frequencies between 60 and 80% (in at least 3 samples). Major alleles included all 21 substitutions defining the Alpha variant, which we observed at a median allele frequency of 74.1%, and the ORF1a deletion (Table 2). The remaining major alleles are shared with the genome from the index case.

Minor strain alleles occurred at frequencies between 10 and 25% (in at least 3 samples). All but one of the 12 diagnostic Epsilon mutations were found in this set: A28272T is absent in NYCPHL-002461. All remaining minor alleles have been observed in other Epsilon genomes.

The other category encompasses all other variable sites, i.e. those occurring at frequencies between 25 and 60% or those found in only one or two samples. Two alleles were found in all four replicate sequences at intermediate frequencies: G7723A (30.3%) and C23099A (46.7%). These frequencies are suggestive of intra-host variation in the major strain.
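The four categories amount to a simple thresholding rule, sketched below in Python. This is an illustrative reading of the definitions in the text, not the Galaxy pipeline itself: the function name, the inclusive interval boundaries, and the treatment of 90% as a lower bound are assumptions, and the replicate frequencies in the example are hypothetical.

```python
from typing import Sequence


def classify_allele(freqs: Sequence[float], min_samples: int = 3) -> str:
    """Assign an allele to a category from its frequency in each replicate.

    freqs: allele frequencies (0.0 to 1.0), one per sequenced extract.
    Interval boundaries are treated as inclusive (an assumption).
    """
    def n_within(low: float, high: float) -> int:
        # Number of replicates whose frequency falls inside [low, high].
        return sum(low <= f <= high for f in freqs)

    if n_within(0.90, 1.00) >= min_samples:
        return "shared"
    if n_within(0.60, 0.80) >= min_samples:
        return "major"
    if n_within(0.10, 0.25) >= min_samples:
        return "minor"
    # Intermediate frequencies, or support in fewer than min_samples replicates.
    return "other"


# Hypothetical frequencies across four replicate extracts (-A to -D).
examples = {
    "near-fixed allele": [0.97, 0.98, 0.99, 0.96],
    "major-like allele": [0.74, 0.75, 0.73, 0.76],
    "minor-like allele": [0.15, 0.14, 0.16, 0.13],
    "intermediate allele": [0.30, 0.31, 0.29, 0.30],
}
for name, freqs in examples.items():
    print(f"{name}: {classify_allele(freqs)}")
```

An allele at around 30% in every replicate, like G7723A, falls through all three named ranges and lands in the other category.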

In contrast to the allelic mixture detected in the named partner (NYCPHL-002461), we observed allele frequencies >90% for all Alpha-defining mutations in the sequencing data for the index case, NYCPHL-002130 (Table 2). The C23099A mutation, which was at intermediate frequency in NYCPHL-002461 from the named contact partner, was present at 88.1% in NYCPHL-002130 from the index case, consistent with the transmission of a mixed viral population between these individuals.

We identified sub-clades within Alpha and Epsilon that shared substitutions with the major and minor strains (Fig. 2). We inferred a maximum likelihood (ML) phylogenetic tree in IQTree2 for the major strain and 3655 related Alpha (B.1.1.7) genomes containing the C2110T, C14120T, C19390T, and T7984C substitutions found in the major strain (Fig. 2A). We also inferred an ML tree for the minor strain and 2275 related Epsilon (B.1.429) genomes containing the C8947T, C12100T, and C10641T substitutions found in the minor strain (Fig. 2C).

A Phylogeny of Alpha variant immediate relatives. B Root-to-tip regression for Alpha variant. C Phylogeny of Epsilon variant immediate relatives. D Root-to-tip regression for Epsilon variant. NY-NYCPHL-002461 is the genome deposited in GISAID from the case of putative superinfection. NY-NYCPHL-002130 is the genome from the index case.

Root-to-tip regression analyses show that the NYCPHL-002461 sampling date is consistent with the molecular clock for both the major and minor strain sequences (Fig. 2B, D), indicating that one would expect viruses of this degree of genetic divergence to have been circulating in mid-January 2021. In fact, genomes identical to the major variant were sampled in both NYC (the NYCPHL-002130 index case) and in Ghana on 8 January 2021 (EPI_ISL_944711), consistent with a scenario in which this particular Alpha virus was acquired in Ghana. These three viruses share a common ancestor around 4 January 2021 and are separated from additional viruses sampled in Ghana by two mutations: C912T and C23099A. Notably, the latter mutation appears at intermediate frequency in both NYCPHL-002130 and NYCPHL-002461.

The minor variant is genetically distinct from all other sampled genomes, including any genome sequenced by NYC DOHMH (Fig. 2C). The closest relatives were sampled in California (EPI_ISL_3316023, EPI_ISL_1254173, EPI_ISL_2825578), the United Kingdom (EPI_ISL_873881), and Cameroon (EPI_ISL_1790107, EPI_ISL_1790108, EPI_ISL_1790109). The most similar of these relatives is EPI_ISL_3316023, which was sampled on 11 January 2021 in California and represents the direct ancestor of the minor variant on the phylogeny. The only mutation separating this California genome from the minor variant is T28272A, which is a reversion away from an Epsilon-defining mutation (Table 2).

It is unlikely that this minor variant is a laboratory contaminant, as there are no closely related Epsilon genomes sequenced from NYC. That said, NYC represents the probable source of this Epsilon virus. Of the 145 SARS-CoV-2 genomes sequenced by NYC public health surveillance between 10 January 2021 and 16 January 2021, 4 (2.8%) were Epsilon. A similar proportion of Epsilon genomes deposited in GISAID were sampled by other labs during this same period: 11 out of 431 genomes (2.6%) (ref. 23). No Epsilon genome has been reported to date from Ghana.

A preliminary inquiry of the genome sequencing data from the S gene (12 contiguous read fragments) and N gene (nucleoprotein; 3 contiguous read fragments) regions was suggestive of recombinant genome fragments within the named contact partner. To determine whether pairs of polymorphic sites within individual read fragments displayed evidence of recombination, we employed three different four-gamete-based recombination detection tests: PHI (ref. 24), MCL, and R2 vs Dist (ref. 25) (Table 3). The power of each of these tests to detect recombination was seriously constrained by the short lengths of the read fragments and the low numbers of both variant-defining sites and other polymorphic sites with minor allele frequencies >1% within each of the fragments. Only three of the 15 read fragments (read fragments 6 and 8 in the S gene and read fragment 3 in the N gene) encompassed two or more of the variant-defining sites that were expected to provide the best opportunities to detect recombination. Nevertheless, pairs of sites within four read fragments in the S gene (positions 23123-24467, covering fragments 7, 8, 9 and 10) and one read fragment in the nucleoprotein gene (positions 28986-29378, covering fragment 3) exhibited signals of significant phylogenetic incompatibility with at least two of the three tests (p<0.05), signals that are consistent with recombination. The only read fragment for which evidence of recombination was supported by all three tests was fragment 3 in the N gene, one of only three fragments that contained multiple variant-defining substitutions. Eight of the fifteen analyzed read-fragment alignments exhibited no signals of recombination using any of the tests, which is unsurprising given the lack within these fragments of both variant-defining substitutions and polymorphic sites with minor allele frequencies greater than 1%.
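The principle shared by these tests can be illustrated with a bare four-gamete check, sketched below in Python. This is not a reimplementation of PHI, MCL, or R2 vs Dist, which add statistical significance testing on top of pairwise site comparisons. It only shows the underlying idea: if two biallelic sites show all four allele combinations, that pattern cannot arise on a single tree without recombination or recurrent mutation. The function names and the toy reads are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def four_gamete_incompatible(haplotypes: Sequence[str], i: int, j: int) -> bool:
    """Return True if sites i and j show all four allele combinations."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    alleles_i = {a for a, _ in gametes}
    alleles_j = {b for _, b in gametes}
    # The test is only defined for pairs of biallelic sites.
    if len(alleles_i) != 2 or len(alleles_j) != 2:
        return False
    return len(gametes) == 4


def incompatible_pairs(haplotypes: Sequence[str]) -> List[Tuple[int, int]]:
    """List every pair of sites that fails the four-gamete condition."""
    n_sites = len(haplotypes[0])
    return [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if four_gamete_incompatible(haplotypes, i, j)
    ]


# Toy reads over three polymorphic positions (hypothetical data).
reads = ["ACA", "ATA", "GCA", "GTA"]
print(incompatible_pairs(reads))
```

A fragment with few polymorphic sites yields few site pairs to compare, which is one way to see why short read fragments limit the power of such tests.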

The four-gamete tests on genomic sequencing data are limited by the short length of the amplified fragments. To obtain data from longer sequence fragments, we PCR-amplified three regions of the genome from the original nucleic acid extracts, cloned them, and then sequenced individual clones. These longer genomic fragments provide greater resolution for detecting recombination, compared with the short fragments from deep sequencing analysis, because they include more differentiating sites spread out farther across the genome.

The longest cloned region spanned 947 nt within the S gene (positions 22904-23850) and contained 5 nt substitutions differentiating the major and minor strains plus a variable site in the major variant. Of the 104 clones sequenced within this region, 60 (57.7%) were major strain haplotypes, 13 (12.5%) were minor strain haplotypes, whereas the remaining 31 clones (29.8%) contained both major and minor strain mutations, consistent with recombination (Fig. 3). We observed 11 distinct combinations of major and minor strain mutations across these clones, with two distinct haplotypes present in 6 clones apiece. Most recombinant haplotypes (n=24) are consistent with only a single recombination breakpoint. However, 7 clones are consistent with 2 breakpoints (representing 3 different haplotypes), and 1 clone is consistent with 3 distinct breakpoints.
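One way to read "consistent with n breakpoints" is as the minimum number of switches between major-strain and minor-strain alleles along a clone. The Python sketch below counts those switches. It is an illustration under stated assumptions, not the authors' procedure: the allele strings for the two parental strains are hypothetical placeholders for the five discriminating sites, and sites matching neither parent are simply skipped.

```python
from typing import List

# Hypothetical alleles at five discriminating sites, in genome order.
MAJOR = "ATGCA"
MINOR = "CCATG"


def site_origins(clone: str, major: str = MAJOR, minor: str = MINOR) -> List[str]:
    """Label each discriminating site of a clone as 'major' or 'minor'."""
    origins = []
    for allele, maj, mnr in zip(clone, major, minor):
        if allele == maj:
            origins.append("major")
        elif allele == mnr:
            origins.append("minor")
        # Alleles matching neither parent are ignored (an assumption).
    return origins


def min_breakpoints(clone: str) -> int:
    """Minimum number of parental switches needed to explain the clone."""
    origins = site_origins(clone)
    return sum(a != b for a, b in zip(origins, origins[1:]))


def classify_clone(clone: str) -> str:
    """Call a clone major, minor, recombinant, or unassigned."""
    origins = set(site_origins(clone))
    if not origins:
        return "unassigned"
    if len(origins) == 2:
        return "recombinant"
    return origins.pop()


for clone in ("ATGCA", "CCATG", "ATATG", "ATACA", "CTGTA"):
    print(clone, classify_clone(clone), min_breakpoints(clone))
```

With only a handful of discriminating sites per region, this count is a lower bound: switches that occur between two adjacent markers and back again leave no trace.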

Each row represents a sequenced clone (n=104). Colored markings denote mutations from the reference genome. Major strain mutations are those found in the Alpha variant. Minor strain mutations are those found in the Epsilon variant. Other mutations are found at intermediate or low frequencies. Shared mutations are those shared by B.1 viruses.

The second cloned S region spanned 657 nt in the S gene (positions 21442-22098), including the 69-70 and 144 deletions characteristic of the major strain and two substitutions in the minor strain. Of the 93 clones sequenced, 69 (74.2%) were major strain haplotypes, 17 (18.3%) were minor strain haplotypes, and 7 (7.5%) were mixed haplotypes (Fig. 4). Five of these mixed haplotypes contained only one of the two deletions. One mixed haplotype was consistent with multiple recombination breakpoints. Unlike in the primary sequencing analyses, where the 69-70 and 144 deletions were present in >98% of sequences, 69-70 was observed in only 72 (77.4%) clones and 144 was observed in only 71 (76.3%). These frequencies are consistent with the frequency of the other major strain substitutions in the primary sequencing analysis.

Each row represents a sequenced clone (n=93). Colored markings denote mutations from the reference genome. Major strain mutations are those found in the Alpha variant. Minor strain mutations are those found in the Epsilon variant. Other mutations are found at intermediate or low frequencies.

The third, and shortest, cloned region spanned 476 nt of ORF8 (positions 27798-28273), surrounding 4 substitutions defining the major strain and 1 minor strain substitution. Of the 36 cloned sequences, 30 (83.3%) had the major strain haplotype, 2 (5.6%) had the minor variant haplotype, and 4 (11.1%) had mixed haplotypes consistent with a single recombination breakpoint (Fig. 5). Note that the discriminating substitutions span only 223 nt of this region.

Each row represents a sequenced clone (n=36). Colored markings denote mutations from the reference genome. Major strain mutations are those found in the Alpha variant. Minor strain mutations are those found in the Epsilon variant. Other mutations are found at intermediate or low frequencies.

Three cloned sequences from the 947 nt S gene fragment contained single nucleotide deletions resulting in nonsense mutations. In the 657 nt S gene fragment, we observed 8 clones with similar deletions, detected in both the forward and reverse directions during sequencing. These deletions were seen in the non-recombinant Alpha and Epsilon haplotypes and likely reflect non-functional viral particles, expected to constitute a substantial fraction of genomes within an infected individual (refs. 26, 27).

In vitro recombination can be introduced by reverse-transcription and PCR amplification, which are part of both genome sequencing and cloning protocols (ref. 28). These in vitro effects have a strong stochastic component and would result in substantially different recombinant haplotype frequencies across different extracts and PCR experiments. To determine the extent to which these protocols could have led to biased inference of recombination, we compared the haplotype frequencies across the four extracts from NYCPHL-002461, which had each independently been subjected to reverse transcription and PCR amplification, and the frequency of these haplotypes in the cloning experiment, which included PCR amplification.

Within the 947 nt cloned S gene fragment, the major haplotype was present at between 76.4% and 78.6%, and the minor haplotype at between 13.7% and 15.4% (Supplementary Table 1). The recombinant haplotype with 23604A and 23709C was present at 3.9% allele frequency (standard deviation of 0.34% across extracts), whereas the recombinant haplotype with 23604C and 23709T was present at 4.3% (standard deviation of 0.37% across extracts). Although the haplotype frequencies among extracts were significantly different (p=0.029; chi-square test), the magnitude of these differences was unremarkable. Furthermore, there was no significant difference between the frequency of these haplotypes in the cloning experiment and in the extracts (p=0.190 versus -A; p=0.189 versus -B; p=0.357 versus -C; p=0.206 versus -D; Fisher's exact test).

A similar pattern was observed within the 476 nt cloned fragment in the ORF8 region, which included four discriminating sites: 27972, 28048, 28095, and 28111 (Supplementary Table 2). The predominant recombinant haplotypes were consistent across the four extracts, and the frequencies differed only slightly (p=0.077; chi-square test). As in S, the frequency of these recombinant haplotypes in the cloning experiment was not significantly different from any of the extracts (p=0.405 versus -A; p=0.413 versus -B; p=0.199 versus -C; p=0.408 versus -D; Fisher's exact test).

Hence, in vitro recombination induced by either reverse-transcription or PCR amplification does not appear to have been the dominant contributor to the recombinant haplotype distribution reported here.

To determine whether there was onward transmission of a recombinant descendant of these major and minor strains, we queried the 27,806 genomes sequenced by NYC public health surveillance and deposited to GISAID through 5 September 2021. We tested these genomes for mosaicism of the major and minor strains using 3SEQ (ref. 29) with Dunn-Sidak correction for multiple comparisons; however, we were unable to reject the null hypothesis of non-reticulate evolution for any of these genomes. We also did not find any genomes in the PHL dataset with a superset of the identifying substitutions present in the major and minor variants (e.g., C912T and C27406G). There is no evidence of an Alpha/Epsilon recombinant that circulated in New York City.

Since the Dunn-Sidak correction done in the 3SEQ analysis applies a conservative type-1 error threshold of 0.05, we reran the analysis using a more permissive threshold of 0.25 (see methods) and were able to reject the null hypothesis for a single genome (EPI_ISL_2965250; p=2.24 × 10^-6 and Dunn-Sidak corrected p=0.117). Although this genome (Fig. 6) contains many of the mutations characteristic of the Alpha variant throughout the genome, it does not possess mutations unique to the major strain nor any Epsilon-specific mutations. Rather, within the putative recombinant regions, the EPI_ISL_2965250 genome has C8809T, C27925T, C28311T, and T28879G. All of these mutations are characteristic of the B.1.526 Iota variant, prevalent in NYC in early 2021. Therefore, this genome is likely not a descendant of the major and minor strains. Instead, it appears to be a recombinant descendant of Alpha and Iota viruses.
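The relationship between the uncorrected and corrected p-values can be sketched with the standard Dunn-Sidak formula, corrected p = 1 - (1 - p)^m for m comparisons. The Python below is a minimal illustration, not the 3SEQ implementation. The number of comparisons is not stated in this section, so the value of m used here (two comparisons per queried genome, 2 × 27,806) is an assumption made for illustration, and the uncorrected p-value is read as 2.24 × 10^-6.

```python
import math


def dunn_sidak(p: float, m: int) -> float:
    """Dunn-Sidak adjusted p-value, 1 - (1 - p)**m, for m comparisons."""
    # log1p/expm1 keep precision when p is tiny and m is large.
    return -math.expm1(m * math.log1p(-p))


def per_test_threshold(alpha: float, m: int) -> float:
    """Per-comparison threshold that keeps the family-wise error at alpha."""
    return -math.expm1(math.log1p(-alpha) / m)


p_raw = 2.24e-6      # uncorrected p-value for EPI_ISL_2965250, as read here
m = 2 * 27_806       # ASSUMPTION: two comparisons per queried genome

p_adj = dunn_sidak(p_raw, m)
print(f"corrected p = {p_adj:.3f}")
print("rejects at 0.05:", p_adj < 0.05)
print("rejects at 0.25:", p_adj < 0.25)
```

Under that assumed m, the corrected value comes out at approximately 0.117, which fails the 0.05 threshold but passes the more permissive 0.25 threshold, matching the outcome described in the text.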

The distribution of the nucleotide variation found in the major, minor, Iota (B.1.526; EPI_ISL_1635735), and single putative recombinant (EPI_ISL_2965250) strains relative to the reference genome (Wuhan Hu-1; bottom gray sequence).
