Genetic variation and structure of complete chloroplast genome in alien monoecious and dioecious Amaranthus weeds | Scientific Reports – Nature.com

Posted: May 20, 2022 at 2:03 am

Genomic features

The quadripartite chloroplast genome structure of the 22 samples from 17 Amaranthus species consists of a large single-copy region (LSC; 83,382–84,062 bp), a small single-copy region (SSC; 17,937–18,124 bp), and a pair of inverted repeat regions (IRs; 23,964–24,357 bp each). The full length of the 22 cp genomes ranges from 149,949 bp in A. polygonoides to 150,756 bp in A. albus (Table 1). The chloroplast genome sequences were deposited in GenBank (Table 1).
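Because the genome is quadripartite, its total length is the sum of the LSC, the SSC, and two copies of the IR. The following is a minimal sketch of that arithmetic, not the authors' code; the function name `quadripartite_length` is hypothetical. It uses the region ranges quoted above to bound the total length any one genome could have, which gives a quick sanity check on the reported 149,949 to 150,756 bp span.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


# Region length ranges in bp, as quoted in the text (min, max).
LSC_RANGE = (83_382, 84_062)
SSC_RANGE = (17_937, 18_124)
IR_RANGE = (23_964, 24_357)

# Smallest and largest totals that the region ranges allow.
lower_bound = quadripartite_length(LSC_RANGE[0], SSC_RANGE[0], IR_RANGE[0])
upper_bound = quadripartite_length(LSC_RANGE[1], SSC_RANGE[1], IR_RANGE[1])

# Reported extremes of the full genome length (A. polygonoides, A. albus).
REPORTED_MIN, REPORTED_MAX = 149_949, 150_756

consistent = lower_bound <= REPORTED_MIN and REPORTED_MAX <= upper_bound
print(lower_bound, upper_bound, consistent)
```

The bounds are loose on purpose: no single species need have the shortest LSC, SSC, and IR at once, so the reported extremes only have to fall inside them.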

The total GC content was 36.5% to 36.6%; only A. albus, A. blitoides and A. polygonoides had a GC content of 36.5% (Table 1). Each chloroplast genome contains a total of 133 genes, including 88 protein-coding genes, 37 tRNA genes, and 8 rRNA genes, 18 of which were duplicated in the inverted repeat regions (see Supplementary Table S1 online). The gene rps12 was trans-spliced: the 5′-end exon was located in the LSC region, whereas the 3′-end intron and exon were duplicated and located in the inverted repeat regions. The partial duplicates of the rps19 and ycf1 genes appeared as pseudogenes, as they had lost their protein-coding ability. Sixteen genes have introns.
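GC content is the share of G and C bases among all unambiguous bases. A minimal sketch of the calculation follows, assuming the sequence is available as a plain string; `gc_content` is a hypothetical helper, and ignoring ambiguous symbols such as N is an assumption, not something the source specifies.

```python
def gc_content(seq: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence for illustration only; not a real Amaranthus fragment.
print(round(gc_content("ATGCATTTAANN"), 1))
```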

A comparison of the chloroplast genomes of the 22 individuals from 17 species showed that the length of the SSC region was conserved within each subgenus. The SSC of A. palmeri, A. tuberculatus and A. arenicola in subgen. Acnida was 18,027–18,042 bp long, the SSC of 5 species of subgen. Amaranthus was 17,937–17,948 bp, and the SSC of 8 species of subgen. Albersia was 18,057–18,124 bp (Tables 1, 2). InDels of about 77 bp in ndhE-G and about 180 bp in ndhG-I caused the variation in SSC length among subgenera (Table 2; see Supplementary Fig. S1 online). The frequencies of SNPs and InDels in the chloroplast genomes of the 17 species were 1.79% and 2.86%, respectively (Table 3). The frequencies of SNPs and InDels in the genes were 1.22% and 1.14%, and those in the intergenic spacers were 3.25% and 7.32%, respectively (Table 3). In general, variation occurred mainly in the intergenic spacer regions, and InDels occurred mainly in non-coding regions (Table 3). The longest InDel was 387 bp, in ycf2, followed by a 384 bp InDel in psbM-trnD.
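One simple way to obtain SNP and InDel frequencies of this kind is to scan a multiple sequence alignment column by column. The sketch below assumes that approach: a column containing a gap counts as an InDel site, and a gap-free column with more than one base counts as an SNP site. The source does not state how its frequencies were computed (for example, whether InDels were counted per site or per event), so this is an illustration and `variation_frequencies` is a hypothetical name.

```python
def variation_frequencies(aligned: list[str]) -> tuple[float, float]:
    """Return (SNP %, InDel %) over the columns of an alignment.

    Assumes equal-length rows containing only A, C, G, T and '-'.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if length == 0 or any(len(row) != length for row in aligned):
        raise ValueError("rows must be non-empty and of equal length")

    snp_sites = indel_sites = 0
    for column in zip(*aligned):
        states = {base.upper() for base in column}
        if "-" in states:
            indel_sites += 1      # at least one sequence has a gap here
        elif len(states) > 1:
            snp_sites += 1        # gap-free column with differing bases
    return 100.0 * snp_sites / length, 100.0 * indel_sites / length


# Toy alignment for illustration only.
toy = ["ACGTACGT-A", "ACGTACTT-A", "ACGTACGTTA"]
print(variation_frequencies(toy))
```

Running the same function separately on gene and intergenic-spacer alignments would give per-region figures comparable in spirit to those in Table 3.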

Each species has 28 to 38 repeats, distributed across 30 locations, including 11 to 14 forward repeats, 11 to 17 palindromic repeats, and 6 to 8 reverse repeats, ranging from 30 to 64 bp in length. There were 19 common repeat locations, of which 11 showed no variation and 8 varied in length. R3, R8, R11 and R13 showed the most variation (Fig. 2). R12 (forward and reverse repeats) was distributed in the LSC, IRa, SSC and IRb. R12 in the SSC lies almost opposite R12 in the LSC, dividing the entire circular genome into two parts of nearly equal length. The repeats in the LSC were mainly concentrated near Repeat 12 (loci 29,572–46,282), loci 8166–8327, locus 29,572 and locus 75,230. The repeats in the IRs are constant within the genus. There were two common repeats in the SSC, one of which was a palindromic sequence shared by subgen. Acnida, subgen. Amaranthus, and A. albus.
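The repeat types named above differ in how the second copy relates to the first. The sketch below classifies a pair of equal-length sequences using the convention common to repeat finders such as REPuter: forward (identical), reverse (read backwards), palindromic (reverse complement), and complement. This section does not name the tool or settings the authors used, so `classify_repeat` is a hypothetical helper for illustration only.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(first: str, second: str) -> str:
    """Classify how `second` relates to `first` as a dispersed repeat."""
    a, b = first.upper(), second.upper()
    if a == b:
        return "forward"                      # same sequence, same strand
    if a == b[::-1]:
        return "reverse"                      # same strand, read backwards
    if a == b.translate(_COMPLEMENT)[::-1]:
        return "palindromic"                  # reverse complement
    if a == b.translate(_COMPLEMENT):
        return "complement"                   # complement, same direction
    return "none"


# Toy 4-bp copies; real repeats in the text are 30 to 64 bp long.
for copy in ("AAGT", "TGAA", "ACTT", "TTCA"):
    print(copy, classify_repeat("AAGT", copy))
```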

The distribution of repeat sequences at 30 loci in Amaranthus. R is short for repeat. The red line segment (R12) and the black line segments mark repeats present in all 17 species; the orange line segments mark repeats present in only some species. A single line segment at a site indicates that there is only one repeat at that site, whereas several line segments indicate several different repeats at that site. The chloroplast genome figure was generated with Geneious Prime v. 2020.1.2.

MISA analysis showed that each cp genome of Amaranthus contained 29–39 SSRs (see Supplementary Table S4 online). On average, the SSR types ranked from most to least numerous were mono-, tetra-, di-, tri-, penta- and hexa-nucleotides (see Supplementary Table S4 online). About 55.56% of those SSRs were composed of A or T bases. Among all SSRs, most loci were located in the LSC (77.78%) and in intergenic spacers (IGS; 71.91%). About 12 repeat motifs were shared by all species in the genus, while the remaining motifs were species-specific or subgenus-specific (see Supplementary Table S4 online). Different combinations of SSR markers could distinguish all species except A. standleyanus and A. crispus, and A. dubius and A. spinosus (see Supplementary Table S4 online).
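SSR detection of the kind MISA performs can be approximated with regular expressions: for each motif length from 1 to 6, find runs where the motif repeats at least a minimum number of times. The sketch below is not MISA and does not reproduce its output. The thresholds in `MIN_REPEATS` are assumed values chosen for illustration, since this section does not give the authors' settings, and `find_ssrs` is a hypothetical name.

```python
import re

# Assumed minimum repeat counts per motif length (NOT taken from the source).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # skip e.g. "AA" reported as a di-repeat
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence: a 12-bp poly-A run and six copies of AT.
toy = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GG"
print(find_ssrs(toy))
```

Start positions are 0-based. Counting how many hits have motifs made only of A and T would give a figure comparable in spirit to the A/T share reported above.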

The topologies of the phylogenetic trees constructed by maximum likelihood and Bayesian methods were essentially the same. A. palmeri, A. arenicola and A. tuberculatus clustered together (BS/PP = 100/1) to form subgen. Acnida, or the Dioecious Clade (Fig. 3). A. hybridus, A. hypochondriacus, A. dubius, A. spinosus and A. retroflexus clustered together (BS/PP = 100/1) to represent subgen. Amaranthus, or the Hybridus Clade (Fig. 3). These two clades were very closely related (BS/PP = 100/1) (Fig. 3). A. albus and A. blitoides clustered together with low to moderate support (BS/PP = 35/0.84), separate from subgen. Albersia, and were closely related to subgen. Amaranthus and subgen. Acnida (BS/PP = 58/0.99) (Fig. 3). Among the three species of subgen. Albersia distributed in Galápagos, A. polygonoides formed a single basal branch. The other two species, A. albus and A. blitoides, formed a separate clade (Galápagos Clade). The rest of subgen. Albersia clustered into one branch, namely the ESA+South American Clade (BS/PP = 100/1) (Fig. 3).

A maximum likelihood topological tree based on the chloroplast genomes of Amaranthus and three outgroups. Values at each node indicate maximum likelihood bootstrap support (BS)/Bayesian inference posterior probability (PP). Individuals marked with grey backgrounds represent major monophyletic branches in the genus. The Newick format files were imported into MEGA version 6 to generate the final topology tree.

Some of the qualifying fragment regions found by the exhaustive search overlapped, and overlapping regions were merged into a single hotspot region. In total, 16 hotspot fragments with lengths of 737 to 2818 bp were obtained, with SNP variation frequencies ranging from 0.78% to 1.49% (see Supplementary Table S3 online). Topological trees constructed from the alignments of these hotspot fragments, and from the alignments of each gene and intergenic spacer, were checked for consistency with the chloroplast genome tree. The hotspots giving more than 90% bootstrap support for the subgen. Amaranthus, subgen. Acnida and subgen. Albersia branches (excluding A. albus, A. polygonoides, and A. blitoides) were ndhF-rpl32, ycf1 and rpoC2 (Fig. 4).
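Combining overlapping qualifying fragments into hotspot regions is a standard interval-merge problem. A minimal sketch follows, assuming each fragment is an inclusive `(start, end)` coordinate pair. The window size and the SNP criterion that make a fragment qualify are not given in this section, so they are left out, and `merge_overlapping` is a hypothetical name.

```python
def merge_overlapping(windows: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping (start, end) windows into non-overlapping regions."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it if this window ends later.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Toy coordinates for illustration only.
candidates = [(400, 900), (100, 600), (2000, 2500), (850, 1200)]
for start, end in merge_overlapping(candidates):
    print(start, end, end - start + 1)  # region and its length in bp
```

Sorting first means each window only needs to be compared with the most recently merged region, so the merge runs in a single pass.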

Three maximum likelihood topological trees based on rpoC2, ndhF-rpl32 and ycf1 of Amaranthus and three outgroups. Values at each node indicate maximum likelihood bootstrap support (BS)/Bayesian inference posterior probability (PP). AMA represents subgen. Amaranthus, ACN represents subgen. Acnida, and ALB represents subgen. Albersia. The Newick format files were imported into MEGA version 6 to generate the final topology tree.

Among several closely similar taxa, there were 25 InDels and 11 SNPs between A. tunetanus and A. standleyanus, while A. crispus and A. standleyanus showed no differences. There were 46 SNPs and 144 InDels between A. arenicola and A. tuberculatus. Sequence alignment and variation analysis showed that trnK-UUU-atpF, trnT-UGU-atpB, psbE-clpP, rpl14-rps19 and ndhF-D could be used to distinguish A. tunetanus from A. standleyanus and A. crispus, and A. arenicola from A. tuberculatus.
