Simplifying SNP discovery in the cotton genome

Posted: April 2, 2015 at 5:44 am

The term "single-nucleotide polymorphism" (SNP) refers to a single base change in DNA sequence between two individuals. SNPs are the most common type of genetic variation in plant and animal genomes and are, thus, an important resource to biologists. The ubiquity of these markers and the fact that they show variation at such a fine scale (i.e., at the individual level) make them ideal for many applications, such as population-level genetic diversity studies and genetic mapping in plants.
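As a minimal illustration of the definition above, the following Python sketch lists the positions at which two already-aligned sequences differ by a single base. The sequences and the function name `find_snps` are invented for this example; real SNP calling works on many sequencing reads and must account for sequencing error.

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for each mismatch between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Hypothetical aligned fragments from two individuals (0-based positions).
individual_1 = "ATGCCGTAAC"
individual_2 = "ATGCTGTAAC"

for position, base_1, base_2 in find_snps(individual_1, individual_2):
    print(f"SNP at position {position}: {base_1} vs {base_2}")
```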

The growing popularity of next-generation sequencing has made SNPs a pervasive genetic marker in many areas of plant biology. The ever-increasing throughput of sequencing platforms has resulted in the ability to easily identify and genotype thousands of SNPs across numerous individuals to uncover genetic variation among and within populations. This technique, however, becomes quite challenging when the species of interest has undergone whole genome duplication events (i.e., polyploidy), as is common in many plant lineages.

Researchers at Texas A&M and the Southern Plains Agricultural Research Center have developed a strategy that simplifies the discovery of useful SNPs within the complex genome of cotton. The protocol is freely available in a recent issue of Applications in Plant Sciences.

"Cotton presents a challenge for SNP marker discovery due to the polyploid origin of the two most widely grown species," says Dr. Alan Pepper, an author of the study. "All plants have duplicated sequences, whether due to whole genome duplication, duplication of segments of chromosomes, duplication by retroviruses, or duplication by unequal crossing over. When you are looking for potential SNPs, particularly without a reference genome, you run the risk of identifying sequence differences between duplicated sequences rather than differences between individuals. This problem is particularly acute in recent allopolyploids."

Allopolyploid species are the product of hybridization between two divergent taxa. The genomes of these plants, therefore, contain two very similar copies of their genes--one from each parent.

According to Pepper, "A problem arises when our computational methods accidentally align DNA regions that are duplicated within the genomes of the plants being studied, rather than mapping the orthologous regions between the plants."
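The problem Pepper describes can be sketched with a toy example. The Python below is not the method used in the study; it is a simplified illustration with invented sequences, assuming each plant carries two duplicated copies of a fragment (labeled `copy_A` and `copy_D` here purely for illustration). Pooling all sequences reports every variable site, including differences between the duplicated copies. As one simple heuristic, sites that vary within every individual are treated as likely between-copy differences and set aside.

```python
def variable_sites(sequences):
    """Return the set of positions where the given aligned sequences are not all identical."""
    length = len(sequences[0])
    return {i for i in range(length) if len({s[i] for s in sequences}) > 1}


# Hypothetical data: two duplicated copies of one fragment in each of two plants.
plant_1 = {"copy_A": "ACGTACGT", "copy_D": "ACGAACGT"}
plant_2 = {"copy_A": "ACGTACCT", "copy_D": "ACGAACGT"}

# Naive approach: pool everything and report every variable site.
pooled = variable_sites(list(plant_1.values()) + list(plant_2.values()))

# Sites that vary inside every plant are likely differences between copies,
# not differences between individuals.
within_all = variable_sites(list(plant_1.values())) & variable_sites(list(plant_2.values()))

# What remains are candidate SNPs between the two plants.
candidates = pooled - within_all

print("pooled variable sites:", sorted(pooled))
print("likely between-copy sites:", sorted(within_all))
print("candidate SNPs:", sorted(candidates))
```

In this toy data, position 3 differs between the two copies in both plants, while position 6 differs only between the plants in one copy, so only position 6 survives as a candidate SNP.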

Enter the strategy presented by Pepper and colleagues.

Using the Illumina next-generation sequencing platform, over 50 million DNA reads were collected from restriction enzyme-digested DNA from four Gossypium species. The team then filtered these reads to enrich for orthologous DNA fragments.

Pepper explains, "One of the exciting things about this approach is that it employs a widely used, well-supported, off-the-shelf bioinformatics software known as Stacks (written by Julian Catchen at the University of Oregon) as a 'filter' to enrich for pairs of fragments that are likely to be alleles of a single, orthologous region, rather than paralogs or homeologs."

The new method allows for the detection of polymorphisms between individuals, which will be useful for downstream applications such as marker-assisted selection, linkage and QTL mapping, and genetic diversity studies.
