More markers, or more populations? | Gene Expression

Here’s a letter to The American Journal of Human Genetics worth reading, Genetic Landscape of Eurasia and “Admixture” in Uyghurs:

…In the papers…by Xu and Jin, the genetic structure of Uyghurs was described by 8150 ancestry-informative markers (AIMs). These markers estimated the admixture rate of the Uyghur population to be around 50% East Asian ancestry by comparing Uyghurs to East Asians and Europeans. However, we suspect that the estimate of Xu and Jin may be considerably biased by insufficient reference population coverage….

The difference between the estimate of Xu and Jin (52%) and our estimate (31%) may stem from either the different population coverage or the sample size. We analyzed a different and larger sample of Uyghur individuals (n = 48) than that analyzed by Xu and Jin….Their small sample size may have contributed to their overestimation of the European component to admixture (i.e., to cluster assignment). However, the insufficient population coverage may be more responsible for the difference than the sample size or the number of markers. Concerning the number of markers, it is known that a relatively small but specifically selected number of AIMs can accurately predict ethnicity proportion…As the two papers of Xu and Jin have demonstrated, the estimated admixture rates reported did not change much regardless of whether they were using chromosome 21 data only or the whole genome, and thus a large number of markers may not be necessary to estimate the “admixture” rate of Uyghurs. When we analyzed only the 12 markers with the highest FST values in our samples…the Uyghurs had a 30.2% assignment at K = 2 to the Europe and Western Asia cluster. This estimate was not significantly different from the above 31.2% when using all 68 markers. We consider it unlikely that a different set of appropriately chosen SNPs would give a markedly different answer based on unpublished data on some of these same populations….
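
To make the "12 markers with the highest FST values" step concrete, here is a minimal sketch of ranking markers by FST between two populations. It assumes biallelic SNPs, two equally weighted populations, and the simple heterozygosity-based definition FST = (HT - HS) / HT. The marker names and allele frequencies are invented for illustration; this is not the letter's data, and not necessarily the estimator its authors used.

```python
def fst_biallelic(p1, p2):
    """F_ST for one biallelic marker, given the allele frequency in each of
    two equally weighted populations: (H_T - H_S) / H_T."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                             # marker is fixed everywhere
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def top_markers(freqs, k):
    """Return the names of the k markers with the highest F_ST."""
    ranked = sorted(freqs, key=lambda m: fst_biallelic(*freqs[m]), reverse=True)
    return ranked[:k]


# Hypothetical allele frequencies: (population 1, population 2).
freqs = {
    "snp_a": (0.90, 0.10),
    "snp_b": (0.55, 0.45),
    "snp_c": (0.80, 0.30),
    "snp_d": (0.50, 0.50),
}

for name, (p1, p2) in freqs.items():
    print(name, round(fst_biallelic(p1, p2), 3))
print(top_markers(freqs, 2))
```

Markers whose frequencies barely differ between the two populations (snp_b and snp_d here) carry almost no information about ancestry, which is why a small, well-chosen panel can do nearly as well as a large one.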

Basically, the authors are arguing that to capture geographic diversity you are better off sampling a more diverse range of populations (to get more between-population genetic variance) than continuing to increase the number of markers typed within individuals. The reference population matters. I know that 23andMe tells South Asians to expect results saying they’re 70-90% “European,” with the balance “East Asian.” People with only Native American ancestry are going to be 75% “East Asian” and 25% “European.” These sorts of results from the reference populations are pretty misleading, in my opinion. If you model the variation of all the world’s populations as a combination of the variation in a few reference populations, you get a stylized fact that is confusing if you don’t know how to interpret it correctly.
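
A toy calculation shows how this happens. The sketch below assumes a very simple two-source model, in which the target population's allele frequencies are alpha * A + (1 - alpha) * B, and fits alpha by least squares. This is not the clustering method used in the letter or by any testing company, and all the frequencies are made up. Population C is constructed as its own unadmixed source, but because it is not among the references, the model has no choice but to describe it as a blend of A and B.

```python
def admixture_fraction(target, ref_a, ref_b):
    """Least-squares estimate of alpha in target ~ alpha*ref_a + (1-alpha)*ref_b,
    clipped to the interval [0, 1]."""
    num = sum((q - b) * (a - b) for q, a, b in zip(target, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference populations are identical at every marker")
    return min(1.0, max(0.0, num / den))


# Hypothetical allele frequencies at five markers.
ref_a = [0.9, 0.8, 0.2, 0.1, 0.7]   # reference population A
ref_b = [0.1, 0.2, 0.8, 0.9, 0.3]   # reference population B

# A population that really is a 50/50 mix of A and B.
true_mix = [(a + b) / 2 for a, b in zip(ref_a, ref_b)]

# A third, unadmixed population that is simply missing from the references.
pop_c = [0.7, 0.6, 0.3, 0.4, 0.9]

print(admixture_fraction(true_mix, ref_a, ref_b))  # recovers the real mix
print(admixture_fraction(pop_c, ref_a, ref_b))     # roughly 0.77 "A", by artifact
```

The second number is not wrong as arithmetic; it is just answering the question "which blend of A and B looks most like C?", which is a different question from "where does C's ancestry come from?". Adding more markers would not change that; adding C (or something close to it) to the reference panel would.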

Below is figure one, where they show the difference between K = 2 and K = 6 (that is, assuming two or six ancestral populations for the data set). The map illustrates the geographic distribution at K = 6: the intensity of each color represents the current contribution of one of the six ancestral groups in that region.

[Figure 1: cluster assignments at K = 2 and K = 6, with a map of the K = 6 distribution]
