The Prometheus League
Breaking News and Updates
Category Archives: Human Genetics
Genotyping, sequencing and analysis of 140,000 adults from Mexico … – Nature.com
Posted: October 16, 2023 at 6:42 am
Recruitment of study participants
The MCPS was established in the late 1990s following discussions between Mexican scientists at the National Autonomous University of Mexico (UNAM) and British scientists at the University of Oxford about how best to measure the changing health effects of tobacco in Mexico. These discussions evolved into a plan to establish a prospective cohort study that could investigate not only the health effects of tobacco but also those of many other factors (including factors measurable in the blood)1. Between 1998 and 2004, more than 100,000 women and 50,000 men 35 years of age or older (mean age 50 years) agreed to take part, were asked questions, had physical measurements taken, gave a blood sample and agreed to be tracked for cause-specific mortality. More women than men were recruited because the study visits were predominantly made during working hours when women were more likely to be at home (although visits were extended into the early evenings and at weekends to increase the proportion of men in the study).
Participants were recruited from randomly selected areas within two contiguous city districts (Coyoacán and Iztapalapa). These two districts have existed since the pre-Hispanic period and are geographically close to the ancient Aztec city of Tenochtitlan. Originally, Indigenous populations settled there, but over the centuries, the population dynamics have substantially changed. Many people from Spain, including the conqueror Hernán Cortés, resided in Coyoacán while the capital of New Spain was being built over the ruins of Tenochtitlan. The modern populations of Coyoacán and Iztapalapa derive largely from the development of urban settlements and migrations from the 1950s to the 1970s. Over this period, both districts, but particularly Iztapalapa, received large numbers of Indigenous migrants from the central (Nahuas, Otomies and Purepechas), south (Mixtecos, Zapotecos and Mazatecos) and southeast (Chinantecos, Totonacas and Mayas) regions of the country.
At recruitment, a 10-ml venous EDTA blood sample was obtained from each participant and transferred to a central laboratory using a transport box chilled (4–10 °C) with ice packs. Samples were refrigerated overnight at 4 °C and then centrifuged (2,100g at 4 °C for 15 min) and separated the next morning. Plasma and buffy-coat samples were stored locally at −80 °C, then transported on dry ice to Oxford (United Kingdom) for long-term storage over liquid nitrogen. DNA was extracted from buffy coat at the UK Biocentre using Perkin Elmer Chemagic 360 systems and suspended in TE buffer. UV-VIS spectroscopy using Trinean DropSense96 was used to determine yield and quality, and samples were normalized to provide 2 µg DNA at 20 ng µl⁻¹ concentration (2% of samples provided a minimum 1.5 µg DNA at 10 ng µl⁻¹ concentration) with a 260:280 nm ratio of >1.8 and a 260:230 nm ratio of 2.0–2.2.
Genomic DNA samples were transferred to the Regeneron Genetics Center from the UK Biocentre and stored in an automated sample biobank at −80 °C before sample preparation. DNA libraries were created by enzymatically shearing DNA to a mean fragment size of 200 bp, and a common Y-shaped adapter was ligated to all DNA libraries. Unique, asymmetric 10 bp barcodes were added to the DNA fragment during library amplification to facilitate multiplexed exome capture and sequencing. Equal amounts of sample were pooled before overnight exome capture, with a slightly modified version of IDT's xGen v1 probe library; all samples were captured on the same lot of oligonucleotides. The captured DNA was PCR amplified and quantified by quantitative PCR. The multiplexed samples were pooled and then sequenced using 75 bp paired-end reads with two 10 bp index reads on an Illumina NovaSeq 6000 platform on S4 flow cells. A total of 146,068 samples were made available for processing. We were unable to process 2,628 samples, most of which failed QC during processing owing to low or no DNA being present. A total of 143,440 samples were sequenced. The average 20× coverage was 96.5%, and 98.7% of the samples were above 90%.
Of the 143,440 samples sequenced, 2,394 (1.7%) did not pass one or more of our QC metrics and were subsequently excluded. Criteria for exclusion were as follows: disagreement between genetically determined and reported sex (n=1,032); high rates of heterozygosity or contamination (VBID>5%) (n=249); low sequence coverage (less than 80% of targeted bases achieving 20× coverage) (n=29); genetically identified sample duplicates (n=1,062 total samples); WES variants discordant with the genotyping chip (n=8); uncertain linkage back to a study participant (n=259); and instrument issue at DNA extraction (n=6). The remaining 141,046 samples were then used to compile a project-level VCF (PVCF) for downstream analysis using the GLnexus joint genotyping tool. This final dataset contained 9,950,580 variants.
Approximately 250 ng of total DNA was enzymatically sheared to a mean fragment size of 350 bp. Following ligation of a Y-shaped adapter, unique, asymmetric 10 bp barcodes were added to the DNA fragments with three cycles of PCR. Libraries were quantified by quantitative PCR, pooled and then sequenced using 150 bp paired-end reads with two 10 bp index reads on an Illumina NovaSeq 6000 platform on S4 flow cells. A total of 10,008 samples were sequenced. This included 200 mother–father–child trios and 3 more extended pedigrees. The rest of the samples were chosen to be unrelated to third degree or closer and enriched for parents of nuclear families. The average mean coverage was 38.5×, 99% of samples had mean coverages of >30×, and all samples were above 27×.
Of the 10,008 samples that were whole-genome sequenced, 58 (0.6%) did not pass one or more of our QC metrics and were subsequently excluded. Reasons for exclusion were as follows: disagreement between genetically determined and reported sex (n=16); high rates of heterozygosity or contamination (VBID>5%) (n=10); genetically identified sample duplicates (n=19 total samples); and uncertain linkage back to a study participant (n=14). The remaining 9,950 samples were then used to compile a PVCF for downstream analysis using the GLnexus joint genotyping tool. This final dataset contained 158,464,363 variants.
The MCPS WES and WGS data were reference-aligned using the OQFE protocol35, which uses BWA MEM to map all reads to the GRCh38 reference in an alt-aware manner, marks read duplicates and adds additional per-read tags. The OQFE protocol retains all reads and original quality scores such that the original FASTQ is completely recoverable from the resulting CRAM file. Single-sample variants were called using DeepVariant (v.0.10.0) with default WGS parameters or custom exome parameters35, generating a gVCF for each input OQFE CRAM file. These gVCFs were aggregated and joint-genotyped using GLnexus (v.1.3.1). All constituent steps of this protocol were executed using open-source software.
Similar to other recent large-scale sequencing efforts, we implemented a supervised machine-learning algorithm to discriminate between probable low-quality and high-quality variants8,12. In brief, we defined a set of positive control and negative control variants based on the following criteria: (1) concordance in genotype calls between array and exome-sequencing data; (2) transmitted singletons; (3) an external set of likely high-quality sites; and (4) an external set of likely low-quality sites. To define the external high-quality set, we first generated the intersection of variants that passed QC in both TOPMed Freeze 8 and gnomAD v.3.1 genomes. This set was additionally restricted to 1000 Genomes phase 1 high-confidence SNPs from the 1000 Genomes project36 and gold-standard insertions and deletions from the 1000 Genomes project and a previous study37, both available through the GATK resource bundle (https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle). To define the external low-quality set, we intersected gnomAD v.3.1 fail variants with TOPMed Freeze 8 Mendelian or duplicate discordant variants. Before model training, the control set of variants was binned by allele frequency and then randomly sampled such that an equal number of variants were retained in the positive and negative labels across each frequency bin. A support vector machine using a radial basis function kernel was then trained on up to 33 available site quality metrics, including, for example, the median value for allele balance in heterozygote calls and whether a variant was split from a multi-allelic site. We split the data into training (80%) and test (20%) sets. We performed a grid search with fivefold cross-validation on the training set to identify the hyperparameters that returned the highest accuracy during cross-validation, which were then applied to the test set to confirm accuracy.
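The frequency-binned balancing of control variants described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `balanced_sample`, the bin edges and the tuple layout are assumptions made for the example, and the subsequent SVM training is not shown.

```python
import random
from bisect import bisect_right
from collections import defaultdict


def balanced_sample(variants, bin_edges, seed=0):
    """Downsample control variants so each allele-frequency bin keeps
    the same number of positive (label 1) and negative (label 0) variants.

    variants  : iterable of (variant_id, allele_frequency, label)
    bin_edges : sorted allele-frequency cut points (hypothetical values)
    """
    rng = random.Random(seed)
    bins = defaultdict(lambda: {0: [], 1: []})
    for variant in variants:
        _, allele_freq, label = variant
        # bisect_right gives the index of the frequency bin for this variant
        bins[bisect_right(bin_edges, allele_freq)][label].append(variant)

    kept = []
    for bin_index in sorted(bins):
        positives, negatives = bins[bin_index][1], bins[bin_index][0]
        k = min(len(positives), len(negatives))  # size of the smaller class
        kept.extend(rng.sample(positives, k))
        kept.extend(rng.sample(negatives, k))
    return kept
```

Balancing within each bin, rather than across the whole set, keeps the classifier from learning allele frequency as a proxy for variant quality.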
This approach identified a total of 616,027 WES and 22,784,296 WGS variants as low-quality (of which 161,707 and 104,452 were coding variants, respectively). We further applied a set of hard filters to exclude monomorphs, unresolved duplicates, variants with >10% missingness, ≥3 Mendel errors (WGS only) or failed Hardy–Weinberg equilibrium (HWE) with excess heterozygosity (HWE P < 1 × 10⁻³⁰ and observed heterozygote count of >1.5× expected heterozygote count), which resulted in a dataset of 9,325,897 WES and 131,851,586 WGS variants (of which 4,037,949 and 1,460,499 were coding variants, respectively).
Variants were annotated as previously described38. In brief, variants were annotated using Ensembl variant effect predictor, with the most severe consequence for each variant chosen across all protein-coding transcripts. In addition, we derived canonical transcript annotations based on a combination of MANE, APPRIS and Ensembl canonical tags. MANE annotation was given the highest priority followed by APPRIS. When neither MANE nor APPRIS annotation tags were available for a gene, the canonical transcript definition of Ensembl was used. Gene regions were defined using Ensembl release 100. Variants annotated as stop gained, start lost, splice donor, splice acceptor, stop lost or frameshift, for which the allele of interest was not the ancestral allele, were considered predicted loss-of-function variants. Five annotation resources were utilized to assign deleteriousness to missense variants: SIFT; PolyPhen2 HDIV and PolyPhen2 HVAR; LRT; and MutationTaster. Missense variants were considered likely deleterious if predicted deleterious by all five algorithms, possibly deleterious if predicted deleterious by at least one algorithm and likely benign if not predicted deleterious by any algorithm.
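The three-tier rule for missense variants stated above can be written compactly. The sketch below assumes each of the five predictors has already been reduced to a boolean "deleterious" call per variant; the dictionary keys are illustrative labels, not identifiers from any annotation tool.

```python
# Labels for the five predictors named in the text (key names are assumptions).
ALGORITHMS = ("SIFT", "PolyPhen2_HDIV", "PolyPhen2_HVAR", "LRT", "MutationTaster")


def classify_missense(calls):
    """Classify one missense variant from five boolean predictor calls.

    calls: dict mapping each name in ALGORITHMS to True when that
    algorithm predicts the variant to be deleterious.
    """
    n_deleterious = sum(bool(calls[name]) for name in ALGORITHMS)
    if n_deleterious == len(ALGORITHMS):
        return "likely deleterious"
    if n_deleterious >= 1:
        return "possibly deleterious"
    return "likely benign"
```

How a missing prediction from one algorithm should be treated is not stated in the text; this sketch simply requires all five calls to be present.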
Samples were genotyped using an Illumina Global Screening Array (GSA) v.2 beadchip according to the manufacturer's recommendations. A total of 146,068 samples were made available for processing, of which 145,266 (99.5%) were successfully processed. The average genotype call rate per sample was 98.4%, and 98.4% of samples had a call rate above 90%. Of the 145,266 samples that were genotyped, 4,435 (3.1%) did not pass one or more of our QC metrics and were subsequently excluded. Reasons for exclusion were as follows: disagreement between genetically determined and reported sex (n=1,827); low-quality samples (call rates below 90%) (n=2,276); genotyping chip variants discordant with exome data (n=44); genetically identified sample duplicates (n=1,063 total samples); uncertain linkage back to a study participant (n=268); and sample affected by an instrument issue at DNA extraction (n=6). The remaining 140,831 samples were then used to compile a PVCF for downstream analysis. This dataset contained 650,380 polymorphic variants.
The input array data from the RGC Sequencing Laboratory consisted of 140,831 samples and 650,380 variants and were passed through the following QC steps: checks for consistency of genotypes in sex chromosomes (steps 1–4); sample-level and variant-level missingness filters (steps 5 and 6); the HWE exact test applied to a set of 81,747 third-degree unrelated samples, which were identified from the initial relatedness analysis using Plink and Primus (step 7); setting genotypes with Mendelian errors in nuclear families to missing (step 8); and a second round of steps 5–7 (step 9). Plink commands associated with each step are displayed in column 2 (Supplementary Table 9). The final post-QC array data consisted of 138,511 samples and 559,923 variants.
We used Shapeit (v.4.1.3; https://odelaneau.github.io/shapeit4) to phase the array dataset of 138,511 samples and 539,315 autosomal variants that passed the array QC procedure. To improve the phasing quality, we leveraged the inferred family information by building a partial haplotype scaffold on unphased genotypes at 1,266 trios from 3,475 inferred nuclear families identified (randomly selecting one offspring per family when there was more than one). We then ran Shapeit one chromosome at a time, passing the scaffold information with the --scaffold option.
We separately phased the support-vector-machine-filtered WES and WGS datasets onto the array scaffold. The phased WGS data constitute the MCPS10k reference panel. For the WGS phasing, we used WhatsHap (https://github.com/whatshap/whatshap) to extract phase information in the sequence reads and from the subset of available trios and pedigrees, and this information was fed into Shapeit (v.4.2.2; https://odelaneau.github.io/shapeit4) through the --use-PS 0.0001 option. Phasing was carried out in chunks of 10,000 and 100,000 variants (WES and WGS, respectively) and using 500 SNPs from the array data as a buffer at the beginning and end of each chunk. The use of the phased scaffold of array variants meant that chunks of phased sequencing data could be concatenated together to produce whole chromosome files that preserved the chromosome-wide phasing of array variants. A consequence of this process is that when a variant appeared in both the array and sequencing datasets, the data from the array dataset were used.
To assess the performance of the WGS phasing process, we repeated the phasing of chromosome 2 by removing the children of the 200 mother–father–child trios. We then compared the phase of the trio parents to that in the phased dataset that included the children. We observed a mean switch error rate of 0.0024. Without using WhatsHap to leverage phase information in sequencing reads, the mean switch error rate increased to 0.0040 (Supplementary Fig. 23).
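For readers unfamiliar with the metric, a switch error is a change in phase orientation between two consecutive heterozygous sites relative to a reference phasing. The sketch below uses that common definition; whether it matches the exact calculation used in the study is an assumption, and the function name and input layout are illustrative.

```python
def switch_error_rate(truth, test):
    """Switch error rate of `test` phasing relative to `truth`.

    truth, test: equal-length lists of phased genotypes (allele_on_hap0,
    allele_on_hap1) for one individual at the same ordered sites.
    Only sites heterozygous in `truth` are informative.
    """
    orientation = []  # 0 = same orientation as truth, 1 = flipped
    for (t0, t1), (s0, s1) in zip(truth, test):
        if t0 == t1:
            continue  # homozygous sites carry no phase information
        if (s0, s1) == (t0, t1):
            orientation.append(0)
        elif (s0, s1) == (t1, t0):
            orientation.append(1)
        else:
            raise ValueError("genotypes differ between truth and test")
    if len(orientation) < 2:
        return 0.0
    # A switch occurs whenever the orientation changes between neighbours.
    switches = sum(a != b for a, b in zip(orientation, orientation[1:]))
    return switches / (len(orientation) - 1)
```

The denominator is the number of adjacent heterozygous pairs, so a rate of 0.0024 corresponds to roughly one switch per 400 such pairs.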
The relatedness-inference criteria and relationship assignments were based on kinship coefficients and probability of zero IBD sharing from the KING software (https://www.kingrelatedness.com). We reconstructed all first-degree family networks using PRIMUS (v.1.9.0; https://primus.gs.washington.edu/primusweb) applied to the IBD-based KING estimates of relatedness along with the genetically derived sex and reported age of each individual. In total, 99.3% of the first-degree family networks were unambiguously reconstructed. To visualize the relationship structure in the MCPS, we used the software Graphviz (https://graphviz.org) to construct networks such as those presented in Supplementary Fig. 5. We used the sfdp layout engine which uses a spring model that relies on a force-directed approach to minimize edge length.
To identify IBD segments and to measure ROH, we ran hap-ibd (v.1.0; https://github.com/browning-lab/hap-ibd) using the phased array dataset of 138,511 samples and 538,614 sites from autosomal loci. Hap-ibd was run with the parameter min-seed=4, which looks for IBD segments that are at least 4 cM long. We filtered out IBD segments in regions of the genome with fourfold more or fourfold less than the median coverage along each chromosome following the procedure in IBDkin (https://github.com/YingZhou001/IBDkin), and filtered out segments overlapping regions with fourfold less than the median SNP marker density (Supplementary Fig. 28). For the homozygosity analysis, we intersected the sample with the exome data to evaluate loss-of-function variants, which resulted in a sample of 138,200. We further overlaid the ROH segments with local ancestry estimates, and assigned ancestry where the ancestries were concordant between haplotypes and posterior probability was >0.9, assigning ancestry to 99.8% of the ROH.
We used the workflow implemented in the R package bigsnpr (https://privefl.github.io/bigsnpr). In brief, pairwise kinship coefficients were estimated using Plink (v.2.0) and samples were pruned for first-degree and second-degree relatedness (kinship coefficient<0.0884) to obtain a set of unrelated individuals. LD clumping was performed with a default LD r² threshold of 0.2, and regions with long-range LD were iteratively detected and removed using a procedure based on evaluating robust Mahalanobis distances of PC loadings. Sample outliers were detected using a procedure based on K-nearest neighbours. PC scores and loadings for the first 20 PCs were efficiently estimated using truncated singular value decomposition (SVD) of the scaled genotype matrix. After removal of variant and sample outliers, a final iteration of truncated SVD was performed to obtain the PCA model. The PC scores and loadings from this model were then used to project withheld samples, including related individuals, into the PC space defined by the model using the online augmentation, decomposition and Procrustes algorithm. For each PC analysis in this study, variants with MAF<0.01 were removed.
Admixture (v.1.3.0; https://dalexander.github.io/admixture) was used to estimate ancestry proportions in a set of 3,964 reference samples representing African, European, East Asian, and American ancestries from a dataset of merged genotypes. This included 765 samples of African ancestry from 1000 Genomes (n=661) and HGDP (n=104), 658 samples of European ancestry from 1000 Genomes (n=503) and HGDP (n=155), 727 samples of East Asian ancestry from 1000 Genomes (n=504) and HGDP (n=223), and 1,814 American samples, including 716 Indigenous Mexican samples from the MAIS study, 64 admixed Mexican American samples from MXL, 21 Maya and 13 Pima samples from HGDP, and 1,000 unrelated Mexican samples from the MCPS. Included SNPs were limited to variants present on the Illumina GSA v.2 genotyping array for which TOPMed-imputed variants in the MAIS study had information r² ≥ 0.9 (m=199,247 SNPs). To select the optimum number of ancestry populations (K) to include in the admixture model, fivefold cross-validation was performed for each K in the set 4 to 25 with the --cv flag. To obtain ancestry proportion estimates in the remaining set of 137,511 MCPS samples, the population allele frequencies (P) estimated from the analysis of reference samples were fixed as parameters so that the remaining samples could be projected into the admixture model. Projection was performed for the K=4 model and for the K=18 model that produced the lowest cross-validation error, and point estimation was attained using the block relaxation algorithm.
The MAIS genotyping datasets were obtained from L. Orozco from Instituto Nacional de Medicina Genómica. For 644 samples, genotyping was performed using an Affymetrix Human 6.0 array (n=599,727 variants). An additional 72 samples (11 ancestry populations) were genotyped using an Illumina Omni 2.5 array (n=2,397,901 variants). The set of 716 Indigenous samples represents 60 out of the 68 recognized ethnic populations in Mexico3. Per chromosome, VCFs for each genotyping array were uploaded to the TOPMed imputation server (https://imputation.biodatacatalyst.nhlbi.nih.gov) and imputed from a multi-ethnic reference panel of 97,256 whole genomes. Phasing and imputation were performed using the programs eagle and MiniMac, respectively. The observed coefficient of determination (r²) for the reference allele frequency between the reference panel and the genotyping array was 0.696 and 0.606 for the Affymetrix and Illumina arrays, respectively.
Physical positions of imputed variants were mapped from genome build GRCh37 to GRCh38 using the program LiftOver, and only variant positions included on the Illumina GSA v.2 were retained. After further filtering out variants with imputation information r² < 0.9, the following QC steps were performed before merging of the MAIS Affymetrix and Illumina datasets: (1) removal of ambiguous variants (that is, A/T and C/G polymorphisms); (2) removal of duplicate variants; (3) identifying and correcting allele flips; and (4) removal of variants with position mismatches. Merging was performed using the --bmerge command in Plink (v.1.9).
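The pre-merge allele checks in steps (1) and (3) can be illustrated with a small sketch for biallelic SNPs. It is not the Plink procedure used in the study: the function names and return labels are assumptions, and "allele flip" is interpreted here as either a strand flip or a swap of the two alleles.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele1, allele2):
    """A/T and C/G SNPs look identical on both strands."""
    return COMPLEMENT[allele1] == allele2


def harmonize(reference, study):
    """Decide how a study SNP's alleles relate to the reference SNP's.

    reference, study: (allele1, allele2) tuples of single bases.
    Returns "keep", "swap", "flip", "flip_swap", or None when the
    variant should be dropped (ambiguous or irreconcilable alleles).
    """
    reference, study = tuple(reference), tuple(study)
    if is_strand_ambiguous(*reference) or is_strand_ambiguous(*study):
        return None
    if study == reference:
        return "keep"
    if study == reference[::-1]:
        return "swap"
    flipped = tuple(COMPLEMENT[base] for base in study)  # opposite strand
    if flipped == reference:
        return "flip"
    if flipped == reference[::-1]:
        return "flip_swap"
    return None
```

Ambiguous SNPs are removed first because, for A/T and C/G pairs, a strand flip cannot be distinguished from an allele swap using the alleles alone.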
We used publicly available genotypes from the HGDP (n=929) and the 1000 Genomes project (n=2,504). To obtain a combined global reference dataset for downstream analyses of population structure, admixture and local ancestry, the HGDP and 1000 Genomes datasets were merged. The resulting merged public reference dataset was subsequently merged with the MAIS dataset and MCPS genotyping array dataset. Each merge was performed using the --bmerge function in Plink (v.1.9; https://www.cog-genomics.org/plink) after removing ambiguous variants, removing duplicate variants, identifying and correcting allele flips, and removing variants with position mismatches. The combined global reference dataset comprised 199,247 variants and 142,660 samples.
To characterize genetic admixture within the MCPS cohort, we performed a seven-way LAI analysis with RFMix (v.2.0; https://github.com/slowkoni/rfmix) that included reference samples from the HGDP and 1000 Genomes studies, and Indigenous samples from the MAIS study. This merged genotyping dataset of samples across these studies with the 138,511 MCPS participants included 204,626 autosomal variants and 5,363 chromosome X variants.
To identify reference samples with extensive admixture to exclude from LAI, we performed admixture analysis with the program TeraSTRUCTURE (https://github.com/StoreyLab/terastructure) on a merged genotyping dataset (n=3,274) that included African (AFR), European (EUR) and American (AMR) samples from the HGDP, 1000 Genomes and MAIS studies, and 1,000 randomly selected unrelated MCPS samples. Following the recommended workflow in the TeraSTRUCTURE documentation (https://github.com/StoreyLab/terastructure), we varied the rfreq parameter from the set of {0.05, 0.10, 0.15, 0.20} of autosomal variants with K=4 and selected the value that maximized the validation likelihood (20% of autosomal variants; rfreq=45,365). We then varied the K parameter and ran it in triplicate to identify the value that attained a maximal average validation likelihood (K=18). Each of the estimated K ancestries was assigned to a global superpopulation (that is, AFR, EUR and AMR), and the cumulative K ancestry proportion was used as an ancestry score for selecting reference samples. Using an ancestry score threshold of 0.9, 666 AFR, 659 EUR and 616 AMR samples were selected as reference samples. The AMR samples used for seven-way LAI comprised 98 Mexico_North, 42 Mexico_Northwest, 185 Mexico_Central, 128 Mexico_South and 163 Mexico_Southeast individuals.
Reference samples were phased using Shapeit (v.4.1.2; https://odelaneau.github.io/shapeit4) with default settings, and the phasing of the 138,511 MCPS participants was performed as described above (see the section Array phasing). Seven-way LAI was performed using RFMix (v.2.0), with the number of terminal nodes for the random forest classifier set to 5 (-n 5), the average number of generations since expected admixture set to 15 (-G 15), and ten rounds of expectation maximization (EM) algorithm (-e 10). Global ancestry proportion estimates were derived by taking the average per-chromosome Q estimates (weighted by chromosome length) for each of the seven ancestries (that is, AFR, EUR, Mexico_North, Mexico_Northwest, Mexico_Central, Mexico_South and Mexico_Southeast). Inferred three-way global ancestry proportion estimates were obtained by combining proportions for each of the five Indigenous Mexican populations into a single AMR category.
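The global ancestry calculation described above is a chromosome-length-weighted mean of the per-chromosome Q estimates, followed by pooling of the five Indigenous Mexican components. A minimal sketch for one individual follows; the function names and input layout are assumptions, and chromosome lengths must be supplied by the caller.

```python
def global_ancestry(per_chrom_q, chrom_lengths):
    """Length-weighted average of per-chromosome ancestry proportions.

    per_chrom_q   : {chromosome: {ancestry: proportion}} for one individual
    chrom_lengths : {chromosome: length}, in any consistent unit
    """
    total_length = sum(chrom_lengths[chrom] for chrom in per_chrom_q)
    ancestries = next(iter(per_chrom_q.values())).keys()
    return {
        ancestry: sum(
            per_chrom_q[chrom][ancestry] * chrom_lengths[chrom]
            for chrom in per_chrom_q
        ) / total_length
        for ancestry in ancestries
    }


def collapse_to_three_way(q_seven):
    """Pool the Mexico_* components into a single AMR category."""
    amr = sum(v for name, v in q_seven.items() if name.startswith("Mexico_"))
    return {"AFR": q_seven["AFR"], "EUR": q_seven["EUR"], "AMR": amr}
```

Weighting by length prevents short chromosomes from contributing as much to the genome-wide estimate as long ones.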
To delineate local ancestry segments for use in the estimation of ancestry-specific allele frequencies (see the section Ancestry-specific allele frequency estimation), we performed a three-way LAI analysis using a merged genotyping dataset that excluded the MAIS samples as this afforded greater genotyping density (493,036 autosomal variants and 12,798 chromosome X variants). Before LAI analysis, reference samples were selected using the same workflow for TeraSTRUCTURE as described above, with modifications being the inclusion of 10,000 unrelated MCPS participants and an ancestry threshold of 0.95. RFMix was applied as described above, with modifications being the use of 753 AFR, 649 EUR and 91 AMR reference samples, specification of 5 rounds of EM (-e 5), and use of the --reanalyze-reference option, which treated reference haplotypes as if they were query haplotypes and updated the set of reference haplotypes in each EM round.
To measure the correlation in ancestry between partner pairs, we used a linear model to predict ancestry of each partner using the ancestry of their spouse, education level (four categories) and district (Coyoacán and Iztapalapa) of both partners.
We averaged local ancestry dosages (estimated using RFMix at 98,012 positions along the genome) from 78,833 unrelated MCPS samples and performed a per-ancestry scan testing for deviation of local ancestry proportion from the global ancestry proportion19. The test is based on assumptions of binomial sampling and normal approximation for the sample mean. The global ancestry proportion for each ancestry was estimated as a robust average over local ancestry using Tukey's biweight robust mean. The scan was performed in all autosomes separately for African, European and Indigenous Mexican ancestries with the significance threshold 1.7 × 10⁻⁷ = 0.05/(98,012 × 3), which accounts for the number of local ancestry proportions tested and the three ancestries.
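The significance threshold is a Bonferroni correction over positions and ancestries, and the scan statistic can be approximated by a z-score under binomial sampling. The sketch below reproduces the threshold arithmetic and gives one plausible form of the test; the exact statistic used in the cited method may differ, and treating the number of sampled haplotypes as the binomial sample size is an assumption.

```python
import math


def bonferroni_threshold(alpha, n_positions, n_ancestries):
    """Per-test significance level after correcting for all tests."""
    return alpha / (n_positions * n_ancestries)


def local_ancestry_z(local_prop, global_prop, n_haplotypes):
    """z-score for the deviation of a local ancestry proportion from the
    global proportion, using the binomial variance of a sample mean."""
    std_error = math.sqrt(global_prop * (1.0 - global_prop) / n_haplotypes)
    return (local_prop - global_prop) / std_error


def two_sided_p(z):
    """Two-sided P value under the standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Threshold from the text: 0.05 / (98,012 positions x 3 ancestries)
threshold = bonferroni_threshold(0.05, 98_012, 3)
```

A position would be flagged for a given ancestry when `two_sided_p(z)` falls below `threshold`.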
IBD segments from hapIBD were summed across pairs of individuals to create a network of IBD sharing represented by the weight matrix \(W \in \mathbb{R}_{\ge 0}^{n \times n}\) for n samples. Each entry \(w_{ij} \in W\) gives the total length in cM of the genome that individuals i and j share identical by descent. We sought to create a low-dimensional visualization of the IBD network. We used a similar approach to that described in ref. 14, which used the eigenvectors of the normalized graph Laplacian as coordinates for a low-dimensional embedding of the IBD network. Let D be the degree matrix of the graph with \(d_{ii} = \sum_{j} w_{ij}\) and 0 elsewhere. The normalized (random walk) graph Laplacian is defined to be \(L = I - D^{-1}W\), where I is the identity matrix.
The matrix L is positive semi-definite, with eigenvalues \(0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}\). The multiplicity of eigenvalue 0 is determined by the number of connected components in the IBD network. If the network is fully connected, the eigenvector associated with eigenvalue 0 is constant, whereas the remaining eigenvectors can be used to compute a low-dimensional representation of the IBD network. If p is the desired dimension, and \(u_1, \ldots, u_p\) are the bottom 1, …, p eigenvectors of L (indexed from 0), the matrix \(U \in \mathbb{R}^{n \times p}\) with columns \(u_1, \ldots, u_p\) defines a low-dimensional representation of each individual in the IBD network39. In practice, we solved the generalized eigenvalue problem \(Wu = \mu Du\) to obtain \(u_1, \ldots, u_p\).
If u is an eigenvector of L with eigenvalue \(\lambda\), then u solves the generalized eigenvalue problem with eigenvalue \(1 - \lambda\).
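The link between the two eigenproblems follows from the definition \(L = I - D^{-1}W\). A short derivation, writing \(\mu\) for the generalized eigenvalue and assuming D is invertible (every individual has positive degree):

```latex
\begin{align*}
Lu = \lambda u
  &\iff (I - D^{-1}W)\,u = \lambda u \\
  &\iff D^{-1}W u = (1 - \lambda)\,u \\
  &\iff W u = (1 - \lambda)\,D u ,
\end{align*}
```

so \(u\) solves \(Wu = \mu Du\) with \(\mu = 1 - \lambda\). The smallest eigenvalues of L therefore correspond to the largest generalized eigenvalues, and the "bottom" eigenvectors of L are the top eigenvectors of the generalized problem.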
To apply to the IBD network of the MCPS cohort, we first removed edges with weight >72cM as previously done14. We did this to avoid the influence on extended families on the visualization. We next extracted the largest connected component from the IBD network, and computed the bottom u1,, u20 eigenvectors of the normalized graph Laplacian.
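The construction of the IBD network and its random-walk Laplacian can be sketched in plain Python for a toy example. This is illustrative only: the function names and the segment tuple layout are assumptions, the largest-connected-component step is omitted, and a real analysis of this size would need sparse matrices and an eigensolver, which are not shown.

```python
from collections import defaultdict


def ibd_weights(segments, max_weight_cm=72.0):
    """Sum IBD segment lengths per pair and drop heavy edges.

    segments: iterable of (sample_a, sample_b, length_cm)
    Returns {(a, b): total_cm} with a < b, keeping totals <= max_weight_cm.
    """
    totals = defaultdict(float)
    for a, b, length_cm in segments:
        totals[tuple(sorted((a, b)))] += length_cm
    return {pair: w for pair, w in totals.items() if w <= max_weight_cm}


def random_walk_laplacian(weights, samples):
    """Dense L = I - D^{-1} W as a list of rows, for a small network."""
    index = {sample: i for i, sample in enumerate(samples)}
    n = len(samples)
    W = [[0.0] * n for _ in range(n)]
    for (a, b), w in weights.items():
        W[index[a]][index[b]] = w
        W[index[b]][index[a]] = w
    L = []
    for i in range(n):
        degree = sum(W[i])  # d_ii = sum_j w_ij
        if degree == 0:
            raise ValueError("isolated sample; remove it before building L")
        L.append([(1.0 if i == j else 0.0) - W[i][j] / degree for j in range(n)])
    return L
```

Each row of L sums to zero, which is why the constant vector is an eigenvector with eigenvalue 0 on a connected network.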
To examine fine-scale population structure using haplotype sharing, we calculated a haplotype copying matrix L using Impute5 (https://jmarchini.org/software/#impute-5) with entries Lij that are the length of sequence individual i copies from individual j. Impute5 uses a scalable imputation method that can handle very large haplotype reference panels. At its core is an efficient Hidden Markov model that can estimate the local haplotype sharing profile of a target haplotype with respect to a reference set of haplotypes. To avoid the costly computations of using all the reference haplotypes, an approach based on the PBWT data structure was used to identify a subset of reference haplotypes that led to negligible loss of accuracy. We leveraged this methodology to calculate the copying matrix L, using array haplotypes from a set of 58,329 unrelated individuals as both target and reference datasets, and used the --ohapcopy ban-repeated-sample-names flags to ban each target haplotype being able to copy itself. SVD on a scaled centred matrix was performed using the bigstatsr package (https://cran.r-project.org/web/packages/bigstatsr/index.html) to generate 20 PCs. This is equivalent to an eigen-decomposition of the variance-covariance matrix of recipients shared segment lengths.
We imputed the filtered array dataset using both the MCPS10k reference panel and the TOPMed imputation server. For TOPMed imputation, we used Plink2 to convert this dataset from Plink1.9 format genotypes to unphased VCF genotypes. For compatibility with TOPMed imputation server restrictions, we split the samples in this dataset into six randomly assigned subsets of about 23,471 samples, and into chromosome-specific bgzipped VCF files. Using the NIH Biocatalyst API (https://imputation.biodatacatalyst.nhlbi.nih.gov), we submitted these six jobs to the TOPMed imputation server. Following completion of all jobs, we used bcftools merge to join the resulting dosage VCFs spanning all samples. For the MCPS10k imputation, we used Impute5 (v.1.1.5). Each chromosome was split into chunks using the imp5Chunker program with a minimum window size of 5 Mb and a minimum buffer size of 500 kb. Information scores were calculated using qctool (https://www.well.ox.ac.uk/~gav/qctool_v2/).
The 1000 Genomes WGS genotype VCF files were downloaded (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/) and filtered to remove sites that are multi-allelic or duplicated, or that have missingness > 2%, Hardy–Weinberg \(P<1\times10^{-8}\) in any subpopulation or MAF < 0.1% in any subpopulation. We used only the 490 AMR samples in the MXL, CLM, PUR and PEL subpopulations. We constructed two subsets of genotypes on chromosome 2 from the Illumina HumanOmniExpressExome (8.v1-2) and Illumina GSA (v.2) arrays, and these were used as input to the TOPMed and MCPS10k imputation pipelines.
We measured imputation accuracy by comparing the imputed dosage genotypes to the true (masked) genotypes at variants not on the arrays. Markers were binned according to the MAF of the marker in the 490 AMR samples. In each bin, we report the squared correlation (r²) between the concatenated vector of all the true (masked) genotypes at markers and the vector of all imputed dosages at the same markers. Variants that had a missing rate of 100% in the WGS dataset before phasing were removed from the imputation assessment.
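The per-bin accuracy measure described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the function names, the bin edges and the toy genotypes are all hypothetical, and the MAF of each marker is assumed to have been computed beforehand.

```python
from collections import defaultdict


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return sxy * sxy / (sxx * syy)


def binned_r2(markers, bin_edges):
    """markers: iterable of (maf, true_genotypes, imputed_dosages).

    Genotypes and dosages of all markers falling in the same MAF bin are
    concatenated, and one r^2 is reported per bin (lo < maf <= hi).
    """
    pooled = defaultdict(lambda: ([], []))
    for maf, truth, dosage in markers:
        for lo, hi in zip(bin_edges, bin_edges[1:]):
            if lo < maf <= hi:
                pooled[(lo, hi)][0].extend(truth)
                pooled[(lo, hi)][1].extend(dosage)
                break
    return {b: pearson_r2(t, d) for b, (t, d) in pooled.items()}


# Toy data (hypothetical): two rare markers and one common marker.
markers = [
    (0.01, [0, 0, 1, 0], [0.1, 0.0, 0.9, 0.2]),
    (0.02, [0, 1, 0, 0], [0.0, 0.8, 0.1, 0.1]),
    (0.30, [0, 1, 2, 1], [0.0, 1.0, 2.0, 1.0]),
]
bin_edges = [0.0, 0.05, 0.5]  # assumed bin boundaries, for illustration only
result = binned_r2(markers, bin_edges)
```

Pooling all markers in a bin before computing the correlation, rather than averaging per-marker r² values, follows the "concatenated vector" wording of the text.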
The LAI results consist of segments of inferred ancestry across each haplotype of the phased array dataset. As the WES and WGS alleles were phased onto the phased array scaffold, we inferred the ancestry of each exome allele using interpolation from the ancestry of the flanking array sites. For each WES and WGS variant on each phased haplotype, we determined the RFMix ancestry probability estimates at the two flanking array sites and used their relative base-pair positions to linearly interpolate their ancestry probabilities. For a given site, if \(p_{ijk}\) is the probability that the jth allele of the ith individual is from population k, and \(G_{ij}\) is the 0/1 indicator of the non-reference allele for the jth allele of the ith individual, then the weighted allele count (\(\mathrm{AC}_k\)), the weighted allele number (\(\mathrm{AN}_k\)) and the allele frequency (\(\theta_k\)) of the kth population are given by
$$\mathrm{AC}_k=\sum_{i=1}^{n}\sum_{j=1}^{2}p_{ijk}G_{ij},\quad \mathrm{AN}_k=\sum_{i=1}^{n}\sum_{j=1}^{2}p_{ijk},\quad \theta_k=\frac{\mathrm{AC}_k}{\mathrm{AN}_k}$$
An estimate of the effective sample size for population k at the site is \(n_k=\mathrm{AN}_k/2\). Singleton sites can be hard to phase using existing methods. Family information and phase information in sequencing reads was used in the WGS phasing, and this helped to phase a proportion of the singleton sites. In the WES dataset, we found that 46% of exome singletons occurred in stretches of heterozygous ancestry. For these variants, we gave equal weight to the two ancestries when estimating allele frequencies.
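The interpolation step and the ancestry-weighted counts can be illustrated with a short Python sketch. The function names, the nested-list data layout and the toy values are assumptions made for this example; the sketch mirrors the formulas for the weighted allele count, weighted allele number and allele frequency rather than reproducing the authors' code.

```python
def interpolate_ancestry(pos, left_pos, left_probs, right_pos, right_probs):
    """Linearly interpolate ancestry probabilities at base-pair position pos
    from the two flanking array sites."""
    if right_pos == left_pos:
        return list(left_probs)
    w = (pos - left_pos) / (right_pos - left_pos)
    return [(1 - w) * a + w * b for a, b in zip(left_probs, right_probs)]


def ancestry_allele_stats(p, g):
    """p[i][j][k]: probability that allele j of individual i is from population k.
    g[i][j]: 0/1 indicator of the non-reference allele.
    Returns (AC, AN, theta, n_eff), each a list indexed by population k."""
    K = len(p[0][0])
    ac = [0.0] * K
    an = [0.0] * K
    for p_i, g_i in zip(p, g):
        for p_ij, g_ij in zip(p_i, g_i):
            for k in range(K):
                ac[k] += p_ij[k] * g_ij
                an[k] += p_ij[k]
    theta = [ac[k] / an[k] if an[k] > 0 else float("nan") for k in range(K)]
    n_eff = [a / 2 for a in an]  # effective sample size n_k = AN_k / 2
    return ac, an, theta, n_eff


# A variant halfway between two array sites with opposite ancestry calls.
mid = interpolate_ancestry(150, 100, [1.0, 0.0], 200, [0.0, 1.0])

# Two individuals, two alleles each, two populations (toy values).
p = [[[1.0, 0.0], [0.5, 0.5]],
     [[0.0, 1.0], [0.0, 1.0]]]
g = [[1, 1],
     [0, 1]]
ac, an, theta, n_eff = ancestry_allele_stats(p, g)
```

In this toy case the first population has weighted count 1.5 out of a weighted allele number of 1.5, and the second has 1.5 out of 2.5, so the two ancestry-specific frequencies differ even though they are computed from the same four alleles.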
To validate the MCPS allele frequencies, we downloaded the gnomAD v.3.1 reference dataset (https://gnomad.broadinstitute.org) and retained only high-quality variants annotated as passed QC (FILTER=PASS), SNVs, outside low-complexity regions and with the number of called samples greater than 50% of the total sample size (n = 76,156). We additionally overlapped gnomAD variants with TOPMed Freeze 8 high-quality variants (FILTER=PASS) (https://bravo.sph.umich.edu/freeze8/hg38). We further merged gnomAD variants and MCPS exome variants by the chromosome, position, reference allele and alternative allele names and excluded MCPS singletons that were heterozygous in ancestry. This process resulted in 2,249,986 overlapping variants available for comparison with the MCPS WES data. Median sample sizes in gnomAD non-Finnish European, African/Admixed African and Admixed American populations were 34,014, 20,719 and 7,639, respectively.
To investigate the effect of relatedness on allele frequency estimates, we implemented a method to compute relatedness-corrected allele frequencies using identical-by-descent (IBD) segments. This method computes allele frequencies at a locus by clustering alleles inherited IBD from a common ancestor, then counting alleles once per common ancestor rather than once per sample. Because IBD sharing is affected by both demography and relatedness, we limited IBD sharing to segments between third-degree relatives or closer. Conceptually, this is equivalent to tracing the genealogy of a locus back in time across all samples until no third-degree relatives remain, then computing allele frequencies in the ancestral sample.
We estimated allele frequencies in two steps. First, we constructed a graph based on IBD sharing at a locus. Second, we estimated allele counts and allele numbers by counting the connected components of the IBD graph. Our approach is similar to the DASH haplotype clustering approach (ref. 40). However, we make different assumptions about how errors affect the IBD graph and additionally compute ancestry-specific frequencies using local ancestry inference estimates.
To construct the IBD graph, suppose we have genotyped and phased N diploid samples at L biallelic loci. For each locus l we construct an undirected graph \(G_l=(V_l,E_l)\) describing IBD sharing among haplotypes. Let the tuple \((i,j)_l\) represent haplotype j of sample i at locus l, and let \(h^{(i,j)_l}\in\{0,1\}\) be the allele itself. Define
$$\begin{array}{l}V_l=\{(i,j)_l:\ \mathrm{for}\ 1\le j\le 2\ \mathrm{and}\ 1\le i\le N\}\\ E_l=\{((i,j)_l,(s,t)_l):\ h^{(i,j)_l}\ \mathrm{and}\ h^{(s,t)_l}\ \mathrm{are\ IBD}\}.\end{array}$$
In words, the vertex set \(V_l\) comprises all haplotypes at locus l. Each edge in \(E_l\) joins a pair of haplotypes that fall on the same IBD segment (Supplementary Fig. 25).
If IBD segments are observed without error, then each maximal clique of \(G_l\) represents a set of haplotypes descended from a common ancestor. In practice, edges will be missing owing to errors in IBD calling. Thus, what we observe are sets of connected components rather than maximal cliques. Because we limited edges to pairs of third-degree relatives or closer, we assumed that missing edges in connected components are false negatives and included them. We additionally removed edges between haplotypes for which the observed alleles conflicted.
Given an IBD graph \(G_l=(V_l,E_l)\) for a locus l, we estimated alternative allele counts and allele numbers by counting the connected components of the graph. Let \(C_{l1},\ldots,C_{lm}\) be the connected components of \(G_l\). Let \(C_{\mathrm{ALT}}=\{C_{lm}:\ \text{haplotypes in}\ C_{lm}\ \text{have the ALT allele}\}\) and \(C_{\mathrm{REF}}=\{C_{lm}:\ \text{haplotypes in}\ C_{lm}\ \text{have the REF allele}\}\).
Then
$$\begin{array}{l}AC=|C_{\mathrm{ALT}}|\\ AN=|C_{\mathrm{ALT}}|+|C_{\mathrm{REF}}|\\ AF=AC\,/\,AN\end{array}$$
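The component-counting estimator can be sketched with a small union-find structure. This is an illustrative sketch under assumptions: haplotypes are keyed by hypothetical (sample, haplotype) tuples, the IBD edges at the locus are given as a list of pairs, and edges between haplotypes with conflicting alleles are dropped as described in the text.

```python
def allele_frequency_from_ibd(alleles, ibd_edges):
    """alleles: dict mapping haplotype id -> 0 (REF) or 1 (ALT).
    ibd_edges: iterable of (hap_a, hap_b) pairs that share an IBD segment.
    Returns (AC, AN, AF), counting each connected component once."""
    parent = {h: h for h in alleles}

    def find(h):
        # Find the component representative, with path halving.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    for a, b in ibd_edges:
        if alleles[a] != alleles[b]:
            continue  # remove edges whose observed alleles conflict
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Every component is allele-homogeneous, so its root carries its allele.
    roots = {find(h) for h in alleles}
    ac = sum(1 for r in roots if alleles[r] == 1)
    an = len(roots)
    return ac, an, ac / an


# Three related samples; all ALT haplotypes descend from one ancestor.
alleles = {
    ("s1", 1): 1, ("s1", 2): 0,
    ("s2", 1): 1, ("s2", 2): 0,
    ("s3", 1): 1, ("s3", 2): 0,
}
ibd_edges = [
    (("s1", 1), ("s2", 1)),
    (("s2", 1), ("s3", 1)),
    (("s1", 2), ("s2", 2)),
    (("s2", 2), ("s3", 1)),  # conflicting alleles, so this edge is ignored
]
ac, an, af = allele_frequency_from_ibd(alleles, ibd_edges)
naive_af = sum(alleles.values()) / len(alleles)
```

Here the naive per-sample frequency is 3/6, whereas counting once per inferred common ancestor gives one ALT component out of three components.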
We additionally used LAI estimates to compute ancestry-specific frequencies. Let \(p^{(i,j)_l}\in\mathbb{R}^{K}\) be the vector of probabilities that an allele on haplotype j from sample i at locus l comes from one of K populations. For each connected component, we averaged local ancestry estimates
$$\bar{p}_{C_{lm}}=\frac{1}{|C_{lm}|}\sum_{(i,j)_l\in C_{lm}}p^{(i,j)_l}$$
We computed a vector of weighted allele counts W and allele numbers N by
$$\begin{array}{l}W=\sum_{C\in C_{\mathrm{ALT}}}\bar{p}_{C}\\ N=\sum_{C\in C_{\mathrm{ALT}}}\bar{p}_{C}+\sum_{C\in C_{\mathrm{REF}}}\bar{p}_{C}\end{array}$$
Ancestry-specific frequencies were estimated by dividing each component of W by the corresponding component of N.
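A minimal Python sketch of this last step follows. It assumes the connected components have already been found and are supplied as hypothetical (allele, list of ancestry probability vectors) pairs; the function name and toy values are illustrative only.

```python
def ancestry_specific_frequencies(components):
    """components: list of (allele, probs), where allele is 0 (REF) or 1 (ALT)
    and probs holds one length-K ancestry probability vector per haplotype
    in the component. Returns (W, N, freqs), each a list of length K."""
    K = len(components[0][1][0])
    W = [0.0] * K
    N = [0.0] * K
    for allele, probs in components:
        # Average the local ancestry estimates within the component.
        mean_p = [sum(p[k] for p in probs) / len(probs) for k in range(K)]
        for k in range(K):
            N[k] += mean_p[k]
            if allele == 1:
                W[k] += mean_p[k]
    freqs = [W[k] / N[k] if N[k] > 0 else float("nan") for k in range(K)]
    return W, N, freqs


# One ALT component of three haplotypes and two REF components (toy values).
components = [
    (1, [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]]),
    (0, [[0.0, 1.0], [0.2, 0.8]]),
    (0, [[0.5, 0.5]]),
]
W, N, freqs = ancestry_specific_frequencies(components)
```

Each component contributes a total weight of one, split across populations according to its averaged ancestry, so the components of N sum to the number of connected components.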
For singletons for which the phasing of haplotypes was unknown, we averaged local ancestry estimates from haplotypes in the sample.
To generate source datasets for assessing trans-ancestry portability of BMI PRS, whole genome regression was performed using Regenie (https://rgcgithub.github.io/regenie/) in individuals in the MCPS and in a predominantly European-ancestry cohort from the UK Biobank. Individuals with type 2 diabetes (ICD-10 code E11 or self-reported) were excluded. BMI values underwent rank-based inverse normal transformation (RINT) by sex and ancestry; models were additionally adjusted for age, age² and technical covariates (UK Biobank). The Regenie summary statistics from the UK Biobank were used to generate a BMI PRS in MCPS; conversely, MCPS summary statistics were applied to UK Biobank statistics.
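The rank-based inverse normal transformation applied within sex and ancestry groups can be sketched as follows. The text does not give the exact rank offset, so the Blom offset c = 3/8 used here is an assumption, as are the function names and toy values; ties receive their average rank.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map values to normal scores via z = Phi^{-1}((rank - c) / (n - 2c + 1)).
    The offset c = 3/8 (Blom) is an assumed choice."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for a run of ties
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def rint_by_group(values, groups):
    """Apply the transformation separately within each group label,
    for example a (sex, ancestry) tuple."""
    out = [0.0] * len(values)
    idx_by_group = {}
    for i, key in enumerate(groups):
        idx_by_group.setdefault(key, []).append(i)
    for idx in idx_by_group.values():
        z = rank_inverse_normal([values[i] for i in idx])
        for i, zi in zip(idx, z):
            out[i] = zi
    return out


bmi = [22.1, 31.4, 27.0, 24.5, 35.2]  # toy BMI values
z = rank_inverse_normal(bmi)
z_grouped = rint_by_group(bmi + bmi, ["F"] * 5 + ["M"] * 5)
```

Transforming within groups removes between-group differences in the location and scale of BMI before the regression is fitted.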
To avoid overfitting with respect to selection of a PRS algorithm and its associated tuning parameters, LDpred (https://github.com/bvilhjal/ldpred) with a tuning-parameter value of 1 was chosen from a recent publication of BMI and obesity (ref. 27). Summary statistics were restricted to HapMap3 variants and followed existing filtering recommendations. In the MCPS, two PRS values were generated; imputed variants were obtained from the MCPS10k reference panel or the TOPMed panel. In the UK Biobank data, PRS values were calculated separately by continental ancestry (African, East Asian, European, Latino, South Asian), determined from a likelihood-based inference approach (ref. 8) in a merged dataset of variants from UK Biobank and the 1000 Genomes project.
To evaluate PRS performance, BMI values were transformed (RINT) by sex and ancestry and regressed on PRS, age and age². As for the generation of summary statistics, individuals with diabetes were excluded from the analysis. PRS accuracy was assessed by incremental R² (proportional reduction in regression sum of squares error between models with and without BMI PRS). Additionally, raw BMI values with PRS, age, age², sex and ancestry were modelled to obtain per BMI PRS standard deviation effect-size estimates. The impact of ancestry differences on source summary statistics compared to target PRS was assessed with two approaches. For the MCPS, individuals were divided into quantiles by estimated Indigenous Mexican Ancestry using the LAI approach described above. For the UK Biobank, metrics were calculated within each 1000 Genomes-based continental ancestry.
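The incremental R² defined above, the proportional reduction in residual sum of squares when the PRS is added to a covariate-only model, can be sketched with a small least-squares routine. This is an illustrative sketch only: the function names and toy data are hypothetical, age is expressed in decades to keep the toy design well conditioned, and a real analysis would use a statistics package rather than hand-written normal equations.

```python
def ols_sse(X, y):
    """Fit y on the columns of X plus an intercept by least squares
    (normal equations solved by Gaussian elimination); return the
    residual sum of squares."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented normal-equation system [X'X | X'y].
    M = [[sum(r[a] * r[b] for r in rows) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]  # partial pivoting
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for k in range(col, p + 1):
                M[r][k] -= f * M[col][k]
    beta = [0.0] * p
    for a in reversed(range(p)):
        s = sum(M[a][k] * beta[k] for k in range(a + 1, p))
        beta[a] = (M[a][p] - s) / M[a][a]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def incremental_r2(y, prs, covariates):
    """(SSE without PRS - SSE with PRS) / SSE without PRS."""
    sse_base = ols_sse(covariates, y)
    sse_full = ols_sse([list(c) + [s] for c, s in zip(covariates, prs)], y)
    return (sse_base - sse_full) / sse_base


# Toy data (hypothetical): age in decades, a PRS, and small noise.
age = [3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
prs = [0.2, -1.1, 0.7, 1.5, -0.4, -0.9, 0.3, 1.0]
noise = [0.05, -0.02, 0.01, -0.04, 0.03, 0.02, -0.05, 0.0]
covariates = [[a, a * a] for a in age]
y = [0.1 * a + 0.5 * s + e for a, s, e in zip(age, prs, noise)]
inc = incremental_r2(y, prs, covariates)
```

Because the two models are nested, the residual sum of squares cannot increase when the PRS is added, so the measure lies between 0 and 1.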
The MCPS represents a long-standing scientific collaboration between researchers at the National Autonomous University of Mexico and the University of Oxford, who jointly established the study in the mid-1990s and have worked together on it ever since. Blood sample collection and processing were funded by a Wellcome Trust grant to the Mexican and Oxford investigators. However, at the time, no funding was requested to create an appropriate long-term sample storage facility in Mexico City. Therefore, the Mexican investigators agreed for the samples to be shipped to Oxford where they could be stored in a liquid-nitrogen sample storage archive (funded by the UK Medical Research Council and Cancer Research UK) that had previously been established by the Oxford team, and only on the understanding that control of the samples remained with the Mexican investigators. The shipping of blood samples from Mexico to the United Kingdom was approved by the Mexican Ministry of Health, and the study was approved by scientific and ethics committees within the Mexican National Council of Science and Technology (0595 P-M), the Mexican Ministry of Health and the Central Oxford Research Ethics Committee (C99.260). Although appropriate facilities in Mexico City now exist to store the samples, the Mexican investigators have decided that the costs of sending them back to Mexico exceed the benefits of having closer access to them. Study participants gave signed consent in keeping with accepted ethical practices at the time for observational cohort studies. The baseline consent form stated that their blood samples would be stored and used in the future for unspecified research purposes (with a specific statement that this would include future analysis of genetic factors) and that it would probably be many years before such blood analyses were done. 
The MCPS consent form also stated that the research was being done in collaboration with the University of Oxford and that the purpose of the study was to benefit future generations of Mexican adults. In 2019, the Mexican and Oxford investigators jointly agreed to allow the extracted DNA to be sent to the Regeneron Genetics Center after they had offered to genotype and exome sequence the entire cohort (thereby creating the resource now available for future research by Mexican scientists; see the Data Availability section) in exchange for sharing the other data with them for the purpose of performing joint collaborative genetic analyses. Formal approval to share MCPS data with commercial institutions was sought and obtained from the Medical Ethics Committee of the National Autonomous University of Mexico (FMED/CEI/MHU/001/2020). Major discoveries from the study have been disseminated through open-access scientific publications, local and international scientific meetings, press releases, social media and local television, but direct communication of study results to the original study participants is unfortunately not practical as no information on telephone numbers or email addresses was collected at recruitment. As in other prospective cohort studies (such as the UK Biobank), it was agreed that there would be no feedback of individual blood results to participants, as it has been shown that such feedback can do more harm than good (whereas no feedback ensures that that is not the case).
Recruitment of individuals in the MAIS cohort was done with approval of the leaders of the Indigenous communities and with the support of the National Commission for the Development of Indigenous Communities of Mexico (CDI), now the Instituto Nacional de los Pueblos Indígenas (INPI). All participants provided written informed consent, and authorities or community leaders participated as translators where necessary. The consent form described how findings from the study may have commercial value and be used by for-profit companies. Sample collection for MAIS was approved by the Bioethics and Research Committees of the Instituto Nacional de Medicina Genómica in Mexico City (protocol numbers 31/2011/I and 12/2018/I). Preliminary data from the MAIS cohort have been discussed with the Indigenous leaders and volunteer individuals included in the study, explaining the meaning of the findings on health or population history, and the potential use of the data in future collaborations.
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Link:
Genotyping, sequencing and analysis of 140,000 adults from Mexico ... - Nature.com
Posted in Human Genetics
The role and impact of alternative polyadenylation and miRNA … – Nature.com
Posted: at 6:42 am
More:
The role and impact of alternative polyadenylation and miRNA ... - Nature.com
Posted in Human Genetics
Human – Simple English Wikipedia, the free encyclopedia
Posted: January 30, 2023 at 2:25 am
A human is a member of the species Homo sapiens, which means 'wise man' in Latin.[3] Carolus Linnaeus put humans in the mammalian order of primates.[1] Humans are a species of hominid, and chimpanzees, bonobos, gorillas and orangutans are their closest living relatives.
Humans are mammals. They are also social animals. They usually live in groups. They help and protect each other. They care for their children. Humans are bipedal, which means they walk on two legs.
Humans have a complex brain, which is much larger than that of the other living apes. They use language, make ideas, and feel emotions. This brain, and the fact that arms are not needed for walking, lets humans use tools. Humans use tools far more than any other species.
Humans first came from Africa. There are humans living on every continent.[4][5] As of 2022, there were over 7900 million people living on Earth.[6] Overpopulation is a problem.
Humans have a long period of development after birth. Their life depends less on instinct than that of other animals, and more on learning. Humans are also born with their brains not so well developed as those of other mammals. This makes for an unusually long childhood, and so makes family life important. If their brains were better developed at birth, their heads would be larger, and this would make birth more difficult. In birth, the baby's head has to get through the 'birth canal', the passageway through the mother's pelvis.
Many animals use signs and sounds to communicate with each other. But humans have language. It lets them express ideas by using words. Humans are capable of making abstract ideas and communicating them to others. Human language can express things which are not present, or talk about events that are not happening at that time.[7] The things might be elsewhere, and the events may also have occurred at another place or time.[8]
No known animals have a system of communication that is as elaborate as human language. By using words to communicate with each other, humans make complex communities with laws, traditions and customs. Humans like to understand the world around them. They try to explain things through myth, science and philosophy. Wanting to understand things has helped humans make important discoveries.
Humans are the only species living today known to build fires, to cook their food and wear clothes. Humans use more technology than any other animal on Earth ever has. Humans like things that are beautiful and like to make art, literature and music. Humans use education and teaching to pass on skills, ideas and customs to the next generations.
Humans are part of the animal kingdom. They are mammals, which means that they give birth to their young, and females feed their babies with breast milk. Humans belong to the order of primates. Apes like gorillas, orangutans, chimps, and gibbons are also primates.
The closest living relatives of humans are the two chimpanzee species: the common chimpanzee and the bonobo. Scientists have examined the genes of humans and chimpanzees, and compared their DNA. The studies showed that 95% to 99% of the DNA of humans and chimpanzees is the same.[9][10][11][12][13]
Biologists explain the similarity between humans and other hominoids by their descent from a common ancestor. In 2001, a hominid skull was discovered in Chad. The skull is about 7 million years old, and has been classified as Sahelanthropus tchadensis. This skull may show that the date at which humans started to evolve (develop differently) from other primates is 2 million years earlier than scientists had previously thought.[14]
Humans are part of a subfamily called the Homininae (or hominins), inside the hominids or great apes.
Long ago, there used to be other types of hominins on Earth. They were like modern humans, but not the same. Homo sapiens are the only type of hominins who are alive today.[15] The earliest known fossils of genus Homo have been called Homo habilis (handy man). The first fossils of Homo habilis were found in Tanzania. Homo habilis is thought to have lived about 2.2 to 1.7 million years ago.[16] Another human species thought to be an ancestor of the modern human is Homo erectus.[17] There are other extinct species of Homo known today. Many of them were likely our 'cousins', as they developed differently than our ancestors.[18] Different species of plants and animals moved from Africa to the Middle East, and then elsewhere. Early humans may have moved from Africa to other parts of the world in the same way.
The first truly modern humans seem to have appeared between 300,000[19] and 200,000 years ago in East Africa.[20][21][22] In paleontology, 200,000 years is a "short" time. So, scientists speak of a "recent single origin" of humans. Some of these early humans later moved out from Africa. By about 90,000 years ago they had moved into Eurasia. This was the area where Neanderthals, Homo neanderthalensis, had been living for a long time (at least 350,000 years).
By about 42,000 to 44,000 years ago Homo sapiens had reached western Europe, including Britain.[23] In Europe and western Asia, Homo sapiens replaced the Neanderthals by about 35,000 years ago. The details of this event are not known.
At roughly the same time Homo sapiens arrived in Australia. Their arrival in the Americas was much later, about 15,000 years ago.[24] All these earlier groups of modern man were hunter-gatherers.
Early human history is commonly divided into three ages. The time periods are labeled with the material used for tools: the Stone Age, the Bronze Age and the Iron Age.
The "Stone Age" is commonly subdivided into the Paleolithic, Mesolithic, and Neolithic periods.
Up to about 10,000 years ago most humans were hunter-gatherers. They did not live in one place, but moved around as the seasons changed. The start of planting crops for food, called farming, began the Neolithic revolution. Some people chose to live in settlements. This also led to the invention of metal tools and the training of animals. About 6000 years ago the first proper civilizations began in places like Egypt, India, and Syria. The people formed governments and armies for protection. They competed for living space and resources, and sometimes they fought with each other. About 4000 years ago some states took over or conquered other states and made empires. Examples include ancient Greece and the Roman Empire.
Some modern day religions also began at this time such as Judaism and Hinduism. From the Middle Ages and beyond humanity saw an explosion of new technology and inventions. The printing press, the car, the train, and electricity are all examples of this kind of invention. As a result of the developments in technology, modern humans live in a world where everyone is connected, for example by telephone or by internet. People now control and change the environment around them in many different ways.
In early times, humans usually settled near to water and other natural resources. In modern times if people need things they can transport them from somewhere else. So basing a settlement close to resources is no longer as important as it once was. Since 1800, the number of humans, or population, has increased by six billion.[25] Most humans (61%) live in Asia. The rest live in the Americas (14%), Africa (14%), Europe (11%), and Oceania (0.5%).
Most people live in towns and cities. This number is expected to get higher. In 2005 the United Nations said that by the end of that year, over half the world would be living in cities. This is an important change in human settlement patterns: in 1900 only 14% of people lived in cities, while in 2000 47% of the world's population did. In developed countries, like the United States, 80% of the population live in cities.[26]
Humans have a large effect on the world. Humans are at the top of the food chain and are generally not eaten by any animals. Humans have been described as super predators because of this.[27] Because of industry and other reasons humans are said to be a big cause of global climate change.[28]
Human body measurements differ. The worldwide average height for an adult human male is about 172 cm (5 ft 7½ in), and the worldwide average height for adult human females is about 158 cm (5 ft 2 in). The average weight of an adult human is 54 to 64 kg (119 to 141 lb) for females and 70 to 83 kg (154 to 183 lb) for males.[29][30] Body weight and body type are influenced by genetics and environment. They vary greatly among individuals.
Human hair grows on the underarms, the genitals, legs, arms, and on the top of the head in adults of both genders. Hair will usually grow on the face of most adult males, and on the chest and back of many adult males. In human children of both genders, long hair grows only on the top of the head. Although it might look like humans have fewer hairs than most primates, they actually do not. The average human has more hair follicles, where hair grows from, than most chimpanzees have.[31] Human hair can be black, brown, red or blond.[32] When humans get older hair can turn grey or white.
Human skin colors vary greatly. They can be a very pale pink all the way to dark brown. There is a reason why people in tropical areas have dark skins. The dark pigment (melanin) in the skin protects them against ultraviolet rays in sunlight. The damage caused by UV rays can and does cause skin cancer in some people. Therefore, in more sunny areas, natural selection favors darker skin color.[33][34] Sun tanning has nothing to do with this issue, because it is just a temporary process which is not inherited. In colder climates the advantage of light-coloured skin is two-fold. It radiates less heat, and it absorbs more sunlight. In weaker sunlight a darker body produces less vitamin D than a lighter body. The selection for lighter skin is driven by these two reasons. Therefore, in less sunny areas, natural selection favours lighter skin colour.[35][36][37]
Humans are not as strong as other primates of the same size. An average female orangutan is at least three times as strong as an average human.[38]
The average human male needs 7 to 8 hours sleep a day. People who sleep less than this are generally not as healthy. A child needs more sleep, 9 to 10 hours on average.
The human life cycle is similar in some ways to most other mammals. However, there are some differences. The young grow inside the female mother for nine months. After this time the baby is pushed out of the woman's vagina, with its brain only half developed.
Unlike most other mammals, human childbirth is somewhat dangerous. Babies' heads are large, and the mother's pelvic bones are not very wide. Since people walk on two legs, their hips are fairly narrow. This means that birth can be difficult. Rarely, mother or baby may die in childbirth.[39] The number of mothers dying in childbirth is less in the 21st century. This is because of better medication and treatment. In many poor countries the number of mothers dying is higher. Sometimes it is up to 10 times as many as richer countries.[40]
In the human female, her fertile period in the oestrous cycle is hidden, and mating can take place at any time. That is quite unusual. In mammals generally the fertile period is very noticeable. Mating only takes place when the female signals her fertility. Think about cats, for example. The human cycle is unusual, and it is thought that there is a reason. Humans band together in tribes which have many people. It helps the tribe if the father of a child is not known with certainty. Men live together and work together in much larger groups than do chimpanzees (our nearest living relatives). They have a collective interest in the tribe. It is thought that the human mating system helps this.[41][42]
The average human baby weighs 3 to 4 kg at birth and is 50 to 60 cm tall. Babies are often smaller in poorer countries,[43] and may die early because of this.[44]
Humans have four stages in their lives: childhood, adolescence, adulthood and old age.
Life expectancy is how long you are expected to live. This depends on many things including where you live. The highest life expectancy is for people from Monaco, 89.52 years. The lowest is for people from Chad where life expectancy is only 49.81 years.[45]
Psychology is the study of how the human mind works. The human brain is the main controller of what a person does. Everything from moving and breathing to thinking is done by the brain. The human neocortex is huge compared with other mammals, and gives us our thinking ability, and the ability to speak and understand language.
Neurology is the study of how the brain works; psychology is the study of how and why people think and feel. Many aspects of life are also influenced by the hormone system, including growth and sexual development. The hormonal system (especially the pituitary gland) is partly controlled by the brain.
Human behaviour is hard to understand, so sometimes psychologists study animals because they may be simpler and easier to know. Psychology overlaps with many other sciences including medicine, biology, computer science and linguistics.
Language at its most basic is talking, reading and writing. The study of language is called linguistics. Humans have the most complicated languages on Earth. Although almost all animals communicate, human language is unique. Its use of syntax, and its huge learnt vocabulary are its main features.[8][46] There are over 7,300 languages spoken around the world. The world's most spoken first language is Mandarin Chinese, and the most spoken language is English.[47] This includes speakers of English as a second language.
Art has existed almost as long as humans. People have been doing some types of art for thousands of years as the picture on the right shows. Art represents how someone feels in the form of a painting, a sculpture or a photograph.
Music has also been around for thousands of years. Music can be made with only your voice but most of the time people use instruments. Music can be made using simple instruments only such as simple drums all the way up to electric guitars, keyboards and violins. Music can be loud, fast, quiet, slow or many different styles. Music represents how the people who are playing the music feel.
Literature is anything made or written using language. This includes books, poetry, legends, myths and fairy tales. Literature is important as without it many of the things we use today, such as Wikipedia, would not exist.
Humans often categorize themselves by race or ethnicity. Modern biologists know that human gene sequences are very similar compared to many other animals.[48][49][50] This is because of the "recent single origin" of modern humans.[22] That is one reason why there is only one human race.[51][52]:360
Ethnic groups are often linked by linguistic, cultural, ancestral, and national or regional ties. Race and ethnicity can lead to different social treatment called racism.
Religion is a belief of faith in a higher being, spirit, or any system of ideas that a group of people believe in. To have faith in a belief is to have the belief without proof that it is true. Faith can bring people together because they all believe in the same thing. Some of the things religions talk about are what happens after death, why humans exist, how humans came to exist (creation), and what is good to do and not to do (morality). Some people are very religious. Many people believe in one all-powerful god; some people believe in more than one god; some people are atheists, who do not believe in a god; and some people are agnostics, who are not sure if there is a god.
Technology is the set of things and methods which humans use to make tasks easier. Science is understanding how the universe and the things in it work. Technology used to be quite simple. It was passed on by people telling others, until writing was invented. This allowed technology to develop much quicker. Now people understand more and more about the world and the universe. The use of the telescope by Galileo, Einstein's theory of relativity, lasers, and computing are all scientific discoveries. Technology is of great importance to science, to medicine, and to everyday life.
A war is a lethal fight between large groups of people, usually countries or states. A war involves the use of lethal weapons as both sides try to kill the other. It is estimated that during the 20th century, between 167 and 188 million humans died because of war.[53] The people who fight for a state in wars are called soldiers. The people who fight in wars, but not for a state, are usually called "fighters".
Modern wars are very different from wars a thousand or even a hundred years ago. Modern war involves sabotage, terrorism, propaganda, and guerrilla warfare. In modern-day wars, civilians (people who are not soldiers) are often targets. An example of this is the nuclear bombs dropped on Hiroshima and Nagasaki at the end of World War II. The bombs killed as many as 140,000 people in Hiroshima and 80,000 in Nagasaki by the end of 1945,[54] about half on the days of the bombings. Since then, thousands more have died from wounds or illness because of exposure to radiation released by the bombs.[55] In both cities, the overwhelming majority of the dead were civilians. In Germany, Austria, and Great Britain, conventional bombs were used. About 60,595 British,[56] and 550,000 German,[57] civilians were killed by planes bombing cities.
Read more:
Human - Simple English Wikipedia, the free encyclopedia
Posted in Human Genetics
Deep Dive Ties Together Dog Genetics, Brain Physiology and Behavior to Explain Why Collies Are Different from Terriers – Scientific American
Posted: December 12, 2022 at 4:27 am
Posted in Human Genetics
How oxytocin drives connections of newly integrated adult-born neurons: Research – Hindustan Times
Posted: at 4:27 am
Read this article:
How oxytocin drives connections of newly integrated adult-born neurons: Research - Hindustan Times
Posted in Human Genetics
Alzheimer’s Disease Genetics Fact Sheet – National Institute on Aging
Posted: December 2, 2022 at 3:52 am
Many people wonder if Alzheimer's disease runs in the family. A person's chance of having the disease may be higher if he or she has certain genes passed down from a parent. However, having a parent with Alzheimer's does not always mean that someone will develop it.
Each human cell contains the instructions a cell needs to do its job. These instructions are made up of DNA (deoxyribonucleic acid), which is packed tightly into structures called chromosomes. Each chromosome has thousands of segments called genes.
Genes are passed down from a person's biological parents. They carry information that defines traits such as eye color and height. Genes also play a role in keeping the body's cells healthy.
Problems with genes, even small changes to a gene, can cause diseases like Alzheimer's.
Genetic mutations (permanent change in one or more specific genes) can cause diseases. If a person inherits a genetic mutation that causes a certain disease, then he or she will usually get the disease. Sickle cell anemia, cystic fibrosis, and some cases of early-onset Alzheimer's disease are examples of inherited genetic disorders.
Other changes or differences in genes, called genetic variants, may increase or decrease a person's risk of developing a particular disease. When a genetic variant increases disease risk but does not directly cause a disease, it is called a genetic risk factor.
Identifying genetic variants may help researchers find the most effective ways to treat or prevent diseases such as Alzheimer's in an individual. This approach, called precision medicine, takes into account individual variability in genes, environment, and lifestyle for each person.
The expression of genes (when they are switched on or off) can be affected, positively and negatively, by environmental and lifestyle factors, such as exercise, diet, chemicals, or smoking. The field of epigenetics is studying how such factors can alter a cell's DNA in ways that affect gene activity.
There are two types of Alzheimer's: early-onset and late-onset. Both types have a genetic component.
Most people with Alzheimer's have the late-onset form of the disease, in which symptoms become apparent in their mid-60s and later.
Researchers have not found a specific gene that directly causes late-onset Alzheimer's disease. However, having a genetic variant of the apolipoprotein E (APOE) gene on chromosome 19 does increase a person's risk. The APOE gene is involved in making a protein that helps carry cholesterol and other types of fat in the bloodstream.
APOE comes in several different forms, or alleles. Each person inherits two APOE alleles, one from each biological parent.
APOE ε4 is called a risk-factor gene because it increases a person's risk of developing the disease. However, inheriting an APOE ε4 allele does not mean that a person will definitely develop Alzheimer's. Some people with an APOE ε4 allele never get the disease, and others who develop Alzheimer's do not have any APOE ε4 alleles.
Recent research indicates that rare forms of the APOE allele may provide protection against Alzheimer's disease. More studies are needed to determine how these variations might delay disease onset or lower a person's risk.
Early-onset Alzheimer's disease is rare, representing less than 10 percent of all people with Alzheimer's. It typically occurs between a person's 30s and mid-60s. Some cases are caused by an inherited change in one of three genes.
The three single-gene mutations associated with early-onset Alzheimer's disease are:
Mutations in these genes result in the production of abnormal proteins that are associated with the disease. Each of these mutations plays a role in the breakdown of APP, a protein whose precise function is not yet fully understood. This breakdown is part of a process that generates harmful forms of amyloid plaques, a hallmark of Alzheimer's disease.
A child whose biological mother or father carries a genetic mutation for one of these three genes has a 50/50 chance of inheriting that mutation. If the mutation is in fact inherited, the child has a very strong probability of developing early-onset Alzheimer's disease.
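The 50/50 figure follows from simple inheritance arithmetic: a parent who carries one mutated and one typical copy of a gene passes on exactly one of the two, chosen at random. The short simulation below is only an illustrative sketch under that assumption (one carrier parent with a single mutated copy); the function and variable names are hypothetical and do not come from the fact sheet.

```python
import random

def child_inherits(rng):
    """Return True if the child receives the mutated copy from a carrier parent."""
    # Assumption: the carrier parent has one mutated and one typical copy
    # of the gene and passes exactly one of them, chosen at random.
    parent_copies = ("mutated", "typical")
    return rng.choice(parent_copies) == "mutated"

rng = random.Random(42)  # fixed seed so the run is repeatable
n_children = 100_000
inherited = sum(child_inherits(rng) for _ in range(n_children))
fraction = inherited / n_children
print(f"Fraction inheriting the mutation: {fraction:.3f}")
```

Over many simulated children the fraction settles close to one half, which is what "a 50/50 chance" means for any single child.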
For other cases of early-onset Alzheimer's, research has shown that other genetic components are involved. Studies are ongoing to identify additional genetic risk variants.
Having Down syndrome increases the risk of developing early-onset Alzheimer's disease. Many people with Down syndrome develop Alzheimer's as they get older, with symptoms appearing in their 50s or 60s. Researchers believe this is because people with Down syndrome are born with an extra copy of chromosome 21, which carries the APP gene.
For more information, see NIA's Early-Onset Alzheimer's Disease: A Resource List.
A blood test can identify which APOE alleles a person has, but results cannot predict who will or will not develop Alzheimer's disease. Currently, APOE testing is used primarily in research settings to identify study participants who may have an increased risk of developing Alzheimer's. This knowledge helps scientists look for early brain changes in participants and compare the effectiveness of possible treatments for people with different APOE profiles.
Genetic testing is also used by physicians to help diagnose early-onset Alzheimer's disease and to test people with a strong family history of Alzheimer's or a related brain disease.
Genetic testing for APOE or other genetic variants cannot determine an individual's likelihood of developing Alzheimer's disease, only which risk factor genes a person has. It is unlikely that genetic testing will ever be able to predict the disease with 100 percent accuracy, researchers believe, because too many other factors may influence its development and progression.
Some people learn their APOE status through consumer genetic testing or think about getting this kind of test. They may wish to consult a doctor or genetic counselor to better understand this type of test and their test results. General information about genetic testing can be found at:
Discovering all that we can about the role of Alzheimer's disease genetic risk and protective factors is an important area of research. NIA supports several major genetics research programs. Understanding more about the genetic basis of the disease will help researchers to:
NIA Alzheimer's and related Dementias Education and Referral (ADEAR) Center
800-438-4380
adear@nia.nih.gov
www.nia.nih.gov/alzheimers
The NIA ADEAR Center offers information and free print publications about Alzheimer's and related dementias for families, caregivers, and health professionals. ADEAR Center staff answer telephone, email, and written requests and make referrals to local and national resources.
This content is provided by the NIH National Institute on Aging (NIA). NIA scientists and other experts review this content to ensure it is accurate and up to date.
Content reviewed: December 24, 2019
See the original post here:
Alzheimer's Disease Genetics Fact Sheet - National Institute on Aging
Posted in Human Genetics
Human genetic clustering – Wikipedia
Posted: November 23, 2022 at 4:44 am
Human genetic clustering refers to patterns of relative genetic similarity among human individuals and populations, as well as the wide range of scientific and statistical methods used to study this aspect of human genetic variation.
Clustering studies are thought to be valuable for characterizing the general structure of genetic variation among human populations, to contribute to the study of ancestral origins, evolutionary history, and precision medicine. Since the mapping of the human genome, and with the availability of increasingly powerful analytic tools, cluster analyses have revealed a range of ancestral and migratory trends among human populations and individuals.[1] Human genetic clusters tend to be organized by geographic ancestry, with divisions between clusters aligning largely with geographic barriers such as oceans or mountain ranges.[2][3] Clustering studies have been applied to global populations,[4] as well as to population subsets like post-colonial North America.[5][6] Notably, the practice of defining clusters among modern human populations is largely arbitrary and variable due to the continuous nature of human genotypes; although individual genetic markers can be used to produce smaller groups, there are no models that produce completely distinct subgroups when larger numbers of genetic markers are used.[2][7][8]
Many studies of human genetic clustering have been implicated in discussions of race, ethnicity, and scientific racism, as some have controversially suggested that genetically derived clusters may be understood as proof of genetically determined races.[9][10] Although cluster analyses invariably organize humans (or groups of humans) into subgroups, debate is ongoing on how to interpret these genetic clusters with respect to race and its social and phenotypic features. And, because there is such a small fraction of genetic variation between human genotypes overall, genetic clustering approaches are highly dependent on the sampled data, genetic markers, and statistical methods applied to their construction.
A wide range of methods have been developed to assess the structure of human populations with the use of genetic data. Early studies of within and between-group genetic variation used physical phenotypes and blood groups, with modern genetic studies using genetic markers such as Alu sequences, short tandem repeat polymorphisms, and single nucleotide polymorphisms (SNPs), among others.[11] Models for genetic clustering also vary by algorithms and programs used to process the data. Most sophisticated methods for determining clusters can be categorized as model-based clustering methods (such as the algorithm STRUCTURE[12]) or multidimensional summaries (typically through principal component analysis).[1][13] By processing a large number of SNPs (or other genetic marker data) in different ways, both approaches to genetic clustering tend to converge on similar patterns by identifying similarities among SNPs and/or haplotype tracts to reveal ancestral genetic similarities.[13]
Common model-based clustering algorithms include STRUCTURE, ADMIXTURE, and HAPMIX. These algorithms operate by finding the best fit for genetic data among an arbitrary or mathematically derived number of clusters, such that differences within clusters are minimized and differences between clusters are maximized. This clustering method is also referred to as "admixture inference," as individual genomes (or individuals within populations) can be characterized by the proportions of alleles linked to each cluster.[1] In other words, algorithms like STRUCTURE generate results that assume the existence of discrete ancestral populations, operationalized through unique genetic markers, which have combined over time to form the admixed populations of the modern day.
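To make "admixture inference" concrete, here is a minimal sketch that estimates one individual's cluster proportions by expectation-maximization. It is not the STRUCTURE, ADMIXTURE, or HAPMIX implementation: it assumes the allele frequencies of each presupposed cluster are already known and fixed, uses simulated data, and all names (`estimate_admixture`, `cluster_freqs`, `true_q`) are hypothetical.

```python
import numpy as np

def estimate_admixture(genotype, cluster_freqs, n_iter=200):
    """Estimate one individual's cluster proportions by EM, with the cluster
    allele frequencies held fixed (an assumption of this sketch)."""
    n_clusters, n_markers = cluster_freqs.shape
    q = np.full(n_clusters, 1.0 / n_clusters)  # start from equal proportions
    for _ in range(n_iter):
        # Probability that one allele copy is the counted / uncounted allele.
        p_alt = q @ cluster_freqs
        p_ref = q @ (1.0 - cluster_freqs)
        # Posterior probability that a copy of each type came from each cluster.
        resp_alt = q[:, None] * cluster_freqs / p_alt
        resp_ref = q[:, None] * (1.0 - cluster_freqs) / p_ref
        # Average those probabilities over all 2 * n_markers allele copies.
        q = (resp_alt * genotype + resp_ref * (2 - genotype)).sum(axis=1) / (2 * n_markers)
    return q

rng = np.random.default_rng(1)
n_markers = 2000
# Simulated allele frequencies for two presupposed ancestral clusters.
cluster_freqs = rng.uniform(0.05, 0.95, size=(2, n_markers))
true_q = np.array([0.7, 0.3])  # simulated individual: 70% / 30% ancestry
# Genotype = number of counted alleles (0, 1, or 2) at each marker.
genotype = rng.binomial(2, true_q @ cluster_freqs)
estimated_q = estimate_admixture(genotype, cluster_freqs)
print("estimated proportions:", np.round(estimated_q, 2))
```

The result depends entirely on which clusters are supplied as input, which illustrates why such proportions describe fit to a presupposed model rather than discovered, discrete populations.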
Where model-based clustering characterizes populations using proportions of presupposed ancestral clusters, multidimensional summary statistics characterize populations on a continuous spectrum. The most common multidimensional statistical method used for genetic clustering is principal component analysis (PCA), which plots individuals by two or more axes (their "principal components") that represent aggregations of genetic markers that account for the highest variance. Clusters can then be identified by visually assessing the distribution of data; with larger samples of human genotypes, data tends to cluster in distinct groups as well as admixed positions between groups.[1][13]
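The PCA approach described above can be sketched in a few lines of NumPy. The data here are simulated, not real genotypes: two hypothetical populations are drawn with somewhat different allele frequencies, each marker is centered, and individuals are projected onto the top two principal components. Function names and parameter values are assumptions made for illustration.

```python
import numpy as np

def simulate_genotypes(rng, freqs, n_individuals):
    """Draw 0/1/2 allele counts per marker for a population with the given frequencies."""
    return rng.binomial(2, freqs, size=(n_individuals, freqs.size))

def pca_project(genotypes, n_components=2):
    """Center each marker and project individuals onto the top principal components."""
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix: the rows of vt are the principal axes,
    # ordered by the amount of variance they account for.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
n_markers = 500
freqs_a = rng.uniform(0.1, 0.9, n_markers)
# Population B's frequencies drift away from A's (a simulated difference).
freqs_b = np.clip(freqs_a + rng.normal(0, 0.15, n_markers), 0.05, 0.95)
pop_a = simulate_genotypes(rng, freqs_a, 50)
pop_b = simulate_genotypes(rng, freqs_b, 50)
genotypes = np.vstack([pop_a, pop_b]).astype(float)

coords = pca_project(genotypes)
mean_pc1_a = coords[:50, 0].mean()
mean_pc1_b = coords[50:, 0].mean()
print("mean PC1, population A:", round(float(mean_pc1_a), 2))
print("mean PC1, population B:", round(float(mean_pc1_b), 2))
```

Plotting `coords[:, 0]` against `coords[:, 1]` would give the familiar scatter in which groups are identified visually; adding simulated individuals with intermediate frequencies would fill in the space between the two groups.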
There are caveats and limitations to genetic clustering methods of any type, given the degree of admixture and relative similarity within the human population. All genetic cluster findings are biased by the sampling process used to gather data, and by the quality and quantity of that data. For example, many clustering studies use data derived from populations that are geographically distinct and far apart from one another, which may present an illusion of discrete clusters where, in reality, populations are much more blended with one another when intermediary groups are included.[1] Sample size also plays an important moderating role on cluster findings, as different sample size inputs can influence cluster assignment, and more subtle relationships between genotypes may only emerge with larger sample sizes.[1][8] In particular, the use of STRUCTURE has been widely criticized as being potentially misleading through requiring data to be sorted into a predetermined number of clusters which may or may not reflect the actual population's distribution.[8][14] The creators of STRUCTURE originally described the algorithm as an "exploratory" method to be interpreted with caution and not as a test with statistically significant power.[12][15]
Modern applications of genetic clustering methods to global-scale genetic data were first marked by studies associated with the Human Genome Diversity Project (HGDP) data.[1] These early HGDP studies, such as those by Rosenberg et al. (2002),[4][16] contributed to theories of the serial founder effect and early human migration out of Africa, and clustering methods have been notably applied to describe admixed continental populations.[5][6][17] Genetic clustering and HGDP studies have also contributed to methods for, and criticisms of, the genetic ancestry consumer testing industry.[18]
A number of landmark genetic cluster studies have been conducted on global human populations since 2002, including the following:
Clusters of individuals are often geographically structured. For example, when clustering a population of East Asians and Europeans, each group will likely form its own respective cluster based on similar allele frequencies. In this way, clusters can have a correlation with traditional concepts of race and self-identified ancestry; in some cases, such as medical questionnaires, the latter variables can be used as a proxy for genetic ancestry where genetic data is unavailable.[9][4] However, genetic variation is distributed in a complex, continuous, and overlapping manner, so this correlation is imperfect and the use of racial categories in medicine can introduce additional hazards.[9]
Some scholars have challenged the idea that race can be inferred by genetic clusters, drawing distinctions between arbitrarily assigned genetic clusters, ancestry, and race. One recurring caution against thinking of human populations in terms of clusters is the notion that genotypic variation and traits are distributed evenly between populations, along gradual clines rather than along discrete population boundaries; so although genetic similarities are usually organized geographically, their underlying populations have never been completely separated from one another. Due to migration, gene flow, and baseline homogeneity, features between groups are extensively overlapping and intermixed.[2][9] Moreover, genetic clusters do not typically match socially defined racial groups; many commonly understood races may not be sorted into the same genetic cluster, and many genetic clusters are made up of individuals who would have distinct racial identities.[7] In general, clusters may most simply be understood as products of the methods used to sample and analyze genetic data; not without meaning for understanding ancestry and genetic characteristics, but inadequate for fully explaining the concept of race, which is more often described in terms of social and cultural forces.
In the related context of personalized medicine, race is currently listed as a risk factor for a wide range of medical conditions with genetic and non-genetic causes. Questions have emerged regarding whether or not genetic clusters support the idea of race as a valid construct to apply to medical research and treatment of disease, because there are many diseases that correspond with specific genetic markers and/or with specific populations, as seen with Tay-Sachs disease or sickle cell disease.[3][25] Researchers are careful to emphasize that ancestry, revealed in part through cluster analyses, plays an important role in understanding risk of disease. But racial or ethnic identity does not perfectly align with genetic ancestry, and so race and ethnicity do not reveal enough information to make a medical diagnosis.[25] Race as a variable in medicine is more likely to reflect social factors, whereas ancestry information is more likely to be meaningful when considering genetic ancestry.[2][25]
Originally posted here:
Human genetic clustering - Wikipedia
Posted in Human Genetics
Human Genome Project Fact Sheet
Posted: at 4:44 am
A special committee of the U.S. National Academy of Sciences outlined the original goals for the Human Genome Project in 1988, which included sequencing the entire human genome in addition to the genomes of several carefully selected non-human organisms.
Eventually the list of organisms came to include the bacterium E. coli, baker's yeast, fruit fly, nematode and mouse. The project's architects and participants hoped the resulting information would usher in a new era for biomedical research, and its goals and related strategic plans were updated periodically throughout the project.
In part due to a deliberate focus on technology development, the Human Genome Project ultimately exceeded its initial set of goals, doing so by 2003, two years ahead of its originally projected 2005 completion. Many of the project's achievements were beyond what scientists thought possible in 1988.
President Bill Clinton and Francis Collins, M.D., Ph.D., (NHGRI Director) at a June 2000 event at the White House celebrating the draft human genome sequence generated by the Human Genome Project. Dr. Collins served as the de facto leader of the International Human Genome Sequencing Consortium, the group that sequenced the human genome during the Human Genome Project. (NHGRI Photo Archive)
Read more:
Human Genome Project Fact Sheet
Posted in Human Genetics
Abstracts | International Congress of Human Genetics 2023
Posted: at 4:44 am
The International Scientific Programme Committee (ISPC) invites you to contribute to the programme by submitting an abstract for possible inclusion in the 14th International Congress of Human Genetics (ICHG) 2023 programme.
Please follow the abstract guidelines below, to help you through the submission process.
Abstract Guidelines: Use Arial Font 11, single line spacing and full justification for the borders.
Abstract Title: A brief title that clearly indicates the content of the contribution (maximum of 30 words).
Please avoid abbreviations in the abstract title. Abbreviations may be used if they refer to gene names using the standardised nomenclature, and in the body of the abstract if defined when first used. Do not use capitals or capitalise words that are not nouns.
Example: Title: This is an important African contribution to the field in terms of the APOL1 gene
Abstract Content: Please ensure that your abstract summarises your entire contribution in one paragraph (maximum of 300 words). Do not use section headings, but ensure that the content is structured.
Diagrams, illustrations, tables, references and graphics are NOT permitted.
A good abstract embodies the following structure:
Check grammar and spelling, sentence construction and punctuation before submission. Ensure that abbreviations are defined when used for the first time and then use the abbreviation in the rest of the abstract. Only use abbreviations if the term is used two or more times. Ask another person to carefully proofread and check your abstract for flow and content, as well as the details above.
Note: Abstracts not in the correct format will be returned to the submitting author.
Keywords: Please indicate at least 3 keywords for your abstract. These terms will be used to help people locate your abstract using the Conference App.
Ancient DNA and Neanderthals | The Smithsonian Institution’s Human …
Posted: November 16, 2022 at 11:29 pm
DNA (deoxyribonucleic acid) is arguably one of the most useful tools that scientists can use to understand living organisms. Our genetic code can tell us a lot about who we are, where we come from, and even what diseases we may be predisposed to. When studying evolution, DNA is especially important in its application to identifying and separating organisms into species. However, DNA is a fragile molecule, and it degrades over time. For most fossil species, there is essentially no hope of ever acquiring DNA from their fossils, so questions about their appearance, physiology, population structure, and more may never be fully answerable. For more recently extinct species, scientists have extracted, and continue to extract, ancient DNA (aDNA), which they use to reconstruct the genomes of long-gone ancestors and relatives. One such species is the Neanderthals, Homo neanderthalensis.
Neanderthals were the first species of fossil hominins discovered and have held a place in our collective imagination ever since. The first Neanderthal fossils were found in Engis, Belgium in 1829, but were not identified as belonging to Neanderthals until almost 100 years later. The first fossils to be called Neanderthals were found in 1856 in Germany, at a site in the Neander Valley (from which Neanderthals get their name). Neanderthals diverged from modern humans around 500,000 years ago, likely evolving outside of Africa. Most ancestors of Homo sapiens remained in Africa until around 100,000 years ago, when modern humans began migrating outwards. In that time, Neanderthals evolved many unique adaptations that helped them survive in the cold environments of Europe and Asia. Their short limbs and torso helped conserve heat, and their wide noses helped warm and humidify air as they breathed it in. Despite these differences, modern humans and Neanderthals are very closely related and looked similar. We even overlapped with each other, living in the same place at roughly the same time in both the Middle East and Europe. If this is the case, why did Neanderthals go extinct while we survived? We can use DNA to help answer this question and others, including:
Scientists answer these questions by comparing genomes as a whole, as well as specific genes, between humans and Neanderthals. Before getting into the specifics of Neanderthal DNA, it is important to appreciate the structure of DNA itself, why it is so important, and why aDNA can be so difficult to work with.
Fast Facts
You may recognize the basic structure of DNA: two strands arranged in a double-helix pattern with individual bases forming rungs, like a twisting ladder. These bases are adenine (A), thymine (T), guanine (G), and cytosine (C). They form complementary pairs on opposite ends of each ladder rung: adenine across from thymine and cytosine across from guanine. For example, if one side of the twisting ladder reads AATG, the opposing side will read TTAC. It is the sequence of these individual base pairs that makes up our genetic code, or our genome. Errors can occur when DNA is unwound to be replicated with one or more bases being deleted, substituted for others, or newly added. Such errors are called mutations and range from being essentially harmless to deadly.
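The pairing rule can be written out as a short Python sketch. The function name `complement` is purely illustrative, and the sketch ignores strand direction; it only looks up the base that sits across each rung of the ladder.

```python
# Pairing rules described above: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base that faces each position on the opposite strand."""
    return "".join(PAIRS[base] for base in strand.upper())

# The example from the text: one side reads AATG, the other reads TTAC.
print(complement("AATG"))
```

Applying `complement` twice returns the original strand, which is why either side of the ladder is enough to rebuild the other.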
The main function of DNA is to control the production, timing, and quantity of proteins produced by each cell. This process is called protein synthesis and comes in two main stages: transcription and translation. When the cell needs to produce a protein, an enzyme called RNA polymerase unzips the DNA double helix and aids in pairing RNA (ribonucleic acid, a molecule related to DNA) bases to the complementary DNA sequence. This first step is called transcription, the product of which is a single strand of RNA that exits the nucleus. This messenger RNA, or mRNA, goes into the cell's cytoplasm to locate an organelle called a ribosome, where the genetic information in the mRNA can be translated into a protein. The process of translation involves another kind of RNA, transfer RNA or tRNA, binding to the base sequences on the mRNA. Each tRNA carries an amino acid, one of the molecules that will make up the final protein, and the tRNAs bind in sequence to create an amino acid chain. This amino acid chain will then twist and fold into the final protein.
Bases are arranged in groups of three, or codons, on the mRNA and tRNA. Each codon codes for a single amino acid. Each individual amino acid can be coded for by more than one codon. For example, both AAA and AAG code for the same amino acid, lysine. Therefore, a mutation changing the last A to a G will be functionally meaningless. This is known as a silent, or synonymous, change. If that last A in the codon mutated to a C, however, the resulting codon AAC codes for asparagine, a different amino acid. This new amino acid could lead to the formation of a completely new protein or make the amino acid chain unable to form a protein at all. This is known as a nonsynonymous change. Nonsynonymous changes are the basis for diversity within a gene pool on which evolution acts.
The total DNA sequence is made up of base pairs, but not all sequences of base pairs serve the same function, and not all parts of the DNA sequence directly code for protein. Base pair sequences within DNA can be split into exons, sequences that directly code for proteins, and introns, sequences that do not directly code for a specific protein. The exon portion of our genome is collectively called the exome, and accounts for only about 1% of our total DNA. Exons and introns together form genes, sequences that code for a protein. On average there are 8.8 exons and 7.8 introns in each gene. The noncoding, or intron, parts of DNA used to be called junk DNA, random or repeating sequences that did not seem to code for anything. Recent research has shown that the majority of the genome does serve a function even if it does not code for protein synthesis. These intron sequences can help regulate when genes are turned on or off, control how DNA winds itself to form chromosomes, be remnant clues of an organism's evolutionary history, or serve other noncoding functions.
Most of our total genome is made up of nuclear DNA, the genetic material located in the nucleus of a cell. This DNA forms chromosomes, X-shaped bundles of DNA, that separate during cell division. Homo sapiens have 23 pairs of chromosomes. Nuclear DNA is directly inherited from both parents, with 50% each coming from an organism's biological male and female parents. Therefore, both parents' lineages are represented by nuclear DNA, with one exception. One of those pairs of chromosomes is made up of the sex chromosomes. Everyone gets some combination of X (from either the male or the female parent) and Y (from the male parent only) chromosomes that determines an organism's biological sex. These combinations can come in a variety of alternatives outside of XX and XY, including XXY, X, and others. Because the Y chromosome is only inherited from a biological male parent, the sequence of the Y chromosome can be used to trace patrilineal ancestry.
DNA is also found in the mitochondria, organelles colloquially referred to as the powerhouse of the cell. This mitochondrial DNA, or mtDNA, is much smaller than the nuclear genome, comprising only about 37 genes. mtDNA is only inherited from an organism's biological female parent and can be used to trace matrilineal ancestry. Because both Y-chromosome DNA and mtDNA are smaller and inherited from only one parent, and thus less subject to mutations and changes, they are more useful in tracing lineages through deep time. However, they pale in comparison to the entire nuclear genome in terms of size and available base sequences to analyze.
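As a rough illustration of the transcription step, the sketch below pairs RNA bases against a DNA template strand. It relies on one fact not spelled out above: RNA uses uracil (U) where DNA would use thymine (T). The function name `transcribe` is hypothetical, and strand direction and mRNA processing are ignored.

```python
# RNA bases that pair with each DNA template base.
# Assumption stated in the lead-in: RNA uses U in place of T.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_strand: str) -> str:
    """Build the mRNA that is complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template_strand.upper())

# A made-up six-base template; the result is read three bases at a time.
print(transcribe("TTTCAT"))
```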
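The lysine and asparagine example can be checked with a small Python sketch. The table below is a deliberately tiny excerpt of the standard genetic code (only the four AA-prefixed codons), and `classify_change` is an illustrative name rather than part of any library.

```python
# Tiny excerpt of the standard genetic code: enough for the example above.
CODON_TABLE = {
    "AAA": "lysine",
    "AAG": "lysine",
    "AAC": "asparagine",
    "AAU": "asparagine",
}

def classify_change(before: str, after: str) -> str:
    """Label a codon change as synonymous or nonsynonymous."""
    same_amino_acid = CODON_TABLE[before] == CODON_TABLE[after]
    return "synonymous" if same_amino_acid else "nonsynonymous"

print(classify_change("AAA", "AAG"))  # last A mutated to G
print(classify_change("AAA", "AAC"))  # last A mutated to C
```

A full analysis would use the complete 64-codon table; the lookup logic stays the same.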
Fast Facts
Recall that DNA is made up of base pair sequences that are chemically bonded to the sides of the double-helix structure forming a sort of twisting ladder. As an organic molecule, the component parts of that twisting ladder are subject to degradation over time. Without the functioning cells of a living organism to fix these issues and make new DNA, DNA can degrade into meaningless components somewhat rapidly. While DNA is abundant and readily extracted in living organisms (you can even do your own at-home experiment to extract DNA! https://learn.genetics.utah.edu/content/labs/extraction/howto/) finding useable DNA in extinct organisms gets harder and harder the further back in time that organism died.
The record for the oldest DNA extracted used to go to an ancient horse, dating to around 500,000-700,000 years old (Miller and Lambert 2013). However, in 2021 this was blown out of the water with the announcement of mammoth DNA extracted from specimens over 1 million years old found in eastern Siberian permafrost, permanently frozen ground (van der Valk et al., 2021). These cases of extreme DNA preservation are rare and share a few important factors in common: the specimens are found in very cold, very dry environments, typically buried in permafrost or frozen in caves. The oldest hominin DNA recovered comes from a Neanderthal around 400,000 years old (Meyer et al. 2016), near the beginnings of the Neanderthal species. Finding older DNA in other hominins is unlikely as for most of our evolutionary history hominins lived in the warm, sometimes wet, tropics and subtropics of Africa and Asia where DNA does not preserve well.
When scientists are lucky enough to find a specimen that may preserve aDNA, they must take the utmost care to extract it in such a way as to preserve it and prevent contamination. Just because aDNA is preserved does not mean it is preserved perfectly; it still decomposes and degrades over time, just at a slower rate in cool, dry environments. As a result, there is always going to be much less DNA from the old organism than there is in even the loose hair and skin cells from the scientists excavating it. For that reason, there are stringent guidelines in place for managing aDNA extraction in the field that scientists must follow (Gilbert et al., 2005, for example). In hominins, this is even more important, since human and Neanderthal DNA are so similar that most sequences will be indistinguishable from each other.
Challenges in Sequencing aDNA
When aDNA does preserve, it is often highly fragmented, degraded, and substantially changed from how the DNA appeared in a living organism. In order to sequence the DNA, or read the base pair coding, this damage has to be taken into account and fixed wherever possible. aDNA comes in tattered, fragmented strands that are difficult to read and analyze. One way scientists deal with this is to amplify the aDNA that is preserved, via a process known as polymerase chain reaction (PCR), so that it is more readily accessible. PCR essentially forces the DNA to self-replicate exponentially so that there are many more copies of the same sequence to compare. Due to the exponential duplication, it is especially important for there to be no contamination of modern DNA in the sample. The amplified sequences can then be compared and aligned to create longer sequences, up to and including entire genes and genomes.
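To see why exponential duplication makes contamination such a concern, here is a back-of-the-envelope Python sketch. It assumes an idealized reaction in which every molecule is copied once per cycle; the `efficiency` parameter and the starting counts are illustrative assumptions, not measured values.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the pool by (1 + efficiency).

    efficiency=1.0 means perfect doubling, which real reactions only approximate.
    """
    return initial_copies * (1 + efficiency) ** cycles

# Hypothetical sample: 10 ancient fragments plus 1 modern contaminant molecule.
ancient = pcr_copies(10, cycles=30)
modern = pcr_copies(1, cycles=30)
print(f"ancient copies: {ancient:.3g}, contaminant copies: {modern:.3g}")
```

Under these assumptions a single stray modern molecule ends up as roughly a billion copies after 30 cycles, so it is amplified just as faithfully as the ancient material.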
The component parts of DNA also degrade over time. One example is deamination, when cytosine bases degrade into a thymine molecule and guanine bases degrade into an adenine. This could potentially lead to misidentification of sequences, but scientists have developed chemical methods to reverse these changes. Comparison between closely related genomes, such as humans and Neanderthals, can also identify where deamination may have occurred in sequences that do not vary between the two species. Deamination can actually be useful because it is an excellent indicator that the sample you are looking at is genuine aDNA and not DNA from a contaminated source.
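The comparison idea can be sketched in Python: given an ancient read and a closely related reference that are already aligned and of equal length (an assumption of this sketch), flag positions where the reference has C but the read has T, or the reference has G but the read has A. The function name and the toy sequences are hypothetical.

```python
# Mismatch patterns that match the deamination damage described above.
DEAMINATION_PATTERNS = {("C", "T"), ("G", "A")}

def deamination_candidates(reference: str, read: str) -> list:
    """List (position, change) pairs where a mismatch looks like deamination.

    Assumes the two sequences are pre-aligned and the same length.
    """
    hits = []
    for position, (ref, obs) in enumerate(zip(reference.upper(), read.upper())):
        if (ref, obs) in DEAMINATION_PATTERNS:
            hits.append((position, f"{ref}>{obs}"))
    return hits

# Made-up eight-base example with two damage-like mismatches.
print(deamination_candidates("ACGTCCGA", "ATGTCCAA"))
```

A mismatch flagged this way is only a candidate: a genuine difference between the two genomes would look identical at a single site, so real pipelines weigh many reads together.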
aDNA extraction and sequencing is inherently destructive and requires destroying at least part of the fossil sample you are attempting to extract DNA from. That is something that paleoanthropologists want to avoid whenever possible! To justify destroying a fossil to extract aDNA, it is common practice to first test this technique on other, non-hominin fossils from the same site to confirm that DNA is accessible and in reasonable quantity and quality. Testing aDNA from other sources, such as a cave bear at a Neanderthal site, can also identify any potential sources of contamination more easily, since cave bears and humans are more distantly related.
Fast Facts:
DNA preserves best in cold, dry environments
aDNA must be destructively sampled, amplified, and analyzed prior to looking at the sequence
Neanderthal skull La Ferrassie 1 from La Ferrassie, France
The first analysis of any Neanderthal DNA was mitochondrial DNA (mtDNA), published in 1997. The sample was taken from the first Neanderthal fossil discovered, found in Feldhofer Cave in the Neander Valley in Germany. A small sample of bone was ground up to extract mtDNA, which was then replicated and analyzed.
Researchers compared the Neanderthal mtDNA to modern human and chimpanzee mtDNA sequences and found that the Neanderthal mtDNA sequences were substantially different from both (Krings et al. 1997, 1999). Most human sequences differ from each other by an average of 8.0 substitutions, while the human and chimpanzee sequences differ by about 55.0 substitutions. The Neanderthal and modern human sequences differed by approximately 27.2 substitutions. Using this mtDNA information, the last common ancestor of Neanderthals and modern humans dates to approximately 550,000 to 690,000 years ago, which is about four times older than the modern human mtDNA pool. Since this study was completed, many more samples of Neanderthal mtDNA have been replicated and studied.
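Averages such as 8.0, 27.2, and 55.0 come from counting positions that differ between pairs of aligned sequences. The Python sketch below shows that counting on made-up eight-base sequences; the function names are illustrative, and real comparisons use aligned mtDNA segments hundreds of bases long.

```python
from itertools import combinations

def substitutions(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def mean_pairwise_substitutions(sequences: list) -> float:
    """Average the substitution count over every pair of sequences."""
    pairs = list(combinations(sequences, 2))
    return sum(substitutions(a, b) for a, b in pairs) / len(pairs)

# Toy sequences, not real mtDNA.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
print(mean_pairwise_substitutions(toy))
```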
Sequencing the Complete Neanderthal Mitochondrial Genome
After successfully sequencing large amounts of mtDNA, a team led by Svante Pääbo from the Max Planck Institute reported the first complete mitochondrial DNA (mtDNA) sequence for a Neanderthal (Green et al. 2008). The sample was taken from a 38,000-year-old Neanderthal from Vindija Cave, Croatia. The complete mtDNA sequence allowed researchers to compare this Neanderthal mtDNA to modern human mtDNA to see if any modern humans carried the mtDNA from a related group to the Neanderthal group.
Later, Svante Pääbo's lab sequenced the entire mitochondrial genome of five more Neanderthals (Briggs et al. 2009). Sequences came from two individuals from the Neander Valley in Germany and one each from Mezmaiskaya Cave in Russia, El Sidrón Cave in Spain, and Vindija Cave in Croatia. Though the Neanderthal samples came from a wide geographic area, the Neanderthal mtDNA sequences were not particularly genetically diverse. The most divergent Neanderthal sequence came from the Mezmaiskaya Cave Neanderthal from Russia, which is the oldest and eastern-most specimen. Further analysis and sampling of more individuals has led researchers to believe that this diversity was more closely related to age than it was to population-wide variance (Briggs et al. 2009). On average, Neanderthal mtDNA genomes differ from each other by 20.4 bases and are only 1/3 as diverse as modern humans (Briggs et al. 2009). The low diversity might signal a small population size.
There is evidence that some other hominin contributed to the Neanderthal mtDNA gene pool around 270,000 years ago (Posth et al., 2017). A femur discovered in Germany had its mtDNA genotyped and it was found that there was introgression from a non-Neanderthal African hominin, either Homo sapiens or closely related to us, around 270,000 years ago. This mitochondrial genome is also highly divergent from the Neanderthal average discussed previously, indicating that Neanderthals may have been much more genetically diverse in their more distant past.
As for Neanderthal introgression into the modern human mtDNA genome, it is possible that the evidence of such admixture is obscured for a variety of reasons (Wang et al. 2013). Primary among these reasons is sample size: there are to date only a dozen or so Neanderthal mtDNA sequences that have been sampled. Because the current sample of Neanderthal mtDNA is so small, it is possible that researchers simply have not yet found the mtDNA in Neanderthals that corresponds to that of modern humans.
Map of Neanderthal extent throughout Eurasia.
There have been many efforts to sequence Neanderthal nuclear genes, with an eventual goal to sequence as much of the Neanderthal genome as possible. In 2014, the complete genome of a Neanderthal from the Altai Mountains in Siberia was published (Prüfer et al., 2014). This female individual's genome showed that her parents were likely half siblings and that her genetic line showed evidence of high rates of incestuous pairings. It is unclear whether this is due to her living in a small and isolated population or if other factors may have influenced the lineage's inbreeding. The analysis also showed that this individual was closely related to both modern humans and the Denisovans, another ancient human population. By this analysis, there was only a very small margin by which Neanderthal and Denisovan DNA differed exclusively from modern humans.
Fast Facts:
Neanderthals are genetically distinct from modern humans, but are more closely related to us than chimpanzees are
The Neanderthal and modern human lineages diverged about 550,000 years ago
So far, we have no evidence of Neanderthal mtDNA lineages in modern humans
Neanderthals were not as genetically diverse as modern humans were at the same period, indicating that Neanderthals had a smaller population size
Neanderthal nuclear DNA shows further evidence of small population sizes, including genetic evidence of incest
As technology improves, researchers are able to detect and analyze older and more fragmentary samples of DNA
Scientists have also found DNA from another extinct hominin population: the Denisovans. The first remains of the species found were a single fragment of a phalanx (finger bone) and two teeth, all of which date back to about 40,000 years ago (Reich et al., 2010). Since then, a Denisovan mandible, or lower jaw, has been found in Tibet (Chen et al., 2019) and a Denisovan molar has been found in Laos (Demeter et al., 2022). Other fossil hominins, such as the Homo longi remains from northern China (Ji et al., 2021) and the Dali cranium from northwestern China may belong to the Denisovans, but without comparable fossils and genetics it is difficult to know for sure.
This species is the first fossil hominin identified as a new species based on its DNA alone. Denisovans are close relatives of both modern humans and Neanderthals, and likely diverged from these lineages around 300,000 to 400,000 years ago; they are more closely related to Neanderthals than to modern humans. You might be wondering: If we have the DNA of Denisovans, why can't we compare them to modern humans like we do Neanderthals? Why isn't this article about them too? The answer is simply that we don't have enough DNA and fossils to make a comparison. The single-digit specimen pool of Denisovans found to date is statistically far too small a data set to derive any meaningful comparisons. Until we find more Denisovan material, we cannot begin to understand their full genome in the way that we can study Neanderthals. The lack of more (and more morphologically diagnostic) Denisovan fossils is the reason why scientists have not yet given them a species name.
Fast Facts:
Homo sapiens and Homo neanderthalensis are different species, yet you are reading this webpage about them potentially interbreeding with each other. So, what does that mean, exactly? Modern humans and Neanderthals lived in separate regions evolving along separate evolutionary lineages for hundreds of thousands of years. Even so, Neanderthals are still our closest currently known relative. Because of that evolutionary proximity, despite being recognized as different species, it is still possible that members of our two species exchanged genetic information. This exchange of DNA is called introgression, or interbreeding.
When looking for evidence of interbreeding, scientists do not search billions and billions of base pairs. Instead, there are specific regions of the genomes that are known to be highly variable in modern humans along with several million single nucleotide polymorphisms (SNPs), where the given base at a single location can vary among people. The difference between the total genome and these specific regions/sites can lead to some confusion. In terms of the total genome, humans and chimpanzees are 98-99% similar. Yet, it is possible for individuals to have up to 4% Neanderthal DNA. That difference is accounted for in that 4% of the highly variable genome is inherited from a Neanderthal source, not 4% of the entire genome. If one were to look at the modern human genome as a whole, at least 98-99% is the same, inherited from our common ancestor with Neanderthals.
Neanderthals are known to contribute up to 1-4% of the genomes of non-African modern humans, depending on what region of the world your ancestors come from, and modern humans who lived about 40,000 years ago have been found to have up to 6-9% Neanderthal DNA (Fu et al., 2015). Because Neanderthals likely evolved outside of Africa (no Neanderthal fossils have been found in Africa to date), it was thought that there would be no trace of Neanderthal DNA in African modern humans. However, a study in 2020 demonstrated that there is Neanderthal DNA in all African Homo sapiens (Chen et al., 2020). This is a good indicator of how human migration out of Africa worked: Homo sapiens did not simply leave Africa in one or more major dispersals; rather, there was gene flow back and forth over time that brought Neanderthal DNA into Africa.
The evidence we have of Neanderthal-modern human interbreeding sheds light on the expansion of modern humans out of Africa. These new discoveries refute many previous hypotheses in which anatomically modern humans replaced archaic hominins, like Neanderthals, without any interbreeding. However, even with some interbreeding between modern humans and now-extinct hominins, most of our genome still derives from Africa.
For many years, the only evidence of human-Neanderthal hybridization existed within modern human genes. However, in 2016 researchers published a new set of Neanderthal DNA sequences from Altai Cave in Siberia, as well as from Spain and Croatia, that show evidence of human-Neanderthal interbreeding as far back as 100,000 years ago, farther back than many previous estimates of human migration out of Africa (Kuhlwilm et al., 2016). Their findings are the first to show human gene flow into the Neanderthal genome as opposed to Neanderthal DNA into the human genome. These data tell us that not only were human-Neanderthal interbreeding events more frequent than previously thought, but also that an early migration of humans did in fact leave Africa before the population that survived and gave rise to all contemporary non-African modern humans.
We previously mentioned the lack of genetic contributions by Neanderthals into the modern human mtDNA gene pool. As we have shown that Neanderthal-human interbreeding did occur, why wouldn't we find their DNA in our mtDNA as well as our nuclear DNA? There are several potential explanations for this. It is possible that there were at one point modern humans who possessed the Neanderthal mtDNA, but that their lineages died out. It is also highly possible that Neanderthals did not contribute to the mtDNA genome by virtue of the nature of human-Neanderthal admixture. While we know that humans and Neanderthals bred, we have no way of knowing what the possible social or cultural contexts for such breeding would have been.
Because mtDNA is passed down exclusively from mother to offspring, if Neanderthal males were the only ones contributing to the human genome, their contributions would not be present in the mtDNA line. It is also possible that while interbreeding between Neanderthal males and human females could have produced fertile offspring, interbreeding between Neanderthal females and modern human males might not have produced fertile offspring, which would mean that the Neanderthal mtDNA could not be passed down. Finally, it is possible that modern humans do carry at least one mtDNA lineage that Neanderthals contributed to our genome, but that we have not yet sequenced that lineage in either modern humans or in Neanderthals. Any of these explanations could underlie the lack of Neanderthal mtDNA in modern human populations.
Given that scientists have DNA evidence of another hominin species, the Denisovans, is there any evidence for interbreeding among all three species? Yes! Comparison of the Denisovan genome to various modern human populations shows up to 4-6% contribution from Denisovans in non-African modern human populations. This concentration is highest in people from Papua New Guinea and Oceania. It makes sense that interbreeding would appear in these Southeast Asian and Pacific Island communities, as their ancestors migrated from mainland Asia where Denisovan fossils have been found. There is also substantial evidence for Denisovan-Neanderthal interbreeding, including one juvenile female that appears to be a first-generation hybrid of a Neanderthal female parent and a Denisovan male parent (Slon et al., 2018). Finding more Denisovan fossils will hopefully mean developing a more complete picture of Denisovan genetics so that scientists can explore these interactions in more detail.
Fast Facts:
Homo neanderthalensis, adult male. Reconstruction based on Shanidar 1 (artist, John Gurche)
While much of the genetic diversity discussed above came from inactive, noncoding, or otherwise evolutionarily neutral segments of the genome, there are many sites that show clear evidence of selective pressure on the variations between modern humans and Neanderthals. Researchers found 78 loci at which Neanderthals had an ancestral state and modern humans had a newer, derived state (Green et al. 2010). Five of these genes had more than one sequence change that affected the protein structure. These proteins include SPAG17, which is involved in the movement of sperm, PCD16, which may be involved in wound healing, TTF1, which is involved in ribosomal gene transcription, and RPTN, which is found in the skin, hair and sweat glands. Other changes may not alter the sequence of the gene itself, but alter the factors that control that gene's replication in the cell, changing its expression secondarily.
This tells us that these traits were selected for in the evolution of modern humans and were possibly selected against in Neanderthals. Though some of the genomic areas that may have been positively selected for in modern humans may have coded for structural or regulatory regions, others may have been associated with energy metabolism, cognitive development, and the morphology of the head and upper body. These are just a few of the areas where we have non-genetic evidence of differentiation between modern humans and Neanderthals.
While the study of DNA reveals aspects of relatedness and lineage, its primary function is, of course, to control the production of proteins that regulate an organism's biology. Each gene may have a variety of genotypes, which are the variances that can occur within the site of a particular gene. Each genotype codes for a respective phenotype, which is the physical expression of that gene. When we study Neanderthal DNA, we can examine the genotypes at loci of known function and can infer what phenotype the Neanderthals' mutations may have expressed in life. Below, explore several examples of Neanderthal genes and the possible phenotypes that they would have displayed.
Ancient DNA has been used to reconstruct aspects of Neanderthal appearance. A fragment of the gene for the melanocortin 1 receptor (MC1R) was sequenced using DNA from two Neanderthal specimens from Spain and Italy: El Sidrón 1252 and Monte Lessini (Lalueza-Fox et al. 2007). MC1R is a receptor gene that controls the production of melanin, the pigment responsible for the coloration of the hair and skin. Neanderthals had a mutation in this receptor gene which changed an amino acid, making the resulting protein less efficient and likely creating a phenotype of red hair and pale skin. (The reconstruction below of a male Neanderthal by John Gurche features pale skin, but not red hair.) How do we know what this phenotype would have looked like? Modern humans display similar mutations of MC1R, and people who have two copies of this mutation have red hair and pale skin. However, no modern human has the exact mutation that Neanderthals had, which means that both Neanderthals and humans evolved this phenotype independently of each other.
If modern humans and Neanderthals living in Europe at the same time period both evolved this reduction of pigmentation, it is likely that there was an advantage to this trait. One hypothesis to explain this adaptation's advantage involves the production of vitamin D. Our bodies primarily synthesize our supply of vitamin D, rather than relying on vitamin D from food sources. Vitamin D is synthesized when the sun's UV rays penetrate our skin. Darker skin makes it harder for sunlight to penetrate the outermost layers and stimulate the production of vitamin D, and while people living in areas of high sun exposure will still get plenty of vitamin D, people who live far from the equator are not exposed to as much sunlight and need to optimize their exposure to the sun. Therefore, it would be beneficial for populations in colder climates to have paler skin so that they can create enough vitamin D even with less sun exposure.
The FOXP2 gene is involved in speech and language (Lai et al. 2001). Mutations in the FOXP2 gene sequence in modern humans lead to problems with speech, and with oral and facial muscle control. The human FOXP2 gene is on a haplotype that was subject to a strong selective sweep. A haplotype is a set of alleles that are inherited together on the same chromosome, and a selective sweep is a reduction or elimination of variation among the nucleotides near a particular DNA mutation. Modern humans and Neanderthals share two changes in FOXP2 compared with the sequence in chimpanzees (Krause et al. 2007). How did this FOXP2 variant come to be found in both Neanderthals and modern humans? One scenario is that it could have been transferred between species via gene flow. Another possibility is that the derived FOXP2 was present in the ancestor of both modern humans and Neanderthals, and that the gene was so heavily favored that it proliferated in both populations. A third scenario, which the authors think is most likely, is that the changes and selective sweep occurred before the divergence between the populations. While it can be tempting to infer that the presence of the same haplotype in Neanderthals and humans means that Neanderthals had similar complex language capabilities, there is not yet enough evidence for such a conclusion. Neanderthals may also have their own unique derived characteristics in the FOXP2 gene that were not tested for in this study. Genes are just one factor of many in the development of language.
The gene that produces the ABO blood system is polymorphic in humans, meaning that there are more than two possible expressions of this gene. The genes for both A and B blood types are dominant, and O type is recessive, meaning that people who are type A or B can have genotypes of either AA or AO (or BB and BO) and still be A (or B) blood type, but to have type O blood one must have a genotype of OO. Various selection factors may favor different alleles, leading to the maintenance of distinct blood groups in modern human populations. Though chimpanzees also have different blood groups, they are not the same as human blood types. While the mutation that causes the human B blood group arose around 3.5 Ma, the O group mutation dates to around 1.15 Ma. When scientists tested whether Neanderthals had the O blood group they found that two Neanderthal specimens from Spain probably had the O blood type, though there is the possibility that they were OA or OB (Lalueza-Fox et al. 2008). Though the O allele was likely to have already appeared before the split between humans and Neanderthals, it could also have arisen in the Neanderthal genome via gene flow from modern humans.
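The dominance pattern can be expressed as a short Python sketch mapping a pair of alleles to a blood type. One case goes beyond the paragraph above: a person carrying both A and B has type AB, because the two alleles are co-dominant. This is standard genetics rather than something the text states. The function name is illustrative.

```python
def abo_blood_type(allele_1: str, allele_2: str) -> str:
    """Map an ABO genotype (two alleles) to its blood type phenotype."""
    alleles = {allele_1.upper(), allele_2.upper()}
    if not alleles <= {"A", "B", "O"}:
        raise ValueError("alleles must each be 'A', 'B' or 'O'")
    if alleles == {"A", "B"}:
        return "AB"  # co-dominance: not covered in the text above
    if "A" in alleles:
        return "A"   # AA or AO
    if "B" in alleles:
        return "B"   # BB or BO
    return "O"       # only OO gives type O

for genotype in [("A", "O"), ("B", "B"), ("O", "O")]:
    print(genotype, "->", abo_blood_type(*genotype))
```

This also shows why a specimen that tests positive for an O allele is not necessarily type O: a single O can be masked by an A or B on the other chromosome.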
The ability to taste bitter substances is controlled by a gene, TAS2R38. Some individuals are able to taste bitter substances, while others have a different version of the gene that does not allow them to taste bitter foods. Possession of two copies of the tasting allele gives the individual greater perception of bitter tastes than the heterozygous state, in which individuals have one tasting allele and one non-tasting allele. Two copies of a non-tasting allele lead to the inability to taste bitter substances.
When scientists sequenced the TAS2R38 gene in the DNA of a Neanderthal from El Sidrón, Spain, they found that this individual was heterozygous and thus was able to perceive bitter taste, although not as strongly as a homozygous individual with two copies of the tasting allele would (Lalueza-Fox et al. 2009). Both of these haplotypes are still present in modern people, and since the Neanderthal sequenced was heterozygous, the two alleles (tasting and non-tasting) were probably both present in the common ancestor of Neanderthals and modern humans. Though chimpanzees also vary in their ability to taste bitterness, their abilities are controlled by different alleles than those found in humans, indicating that non-tasting alleles evolved separately in the hominin lineage.
The microcephalin gene relates to brain size during development. A mutation in the microcephalin gene, MCPH1, is a common cause of microcephaly. Mutations in microcephalin can cause the brain to be 3 to 4 times smaller than normal. A variant of MCPH1, haplogroup D, may have been positively selected for in modern humans and may also have come from an interbreeding event with an archaic population (Evans et al. 2006). All of the haplogroup D variants come from a single copy that appeared in modern humans around 37,000 years ago. However, haplogroup D itself came from a lineage that had diverged from the lineage that led to modern humans around 1.1 million years ago. Although there was speculation that the Neanderthals were the source of the microcephalin haplogroup D (Evans et al. 2006), the Neanderthal DNA sequenced so far does not contain it (Green et al. 2010).
While changes to the genome can directly affect the phenotypes displayed in an organism, altering the timing mechanism of protein production can cause very similar effects. MicroRNA (miRNA) is one such mechanism: a cell uses miRNA to suppress the expression of a gene until that gene becomes necessary. One miRNA can target multiple genes by binding its seed region to messenger RNA (mRNA) that would otherwise have carried that information to the ribosome to be translated into proteins, preventing translation from taking place. In hominins, one particular miRNA called miR-1304 occurs in both an ancestral and a derived form. The derived form has a mutation in the seed region which allows it to target more mRNA segments, but less effectively. This means that in the derived state, some genes will be more strongly expressed due to a lack of suppression. One example is the production of the enamelin and amelotin proteins, both used in dental formation during development. The suppression of their production in Neanderthals, and the subsequent lack of suppression in modern humans, could be a contributing factor to some of the morphological differences between Neanderthal and modern human dentition.
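Seed-region targeting can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the sequences are invented toy strings, not the real miR-1304 or enamel gene sequences, the names `seed_site` and `find_targets` are hypothetical, and the seed is taken as miRNA positions 2 to 8, which is a common convention. The sketch only shows that a single base change in the seed alters which transcripts are matched; it does not model how strongly each target is suppressed.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Return the mRNA motif (5' to 3') that base-pairs with the miRNA seed.

    The seed is taken as miRNA positions 2-8 (0-based slice 1:8); the
    matching mRNA motif is the reverse complement of that seed.
    """
    seed = mirna[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_targets(mirna: str, mrnas: dict) -> list:
    """List the names of the mRNAs that contain the seed-matching motif."""
    motif = seed_site(mirna)
    return [name for name, sequence in mrnas.items() if motif in sequence]


# Toy sequences, invented for illustration only.
ancestral_mirna = "AACGGUAUCCGAUU"
derived_mirna = "AACGAUAUCCGAUU"  # one base changed inside the seed

toy_mrnas = {
    "gene_x": "GGCAUACCGUAAC",
    "gene_y": "CCAUAUCGUGGA",
    "gene_z": "GGGCCCAAAUUU",
}

print(find_targets(ancestral_mirna, toy_mrnas))
print(find_targets(derived_mirna, toy_mrnas))
```

Because the seed is only a few bases long, one substitution is enough to redirect the miRNA toward a different set of transcripts, leaving the genes it used to suppress free to be expressed.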
Research shows that Neanderthal DNA has contributed to our immune systems today. A study of the human genome found a surprising incursion of Neanderthal DNA into the modern human genome, specifically within the region that codes for our immune response to pathogens (Dannemann et al. 2016). These particular Neanderthal genes would have been useful for the modern humans arriving in Europe, whose immune systems had never encountered the pathogens present there and who would have been vulnerable to them, unlike the Neanderthals, who had built up generations of resistance against these diseases. When humans and Neanderthals interbred, they passed this genetic resistance to diseases on to their offspring, giving them a better chance of survival than those without this additional resistance. The evidence shows that there have been at least three incursions of archaic DNA into the genes for immune response, two coming from Neanderthals and one from our poorly understood evolutionary cousins, the Denisovans.
While many of the genes that we retain for generations are either beneficial or neutral, there are some that have become deleterious in our new, modern lives. Several genes that our Neanderthal relatives contributed to our genome were once beneficial but can now cause health-related problems (Simonti et al. 2016). One of these genes allows our blood to coagulate (or clot) quickly, a useful adaptation in creatures who were often injured while hunting. However, in modern people who live longer lives, this same trait of quick-clotting blood can cause harmful blood clots to form in the body later in life. Researchers found another gene that can cause depression and other neurological disorders and is triggered by disturbances in circadian rhythms. Since it is unlikely that Neanderthals experienced such disturbances to their natural sleep cycles, they may never have expressed this gene, but in modern humans, who can control their climate and whose lifestyles often disrupt their circadian rhythms, this gene is expressed more frequently.
Briggs, A.W., Good, J.M., Green, R.E., Krause, J., Maricic, T., Stenzel, U., Lalueza-Fox, C., Rudan, P., Brajković, D., Kućan, Ž., Gušić, I., Schmitz, R., Doronichev, V.B., Golovanova, L.V., de la Rasilla, M., Fortea, J., Rosas, A., Pääbo, S., 2009. Targeted retrieval and analysis of five Neandertal mtDNA genomes. Science 325: 318-321.
Brown, T.A., 2010. Stranger from Siberia. Nature 464: 838-839.
Callaway, E., 2009. First draft of Neanderthal genome is unveiled. New Scientist. 12 Feb 2009.
Caramelli, D., Milani, L., Vai, S., Modi, A., Pecchioli, E., Girardi, M., Pilli, E., Lari, M., Lippi, B., Ronchitelli, A., Mallegni, F., Casoli, A., Bertorelle, G., Barbujani, G., 2008. A 28,000 years old Cro-Magnon mtDNA sequence differs from all potentially contaminating modern sequences. PLoS One 3(7): e2700.
Chen, F., Welker, F., Shen, C.C., Bailey, S.E., Bergmann, I., Davis, S., Xia, H., Wang, H., Fischer, R., Freidline, S.E., Yu, T.L., 2019. A late middle Pleistocene Denisovan mandible from the Tibetan Plateau. Nature 569(7756): 409-412.
Chen, L., Wolf, A.B., Fu, W., Li, L., Akey, J.M., 2020. Identifying and interpreting apparent Neanderthal ancestry in African individuals. Cell 180(4): 677-687.
Coop, G., Bullaughey, K., Luca, F., Przeworski, M., 2008. The timing of selection at the human FOXP2 gene. Molecular Biology and Evolution 25(7): 1257-1259.
Currat, M., Excoffier, L., 2004. Modern humans did not admix with Neanderthals during their range expansion into Europe. PLoS Biology 2: e421.
Dalton, R., 2006. Neanderthal DNA yields to genome foray. Nature 441: 260-261.
Dalton, R., 2006. Neanderthal genome sees first light. Nature 444: 254.
Dannemann, M., Andrés, A.M., Kelso, J., 2016. Introgression of Neandertal- and Denisovan-like haplotypes contributes to adaptive variation in human Toll-like receptors. The American Journal of Human Genetics 98(1): 22-33.
Demeter, F., Zanolli, C., Westaway, K.E., Joannes-Boyau, R., Duringer, P., Morley, M.W., Welker, F., Rüther, P.L., Skinner, M.M., McColl, H., Gaunitz, C., 2022. A Middle Pleistocene Denisovan molar from the Annamite Chain of northern Laos. Nature Communications 13(1): 1-17.
Evans, P.D., Mekel-Bobrov, N., Vallender, E.J., Hudson, R.R., Lahn, B.T., 2006. Evidence that the adaptive allele of the brain size gene microcephalin introgressed into Homo sapiens from an archaic Homo lineage. Proceedings of the National Academy of Sciences 103(48): 18178-18183.
Fu, Q., Hajdinjak, M., Moldovan, O.T., Constantin, S., Mallick, S., Skoglund, P., Patterson, N., Rohland, N., Lazaridis, I., Nickel, B., Viola, B., Prüfer, K., Meyer, M., Kelso, J., Reich, D., Pääbo, S., 2015. An early modern human from Romania with a recent Neanderthal ancestor. Nature 524(7564): 216-219.
Gilbert, M.T.P., Bandelt, H.J., Hofreiter, M., Barnes, I., 2005. Assessing ancient DNA studies. Trends in Ecology & Evolution 20(10): 541-544.
Green, R.E., Krause, J., Briggs, A.W., Maricic, T., Stenzel, U., Kircher, M., Patterson, N., Fritz, M., Hansen, N., Durand, E.Y., Malaspinas, A.-S., Jensen, J.D., Marques-Bonet, T., Alkan, C., Prüfer, K., Meyer, M., Burbano, H.A., Good, J.M., Schultz, R., Aximu-Petri, A., Butthof, A., Höber, B., Höffner, B., Siegemund, M., Weihmann, A., Nusbaum, C., Lander, E.S., Russ, C., Novod, N., Affourtit, J., Egholm, M., Verna, C., Rudan, P., Brajković, D., Kućan, Ž., Gušić, I., Doronichev, V.B., Golovanova, L.V., Lalueza-Fox, C., de la Rasilla, M., Fortea, J., Rosas, A., Schmitz, R.W., Eichler, E.E., Falush, D., Birney, E., Mullikin, J.C., Slatkin, M., Nielsen, R., Kelso, J., Lachmann, M., Reich, D., Pääbo, S., 2010. A draft sequence of the Neandertal genome. Science 328: 710-722.
Green, R.E., Krause, J., Ptak, S.E., Briggs, A.W., Ronan, M.T., Simons, J.F., Du, L., Egholm, M., Rothberg, J.M., Paunovic, M., Pääbo, S., 2006. Analysis of one million base pairs of Neanderthal DNA. Nature 444: 330-336.
Green, R.E., Malaspinas, A.-S., Krause, J., Briggs, A., Johnson, P., Uhler, C., Meyer, M., Good, J., Maricic, T., Stenzel, U., 2008. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing. Cell 134: 416-426.
Griffiths, D.A., 2018. Shifting syndromes: Sex chromosome variations and intersex classifications. Social Studies of Science 48(1): 125-148.
Hofreiter, M., Serre, D., Poinar, H.N., Kuch, M., Pääbo, S., 2001. Ancient DNA. Nature Reviews Genetics 2: 353-359.
Holden, C., 2006. It's Neanderthal Time. Science 313: 279.
Jagannathan, M., Cummings, R., Yamashita, Y.M., 2018. A conserved function for pericentromeric satellite DNA. eLife 7: e34122.
Ji, Q., Wu, W., Ji, Y., Li, Q., Ni, X. 2021. Late Middle Pleistocene Harbin cranium represents a new Homo species. The Innovation 2(3).
Krause, J., Lalueza-Fox, C., Orlando, L., Enard, W., Green, R.E., Burbano, H.A., Hublin, J.-J., Hänni, C., Fortea, J., de la Rasilla, M., Bertranpetit, J., Rosas, A., Pääbo, S., 2007. The derived FOXP2 variant of modern humans was shared with Neandertals. Current Biology 17: 1908-1912.
Krings, M., Stone, A., Schmitz, R.W., Krainitzki, H., Stoneking, M., Pääbo, S., 1997. Neandertal DNA sequences and the origin of modern humans. Cell 90: 19-30.
Krings, M., Geisert, H., Schmitz, R.W., Krainitzki, H., Pääbo, S., 1999. DNA sequence of the mitochondrial hypervariable region II from the Neanderthal type specimen. Proceedings of the National Academy of Sciences USA 96: 5581-5585.
Kuhlwilm, M., Gronau, I., Hubisz, M.J., de Filippo, C., Prado-Martinez, J., Kircher, M., Fu, Q., Burbano, H.A., Lalueza-Fox, C., de la Rasilla, M., Rosas, A., 2016. Ancient gene flow from early modern humans into Eastern Neanderthals. Nature 530(7591): 429-433.
Lalueza-Fox, C., Gigli, E., de la Rasilla, M., Fortea, J., Rosas, A., Bertranpetit, J., Krause, J., 2008. Genetic characterization of the ABO blood group in Neandertals. BMC Evolutionary Biology 8: 342.
Lalueza-Fox, C., Gigli, E., de la Rasilla, M., Fortea, J., Rosas, A., 2009. Bitter taste perception in Neanderthals through the analysis of the TAS2R38 gene. Biology Letters 5: 809-811.
Lalueza-Fox, C., Römpler, H., Caramelli, D., Stäubert, C., Catalano, G., Hughes, D., Rohland, N., Pilli, E., Longo, L., Condemi, S., de la Rasilla, M., Fortea, J., Rosas, A., Stoneking, M., Schöneberg, T., Bertranpetit, J., Hofreiter, M., 2007. A melanocortin 1 receptor allele suggests varying pigmentation among Neanderthals. Science 318: 1453-1455.
Lopez-Valenzuela, M., Ramírez, O., Rosas, A., García-Vargas, S., de la Rasilla, M., Lalueza-Fox, C., Espinosa-Parrilla, Y., 2012. An ancestral miR-1304 allele present in Neanderthals regulates genes involved in enamel formation and could explain dental differences with modern humans. Molecular Biology and Evolution: mss023.
Mackelprang, R., Rubin, E.M., 2008. New tricks with old bones. Science 321: 211-212.
Meyer, M., Arsuaga, J.L., de Filippo, C., Nagel, S., Aximu-Petri, A., Nickel, B., Martínez, I., Gracia, A., de Castro, J.M.B., Carbonell, E., Viola, B., Kelso, J., Prüfer, K., Pääbo, S., 2016. Nuclear DNA sequences from the Middle Pleistocene Sima de los Huesos hominins. Nature 531: 504-507.
Millar, C.D., Lambert, D.M., 2013. Ancient DNA: Towards a million-year-old genome. Nature 499: 34-35.
Noonan, J.P., Coop, G., Kudaravalli, S., Smith, D., Krause, J., Alessi, J., Chen, F., Platt, D., Pääbo, S., Pritchard, J.K., Rubin, E.M., 2006. Sequencing and analysis of Neanderthal genomic DNA. Science 314: 1113-1118.
Pennisi, E., 2009. Sequencing Neandertal mitochondrial genomes by the half-dozen. Science 325: 252.
Read the original here:
Ancient DNA and Neanderthals | The Smithsonian Institution's Human ...