The genomes of 204 Vitis vinifera accessions reveal the origin of European wine grapes – Nature.com

Posted: December 22, 2021 at 12:44 am

Sequence variation in 204 V. vinifera genomes and ten outgroup species

We resequenced 122 accessions of V. vinifera (including sativa, sylvestris and feral) and two Vitis species at a genome coverage ranging from 8-fold to 90-fold (average 26-fold) and obtained archived sequence reads for another 82 vinifera accessions and 8 other grape species (Supplementary Data 1 and Supplementary Note 1). Using uniquely mapped paired-end reads, we identified 7,364,288 SNPs in the non-repetitive regions of the cultivated grape genomes (sativa, Supplementary Fig. 1a), of which 596,150 were private to single accessions. Forty-eight bona fide wild and 33 feral vinifera accessions added a further 492,256 SNPs (Supplementary Note 2). Validation of SNP calls showed low genotyping error rates for homozygous and heterozygous sites (0.00013% and 0.01019%, respectively; Supplementary Methods12, Supplementary Note 3, and Supplementary Figs. 2-3).

We used a subset of 5,925,766 polymorphic sites that were informative in the outgroup Muscadinia rotundifolia as well as in a set of eight American and Asian Vitis species to determine the mutation direction, the unfolded site frequency spectrum, and the strength and direction of selective pressure in cultivated varieties by mutation age and mutation type (Supplementary Note 4 and Supplementary Fig. 1b, c). A relatively large proportion of the SNPs (11.9%) that are polymorphic in vinifera predate the speciation event that gave rise to vinifera (trans-specific SNPs), suggesting largely incomplete lineage sorting in the genus Vitis (Supplementary Fig. 4). Only 8.2% of the SNPs found in sylvestris are not present in sativa, consistent with the high level of shared variation between wild and cultivated grapes that is expected under a scenario of extensive gene flow and/or a moderate bottleneck during domestication. The cultivated varieties have a nucleotide diversity of π = 7.29 × 10⁻³ and highly heterozygous genomes (Supplementary Fig. 5), with a maximum of 97.1% of total genome length and 96.8% of genes in heterozygous condition in Sauvignon Blanc (Supplementary Fig. 6), despite a mating system dominated by cleistogamy and self-compatibility. The nucleotide diversity in the wild accessions was π = 3.80 × 10⁻³. While this value may be underestimated because our sampling does not capture all the diversity available in sylvestris, it still clearly shows that the domestication events that led to the cultivated varieties, unlike in other fruit crops [19,20,21], did not cause significant genome-wide losses of genetic diversity through a major genetic bottleneck, confirming and complementing previous estimates based on haplotype diversity [1].
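As a rough illustration of how a nucleotide diversity value such as those quoted above can be obtained, the following sketch computes π from alternate-allele counts at biallelic sites. It is a minimal example, not the authors' pipeline: the function names, the use of a single sample size for all sites, and the `callable_sites` denominator are assumptions made for illustration.

```python
def site_pi(alt_count: int, n_chrom: int) -> float:
    """Expected number of pairwise differences at one biallelic site.

    alt_count: copies of the alternate allele among n_chrom sampled chromosomes.
    """
    if n_chrom < 2:
        raise ValueError("need at least two chromosomes")
    if not 0 <= alt_count <= n_chrom:
        raise ValueError("alt_count must lie between 0 and n_chrom")
    ref_count = n_chrom - alt_count
    # Number of differing pairs divided by the total number of pairs.
    return 2.0 * alt_count * ref_count / (n_chrom * (n_chrom - 1))


def nucleotide_diversity(alt_counts, n_chrom: int, callable_sites: int) -> float:
    """Average pairwise differences per site over a region.

    callable_sites (hypothetical parameter) is the number of sites, variant or
    not, at which genotypes could be called; invariant sites contribute zero.
    """
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    return sum(site_pi(c, n_chrom) for c in alt_counts) / callable_sites
```

For example, `nucleotide_diversity([1, 2], n_chrom=4, callable_sites=100)` averages the two per-site values over 100 callable positions.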

We used two different data sets and two different approaches to derive inferences on the history, population structure and geographic differentiation in cultivated grapes. We first used a model-based clustering approach [22] (implemented in the software ADMIXTURE) on whole genome sequence data of 203 accessions of vinifera to infer their genetic ancestry, and a statistical model developed by Pickrell and Pritchard [23] (implemented in the software TreeMix) to infer splits and gene flow between cultivated and wild grapes (Fig. 1a). Accession KE06 was removed from this specific analysis, following the classification of this individual as a feral escapee by Liang and coworkers [18]. We then extended the ancestry and gene flow analyses to an additional set of 1241 accessions (hereafter referred to as the diversity panel), using a set of 6357 SNPs in common between the whole genome sequenced accessions and the publicly available SNP profiles of the additional accessions (Fig. 1b).

Fig. 1. a Maximum likelihood (ML) tree with four groups of cultivated varieties (Supplementary Fig. 10) and four groups of wild accessions (Supplementary Fig. 7). Ancestry composition and group sizes are illustrated in Supplementary Fig. 10. b ML tree with nine groups of cultivated varieties and seven populations of sylvestris. Ancestry composition, group sizes, explained variance and the description of sylvestris-sylvestris admixture are given in Supplementary Fig. 22. a, b Migration events are indicated by colored arrows. The color scale shows the migration weight. The scale bar shows ten times the average standard error of the estimated entries in the sample covariance matrix. Bold lines indicate the sylvestris branches of the tree. Trees represent random trees and numbers represent bootstrap support values above 70% (100 iterations) before adding migrations. Support for the migration events and the resulting predictive model is given in Supplementary Figs. 20 and 22c, Supplementary Table 1, and Supplementary Data 2.

When we applied the model-based clustering to the species germplasm WGS data (n=203, Supplementary Fig. 7), K=2 separated eastern and western ancestry. With >0.85 membership proportion, the western ancestry component defined one group that includes exclusively accessions of western sylvestris and western feral grapes. The eastern ancestry component defined the other group, which includes eastern wild and feral grapes, Caucasian wine grapes, table grapes, and cultivated varieties from across Europe's three great southern peninsulas (Iberian, Italian, and Balkan). The rest of the cultivated germplasm, represented by varieties that today are grown in Alpine countries, appears to be the result of admixture. According to Evanno's test, the statistics used to estimate the number of ancestral populations suggested that eastern and western ancestry is the main divide (Supplementary Fig. 8). According to the cross-validation error, K=3 and K=4 provide the best predictive models, with K=3 showing only a slightly higher cv-error than K=4 (Supplementary Fig. 9). The existence of up to four ancestry components in V. vinifera was also considered plausible by Liang and coworkers in the broader context of the genus Vitis [18]. With both K=3 and K=4, we confirmed the two main components that are dominant in wild grapes: one (yellow, hereafter referred to as W1) that is dominant in western sylvestris, and one (blue, hereafter referred to as W2) that is dominant in eastern sylvestris (Supplementary Fig. 7). Given that the aberrans forms of sylvestris are presumed to have gone extinct, both these components, in different proportions, should correspond to the sylvestris typica ancestry in the East and in the West. Unlike the W1 component, which is found in both eastern and western wine grapes but is predominant only in wild and feral accessions, the W2 component is predominant in Caucasian wine grapes, in table grapes and in European cultivated varieties east of 40°E longitude.
With K=4, two additional components (orange and gray) are predicted, with one (orange, hereafter referred to as C1) predominantly found with the W2 component in table grapes, and the other (gray, hereafter referred to as C2) almost exclusively found in wine grapes. These two components, which are detectable in some eastern feral grapes but not in wild grape samples, could be derived from extinct forms of sylvestris (i.e., aberrans) and deserve special attention to better understand the structure of genetic diversity in the cultivated compartment. The C2 component is detectable in cultivated varieties of the Muscat family as well as in European wine grapes. The C1 component is most frequently associated with table and wine grapes from around the Mediterranean Basin.
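The group definition used above (a single ancestry component exceeding a 0.85 membership proportion, with everything else treated as admixed) can be sketched as a simple thresholding rule on the membership vector of each accession. This is an illustrative sketch only: the function name and the convention of returning `None` for admixed accessions are assumptions, and the 0.85 cut-off is taken from the text.

```python
def assign_group(memberships, threshold=0.85):
    """Return the index of the dominant ancestry component, or None if admixed.

    memberships: list of K membership proportions for one accession,
    as estimated by a model-based clustering run.
    """
    if not memberships:
        raise ValueError("memberships must not be empty")
    # Find the component with the largest membership proportion.
    best = max(range(len(memberships)), key=lambda k: memberships[k])
    # Only accessions clearly dominated by one component get a group label.
    return best if memberships[best] > threshold else None
```

With K=2, `assign_group([0.9, 0.1])` returns component 0, whereas an accession with proportions `[0.6, 0.4]` is left unassigned.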

In order to test which scenario of ancestral populations is more consistent with the taxonomic treatment of the cultivated compartment, we applied the model-based clustering to the cultivated germplasm alone (n=123, Supplementary Fig. 10). With K=2, we separated one population containing only wine grape varieties from the Alpine countries, which includes 29.3% of all accessions and corresponds to Negrul's occidentalis, and one population that consists of 31.7% of all accessions and includes Caucasian wine grapes, table grapes and European varieties from the Southern Balkans and Iberian Peninsula. With K=2, varieties that are typical of the ecogeographical groups pontica and orientalis clustered in the same ancestral population. K=3 generated one population corresponding to occidentalis, including 20.3% of all accessions, and a divide between one population of table grapes and Caucasian wine grapes, including 14.6% of all accessions, and one population that includes varieties from the entire Balkans (including insular Greece), from Southern Italy and from the Iberian Peninsula, representing 13.0% of all accessions. Only K=4 generated a divide between table grapes and Caucasian wine grapes into two ancestral populations that correspond to Negrul's orientalis and pontica georgica, respectively. The other two ancestral populations with K=4 were represented by wine grapes from the Alpine countries (occidentalis) and by varieties from the Balkans, Greece and Southern Italy, largely corresponding to Negrul's pontica balcanica. The adoption of K=4 (Supplementary Figs. 10-12) allowed us to obtain groups that reflect, both in terms of number and composition, the divide and the stratification postulated by Negrul (orientalis, pontica georgica, pontica balcanica, occidentalis) and widely accepted in grapevine taxonomy.

TreeMix provided strong evidence for a single eastern origin for the entirety of the cultivated germplasm as well as for an origin of European wine grapes from introgression of western sylvestris individuals into the domesticated lineage of orientalis grapes (Fig. 1a). The inclusion of this single event of admixture (Supplementary Fig. 13 and Supplementary Note 5) in the model allowed 98.7% of the variance in relatedness among populations to be explained. TreeMix also suggested the occurrence of gene flow in the opposite direction, from cultivated accessions into wild populations, as a consequence of the migration of intermediate forms between wine and table grapes into western wild populations. With all the events of admixture shown in Fig. 1a, which were confirmed by a 3-population test (Supplementary Table 1), the proportion of the variance in the predicted relatedness among populations explained by the model increased to 99.8% and was resilient to different data treatments (Supplementary Figs. 14-20 and Supplementary Note 6). According to TreeMix analysis, Mediterranean wine grapes from the Balkans and Magna Graecia, largely corresponding to pontica balcanica, appear to be genetically more similar to orientalis ancestors (table grapes) than to pontica georgica ancestors (Caucasian wine grapes), in partial disagreement with Negrul's hypothesis (Fig. 1a). Principal component analysis, haplotype-based pairwise genetic distance matrices and pedigree networks (see below) lend further support to this statement.
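The logic of the 3-population test mentioned above can be illustrated with the f3 statistic: for a target population C and two candidate sources A and B, f3(C; A, B) is the average over SNPs of (c − a)(c − b), where a, b and c are allele frequencies, and a significantly negative value indicates that C is admixed between lineages related to A and B. The sketch below is a simplified version under stated assumptions: it uses raw population allele frequencies and omits the finite-sample correction and the block-jackknife standard error used in practice; the function name is hypothetical.

```python
def f3(target, source_a, source_b):
    """Naive f3(target; source_a, source_b) from per-SNP allele frequencies.

    Each argument is a list of allele frequencies (0..1), one per SNP,
    for the same allele in the three populations.
    """
    if not (len(target) == len(source_a) == len(source_b)) or not target:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average product of the frequency differences between the target
    # and each of the two putative source populations.
    total = sum((c - a) * (c - b) for c, a, b in zip(target, source_a, source_b))
    return total / len(target)
```

A target whose allele frequencies lie between those of the two sources at most SNPs yields a negative value, which is the signature of admixture that the test looks for.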

The analysis of the extended set of accessions in the diversity panel provided historical and geographical resolution to this reconstruction. With four ancestry components (Supplementary Fig. 21), we defined eight groups of well differentiated accessions in eight broad geographic areas (Supplementary Fig. 22), excluding varieties with highly admixed ancestry. Caucasian wine grapes were confirmed to be distinct from all other cultivated varieties and closely related to the local wild accessions. Table grapes as well as wine grape varieties from the Black Sea Basin, the Middle East and the Mediterranean Basin have prevalently W2 and C1 ancestry, with an increase of the C1 component going from east to west (Supplementary Fig. 22). Cultivated varieties across Europe are characterized by the increasing presence of W1 and C2 ancestry components going from south to north (Supplementary Fig. 22). TreeMix analysis and the three-population test (Supplementary Data 2) suggested that the presence of W1 ancestry is to be attributed to admixture events between Mediterranean lineages of sylvestris, either extinct or not captured by our sample, and introduced varieties most similar to those grown today in the Balkans and Magna Graecia (Fig. 1b). We found the highest W1 western sylvestris ancestry proportion in old local varieties considered today as autochthonous in the central and northern Italian peninsula, such as Enantio, Lambrusco di Sorbara, Raboso Piave, Fumat, Greco di Tufo, Aglianico, Verduzzo, Welschriesling, and in the widely grown variety Cabernet Franc, which is similar to wild forms still present in the Atlantic Pyrenees [24] (Supplementary Fig. 7). We collectively refer to this germplasm, as well as to hybrid forms classified in other papers under the designation of vigne sauvage faux (false sylvestris), as primitive European varieties, which represent the ninth group included in Fig. 1b. Admixture events may have started in Southern Europe as early as Greek and Roman times.
This scenario agrees with previous estimates that western wine grapes and table grapes have diverged for 2.6K years [17] and with our demographic model, which predicts the nadir of effective population size 2K years before the present (Supplementary Fig. 23) and suggests resumption of sexual reproduction in domesticated grapevines since Roman times. Further admixture may have later involved other sylvestris lineages more similar to those found today around the Alpine region (Fig. 1b and Supplementary Data 2).

In order to understand the consequences of post-domestication sylvestris-sativa hybridization, we used an ABBA-BABA test [25] to identify genomic regions in wine grapes from the Alpine countries that received introgression from western wild populations before spreading worldwide. We observed widespread rather than localized signals of introgression (Fig. 2a), suggesting that hybridizations occurred multiple times and left pervasive sylvestris ancestry across the genome rather than ancestry limited to specific loci under adaptive evolution. Conversely, we identified some chromosomal regions that appear to be under negative selection against the introgression of wild alleles. These regions could correspond to loci that are particularly important for quality traits. An analysis using DA distances [26] and phylogenetic trees built separately in 2368 genomic windows across the genome provided additional evidence for widespread effects of introgression, with western sylvestris contributions being detected in 37.7% of the genome (Fig. 2b).
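To make the ABBA-BABA logic concrete, the sketch below computes Patterson's D and the f_d statistic for one genomic window from derived-allele frequencies in four populations (P1, P2, P3 and an outgroup O). It is a minimal illustration, not the authors' implementation: the assignment of populations (for example P2 as Alpine wine grapes and P3 as western sylvestris) is an assumption for the example, and the adjustment applied to the f_d values plotted in Fig. 2a is not reproduced.

```python
def abba(p1, p2, p3, po):
    """Expected ABBA pattern frequency at one site (derived-allele frequencies)."""
    return (1 - p1) * p2 * p3 * (1 - po)


def baba(p1, p2, p3, po):
    """Expected BABA pattern frequency at one site."""
    return p1 * (1 - p2) * p3 * (1 - po)


def d_statistic(sites):
    """Patterson's D over a window; sites is a list of (p1, p2, p3, po) tuples."""
    num = sum(abba(*s) - baba(*s) for s in sites)
    den = sum(abba(*s) + baba(*s) for s in sites)
    return num / den if den else float("nan")


def f_d(sites):
    """f_d: excess of ABBA over BABA relative to the maximum expected
    if the donor were, at each site, the population (P2 or P3) with the
    higher derived-allele frequency."""
    num = sum(abba(*s) - baba(*s) for s in sites)
    den = 0.0
    for p1, p2, p3, po in sites:
        pd = max(p2, p3)  # per-site donor frequency
        den += abba(p1, pd, pd, po) - baba(p1, pd, pd, po)
    return num / den if den else float("nan")
```

Applied window by window (for instance in 100 Kb windows, as in Fig. 2a), positive values point to excess allele sharing between P2 and P3.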

Fig. 2. a Dots represent adjusted fd values in 100 Kb windows of non-repetitive DNA. Lines represent cubic smoothing splines of the values. b Diagram of 100 Kb chromosomal windows (in red) that show phylogenetic tree topologies with shorter genetic distance between Alpine wine grapes and western sylvestris than between Alpine wine grapes and any other cultivated group. Red triangles in a and constricted regions in b indicate the location of centromeric repeats. Source data are provided as a Source Data file.

Despite the scale of the dispersal and admixture events that have reshaped the continental diversity for millennia, as shown by the multiple lines of evidence presented so far, European wine grapes remain connected with table grapes of the Central Asian oases through a highly interconnected network of first-degree or second-degree relationships (Supplementary Fig. 24) that includes all of the 123 cultivated varieties of the WGS panel. We detected 24 parent-offspring and 4 full-sibling relationships, providing conclusive evidence for previously conflicting inferences (Supplementary Figs. 25-29 and Supplementary Note 7). In the diversity panel, 492 varieties spanning the same geographic range were interconnected by 576 parent-offspring relationships and another 122 varieties had parent-offspring relationships outside of this network, also including parent-offspring pairs between cultivated varieties and feral grapes (Supplementary Fig. 30).
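One generic way to screen for candidate parent-offspring pairs is to count opposing homozygous genotypes: a true parent and its offspring share at least one allele at every locus, so sites where one individual is homozygous for the reference allele and the other for the alternate allele should be absent apart from genotyping errors. The sketch below illustrates this check; it is an assumption-laden example rather than the method used in the paper, and the function names and the `max_rate` tolerance are hypothetical.

```python
def opposing_homozygote_rate(genotypes_1, genotypes_2):
    """Fraction of jointly called sites with opposing homozygous genotypes.

    Genotypes are coded as alternate-allele counts (0, 1 or 2);
    None marks a missing call and the site is skipped.
    """
    compared = 0
    opposing = 0
    for a, b in zip(genotypes_1, genotypes_2):
        if a is None or b is None:
            continue
        compared += 1
        if {a, b} == {0, 2}:  # no allele can be shared at this site
            opposing += 1
    if compared == 0:
        raise ValueError("no jointly called sites")
    return opposing / compared


def compatible_with_parent_offspring(genotypes_1, genotypes_2, max_rate=0.001):
    """True if the pair shows (almost) no Mendelian incompatibilities.

    max_rate is a hypothetical tolerance for genotyping errors.
    """
    return opposing_homozygote_rate(genotypes_1, genotypes_2) <= max_rate
```

Pairs that pass this screen are only candidates: full siblings and other close relatives can also show very few opposing homozygotes, so additional evidence is needed to assign the relationship.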

A principal component and coordinate analysis (PCA and PCoA, respectively, in Fig. 3 and Supplementary Fig. 31 and Supplementary Note 8) largely supported the conclusions based on the ancestry analysis about the origin of European wine grapes. The PCA in Fig. 3 shows that the unbiased set of SNPs, obtained by WGS, provided higher resolution for the separation of varieties in the two-dimensional space than the pre-ascertained SNPs used in hybridization-based genotyping. The set of sequenced varieties (Fig. 3a) captured most of the genetic diversity present in the cultivated germplasm as represented in the extended set of accessions (Fig. 3b), including the diversity present in the Iberian peninsula (Supplementary Fig. 32), a center of supposed independent domestication in the West [12]. The PCA in Supplementary Fig. 32 shows that the Iberian cultivated germplasm is more similar to table grapes and Eastern populations of sylvestris than to Western populations of sylvestris, which does not support the hypothesized event of neodomestication. These results are consistent with those obtained by Freitas and coworkers [27] from low coverage resequencing of a larger set of locally grown Iberian cultivars and local wild accessions. The PCA highlights individual accessions that may serve as illustrations of the blurred boundaries between wild and cultivated compartments. Enantio and Lambrusco di Sorbara, which are cultivated south of the Alps in the Po valley, provide an example of western wine grapes that are situated midway on the PCA plane between western sylvestris and cultivated varieties from the Balkans and Magna Graecia (Fig. 3a) and are contiguous to sylvestris accessions from the Italian peninsula (Fig. 3b), as previously observed [11].
WGS data show that ADMIXTURE membership proportions in Enantio and Lambrusco di Sorbara (Supplementary Fig. 7) and the level of haplotype sharing with accessions of Western sylvestris (Supplementary Fig. 33) are fully compatible with these varieties representing sylvestris-sativa first generation hybrids or very early backcross generations. The feral accession KE06 from the Ketsch island on the Rhine river (Germany) shows a similar genetic constitution, resulting from a possible cross between an escapee from the vineyards and a genuine autochthonous sylvestris, as previously suggested [18], but represents an opposite case of classification (lambrusque métis), presumably because the accession was found outside of a vineyard. There is no evidence of parent-offspring relationships between KE06 and cultivated varieties of the WGS panel, but we detected the highest level of haplotype sharing with Savagnin Blanc and Pinot Noir (across 40.6 and 39.8% of the diploid genome length, respectively, Supplementary Fig. 34). The Manseng family, which was represented in the diversity panel by Gros Manseng, Petit Manseng and Riesling Bleu, is located midway in the PCA plane between the Pinot/Savagnin Blanc parent-offspring pair and French/German populations of sylvestris, as recently observed [28], and is in close proximity to an accession classified by Laucou and coworkers [15] as a French sylvestris (B00ERBY). The pairs Pinot/B00ERBY, Savagnin Blanc/Petit Manseng, Savagnin Blanc/Riesling Bleu (collection Oberlin), and Petit Manseng/Gros Manseng share a parent-offspring relationship (Supplementary Fig. 30), indicating a possible origin of Petit Manseng, Riesling Bleu and B00ERBY from a cross between a cultivated and a wild accession.
Similar hybridization events between cultivated germplasm that was introduced from the center of domestication in the East and local sylvestris may also have occurred elsewhere in Southern Europe, generating intermediate forms that in some places thrive as seedlings in the wild (e.g., feral forms on the Adriatic coast of Croatia, Supplementary Fig. 33) and in others are vegetatively propagated for cultivation (e.g., some cultivars in the Iberian peninsula as shown in Supplementary Fig. 32 and proto-varieties in Montenegro [29]). Our analysis of both the WGS panel and the diversity panel did not reveal any instance of cultivated varieties carrying pure Western sylvestris ancestry, which would be expected in the scenario of an independent domestication event.

Fig. 3. a PCA of 204 V. vinifera whole genome resequenced genotypes based on 7.9M SNPs. b PCA of 1445 V. vinifera genotypes based on a subset of 6357 pre-ascertained SNPs in the diversity panel and in common with the WGS panel. Sequenced samples are indicated as open (cultivated varieties) and solid (wild accessions) squares. Additional cultivated varieties are indicated as gray crosses in b. Samples with uncertain assignment in their literature reports are reported as faux sauvage: 1, sylvestris FR B00ERBY [15]; 2, KE06 [18]; 3, Vigne sauvage faux Mouchouses 1; 4, Tighzirt 1; 5, Fethiye 58 64 [15] and collectively indicated as solid circles in b. The 2-letter codes indicate countries of origin: CH Switzerland, DE Germany, DZ Algeria, ES Spain, FR France, GE Georgia, GR Greece, HU Hungary, IT Italy, MA Morocco, SK Slovakia, TN Tunisia, TR Turkey. Source data are provided as a Source Data file.

The history of grape cultivation combines local adaptation with widespread vegetative propagation and movement, with varieties that have achieved broad or worldwide distribution and others that have largely remained confined in narrow geographic areas. Using a set of 605 cultivated varieties that provided a nearly proportional representation of those in cultivation in each country (Supplementary Table 2), we associated the individual accessions with a precise geographic location represented by either the most ancient known area of cultivation (for widely spread and so-called international varieties) or the most typical or renowned growing region at the present time (for locally grown varieties). Fig. 4 shows the geographical distribution of genetic ancestry components for the cultivated compartment (Fig. 4a) and for wild populations (Fig. 4b), respectively. The top two wine-producing countries, France and Italy, exploited most of the diversity of western wine grapes (Fig. 4c, d). Italian viticulture showed the highest within-country variation both in the intensity of the local major ancestry component (Fig. 4c) and in the assortment of all four ancestry components (Fig. 4d). This was already apparent from the very high proportion of admixed ancestry varieties observed among those originating from the Italian peninsula (Supplementary Fig. 22), which therefore seems to be home not only to varieties that differ in their ancestry but also to crosses that generated highly admixed ancestries. This is likely due both to the historical presence of hubs for maritime and land trade routes between the East and the West and to the ample latitudinal and climatic range of wine growing regions (from 36° to 46°N) that encompass USDA hardiness zones from 7 to 10.
Spain and Portugal, the third and fifth wine-producing countries, respectively, instead rely on a national germplasm largely based on high C1 ancestry that is only admixed with C2 ancestry in northern Portugal, Galicia, and southern Pyrenees, presumably as a consequence of massive natural crossing with descendants of Savagnin Blanc (Supplementary Fig. 30). Germany is the fourth wine-producing country in Europe, with several wine regions located at the northern limits of grape cultivation, and has, therefore, more limiting growing conditions and a reduced variation in proportional ancestry components across the country. There is a clear pattern of ancestry component proportions that is dictated by latitude across the top wine-producing countries, most notably across Italy (Supplementary Fig. 35), which seems to result from environmental limitations preventing large-scale, within-country geographical displacement. Nevertheless, there are notable exceptions of cultivated varieties with typical southern ancestry that are traditionally in use at northern latitudes. For instance, the variety Garganega, once extensively grown in warm climates of Sicily under the synonym of Grecanico Dorato, rose to fame for quality wines only after its long-range movement to Alpine growing regions.

Fig. 4. Continental patterns of ancestry components in cultivated (a) and wild (b) grapevines and nationwide patterns of wine grape ancestry in the top five wine-producing countries in Europe (c, d). Colors represent W2 ancestry (blue), C1 ancestry (orange), C2 ancestry (gray), and W1 ancestry (yellow). Each ancestry component is plotted separately (a, b). Intensity of the main ancestry component is plotted (c). Overlay of all ancestry components is plotted (d). The collection site of wild accessions is indicated by black dots (b). The most representative site of cultivation of each variety is indicated by black dots (c, d). Abbreviations of top wine-producing countries: Italy IT, France FR, Spain ES, Germany DE, Portugal PT. Source data are provided as a Source Data file.

We therefore tested whether local adaptation to climate conditions may have contributed to shaping the geographic distribution of genetic diversity by using a generalized linear model (GLM). For each cultivated variety and the corresponding geographical location, we associated the genetic ancestry coefficients with 29 bioclimatic variables (Supplementary Table 3) of the location using a spatial resolution of 1 km², under the assumption that each variety that has been retained in cultivation in a specific site, where many others may have been discarded, may recapitulate genotypes suitable for the local climate conditions. Seven climatic variables showing <0.70 Spearman correlation with one another explained from 41 to 52% of the variance in the geographic distribution of each ancestry component (Supplementary Table 3). The W2 ancestry showed positive association with annual temperature range and negative association with seasonal precipitation. The C1 and C2 ancestry components showed associations with annual mean temperature in opposite directions (positive and negative, respectively). The W1 ancestry was most significantly and positively associated with seasonal precipitation. The associations between ancestry components and local climate variables are so tightly related to the geographical location that they rapidly decay under simulations that systematically displaced each genotype outside of the most traditional site of cultivation by 20, 50, and 100 km in all latitudinal and longitudinal directions (Supplementary Table 4).
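The selection of weakly correlated climatic predictors described above can be sketched as a greedy filter on pairwise Spearman correlations. This is an illustrative example under several assumptions: the greedy order-dependent strategy, the use of the absolute value of the correlation, and the variable names in the usage note are all hypothetical; only the 0.70 threshold comes from the text. Constant variables are not handled and would cause a division by zero.

```python
from math import sqrt


def ranks(values):
    """Average 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def select_uncorrelated(variables, threshold=0.70):
    """Greedily keep variables whose |rho| with every kept variable is below threshold.

    variables: dict mapping a variable name to its values across locations.
    """
    kept = []
    for name, values in variables.items():
        if all(abs(spearman(values, variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

For instance, with `{"bio1": [1, 2, 3, 4], "bio2": [2, 4, 6, 9], "bio3": [2, 1, 4, 3]}` the second variable is dropped because it is perfectly rank-correlated with the first, while the third is retained.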

Artificial selection for specific desired traits during the domestication process results in selective sweeps that lead to local reductions of genetic variation. Loss of nucleotide and haplotype diversity (Supplementary Fig. 36a) as well as runs-of-homozygosity (Supplementary Fig. 5) were detected in cultivated varieties across three loci on chromosomes 2, 15, and 17 when they were compared to sylvestris (Supplementary Fig. 36b, c). These strong signals of selective sweeps presumably originate from strong positive selection of favorable alleles (Supplementary Fig. 36d, f, h) and result in persistent linkage disequilibrium (r²) and extended hitchhiking (Supplementary Fig. 36e).
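The linkage disequilibrium measure r² referred to above can be computed for a pair of biallelic sites from phased haplotypes as D²/(pA(1 − pA)pB(1 − pB)), where D = pAB − pA·pB. The sketch below is a minimal illustration that assumes phased data coded as 0/1 alleles; how the authors estimated r² from their data is not specified here, and the function name is hypothetical.

```python
def ld_r2(site_a, site_b):
    """Squared correlation (r^2) between two biallelic sites.

    site_a, site_b: lists of 0/1 alleles, one entry per phased haplotype,
    in the same haplotype order for both sites.
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("inputs must be non-empty and of equal length")
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    # Frequency of haplotypes carrying allele 1 at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom
```

Values close to 1 over long physical distances, as around a swept locus, indicate that alleles have hitchhiked together with the selected variant.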

The reduction of diversity in the lower arm of chromosome 2 is a breeding sweep known to result from positive selection for two nearby loss-of-function mutations causing loss of anthocyanin pigmentation in berry skin [30]. Homozygous recessive genotypes are so-called white varieties, which were the only option for the production of white wines before the advent of modern technologies to limit skin contact of crushed berries with their juice. The quest for this trait brought about a severe loss of diversity at nearby distal loci because, while LD dropped rapidly to background values on the proximal side of the locus, it persisted for 4 Mb on the distal side (Supplementary Fig. 36e).

The reduction of diversity on chromosome 15 resides in a pericentromeric region (Supplementary Fig.37). Contrary to breeding and domestication sweeps that are characterized by both low haplotype diversity as well as high frequency of homozygous varieties as a result of the positive selection for one favorable mutation, we observed in this case only a marked reduction of haplotype diversity, upstream of the centromere. We also observed an extreme segregation distortion immediately downstream of the centromere in the selfed progeny of Pinot Noir with a complete lack of one class of homozygous seedlings, compatible with a lethal recessive variant that was masked in Pinot Noir by the presence of one copy of the reference haplotype. It is thus possible that favorable and unfavorable variants are in strong linkage and in repulsion across the centromere in this region and are maintained in heterozygous state in the population of cultivated varieties.

The sweep on chromosome 17 has been proposed by Myles and coworkers [1] as a footprint of domestication. Within a 2 Mb valley of haplotype diversity in the cultivated germplasm (Fig. 5a), we identified the nadir of haplotype diversity in a 100 Kb interval carrying a total of 13 predicted genes in the most common haplotype, five of which form a cluster of tandemly arranged isopiperitenol/carveol dehydrogenases (Fig. 5b). In addition to the most common haplotype (H1-A), which corresponds to the PN40024 reference sequence, we identified 18 other haplotypes with minor frequencies in the population (Supplementary Fig. 38 and Fig. 5c). The phenotypic traits that were subject to selection during domestication [31] were presumably related to flower sex determination, with nearby mutations within a sex locus in the upper arm of chromosome 2 involved in the transition from dioecious plants in sylvestris to hermaphrodites in sativa [32], and to berry morphology, with an increase in berry size and flesh-to-seed ratio going from sylvestris to sativa that made the grapes more attractive for human consumption and more amenable to wine making, with their genetic determinants still unknown. Quantitative trait loci (QTLs) controlling a series of berry traits in wine as well as in table grapes have been found overlapping with the sweep region on chromosome 17 [33,34].
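A valley of haplotype diversity such as the one described above can be located by computing haplotype diversity in small blocks of consecutive variant sites along the chromosome (the caption of Fig. 5a mentions blocks of five variant sites). The sketch below uses Nei's estimator H = n/(n − 1) · (1 − Σ p_i²) on phased haplotypes; the estimator choice, the string encoding of haplotypes and the function names are assumptions made for illustration.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity for a list of hashable haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - homozygosity)


def block_diversity(phased_haplotypes, block_size=5):
    """Haplotype diversity in non-overlapping blocks of consecutive variant sites.

    phased_haplotypes: list of equal-length strings, one per phased chromosome,
    with one character per variant site. Trailing sites that do not fill a
    whole block are ignored.
    """
    n_sites = len(phased_haplotypes[0])
    values = []
    for start in range(0, n_sites - block_size + 1, block_size):
        block = [h[start:start + block_size] for h in phased_haplotypes]
        values.append(haplotype_diversity(block))
    return values
```

Averaging the resulting values over runs of consecutive blocks and smoothing them gives a chromosomal profile in which a sweep appears as a prolonged dip.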

Fig. 5. a Chromosomal plot of haplotype diversity. Haplotype diversity was calculated in blocks of five consecutive variant sites and plotted as the average of 50 consecutive blocks (blue dots) and a cubic smoothing spline (black line). The scale indicates Mb. The yellow background indicates the interval magnified in b. b V2.1 gene models (exons in blue), manually curated gene predictions (green) in the isopiperitenol/carveol dehydrogenase gene cluster (gene IDs 7-11), annotated transposable elements (light gray). c Frequency of 19 haplotypes shown in Supplementary Fig. 38 in 196 grapevine accessions. d Genotype frequency in 121 cultivated varieties. e VIT_217s0000g05570 (gene 6 in b) gene phylogeny. Numbers indicate the proportion of bootstrap trees supporting that clade. f ASE of the LRR receptor kinase VIT_217s0000g05570 alleles in representative varieties of 15 haplotypic combinations, in softening berries (lower panel) and leaves (upper panel). The asterisks indicate statistically significant ASE levels (p-value <0.05) according to a Stouffer's meta-analysis with weight and direction effect using n=2 biologically independent samples. Cumulative expression is reported for each haplotypic combination lacking exonic SNPs in VIT_217s0000g05570 (H1-A/H1-G, H1-A/H10, H1-A/AX) and for a control variety homozygous for the H1-A haplotype. Gene expression for three haplotype combinations (H1-A/H10, H1-A/H6, H1-A/H4) was quantified in leaves of three different representative varieties (Tschvediansis Tetra, Picolit, Lambrusco Grasparossa) with the same genotype with respect to those used for berry gene expression. Source data of gene expression are provided as a Source Data file.
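The Stouffer's meta-analysis with weight and direction mentioned in the caption can be sketched as follows: each two-sided p-value is converted to a z-score carrying the sign of the observed effect, and the z-scores are combined as Z = Σ w_i z_i / sqrt(Σ w_i²). This is a generic sketch under stated assumptions: the weights actually used by the authors are not given here, so equal weights are the default, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def stouffer_signed(p_values, directions, weights=None):
    """Combine two-sided p-values with effect directions (Stouffer's method).

    p_values: two-sided p-values, each in (0, 1].
    directions: +1 or -1 per test, the sign of the observed effect.
    weights: optional positive weights (assumed equal if omitted).
    Returns (combined z-score, combined two-sided p-value).
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    if not p_values or not (len(p_values) == len(directions) == len(weights)):
        raise ValueError("inputs must be non-empty and of equal length")
    z_scores = []
    for p, sign in zip(p_values, directions):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
        # Two-sided p-value -> magnitude of z, then attach the effect sign.
        z_scores.append(sign * _STD_NORMAL.inv_cdf(1.0 - p / 2.0))
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / sqrt(sum(w * w for w in weights))
    p_combined = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))
    return z, p_combined
```

Two replicates that each reach p = 0.05 in the same direction reinforce each other, whereas the same p-values with opposite directions cancel out, which is the point of including the direction of the effect.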

Two nearby genes in the sweep region captured our attention because they show the lowest level of diversity (Fig. 5e and Supplementary Fig. 39a) and because they show a hugely increased transcript abundance in the berry in the haplotypes found in the cultivated forms in comparison to those found in the wild ones (Fig. 5f and Supplementary Fig. 39b). These genes (VIT_217s0000g05570 and VIT_217s0000g05580, corresponding to gene numbers 6 and 7 in the diagram of Fig. 5b) are arranged in a head-to-tail orientation with less than 100 bp separating their transcriptional units (Supplementary Fig. 40) and encode a leucine-rich-repeat receptor-like kinase (LRR-RLK) and the first isopiperitenol/carveol dehydrogenase in the tandemly repeated cluster, respectively. We used allele-specific analysis of gene expression to determine the steady-state transcript abundance of the two genes in leaves and berries for a large subset of the 19 haplotypes identified in the region. While no major differences in expression between haplotypes are detected in leaves (Fig. 5f and Supplementary Fig. 39), only one haplogroup, including the most common haplotype and other highly similar haplotypes that are present only in cultivated varieties, seems to produce detectable levels of transcripts in the berries, at least for the kinase gene (Fig. 5f). Haplotypes that are found in the wild accessions, on the contrary, all show transcript levels that are very close to zero. The most frequent (76%) haplotype H1-A is present in 95.8% of cultivated varieties in either homozygous (55.4%) or heterozygous (40.4%) condition, possibly indicating a dominant or semi-dominant mode of action of the selected allele (Fig. 5d). The haplogroup comprising H1-A is represented in 98.3% of cultivated varieties.
The only exceptions among cultivated varieties are represented by Berzamino, an almost abandoned wine grape once grown in Northeastern Italy35,36, and Gordin Verde, a wine grape from Moldova, unrelated to Berzamino (Supplementary Figs. 24 and 30). Despite both varieties having domesticated traits, Berzamino is homozygous for the H7 haplotype that is predominant in wild accessions and consequently has extremely low transcript levels for both genes in the berry (Fig. 5d–f and Supplementary Fig. 39). Gordin Verde is heterozygous for two haplotypes (H1-F/H2-A) that are normally found in other cultivated varieties in combination with H1-A and provide low transcript levels for the kinase in the berry (Fig. 5d–f). The haplotypes found in all other domesticated varieties that do not have at least one H1-A copy all share a region of sequence identity that comprises the 5′ intergenic region of the kinase gene, the kinase and the dehydrogenase genes, forming the H1-A haplogroup, and have high levels of expression of the kinase gene in the berry (Fig. 5e). The difference in organ-specific and allele-specific expression is even more dramatic for the isopiperitenol/carveol dehydrogenase, with extremely high levels of transcript being detected for the cultivated haplotypes in the berry (Supplementary Fig. 39). While there is in general a good correlation between transcript levels of the kinase and the dehydrogenase genes, as if there were a common regulatory element capable of affecting the expression of both genes, there are a few haplotypes identified in cultivated varieties (H2-A, H3, and especially H1-F) that show very low levels of the kinase transcript (Fig. 5f) but detectable levels of the dehydrogenase transcript (Supplementary Fig. 39). The expression pattern of the two genes in the berry provided by the selected haplotypes appears to be tightly developmentally regulated (Supplementary Figs. 41–42).
Expression is low during the initial phase of berry growth, which occurs mostly by cell division and partly by cell enlargement37, and sharply increases at berry softening, which marks the inception of ripening about one week before color transition (véraison) and resumption of berry growth38. This second phase of increase in berry size, unlike the first one, occurs exclusively by cell expansion39,40. A genome-wide association study (GWAS, Supplementary Fig. 43) and an association analysis performed with one of the SNPs that recapitulates the expression differences among haplotypes for the kinase (Fig. 6) reveal a significant association between SNPs in the locus and the seed-to-berry ratio at the inception of ripening, with all cultivated varieties showing lower ratios than the wild ones, and Berzamino and Gordin Verde showing high ratios among the cultivated ones. Seed development is the chief factor promoting berry growth41. Berry weight, which is commonly measured as a proxy for berry size, is positively correlated with seed content (seed fresh weight, SFW), and QTLs for extreme variation in berry size and seed content colocalize42 on chromosome 18 with a seed morphogenesis regulator MADS-Box gene43. Doligez and coworkers34 showed, additionally, that the QTL overlapping with the sweep region on chromosome 17 explains the residual variation in berry weight not explained by seed content. It is therefore possible that factors in this region promote pericarp growth at a rate that is more than proportional to the increase in SFW, which is reflected by a lower seed-to-berry ratio. The selected haplotypes are associated with a change in berry morphology towards a larger pericarp per unit of SFW. This leads to an increase in size of the fleshy and edible part of the berry, making it more attractive for fresh consumption.
It also decreases more than proportionally the seed content released from crushed berries into the must, which greatly improves tannin chemistry and textural sensory attributes in wines. This effect is due to a reduction of the leakage during maceration of astringent and bitter condensed tannins with a low degree of polymerization from seeds, in favor of the extraction of more palatable condensed tannins with a higher degree of polymerization from skins. The kinase gene encodes an LRR-RLK that is orthologous to a kinase in Arabidopsis (At5G62710) that is expressed in ovaries and in vascular tissues44 and that shows high homology with the FEI2 kinase. RLKs play a pivotal role in sensing external stimuli, activating downstream signaling pathways and regulating cell behavior involved in response to pathogens, growth, and developmental processes in plants. The FEI2 kinase has been shown in Arabidopsis45 to promote anisotropic cell expansion through a modulation of cell wall function, a role that FEI2 fulfils by interacting directly with 1-aminocyclopropane-1-carboxylic acid (ACC) synthase, a key enzyme for ethylene biosynthesis. The grape berry is considered a non-climacteric fruit, lacking a concomitant increase in respiration rate and ethylene biosynthesis at the onset of ripening, but the rise in endogenous ethylene production that is consistently observed a few days before the inception of the second phase of berry growth regulates several aspects of ripening46, including an increase of berry diameter that can be further augmented by the application of exogenous ethylene at véraison47. In light of the specific function of the LRR-receptor kinase ortholog in other plants, it is possible that the haplotypes selected during grape domestication may have provided cultivated varieties with new opportunities for ethylene-related cell expansion during berry ripening thanks to the greatly increased expression of the LRR-receptor kinase gene.
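As a quick consistency check on the H1-A figures quoted earlier in this section (55.4% of cultivated varieties homozygous, 40.4% heterozygous), the carrier frequency and the implied haplotype frequency can be worked out directly. The sketch below is plain arithmetic on the reported percentages; the function names are hypothetical, and whether the 76% haplotype frequency in the text was computed over exactly the same set of accessions is not stated.

```python
def carrier_frequency(hom_pct, het_pct):
    """Percentage of varieties carrying at least one copy of the haplotype."""
    return hom_pct + het_pct


def haplotype_frequency(hom_pct, het_pct):
    """Percentage of chromosome copies carrying the haplotype in diploids.

    Homozygotes contribute two copies out of two, heterozygotes one out of two.
    """
    return hom_pct + het_pct / 2.0


# Values reported in the text for H1-A in cultivated varieties.
hom, het = 55.4, 40.4

carriers = carrier_frequency(hom, het)
freq = haplotype_frequency(hom, het)

print(f"Varieties with at least one H1-A copy: {carriers:.1f}%")
print(f"Implied H1-A haplotype frequency:      {freq:.1f}%")
```

The carrier figure reproduces the 95.8% reported in the text, and the implied haplotype frequency of about 75.6% is in line with the quoted 76%.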

a Association between an A→T mutation in the VIT_217s0000g05570 gene, which recapitulates the increase in berry-specific expression of the kinase, and seed-to-berry ratio in hard berries prior to softening, soft berries collected from the same bunch, and their average (as a proxy for the end of the first phase of berry growth). Box-plots show 88 accessions (green dots) sorted by their genotype at SNP_chr17:6,079,793. Accessions with a missing genotype at this SNP were classified as AA, AT or TT based on their alternate/alternate, alternate/reference and reference/reference genotypes, respectively, at the variant sites chr17:6,080,166; 6,079,793; 6,080,193; 6,080,258; 6,080,447; 6,080,449, which are all in LD with chr17:6,079,793 in the H1-A haplotype. b Variation in soluble solids concentration in the same berries and accessions as in a. Red dots indicate values in hard berries of sylvestris V395. Yellow dots indicate values in eastern feral grapes. Cyan dots indicate values in Berzamino and Gordin Verde. Boxes indicate the first and third quartiles, the horizontal line within the boxes indicates the median and the whiskers indicate 1.5× the interquartile range. Source data are provided as a Source Data file.
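The fallback classification described in panel a can be sketched as below. The positions and the alternate/reference to AA/AT/TT mapping come from the caption; everything else is an assumption made for illustration: the caption does not say how calls at several linked sites were combined, so this sketch simply takes the first non-missing site in the listed order, and the data layout and the `classify` function are hypothetical.

```python
# Linked variant sites on chr17, in the order given in the caption.
LINKED_SITES = [6_080_166, 6_079_793, 6_080_193, 6_080_258, 6_080_447, 6_080_449]

# Caption mapping: alt/alt -> AA, alt/ref -> AT, ref/ref -> TT.
LINKED_TO_CLASS = {
    ("alt", "alt"): "AA",
    ("alt", "ref"): "AT",
    ("ref", "ref"): "TT",
}


def classify(focal_call, linked_calls):
    """Return 'AA', 'AT', 'TT', or None if no usable call exists.

    focal_call:   genotype at chr17:6,079,793 as a two-letter string, or None.
    linked_calls: dict mapping position -> pair of 'ref'/'alt' labels, or None.

    Assumption: the first non-missing linked site decides the class; the
    source does not describe a consensus rule.
    """
    if focal_call is not None:
        # Normalize allele order so that 'TA' and 'AT' are treated alike.
        return "".join(sorted(focal_call))
    for pos in LINKED_SITES:
        call = linked_calls.get(pos)
        if call is None:
            continue
        return LINKED_TO_CLASS[tuple(sorted(call))]
    return None


# Hypothetical accessions, for illustration only.
direct = classify("TA", {})
via_first = classify(None, {6_080_166: ("ref", "alt")})
via_later = classify(None, {6_080_166: None, 6_080_193: ("alt", "alt")})
unresolved = classify(None, {})
```

A stricter variant would require all non-missing linked sites to agree before assigning a class; which rule was actually used is not stated in the caption.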
