Title: Development and application of a 20K SNP array in potato
Abstract: This thesis describes investigations into various applications of genome-wide SNP (single nucleotide polymorphism) markers. The set of SNP markers was identified using a GBS (genotyping by sequencing) strategy. The resulting dataset of 129,156 SNPs across 83 tetraploid varieties was used directly to map traits, and also served as the basis for the development of a 20K SNP array in potato (Solanum tuberosum L.). Subsequently this array, named SolSTW, was used to collect genotypic data from 569 potato genotypes. This dataset offered insight into the breeding history of potato, population structure, linkage disequilibrium (LD) and the potential of GWAS (genome-wide association studies) in potato. In Chapter 2 we describe the development of the SolSTW 20K Infinium SNP array. One third of the SNPs on this array originate from the well-known SolCAP 8303 SNP array. The other SNPs are a subset of those found in a targeted re-sequencing project of 83 tetraploid potato varieties. Because of the high SNP density in potato, only a limited number of SNPs are suitable for assay development on a SNP array. Unsurprisingly, flanking SNPs contribute to assay failure, particularly for assays with SNPs located in introns. We used fitTetra software to cluster the captured signals of each marker into the expected five genotypic classes (nulliplex, simplex, duplex, triplex, quadruplex), resulting in a dataset with 14,530 SNP markers. Subsequently the genotypic data obtained with the SolSTW array were used to characterize a set of 569 potato varieties, advanced breeding clones and progenitors. This resulted in the identification of several footprints of potato breeding. Firstly, SNPs were dated: the year of market release of the first variety showing polymorphism at a SNP locus is taken as an indication of the ancestry of that SNP.
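The SNP dating step described above can be illustrated with a minimal sketch. It is not the procedure used in the thesis: the panel, variety names, release years and dosages are hypothetical toy values, and the rule that a variety "shows polymorphism" when it carries both alleles (dosage 1 to 3) is an assumption made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy panel: variety -> (year of market release, dosage per SNP).
# Dosage is the number of copies of the alternative allele in a tetraploid (0..4).
PANEL: Dict[str, Tuple[int, List[int]]] = {
    "heirloom_A": (1890, [1, 0, 0]),
    "variety_B": (1935, [2, 0, 4]),
    "variety_C": (1960, [0, 1, 0]),
    "variety_D": (1985, [3, 2, 0]),
}


def is_polymorphic(dosage: int) -> bool:
    """Assumption: a variety shows polymorphism when it carries both alleles."""
    return 0 < dosage < 4


def date_snps(panel: Dict[str, Tuple[int, List[int]]]) -> List[Optional[int]]:
    """Return, per SNP, the earliest release year of a polymorphic variety."""
    n_snps = len(next(iter(panel.values()))[1])
    dates: List[Optional[int]] = [None] * n_snps
    for year, dosages in panel.values():
        for i, dosage in enumerate(dosages):
            if is_polymorphic(dosage) and (dates[i] is None or year < dates[i]):
                dates[i] = year
    return dates


snp_dates = date_snps(PANEL)

# SNPs whose first polymorphic variety was released after 1945.
post_1945 = [i for i, year in enumerate(snp_dates) if year is not None and year > 1945]
```

In this toy panel the third SNP is never heterozygous in a single variety, so it receives no date under the stated assumption; a different definition of polymorphism (for example, any copy of the alternative allele) would change that outcome.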
By dating SNPs in this way, we identified SNPs with an ancestry tracing back to heirloom varieties, and SNPs (post-1945 SNPs) tracing back to wild species used in modern introgression breeding. Secondly, the changes in allele frequency were calculated over time. Most SNPs show a relatively stable allele frequency over time, and very little genetic variation has been removed from the gene pool of potato, i.e. genetic erosion is almost absent. We therefore conclude that 100 years of breeding have not removed non-beneficial genetic variation. Only a limited number of SNPs show a rapid increase in allele frequency, which can be explained by positive selection for disease resistance by breeders, or by the more frequent use of several founders. A better understanding of the genome-wide decay of linkage disequilibrium (LD) and of population structure offers the knowledge needed to perform a genome-wide association study (GWAS) and interpret its results (Chapter 3). LD is a complex phenomenon, and the influence of the factors shaping LD in tetraploids has hardly been studied. We therefore used simulated data to disentangle, and thereby understand, the often-confounded factors underlying LD-decay. We simulated datasets differing in the number of haplotypes in a population and in the percentage of haplotype-specific SNPs. In these simulations we observed that the choice of estimator has a major effect on the resulting LD-decay estimate, while the true LD-decay remains the same. Based on the simulations we conclude that a 90th percentile and a so-called D1/2 (the distance at which 50% of the initial LD has decayed) performed best for estimating and comparing LD-decay in potato. To understand the various aspects of LD-decay in the variety panel of 537 varieties, the panel was subdivided into several groups based on the age of a variety and on the population structure groups.
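The two estimators named above, a 90th percentile and D1/2, can be sketched as follows. This is one possible reading under stated assumptions, not the thesis procedure: pairwise r² values are assumed to be already available, distances are binned at a fixed width, the percentile uses the nearest-rank method, the initial LD is taken from the first distance bin, and decay is measured towards zero without background correction. All data and function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def percentile(values: List[float], q: int) -> float:
    """Nearest-rank percentile (q in 1..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def ld_decay_curve(pairs: List[Tuple[float, float]], bin_width: float,
                   q: int = 90) -> Dict[int, float]:
    """Bin (distance, r2) pairs by distance; return the q-th percentile per bin."""
    bins: Dict[int, List[float]] = defaultdict(list)
    for distance, r2 in pairs:
        bins[int(distance // bin_width)].append(r2)
    return {b: percentile(bins[b], q) for b in sorted(bins)}


def d_half(curve: Dict[int, float], bin_width: float) -> Optional[float]:
    """Midpoint of the first bin whose LD is at or below half the initial LD."""
    initial = curve[min(curve)]  # assumption: first bin gives the initial LD
    for b in sorted(curve):
        if curve[b] <= initial / 2:
            return (b + 0.5) * bin_width
    return None  # LD never drops to half within the observed range


# Hypothetical toy data: distances in Mb, ten r2 values per 1 Mb bin.
levels = [0.8, 0.6, 0.35, 0.3, 0.2]
toy_pairs = [(b + 0.05 * j, level * (j + 1) / 10)
             for b, level in enumerate(levels) for j in range(10)]

curve = ld_decay_curve(toy_pairs, bin_width=1.0)
d_half_value = d_half(curve, bin_width=1.0)
```

Using a high percentile rather than the mean keeps the estimate from being dominated by the large mass of marker pairs with r² near zero, which is one reason estimator choice can change the apparent LD-decay even when the underlying decay is identical.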
Subdividing the panel revealed LD-decay over time, i.e. in relatively young varieties the average size of the LD-blocks is smaller. The differences between subpopulations were smaller and are most likely an effect of population structure. We also observed very long LD-blocks caused by introgression breeding, and found that different a priori MAF thresholds can also influence the outcome of LD-decay estimation. With both LD-decay and population structure defined, a genome-wide association study (GWAS) was conducted (Chapter 4). For this purpose α-solanine and α-chaconine were measured in potato tubers. Subsequently the sum of both (total SGA) and the ratio between the two were used to discover QTLs for these traits in a GWAS. Additionally we used three bi-parental populations to validate the GWAS results. Total SGA content was confounded with population structure, and it was therefore difficult to explain all phenotypic variation with SNP markers. Two QTLs (Sgt1.1 and Sgt11.1) were identified, which could be validated in one of the segregating populations. The ratio between α-solanine and α-chaconine was not confounded with population structure, and its analysis resulted in the identification of two major-effect QTLs (Sgr7.1 and Sgr8.1) located near the candidate genes SGT1 and SGT2, which are known to be responsible for the final steps towards either α-solanine or α-chaconine. The QTL Sgr8.1 could be validated; however, similar phenotypes were explained by different haplotypes in two populations. We show that population structure, low-frequency alleles and genetic heterogeneity may explain to some degree the missing heritability in GWAS in potato. In Chapter 5 we describe how the method of graphical genotyping, which is widely used in diploid bi-parental populations, can be applied to a panel of tetraploid varieties.
We show that a few discrete filtering steps in Excel can be used to display patterns that are visual representations of introgression segments and of the locations of historical recombination events. Using this method we identified introgression segments from Solanum vernei, including the Gpa5 locus on chromosome 5, and a Solanum stoloniferum introgression segment including a gene involved in resistance to Potato Virus Y on chromosome 11. This method requires that the haplotypes causing the phenotypic effect be identical by descent (IBD). In the final chapter (Chapter 6) the results of Chapters 2 to 5 are discussed. We look ahead to how our results can be used in future research and applied in marker-assisted breeding. Additionally, some new GWAS results are presented for tuber flesh colour, foliage maturity and resistance to Globodera pallida pathotype 3.