GENOME WIDE ASSOCIATION MAPPING IN ARABIDOPSIS THALIANA (NIH R01 GM073822)

 

PI Justin Borevitz (UChicago); coPI Magnus Nordborg (USC) (NSF grant); coPI Paul Marjoram (USC); coPI Sebastian Zoellner (U Michigan)

 

Project Description

Our proposal will investigate the genetic and molecular basis of complex traits and their interactions with the environment using the model plant Arabidopsis thaliana. We will implement a multi-use, high-density oligonucleotide tiling array for whole-genome resequencing. The sample will include a largely unstructured core set of 384 wild A. thaliana genomes. These data will be used to develop a very high-resolution haplotype map, reveal genome-wide patterns of variation, and suggest sites under natural selection. The ecologically relevant quantitative trait of flowering time will be measured across two seasonal and two geographic environments that span the native range of A. thaliana. These and future community phenotypic data will be used to develop and test methods for fine-scale quantitative trait locus (QTL) association scanning that capitalize on the high-density haplotype map. Whole-genome association mapping will be developed using coalescent models for detection and fine mapping. We will determine the functional molecular changes underlying at least one QTL, utilizing the full power of Arabidopsis genetics. Importantly, this proposal will develop new technological inroads for using tiling arrays to generate high-density haplotype maps as the foundation for whole-genome association studies. These methods, once established, can then be extended to other model systems. The development of fine-scale linkage disequilibrium mapping methods will be broadly applicable.

There is tremendous interest in complex disease association mapping, but much debate over different approaches and little success to date. The studies proposed here in Arabidopsis will suggest successful paths for this daunting undertaking, as associations can be quickly confirmed to identify novel QTL.

Whole Proposal; Year1 update; Year2 update;

 

Resources

 

*    Mapping populations

     -    Core 473 accessions: List; Tree; Genotypes

     -    Core 360 accessions

*    Simulating seasonal climates in chambers

     -    Weather files: Spain; Sweden

     -    Real conditions

 

Papers

 

Li et al. Association mapping of local climate-sensitive quantitative trait loci in Arabidopsis thaliana. PNAS 2010 (in press)

 

Baxter et al. A coastal cline in sodium accumulation in Arabidopsis thaliana is driven by natural variation of the sodium transporter AtHKT1;1. PLoS Genet 2010, 6 (11): e1001193. doi:10.1371/journal.pgen.1001193 (in press)

Atwell et al. Genome-wide association study of 107 phenotypes in a common set of Arabidopsis thaliana inbred lines. Nature 2010 Mar 24

Platt et al. The scale of population structure in Arabidopsis thaliana. PLoS Genetics 2010 Feb 12

Genome-Wide Association Mapping Results Database

 

Progress

 

Stage 1: low density genotyping using 149 SNPs

 

(A) Genetic raw data at low resolution (149FrameworkSNPs;ChrPosition)

Data1; Data2; Data3; Data4; Data5; PeakHData45; Data6; data6_Yan and data6_Bergelson  (improved calls in data6 and peak area) BeckLines

(B) Information for database

DianeSet; DianeSiteMap

StockCenterLines.xls; CSsiteMap; Original files from Luz Rivero (Ecotypes_Origin_DTF.xls, Ecotypes_GPShabitat.xls, Ecotypes_donors.xls);

StockCenterLineGenotypes (853 lines with 149 SNPs);

DataforPlotCSlines (trimmed data, 799 lines with 141 SNPs)

Stock_Cluster; StockUniqueLines (475 lines after removing clones from the 798 lines that remained after the 40% cutoff for bad lines and bad markers)
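Removing clones, as done for StockUniqueLines, amounts to collapsing lines that share a multilocus genotype. The following is a minimal Python sketch, assuming that clones are lines with identical calls at every SNP. The actual criterion used for this data set is not stated here and may tolerate missing data or a few mismatches. The function `remove_clones` and the line names are hypothetical.

```python
def remove_clones(genotypes):
    """Keep the first line seen for each distinct multilocus genotype.

    genotypes: dict mapping line name -> sequence of SNP calls.
    Assumption: two lines are clones only if every call is identical.
    """
    representatives = {}
    for name, calls in genotypes.items():
        # The tuple of calls is the genotype "fingerprint" of the line.
        representatives.setdefault(tuple(calls), name)
    # Dicts preserve insertion order, so the first line seen is kept.
    return list(representatives.values())


# Hypothetical toy data: line1 and line2 share a genotype.
toy = {"line1": "ACGT", "line2": "ACGT", "line3": "ACGA"}
unique_lines = remove_clones(toy)
print(unique_lines)
```

Keeping the first line seen is an arbitrary choice; a real pipeline would more likely keep the line with the fewest missing calls or the best seed status.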

(C) Flowering time variation in a single long-day experiment

U.S. Midwest lines: GrowthCondition; Pictures

Stock Center accessions Pictures

Flowering time 3664Lines

Stage 2: Choose a core set for high density genotyping

 

Strategy: choose the most diverse lines from the tree first, then run structure later.

    ClusterAllData1_6 (5309 lines at 142 SNPs after the 40% cutoff for bad lines and bad markers, with Het treated as NA, from 5750 lines with 149 SNPs)

    DTF (3664 lines with DTF data); alleleFrequency

    Update_tree (6418 genotypes at 142 SNPs after the 40% cutoff for bad samples and bad SNPs, from 7072 genotypes x 149 SNPs)

 

Step 1: Check the seed status; List

 

Bulked  Collecting  FewSeeds  Multiple  NoSeeds  Replanting  Stock  NA   Total
3271    280         70        9         669      243         850    358  5750

Use the 4410 lines with status "Bulked", "Collecting", "Multiple" (needs checking), or "Stock" for the next step.
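The figure of 4410 usable lines follows directly from the seed-status counts above. A short Python check, with the counts copied from the table; the variable names are illustrative only:

```python
# Seed-status counts copied from the Step 1 table.
seed_status = {
    "Bulked": 3271,
    "Collecting": 280,
    "FewSeeds": 70,
    "Multiple": 9,
    "NoSeeds": 669,
    "Replanting": 243,
    "Stock": 850,
    "NA": 358,
}

# Categories carried forward to the next step, as stated in the text.
usable = ("Bulked", "Collecting", "Multiple", "Stock")

total_lines = sum(seed_status.values())
usable_lines = sum(seed_status[status] for status in usable)

print(total_lines, usable_lines)
```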

 

Step 2: Remove bad SNPs with too many inconsistent Het calls; remove bad lines and bad markers (40% cutoff); remove clones.

1863 lines with 142 SNPs left (HetsPerLine; HetsPerMarker). 34% of lines have at least 1 Het call (caution).

Tightening the cutoff for bad SNPs and bad lines to 20% gave little improvement: 32% of lines still have at least 1 Het call.

Het calls were then recoded as missing data and more lines were cut (40% missing-data cutoff), leaving 1841 lines.

UniqueCluster (1841 x 142 SNPs); DTF (1102 lines with DTF); treeUniqeDist.pdf
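The filtering in this step can be sketched in Python as follows. This is an illustration, not the script used for the project. It assumes that the 40% cutoff means "drop a marker or line when more than 40% of its calls are missing", and that markers are filtered before lines; neither detail is stated here. The codes `HET` and `MISSING` and the function names are hypothetical.

```python
MISSING = None  # hypothetical code for a missing call
HET = "H"       # hypothetical code for a heterozygous call


def missing_fraction(calls):
    """Fraction of calls in a sequence that are missing."""
    return sum(c is MISSING for c in calls) / len(calls)


def qc_filter(genotypes, cutoff=0.40):
    """genotypes: dict mapping line name -> list of calls (one per SNP).

    Returns (filtered genotypes, indices of the SNPs that were kept).
    """
    # 1. Recode heterozygous calls as missing data.
    recoded = {name: [MISSING if c == HET else c for c in calls]
               for name, calls in genotypes.items()}
    n_snps = len(next(iter(recoded.values())))
    # 2. Drop markers whose missing rate across lines exceeds the cutoff.
    kept_snps = [j for j in range(n_snps)
                 if missing_fraction([calls[j] for calls in recoded.values()])
                 <= cutoff]
    # 3. Drop lines whose missing rate over the kept markers exceeds the cutoff.
    filtered = {}
    for name, calls in recoded.items():
        kept_calls = [calls[j] for j in kept_snps]
        if kept_calls and missing_fraction(kept_calls) <= cutoff:
            filtered[name] = kept_calls
    return filtered, kept_snps


# Hypothetical toy data: 4 lines x 4 SNPs.
demo = {
    "lineA": ["A", "C", "H", "T"],
    "lineB": ["A", None, "H", "T"],
    "lineC": ["G", "C", None, "T"],
    "lineD": [None, None, "H", "A"],
}
filtered, kept = qc_filter(demo)
print(kept, sorted(filtered))
```

In the toy data the second and third SNPs are dropped as bad markers, after which lineD is dropped because half of its remaining calls are missing.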

 

Step 3: Cut the tree to get 384 groups and choose 1 line from each group (priority: singleton > 20 re-sequenced accessions, RIL parents > Nordborg 192 > stock center > others).

    Cluster of the 384 lines; list (384 and 1841 lines; note: 22 of the 384 lines were found to have at least 2 adjacent Het calls); DTF

    LD in 384 lines (Hets as missing); Hets in red and missing in grey (lines on the Y-axis are in the same order as the list)
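Cutting a tree into a fixed number of groups can be illustrated with a small agglomerative clustering that stops once the requested number of groups remains. This Python sketch assumes average linkage on the proportion of mismatched SNP calls, with missing calls skipped. The tree-building method and distance actually used for the 384 groups are not stated here, so both are assumptions, and the function names are hypothetical.

```python
from itertools import combinations


def mismatch_distance(a, b):
    """Proportion of SNPs with different calls, skipping missing (None) calls."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 1.0  # assumption: no shared data counts as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def cut_into_groups(genotypes, k):
    """Average-linkage agglomerative clustering, stopped at k groups."""
    names = list(genotypes)
    dist = {frozenset((a, b)): mismatch_distance(genotypes[a], genotypes[b])
            for a, b in combinations(names, 2)}
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the closest pair of clusters (i < j by construction).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy data: 5 lines genotyped at 4 SNPs, cut into 3 groups.
toy = {"a": "AAAA", "b": "AAAT", "c": "TTTT", "d": "TTTA", "e": "GGGG"}
groups = cut_into_groups(toy, 3)
print(groups)
```

This naive version recomputes linkages at every merge and is only practical for small examples; for thousands of lines a library implementation of hierarchical clustering would be the better choice.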

                                               

Step 4: Important lines such as RIL parents, the 20 re-sequenced accessions, and Nordborg 192 were prioritized within each group.

                        List; cluster (all 20 accessions and common RIL parents are included; 123 Nordborg lines, including Kas-1, are in the list)
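The prioritization within groups can be sketched as a ranked choice. This Python example assumes that each line carries one category label, that the 20 re-sequenced accessions and RIL parents share the top rank, and that ties are broken alphabetically by line name. The labels, the tie-break rule, and the function name are hypothetical.

```python
# Lower rank = higher priority, following the order given in Step 3.
PRIORITY = {
    "resequenced_or_RIL_parent": 0,
    "nordborg192": 1,
    "stock_center": 2,
    "other": 3,
}


def pick_representatives(groups, category):
    """Choose one line per group.

    groups: list of lists of line names.
    category: dict mapping line name -> key of PRIORITY
              (lines not listed are treated as "other").
    """
    chosen = []
    for members in groups:
        if len(members) == 1:
            chosen.append(members[0])  # singleton: no choice to make
            continue
        # Best rank first; alphabetical order breaks ties (an assumption).
        best = min(members,
                   key=lambda m: (PRIORITY[category.get(m, "other")], m))
        chosen.append(best)
    return chosen


# Hypothetical toy groups and labels.
toy_groups = [["lineA"], ["lineB", "lineC"], ["lineD", "lineE"]]
toy_category = {"lineB": "stock_center", "lineC": "nordborg192"}
core = pick_representatives(toy_groups, toy_category)
print(core)
```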

 

Stage 3: High density genotyping (250K SNP array) of 473 accessions (including core360 and Nordborg 192)

              List of accessions

              Genotypes

 

Scripts;

Stage 4: Phenotyping and Genome-Wide Association Mapping

              Database

 

Created by Yan Li in Borevitz lab

Posted on 1/31/07

Modified on 11/4/2010