GENOME WIDE ASSOCIATION MAPPING IN ARABIDOPSIS THALIANA (NIH R01 GM073822)

 

PI Justin Borevitz (UChicago); coPI Magnus Nordborg (USC) (NSF grant); coPI Paul Marjoram (USC); coPI Sebastian Zoellner (U Michigan)

 

    Project Description

 

Our proposal will investigate the genetic and molecular basis of complex traits and their interactions with the environment using the model plant Arabidopsis thaliana. We will implement a multi-use, high-density oligonucleotide tiling array for whole-genome resequencing. The sample will include a largely unstructured core set of 384 wild A. thaliana genomes. This will be used to develop a very high-resolution haplotype map, reveal genome-wide patterns of variation, and suggest sites under natural selection. The ecologically relevant quantitative trait of flowering time will be measured across two seasonal and two geographic environments that span the native range of A. thaliana. This and future community phenotypic data will be used to develop and test methods for fine-scale quantitative trait locus (QTL) association scanning, capitalizing on the high-density haplotype map. Whole-genome association mapping will be developed using coalescent models for detection and fine mapping. We will determine the functional molecular changes underlying at least one QTL, utilizing the full power of Arabidopsis genetics. Importantly, this proposal will develop new technological inroads for using tiling arrays to generate high-density haplotype maps as the foundation for whole-genome association studies. These methods, once established, can then be extended to other model systems. The development of fine-scale linkage disequilibrium mapping methods will be broadly applicable.

There is a tremendous interest in complex disease association mapping, but much debate over different approaches and little success to date.  The studies proposed here in Arabidopsis will suggest successful paths for this daunting undertaking, as associations can be quickly confirmed to identify novel QTL.

Whole Proposal; Year1 update; Year2 update

 

Stage 1: low density genotyping using 149 SNPs

 

(A) Genetic raw data at low resolution (149FrameworkSNPs;ChrPosition)

Data1; Data2; Data3; Data4; Data5; PeakHData45; Data6; data6_Yan and data6_Bergelson  (improved calls in data6 and peak area) BeckLines

(B) Information for database

DianeSet; DianeSiteMap

StockCenterLines.xls; CSsiteMap; Original files from Luz Rivero (Ecotypes_Origin_DTF.xls, Ecotypes_GPShabitat.xls, Ecotypes_donors.xls);

StockCenterLineGenotypes (853 lines with 149 SNPs);

DataforPlotCSlines (trimmed data, 799 lines with 141 SNPs)

Stock_Cluster; StockUniqueLines (475 lines after removing clones from the 798 lines that passed the 40% cutoff for bad lines and bad markers)

(C) Flowering time variation in a single long-day experiment

U.S. Midwest lines: GrowthCondition; Pictures

Stock Center accessions Pictures

Flowering time 3664Lines

Stage 2: Choose a core set for high density genotyping

 

                              Strategies: A. choose most diverse lines from the tree first, run structure later.

 

                                                 ClusterAllData1_6 (5309 lines at 142 SNPs, after removing bad lines and bad markers at a 40% cutoff with Het calls treated as NA, from 5750 lines with 149 SNPs)

                                                            DTF (3664 lines with DTF data); alleleFrequency

                                                            Update_tree (6418 genotypes at 142 SNPs, after removing bad samples and bad SNPs at a 40% cutoff, from 7072 genotypes x 149 SNPs)

 

                                        Step1: check the seed status  List

 

Status   Bulked   Collecting   FewSeeds   Multiple   NoSeeds   Replanting   Stock   NA    total
Lines    3271     280          70         9          669       243          850     358   5750

Use 4410 lines with "Bulked","Collecting","Multiple"(need check),"Stock" for next step.
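The count of 4410 follows directly from the seed-status tallies for this step. A minimal sketch of that arithmetic is below; the dictionary and variable names are illustrative only and do not come from the project scripts.

```python
# Seed-status counts as listed for Step 1 (5750 genotyped lines in total).
seed_status_counts = {
    "Bulked": 3271,
    "Collecting": 280,
    "FewSeeds": 70,
    "Multiple": 9,
    "NoSeeds": 669,
    "Replanting": 243,
    "Stock": 850,
    "NA": 358,
}

# Categories carried forward to the next step, per the note above.
usable_statuses = ("Bulked", "Collecting", "Multiple", "Stock")

total = sum(seed_status_counts.values())
n_usable = sum(seed_status_counts[s] for s in usable_statuses)

print(total, n_usable)
```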

 

Step2. Remove bad SNPs with too many inconsistent Het calls; remove bad lines and bad markers (40% cutoff); remove clones (thanks to Yu Huang)

1863 lines with 142 SNPs left (HetsPerLine; HetsPerMarker); 34% of lines have at least 1 Het call (caution)

Tightening the cutoff for bad SNPs and bad lines to 20% does little to remove Het calls (32%, not much improvement);

Treat Het calls as missing data, then remove more lines (40% missing-data cutoff). 1841 lines left.

UniqueCluster (1841 x 142 SNPs), DTF (1102 lines with DTF) treeUniqeDist.pdf
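The Het-as-missing recoding and the 40% missing-data cutoff in Step 2 can be sketched as follows. This is an illustration under stated assumptions, not the lab's script: the genotype encoding ("H" for a Het call, None for missing), the function names, and the toy data are hypothetical, and the sketch assumes that missing fractions are computed on the full matrix before anything is removed and that a line or marker is dropped only when its missing fraction exceeds the cutoff. Clone removal is not covered here.

```python
MISSING = None
HET = "H"  # hypothetical code for a heterozygous call


def het_to_missing(genotypes):
    """Recode every Het call as missing data."""
    return {
        line: [MISSING if call == HET else call for call in calls]
        for line, calls in genotypes.items()
    }


def missing_fraction(calls):
    """Fraction of calls that are missing."""
    return sum(call is MISSING for call in calls) / len(calls)


def filter_by_missing(genotypes, cutoff=0.40):
    """Drop lines and SNPs whose missing fraction exceeds the cutoff.

    Both fractions are computed on the unfiltered matrix (an assumption).
    Returns the filtered matrix and the indices of the SNPs that were kept.
    """
    lines = list(genotypes)
    n_snps = len(genotypes[lines[0]])
    keep_lines = [ln for ln in lines if missing_fraction(genotypes[ln]) <= cutoff]
    keep_snps = [
        j for j in range(n_snps)
        if missing_fraction([genotypes[ln][j] for ln in lines]) <= cutoff
    ]
    filtered = {ln: [genotypes[ln][j] for j in keep_snps] for ln in keep_lines}
    return filtered, keep_snps


# Toy matrix: 4 hypothetical lines x 5 hypothetical SNPs.
toy = {
    "line1": ["A", "C", "G", "T", "A"],
    "line2": ["A", "H", "G", "T", "A"],
    "line3": [None, "H", "H", "T", "A"],  # 60% missing once Hets are recoded
    "line4": ["A", None, "G", "T", "C"],
}

filtered, kept_snps = filter_by_missing(het_to_missing(toy))
print(sorted(filtered), kept_snps)
```

In the toy data, line3 is dropped because three of its five calls are missing after recoding, and the second SNP is dropped because three of four lines lack a call there.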

 

                                        Step3. Cut tree to get 384 groups, choose 1 line from each group (priority: singleton > 20 re-sequenced accessions and RIL parents > Nordborg 192 > stock center > others)

                                                           Cluster of the 384 lines, list (384 and 1841 lines. Note: 22 lines in the 384 were found to have at least 2 adjacent Het calls), DTF (251 with DTF)

                                                                        LD in 384 lines (Hets as missing); Hets in red and missing in grey (line in Y-axis same order as the list)
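The "one line per group" choice in Step 3 can be sketched as a ranked pick within each group. The sketch below assumes the tree has already been cut into groups by some other tool, and it reads the priority list as: a singleton group keeps its only member; otherwise prefer re-sequenced accessions or RIL parents, then Nordborg 192 lines, then stock center lines, then anything else. The category labels, accession IDs, and the alphabetical tie-break are hypothetical and are not taken from the project.

```python
# Priority tiers, highest first (labels are illustrative).
PRIORITY = ["resequenced_or_RIL_parent", "Nordborg192", "stock_center", "other"]
RANK = {category: i for i, category in enumerate(PRIORITY)}


def pick_representatives(groups, category):
    """Choose one line per group.

    groups:   dict mapping group id -> list of line ids in that group
    category: dict mapping line id -> priority category (default "other")
    """
    chosen = {}
    for group_id, members in groups.items():
        if len(members) == 1:
            # A singleton group has only one candidate.
            chosen[group_id] = members[0]
        else:
            # Lowest rank wins; ties are broken by line id for determinism.
            chosen[group_id] = min(
                members, key=lambda m: (RANK[category.get(m, "other")], m)
            )
    return chosen


# Hypothetical groups and categories.
groups = {
    1: ["acc_01"],
    2: ["acc_02", "acc_03", "acc_04"],
    3: ["acc_05", "acc_06"],
}
category = {
    "acc_03": "Nordborg192",
    "acc_04": "stock_center",
    "acc_05": "stock_center",
}

chosen = pick_representatives(groups, category)
print(chosen)
```

With 384 groups from the tree cut, the same function would return 384 representatives, which Step 4 then adjusts by hand for the important lines.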

                                               

Step4: Check the important lines such as RIL parents, the 20 re-sequenced accessions, and Nordborg192. Replace some lines with these important lines from the same group.

                        List; cluster (all 20 accessions and common RIL parents in, 123 Nordborg lines including Kas-1 in the list)

 

Step5: Phenotyping 360 wild accessions with some flowering-time mutants

                        cluster; list

 

Stage 3: High density genotyping of 360 core accessions and phenotyping

 

Genotyping: LabelingProtocol; FinishedList; CombinedListArray; Nordborg192vs360

 

 

Scripts;

 

 

 

Created by Yan Li in Borevitz lab

Posted on 1/31/07

Modified on 12/31/2007