GENOME WIDE ASSOCIATION MAPPING IN ARABIDOPSIS THALIANA (NIH R01 GM073822)
PI Justin Borevitz (UChicago); coPI Magnus Nordborg (USC) (NSF grant);
coPI Paul Marjoram (USC); coPI Sebastian Zoellner (U Michigan)
Project Description
Our proposal will investigate the genetic and molecular basis of complex traits and their
interactions with the environment using the model plant Arabidopsis thaliana.
We will implement a multi-use, high-density oligonucleotide tiling array for
whole-genome resequencing. The sample will include a largely unstructured core
set of 384 wild A. thaliana genomes. This will be used to develop
a very high-resolution haplotype map, reveal genome-wide patterns of
variation, and suggest sites under natural selection. The ecologically
relevant quantitative trait of flowering time will be measured across two
seasonal and two geographic environments which span the native range of A.
thaliana. This and future community phenotypic data will be used to
develop and test methods for fine scale quantitative trait locus (QTL)
association scanning capitalizing on the high density haplotype map.
Whole genome association mapping will be developed using coalescent models for
detection and fine mapping. We will determine the functional molecular
changes underlying at least one QTL utilizing the full power of Arabidopsis
genetics. Importantly, this proposal will develop new technological inroads for
using tiling arrays to generate high density haplotype maps as the foundation
for whole genome association studies. These methods, once established, can then
be extended to other model systems. The development of fine scale linkage
disequilibrium mapping methods will be broadly applicable.
There is tremendous interest in complex disease association mapping, but much debate
over different approaches and little success to date. The studies
proposed here in Arabidopsis will suggest successful paths for this
daunting undertaking, as associations can be quickly confirmed to identify
novel QTL.
Whole Proposal; Year1 update; Year2 update;
Resources
Mapping populations
Core 473 accessions: List; Tree; Genotypes
Simulating seasonal climates in chambers
Papers
Li et al. Association mapping of local climate-sensitive quantitative trait loci in Arabidopsis thaliana. PNAS 2010 (in press).
Baxter et al. A coastal cline in sodium accumulation in Arabidopsis thaliana is driven by natural variation of the sodium transporter AtHKT1;1. PLoS Genet 2010, 6 (11): e1001193. doi:10.1371/journal.pgen.1001193 (in press).
Atwell et al. Genome-wide association study of 107 phenotypes in a common set of Arabidopsis thaliana inbred lines. Nature 2010 Mar 24.
Platt et al. The scale of population structure in Arabidopsis thaliana. PLoS Genetics 2010 Feb 12.
Genome-Wide Association Mapping Results Database
Progress
Stage 1: low density genotyping using 149 SNPs
(A) Genetic raw data at low resolution (149FrameworkSNPs; ChrPosition)
Data1; Data2; Data3; Data4; Data5; PeakHData45; Data6; data6_Yan and data6_Bergelson (improved calls in data6 and peak area) BeckLines
(B) Information for database
StockCenterLines.xls; CSsiteMap; Original files from Luz Rivero (Ecotypes_Origin_DTF.xls, Ecotypes_GPShabitat.xls, Ecotypes_donors.xls);
StockCenterLineGenotypes (853 lines with 149 SNPs);
DataforPlotCSlines (trimmed data, 799 lines with 141 SNPs);
Stock_Cluster;
StockUniqueLines (475 lines left after removing clones from 798 lines; bad lines and bad markers were first removed at a 40% cutoff)
(C) Flowering time variation in a single long-day experiment
U.S. Midwest lines: GrowthCondition; Pictures
Flowering time 3664Lines
Stage 2: Choose a core set for high density genotyping
Strategy: choose the most diverse lines from the tree first, run structure later.
ClusterAllData1_6 (5309 lines at 142 SNPs, from 5750 lines with 149 SNPs, after removing bad lines and bad markers at a 40% cutoff, with Het calls treated as NA)
DTF(3664 lines with DTF data); alleleFrequency
Update_tree (6418 genotypes at 142 SNPs, from 7072 genotypes x 149 SNPs, after removing bad samples and bad SNPs at a 40% cutoff)
Step 1: check the seed status. List
| Bulked | Collecting | FewSeeds | Multiple | NoSeeds | Replanting | Stock | NA  | total |
| 3271   | 280        | 70       | 9        | 669     | 243        | 850   | 358 | 5750  |
Use the 4410 lines with seed status "Bulked", "Collecting", "Multiple" (needs checking), or "Stock" for the next step.
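As a quick arithmetic check, the seed-status counts listed above can be summed in Python. The dictionary layout and variable names are illustrative assumptions; only the counts come from this page.

```python
# Seed-status counts from Step 1 (category -> number of lines).
seed_status = {
    "Bulked": 3271,
    "Collecting": 280,
    "FewSeeds": 70,
    "Multiple": 9,
    "NoSeeds": 669,
    "Replanting": 243,
    "Stock": 850,
    "NA": 358,
}

# Categories carried forward to the next step.
usable = ("Bulked", "Collecting", "Multiple", "Stock")

total_lines = sum(seed_status.values())
usable_lines = sum(seed_status[s] for s in usable)

print(total_lines)   # 5750
print(usable_lines)  # 4410
```

Both sums agree with the totals reported in Step 1.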
Step 2: remove bad SNPs with too many inconsistent Het calls; remove bad lines and bad markers (40% cutoff); remove clones.
1863 lines with 142 SNPs left (HetsPerLine; HetsPerMarker). 34% of lines have at least 1 Het call (caution).
A stricter 20% cutoff for bad SNPs and bad lines gives little improvement: 32% of lines still have Het calls.
Change Het calls to missing data, then cut more lines (40% missing-data cutoff). 1841 lines left.
UniqueCluster (1841 x 142 SNPs), DTF (1102 lines with DTF) treeUniqeDist.pdf
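The filtering in Step 2 can be sketched roughly as follows. This is a minimal illustration and not the lab's actual script: the function names, the "H" and "NA" call codes, the order of operations (Het calls recoded to missing first, then markers, then lines, then clones), and the definition of a clone as a line with an identical call vector are all assumptions.

```python
MISSING = "NA"  # assumed code for a missing call


def recode_hets(calls, het="H"):
    """Treat heterozygous calls (assumed code "H") as missing data."""
    return [MISSING if c == het else c for c in calls]


def missing_fraction(calls):
    """Fraction of missing calls; an empty list counts as fully missing."""
    if not calls:
        return 1.0
    return sum(c == MISSING for c in calls) / len(calls)


def filter_genotypes(genotypes, cutoff=0.40):
    """genotypes: dict of line name -> list of SNP calls (equal lengths).

    Returns (kept_lines, kept_marker_indices).
    """
    if not genotypes:
        return {}, []
    data = {name: recode_hets(calls) for name, calls in genotypes.items()}
    n_markers = len(next(iter(data.values())))

    # Drop markers with more than `cutoff` missing calls across lines.
    keep = [j for j in range(n_markers)
            if missing_fraction([c[j] for c in data.values()]) <= cutoff]
    data = {name: [c[j] for j in keep] for name, c in data.items()}

    # Drop lines with more than `cutoff` missing calls across kept markers.
    data = {name: c for name, c in data.items()
            if missing_fraction(c) <= cutoff}

    # Remove clones: keep the first line seen for each identical call vector.
    seen, unique = set(), {}
    for name, calls in data.items():
        key = tuple(calls)
        if key not in seen:
            seen.add(key)
            unique[name] = calls
    return unique, keep


# Toy input with hypothetical line names and calls.
example = {
    "line1": ["A", "C", "A", "NA"],
    "line2": ["A", "C", "A", "NA"],   # identical to line1, treated as a clone
    "line3": ["G", "H", "NA", "NA"],  # mostly missing once H becomes NA
    "line4": ["G", "C", "T", "NA"],
}
kept, markers = filter_genotypes(example)
print(sorted(kept), markers)
```

In the toy input, the fourth marker is dropped as fully missing, line3 is dropped for too many missing calls, and line2 is dropped as a clone of line1.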
Step 3: cut the tree to get 384 groups and choose 1 line from each group (priority: singleton > 20 accessions, RIL parents > Nordborg 192 > stock center > others)
Cluster of the 384 lines, list (384 and 1841 lines. Note: 22 of the 384 lines were found to have at least 2 adjacent Het calls), DTF
LD in 384 lines (Hets as missing); Hets in red and missing in grey (lines on the Y-axis are in the same order as the list)
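The Step 3 rule of picking one line per tree group by priority can be sketched as below. This assumes the group assignments have already been obtained by cutting the tree; the tier labels, the relative order of the re-sequenced accessions and RIL parents, the alphabetical tie-breaking, and the accession names are all hypothetical. Singleton groups need no special handling because their only member is always chosen.

```python
# Priority tiers, highest first. Labels are hypothetical, and the order of the
# first two tiers relative to each other is an assumption.
PRIORITY = ["resequenced20", "RIL_parent", "Nordborg192", "stock_center", "other"]


def choose_core(groups, source, priority=PRIORITY):
    """Pick one line per group, preferring higher-priority sources.

    groups: dict of line name -> group id (from cutting the tree)
    source: dict of line name -> tier label (lines not listed count as "other")
    Returns a dict of group id -> chosen line name.
    """
    rank = {tier: i for i, tier in enumerate(priority)}
    best = {}
    for line in sorted(groups):  # sorted so ties break deterministically
        group = groups[line]
        r = rank.get(source.get(line, "other"), len(priority))
        if group not in best or r < best[group][0]:
            best[group] = (r, line)
    return {group: line for group, (r, line) in best.items()}


# Toy input with hypothetical accession names.
groups = {"acc1": 1, "acc2": 1, "acc3": 2, "acc4": 2, "acc5": 3}
source = {
    "acc1": "stock_center",
    "acc2": "Nordborg192",
    "acc3": "other",
    "acc4": "RIL_parent",
    "acc5": "other",
}
core = choose_core(groups, source)
print(core)  # {1: 'acc2', 2: 'acc4', 3: 'acc5'}
```

With 384 groups from the tree cut, the same function would return one accession per group.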
Step 4: the important lines, such as RIL parents, the 20 re-sequenced accessions, and Nordborg 192, were prioritized within each group.
List; cluster (all 20 accessions and common RIL parents are included; 123 Nordborg lines, including Kas-1, are in the list)
Stage 3: High density genotyping (250K SNP array) of 473 accessions (including core360 and Nordborg 192)
Stage 4: Phenotyping and Genome-Wide Association Mapping
Created by Yan Li in Borevitz lab
Posted on 1/31/07
Modified on 11/4/2010