GENOME WIDE ASSOCIATION MAPPING IN ARABIDOPSIS THALIANA (NIH R01 GM073822)
PI Justin Borevitz (UChicago); coPI Magnus Nordborg (USC) (NSF grant);
coPI Paul Marjoram (USC); coPI Sebastian Zoellner (U Michigan)
Project Description
Our proposal will investigate the genetic and molecular basis of complex
traits and their interactions with the environment using the model plant
Arabidopsis thaliana. We will implement a multi-use, high-density
oligonucleotide tiling array for whole-genome resequencing. The sample will
include a largely unstructured core set of 384 wild A. thaliana genomes. This
will be used to develop a very high-resolution haplotype map, reveal
genome-wide patterns of variation, and suggest sites under natural selection.
The ecologically relevant quantitative trait of flowering time will be
measured across two seasonal and two geographic environments that span the
native range of A. thaliana. These and future community phenotypic data will
be used to develop and test methods for fine-scale quantitative trait locus
(QTL) association scanning that capitalize on the high-density haplotype map.
Whole-genome association mapping will be developed using coalescent models for
detection and fine mapping. We will determine the functional molecular
changes underlying at least one QTL, utilizing the full power of Arabidopsis
genetics. Importantly, this proposal will develop new technological inroads
for using tiling arrays to generate high-density haplotype maps as the
foundation for whole-genome association studies. These methods, once
established, can then be extended to other model systems. The development of
fine-scale linkage disequilibrium mapping methods will be broadly applicable.
There is tremendous interest in complex disease association mapping, but much
debate over different approaches and little success to date. The studies
proposed here in Arabidopsis will suggest successful paths for this daunting
undertaking, as associations can be quickly confirmed to identify novel QTL.
Whole Proposal; Year1 update; Year2 update
Stage 1: Low density genotyping using 149 SNPs
(A) Genetic raw data at low resolution (149FrameworkSNPs; ChrPosition)
Data1; Data2; Data3; Data4; Data5; PeakHData45; Data6; data6_Yan and data6_Bergelson (improved calls in data6 and peak area); BeckLines
(B) Information for database
StockCenterLines.xls; CSsiteMap;
Original files from Luz Rivero (Ecotypes_Origin_DTF.xls, Ecotypes_GPShabitat.xls, Ecotypes_donors.xls);
StockCenterLineGenotypes (853 lines with 149 SNPs);
DataforPlotCSlines (trimmed data, 799 lines with 141 SNPs);
Stock_Cluster;
StockUniqueLines (475 lines, obtained by removing clones from the 798 lines
left after the 40% cutoff for bad lines and bad markers)
(C) Flowering time variation in a single long-day experiment
U.S. Midwest lines: GrowthCondition; Pictures
Flowering time 3664Lines
Stage 2: Choose a core set for high density genotyping
Strategies: A. Choose the most diverse lines from the tree first; run
structure later.
ClusterAllData1_6 (5309 lines at 142 SNPs, left after the 40% cutoff for bad
lines and bad markers, with Het calls as NA, from 5750 lines with 149 SNPs)
DTF (3664 lines with DTF data); alleleFrequency
Update_tree (6418 genotypes at 142 SNPs, left after the 40% cutoff for bad
samples and bad SNPs, from 7072 genotypes x 149 SNPs)
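The 40% cutoff used above can be sketched as follows. This is only an illustration: it assumes that "bad" means a missing-call fraction above 40%, that markers are filtered before lines, and that missing calls (including Het calls set to NA) are stored as None. The function name and the data layout are hypothetical and not taken from the project files.

```python
def filter_by_missing(genotypes, cutoff=0.40):
    """Drop markers, then lines, whose fraction of missing calls exceeds cutoff.

    genotypes: dict mapping line name -> list of calls (None = missing),
               every list having one entry per marker.
    Returns (filtered genotypes, indices of the markers that were kept).
    Assumption: markers are filtered first, then lines.
    """
    if not genotypes:
        return {}, []
    n_lines = len(genotypes)
    n_markers = len(next(iter(genotypes.values())))

    # Marker pass: keep markers with at most `cutoff` missing calls across lines.
    keep_markers = [
        j for j in range(n_markers)
        if sum(calls[j] is None for calls in genotypes.values()) / n_lines <= cutoff
    ]
    if not keep_markers:
        return {}, []

    # Line pass: keep lines with at most `cutoff` missing calls at kept markers.
    filtered = {}
    for name, calls in genotypes.items():
        kept = [calls[j] for j in keep_markers]
        if sum(c is None for c in kept) / len(kept) <= cutoff:
            filtered[name] = kept
    return filtered, keep_markers
```

Running the two passes in the other order, or iterating until nothing changes, can give slightly different counts, so the order is a design choice rather than a given.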
Step 1: Check the seed status (List)
| Bulked | Collecting | FewSeeds | Multiple | NoSeeds | Replanting | Stock | NA  | total |
| 3271   | 280        | 70       | 9        | 669     | 243        | 850   | 358 | 5750  |
Use the 4410 lines with seed status "Bulked", "Collecting", "Multiple" (needs
checking), or "Stock" for the next step.
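As a quick check of the arithmetic, the seed status counts reported in Step 1 can be tallied in a few lines. The counts are copied from the table; the variable names are illustrative only.

```python
# Seed status counts as reported in Step 1.
seed_status_counts = {
    "Bulked": 3271, "Collecting": 280, "FewSeeds": 70, "Multiple": 9,
    "NoSeeds": 669, "Replanting": 243, "Stock": 850, "NA": 358,
}

# Statuses carried forward to the next step.
usable_statuses = ("Bulked", "Collecting", "Multiple", "Stock")

total_lines = sum(seed_status_counts.values())
usable_lines = sum(seed_status_counts[s] for s in usable_statuses)

print(total_lines, usable_lines)
```

The four usable categories sum to 4410 of the 5750 genotyped lines, matching the figure quoted above.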
Step 2: Remove bad SNPs with too many inconsistent Het calls; remove bad lines
and bad markers (40% cutoff); remove clones (thanks to Yu Huang).
1863 lines with 142 SNPs left (HetsPerLine; HetsPerMarker). 34% of lines have
at least 1 Het call (caution).
Using a 20% cutoff for bad SNPs and bad lines does not remove many more Het
calls (32% of lines).
Change Het calls to missing data, then cut more lines (40% missing-data
cutoff). 1841 lines left.
UniqueCluster (1841 x 142 SNPs); DTF (1102 lines with DTF); treeUniqeDist.pdf
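The Het handling and clone removal in Step 2 might look roughly like the sketch below. It rests on several assumptions that the page does not confirm: heterozygous calls are encoded as "H", two lines count as clones when they agree at every SNP where both have a call, and the first line seen is kept as the representative. This is not the actual procedure used here (clone removal is credited to Yu Huang above), and all names are hypothetical.

```python
def het_fraction_of_lines(genotypes, het="H"):
    """Fraction of lines carrying at least one heterozygous call."""
    with_het = sum(any(c == het for c in calls) for calls in genotypes.values())
    return with_het / len(genotypes)


def mask_hets(genotypes, het="H"):
    """Return a copy of the genotypes with heterozygous calls set to None."""
    return {name: [None if c == het else c for c in calls]
            for name, calls in genotypes.items()}


def same_clone(a, b):
    """Assumed clone rule: identical wherever both lines have a call."""
    return all(x == y for x, y in zip(a, b) if x is not None and y is not None)


def remove_clones(genotypes):
    """Greedily keep the first line of each clone group (insertion order)."""
    unique = {}
    for name, calls in genotypes.items():
        if not any(same_clone(calls, rep) for rep in unique.values()):
            unique[name] = calls
    return unique
```

Because the clone rule ignores missing calls, it is not transitive, so the result of the greedy pass can depend on input order. A line with very few calls matches almost anything, which is one reason to apply the missing-data cutoff first.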
Step 3: Cut the tree to get 384 groups and choose 1 line from each group
(priority: singleton > 20 accessions RIL parents > Nordborg 192 > stock
center > others)
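Once the tree has been cut into groups, choosing one line per group by priority can be sketched as below. The category labels, the alphabetical tie-break, and the treatment of "20 accessions" and "RIL parents" as a single top tier are assumptions made for illustration. A singleton group needs no special case, since its only member is chosen automatically.

```python
# Hypothetical category labels, highest priority first.
PRIORITY = ["20acc_or_RIL_parent", "Nordborg192", "stock_center", "other"]


def pick_representatives(group_of, category_of):
    """Choose one line per group, preferring higher-priority categories.

    group_of:    dict line name -> group id (from cutting the tree)
    category_of: dict line name -> one of PRIORITY (missing -> "other")
    Ties within a category are broken alphabetically (an assumption).
    """
    rank = {category: i for i, category in enumerate(PRIORITY)}

    # Collect the members of each group.
    groups = {}
    for line, group in group_of.items():
        groups.setdefault(group, []).append(line)

    # Within each group, take the member with the best (lowest) rank.
    chosen = {}
    for group, members in groups.items():
        chosen[group] = min(
            members,
            key=lambda m: (rank.get(category_of.get(m, "other"), len(PRIORITY)), m),
        )
    return chosen
```

For example, a group containing one re-sequenced accession and one stock center line would yield the re-sequenced accession.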
Cluster of the 384 lines; list (384 and 1841 lines); DTF (251 with DTF)
Note: 22 lines in the 384 were found to have at least 2 adjacent Het calls.
LD in 384 lines (Hets as missing); Hets in red and missing in grey (lines on
the Y-axis in the same order as the list)
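The page does not say which LD statistic is plotted. One common choice is pairwise r^2, which can be computed with Het calls treated as missing, as in the sketch below. It assumes inbred lines with biallelic SNPs coded 0/1 and None for missing (or masked Het) calls; the function name is hypothetical.

```python
def r_squared(snp_a, snp_b):
    """Pairwise LD (r^2) between two biallelic SNPs across inbred lines.

    snp_a, snp_b: equal-length lists of 0/1 allele codes, None = missing.
    Only lines with a call at both SNPs are used. Returns None when r^2
    is undefined (no shared calls, or either SNP is monomorphic).
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return None

    p_a = sum(a for a, _ in pairs) / n                    # freq of allele 1 at SNP A
    p_b = sum(b for _, b in pairs) / n                    # freq of allele 1 at SNP B
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n  # freq of 1-1 haplotype

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b                                  # disequilibrium coefficient D
    return d * d / denom
```

Because the lines are largely homozygous, each line contributes one haplotype, which is why no phasing step appears here.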
Step 4: Check the important lines, such as RIL parents, the 20 re-sequenced
accessions, and Nordborg192. Replace some chosen lines with these important
lines from the same group.
List; cluster (all 20 accessions and common RIL parents are in; 123 Nordborg
lines, including Kas-1, are in the list)
Step 5: Phenotyping 360 wild accessions, along with some flowering time mutants
Stage 3: High density genotyping of 360 core accessions and phenotyping
Genotyping: LabelingProtocol; FinishedList; CombinedListArray; Nordborg192vs360
Created by Yan Li in Borevitz lab
Posted on 1/31/07
Modified on 12/31/2007