Mapping mouse coat color genes

Statistics 246, Spring 2006, Week 5, Lecture 1


Inbred strains and their crosses

Our main players are the C57BL/6 (BL for black, abbreviated B6), a robust strain that has been around about 90 years, and the NOD (non-obese diabetic) mouse strain, a delicate diabetes-prone strain discovered in 1990.

Coat colours: agouti is standard, B6 is black, NOD is albino (i.e. white). There are many others (chocolate, blue, etc.) but these are the three we meet here.

Normal (wild-type) mouse coat: color = agouti

Agouti: a grizzled color of fur resulting from the barring of each hair in several alternate dark and light bands.

Black mouse: C57BL/6 strain

Albino mouse: non-obese diabetic (NOD) strain

Coat color loci in mice

Four main loci: A, B, C and D

• Locus A – agouti
• Locus B – black
• Locus C (known as Tyr) – albinism
• Locus D – dilution gene

In the discussion that follows, we only see variation at loci A and C. Our mice all have the dominant (black) allele B rather than the recessive (chocolate) allele b at Locus B, and the dominant (normal color) allele D rather than the recessive (diluted) allele d at Locus D.


Alleles at the Agouti (A) locus

• A^y, Lethal dominant yellow
• A^vy, Viable yellow
• A^w, White-bellied Agouti
• A, Agouti or Wild type
• A^t, Black and Tan
• A^m, mottled agouti
• a, Non-agouti
• a^e, Extreme non-agouti

A and a are a dominant/recessive allele pair


Alleles at the Albino (C) Locus

• C, full color gene
• c^ch, chinchilla
• c^h, himalayan
• c, albino gene

C and c are a dominant/recessive pair of alleles

Alleles at A and C interact (called epistasis in genetics)

Here w, x and y each stand for an arbitrary allele at the locus concerned.

• If the mouse is aaCy it is not agouti and not albino (in our case it is a black mouse)

• If the mouse is AxCy it is agouti and not albino

• If the mouse is wxcc it is albino no matter what the alleles at the agouti locus are, because they are irrelevant
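The three rules above can be written as a small lookup. This is only an illustrative sketch: the function name `coat_color` and the two-letter genotype strings are assumptions made for the example, not notation from the lecture.

```python
def coat_color(agouti_genotype: str, albino_genotype: str) -> str:
    """Coat color implied by the genotypes at the A and C loci.

    Genotypes are two-letter strings such as "Aa" or "cc" (hypothetical
    encoding): upper case is the dominant allele, lower case the recessive.
    """
    if albino_genotype == "cc":
        # Epistasis: albino masks whatever happens at the agouti locus.
        return "albino"
    if agouti_genotype == "aa":
        return "black"
    return "agouti"


print(coat_color("aa", "Cc"))  # black (B6-like at the agouti locus)
print(coat_color("Aa", "Cc"))  # agouti (the F1 genotype)
print(coat_color("AA", "cc"))  # albino (NOD-like)
```

Checking the C locus first is what encodes the epistasis: the agouti genotype is never consulted for a cc mouse.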


Crosses

We will denote the NOD mice by A, and the B6 mice by B. This same notation will denote the two homozygotes at a polymorphic marker.

Two main crosses interest us, following the first filial generation or F1, which we denote by A×B → H. Here H denotes heterozygote, which is the case for our F1s.

The backcross BC is arrived at via H×B → BC, or the obvious variant, while the F2 intercross (second filial generation) is denoted by H×H → IC=F2.


Our data

• An F2 intercross was performed starting with C57BL/6 and NOD parental lines.

• We have 133 female mice at the F2 generation, just females for the reason that males fight, and this influences other (quantitative blood) phenotypes of interest.

• They were genotyped at 153 microsatellite markers spanning all 19 autosomes and the X chromosome. We also have coat color and a few white blood cell phenotypes.


Our markers are Microsatellites

[Figure: two alleles of a CA-repeat microsatellite, a shorter one (..AGTCCACACACACACACATGT..) and a longer one (..AGTCCACACACACACACACACACATGT..), separated by PCR and electrophoresis; the gel lanes show the three genotypes A, B and H.]

Desirable: to call the genotypes (A, H, or B) automatically.
Problems: stutters and noise, variability of the patterns, etc.


A small portion of the data (beginning)

data type f2 intercross
133 153 7
*D10M106 BBABBBBBHBBABBBBAABBBB-BABABABBABBBBBBBBBBBBB-BBBBBBABBAAABBBBBBBBB-HBABABB-ABBBBAB-BBBABABBB-BBBBBCBCBCBHBBBHCBBHBHHBCBBBBBBBHBHBHCH
*D10M14 AHHBHHHAHHABAHBHHBABAA-BHHAHAAHAHHHHHBAHHHAHHBAHBHABBBHAAHHHHAHBHHH--HHHHAHAHAHBHHHAHHABAHHHAHHHAHBHBBHHHAAHAAHHBHHAHAH-HBABAHAHBHHAH
*D10M163 AHBBHHB-HHAB-HBH-BAHBA-BHHAHAAHAAHHAHBAHHHHHHHAHBHABBBHAAHBBHAHBBHHBBHBHHHH-HBHHHHHAHHAHABH-AHHHAHBABBBBAAAHAAHHBHHAHHHBHBAHAHABHHHAH
*D10M20 HCBHAHBAHHAHAHBABAHHBH-HHHABAAHAAABHHBH-HAHBHAAHBCABABHAAABBHAHBHHBBBHBHAHH-HBHHHABAHHHHAHHBAAHHABHABHBHAAHBHAAHBHAAHBHBHBHHHHABAHAAH

Reading the file: the second line gives #individuals (133), #loci (153) and #traits (7). Each following line starts with a marker name; the next column holds the genotype data, beginning with mouse 1.

D10M106 = a marker on chr 10 defined by MIT.
Incompleteness code: C = B or H, D = A or H, - = missing.
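To make the layout concrete, here is a minimal sketch of reading one marker line and tallying its genotype codes. The helper name `parse_marker_line` is hypothetical, and the example uses only the first ten mice of D10M106 rather than the full line.

```python
from collections import Counter


def parse_marker_line(line: str):
    """Split a line such as '*D10M106 BBAB...' into (marker name, code counts)."""
    name, genotypes = line.lstrip("*").split(maxsplit=1)
    # Each character is the code for one mouse: A, H, B, C, D or '-'.
    return name, Counter(genotypes.strip())


# First ten mice of the D10M106 line shown above.
name, counts = parse_marker_line("*D10M106 BBABBBBBHB")
print(name, dict(counts))
```

Counting codes per marker in this way is also the first step towards the single-locus Mendel check discussed later.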


A small portion of the raw data (end)


*DXM210 --HAAAAHHHAHAAAAAHAH-HAHHAHAHHH-HHH-H-H-AHH-AAHAA-HHAAAAAHH-AHHAAHHHAHAAH-HAHA-HAAHAHAA-A-HH-AAHHHAHAAHAAAAAHHHHAAAHAAHHAHHHHHHAHAAHA
*DXM222 HAAHHAA-HHAAHAAHHAAAHH-HAAHAAHHHHHAHHHH-AAHHAHHAAHHHHHHAAHHAHHHHAAH-AHHHAHHAAAHHHAAAHAHAAHHAAHA-HAA--HHAHHA-AAHAAAAA-HH-AHAAAH-HHAAHA
*DXM39 HAAAHAA-HHAH-AAA-HAAHH-HAAAAHHHHHHAHHHHAAHHAAAHAAHHHHAAAAHHHHHHAAHH--HAAAHH-AAHHAAHAHAHAAHH-AAAHHAAHAHAHAAAAHH-AAAAAAAH-AHAAHHAAAAAHA

*trait1 1 1 2 3 1 1 2 3 1 2 2 2 1 1 1 1 3 1 1 1 1 1 1 1 2 1 1 3 1 1 2 1 1 1 2 1 2 3 1 1 2 3 1 1 2 3 1 2 2 1 1 2 3 1 3 3 1 3 1 1 1 1 3 1 1 3 1 2 3 3 1 1 1 2 2 3 1 1 2 2 1 1 1 1 1 3 1 1 2 2 3 3 3 1 1 1 1 1 1 1 1 1 2 2 2 2 1 3 1 1 1 1 1 2 3 1 1 3 3 1 1 2 2 3
*trait2 8.90472059883773 8.62455170973674 8.45460831622462 8.43453595773523 8.58360495693549 8.35936910622267 9.09754487783084 8.3999100308015 8.46379241468912 8.69506039786081 8.1840487448877 8.89745444516111 8.40886531453061 9.03400088685978 8.948542613
*trait3 16.0508869012649 16.1080453151048 16.1678377428531 16.1130831091348 16.0999316803306 16.1941303372343 16.1773075105902 16.0347140420193 16.0917516756187 16.1524970776757 13.1806322855283 16.1240777226359 16.095850042011 16.1130831091348 16.1050104
*trait4 16.0138456295845 16.0907244541622 16.1250712646947 16.1324915312421 16.0819502043220 16.1735303578464 16.1673618922612 16.1383883542261 16.1215895402126 16.1432759494886 16.0938029583893 16.1175955259166 16.1324915312421 16.1447375819047 16.150562
*trait5 13.8887610197039 14.1288603771646 13.9863778758242 13.8489453340505 13.8806738838707 14.1443345737472 14.1776041279299 13.6876771864544 13.9578777992512 13.9682316449819 13.8188383480569 13.9645051106422 13.9086972726904 14.1572072187223 14.219196
*trait6 7.1066061377273 6.52209279817015 6.63331843328038 6.53184897311552 6.73340189183736 6.45099687648642 6.82437367004309 6.40797949140273 6.18070654674651 6.555830937802 6.64639051484773 7.00910777324243 6.2079191102714 6.7761268365037 6.688769737816
*trait7 8.65927129000923 8.41405243249672 8.15861166920692 8.19973896063079 8.32603268595508 8.13739583005665 8.91103545969641 8.16432053342724 8.22326945804098 8.47428569040496 7.8146688490437 8.59291890194437 8.2033952718339 8.87440132651663 8.786201998

trait1 = coat color code; the remaining traits are WBC traits.


Snapshot of the genotype data


Error Detection

calc.genoprob, calc.errorlod, plot.errorlod

Using the LOD_error statistic. Based on close recombination events, which indicate the possible presence of genotyping error (see later).


Mendel’s laws for one locus

We can (and should) check Mendel with data from our 133 offspring at each of our 153 loci.

For example, at D7Mit126, we have 24 A, 29 B and 67 H genotypes, adding to 120, indicating 12 incomplete or missing genotypes.

What do we expect according to Mendel? How would we test whether the data agree with our expectations?
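One possible answer, offered as a sketch rather than as the lecture's own solution: under Mendel's first law an intercross gives A : H : B in the ratio 1 : 2 : 1, and a Pearson chi-square test with 2 degrees of freedom compares the counts with that expectation. The function name `mendel_chi_square` is an assumption; the p-value uses the fact that the chi-square survival function for 2 degrees of freedom is exp(-x/2).

```python
import math


def mendel_chi_square(n_a: int, n_h: int, n_b: int):
    """Pearson chi-square test of the 1:2:1 intercross ratio (2 df)."""
    n = n_a + n_h + n_b
    observed = (n_a, n_h, n_b)
    expected = (n / 4, n / 2, n / 4)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the upper tail probability is exp(-stat / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


# Counts at D7Mit126 quoted above: 24 A, 67 H, 29 B.
stat, p_value = mendel_chi_square(24, 67, 29)
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
```

With expected counts 30, 60 and 30 the statistic is 2.05, so these counts give no evidence against 1 : 2 : 1 segregation at this marker.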


Mendel’s law for 2 loci

Mendel inferred the independent segregation of different factors from data on peas.

Here we check that this holds for our two coat color loci, but not generally. We then go on to understand the more general situation.


Mating & Coat color outcomes in this cross

Parental lines: C57BL/6 males, black (aaBBCC) × NOD females, albino (AABBcc)

F1: all agouti (AaBBCc)

F2: agouti : black : albino = 9 : 3 : 4

We need to check these last proportions following Mendel’s reasoning.
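The 9 : 3 : 4 ratio can be checked by brute-force enumeration of the 16 equally likely gamete pairs from two AaCc parents. This sketch assumes the A and C loci segregate independently; the function name `f2_color_ratio` is made up for the example.

```python
from collections import Counter
from itertools import product


def f2_color_ratio() -> Counter:
    """Count coat colors over the 16 gamete pairs from an AaCc x AaCc mating."""
    counts = Counter()
    # Each F1 parent passes on one allele per locus; with independent
    # segregation the four gametes AC, Ac, aC, ac are equally likely.
    gametes = list(product("Aa", "Cc"))
    for (a1, c1), (a2, c2) in product(gametes, repeat=2):
        if c1 == "c" and c2 == "c":
            counts["albino"] += 1      # cc masks the agouti locus
        elif a1 == "a" and a2 == "a":
            counts["black"] += 1       # aa with at least one C
        else:
            counts["agouti"] += 1
    return counts


print(f2_color_ratio())
```

The albino class collects the usual 3 + 1 cells of a 9 : 3 : 3 : 1 table, which is where the 4 comes from.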

Punnett square depicting F1 parental allele combinations passed on to F2 offspring


It’s not always like that

D12Mit132 \ D12Mit51     A     H     B   Total
A                       26    10     0     36
H                       10    46     9     65
B                        0     5    23     28
Total                   36    61    32    129

2-locus genotypes at D12Mit51 and D12Mit132. If we pool A and H, we do not get 9:3:3:1.


Let’s estimate the recombination fraction r between D12Mit51 and D12Mit132

D12Mit132 \ D12Mit51     A     H     B   Total
A                       26    10     0     36
H                       10    46     9     65
B                        0     5    23     28
Total                   36    61    32    129

2-locus genotypes at D12Mit51 and D12Mit132. 129 offspring from H×H, where A×B→H.


Estimation of r

First note that we can’t simply count recombinants. Why? Because recombination can occur in the paternal or the maternal meiosis, or both, and all we see are the genotypes of the offspring. In most cases, the parental origin of the recombination can be inferred, but not in every case.

Denoting the two markers by 1 and 2, the NOD alleles by a, and the B6 alleles by b, the parental haplotypes are a1a2 on one chromosome, and b1b2 on the other. Each parent passes on a1a2 with probability (1-r)/2, and similarly for b1b2, while they pass on each of the recombinant haplotypes a1b2 and b1a2 with probability r/2.

In practice, recombinations have slightly different frequencies in male and female meioses, but we ignore this refinement.


Probabilities of parentally transmitted haplotype combinations (×4)

Haplotype combinations resulting from crossing doubly heterozygous parents, each a1/b1 at locus 1 and a2/b2 at locus 2. This table is for coupling: the parental haplotypes are a1a2 and b1b2, i.e. the mother and father are both a1a2/b1b2.

Here P and M denote the Paternally and Maternally transmitted haplotypes, respectively.

P \ M     a1a2       a1b2       b1a2       b1b2
a1a2      (1-r)^2    r(1-r)     r(1-r)     (1-r)^2
a1b2      r(1-r)     r^2        r^2        r(1-r)
b1a2      r(1-r)     r^2        r^2        r(1-r)
b1b2      (1-r)^2    r(1-r)     r(1-r)     (1-r)^2


From the Punnett square to the table of 2-locus genotype probabilities

Terms in the Punnett square table can be summed to build up a table of probabilities for the 9 different 2-locus genotypes.

For example, we observe A (=a1/a1) at locus 1 and H (=a2/b2) at locus 2, if and only if the transmitted male and female haplotypes are the pairs a1a2 & a1b2 or a1b2 & a1a2, and this occurs with a combined probability of 2r(1-r)/4.

The other terms are built up similarly, the most complex case being the 2-locus genotype HH, where 4 different terms need to be considered, corresponding to the fact that a double heterozygote can result from 4 different combinations of parental or recombinant haplotypes.


Probabilities of 2-locus genotypes (×4)

L1 \ L2    A           H                    B
A          (1-r)^2     2r(1-r)              r^2
H          2r(1-r)     2[r^2+(1-r)^2]       2r(1-r)
B          r^2         2r(1-r)              (1-r)^2

Looking at this table, we see that recombinations (or not) can be inferred, apart from the parent, in all but the HH case. We can almost count recombinants.
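The collapse from the 16 haplotype pairs to the 9 genotype cells can be carried out mechanically. In this sketch the function name and the string encoding of haplotypes ("ab" meaning a1b2) are assumptions made for illustration; the probabilities are the true ones, i.e. the table entries divided by 4.

```python
from itertools import product


def two_locus_genotype_probs(r: float) -> dict:
    """Probabilities of the nine 2-locus genotypes in an F2 intercross (coupling)."""
    # Haplotype transmitted by one doubly heterozygous parent.
    hap_prob = {"aa": (1 - r) / 2, "bb": (1 - r) / 2, "ab": r / 2, "ba": r / 2}
    # Single-locus genotype from the paternal and maternal alleles.
    code = {"aa": "A", "ab": "H", "ba": "H", "bb": "B"}
    probs = {g1 + g2: 0.0 for g1, g2 in product("AHB", repeat=2)}
    for (pat, p_pat), (mat, p_mat) in product(hap_prob.items(), repeat=2):
        g1 = code[pat[0] + mat[0]]  # genotype at locus 1
        g2 = code[pat[1] + mat[1]]  # genotype at locus 2
        probs[g1 + g2] += p_pat * p_mat
    return probs


probs = two_locus_genotype_probs(0.1)
print(probs["AH"], probs["HH"], sum(probs.values()))
```

Four haplotype pairs land in the HH cell (two parental pairs and two recombinant pairs), which is exactly why recombinants cannot be counted directly there.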


Estimation of r, cont.

Using the table of probabilities we can write down a log likelihood function for any set of 2-locus frequencies. Label the cells of the table 1,…,9, and denote the corresponding probabilities by p1(r), …, p9(r), and the frequencies by n1, …, n9. Then the log-likelihood for the resulting multinomial model is

$$\log L(r) = \sum_{i=1}^{9} n_i \log p_i(r).$$

The parameter r is then estimated by maximizing this function, and an approximate standard error or confidence interval obtained using the Fisher information or the asymptotic chi-square approximation.
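As a sketch of the maximization, the log-likelihood can simply be evaluated on a fine grid of r values. The counts are those of the D12Mit51 / D12Mit132 table; the grid search and the names `genotype_probs`, `log_likelihood` and `mle_r` are illustrative choices, not the method used in the lecture.

```python
import math

# 2-locus counts for D12Mit51 and D12Mit132 (the first letter is one marker,
# the second the other; the likelihood is symmetric in the two markers).
COUNTS = {"AA": 26, "AH": 10, "AB": 0,
          "HA": 10, "HH": 46, "HB": 9,
          "BA": 0, "BH": 5, "BB": 23}


def genotype_probs(r: float) -> dict:
    """The nine 2-locus genotype probabilities (table entries divided by 4)."""
    same = (1 - r) ** 2 / 4
    single = 2 * r * (1 - r) / 4
    double = r ** 2 / 4
    return {"AA": same, "AH": single, "AB": double,
            "HA": single, "HH": 2 * (r ** 2 + (1 - r) ** 2) / 4, "HB": single,
            "BA": double, "BH": single, "BB": same}


def log_likelihood(r: float, counts: dict) -> float:
    probs = genotype_probs(r)
    return sum(n * math.log(probs[g]) for g, n in counts.items() if n > 0)


def mle_r(counts: dict, steps: int = 4999) -> float:
    """Grid search over r = 0.0001, 0.0002, ..., 0.4999."""
    grid = [0.5 * (i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda r: log_likelihood(r, counts))


print(f"r_hat is approximately {mle_r(COUNTS):.4f}")
```

A grid is crude but transparent; any one-dimensional optimizer would do the same job.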


A frill: the M-step of an EM-algorithm

The function log L(r) can be maximized in a number of ways, but in general there is no closed form expression for the maximum likelihood estimate $\hat{r}$. If we were able to decompose the count $n_5$ of HHs into the $n_5^P$ that are pairs of parental haplotypes, and the $n_5^R$ that are pairs of recombinant haplotypes, with frequencies $(1-r)^2$ and $r^2$, resp., the recombinant haplotypes can then be counted directly and the MLE is

$$\hat{r} = \frac{2\,(n_3 + n_7 + n_5^R) + n_2 + n_4 + n_6 + n_8}{2n}.$$


The E-step

In general we don’t know $n_5^R$ but can estimate it using the following formula:

$$E(n_5^R \mid n_5) = \frac{r^2}{(1-r)^2 + r^2}\, n_5.$$

In practice, we need a value of r to begin with. Next we use the above estimate, then get the next $\hat{r}$, and then iterate.

Exercise: Prove the above formula, and that the iteration is an instance of the EM-algorithm.
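The iteration can be sketched in a few lines, using the cell numbering 1,…,9 read row by row (AA, AH, AB, HA, HH, HB, BA, BH, BB). The function name, the starting value 0.25 and the stopping tolerance are assumptions; the counts are those of the D12Mit51 / D12Mit132 table.

```python
def em_recombination_fraction(counts: dict, r: float = 0.25,
                              tol: float = 1e-10, max_iter: int = 1000) -> float:
    """Alternate the E-step and M-step until r stops changing."""
    n = sum(counts.values())
    singles = counts["AH"] + counts["HA"] + counts["HB"] + counts["BH"]  # n2+n4+n6+n8
    doubles = counts["AB"] + counts["BA"]                                # n3+n7
    for _ in range(max_iter):
        # E-step: expected number of HH mice made of two recombinant haplotypes.
        n5_rec = counts["HH"] * r ** 2 / ((1 - r) ** 2 + r ** 2)
        # M-step: recombinant haplotypes over the 2n transmitted haplotypes.
        r_new = (2 * (doubles + n5_rec) + singles) / (2 * n)
        if abs(r_new - r) < tol:
            return r_new
        r = r_new
    return r


COUNTS = {"AA": 26, "AH": 10, "AB": 0,
          "HA": 10, "HH": 46, "HB": 9,
          "BA": 0, "BH": 5, "BB": 23}
print(f"r_hat is approximately {em_recombination_fraction(COUNTS):.4f}")
```

Because only the HH cell is ambiguous, each update changes r very little and the iteration settles after a handful of steps.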


2-locus genotype frequencies for D12Mit132 and D13Mit6

D12Mit132 \ D13Mit6     A     H     B   Total
A                      10    21     7     38
H                      15    29    17     61
B                       5    21     6     32
Total                  30    71    30    131

Exercise: Estimate r for these two loci. Is it different from 1/2?


Inferring linkage and mapping markers

We now turn to deciding when two marker loci are linked, and if so, estimating the map distance between them. Then we go on and create a full (marker) map of each chromosome, relative to which we can map trait genes. With these preliminaries completed, we can map trait loci.


The LOD score

Suppose that we have two marker loci, and we don’t know whether or not they are linked. A natural way to address this question is to carry out a formal test of the null hypothesis H: r=1/2 against the alternative K: r<1/2, using the marker data from our cross. The test statistic almost always used in this context is log10 of the ratio of the likelihood at the maximum likelihood estimate $\hat{r}$ to that at the null, r=1/2, i.e.

$$\mathrm{LOD} = \log_{10}\left\{ \frac{L(\hat{r})}{L(1/2)} \right\}.$$


Calculating the LOD score

Recall that the (log) likelihood here is based on the multinomial distribution for the allocation of n=132 intercross mice into their nine 2-locus genotypic categories. As we saw earlier, it can be written

$$\log_{10} L(r) = \sum_i n_i \log_{10} p_i(r),$$

and so we take the difference between this function evaluated at $\hat{r}$ and at r=1/2, which is

$$\mathrm{LOD} = \sum_i n_i \log_{10}\left[ p_i(\hat{r}) / q_i \right],$$

where $q_i$ is 1/16, 1/8 or 1/4, depending on i.
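A minimal sketch of the LOD calculation, applied to the D12Mit51 / D12Mit132 counts. The maximization over a grid of r values and the function names are illustrative assumptions; the null probabilities q_i are obtained by evaluating the same table at r = 1/2.

```python
import math

COUNTS = {"AA": 26, "AH": 10, "AB": 0,
          "HA": 10, "HH": 46, "HB": 9,
          "BA": 0, "BH": 5, "BB": 23}


def genotype_probs(r: float) -> dict:
    same, single, double = (1 - r) ** 2 / 4, 2 * r * (1 - r) / 4, r ** 2 / 4
    return {"AA": same, "AH": single, "AB": double,
            "HA": single, "HH": 2 * (r ** 2 + (1 - r) ** 2) / 4, "HB": single,
            "BA": double, "BH": single, "BB": same}


def lod_score(r: float, counts: dict) -> float:
    """Sum of n_i * log10(p_i(r) / q_i), with q_i = p_i(1/2)."""
    p, q = genotype_probs(r), genotype_probs(0.5)
    return sum(n * math.log10(p[g] / q[g]) for g, n in counts.items() if n > 0)


def max_lod(counts: dict, steps: int = 4999):
    """Return (r_hat, LOD at r_hat) from a grid search over (0, 1/2)."""
    grid = [0.5 * (i + 1) / (steps + 1) for i in range(steps)]
    r_hat = max(grid, key=lambda r: lod_score(r, counts))
    return r_hat, lod_score(r_hat, counts)


r_hat, lod = max_lod(COUNTS)
print(f"r_hat is approximately {r_hat:.3f}, LOD is approximately {lod:.1f}")
```

By construction the LOD is 0 at r = 1/2, so a large positive value at the maximum is direct evidence for linkage between the two markers.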


Null probabilities of 2-locus genotypes

L1 \ L2    A       H      B
A          1/16    1/8    1/16
H          1/8     1/4    1/8
B          1/16    1/8    1/16

This is just putting r = 1/2 in an earlier table.

Exercise: Suggest some different test statistics to discriminate between the null H and the alternative K. How do they perform in comparison to the LOD?


Using the LOD score

Normal statistical practice would have us setting a type 1 error in a given context (cross, sample size), and determining the cut-off for the LOD which would achieve approximately the desired error under the null hypothesis.

This approach is rarely adopted in genetics, where tradition dictates the use of more stringent thresholds, which take into account the multiple testing common in linkage mapping. It was originally motivated by a Bayesian argument, and in fact, Bayesian approaches to linkage analysis are increasingly popular. Let us use Bayes’ formula in the form

log10 posterior odds = log10 prior odds + LOD,

where the odds are for linkage. With 20 chromosomes, which we might assume are approximately the same size, and not too long, the prior probability of two random loci being on the same chromosome, and hence linked, is about 1/20. In order to overcome these prior odds against linkage, and achieve reasonable posterior odds, say 100:1, we would want a LOD of at least 3.
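The arithmetic behind this argument is easy to reproduce. The sketch below takes the prior probability of linkage to be exactly 1/20, which is an idealization of the "about 1/20" in the text; the function names are hypothetical.

```python
import math


def posterior_odds(prior_probability: float, lod: float) -> float:
    """Posterior odds for linkage: prior odds multiplied by 10**LOD."""
    prior_odds = prior_probability / (1 - prior_probability)
    return prior_odds * 10 ** lod


def lod_needed(prior_probability: float, target_odds: float) -> float:
    """LOD required to turn the prior odds into the target posterior odds."""
    prior_odds = prior_probability / (1 - prior_probability)
    return math.log10(target_odds / prior_odds)


print(f"posterior odds at LOD 3: {posterior_odds(1 / 20, 3):.1f} to 1")
print(f"LOD needed for 100:1:    {lod_needed(1 / 20, 100):.2f}")
```

Under this idealized prior, a LOD of 3 gives posterior odds of roughly 50:1, and 100:1 needs a LOD of about 3.3, so the traditional threshold of 3 is best read as a round figure of the right order.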


Linkage groups

And so it has come to pass that a LOD must be >3 to get people’s attention. We’ll be a little more precise later.

The next step is to define what are called linkage groups. These partition the markers into classes, every pair of markers being either closely linked (i.e. r ≈ 0), or being connected by a chain of markers, each consecutive pair of which is closely linked. In practice, we might define closely linked to be something like

a) $\hat{r}$ < c1, and b) LOD($\hat{r}$) > c2, where e.g. c1 = 0.2, c2 = 3.
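Forming linkage groups from this definition amounts to finding the connected components of a graph whose edges are the closely linked pairs. Here is a sketch using a small union-find structure; the marker names and pairwise statistics are invented toy values, not results from the cross.

```python
def linkage_groups(markers, pair_stats, c1=0.2, c2=3.0):
    """Group markers connected by chains of closely linked pairs.

    pair_stats maps (marker_i, marker_j) to (r_hat, lod) for that pair.
    """
    parent = {m: m for m in markers}

    def find(m):
        # Follow parent links to the group representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for (m1, m2), (r_hat, lod) in pair_stats.items():
        if r_hat < c1 and lod > c2:      # "closely linked"
            parent[find(m1)] = find(m2)  # merge the two groups

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())


# Hypothetical pairwise estimates for four markers.
TOY_STATS = {("M1", "M2"): (0.05, 12.0), ("M2", "M3"): (0.15, 4.0),
             ("M1", "M3"): (0.19, 2.5), ("M3", "M4"): (0.45, 0.2)}
print(linkage_groups(["M1", "M2", "M3", "M4"], TOY_STATS))
```

In the toy data M1 and M3 fail the LOD criterion as a pair, yet they end up in the same group through M2, which is the "chain of markers" part of the definition.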


Forming linkage groups, cont.

When one tries to form linkage groups, it is not unusual to have to vary c1 and c2 a little, until all markers fall into a group of more than just one marker. When this is done, it is hoped that the linkage groups correspond to chromosomes. If the chromosome number of the species is known, and that coincides with the number of linkage groups, this is a reasonable presumption. But much can happen to dash this hope: one may have two linkage groups corresponding to different arms of the same chromosome, and not know that; one can have a marker at the end of one chromosome “linked” to a marker at the end of another chromosome, though this should be rare if there is plenty of data; and so on.


Ordering linkage groups

Next we want to order the markers in a linkage group (ideally, on a chromosome). How do we do that? An initial ordering can be obtained by starting with one of the markers, M1 say, from the most distant pair, here distance being recombination fraction, or map distance. Call M2 the closest marker to M1 and continue in this way.

Now we want to confirm our ordering. One way is to calculate a (maximized) log likelihood for every ordering, and select the one with the largest log likelihood. But if we have (say) 11 markers on a chromosome, this is 11! ≈ 4×10^7 orders. What people often do is take moving k-tuples of markers, and optimize the order of each, e.g. with k = 3 or 4. Whichever strategy one adopts, multi (i.e. >2) locus methods are needed.
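The initial ordering step can be sketched as a greedy nearest-neighbour walk. The function name and the toy recombination fractions are assumptions for illustration only; the last line reproduces the count of orders quoted above.

```python
import math
from itertools import combinations


def greedy_order(markers, rf):
    """Start at one end of the most distant pair, then keep adding the nearest marker.

    rf maps unordered marker pairs (in either order) to recombination fractions.
    """
    def dist(m1, m2):
        return rf[(m1, m2)] if (m1, m2) in rf else rf[(m2, m1)]

    start, _ = max(combinations(markers, 2), key=lambda pair: dist(*pair))
    order = [start]
    remaining = [m for m in markers if m != start]
    while remaining:
        nearest = min(remaining, key=lambda m: dist(order[-1], m))
        order.append(nearest)
        remaining.remove(nearest)
    return order


# Hypothetical pairwise recombination fractions for four markers.
TOY_RF = {("M1", "M2"): 0.10, ("M1", "M3"): 0.19, ("M1", "M4"): 0.27,
          ("M2", "M3"): 0.10, ("M2", "M4"): 0.19, ("M3", "M4"): 0.10}
print(greedy_order(["M3", "M1", "M4", "M2"], TOY_RF))
print(math.factorial(11))  # number of orderings of 11 markers
```

A greedy walk like this gives only a starting order; it still needs to be confirmed with a likelihood comparison, as described in the text.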


Likelihoods for 3-locus data

Suppose that we have 3 markers M1, M2 and M3 in that order. How do we calculate the log likelihood of the associated 3-locus marker data from our intercross?

Recalling the discussion preceding the Punnett square of the last lecture, the parental haplotypes here are a1a2a3 and b1b2b3, while there would be no fewer than 6 forms of recombinant haplotypes: the four single recombinants a1a2b3, a1b2b3, b1b2a3 and b1a2a3, and the two double recombinants a1b2a3 and b1a2b3.

Proceeding as before, we calculate the probability of each of these in terms of the recombination fractions r1 and r2 across intervals M1-M2 and M2-M3, respectively. For simplicity, we assume the Poisson model, with independence of recombination across disjoint intervals. For example, a1a2a3 would have probability (1-r1)(1-r2)/2, a1a2b3 would have probability (1-r1)r2/2, while a1b2a3 would have probability r1r2/2, so that the 8 haplotype probabilities sum to 1.

We would do this for every one of the 8 paternal and 8 maternal haplotypes, and then collect them up to assign probabilities for each of the 3^3 = 27 3-locus genotypes (AAA, AAH, …, BBB), and maximize the multinomial likelihood in the parameters r1 and r2. This is just as in the 2-locus case.
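The bookkeeping for three loci can be sketched as follows, assuming independent recombination in the two intervals as stated. The function names and the string encoding of haplotypes ("aba" meaning a1b2a3) are illustrative assumptions.

```python
from itertools import product


def three_locus_haplotype_probs(r1: float, r2: float) -> dict:
    """Probability of each of the 8 haplotypes transmitted by one F1 parent."""
    probs = {}
    for hap in product("ab", repeat=3):
        p = 0.5                                    # allele at locus 1: a or b
        p *= r1 if hap[0] != hap[1] else 1 - r1    # interval M1-M2
        p *= r2 if hap[1] != hap[2] else 1 - r2    # interval M2-M3
        probs["".join(hap)] = p
    return probs


def three_locus_genotype_probs(r1: float, r2: float) -> dict:
    """Collapse the 8 x 8 haplotype pairs into the 27 3-locus genotypes."""
    hap = three_locus_haplotype_probs(r1, r2)
    code = {"aa": "A", "ab": "H", "ba": "H", "bb": "B"}
    geno = {}
    for (pat, p_pat), (mat, p_mat) in product(hap.items(), repeat=2):
        g = "".join(code[x + y] for x, y in zip(pat, mat))
        geno[g] = geno.get(g, 0.0) + p_pat * p_mat
    return geno


geno = three_locus_genotype_probs(0.1, 0.2)
print(len(geno), round(sum(geno.values()), 12))
```

The same enumeration works for any number of loci in principle, but the number of haplotype pairs grows as 4 to the power of the number of loci, which is the blow-up discussed next.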


Multilocus linkage: #loci >3

It should have become clear by now that the strategy just outlined is not going to work too easily when there are (say) 11 loci in a linkage group.

In that case, haplotypes are strings of the form a1a2b3 … a10b11, where there are just 2 parental and 2^11 - 2 distinct recombinant haplotypes. The number of parental haplotype combinations is the square of the total number of haplotypes, and they must be mapped into 3^11 11-locus genotypes, and a multinomial MLE carried out to estimate 10 recombination fractions. What can be done?

In 1987 the first large scale human genetic map was published, and at the same time a new algorithm was announced for both human pedigrees and experimental crosses, such as our intercross. This algorithm made use of hidden Markov models, and for the first time allowed full likelihood calculations in our current context without the exponential blow-up just described.


Multilocus mapping

Here we show how, using Rabiner’s notation, we can get an HMM. Then we calculate our probabilities via the forward algorithm.

Note that in our case, the Markov chain is non-stationary: it has different states and transition probabilities from time (here locus) to time (locus). For simplicity, we omit the locus subscript.

State space: {aa, ab, ba, bb} = {a,b}×{a,b}.
Transition probabilities: P⊗P (Kronecker product), where P is

$$P = \begin{pmatrix} 1-r & r \\ r & 1-r \end{pmatrix}.$$

Note: using states {A, H, B} won’t work. Why?
Observation set: {A, H, B, C, D, -}. Here C = not A, D = not B.
Emission probabilities: here just the obvious ones, e.g. pr(emit A | aa) = pr(emit D | aa) = 1.
Initial probabilities: πi all 1/4.
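A compact sketch of the forward algorithm for one mouse under this HMM. The emission sets follow the observation codes listed above; the function names and the representation of a mouse's data as a string of codes are assumptions for illustration, and no underflow protection is included.

```python
STATES = ("aa", "ab", "ba", "bb")  # (paternal allele, maternal allele)

# Hidden states compatible with each observed code (emission probability 1 or 0).
COMPATIBLE = {
    "A": {"aa"},
    "H": {"ab", "ba"},
    "B": {"bb"},
    "C": {"ab", "ba", "bb"},  # not A
    "D": {"aa", "ab", "ba"},  # not B
    "-": set(STATES),         # missing
}


def transition(s: str, t: str, r: float) -> float:
    """Entry of the Kronecker product P (x) P: each strand recombines independently."""
    p = 1.0
    for x, y in zip(s, t):
        p *= r if x != y else 1 - r
    return p


def forward_likelihood(observations: str, rec_fracs) -> float:
    """Likelihood of one mouse's marker data; rec_fracs[k] links marker k to k+1."""
    alpha = [0.25 if s in COMPATIBLE[observations[0]] else 0.0 for s in STATES]
    for obs, r in zip(observations[1:], rec_fracs):
        alpha = [
            sum(alpha[i] * transition(s, t, r) for i, s in enumerate(STATES))
            if t in COMPATIBLE[obs] else 0.0
            for t in STATES
        ]
    return sum(alpha)


print(forward_likelihood("AA", [0.1]))  # equals (1-r)^2 / 4 at r = 0.1
print(forward_likelihood("HH", [0.1]))  # equals 2[r^2 + (1-r)^2] / 4 at r = 0.1
```

For two markers the result reproduces the 2-locus genotype table, while the cost for many markers grows only linearly in the number of loci.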


Multilocus mapping, cont.

I’m not going to cover this topic in any more detail this year, as I discussed it a few years ago, and those interested can read it there:

www.stat.berkeley.edu/users/terry/Classes/s260.1998/index.html

We use the HMM forward algorithm on each mouse’s data, one by one, and multiply to get the likelihood, just as we described last week for the backcross. In practice we take logs, and need some tricks to deal with underflow. Parameter estimation can also be dealt with using a different HMM formula.

Now suppose that we have ordered our marker loci as just described, either by maximizing the likelihood within linkage groups over all orders, or by doing so in moving windows of size 3-5. How do we look at the result?


Checking the map, after removal of bad markers

est.rf, plot.rf (from an R package)

Top triangle is a transform of the recombination fraction, namely -4(1+log2 r). Bottom triangle contains the LOD scores at the maximum likelihood estimate of recombination fraction. Notice the “bad” bits in the top LH and bottom RH corners.


Checking existing genetic maps

As indicated earlier, the markers in our cross came from MIT, and they were already mapped. Most researchers would simply use the pre-existing map, as this would usually (but not always) be based on many more recombinations than could be expected in a single cross. Why might we not just do the same?

Well, existing maps are rarely completely error-free, and one should always look at one’s own data.

An added benefit of looking at one’s own data in relation to an existing map is that this should bring to light markers with a large number of genotyping errors, assuming the map is correct.


Interplay between error detection and maps

• Genotyping errors in mouse crosses can usually only be detected with the appearance of unusual numbers of close recombination events
• This depends entirely on the quality of the map
• The availability of the mouse genome sequence allows us to check genetic maps against the physical maps: we locate the (unique) PCR primers for our microsatellite markers. This has brought a new era in quality of maps (includes human genetic maps!).

The next slide depicts the genetic map we used.


Locations of our markers

After a commercial, we move on to mapping coat color genes.

R

R/qtl

Authors: Karl Broman, Hao Wu, Gary Churchill, Saunak Sen, & Brian Yandell


Benefits of using R/qtl

• Lots of graphics
• Good error detection with accompanying graphics
• Single and two QTL mapping (and interaction terms)
• Choice of several input formats
  – Includes Mapmaker format
• Many alternatives for mapping methods
• Many different models for phenotypes, e.g. standard normal, nonparametric model, binary traits

Why map coat color genes in our C57BL/6 x NOD F2 intercross?

• the locations of these genes are known
• even with a modest number of mice we should be able to map these genes easily
• it is a useful check that everything is as it should be with our data
• and finally, it is a good exercise for us.

Exercise. Look up the agouti and albino loci at the Mouse Genome Informatics database.


Recall our earlier Punnett square


Segregation data at a “random” marker

Phenotype by genotype at D12Mit51 (complete data only)

            A     B     H
Agouti     19    18    35
Black       8     3    18
White       9     7    12


Mapping a segregating trait

We turn now to mapping the two coat color genes segregating in our cross, beginning with the albino locus, and then the agouti locus. To do so, we need a genetic model, that is, we need to know or guess the relation between genotypes at our trait loci and phenotypes, which is embodied in the notion of a penetrance function.

Looking at the preceding table, the albino trait segregates just as though governed by a recessive gene, so we postulate a locus with a recessive and a dominant allele for it. Although this is not precisely the case for the non-agouti trait, it is almost, and we do likewise.

Later we will consider their interaction.


Probabilities of albino-marker genotypes (×4)

Recall that the NOD mouse (A) is homozygous for the albino allele, while the C57BL/6 (B) is homozygous for the non-albino allele. We can collapse an earlier table to get (×4)

Colour \ M     A              H               B
Albino         (1-r)^2        2r(1-r)         r^2
Full color     1-(1-r)^2      2-2r(1-r)       1-r^2

Here r is the recombination fraction between a marker and the albino locus.

Segregation data at the marker closest to Tyr^c

Phenotype by genotype at D7Mit126 @ 50 cM (the Tyr^c locus is at 44 cM)

            A     B     H
Agouti      3    19    47
Black       0    10    19
White      21     0     1
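Combining the albino-marker probability table with these counts gives a two-point likelihood for the recombination fraction between D7Mit126 and the albino locus. The sketch below pools agouti and black mice as "full color" and maximizes over a grid; it is a marker-only calculation under the recessive model, not a reproduction of the R/qtl genome scan, so its LOD need not match the plotted values. All function names are assumptions.

```python
import math

# Counts at D7Mit126: white mice, and agouti + black pooled as full color.
COUNTS = {("albino", "A"): 21, ("albino", "H"): 1, ("albino", "B"): 0,
          ("full", "A"): 3 + 0, ("full", "H"): 47 + 19, ("full", "B"): 19 + 10}


def albino_marker_probs(r: float) -> dict:
    """Joint probabilities of (coat, marker genotype): table entries divided by 4."""
    return {("albino", "A"): (1 - r) ** 2 / 4,
            ("albino", "H"): 2 * r * (1 - r) / 4,
            ("albino", "B"): r ** 2 / 4,
            ("full", "A"): (1 - (1 - r) ** 2) / 4,
            ("full", "H"): (2 - 2 * r * (1 - r)) / 4,
            ("full", "B"): (1 - r ** 2) / 4}


def lod_score(r: float, counts: dict) -> float:
    p, q = albino_marker_probs(r), albino_marker_probs(0.5)
    return sum(n * math.log10(p[k] / q[k]) for k, n in counts.items() if n > 0)


def max_lod(counts: dict, steps: int = 4999):
    grid = [0.5 * (i + 1) / (steps + 1) for i in range(steps)]
    r_hat = max(grid, key=lambda r: lod_score(r, counts))
    return r_hat, lod_score(r_hat, counts)


r_hat, lod = max_lod(COUNTS)
print(f"r_hat is approximately {r_hat:.3f}, LOD is approximately {lod:.1f}")
```

The small estimate of r reflects the pattern in the table: almost all white mice are A at this marker, and almost no full-color mice are.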


Plot of LOD score at each marker along the genome

Mapping the albino locus


Chromosome 7 genotypes for the albino mice.

Pale blue shading is conserved NOD haplotype. D7Mit128 is near the Tyr^c locus.

A: homozygous NOD, B: homozygous B6, H: heterozygote. Genotypes are read down.


Approximate probabilities of agouti-marker genotypes (×4)

Recall that the C57BL/6 (B) is homozygous for non-agouti, while the NOD (A) is homozygous agouti. Ignoring the 1/16 of the intercross who would exhibit the non-agouti trait (and be black) if they weren’t albino, we get the following approximate table, where 1/16 of the mice will be misclassified. Here r is the recombination fraction between a marker and the agouti locus.

Colour \ M     A           H               B
Black          r^2         2r(1-r)         (1-r)^2
Non-black      1-r^2       2-2r(1-r)       1-(1-r)^2

Segregation data at the marker closest to the agouti locus

Phenotype by genotype at D2Mit48 @ 87 cM (agouti locus is at 89 cM)

            A     B     H
Agouti     24     2    46
Black       0    28     1
White       5     6    14


Mapping the agouti locus

Plot of LOD score at each marker along the genome


Chromosome 2 genotypes for the black progeny.

Mauve shading indicates conserved C57BL/6 haplotype. Marker D2Mit48 is very close to the agouti locus.

Conclusion: single locus mapping

• agouti locus (A, a alleles) on Chr 2 at 89.9 cM
• albino locus (C, c alleles) on Chr 7 at 44 cM (now known as Tyr^c gene)
• In the data set:
  – at 89 cM on Chr 2 with a LOD score > 20
    • Marker D2M48 (8th marker on Chr 2)
  – at 43 cM on Chr 7 with a LOD score > 20
    • Marker D7M126 (4th marker on Chr 7)

The method worked for agouti, even though 1/16th of the mice were misclassified


Acknowledgement

This lecture would not have been possible without the very substantial input of Melanie Bahlo and Tom Brodnicki of the Walter & Eliza Hall Institute of Medical Research (WEHI), Melbourne, Australia.

Tom (together with people from the WEHI mouse facility) carried out the cross, and did all the phenotyping, while Melanie did all the data analysis presented, and contributed a lot to the presentation. Overall, responsibility for the presentation (especially all the errors!) remains mine.