58305301 Research Seminar on Algorithms: Sums of Products
Elston-Stewart algorithm
Tero Hiekkalinna, 8.11.2005
Papers
• Elston, R. and Stewart, J., A general model for the genetic analysis of pedigree data. Human Heredity, 21, 6 (1971).
• Fishelson, M. and Geiger, D., Exact genetic linkage computations for general pedigrees. Bioinformatics, 2002; 18 Suppl. 1: S189-S198.
• Fishelson, M. and Geiger, D., Optimizing exact genetic linkage computations. RECOMB'03.
Introduction
• Humans have 22 autosomal chromosome pairs and one sex chromosome pair (male: X/Y, female: X/X)
• Each pair of chromosomes contains one paternal and one maternal chromosome
We get half of our genes from the father and half from the mother!
Genetic marker
• Well-known position on the genome
• Microsatellite
  • (CA)n repeats (cytosine-adenine dinucleotide) in the DNA sequence
  • Tens of thousands in the genome
  • Repeat sequence length < 150 base pairs
  • Repeat lengths differ between people
• Also others: minisatellites, SNPs (Single Nucleotide Polymorphisms)
(CA)8: 5'-CACACACACACACACA-3'
(CA)6: 5'-CACACACACACA-3'
Linkage analysis
• Linkage analysis is used for mapping disease-predisposing genes in families
• Co-segregation of the disease locus and a genetic marker locus is tested statistically
  • Estimating the recombination fraction θ (genetic distance between disease locus and marker)
  • Maximum likelihood methods
  • L(θ) = P(data | θ)
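As a small illustration of maximizing L(θ) = P(data | θ), the sketch below assumes the simplest textbook setting, which is not taken from the slides: n phase-known, fully informative meioses of which r are recombinant, so that L(θ) = θ^r (1-θ)^(n-r). The function names and the grid search are illustrative choices only.

```python
def likelihood(theta, recombinants, meioses):
    """L(theta) = P(data | theta) for phase-known, fully informative meioses (assumed setting)."""
    return theta ** recombinants * (1.0 - theta) ** (meioses - recombinants)


def maximum_likelihood_theta(recombinants, meioses, steps=500):
    """Grid search for the theta in [0, 0.5] that maximizes the likelihood."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda theta: likelihood(theta, recombinants, meioses))


# Hypothetical data: 2 recombinants observed among 10 meioses.
theta_hat = maximum_likelihood_theta(recombinants=2, meioses=10)
print(theta_hat)
```

In this toy setting the maximum sits at the observed recombinant proportion r/n; real pedigree data need the full likelihood machinery described in the following slides.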
Linkage analysis - why?
• Identify the position of the disease locus on the genome
• Identify the gene in that region
• What does the gene do, or fail to do?
  • Problem in protein coding?
• Can we help the patients?
• Genetic counseling
Linkage analysis
• In a typical genome-wide linkage mapping study using microsatellites, with hundreds of multigenerational pedigrees, each individual is typed at over 350 genetic markers from all chromosomes
• It is impossible to analyze this amount of data by "eye"
Example of pedigree
Symbols: male, male with disease, female, female with disease
[Figure: example of a multigenerational family. Person 10 has alleles 1 and 2, written 1/2.]
A pair of alleles is called a genotype.
Likelihood function
• The likelihood function for a family with n individuals (f = founder) can be expressed as a multiple sum of products (penetrance, population and transmission parameters):
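The equation on this slide did not survive extraction. A standard way to write a likelihood of this form, consistent with the description above, is sketched below. The notation is assumed here rather than taken from the slide: x_i is the observed phenotype of person i, g_i the ordered genotype, and fa(o), mo(o) the father and mother of non-founder o.

```latex
L = \sum_{g_1} \cdots \sum_{g_n}
    \prod_{i=1}^{n} P(x_i \mid g_i)
    \prod_{f \in \mathrm{founders}} P(g_f)
    \prod_{o \in \mathrm{non\text{-}founders}} P\bigl(g_o \mid g_{\mathrm{fa}(o)}, g_{\mathrm{mo}(o)}\bigr)
```

The first product holds the n penetrance terms, the second the population terms of the founders, and the third the transmission terms of the non-founders, so every genotype combination contributes 2n factors in total.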
Likelihood function
• There are n summations, each indexed over all possible ordered genotypes of one pedigree member
  • Ordered genotype means that the source of each allele is known (i.e. from father or from mother)
• If each member of the pedigree has G possible ordered genotypes, then a pedigree with n members has G^n ordered genotype combinations
• Each genotype combination is associated with n penetrance and n population/transmission parameters
• The procedure therefore requires G^n(2n-1) multiplications followed by G^n - 1 additions
Example: number of markers and alleles
• Genetic marker with two alleles A and B: the number of possible ordered genotypes per person is G = 2^2 = 4, and if the pedigree has 4 members, the number of ordered genotype combinations in the pedigree is G^4 = (2^2)^4 = 256
[Figure: father and mother with the four ordered genotypes A/A, A/B, B/A and B/B]
• Two markers with two alleles: ((2*2)^2)^4 = 65536
• Three markers with two alleles: ((2*2*2)^2)^4 = 16777216
Example: number of persons
• Genetic marker with two alleles A and B and 4 pedigree members: the number of ordered genotype combinations in the pedigree is (2^2)^4 = 256
  • 5 members: (2^2)^5 = 1024
  • 6 members: (2^2)^6 = 4096
  • 7 members: (2^2)^7 = 16384
  • 10 members: (2^2)^10 = 1048576
• The number of combinations is quite large even with small numbers of markers and pedigree members
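The counts in the two examples above can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the operation counts simply apply the G^n(2n-1) and G^n - 1 formulas from the likelihood slide.

```python
def ordered_genotypes_per_person(alleles_per_marker):
    """G for one person: (number of multilocus haplotypes) squared."""
    haplotypes = 1
    for n_alleles in alleles_per_marker:
        haplotypes *= n_alleles
    return haplotypes ** 2


def pedigree_combinations(alleles_per_marker, members):
    """G^n: ordered genotype combinations for a pedigree with `members` people."""
    return ordered_genotypes_per_person(alleles_per_marker) ** members


def operation_counts(alleles_per_marker, members):
    """(multiplications, additions) for the brute-force multiple sum."""
    combos = pedigree_combinations(alleles_per_marker, members)
    return combos * (2 * members - 1), combos - 1


# One, two and three biallelic markers in a 4-person pedigree.
for markers in ([2], [2, 2], [2, 2, 2]):
    print(len(markers), "marker(s):", pedigree_combinations(markers, 4))

# One biallelic marker, growing pedigree.
for n in (4, 5, 6, 7, 10):
    print(n, "members:", pedigree_combinations([2], n))
```

Both loops show the same exponential growth, once in the number of markers and once in the number of people.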
Elston-Stewart algorithm
• Each factor in the product is indexed by the genotypes of at most three individuals: an offspring and its two parents
• A pedigree is a number of nuclear families linked together through certain individuals
The pedigree can be analyzed one nuclear family at a time!
[Figure: example pedigree; person 10 has genotype 1/2]
Elston-Stewart algorithm
• The likelihood function for a nuclear family with K children
• Offspring are independent, conditional on the parental genotypes
• The computational time requirement is now linear in the number of people added to the pedigree!
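The effect of conditional independence can be checked numerically on a single nuclear family. The sketch below is not from the slides: it assumes one biallelic locus, Hardy-Weinberg founder priors, made-up allele frequencies and a made-up dominant penetrance model. It computes the likelihood twice, once as the full multiple sum over all members and once with the sum over each child's genotype moved inside the product over children.

```python
from itertools import product

ALLELES = ("A", "B")
GENOTYPES = list(product(ALLELES, repeat=2))  # ordered: (paternal, maternal)
FREQ = {"A": 0.1, "B": 0.9}                   # assumed allele frequencies


def founder_prior(g):
    """P(g) for a founder, assuming Hardy-Weinberg proportions."""
    return FREQ[g[0]] * FREQ[g[1]]


def penetrance(affected, g):
    """P(phenotype | g) under a hypothetical dominant model."""
    risk = 0.9 if "A" in g else 0.05
    return risk if affected else 1.0 - risk


def transmission(child, father, mother):
    """P(child genotype | parental genotypes) under Mendelian segregation."""
    return (father.count(child[0]) / 2) * (mother.count(child[1]) / 2)


def parents_term(father_aff, mother_aff, gf, gm):
    return (founder_prior(gf) * penetrance(father_aff, gf)
            * founder_prior(gm) * penetrance(mother_aff, gm))


def likelihood_brute_force(father_aff, mother_aff, children_aff):
    """Full multiple sum: G^(2+K) genotype combinations."""
    total = 0.0
    for gf, gm, *gc in product(GENOTYPES, repeat=2 + len(children_aff)):
        term = parents_term(father_aff, mother_aff, gf, gm)
        for aff, g in zip(children_aff, gc):
            term *= transmission(g, gf, gm) * penetrance(aff, g)
        total += term
    return total


def likelihood_factored(father_aff, mother_aff, children_aff):
    """Children summed out one by one: work grows linearly in K."""
    total = 0.0
    for gf, gm in product(GENOTYPES, repeat=2):
        term = parents_term(father_aff, mother_aff, gf, gm)
        for aff in children_aff:
            term *= sum(transmission(g, gf, gm) * penetrance(aff, g)
                        for g in GENOTYPES)
        total += term
    return total


brute = likelihood_brute_force(True, False, [True, False, True])
factored = likelihood_factored(True, False, [True, False, True])
print(brute, factored)
```

Both functions return the same value up to rounding, but the factored version visits only G^2 parental combinations and does a constant amount of work per child.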
Elston-Stewart algorithm
• The number of genotype combinations can be reduced
  • Eliminate impossible genotypes
    • Example: offspring genotypes are known, but the second parent is unknown. The unknown parent's possible genotypes can be listed using the spouse and offspring genotypes
  • Using the phenotype
    • Example: ABO blood group. If a person's blood group is O, then the only possible genotype is O/O
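Both kinds of elimination can be sketched in a few lines. The code below is an illustration under simple assumptions (unordered genotypes, the usual ABO dominance rules, no mutation or typing errors); the function names are hypothetical and not part of any linkage package.

```python
from itertools import combinations_with_replacement

ABO_ALLELES = ("A", "B", "O")


def blood_group(genotype):
    """ABO phenotype of an unordered genotype: A and B are codominant, O is recessive."""
    alleles = set(genotype)
    if alleles == {"O"}:
        return "O"
    if {"A", "B"} <= alleles:
        return "AB"
    return "A" if "A" in alleles else "B"


def compatible_genotypes(phenotype):
    """Genotypes that remain possible once the blood group is known."""
    return [g for g in combinations_with_replacement(ABO_ALLELES, 2)
            if blood_group(g) == phenotype]


def possible_parent_genotypes(spouse, children):
    """Genotypes of an untyped parent consistent with the spouse and all children."""
    candidates = []
    for parent in combinations_with_replacement(ABO_ALLELES, 2):
        if all(any(sorted((a, b)) == sorted(child)
                   for a in parent for b in spouse)
               for child in children):
            candidates.append(parent)
    return candidates


print(compatible_genotypes("O"))
print(compatible_genotypes("A"))
print(possible_parent_genotypes(("O", "O"), [("A", "O"), ("B", "O")]))
```

Every genotype removed this way shrinks the range of one of the summations in the likelihood, which is why elimination is done before any probabilities are computed.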
Elston-Stewart algorithm
• Start at the bottom of the pedigree:
  • Calculate conditional probabilities for person II-2, using persons III-1, III-2 and II-1
  • Calculate conditional probabilities for person II-3, using persons III-3 and II-4
  • Calculate conditional probabilities for persons I-1 and I-2, using persons II-2 and II-3
• The overall pedigree likelihood is then obtained by combining these nuclear-family results and summing over the genotypes of the founders I-1 and I-2
[Figure: three-generation pedigree, generations I, II and III]
Elston-Stewart algorithm
• The original 1971 algorithm could not handle loops
• Method for allowing loops
[Figure: pedigree with a loop] Persons 8 and 9 are the same individual! The algorithm is in an infinite loop!
Elston-Stewart algorithm
• Pros
  • Can handle very large pedigrees (computational time grows linearly with the number of people)
• Cons
  • Only a few markers can be analyzed jointly in multipoint analysis (computational time grows exponentially with the number of markers)
Superlink - basic ideas
• Bayesian networks are used for representing linkage analysis problems
• Uses the Elston-Stewart and/or Lander-Green algorithms to calculate the pedigree likelihood
  • Big pedigree and few markers: Elston-Stewart
  • Medium-size pedigree and many markers: Lander-Green
  • Or a combination of these algorithms
Bayesian network
• Random variables
  • Genetic loci
  • Phenotypes
  • Selector variables
  • Inheritance patterns
• Local probability tables
  • Transmission models
  • Penetrance models
  • Recombination models
  • Population allele/genotype probabilities
[Figure: Bayesian network fragment with nodes Parent 1, Parent 2 and Child]
Variable elimination
• 1st step
  • Graph representation of the pedigree. Nodes of the graph are people in the pedigree and edges represent parent relations. The genotypes of an individual depend on the genotypes of relatives
  • Downward, upward and selector updates
• 2nd step
  • Entries in a probability table where the variable equals 0 are invalid
[Figure: updates: downward, upward, selector]
Other eliminations
• Variable trimming
  • If an individual's affection status is unknown, the phenotype variable can be trimmed
  • Founders' selector variables can be trimmed, since there is no information about phase
• Merging variables
  • Unknown phase: if two possible genotypes differ only in phase, then they have the same probability
  • If recombination events in children cannot be identified, then the selector variables can be eliminated
Variable elimination order
• Small pedigree, many loci:
  • Elimination locus by locus
• Big pedigree, few loci:
  • Elimination one nuclear family at a time
• Greedy heuristics
  • Each variable is assigned an elimination cost, and the variable with the smallest cost is eliminated next
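A greedy ordering of this kind can be sketched as follows. The cost used here, the size of the table created when a variable is eliminated, is a common choice but an assumption on my part; the slides do not say which cost function Superlink uses. The graph, the variable names and the domain sizes are all hypothetical.

```python
def elimination_cost(graph, domain_size, variable):
    """Size of the table over the variable and its current neighbours (assumed cost)."""
    size = domain_size[variable]
    for neighbour in graph[variable]:
        size *= domain_size[neighbour]
    return size


def greedy_elimination_order(neighbours, domain_size):
    """Repeatedly eliminate the cheapest variable, connecting its neighbours afterwards."""
    graph = {v: set(ns) for v, ns in neighbours.items()}
    order = []
    while graph:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = min(sorted(graph), key=lambda v: elimination_cost(graph, domain_size, v))
        nbrs = graph.pop(best)
        for u in nbrs:
            graph[u].discard(best)
            graph[u] |= nbrs - {u}  # fill-in edges among the remaining neighbours
        order.append(best)
    return order


# Toy interaction graph for one nuclear family with two children.
neighbours = {
    "father": {"mother", "child1", "child2"},
    "mother": {"father", "child1", "child2"},
    "child1": {"father", "mother"},
    "child2": {"father", "mother"},
}
domain_size = {v: 4 for v in neighbours}  # 4 ordered genotypes each

order = greedy_elimination_order(neighbours, domain_size)
print(order)
```

On this toy graph the children are eliminated before the parents, which corresponds to processing a nuclear family from the bottom up.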
Superlink
• Careful variable elimination reduces the likelihood calculation time
  • Select the best algorithm for the job
• Saves memory
• More complex pedigrees can be analyzed
Extra references
• Sham, P., Statistics in Human Genetics. Arnold (Hodder Headline Group), London, 1998.
• Strachan, T. and Read, A., Human Molecular Genetics, Third Edition. BIOS Scientific Publishers Ltd, Oxford, UK, 2003.
• Lange, K. and Elston, R. C., Extensions to pedigree analysis I. Likelihood calculations for simple and complex pedigrees. Hum Hered. 1975; 25(2): 95-105.
• FASTLINK 4.1P documentation (http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/fastlink.html)
• Lander, E. and Green, P., Construction of multilocus genetic linkage maps in humans. Proceedings of the National Academy of Sciences, USA, 84, 8 (1987).
Appendix
• Lander-Green algorithm
  • Uses inheritance vectors
  • Proceeds locus by locus (vs. Elston-Stewart, which proceeds one nuclear family at a time)
  • Pros
    • Can handle many markers in multipoint analysis (computational time grows linearly with the number of markers)
  • Cons
    • Can handle only medium-size pedigrees (computational time grows exponentially with the number of people (non-founders))
    • Does not account for interference