58305301 Research Seminar on Algorithms: Sums of Products

Elston-Stewart algorithm

Tero Hiekkalinna, 8.11.2005


Papers
• Elston, R. and Stewart, J., A general model for the genetic analysis of pedigree data. Human Heredity, 21(6), 1971.
• Fishelson, M. and Geiger, D., Exact genetic linkage computations for general pedigrees. Bioinformatics, 18(Suppl. 1), 2002: S189-S198.
• Fishelson, M. and Geiger, D., Optimizing exact genetic linkage computations. RECOMB'03.


Introduction
• Humans have 22 autosomal chromosome pairs and one sex chromosome pair (male: X/Y, female: X/X)
• Each pair of chromosomes contains one paternal and one maternal chromosome

We get half of our genes from the father and half from the mother!


Genetic marker
• A well-known position on the genome
• Microsatellite
  - (CA)n repeats (cytosine and adenine bases) in the DNA sequence
  - Tens of thousands in the genome
  - Repeat sequence length < 150 base pairs
  - Repeat lengths differ between people
• Also others: minisatellites, SNPs (single nucleotide polymorphisms)

(CA)8: 5'-CACACACACACACACA-3'
(CA)6: 5'-CACACACACACA-3'


Linkage analysis

• Linkage analysis is used for mapping disease-predisposing genes in families
• Co-segregation of the disease locus and a genetic marker locus is tested statistically
  - Estimating the recombination fraction θ (genetic distance between the disease locus and the marker)
  - Maximum likelihood methods
  - L(θ) = P(data | θ)
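For the simplest textbook case, where every meiosis is fully informative and phase-known (an assumption; real pedigrees rarely allow this), N meioses with R recombinants give L(θ) = θ^R (1 - θ)^(N - R), which is maximized at θ = R/N. The sketch below, with hypothetical function names, evaluates that likelihood and the usual LOD score log10(L(θ)/L(0.5)) against the no-linkage value θ = 0.5.

```python
import math


def likelihood(theta: float, n: int, r: int) -> float:
    """L(theta) = P(data | theta) for n phase-known meioses with r recombinants."""
    return theta**r * (1 - theta) ** (n - r)


def lod(theta: float, n: int, r: int) -> float:
    """LOD score: log10 of L(theta) relative to no linkage (theta = 0.5)."""
    return math.log10(likelihood(theta, n, r) / likelihood(0.5, n, r))


def mle_theta(n: int, r: int) -> float:
    """Maximum likelihood estimate; the recombination fraction is restricted to [0, 0.5]."""
    return min(r / n, 0.5)


n, r = 20, 2
theta_hat = mle_theta(n, r)
print(theta_hat, round(lod(theta_hat, n, r), 2))
```

For 20 meioses with 2 recombinants the estimate is θ = 0.1, with a LOD of about 3.2.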


Linkage analysis - why?

• Identify the position of the disease locus on the genome
• Identify the gene in the region
• What does the gene do or not do?
  - Problem in protein coding?
• Can we help the patients?
• Genetic counseling


Linkage analysis

• In a typical genome-wide linkage mapping study using microsatellites with hundreds of multigenerational pedigrees, each individual is genotyped for over 350 genetic markers from all chromosomes

• It's impossible to analyze this amount of data by eye


Example of pedigree

Symbols: male, male with disease, female, female with disease

[Figure: example of a multigenerational family; person 10 is labeled with genotype 1/2]

Person 10 has alleles 1 and 2. The pair of alleles is called a genotype.


Likelihood function

• The likelihood function for a family with n individuals (f = founder) can be expressed as a multiple sum of products of penetrance, population and transmission parameters.
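Written out, one standard form of this multiple sum of products is given below. The notation is ours, not necessarily the slide's: g_i is the ordered genotype of person i, x_i is their phenotype, each founder f contributes a population probability, and each non-founder o contributes a transmission probability given the genotypes of the mother m(o) and the father p(o).

```latex
L = \sum_{g_1} \cdots \sum_{g_n}
    \prod_{i=1}^{n} P(x_i \mid g_i)
    \prod_{f \,\in\, \text{founders}} P(g_f)
    \prod_{o \,\in\, \text{non-founders}} P\bigl(g_o \mid g_{m(o)}, g_{p(o)}\bigr)
```

Each term of the sum therefore has n penetrance factors and n population/transmission factors, one of each per person.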


Likelihood function

• There are n summations, each indexed over all possible ordered genotypes (G) of a pedigree member
  - An ordered genotype means that the source of each allele is known (i.e. from the father or the mother)
• If each member of the pedigree has G possible ordered genotypes, then a pedigree with n members has G^n ordered genotype combinations
• Each genotype combination is associated with n penetrance and n population/transmission parameters
• Forming each product of 2n factors takes 2n - 1 multiplications, so the procedure requires G^n(2n - 1) multiplications followed by G^n - 1 summations
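The naive evaluation can be sketched for a hypothetical nuclear family of n = 4 (father, mother, two children) typed at one marker with alleles A and B, so G = 4. The allele frequencies, the observed genotypes and the Hardy-Weinberg founder probabilities are assumptions for illustration. The loop visits all G^n ordered genotype combinations and multiplies one penetrance and one population/transmission factor per person in each.

```python
from itertools import product

ALLELES = ("A", "B")
FREQ = {"A": 0.6, "B": 0.4}                      # hypothetical allele frequencies
GENOTYPES = list(product(ALLELES, repeat=2))     # ordered (paternal, maternal); G = 4

# Hypothetical pedigree: person -> (father, mother), or None for founders
PARENTS = {
    "father": None,
    "mother": None,
    "child1": ("father", "mother"),
    "child2": ("father", "mother"),
}
# Observed unordered marker genotypes (sorted tuples); None means untyped
OBSERVED = {"father": ("A", "B"), "mother": ("B", "B"), "child1": ("A", "B"), "child2": None}


def penetrance(obs, g):
    """P(x | g): 1 if the ordered genotype is consistent with the observation."""
    return 1.0 if obs is None or tuple(sorted(g)) == obs else 0.0


def founder_prob(g):
    """Population probability of an ordered founder genotype (Hardy-Weinberg assumed)."""
    return FREQ[g[0]] * FREQ[g[1]]


def transmission(child, father, mother):
    """P(child's ordered genotype | parents): each parent passes either allele with prob. 1/2."""
    return father.count(child[0]) / 2 * mother.count(child[1]) / 2


def brute_force_likelihood():
    people = list(PARENTS)
    total, terms = 0.0, 0
    for combo in product(GENOTYPES, repeat=len(people)):   # all G^n combinations
        g = dict(zip(people, combo))
        term = 1.0
        for person in people:
            term *= penetrance(OBSERVED[person], g[person])
            parents = PARENTS[person]
            if parents is None:
                term *= founder_prob(g[person])
            else:
                term *= transmission(g[person], g[parents[0]], g[parents[1]])
        total += term
        terms += 1
    return total, terms


likelihood, terms = brute_force_likelihood()
n = len(PARENTS)
print(likelihood, terms, terms * (2 * n - 1), terms - 1)
```

Here the 256 terms need 256 * 7 = 1792 multiplications and 255 additions, and the likelihood works out to 0.48 * 0.16 * 0.5 = 0.0384.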


Example: number of markers and alleles
• For a genetic marker with two alleles, A and B, the number of possible ordered genotypes per person is G = 2^2 = 4. If the pedigree has 4 members, the number of possible ordered genotype combinations in the pedigree is G^4 = (2^2)^4 = 256

[Figure: the four ordered genotypes AA, AB, BA and BB, and a four-member family with father and mother]

• Two markers with two alleles: G^4 = ((2*2)^2)^4 = 65536
• Three markers with two alleles: G^4 = ((2*2*2)^2)^4 = 16777216


Example: number of persons
• For a genetic marker with two alleles, A and B, and 4 pedigree members, the number of possible ordered genotype combinations in the pedigree is G^4 = (2^2)^4 = 256
  - 5 members: (2^2)^5 = 1024
  - 6 members: (2^2)^6 = 4096
  - 7 members: (2^2)^7 = 16384
  - 10 members: (2^2)^10 = 1048576

• The number of combinations is large even with small numbers of markers and pedigree members
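The counts above follow from G = (a^m)^2 ordered genotypes per person for m markers with a alleles each, raised to the number of members n. The short sketch below (function names are hypothetical) reproduces them.

```python
def ordered_genotypes_per_person(n_markers, n_alleles=2):
    """Ordered multilocus genotypes: one haplotype from each parent."""
    haplotypes = n_alleles ** n_markers
    return haplotypes ** 2


def pedigree_combinations(n_people, n_markers, n_alleles=2):
    """Number of joint ordered genotype assignments in the pedigree, G^n."""
    return ordered_genotypes_per_person(n_markers, n_alleles) ** n_people


for m in (1, 2, 3):
    print(f"{m} marker(s), 4 members: {pedigree_combinations(4, m)}")
for n in (4, 5, 6, 7, 10):
    print(f"1 marker, {n} members: {pedigree_combinations(n, 1)}")
```

Each extra marker multiplies the count by 4^n and each extra person by G, so the growth is exponential in both directions.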


Elston-Stewart algorithm

• Each factor in the product is indexed by the genotypes of three individuals: an offspring and its two parents

• A pedigree is a number of nuclear families linked together through certain individuals

The pedigree can be analyzed one nuclear family at a time!

[Figure: the example pedigree, with person 10 labeled 1/2]


Elston-Stewart algorithm

• The likelihood function is written for a nuclear family with K children

• Offspring are independent, conditional on the parental genotypes

• The computational time requirement is now linear when adding new people to the pedigree!
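Because the children are conditionally independent given the parents, the sum over each child's genotype can be moved inside the sum over the parents' genotypes: L = Σ_gf Σ_gm P(x_f | g_f) P(g_f) P(x_m | g_m) P(g_m) Π_k Σ_gk P(g_k | g_f, g_m) P(x_k | g_k). The sketch below checks this factorization against brute-force enumeration for a hypothetical family with K = 3 children at one biallelic marker; the frequencies, genotypes and helper names are illustrative assumptions.

```python
from itertools import product

ALLELES = ("A", "B")
FREQ = {"A": 0.6, "B": 0.4}                   # hypothetical allele frequencies
GENOTYPES = list(product(ALLELES, repeat=2))  # ordered (paternal, maternal); G = 4


def pen(obs, g):
    """Penetrance P(x | g) for an observed unordered marker genotype (None = untyped)."""
    return 1.0 if obs is None or tuple(sorted(g)) == obs else 0.0


def pop(g):
    """Founder genotype probability, assuming Hardy-Weinberg proportions."""
    return FREQ[g[0]] * FREQ[g[1]]


def trans(child, father, mother):
    """Mendelian transmission probability of the child's ordered genotype."""
    return father.count(child[0]) / 2 * mother.count(child[1]) / 2


def nuclear_likelihood(xf, xm, children):
    """Factorized form: children are summed out one at a time, given the parents."""
    total = 0.0
    for gf, gm in product(GENOTYPES, repeat=2):
        term = pen(xf, gf) * pop(gf) * pen(xm, gm) * pop(gm)
        for xk in children:
            term *= sum(trans(gk, gf, gm) * pen(xk, gk) for gk in GENOTYPES)
        total += term
    return total


def brute_force(xf, xm, children):
    """Naive form: one joint sum over the genotypes of all K + 2 family members."""
    total = 0.0
    for gf, gm, *gks in product(GENOTYPES, repeat=2 + len(children)):
        term = pen(xf, gf) * pop(gf) * pen(xm, gm) * pop(gm)
        for xk, gk in zip(children, gks):
            term *= trans(gk, gf, gm) * pen(xk, gk)
        total += term
    return total


father, mother = ("A", "B"), ("B", "B")
children = [("A", "B"), None, ("B", "B")]
nuc = nuclear_likelihood(father, mother, children)
bf = brute_force(father, mother, children)
print(nuc, bf)
```

The factorized loop evaluates about G^2 * K * G terms instead of G^(K+2), so each extra child adds a constant amount of work.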


Elston-Stewart algorithm

• The number of genotype combinations can be reduced
  - Eliminate impossible genotypes
    Example: the offspring genotypes are known, but the second parent is unknown → the unknown parent's possible genotypes can be listed using the spouse's and the offspring's genotypes
  - Using the phenotype
    Example: ABO blood group: if a person's blood group is O, the only possible genotype is O/O
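The first kind of elimination can be sketched as a Mendelian consistency check: each candidate genotype of the untyped parent is kept only if every child can receive one allele from the typed spouse and one from the candidate. The ABO lookup covers the phenotype example. Function names, allele labels and the example family are hypothetical.

```python
from itertools import combinations_with_replacement

# Phenotype -> possible unordered genotypes for the ABO blood group (alleles A, B, O)
ABO_GENOTYPES = {
    "A": [("A", "A"), ("A", "O")],
    "B": [("B", "B"), ("B", "O")],
    "AB": [("A", "B")],
    "O": [("O", "O")],
}


def can_produce(child, parent1, parent2):
    """True if the child can get one allele from each parent."""
    a, b = child
    return (a in parent1 and b in parent2) or (b in parent1 and a in parent2)


def unknown_parent_genotypes(known_parent, children, alleles):
    """Unordered genotypes of the untyped parent consistent with spouse and offspring."""
    candidates = combinations_with_replacement(sorted(alleles), 2)
    return [g for g in candidates
            if all(can_produce(child, known_parent, g) for child in children)]


print(unknown_parent_genotypes(("1", "2"), [("1", "3"), ("2", "3"), ("1", "1")],
                               ["1", "2", "3", "4"]))
print(ABO_GENOTYPES["O"])
```

With the spouse typed 1/2 and children 1/3, 2/3 and 1/1, only 1/3 survives out of the ten possible genotypes, so the sum over that parent shrinks from ten terms to one.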


Elston-Stewart algorithm

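• Start at the bottom of the pedigree:
  - Calculate conditional probabilities for person II-2 using persons III-1, III-2 and II-1
  - Calculate conditional probabilities for person II-3 using persons III-3 and II-4
  - Calculate conditional probabilities for persons I-1 and I-2 using persons II-2 and II-3

• The overall pedigree likelihood is then obtained by combining the nuclear family likelihoods: the conditional probabilities passed up from the lower families are multiplied into the top family's likelihood, which is summed over the genotypes of I-1 and I-2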

[Figure: example pedigree with generations labeled I, II and III]
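This bottom-up order is often called peeling. The sketch below applies it to a hypothetical reduced version of the pedigree at one biallelic marker: only I-1, I-2, II-1, II-2, III-1 and III-2 are kept (II-3's branch is left out), II-1 is assumed to be the father of generation III, and the genotypes and frequencies are illustrative. It first computes II-2's conditional probabilities from the bottom nuclear family, then folds them into the top family, and checks the result against brute-force enumeration.

```python
from itertools import product

ALLELES = ("A", "B")
FREQ = {"A": 0.6, "B": 0.4}                   # hypothetical allele frequencies
GENOTYPES = list(product(ALLELES, repeat=2))  # ordered (paternal, maternal)

# Hypothetical reduced pedigree: I-1 x I-2 -> II-2; II-1 (father) x II-2 (mother) -> III-1, III-2
PARENTS = {"II-2": ("I-1", "I-2"), "III-1": ("II-1", "II-2"), "III-2": ("II-1", "II-2")}
X = {"I-1": ("A", "B"), "I-2": None, "II-1": ("A", "B"),
     "II-2": None, "III-1": ("A", "B"), "III-2": ("A", "A")}   # None = untyped


def pen(obs, g):
    return 1.0 if obs is None or tuple(sorted(g)) == obs else 0.0


def pop(g):
    return FREQ[g[0]] * FREQ[g[1]]


def trans(child, father, mother):
    return father.count(child[0]) / 2 * mother.count(child[1]) / 2


def peel_bottom_family():
    """Likelihood of the data on II-1, III-1 and III-2 given each genotype of II-2."""
    cond = {}
    for g22 in GENOTYPES:
        s = 0.0
        for g21 in GENOTYPES:
            term = pen(X["II-1"], g21) * pop(g21)
            for kid in ("III-1", "III-2"):
                term *= sum(trans(gk, g21, g22) * pen(X[kid], gk) for gk in GENOTYPES)
            s += term
        cond[g22] = s
    return cond


def peel_top_family(cond):
    """Sum over I-1 and I-2, folding in II-2's conditional probabilities from below."""
    total = 0.0
    for g11, g12 in product(GENOTYPES, repeat=2):
        term = pen(X["I-1"], g11) * pop(g11) * pen(X["I-2"], g12) * pop(g12)
        term *= sum(trans(g22, g11, g12) * pen(X["II-2"], g22) * cond[g22]
                    for g22 in GENOTYPES)
        total += term
    return total


def brute_force():
    people = list(X)
    total = 0.0
    for combo in product(GENOTYPES, repeat=len(people)):   # 4^6 = 4096 combinations
        g = dict(zip(people, combo))
        term = 1.0
        for p in people:
            term *= pen(X[p], g[p])
            if p in PARENTS:
                father, mother = PARENTS[p]
                term *= trans(g[p], g[father], g[mother])
            else:
                term *= pop(g[p])
        total += term
    return total


peeled = peel_top_family(peel_bottom_family())
print(peeled, brute_force())
```

The peeled version never enumerates more than three people's genotypes at once, which is where the linear growth in pedigree size comes from.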


Elston-Stewart algorithm

• The original 1971 algorithm couldn't handle loops

• A method for allowing loops

[Figure: pedigree with a loop] Persons 8 and 9 are the same individual! The algorithm is in an infinite loop!
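The sketch below only detects the problem; it is not the loop-handling method itself. It uses one common representation, assumed here: each mating becomes a node linked to both parents and to every child, and the pedigree has a loop exactly when this undirected graph has a cycle, which union-find detects. The function name and the first-cousin example are hypothetical.

```python
def has_loop(parents):
    """parents: child -> (father, mother) for every non-founder."""
    uf = {}

    def find(x):
        uf.setdefault(x, x)
        while uf[x] != x:
            uf[x] = uf[uf[x]]       # path halving
            x = uf[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return True             # already connected: this edge closes a cycle
        uf[ra] = rb
        return False

    seen_matings = set()
    for child, (father, mother) in parents.items():
        mating = ("mating", father, mother)
        edges = [(child, mating)]
        if mating not in seen_matings:
            seen_matings.add(mating)
            edges += [(father, mating), (mother, mating)]
        for a, b in edges:
            if union(a, b):
                return True
    return False


# Hypothetical first-cousin marriage: 3 and 4 are siblings, 7 and 8 are cousins
cousins = {"3": ("1", "2"), "4": ("1", "2"), "7": ("3", "5"), "8": ("4", "6"), "9": ("7", "8")}
print(has_loop(cousins))
```

Removing person 9 (the child of the cousin marriage) leaves a tree, and the function then reports no loop.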


Elston-Stewart algorithm

• Pros
  - Can handle very large pedigrees (computational time grows linearly with the number of people)

• Cons
  - Only a few markers can be analyzed jointly in multipoint analysis (computational time grows exponentially with the number of markers)


Superlink – basic ideas

• Bayesian networks are used to represent linkage analysis problems

• Uses the Elston-Stewart and/or Lander-Green algorithms to calculate the pedigree likelihood
  - Big pedigree and few markers → Elston-Stewart
  - Medium-size pedigree and many markers → Lander-Green
  - Or a combination of these algorithms


Bayesian network
• Random variables
  - Genetic loci
  - Phenotypes
  - Selector variables (inheritance patterns)
• Local probability tables
  - Transmission models
  - Penetrance models
  - Recombination models
  - Population allele/genotype probabilities

[Figure: network fragment in which Parent 1 and Parent 2 point to Child]
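Selector variables can be sketched as deterministic transmission tables plus a recombination table linking the selectors of adjacent loci. The table layout below is an illustrative assumption, not Superlink's code: S = 0 means the child receives the parent's paternal allele, S = 1 the maternal one, and the selector switches between adjacent loci with probability θ.

```python
from itertools import product


def transmission_table(alleles):
    """P(child allele = a | parent's paternal allele pp, maternal allele pm, selector s)."""
    table = {}
    for a, pp, pm, s in product(alleles, alleles, alleles, (0, 1)):
        chosen = pp if s == 0 else pm          # the selector picks which allele is passed on
        table[(a, pp, pm, s)] = 1.0 if a == chosen else 0.0
    return table


def recombination_table(theta):
    """P(S at next locus | S at previous locus): the selector switches with probability theta."""
    return {(s_next, s_prev): theta if s_next != s_prev else 1 - theta
            for s_next in (0, 1) for s_prev in (0, 1)}


def child_allele_prob(a, pp, pm, table):
    """Marginal over the first-locus selector, assuming P(S = 0) = P(S = 1) = 1/2."""
    return sum(0.5 * table[(a, pp, pm, s)] for s in (0, 1))


alleles = ("A", "B")
trans = transmission_table(alleles)
rec = recombination_table(0.1)
print(child_allele_prob("A", "A", "B", trans), rec[(1, 0)], rec[(0, 0)])
```

Keeping the transmission table deterministic and putting all randomness in the selectors is what lets recombination be modeled as a separate, small table between loci.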


Variable elimination

• 1st step
  - Graph representation of the pedigree: the nodes of the graph are the people in the pedigree and the edges represent parent relations. The genotype of an individual depends on the genotypes of its relatives
  - Downward, upward and selector updates

• 2nd step
  - Entries in a probability table that equal 0 correspond to invalid values
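The basic elimination step can be sketched with factors stored as dictionaries from value tuples to probabilities: multiply all factors that mention a variable, sum the variable out, and drop entries that equal 0 as invalid values. The factor representation, function names and the two-variable example are assumptions for illustration; the example chain P(G1) P(G2 | G1) with G2 observed as "a" should give 0.6 * 0.9 + 0.4 * 0.2 = 0.62.

```python
from itertools import product


def multiply(f, g, domains):
    """Product of two factors; entries that equal 0 are dropped as invalid."""
    fvars, ftab = f
    gvars, gtab = g
    vars_ = tuple(dict.fromkeys(fvars + gvars))   # union of variables, order preserved
    table = {}
    for vals in product(*(domains[v] for v in vars_)):
        a = dict(zip(vars_, vals))
        p = ftab.get(tuple(a[v] for v in fvars), 0.0) * gtab.get(tuple(a[v] for v in gvars), 0.0)
        if p != 0.0:
            table[vals] = p
    return vars_, table


def sum_out(var, f):
    """Marginalize var out of factor f."""
    fvars, ftab = f
    keep = tuple(v for v in fvars if v != var)
    table = {}
    for vals, p in ftab.items():
        key = tuple(val for v, val in zip(fvars, vals) if v != var)
        table[key] = table.get(key, 0.0) + p
    return keep, table


def eliminate(var, factors, domains):
    """One variable-elimination step: combine the factors mentioning var, then sum var out."""
    related = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    combined = ((), {(): 1.0})
    for f in related:
        combined = multiply(combined, f, domains)
    return rest + [sum_out(var, combined)]


domains = {"G1": ("a", "b"), "G2": ("a", "b")}
factors = [
    (("G1",), {("a",): 0.6, ("b",): 0.4}),                  # P(G1)
    (("G1", "G2"), {("a", "a"): 0.9, ("a", "b"): 0.1,
                    ("b", "a"): 0.2, ("b", "b"): 0.8}),     # P(G2 | G1)
    (("G2",), {("a",): 1.0}),                               # evidence: G2 observed as "a"
]
for var in ("G1", "G2"):
    factors = eliminate(var, factors, domains)
print(factors)
```

Dropping zero entries is what keeps the tables small when many genotype values are ruled out by the observations.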


Updates

[Figure: downward, upward and selector updates]


Other eliminations

• Variable trimming
  - If an individual's affection status is unknown, the phenotype variable can be trimmed
  - Founders' selector variables can be trimmed, since there is no information about phase

• Merging variables
  - Unknown phase: if two possible genotypes differ only in phase, they have the same probability
  - If recombination events in the children cannot be identified, the selector variables can be eliminated
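The phase-merging idea can be sketched for one founder at one biallelic marker, assuming Hardy-Weinberg ordered genotype probabilities: the ordered genotypes A/B and B/A differ only in phase, so they can be merged into one unordered value whose probability is their sum. The function name and frequencies are hypothetical, and this shows only the idea, not Superlink's merging procedure.

```python
from collections import defaultdict
from itertools import product


def merge_phase(ordered_probs):
    """Merge ordered genotypes that differ only in phase into unordered genotypes."""
    merged = defaultdict(float)
    for (pat, mat), p in ordered_probs.items():
        merged[tuple(sorted((pat, mat)))] += p
    return dict(merged)


freq = {"A": 0.6, "B": 0.4}                       # hypothetical allele frequencies
ordered = {(a, b): freq[a] * freq[b] for a, b in product(freq, repeat=2)}
merged = merge_phase(ordered)
print(len(ordered), len(merged), merged)
```

The four ordered values collapse to three (A/A, A/B, B/B), with P(A/B) = 2 * 0.6 * 0.4 = 0.48, so every table involving this variable shrinks accordingly.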


Variable elimination order

• Small pedigree, many loci:
  - Elimination locus by locus

• Big pedigree, few loci:
  - Elimination one nuclear family at a time

• Greedy heuristics
  - Each variable is assigned an elimination cost, and the variable with the smallest cost is eliminated next
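The greedy rule can be sketched on an interaction graph, where variables are linked when they share a probability table. The cost used here, the size of the table created by eliminating the variable (its domain size times those of its current neighbours), is one common choice and an assumption; the slide does not specify the cost function. The graph, domain sizes and function names are hypothetical, and the graph must be undirected (symmetric).

```python
from math import prod


def elimination_cost(v, graph, domain_size):
    """Size of the table over v and its current neighbours."""
    return domain_size[v] * prod(domain_size[u] for u in graph[v])


def greedy_elimination_order(graph, domain_size):
    """Repeatedly eliminate the variable with the smallest elimination cost."""
    g = {v: set(nbrs) for v, nbrs in graph.items()}
    order = []
    while g:
        best = min(g, key=lambda v: elimination_cost(v, g, domain_size))
        order.append((best, elimination_cost(best, g, domain_size)))
        nbrs = g.pop(best)
        for u in nbrs:              # connect the remaining neighbours pairwise (fill-in edges)
            g[u].discard(best)
            g[u] |= nbrs - {u}
    return order


graph = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
sizes = {"A": 3, "B": 4, "C": 2, "D": 4}
print(greedy_elimination_order(graph, sizes))
```

Eliminating the hub variable B first would create a table of size 96; the greedy order postpones it until its neighbours are gone, so no intermediate table exceeds 16 entries.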


Superlink

• Careful variable elimination reduces likelihood calculation time
  - Selects the best algorithm for the job

• Saves required memory
• More complex pedigrees can be analyzed


Extra references
• Sham, P., Statistics in Human Genetics. Arnold (Hodder Headline Group), London, 1998.
• Strachan, T. and Read, A., Human Molecular Genetics, Third Edition. BIOS Scientific Publishers Ltd, Oxford, UK, 2003.
• Lange, K. and Elston, R. C., Extensions to pedigree analysis I. Likelihood calculations for simple and complex pedigrees. Human Heredity, 25(2), 1975: 95-105.
• FASTLINK 4.1P documentation (http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/fastlink.html)
• Lander, E. and Green, P., Construction of multilocus genetic linkage maps in humans. Proceedings of the National Academy of Sciences, USA, 84(8), 1987.


Appendix
• Lander-Green algorithm
  - Uses inheritance vectors
  - Proceeds locus by locus (whereas Elston-Stewart proceeds one nuclear family at a time)
  - Pros
    - Can handle many markers in multipoint analysis (computational time grows linearly with the number of markers)
  - Cons
    - Can handle only medium-size pedigrees (computational time grows exponentially with the number of people (non-founders))
    - Does not account for interference
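The locus-by-locus pass can be sketched as a forward recursion over inheritance vectors. Assumptions: 2 meioses per non-founder, a uniform prior over vectors at the first locus, and no interference (each meiosis recombines independently with probability θ between adjacent loci). The emission probabilities P(marker data at a locus | inheritance vector) are taken as given hypothetical inputs; computing them from genotypes is not shown, and the function names are illustrative.

```python
from itertools import product


def transition_matrix(m, theta):
    """P(inheritance vector w at the next locus | v at this locus) for m meioses."""
    vectors = list(product((0, 1), repeat=m))
    matrix = []
    for v in vectors:
        row = []
        for w in vectors:
            d = sum(a != b for a, b in zip(v, w))   # number of meioses that recombine
            row.append(theta**d * (1 - theta) ** (m - d))
        matrix.append(row)
    return matrix


def forward_likelihood(emissions, m, thetas):
    """emissions[k][i]: P(data at locus k | vector i); thetas[k]: rec. fraction, loci k and k+1."""
    n_vec = 2**m
    alpha = [e / n_vec for e in emissions[0]]       # uniform prior at the first locus
    for k, theta in enumerate(thetas, start=1):
        T = transition_matrix(m, theta)
        alpha = [sum(alpha[i] * T[i][j] for i in range(n_vec)) * emissions[k][j]
                 for j in range(n_vec)]
    return sum(alpha)


m = 2                                              # one non-founder: two meioses
uninformative = [[1.0] * 4 for _ in range(3)]
informative = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
print(forward_likelihood(uninformative, m, [0.1, 0.2]),
      forward_likelihood(informative, m, [0.1]))
```

Each step costs (2^m)^2 operations for m meioses, exponential in the number of non-founders, while adding a locus adds just one more step, so the cost is linear in the number of markers.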