Guy Grebla 1
Allegro, a new computer program for linkage analysis
Guy Grebla
Guy Grebla 2
Overview
What is Allegro
Allegro vs. Genehunter
Reduced inheritance vectors
Founder couple reduction
Fast tree traversal
  Formalization
  Calculation of Spairs
Single locus probability calculation (if time permits)
Guy Grebla 3
What is Allegro
Allegro is based on Genehunter.
Allegro runs faster than Genehunter due to algorithmic improvements.
Guy Grebla 4
Allegro vs. Genehunter(1)
Allegro runs much faster than Genehunter: the speedup is typically 20-40 fold, and in many cases as high as 100 fold.
If necessary, Allegro can cut memory requirements by a factor of 20-60 compared with Genehunter, at a cost of 10-30% in run time.
Guy Grebla 5
Allegro vs. Genehunter(2)
Recall that the time complexity of Genehunter is exponential in the pedigree size, so running Genehunter on large pedigrees is infeasible.
Thanks to its algorithmic improvements, Allegro can handle significantly larger pedigrees (even though its time complexity is still exponential in the pedigree size).
Guy Grebla 6
Reduced inheritance vectors – the idea
The idea is based on the symmetry that exists between the two alleles of a founder.
[Figure: a pedigree with founder alleles n1 and n2, shown with the inheritance vectors V=(0,1,1,0) and V=(1,1,0,0).]
Guy Grebla 7
Reduced inheritance vectors
For a male (female) founder, the paternal (maternal) bit of his (her) first child is set to 0 and is not expressed in the reduced vector (it is called hidden).
Result: with m non-founders and f founders, the vector length is reduced from 2m to 2m-f.
Guy Grebla 8
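Reduced inheritance vectors (Cont.)
[Figure: three-generation pedigree (generations I, II, III) with founder alleles n1 to n6, genotypes a/b, a/c and b/c, and the inheritance bits of each non-founder; some bits are shown in brackets.]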
Guy Grebla 9
Founder couple reduction
Consider a founder couple in which:
  The couple has at least one grandchild
  Neither founder is genotyped
  Neither founder is married more than once
Guy Grebla 10
Founder couple reduction (Cont.)
v* is like v, except that:
  The corresponding bit of each grandchild is inverted.
  The paternal and maternal bits of each child are switched.
[Figure: pedigree fragment with founder alleles n1 to n4 and a grandchild genotyped a / c, with the "corresponding bit" marked.]
v and v* have the same probability.
Guy Grebla 11
Founder couple reduction - results
With the founder couple reduction, the effective number of bits is $2m-f-c$, where c is the number of founder couples satisfying the stated conditions.
Therefore, we have improved by a factor of $2^c$ over the previous reduction.
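As a quick worked check of these counts, the sketch below computes the number of effective bits and the number of inheritance vectors to enumerate. The function names and the pedigree sizes are hypothetical, chosen only for illustration; they are not part of Allegro.

```python
def effective_bits(m, f, c=0):
    """Bits left after the reductions: 2m - f - c."""
    return 2 * m - f - c


def num_vectors(m, f, c=0):
    """Number of inheritance vectors that must be enumerated."""
    return 2 ** effective_bits(m, f, c)


# Hypothetical pedigree: 4 non-founders, 3 founders, 1 qualifying founder couple.
full = 2 ** (2 * 4)             # unreduced vectors of length 2m
reduced = num_vectors(4, 3)     # after hiding one bit per founder
couple = num_vectors(4, 3, 1)   # after the founder couple reduction
print(full, reduced, couple)
```

Each qualifying founder couple halves the number of vectors again, which is where the factor of $2^c$ comes from.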
Guy Grebla 12
Fast tree traversal
The basic structure of the algorithms implemented in the Genehunter program loops over inheritance vectors in the outermost loop and over people in the pedigree in an inner loop.
Drawback: for vectors that differ only in some branches of the pedigree, part of the calculation is duplicated.
Guy Grebla 13
Fast tree traversal (Cont.)
Idea: change the order of looping to avoid the repeated calculations.
Guy Grebla 14
Fast tree traversal – naïve example
Say we want to calculate, for each vector v of length n, the number of 1's in v.
"Genehunter" method: for each vector, calculate the number of 1's (add each bit of the vector to the sum).
"Allegro" method: pass over the vectors and save calculations along the way.
Guy Grebla 15
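naïve example – Allegro method
[Figure: binary tree over the bits of v, starting from 0 at the root; each node is labeled with the running count of 1's along its path.]
Fewer additions!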
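The saving can be made concrete with a small sketch. The two functions below are illustrative stand-ins for the "Genehunter" and "Allegro" styles of looping in this toy problem (the names are hypothetical); both return the count of 1's for every vector, together with the number of additions performed.

```python
from itertools import product


def count_ones_naive(n):
    """Per-vector loop: n additions for each of the 2**n vectors."""
    counts, additions = {}, 0
    for v in product((0, 1), repeat=n):
        total = 0
        for bit in v:
            total += bit
            additions += 1
        counts[v] = total
    return counts, additions


def count_ones_tree(n):
    """Depth-first traversal: one addition per tree edge, reusing the parent's sum."""
    counts, additions = {}, 0

    def descend(prefix, partial):
        nonlocal additions
        if len(prefix) == n:
            counts[prefix] = partial
            return
        for bit in (0, 1):
            additions += 1
            descend(prefix + (bit,), partial + bit)

    descend((), 0)
    return counts, additions


naive_counts, naive_adds = count_ones_naive(3)
tree_counts, tree_adds = count_ones_tree(3)
print(naive_adds, tree_adds)
```

For n = 3 the per-vector loop performs 3 * 8 = 24 additions, while the traversal performs one per tree edge, 2 + 4 + 8 = 14. In general that is n * 2^n against 2^(n+1) - 2.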
Guy Grebla 16
Fast tree traversal - formalization
For each inheritance vector v, S(v) is known.
We traverse the pedigree from the top down. When a child is born:
  If it has i hidden bits, there are $2^{2-i}$ possibilities for its bits.
  For each possibility, the inheritance vector is appropriately updated and the branch is descended.
We add a bit b to update vector v to v+.
D(v) is a collection of data.
$N = 2^{2m-f}$ is the number of possible inheritance vectors.
Guy Grebla 17
Fast tree traversal - formalization(2)
Recursive algorithm:
addbit(v, D, b):
  for b = 0, 1 do
    set v+ = (v,b) and calculate D+ = D(v+)
    if there are more bits, addbit(v+, D+, next bit),
    else D+ contains data for s(v+)
If the calculations of D+ and s are both O(1), then the total time complexity of the calculation is O(N).
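The recursion can be sketched in Python as follows. This is a minimal illustration, not Allegro's implementation: the signature differs from the slide's `addbit(v, D, b)`, and the `update` callback, the `n_bits` parameter, and the `out` dictionary are assumptions introduced here so that the sketch is self-contained.

```python
def addbit(v, D, update, n_bits, out):
    """Extend the partial vector v by one bit, trying both bit values."""
    for b in (0, 1):
        v_plus = v + (b,)
        D_plus = update(D, v_plus)      # D+ is derived from the parent's D
        if len(v_plus) < n_bits:
            addbit(v_plus, D_plus, update, n_bits, out)
        else:
            out[v_plus] = D_plus        # D+ now holds the data for s(v+)


def traverse(update, D0, n_bits):
    """Run the traversal from the empty vector with initial data D0."""
    out = {}
    addbit((), D0, update, n_bits, out)
    return out


# Toy update rule: D is the number of 1's seen so far.
result = traverse(lambda D, v: D + v[-1], 0, 3)
print(result[(1, 0, 1)])
```

Because `update` is called once per tree node and the tree has fewer than 2N nodes, an O(1) update gives O(N) total work, as the slide states.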
Guy Grebla 18
Example – calculation of Spairs
$Ø_{ij}(p,q) = 1$ if allele i of p and allele j of q are IBD, and 0 otherwise
$S_{pq}(v) = \sum_{i=0}^{1} \sum_{j=0}^{1} Ø_{ij}(p,q)$
$S_{pairs}(v) = \sum_{(p,q)} S_{pq}(v)$, where the sum runs over pairs (p,q) of affecteds
$k_i$ - the number of times founder allele i turns up among the affecteds.
s - the value of Spairs for the traversed portion
$D = (s, k_1, k_2, \ldots, k_{2f})$
Guy Grebla 19
Example (Cont.)
When an unaffected person is added, do nothing ($s^+ = s$, $k_i^+ = k_i$, $k_j^+ = k_j$).
When an affected person carrying founder alleles i and j is added, perform:
  $s^+ \leftarrow s + k_i + k_j$
  $k_i^+ \leftarrow k_i + 1$
  $k_j^+ \leftarrow k_j + 1$
Guy Grebla 20
Example (Cont.)
[Figure: three-generation pedigree (generations I, II, III) with founder alleles n1 to n6, genotypes a/b, a/c and b/c, and the inheritance bits of each non-founder.]
V=(0,1,1,1,1)
Init (no vector bits): s=1, k1=1, k3=2, k4=1
III1 is added: s=2, k1=1, k3=2, k4=2, k5=1
III2 is added: s=4, k1=1, k3=2, k4=3, k5=1, k6=1
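The update rule can be replayed on this example with a short sketch. The function name `add_person` is hypothetical. The founder alleles carried by III1 (4 and 5) and III2 (4 and 6) are read off from which counts change in the example, so they are an inference rather than something the slide states directly.

```python
def add_person(s, k, i, j, affected):
    """Return updated (s, k) after adding a person carrying founder alleles i and j."""
    if not affected:
        return s, k                      # unaffected: nothing changes
    k = dict(k)                          # copy so each branch keeps its own counts
    s = s + k.get(i, 0) + k.get(j, 0)    # new IBD sharings with earlier affecteds
    k[i] = k.get(i, 0) + 1
    k[j] = k.get(j, 0) + 1
    return s, k


s0, k0 = 1, {1: 1, 3: 2, 4: 1}             # Init (no vector bits)
s1, k1 = add_person(s0, k0, 4, 5, True)    # III1 is added
s2, k2 = add_person(s1, k1, 4, 6, True)    # III2 is added
print(s1, s2)
```

The values of s after each step are 2 and 4, matching the example. Copying `k` keeps the parent's data intact, which matters when the same parent state is reused for the other bit value during the traversal.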
Guy Grebla 21
Spairs calculation – Genehunter vs. Allegro
Genehunter calculates Spairs by calculating Spq for each affected pair and adding it to Spairs.
This process requires $O(N\alpha^2)$ time, where $\alpha$ is the number of affecteds.
We saved a factor of $\alpha^2$ (!)
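To see that the two approaches agree, the sketch below computes Spairs for one vector both ways: by summing Spq over all pairs of affecteds, and by the allele-count update. Each affected is represented by the pair of founder alleles they carry under the vector. The function names are hypothetical, and the allele pairs in `affecteds` are an assumption chosen to be consistent with the counts in the earlier example.

```python
from itertools import combinations


def s_pq(p, q):
    """Number of (allele of p, allele of q) pairs that are the same founder allele."""
    return sum(1 for x in p for y in q if x == y)


def spairs_pairwise(affecteds):
    """Sum S_pq over all unordered pairs: quadratic in the number of affecteds."""
    return sum(s_pq(p, q) for p, q in combinations(affecteds, 2))


def spairs_incremental(affecteds):
    """Same value with constant work per added person, using allele counts."""
    s, k = 0, {}
    for i, j in affecteds:
        s += k.get(i, 0) + k.get(j, 0)
        k[i] = k.get(i, 0) + 1
        k[j] = k.get(j, 0) + 1
    return s


affecteds = [(3, 4), (1, 3), (4, 5), (4, 6)]   # assumed founder-allele pairs
print(spairs_pairwise(affecteds), spairs_incremental(affecteds))
```

Both functions return 4 for this input. Inside the tree traversal the incremental form does even less work than shown here, because the counts for shared ancestors are computed once and reused across all vectors below that node.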
Guy Grebla 22
Additional improvements
Allegro uses the FFT for matrix multiplication. Some classical computational techniques have been used to speed up the FFT by a factor of three or four.
Guy Grebla 23
References
Gudbjartsson DF, Jonasson K, Frigge ML, Kong A. "Fast multipoint linkage analysis and the program Allegro".
Gudbjartsson DF, Jonasson K, Frigge ML, Kong A. "Allegro, a new computer program for linkage analysis". Nat Genet. 2000 May;25(1):12-3.
Guy Grebla 24
BACKUP
Guy Grebla 25
Single locus probability calculation
Goal: compute $\Pr[m_l \mid v_l]$ at locus l for every vector $v_l$, where
  $m_l$ is the marker data at this locus (evidence)
  $v_l$ is a certain inheritance vector
Guy Grebla 26
Single locus probability calculation (Cont.)
In general: $p(m_l \mid v_l) = \sum_{a \in P} \prod_{i=1}^{2f} p(a_i)$, where P is the set of possible allele assignments $a = (a_1, \ldots, a_{2f})$ to $(n_1, \ldots, n_{2f})$.
This probability may be calculated for each $v_l$ using fast tree traversal.
Denote $p(m_l \mid v_l)$ as q(v).
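A brute-force reading of this sum, for one fixed vector, is sketched below. It is only meant to make the definition concrete, since it enumerates every assignment instead of using the tree traversal. The function name, the 0-based founder-allele indices, and the `observations` format (which founder alleles a genotyped person carries under v, plus the observed unordered genotype) are all assumptions made for this illustration.

```python
from itertools import product


def single_locus_prob(num_founder_alleles, freqs, observations):
    """Sum the product of allele frequencies over assignments consistent with the data.

    observations: list of ((i, j), (a, b)), meaning a person who carries
    founder alleles i and j was observed with the unordered genotype a/b.
    """
    total = 0.0
    for assignment in product(list(freqs), repeat=num_founder_alleles):
        consistent = all(
            sorted((assignment[i], assignment[j])) == sorted(genotype)
            for (i, j), genotype in observations
        )
        if consistent:
            term = 1.0
            for allele in assignment:
                term *= freqs[allele]
            total += term
    return total


freqs = {"a": 0.5, "b": 0.5}                   # assumed allele frequencies
obs = [((0, 1), ("a", "b"))]                   # one person genotyped a/b
q_two = single_locus_prob(2, freqs, obs)       # one founder
q_four = single_locus_prob(4, freqs, obs)      # second founder unconstrained
print(q_two, q_four)
```

Both calls give 0.5, which equals 2 p(a) p(b): the two consistent orderings of a and b each contribute p(a) p(b), and unconstrained founder alleles sum out to 1.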
Guy Grebla 27
Single locus probability - notations
[Figure: three-generation pedigree (generations I, II, III) with the founder nodes n1 to n6 marked, genotypes a/b, a/c and b/c, and the inheritance bits of each non-founder.]
Assume our founder nodes are numbered: node ni is numbered i.
Guy Grebla 28
Single locus probability – notations (2)
Founder nodes are classified into 3 disjoint sets:
  A – assigned nodes.
  E – contains edges; each edge is labeled with 2 distinct alleles.
  U – unassigned nodes.
$a_i$ – the allele assigned to i ($i \in A$)
Guy Grebla 29
Single locus probability - initialization
Init:
  E ← nodes of genotyped founders (edges).
  U ← rest of the founder nodes.
  A ← nil (empty)
  q(v) ← 0
Goal: build a founder graph. From the graph we can calculate q(v).
Guy Grebla 30
Single locus probability – algorithm
When a person genotyped a / b is added:
  The value of v (so far) determines the sources of the person's alleles among the founders.
  Denote the corresponding founders by i and j, and consider the edge (i,j).
Guy Grebla 31
Single locus probability – algorithm (2)
6 options for edge (i,j):
[Figure: the six possible placements, numbered 1 to 6, of the endpoints i and j among the sets A, U and E.]
Guy Grebla 32
Single locus probability – case by case
Case 1: put (i,j) in E, remove i and j from U.
Case 2: check whether $\{a,b\} = \{a_i,a_j\}$.
Case 3: check whether $a_i$ is one of a and b; if it is, assign the other to $a_j$ and move j from U to A.
Guy Grebla 33
Single locus probability – case by case (2)
Case 4:
  Check whether $a_i$ is one of a and b.
  Check whether the other one is consistent with the labeling of an edge (j,k) in E; if it is consistent, force the assignment.
Cases 5, 6: may need another loop.
  Set $a_i = a$, $a_j = b$, then check and handle consistency.
  Set $a_i = b$, $a_j = a$, then check and handle consistency.
Guy Grebla 34
Single locus probability – algorithm (3)
After the last bit of the vector has been added, the probability calculation needs a product over the edges in E.
Let $(a_e, b_e)$ be the pair of alleles labeling edge $e \in E$.
q(v) is updated by adding to it: $\prod_{i \in A} p(a_i) \prod_{e \in E} 2\,p(a_e)\,p(b_e)$
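The term added to q(v) can be sketched as a small function. This assumes the factor for each assigned node i in A is its allele frequency p(a_i), and that each remaining edge contributes 2 p(a_e) p(b_e) for the two ways of placing its two distinct alleles. The function name, the data layout, and the allele frequencies are hypothetical.

```python
from math import prod


def graph_term(freqs, assigned, edges):
    """Contribution of one founder graph: assigned nodes times remaining edges.

    assigned: dict mapping founder node -> assigned allele (the set A)
    edges:    list of (allele, allele) labels, one per edge left in E
    """
    node_part = prod(freqs[allele] for allele in assigned.values())
    edge_part = prod(2 * freqs[x] * freqs[y] for x, y in edges)
    return node_part * edge_part


freqs = {"a": 0.5, "b": 0.25, "c": 0.25}       # assumed allele frequencies
term = graph_term(freqs, {0: "a"}, [("a", "b")])
print(term)
```

With these numbers the node part is 0.5 and the edge part is 2 * 0.5 * 0.25 = 0.25, so the term is 0.125. Unassigned nodes in U contribute a factor of 1, since their allele frequencies sum to 1.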