3/23/09 Sequencing shRNA libraries with DNA Sudoku
Yaniv Erlich Hannon Lab
Compressed Genotyping
Cold Spring Harbor Laboratory
Poster in a nutshell
• Genotyping is the process of determining the genetic variation for a certain trait in an individual.
• It is one of the main diagnostic tools in medical genetics:
- Finding carriers for rare genetic diseases such as Cystic Fibrosis
- Tissue matching in organ donation
- Forensic DNA analysis
• Until now, only serial genotyping (one specimen at a time) has been possible. This is expensive and tedious.
• Taking advantage of 'signal sparsity', we developed and tested a compressed genotyping framework.
Abstract
Significant volumes of knowledge have accumulated in recent years linking subtle genetic variations to a wide variety of medical disorders, from cystic fibrosis to mental retardation. Nevertheless, applying this knowledge routinely in the clinic remains challenging, largely because DNA sequencing is relatively tedious and expensive. Since the genetic polymorphisms that underlie these disorders are relatively rare in the human population, the presence or absence of a disease-linked polymorphism can be treated as a sparse signal. Using methods and ideas from compressed sensing and group testing, we have developed a cost-effective reconstruction protocol, called "DNA Sudoku", that recovers the genotypes of many specimens from a smaller number of pooled assays. In particular, we have adapted our scheme to a recently developed class of high-throughput DNA sequencing technologies, and developed a mathematical framework that departs from 'traditional' compressed sensing in several important ways in order to address biological and technical constraints.
Genotyping as a sparse graph reconstruction
[Figure: bipartite graph linking sample nodes to allele nodes]
An example of a carrier screen for Cystic Fibrosis. There are two allele nodes: the Wild Type (WT) and the Cystic Fibrosis mutation. Samples 1, 2, 3, and 5 are WT, while specimen 4 is a carrier. The specimen labeled with 'X' is affected and does not enter the screen. Genotyping is equivalent to finding the edges in the graph.
THE GRAPH IS SPARSE
1. Number of carriers is very low
2. No affected individuals
3. The degree of every sample node is always two (human genome is diploid)
Genotyping is equivalent to revealing the edges of the bipartite graph.
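The carrier-screen graph can be written down as a biadjacency matrix. A minimal sketch, assuming the allele columns are ordered (CF mutation, WT) and that entries count allele copies; these numbers match the matrix used later in the poster, while the `edges` helper is hypothetical:

```python
def edges(biadjacency, allele_names):
    """Return (specimen, allele, copies) for every nonzero entry: the graph's edges."""
    return [
        (i + 1, allele_names[j], copies)
        for i, row in enumerate(biadjacency)
        for j, copies in enumerate(row)
        if copies
    ]

# Carrier screen from the figure: rows = specimens 1..5,
# columns = (CF mutation, WT), entries = number of allele copies (assumed encoding).
biadjacency = [
    [0, 2],  # specimen 1: WT/WT
    [0, 2],  # specimen 2: WT/WT
    [0, 2],  # specimen 3: WT/WT
    [1, 1],  # specimen 4: carrier
    [0, 2],  # specimen 5: WT/WT
]

# Diploid genome: each sample node has degree two, counting edge multiplicity.
assert all(sum(row) == 2 for row in biadjacency)

graph = edges(biadjacency, ["CF", "WT"])
cf_edges = [e for e in graph if e[1] == "CF"]
print(cf_edges)  # [(4, 'CF', 1)]
```

Only one of the six edges touches the CF node, which is the sparsity the screen relies on.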
The main idea – pooled processing
One could reveal the graph edges by sequencing the DNA of each sample
- expensive, tedious, and slow
Better:
Pool the samples and then sequence the pools
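Mathematically speaking
What the observer sees equals the pooling design multiplied by what the observer wants:
$$
\underbrace{\begin{pmatrix} 1 & 7 \\ 1 & 5 \\ 0 & 6 \end{pmatrix}}_{\text{pool} \times \text{allele}}
=
\underbrace{\begin{pmatrix} 1 & 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \end{pmatrix}}_{\text{pool} \times \text{specimen}}
\underbrace{\begin{pmatrix} 0 & 2 \\ 0 & 2 \\ 0 & 2 \\ 1 & 1 \\ 0 & 2 \end{pmatrix}}_{\text{specimen} \times \text{allele}}
$$
• What the observer sees: the pool-by-allele results (columns: CF mutation, WT).
• The pooling design: a binary matrix ('1' – in the pool, '0' – otherwise).
• What the observer wants: the biadjacency matrix of the graph.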
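The product above can be checked numerically. A minimal sketch in plain Python, assuming the pool readout is the sum of allele copies over the specimens in the pool (which is what the numbers on the slide imply); the `matmul` helper is written here for illustration:

```python
def matmul(a, b):
    """Plain-Python matrix product: (pools x specimens) times (specimens x alleles)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

# Pooling design from the slide: 3 pools x 5 specimens ('1' = specimen is in the pool).
design = [
    [1, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
]
# Biadjacency matrix: 5 specimens x (CF mutation, WT) allele copy counts.
genotypes = [[0, 2], [0, 2], [0, 2], [1, 1], [0, 2]]

observed = matmul(design, genotypes)
print(observed)  # [[1, 7], [1, 5], [0, 6]]
```

Decoding is the inverse problem. Here the mutation column of the result, (1, 1, 0), matches only specimen 4's column of the design, so the single carrier is identified from 3 pools instead of 5 individual assays.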
What is a good pooling design?
Standard compressed sensing demands:
• Decodability – the genotypes can be recovered from the pool results
• Small number of pools – fewer genotyping assays
Biology-oriented requirements:
• Constant column weight – the robot can process several specimens in every step
• Low column weight – less robotic effort
• Low row weight – reduces the chance of biological noise
We need a light-weight (low column weight) d-disjunct matrix: no specimen's set of pools is covered by the pools of any d other specimens.
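Whether a candidate design meets these requirements can be checked mechanically. A brute-force sketch with hypothetical helper names (exponential in d, so only for small examples) that reports the pool count, column and row weights, and tests d-disjunctness (no specimen's pools are all covered by the pools of any d other specimens):

```python
from itertools import combinations

def column_supports(matrix):
    """Set of pools (row indices) that each specimen (column) is placed in."""
    return [{r for r, row in enumerate(matrix) if row[c]} for c in range(len(matrix[0]))]

def design_summary(matrix):
    cols = column_supports(matrix)
    return {
        "pools": len(matrix),
        "column_weights": [len(s) for s in cols],  # pools per specimen
        "row_weights": [sum(row) for row in matrix],  # specimens per pool
    }

def is_d_disjunct(matrix, d):
    """True if no column's support lies inside the union of any d other columns."""
    cols = column_supports(matrix)
    for c, support in enumerate(cols):
        others = [s for i, s in enumerate(cols) if i != c]
        for group in combinations(others, d):
            if support <= set().union(*group):
                return False
    return True

design = [
    [1, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
]
print(design_summary(design))
# {'pools': 3, 'column_weights': [3, 2, 1, 2, 2], 'row_weights': [4, 3, 3]}
print(is_d_disjunct(design, 1))  # False: specimen 2's pools are a subset of specimen 1's

identity = [[int(r == c) for c in range(4)] for r in range(4)]
print(is_d_disjunct(identity, 3))  # True
```

The 3 × 5 illustration is not 1-disjunct, and cannot be: with only 3 pools, at most 3 specimens can have pairwise incomparable pool sets (Sperner's theorem).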
Light Chinese Design
Inputs: N (number of specimens), W (column weight, i.e. robotic effort)
Algorithm:
1. Find W numbers {x_1, x_2, …, x_W} such that:
(a) each is bigger than $\sqrt{N}$
(b) they are pairwise coprime
2. Generate W modular equations:
$$\text{Pool} \equiv \text{Specimen} \pmod{x_1}, \quad \dots, \quad \text{Pool} \equiv \text{Specimen} \pmod{x_W}$$
3. Construct the pooling matrix from the modular equations
Output: Pooling matrix
The algorithm reaches the bound derived by Kautz & Singleton (1964)
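The three steps can be sketched as follows, assuming specimens are indexed 0 to N − 1 and that window x_i contributes x_i consecutive pools, with specimen s placed in pool s mod x_i of that window. The greedy choice of the x_i (starting just above √N) and all helper names are illustrative assumptions; the poster does not say how the numbers are picked. Because any two windows multiply to more than N, the Chinese remainder theorem implies that two distinct specimens can agree modulo at most one window, so they share at most one pool; with every specimen in W pools, no W − 1 other specimens can cover all of its pools, i.e. the matrix is (W − 1)-disjunct.

```python
from math import gcd, isqrt

def choose_windows(n_specimens, weight):
    """Hypothetical helper: greedily pick `weight` pairwise-coprime integers > sqrt(N)."""
    windows = []
    x = isqrt(n_specimens) + 1  # smallest integer strictly greater than sqrt(N)
    while len(windows) < weight:
        if all(gcd(x, w) == 1 for w in windows):
            windows.append(x)
        x += 1
    return windows

def light_chinese_design(n_specimens, weight):
    """Binary pooling matrix with sum(x_i) rows (pools) and N columns (specimens)."""
    windows = choose_windows(n_specimens, weight)
    matrix = [[0] * n_specimens for _ in range(sum(windows))]
    offset = 0
    for x in windows:
        for s in range(n_specimens):
            # Pool = Specimen (mod x): specimen s joins pool (s mod x) of this window.
            matrix[offset + s % x][s] = 1
        offset += x
    return windows, matrix

def max_pairwise_overlap(matrix):
    """Largest number of pools shared by any two distinct specimens."""
    n = len(matrix[0])
    cols = [{r for r, row in enumerate(matrix) if row[c]} for c in range(n)]
    return max(len(cols[a] & cols[b]) for a in range(n) for b in range(a + 1, n))

windows, design = light_chinese_design(100, 3)
print(windows)                       # [11, 12, 13]
print(len(design))                   # 36 pools for 100 specimens
print(max_pairwise_overlap(design))  # 1
```

For N = 100 and W = 3, every specimen lands in exactly 3 of 36 pools, and the resulting design is 2-disjunct by the argument above.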
Decoding the genotyping results by Belief Propagation
The pooled results can be decoded using Belief Propagation.
[Figure: graph connecting specimen nodes and pool nodes; the inputs are the genotyping results and a-priori biological information]
03/06/09
Example of Belief Propagation
[Figure: seven specimens (#1 to #7) linked to the pools they belong to; every specimen starts with the possible genotypes A, B, C, and D]
Messages passed during decoding:
1. Pool to specimen: "You can be either A, C, or D."
2. Specimen to pool: "I can't be B."
3. Pool to specimens #3, #6, and #7: "One of you should be B."
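The messages in this example can be mimicked by a simplified, deterministic sketch. This is constraint propagation on candidate genotype sets, not the probabilistic Belief Propagation on the poster, and it ignores a-priori biological information. It assumes, as in the example, that each specimen carries a single label from A–D and that each pool reports which labels are present among its members; the toy data and the `propagate` name are hypothetical.

```python
def propagate(n_specimens, pools, observed, alleles="ABCD"):
    """Shrink each specimen's candidate set until no message changes anything."""
    cand = [set(alleles) for _ in range(n_specimens)]
    changed = True
    while changed:
        changed = False
        for members, seen in zip(pools, observed):
            # Message 1, pool to specimen: "You can only be something I observed."
            for s in members:
                narrowed = cand[s] & seen
                if narrowed != cand[s]:
                    cand[s] = narrowed
                    changed = True
            # Message 2 ("I can't be B") is implicit: shrunken candidate sets are
            # what every other pool reads on its next visit.
            # Message 3, pool to specimens: "One of you should be <allele>."
            for allele in seen:
                carriers = [s for s in members if allele in cand[s]]
                if len(carriers) == 1 and cand[carriers[0]] != {allele}:
                    cand[carriers[0]] = {allele}
                    changed = True
    return cand

# Hypothetical toy instance (not the poster's data): 6 specimens, 5 pools.
truth = ["A", "A", "C", "B", "A", "D"]
pools = [[0, 1, 2], [2, 3, 4], [0, 3, 5], [1, 4, 5], [0, 4]]
observed = [{truth[s] for s in members} for members in pools]

print(propagate(len(truth), pools, observed))
# [{'A'}, {'A'}, {'C'}, {'B'}, {'A'}, {'D'}]
```

Each rule only removes candidates that contradict the pool results, so with consistent data the true genotype is never discarded, and the loop stops because every change strictly shrinks a finite set.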
References & Acknowledgments
• Compressed Genotyping. Yaniv Erlich, Assaf Gordon, Michael Brand, Gregory J. Hannon & Partha P. Mitra. Submitted to IEEE Trans. Info. Theory. 2009.
• DNA Sudoku - harnessing high-throughput sequencing for multiplexed specimen analysis. Yaniv Erlich, Kenneth Chang, Assaf Gordon, Roy Ronen, Oron Navon, Michelle Rooks & Gregory J. Hannon. Genome Research. 2009.
Lindsay-Goldberg Fellowship