Sequence Alignment in DNA
Under the guidance of: Prof. Kolin Paul
Presented by: Lalchand, Gaurav Jain
Agenda
• Application domain & objective
• General alignment procedure
• Scope of parallelism in BWT
• Selection sort and quick sort implementation
• BWT implementation on GPU
• Comparative study
Application Domain & Objective
Objective: to present an efficient implementation, in particular a parallel one, that speeds up the search for short sequences in DNA.
Application domains:
• Analyzing gene expression
• Mapping variations between individuals
• Mapping homologous proteins
• Assembling the genome of an organism
Basic Alignment Procedure
[Diagram labels: Genome, Indexing, Suffix Array, BWT, Reads, Mapper (BWT algorithm), output {Location, Occurrence}; stages marked "Parallelized" and "To be parallelized"]
• Suffix array: 15 GB for the human genome {3 billion * 4 B + 3 GB genome}
• BWT: Bwt[i] = Ref[SA[i] - 1] {3 GB}
• Intermediate size: 10^18
• Searching: O(log G)
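The 15 GB figure follows from simple arithmetic, sketched below. The function name and the assumption of one byte per base and 1 GB = 10^9 bytes are illustrative choices, not taken from the slides.

```python
GB = 10**9  # assumption: decimal gigabytes


def suffix_array_footprint(genome_length, bytes_per_index=4, bytes_per_base=1):
    """Bytes needed to hold the suffix array together with the genome itself."""
    index_bytes = genome_length * bytes_per_index   # one index per suffix
    genome_bytes = genome_length * bytes_per_base   # the reference text
    return index_bytes + genome_bytes


human_genome = 3 * 10**9  # about 3 billion bases, as on the slide
print(suffix_array_footprint(human_genome) / GB, "GB")
```

With 4-byte indices the index alone takes 12 GB, and the 3 GB reference brings the total to 15 GB.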
Scope of Parallelism in BWT
• With the BWT, a string of length w can be found in O(w) time.
• The BWT is closely related to the suffix array, the lexicographically sorted list of all suffixes of the genome.
• Bwt[i] = ref[SA[i] - 1] {Bwt[i] = $ when SA[i] is the first position of ref}
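A minimal sketch of the relation between suffix array and BWT, assuming 0-based indices and a reference that ends with a unique terminator $. The suffix array is built here by naive sorting purely for illustration; the function names are hypothetical.

```python
def naive_suffix_array(ref):
    """Indices of all suffixes of ref in lexicographic order (naive, for illustration)."""
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def bwt_from_suffix_array(ref, sa):
    """Bwt[i] = ref[SA[i] - 1], with $ when SA[i] is the first position."""
    return "".join(ref[i - 1] if i > 0 else "$" for i in sa)


ref = "ACGTA$"
sa = naive_suffix_array(ref)
print(sa)                              # [5, 4, 0, 1, 2, 3]
print(bwt_from_suffix_array(ref, sa))  # AT$ACG
```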
Initial Step - 1
● Implementation of BWT using selection sort
– OpenMP
[Plot: BWT creation using selection sort. x-axis: file size in KB (0 to 1000); y-axis: time in seconds (0 to 7000); series: Proc 1, Proc 2, Proc 4, Proc 8]
Selection Sort - OpenMP
CPU Statistics
• Cores: 8
• Data cache: L1: 32K, L2: 6M
• DRAM: 12 GB
• Proc. clock: 2.9 GHz
Initial Step - 2
● Implementation of BWT using quick sort
– OpenMP
Quick Sort - OpenMP
CPU Statistics
• Cores: 8
• Data cache: L1: 32K, L2: 6M
• DRAM: 12 GB
• Proc. clock: 2.9 GHz
Initial Step - 3
● Implementing BWT on GPU
– Bitonic sort
Why Bitonic Sort?
• A bitonic sequence is the concatenation of two subsequences sorted in opposite directions
– or a cyclic shift of such a sequence
• Implemented by comparator networks
– Works in place
– No communication
• Naturally suitable for SIMD architectures
– Each thread executes the same code on different data
• O(log² n) time and O(n log² n) work
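The comparator network can be sketched sequentially as follows. This is an illustrative Python version of the standard iterative bitonic sort, not the authors' GPU code; the inner loop over i is the part that would run as one thread per element on a SIMD device.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic comparator network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:                  # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:              # distance between the two compared elements
            for i in range(n):     # every i is independent: one thread each on a GPU
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The two nested while loops run O(log² n) times in total and each pass does O(n) comparisons, which gives the time and work bounds above.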
Burrows-Wheeler Transform
5 $ A C G T A
4 A $ A C G T
3 T A $ A C G
2 G T A $ A C
1 C G T A $ A
0 A C G T A $
Input: A C G T A $
Output: A T $ A C G
Basic String Sorting Algorithm
indices: 0 1 2 3 4 5
5 $ A C G T A
4 A $ A C G T
0 A C G T A $
1 C G T A $ A
2 G T A $ A C
3 T A $ A C G
indices: 5 4 0 1 2 3
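The same sort can be expressed on an array of indices, so that only integers move and the string stays where it is. The sketch below is an illustration with hypothetical function names; it compares suffixes character by character and relies on Python's built-in sort rather than any particular parallel algorithm.

```python
from functools import cmp_to_key


def suffix_compare(ref, i, j):
    """Return -1, 0 or 1 as the suffix at i sorts before, equal to, or after the suffix at j."""
    n = len(ref)
    while i < n and j < n:
        if ref[i] != ref[j]:
            return -1 if ref[i] < ref[j] else 1
        i += 1
        j += 1
    # One suffix is a prefix of the other: the shorter one sorts first.
    return (i < n) - (j < n)


def sort_suffix_indices(ref):
    """Sort the indices 0..n-1 by the suffixes they point to; ref itself is never copied."""
    indices = list(range(len(ref)))
    indices.sort(key=cmp_to_key(lambda i, j: suffix_compare(ref, i, j)))
    return indices


print(sort_suffix_indices("ACGTA$"))  # [5, 4, 0, 1, 2, 3]
```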
Steps Performed
• Copy the genome from host to device memory
• Create an indices array pointing into the reference string
• Compare suffixes based on the indices array
– Swap indices accordingly
• Sorts n elements in O(log² n) kernel calls
– Each of O(1) time and O(n) work
• One more step to obtain the BWT from the suffix array
– Bwt[i] = ref[SA[i] - 1] {Bwt[i] = $ when SA[i] is the first position of ref}
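The steps above can be sketched on the host as follows. This is a sequential Python illustration under stated assumptions, not the CUDA implementation: each call to the hypothetical bitonic_sort_step stands in for one kernel launch, the loop over t stands in for the GPU threads, and the reference length is assumed to be a power of two.

```python
def suffix_less(ref, i, j):
    """True if the suffix of ref starting at i sorts before the one starting at j."""
    return ref[i:] < ref[j:]


def bitonic_sort_step(indices, ref, j, k):
    """One compare-and-swap pass over the indices array (one kernel call on the GPU)."""
    for t in range(len(indices)):            # each t would be one GPU thread
        partner = t ^ j
        if partner > t:
            ascending = (t & k) == 0
            out_of_order = suffix_less(ref, indices[partner], indices[t])
            if out_of_order == ascending:
                indices[t], indices[partner] = indices[partner], indices[t]


def suffix_array_bitonic(ref):
    """Return (suffix array, number of passes) for a ref whose length is a power of two."""
    n = len(ref)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    indices = list(range(n))                 # indices array pointing into ref
    calls = 0
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            bitonic_sort_step(indices, ref, j, k)
            calls += 1
            j //= 2
        k *= 2
    return indices, calls


def bwt_from_suffix_array(ref, sa):
    """Final step: Bwt[i] = ref[SA[i] - 1], with $ when SA[i] is the first position."""
    return "".join(ref[i - 1] if i > 0 else "$" for i in sa)


ref = "ACGTACG$"                             # length 8, a power of two
sa, calls = suffix_array_bitonic(ref)
print(sa, calls, bwt_from_suffix_array(ref, sa))
```

For n = 8 the driver makes 6 passes, that is log n (log n + 1) / 2, which is the O(log² n) kernel-call count.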
CPU – GPU Interaction (BWT)
[Diagram labels: CPU; Cuda_Memcpy & kernel call; Initialise_indices_array; Bitonic; Bitonic_sort_step; Suffix_compare; Genome; Suffix Array; Suffix -> BWT; Searching O(log2G)]
Evaluation
GPU Statistics
• SMs: 30
• Cores per SM: 8
• Cores: 240
• Data cache (per SM): 16 K
• DRAM: 536 M
• Proc. freq.: 1.2 GHz
BWT with Bitonic Sort
Comparison between expected (GPU) and exact result

               CPU                   GPU
Cores          2                     240
Data cache     L1: 32K, L2: 6M       16K (per SM)
DRAM           12 GB                 536 M
Proc. clock    2.9 GHz               1.2 GHz

Expected GPU time = (Quick_Sort_time * 2) / 240
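The expected figure can be computed as below. This is a sketch of the slide's formula only: it assumes ideal linear scaling with core count and ignores clock speed, memory transfer, and the extra work bitonic sort does. The function and parameter names are hypothetical.

```python
def expected_gpu_time(quick_sort_time, cpu_cores=2, gpu_cores=240):
    """Expected GPU time = (Quick_Sort_time * cpu_cores) / gpu_cores, assuming ideal scaling."""
    return quick_sort_time * cpu_cores / gpu_cores


# Example with a made-up measurement of 120 seconds on the 2-core CPU run.
print(expected_gpu_time(120.0))  # 1.0
```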
Thanks
Future Work
• Run in limited-memory environments
– Compute in parts
• Use the memory hierarchy of the GPU
– Sort keys are cached in registers or shared memory
– Long runs of repeated characters: store a position indicating the end of the run
• Bitonic sort can only sort sequences whose length is a power of 2
– A sequence of length 2^k + 1 has to be extended to 2^(k+1)
– Padding with the largest symbol
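The padding idea can be sketched as follows. The function name and the choice of "~" as a symbol that sorts after every character in the input are assumptions made for illustration.

```python
def pad_to_power_of_two(keys, pad_symbol="~"):
    """Append pad_symbol until the length is a power of two; return (padded, original_length)."""
    n = len(keys)
    if n == 0:
        return [], 0
    target = 1 << (n - 1).bit_length()       # smallest power of two >= n
    return list(keys) + [pad_symbol] * (target - n), n


padded, original_length = pad_to_power_of_two("ACGTA")   # length 5 = 2^2 + 1
print(len(padded), padded)                               # 8 elements: 2^3

# Because the pad symbol is the largest, it ends up at the tail after sorting
# and can be cut off again.
print(sorted(padded)[:original_length])
```

In the worst case, length 2^k + 1, the padded input is almost twice as long as the original, which is why this appears under future work.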