sequence alignment in dna under the guidance of : prof. kolin paul presented by: lalchand gaurav...

Sequence Alignment in DNA

Under the Guidance of: Prof. Kolin Paul

Presented By: Lalchand

Gaurav Jain

Agenda

• Application Domain & Objective
• General Alignment Procedure
• Scope of Parallelism in BWT
• Selection Sort and Quick Sort Implementation
• BWT Implementation on GPU
• Comparative Study

Time-Line

Application Domain & Objective

To present an efficient (in particular, parallel) implementation that speeds up the search for short sequences in DNA.

• Analyzing gene expression
• Mapping variations between individuals
• Mapping homologous proteins
• Assembling the genome of an organism

Basic Alignment Procedure

[Figure: alignment pipeline. Blocks: Genome, Suffix Array, BWT, Indexing, Reads, BWT Algorithm, Mapper, output {Location, Occurrence}. Legend: Parallelized / To be parallelized.]

• Suffix array: 15 GB for the human genome {3 billion * 4 B + 3 GB genome}
• BWT: Bwt[i] = Ref[SA[i] - 1] {3 GB}
• Intermediate size: 10^18
• Searching: O(log G)
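The index sizes quoted for the human genome follow from simple arithmetic. The sketch below reproduces them under two assumptions that the slide implies but does not state: one byte per base and decimal gigabytes.

```python
# Back-of-the-envelope index sizes, using the figures quoted on the slide.
GENOME_BASES = 3_000_000_000   # about 3 billion bases in the human genome
BYTES_PER_SA_ENTRY = 4         # one 32-bit integer per suffix position
BYTES_PER_BASE = 1             # assumption: one byte per base (gives the 3 GB genome)
GB = 10**9                     # assumption: decimal gigabytes

suffix_array_bytes = GENOME_BASES * BYTES_PER_SA_ENTRY
genome_bytes = GENOME_BASES * BYTES_PER_BASE

# Suffix array plus the genome it indexes.
total_gb = (suffix_array_bytes + genome_bytes) / GB
# The BWT is a permutation of the genome text, so it has the same size.
bwt_gb = genome_bytes / GB

print(total_gb)  # 15.0
print(bwt_gb)    # 3.0
```

The fivefold gap between the two numbers is why the BWT is the structure kept for searching.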

Scope of Parallelism in BWT

• With the BWT, a string of length w can be found in O(w) time.
• The BWT is closely related to the suffix array
– The suffix array is the lexicographically sorted list of all suffixes of the genome.

• Bwt[i] = ref[SA[i] - 1] {Bwt[i] = $ when SA[i] = 0, with 0-based indices}
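The O(w) search bound comes from backward search, which consumes the pattern one character at a time from its last character to its first. The following is a minimal sketch of that idea, not the presenters' implementation. The names `build_index` and `count_occurrences` are hypothetical, and the occurrence table is stored in full for clarity, whereas practical tools sample it to save memory.

```python
from collections import Counter

def build_index(bwt):
    """Precompute the lookup tables used by backward search (done once)."""
    counts = Counter(bwt)
    # first_row[c]: number of characters in the text that sort before c
    first_row, total = {}, 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in counts:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return first_row, occ, len(bwt)

def count_occurrences(index, pattern):
    """Count matches of pattern with one table lookup pair per character."""
    first_row, occ, n = index
    lo, hi = 0, n  # half-open range of sorted rows that match so far
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + occ[ch][lo]
        hi = first_row[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo

index = build_index("AT$ACG")           # BWT of the text ACGTA$
print(count_occurrences(index, "A"))    # 2
print(count_occurrences(index, "GTA"))  # 1
print(count_occurrences(index, "TAC"))  # 0
```

Each loop iteration does a constant amount of work, so the search cost depends on the pattern length w and not on the genome length.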

BWT

● Implementation of BWT using Selection Sort
– OpenMP

Initial Step - 1

[Figure: BWT creation using selection sort. Time in seconds versus file size in KB, with one curve each for 1, 2, 4 and 8 processors.]

Selection Sort - OpenMP

CPU
Cores         8
Data cache    L1: 32K, L2: 6M
DRAM          12GB
Proc. Clock   2.9 GHz


● Implementation of BWT using Quick Sort
– OpenMP

Initial Step - 2

Quick Sort - OpenMP

CPU Statistics
Cores         8
Data cache    L1: 32K, L2: 6M
DRAM          12GB
Proc. Clock   2.9 GHz


● Implementation of BWT using Quick Sort
– OpenMP

● Implementing BWT on GPU
– Bitonic sort

Initial Step - 3

Why Bitonic?

• A bitonic sequence is the concatenation of two sub-sequences sorted in opposite directions
– Or a cyclic shift of such a sequence

• Implemented by comparator networks
– Work in place
– No communication

• Naturally suitable for SIMD architectures
– Each thread executes the same code on different data

• O(log^2 n) time and O(n log^2 n) work
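The comparator network behind these bounds can be written as two nested loops over a merge size and a compare distance. The sketch below is a sequential Python rendering for illustration only. The function name `bitonic_sort` and the step counter are assumptions added here, and on a GPU the innermost loop would be replaced by one thread per element.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network."""
    a = list(values)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    steps = 0
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:        # distance between the two compared elements
            # All iterations of this loop touch disjoint pairs, so they
            # could run in parallel (one thread per i on a GPU).
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if ascending and a[i] > a[partner]:
                        a[i], a[partner] = a[partner], a[i]
                    elif not ascending and a[i] < a[partner]:
                        a[i], a[partner] = a[partner], a[i]
            steps += 1
            j //= 2
        k *= 2
    return a, steps

result, steps = bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4])
print(result)  # [1, 2, 3, 4, 5, 6, 7, 8]
print(steps)   # 6
```

For n = 8 the network needs log n * (log n + 1) / 2 = 6 steps, which is where the O(log^2 n) time bound comes from. Each step performs n / 2 comparisons, giving O(n log^2 n) work.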


Burrows-Wheeler Transform

Input: A C G T A $

0 A C G T A $
1 C G T A $ A
2 G T A $ A C
3 T A $ A C G
4 A $ A C G T
5 $ A C G T A

Output: A T $ A C G
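The example can be checked with a direct, unoptimized construction: build every rotation of the input, sort them, and read off the last column. This is a sketch for small inputs only, since it stores n rotations of length n. The function name `bwt_by_rotations` is hypothetical.

```python
def bwt_by_rotations(text):
    """Burrows-Wheeler transform via the full sorted rotation matrix."""
    assert text.endswith("$"), "text must end with the '$' terminator"
    # Row i is the text rotated left by i characters.
    rotations = [text[i:] + text[:i] for i in range(len(text))]
    # '$' sorts before the letters A, C, G, T in ASCII order.
    rotations.sort()
    # The transform is the last column of the sorted matrix.
    return "".join(row[-1] for row in rotations)

print(bwt_by_rotations("ACGTA$"))  # AT$ACG
```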

Basic String Sorting Algorithm

indices: 0 1 2 3 4 5

5 $ A C G T A

4 A $ A C G T

0 A C G T A $

1 C G T A $ A

2 G T A $ A C

3 T A $ A C G

indices: 5 4 0 1 2 3
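The same result can be obtained by sorting only the start positions, which is what the indices row records. The sketch below uses Python's built-in sort as a stand-in for whichever sort is used. The function names are hypothetical, and slicing `text[i:]` copies each suffix, which a real implementation would avoid by comparing characters in place.

```python
def suffix_array(text):
    """Sort suffix start positions by the suffix each position points to."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def bwt_from_suffix_array(text, sa):
    """Bwt[i] = text[SA[i] - 1]; SA[i] == 0 wraps to the final '$'."""
    # text[-1] is the '$' terminator, so Python's negative index handles
    # the SA[i] == 0 case without a special branch.
    return "".join(text[i - 1] for i in sa)

ref = "ACGTA$"
sa = suffix_array(ref)
print(sa)                              # [5, 4, 0, 1, 2, 3]
print(bwt_from_suffix_array(ref, sa))  # AT$ACG
```

Only the array of six integers moves during sorting, while the reference string stays untouched.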

Steps Performed

• Copy the genome from host to device memory
• Build an indices array pointing into the reference string
• Compare suffixes through the indices array
– Swap indices accordingly

• Sorts n elements in O(log^2 n) kernel calls
– Each call takes O(1) time and O(n) work

• One more step derives the BWT from the suffix array
– Bwt[i] = ref[SA[i] - 1] {Bwt[i] = $ when SA[i] = 0, with 0-based indices}
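These steps can be simulated on the CPU to see how they fit together. The sketch below is an assumption-laden Python stand-in for the CUDA code, which the slides do not show: each call to `bitonic_step` plays the role of one kernel launch, the loop over `i` stands for the threads, and the 8-character reference string is a made-up example chosen so that no padding is needed. Note that the character-by-character comparison is not constant time in the worst case.

```python
def suffix_less(ref, i, j):
    """True if the suffix starting at i sorts before the suffix at j."""
    n = len(ref)
    while i < n and j < n:
        if ref[i] != ref[j]:
            return ref[i] < ref[j]
        i += 1
        j += 1
    return i > j  # the suffix that ran out first is the shorter one

def bitonic_step(ref, idx, j, k):
    """One simulated kernel call: every position does one compare-and-swap."""
    for i in range(len(idx)):  # on the GPU: one thread per i
        p = i ^ j
        if p > i:
            if (i & k) == 0:   # ascending half
                swap = suffix_less(ref, idx[p], idx[i])
            else:              # descending half
                swap = suffix_less(ref, idx[i], idx[p])
            if swap:
                idx[i], idx[p] = idx[p], idx[i]

def gpu_style_bwt(ref):
    n = len(ref)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    idx = list(range(n))       # indices array pointing into the reference
    kernel_calls = 0
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            bitonic_step(ref, idx, j, k)
            kernel_calls += 1
            j //= 2
        k *= 2
    # Final step: idx is now the suffix array; ref[-1] is '$' when s == 0.
    bwt = "".join(ref[s - 1] for s in idx)
    return idx, bwt, kernel_calls

sa, bwt, calls = gpu_style_bwt("ACGTACG$")  # hypothetical 8-character genome
print(sa, bwt, calls)
```

The reference string is never rearranged. Only the indices array is permuted, which keeps the data moved per step small.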

CPU – GPU Interaction (BWT)

[Figure: CPU-GPU interaction for BWT construction. Labels: Cpu, Bitonic, Genome, Initialise_indices_array, Bitonic_sort_step, Suffix_compare, Cuda_Memcpy & kernel call, Suffix Array, Suffix -> BWT, O(log2G) Searching.]

Evaluation

GPU Statistics
SM                30
Cores/SM          8
Cores             240
Data cache (SM)   16K
DRAM              536M
Proc. Freq        1.2 GHz

BWT with Bitonic Sort

Comparison between Expected (GPU) and Exact result

              CPU                GPU
Cores         2                  240
Data cache    L1: 32K, L2: 6M    16K (per SM)
DRAM          12GB               536M
Proc. Clock   2.9 GHz            1.2 GHz

Expected GPU time = (Quick_Sort_time * 2) / 240
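One reading of this estimate is ideal linear scaling: the quick sort time measured on 2 CPU cores is converted to core-seconds and spread evenly over the 240 GPU cores. The sketch below encodes that reading. The function name and the 120-second input are hypothetical, and the model ignores clock speed, memory transfers and the difference between the two sorting algorithms.

```python
def expected_gpu_time(quick_sort_time, cpu_cores=2, gpu_cores=240):
    """Ideal-scaling estimate: total core-seconds divided across GPU cores."""
    return quick_sort_time * cpu_cores / gpu_cores

# Hypothetical measurement, not a result from the slides.
print(expected_gpu_time(120.0))  # 1.0
```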


Thanks

Future Work

• Run in limited-memory environments
– Compute in parts

• Use the memory hierarchy of the GPU
– Cache sort keys in registers or shared memory
– Handle long runs of a repeated character
  • Store the position indicating the end of the run

• Can only sort sequences whose length is a power of 2
– A length of 2^k + 1 must grow to 2^(k+1)
– Padding with the largest symbol
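The padding idea can be sketched as follows: extend the input to the next power of two with a symbol that sorts after every real symbol, sort, then drop the padding from the end. The helper names are hypothetical, `"~"` is an arbitrary choice of largest symbol, and Python's built-in sort stands in for the bitonic network.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()

def pad_to_power_of_two(symbols, largest):
    """Append copies of the largest symbol until the length is a power of two."""
    target = next_power_of_two(len(symbols))
    return list(symbols) + [largest] * (target - len(symbols))

keys = list("ACGTA")                 # length 5 = 2^2 + 1
padded = pad_to_power_of_two(keys, "~")
print(len(padded))                   # 8 = 2^3

# Padding sorts to the end, so truncating recovers the sorted input.
sorted_keys = sorted(padded)[:len(keys)]
print(sorted_keys)                   # ['A', 'A', 'C', 'G', 'T']
```

In the worst case, a length of 2^k + 1, padding nearly doubles the amount of data to sort.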
