Sequence Alignment in DNA. Under the Guidance of: Prof. Kolin Paul. Presented By: Lalchand Gaurav Jain

Posted on 18-Dec-2015


Page 1

Sequence Alignment in DNA

Under the Guidance of: Prof. Kolin Paul

Presented By: Lalchand

Gaurav Jain

Page 2

Agenda

• Application domain and objective
• General alignment procedure
• Scope of parallelism in BWT
• Selection sort and quicksort implementation
• BWT implementation on GPU
• Comparative study

Page 3

Time-Line

• Application domain and objective
• General alignment procedure
• Scope of parallelism in BWT
• Selection sort and quicksort implementation
• BWT implementation on GPU
• Comparative study

Page 8

Application Domain & Objective

To present an efficient (especially parallel) implementation for searching short sequences in DNA, a problem that arises in:

• Analyzing gene expression
• Mapping variations between individuals
• Mapping homologous proteins
• Assembling the genome of an organism

Page 9

Basic Alignment Procedure

[Figure: basic alignment procedure]
• Genome -> Indexing -> Suffix Array: about 15 GB for the human genome (3 billion positions × 4 B per index = 12 GB, plus the 3 GB genome)
• BWT: Bwt[i] = Ref[SA[i] - 1] (about 3 GB)
• Intermediate size: 10^18
• Reads -> Mapper (BWT algorithm) -> {Location, Occurrence}, with O(log G) searching over a genome of length G
• Stages in the figure are marked "To be parallelized" and "Parallelized"
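The O(log G) search step can be illustrated with binary search over a suffix array, where G is the genome length. The Python sketch below is illustrative only: it assumes a naive suffix-array construction by sorting suffix start positions (fine for toy inputs, far too slow and memory-hungry for a 3-billion-base genome), and the helper names `build_suffix_array` and `locate` are hypothetical. Each binary-search step compares up to w characters of a read of length w, so one lookup costs O(w log G) character comparisons.

```python
def build_suffix_array(ref: str) -> list[int]:
    """Naive construction (hypothetical helper): sort suffix start positions
    by the suffix text. Only suitable for small illustrative inputs."""
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def locate(ref: str, sa: list[int], read: str) -> list[int]:
    """Return every position where `read` occurs in `ref` (hypothetical helper).

    All suffixes that start with `read` form one contiguous block of the
    suffix array, so two binary searches (lower and upper bound) find it.
    """
    w = len(read)
    # Lower bound: first suffix whose w-character prefix is >= read.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + w] < read:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose w-character prefix is > read.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + w] <= read:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


genome = "ACGTACGTTACG$"   # toy reference ending with the '$' sentinel
sa = build_suffix_array(genome)
print(locate(genome, sa, "ACG"))  # [0, 4, 9]
```

Truncating every suffix to its first w characters keeps the sorted order non-decreasing, which is why plain binary search on those prefixes is valid.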

Page 10

Scope of Parallelism in BWT

• With the BWT, a string of length w can be found in O(w) time.

• The BWT is closely related to the suffix array (SA), the lexicographically sorted list of all suffixes of a genome.

• Bwt[i] = Ref[SA[i] - 1], with Bwt[i] = $ when SA[i] = 0 (0-based indexing)
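The O(w) bound comes from backward search over the BWT, the technique used by FM-index based aligners. A minimal Python sketch, assuming the text ends with a unique smallest sentinel `$` and using full (unsampled) C and Occ tables; `fm_tables` and `count_occurrences` are hypothetical names. Once the tables exist, the search loop does exactly one step per pattern character, each with O(1) table lookups.

```python
def fm_tables(bwt: str) -> tuple[dict, dict, int]:
    """Precompute the tables used by backward search (hypothetical helper).

    C[c]      = number of characters in the text smaller than c.
    occ[c][i] = number of occurrences of c in bwt[:i].
    Full tables cost O(alphabet * n) memory; real indexes sample them.
    """
    alphabet = sorted(set(bwt))          # '$' sorts before A, C, G, T in ASCII
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, len(bwt)


def count_occurrences(pattern: str, C: dict, occ: dict, n: int) -> int:
    """Backward search: process the pattern right to left, one step per character."""
    lo, hi = 0, n  # half-open range [lo, hi) of sorted rotations matching so far
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


C, occ, n = fm_tables("AT$ACG")   # BWT of "ACGTA$"
print(count_occurrences("CG", C, occ, n))  # 1
```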

Page 11

Initial Step 1

● Implementation of BWT using selection sort (OpenMP)
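A sequential Python sketch of suffix-array construction with selection sort. The slides do not show the OpenMP code; the comment marks the inner minimum search as the part that, under our assumption, would be split across threads (a per-thread minimum followed by a reduction). `suffix_less` and `selection_sort_sa` are hypothetical names. Selection sort performs O(n²) suffix comparisons for a text of length n.

```python
def suffix_less(ref: str, i: int, j: int) -> bool:
    """True if the suffix starting at i sorts before the suffix starting at j."""
    n = len(ref)
    # Compare character by character instead of slicing, so nothing is copied.
    while i < n and j < n:
        if ref[i] != ref[j]:
            return ref[i] < ref[j]
        i += 1
        j += 1
    # One suffix is a prefix of the other: the shorter one sorts first.
    return i == n and j < n


def selection_sort_sa(ref: str) -> list[int]:
    sa = list(range(len(ref)))
    n = len(sa)
    for pos in range(n - 1):
        # Find the smallest remaining suffix. This scan is the part an OpenMP
        # version could divide among threads (assumption, not shown in slides).
        best = pos
        for k in range(pos + 1, n):
            if suffix_less(ref, sa[k], sa[best]):
                best = k
        sa[pos], sa[best] = sa[best], sa[pos]
    return sa


print(selection_sort_sa("ACGTA$"))  # [5, 4, 0, 1, 2, 3]
```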

Page 12

Selection Sort - OpenMP

[Figure: BWT creation using selection sort; time in seconds (0 to 7000) vs. file size in KB (0 to 1000), with curves for 1, 2, 4 and 8 processors]

CPU configuration:
Cores: 8
Data cache: L1 32 KB, L2 6 MB
DRAM: 12 GB
Processor clock: 2.9 GHz

Page 13

Initial Step 2

● Implementation of BWT using selection sort (OpenMP)

● Implementation of BWT using quicksort (OpenMP)
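For comparison, a Python sketch of the quicksort variant. It is sequential; one common way to parallelize quicksort with OpenMP is to run the two independent recursive calls as tasks (`#pragma omp task`), which is an assumption here rather than something the slides state. `quicksort_sa` is a hypothetical name, and suffixes are compared by slicing for brevity, which copies data. Quicksort needs O(n log n) suffix comparisons on average, versus O(n²) for selection sort.

```python
def quicksort_sa(ref: str) -> list[int]:
    """Suffix array via three-way quicksort on suffix start positions."""
    def sort(idx: list[int]) -> list[int]:
        if len(idx) <= 1:
            return idx
        pivot = ref[idx[len(idx) // 2]:]          # middle suffix as pivot
        less = [i for i in idx if ref[i:] < pivot]
        equal = [i for i in idx if ref[i:] == pivot]
        greater = [i for i in idx if ref[i:] > pivot]
        # The two recursive calls are independent: candidates for parallel tasks.
        return sort(less) + equal + sort(greater)

    return sort(list(range(len(ref))))


print(quicksort_sa("ACGTA$"))  # [5, 4, 0, 1, 2, 3]
```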

Page 14

Quick Sort - OpenMP

CPU statistics:
Cores: 8
Data cache: L1 32 KB, L2 6 MB
DRAM: 12 GB
Processor clock: 2.9 GHz

Page 15

Initial Step 3

● Implementation of BWT using selection sort (OpenMP)

● Implementation of BWT using quicksort (OpenMP)

● Implementation of BWT on GPU using bitonic sort

Page 17

Why Bitonic?

• A bitonic sequence is the concatenation of two subsequences sorted in opposite directions, or a cyclic shift of such a concatenation

• Implemented by comparator networks: sorts in place, with no communication

• Naturally suited to SIMD architectures: each thread executes the same code on different data

• O(log² n) time and O(n log² n) work
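A minimal Python sketch of the bitonic sorting network, emulating on the CPU what each GPU kernel launch would do; the loop over `i` stands in for the threads. The function name and the stage counter are illustrative assumptions, not the authors' code. For n = 2^m elements there are m(m + 1)/2 stages, and each stage performs n/2 independent compare-exchanges, which gives the O(log² n) time and O(n log² n) work above.

```python
def bitonic_sort(values: list) -> tuple[list, int]:
    """Sort with a bitonic comparator network; also return the stage count.

    Each (k, j) pair is one stage, i.e. one kernel launch on a GPU in which
    every thread handles at most one compare-exchange. Length must be 2^m.
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of 2")
    stages = 0
    k = 2
    while k <= n:                  # size of the bitonic blocks being merged
        j = k // 2
        while j > 0:               # distance between compared elements
            # Pairs within one stage are disjoint, so running this loop
            # sequentially gives the same result as running it in parallel.
            for i in range(n):     # one "thread" per element
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (ascending and a[i] > a[partner]) or (
                        not ascending and a[i] < a[partner]
                    ):
                        a[i], a[partner] = a[partner], a[i]
            stages += 1
            j //= 2
        k *= 2
    return a, stages


print(bitonic_sort([5, 1, 7, 3, 8, 2, 6, 4]))  # ([1, 2, 3, 4, 5, 6, 7, 8], 6)
```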

Page 18

Burrows-Wheeler Transform

Input: A C G T A $

Rotations (start index, rotation):
0  A C G T A $
1  C G T A $ A
2  G T A $ A C
3  T A $ A C G
4  A $ A C G T
5  $ A C G T A

Basic string sorting algorithm (rotations in sorted order):
5  $ A C G T A
4  A $ A C G T
0  A C G T A $
1  C G T A $ A
2  G T A $ A C
3  T A $ A C G

Sorted indices (suffix array): 5 4 0 1 2 3

Output (last column): A T $ A C G
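The example can be reproduced by literally sorting the rotations. A Python sketch (`bwt_by_rotations` is a hypothetical name) that returns both the output string and the sorted start indices; it materializes all n rotations of length n, so it needs O(n²) memory and is only meant for small inputs like the one above.

```python
def bwt_by_rotations(text: str) -> tuple[str, list[int]]:
    """BWT by sorting all cyclic rotations of `text` (expects a '$' sentinel)."""
    n = len(text)
    # Sort rotation start positions by the full rotation string.
    order = sorted(range(n), key=lambda i: text[i:] + text[:i])
    # The last character of the rotation starting at i is text[i - 1] (cyclically).
    last_column = "".join(text[(i - 1) % n] for i in order)
    return last_column, order


print(bwt_by_rotations("ACGTA$"))  # ('AT$ACG', [5, 4, 0, 1, 2, 3])
```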

Page 19

Steps Performed

• Copy the genome from host to device memory
• Build an indices array pointing into the reference string
• Compare suffixes through the indices array and swap indices accordingly

• Sort n elements in O(log² n) kernel calls, each taking O(1) time and O(n) work

• One more step derives the BWT from the suffix array: Bwt[i] = Ref[SA[i] - 1], with Bwt[i] = $ when SA[i] = 0
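The final step is a single pass over the suffix array. A minimal Python sketch of the formula, using the suffix array from the page 18 example; `bwt_from_suffix_array` is a hypothetical name, and 0-based indices are assumed. Each output position depends only on SA[i], so on a GPU this step maps naturally to one thread per position.

```python
def bwt_from_suffix_array(ref: str, sa: list[int]) -> str:
    """Bwt[i] = Ref[SA[i] - 1], and Bwt[i] = '$' when SA[i] == 0 (0-based)."""
    return "".join("$" if p == 0 else ref[p - 1] for p in sa)


print(bwt_from_suffix_array("ACGTA$", [5, 4, 0, 1, 2, 3]))  # AT$ACG
```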

Page 20

CPU - GPU Interaction (BWT)

[Figure: CPU - GPU interaction for BWT construction. Components: Cpu; Genome; Initialise_indices_array; Cuda_Memcpy & kernel call; Bitonic (Bitonic_sort_step, Suffix_compare); Suffix Array; Suffix -> BWT; O(log₂ G) searching]

Page 21

Evaluation

GPU statistics:
SMs: 30
Cores/SM: 8
Cores: 240
Data cache (per SM): 16 KB
DRAM: 536 MB
Processor frequency: 1.2 GHz

BWT with Bitonic Sort

Page 22

Comparison between Expected (GPU) and Exact Results

                    CPU                    GPU
Cores               2                      240
Data cache          L1: 32 KB, L2: 6 MB    16 KB (per SM)
DRAM                12 GB                  536 MB
Processor clock     2.9 GHz                1.2 GHz

Expected GPU time = (Quick_Sort_time × 2) / 240, i.e. the 2-core quicksort time scaled linearly to 240 cores

Page 23

References:

• Hagen Peters: Fast in-place sorting with CUDA based on bitonic sort

• Rohith K. Menon: Rapid parallel genome indexing with MapReduce

• M. Burrows and D. Wheeler: A block-sorting lossless data compression algorithm (technical report)

• Paolo Ferragina: Lightweight data indexing and compression in external memory

• Yao Zhang: Parallel lossless data compression on the GPU

Page 24

Thanks

Page 25

Future Work

• Run in limited-memory environments: compute in parts

• Use the GPU memory hierarchy: cache sort keys in registers or shared memory; for long runs of a repeated character, store the position indicating the end of the run

• Bitonic sort can only sort sequences whose length is a power of 2: a sequence of length n with 2^k < n < 2^(k+1) is padded to 2^(k+1) elements with the largest symbol
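A small Python sketch of the padding idea. `pad_to_power_of_two` is a hypothetical name, and using `'~'` as the "largest symbol" is an assumption for ASCII-encoded DNA keys (it sorts after `$`, `A`, `C`, `G` and `T`); any value greater than every real key works. Because the padding compares greater than all real keys, it collects at the tail after sorting and the first n entries are the sorted original.

```python
def pad_to_power_of_two(keys: list, largest) -> list:
    """Append copies of `largest` until the length is a power of 2."""
    n = len(keys)
    if n == 0:
        return []
    target = 1
    while target < n:          # smallest power of 2 that is >= n
        target *= 2
    return list(keys) + [largest] * (target - n)


suffixes = ["GTA$", "A$", "TA$", "$", "CGTA$"]   # n = 5, so 2^2 < n < 2^3
padded = pad_to_power_of_two(suffixes, "~")      # assumed largest symbol
print(len(padded))                                         # 8
print(sorted(padded)[:len(suffixes)] == sorted(suffixes))  # True
```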