

Ramethy: Reconfigurable Acceleration of Bisulfite Sequence Alignment

James Arram, Wayne Luk (Imperial College)
Peiyong Jiang (Chinese University of Hong Kong)

February 24, 2015


Contributions

- A run-time reconfigurable architecture for accelerating sequence alignment using FPGAs
- An application of this architecture to accelerate bisulfite sequence alignment
- Optimisations which improve the performance of the FM-index search operation
- Implementation of our design, Ramethy, on a Maxeler node
  - 14.9x the speed of a multicore CPU (up to 88.4x)
  - 3.8x the speed of a GPU (up to 22x)


Motivation

- Cost of DNA sequencing is decreasing faster than Moore’s Law
- Amount of sequenced data is increasing
- Current compute infrastructures are struggling
- Slow analysis = slow diagnosis


Bisulfite Sequencing

- A form of DNA sequencing used to find the methylation status of DNA
- Methylation is a mechanism which controls gene expression
- Studies link methylation to certain illnesses

Figure: Image by Karina Zillner


Bisulfite Sequencing Applications

- Exciting diagnosis methods developed by CUHK:
  1. Cancer diagnosis
     - General diagnosis: targets most cancers
     - Early stage detection
  2. Prenatal diagnosis
     - Test baby for illness before birth
     - Non-invasive: safe for mother and baby


Bisulfite Sequencing Analysis

- Latest bisulfite sequencing analysis tool: Methy-Pipe (developed by CUHK)
  - Fully integrated, but still slow (∼1 day to perform analysis)
  - Sequence alignment is the bottleneck (over 50% of run-time)

Figure: short reads (for example TCGGA, GTTAC, TACGT) aligned against the reference genome TCGGAAGGAGTAGTTACGT. Some reads (for example GGAGC, CGGGA) differ from the reference because of genetic diversity or sequencing error.


Alignment Programs

- Multiple software programs for sequence alignment
  - Soap2, BWA, Bowtie, etc.
- Several GPU-based alignment programs
  - Soap3-dp, CUSHAW
  - Up to 10x faster than CPU-based alignment programs
- FPGA-based alignment programs have been proposed
  - Typically accelerate the Smith-Waterman alignment algorithm
  - Good speed-up, but often compromise functionality and accuracy


FM-index

- Given the Methy-Pipe alignment parameters, we target the FM-index search operation for acceleration
- Performs alignment using an index of the reference genome
  - Closely related to suffix arrays (SA)
- FM-index search operation:
  - Update the SA interval for each symbol in the read (backward search)
  - Number of updates (iterations) = number of symbols in the read

    initialise SA interval
    for each symbol in read (last to first) do
        update SA interval
    end for
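
The backward search loop above can be sketched in software as follows. This is a minimal illustration, not the Ramethy hardware design: the function names `build_fm_index` and `backward_search` are hypothetical, the suffix array is built by naive sorting (only practical for toy inputs), and the reference string is the short example sequence from the alignment figure.

```python
def build_fm_index(reference):
    """Build a toy FM-index for a short reference string over A, C, G, T."""
    text = reference + "$"  # '$' is a sentinel that sorts before A, C, G, T
    # Suffix array by direct sorting: fine for a demo, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]  # Burrows-Wheeler transform of the text
    alphabet = sorted(set(text))
    # c_table[c]: number of symbols in the text that are smaller than c
    c_table = {c: sum(1 for x in text if x < c) for c in alphabet}
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] for c in alphabet}
    for symbol in bwt:
        for c in alphabet:
            occ[c].append(occ[c][-1] + (symbol == c))
    return sa, c_table, occ


def backward_search(read, sa, c_table, occ):
    """Return the sorted reference positions where `read` occurs exactly."""
    low, high = 0, len(sa)  # initialise SA interval (half-open)
    for symbol in reversed(read):  # last symbol to first
        if symbol not in c_table:
            return []
        # update SA interval: two lookups into the index per symbol
        low = c_table[symbol] + occ[symbol][low]
        high = c_table[symbol] + occ[symbol][high]
        if low >= high:
            return []  # interval is empty, so the read does not occur
    return sorted(sa[low:high])


sa, c_table, occ = build_fm_index("TCGGAAGGAGTAGTTACGT")
print(backward_search("GGA", sa, c_table, occ))
print(backward_search("TTACG", sa, c_table, occ))
```

Each iteration touches the index twice (once for each end of the interval), and the positions read depend on the previous interval, which is why these accesses cannot be predicted ahead of time.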


FM-index Search Operation Analysis

- Extend search with backtracking for inexact alignment
  - Edit operations (substitution, insertion, deletion) performed on the read
- Algorithm bottlenecks:
  - Dynamic data access to the index (2 per update)
  - Excessive backtracking in inexact alignment
- Optimise algorithm and alignment architecture
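
To make the backtracking bottleneck concrete, here is a minimal sketch of brute-force inexact search over a toy FM-index. It is an illustration under simplifying assumptions, not the design from the paper: only substitutions are tried (insertions and deletions are left out), the names `build_index` and `inexact_search` are hypothetical, and the counter `updates` simply counts SA interval updates.

```python
def build_index(reference):
    """Toy FM-index: suffix array, C table and Occ table for a short text."""
    text = reference + "$"  # sentinel sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    symbols = sorted(set(reference))
    c_table = {c: sum(1 for x in text if x < c) for c in symbols}
    occ = {c: [0] for c in symbols}
    for b in bwt:
        for c in symbols:
            occ[c].append(occ[c][-1] + (b == c))
    return sa, c_table, occ, symbols


def inexact_search(read, index, max_mismatches):
    """Find positions matching `read` with at most `max_mismatches` substitutions.

    Returns (sorted positions, number of SA interval updates performed).
    """
    sa, c_table, occ, symbols = index
    hits = set()
    updates = 0

    def extend(pos, low, high, budget):
        nonlocal updates
        if pos < 0:  # whole read consumed: every suffix in the interval is a hit
            hits.update(sa[low:high])
            return
        for c in symbols:  # try the read symbol and every substitution
            cost = 0 if c == read[pos] else 1
            if cost > budget:
                continue
            updates += 1
            new_low = c_table[c] + occ[c][low]
            new_high = c_table[c] + occ[c][high]
            if new_low < new_high:
                extend(pos - 1, new_low, new_high, budget - cost)
            # otherwise the branch is dead and the search backtracks

    extend(len(read) - 1, 0, len(sa), max_mismatches)
    return sorted(hits), updates


index = build_index("TCGGAAGGAGTAGTTACGT")
for k in (0, 1, 2):
    positions, updates = inexact_search("GGAGC", index, k)
    print(k, positions, updates)
```

The number of interval updates grows quickly with the mismatch budget, because every read position becomes a branching point; this is the search space that constraining the edit position is meant to shrink.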


Static Architecture

- FPGA is configured with a static alignment circuit

Figure: Static Architecture. A single FPGA configuration (FPGA Configuration 1) connects Module 1, Module 2 and Module 3 between Input and Output.

- Performance limited by features of alignment algorithms:
  1. Idle modules
  2. Unbalanced pipelines
  3. Not enough resources
  4. Inflexible alignment


Run-time Reconfigurable Architecture

- One FPGA configuration for each module
  - Modules replicated as many times as possible
- Run-time reconfiguration used to switch configurations

Figure: Reconfigurable Architecture. FPGA Configuration 1 holds replicated copies of Module 1 (Input to I.D), FPGA Configuration 2 holds copies of Module 2 (I.D to I.D), and FPGA Configuration 3 holds copies of Module 3 (I.D to Output). I.D = intermediate data.


Run-time Reconfigurable Architecture Analysis

- Performance of the run-time reconfigurable architecture:

  $T = \sum_i \left( T_i / N_i + t_r + t_t \right)$

  - $T_i$: time for module $i$ to process the input data
  - $N_i$: number of modules in configuration $i$
  - Overheads: $t_r$ (reconfiguration time), $t_t$ (transfer time)
- Performance of the static architecture:

  best: $T = \max \left( T_1 / N_1, \ldots \right)$    worst: $T = \sum_i \left( T_i / N_i \right)$

- $N_i^{\text{runtime}} \gg N_i^{\text{static}}$
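
The two performance models can be compared numerically with a small sketch. The function names and all input numbers below are invented purely for illustration (they are not measurements from the talk); the formulas are the ones on this slide.

```python
def reconfigurable_time(module_times, replicas, t_reconfig, t_transfer):
    """T = sum_i (T_i / N_i + t_r + t_t): configurations run one after another."""
    return sum(t / n + t_reconfig + t_transfer
               for t, n in zip(module_times, replicas))


def static_time_best(module_times, replicas):
    """Best case for a static design: stages overlap, slowest stage dominates."""
    return max(t / n for t, n in zip(module_times, replicas))


def static_time_worst(module_times, replicas):
    """Worst case for a static design: no overlap between stages."""
    return sum(t / n for t, n in zip(module_times, replicas))


# Hypothetical workload: three modules needing 12 s, 24 s and 36 s on one copy.
module_times = [12.0, 24.0, 36.0]
static_replicas = [1, 1, 1]    # all three modules must share one configuration
runtime_replicas = [6, 6, 6]   # each configuration is filled with one module type

print(reconfigurable_time(module_times, runtime_replicas,
                          t_reconfig=1.0, t_transfer=0.5))
print(static_time_best(module_times, static_replicas))
print(static_time_worst(module_times, static_replicas))
```

With these made-up values the reconfigurable design wins despite paying the overheads three times, because each module gets six copies instead of one. If the overheads were large relative to T_i / N_i, the conclusion would reverse.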


Run-time Reconfigurable Architecture Analysis

- Reconfiguration addresses the limitations of the static architecture:
  1. Configurations contain a single type of module, and data is partitioned:
     - No idle modules
  2. Modules are not interlinked:
     - No unbalanced pipeline of modules
  3. Configurations for each stage in the alignment algorithm:
     - Easier to map the full alignment circuit to hardware
  4. Run-time reconfiguration used to switch configurations:
     - Flexible alignment


FM-index Optimisations

- Reduce dynamic data access: n-step FM-index
  - Old: SA interval updated for 1 symbol per iteration
  - New: SA interval updated for n symbols per iteration
  - Number of dynamic data accesses reduced by a factor of n
- Reduce excessive backtracking: bi-directional search
  - Old: brute-force search
  - New: use bi-directional search and constrain the edit position
  - Reduces the search space and excessive backtracking
- More details in paper
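
As a rough illustration of the n-step saving, the sketch below counts index accesses for one exact search of a read. It assumes, as stated earlier in the talk, 2 dynamic accesses per SA interval update, and it further assumes (not confirmed by the slides) that an n-step update still costs 2 accesses and that a final partial step costs a full update. The function name `index_accesses` is hypothetical; the read length of 75 matches the reads used in the experiments.

```python
import math


def index_accesses(read_length, n, accesses_per_update=2):
    """Dynamic index accesses for one exact search with an n-step FM-index."""
    updates = math.ceil(read_length / n)  # n symbols consumed per update
    return updates * accesses_per_update


for n in (1, 2, 3, 5):
    print(n, index_accesses(75, n))
```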


Ramethy: Implementation

- Design goal: match the alignment parameters of Methy-Pipe
  - Permit up to two mismatches in reads
  - Report unique alignments only

Figure: reads flow through the Exact Match, One Mismatch and Two Mismatches stages.

- Implement Ramethy on a 1U Maxeler MPC-X1000 node
  - 8 Altera Stratix V FPGAs, each with 48GB of off-chip DRAM
  - Designed as a data flow graph, then compiled into a bitstream
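
The staged matching on this slide (exact match, then one mismatch, then two mismatches, reporting unique alignments only) can be sketched in software as below. This is an assumption-laden illustration: it scans the reference by brute force instead of using an FM-index, the names `hamming_hits` and `align_staged` are hypothetical, and the policy for ambiguous reads (drop a read as soon as a stage finds more than one position) is a guess at what "unique alignments only" means, not a statement of Methy-Pipe's behaviour.

```python
def hamming_hits(read, reference, mismatches):
    """Positions where `read` matches with exactly `mismatches` substitutions."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        if sum(a != b for a, b in zip(read, window)) == mismatches:
            hits.append(start)
    return hits


def align_staged(reads, reference, max_mismatches=2):
    """Run reads through exact, one-mismatch and two-mismatch stages in turn.

    Returns {read: (position, mismatches)} for uniquely aligned reads only.
    """
    results = {}
    pending = list(reads)
    for mismatches in range(max_mismatches + 1):  # one stage per mismatch count
        still_pending = []
        for read in pending:
            hits = hamming_hits(read, reference, mismatches)
            if len(hits) == 1:
                results[read] = (hits[0], mismatches)  # unique: report it
            elif not hits:
                still_pending.append(read)  # unaligned: try the next stage
            # more than one hit: ambiguous read, dropped (assumed policy)
        pending = still_pending  # intermediate data for the next stage
    return results


reference = "TCGGAAGGAGTAGTTACGT"
print(align_staged(["TTACG", "GGAGC", "GGA", "CCCCC"], reference))
```

Only reads left unaligned by one stage are forwarded to the next, so the later, more expensive stages see a smaller input. That list of pending reads plays the role of the intermediate data passed between configurations.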


Module Designs

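- Host CPU streams reads to the modules; results are streamed back
- Index is stored in off-chip DRAM
- Each module uses BRAM to store reads and SA intervals

Figure: the host streams reads to a module on the FPGA device and receives results. The module keeps reads and SA intervals in BRAM and uses index data from off-chip DRAM to update the old interval to the new interval.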


Experimentation

- Experimental data:
  - Reference genome: chr22 of the human genome (full: 3.3GB)
  - 10M reads of 75 symbols
- Comparisons with:
  - soap2 (CPU: 2 Intel Xeon, 32 cores)
  - soap3-dp (GPU: 1 NVIDIA GTX 580, 512 cores)
  - Among the fastest alignment programs available
  - soap2 is used in Methy-Pipe; later versions may use soap3-dp
- Three performance values are given for Ramethy (8 FPGAs):
  - v1: measured values (1 module per configuration)
  - v2: upper-bound estimate (3 modules per configuration)
  - static: best-case static design with 4 EM, 1 OM, 3 TM, presumably exact-match, one-mismatch and two-mismatch modules (1 module per configuration)


Performance: Bisulfite Sequence Alignment

program        platform        clock freq. (MHz)  devices  Mbps  speedup
soap2          Intel E5-2650   2000               2        4.5   1.0x
soap3-dp       NVIDIA GTX-580  772                1        17.4  3.9x
static design  MPC-X1000       150                8        58.5  13.1x
Ramethy v1     MPC-X1000       150                8        66.4  14.9x
Ramethy v2     MPC-X1000       150                8        395   88.4x

(Mbps: million bases per second)

- 14.9x the speed of soap2, 3.8x the speed of soap3-dp
- Same accuracy as software

Figure: bar chart of speedup (axis 0 to 100) for soap2, soap3-dp, static design, Ramethy v1 and Ramethy v2.


Performance: Standard DNA Sequence Alignment

program           platform     clock freq. (MHz)  devices  Mbps
Fernandez et al.  Convey HC-1  150                4        13.2
Olson et al.      Pico M-503   250                8        120
Ramethy v1        MPC-X1000    150                8        135

- Fernandez et al.: does not test all edit positions
- Olson et al.: Backtracing?
- Ramethy: identical to software

Figure: bar chart of bases per second (M) (axis 0 to 160) for Fernandez et al., Olson et al. and Ramethy v1.


Energy Consumption

- Energy consumption is measured from run-time device power (obtained from MaxOS)

Program     Device Power (W)  Energy Consumption (kJ)
soap2       190               31.9
soap3-dp    244               10.5
Ramethy v1  72                1.5

- Almost an order of magnitude less energy consumed by FPGAs (short run-time plus low power)
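
The "almost an order of magnitude" claim can be checked directly from the table. The sketch below uses only the table values; the helper names are hypothetical, and the implied run time assumes that energy is simply constant device power multiplied by run time, which the slides do not state explicitly.

```python
# Values copied from the energy table on this slide.
power_w = {"soap2": 190, "soap3-dp": 244, "Ramethy v1": 72}
energy_kj = {"soap2": 31.9, "soap3-dp": 10.5, "Ramethy v1": 1.5}


def energy_ratio(program, baseline="Ramethy v1"):
    """How many times more energy `program` uses than `baseline`."""
    return energy_kj[program] / energy_kj[baseline]


def implied_runtime_s(program):
    """Run time implied by energy = power x time (assumed constant power)."""
    return energy_kj[program] * 1000 / power_w[program]


for name in power_w:
    print(name, round(energy_ratio(name), 1), round(implied_runtime_s(name), 1))
```

On these figures Ramethy v1 uses about 7 times less energy than soap3-dp and about 21 times less than soap2.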


Future Work

- Further optimising Ramethy
  - Other ways of reducing dynamic data access?
  - Pre-filter reads for quick discarding
- Accelerating other parts of Methy-Pipe
- Study scientific and clinical impact


Conclusion

- Accelerate bisulfite sequence alignment using FPGAs
- Run-time reconfigurable architecture for accelerating alignment algorithms
- Target the FM-index for acceleration
  - Reduce dynamic data access and excessive backtracking
- Ramethy: 14.9x the speed of soap2, 3.8x the speed of soap3-dp
  - Identical alignment accuracy to software
  - Almost an order of magnitude less energy consumed
- Faster alignment = improved science and healthcare