0507036
DESCRIPTION
My first seminar slide.
TRANSCRIPT
![Page 1: 0507036](https://reader038.vdocument.in/reader038/viewer/2022110115/54bd2d984a7959ea0f8b4822/html5/thumbnails/1.jpg)
REFERENCE:
PUBLISHED BY THE IEEE COMPUTER SOCIETY, JULY 2008
Presented by: Md. Merazul Islam 0507036
Dept. of CSE, KUET
![Page 2: 0507036](https://reader038.vdocument.in/reader038/viewer/2022110115/54bd2d984a7959ea0f8b4822/html5/thumbnails/2.jpg)
WARP PROCESSING?
Dynamically optimizes software to improve execution time and energy consumption.
A new architecture implemented with both H/W & S/W.
Transforms binary kernels into FPGA circuits.
Fully dynamic: generates entire coprocessing circuits, beyond functional units.
It can also work with multiple processors.
Md. Merazul Islam, Dept. of CSE, KUET
![Page 3: 0507036](https://reader038.vdocument.in/reader038/viewer/2022110115/54bd2d984a7959ea0f8b4822/html5/thumbnails/3.jpg)
FPGA CIRCUIT?
Field Programmable Gate Array:
Programmable.
FPGAs do bit manipulation fast.
FPGAs aren't part of mainstream computing.
Supports any compiler, any language, multiple sources, etc.
Figure: In the CAD-oriented FPGA, the configurable logic block inputs and outputs are directly connected to the switch matrices.
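To illustrate the bit manipulation point, here is a minimal sketch using bit reversal as the example (the choice of bit reversal and both function names are assumptions for illustration, not taken from the slides). In software the reversal costs one loop iteration per bit, while in an FPGA the same operation is only a fixed wiring pattern.

```python
def reverse_bits_software(x: int, width: int = 32) -> int:
    """Reverse the low `width` bits of x one bit at a time, as a CPU loop would."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (x & 1)  # shift in the current lowest bit of x
        x >>= 1
    return result


def reverse_bits_wiring(width: int = 32) -> list[int]:
    """Describe the fixed wiring: output bit i is driven by input bit width-1-i."""
    return [width - 1 - i for i in range(width)]


reversed_value = reverse_bits_software(0b00010110, 8)
wiring = reverse_bits_wiring(4)
```

The software version needs `width` shift-and-mask steps; the wiring list has no computation at all, which is why such kernels are good candidates for hardware.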
![Page 4: 0507036](https://reader038.vdocument.in/reader038/viewer/2022110115/54bd2d984a7959ea0f8b4822/html5/thumbnails/4.jpg)
WARP ARCHITECTURE
[Figure: warp processor block diagram with µP, I$, D$, FPGA, Profiler, and Dynamic Part. Module (DPM), annotated with steps 1-5; an inset chart compares Time and Energy for SW Only versus HW/SW.]
1. Initially execute application in software only.
2. Profile application to determine critical regions.
3. Partition critical regions to hardware.
4. Program configurable logic & update software binary.
5. Partitioned application executes faster with lower energy consumption.
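The profiling step can be sketched as follows. This is a minimal illustration that assumes critical regions are loops and that a loop shows up as a frequently taken backward branch; the function name, the trace format, and the addresses are all hypothetical.

```python
from collections import Counter


def find_critical_regions(branches, top_n=1):
    """Count taken backward branches and return the hottest loop candidates.

    branches: iterable of (branch_address, target_address) pairs.
    A branch whose target is at or before the branch itself is treated
    as a loop back edge (an assumption of this sketch).
    """
    counts = Counter()
    for branch_addr, target_addr in branches:
        if target_addr <= branch_addr:
            counts[(target_addr, branch_addr)] += 1  # key: (loop start, loop end)
    return counts.most_common(top_n)


# Hypothetical trace: one hot loop, one cold loop, one forward branch.
trace = [(0x120, 0x100)] * 500 + [(0x200, 0x1F0)] * 20 + [(0x130, 0x300)]
hot = find_critical_regions(trace)
```

The region with the highest count is the one a partitioner would consider first for hardware.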
![Page 5: 0507036](https://reader038.vdocument.in/reader038/viewer/2022110115/54bd2d984a7959ea0f8b4822/html5/thumbnails/5.jpg)
WARP PROCESSING STEPS
[Figure: warp processor (µP, I$, D$, FPGA, Profiler, DPM (CAD)) with the tool flow run by the DPM. Stages shown: Decompilation, Partitioning, RT Synthesis, JIT FPGA Compilation (Logic Synthesis, Tech. Mapping/Packing, Placement, Routing), and Binary Updater. Inputs and outputs shown: Binary, Std. HW Binary, HW Bit stream, Updated Binary.]
![Page 6: 0507036](https://reader038.vdocument.in/reader038/viewer/2022110115/54bd2d984a7959ea0f8b4822/html5/thumbnails/6.jpg)
WARP PROCESSING STEPS
Dynamic Binary Translation
Decompilation:
Recover high-level information lost during compilation.
Utilize sophisticated decompilation methods.
RT Synthesis:
Converts decompiled CDFG to Boolean expressions.
Detects read/write, memory access pattern, memory read/write ordering.
Discover loops, if-else, etc.
Reduce operation sizes, etc.
Reroll loops, etc.
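As an illustration of the "discover loops" item, the sketch below finds loop back edges in a control-flow graph with a depth-first search. It is a generic textbook technique under stated assumptions (a reducible control-flow graph given as a dictionary), not the decompiler used by the warp tools; the function name and the example graph are hypothetical.

```python
def find_back_edges(cfg, entry):
    """Return edges (tail, head) that close a loop in a control-flow graph.

    cfg: dict mapping each basic block to a list of successor blocks.
    An edge to a block that is still on the DFS stack is a back edge.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {}
    back_edges = []

    def visit(node):
        color[node] = GREY
        for succ in cfg.get(node, []):
            state = color.get(succ, WHITE)
            if state == GREY:
                back_edges.append((node, succ))  # succ is a loop header
            elif state == WHITE:
                visit(succ)
        color[node] = BLACK

    visit(entry)
    return back_edges


# Hypothetical graph for: entry; while (cond) { body }; exit
cfg = {"entry": ["header"], "header": ["body", "exit"],
       "body": ["header"], "exit": []}
loops = find_back_edges(cfg, "entry")
```

Each back edge identifies one loop: its head is the loop header and its tail is the last block of the loop body.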
![Page 7: 0507036](https://reader038.vdocument.in/reader038/viewer/2022110115/54bd2d984a7959ea0f8b4822/html5/thumbnails/7.jpg)
WARP PROCESSING STEPS
Logic Synthesis:
Optimize hardware circuit created during RT synthesis.
Technology Mapping/Packing:
Decompose hardware circuit into basic logic gates.
Traverse logic network, combining nodes to form single-output lookup tables (LUTs).
Placement:
Identify critical path, placing critical nodes in center of configurable logic fabric.
Routing:
Find a path within the FPGA to connect the source and sinks of each net.
Represent routing nets between CLBs as routing between SMs.
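The routing step can be pictured with a simple breadth-first maze router over a grid of switch matrices. This is a swapped-in textbook technique chosen for brevity, not the router used by the warp CAD tools; the grid encoding and the function name are assumptions of the sketch.

```python
from collections import deque


def route_net(grid, source, sink):
    """Find a shortest path of free switch matrices from source to sink.

    grid: list of equal-length strings, '.' = free, '#' = already used.
    Returns a list of (row, col) cells, or None if no route exists.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        cell = queue.popleft()
        if cell == sink:
            path = []
            while cell is not None:  # walk back to the source
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == "." and (nr, nc) not in parent):
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


# Hypothetical 3x4 fabric with two occupied switch matrices in the middle.
fabric = ["....",
          ".##.",
          "...."]
path = route_net(fabric, (1, 0), (1, 3))
```

A real router must also handle nets with several sinks and resolve congestion when nets compete for the same resources; this sketch routes one two-pin net only.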
![Page 8: 0507036](https://reader038.vdocument.in/reader038/viewer/2022110115/54bd2d984a7959ea0f8b4822/html5/thumbnails/8.jpg)
RESULTS
Execution Time and Memory Requirements
(a) a commercial FPGA CAD tool running on a desktop workstation, (b) the Riverside Dynamic CAD (RDCAD) tools on the same workstation, and (c) the RDCAD tools on a lean 40-MHz ARM7 processor.
Tool   Size     Time
a      120MB    3min
b      3.6MB    0.108s
c      3.6MB    1.11s
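For reference, the ratios implied by the figures on this slide alone can be worked out as below. The variable names are illustrative, and "3min" is taken as exactly 180 seconds, which is an assumption.

```python
# Figures as listed on the slide; "3min" is assumed to mean 180 s.
commercial = {"memory_mb": 120.0, "time_s": 3 * 60}
rdcad_workstation = {"memory_mb": 3.6, "time_s": 0.108}
rdcad_arm7 = {"memory_mb": 3.6, "time_s": 1.11}


def ratio(baseline, other, key):
    """How many times larger the baseline value is than the other value."""
    return baseline[key] / other[key]


memory_reduction = ratio(commercial, rdcad_workstation, "memory_mb")
speedup_workstation = ratio(commercial, rdcad_workstation, "time_s")
speedup_arm7 = ratio(commercial, rdcad_arm7, "time_s")

print(f"memory: {memory_reduction:.1f}x smaller")
print(f"time on workstation: {speedup_workstation:.0f}x faster")
print(f"time on ARM7: {speedup_arm7:.0f}x faster")
```

These ratios describe this one comparison only and need not match averages quoted elsewhere in the slides.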
![Page 9: 0507036](https://reader038.vdocument.in/reader038/viewer/2022110115/54bd2d984a7959ea0f8b4822/html5/thumbnails/9.jpg)
SPEEDUP COMPARISON
[a] Comparison of software execution on a digital signal processor (DSP) and warped execution on a warp processor against a 200-MHz ARM9 baseline for single-threaded applications.
[b] Comparison of multithreaded application speedups on various 400-MHz ARM11-based multiprocessors and warp processors.
![Page 10: 0507036](https://reader038.vdocument.in/reader038/viewer/2022110115/54bd2d984a7959ea0f8b4822/html5/thumbnails/10.jpg)
CONCLUSION
Warp processing shows that the technique works and opens the door to new challenges.
Speedup of 2X-100X or even more.
20X less memory usage.
10% more routing resource usage.
38%-94% power reduction.
In the near future, we expect warp processors to achieve speedups much greater than an order of magnitude.