1 - CPRE 583 (Reconfigurable Computing): Floating Point Iowa State University (Ames)

CPRE 583
Reconfigurable Computing
Lecture 14: Fri 10/12/2011
(Floating Point)

Instructor: Dr. Phillip Jones ([email protected])

Reconfigurable Computing Laboratory
Iowa State University

Ames, Iowa, USA

http://class.ee.iastate.edu/cpre583/


Announcements/Reminders

• Project Teams: Form by Monday 10/10
• MP2 due Friday 10/14
• Project Proposal due Friday 10/14 (midnight)
  – High-level topic, and high-level plan for execution
  – I'll give feedback
  – Project proposal class presentation on Wed 10/19
    • 5-10 PowerPoint slides
• I plan to have exams back to you this Friday


Project Grading Breakdown

• 50% Final Project Demo
• 30% Final Project Report
  – 20% of your project report grade will come from your 5-6 project updates, due Fridays at midnight
• 20% Final Project Presentation


Project Ideas: Relevant conferences

• FPL
• FPT
• FCCM
• FPGA
• DAC
• ICCAD
• Reconfig
• RTSS
• RTAS
• ISCA
• Micro
• Super Computing
• HPCA
• IPDPS


Projects: Target Timeline

• Teams Formed and Topic: Mon 10/10
  – Project idea in PowerPoint, 3-5 slides
    • Motivation (why is this interesting, useful)
    • What will be the end result
    • High-level picture of final product
  – Project team list: Name, Responsibility
• High-level Plan/Proposal: Fri 10/14
  – PowerPoint, 5-10 slides (presentation to class Wed 10/19)
    • System block diagrams
    • High-level algorithms (if any)
    • Concerns
      – Implementation
      – Conceptual
    • Related research papers (if any)


Projects: Target Timeline

• Work on projects: 10/19 - 12/9
  – Weekly update reports
    • More information on updates will be given
• Presentations: Finals week
  – Present / Demo what is done at this point
  – 15-20 minutes (depends on number of projects)
• Final write-up and Software/Hardware turned in: Day of final (TBD)


Initial Project Proposal Slides (5-10 slides)

• Project team list: Name, Responsibility (who is project leader)
  – Team size: 3-4 (5 case-by-case)
• Project idea
• Motivation (why is this interesting, useful)
• What will be the end result
• High-level picture of final product
• High-level Plan
  – Break project into milestones
    • Provide initial schedule: I would initially schedule aggressively to have the project complete by Thanksgiving. Issues will pop up that cause the schedule to slip.
  – System block diagrams
  – High-level algorithms (if any)
  – Concerns
    • Implementation
    • Conceptual
• Research papers related to your project idea


Weekly Project Updates

• The current state of your project write-up
  – Even in the early stages of the project you should be able to write a rough draft of the Introduction and Motivation section
• The current state of your Final Presentation
  – Your Initial Project Proposal presentation (due Wed 10/19) should make for a starting point for your Final Presentation
• What things are working & not working
• What roadblocks you are running into


Common Questions


Overview

• Floating Point on FPGAs (Chapter 21.4 and 31)
  – Why is it viewed as difficult?
  – Options for mitigating issues


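Floating Point Format (IEEE-754)

Single Precision

  Field:  S | exp | Mantissa
  Bits:   1 |  8  |    23

  Floating point value = $(-1)^{S} \times 2^{exp-127} \times (1.\text{Mantissa})$

Double Precision

  Field:  S | exp | Mantissa
  Bits:   1 | 11  |    52

  Floating point value = $(-1)^{S} \times 2^{exp-1023} \times (1.\text{Mantissa})$

Mantissa (single precision) = $b_{-1} b_{-2} b_{-3} \ldots b_{-23} = \sum_{i=1}^{23} b_{-i} 2^{-i}$

Example (single precision): 0 x"80" "110" x"00000"

  $= (-1)^{0} \times 2^{128-127} \times (1 + 1/2 + 1/4)$
  $= 1 \times 2^{1} \times 1.75 = 3.5$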
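The single-precision formula can be checked with a few lines of Python. This is only a sketch for normalized values; the helper names `decode_single` and `fields_of` are made up for illustration, and Python's standard `struct` module is used to cross-check against the machine's own single-precision encoding.

```python
import struct

def decode_single(sign, exp, mantissa):
    """Evaluate (-1)^S * 2^(exp-127) * (1.Mantissa) for a normalized value."""
    fraction = mantissa / (1 << 23)   # sum of b_-i * 2^-i for i = 1..23
    return (-1) ** sign * 2.0 ** (exp - 127) * (1.0 + fraction)

def fields_of(value):
    """Split a float, stored as single precision, into (S, exp, mantissa)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

# The slide's example: S = 0, exp = x"80", mantissa = "110" followed by x"00000".
mantissa = (0b110 << 20) | 0x00000
print(decode_single(0, 0x80, mantissa))         # 3.5
print(fields_of(3.5) == (0, 0x80, mantissa))    # True
```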


Fixed Point

Example formats (W.F): 5.5, 10.12, 3.7

  Whole                            | Fractional
  $b_{W-1} \ldots b_{1} b_{0}$     | $b_{-1} b_{-2} \ldots b_{-F}$

Example fixed point 5.5 format:

  01010 01100 = 10 + (1/4 + 1/8) = 10.375

Compare floating point and fixed point for 3.5:

  Floating point (single precision):   0 x"80" "110" x"00000" = 3.5
  Fixed point (10-bit, format 3.7):    011 1000000 = 3.5
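A W.F fixed-point number is just an integer scaled by $2^{F}$, which the following sketch makes explicit. The helpers `to_fixed` and `from_fixed` are hypothetical names, and the sketch assumes unsigned values and truncation of any bits below $2^{-F}$.

```python
def to_fixed(value, whole_bits, frac_bits):
    """Encode a non-negative value as an unsigned W.F bit string (truncating)."""
    raw = int(value * (1 << frac_bits))
    if raw >= 1 << (whole_bits + frac_bits):
        raise OverflowError("value does not fit in the whole part")
    bits = format(raw, "0{}b".format(whole_bits + frac_bits))
    return bits[:whole_bits] + " " + bits[whole_bits:]

def from_fixed(bits, frac_bits):
    """Decode an unsigned fixed-point bit string with frac_bits fractional bits."""
    raw = int(bits.replace(" ", ""), 2)
    return raw / (1 << frac_bits)

print(to_fixed(3.5, 3, 7))            # 011 1000000
print(from_fixed("01010 01100", 5))   # 10.375
```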


Fixed Point (Addition)

    Whole | Fractional    Operand 1
  + Whole | Fractional    Operand 2
  = Whole | Fractional    sum


Fixed Point (Addition)

11-bit 4.7 format

    0011 111 0000    Operand 1 = 3.875
  + 0001 101 0000    Operand 2 = 1.625
  = 0101 100 0000    sum = 5.5

You can use a standard ripple-carry adder!
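Because both operands share the same binary point, the addition is plain integer addition on the raw bit patterns. A minimal sketch, assuming unsigned 4.7 operands and an 11-bit adder whose carry out is simply dropped (`add_fixed` is a hypothetical helper):

```python
FRAC_BITS = 7   # 4.7 format: 4 whole bits, 7 fractional bits
WIDTH = 11

def add_fixed(a, b, width=WIDTH):
    """Add two raw fixed-point words; the carry out of the top bit is dropped."""
    return (a + b) & ((1 << width) - 1)

op1 = 0b0011_1110000   # 3.875
op2 = 0b0001_1010000   # 1.625
total = add_fixed(op1, op2)

print(format(total, "011b"))      # 01011000000
print(total / (1 << FRAC_BITS))   # 5.5
```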


Floating Point (Addition)

    0 x"80" "111" x"80000"    Operand 1 = 3.875
  + 0 x"7F" "101" x"00000"    Operand 2 = 1.625



    0 x"80" "111" x"80000"    Operand 1 = 3.875
  + 0 x"7F" "101" x"00000"    Operand 2 = 1.625

1. Common exponent (i.e. align binary point)
   Make x"80" -> x"7F" or vice versa?



    0 x"80" "111" x"80000"    Operand 1 = 3.875
  + 0 x"7F" "101" x"00000"    Operand 2 = 1.625

1. Common exponent (i.e. align binary point)
   Make x"7F" -> x"80"; lose least significant bits of Operand 2
   - Add the difference x"80" - x"7F" = 1 to x"7F"
   - Shift the mantissa of Operand 2 right by the difference. Remember the "implicit" 1 of the original mantissa

    0 x"80" "111" x"80000"    Operand 1 = 3.875
  + 0 x"80" "110" x"80000"    Operand 2 = 1.625 (aligned)



    0 x"80" "111" x"80000"    Operand 1 = 3.875
  + 0 x"7F" "101" x"00000"    Operand 2 = 1.625

2. Add mantissas

    0 x"80" "111" x"80000"    Operand 1 = 3.875
  + 0 x"80" "110" x"80000"    Operand 2 = 1.625 (aligned)
  =       1 "110" x"00000"    Overflow!



    0 x"80" "111" x"80000"    Operand 1 = 3.875
  + 0 x"7F" "101" x"00000"    Operand 2 = 1.625

2. Add mantissas
   - You can't just overflow the mantissa into the exponent field
   - You are actually overflowing the implicit "1" of Operand 1, so you sort of have an implicit "2" (i.e. "10").

    0 x"80" "111" x"80000"    Operand 1 = 3.875
  + 0 x"80" "110" x"80000"    Operand 2 = 1.625 (aligned)
  =       1 "110" x"00000"    Overflow!



    0 x"80" "111" x"80000"    Operand 1 = 3.875
  + 0 x"7F" "101" x"00000"    Operand 2 = 1.625

2. Add mantissas
   - Deal with overflow of the mantissa by normalizing.
     - Shift mantissa right by 1 (shift a "0" in because of the implicit "2")
     - Increment exponent by 1

    0 x"80" "111" x"80000"    Operand 1 = 3.875
  + 0 x"80" "110" x"80000"    Operand 2 = 1.625 (aligned)
  = 0 x"81" "011" x"00000"    = 5.5
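The align, add, and normalize steps above can be sketched in software on the raw exponent and mantissa fields. This is an illustration under simplifying assumptions, not a full IEEE-754 adder: both operands are positive and normalized, bits shifted out during alignment are truncated rather than rounded, and special values are ignored. The function name `fp_add` is made up for this example.

```python
def fp_add(e0, m0, e1, m1):
    """Add two positive, normalized single-precision values given as
    (8-bit exponent, 23-bit mantissa) pairs. Returns (exponent, mantissa)."""
    # Restore the implicit leading 1, giving 24-bit significands.
    s0 = (1 << 23) | m0
    s1 = (1 << 23) | m1
    # Step 1: use the larger exponent as the common exponent and shift the
    # smaller operand's significand right by the exponent difference.
    if e0 < e1:
        e0, s0, e1, s1 = e1, s1, e0, s0
    s1 >>= (e0 - e1)
    # Step 2: add the aligned significands.
    total = s0 + s1
    exp = e0
    # Normalize: a carry out of bit 23 is the implicit "2" ("10"), so shift
    # right by one and increment the exponent.
    if total >> 24:
        total >>= 1
        exp += 1
    # Drop the implicit 1 again to get the stored 23-bit mantissa.
    return exp, total & 0x7FFFFF

op1 = (0x80, (0b111 << 20) | 0x80000)   # 3.875
op2 = (0x7F, (0b101 << 20) | 0x00000)   # 1.625
exp, man = fp_add(*op1, *op2)
print(hex(exp), format(man >> 20, "03b"), hex(man & 0xFFFFF))   # 0x81 011 0x0
```

The result fields match the slide: exponent x"81" and mantissa "011" x"00000", i.e. 1.375 × 4 = 5.5.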


Floating Point (Addition): Other concerns

Special Value   Sign   Exponent   Mantissa
Zero            0/1    0          0
Infinity        0      MAX        0
-Infinity       1      MAX        0
NaN             0/1    MAX        Non-zero
Denormal        0/1    0          Non-zero

Single Precision

  Field:  S | exp | Mantissa
  Bits:   1 |  8  |    23
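The table of special values translates directly into a small decision function, which is also the kind of check an adder has to make before it can take the normal path. A sketch, with `classify` as a hypothetical name and MAX taken to be the all-ones exponent:

```python
def classify(sign, exp, mantissa, exp_bits=8):
    """Name the kind of value encoded by the (S, exp, mantissa) fields."""
    exp_max = (1 << exp_bits) - 1   # MAX: all exponent bits set
    if exp == 0:
        return "zero" if mantissa == 0 else "denormal"
    if exp == exp_max:
        if mantissa != 0:
            return "NaN"
        return "-infinity" if sign else "infinity"
    return "normal"

print(classify(0, 0x80, 0b110 << 20))   # normal
print(classify(1, 0xFF, 0))             # -infinity
```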


Fixed Point (Addition): Hardware


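Floating Point (Addition): High-level Hardware

[Figure: block diagram of a floating point adder. Inputs: E0, E1, M0, M1. Blocks: Swap, Difference, Greater Than, Mux, Right Shift (by shift value), Add/Sub, Priority Encoder, Left Shift (by left shift value), Round, Sub/const, Denormal?. Outputs: E, M. The Add/Sub block is the standard adder from the previous slide.]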


Floating Point

• Both Xilinx and Altera supply floating point soft-cores (which I believe are IEEE-754 compliant), so don't be too afraid if you need floating point in your class projects.
• There should also be open floating point cores that are freely available.


Fixed Point vs. Floating Point

• Floating Point advantages:
  – Application designer does not have to think "much" about the math
    • Floating point format supports a wide range of numbers ($\pm 3 \times 10^{38}$ to $\pm 1 \times 10^{-38}$, single precision)
  – If IEEE-754 compliant, then it is easier to accelerate existing floating-point-based applications
• Floating Point disadvantages:
  – Ease of use at great hardware expense
    • 32-bit fixed point add (~32 DFF + 32 LUTs)
    • 32-bit single precision floating point add (~250 DFF + 250 LUTs). About 10x more resources, thus 1/10 the possible best-case parallelism.
    • Floating point typically needs a massive pipeline to achieve high clock rates (i.e. high throughput)
  – No hard resources such as carry chains to take advantage of


Fixed Point vs. Floating Point

• Range example: (using decimal for clarity)
  – Assume we can only use 3 digits
    • For fixed point, all 3 digits used for whole part (3.0 format)
    • For floating point, 2 digits used for mantissa, 1 digit for exponent
    • What is the largest number you can represent for each?
• Precision example: (using decimal for clarity)
  – For the same format above, represent 125
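One way to explore these two questions is to enumerate the toy formats. The sketch below rests on an assumption the slide does not spell out: the floating point value is read as m × 10^e with a 2-digit integer mantissa m (0 to 99) and a 1-digit non-negative exponent e (0 to 9). Other readings of the format (for example a mantissa of the form d.d) give different numbers but the same trade-off. `best_float` is a hypothetical helper.

```python
def best_float(value):
    """Closest toy float m * 10**e to value, with 0 <= m <= 99 and 0 <= e <= 9."""
    candidates = [(m, e) for m in range(100) for e in range(10)]
    m, e = min(candidates, key=lambda me: abs(me[0] * 10 ** me[1] - value))
    return m, e, m * 10 ** e

largest_fixed = 999             # 3.0 fixed point: all three digits are whole digits
largest_float = 99 * 10 ** 9    # toy floating point under the assumption above

print(largest_fixed, largest_float)
print(best_float(125))          # (12, 1, 120)
```

Under this reading the floating point format reaches far larger values, but 125 cannot be represented exactly: 120 and 130 are equally close, while the 3.0 fixed point format stores 125 exactly.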


Mitigating Floating Point Disadvantages

• Only support a subset of the IEEE-754 standard
  – Could use software to off-load special cases
• Modify floating point format to support a smaller data type (e.g. 18-bit instead of 32-bit)
  – Link to Cornell class:
    • http://instruct1.cit.cornell.edu/courses/ece576/FloatingPoint/index.html
• Add hardware support in the FPGA for floating point
  – Hardcore multipliers: Added by companies in the early 2000s
  – Altera: Hard shared paths for floating point (Stratix-V, 2011)
    • "How to get 1-TFLOP throughput on FPGAs" article
      – http://www.eetimes.com/design/programmable-logic/4207687/How-to-achieve-1-trillion-floating-point-operations-per-second-in-an-FPGA
      – http://www.altera.com/education/webcasts/all/wc-2011-floating-point-dsp-fpga.html


Mitigating Fixed Point Disadvantages (21.4)

• Block Floating Point (mitigating range issue)
  – All data in a block share an exponent
  – Periodic rescaling
  – Makes use of fixed-point hardware
    • Useful in applications where data is processed in stages, and a limited value range can be placed on each stage (e.g. FFT)
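The shared-exponent idea can be sketched as follows. This is an illustrative model rather than any particular hardware scheme: the helper names `to_block` and `from_block`, the 8-bit signed mantissa width, and truncation toward zero are all assumptions made for the example.

```python
import math

def to_block(values, mantissa_bits=8):
    """Quantize a block of floats to signed integers sharing one exponent."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0] * len(values), 0
    # Pick the exponent so the largest magnitude fits in mantissa_bits - 1 bits.
    exponent = math.frexp(peak)[1] - (mantissa_bits - 1)
    scale = 2.0 ** exponent
    # int() truncates toward zero, so no mantissa can exceed the signed range.
    return [int(v / scale) for v in values], exponent

def from_block(mantissas, exponent):
    """Recover approximate real values from a block and its shared exponent."""
    return [m * 2.0 ** exponent for m in mantissas]

block, exp = to_block([3.875, 1.625, -0.5])
print(block, exp)                # [124, 52, -16] -5
print(from_block(block, exp))    # [3.875, 1.625, -0.5]
```

Within a block, arithmetic runs on the integer mantissas with ordinary fixed-point hardware. The "periodic rescaling" bullet corresponds to calling something like `to_block` again between stages, once results may have outgrown the current exponent.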


Next Lecture

• Review Mid-term
• EDK (XPS) tool overview


Questions/Comments/Concerns

• Write down
  – Main point of lecture
  – One thing that's still not quite clear
    OR
  – If everything is clear, then give an example of how to apply something from lecture


Lecture Notes

Altera App Notes on computing FLOPs for Stratix-III or Stratix-IV
Altera old app notes on floating point add/mult
Link to floating point single precision calculator
Block (fixed) floating point (build slide explanation example)
Number comparing CPU/FPGA/GPU floating point throughput
Pre-make showing some examples of fixed point advantage for:
- Representing the precision of a number
- Precision when adding a convoluted type of number, e.g. 1M.0001


Lecture Notes

Points:
Original 286, 386: No floating point HW
Next: Floating point coprocessor (on a separate chip)
Next: Floating point on same chip

Why carry ripple used over my advanced “high” performing generate-propagate adders (.1 for 4-LUTs vs .4ns for 1 LUT
