Using FPGAs to Supplement Ray-Tracing Computations on the Cray XD-1

Charles B. Cameron

United States Naval Academy
Department of Electrical Engineering
105 Maryland Avenue, Stop 14B
Annapolis, Maryland 21402-5025

Research supported by:
• NASA Goddard Space Flight Center (Code 586)
• NRL Applied Optics Branch (Code 5630)
• DoD High Performance Computing Modernization Program at NRL (Code 5593)
• United States Naval Academy
• Xilinx, Inc.

Topics

• Ray tracing

• Conventional parallel processing

• Modulo scheduling

• Coordination of sequential and parallel processing

• Expected Performance

Ray tracing

• MODIS (Moderate-resolution Imaging Spectroradiometer)
• The Intersection Problem
• Finding the Perpendicular
• Refraction
• Reflection
• Coordinate Transformation

MODIS Optical System (Moderate-resolution Imaging Spectroradiometer)

MODIS Optical System

• 485 pinholes
• 400 rays per pinhole
• 241 × 121 rays reflected from the diffuser
• 5.66 × 10⁹ rays in all (485 × 400 × 241 × 121)

Ray Directed to a Surface


Calculate the Intercept Point




Find the Normal




Find the Refracted Ray




Find the Reflected Ray




Coordinate Transformation

(Hard to visualize this!)






Parallelism

Performance (5.66 × 10⁹ rays)

Processor   DEC Alpha 3000 Series Model 800, 200 MHz   Cray XD-1 with 839 AMD Opteron 275 processors, 2.2 GHz
Duration    1.2 × 10⁶ s (two weeks)                    27 s
Rate        0.112 × 10⁶ rays · surfaces / s            6.6 × 10⁶ rays · surfaces / (s · processor) *

Reduction in time consumed: 99.998 %
Improvement in ray-tracing rate: 5,857 %

* Rate based on a linear regression of results obtained using varying numbers of processors.
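The ray count in the slide title and the reduction in time consumed can be checked with a few lines of Python. This is only a sketch; the variable names are ours, and the ray count is the product of the MODIS figures listed earlier (485 pinholes, 400 rays per pinhole, 241 × 121 rays from the diffuser).

```python
# Total rays: pinholes x rays per pinhole x (241 x 121) rays from the diffuser.
total_rays = 485 * 400 * 241 * 121

# Durations from the table, in seconds.
alpha_seconds = 1.2e6   # DEC Alpha 3000 Series Model 800
cray_seconds = 27.0     # Cray XD-1, 839 AMD Opteron 275 processors

# Fraction of the original run time that is no longer spent.
reduction = 1.0 - cray_seconds / alpha_seconds

print(f"{total_rays:,} rays")                    # 5,657,234,000 rays
print(f"reduction = {reduction * 100:.3f} %")    # reduction = 99.998 %
```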

[Figure: bar chart comparing the DEC Alpha 3000 Series Model 800 with the Opteron alone, vertical axis from 0.00 to 7.00]

Efficiency






Operations Required as a Function of Surface, Aperture, and Interaction Types

[Figure: bar chart of the number of operations (vertical axis, 0 to 60) for each of the 12 cases below]

            Circular aperture          Rectangular aperture
Surface     Refraction   Reflection    Refraction   Reflection
Plane       1            7             4            10
Sphere      2            8             5            11
Conicoid    3            9             6            12

Chart annotations: "Lots of these" and "Not too many of these".

Quadratic Equation

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

[Figure: data-flow graph of the quadratic formula, each operation labeled with its latency in cycles]

Critical path (data-flow limit): 88 cycles

Latency
Unit                    Cycles
Adder                   11
Multiplier              6
Divider                 27
Square-root extractor   27
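The 88-cycle data-flow limit follows from the latencies in this table if one root is computed as (−b + √(b² − 4ac)) / (2a), with 4ac formed as (4·a)·c and the negation of b treated as a free sign change. Those graph details are our assumption about the figure, not read from it. The sketch below finds the longest latency-weighted path through that hypothetical graph, which no number of extra units can shorten.

```python
# Unit latencies in cycles, from the latency table.
LATENCY = {"add": 11, "mul": 6, "div": 27, "sqrt": 27}

# Assumed data-flow graph for one root: node -> (unit, predecessor nodes).
GRAPH = {
    "b*b": ("mul", []),
    "4a": ("mul", []),
    "4ac": ("mul", ["4a"]),           # (4a) * c
    "2a": ("mul", []),
    "disc": ("add", ["b*b", "4ac"]),  # b^2 - 4ac
    "root": ("sqrt", ["disc"]),
    "num": ("add", ["root"]),         # -b + sqrt(disc); negating b is free
    "x": ("div", ["num", "2a"]),
}


def critical_path(graph, latency):
    """Length in cycles of the longest latency-weighted path (data-flow limit)."""
    finish = {}

    def finish_time(node):
        if node not in finish:
            unit, preds = graph[node]
            # An operation starts once all of its inputs are available.
            start = max((finish_time(p) for p in preds), default=0)
            finish[node] = start + latency[unit]
        return finish[node]

    return max(finish_time(node) for node in graph)


print(critical_path(GRAPH, LATENCY))  # 88
```

The path b·… → 4ac → b² − 4ac → √ → −b + √ → ÷ accounts for 6 + 6 + 11 + 27 + 11 + 27 = 88 cycles.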

Modulo Scheduling: One Multiplier

Equal to the data-flow limit

One collective computation

Modulo Scheduling: Filling the Pipeline

[Figure: three schedule diagrams, cycle number from 0 to 90 on the horizontal axis]

Multipliers are 100 % utilized

No schedule conflicts

Modulo Scheduling: Two Multipliers

Two multipliers with two multiplications each

Two cycles

One adder with two additions

Maximum efficiency

Improved efficiency: up from 25 %

Less than the data-flow limit, but double the throughput
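A minimal sketch of the resource bound behind these slides, assuming one computation needs four multiplications, two additions, one square root, and one division (the counts suggested by "two multipliers with two multiplications each" and "one adder with two additions") and that every unit is fully pipelined. The initiation interval, the number of cycles between successive computations, can be no smaller than the busiest unit allows. The conflict check uses a hand-written, hypothetical schedule whose start cycles respect the latency table (multiplier 6, adder 11, square root 27, divider 27).

```python
import math

# Operations per computation (an assumption, see the text above).
OPS = {"mul": 4, "add": 2, "sqrt": 1, "div": 1}


def min_initiation_interval(ops, units):
    """Resource-constrained lower bound on the initiation interval."""
    # A fully pipelined unit accepts one operation per cycle, so u copies
    # of a unit used n times per computation need at least ceil(n / u) cycles.
    return max(math.ceil(n / units[kind]) for kind, n in ops.items())


def utilization(ops, units, ii):
    """Fraction of issue slots each unit type uses at initiation interval ii."""
    return {kind: n / (units[kind] * ii) for kind, n in ops.items()}


def conflict_free(schedule, units, ii):
    """Modulo reservation table check: at most units[kind] ops per slot."""
    used = {}
    for kind, start in schedule.values():
        slot = (kind, start % ii)
        used[slot] = used.get(slot, 0) + 1
    return all(count <= units[kind] for (kind, _), count in used.items())


one_mul = {"mul": 1, "add": 1, "sqrt": 1, "div": 1}
two_mul = {"mul": 2, "add": 1, "sqrt": 1, "div": 1}

ii1 = min_initiation_interval(OPS, one_mul)  # 4
ii2 = min_initiation_interval(OPS, two_mul)  # 2
print(utilization(OPS, one_mul, ii1))  # multiplier 1.0, adder 0.5, sqrt and divider 0.25
print(utilization(OPS, two_mul, ii2))  # multipliers and adder 1.0, sqrt and divider 0.5

# Hypothetical start cycles for one root; the division ends at cycle 61 + 27 = 88.
schedule = {
    "4a": ("mul", 0), "b*b": ("mul", 1), "2a": ("mul", 3), "4ac": ("mul", 6),
    "disc": ("add", 12), "sqrt": ("sqrt", 23), "num": ("add", 50), "x": ("div", 61),
}
print(conflict_free(schedule, one_mul, ii1))  # True
```

Under these assumptions, halving the initiation interval from 4 to 2 cycles doubles throughput and lifts the square-root and divider utilization from 25 % to 50 %.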






Cray XD-1

• MPI (Message Passing Interface)
• Master node
  – Reads file
  – Distributes file
  – Collates results

[Figure: diagram of the machine's 220 nodes]
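How the master node divides the work is not spelled out on the slide. One common choice, assumed here, is to give each node a contiguous block of ray indices whose sizes differ by at most one. The helper name `partition` is hypothetical.

```python
def partition(n_rays, n_nodes):
    """Split ray indices 0..n_rays-1 into n_nodes contiguous half-open ranges."""
    base, extra = divmod(n_rays, n_nodes)
    ranges = []
    start = 0
    for node in range(n_nodes):
        # The first `extra` nodes take one additional ray each.
        size = base + (1 if node < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


print(partition(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

With MPI (for example through mpi4py), the master could hand one range to each rank with `comm.scatter(..., root=0)` and collate the per-rank results with `comm.gather(..., root=0)`.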

One Node of the Cray XD-1

• OpenMP (Open Multi-Processing)
• 144 of 220 nodes have a Xilinx Virtex-II Pro FPGA
• Opteron processors
  – Sequential program
  – Depth first
• FPGA
  – Pipelined hardware
  – Breadth first

[Figure: one node with four AMD Opteron processors (0 to 3), each running a ray-tracing (RT) thread, plus an FPGA thread that drives the FPGA]
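The two loop orders on this slide can be contrasted in a few lines. In this toy sketch a ray is reduced to a single number and each surface to a function, both purely illustrative. The point is that the Opteron order (depth first) and the FPGA order (breadth first) compute the same results, while the second feeds one surface's operations a stream of independent rays, which is what keeps a deep pipeline full.

```python
def depth_first(rays, surfaces):
    """Opteron order: carry each ray through every surface before the next ray."""
    results = []
    for ray in rays:
        for surface in surfaces:
            ray = surface(ray)
        results.append(ray)
    return results


def breadth_first(rays, surfaces):
    """FPGA order: stream the whole batch through one surface, then the next."""
    batch = list(rays)
    for surface in surfaces:
        # Every ray in the batch is independent, so a pipelined unit for this
        # surface could accept a new ray on every clock cycle.
        batch = [surface(ray) for ray in batch]
    return batch


# Toy stand-ins: a "ray" is one number and each "surface" a simple function.
surfaces = [lambda y: 2.0 * y + 1.0, lambda y: y - 3.0, lambda y: 0.5 * y]
rays = [0.0, 1.0, 2.5]
print(depth_first(rays, surfaces) == breadth_first(rays, surfaces))  # True
```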






Performance

Opteron alone        6.6 × 10⁶ rays · surfaces / (s · proc)   [measured]
FPGA alone           5.4 × 10⁶ rays · surfaces / (s · proc)   [estimated]   Reduction in speed = 20 %
Opteron with FPGA   12.0 × 10⁶ rays · surfaces / (s · proc)   [estimated]   Increase in speed = +80 %

Floating-point units use 11 % of the FPGA:
• 1 adder
• 1 multiplier
• 1 divider
• 1 square-root unit











[Figure: bar chart of rates (vertical axis 0.00 to 30.00) for Opteron alone (measured), FPGA alone (estimate), and Opteron with FPGA in two FPGA configurations (estimates, Note 1 and Note 2)]

Note 1: 1 adder, 1 multiplier, 1 divider, 1 square-root taker
Note 2: 3 adders, 4 multipliers, 1 divider, 1 square-root taker
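The estimated 12.0 × 10⁶ rate is the sum of the Opteron-alone and FPGA-alone rates (6.6 + 5.4). That is what one gets if, as assumed in this sketch, the Opteron and the FPGA trace disjoint sets of rays at the same time and the work is split in proportion to their rates so that both finish together. The function names are hypothetical.

```python
def split_work(total, cpu_rate, fpga_rate):
    """Divide `total` ray-surface interactions in proportion to the two rates."""
    fpga_share = total * fpga_rate / (cpu_rate + fpga_rate)
    return total - fpga_share, fpga_share


def combined_rate(total, cpu_rate, fpga_rate):
    """Effective rate when the Opteron and the FPGA work concurrently."""
    cpu_work, fpga_work = split_work(total, cpu_rate, fpga_rate)
    # Both devices run at once, so the elapsed time is the slower of the two.
    elapsed = max(cpu_work / cpu_rate, fpga_work / fpga_rate)
    return total / elapsed


cpu, fpga = 6.6e6, 5.4e6  # rays · surfaces / (s · proc), from the slide
rate = combined_rate(1.0e9, cpu, fpga)
print(f"{rate / 1e6:.1f} x 10^6")  # 12.0 x 10^6
print(f"speedup {rate / cpu:.2f}")  # speedup 1.82
```

A speedup of about 1.82 over the Opteron alone is what the slide rounds to +80 %.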

Summary

• Modulo scheduling produces 100 % efficiency of critical resources.

• Sequential processors get a boost from supplemental FPGA processing.

• Deep pipelines are efficient only if filled much of the time.

• FPGAs beat ASICs only if they can take advantage of special problem knowledge.

• Opteron uses 55 W.
• Virtex-II Pro FPGA uses 4 W to 45 W.

Equations

• Intersection of a Ray with a Plane

• Intersection of a Ray with a Sphere

• Intersection of a Ray with a Conicoid


• Interaction of a Ray with an Optical Surface

• Coordinate Transformations

Intersection of a Ray with a Plane

[Figure: equations for the intersection, with labels for the initial point, initial direction, normal to the plane, point in the plane, and final point]
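A minimal Python sketch of the ray and plane intersection, assuming the usual parametric form: the ray is P = P0 + t·d, the plane passes through the point Q with normal n, and the final point is P0 + t·d with t = n · (Q − P0) / (n · d). The symbols and function names are ours and may differ from those in the equations on the slide.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def intersect_plane(p0, d, q, n, eps=1e-12):
    """Final point where the ray p0 + t*d (t > 0) meets the plane through q with normal n."""
    denom = dot(n, d)
    if abs(denom) < eps:
        return None  # ray is parallel to the plane
    t = dot(n, tuple(qi - pi for qi, pi in zip(q, p0))) / denom
    if t <= eps:
        return None  # plane lies behind the initial point
    return tuple(pi + t * di for pi, di in zip(p0, d))


print(intersect_plane((1, 1, -2), (0, 1, 1), (0, 0, 0), (0, 0, 1)))  # (1.0, 3.0, 0.0)
```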

Intersection of a Ray with a Sphere

[Figure: equations for the intersection, with labels for the initial point, initial direction, and final point]
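For a sphere the intersection reduces to a quadratic in the ray parameter t, which is why the quadratic formula dominates the FPGA data flow in this talk. The sketch assumes the sphere is given by its center C and radius R (the slide's equations may use a vertex-and-curvature form instead) and takes the nearest intersection in front of the initial point.

```python
import math


def intersect_sphere(p0, d, center, radius):
    """Nearest point with t > 0 where p0 + t*d meets |P - center| = radius."""
    # Vector from the sphere's center to the ray's initial point.
    oc = tuple(p - q for p, q in zip(p0, center))
    # Coefficients of a*t^2 + b*t + c = 0.
    a = sum(v * v for v in d)
    b = 2.0 * sum(u * v for u, v in zip(d, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the sphere
    s = math.sqrt(disc)
    for t in sorted(((-b - s) / (2.0 * a), (-b + s) / (2.0 * a))):
        if t > 1e-12:
            return tuple(p + t * v for p, v in zip(p0, d))
    return None  # sphere lies entirely behind the initial point


print(intersect_sphere((0, 0, -5), (0, 0, 1), (0, 0, 0), 1.0))  # (0.0, 0.0, -1.0)
```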

Intersection of a Ray with a Conicoid

[Figure: equations for the intersection, with labels for the initial point, initial direction, and final point]
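A sketch of the conicoid case, assuming the common vertex form c(x² + y² + (1 + k)z²) − 2z = 0, with vertex curvature c and conic constant k. This form is equivalent to the sag z = c r² / (1 + √(1 − (1 + k)c² r²)); k = 0 gives a sphere, k = −1 a paraboloid, and c = 0 a plane. The slide's own parameterization may differ. Substituting the ray again gives a quadratic in t, which degenerates to a linear equation when its leading coefficient vanishes.

```python
import math


def intersect_conicoid(p0, d, c, k, eps=1e-12):
    """First point with t > 0 where p0 + t*d meets c*(x^2 + y^2 + (1+k)*z^2) - 2*z = 0."""
    x0, y0, z0 = p0
    dx, dy, dz = d
    kk = 1.0 + k
    # Coefficients of a*t^2 + b*t + cc = 0.
    a = c * (dx * dx + dy * dy + kk * dz * dz)
    b = 2.0 * c * (x0 * dx + y0 * dy + kk * z0 * dz) - 2.0 * dz
    cc = c * (x0 * x0 + y0 * y0 + kk * z0 * z0) - 2.0 * z0
    if abs(a) < eps:
        # Degenerate case (e.g. a plane, c = 0): the equation is linear in t.
        if abs(b) < eps:
            return None
        roots = [-cc / b]
    else:
        disc = b * b - 4.0 * a * cc
        if disc < 0.0:
            return None  # the ray misses the surface
        s = math.sqrt(disc)
        roots = sorted(((-b - s) / (2.0 * a), (-b + s) / (2.0 * a)))
    ahead = [t for t in roots if t > eps]
    if not ahead:
        return None
    t = ahead[0]
    return (x0 + t * dx, y0 + t * dy, z0 + t * dz)


p = intersect_conicoid((0.5, 0.0, -1.0), (0.0, 0.0, 1.0), 0.5, 0.0)
sag = 0.5 * 0.25 / (1.0 + math.sqrt(1.0 - 0.5 ** 2 * 0.25))
print(p[2], sag)  # agree to within rounding
```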

Finding the Perpendicular

• Unit vector normal to a sphere
• Unit vector normal to a conicoid

[Figure: equations for the unit normal vectors]
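A unit normal can be taken from the gradient of the surface's implicit equation. This sketch assumes the vertex form c(x² + y² + (1 + k)z²) − 2z = 0 with curvature c and conic constant k, and keeps the gradient's sign (pointing toward −z at the vertex when c > 0); the slide's sign convention is not known. For k = 0 the surface is a sphere of radius 1/c centered at (0, 0, 1/c), and the result reduces to (P − C)/R.

```python
import math


def conicoid_normal(p, c, k):
    """Unit normal at point p on c*(x^2 + y^2 + (1 + k)*z^2) - 2*z = 0.

    The sign follows the gradient of the left-hand side; negate the
    result if the opposite orientation is wanted.
    """
    x, y, z = p
    # Gradient of the implicit function, component by component.
    g = (2.0 * c * x, 2.0 * c * y, 2.0 * c * (1.0 + k) * z - 2.0)
    length = math.sqrt(sum(v * v for v in g))
    return tuple(v / length for v in g)


print(conicoid_normal((0.0, 0.0, 0.0), 0.5, 0.0))  # (0.0, 0.0, -1.0)
```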

Interaction of a Ray with an Optical Surface

Refraction and reflection

[Figure: refraction and reflection equations, with labels for the initial index of refraction, final index of refraction, normal to the plane, initial direction, and final direction]
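A sketch of both interactions using the standard vector forms: the law of reflection, r = d − 2(d · n)n, and Snell's law written for unit direction vectors. The names are ours, the slide's equations may be arranged differently, and total internal reflection is signaled by returning None.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reflect(d, n):
    """Reflected direction for unit initial direction d and unit normal n."""
    k = 2.0 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))


def refract(d, n, n1, n2):
    """Refracted unit direction, or None on total internal reflection.

    d: unit initial direction; n: unit surface normal;
    n1, n2: initial and final indices of refraction.
    """
    cos_i = -dot(n, d)
    if cos_i < 0.0:
        # Orient the normal against the incoming ray.
        n = tuple(-ni for ni in n)
        cos_i = -cos_i
    mu = n1 / n2
    k = 1.0 - mu * mu * (1.0 - cos_i * cos_i)
    if k < 0.0:
        return None  # total internal reflection
    coef = mu * cos_i - math.sqrt(k)
    return tuple(mu * di + coef * ni for di, ni in zip(d, n))


s = math.sqrt(0.5)  # 45 degree incidence
t = refract((s, 0.0, s), (0.0, 0.0, -1.0), 1.0, 1.5)
print(t[0], s / 1.5)  # Snell: sin(theta_t) = sin(theta_i) / 1.5
print(reflect((s, 0.0, s), (0.0, 0.0, -1.0)))  # mirror image (s, 0.0, -s)
```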

Coordinate Transformations

Rotation and translation (positions); rotation only (directions)

[Figure: transformation equations, with labels for the translation vector, rotation matrix, direction in frame of reference k and k+1, and position in frame of reference k and k+1]
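A sketch of the change from frame of reference k to frame k+1, assuming the convention P_{k+1} = R (P_k − T) for positions and D_{k+1} = R D_k for directions, where R is the rotation matrix and T the translation vector. The slide may apply translation and rotation in a different order. The key point it illustrates is that directions are only rotated, never translated. The helper names are hypothetical.

```python
import math


def rotation_z(theta):
    """Rotation matrix (row-major) for a rotation by theta about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def mat_vec(m, v):
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)


def to_next_frame(position, direction, rotation, translation):
    """Map a ray from frame k to frame k+1 (assumed convention, see text)."""
    # Positions are translated and then rotated.
    shifted = tuple(p - t for p, t in zip(position, translation))
    new_position = mat_vec(rotation, shifted)
    # Directions are free vectors: rotation only, no translation.
    new_direction = mat_vec(rotation, direction)
    return new_position, new_direction


p, d = to_next_frame((1.0, 2.0, 4.0), (1.0, 0.0, 0.0), rotation_z(math.pi / 2), (1.0, 2.0, 3.0))
print(p, d)  # approximately (0, 0, 1) and (0, 1, 0)
```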
