Graphics on Key by Eyal Sarfati and Eran Gilat. Supervised by Prof. Shmuel Wimer, Amnon Stanislavsky and Mike Sumszyk


Post on 27-Dec-2015



Graphics on Key
by Eyal Sarfati and Eran Gilat

Supervised by Prof. Shmuel Wimer, Amnon Stanislavsky and Mike Sumszyk

VLSI lab

Overview
Motivation
Algorithm Improvements
Software simulation
GPU VLSI Design
GoK system design
Challenges and contributions
Summary
Demo

Motivation
GPU (Graphics Processing Unit) is the key to high performance in graphics applications (games, flight simulations, virtual worlds, etc.)
Mobile systems (e.g. cellphones, handheld devices) lack a suitable GPU

GoK
An external GPU with a standard interface can significantly enhance the graphics performance of systems with limited computing resources

Project Goal
Develop a low-cost prototype that performs 3D animation and displays it on a 2D RGB screen.

[Diagram: Host connected to the GoK over USB; GoK output over VGA]
Standard interface for data input/output
Provides real-time graphics processing to systems with limited computing resources

Project Stages
Software Design:
  Implementing algorithm in Matlab
  Simulation and analysis
  Adaptation of algorithm to hardware
ASIC Design:
  Architectural design
  Implementation in VHDL
  Synthesis and layout
System Design:
  Implementation of system blocks including SW and HW interfaces
  System integration
  System performance enhancement

Graphic Animation
Elementary operations:
  Translation
  Rotation
  Scaling

3D Data Representation
Series of triangles
Each triangle is represented by:
  3 vertices
  3 RGB vectors
  1 normal vector

Rendering Algorithm stages [Wimer]
Elementary transformations
Four transformations are executed for every triangle:
  Three matrix multiplications for vertex co-ordinates
  One matrix multiplication for normal vector
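A minimal sketch of the per-triangle transformation step, assuming 4x4 homogeneous matrices (the slides do not state the matrix form). `Triangle`, `mat_vec`, `transform` and the matrix builders are illustrative names, not taken from the project. The normal is multiplied with w = 0 so that translation does not affect it, which is only valid for rotation and uniform scaling.

```python
import math
from dataclasses import dataclass

@dataclass
class Triangle:
    vertices: tuple  # 3 vertices, each (x, y, z)
    colors: tuple    # 3 RGB vectors, one per vertex
    normal: tuple    # 1 normal vector (x, y, z)

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def scaling(s):
    # Uniform scaling only (assumption), so the normal direction stays valid.
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]

def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def mat_mul(a, b):
    # 4x4 matrix product, used to combine elementary operations.
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def mat_vec(m, v, w):
    # Multiply a 4x4 matrix by (x, y, z, w) and keep the x, y, z result.
    h = (v[0], v[1], v[2], w)
    return tuple(sum(m[r][c] * h[c] for c in range(4)) for r in range(3))

def transform(tri, m):
    # Three multiplications for the vertices (w = 1) ...
    vertices = tuple(mat_vec(m, v, 1.0) for v in tri.vertices)
    # ... and one for the normal (w = 0, so translation is ignored).
    normal = mat_vec(m, tri.normal, 0.0)
    return Triangle(vertices, tri.colors, normal)
```

Combining operations with `mat_mul` first means each triangle still costs four matrix-vector products per frame, however many elementary operations the animation step contains.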


Projection of triangles on viewing plane
Composed of 2 stages:
  Transformation from 3D to 2D (projection)
  Transformation from real co-ordinates to screen co-ordinates
Determine potential triangle visibility
  Hidden triangles are discarded on the basis of their normal direction
  This detection reduces the processed data by 50%
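The two projection stages and the normal-based visibility test can be sketched as follows. The slides do not say which projection is used, so this assumes a simple perspective projection with the eye at the origin looking along -z; `project`, `to_screen`, `is_potentially_visible`, the viewing-plane distance `d` and the [-1, 1] normalized range are all assumptions made for illustration.

```python
def is_potentially_visible(normal, view_dir=(0.0, 0.0, -1.0)):
    # A triangle can be visible only if its normal points against the
    # viewing direction (negative dot product). Others are discarded early.
    return sum(n * d for n, d in zip(normal, view_dir)) < 0.0

def project(vertex, d=1.0):
    # Stage 1: 3D to 2D. Perspective projection onto the plane z = -d
    # (assumed camera model; requires z < 0, i.e. in front of the eye).
    x, y, z = vertex
    return (d * x / -z, d * y / -z)

def to_screen(point, width, height):
    # Stage 2: real co-ordinates in [-1, 1] to pixel co-ordinates,
    # with the y axis flipped so that row 0 is the top of the screen.
    x, y = point
    return ((x + 1.0) / 2.0 * (width - 1), (1.0 - y) / 2.0 * (height - 1))
```

For a closed object roughly half of the triangles face away from the viewer, which matches the 50% reduction quoted on the slide.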

Algorithm Details
Determine projected triangles visibility
  Scan all points and compare their depth with the depth of previously saved points
  Scan in 3D space using inverse transformation
Color of visible points
  Compute pixel color from the RGB vector and the current lighting vector
  Use the mathematical average for all the pixels inside a triangle rather than linear interpolation
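A minimal sketch of the averaged color computation. The slide only says the color comes from the RGB vector and the lighting vector; the Lambert-style dot product used here to combine them is an assumption, and `flat_color` is a hypothetical name.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def flat_color(vertex_rgbs, normal, light_dir):
    # Average the three vertex RGB vectors once per triangle instead of
    # interpolating a separate color for every pixel.
    avg = tuple(sum(rgb[i] for rgb in vertex_rgbs) / 3.0 for i in range(3))
    # Assumed lighting model: intensity is the cosine of the angle between
    # the normal and the direction toward the light, clamped at zero.
    n, l = normalize(normal), normalize(light_dir)
    intensity = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(round(c * intensity) for c in avg)
```

Every pixel of the triangle then receives the same value, so the rasterization stage only has to write a constant.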

To increase efficiency:
  Split triangles
  Increase parallelism

MATLAB Simulation
Matlab implementation of rendering algorithm [Wimer]
Run time on ARM-based processor: 16 seconds
Run time on Matlab-based software: 1 hour

System Overview
[Diagram: GoK concept and prototype; the Host connects to the GoK over USB and the GoK outputs VGA]

GPU Architecture Design Principles
Design Goal: maximize throughput

Use parallel architecture to overcome bottlenecks

Minimize expensive memory accesses

Optimize accuracy for fast calculations


GPU Architecture
[Block diagram: Prefetch & Visibility Detection Unit, 3D Transformation Unit, Triangle pre-processor, FIFO task queue, Scheduler Unit, Rasterization units 0, 1, ... 10, Z-Buffer Arbiter Snooping Cache, RGB Arbiter Snooping Cache; memories: Triangles, RGB Frame, Z-Buffer]

Transformation and Pre-processor
3D Transformation Unit and Triangle pre-processor, steps shown in the diagram:
  Vertex / Normal Transform
  Project Triangle
  RGB Color Set
  Sort coordinates according to y axis
  Triangle slopes calculation
  Create 2 half triangles
  D calculation
  -1 / C
  FIFO
Note: Early elimination of invisible triangles reduces load by 50%!
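The sort, slope and split steps can be sketched as below, assuming 2D screen points given as (x, y) tuples. `split_triangle` and `inverse_slope` are hypothetical names, and the dictionary layout of a half triangle is only for illustration; the slides do not describe the hardware data format.

```python
def inverse_slope(a, b):
    # Change in x per unit step in y along the edge a -> b (a[1] != b[1]).
    return (b[0] - a[0]) / (b[1] - a[1])

def split_triangle(p0, p1, p2):
    # Sort the vertices according to the y axis.
    top, mid, bot = sorted((p0, p1, p2), key=lambda p: p[1])
    if top[1] == bot[1]:
        return []  # degenerate: all three points on one scan line
    # Cut the long edge top -> bot at the height of the middle vertex.
    t = (mid[1] - top[1]) / (bot[1] - top[1])
    cut = (top[0] + t * (bot[0] - top[0]), mid[1])
    halves = []
    if mid[1] > top[1]:   # half triangle above the cut line
        halves.append({"apex": top, "base": (mid, cut)})
    if bot[1] > mid[1]:   # half triangle below the cut line
        halves.append({"apex": bot, "base": (mid, cut)})
    return halves
```

Each half triangle has one horizontal edge, so a rasterization unit only needs two edge slopes and can walk scan lines from the apex to the base without any per-line case analysis.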


FIFO Task Queue
Stalls the input stream to prevent overflow by means of a backward communication protocol
The backward communication propagates to the Prefetch and Visibility Detection Unit
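A software analogue of the stall behaviour, as a rough sketch: a bounded queue that raises a backward `stall` flag when full, so the producer holds its data instead of overflowing the queue. `TaskFifo` and its capacity are hypothetical; the actual queue depth and handshake signals are not given in the slides.

```python
from collections import deque

class TaskFifo:
    def __init__(self, capacity):
        self.capacity = capacity  # hypothetical depth
        self.items = deque()

    @property
    def stall(self):
        # Backward signal seen by the upstream units: asserted while full.
        return len(self.items) >= self.capacity

    def push(self, task):
        if self.stall:
            return False  # producer must keep the task and retry later
        self.items.append(task)
        return True

    def pop(self):
        # Consumer side (the scheduler); returns None when empty.
        return self.items.popleft() if self.items else None
```

Because a rejected `push` loses nothing, the stall can be passed further upstream until it reaches the unit that fetches triangles from memory.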

Scheduler Unit
Target: maximize throughput
Minimize idle time of rasterization units
Immediately issue the next half triangle for processing upon completion of the previous one
[Diagram: Triangle pre-processor, FIFO task queue, Scheduler Unit, Rasterization units 0, 1, ... 10]
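The dispatch policy can be modelled in a few lines, assuming each half triangle has a known processing time in cycles. `schedule` is a hypothetical helper that simulates the policy in software; it is not a description of the hardware scheduler.

```python
import heapq

def schedule(task_cycles, num_units):
    # Each heap entry is (cycle at which the unit becomes idle, unit id).
    units = [(0, u) for u in range(num_units)]
    heapq.heapify(units)
    assignments = []
    for task_id, cycles in enumerate(task_cycles):
        # Issue the next half triangle to the unit that is idle first.
        free_at, unit = heapq.heappop(units)
        assignments.append((task_id, unit, free_at))
        heapq.heappush(units, (free_at + cycles, unit))
    makespan = max(t for t, _ in units)  # cycle when the last unit finishes
    return assignments, makespan
```

For example, `schedule([5, 3, 2, 4], 2)` issues the third task to unit 1 at cycle 3, as soon as that unit finishes, rather than waiting for unit 0.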


Rasterization Units
For each point of each half triangle:
  Calculate the new Z value
  Read the stored Z value and compare it with the calculated one
  Update both the Z-Buffer and RGB Frame Buffer accordingly
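A minimal sketch of the per-point loop. It assumes that the new Z value is obtained from the triangle's plane equation A*x + B*y + C*z + D = 0 and that a smaller z means closer to the viewer; neither detail is confirmed by the slides. `make_buffers`, `plane_depth` and `rasterize_points` are illustrative names.

```python
def make_buffers(width, height, background=(0, 0, 0)):
    z_buffer = [[float("inf")] * width for _ in range(height)]
    rgb_frame = [[background] * width for _ in range(height)]
    return z_buffer, rgb_frame

def plane_depth(normal, vertex, x, y):
    # Solve A*x + B*y + C*z + D = 0 for z, with (A, B, C) the normal and
    # D fixed by one vertex of the triangle. Requires C != 0.
    a, b, c = normal
    d = -(a * vertex[0] + b * vertex[1] + c * vertex[2])
    return -(a * x + b * y + d) / c

def rasterize_points(points, normal, vertex, color, z_buffer, rgb_frame):
    written = 0
    for x, y in points:
        z_new = plane_depth(normal, vertex, x, y)  # calculate the new Z
        if z_new < z_buffer[y][x]:                 # compare with stored Z
            z_buffer[y][x] = z_new                 # update both buffers
            rgb_frame[y][x] = color
            written += 1
    return written
```

Here `points` stands for the integer pixel positions covered by one half triangle, which the real unit would generate by walking its scan lines.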

Multi Core Architecture Problem
Multi core architecture with shared memory must cope with:
  Efficient management of multiple requests to the shared memory
  Guaranteeing data coherency
[Diagram: Rasterization units 0, 1, ... 10 sharing the RGB Frame and Z-Buffer memories]
Solution: Arbiter Snooping Multi Cache
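A toy model of the two ingredients, an arbiter and snooping caches, may help make the idea concrete. The policies used here (round-robin arbitration, write-through caches, invalidate-on-snoop) are assumptions chosen for simplicity; the slides do not say which policies the design uses, and `SnoopingCache` and `RoundRobinArbiter` are hypothetical names.

```python
class SnoopingCache:
    def __init__(self, memory, bus):
        self.memory = memory  # shared memory, modelled as a list
        self.lines = {}       # address -> locally cached value
        self.bus = bus        # list of all caches attached to the memory
        bus.append(self)

    def read(self, addr):
        if addr not in self.lines:          # miss: fetch from shared memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value           # write-through (assumption)
        for peer in self.bus:               # other caches snoop the write
            if peer is not self:
                peer.snoop(addr)

    def snoop(self, addr):
        self.lines.pop(addr, None)          # drop the stale copy, if any

class RoundRobinArbiter:
    def __init__(self, num_units):
        self.num_units = num_units
        self.last = num_units - 1

    def grant(self, requests):
        # Grant one requesting unit per call, starting after the last winner.
        for offset in range(1, self.num_units + 1):
            unit = (self.last + offset) % self.num_units
            if unit in requests:
                self.last = unit
                return unit
        return None
```

The arbiter serializes simultaneous requests to the shared memory, while snooping keeps a unit from reading a Z or RGB value that another unit has already overwritten.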

Arbiter Snooping Multi Cache (ASMC)
Reduce memory access time
  Cache memory
Simultaneous multiple memory access requests
  Arbiter for efficient memory access management
Data Coherency
  Add Snooping mechanism to cache to guarantee data coherency
Deadlock
  Using Snooping mechanism
  Using Watchdog mechanism
[Diagram: Rasterization units 0, 1, ... 10, Snooping Multi-Cache, Arbiter, Shared Memory]

GPU ASIC Implementation

Technology: 65 nm CMOS, 8LM
Clock frequency: 300 MHz
Core area: 2.25 mm²
Power consumption: approx. 130 mW @ 300 MHz
USB Host can supply up to 400 mW

GoK System Requirements
Input:
The data is sent by the host to the GoK in two stages:
  Initialization: a list of triangles is sent to the GoK
  Animation: a transformation for all triangles is sent to the GoK every 40 msec (25 FPS)
Output:
Real-time object animation at:
  160x120 pixels resolution
  120,000 triangles/sec
  25 frames/sec
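The per-frame budget implied by these requirements follows from simple arithmetic, shown here as a short calculation. The variable names are illustrative; the input figures are the ones stated in the requirements.

```python
TRIANGLES_PER_SEC = 120_000
FRAMES_PER_SEC = 25
WIDTH, HEIGHT = 160, 120

frame_period_ms = 1000 / FRAMES_PER_SEC                     # 40.0 ms
triangles_per_frame = TRIANGLES_PER_SEC // FRAMES_PER_SEC   # 4800
pixels_per_frame = WIDTH * HEIGHT                           # 19200
pixels_per_sec = pixels_per_frame * FRAMES_PER_SEC          # 480000
```

So each 40 ms animation step has to transform and render up to 4,800 triangles into a 19,200-pixel frame.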

System Overview - SoPC
[Diagram labels: Host, USB, FPGA, System Controller, Communication Bus, USB Controller, Memory Controller, VGA Controller, Processor, ASMC, GPU]

Summary

Challenges
Matlab implementation and simulation for detailed investigation and evaluation of algorithm
VLSI design and implementation of an efficient architecture (with maximum parallelism) for GPU algorithm
Real-time embedded system design on FPGA
  NIOS II, USB1.1, DDR2, VGA, Avalon Bus, Software drivers & code
GPU integration in the system
  Modification of USB1.1 driver for acceptable reliability of data transfer
  Modification of standard VGA interface core to enable 100 MHz GPU core to interface with 50 MHz VGA unit

Main Contributions
Enhancement of algorithm for increased performance
  Early elimination of invisible triangles - 50% computation reduction
  Splitting of triangles to reduce computation complexity and increase parallelism
  Simplification of pixel color computation
Pre-process the triangles data for fast rasterization computation
Efficient scheduling of half triangles to rasterization units
Design and implementation of arbiter snooping multi cache
  Shared memory management, cache memory, data coherency
Double memory buffer for continuous motion of animation

The Bottom Line
Implementation of a Graphics on Key that enhances the graphic performance of low power, low cost gadgets
The device performs the required computations and displays the animation on screen
Project required specifications: 120,000 triangles/sec @ 160x120 resolution
Achieved performance: 1,000,000 triangles/sec @ 640x480 resolution, approx. 25 mW @ 50 MHz

Demonstration