fast and accurate w avelet radiosity computations using

14
Fast and Accurate Wavelet Radiosity Computations Using High–End Platforms Laurent Alonso 1 , Xavier Cavin 2 , Jean–Christophe Ulysse 2 and Jean–Claude Paul 1 ISA research team LORIA 3 Campus Scientifique, BP 239 F–54506 Vandoeuvre-l` es-Nancy CEDEX, France Abstract. In this paper, we show how to fully exploit the capabilities of high– end SGI graphics and parallel machines to perform radiosity computations on scenes made of complex shapes both quickly and accurately. Overlapping multi– processing and multi–pipeline graphics accelerations on one hand, and incor- porating recent research works on wavelet radiosity on the other hand, allows radiosity to become a practical tool for interactive design. 1 Introduction Since radiosity methods are becoming more frequently considered for a new wide range of applications (design, computer animations, virtual reality), efficient and accurate algorithms have to be found to provide a direct rendering of scenes made of complex shapes. Scenes created with modeling software typically include parametric surfaces — such as NURBS, cylinders, spheres —, arbitrary planar primitives, and their CSG combinations. Figures 1, 4, 5 and 7 illustrate several examples of such scenes, modeled in different representations. Up–to–now, many techniques have been proposed to solve the famous “radiosity equation” for such surfaces, but unfortunately none of them appear to have found the right compromise between speed and accuracy. For instance, the clustering ap- proach [14, 22, 24] allows very fast computations, but of a limited accuracy. On the contrary, wavelet radiosity [15], which is a generalization of the hierarchical radiosity method [16], provides a better approximation of the radiosity function, but at a non negligible computational cost. Attempts to parallelize hierarchical radiosity have been undertaken [9], but they have mostly proven the difficulty of this task. In this paper, we fully exploit the capabilities of high–end SGI graphics and parallel machines to illuminate scenes made of complex shapes both quickly and accurately (see fig. 1), two goals that were considered incompatible with previous methods. Moreover, thanks to the incorporation of recent research works on wavelet radiosity that allow complex surfaces to be illuminated as if they were simple primitives [1, 12, 18], our parallel algorithm results in a better approximation of the radiosity function with a faster convergence and lower memory costs. 1 INRIA Lorraine. 2 Institut National Polytechnique de Lorraine. 3 UMR n o 7503 LORIA, a joint research laboratory between CNRS, Institut National Polytechnique de Lorraine, INRIA, Universit´ e Henri Poincar´ e and Universit´ e Nancy 2.

Upload: others

Post on 29-Jan-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Fastand Accurate WaveletRadiosityComputationsUsing High–End Platforms

LaurentAlonso1, Xavier Cavin2, Jean–ChristopheUlysse2 andJean–ClaudePaul1

ISA researchteamLORIA3

CampusScientifique,BP239F–54506Vandœuvre-les-Nancy CEDEX, France� � � � � � � � � � � � � � � � � � �

Abstract. In this paper, we show how to fully exploit thecapabilitiesof high–end SGI graphicsand parallelmachinesto perform radiosity computationsonscenesmadeof complex shapesbothquickly andaccurately. Overlappingmulti–processingand multi–pipelinegraphicsaccelerationson one hand,and incor-poratingrecentresearchworks on wavelet radiosity on the other hand,allowsradiosityto becomea practicaltool for interactive design.

1 Intr oduction

Sinceradiositymethodsarebecomingmorefrequentlyconsideredfor anew widerangeof applications(design,computeranimations,virtual reality), efficient and accuratealgorithmshave to be found to provide a direct renderingof scenesmadeof complexshapes.Scenescreatedwith modelingsoftwaretypically includeparametricsurfaces— suchasNURBS,cylinders,spheres—, arbitraryplanarprimitives,andtheir CSGcombinations.Figures1, 4, 5 and7 illustrateseveralexamplesof suchscenes,modeledin differentrepresentations.

Up–to–now, many techniqueshave beenproposedto solve the famous“radiosityequation” for such surfaces,but unfortunatelynone of them appearto have foundthe right compromisebetweenspeedand accuracy. For instance,the clusteringap-proach[14, 22, 24] allows very fastcomputations,but of a limited accuracy. On thecontrary, wavelet radiosity[15], which is a generalizationof thehierarchicalradiositymethod[16], providesa betterapproximationof the radiosity function, but at a nonnegligible computationalcost.Attemptsto parallelizehierarchicalradiosityhave beenundertaken[9], but they havemostlyproventhedifficulty of this task.

In thispaper, wefully exploit thecapabilitiesof high–endSGIgraphicsandparallelmachinesto illuminatescenesmadeof complex shapesbothquickly andaccurately(seefig. 1), two goalsthatwereconsideredincompatiblewith previousmethods.Moreover,thanksto the incorporationof recentresearchworks on wavelet radiosity that allowcomplex surfacesto be illuminatedas if they weresimpleprimitives[1, 12, 18], ourparallel algorithm resultsin a betterapproximationof the radiosity function with afasterconvergenceandlowermemorycosts.

1INRIA Lorraine.2Institut NationalPolytechniquedeLorraine.3UMR no 7503LORIA, a joint researchlaboratorybetweenCNRS,Institut NationalPolytechniquede

Lorraine,INRIA, Universite Henri Poincare andUniversite Nancy 2.

Fig. 1. The virtual geometryroom in the 4th floor of the SodaHall hasbeenilluminatedwithour high–orderwaveletparallelradiosityalgorithmusingrecentstate–of–the–artradiositytech-niques. Renderingthis complex scene— modeledwith quadricsurfacepatches(including thechessset)— with a similar speedandaccuracy would have beenimpossiblewith previous ra-diosity methods,sincethe numberof initial patcheswould have beenintractable. ModeledinSGDL.By courtesyof SGDL Systems.

Our paperis organizedas follows: in section2, we review previous work on re-lated radiosity techniques.Section3 detailsthe latest improvementsof our parallelwaveletradiosityalgorithm[7, 8] thathave led to its currentimplementation.Section4is devotedto theexperimentswe have conductedwith our applicationon differenttestscenes,including a full casestudywith the virtual SGI Reality CenterTM . Finally, insection5, wedraw someconclusions.

2 Previous work

Radiositytechniqueshave interestingpropertiesthatmake themusefulfor new appli-cationsthatrequirehighly realisticvirtual environments.Thesecoverarchitecturalandindustrialdesign,but alsotheentertainmentindustry(asshown by Siggraph’99panel:“Get Real! GlobalIllumination for Film, Broadcast,andGameProduction”).Then,ra-diosityalgorithmshaveto provideefficientandaccuratesolutionsfor thekind of scenesthesepotentialusersgenerallymanipulate.

ScenesconstructedusingCAD tools— suchasCATIA R

or ARC+TM — or moregeneralpurposemodelers— suchasMayaTM , 3D StudioMAX R

or SGDL — fre-

quentlycontaincomplex planarsurfaces,with curvedboundariesor holesin them(seefig. 4(a)),and/orcurvedandparametricsurfaces(seefig. 1 andfig. 7). Moreover, in thefuture,surfacesextractedby 3D scanningsystemswill routinelybetransformedinto asetof parametricsurfaces.

We analyzein section2.1thetwo maingeneralapproachesthatwould traditionallybeapplied:clusteringandwaveletradiosity. Sincenoneof themappearto find therightcompromisebetweenspeedandaccuracy, section2.2briefly introducesrecentresearchadvanceson which theremainingof thispaperwill bebased.

2.1 Classicalapproaches

Probablythemostwidely usedapproachto dealwith complex surfacesin radiosityisto tessellatetheminto simpleplanarsurfaces(only little researchhasbeendevotedto

arbitraryplanarpolygons[3, 4] andparametricsurfaces[10, 21, 25, 29]). This methodhasseveralnegativeconsequenceson theradiosityalgorithm:

� It increasesthenumberof input surfacesthealgorithmmusthandle.� The tessellationis madeasan artificial pre–treatmentbeforethe radiositycom-putations,influencingthesecomputationsandcreatingarbitrarydiscontinuities.It canpreventusfrom reachingagoodillumination solution.� It doesnotallow ahierarchicalapproachfor theradiosityfunctionontheoriginalsurface,muchlesshigherorderwavelets.� Thetessellationcancreatepoorlyshapedtriangles(seefig. 4(b)),whichcancauseZ–buffer artifactsat thevisualizationstage,andareharderto detectin visibilitytests.

Clustering. Obviously, someof the tessellationdrawbacks,suchas the numberofinitial surfaces,canbe addressedusingclustering[14, 22, 24]. In clustering,neigh-boringpatchesaregroupedtogether, into a cluster. Theclusterreceivesradiosityanddispatchesit to thepatchesit contains.Unfortunately, increasingthenumberof initialsurfaces— for a betterapproximationof original surfaces— makesit harderfor theclusteringapproachesto computeanoptimalclusterhierarchy[17]. Also, tessellatingtheoriginal surface,thenclusteringthepieceshasremovedtheconnectionto theorigi-nal surface,resultingin numerousapproximations.It wouldprobablybemoreefficientto applyclusteringto theoriginal surfacesratherthanto theresultof their tessellation.

A bettergroupingstrategy is face–clustering[27]. In face–clustering,neighboringpatchesaregroupedtogetheraccordingto their coplanarity. Yet, evenface–clusteringdependson thegeometrycreatedby the tessellation,andit doesnot remove theprob-lemsdueto thetessellationapproximations,suchasthecalculationof theexactspatialpositionof thepointsor their normals.

Wavelet radiosity. Clusteringmethodscanbe consideredwhenonewantsto obtaina fast,but approximate,renderingof radiositysimulations. Whenphysicalaccuracyis the objective, oneof the mostviable alternative is wavelet radiosity. The waveletradiositymethodwasintroducedby [15]. It is anextensionof thehierarchicalradiositymethod[16] thatallows theuseof higherorderbasisfunctions.

In theory, higherorder waveletsareproviding a morecompactrepresentationofcomplex functions.Hence,they uselessmemoryandgiveasmootherrepresentationoftheradiosityfunction,thatlooksbetterondisplay. However, in apreviousexperimentalstudy[26], thepracticalproblemsof higherorderwaveletswerelargely negatingtheirtheoreticalbenefits,thusprohibitingany rapidcomputation.

Moreover, the algorithmic complexity of the hierarchicalradiosity algorithm isquadraticwith respectto thenumberof initial surfaces,makingit not particularlywelladaptedto highly tessellatedsurfaces.

Theparallelizationof thehierarchicalradiosityalgorithmhasbeenexploredin sev-eralstudies[7, 8, 9, 13, 20, 23] in orderto overcomethelimitationsdueto its highcom-putationalcost.All theauthorshave convergedto theconclusionthatits non–uniform,dynamicallychangingcharacteristics,andits needfor long–rangecommunicationmadeit lesssuitablefor effectiveparallelization.

2.2 A novel approach

Sinceclusteringmethodsallow fastbut approximatecomputations,while wavelet ra-diosity algorithms,eventheparallelones,aremoreprecisebut slower, shouldwe giveup andconcludethatradiosityis eithera solvedproblemor aninsolubleone?

Actually, recentinnovative researchworks on wavelet radiosity have shown thathighly accuratesimulationscouldbeperformedusinghigherorderwavelets[12], bothon arbitraryplanarsurfaces,thanksto the ExtendedDomainAlgorithm [18], andonparametricsurfaces,using ConstantJacobianMappings[1]. Thesenew techniquescompletelyreplacetheneedfor tessellatingtheoriginalsurfaces.By exactlyintegratingthesesurfacesasa singleentity in theresolutionprocess— thusreducingthealgorith-mic complexity of thewaveletradiosityalgorithm—, complex scenescanberenderedmorequickly, moreaccuratelyandmuchmorenaturally thanwith previously knownmethods.

We presentin the remainderof this paperhow we can advantageouslyintegrateall thesepromisingadvancesto build a new parallelalgorithm,basedon our previouswork onparallelwaveletradiosity[7] andcombinedhardwareacceleratedvisibility [8].Runningastate–of–the–artwaveletradiosityalgorithmthatintensively usesall thepar-allel andgraphicsresourcesof high–endSGI machines,we show thatfastrenderingofaccurateradiositysimulationsof complex sceneshasfinally becomea reality.

3 Latest impr ovementsof parallel wavelet radiosity

In this section,we presentour new parallel algorithm for fast and accuratewaveletradiositycomputationson complex surfaces.Our algorithmis thenaturalextensionofpreviouswork on parallelandhardwareacceleratedradiosity[7, 8] to themostrecentresearchonwaveletradiosity[1, 12, 18]. Here,theoriginal inputsurfacesareno longertessellatedinto a setof planarsurfaces,but directly treatedasa singleentity by theresolutionprocess.

Section3.1explainshow theradiositycomputationsbetweenthecomplex inputsur-facesareperformedin parallel.Then,section3.2dealswith how to performhardwarebasedvisibility with thesesurfaces.Finally, section3.3 describesthe implementationof MP(OR)U,our “Multi–Pipe Off–line RenderingUtility”.

3.1 Parallel radiosity with complexsurfaces

Basically, our new parallel radiosityalgorithmproceedsasdescribedin [7] with theimprovementsof [5]. We apply similar partitioning and schedulingtechniquesthatallow to deliveranoptimalloadbalancing— by minimizing idle timewaitingon locksandsynchronizationbarriers— while still exhibiting excellentdatalocality.

For simplicity reasons,we havechosento keepa similar granularityby defininganelementarytaskasa whole radiosity transferbetweentwo input surfaces. That is, agivenprocessis in chargeof computinga singleradiositytransferbetweenanemittinginput surfaceSe anda receiving input surfaceSr. At a given moment,two differentprocessesmight have to dealeitherwith thesameemittingsurfaceSe or with thesamereceiving surfaceSr. In thosecases,the lazycopyandlure (seefig. 2(a))mechanismsintroducedin [5] areexactly appliedthe sameway. Actually, implementingcomplexsurfacesinto thepreviousalgorithmcouldbedonewithoutrethinkingtheparallelizatonscheme,mostly thanksto the C++ object–orienteddesignof CANDELA, our radiosityresearchplatform[28].

Emitting surface

Receiving surface (unlocked) Lure

Emitting surface

Merge

Merged surface

Emitting surface

Receiving surface (locked)

Lure

Lure

Emitting surface

(a)Whenthereceiving surfaceis locked,shootis doneona lure.

(b) A self–shooting surfaceshootsona lure.

Fig. 2. Usingluresin parallelwaveletradiosity.

A minor differenceis that the so definedelementarytask is coarsercomparedtothe previous parallel algorithm dealingwith tessellatedsurfaces. This hasled us toimplementa sub–taskstealingmechanism,that is activatedeitherrandomly(to avoid,mostly at the beginning of the computations,sometasksto take too long), or morespecificallywhentheresidualenergyof thenext emittingsurfaceis muchlowerthanthetotalresidualenergycurrentlybeingpropagatedin parallel.But thismustbeseenratherasanimplementationthanasanalgorithmiccontribution,at leastuntil we undertakeadeeperanalysis.

A side–effectof thelure mechanism,firstly introducedfor parallelismpurpose,hashelpeddealingwith the casewherea curved input surfaceshootsenergy onto itself.Handling self–shootingsurfacesis a two–stepprocess,as shown by fig. 2(b). Thesurfaceshootsenergy on a lure asif it wasa differentsurface.Then,theenergy levels,residualsandmeshsubdivisionsof thelure arecopiedbackto theoriginal surface.

3.2 Hardware–basedvisibility with complexsurfaces

As detailedin [2], our hardware–basedvisibility algorithmis anextensionof thestan-dardhemicubemethod[11] (seefig. 3) thatremovesits inherentreliability problems.Itdetectswherethevisibility cubeis likely to provide thewronganswerandresortsto aclassicalray–tracedrequestin thoseplaces.

During theradiositycomputations,this allows to quickly answerthetwo followingkindsof visibility requests[8]:

� point to environmentor surfaceto environmentrequests,i.e. what surfacesarepotentiallyvisible,from apointor asurface?Theserequestsconsistof “clipping”thesubsetof inputsurfacesvisible from thecurrentemitter;� point to point requests,i.e. is a point visible from anotherpoint?

Obviously, complex shapessuchasarbitraryplanarsurfacesor parametricsurfaces

Projection of S1

Projection of S2

Projection of S3

S2 S3

S1

Point of interest

Fig. 3. Thestandardhemicubemethod.

cannotbedirectlyprojectedontothegraphicsboard.They haveto betessellatedbeforebeingrendered.Notethatthis tessellationis totally unrelatedto theresolutionprocess.It only servesfor thehardware–basedvisibility computations.

A unique“f alse”color is associatedwith all thetrianglesresultingfrom thetessel-lationof agivencomplex shape.Comparedto thepreviousimplementationwhereinputsurfacesweretessellatedfor theresolutionprocess,this considerablyreducestherisksof visibility errors.Indeed,a thin triangleresultingfrom atessellationis no longercon-sideredfor visibility asasingleentitybut asanelementof asetof trianglesrepresentingacomplex shape:it is lesslikely to bemisseddueto thehemicubediscretization.

If thehardware–basedvisibility algorithmcanbedirectlyappliedto arbitraryplanarsurfaces(even includingholes),caremustbe takenfor curvedsurfaces.Indeed,a raybetweena given point � anda point � locatedon a curved surface � (a sphereforexample)may intersectanotherpoint � of thesamesurface � . In this case,the pixelintersectedonthecubeis theprojectionof theclosestpointbetween� and � , which istheonly onevisible from � . Then,whenqueryinga pixel color on thehemicube,theassociatedZ–buffer valuemustbecomparedto thedistancebetweenthetwo consideredpointsto checkthevalidity of theanswer.

3.3 MP(OR)U: Multi–Pipe Off–line RenderingUtility

In [8], we have designeda parallel radiosity algorithm that allows several graphicspipelinesto perform the costly visibility requestswhile remainingcomputationsarehandledin parallelby multipleprocessors.At thattime,theexperimentsweperformed,althoughencouraging,were limited to a singlegraphicspipeline. We briefly presentheretheMP(OR)Uprogramminginterfacewe have developedto allow theimplemen-tationof theparallelalgorithmovermultiple graphicspipelines.

Description. This work is inspiredby the SGI’s MPU applicationprogrammingin-terface,which allowsseveralgraphicspipelinesto collaboratefor thedisplay, eitherofmultiple sub–imagesof a samescene(for immersive environments),or of onesingleimage(with a betterframerateor for largerscenes,dependingon thechosenmode).

Basically, we have implementedthe � � � � � � � � � � and � � � � � � � � � � classesovertheOpenGLcontext anddrawable. Thekey featureof theseclassesis thatoneinstanceof themis actuallyrelatedto asetof graphicspipelines,preferablyproviding thepbufferfacilities for betterperformance[8]. Then when one want to usea � � � � � � � � � � ,

theobjectautomaticallytries to find anavailablegraphicspipelineand,if it succeeds,makesthecorrespondingcall to ! " � � # � � $ � � � � � for effectiveconnection.Thesmalloverheadof theimplementationis thatwe have to createa displaylist of thesceneforeachgraphicspipeline: this slightly increasesthe initialization time at eachmodifica-tion of theinputgeometry.

Thekey functionalitiesof MP(OR)Uinclude:

� A requestto connectto a � � � � � � � � � � maybeblockingor not. Whennograph-ics pipelineis availableat the time of request,theprocessmayeitherwait untiloneis finally acquired,or giveup andreturn.� A systemof priority is implementedto sort the � � � � � � � � � � connectionre-quests.Whena processmakesa non–blockingrequestof a low priority, andthatrequestsof higherpriority arecurrentlypending,it directly givesup andreturns.

Exampleof use. Wehavetakeninto accounttheexperimentationswehaveperformedin [8] with onegraphicspipelineto efficiently implementour parallel radiosityalgo-rithm overMP(OR)U.

In particular, sincevisible surfacesclipping operations(seesection3.2) arecriti-cal in respectto the parallelexecution,they have beenassigneda high priority insideMP(OR)U,andtheassociatedconnectionrequestsareobviouslyblocking.Onthecon-trary, point to point visibility requestscanbe managedin softwarewhenno graphicspipelineis available,sotheconnectionrequestsarenon–blockingandhave the lowestpriority.

Finally, we try asfaraspossibleto performthevisible surfacesclippingoperationsin advance,beforethecorrespondingobjectbecomesanemitterin theparallelsolvingcomputations.Otherwise,otherprocesseshaving to dealwith thesameemitterwouldhaveto wait for theclippingoperationto becompleted.

4 Experiments

In all our experiments,we have usedthe samemachine,a SGI Origin2000with 64MIPS 195MHz R10000processorsand24 Gbytesof mainmemory, connectedto twoSGI InfiniteReality2graphicspipelineswith 2 RasterManagerseach.

4.1 The opera

Theopera sceneis a faithful modelingof anoperahouseof theclassicalstyledesignedby EmmanuelHere (18th century).Historically, this is oneof thefirst architecturalillu-minationprojectwehaveworkedon, in collaborationwith EDF (ElectricitedeFrance)andCRAI (ResearchCenterin ArchitectureandEngineering).Theoriginal 3D modeldatesbackto 1993,andhasbeencreatedfrom architecturaldrawings with ARC+TM

(from DesignLabs4). At that time, thecomputationsrequiredthemodelto bedividedin threeparts(onefor eachfloor), for separatecomputationsrequiringmany hourseach.

In this architecturalmodel(asin many others),many polygonshaving non–trivialshapesappear. Up–to–now, theseprimitiveshad to be tessellatedto be handledbyradiosityalgorithms. Whenusingarbitrarypolygons,eachof thesepolygonsis illu-minatedasa uniqueprimitive. This is illustratedin fig. 4, which shows animageof asimulationof theopera scene.

4 % & & ' ( ) ) * * * + , - . / 0 1 2 3 4 5 . + 6 7 8

Theoriginal modelhas32,429trianglesandparallelograms.Whenarbitrarypoly-gonsareallowed, the numberof input primitivesfalls to 17,272. Allowing arbitrarypolygonsdoesnot just divide thenumberof input primitivesroughlyby 2, it alsosig-nificantly reducesthe numberof final patches(seeTable1). In addition,the numberof surfacesthathave shotenergy duringthesimulationis lower by morethan25%forthe sameconvergencerate (definedas the ratio of the energy alreadypropagatedtotheinitial energy). This is animportantpoint which savesvisibility computationtime.Combinedwith thesmallernumberof inputprimitives,it explainswhy theoverallcom-putationtime is roughlydividedby 2.

The differencesbetweenpre–meshinginput primitivesbeforeradiositycomputa-tion andhandlingarbitrarypolygonalprimitivesdirectly areillustratedin fig. 4(b) andfig. 4(c). Falsegeometricdiscontinuitiesdueto triangulationappearon fig. 4(b) alongtheshadowsat thebottomleft of theright window frame.Also, problemsdueto shapesnot well suitedto visibility computationscan be seenabove the window: long andthin trianglesaremissedby thevisibility algorithm. By contrast,asshown in fig. 4(c)the extendeddomainalgorithmproducesa moreregular meshingbettersuitedto theapproximationof theradiosityfunction.

34 triangles

1 polygon

(a)A close-upontheinitial meshof theopera:with or withoutarbitrarypolygons.

(b) Radiositysolutionwithout theuseof arbitrarypolygons

(c) Radiositysolutionwith theuseofarbitrarypolygons

Fig. 4. Theoperahouse,modeledwith ARC+TM . By courtesyof EDF andCRAI.

scene geometry# inputprim.

# finalpatches

comp.time

# shootingsurfaces

Opera ord. patches32,429 819,109 476s 9,358polygons 17,272 550,159 241s 6,776

SodaHall

ord. patches272,4501,124,58544,585s 9,010polygons 199,550 927,428 30,232s 7,008quadrics 70,645 623,600 12,730s 6,535

SHwithgeom.room quadrics 71,957 827,627 13,411s 6,642

Table1. Numericalresultsfor theilluminationof theopera,the4th floor of theSodaHall anditsgeometryroom,all for thesameconvergencerateof 95%,obtainedwith 48 processorsandtwographicspipelinesof theOrigin2000.

4.2 The SodaHall

TheSodaHall sceneis amodelingof thefamouscomputersciencebuilding onthecam-pusof theUniversityof Californiaat Berkeley. Theoriginaldataset5 is in theBerkeleyproprietaryUNIGRAFIX format6, andwe have convertedit in the OpenInventorfileformat7.

For our tests,we have computedthe global illumination over the whole 4th floorwith its furniture,usingthe 9 : quadricwaveletbasis. Theoriginalmodelof thisfloorcontains272,450trianglesandparallelograms,216point light sourcesand905lightingsurfaces(includinganeonmodeledasauniquecylinder). Whenarbitrarypolygonsareallowed,thenumberof inputprimitivesreaches199,550.If in addition“curved” regionsof themodel(i.e., regionswherethemeshwasoriginally meantto approximatecurvedobjects)aremodeledby quadricsurfacepatches,thenthe numberof input primitivesfalls to 70,645.

Thenumericalresultsfoundfor thesimulationof this floor of theSodaHall sceneconfirm the resultsalreadyobserved with the opera scene(seeagainTable 1). Theillumination of this large–scalemodelprovesthat it is muchmoreefficient to handlecomplex shapesdirectly with our new techniquesthanthrougha tessellation.Betweentheordinarypatchesversionandthequadricsversionof the4th floor of theSodaHall(numberof input primitives is almostdivided by 4), the numberof final patchesisroughlydividedby 2 andtheoverall computationtime is down by 70%.

4.3 The geometryroom

Thegeometryroomis acollectionof geometricobjectsmodeledwith quadricsurfaces,usingSGDL8, that we have includedin a specialroom (420A for thosein the know)of the 4th floor of the SodaHall. It consistsof a chessset, the well–known teapot,several benchesand jars, andadditionalobjects. Overall, this collectionamountsto1,312surfaces,549 of which arecurved patches.The chesssetalone– lit up by anadditionaldesklamp– is madeof 555surfaces,including264curvedpatches.

We have illuminatedtheentire4th floor of theSodaHall with this geometryroom5Availableat % & & ' ( ) ) * * * + 6 . + 5 - ; < - 3 - = + - , > ) ? < 7 @ 3 - ;6Describedat % & & ' ( ) ) 0 ; 4 ' % / 6 . + 3 6 . + 8 / & + - , > ) ? . - & % ) , 4 & 4 . - & . ) > 0 + @ / 3 - @ 7 ; 8 4 &7Availableat % & & ' ( ) ) * * * + 3 7 ; / 4 + @ ; ) ? 6 > 1 = ) A 7 , 4 B 4 3 3 ) . 7 , 4 B 4 3 3 + % & 8 38 % & & ' ( ) ) * * * + . 0 , 3 + 6 7 8

(seefig. 5). The insideview of the geometryroom andthe close-upon the chessset(seefig. 1) prove the realismof the simulation. Comparedto the “traditional” SodaHall model,thecomputationtime of thesimulationis only 5% larger. With previouslyknown radiositymethods,it would have beenimpossibleto simulatesucha complexscenewith a similar speedandaccuracy. We have estimatedthat if thegeometryroomwas uniformly tessellatedwith a grid size of roughly 2 inches,the numberof inputprimitiveswould bemultiplied by morethan60 andthecomputationtime (for thesolegeometryroom)roughlyby 4, for only a low visualaccuracy.

Fig. 5. TheSodaHall. Original modelin UNIGRAFIX. By courtesyof CarloSequin.

4.4 The virtual Reality CenterTM

As afinal experiment,we presenta full casestudyof a cooperationwith SGI to build avirtual configurationsoftwarefor RealityCenters.For thispurpose,wehavecompletelymodeledtheRealityCenterlocatedatSGICortaillod,usingMayaTM modelingsoftware(from Alias—Wavefront9).

The RealityCentersceneis a perfectillustrationof what a realmodelcanbe. In-deed,it is composedof a largesetof quadricsurfaces(thesphericalscreen,sphericalwalls, the conicaldesk,chairsmadeof ellipsoid andtorus,etc.). The correspondingtessellated3D modelswould containmore than100,000surfaces(for a mediumap-proximation),while our modelcontainslessthan3,000objects.

As expected,our algorithmbehavesvery well with this scene.Theexecutiontime,with asingleprocessorandnohardware–basedvisibility, is of aboutonehour. Figure6shows thespeed–upcurvesfor zero,oneandtwo graphicspipelines,with onehourasthesequentialreferencetime,evenwhenhardware–basedvisibility is used.

First, we examinememoryproblemsthat could explain thesecurves. As in ourprevious work [7, 8] andwhatever the machineconfigurationused,the datalocality

9 % & & ' ( ) ) * * * + 4 * + . 0 / + 6 7 8

1

4

8

12

16

20

24

1 4 8 12 16 20 24 28 32

Spe

ed-u

pC

Processors

Linear0 pipe1 pipe

2 pipes

Fig. 6. Speed–upcurvesfor theRealityCenterTM .

exhibited by the applicationis very high, sincemore than95% of dataaccessesaresatisfiedin theL1 andL2 caches.Moreover, asexplainedin [6], usinga largervirtualpagesize(1 Mbytes)thanthedefault one(16 Kbytes)largely decreasesthenumberofTLB missesandconsequentlytheexecutiontime. Finally, usingtheenhancedversionof the LINUX/GNU libc allocator, also describedin [6], guaranteescontentionfreememoryoperationswhile keepingfragmentationlow.

Thespeed–upcurvewhenusingthesoftware–basedvisibility is not thatgood,sinceit remainsunder16, even with 32 processors.The times lost on locks andon the fi-nal synchronizationbarrierdo not explain this performancedegradation.Actually, the� � � D � � figuresshow anincreaseof thenumberof total cycleswith thenumberof pro-cessors.This seemsto bedueto anoverheadof our parallelalgorithmthat introducessupplementarywork, whena processmakesa lure on a surfacethat is actuallytotallyoccludedfrom the emitter. We will try, at the following tuning phase,to remove thecostof this unnecessarywork.

Usingoneor two graphicspipelinesgivesa super–linearspeed–up10 with oneandfour processors.The benefitis not so tremendouscomparedto what it can be withthe SodaHall, mostly dueto the smallestsizeof the model. Then,whenthe numberof processorsincreases,it becomeslessand lessinterestingto usea singlegraphicspipeline,comparedto the software–basedvisibility. Indeed,we obtainthe sametimewith 16 processors,andit is evenworseto usea singlegraphicspipelinefrom 16 to 32processors.This is dueto thefactthatthegraphicspipelinebecomesahugebottleneck,andit is reallycritical for suchshortexecutiontimes(about220seconds).

Nevertheless,whenusingasecondgraphicspipeline,lesstimeis lostto synchronizetheuseof graphicsresources,andtheobtainedspeed–upcurve is muchbetter(22 with32 processors).This allows the wholeRealityCentersceneto besimulatedin only 2minutes. This definitely allows a semi–interactive renderingof the model. The userhasthenthe opportunityto changethe simulationparametersor the photometricandgeometriccharacteristicsof the input data,andto run the simulationagainuntil he istotally satisfied.Figure7 shows imagesof theobtainedresults.

10With thereferencetimebeingtheexecutiontime usingthesoftware–basedvisibility.

Fig. 7. Thevirtual RealityCenterTM , modeledwith MayaTM . Imagesby courtesyof SGI.

5 Conclusionand futur e works

In conclusion,we have presenteda parallelandgraphicshardwareacceleratedalgo-rithm to conductwavelet radiositycomputationson scenesmadeof complex shapes.Thealgorithmrelieson two innovativeresearchworkson high orderwaveletradiosity— theextendeddomainalgorithmandtheconstantJacobianmapping— that removetheneedto tessellatethecomplex input surfaces,resultingin moreaccuratecomputa-tions,with fasterconvergence,andwith smallermemorycosts.Theimplementationin-tensively usesall availablehardwareresources,bothprocessorsandgraphicspipelines,to reducethecomputationtimesasmuchaspossible.

In our futurework, we wantto continueto improvethescalabilityof our approach,in termsof both available hardware (processorsand graphicspipelines)and size ofinput data(from thesmallsingleroomto thewholebuilding). We shallconvergeto anapplicationthatnearlyallows real–timerenderingof small radiositysolutions,andcanalsocomputehighly precisesolutionsof hugedatasetsin a reasonabletime.

Wealsowantto exploreacombinationof ouralgorithmwith otherefficientradiositytechniques,like clustering,to increasethesizeof thedataour applicationcanhandle,or discontinuitymeshingto increasetheprecisionof thefinal solution.

Finally, wewantto extendouralgorithmto agreaternumberof parametricsurfaces,suchasSplinepatches,Bezierpatchesor NURBS,to beableto handleany kind of inputsurface. An orthogonalresearchdirectionmay be to approximatethesehigh–degreeparametricsurfacesby quadrics,asrecentlyproposedin [19].

Acknowledgments

The authorswould like to thank the CentreCharlesHermite for providing accesstoits graphicsand parallel resources.We are also grateful to all the peoplethat havecontributed to the modelingof our test scenesand have given us permissionto usethem. Very specialthanksto Jean–ClaudePaul, thathasstartedandmotivatedall thisresearchonwaveletradiosityandhighperformancegraphics,andto all themembersofhis fabulousISA researchteamthathavebeeninvolvedin thesuccessof CANDELA.

References

1. L. Alonso, F. Cuny, S. Petitjean, N. Holzschuch, and J.-C. Paul. Constant ja-cobian mappings: Accurate wavelet radiosity on parametric surfaces. Submit-ted to the 11th Eurographics Workshop on Rendering, 2000. Available fromhttp://www.loria.fr/˜holzschu/Publications/jacobian.ps.gz.

2. L. Alonso and N. Holzschuch. Using graphicshardware to speed-upyour visibilityqueries. Journal of GraphicsTools, 1999. Acceptedfor publication.Also available ashttp://www.loria.fr/˜holzschu/Publications/99-R-030.pdf.

3. D. R. Baum,S.Mann,K. P. Smith,andJ.M. Winget. MakingRadiosityUsable:AutomaticPreprocessingandMeshingTechniquesfor theGenerationof AccurateRadiositySolutions.ComputerGraphics(ACM SIGGRAPH’91 Proceedings), 25(4):51–60,July1991.

4. K. BouatouchandS. N. Pattanaik. DiscontinuityMeshingandHierarchicalMultiwaveletRadiosity. In W. A. Davis andP. Prusinkiewicz, editors,Proceedingsof GraphicsInterface’95, pages109–115,SanFrancisco,CA, May 1995.MorganKaufmann.

5. X. Cavin. LoadBalancingAnalysisof a ParallelHierarchicalAlgorithm on theOrigin2000.In Fifth EuropeanSGI/CrayMPPWorkshop, Bologna,Italy, September1999.

6. X. Cavin andL. Alonso. ParallelManagementof LargeDynamicSharedMemorySpace:aHierarchicalFEM Application. In SeventhInternationalWorkshopOn SolvingIrregularlyStructuredProblemsIn Parallel (IRREGULAR’2000), Cancun,Mexico, May 2000.

7. X. Cavin, L. Alonso,andJ.-C.Paul. Parallelwavelet radiosity. In Proceedingsof theSec-ondEurographicsWorkshopon Parallel GraphicsandVisualisation, pages61–75,Rennes,France,Sept.1998.Eurographics.

8. X. Cavin, L. Alonso,andJ.-C.Paul. OverlappingMulti-ProcessingandGraphicsHardwareAcceleration:PerformanceEvaluation.In FirstParallel VisualizationandGraphicsSympo-sium(PVG’99), SanFrancisco,California,October1999.

9. A. ChalmersandE. Reinhard. Parallel and DistributedPhoto-RealisticRendering. ACMSIGGRAPH’98CourseNotes,July1998.Course3.

10. P. H. Christensen,E.J.Stollnitz,D. H. Salesin,andT. D. DeRose.WaveletRadiance.In FifthEurographicsWorkshoponRendering, pages287–302,Darmstadt,Germany, June1994.

11. M. CohenandD. P. Greenberg. TheHemi-Cube:A RadiositySolutionfor Complex Envi-ronments. In ComputerGraphics(ACM SIGGRAPH’85 Proceedings), volume19, pages31–40,Aug. 1985.

12. F. Cuny, L. Alonso, and N. Holzschuch. A novel approach makes higher or-der wavelets really efficient for radiosity. Computer Graphics Forum (Euro-graphics 2000 Proceedings), 19(3), Sept. 2000. To appear. Available fromhttp://www.loria.fr/˜holzschu/Publications/paper20.pdf.

13. T. A. Funkhouser. Coarse-GrainedParallelismfor HierarchicalRadiosityUsingGroupIter-ative Methods.In ComputerGraphicsProceedings,AnnualConferenceSeries,1996(ACMSIGGRAPH’96 Proceedings), pages343–352,1996.

14. S.GibsonandR. J.Hubbold.Efficienthierarchicalrefinementandclusteringfor radiosityincomplex environments.ComputerGraphicsForum, 15(5):297–310,Dec.1996.

15. S. J. Gortler, P. Schroder, M. F. Cohen,andP. Hanrahan.WaveletRadiosity. In ComputerGraphicsProceedings,Annual ConferenceSeries,1993 (ACM SIGGRAPH’93 Proceed-ings), pages221–230,1993.

16. P. Hanrahan,D. Salzman,and L. Aupperle. A Rapid HierarchicalRadiosityAlgorithm.ComputerGraphics(ACM SIGGRAPH’91 Proceedings), 25(4):197–206,July1991.

17. J. M. Hasenfratz,C. Damez,F. Sillion, andG. Drettakis. A practicalanalysisof clusteringstrategiesfor hierarchicalradiosity. ComputerGraphicsForum(Eurographics’99 Proceed-ings), 18(3):C–221–C–232,Sept.1999.

18. N. Holzschuch,F. Cuny, and L. Alonso. Wavelet radiosity on arbitrary planar sur-faces.Submittedto the11thEurographicsWorkshopon Rendering, 2000. Availablefromhttp://www.loria.fr/˜holzschu/Publications/surfaces.ps.gz.

19. B. LacolleandN. Szafran.Approximatingparametricsurfacesby meansof quadricpatches.In Proc.of FourthInternationalConferenceonCurvesandSurfaces, St.Malo,France,1999.

20. D. Meneveaux,K. Bouatouch,andE. Maisel. Memorymanagementschemesfor radiositycomputationin complex environments.In Proc.ComputerGraphicsInternational’98 (CGI’98), pages706–714,Los Alamitos,CA, June1998.IEEEComputerSocietyPress.

21. S.Schaefer. Hierarchicalradiosityoncurvedsurfaces.In J.Dorsey andP. Slusallek,editors,RenderingTechniques’97 (Proceedingsof the Eighth EurographicsWorkshopon Render-ing), pages187–192,New York, NY, 1997.SpringerWien. ISBN 3-211-83001-4.

22. F. Sillion. A UnifiedHierarchicalAlgorithmfor GlobalIlluminationwith ScatteringVolumesandObjectClusters.IEEETransactionsonVisualizationandComputerGraphics, 1(3),Sept.1995.

23. J.P. Singh,C. Holt, T. Totsuka,A. Gupta,andJ.Hennessy. LoadBalancingandDataLocal-ity in Adaptive HierarchicalN-body Methods:Barnes-Hut,FastMultipole, andRadiosity.Journalof Parallel andDistributedComputing, 27(2):118,June1995.

24. B. Smits, J. Arvo, andD. Greenberg. A ClusteringAlgorithm for Radiosityin ComplexEnvironments.In ComputerGraphicsProceedings,AnnualConferenceSeries,1994(ACMSIGGRAPH’94 Proceedings), pages435–442,1994.

25. M. Stamminger, P. Slusallek, and H.-P. Seidel. Bounded radiosity - illuminationon general surfaces and clusters. Computer Graphics Forum (Eurographics ’97Proceedings), 16(3):C309–C317,1997. Available from http://www9.informatik.uni-erlangen.de/eng/research/pub1997.

26. A. Willmott and P. Heckbert. An empirical comparisonof progressive and wavelet ra-diosity. In J. Dorsey andP. Slusallek,editors,RenderingTechniques’97 (Proceedingsofthe Eighth EurographicsWorkshopon Rendering), pages175–186,New York, NY, 1997.SpringerWien. ISBN 3-211-83001-4.

27. A. Willmott, P. Heckbert,andM. Garland.Faceclusterradiosity. In RenderingTechniques’99, pages293–304,New York, NY, 1999.SpringerWien.

28. C. Winkler. Experimentationd’algorithmesdecalculderadiosite a based’ondelettes. PhDthesis,Institut NationalPolytechniquedeLorraine,1998.

29. H. R. Zatz. GalerkinRadiosity:A HigherOrderSolutionMethodfor Global Illumination.In ComputerGraphicsProceedings,AnnualConferenceSeries,1993(ACM SIGGRAPH’93Proceedings), pages213–220,1993.