
70 May/June 2007 Published by the IEEE Computer Society 0272-1716/07/$25.00 © 2007 IEEE

Graphically Speaking Editor: Miguel Encarnação

The rapid evolution of information and communication technology has always been a source for challenging new research questions in computer science. What happens regularly is that a new generation of technology makes it suddenly possible to process, store, and/or transmit much larger amounts of information. Thus, a gradual quantitative increase can turn into a sudden qualitative leap, simply because things become possible that were not possible before. This situation naturally raises novel questions on the conceptual level, namely how to administer, structure, and access these qualitatively greater amounts of information.

The nightmare from a computer science point of view is the data grave: Information that is physically present is nevertheless “lost” because it is simply not accessible through reasonable efforts. A practical criterion to detect a data grave situation is that information is typically reacquired, rather than found through searching in the archives. An example would be downloading a document from the Internet instead of searching for it on the local hard drive.

The data grave problem can be solved for textual documents by indexing and retrieval services that nowadays are more and more directly integrated into desktop managers and file systems. But what is the equivalent of full-text search capabilities for 3D documents? The vision of the emerging research field of Semantic 3D is to establish the notion of generalized 3D documents that are full members of the family of generalized documents. This means that access would be content-based rather than based on metadata. An example is the following Web query: http://www.epoch-net.org/DL/search?q=venus+milo#head.

Such a query should progressively retrieve a full model of the Venus de Milo and smoothly zoom in to its head to show the cheeks, which are subtly highlighted. The user can then interactively inspect the cheeks in 3D under different lighting conditions to highlight small surface details. The same would remain possible even when the Venus model is edited or incorporated into another 3D scene.
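To make the structure of such a query concrete, the following minimal sketch splits the example URL into its search terms and the referenced part. It assumes, as one plausible reading of the example, that the `q` parameter carries the search terms and the URL fragment names a part of the retrieved model; the function name `parse_3d_query` is hypothetical and not part of any existing service.

```python
from urllib.parse import urlsplit, parse_qs

def parse_3d_query(url: str) -> dict:
    """Split a content-based 3D query URL into search terms and a part reference.

    Assumption (not from the article): 'q' holds the search terms and the
    fragment after '#' names the part of the model to zoom in on.
    """
    parts = urlsplit(url)
    # parse_qs decodes '+' as a space, so "venus+milo" becomes "venus milo".
    terms = parse_qs(parts.query).get("q", [""])[0].split()
    return {
        "host": parts.netloc,
        "path": parts.path,
        "terms": terms,
        "part": parts.fragment or None,
    }

query = parse_3d_query("http://www.epoch-net.org/DL/search?q=venus+milo#head")
print(query["terms"], query["part"])
```

The hard part, of course, is not parsing the URL but resolving the part name "head" against the geometry, which is what the research problems discussed in this article are about.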

The purpose of this article is to highlight the research issues that impede the realization of this vision today.

3D is gaining momentum

Examples of technological revolutions that suddenly required new conceptual approaches are many, the most prominent examples being database technology in the ’80s and the World Wide Web in the ’90s. We argue that the next major technological revolution will be triggered by massive numbers of huge 3D data sets that will be generated in the near future. New technologies have the potential to finally overcome the modeling bottleneck, that is, the fact that the creation of digital 3D objects was much too expensive for the longest time. Sources for really heavy amounts of 3D data will be primarily 3D scanning, photogrammetry, and procedural/parametric shape design (see the “Sources for Massive 3D” sidebar for more details). Not only is it easier to produce digital shapes; the possibilities to use and benefit from the created 3D data sets are also increasing. A growing shift toward 3D can be detected.

3D on the desktop

It has become commonplace to regard computer games as the main driving force behind 3D hardware’s growth in the market for the past several years. The truth is that the widespread availability of 3D accelerators is the technical prerequisite for a much more fundamental movement. After Apple’s pioneering work in Mac OS X, Microsoft’s Vista made 3D an integral part of the Windows desktop, and AIGLX and Xgl may do the same for Linux. Instead of being optional, 3D on the desktop will actually become a standard; games technology will enter the operating system.

Mass customization

Nowadays, the industrial workflow is digital all the way from design to production and control. Computerized numerical control (CNC) machines are the standard, not the exception. This makes it possible, at least in principle, to change the production parameters from one item to the next. As a consequence, each produced item can be unique. This situation is already a reality in the automotive industry. The variations are so numerous that hardly any two cars from the same series are completely identical.

Fabbing

One step beyond customizing a few parameters in a fixed production process is a single versatile process that can create basically anything. This (hypothetical) machine is called a fabricator, or fabber for short. The most advanced approximation to this vision today is layered manufacturing, formerly used for rapid prototyping. The process of “printing” a 3D object is now so mature and versatile that designers can print not only mockups (nonfunctional prototypes), but also regular end products in a variety of materials.

Seven Research Challenges of Generalized 3D Documents

Sven Havemann, Graz University of Technology

Dieter W. Fellner, Darmstadt University of Technology and Fraunhofer IGD

IEEE Computer Graphics and Applications

Seven urgent research problems

Conventional 3D technology simply does not scale. Algorithms that depend on the fast accessibility of a whole 3D model have no chance with objects that are many times larger than the main memory. Moore’s law states that computers become more powerful at exponential rates. A consequence is that more and more of what had to be done at preprocessing stages in the past will sooner or later be possible at runtime. But this requires reengineering most legacy software: Instead of the Fortran-like input → processing → output paradigm, we will see an algorithmic shift toward on-demand methods and lazy evaluation with updateable and refineable data structures. At first only a coarse version of the output is computed. Anytime later, the output can be queried, leading to reevaluating some parts of the input to refine the requested part of the output.
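The shift from batch processing to on-demand refinement can be illustrated with a deliberately small sketch. It is not taken from any particular system: the class `RefinableOutput` and its methods are hypothetical, and a one-dimensional function stands in for an expensive processing step over a large 3D data set. The output is first computed coarsely; a later query refines only the requested interval and reuses what has already been evaluated.

```python
from typing import Callable, Dict, List, Tuple

class RefinableOutput:
    """Output that is computed coarsely first and refined only where queried."""

    def __init__(self, process: Callable[[float], float],
                 lo: float, hi: float, coarse_samples: int = 4):
        self.process = process              # stand-in for an expensive computation
        self.cache: Dict[float, float] = {}
        self.evaluations = 0                # counts how often 'process' really ran
        self.refine(lo, hi, coarse_samples) # the initial coarse version

    def _eval(self, x: float) -> float:
        # Lazy evaluation: each sample is computed at most once.
        if x not in self.cache:
            self.cache[x] = self.process(x)
            self.evaluations += 1
        return self.cache[x]

    def refine(self, lo: float, hi: float, samples: int) -> List[Tuple[float, float]]:
        """Return 'samples + 1' evenly spaced results on [lo, hi]."""
        step = (hi - lo) / samples
        return [(lo + i * step, self._eval(lo + i * step)) for i in range(samples + 1)]

out = RefinableOutput(lambda x: x * x, 0.0, 8.0, coarse_samples=4)
coarse_cost = out.evaluations        # samples at 0, 2, 4, 6, 8
detail = out.refine(2.0, 4.0, 4)     # only 2.5, 3.0, 3.5 are new
print(coarse_cost, out.evaluations)
```

A real out-of-core or multiresolution structure would replace the dictionary cache with a spatial hierarchy, but the access pattern is the same: query first, compute only what the query needs.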

Out of core, streaming, multiresolution, progressive, parallelization, and GPGPU are concepts that point to solutions to the algorithmic problems. There is, however, a more serious problem: A database that treats 3D objects as anonymous binary large objects, or blobs, might be fine for managing a few hundred 3D objects, but with 10 or 20 thousand of them, it is simply useless.

This problem turns out to be more difficult to solve than the engineering problems we’ve mentioned. It touches apparently on deeper paradigmatic questions. In such a situation, a change of perspective can be a good idea. Understanding a 3D model as a document in a generalized sense has proven quite useful [1]. Along these lines, databases and repositories are regarded more like digital libraries that contain generalized digital documents of many (media) types and formats.

The benefit is a more unified, content-centered view, where the specific modality of a piece of information is less of a concern. What is important is that a digital library allows retrieval of multimedia data sets based on their information content, their semantics (deep integration), and not only through their metadata (shallow integration). Access to the content of a 3D object or scene, however, is technically not so easy. The following problems inhibit a seamless integration of 3D objects into digital libraries.

Sidebar: Sources for Massive 3D

Sources generating massive 3D generally fall into three categories.

Scanning

Active shape acquisition methods involve sending beams of light into a scene and measuring the time until the reflected rays return (time of flight method), or from where they return (triangulation method or structured light method). 3D scanning is used widely, although its use is often not apparent because applications range from industrial quality control to biometric security checks. A setup can be as simple as a digital video camera and a laser pointer with a plastic fixture to fan the single ray into a ray stripe. When moving the stripe over an object, a red line marks the intersection of the object with a 2D plane from which a point cloud can be generated even in real time.

Photogrammetry

Passive methods do not even have to send light into the scene. A sequence of a few photographs can be sufficient for recent algorithms to create a 3D object within a few minutes. Modern consumer-level digital cameras provide enough high-resolution detail for a decent feature detection, so that the device can accurately determine relative motion from one image to the next. With recent dense matching algorithms, a depth value can be determined for every pixel in each of the images. When these range maps are merged, the result is a very dense point cloud.

Procedural shape design

Conventional manual 3D modeling does not scale, because the modeling effort is only linearly proportional to the amount of geometric detail that the modeling can produce. A fact not widely known is that the vast majority of consumer products in the world is created using the few major software packages for industrial design: AutoCAD, Pro/Engineer, Catia, Unigraphics, and SolidEdge. These tools gain their superior productivity through parametric design. Introduced by Pro/Engineer in the early 1990s, parametric design was quickly adopted by its competitors. The great innovation of procedural and parametric modeling is that it lets designers reuse solutions to similar modeling problems that have already been solved. This method also lets designers adapt models much more easily to changing specifications, since ideally necessary adjustments will be limited to only a few high-level parameters from which the shape is instantly regenerated. Today, parametric design is more and more available in entry-level 3D software, and it will soon reach a state where it permits everyday users to quickly create appealing and extremely detailed shapes.

1. ‘3D data set’ can have many meanings

A fundamental difference between 3D and other media formats is that there is no canonical 3D representation. This is different from, for example, bitmap images where the ground truth is a rectangular grid of color samples (pixels). Bitmap representations differ only in their encoding, for example in the color precision and in whether the compression is lossy (such as JPEG) or not (such as PNG). Most vector-based 2D formats (such as SVG, PostScript, PDF) are equivalent as well; their main data representation is a closed polygon bounding a colored region in 2D.

For 3D objects the situation is much more complicated. Representations of 3D objects can be roughly divided into surfaces, volumetric, and structured. The latter category exists on top of the first ones, as a scene graph for instance can contain shapes of various different types. A short taxonomy of 3D representations might read as follows:

■ Surfaces:
  – Discrete: point clouds, surfels, range maps, triangle meshes, b-rep meshes, subdivision surfaces, primitives (spheres, cylinders, and so on)
  – Parametric: B-splines, NURBS
  – Implicit: metaballs, moving least squares

■ Volumetric:
  – Discrete: voxelgrids, tetrahedral grids, octrees, BSP trees, adaptively sampled distance fields
  – Parametric: trivariate splines, free-form deformations
  – Implicit: F-rep, radial basis functions

■ Structured: articulated figures, deformable models, Boolean set operations (such as constructive solid geometry, or CSG), procedural shapes, generative models, CAD models, scene graphs, computer games, and complete virtual worlds

These shape categories and representations are not all equivalent, because they differ in their semantics, meaning that their content is really different. For example, a closed surface bounds a volume; a surface with boundaries does not. A volumetric data set contains many surfaces at the same time (for example, iso-surfaces). These embedded surfaces cannot have self-intersections, but parametric and discrete surfaces can. All discrete representations are related to sampling theory, and algorithms on them are typically iterative. Continuous representations, on the other hand, are noiseless and admit analytic reasoning (such as differentiation). And the structured representations can add to this complexity things like evolution over time, hierarchical transformations, or the concept of interactivity.

This taxonomy must be refined considerably so that it covers all known 3D representations. We must better understand the relations between the known representations. And we must also better understand what defines the content of a 3D data set. More specifically, when can two 3D data sets be considered to have the same semantics (the same content), even if they have different representations?

2. A sustainable 3D file format

A remarkable and very sad fact is that there is no single, commonly accepted, comprehensive 3D file format. All of the most popular 3D tools have their own proprietary file formats. When exchanging data between tools A and B, some format F must be chosen that A can export and B can import. So F is a function of the particular versions of A and B, and the versions of the respective exporters and importers. But with almost any choice, a loss of information is unavoidable. And it is close to illusory to assume that a round trip A → B → A will reproduce the identical data set.
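The information loss in an exchange through a format F can be illustrated with a toy sketch. Everything here is hypothetical: the model is a plain dictionary, and the "format" is simply the set of attribute names it is able to store. Real exporters lose information in far subtler ways (precision, topology, parameterization), but the basic mechanism is the same.

```python
from typing import Any, Dict, List, Set

def export(model: Dict[str, Any], supported: Set[str]) -> Dict[str, Any]:
    """Write only the attributes that the exchange format can represent."""
    return {name: value for name, value in model.items() if name in supported}

# A model as tool A sees it (attribute names are illustrative assumptions).
model_in_a = {
    "vertices": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    "faces": [(0, 1, 2)],
    "vertex_normals": [(0.0, 0.0, 1.0)] * 3,
    "subdivision_level": 2,
}

# Hypothetical exchange format F that stores plain polygon meshes only.
FORMAT_F = {"vertices", "faces"}

# Round trip A -> B -> A through F.
model_in_b = export(model_in_a, FORMAT_F)
back_in_a = export(model_in_b, FORMAT_F)

lost: List[str] = sorted(set(model_in_a) - set(back_in_a))
print(lost)
```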

As can be expected, this is a huge problem in practice. It is only partially solved for the aforementioned high-end CAD tools, which almost exclusively use NURBS. The solution required longstanding, mature exchange standards such as STEP (the International Standard for the Exchange of Product Model Data) and IGES (the Initial Graphics Exchange Specification). The standards are so comprehensive and elaborate that for a software company to support them (to some degree) is a mark of excellence in itself.

The file format problem is, and has been, completely ignored by academia. This neglect is a deep problem that requires careful fundamental reasoning. The focus in academic 3D modeling has been more to further extend the already huge “zoo” of shape representations, rather than to design file formats to store and retrieve these shapes. This explains also why so few of the “academic” shape representations have found their way into practice (apart from patenting issues).

The task of designing a new sustainable 3D file format entails defining

■ fundamental requirements to demand from all supportable 3D representations,

■ a basic set of mainstream shape representations that serve as examples,

■ a customizable encoding that can accommodate all attribute variations (such as vertex normals and face normals), and

■ a well-defined recipe for extending the basic set.

3. Representation-independent stable 3D markup

Assuming we know what we mean when we speak about 3D data and that we can store and retrieve the data reliably, the next problem is how to tie additional information to parts of a 3D model. From the generalized document point of view, this is simply a hyperlink leading from a part of one document (the link source) to a part of another (the link target). This implies the capability of referencing parts and regions in generalized documents. For the Venus de Milo Web query for example, Figure 1 illustrates the problem of specifying where the statue has its cheeks. Triangles are transient objects, and they provide only a very fragile reference.

So, the problem is to define a stable 3D markup. This goal means finding a method to reference a portion of a digital 3D artifact, irrespective of the particular shape representation used. Yet, the method should let users discriminate detailed surface features such as ridges, creases, and corners, since they can be of great importance in determining a shape’s semantics. The reference should survive simple editing operations such as cutting and affine transformations.

The drawback of the requested generic markup is, of course, that it cannot exploit the specifics of a particular shape representation (such as “the largest triangle,” or “vertices contained in a box”) but only intrinsic surface properties (such as “the point of maximal curvature”).

4. Representation-independent 3D query operations

Editing a shape that carries a markup cannot always be avoided. In these cases, the shape reference must be updated. But how is this possible if the referencing method is unknown beforehand? The solution is that any complex shape-editing procedure must keep track of the geometric primitives that the editing affects. Then these primitives can be converted back to a reference (of the same type).

For example, in a huge triangle mesh of a Greek temple, mark all the column capitals “Corinthian capital” using axis-aligned bounding boxes as the referencing method. Now, cut out a whole section from one column of the temple. Before the cut, the triangles inside the markup box are enumerated. After the cut, the system generates a new bounding box around the remaining triangles, thus updating the reference and restoring the “Corinthian capital” markup.
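The temple example can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the mesh is a dictionary from hypothetical triangle identifiers to vertex triples, a triangle counts as marked when all of its vertices lie in the box, and the cut is given as the set of removed triangle identifiers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[float, float, float]
Triangle = Tuple[Point, Point, Point]

@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box used as the markup reference."""
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        return all(l <= c <= h for l, c, h in zip(self.lo, p, self.hi))

def enumerate_marked(mesh: Dict[int, Triangle], box: Box) -> Set[int]:
    """Enumerate the triangles whose vertices all lie inside the markup box."""
    return {tid for tid, tri in mesh.items() if all(box.contains(v) for v in tri)}

def bounding_box(triangles: List[Triangle]) -> Box:
    points = [v for tri in triangles for v in tri]
    lo = tuple(min(p[i] for p in points) for i in range(3))
    hi = tuple(max(p[i] for p in points) for i in range(3))
    return Box(lo, hi)

def update_markup(mesh: Dict[int, Triangle], box: Box,
                  removed: Set[int]) -> Optional[Box]:
    """Convert the primitives that survive a cut back into a box reference."""
    marked = enumerate_marked(mesh, box)             # before the cut
    survivors = [mesh[t] for t in marked - removed]  # after the cut
    return bounding_box(survivors) if survivors else None

mesh = {
    0: ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0)),
    1: ((0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0)),
    2: ((0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 6.0)),  # outside the markup
}
capital = Box((0.0, 0.0, 0.0), (1.0, 1.0, 2.0))
new_capital = update_markup(mesh, capital, removed={1})
print(new_capital)
```

The same pattern applies to other referencing methods: enumerate the affected primitives before the edit, then rebuild a reference of the same type from what remains.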

Figure 1. Creating a scanned artifact. (a) Original input data. Three of 20 noisy range maps, untextured. (b) Simplified versions of the range maps, textured and untextured. A rectangular region is marked in different triangulations. (c) Several range maps are integrated and smoothed. The gravestone is manually segmented for a semantic markup.

The crucial point is that the shape representation must be able to enumerate the shape components in the reference. This method also serves to answer spatial queries, in this case a bounding-box query. Other queries are closeness to a point, containment in a frustum, or ray intersection. Some spatial queries can be quite complicated to compute, for example the intersection of a NURBS patch and a bounding box, which can result in a collection of trimmed patches. So, the open question is, What is the set of shape queries that every shape representation should be required to answer?
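One way to think about a required query set is as an abstract interface that every shape representation must implement. The sketch below is only an illustration of that idea: the interface `Shape` and its three methods are assumptions chosen from the queries named above (box overlap, closeness to a point, ray intersection), and a solid sphere serves as the simplest possible implementation.

```python
import math
from abc import ABC, abstractmethod
from typing import Optional, Tuple

Vec = Tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

class Shape(ABC):
    """Hypothetical minimal query set for any shape representation."""

    @abstractmethod
    def overlaps_box(self, lo: Vec, hi: Vec) -> bool: ...

    @abstractmethod
    def distance_to(self, p: Vec) -> float: ...

    @abstractmethod
    def ray_hit(self, origin: Vec, direction: Vec) -> Optional[float]: ...

class Sphere(Shape):
    """A solid sphere, for which all three queries have closed-form answers."""

    def __init__(self, center: Vec, radius: float):
        self.center, self.radius = center, radius

    def overlaps_box(self, lo: Vec, hi: Vec) -> bool:
        # Clamp the center to the box; the box overlaps if that point is in the ball.
        nearest = tuple(min(max(c, l), h) for c, l, h in zip(self.center, lo, hi))
        d = sub(nearest, self.center)
        return dot(d, d) <= self.radius ** 2

    def distance_to(self, p: Vec) -> float:
        d = sub(p, self.center)
        return max(0.0, math.sqrt(dot(d, d)) - self.radius)

    def ray_hit(self, origin: Vec, direction: Vec) -> Optional[float]:
        # Solve |origin + t * direction - center|^2 = radius^2 for the smallest t >= 0.
        oc = sub(origin, self.center)
        a, b = dot(direction, direction), 2.0 * dot(oc, direction)
        c = dot(oc, oc) - self.radius ** 2
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None
        root = math.sqrt(disc)
        for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
            if t >= 0.0:
                return t
        return None

ball = Sphere((0.0, 0.0, 0.0), 1.0)
print(ball.overlaps_box((0.5, 0.5, 0.5), (2.0, 2.0, 2.0)),
      ball.distance_to((3.0, 0.0, 0.0)),
      ball.ray_hit((-3.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
```

For a sphere every query is trivial; the open question raised above is which of these queries can reasonably be demanded from, say, a NURBS model or a procedural shape.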

5. Documenting provenance and processing history

Imagine a complex 3D scene is collected and assembled from different, heterogeneous sources (such as photogrammetry, laser scanning, and manual repair), and then archived. Much later, somebody discovers an important shape detail, cuts it out from the larger shape (by manual segmentation), and sends it electronically to an expert. It is, of course, vital that the expert can trace the provenance of the different parts of the shape to assess its degree of authenticity.

More precisely, the problem can be stated as follows: First, find or define a suitable standard for describing the provenance of digital 3D data. Second, define a standard way of recording how the data have been processed, and how they were combined to obtain the resulting 3D data set. Ideally, the processing history is complete; that is, it has the replay property, which enables regenerating the result and varying the parameters.
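The replay property can be made concrete with a small sketch. It rests on strong simplifying assumptions: the data set is a plain list of numbers rather than a 3D model, only two named operations exist, and the names `History`, `Step`, and `OPERATIONS` are hypothetical. The point is merely that a history which records operation names and parameters, together with the source, is sufficient to regenerate the result and to vary a parameter afterwards.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Registry of named operations; a stand-in for real shape-processing tools.
OPERATIONS: Dict[str, Callable[..., List[float]]] = {
    "scale": lambda data, factor: [x * factor for x in data],
    "clip": lambda data, lo, hi: [min(max(x, lo), hi) for x in data],
}

@dataclass
class Step:
    op: str
    params: Dict[str, Any]

@dataclass
class History:
    source: str                                  # provenance of the raw data
    steps: List[Step] = field(default_factory=list)

    def apply(self, data: List[float], op: str, **params: Any) -> List[float]:
        """Run an operation and record it so that it can be replayed later."""
        self.steps.append(Step(op, params))
        return OPERATIONS[op](data, **params)

    def replay(self, raw: List[float],
               overrides: Optional[Dict[int, Dict[str, Any]]] = None) -> List[float]:
        """Regenerate the result from raw data, optionally varying parameters."""
        overrides = overrides or {}
        data = list(raw)
        for index, step in enumerate(self.steps):
            params = {**step.params, **overrides.get(index, {})}
            data = OPERATIONS[step.op](data, **params)
        return data

raw = [1.0, 2.0, 3.0]
history = History(source="laser scan (hypothetical)")
result = history.apply(raw, "scale", factor=2.0)
result = history.apply(result, "clip", lo=0.0, hi=5.0)

replayed = history.replay(raw)
varied = history.replay(raw, overrides={0: {"factor": 1.0}})
print(result, replayed, varied)
```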

The enormous complexity of this problem might not be apparent. First, the system must cope with two levels of heterogeneity, namely the various shape representations and the various operations on these representations. Shape-editing operations are not canonical: Every 3D software package has its own great mesh-editing functions, its NURBS intersection routines, and its own CSG implementation. Second, with interactive editing, storing each manual processing step is barely feasible and hardly useful. Third, replaying is extremely difficult, requiring that all tools in the chain support the processing history and add to it. This process will fail with the weakest link in the chain, not to mention issues with software versions, humans in the loop, operating systems, and undocumented ad-hoc scripting. Michael Pratt reports more subtle problems in his great article on the state of the art in CAD model exchange [2].

6. Consistency between shape and meaning

Assume that the meaning of a shape is known, either from manual annotation or from automatic extraction. The questions arise: Where is this information stored? And following which standards? Standardization is of prime importance here to share and exchange the acquired higher-level knowledge. Document collections make sense only if they are searchable, that is, when a query can retrieve all the 3D objects that are classified as “columns” or “capitals,” for example, using these keywords.

It might be worth exploring whether and how emerging cultural documentation standards such as the Conceptual Reference Model of the International Committee for Documentation (CIDOC-CRM) [3] can be extended to cover 3D object semantics. The temple example illustrates the great semantic density of 3D objects. Some physical remains are scanned while missing parts are typically modeled in a CAD program, based on some reasonable assumptions. The identical copies of a column all share the is-instance-of relationship to the original. In a procedural temple reconstruction, certain parameters such as the length may be computed as a function of the height, based on a textbook on Greek proportions from the 19th century. An excavated, scanned fresco might in fact be put back into several different places based on alternative, equally plausible hypotheses. Entity-relationship models such as the CRM appear to be offering a way to model these sorts of information in a consistent way.

But where are the authoring tools for such dense 3D knowledge networks? How can we query, visualize, and navigate through them? And how is the knowledge base kept consistent in case a 3D scene is edited? The overlaps-with relationship must be removed when a pair of overlapping objects is separated by changing a transformation matrix, and it must be added again when they are moved back together.
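A minimal sketch of such consistency maintenance follows. It makes several simplifying assumptions that are not in the article: objects are represented by axis-aligned boxes, the transformation is reduced to a translation instead of a full matrix, and the relation is simply recomputed after every edit. The class `Scene` and its methods are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Vec = Tuple[float, float, float]
Box = Tuple[Vec, Vec]  # (lower corner, upper corner)

def translate(box: Box, offset: Vec) -> Box:
    lo, hi = box
    return (tuple(c + o for c, o in zip(lo, offset)),
            tuple(c + o for c, o in zip(hi, offset)))

def overlaps(a: Box, b: Box) -> bool:
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))

class Scene:
    """Scene whose overlaps-with relations are refreshed after every edit."""

    def __init__(self) -> None:
        self.local_boxes: Dict[str, Box] = {}
        self.offsets: Dict[str, Vec] = {}
        self.relations: Set[Tuple[str, str, str]] = set()

    def add(self, name: str, box: Box, offset: Vec = (0.0, 0.0, 0.0)) -> None:
        self.local_boxes[name] = box
        self.offsets[name] = offset
        self._refresh()

    def move(self, name: str, offset: Vec) -> None:
        # Editing the transformation triggers an update of the knowledge base.
        self.offsets[name] = offset
        self._refresh()

    def _refresh(self) -> None:
        world = {n: translate(b, self.offsets[n]) for n, b in self.local_boxes.items()}
        self.relations = {
            (a, "overlaps-with", b)
            for a, b in combinations(sorted(world), 2)
            if overlaps(world[a], world[b])
        }

scene = Scene()
scene.add("cup", ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)))
scene.add("table", ((0.0, 0.0, -1.0), (4.0, 4.0, 0.5)))
before = set(scene.relations)
scene.move("cup", (0.0, 0.0, 2.0))   # lift the cup off the table
separated = set(scene.relations)
scene.move("cup", (0.0, 0.0, 0.0))   # put it back
restored = set(scene.relations)
print(before, separated, restored)
```

Recomputing all pairs does not scale to large scenes; it only shows that geometric edits and semantic relations have to be coupled somewhere in the system.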

7. Closing the semantic gap

Shape acquisition is very much like 3D photography. A 3D scan is a snapshot that takes color and depth samples from surfaces. It glues together all the different objects in a scene: The cup becomes one with the table it stands on; the door is a priori indistinguishable from the wall. A single triangle or a point sample does not know which semantic unit it is taken from.

This problem, the difference between a shape and its meaning and between representation and content, is known as the semantic gap. It is a high-level problem that has consequences on the lowest level: Mesh simplification tries to remove redundant triangles, postulating that redundancy can be measured with a mathematical expression. This is highly problematic, of course, from a methodological point of view, since the validity of this assumption can be neither proved nor disproved.

What is a shape feature? Invariance of certain shape features is often stated as a goal of shape-processing algorithms. Simplification maintains crease features in a mesh, and smoothing removes high-frequency noise but leaves low frequencies untouched. But the term shape feature as such is never defined and appears to exist as an independent idea. Shrinking is perceived as an artifact, or unwanted side effect, of Laplacian smoothing. But does this make Laplacian smoothing wrong? Of course not; it does what it does.

A more complete theory of the meaning of shape is needed that goes beyond sampling and curvature. A good starting point might be the notion of saliency that has received some attention recently [4]. Another might be to formalize the very precise ideas (and language!) that many “shape professionals” have about shape, for example in the car industry.

Shape matching and 3D retrieval. Assume the meaning of a (high) number of shapes is known. Could this help in deriving the meaning of new shapes? One way to approach shape semantics is to classify a large number of shapes, typically by quantifying certain properties. The resulting feature vectors are then treated with the machinery from high-dimensional statistics, for example to identify clusters of shapes. The assumption is again that similar shapes also have similar feature vectors. This method also yields an implicit description of shape. Only very few methods also permit explicitly synthesizing shape, so that the feature space actually becomes a parameter space.
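As a concrete, deliberately simple instance of a feature vector, the sketch below describes a point set by a normalized histogram of its pairwise distances and compares two descriptors with an L1 distance. This particular descriptor is an assumption chosen for brevity, not the method of any system discussed here; it is invariant to translation, rotation, and uniform scaling, but far too coarse for real retrieval.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def distance_histogram(points: Sequence[Point], bins: int = 4) -> List[float]:
    """Feature vector: histogram of pairwise distances, scaled by the largest one."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    longest = max(dists)
    hist = [0] * bins
    for d in dists:
        # Dividing by the longest distance makes the descriptor scale invariant.
        hist[min(int(d / longest * bins), bins - 1)] += 1
    return [count / len(dists) for count in hist]

def descriptor_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

cube = [(float(x), float(y), float(z)) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
big_cube = [(3.0 * x, 3.0 * y, 3.0 * z) for x, y, z in cube]
rod = [(float(i), 0.0, 0.0) for i in range(8)]

same = descriptor_distance(distance_histogram(cube), distance_histogram(big_cube))
different = descriptor_distance(distance_histogram(cube), distance_histogram(rod))
print(same, different)
```

The two cubes receive identical descriptors while the rod does not, which is exactly the assumption stated above: similar shapes should have similar feature vectors. Note that the mapping is not invertible, so this feature space is not a parameter space.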

Content-based retrieval is one of the distinguishing features of a digital library. In the case of 3D, this means shape retrieval. So, an in-depth survey on the promising results in this hot and vibrant research field over the past several years is necessary [5].

The ultimate goal: High-level editing of scanned shapes. We propose as a guiding vision the fully automatic segmentation of a scanned scene in such a way that each and every object is recognized, its degrees of freedom are identified, and it is presented to the user readily editable in a way that respects its inherent structure. One generic model of a table, for instance, might have the high-level parameters width, height, and length. With many repetitive structures, like building facades, ship hulls, or ornaments, the changes they undergo when the overall dimensions change are also very obvious. Following this idea, an idealized shape template can be used to semantically enrich a scanned model. When the parameters of a generic table are adapted to match a specific scanned table, each scan point not only knows where it belongs (leg, table top, table sides, and so on), it ideally also knows where it has to go when the parameters change. Then, at least in principle, it is even possible to separate structure from appearance, as Figure 2 suggests.
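The table example can be sketched under heavy simplification. The assumptions are all illustrative: the generic table is reduced to its three high-level parameters plus a top thickness, a scan point is labeled only by its height, and it is attached to the template through coordinates normalized by the table dimensions. The names `TableParams`, `attach`, and `place` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float, float]

@dataclass
class TableParams:
    """High-level parameters of a generic table template (units arbitrary)."""
    width: float
    length: float
    height: float
    top_thickness: float = 0.05

def attach(point: Point, params: TableParams) -> Tuple[str, Point]:
    """Give a scan point a semantic label and template-relative coordinates."""
    x, y, z = point
    label = "top" if z >= params.height - params.top_thickness else "leg"
    return label, (x / params.width, y / params.length, z / params.height)

def place(normalized: Point, params: TableParams) -> Point:
    """Tell an attached point where to go for a new set of parameters."""
    u, v, w = normalized
    return (u * params.width, v * params.length, w * params.height)

scanned = TableParams(width=2.0, length=1.0, height=0.8)
label, uvw = attach((1.0, 0.5, 0.8), scanned)

edited = TableParams(width=3.0, length=1.0, height=1.0)
moved = place(uvw, edited)
print(label, uvw, moved)
```

A realistic template would attach each point to an individual part with its own local frame, so that, for example, legs keep their cross section when the table is widened.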

Figure 2. Separation of structure from appearance. A generic chair model, parameterized with five 3D points, is adapted to given chairs taken from Fiell and Fiell’s 1,000 Chairs (Taschen Verlag, 1997). Surprisingly different objects share the same structure, so they are close on a structural level. This closeness in the semantic sense requires completely new distance metrics.

3D as a full member of the generalized documents family

What is the great reward once all these difficult problems are solved? At that point, 3D documents can be used in much the same way as any other document type. Full markup, indexing, and retrieval capabilities will allow a deep integration of 3D into digital libraries. Digital document technology, however, is to a large extent based on XML. The homepage of the World Wide Web Consortium (http://www.w3.org) lists more than five dozen XML-related technologies, and it is most inspiring to speculate about their significance for 3D. Many of these technologies are directly applicable:

■ XInclude: Transparent inclusion of remotely stored subscenes. When an object is unreachable, a locally stored low-resolution approximation is used automatically.

■ XPointer/XPath: Flexible referencing of subparts of a 3D object; very general framework for references, for example, for spatial indexing of nodes or of whole subscenes; transparent access to objects stored in the file system, in a database, or on a Web server.

■ Extensible Stylesheet Language Transformations (XSLT): Personalized rendering, and context-dependent views; can deal with different versions of objects; may provide information in different languages.

■ XLink: Multidirectional links between several documents, for example, to relate artifact and interpretation in a bidirectional way; stored externally, so it works also for read-only documents (DOI); link-on-link, to refer to the fact that a link exists.

■ XML encryption: To protect intellectual property rights: signing of documents, certificates of authenticity.

■ Web services: All sorts of operations on XML: filter, process, create views; actions to perform can be encoded in a URL; can generate XML scenes dynamically as response to a Web query; session management, billing, cookies, user rights management.
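To give a flavor of what path-based referencing of subparts could look like, the following sketch queries a tiny XML scene description with the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module. The scene schema (elements `scene`, `object`, `part` and the attributes `id` and `label`) is invented for this example and does not correspond to any existing 3D format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML scene description; the schema is an assumption.
SCENE_XML = """
<scene>
  <object id="venus" label="Venus de Milo">
    <part id="head"><part id="cheeks"/></part>
    <part id="torso"/>
  </object>
  <object id="temple">
    <part id="capital-1" label="Corinthian capital"/>
  </object>
</scene>
"""

scene = ET.fromstring(SCENE_XML)

# Reference a subpart of one object by a path expression.
cheeks = scene.find(".//object[@id='venus']//part[@id='cheeks']")

# Collect the labels of all parts that carry a semantic markup.
labels = [part.get("label") for part in scene.findall(".//part[@label]")]

print(cheeks.get("id"), labels)
```

Such a path only addresses nodes of a scene hierarchy; referencing a region on a surface inside a node is the stable markup problem discussed in challenge 3.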

Now combine this digital document technology (3D and XML) with the simple fact that the world is three-dimensional. The result is a truly fascinating vision of 3D computer graphics as the means for interacting with the world of information. Semantically enriched 3D objects will act as proxies to create, store, query, access, and edit pieces of data more efficiently than ever before. The most important prerequisite is just that the human and the computer both know, and agree on, the meaning of the 3D objects used for communication. ■

Acknowledgments

This article is based on an Epoch project paper published in the Proceedings of the 7th International Symposium on Virtual Reality, Archaeology and Cultural Heritage (VAST 2006). The work was supported by the project Probado (funded by the German Research Foundation) and the Epoch network of excellence (funded by the European Commission under IST-2002-507382).

References

1. Strategic Research Initiative V3D2: Distributed Processing and Delivery of Digital Documents, D.W. Fellner, coordinator, German Research Foundation, 1997-2004; http://v3d2.tu-bs.de.

2. M. Pratt, “Extension of ISO 10303: The Step Standard, for the Exchange of Procedural Shape Models,” Proc. Int’l Conf. Shape Modeling and Applications (SMI), 2004, pp. 317-326.

3. N. Crofts et al., eds., Definition of the CIDOC Conceptual Reference Model, version 4.2, CIDOC Documentation Standards Working Group, June 2005; http://cidoc.ics.forth.gr/docs/cidoc_crm_version_4.2.pdf.

4. R. Gal and D. Cohen-Or, “Salient Geometric Features for Partial Shape Matching and Similarity,” ACM Trans. Graphics, vol. 25, no. 1, 2006, pp. 130-150.

5. B. Bustos et al., “Content-Based 3D Object Retrieval,” to be published in IEEE Computer Graphics & Applications, July/Aug. 2007.

Contact author Sven Havemann at [email protected].

Contact department editor Miguel Encarnação at [email protected].



Coming Next Issue: 3D Documents

Digital libraries, in general, and technical or cultural preservation applications, in particular, offer a rich set of multimedia objects like audio, music, images, videos, and 3D models. Instead of handling these objects consistently as regular documents, most applications handle them differently. As more artifacts in the technical and engineering world are digitally born, content categorization, abstraction, and adequate representation are increasingly vital. The July-August 2007 issue of IEEE Computer Graphics and Applications will provide in-depth coverage of this timely topic in its Special Issue on 3D Documents.