
Page 1: A FORMAL CONCEPTUAL MODEL AND DEFINITIONAL …yvanbedard.scg.ulaval.ca/wp-content/documents/publications/588.pdf · Sonia Rivest Les cubes de données spatiales étendent le concept

G E O M A T I C A

A FORMAL CONCEPTUAL MODEL AND DEFINITIONAL FRAMEWORK FOR SPATIAL DATACUBES

Mehrdad Salehi, Yvan Bédard and Sonia Rivest
Département des sciences géomatiques, Université Laval, Québec

Spatial datacubes extend the datacube concept underlying the field of Business Intelligence (BI) into the realm of spatial analysis, geographic knowledge discovery, and spatial decision-support. The traditional computer science community has defined spatial datacubes and their fundamental components (e.g., spatial dimension and spatial measure) through formal models that limit spatial data to only those data that have a geometric representation. The geomatics community has pursued spatial datacube models with a much richer view of spatial data. However, the models proposed by the geomatics community have not yet been formalized using precise mathematical languages. This paper, for the first time, integrates the rigor of mathematical languages with the richer view of spatial data to provide a formal model and precise definitions of spatial datacubes and their fundamental components. The proposed definitions provide the scientific community with a common and precise terminology for the concepts involved in spatial decision-support databases.

1. Introduction

Strategic decision makers (analysts, executives, and managers) need to analyze and compare summarized data extracted from very large volumes of data. Indeed, it is more efficient to use aggregated and consolidated data covering a certain period of time rather than detailed individual records of transactional databases for strategic decision making. The difficulty in supporting both daily transactions and decision-support needs within a single database requires using a dual-database approach. This forms the typical backbone of data warehouses [Bédard and Han 2008]. A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organisational decision making [Chaudhuri and Dayal 1997]. Data warehouses are typically modeled using the datacube (or multidimensional, in the sense of business intelligence) paradigm [Gray et al. 1997; Abelló et al. 2006]. In the datacube structure, analysis is performed along a combination of axes of analysis called dimensions (e.g., categories of products, administrative regions, periods), and hence the structure is termed multidimensional. Each dimension includes one or several hierarchies, each composed of different analysis levels (e.g., a city-province-country hierarchy and a city-county-region-country hierarchy, which may compose a spatial dimension labelled “administrative regions”). The hierarchical structure allows users to view and analyze data at different levels of detail. An instance of a level is a member (e.g., “Montreal” is a member of the level “city”). Measures (e.g., population) are measurable quantities; these are analyzed against the members of different levels of dimensions. For instance, one may be interested in analyzing the measure “population” with respect to different levels of the “administrative regions” and “time” dimensions.
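As an illustration of the vocabulary introduced above (dimension, hierarchy, level, member), the following is a minimal Python sketch. The class name `Dimension` and its fields are assumptions made for this example only; they are not part of any model discussed in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dimension:
    """Hypothetical container for one axis of analysis."""
    name: str
    # Each hierarchy is an ordered list of level names, finest level first.
    hierarchies: list[list[str]]
    # Members (instances) recorded per level name.
    members: dict[str, set[str]] = field(default_factory=dict)

    def levels(self) -> set[str]:
        """Return every level that appears in at least one hierarchy."""
        return {level for hierarchy in self.hierarchies for level in hierarchy}


# The two hierarchies given in the text for "administrative regions".
admin = Dimension(
    name="administrative regions",
    hierarchies=[
        ["city", "province", "country"],
        ["city", "county", "region", "country"],
    ],
    members={"city": {"Montreal"}},
)
```

Here "Montreal" is recorded as a member of the level "city", and `admin.levels()` collects the five distinct levels shared across the two hierarchies.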

GEOMATICA Vol. 64, No. 3, 2010 pp. 321 to 332

Mehrdad Salehi

Yvan Bédard

Sonia Rivest

Spatial datacubes extend the datacube concept underlying the field of business intelligence into the domains of spatial analysis, geographic knowledge discovery, and spatial decision support. The traditional computer science community has defined spatial datacubes and their fundamental components (e.g., the spatial dimension and the spatial measure) by means of formal models that limit spatial data to only those that can have a geometric representation. The geomatics community has developed spatial datacube models with a much richer view of spatial data. However, the models proposed by the geomatics community have not yet been formalized using precise mathematical languages. This article integrates, for the first time, the rigor of mathematical languages with the richer view of spatial data in order to present a formal model and precise definitions of datacubes and their fundamental components. The proposed definitions offer the scientific community a common and precise terminology for the concepts involved in databases that support decisions.


Values resulting from unique combinations between members of different dimension levels, along with their measure, are known as facts (e.g., “the number of sport articles sold in Montreal in the first quarter of 2006 is 1,800,000” is a fact). A datacube is composed of a number of facts. In order to speed up query answering, a datacube usually includes a number of precomputed facts. A tool called On-Line Analytical Processing (OLAP), which includes data exploration operators such as roll-up and drill-down, is used to interactively query a datacube [Bédard et al. 2004].
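The notions of fact and roll-up can be sketched in a few lines of Python. Only the Montreal figure comes from the text; the other two facts, the city-to-province mapping, and the function name `roll_up` are hypothetical and serve only to make the aggregation visible.

```python
from collections import defaultdict

# A fact pairs one member per dimension with a measure value (units sold).
facts = {
    ("Montreal", "2006-Q1", "sport articles"): 1_800_000,   # from the text
    ("Quebec City", "2006-Q1", "sport articles"): 600_000,  # hypothetical
    ("Toronto", "2006-Q1", "sport articles"): 2_100_000,    # hypothetical
}

# Parent of each "city" member at the "province" level.
city_to_province = {
    "Montreal": "Quebec",
    "Quebec City": "Quebec",
    "Toronto": "Ontario",
}


def roll_up(facts, parent_of):
    """Aggregate facts from the city level to the level given by parent_of."""
    totals = defaultdict(int)
    for (city, period, product), value in facts.items():
        totals[(parent_of[city], period, product)] += value
    return dict(totals)


by_province = roll_up(facts, city_to_province)
```

Storing `by_province` alongside the detailed facts corresponds to the precomputed facts mentioned above: a later query at the province level can read the total directly instead of summing city-level facts again.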

It is estimated that about 80% of the data in enterprise databases have a spatial reference [Franklin 1992]. Such references take diverse forms, such as civic addresses, place names, and coordinates. In order to derive the maximum benefit from spatial data and from the efficiency of the datacube structure in decision making, the first definitions of spatial datacubes were proposed at the end of the 1990s. This took place in the geomatics and computer science communities through pioneering works at the universities of Simon Fraser (Jiawei Han’s team), Minnesota (Shashi Shekhar’s team), and Laval (Yvan Bédard’s team) [Bédard et al. 2007]. Spatial datacubes provide capabilities not inherent to transaction-oriented systems such as geographic information systems (GIS) and spatial database engines (universal servers), and aim at supporting interactive complex analysis involving spatial and temporal data.

The early investigations into spatial datacubes and spatial OLAP (SOLAP) by the geomatics community were published in the Geomatica journal [see Rivest et al. 2001]. These characteristics were then refined by the geomatics community, and models for spatial datacubes were proposed [Rivest et al. 2005; Bédard and Han 2008]. These models consider spatial data as any data that is used to localize a phenomenon on the Earth (e.g., street addresses and geographic coordinates) regardless of the representation method (i.e., by geometries or by text). This view of spatial data is consistent with international standards in geomatics such as [ISO/TC211 2003]. The definition given by these authors to spatial dimensions is not limited to dimensions whose levels’ members have geometric representations. However, these models do not explicitly provide precise definitions for all components of spatial datacubes. For instance, they do not give precise definitions for the spatial fact and the spatial datacube. Moreover, the model and concepts defined by the geomatics community have not yet been supported by formal definitions, i.e., they are not defined using a precise and unambiguous mathematical language.

In the mid-2000s, a number of formal models for spatial datacubes were also proposed by the computer science community (see for example Damiani and Spaccapietra [2006] and Bimonte [2007]). Although the formalisation of these models is valuable, they do not provide common definitions for every fundamental component of the spatial datacube, such as the spatial dimension, spatial measure, and spatial fact. In particular, the suggested formal definitions are based on a restrictive perspective that considers spatial data as only those data that have a geometric representation. Because this restricted view of spatial data is not in sync with ISO and OGC international standards, the subsequent definitions given to the fundamental components of spatial datacubes do not reveal the entire power of spatial data within these datacube components. For instance, the dimension “administrative regions”, with the three levels “city”, “province”, and “country” whose members (e.g., Montreal, Quebec, and Canada, respectively) do not have a geometric representation, would not be considered a spatial dimension. Although members of this dimension may not have a geometric representation, they can still be used to refer to geographic locations and to locate a phenomenon in space (e.g., population of the city of Montreal in 2008). Therefore, this dimension is intrinsically spatial. Likewise, measures such as “road length” and “region area” are not considered spatial measures. While not having a geometric representation, these measures convey a spatial property of features that can be used to make thematic maps and perform spatial analysis (such as the number of kilometers of roads per city, county, and province). Hence these measures are inherently spatial. As is clearly demonstrated by Caron [1998], OLAP has powerful potential for spatiotemporal analysis even if spatial data are not represented geometrically.

The above discussions illustrate the need for an enhanced model for spatial datacubes that, for the first time, integrates the rigor of formal models with the richer notion of spatial data, to provide a formal model and precise definitions for the fundamental components of spatial datacubes such as the spatial dimension, spatial measure, and spatial fact.

In this paper, we make the following contributions:

1. We review and analyze the existing key models for spatial datacubes, focusing especially on the definitions suggested by these models for the fundamental components of spatial datacubes (Section 2).

2. We present a formal model for spatial datacubes at the conceptual level with a primary focus on defining a “spatial datacube” and its components. In order to achieve this, after defining the general components of a datacube (i.e., attribute, level, dimension, measure, hyper-cell, and datacube), the “spatial” equivalents of these components are defined at both the schema and instance levels (Section 3). The definitions of the “spatial” components are based on a broader perspective of spatial data aligned with the international standards for geomatics, i.e., any data that provides a means to localize a phenomenon. Finally, we discuss a number of characteristics of the proposed spatial datacube model (Section 4).

2. State-of-the-Art on Spatial Datacube Modeling

Bédard [1997], Bédard et al. [2001], Rivest et al. [2005], Bédard et al. [2007], and Bédard and Han [2008] describe the structure of spatial datacube models and define and categorise spatial dimensions and spatial measures. These definitions rely on a rich perspective of spatial data that involves any data used to localize phenomena on the Earth (e.g., place names, road addresses, and coordinates). Consequently, the definition given by these authors to spatial dimensions is not limited to dimensions whose members have geometric representations. Rivest et al. [2005] categorize spatial dimensions into three types: non-geometric, geometric, and mixed. In a non-geometric spatial dimension, the spatial reference is nominal (e.g., place names), and no geometric representation is associated with the members of this dimension. The other two types of spatial dimensions include geometric data on a map, and allow the members of the dimension to be visualized and queried graphically. In a geometric spatial dimension, all the members of all the levels have geometric representations, while in a mixed spatial dimension, some members have no geometric representation. Similarly, two types of spatial measures are recognized by the geomatics community [Rivest et al. 2005]. A geometric spatial measure is defined as the set of geometries representing spatial objects, such as “accident locations”. Numeric spatial measures such as “distance” and “area” are numeric values that are the results of using spatial operators. Recently, a third type of spatial measure, called a complete spatial measure, was introduced by Bédard and Han [2008]. This type of spatial measure is specific to raster datacubes, and is a combination of a numeric and a geometric spatial measure. It encompasses, for example, pairs consisting of a raster cell position and its associated value. Rivest et al. [2005] introduce different SOLAP navigation operators, such as spatial drill-down, spatial roll-up, and spatial drill-across. These operators have been implemented in JMap SOLAP, the first commercialized SOLAP software product [KHEOPS Technologies 2005]. We should add that the above research works do not provide precise definitions for some other fundamental components of spatial datacubes, such as the spatial fact. Moreover, the model and concepts defined by the geomatics community have not yet been presented using an unambiguous and precise mathematical language.
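The three-way categorization of spatial dimensions attributed to Rivest et al. [2005] can be paraphrased as a small decision rule. The sketch below is only an illustration: the function name, the encoding of a dimension as nested dictionaries, and the coordinate values are assumptions, and the input is presumed to be a dimension already recognized as spatial.

```python
def classify_spatial_dimension(levels):
    """Classify a spatial dimension as non-geometric, geometric, or mixed.

    `levels` maps a level name to a dict of member name -> geometry,
    where None means the member has no geometric representation.
    """
    has_geometry = [
        geometry is not None
        for members in levels.values()
        for geometry in members.values()
    ]
    if not any(has_geometry):
        return "non-geometric"  # nominal references only, e.g. place names
    if all(has_geometry):
        return "geometric"      # every member of every level has a geometry
    return "mixed"              # some members lack a geometry


# Place names only: still a spatial dimension, of the non-geometric type.
by_name = {"city": {"Montreal": None}, "province": {"Quebec": None}}

# Hypothetical point geometries given as (latitude, longitude) pairs.
by_point = {"city": {"Montreal": (45.50, -73.57)}}

partly_mapped = {"city": {"Montreal": (45.50, -73.57)}, "province": {"Quebec": None}}
```

Under this reading, a dimension described purely by place names is classified as "non-geometric" rather than being excluded from the spatial dimensions altogether.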

Han et al. [1998] introduce a model for implementing spatial datacube applications. This model relies on the well-known implementation models of datacubes, i.e., star/snowflake schemas, and consists of dimensions and measures. In this model, three types of dimensions are recognized. Non-spatial dimensions do not include geometric data. In spatial-to-spatial dimensions, all levels have geometric data associated with their members. When the lower levels of a dimension include geometric data but the levels above a certain level do not, the dimension is said to be spatial-to-nonspatial. According to these definitions, we can deduce that the spatiality of a dimension depends on having at least one geometric member. Therefore, a dimension that does not involve a geometric representation but addresses a spatial phenomenon (e.g., by names of cities, provinces, and countries) is not considered a “spatial” dimension in this model. In addition, Han et al. [1998] distinguish two types of measures for spatial datacubes: numerical measures and spatial measures. A numerical measure contains only numerical data. In order to be qualified as spatial, a measure should contain one or a collection of pointers to geometries. We should note that the categorization of measures, based on the storage format rather than the nature of the data, is at the implementation level, not the conceptual level. Accordingly, new definitions are required to extend the definition of spatial dimensions and to provide a categorization and a definition for spatial measures at the conceptual level.
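To make the contrast concrete, the three dimension types described for Han et al. [1998] can be written as a rule over the levels of a dimension. This is a hedged paraphrase of the prose above, not the authors' formalization: the function name, the boolean encoding, and the "unclassified" fallback for patterns the text does not cover are all assumptions.

```python
def classify_han_dimension(level_has_geometry):
    """Classify a dimension from per-level flags, ordered lowest level first.

    Each flag states whether the members of that level carry geometric data.
    """
    if not any(level_has_geometry):
        return "non-spatial"
    if all(level_has_geometry):
        return "spatial-to-spatial"
    # Geometric lower levels followed only by non-geometric upper levels.
    first_non_geometric = level_has_geometry.index(False)
    if first_non_geometric > 0 and not any(level_has_geometry[first_non_geometric:]):
        return "spatial-to-nonspatial"
    return "unclassified"  # pattern not described in the text


# "administrative regions" stored by name only (city, province, country).
names_only = [False, False, False]
```

With `names_only`, the rule returns "non-spatial", which is exactly the outcome criticized above: a dimension that localizes phenomena by place names is not treated as spatial.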

In order to extend the concept of datacube to the spatial domain, Shekhar et al. [2001] introduced the “map cube” operator. This operator accepts a base map and a table associated with the map and generates a set of maps for analysis and comparison. The output maps are produced using OLAP operations on hierarchies and measures. The authors also propose a formal classification for geometric aggregation functions. Within this research, no definitions for specific components of spatial datacubes such as spatial dimension, spatial measure, and spatial fact have been provided.


At the conceptual level, Jensen et al. [2004] introduce a spatial datacube model for use in location-based services. This model is an n-dimensional fact schema consisting of a fact type and a set of dimension types. A dimension type includes a set of category types (or levels) and a partial and a total containment relationship between category types. This model has two distinctive characteristics compared with other proposed models: (1) it accepts partial containment relationships between two geometric levels of a hierarchy, and (2) it handles the imprecision of aggregation paths. An algebra is also proposed to serve as a basis for the model’s query language. In this work, however, no definition is given for the specific components of spatial datacubes.

Another framework for implementing spatial datacubes based on the star schema, called “GeoDWFrame”, is proposed by Fidalgo et al. [2004]. They classify and informally define different types of spatial dimensions in terms of the approach that is used to implement the dimensions, i.e., the technique used to normalize and to store the geometric and descriptive data. As stated by the authors, the principal idea underlying this classification is to reduce geometric data redundancy in implementing spatial datacubes. Because it is driven by implementation choices, this classification is not suitable for distinguishing spatial dimensions at the conceptual level.

To allow the modelling of spatial measures at multiple levels of geometric granularity, Damiani and Spaccapietra [2006] introduce a formal model at the conceptual level, called “Multi-granular Spatial Data warehouse” (MuSD). For navigating in MuSD, this model is integrated with an algebra, which includes a number of spatial and non-spatial operators. In this model, like the spatial-to-spatial dimension defined by Han et al. [1998], a spatial hierarchy is presented as a hierarchy where all the levels have a geometric representation. A spatial measure is considered, like a spatial dimension, as a hierarchy of levels with a geometric representation. As a result, we conclude that labelling a hierarchy, a dimension, or a measure as “spatial” requires it to have a geometric representation. Further, a spatial fact is defined as “a fact describing an event that occurred on the Earth in a position that is relevant to know and analyze”. In addition, this model explicitly defines a spatial datacube. The necessary condition for a datacube to be labelled as spatial is to have at least one measure with a geometric representation. However, we suggest that a datacube in which only the dimensions have geometric representations can also be considered a spatial datacube. The reason is that even when only the dimensions have geometric representations, the user can still interactively explore and visualize the maps of the dimensions provided in the datacube.

At the conceptual level, Bimonte [2007] presents a formal model, called “GeoCube”, with an algebra that supports spatial data within datacubes. The formal representation of GeoCube’s general components and operations is valuable, and is explained in detail through various examples. In this model, a geographic entity is considered as an entity with a geometric attribute. According to the international standards in geomatics, however, the definition of a geographic (or spatial) entity is not limited to entities with a geometric representation, but can also include non-geometric attributes. Based on this perspective, the definition given by Bimonte [2007] to a geographic dimension, i.e., a dimension whose members include geographic entities, is limited. He also introduces three types of hierarchies, i.e., descriptive, spatial, and generalization. A descriptive hierarchy is defined by descriptive attributes of objects. A spatial hierarchy is defined as a hierarchy whose levels are related by the topological relationships of inclusion and intersection. We should note that even with a geometric representation, the levels of a spatial hierarchy can be related based on semantic rather than topological relationships. For example, consider the spatial hierarchy “financial institution”, with the two levels “branch” and “headquarter”. While the members of these two levels have geometric representations on the map and are very likely disjoint, a branch is semantically associated with its headquarters. Finally, Bimonte defines a generalization hierarchy as a hierarchy where the members of different levels represent the same geographic information at different scales. He defines a geographic (or spatial) measure, in a similarly limited way, as an object with a geometric attribute. In this model, no definition is given for spatial facts.

In order to represent conceptual models of spatial datacubes visually, Malinowski and Zimányi [2008] introduce “MultiDim”, a spatially extended entity-relationship model. Based on this work, a conceptual model is created in terms of dimensions and the relationships between levels of dimensions (i.e., entities), which are modeled by a fact relationship. While the fact relationship includes measures, the dimensions consist of a number of hierarchies of levels. The authors define the concepts of spatial level, hierarchy, dimension, and measure. According to their definitions, a spatial level includes at least one attribute with a geometric representation. They require that a spatial hierarchy include at least one spatial level and that a spatial dimension include at least one spatial hierarchy. Malinowski and Zimányi [2008] thus define the spatiality of a level as having a geometric attribute. On the one hand, we notice that their definitions of the spatial hierarchy and the spatial dimension are different from the definitions suggested by Han et al. [1998] and Damiani and Spaccapietra [2006]. On the other hand, we can recognize that Malinowski and Zimányi [2008], like Han et al. [1998], consider a spatial measure as a measure represented geometrically. Although in a previous paper the authors considered a measure that holds a numeric value calculated using metric or topological operators as a spatial measure [Malinowski and Zimányi 2004], in this recent work, a measure calculated using spatial operators, such as “road length”, is considered a conventional measure. In summary, we can conclude that based on this model, in order to label a level, a hierarchy, a dimension, or a measure as “spatial”, these components should involve geometric data.

In this section, we reviewed two categories of models for spatial datacubes. The first category of these models, principally proposed by the geomatics community, has considered a richer and more comprehensive view of spatial data and subsequently has defined some of the components of spatial datacubes. However, the definitions provided by these models have not yet been formalized. The second category includes a number of formal models for spatial datacubes proposed by the computer science community, which has traditionally dealt with the frequent but simplest cases of spatial data. These models consider a limited notion of spatial data, restricted to features (members) with geometric representations, and typically not involving other types of spatial reference. These latter models ignore a large amount of data that are inherently spatial but, because they are not represented geometrically, are not considered spatial. Based on such limited assumptions, the definitions given to the components of spatial datacubes (e.g., spatial dimension, spatial measure, and spatial fact) do not correctly convey the power of spatial data integrated with these components. Among all the proposed models, we did not find a common and formal definition for all fundamental components of spatial datacubes.

3. A Model for Spatial Datacubes at the Conceptual Level

In this section, we will define a model for spatial datacubes at the conceptual level that includes formal definitions for “spatial” datacubes and their various constituents. These definitions are based on a broader view of spatial data that is consistent with the international standards in geomatics. The proposed model explicitly distinguishes between the schema (i.e., the intensional representation), which defines the structure of a datacube element, and the instance (i.e., the extensional representation), which is the value associated with a constituent. This section is organized so that after defining an element of the schema (i.e., level, dimension schema, measure, hyper-cell, and datacube schema), the definition of its instance (i.e., member, dimension instance, measure value, fact, and datacube) is provided.

The definitions of the elements of the model are followed by a number of examples. For this purpose, we consider a running example: the spatial datacube “fire disaster” for analyzing fire losses and injuries for different classes of fire in different administrative regions of Canada and the USA and at different epochs. The “fire disaster” datacube consists of three dimensions and three measures. The first dimension, “administrative regions”, has the following levels: “city”, “county”, “province/territory” (in Canada), “state” (in the USA), “country”, and “all”. The second dimension, called “time”, includes three levels: “day”, “month”, and “year”. “Fire class” is the third dimension and includes two levels: “fire class” and “all fire classes”. The levels of the three dimensions have a number of attributes. For example, the level “city” of the “administrative regions” dimension has the attribute “location”. The three measures of this datacube are “fire zone”, which geometrically represents the location of fire zones, “surface of destroyed residential area”, which expresses the area in km² of the residential zones that were destroyed by the fire, and “number of injuries”, which states the number of people injured by the fire.
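For reference, the schema of the running example can be written down as plain data. The dictionary layout and key names below are assumptions chosen for readability; the dimension, level, attribute, and measure names are those given in the text.

```python
# Schema of the "fire disaster" datacube as nested Python data.
fire_disaster = {
    "dimensions": {
        "administrative regions": [
            "city", "county", "province/territory", "state", "country", "all",
        ],
        "time": ["day", "month", "year"],
        "fire class": ["fire class", "all fire classes"],
    },
    # Attributes per (dimension, level); only the one named in the text.
    "level_attributes": {
        ("administrative regions", "city"): ["location"],
    },
    "measures": [
        "fire zone",                              # geometric representation
        "surface of destroyed residential area",  # area in km²
        "number of injuries",                     # count of injured people
    ],
}

# Number of levels per dimension, handy when checking the schema.
level_counts = {
    name: len(levels) for name, levels in fire_disaster["dimensions"].items()
}
```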

Definition 1: In order to describe a level, we need to define the level’s attributes. The level attribute $a_i$ (attribute, for short) is defined by the triple $a_i = (type, nature, domain)$ where:

• type is the data type associated with the attribute $a_i$.

• nature refers to the spatial, temporal, or thematic nature of the attribute $a_i$.

• domain is the domain of the attribute’s values.
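The triple of Definition 1 maps naturally onto a small record type. The following Python sketch is one possible encoding, under stated assumptions: the class and enum names are invented for this example, the four type values follow the categories the text lists for attribute types, and the domain is reduced to a free-text description.

```python
from dataclasses import dataclass
from enum import Enum


class AttrType(Enum):
    """How an attribute is represented."""
    NUMERIC = "numeric"
    TEXTUAL = "textual"
    DATE = "date"
    GEOMETRIC = "geometric"


class Nature(Enum):
    """What an attribute describes."""
    SPATIAL = "spatial"
    TEMPORAL = "temporal"
    THEMATIC = "thematic"


@dataclass(frozen=True)
class Attribute:
    type: AttrType
    nature: Nature
    domain: str  # hypothetical: free-text description of the allowed values


def is_spatial(attribute: Attribute) -> bool:
    """An attribute is spatial by its nature, whatever its type."""
    return attribute.nature is Nature.SPATIAL


# Two spatial attributes with different types, and one thematic attribute.
address = Attribute(AttrType.TEXTUAL, Nature.SPATIAL, "civic addresses")
footprint = Attribute(AttrType.GEOMETRIC, Nature.SPATIAL, "polygons")
price = Attribute(AttrType.NUMERIC, Nature.THEMATIC, "non-negative reals")
```

Keeping `type` and `nature` as separate fields means `address` and `footprint` are both spatial even though only one of them is geometric.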

The type of an attribute can be numeric (e.g., real and integer), textual, date (e.g., instant and interval), or geometric (e.g., point, line, polygon, and a set of these geometries). The nature of an attribute indicates whether that attribute describes a phenomenon in space, in time, or in a theme. Its nature, i.e., “what” the attribute represents, such as spatial data about the location of a shopping center, is independent of the type, i.e., “how” the attribute is represented, for instance by geometries on the map or by a textual address. The independence of the nature of an attribute from its representation method is necessary in order to describe an attribute appropriately at the conceptual level. Accordingly, we distinguish three categories of attributes by referring to their natures, i.e., spatial, temporal, and thematic attributes. Temporal attributes, such as “age”, convey temporal information on a phenomenon. Non-spatial and non-temporal information is described by thematic attributes such as “price”. Before describing spatial attributes, we explain spatial references and their categories according to the international standards in geomatics.

Spatial references are used to localize spatial features in the geographic space, and are divided into two categories: direct and indirect. Direct spatial referencing is achieved by means of geometries embedded in a coordinate system [ISO/TC211 2004]. However, spatial references are not limited to geometric coordinates. Indeed, indirect spatial references go beyond geometries, and use spatial identifiers such as place names, distances, and postal codes for spatial referencing [ISO/TC211 2003]. For example, a place name such as “Montreal” refers to the geographic location of the city of Montreal; it can be used alone to find this place on the Earth or it can be linked to geographic coordinates in a gazetteer to position this place on a map. A distance can be used to localize a phenomenon with respect to a linear reference system, such as a distance or a civic address number along a street. A postal code refers to a geographic region that is defined by address blocks or by a place name like a municipality name, allowing one to find it on the Earth. Multiple direct and indirect spatial references can refer to the same place in the real world, and these references are convertible. For instance, using Google Maps, one can enter a place name, such as “Montreal”, into the gazetteer and get the place’s geometric representation as a map on the screen. International standards in geomatics are currently pursuing further investigation to establish a conversion methodology among various spatial references [ISO/TC211 2008].
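The conversion from an indirect to a direct spatial reference can be illustrated with a toy gazetteer lookup. The following Python sketch is not part of the proposed model: the dictionary `GAZETTEER`, the function `to_direct_reference`, and the approximate coordinates are assumptions introduced only for illustration.

```python
from typing import Optional, Tuple

# Hypothetical toy gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Montreal": (45.50, -73.57),
    "Austin": (30.27, -97.74),
}


def to_direct_reference(place_name: str) -> Optional[Tuple[float, float]]:
    """Convert an indirect reference (a place name) into a direct one, if known."""
    return GAZETTEER.get(place_name)


print(to_direct_reference("Montreal"))  # a coordinate pair
print(to_direct_reference("Gatineau"))  # None: absent from this toy gazetteer
```

A place name that is absent from the gazetteer remains a valid spatial reference; it simply cannot be converted to coordinates by this particular lookup.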

Definition 1.1: Inspired by the above perspective for spatial references, we consider spatial attributes ($at_{spatial}$) beyond only the attributes with a geometric representation. A spatial attribute is any attribute that describes spatial properties of phenomena occurring in geographic space. Examples of these properties include location (e.g., geographic coordinates, address, postal code), shape (e.g., a polygon representing the extent of a city), direction (e.g., direction of a highway), length (e.g., road length), and area (e.g., area of a house). In order to be consistent with the international standards in geomatics, we adopt the same strategy they use for categorizing spatial attributes.

Definition 1.2: A geometric spatial attribute ($at_{geo}$) is a spatial attribute that is represented by a geometry. More precisely, the “type” of a geometric spatial attribute in Definition 1 is geometric. A geometric spatial attribute is typically used to represent a direct spatial reference. For instance, a point can represent the position of a feature with a location in space without extent, such as the position of a hotel on a small-scale map. A line can describe the position of a linear feature like a road or river. The positions of two-dimensional features are represented by polygons, such as the extent of a forest stand or a city on a medium-scale map. More complex geometries can also be used, such as the aggregation of a set of lines and a set of polygons, to represent features like hydrological networks. ISO and OGC explicitly support such complex geometries, as do spatial database modeling methods such as Perceptory [Bédard et al. 2004; Bédard and Larrivée 2008] and MADS [Parent et al. 2006].

Definition 1.3: A non-geometric spatial attribute ($at_{non\text{-}geo}$) is a spatial attribute that is represented by data types other than the geometric type, such as textual or numeric types. A non-geometric spatial attribute can describe indirect spatial references such as place names and addresses, or other spatial properties of features like the length of a road or the area of a house. Non-geometric attributes convey spatial information that can be used for mapping and spatial analysis using a gazetteer, geocoding, or linear referencing, among other methods.

Example 1: Referring to Definition 1, the following attributes for the levels of different dimensions of the “fire disaster” datacube are defined:

• location = (geometric: polygon and point, spatial, polygons and points in a plane)

• name = (textual, spatial, {‘Gatineau’, ‘Montreal’, ‘Austin’, ‘Quebec’, ‘Texas’, ‘Canada’, ‘North America’, …})

• date = (date: instant, temporal, {02-01-2006, 07-2007, 2008, …})

• type = (textual, thematic, {A, B, C, D, E, all_fire_classes})

“Location” and “name” are, respectively, geometric and non-geometric spatial attributes that will be used to describe the levels of the “administrative regions” dimension, such as “city”. “Date” is a temporal attribute that describes the levels of the “time” dimension (e.g., “day”). Finally, the attribute “type” is a thematic attribute that is used to describe the levels of the “fire class” dimension, such as the “fire class” level.
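As an illustration of Definitions 1 to 1.3, the attribute triple and its spatial categories can be sketched in Python. This is a minimal sketch under stated assumptions: the class `Attribute`, the field name `data_type` (standing for “type”), and the informal `domain` strings are choices made for this example and are not prescribed by the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Level attribute as the triple (type, nature, domain) of Definition 1."""
    name: str
    data_type: str  # "numeric", "textual", "date", or "geometric"
    nature: str     # "spatial", "temporal", or "thematic"
    domain: str     # informal description of the admissible values


def is_spatial(a: Attribute) -> bool:
    # Definition 1.1: spatial by nature, whatever the representation.
    return a.nature == "spatial"


def is_geometric(a: Attribute) -> bool:
    # Definition 1.2: a spatial attribute whose type is geometric.
    return is_spatial(a) and a.data_type == "geometric"


def is_non_geometric(a: Attribute) -> bool:
    # Definition 1.3: a spatial attribute with any other type.
    return is_spatial(a) and a.data_type != "geometric"


# The four attributes of Example 1.
location = Attribute("location", "geometric", "spatial", "polygons and points in a plane")
name = Attribute("name", "textual", "spatial", "place names")
date = Attribute("date", "date", "temporal", "instants")
fire_type = Attribute("type", "textual", "thematic", "fire classes A to E")

for a in (location, name, date, fire_type):
    print(a.name, is_spatial(a), is_geometric(a), is_non_geometric(a))
```

The classification depends on the pair (type, nature) only, which mirrors the independence between “what” an attribute represents and “how” it is represented.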

Definition 2: A level defines the granularity of analysis along a dimension, and is described by $l = \{a_1, \ldots, a_n\}$, where $l$ is the name of the level and $\{a_1, \ldots, a_n\}$ is its set of attributes. Among a level’s attributes, there is at least one distinguished identifier.

In the following formal definitions in first-order logic, connectives are denoted by ∨ (logical inclusive or), ∧ (logical and), ⇒ (logical implication), and ¬ (logical not). The symbols ∀ and ∃ are the universal and existential quantifiers. Unary predicates are expressed in the form p(x), stating that “x is a p”, and the symbol ∈ stands for set membership.

Definition 2.1: Let $l$ be a level. A spatial level ($l_{spatial}$) is a member of the following set:

$$l_{spatial} = \{\, l \mid \exists a_i \,(a_i \in l \wedge at_{spatial}(a_i)) \,\} \tag{1}$$

Example 2: Now, we define the levels of the “fire disaster” datacube. The identifiers are highlighted in italic.

Non-spatial levels are defined as follows:

• day = {date}

• month = {date}

• year = {date}

• fire class = {type}

• all fire classes = {type}

Both the attributes “name” and “location”, as defined in Example 1, are spatial attributes. Referring to Definition 2.1, the following levels are spatial:

• city = {name, location}

• county = {name, location}

• province/territory = {name, location}

• state = {name, location}

• country = {name, location}

• all = {name, location}
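Equation (1) can be checked mechanically on the levels of Example 2. The sketch below is illustrative only; the dictionaries `NATURE` and `LEVELS` and the function `is_spatial_level` are names assumed for this example.

```python
# Nature of each attribute from Example 1 (attribute name -> nature).
NATURE = {
    "name": "spatial",
    "location": "spatial",
    "date": "temporal",
    "type": "thematic",
}

# Levels of Example 2, each modeled as a set of attribute names.
LEVELS = {
    "day": {"date"},
    "month": {"date"},
    "year": {"date"},
    "fire class": {"type"},
    "all fire classes": {"type"},
    "city": {"name", "location"},
    "county": {"name", "location"},
    "province/territory": {"name", "location"},
    "state": {"name", "location"},
    "country": {"name", "location"},
    "all": {"name", "location"},
}


def is_spatial_level(level: str) -> bool:
    """Equation (1): a level is spatial if at least one of its attributes is spatial."""
    return any(NATURE[a] == "spatial" for a in LEVELS[level])


spatial_levels = {lvl for lvl in LEVELS if is_spatial_level(lvl)}
print(sorted(spatial_levels))
```

Note that a level described only by “name” would still qualify as spatial, since equation (1) requires one spatial attribute and places no condition on its type.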

Definition 3: An instance of a level is a member of that level. Instantiation is achieved by assigning values to a subset of the level’s attributes. Since identifiers are used to uniquely identify the members of a level, they should have a unique existing value for each member of the level.

Formally, a member $m$ of a level is defined by the triple $m = (AT_m, V, :)$. In this equation, $AT_m = \{a_1, \ldots, a_k\}$ is the set of the member’s attributes, and includes a subset of the level’s attributes. $V = \{v_1, \ldots, v_k\}$ is the set of values of the domain of the attributes $AT_m$, and “:” is a function from elements in $AT_m$ to elements in $V$.

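Definition 3.1: Let $m = (AT_m, V, :)$ be a member of a spatial level. A geometric member ($m_{geo}$) is defined below:

$$m_{geo} = \{\, (AT_m, V, :) \mid \exists a_i \,(a_i \in AT_m \wedge at_{geo}(a_i)) \,\} \tag{2}$$

Definition 3.2: Similarly, a non-geometric member ($m_{non\text{-}geo}$) of a spatial level is formally defined as:

$$m_{non\text{-}geo} = \{\, (AT_m, V, :) \mid \exists a_i \,(a_i \in AT_m \wedge at_{non\text{-}geo}(a_i)) \wedge \neg \exists a_j \,(a_j \in AT_m \wedge at_{geo}(a_j)) \,\} \tag{3}$$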

Example 3: A number of members for the levels of the “fire disaster” datacube, defined in Example 2, are presented below. These members will be used to define the other components of this datacube.

“city”: geometric member (name: Austin, location: ct_loc) and the non-geometric member (name: Gatineau), …

“county”: non-geometric member (name: Hull), geometric member (name: Travis, location: cnt_loc), …

“province/territory”: geometric members (name: Quebec, location: p_loc), …

“state”: geometric members (name: Texas, location: s_loc), …

“country”: geometric members (name: Canada, location: c_loc), …

“all”: geometric member (name: North America, location: NA_loc)

“day”: members (date: 01-01-2006), …

“month”: members (date: 01-2006), …

“year”: members (date: 2006), …

“fire class”: members (type: A), …, (type: E)

“all fire classes”: member (type: all_fire_classes)

For the sake of simplicity, we recognize members by the values given to their identifiers. For example, the member (name: Austin, location: ct_loc) is recognized as Austin, and the member (date: 01-2006) is referred to as 01-2006. In the above members, ct_loc, cnt_loc, p_loc, s_loc, c_loc, and NA_loc are, respectively, the polygons representing the location of members Austin, Travis, Quebec, Texas, Canada, and North America.
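Definitions 3.1 and 3.2 can be illustrated on a few members of Example 3. In the following sketch a member is modeled as a Python dictionary, which plays the role of the “:” function from attributes to values; the table `ATTRS` and both predicate names are assumptions made for this example, and the string "ct_loc" merely stands in for a polygon.

```python
from typing import Dict

# Hypothetical lookup table: attribute name -> (type, nature), following Example 1.
ATTRS = {
    "name": ("textual", "spatial"),
    "location": ("geometric", "spatial"),
    "date": ("date", "temporal"),
    "type": ("textual", "thematic"),
}

# A member maps each of its attributes (AT_m) to a value (V).
Member = Dict[str, object]


def is_geometric_member(m: Member) -> bool:
    """Equation (2): the member has at least one geometric spatial attribute."""
    return any(ATTRS[a] == ("geometric", "spatial") for a in m)


def is_non_geometric_member(m: Member) -> bool:
    """Equation (3): a non-geometric spatial attribute and no geometric one."""
    has_non_geo = any(
        ATTRS[a][1] == "spatial" and ATTRS[a][0] != "geometric" for a in m
    )
    return has_non_geo and not is_geometric_member(m)


austin = {"name": "Austin", "location": "ct_loc"}  # "ct_loc" stands in for a polygon
gatineau = {"name": "Gatineau"}
jan_2006 = {"date": "01-2006"}

print(is_geometric_member(austin), is_non_geometric_member(gatineau))
```

A member of a non-spatial level, such as 01-2006, satisfies neither predicate.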


Definition 4: A dimension schema (dimension, hereafter) describes an axis of analysis or a theme of interest for a user, under which the data analysis is to be performed [Rafanelli 2003]. A dimension $d$ includes a number of related levels. These levels are ordered from detailed to general, and form a hierarchy of abstraction levels. Formally, a dimension is defined as a pair $d = (L_d, <)$, which forms a lattice on levels $L_d = \{l_1, l_2, \ldots, l_n\}$. The set $L_d$ has two distinct levels, which are the lower bound (leaf) and upper bound (root, typically named all) of the lattice (dimension), and $<$ is a partial-order (roll-up) relation on the levels in $L_d$. For two levels $l_1, l_2$ of a dimension, if $l_1 < l_2$, we say that $l_1$ (the lower level) rolls up to $l_2$ (the higher level), and $l_1$ and $l_2$ are two consecutive levels of the dimension.

Definition 4.1: Let $d = (L_d, <)$ be a dimension. A spatial dimension ($d_{spatial}$) is defined as follows:

$$d_{spatial} = \{\, (L_d, <) \mid \forall l \,(l \in L_d \Rightarrow l_{spatial}(l)) \,\} \tag{4}$$

Like a spatial attribute, a spatial dimension is often incompletely described as a dimension whose levels involve geometric attributes. But as we stated earlier, spatial attributes are more than simply attributes with a geometric representation. Hence, spatial dimensions may include spatial levels whose attributes are non-geometric.

Example 4: In the following, we define the three dimensions “administrative regions”, “time”, and “fire class” of the “fire disaster” datacube using the levels defined in Example 2.

• administrative regions = (city < county, county < province/territory (in Canada), county < state (in the USA), province/territory < country (in Canada), state < country (in the USA), country < all)

• time = (day < month, month < year)

• fire class = (fire class < all fire classes)

As defined in Example 2, all the levels that appear in the “administrative regions” dimension are spatial. Consequently, referring to Definition 4.1, “administrative regions” is a spatial dimension. The dimension “fire class” is a non-spatial dimension. The graphic representation of the dimension “administrative regions” is shown in Figure 1 by a directed acyclic graph where the arrows show the order between the levels.
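The test of equation (4) on the three dimensions of Example 4 can be sketched as follows. The names `SPATIAL_LEVELS`, `DIMENSIONS`, and `is_spatial_dimension` are assumptions for this illustration; the set of spatial levels is taken from Example 2.

```python
# Spatial levels identified in Example 2.
SPATIAL_LEVELS = {"city", "county", "province/territory", "state", "country", "all"}

# Levels of each dimension of the "fire disaster" datacube (Example 4).
DIMENSIONS = {
    "administrative regions": {
        "city", "county", "province/territory", "state", "country", "all",
    },
    "time": {"day", "month", "year"},
    "fire class": {"fire class", "all fire classes"},
}


def is_spatial_dimension(levels) -> bool:
    """Equation (4): every level of the dimension must be a spatial level."""
    return all(level in SPATIAL_LEVELS for level in levels)


for dim_name, levels in DIMENSIONS.items():
    print(dim_name, is_spatial_dimension(levels))
```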

In some cases, a dimension can include several hierarchies $h_1 = (L_{h_1}, <)$, $h_2 = (L_{h_2}, <)$, …, where each hierarchy represents an analytic perspective within the dimension. For example, the dimension “administrative regions” includes two hierarchies: “Canadian division” = (city < county, county < province/territory, province/territory < country, country < all) and “USA division” = (city < county, county < state, state < country, country < all). These two hierarchies represent administrative divisions within two countries, Canada and the USA. The graphic representation of these two hierarchies is shown in Figure 2.

We can see from the structure of these two hierarchies that the end-user needs to distinguish between provinces/territories and states, because they are not considered equivalent for the purposes of the end-user’s analysis. However, cities, counties, and countries are considered to be the same in both hierarchies in this example.
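The two hierarchies can be recovered from the roll-up relation of Example 4 by enumerating every path from the leaf level to the root. The sketch below is one possible way to do so; the adjacency dictionary `ROLLS_UP` and the function `hierarchies` are illustrative names, not part of the model.

```python
# Roll-up relation "<" of the "administrative regions" dimension (Example 4),
# stored as: level -> levels it directly rolls up to.
ROLLS_UP = {
    "city": ["county"],
    "county": ["province/territory", "state"],
    "province/territory": ["country"],
    "state": ["country"],
    "country": ["all"],
    "all": [],
}


def hierarchies(level, root="all"):
    """Enumerate every path from `level` up to the root of the lattice."""
    if level == root:
        return [[root]]
    return [
        [level] + path
        for parent in ROLLS_UP[level]
        for path in hierarchies(parent, root)
    ]


for h in hierarchies("city"):
    print(" < ".join(h))
```

With this relation the enumeration yields two paths, corresponding to the “Canadian division” and “USA division” hierarchies.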

Figure 1: The graphic representation for the spatial dimension “administrative regions”.

Figure 2: The graphic representation for two hierarchies: (a) “Canadian division” and (b) “USA division”.

Definition 5: Like Bimonte [2007], we define a dimension instance. An instance ($di$) for a dimension $d = (L_d, <)$ is a pair $(L, \le)$, where $L = \{m_1, \ldots, m_n\}$ is a set of members for levels in $L_d$ and $\le$ is an order (or roll-up) relation between these members, such that if $m_i$ and $m_j$ are respectively members of two levels $l_i$ and $l_j$ in $L_d$ with $l_i < l_j$, the following condition is met: $m_i \in L \wedge m_i \le m_j \Rightarrow m_j \in L$.

Instances of spatial dimensions are of three types: non-geometric, geometric, and mixed.

Definition 5.1: Let $di = (L, \le)$ be an instance of a spatial dimension. A geometric dimension instance ($di_{geo}$) is a member of the following set:

$$di_{geo} = \{\, (L, \le) \mid \forall m \,(m \in L \Rightarrow m_{geo}(m)) \,\} \tag{5}$$

Definition 5.2: A non-geometric dimension instance ($di_{non\text{-}geo}$) is defined as:

$$di_{non\text{-}geo} = \{\, (L, \le) \mid \forall m \,(m \in L \Rightarrow m_{non\text{-}geo}(m)) \,\} \tag{6}$$

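Definition 5.3: Finally, a mixed dimension instance ($di_{mixed}$) is a member of the following set:

$$di_{mixed} = \{\, (L, \le) \mid \exists m_1, m_2 \,(m_1 \in L \wedge m_{geo}(m_1) \wedge m_2 \in L \wedge m_{non\text{-}geo}(m_2)) \,\} \tag{7}$$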

Example 5: Referring to the members defined in Example 3, an instance of the spatial dimension “administrative regions” is (Gatineau ≤ Hull, Hull ≤ Quebec, Quebec ≤ Canada, …, Austin ≤ Travis, Travis ≤ Texas, Texas ≤ USA, …, Canada ≤ North America, USA ≤ North America). Since this dimension instance involves both geometric members, such as Austin and USA, and non-geometric members, like Gatineau and Hull, it is a mixed dimension instance. The graphic representation of this mixed instance is shown in Figure 3. In this figure, Gatineau and Hull are respectively non-geometric members of the levels “city” and “county” and are represented by their names. The geometric members are shown by their geometries.
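The three kinds of dimension instance in equations (5) to (7), together with the closure condition of Definition 5, can be sketched on the members listed in Example 5. The flags in `HAS_GEOMETRY` follow Examples 3 and 5; all identifiers in the code are assumptions made for this illustration.

```python
# Members of the instance in Example 5, flagged True when they carry a geometry.
HAS_GEOMETRY = {
    "Gatineau": False, "Hull": False, "Quebec": True, "Canada": True,
    "Austin": True, "Travis": True, "Texas": True, "USA": True,
    "North America": True,
}

# Roll-up relation between members, stored as (child, parent) pairs.
ROLL_UP = [
    ("Gatineau", "Hull"), ("Hull", "Quebec"), ("Quebec", "Canada"),
    ("Austin", "Travis"), ("Travis", "Texas"), ("Texas", "USA"),
    ("Canada", "North America"), ("USA", "North America"),
]


def is_closed(members, roll_up) -> bool:
    """Definition 5 condition: the parent of every member of L is also in L."""
    return all(parent in members for child, parent in roll_up if child in members)


def is_geometric_instance(members) -> bool:
    """Equation (5): every member is geometric."""
    return all(HAS_GEOMETRY[m] for m in members)


def is_non_geometric_instance(members) -> bool:
    """Equation (6): every member is non-geometric."""
    return all(not HAS_GEOMETRY[m] for m in members)


def is_mixed_instance(members) -> bool:
    """Equation (7): at least one geometric and one non-geometric member."""
    return any(HAS_GEOMETRY[m] for m in members) and any(
        not HAS_GEOMETRY[m] for m in members
    )


L = set(HAS_GEOMETRY)
print(is_closed(L, ROLL_UP), is_mixed_instance(L))
```

Restricting the member set to the USA branch (Austin, Travis, Texas, USA, North America) gives an instance that satisfies equation (5) instead.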

Definition 6: A measure is an attribute that is analyzed against different levels of the dimensions. Accordingly, a spatial measure ($measure_{spatial}$) is a spatial attribute. Two types of spatial measures for a vector spatial datacube are recognized: numeric and geometric [Rivest et al. 2005].

Definition 6.1: A numeric spatial measure ($measure_{spatial\text{-}numeric}$) is a non-geometric spatial attribute.

Definition 6.2: A spatial measure that is represented by a geometric spatial attribute is a geometric spatial measure ($measure_{spatial\text{-}geometric}$). A geometric spatial measure can be computed, for instance, using topological operators (e.g., overlap) on members of different dimension levels, or can be an independent geometry, such as the location of fires.

Example 6: In the “fire disaster” datacube, the “number of injuries” is a non-spatial measure. This measure is described by the thematic attribute “number of injuries = (numeric, thematic, natural numbers)”. The measure “surface of destroyed residential area” is a non-geometric spatial measure, and is described by the attribute “surface of destroyed residential area = (numeric, spatial, positive real numbers)” expressing the surface of residences that are destroyed by fire disasters. Finally, “fire zone” is a geometric spatial measure as it is described by a geometric spatial attribute “fire zone = (geometric: polygon, spatial, set of polygons representing the location of fires)”. The “fire zone” measure represents the location of fires geometrically as polygons on the map.
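The classification of the three measures in Example 6 follows directly from their (type, nature) pairs. A minimal sketch, assuming a dictionary `MEASURES` and a helper `measure_kind` that are introduced here only for illustration:

```python
# Measures of the "fire disaster" datacube as (type, nature) pairs (Example 6).
MEASURES = {
    "number of injuries": ("numeric", "thematic"),
    "surface of destroyed residential area": ("numeric", "spatial"),
    "fire zone": ("geometric", "spatial"),
}


def measure_kind(name: str) -> str:
    """Classify a measure according to Definitions 6, 6.1 and 6.2."""
    data_type, nature = MEASURES[name]
    if nature != "spatial":
        return "non-spatial"
    if data_type == "geometric":
        return "geometric spatial"
    return "numeric spatial"


for measure_name in MEASURES:
    print(measure_name, "->", measure_kind(measure_name))
```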

Definition 7: A datacube schema ($dcs$) is the triple $(D_{dcs}, MS_{dcs}, HC_{dcs})$ where:

• $D_{dcs}$ is a finite set of dimensions,

• $MS_{dcs}$ is a finite set of measures,

• and $HC_{dcs}$ is a finite set of hyper-cells (or cuboids [Han and Kamber 2006]) as defined below.

A hyper-cell ($hc$) consists of a pair $(L, MS_{dcs})$, where $L$ is a finite set of dimension levels. The set $L$ includes exactly one level from every dimension in $D_{dcs}$. One should note that we have chosen to use the term “hyper-cell” instead of “hypercube (datacube)” (which is common in the literature) for such a cell. From a user’s point of view, there is only one datacube model for an application. This datacube model embraces all the dimensions, all the measures, and all the possible hyper-cells. Indeed, we consider a hyper-cell as containing only a set of levels and measures. A hyper-cell describes a model for a number of facts, as we will define later. Analytically, the number of possible hyper-cells for a datacube schema is the product of the numbers of levels of its dimensions.

Figure 3: A graphic representation of a mixed instance for the spatial dimension “administrative regions”.

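Definition 7.1: Let $hc = (L, MS_{dcs})$ be a hyper-cell. A spatial hyper-cell ($hc_{spatial}$) is defined as follows:

$$hc_{spatial} = \{\, (L, MS_{dcs}) \mid \exists l \,(l \in L \wedge l_{spatial}(l)) \vee \exists ms \,(ms \in MS_{dcs} \wedge measure_{spatial}(ms)) \,\} \tag{8}$$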

Example 7: The schema for the datacube “fire disaster” is defined as:

• The set of dimensions: $D_{\text{fire disaster}}$ = {administrative regions, time, fire class}

• The set of measures: $MS_{\text{fire disaster}}$ = {number of injuries, surface of destroyed residential area, fire zone}

• Hyper-cells: The number of hyper-cells for the datacube schema “fire disaster” is 36, which is the result of 6 (number of “administrative regions” dimension levels) multiplied by 3 (number of “time” dimension levels) multiplied by 2 (number of “fire class” dimension levels). Because of the large number of these hyper-cells, we do not define all of them for this example; instead, we define two hyper-cells: ({city, month, fire class}, {number of injuries, fire zone, surface of destroyed residential area}) and ({country, day, fire class}, {number of injuries, fire zone, surface of destroyed residential area}). The former hyper-cell is graphically represented in Figure 4.

The hyper-cell in Figure 4 includes the spatial level “city” and two spatial measures, “fire zone” and “surface of destroyed residential area”. Referring to Definition 7.1, this hyper-cell is spatial. Such a hyper-cell defines a model for a number of facts. These facts are used to answer queries such as: “What is the number of injuries of fire of class ‘A’ in the city of Montreal in July 2006?” or “Where are the fire zones of class ‘B’ in the city of Toronto in January 2007?”
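The count of 36 hyper-cells and the test of equation (8) can be reproduced with a short enumeration. This sketch assumes the level and measure names of the running example; the identifiers `DIMENSION_LEVELS`, `hyper_cells`, and `is_spatial_hyper_cell` are hypothetical.

```python
from itertools import product

# Levels of each dimension of the "fire disaster" schema (Example 4).
DIMENSION_LEVELS = {
    "administrative regions": [
        "city", "county", "province/territory", "state", "country", "all",
    ],
    "time": ["day", "month", "year"],
    "fire class": ["fire class", "all fire classes"],
}
MEASURES = frozenset(
    {"number of injuries", "surface of destroyed residential area", "fire zone"}
)
SPATIAL_LEVELS = {"city", "county", "province/territory", "state", "country", "all"}
SPATIAL_MEASURES = {"surface of destroyed residential area", "fire zone"}

# A hyper-cell pairs exactly one level per dimension with the measure set.
hyper_cells = [
    (levels, MEASURES) for levels in product(*DIMENSION_LEVELS.values())
]


def is_spatial_hyper_cell(levels, measures) -> bool:
    """Equation (8): a spatial level in L or a spatial measure in MS."""
    return any(lvl in SPATIAL_LEVELS for lvl in levels) or any(
        ms in SPATIAL_MEASURES for ms in measures
    )


print(len(hyper_cells))
print(all(is_spatial_hyper_cell(lv, ms) for lv, ms in hyper_cells))
```

Because the measure set of this schema contains spatial measures, the disjunction in equation (8) holds for each of the enumerated hyper-cells, including those built on non-spatial levels only.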

Definition 8: A datacube ($dc$) is an instance for a datacube schema $dcs = (D_{dcs}, MS_{dcs}, HC_{dcs})$, and consists of a pair $(DI, F)$, where

• $DI$ is a set of instances for dimensions in $D_{dcs}$. In $DI$ there is exactly one instance for every dimension in $D_{dcs}$.

• $F$ is a set of facts defined over dimension instances $DI$. A fact describes an event of interest for a decision-making process within an enterprise, and is an instance of a given hyper-cell $hc$ in $HC_{dcs}$. Therefore, a fact $f$ is defined by a pair $(M, V)$, where $M$ is a finite set of members of dimension instances in $DI$ (exactly one member from each dimension instance in $DI$), and $V$ is a finite set of measure values for measures in $MS_{dcs}$. These measure values are calculated with respect to the members of $M$.
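The structural constraint on a fact (exactly one member from each dimension instance, and values for the measures of the schema) can be expressed as a small validity check. In this sketch the dimension instances are reduced to toy member sets, and `DI`, `MEASURES`, and `is_valid_fact` are assumed names; storing the members of a fact in a dictionary keyed by dimension is a design choice that makes “exactly one member per dimension” hold by construction.

```python
# Toy dimension instances: dimension name -> set of member identifiers.
DI = {
    "administrative regions": {"Montreal", "Gatineau", "Quebec", "Canada"},
    "time": {"01-2006", "02-2006", "2006"},
    "fire class": {"A", "B", "all_fire_classes"},
}
MEASURES = {"number of injuries", "fire zone", "surface of destroyed residential area"}


def is_valid_fact(members: dict, values: dict) -> bool:
    """Check a fact (M, V): one member per dimension instance, one value per measure."""
    one_member_per_dimension = members.keys() == DI.keys() and all(
        members[dim] in DI[dim] for dim in DI
    )
    return one_member_per_dimension and values.keys() == MEASURES


# First row of Table 1; "P1" stands in for the fire-zone polygon.
fact_members = {
    "administrative regions": "Montreal", "time": "01-2006", "fire class": "A",
}
fact_values = {
    "number of injuries": 14,
    "fire zone": "P1",
    "surface of destroyed residential area": 12430,
}
print(is_valid_fact(fact_members, fact_values))
```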

Definition 8.1: As we mentioned earlier, a fact can be modeled by a hyper-cell. A spatial fact ($f_{spatial}$) is an instance of a spatial hyper-cell and describes an event of interest for a decision-making process that happened in space. A spatial fact can be of one of two types, geometric and non-geometric.

Definition 8.2: Let $f = (M, V)$ be a spatial fact. A geometric fact ($f_{geo}$) is defined as follows:

$$f_{geo} = \{\, (M, V) \mid \exists m \,(m \in M \wedge m_{geo}(m)) \vee \exists v \,(v \in V \wedge geometry(v)) \,\} \tag{9}$$

Definition 8.3: If a spatial fact is not geometric, it is a non-geometric fact ($f_{non\text{-}geo}$).

Definition 8.4: A spatial datacube stores spatially referenced facts. However, to be recognized as “spatial” by the IT community, the datacube must also supply a cartographic representation where the user can exploit the provided maps in a significant way.

Figure 4: A graphic representation for the spatial hyper-cell ({city, month, fire class}, {number of injuries, fire zone, surface of destroyed residential area}). The levels are shown as three faces of the cell while the measures are inside the cell.

Thus, as in spatial databases, the ability to produce cartographic outputs and manipulations is a central criterion to determine whether to label a datacube with the term “spatial”. Such cartographic data navigation capabilities are typically enabled when the datacube has a geometric measure, or a geometric or mixed dimension instance. Here, one should note that a datacube including only non-geometric dimension instances (e.g., names of cities, provinces and countries) or numeric spatial measures is not typically considered a spatial datacube. However, this does not remove the spatial characteristics of non-geometric instances of spatial dimensions and spatial facts, or of numeric spatial measures. Based on the above discussion, we define a spatial datacube ($dc_{spatial}$) as a datacube $dc = (DI, F)$ where, among the facts in $F$, there is at least one geometric fact:

$$dc_{spatial} = \{\, (DI, F) \mid \exists f \,(f \in F \wedge f_{geo}(f)) \,\} \tag{10}$$
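Equations (9) and (10) can be sketched together in Python. The class `Polygon` below is a placeholder geometry type and `GEOMETRIC_MEMBERS` is an assumed set of members that carry a geometry (taken from Example 3); neither is prescribed by the model. The second fact, which omits the fire zone, is a hypothetical variant added only to show a non-geometric case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Polygon:
    """Placeholder geometry; a real system would use a geometry library."""
    label: str


# Members that carry a geometry according to Example 3.
GEOMETRIC_MEMBERS = {"Austin", "Travis", "Texas"}


def is_geometric_fact(members, values) -> bool:
    """Equation (9): a geometric member in M or a geometry among the values V."""
    return any(m in GEOMETRIC_MEMBERS for m in members) or any(
        isinstance(v, Polygon) for v in values
    )


def is_spatial_datacube(facts) -> bool:
    """Equation (10): at least one fact in F is geometric."""
    return any(is_geometric_fact(m, v) for m, v in facts)


facts = [
    # Gatineau row of Table 1, with a placeholder polygon for the fire zone.
    (("Gatineau", "01-2006", "B"), (3, Polygon("P2"), 1100)),
    # Hypothetical variant without any geometry.
    (("Gatineau", "01-2006", "B"), (3, 1100)),
]
print(is_spatial_datacube(facts), is_spatial_datacube(facts[1:]))
```

The second call illustrates the remark above: a datacube whose facts involve only non-geometric members and numeric measures keeps its spatial characteristics but does not satisfy equation (10).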

Example 8: The datacube “fire disaster” is defined as:

Three dimension instances:

• An instance for the “administrative regions” dimension as defined in Example 5: {(Gatineau ≤ Hull, Hull ≤ Quebec, Quebec ≤ Canada, …, Austin ≤ Travis, Travis ≤ Texas, Texas ≤ USA, …, Canada ≤ North America, USA ≤ North America)}

• An instance for the “time” dimension: {(01-01-2006 ≤ 01-2006, 01-2006 ≤ 2006, …, 01-01-2007 ≤ 01-2007, 02-01-2007 ≤ 01-2007, 01-2007 ≤ 2007, …, 31-12-2007 ≤ 12-2007, 12-2007 ≤ 2007)}

• An instance for the “fire class” dimension: {(A ≤ all_fire_classes, B ≤ all_fire_classes, C ≤ all_fire_classes, D ≤ all_fire_classes, E ≤ all_fire_classes)}.

Based on the above dimension instances, the facts for the spatial hyper-cell ({city, month, fire class}, {number of injuries, fire zone, surface of destroyed residential area}) are presented in Table 1. All these facts are geometric, because they include geometries representing fire zones (each $P_i$ represents the location of a fire zone). Referring to Definition 8.4, the “fire disaster” datacube is a spatial datacube. The facts in Table 1 represent the number of injuries, the fire zones, and the surfaces of destroyed residential area in different cities, in different months, and for different fire classes. As stated earlier, the number of hyper-cells for a datacube schema can be large. Furthermore, it is possible to have several thousand facts for each hyper-cell, leading to a very large number of facts. In this example, however, we presented a limited number of illustrative facts.

Table 1: The representation of a number of facts for the hyper-cell ({city, month, fire class}, {number of injuries, fire zone, surface of destroyed residential area}).

| City | Month | Fire class | Number of injuries | Fire zone | Surface of destroyed residential area |
|---|---|---|---|---|---|
| Montreal | 01-2006 | A | 14 | $P_1$ | 12430 |
| Gatineau | 01-2006 | B | 3 | $P_2$ | 1100 |
| Sherbrooke | 02-2006 | C | 0 | $P_3$ | 125 |
| … | … | … | … | … | … |
| Austin | 01-2006 | B | 18 | $P_k$ | 8700 |
| Houston | 02-2006 | A | 14 | $P_{k+1}$ | 5400 |
| … | … | … | … | … | … |

4. Characteristics of the Proposed Spatial Datacube Model

4.1. General Characteristics

The proposed model has the necessary and fundamental features that any datacube model should include. These features are the following [Blaschka et al. 1998; Pedersen 2000; Torlone 2003]:

• Separation between structure and content: This is a fundamental feature of any database model. The proposed model makes a distinction between the schema, which represents the structure of data (e.g., the level, dimension schema, hyper-cell, and datacube schema), and the instances, which are the data contents (e.g., the member, dimension instance, fact, datacube).

• Explicit notion of dimension and datacube: The proposed model formally defines the different components of a datacube (i.e., the level, member, dimension, measure, fact, and datacube).

• Explicit multiple hierarchies in dimensions: As defined, a dimension can include a number of hierarchies of levels, and different aggregation paths are allowed within a dimension. In Example 4, the “administrative regions” dimension includes two hierarchies, “Canadian division” and “USA division”.

• Several attributes per level: Within the proposed model, a level can include a set of attributes. Including attributes allows the representation of the descriptions of a level. The level “city” in Example 2 has two attributes: “name” and “location”.

• Measure sets: This feature indicates that the model should be able to support facts that involve several measures. According to the proposed definition, a hyper-cell, which describes a number of facts, can contain a set of measures. Including a number of measures in hyper-cells allows us to have facts with a set of measures.

4.2. Spatial Characteristics

Taking into account that the model describes “spatial” datacubes, in addition to the above general characteristics, the proposed model has the following two features:

• Supporting spatial data: A basic requirement of a model for spatial datacubes is to support spatial data. The proposed model supports spatial data within levels, dimensions, measures, facts, and datacubes. This support includes both spatial data represented by geometry (i.e., geometric spatial attributes) and spatial data represented by data types other than the geometric type, such as addresses and place names (i.e., non-geometric spatial data).

• Explicit and precise definitions for the fundamental components: One of the principal features of the proposed model is to provide a precise definition framework for the different components of spatial datacubes, distinguishing these components from the non-spatial ones. Such a framework is based on a view of spatial data considered as having both geometric and non-geometric representations, as suggested by the international standards in geomatics (i.e., ISO/TC211 and OGC). For each element of a datacube, its “spatial” equivalent is defined. Two types of spatial attributes are recognized: geometric spatial attribute and non-geometric spatial attribute. We defined a spatial level and two types of members for spatial levels (i.e., geometric and non-geometric members). A spatial dimension was defined, and instances of spatial dimensions were divided into three types: geometric, non-geometric, and mixed. In addition, two categories of spatial measures are distinguished: numeric and geometric spatial measures. We divided spatial facts into two types (i.e., geometric and non-geometric facts). Finally, a spatial datacube is defined as a datacube that includes at least one geometric fact.

5. Conclusion and Future Work

In this paper, we addressed an important issue in the realm of spatial decision-support databases: the lack of a formal model that correctly and precisely defines fundamental components of spatial datacubes (e.g., spatial dimension, spatial measure, and spatial fact). In order to present this issue and propose a solution for it, we made two strategic contributions.

As the first contribution, we reviewed and analyzed the existing models for spatial datacubes by focusing specifically on the definitions given by these models to spatial datacube components. The results show that, on the one hand, there are some models that consider a broader view of spatial data in alignment with the international standards in geomatics. However, these models have not yet been presented in a formalized way. On the other hand, there are a number of formal models for spatial datacubes, but these models consider a limited perspective for spatial data as only those data that have a geometric representation. Consequently, the formal definitions given to spatial datacubes and their fundamental components by the latter models do not correctly reveal the entire power of spatial data integrated within these datacube components.

The second contribution of the present paper was to propose a formal model for spatial datacubes with a primary focus on recognizing and precisely defining its different “spatial” components at both the schema and instance levels. To achieve this goal, we revisited the definition of the spatial attribute, taking into account the international standards in geomatics. A spatial attribute was defined as any attribute describing spatial properties of phenomena localized in space, independently of its manner of representation. Such a definition resulted in two types of spatial attributes: geometric and non-geometric. On this basis, we defined a spatial level and two types of members for spatial levels: geometric and non-geometric members. A spatial dimension, which provides an order on spatial levels forming a hierarchy, was defined, and instances of spatial dimensions were divided into three types: geometric, non-geometric, and mixed. We also defined two categories for spatial measures: numeric and geometric spatial measures. A hyper-cell represents a model for a number of facts. We defined a spatial hyper-cell, which defines a model for spatial facts. Following this, we divided spatial facts into two types: geometric and non-geometric facts. A spatial datacube, unlike other components, should include at least one geometric representation to be recognized as “spatial”. Therefore, we defined a spatial datacube as a datacube that includes at least one geometric fact.

Having such a precise and common terminology to refer to the components of spatial datacubes improves semantic interoperability between agents dealing with datacubes. We are now working to define a conceptual framework to deal with the interoperability of spatial datacubes using the presented model. In addition, further research is ongoing to recognize and classify the necessary integrity constraints for spatial datacubes based on the model proposed in this paper. For example, the proposed model includes the concept “hyper-cell,” which defines a model for facts. This concept is necessary in order to define integrity constraints for facts. Formalizing the spatial operators corresponding to the well-known OLAP operators (such as spatial roll-up and spatial drill-down), as an algebra for the proposed model, is another research step.

6. References Abelló, A., J. Samos and F. Saltor. 2006. YAM2: A

Multidimensional Conceptual Model ExtendingUML. Information Systems, 31(6), p. 541-567.

Bédard, Y. 1997. Spatial OLAP, Vidéoconférence, 2ème

forum annuel sur la R-D, Géomatique VI: Unmonde accessible, Montréal, Canada.

Bédard, Y. and J. Han. 2009. Fundamentals of SpatialData Warehousing for Geographic KnowledgeDiscovery. Geographic Data Mining andKnowledge Discovery (2nd edition), H.J. Miller andJ. Han (Eds.), Taylor & Francis.

Bédard, Y. and S. Larrivée. 2008. Spatial DatabaseModeling with Pictogrammic Languages.Encyclopedia of GIS, S. Shekhar and H. Xiong(Eds.), Springer-Verlag, p. 716-725.

Bédard, Y., S. Larrivée, M.J. Proulx and M. Nadeau. 2004.Modeling Geospatial Databases with Plug-Ins forVisual Languages: A Pragmatic Approach and theImpacts of 16 Years of Research andExperimentations on Perceptory. Proceedings of theCOMOGIS Workshop ER2004, LNCS 3289,Springer-Verlag, Shanghai, China, p. 17–30.

Bédard Y., T. Merrett and Han J. 2001. Fundamentals ofSpatial Data Warehousing for Geographic KnowledgeDiscovery. Geographic Data Mining and KnowledgeDiscovery, H.J. Miller and J. Han (Eds.) (1st edition),Taylor & Francis, pp. 53-73.

Bédard, Y., S. Rivest and M.J. Proulx. 2007. Spatial On-Line Analytical Processing (SOLAP): Concepts, Architectures and Solutions from a Geomatics Engineering Perspective. Data Warehouses and OLAP: Concepts, Architectures and Solutions, R. Wrembel and C. Koncilia (Eds.), Idea Group Publishing, London, U.K., p. 298-319.

Bimonte, S. 2007. Intégration de l’information géographique dans les entrepôts de données et l’analyse en ligne : de la modélisation à la visualization, PhD Thesis, INSA, Lyon, France, 207 pages.

Blaschka, M., C. Sapia, G. Höfling and B. Dinter. 1998. Finding Your Way through Multidimensional Data Models. Proceedings of the 9th International Conference on Database and Expert Systems Applications (DEXA), LNCS 1460, Springer-Verlag, Vienna, Austria, p. 198-203.

Caron, P.Y. 1998. Étude du potentiel OLAP pour supporter l’analyse spatio-temporelle, MS Thesis, Department of Geomatics, Laval University, Quebec City, Canada.

Chaudhuri, S. and U. Dayal. 1997. Data Warehousing and OLAP for Decision Support. ACM SIGMOD Record, 26(2), p. 507-508.

Damiani, M.L. and S. Spaccapietra. 2006. Spatial Data Warehouse Modeling. Processing and Managing Complex Data for Decision Support, J. Darmont and O. Boussaid (Eds.), Idea Group Inc., p. 1-27.

Fidalgo, R.N., V.C. Times, J. Silva and F. Souza. 2004. GeoDWFrame: A Framework for Guiding the Design of Geographical Dimensional Schemas. Proceedings of the 6th International Conference on Data Warehousing and Knowledge Discovery, Zaragoza, Spain, p. 26-37.

Franklin, C. 1992. An Introduction to Geographic Information Systems: Linking Maps to Databases. Database, 15(2), p. 13-21.

Gray, J., S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow and H. Pirahesh. 1997. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub Totals. Data Mining and Knowledge Discovery, 1(1), p. 29-53.

Han, J. and M. Kamber. 2006. Data Mining: Concepts and Techniques (2nd edition), Morgan Kaufmann Publishers, San Francisco, 800 pages.

Han, J., N. Stefanovic and K. Koperski. 1998. Selective Materialization: An Efficient Method for Spatial Data Cube Construction. Proceedings Pacific-Asia Conference on Knowledge Discovery and Data Mining, Melbourne, Australia, p. 144-158.

ISO/TC211. 2003. Geographic Information—Spatial Referencing by Geographic Identifiers, Report 19112.

ISO/TC211. 2004. Geographic Information—Spatial Referencing by Coordinate, Report 19111.

ISO/TC211. 2008. Geographic Information—Place Identifier Architecture, New work item proposal.

Jensen, C.S., A. Kligys, T.B. Pedersen and I. Timko. 2004. Multidimensional Data Modeling for Location-based Services. The VLDB Journal, 13(1), p. 1-21.

KHEOPS Technologies. 2005. JMap Spatial OLAP, Innovative Technology to Support Intuitive and Interactive Exploration and Analysis of Spatio-temporal Multidimensional Data, Available from: http://www.kheops-tech.com/fr/jmap/doc/WP_JMap_SOLAP.pdf (accessed March 2009).

Malinowski, E. and E. Zimányi. 2004. Representing Spatiality in a Conceptual Multidimensional Model. Proceedings of the 12th Annual ACM International Workshop on Geographic Information Systems, Washington DC, USA, p. 12-22.

Malinowski, E. and E. Zimányi. 2008. Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications, Springer-Verlag, Vienna, Austria, 444 pages.

Parent, C., S. Spaccapietra and E. Zimányi. 2006. Conceptual Modeling for Traditional and Spatio-temporal Applications: The MADS Approach, Springer-Verlag, Vienna, Austria, 466 pages.

Pedersen, T.B. 2000. Aspects of Data Modeling and Query Processing for Complex Multidimensional Data, PhD Thesis, Faculty of Engineering and Science, Aalborg University, Aalborg, Denmark, 180 pages.

Rafanelli, M. 2003. Multidimensional Databases: Problems and Solutions, Idea Group Inc., 473 pages.

Rivest, S., Y. Bédard and P. Marchand. 2001. Towards Better Support for Spatial Decision-Making: Defining the Characteristics of Spatial On-Line Analytical Processing. Geomatica, 55(4), p. 539-555.

Rivest, S., Y. Bédard, M.J. Proulx, M. Nadeau, F. Hubert and J. Pastor. 2005. SOLAP: Merging Business Intelligence with Geospatial Technology for Interactive Spatiotemporal Exploration and Analysis of Data. Journal of International Society for Photogrammetry and Remote Sensing (ISPRS), 60(1), p. 17-33.

Shekhar, S., C.T. Lu, X. Tan, S. Chawla and R. Vatsavai. 2001. Map Cube: A Visualization Tool for Spatial Data Warehouses. Geographic Data Mining and Knowledge Discovery, H.J. Miller and J. Han (Eds.), Taylor & Francis, p. 73-108.

Torlone, R. 2003. Conceptual Multidimensional Models. Multidimensional Databases: Problems and Solutions, M. Rafanelli (Ed.), Idea Group Inc., p. 69-90.

MS rec’d 09/04/20
Revised MS rec’d 10/04/14

Authors

Mehrdad Salehi received his Ph.D. in Geomatics Sciences from Laval University, Canada, specializing in GIS and spatial databases. He also holds a Master of Science and a Bachelor of Science in Surveying and Geomatics Engineering from the University of Tehran, Iran. Mr. Salehi’s research interests include spatiotemporal databases, spatial datacubes, spatial OLAP, and spatial data quality. His professional background includes GIS software development, spatial database design and development, and LiDAR data processing. Currently, Mr. Salehi holds the title of GIS and Spatial Data Management Consultant at 4DM Inc., Toronto, Canada.

Dr. Bédard is a professor of GIS and Spatial Databases at Laval University, Quebec City, Canada. He is an active member of the Centre for Research in Geomatics, where he acted as Director for 7 years, and of Canada’s GEOIDE network of centers of excellence. Dr. Bédard has a multi-million dollar record in both fundamental and applied research, including a Canada NSERC Industrial Research Chair in Geospatial Database from 2004 to 2010. He has contributed to over 100 fully refereed papers and 300 non-refereed papers and conferences. His research interests focus on geospatial database modeling, Spatial OLAP and data quality. He co-founded Intelli3, a private company merging GIS and Business Intelligence solutions and commercializing Map4Decision, a technology transfer from Laval University.

Sonia Rivest holds a Master’s Degree in Geomatics Sciences from Université Laval, Quebec. She works at the Centre for Research in Geomatics of Université Laval as a research professional within the GIS and spatial databases team, and for Intelli3 (a private company merging GIS and business intelligence solutions and commercializing Map4Decision), as a specialist in geomatics and business intelligence. Her professional interests include spatial databases, multidimensional databases, and spatial OLAP.
