a graph-based overview visualization for data landscapes · 226 a graph-based overview...

42
NetLearning: beyond the WOW! factor [email protected]

Upload: others

Post on 17-Jun-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Graph-based Overview Visualization for Data Landscapes · 226 A Graph-based Overview Visualization for Data Landscapes (a) Linking Open Data Cloud [10] (b) Linked Open Data Graph

Computer Science and Information Technology 1(3): 225-232, 2013DOI: 10.13189/csit.2013.010308 http://www.hrpub.org

A Graph-based Overview Visualizationfor Data Landscapes

Grigor Tshagharyan1, Hans-Jorg Schulz2,∗

1Faculty of Informatics and Applied Mathematics, Yerevan State University, 0025 Yerevan, Armenia

2Faculty of Computer Science and Electrical Engineering, University of Rostock, 18051 Rostock, Germany

∗Corresponding Author: [email protected]

Copyright c⃝2013 Horizon Research Publishing All rights reserved.

Abstract In many domains, it becomes more andmore common that an analysis spans various interlinkeddata sources that we collectively term data landscape.Yet for the selection of appropriate data sources fromthe wide range of available ones, current approaches andsystems rarely offer more support than a File-Open-dialog. This paper presents a visualization approachthat aims to give a stable and meaningful overviewof a data landscape to ease finding and selecting datasources that may be useful in a subsequent visualanalysis. As such, it serves as a visual starting pointfrom which to bootstrap a visual analysis by findingand selecting the data sources relevant to a questionat hand. This approach is exemplified by applying itto a current snapshot of the data landscape from theCKAN-LOD data hub consisting of 216 data sourceswith 613 links between them.

Keywords Linked Open Data, LOD Cloud, Multi-modal Data Visualization

1 Introduction

Data sources are increasingly interlinked as it becomesmore frequent that datasets are no longer self-contained,but reference other data sources via common identifiers.As a result, a typical visual analysis is rarely confinedto a single data source, but spans across multiple suchinterlinked data sources to yield meaningful and reliableinsights. This is sometimes termed multimodal data vi-sualization [23] or cross-media analysis [37, p.128]. Ex-amples for such analyses can be found in various fields,such as biomedicine [31], cancer research [24], and man-ufacturing [1]. These examples have in common thatthey each assume a rather small, hand-curated set ofno more than a dozen data sources that are appropriatefor the visual analysis to be conducted. Usually, thisset forms a mere subset out of the large number of allavailable data sources, which we term data landscape inthe spirit of similar terms for multi-dataset scenarios,such as information landscape [27] or data meadow [15].We consider such a data landscape to be heterogeneous,

if the individual data sources are of diverse type andstructure, so that the landscape connects, for example,textual, numerical, and pictorial data sources.

If the focus of an analysis shifts, it may necessitate theincorporation of different or additional data sources fromthe data landscape than the ones that were originally se-lected. Such a flexible (re-)selection of data sources is anintegral, if not the most important step of a visual anal-ysis, as a poor choice of data to base an analysis on canrender the entire analysis pointless. In contrast to thisobservation stands the fact that this step is hardly everconsidered and most visual analysis approaches assumethe data as given. Most approaches or systems supportthe analyst at this early stage with a mere dialog forloading the data. The few, more advanced methods areeither based on statistics or authoring: the former usesimilarity metrics to find data sources related to givendata [8], whereas the latter require a domain expert topreselect suitable data sources to pursue a given analysisgoal [35]. Meta data of the data sources and dependen-cies between them are in both cases rarely communi-cated or used.

This paper presents a visualization method for givinga meaningful graphical overview of large data landscapeswith potentially hundreds of different data sources toallow for an informed choice of a subset on which topursue a subsequent visual analysis. The challenge ofcreating such an overview is that the necessary layoutquality for serving as a useful visual index in which allthe available data sources are meaningfully placed in astable deterministic way and well separated is hard toachieve. The reason for this lies in the fact that the com-mon graph visualization algorithms are tuned towardsslightly different drawing aesthetics than those neededin this case. As a result of this, current visualizationsof data landscapes are either manually designed or donot scale well beyond a few dozen datasets, as the briefoverview of the related work in Section 2 shows. To over-come this challenge, we propose a carefully constructedlayout concept that employs a step-wise refinement usingapproaches from network visualization and graph layoutin Section 3. We exemplify our layout concept in Sec-tion 4 with a data landscape that was derived from theLinking Open Data (LOD) repository of the Compre-

Page 2: A Graph-based Overview Visualization for Data Landscapes · 226 A Graph-based Overview Visualization for Data Landscapes (a) Linking Open Data Cloud [10] (b) Linked Open Data Graph

226 A Graph-based Overview Visualization for Data Landscapes

(a) Linking Open Data Cloud [10] (b) Linked Open Data Graph [36]

Figure 1. Overview visualizations of the CKAN-LOD landscape. (a) depicts a manually created diagram with different domainsbeing color-coded. According to the authors, it was incrementally crafted using the diagramming tool OmniGraffle. (b) shows anautomatically laid out diagram in which the colors signify user ratings of the data sources. It was created using the Fruchterman-Reingold layout algorithm [18].

hensive Knowledge Archive Network (CKAN) contain-ing 216 data sources and 613 interconnections betweenthem. Finally, Section 5 concludes this paper by outlin-ing ideas for future work.

2 Related Work

In principal, visualization across multiple data sourcescan occur on three levels of granularity: the item levelof individual data records, the table level of differentclasses of data items, and the landscape level of entiredatabases. For the first two levels, with which this paperis not concerned, the reader is pointed to [9] and [11],respectively, for examples of their visualization. Theyboth have in common that they make use of a givenstructure underlying and connecting the heterogeneousdata for traversing it, querying it, and visualizing it. Onthe item level, this structure is in many cases given as anRDF graph. Whereas on the table level, the databaseschema is utilized as a natural structure that underliesthe ensemble of tables.

On the landscape level, only a few visualizationshave yet taken on the challenge of representing multi-ple datasets. This may be due to the fact that the ad-vent of large data hubs and the growing Linked OpenData movement [39, ch.11] calling for visualizations onthis level are rather recent developments. Such datahubs collect (links to) data sources and make meta dataabout them available. Notable examples are the CKAN-LOD repository (http://datahub.io), the Socrata So-cial Data Platform (http://opendata.socrata.com),and the U.S. Open Government Initiative Data.gov(http://www.data.gov). Together, they collect hun-dreds of data sources from a variety of domains and thusprovide a common interface and a standardized entrypoint to them. Similar to the other two levels, a datalandscape is usually assumed to exhibit an underlyingstructure – in this case an interlinkage between individ-ual data sources as it is induced by foreign key relationsbetween contained items. These links can be weighted

by the number of individual foreign key references theysubsume, and they can be orientated by the direction ofthese references. This structure is commonly used to em-ploy graph-based drawing techniques for the landscape’svisualization. In practice, two different such techniquesare used: manual layouts in which the user positions allnodes interactively to produce a final static visualizationand automatic layouts in which the positions are com-puted to yield a dynamically updating representationwithout the need for user input.

Manual layouts, such as the one given by [34] for thebioinformatics domain or the CKAN-LOD Cloud [10]shown in Figure 1a, tend to be very readable with athoughtful node positioning that places data sourcesfrom the same domain close to each other. As staticoverviews of a data landscape, they are much less aninteractive exploratory tool than a visual index of thedata sources in which a click on a node redirects aweb browser to the corresponding data source for fur-ther inspection. This index-like notion is underlined bythe efforts to produce a meaningful relative node place-ment that aids in quickly (re-)finding data sources asone would desire it from an index. Furthermore, theyare carefully arranged in a way so that no two nodesoverlap each other, thus enabling a user to select themunambiguously. Their drawback is their inability to au-tomatically adapt either to a change in the underlyingdata landscape, or to a change in the user’s interest thatmay require a grouping by a different criteria than thedomain. In both cases, the designer of the visualiza-tion has to redo the visualization and manually makethe desired changes.

Automatic layouts, such as the one utilized by [24]in their Data-View-Integrator or the LOD Graph [36]shown in Figure 1b, do not suffer from this drawback.They gain their flexibility through the use of a force-based graph layout, but by using these layouts, theytrade in the meaningful and well-separated placementthat an index-like visualization requires. While thisstill works for small data landscapes with only a hand-ful of data sources, as one can see in [24], the visual

Page 3: A Graph-based Overview Visualization for Data Landscapes · 226 A Graph-based Overview Visualization for Data Landscapes (a) Linking Open Data Cloud [10] (b) Linked Open Data Graph

Computer Science and Information Technology 1(3): 225-232, 2013 227

clutter increases for larger data landscapes. At somepoint, the amount of overplotting makes it impossibleto single out individual data sources and click on them– in particular, if they are positioned in the center ofthe “hairball” that the force-directed layout creates. Inaddition, due to their non-deterministic nature, simpleforce-based algorithms cannot be used to produce sta-ble layouts, nor can they convey any semantic relationbetween the data sources, such as by placing them closeto each other. While solutions or “work-arounds” ex-ist to alleviate these problems – i.e., by using pinningweights to increase stability [17] and additional forcesto model semantic relations, such as belonging to differ-ent domains/clusters [14] – these are so far not used inthe existing data landscape visualizations.

With their respective pros and cons, these two strate-gies of visualizing data landscapes denote two endpointsof the same design spectrum: one being tedious to man-ually construct, but extremely well-suited to serve as amap of the data landscape and as a gateway to it, whilethe other one is easily constructed automatically, butill-suited to perform the desired look-up task if the datalandscape gets realistically large. It is thus the aim ofthis paper to find a compromise between the two thatdoes no longer require manual layout work, but never-theless serves well as the visual index of data sourcesthat is needed. The method to achieve this is detailedin the next section.

3 A Visualization Method forData Landscapes

The reason for the inadequacy of the existing auto-mated approaches is the fact that the used force-directedlayout algorithms are designed to optimize different aes-thetic criteria than those needed for a stable visual in-dex. For example, a meaningful placement is very hardto achieve when the node positions are used as the degreeof freedom which the layout alters in order to achieveother aesthetic criteria. Among these criteria are totaledge length minimization and crossing-number reduc-tion [4], which are certainly desirable, but not at thecost of rendering the entire layout unsuitable for thetask at hand. This is in line with recent results, whichhave shown that a layout that compromises between var-ious aesthetics produces much better overall drawingsthat those that aim to achieve a few aesthetics to thefullest at the expense of all others [21]. Hence, the fol-lowing section will shortly state and prioritize the layoutconstraints for index-like overview visualizations of datalandscapes, before the concrete layout method is derivedfrom them.

3.1 Defining the Layout Constraints

The most essential constraints imposed by therequirements of a visual index are in order of decreasingimportance:

I. Stability: An essential aspect of the layout is thatthe same input will always result in the same outputand that small changes in the input will only result in

small changes in the output. Otherwise, the analystwould be presented with a different placement in eachvisualization session, even though the shown datalandscape changed not at all or only a little.

II. No Node-Edge Overlap: As the main purposeof the layout will be to select individual nodes, it isimportant that the edges do not occlude them. Whilethe edges are a fundamental part of the network,they provide only complementary information forthe selection of data sources and cannot be selectedthemselves. Hence, edge-edge-overlap (edge crossings)can be tolerated, yet node-edge-overlap cannot.

III. No Node-Node Overlap: A similar argumentcan be made for the case of overlapping nodes. Sincethey must be selectable on an individual basis, theirseparation is essential to the layout.

IV. Meaningfulness: In contrast to a standardnetwork drawing that aims at a meaningful relativepositioning of nodes, the sought overview shouldproduce a meaningful absolute positioning to aid thelook-up of nodes. This implies that a change in positionbetween visualization sessions bears meaning as well,for example, indicating that a data source’s propertieschanged, such as its user ratings or the date of its lastupdate.

The order of these constraints is logical as, forexample, a well-separated meaningful node placementis useless if it is completely cluttered with overdrawnedges, which make it impossible to select any node atall. This implies that a less important constraint maybe violated and not be achieved fully in order to fulfilla more important constraint. It can be observed thatthe manually generated data landscape visualizationsadhere to them. For example, the LOD cloud shown inFigure 1a fulfills them in the following ways:

I. Stability is achieved through an incremental layout,which takes the last LOD cloud diagram as a startingpoint for including newly added data sources [10].This way, repositioning is limited and the same datasource can be found in similar positions across differentversions of the diagram.

II. No Node-Edge Overlap is simply achieved bydrawing the nodes on top of the edges. While it fulfillsthe criterion, it also hampers the attribution of edgesto their incident nodes.

III. No Node-Node Overlap is achieved as a resultof the manual positioning process, which aims “to forma beautiful and fluffy cloud” [10] that carefully avoidssuch overlap.

IV. Meaningfulness lies in this example not so muchin the absolute position of a node, but rather in therelative position with respect to its neighboring nodesbelonging to the same application domain. This isadditionally highlighted by the color-coding of thedifferent domains (cp. Figure 1a).

Page 4: A Graph-based Overview Visualization for Data Landscapes · 226 A Graph-based Overview Visualization for Data Landscapes (a) Linking Open Data Cloud [10] (b) Linked Open Data Graph

228 A Graph-based Overview Visualization for Data Landscapes

This shows that manually generated landscape visu-alizations come already pretty close to what we want toachieve with our proposed layout in an automated way.Therefore, these constraints directly influence our layoutapproach, as it is outlined in the next section.

3.2 Deriving our 3-Step Layout Approach

On the one hand, to the best of our knowledge, thereexists no monolithic layout algorithm that fulfills theabove layout constraints. On the other hand, we refrainfrom building a custom layout algorithm, as it wouldrequire additional implementation work to be used andit would reinvent the wheel in many aspects for whichalready decent solutions exist – e.g., for node overlap re-moval. As a compromise between these two strategies,we propose a modular step-wise layout concept that uti-lizes existing approaches and implementations for eachstep. This step-wise solution can be seen as a meta-algorithm, which does not specify in detail, which con-crete method to use, but what kind of methods to usein which order. Our meta-algorithm employs one lay-out step per layout constraint, adapting the placementof nodes and edges so that the constraints are met.

Looking at the four layout constraints, it can be notedthat Constraint I does not concern the layout resultas much as it concerns the layout process as a whole.It is not something that can simply be imposed as anadditional refinement step on an already existing layoutas, for example, the third constraint can be achievedby performing a node overlap removal step. Hence, weconsider the constraint of stability as a global one thatmust be fulfilled by all individual layout steps in orderto be valid for the entire layout process.

This leaves three actual layout steps to be carriedout – one for ensuring each of the three remaining con-straints. Since later layout steps potentially overwriteor adjust the outcome of earlier layout steps, we pursuethe constraints from the least important to the most im-portant one. This will, for example, sacrifice or reducethe meaningfulness of the node placement if this helps toremove node overlap in a later step. The three steps ofour meta-algorithm are given in the following togetherwith some concrete algorithmic suggestions for carryingthem out.

Meaningful Node Placement (Constraint IV).This is the initial step that performs the placement of thenodes according to two selected numerical meta data ofthe data sources in a Cartesian coordinate system, sim-ilar to the GraphDice technique [5] or to the SemanticSubstrates [33]. For example, the nodes can be placedon the X-axis according to the size of a data source andon the Y-axis according to the time of their last update.This way, all large and recently updated data sourceswill be placed at the top-right of the layout, whereassmaller and older data sources are placed towards itsbottom-left. It is obvious that such a mapping of thenodes onto the X/Y-plane is deterministic. Dependingon what is important for the analysis at hand, differentmeta data can be used. For instance, the time of thelast update may be useful in financial or clinical scenar-ios where it is important to always work with the most

recent data available. Whereas the number of incomingedges (number of other data sources referring to a datasource) could be used together with the average userrating as an indicator of the trustworthiness of a datasource, if a scenario depends on such a notion.

Node Overlap Removal (Constraint III). As theinitial step places nodes with the same meta data onthe same X/Y-position, the resulting overlap has to beremoved in order to permit their individual selection.Unfortunately, many common node overlap removal ap-proaches are either not deterministic (e.g., force-directedadjustments, such as [25]) or they interfere too stronglywith the meaningful placement (e.g., [26]). Ideally, over-lapping nodes are distributed in the vacant vicinity ofwhere the overlap occurs. Suitable approaches are, forexample, based on stress majorization, such as the FastNode Overlap Removal [12, 13]. As a result of this step,all nodes have their distinct drawing space and a mini-mum distance to each other, while still being in the prox-imity of their original position and thus remain quick tolocate. With this step, the nodes are separated and theedges can be routed through the gaps the node overlapremoval created between them.

Edge Routing (Constraint II). In contrast to manyestablished network layout methods, the node placementis in our case not driven by connectivity, i.e., nodes thatare connected are not necessarily placed close to eachother. Hence, it is important to show the edges in or-der to make the connections between the data sourcesexplicit. This raises the problem of edges overplottingthe nodes, which could be circumvented by following theidea of the LOD Cloud and drawing them underneaththe nodes. Yet, this makes it hard to attribute edgesto a particular pair of nodes if they cross underneatha number of them. This effect is not as visible in Fig-ure 1a, because the data sources are grouped by theirdomain. Since intradomain cross-references are muchlikelier than interdomain relations, edges in this diagramare in general rather short and thus less prone to crossa lot of nodes. Since this does not hold true for ev-ery node placement strategy, we are required to routearound the nodes. This problem can traditionally be re-duced to finding shortest paths in the visibility graph ofthe layout. A reasonably fast algorithm building on thisreduction is given in [28], yielding an almost Flow Map-like edge routing [29]. Furthermore, it is also possible toshow edges only on demand for nodes of interest, if thedensity of the data landscape calls for it.

As these three steps of the meta-algorithm merely de-fine the intention and the constraints of the operationto be performed, they do not require a particular fixedtechnique to carry them out. Instead, depending on theapplication case, different techniques than the ones sug-gested in the above description can be plugged into thesesteps. For example, if more than two meta data shallbe mapped to the X/Y-plane, Multidimensional Scaling(MDS) [38, 7] can be used. Whereas, if interactivity isneeded, a more scalable heuristic for node overlap re-moval, such as PRISM [16], can be applied. The follow-ing section will exemplify the utilization of our layoutmethod for the CKAN-LOD.

Page 5: A Graph-based Overview Visualization for Data Landscapes · 226 A Graph-based Overview Visualization for Data Landscapes (a) Linking Open Data Cloud [10] (b) Linked Open Data Graph

Computer Science and Information Technology 1(3): 225-232, 2013 229

(a) DailyMed and related data sources (b) DrugBank and related data sources

Figure 2. Overview visualization of the CKAN-LOD landscape with 216 data sources and 613 links, laid out by their user rating(y-axis) and size (x-axis). In both examples, a pharmaceutical database is selected (black) and their related data sources are highlighted(green).

4 Applying our Method to theCKAN-LOD Landscape

This paper uses the CKAN-LOD repository as an ex-ample data landscape, as it is not only freely available,but also because it is the one which is probably the bestpublicized and researched data landscape. The structureof this landscape has been previously described in [6]and [20, ch.3], and thoughtfully been analyzed in [30].It is particularly well suited to exemplify our visualiza-tion approach, as it is up to date the only data landscapefor which manual and automated layout solutions exist(cp. Figure 1).

We implemented our layout approach using the al-gorithms mentioned in the previous section to performthe individual steps. It is realized as a Java softwarethat retrieves and parses the CKAN-LOD repository andgenerates an SVG diagram with added JavaScript inter-activity. This way, the layout can be computed offline(e.g., in regular intervals as a cron job on the server)and its result is then published online at a fixed URLupon its completion. We found this to be a more usefulsetup than a client-side layout, as parsing the reposi-tory and generating the layout requires a few minutes,which a user may not be willing to wait. The layout timeof a few minutes also reflects the compromise we madebetween the existing fast but cluttered automatic force-based layouts and the time-consuming but tidy manuallayout generation.

The result can be seen in Figure 2 that depicts thesubset of the CKAN-LOD landscape, for which the usednumerical meta data average rating and number of itemswere available. To yield an uncluttered view of the datalandscape, we display links only on demand for selecteddata sources (black). Their directionality is signified byan arrowhead that is placed at the far end of each edge

by the related data source (green). If the arrowheadpoints away from the selected data source towards therelated one, it indicates that the selected data source ref-erences the related one. If it points towards the selecteddata source, the selected data source is referenced by therelated one. If no arrowhead is shown, the dependencyis bidirectional.In the case at hand, an analyst seeks a pharmaceu-

tical data source, which is known to be a challengingendeavor in the field of health care and life sciences [32].Currently, mainly list-based approaches are used to getan overview of various pharmaceutical data sources [22].The analyst starts with an obvious candidate, the well-known DailyMed database shown in Figure 2a. Itsneighborhood of dependent data sources looks promis-ing, as it links out to databases with further informationabout diseases (Diseasome), clinical trials (LinkedCT),Traditional Chinese Medicine (TCMGeneDIT), and sideeffects (SIDER), which will ease investigations in anyof these particular aspects. A double-click on the datasource opens up the DailyMed webpage where the ana-lyst discovers a serious drawback to this particular datasource: it provides its data mainly in the form of En-glish text that is hard to parse automatically (see [3])and to distill into visualizations. Hence, she seeks foran alternative and finds it in the neighborhood of theDailyMed database: the DrugBank shown in Fig. 2b.Not only does the DrugBank provide its information in amore structured manner, but it also links to all the samedatabases plus some important additional ones, such asPubMed for publications, the Bio2RDF Pfam databaseof protein families, and the KEGG compound database.Its apparent higher appeal is furthermore reflected in itshigher average user rating, which puts it more towardsthe top of the landscape.To confirm that the higher average user rating does

not reflect mere subjective experiences, the analyst can

Page 6: A Graph-based Overview Visualization for Data Landscapes · 226 A Graph-based Overview Visualization for Data Landscapes (a) Linking Open Data Cloud [10] (b) Linked Open Data Graph

230 A Graph-based Overview Visualization for Data Landscapes

(a) DailyMed and related data sources (b) DrugBank and related data sources

Figure 3. Visualizations of the same data sources as in Figure 2, but now laid out by their user rating (y-axis) and incomingconnections (x-axis).

switch to a different layout of the data landscape, whichpositions the data sources by user rating and the numberof incoming connections. The number of incoming con-nections reflects how many other data sources rely on agiven data source and thus acts as a proxy for its generaldependability and credibility. The result of this switchis shown in Figure 3 and the analyst can see from it ata first glance that both data sources – DailyMed andDrugBank – are placed at the same horizontal positionand are thus equal with respect to incoming connections.A closer look reveals that actually both data sources arelinked to by the very same set of seven related datasources.

As a result, the analyst was able to identify a moresuitable data source from the overview visualization andto confirm by changing the layout that a switch fromDailyMed to DrugBank will not make any subsequentpharmaceutical analysis less reliable.

5 Conclusion and Future Work

The proposed overview visualization for data land-scapes provides a valuable method for selecting datasources for their subsequent visual analysis. It achievesthis by not simply running a standard algorithm, butby fulfilling a thoughtfully prioritized list of aestheticconstraints that are targeted directly towards the taskof looking up data sources. As a result, it yields a vi-sual index of data sources that could only be producedmanually before.

In future work, we aim to explore two directions. Thefirst is based on the observation that our visualizationmethod is only as powerful as the available numericalmeta data for the meaningful positioning of the datasources in the visual index. The more of these metadata can be provided, the more ways of mapping out thedata landscape are possible and thus the better it can

be tailored to the needs of the analyst. Additional metadata can either be retrieved from dedicated providers,such as the LODStats project [2], or be derived usingservices, such as GeoIP lookups to fetch the lat/lon-coordinates of the database locations. The latter wouldfor example allow us to map the data sources onto aworld map and thus tie the abstract geography of thedata landscape to the physical geography of the globe.

The second direction to explore is to provide a moreseamless navigation experience when switching betweenthe overview visualization of the landscape that servesas a substitute for the File-Open-dialog and the actualvisualization of a selected data source. For example,it is conceivable to allow for zooming into regions of thelandscape visualization and as the zoomed-in nodes havemore screen space available, they are substituted by por-tals in which they are visualized in more detail, similar tothe approach proposed by [19]. This way, the user wouldnot have to leave the visualization to switch to a differentdataset, but simply zoom-out to the topmost overviewof the entire data landscape and then zoom back intoa different dataset. This would effectively tie these twolevels of visualization closer together – something thatwould be unthinkable with a mere File-Open-dialog.

Acknowledgements

The authors gratefully acknowledge funding from theGerman Research Foundation (DFG).

REFERENCES

[1] M. Aehnelt, H.-J. Schulz, and B. Urban. Towards acontextualized visual analysis of heterogeneous manu-facturing data. In ISVC’13: Proceedings of the Interna-

Page 7: A Graph-based Overview Visualization for Data Landscapes · 226 A Graph-based Overview Visualization for Data Landscapes (a) Linking Open Data Cloud [10] (b) Linked Open Data Graph

Computer Science and Information Technology 1(3): 225-232, 2013 231

tional Symposium on Visual Computing, pages 76–85.Springer, 2013.

[2] S. Auer, J. Demter, M. Martin, and J. Lehmann. LOD-Stats – an extensible framework for high-performancedataset analytics. In A. ten Teije, J. Volker, S. Hand-schuh, H. Stuckenschmidt, M. d’Acquin, A. Nikolov,and N. A.-G. N. Hernandez, editors, EKAW’12: Pro-ceedings of the International Conference on Knowl-edge Engineering and Knowledge Management, LectureNotes in Computer Science, pages 353–362. Springer,2012.

[3] C. Barriere and M. Gagnon. Drugs and disorders: Fromspecialized resources to web data. In Proceedings of theWorkshop on Web Scale Knowledge Extraction, 2011.

[4] G. D. Battista, P. Eades, R. Tamassia, and I. G. Tol-lis. Graph Drawing: Algorithms for the Visualization ofGraphs. Prentice Hall, 1999.

[5] A. Bezerianos, F. Chevalier, P. Dragicevic,NiklasElmqvist, and J.-D. Fekete. Graphdice: Asystem for exploring multivariate social networks.Computer Graphics Forum, 29(3):863–872, June 2010.

[6] C. Bizer. The emerging web of linked data. IEEE In-telligent Systems, 24(5):87–92, September 2009.

[7] A. Buja, D. F. Swayne, M. L. Littman, N. Dean, H. Hof-mann, and L. Chen. Data visualization with Multi-dimensional Scaling. Journal of Computational andGraphical Statistics, 17(2):444–472, June 2008.

[8] J. Caldas, N. Gehlenborg, A. Faisal, A. Brazma, andS. Kaski. Probabilistic retrieval and visualization ofbiologically relevant microarray experiments. Bioinfor-matics, 25(12):i145–i153, May 2009.

[9] M. Cammarano, X. Dong, B. Chan, J. Klingner, J. Tal-bot, A. Halevy, and P. Hanrahan. Visualization of het-erogeneous data. IEEE Transactions on Visualizationand Computer Graphics, 13(6):1200–1207, 2007.

[10] R. Cyganiak and A. Jentzsch. Linking Open Data clouddiagram. http://lod-cloud.net/, September 2011.

[11] D. M. de Lima, J. F. Rodrigues Jr., and A. J. M. Traina.Graph-based relational data visualization. In IV’13:Proceedings of the International Conference Informa-tion Visualisation. IEEE Computer Society, 2013.

[12] T. Dwyer, K. Marriott, and P. J. Stuckey. Fast nodeoverlap removal. In P. Healy and N. S. Nikolov, editors,GD’05: Proceedings of the International Symposium onGraph Drawing, Lecture Notes in Computer Science,pages 153–164. Springer, 2005.

[13] T. Dwyer, K. Marriott, and P. J. Stuckey. Fast nodeoverlap removal – correction. In M. Kaufmann andD. Wagner, editors, GD’06: Proceedings of the Inter-national Symposium on Graph Drawing, Lecture Notesin Computer Science, pages 446–447. Springer, 2006.

[14] P. Eades and M. L. Huang. Navigating clustered graphsusing force-directed methods. Journal of Graph Algo-rithms and Applications, 4(3):157–181, 2000.

[15] N. Elmqvist, J. Stasko, and P. Tsigas. DataMeadow:a visual canvas for analysis of large-scale multivariatedata. Information Visualization, 7(1):18–33, February2008.

[16] E. Emden and Y. Hu. Efficient, proximity-preservingnode overlap removal. Journal of Graph Algorithms andApplications, 14(1):53–74, 2010.

[17] Y. Frishman and A. Tal. Multi-level graph layout on theGPU. IEEE Transactions on Visualization and Com-puter Graphics, 13(6):1310–1319, November-December2007.

[18] T. M. J. Fruchterman and E. M. Reingold. Graph draw-ing by force-directed placement. Software – Practiceand Experience, 21(11):1129–1164, November 1991.

[19] S. Hadlak, H.-J. Schulz, and H. Schumann. Insitu exploration of large dynamic networks. IEEETransactions on Visualization and Computer Graphics,17(12):2334–2343, 2011.

[20] T. Heath and C. Bizer. Linked Data: Evolving the Webinto a Global Data Space. Morgan&Claypool Publish-ers, 2011.

[21] W. Huang, P. Eades, S.-H. Hong, and C.-C. Lin. Im-proving multiple aesthetics produces better graph draw-ings. Journal of Visual Languages and Computing,24(4):262–272, 2013.

[22] A. Jentzsch, O. Hassanzadeh, C. Bizer, B. Andersson,and S. Stephens. Enabling tailored therapeutics withlinked data. In C. Bizer, T. Heath, T. Berners-Lee,and K. Idehen, editors, LDOW’09: Proceedings of theWWW’09 Workshop on Linked Data on the Web, vol-ume 538 of CEUR Workshop Proceedings, 2009.

[23] J. Kehrer and H. Hauser. Visualization and visual anal-ysis of multifaceted scientific data: A survey. IEEETransactions on Visualization and Computer Graphics,19(3):495–513, 2013.

[24] A. Lex, M. Streit, H.-J. Schulz, C. Partl, D. Schmal-stieg, P. J. Park, and N. Gehlenborg. StratomeX: Vi-sual analysis of large-scale heterogeneous genomics datafor cancer subtype characterization. Computer Graph-ics Forum, 31(3):1175–1184, June 2012.

[25] W. Li, P. Eades, and N. Nikolov. Using spring algo-rithms to remove node overlapping. In S.-H. Hong, edi-tor, APVIS’05: Proceedings of the Asia Pacific Sympo-sium on Information Visualization, Conference in Re-search and Practice in Information Technology, pages131–140. Australian Computer Society, 2005.

[26] K. Misue, P. Eades, W. Lai, and K. Sugiyama. Lay-out adjustment and the mental map. Journal of VisualLanguages and Computing, 6(2):183–210, June 1995.

[27] V. L. O’Day and R. Jeffries. Orienteering in an in-formation landscape: how information seekers get fromhere to there. In S. Ashlund, K. Mullet, A. Henderson,E. Hollnagel, and T. N. White, editors, INTERCHI’93:Proceedings of the INTERACT’93 and CHI’93 Confer-ence on Human Factors in Computing Systems, pages438–445. ACM Press, 1993.

[28] M. H. Overmars and E. Welzl. New methods for com-puting visibility graphs. In SCG’88: Proceedings of theSymposium on Computational Geometry, pages 164–171. ACM Press, 1988.

[29] D. Phan, L. Xiao, R. Yeh, P. Hanrahan, and T. Wino-grad. Flow map layout. In J. Stasko and M. O.Ward, editors, InfoVis’05: Proceedings of the IEEESymposium on Information Visualization, pages 219–224. IEEE Computer Society, 2005.

Page 8: A Graph-based Overview Visualization for Data Landscapes · 226 A Graph-based Overview Visualization for Data Landscapes (a) Linking Open Data Cloud [10] (b) Linked Open Data Graph

232 A Graph-based Overview Visualization for Data Landscapes

[30] M. A. Rodriguez. A graph analysis of the Linked DataCloud. arXiv.org e-print service, 0903.0194v1, March2009.

[31] H. Rohn, C. Klukas, and F. Schreiber. Creatingviews on integrated multidomain data. Bioinformatics,27(13):1839–1845, June 2011.

[32] M. Samwald, A. Jentzsch, C. Bouton, C. S.Kallesøe, E. Willighagen, J. Hajagos, M. S. Marshall,E. Prud’hommeaux, O. Hassenzadeh, E. Pichler, andS. Stephens. Linked open drug data for pharmaceuticalresearch and development. Journal of Cheminformat-ics, 3(1):19, 2011.

[33] B. Shneiderman and A. Aris. Network visualization bysemantic substrates. IEEE Transactions on Visualiza-tion and Computer Graphics, 12(5):733–740, 2006.

[34] S. Stephens, D. LaVigna, M. DiLascio, and J. Luciano.Aggregation of bioinformatics data using Semantic Webtechnology. Web Semantics: Science, Services andAgents on the World Wide Web, 4(3):216–221, Septem-ber 2006.

[35] M. Streit, H.-J. Schulz, A. Lex, D. Schmalstieg, andH. Schumann. Model-driven design for the visual anal-ysis of heterogeneous data. IEEE Transactions onVisualization and Computer Graphics, 18(6):998–1010,2012.

[36] E. Summers and R. Cyganiak. Linked Open Datagraph. http://inkdroid.org/lod-graph/, November2010.

[37] J. J. Thomas and K. A. Cook. Illuminating the Path:The Research and Development Agenda for Visual An-alytics. IEEE Computer Society, 2005.

[38] W. S. Torgerson. Multidimensional scaling: I. Theoryand method. Psychometrika, 17(4):401–419, December1952.

[39] L. Yu. A Developer’s Guide to the Semantic Web.Springer, 2011.