graph visualization - edbt summer school · pdf file 2015. 8. 29. · graph...


  • Graph Visualization

    P. Eades (1) and K. Klein (2)

    (1) University of Sydney
    (2) Monash University

    1 Introduction

    Graph visualization is the process of making a drawing of a graph, so that a human can understand the graph. This is illustrated as the graph visualization pipeline in Fig. 1. A drawing function D takes a graph G from a graph data set and produces a graph drawing D(G). A perception function P takes the drawing D(G) and produces some knowledge P(D(G)) in the human. The drawing function D can be executed with pen and paper by a human, but since the advent of computer graphics in the 1970s, there has been increasing interest in executing this function on a computer; in this paper we discuss computer algorithms that implement D. The perception function P is executed by the human’s perceptual and cognitive facilities.

    Fig. 1. A graph visualization pipeline: (a) graph, (b) graph drawing, (c) human; D maps (a) to (b), and P maps (b) to (c).

    As an illustration, consider the social network in Fig. 2; it describes a friendship relation between a group of people. It is represented in Fig. 2(a) as a table with the first column listing the people, and the second column listing the friends of each person. For example, the friends of Brian are Keith, John, Michael, and William. The drawing function D produces the picture in Fig. 2(b); each person is represented by a box with text, and the friendship relation is represented by lines connecting the boxes. The perception function P takes the picture as input and produces some knowledge in the human. This could be low-level knowledge such as “John and George are friends”, or higher-level knowledge such as “Keith is important”.

    Person    Friends
    Brian     Keith, John, Michael, William
    George    John, Paul, William
    John      Brian, George, Paul
    Keith     Brian, Michael, Morgan, William
    Lee       Morgan, Riley
    Michael   Brian, Keith, Paul
    Morgan    Keith, Lee, Riley
    Paul      George, John, Michael
    Riley     Lee, Morgan
    William   Brian, George, Keith

    Fig. 2. A graph, a drawing of that graph, and some knowledge perceived by the human: (a) the friendship table above; (b) a node-link drawing of the ten people, produced by D; (c) the knowledge “Keith is important”, produced by P.

    Graphs (also known as networks) are one of the most pervasive models used in technology; social networks are prime examples, but we also find graphs in areas as diverse as biotechnology, forensics, software engineering, and epidemiology. For humans to make sense of these graphs, a picture or graph drawing is helpful. In this paper, we introduce the basic methods for creating pictures of graphs that are helpful for humans.

    The graph data in Fig. 1(a) is a set of attributed graphs. Each such graph consists of a set of vertices (sometimes called “nodes”) and a set of binary relationships (often called “edges”) between the vertices. The vertices and edges usually have attributes. For example, the vertices in Fig. 2 have textual names. Edge attributes could include, for example, a number that quantifies the strength of a friendship.
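    As a minimal sketch of this data model, the friendship table of Fig. 2(a) can be stored as an adjacency list. The names `FRIENDS`, `edge_set`, `is_symmetric`, and `degrees` are illustrative assumptions and do not come from the paper; the degree is shown only as one simple derived vertex attribute.

```python
# Friendship table from Fig. 2(a): each person maps to the list of their friends.
FRIENDS = {
    "Brian": ["Keith", "John", "Michael", "William"],
    "George": ["John", "Paul", "William"],
    "John": ["Brian", "George", "Paul"],
    "Keith": ["Brian", "Michael", "Morgan", "William"],
    "Lee": ["Morgan", "Riley"],
    "Michael": ["Brian", "Keith", "Paul"],
    "Morgan": ["Keith", "Lee", "Riley"],
    "Paul": ["George", "John", "Michael"],
    "Riley": ["Lee", "Morgan"],
    "William": ["Brian", "George", "Keith"],
}


def edge_set(adjacency):
    """Return the undirected edges as a set of two-element frozensets."""
    return {frozenset((u, v)) for u, nbrs in adjacency.items() for v in nbrs}


def is_symmetric(adjacency):
    """Check that every listed friendship is also listed in the other direction."""
    return all(u in adjacency[v] for u, nbrs in adjacency.items() for v in nbrs)


def degrees(adjacency):
    """Number of friends per person, a simple derived vertex attribute."""
    return {u: len(nbrs) for u, nbrs in adjacency.items()}


num_vertices = len(FRIENDS)
num_edges = len(edge_set(FRIENDS))
print(num_vertices, "vertices,", num_edges, "edges")
```

    Storing each undirected edge as a `frozenset` avoids counting the pair (u, v) twice; an edge attribute such as friendship strength could be kept in a dictionary keyed by these frozensets.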

    The graph drawing in Fig. 1(b) is a “node-link” diagram: it consists of a glyph D(u) for each vertex u of the graph, and a curve segment D(e) connecting the glyphs D(u) and D(v) for each edge e = (u, v) of the graph. Each glyph D(u) has geometric attributes (such as position and size) and graphical attributes (such as colour). Similarly, each curve D(e) has geometric attributes (such as its route) and graphical attributes (such as colour and linestyle). Note that other kinds of graph drawing are possible; see Section 5.4 below. However, in this paper we will concentrate on the node-link metaphor, as it is the most commonly used.
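    The node-link metaphor described above can be sketched as a small data structure. This is an illustration under stated assumptions, not the representation used by any particular tool: the class names `Glyph` and `Curve`, their fields, the default sizes, and the helper `to_svg` are all hypothetical, and the coordinates in the usage lines are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """D(u): the picture of one vertex u."""
    label: str
    x: float                # geometric attribute: position (centre of the box)
    y: float
    width: float = 60       # geometric attribute: size
    height: float = 24
    fill: str = "white"     # graphical attribute: colour


@dataclass
class Curve:
    """D(e): the picture of one edge e = (u, v), here a polyline."""
    source: Glyph
    target: Glyph
    bends: tuple = ()       # geometric attribute: route (intermediate points)
    stroke: str = "black"   # graphical attribute: colour
    dashed: bool = False    # graphical attribute: line style


def to_svg(glyphs, curves, width=300, height=200):
    """Serialize the drawing as an SVG string (edges first, so boxes sit on top)."""
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for c in curves:
        pts = [(c.source.x, c.source.y), *c.bends, (c.target.x, c.target.y)]
        path = " ".join(f"{x},{y}" for x, y in pts)
        dash = ' stroke-dasharray="4"' if c.dashed else ""
        parts.append(f'<polyline points="{path}" fill="none" stroke="{c.stroke}"{dash}/>')
    for g in glyphs:
        parts.append(
            f'<rect x="{g.x - g.width / 2}" y="{g.y - g.height / 2}" '
            f'width="{g.width}" height="{g.height}" fill="{g.fill}" stroke="black"/>'
        )
        parts.append(f'<text x="{g.x}" y="{g.y}" text-anchor="middle">{g.label}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


# Usage: two people from Fig. 2 joined by one friendship edge (positions are arbitrary).
john = Glyph("John", 60, 50)
george = Glyph("George", 220, 150)
svg = to_svg([john, george], [Curve(john, george)])
```

    Keeping geometric and graphical attributes in separate fields mirrors the distinction drawn in the text: a layout algorithm would only set positions and routes, while the attribute mapping would only set colours and line styles.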

    In practice, it is relatively easy to find a good mapping from the vertex and edge attributes to the graphical attributes of glyphs and lines, using well-established rules of graphic design (see, for example, [95]). A large variety of graphical notations exists in common application areas; Figs. 3 and 4 show real-world examples of attribute mappings from biology. The representation in Fig. 3 uses the SBGN standard [70] and was produced with SBGN-ED [21], an extension of the Vanted framework [83].

    In contrast, it is difficult to find a good layout for a node-link diagram: if we choose the location of each vertex and the route for each edge badly, then the resulting diagram is tangled and hard to read. In Section 2 below, we examine the geometric properties of “good” node-link diagrams. Then we describe methods for constructing good layouts of node-link diagrams. In particular, we describe two important approaches: the topology-shape-metrics approach (Section 3), and the energy-based approach (Section 4).


    Fig. 3. A part of a biological pathway drawn using the SBGN notation. Attributes are mapped to graphical attributes. The network is a part of the visual representation describing the development of diabetic retinopathy, a condition which leads to visual impairment if left untreated.

    2 Readability and faithfulness

    We now consider the properties of “good” drawings of graphs. We concentrate on geometric properties, in particular the location of each vertex and the route for each edge. There are two aspects of the quality of a graph drawing: readability and faithfulness.

    Readability concerns the quality of the perception function P in Fig. 1: how well does the human understand the picture? Two further drawings of the graph in Fig. 2(a) are in Fig. 5. Intuitively, these two drawings are less readable than that in Fig. 2(b).

    Geometric properties of readable drawings of a graph are commonly called “aesthetic criteria”. Discussions of aesthetic criteria began in the 1970s. For example, Sugiyama (see [88]) and Tamassia et al. [91] produced structured lists of aesthetic criteria; a sample is below.

    C1 The number of edge crossings is minimized.
    C2 The total length of edges is minimized.
    C3 The ratio of length to breadth of the drawing is balanced.
    C4 The number of edge bends is minimized (using straight lines where possible).
    C5 The area occupied by the drawing is minimized.
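    Several of these criteria can be measured directly on a straight-line drawing. The following is a minimal sketch, assuming that a drawing is given as a dictionary `pos` from vertex to (x, y) and a list of vertex pairs; the function names and the two example layouts `GOOD` and `BAD` are illustrative assumptions. Criterion C4 is trivially zero here because every edge is a straight segment, and touching or collinear overlapping segments are not counted as crossings.

```python
from itertools import combinations
from math import dist


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): +1, 0 or -1."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a single interior point."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def crossings(pos, edges):
    """C1: number of crossing pairs among edges that share no endpoint."""
    count = 0
    for (u, v), (w, x) in combinations(edges, 2):
        if len({u, v, w, x}) == 4 and segments_cross(pos[u], pos[v], pos[w], pos[x]):
            count += 1
    return count


def total_edge_length(pos, edges):
    """C2: sum of Euclidean edge lengths."""
    return sum(dist(pos[u], pos[v]) for u, v in edges)


def bounding_box(pos):
    """Width and height of the smallest axis-parallel box containing all vertices."""
    xs = [p[0] for p in pos.values()]
    ys = [p[1] for p in pos.values()]
    return max(xs) - min(xs), max(ys) - min(ys)


def aspect_ratio(pos):
    """C3: ratio of length to breadth (assumes a non-zero height)."""
    w, h = bounding_box(pos)
    return w / h


def area(pos):
    """C5: area of the bounding box."""
    w, h = bounding_box(pos)
    return w * h


# A 4-cycle a-b-c-d drawn twice on the corners of the unit square.
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
GOOD = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
BAD = {"a": (0, 0), "b": (1, 1), "c": (1, 0), "d": (0, 1)}  # b and c swapped

print(crossings(GOOD, EDGES), crossings(BAD, EDGES))
```

    The two layouts occupy the same area and have the same aspect ratio, yet `BAD` has one crossing and a larger total edge length, which illustrates that the criteria are independent measures and may have to be traded off against each other.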

    All these aesthetic criteria were based on intuition and introspection rather than any scientific evidence. Later, Purchase et al. [80] began the scientific investigation of aesthetic criteria, based on HCI-style human experiments. These experiments measured the time to complete tasks such as tracing a shortest path in a graph drawing, and the errors made in such tasks. These variables were correlated with aesthetic criteria such as those above. Purchase found significant evidence that both time and errors increase with the number of edge crossings and with the number of edge bends, and less significant evidence for other aesthetic criteria. Further experiments [79, 101, 57] confirmed, refined and extended Purchase’s original work.

    Fig. 4. A combination of three metabolic pathways with vertex attributes represented by charts and colors mapped onto the vertex representations. Bar charts represent the amount of a metabolite for four different plant lines.

    Fig. 5. Poor quality drawings, (a) and (b), of the graph in Fig. 2(a).

    Faithfulness [75] concerns the quality of the drawing function D in Fig. 1. The drawing D(G) of a graph G is faithful if it uniquely represents the graph G. In other words, D is faithful if it has an inverse; that is, if the graph G can be recovered uniquely from the drawing D(G). This concept may seem strange at first, because it may seem that all graph drawings are faithful. However, the concept is significant for very large graphs. As a simple example, consider the graph in Fig. 6. This drawing uses a technique recently called edge bundling [54] (originally called edge concentration [74]) to cope with the large number of edges. While this drawing may be readable, it is not faithful: it does not uniquely represent a graph (because there are many graphs that could have this drawing).
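    The idea that a faithful drawing function has an inverse can be illustrated with a deliberately extreme toy example. The two functions below are hypothetical and are not the bundling techniques of [54] or [74]: `node_link_drawing` records one separate line per edge, while `hub_bundled_drawing` assumes that every edge is routed through a single shared hub point, so that the picture shows only which vertices have a line into the bundle.

```python
def node_link_drawing(edges):
    """Toy faithful drawing: one separate line per edge, so the edge set can be read back."""
    return frozenset(frozenset(e) for e in edges)


def hub_bundled_drawing(edges):
    """Toy unfaithful drawing: all edges pass through one shared hub,
    so only the set of vertices attached to the bundle remains visible."""
    return frozenset(v for e in edges for v in e)


# Two different graphs on the same four vertices.
g1 = [("a1", "b1"), ("a2", "b2")]
g2 = [("a1", "b2"), ("a2", "b1")]

# Same picture for different graphs: this drawing function has no inverse.
same_bundled = hub_bundled_drawing(g1) == hub_bundled_drawing(g2)

# Different pictures for different graphs: the graph can be recovered.
same_node_link = node_link_drawing(g1) == node_link_drawing(g2)

print(same_bundled, same_node_link)
```

    Because the bundled pictures of `g1` and `g2` coincide, no procedure can decide from the picture alone which of the two graphs was drawn; this is the sense in which such a drawing is not faithful.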

    While readability has a long history of investigation, faithfulness has only arisen since the advent of very large data sets, and it is currently not well understood. One faithfulness criterion that has been proposed [75] is based on the intuition that in a faithful graph drawing, the distance between u and v in the graph G should be reflected by the geometric distance between the positions D(u) and D(v) of u and v in the drawing. To make this notion more precise, suppose that $\delta_G(u, v)$ is the distance between u and v in G (for example, $\delta_G(u, v)$ could be the length of a graph-theoretic shortest path between u and v). For a drawing function D that maps vertices of a graph G = (V, E) to points in $\mathbb{R}^2$, we define

    $$\sigma(D(G)) = \sum_{u,v \in V} \left( \delta_G(u, v) - \delta_{\mathbb{R}^2}(D(u), D(v)) \right)^2 \qquad (1)$$

    where $\delta_{\mathbb{R}^2}$ is a distance function in $\mathbb{R}^2$ (for example, Euclidean distance). In other words, $\sigma$ is the sum of squared errors between distances in the graph G and distances in the drawing D(G). In this way, $\sigma$ measures the faithfulness of the drawing insofar as distances are concerned. In the 1950s, Torgerson [94] employed a similar criterion when he proposed the Multidimensional Scaling method for psychometrics, a projection technique that al