Network-based exploration and visualisation of
ecological data
Ben Raymond and Graham Hosie
Australian Antarctic Division
Channel Highway, Kingston 7050 AUSTRALIA
Corresponding author:
Dr Ben Raymond
Tel: +61 (3) 6232 3336
Fax: +61 (3) 6283 2336
Email: [email protected]
Correspondence address:
Dr Ben Raymond
Australian Antarctic Division
Channel Highway, Kingston 7050 AUSTRALIA
Abstract
Networks – structured graphs consisting of sets of nodes connected by edges –
provide a rich framework for data visualisation and exploratory analyses. Although
rarely used for the visualisation of ecological data, networks are well suited to this
purpose, including data that one might not normally think of as a network. We present
a simple method for transforming a data matrix into network format, and show how
this can be used as the basis for interactive exploratory analyses of ecological data.
The method is demonstrated using a database of marine zooplankton samples acquired
in the Southern Ocean. The network analyses revealed zooplankton community
structures that are in good agreement with previously published results. Variations in
community structure were observed to be related to the temporal and spatial pattern of
sampling, as well as to physical environmental factors such as sea ice cover. The
analyses also revealed a number of errors in the data, including taxon identification
errors and instrument failures.
The method allows the analyst to generate networks from different combinations of
variables in the data set, and to examine the effects of varying parameters such as the
scales of spatial, temporal, and taxonomic aggregation. This flexibility allows the
analyst to rapidly gain a number of perspectives on the data and provides a powerful
mechanism for exploration.
Keywords:
exploratory analyses; data visualisation; networks; zooplankton; Southern Ocean;
community structure
Introduction
Exploratory analyses and data visualisation are often used during the early phases of
scientific investigations. Such analyses recognise and explore patterns and structures
in data, and enable investigators to form hypotheses and conceptual models for further
investigation and experimentation.
Networks provide a rich framework for data visualisation and exploration. The term
“network” is used here to denote a structured graph consisting of a set of nodes (also
termed “vertices”) connected by edges. Each node in a network represents an entity
or concept of interest — for ecological data, the entities of interest are commonly
individual species or sample sites, although any choice of entity could potentially be
used. Relationships between the entities of interest are indicated by edges between
nodes in the network. A network is most commonly represented diagrammatically using
circles or other shapes for the nodes, with lines between nodes showing the edges.
The edges can be weighted to indicate the strengths of the relationships, and can also
be directed, indicating that the relationships have an inherent direction (e.g. predation
or temporal succession). Networks are well suited to the analysis of many types of
ecological data. Complex structures of inter-related elements are pervasive in natural
systems (Aloy and Russell, 2004; Green et al., 2005; Proulx et al., 2005) and network-
based methods can provide an intuitive framework for understanding those systems.
The importance of considering the overall ecosystem context when investigating
elements of that ecosystem has long been recognised in the ecological sciences, but
has been given recent re-emphasis (e.g. Jordán and Scheuring, 2004).
Network-based methods have a long history as a general analytical framework in the
ecological sciences. The most common application has been to food web studies
(Pimm, 1982; Berlow et al., 2004), including non-trophic interactions (Paine, 1984;
Memmott, 1999; Brose et al., 2005). Networks with directed edges (termed “plexus
diagrams”) have long been used as a basis for investigating relationships amongst
species and between species and their environment (Whittaker and Warren Fairbanks,
1958; McIntosh, 1973; Gillison, 1978; Matthews, 1978; Dale, 2000) but their use for
this purpose is relatively uncommon, and classification and ordination techniques are
generally favoured. Networks in which edges represent quantified flows (often of
energy or matter) have also long been studied under the monikers of “ecological
network analysis” and “network environ analysis” (Ulanowicz, 1986; Christensen and
Pauly, 1992; Fath and Patten, 1999). Overviews of this field have been given by Fath
(2004) and Fath and Patten (1999). Networks have also been used to investigate
landscape connectivity (Fahrig and Merriam, 1985; Urban and Keitt, 2001;
Starzomski and Srivastava, 2007).
Network applications in ecology have received a recent surge of popularity (Green et
al., 2005; Proulx et al., 2005), riding on the wave of interest in “networks science” –
research that has examined network structures and processes across a number of
different disciplines (see e.g. Watts and Strogatz, 1998; Albert and Barabási, 2002;
Newman, 2003). The recent resurgence of interest in network theory in the ecological
sciences has been driven at least in part by the wider interest in networks science, but
probably more so by the recognition that network-based methods can facilitate the
analytical integration of an overall ecosystem with the dynamics of its individual
elements (Jordán and Scheuring, 2004). Networks offer insights into system-level
properties that arise from the structure of the network, and which are not evident from
the properties of the entities alone. Topology can inform how networks function and
respond to change (Berlow, 1999; Jordán, 2001; Dambacher et al., 2003; Proulx et al.,
2005), and these ideas have been applied to examples such as the propagation of
disease (Newman, 2002; Shirley and Rushton, 2005; Jeger et al., 2007), dispersal of
seeds (Lázaro et al., 2005), contaminant effects (Rohr et al., 2006), and the selection
of habitat for conservation (Rhodes et al., 2006).
Despite their application to these more formal types of analyses, networks are rarely
used as tools for the visualisation and exploration of ecological data. Network-based
approaches can be used for visualisation and exploration of a variety of data – not just
datasets that one might traditionally think of in terms of networks. Network-based
methods can provide insights that complement those obtained from more conventional
exploratory methods.
Networks are commonly represented graphically as a connected structure in two- or
three-dimensional space. The nodes can be positioned according to a variety of layout
algorithms (see e.g. Herman et al., 2000). Such algorithms take diverse approaches to
the problem, and a review is beyond the scope of this paper. In general, however, their
aim is to find a geometric arrangement of the nodes and edges of the network that best
conveys the network structure to the user. Ecologists familiar with multidimensional
scaling (MDS) might expect that a network would be laid out so that the geometric
distance between a pair of nodes matches the corresponding pairwise dissimilarity, as
is the case in MDS. It is true that a number of graph layout algorithms are based on
concepts very similar to MDS (e.g. Kamada and Kawai, 1989). However, network
layouts are generally not constrained in this manner, but tend to be more concerned
with visual criteria such as minimising the overlap of nodes and crossing of edges.
Regardless of the details of the layout, the visual appearance of a network can be
altered in a number of ways, including the size, shape, and colour of the nodes and
edges, and the colours of the background. Large networks can be simplified by
merging nodes or edges. Changes can be interactive, for example simplifying those
parts of the network outside of the user’s immediate region of interest. The
visualisation can also include dynamic elements, such as popups that display
information about nodes or edges. Exploring data in a dynamic, interactive manner is
a powerful mechanism for gaining a conceptual understanding of the structures and
patterns present in the data. Interactivity can be particularly useful with very large
networks, which can be overwhelming in their visual complexity and therefore
difficult to comprehend.
A given set of entities of interest can be represented as a network simply by
connecting with edges those entities that are in some way related, or interact. Some
data may have notions of connectivity that are self-evident (e.g. trophic or
genealogical data), or have well-established methods for defining relationships
between entities. An example of the latter is species observation data: methods for
calculating dissimilarities between species form the foundation for much of modern
numerical ecology. Creating a network from such data can be done from the species
dissimilarity matrix (Dale, 2000). In the general case, however, a given data set might
not have a natural notion of connectivity. In the next section, we describe an
algorithm that can create networks from a wide range of data.
Methods
Scientific data sets commonly take the form of observations of a set of variables,
structured in the form of a matrix in which rows correspond to observations and
columns to variables. A simple algorithm to create a network structure from such data
is:
1. Decide on the entities of interest to the analysis. Each individual entity will be
represented by a single node in the network. Choose the variable in the data set (i.e.
column of the data matrix) that best delineates those entities. The nodes in the
network are then formed from this variable, with one node for each distinct value of the
variable.
2. Choose a variable that defines the relationships of interest between those
entities defined in step 1. Any pair of nodes that share a common value for this
variable are then connected by an edge.
The implementation of this algorithm is straightforward. The rows of the matrix are
first sorted by the edge variable, and then a single pass over the rows of the matrix is
required, connecting by edges all nodes that have an equal edge variable value. For
matrices with n rows and m distinct values of the edge variable, the average time
complexity is $O(n + (n/m)^2)$. In the worst case of a fully-connected network, $m = 1$ and
the complexity becomes $O(n^2)$. The complexity of the initial sorting step is neglected
here as it will be less than that of the remainder of the algorithm with an efficient
sorting algorithm (e.g. Williams, 1964).
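The two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' Matlab implementation: the function name `build_network`, the column names, and the toy rows are all hypothetical, and a dictionary grouping is used in place of the sort-then-scan pass described in the text.

```python
from collections import defaultdict
from itertools import combinations


def build_network(rows, node_var, edge_var):
    """Return {(node_a, node_b): set of edge-variable values shared by the pair}."""
    # Collect the distinct node values observed for each edge-variable value.
    # (Dictionary grouping stands in for the sort-then-scan pass.)
    groups = defaultdict(set)
    for row in rows:
        groups[row[edge_var]].add(row[node_var])

    # Connect every pair of nodes that share an edge-variable value.
    edges = defaultdict(set)
    for value, nodes in groups.items():
        for a, b in combinations(sorted(nodes), 2):
            edges[(a, b)].add(value)
    return dict(edges)


# Hypothetical toy data: one row per observation of a taxon at a site.
rows = [
    {"site": "S1", "taxon": "taxon_a"},
    {"site": "S1", "taxon": "taxon_b"},
    {"site": "S2", "taxon": "taxon_a"},
    {"site": "S3", "taxon": "taxon_b"},
    {"site": "S3", "taxon": "taxon_c"},
]

# Sites as nodes, linked by the taxa they share.
site_network = build_network(rows, node_var="site", edge_var="taxon")
```

In this toy case S1 is linked to S2 through taxon_a and to S3 through taxon_b, while S2 and S3 share no taxon and so remain unlinked. Storing the shared values on each edge keeps them available later as edge attributes.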
The basic algorithm can be extended in a number of useful ways. Multiple node
variables can be defined, in which case a node would be formed for each distinct
combination of those variables. Alternatively, the multiple node variables could be
treated as alternate node definitions, allowing more than one type of entity (type of
node) in the one network. Similarly, multiple edge variables could either denote a
compound edge condition (i.e. an edge is only formed between nodes that share
values of all variables) or a set of alternative conditions (an edge is formed between a
pair of nodes that share a value for any one of the edge variables). The edges can also
be weighted. Some possible weighting schemes are:
$w_{ij} = 2N_{ij}/(N_i + N_j)$ (1)
$w_{ij} = N_{ij}/(N_i + N_j - N_{ij})$ (2)
$w_{ij} = 1 - (N_i + N_j - 2N_{ij})/K$ (3)
where $w_{ij}$ is the weight of the edge between nodes i and j; $N_i$ is the number of times
the ith value of the node variable was observed; $N_{ij}$ is the number of times that the ith
and jth nodes were linked by a common value of the edge variable; and $K$ is the
number of rows in the matrix. The example weighting schemes above correspond to
some well-known ecological functions: (1) is the Sørensen index (equivalent to the
Bray-Curtis similarity applied to binary data); (2) is the Jaccard similarity coefficient;
and (3) is $1 - D_G$, where $D_G$ is the Gower dissimilarity metric.
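As a worked illustration of equations (1) to (3), the sketch below computes the three weights for a single pair of nodes. It assumes that each node is summarised by the set of edge-variable values observed with it, so that each node/value combination is counted once; the function name `edge_weights` and the toy values are hypothetical.

```python
def edge_weights(values_i, values_j, k):
    """Weights for the edge between two nodes, given their sets of edge-variable values.

    k is the number of rows in the data matrix (K in equation 3).
    """
    n_i, n_j = len(values_i), len(values_j)
    n_ij = len(values_i & values_j)               # values common to both nodes
    sorensen = 2 * n_ij / (n_i + n_j)             # equation (1)
    jaccard = n_ij / (n_i + n_j - n_ij)           # equation (2)
    gower = 1 - (n_i + n_j - 2 * n_ij) / k        # equation (3)
    return sorensen, jaccard, gower


# Hypothetical example: two sites with three taxa each, two of them shared,
# drawn from a matrix of six rows.
values_a = {"taxon_a", "taxon_b", "taxon_c"}
values_b = {"taxon_b", "taxon_c", "taxon_d"}
sorensen, jaccard, gower = edge_weights(values_a, values_b, k=6)
```

Here $N_i = N_j = 3$ and $N_{ij} = 2$, giving weights of 2/3, 1/2, and 2/3 respectively.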
Each node and edge is naturally associated with a set of rows in the data matrix. For
example, any given node is associated with a specific value of the node variable
chosen in step 1 of the algorithm. That node is therefore associated with all rows in
the data matrix for which the node variable takes that value. Each node and edge can
be assigned “attributes” – data drawn from the relevant rows of the data matrix.
Attribute data are not used to form the network itself, but can be used to help interpret
network structures and patterns. Consider a matrix of species observation data, in
which each row holds the time, location, and taxon observed, as well as physical
environmental data relating to that time and location. A network might be created
from this data in which the nodes represent observation locations. The attributes of
those nodes would be the dates, taxa, and physical environmental data associated with
each location.
Given a network G which uses variables v1 and v2 as node and edge variables, it is
straightforward to transform this into its “alternate” network H, which swaps the
variables and uses v2 for nodes and v1 for edges. Edges that connect to the same node
in G correspond to nodes that are connected by edges in H. The transformation can be
done by visiting each edge in G, creating a node in H for each unique value of the
edge variable. During that process, a list is created for each node in G, recording all of
the edge values in G associated with that node. Each list then provides a list of nodes
in H that are to be connected by edges. For example, consider a network in which
nodes represent observation locations and each edge indicates a pair of locations that
have at least one taxon in common (i.e. a network of sites, linked by taxa). The
alternate of this network would be one in which nodes represent individual taxa, and
each edge indicates a pair of taxa that have been observed at the same location (a
network of taxa, linked by sites). There is a clear analogy between this example and
the choice in community ecology studies to analyse a data matrix by its rows (sample
sites) or columns (taxa).
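The transformation just described might be sketched as follows. This assumes that G is stored as a dictionary mapping each node pair to the set of edge-variable values shared by that pair; that representation and the name `alternate_network` are illustrative choices, not the authors' GUESS extension.

```python
from collections import defaultdict
from itertools import combinations


def alternate_network(g_edges):
    """Swap roles: edge values of G become nodes of H, nodes of G label the edges of H."""
    # For each node in G, record every edge value found on the edges that touch it.
    values_at_node = defaultdict(set)
    for (a, b), values in g_edges.items():
        values_at_node[a] |= values
        values_at_node[b] |= values

    # Edge values that meet at the same node of G are connected in H.
    h_edges = defaultdict(set)
    for node, values in values_at_node.items():
        for u, v in combinations(sorted(values), 2):
            h_edges[(u, v)].add(node)
    return dict(h_edges)


# Hypothetical network of sites linked by taxa.
g = {
    ("S1", "S2"): {"taxon_a"},
    ("S1", "S3"): {"taxon_b"},
}
h = alternate_network(g)
```

In the toy example, taxon_a and taxon_b both occur on edges that touch site S1, so they are joined in H by an edge labelled S1. One consequence of working from the edges of G is that a value observed at only one node never appears on an edge of G and so is not carried into H; rebuilding H directly from the data matrix would retain it.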
Results
We illustrate our approach with data from the Southern Ocean continuous plankton
recorder (CPR) survey (Hosie et al., 2003). The CPR survey is an international
collaborative project with seven participating nations at the time of writing. The
survey uses a towed device to take contiguous transect samples of the zooplankton
present in the surface waters of the Southern Ocean.
[Insert Figure 1 about here]
We used CPR data collected aboard the RV Tangaroa during the 43rd Japanese
Antarctic Research Expedition, which occurred between 7 February and 3 March
2002. A sequence of tows was completed along the 140°E meridian using a CPR
towed 100 m behind the vessel. The resulting track ran from approximately 47°S to
66°S and back again (Figure 1). Each tow was divided into segments with nominal
length 5 nautical miles; these segments represent the sampling units of the survey.
The taxa present in each segment were identified to species level wherever possible;
ostracods and gelatinous plankton (hydromedusae, ctenophores, siphonophores)
were not identified to species level. Appendicularians were identified as either
Oikopleura sp. or Fritillaria sp. Physical environmental data (sea surface temperature
(SST) and salinity (SSS), photosynthetically active radiation (PAR), and fluorometry)
were recorded during the CPR tows at one-minute sample intervals. The mean value
of each of these variables was calculated for each segment. Further details of the data
acquisition are given by Hunt and Hosie (2005). For each segment we also calculated
the number of days since the sea ice cover had melted, using remotely-sensed passive
microwave estimates of sea ice cover (Cavalieri et al., 1996, updated 2006). Only
samples collected at night were included in the analyses, to avoid the effects of the
diurnal vertical migration of many Southern Ocean zooplankton taxa (Hunt and
Hosie, 2003). Night was defined as PAR < 100 μmol s⁻¹ m⁻².
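The segment averaging and night-time selection could be sketched as below. The record layout and function names are hypothetical, and applying the PAR threshold to the segment mean (rather than to individual one-minute readings) is an assumption, since the text does not say which was used.

```python
from collections import defaultdict
from statistics import mean

NIGHT_PAR_THRESHOLD = 100.0  # PAR threshold for night, in the units given in the text


def segment_means(records, variable):
    """Average the one-minute readings of `variable` within each segment."""
    readings = defaultdict(list)
    for record in records:
        readings[record["segment"]].append(record[variable])
    return {segment: mean(values) for segment, values in readings.items()}


def night_segments(records):
    """Segments whose mean PAR falls below the night threshold (an assumed rule)."""
    par = segment_means(records, "par")
    return {segment for segment, value in par.items() if value < NIGHT_PAR_THRESHOLD}


# Hypothetical one-minute records for two segments.
records = [
    {"segment": 1, "par": 0.0},
    {"segment": 1, "par": 10.0},
    {"segment": 2, "par": 400.0},
    {"segment": 2, "par": 600.0},
]
night = night_segments(records)
```

With these invented readings, segment 1 has a mean PAR of 5 and is kept as a night sample, while segment 2 has a mean of 500 and is excluded.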
The data were collated into a matrix in which each row corresponded to an observation
of a single taxon, and included the taxon identifier number and taxon abundance,
sample site identifier (the tow and segment number), latitude, longitude, time and
date, and values of the physical environmental data of the observation. We used the
algorithm described in the previous section to generate various network structures,
implemented in the Matlab package (Mathworks, MA, 2008). The networks generated
using this algorithm were then passed to the GUESS package (Adar, 2005) for
visualisation and exploration. GUESS provides a number of common graph layout,
clustering, and other algorithms, and a Python-based interactive interface through
which the user can dynamically alter the graph, allowing rapid exploration and testing
of ideas.
We developed extensions to GUESS for the analyses described here, including a
graphical user interface for easily altering the node and edge colours and sizes, a filter
for removing edges based on their weights (or other properties), and a facility for
transforming a network into its alternate form. Our code and an interactive version of
the examples shown here are available at http://data.aad.gov.au/graphvis/.
The geometric layout of each network shown here was calculated using the GEM
algorithm (Frick et al., 1995) implemented in GUESS. Briefly, this algorithm treats
the nodes as mutually repulsive particles, and edges as springs that attempt to draw
the nodes together. A simulated annealing algorithm with various heuristics is used to
find the placement of nodes that results in an equilibrium between the forces in this
spring/particle system.
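To make the spring/particle idea concrete, the following is a generic force-directed layout with a cooling schedule. It is not the GEM algorithm and omits GEM's heuristics; the force laws, the cooling rule, and all names and parameter values are assumptions chosen for brevity.

```python
import math
import random


def spring_layout(nodes, edges, iterations=200, k=1.0, seed=0):
    """Place nodes in 2-D using pairwise repulsion and spring-like edges."""
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}

    for step in range(iterations):
        # The "temperature" caps the size of each move and cools towards zero.
        temperature = 0.1 * (1 - step / iterations)
        force = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes repels.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = max(math.hypot(dx, dy), 1e-9)
                push = k * k / d
                force[a][0] += push * dx / d
                force[a][1] += push * dy / d
                force[b][0] -= push * dx / d
                force[b][1] -= push * dy / d

        # Each edge pulls its two end-nodes together.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = max(math.hypot(dx, dy), 1e-9)
            pull = d * d / k
            force[a][0] -= pull * dx / d
            force[a][1] -= pull * dy / d
            force[b][0] += pull * dx / d
            force[b][1] += pull * dy / d

        # Move each node along its net force, limited by the temperature.
        for n in nodes:
            fx, fy = force[n]
            size = max(math.hypot(fx, fy), 1e-9)
            move = min(size, temperature)
            pos[n][0] += fx / size * move
            pos[n][1] += fy / size * move
    return pos


# Hypothetical network: a triangle plus one unconnected node.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "a")]
layout = spring_layout(nodes, edges)
```

The returned dictionary maps each node to its x and y coordinates. Because the initial placement is random, a different seed will generally give a rotated or reflected arrangement of the same structure.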
[Insert Figure 2 about here]
We first examine patterns in species composition of the sample sites. Figure 2 shows a
community network in which the nodes represent tow segments (sample sites). The
taxon compositions of the segments were used to form the edges in the network: two
segments were linked if the same taxon occurred in both. Edges were weighted using
equation 1 (Bray-Curtis), so that segments with more similar taxon compositions are
more strongly linked. Note that the similarities between tow segments do not
incorporate taxon abundance information, only presence or absence on a particular
tow segment. Weak edges in Figure 2 (those with weight less than 0.75) have been
removed for visual clarity. The pruning of weak edges resulted in a small number (23)
of nodes that became disconnected from the main component of the network, and
which are not shown in Figure 2. These disconnected nodes were all from northerly
latitudes close to Tasmania. The attributes of the nodes have been used to provide
additional visual cues: the nodes have been coloured according to the latitude of the
tow segment. White represents the most northerly latitudes just south of Tasmania,
and black represents high latitudes near Antarctica. The node size shows the number
of taxa, with larger nodes representing segments with higher species richness.
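The edge pruning described above, and the identification of nodes that become disconnected as a result, can be sketched as follows. The function names and the toy weights are hypothetical; only the 0.75 threshold is taken from the text.

```python
from collections import defaultdict


def prune_edges(weights, threshold):
    """Keep only edges whose weight is at least `threshold`."""
    return {edge: w for edge, w in weights.items() if w >= threshold}


def main_component(nodes, edges):
    """Return the largest connected component as a set of nodes."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal from a node not yet visited.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical similarities between four tow segments.
weights = {("s1", "s2"): 0.9, ("s2", "s3"): 0.8, ("s3", "s4"): 0.5}
nodes = ["s1", "s2", "s3", "s4"]

kept = prune_edges(weights, threshold=0.75)
core = main_component(nodes, kept)
disconnected = set(nodes) - core
```

In this toy case the edge between s3 and s4 falls below the threshold, so s4 drops out of the main component and would be omitted from the plotted network.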
Five clusters (A-E) were defined by visual inspection of the network. The node
colours suggest that the clustering is related to latitude. The high-latitude segments
appear to be divided into two or three clusters (A and B; top left of Figure 2). The tow
segments in cluster A were acquired at a mean latitude of 63.3°S (range 64.5°S –
62°S), and for cluster B, 64.5°S (65.5°S – 63.3°S). There are two clusters at
intermediate latitudes (C and D; latitudes 52.6°S, range 60.8°S – 47.9°S; and 56.6°S,
range 59.1°S – 54.0°S), and a cluster E of segments from the more northerly latitudes
(50.1°S, range 52.5°S – 47.1°S). The temporal progression of the ship track is
overlaid with a dashed line on Figure 2, starting and ending in cluster E.
The segments in cluster C have smaller node size, indicating lower numbers of
taxa on these segments. This low species richness (uncharacteristic for a mid-latitude
community) was puzzling and prompted us to re-examine the data. We discovered
that the PAR data were erroneous (all zeros) for the first two days of the voyage.
Cluster C in fact comprises daytime samples; the low species richness is a result of the
downward migration of many zooplankton taxa during daylight hours (Hunt and
Hosie, 2003).
The patterns of other attribute variables can be explored by altering node and edge
characteristics such as colour and shape. If we change the node colours to represent
the date of sampling (not shown), we see that the temperate segments (E; bottom-right
of Figure 2) were acquired at the extremes of the sample period – i.e. as the ship was
both leaving and returning to Tasmania (this can also be seen in the dashed line
representing the ship track on Figure 2). That this cluster comprises a mixture of
sample dates suggests that the species community remained relatively stable over the
timespan of the voyage. Cluster A also shows a similar bimodal distribution of sample
dates (the ship completed the southward leg of the transect on the 11th of February and
commenced the northward leg on the 27th). The two intermediate-latitude clusters (C
and D), while relatively similar in latitude, are distinct in sample dates. One (C)
comprises segments on the southward leg of the voyage, the other (D), the return leg.
The high-latitude clusters (A and B) also show some separation by sample date
(ranging from the 11th to the 27th of February), suggesting that either the regional
species composition changed over that time, or the ship’s movement caused the
sampling to traverse local variations in ecosystem structure.
Figure 3 shows the network with node colours changed to show the number of days
since the sea ice cover melted. The tow segments in cluster B and about half of those
in A were taken in waters previously covered by sea ice. Sea ice is known to have a
role in structuring ecosystems in Antarctic waters (Eicken, 1992; Lizotte, 2001).
While the species compositions of segments in cluster A are all relatively similar (and
thus the cluster is relatively tight), there are variations in sea ice cover and sample
date that probably give rise to subtle variations in composition within the cluster.
[Insert Figure 3 about here]
[Insert Figure 4 about here]
The “clusters” referred to in the results above were determined by visual inspection,
but it is possible to determine clusters more formally. Figure 4 shows the result of
applying a clustering algorithm (Newman, 2004) to the network of Figure 2. The sizes
of the nodes in Figure 4b are proportional to the total taxon counts of the constituent
tow segments. The edge thicknesses are proportional to the weighted fraction of edges
between node clusters, so that clusters connected with a heavy edge are more closely
related than clusters connected with a thin edge. The clusters of Figure 4 correspond
quite closely with the visually-determined clusters in Figure 2, with the exception that
the high-latitude segments in cluster A have been broken into two clusters I and III in
Figure 4. This split of cluster A into I and III closely follows the pattern of sea
ice/open water segments shown in Figure 3. The high-latitude ice-zone segments in
cluster I appear to be more closely related to the high-latitude open-water segments
(cluster III) than to the other high-latitude ice-zone segments (cluster II).
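The text does not detail the clustering algorithm beyond its citation. As a generic illustration only, one common way to score how well a proposed set of clusters fits a weighted network is the modularity measure Q, which compares the edge weight falling within clusters to that expected if edges were placed at random. The sketch below computes Q for a given partition; the function name and the toy network are hypothetical, and this is not presented as the cited algorithm.

```python
from collections import defaultdict


def modularity(weights, membership):
    """Weighted modularity Q of a partition of an undirected network."""
    total = sum(weights.values())          # total edge weight
    strength = defaultdict(float)          # summed edge weight at each node
    internal = defaultdict(float)          # edge weight inside each cluster
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
        if membership[a] == membership[b]:
            internal[membership[a]] += w

    cluster_strength = defaultdict(float)
    for node, s in strength.items():
        cluster_strength[membership[node]] += s

    return sum(
        internal[c] / total - (cluster_strength[c] / (2 * total)) ** 2
        for c in cluster_strength
    )


# Hypothetical network: two triangles joined by a single bridging edge.
weights = {
    (0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
    (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,
    (2, 3): 1.0,
}
two_clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_cluster = {node: "A" for node in range(6)}

q_split = modularity(weights, two_clusters)
q_single = modularity(weights, one_cluster)
```

Splitting the toy network into its two triangles gives Q = 5/14, whereas placing all nodes in a single cluster gives Q = 0, so the split is the better description of the structure under this measure.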
The clustering of the tow segments suggests that the species compositions should be
further analysed. One approach could collate a list of segments in each cluster, and
then examine the taxa, as is often done with a conventional cluster analysis of sample
sites. However, recall that we have the notion of the “alternate” network at our
disposal. The alternate of a sites-by-species network is a species-by-sites network –
one in which the nodes represent taxa and the edges indicate which taxa have similar
distributions across sample sites. We can interactively transform our sites-by-species
network into a series of species-by-sites networks representing the species
compositions of some or all of the tow segments.
[Insert Figure 5 about here]
[Insert Figure 6 about here]
Figures 5 and 6 show a series of such species-by-sites networks. The network shown
in Figure 5 represents the full set of tow segments. This network shows a core
structure of commonly observed taxa (Oithona similis, Neogloboquadrina
pachyderma, Fritillaria sp., Ctenocalanus citer, small calanoid copepods, Limacina
sp., and Thysanoessa macrura), with a periphery of those less frequently observed.
The core taxa are more highly connected (have more edges connecting them to other
nodes) than the peripheral taxa. The node colour shows the mean latitude of the taxon.
The peripheral taxa are members of high- and low-latitude communities while the
core taxa have intermediate mean latitudes. The structure of this network is
ambiguous: the core nodes could represent generalist species that are widely
distributed and therefore both commonly observed as well as highly connected.
Alternatively, the core nodes could represent an extensively sampled (and so these
taxa appear to be common) intermediate community that overlaps with the warm- and
cold-water extreme communities (thus giving the high connectivity). This ambiguity
arises because the use of the complete set of tow segments has effectively created a
juxtaposition of the different communities found along the latitudinal transect. Figures
6a to 6e show the species-by-sites network of each of the clusters I–VI; Figure 6f
shows that of those disconnected nodes that were not shown in Figure 2. This
sequence of networks resolves the ambiguity: clearly, the central taxa in Figure 5 are
relatively common at all latitudes, and the remainder of the community composition
varies with latitude. The community compositions of the clusters are similar to those
described by Hunt and Hosie (2005; 2006a; 2006b) and are not described in detail
here.
The structural characteristics of this network raise a number of questions and so
suggest potentially interesting avenues for further exploration. The clusters are not
completely disjoint, but have a few edges connecting them. Is there anything
interesting about the edges that provide this connectivity between the different
communities? In Figure 2, the edges represent shared taxa, and so the edge attributes
could be used to examine which taxa are common to two adjacent communities (the
taxa that link the communities). Alternatively, we might ask which taxa are absent
from inter-cluster edges, as these would be taxa that are present in one community but
not its neighbour. This latter question can be answered using the node and edge
attributes, by counting the number of edges that do not include a particular taxon, and
normalising by the maximum number of edges in which this taxon could potentially
have been found (the number of times that taxon appeared in either or both of the end-
nodes of the inter-cluster edges). We divided the segments from clusters I – III
(Figure 4) into those that were acquired in previously ice-covered waters, and those
that were acquired in areas of open water, and then examined the patterns of taxon
absence on the edges between those two sets of segments (Table 1). The ice-
associated taxa include Antarctic krill Euphausia superba, the association of which
with sea ice is well known and has been the subject of a great deal of research (see
e.g. Nicol, 2006). Although the genus Oikopleura, most likely O. gaussica in the
Southern Ocean (Tokioka, 1961), has a wide circumpolar distribution,
appendicularians in general were more abundant in the open ocean zone on this
transect (Hunt and Hosie, 2005).
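One possible reading of the taxon-absence calculation is sketched below. For each edge between the two sets of segments, a taxon counts as "possible" if it occurs at either end-node and as "absent" if it is not shared by both. This interpretation, the function name, and the toy data are assumptions; the values are invented and are not results from the survey.

```python
from collections import Counter


def absence_fraction(inter_cluster_edges, taxa_at):
    """Fraction of inter-cluster edges from which each taxon is absent."""
    absent, possible = Counter(), Counter()
    for a, b in inter_cluster_edges:
        shared = taxa_at[a] & taxa_at[b]          # taxa carried by the edge
        for taxon in taxa_at[a] | taxa_at[b]:     # taxa at either end-node
            possible[taxon] += 1
            if taxon not in shared:
                absent[taxon] += 1
    return {taxon: absent[taxon] / possible[taxon] for taxon in possible}


# Hypothetical segments: one from ice-covered water, two from open water.
taxa_at = {
    "ice_1": {"taxon_a", "taxon_b"},
    "open_1": {"taxon_b", "taxon_c"},
    "open_2": {"taxon_a", "taxon_b"},
}
edges = [("ice_1", "open_1"), ("ice_1", "open_2")]
fractions = absence_fraction(edges, taxa_at)
```

In this example taxon_b is shared across both edges and scores 0, taxon_c occurs only on the open-water side of one edge and scores 1, and taxon_a scores 0.5. Taxa with high scores are those that tend to be present in one set of segments but not the other.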
[Insert Table 1 about here]
Other single-taxon visualisations offer useful exploratory insights. Figures 7a–d show
the network of Figure 2, but with emphasis on those nodes and edges associated with
the taxa Salpa thompsoni, Pelagobia longicirrata, Themisto gaudichaudii, and
Metridia lucens. Salpa thompsoni (Figure 7a) can be seen to be associated principally
with clusters I – III (the high-latitude segments), but also with a few segments in
cluster VI (samples acquired in the vicinity of the subantarctic front). Salpa thompsoni
is known to be associated with waters to the north of the seasonal ice zone (Pakhomov
et al., 2002), which casts doubt on the veracity of the association with samples from
cluster VI. Salp specimens captured by the CPR are commonly damaged in the
process and can be difficult to identify. We suggest that the samples from cluster VI
are likely to be misidentified specimens, probably of Salpa fusiformis. Pelagobia
longicirrata (Figure 7b) is associated with distinct subsets of clusters II, IV, V and VI.
This is consistent with the known geographic distribution of this polychaete, which is
widespread in the Southern Ocean (Hopkins, 1985; 1987) but fairly uncommon in the
CPR records (a total of 464 specimens have been counted in 19,796 samples). The
locally clustered nature of the distribution of P. longicirrata across the network is
interesting, and suggests that the spatial distribution of this taxon might be similarly
patchy. Themisto gaudichaudii (Figure 7c) is predominantly associated with the
warmer-water clusters V and VI, but with a small number of records in the high-
latitude, ice-zone clusters I and II. Themisto gaudichaudii is notably very abundant in
the polar frontal and subantarctic zones (Bernard and Froneman, 2002; 2003; Donelly
et al., 2006). The distribution of T. gaudichaudii here shows very little overlap with
that of S. thompsoni (Figure 7a). This is in contrast to observations in the North
Atlantic, where juvenile T. gaudichaudii have been observed to be associated with the
salps Pegea bicaudata and Iasis zonaria (Madin and Harbison, 1977). Metridia lucens
(Figure 7d) exhibits a substantial degree of seasonality in its pattern. The species was
absent in most of the southward (outgoing) leg of the voyage (samples shown as
circles) except for a few sites in the sea ice zone and in the far north group E in Figure
2. Metridia lucens was more common in the samples on the return leg north (squares)
taken approximately two weeks later. At this stage we can only speculate on the cause of
the seasonality, but this does demonstrate the value of the methodology in probing the
data and finding unique patterns warranting further investigation.
[Insert Figure 7 about here]
Discussion
The methods described here have strong parallels with other visualisation and
dimension-reduction techniques commonly used with ecological data. Generally, such
methods aim to reduce the dimensionality of high-dimensional data down to two or
three dimensions for graphical display, while preserving as much of the structure of
the data as possible. The application of such methods in ecology is commonly to
species abundance data (i.e. the observed abundances of species at a set of sample
sites), usually with accompanying environmental data relating to the sample sites. The
aim is to visualise the relationships between species assemblage patterns, and relate
these to environmental variables. Methods can be broadly divided into direct and
indirect gradient analysis methods. Indirect methods construct a visualisation on the
basis of the patterns in the species data alone, without reference to the environmental
data. Relating these patterns to the environmental variables is a subsequent step in the
analyses. Such methods include principal components analysis (Hotelling, 1933) and
multidimensional scaling (Kruskal, 1964a; b). Direct (also known as constrained)
methods — such as canonical correspondence analysis (Ter Braak, 1986) — differ in
that the visualisation is constrained to show only the variation in the species data that
can be explained by an a priori selected set of environmental variables. The network
approach that we have described is an indirect method, since the network structure is
determined by a subset of all variables: those variables chosen as “edge” variables.
The remaining variables in the data set form the attributes of the nodes and edges, and
are used in subsequent exploratory phases of the analysis.
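The distinction between "edge" variables and attribute variables can be illustrated with a minimal Python sketch. The function name build_site_network, the min_shared threshold, and the example records are assumptions made purely for illustration; this is not the implementation used to generate the networks shown in the figures.

```python
from itertools import combinations

def build_site_network(records, edge_var, min_shared=1):
    """Link two records (nodes) when they share at least `min_shared`
    values of the chosen edge variable.

    records  : dict mapping node id -> dict of variables
    edge_var : name of the set-valued variable used to form edges
    Returns a dict mapping (node_a, node_b) -> set of shared values.
    """
    edges = {}
    for a, b in combinations(sorted(records), 2):
        shared = records[a][edge_var] & records[b][edge_var]
        if len(shared) >= min_shared:
            edges[(a, b)] = shared
    return edges

# Hypothetical sample sites: "taxa" is the edge variable, while
# "latitude" remains a node attribute for later exploration.
sites = {
    "s1": {"taxa": {"Salpa thompsoni", "Oikopleura sp."}, "latitude": -50.0},
    "s2": {"taxa": {"Oikopleura sp.", "Metridia lucens"}, "latitude": -52.0},
    "s3": {"taxa": {"Euphausia superba"}, "latitude": -64.0},
}
site_edges = build_site_network(sites, "taxa")
```

In this toy example only sites s1 and s2 are linked, and the shared taxon is kept as an attribute of that edge; site s3 remains a disconnected node.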
The network approach is perhaps most closely related to multidimensional scaling
(MDS) and its variants. MDS produces a low-dimensional representation of a
multidimensional data set such that the geometric proximity of any two points in the
low-dimensional representation conveys the degree of similarity between the two
associated data points in the original high-dimensional space. Points that are close
together on the MDS diagram are likely to be similar in terms of their associated data.
MDS uses an objective function (called the “stress”) that relates the geometric
configuration of the points in the diagram to the pairwise dissimilarities in high
dimensional space. High stress values indicate a poor overall match between the two.
Networks, in contrast, do not attempt to show inter-point similarity by geometric
proximity, but rather use edges to explicitly show the relationships between points
(nodes) in the network. There is no stress value associated with a network.
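As a rough sketch of how a stress value summarises the fit of a configuration, the following Python function computes a Kruskal-type stress-1. It makes a simplifying assumption: the dissimilarities are used directly as target distances, whereas non-metric MDS would first apply a monotone regression step. The function name stress1 and the example configurations are hypothetical.

```python
import math
from itertools import combinations

def stress1(points, dissim):
    """Kruskal-type stress-1 of a configuration.

    points : list of coordinate tuples (the low-dimensional layout)
    dissim : dict mapping (i, j) with i < j -> target dissimilarity
    Simplification (assumption): the dissimilarities are used directly
    as the target distances, i.e. no monotone regression step.
    """
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        num += (d - dissim[(i, j)]) ** 2
        den += d ** 2
    return math.sqrt(num / den)

# A 3-4-5 triangle reproduces its dissimilarities exactly.
triangle = [(0, 0), (3, 0), (0, 4)]
triangle_stress = stress1(triangle, {(0, 1): 3, (0, 2): 4, (1, 2): 5})

# Four mutually equidistant points cannot be drawn exactly in a plane.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
equal = {pair: 1.0 for pair in combinations(range(4), 2)}
square_stress = stress1(square, equal)
```

The triangle gives zero stress, whereas the square layout of four equally dissimilar points gives a non-zero value because the two diagonals are necessarily longer than the sides.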
As mentioned previously, the positioning of the nodes in a network can be done
according to a variety of layout algorithms. It is often the case that related entities
(nodes) will tend to be placed near to each other in a layout by virtue of the edges that
connect them. However, in general, the geometric proximity of two nodes in a
network cannot be directly interpreted as an indicator of the degree of relatedness of
the associated data points. It is the edges that convey this information.
The network-based approach is superficially similar to the idea of drawing a
minimum spanning tree on an MDS plot or other ordination diagram, to assist with
interpretation (Gower and Ross, 1969; Gillison, 1978). A spanning tree is a graph,
with no closed loops, that connects all objects in the MDS plot. The cost of a given
spanning tree is the sum of its edge weights (i.e. the sum of the pairwise
dissimilarities represented by the tree), and so a minimum spanning tree (MST) is one
that has the minimum cost of all possible spanning trees for a given plot. Thus, an
MST will tend to connect closely-related points, and can be used to gain an
impression of local ordering of points along a gradient (Austin, 1976). An MST is a
special case of a network, showing only a specific, minimal subset of pairwise
dissimilarities. It can be used as a component of a visualisation algorithm, but not as a
visualisation technique in its own right.
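A minimum spanning tree can be obtained from a matrix of pairwise dissimilarities with a few lines of code. The sketch below uses Prim's algorithm, chosen here for brevity; the function name and the dissimilarity values are invented for illustration.

```python
def minimum_spanning_tree(dissim):
    """Prim's algorithm on a full symmetric dissimilarity matrix
    (list of lists). Returns a list of (i, j, weight) edges."""
    n = len(dissim)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # cheapest edge joining the tree to a point not yet in it
        w, i, j = min(
            (dissim[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j, w))
        in_tree.add(j)
    return edges

# Made-up dissimilarities for four sample sites along a gradient.
D = [
    [0.0, 0.2, 0.6, 0.9],
    [0.2, 0.0, 0.3, 0.7],
    [0.6, 0.3, 0.0, 0.4],
    [0.9, 0.7, 0.4, 0.0],
]
mst = minimum_spanning_tree(D)
mst_cost = sum(w for _, _, w in mst)
```

For these values the tree links the sites in the order 0, 1, 2, 3, which recovers the ordering of the points along the assumed gradient.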
The explicit visual representation of relationships by edges in a network can be useful
with data that are difficult to represent clearly in an MDS diagram — for example,
four or more points in a two-dimensional MDS plot that are equally dissimilar from
each other. While the stress value of an MDS plot gives an indication of the overall fit
to the data, it does not identify individual points or areas of the diagram that are
poorly represented. Assessing the fit of individual points in an MDS plot requires an
additional step, such as calculating the contribution of individual points to the overall
stress. In a network, however, relationships that are poorly represented by the overall
geometric configuration of the nodes can still be identified by the edges in the
network. The drawback is that a network with a large number of edges can be
cluttered and difficult to interpret.
The edges in a network show direct relationships between entities, and so a network
explicitly depicts the local structure within the data, allowing the topology of the
system as a whole to emerge from these individual relationships. This global-from-
local approach has also been used in a number of visualisation algorithms (e.g.
Sammon, 1969; Demartines and Hérault, 1997; Roweis and Saul, 2000). It is also
used in various ecological analytical techniques. One of the long-standing difficulties
in numerical ecology is the robust estimation of large ecological distances (Faith et
al., 1987). The difficulty arises when two objects have little or nothing in common.
These distant relationships can be estimated as a function of intermediate local
relationships (e.g. De'ath, 1999), or given special consideration in other ways (e.g.
Belbin, 1991). Local methods are generally also more robust to outliers in the data.
Outliers tend to have relatively large dissimilarities to other data points and so can
have a disproportionately large effect on the configuration of an MDS. This is often
addressed by selective removal of the outliers or by data transformation. Local
methods, including networks, are more robust against outliers because only the
immediate structure around an outlier is likely to be affected. One of the particular
disadvantages of local approaches is that the data manifold must be sufficiently
densely sampled, in order that neighbouring points in data space really do represent
closely-related entities. Sparse data sets might thus be poorly suited to these
approaches. In the approach presented here, sparse data will tend to form
disconnected networks in which relatively dense subsets of the data form individual
sub-networks.
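The sub-networks formed by sparse data can be identified with a standard connected-components search. The following Python sketch assumes an undirected network given as a list of node identifiers and a list of node pairs; the names used are hypothetical.

```python
def connected_components(nodes, edges):
    """Return the sub-networks of an undirected network as a list of sets."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # depth-first search from a node not yet assigned to a component
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
parts = connected_components(nodes, edges)
```

Here the five nodes fall into two sub-networks, {a, b, c} and {d, e}, which could then be examined separately.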
The explicit characterisation of relationships by edges in a network has potential
advantages for the visual representation of complex data, as noted above, but perhaps
more interestingly, allows subsequent exploration and analyses to focus on the
relationships themselves. Whereas a typical MDS analysis might examine the
attributes of the objects in relation to their geometric configuration (e.g. how the
sample site environmental variables vary with respect to the MDS configuration), a
network additionally allows the attributes of the edges to be examined. The edges in
the network also define its structure: how the different elements of the system relate to
one another. Many graph-theoretic algorithms have been developed for the analysis of
network structures, including clustering, outlier detection, and network traversal.
Visual exploratory analyses are often complemented by the use of algorithms such as
clustering and outlier detection. Applying graph-theoretic versions of such algorithms
can complement the information obtained by applying similar, but non-network
algorithms, to the same data. For example, clustering algorithms attempt to partition a
set of entities into a relatively small number of groups in such a way that entities
within a given group are more similar to each other than they are to entities from other
groups. The groups can then be used as a more succinct representation of the set as a
whole. Measures of similarity are conventionally based on the properties of the
entities of interest – for example, a cluster analysis of sediment samples might be
based on their chemical properties. In a network context, clustering similarly attempts
to create a partitioning of the entities (nodes), but the partitioning is based on the
topology of the network rather than the properties of the individual entities. Clustering
can simplify a complex network, in the same way that it can reduce the cardinality of
large data sets, facilitating the description and understanding of the various
components of the network. Clustering can also assist with visualisation, by reducing
the visual complexity of very large networks (Newman and Girvan, 2004), and by
obtaining more relevant representations of systems for which conventional networks
are insufficient (Estrada and Rodríguez-Velázquez, 2005). There is a diversity of
network clustering algorithms and the reader is referred to e.g. Newman (2003) for an
overview of the field. Some are analogous to clustering methods used with non-
networked data, such as mixture model clustering (Newman and Leicht, 2007), or
hierarchical algorithms that successively merge nodes according to various criteria
(Clauset et al., 2004; Newman, 2004). Others have roots in network theory, including
methods based on graph partitioning (Wu and Leahy, 1993; Shi and Malik, 2000) and
edge betweenness (Newman and Girvan, 2004). Many of the latter have no clear
analogy to non-network algorithms. It is not yet well understood how structural
measures of a network (like betweenness) might relate to ecologically relevant
measures (Proulx et al., 2005), and so it is not clear how the clusterings produced by
such algorithms relate to the processes of the underlying ecosystem. However,
because network-based clustering methods utilise topological characteristics, it seems
likely that network-based methods can provide insights to complement those obtained
from more conventional methods of analysis. We have discussed clustering
algorithms in some detail here, and note that other exploratory algorithms, such as
outlier detection (Shekhar et al., 2001; Noble and Cook, 2003; Rattigan and Jensen,
2005), can be carried out in a network context, and so might similarly offer
complementary insights to conventional analyses.
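To illustrate what it means for a clustering to be based on topology rather than on node properties, the sketch below computes the modularity of a given partition, a quality measure of the kind discussed by Newman and Girvan (2004). It assumes an undirected, unweighted network; the function name and the example network are hypothetical, and no claim is made that this is the procedure used for the clustering in Figure 4.

```python
def modularity(edges, membership):
    """Modularity Q of a partition of an undirected, unweighted network.

    edges      : list of (a, b) node pairs, each edge listed once
    membership : dict mapping node -> cluster label
    Q = sum over clusters of (fraction of edges inside the cluster)
        - (fraction of edge ends attached to the cluster) ** 2
    """
    m = len(edges)
    inside = {}   # number of edges with both ends in the cluster
    ends = {}     # number of edge ends (total degree) in the cluster
    for a, b in edges:
        ca, cb = membership[a], membership[b]
        ends[ca] = ends.get(ca, 0) + 1
        ends[cb] = ends.get(cb, 0) + 1
        if ca == cb:
            inside[ca] = inside.get(ca, 0) + 1
    return sum(
        inside.get(c, 0) / m - (ends[c] / (2 * m)) ** 2 for c in ends
    )

# Two triangles joined by a single bridging edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
lumped = {n: "A" for n in split}
q_split = modularity(edges, split)    # 5/14, about 0.357
q_lumped = modularity(edges, lumped)  # 0.0
```

Splitting the network at the bridging edge scores higher than placing all nodes in one cluster, using only the pattern of edges and no attributes of the nodes themselves.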
Recent applications of network theory to ecosystem analyses suggest that
many natural networks share certain structural characteristics. For example, the node
degree distribution (the distribution of the number of edges per node) of many natural
networks follows an asymmetric distribution, such that there are many nodes with
only a few edges, and only a few nodes with many edges (Allesina and Bodini, 2005).
The node degree distribution of the sites-by-species network of Figure 2 bears little
resemblance to this: it is roughly a skewed normal distribution with a long left tail
(not shown). Is this a genuine difference in network topology, with implications for
the underlying ecology? The interpretation is complicated by the nature of the edges
in our network. In the majority of the well-studied types of ecological networks (e.g.
food webs), the edges represent tangible interactions between entities (in the case of a
food web, trophic interactions). The edges in networks of species by sites, or sites by
species, such as those shown above, are subtly different. These edges represent
species (or site) similarities. Two species with similar distributions across sites thus
might be linked by an edge, but this does not imply that these two species necessarily
directly interact. Edges in such networks should perhaps be interpreted as indicators
only of potential interactions. It is not clear how to interpret topological descriptors
such as node degree distribution that are based on potential – rather than actual –
interactions.
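The node degree distribution referred to above is straightforward to tabulate from a list of edges. The following Python sketch uses a made-up star-shaped network; the function name is an assumption for illustration.

```python
from collections import Counter

def degree_distribution(edges):
    """Map each node degree to the number of nodes having that degree.
    Nodes with no edges do not appear in an edge list and are not counted."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())

# Hypothetical star-like network: one hub and four peripheral nodes.
star = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
dist = degree_distribution(star)
```

The result records four nodes of degree 1 and one node of degree 4, a small-scale version of the asymmetric pattern of many poorly connected nodes and a few highly connected ones.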
Network approaches offer several possibilities for data integration during exploratory
phases of analysis. Modern ecological studies increasingly use online databases for
data management and delivery, with OBIS (http://iobis.org/) and GBIF
(http://www.gbif.org/) being early and widely-known examples. Our algorithm can be
applied directly to data delivered through these services, offering a potential means
for exploring and visualising a user’s own data in the context of other data available
through such databases. Other online resources provide analytical services rather than
data delivery – for example, the food web constructor (http://spire.umbc.edu/fwc/),
and OBIS-SEAMAP habitat modelling services (Best et al., 2007). Many of these
analytical resources fall within the fields of knowledge representation and other
semantic technologies, and so provide information about relationships between
entities. This relationship information could be used to form edges in a network;
indeed, directed graphs are already commonly used to visualise the relationships
described by semantic-web RDF documents (e.g. IsaViz,
http://www.w3.org/2001/11/IsaViz/). The extraction of relationship information from
semantic sources is in development for biological data (Köhler et al., 2006). Networks
and MDS can visually distinguish different types of entity (e.g. by colour or shape).
By varying the visual characteristics of edges, a network can also distinguish different
types of relationship. Edges are sometimes drawn on MDS plots for this reason (e.g.
Starzomski and Srivastava, 2007). Networks can be used to integrate ecological with
other types of data such as economic (Janssen et al., 2006), not merely in visual
syntheses but in analytical integrations that allow the effects of human impacts on
ecosystems to be explored (Dambacher et al., 2003; Fath, 2004; Rooney et al., 2006;
Dambacher and Ramos-Jiliberto, 2007).
Conclusions
The idea of using networks for data visualisation and exploratory analyses is not new,
but its practical application in the ecological sciences seems to be uncommon.
We speculate that this limited uptake might be due to both a general lack of
appreciation of the wide applicability of such methods, and a historical lack of
readily-available software for such analyses. The latter has been addressed by
developments both in general network-analysis software such as GUESS and in
ecology-specific network analysis software (e.g. Allesina and Bondavalli, 2004;
Fath and Borrett, 2006). We hope that the work presented here might go some way
toward addressing the former. While some data sets or applications may have a
natural definition of connectivity that provides an intuitive view of the system, the
algorithm presented here allows the nodes and edges of a network to be formulated
from an arbitrary matrix of data. The analyst can generate networks from different
combinations of variables, and examine the effects of varying parameters such as the
scales of spatial, temporal, and taxonomic aggregation. This flexibility allows the
analyst to rapidly gain a number of perspectives on the data and provides a powerful
mechanism for exploration. Using network approaches in exploratory phases of
analysis may reveal hitherto unsuspected structural properties and so prompt the
analyst to consider the use of network-based approaches in later, more formal phases
of analysis. The fundamental structures and processes of ecological and biological
networks are still being discovered, and network-based methods of visualisation and
exploration both facilitate these discoveries and provide insights that can
complement other methods.
Acknowledgements
This work received support and critical input from a number of people, including Lee
Belbin, Eric Woehler, Jonny Stark, and Victoria Wadley. Comments from John
Leathwick and an anonymous referee improved the manuscript considerably. We
would like to thank K. Takahashi, and the Master and crew of the RV Tangaroa for
collecting the CPR samples, and T. Odate for making available the underway data.
References
Adar, E., 2005. GUESS: The Graph Exploration System.
Albert, R. and Barabási, A.L., 2002. Statistical mechanics of complex networks.
Reviews of Modern Physics, 74:47-97.
Allesina, S. and Bodini, A., 2005. Food web networks: scaling relation revisited.
Ecological Complexity, 2:323-338.
Allesina, S. and Bondavalli, C., 2004. WAND: an ecological network analysis user-
friendly tool. Environmental Modelling and Software, 19:337-340.
Aloy, P. and Russell, R.B., 2004. Taking the mystery out of biological networks.
European Molecular Biology Organization Reports, 5:349-350.
Austin, M., 1976. Performance of four ordination techniques assuming three different
non-linear species response models. Vegetatio, 33:43-49.
Belbin, L., 1991. Semi-strong hybrid scaling, a new ordination algorithm. Journal of
Vegetation Science, 2:491-496.
Berlow, E.L., 1999. Strong effects of weak interactions in ecological communities.
Nature, 398:330-334.
Berlow, E.L., Neutel, A.-M., Cohen, J.E., De Ruiter, P.C., Ebenman, B., Emmerson,
M., Fox, J.W., Jansen, V.A.A., Iwan Jones, J., Kokkoris, G.D., Logofet, D.O.,
McKane, A.J., Montoya, J.M. and Petchey, O., 2004. Interaction strengths in food
webs: issues and opportunities. Journal of Animal Ecology, 73:585-598.
Bernard, K.S. and Froneman, P.W., 2002. Mesozooplankton community structure in
the Southern Ocean upstream of the Prince Edward Islands. Polar Biology, 25:597-
604.
Bernard, K.S. and Froneman, P.W., 2003. Mesozooplankton community structure and
grazing impact in the Polar Frontal Zone of the south Indian Ocean during the austral
autumn 2002. Polar Biology, 26:268-275.
Best, B.D., Halpin, P.N., Fujioka, E., Read, A.J., Qian, S.S., Hazen, L.J. and Schick,
R.S., 2007. Geospatial web services within a scientific workflow: predicting marine
mammal habitats in a dynamic environment. Ecological Informatics, 2:210-223.
Brose, U., Berlow, E.L. and Martinez, N.D., 2005. From food webs to ecological
networks: linking non-linear trophic interactions with nutrient competition. In: P.C. de
Ruiter, V. Wolters and J.C. Moore (Editors), Dynamic Food Webs: Multispecies
Assemblages, Ecosystem Development and Environmental Change. Academic Press,
pp. 27-36.
Cavalieri, D., Parkinson, C., Gloersen, P. and Zwally, H.J., 1996, updated 2006. Sea
ice concentrations from Nimbus-7 SMMR and DMSP SSM/I passive microwave data.
Boulder, Colorado USA: National Snow and Ice Data Center. Digital media.
Christensen, V. and Pauly, D., 1992. Ecopath II: a software for balancing steady-state
ecosystem models and calculating network characteristics. Ecological Modelling,
61:169-185.
Clauset, A., Newman, M.E.J. and Moore, C., 2004. Finding community structure in
very large networks. Physical Review E, 70.
Dale, M.B., 2000. On plexus representation of dissimilarities. Community Ecology,
1:43-56.
Dambacher, J.M., Luh, H.-K., Li, H.W. and Rossignol, P.A., 2003. Qualitative
Stability and Ambiguity in Model Ecosystems. The American Naturalist, 161:876-
888.
Dambacher, J.M. and Ramos-Jiliberto, R., 2007. Understanding and predicting effects
of modified interactions through a qualitative analysis of community structure.
Quarterly Review of Biology, in press.
De'ath, G., 1999. Extended dissimilarity: a method of robust estimation of ecological
distances from high beta-diversity data. Plant Ecology, 144:191-199.
Demartines, P. and Hérault, J., 1997. Curvilinear components analysis: a self-
organizing neural network for nonlinear mapping of data sets. IEEE Transactions on
Neural Networks, 8:148-154.
Donelly, J., Sutton, T.T. and Torres, J.J., 2006. Distribution and abundance of
micronekton and macrozooplankton in the NW Weddell Sea: relation to a spring ice-
edge bloom. Polar Biology, 29:280-293.
Eicken, H., 1992. The role of sea ice in structuring Antarctic ecosystems. Polar
Biology, 12:3-13.
Estrada, E. and Rodríguez-Velázquez, J.A., 2005. Complex Networks as
Hypergraphs. eprint arXiv:physics/0505137.
Fahrig, L. and Merriam, G., 1985. Habitat patch connectivity and population survival.
Ecology, 66:1762-1768.
Faith, D.P., Minchin, P.R. and Belbin, L., 1987. Compositional dissimilarity as a
robust measure of ecological distance. Vegetatio, 69:57-68.
Fath, B.D., 2004. Network analysis in perspective: comments on "WAND: an
ecological network analysis user-friendly tool". Environmental Modelling &
Software, 19:341-343.
Fath, B.D. and Borrett, S.R., 2006. A MATLAB function for network environ
analysis. Environmental Modelling & Software, 21:375-405.
Fath, B.D. and Patten, B.C., 1999. Review of the foundations of network environ
analysis. Ecosystems, 2:167-179.
Frick, A., Ludwig, A. and Mehldau, H., 1995. A Fast Adaptive Layout Algorithm for
Undirected Graphs. Lecture Notes in Computer Science, Proceedings of the Graph
Drawing Conference 1994, 894:388-403.
Gillison, A.N., 1978. Minimum spanning ordination — a graphic-analytical technique
for three-dimensional ordination display. Austral Ecology, 3:233-238.
Gower, J.C. and Ross, G.J.S., 1969. Minimum spanning trees and single linkage
cluster analysis. Applied Statistics, 18:54-64.
Green, J.L., Hastings, A., Arzberger, P., Ayala, F.J., Cottingham, K.L. et al., 2005.
Complexity in ecology and conservation: mathematical, statistical, and computational
challenges. BioScience, 55:501-510.
Herman, I., Melançon, G. and Marshall, S.M., 2000. Graph Visualization and
Navigation in Information Visualization: a Survey. IEEE Transactions on
Visualization and Computer Graphics, 6:24-44.
Hopkins, T.L., 1985. Food web of an Antarctic midwater ecosystem. Marine Biology,
89:197-212.
Hopkins, T.L., 1987. Midwater food web in McMurdo Sound, Ross Sea, Antarctica.
Marine Biology, 96:93-106.
Hosie, G.W., Fukuchi, M. and Kawaguchi, S., 2003. Development of the southern
ocean continuous plankton recorder survey. Progress in Oceanography, 58:263-283.
Hotelling, H., 1933. Analysis of a complex of statistical variables into principal
components. Journal of Educational Psychology, 24:417-441,498-520.
Hunt, B.P.V. and Hosie, G.W., 2003. The continuous plankton recorder in the
Southern Ocean: a comparative analysis of zooplankton communities sampled by the
CPR and vertical net hauls along 140°E. Journal of Plankton Research, 25:1561-1579.
Hunt, B.P.V. and Hosie, G.W., 2005. Zonal structure of zooplankton communities in
the Southern Ocean south of Australia: results from a 2150km continuous plankton
recorder transect. Deep-Sea Research I, 52:1241-1271.
Hunt, B.P.V. and Hosie, G.W., 2006a. The seasonal succession of zooplankton in the
Southern Ocean south of Australia, part I: The seasonal ice zone. Deep-Sea Research
I, 53:1182-1202.
Hunt, B.P.V. and Hosie, G.W., 2006b. The seasonal succession of zooplankton in the
Southern Ocean south of Australia, part II: The Sub-Antarctic to Polar Frontal Zones.
Deep-Sea Research I, 53:1203-1223.
Janssen, M.A., Bodin, Ö., Anderies, J.M., Elmqvist, T., Ernstson, H., McAllister,
R.R.J., Olsson, P. and Ryan, P., 2006. A network perspective on the resilience of
social-ecological systems. Ecology and Society, 11:15.
Jeger, M.J., Pautasso, M., Holdenrieder, O. and Shaw, M.W., 2007. Modelling disease
spread and control in networks: implications for plant sciences. New Phytologist,
174:279-297.
Jordán, F., 2001. Strong threads and weak chains? - a graph theoretical estimation of
the power of indirect effects. Community Ecology, 2:17-20.
Jordán, F. and Scheuring, I., 2004. Network ecology: topological constraints on
ecosystem dynamics. Physics of Life Reviews, 1:139-172.
Kamada, T. and Kawai, S., 1989. An algorithm for drawing general undirected
graphs. Information Processing Letters, 31:7-15.
Köhler, J., Philippi, S., Specht, M. and Rüeg, A., 2006. Ontology based text indexing
and querying for the semantic web. Knowledge-Based Systems, 19:744-754.
Kruskal, J.B., 1964a. Multidimensional scaling by optimizing goodness of fit to a
nonmetric hypothesis. Psychometrika, 29:1-27.
Kruskal, J.B., 1964b. Nonmetric multidimensional scaling: a numerical method.
Psychometrika, 29:115-129.
Lázaro, A., Mark, S. and Olesen, J.M., 2005. Bird-made fruit orchards in northern
Europe: nestedness and network properties. Oikos, 110:321-329.
Lizotte, M.P., 2001. The contributions of sea ice algae to Antarctic marine primary
production. American Zoologist, 41:57-73.
Madin, L.P. and Harbison, G.R., 1977. The associations of Amphipoda Hyperiidea
with gelatinous zooplankton. I. Associations with Salpidae. Deep Sea Research,
24:449-463.
Matthews, J.A., 1978. An Application of Non-Metric Multidimensional Scaling to the
Construction of an Improved Species Plexus. Journal of Ecology, 66:157-173.
McIntosh, R.P., 1973. Matrix and plexus techniques. In: R.H. Whittaker (Editor),
Ordination and Classification of Communities. Junk, The Hague, pp. 159-191.
Memmott, J., 1999. The structure of a plant-pollinator food web. Ecology Letters,
2:276-280.
Newman, M.E.J., 2002. The spread of epidemic disease on networks. Phys. Rev. E,
66:016128.
Newman, M.E.J., 2003. The structure and function of complex networks. SIAM
Review, 45:167-256.
Newman, M.E.J., 2004. Fast algorithm for detecting community structure in networks.
Physical Review E, 69.
Newman, M.E.J. and Girvan, M., 2004. Finding and evaluating community structure
in networks. Phys. Rev. E, 69:026113.
Newman, M.E.J. and Leicht, E.A., 2007. Mixture models and exploratory data
analysis in networks. Proc Nat Acad Sci USA, in press.
Nicol, S., 2006. Krill, currents, and sea ice: Euphausia superba and its changing
environment. BioScience, 56:111-120.
Noble, C.C. and Cook, D.J., 2003. Graph-based anomaly detection. In: L. Getoor,
T.E. Senator, P. Domingos and C. Faloutsos (Editors). ACM, Washington, DC, USA,
pp. 631-636.
Orsi, A., Whitworth, T., III and Nowlin, W.D., Jr, 1995. On the meridional extent and
fronts of the Antarctic Circumpolar Current. Deep-Sea Research, 42:641-673.
Paine, R.T., 1984. Ecological determinism in the competition for space. Ecology,
65:1339-1348.
Pakhomov, E.A., Froneman, P.W. and Perissinotto, R., 2002. Salp/krill interactions in
the Southern Ocean: spatial segregation and implications for the carbon flux. Deep-
Sea Research I, 49:1881-1907.
Pimm, S.L., 1982. Food Webs. Chapman and Hall, London.
Proulx, S.R., Promislow, D.E.L. and Phillips, P.C., 2005. Network thinking in
ecology and evolution. TRENDS in Ecology and Evolution, 20:345-353.
Rattigan, M.J. and Jensen, D., 2005. The Case for Anomalous Link Detection. In: S.
Džeroski and H. Blockeel (Editors), Chicago.
Rhodes, M., Wardell-Johnson, G.W., Rhodes, M.P. and Raymond, B., 2006. Applying
network theory to the conservation of habitat trees in urban environments: a case
study from Brisbane, Australia. Conservation Biology, 20:861-870.
Rohr, J.R., Kerby, J.L. and Sih, A., 2006. Community ecology as a framework for
predicting contaminant effects. TRENDS in Ecology and Evolution, 21:606-613.
Rooney, N., McCann, K., Gellner, G. and Moore, J.C., 2006. Structural asymmetry
and the stability of diverse food webs. Nature, 442:265-269.
Roweis, S.T. and Saul, L.K., 2000. Nonlinear dimensionality reduction by locally
linear embedding. Science, 290:2323-2326.
Sammon, J.W., 1969. A nonlinear mapping for data structure analysis. IEEE
Transactions on Computers, 18:401-409.
Shekhar, S., Lu, C.T. and Zhang, P., 2001. Detecting graph-based spatial outliers:
algorithms and applications (a summary of results). In: F. Provost and R. Srikant
(Editors), Proceedings of the Seventh ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pp. 371-376.
Shi, J. and Malik, J., 2000. Normalized cuts and image segmentation. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 22:888-905.
Shirley, M.D.F. and Rushton, S.P., 2005. The impacts of network topology on disease
spread. Ecological Complexity, 2:287-299.
Starzomski, B.M. and Srivastava, D.S., 2007. Landscape geometry determines
community response to disturbance. Oikos, 116:690-699.
Ter Braak, C.J.F., 1986. Canonical Correspondence Analysis: a new eigenvector
technique for multivariate direct gradient analysis. Ecology, 67:1167-1179.
Tokioka, T., 1961. Appendicularians of the Japanese Antarctic Research Expedition.
Bulletin of Marine Biological Station of Asamushi, 5:241-245.
Ulanowicz, R.E., 1986. Growth and development: ecosystem phenomenology.
Springer-Verlag, New York, New York, USA.
Urban, D. and Keitt, T., 2001. Landscape connectivity: a graph-theoretic perspective.
Ecology, 82:1205-1218.
Watts, D.J. and Strogatz, S.H., 1998. Collective dynamics of 'small-world' networks.
Nature, 393:440-442.
Whittaker, R.H. and Warren Fairbanks, C., 1958. A Study of Plankton Copepod
Communities in the Columbia Basin, Southeastern Washington. Ecology, 39:46-65.
Williams, J.W.J., 1964. Algorithm 232 - Heapsort. Communications of the ACM,
7:347-348.
Wu, Z. and Leahy, R., 1993. An optimal graph theoretic approach to data clustering:
theory and application to image segmentation. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 15:1101-1113.
Figure and table captions
Figure 1. The continuous plankton recorder transect (solid line) in relation to land
masses, oceanic fronts (as defined by Orsi et al., 1995), and sea ice extent. The
outward and return legs of the voyage overlap. STF=subtropical front;
SAF=subantarctic front; PF=polar front; SACCF=southern Antarctic circumpolar
current front; SB=southern boundary of the Antarctic circumpolar current; Ice=mean
maximum October sea ice extent.
Figure 2. A network of the continuous plankton recorder data, in which nodes
represent tow segments (sample sites) and the edges indicate sites with common
species. The dashed line indicates the temporal progression of the ship track. The end
of the southward leg is indicated by the white star. The colours of the nodes indicate
the latitude of the segment (see scale). The labels A–E provide references to features
that are discussed in the text.
Figure 3. The network of Figure 2, with node colours changed to reflect the number of
days since sea ice melt. White nodes indicate segments taken in open ocean (no sea
ice present over the preceding winter).
Figure 4. (a) The network of Figure 2, after clustering. Segments within the same
cluster are shown with the same colour and node shape; (b) a schematic representation
of the same network, in which the nodes within each cluster have been merged.
Figure 5. Plankton community network, in which nodes represent taxa and edges
indicate taxa with similar spatio-temporal distributions. The network has been
constructed from the full set of tow segments.
Figure 6. Plankton community networks. Each is similar to that shown in Figure 5,
but generated from a subset of tow segments. The networks (a)-(f) correspond to
clusters I-VI in Figure 4; network (g) corresponds to disconnected nodes not visible in
Figure 2. The layout of the nodes is the same as in Figure 4. The taxa in the centre of
the network shown in Figure 5 are common to all clusters, and the remainder of the
community composition varies across the clusters.
Figure 7. The network of Figure 2, showing with dark grey those nodes and edges
specifically associated with the taxa (a) Salpa thompsoni, (b) Pelagobia longicirrata,
(c) Themisto gaudichaudii, and (d) Metridia lucens.
Table 1. Taxa from the segments of clusters I – III of Figure 4. These taxa were
typically absent from edges that paired an ice-zone segment with an open-water
segment, suggesting that these taxa might distinguish the ice-zone community from
the open-water community in this region.
Table 1
Taxa in ice-zone segments    Fraction of edges  Total   Taxa in open water segments  Fraction of edges  Total
but not open water segments  from which taxon   count   but not ice-zone segments    from which taxon   count
                             was absent                                              was absent
Pelagobia longicirrata       1.0                11      Oikopleura sp.               0.90               32
Copepod indet.               0.90               36
Euphausia superba            0.73               33
Rhincalanus gigas nauplius   0.64               52