getting started in social network analysis with netdraw · 70-year pedigree), and because data...

Getting Started in Social Network Analysis with

NETDRAW

Bruce Cronin

University of Greenwich Business School

Occasional Paper 01/15

January 2015

2

Author contact: [email protected]

University of Greenwich, a charity and company limited by guarantee, registered in England (reg no.

986729). Registered Office: Old Royal Naval College, Park Row, Greenwich SE10 9LS.

3

Getting Started in Social Network Analysis with NETDRAW

Bruce Cronin

NETDRAW is one of the most accessible and undoubtedly the most widely used

software for social network visualisation. Bundled with UCINET, the most widely

used social network analysis software (Borgatti, Everett and Freeman 2002), its

popularity arises from its embrace of the Microsoft Windows environment and the

regular inclusion of new analytical techniques in the field as they are developed.

While UCINET provides extensive tools for comprehensive network analysis,

NETDRAW has powerful analytical capacity in its own right and is freely available.

For exploratory network analysis, NETDRAW is a perfectly appropriate tool on its

own.

Despite its popularity, however, the software is also very frustrating at times for new

users. It has very limited documentation of its features, assuming a grasp of

mathematically challenging concepts as its frame of reference, has some

idiosyncrasies that appear insurmountable without experience and its outputs are

sometimes difficult to interpret.

This paper provides a brief step-by-step guide to allow a new user to undertake a

simple social network analysis yet yield fairly sophisticated results. It commences

with a brief introduction to social networks, moves to issues in data collection,

preparation for analysis and data import. Simple visualisation is followed by

extraction of the main component and the calculation of node centrality metrics.

Instructions on command sequence given in this guide assume the default options are

unchanged. Defaults normally only need to be changed when there is a clear reason to

do so.

The placement and availability of commands varies slightly from one version of

NETDRAW to the next and it is frequently updated. The descriptions in this guide are

drawn from Version 2.139.

4

1. Introduction to Social Networks

There has been a long-standing distinction in organisational studies between the

formal, mandated, organisation and the informal, spontaneous, organisation necessary

to get things done. Figures 1 and 2 contrast the formal organisational structure of a

firm with the informal structure of relationships seen by staff of the same firm to be

important to their work.

Figure 1. Formal Organisation

© Dr Bruce Cronin 2006

Formal Organisation

Barry

CEO

CorporateKim

John, David, Gwenda,

Hilary, Dave

Product 1Joan

Fred, Manuel,

Nick, Simon, Will, Tamison,

Dwain, Eric, Erica, Alfred,

Becky, George, Sue, Eve,

Ian, Elizabeth, Rob

Product 2Chris

Albert, James, Mike,

Ron, Peter, Paul,

Thomas, Tom, Martin,

Sam, Timothy, Carol,

Claire, Lucy, Susan,

Wendy, Angela, Ian,

Gary, Nicholas, Pat,

Maureen, Tony

Figure 2. Informal Organisation


Informal Organisation

With whom do

you discuss

issues

important to

your work?

The representation of the informal relationships in the organisation is the result of a

social network analysis of data collected from individuals. Social Network Analysis

(SNA) is a set of techniques for identifying and representing patterns of interaction

among social entities, be it individuals, groups, organisations or social artefacts. It

provides precise and specific insight in place of intuition and general hunches.

Social network analysis is predominantly employs graphical techniques, an

application of the mathematics of graph theory. The approach employs four principal

tools, the first two illustrated in Figure 2:

social entities are represented as points, each known as a ‘node’ or ‘vertex’;

relationships are represented by lines, known as ‘ties, ‘edges’ or ‘arcs’.

5

It is also possible to represent:

the strength of the relationship, for example, by line width;

attributes of the nodes, for example, by different colours or size.

Relationships are distinguished in social network analysis principally by their

directionality and value. Directional relationships, formally represented as ‘arcs’,

involve a transfer from one node to another; providing information or advice, for

example. Non-directional relationships, formally represented as ‘edges’, comprise a

sharing; members of the same organisation, for example. Relationships may be valued

in terms of frequency of contact or subjective evaluation, for example.

2. Data Collection

The most common method for data collection is by questionnaire. Members of a

group of interest are listed on a roster and then each member is asked to assess their

relationship with each of the other members on some dimension. Such questionnaires

may be administered by personal interview or by self-completed questionnaire, often

postal or, increasingly, web-based. Unlike traditional probability based approaches to

social research, however, network analysis is highly sensitive to the response rate, so

intrusive methods of data collection are favoured. There is a corresponding overhead

in terms of negotiating access to sufficient numbers of respondents and in terms of

ethical considerations.

An alternative to questionnaire-based approaches is observation. These include

ethnography, where a researcher joins a group and observes interactions; expert panel

studies, where highly sensitive participants such as business managers are asked to

systematically reflect on what are normally hunches in this regard; and experiment,

where some information is introduced to one part of a group and the group is watched

to see where the information subsequently emerges. A recent development in

observational methodology has been the emergence of ‘smart’ email scanning

techniques to identify areas of expertise within an organisation and internal or

external contacts members may have.

A high response rate is important because the research centres on particular

relationships among a series of social entities. Whereas in traditional quantitative

approaches, individual respondents are often representative of broad trends and so

relatively unimportant in themselves, in network research single relationships may

have particular importance that would greatly distort findings if missed. There are,

however, emerging statistical techniques to test for missing data, involving the

comparison of network data against what would be expected to be found randomly.

Because social network analysis is a reasonably novel research technique (despite its

70-year pedigree), and because data collection is often necessarily intrusive,

negotiating access to respondents is typically more involved than in traditional social

science research. In particular, comprehensive buy-in is needed from the principal

decision-makers. There have been numerous cases where data collection has been

disrupted by the late intervention by a senior decision-maker who was not sufficiently

briefed during the establishment of a project.

6

Social network analysis also raises a particular range of ethical challenges, as Borgatti

& Molina (2003) discuss. Firstly, unlike much of traditional social science research, it

is not possible to respondent anonymity because specific personal relationships are

the object of study; we are interested in the relationship between ‘Tom’ and ‘Joan’,

rather than the relationship between any ‘X’ and ‘Y’. Secondly, respondents may cite

relationships with third parties who may not wish to participate in the study. While

respondents are reporting their own perception, they are also reporting shared

information – the fact that a relationship exists. Thirdly, it is difficult to offer

confidentiality as participants may deduce the identity of individuals from a network

position, even when presented anonymously in a sociogram. Lastly, it is obtaining

informed consent is difficult because the technique is new and participants probably

will not fully anticipate the implications of the information they provide.

Having addressed these challenges, the resulting data comprises a list of nodes, which

of these are related and normally some attributes about the nodes and relations. Small

amounts of data can be stored in a matrix or spreadsheet table, as in Figure 3. But larger

amounts are more usefully stored as a series of related node pairs, or ‘edge list’, as in

Figure 4. Both forms can be conceived as a matrix of node-to-node relationships.

In the first row of Figure 3 and the first two lines of Figure 4, Barry is reporting a

relationship with Kim and with David. Note that John is reporting a relationship with

Kim, while Kim is not reporting a relationship with John, indicating a directional

relationship.

Figure 3. Adjacency Matrix


Data Collection – The Relational Matrix

01100Gwenda

10001David

00011John

00001Kim

01010Barry

GwendaDavidJohnKimBarry

Who

With whom

Figure 4. Edge List

Barry Kim

Barry David

Kim Barry

John Barry

John Kim

David Barry

David Gwenda

Gwenda John

Gwenda David

7

3. Importing Data

The most convenient way to store network data is as an edgelist, or list of node pairs,

in a spreadsheet. A spreadsheet can hold a great deal of data; the 32-bit version of

Excel 2013 can hold 1048576 rows in a single worksheet; the 64-bit version is limited

only by RAM capacity. For larger datasets, Access 2013 can hold an edgelist in a

table of 2GB, or 10GB in association with the SQL Server or Sharepoint extension.

An edgelist can be imported to NETDRAW as a .dl formatted file. This involves the

insertion of header information (as in the first 5 rows of Figure 7) then saving the file

as a plain (space delimited) text file (as in Figure 8). This creates a file with a .prn

extension; it is clearer to rename this in Internet Explorer with a .dl extension (by

default, Windows Explorer hides known extensions; this can be changed via the

Options menu in Windows Explorer).

Figure 7. Preparing Edge List in Spreadsheet for Export in DL File Format

Figure 8. Save as Formatted Text (Space delimited) (*.prn)

8

To open a .dl file in NETDRAW, click on the third icon, a plain open folder, as

illustrated in Figure 9. Then open a Windows browser using the icon to the right

of the dialogue box. Select the .dl file to open and click ‘OK’.

Figure 9. Opening a .DL File in Netdraw

4. Network Visualisation

On opening the file, NETDRAW automatically generates a visualisation of the data

using default options. This uses a standard algorithm to push the most connected

nodes to the centre of the screen and the least connected nodes to the periphery, as in

Figure 10.

Figure 10. Network visualisation



With whom do

you discuss

issues

important to

your work?

.

.

.

1

2

3

4

9

The visualisation can be saved for further editing as a .vna file, the native format of

NETDRAW, using the command:

File\Save Data as

The image itself can be saved as an image file for import into Word or other

applications. Available formats are jpeg, bitmap or Windows metafile, the latter more

versatile for rescaling. The image can be created using the command:

File\Save Diagram as

5. Extracting the Main Component

Extensive datasets typically involve many small independent clusters around larger

ones and one particularly large and dense cluster, as in Figure 11. Each of these

separate, internally connected clusters is known as a ‘component’. The largest cluster

is the ‘main’ or ‘giant’ component.

Figure 11. Multiple Components


Components

A maximal connected sub-graph

• all points on a path but no path outside the sub-graph

Cronin & Popov (2005)

To identify the distinct components, use the command:

Analysis/Components

To select a single component, in the command panel on the right of the screen, select

the Nodes tab and then ‘Components’ from the drop-down list, as illustrated in Figure

12. Individual or multiple components can be selected by checking the checkboxes.

To select the main component, select ‘Component Size’ from the drop-down list and

click on the largest number. Checking the -999 box will selects isolated nodes.

Once a component is selected, this subset of nodes can be saved as a separate network

file, using the command.

File/Save As

10

Figure 12. Selecting Components

6. Centrality of Nodes

A network visualisation invites the question of which are the most important nodes in

the network. Standard algorithms locate the most central nodes in a network in the

centre of the visualisation. But the mathematics of graph theory provides the means to

answer such questions more precisely.

Figure 13. Network Visualisation Example



With whom do

you discuss

issues

important to

your work?

Attributing ‘importance’ to the position of a node in a social network implicitly

employs a theory of social interaction, normally a variant of the theory of social

capital. Coleman (1990) and Putnam (1993, 2001), for example, argue that the social

resources available to an individual, such as information and status, are strongly

11

determined by the extent to which that individual is at the centre of a cohesive

network of relationships. In Figure 13, nodes such as Nick, John, Sam and Chris

would appear to be central in terms of cohesiveness. Social network analysis provides

three principal ways of measuring the extent to which a node is at the centre of a

cohesive network: degree centrality, eigenvector centrality and closeness.

Another view of social capital, however, challenges the notion that individuals at the

centre of cohesive groups have the greatest access to valuable resources. Burt (1993)

argues that the most important position in a network is one that bridges otherwise less

connected parts of the network, a position of brokerage. In Figure 13, Nick, John and

Sam are still important in these terms but Manuel also enters the picture. Again, social

network analysis provides a means of measuring the position of nodes in these terms:

betweenness centrality.

NETDRAW provides a simple command to calculate these metrics for each node.

Note. These measures are only meaningful when carried out on a single component

(typically the main component); it makes little sense to talk about the centrality of a

node if the nodes are drawn from unconnected components.

To calculate the centrality of each node on a range of measures, use the command:

Analysis/Centrality measures

A dialogue box (See Figure 14) invites you to select which centrality measures to

calculate but in reality all are calculated. Leave the defaults unchanged unless you

have a particular reason to do so.

Figure 14. Analyse / Centrality Measures

The calculated metrics are stored against each node as ‘Attributes’. To view the

metrics for a particular node, right click on the node and select attributes (See Figure

15). The attributes of the node, including the metrics, are displayed as in Figure 16.

12

Figure 15. Menu Displayed from Right-click on a Node

Figure 16. Node Attributes

6.1 Degree Centrality

The most intuitive measure of centrality is the number of direct connections a node

has to other adjacent nodes. The number of connections is known as the ‘degree’ of a

node. In Figure 16, the node Lucy has a degree centrality of 2, that is, Lucy is

connected to 2 other nodes (Sue and Manuel). The most central node in a network in

these terms is the node with the highest degree, that is, the greatest number of

connections. This is an indicator of ‘popularity’. In Figure 17, the size of each node

varies by its degree centrality. This can be visualised using the command:

Properties/Nodes/Symbols/Size/Attribute-based (Set the Select Attribute field to

‘Degree’)

13

Figure 17. Node Size by Degree Centrality

6.2 Eigenvector Centrality

Degree centrality is a limited interpretation of social importance. Consider John and

Sam in Figure 17. They have potential social influence or power because they are

closely related to nodes with high degree centrality (Nick and Chris). A method of

identifying such nodes is Eigenvector Centrality (Bonacich 1972), which weights

degree centrality by the degree centrality of the nodes a node is connected to (see

Figure 18).

Figure 18. Node Size by Eigenvector Centrality

Rob

Elizabeth

Barry

Kim

David

Joan

Chris

John

NickSam

Gwenda

Eve

Manuel

Thomas

Lucy

Gary

Eric

Sue

Will

Dwain

Erica

Alfred

BeckyGeorge

Ian

Dave

Paul

Maureen

James

Ron

Susan

Timothy

Tom

Carol

Tony

Rob

Elizabeth

Barry

Kim

David

Joan

Chris

John

NickSam

Gwenda

Eve

Manuel

Thomas

Lucy

Gary

Eric

Sue

Will

Dwain

Erica

Alfred

Becky

George

Ian

Dave

Paul

Maureen

James

Ron

Susan

Timothy

Tom

Carol

Tony

14

6.3 Closeness Centrality

An alternative measure of centrality in a cohesive group to degree or eigenvector

centrality is closeness, which does take into account the position of all other nodes in

the network. Closeness can be best conceived in terms of its reciprocal, farness, which

is the sum of the shortest path distance, that is, the number of steps, from one node to

each other. Figure 19, extracted from Figure 13, presents the farness for each node.

Tom has the greatest farness. From the reciprocal, Susan has the greatest closeness.

Figure 19. Farness / Closeness

Tom 1+ 2+ 2 = 5

Susan 1+1+1 = 3

Gary 1+1+2 = 4

Ron 1+1+2 = 4

The closeness centrality of each node is also captured by the Analysis/Centrality

measures command in NETDRAW. Note that a programme error shows Farness

rather than centrality. To show Closeness, check the ‘Reverse values’ option in the

dialogue box when selecting the attribute to set node size by.

Figure 20.Node Size by Closeness Centrality

Susan

Tom

Gary

Ron

Rob

Elizabeth

Barry

Kim

David

Joan

Chris

John

NickSam

Gwenda

Eve

Manuel

Thomas

Lucy

Gary

Eric

Sue

Will

Dwain

Erica

Alfred

Becky

George

Ian

Dave

Paul

Maureen

James

Ron

Susan

Timothy

Tom

Carol

Tony

15

6.4 Betweenness Centrality

In contrast to the measures of centrality within a cohesive group discussed above,

betweenness centrality identifies nodes that bridge less connected parts of the

network. The betweenness of a node is the proportion of times a node appears on the

shortest paths between each other pair of nodes. In Figure 21, Nick has the highest

betweenness.

Figure 21. Node Size by Betweenness Centrality

7. Network Cohesiveness

While individual nodes in a network may have positions of particular influence, the

network structure as a whole also provides particular opportunities and limitations for

all nodes. Some parts of the network are more connected than others and these regions

may be more influential than others or in other respects perhaps more constrained

than others.

7.1 k-core Analysis

k-core analysis identifies parts of a network that are more connected than others. k

refers to the number of immediate connections a node has. A k-core of 1 refers to all

nodes that have a degree of 1 or more, ie. all nodes in the network. A k-core of 2

refers to the subset of all nodes that have a degree of 2 or more, etc. Figure 22

presents three k-cores. Checking and unchecking the checkboxes in the control panel

on the right allows the identification of different concentrations of connectivity in the

network.

Rob

Elizabeth

Barry

Kim

David

Joan

Chris

John

NickSam

Gwenda

Eve

Manuel

Thomas

Lucy

Gary

Eric

Sue

Will

Dwain

Erica

Alfred

Becky

George

Ian

Dave

Paul

Maureen

James

Ron

Susan

Timothy

Tom

Carol

Tony

16

Figure 22. k-core analysis

7.2 Sub-group Analysis

Network structure can also be analysed in terms of nodes that are more closely related

to each other than other nodes, ie. as clusters or communities. A popular algorithm for

the demarcation of community structures is the Girvan-Newman (2002) algorithm.

Figure 23 illustrates the application of this algorithm to distinguish 2 distinctive

groups. Accept the defaults unless there is a specific reason not to.

Figure 23. Sub-Group Detection using the Girvan-Newman Algorithm

17

8. Enhancing Visualisations

Network visualisation aims to provide a meaningful visual representation of a

network dataset. How this is represented depends on the particular algorithm used and

the output media. The visualisation will differ with different screen sizes and

resolutions, for example.

NETDRAW has a wide range of editing tools to allow the user to configure the

visualisation to better convey the meaning of the data. Any node can be moved to any

position by clicking on it and dragging it to another location. The size, colour, shape

and labelling of any individual node or groups of nodes can be adjusted. Nodes and

links can even be added or deleted. To adjust the properties of nodes, click on each

node of interest and use the command:

Properties\Nodes

Figure 24. Changing the Colour of a Selected Node

In Figure 24, the selected node 3 is highlighted. A new colour for the node can be

applied with the command:

Properties\Nodes\Colour\General

To deselect a node or group of nodes, click anywhere on the main drawing area.

Where no single or group of nodes is selected, the Properties command will make

changes to all nodes. Where attributes have been associated with nodes, changes can

be made to nodes according to their attribute.

Of course, it is the ethical responsibility of researchers to ensure that this editing is

only undertaken for the purpose of improving the communication of the meaning of

the data. The best way to ensure this is to begin with standard algorithms to generate

the network visualisation and then to minimise editing strictly to the goal of clarifying

meaning. For example, it may help to separate two nodes that are drawn in the same

location so that both are visible, or to move a node slightly so that a label becomes

visible.

18

8.1 NETDRAW Filters

NETDRAW has some tools to allow the selection or isolation of nodes on the basis of

general properties as below.

Iso deletes isolated nodes from the visualisation

Pen deletes pendants (nodes with a degree of 1) from the visualisation

Self deletes self-loops from the visualisation

MC displays only the main component

Ego displays the ego net control box.

^Del deletes non-active nodes and lines

~Del restores all deleted nodes

8.2 NETDRAW Graphic Controls

The first icon opens the attribute node colour control, allowing different colours to be

associated with different node attributes.

The second icon opens the attribute node shape control, allowing different shapes to

be associated with different node attributes.

The third icon opens the Label editor, allowing the label of each node to be edited.

The numeric box reports the label size in points, which can be adjusted with the

adjacent up and down arrows. The adjacent up and down arrows change the size of

the nodes,

L toggles labels off or on

toggles arrowheads on or off

1.4 toggles line width on or off

The last icon allows the insertion of an additional node by clicking on the

visualisation.

19

8.3 The NETDRAW Control Panel

Relations Tab

Where the relations have been transformed within

NETDRAW, each visualisation is listed in the Relations box

on the Rels tab.

Clicking on the Dn or Up button displays each visualisation

in turn. It is thus possible to cycle through a series of related

networks for comparative purposes. The All button selects

all visualisations and Cl deselects these.

The colour and width of all displayed ties between nodes can

be adjusted with the colour drop-down box beneath the Dn

button and the Size input box to the right of this.

20

8.4 The NETDRAW Egonet Control

The Egonet control is activated by the Ego icon. This displays a control window

listing all nodes in the network. Clicking on the step button in this control window

displays each node in turn, together with all nodes adjacent to this. The distance input

box allows the setting of the path distance (number of steps) to include in the egonet.

21

8.5. Adding Attributes to a Visualisation in NETDRAW

A common application of a network visualisation is to identify different nodes within

a network on the basis of some attributes. For example, male and female employees or

different work-groups may be distinguished.

Attribute information is imported to an existing visualisation, that is, a network must

already be visualised before the attribute information is added. With the network

visualisation open, the attribute information is added simply by opening a file with

this information in it.

The attribute file has as many rows as the network has nodes and at least two

columns, the second being the attribute associated with the node. Multiple attributes

can be included as multiple columns. Note. Attribute information must be numeric.

For small datasets, attribute data can be pasted into the Node Attribute Editor via the

command:

Transform/Node attribute editor

For larger datasets, attributes can be imported via a dl file. This needs to be in the

edgelist2 format. Any number of attributes can be included for each node but each

attribute needs to be listed on a separate line.

dl

nr = 3, nc = 2

format = edgelist2

labels embedded

data:

FirmAmarket 1

FirmAregion 0

FirmB market 1

FirmBregion 0

FirmCmarket 1

FirmCregion 2

In this example, there are three nodes (nr = 3) and two attributes (nc=2). The labels

for the nodes and attributes are drawn from the data. Each row lists the node, then the

attribute, then the value of the attribute.

22

References

Bonacich P. 1972. Factoring and weighting approaches to status scores and clique

identification. Journal of Mathematical Sociology 2, 113-120.

Borgatti, S. P. 2002. NetDraw: Graph visualization software. Harvard: Analytic

Technologies.

Borgatti, S. P., Everett, M.G. and Freeman, L.C. 2002. Ucinet for Windows: software

for social network analysis. Harvard, MA: Analytic Technologies.

Borgatti, S.P. and Molina, J-L. 2003. Ethical and strategic issues in organizational

network analysis. Journal of Applied Behavioral Science 39(3): 337-350.

Burt, R.S. 1992. Structural holes: The social structure of competition. Cambridge:

Harvard University Press.

Coleman, J. S. 1990. Foundations of social theory. Cambridge, MA, Belknap Press.

Cronin, B. and V. Popov, V. 2005. Director networks and UK corporate performance.

International Journal of Knowledge, Culture and Change Management 4, 1195-

1205.

Girvan M. and Newman M. E. J., 2002. Community structure in social and biological

networks. Proceedings of the National Academy of Sciences 99, 7821–7826.

Gould, J. and Fernandez, J. 1989. Structures of mediation: A formal approach to

brokerage in transaction networks. Sociological Methodology :89-126.

Krackhardt, D. 1999. The ties that torture: Simmelian tie analysis in organizations.

Research in the Sociology of Organizations 16, 183-210.

Putnam, R. D. 1993. Making democracy work: Civic traditions in modern Italy.

Princeton, NJ, Princeton University Press.

Putnam, R. D. 2001. Bowling alone: the collapse and revival of American community.

London, Simon & Schuster.

Seidman, S. 1983. Network structures and minimum degree. Social Networks 5(2),

269-287.

getting started in social network analysis with netdraw · 70-year pedigree), and because data...

Documents