manynets: an interface for multiple network analysis and …ben/papers/freire2010manynets.pdf ·...

ManyNets An Interface forMultiple Network Analysis and Visualization

12Manuel Freire 2Catherine Plaisant 2Ben Shneiderman 2Jen Golbeck1Universidad Autonoma de Madrid 2University of Maryland

28049 Madrid Spain College Park MD 20742manuelfreireuames plaisantbengolbeckcsumdedu

ABSTRACTTraditional network analysis tools support analysts in study-ing a single network ManyNets offers these analysts a pow-erful new approach that enables them to work on multiplenetworks simultaneously Several thousand networks canbe presented as rows in a tabular visualization and then in-spected sorted and filtered according to their attributes Thenetworks to be displayed can be obtained by subdivisionof larger networks Examples of meaningful subdivisionsused by analysts include ego networks community extrac-tion and time-based slices Cell visualizations and interac-tive column overviews allow analysts to assess the distribu-tion of attributes within particular sets of networks Detailssuch as traditional node-link diagrams are available on de-mand We describe a case study analyzing a social networkgeared towards film recommendations by means of decom-position A small usability study provides feedback on theuse of the interface on a set of tasks issued from the casestudy

Author KeywordsNetwork analysis Exploratory analysis Table interface In-teraction Information Visualization Graphical User Inter-face

ACM Classification KeywordsH52 Information Interfaces and Presentation Miscellaneous

General TermsTheory Human Factors Experimentation

INTRODUCTIONThe field of Social Network Analysis (SNA) has recentlygained visibility as social networking sites have increasedin relevance numbers and participants Sites with millionsof users are now commonplace Analyzing these networks

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page To copy otherwise orrepublish to post on servers or to redistribute to lists requires prior specificpermission andor a feeCHI 2010 April 1015 2010 Atlanta Georgia USACopyright 2010 ACM 978-1-60558-929-91004$1000

helps our understanding of how people interact and commu-nicate Additionally insights into how users interact withinthese systems are important to diagnose and improve themand to develop better systems in the future

Exploratory analysis of large networks such as those foundin SNA often starts with visual overviews Whole-networkoverviews such as those generated with node-link diagramsor matrix representations are hard to interpret especially inthe case of large dense networks Comparing these overviewsto gain insights regarding a set of networks is even more dif-ficult

The need to visualize multiple networks at once arises natu-rally in many situations For instance analysts may wish tocompare social networks for county managers in 3140 UScounties to gain insights into the strength of their collabora-tions Subdivision of larger networks is also an importantsource of multiple networks for instance [24] describesthe subdivision of an evolving social network by temporalslices which can then be examined to locate temporal pat-terns or regions and periods change In biological networksthe distribution of ldquomotifsrdquo (small patterns of connectivity)has been suggested as an indicator of their functional signifi-cance [23] More generally analysts can subdivide networksinto closely-knit communities or clusters and in Social Net-work Analysis the use of ego-networks to look at individualneighborhoods is well-established Indeed going one stepfurther analysts may need to compare groups of networksThey may want to discover if the ego-networks of one socialnetwork are similar to those of a different social networkand whether they exhibit similar temporal characteristics

The main contribution of this paper is our approach to net-work analysis it is the first approach that attempts to vi-sualize many networks (up to several thousands) at onceWe represent these groups of networks in a table whereeach row represents a single network generally a part of alarger network that has previously been split Configurablecolumns contain user-defined statistics Typical columns in-clude link count degree distribution or clustering coeffi-cient In this sense each row of this interface representsthe fingerprint of a network the combination of its cell val-ues for each of the currently-displayed attribute columnsThe use of a table allows easy comparisons between rows

CHI 2010 Visualization April 10ndash15 2010 Atlanta GA USA

213

Figure 1 ManyNets displaying a time-sliced cell-phone call network Each row represents the network of calls in a 5-hour period with a 50overlap with the previous row (10 rows cover an entire day) Rows with more than 40 connected components have been selected by dragging on theldquoComponent countrdquo column summary histogram and are highlighted with a light-green background in the table Each column summary highlightsin bright green those values that correspond to currently-selected rows In the left-most column summary equally-spaced highlights reveal a temporalpattern This synthetic dataset is from the VAST 2008 Challenge [16] and contains 400 nodes and 9834 links

and columns and can benefit from focus+context techniquessuch as those found in TableLens [30] Our table visualiza-tion is tightly integrated with a node-link diagram networkvisualization tool SocialAction [28] allowing on-demandnetwork inspection A secondary contribution is in the gen-eral field of table-based interfaces in the form of ldquocolumnsummariesrdquo These are special cells placed on top of thecolumn headers that provide an abstract of the contents oftheir column The summaries also support direct user inter-action and reflect application state by highlighting valuesthat correspond to currently-selected rows

The next section presents our approach and describes the in-terface in general terms using a temporal network datasetfrom the VAST 2008 Challenge [16] as a guiding exam-ple After a section on related work we describe an in-depth case-study using data from the FilmTrust social net-work [14] Finally we present the results of usability studywith 7 participants

DESIGN OF MANYNETSWe follow the Visual Information Seeking Mantra [33] tointroduce the ManyNets approach to network visualizationThe first step of this mantra calls for Overview first Analystscan access two distinct types of overviews First rows them-

selves act as overviews of the networks that they representScalar values such as vertex (node) count and edge (link)count are represented with horizontal bars When there isa distribution of values within each network such as in thenode degree column the distribution is shown as a histogramin the corresponding cell For example in Figure 1 the his-tograms that represent the distributions of phone call dura-tions can be seen to be roughly bell-shaped On top of eachcolumn name column summaries provide the second type ofoverview Each column summary summarizes the contentsof an entire column In this sense the set of column sum-maries is an overview of all the networks at once Withinsummaries we use miniature histograms to represent dis-tributions of values for both scalar values and distributionsFor example in Figure 1 the column summary histogramfor call duration shows a much smoother bell curve that con-siders all the distribution values in each network In the samefigure a histogram also shows the distribution of scalar val-ues like Edge Count over all networks

Analysts can zoom and filter (the second step of the Mantra)collections of networks in several ways Columns can beresized by direct manipulation and filtered by hiding themusing the ldquocolumn controlrdquo (displayed as a small button ontop of the scrollbar) Sorting can be seen as a special typeof filtering All columns even those that contain distribu-


214

Figure 2 Details on demand for a degree distribution histogram In allvisualizations in ManyNets hovering the pointer over a value displaysa tooltip

tions can be sorted on and additional ldquomultiple-columnrdquosorting orders can be used In a semantical sense it is possi-ble to ldquozoomrdquo into a given row or set of rows by slicing thenetwork further revealing finer-grained details In the caseof the dataset of Figure 1 it would be interesting to subdi-vide the network further into node neighborhoods (ldquoego net-worksrdquo) This would allow us to focus on the neighborhoodof node 200 (a caller that is present in many of the slicesbut not directly visible in the table) described in the originaldataset as an important lead (see [16]) Analysts can filterout uninteresting networks by selecting rows to be removedor retained Selections can be specified by clicking on thecorresponding rows through interaction with column sum-maries or by specifying a custom filter We expect analyststo be proficient in the use of spreadsheets the expressionsthat are used in column filters are similar in complexity tothose used in spreadsheet formulas In Figure 1 the ex-pression Column[rsquoComponent countrsquo] gt 40 results in thecurrently-displayed selection

Finally details on demand are available by ldquomousingrdquo (hov-ering the mouse pointer for a few seconds without clicking)over any part of the interface this will display a small tool-tip describing the value or values under the pointer Largermore detailed views of cell contents or column summariescan be obtained by right-clicking on them this will displaya pop-up window with a larger view and controls that al-low manipulation of view settings (Figure 2) Left-clickingany cell selects the corresponding row and displays its con-tents in a detail panel The same detail panel keeps track ofthe currently-selected rows Last but not least ManyNets istightly integrated with SocialAction and any row or set ofrows can be opened in SocialAction for further inspectionas a node-link diagram SocialActionrsquos interface providesits own facilities for visualizing network overviews zoom-ing and filtering and providing details on demand (see [28]for details) For instance SocialAction can use colors tohighlight node or links based on attribute values suppliedby ManyNets and can perform interactive filtering based onthese same attributes

Figure 3 Selecting statistics Choices correspond to columns displayedin Figure 1

Selecting and Adding ColumnsWe now review selected ManyNets features in greater detailBasic network statistics such as link and node counts linkdensity and component counts are calculated and displayedby default More computationally expensive statistics (egnetwork diameter) are only added if the user explicitly re-quests them using the dialog shown in Figure 3 Some ofthese statistics return distributions instead of scalar valuesdistributions are displayed as miniature histograms embed-ded in the main table

There can be four sources for columns Topology-basedstatistics are those that can be extracted by traversing the net-work without any additional domain knowledge Domain-dependent link attributes and node attributes define two ad-ditional sources In Figure 3 the Tower attribute refers to theparticular cell-phone tower from which the call was madeSince these attributes are bound to links or nodes they nat-urally result in distributions that will be represented as his-tograms The final source for statistics is the user columnscan be defined by entering expressions in Python For in-stance the edge-vertex ratio (ratio of links to nodes) can beadded using Column[rsquoEdge countrsquo] Column[rsquoVertex

countrsquo] This results in a column indistinguishable from thedefault ldquoEdge-vertexrdquo ratio column (see Figure 4) Differenttypes of references to columns can be inserted by using suit-able keywords Among others VarColumn[rsquocolrsquo] insertsthe variance of a cell in a column that contains distributionsand MaxColumn[rsquocolrsquo] inserts the maximum A similar in-terface can be used to specify Python expressions for filtersselections and sorting-column definitions

Column SummariesThe column summaries visible on top of each column ofFigure 1 provide an overview of the values of their columnsFor instance in Figure 1 the distribution of phone call dura-


215

Figure 4 Adding a calculated column using Python Abundant exam-ples are available to users lowering the effort of writing these expres-sions Column references are substituted by their values before theyare evaluated They can be inserted using the drop-down combo boxesor typed in directly

tions can be seen to follow a bell-shaped distribution with aminimum of -145 seconds (these ldquoerrorsrdquo were present in thedataset itself) and a maximum of 2171 Column summariesdisplay the minimum and maximum column values undera histogram representing the distribution of values withinthe column avoiding the need to query the summary (viatool-tips) or sort on the column This is especially usefulwhen the number of rows is large Histograms are used torepresent column overviews in the case of columns that al-ready contain histograms in their cells histograms of his-tograms are generated by aggregating the corresponding dis-tributions

Summaries also provide context-dependent information whenrows are selected selecting a set of rows highlights in allcolumn summaries the contribution that networks representedby these rows makes towards the overall distribution of val-ues In Figure 1 all slices with more than 40 connected com-ponents have been selected (green background) A tempo-ral pattern can be observed in the column summary of sliceIDs the highlighted values (also green) are spaced regularlythroughout the summary histogram (which uses the same or-der as the ID column) Not only do row selections affect col-umn summaries ndash it is also possible to select rows by inter-acting directly with a corresponding summary When drag-ging (clicking and holding while moving) the mouse pointeracross any of these column summary histograms all rowsthat have values in the corresponding value range will be-come selected This can be used to quickly test for possibledependencies between statistics and was the method used toselect rows in Figure 1

Ranking ColumnsSorting by a column is an expected feature in table-basedinterfaces When the number of networks (and therefore ofrows) is large visually scanning for values is cumbersomeand provides no guarantees of not having missed the tar-gets Sorting does not have these drawbacks However itis unclear how best to sort on a column that contains dis-tributions We use the distribution average by default butit is possible to use different sorting criteria by adding ad-

Figure 5 Using a multiple-criteria sorting column (right) In this caserows have been sorted by node count (red bar) and link density (greenbar) The status bar (bottom of figure) displays the steps used to obtainthe current set of networks

ditional columns to the table To sort rows by the maxi-mum degree the user would add a computed column us-ing the expression max(Column[rsquoDegreersquo]) and sort thisrecently-added column instead of the original To sort onthe difference between the maximum and the minimum de-gree the expression would be max(Column[rsquoDegreersquo]) -

min(Column[rsquoDegreersquo])

ManyNets also allows users to sort by several attributes atonce (columns or user-defined expressions) For instancewe may wish to find networks that are both large and denseSorting first by size and then by density will not work whenboth attributes are not correlated We address this concernby allowing users to add ldquomulti-criteria sorting columnsrdquoUsers can create these columns in a similar way to customcolumns instead of being asked for one expression to cal-culate they have the option to specify several expressionsThe resulting column displays the normalized ldquoscorerdquo foreach expression in a stacked bar (see Figure 5) The use ofthis additional column-type presents several advantages overinserting a conventional Python expression column There isvisual feedback on the degree to which each criterion has in-fluenced the total sorting score and scores are normalizedso that users do not need to take into account the differentvalue ranges for each criterion Additionally it is possibleto adjust the relative weights of each criterion by using themouse pointer to ldquostretchrdquo or ldquoshrinkrdquo any of the score-barsThe effects of such adjustments are interactively propagatedto all other bars

Generating Sets of NetworksSo far we have presented the interface from the point ofview of a fixed set of networks There are several ways toarrive at a set of networks to analyze The most straight-forward is to load the networks directly there are many sit-uations where users need to analyze and compare numbersof independent networks (eg citation networks in differ-ent scientific domains or internal communication in differ-ent companies) It is also possible to decompose a singlenetwork into a set of networks Finally subdividing a set ofnetworks into yet more networks also makes sense For ex-ample time-slicing several temporal networks would allow


216

the temporal patterns of each to be compared We proposefour methods to derive interesting sets of networks from aprevious set ego-based clustering attributes and feature-based subdivisions

Ego networks are due to their significance in Social Net-work Analysis an important case The definition of ego net-works (a network centered on an individual) allows two de-grees of freedom the radius around the central individualand whether or not connections between non-central neigh-bors should be included Both are accepted as a parameterswhen splitting networks with ManyNets the default is tosplit by radius 1 including neighbor-neighbor links This isoften referred to as ldquoradius 15rdquo Notice that any single nodewill in general appear in multiple networks once as a cen-tral node and once again per connected edge Likewise alledges will appear in exactly two networks Sets of derivednetworks in ManyNets frequently share nodes and edges

Clustering-based subdivisions take into account the regionsof strong connectivity found in the network There is consid-erable literature on graph clustering and community-findingalgorithms (for instance see [37] or [25]) We have imple-mented network subdivision into connected components ar-guably the simplest of clustering algorithms and intend toadd further algorithms in future revisions

In attribute-based subdivision the network is divided ac-cording to the value of an attribute each resulting subnetcontains only a particular value or value-range for the cho-sen attribute If a network has a temporal dimension as isfrequent in social networks it is possible to ldquoslicerdquo the net-work along the time axis adopting the terminology of [24]Slices can be instantaneous capturing a snapshot of the net-work at a given moment in time or they can span time-windows (the value-range scenario) containing the unionof all the slices within that time-window Additionally itis possible to generate the slices so that they partially over-lap Overlapping slices provide context for evaluating smallchanges which would not be present with thinner and non-overlapping slices A set of time-sliced networks sorted bytime allows temporal patterns to be located with the tool asillustrated in Figure 1

Local feature-based subdivisions attempt to extract all theinstances of a network feature For instance it is possibleto build collections of all the connected dyads or triads ina given graph these are a particular case of so-called ldquonet-work motifsrdquo In the context of biological networks motifdistribution has been proposed as an indicator of functionalsignificance [23] Currently ManyNets only supports the ex-traction of triads and dyads In the future we may want toimitate a motif-specification interface similar to that foundin [31]

RELATED WORKMany systems have been developed to aid with social net-work analysis and network analysis in general Henry andFekete [18] broadly classify them into those that provideframeworks that need to be extended via programming (such

as JUNG[27] GraphViz [12] and Prefuse [17]) and thosethat can be directly used by means of graphical user inter-faces (for instance Pajek [6] UCInet [7] Tulip [3] or So-cialAction [28]) Some systems fall in-between such asGUESS [2] and GGobi [36] GUESS allows access to ahost of filtering and highlighting capabilities through built-in Python scripting and GGobi can interface with the Rstatistical package The above tools rely almost solely onnode-link diagrams Matrix representations are less usedbut better for performing certain tasks on dense networks[13] An interface that is explicitly designed to use bothvisualizations is described in [18] When comparing gen-eral network structure however both matrix and node-linkrepresentations make poor overviews there are too manydegrees of freedom and even two structurally identical net-works are likely to result in very different representations Acanonical representation based on network connectivity thataddresses this problem is presented in [5]

Efforts to make large dense node-link diagrams simpler byabstracting the parts of the network furthest from a user-defined focus can be found in [40] and [38] among othersExploratory analysis is then driven by a top-down method-ology drilling into areas in order to access details Howeverthis approach makes it difficult to spot low-level patterns thatmay be distributed throughout the network they would behidden under layers of abstractions

The use of several statistics to represent a network or a sub-net is frequent throughout the literature An example of aldquonetwork fingerprintrdquo can be found in [39] in this case fin-gerprints refer to a small node-link diagram of an ego net-work and bar-charts for in-degree and out-degree during auser-defined time window Additionally canonical represen-tations such as the portraits proposed in [5] can be used inthe same fashion In [8] Brandes makes a case for the use-fulness of combining different statistics to better understandindividual networks Leskovec [21] and Kumar [20] use dif-ferent types of plots to compare a handful of social networksand their temporal evolution to one another The collectionof plots for a single network can be seen as a ldquofingerprintrdquoand has been used as inspiration in ManyNets We have notfound any prior art where the network fingerprints are ar-ranged in a table for easy comparison and sorting Howevertables of subnetworks that correspond to a query without ad-ditional attributes are used in [11] Outside of the field ofnetwork analysis the Hierarchical Clustering Explorer [32]presents a powerful interface for visualizing relationshipsbetween the multiple attributes (columns) of a large set ofelements (rows) several features of HCE will be added tofuture versions of ManyNets (see section on Future Work)

Tables are known for their information density and muchresearch has gone into building better table presentationsor dealing with screen-size limitations well-known exam-ples are TableLens [30] and InfoZoom [35] This researchis gradually finding its way into mainstream applicationsFor instance recent versions of Microsoft Excel can dis-play scalars with horizontal bars The column summariesfound in ManyNets are similar in concept to the rows of Info-


217

Zoomrsquos ldquoOverview Moderdquo (InfoZoom uses rows for dimen-sions instead of columns) Within column summaries ouruse of overlays to highlight the values of currently-selectedrows was partly inspired by [10] The spreadsheet metaphoris another well-known success story in Human-Computer In-teraction It allows users to easily extend an analysis by al-lowing any cell to build on the contents of other cells in thissense we allow columns to build on the contents of exist-ing columns Most spreadsheet systems limit cell contentsto single elements generally numbers or strings Severalresearch systems have gone beyond these limitations allow-ing display and computation with complex elements as cellvalues [29 9] In [9] different thumbnail-sized views of asingle evolving network are displayed as node-link diagramsin individual cells A different approach is that of NodeXL[1] where a commercial spreadsheet has been extended toprovide single-network visualization directly exposing ta-bles of nodes and edges to users as spreadsheet pages

Figure 6 Overview of the FilmTrust network (described below) us-ing SocialAction Directed links are colored according to their ldquotrustrdquoattributes with red highest and blue lowest

CASE STUDY FILMTRUSTTrust networks are social networks augmented with trust in-formation Links are directed and labeled with the trust thatthe source user has for the sink user Trust can be binaryor scored on a discrete or continuous scale These networkshave been used in the computing literature in two ways Firsttrust inference is an important problem when two users arenot directly connected trust inference algorithms approx-imate a trust value between them (see [15] for a survey)Secondly trust has been used to improve the ways users in-teract with information It is a tool for personalization andhas been applied to generating movie recommendations [1426] to assigning posting permissions in online communities[22] to finding safe mountain ski routes [4] to improvingthe performance of P2P systems [19] and to sorting and fil-tering blog posts [22]

For this analysis we used the trust network that underliesthe FilmTrust website [14] FilmTrust is an online socialnetwork where users rate and review movies and also ratehow much they trust their friends about movies Trust isassigned on an integer scale from 1 (low trust) to 10 (hightrust) Friendship and trust are not required to be mutualrelationships user A can add user B as a friend without Brsquosapproval or reciprocation Trust ratings are also asymmetricusersrsquo trust values for one another are independent and canbe very different The FilmTrust network has been onlinesince 2005 and contains 666 nodes and 1396 links with onegiant component and 69 much smaller components Figure6 is an overview of the network generated by loading it intoManyNets and then displaying it with SocialAction A fewinteresting patterns are visible but there is no way to analyzethe dense central cluster without decomposing it into smallernetworks

AnalysisThe analysis was performed by one of the authors with along experience in the domain of trust networks It was splitinto two sessions of 45 and 3 hours during which she keptcareful logs of her observations documenting 10 distinct hy-potheses in the first analysis session and 5 in the secondwhich was more focused

First she examined the node-link diagram The next stepwas to split the network into all ldquo15rdquo ego networks (eachcontaining a center node its immediate neighbors and alllinks from center to neighbors or between neighbors) andexamine the results within the tool A look at trust distribu-tions reveals a high rate of ldquomaximumrdquo values and generallylow rates of low-to-mid values as predicted by previous re-search in ratings When sorting the ego networks by size sheobserved that large ego networks are overrepresented amongthe low ID numbers (see highlights in the ID column sum-mary in Figure 7) this is to be expected and is often referredto as ldquopreferential attachmentrdquo

Initially our expert relied mainly on sorting and filtering tobrowse for insights in distributions and to look for ldquointer-estingrdquo networks For instance user 345 (second-to-lastrow of Figure 7) stands out with connections to 16 neigh-bors all of them strangers to each other and the minimumtrust value assigned to each of these neighbors After sortingby trust and filtering out the smallest ego networks it be-came apparent that few ego-networks of more than 5 nodeshad high trust values throughout this prompted a hypoth-esis does trust tend to be symmetric between users Thisis most easily observed in cliques which can be located byfiltering by edge density All 15 ego-network cliques wereof size 2 (dyads) except for 2 triplets Examining the trustdifferences showed very similar trust values in these littlepairs However by starting with ego-networks only pairs ofusers disconnected from the main component were shownthis did not represent other dyads present in the graph Forthe next analysis session we added the possibility to split anetwork by dyads and triads and a mechanism to visualizemany network-rows simultaneously within SocialAction


218

Figure 7 The ldquo15rdquo ego networks in the FilmTrust dataset ranked by size The 20 largest have been selected and the pop-up window is displaying alarger version of the summary column for the trust attribute Notice the different distribution in trust among the selected rows as compared with theglobal distribution The summary for the ID column shows that the selected rows are unevenly distributed due to preferential attachment

The first step in the second analysis session was a compari-son of trust in isolated pairs with trust in all pairs of nodesin the network While trust assigned by isolated pairs (fromego networks) follows the same distribution as trust assignedamong all pairs isolated pairs seem to assign fewer mid-dle range values An interpretation is that participants eithertrust each otherrsquos criteria regarding films or they do not withlittle need for middle ground The same experiment com-paring 3-vertex ego networks to all triplets yields strongerresults isolated groups of three tend to have significantlymore trust with each other than do groups of 3 embeddedwithin larger components (see Figure 8) This lead to a sim-ilar question about how trust was distributed in tightly clus-tered groups Indeed there appears to be no correlation be-tween trust and edge density except when the group is iso-lated By selecting the top-20 largest ego-networks (Figure7) the trust distribution can be seen to be significantly lower(especially regarding high ratings) than the general distribu-tion

A second hypothesis was to test the ldquotransitivityrdquo of trust IfA trusts B and C very highly trust between B and C can beexpected to be equally high In this analysis she noted theneed for sorting by distribution ldquosimilarityrdquo prompting usto develop of additional statistics for user-defined columns(such as trust variance) Using a combination of the tableview and SocialAction visualization she found that trust didnot follow any sort of transitivity Trust values between Band C in this example varied widely and independently fromthe values assigned by A

Figure 8 A triplet with asymmetric trust values Red arrows indicatehigh trust blue arrows indicate low trust

OutcomeOur analysis of the FilmTrust network began with two ma-jor questions that follow from conventional wisdom Firstsince trust (in this particular domain) reflects a similarity intastes do nodes trusted by the same people trust one an-other Secondly and a related point does trust increase instrongly connected sub-groups As we pursued answers tothese questions we made several additional insights Wefound that conventional wisdom does not hold in this trustnetwork Two nodes that are trusted by a person do notnecessarily trust one another (see figure 8 for an example)


219

Secondly tightly clustered groups do not have higher lev-els of trust than the general population within these groupstrust tends to follow the same distribution as it does overallThese results have significant implications for research intotrust networks and will be directly applicable to improvingtrust inference algorithms building trust network simulatorsand designing better trust-based interfaces

In a matter of hours our trust network expert was able tolearn to use the tool answer dozens of questions and makesignificant discoveries in a dataset she was already familiarwith all insights described in this paper were previously un-known to her While results of case studies are difficult togeneralize we believe in the spirit of [34] that they are avaluable tool when evaluating complex analytic tools suchas ManyNets

USABILITY STUDYWe carried out a pilot usability study with 7 participants toidentify opportunities for improvement All had computerscience background but no previous network analysis ex-perience Our goal was to understand the use of the inter-face by novice users and to guide the development of train-ing materials for future users We asked our participants tothink aloud while performing actions and to describe anyproblems as they encountered them Before attempting theactual tasks participants were asked to watch an 8-minutevideo describing the problem domain and the general use ofthe tool Participants were then asked to familiarize them-selves with the tool by performing a set of common opera-tions (split a network rank columns and write a filter) andwere provided with reference documentation that they coulduse at any time during the actual tasks

Our tasks were based on those that our analyst performedwhen studying the FilmTrust dataset Using this same datasetparticipants were asked to

1 Find a pair of nodes that have rated each other with thelargest difference in their trust values and estimate howmany such pairs can be found in the whole network

2 Find the densest 15 ego network with more than 3 nodes

3 Find the 15 ego network with more than 3 nodes that hasthe lowest average trust among the nodes

4 Find a group of three nodes (not necessarily an ego net-work) with at least one high trust rating (gt 8) and at leastone low trust (lt 3) rating

5 Describe any difference in the the distribution of trust val-ues in 15 ego networks with 4 nodes compared to the dis-tribution among all 15 ego nets

6 Find a group of at least three nodes where one node hasgiven a rating gt 5 to a neighbor and that neighborrsquos recip-rocal rating is lt 5

ResultsParticipants required around 30 minutes of hands-on experi-ence before becoming proficient with the tool as evidenced

Figure 9 Closeup of the details-on-demand panel shown after selectingthe first two rows of by dragging on their trust values

by faster task execution and greater feeling of control whenfaced with the last tasks This prompted us to extend ourinitial training and underlines the need for providing real-world examples of tool use in training materials Early par-ticipants also requested more context on the significance ofour choice of tasks for real-world analysis a motivation sec-tion was provided to later participants

Participants chose different ways of accomplishing the sametasks For instance some participants preferred to filter net-works into different views and then work with the filteredviews while others added user-defined columns and rankedon the new columnrsquos values Once they found a satisfyingapproach participants tried to extend it to any further taskseven though alternatives would have proven more effectiveThis suggests that future training materials should exposeusers to different styles of analysis highlighting the mosteffective methods for different tasks

Several usability issues were identified Most participantspreferred the use of a details-on-demand panel that was al-ways visible (Figure 9) to the pop-up details-on-demand di-alog (Figure 2) which had to be manually dismissed Asa result newer versions of ManyNets merge both featuresinto an enhanced details-on-demand panel Participants alsorequested the possibility to drag-select rows on the (larger)column overviews found in the details-on-demand panel Fi-nally the expression syntax now includes better error report-ing and multi-criteria columns are now more discoverable

Three participants asked for the possibility of accessing theindividual components of networks In particular for task 6it seemed more natural to access the attributes of individualnodes by querying for them directly instead of having todeal with a distribution of values as seen from a networkperspective Addressing this concern would require us toallow direct access to the underlying data which could againbe represented as a table of nodes or edges similar to thoseused in NodeXL [1]


220

FUTURE WORK AND CONCLUSIONOne of the most natural behaviors when confronted with aset of elements is to look for similarities and differences Itwould be useful to calculate and display intra-group sim-ilarity based on user-defined criteria (for instance graphedit distance) Graphically displaying similarities (as seen inHCE [32]) would allow identification of clusters of relatednetworks and provide interesting overviews of their internalstructure Clusters could then be used as criteria for networksubdivision allowing additional observations to be made

We plan to add support for bipartite networks this would al-low us to compare the trust values in the FilmTrust networkwith the actual ratings that the users gave to the films in thesystemrsquos database Exposing film ratings together with trustvalues may help to understand the way trust is assigned Ad-ditionally our analysis were done considering only a recentsnapshot of the network Using timestamp-data may providefurther insights

We have described ManyNets an interface to visualize setsof networks and presented a case study where the interfacewas used to make new discoveries in the FilmTrust networkdataset This case study demonstrated the effectiveness ofour interface in a representative Social Network Analysistask leading to insights and suggesting new research av-enues within trust networks The usability study has identi-fied several areas for improvement and suggested additionalfeatures Many of the comments received during both stud-ies are finding their way into ManyNets We believe thatManyNets opens new ways of analyzing large social net-works and other collections of complex networks

ACKNOWLEDGMENTSThe first author is supported by a MECFulbright Scholar-ship (Reference No 2008-0306) Partial support for thisresearch was provided by Lockheed Martin Corporation Weparticularly thank Brian Dennis for providing motivating sce-narios for ManyNets and feedback on our designs and CodyDunne and John Guerra for technical insights Finally wewish to thank the anonymous reviewers for their comments

REFERENCES1 Nodexl websitehttpwwwcodeplexcomNodeXLLast visited Sep 2009

2 E Adar Guess a language and interface for graphexploration In CHI rsquo06 Proceedings of the SIGCHIconference on Human Factors in computing systemspages 791ndash800 New York NY USA 2006 ACM

3 D Auber Tulip mdash A huge graph visualizationframework In M Junger and P Mutzel editors GraphDrawing Software Mathematics and Visualizationpages 105ndash126 Springer-Verlag 2004

4 P Avesani P Massa and R Tiella A trust-enhancedrecommender system application Moleskiing In SACrsquo05 Proceedings of the 2005 ACM symposium onApplied computing pages 1589ndash1593 New York NYUSA 2005 ACM

5 J P Bagrow E M Bollt J D Skufca and D benAvraham Portraits of complex networks EPL81(6)68004 Mar 2008

6 V Batagelj and A Mrvar Pajek-Program for LargeNetwork Analysis Connections 21(2)47ndash57 1998

7 S Borgatti M Everett and L Freeman Ucinet forwindows Software for social network analysisHarvard Analytic Technologies 2002

8 U Brandes P Kenis and D Wagner Communicatingcentrality in policy network drawings IEEE Trans VisComput Graph 9(2)241ndash253 2003

9 E H Chi and S K Card Sensemaking of evolvingweb sites using visualization spreadsheets In ProcIEEE Symposium on Information Visualization (Info Visrsquo99) pages 18ndash25 142 1999

10 H Dawkes L A Tweedie and B Spence Vicki thevisualisation construction kit In AVI rsquo96 Proceedingsof the workshop on Advanced visual interfaces pages257ndash259 New York NY USA 1996 ACM

11 P Durand L Labarre A Meil J-L Divo1Y Vandenbrouck A Viari and J Wojcik Genolink agraph-based querying and browsing system forinvestigating the function of genes and proteins BMCBioinformatics Jan 17 2006

12 J Ellson E Gansner L Koutsofios S North andG Woodhull Graphviz-open source graph drawingtools Graph Drawing 2265483ndash485 2001

13 M Ghoniem J Fekete and P Castagliola On thereadability of graphs using node-link and matrix-basedrepresentations a controlled experiment and statisticalanalysis Information Visualization 4(2)114ndash1352005

14 J Golbeck Generating predictive movierecommendations from trust in social networks InProceedings of the Fourth International Conference onTrust Management pages 93ndash104 Springer 2006

15 J Golbeck The science of trust on the webFoundations and Trends in Web Science 2008

16 G Grinstein C Plaisant S Laskowski T OConnellJ Scholtz and M Whiting VAST 2008 ChallengeIntroducing mini-challenges In IEEE Symposium onVisual Analytics Science and Technology 2008VASTrsquo08 pages 195ndash196 2008

17 J Heer S K Card and J A Landay prefuse a toolkitfor interactive information visualization In CHI rsquo05Proceedings of the SIGCHI conference on Humanfactors in computing systems pages 421ndash430 NewYork NY USA 2005 ACM

18 N Henry and J Fekete MatrixExplorer adual-representation system to explore social networksIEEE Transactions on Visualization and ComputerGraphics 12(5)677ndash684 2006


221

19 S D Kamvar M T Schlosser and H Garcia-MolinaThe eigentrust algorithm for reputation management inp2p networks In WWW rsquo03 Proceedings of the 12thinternational conference on World Wide Web pages640ndash651 New York NY USA 2003 ACM Press

20 R Kumar J Novak and A Tomkins Structure andevolution of online social networks In T Eliassi-RadL H Ungar M Craven and D Gunopulos editorsKDD pages 611ndash617 ACM 2006

21 J Leskovec L Backstrom R Kumar and A TomkinsMicroscopic evolution of social networks In KDD rsquo08Proceeding of the 14th ACM SIGKDD internationalconference on Knowledge discovery and data miningpages 462ndash470 New York NY USA 2008 ACM

22 R Levien and A Aiken Attack-resistant trust metricsfor public key certification In 7th USENIX SecuritySymposium pages 229ndash242 1998

23 R Milo S Shen-Orr S Itzkovitz N KashtanD Chklovskii and U Alon Network motifs simplebuilding blocks of complex networks Science298824ndash827 2002

24 J Moody D McFarland and S BenderdeMollDynamic network visualization American Journal ofSociology 110(4)1206ndash1241 2005

25 M E J Newman and M Girvan Finding andevaluating community structure in networks PhysicalReview E 69026113 Aug 11 2003

26 J OrsquoDonovan and B Smyth Trust in recommendersystems In IUI rsquo05 Proceedings of the 10thinternational conference on Intelligent user interfacespages 167ndash174 New York NY USA 2005 ACM

27 J OrsquoMadadhain D Fisher P Smyth S White andY Boey Analysis and visualization of network datausing JUNG Journal of Statistical Software 101ndash352005

28 A Perer and B Shneiderman Integrating statistics andvisualization case studies of gaining clarity duringexploratory data analysis In CHI rsquo08 Proceedings ofthe twenty-sixth annual SIGCHI conference on Humanfactors in computing systems pages 265ndash274 NewYork NY USA 2008 ACM

29 K W Piersol Object-oriented spreadsheets theanalytic spreadsheet package In OOPLSA rsquo86Conference proceedings on Object-orientedprogramming systems languages and applicationspages 385ndash390 New York NY USA 1986 ACM

30 R Rao and S K Card Exploring large tables with thetable lens In CHI 95 Conference Companion pages403ndash404 1995

31 F Schreiber and H Schwobbermeyer MAVisto a toolfor the exploration of network motifs Bioinformatics21(17)3572ndash3574 Aug 22 2005

32 J Seo and B Shneiderman Knowledge discovery inhigh-dimensional data Case studies and a user surveyfor the rank-by-feature framework IEEE Transactionson Visualization and Computer Graphics12(3)311ndash322 MayJune 2006

33 B Shneiderman The eyes have it a task by data typetaxonomy for information visualization In Proceedingsof the IEEE Workshop on Visual Language pages336ndash343 1996

34 B Shneiderman and C Plaisant Strategies forevaluating information visualization toolsmulti-dimensional in-depth long-term case studies InBELIV rsquo06 Proceedings of the 2006 AVI workshop onBEyond time and errors pages 1ndash7 New York NYUSA 2006 ACM

35 M Spenke Visualization and interactive analysis ofblood parameters with infozoom Artificial Intelligencein Medicine 22(2)159ndash172 2001

36 D F Swayne D T Lang A Buja and D CookGGobi evolving from XGobi into an extensibleframework for interactive data visualizationComputational Statistics amp Data Analysis43(4)423ndash444 2003

37 S Van Dongen Graph clustering by flow simulationMasterrsquos thesis Center for Mathematics and ComputerScience (CWI) 2000

38 F van Ham and J J van Wijk Interactive visualizationof small world graphs In Proceedings of the IEEESymposium on Information Visualization(INFOVISrsquo04) pages 199ndash206 Washington DC USA2004 IEEE Computer Society

39 H Welser E Gleave D Fisher and M SmithVisualizing the signatures of social roles in onlinediscussion groups The Journal of Social Structure8(2) 2007

40 P C Wong H Foote P Mackey G C Jr H J Sofiaand J Thomas A dynamic multiscale magnifying toolfor exploring large sparse graphs InformationVisualization 7(2)105ndash117 2008


222

Introduction

Design of ManyNets

Selecting and Adding Columns

Column Summaries

Ranking Columns

Generating Sets of Networks

Related Work

Case Study FilmTrust

Analysis

Outcome

Usability Study

Results

Future Work and Conclusion

Acknowledgments

REFERENCES

Figure 1 ManyNets displaying a time-sliced cell-phone call network Each row represents the network of calls in a 5-hour period with a 50overlap with the previous row (10 rows cover an entire day) Rows with more than 40 connected components have been selected by dragging on theldquoComponent countrdquo column summary histogram and are highlighted with a light-green background in the table Each column summary highlightsin bright green those values that correspond to currently-selected rows In the left-most column summary equally-spaced highlights reveal a temporalpattern This synthetic dataset is from the VAST 2008 Challenge [16] and contains 400 nodes and 9834 links

and columns and can benefit from focus+context techniquessuch as those found in TableLens [30] Our table visualiza-tion is tightly integrated with a node-link diagram networkvisualization tool SocialAction [28] allowing on-demandnetwork inspection A secondary contribution is in the gen-eral field of table-based interfaces in the form of ldquocolumnsummariesrdquo These are special cells placed on top of thecolumn headers that provide an abstract of the contents oftheir column The summaries also support direct user inter-action and reflect application state by highlighting valuesthat correspond to currently-selected rows

The next section presents our approach and describes the in-terface in general terms using a temporal network datasetfrom the VAST 2008 Challenge [16] as a guiding exam-ple After a section on related work we describe an in-depth case-study using data from the FilmTrust social net-work [14] Finally we present the results of usability studywith 7 participants

DESIGN OF MANYNETSWe follow the Visual Information Seeking Mantra [33] tointroduce the ManyNets approach to network visualizationThe first step of this mantra calls for Overview first Analystscan access two distinct types of overviews First rows them-

selves act as overviews of the networks that they representScalar values such as vertex (node) count and edge (link)count are represented with horizontal bars When there isa distribution of values within each network such as in thenode degree column the distribution is shown as a histogramin the corresponding cell For example in Figure 1 the his-tograms that represent the distributions of phone call dura-tions can be seen to be roughly bell-shaped On top of eachcolumn name column summaries provide the second type ofoverview Each column summary summarizes the contentsof an entire column In this sense the set of column sum-maries is an overview of all the networks at once Withinsummaries we use miniature histograms to represent dis-tributions of values for both scalar values and distributionsFor example in Figure 1 the column summary histogramfor call duration shows a much smoother bell curve that con-siders all the distribution values in each network In the samefigure a histogram also shows the distribution of scalar val-ues like Edge Count over all networks

Analysts can zoom and filter (the second step of the Mantra)collections of networks in several ways Columns can beresized by direct manipulation and filtered by hiding themusing the ldquocolumn controlrdquo (displayed as a small button ontop of the scrollbar) Sorting can be seen as a special typeof filtering All columns even those that contain distribu-


214










215











216












217









218







219


















220
























221
























222

Introduction

Design of ManyNets


Column Summaries

Ranking Columns


Related Work


Analysis

Outcome

Usability Study

Results


Acknowledgments

REFERENCES










215











216












217









218







219


















220
























221
























222

Introduction

Design of ManyNets


Column Summaries

Ranking Columns


Related Work


Analysis

Outcome

Usability Study

Results


Acknowledgments

REFERENCES











216












217









218







219


















220
























221
























222

Introduction

Design of ManyNets


Column Summaries

Ranking Columns


Related Work


Analysis

Outcome

Usability Study

Results


Acknowledgments

REFERENCES












217









218







219


















220
























221
























222

Introduction

Design of ManyNets


Column Summaries

Ranking Columns


Related Work


Analysis

Outcome

Usability Study

Results


Acknowledgments

REFERENCES









218







219


















220
























221
























222

Introduction

Design of ManyNets


Column Summaries

Ranking Columns


Related Work


Analysis

Outcome

Usability Study

Results


Acknowledgments

REFERENCES







219


















220
























221
























222

Introduction

Design of ManyNets


Column Summaries

Ranking Columns


Related Work


Analysis

Outcome

Usability Study

Results


Acknowledgments

REFERENCES


















220
























221
























222

Introduction

Design of ManyNets


Column Summaries

Ranking Columns


Related Work


Analysis

Outcome

Usability Study

Results


Acknowledgments

REFERENCES
























221
























222

Introduction

Design of ManyNets


Column Summaries

Ranking Columns


Related Work


Analysis

Outcome

Usability Study

Results


Acknowledgments

REFERENCES
























222

Introduction

Design of ManyNets


Column Summaries

Ranking Columns


Related Work


Analysis

Outcome

Usability Study

Results


Acknowledgments

REFERENCES

manynets: an interface for multiple network analysis and …ben/papers/freire2010manynets.pdf ·...

Documents