CTSA Process Evaluation: Social Network Analysis and CTSA Consortium Structure and Organization: Preliminary Results from Archival Data Sources
Author: John Skvoretz, Ph.D. October 28, 2009
Prepared for: National Center for Research Resources One Democracy Plaza 6710 Democracy Blvd. Bethesda, Maryland 20892
Prepared by: Westat 1600 Research Boulevard Rockville, Maryland 20850 (301) 251-1500
Introduction. Two aims are pursued in this report. The first aim is to assess the prospects for and the difficulties with using archival data in a social network analysis of CTSA Consortium structure and organization. The second aim is to present a preliminary social network analysis of the pattern of connections among the institutional members of the CTSA Consortium from readily available archival data. The analysis is preliminary in two senses. First, it uses only readily available data and not data that would require more time to access and integrate into the analysis. In the closing section of this report, additional data collection steps that would provide for more nuanced analysis are specified. Second, only data from the “start up” years of the Consortium (2006‐2008) are used, the idea being to provide a baseline snapshot of the network structure among CTSA institutions from which changes can be tracked and analyzed in more detail in subsequent research efforts.
Social Network Analysis: Data and Methods. The general objective of social network data collection is to assess the presence or absence of certain types of ties between each pair of members of a bounded population. Typically only a small fraction of the possible pairs are directly connected or “adjacent” to one another, but often a large fraction if not all pairs are indirectly connected to one another through a chain or path of direct connections. The methods for the analysis of such data constitute the techniques that make up social network analysis. The techniques can be usefully classified by their level of analysis: the complete network level, the nodal (individual) level, the dyad or pair level, and the level of triads or other restricted subgraphs.
Techniques for complete network analysis are designed to describe the overall pattern of connection in the population of individual elements. Often these individual elements are persons and the ties connecting them are recognized social relations like friendship or advice seeking. However, the individual elements may be corporate entities (committees, organizations, institutions) with the ties connecting them defined appropriately for such a population. Techniques at the complete network level answer questions such as: is the network connected overall, that is, is there a path, direct or indirect, between any pair of elements, or does the network break into components, that is, subsets of elements directly or indirectly connected to each other but with each subset not connected to the others; is the average path length connecting pairs short or long; does the overall pattern of connection exhibit “small worldness,” that is, short paths despite a large amount of “clustering,” a high likelihood that the contacts of a person are themselves connected. Clustering relates to a second concern at the complete network level, namely, the extent to which the patterning of ties is “clumpy,” that is, whether there are clusters or subsets of nodes within which a large fraction of ties are present but between which ties are few and the density of connection is sparse.
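The question of whether a network is connected or breaks into components can be illustrated with a minimal sketch. The toy network and its node labels below are hypothetical and are not drawn from the CTSA data; the function is one common way (breadth-first search) to list components, not necessarily the routine used for this report.

```python
from collections import deque

def connected_components(adjacency):
    """Return the components of an undirected network as sorted lists of nodes."""
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        queue = deque([start])
        seen.add(start)
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(members))
    return components

# Hypothetical toy network: A-B-C are linked in a chain, D-E form a separate pair.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}
components = connected_components(toy)
print(components)
```

A network is connected overall when this list has exactly one entry; here the toy network breaks into two components.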
At the nodal level, techniques focus on describing a node’s position in the overall pattern of connection. A widely used and important set of methods is available to describe the centrality or importance of a node in the overall pattern. A node can be important or central for multiple reasons: it is directly connected to many other nodes (high versus low “degree” centrality), it is on relatively many short paths between other pairs of nodes (high versus low “betweenness” centrality), it is directly connected to nodes that are important or central, and so on. Each of these is one aspect of a node’s centrality or importance due to its position in the overall pattern of connection. Furthermore, it is often the case that a node can be important in one sense of centrality but not important in another sense.
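The point that a node can be central in one sense but not in another can be sketched on a toy network. The graph below is hypothetical (two tightly knit groups of four joined only through a node X), and the betweenness routine is an unnormalized version of Brandes' algorithm for unweighted, undirected graphs, offered as an illustration rather than as the software used for this report.

```python
from collections import deque

def degree_centrality(adj):
    """Number of direct ties of each node."""
    return {v: len(adj[v]) for v in adj}

def betweenness_centrality(adj):
    """Unnormalized betweenness (Brandes' algorithm, unweighted undirected graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: score[v] / 2 for v in adj}

def complete_group(members):
    """Adjacency sets for a group in which everyone is tied to everyone else."""
    return {m: set(members) - {m} for m in members}

# Hypothetical toy network: groups {A,B,C,D} and {E,F,G,H}, bridged by X via D and E.
toy = {**complete_group("ABCD"), **complete_group("EFGH"), "X": {"D", "E"}}
toy["D"].add("X")
toy["E"].add("X")

degree = degree_centrality(toy)
between = betweenness_centrality(toy)
print(degree["X"], degree["D"], between["X"], between["D"])
```

In this toy case X has the fewest direct ties (2) but the highest betweenness (16), because all 16 pairs that span the two groups must pass through it.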
At the dyad or pair level, the focus is on describing patterns of connection in dyads or pairs of nodes. Measures of reciprocity, that is, whether a connection from one node to another is returned by the same type of connection from the latter to the former, are important here. Often nodes are not undifferentiated collections of persons or corporate entities, but have different properties themselves such as, for persons, age, gender, and ethnicity or, for corporate entities, size and public‐private status. There are methods to analyze how the occurrence of ties may be related to the sharing of such properties. Methods also exist for the analysis of larger subgraphs than the dyad. Triads, in particular, are of interest because the patterning of ties in these subgraphs can reveal various structuring forces, such as closure (one’s friends are themselves friends), that shape the overall pattern of connection.
Social Network Analysis and the CTSA Process Evaluation. For the CTSA process evaluation we are interested in assessing the nature and degree of communication and collaboration across CTSA institution PIs and other participants and within and across committees, with the goal of improving coordination and efficiency. Social network analysis of data collected on committee chairs, for example, will allow us to evaluate the presumption that, if committee goals are being reached, one reason may be due to people working together, which is measured through assessment of indicators of communication and collaboration among the committee chairs. To conduct a social network analysis it is essential to define the network and to be able to gather information about interactions between each member of the network. By including indicators of interaction (e.g., telephone calls outside of consortium meetings, working together on sub‐group planning meetings, exchanging manuscripts, referring graduate students, sharing resources) in the PI survey data collection instrument we can assess the “network” of CTSA Institution PIs. Similarly, as we are interviewing each committee chair, we can also ask about communication and collaboration and conduct an analysis of the network of committee chairs. This type of analysis provides a view of which members are key and which are isolated, as well as how this may change over time.
Data for conducting a social network analysis can be collected through secondary sources or by primary data collection instruments such as interviews and surveys. Considered in this report are secondary sources, particularly, the minutes of key function committee meetings. The value of secondary sources is that data collection is unobtrusive and does not require an extensive time commitment on the part of respondents. The most valuable of these sources are ones that mention names of principal actors in the Initiative. Two actors have a connection if they are co‐mentioned in a document. Such a connection may be of several different types depending on the document type and the “co‐mention” type. In the current situation and for a preliminary analysis, the minutes of different key function committees were aggregated to give an overall picture of the connections between individuals and institutions generated by their co‐involvement on key function committees. Future analysis will consist of data newly collected, as described below.
Basic Research Questions. As noted earlier, the general objective of social network analysis is to characterize the overall pattern of connections generated by mapping the measured ties over the population and to describe the positions of individual elements in the overall pattern. Thus, the social network analysis for the CTSA process evaluation will answer the following questions:
1. To what extent is the network (both individual CTSA institution PIs/institutional representatives and CTSA institutions) connected?
2. If connected, are there discernible clusters within which there are many links between individual PIs/institutional representatives and CTSA institutions but between which there are few links?
3. If individual PIs/institutional representatives and CTSA institutions are not directly connected to each other, are they indirectly connected through relatively short or relatively long paths?
4. Does connectedness vary by subgroup (e.g., cohort, discipline, clinical specialty, other types of background, or similar types of institutions)?
5. Are there institutions that take more of a leadership or gatekeeping role than others? What are the characteristics that relate to the roles that institutions play?
6. What does the network of connections in CTSA look like at baseline? Are some institutions and cohorts of institutions more connected than others?
7. How does connectedness change over time, both in the CTSA Initiative at large and in significant subgroups?
The analysis contained in this document answers questions 1 – 3. These questions can be answered with available data. As data become available with respect to important background attributes of individual PIs/institutional representatives or CTSA institutions (e.g., geographic location, scientific or clinical specialty), these questions of connectedness can be examined relative to these background characteristics (questions 4 and 5). Moreover, when the data are available over time, the question of how connectedness changes over time, both in the full CTSA Initiative and in significant subgroups, will also be addressed (questions 6 and 7). Over the course of the evaluation, social network analysis will be expanded to assess the nature and variety of types of connections, and the types of individuals involved in the various connections. This will require the collection of data in interviews and surveys. For example, social network analysis will be able to assess the patterns of collaboration with respect to authorship, regional networks, working on joint data collection, and working on joint projects. Also with respect to people, it will be able to answer questions such as the extent to which PIs are involved in various ways on the key function committees and/or the strategic goals committees, etc.
The task of characterizing the overall pattern of connection includes the investigation of significant variation in the position of particular actors in the overall pattern of connection. Here, the basic research question is: are some actors more important than others and if so why? These questions have several possible answers depending on how “importance” is operationalized. Again with data on important background attributes and data over time, this question can be pursued further to see if actor importance is related to background attributes and how actor importance changes over time. In the substantive context of CTSA consortium structure and organization, the connections between people who represent institutions can also be used to construct a network of connections between institutions. These connections have a natural “strength of tie” metric based on the number of connections between individual representatives from different institutions. Similar questions of overall pattern of connection and differences in importance can be posed at this level of analysis: is the inter‐institutional network connected and are some institutions more important than others in the overall pattern of connection. With connections having variable strength, an additional research question can be posed and that is how the strength of connection is related to background attributes of the institution, probably the most important of which, at this time, is award cohort.
Data Sources. One archival data source for a preliminary network analysis of CTSA Initiative activities is the minutes of the meetings of the various key function committees. These minutes typically (though not always) list the participants and their institutional affiliations.[1] As a source of network data, these rosters of attendees can be used to create a two‐mode network of persons by the meetings they attended. Such a network can then be used to create a person by person network in which a tie between two individuals has a value which is the number of meetings they co‐attended.
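The step from meeting rosters (the two‐mode network) to a person by person network can be sketched as follows. The meeting and person labels are hypothetical placeholders, not records from the minutes, and the function is a minimal illustration of the counting rule stated above, not the actual processing code.

```python
from collections import Counter
from itertools import combinations

# Hypothetical rosters: meeting ID -> set of attendee IDs.
rosters = {
    "m1": {"p1", "p2", "p3"},
    "m2": {"p1", "p2"},
    "m3": {"p3", "p4"},
}

def coattendance(rosters):
    """Count, for every pair of persons, the number of meetings both attended."""
    counts = Counter()
    for attendees in rosters.values():
        # Sorting gives each unordered pair a single canonical key (a, b) with a < b.
        for a, b in combinations(sorted(attendees), 2):
            counts[(a, b)] += 1
    return counts

ties = coattendance(rosters)
print(ties[("p1", "p2")], ties[("p1", "p3")], ties[("p1", "p4")])
```

Here p1 and p2 co‐attended two meetings, p1 and p3 one, and p1 and p4 none, so the corresponding tie values are 2, 1, and 0.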
There are eleven kfcs listed on the Consortium website: Administration, Biostatistics/Epidemiology/Research Design (BERD), Clinical Research Ethics (CREW), Clinical Research Innovation, Communications, Community Engagement, Education and Career Development, Evaluation, Public‐Private Partnerships, Informatics, and Translational (Table 1). One kfc (CREW) has no meetings with minutes posted. The other committees vary in the number of meetings with posted minutes containing rosters of attendees. Two (Communications and Informatics) have minutes from 2006; the rest have minutes starting in 2007, with the exception of Public‐Private Partnerships, whose minutes start in 2008. Not all of the hot links to meetings of kfcs contain minutes with rosters of participants. Such minutes are of no use in the construction of the initial two‐mode networks. Table 1 summarizes these data sources. Community Engagement has the most data, 18 meetings with minutes having rosters in the three calendar years. Informatics and Translational have the least data; each has 2 meetings with minutes having rosters. To get an adequate number of initial data points, the three calendar years 2006, 2007, and 2008 were used in the preliminary analysis. This decision can, of course, be revisited and the cutoff point changed.
[1] Occasionally, the minutes identify whether a participant was actually present at the meeting or logged in over the web. However, this practice is not consistent across kfcs.
Table 1. KFC meetings with minutes having rosters
Key function committee                               2006   2007       2008      All years
Administration                                       0      3          9         12
Biostatistics/Epidemiology/Research Design (BERD)    0      2          6         8
Clinical Research Ethics                             0      0          0         0
Clinical Research Innovation                         0      3          3         6
Communications                                       1      1 of 3*    4 of 6    5 of 9
Community Engagement                                 0      9 of 11    9 of 12   18 of 23
Education and Career Development                     0      1          2 of 3    3 of 4
Evaluation                                           0      3 of 4     2 of 7    5 of 11
Informatics                                          1      1          0 of 1    2 of 3
Public‐Private Partnerships                          0      0 of 1     6 of 9    6 of 10
Translational                                        0      1          1 of 3    2 of 3
Number of meetings                                   2      30         59        91
Number of meetings with participant rosters          2      24         42        68
*x of y means there were y hot links to meetings but only x had minutes having rosters
The basic record in the main data file has the name of the participant, institutional affiliation, a variable indicating whether they are from NIH or an academic institution, and then for each of the 68 meetings in Table 1, “1” if they attended the meeting and “0” otherwise. This main data file combines similarly structured files for each key function committee. Some cleaning was necessary to identify the occasional misspellings of names etc. In the first pass of data analysis, I focus on the participants from CTSA institutions. This main data file contains 543 participants. The matrix of participant by participant is therefore 543 by 543, with a total of 147,153 participant dyads ([543*542]/2). The (i,j) cell counts the number of times two persons attended the same meeting and the maximum value is 13.[2] Most of the cells have the value 0, indicating the two persons never co‐attended a meeting (92% or 134,809/147,153 of all possible dyads).
[2] By construction, the matrix is symmetric, that is, the value in the (i,j) cell equals the value in the (j,i) cell: if person i attended 13 meetings that person j attended, then person j attended 13 meetings that person i attended.
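The dyad arithmetic for the person network can be checked with a few lines. The inputs are the figures quoted in this report (543 participants; 134,809 dyads with value 0); the variable names are illustrative only.

```python
n_participants = 543
zero_dyads = 134_809  # dyads whose two members never co-attended a meeting

# Number of unordered pairs among n participants: n(n - 1) / 2.
n_dyads = n_participants * (n_participants - 1) // 2

share_zero = zero_dyads / n_dyads  # fraction of dyads with no tie
density = 1 - share_zero           # fraction of dyads with at least one co-attendance

print(n_dyads, round(share_zero, 3), round(density, 3))
```

The result is 147,153 dyads, of which about 91.6% have no tie; the complement, about 8.4%, agrees with the network density of .084 reported in the results.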
6
A second analysis looks at the connections among the key function committees themselves. In this analysis there are 10 nodes, one for each of the key function committees with posted rosters, and the (i,j) cell counts the number of persons who attended at least one meeting of the ith key function committee and one meeting of the jth key function committee. This network gives an overall picture of how the key function committees are themselves integrated into the overall structure of the Consortium.
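The committee by committee counts can be sketched as set intersections. The membership sets below are hypothetical (three committees, made-up person IDs); only the counting rule, persons who attended at least one meeting of each committee in the pair, comes from the description above.

```python
from itertools import combinations

# Hypothetical data: committee -> persons who attended at least one of its meetings.
attendees_by_committee = {
    "Admin": {"p1", "p2", "p3"},
    "Evaluation": {"p2", "p3", "p4"},
    "BERD": {"p5"},
}

def committee_overlap(attendees_by_committee):
    """For each pair of committees, count persons who attended meetings of both."""
    return {
        (a, b): len(attendees_by_committee[a] & attendees_by_committee[b])
        for a, b in combinations(sorted(attendees_by_committee), 2)
    }

overlap = committee_overlap(attendees_by_committee)
print(overlap)
```

In this toy case Admin and Evaluation share two persons, while BERD shares none with either committee and so would appear as an isolated node.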
A third analysis is based on institutional co‐attendance. In this analysis, the nodes in the network are CTSA institutions and the (i,j) cell counts the number of times a representative from the first institution co‐attended a meeting with a representative from the second institution. There are 38 institutions (the first three waves of awardees) represented in the set of minutes from 2006 to 2008 and 703 institutional dyads ([38*37]/2 = 703). All pairs are directly connected in that there is at least one meeting where representatives from each pair co‐attended. However, there is a great deal of variation. The most strongly connected pair is 1C08 and 1C12 with a value of 127, which means that in the 68 meetings there were 127 instances of co‐attendance by a pair of representatives, one from each institution. At the other extreme, there were a number of institutional pairs with a value of 2, that is, there were only two instances over the 68 meetings where a pair of representatives, one from each institution, co‐attended. The mean was 30.5.
A portion of the basic matrix of connection for the institutional network is found in Table 2. The diagonal cell counts the number of times at least one representative from the row institution attended one of the 68 meetings. It is possible for this number to exceed 68 if the institution had more than one person attending meetings. Similarly, a count less than 68 does not mean that the row institution had a representative at that many meetings, since it is possible that it had multiple attendees at one or more meetings. The off‐diagonal cell counts refer to the number of pairs, one person from the row institution and one from the column institution, that co‐attended a meeting. The (i,j) off‐diagonal cell is necessarily equal to the (j,i) off‐diagonal cell. To clarify what a particular cell value means concretely, we can take the lowest value in the entire matrix, namely, 2. It could come about by each institution being represented by one person at two meetings or by one of the institutions being represented by two persons and the other by one at one meeting. The count is best viewed as a measure of the potential for information flow between institutions, with larger counts indicating greater opportunity for substantive contact between the two institutions through their representatives (either in the meeting or at another time).
Table 2. Part of the institutional matrix of connections
        1C01   1C02   1C03   1C04   1C05   1C06   1C07   …
1C01    50     66     44     52     57     58     55     …
1C02    66     77     80     78     97     93     77     …
1C03    44     80     68     72     95     79     56     …
1C04    52     78     72     67     77     76     55     …
1C05    57     97     95     77     73     88     63     …
1C06    58     93     79     76     88     77     63     …
1C07    55     77     56     55     63     63     50     …
…       …      …      …      …      …      …      …      …
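The two ways an off‐diagonal value of 2 can arise may be easier to see in a short sketch. The institution labels "X" and "Y" and the meeting lists are hypothetical; the function assumes each meeting contributes (attendees from one institution) times (attendees from the other) pairs, which is my reading of the pair counting described above.

```python
from collections import Counter
from itertools import combinations

def institutional_ties(meetings):
    """meetings: list of meetings, each a list with one institution ID per attendee.

    Returns, for each pair of institutions, the number of cross-institution
    pairs of attendees summed over all meetings.
    """
    ties = Counter()
    for attendee_institutions in meetings:
        per_institution = Counter(attendee_institutions)
        for a, b in combinations(sorted(per_institution), 2):
            ties[(a, b)] += per_institution[a] * per_institution[b]
    return ties

# Scenario 1 (hypothetical): one representative each, at two different meetings.
scenario_1 = [["X", "Y"], ["X", "Y"]]
# Scenario 2 (hypothetical): two representatives of X and one of Y at a single meeting.
scenario_2 = [["X", "X", "Y"]]

print(institutional_ties(scenario_1)[("X", "Y")])
print(institutional_ties(scenario_2)[("X", "Y")])
```

Both scenarios yield a tie strength of 2, which is why the count is better read as opportunity for contact than as a number of meetings.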
Results of Analysis. The network of persons (identified by CTSA institution ID label and individual ID number) is displayed in Figure 1. (The layout is produced by an algorithm that tries to minimize the number of line crossings and place nodes that are close to each other in network distance close to each other in the xy plane.) If a person attended the meetings of only one committee, the color of the node representing that person is keyed to the color of the committee. If a person attended the meetings of more than one committee, the node representing that person is colored blue and is larger in size. Fewer than 10% of the 543 persons in the network (35, to be exact) attended at least one meeting of more than one committee.[3] The clustering in Figure 1 is clearly associated with different key function committees. The pattern occurs because few persons serve on more than one committee. Nevertheless, these few who do serve on two or more committees create enough linkages across committee clusters that the entire set of 543 persons is connected. In fact, the network itself is a very good example of a “small world” in the technical sense that there is a high degree of clustering (the associates of a person are likely to be tied to each other) but relatively short paths between any pair of persons.[4] Ignoring the specific weight on each tie, the clustering coefficient for the person network is .71 and the average (shortest) path length between pairs of persons is 2.69. A clustering coefficient of .71 indicates that two associates of a person themselves have a tie 71% of the time, the maximum value being, of course, 100%. The shortest path length is 1 if there is a direct tie between the two individuals. In the person network, 8.4% of the dyads are connected directly (and so the density of the network is .084), 32% are connected by a path of length 2, 44% by a path of length 3, 13% by a path of length 4, and the rest by paths of length 5 or the maximum 6.
That the clustering coefficient is so high is not unexpected given how the person network is constructed: by definition, every subset of three persons who attended the same meeting is completely connected and so contributes positively to the clustering coefficient value. Nevertheless, despite the high degree of clustering, the average length of the shortest path between two actors is less than 3 links.
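The two small‐world quantities can be sketched on a toy co‐attendance network. The network is hypothetical (two three‐person meetings that share one attendee), and the clustering measure used here is the average of each node's local clustering, which is an assumption: the report does not say which variant of the coefficient produced the .71 figure.

```python
from collections import deque
from itertools import combinations

def local_clustering(adj, v):
    """Share of pairs of v's neighbors that are themselves tied."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    tied_pairs = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return tied_pairs / (k * (k - 1) / 2)

def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs (unweighted)."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Hypothetical toy network: meeting 1 = {a, b, c}, meeting 2 = {c, d, e}.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"}, "e": {"c", "d"},
}
print(round(average_clustering(toy), 3), average_path_length(toy))
```

Even in this small case the pattern matches the argument above: everyone except the shared attendee c has fully connected associates, so clustering is high (13/15, about .87), while c keeps every pair within two steps (average path length 1.4).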
In any network, some actors are more central than others. Two versions of centrality relevant here are degree centrality, which answers the question of who has the most direct connections, and betweenness centrality, which answers the question of who, given the set of shortest paths between pairs of persons, is on relatively many of these paths. Table 3 lists the IDs of the top ten participants in terms of degree centrality and betweenness centrality. Four of the ten are found in both lists: 1C10174 (ranked 1 and 4), 1C09307 (ranked 2 and 1), 1C08049 (ranked 4 and 3), and 1C02161 (ranked 7 and 10). Therefore, four persons are central figures in the network both in terms of number of links to others and in terms of being “between” others, that is, being on many of the shortest paths connecting pairs of other actors. However, other persons in the top ten lists are either central only because they have relatively many connections or because they are “gatekeepers.” Of the top ten most central actors in terms of connections, nine are from the first cohort of CTSA centers and one from the second cohort. Among the ten most central actors in terms of being between others, six are from the first cohort, and two each from the second and third cohort. Given that individuals from the first cohort would have had more time to establish connections, one would expect them to show up on these two top ten lists. More surprising, perhaps, is that anyone from the second or especially the third cohort shows up as among the most central of actors. Useful additional analyses will be done once we collect data on important attributes of the actors, such as seniority, position, and academic discipline.
[3] Specifically, 30 of the 35 attended at least one meeting of just two committees, four attended at least one meeting of three committees and one person attended at least one meeting of four committees.
[4] Because individuals can maintain only a fixed number of ties, the more these ties go to persons who are already connected to each other, the fewer are the ties available to connect to other clusters. In the extreme, the network breaks into clusters with no ties between clusters and so infinite path lengths between any two persons in different clusters. This is the fascination of the “small world” phenomenon: high clustering yet somehow short paths on average.
Figure 1. Network of co‐attendance ties among academic participants in CTSA consortium key function committee meetings (nodes are labeled by institution ID followed by a three‐digit person number, e.g., 3C11001)
Table 3. Ten most central persons for two measures of centrality
Rank   Degree*    Betweenness**
1      1C10174    1C09307
2      1C09307    2C12044
3      2C03056    1C08049
4      1C08049    1C10174
5      1C08067    3C05212
6      1C11069    1C10274
7      1C02161    2C11040
8      1C12036    3C13190
9      1C09055    1C10015
10     1C07071    1C02161
*The count of the number of ties to other actors in the network. **The extent to which a node lies between other nodes in the network. This measure takes into account the connectivity of the node's neighbors, giving a higher value for nodes which bridge clusters.
Without the 35 persons who attend meetings of two or more committees, the Figure 1 network would break into distinct components, one per key function committee, not connected to each other. Each of the 30 persons who attended meetings of two committees effectively creates a tie between those two committees. Each of the four persons who attended meetings of three committees creates a tie between three pairs of committees, and the one person who attended meetings of four committees effectively creates a pairwise tie for six committee pairs.
Figure 2. How KFCs are connected by people who attend 2 or more KFCs (nodes: Admin, BERD, Communication, Community, CRI, Education, Evaluation, Informatics, PublicPrivate, Translational)
Therefore the 35 persons collectively create 48 ties between pairs of committees and this network is depicted in Figure 2, where line thickness is proportional to the number of coattendees and size of node is proportional to its degree (number of direct connections to other nodes).
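The count of 48 committee‐pair ties follows from the breakdown of the 35 multi‐committee attendees given in this report (30 with two committees, four with three, one with four): a person who attends k committees links k(k-1)/2 pairs. A short check, with illustrative variable names:

```python
from math import comb

# Number of committees attended -> number of persons (breakdown reported in the text).
persons_by_committee_count = {2: 30, 3: 4, 4: 1}

total_persons = sum(persons_by_committee_count.values())

# Each person attending k committees contributes one tie per pair of those committees.
pair_ties = sum(n * comb(k, 2) for k, n in persons_by_committee_count.items())

print(total_persons, pair_ties)  # 30*1 + 4*3 + 1*6
```

Note that these 48 are person‐level contributions; several of them can fall on the same pair of committees, which is what the varying line thickness in Figure 2 reflects.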
Figure 2 reveals that the largest amount of coattendance at the meetings of a pair of KFCs occurs between the Administration and the Communication KFC. The next largest amount occurs between the Administration and the Evaluation KFC. Administration and Evaluation are the most central of the KFCs connecting directly to seven others. Five of the ties Administration has to other committees consist of a single person who has coattended meetings of both committees. This is true for only two of the ties of Evaluation. Therefore, while both Evaluation and Administration are central to the overall pattern of connection in Figures 1 and 2, Evaluation may be considered somewhat more central. That Evaluation and Administration should be most heavily involved in the integration of the KFC structure follows from the importance of both to the tasks of the more specialized KFCs. One final observation pertains to the Translational KFC – absent one person who has coattended at least one of its meetings and meetings of the Administration KFC, it would be disconnected from the rest of the committee structure.
Figure 3 displays the institutional network. Since it is completely connected, the picture is not especially revealing even when the size of the edges connecting two institutions is proportional to the number of pairs of institutional representatives that co‐attended kfc meetings. One can see that, in general, thicker edges connect institutions in the first cohort (red) to institutions in the second cohort (blue) than connect either cohort to institutions in the third cohort (yellow). The fact that the ties differ in strength means the network can be analyzed by restricting ties according to how strong they are. There are 38 institutions and therefore 703 institutional dyads (unordered pairs of institutions) ([38*37]/2 = 703).
Figure 3. The inter‐institutional network, all ties
Figure 4 graphs the distribution of tie strength to other institutions in an institution’s cohort (panels 1, 3, and 6 reading from top to bottom) and to institutions in other cohorts (panels 2, 4, and 5). It shows that intra‐cohort tie strength is related to cohort year: the average intra‐cohort tie strength for the 2006 cohort is 70.2, for the 2007 cohort it is 51.0, and for the 2008 cohort it is 12.8. A similar pattern holds for inter‐cohort ties: the ties between 2006 and 2007 (49.5) are stronger than the ties between 2006 and 2008 (15.2) and between 2007 and 2008 (15.4). These relationships are to be expected simply because an older cohort member has had more opportunity to send representatives to meetings than a younger cohort member. However, there is a great deal of variability within the older cohorts (2006 and 2007) and thus differences in how well connected institutions in a given cohort are.
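Averages of this kind can be computed by grouping institutional dyads by the cohorts of their two members. The sketch below uses four hypothetical institutions and made‐up tie strengths; it illustrates the grouping only and does not reproduce the reported means.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical institutions with award cohort years, and made-up tie strengths.
cohort = {"A1": 2006, "A2": 2006, "B1": 2007, "B2": 2007}
strength = {
    ("A1", "A2"): 70,
    ("A1", "B1"): 50, ("A1", "B2"): 40,
    ("A2", "B1"): 60, ("A2", "B2"): 50,
    ("B1", "B2"): 30,
}

def mean_strength_by_cohort_pair(strength, cohort):
    """Average tie strength for each (earlier cohort, later cohort) combination."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for a, b in combinations(sorted(cohort), 2):
        key = tuple(sorted((cohort[a], cohort[b])))
        sums[key] += strength.get((a, b), 0)  # a missing dyad counts as strength 0
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

means = mean_strength_by_cohort_pair(strength, cohort)
print(means)
```

With the toy values, the intra‐cohort means are 70 (2006) and 30 (2007) and the inter‐cohort mean is 50.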
[Network diagram (Figure 3) node labels: 1C01 to 1C12, 2C01 to 2C12, 3C01 to 3C14.]
Figure 4. Distributions of tie strength by cohorts
[Figure 4 panels: 2006-2006, 2006-2007, 2006-2008, 2007-2007, 2007-2008, 2008-2008. Horizontal axis: Strength of Tie (0 to 100). Vertical axis: Frequencies from Cohort x to Cohort y (0 to 15).]
This network was also analyzed by examining the overall pattern of connections at different “thresholds” of tie strength: which institutions remain connected to one another if we only look at ties whose strength exceeds a given threshold. Figure 5 shows, for instance, that if we take a threshold of 66, which is the median strength of ties between institutions in the oldest cohort, the youngest cohort will no longer be connected to the other institutions. That is, the ties from the 2008 cohort to nearly all of the other institutions have a value of 66 or smaller and so are among the weaker ties in the original network depicted in Figure 3. Figure 5 demonstrates this fact: all of the institutions in the 2008 cohort and three of the institutions in the 2007 cohort are no longer connected to the rest of the institutions. It is also clear from this “strong tie” network that some institutions in both the 2006 and 2007 cohorts are more tenuously connected than others, in particular 1C09, 2C01, and 2C11.
Figure 5. The inter‐institutional network, strength threshold = 66*
*The only ties represented in the diagram are those with a value greater than 66. All ties from the connected cluster to the isolates on the left have a strength of 66 or less.
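The thresholding step described above can be sketched in a few lines. The tie strengths here are invented, the cohort is again assumed to be encoded in the first character of each label, and the threshold is taken as the median strength of ties within the oldest cohort; only ties strictly greater than the threshold are kept.

```python
from statistics import median

# Hypothetical tie strengths between institutions (values are invented).
ties = {
    ("1C01", "1C02"): 90,
    ("1C01", "1C03"): 66,
    ("1C02", "1C03"): 60,
    ("1C01", "2C01"): 70,
    ("1C01", "3C01"): 15,
    ("2C01", "3C01"): 12,
}
nodes = sorted({label for pair in ties for label in pair})

# Threshold: median strength of ties within the oldest cohort (labels starting with "1").
oldest = [s for (a, b), s in ties.items() if a[0] == "1" and b[0] == "1"]
threshold = median(oldest)

def threshold_network(ties, nodes, threshold):
    """Adjacency sets keeping only ties strictly stronger than the threshold."""
    adj = {n: set() for n in nodes}
    for (a, b), strength in ties.items():
        if strength > threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def components(adj):
    """Connected components found by depth-first search."""
    seen, comps = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        comps.append(comp)
    return comps

adj = threshold_network(ties, nodes, threshold)
comps = components(adj)
largest = max(comps, key=len)
isolates = sorted(next(iter(c)) for c in comps if len(c) == 1)
print(threshold, sorted(largest), isolates)
```

Note that a tie whose strength equals the threshold is dropped, which is why one institution from the oldest toy cohort ends up isolated here.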
The network in Figure 5 can be analyzed to determine which institutions are most central in this network of “strong” ties. If importance is defined as the sheer number of connections, the top five most central nodes are 1C08, 1C12, 2C02, 1C02, and 1C06 tied with 1C05 for fifth. The top five in terms of betweenness are 1C08, 2C02, 1C12, 1C02, and 2C08. It is noteworthy that two institutions in the 2007 cohort appear in this top five. In general, 2006 awardees are more embedded in the institutional network than 2007 or 2008 awardees, for obvious reasons. To the extent that the co‐presence of institutional representatives at relatively many meetings can proxy for a greater opportunity for exchange of information and ideas, one would expect greater prospects for cross‐institutional collaborations among the institutions that remain connected in Figure 5 than among the institutions no longer connected to these or to each other.
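The two centrality notions used above, number of connections (degree) and betweenness, can be illustrated on a small graph. The sketch below uses Brandes' algorithm for unweighted, undirected graphs; the report does not say which software or normalization was used, so the unnormalized scores and the five-node toy graph are assumptions made purely for illustration.

```python
from collections import deque

# Hypothetical undirected "strong tie" network as adjacency sets.
adj = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B", "E"},
    "E": {"D"},
}

def degree_centrality(adj):
    """Number of direct connections of each node."""
    return {node: len(neighbors) for node, neighbors in adj.items()}

def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {node: 0.0 for node in adj}
    for source in adj:
        order = []                                  # nodes in order of discovery
        preds = {node: [] for node in adj}          # shortest-path predecessors
        n_paths = {node: 0 for node in adj}         # number of shortest paths from source
        dist = {node: -1 for node in adj}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {node: 0.0 for node in adj}
        for w in reversed(order):                   # farthest nodes first
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}

degree = degree_centrality(adj)
betweenness = betweenness_centrality(adj)
print(degree)
print(betweenness)
```

In this toy graph node B ranks first on both measures, while D has a modest degree but a comparatively high betweenness because it is the only route to E.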
[Figure 5 network diagram. Node labels: 1C01 to 1C12, 2C01 to 2C12, 3C01 to 3C14.]
Next Steps. One purpose of the preliminary analysis was to assess the difficulties with and prospects for using archival data to characterize the structure and operation of the CTSA Initiative. Consequently, readily accessible sources near to hand were used, namely, the minutes of the key function committees. The completeness of this set of minutes will be investigated more thoroughly. A second next step on the data collection side is to seek out and use minutes from other group meetings. The minutes of meetings of the strategic goal committees, for instance, are available at the ctsaweb.org website, and the wiki has minutes of some workgroup meetings. Since the context of these meetings is sufficiently different from the context of the key function committee meetings, data from these sources must be combined in a way that still permits disaggregation. A third next step on the data collection side is to acquire data on relevant background attributes of the individuals attending these various meetings and of the institutions. Such data would provide the opportunity for a more nuanced analysis of the network of persons displayed in Figure 1 and the network of institutions displayed in Figures 3 and 5. These data will be obtained through key informant interviews and surveys. Westat will also collect network data directly by asking respondents to report communication and collaborations with other members of the CTSA initiative.
The time frame for building the baseline network needs to be assessed in terms of future analyses. One research question of interest is how the network of connections between persons and the derived network of connections between institutions change over time; this can be addressed by comparing networks built on different time slices in the consortium’s history. Currently the baseline includes the years 2006 through 2008, in part because of fluctuation in the regularity with which meetings were held during the “start up” phase of the consortium. If meeting schedules have now been regularized, it is likely that two calendar years will yield sufficient data points to build a time 2, time 3, etc. network for comparison.
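One simple way to compare networks built on different time slices is to look at how much their edge sets overlap. The sketch below uses the Jaccard index on two invented time slices; the choice of this index, the threshold of zero, and the data are all assumptions for illustration, not a method proposed in the report.

```python
def edge_set(ties, threshold=0):
    """Unordered edges whose strength exceeds the threshold."""
    return {frozenset(pair) for pair, strength in ties.items() if strength > threshold}

def jaccard(edges_a, edges_b):
    """Share of edges present in both networks among edges present in either."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0

# Hypothetical tie strengths for two time slices (values are invented).
time1 = {("A", "B"): 5, ("B", "C"): 2}
time2 = {("A", "B"): 7, ("A", "C"): 1, ("B", "C"): 0}

edges1, edges2 = edge_set(time1), edge_set(time2)
similarity = jaccard(edges1, edges2)
new_edges = edges2 - edges1       # ties that appear only in the later slice
dropped_edges = edges1 - edges2   # ties that disappear in the later slice
print(similarity, new_edges, dropped_edges)
```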
Finally, the networks analyzed so far consider only the role of CTSA award institutions. The various personnel from the NIH who participate in the key function committee meetings have not been integrated into the overall analysis. Some consideration should be given as to whether and how their participation should be incorporated and analyzed.