
CTSA Process Evaluation: Social Network Analysis and CTSA Consortium Structure and Organization: Preliminary Results from Archival Data Sources

Author: John Skvoretz, Ph.D. October 28, 2009

Prepared for: National Center for Research Resources One Democracy Plaza 6710 Democracy Blvd. Bethesda, Maryland 20892

Prepared by: Westat 1600 Research Boulevard Rockville, Maryland 20850 (301) 251-1500



Introduction. Two aims are pursued in this report. The first aim is to assess the prospects for and the difficulties with using archival data in a social network analysis of CTSA Consortium structure and organization. The second aim is to present a preliminary social network analysis of the pattern of connections among the institutional members of the CTSA Consortium from readily available archival data. The analysis is preliminary in two senses. First, it uses only readily available data and not data that would require more time to access and integrate into the analysis. In the closing section of this report, additional data collection steps that would provide for more nuanced analysis are specified. Second, only data from the “start up” years of the Consortium (2006‐2008) are used, the idea being to provide a baseline snapshot of the network structure among CTSA institutions from which changes can be tracked and analyzed in more detail in subsequent research efforts.

Social Network Analysis: Data and Methods. The general objective of social network data collection is to assess the presence or absence of certain types of ties between each pair of members of a bounded population. Typically only a small fraction of the possible pairs are directly connected or “adjacent” to one another, but often a large fraction if not all pairs are indirectly connected to one another through a chain or path of direct connections. The methods for the analysis of such data constitute the techniques that make up social network analysis. The techniques can be usefully classified by their level of analysis: the complete network level, the nodal (individual) level, the dyad or pair level, and the level of triads or other restricted subgraphs.

Techniques for complete network analysis are designed to describe the overall pattern of connection in the population of individual elements. Often these individual elements are persons and the ties connecting them are recognized social relations like friendship or advice seeking. However, the individual elements may be corporate entities (committees, organizations, institutions) with the ties connecting them defined appropriately for such a population. Techniques at the complete network level answer questions such as the following. Is the network connected overall, that is, is there a path, direct or indirect, between any pair of elements, or does the network break into components, that is, subsets whose members are directly or indirectly connected to each other but not to members of other subsets? Is the average path length connecting pairs short or long? Does the overall pattern of connection exhibit “small worldness,” that is, short paths despite a large amount of “clustering” (a high likelihood that the contacts of a person are themselves connected)? Clustering relates to a second concern at the complete network level, namely, the extent to which the patterning of ties is “clumpy,” that is, whether there are clusters or subsets of nodes within which a large fraction of ties are present but between which ties are few and the density of connection is sparse.
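The three complete-network questions above (components, average path length, clustering) can be illustrated with a minimal sketch. It assumes an unweighted, undirected network stored as a dictionary of neighbor sets, and it takes the clustering coefficient to be the share of connected triples that are closed; the report does not say which variant of the coefficient was computed, so that choice is an assumption, as are the function names and the toy graph.

```python
from collections import deque

def components(adj):
    """Return the connected components of an undirected graph as a list of sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in part:
                    part.add(nbr)
                    queue.append(nbr)
        seen |= part
        parts.append(part)
    return parts

def distances_from(adj, source):
    """Breadth-first search: shortest path length from source to each reachable node."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def average_path_length(adj):
    """Mean shortest path length over all reachable pairs of distinct nodes."""
    total = pairs = 0
    for source in adj:
        for target, d in distances_from(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs

def clustering_coefficient(adj):
    """Share of connected triples (two neighbors of a common node) that are closed."""
    closed = triples = 0
    for nbrs in adj.values():
        ordered = sorted(nbrs)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                triples += 1
                closed += b in adj[a]
    return closed / triples

# Hypothetical toy network: two triangles (A-B-C and D-E-F) joined by the tie C-D.
toy = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
n_components = len(components(toy))
avg_path = average_path_length(toy)
clustering = clustering_coefficient(toy)
```

In the toy network the single bridging tie keeps everything in one component while most triples inside each triangle remain closed, which is the combination of high clustering and short paths that the text calls a small world.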

At the nodal level, techniques focus on describing a node’s position in the overall pattern of connection. A widely used and important set of methods is available to describe the centrality or importance of a node in the overall pattern. A node can be important or central for multiple reasons – it is directly connected to many other nodes (high versus low “degree” centrality), it is on relatively many shortest paths between other pairs of nodes (high versus low “betweenness” centrality), it is directly connected to nodes that are important or central, and so on. Each of these is one aspect of a node’s centrality or importance due to its position in the overall pattern of connection. Furthermore, a node is often important in one sense of centrality but not in another.
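To make the contrast between the two centrality notions concrete, here is a small sketch for an unweighted, undirected network. Betweenness is computed with Brandes' algorithm, which is one standard way to obtain it and not necessarily the software the report used; the helper names and the toy network are assumptions for illustration only.

```python
from collections import deque
from itertools import combinations

def build_adj(edges):
    """Build a dictionary of neighbor sets from a list of undirected edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree_centrality(adj):
    """Number of direct ties of each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {node: 0.0 for node in adj}
    for source in adj:
        order = []                                # nodes in non-decreasing distance from source
        preds = {node: [] for node in adj}        # predecessors on shortest paths
        n_paths = {node: 0 for node in adj}       # number of shortest paths from source
        n_paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
                if dist[nbr] == dist[node] + 1:
                    n_paths[nbr] += n_paths[node]
                    preds[nbr].append(node)
        dependency = {node: 0.0 for node in adj}
        for node in reversed(order):              # accumulate from the farthest nodes inward
            for pred in preds[node]:
                dependency[pred] += n_paths[pred] / n_paths[node] * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each unordered pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}

# Hypothetical toy network: two fully connected groups of four, linked only through X.
left = ["L1", "L2", "L3", "L4"]
right = ["R1", "R2", "R3", "R4"]
edges = list(combinations(left, 2)) + list(combinations(right, 2))
edges += [("L1", "X"), ("X", "R1")]
adj = build_adj(edges)
degree = degree_centrality(adj)
between = betweenness_centrality(adj)
```

In this toy case X has the fewest direct ties of any node yet lies on every shortest path between the two groups, so it ranks last on degree and first on betweenness.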

At the dyad or pair level, the focus is on describing patterns of connection in dyads or pairs of nodes. Measures of reciprocity, that is, whether a connection from one node to another is returned by the same type of connection from the latter to the former, are important here. Often nodes are not undifferentiated collections of persons or corporate entities, but have different properties themselves such as, for persons, age, gender, and ethnicity or, for corporate entities, size and public‐private status. There are methods to analyze how the occurrence of ties may be related to the sharing of such properties. Methods also exist for the analysis of larger subgraphs than the dyad. Triads, in particular, are of interest because the patterning of ties in these subgraphs can reveal various structuring forces, such as closure (one’s friends are themselves friends), that shape the overall pattern of connection.

Social Network Analysis and the CTSA Process Evaluation. For the CTSA process evaluation we are interested in assessing the nature and degree of communication and collaboration across CTSA institution PIs and other participants and within and across committees, with the goal of improving coordination and efficiency. Social network analysis of data collected on committee chairs, for example, will allow us to evaluate the presumption that, if committee goals are being reached, one reason may be due to people working together, which is measured through assessment of indicators of communication and collaboration among the committee chairs. To conduct a social network analysis it is essential to define the network and to be able to gather information about interactions between each member of the network. By including indicators of interaction (e.g., telephone calls outside of consortium meetings, working together on sub‐group planning meetings, exchanging manuscripts, referring graduate students, sharing resources) in the PI survey data collection instrument we can assess the “network” of CTSA Institution PIs. Similarly, as we are interviewing each committee chair, we can also ask about communication and collaboration and conduct an analysis of the network of committee chairs. This type of analysis provides a view of which members are key and which are isolated, as well as how this may change over time.

Data for conducting a social network analysis can be collected through secondary sources or by primary data collection instruments such as interviews and surveys. Considered in this report are secondary sources, particularly, the minutes of key function committee meetings. The value of secondary sources is that data collection is unobtrusive and does not require an extensive time commitment on the part of respondents. The most valuable of these sources are ones that mention names of principal actors in the Initiative. Two actors have a connection if they are co‐mentioned in a document. Such a connection may be of several different types depending on the document type and the “co‐mention” type. In the current situation and for a preliminary analysis, the minutes of different key function committees were aggregated to give an overall picture of the connections between individuals and institutions generated by their co‐involvement on key function committees. Future analysis will consist of data newly collected, as described below.

Basic Research Questions. As noted earlier, the general objective of social network analysis is to characterize the overall pattern of connections generated by mapping the measured ties over the population and to describe the positions of individual elements in the overall pattern. Thus, the social network analysis for the CTSA process evaluation will answer the following questions:

1. To what extent is the network (both individual CTSA institution PIs/institutional representatives and CTSA institutions) connected?

2. If connected, are there discernible clusters within which there are many links between individual PIs/institutional representatives and CTSA institutions but between which there are few links?

3. If individual PIs/institutional representatives and CTSA institutions are not directly connected to each other, are they indirectly connected through relatively short or relatively long paths?

4. Does connectedness vary by subgroup (e.g., cohort, discipline, clinical specialty, other types of background, or similar types of institutions)?

5. Are there institutions that take more of a leadership or gatekeeping role than others? What are the characteristics that relate to the roles that institutions play?

6. What does the network of connections in CTSA look like at baseline? Are some institutions and cohorts of institutions more connected than others?

7. How does connectedness change over time, both in the CTSA Initiative at large and in significant subgroups?

The analysis contained in this document answers questions 1 – 3. These questions can be answered with available data. As data become available with respect to important background attributes of individual PIs/institutional representatives or CTSA institutions (e.g., geographic location, scientific or clinical specialty), these questions of connectedness can be examined relative to these background characteristics (questions 4 and 5). Moreover, when the data are available over time, the question of how connectedness changes over time, both in the full CTSA Initiative and in significant subgroups, will also be addressed (questions 6 and 7). Over the course of the evaluation, social network analysis will be expanded to assess the nature and variety of types of connections, and the types of individuals involved in the various connections. This will require the collection of data in interviews and surveys. For example, social network analysis will be able to assess the patterns of collaboration with respect to authorship, regional networks, working on joint data collection, and working on joint projects. With respect to people, it will also be able to answer questions such as the extent to which PIs are involved in various ways on the key function committees and/or the strategic goals committees.

The task of characterizing the overall pattern of connection includes the investigation of significant variation in the position of particular actors in the overall pattern of connection. Here, the basic research question is: are some actors more important than others and if so why? These questions have several possible answers depending on how “importance” is operationalized. Again with data on important background attributes and data over time, this question can be pursued further to see if actor importance is related to background attributes and how actor importance changes over time. In the substantive context of CTSA consortium structure and organization, the connections between people who represent institutions can also be used to construct a network of connections between institutions. These connections have a natural “strength of tie” metric based on the number of connections between individual representatives from different institutions. Similar questions of overall pattern of connection and differences in importance can be posed at this level of analysis: is the inter‐institutional network connected and are some institutions more important than others in the overall pattern of connection. With connections having variable strength, an additional research question can be posed and that is how the strength of connection is related to background attributes of the institution, probably the most important of which, at this time, is award cohort.

Data Sources. One archival data source for a preliminary network analysis of CTSA Initiative activities is the minutes of the meetings of the various key function committees. These minutes typically (though not always) list the participants and their institutional affiliations.¹ As a source of network data, these rosters of attendees can be used to create a two‐mode network of persons by the meetings they attended. Such a network can then be used to create a person by person network in which a tie between two individuals has a value which is the number of meetings they co‐attended.
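The step from a two‐mode (person by meeting) network to a person by person network can be sketched as follows. This is a minimal illustration, not the procedure actually used for the report; the meeting labels, person IDs, and the function name `co_attendance` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical rosters: meeting label -> set of attendee IDs (illustrative only).
rosters = {
    "admin_2007_01": {"p1", "p2", "p3"},
    "admin_2007_02": {"p1", "p2"},
    "eval_2008_01": {"p2", "p4"},
}

def co_attendance(rosters):
    """Project a person-by-meeting network onto persons.

    Returns a Counter keyed by an unordered pair of persons (stored as a
    sorted tuple); the value is the number of meetings the two co-attended.
    """
    ties = Counter()
    for attendees in rosters.values():
        for pair in combinations(sorted(attendees), 2):
            ties[pair] += 1
    return ties

ties = co_attendance(rosters)
```

Pairs that never appear on the same roster are simply absent from the result, which corresponds to a tie value of 0 in the full matrix.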

There are eleven key function committees (KFCs) listed on the Consortium website: Administration, Biostatistics/Epidemiology/Research Design (BERD), Clinical Research Ethics (CREW), Clinical Research Innovation, Communications, Community Engagement, Education and Career Development, Evaluation, Public‐Private Partnerships, Informatics, and Translational (Table 1). One KFC (CREW) has no meetings with minutes posted. The other committees vary in the number of meetings with posted minutes containing rosters of attendees. Two (Communications and Informatics) have minutes from 2006; the rest have minutes starting in 2007, with the exception of Public‐Private Partnerships, whose minutes start in 2008. Not all of the hot links to meetings of KFCs contain minutes with rosters of participants. Minutes without rosters are of no use in the construction of the initial two‐mode networks. Table 1 summarizes these data sources. Community Engagement has the most data, 18 meetings with minutes having rosters in the three calendar years. Informatics and Translational have the least data; each has 2 meetings with minutes having rosters. To get an adequate number of initial data points, the three calendar years 2006, 2007, and 2008 were used in the preliminary analysis. This decision can, of course, be revisited and the cutoff point changed.

1 Occasionally, the minutes identify whether a participant was actually present at the meeting or logged in over the web. However, this practice is not consistent across KFCs.

Table 1. KFC meetings with minutes having rosters

Key function committee                              2006  2007      2008      All years
Administration                                      0     3         9         12
Biostatistics/Epidemiology/Research Design (BERD)   0     2         6         8
Clinical Research Ethics                            0     0         0         0
Clinical Research Innovation                        0     3         3         6
Communications                                      1     1 of 3*   4 of 6    5 of 9
Community Engagement                                0     9 of 11   9 of 12   18 of 23
Education and Career Development                    0     1         2 of 3    3 of 4
Evaluation                                          0     3 of 4    2 of 7    5 of 11
Informatics                                         1     1         0 of 1    2 of 3
Public‐Private Partnerships                         0     0 of 1    6 of 9    6 of 10
Translational                                       0     1         1 of 3    2 of 3
Number of meetings                                  2     30        59        91
Number of meetings with participant rosters         2     24        42        68

*x of y means there were y hot links to meetings but only x had minutes having rosters

The basic record in the main data file has the name of the participant, the institutional affiliation, a variable indicating whether the participant is from NIH or an academic institution, and then, for each of the 68 meetings in Table 1, “1” if the participant attended the meeting and “0” otherwise. This main data file combines similarly structured files for each key function committee. Some cleaning was necessary to correct occasional misspellings of names and similar inconsistencies. In the first pass of data analysis, I focus on the participants from CTSA institutions. This main data file contains 543 participants. The matrix of participant by participant is therefore 543 by 543, with a total of 147,153 participant dyads ([543*542]/2). The (i,j) cell counts the number of times two persons attended the same meeting and the maximum value is 13.² Most of the cells have the value 0, indicating that the two persons never co‐attended a meeting (92% or 134,809/147,153 of all possible dyads).

2 By construction, the matrix is symmetric, that is, the value in the (i,j) cell equals the value in the (j,i) cell – if person i attended 13 meetings that person j attended, then person j attended 13 meetings that person i attended.
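The dyad arithmetic above can be checked in a few lines. The inputs (543 participants, 134,809 dyads with value 0) are taken from the text; the variable names are illustrative, and the density figure is simply the complement of the zero share.

```python
n_persons = 543
zero_dyads = 134_809                              # pairs that never co-attended (from the text)

dyads = n_persons * (n_persons - 1) // 2          # unordered pairs of distinct persons
share_zero = zero_dyads / dyads                   # share of pairs with no co-attendance
density = 1 - share_zero                          # share of pairs with at least one co-attendance

print(dyads, round(share_zero, 2), round(density, 3))
```

The complement of the 92% share of empty cells is the density of the person network, that is, the share of pairs that co‐attended at least one meeting.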


A second analysis looks at the connections among the key function committees themselves. In this analysis there are 10 nodes, one for each of the key function committees with posted rosters, and the (i,j) cell counts the number of persons who attended at least one meeting of the ith key function committee and at least one meeting of the jth key function committee. This network gives an overall picture of how the key function committees are themselves integrated into the overall structure of the Consortium.

A third analysis is based on institutional co‐attendance. In this analysis, the nodes in the network are CTSA institutions and the (i,j) cell counts the number of times a representative from the first institution co‐attended a meeting with a representative from the second institution. There are 38 institutions (the first three waves of awardees) represented in the set of minutes from 2006 to 2008 and 703 institutional dyads ([38*37]/2 = 703). All pairs are directly connected in that there is at least one meeting that representatives of both institutions in the pair attended. However, there is a great deal of variation. The most strongly connected pair is 1C08 and 1C12, with a value of 127: across the 68 meetings there were 127 instances in which a pair of representatives, one from each institution, co‐attended. At the other extreme, there were a number of institutional pairs with a value of 2, that is, there were only two instances over the 68 meetings where a pair of representatives, one from each institution, co‐attended. The mean was 30.5.

A portion of the basic matrix of connection for the institutional network is found in Table 2. The diagonal cell counts the number of times at least one representative from the row institution attended one of the 68 meetings. It is possible for this number to exceed 68 if the institution had more than one person attending meetings. Similarly a count less than 68 does not mean that the row institution had a representative at that many meetings since it is possible that it had multiple attendees at one or more meetings. The off diagonal cell counts refer to the number of pairs, one person from the row institution and one from the column institution, that co‐attended a meeting. The (i,j) off diagonal cell is necessarily equal to the (j,i) off diagonal cell. To clarify what a particular cell value means concretely, we can take the lowest value in the entire matrix, namely, 2. It could come about by each institution being represented by one person at two meetings or by one of the institutions being represented by two persons and the other by one at one meeting. The count is best viewed as a measure of the potential for information flow between institutions, with larger counts indicating greater opportunity for substantive contact between the two institutions through their representatives (either in the meeting or at another time).

Table 2. Part of the institutional matrix of connections

        1C01  1C02  1C03  1C04  1C05  1C06  1C07  …
1C01    50    66    44    52    57    58    55    …
1C02    66    77    80    78    97    93    77    …
1C03    44    80    68    72    95    79    56    …
1C04    52    78    72    67    77    76    55    …
1C05    57    97    95    77    73    88    63    …
1C06    58    93    79    76    88    77    63    …
1C07    55    77    56    55    63    63    50    …
…       …     …     …     …     …     …     …     …
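The off diagonal counts described above can be sketched as a sum, over meetings, of the number of cross‐institution pairs of attendees. The sketch covers only the off diagonal cells; the data layout, the institution labels "A" and "B", and the function name are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def institution_ties(rosters):
    """Count cross-institution attendee pairs, summed over meetings.

    rosters maps a meeting label to a list of (person_id, institution)
    tuples, with each person listed once per meeting. The result is keyed
    by a sorted pair of institutions.
    """
    ties = Counter()
    for attendees in rosters.values():
        heads = Counter(inst for _, inst in attendees)    # representatives per institution
        for a, b in combinations(sorted(heads), 2):
            ties[(a, b)] += heads[a] * heads[b]           # pairs with one person from each
    return ties

# Two hypothetical ways of reaching the lowest observed value, 2.
one_each_at_two_meetings = {
    "m1": [("p1", "A"), ("p2", "B")],
    "m2": [("p1", "A"), ("p2", "B")],
}
two_and_one_at_one_meeting = {
    "m1": [("p1", "A"), ("p3", "A"), ("p2", "B")],
}
value_first = institution_ties(one_each_at_two_meetings)[("A", "B")]
value_second = institution_ties(two_and_one_at_one_meeting)[("A", "B")]
```

Both layouts yield the same cell value, which is why the count is better read as an opportunity for contact than as a number of meetings.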

Results of Analysis. The network of persons (identified by CTSA institution ID label and individual ID number) is displayed in Figure 1. (The layout is produced by an algorithm that tries to minimize the number of line crossings and place nodes that are close to each other in network distance close to each other in the xy plane.) If a person attended the meetings of only one committee, the color of the node representing that person is keyed to the color of the committee. If a person attended the meetings of more than one committee, the node representing that person is colored blue and is larger in size. Fewer than 10% of the 543 persons in the network – 35 to be exact – attended at least one meeting of more than one committee.³ The clustering in Figure 1 is clearly associated with different key function committees. The pattern occurs because few persons serve on more than one committee. Nevertheless, these few who do serve on two or more committees create enough linkages across committee clusters that the entire set of 543 persons is connected. In fact, the network itself is a very good example of a “small world” in the technical sense that there is a high degree of clustering (the associates of a person are likely to be tied to each other) but relatively short paths between any pair of persons.⁴ Ignoring the specific weight on each tie, the clustering coefficient for the person network is .71 and the average (shortest) path length between pairs of persons is 2.69. A clustering coefficient of .71 indicates that two associates of a person themselves have a tie 71% of the time – the maximum value being, of course, 100%. The shortest path length is 1 if there is a direct tie between the two individuals. In the person network, 8.4% of the dyads are connected directly (and so the density of the network is .084), 32% are connected by a path of length 2, 44% by a path of length 3, 13% by a path of length 4, and the rest by paths of length 5 or the maximum 6. That the clustering coefficient is so high is not unexpected given how the person network is constructed – by definition, every subset of three persons who attended a meeting is completely connected and so contributes positively to the clustering coefficient value. Nevertheless, despite the high degree of clustering, the average length of the shortest path between two actors is less than 3 links.

3 Specifically, 30 of the 35 attended at least one meeting of just two committees, four attended at least one meeting of three committees and one person attended at least one meeting of four committees.

4 Because individuals can maintain only a fixed number of ties, the more these ties go to persons who are already connected to each other, the fewer are the ties available to connect to other clusters. In the extreme, the network breaks into clusters with no ties between clusters and so infinite path lengths between any two persons in different clusters. This is the fascination of the “small world” phenomenon – high clustering yet somehow short paths on average.

Figure 1. Network of co‐attendance ties among academic participants in CTSA consortium key function committee meetings

In any network, some actors are more central than others. Two versions of centrality relevant here are degree centrality, which answers the question of who has the most direct connections, and betweenness centrality, which answers the question of who, given the set of shortest paths between pairs of persons, is on relatively many of these paths. Table 3 lists the IDs of the top ten participants in terms of degree centrality and betweenness centrality. Four of the ten are found in both lists: 1C10174 (ranked 1 and 4), 1C09307 (ranked 2 and 1), 1C08049 (ranked 4 and 3), and 1C02161 (ranked 7 and 10). Therefore, four persons are central figures in the network both in terms of number of links to others and in terms of being “between” others – being on many of the shortest paths connecting pairs of other actors. However, other persons in the top ten lists are either central only because they have relatively many connections or because they are “gatekeepers.” Of the top ten most central actors in terms of connections, nine are from the first cohort of CTSA centers and one from the second cohort. Among the ten most central actors in terms of being between others, six are from the first cohort, and two each from the second and third cohort. Given that individuals from the first cohort would have had more time to establish connections, one would expect them to show up on these two top ten lists. More surprising, perhaps, is that anyone from the second or especially the third cohort shows up as among the most central of actors. Useful additional analyses will be done once we collect data on important attributes of the actors, such as seniority, position, and academic discipline.

Table 3. Ten most central persons for two measures of centrality

Rank  Degree*   Betweenness**
1     1C10174   1C09307
2     1C09307   2C12044
3     2C03056   1C08049
4     1C08049   1C10174
5     1C08067   3C05212
6     1C11069   1C10274
7     1C02161   2C11040
8     1C12036   3C13190
9     1C09055   1C10015
10    1C07071   1C02161

*The count of the number of ties to other actors in the network. **The extent to which a node lies between other nodes in the network. This measure takes into account the connectivity of the node's neighbors, giving a higher value for nodes which bridge clusters.

Without the 35 persons who attended meetings of two or more committees, the Figure 1 network would break into distinct components, one per key function committee, not connected to each other. Each of the 30 persons who attended meetings of two committees effectively creates a tie between those two committees. Each of the four persons who attended meetings of three committees creates a tie between three pairs of committees, and the one person who attended meetings of four committees effectively creates a pairwise tie for six committee pairs.

Therefore the 35 persons collectively create 48 ties between pairs of committees and this network is depicted in Figure 2, where line thickness is proportional to the number of coattendees and size of node is proportional to its degree (number of direct connections to other nodes).

Figure 2. How KFCs are connected by people who attend 2 or more KFCs (nodes: Admin, BERD, Communication, Community, CRI, Education, Evaluation, Informatics, PublicPrivate, Translational)
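The count of 48 committee‐pair ties follows from the number of committee pairs each multi‐committee attendee spans. The sketch below reproduces that arithmetic from the counts given in the text and shows, on hypothetical membership data, how a committee by committee tie count could be tallied; the person IDs, their memberships, and the function name are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from math import comb

# From the text: number of persons attending exactly k committees.
persons_by_committee_count = {2: 30, 3: 4, 4: 1}

# A person on k committees links C(k, 2) committee pairs: 30*1 + 4*3 + 1*6.
total_persons = sum(persons_by_committee_count.values())
total_ties = sum(n * comb(k, 2) for k, n in persons_by_committee_count.items())

def committee_ties(membership):
    """Count shared attendees for each pair of committees.

    membership maps a person ID to the set of committees whose meetings
    that person attended; the result is keyed by a sorted committee pair.
    """
    ties = Counter()
    for committees in membership.values():
        for pair in combinations(sorted(committees), 2):
            ties[pair] += 1
    return ties

# Hypothetical memberships, for illustration only.
membership = {
    "p1": {"Admin", "Evaluation"},
    "p2": {"Admin", "Evaluation", "Communication"},
    "p3": {"BERD"},
}
example = committee_ties(membership)
```

Persons who attended only one committee, such as p3 in the example, contribute nothing to the committee network.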

Figure 2 reveals that the largest amount of coattendance at the meetings of a pair of KFCs occurs between the Administration and the Communication KFC. The next largest amount occurs between the Administration and the Evaluation KFC. Administration and Evaluation are the most central of the KFCs connecting directly to seven others. Five of the ties Administration has to other committees consist of a single person who has coattended meetings of both committees. This is true for only two of the ties of Evaluation. Therefore, while both Evaluation and Administration are central to the overall pattern of connection in Figures 1 and 2, Evaluation may be considered somewhat more central. That Evaluation and Administration should be most heavily involved in the integration of the KFC structure follows from the importance of both to the tasks of the more specialized KFCs. One final observation pertains to the Translational KFC – absent one person who has coattended at least one of its meetings and meetings of the Administration KFC, it would be disconnected from the rest of the committee structure.

Figure 3. The inter‐institutional network, all ties

Figure 3 displays the institutional network. Since it is completely connected, the picture is not especially revealing, even when the thickness of the edge connecting two institutions is proportional to the number of pairs of institutional representatives that coattended KFC meetings. One can see that, in general, thicker edges connect institutions in the first cohort (red) to institutions in the second cohort (blue) than connect either to institutions in the third cohort (yellow). The fact that the ties differ in strength means the network can be analyzed by restricting ties according to how strong they are. There are 38 institutions and therefore 703 institutional dyads (unordered pairs of institutions) ([38*37]/2 = 703).
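The dyad count and the notion of tie strength described above can be made concrete with a short sketch. The meeting records below are invented, `tie_strength` is a hypothetical helper, and summing co‐present pairs over meetings is an assumption about how strengths were accumulated rather than a documented rule.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Number of unordered institution pairs among 38 institutions.
n_dyads = comb(38, 2)  # 38 * 37 / 2


def tie_strength(meetings):
    """Sum, over meetings, the cross-institution pairs of attendees.

    Each meeting is a list of (person, institution) tuples. Two people
    from different institutions at the same meeting add 1 to the tie
    between their institutions (an assumed accumulation rule).
    """
    strength = Counter()
    for attendees in meetings:
        for (_, inst_a), (_, inst_b) in combinations(attendees, 2):
            if inst_a != inst_b:
                strength[tuple(sorted((inst_a, inst_b)))] += 1
    return strength


# Illustrative records only; not the report's data.
meetings = [
    [("a", "1C01"), ("b", "1C01"), ("c", "2C02")],
    [("a", "1C01"), ("c", "2C02"), ("d", "3C03")],
]
strength = tie_strength(meetings)
```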

Figure 4 graphs the distribution of tie strength to other institutions in an institution’s cohort (panels 1, 3, and 6 reading from top to bottom) and to institutions in other cohorts (panels 2, 4, and 5). It shows that intra‐cohort tie strength is related to cohort year: the average intra‐cohort tie strength is 70.2 for the 2006 cohort, 51.0 for the 2007 cohort, and 12.8 for the 2008 cohort. A similar pattern holds for inter‐cohort ties: the ties between 2006 and 2007 (49.5) are stronger than the ties between 2006 and 2008 (15.2) and between 2007 and 2008 (15.4). These relationships are to be expected simply because an older cohort member has had more opportunity to send representatives to meetings than a younger cohort member. However, there is a great deal of variability within the older cohorts (2006 and 2007) and thus differences in how well connected the institutions in a given cohort are.
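The cohort‐level averages reported above amount to grouping dyads by the cohorts of their two endpoints and averaging tie strength within each group. A minimal sketch follows; the strengths are invented, `mean_strength_by_cohort_pair` is a hypothetical helper, and reading the award year from the leading digit of an institution label (1 = 2006, 2 = 2007, 3 = 2008) is an assumption.

```python
from collections import defaultdict

# Assumed mapping from the leading digit of a label to the award year.
COHORT_YEAR = {"1": 2006, "2": 2007, "3": 2008}


def mean_strength_by_cohort_pair(strength):
    """Average tie strength for each unordered pair of cohorts.

    `strength` maps an (institution, institution) pair to a tie strength.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (inst_a, inst_b), value in strength.items():
        years = sorted((COHORT_YEAR[inst_a[0]], COHORT_YEAR[inst_b[0]]))
        key = (years[0], years[1])
        totals[key] += value
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Illustrative strengths only; not the report's data.
strength = {
    ("1C01", "1C02"): 80,
    ("1C01", "1C03"): 60,
    ("1C01", "2C01"): 50,
    ("1C02", "3C01"): 10,
    ("2C01", "3C01"): 20,
}
means = mean_strength_by_cohort_pair(strength)
```

Keys such as `(2006, 2006)` correspond to intra‐cohort panels and keys such as `(2006, 2007)` to inter‐cohort panels.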

[Figure 3 node labels: 1C01 through 1C12, 2C01 through 2C12, 3C01 through 3C14]

Figure 4. Distributions of tie strength by cohorts

[Figure 4 panel titles: 2006-2006, 2006-2007, 2006-2008, 2007-2007, 2007-2008, 2008-2008. Horizontal axis: Strength of Tie (0 to 100). Vertical axis: Frequencies from Cohort x to Cohort y (0 to 15).]

This network was also analyzed by examining the overall pattern of connections at different “thresholds” of tie strength: which institutions remain connected to one another if we look only at ties whose strength exceeds a given threshold. Figure 5 shows, for instance, that if we take a threshold of 66, which is the median strength of ties between institutions in the oldest cohort, the youngest cohort will no longer be connected to the other institutions. That is, the ties from the 2008 cohort to nearly all of the other institutions have a value of 66 or smaller and so are among the weaker ties in the original network depicted in Figure 3. Figure 5 demonstrates this fact: all of the institutions in the 2008 cohort and three of the institutions in the 2007 cohort are no longer connected to the rest of the institutions. It is also clear from this “strong tie” network that some institutions in both the 2006 and 2007 cohorts are more tenuously connected than others, in particular, 1C09, 2C01, and 2C11.

Figure 5. The inter‐institutional network, strength threshold=66*

*The only ties represented in the diagram are those that have a value greater than 66. All ties from the connected cluster to the isolates on the left have a strength of 66 or less.
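The thresholding used for Figure 5 can be sketched as two steps: keep only ties whose strength exceeds the threshold, then find which institutions remain connected. The example below is a minimal illustration with invented strengths; `components_above` is a hypothetical helper, and the strict inequality follows the note to Figure 5.

```python
def components_above(strength, nodes, threshold):
    """Connected components using only ties stronger than `threshold`."""
    adjacency = {node: set() for node in nodes}
    for (a, b), value in strength.items():
        if value > threshold:  # strictly greater, as in the Figure 5 note
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    components = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first search from an unvisited node collects one component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Illustrative strengths only; not the report's data.
nodes = ["1C01", "1C02", "2C01", "3C01"]
strength = {
    ("1C01", "1C02"): 90,
    ("1C01", "2C01"): 70,
    ("1C02", "2C01"): 66,
    ("1C01", "3C01"): 15,
    ("2C01", "3C01"): 12,
}
parts = components_above(strength, nodes, threshold=66)
isolates = sorted(next(iter(c)) for c in parts if len(c) == 1)
```

In this toy case the tie of strength exactly 66 is dropped, and the one institution whose ties are all weak ends up as an isolate, which is the pattern the text describes for the 2008 cohort.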

The network in Figure 5 can be analyzed to determine which institutions are most central in this network of “strong” ties. If importance is defined as the sheer number of connections, the top five most central nodes are 1C08, 1C12, 2C02, 1C02, and 1C06 tied with 1C05 for fifth. The top five in terms of betweenness are 1C08, 2C02, 1C12, 1C02, and 2C08. It is noteworthy that two institutions in the 2007 cohort appear in this top five. In general, 2006 awardees are more embedded in the institutional network than 2007 or 2008 awardees because they have had more opportunity to send representatives to meetings. To the extent that the co‐presence of institutional representatives at relatively many meetings can proxy for a greater opportunity for exchange of information and ideas, one would expect greater prospects for cross‐institutional collaborations among the institutions that remain connected in Figure 5 than among the institutions no longer connected to these or to each other.
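The two centrality measures named above can be illustrated on a toy network. The sketch below computes degree and unnormalized shortest‐path betweenness using Brandes' algorithm, which is one standard way to compute betweenness and is not necessarily the software or normalization used for the report. The node names and the helper names are hypothetical.

```python
from collections import deque


def degree_centrality(adjacency):
    """Number of direct connections of each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def betweenness_centrality(adjacency):
    """Unnormalized shortest-path betweenness (Brandes' algorithm)."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search counting shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in adjacency}
        n_paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        n_paths[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in adjacency[node]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    n_paths[neighbor] += n_paths[node]
                    predecessors[neighbor].append(node)
        # Accumulate path dependencies from the farthest nodes inward.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = n_paths[pred] / n_paths[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was counted from both endpoints.
    return {node: value / 2.0 for node, value in score.items()}


# Illustrative "strong tie" network: a star centered on B.
adjacency = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B"},
}
deg = degree_centrality(adjacency)
btw = betweenness_centrality(adjacency)
```

The two rankings need not agree in general: degree counts direct ties only, while betweenness rewards nodes that lie on shortest paths between others, which is why the two top‐five lists in the text differ.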

[Figure 5 node labels: 1C01 through 1C12, 2C01 through 2C12, 3C01 through 3C14]

Next Steps. One purpose of the preliminary analysis was to assess the difficulties with and prospects for using archival data to characterize the structure and operation of the CTSA Initiative. Consequently, relatively near to hand and accessible sources were used, namely, the minutes of the key function committees. The completeness of this set of minutes will be investigated more thoroughly. A second next step on the data collection side is to seek out and use minutes from other group meetings. The minutes of meetings of the strategic goal committees, for instance, are available at the ctsaweb.org website, and the wiki has minutes of some workgroup meetings. Since the context of these meetings is sufficiently different from the context of the key function committee meetings, data from these sources must be combined in a way that still allows disaggregation by source. A third next step on the data collection side is to acquire data on relevant background attributes of the individuals attending these various meetings and of the institutions. Such data would provide the opportunity for a more nuanced analysis of the network of persons displayed in Figure 1 and the network of institutions displayed in Figures 3 and 5. These data will be obtained through key informant interviews and surveys. Westat will also collect network data directly by asking respondents to report communication and collaborations with other members of the CTSA Initiative.

The time frame for building the baseline network needs to be assessed in terms of future analyses. One research question of interest is how the network of connections between persons and the derived network of connections between institutions change over time. That question can be addressed by comparing networks built on different time slices in the consortium’s history. Currently the baseline includes the years 2006 through 2008, in part because of fluctuation in the regularity with which meetings were held during the “start up” phase of the consortium. If meeting schedules have now been regularized, it is likely that two calendar years will yield sufficient data points to build a time 2, time 3, etc. network for comparison.
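The time‐slicing idea can be sketched as a rule that assigns each meeting to the baseline period or to a later fixed‐width window, after which a network is built separately for each slice. The example is illustrative only: the meeting records are invented, `slice_label` and `group_by_slice` are hypothetical helpers, and starting the later two‐year windows in 2009 is an assumption.

```python
from collections import defaultdict


def slice_label(year, baseline=(2006, 2008), width=2):
    """Assign a meeting year to the baseline or a later fixed-width slice."""
    start, end = baseline
    if year < start:
        raise ValueError(f"year {year} precedes the baseline period")
    if year <= end:
        return f"{start}-{end}"
    # Later slices start the year after the baseline and span `width` years.
    index = (year - end - 1) // width
    first = end + 1 + index * width
    return f"{first}-{first + width - 1}"


def group_by_slice(meetings):
    """Group (year, attendees) meeting records by time slice."""
    slices = defaultdict(list)
    for year, attendees in meetings:
        slices[slice_label(year)].append(attendees)
    return dict(slices)


# Illustrative records only; not the report's data.
meetings = [
    (2006, ["a", "b"]),
    (2008, ["a", "c"]),
    (2009, ["b", "c"]),
    (2010, ["a", "b", "c"]),
    (2011, ["a", "c"]),
]
by_slice = group_by_slice(meetings)
```

Each value in `by_slice` is a list of attendance lists from which a person network and a derived institutional network could be built for that period and compared with the baseline.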

Finally, the networks analyzed so far consider only the role of CTSA award institutions. The various personnel from the NIH who participate in the key function committee meetings have not been integrated into the overall analysis. Some consideration should be given to whether and how their participation should be incorporated and analyzed.
