Agent-Based Modeling and Simulation of Collaborative Social Networks
Research in Progress

Greg Madey, Yongqin Gao
Computer Science & Engineering, University of Notre Dame

Vincent Freeh
Computer Science, North Carolina State University

Renee Tynan, Chris Hoffman
Department of Management, University of Notre Dame

Supported in part by the National Science Foundation - Digital Society & Technology Program

AMCIS 2003, Tampa, FL, August 2003


Outline

• Definitions: agents, models, simulations, collaborative social networks, computer experiments
• Phenomenon: Free/Open Source Software (F/OSS)
• Conceptual models
  – ER model
  – BA model
  – BA model with constant fitness
  – BA model with dynamic fitness
• Experiments and results
• Summary
• Some discussion questions

Agent-Based Modeling and Simulation

• Conceptual models of a phenomenon
• Simulations are computer implementations of the conceptual models
• Agents in models and simulations are distinct entities (instantiated objects)
  – They tend to be simple, but present in large numbers (thousands or more), as in swarm intelligence
  – Contrasted with higher-level "intelligent agents"
• Foundations in complexity theory
  – Self-organization
  – Emergence
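The sketch below illustrates the agent idea above: many simple, distinct instantiated objects advanced by a scheduler. It defines a hypothetical `Agent` class with a single local rule and a `Model` that steps thousands of agents and reports a population-level observable. The class names, the activation rule, and the parameter `p_activate` are illustrative assumptions. They are not taken from the authors' Swarm implementation.

```python
import random


class Agent:
    """A simple agent: a distinct instantiated object with minimal state (hypothetical)."""

    def __init__(self, agent_id, rng):
        self.agent_id = agent_id
        self.rng = rng
        self.active = False  # hypothetical state, e.g., "currently contributing"

    def step(self, p_activate):
        # Simple local rule: be active this tick with probability p_activate.
        self.active = self.rng.random() < p_activate


class Model:
    """Holds a large population of simple agents and advances them in discrete ticks."""

    def __init__(self, n_agents, p_activate, seed=0):
        self.rng = random.Random(seed)
        self.p_activate = p_activate
        self.agents = [Agent(i, self.rng) for i in range(n_agents)]
        self.tick = 0

    def step(self):
        # Random activation order: a common ABM convention; it matters once agents interact.
        order = list(self.agents)
        self.rng.shuffle(order)
        for agent in order:
            agent.step(self.p_activate)
        self.tick += 1

    def fraction_active(self):
        # Macro-level observable aggregated from many micro-level states.
        return sum(a.active for a in self.agents) / len(self.agents)


model = Model(n_agents=10_000, p_activate=0.3, seed=42)
for _ in range(5):
    model.step()
print(model.tick, round(model.fraction_active(), 2))
```

In this toy the agents do not interact, so the observable simply stays close to `p_activate`. Self-organization and emergence appear only once each agent's rule depends on other agents' states, as in the network-growth models discussed later in the talk.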

Collaborative Social Networks

• Research-paper co-authorship, small-world phenomenon, e.g., Erdos number (Barabasi 2001, Newman 2001)
• Movie actors, small-world phenomenon, e.g., Kevin Bacon number (Watts 1999, 2003)
• Interlocking corporate directorships
• Open-source software developers (Madey et al., AMCIS 2002)
• Collaborators are nodes in a graph, and collaborative relationships are the edges of the graph
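The node/edge mapping above can be sketched in a few lines of Python. The sketch assumes each collaboration (a paper, a film, a project) is given as a list of participants. The hypothetical helper `collaboration_graph` links every pair of co-participants. A second helper, `collaboration_distance`, uses breadth-first search to compute an Erdos-number-style distance from one chosen node. The sample data are made up for illustration.

```python
from collections import deque
from itertools import combinations


def collaboration_graph(collaborations):
    """Undirected graph: collaborators are nodes; each shared collaboration links all pairs."""
    graph = {}
    for members in collaborations:
        for person in members:
            graph.setdefault(person, set())
        for a, b in combinations(members, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def collaboration_distance(graph, source):
    """Breadth-first search: number of hops from `source` to each reachable collaborator."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist  # unreachable collaborators are simply absent


# Hypothetical co-authorship lists, one list per paper.
papers = [["Erdos", "A"], ["A", "B", "C"], ["C", "D"], ["E", "F"]]
g = collaboration_graph(papers)
distances = collaboration_distance(g, "Erdos")
print(distances)
```

Here "D" ends up at distance 3, while "E" and "F" sit in a separate cluster and get no distance at all. This separation is the same cluster structure examined later for SourceForge.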

Classical Scientific Method

1. Observe the world
   a) Identify a puzzling phenomenon
2. Generate a falsifiable hypothesis (K. Popper)
3. Design and conduct an experiment with the goal of disproving the hypothesis
   a) If the experiment "fails", then the hypothesis is accepted (until replaced)
   b) If the experiment "succeeds", then reject the hypothesis; additional insight into the phenomenon may be obtained, and steps 2-3 are repeated

The Computer Experiment

Agent-Based Simulation as a Component of the Scientific Method

[Diagram: a cycle linking Modeling (Hypothesis), Agent-Based Simulation (Experiment), and Observation. Applied to F/OSS: Modeling = Social Network Model of F/OSS; Agent-Based Simulation = Grow Artificial SourceForge; Observation = Analysis of SourceForge Data.]

Open Source Software (OSS)

• Free ...
  – to view source
  – to modify
  – to share
  – of cost
• Examples
  – Apache
  – Perl
  – GNU
  – Linux
  – Sendmail
  – Python
  – KDE
  – GNOME
  – Mozilla
  – Thousands more


Free Open Source Software (F/OSS)

• Development
  – Mostly volunteer
  – Global teams
  – Virtual teams
  – Self-organized, often a peer-based meritocracy
  – Self-managed, but often with a "charismatic" leader
  – Often large numbers of developers, testers, and support helpers, with end-user participation
  – Rapid, frequent releases
  – Mostly unpaid

F/OSS Developers

• Linus Torvalds: Linux
• Larry Wall: Perl
• Richard Stallman: GNU, GNU Manifesto
• Eric Raymond: The Cathedral and the Bazaar

F/OSS: A Puzzling Phenomenon

• Contradicts traditional wisdom:
  – Software engineering
  – Coordination, large numbers
  – Motivation of developers
  – Quality
  – Security
  – Business strategy
• Almost everything is done electronically and available in digital form
• Opportunity for IS research: large amounts of online data available
• Research issues:
  – Understanding motives
  – Understanding processes
  – Intellectual property
  – Digital divide
  – Self-organization
  – Government policy
  – Impact on innovation
  – Ethics
  – Economic models
  – Cultural issues
  – International factors

SourceForge

• VA Software
• Part of OSDN
• Started 12/1999
• Collaboration tools
• 58,685 projects
• 80,000 developers
• 590,000 registered users

Savannah

• Uses SourceForge software
• Free Software Foundation
• 1,508 projects
• 15,265 registered users

F/OSS: Importance

Major component of the e-technology infrastructure, with a major presence in:
• e-Commerce
• e-Science
• e-Government
• e-Learning

• Apache has over 65% market share of Internet Web servers
• Linux runs on over 7 million computers
• Most Internet e-mail runs on Sendmail
• Tens of thousands of quality products
• Part of the product offerings of companies such as IBM and Apple
  – Apache in WebSphere, Linux on mainframes, FreeBSD in OS X
• Corporate employees participating in OSS projects

Free/Open Source Software

• Seems to challenge traditional economic assumptions
• Model for software engineering
• New business strategies
  – Cooperation with competitors
  – Beyond trade associations, shared industry research, and standards processes: shared product development!
• Virtual, self-organizing and self-managing teams
• Social issues, e.g., digital divide, international participation
• Government policy issues, e.g., US software industry, impact on innovation, security, intellectual property

Research Model

[Diagram: three linked research components, with connections labeled Parameter Values, Structural Features, Cross Validation, and Combined Data Mining:
• Understanding the Social and Task Dynamics that Predict Developer Behaviors
• Social Network Analysis: Longitudinal Study of Preferential Attachment and Dynamic Attachment
• Conceptual Explanatory Model of OSS: Agent-Based Modeling and Simulation]

Observations

• Web mining
• Web crawler (scripts)
  – Python
  – Perl
  – AWK
  – Sed
• Monthly
• Since Jan 2001
• ProjectID
• DeveloperID
• Almost 2 million records
• Relational database

Sample records:

PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev8975
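The sample records are pipe-delimited (project, developer) pairs. This is a sketch only, and it assumes the monthly dumps keep this two-column format. The code parses the pairs and counts both bipartite degrees (projects per developer and developers per project). It then projects the data onto a developer network in which two developers are linked when they share a project. The function names are hypothetical. The authors stored the records in a relational database, which this in-memory version does not reproduce.

```python
from collections import defaultdict
from itertools import combinations

# The sample rows from the slide, in the same PROJ|DEVELOPER format.
RAW = """PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev8975"""


def parse_memberships(text):
    """Return {project_id: set of developer_ids} from pipe-delimited lines."""
    lines = text.strip().splitlines()
    members = defaultdict(set)
    for line in lines[1:]:  # skip the PROJ|DEVELOPER header
        proj, dev = line.split("|")
        members[proj].add(dev)
    return members


def developer_network(members):
    """Developers are nodes; a pair is linked if they share at least one project."""
    edges = defaultdict(set)  # (dev_a, dev_b) -> set of shared project ids
    for proj, devs in members.items():
        for a, b in combinations(sorted(devs), 2):
            edges[(a, b)].add(proj)
    return edges


members = parse_memberships(RAW)

# Bipartite degrees: projects per developer and developers per project.
projects_per_dev = defaultdict(int)
for devs in members.values():
    for d in devs:
        projects_per_dev[d] += 1
devs_per_project = {p: len(devs) for p, devs in members.items()}

edges = developer_network(members)
print(sorted(edges))
print(projects_per_dev["dev8975"], devs_per_project["8007"])
```

The two bipartite degree counts correspond to the "developer distribution" and "project distribution" compared in the later results slides. Developer dev8975 appears on two projects, which is what lets project-sharing chain developers into larger clusters.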

Models of the F/OSS Social Network (Alternative Hypotheses)

• General model features
  – Agents are nodes on a graph (developers or projects)
  – Behaviors: create, join, abandon and idle
  – Edges are relationships (joint project participation)
  – Growth of network: random or types of preferential attachment, formation of clusters
  – Fitness
  – Network attributes: diameter, average degree, degree distribution, clustering coefficient
• Four specific models
  – ER (random graph) (1960)
  – BA (preferential attachment) (1999)
  – BA + constant fitness (2001)
  – BA + dynamic fitness (2003)
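One common way to express the four hypotheses is as a rule for which existing project a newly joining developer picks. The sketch below is an assumption, not the authors' code. It gives each project i a weight and then normalizes:

- ER: uniform weights. This is rendered here as uniform random attachment in a growing network, a simplification of the classic static random graph.
- BA: the project's degree k_i.
- BA with constant fitness: eta_i * k_i, where eta_i is the project's fitness.
- BA with dynamic fitness: eta_i(t) * k_i.

The slides state only that fitness generally decreases with time. The exponential form `eta_i * exp(-decay * age_i)` and the `decay` parameter are hypothetical.

```python
import math


def attachment_probabilities(degrees, model, fitness=None, ages=None, decay=0.1):
    """Probability that a newly joining developer picks each existing project.

    degrees: current number of developers on each project (k_i, assumed >= 1)
    model:   "ER", "BA", "BA+fitness", or "BA+dynamic"
    fitness: constant per-project fitness eta_i (fitness models only)
    ages:    project ages in time steps ("BA+dynamic" only)
    decay:   hypothetical exponential decay rate of dynamic fitness
    """
    if model in ("BA+fitness", "BA+dynamic") and fitness is None:
        raise ValueError("fitness values are required for the fitness models")
    if model == "BA+dynamic" and ages is None:
        raise ValueError("project ages are required for the dynamic fitness model")

    if model == "ER":
        weights = [1.0] * len(degrees)  # every project equally likely
    elif model == "BA":
        weights = [float(k) for k in degrees]  # proportional to degree k_i
    elif model == "BA+fitness":
        weights = [eta * k for eta, k in zip(fitness, degrees)]  # eta_i * k_i
    elif model == "BA+dynamic":
        # Hypothetical life-cycle form: eta_i(t) = eta_i * exp(-decay * age_i)
        weights = [eta * math.exp(-decay * age) * k
                   for eta, age, k in zip(fitness, ages, degrees)]
    else:
        raise ValueError(f"unknown model: {model}")

    total = sum(weights)
    return [w / total for w in weights]


degrees = [1, 2, 7]     # hypothetical project sizes
eta = [3.0, 1.0, 0.5]   # hypothetical fitness values
ages = [0, 10, 30]      # hypothetical project ages

er = attachment_probabilities(degrees, "ER")
ba = attachment_probabilities(degrees, "BA")
fit = attachment_probabilities(degrees, "BA+fitness", fitness=eta)
dyn = attachment_probabilities(degrees, "BA+dynamic", fitness=eta, ages=ages)
for name, probs in [("ER", er), ("BA", ba), ("BA+fitness", fit), ("BA+dynamic", dyn)]:
    print(name, [round(p, 3) for p in probs])
```

Under the dynamic rule, an old project needs a much larger developer base than a new one to attract the same share of newcomers. This is the project life-cycle effect that motivates the fourth model.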

[Figure: F/OSS Developers - Collaboration Social Network. Developers are nodes; projects are links. The sample shows 24 developers and 5 projects (6882, 7028, 7597, 9859, 15850), with 2 linchpin developers and 1 cluster.]

Computer Experiments

• Agent-based simulations
• Java programs using the Swarm class library
  – Validation (docking) exercises using Java/Repast
• Grow artificial SourceForges (Epstein & Axtell, 1996)
  – Parameterized with observed data, e.g., developer behaviors
    • Join rates
    • New project additions
    • Leaving projects
  – Evaluation of four models (hypotheses)
  – Verification/validation
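The growth procedure can be sketched as a loop: one developer arrives per time step, and existing developers then create, join, abandon, or idle. The sketch is written in Python rather than the authors' Java/Swarm. It chooses projects in proportion to their size (BA-style). Its rates `p_create`, `p_extra_join`, and `p_leave` are illustrative placeholders. In the study these rates were parameterized from observed SourceForge data, which are not reproduced here.

```python
import random
from collections import defaultdict


def grow_artificial_sourceforge(steps, p_create=0.2, p_extra_join=0.05,
                                p_leave=0.02, seed=7):
    """Grow a hypothetical developer-project network; all rates are placeholders."""
    rng = random.Random(seed)
    members = defaultdict(set)       # project id -> ids of its developers
    dev_projects = defaultdict(set)  # developer id -> ids of its projects
    next_project = 0

    def pick_project():
        # BA-style choice: probability proportional to current project size.
        projects = [p for p, devs in members.items() if devs]
        if not projects:
            return None
        weights = [len(members[p]) for p in projects]
        return rng.choices(projects, weights=weights, k=1)[0]

    def join_or_create(dev):
        nonlocal next_project
        target = None if rng.random() < p_create else pick_project()
        if target is None:  # create behavior (also used when nothing can be joined)
            target = next_project
            next_project += 1
        # Re-joining a project the developer is already on leaves the sets unchanged.
        members[target].add(dev)
        dev_projects[dev].add(target)

    for new_dev in range(steps):    # one new developer arrives per step
        join_or_create(new_dev)
        for dev in range(new_dev):  # existing developers then act
            r = rng.random()
            if r < p_leave:
                if dev_projects[dev]:  # abandon behavior
                    proj = rng.choice(sorted(dev_projects[dev]))
                    dev_projects[dev].discard(proj)
                    members[proj].discard(dev)
            elif r < p_leave + p_extra_join:
                join_or_create(dev)    # join (or create) another project
            # any other draw: idle behavior
    return members, dev_projects


members, dev_projects = grow_artificial_sourceforge(steps=300)
sizes = sorted((len(devs) for devs in members.values()), reverse=True)
print("projects created:", len(members), "largest sizes:", sizes[:5])
```

Swapping `pick_project` for a uniform choice, or for a fitness-weighted one, gives the other hypotheses. The same network measures can then be compared against the empirical data for each.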

Four Cycles of Modeling & Simulation

[Diagram: the Modeling (Hypothesis), Agent-Based Simulation (Experiment), Observation cycle, repeated for four social network models: ER => BA => BA + Fitness => BA + Dynamic Fitness. The experiment grows an artificial SourceForge; observation is analysis of SourceForge data. Compared measures: degree distribution, average degree, diameter, clustering coefficient, cluster size distribution.]
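The simulated and empirical networks are compared on degree distribution, average degree, diameter, clustering coefficient, and cluster size distribution. The sketch below computes these measures in plain Python on an undirected adjacency dictionary. The conventions are assumptions, since the slides do not say which ones the authors used:

- Nodes with fewer than two neighbors contribute 0 to the average clustering coefficient.
- Clusters are connected components.
- The diameter is the longest shortest path within any component.

```python
from collections import Counter, deque


def bfs_distances(graph, source):
    """Hop distances from `source` to every node in its connected component."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_distribution(graph):
    """Map degree k -> number of nodes with degree k."""
    return Counter(len(nbrs) for nbrs in graph.values())


def average_degree(graph):
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def clustering_coefficient(graph):
    """Average local clustering; nodes with < 2 neighbors count as 0 (one convention)."""
    total = 0.0
    for u, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count linked pairs among u's neighbors (each pair once, via v < w).
        links = sum(1 for v in nbrs for w in nbrs if v < w and w in graph[v])
        total += links / (k * (k - 1) / 2)
    return total / len(graph)


def clusters(graph):
    """Connected components, found by repeated breadth-first search."""
    seen, comps = set(), []
    for node in graph:
        if node not in seen:
            comp = set(bfs_distances(graph, node))
            seen |= comp
            comps.append(comp)
    return comps


def cluster_size_distribution(graph):
    return Counter(len(c) for c in clusters(graph))


def diameter(graph):
    """Longest shortest path within any component (exact; O(n * m))."""
    return max(max(bfs_distances(graph, s).values()) for s in graph)


# Toy network: a triangle 0-1-2 with a pendant node 3, plus a separate pair 4-5.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: {5}, 5: {4}}
print(degree_distribution(graph), round(average_degree(graph), 3))
print(round(clustering_coefficient(graph), 4), cluster_size_distribution(graph), diameter(graph))
```

For networks the size of SourceForge, an exact all-pairs diameter is expensive. Sampling source nodes or restricting the computation to the largest cluster are common shortcuts.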

ER model – degree distribution

• The degree distribution is binomial, while it follows a power law in the empirical data
• Fit fails
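The ER degree distribution cannot match a power law. In the classic G(n, p) random graph, each node's degree is binomial: P(k) = C(n-1, k) p^k (1-p)^(n-1-k). The tail of this distribution falls off faster than any power law. The sketch below builds a static G(n, p) graph and compares observed degree counts with the binomial expectation. This is a simplification, since the study grew its networks over time. The sizes `n` and `p` are arbitrary choices.

```python
import math
import random
from collections import Counter


def er_graph(n, p, seed=0):
    """Classic G(n, p): each of the n(n-1)/2 node pairs is linked independently with prob. p."""
    rng = random.Random(seed)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def binomial_pmf(k, trials, p):
    return math.comb(trials, k) * p**k * (1 - p) ** (trials - k)


n, p = 1000, 0.01
g = er_graph(n, p, seed=3)
observed = Counter(len(nbrs) for nbrs in g.values())
for k in range(0, 21, 5):
    expected = n * binomial_pmf(k, n - 1, p)  # expected number of nodes with degree k
    print(k, observed.get(k, 0), round(expected, 1))
```

The counts concentrate around the mean degree p(n-1), about 10, and virtually no node has a degree several times larger. By contrast, the empirical SourceForge degree distribution has a heavy, power-law tail.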

ER model – diameter

• Average degree is decreasing, while it is increasing in the empirical data
• Diameter is increasing, while it is decreasing in the empirical data
• Fit fails

ER model – clustering coefficient

• The clustering coefficient is relatively low, around 0.4, while it is around 0.7 in the empirical data
• The clustering coefficient is decreasing, while it is increasing in the empirical data
• Fit fails

ER model – cluster distribution

• The cluster distribution in the ER model also follows a power law, with R² of 0.6667 (0.9953 without the major cluster), while R² in the empirical data is 0.7457 (0.9797 without the major cluster)
• The actual distribution is different from the empirical data
• The later models (BA and its extensions) behave similarly
• Fit fails
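The R² values on these slides summarize how well a straight line fits a distribution on log-log axes. The sketch below assumes that is how they were computed, since the slides do not give the binning or fitting details. It regresses log count on log size and reports the slope and R². A helper drops the single largest cluster, for the "without the major cluster" variant. A high log-log R² is a quick diagnostic rather than a rigorous test for a power law. The sample cluster sizes are hypothetical.

```python
import math


def power_law_r2(distribution):
    """Least-squares line through (log x, log count); returns (slope, R^2).

    distribution: mapping x (degree or cluster size) -> count. Entries with
    non-positive x or count are skipped. Assumes at least two distinct x values
    and counts that are not all equal.
    """
    points = [(math.log10(x), math.log10(c))
              for x, c in distribution.items() if x > 0 and c > 0]
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy)  # squared correlation for simple linear regression
    return slope, r2


def without_largest(distribution):
    """Drop one instance of the largest cluster (the 'major cluster')."""
    largest = max(distribution)
    trimmed = dict(distribution)
    trimmed[largest] -= 1
    if trimmed[largest] == 0:
        del trimmed[largest]
    return trimmed


# An exact power law, count = 1000 * x^-2, gives slope -2 and R^2 = 1.
exact = {x: 1000 * x ** -2 for x in range(1, 11)}
print(power_law_r2(exact))

# Hypothetical cluster-size counts with one giant cluster of 2000 nodes.
cluster_sizes = {1: 500, 2: 120, 3: 55, 4: 30, 5: 20, 2000: 1}
print(power_law_r2(cluster_sizes))
print(power_law_r2(without_largest(cluster_sizes)))
```

Because the giant cluster is a single far-off point on the log-log plot, removing it can change R² substantially. This is consistent with the large jump between the "with" and "without the major cluster" values reported above.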

BA model – degree distribution

• Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data)
• For developer distribution: simulated data has R² of 0.9798 and empirical data has R² of 0.9712
  – Fit succeeds
• For project distribution: simulated data has R² of 0.6650 and empirical data has R² of 0.9815
  – Fit fails

BA model – diameter and CC

• Small diameter and high clustering coefficient, like the empirical data
• Diameter and clustering coefficient are both decreasing, like the empirical data
• Fit succeeds

BA model with constant fitness

• Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data)
• For developer distribution: simulated data has R² of 0.9742 and empirical data has R² of 0.9712
  – Fit succeeds
• For project distribution: simulated data has R² of 0.7253 and empirical data has R² of 0.9815
  – Fit fails
• Diameter and CC are similar to the simple BA model
  – Fit succeeds

Discovery: BA with dynamic fitness

• Problem with BA with constant fitness
  – Intuition: project fitness might change with time
• Data mining observation: projects have a "life cycle" property; fitness generally decreases with time
• New model not in the literature
  – Hypothesis: BA with dynamic fitness of projects
  – Computer experiment

BA model with dynamic fitness

• Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data)
• For developer distribution: simulated data has R² of 0.9695 and empirical data has R² of 0.9712
  – Fit succeeds (as before)
• For project distribution: simulated data has R² of 0.8051 and empirical data has R² of 0.9815
  – Fit is better, but more work is needed

Agent-Based Modeling and Simulation as Components of the Scientific Method

[Diagram: cycle of Observation, Hypothesis, and Experiment.]

Summary

• Why agent-based modeling and simulation?
  – Can be used as components of the scientific method
  – A research approach for studying socio-technical systems
• Case study: F/OSS collaboration social networks
  – SourceForge conceptual models: ER, BA, BA with constant fitness, and BA with dynamic fitness
  – Simulations
    • Computer experiments that tested the conceptual models
    • Provided insight into the phenomenon under study and guided data mining of the collected observations

Discussion

• "The social sciences are, in fact, the 'hard' sciences", Herbert Simon (1987)
• Computational social science: agent-based modeling and simulation
• Kuhn's periods of "normal science" punctuated by "paradigm shifts"
• Karl Popper's "theory-testing through falsification"
• Relevant literature on the role of simulation in the process of scientific discovery

Thank you