howell and egley, 2005 - governor's office of crime control

22
IRIS27 Eija Halkola: Scientific collaboration and information infrastructures – Information practices in LIDET-experiment Scientific collaboration and information infrastructures – Information practices in LIDET- experiment Eija Halkola Department of Information Processing Sciences, University of Oulu, P.O. Box 3000, FIN-90014 University of Oulu, Finland [email protected] Abstract: This paper studies scientific collaborations and collaborative infrastructures through focusing on everyday information practices of data management of the distributed ecological Long-Term Intersite Decomposition Team (LIDET)-experiment. LIDET, being part of the Long Term Ecological Research (LTER) network, has placed an emphasis on long-term preservation and sharing of data. The research is empirically grounded and applies qualitative research approaches and a socio-technical perspective aiming to produce descriptive understandings of the aspects of information practices including planning, implementation and development of information infrastructures in the LIDET-case study. Collaborative preplanning and centralized data management created a

Upload: others

Post on 10-Feb-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

Scientific collaboration and informationinfrastructures – Information practices in LIDET-

experiment

Eija Halkola

Department of Information Processing Sciences, University of Oulu, P.O. Box3000, FIN-90014 University of Oulu, Finland

[email protected]

Abstract: This paper studies scientific collaborations and collaborativeinfrastructures through focusing on everyday information practices of datamanagement of the distributed ecological Long-Term Intersite Decomposition Team(LIDET)-experiment. LIDET, being part of the Long Term Ecological Research(LTER) network, has placed an emphasis on long-term preservation and sharing ofdata. The research is empirically grounded and applies qualitative researchapproaches and a socio-technical perspective aiming to produce descriptiveunderstandings of the aspects of information practices including planning,implementation and development of information infrastructures in the LIDET-casestudy. Collaborative preplanning and centralized data management created a

Page 2: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

sustainable information infrastructure. The infrastructure was also able to adjust tochanges in working practices, an increase in the number of collaborators anddevelopments in technology. Empirical material used in this case study is collectedthrough ethnographic fieldwork in the Long-Term Ecological Research (LTER)network during the year 2002.

Keywords: collaborative infrastructure; information practices, use and long termpreservation of data; metadata; distributed scientific experiments; organization;coordination; practical experience

Page 3: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

1. Introduction

The research relates to the context and studies of scientific collaborations(Karasti et al., 2004) and collaborative infrastructures (Star & Bowker, 2002;Star & Ruhleder, 1996; Baker et al. 2002) by studying the information practicesof a networked ecological experiment and its infrastructural support forcollaboration. In this case study of the LIDET-experiment, in collaboration withLong Term Ecological Research (LTER) network, the emphasis is on theproblematics of distributed scientific collaboration and long-term informationmanagement. Research is empirically grounded and applies a socio-technicalperspective (Kling 1999) aiming to produce descriptive understandings of theelements of information practices and collaboration infrastructures of the caseorganization. The main information practices in LIDET-experiment relate togathering and storing environmental data, creating metadata and designing andmaintaining databases.

The focus in this case study has been on the Community of Practice (CoP) (Lave& Wenger, 1991) that formed the research coordination and data managementgroup called “Litterbag Central” at the H.J. Andrews site, Oregon StateUniversity (OSU). Communities of practice are groups of people who sharesimilar goals and interests. In pursuit of these goals and interests, they employcommon practices, work with the same tools and express themselves in acommon language. The CoP, data management group of LIDET consisted of theinformation manager of the coordinating site, the principal investigator, tworesearch assistants, computer support employees and temporary assistants. Forinformation manager, the data manager group consisting of a representativefrom each LTER site, also provided a forum in its annual meetings for cross-siteconversations, collaborative project development and joint network technologystrategizing (Karasti & Baker 2004).

In LIDET, as in the LTER context in general, the information managers havehad a fundamental role in both developing the technological infrastructure(Baker et al. 2000) and in providing ongoing support for science and data care(Karasti & Baker 2004). Ecologists of the LIDET-team as well as other LTERscientists, responsible for collecting and interpreting data, benefit from workingin partnership with data and information managers, to ensure optimal data andinformation availability (Baker et al., 2000).

Recent advances in communication technologies and infrastructure havebrought new possibilities for scientific collaboration and networking. Science ingeneral increasingly depends on geographically distributed collaboration.(Finholt 2002.) Knowledge is constructed jointly at a distance bymultidisciplinary teams supported by electronic communication andinformation infrastructures (Kanfer et al., 2000). There exists a discrepancy

Page 4: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

between the potential to understand the contextual knowledge practices andemploy the situated knowledge of scientific collaborations, and the limited waysin which the current information infrastructures can support these kinds ofknowledge practices in the context of extended collaborations (Kanfer et al.2000).

“Embedded” authentic knowledge of actors is entrenched within specificCommunities of Practice (Lave and Wenger, 1991). Through analysing theempirical research material of this case study by applying a socio-technicalapproach (Kling 1999) and qualitative analysis methods, it is possible to studythe data management of the LIDET-experiment embedded within planning,organizing and coordinating the ecological research. Some of the situatedknowledge related to everyday data management practices of LIDET are alsodescribed and discussed. Focusing on everyday information practices ismotivated by profound theoretical notions of working practices (Karasti 2001,cf. Karasti and Baker 2004). Information practices in this case study contextrelate to practices of information gathering, archiving, retrieval, use, andsharing.

In this study information infrastructures are seen as socio-technical constructs(Star & Ruhleder 1996, Star & Bowker 2002) that operate in specific social andorganizational settings. This paper focuses upon the notions of infrastructuresuggested by Star and Ruhleder (1996), which are used as a framework todescribe the information practices in creating infrastructure for scientificcollaboration. Star and Ruhleder(1996) see infrastructure as part of humanorganization. In addition Star and Bowker (2002) define infrastructure as beingrelative to working conditions and argue that it never stands apart from thepeople who design, maintain and use it. By foregrounding the backgroundelements of information practices of the case, this case study performs Bowker’s(1994) concept of infrastructural inversion.

As background information this paper describes the context of the targetorganisation and explains research materials and methods used in this casestudy. The elements of data management of the long-term ecological LIDET-experiment are described including the intertwined preplanning of research anddata management and also data management during the long-term research.

This paper also presents the characteristic aspects, distributedness and long-term perspective of the LIDET-experiment, which strongly defined the datamanagement practices and challenges addressed to its informationmanagement. The centralized research and data management as well as theadaptation and modification of research and data management practices arepresented as the main solutions to these data management challenges.

Page 5: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

2. Background

Complex issues (e.g., global change, sustainability, biodiversity, and emergingdiseases) require interdisciplinary collaboration and synthesis over muchbroader spatial and longer temporal scales (Levin, 1992; Kareiva & Anderson,1988).

The Long-Term Ecological Research (LTER) program was initiated in 1980 bythe National Science Foundation (NSF). The central, organizing intellectual aimof the LTER program is to understand long-term patterns and processes ofecological systems at multiple spatial scales (Hobbie et al., 2003). Today LTERis a federation of twenty-four independent research sites, one network office sitein the United States (Hobbie et al., 2003) and in addition a developingInternational LTER (Gosz, 1999). The LTER Network initial vision (Franklin etal., 1990) and continuing mission (Hobbie et al., 2003) include the concept ofdata management, requiring it to be part of each research site’s science plan.Since the mid 1990’s, each site is required to have data available on the Internettwo years after its collection. In information management LTER is faced withthe specific challenge of how to maintain datasets over the long-term (Micheneret al., 2000) which are useable and understandable after decades, possibly evenafter centuries (Baker et al., 2002).

2.1 Empirical background

LIDET-experiment (http://www.fsl.orst.edu/lter/research/intersite/lidet.htm),in collaboration with LTER network, was a 10-year, inter-site research to testthe effect of substrate quality and macroclimate on long-term decompositionand nutrient dynamics of fine litter. The LIDET decomposition experimentshave been installed at distributed sites that span a wide array of ecosystemsincluding the 17 LTER sites and 4 non-LTER sites. Seven additional sites wereadded in 1991 bringing the total to 28 participating sites. Team experiments,such as LIDET, designed and set up as an intersite experiment and carried outsimultaneously at many sites may help us to understand ecosystem behaviorover greater temporal and spatial scales. (Long-Term Intersite DecompositionTeam, 1995.) The LIDET-experiment was coordinated inside the LTER-networkand benefited from the existing technical infrastructure developed within theLTER-network.

The research material flow was centrally organised and coordinated in thelaboratory of the H.J. Andrews LTER site in order to minimize the fieldcollaborators’ responsibilities. The Data management group from the organisingsite was responsible for chemical analysis, data management and preliminarydata analysis (Long-Term Intersite Decomposition Team, 1995). During theexperiment, data management personnel became experts of long-term research

Page 6: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

and were asked for advice related to research supplies and arrangements.Samples were weighed, handled and analyzed in one laboratory withstandardized methods, thus assuring the comparability of the results ofchemical analyses. Standardized methods made the results comparable so thatindividual sites were able to place their site-specific research results in a largercontext and benefit from the research approach. The individual sites alsobenefited from general access to the Near Infrared Chromatograph (NIR)analytical method.

Data and credit for the LIDET-experiment were shared according to agreedguidelines. Each site annually received a current, proofed copy of the data fromthe site. Site investigators had one year in which to prepare site-specificmanuscripts, to be published under individual names. After one year, the databecame available for intersite syntheses to be published under joint LIDETauthorship and for model parametrization and testing. (Long-Term IntersiteDecomposition Team, 1995.)

The interview data outlined the importance of data management preplanning ofthe long-term preservation of data. Preplanning of research procedures anddata management of the LIDET-experiment included preplanning of everydayworking practices as well as annual preplanning. A member of the LIDETpersonnel describes the preplanning and inter-site nature of the experiment:

“The difference with LIDET was that LIDET was a preplanned … the wholething was designed to do an inter-site work… It was planned out to do thingswith exactly the same methods across multiple sites, and doing it with arandom statistical design so they could actually have statistical hypothesis andresults.” (LIDET personnel)

Collaborative planning of data management and technology development wasclosely intertwined with planning, organisation and coordination of researchprocedures and information practices, aiming to provide support for science.The contribution of the experienced information management to planning wasto ensure the preservation of the collected data in long-term databases. Carefulplanning also saved resources in the data-handling phase.

Two research assistants were mainly responsible for the everyday datamanagement of the LIDET-experiment. Study packages as well as instructionsfor sample collection were prepared and sent to all field collaborators annually.Instead of sending sites only detailed instructions of how to implement theexperiment, research assistants sent very carefully designed and pre-plannedresearch packages, ensuring the unity of the used methods and the carrying outof the experiments. Research sample numbers were generated with the LIDET-program from the LIDET-database and associated with the sample data sheetsand labels.

Page 7: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

Each collaborating site received ten types of leaf and root litter and also woodendowels. Litterbags were installed once at all 28 collaborating sites and collectedannually during the experiment. Collaborators at the individual sites wereresponsible for collecting the litter samples, oven-drying, weighing and sendingsamples to OSU for chemical analysis, data entry and long-term storage. (Long-Term Intersite Decomposition Team, 1995.)

Data management during the research procedure was time consuming since itrequired a lot of organization and coordination of the research material flow.Research assistants received the collected, oven-dried and weighed sampleswith datasheets from sites. All received samples were tagged, weighed and datawas checked before being sent back to sites for verification.

LIDET- computer software program was developed for managing all researchdata. Data management personnel were responsible for entering data in theLIDET-database and also for quality control and assurance of data. Controlledidentification numbering of each sample benefited in the detecting of anomaliesor unexpected measurements and quality control. The LIDET database waseventually stored in the OSU Forest Science Data Bank (FSDB), developed andmaintained by the H.J. Andrews site to store the LTER and other ecological datasets (Stafford et al., 1988).

2.2 Theoretical background

Studying infrastructures in the LIDET-experiment context means a focus oninformation practices in data- and information management work and indeveloping and maintaining infrastructures for scientific collaboration. Manyaspects of infrastructure are “singularly unexciting” suggesting that the study ofinfrastructure work is “studying boring things” (cf. Star, 1999). In the LIDETcase this includes investigating the gradual process of the modification ofworking practices and maintenance of infrastructures (Star & Bowker 2002), forinstance the sewing and packing of sample bags.

Star and Ruhleder’s (1996) notion of infrastructure is grounded in a perspectivesuggested by Bowker and Star (1999) where information infrastructures areperceived as performative, that is, partially creating the world they subtend andthe tradition of science and technology studies. According to Star and Ruhleder(1996) infrastructure is a fundamentally relational concept; infrastructurebecomes real in relation to organized practices. Star and Ruhleder (1996) definethe qualities and salient features of infrastructure:

• Embeddedness. Infrastructures are sunk into, inside of, other structures,social arrangements and technologies.

Page 8: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

• Transparency. Infrastructure is transparent to use, in the sense that it doesnot have to be reinvented each time or assembled for each task, but invisiblysupports those tasks.

• Reach or scope. This may be either spatial or temporal: infrastructure hasreach beyond a single event or one-site practice.

• Learned as part of membership. The taken-for-grantedness of artifacts andorganisational arrangements is learned as part of membership in acommunity of practice (Lave and Wenger, 1992).

• Links with conventions of practice. Infrastructure both shapes and is shapedby the conventions of a community of practice.

• Embodiment of standards. Modified by scope and often by conflictingconventions, infrastructure takes on transparency by plugging into otherinfrastructures and tools in a standardized fashion.

• Builds on an installed base. Infrastructure does not grow de novo; it wrestleswith the “inertia of the installed base”, and inherits strengths and limitationsfrom that base.

• Becomes visible upon breakdown. The normally invisible infrastructurebecomes visible when it breaks.

This paper focuses on the information practices of distributed and long-termecological research and its infrastructural support for collaboration, includingthe development and maintenance of information infrastructures.

2.3 Research method

The Field of Science and Technology Studies applies ethnomethodologicalstudies to provide understanding of work practices in scientific work at theconcrete level (e.g. Star & Ruhleder 1996; Bowker et al. 1997). This studyfocuses on scientific collaboration and information practices to producedescriptive understandings of the elements of information practices andcollaboration infrastructure of the case.

This research analyses the ethnographic materials collected by professor HelenaKarasti within LTER network (http://www.lternet.edu/) in a BioDiversity andEcoInformatics (BDEI) project in 2002. The main empirical material foranalyses consisted of interview data of LIDET participants. In addition, researchmaterial included scientific articles of the LTER-network and related literatureof information management, extensive LTER Internet pages and digitalpictures. In the qualitative content analysis of interview materials my aim hasbeen to capture different views of the experiment from within the community.By applying a socio-technical approach and qualitative content analysis this casestudy also outlines the social aspects in the analysis of the reserach material.

Page 9: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

Qualitative content analysis, being most usable for analysis of textual data(Catanzaro 1988, Krippendorff 1981), was carried out as descriptive inductivedata analysis of the research data, where created codes were based on theresearch material (Catanzaro 1988). Content analysis is based on interpretationand induction in a process which proceeds from the empirical material to amore conceptualized analysis (Tuomi and Sarajärvi 2002).

The main categories based on the content analysis were related to informationand data management during different phases of research, and to categoriesdescribing distributedness and long-term perspective of the LIDET experiment.The notions of infrastructure (Star & Ruhleder 1996) were related to theinformation practices of LIDET-experiment in order to explore the elements oflong-term collaboration infrastructure and also increase sensitivity towardssocial issues of networked ecological research and its infrastructural support.

The qualitative analysis of interview material was started individually by theauthor during year 2003. The excerpts from the interview materials have alsobeen discussed in collaborative data analysis sessions. The excerpts ofinterviews in this paper are marked with acronyms.

3. Long-term and distributed scientificcollaboration

The LIDET- experiment was a widely distributed, centrally coordinated andloosely coupled, 10 year ecological research experiment where collaboration wasorganized indirectly through the research material specimens and data. Thecharacteristic aspects of LIDET scientific collaboration, i.e. distributedness andlong-term perspective, define its central information practices and collaborationinfrastructures. These characteristic aspects of the LIDET-experimentaddressed challenges to data management and also to the maintenance of theinfrastructure.

3.1 Long-term perspective

Long-term science is concerned with the research need to collect and archivedata over long time periods (Karasti & Baker 2004). The interview materialemphasized that planning the data management for the long-term anddistributed research collaboration must be based on the preplanning of theexperimental design and also anticipation of possible problems during the long-term in order to enable the successful carrying out of the long-term experiment.

Designing infrastructures and making decisions about new technologiesaddressed a concern of how to minimize the risks for data archival for long-termdata management and science support. New pieces of technology in LTER are

Page 10: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

evaluated against their value for ecological research. Before being embeddedinto existing infrastructures, new technologies also have to be aligned withexisting information practices and infrastructure. (Karasti & Baker 2004.)

3.1.1. Work practice development

The need for improvements in working practices became obvious in differentphases of the long-term LIDET-experiment. According to Star and Ruhleder(1996) infrastructure is shaped by developing work practices and changingconventions of a community of practice. In the centrally managed LIDET-experiment embedded knowledge of data management personnel could beapplied to modifying the infrastructure in order to support scientificcollaboration. During the experiment artifacts were developed to make theresearch preparations and coordination more flexible. The developed researchartifact, named as a pneumatic stapler, allows assistants to close a sample bagwith three staples at a time, making the sample bag preparations faster. Todescribe the everyday innovation, a member of the LIDET personnel tells “anartifact story” of the pneumatic stapler developed especially for this experiment:

“OK, LIDET has made, has made us the litter decomposition experts soeverybody is contacting us all the time for stuff and they’re always wanting toborrow… this multi-head stapler! For it’s pneumatically operated, … Becausenobody sells anything like this and so people email us saying how frustratedthey are trying to find something that could do this and could they please,please borrow yours, and so this spends a lot of time travelling. It just cameback from Yellowstone and tomorrow it’s going to British Columbia.”(LBC)

Modifications to working practices were also needed because of un-anticipatedfield situations during sample collecting. Sample bags of the same set were sewnon a string and each string was marked with an identification number. Thiseveryday work practice innovation of tying sample bags together in order tominimize the risk of sample bags getting mixed was based on situatedknowledge of data management personnel. This embedded feature of theinfrastructure (Star & Ruhleder 1996) became visible during special fieldcircumstances. A member of the LIDET personnel describes research situationsin the field:

“… when a fieldworker went out into the field and say an elk had run throughand got stuck on the string and dragged it all around … they [field workers]could go and dig around and at least find the tag number so that they wouldknow that they were collecting the right string ...“ (LBC)

Adapting the experiment arrangements and developing the sample collectingtechnique made the collaboration in the experiment easier and also assured thesuccess of the experiment.

Page 11: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

Data management personnel also created a check list to help the samplecollection and data management of the experiment. The check list was used forrecording the process of printing the labels and labeling the sample bags andalso was needed in order to prevent the losses of research materials. A memberof the LIDET personnel describes it this way:

“The next person coming in didn’t even know the study was going on, doesn’tknow the litter bags are, doesn’t know what to do with them … so we ended upcreating a check list so that every year we could record the process of printingthe labels, labelling the bags. “ (LBC)

3.1.2. Management of infrastructure changes

Both social and technological infrastructure had to preserve its functionalityduring a long-term experiment. According to Star and Ruhleder (1996) thenormally invisible quality of the working infrastructure becomes visible when itbreaks. In LIDET-case infrastructure breaks were caused by technical and socialchanges. At a social level Star and Bowker (2002) see that flexibility formodifications to information infrastructures enables responding to emergentsocial needs. A good information infrastructure is stable enough to allowinformation to be able to persist in time and also be designed for flexibility(Bowker & Star 2002).

The social infrastructure was subjected to changes during a long-termexperiment. Changes in research personnel or use of temporary staff causesbreaks in social infrastructure, endangering the continuity of data collection andarchiving. As an example of a social infrastructure break, a member of theLIDET personnel describes the difficulty faced in delivering a research package:

“… it had happened that the bag had arrived and the person in the front officedidn’t know who to give it to because the person we addressed it to was nolonger there... “ (LBC)

In the case of LIDET these personnel changes emphasized the importance ofinforming collaborators. Annual instructions were sent to field collaboratorsalong with study packages and they were also informed of different stages andchanges in research practices during the experiment.

Developing technologies were aligned with existing technologies and practicesduring the LIDET-experiment. Despite technological development, oldtechnologies need to be maintained long enough to be able to carry out theexperiment. The most notable technological change during the LIDET-experiment was the update of the used computer operating system from a DOS-based to windows-based system.

Programs used in LIDET information practices were designed in the DOS-operating system, which caused problems with data management. LIDET was aten-year experiment and even during that time problems occurred related to the

Page 12: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

fact that the computer support personnel did not have experience of the earlierDOS-based systems. The change of operating system caused incompatibilityproblems between originally installed programmes and the new operatingsystem. Normal working practices like generating labels and data sheets failedbecause the original programs did not function in the new operating system.Knowledge of earlier information practices and infrastructure preserved amongcomputer support staff was important when implementing new technologies.Computer support personnel had to come up with solutions to make systemscompatible. A member of the LIDET personnel describes the complicatedsituation related to implementation of new technologies in this way:

“… we would have to go and talk to our computer support people and theywould have to reach back into their memories … to try and think back and tryand think of strange little arcane ways of making the … computer handle thesepassé mechanisms… go to a printer capture queue or something … “ (LBC)

In LIDET, technological infrastructure such as old program versions as well asold computers and printers had to be retained in a central laboratory during theexperiment to avoid problems in working practices. A new operating systemcould not be directly applied because conventions of practice were built on aninstalled base (Star & Ruhleder 1996) of a previous generation of technology.

Despite the preplanning, changes were required to be made to the LIDETdatabase schema during the experiment. The original database schema wasplanned according to the needs of 21 sites. When seven additional sites joined,the number of collected samples increased causing overlapping of sampleidentifying numbers. The solution was to change the 4-digit identifying numberto a 5-digit number in the database schema. A member of the LIDET personnelpoints out the effort and contribution that was needed because of the tagnumber change in the database:

“… some of the tags had to be changed, and then the new tag numberspresented different problems with being able to merge the site informationwith the lab information … it just makes for a very, very tedious effort to pulleverything back together…“ (LIDET personnel)

This change of the identifying number during the LIDET-experiment alsorequired changes to the metadata descriptions. Another example was when anadditional comment field for pooled samples and a flag field for outliers relatedto identifying a tag were added in the database. Additional codes also had to bemade for some divergencies recognised in the samples. These kinds of issuesduring the long-term experiment required changes to the database schema. Thisdemonstrates the importance of flexibility and tailorability of the informationinfrastructures in LIDET.

Page 13: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

3.1.3. Metadata and auxiliary data related to research

Preserving the usability of scientific data in the long term requires data to beaccompanied by contextual information that describes the data collections.These descriptions are called metadata (data about data). (Karasti and Baker2004.) Information can be lost through degradation of the raw data ormetadata. Adequate documentation (metadata) of sampling and analyticalprocedures, data anomalies, and data set structure will help to insure that datacan be correctly interpreted or reinterpreted at a later day. (Michener et al.,1997.) Bowker and Star (1999) further suggest that formal data descriptions“wrapped” in informal descriptions increase the usefulness of the data.

Information concerning the LIDET-experiments stored in the FSDB includesthe data and related metadata comprising an abstract describing theexperiment, the formats and variable definitions of the data, and the programsused to process the data.

One example of metadata in LIDET was sample site information. Collaboratorswere asked to draw a map and write descriptions on the location of the researchsite and samples. When the permanent plots are set up it is essential to attachinstructions on how to be able to find the samples. A member of the LIDETpersonnel also advised field collaborators to document unexpected samples:

“… there was a rock in here, or the bag was ripped, or animals seemed to haveopened this bag ” (LIDET personnel)

Relating this kind of auxiliary data in the database proved its value alreadyduring the experiment when researchers detected anomalies and assured thequality of measurements.

Yet another example of metadata is climate data. Weather is an importantparameter in many site and synthesis studies. LTER climate database(CLIMDB) provides current and comparable climate summaries for each LTERsite(Baker et al. 2000). Harvested climate data did not completely cover allLIDET sites and was not updated for all sites. Field collaborators of LIDETagreed to provide site-specific background information on climate, but gettingthem to provide other auxiliary data sets was more difficult to produce becauseof limited resources. A member of LIDET personnel explains his views aboutproblems in gathering the related data:

“ … gathering of sort of this not extraneous but related data that wasn’t reallycollected in LIDET and all that, it’s been a problem. But I think it probablyindicates a more classic effect of trying to work on the network when peoplecan’t agree on priorities, resources are limited…“ (LIDET personnel)

Even though the information infrastructure in LIDET was designed to includemetadata for long-term research use, the available resources were a limitingfactor in the gathering and preserving of related contextual metadata.

Page 14: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

3.1.4. Merging with other databases and information systems

Merging differently tagged pieces of information from two different databases ofLIDET experiment also demanded the contribution of data managementpersonnel. Tagged sample identification information was changed during theexperiment, causing compatibility problems when the site-specific field datawas merged with the laboratory measurements such as weight values.

Merging the LIDET-database with other LTER databases was the responsibilityof the information manager. The documentation and data of the LIDET-experiment was planned to be stored along with LTER and other ecological datasets at FSDB (Stafford et al., 1988). The broader view and greater network levelknowledge of the experienced information manager of LTER informationsystems were required in order to merge data of the LIDET-database withFSDB.

Concerning climate data, local technological infrastructure of LIDET differedfrom LTER network level infrastructure. Getting background climate data fromthe LTER climate database (CLIMDB) and merging this data with LIDETdatabase required extra work from information management. Instead of beingable to use the User Interface of CLIMDB the information manager harvestedthe climate data straight from CLIMDB and then transferred the data to theLIDET-database. According to Star and Ruhleder (1996) infrastructure takes ontransparency by plugging into other infrastructures and tools in a standardizedfashion. This points to the importance of compatibility in infrastructures whenmerging site and network level data.

3.2 Distributed experiment and collaboration

Information management of distributed and multi-intersite experiments likeLIDET required contribution to both coordination and organisation work. Plansin LIDET were used more as resources for situated actions (Suchman 1987) inthe sense that the centralized data management and organising of thedistributed experiment allowed modifying the study plans in order to meetspecific needs in the local situations. This in turn required effort and embeddedknowledge from data management personnel.

By annual mailings and preparation of “study in a box”- research materials, theparticipation of geographically distributed field collaborators was made aseffortless as possible. A member of the LIDET personnel describes the annualstudy preparations:

“… a lot of that organizational work was done … by us and it made it real easythe field cooperators to … study in a box … if we had sent them the loose bagsand string and more detailed instructions (and told them to buy flags and …

Page 15: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

that kind of thing) … I’m sure it would have resulted in more fallout, loss ofcooperators. … “ (LBC)

Managing a large number of samples collected from distributed sites waschallenging. The massive number of sample boxes and samples to be handleddemanded spatial organization and agreed procedures for the preservation andhandling of samples in the laboratory. All the research material at a certainphase of analysis was placed in a certain place in the laboratory. Thisorganizational effort also made the infrastructure visible (Star & Ruhleder 1996)to temporary staff, who could then learn correct work practices.

Previously communication infrastructure was based on personal telephone callsand letters. The technological development of Internet and e-mail made massmailings possible and made communication with field collaborators less time-consuming. Also the sharing of information via www-pages became possible.This advantage to coordination of distributed experiment is highlighted in aninterview:

”… in terms of coordination, this was being set up just as the internet, email ,and all that was starting to be coordinated by the network office, so when Istarted this out I had to call all these people. But within a couple of months … Icould do mass mailings. ”(LIDET personnel)

LIDET infrastructures were built to support widely distributed scientificcollaboration enabling the collaboration between a central site andgeographically remote field sites. The wide geographical distribution of theparticipating sites defined the spatial scope property of infrastructure (Star &Ruhleder 1996).

3.3 Data sharing and publishing

Internet technologies offer opportunities for data sharing on an extensive scale(Karasti et al 2004). In LIDET guidelines for sharing data and publication wereagreed by the LIDET participants. Data from the LIDET-experiment is alreadyavailable on the Internet for intersite analysis and for model parametrizationand testing.

Concerning the issues of data sharing and data reuse, Karasti et. al (2004)summarize how various examples show that long-time datasets exist but thatthey are frequently incomplete, badly maintained and not well documented.Data exist but is not useable by any but the local user, requiring too manyresources for quality assurance and control (Karasti et al. 2004).In LIDET, testanalyses of data were made to control the quality of collected data before finalanalysis. Test analyses, in addition to providing preliminary results of data, gavefeedback of infrastructure operation and would have pointed out areas ofimprovement in information practices. In order to succeed in the long-term

Page 16: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

experiment a member of the LIDET personnel also highlighted the use of databefore the final analysis:

“… You just have to use the data. Even if you’re not doing the final analysis youhave to sort of exorcise it to see in fact what it should be. Otherwise, you willend up with data, you won’t why it’s the way it is. And you won’t be able tocorrect it 10 years later“ (LIDET personnel)

Even though in terms of the publications the LIDET-experiment had a veryelaborate arrangement, less data than could be expected has been published. Asummarising article of the results of different LIDET teams has been published,but on the other hand many of the investigators have delayed the release ofdata. On reason presented by Helly et al. (2002) is that data submitters protecttheir data from being scooped by a quicker or alternative interpretation of theirown data and consequent publication. A proposal of a mechanism to encourageand enable early publication of scientific data in a manner that producesbeneficial side effects has been presented in an article on publication of digitalscientific data by Helly et al., 2002.

4 Summary

A long-term, distributed ecological LIDET-experiment was initiated to enablethe study of different ecosystems with standardized methods in geographicallywidely distributed sites. The long-term and distributed nature of the experimentcaused challenges for the data management.

Through qualitative content analysis of the material it has been possible tostudy how information practices are intertwined with ecological research workpractices. Everyday information practices of data management of the LIDET-case were studied applying a socio-technical approach (Kling 1999). The notionsof infrastructure (Star & Ruhleder 1996, Star & Bowker 2002) allowed toanalyse the information practices and collaboration infrastructure of the case ina socially sensitive manner. This paper foregrounds the long-term perspectiveand the elements of data management and information practices of LIDET-experiment.

Through the analysis of the empirical material of the LIDET case there is anemphasis on the importance of data management preplanning and theintertwining of research and data management practices. Data managementdesign of the LIDET-experiment was driven by the research. The mostimportant features of the LIDET-database were defined and maintained incollaboration with experienced ecologists, information managers andinformation systems developers.

Page 17: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

Based on the interview material it has been possible to notify and capture someof the embedded knowledge of LIDET participants entrenched in the researchand data management practices of the described ecological experiment. Thisknowledge that members of the CoP had was used in the design andmodifications of the information infrastructure in the LIDET-experiment.

Management of research procedures as well as centralized handling of researchsamples and data in one laboratory have been the basic solutions to the datamanagement challenges. The centralized preplanning, organisation andcoordination of data management minimized the collaboration effort of theparticipating research sites in order to maintain the social infrastructure ofcollaborators. Only a few sites disappeared during this long-term experimentand the losses of information during the experiment phases were also minor.

The use, maintenance and development of long-term collaborationinfrastructure of LIDET was a gradual process including modifying and tailoringof infrastructure in order to support the research needs and assure the long-term preservation and reuse of data. Improvements to everyday workingpractices shaped the infrastructure, and changes in the number of collaboratingsites created a need to change the LIDET database. Infrastructure changes werealso brought by new technologies developed during the experiment. Newtechnologies offered new possibilities, e.g. in communication.

The technological development of LIDET information infrastructure was alsoprioritised according to its support for long-term science and data archivingpurpose. In some cases old technology co-existed with new technology until theend of experiment.

Specific preplanning of the experimental design and anticipation of possibleproblems during the long-term formed the basis for intertwined design of thedata management and infrastructure. The information infrastructure of LIDET-experiment, created by collaborative preplanning and centralized datamanagement, was able to flexibly adjust to changes and thus succesfullysupported scientific collaboration in the LIDET-case.

Based on this research there is need to carry out more studies of the use,maintenance and development of scientific collaboration infrastructures for thelong-term perspective through other case studies. Further research is needed toincrease understanding of how to relate local information practices andnetwork-level solutions to be implemented in common infrastructures.

Page 18: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

Acknowledgements

I want to express my thanks to professor Helena Karasti for guidance. Iacknowledge EnviroNetGrants, multidisciplinary doctoral level Graduate schoolin Environmental Issues at the University of Oulu for support this work.

Page 19: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

References

Baker, K., Bowker C. & Karasti, H. (2002). Designing and Infrastructure forHeterogeneity in Ecosystem Data, Collaborators and Organizations. Procs of the2nd National Conference on Digital Government Research, 20-22 May 2002, LosAngeles, CA., Digital Government Research center, pp. 141-144.

Bowker, G. & Star, S. (1999). Sorting Things Out: Classification and itsConsequences MIT Press, London.

Bowker, G., Star, S., Turner, W., Gasser, L. (eds.). 1997. Social Science,Technical Systems and Cooperative Work: Beyond the Great Devide. Mahwah(NJ): Lawrence Erlbaum Associates.

Callahan, T. (1984). Long-term Ecological Research. Bioscience 34(6):363-367.

Catanzaro, M. (1988). Using qualitative analytical technigues. Pp. 437-456 inWoods N. - Catanzaro M. (eds.) Nursing Research: Theory and Practice. C.V.Mosby Company.

Finholt, T.A. (2002). Collaboratories. In Cronin, B (ed.). Annual Review ofInformation Science and Technology. Medford, Information Today. AnnualReview of Information Science and Technology, 36, 73-107.

Franklin, J. , Bledsoe, C. & Callahan, J. (1990). Contributions of the Long-TermEcological Research Program, An expanded network of scientists, sites andprograms can provide comparative analyses.

Gosz, J. (1999). Ecology Challenged? Who? Why? Where is This Headed?Ecosystems (2), 1999, pp.475-481.

Helly, J., Elvins, T., Sutton, D., Martinez, D., Miller, S., Pickett, S. & Ellison, A.(2002). Controlled Publication of Digital Scientific Data. Communications of theACM. May 2002, Vol. 45. No. 5

Page 20: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

Hobbie, J. et al. (2003). Scientific Accomplishments of the Long TermEcological Research Program: An Introduction. BioScience, Vol. 53 No.1.

Kanfer, A. et al. (2000). Modelling Distributed Knowledge Processes in NextGeneration Multidisciplinary Alliances. Available:http://citeseer.nj.nec.com/339903.html.

Karasti Helena 2001: Bridging work practice and system design - integratingsystemic analysis, appreciative intervention and practitioner participation,Computer Supported Cooperative Work – An International Journal, Vol. 10. No2, pp. 211-246.

Karasti, H., Baker, K. & Bowker C. (2004). ECSCW 2003 Computer SupportedScientific Collaboration (CSSC) workshop report. SIGGROUP Bulletin.

Karasti, H. and Baker, K. (2004): Infrastructuring for the Long-Term:Ecological Information Management, Hawaii International Conference onSystem Sciences 2004 (HICSS’37). January 5-8 2004, Hawaii, USA.

Kareiva, P., & Anderson, M. (1988). Spatial aspects of species interactions: thewedding of models and experiments. Pp. 35-50 in A. Hastings, ed. Communityecology. Springer-Verlag, New York, New York. USA.

Kling, R. (1999). What is Social Informatics and Why Does it Matter? D-LibMagazine 5(1). Available: www.dlib.org/dlib/january99/kling/01kling.html.

Krippendorff, K. (1980). Content analysis: An introduction to its methodology.Beverly Hills. Sage.

Lave, J. and Wenger, E. (1991). Situated Learning: Legitimate PeripheralParticipation. Cambridge, Cambridge University Press

Levin, S. (1992). The problem of pattern and scale in ecology. Ecology 73:1943-1967

Page 21: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

The Long-Term Intersite Decomposition Experiment Team. (1995). Meeting thechallenge of long-term, broad-scale ecological experiments. LTER PublicationNo. 19. U.S. LTER Network Office, Seattle, WA.23 pp.

Michener W. (1986). Data management and long-term ecological research. Pp 1-8 in Michener W. , (ed.) Research Data Management in the Ecological Sciences.Columbia, S.C.: University of South Carolina Press.

Michener, W., Brunt, J., Helly, T., Kirchner, T. & Stafford, S. (1997). Non-geospatial metadata for the ecological sciences. Ecological Applications 7(1):330-342.

Michener, W. K. & Brunt, J. W. (eds.) (2000). Ecological Data: Design,Management and Processing. Oxford, Blackwell Science. Methods in Ecology.

Miles, M. & Huberman, A. 1984. Qualitative data analysis: A sourcebook of newmethods. Beverly Hills, Ca.: Sage.

Porter, J. (1998). Scientific databases for Environmental Research. Data andInformation Management in the Ecological Sciences: A Resource Guide.Albuquerque, NM: LTER Network Office, University of New Mexico.

Porter J., Callahan, J. (1994). Circumventing a dilemma: historical approachesto data sharing in ecological research. Pp. 193-202 in Michener, WK, Brunt, JW& Stafford, SG, eds. Environmental Information Management and Analysis :Ecosystem to Global Scales.

Stafford, S. (1998). Issues and consepts of Data Management: The H.J. AndrewsForest Science Data Bank as a Case Study. Data and Information Managementin the Ecological Sciences: A Resource Guide. Albuquerque, NM: LTER NetworkOffice, University of New Mexico. Available:www.ecoinformatics.org/pubs/guide/stafford.fv2.htm

Stafford, S., Spycher, G. & Klopsch, M. (1988). Evolution of the forest scienceData bank. A progress report on managing complex research data. Journal ofForestry, Vol. 86, No.9, September 1988.

Page 22: Howell and Egley, 2005 - Governor's Office of Crime Control

IRIS27Eija Halkola: Scientific collaboration and information infrastructures –Information practices in LIDET-experiment

Star, S. (1999). The Ethnography of Infrastructure. American BehavioralScientist, 43(3), 1999, pp. 377-391.

Star, S. & Bowker, G. (2002). How to Infrastructure. In Lievrouw, L. A. &Livingstone, S. (eds.). Handbook of New Media : Social Shaping andConsequences of ICTs. London, Sage.

Strauss, A. and Corbin, J. (1990). Basics of Qualitative Research: GroundedTheory Procedures and Techniques. Newbury Park, CA: Sage Publications.

Suchman, L. (1987). Plans and situated actions: the problems of human-machine communication. Cambridge, Cambridge University Press.

Tuomi, J. & Sarajärvi, A. (2002). Laadullinen tutkimus ja sisältöanalyysi.Tammi, Helsinki.

Woolgar, S. (2003) Social Shaping Perspectives on e-Science and e-SocialScience: The case for research support. A consultative study for the Economicand Social Research Council (ESRC). (Downloaded 27 June 2004 athttp://www.sbs.ox.ac.uk/downloads/E-SocialScience.pdf).