Chip Tracker: a Microarray Laboratory Information Management System

Patricio Yankilevich

M.Sc Project Dissertation

for the Degree of Master of Science in Informatics with specialism

in Bioinformatics

The University of Edinburgh

September 2003

Chip Tracker: a Microarray Laboratory Information Management System

Author: Patricio Yankilevich

School of Informatics, University of Edinburgh

Academic supervisor: J. Douglas Armstrong PhD

School of Informatics, University of Edinburgh

Industrial supervisors: Dr. Vitali Proutski - Dr. Ann Brown

Organon Research Unit, Organon Laboratories Ltd.

Declaration

This project was carried out between May and September 2003. The work discussed

in this dissertation, unless otherwise stated, is my own; and the manuscript has been

composed by myself.


School of Informatics

The University of Edinburgh

September 2003

Abstract

In this project a Microarray Laboratory Information Management System (Microarray

LIMS) was designed and developed to become part of the integrated bioinformatics

infrastructure for the Organon GeneChip microarray facility. The software complies

with microarray communication standards and uses an industry standard relational

database management system combined with a platform-independent web browser

interface for data entry and retrieval. Therefore it is portable and flexible and can also

be used as a stand-alone tool to manage a microarray laboratory workflow.

The resulting system guides users through the steps of the microarray laboratory

workflow, facilitating the management and tracking of biological samples and microarray

chips via a user-friendly interface. In addition, the application automates the

data collection process, tracks the chips that have been ordered, prompts

pre-populated purchase orders of chips and controls the assignment of samples to chips for

hybridisation. It randomises chip usage and also logs hybridisation results and chip

faults as part of the QC procedures. Finally, a highly flexible reporting tool enables

the users and laboratory managers to search the database on the usage history of the

platform.

This project captures and systematises the real microarray laboratory workflow and thus

improves performance and minimises the human errors introduced into the microarray

experiment.

Acknowledgements

I would like to thank both my families for all the love and support they have given

me. Thanks especially to my wife Juliana, for providing me with a stable home

environment, and to my parents Oscar and Ana for their financial support.

I would like to take this opportunity to thank Douglas Armstrong, my academic

supervisor, for letting me take on this challenging project with an industrial collaborator. I

would like to thank my industrial supervisors, Vitali Proutski for his advice and Ann

Brown for her assistance and for providing me with a friendly work environment.

Finally, I would like to thank Mark, Julie and David from the Chip team, Bridget

from System development, and Alistair and Donald from the Bioinformatics team for

helping me with the final corrections.

Table of contents

Abstract

Acknowledgements

Table of contents

Chapter 1: Introduction
    Organon Microarray Facility
    Microarray LIMS
    Objectives and Solution

Chapter 2: The Microarray
    The New Paradigm of Drug Discovery
    Introduction to Functional Genomics
    Microarray Technology
    Microarray Experiment

Chapter 3: Laboratory Information Management Systems and Microarray LIMS
    Laboratory Information Management Systems (LIMS)
    Microarray LIMS

Chapter 4: Chip Tracker and the Organon Microarray Experiment Process
    The Organon Microarray Experiment Process
    Microarray Laboratory Workflow and Chip Tracker Scope

Chapter 5: Chip Tracker Design and Architecture
    Presentation Tier (Front End)
    Middle Tier
    Database Management Tier (Back end)
    Integration with existing Systems
        ExpAnD
        Rosetta Resource Tracker
        Rosetta Resolver
    Chip Tracker Deployment

Chapter 6: Chip Tracker XML Parser and the Microarray Experiment Standards
    MIAME, MAGE and other Standards
    Chip Tracker XML Parser and the Gene Expression Markup Language (GEML)

Chapter 7: Understanding Chip Tracker Features
    Logging On and Off
    Chips in Stock
    Samples in Stock
    Prompt of Purchase Orders
    Chips Arrival
    Assignation of Chips to Samples
    Log of Hybridised Chips
    Statistics and Custom Reports

Chapter 8: System Validation of Chip Tracker
    System Validation
    Control Life Cycle of the project
    User Acceptance Test

Chapter 9: Conclusion and Future Work

Bibliography

Appendix A: GEML .xml example file

Appendix B: SQL commands

Appendix C: User Acceptance Test

Chapter 1

Introduction

The past decade has seen dramatic progress in the development of high throughput

life science technologies such as microarrays, which have become the technology of

choice for gene expression analysis. Microarray technology has enabled researchers

in the field of functional genomics to conduct new types of experiments that generate

immense amounts of data. The data created from these microarray experiments

necessitates the creation of tools that can manage both the biological and experimental

information used in a microarray experiment workflow more effectively and

efficiently. While many scientists are focusing on the analysis of the microarray data,

other issues such as data management, quality, and standards remain to be addressed.

This project presents the “Chip Tracker”, a Microarray Laboratory Information

Management System (Microarray LIMS), designed to manage and track the

submissions of biological samples and microarray chips used in the microarray

laboratory. The project has been carried out with the aid of an industrial collaborator,

Organon Laboratories Ltd., at the Organon Research facility in Newhouse, Scotland.

Organon have recently installed their first microarray facility, an Affymetrix DNA

Microarray platform.

Organon Microarray Facility

The Organon Microarray facility carries out microarray experiments on thousands of

biological samples sent to the facility each year. Systems have been developed to

capture the experimental annotation of RNA samples submitted to the microarray

laboratory (Expand), to deal with the day-to-day management and processing of the

RNA samples (Resource Tracker) and to migrate data from both Expand and the

Affymetrix platform to the Resolver data analysis software. These components

interact to produce a fully integrated data management infrastructure for the

microarray platform. None of them, however, deals with the actual Chips used by the

facility.

The users of the microarray facility request to have their RNA samples hybridised to

one or more of the thirteen different types of Affymetrix Chips. The experimental

results obtained from the hybridisation are very valuable, but they are also expensive and

time-consuming to obtain. It is therefore important to ensure correct assignment of samples to chips.

With each Chip costing in excess of £300 and up to 2,000 Chips being used each year,

it is apparent that some type of tool is required to track the Chip usage by the facility,

from purchase to final quality control analysis. Before the implementation of this

project the chip information was stored in Excel spreadsheets and the purchase of chips

was based on direct communication with the microarray platform customers. These

methods, however, soon became unwieldy due to the large number and variety of Chips

being processed by the facility. Taking all of the above factors into account, the

decision was taken to add the Chip Tracker microarray LIMS component to the existing

infrastructure of the microarray facility.

Microarray LIMS

Laboratory Information Management Systems (LIMS) are used by many types of

laboratories for capturing data during the experimental process. Such LIMS can be

used in research and development, in-process testing, quality control and assurance.

There is a small number of integrated systems that deal with the management of the

microarray experiment process, such as the BioArray Software Environment (BASE),

GeneTraffic and Affy LIMS. These systems are customisable bioinformatics

solutions, but none of them completely addresses the needs of a microarray

laboratory workflow, such as tracking and administration of the materials used in the

microarray lab. Capturing this data is essential for both good housekeeping and the

analysis of the data produced by the experiments.

In the absence of a commercially available alternative, it was decided to custom build

a Chip Tracker database and data entry tool to be integrated with the existing

bioinformatics infrastructure of the Organon microarray platform.

Objectives and Solution

The main objective of this project is the development of a system to enable the

microarray laboratory workers to manage and track the submissions of biological

samples and microarray chips that are being used throughout the microarray laboratory

workflow. As a solution we present the Chip Tracker, a microarray LIMS that

automates some of the tasks carried out by the laboratory worker and guides him or her

along the steps of the laboratory workflow via a user-friendly web interface.

The Chip Tracker application also tracks the chips that have been ordered and

prompts pre-populated purchase orders of the necessary chips depending on the

samples arriving in the laboratory. Other laboratory procedures within the microarray

experiment workflow are also managed from the Chip Tracker application. Such

procedures include control of the assignment of samples to chips for hybridisation,

randomisation of chip usage, logging of chip faults (part of the QC procedures) and

creation of statistical reports on platform usage.
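
The purchase-order prompt described above can be illustrated with a minimal sketch. It assumes that the rule is a simple shortfall calculation per chip type: the chips requested for the arriving samples, minus the chips in stock and the chips already on order. The function name, the data layout and the rule itself are illustrative assumptions, not the Chip Tracker implementation.

```python
from collections import Counter


def chips_to_order(requested, in_stock, on_order):
    """Return the shortfall per chip type (hypothetical ordering rule).

    requested: list of chip type names, one entry per sample-to-chip request
    in_stock, on_order: dicts mapping chip type -> number of chips
    """
    demand = Counter(requested)
    shortfall = {}
    for chip_type, needed in demand.items():
        available = in_stock.get(chip_type, 0) + on_order.get(chip_type, 0)
        if needed > available:
            shortfall[chip_type] = needed - available
    return shortfall


# Illustrative numbers only.
requests = ["U133A", "U133A", "U133A", "U34A", "U74A"]
order = chips_to_order(requests,
                       in_stock={"U133A": 1, "U34A": 2},
                       on_order={"U133A": 1})
print(order)  # chip types that would need a purchase order, with quantities
```

Counting chips that are already on order as available avoids prompting a second order for the same demand, which is one plausible reason for tracking ordered chips alongside stock.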

The system developed in this project not only fully integrates with the existing

Organon microarray platform, but can also be used as a stand-alone microarray LIMS.

Finally, this project addresses the needs raised by the scientists working in the

microarray lab and captures the real microarray laboratory workflow.

Chapter 2

The Microarray

The New Paradigm of Drug Discovery

Biological and biomedical research is in the midst of a significant transition driven by

two primary factors: the massive amount of DNA sequence information and the

development of new genomics technologies to exploit it. Consequently, we find

ourselves at a time when new types of experiments are possible, and observations,

analysis and discoveries are being made on an unprecedented scale [McConnell et al.,

2002].

Genomics technologies and in particular DNA microarrays are rapidly increasing our

understanding of disease, drug targets, and, in the future, how drugs may be used in

the clinic. Such an understanding would potentially improve the traditional drug

discovery pipeline, enabling better decisions to be made earlier in the therapeutic

discovery and development process. Better decisions should ultimately result in better

drugs and therapies and allow safer drugs to reach the market sooner. Microarray

applications in drug discovery are expanding and include basic research and target

discovery, biomarker determination, pharmacology, toxicogenomics, target

selectivity, development of prognostic tests and disease-subclass determination

[Butte, 2002].

Introduction to Functional Genomics

As mentioned above, the constant advances in molecular biological, analytical and

computational technologies are enabling us to systematically investigate the complex

molecular processes underlying biological systems. Over the past few years, more

than 60 organisms have had their genomes completely sequenced, with another 170 or

so in progress (see www.tigr.org). The sequence of the human genome has been

deciphered, by both public and private efforts, and the complete sequences of the mouse

and other animal and plant genomes are nearing completion. Unfortunately, the DNA

sequence does not tell us what the genes do, how cells work, how cells form

organisms, what goes wrong in disease, how we age or how to develop a drug or how

a phenotype is determined. Thus, functional genomics has become an increasingly

important scientific discipline [McConnell et al., 2002]. This rapid accumulation of

genome sequence data represents the beginning of a fundamentally new kind of

biological research ushering in the so called ‘post-genome era’.

Functional genomics is the study of gene function through the parallel expression

measurements of genomes, most commonly using the technologies of microarrays and

serial analysis of gene expression (SAGE). The successful use of these large-scale

functional genomics technologies depends on robust and efficient systems for tracking

and managing material and information flow.

Microarray Technology

In this new setting for biological research, DNA array technologies (microarrays) that

allow for the simultaneous recording of thousands of gene expression levels in a

single experiment have acquired a special role. This technology has opened new ways

of looking at organisms in a genome-wide manner. It is now possible to study

complete genome patterns of gene expression in prokaryotes or in simple eukaryotes

like yeast or C. elegans while in higher organisms, like humans, tens of thousands of

genes related to a given living system can be monitored [Dopazo, 2002].

Microarrays work by hybridisation (non-covalent chemical bonding) of fluorescently

labelled RNA or DNA in solution to DNA molecules (probes) that are attached to

specific locations on the chip surface. The hybridisation reactions take place in

parallel across the entire array at the same time. Thus, the hybridisation of a sample to

an array is, in effect, a highly parallel search by each molecule for a matching partner

on an ‘affinity matrix’. The eventual binding of labelled molecules to the surface-

bound probe is determined by the rules of molecular recognition. The process is

straightforward, highly parallel (all sequences are counted simultaneously), and, if

done correctly, quantitative [McConnell et al., 2002].
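
The idea of each molecule searching for a matching partner can be illustrated with a toy model. The sketch below treats hybridisation as exact Watson-Crick complementarity between a probe and a stretch of the labelled target. This is a deliberate simplification: real hybridisation is governed by thermodynamics and tolerates some mismatches, and the sequences and function names here are invented for the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hybridises(probe, target):
    """Toy rule: the probe binds if the target contains its exact reverse complement."""
    return reverse_complement(probe) in target


probe = "ATGCGT"        # made-up short probe (probes on the array are typically 25 bases)
target = "CCACGCATGG"   # made-up labelled target fragment
print(hybridises(probe, target))
```

On an array, this test is in effect carried out for every probe location at the same time, which is what makes the measurement highly parallel.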

There are two dominant types of microarrays that have been extensively used for most

global gene expression measurements or experiments. The first, which is the one used

in the Organon microarray laboratory, is the high-density array of oligonucleotides

(short strands of nucleic acids). The oligonucleotide strands are directly synthesised

on a glass wafer surface using a process of light-directed combinatorial synthesis

known as photolithography [Lockhart et al., 1996]. A single oligonucleotide array can

contain more than 500,000 probes, typically 25 bases long, in an area smaller than

a half-inch square. The human U133A array, for example, contains over 260,000

different probes that together measure the expression of 22,283 different transcripts

(or potential genes) at once. The process of hybridisation and scanning the chips

requires highly expensive equipment, increasing the need to maximise usage and

minimise failures or delays in the utilisation of the facility. This type of microarray is

known as a one-channel array and is commercially available from Affymetrix under

the name of ‘GeneChip’.

The other main array type, the cDNA array (also called spotted DNA array), consists of

a solid support (usually nylon or glass) on which cDNA or oligonucleotides are arrayed in

a fixed pattern. Fluorescently labelled DNA derived from the mRNA of the control and

test samples is competitively hybridised to the complementary DNA probes on the

array. The radioactive or fluorescence emissions of specifically bound probes are

detected using an appropriate scanner. These intensity values are proportional to the

amounts of specific RNA originally present in the cell [Schena et al., 1995]. Two

different samples, the control and the treatment, are hybridised to a single cDNA

array. This is also called a two-channel array.

Before hybridisation, samples under study are often amplified and then labelled with

fluorescent dyes. The samples are then hybridised to the microarray, and they bind to

complementary probes affixed to the microarray surface. The arrays are then scanned,

producing a fluorescent image where the fluorescent intensity at any particular probe

location indicates the relative concentration of complementary RNA sequence present

in the sample. This enables a quantitative estimate of each gene's expression to be

calculated. Figures 1 and 2 (provided by Affymetrix) show how the GeneChip

oligonucleotide array is built and how it works.

Figure 1. Oligonucleotide array technology. A) Cartoon depicting a single feature on an Affymetrix GeneChip® microarray. B) Hybridisation of tagged probes. C) Scanning of tagged and un-tagged DNA [figures provided by Affymetrix, 1].

Figure 2. Overview of gene expression measurements with an Affymetrix platform. The process begins with mRNA samples from cells, which are labelled with a fluorescent dye and hybridised to the microarray chip. Messenger RNA expression levels are determined using the quantitative fluorescent image. The process starts with the original RNA (oRNA) → copy DNA (cDNA) → labelled copy RNA (lcRNA) → fragmented labelled copy RNA (flcRNA) → Hybridisation → Wash and Stain → Scan [adapted from figures provided by Affymetrix, 2].

Current evidence implies that oligonucleotide-based arrays are more reliable for

global screening, and thus give a more accurate and comprehensive representation of

the gene expression profile compared to long cDNA arrays. While direct synthesis of

oligonucleotides by the photolithographic process offers the advantage of abolishing

the need to hydrolyse the oligonucleotide from its synthetic support and re-attach it to

the microarray, this approach does not allow an independent confirmation of the

fidelity of synthesis. Because of this and because this approach does not allow

purification of oligonucleotides prior to attachment to the microarray, oligonucleotide

chip manufacturing can lead to internal errors. Batch-to-batch variance may also

contribute to data bias, as separate samples are hybridised to separate chips when

using oligonucleotide arrays [Li et al., 2001]. In order to tackle this problem the

application developed in this project includes a process for the random assignment of chips to

samples, which is explained later.
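
As a rough sketch of why randomisation helps against batch effects, the example below shuffles the available chips of each type before pairing them with samples, so that the order in which samples arrive is not systematically linked to a manufacturing lot. It assumes that each chip record carries a lot identifier; the record layout and the function name are hypothetical, and the procedure actually used by Chip Tracker is the one described later in this dissertation.

```python
import random


def assign_chips(samples, chips, rng=None):
    """Randomly pair each sample with one unused chip of the requested type.

    samples: list of (sample_id, chip_type)
    chips:   list of (chip_id, chip_type, lot_number)
    Returns a dict sample_id -> chip_id; raises ValueError if stock runs out.
    """
    if rng is None:
        rng = random.Random()
    pool = {}
    for chip_id, chip_type, _lot in chips:
        pool.setdefault(chip_type, []).append(chip_id)
    for chip_ids in pool.values():
        rng.shuffle(chip_ids)  # break any link between sample order and lot
    assignment = {}
    for sample_id, chip_type in samples:
        available = pool.get(chip_type, [])
        if not available:
            raise ValueError(f"no {chip_type} chip left for sample {sample_id}")
        assignment[sample_id] = available.pop()  # each chip is used at most once
    return assignment


# Illustrative stock and requests only.
chips = [("C1", "U133A", "lot1"), ("C2", "U133A", "lot1"),
         ("C3", "U133A", "lot2"), ("C4", "U34A", "lot7")]
samples = [("S1", "U133A"), ("S2", "U133A"), ("S3", "U34A")]
result = assign_chips(samples, chips, rng=random.Random(42))
print(result)
```

Passing a seeded `random.Random` instance makes an assignment reproducible, which may be useful when the assignment has to be audited afterwards.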

Organon acquired an Affymetrix GeneChip system to perform its microarray

experiments in 2002. This system is made of the following four components:

Probe Arrays or Chips: GeneChip probe arrays are available in human, rat, mouse

and other model organisms. A full range of custom formats is also available.

Hybridisation Oven: The GeneChip 640 Hybridisation Oven can process from 1 to

64 arrays per cycle. The oven delivers precise temperature control for consistent

performance across all probe array applications.

Fluidics Station: The FS400 Fluidics Station automates staining and washing of up to

four arrays at once.

Scanner: After processing in the Fluidics Station, probe arrays should be stored as

recommended and transported to a centrally located scanner.

Figure 3. Affymetrix GeneChip Platform [adapted from a figure provided by Affymetrix, 3].
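
The capacities quoted above (up to 64 arrays per oven cycle and up to four arrays per fluidics run) give a simple way to estimate how many runs a batch of chips requires. The following sketch works through that arithmetic; the `stations` parameter, which stands for the number of fluidics stations working in parallel, is an assumption added for the example.

```python
import math

OVEN_CAPACITY = 64      # arrays per hybridisation cycle (from the text)
FLUIDICS_CAPACITY = 4   # arrays per staining and washing run (from the text)


def runs_needed(n_arrays, stations=1):
    """Return (oven cycles, fluidics rounds) needed to process n_arrays."""
    oven_cycles = math.ceil(n_arrays / OVEN_CAPACITY)
    fluidics_rounds = math.ceil(n_arrays / (FLUIDICS_CAPACITY * stations))
    return oven_cycles, fluidics_rounds


print(runs_needed(30))              # (1, 8): one oven cycle, eight fluidics rounds
print(runs_needed(30, stations=2))  # (1, 4): two stations halve the fluidics rounds
```

Under these assumptions the fluidics station, not the oven, limits throughput for any batch larger than four arrays.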

The Affymetrix GeneChip platform has become the industry-standard microarray

experiment solution for genomics research. Initially applied for target identification,

GeneChip RNA expression analysis is being used by innovative biotechnology and

pharmaceutical companies downstream in target validation, lead optimisation and

clinical trials.

Microarray Experiment

The goal of a microarray experiment is to measure and compare the relative

expression levels of thousands of genes in a sample simultaneously. Typically, these

samples compare different stages of the cell cycle, cell types, healthy and diseased

cells or different treatments. A higher level goal of genomic and gene expression

experiments is to identify new genes involved in a pathway, potential drug targets or

expression markers that can then be used in a predictive or diagnostic fashion.

A typical microarray experiment involves the following steps:

1. Experiment design

2. Biological experiment to isolate the total RNA Sample from the biological specimens

3. The Sample is treated and labelled with fluorescent dye

4. Hybridisation of the labelled Sample to the Array Chip

5. Washing, staining, and scanning of the Array Chip

6. Analysis of the scanned image

7. Generation of gene expression profiles
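
The seven steps listed above form a strictly ordered sequence, which can be modelled as an enumeration with a rule that a sample only moves to the immediately following step. The sketch below is an illustration of that ordering only; the step names and the no-skipping rule are assumptions made for the example, not a description of how Chip Tracker stores workflow state.

```python
from enum import IntEnum


class Step(IntEnum):
    """The seven steps of a typical microarray experiment, in order."""
    EXPERIMENT_DESIGN = 1
    RNA_ISOLATION = 2
    LABELLING = 3
    HYBRIDISATION = 4
    WASH_STAIN_SCAN = 5
    IMAGE_ANALYSIS = 6
    EXPRESSION_PROFILES = 7


def next_step(current):
    """Return the step that follows `current`, or None after the last step."""
    if current is Step.EXPRESSION_PROFILES:
        return None
    return Step(current + 1)


def can_advance(current, target):
    """A sample may only move to the immediately following step (no skipping)."""
    return next_step(current) is target


print(next_step(Step.LABELLING).name)                           # HYBRIDISATION
print(can_advance(Step.LABELLING, Step.IMAGE_ANALYSIS))         # False
```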

Figure 4. Microarray experiment workflow. A typical microarray experiment workflow involves preparation of the biological samples (orange), array production (blue), which in this case is supplied by Affymetrix, and array hybridisation, scanning and image analysis (yellow). The actions taking place before and after the microarray laboratory workflow are in green [adapted from a figure provided by Amersham].

Figure 4 shows a detailed schema of the actions performed in a microarray

experiment. Note that the tasks involved in the design and manufacture of the array

itself are not performed at the Organon microarray laboratory. The arrays are provided

by Affymetrix as a part of an agreement between Organon and Affymetrix. The

GeneChip arrays most widely used by Organon are:

• Human Genome U133A and U133B

• Rat Genome U34A, U34B and U34C (and their new versions)

• Murine Genome U74A, U74B and U74C (and their new versions)

Figure 5. Microarray chip type availability provided by Affymetrix. Phylogenetic tree schematic illustrating GeneChip arrays available today [adapted from a figure provided by Affymetrix, 4].

Microarray technology is associated with handling great amounts of data generated at

different steps during the respective microarray experiment. This data must be

processed and stored appropriately for the evaluation of experimental results. Several

standardisation approaches have been developed for the description of microarray

experiments during recent years by various institutions and companies. The most

promising approach is the MIAME (Minimum Information About Microarray

Experiments) standard, an international initiative supported by EBI (European

Bioinformatics Institute) whose aim is to provide a standard defining the required

minimum information that has to be stored and transferred for a gene expression

microarray study. MIAME is not a formal specification, but a set of guidelines. An

explanation of MIAME and other standards is given in Chapter 6.
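
As a rough illustration of what a minimum-information guideline means in practice, the sketch below checks whether an experiment record has content in each of the six sections that the MIAME guidelines describe: experimental design, array design, samples, hybridisations, measurements and normalisation controls. The dictionary keys, the record layout and the function name are assumptions made for this example. MIAME itself prescribes content, not a data format.

```python
# Section names follow the six MIAME areas; the key spelling is an assumption.
MIAME_SECTIONS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridisations",
    "measurements",
    "normalisation_controls",
)


def missing_sections(record):
    """Return the MIAME sections that are absent or empty in `record`."""
    return [section for section in MIAME_SECTIONS if not record.get(section)]


# Illustrative, incomplete record.
record = {
    "experimental_design": "treated vs control, 3 replicates",
    "array_design": "Human Genome U133A",
    "samples": ["S1", "S2"],
    "hybridisations": [],
}
print(missing_sections(record))
```

A check of this kind could be run before data are exported, so that incomplete annotation is detected while the information is still easy to recover.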

The use of standards to integrate genomic analysis throughout drug discovery and

development is important for transforming the drug discovery pipeline paradigm and

will allow researchers to meet the challenges ahead. Figure 6 illustrates some

examples of how microarray genomic experiments are used in the different stages

of the new process of drug discovery.

Figure 6. Modern pipeline of drug discovery. Some examples of genomic experiments involving the use of microarrays at different stages of the process are also presented [adapted from a figure provided by Affymetrix, 5].

“The ultimate challenge to the bioinformatics community is the intelligent integration

of data from many interrelated sources, which will be necessary to take greatest

advantage of the knowledge in the data” [Searls, 2000]. It is this integration that

enables scientists to turn data into knowledge for answering complex questions in

systems biology and drug discovery.

A complete system for expression arrays requires the implementation and

development of different experimental protocols, databases and bioinformatics tools

for data collection and analysis. Because of the recent advances in computational and

statistical techniques, many scientists are focusing on the analysis of microarray data

and developing models from these data. At the same time, issues of data collection,

quality, and standards remain major bottlenecks to obtaining useful and applicable

results [Bobashev et al., 2002]. The aim of this project was to focus on the development

of a Laboratory Information Management System (LIMS) that addresses these

pending issues to avoid any possible bottlenecks while performing the microarray

experiments.

Chapter 3

Laboratory Information Management Systems

(LIMS) and Microarray LIMS

Laboratory Information Management Systems (LIMS)

The task of managing laboratory data is not a new one. Over the past two decades the

use of LIMS has revolutionised how laboratories manage their data. A LIMS is more

than software; it has become the workhorse of the laboratory encompassing laboratory

workflow combined with user input, data collection, instrument integration, data

analysis, user notification, delivery of information and reporting [Turner, 2001]. The

essential concept of a basic LIMS is that of a computer system which would automate

the clerical activities associated with the processing of the analytical results,

improving accuracy and turnaround times to an acceptable level. LIMS is a technique

independent of discipline and has applications in any industry where laboratory

analysis is important, from Healthcare to Food & Drink and Pharmaceutical

industries. A typical LIMS computer system bridges the gap between the analyser and

the company’s financial and administrative mainframes in all but the smallest labs.

Most LIMS require considerable customisation to meet the needs of a specific

laboratory. A customised LIMS will focus on the special aspects of its users' needs.

Differences in the research and development or production chains of individual

organisations have led to increased interest in customised systems [Bund et al.,

1998]. Usually, even for customised systems, the core software is commercially

available. This is not the case for microarray LIMS, however: the applications on the

market are not flexible or simple enough to be easily adapted to the needs of the client

laboratory.

Many of the most popular commercial LIMS take advantage of open systems

architectures offering client-server capabilities and enterprise-wide access to lab

information with web-based front ends. The development of microarray technology

has given rise to a new kind of laboratory and experiment that is now part of the new

process of drug discovery. Thus the requirements of research groups and laboratory

workers are changing. In the last few years microarray laboratories have been created

in most of the big R&D companies, and with them the need for a new kind of LIMS, the microarray

LIMS, is increasing.

A LIMS system can be understood from different viewpoints:

• To an analyst, LIMS is indeed the computer system which interfaces to his

analyser, computes, stores data, and prints results;

• To a laboratory manager, it is the system which lets him track samples, identifies

their current status, audits their turnaround times, and provides better data on

usage than he could ever have obtained from the best-organised of paper records;

• To a management information systems analyst, however, LIMS can and must be a

feeder system, passing resource management data to the corporate mainframe.

Microarray LIMS

A reasonable working definition of the role of an analytical laboratory is that it must

deliver accurate, understandable results to the originator of the request for analysis,

within a suitable timescale. As mentioned earlier, for a microarray laboratory such an

operation entails the sequence of receiving the biological samples and chips, ordering

of chips if needed, assigning corresponding chips to samples, processing samples,

hybridising, checking the results (and re-hybridising the sample if necessary), passing this

information to the bioinformatics team for analysis and issuing a report to the

requester. It is important to note that much of this cycle relates, not to analysis or

hybridisation, but to the clerical handling of elements and results of the

hybridisations. A microarray LIMS should not only guide the lab worker through these tasks; it is also an indispensable tool for the laboratory manager to track resources, complete statistical QC/QA routines, and document and summarise resource utilisation within the laboratory. Such factors are key when very high cost genomics experiments are being completed.

The following academic data management systems for microarrays were evaluated for their suitability for the requirements of the Organon microarray laboratory: MADGE [Kokocinski et al., 2003], a data management software package for cDNA microarrays; BioArray Software Environment (BASE) [Lao et al., 2002], a more flexible platform for comprehensive management and analysis of microarray data; and QuickLIMS [McIndoe et al., 2003], a LIMS for managing data from DNA-microarray fabrication. Although these projects are LIMS that take part in different stages of the microarray process, none of them covers the data management of the microarray laboratory workflow specifically, as Chip Tracker does, which makes this project novel and necessary. Some of these systems do not appear in the following table because of their recent release.

Figure 7. Mainstream Academic Microarray Software. The table shows the lack of LIMS systems in the field [3rd Millennium].

The commercial options for microarray LIMS are also very few. There is a new

product by Amersham Biosciences, Scierra™ Microarray Laboratory Workflow

System, which is a complete management system for microarray experiments and

gene expression data which includes LIMS aspects. This software is probably the best

option and is not included in the following table due to its recent release into the

market.


Figure 8. Mainstream Commercial Microarray Software. As shown there is also a lack of LIMS systems in the market. The systems shown covering LIMS functionality are not designed for a microarray laboratory workflow but for a microarray experiment workflow [3rd Millennium].

The general lack of microarray LIMS software is due to this being a new and emerging field. At the Scottish Bioinformatics Forum 2003, which took place in July at the National e-Science Centre in Edinburgh, Professor David Gilbert, Director of the Bioinformatics Research Centre at the University of Glasgow, talked about the present challenge for Bioinformatics of closing the gap between computational (in silico) and wet-lab research, and about the growing need for better LIMS to really contribute to R&D in biomedical and life sciences.


Chapter 4

Chip Tracker and the Organon Microarray

Experiment Process

The Organon Microarray Experiment Process

The Organon Microarray Experiment Process utilises four computer systems: ExpAnD and Chip Tracker (the latter now being incorporated), both developed in-house, plus Rosetta Resource Tracker and Resolver, which were developed by Rosetta Biosoftware. With the integration of these four systems and the adherence to standard operating procedures in the laboratories, the Organon Microarray Experiment Process is a MIAME-compliant platform for the storage, normalisation, presentation and publication of data obtained from microarray experiments.

The microarray experiment process requires the input from a multidisciplinary team

of people. The laboratory researchers design their chip experiments with the help of the CSB (Chip Statistics and Bioinformatics team). The researchers are responsible for defining the experimental annotation and generating the RNA for the hybridisations. The

microarray lab workers, who are members of the Chip team, are responsible for

creating labelled, fragmented cRNA from the original RNA and hybridising it to the

chip. The CSB and Bioinformaticians are responsible for analysing the data. The

Bioinformaticians work with the Systems team to maintain the databases and

infrastructure of the platform.

Standard Operating Procedure documents that enable the users and the Chip team to carry out microarray experiments have been published online on the intranet. The procedure steps required for the process are the following (see figure 9):

1) Experiment Design

2) Sample collection, RNA extraction and QC


3) Data entry into ExpAnD, and shipment to the microarray laboratory

4) Chip team process (guided by Chip Tracker and Resource Tracker)

5) Data retrieval from Resolver and Data analysis

Figure 9. Organon Microarray Experiment Process dataflow and systems. The figure shows the systems that participate in the Organon Microarray Experiment Process and their interaction.

1. Experiment Design

Experiment design is a critical step carried out by the researchers and the Chip Statistics and Bioinformatics team (CSB). The final success of the experiment depends strongly on a well-defined design.

2. Sample collection, RNA extraction and QC

After experiment design the samples should be collected from the organism under

study, RNA prepared and quality controlled at the researcher’s laboratory.

3. Data entry into ExpAnD and shipment

In order to describe the details of a chip experiment, the researcher has to use the Experiment Annotation Database (ExpAnD) to record all the details about the RNA samples that will be hybridised to the chips. The ExpAnD process ends with a shipment function that creates a list of XML files (the XML LIMS Queue) with all the detailed information on the samples that have been sent to the microarray laboratory.


4. Chip Team Process (guided by Chip Tracker and Resource Tracker)

The Chip Team at Newhouse receives submissions of samples from the remote and

local research laboratories; both the Chip Tracker and Resource Tracker systems

automatically parse the XML LIMS Queue with all the information of the arriving

samples. We have developed the Chip Tracker XML Parser component, which scans a server directory for new experimental sample data and uploads them into the Chip Tracker Database without requiring an operator's assistance. This process runs every hour, although its interval can be changed, or the process can be stopped and executed manually. This component is explained in greater detail in chapter 6.

The Chip Team performs further QC on the samples and then uses the total RNA to perform cDNA preparation, in vitro transcription, cRNA fragmentation, hybridisation mix preparation and, finally, hybridisation (unlike in two-channel cDNA arrays, a single sample is hybridised to a GeneChip). After hybridisation and scanning, a visual chip QC and an assessment of some QC parameters are performed; data that pass the requirements are transferred by FTP to the Rosetta Resolver server and automigrated into the database. Figure 10 highlights the tasks of the hybridisation process enumerated in this paragraph. These are the actions performed by the microarray lab worker with the guidance of the Resource Tracker system.

Note that in addition to the hybridisation tasks the microarray lab workers have to manage and track the biological samples and chips. These management tasks (now performed using Chip Tracker), together with the hybridisation tasks, make up the microarray laboratory workflow.

5. Data retrieval from Resolver and Data Analysis

Rosetta Resolver is Organon’s main chip data analysis software, and it is the application in which the users will first see their chip data. The data analysis procedures will largely depend on the design of the experiment, and before the actual data analysis to identify genes of interest is started, some thought should be given to the management of the intensity hybridisations in the context of the experiment setup.


Figure 10. Hybridisation Process. Highlights the principles of the standard eukaryotic assay. These are all the steps of the hybridisation process that the microarray lab workers at Organon have to go through in order to finally get the scanned image to be analysed by the bioinformatics department. The process can be summarised as oRNA → cDNA → lcRNA → flcRNA → Hybridisation → Wash and Stain → Scanning [Affymetrix, 1].

At the moment the throughput of the microarray laboratory in the experiment process is about 140 chips per month, with a turnaround time per experiment of about a month. With the help of the recently installed systems, Chip Tracker and Resource Tracker, the aim of the Chip team is to reach a throughput of over 180 chips per month with a turnaround time of less than 2 weeks, limited by the resources.

Further explanation of the functionality of the Resource Tracker, ExpAnD and Resolver systems, their interaction and how the information flows through them is given in the section on integration with existing systems in the next chapter.


Microarray Laboratory Workflow and Chip Tracker Scope

Chip Tracker forms part of an entire platform. This project is the latest to be added

and it completes the data management aspects of the microarray platform. Chip

Tracker was designed to follow the natural workflow of the microarray laboratory

worker. The main features of the Chip Tracker Microarray LIMS are:

• Manage and administer the stock of samples and chips.

• Workflow management.

• Automated microarray experiment data collection and notifying the Chip team

of what chips are required for samples that are en route.

• Accurate tracking of chips and samples.

• Pre-populate and prompt purchase orders of the chips required to perform the

experiments for recently shipped samples.

• Load newly purchased chips into the system.

• Assign chips to samples for hybridisation.

• Log chips with the hybridisation results.

• Create custom reports and statistical analysis on the microarray facility usage.

Experimental information entered by the researcher is captured by the ExpAnD system; it is therefore not part of the microarray laboratory workflow and is outside the scope of the Chip Tracker system.

At the microarray laboratory most of the chip and sample tracking is now done by Chip Tracker, but the tracking of the extracts of the original RNA samples during cDNA preparation, in vitro transcription, cRNA fragmentation, hybridisation mix preparation and the final hybridisation is done with the Resource Tracker system. Although these tasks are part of the microarray laboratory workflow, it was not necessary to include them in the scope of this project. Resource Tracker is an add-on component that integrates fully with the Rosetta Resolver system; as mentioned earlier, it was developed by Rosetta specifically for the Organon microarray laboratory. Resource Tracker interacts directly with the Resolver database and client interface.


Figure 11 describes the Organon microarray laboratory workflow. Apart from the

RNA extractions during the hybridisation process, every step of the workflow is

guided by Chip Tracker. By looking at the figure you will be able to understand how

data flows through the Chip Tracker application.

Figure 11. Microarray laboratory workflow. These are the features covered by Chip Tracker to complete the Organon microarray laboratory workflow.

The following images (figures 12a and 12b) capture the scope of the Chip Tracker

project. Figure 12a was taken from the Chip Tracker Unified Modelling Language

(UML) model we developed in order to document and guide the project design.

Figure 12a. Chip Tracker Use Case Diagram (taken from the Rational Rose Chip Tracker model).


Figure 12b illustrates how Chip Tracker interacts with the rest of the Organon

Bioinformatics infrastructure and what is the scope of activities of the application and

microarray laboratory.

Figure 12b. Chip Tracker Scope and Interaction. The image illustrates the interaction between the microarray platform systems and the activities performed at the microarray lab. Some of the main features of the Chip Tracker system are shown as orange arrows.

The retrieval of data from the scanned image and the subsequent analysis of the microarray data are performed by the Bioinformatics team. Organon has acquired Rosetta Resolver, which is one of the best gene expression data analysis software solutions on the market, to help the bioinformaticians with these tasks.

As Chip Tracker manages, tracks and administers the chips and samples, and Resource Tracker tracks the RNA extracts along the different experimental steps, all the functional requirements of the Organon microarray laboratory workers are fulfilled and systematised.


Chapter 5

Chip Tracker Design and Architecture

The Chip Tracker microarray LIMS was designed to cover the management aspects of

the Organon microarray laboratory workflow and to be integrated with the existing

Bioinformatics infrastructure. Although the individual components of the system are

connected with specific laboratory environments, some general principles have guided

the project design. These include compliance with the current microarray software communication standards and the use of an industry standard relational database management system combined with a platform-independent web browser interface for data entry and retrieval. Chip Tracker is a web application that was designed using a

three-tier server model. A carefully designed user friendly web interface allows the

lab workers to gain password-protected access to the system remotely. The

functionality provided by Chip Tracker will help them to manage and perform the

steps in the microarray laboratory workflow.

Chip Tracker has a three-tier structure composed of a Presentation tier, a Middle (Application) tier and a Data Management tier. The Presentation tier, which is the client-side tier, is the Graphical User Interface (GUI): an HTML-based visual display generated dynamically by Active Server Pages (ASP) that serves as the portal for the lab worker to interact with the application. The Middle tier acts as an application server, implementing the microarray experiment workflow logic in ASP and accessing the Chip Tracker Database. A Visual Basic component called Chip Tracker XML Parser, which automates the collection of data on the samples being sent to the laboratory, is also included in this tier. Finally, the Data Management tier runs the Oracle 8i Relational Database Management System (RDBMS), in which we have designed and implemented the Chip Tracker Database, a simple database schema that models the microarray laboratory workflow.


The Chip Tracker application was designed with a dual purpose: it can be used as a

stand-alone system or it can be fully integrated with the existing Organon

infrastructure. The GEML data exchange format, a standard for communicating

information about microarray experiments, was used to integrate Chip Tracker with

the existing Organon systems.

Figure 13. Chip Tracker three-tier architecture diagram.

The decision to create a three-tier architecture, which moves some of the

processing out of the server into the clients and has a separate data access layer, was

based on the following:

• Centralised management of the application, that can be reused or easily modified.

• Concentration and distribution of requests from the client, so the performance is

better and the application is scalable.

• Flexible hardware architecture, allowing all three layers to be separately distributed

or replicated if required.


• Integration of existing systems. The Chip Tracker XML Parser implemented in the

middle layer provides access to other microarray applications and hides

complexity.

• The middle layer offers a transparent access to underlying systems and contains

inter-system functionality.

• The middle layer also takes care of locating resources, accessing them and

gathering results.

This architecture enabled us to develop a more sophisticated application; its only disadvantage is that the resulting software can be more complex and more difficult to understand.

The choice of technologies used in this project is based on the Organon technical

standards and is a result of a common agreement between ourselves, Organon’s

Bioinformatics and Software Development teams. The Chip Tracker was the first

developmental project conducted in this way.

Presentation Tier (Front End)

This layer represents the primary interface to the user. Special attention was paid to designing a user friendly Graphical User Interface (GUI), suitable for biologists

working at the microarray laboratory. The microarray laboratory workers had constant

input throughout the life cycle of the project to ensure that the system met their user

requirements.

Another advantage of the Chip Tracker three-tier architecture is the implementation of

what is known as Thin Client, which means that the clients implement only the

graphical user interface leaving the server side with the implementation of application

logic and the data management. A thin client makes the installation of the system

simpler because there is nothing to be installed on the client computer; a web browser

is all that is needed. This distribution allowed us to use the computing power at the client for sophisticated presentation while keeping most of the control on the application server side, which improved performance, control, maintenance and integration.

The technology used to create the GUI includes:

• Hypertext Markup Language pages (HTML) which are generated dynamically by

ASP (Active Server Pages are dynamically processed by the web server before

being sent to the client),

• Hypertext Transfer Protocol GET and POST requests (HTTP),

• Uniform Resource Locator encoding (URL),

• JavaScript,

• Stylesheets.

The use of HTML forms is the most common way to communicate data from the presentation tier to the middle tier. Arguments can be passed using HTTP GET, where the values of the form fields are encoded in the URL and are therefore visible, or HTTP POST, which sends the form information in the body of the request rather than in the URL. The latter method, POST, is the one mostly used by Chip Tracker to send information to the server or between pages.
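The difference between the two methods can be illustrated with a minimal sketch. It is written in Python rather than in the ASP and HTML of the actual system, and the URL, the form field names and their values are assumptions made only for illustration. No request is sent; the code only builds the two request objects to show where the encoded fields end up.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

# Hypothetical form fields and page URL (not taken from the Chip Tracker pages).
form = {"chip_type": "HG-U133A", "quantity": "5"}
base_url = "http://chiptracker.example/order.asp"

# Both methods use the same URL encoding of the form fields.
encoded = urlencode(form)

# GET: the encoded fields become part of the URL itself.
get_request = Request(base_url + "?" + encoded, method="GET")

# POST: the URL stays clean and the encoded fields travel in the request body.
post_request = Request(base_url, data=encoded.encode("ascii"), method="POST")

print("GET URL: ", get_request.full_url)
print("POST URL:", post_request.full_url)
print("POST body:", post_request.data)
```

With GET the field values appear in the address bar, browser history and server logs; with POST they do not, although the body is still sent as plain text unless the connection itself is encrypted.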

JavaScript embedded inside the HTML pages is used to add further functionality to the presentation tier. This functionality includes the validation of form input fields and browser control, such as the display of pop-up alert messages or the control of the characters entered into the system from the client’s keyboard (e.g. allowing only number keys while entering a numeric field). By utilising client-side JavaScript in this way, the communication between the client machine and the web/application server is minimised and the data type constraints of the database fields are enforced. In order to give the Chip Tracker interface the Organon look and feel, especially for the fonts and colours, we created a Cascading Style Sheet (CSS) that defines how the system's HTML documents are displayed.

Many technological innovations rely upon User Interface Design to elevate their

technical complexity to a usable product. Technology alone may not win user

acceptance and subsequent usability. The User Experience, or how the user

experiences the end product, is the key to acceptance. By following a Prototyping


approach and performing User Usability Testing throughout the design process we

were able to ensure a user optimised interface. This empirical testing permitted the

users to provide data about what does work as anticipated and what does not work.

Figure 14. Interface features. The image illustrates some of the interface features and their explanations. These features have been specifically designed to make Chip Tracker a user friendly tool that guides the user along the microarray laboratory workflow.

A good User Interface Design can make a product easy to understand and use, which

results in greater user acceptance and facilitates the system usage and incorporation in

microarray laboratories. From the User Acceptance Test (described in chapter 8) we received good feedback about the interface, to the point that the users claimed that they had no need to use the help pages in order to understand how to use the system. Data entry fields and validations behave automatically according to the laboratory workflow. Features implemented to guide the user along the different pages of the system, shown in figure 14, also include inhibited field control, specific default values and entry data control. Most of these features have been implemented in JavaScript code.


As mentioned earlier the presentation tier acts as a portal to allow the users to store,

retrieve and analyse the data in the Chip Tracker Database. The following image

represents the site map of the Chip Tracker interface.

Figure 15. Chip Tracker interface sitemap.

The functionality and information presented in the Chip Tracker interface are detailed

in chapter 7, where every page of the system is described in the framework of the

microarray laboratory workflow.

Middle Tier

The main functionality of the Chip Tracker application is implemented in this tier,

which acts as an application server. This includes:

• Encoding of the microarray laboratory workflow logic in ASP being administered

by Internet Information Server (IIS web server).

• Connection to the Chip Tracker Oracle 8i database.

• Automatic sample data collection from the ExpAnD system.

• Acceptance of form input from the GUI in the presentation tier.

• Generation of output pages and reports for the presentation tier.

The Chip Tracker workflow logic is encoded in ASP pages. The ASP pages use both VBScript, which stands for Visual Basic Script, and JavaScript to provide application logic on the server and client sides respectively. The VBScript scripting components implement the server-side processing of the page (i.e. the page is dynamically processed by the web server before being sent to the client), whereas JavaScript implements the client-side processing. The ASP pages work in the following way: first, form data is posted via HTTP or via a URL from the client browser to an ASP page on the application server. The ASP script is then executed via ASP.DLL on the web server and the database access is performed on the database server; finally, a formatted HTML page including the results of the script execution is sent back to the client browser.

Connection to the Chip Tracker Database is carried out using standard Structured Query Language (SQL) commands that are passed to an Open Database Connectivity (ODBC) driver. ODBC is a standard interface, developed by Microsoft Corporation, for accessing information in SQL database servers. The database driver is installed in the middle layer, between the application interface and the database. Its purpose is to translate the application’s data queries into commands that the RDBMS understands. Both the application and the RDBMS must be ODBC-compliant for this protocol to work.

The Chip Tracker system accesses its database through two different types of queries. Static queries are used for standard operations such as checking chip availability or inserting arriving chips into the corresponding tables. Other queries are created on the fly, depending on the conditions selected by the user, for example to build a custom report. All accesses to the database are carried out using SQL queries that are created by or embedded in the ASP pages.
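The two kinds of query can be contrasted in a short sketch. It is written in Python with an in-memory SQLite database standing in for Oracle 8i accessed through ODBC, and the column names, the 'assigned' status and the chip type values are assumptions for illustration, not the actual Chip Tracker schema. The static query has fixed SQL text; the dynamic one is assembled from whichever report filters the user selected, with column names checked against a fixed list and values passed as parameters.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle 8i; columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Chips (chip_id INTEGER PRIMARY KEY, chip_type TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO Chips (chip_type, status) VALUES (?, ?)",
    [("typeA", "standby"), ("typeA", "assigned"), ("typeB", "standby")],
)

# Static query: the SQL text never changes, only the parameter value does.
AVAILABLE_SQL = "SELECT COUNT(*) FROM Chips WHERE chip_type = ? AND status = 'standby'"

def available_chips(chip_type):
    """Return how many unassigned chips of the given type are in stock."""
    return conn.execute(AVAILABLE_SQL, (chip_type,)).fetchone()[0]

# Dynamic query: the WHERE clause is built from the filters chosen by the user.
ALLOWED_COLUMNS = {"chip_type", "status"}

def build_report_query(filters):
    """Assemble the SQL text and parameter list for a custom report."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown report column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT chip_id, chip_type, status FROM Chips"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY chip_id", params

def run_report(filters):
    """Run a custom report and return its rows."""
    sql, params = build_report_query(filters)
    return conn.execute(sql, params).fetchall()

print(available_chips("typeA"))
print(run_report({"status": "standby"}))
```

Passing the values as parameters instead of concatenating them into the SQL string is a design choice of this sketch; the text does not say how the ASP pages build their query strings.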

The implementation of Chip Tracker uses Internet Information Server (IIS) as a web server to administer the ASP and HTML pages. The use of IIS and ASP makes it easy to access data and put it on a web page. Chip Tracker uses ASP to make decisions about what to display on the interface web pages.

In order to collect the sample information stored in the ExpAnD system automatically, a small Visual Basic application named Chip Tracker XML Parser was developed and installed on the application server. The XML Parser automatically scans a server directory for new experimental sample data. If a new data file is found, it is uploaded into the Chip Tracker Database without requiring an operator’s assistance. This component is discussed in greater detail in the next chapter.
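The scanning step can be sketched as follows. This is a minimal Python illustration, not the Visual Basic component itself: the function name, the file names and the way already loaded files are remembered (an in-memory set of names) are all assumptions. A scheduler would call the function at the configured interval, and the callback stands in for parsing the file and writing it to the database.

```python
import tempfile
from pathlib import Path

def scan_queue(queue_dir, already_loaded, load_file):
    """Load each .xml file in queue_dir that has not been loaded yet.

    already_loaded is a set of file names seen so far; load_file is a
    callback that would parse the file and insert it into the database.
    Returns the names of the files loaded during this pass.
    """
    newly_loaded = []
    for path in sorted(Path(queue_dir).glob("*.xml")):
        if path.name in already_loaded:
            continue  # uploaded during an earlier pass
        load_file(path)
        already_loaded.add(path.name)
        newly_loaded.append(path.name)
    return newly_loaded

# Demonstration: a temporary directory stands in for the server directory.
loaded_names = set()
uploaded = []  # stands in for the rows written to the database

with tempfile.TemporaryDirectory() as queue:
    Path(queue, "sample_001.xml").write_text("<sample/>")
    Path(queue, "notes.txt").write_text("not a sample file")
    first_pass = scan_queue(queue, loaded_names, lambda p: uploaded.append(p.name))

    # A new file arrives before the next scheduled pass.
    Path(queue, "sample_002.xml").write_text("<sample/>")
    second_pass = scan_queue(queue, loaded_names, lambda p: uploaded.append(p.name))

print(first_pass, second_pass)
```

In a long-running deployment the set of loaded names would have to be kept somewhere persistent, or processed files moved out of the directory, so that a restart does not upload the same samples twice.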


Database Management Tier (Back end)

The Chip Tracker Database was designed to track Chip and Sample objects. The complete information about the microarray experiment is replicated in the ExpAnD and Rosetta databases, so only the information necessary to manage and track the shipped samples is imported by the XML Parser into the Chip Tracker Database. This, in addition to the information on the chips in stock (available chips in the microarray lab) and the chips being ordered, is all that is necessary to track the state of the chips and samples and manage the pairing of both. The resulting database schema is simple and contains 6 tables that model the flow of data and the changes of state of the samples and chips during the microarray laboratory workflow.

Organon uses Oracle as the RDBMS of choice. The Chip Tracker Database has been implemented in Oracle 8i, a powerful and robust relational database. However, the system does not rely on any Oracle-specific functionality, so the schema can be ported easily to a variety of SQL-compliant databases. This, in addition to its architecture and simple database structure, makes Chip Tracker suitable for use as a stand-alone application.

Oracle 8i is ODBC-compliant. Queries embedded in the ASP pages and queries created on the fly are passed through the ODBC driver into the Oracle RDBMS. Once the query is resolved, the answer is passed back as recordsets. The recordsets of table rows are then processed in the ASP to perform some functionality or to be presented as a report to the user.


Figure 16. Chip Tracker Database schema. The database design was done in UML using Rational Rose to guide the design process.

The Samples table stores the data necessary to track the samples being shipped to the microarray lab. When the XML Parser uploads the sample .xml file into the Chip Tracker Database, a new record is created for every chip type required for hybridisation; this information is specified in the sample data file. The default value for the sample status is ‘standby’, with a chip_ID attribute of null, which means that the sample has not been assigned to a chip for hybridisation.
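The creation of one record per required chip type can be sketched as follows. The XML element and attribute names here are invented for illustration and are not those of the GEML 2.2 DTD used by the real sample files; the record is shown as a Python dictionary rather than a database row.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file: NOT the real GEML structure.
SAMPLE_XML = """
<sample id="S-001">
  <chip_type alias="typeA"/>
  <chip_type alias="typeB"/>
</sample>
"""

def sample_records(xml_text):
    """Return one Samples record per chip type listed in the sample file."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "sample_id": root.get("id"),
            "chip_type": chip.get("alias"),
            "status": "standby",  # default: waiting in the lab
            "chip_id": None,      # null until a chip is assigned
        }
        for chip in root.findall("chip_type")
    ]

records = sample_records(SAMPLE_XML)
for record in records:
    print(record)
```

A sample that must be hybridised to two chip types therefore appears twice in the Samples table, and each record can later be paired with a chip independently.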

The Chips table has the information on all the chips in the microarray laboratory, available or not, assigned to samples or not, used or not. The purchased chips arrive in the lab in packs of five identical chips and are inserted into the Chips and Batches tables by the Chips Arrival functionality of the system. All the chips in a pack belong to the same fabrication batch. As with the arrived samples, the chips' default value for status is ‘standby’ with a sample_ID of null, which means that the chip is not assigned and is therefore available. The information inserted in the Batches table is rarely modified and keeps a record of the total number of chips that enter the lab. This table is only updated if the lab workers find physical damage on the arrived chips and decide to delete those chips from the batch to send them back to the supplier. Packs of chips with the same batch number can arrive in different shipments; what is certain is that packs with the same batch number will be of the same chip type and will have the same expiry date. To track the different batches delivered to the laboratory, the Batches table key consists of the batch_ID, which is the batch number, and the arrival date.
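The composite key and the default chip state can be sketched with a small schema. SQLite stands in for Oracle 8i here, and apart from batch_ID, sample_ID and status the column names, types and example values are assumptions rather than the actual Chip Tracker schema shown in figure 16.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Batches (
    batch_id     TEXT NOT NULL,
    arrival_date TEXT NOT NULL,
    chip_type    TEXT NOT NULL,
    expiry_date  TEXT NOT NULL,
    chip_count   INTEGER NOT NULL,
    PRIMARY KEY (batch_id, arrival_date)   -- same batch may arrive on several dates
);
CREATE TABLE Chips (
    chip_id      INTEGER PRIMARY KEY,
    batch_id     TEXT NOT NULL,
    arrival_date TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'standby',
    sample_id    TEXT DEFAULT NULL,         -- null means not assigned
    FOREIGN KEY (batch_id, arrival_date)
        REFERENCES Batches (batch_id, arrival_date)
);
""")

PACK_SIZE = 5  # chips arrive in packs of five identical chips

def register_arrival(batch_id, arrival_date, chip_type, expiry_date, packs):
    """Insert one Batches row and one Chips row per chip in the delivery."""
    count = packs * PACK_SIZE
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO Batches VALUES (?, ?, ?, ?, ?)",
            (batch_id, arrival_date, chip_type, expiry_date, count),
        )
        conn.executemany(
            "INSERT INTO Chips (batch_id, arrival_date) VALUES (?, ?)",
            [(batch_id, arrival_date)] * count,
        )

# The same batch number delivered in two shipments on different dates.
register_arrival("B100", "2003-07-01", "typeA", "2004-01-01", packs=1)
register_arrival("B100", "2003-07-15", "typeA", "2004-01-01", packs=2)

available = conn.execute(
    "SELECT COUNT(*) FROM Chips WHERE status = 'standby' AND sample_id IS NULL"
).fetchone()[0]
print(available)
```

Because the key combines the batch number with the arrival date, the second shipment of batch B100 is stored as a separate row instead of colliding with the first.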

The OrderedChips table keeps a record of chips that have been purchased but not yet delivered to the lab, i.e. they are en route. This table holds a record for every chip type with the number of chips of that type expected to arrive, the date of the last order made and the user_ID of the person who made the order. Every time a request for a Purchase Order of chips is issued by the Chip Tracker system, the information in the OrderedChips table is updated. In the same way, every time a batch of chips is loaded into the system, the number of chips loaded is subtracted from the number of chips ordered in the OrderedChips table.
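This bookkeeping can be sketched as two small operations on an OrderedChips record. The sketch is in Python with the record held in memory; the field and function names, and the choice to stop at zero when more chips arrive than were recorded as ordered, are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class OrderedChips:
    """One record per chip type, mirroring the description of the table."""
    chip_type: str
    amount_expected: int = 0
    last_order_date: Optional[date] = None
    user_id: Optional[str] = None

def record_purchase_order(row, amount, order_date, user_id):
    """A purchase order adds to the chips expected and notes who ordered and when."""
    row.amount_expected += amount
    row.last_order_date = order_date
    row.user_id = user_id

def record_arrival(row, amount_arrived):
    """Loading a batch subtracts the arrived chips from those still expected."""
    # Assumption: never go below zero if more chips arrive than were ordered.
    row.amount_expected = max(0, row.amount_expected - amount_arrived)

row = OrderedChips("typeA")
record_purchase_order(row, 10, date(2003, 7, 1), "user1")
record_arrival(row, 5)
print(row)
```

After an order of ten chips and the arrival of one pack of five, five chips of that type remain en route.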

The Chiptypes table is a lookup table with the information of the chip types available

from Affymetrix for researchers designing microarray experiments. This table is a

replica of the Chiptypes table in the ExpAnD system. If a new chip type appears on the market, a manual process run by the ExpAnD administrator will update both tables. Because the chip type alias is encoded within the sample information being exported into the system, both tables must have exactly the same records.

Finally, the ChipTeamUsers table stores the usernames/passwords and other details of

the microarray lab users for authentication. The table also records some of the tasks

performed by the lab users such as purchase of chips or log of the hybridised chips.

Every page of the system that is accessed is authenticated using the data available in

this table.

Integration with existing Systems

ExpAnD

ExpAnD is Organon’s Experimental Annotation Database. ExpAnD was built to enable scientists to enter their experimental annotation and sample processing instructions into one centralised resource. The experimental annotation captured in ExpAnD describes the majority of the information covered in the Experimental Description section of MIAME.

Users of ExpAnD ship the actual mRNA samples to the microarray laboratory at Newhouse for processing. The ExpAnD system contains a shipment section that deals with this process. The Shipment process prompts users to enter the samples that they wish to send to the Chip Team. On submitting the request, a list of .xml files in Rosetta’s GEML 2.2 DTD format, one for each of the samples, is produced and transferred by FTP to the Chip Tracker and Resolver server, creating an XML LIMS Queue which is used to populate the respective databases.

Rosetta Resource Tracker

The Resource Tracking add-on to Rosetta Resolver allows the Chip Team users to

track a resource, such as a particular type of RNA extract. The user enters information

such as resource type, volume, concentration, and carrier. The add-on automatically

assigns a storage location for the resource and the user provides confirmation that the

resource is deposited in the specified location. The user indicates whether a quality

control check (QC) is to be performed on the resource and if so, the user enters data

pertaining to the QC results. In the Organon microarray laboratory workflow the Chip

Team does QC on every sample arriving into the lab.

Rosetta Resource Tracker, which is part of Resolver, collects the same biological

sample information entered in ExpAnD in the same way that Chip Tracker does, by parsing a duplicate of the XML LIMS Queue. This in effect means that Resource Tracker and

Chip Tracker are pre-populated with the same basic data associated with the mRNA

samples arriving at Newhouse, thus saving the Chip Team from having to actually

enter this data. When the sample actually arrives the Chip Team is able to ‘Receive’

the sample, which is assigned a storage location by Resource Tracker.

Rosetta Resolver

The Rosetta Resolver® Analysis System is a commercial software package created by Rosetta

Biosoftware. It provides a solution for analysing large quantities of expression data

generated by any of the major microarray technologies. Resolver combines advanced

analysis software, a high capacity database, and high-performance server hardware to

enable users to store, retrieve and analyse large volumes of gene expression data.

Chip Tracker Deployment

Although Chip Tracker can be used by multiple users simultaneously, it has not been

developed for that purpose, and as a single user application its server requirements

are not demanding.

Figure 17. Chip Tracker and Organon bioinformatics infrastructure deployment.

Note that if Chip Tracker is used as a stand-alone application it is possible to install

the client and server components on the same server computer. The following is a list

of the minimum computer requirements to install Chip Tracker:

Minimum Client

Any Windows operating system (95/98/ME/2000/NT/XP)

Internet Explorer 5.0 or higher

Minimum Application Server

Any Windows Server operating system (2000/NT/XP)

Microsoft Internet Information Server (IIS) 4.0 or above

Microsoft ODBC for Oracle version 2.5

Microsoft Data Access Components (MDAC) 2.7

Minimum Database Server

Oracle 8i RDBMS

Chapter 6

Chip Tracker XML Parser and the Microarray

Experiment Standards

MIAME, MAGE and other Standards

With the proliferation of microarray databases, there is a growing appreciation for the

importance of analyses across experiments and the need for well-documented

repositories. The Microarray Gene Expression Database (MGED) group

(www.mged.org) is a grass-roots movement to promote the adoption of standards in

microarray experiments and data. MGED developed requirements for the Minimum

Information About a Microarray Experiment (MIAME) to ensure that microarray data

can be easily interpreted and the results derived from the analysis can be

independently verified [Brazma et al., 2001]. Microarray papers submitted to

scientific journals are now required to comply with MIAME standards and provide

supplementary information.

MIAME should also prompt microarray manufacturers and software producers to

develop adequate microarray laboratory information management systems (LIMS),

enabling the production and capture of MIAME-compatible primary data at the bench.

In many cases, it is expected that most of the MIAME information will be recorded

through local LIMS software before being uploaded into central archiving using a

standard format. As such, the development of such MIAME-friendly LIMS software

will be an important task.

The Microarray And Gene Expression Markup Language (MAGE-ML), a data

exchange format for communicating information about microarray experiments between

local laboratory databases, central archives and stand-alone analysis packages, and an

object model (MAGE-OM) have been developed for the MGED group. MAGE-ML is

based on XML and can be used to describe microarray designs, microarray

manufacturing information, microarray experiment setup and execution information,

gene expression data and data analysis results. Rosetta Biosoftware were involved in

the definition of these standards.

A predecessor of MAGE is the Gene Expression Markup Language (GEML) that was

submitted to the Object Management Group (OMG) by Rosetta as a proposed

standard for Gene Expression data in November 2000, along with a proposal from

EBI, which consisted of MGED’s Microarray Markup Language (MAML), and a

CORBA-based proposal from NetGenics. The three submitters decided to work together

on a joint revised submittal that has become the basis for the Microarray and Gene

Expression Data (MAGE) UML model and DTD (Document Type Definition). As

such, MAGE is now the proposed standard being submitted by Rosetta and EBI (for

MGED) to the OMG. Rosetta Resolver still supports GEML but the current

standardisation effort is focused on MAGE.

GEML is the communication standard used by Chip Tracker in order to automatically

read the microarray experiment setup and execution information submitted on the

ExpAnD system.

The use of standards is essential to manage microarray data. The institution and

adoption of common standards will be of immediate benefit to researchers, scientific

journals and those developing data management systems and tools for data analysis,

and presents a major step toward making such discoveries a reality [Stoeckert et al.,

2002].

Chip Tracker XML Parser and the Gene Expression

Markup Language (GEML)

To explain the Chip Tracker XML Parser we should start by describing what the XML

format is. XML stands for Extensible Markup Language, which is a set of rules

whereby new vocabularies may themselves be defined. In some respects it is similar

to HTML, in that tags are used to encode information, but in HTML the information is

related to the formatting of a document, using a predefined set of tags. In XML, the

tags do not indicate how a document should be formatted, but instead provide

semantic context to the content of the document. XML vocabularies define their own

tags, and thus use XML to hold information in such a way that the information can be

understood. Because of this, and the wide support that XML has received since its

release as a W3C recommendation in 1998, both GEML and MAML chose XML for

encoding microarray data. Usually an XML document is not a stand-alone document,

but will refer to another document, called the document type definition, or DTD. The

DTD contains a set of rules, or declarations, that specify which tags can be used, and

what they contain. It is the DTD that is specified in GEML and MAGE-ML. XML

documents created to use GEML will refer to this DTD [Spellman et al., 2002].

The Gene Expression Markup Language (GEML) is a free, public-domain, open-

standard XML DTD (Document Type Definition) for the common expression of

genetic information for storing DNA microarray and gene expression data. The

GEML enables data exchange between a variety of gene expression systems including

web-based genome databases. The GEML format has the following advantages: (1)

Independent of any particular database schema. (2) Keeps track of which data

collection methodology was used, enabling normalisation, integration, and

comparison of data across methodologies. (3) Extensible through the ability to specify

additional name/value pairs. (4) Is XML-based. GEML was created and is licensed in

order to define a single, distinct GEML format and avoid proliferation of incompatible

variations [Hoffman, 2000].

It was decided to develop an XML parser to load biological samples data from any

GEML compliant software into the Chip Tracker Database because the ExpAnD

system was already interacting with Rosetta Resolver version 3.0, which supports

GEML. The process begins with ExpAnD creating an .xml file per sample submitted

to the microarray lab. These files are then uploaded into a remote server directory via

ftp creating the XML LIMS Queue. Finally, the XML Parser component polls the

queue directory looking for files. Every .xml file (GEML22.dtd compliant) found in

this directory is automatically parsed and imported into the Chip Tracker Database.

The XML Parser is a Visual Basic executable developed by us which is launched by

the application server hourly. The system administrator can also run or stop the parser

manually on request.

Every .xml file encodes a single sample, its experiment details and one or more chip

type codes required for hybridisation. The file name corresponds with the sample

code (the same code printed on the biological sample tube) in the ExpAnD database

and subsequently in the Chip Tracker database. As mentioned earlier, the complete

information about the microarray experiment is replicated in ExpAnD and Rosetta

databases, therefore only the necessary information to manage and track the chips and

samples is imported by Chip Tracker. From the parsed information only the following

attributes are inserted as a new record in the Samples table: sample_id, chiptype_id,

shipment_id, labbooknumber, labbookpage, experiment, project, projectgroup,

responsible and shipmentdate. If the sample file that has been parsed requires (in its

specification) hybridisation to three different chip types, then three different sample

records (one per chiptype required) will be inserted. For example, the file 000702.xml

encodes the 000702 sample information of a particular rat studied at the lab. The

researchers want to investigate the expression of every gene at the moment the sample

was taken, so they require the sample to be hybridised to the whole Rat genome,

which at the moment comprises three GeneChip arrays (Rat Genome U34A, U34B

and U34C). The Chip Tracker XML Parser will parse the file and add three new

records on the Samples table with the corresponding sample_id-chiptype key.
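
The one-record-per-chip-type rule described above can be sketched as follows. This is a minimal Python illustration and not the Visual Basic parser itself: the function name and the dictionary layout are assumptions made for the example, and only the attribute names sample_id, chiptype_id and experiment come from the Samples table described above.

```python
def expand_sample_records(sample, chip_types):
    """Return one Samples-table record per requested chip type."""
    records = []
    for chip_type in chip_types:
        record = dict(sample)              # copy the attributes shared by all records
        record["chiptype_id"] = chip_type  # each record carries a single chip type
        records.append(record)
    return records


# Sample 000702 from the example above; the experiment name is a placeholder.
sample = {"sample_id": "000702", "experiment": "example experiment"}
rat_genome_set = ["Rat Genome U34A", "Rat Genome U34B", "Rat Genome U34C"]
records = expand_sample_records(sample, rat_genome_set)

# The (sample_id, chiptype_id) pair identifies each of the three records.
keys = [(r["sample_id"], r["chiptype_id"]) for r in records]
```

Copying the shared attributes into each record keeps every sample-chip pairing independent, which matches the way each pairing is later assigned and hybridised on its own.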

After a file is parsed it is then moved to a Processed samples folder. This is done both

for backup and for error recovery reasons. If a sample has been deleted from the

system by mistake then it can always be loaded back by placing the corresponding

sample file back in the XML LIMS Queue directory. In this way the system is

independent and there is no need to resubmit the sample data from the ExpAnD

system. If an error occurs while parsing a file, the XML Parser creates an error log

text file describing the problems found. The log file will have the present date for a

name and it will be saved in the Processed samples folder to be read by the system

administrator.
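
The queue handling described in this section can be outlined as follows. This is a hedged sketch in Python rather than the actual Visual Basic executable: process_queue and the caller-supplied import_file function are hypothetical names, the ".txt" extension of the log file is an assumption, and leaving a file that failed to parse in the queue is also an assumption, since the text above only states that parsed files are moved.

```python
import datetime
import shutil
import tempfile
from pathlib import Path


def process_queue(queue_dir, processed_dir, import_file):
    """Import every .xml file found in the queue directory.

    import_file is a caller-supplied (hypothetical) function that parses one
    file and writes its records to the database, raising on failure.
    Returns the names of the imported files and of the files that failed.
    """
    queue_dir, processed_dir = Path(queue_dir), Path(processed_dir)
    processed_dir.mkdir(parents=True, exist_ok=True)
    # The error log is named after the present date and kept with the processed files.
    log_path = processed_dir / (datetime.date.today().isoformat() + ".txt")
    imported, failed = [], []
    for xml_file in sorted(queue_dir.glob("*.xml")):
        try:
            import_file(xml_file)
        except Exception as exc:
            failed.append(xml_file.name)
            with log_path.open("a") as log:
                log.write(f"{xml_file.name}: {exc}\n")
            continue  # assumption: a file that failed stays in the queue
        imported.append(xml_file.name)
        # Parsed files are moved away for backup and error recovery.
        shutil.move(str(xml_file), str(processed_dir / xml_file.name))
    return imported, failed


def demo():
    """Run the queue once on two throw-away files, one of them malformed."""
    with tempfile.TemporaryDirectory() as tmp:
        queue, processed = Path(tmp) / "queue", Path(tmp) / "processed"
        queue.mkdir()
        (queue / "000702.xml").write_text("<sample/>")
        (queue / "000703.xml").write_text("not xml")

        def fake_import(path):
            if path.read_text() != "<sample/>":
                raise ValueError("malformed file")

        imported, failed = process_queue(queue, processed, fake_import)
        moved = sorted(p.name for p in processed.glob("*.xml"))
        return imported, failed, moved
```

Restoring a deleted sample then amounts to copying its file from the processed folder back into the queue directory and letting the next run pick it up.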

The specific experiment information that is imported from ExpAnD is encoded under

the LIMS category in GEML version 2.2. The complete specification of the GEML

DTD is provided by Rosetta Biosoftware (http://www.rosettabio.com/tech/geml). An

example of an .xml file exported by ExpAnD, the GEML LIMS Category

specification tree and an SQL command showing the import can be found in

Appendix A.

AGAVE and BSML are other open XML data standards created to facilitate the

interchange of data from diverse technologies, but as mentioned in the previous

section the current standardisation effort is focused on MAGE. Rosetta announced

that the next version of Resolver, expected for 2004, will also be MAGE compliant.

The implemented microarray data exchange format standard that allows interaction

with any other system following the standard, in addition to the three tier architecture,

makes the Chip Tracker a flexible and versatile piece of software.

Chapter 7

Understanding Chip Tracker Features

The application includes many advanced features to help the user perform the

management and tracking of the biological samples and chips in the laboratory

workflow. The following are the Chip Tracker features described in this chapter:

• Secure Logging into the system

• Management and Tracking of Chips and Samples in Stock

• Pre populated prompt of Purchase Order of the necessary Chips

• Loading Chips on the Arrival to the laboratory

• Assignation of Chips to Samples

• Log results of Hybridised Chips

• Create custom Reports and Statistics

It is helpful to understand how the data flows and is modified through the application

from the moment that it is uploaded. The following microarray laboratory workflow

illustration is represented with the order and complete name of the Chip Tracker

features:

Figure 18. Order of the Chip Tracker features shown on the microarray laboratory workflow.

Logging On and Off

One of the advantages of the Thin Client structure of Chip Tracker is that it only has

to be installed on the server side. To use the system, the user therefore only needs

to obtain an assigned user name and password from the system administrator and

have a web browser installed on his/her computer. To log into the system the user will

navigate to the corresponding Chip Tracker URL, also assigned by the system

administrator, and enter the user name and password. The Login page authenticates the

data consulting the ChipTeamUsers table in the database and starts a session

instantiated with the user details. Once the user is logged in, the Chips in Stock page

will appear as a default. The top right corner of the screen will always display the date

and the name of the user logged.

Figure 19. Snapshot of the Login page.

Chip Tracker not only authenticates the user when logging into the system, but also on

every page accessed, making a more secure system. The application state or session,

which holds the logged user information, is controlled by hiding the information

within the dynamically created web pages. In this way some of the actions taken by

the microarray lab worker, such as logging hybridised chips, are registered under the

user’s name for further controls or reports. After 30 minutes of inactivity the system will log

out the user for security purposes.
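
The inactivity rule can be expressed as a simple check made on every page request. The sketch below is only illustrative: it is written in Python rather than in the active pages of the application, and the function and constant names are assumptions.

```python
from datetime import datetime, timedelta

# Inactivity period after which the user is logged out.
SESSION_TIMEOUT = timedelta(minutes=30)


def session_is_active(last_activity, now):
    """Return True while fewer than 30 minutes have passed since the last request."""
    return now - last_activity < SESSION_TIMEOUT


last_request = datetime(2003, 9, 1, 10, 0)
still_active = session_is_active(last_request, datetime(2003, 9, 1, 10, 29))
timed_out = not session_is_active(last_request, datetime(2003, 9, 1, 10, 30))
```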

Most tasks in the microarray laboratory workflow are executed by a single

worker. It is therefore this person who annotates the action taken or results in the Chip

Tracker or Resource Tracker systems depending on the stage of the process.

Chips in Stock

The principal requirement and purpose of this system for the Chip Team was the

management and tracking of Chips being used in the microarray laboratory. The system

default page is ‘Chips in Stock’, and it shows the actual availability and requirements

of Chips in the microarray lab. The information displayed is the result of various SQL

commands embedded in the active page. New Chips that arrive are available for

assignation to a particular experiment. Chips loaded into the system will originally

appear with a ‘standby’ status. If a Chip has been used for a hybridisation, it will be

logged with a ‘passed’ or ‘failed’ status. Then the Chip is not in ‘standby’ status

anymore, and therefore will not be counted as a Chip in stock. The ‘Chips in Stock’

page shows only the batches of Chips which are available for hybridisation. The data

is grouped by Chip type; at the moment Organon works with 13 different chip types.

The following information is presented for every Chip type:

Batch Number: Every batch of Chips Affymetrix delivers has a factory number

printed on the Chips for future identification. The same batch may arrive with

different deliveries, but will always have the same expiry date.

Total: This is the total amount of Chips that were originally in the batch at the

moment of arrival.

In Stock: Refers to the amount of Chips available in ‘standby’ status, i.e. those

chips which have not yet been assigned to a Sample.

Special: Chips marked as special refers to those chips which have been reserved for

a specific experiment by the Chip team. Chips can only be marked as special

through the ‘special assignation’ option in the ‘Assign Chips to Samples’ page of the

system.

Assigned: This is the total number of Chips that are assigned to Samples but are on

‘standby’ status. This may include chips which are in the process of being

hybridised or those chips which have been hybridised, but have not yet been logged

as ‘passed’, ‘failed’ or ‘fail & redo’ in the ‘Log of Hybridised Chips’ page.

Ordered: Refers to the total number of Chips that were included in Purchase Orders

but have not yet arrived into the microarray laboratory.

To be Ordered: The number of Chips that have to be purchased to accommodate the

number of Samples in stock as populated from ExpAnD. These Chips will be

automatically included in the next Request for Purchase Order. The number of Chips

to be ordered is calculated by counting the amount of Chips needed to pair the samples

in stock having already counted the chips available in stock and the chips already

ordered.
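
For a single chip type, the calculation just described can be written out as below. This is a sketch under stated assumptions: the function and parameter names are hypothetical, samples_waiting is taken to mean the samples in stock that still need a chip of this type, and chips_in_stock the chips of this type that are not yet assigned.

```python
def chips_to_be_ordered(samples_waiting, chips_in_stock, chips_ordered):
    """Chips still to purchase for one chip type (never negative)."""
    shortfall = samples_waiting - chips_in_stock - chips_ordered
    return max(0, shortfall)


# 12 samples waiting, 4 chips available and 5 already on order.
needed = chips_to_be_ordered(12, 4, 5)
# When stock and open orders already cover the samples, nothing is ordered.
none_needed = chips_to_be_ordered(3, 4, 5)
```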

Figure 20. Snapshot of the Chips in Stock page.

The Chips in Stock page informs the Chip team how each batch of chips is being

used, how many are available to use (chips having a ‘standby’ status and not

assigned), the number of Chips already hybridised, the number of Chips marked as

special (reserved to a specific experiment), the Chips expected to arrive and the Chips

which need to be purchased based on the samples presently in stock. If any Chip

batches in stock are within two months of their expiry date a warning message will

appear. This control is part of the system logic that has been programmed to ensure

that the Chip assignation process will automatically give priority to older Chips,

therefore avoiding the waste of such an expensive commodity due to expiration.

The user is able to create a Request for Purchase Order for chips automatically by

clicking the Create Purchase Order icon shown on the bottom right of the ‘Chips in

Stock’ page. How the Purchase Order is pre-populated will be explained later in the

chapter.

Chip Type Page. By clicking on the chip type name (e.g. Human Genome U133 A) in

the Chips in Stock page, the user can access the Chip Type page (figure 21) where

detailed information on all the available batches belonging to the chip type selected is

presented.

Figure 21. Snapshot of the Chip Type page.

The information presented includes chip arrival and expiry dates and the total amount

of Chips in the batches corresponding to the selected Chip type. Also included are the

amount of assigned chips, date of assignation, the sample code and project that are

assigned to the chip, the status and date of hybridisation of Chips already used, the

user name of the scientist that completed the hybridisation and the amount of

remaining available Chips.

From this page the user is able to mark some of the available chips as special. By

doing this, the lab worker ensures that the marked chips can not be assigned to any

other experiment and will remain reserved until they are assigned using a special

assignment function or made available again by clicking on the ‘Set free’ icon. The

marked Chips are updated in the Chips table with a ‘special’ status. This feature not

only serves as a booking facility, it is also associated with a strategy to minimise

certain effects of variability on a microarray experiment known as Block design. This

will be discussed later in the chapter when we explain the Assigning of Chips to

Samples functionality.

Chips which have been entered into the system and subsequently are found to be

damaged and unusable can be removed from the system. From the ‘Chip Type’ page,

the user is able to delete chips from a batch. The Chips and Batches tables are then

updated with the new information about the amount of chips available in the batch.

Only non-assigned chips can be marked as special or deleted.

Samples in Stock

Samples shipped and submitted from researchers are automatically uploaded from

ExpAnD into the Chip Tracker Database and can be viewed in the ‘Samples in Stock’

page. This page presents a list of all samples to be hybridised. Once a sample has been

hybridised to a Chip, the hybridisation result will be logged into the system and the

sample information updated to a ‘pass’ or ‘fail’ state, similarly for the chips. The

sample will not appear in the list of Samples in Stock. The following information is

presented for every sample (see figure 22):

Sample Code: This is the bar-code printed on each sample tube. ExpAnD, Resource

Tracker and Resolver systems all use this same code to identify the sample.

Chip type: This is the Chip type requested to be hybridised to the sample. The

assignation functionality of the system will only choose this type of chips to be

assigned to the sample.

Project: Refers to the name of the project that includes this sample.

Group: This is the name of the research group who submitted the sample to be

hybridised.

Responsible: The name of the researcher that submitted the sample data from

ExpAnD and shipped the biological sample to the microarray lab.

Assign to batch: If the sample has been assigned to a particular Chip then this

column will show the batch number of the corresponding chip. A flag will appear if

the assigned chip is marked as ‘special’.

Figure 22. Snapshot of the Samples in Stock page.

Affymetrix GeneChip arrays do not have a unique identification number printed on

the Chip cartridge. Samples are therefore not assigned to a particular chip but to a

particular batch where the chip belongs. Several different batches from the same chip

type may populate the stock.

When Chips arrive and are loaded into the system, a unique id is assigned to every

chip as a means for the Chip Tracker system to track every individual chip and

sample. Samples already have a unique sample code. While assigning a sample to a

Chip, the system will update the sample record with a chip_id from one of the chips in

the assigned batch, and the chip record will be updated with the corresponding sample

code. In addition to this, the assignation date will also be updated from null to the

present one. Note that both chips and samples assigned will remain in ‘standby’ status

until they are logged as hybridised.

Sample Page. By clicking on the sample code (e.g. 000872) in the ‘Samples in Stock’

page, the users can access the Sample page (figure 23) where detailed information

of the sample is presented. The information shown on this page refers to the detailed

information of the sample such as Shipment Date, Laboratory Book and Page,

Experiment name, Chip types requested, Status, Assigned batch number and Assign

Date.

Figure 23. Snapshot of the Sample page.

A single sample can request one or more different chips from a set for hybridisation

(e.g. MG_U74Av2 or Bv2 or Cv2). A single sample is used on up to three Chips. Each

sample, or extract of it, is called a Prep (preparation) and every prep can be assigned

and hybridised independently and at different times.

If a sample does not pass the QC criteria as set by the Chip Team, the responsible

researcher will be asked to resubmit the sample. The original sample will be deleted

from the system by checking the delete box on this page. In the same way, if the prep

was assigned to a batch which was found to be defective, then it is also possible to un-

assign the prep from this page.

Prompt of Purchase Orders

This process has been systemised in order to guide and manage the Purchase Orders

that are being sent from the microarray laboratory. Previously, each time there was a

chip requirement, the microarray lab worker would fill out a Request for Purchase

Order form. This system creates a pre-populated purchase order form and so reduces

the probability of added human error.

By clicking on the ‘Create a Purchase Order’ icon at the bottom right hand side of the

Chips in Stock page the order will be automatically created (figure 24) to include all

the Chips that are necessary for hybridisation to the samples arriving in the lab. These

Chips which were previously in the ‘To be Ordered’ column of the Chips in Stock

page, will now appear in the ‘Ordered’ column.

Affymetrix supplies its GeneChips in single batch packs of 5 chips. The system

automatically calculates the amount of packs necessary for every chip type included

in the order. If the user would like to exclude a certain chip type from the order, to be

ordered later perhaps, then the row corresponding to the chip type can be deleted by

clicking on the Erase this row icon.
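
The pack calculation is a rounding-up division, sketched here in Python. The function name is hypothetical; the pack size of 5 is the one stated above.

```python
CHIPS_PER_PACK = 5  # Affymetrix pack size stated above


def packs_needed(chips_required, chips_per_pack=CHIPS_PER_PACK):
    """Smallest number of whole packs that covers the chips required."""
    # Integer ceiling division: -(-a // b) rounds up without using floats.
    return -(-chips_required // chips_per_pack)


one_pack = packs_needed(3)      # a partly used pack still has to be bought
three_packs = packs_needed(11)  # two full packs plus one more for the last chip
```

Because chips can only be bought in whole packs, an order usually leaves a few spare chips, which then show up as available on the Chips in Stock page once they arrive.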

Figure 24. Snapshot of the Request for Purchase Order page. This form is pre-populated to reduce the chance of errors being introduced.

The form is only partially pre-populated and the user has to complete the remainder of

the order before clicking the Accept & Print icon to print out the finished purchase

order. On acceptance, the system updates the OrderedChips table with the amounts of

chips requested per chip type.

Chips Arrival

When the ordered chips arrive at the microarray laboratory, the user uploads the

new chip information into the system through the form on the Chips Arrival page.

The fields Chip Type, Batch number, amount of Packs, Chips per Pack (default is 5),

Arrival and Expiry date have to be filled in by the user. The Arrival date is set to

the current date and the Expiry date to one year ahead by default, but it can be changed.

The calendar feature can be accessed by clicking on the Calendar icon to aid setting

the different dates entered during the microarray workflow.

The Chips Arrival Form (figure 25) has been designed to be short and simple in order

to facilitate the uploading of new shipments that may include several different

batches. Following completion and submission of the form, the system will perform a

set of validations, which include checking that the batch number inserted has not been

previously entered on that same date and that the batch number does not belong to an

existing batch of a different chip type. Batch sizes produced by Affymetrix are large, so

it is possible that packs with the same batch number can arrive on different dates.
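
The two validations named above can be sketched as follows. This Python fragment is an illustration only: the function name, the dictionary keys, the error messages and the batch number B100 are all assumptions, and the real checks are performed by the application against the Batches table.

```python
def validate_arrival(new_batch, existing_batches):
    """Return a sorted list of validation errors (empty when the entry is valid)."""
    errors = set()
    for batch in existing_batches:
        if batch["batch_number"] != new_batch["batch_number"]:
            continue
        # The same batch number must not be entered twice on the same date.
        if batch["arrival_date"] == new_batch["arrival_date"]:
            errors.add("batch number already entered on this date")
        # A batch number always belongs to exactly one chip type.
        if batch["chip_type"] != new_batch["chip_type"]:
            errors.add("batch number belongs to a different chip type")
    return sorted(errors)


existing = [
    {"batch_number": "B100", "chip_type": "Rat Genome U34A", "arrival_date": "2003-08-01"},
]
later_delivery = {"batch_number": "B100", "chip_type": "Rat Genome U34A", "arrival_date": "2003-09-01"}
same_day = {"batch_number": "B100", "chip_type": "Rat Genome U34A", "arrival_date": "2003-08-01"}
wrong_type = {"batch_number": "B100", "chip_type": "Rat Genome U34B", "arrival_date": "2003-09-01"}
```

A later delivery of an existing batch passes both checks, which reflects the fact that packs with the same batch number may arrive on different dates.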

Figure 25. Snapshot of the Chips Arrival page. Through this form new chips are uploaded into the system.

After submitting the new Chip information the system stores this information into

the Chips and Batches tables and a message of success or error is displayed on the

error communication area above the form. The uploaded information will now appear

in the ‘Chips in Stock’ page.

Assignation of Chips to Samples

When designing a microarray experiment, it is important to understand and account

for many possible sources of noise and variation. These sources may contribute to

problems such as measurement error, confounding, elevated false positive and false

negative rates, and bias association. Two different areas of variation can be identified:

technological and biological. Technological variation may occur due to imprecise

quantification of total RNA sample, variation between microarray batches, non-

validated laboratory protocols, etc. As mentioned in the microarray technology

chapter, the oligonucleotide Chip manufacturing process may introduce errors,

subsequently batch-to-batch variance may contribute to data bias [Li et al., 2001].

Theoretically, with the development of technology and automation, the technological

variation could be virtually eliminated. Biological variation i.e., variation in gene

expression among genes, cells, cell lines, animals, etc., would remain present, even if

the amount of RNA could be accurately quantified. The two components –

technological and biological – contribute to the total variance in an experiment, and

therefore researchers must account for both of them at the experimental design and

analysis stages [Bobashev et al., 2002].

The process of assigning a Chip to a sample is essential not only to avoid chip

expiration; it is also part of the design strategy included to minimise the effects of variability

that can be present in gene microarray experiments. The effects of variability can be

reduced by good experimental design, quality experimental material (i.e. the samples

and microarrays) and competent processing using validated protocols.

As part of the design strategy, a random assignation process is included in the Chip

Tracker application. Randomisation also provides an objective basis for the

appropriateness of certain statistical procedures and control of type 1 errors in

hypothesis tests [Piantadosi, 1997].

Another strategy for minimising the effects of variability is known as blocking or

block design experiments. This technique controls the source of variation by grouping

experimental units into internally homogeneous batches and in this way keeps

extraneous experimental conditions uniform within a block [Cochran and Cox, 1992].

The Chip Tracker feature of marking the chips of a certain batch as special, meaning

reserved or blocked, in addition to the special assignation included in the Assign

Chips page, is related to the block design technique to control variation.

A list of all the unassigned samples is presented in the ‘Assign Chips’ page. In order

to assign a Chip to a Sample for hybridisation the user has to check the box of the

sample(s) they would like to assign and select a type of assignation strategy (e.g.

Random) to be followed by the system. The following is an explanation of every

assignation procedure:

Random: This is the default assignation. The key idea of using Random assignation

of Chips to Samples is to reduce the technological variation which may arise from

using different batches. The assignation operates by firstly identifying if there are

chips, within the chip type batches available, that are within two months from expiry.

If so, these chips will be prioritised for use and assigned to the samples. If none of the

available chips are about to expire, then the chips will be chosen randomly from all

the available batches of the corresponding chip type and assigned to the samples.

Special chips (blocked) are not going to be picked by this type of assignation.

Oldest 1st: If this type of assignation is selected then the system will choose the chips

that are closer to their expiry date (chronologically from the available Chips of the

batches corresponding chip type), and assign them to the samples. Special Chips are

not going to be assigned by this type of assignation.

Special: This assignation strategy will select from chips which have been marked

as special (from the available special chips of the corresponding chip type) by

choosing those that are closer to their expiry date and assigning them to the

samples. Note that chips have to be previously marked as special in order to be

assigned. As mentioned for the other strategies, chips marked as special are not

assigned through any other assignation type. These chips can be understood as blocked or

reserved.

Figure 26. Snapshot of the Assign Chips page.

The Samples marked by the user to be assigned are randomly shuffled before the

chosen Chips (which were selected by the chosen assignation strategy) are assigned to

them. This Sample shuffling before assignation is carried out to reduce even further

any risk which may be caused by batch to batch variation.
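
The three assignation strategies and the sample shuffling can be summarised in a short sketch. It is written in Python for illustration and does not reproduce the SQL used by the system; the function names, the dictionary layout and the 61-day approximation of "two months" are assumptions. Topping up with randomly drawn chips when there are fewer near-expiry chips than samples is also an assumption, since the description above does not cover that case.

```python
import random
from datetime import date, timedelta

EXPIRY_WARNING = timedelta(days=61)  # "two months", approximated (assumption)


def select_chips(chips, n, strategy, today, rng=random):
    """Pick up to n unassigned chips of one chip type for the given strategy.

    Each chip is a dict with 'chip_id', 'expiry' (date), 'status'
    ('standby' or 'special') and 'sample' (None while unassigned).
    """
    free = [c for c in chips if c["sample"] is None]
    if strategy == "special":
        # Only reserved chips, closest to expiry first.
        pool = sorted((c for c in free if c["status"] == "special"),
                      key=lambda c: c["expiry"])
        return pool[:n]
    pool = [c for c in free if c["status"] == "standby"]  # special chips excluded
    if strategy == "oldest_first":
        return sorted(pool, key=lambda c: c["expiry"])[:n]
    # Random: chips within two months of expiry are prioritised, the rest shuffled.
    expiring = sorted((c for c in pool if c["expiry"] - today <= EXPIRY_WARNING),
                      key=lambda c: c["expiry"])
    others = [c for c in pool if c["expiry"] - today > EXPIRY_WARNING]
    rng.shuffle(others)
    return (expiring + others)[:n]


def assign(sample_codes, chips, strategy, today, rng=random):
    """Shuffle the samples, then pair them with the selected chips."""
    shuffled = list(sample_codes)
    rng.shuffle(shuffled)
    chosen = select_chips(chips, len(shuffled), strategy, today, rng)
    pairs = []
    for sample_code, chip in zip(shuffled, chosen):
        chip["sample"] = sample_code  # the chip record now carries the sample code
        pairs.append((sample_code, chip["chip_id"]))
    return pairs
```

Passing a seeded random.Random instance as rng makes a run reproducible, which is convenient when checking the behaviour of the strategies.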

The SQL commands used by the system to perform the correct selection of chips to be

assigned can be found in Appendix B. The Appendix also includes other SQL queries

embedded in the different system pages to be consulted.

After the samples have been assigned, the user will navigate back to the Samples in

Stock page where the corresponding batch numbers of the chips assigned to the

samples can be viewed. At this stage all the information is available for the Chip

Team to begin working on the hybridisation process. The Chip status will be updated

following hybridisation by using the ‘Log Chips’ page on the system.

Log of Hybridised Chips

The lab workers should only use this feature after they have hybridised the

samples. Following hybridisation, the Chip Team needs to update the Chip status

with the hybridisation results. The Log Chips page provides a list of all previously assigned

samples. Each sample-chip hybridisation has to be logged with the hybridisation date

and a ‘passed’ (default), ‘failed’ or ‘fail & redo’ result. Samples which have not been

hybridised can be unchecked before submission of the page.

If the hybridisation is successful, the user should check the ‘passed’ checkbox and

enter the Hybridisation Date. If the hybridisation has failed, the user can mark the

sample-chip as ‘failed’ or ‘fail & redo’, set the Hybridisation Date and select a failure

reason to log onto the Chip. If the ‘fail & redo’ option is selected, then the Chip

assigned to that sample will be updated to have a ‘failed’ status and the reason of

failure chosen. In addition, a new sample will be created having the same sample code

with _1 at the end (e.g. 000872_1).
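
The outcome of logging a hybridisation can be sketched as a small function. The Python below is illustrative only: the function name and return shape are assumptions, and it covers just the three results and the _1 suffix described above.

```python
def log_hybridisation(sample_code, result):
    """Return (chip_status, new_sample_code) for a logged hybridisation.

    new_sample_code is None unless the sample has to be hybridised again.
    """
    if result == "passed":
        return "passed", None
    if result == "failed":
        return "failed", None
    if result == "fail & redo":
        # The chip is logged as failed and a new sample record is created
        # with the same code followed by _1.
        return "failed", sample_code + "_1"
    raise ValueError(f"unknown hybridisation result: {result!r}")


status, redo_code = log_hybridisation("000872", "fail & redo")
```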

Figure 27. Snapshot of the Hybridised Log Chips page.


When the result of a Sample-Chip hybridisation is logged in the system, the

corresponding sample and chip will no longer appear in the stock pages. Information on used

sample-chip pairs can be accessed through the Statistics & Reports page.

Statistics and Custom Reports

This functionality enables the Chip Team to create highly flexible reports from the

Chip Tracker Database.

Figure 28. Snapshot of the Chips Statistics and Reports page.

By selecting a number of query fields the user is able to create printable, customised

reports from the database. To create a custom report, a full search form is

available at the top of the Chips Statistics and Reports page for the user to select the

search parameters. This information is then used by the active page to create the

corresponding SQL query on the fly, and the results are displayed in a table in which

each column represents an attribute collected from the chips and samples tables.

Percentages of the platform’s usage and hybridisation results can also be calculated

and included in the reports.
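Building a report query on the fly can be sketched as follows. This is an illustration in Python with an in-memory SQLite database, not the ASP/Oracle code of the system (see Appendix B); the reduced table layout, the filter names and the functions `build_report_query` and `status_percentages` are assumptions made for the example. Unlike the string concatenation in Appendix B, the sketch passes the search values as bound parameters.

```python
import sqlite3
from collections import Counter

# Hypothetical whitelist of search-form fields and sortable columns.
FILTER_COLUMNS = {"chiptype_id": "c.chiptype_id", "status": "c.status",
                  "batch_id": "c.batch_id"}
SORT_COLUMNS = {"c.chip_id", "c.batch_id", "c.status"}

def build_report_query(filters, order_by="c.chip_id"):
    """Assemble the SELECT statement and its parameters from the chosen filters."""
    if order_by not in SORT_COLUMNS:
        raise ValueError("cannot sort by " + order_by)
    sql = ("SELECT c.chip_id, c.batch_id, c.status, s.project "
           "FROM chips c LEFT JOIN samples s ON c.chip_id = s.chip_id WHERE 1=1")
    params = []
    for field, value in filters.items():
        if field not in FILTER_COLUMNS:
            raise ValueError("unknown filter " + field)
        if value is not None:                  # empty form fields add no condition
            sql += " AND " + FILTER_COLUMNS[field] + " = ?"
            params.append(value)
    return sql + " ORDER BY " + order_by, params

def status_percentages(rows):
    """Percentage of report rows per chip status (third column of each row)."""
    counts = Counter(row[2] for row in rows)
    total = sum(counts.values())
    return {status: 100.0 * n / total for status, n in counts.items()}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chips (chip_id INTEGER, chiptype_id INTEGER, batch_id TEXT, status TEXT);
CREATE TABLE samples (sample_id TEXT, chip_id INTEGER, project TEXT);
INSERT INTO chips VALUES (1, 7, 'B1', 'passed'), (2, 7, 'B1', 'failed'), (3, 8, 'B2', 'passed');
INSERT INTO samples VALUES ('000702', 1, 'Demo project');
""")
sql, params = build_report_query({"chiptype_id": 7, "status": None})
rows = conn.execute(sql, params).fetchall()
```

The `WHERE 1=1` prefix mirrors the report query in Appendix B: every optional condition can then be appended uniformly as `AND ...`.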


Before printing the report the user is able to hide some of the unwanted data columns

of the report by clicking on the arrows at the top of every column. This creates a

simpler and more condensed report that can fit within the width of a single page.

Clicking on a hidden (compressed) column makes it visible again.

Statistics and reports are very important to the Chip Team as a means of

documenting and summarising the platform’s throughput and performance.


Chapter 8

System Validation of Chip Tracker

System Validation

Commercially available LIMS have been around since the 1980s. In addition, many

laboratories have designed, implemented, and maintained in-house LIMS, as

mentioned in the LIMS chapter. The heart of any LIMS is the software and, like other

laboratory systems, the LIMS software is subject to quality control and quality

assurance checks. In regulatory environments, this associated QA/QC is referred to as

system validation. The primary purpose of system validation is to ensure that the

software performs in the manner for which it was designed. For example, the

system acceptance criteria should be established and tested against quantifiable tasks

to determine if the desired outcome has been achieved [Turner et al., 2001]. Chip Tracker

features, such as the accurate accounting of the chips and samples in stock, creation of

purchase orders or the assignation of chips to samples for hybridisation must be

quantifiable and verifiable by the end user. System validation ensures that the entire

system has been properly tested, incorporates the required controls, and maintains and

will continue to maintain data integrity. Laboratories must establish protocols and

standards for the validation process and associated documentation.

Proper validation of a LIMS will allow a laboratory to comply with regulations and

also provide comprehensive documentation on the system that is necessary to

troubleshoot future problems. Validating a complex computerised system

demonstrates that it is fit for its intended use and is therefore an unavoidable requirement.

In a customised LIMS, the audit of the supplier, the user requirement specifications and the

acceptance test results for the software are of special interest.


Control Life Cycle of the project

Regular interaction was maintained with the molecular biology experts at the

microarray laboratory throughout the project life cycle in order to capture the

computational needs of managing the information in the microarray lab. We used

Rational Rose, a visual modelling tool based on the Unified Modelling Language

(UML), to visualize, understand, and refine the requirements, but especially to have a

common way to communicate with the experts and to document our development. In

addition to this, during the development of the project we made regular

presentations of the evolving Chip Tracker prototype to the Bioinformatics, System

Development and Chip Team sections of the company in order to make sure that all

the requirements were satisfied.

By following a model-driven prototype approach we could develop the Chip Tracker

application in a controlled way and with the consensus of the different departments,

reaching the stages of implementation, installation and test successfully.

The life cycle activities of this project were those of a typical LIMS life cycle,

which include:

• Management / Project Initiation Phase

• Requirements Phase

• Design Phase

• Implementation Phase

• Installation and Test Phase

Future activities involve the Operation and Support Phase.

User Acceptance Test

The User Acceptance Test is a document that we developed to enable the users to test the

software in real situations. This test allows us to make final adjustments to the

software before its release. We asked the users to take some time to read the

document and to follow the instructions in it. These guide them through a complete

execution of a microarray experiment workflow, after which they are able to give us their

feedback about the Chip Tracker application.


The test is essentially a run through of what is involved in the use of the Chip Tracker

application. It is to ensure that the system works well for the users: navigation buttons

work, dropdown menus are correct and data are being entered and retrieved correctly.

The questions asked generally require a yes or no answer, where yes means

that the system is working and behaving properly. Before the user starts the test, a

couple of microarray experiments are added into the database. This simulates a wet-

lab researcher who has just entered the information of a microarray experiment into

the ExpAnD system (information that is automatically collected by Chip Tracker) and

submitted the biological samples to the microarray laboratory. A start test script was

created to populate the Chip Tracker database with a new set of samples from the

simulated microarray experiments; it is executed on request for the testing user

just before the start of the test.

As the Chip Team is made up of only three workers, other people involved in the project from

the Bioinformatics and Systems Development teams were asked to take the User

Acceptance Test. This made a total of six different testing users. The feedback

returned was excellent and allowed us to identify improvements to the system. The

users were familiar with, and agreed with, the way the system displayed the information.

This is one of the advantages of following a prototype approach during the system

design. Moreover, the users found the interface friendly and intuitive, to the point

that they did not have to read the system help page in order to use the software for

the first time. A copy of the User Acceptance Test document can be found in

Appendix C.


Chapter 9

Conclusion and Future Work

The aim of developing an integrated system to manage and track chips and samples used

in the microarray laboratory has been achieved. The computational tool presented in

this project provides a solution to the management problems encountered by the

microarray laboratory team: it captures the microarray workflow logic, administers

chip purchase orders and prepares custom reports and statistics. Chip Tracker is

a robust and coherent microarray LIMS that enables a systematic workflow

within the oligonucleotide microarray laboratory. Thus, Chip Tracker will improve

performance and reduce the probability of errors being introduced into the microarray experiments.

A special effort was made to create an easy to use environment and a simple user

interface in order to facilitate the system usage and incorporation into microarray

laboratories.

Throughout the testing and deployment phases of the project we have identified

necessary improvements in the system. Where time allowed, most of these

improvements were implemented. The current version of Chip Tracker is GEML

compliant and integrates with the rest of the systems in the platform or any

other software that follows this standard. Although the Organon platform will

continue operating on GEML, future work on this system should extend the Chip

Tracker XML Parser component to be MAGE compliant.

Although Chip Tracker was specifically developed to guide and manage an

oligonucleotide array workflow which involves one-channel hybridisations only, it

complies with standards and is thus a portable and flexible application that can be

used as a stand-alone microarray LIMS. The task of adapting this system to a two-channel

microarray workflow, where two samples are hybridised to a single chip, should be

relatively straightforward.


In the future protein chips may supplant DNA chips, as protein expression is as good

a measure of disease as gene sequences, or better. It is possible therefore that

protein testing may become a technology of choice in understanding disease

aetiology. Microarray software should be able to cope with the constant

improvements in technology. Chip Tracker has a robust architecture, is flexible

and complies with standards, and is therefore easily adaptable. The XML Parser

component, which complies with the microarray communication standard, and the

way the software interacts with other systems are key issues in its design. These

features enabled us to create a small application to carry out specific tasks that

integrates with the Organon Bioinformatics Infrastructure. The desirable features

of microarray software (flexibility, robustness and data integration) have been

successfully implemented in Chip Tracker.

These are some of the advantages of using the Chip Tracker microarray LIMS as part of

the microarray laboratory workflow:

• Organise Chip orders,

• Assign Samples to Chips for hybridisation,

• Create Statistics and Reports,

• Link to the existing systems and automate data collection,

• Improve the planning and organisation of the microarray experiments,

• Avoid Chip wastage,

• Improve accuracy and turnaround times,

• Reduce the probability of errors.

Finally, the Chip Tracker microarray LIMS is an attempt to close the gap between

computational and wet research laboratories. Integrating data from many interrelated

sources is one of the bioinformatics community's ultimate challenges.

Chip Tracker addresses this task by providing the microarray laboratory workers

with an integrated data workflow environment to meet the fast-increasing

demand for microarray experiments in the new process of drug discovery.


Bibliography

[3rd Millennium] <http://www.3rdmill.com/initiatives/resources.html>

[Affymetrix, 1] GeneChip® Technology. Affymetrix Image Library. <http://www.affymetrix.com/corporate/media/image_library/image_library_1.affx>

[Affymetrix, 2] GeneChip Arrays for Gene Expression Analysis. Affymetrix Technology. <https://www.affymetrix.com/technology/ge_analysis/index.affx>

[Affymetrix, 3] GeneChip® Node System. Affymetrix Support, Instruments. <http://www.affymetrix.com/support/technical/datasheets/node_datasheet.pdf>

[Affymetrix, 4] Phylogenetic tree schematic. Affymetrix Products. GeneChip Arrays. <http://www.affymetrix.com/en/images/phylogenetic_tree_exp.gif>

[Affymetrix, 5] The Role of the GeneChip® System in Drug Discovery & Development. (2003) Affymetrix Research Community. <http://www.affymetrix.com/support/technical/other/drug_discovery_brochure.pdf>

[Amersham, 2002] Scierra™ Microarray Laboratory Workflow System. <http://www.amershambiosciences.com> Ref. no. 63-0048-53 Rev–A, 2002-08.

[Bobashev et al., 2002] G.V. Bobashev, S. Das, and A. Das. Experimental Design for Gene Microarray Experiments and Differential Expression Analysis. Microarray data analysis II. Kluwer Academic Publ. Pp. 23-40.

[Brazma et al., 2001] Brazma,A., Hingamp,P., Quackenbush,J., Sherlock,G., Spellman,P., Stoeckert,C., Aach,J., Ansorge,W., Ball,C.A., Causton,H.C., Gaasterland,T., Glenisson,P., Holstege,F.C.P., Kim,I.F., Markowitz,V., Matese,J.C., Parkinson,H., Robinson,A., Sarkans,U., Schulze-Kremer,S., Stewart,J., Taylor,R., Vilo,J. and Vingron,M. (2001) Minimum information about a microarray experiment (MIAME) - toward standards for microarray data. Nature Genetics, 29, 365-371.

[Bund et al., 1998] Bund C., Heinemann G.W., Jager B., Trinkler M. Validation of a customized LIMS. Pharm Acta Helv. 1998 Feb;72(6):349-56.

[Butte, 2002] Butte,A. The use and analysis of microarray data. (2002). Nat. Rev. Drug Discov. 1, 951-960.

[Cochran and Cox, 1992] Cochran W.G. and Cox, G.M. Experimental Designs. New York. Wiley, 1992.

[Dopazo, 2002] Dopazo, J. Microarray Data Processing and Analysis. Microarray data analysis II. Kluwer Academic Publ. (2002). Pp. 43-63.

[GeneCRC] The Cooperative Research Centre for Discovery of Genes for Common Human Diseases (Gene CRC). <http://www.genecrc.org/>

[Hoffman, 2000] Hoffman M. Rosetta Biosoftware April 11, 2000 communiqué. <http://xml.coverpages.org/geml.html>

[Kokocinski et al., 2003] Felix Kokocinski, Gunnar Wrobel, Meinhard Hahn, and Peter Lichter. QuickLIMS: facilitating the data management for DNA-microarray fabrication. Bioinformatics 2003 19: 283-284.

[Lao et al., 2002] Lao H. Saal, Carl Troein, Johan Vallon-Christersson, Sofia Gruvberger, Åke Borg and Carsten Peterson. BioArray Software Environment: A Platform for Comprehensive Management and Analysis of Microarray Data. Genome Biology 2002 3(8): software0003.1-0003.6.

[Li et al., 2003] Jiang Li and Jeffrey A. Johnson. Comparative Studies Using cDNA vs. Oligonucleotide Arrays. An introduction to Toxicogenetics. Burczynski. CRC Press. 2003; Pp. 17-28.

[Lockhart et al., 1996] Lockhart,D.J., Dong,H., Byrne,M.C., Follettie,M.T., Gallo,M.V., Chee,M.S., Mittmann,M., Wang,C., Kobayashi,M., Horton,H., and Brown,E.L. (1996). Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14, 1675-1680.

[McConnell et al., 2002] Patrick McConnell, Kimberly Johnson, and David J. Lockhart. An Introduction to DNA Microarrays. Microarray data analysis II. Kluwer Academic Publ. (2002). Pp. 9-21.

[McIndoe et al., 2003] Richard A. McIndoe, Aaron Lanzen, and Kimberly Hurtz. MADGE: scalable distributed data management software for cDNA microarrays. Bioinformatics 2003 19: 87-89.

[Piantadosi, 1997] Piantadosi S. Clinical Trials: A methodological perspective. New York: John Wiley, 1997.

[Schena et al., 1995] Schena M, Shalon D, Davis RW, et al. Quantitative monitoring of gene expression patterns with complementary DNA microarray. Science 1995;270;467-470.

[Searls, 2000] David B. Searls. Using bioinformatics in gene and drug discovery. Drug Discovery Today 5 (4) (2000) pp. 135-143.

[Spellman et al., 2002] Spellman,P.T., Miller,M., Stewart,J., Troup,C., Sarkans,U., Chervitz,S., Bernhart,D., Sherlock,G., Ball,C., Lepage,M., Swiatek,M., Marks,W.L., Goncalves,J., Markel,S., Iordan,D., Shojatalab,M., Pizarro,A., White,J., Hubley,R., Deutsch,E., Senger,M., Aronow,B.J., Robinson,A., Bassett,D., Stoeckert Jr,C.J. and Brazma,A. (2002) Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biology, 3(9), research0046.1-0046.9.

[Stoeckert et al., 2002] Stoeckert CJ, Causton HC, Ball CA. Microarray databases: standards and ontologies. Nat Genet. 2002 Dec;32 Suppl 2:469-73.

[Turner et al., 2001] Turner E, Bolton J. Required steps for the validation of a Laboratory Information Management System. Qual Assur. 2001 Jul-Dec;9(3-4):217-24.


Appendix A

GEML .xml example file

This is an example of a Samples file imported by Chip Tracker:

<?xml version="1.0" encoding="UTF-8" ?>
<project name="Ovarian cycle induced immature mice" id="23" date="20-SEP-2002" by="Anjo Van Heijst"
         organization="Organon Laboratories Ltd">
  <prep code="000702" name="000702" date="24-JUL-2003" prepared_by="Anjo Van Heijst" type="total RNA"
        method="Trizol" description="Prep contains only 6 ug. David Pulford gave permission for this small amount."
        cells_per_ml="na" prep_ug="6" prep_conc_ug_ul="1.0" location="na" reference_text="5205-750/018/19/02BB23"
        scientist="Anjo Van Heijst">
    <treatment name="Humegon treatment 48h" step_number="1" media="na" volume="" volume_units="" temperature=""
               temperature_units="" duration="2880" wash_before="false" description="Humegon treatment 48h">
      <treatment_compound concentration="7.5" concentration_units="active units">
        <compound_reference code="63" />
        <solvent_reference name="water" />
      </treatment_compound>
    </treatment>
  </prep>
  <shipment code="950" shipped_by="Anjo Van Heijst" ship_date="30-JUL-2003" delivered_by="" />
  <resource code="000702" type="oRNA" description="" init_volume="6" init_concentration="1.0"
            data_file="BioSizing_Total-RNA-Nano_00269_2002-09-13_15-30-56_Partial.cld" qc_data_file="" qc_flag="true">
    <shipment_ref code="950" />
    <pattern_ref name="MG_U74Av2" />
    <pattern_ref name="MG_U74Bv2" />
    <pattern_ref name="MG_U74Cv2" />
  </resource>
  <other name="project_group" value="Female Contraception" />
</project>

Figure A1. This tree represents the LIMS Category of GEML 2.2

Example of SQL command to insert the parsed data into the Samples table:

INSERT INTO SAMPLES (sample_id, chiptype_id, shipment_id, labbooknumber, labbookpage, experiment, project, projectgroup, responsable, shipmentdate, status)
VALUES (prep code, pattern_ref name, shipment code, prep reference_text *first entry (XX/./.), prep reference_text *second entry (./XX/.), prep reference_text *third entry (././XX), project name, other value, shipment shipped_by, shipment ship_date, 'standby');
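The mapping from GEML elements to Samples rows can be sketched in Python with the standard `xml.etree.ElementTree` module. This is an illustration rather than the Chip Tracker XML Parser itself. It assumes that a `resource` refers to its `prep` through an identical `code`, that one row is produced per `pattern_ref`, and that only the first three slash-separated entries of `reference_text` are used; the function name `samples_from_geml` and the shortened input file are hypothetical.

```python
import xml.etree.ElementTree as ET

GEML_SNIPPET = """<project name="Ovarian cycle induced immature mice" id="23">
  <prep code="000702" reference_text="5205-750/018/19/02BB23" />
  <shipment code="950" shipped_by="Anjo Van Heijst" ship_date="30-JUL-2003" />
  <resource code="000702" type="oRNA">
    <shipment_ref code="950" />
    <pattern_ref name="MG_U74Av2" />
    <pattern_ref name="MG_U74Bv2" />
  </resource>
  <other name="project_group" value="Female Contraception" />
</project>"""

def samples_from_geml(xml_text):
    """Return one dictionary per (prep, requested chip type) pair."""
    project = ET.fromstring(xml_text)
    preps = {p.get("code"): p for p in project.findall("prep")}
    shipments = {s.get("code"): s for s in project.findall("shipment")}
    group = next((o.get("value") for o in project.findall("other")
                  if o.get("name") == "project_group"), None)
    rows = []
    for resource in project.findall("resource"):
        prep = preps[resource.get("code")]          # assumption: codes match
        shipment = shipments[resource.find("shipment_ref").get("code")]
        # Pad with empty strings so a short reference_text still yields 3 parts.
        parts = (prep.get("reference_text", "").split("/") + ["", "", ""])[:3]
        for pattern in resource.findall("pattern_ref"):
            rows.append({
                "sample_id": prep.get("code"),
                "chiptype_id": pattern.get("name"),
                "shipment_id": shipment.get("code"),
                "labbooknumber": parts[0],
                "labbookpage": parts[1],
                "experiment": parts[2],
                "project": project.get("name"),
                "projectgroup": group,
                "responsable": shipment.get("shipped_by"),
                "shipmentdate": shipment.get("ship_date"),
                "status": "standby",
            })
    return rows

rows = samples_from_geml(GEML_SNIPPET)
```

Each returned dictionary uses the column names of the INSERT statement above, so a sample requested on several chip types yields several rows that share one `sample_id`.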


Appendix B

SQL commands

The following are some examples of SQL commands embedded in the system pages:

Login Page:

str_SQL = "select user_id, fname, sname from chipteamusers where username='" & sUserName & "' AND password='" & sPassword & "'"
set rsChip = cnChipTracker.Execute(str_SQL)

Chips in Stock:

' SELECTION OF ALL THE DIFFERENT CHIPTYPES IN STOCK
strSQL = "select * from chiptypes where chiptype_id IN (select chiptype_id from samples where status='standby' )" & _
    " OR chiptype_id IN (select chiptype_id from chips where status='standby' ) " & _
    " OR chiptype_id IN (select chiptype_id from orderedchips) "
set rsChipsTypes = cnChipTracker.Execute(strSQL)

' COLUMN WITH TOTAL NUM OF CHIPS IN A BATCH
strSQL = "select sum(total) from batches where batch_ID = '" & rsBatches(0) & "'"

' SELECT AVAILABLE CHIPS
strSQL = "select count(*) from chips where chiptype_ID=" & rsChipsTypes(0) & " AND status='standby' AND sample_id is null " & _
    "AND special is null AND batch_ID='" & rsBatches(0) & "'"

' SELECT SPECIAL CHIPS
strSQL = "select count(*) from chips where chiptype_ID=" & rsChipsTypes(0) & " AND status='standby' AND special is not null " & _
    " AND batch_ID='" & rsBatches(0) & "'"

' SELECT ASSIGNED CHIPS
strSQL = "select count(*) from chips where chiptype_ID=" & rsChipsTypes(0) & " AND status='standby' AND sample_id is not null " & _
    " AND batch_ID='" & rsBatches(0) & "'"

' SELECT BATCH MONTHS FROM EXPIRY
strSQL = "select months_between( expirydate, SYSDATE) FROM DUAL, BATCHES where batch_ID = '" & rsBatches(0) & "'"

' SELECTION OF THE ALREADY ORDERED CHIPS
strSQL = "select amount from orderedchips where chiptype_ID=" & rsChipsTypes(0)

' CALCULATION OF TOTAL CHIPS TO BE ORDERED
' NOT ASSIGNED SAMPLES
strSQL = "select count(*) from samples where chip_id is null AND assigndate is null " & _
    " AND status NOT IN ('passed','fail') AND chiptype_ID=" & rsChipsTypes(0)
' NOT ASSIGNED CHIPS
strSQL = "select count(*) from chips where sample_id is null AND assigndate is null " & _
    " AND status NOT IN ('passed','fail') AND chiptype_ID=" & rsChipsTypes(0)

Purchase Order:

strUpdate_SQL = "UPDATE ORDEREDCHIPS SET amount=" & auxTotalOrdered & " WHERE CHIPTYPE_ID =" & chiptypeLoop
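The stock queries count, for each chip type, the samples not yet assigned, the chips not yet assigned and the chips already ordered. The listing does not show how these counts are combined, so the arithmetic in the following Python sketch is an assumption, as are the function names and the default pack size of 5 (taken from the example batches in Appendix C).

```python
import math

def chips_to_order(unassigned_samples, unassigned_chips, already_ordered):
    """Chips still needed once stock and pending orders are taken into account."""
    return max(0, unassigned_samples - unassigned_chips - already_ordered)

def packs_needed(chips, chips_per_pack=5):
    """Chips are bought in whole packs, so round up."""
    return math.ceil(chips / chips_per_pack)

if __name__ == "__main__":
    needed = chips_to_order(unassigned_samples=12, unassigned_chips=0, already_ordered=0)
    print(needed, "chips in", packs_needed(needed), "packs")
```

Under this assumption an empty stock gives a number of chips to order equal to the number of samples waiting, and the figure drops to zero once the purchase order has been recorded.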

Samples in Stock:

strSQL = "select * from samples where status='standby' ORDER BY sample_id"

Sample:

strSQL = "select sample_id, chiptype_id, shipment_id, labbooknumber, labbookpage, experiment, " & _
    " project, projectgroup, responsable, comments, status, chip_id, to_char(assigndate), to_char(shipmentdate) " & _
    " from samples where sample_id='" & auxSampleID & "'"

Chips Arrival:

strInsert_SQL = "INSERT INTO BATCHES " & _
    "(BATCH_ID, CHIPTYPE_ID, ARRIVALDATE, EXPIRYDATE, STATUS, PACKS, " & _
    "CHIPSPERPACK, TOTAL) VALUES ('" & txtbatchID & "', " & txtChipType & ", '" & txtArrival & "','" & txtExp & _
    "','instock'," & txtPacks & ", " & txtChips & ", " & auxTotalChips & ")"

strInsert_SQL = "INSERT INTO CHIPS " & _
    "(CHIP_ID, CHIPTYPE_ID, DESCRIPTION, ARRIVALDATE, BATCH_ID, EXPIRYDATE, " & _
    " STATUS) VALUES ( CHIP_ID_SEQ.NextVal, " & txtChipType & ", '" & txtDesc & "', '" & txtArrival & _
    "', '" & txtbatchID & "', '" & txtExp & "', 'standby')"
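Registering an arriving batch amounts to one BATCHES row plus one CHIPS row per chip. The Python sketch below illustrates this under two assumptions: the batch total is packs multiplied by chips per pack, and the expiry date defaults to one year after arrival (as described in Appendix C). The names `register_batch`, `one_year_after` and `first_chip_id` are hypothetical; in the system the chip identifiers come from the Oracle sequence CHIP_ID_SEQ.

```python
from datetime import date

def one_year_after(d):
    """Same day next year; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)

def register_batch(batch_id, chiptype_id, arrival, packs, chips_per_pack,
                   expiry=None, first_chip_id=1):
    """Build the batch record and the individual chip records for one delivery."""
    expiry = expiry or one_year_after(arrival)
    total = packs * chips_per_pack
    batch = {"batch_id": batch_id, "chiptype_id": chiptype_id,
             "arrivaldate": arrival, "expirydate": expiry, "status": "instock",
             "packs": packs, "chipsperpack": chips_per_pack, "total": total}
    chips = [{"chip_id": first_chip_id + i, "chiptype_id": chiptype_id,
              "batch_id": batch_id, "arrivaldate": arrival,
              "expirydate": expiry, "status": "standby"}
             for i in range(total)]
    return batch, chips

batch, chips = register_batch("0000002100", 1, date(2003, 8, 1), packs=1, chips_per_pack=5)
```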

Assignation of Chips to Samples:

' SELECTION OF CHIPS TO ASSIGN

' OLDEST FIRST ASSIGNMENT
if txtAssignation="oldest" then
    str_SQL = "select chip_id from chips where chiptype_id=" & chiptype_id & " AND status='standby' AND sample_id is null " & _
        " AND special is NULL AND expirydate >= SYSDATE ORDER BY expirydate asc"
end if

' SPECIAL ASSIGNMENT
if txtAssignation="special" then
    str_SQL = "select chip_id from chips where chiptype_id=" & chiptype_id & " AND status='standby' AND sample_id is null " & _
        " AND special='Y' AND expirydate >= SYSDATE ORDER BY expirydate asc"
end if

' RANDOM ASSIGNMENT
if txtAssignation="random" then
    str_SQL = "select chip_id from chips, DUAL where chiptype_id=" & chiptype_id & " AND status='standby' AND sample_id is null " & _
        " AND special is NULL AND expirydate <= sysdate + 61 AND expirydate >= SYSDATE ORDER BY expirydate asc"
    set rsChipAux = cnChipTracker.Execute(str_SQL)

    ' IF THERE ARE NO CHIPS OF THIS CHIP TYPE WITHIN 2 MONTHS FROM EXPIRY
    ' THEN A RANDOM SELECTION OF THIS CHIPTYPE IS DONE
    if rsChipAux.EOF then
        cnChipTracker.Execute( " CREATE OR REPLACE VIEW random_chips AS SELECT DBMS_UTILITY.GET_HASH_VALUE (TO_CHAR(SYSDATE, 'HH24:MI:SS')||chip_id,2,32768) as RANDOM_ORDER, chips.* FROM chips, DUAL" )

        str_SQL = " select chip_id from random_chips where chiptype_id=" & chiptype_id & " AND status='standby' AND sample_id is null " & _
            " AND special is NULL AND expirydate >= SYSDATE ORDER BY RANDOM_ORDER"
    end if
end if

' ASSIGNATION OF CHIPS TO SAMPLES
strUpdate_SQL = "UPDATE CHIPS SET SAMPLE_ID='" & sample_id & "', ASSIGNDATE='" & oraToday & "', " & _
    " PROJECT='" & auxProject & "' WHERE CHIP_ID=" & auxChip_id
cnChipTracker.Execute(strUpdate_SQL)
strUpdate_SQL = "UPDATE SAMPLES SET CHIP_ID=" & auxChip_id & ", ASSIGNDATE='" & oraToday & "' " & _
    " WHERE SAMPLE_ID='" & sample_id & "' AND CHIPTYPE_ID=" & chiptype_id
cnChipTracker.Execute(strUpdate_SQL)
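The three selection strategies can be restated as a Python sketch that works on an in-memory list of chips instead of the Oracle tables. It assumes the list has already been restricted to one chip type, and it swaps the hash-based ordering of the `random_chips` view for `random.shuffle`; the function name `select_chips` and the dictionary keys are hypothetical.

```python
import random
from datetime import date, timedelta

def select_chips(chips, strategy, today, rng=None):
    """Return the candidate chips for assignation, in the order they would be used."""
    rng = rng or random.Random()
    usable = [c for c in chips
              if c["status"] == "standby" and c["sample_id"] is None
              and c["expirydate"] >= today]
    normal = [c for c in usable if c["special"] is None]
    by_expiry = lambda c: c["expirydate"]
    if strategy == "oldest":
        return sorted(normal, key=by_expiry)
    if strategy == "special":
        return sorted((c for c in usable if c["special"] == "Y"), key=by_expiry)
    if strategy == "random":
        # Chips within 61 days of expiry take priority, oldest first.
        limit = today + timedelta(days=61)
        expiring = [c for c in normal if c["expirydate"] <= limit]
        if expiring:
            return sorted(expiring, key=by_expiry)
        rng.shuffle(normal)        # otherwise any usable chip, in random order
        return normal
    raise ValueError("unknown strategy: " + repr(strategy))

def chip(chip_id, expiry, special=None, sample_id=None):
    return {"chip_id": chip_id, "expirydate": expiry, "special": special,
            "sample_id": sample_id, "status": "standby"}

today = date(2003, 8, 1)
stock = [chip(1, date(2003, 9, 1)), chip(2, date(2004, 5, 20)),
         chip(3, date(2004, 1, 1), special="Y"), chip(4, date(2003, 7, 1)),
         chip(5, date(2004, 2, 1), sample_id="000702")]
```

Note that the "random" strategy is only random when no chip is close to expiry; otherwise it behaves like "oldest" restricted to the expiring chips, which matches the comments in the listing.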

Log Hybridised Chips:

' CHIP-SAMPLES TO LOG SELECTION
strSQL = "select * from chips where sample_id IS NOT NULL AND assigndate IS NOT NULL AND status='standby' ORDER BY sample_id"

' LOG CHIP-SAMPLES AS PASSED
strUpdate_SQL = "UPDATE CHIPS SET STATUS='passed', HYBDATE='" & auxHDate & "', " & _
    " USED='Y', USER_ID=" & _
    session("UserID") & " WHERE CHIP_ID=" & auxChip_id
strUpdate_SQL = "UPDATE SAMPLES SET STATUS='passed' WHERE SAMPLE_ID='" & auxSample_id & "' AND CHIPTYPE_ID=" & _
    auxChipType_id

' LOG CHIP-SAMPLES AS FAILED AND REDO
strUpdate_SQL = "UPDATE CHIPS SET STATUS='failed', HYBDATE='" & auxHDate & "', " & _
    " USED='Y', USER_ID=" & session("UserID") & ", FAILREASON='" & auxReason & "' " & _
    " WHERE CHIP_ID=" & auxChip_id
strUpdate_SQL = "UPDATE SAMPLES SET STATUS='redo' WHERE SAMPLE_ID='" & auxSample_id & "' AND CHIPTYPE_ID=" & _
    auxChipType_id
strInsert_SQL = "INSERT INTO SAMPLES(sample_id,chiptype_id,shipment_id,labbooknumber,labbookpage,experiment,project," & _
    "projectgroup,responsable,comments,status)values('" & auxNameSample & "'," & auxChipType_id & "," & _
    rsSample(2) & ",'" & rsSample(3) & "'," & _
    rsSample(4) & ",'" & rsSample(5) & "','" & rsSample(6) & "','" & _
    rsSample(7) & "','" & rsSample(8) & "','" & rsSample(9) & "','standby')"

Report Query (made on the fly):

strSQL = " select c.batch_id,c.chiptype_id,to_char(c.arrivaldate),to_char(c.expirydate),c.status,c.sample_id,to_char(c.hybdate)," & _
    " c.user_id,c.special,c.chip_id,c.failreason,s.project,s.projectgroup,s.responsable,s.experiment,s.labbooknumber,s.labbookpage from chips c, samples s where 1=1 AND c.chip_id=s.chip_id(+) " & _
    queryChipType & queryStatus & querySpecial & queryArrival & queryExpiry & queryHybraF & queryHybraT & queryBatch & _
    " ORDER BY " & queryOrder


Appendix C

Chip Tracker User Acceptance Test


August 2003

The Chip Tracker is a Microarray Laboratory Information Management System (Microarray LIMS). It provides the microarray laboratory workers with a web accessible tool, which enables them to manage and track the submissions of biological samples and microarray chips used in the microarray laboratory.

Scientists who use the microarray facility must first enter details of their experiment into the ExpAnD (Experiment Annotation Database) database. This data is used to populate the Resolver database and to inform the Chip Team of samples that are sent to Newhouse. The Chip Tracker application also collects the ExpAnD data via an automated process on an hourly basis.

The Chip Tracker web pages allow a user to manage and track the microarray experimental workflow from any computer in the Intranet. The Chip Tracker manages many aspects of the experimental workflow such as:

• Automatic assignation of samples to chips for hybridisation using a choice of randomisation processes

• Chip fault logging facility

• Dynamic creation of statistical reports on the platform usage

The Chip Tracker Microarray LIMS has been under development at Organon Newhouse since May this year. The Chip Team experts have been involved throughout the life cycle of this project to ensure that their requirements of the application were met.

This document is the "User Acceptance Test". It is intended to enable end users to test the software in real situations. It will allow the development team to make final adjustments to the software before its release in September 2003. Please take some time to read this document and to follow the instructions in it. It may take about an hour to complete, and can be done over several days if necessary.

Please email your completed document to [email protected] by Wednesday 20 August.

Contact details: Patricio Yankilevich, Bioinformatics, Newhouse.

Email: [email protected]


1. Some definitions:

The objects involved in the microarray experiment workflow outlined below, and their relationships, are illustrated in the following figure.

Sample

The aliquot of RNA that is sent to the chip team for the hybridisation. A collection of Samples makes an Experiment, which is part of a Project. A Sample always has:

• a number known as Prep Code (this is from the barcode on the tube),

• an experiment,

• a Research Group,

• a Responsible scientist,

• a LabBook number and LabBook page of reference,

• a Chip type hybridisation request,

• other attributes.

While cRNA produced from your sample can be hybridised to more than one Chip, it cannot be hybridised to the same array type twice. Samples are known as Preps in the ExpAnD system.

Chip

On receiving your sample the Chip Team will process it for hybridisation to the GeneChip array you have selected. GeneChip is the commercial name of the Affymetrix microarray chips. The assignation of Chips to Samples for hybridisation can be controlled by the Chip Tracker application. Chips have the following attributes:

• Batch number

• Chip Type (Human Genome U133 A, Human Genome U133 B, Rat Genome U34 A, Rat Genome U34 B, Rat Genome U34 C, etc.)

• Arrival and Expiry dates

• Other attributes.

2. The Test

Please contact us ([email protected]) before you start the test so that we can ensure that test data is available for you to work on. We have created a Start Test script, which populates the Chip Tracker database with a new set of samples from a microarray experiment and which will be run on your request just before you start testing.

This test is essentially a run-through of what is involved in the use of the Chip Tracker application. It is to ensure that the system works well for the users; for example, that navigation buttons work, dropdown menus are correct and data are being entered and retrieved correctly. The questions asked generally require a "yes" or "no" answer. If the system is working correctly (a ‘yes’ answer), please check the box after the question. If you answer no to a question, please leave the box unchecked.

2.1 Access to the Chip Tracker Web Site

This test is to ensure that the user can enter the Chip Tracker website.

1. Open a Microsoft Internet Explorer browser session. Go to the Chip Tracker login URL, http://morpheus/chiptracker/

Does this take you to the Chip Tracker Login page?

2. From the Chip Tracker Login page enter your User Name and Password provided with this test and click “Login”.

Have you been able to log into the system?

Can you see your Name and log status on the top right corner of the screen?

Does this take you to the Chips in Stock (default home page)?

Pass:

Fail:

Comments:


2.2 Chips in Stock

This test is to check that the management of Chips in Stock is working.

1. A successful login will open the Chips in Stock page. There you can see the chip availability in the microarray laboratory and the chips that are requested.

Do you see each Chip Type, showing no Batches and zero availability?

Do you see on the far right column the number of Chips to be ordered per Chip Type?

2. Choose the Samples in Stock option on the navigation frame on the left of your screen. Check that the total number of Preps (see bottom of the list of samples) present in stock is equal to the total number of Chips to be ordered in the Chips in Stock page.

Do you understand the relationship between the Samples in Stock and the Chips to be ordered?

3. Go back to the Chips in Stock page and, by clicking the Create Purchase Order icon, create a Request for Purchase Order of the chips required to start the “Ovarian cycle induced immature mice” Project, which involves Murine Genome Chips only. Modify the Purchase Form to order Murine Chips A, B and C only and not the Human ones by erasing the Human Genome rows of the form. Fill in the rest of the form and click on the Accept & Print icon.

Do you see that the amount of Packs has been calculated automatically?

Did you notice that you are able to modify the Purchase Order to include only those chips you wish to order?

Could you print out the Purchase Order correctly?

Were you able to avoid the header and footer from being printed in your report?

Do you see in the Chips in Stock page that there are no more Chips to be Ordered of the Murine type, and that this amount now appears as Ordered?

Do you see that the Human Chips are still to be Ordered?

Note: If you would like to prevent the header and footer from appearing in your printed Purchase Order, go to the File menu in your Browser (top left of your screen) and select the Page Setup option, then erase what appears in the Header and Footer boxes and accept. The next time you print an Order the header and footer will not appear.

Pass:

Fail:

Comments:


2.3 Samples in Stock

This test checks that the management of Samples in Stock is working.

1. Choose the Samples in Stock option on the navigation frame on the left of your screen.

Do you see a list of all Samples in stock showing sample code, Chip type requested, Project, Group, Experiment and Responsible for each sample?

2. The last column on the right shows a batch number if the sample has been assigned to a chip for hybridisation. It should be empty at this stage. Now click on any of the sample codes to get additional information.

Do you see the Sample page showing further info about that Sample?

3. Go back to Samples in Stock. Suppose that one of the Human samples (e.g. 000875) did not pass the QC carried out on arrival at the Microarray Lab, so you decide to get in touch with the responsible researcher to order a new sample and to delete this sample from the system. Go ahead and delete the sample: click on the sample code, which will take you to the Sample page, mark both preps of the sample for deletion and delete them.

Did you succeed in the task of deleting a useless sample?

Do you see that this sample is not shown on the Samples in Stock list anymore?

Pass:

Fail:

Comments:

2.4 Chips Arrival

This test checks that the process of loading arriving batches of chips into the system is working.

1. Choose the Chips Arrival option in the navigation frame on the left of your screen. Let's say that three different batches of Murine Genome Chips have arrived, each consisting of 1 pack of 5 chips. The corresponding batches are:

Chip Type          Batch Number   Packs   Chips   Expiry Date
Murine Genome A    0000002100     1       5       20 May 2004
Murine Genome B    0000002101     1       5       25 May 2004
Murine Genome C    0000002102     1       5       14 July 2004

Now go ahead and load every batch, one at a time, into the system. Note that the Expiry Date is set to one year ahead by default, but you may have to change it. If you use the calendar feature by clicking on the calendar icon, please note that the calendar shows the current date, so you may have to navigate to next year to set an expiry date.

Were you able to load the batches?

Could you use the calendar?

2. Try to load a new batch into the system using the number 0000002100 for a different Chip Type. Now try to load a new batch into the system using the number 0000002100 and Murine Genome A.

Do you find that the validations are correct?

3. Go to the Chips in Stock page and check that all the chips you have just loaded are in Stock, which means that they are available. Please note that used chips are not shown on the Chips in Stock page, for the simple reason that they are no longer available.

Do you notice that once the Murine chips have arrived they are no longer shown as Ordered?

Can you click on the Chip Type name and see the details of the available batches?

Pass:

Fail:

Comments:

2.5 Assign Chips

This test checks that the process of assigning Chips to Samples is working.

1. Choose the Assign Chips option in the navigation frame on the left of your screen. Now that you have Murine Chips in stock you can start the experiment, but first you have to assign a Chip to each Sample. On this page you have a list of samples that have not been assigned to a chip yet. Go ahead and mark the samples that you would like to hybridise. Remember that there are only Murine chips in stock, so you will not be able to assign chips to the Human samples, for example.

There are three different types of assignation: Random, Oldest 1st and Special, where:

Random: If any available chips of the corresponding chip type are within two months of expiry, these chips are chosen and assigned to the samples. If none of the available chips of the corresponding chip type is about to expire, the chips are chosen randomly from the available batches of the corresponding chip type and assigned to the samples. Special chips are not assigned by this type of assignation.

Oldest 1st: The system chooses the chips that are closest to their expiry date from the available chips of the corresponding chip type, and assigns them to the samples. Special chips are not assigned by this type of assignation.

Special: The system chooses the chips marked as special that are closest to their expiry date from the available special chips of the corresponding chip type, and assigns them to the samples. Note that chips have to be marked as special beforehand in order to be assigned. Chips marked as special are not assigned through any other assignation type, so these chips can be understood as reserved.
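The three assignation types described above can be summarised in the following sketch. It is a minimal illustration under stated assumptions, not the actual Chip Tracker implementation: the `Chip` class and the `assign_chips` function are hypothetical names, "two months" is taken as 61 days, and when fewer near-expiry chips exist than are needed the remainder is drawn at random (the text does not specify this case).

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta

NEAR_EXPIRY = timedelta(days=61)  # assumption: "two months" taken as 61 days


@dataclass
class Chip:
    chip_type: str
    batch: str
    expiry: date
    special: bool = False  # reserved chips, used only by "Special"


def assign_chips(chips, chip_type, count, mode="Random", today=None, rng=random):
    """Pick `count` available chips of `chip_type` according to `mode`."""
    if mode not in ("Random", "Oldest 1st", "Special"):
        raise ValueError(f"unknown assignation type: {mode}")
    today = today or date.today()
    # Special chips are only eligible for "Special"; all others exclude them.
    want_special = mode == "Special"
    pool = [c for c in chips
            if c.chip_type == chip_type and c.special == want_special]
    if len(pool) < count:
        raise ValueError(f"not enough chips of type {chip_type} for {mode}")

    by_expiry = sorted(pool, key=lambda c: c.expiry)
    if mode in ("Oldest 1st", "Special"):
        return by_expiry[:count]  # closest to expiry first

    # Random: chips about to expire take priority, the rest is drawn at random.
    near = [c for c in by_expiry if c.expiry - today <= NEAR_EXPIRY]
    far = [c for c in by_expiry if c.expiry - today > NEAR_EXPIRY]
    chosen = near[:count]
    chosen += rng.sample(far, count - len(chosen))  # assumed fill-up rule
    return chosen
```

In this sketch a Special assignation with no chips marked as special raises an error, which corresponds to the error message the tester is asked to look for in step 2 below.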


2. Now go ahead and assign chips to the preps of sample 000702 with the default assignation type (Random).

Were you able to assign the samples?

Try to assign other samples with Special assignation; Do you get an error message?

Go to Samples in Stock, Do you see the samples assigned to the corresponding batches?

Click into the 000702 link to see the detail of the assigned preps, Can you see it?

Go to Chips in Stock, Do you see the chips assigned and a reduction on Chips in Stock?

Go back to Assign Chips, Do you see the assigned samples are not present anymore?

3. Random Assignation. Now we are going to test the randomness of the assignation. In order to do this you have to add a couple more batches per Murine chip type into the system from the Chips Arrival page. Add batches:

0000002103 and 0000002104 of Murine Genome A
0000002105 and 0000002106 of Murine Genome B
0000002107 and 0000002108 of Murine Genome C

all with the default expiry date. Go back to the Assign Chips page and assign the preps of sample 000703 with the Random type of assignation.

Were you able to assign the samples?

Go to Samples in Stock. Do you see that the preps of sample 000703 were assigned randomly to the batches of the corresponding chip type?

Go to Chips in Stock, Do you see the chips assigned and a reduction on Chips in Stock?

Go back to Assign Chips, Do you see the assigned samples are not present anymore?

4. Now go to the Samples in Stock page and click into the 000703 link to see the sample detail. Check every sample to be Unassigned and click on the Unassign button.

Can you unassign the preps in the 000703 sample?

Go to Samples in Stock, Do you see that the preps in the 000703 sample are now unassigned?

Go to Chips in Stock, Do you see an increase of chips in Stock and a reduction of assigned?

Go back to Assign Chips and assign preps in sample 000703 randomly again. Do you see that the batches assigned are probably not the same as before?

5. Oldest 1st Assignation. In order to test this type of assignation we will first unassign a prep and then add a new batch that is about to expire. Go to the Samples in Stock page, click on the 000703 link to see the sample detail, and unassign the Murine Genome A prep. Now add a batch 0000002109 containing Murine Genome A chips with an expiry date within the next week, for example. Then go to the Assign Chips page, check the Murine Genome A prep of sample 000703, choose the Oldest 1st assignation type and assign.

Were you able to assign the sample?

Go to Samples in Stock, Do you see the prep Murine Genome A of the sample 000703 is assigned to the 0000002109 batch?


6. Special Assignation. In order to test this type of assignation you should first mark some chips as special. We will now start working with the samples of the other project in stock, “Validation of the HepG2 celline”, which involves Human Genome chips. Go to the Chips in Stock page and from there create a Purchase Order for Human Genome Chips. Once you have printed out the order, we can assume that time passes quickly and the chips have now arrived, so please load the following batches into the system through the Chips Arrival page:

Chip Type         Batch Number   Packs   Chips   Expiry Date
Human Genome A    0000003000     1       5       20 June 2004
Human Genome B    0000003001     1       5       25 June 2004

Now go to the Chips in Stock page and click on the Human Genome A chip type to get to the Chip Type detail page. From this page you can mark some of the available chips as Special (which, as mentioned earlier, makes these chips reserved). Go ahead and mark 3 of the chips as Special, and do the same for some of the Human Genome B chips.

Were you able to mark some chips as Special?

Go to Chips in Stock. Do you see that these chips have been subtracted from the in Stock count because they are now reserved and no longer available?

Go to Assign Chips and assign samples 000872 and 000873 using the Special type of assignation. Were you able to do the special assignation?

Go to Chips in Stock and click on the Chip Type link to go to the Chip Type detail page of the Human Genome chips, then set free the remaining special chips that are not assigned. Were you able to free (unreserve) those chips?

Pass:

Fail:

Comments:

HYBRIDISATION TAKES PLACE AT THIS STAGE

(Rosetta Resource Tracker aids in this task)

2.6 Log Chips

This test checks that the process of logging hybridised chips is working.

1. Choose the Log Chips option in the navigation frame on the left of your screen. You should be able to see a list of all the assigned samples. The lab worker should only use this feature once the hybridisation results are available. Go ahead and log some of the hybridised chips; every hybridised sample-chip appears as “Pass” by default. If you do not have the hybridisation results yet, you can unmark the entry and leave it blank to be filled in later. If the hybridisation process failed, you can mark the chip as “Fail” or “Fail & Redo” and select a failure reason for the chip. If you choose “Fail & Redo” for, say, sample 000702 hybridised with a chip of batch 0000002100, then the chip assigned to that sample will be marked as failed and a new sample 000702_1 will be created to be hybridised again.

Were you able to log the chips?

Go to Chips in Stock. Do you see that these chips have been subtracted from the assigned chips because they are now used and no longer available?

Go to Samples in Stock. Do you see that the logged samples are no longer there because they have already been used?

Go to Samples in Stock, Do you see the new samples for the sample-chips you marked as “Fail & Redo”?
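The “Fail & Redo” behaviour described in step 1 can be sketched as follows. This is an illustrative assumption rather than the system's actual code: the function names are hypothetical, the chip status names are borrowed from the Chips Statistics form, and the rule for a second redo (000702_1 becoming 000702_2) is a guess, since the text only gives the first case.

```python
import re


def redo_sample_code(sample_code):
    """Code of the replacement sample created by "Fail & Redo"."""
    match = re.fullmatch(r"(.+)_(\d+)", sample_code)
    if match:
        # Assumption: a repeated redo increments the numeric suffix.
        return f"{match.group(1)}_{int(match.group(2)) + 1}"
    return f"{sample_code}_1"


def log_chip(sample_code, outcome):
    """Return (chip status, new sample code or None) for a logged chip."""
    if outcome == "Pass":
        return ("Passed", None)
    if outcome == "Fail":
        return ("Failed", None)
    if outcome == "Fail & Redo":
        # The chip is marked as failed and a new sample is queued.
        return ("Failed", redo_sample_code(sample_code))
    raise ValueError(f"unknown outcome: {outcome}")
```

With this sketch, logging sample 000702 as “Fail & Redo” marks its chip as failed and yields the new sample code 000702_1, matching the example in step 1.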

Pass:

Fail:

Comments:

2.7 Chips Statistics

This test checks that the reports and statistics are working.

1. Choose the Chips Statistics option in the navigation frame on the left of your screen. You should be able to see a form that lets you create queries on the Chip Tracker Database. By filling in the conditions you would like to query on, you can create customised reports to print. Now go ahead and produce your first report by listing all the Murine Genome Chips that have been used in the facility. Make sure that you select the Murine Chip Type and the chips with “Passed” and “Failed” status, and not the ones on “Standby”. Now click on the Run Query button.

Were you able to see the report?

Can you see that you are able to hide columns of the report by pressing on the arrows? By pressing in the remaining part of the column you are able to unhide it.

2. Now try a different query, this time choose All Chip Types, all the chip status “Standby”, “Passed” and “Failed”, and mark the “Percentage” option. Now run the query.

Were you able to see the percentages at the top of the report?
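As a rough illustration of what the “Percentage” option might report, the sketch below computes the share of chips in each status. The exact figures shown by Chip Tracker are not specified in this test script, so the function name and the per-status breakdown are assumptions.

```python
from collections import Counter


def status_percentages(statuses):
    """Percentage of chips per status, rounded to one decimal place."""
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: round(100.0 * n / total, 1) for status, n in counts.items()}
```

For example, four chips of which two are “Passed”, one “Failed” and one on “Standby” would give 50.0, 25.0 and 25.0 percent respectively.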

3. Now play around with different queries by changing the dates or by querying on a specific batch number. Also print some of your reports. Note that if you are not hiding any of the columns you will have to set your printer to landscape page orientation.

Could you print your reports?

Have you noticed that the query conditions are shown at the top of the report?

4. If you would like to prevent the header and footer from appearing in your reports, go to the File menu in your Browser (top left of your screen) and select the Page Setup option, then erase the contents of the Header and Footer boxes and accept. Now print another report.

Were you able to prevent the header and footer from being printed in your report?

Pass:

Fail:

Comments:

Feel free to continue using the application until you get used to it. Do not be afraid to try every option and play around; this is the time to do it.

Please now give a brief assessment of the Chip Tracker tool.

3 Chip Tracker Overall Assessment

This is to check the overall

For each question, tick the box for yes, or detail the problem below in Comments.

1. Did the concepts of Samples and Chips, and the way the information is presented, make sense?

2. Considering that no training has yet been given, did you find the Chip Tracker easy to use?

3. Was the speed of (internet/database) connection satisfactory?

4. Did the Chip Tracker environment (window sizes, text colour and sizes) display well on your screen?

5. Were the Help pages of use?

Please give any other criticism (positive or negative) below in Comments. Also, feel free to contact us with any questions.

Pass:

Fail:

Comments:
