Taverna and SoapLab experience @ Elda Rossi – CINECA (Italy)
Post on 01-Jan-2016
What is CINECA
Cineca is a consortium of Italian Universities and CNR
Founded in 1969, now under the control of the Ministry of University and Research
Resources
The most important national infrastructure in Italy for the computational support to scientific research
Mission:
promoting the use of the most advanced computing systems to support public and private scientific and technological research
R & Bioconductor
Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data.
It is based on R, a language and environment for statistical computing and graphics.
R & Bioconductor
Bioconductor is a collection of “packages”
Two main types:
1. Provides basic infrastructure support
2. Provides innovative methodology
We chose a function in the affy package (type 2).
The affy package
Package: affy
Description: The package contains functions for exploratory oligonucleotide array analysis. The dependence on tkWidgets only concerns a few convenience functions. 'affy' is fully functional without it.
Version: 1.5.8-1
Author: Rafael A. Irizarry, Laurent Gautier, Benjamin Milo Bolstad, and Crispin Miller with contributions from …
Maintainer: Rafael A. Irizarry
Dependencies: R (>= 1.9.0), Biobase (>= 1.4.22), reposTools
Suggests: tkWidgets (>= 1.2.2), affydata
SystemRequirements: None
License: LGPL version 2 or newer
URL: None available
Function: expresso. From raw probe intensities to expression values
The expresso function
Expression measures
The most common operation is certainly to convert probe level data to expression values.
1. Reading in probe level data
2. Background correction (4 methods)
3. Normalization (7 methods)
4. Probe specific background correction, e.g. subtracting MM (3 methods)
5. Summarizing the probe set values into one expression measure and, in some cases, a standard error for this summary (5 methods)
How to run expresso
Interactive session:

$ R
> library(affy)
> data <- ReadAffy()
> data.mas <- expresso(data,
      bgcorrect.method="mas",
      pmcorrect.method="mas",
      normalize.method="constant",
      summary.method="medianpolish")
> write.exprs(data.mas, file="Data.out")

Input: Data.CEL   Output: Data.out

Batch mode:

$ R CMD BATCH script

where the file "script" contains:

library(affy)
data <- ReadAffy()
data.mas <- expresso(data,
    bgcorrect.method="mas",
    pmcorrect.method="mas",
    normalize.method="constant",
    summary.method="medianpolish")
write.exprs(data.mas, file="Data.out")

Diagram labels: script, Report
The files
[CEL]
Version=3

[HEADER]
Cols=126
Rows=126
TotalX=126
TotalY=126
Baseline=Not normalized
DatHeader=ctrl150:CLS=1167 …

[INTENSITY]
NumberCells=15876
CellHeader=X Y MEAN
0 0 551.0
1 0 10651.0
2 0 642.0
3 0 10855.0
4 0 278.0
5 0 452.0
6 0 11139.0
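The text CEL layout shown here (bracketed section names, key=value lines, then one "X Y MEAN" row per cell) is simple enough to read without R. The following is a minimal sketch of such a reader, assuming the file follows exactly the layout in the excerpt; the function name parse_cel_v3 and the shortened sample are illustrative and not part of the affy package.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int, float]


def parse_cel_v3(text: str) -> Tuple[Dict[str, Dict[str, str]], List[Cell]]:
    """Split a text CEL file into key=value sections and intensity rows."""
    sections: Dict[str, Dict[str, str]] = {}
    cells: List[Cell] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new section such as [HEADER] or [INTENSITY] starts here.
            current = line[1:-1]
            sections[current] = {}
        elif current is None:
            continue  # ignore anything before the first section
        elif "=" in line:
            key, value = line.split("=", 1)
            sections[current][key] = value
        elif current == "INTENSITY":
            # Data row: X Y MEAN (any further columns are ignored).
            x, y, mean = line.split()[:3]
            cells.append((int(x), int(y), float(mean)))
    return sections, cells


# Shortened sample based on the excerpt above (only three cells kept).
SAMPLE_CEL = """[CEL]
Version=3

[HEADER]
Cols=126
Rows=126

[INTENSITY]
NumberCells=15876
CellHeader=X Y MEAN
0 0 551.0
1 0 10651.0
2 0 642.0
"""

sections, cells = parse_cel_v3(SAMPLE_CEL)
header = sections["HEADER"]
# Consistency check: Cols x Rows should equal NumberCells.
expected_cells = int(header["Cols"]) * int(header["Rows"])
print(expected_cells == int(sections["INTENSITY"]["NumberCells"]))
print(cells[1])
```

The Cols × Rows check (126 × 126 = 15876) is a cheap way to detect a truncated upload before the file is handed to ReadAffy().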
Sample001.cel Sample002.cel Sample003.cel
100084_at 2.68016528652511 2.75619854567269 3.82550383255225
101482_at 2.41830136307405 2.19230548692681 3.4173900695363
31962_at 12.3667390890414 12.4534076075796 12.8658623516881
32466_at 12.4078453130306 12.5262787728982 13.2129784659009
35201_at 6.73875347104673 6.36824635919863 7.53465018481639
36189_at 6.91195864883172 6.77835938949316 7.94585515997792
36678_at 10.0269997503136 9.76893096184106 11.1443619988943
37001_at 8.7690698709579 8.57322443505215 9.80956768540462
37029_at 7.58176898579828 7.24297853600119 8.67002397585278
37046_at 4.7250160934765 4.7250160934765 5.68254863921313
37189_at 7.08125646141077 7.0999566997911 7.92512679504857
37719_at 5.33679629782696 5.33679629782696 6.39140386282694
37725_at 7.634367429284 7.41050271151406 8.85664197069339
38437_at 7.54693596951725 7.16216316289552 8.3816810916508
38730_at 7.61959398527742 7.65907193898742 9.00657184492387
39425_at 6.07663839694708 6.03298499862286 7.14769809957403
40276_at 6.33983152588017 6.21300599988174 6.85968858773872
Above: an excerpt of an input CEL file, followed by the corresponding OUT file of expression values.
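A client that receives the OUT file (for example from the service's standard output) needs to turn it back into numbers. The following is a minimal sketch, assuming the layout shown above: a header line with the sample names, then one whitespace-separated row per probe set. The function name parse_expression_table is illustrative.

```python
from typing import Dict, List, Tuple


def parse_expression_table(text: str) -> Tuple[List[str], Dict[str, List[float]]]:
    """Read a header of sample names followed by 'probe value value ...' rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    samples = lines[0].split()
    table: Dict[str, List[float]] = {}
    for ln in lines[1:]:
        probe, *values = ln.split()
        if len(values) != len(samples):
            raise ValueError(f"row {probe!r} has {len(values)} values, "
                             f"expected {len(samples)}")
        table[probe] = [float(v) for v in values]
    return samples, table


# First rows of the OUT file shown above.
SAMPLE_OUT = """Sample001.cel Sample002.cel Sample003.cel
100084_at 2.68016528652511 2.75619854567269 3.82550383255225
101482_at 2.41830136307405 2.19230548692681 3.4173900695363
31962_at 12.3667390890414 12.4534076075796 12.8658623516881
"""

samples, table = parse_expression_table(SAMPLE_OUT)
first, last = table["100084_at"][0], table["100084_at"][2]
# Difference between Sample003 and Sample001 for one probe set.
print(samples[2], "-", samples[0], "=", round(last - first, 3))
```

Splitting on any whitespace keeps the sketch tolerant of both tab- and space-separated output, at the cost of not supporting sample names that contain spaces.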
Setting up SoapLab
A Linux-based server was chosen
Tomcat was installed
Java was upgraded
Axis was installed
SoapLab was installed

Server: vega.cineca.it
Tomcat 5.0.28
Java 1.4
Axis 1.1
SoapLab precompiled for SuSE Linux

Up to here: no problems!
Defining the Application
1. Write the application wrapper
2. Write the ACD file for the application
3. Convert ACD to XML
4. Start up the SoapLab server
5. Deploy the new service
1. Write the application wrapper
#!/usr/bin/perl
use Getopt::Long;

# command arguments (with defaults)
GetOptions("bgcorrect=s" => \$bgcorrect, "normalize=s" => \$normalize);
$bgcorrect = "mas" if $bgcorrect eq "";
$normalize = "constant" if $normalize eq "";

# location of R executable
$rexe = "/biotools/R/R-2.1.0/bin/R";

# data directory
$datadir = "/biotools/services/data";

# R code to run analysis
open(AFFY, ">$datadir/affy");
print AFFY <<EOF;
library(affy)
data<-ReadAffy()
data.mas<-expresso(data, bgcorrect.method="$bgcorrect", pmcorrect.method="mas", normalize.method="$normalize", summary.method="medianpolish")
write.exprs(data.mas,file="data.txt")
EOF
close(AFFY);

# now run program
system "cd $datadir; $rexe CMD BATCH affy";

# print output
open(OUT, "$datadir/data.txt");
while (<OUT>) { print $_; }
close(OUT);

Saved as /biotools/services/affy-expresso.pl
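The wrapper pastes the two option values straight into the R script, so a mistyped or malicious value only surfaces when R runs. A possible refinement is to check the values against a list of known method names first. The following is a minimal sketch of that idea, written in Python rather than Perl. The method names are an assumption (they are meant to mirror the 4 background and 7 normalization methods counted earlier) and should be checked against what the installed affy version reports. The name build_r_script is illustrative.

```python
# Assumed method names; verify against the installed affy package.
BGCORRECT_METHODS = ("mas", "none", "rma", "rma2")
NORMALIZE_METHODS = ("constant", "contrasts", "invariantset", "loess",
                     "qspline", "quantiles", "quantiles.robust")

# Same R commands as the wrapper above, with two placeholders.
R_TEMPLATE = """library(affy)
data <- ReadAffy()
data.mas <- expresso(data,
    bgcorrect.method="{bgcorrect}",
    pmcorrect.method="mas",
    normalize.method="{normalize}",
    summary.method="medianpolish")
write.exprs(data.mas, file="data.txt")
"""


def build_r_script(bgcorrect: str = "mas", normalize: str = "constant") -> str:
    """Return the R script text, rejecting values that are not known methods."""
    if bgcorrect not in BGCORRECT_METHODS:
        raise ValueError(f"unknown bgcorrect method: {bgcorrect!r}")
    if normalize not in NORMALIZE_METHODS:
        raise ValueError(f"unknown normalize method: {normalize!r}")
    return R_TEMPLATE.format(bgcorrect=bgcorrect, normalize=normalize)


script = build_r_script("rma", "quantiles")
print(script)

# A value that tries to close the string and add R code is rejected.
try:
    build_r_script('mas"); q("no')
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Checking against a fixed list also gives the caller an immediate error message instead of an R failure buried in the batch report.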
2. Write the ACD file
appl: bioconductor [
  documentation: "affy/expresso function of BioConductor"
  version: "1.0"
  groups: "Microarrays"
  nonemboss: "Y"
  executable: affy-expresso.pl
]

string: bgcorrect [
  additional: "Y"
  parameter: "Y"
  default: "mas"
]

string: normalize [
  additional: "Y"
  parameter: "Y"
  default: "constant"
]

outfile: output [
  additional: "Y"
  default: "stdout"
]
/biotools/soapbin/analysis-interfaces/metadata/affy.acd
The path to the executable is defined in the shell
Input1: Background correction
Input2: Normalization method
Output: standard output
3, 4, 5: Final steps
3. Convert ACD to XML
/biotools/soapbin/analysis-interfaces/generator/acd2xml
From: ../metadata/affy.acd
To: ../metadata/microarrays/affy-al.xml

4. Start up the SoapLab server
/biotools/soapbin/analysis-interfaces/run-AppLab-server
(How to shut down the server?)

5. Deploy the new service
/biotools/soapbin/analysis-interfaces/ws/deploy-web-services
Using the service from Taverna
From the Available Services window, select "Add new SoapLab scavenger" and enter our server address: http://vega.cineca.it:8082/axis/services
Using the service … (2)
The new processor appears: in the microarrays folder you can find the affy service.
After connecting input and output ports, the service can be launched.
Problems encountered
Documentation is not so clear and complete
How can we transfer (large) files from the personal workstation to the server machine?
We need a permanent and private data area for storing data
We would like to monitor the service while it is running (asynchronous services?)
How can we return data in addition to stdOut and stdErr?
…