Analysis of Affymetrix expression data using R on Azure Cloud
Anne Owen
Department of Mathematical Sciences
University of Essex
15/16 March, 2012
SAICG Workshop, Oxford
Dr Andrew Harrison, University of Essex
Dr Hugh Shanahan, Royal Holloway, University of London
Introduction
• The Affymetrix GeneChip
• Micro-array data
• Venus-C pilot project
• R scripts on Azure Cloud
• Results to date
• Our Experience
• We are developing informatics tools to aid the analysis of Affymetrix chips (GeneChips, Exon arrays).
• Micro-arrays are the data read from GeneChips
Affymetrix GeneChip
• ArrayExpress is an example of a public database containing microarrays and other data from biological experiments
DNA and RNA
Probe cells of an Affymetrix GeneChip contain millions of identical 25-mers
25-mer
Affymetrix GeneChip Hybridization – fragments of RNA stick to the probes
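Hybridization relies on base pairing: a probe binds the fragment whose sequence is its reverse complement. The short Python sketch below illustrates that relationship only. The function name and the 25-mer are made up for illustration (it is not a real Affymetrix probe), and the sketch assumes a DNA probe pairing with an RNA fragment, so A on the probe pairs with U.

```python
# Watson-Crick pairing between a DNA probe base and the RNA base that binds it.
DNA_TO_RNA_PAIR = {"A": "U", "C": "G", "G": "C", "T": "A"}


def complementary_rna(probe: str) -> str:
    """Return the RNA sequence (5'->3') that pairs perfectly with a DNA probe (5'->3')."""
    probe = probe.upper()
    unknown = set(probe) - set(DNA_TO_RNA_PAIR)
    if unknown:
        raise ValueError(f"unexpected bases in probe: {sorted(unknown)}")
    # Strands pair antiparallel, so complement each base and reverse the order.
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(probe))


# Hypothetical 25-mer, used purely for illustration.
EXAMPLE_PROBE = "ATGCGTACGTTAGCCTAGGCTAACG"

if __name__ == "__main__":
    print(len(EXAMPLE_PROBE), complementary_rna(EXAMPLE_PROBE))
```

A fragment that differs from this perfect complement would be expected to bind less strongly, which is the basis for reading expression from fluorescence.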
Affymetrix GeneChip Fluorescence
Micro-array datasets
• Fluorescence data put into .cel files
• Many 1000s of experiments
• Many 100s of micro-arrays for each GeneChip
• >1 TB of data to analyse
• 1000s of published papers using Affymetrix GeneChips
• This data is a free resource to researchers
Going Forward...
• Currently we analyse flaws in GeneChip data
• Future is new genomic technology known as ‘next generation sequencing’
• Petabytes of data being generated faster than it can be analysed
• Cloud solutions needed for storage of and access to this data
Venus-C Pilot Project
• VENUS-C is a project funded under the European Commission’s 7th Framework Programme with computing resources from Microsoft
• Joint co-operation between computing service providers and scientific user communities
• Aim: to develop, test and deploy a large, Cloud computing infrastructure for science and SMEs (small and medium-sized enterprises) in Europe.
Venus-C Infrastructure
• 3 main areas dealing with standards:
– VM management (OCCI and OVF)
– Job submission (BES)
– Cloud data storage (CDMI)
• Other specifications, such as
– WS-Security
• Programming model:
– Task based submission: Generic Worker role
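Task-based submission fits workloads that split into independent pieces. The Python sketch below shows only the general idea of dividing a list of files into fixed-size tasks. It does not use the Venus-C Generic Worker API; the function name, file names and batch size are hypothetical, and it assumes each .cel file can be analysed on its own.

```python
from typing import List, Sequence


def make_tasks(files: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split a list of file names into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(files[i:i + batch_size]) for i in range(0, len(files), batch_size)]


if __name__ == "__main__":
    # Hypothetical file names; each printed batch would become one submitted task.
    cel_files = ["a.cel", "b.cel", "c.cel", "d.cel", "e.cel"]
    for number, task in enumerate(make_tasks(cel_files, batch_size=2), start=1):
        print(number, task)
```

Smaller batches spread work across more workers at the cost of more submissions; the right size depends on per-file run time, which the talk does not state.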
cTQm Project Overview
BLOB Storage
Public database
Scripts, R libs and key data uploaded via Azure webpage
Cloud / Grid Interfaces
Amazon EC2: Command line interface into Linux terminal
NGS: Portal or Command Line to Linux machine
Azure: Webpage interface to a Windows machine, Visual Studio 2010, C#
Bioinformatics Results to date
• Uploading of datasets into Cloud storage is underway
• Success with R scripts on Azure to confirm results in published paper*
• Minor problems with ArrayExpress to solve
• Work is extending to more GeneChip types
• Still need user authentication / accounting
* Nucleic Acids Research, 2011, 1-9, “Normalised Affymetrix expression data are biased by G-quadruplex formation”, by Hugh P. Shanahan, Farhat N. Memon, Graham J. G. Upton and Andrew P. Harrison
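The cited paper concerns G-quadruplex formation. As a rough illustration of how probe sequences might be screened for that risk, the Python sketch below flags sequences containing a run of consecutive guanines. The threshold of four, the function names and the example sequences are assumptions made for illustration; this is not the method or the R scripts used in the paper or the talk.

```python
import re
from typing import Dict, List


def has_g_run(probe: str, min_run: int = 4) -> bool:
    """Return True if the probe contains at least min_run consecutive G bases."""
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    return re.search("G{%d,}" % min_run, probe.upper()) is not None


def flag_probes(probes: Dict[str, str], min_run: int = 4) -> List[str]:
    """Return the IDs of probes containing a G-run, in input order."""
    return [probe_id for probe_id, seq in probes.items() if has_g_run(seq, min_run)]


# Hypothetical probe IDs and 25-mers, used purely for illustration.
EXAMPLE_PROBES = {
    "probe_a": "ATGGGGCATCGATCGATTACGATCG",
    "probe_b": "ATGCGTACGTTAGCCTAGGCTAACG",
}

if __name__ == "__main__":
    print(flag_probes(EXAMPLE_PROBES))
```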
Our Experience
• Azure Cloud is a steep learning curve for a Linux-based scientist
• Vast datasets can be made available
• Applications can be user-friendly
• Scalability makes Cloud approach attractive
• Costs need to be assessed
• Enables scientists in developing countries to perform genome analysis
Acknowledgements and thanks to:
Dr Andrew Harrison, University of Essex
Dr Hugh Shanahan, Royal Holloway, University of London
Department of Mathematical Sciences, University of Essex
European Commission’s 7th Framework Programme
Microsoft and Venus-C
Venus-C project Organisers