Cool Informatics Tools and Services for Biomedical Research
David Ruau, PhD. August 1st, 2012
Sponsored by the Office of Postdoctoral Affairs and the Lane Medical Library
@druau
BIG DATA
Big Data in Biomedicine
http://www.nature.com/news/gene-data-to-hit-milestone-1.11019
We live in a Big Data world
Course outline
1. Analyzing genomic data
   1. Traditional bioinformatics tools
   2. Microarrays/gene lists without any code
   3. Microarrays/gene lists with code
   4. NGS and mRNA-seq
2. Beyond genomics
   1. Protein-protein interaction networks
3. General data handling tools
   1. Storing your data
   2. Data are dirty
4. Statistics made easy
5. Graphics rules!
6. Demystifying “the work”! (the code)
7. Conclusion + Q&A
Bioinformatics software to solve everyday problems.
The EMBOSS tool suite: http://emboss.sourceforge.net/
One web portal is: http://mobyle.pasteur.fr/cgi-bin/portal.py
- DNA / amino acid (AA) pairwise global and local alignment
- Sequence feature analysis (CpG islands, gene scans, restriction enzyme sites, 2D/3D structure...)
- Protein structure and domains
- Similarity search (BLAST, PHI-BLAST, PSI-BLAST, DELTA-BLAST...)
- Phylogenetics (trees from multiple alignments)
- ...
Traditional bioinformatics tools
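To see what a pairwise global alignment actually computes, here is a minimal Needleman-Wunsch sketch in base R that returns only the alignment score. The scoring values (match +1, mismatch -1, linear gap -2) are assumptions for illustration; EMBOSS needle uses substitution matrices and affine gap penalties, so its scores will differ.

```r
# Minimal Needleman-Wunsch global alignment score with a linear gap penalty.
# Scores are illustrative assumptions, not EMBOSS needle defaults.
nw_score <- function(a, b, match = 1, mismatch = -1, gap = -2) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # S[i + 1, j + 1] = best score for aligning x[1..i] with y[1..j]
  S <- matrix(0, nrow = n + 1, ncol = m + 1)
  S[, 1] <- gap * (0:n)  # leading gaps in b
  S[1, ] <- gap * (0:m)  # leading gaps in a
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (x[i] == y[j]) match else mismatch
      S[i + 1, j + 1] <- max(S[i, j] + s,        # pair x[i] with y[j]
                             S[i, j + 1] + gap,  # x[i] against a gap
                             S[i + 1, j] + gap)  # y[j] against a gap
    }
  }
  S[n + 1, m + 1]
}

nw_score("ACGT", "ACT")  # 1: best alignment is ACGT / AC-T (3 matches, 1 gap)
```

Filling the whole matrix is what guarantees a global (end-to-end) alignment; a local (Smith-Waterman) variant would additionally clamp cells at 0 and take the maximum over the whole matrix.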
Example output: a tree built with the UPGMA method
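UPGMA is average-linkage hierarchical clustering on a distance matrix, so base R's hclust() with method = "average" reproduces it. The distance values below are made up for the example.

```r
# Made-up pairwise distances between four sequences (illustration only)
d <- as.dist(matrix(c(
   0,  2,  6, 10,
   2,  0,  6, 10,
   6,  6,  0, 10,
  10, 10, 10,  0),
  nrow = 4,
  dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D"))))

# method = "average" is UPGMA: clusters are merged on their mean pairwise distance
tree <- hclust(d, method = "average")
tree$height  # merge heights: A+B at 2, then C at 6, then D at 10
plot(tree, main = "UPGMA tree")
```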
Bioinformatics software to solve everyday problems.
Some tools are provided through database interfaces such as NCBI Entrez.
- The UCSC Genome Browser
- The ENCODE project results
- For example: visualize the GC content and restriction enzyme sites in your gene of interest.
Traditional bioinformatics tools
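The GC content and restriction-site tracks mentioned above boil down to simple string computations. A minimal base R sketch; the window size and the EcoRI site (GAATTC) are example choices, and genome browsers compute these tracks on their own reference data.

```r
# Fraction of G + C bases in a DNA sequence (case-insensitive)
gc_content <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  mean(bases %in% c("G", "C"))
}

# GC content in consecutive windows, similar in spirit to a browser GC track
gc_windows <- function(dna, window = 5, step = 5) {
  starts <- seq(1, nchar(dna) - window + 1, by = step)
  vapply(starts, function(s) gc_content(substr(dna, s, s + window - 1)),
         numeric(1))
}

# Start positions of a restriction site (default: EcoRI, GAATTC)
find_site <- function(dna, site = "GAATTC") {
  hits <- gregexpr(site, toupper(dna), fixed = TRUE)[[1]]
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

gc_content("ATGCGC")           # 4 of 6 bases are G or C
gc_windows("AAAAAGGGGG")       # 0 for the AT window, 1 for the GC window
find_site("TTGAATTCAAGAATTC")  # 3 11
```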
Just because you have a GUI does not mean the analysis is brain-dead simple.
Stating the obvious
Analyzing gene expression microarrays without any code.
GenePattern: http://genepattern.broadinstitute.org/gp/
Analyzing genomic data
Upload your expression data as a text file. GenePattern takes RES and GCT files; conversion tools are provided to transform CEL files into GCT or RES.
Analyzing genomic data
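If your data are already normalized in R, you can also write a GCT file yourself. This sketch assumes the GCT 1.2 layout: a "#1.2" line, a line with the numbers of rows and samples, a header with Name, Description and the sample names, then one row per probe. Double-check against the GenePattern file-format documentation before relying on it; the probe and sample names are hypothetical.

```r
# Write an expression matrix (probes x samples) as a GCT 1.2 text file
write_gct <- function(expr, file, description = rownames(expr)) {
  con <- file(file, "w")
  on.exit(close(con))
  writeLines("#1.2", con)                                  # format version
  writeLines(paste(nrow(expr), ncol(expr), sep = "\t"), con)  # dimensions
  writeLines(paste(c("Name", "Description", colnames(expr)), collapse = "\t"), con)
  out <- data.frame(Name = rownames(expr), Description = description,
                    expr, check.names = FALSE)
  write.table(out, con, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

expr <- matrix(c(5.1, 7.3, 6.0, 8.2), nrow = 2,
               dimnames = list(c("probe_1", "probe_2"), c("ctrl", "treated")))
gct_file <- tempfile(fileext = ".gct")
write_gct(expr, gct_file)
readLines(gct_file)
```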
StarBiogene http://web.mit.edu/star/biogene/index.html (Java web app)
- Part of GenePattern, but provides pipeline-style processing online
SeqExpress http://www.seqexpress.com/ (Windows only)
- Alternative independent application (less active than GenePattern)
Expander http://acgt.cs.tau.ac.il/expander/
- Alternative independent application (less active than GenePattern)
RMAExpress http://rmaexpress.bmbolstad.com/
- Useful to perform quality control of your microarrays.
Cluster http://bonsai.hgc.jp/~mdehoon/software/cluster/
- The original program to analyze microarray results. It has no pre-processing functionality, so you need to pre-process separately (using RMAExpress, for example).
SAM http://www-stat.stanford.edu/~tibs/SAM/ (Significance Analysis of Microarrays)
- To extract the differentially expressed (DE) genes. This is an Excel plug-in. Again, you need to pre-process separately.
Analyzing genomic data
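Once arrays are pre-processed (for example with RMAExpress), extracting DE genes amounts to a per-gene test. The sketch below uses a plain Welch t-test in base R on simulated log2 data. It is a simpler stand-in for SAM, not SAM itself: SAM adds a fudge factor to the variance and estimates the FDR by permutation.

```r
# Simulated log2 expression: 100 genes x 6 arrays (3 control, 3 treated)
set.seed(1)
expr <- matrix(rnorm(6 * 100, mean = 8), nrow = 100,
               dimnames = list(paste0("gene_", 1:100), paste0("s", 1:6)))
expr[1:5, 4:6] <- expr[1:5, 4:6] + 3   # spike in 5 "true" DE genes
group <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt"))

# Per-gene Welch t-test (plain t statistic, NOT the SAM statistic)
de_table <- function(expr, group) {
  g1 <- group == levels(group)[1]
  g2 <- group == levels(group)[2]
  p <- apply(expr, 1, function(x) t.test(x[g2], x[g1])$p.value)
  lfc <- rowMeans(expr[, g2]) - rowMeans(expr[, g1])  # log2 fold change
  data.frame(gene = rownames(expr), log2FC = lfc, p.value = p,
             row.names = NULL)
}

res <- de_table(expr, group)
head(res[order(res$p.value), ])
```

With thousands of genes, these raw p-values still need a multiple-testing correction before you call anything "significant".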
Commercial solution: GeneSpring GX (the first 20 days are free)
Access through subscription at Stanford with CMGM: http://cmgm3.stanford.edu
Analyzing genomic data
Interpreting a gene list relies on external knowledge. Several resources and tools are available to help:
- KEGG: http://www.genome.jp/kegg/ (pathway database)
- REACTOME: http://www.reactome.org (pathway 2.0 database)
- Gene Ontology: http://www.geneontology.org/ (the ultimate resource for gene function, processes and localization)
- BioMart: http://www.biomart.org/ (portal providing access to multiple databases)
- GSEA: http://www.broadinstitute.org/gsea/index.jsp (part of GenePattern, but also available in R)
- DAVID: http://david.abcc.ncifcrf.gov/ (to perform an over-representation analysis)
- BiNGO: http://www.psb.ugent.be/cbd/papers/BiNGO/Home.html (over-representation analysis that produces graphical results in Cytoscape)
- BioGPS: http://biogps.org/ (to know where your gene is expressed in the body, or in which cell lines)
Interpreting your results
Reactome
• Made to be used programmatically
• Cytoscape (a network tool) has a plugin for Reactome. Just give it a gene list, or a list of genes plus the number of samples in which each gene is mutated (for Cox survival analysis).
Interpreting your results
- Retrieve a network from a gene list
- Do network analysis
- Perform Gene Ontology analysis
- Survival analysis
http://www.reactome.org/userguide/Usersguide.html#FI_Network_Tool
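A first step of network analysis on a retrieved network is counting each gene's interactions (its degree). A minimal base R sketch; the edge list is hypothetical and stands in for a network exported from Cytoscape.

```r
# Hypothetical interaction edge list (placeholder gene names)
edges <- data.frame(
  from = c("GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_C"),
  to   = c("GENE_B", "GENE_C", "GENE_D", "GENE_D", "GENE_E"),
  stringsAsFactors = FALSE
)

# Degree = number of interactions per gene (edges treated as undirected)
degree <- table(c(edges$from, edges$to))
sort(degree, decreasing = TRUE)

# Highest-degree genes are candidate network hubs
names(degree)[degree == max(degree)]
```

Dedicated packages (or Cytoscape itself) go further, with centrality measures and clustering, but degree is often where a first look starts.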
DAVID database
Perform fast over-representation analysis against different databases: KEGG, Reactome, OMIM (diseases), GeneRIF (literature), protein domains, etc.
Interpreting your results
Protein domains
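Over-representation analysis, as run by DAVID or BiNGO, is at heart a hypergeometric (one-sided Fisher) test per annotation term. A minimal base R version for a single term, with made-up counts. DAVID itself uses a modified Fisher test (the EASE score) and adds multiple-testing corrections, so its numbers will differ.

```r
# P(at least `hits_in_term` annotated genes in your list) under random sampling
ora_p <- function(hits_in_term, list_size, term_size, universe_size) {
  phyper(hits_in_term - 1,            # P(X >= k) = P(X > k - 1)
         term_size,                   # annotated genes in the universe
         universe_size - term_size,   # non-annotated genes
         list_size,                   # genes drawn (your list)
         lower.tail = FALSE)
}

# Made-up example: 12 of 200 genes hit a term annotating 300 of 20,000 genes
# (only 200 * 300 / 20000 = 3 expected by chance)
ora_p(hits_in_term = 12, list_size = 200, term_size = 300, universe_size = 20000)
```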
BioGPS: exploring expression across tissues and cell lines
Interpreting your results
Look at other tissue libraries
RMAExpress and quality control of microarrays
Several tests exist to check whether a microarray performed correctly.
Hall of fame of failed microarrays: http://plmimagegallery.bmbolstad.com/
Interpreting your results
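One common check used with probe-level models is the Relative Log Expression (RLE) plot: subtract each gene's median across arrays, and a well-behaved array should stay centered near 0. A minimal base R sketch on simulated data with one deliberately shifted array; the flagging cutoff of 1 is an arbitrary choice for this example, not an established threshold.

```r
# Simulated log2 expression: 1000 genes x 6 arrays, one of them shifted
set.seed(2)
expr <- matrix(rnorm(1000 * 6, mean = 8), ncol = 6,
               dimnames = list(NULL, paste0("array", 1:6)))
expr[, 6] <- expr[, 6] + 2   # simulate one problematic array

# RLE: deviation of each value from its gene's median across arrays
rle_mat <- sweep(expr, 1, apply(expr, 1, median))
rle_median <- apply(rle_mat, 2, median)
round(rle_median, 2)

# Flag arrays whose median RLE drifts away from 0 (cutoff is arbitrary)
names(rle_median)[abs(rle_median) > 1]
boxplot(rle_mat, main = "RLE per array", ylab = "log2 deviation from gene median")
```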
Analyzing public microarray data with code (kind of...)
Analyzing public gene expression data
Analyzing public gene expression data
Then click on the “TOP 250” button.
Analyzing public gene expression data
R code
Top 250 genes
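GEO2R generates R code (based on limma) that ranks genes by adjusted p-value. The last step, multiple-testing adjustment and keeping the top N genes, can be sketched in base R from any vector of raw p-values. The column names mimic limma's topTable output, and the p-values are made up.

```r
# Benjamini-Hochberg adjustment, then keep the top n genes
top_genes <- function(p, n = 250) {
  adj <- p.adjust(p, method = "BH")
  ord <- order(adj, p)                       # rank by adjusted, then raw p
  out <- data.frame(gene = names(p)[ord], P.Value = p[ord],
                    adj.P.Val = adj[ord], row.names = NULL)
  out[seq_len(min(n, length(p))), ]
}

p <- c(geneA = 0.001, geneB = 0.20, geneC = 0.01, geneD = 0.04)
top_genes(p, n = 3)   # geneA, geneC, geneD
```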
Next Generation Sequencing
The main NGS platforms are:
• Roche/454 (Genome Sequencer; GS)
• Illumina/Solexa (Genome Analyzer)
• SOLiD (Applied Biosystems)
Upcoming challengers:
• Ion Torrent (Life Technologies)
• Oxford Nanopore
Next Generation Sequencing
Done by the core facility
What you should request
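The files you get back from the core facility are usually FASTQ: four lines per read (identifier, sequence, "+", quality string). A minimal base R sketch that parses records and decodes Phred+33 qualities (quality = ASCII code - 33), the encoding of recent Illumina pipelines; older Illumina versions used an offset of 64. The reads are made up.

```r
# Split FASTQ lines (4 per record) into a table of reads
read_fastq_text <- function(lines) {
  stopifnot(length(lines) %% 4 == 0)
  idx <- seq(1, length(lines), by = 4)
  data.frame(id   = sub("^@", "", lines[idx]),
             seq  = lines[idx + 1],
             qual = lines[idx + 3],
             stringsAsFactors = FALSE)
}

# Phred+33: each quality character's ASCII code minus 33
phred33 <- function(qual) utf8ToInt(qual) - 33L

fq <- c("@read1", "GATTACA", "+", "IIIII##",
        "@read2", "ACGT",    "+", "5555")
reads <- read_fastq_text(fq)
phred33(reads$qual[1])        # 40 40 40 40 40 2 2
mean(phred33(reads$qual[2]))  # 20
```

For real files, readLines() on the FASTQ works for small tests, but Bioconductor packages are the practical choice for full runs.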
Analyzing mRNA-seq data: 4 steps.
1- Alignment and trimming of reads:
[no GUI] TopHat (read aligner and splice junction mapper), Cufflinks (transcript assembly and RPKM estimates). GALAXY provides access to TopHat and Cufflinks.
2- Calling variants and indels: GATK (http://www.broadinstitute.org/gsa/wiki/index.php/Home_Page), VarScan (http://varscan.sourceforge.net/), SHRiMP2, VARiD, Atlas-SNP2, SomaticSniper... Interpretation of variants: SIFT (Galaxy)
3- Finding differentially expressed genes: Cuffdiff (Galaxy), DEXSeq (R; tests differential exon usage)
4- Visualization: SAVANT (http://genomesavant.com/savant/), IGV (http://www.broadinstitute.org/software/igv)
Analyzing mRNA-seq
[with GUI and commercial] GenomeStudio from Illumina; GenomeQuest [looks pretty awesome.]
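RPKM normalizes a gene's read count by its length (in kilobases) and by sequencing depth (in millions of mapped reads): RPKM = count * 10^9 / (length_bp * total_mapped). A minimal base R sketch with made-up counts; Cufflinks reports the closely related FPKM, which counts fragments instead of reads.

```r
# Reads Per Kilobase of transcript per Million mapped reads
# total_mapped should be the library's total mapped reads; the default
# (sum of the counts given) is only a convenience for small examples.
rpkm <- function(counts, length_bp, total_mapped = sum(counts)) {
  counts * 1e9 / (length_bp * total_mapped)
}

counts    <- c(geneA = 500,  geneB = 1000, geneC = 1500)
length_bp <- c(geneA = 1000, geneB = 4000, geneC = 2000)
rpkm(counts, length_bp, total_mapped = 10e6)  # 50 25 75
```

Note how geneB has twice geneA's reads but a lower RPKM, because it is four times longer.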
Analyzing mRNA-seq data: introducing GALAXY
How to use Galaxy?
http://galaxy.psu.edu/
Working in the cloud
Dudley JT, and Butte AJ. 2010. In silico research in the era of cloud computing. Nat Biotechnol 28: 1181–1185.
Summary mRNA-seq
GALAXY
This is a compendium of software; it even includes UNIX tools and EMBOSS.
Take-home message:
FASTQ files > TopHat > Cuffdiff > IGV (for differential expression)
FASTQ files > TopHat > GATK > IGV (for variant detection)
Where to find help: http://seqanswers.com
Analyzing RNA-seq using R
DEXSeq is an R/Bioconductor package. R is a statistical programming environment widely used in bioinformatics.
Summary mRNA-seq
Additional tools for genomics
-- GenomeSpace: http://www.genomespace.org
Collection of tools: GenePattern, Galaxy, Cytoscape, Genomica, etc. (apparently free). Data are stored in the cloud on Amazon VMs.
If you do not want to do it yourself:
-- Science Exchange: https://www.scienceexchange.com/
Science jobs for hire! This is where top core facilities compete to provide the best service.
-- Assay Depot: https://www.assaydepot.com/
Like Home Depot, but for science.
-- TaskRabbit: http://www.taskrabbit.com/ If science takes too much of your time!
Beyond genomics: results interpretation
Interpreting your gene list with protein-protein interaction networks.
iHOP: http://www.ihop-net.org/UniPub/iHOP/
Ingenuity Pathway Analysis (commercial): access through CMGM at Stanford
Beyond genomics: results interpretation
Looking into PPI databases:
IntAct: http://www.ebi.ac.uk/intact/
BioGRID: http://thebiogrid.org/ (multi-gene search coming soon)
HPRD: http://www.hprd.org/index_html
What about open-source solutions for searching the interactions between the genes in your gene list?
• Cytoscape http://cytoscape.org
• BioNetBuilder http://chianti.ucsd.edu/cyto_web/plugins/
• ...
• R for programmatic access to databases
• http://brainchronicle.blogspot.com
The advantage of using R is that results are reproducible, and you can share your method more easily than with a point-and-click interface.
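Searching programmatically for interactions between the genes in your list amounts to filtering an interaction table so that both partners are in the list. The table below is hypothetical; a real one could be a tab-delimited export from IntAct, BioGRID or HPRD read with read.delim().

```r
# Hypothetical interaction table (placeholder gene names)
ppi <- data.frame(
  a = c("GENE1", "GENE1", "GENE2", "GENE3", "GENE4"),
  b = c("GENE2", "GENE5", "GENE3", "GENE6", "GENE5"),
  stringsAsFactors = FALSE
)
my_genes <- c("GENE1", "GENE2", "GENE3", "GENE4")

# Keep interactions where BOTH partners are in the gene list
induced <- ppi[ppi$a %in% my_genes & ppi$b %in% my_genes, ]
induced

# Genes from the list with no partner inside the list
setdiff(my_genes, c(induced$a, induced$b))  # "GENE4"
```

Because the whole filter is a few lines of code, anyone can rerun it on the same export and get the same subnetwork.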
Data management and manipulation
REDCap: http://project-redcap.org/ Web app for building and managing online surveys and databases
To find participants: https://www.researchmatch.org
MySQL for a professional relational database. Requires some programming skills in SQL and database design.
Applications to query and build databases (goodbye command line): [OS X] Sequel Pro; [Windows] SQLyog, Toad for MySQL...
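The core operation of a relational database like MySQL is the join. To get a feel for it before writing SQL, R's merge() does the same thing on data frames; the equivalent SQL is shown in the comments, and the tables are hypothetical.

```r
# Two small hypothetical tables, as they might be stored in a database
samples <- data.frame(sample_id = c(1, 2, 3),
                      patient   = c("P01", "P02", "P03"),
                      stringsAsFactors = FALSE)
assays  <- data.frame(sample_id = c(1, 1, 3),
                      assay     = c("RNA-seq", "ELISA", "RNA-seq"),
                      stringsAsFactors = FALSE)

# SELECT * FROM samples INNER JOIN assays USING (sample_id);
inner <- merge(samples, assays, by = "sample_id")
inner

# SELECT * FROM samples LEFT JOIN assays USING (sample_id);
# keeps sample 2, which has no assay yet (assay is NA)
left <- merge(samples, assays, by = "sample_id", all.x = TRUE)
left
```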
Data are dirty...
How to clean your data more efficiently than doing everything by hand?
12:10:00 9999999 POCT Comment GLUCOSE BY METER
21:24:00 51 O2 Saturation, ISTAT (Ven) ISTAT EG7, VENOUS
5:39:00 91 Glu GLUCOSE BY METER
10:58:00 9999999 Comments BLOOD CULTURE (2 AEROBIC BOTTLES)
9:36:00 9999999 Report Status BLOOD CULTURE (2 AEROBIC BOTTLES)
16:25:00 25 CO2, Ser/Plas METABOLIC PANEL, COMPREHENSIVE
8:12:00 132 Glucose, Ser/Plas METABOLIC PANEL, BASIC
8:06:00 5.7 MONO, % CBC WITH DIFF
8:01:00 9.6 Glucose METABOLIC PANEL, BASIC
13:22:00 16.2 CO2 (a) BLOOD GASES, ARTERIAL
4:45:00 2.7 MONO CBC WITH DIFF
DataWrangler @ Stanford: http://vimeo.com/19185801
Google Refine @ down the road. A bit less intuitive than Wrangler.
For more complex data transformation: reshape2 package in R
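Some of the cleanup for the lab export above can be scripted rather than done by hand. A minimal base R sketch on a few of those records. Treating 9999999 as a placeholder for non-numeric results, and merging the glucose labels, are assumptions about this particular export that someone who knows the data should confirm.

```r
# A few records from the export: time, value, test component, panel
labs <- data.frame(
  time  = c("12:10:00", "5:39:00", "8:12:00", "8:01:00", "4:45:00"),
  value = c(9999999, 91, 132, 9.6, 2.7),
  test  = c("POCT Comment", "Glu", "Glucose, Ser/Plas", "Glucose", "MONO"),
  panel = c("GLUCOSE BY METER", "GLUCOSE BY METER", "METABOLIC PANEL, BASIC",
            "METABOLIC PANEL, BASIC", "CBC WITH DIFF"),
  stringsAsFactors = FALSE
)

# 1) Treat the 9999999 placeholder as missing (assumption), then drop those rows
labs$value[labs$value == 9999999] <- NA
labs <- labs[!is.na(labs$value), ]

# 2) Harmonize labels that name the same analyte (a domain decision)
labs$analyte <- ifelse(grepl("^Glu", labs$test), "Glucose", labs$test)

# 3) Zero-pad times so they sort correctly as text ("5:39:00" -> "05:39:00")
labs$time <- format(strptime(labs$time, "%H:%M:%S", tz = "UTC"), "%H:%M:%S")
labs[order(labs$time), ]
```

Unlike manual edits, the script documents every transformation and can be rerun on the next export.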
Statistics made easy...
Excel... Obviously. But what else when you want something more powerful?
• Switch to a statistical software like R.
• R graphical interface: Deducer (http://www.deducer.org/)
• http://www.youtube.com/watch?v=T6kOvlMaFCA
The case for starting to use R:
1. Powerful statistical procedures
   • R has become the lingua franca of statistical programming
2. Packages for everything, from
   • Flow cytometry
   • DNA microarrays
   • RNA-seq
   • Google graph API
   • ... See http://goo.gl/RwER7
3. Graphics, graphics, graphics...
   • R graphical manual: http://goo.gl/qSHMQ
Graphics in R
Data Science Visualization: Circos
CIRCOS: http://circos.ca/ To visualize genome-scale interactions and functional information.
Circos is a Perl program. Some light programming is needed, but it is worth it!
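Much of that light programming is generating Circos's plain-text data files from your own results. The sketch writes a links file with one link per line (chromosome, start, end for each of the two ends). That layout and the "hs"-style chromosome names are assumptions to check against the Circos documentation for your version, and the coordinates are made up.

```r
# Hypothetical links between genomic regions (integer coordinates, so
# write.table does not switch to scientific notation like 1e+06)
links <- data.frame(
  chr1 = c("hs1", "hs5"), start1 = c(1000000L, 2000000L), end1 = c(1500000L, 2500000L),
  chr2 = c("hs7", "hs9"), start2 = c(3000000L, 4000000L), end2 = c(3500000L, 4500000L),
  stringsAsFactors = FALSE
)

# Space-separated, no header, no quotes: one link per line (assumed layout)
links_file <- tempfile(fileext = ".txt")
write.table(links, links_file, sep = " ", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(links_file)
```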
Data Science Visualization
Tableau: http://www.tableausoftware.com/ Great for geo-localized data
Data Science Visualization
Google Visualization: https://developers.google.com/chart/interactive/docs/gallery
Requires data in JSON format. Fortunately, a bridge with R is possible (the googleVis package).
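To show what such a bridge produces, here is a minimal base R sketch that serializes a small data frame into the DataTable JSON layout that Google Charts accepts (a "cols" array plus "rows" of {"v": value} cells). It handles only character and numeric columns, escapes only quotes and backslashes, and is an illustration, not how googleVis is implemented.

```r
# Quote a string for JSON, escaping double quotes and backslashes
json_str <- function(x) paste0('"', gsub('(["\\\\])', '\\\\\\1', x, perl = TRUE), '"')

# Data frame -> Google Charts DataTable JSON (character/numeric columns only)
to_datatable_json <- function(df) {
  types <- ifelse(vapply(df, is.numeric, logical(1)), "number", "string")
  cols <- paste0('{"label":', json_str(names(df)), ',"type":"', types, '"}')
  cell <- function(v) if (is.numeric(v)) format(v) else json_str(v)
  rows <- vapply(seq_len(nrow(df)), function(i) {
    cells <- vapply(df[i, , drop = FALSE], cell, character(1))
    paste0('{"c":[', paste0('{"v":', cells, '}', collapse = ","), ']}')
  }, character(1))
  paste0('{"cols":[', paste(cols, collapse = ","), '],"rows":[',
         paste(rows, collapse = ","), ']}')
}

fruits <- data.frame(Fruit = c("Apples", "Oranges"), Sales = c(98, 96),
                     stringsAsFactors = FALSE)
to_datatable_json(fruits)
```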
Earthquake in Japan
Data Science Visualization
Google Visualization: https://developers.google.com/chart/interactive/docs/gallery
Motion chart: http://www.youtube.com/watch?v=rnF-7TCIe08
R commands (using the Fruits example data shipped with googleVis):
> library(googleVis)
> M1 <- gvisMotionChart(Fruits, idvar = "Fruit", timevar = "Year")
> plot(M1)
Demystifying the work
It's all about “reproducible research”.
Sharing your analytical process (aka what you did) is as important as the final manuscript.
How do you share what you did with a graphical interface?
The solution is to use a programming language, like R if suitable, and share your code.
Several tools can make your life easier: RStudio or Deducer.
Come to the workshop in 2 weeks!
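Two small habits go a long way toward reproducible R analyses: fix the random seed before anything stochastic, and save the R and package versions next to your results. A minimal sketch; the permutation is a toy example, and the temporary file stands in for a file you would keep with your project.

```r
# 1) Fix the seed so random results (permutations, bootstraps) can be regenerated
set.seed(42)
perm_means <- replicate(1000, mean(sample(c(rep(1, 10), rep(0, 10)), 10)))
summary(perm_means)

# 2) Record R and package versions alongside your results
#    (in a real project, write to a file next to your outputs instead)
session_file <- tempfile(fileext = ".txt")
writeLines(capture.output(sessionInfo()), session_file)
```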
The kitchen
TextMate and Notepad++ for coding
Use version control (e.g. Git or Mercurial) with hosting services like GitHub or Bitbucket
To make research reproducible when data are not available: DataThief (http://www.datathief.org/) extracts data points from published plots
To follow the latest buzz in science: Twitter @druau
Some R books: most of these books are available online for free through the Stanford Library.
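Digitizing tools like DataThief work by calibrating each axis from reference points whose values you know, then mapping every clicked pixel linearly onto data coordinates. A minimal base R sketch of that mapping for linear axes (a log axis would need the log of the reference values); the pixel coordinates are hypothetical.

```r
# Linear map from pixel position to data value, from two reference points
calibrate_axis <- function(px, px_ref, val_ref) {
  val_ref[1] + (px - px_ref[1]) * diff(val_ref) / diff(px_ref)
}

# Hypothetical calibration: x pixel 100 -> 0 and 500 -> 10;
# y pixel 400 -> 0 and 50 -> 100 (image y coordinates grow downward)
pts_px <- data.frame(x = c(100, 300, 500), y = c(400, 225, 50))
calibrated <- data.frame(
  x = calibrate_axis(pts_px$x, c(100, 500), c(0, 10)),
  y = calibrate_axis(pts_px$y, c(400, 50), c(0, 100))
)
calibrated  # x: 0 5 10; y: 0 50 100
```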
Q&A
This class was sponsored by the Office of Postdoctoral Affairs and the Lane Library
Offline questions to [email protected]
Thanks!